
Welcome to the LTRC website!
The Language Technologies Research Centre (LTRC) is a newly created research institute dedicated to the language industry and language technology. A unique collaboration among the university, public-sector and private-sector communities, the LTRC aims to become a leader in language technology at both the national and international levels.
To find out more about the LTRC and its members, use the menus at the sides of the screen. This site is still under construction, so be sure to visit often: new information is added regularly.


Seminar: How to use and extend statistical language models for information retrieval?
Jian-Yun Nie, Professor, Department of Computer Science and Operations Research, Université de Montréal
December 15, 2006
Abstract
Statistical language models have been developed to capture linguistic features hidden in texts, such as the probability of words or word sequences in a language. Over the last decade, these models have also been used successfully in information retrieval (IR). In this presentation, we will first describe the basic approaches to IR using language models. A strong limitation of these models is their assumption that words are independent, so we propose extensions that take word dependencies into account. Two types of dependence are considered: those between words in a query, and those between a document word and a query word. Our experiments show that all of these extensions can produce large and significant improvements in retrieval effectiveness. Finally, we will show an application of language models to cross-language IR.
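The basic language-model approach to IR is commonly illustrated by query-likelihood ranking: each document is scored by the probability that its smoothed unigram model generates the query, one word at a time, which is exactly the word-independence assumption mentioned above. The sketch below is a textbook baseline, not the speaker's models; Dirichlet-prior smoothing, the parameter `mu` and the function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection_counts, collection_len, mu=2000.0):
    """log P(query | doc) under a unigram document model with Dirichlet smoothing.

    Query words are scored independently of one another (the word-independence
    assumption). Words never seen in the collection are skipped (a simplification).
    """
    doc_counts = Counter(doc)
    score = 0.0
    for word in query:
        p_coll = collection_counts[word] / collection_len
        if p_coll == 0.0:
            continue
        p_word = (doc_counts[word] + mu * p_coll) / (len(doc) + mu)
        score += math.log(p_word)
    return score


def rank(query, docs, mu=2000.0):
    """Return document indices sorted from best to worst score."""
    collection = [word for doc in docs for word in doc]
    counts, total = Counter(collection), len(collection)
    scores = [query_log_likelihood(query, d, counts, total, mu) for d in docs]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    docs = [["language", "models", "for", "retrieval"],
            ["cooking", "with", "models"],
            ["retrieval", "of", "documents"]]
    print(rank(["language", "retrieval"], docs, mu=10.0))  # prints [0, 2, 1]
```

Because each query word contributes its own independent factor, a document gets no extra credit for containing "language" and "retrieval" next to each other; the dependency extensions described in the abstract target exactly that gap.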
Biography
Jian-Yun Nie is a full professor in the Department of Computer Science and Operations Research at the Université de Montréal. He obtained a Ph.D. in computer science from the Université Joseph Fourier in Grenoble in 1990. His research focuses on information retrieval (IR), covering both its theoretical and practical aspects. In particular, he seeks to integrate artificial intelligence and NLP techniques into IR. Professor Nie is also interested in methodologies suited to open systems that use computer networks to deploy easily accessible systems.

LISA Forum Europe
Europe's Eastern Frontier
Doing Business in an Expanding Europe
November 13-17, 2006
The expansion of the European Union has made Central and Eastern Europe increasingly important, both as a market for goods and services and as a low-cost location for outsourcing and production. This year’s annual European meeting of the Localization Industry Standards Association (LISA) will focus on the opportunities and challenges associated with this region’s rapid growth and newfound prominence. As the European Union works through the historical and regulatory challenges associated with economic consolidation and growth, this ground-breaking conference will address localization business issues specific to the region’s legal, banking, life sciences, manufacturing, IT/Telco, government and multimedia industries. Special attention will be given to language and translation technologies in web development, content production, management and distribution.
A three-day exhibition featuring the industry's leading language technology developers of machine translation, content management and workflow systems, web services, internationalization, translation and localization suppliers will take place during the forum.
http://www.lisa.org/events/2006warsaw/

The Directory of Terminologists Practising in Canada Is Finally Here!
On October 24, the LTRC will welcome the members of the Joint Committee on Terminology in Canada (JCTC).
These terminology enthusiasts are eager to unveil the prototype of the Directory of Terminologists Practising in Canada.
The JCTC – a multi-sectoral partnership composed of university, private sector and Government of Canada’s Translation Bureau representatives – is committed to promoting the terminology profession in Canada, and the creation of the Directory is a step toward achieving that goal. This one-of-a-kind resource will help to raise awareness of the profession across the country.
Get ready, fellow language professionals: the Directory is finally here!

Official Inauguration of the LTRC
Located in a brand-new building on UQO’s Alexandre Taché campus in Gatineau, the LTRC provides a stimulating environment where researchers, academics, entrepreneurs and government specialists can work in synergy under one roof.

Seminar: Optimal and Information Theoretic Syntactic Pattern Recognition
B. John Oommen, Professor and Fellow of the IEEE, School of Computer Science, Carleton University
Wednesday, February 8, 2006
Abstract
In this talk we shall show how we can achieve Information Theoretic Optimal Syntactic Pattern Recognition for arbitrary systems. We do this by presenting a new model for noisy channels which permits arbitrarily distributed substitution, deletion and insertion errors. Apart from its straightforward applications in string generation and recognition, the model also has potential applications in speech and uni-dimensional signal processing. The model, which is specified in terms of a noisy string generation technique, is functionally complete and stochastically consistent.
Apart from presenting the channel, we also specify a technique by which Pr[Y|U], the probability of receiving Y given that U was transmitted, can be computed in cubic time. This procedure uses dynamic programming and is, to our knowledge, among the few non-trivial applications of dynamic programming that evaluate quantities involving relatively complex combinatorial expressions while simultaneously maintaining rigid probability consistency constraints.
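To make the idea of computing Pr[Y|U] by dynamic programming concrete, here is a minimal sketch for a deliberately simplified channel: each transmitted symbol is deleted with a fixed probability or otherwise substituted according to a table, and a geometrically distributed number of symbols is inserted before each symbol and after the last one. This channel is an assumption made for illustration, not the Oommen and Kashyap model, whose more general error distributions are what lead to the cubic-time procedure; this simpler model needs only O(|U|·|Y|) time. All names and parameters are hypothetical.

```python
import itertools


def channel_probability(u, y, sub, delete_p, insert_p, insert_dist):
    """Pr[y | u] for a simplified noisy channel, by forward dynamic programming.

    Illustrative model (an assumption, not the Oommen-Kashyap channel):
      * in each of the len(u) + 1 insertion slots (before every input symbol and
        after the last), k symbols are inserted with probability
        (1 - insert_p) * insert_p**k, each drawn from insert_dist;
      * each input symbol a is deleted with probability delete_p, otherwise it
        is received as symbol b with probability sub[a][b].
    Runs in O(len(u) * len(y)) time.
    """
    m = len(y)

    def apply_insertion_slot(g):
        # h[j] = (1 - q) * sum over j' <= j of g[j'] * prod_{t=j'+1..j} q * I(y_t)
        h, running = [0.0] * (m + 1), 0.0
        for j in range(m + 1):
            if j > 0:
                running *= insert_p * insert_dist.get(y[j - 1], 0.0)
            running += g[j]
            h[j] = (1.0 - insert_p) * running
        return h

    # h[j]: probability that the input symbols processed so far produced y[:j].
    h = apply_insertion_slot([1.0] + [0.0] * m)
    for a in u:
        g = [0.0] * (m + 1)
        for j in range(m + 1):
            g[j] = h[j] * delete_p  # a is deleted
            if j > 0:  # a is received as y[j - 1]
                g[j] += h[j - 1] * (1.0 - delete_p) * sub[a].get(y[j - 1], 0.0)
        h = apply_insertion_slot(g)
    return h[m]


if __name__ == "__main__":
    sub = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.2, "b": 0.8}}
    ins = {"a": 0.5, "b": 0.5}
    # Total probability of all outputs up to length 8 should be close to 1.
    total = sum(channel_probability("ab", "".join(t), sub, 0.1, 0.1, ins)
                for n in range(9) for t in itertools.product("ab", repeat=n))
    print(round(total, 4))
```

Summing over every possible output string (up to a long enough length) and getting a total close to 1 is a simple way to check that such a channel respects the probability-consistency constraints the abstract emphasizes.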
This work was done in collaboration with Prof. R. L. Kashyap, School of Electrical Engineering, Purdue University, West Lafayette, IN 47907.
The results reported here received an Honourable Mention for the Paper of the Year Award from the journal Pattern Recognition. They are dedicated to the memory of the late Prof. K.-S. Fu of Purdue University.
Biography
Dr. John Oommen is a Professor in the School of Computer Science at Carleton University. He obtained a Bachelor of Technology in Electrical Engineering from the Indian Institute of Technology (Madras) in 1975, a Master of Engineering in Electrical Engineering from the Indian Institute of Science (Bangalore) in 1977, and a Master of Science (1979) and a Doctor of Philosophy (1982) from the Department of Engineering at Purdue University, USA. A Fellow of the IEEE, Dr. Oommen has over 240 refereed publications in journals and conference proceedings. His research spans the areas of Learning Systems, Data Retrieval and Storage, Statistical Pattern Recognition, Syntactical Pattern Recognition, Robotics, Adaptive Data Structures, Artificial Neural Networks, Query Optimization in Database Systems and Data Compression. Among the many other distinctions and prizes he has received, his paper "Optimal and Information Theoretic Syntactic Pattern Recognition" received an Honourable Mention for the Paper of the Year Award in 1998 from the journal Pattern Recognition, and his paper "Stochastic Generalized Pursuit Learning Algorithms" was nominated for the Best Paper of the Year Award in 2003 from the journal IEEE Transactions on Systems, Man and Cybernetics. He also serves on the editorial boards of two highly acclaimed journals.

Promising results for the Barçah Project
January 25, 2006
Terminometrics measures the use of terms or terminologies in a population. For example, we have noticed the recent emergence of the French term dessin intelligent, which was rarely used just over a year ago. Terminometric studies may be comparative (does the population prefer the French terms dessin intelligent or dessein intelligent, or rather the English term intelligent design?) and time-dependent (how does usage evolve over time?).
The LTRC's Barçah research project has created an environment that will make it possible to conduct terminometric studies, that is, to measure terminology implantation, across entire subject fields.
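As a minimal illustration of the comparative and time-dependent measures described above, the sketch below computes each competing term's share of all attested uses, period by period. It assumes the corpus has already been searched, disambiguated and dated (the hard part that Barçah addresses); the function name and input format are hypothetical.

```python
from collections import Counter, defaultdict


def implantation_rates(occurrences):
    """Share of each competing term among all attested uses, per period.

    occurrences: iterable of (period, term) pairs, one pair per disambiguated
    occurrence of any designation of the same concept.
    Returns {period: {term: share}}; the shares for a period sum to 1.
    """
    counts = defaultdict(Counter)
    for period, term in occurrences:
        counts[period][term] += 1
    rates = {}
    for period in sorted(counts):
        total = sum(counts[period].values())
        rates[period] = {term: n / total for term, n in counts[period].items()}
    return rates
```

Comparing the shares within one period answers the comparative question (which designation the population prefers); reading one term's share across periods gives its trend over time.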

The Barçah software measures terminology use within a subject field quickly and efficiently. It sorts texts by language, indexes the corpora, analyses them, presents contexts in a user-friendly environment for manual disambiguation and assists in managing the results.
In March 2005, the Barçah prototype received an honourable mention in the "Software application – large organizations" category at the Mérites du français dans les technologies de l'information 2005 (Merits of French in Information Technology 2005) awards ceremony. Since then, the Barçah research team has been working on a second version of the software.
In the second version of the Barçah terminometric prototype, the disambiguation step will be semi-automatic. Ambiguity is indeed the main problem in terminometrics. For example, in golf, the terms aigle, aiglon and moins-deux all designate the notion of "two under par." However, these terms are ambiguous, far more so than one might realize at first glance. Aigle, in particular, also refers to a bird of prey, a military ensign or a corporate name; it is even a city in Switzerland! The second version of Barçah disambiguates terms semi-automatically, based on the principle of active learning. The software asks the user to disambiguate the tokens it finds difficult (e.g., "I played golf in the city of Aigle." versus "I played golf and I made an eagle (réussi un aigle)!"). It then evaluates how well it can reproduce the user's decisions on its own. Once its performance reaches a fairly high threshold, the software automatically completes the evaluation of the corpus.
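The active-learning loop described above can be sketched as follows. This is an illustrative assumption, not Barçah's implementation: it uses a small multinomial Naive Bayes classifier over context words, asks about the context whose most likely sense is least probable (uncertainty sampling), tracks whether its own guess matched the user's answer over the last few questions, and labels the remaining contexts automatically once that agreement rate reaches a threshold. The names `ask_user`, `threshold` and `window` are hypothetical.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes over context words, with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}  # sense -> Counter of context words
        self.context_counts = Counter()
        self.vocab = set()

    def add(self, tokens, sense):
        self.word_counts.setdefault(sense, Counter()).update(tokens)
        self.context_counts[sense] += 1
        self.vocab.update(tokens)

    def posterior(self, tokens):
        total = sum(self.context_counts.values())
        logs = {}
        for sense, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.context_counts[sense] / total)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            logs[sense] = score
        best = max(logs.values())
        weights = {s: math.exp(v - best) for s, v in logs.items()}
        z = sum(weights.values())
        return {s: w / z for s, w in weights.items()}


def active_disambiguation(contexts, ask_user, seed, threshold=0.9, window=5):
    """Label every context with a sense, asking the user only for hard cases.

    contexts: list of token lists; ask_user(i) returns the sense of contexts[i];
    seed: indices labelled by the user first (at least one per sense).
    """
    model, labels, recent = NaiveBayes(), {}, []
    for i in seed:
        labels[i] = ask_user(i)
        model.add(contexts[i], labels[i])
    while len(labels) < len(contexts):
        unlabeled = [i for i in range(len(contexts)) if i not in labels]
        # Uncertainty sampling: the context whose most likely sense is least probable.
        i = min(unlabeled, key=lambda k: max(model.posterior(contexts[k]).values()))
        post = model.posterior(contexts[i])
        guess = max(post, key=post.get)
        labels[i] = ask_user(i)
        recent = (recent + [guess == labels[i]])[-window:]
        model.add(contexts[i], labels[i])
        # Stop asking once the model has agreed with the user often enough recently.
        if len(recent) == window and sum(recent) / window >= threshold:
            break
    for i in range(len(contexts)):
        if i not in labels:
            post = model.posterior(contexts[i])
            labels[i] = max(post, key=post.get)
    return labels
```

The agreement check is measured on the model's guesses made before each answer is revealed, which is one simple way to estimate "its ability to disambiguate the terms as the user did" without a separate test set.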
The Translation Bureau is also interested in the Barçah prototype. Since September 2005, the Terminology Standardization Directorate has been examining how the Government On-Line (GOL) terminology recommended by the Translation Bureau is used in various federal public bodies.
Since spring 2005, another organization, the Office québécois de la langue française (OQLF), has also been collaborating with the Université du Québec en Outaouais on a terminometric study of nanoscience and nanotechnology in Québec.
The LTRC's Barçah Project is led by Jean Quirion, a terminology professor in the Department of Language Studies at the Université du Québec en Outaouais, and is being carried out in partnership with Caroline Barrière and David Nadeau, respectively a research officer and a programmer-analyst with the Interactive Language Technologies Group of the National Research Council of Canada.
For additional information, please contact:

Seminar: Reading Comprehension: from Merlin to Modularity
Patricia M. Raymond, Professor affiliated with the Department of Language Studies at Université du Québec en Outaouais
Thursday, December 13, 2005
Abstract
An overview of established mental models and of how they have informed current views of reading comprehension in both first (L1) and second (L2) languages, followed by a look at on-screen reading and areas for future research.
Biography
Patricia M. Raymond (Ph.D. in Education, Université de Montréal) was a full professor at the University of Ottawa's Second Language Institute. She is currently affiliated with the Department of Language Studies at UQO. Her research interests include reading and writing in L1 and L2, rhetorical genre theory and situated literacy.

Seminar: CAEL Assessment Presentation
Virginia A. Taylor, Assistant Director, School of Linguistics and Applied Language Studies, Carleton University
Muhammad Usman Erdosy, test manager for the Canadian Academic English Language (CAEL) Assessment
Thursday, December 8, 2005
Abstract
The Canadian Academic English Language (CAEL) Assessment instrument was developed at Carleton University in response to dissatisfaction with the performance of non-native speaking students who had met their English proficiency requirements by writing a major international proficiency test, such as TOEFL or MELAB. It was felt, in particular, that existing proficiency tests failed to incorporate in their designs the specific linguistic demands made on students in North American institutions of higher education.
To meet this challenge, a team of test developers drew on the expertise of both ESL instructors and faculty at Carleton University to design test tasks and assessment criteria that reflect the demands of using English in academic settings. In its current form, the test assesses test takers' ability to read and understand academic texts, to listen to and understand an academic lecture, to use information from both aural and written input to respond to an essay prompt, and to demonstrate oral proficiency over a range of academic speaking tasks. A distinguishing feature of CAEL is the thematic integration of the Listening, Reading and Writing modules: the lecture that test takers listen to, the texts they read and the essay prompt all focus on a single topic. This feature allows CAEL to simulate academic settings, so that test takers' performance may be confidently extrapolated to authentic academic settings beyond the narrower context of a language proficiency test.
Although it is not thematically integrated with the rest of the test, the Oral Language Test (OLT) component of CAEL is also characterized by its simulation of academic environments and its sampling of a wide variety of academic registers. It is designed to be administered by computer: test takers listen to the input through headphones and record their responses into a microphone. Raters later access the recorded sound files and score the responses using rubrics for both language and content. The OLT is the focus of much current research within CAEL. Briefly, we are confronting two specific challenges whose resolution would enhance not only CAEL's capacity to administer the test, but also the ability of the Canadian language industry to facilitate language assessment. The first is to develop a reliable system for delivering aural input to test takers and recording their responses; the second is to explore the possibility of constructing algorithms for the computerized scoring of test takers' responses. We feel that these areas would provide fruitful ground for collaboration between the CAEL Assessment team and the consortium of experts brought together under the aegis of the LTRC.
Biographies
Virginia A. Taylor, Asst. Director, School of Linguistics and Applied Language Studies, Carleton University
Ms. Taylor lives in Ottawa, Canada, where she manages language programs at Carleton University and teaches in the areas of Cross-Cultural Communication and Teacher Education. As Assistant Director, she is responsible for the Intensive ESL programs, Special Projects and the CAEL Assessment. She holds a bachelor's degree in sociology, a Certificate in Teaching English as a Second Language, and an M.A. in Applied Language Studies from Carleton University. Virginia has taught in Canada, the Czech Republic and Mexico. More recently, she managed workplace language training programs in both the public and private sectors. Virginia currently sits on the Board of Directors of the Canada Language Council.
Muhammad Usman Erdosy holds an M.A. and a Ph.D. in Second-Language Education from the Ontario Institute for Studies in Education. In addition to extensive experience teaching English as a Second Language at the University of Toronto, he has been involved in language testing since 1997, first as an examiner, later as a test developer and most recently as test manager for the Canadian Academic English Language (CAEL) Assessment. He has participated in numerous research and test development projects with a special focus on assessing language ability in academic contexts. His current work focuses on updating the test specifications for the Listening, Speaking, Reading and Writing modules of the CAEL Assessment, and on developing a long-term program of test validation based on test content, test construct(s) and the study of indigenous assessment criteria in students' target environment, namely (North American) institutions of higher education.

Seminar: The Contribution of Computer Technology to Lexicographic Research
Roda P. Roberts, Director of the Bilingual Canadian Dictionary Project, University of Ottawa
Wednesday, November 30, 2005
Abstract
Computer technology plays a central role in all stages of dictionary production: consultation and analysis of documentation, entry preparation and entry revision. In this presentation, I will discuss the contribution of computer technology to the creation of the Bilingual Canadian Dictionary, which is one of the primary objectives of an interuniversity Canadian research project.
Biography
Dr. Roda P. Roberts, a certified translator, is a Full Professor at the School of Translation and Interpretation of the University of Ottawa, where she was Director from 1979 to 1989. She has taught languages, translation and interpretation at a number of universities in Canada, the United States and India. In addition, she has trained translation and interpretation trainers in Canada, the United States and Mexico, and has served as curriculum consultant to various educational institutions. She has written numerous articles on translation theory, translator/interpreter training, terminology and lexicography. She is presently Director of the Bilingual Canadian Dictionary Project, an interuniversity lexicographic project which has centres at the University of Ottawa, the University of Montreal and Laval University.

Seminar: Using Information Extraction Techniques for the Discovery of Business Opportunities
François Paradis, Researcher, RALI laboratory, University of Montreal
Thursday, November 24, 2005
Abstract
MBOI (Matching Business Opportunities on the Internet) is a joint project between RALI and Nstein Technologies that addresses a key issue in electronic commerce: discovering business opportunities by analysing, matching and classifying calls for tenders on the Web. The research directions currently being investigated include the definition of language models, the use of enterprise profiles, content-based filtering and business intelligence. In this talk, I will focus on information extraction and two of its uses in our project: the selection of "subject" passages, and the identification of business links between organisations. I will present various filtering approaches, based on vocabulary, domain ontology and named entity extraction, and their impact on the classification of calls for tenders. I will then explain our strategy for extracting business links, which exploits co-occurrence contexts and semantic content via WordNet. I will conclude with the lessons learnt so far in the project and our position on current trends in information retrieval.
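The co-occurrence side of a business-link strategy can be sketched minimally by counting pairs of organisations mentioned in the same sentence. This assumes the organisation names have already been found by a named-entity step (here a hypothetical list is matched as plain substrings, and sentences are split naively on end punctuation); it omits the WordNet-based semantic analysis the abstract mentions, and the function name is hypothetical.

```python
import itertools
import re
from collections import Counter


def cooccurring_pairs(text, organisations, min_count=1):
    """Count pairs of known organisation names mentioned in the same sentence.

    organisations: names assumed to come from an earlier named-entity step
    (matched here as plain substrings). Sentences are split naively on . ! ?
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    pairs = Counter()
    for sentence in sentences:
        found = sorted({org for org in organisations if org in sentence})
        for pair in itertools.combinations(found, 2):
            pairs[pair] += 1
    return {pair: n for pair, n in pairs.items() if n >= min_count}
```

Raising `min_count` trades recall for precision: pairs seen together only once are often coincidental, while repeated co-occurrence is a stronger hint of a business link.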
Biography
François Paradis obtained his Ph.D. from Joseph Fourier University in 1997 and has since worked at CSIRO (Australia), the University of Waikato and, currently, the University of Montreal. His interests include digital libraries, information retrieval and classification.

Seminar: Automatic Detection of Translation Errors: The TransCheck System
Graham Russell, Researcher, RALI Laboratory, University of Montreal
Wednesday, November 16, 2005
Abstract
Although language technology has been widely adopted in the translation world, several parts of the overall translation process remain relatively unexploited. One of these is quality control, and more specifically the detection of errors of translation.
This talk presents TransCheck, a system currently under joint development by the NRC and RALI at the University of Montreal within the framework of the LTRC. TransCheck provides a framework for error detection, together with modules designed to detect a number of common errors. The presentation will discuss the difficulty of translation error detection in general, and identify several classes of error which can be detected using state-of-the-art language technology. Each is exemplified and characterized from a linguistic and translational point of view, and the mechanisms required for its treatment are described.
A demonstration of the TransCheck system is available, and parties interested in pursuing its testing, integration and development are welcome.
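As an illustration of the kind of error class such a framework can host, the sketch below flags numbers that appear in a source sentence but not in its translation, and vice versa. Whether TransCheck uses this particular check, and how it normalizes numbers, are assumptions; the normalization here is deliberately crude (it keeps digits only, so "1,250" and "1 250" match, but a decimal comma and a decimal point are also conflated), and the function names are hypothetical.

```python
import re
from collections import Counter

# A number: digit groups joined by a single ".", ",", space or no-break space.
NUMBER = re.compile(r"\d+(?:[.,\u00a0 ]\d+)*")


def numbers(text):
    # Crude normalization: keep digits only, so "1,250", "1 250" and "1250" match.
    return Counter(re.sub(r"\D", "", match) for match in NUMBER.findall(text))


def check_numbers(source, target):
    """Return (numbers missing from the target, numbers added in the target)."""
    src, tgt = numbers(source), numbers(target)
    return sorted((src - tgt).elements()), sorted((tgt - src).elements())
```

Counting with `Counter` rather than sets means that a number appearing twice in the source but only once in the translation is still reported as missing.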
Biography
Graham Russell is a research officer at the RALI laboratory in the Computer Science Department of the Université de Montréal, as well as a visiting researcher with the NRC Interactive Language Technologies Group, and works principally on translation technology. Mr. Russell previously carried out R&D at the Universities of East Anglia, Cambridge and Geneva, and at the Canadian Centre for Information Technology Innovation (CITI); his research interests include lexical structure and organization, and the application of finite-state methods to language analysis, generation and machine translation. More recently, he also participated in the language industry Technology Roadmap directed by AILIA.

CLiNE 2005 at UQO – A Success!
CLiNE (Computational Linguistics in the North East) is a one-day meeting initiated by Sabine Bergler of Concordia University (Montreal), where the first two CLiNE events, CLiNE 2002 and CLiNE 2004, were held. It was a great pleasure and a privilege for the Outaouais researchers to organize CLiNE 2005, which was held on Friday, August 26, at the pavillon Alexandre-Taché of the Université du Québec en Outaouais (UQO).
A total of 53 participants interested in computational linguistics and its applications (professors, researchers and students), mainly from Montreal, Quebec City and Ottawa, attended CLiNE 2005. The conference was organized by Caroline Barrière, a researcher at the ILTG (Interactive Language Technologies Group) of the NRC, together with her colleague George Foster of the same R&D group, and Jean Quirion, professor and Director of the Département d'études langagières (Department of Language Studies) at UQO.
The day saw very positive exchanges of information and lively discussions. Eight presentations addressed a variety of research themes:
- lexical semantics,
- computational terminology,
- automated text summarization,
- information extraction from web sites,
- text categorization, and
- machine translation.
In addition, a poster session held during the lunch break allowed participants to discuss the posters' topics in depth with their authors.
The conference papers are available at: http://www.crtl.ca/cline05/papers_enfr.htm.
We are extremely pleased with the success of CLiNE 2005. Next year, CLiNE 2006 will move to Quebec City, where Marie-Josée Goulet and Joël Bourgeoys, both students at Université Laval, will host a new edition of this excellent initiative.

Seminar: Word-Level Confidence Estimation for Machine Translation
Nicola Ueffing, doctoral student, Aachen University, Germany
Tuesday, October 11, 2005
Abstract
This talk will address the problem of assessing the correctness of machine translation output at the word level. Especially in contexts involving human users, it is helpful to know when the machine translation system is correct and when it is not.
An overview of word-level confidence measures for statistical machine translation will be given. All of the approaches presented are based on word posterior probabilities, which can be used directly as confidence measures. Different ways of calculating them will be explained and compared, including methods that use system output such as word graphs and N-best lists, as well as methods that incorporate other statistical models as knowledge sources. An experimental comparison of different word-level confidence measures will be presented on a translation task consisting of technical manuals.
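One of the simplest ways to turn an N-best list into word posterior probabilities is sketched below: normalize the hypothesis scores into probabilities, then give each word the total probability of the hypotheses that contain it. This position-independent variant is only one of the possibilities the abstract alludes to (others align words by position or work on word graphs); the `scale` factor, the 0.5 threshold and the function names are assumptions.

```python
import math


def word_posteriors(nbest, scale=1.0):
    """Position-independent word posteriors from an N-best list.

    nbest: list of (tokens, log_score) pairs. Each hypothesis gets probability
    proportional to exp(scale * log_score); a word's posterior is the total
    probability of the hypotheses that contain it at least once.
    """
    best = max(score for _, score in nbest)
    weights = [math.exp(scale * (score - best)) for _, score in nbest]
    z = sum(weights)
    posteriors = {}
    for (tokens, _), weight in zip(nbest, weights):
        for word in set(tokens):
            posteriors[word] = posteriors.get(word, 0.0) + weight / z
    return posteriors


def flag_words(hypothesis, posteriors, threshold=0.5):
    """Pair each word of the output with True (confident) or False (suspect)."""
    return [(w, posteriors.get(w, 0.0) >= threshold) for w in hypothesis]
```

Subtracting the best score before exponentiating keeps the computation numerically stable when log scores are large negative numbers, without changing the normalized probabilities.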
Additionally, I will show how word confidence measures can be applied in an interactive statistical machine translation system. This system predicts translations, taking into account parts of the sentence that have already been accepted or typed by the user. Through the use of confidence measures, the performance of the prediction engine can be improved.
Biography
Nicola Ueffing is currently completing her doctoral studies at the University of Aachen. Her research interests include statistical machine translation and machine learning. Her main focus is on confidence measures for statistical machine translation.

Seminar: Using the Lexical Capabilities of Corpora for Automatic Term Extraction
Dr. Patrick Drouin, University of Montreal
September 21, 2005
Abstract
I will present an automatic term extraction technique that is based on comparing corpora with different characteristics. I will describe the technique in detail and discuss how it applies to the TermoStat software. I will also elaborate on the findings of my experiments involving English, French and Korean. I will conclude my presentation by giving several examples of how this technique can be used for purposes other than term extraction and by providing several suggestions for further research that resulted from the work accomplished to date.
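The general idea of comparing corpora can be sketched by scoring each word of a specialized corpus according to how over-represented it is relative to a general reference corpus. The sketch uses Dunning's log-likelihood statistic (G²) as a stand-in; the measure actually used in TermoStat may differ, the function names are hypothetical, and a real extractor would also handle multi-word terms and lemmatization.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Dunning's G2 for a word seen a times in a specialized corpus of c tokens
    and b times in a reference corpus of d tokens."""
    expected_spec = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_spec)
    if b > 0:
        g2 += b * math.log(b / expected_ref)
    return 2.0 * g2


def term_candidates(specialized, reference, top=10):
    """Words over-represented in the specialized corpus, best-scoring first."""
    spec, ref = Counter(specialized), Counter(reference)
    c, d = len(specialized), len(reference)
    scored = [(log_likelihood(a, ref[w], c, d), w)
              for w, a in spec.items() if a / c > ref[w] / d]
    scored.sort(reverse=True)
    return [w for _, w in scored[:top]]
```

Function words such as "the" are frequent in both corpora, so their relative frequencies are similar and their G² stays near zero, while domain words that are rare or absent in the reference corpus rise to the top.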
Biography
Patrick Drouin is an assistant professor in the Department of Linguistics and Translation at the University of Montreal, where he teaches localization and translation. His research focuses mainly on automatic term extraction. Before becoming a professor at the University of Montreal, he worked as a language technologies specialist for both Nortel Networks and Computer Sciences Corporation.

LISA Forum Europe 2005 - Succeeding in Global Markets
LISA Forum Europe
Succeeding in Global Markets
Automating Process Technologies and Open Standards for Managing Information Worldwide
Are you saddled with going global in addition to your "real job"? What if you could find clear guidelines, best practices and standards all in one place, so that you could deliver on your international objectives and return to your other responsibilities?
Let the Localization Industry Standards Association (LISA) help you achieve peace of mind by giving you access to the processes and procedures for going global, so you do not have to reinvent them. Shorten your learning curve and plug into a worldwide network of globalization professionals at the LISA Forum Europe 2005, Succeeding in Global Markets, to be held in Zurich, Switzerland, from November 7 to 11.
Register now and you will receive an Early Bird Discount for the Forum, as well as an additional discount when you register for the Forum and any workshop!
https://www.lisa.org/events/2005zurich/registration.html?from=mm
A three-day exhibition featuring the industry's leading language technology developers of machine translation, content management and workflow systems, web services, internationalization, translation and localization suppliers will take place during the Forum.
http://www.lisa.org/events/2005zurich/index.html/exhib/?from=mm

Seminar: Probabilistic learning for organising and managing documents
Cyril Goutte, Researcher, Xerox Research Centre Europe (Grenoble, France)
July 8, 2005
Abstract
Learning to automatically organize textual information is a key challenge in a world where document collections grow as storage costs decrease.
I will present a hierarchical probabilistic document model, inspired by Probabilistic Latent Semantic Analysis (PLSA), which can be used to organize document collections automatically and to categorize incoming documents into an existing taxonomy.
This model takes into account the intrinsic dependencies in a hierarchical structure and scales well, which makes it suitable for handling structures with many thousands of categories. It has been used at Xerox on several problems involving filtering, routing or text mining, on both internal and external data.
We will see how this document model is linked to Non-negative Matrix Factorisation, a data analysis technique that decomposes data into additive parts. We will also see how to extend this model to learn from a mixture of labelled and unlabelled data (semi-supervised learning). The standard EM (expectation-maximization) algorithm is extended to handle semi-supervised learning while maintaining a proper estimate of the confidence level of the classification decision.
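The link between PLSA and Non-negative Matrix Factorisation can be illustrated with NMF under the generalized Kullback–Leibler divergence, which is closely related to PLSA (the two optimize essentially the same objective, up to normalization). The sketch below implements the standard Lee–Seung multiplicative updates in pure Python on a small term-by-document count matrix. It shows plain, flat NMF, not the hierarchical or semi-supervised model of the talk; the function names, random initialization and iteration count are assumptions.

```python
import math
import random


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def kl_divergence(V, R):
    """Generalized KL divergence D(V || R) between non-negative matrices."""
    total = 0.0
    for v_row, r_row in zip(V, R):
        for v, r in zip(v_row, r_row):
            total += (v * math.log(v / r) if v > 0 else 0.0) - v + r
    return total


def nmf_kl(V, rank, iterations=200, seed=0, eps=1e-12):
    """Factor V (terms x documents) as W @ H with Lee-Seung multiplicative
    updates for the KL divergence. Entries stay non-negative throughout."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H[a][j] *= sum_i W[i][a] * V[i][j] / (WH)[i][j], divided by sum_i W[i][a]
        R = matmul(W, H)
        Q = [[V[i][j] / (R[i][j] + eps) for j in range(m)] for i in range(n)]
        w_sums = [sum(W[i][a] for i in range(n)) for a in range(rank)]
        H = [[H[a][j] * sum(W[i][a] * Q[i][j] for i in range(n)) / w_sums[a]
              for j in range(m)] for a in range(rank)]
        # W[i][a] *= sum_j H[a][j] * V[i][j] / (WH)[i][j], divided by sum_j H[a][j]
        R = matmul(W, H)
        Q = [[V[i][j] / (R[i][j] + eps) for j in range(m)] for i in range(n)]
        h_sums = [sum(H[a][j] for j in range(m)) for a in range(rank)]
        W = [[W[i][a] * sum(H[a][j] * Q[i][j] for j in range(m)) / h_sums[a]
              for a in range(rank)] for i in range(n)]
    return W, H
```

Normalizing each column of `W` to sum to 1 gives term distributions per latent factor, which is where the correspondence with PLSA's P(word | topic) parameters becomes visible.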
Biography
Cyril Goutte received an M.Sc. from ENSTA in Paris in 1992 and a Ph.D. from Université Paris 6 in 1997. He is currently a senior researcher at the Xerox Research Centre Europe in Grenoble, France, working on machine learning approaches to document content analysis. His published research includes papers on machine learning, access to textual information and functional neuroimaging.

CLiNE 2005 - Computational Linguistics in the North-East
August 26th 2005
Université du Québec en Outaouais
Gatineau, Québec
SECOND CALL FOR PAPERS
In 2002, Sabine Bergler initiated the CLiNE workshop at Concordia University as a meeting point for local researchers in computational linguistics. It is an informal, one-day workshop with paper and poster presentations. The aim of CLiNE is to get to know the work of other labs nearby, to exchange ideas and to give graduate students the opportunity to present their work for peer discussion.
Enthusiasm at CLiNE 2002 was high, and even higher at CLiNE 2004. We are therefore going ahead with CLiNE 2005 and, for a bit of a change of scenery (and to give Sabine a break!), we are moving the event to Gatineau for 2005.
We invite you to submit papers to CLiNE 2005.
See our web page at http://www.crtl.ca/cline05/cline05_en.htm
There will be two tracks: full papers (up to 8 pages), to be given as 20-minute presentations, and posters/demos (up to 4 pages).
Papers and posters will be reviewed by a program committee.
Full papers should present previously unpublished material describing mature research.
Posters may cover work in progress.
Demo descriptions (max. 4 pages) should cover the theoretical background and possible applications.
Important dates:
Deadline for submission of papers and posters/demo: June 10, 2005
Deadline for notification of acceptance: July 8, 2005
Deadline for submission of final copy: August 10, 2005
Deadline for registration: August 10, 2005
CLiNE 2005 Workshop: August 26, 2005
Submissions should be sent by email to
George.Foster@nrc-cnrc.gc.ca
Registration fee: CAN$30.00 (registration details will be provided later)
Workshop organizer:
Caroline Barrière, NRC
Caroline.Barriere@nrc-cnrc.gc.ca
Program Committee:
George Foster, NRC (Program Committee chair)
Diana Inkpen, University of Ottawa
Lyne Da Sylva, Université de Montréal
Sabine Bergler, Concordia University
Local arrangements:
Caroline Barrière, NRC
Jean Quirion, Université du Québec en Outaouais

Seminar: TransType2: The final results
Elliott Macklovitch, RALI laboratory, University of Montreal
May 27, 2005
Abstract
TransType (the system) represents an important innovation in interactive machine translation in two respects: first, the interaction between the user and the system focuses on drafting the target text rather than on disambiguating the source text; and second, the predictions proposed by the system derive from a probabilistic MT engine, which makes it possible for the system to adapt to the user's input. TransType2 (the project) was an international collaborative research effort involving European and Canadian partners that sought to develop an advanced version of this interactive MT system. The project ended in Europe a few months ago. Two translation firms were part of the TT2 consortium and participated in quarterly user trials of successive versions of the system. In this presentation, I will discuss the results of the final rounds of these user trials and try to draw some more general conclusions about this approach to MT.
Biography
A linguist by training, Elliott Macklovitch has been actively involved in
machine translation since 1977, when he joined the TAUM group at the University
of Montreal. When the TAUM group was disbanded in 1981, he moved to the Canadian
Translation Bureau, where he worked for two years as a French-to-English
translator (invaluable experience for someone interested in translation
automation) before directing the evaluation of commercial MT systems for the
government. He subsequently became a member of the machine-aided translation
group at the CITI, an Industry Canada research lab, and was responsible there
for the translator's workstation project. In 1997, this group was transferred to
the University of Montreal, forming a new research lab known as the RALI (a
French acronym for Recherche appliquée en linguistique informatique). Mr.
Macklovitch has been the RALI's Co-ordinator since January 1999.
Author of many publications in the field, he served as President of the
Association for Machine Translation in the Americas (AMTA) from 2000 to 2004.
The NRC-IIT colloquium does not require advance registration, and attendance
is free of charge.
Open to the public

Seminar: The DiCo lexicon and its online version, DiCouèbe
Alain Polguère, Department of Linguistics and Translation, University of Montreal
May 6, 2005
Abstract
I will present the current status of the DiCo project, which aims to model the paradigmatic and syntagmatic lexical links of French. Four issues will be examined:
- Content and structure of the DiCo;
- Development methodology;
- DiCouèbe: an online interface to DiCo data; and
- Upcoming and potential uses and extensions.
DiCo is a project carried out as part of the work of the Observatoire de linguistique Sens-Texte (OLST) group at the Université de Montréal. This work covers a broad spectrum of research in formal linguistics, computerized lexicography, terminology, translation and linguistics for language training.
Biography
Alain Polguère has been a professor at the Linguistics
and Translation department of the Université de Montréal
since 1995. Before that, he worked in research and development
in natural language processing, after which he taught
lexicology, computational linguistics and general linguistics
at the department of English language and literature
at the National University of Singapore. His main research
activities span lexicology, lexicography, formal
semantics, natural language processing and linguistics
for language training. He is the current director of
the Observatoire de linguistique Sens-Texte (OLST) research
group.

The Localization Industry Standards Association (LISA) will be holding its
next conference in Boston, from May 23-27, 2005. The topic for this
conference is "Localization for the Next Millennium -- Managing Emerging
Opportunities and Challenges". For more information, please consult LISA's
website (http://www.lisa.org/events/2005boston/).
To encourage Canadian participation at this event, Industry Canada's
Language Industry Program (LIP) will be available to assist Canadian
translation and localization companies in attending
(http://strategis.ic.gc.ca/epic/internet/inlip-pil.nsf/en/Home).
AILIA will also be present (http://www.ailia.ca).
Here are a few important points you should keep in mind:
- We encourage you to send us your application as soon as possible,
since applications are processed immediately upon receipt. To accelerate the process,
please make sure that all the necessary information is included.
- If your request is accepted, please remember that LIP cannot
reimburse expenses retroactively (i.e., expenses which you have already
paid); you can make travel arrangements, but you can only pay the bill
once you have a dated and signed contribution agreement in hand.
- Finally, please remember that a company can only apply to LIP once
a year. So, if you want to take full advantage of this program, we
encourage you to include, in your application, a sufficient number of
activities so that you can obtain the maximum funding that we can give
to any one company annually, namely 50% of eligible expenses, up to $10,000.
If you have any other questions regarding LIP, please contact the program coordinator
directly at LIP-PIL@ic.gc.ca.

Seminar - Statistical machine translation with non-contiguous phrases
Michel Simard, Xerox Research Centre Europe
April 8, 2005
Abstract
I will present a phrase-based statistical machine translation method, based on non-contiguous phrases, i.e. phrases with "gaps".
I will propose a method for producing such phrases from word-aligned corpora, a statistical translation model that deals with such phrases, as well as a training method based on the maximization of translation accuracy, as measured with the NIST evaluation metric.
Translations are produced by means of a beam-search decoder, which I will briefly describe.
I will conclude with experimental results that demonstrate how the proposed method allows the system to generalize better from the training data.
Biography
Michel Simard is a postdoctoral researcher with the Machine Learning group at Xerox Research Centre Europe, Grenoble. He obtained his BSc in mathematics and computer science at the University of Montreal in 1986, his MSc in computer science at McGill University in 1990, and his PhD in computer science at the University of Montreal in 2003. In the past, he has worked as a researcher at the CITI (a research institute formerly part of Industry Canada) and at the University of Montreal's RALI laboratory. His research interests are in machine learning approaches to machine translation, machine-assisted translation, and other natural language processing tasks.

Symbolic beginning of construction on the LTRC building
To mark the beginning of construction on the Language Technologies Research Centre (LTRC) building, the media were invited to a press conference on January 20, 2005, followed by a ground-breaking ceremony.
Among the officials present were Benoît Pelletier, Minister for Canadian Intergovernmental Affairs and Native Affairs and the provincial member of parliament for Chapleau, Marcel Proulx, federal member of parliament for Hull-Aylmer, Roch Cholette, provincial member of parliament for Hull, and Yves Ducharme, mayor of Gatineau.
During the press conference, the rector of the Université du Québec en Outaouais (UQO), Francis R. Whyte, underlined the importance of this research centre for the University and officially introduced the group of architects, Fortin Corriveau Salvail / Menkès Shooner Dagenais LeTourneux, that would carry out this important infrastructure project on the University's campus.
The officials then proceeded outdoors for the traditional ground-breaking ceremony.
The LTRC is a collaboration between the UQO, the Translation
Bureau, and the National Research Council of Canada,
the main research participants, in addition to the following
partners: Industry Canada, the Government of Quebec's
Ministère du Développement économique, de l'Innovation
et de l'Exportation, Canada Economic Development, the
Language Industry Association (AILIA), and the Economic
Development Corporation of Gatineau (GEDC).


Seminar - Coveo's challenges relating to the multilingualism of content
Pascal Soucy and Frédérick Brault
December 3, 2004
Abstract
Content management is a rapidly expanding field, both in terms of research and the growing market. From the acquisition of content until its publication, content management focusses on organizing, indexing, classifying and structuring the content so that it can be stored, published and reused.
Coveo Solutions Inc., formerly Copernic Business Solutions, develops content management solutions for businesses, especially solutions relating to information retrieval.
During this presentation, we will outline the challenges presented by the multilingualism of content in the context of a commercial search engine for businesses, challenges often found in other content management applications. In particular, we will discuss the issue of support for certain Asian languages, such as Japanese, Chinese and Korean.
Biographies
After teaching computer science at the Cégep de Sainte-Foy from 1997 to 2001, Pascal Soucy was hired by Copernic where, while working on a search solution for businesses, he became particularly interested in ranking, automatic language identification, document encoding and automatic correction of queries. In 2002, he received his master's degree in computer science from Université Laval; his thesis dealt with the selection of attributes for text categorization. He then began work on his doctoral research project, which looks at the temporal qualities of attributes in learning models.
Frédérick Brault received his bachelor's and master's degrees in linguistics from Université Laval. At Laval, he also worked on the Théorie sur les Contraintes et Stratégies de Réparation [Theory of Constraints and Repair Strategies] (phonology) and the CoPho Project, and he was involved in evaluating and improving terminology retrieval technology at CIRAL. During his graduate studies, he studied the limits of trigrams in the automatic tagging of French. In 2001, he joined the Copernic team where, among other things, he was involved in developing technologies for automatic summarization, terminology retrieval, syntactic analysis and lemmatization. His major interests are syntax and phonology.

Seminar - A General Feature Space for Automatic Verb Classification
Eric Joanis
November 12, 2004
Abstract
This seminar will describe a general
feature space that was developed and implemented for
the automatic classification of verbs into lexical semantic
classes. Such a classification may be useful in identifying
systematic patterns in language and may prove to be
a useful tool for machine translation or parsing.
To measure a set of potentially useful linguistic characteristics
of verb classes, 224 statistical indicators were defined and their
values estimated from the British National Corpus; together they form
the general feature space (GFS). Using Support
Vector Machines (SVM), the GFS was tested with 11 two-way
and multi-way classification tasks of varying expected
difficulty.
Error rates were reduced by between 38% and 88% relative to the
baseline. Performance using the GFS was
similar to performance using manually selected features
for the same tasks. Because the classification structure was
analyzed only once and at a general level, there was no need to
manually select discriminating features for particular tasks, which
requires a significant investment of time by experts.
Follow-up experiments were performed to determine the
contribution of different feature types to overall performance.
Surprisingly, syntactic features, especially prepositional
features, played the largest role, while features depending
on a deeper linguistic analysis added little. This may
be an artifact of varying signal-to-noise ratios, a property
of Levin's (1993) classification, or a consequence of the
structure of English.
Experiments using subcategorization frames instead of
the GFS confirmed the predominant role of prepositions.
The results show that this approach is generally applicable
and avoids the need for resource-intensive linguistic
analysis for each new task. The machine learning system
successfully identifies and uses the information-rich
parts of the GFS and also gives us insight into the
structure of the language. As a result, this methodology
looks promising for other languages and classification
tasks.
Biography
Eric Joanis received his BMath in Computer Science
from the University of Waterloo in 1996. He then joined
Televitesse (a former Newbridge affiliate), where he
worked on the automatic categorization and topic segmentation
of television news clips using the closed-captioning
stream.
In 2002, he received his MSc in Computer Science from
the University of Toronto. His thesis provides the basis
of this seminar. Since then he has continued to work
as a research programmer and research assistant in computational
linguistics: he has worked on semi-supervised verb classification
at the University of Toronto and on the token-wise disambiguation
of verbs at the University of Geneva. He is currently
working on the German component of a multilingual parsing
project at the University of Geneva.

Seminar - Developing a Phrase-Based Statistical Machine Translation System: An Examination of the IWSLT Shared Translation Task
Philippe Langlais
October 19, 2004
Abstract
This presentation will describe how a Chinese-to-English
translation system was built for the International
Workshop on Spoken Language Translation (IWSLT)
2004 evaluation campaign. To begin, a short survey of the
literature on statistical approaches based on word
sequences (phrases) will be presented. This will be followed by
an outline of how a decent translation system was built
in just one month by using readily available tools.
Finally, an attempt will be made to define the limits
of such an undertaking, taking into account that the
results are to be analyzed at the IWSLT workshop.
Biography
Philippe Langlais is a professor at the Department
of Computer Science and Operations Research (DIRO) at
the University of Montreal in the field of computational
linguistics. He obtained a PhD from Université d'Avignon
in 1995, where he worked on speech recognition at LIA
(Laboratoire Informatique d'Avignon), after spending
three years with the speech technology group at IDIAP
(Institut Dalle Molle d'Intelligence Artificielle et
Perceptive) in Switzerland. From 1995 to 1997, Philippe
was a lecturer and researcher at Université d'Avignon
as well as co-ordinator of the AUPELF-UREF funded ARCADE
project, which was devoted to multilingual alignment.
The following year, he worked as an invited researcher
at the Centre for Speech Technology, part of the Department
of Speech, Music, and Hearing (TMH) of the Royal
Institute of Technology (KTH), in Stockholm. Philippe
joined RALI in 1998, where he works on statistical machine
translation.

Seminar - On German Parsing: Experiences with a Language which isn't English
Amit Dubey
September 28, 2004
Abstract
In this talk, I'll be discussing my dissertation work on
German statistical parsing. While statistical parsing is,
in general, well researched, much of the work on the topic
has primarily focussed on English. This raises an important
question: are the techniques that have been developed for
English useful for parsing, or simply useful for parsing English?
I will address this question by analyzing my results on parsing German. German syntax differs from that of English in two important ways: the order of words in a sentence is more variable, and the morphology is more productive. I show that two standard techniques developed for English, lexicalization (Collins, 1999; Charniak, 1997) and accounting for more tree structure (Johnson, 1998; Charniak, 2000; Klein & Manning, 2003), are not adequate for modelling the unique aspects of German syntax. In fact, these two standard techniques have less impact on overall parsing accuracy than an attribute-value-inspired approach specifically designed to account for word order and morphology.
Moreover, I found that standard parsing evaluation metrics
produced surprising results with the models I tested. Overall,
I found that it is beneficial to pay attention to language-specific
phenomena and that it is all too easy to overlook important
aspects of evaluation. These two findings have implications
for statistical parsers, which are being developed for an
increasing number of languages.
Biography
Amit Dubey completed his BMath in Co-op Computer Science from the University of Waterloo. He remained at Waterloo to complete his MMath, specializing in computational linguistics under Dr. Nick Cercone. He then moved to Saarland University in Germany to work on his doctorate, supervised by Dr. Matthew Crocker of Saarland University and Dr. Frank Keller of the University of Edinburgh.

Seminar - Multilingual Document Processing at XRCE
Pierre Isabelle
August 3, 2004
Abstract
Xerox has been doing research on NLP for more than 25 years. Since the creation of XRCE in 1993, the pace has been stepped up and the multilingual aspect is being strongly emphasized. Multilingual NLP is very challenging not only because one needs to cover many different and equally complex languages, but also because it involves building bridges between linguistic systems that are often far apart. XRCE is tackling this challenge using a layered approach. The bottom layer is made up of language-independent ‘lingware’ based on a set of core technologies such as the Xerox finite-state calculus. These tools are then used to develop language-specific (or language-pair-specific) reusable components (e.g. morphological analyzers, part-of-speech taggers, syntactic parsers, bilingual dictionaries). Finally, the resulting linguistic resources are deployed in various applications and for many different languages: information retrieval, document classification, terminology management, information extraction, document enrichment, translation aids and authoring aids.
Biography
Pierre Isabelle is currently managing the Content Analysis group of Xerox Research Centre Europe (Grenoble). He is also an Associate Professor in the computer science department of the Université de Montréal.
Pierre holds a Ph.D. in computational linguistics. He started his research career in 1975 as a member of the TAUM machine translation group at the Université de Montréal. Between 1985 and 1996, he was in charge of the machine-aided translation team of CITI, a research laboratory of the Canadian Department of Industry. In 1997 he returned to the Université de Montréal as head of the RALI laboratory of the computer science department, until he joined Xerox Research Centre Europe in 1999.
He is the author of numerous scientific publications in machine-aided translation and natural language processing. He is currently serving as the editor of the 'Squibs and Discussions' section of the Computational Linguistics journal and as a member of the editorial board of the Machine Translation journal. He organized several international scientific conferences, including COLING-ACL'98 and ACL-02, and he is a member of the International Committee on Computational Linguistics (ICCL).

The Language Technologies Research Centre (LTRC) Building
On May 20, 2004, the construction of a building to house the Language Technologies Research Centre (LTRC) was officially announced. The four-storey, 5400 m² structure (54 000 sq. ft.) will be built on the Université du Québec en Outaouais (UQO) campus in Gatineau, beside the Alexandre-Taché Building, which is located at 283 Alexandre-Taché Boulevard, Gatineau, Quebec.
When it is completed in 2006, the LTRC's building will be able to accommodate up to 150 researchers and experts in addition to all of the technical equipment necessary for research and development. Construction is expected to begin in early 2005.
This building project represents an investment of $15.2 million
in the LTRC. The Canadian government, through Canada
Economic Development for Quebec Regions, is contributing
$9.1 million, the Quebec provincial government's
Ministère du Développement économique, de l'Innovation
et de l'Exportation $5.75 million, and UQO and
other partners $350 000.

Seminar - Cross-Lingual Information Retrieval: an approach based on comparable corpora
Fatiha Sadat
July 5, 2004
Abstract
Expanded international collaboration, the increasing availability of electronic texts and resources in foreign languages, and the growing number of non-English-speaking users compel us to develop Cross-Lingual Information Retrieval (CLIR) tools capable of bridging the language barrier. CLIR bridges this gap by enabling a person to search in one language and retrieve documents in other languages.
Empirical work shows that the main hurdles in CLIR tasks are translation ambiguity arising from polysemy, the handling of compounds and phrases, the lack of lexical resources, and words missing from the bilingual dictionary used for translation. These problems have been acknowledged for many languages.
In this talk, I will focus on acquiring bilingual terminology from comparable corpora, which helps cross the language barrier in CLIR and also enriches existing bilingual lexicons. I will present a two-stage translation model based on comparable corpora and on morphological knowledge of the extracted pairs of source terms and target translation candidates. A CLIR case study is carried out using Japanese queries and a collection of English documents. Furthermore, a linear combination of translation models based on comparable corpora, bilingual dictionaries and transliteration is proposed.
Evaluations using different weighting schemes of the SMART retrieval system showed that combining different resources for query translation greatly improves the effectiveness of CLIR.
Biography
Fatiha Sadat is currently a visiting researcher under a JSPS postdoctoral fellowship at the National Institute of Informatics, Tokyo, Japan. She received a Doctor of Engineering degree from the Graduate School of Information Science, Nara Institute of Science and Technology (NAIST) in September 2003. She contributed to the MuchMore project as a visiting researcher at Xerox Research Centre Europe during the summer of 2001. Her broad research interests include multilingual and cross-lingual information retrieval, natural language processing, and multi-document summarization. Her technical articles have been published in numerous scholarly journals and conference proceedings. Dr. Sadat is a member of IEEE, ACL, ACM SIGMOD and IPSJ.
http://db-www.naist.jp/~fatia-s/

LTRC Language Studies Conference
The first-ever LTRC Language Studies Conference will be held as part of the UQO Open House on Wednesday, January 31, 2007, from 4:00 p.m. to 8:00 p.m.
The conference will take place at the Language Technologies Research Centre (LTRC), located at the Alexandre-Taché Pavilion, 283 Alexandre-Taché Boulevard, in the Hull sector of Gatineau.
You are invited to:
- meet the people in charge of the various language studies programs offered by UQO;
- discover the large variety of exhibitors within the language industry;
- attend talks on current language study topics; and
- visit the Language Technologies Research Centre (LTRC).
Presentations will take place at 5:00 p.m. and 6:30 p.m.:
"The future of translation: great career opportunities", presented by Donald Barabé, Vice President of the Translation Bureau. Room F-0129, LTRC.
Free parking
For more information: 819 595-3900, extension 3841 or 1 800-567-1283, extension 3841 or by email at questions@uqo.ca.
