Statistical Multi-Lingual Analysis for Retrieval and Translation (SMART)


More than half of EU citizens are unable to hold a conversation in a language other than their mother tongue, let alone conduct a negotiation or interpret a law. At a time when communication technologies are widely available, language barriers are a serious bottleneck to European integration and to economic and cultural exchange in general. More effective tools to overcome such barriers, in the form of software for machine translation and other cross-lingual textual information access tasks, are in strong demand.

Statistical methods are promising, in that they achieve performance equivalent or superior to that of rule-based systems at a fraction of the development effort. There are, however, some identified shortcomings in these methods that prevent their broad diffusion. As a first example, even though lexical choice is usually more accurate with Statistical Machine Translation (SMT) systems than with their rule-based counterparts, the text they produce tends to be less fluent. As a second example, SMT systems are trained in batch mode and do not adapt by taking user feedback into account. Finally, in Cross-Language Information Retrieval tasks, query words are most often translated independently of one another, thus giving up possibly relevant contextual clues.

SMART is an attempt to address these and other shortcomings by the methods of modern Statistical Learning. The scientific focus is on developing new and more effective statistical approaches while ensuring that existing know-how is duly taken into account. By bringing together leading research institutions in Statistical Learning, Machine Translation and Textual Information Access, the SMART consortium is well positioned to achieve this goal.

Thorough field evaluation on three user scenarios, involving user groups from innovation-oriented SMEs, together with extensive exploitation and dissemination activities, will ensure that advances make their way out of the laboratories, both as significant, measurable improvements over existing technologies and as new applications currently beyond the state of the art.

  • The first user scenario focuses on the work of professional translators and aims at validating new technologies by assessing their impact on productivity.
  • The second scenario considers the work of a technician providing support to customers over the phone, holding a conversation in a language different from that of the technical documentation available to him/her.
  • The third user scenario, finally, consists of enabling a user to access portions of the multilingual Wikipedia in languages of which (s)he has limited command.

SMART is a 3-year "Specific Targeted Research Project" (STReP) funded by the European Commission through its "Information Society Technologies" (IST) priority, as part of the Sixth Framework Programme. It started on October 1, 2006 and is coordinated by Nicola Cancedda at Xerox Research Centre Europe.

Homepage: http://www.smart-project.eu/
Type: Normal Research Project
Research Group: Information: Signals, Images, Systems Research Group
Themes: Machine Translation, Machine Learning
Dates: 1st October 2006 to 30th September 2009

Partners

  • Xerox Research Centre Europe
  • Amebis
  • Celer Soluciones
  • Jozef Stefan Institute
  • National Research Council Canada
  • University of Bristol
  • University of Helsinki
  • Università degli Studi di Milano
  • University College London

Funding

  • EU

Principal Investigators

  • [hidden]

Other Investigators

  • ss03v
  • yn05r
URI: http://id.ecs.soton.ac.uk/project/441
RDF: http://rdf.ecs.soton.ac.uk/project/441

More information


Associated Publications

Number of items: 3.

Ni, Y., Niranjan, M., Saunders, C. and Szedmak, S. (2010) Distance phrase reordering for MOSES - User Manual and Code Guide. Technical Report, ISIS Group, School of Electronics and Computer Science, University of Southampton.

Ni, Y., Saunders, C., Szedmak, S. and Niranjan, M. (2009) Handling phrase reordering for machine translation. In: the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing, August 2009, Singapore.

Ni, Y., Saunders, C., Szedmak, S. and Niranjan, M. (2009) Structure learning for natural language processing. In: IEEE Workshops on Machine Learning for Signal Processing, Sep 2 - Sep 4, 2009, Grenoble, France.
