Tilde has extensive experience coordinating and managing large-scale European research and development projects financed by the European Commission. Its role in EU-funded projects ranges from content provider to research partner to project coordinator. Tilde has acted as coordinator in five of the research and development projects co-financed by the European Commission in which it has participated.
The KPLEX Project, funded under the European Commission's Horizon 2020 research programme, is undertaking a 15-month investigation of the ways in which a focus on 'big data' in ICT research elides important issues about the information environment we live in.
For its part in the project, Tilde will examine one of the greatest challenges for Big Data: the analysis and processing of multilingual content in unstructured texts. Tilde will analyze the current coverage of language technologies, paying particular attention to the state of development of technologies and tools for EU languages. Research will also be conducted through an analysis of policy documents that address the development of language technology in Europe.
Overseen by Trinity College Dublin, the project's consortium also includes the Data Archiving and Networked Services, Freie Universität Berlin, and Tilde. The project will help the consortium to formulate clear recommendations for the European Commission, allowing policymakers to draft more comprehensive ICT work packages in the future that bridge the current gaps in language technology coverage.
Open Data Incubator for Europe (ODINE). The Open Data Incubator for Europe (ODINE) is a 6-month incubator for open data entrepreneurs across Europe. The programme is funded with a €7.8m grant from the EU’s Horizon 2020 programme. ODINE aims to support the next generation of digital businesses and help them fast-track the development of their products.
As part of its ODINE incubator project, Tilde will gather, create, and contribute new Multilingual Open Data sets for EU languages, which enable the language technology community to develop key services such as machine translation systems. The project’s innovation is its commitment to contributing a much-needed open data resource to the language technology community, helping to rectify the scarcity of multilingual open data.
Thanks to Tilde's efforts in the ODINE incubator, language technology companies will be given access to new multilingual corpora, which they can use to develop solutions and satisfy the growing demand for localization services, helping to overcome language barriers in the Digital Single Market.
FREME (H2020 project) – Open Framework of e-Services for Multilingual and Semantic Enrichment of Digital Content. FREME addresses the general systemic and technological challenges of validating that multilingual and semantic technologies are ready for integration into real-life business cases in innovative ways. These technologies are capable of processing (harvesting and analysing) content, capturing datasets, and adding value throughout content and data value chains across sectors, countries, and languages. Project website
QT21 (H2020 project) – Quality Translation 21. Many of the languages not supported by our current technologies show common traits: they are morphologically complex, with free and diverse word order. Often there are not enough training resources and/or processing tools. Together, these factors result in drastic drops in translation quality. The combined challenges of linguistic phenomena and resource scenarios have created a large and under-explored grey area in the language technology map of European languages. Combining support from key stakeholders, QT21 addresses this grey area by developing:
- substantially improved statistical and machine-learning based translation models for challenging languages and resource scenarios;
- improved evaluation and continuous learning from mistakes, guided by a systematic analysis of quality barriers, informed by human translators;
- all with a strong focus on scalability, to ensure that learning and decoding with these models is efficient and that reliance on data (annotated or not) is minimised. Project website
European Language Resource Coordination (ELRC). The European Commission launched the comprehensive European Language Resource Coordination (ELRC) effort in April of 2015. The objective is to identify and gather language and translation data relevant to national public services, administrations, and governmental institutions across all 30 European countries participating in the CEF programme. All data resources gathered in the initiative will be provided exclusively to the European Commission for use in the CEF Automated Translation platform. The Automated Translation platform will power Europe's public online services such as Europeana, the Open Data Portal, and the Online Dispute Resolution platform. The platform will help break down language barriers between people and nations in 21st century Europe. Project website
ACCURAT (FP7 project) – Analysis and evaluation of Comparable Corpora for Under Resourced Areas of machine Translation. The aim of the ACCURAT project was to research methods and techniques to overcome one of the central problems of machine translation (MT): the lack of linguistic resources for under-resourced areas of machine translation. The main goal was to find, analyze and evaluate novel methods that exploit comparable corpora in order to compensate for the shortage of linguistic resources, and ultimately to significantly improve MT quality for under-resourced languages and narrow domains. Project website
- Multilingual technology allowing users to search the data in their native language;
- Multimodal technology allowing them to access the portal not just in written but also in spoken communication. Project website
LetsMT! (ICT PSP project) – Platform for Online Sharing of Training Data and Building User Tailored Machine Translation. To fully exploit the huge potential of existing open SMT technologies, the project proposed to build an innovative online collaborative platform for data sharing and MT building. This platform supports the upload of public as well as proprietary MT training data and the building of multiple MT systems, public or proprietary, by combining and prioritizing this data. Project website
MATT (EUREKA project) – Web-based Multilingual Automated Terminology Translation system. The goal of the MATT project was to develop a new web-based translation system for automated translation of multilingual terminology that bridges the gap between traditional local (desktop) translation tools and terminology data on the Internet. This translation technology is meant both for professional translators using specialised translation environments (for example, SDL Trados, Wordfast, Kilgray MemoQ) and for various experts and other users requiring easy access to high-quality term resources from standard office environments (Microsoft Word, Microsoft PowerPoint, OpenOffice Writer, etc.). The platform for multilingual terminology translation is also made available to machine translation technologies.
META-NORD (ICT PSP project) – Baltic and Nordic Parts of the European Open Linguistic Infrastructure. The META-NORD project aimed to establish an open linguistic infrastructure in the Baltic and Nordic countries to serve the needs of the industry and research communities. The project focused on 8 European languages (Danish, Estonian, Finnish, Icelandic, Latvian, Lithuanian, Norwegian and Swedish), each of which has fewer than 10 million speakers. The project assembled language resources of different types, linked them across languages, and made them widely available to user communities in academia and industry for creating products and applications that facilitate linguistic diversity in the EU. Project website
MIAUCE (FP6 project) – Multi modal Interaction Analysis and exploration of Users within a Controlled Environment. The project aimed to investigate and develop techniques to analyse the multi-modal behaviour of users within the context of real applications. The multi-modal behaviour takes the form of eye gaze/fixation, eye blinks and body movement. The techniques were developed and validated within the context of three different application domains: Security, Customized marketing, and Interactive web TV.
MLi (FP7 project) – Towards a MultiLingual Data & Services Infrastructure. The MLi Support Action is working to deliver the strategic vision and operational specifications needed for building a comprehensive European MultiLingual data & services Infrastructure, along with a multiannual plan for its development and deployment, and to foster multi-stakeholder alliances that ensure its long-term sustainability. Project website
SAFE (EUROSTARS project) – Social Analytics for Financial Engineering. The project result is a web-based news service presenting real-time social sentiment about a set of financial products. The news is drawn from multilingual (Latvian, Swedish, German, Dutch, Polish, French) social media sources (blogs, feeds). Tilde ensured multilingual social media translation for social sentiment analysis using mature SMT (statistical machine translation) systems specially adapted for social networks and the financial domain. The news feed will be available as a free version listing the sentiment only, and as a paid subscription-based feed offering added services (links to the originating news message, personalization and archive functionality). Project description
SEMO. The retrieval of metadata from various documents and their conversion into another format is one of the most significant problems faced by document processing systems. The goal of the SEMO project was to develop a novel intelligent technology that retrieves metadata from documents in both paper and electronic format, regardless of their type, structure and language. With the successful implementation of the project, a universal technology was created that is suitable for use in various document processing systems.
SOLIM (EUROSTARS project) – Spatial Ontology Language for multimedia Information Modelling. The objective of the SOLIM project was to improve context-aware information analysis by extending state-of-the-art ontology languages, and their support for automated reasoning, with a spatial dimension. This enables semantic systems to venture beyond a static world and add the concepts of space and change.
TaaS (FP7 project) – Terminology as a Service. The TaaS project addressed the need for instant access to the most up-to-date terms, user participation in the acquisition and sharing of multilingual terminological data, and efficient solutions for terminology resources reuse. The developed cloud-based TaaS platform provides the following online core terminology services:
- Automatic extraction of monolingual term candidates from user-uploaded documents using state-of-the-art terminology extraction techniques
- Automatic recognition of translation equivalents for the extracted terms in user-defined target language(s) from different public and industry terminology databases
- Automatic acquisition of translation equivalents for terms not found in term banks from parallel/comparable web data using state-of-the-art terminology extraction and bilingual terminology alignment methods (MS2: Prototype bilingual term extraction system/M12)
- Facilities for cleaning up (i.e., revising: editing, deleting) of automatically acquired terminology by users
- Facilities for terminology sharing and reusing: APIs and export tools for sharing resulting terminological data with major term banks and reuse in different user applications (MS3: TaaS platform and integrated core services). Project website
TRIPOD (FP6 project) – TRI-Partite multimedia Object Description. The Tripod project aimed to automatically build rich, multi-faceted text and semantic descriptions of the landscape and permanent man-made features pictured in a photograph, and to create a more advanced image search engine. Tripod augmented images with spatial data to compute contextual information about the location and features of the actual landscape pictured. Using 3D models, buildings and landscape features contained in the image are identified and located within the picture. Techniques from Web search and text summarisation were applied to automatically create textual descriptions of the photographs, producing rich, readable and multifaceted captions that go beyond mere location to encompass culturally encoded notions, such as socially connoted place terms like suburb, west end, etc.
TTC (FP7 project) – Terminology Extraction, Translation Tools and Comparable Corpora. The TTC project aimed at leveraging machine translation tools (MT tools), computer-assisted translation tools (CAT tools) and multilingual content management tools by automatically generating bilingual terminologies from comparable corpora in several European languages (i.e. English, French, German and Latvian) as well as in Chinese and Russian. Terms in different languages are aligned based on the similarity of the words that appear next to them in the corpora (their immediate vicinity); this approach is known as lexical context analysis. The system generates candidate translations for single- or multi-word terms. The approach relies on the one-to-one relation between terms and concepts.
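The idea behind lexical context analysis can be illustrated with a minimal Python sketch. It is not the TTC implementation: the window size, the small seed dictionary used to map source-language context words into the target language, the cosine similarity measure, and all function names and toy sentences are assumptions made purely for illustration.

```python
import math
from collections import Counter


def context_vector(term, sentences, window=2):
    """Count the words found within `window` tokens of each occurrence of `term`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == term:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts


def translate_vector(vec, seed_dict):
    """Map context words into the target language; words not in the dictionary are dropped."""
    out = Counter()
    for word, n in vec.items():
        if word in seed_dict:
            out[seed_dict[word]] += n
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(n * b[w] for w, n in a.items() if w in b)
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(term, src_sents, tgt_sents, candidates, seed_dict, window=2):
    """Rank target-language candidates by context similarity to the source term."""
    src_vec = translate_vector(context_vector(term, src_sents, window), seed_dict)
    scored = [(cosine(src_vec, context_vector(c, tgt_sents, window)), c)
              for c in candidates]
    return sorted(scored, reverse=True)


# Toy comparable corpora (hypothetical data, already tokenized).
en_sents = [["the", "rotor", "blade", "turns"],
            ["each", "blade", "captures", "wind"]]
fr_sents = [["la", "pale", "du", "rotor", "tourne"],
            ["chaque", "pale", "capte", "le", "vent"],
            ["la", "maison", "du", "village"]]
# Hypothetical seed dictionary of general-language words.
seed = {"the": "la", "rotor": "rotor", "turns": "tourne",
        "each": "chaque", "captures": "capte", "wind": "vent"}

ranked = rank_candidates("blade", en_sents, fr_sents, ["pale", "maison"], seed)
best_candidate = ranked[0][1]
```

In this toy setup, "pale" shares more translated context words with "blade" than "maison" does, so it is ranked first. A realistic system would need much larger corpora, association weighting of the context counts, and handling of multi-word terms.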