Linguistic Analysis

Methods for automated linguistic analysis are the foundation of robust multilingual solutions that make it possible to cross language barriers. Tilde’s researchers have deep knowledge of developing written language technology for complex, highly inflected languages.

Tilde’s long-term research on written language processing and linguistic analysis for highly inflected languages has resulted in exceptional proofing tools and natural language processing technologies (e.g. morphological analyzers and taggers, syntactic parsers, and named entity recognizers) for Baltic languages.

Linguistic analysis for better language technology

The quality of basic linguistic analysis tools for written text processing plays a crucial role in the development of high-level, cutting-edge language technology solutions. Tilde's team of researchers is therefore constantly looking for novel methods to improve these tools. The methods used for linguistic analysis include knowledge-based, data-driven, and hybrid approaches. Recently, our researchers have started to investigate neural network models for three written text analysis tasks: syntactic analysis, assessment of grammaticality, and grammar correction.

Tilde’s research in syntactic analysis and grammar checking has been internationally acknowledged, receiving a best paper award (3rd place) in 2014.

PROJECTS

Ongoing projects

Quality Translation 21

The project aims to develop substantially improved statistical and machine-learning-based translation models for challenging languages and resource scenarios.

Open Data Incubator for Europe (ODINE)

As part of its ODINE incubator project, Tilde will gather, create, and contribute new multilingual open data sets for EU languages, enabling the language technology community to develop key services such as machine translation systems.

European Language Resource Coordination

The objective of the project is to identify and gather language and translation data relevant to public administration across all 30 European countries.

Completed Projects

CLARITY (FP5 project) – Cross-Language Information Retrieval and Organisation of Text and Audio Documents

The aim of the CLARITY project was to develop cross-lingual information retrieval (CLIR) techniques from English into Finnish, Swedish, Latvian, and Lithuanian, i.e. low-density languages with minimal translation resources, and to investigate techniques for organising and presenting documents in concept hierarchies and by document genres and filters. CLARITY was a fully fledged retrieval system that supported the user throughout the whole process of query formulation, text retrieval, and document browsing.

 

TTC (FP7 project) – Terminology Extraction, Translation Tools and Comparable Corpora

The TTC project aimed to leverage machine translation tools (MT tools), computer-assisted translation tools (CAT tools), and multilingual content management tools by automatically generating bilingual terminologies from comparable corpora in several European languages (i.e. English, French, German, and Latvian) as well as in Chinese and Russian. Terms in different languages are aligned based on the similarity of the words that appear next to them in the corpora (their immediate vicinity); this approach is known as lexical context analysis. The system generates candidate translations for single-word and multi-word terms. The approach relies on a one-to-one relation between terms and concepts.
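The idea of aligning terms by the words in their immediate vicinity can be illustrated with a minimal Python sketch. It is not the TTC implementation: the function names, the toy corpora, the window size, and the use of a small seed dictionary to make context words comparable across languages are all assumptions made for illustration, and only single-token terms are handled.

```python
from collections import Counter
from math import sqrt


def context_vector(term, sentences, window=2):
    """Count the words occurring within `window` tokens of `term`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == term:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return counts


def translate_vector(vec, seed_dict):
    """Map context words into the target language (assumed seed dictionary).

    Context words missing from the dictionary are dropped.
    """
    out = Counter()
    for word, n in vec.items():
        if word in seed_dict:
            out[seed_dict[word]] += n
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(term, src_sents, tgt_sents, tgt_terms, seed_dict, window=2):
    """Rank target-language candidate terms by context similarity to `term`."""
    src_vec = translate_vector(context_vector(term, src_sents, window), seed_dict)
    scored = [(cosine(src_vec, context_vector(c, tgt_sents, window)), c) for c in tgt_terms]
    return sorted(scored, reverse=True)


# Hypothetical toy corpora (comparable, not parallel) and seed dictionary.
EN = [["the", "turbine", "blade", "rotates", "fast"],
      ["a", "rotor", "blade", "rotates", "slowly"]]
FR = [["la", "pale", "tourne", "vite"],
      ["le", "moteur", "chauffe", "trop"]]
SEED = {"turbine": "turbine", "rotates": "tourne", "fast": "vite"}

ranked = rank_candidates("blade", EN, FR, ["pale", "moteur"], SEED)
print(ranked)  # highest-scoring candidate first
```

In this toy example "pale" ranks above "moteur" because its neighbours ("tourne", "vite") overlap with the translated neighbours of "blade". A real system would need much larger corpora, association weighting rather than raw counts, and handling of multi-word terms.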
