Different tools for different languages
Supported languages fall into four categories, depending on the language-specific tools and resources available for each:
A level languages (English and Latvian) have the highest level of support in the Tilde Terminology platform for term tagging, term normalisation, and term translation equivalent look-up in the TaaS Statistical Data Base (SDB). The following linguistic tools are available for A level languages:
- Part-of-speech (or morpho-syntactic) taggers trained on high-quality, human-annotated training data
- Lemmatisers, which enable better statistical analysis for term candidate extraction
- Morphological analysers and synthesisers, which are required for term normalisation
- Rule-based term normalisers, which reduce redundancy in the extracted term candidate lists
B level languages (Dutch, Estonian, French, German, Hungarian, Italian, Lithuanian and Spanish) have the highest level of support in the Tilde Terminology platform for term tagging; however, they do not have a term normalisation tool. Term translation equivalent look-up in the TaaS SDB is therefore performed using lemmatised term forms instead of normalised term forms. The following linguistic tools are available for B level languages:
- Part-of-speech (or morpho-syntactic) taggers trained on high-quality, human-annotated training data
- Lemmatisers, which enable better statistical analysis for term candidate extraction and provide basic support for redundancy reduction in the extracted term candidate lists
C level languages (Bulgarian, Croatian, Czech, Danish, Finnish, Greek, Maltese, Polish, Portuguese, Romanian, Russian, Slovak, Slovene and Swedish) have basic support in the Tilde Terminology platform for term tagging. They do not have a term normalisation tool, and term translation equivalent look-up in the TaaS SDB is performed using only term surface forms (the forms found in contexts) instead of normalised forms. The following linguistic tools are available for C level languages:
D level languages (Irish and Turkish) have no linguistic tool support in the Tilde Terminology platform for term tagging. They do not have a term normalisation tool, and term translation equivalent look-up in the TaaS SDB is performed using only term surface forms (the forms found in contexts) instead of normalised forms. Term candidate extraction for these languages is based on language-independent methods.
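
The four categories can be summarised as a small look-up table. The Python sketch below is only an illustration of the classification described above; it is not part of the Tilde Terminology platform or the TaaS API, and the names `SupportLevel`, `LANGUAGES_BY_LEVEL`, `LOOKUP_FORM`, `support_level` and `lookup_form` are hypothetical.

```python
from enum import Enum


class SupportLevel(Enum):
    """Hypothetical labels for the four support categories."""
    A = "A"
    B = "B"
    C = "C"
    D = "D"


# Languages per category, copied from the descriptions above.
LANGUAGES_BY_LEVEL = {
    SupportLevel.A: ["English", "Latvian"],
    SupportLevel.B: [
        "Dutch", "Estonian", "French", "German",
        "Hungarian", "Italian", "Lithuanian", "Spanish",
    ],
    SupportLevel.C: [
        "Bulgarian", "Croatian", "Czech", "Danish", "Finnish",
        "Greek", "Maltese", "Polish", "Portuguese", "Romanian",
        "Russian", "Slovak", "Slovene", "Swedish",
    ],
    SupportLevel.D: ["Irish", "Turkish"],
}

# Term form used for translation equivalent look-up in the TaaS SDB.
LOOKUP_FORM = {
    SupportLevel.A: "normalised",
    SupportLevel.B: "lemmatised",
    SupportLevel.C: "surface",
    SupportLevel.D: "surface",
}


def support_level(language: str) -> SupportLevel:
    """Return the support category of a language name."""
    for level, languages in LANGUAGES_BY_LEVEL.items():
        if language in languages:
            return level
    raise ValueError(f"Unsupported language: {language}")


def lookup_form(language: str) -> str:
    """Return which term form is used for SDB look-up for a language."""
    return LOOKUP_FORM[support_level(language)]
```

For example, `lookup_form("German")` returns `"lemmatised"`, because German is a B level language with a lemmatiser but no term normaliser.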