Make your solutions multilingual with our language technology services


Language technology services are the foundation of robust multilingual solutions that enable businesses and governments to reach across language barriers. Tilde offers language technology services in several areas: machine translation, terminology, proofreading tools, speech technology, and linguistic tools. These services are available on our cloud platform, where developers can use them through our APIs to build new multilingual solutions that support languages in the digital age.

Machine translation services

With Tilde’s Translation API, users can access machine translation (MT) systems for multiple language pairs and domains. The MT systems are hosted in the cloud and can be integrated into any platform or application.


  • Domains
    • General
    • Legal
    • Pharmaceutical 
    • IT
  • Languages
    • English to Bulgarian, Czech, Danish, Dutch, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Polish, Portuguese, Romanian, Russian, Slovenian, Spanish, Swedish
  • Documents
    • Currently the system supports these file formats: DOC, DOCX, XLSX, PPTX, ODT, ODP, ODS, HTML, HTM, XHTML, XHT, TXT, TMX, XLIFF, SDLXLIFF, and TTX.
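
As a small illustration of the format list above, a client application could check a file's extension before submitting the file for translation. This is a minimal sketch, assuming extension-based checking is enough on the client side; SUPPORTED_EXTENSIONS and is_supported_document are hypothetical names, not part of Tilde's Translation API, and the service itself remains the authority on which files it accepts.

```python
from pathlib import Path

# File formats listed above as supported for document translation.
# Hypothetical client-side helper, not part of Tilde's Translation API.
SUPPORTED_EXTENSIONS = {
    "doc", "docx", "xlsx", "pptx", "odt", "odp", "ods",
    "html", "htm", "xhtml", "xht", "txt", "tmx",
    "xliff", "sdlxliff", "ttx",
}


def is_supported_document(filename: str) -> bool:
    """Return True if the file extension (case-insensitive) is in the supported list."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["contract.DOCX", "memo.sdlxliff", "scan.pdf", "README"]:
        print(f"{name}: {is_supported_document(name)}")
```

Running the script prints True for contract.DOCX and memo.sdlxliff and False for scan.pdf (PDF is not in the list for document translation) and README (no extension).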

Terminology services

Tilde’s online terminology services ensure clear, consistent communication with customers across the globe. With the Terminology API, Tilde provides services that keep terminology organized by identifying terms in documents, finding relevant translations, and assembling term glossaries. These services can be used to build comprehensive terminology solutions.


  • Term identification
    • Identifies terminology in documents and sentences using state-of-the-art term extraction methods that combine linguistic analysis, statistical measures, and reference corpora. Supported document formats: PDF, Microsoft Word, Microsoft Excel, Microsoft PowerPoint, plain text (.txt), Rich Text (.rtf), XLIFF, HTML, XML, and MIF.

  • Term extraction
    • Extracts the identified terms from documents and assembles them into term glossaries.

  • Term lookup
    • This service looks up identified terms in existing terminology resources. Candidate translation equivalents can then be acquired from parallel and comparable corpora collected from the web, using:

      • Statistical Data Base (SDB): a large offline resource of automatically extracted multilingual terminology, refined by Tilde Terminology users whenever they validate translation equivalents
      • User-directed online terminology extraction from parallel and comparable data sources found on the web

  • Text enrichment with term translation equivalents
    • Identified terms in texts are enriched with translation equivalents acquired from terminology resources and databases. Supported text formats include HTML, DOCX, and others.
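
The term extraction, glossary assembly, and text enrichment steps can be illustrated with a deliberately simple sketch. It swaps in a plain frequency-based n-gram extractor with a stopword filter, which is much cruder than the linguistically and statistically motivated methods described above, and it uses a small dictionary in place of real terminology resources. The stopword list, the function names candidate_terms, build_glossary, and enrich, and the Latvian equivalents in the usage example are illustrative assumptions; the word pattern only handles English letters.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical English stopword list: candidates may not start or end with one of these.
STOPWORDS = {"the", "a", "an", "of", "and", "or", "in", "on", "to", "for", "is", "are", "with", "by"}


def candidate_terms(text: str, max_len: int = 3) -> Counter:
    """Count word n-grams (1 to max_len words) that neither start nor end with a stopword."""
    words = re.findall(r"[a-z][a-z-]*", text.lower())
    counts: Counter = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts


def build_glossary(text: str, min_freq: int = 2) -> list[str]:
    """Keep candidates that occur at least min_freq times, most frequent first."""
    return [term for term, freq in candidate_terms(text).most_common() if freq >= min_freq]


def enrich(text: str, equivalents: dict[str, str]) -> str:
    """Insert '[equivalent]' after each known term found in the text."""
    if not equivalents:
        return text
    # A single alternation, longest terms first, so "machine translation" wins over "translation".
    terms = sorted(equivalents, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    lookup = {term.lower(): target for term, target in equivalents.items()}
    return pattern.sub(lambda m: f"{m.group(0)} [{lookup[m.group(0).lower()]}]", text)


if __name__ == "__main__":
    sample = ("Machine translation quality depends on terminology. "
              "Terminology databases help machine translation.")
    print(build_glossary(sample))
    print(enrich("Machine translation needs terminology.",
                 {"machine translation": "mašīntulkošana", "terminology": "terminoloģija"}))
    # prints "Machine translation [mašīntulkošana] needs terminology [terminoloģija]."
```

For the sample text, build_glossary keeps "machine translation" together with its component words and "terminology", because each occurs twice; a real system would rank and filter such candidates much further.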

Linguistic tools

With Tilde’s Linguistic tool API, users can access components for the linguistic processing of text data. The Linguistic tool API provides functionality for the following tasks: text tokenization, sentence breaking, morphological analysis and synthesis, part-of-speech tagging (and, for morphologically rich languages, morpho-syntactic tagging), and language detection.


  • Tokenization
    • Domains: The component is domain independent.

      Languages: English, Estonian, Latvian and Lithuanian.

      Format: The Tokenization component works with plaintext data encoded in UTF-8.

      The Tokenization component breaks text down into tokens, the smallest elements of linguistic analysis. A token can be a word, a punctuation mark, a code, an e-mail address, a web address, a decimal number, etc. Tokenization is used in almost every natural language processing task.

  • Sentence breaking
    • Domains: The component is domain independent.

      Languages: English, Estonian, Latvian and Lithuanian.

      Format: The Sentence breaking component works with plaintext data encoded in UTF-8.

      The Sentence breaking component splits text into sentences. It recognizes language-specific sentence breaking characteristics, such as abbreviations and the special use of punctuation in numerals. Sentence breaking is used in many natural language processing tasks (e.g., machine translation, part-of-speech tagging, named entity recognition, and sentiment analysis).

  • Morphological analysis and lemmatization
    • Domains: The component is domain independent.

      Languages: English, French, German, Latvian and Lithuanian.

      Format: The Morphological analysis component works with plaintext data, pre-tokenized data, or individual tokens encoded in UTF-8.

      The Morphological analysis component performs fast linguistic analysis of words by identifying their possible morphological characteristics: possible parts of speech, possible lemmas, and, for each lemma, its morphological categories (e.g., gender, number, and case for nouns). The component is built on finite-state transducer technology, which ensures high processing speed. Morphological analysis is used in many natural language processing tasks (e.g., part-of-speech tagging, syntactic parsing, and grammar checking).

  • Morphological synthesis
    • Domains: The component is domain independent.

      Languages: English, French, German, Latvian and Lithuanian.

      Format: The Morphological synthesis component works with individual tokens encoded in UTF-8.

      The Morphological synthesis component generates all valid surface (inflected) forms of a word, given its lemma and part of speech. The component is built on finite-state transducer technology, which ensures high processing speed. Morphological synthesis is used in many natural language processing tasks (e.g., machine translation, term and named entity normalization, natural language generation, and dialogue systems).

  • Part-of-speech and morpho-syntactic tagging
    • Domains: The component is domain independent.

      Languages: Latvian and Lithuanian.

      Format: The Part-of-speech and morpho-syntactic tagging component works with plaintext data, pre-tokenized data (already split into sentences), or morphologically pre-analyzed data encoded in UTF-8.

      The Part-of-speech and morpho-syntactic tagging component uses machine learning models to disambiguate the morphology of words in context (i.e., in plaintext, pre-tokenized, or morphologically pre-analyzed data). For morphologically rich languages such as Latvian and Lithuanian, it can also perform finer-grained morpho-syntactic disambiguation. Part-of-speech and morpho-syntactic tagging is used in many natural language processing tasks (e.g., term extraction, named entity recognition, machine translation, and sentiment analysis).

  • Language detection
    • Domains: The component is domain independent.

      Format: The Language detection component works with plaintext data encoded in UTF-8.

      The Language detection component identifies the most likely language in which a text is written. Language detection is used in many natural language processing workflows (e.g., machine translation platforms, terminology management platforms, and search and indexing platforms).
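
The tokenization and sentence breaking components can be approximated, for illustration only, with a small regular-expression sketch. It is a toy, not Tilde's component: the token patterns and the abbreviation list are assumptions, and real sentence breakers rely on much larger language-specific rules. Known abbreviations such as "e.g." are kept as single tokens, so their periods never end a sentence; Python's \w class is Unicode-aware, so Latvian, Lithuanian, and Estonian letters are treated as word characters.

```python
from __future__ import annotations

import re

# Hypothetical abbreviation list; a real sentence breaker uses large, language-specific lists.
ABBREVIATIONS = ["e.g.", "i.e.", "prof.", "dr.", "mr."]

# Alternatives are tried left to right, so specific token types come before generic words.
TOKEN_RE = re.compile(
    "|".join([
        r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",                          # e-mail address
        r"https?://\S*[^\s.,!?;:]",                                 # web address without trailing punctuation
        r"\d+(?:[.,]\d+)+",                                         # decimal number such as 3.5 or 3,5
        r"\b(?:" + "|".join(map(re.escape, ABBREVIATIONS)) + r")",  # known abbreviation, period included
        r"\w+(?:[-']\w+)*",                                         # word, possibly hyphenated
        r"[^\w\s]",                                                 # any other single punctuation mark
    ]),
    re.IGNORECASE,
)

SENTENCE_END = {".", "!", "?"}


def tokenize(text: str) -> list[str]:
    """Split text into tokens using the ordered patterns above."""
    return TOKEN_RE.findall(text)


def split_sentences(text: str) -> list[list[str]]:
    """Group tokens into sentences; a standalone '.', '!' or '?' token ends a sentence."""
    sentences: list[list[str]] = []
    current: list[str] = []
    for token in tokenize(text):
        current.append(token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    text = ("Dr. Smith uses MT, e.g. for web pages. Contact info@example.com "
            "or see https://example.com! The price is 3.5 EUR.")
    for sentence in split_sentences(text):
        print(sentence)
```

For the sample text, the sketch yields three sentences and keeps "Dr.", "e.g.", the e-mail address, the web address, and "3.5" as single tokens.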
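
Morphological analysis and synthesis are two directions of the same mapping between lemmas with grammatical tags and surface forms. The toy sketch below shows both for a single Latvian noun paradigm, 4th-declension feminine nouns ending in -a such as "valoda" (language), using a hand-written suffix table instead of the finite-state transducers the components above are built on. The suffix table (instrumental and vocative omitted), the two-word lexicon, and the function names are assumptions for illustration. It also shows why analysis returns several readings: "valodas" is genitive singular as well as nominative and accusative plural.

```python
from __future__ import annotations

# Hypothetical suffix table for one Latvian paradigm: 4th-declension feminine
# nouns ending in -a. Keys are (number, case); instrumental and vocative are omitted.
DECL4_SUFFIXES = {
    ("Sing", "Nom"): "a", ("Sing", "Gen"): "as", ("Sing", "Dat"): "ai",
    ("Sing", "Acc"): "u", ("Sing", "Loc"): "ā",
    ("Plur", "Nom"): "as", ("Plur", "Gen"): "u", ("Plur", "Dat"): "ām",
    ("Plur", "Acc"): "as", ("Plur", "Loc"): "ās",
}

# Tiny hypothetical lexicon: lemma -> paradigm name.
LEXICON = {"valoda": "decl4", "lapa": "decl4"}


def synthesize(lemma: str) -> dict[tuple[str, str], str]:
    """Generate every (number, case) surface form of a known lemma."""
    if LEXICON.get(lemma) != "decl4":
        raise KeyError(f"lemma not in lexicon: {lemma}")
    stem = lemma[:-1]  # drop the -a ending of the nominative singular
    return {tags: stem + suffix for tags, suffix in DECL4_SUFFIXES.items()}


def analyze(form: str) -> list[tuple[str, str, str]]:
    """Return every (lemma, number, case) reading of a surface form; forms may be ambiguous."""
    readings = []
    for (number, case), suffix in DECL4_SUFFIXES.items():
        if form.endswith(suffix):
            lemma = form[: len(form) - len(suffix)] + "a"
            if LEXICON.get(lemma) == "decl4":
                readings.append((lemma, number, case))
    return readings


if __name__ == "__main__":
    print(synthesize("valoda"))
    print(analyze("valodas"))  # three readings: Sing Gen, Plur Nom, Plur Acc
```

Because the mapping is a table in both directions, adding a paradigm means adding one suffix table; an FST-based implementation compiles the same kind of information into a single automaton that runs in both directions.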
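
Language detection is often implemented by comparing character n-gram statistics of the input with per-language profiles. The sketch below uses that generic technique, character trigrams compared by cosine similarity; it is not necessarily how Tilde's component works. The three one-sentence samples are hypothetical stand-ins for the large training corpora a usable detector needs, so the sketch only distinguishes English, Latvian, and Lithuanian inputs that resemble its samples.

```python
from __future__ import annotations

import math
from collections import Counter

# Hypothetical one-sentence training samples; a usable detector needs far more text per language.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the translation of this text is in english",
    "lv": "šis teksts ir rakstīts latviešu valodā un tulkošana ir svarīga",
    "lt": "šis tekstas parašytas lietuvių kalba ir vertimas yra svarbus",
}


def trigram_profile(text: str) -> Counter:
    """Count character trigrams of the lowercased, whitespace-normalized, space-padded text."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


PROFILES = {lang: trigram_profile(sample) for lang, sample in SAMPLES.items()}


def detect(text: str, profiles: dict[str, Counter] = PROFILES) -> str:
    """Return the language code whose profile is most similar to the input text."""
    target = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine(target, profiles[lang]))


if __name__ == "__main__":
    for text in ["the translation of the text", "tulkošana latviešu valodā", "vertimas lietuvių kalba"]:
        print(text, "->", detect(text))
```

With these samples, detect returns "en", "lv", and "lt" for the three inputs shown; very short or mixed-language inputs would need much larger profiles and, in practice, a confidence threshold.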