Integrated Solutions API

Language technology services are the foundation of robust multilingual solutions, which can enable businesses and governments to reach across language barriers. Tilde offers a range of language technology services in several areas: machine translation, terminology, proofreading tools, speech technology, and linguistic tools. Available in our cloud platform, these services can be used by developers – through our APIs – to build new multilingual solutions, supporting languages in the digital age.

Tilde’s Translation API

With Tilde’s Translation API, users can access MT systems in multiple language pairs and domains. The MT systems are hosted in the cloud and can be integrated into any platform or application.

Domains

General
Legal
Pharmaceutical
IT

Languages

English to Bulgarian, Czech, Danish, Dutch, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Latvian, Lithuanian, Polish, Portuguese, Romanian, Russian, Slovenian, Spanish, Swedish

Documents

Currently the system supports these file formats: DOC, DOCX, XLSX, PPTX, ODT, ODP, ODS, HTML, HTM, XHTML, XHT, TXT, TMX, XLIFF, SDLXLIFF, and TTX.
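
As a rough illustration, a client application could check a file against this list before submitting it for translation. The sketch below is not part of the Translation API; the function name is hypothetical, and it assumes that each format name maps directly to a lowercase file extension (real files may use variants such as .xlf that are not covered here).

```python
from pathlib import Path

# Extensions derived from the format list above (assumed one-to-one mapping).
SUPPORTED_EXTENSIONS = {
    "doc", "docx", "xlsx", "pptx", "odt", "odp", "ods",
    "html", "htm", "xhtml", "xht", "txt", "tmx", "xliff", "sdlxliff", "ttx",
}


def is_supported_document(filename: str) -> bool:
    """Return True if the file extension appears in the supported list."""
    extension = Path(filename).suffix.lstrip(".").lower()
    return extension in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("report.DOCX", "memory.tmx", "scan.pdf"):
        print(name, is_supported_document(name))
```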

Tilde’s Terminology API

Tilde’s online terminology services ensure clear, consistent communication with customers across the globe. With the Terminology API, Tilde provides services that keep terminology organized by identifying terms in documents, finding relevant translations, and assembling term glossaries. These services can be used to build comprehensive terminology solutions.

Term identification

Identifies terminology in documents and sentences using state-of-the-art term extraction methods that combine linguistic, statistical, and reference-corpus evidence. Supported document formats: PDF, Microsoft Word, Microsoft Excel, Microsoft PowerPoint, Text (.txt), Rich Text (.rtf), XLIFF, HTML, XML, MIF.

Term extraction

Extracts the identified terms from a document and assembles them into term glossaries.

Term lookup

This service looks up identified terms in existing terminology resources. Candidate translation equivalents can then be obtained from parallel and comparable corpora collected from the web, using:

Statistical Data Base (SDB) - a large offline resource of automatically extracted multilingual terminology, which is refined by Tilde Terminology users whenever translation equivalents are validated
Online terminology extraction from parallel and comparable data sources found on the web and directed by users

Text enrichment with term translation equivalents

Identified terms in texts are enriched with translation equivalents acquired from terminology resources and databases. Texts can be in various formats, such as HTML and DOCX.
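
The idea of enrichment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Terminology API itself: the glossary, the function name, and the bracket notation for the output are all hypothetical, and the input is treated as plain text rather than HTML or DOCX.

```python
import re

# Hypothetical English-to-Latvian glossary; keys must be lowercase.
GLOSSARY = {
    "machine translation": "mašīntulkošana",
    "terminology": "terminoloģija",
}


def enrich(text: str, glossary: dict) -> str:
    """Append the translation equivalent in brackets after each glossary term."""
    if not glossary:
        return text
    # Longest terms first, so multi-word terms win over shorter ones they contain.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda match: f"{match.group(0)} [{glossary[match.group(0).lower()]}]",
        text,
    )


if __name__ == "__main__":
    print(enrich("Machine translation needs terminology.", GLOSSARY))
```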

Tilde’s Linguistic tool API

With Tilde’s Linguistic tool API, users can access components for the linguistic processing of text data. The Linguistic tool API provides functionality for the following tasks: tokenization, sentence breaking, morphological analysis and synthesis, part-of-speech (and, for morphologically rich languages, also morpho-syntactic) tagging, and language detection.

Tokenization

Domains: The component is domain independent.

Languages: English, Estonian, Latvian and Lithuanian.

Format: The Tokenization component works with plain text data that is encoded in UTF-8.

The Tokenization component breaks text down into the smallest elements of linguistic analysis, called tokens. A token can be a word, a punctuation mark, a code, an e-mail address, a web address, a decimal number, etc. Tokenization is used in almost every natural language processing task.
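
To make the token types concrete, here is a minimal regular-expression tokenizer in Python. It is a sketch under simple assumptions and not Tilde's component: the pattern and the function name are illustrative, and edge cases (for example, a web address directly followed by a full stop) are not handled.

```python
import re

# Alternatives are tried in order, so the more specific token types come first.
TOKEN_PATTERN = re.compile(
    r"""
    https?://[^\s,;]+               # web address
  | [\w.+-]+@[\w-]+(?:\.[\w-]+)+    # e-mail address
  | \d+(?:[.,]\d+)?                 # integer or decimal number
  | \w+(?:[-']\w+)*                 # word, possibly with inner hyphen or apostrophe
  | [^\w\s]                         # any single punctuation mark or symbol
    """,
    re.VERBOSE,
)


def tokenize(text: str) -> list:
    """Split UTF-8 plain text (already decoded to str) into tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    sample = "Write to info@example.com or visit https://example.com, it costs 3.50 euros."
    print(tokenize(sample))
```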

Sentence breaking

Domains: The component is domain independent.

Languages: English, Estonian, Latvian and Lithuanian.

Format: The Sentence breaking component works with plain text data that is encoded in UTF-8.

The Sentence breaking component breaks text down into sentences. The component identifies language-specific sentence breaking characteristics (abbreviations, special use of punctuation for numerals, etc.). Sentence breaking is used in many natural language processing tasks (e.g., machine translation, part-of-speech tagging, named entity recognition, and sentiment analysis).
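
The role of abbreviations and numerals can be illustrated with a small rule-based splitter. This is a simplified sketch, not the actual component: the abbreviation list is a tiny hypothetical sample, and the only boundary rule is "terminal punctuation, then whitespace, then an uppercase letter or digit".

```python
import re

# Tiny, hypothetical abbreviation list; a real component would rely on
# a much larger per-language resource.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "dr.", "mr.", "mrs.", "no."}

# Candidate boundary: terminal punctuation followed by whitespace.
# A full stop inside a numeral such as 3.5 is never a candidate.
BOUNDARY = re.compile(r"[.!?]+(?=\s)")


def split_sentences(text: str) -> list:
    """Split text into sentences, skipping abbreviations and numerals."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        end = match.end()
        following = text[end:].lstrip()
        # The next sentence must start with an uppercase letter or a digit.
        if not following or not (following[0].isupper() or following[0].isdigit()):
            continue
        # A known abbreviation before the boundary does not end a sentence.
        if text[start:end].split()[-1].lower() in ABBREVIATIONS:
            continue
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    print(split_sentences("Dr. Smith arrived. He paid 3.5 euros, e.g. in cash. Fine!"))
```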

Morphological analysis and lemmatization

Domains: The component is domain independent.

Languages: English, French, German, Latvian and Lithuanian.

Format: The Morphological analysis component works with plain text data, pre-tokenized data, or individual tokens that are encoded in UTF-8.

The Morphological analysis component performs fast linguistic analysis of words by identifying their possible morphological characteristics: the possible parts of speech, the possible lemmas, and, for each lemma, its morphological categories (e.g., gender, number, and case for nouns). The component is built on finite-state transducer technology, which ensures fast processing. Morphological analysis is used in many natural language processing tasks (e.g., part-of-speech tagging, syntactic parsing, and grammar checking).
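
The shape of such an analysis, with several candidate readings per word, can be sketched as follows. Note that this example swaps the finite-state transducer for a plain dictionary lookup, and that the lexicon entries, the class, and the function name are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str
    pos: str
    features: str


# Hypothetical toy lexicon; a real analyzer encodes this knowledge in a
# finite-state transducer instead of listing every surface form.
LEXICON = {
    "saw": [
        Analysis("saw", "noun", "number=singular"),
        Analysis("saw", "verb", "form=base"),
        Analysis("see", "verb", "tense=past"),
    ],
    "books": [
        Analysis("book", "noun", "number=plural"),
        Analysis("book", "verb", "person=3, number=singular, tense=present"),
    ],
}


def analyse(token: str) -> list:
    """Return every possible analysis of a token (empty list if unknown)."""
    return LEXICON.get(token.lower(), [])


if __name__ == "__main__":
    for reading in analyse("saw"):
        print(reading.lemma, reading.pos, reading.features)
```

The analyzer deliberately returns all readings; choosing the right one in context is the job of a tagger.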

Morphological synthesis

Domains: The component is domain independent.

Languages: English, French, German, Latvian and Lithuanian.

Format: The Morphological synthesis component works with individual tokens that are encoded in UTF-8.

The Morphological synthesis component generates all valid surface forms (inflected forms) of a word, given its lemma and part of speech. The component is built on finite-state transducer technology, which ensures fast processing. Morphological synthesis is used in many natural language processing tasks (e.g., machine translation, term and named entity normalization, natural language generation, and dialogue systems).
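
As a rough illustration of synthesis, the sketch below generates English forms from a lemma and a part of speech. It uses a few hand-written suffix rules plus an irregular-form table rather than a finite-state transducer, the names are hypothetical, and it ignores phenomena such as consonant doubling (stop, stopped).

```python
# Hypothetical table of irregular paradigms, consulted before the rules.
IRREGULAR = {
    ("go", "verb"): ["go", "goes", "went", "gone", "going"],
    ("child", "noun"): ["child", "children"],
}


def _ends_in_consonant_y(word: str) -> bool:
    return len(word) > 1 and word.endswith("y") and word[-2] not in "aeiou"


def _add_s(stem: str) -> str:
    """Form the -s ending used for noun plurals and third-person verbs."""
    if stem.endswith(("s", "x", "z", "ch", "sh")):
        return stem + "es"
    if _ends_in_consonant_y(stem):
        return stem[:-1] + "ies"
    return stem + "s"


def synthesize(lemma: str, pos: str) -> list:
    """Return the surface forms of a lemma for the given part of speech."""
    if (lemma, pos) in IRREGULAR:
        return list(IRREGULAR[(lemma, pos)])
    if pos == "noun":
        return [lemma, _add_s(lemma)]
    if pos == "verb":
        if lemma.endswith("e"):
            past, gerund = lemma + "d", lemma[:-1] + "ing"
        elif _ends_in_consonant_y(lemma):
            past, gerund = lemma[:-1] + "ied", lemma + "ing"
        else:
            past, gerund = lemma + "ed", lemma + "ing"
        return [lemma, _add_s(lemma), past, gerund]
    raise ValueError(f"unsupported part of speech: {pos}")


if __name__ == "__main__":
    print(synthesize("city", "noun"))
    print(synthesize("walk", "verb"))
```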

Part-of-speech and morpho-syntactic tagging

Domains: The component is domain independent.

Languages: Latvian and Lithuanian.

Format: The Part-of-speech and morpho-syntactic tagging component works with plain text data, pre-tokenized (and sentence-split) data, or morphologically pre-analyzed data that is encoded in UTF-8.

The Part-of-speech and morpho-syntactic tagging component performs morphological disambiguation of words in context (i.e., in plain text, pre-tokenized, or morphologically pre-analyzed data) using machine-learning-based models. For morphologically rich languages (e.g., Latvian and Lithuanian), the component can also perform more detailed morpho-syntactic disambiguation of words. Part-of-speech and morpho-syntactic tagging is used in many natural language processing tasks (e.g., term extraction, named entity recognition, machine translation, and sentiment analysis).

Language detection

Domains: The component is domain independent.

Format: The Language detection component works with plain text data that is encoded in UTF-8.

The Language detection component identifies the language in which a text is most likely written. Language detection is used in many natural language processing workflows (e.g., machine translation platforms, terminology management platforms, and search and indexing platforms).
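
One simple way to guess a language is to count how many common function words of each language occur in the text. The sketch below takes that approach for three languages; it is an assumption-laden illustration rather than the method used by the component, and the word lists and function name are hypothetical.

```python
import re

# Small, hypothetical lists of frequent function words per language.
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "a"},
    "latvian": {"un", "ir", "ar", "par", "es", "tas", "kas", "uz"},
    "lithuanian": {"ir", "yra", "su", "kad", "tai", "iš"},
}


def detect_language(text: str):
    """Return the language whose stopwords occur most often, or None."""
    words = re.findall(r"\w+", text.lower())
    scores = {
        language: sum(word in stopwords for word in words)
        for language, stopwords in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(detect_language("The cat is in the house and the dog is outside"))
    print(detect_language("Es esmu mājās un tas ir labi"))
```

Because the result is only the most likely language, short or mixed-language inputs can easily be misclassified by a scheme this small.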

Proofreading Tools

Tilde has developed market-leading proofreading systems for Latvian and Lithuanian, two of the most morphologically rich and complex languages in Europe. These solutions are based on years of research and innovation in proofreading tools.

Latvian spelling checker

Verifies the spelling of every word and offers to replace a misspelled word with the correct one. Automatically changes words that are unambiguously misspelled. Tilde’s team constantly improves the spelling checker by including new lexical items and by adding new features (e.g., Intelligent AutoCorrect). The Latvian spelling checker now recognizes more than 22 million forms generated from more than 130 thousand lemmas.
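
The difference between offering a replacement and changing a word automatically can be sketched as follows. This is not Tilde's algorithm: it assumes a four-word toy lexicon, treats "exactly one lexicon word at edit distance 1" as the definition of an unambiguous misspelling, and uses hypothetical function names and status labels.

```python
# Hypothetical toy lexicon of Latvian word forms.
LEXICON = {"māja", "maza", "skola", "grāmata"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def check_word(word: str, lexicon=LEXICON):
    """Return (status, suggestions) for a single word."""
    if word in lexicon:
        return ("correct", [word])
    candidates = sorted(w for w in lexicon if edit_distance(word, w) == 1)
    if len(candidates) == 1:
        return ("autocorrect", candidates)  # unambiguous: safe to replace
    if candidates:
        return ("suggest", candidates)      # ambiguous: let the user choose
    return ("unknown", [])


if __name__ == "__main__":
    for word in ("skila", "maja", "māja"):
        print(word, check_word(word))
```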

Lithuanian spelling checker

Verifies the spelling of every word and offers to replace a misspelled word with the correct one. Automatically changes words that are unambiguously misspelled. Tilde’s team constantly improves the spelling checker by including new lexical items and by adding new features (e.g., Intelligent AutoCorrect). The Lithuanian spelling checker now recognizes more than 22 million forms generated from more than 130 thousand lemmas.

Latvian grammar checker

Verifies sentence structure and punctuation. The grammar checker developed by Tilde is based on syntactic analysis of the text and offers corrections for the most common grammar mistakes. These include errors in word agreement, punctuation errors at the end of sentences, stylistic errors, and comma errors in insertions, participial phrases, coordinate sentence parts, and subordinate clauses.

The grammar checker also finds long-distance syntactic errors between different parts of a sentence. In addition, it identifies calques, slang, and other undesirable words or constructions. This module also corrects simple errors such as extra spaces before or after punctuation marks and mismatched opening and closing brackets or quotation marks.

Lithuanian grammar checker

Verifies sentence structure and punctuation. The grammar checker developed by Tilde is based on syntactic analysis of the text and offers corrections for the most common grammar mistakes. These include errors in word agreement, punctuation errors at the end of sentences, stylistic errors, and comma errors in insertions, participial phrases, coordinate sentence parts, and subordinate clauses.

The grammar checker also finds long-distance syntactic errors between different parts of a sentence. In addition, it identifies calques, slang, and other undesirable words or constructions. This module also corrects simple errors such as extra spaces before or after punctuation marks and mismatched opening and closing brackets or quotation marks.

Latvian hyphenator

The hyphenator inserts all possible hyphenation points into the words of a text. Hyphenation relies on both rules that define the usual hyphenation process and an exception list (words that cannot be hyphenated correctly using rules alone).
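
The combination of rules and an exception list can be sketched as below. The single rule used here (break before a consonant that is directly followed by a vowel, once the word already contains a vowel) is a deliberate simplification and not the hyphenator's actual rule set; the exception entry and all names are illustrative assumptions.

```python
VOWELS = set("aāeēiīouū")

# Illustrative exception list: words whose hyphenation follows the word
# structure (here a prefix boundary) rather than the general rule.
EXCEPTIONS = {"aiziet": "aiz-iet"}


def hyphenate(word: str) -> str:
    """Return the lowercased word with all hyphenation points marked."""
    lower = word.lower()
    if lower in EXCEPTIONS:
        return EXCEPTIONS[lower]
    pieces = []
    seen_vowel = False
    for index, char in enumerate(lower):
        next_is_vowel = index + 1 < len(lower) and lower[index + 1] in VOWELS
        # Simplified rule: a consonant directly before a vowel starts a new
        # syllable, unless it belongs to the onset of the first syllable.
        if char not in VOWELS and next_is_vowel and seen_vowel:
            pieces.append("-")
        pieces.append(char)
        if char in VOWELS:
            seen_vowel = True
    return "".join(pieces)


if __name__ == "__main__":
    for word in ("grāmata", "skola", "aiziet"):
        print(hyphenate(word))
```

Checking the exception list first keeps the rule set small: anything the rules would get wrong is listed explicitly instead of complicating the rules.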

Lithuanian hyphenator

The hyphenator inserts all possible hyphenation points into the words of a text. Hyphenation relies on both rules that define the usual hyphenation process and an exception list (words that cannot be hyphenated correctly using rules alone).

Speech Services

Speech is the next step in language technology. Though speech technology already exists for the world’s larger languages – such as English, Spanish, and German – smaller languages are underrepresented. Tilde is currently working on building speech technology services for Europe’s smaller languages like Latvian, Lithuanian, and Estonian.

Latvian Automated Speech Recognition (ASR)

Tilde was the world’s first company to create ASR for Latvian. The ASR service is based on a large database of spoken Latvian data. Since its completion, the service has been integrated into a mobile app that recognizes spoken numerals.

Latvian text-to-speech service

Speech Synthesis (Text-to-Speech, TTS) technology transforms the wording of an utterance into sounds that are output to the user.

Tilde has worked on Speech Synthesis technology development since the late 1990s. Technology for pronouncing Latvian words and texts is also included in our product Tildes Birojs, the market-leading proofreading software for Latvian.

In 2005, Tilde, together with the Latvian Society of the Blind, started a project to address the needs of visually impaired people using computers in Latvian. The architecture of the system covers the traditional TTS transformation: text normalization, grapheme-to-phoneme conversion, prosody generation, and waveform synthesis. The system is now available free of charge for all visually impaired people in Latvia.
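
The first stage of the architecture described above, text normalization, can be illustrated with a short sketch. It uses English for readability and is only an assumption about what such a stage does in general: the abbreviation table, the digit-by-digit reading of numbers, and the function name are hypothetical and unrelated to Tilde's Latvian system.

```python
import re

DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# Hypothetical abbreviation table for the example.
ABBREVIATIONS = {"dr.": "doctor", "etc.": "et cetera"}


def normalize(text: str) -> str:
    """Rewrite text as plain lowercase words ready for later TTS stages."""
    words = []
    for token in text.lower().split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]+", token):
            # Read numbers digit by digit to keep the sketch short.
            words.extend(DIGIT_NAMES[int(digit)] for digit in token)
        else:
            words.append(token.strip(".,!?"))
    return " ".join(word for word in words if word)


if __name__ == "__main__":
    print(normalize("Dr. Smith lives at 42 Main Street."))
```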

Tilde 2021, All rights reserved, Terms of Use