For customers who don’t have enough data to build an MT system, we can draw on our rich resources to boost the system’s capabilities. The Tilde Data Library includes 12.35 billion parallel sentences and 23.85 billion monolingual sentences in 124 languages. Represented domains include pharmaceutical, IT, legal, and finance.

MT RESOURCES

                                          Total            Free
Number of languages                       124              117
Number of corpora                         1026             164
Size of parallel corpora (sentences)      12.34 billion    4.01 billion
Size of monolingual corpora (sentences)   23.85 billion    4.92 billion

Full language statistics

See our list of MT corpora