Machine Translation Glossaries: Why They Matter, and How to Use Them
With machine translation (MT), precision and recall are critical to success. Every translation counts. The more curated and accurate the information you provide for your MT engines, the better they’ll perform.
One of the simplest ways to customize machine translation is to use machine translation glossaries. Learn what they are, why they matter, and how to leverage them to improve MT output in the long run.
What are machine translation glossaries?
In the context of machine translation, a glossary is a collection of words and phrases, each paired with a preferred translation. Glossaries are sometimes referred to as:
- Custom terminology
- Custom vocabulary
- Custom dictionaries, etc.
MT glossaries are similar to term bases, but instead of being used by linguists, they are designed to be used by machine translation software.
When attached to MT engines, glossaries help improve the quality of the MT output by ensuring that the MT engines correctly apply pre-determined terminology.
Before translating a source text, the MT engine compares it against the attached glossary file to identify terms that have a preferred translation, and then applies those translations.
It’s important to note that an MT glossary doesn’t re-train an engine—it simply overrides any appropriate term with a predetermined translation.
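As a rough illustration, a glossary can be pictured as a table of source terms and their preferred translations. The sketch below assumes a two-column CSV layout (source term, then target term); the names `GlossaryEntry` and `load_glossary` are hypothetical, the sample entries are illustrative, and real glossary file formats differ between MT providers.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryEntry:
    source: str  # term as it appears in the source language
    target: str  # preferred translation in the target language


def load_glossary(csv_text: str) -> list[GlossaryEntry]:
    """Parse two-column CSV text (source,target) into glossary entries."""
    reader = csv.reader(io.StringIO(csv_text))
    # Rows with fewer than two columns are skipped rather than guessed at.
    return [
        GlossaryEntry(row[0].strip(), row[1].strip())
        for row in reader
        if len(row) >= 2
    ]


# Illustrative French-to-English glossary: a brand name that maps to
# itself (do not translate) and a borrowed term with a fixed translation.
SAMPLE = "Connected,Connected\ncôte de boeuf,rib eye\n"
glossary = load_glossary(SAMPLE)
```

Mapping a term to itself is one simple way to express "do not translate" in such a table.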
Why are MT glossaries important?
Machine translation tools have dramatically improved in output quality over the past few years. Nevertheless, they still lack the contextual understanding of a human translator.
This means they can make some very basic errors, especially when handling an ambiguous word or a term that has a specific meaning in a given context.
Since glossaries are adapted to a domain’s or company’s specific terminology, they help make machine translation output more accurate than it would be if the engine drew only on general-purpose data sets.
How do MT glossaries work?
The steps that an MT engine usually follows are:
- Receive a source text
- Translate the source text
- Display the output translation
With an MT glossary included, MT engines add an intermediate step to the process:
- Receive source text
- Translate the source text
- Search and replace the translation with your preferred terminology
- Present the output translation
To put it another way, with the help of glossaries, the MT engine searches for matches and automatically applies them while translating.
For example, suppose you have a brand for a Bluetooth speaker called “Connected,” and you want to translate the following sentence into Spanish: “Your Connected device was not detected.”
Without an MT glossary, your MT engine would produce something like the following result: “No se ha detectado tu dispositivo conectado” (literal back-translation into English: “Your connected device was not detected”). As you can see, the brand name “Connected” has been translated as “conectado,” which would be incorrect in this case.
If you add the brand name “Connected” to your MT glossary, you can enforce the non-translatability of the term. In that case, the MT engine will produce this result: “No se ha detectado tu dispositivo Connected.” This is spot on—using an MT glossary significantly improves accuracy by automatically providing the desired translation.
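The “Connected” example can be sketched in code. This is a minimal illustration under stated assumptions, not a description of how any particular engine works: it uses placeholder masking (glossary terms are swapped for tokens before translation and restored afterward), which is one common way to enforce a term. `apply_glossary` and `fake_translate` are hypothetical names, and `fake_translate` is a stand-in that only returns canned output for this one sentence.

```python
import re
from typing import Callable


def apply_glossary(source: str, glossary: dict[str, str],
                   translate: Callable[[str], str]) -> str:
    """Mask glossary terms, translate, then restore preferred translations."""
    placeholders: dict[str, str] = {}
    masked = source
    # Longest terms first so multi-word terms win over their substrings.
    for i, term in enumerate(sorted(glossary, key=len, reverse=True)):
        token = f"__TERM{i}__"
        # Case-sensitive whole-word match: "Connected" but not "connected".
        masked, count = re.subn(rf"\b{re.escape(term)}\b", token, masked)
        if count:
            placeholders[token] = glossary[term]
    translated = translate(masked)
    for token, target in placeholders.items():
        translated = translated.replace(token, target)
    return translated


def fake_translate(text: str) -> str:
    """Stand-in for a real MT engine; knows only the two inputs below."""
    canned = {
        "Your Connected device was not detected.":
            "No se ha detectado tu dispositivo conectado.",
        "Your __TERM0__ device was not detected.":
            "No se ha detectado tu dispositivo __TERM0__.",
    }
    return canned[text]


sentence = "Your Connected device was not detected."
without_glossary = fake_translate(sentence)
with_glossary = apply_glossary(sentence, {"Connected": "Connected"}, fake_translate)
```

Here `without_glossary` ends in “conectado,” while `with_glossary` keeps the brand name “Connected” intact.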
Best practices for using MT glossaries
To ensure MT glossaries remain reliable and always up to date, here are a few best practices to follow:
- Keep it simple: Small glossaries, focusing only on the most essential terms, tend to be more effective—massive glossaries could even harm your translation output.
- Limit customizations to terms that you want translated in only one way: The glossary translation is applied wherever the term matches, so it should be exactly what you want in every context.
- Ensure glossaries are free of errors: Keep your terms free of spelling mistakes, formatting errors, or incorrect translations.
- Avoid duplicate terms: MT engines can struggle to apply the correct translation if the same source term appears more than once in a glossary.
- Post-edit essential translations: While glossaries can enhance translation quality, don’t trust them blindly—high-quality human checks on your MT output are always the best guarantee of accuracy. This process is called “post-editing.”
- Be mindful of your language pair: In morphologically complex languages, like Finnish, Arabic, or Turkish, words may change shape depending on the context—so customizations for these languages may not always produce the best results.
- Review documentation: Although the basic glossary functionality is similar across MT engines, the specifics might differ; it may be helpful to read the available documentation to find out how to best work with a given engine.
- Choose suitable terms: Not every kind of term belongs in a glossary. For the best results, focus on nouns and compound nouns, such as product or brand names like “Postmates” or “WeWork.”
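Some of these checks are easy to automate before a glossary is uploaded. The following sketch assumes the glossary is available as a list of (source, target) pairs; `find_glossary_issues` is a hypothetical helper, and the Spanish entries are illustrative. It flags empty fields, stray whitespace, and source terms that appear more than once.

```python
from collections import defaultdict


def find_glossary_issues(entries: list[tuple[str, str]]) -> list[str]:
    """Return human-readable warnings for common glossary problems."""
    issues: list[str] = []
    targets_by_source: dict[str, list[str]] = defaultdict(list)
    for source, target in entries:
        if not source.strip() or not target.strip():
            issues.append(f"empty field: {(source, target)!r}")
            continue
        if source != source.strip() or target != target.strip():
            issues.append(f"stray whitespace: {(source, target)!r}")
        targets_by_source[source.strip()].append(target.strip())
    # A source term listed more than once leaves the engine to guess.
    for source, targets in targets_by_source.items():
        if len(targets) > 1:
            issues.append(f"duplicate source term {source!r}: {targets}")
    return issues


entries = [
    ("crane", "grúa"),               # the machine
    ("crane", "grulla"),             # the bird: conflicts with the entry above
    ("lead ", "cliente potencial"),  # trailing space in the source term
    ("TMS", ""),                     # missing translation
]
issues = find_glossary_issues(entries)
```

A check like this does not replace human review, but it can catch mechanical errors early.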
What terms are suitable for MT glossaries?
To maximize the impact and accuracy of MT glossaries, it’s important to use them for specific types of terms:
- Product names like “Ford Mondeo,” “Samsung Galaxy Note 5,” etc.
- Company names like “Apple,” “Microsoft,” etc.
- Ambiguous words, e.g., homonyms (multiple-meaning words) like “crane” (a machine vs. an animal) or “lead” (the metal vs. a potential client)
- Abbreviations: A shortened form of a word or phrase that’s frequently used in the industry or domain of interest, e.g., TMS for “translation management system”
- Borrowed words: Foreign words that the MT engine will likely keep in the original language, like the French “côte de boeuf” dish, but which you want to translate nevertheless—in this case, “rib eye”
What terms are less suitable for MT glossaries?
At the same time, some kinds of terms are less suitable to be documented and used in a machine translation glossary:
- Verbs: MT glossaries can’t conjugate them correctly in grammatical person, number, gender, tense, aspect, mood, voice, degree of formality, clusivity, transitivity, or valency.
- Terms in highly inflected languages with many cases and grammatical genders: MT glossaries currently can’t change the form or ending of a word to fit the way it’s used in a sentence.
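The limitation can be illustrated on the matching side. The sketch below assumes a glossary lookup based on literal whole-word matching, which is an assumption about how a simple search-and-replace glossary behaves rather than a description of any specific engine; `matches_term` is a hypothetical helper.

```python
import re


def matches_term(term: str, text: str) -> bool:
    """True if `term` occurs in `text` as a whole word, ignoring case."""
    pattern = rf"\b{re.escape(term)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# A glossary entry for the base form "detect" checked against
# several conjugated forms of the same verb.
forms = ["detect", "detects", "detected", "detecting"]
hits = {form: matches_term("detect", form) for form in forms}
```

Only the base form matches, so every conjugated form would need its own entry, and the fixed target translation still would not adapt to the surrounding sentence.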
Managing MT glossaries for all engines directly within a translation management system
Translation management systems (TMS) allow localization managers not only to centralize and automate the localization workflow but also to make full use of well-established translation technology like translation memories and glossaries.
Modern TMS solutions, like Phrase, enable the use and management of glossaries without the need to upload and manage them with each individual MT provider.
In Phrase, you can directly upload, edit, and use MT glossaries for all supported engines, which can significantly reduce the amount of deployment and management time.
How does glossary support work with each MT engine in Phrase?
MT glossaries are available as a part of Phrase Translate, the suite’s machine translation add-on. Besides MT glossaries, Phrase Translate subscribers can take advantage of a number of fully managed machine translation and advanced AI-powered features like MT quality estimation and MT autoselect.
Through Phrase Translate, users can also add their own MT glossaries, which they can apply to fully managed MT engines:
- Google Translate
- Amazon Translate
- Microsoft Translator
- Rozetta Translate
- Tencent TranSmart
Once you have created a custom glossary, you need to attach it to an existing MT profile. You can create multiple MT glossaries and use them for different translation projects.
Looking to the future
MT glossaries are a simple and effective way to increase machine translation output quality. This is especially true for:
- Domains with low-frequency terms or translation memories that aren’t very large or well curated
- Small- to mid-sized companies without big enough datasets to use custom MT
- Bigger companies that have compiled substantial amounts of terminology data over several years or decades, but whose data isn’t consistent or whose language or style best practices have changed
Nevertheless, MT glossaries come with limitations as well. At some point, an MT glossary can grow so large that it becomes a burden for the localization managers who maintain it: regular updates may become a headache and carry a higher risk of accidentally introducing errors.
Equally important, most MT glossaries available on the market still rely on simple search-and-replace functionality. As MT technology continues to improve, engines are expected to get better at applying glossary terms with morphologically correct inflections.
To make the most of their machine translation efforts, localization managers should always weigh their needs and available resources before deciding if custom machine translation glossaries are right for their use case.
Last updated on September 23, 2022.