Machine Translation: What It Is, and How It Works
The use of computers to translate text from one language to another has long been a dream of computer science. Nevertheless, it’s only in the past 10 years that machine translation has become a viable productivity tool in more widespread use. Advances in natural language processing, artificial intelligence, and computing power all contribute to this increasingly useful technology.
What is machine translation?
Machine translation is the process of automatically translating content from one language (the source) to another (the target) without any human input.
Translation was one of the first applications of computing power, starting in the 1950s. Unfortunately, the complexity of the task was far higher than early computer scientists’ estimates, requiring enormous data processing power and storage far beyond the capabilities of early machines.
It was only in the early 2000s that the software, data, and required hardware became capable of doing basic machine translation. Early developers used statistical databases of languages to “teach” computers to translate text. Training these machines involved a lot of manual labor, and each added language required starting over with the development for that language.
In 2016, an experimental team at Google was testing the use of neural learning models and artificial intelligence (AI) to train translation engines. When the small team’s methodology was tested against Google’s main statistical machine translation engine, it proved far faster and more effective across many languages. In addition, it “learned” as it was used, generating constant improvement in quality.
Neural machine translation proved so effective that Google changed course and adopted it as its primary development model. Other major providers, including Microsoft and Amazon, soon followed suit, and modern machine translation became a viable addition to translation technology. Many translation management systems (TMSs) now incorporate MT into their solutions for their users’ workflows.
Automated vs. machine translation: What’s the difference?
Automated translation refers to any automation built into a traditional computer-assisted translation tool (CAT tool) or a modern translation management system (TMS) to automatically execute repetitive translation-related tasks.
Triggers built into the content tell the system which tasks can be automated. This may include inserting commonly used text, such as legal disclaimers, into documents from a database like a content management system (CMS).
Automated translation may be used to automate the machine translation of text as a stage in the localization workflow. However, automated translation and machine translation are not interchangeable terms as they serve entirely different functions.
What types of machine translation are there?
The three most common types of machine translation are:
Rule-based machine translation (RBMT)
Rule-based MT, the earliest form of MT, has several serious disadvantages: it requires significant amounts of human post-editing, languages must be added manually, and its quality is generally low. It has some uses in very basic situations where a quick understanding of meaning is required.
Statistical machine translation (SMT)
Statistical MT builds a statistical model of the relationships between words, phrases, and sentences, typically learned from large collections of already-translated text. It then applies the model to convert those elements into the target language. It improves on rule-based MT but shares many of the same problems.
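To make the idea of a statistical model more concrete, here is a minimal sketch of one classic building block of statistical MT: a word-translation table estimated from a tiny parallel corpus, in the style of IBM Model 1. The corpus, the function names, and the iteration count are assumptions chosen for illustration. Real statistical engines combined far larger models, and this is not how any particular commercial engine works.

```python
from collections import defaultdict


def train_word_model(pairs, iterations=10):
    """Estimate t(target_word | source_word) with EM, IBM Model 1 style (no NULL word)."""
    source_vocab = {w for src, _ in pairs for w in src}
    target_vocab = {w for _, tgt in pairs for w in tgt}
    # Start from a uniform distribution: every target word is equally likely.
    t = {(tw, sw): 1.0 / len(target_vocab) for tw in target_vocab for sw in source_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected (target, source) alignment counts
        total = defaultdict(float)  # expected alignment counts per source word
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    share = t[(tw, sw)] / norm
                    count[(tw, sw)] += share
                    total[sw] += share
        # Re-estimate the probabilities from the expected counts.
        for tw, sw in t:
            t[(tw, sw)] = count[(tw, sw)] / total[sw]
    return t


def best_translation(t, source_word):
    """Return the target word with the highest probability for source_word."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)


# Toy parallel corpus (English -> German), tokenized on whitespace.
corpus = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]
model = train_word_model(corpus)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(model, word))
```

The model is never told which word pairs belong together. It infers them from which words keep appearing in the same sentence pairs, which is also why this approach needs a lot of translated text to work well.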
Neural machine translation (NMT)
As mentioned above, the neural MT model uses artificial intelligence to learn languages and constantly improve that knowledge, much like the neural networks in the human brain. It is more accurate, makes adding languages easier, and is much faster once trained. Neural MT is rapidly becoming the standard in MT engine development.
Which machine translation type should I use?
In general, the decision on which machine translation type you should use depends on:
- The budget available: Neural MT is more expensive to train than statistical MT, but the quality improvement is usually well worth the cost difference. Many systems are deprecating their older statistical models in favor of neural learning.
- The industry involved: Some industries demand complex and technical language that may require more sophisticated processing. Neural MT provides this.
- The language pairs you need: Statistical MT is often sufficient for closely related language pairs, such as Romance (Latin-derived) languages with similar grammatical rules and syntax.
- The amount of data you have: Neural MT requires the processing of large quantities of text for it to learn and for you to reap the benefits.
- Internal vs. customer-facing content: Customer-facing content, like marketing or sales materials that reflect brand quality, requires the most sophisticated combination of machine translation and post-editing by experienced human translators. Basic employee communication or internal documentation can often be handled with basic machine translation when time and cost are factors.
When should I use machine translation?
Not all content lends itself to machine translation. Generally speaking, more structured content, such as technical documentation, legal and IP material, and internal communications, works better with MT than more colloquial content like marketing, branding, or other customer-facing content. For that colloquial content, MT may still be used, but the results will need more human editing, also known as machine translation post-editing, to ensure they are properly localized.
Which machine translation tool should I use?
The major developers of machine translation technology (Google, Microsoft, and Amazon) all currently use a type of neural MT as their preferred methodology, since it allows for both more nuanced translation and the continual addition of language pairs. This growth capability is made possible by the ability of the machine translation tools, also known as engines, to learn and improve as they are used more.
Generic machine translation engines
Machine translation works on training data. Depending on your needs, the data can be generic or custom. Generic data is simply the total of all the data learned from all the translations performed over time by the machine translation engine. It enables a generalized translation tool for all kinds of applications, including text, voice, and full documents, including formatting. Specialized training data is data fed to an MT engine to build a specialization in a subject matter area like engineering, programming, design, or any discipline with its own glossaries.
Google Translate is generally considered one of the leading machine translation engines (MTEs), based on usage, number of languages, and integration with search. It was the first MTE based on neural language processing that learns from repeated usage.
Amazon Translate is also neural-based and is closely integrated with Amazon Web Services (AWS). Some evidence suggests Amazon Translate is more accurate with certain languages, notably Chinese. However, when making comparisons it is important to understand that all of these engines are constantly learning and improving.
Another cloud-based neural engine, Microsoft Translator is closely integrated with MS Office and other Microsoft products, providing instant access to translation abilities within a document or other software.
DeepL is the product of a smaller company based in Germany that is exclusively devoted to developing a machine translation engine. The company claims more nuanced and natural output based on its proprietary neural AI.
Custom machine translation engines
These major, general-purpose MTEs are the big players. However, there are many specialized engines developed for specific translation management systems, scientific disciplines, and other specialized uses. They are created by taking a basic platform and training it on data specific to a given discipline.
What are the advantages of machine translation?
Before the introduction of neural learning, MT was still very much a beta product, generating translations whose quality varied wildly and sometimes veered into the humorously poor or unreadable. Modern machine translation engines have largely changed all of that and now serve as an indispensable tool in the translation process. They can be used “as is” for less critical applications or combined with human post-editing to speed up traditional translation workflows.
Speed and volume
MT is fast, translating millions of words almost instantaneously, while continually improving as more content is translated. For very high-volume projects, MT can not only handle volume at speed, but it can also work with content management systems to organize and tag that content. This makes it possible to retain organization and context as the content is translated into multiple languages.
Large language selection
With the major providers offering 50-100 languages or more, translations can be done simultaneously across multiple languages for global product rollouts and updates to documentation.
Reduced costs and faster turnaround
The combination of high-speed throughput and a ready selection of existing language pairs covering dozens of combinations means that MT can cut the cost and time of delivering translations, even when human translators still post-edit the work. In essence, MT does the initial heavy lifting by providing basic but useful translations. The human translator then refines these basic versions to more closely reflect the original intent of the content and ensure proper localization per region.
Automated integration into translation workflows
Many translation management systems integrate one or more kinds of MT into their workflow. They include settings to automatically run a translation and send it off as part of the human translator’s content package. Given the low cost and minimal delay of the MT step, there is little reason not to include machine-translated content in automated workflows, especially for internal documentation and communication rather than customer-facing, brand-oriented content.
Machine translation vs. human translation
It is no longer necessary to choose between MT and human translation when beginning a project. Post-editing, that is, the editing of machine-translated content by a human linguist, is increasingly accepted by translation professionals.
Best practices for machine translation post-editing:
- Prepare content for machine translation. This involves clarifying and simplifying the writing with shorter sentences, active voice, and other best practices for clear copy.
- Choose the best machine translation engine for your task. If you have developed glossaries, for example, related to a product line or project, consider starting to build a custom engine tailored to your business sector, market, or type of product.
- Choose the quality level of post-editing for the project. Light post-editing (LPE) focuses on eliminating any obvious errors or issues, while full post-editing (FPE) ensures that the content is fully localized, including the adjustment of any cultural references that may be inappropriate. Good content preparation at the beginning of the process should make this faster and easier.
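The content-preparation step above can be partly automated. As a minimal sketch, the following check flags sentences that may be too long for clean machine translation. The 20-word limit, the naive sentence splitter, and the function name are assumptions for illustration, not an established standard.

```python
import re


def flag_long_sentences(text, max_words=25):
    """Return (word_count, sentence) for every sentence longer than max_words."""
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        words = sentence.split()
        if len(words) > max_words:
            flagged.append((len(words), sentence))
    return flagged


draft = (
    "Press the power button. "
    "In the event that the device does not start after the button has been pressed, "
    "it is recommended that the battery be removed and reinserted before any further "
    "attempt is made to start the device again."
)
flagged = flag_long_sentences(draft, max_words=20)
for word_count, sentence in flagged:
    print(f"{word_count} words: {sentence}")
```

A writer could then split or simplify the flagged sentence before sending the text to the engine, which should also reduce the post-editing effort later.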
How do I implement machine translation?
As soon as your global strategy is in place, you need to start thinking about implementation. Implementing machine translation doesn’t have to be a daunting task. There are a few steps that you can follow to make the most of it:
- Pick the right content for machine translation.
- Train the engine with your data if possible to increase the output quality.
- If you go for machine translation post-editing, select a team that has training or experience with post-editing, or at least make sure that its members are open to the idea.
- Run samples before deployment to get an idea of the quality and to identify areas that could be improved.
- Agree on a pricing model and be sure to involve all stakeholders, including your language service provider, in the decision.
- Deploy: Keep in mind that the results may not meet your expectations right away, but the output will get better over time.
What is the best machine translation software?
Choosing the best option can be complex, since the major and specialized engines each have their own strengths and weaknesses. Ideally, you should have access to more than one engine, so you can compare results or assign each project to the engine best suited to it.
The best machine translation software lets you use the optimal engine for your content
Your translation management system should have plugins or application programming interfaces (APIs) that connect the TMS to a choice of MT engines. Some systems offer the ability to automate the selection process based on artificial intelligence or algorithms that scan the content and match it to the optimal engine.
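As a rough sketch of what automated engine selection could look like, the function below scans a piece of content for glossary terms and routes it to the engine whose glossary matches best. The engine names and glossaries are hypothetical, and real systems are likely to use more sophisticated, often AI-based, matching than simple keyword counts.

```python
import re

# Hypothetical engine names and glossaries, for illustration only.
ENGINE_GLOSSARIES = {
    "legal-engine": {"contract", "liability", "clause", "indemnity"},
    "software-engine": {"api", "server", "install", "database"},
}
DEFAULT_ENGINE = "generic-engine"


def choose_engine(text, glossaries=ENGINE_GLOSSARIES, default=DEFAULT_ENGINE):
    """Pick the engine whose glossary matches the most words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {
        name: sum(word in terms for word in words)
        for name, terms in glossaries.items()
    }
    if not scores:
        return default
    best = max(scores, key=scores.get)
    # Fall back to the generic engine when no glossary term appears at all.
    return best if scores[best] > 0 else default


print(choose_engine("Install the database server before calling the API."))
print(choose_engine("This clause limits liability under the contract."))
print(choose_engine("Welcome to our summer sale!"))
```

Keeping a generic fallback means content that matches no specialty still gets translated rather than being rejected.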
The best machine translation tool lets you increase translator productivity and reduce costs
One of the main functions of a translation management system is tracking the time and expenses of a project. With multiple machine translation engines in use, these metrics can be a strong indicator of each engine’s value. Is it increasing translator productivity or slowing it? Does one engine show improved efficiency over time compared with another? Once you answer these questions, you will have a better sense of each engine’s capabilities.
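One simple way to compare engines on productivity is post-editing throughput, that is, how many words a translator finishes per hour when starting from each engine’s output. The sketch below assumes the TMS can export job records with an engine name, a word count, and post-editing hours. The field names and figures are made up for illustration.

```python
from collections import defaultdict


def words_per_hour(jobs):
    """Aggregate post-editing throughput (words per hour) for each engine."""
    words = defaultdict(int)
    hours = defaultdict(float)
    for job in jobs:
        words[job["engine"]] += job["words"]
        hours[job["engine"]] += job["hours"]
    # Skip engines with no recorded time to avoid dividing by zero.
    return {
        engine: words[engine] / hours[engine]
        for engine in words
        if hours[engine] > 0
    }


# Made-up job records, as they might be exported from a TMS.
jobs = [
    {"engine": "engine-a", "words": 3000, "hours": 4.0},
    {"engine": "engine-a", "words": 1500, "hours": 2.0},
    {"engine": "engine-b", "words": 3000, "hours": 5.0},
]
throughput = words_per_hour(jobs)
for engine, rate in sorted(throughput.items()):
    print(f"{engine}: {rate:.0f} words per hour")
```

Throughput alone does not capture final quality, so it is best read alongside cost and review feedback rather than on its own.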
Let a TMS do quality estimation for machine translation
A robust translation management system offers access to multiple machine translation options. It can then make the choices for you or make recommendations for running tests of the different options. Today’s TMSs also use AI to estimate the quality of the machine-translated content. Using machine translation quality estimation (MTQE), quality scores are automatically calculated before any post-editing is done, removing the guesswork from MT and improving post-editing efficiency.
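Once quality scores exist, they can drive the workflow. The sketch below assumes the quality estimate arrives as a score from 0 to 100 and uses hypothetical thresholds to decide how much post-editing a segment gets. How the score itself is calculated depends on the TMS and is not shown here.

```python
def route_segment(quality_score, publish_at=90, light_edit_at=70):
    """Map a quality estimate (0-100) to a post-editing step.

    The thresholds are illustrative assumptions, not industry standards.
    """
    if not 0 <= quality_score <= 100:
        raise ValueError("quality_score must be between 0 and 100")
    if quality_score >= publish_at:
        return "no post-editing"
    if quality_score >= light_edit_at:
        return "light post-editing"
    return "full post-editing"


# Hypothetical scores for three machine-translated segments.
for score in (95, 78, 40):
    print(score, "->", route_segment(score))
```

Routing of this kind lets post-editors spend their time on the segments that are most likely to need it.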
Last updated on September 24, 2022.