7 Key Steps to Successful Machine Translation Implementation
Most large enterprises around the world have moved beyond the why of using machine translation (MT) for global growth. The rise of neural MT in particular has prompted companies to see MT as a viable productivity tool for entering new markets more quickly while keeping costs low.
Nevertheless, when it comes to integrating MT into their globalization strategies, many companies struggle to settle on an approach that fits their goals and business needs. Some of the questions topping their globalization agendas include:
- How best to integrate MT into an existing localization workflow?
- What type of content is most suitable for MT?
- Which machine translation provider to choose?
- Which language pairs to focus on when using MT?
- Where to look for and how to obtain data for training custom MT models?
- How good is the available training data, and how should it be cleaned for custom MT model training?
- How to evaluate the performance of custom MT models?
- In what ways can machine translation post-editing (MTPE) be effectively used and at what price?
If you’re looking for answers to these questions, too, you’ve come to the right place. Follow this step-by-step guide to learn how to implement machine translation with confidence and avoid big, costly mistakes in the long run.
Select the type of content you want to translate with machine translation
You need to be highly selective about the content you translate, as not all content is suitable for MT. The less creative or literary the content, the better. If you localize marketing content or culturally specific texts, MT may not be the best choice. That may change in the future, but for now, it’s better to be safe than sorry.
In general, you can follow these three key rules when deciding whether to use machine translation for a certain content type:
- Use raw machine translation for low-impact and unambiguous content (e.g., internal documentation, instruction manuals, chat/email support messages, website footers)
- Apply light or full post-editing to more sensitive content (e.g., product titles, product descriptions, knowledge bases, FAQs)
- Stick to human translation when branding and culture come into play (e.g., homepages, landing pages, newsletters, press releases, advertising banners, SEO content)
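As a rough illustration, the three rules above could be encoded as a simple lookup that a localization workflow consults before routing a job. This is only a sketch: the content-type labels, the `Workflow` enum, and the `choose_workflow` function are hypothetical names, and the mapping simply mirrors the examples listed above.

```python
from enum import Enum


class Workflow(Enum):
    RAW_MT = "raw machine translation"
    POST_EDITED_MT = "machine translation with light or full post-editing"
    HUMAN = "human translation"


# Illustrative mapping built from the examples in the three rules above.
# The labels are hypothetical; replace them with your own content taxonomy.
ROUTING = {
    "internal documentation": Workflow.RAW_MT,
    "instruction manual": Workflow.RAW_MT,
    "support message": Workflow.RAW_MT,
    "website footer": Workflow.RAW_MT,
    "product title": Workflow.POST_EDITED_MT,
    "product description": Workflow.POST_EDITED_MT,
    "knowledge base": Workflow.POST_EDITED_MT,
    "faq": Workflow.POST_EDITED_MT,
    "homepage": Workflow.HUMAN,
    "landing page": Workflow.HUMAN,
    "newsletter": Workflow.HUMAN,
    "press release": Workflow.HUMAN,
    "advertising banner": Workflow.HUMAN,
    "seo content": Workflow.HUMAN,
}


def choose_workflow(content_type: str) -> Workflow:
    """Return the workflow for a content type.

    Unknown content types fall back to human translation,
    which is the most cautious of the three options.
    """
    return ROUTING.get(content_type.strip().lower(), Workflow.HUMAN)
```

Defaulting unknown content types to human translation follows the "better safe than sorry" principle: a new content type only moves to MT once someone has consciously classified it.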
Check privacy policies
Carefully review your machine translation provider’s policy on the use and protection of personal data. You should know what happens to your data and how it’s stored. If you have any confidentiality constraints in place, make sure the technology you’ve chosen satisfies them.
For example, Google states that its Cloud Translation offering doesn’t use any content submitted for translation for any purpose other than providing the translation service. It’s less clear, however, how the company uses information submitted to the free version of Google Translate, or whether that data influences business decisions in any way.
Train your MT engine with your data—if possible
Whenever possible, train the MT engine of your choice with your own language data. To achieve good results, you would typically need about 100K segments. You have several options for sourcing training data: building your own corpora, relying on publicly accessible corpora, or buying corpora from others. In any case, the data you use for training should be relevant to your translation needs and always of high quality.
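To make "high quality" a little more concrete, here is a minimal sketch of a first cleaning pass over parallel segments before training. The function name `clean_segments` and the `max_length_ratio` threshold are assumptions made for illustration, not part of any particular MT toolkit. The character-length ratio is a crude heuristic that suits languages with similar scripts and would need adjusting for pairs such as English and Japanese.

```python
def clean_segments(pairs, max_length_ratio=3.0):
    """Filter (source, target) segment pairs with a few basic heuristics.

    Drops pairs that are empty, untranslated (source equals target),
    badly mismatched in length, or duplicates of an earlier pair.
    """
    seen = set()
    cleaned = []
    for source, target in pairs:
        # Normalize whitespace so trivial variants count as duplicates.
        source = " ".join(source.split())
        target = " ".join(target.split())

        if not source or not target:
            continue  # one side is missing
        if source == target:
            continue  # likely left untranslated

        longer = max(len(source), len(target))
        shorter = min(len(source), len(target))
        if longer / shorter > max_length_ratio:
            continue  # probably misaligned

        key = (source.lower(), target.lower())
        if key in seen:
            continue  # duplicate pair
        seen.add(key)
        cleaned.append((source, target))
    return cleaned
```

A pass like this only removes obvious noise. It does not judge whether the remaining segments are relevant to your domain or correctly translated, so a human spot check is still worthwhile.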
Put together a strong post-editing team
Thanks to continuous improvements in MT quality, post-editing machine translation output is now a strong alternative to translating from scratch. To make the most of it, you need a team of post-editors who have training or experience in MT post-editing, or who are at least open to the idea.
Post-editing is a slightly different ball game from translation and requires a specific set of skills. MT post-editors need to be able to decide quickly whether to keep, fix, or discard MT output. They also need to be detail-oriented, because MT output can be grammatically perfect and still mistranslate the source.
Don’t force your linguists to do something they’re not open to or trained to do. You don’t want to damage your relationship with them, nor do you want to end up with mediocre output.
Run samples before deployment
The larger your samples, the better. The first attempts may yield disappointing results that aren’t worth sending to your post-editors. To check, you can compare the MT output with a human translation of the same text and generate an edit-distance report. If the edit distance is low, go ahead. Otherwise, you may want to hold off and investigate what can be improved.
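An edit-distance comparison of this kind can be sketched in a few lines. The example below uses a word-level Levenshtein distance and divides the total number of edits by the number of words in the human translations. The function names are hypothetical, whitespace tokenization is an assumption that does not suit languages written without spaces, and commercial tools may calculate their reports differently.

```python
def word_edit_distance(mt_output, human_translation):
    """Word-level Levenshtein distance between two segments."""
    hyp = mt_output.split()
    ref = human_translation.split()
    # previous[j] holds the distance between the processed MT words
    # and the first j words of the human translation.
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        current = [i]
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(
                previous[j] + 1,         # delete a word from the MT output
                current[j - 1] + 1,      # insert a missing word
                previous[j - 1] + cost,  # keep or substitute a word
            ))
        previous = current
    return previous[-1]


def corpus_edit_rate(samples):
    """Total edits divided by total reference words over (mt, human) pairs."""
    total_edits = 0
    total_words = 0
    for mt_output, human_translation in samples:
        total_edits += word_edit_distance(mt_output, human_translation)
        total_words += len(human_translation.split())
    return total_edits / total_words if total_words else 0.0
```

A lower rate means the MT output is closer to the human translation. What counts as "good enough" depends on your content and language pair, so it helps to agree on a threshold with your post-editors before the test.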
Be ready for different results across language pairs, as some pairs perform better in machine translation than others. For example, Japanese and Finnish have a long way to go before they reach the same level of quality as Italian and Spanish.
Agree upfront on a pricing model
Be sure to set a pricing model from the start and involve all stakeholders in the decision to avoid unpleasant pay disputes. Take into account the type of content, the language pairs, the technology you use, and your sample test results. If you want to know upfront how much you may be spending, you can use historical performance data, which isn’t always fully reliable, or technologies that generate a machine translation quality estimation (MTQE). You’ll then need to compare the estimate with the actual post-editing effort to verify that it was accurate.
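One simple way to run that comparison is to record, for each project, the effort predicted upfront and the effort actually measured, then look at how far apart they are on average. The sketch below assumes both values are expressed on the same scale, for example as the fraction of words expected to change and the fraction that actually changed. The function names and that choice of scale are illustrative assumptions.

```python
def mean_absolute_error(projects):
    """Average gap between estimated and actual effort.

    `projects` is a list of (estimated_effort, actual_effort) pairs.
    """
    if not projects:
        raise ValueError("at least one project is required")
    return sum(abs(est - act) for est, act in projects) / len(projects)


def mean_bias(projects):
    """Average of (actual - estimated) effort.

    A positive value means the estimates were too optimistic on average.
    """
    if not projects:
        raise ValueError("at least one project is required")
    return sum(act - est for est, act in projects) / len(projects)
```

If the bias is consistently positive, post-editors are doing more work than the pricing model assumed, which is a good reason to revisit the rates before a dispute arises.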
Deploy and stay open to change
Once you have deployed, keep in mind that the results may not meet your expectations right away. Some projects will go well and others less so, but don’t be discouraged. Your machine translation engine can keep improving over time with further training and tuning. The more MT projects your team processes, the more advantages you’ll see in the long run, and your team may even wonder how they ever managed before implementing MT.
Last updated on September 21, 2022.