Quest for Quality: Exploring Machine Translation Customization
Businesses worldwide are now leaning on machine translation (MT) more than ever. In 2021, the MT market size exceeded $800M and is projected to grow at an annual rate of 30% until 2030. This is largely because MT has become more reliable, and the pressure for global brands to deliver customized content in multiple languages faster keeps growing—all while keeping costs in check.
Machine translation increasingly delivers fast and cost-effective translations, but maintaining translation quality remains a pressing issue.
To succeed in today’s fast-paced global market, companies must localize content at scale that aligns with their domain, captures the right tone, and keeps their brand voice consistent across languages and distribution channels.
That’s where machine translation customization comes into play. By adapting and training MT engines to produce output that better fits a company’s content, MT customization gives a strategic edge to companies aiming to connect with international audiences, drive engagement, and increase conversions across markets.
Keep reading to find out how you, too, can make MT customization work for your business.
What is MT customization?
Picture this: In the heart of a bustling city, a skilled tailor has made a name for himself by crafting suits that epitomize precision and artistry. Just as he tailors each suit to the wearer’s unique needs, MT customization refines translation engines for industry-specific accuracy.
This fusion of craftsmanship and technology ensures that translations seamlessly fit their context, much like a bespoke suit conveys style and confidence. Both endeavors exemplify how attention to detail transforms ordinary elements into exceptional outcomes.
MT customization is the process of creating, deploying, and maintaining a machine translation engine using data to generate high-quality translations in a specific language pair and domain. Also known as custom MT, it ensures that the final output aligns seamlessly with the unique requirements of a particular domain or industry.
The evolution of MT customization
To truly appreciate the power of MT customization, let’s rewind a bit. Not too long ago, the idea of building a personalized MT engine appeared rather distant. It was a resource-intensive endeavor that required substantial technical expertise. That’s why choices were limited: One could make a significant investment, possess technical know-how, or rely on a costly external partner.
However, just like all technology, MT customization evolved. As early as 2017, several MT providers began exploring ways to make customization more accessible. The aim was to empower language enthusiasts and developers with the ability to craft tailored MT solutions without breaking the bank.
In 2018, Google unveiled AutoML, a groundbreaking tool intended to democratize the MT customization process. Sundar Pichai, Google’s CEO, succinctly captured its essence:
“We hope AutoML will take an ability that a few PhDs have today and will make it possible in three to five years for hundreds of thousands of developers to design new neural nets for their particular needs.”
Today, the landscape is quite different. You have access to a variety of customizable MT engines, in addition to generic engines that offer varying degrees of customization.
What was once a cost-prohibitive endeavor has now transformed into an accessible resource for those seeking precision and excellence in their translations.
The value of MT customization
Custom machine translation doesn’t only bring the power of generic MT engines—it goes the extra mile. In a translation race where time is of the essence, custom MT keeps the pace, swiftly processing large volumes of text. It helps you avoid the time-consuming complexities of human translation and moves forward as a cost-effective solution, freeing up resources that you can invest to improve quality.
Quality is precisely what sets custom machine translation apart from generic MT engines.
The quality of the translation depends on the quality of the machine translation models used. Think of trained models in custom machine translation as language experts. Their keen understanding of linguistic nuances ensures superior-quality translations. This honed skill means fewer bumps on the translation road and minimal to no machine translation post-editing.
With higher efficiency and unprecedented quality and accuracy, global businesses can quickly roll out multilingual content and strategically allocate resources to improve the overall customer experience. This, in turn, results in improved brand perception, higher customer engagement, and increased conversions across markets—driving sustained growth for the business on the international stage.
Who can benefit from MT customization?
As MT customization has become more accessible, it’s now a resource that a wider range of users can take advantage of.
On one hand, any organization that has a sufficient amount of translation data suitable for training can tap into this transformation. Recent advances in MT customization have reduced the required volume of data quite significantly. A sufficiently large translation memory (TM) is all you need to start amplifying your linguistic capabilities.
On the other hand, organizations that navigate large volumes of content within specific domains stand to gain significantly.
How specific industries benefit from machine translation customization
|Industry|How MT customization helps|
|---|---|
|Ecommerce and online retail|Custom MT engines can translate product descriptions and user reviews, enhancing the overall shopping experience.|
|Travel and hospitality|Property listings and user reviews can be rendered with a personal touch.|
|SaaS (software as a service)|Software companies can have user documentation, help content, and manuals tailored to their specific industry jargon and terminology.|
|Automotive|Car makers can use MT customization for various materials, including customer comments, dealer feedback, manuals, and production protocols. In the example of BMW, the projected business value reaches several million.|
|Finance and fintech|MT customization helps to accurately translate industry-specific vocabulary, incorporate risk-related terminology, and align with the preferred tone of each client for compliance documentation, regulations, and financial reports.|
|Pharmaceutical|Customized systems help handle the medical jargon in prescriptions, patents, clinical trials, test results, and marketing material with greater accuracy and fluency.|
What are the types of MT customization?
There are 2 primary forms of MT customization: light and full. Your choice between light and full MT customization depends on the nature of your translation project and the desired level of accuracy.
It’s similar to selecting your attire for a trip: A light outfit suits a family weekend, while a full suit is ideal for a business trip. The more you move from general to industry-specific content, the greater the customization required.
Light MT customization
Light MT customization entails tweaking engine-specific features to fine-tune translations. Think of it as adjusting the dials on a radio to get the best sound quality. This includes, among other things:
- Glossary adaptation
- “Do-not-translate” lists
- Translation memory adaptation
- Stylistic control
For example, DeepL’s formality feature showcases light customization.
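Engines expose these features in different ways, so the following is only a generic Python sketch of the idea behind a “do-not-translate” list, not any vendor’s API. It assumes you mask protected terms with placeholders before sending text to an engine and restore them afterwards. The function names and the `__DNT0__` placeholder format are made up for illustration, and whether a given engine leaves such placeholders untouched is something you would need to verify.

```python
import re

def protect_terms(text, dnt_terms):
    """Replace each do-not-translate term with a numbered placeholder."""
    mapping = {}
    # Longest terms first, so "Phrase Custom AI" is masked before "Phrase".
    for i, term in enumerate(sorted(dnt_terms, key=len, reverse=True)):
        placeholder = f"__DNT{i}__"
        pattern = re.compile(r"\b" + re.escape(term) + r"\b")
        if pattern.search(text):
            text = pattern.sub(placeholder, text)
            mapping[placeholder] = term
    return text, mapping

def restore_terms(text, mapping):
    """Put the original terms back after translation."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text

masked, mapping = protect_terms(
    "Open Phrase Custom AI in Phrase.", ["Phrase", "Phrase Custom AI"]
)
print(masked)                          # Open __DNT0__ in __DNT1__.
print(restore_terms(masked, mapping))  # Open Phrase Custom AI in Phrase.
```

Where an engine offers a native glossary or do-not-translate feature, that is usually the better choice, since the engine can then take the protected terms into account when building the rest of the sentence.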
Full MT customization
Full MT customization takes the process a step further. It involves training an MT engine using meticulously curated datasets to generate translations that precisely capture jargon, terminology, style, and tone of voice.
Essentially, full MT customization results in a translation engine that speaks your language—both figuratively and literally.
How to prepare your data for MT customization?
A short while back, organizations needed to feed millions of segments to train an MT engine. However, those days are gone—the process now needs considerably fewer segments.
What holds the key to training an MT engine is bilingual data. The greater the volume and variety of quality bilingual data, the better equipped the engine becomes in generating high-quality translations in the long run.
Key types of data used for MT customization
There are 2 pillars of data that underpin MT customization: translation memories and corpora.
Translation memories (TMs) stand as the bedrock of linguistic evolution. They have become accessible and familiar to most organizations operating in the translation and localization industry.
Just a few years ago, TMs were mainly regarded as repositories of human-revised translations. However, they are now invaluable in shaping the trajectory of MT engines, guiding them to replicate content with remarkable accuracy.
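Translation memories are commonly exchanged as TMX files, so a first practical step is often to pull the segment pairs out of an export. Here is a minimal sketch using only the Python standard library. It assumes a simple TMX layout with one `seg` per `tuv` and exact language codes; the function name and sample content are illustrative.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def read_tmx_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) pairs for one language pair from TMX text."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Older TMX files may use a plain "lang" attribute instead.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline elements.
                segs[lang] = "".join(seg.itertext()).strip()
        # Exact match only: "en-us" would not match "en" in this sketch.
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs

SAMPLE = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Add to cart</seg></tuv>
      <tuv xml:lang="de"><seg>In den Warenkorb</seg></tuv>
    </tu>
  </body>
</tmx>"""

print(read_tmx_pairs(SAMPLE, "en", "de"))
```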
Corpora are large, structured collections of texts in multiple languages. These texts are carefully curated datasets acquired from external sources and selected to serve as training data for MT models.
By supplementing the TM data, corpora work with great efficacy, improving both efficiency and precision—particularly in specific language pairs and specialized domains.
Embracing corpora enriches the localization journey, fostering a well-rounded approach that harnesses the inherent strengths of both internal and external linguistic resources.
Best practices for machine translation data cleaning
TMs and corpora are the fundamental building blocks of data for MT customization. To give your custom engine a solid foundation, you first need to clean this training data carefully.
Various techniques can help you refine and enhance data quality, optimizing the engine’s performance:
- Filter segments by age
- Align source and target segments
- Check segment length
- Remove non-translatables
- Remove duplicates
- Check the language
- Handle inline tags
Previously, data cleaning relied on extensive (and expensive) manual review, but a large part of data preparation can now be automated. These strategies all work in synergy to refine and clean the data, ultimately enhancing the effectiveness of the training process. Let’s take a look at each of them.
Filter segments by age
For certain types of documents, filtering TM segments by age is a fundamental data cleaning technique, because training works best when the age of the segments suits the content the engine will translate.
The golden rule is to maintain the right balance between timeliness and relevance for ensuring accurate training. Utilizing segments that are either too outdated or overly current can backfire, especially when dealing with inherited or legacy translation memories whose quality, origin, attributes, and historical usage are not controlled.
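As a rough illustration, an age filter can be as simple as a date window. The sketch below assumes each segment is a dictionary with a `changed` date, which is a made-up structure for this example. The five-year default is an arbitrary placeholder, not a recommendation, and the right window depends on your content.

```python
from datetime import date, timedelta

def filter_by_age(segments, today, max_age_days=5 * 365, min_age_days=0):
    """Keep segments whose last-change date falls inside the age window."""
    oldest_allowed = today - timedelta(days=max_age_days)
    newest_allowed = today - timedelta(days=min_age_days)
    return [s for s in segments if oldest_allowed <= s["changed"] <= newest_allowed]

segments = [
    {"source": "Legacy wording", "changed": date(2012, 1, 1)},
    {"source": "Current wording", "changed": date(2021, 6, 1)},
    {"source": "Unreviewed draft", "changed": date(2023, 9, 20)},
]

# Drop anything older than about five years or newer than 30 days.
kept = filter_by_age(segments, today=date(2023, 9, 21), min_age_days=30)
print([s["source"] for s in kept])  # ['Current wording']
```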
Align source and target segments
Timeliness goes hand in hand with accuracy—this is where the alignment of source and target segments comes into play. It’s imperative to meticulously validate that segment pairs intended for training accurately convey the same meaning. This alignment safeguards against any discrepancies or inconsistencies that might negatively impact the performance of the MT engine.
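Checking that two segments truly mean the same thing takes human review or a dedicated model, but cheap heuristics can catch obvious misalignments. One such heuristic, sketched below under that caveat, compares the numbers on both sides. It is only a proxy: it ignores number words such as “two” and treats thousands and decimal separators alike, so treat a mismatch as a flag for review rather than proof of an error.

```python
import re

# A run of digits, optionally grouped with "." or "," (e.g. 1,500 or 1.500).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def numbers_match(source, target):
    """True when both segments contain the same numbers, ignoring separators."""
    def normalize(text):
        return sorted(re.sub(r"[.,]", "", n) for n in NUMBER.findall(text))
    return normalize(source) == normalize(target)

print(numbers_match("Save 1,500 files", "1.500 Dateien speichern"))  # True
print(numbers_match("Version 2.0", "Version 3.0"))                   # False
```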
Check segment length
Segment length is also crucial in data refinement. Pairs of segments that are excessively lengthy or unusually short can hinder the quality of MT. Checking length can also be necessary for purely technical reasons, as some customizable MT engines impose segment length restrictions.
To address this, you can apply techniques like implementing a minimum character count, establishing guidelines for sentence pair length, and maintaining a balanced length ratio.
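These three checks can be sketched in a few lines of Python. The thresholds below are illustrative assumptions, not values taken from any particular engine, so check your provider’s documented limits before settling on them.

```python
def passes_length_checks(source, target, min_chars=3, max_chars=1000, max_ratio=3.0):
    """Apply a minimum length, a maximum length, and a length-ratio check."""
    src_len, tgt_len = len(source.strip()), len(target.strip())
    shorter, longer = min(src_len, tgt_len), max(src_len, tgt_len)
    # Require at least one character so the ratio below is always defined.
    if shorter < max(min_chars, 1):
        return False
    if longer > max_chars:
        return False
    # A very lopsided pair often signals a truncated or misaligned segment.
    return longer / shorter <= max_ratio

print(passes_length_checks("Add to cart", "In den Warenkorb"))  # True
print(passes_length_checks("OK", "OK"))                         # False
print(passes_length_checks("Hello", "x" * 100))                 # False
```

Character counts are a crude measure across languages with very different writing systems, so the ratio threshold in particular may need tuning per language pair.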
Next up is removing non-translatable elements. Some words or phrases lack direct translations between languages, while others, such as names and addresses, do not require translation at all. It’s advisable to eliminate them from the data to prevent confusion and inaccuracies in the translation process.
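A simple way to spot candidates is to look for pairs with no letters at all, or with identical text on both sides. The sketch below assumes exactly those two rules, which are a simplification: some identical pairs are legitimate translations, so you may prefer to review flagged pairs instead of deleting them outright.

```python
def is_non_translatable(source, target):
    """Flag pairs that carry no translatable text."""
    src, tgt = source.strip(), target.strip()
    # No letters at all: numbers, punctuation, or codes such as "12345".
    if not any(ch.isalpha() for ch in src):
        return True
    # Identical on both sides: likely a name, address, or product code.
    return src == tgt

print(is_non_translatable("12345", "12345"))                   # True
print(is_non_translatable("Acme GmbH", "Acme GmbH"))           # True
print(is_non_translatable("Add to cart", "In den Warenkorb"))  # False
```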
Preventing data redundancy is just as important. Eliminating repeated or nearly identical segment pairs helps maintain data integrity, preventing undue influence on MT output.
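Exact duplicates, and near duplicates that differ only in case or spacing, can be removed with a normalized key. This is a minimal sketch that keeps the first occurrence of each pair. Catching looser near duplicates, such as pairs that differ by one word, would need fuzzy matching, which is not shown here.

```python
import re

def normalize(text):
    """Collapse whitespace and ignore case for comparison purposes."""
    return re.sub(r"\s+", " ", text).strip().casefold()

def remove_duplicates(pairs):
    """Keep the first occurrence of each normalized (source, target) pair."""
    seen = set()
    unique = []
    for source, target in pairs:
        key = (normalize(source), normalize(target))
        if key not in seen:
            seen.add(key)
            unique.append((source, target))
    return unique

pairs = [
    ("Add to cart", "In den Warenkorb"),
    ("add  to cart ", "In den Warenkorb"),
    ("Checkout", "Zur Kasse"),
]
print(remove_duplicates(pairs))
```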
Language checks matter as well. Sometimes translation memories used for customization can contain segment pairs with the wrong language pair. Making sure that all segments align with the desired language is vital for maintaining consistent and accurate customization.
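Reliable language identification usually calls for a dedicated library or model. As a dependency-free illustration, the sketch below checks only the dominant writing script of each segment, using Python’s standard `unicodedata` module. It can catch a Latin-script segment in a Cyrillic-script target, for instance, but it cannot tell apart languages that share a script, such as French and Spanish.

```python
import unicodedata

def dominant_script(text):
    """Return the most common script among the letters in text, or None."""
    counts = {}
    for ch in text:
        if ch.isalpha():
            # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER A".
            script = unicodedata.name(ch, "UNKNOWN").split(" ")[0]
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else None

def has_expected_scripts(source, target, src_script, tgt_script):
    """True when both segments are written in the expected scripts."""
    return dominant_script(source) == src_script and dominant_script(target) == tgt_script

print(dominant_script("Добавить в корзину"))  # CYRILLIC
# An untranslated target copied from the source is flagged:
print(has_expected_scripts("Add to cart", "Add to cart", "LATIN", "CYRILLIC"))  # False
```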
The existence of inline tags within translation memories calls for attention. These tags, which often denote variables or special formatting, might not be supported consistently across different MT engines. That’s why, in certain instances, it’s worth excluding them from the training data to prevent potential inconsistencies in translation outcomes.
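Both options, dropping tagged segments or stripping the tags, can be sketched with a regular expression. The pattern below assumes tags look like XML-style elements (`<b>`, `</b>`, `<ph id="1"/>`) or numbered placeholders (`{1}`). Real translation memories use many other tag formats, so adjust the pattern to your data.

```python
import re

# XML-style tags such as <b>, </b>, <ph id="1"/>, or placeholders such as {1}.
TAG = re.compile(r"</?[A-Za-z][^<>]*>|\{\d+\}")

def has_inline_tags(text):
    """True when the segment contains at least one inline tag."""
    return bool(TAG.search(text))

def strip_inline_tags(text):
    """Remove inline tags and tidy up any leftover double spaces."""
    return re.sub(r"\s+", " ", TAG.sub("", text)).strip()

print(has_inline_tags("Price < 5"))  # False
print(strip_inline_tags("Click <b>Save</b> to keep {1} items."))
# Click Save to keep items.
```

Note that stripping a placeholder can leave a sentence that no longer reads naturally, as in the example above, which is one reason to drop such pairs entirely in some cases.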
A brief look at the training of custom MT models
Training a custom MT model looks different from one provider to the next. Below is an overview of the most popular MT engines with customization support:
- Amazon Active Custom Translation offers an agile platform driven by user input showcasing human-machine collaboration.
- Globalese Custom NMT blends neural networks with advanced post-editing, ensuring meticulous adaptation.
- Google AutoML Translation refines models through iterative learning.
- IBM Custom NMT emerges as an exemplar of AI-powered precision.
- Microsoft Custom Translator’s adaptive learning captures context intricacies.
- RWS Language Weaver focuses on domain specificity, ensuring robust comprehension.
- SDL PNMT and Systran PNMT present cutting-edge neural models for intricate language pairs.
- Tilde stands as a seasoned player integrating linguistic expertise.
- Yandex Translate Custom fosters fine-tuned translations.
- Phrase NextMT is the first neural machine translation engine developed with a translation management system in mind, providing Phrase customers with a greater degree of customization, automation, integration, and superior reporting. Now, thanks to the Phrase Custom AI platform, it supports full customization.
What it takes to train a custom MT model
Training a custom machine translation model usually involves multiple steps, roles, and timeframes. With Microsoft Custom Translator, Google AutoML Translation, and Amazon Active Custom Translation, people with technical expertise play crucial roles and invest approximately:
- 10+ minutes for account creation
- 30+ minutes for the initial setup
- 30+ hours for parallel data preparation
- 30+ minutes for billing
- 6+ hours of training
Alternatively, with Phrase Custom AI, the custom model training process becomes more streamlined and user-friendly. It’s now possible to significantly reduce the time, expertise, and resources required to train your own custom machine translation engine.
Thanks to Phrase Custom AI, a process that previously took weeks can now be achieved in a matter of hours. Phrase Custom AI uses AI-powered data filtering, automated evaluation, and an intuitive interface to make engine customization available for everyone.
Charting MT model evaluation and fine-tuning
The journey in machine translation doesn’t stop at training a model—it’s just the beginning. The success of machine translation models depends on a careful process of evaluation and fine-tuning. You can assess the quality of machine translation models by using automated metrics, post-editing metrics, as well as through human evaluation.
Machine translation evaluation methods
|Method|What it offers|
|---|---|
|Automated metrics|BLEU, COMET, TER, chrF3, and METEOR provide quantifiable insights into translation fidelity.|
|Human evaluation|Involves standardized questionnaires to capture nuances only comprehensible to humans.|
|Post-editing metrics|TER, editing time, edit distance, thinking time, and more offer concrete measures of translation accuracy and efficiency.|
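To make edit distance concrete, here is a minimal Python sketch that counts the word-level insertions, deletions, and substitutions needed to turn raw MT output into its post-edited version, then divides by the length of the post-edited text. This is a simplified stand-in and not the full TER metric, which also counts block shifts. The function names are illustrative.

```python
def word_edit_distance(hypothesis, reference):
    """Word-level Levenshtein distance between two sentences."""
    hyp, ref = hypothesis.split(), reference.split()
    # prev[j] holds the distance between the hypothesis words seen so far
    # and the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]

def edit_rate(mt_output, post_edited):
    """Edits per word of the post-edited text (lower means less editing)."""
    ref_len = len(post_edited.split())
    if ref_len == 0:
        raise ValueError("post-edited text must not be empty")
    return word_edit_distance(mt_output, post_edited) / ref_len

# One missing word out of six in the post-edited sentence.
print(word_edit_distance("the cat sat on mat", "the cat sat on the mat"))  # 1
```

Tracking a figure like this across retraining rounds gives a rough sense of whether post-editors are changing less of the engine’s output over time.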
Your destination is right ahead: MT customization
The journey of MT customization doesn’t come to a stop after a single evaluation—it continues with ongoing and regular assessment. This continuous expedition empowers MT engines to adapt seamlessly to the constantly changing linguistic landscape.
Similar to how explorers update maps before embarking on new journeys, custom MT engines undergo periodic retraining using updated data. This process hones their abilities and boosts their performance, resulting in translations that embody practical excellence—customized to the company’s context and language—signifying a significant return on investment.
Last updated on September 21, 2023.