Localization Best Practices: How to Avoid the 10 Most Common Pitfalls
You’re ready to release your software and show it to the world: The code seems bug-free, and the design is crisp—but does it work in languages other than English?
If your answer isn’t affirmative, you might find yourself reworking the whole app to support other languages because the code wasn’t written in a way that allows your software product to be adapted for international markets.
Localization is about more than translating words from one language into another: it's about cultural awareness and adapting your software to the preferences, habits, and expectations of your target users.
If you don’t want to spend months fixing localization bugs, make sure you consider these 10 common pitfalls preventing apps from being properly localized for global markets—and what we suggest doing instead.
Embedding text directly into the code of your software
Embedding text directly into the code can slow down the localization process tremendously, as the translator needs to actually read the code to determine which segments need translation and which ones don’t. It also makes localization more costly than necessary, and the consistency of the translation will be difficult—if not impossible—to maintain.
Files containing hard-coded localizable content are also difficult to version control and maintain, so make sure to keep all your text in external files.
Use separate resource files
Translatable strings include titles, product names, error messages, and any other text that users might see when using your app/software. You should get all of these user-facing strings out of your code and place them into resource files, giving each string a unique name (think of it as an identifier or a key).
These resource files will be loaded by a library that uses a combination of language and country (also known as the “locale”) to identify the right string.
Once you’ve placed your strings in external resource files, you can send these files to your translation vendor and get back translated files for each locale that your application is going to support.
Be careful when choosing key IDs for your strings: The IDs should always describe the string’s role in the interface (title, button label, etc.). You should also make sure that you aren’t duplicating an existing ID when adding new strings.
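One way to catch duplicate IDs before they reach your translators is a small check over the resource file. The sketch below assumes a gettext-style file with one msgid line per entry; the function name find_duplicate_ids and the sample content are made up for illustration and are not part of any library.

```python
import re
from collections import Counter

# Matches lines such as: msgid "button_order"
MSGID_PATTERN = re.compile(r'^msgid\s+"(.*)"\s*$')


def find_duplicate_ids(po_text):
    """Return a sorted list of string IDs that occur more than once."""
    ids = []
    for line in po_text.splitlines():
        match = MSGID_PATTERN.match(line.strip())
        # Skip the empty msgid that gettext uses for the file header
        if match and match.group(1):
            ids.append(match.group(1))
    counts = Counter(ids)
    return sorted(key for key, count in counts.items() if count > 1)


sample = '''
msgid "button_order"
msgstr "Order Now"

msgid "login_message"
msgstr "Welcome back!"

msgid "button_order"
msgstr "Buy"
'''

print(find_duplicate_ids(sample))  # ['button_order']
```

A check like this could run as part of your build, so a duplicate ID is flagged as soon as it is added.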
There are various file formats that make suitable resource files. Popular choices are JSON, XML, gettext, or YAML. Depending on the programming language or framework you are using, there will usually be a de facto standard format.
In Python, the GNU gettext system is quite a popular choice. A .po resource file containing the translatable strings is created for each locale and compiled into a binary .mo file, which is what gettext loads at runtime:
# ./locales/en_US/LC_MESSAGES/messages.po
msgid "button_order"
msgstr "Order Now"

msgid "login_message"
msgstr "Welcome back!"

# ./locales/de_DE/LC_MESSAGES/messages.po
msgid "button_order"
msgstr "Jetzt bestellen"

msgid "login_message"
msgstr "Willkommen zurück!"
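The translation object is then used to get the appropriate string. Calling install() makes its lookup function available everywhere as _():
import gettext

# Loads the compiled catalog ./locales/de_DE/LC_MESSAGES/messages.mo
de_DE = gettext.translation('messages', localedir='locales', languages=['de_DE'])
de_DE.install()  # installs _() into Python's builtins

print(_("login_message"))  # Willkommen zurück!
print(_("button_order"))  # Jetzt bestellen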
Not accounting for varying language lengths
Don’t assume every language is as concise as English. English text is often very compact in comparison to other languages—like German or Finnish—and translations can vary considerably in length and density.
If you don’t prepare for this and there isn’t enough space, your strings might overlap with other controls and the interface will require editing after translation.
Design for +50% and give strings room to grow and shrink
The size of the interface must be adjustable to accommodate the length of translations provided at runtime.
You can solve this problem by leaving extra space after each label for the string to grow. However, by doing so, the labels and controls might appear pretty far apart from each other in compact languages. Some developers give their labels room to grow and shrink by aligning them to the right or by placing them above the controls.
You can also use layout managers that understand how locale affects a UI and manage the pixel positioning of widgets for you at runtime, so your interface will adjust properly.
Another way to solve this issue is by storing the dimensions for a label in the locale resource file.
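To find out which strings are likely to outgrow their space, you can compare each translation against its source text. The following is a minimal sketch, assuming your strings are available as plain dictionaries; find_overlong and the 1.5 ratio (the "+50%" rule of thumb) are illustrative choices, and character count is only a rough proxy for rendered width.

```python
def find_overlong(source, translation, max_ratio=1.5):
    """Return keys whose translation exceeds max_ratio times the source length."""
    overlong = []
    for key, source_text in source.items():
        translated = translation.get(key)
        if translated is None:
            continue  # missing translations are a separate problem
        if len(translated) > max_ratio * len(source_text):
            overlong.append(key)
    return overlong


en = {"button_order": "Order Now", "settings": "Settings"}
de = {"button_order": "Jetzt bestellen", "settings": "Einstellungen"}

print(find_overlong(en, de))  # ['button_order', 'settings']
```

Strings flagged this way are good candidates for a closer look in the actual interface.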
Specifying a language but not a country
Sometimes a language differs depending on the country in which it's spoken because different regions may speak and spell a shared language with nuanced differences (e.g., British English differs from American English). Specifying a language but not a country code can make localization difficult.
Always use a full locale
Be as precise as possible, and always use a full locale instead of just a language. Locales contain both the language and the code of the country where it's spoken, such as fr-FR (French in France) or en-GB (English in the United Kingdom). This allows your app to support alternate spellings, date formats, and other differences between two countries with a shared language.
# ./locales/en_US/LC_MESSAGES/messages.po
msgid "login_message"
msgstr "Hi there!"

# ./locales/en_AU/LC_MESSAGES/messages.po
msgid "login_message"
msgstr "G'Day Mate!"
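Working with full locales also lets you fall back sensibly when a user's exact locale isn't available. Here is a minimal sketch of one possible fallback order (exact match, then same language, then a default); resolve_locale and the list of available locales are hypothetical, and your i18n library may already provide its own negotiation logic.

```python
def resolve_locale(requested, available, default="en_US"):
    """Pick the best available locale for a requested one.

    Order: exact match, then another locale with the same language,
    then the default.
    """
    normalized = requested.replace("-", "_")
    if normalized in available:
        return normalized
    language = normalized.split("_")[0]
    for candidate in available:
        if candidate.split("_")[0] == language:
            return candidate
    return default


available = ["en_US", "en_AU", "de_DE"]
print(resolve_locale("en-AU", available))  # en_AU
print(resolve_locale("de_AT", available))  # de_DE
print(resolve_locale("fr_FR", available))  # en_US
```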
Concatenating strings
Some developers love to create concatenated pieces of sentences using placeholders, where the order of words and phrases is hard-coded.
Splitting sentences into several keys presumes grammar rules and a certain sentence structure. If you use conditional statements and conditionalize single terms or a portion of a sentence, the granularity of conditional text might cause confusion during the translation process.
In this (intentionally bad) example, the structure is fixed and the sentence is broken up into tiny strings:
msgid "welcome_back_msg_start"
msgstr "Hey "

msgid "welcome_back_msg_end"
msgstr ", welcome back!"

print(_('welcome_back_msg_start') + username + _('welcome_back_msg_end'))  # Hey John, welcome back!
These word puzzles are very hard and sometimes almost impossible to translate, and will give translators a bitter hatred for your shenanigans, as they may only see parts of the sentence while translating and have to guess what belongs together.
Nobody likes guessing games!
Don’t assume grammar structures and be careful with granularity in conditional text
The structure of the sentence will often be completely different in another language. Therefore, it's best to create strings that are complete sentences.
Translators must be able to control the structure of a sentence, change the order freely, and insert all kinds of prefixes, suffixes, and any other grammar elements.
If a string contains a placeholder, always explain what each placeholder means and allow the translator to change the word order if necessary. Sometimes you are safer setting a condition at the sentence level.
Considering the above, here is a better example. The translator can freely move the placeholder and fully control the structure of the sentence:
msgid "welcome_back_msg"
msgstr "Hey %(username)s, welcome back!"

print(_('welcome_back_msg') % {'username': 'John'})  # Hey John, welcome back!
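To see why named placeholders matter, consider what happens when a translation needs a different word order. The sketch below uses plain dictionaries in place of real .po files; the CATALOGS structure, the translate helper, and the German wording are assumptions made for illustration.

```python
# Hypothetical catalogs; in a real project these strings live in .po files.
CATALOGS = {
    "en_US": {"welcome_back_msg": "Hey %(username)s, welcome back!"},
    "de_DE": {"welcome_back_msg": "Willkommen zurück, %(username)s!"},
}


def translate(locale, key, **params):
    """Look up a complete sentence and fill in its named placeholders."""
    template = CATALOGS[locale][key]
    return template % params


print(translate("en_US", "welcome_back_msg", username="John"))
# Hey John, welcome back!
print(translate("de_DE", "welcome_back_msg", username="John"))
# Willkommen zurück, John!
```

The calling code is identical for both locales; only the translated sentence decides where the name ends up.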
Not supporting Unicode
Translations will break whenever you use the wrong character encoding or your source code handles strings with a data type that cannot represent Unicode. Many tools read and write files using the system’s default encoding.
However, when your server is configured for English and all of your users are browsing in Chinese, their characters will get corrupted.
Always use UTF-8
Therefore, another of our localization best practices is to make sure you use UTF-8. It's almost always the best choice as it fixes this issue by standardizing the encodings across browsers and servers.
So, ideally, every layer in your stack should use UTF-8: the HTML, the HTTP server, the database, and the application itself. UTF-16 is worth considering only when you work primarily with Asian languages, where it can store text more compactly.
Specify the charset in the <head> of your HTML document:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
Verify your HTTP server is sending the correct HTTP Content-Type header:
Content-Type: text/html; charset=utf-8
Use UTF-8 in your database:
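# MySQL: utf8mb4 is the full UTF-8 character set; the older "utf8" stores at most three bytes per character
CREATE DATABASE dbname CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;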
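The application layer deserves the same care. The short sketch below shows what an encoding mismatch looks like in Python and how stating the encoding explicitly avoids it; the file name greeting.txt is just a placeholder.

```python
import os
import tempfile

text = "Willkommen zurück!"

# Decoding UTF-8 bytes with the wrong charset corrupts non-ASCII characters
garbled = text.encode("utf-8").decode("latin-1")
print(garbled)  # Willkommen zurÃ¼ck!

# Passing encoding="utf-8" explicitly avoids depending on the system default
path = os.path.join(tempfile.mkdtemp(), "greeting.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(text)
with open(path, encoding="utf-8") as handle:
    restored = handle.read()

print(restored == text)  # True
```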
Hard-coding numbers, units, dates, and times
Hard-coded date, time, or currency formats will cause trouble during the translation process, as languages and countries differ in date and time formats. 26.04.2015 or 04/26/2015? 14:00 or 2 p.m.? 1,000 miles or 1,609 kilometers?
Use a library to support different locales
Never hard-code numbers, units, dates, or times on the assumption that they don’t need localization. Go for localizable strings instead, and let translators decide what's best for their language.
You can store all dates and times in a standard ISO format and use a library to format them for the given locale. It will also help to convert time to different time zones.
The same applies to currencies and other number formats. So, always use a library with localized files for each of the locales your software needs to support.
Here’s an example using Python’s Babel library (the exact output can vary with the Babel and CLDR version):
from babel.dates import format_datetime
from babel.numbers import format_currency

# With no datetime argument, format_datetime() formats the current time
print(format_datetime(locale='ru_RU'))  # e.g. 26 июля 2013 г., 15:48:18
print(format_currency(10.50, 'EUR', locale='de_DE'))  # 10,50 €
print(format_currency(10.50, 'USD', locale='en_AU'))  # US$10.50
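Storing timestamps in ISO format and converting them for display can be done with Python's standard library alone. This is a minimal sketch; the fixed +02:00 offset is an assumption that stands in for the user's real time zone, and the locale-aware formatting itself would still be handled by a library such as Babel.

```python
from datetime import datetime, timedelta, timezone

# Store timestamps in UTC as ISO 8601 strings
stored = datetime(2015, 4, 26, 14, 0, tzinfo=timezone.utc).isoformat()
print(stored)  # 2015-04-26T14:00:00+00:00

# Parse the stored value and convert it for display.
# A fixed +02:00 offset stands in for a real time zone here (an assumption
# to keep the sketch dependency-free); production code would use named zones.
parsed = datetime.fromisoformat(stored)
local = parsed.astimezone(timezone(timedelta(hours=2)))
print(local.isoformat())  # 2015-04-26T16:00:00+02:00
```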
Not considering vertical writing and right-to-left languages
Arabic, Hebrew, and some other languages are written from right to left, and East Asian languages that use Chinese characters (or traditional Mongolian script, if you feel adventurous) have a long history of vertical writing.
Prepare for a complex text flow
Don’t assume that the same rules apply to all languages. Expect to implement specialized versions for complex text flows such as vertical writing, and plan for languages that read from right to left.
When it comes to vertical writing, strings are, for example, not rotated by 90 degrees. Instead, single characters are placed under one another.
You can include a direction string in the resource strings and use that string to load a different stylesheet based on the current locale. There’s also a direction property in CSS.
Here's an example:
h1 { direction: rtl; }
<h1> Read me from right-to-left. </h1>
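On the application side, the direction string from the resource file can drive which stylesheet gets loaded. The following is a rough sketch; the RESOURCES dictionary, the text_direction key, and the stylesheet names are all hypothetical.

```python
# Hypothetical per-locale resources; "text_direction" is the direction string
RESOURCES = {
    "en_US": {"text_direction": "ltr"},
    "ar_SA": {"text_direction": "rtl"},
    "he_IL": {"text_direction": "rtl"},
}


def stylesheet_for(locale):
    """Return the stylesheet file name for the locale's text direction."""
    direction = RESOURCES.get(locale, {}).get("text_direction", "ltr")
    return f"style-{direction}.css"


print(stylesheet_for("ar_SA"))  # style-rtl.css
print(stylesheet_for("en_US"))  # style-ltr.css
```

Because the direction lives in the resource file, adding another right-to-left locale requires no code changes.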
Creating ambiguity due to lack of context
When strings include variables, are used in a specific context, or have ambiguous wording, your translation vendor will likely have a hard time deciphering them. Translators usually work on files and strings in a context-free format. So, how will a translator know whether the single term “Contact” is a verb for a button or a noun for a label?
Provide localization notes and use code comments
Remember to add comments and notes to your localizable files.
Aside from glossaries and style guides, you can provide context information to translators directly in your source files. The more context you give—by writing notes for translators and providing alternate phrasings—the better.
If you’re working with content in text-based files that support comments (XML, HTML, .po, and so on), make sure to use them. JSON has no comment syntax, so consider adding a description field next to each string instead. If you handle your translations in a spreadsheet, you can easily add a column for context notes. For an even better understanding, provide screenshots.
Remember that context is king when it comes to software translation and localization—the more context, the better!
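Context can also be part of the lookup itself, so that the same source text gets different translations depending on where it appears. The sketch below imitates that idea with a plain dictionary keyed by context and text; CATALOG_DE, translate_with_context, and the German strings are illustrative assumptions. Python's gettext module offers a comparable mechanism through pgettext (available since Python 3.8).

```python
# Hypothetical catalog keyed by (context, source text), similar in spirit
# to gettext's msgctxt. The German strings are illustrative.
CATALOG_DE = {
    ("button", "Contact"): "Kontaktieren",  # verb: action on a button
    ("label", "Contact"): "Kontakt",        # noun: heading above contact details
}


def translate_with_context(context, text, catalog):
    """Return the translation for text in the given context, or text itself."""
    return catalog.get((context, text), text)


print(translate_with_context("button", "Contact", CATALOG_DE))  # Kontaktieren
print(translate_with_context("label", "Contact", CATALOG_DE))   # Kontakt
```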
Using images that contain text
Images are a great way to save localization costs as they cut down the word count for translation and may even make your product easier to understand—not to mention they are visually more appealing to the reader.
However, sometimes images that contain text can be a serious pain for translators and can slow down and otherwise hinder the translation process. In some cases, it could even result in you paying more money.
Separate text from graphics
If a text needs to be associated with a graphic, try to separate your text from the image and create the text as a separate component.
If the text is separable, managing localized versions becomes a lot simpler.
Ideally, images should not contain text at all, because that eliminates the need to translate them. Pay attention to cross-cultural differences too, as not all images and symbols carry the same meaning across borders.
Not worrying about localization until it’s too late
Small mistakes can prevent your software from working in other languages. Errors in source content can be replicated, or worse, amplified in various language versions, and this can result in months of work fixing localization bugs.
Don’t let this happen to you!
Test localizability early and often
You can save yourself a lot of trouble in the long run when you start testing for localization early and often.
As a developer, you can write automated tests that check the translation files and the character encoding of each localized version of your software.
Always test your patches not just for code errors, but also check strings for grammar errors, capitalization inconsistencies, and localizability issues.
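As a starting point for such automated tests, you could verify that every source string has a translation and that the placeholders survived translation intact. This is only a sketch under simple assumptions: strings are held in dictionaries, placeholders use the %(name)s style, and check_translation is a made-up helper rather than an existing tool.

```python
import re

# Matches named placeholders such as %(username)s
PLACEHOLDER = re.compile(r"%\((\w+)\)[sd]")


def check_translation(source, translation):
    """Return a list of human-readable problems found in a translation."""
    problems = []
    for key, source_text in source.items():
        if key not in translation:
            problems.append(f"{key}: missing translation")
            continue
        expected = set(PLACEHOLDER.findall(source_text))
        actual = set(PLACEHOLDER.findall(translation[key]))
        if expected != actual:
            problems.append(f"{key}: placeholder mismatch")
    return problems


en = {
    "welcome_back_msg": "Hey %(username)s, welcome back!",
    "button_order": "Order Now",
}
de = {"welcome_back_msg": "Willkommen zurück, %(user)s!"}

for problem in check_translation(en, de):
    print(problem)
# welcome_back_msg: placeholder mismatch
# button_order: missing translation
```

Running a check like this on every commit catches broken translations long before a release.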
Having localization in mind when creating the original software eases the localization process a lot. If you avoid these 10 common pitfalls and diligently follow proven localization best practices, your software should be fully localizable and ready for the global market.