You are ready to release the new app: the code is bug-free and the new design is crisp. But does it work in languages other than English?
If your answer is “no”, you might find yourself reworking the whole app to support other languages, because you did not care about this when writing your code.
Internationalization, or i18n if you want to be cool, is the process of developing a piece of software or an app so that it can easily be translated and localized into other languages – and it’s much easier if you do it right from the beginning.
If you don’t want to spend months fixing software localization bugs, you should consider these 10 common pitfalls that prevent applications from being properly translated and localized. This article covers each of these issues, explaining what to avoid and providing some key software localization best practices.
1. Embedding Text Directly in the Code
Embedding text directly in the code slows down the software localization process tremendously, as the translator has to read the code to determine which segments need translation and which do not. It also makes localization more costly than necessary, and the consistency of the translation becomes difficult, if not impossible, to maintain.
Use Separate Resource Files
You should get all user-visible strings out of your code and place them into resource files, giving each string a unique name (think of it as an identifier or a key) and specifying different translation values for that string. These strings include titles, product names, error messages, and any other text that users might see when using your app or software.
These resource files will be loaded by a library that uses a combination of language and country (also known as the “Locale”) to identify the right string.
Once you’ve placed your strings in external resource files, you can send these files to your translation vendor and get back translated files for each locale that your application is going to support.
Be careful when choosing key IDs for your strings. The IDs should always describe the string’s role in the interface (title, button label, etc.). Also make sure that you are not duplicating an existing ID when adding new strings.
There are various file formats that make suitable resource files. Popular choices are JSON, XML, gettext or YAML.
Depending on the programming language or framework you are using, there will usually be a de-facto standard format.
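The advice about unique IDs can be checked automatically when a resource file is loaded. The following is a minimal sketch, assuming a flat JSON resource file; the function name load_strings and the IDs shown are hypothetical and not taken from any particular framework.

```python
import json


def load_strings(text):
    """Parse a JSON resource file, rejecting duplicate string IDs."""

    def reject_duplicates(pairs):
        # json passes every key/value pair of an object, duplicates included.
        seen = {}
        for key, value in pairs:
            if key in seen:
                raise ValueError(f"duplicate string ID: {key}")
            seen[key] = value
        return seen

    return json.loads(text, object_pairs_hook=reject_duplicates)


# Hypothetical resource content; each ID describes the string's role.
resource = '{"checkout.order_button": "Order Now", "home.welcome_title": "Welcome back!"}'
strings = load_strings(resource)
print(strings["checkout.order_button"])
```

Without the hook, the standard JSON parser silently keeps the last value for a repeated key, so a duplicated ID could go unnoticed.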
In Python, the GNU gettext system is quite a popular choice. A .po resource file containing the translatable strings is created for each locale:
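# locales/en_US/LC_MESSAGES/messages.po (the msgid keys here are illustrative)
msgid "order_now_button"
msgstr "Order Now"
msgid "welcome_back_title"
msgstr "Welcome back!"
# locales/de_DE/LC_MESSAGES/messages.po
msgid "order_now_button"
msgstr "Jetzt bestellen"
msgid "welcome_back_title"
msgstr "Willkommen zurück!"
And the gettext function is used to get the appropriate translation:
import gettext
de_DE = gettext.translation('messages', localedir='locales', languages=['de_DE'])
print(de_DE.gettext('welcome_back_title'))  # Willkommen zurück!
print(de_DE.gettext('order_now_button'))  # Jetzt bestellen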
2. Pixel-based Layouts / UI Requires Editing after Translation
Don’t assume every language is as concise as English.
English text is often very compact in comparison to other languages – like German or Finnish, for example – and translations can vary considerably in length and density.
If you don’t prepare for this and there isn’t enough space, your strings might overlap other controls and the interface will require editing after translation.
Design for +50% and Give Strings Room to Grow and Shrink
The interface must be able to adjust size to accommodate the length of translations provided at runtime.
You can solve this problem by leaving extra space after each label for the string to grow. However, by doing so, the labels and controls might appear pretty far apart from each other in compact languages. Some developers give their labels room to grow and shrink by aligning them to the right or by placing them above the controls.
You can also use layout managers that understand how locale affects a UI and manage the pixel positioning of widgets for you at runtime, so your interface will adjust properly.
Another way to solve this issue is by storing the dimensions for a label in the locale resource file.
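The +50% rule of thumb can also be turned into a rough automated check. This is only a sketch under stated assumptions: it compares character counts, which merely approximate rendered pixel width, and the function name overflowing_strings and the sample IDs are hypothetical.

```python
def overflowing_strings(source, translated, growth=0.5):
    """Return IDs whose translation exceeds the source length plus the allowed growth."""
    too_long = []
    for key, text in source.items():
        budget = len(text) * (1 + growth)
        if len(translated.get(key, "")) > budget:
            too_long.append(key)
    return too_long


english = {"order_button": "Order Now", "welcome_title": "Welcome back!"}
german = {"order_button": "Jetzt bestellen", "welcome_title": "Willkommen zurück!"}

print(overflowing_strings(english, german))  # ['order_button']
```

Strings flagged this way are candidates for a closer look in the actual interface, since short labels tend to grow proportionally more than long sentences.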
3. Specifying a Language but Not a Country
Sometimes a language differs depending on the country in which it’s spoken because different regions may speak and spell a shared language with nuanced differences (en-GB and en-US). Specifying a language, but not a country code can make localization difficult.
Always Use a Full Locale
Be as precise as possible, and always use a full locale instead of just a language. A full locale, such as fr-FR or en-GB, contains both the language and the code of the country where it’s spoken. This lets you support alternate spellings, date formats and other differences between two countries with a shared language.
# en_US (locale labels are assumed for illustration)
msgstr "Hi there!"
# en_AU
msgstr "G'Day Mate!"
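When a string is missing for a full locale, applications commonly fall back from the country-specific resource to the plain language and then to a default. The sketch below illustrates that idea; the function name fallback_chain and the en_US default are assumptions made for this example.

```python
def fallback_chain(locale_tag, default="en_US"):
    """Return the locales to try, most specific first."""
    normalized = locale_tag.replace("-", "_")
    language, _, country = normalized.partition("_")
    chain = []
    if country:
        chain.append(f"{language.lower()}_{country.upper()}")
    chain.append(language.lower())
    if default not in chain:
        chain.append(default)
    return chain


print(fallback_chain("en-AU"))  # ['en_AU', 'en', 'en_US']
print(fallback_chain("fr"))     # ['fr', 'en_US']
```

Starting from the full locale keeps country-specific wording such as the Australian greeting, while the shorter entries cover strings that do not differ between countries.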
4. Concatenated Strings
Some developers love to create concatenated pieces of sentences using placeholders, where the order of words and phrases is hard-coded. Splitting sentences into several keys presumes grammar rules and a certain sentence structure. If you use conditional statements and conditionalize single terms or a portion of a sentence, the granularity of conditional text might cause confusion during the translation process.
In this (intentionally bad) example, the structure is fixed and the sentence is broken up into tiny strings:
msgid "welcome_back_msg_start"
msgstr "Hey "
msgid "welcome_back_msg_end"
msgstr ", welcome back!"
print(gettext('welcome_back_msg_start') + username + gettext('welcome_back_msg_end'))
# Hey John, welcome back!
These word puzzles are very hard and sometimes almost impossible to translate, and will give translators a bitter hatred for your shenanigans, as they may only see parts of the sentence while translating and have to guess what belongs together.
Nobody likes guessing games.
Don’t Assume Grammar Structures and Be Careful with Granularity of Conditional Text
The structure of the sentence will often be completely different in another language.
Therefore, it’s best to create strings that are complete sentences.
Translators must be able to control the structure of a sentence, change the order freely and insert all kinds of prefixes, suffixes and any other grammar elements.
If a string contains a placeholder, always explain what each placeholder means and allow the translator to change the word order if necessary. Sometimes you are safer setting a condition at the sentence level.
Considering the above, here is a better example. The translator can freely move the placeholder and fully control the structure of the sentence:
msgid "welcome_back_msg"
msgstr "Hey %(username)s, welcome back!"
print(gettext('welcome_back_msg') % {'username': username})
# Hey John, welcome back!
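To see why whole sentences with named placeholders help, consider a translation that moves the placeholder to a different position. This is a minimal sketch using plain dictionaries instead of real resource files; the CATALOGS structure, the translate helper and the key name are hypothetical.

```python
# Hypothetical catalogs keyed by string ID; each value is a complete sentence.
CATALOGS = {
    "en_US": {"welcome_back_msg": "Hey %(username)s, welcome back!"},
    "de_DE": {"welcome_back_msg": "Willkommen zurück, %(username)s!"},
}


def translate(locale, key, **params):
    """Look up a whole-sentence string and fill in its named placeholders."""
    return CATALOGS[locale][key] % params


print(translate("en_US", "welcome_back_msg", username="John"))
# Hey John, welcome back!
print(translate("de_DE", "welcome_back_msg", username="John"))
# Willkommen zurück, John!
```

The calling code is identical for both locales; only the translator decided where the name goes.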
5. Corrupted Characters / Lack of Unicode Support
Whenever you use the wrong character encoding, or your source code handles strings using a datatype that cannot handle Unicode, translations will break. Programming languages often store files using the system’s default encoding.
So when your server defaults to an encoding suited to English and your users are browsing in Chinese, their characters will get corrupted.
Always Use UTF-8
Therefore, another of our software localization best practices is to make sure you use UTF-8. It’s almost always the best choice, as it fixes this issue by standardizing the encoding across browsers and servers. Ideally, every layer in your stack should use UTF-8: HTML, HTTP server, database and the application itself. Only when you’re working primarily with Asian languages might you consider UTF-16, which can store such text more compactly.
Specify the charset in the <head> of your HTML document:
<meta http-equiv="content-type" content="text/html; charset=utf-8">
Verify your HTTP server is sending the correct HTTP Content-Type header:
Content-Type: text/html; charset=utf-8
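Use UTF-8 in your database (in MySQL, use utf8mb4, because the older utf8 character set stores at most three bytes per character and cannot hold every Unicode character):
CREATE DATABASE dbname CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;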
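The same care applies inside the application itself. The sketch below, using only the Python standard library, shows how text gets corrupted when it is written as UTF-8 but read with another charset, how naming the encoding explicitly avoids that, and how the byte sizes of UTF-8 and UTF-16 compare for Chinese characters. The file name is arbitrary.

```python
import os
import tempfile

text = "Willkommen zurück!"

# Bytes written as UTF-8 but decoded with another charset come out corrupted.
garbled = text.encode("utf-8").decode("latin-1")
print(garbled)  # Willkommen zurÃ¼ck!

# Naming the encoding explicitly avoids depending on the platform default.
path = os.path.join(tempfile.mkdtemp(), "messages.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(text)
with open(path, encoding="utf-8") as handle:
    restored = handle.read()
print(restored == text)  # True

# For Chinese characters, UTF-16 needs fewer bytes than UTF-8.
cjk = "中文"
print(len(cjk.encode("utf-8")), len(cjk.encode("utf-16-le")))  # 6 4
```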
6. Hard-coded Numbers, Units, Dates and Times
Software localization is about more than the mere translation of words – it’s about adopting the complete culture.
Hard-coded date, time or currency formats will cause trouble during the translation process, as languages and countries differ in date and time formats. 26.04.2015 or 04.26.2015? 14:00 or 2 p.m.? 1,000 miles or 1,609 kilometers?
Use a Library to Support Different Locales
Never hard-code numbers, units, dates, and times on the assumption that they don’t need localization.
Go for localizable strings instead and let translators decide what’s best for their language.
You can store all dates and times in a standard ISO format and use a library to format them for the given locale. It will also help to convert time to different time zones.
The same applies to currencies and other number formats. So, always use a library with localized files for each of the locales your software needs to support.
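Example using Python’s Babel library:
from datetime import datetime
from babel.dates import format_datetime
from babel.numbers import format_currency
print(format_datetime(datetime(2013, 7, 26, 15, 48, 18), locale='ru_RU'))
# 26 июля 2013 г., 15:48:18 (month abbreviation may vary with the Babel/CLDR version)
print(format_currency(10.50, 'EUR', locale='de_DE'))
# 10,50 €
print(format_currency(10.50, 'USD', locale='en_AU'))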
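Storing timestamps in ISO format and converting them only at display time can be sketched with the standard library alone. This is a simplified illustration: the fixed +02:00 offset stands in for a real time zone, whereas a production application would more likely use named zones from a time zone database so that daylight saving changes are handled.

```python
from datetime import datetime, timedelta, timezone

# Stored value: ISO 8601 in UTC, independent of any locale.
stored = "2013-07-26T15:48:18+00:00"
moment = datetime.fromisoformat(stored)

# Assumption for this sketch: a fixed offset instead of a named time zone.
berlin_summer = timezone(timedelta(hours=2), "CEST")
local = moment.astimezone(berlin_summer)

print(local.isoformat())  # 2013-07-26T17:48:18+02:00
```

The converted value can then be handed to a locale-aware formatter, so that storage, time zone conversion and presentation stay separate concerns.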
7. Not Caring About Vertical Writing and Languages That Read Right to Left
Arabic, Hebrew and some other languages are written from right to left, and East Asian languages that use Chinese characters (or traditional Mongolian, if you feel adventurous) have a long history of vertical writing.
Prepare for a Complex Text Flow
Don’t assume that the same rules apply to all languages. Expect to implement specialised versions for complex text flow such as vertical writing, and plan for languages that read from right to left.
When it comes to vertical writing, strings are, for example, not rotated by 90 degrees. Instead, single characters are placed under one another.
You can include a direction string in the resourced strings and use that string to load a different stylesheet based on the current locale. There’s also a direction property in CSS.
<p style="direction: rtl;">Read me from right-to-left.</p>
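Loading a stylesheet based on a direction string in the resources could look roughly like the following. The RESOURCES layout, the "direction" key and the stylesheet file names are all hypothetical and only serve to illustrate the idea.

```python
# Hypothetical resources: each locale carries a "direction" string
# next to its translated texts.
RESOURCES = {
    "en_US": {"direction": "ltr"},
    "ar_EG": {"direction": "rtl"},
    "he_IL": {"direction": "rtl"},
}


def stylesheet_for(locale):
    """Pick a stylesheet name from the locale's direction string."""
    direction = RESOURCES[locale].get("direction", "ltr")
    return f"layout-{direction}.css"


print(stylesheet_for("ar_EG"))  # layout-rtl.css
print(stylesheet_for("en_US"))  # layout-ltr.css
```

Keeping the direction in the resource file means the decision travels with the translation instead of being hard-coded per language in the application.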
8. Confusion / Ambiguities Due to Lack of Context
When strings include variables, are used in a specific context or wording is ambiguous, your translation vendor will have a hard time. Translators usually work on files and strings in a context-free format. So, how will a translator know whether the single term “Contact” is a verb for a button or a noun for a label?
Provide Localization Notes And Use Code Comments
Remember to add comments and notes to the localizable files.
Aside from glossaries and style guides, you can provide context information to translators directly in your source files. The more context you give – by writing notes for translators and providing alternate phrasings – the better.
If you’re working with content in text-based code files (XML, HTML, JSON and so on) make sure to use code comments. If you handle your translations in a spreadsheet you can easily add a column for context notes. For an even better understanding, provide screenshots.
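In gettext .po files, context can be attached to an entry with a msgctxt line and a developer note in a "#." comment. The sketch below generates such entries for the ambiguous term "Contact"; the helper po_entry is hypothetical and, for brevity, does not escape quotes inside the strings.

```python
def po_entry(msgid, msgstr, context=None, note=None):
    """Build a .po entry with an optional translator note and message context."""
    lines = []
    if note:
        lines.append(f"#. {note}")
    if context:
        lines.append(f'msgctxt "{context}"')
    lines.append(f'msgid "{msgid}"')
    lines.append(f'msgstr "{msgstr}"')
    return "\n".join(lines)


print(po_entry("Contact", "Kontaktieren", context="button",
               note="Verb: opens the contact form"))
print()
print(po_entry("Contact", "Kontakt", context="label",
               note="Noun: heading above a person's details"))
```

With the context in place, the same source term can receive two different translations, and the translator no longer has to guess which one is meant.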
Remember that context is king when it comes to software translation and localization – the more context, the better!
9. Images Containing Text
Images are a great way to save localization costs as they cut down the word count for translation and may even make your product easier to understand – not to mention they are visually more appealing to the reader.
However, sometimes images that contain text can be a serious pain for translators and can slow down and otherwise hinder the translation process. In some cases it could even result in you paying more money.
Separate Text from Graphics
If a text needs to be associated with a graphic, try to separate your text from the image and create the text as a separate component.
If the text is separable, managing localized versions becomes a lot simpler.
Ideally, images should not contain text at all, because that eliminates the need to translate them. Also pay attention to cross-cultural differences, as not all images and symbols carry the same meaning across borders.
10. Not Worrying About Localization Until It’s Too Late
Small mistakes can prevent your software from working in other languages. Mistakes in source content can be replicated or, worse, amplified in the various language versions, and can cost you months of fixing localization bugs.
Don’t let this happen to you!
Test Localizability Early and Often
You can save yourself a lot of trouble in the long run when you start testing for localization early and often.
As a developer, you can write automated tests that run against test translation files and check the character encoding of the localized version of your software.
Don’t test your patches just for code errors; also check strings for grammar errors, capitalization, inconsistencies and localizability issues.
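One such automated test might compare each translation file against the source strings. The following sketch assumes resources are already loaded as dictionaries and that placeholders use the %(name)s style; the function name localization_problems and the sample data are made up for illustration.

```python
import re

# Matches named placeholders such as %(username)s or %(count)d.
PLACEHOLDER = re.compile(r"%\((\w+)\)[sd]")


def localization_problems(source, translated):
    """List missing translations and placeholder mismatches."""
    problems = []
    for key, text in source.items():
        if key not in translated:
            problems.append(f"{key}: missing translation")
            continue
        expected = set(PLACEHOLDER.findall(text))
        actual = set(PLACEHOLDER.findall(translated[key]))
        if expected != actual:
            problems.append(f"{key}: placeholders differ")
    return problems


english = {
    "welcome_back_msg": "Hey %(username)s, welcome back!",
    "order_button": "Order Now",
}
german = {"welcome_back_msg": "Willkommen zurück, %(name)s!"}

print(localization_problems(english, german))
# ['welcome_back_msg: placeholders differ', 'order_button: missing translation']
```

Run as part of the regular test suite, a check like this catches a renamed placeholder or a forgotten string before it reaches users.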
Keeping localization in mind when creating the original software or app makes the localization process much easier. If you avoid these 10 common pitfalls and follow the best practices detailed in this article, your application should be fully localizable and ready for the international market.