What character encoding should you use for localized software?

Always use UTF-8. It supports virtually all writing systems and prevents character corruption when your server and your users' browsers default to different encodings. Apply UTF-8 consistently at every layer: HTML, HTTP headers, your database, and the application itself. UTF-16 is rarely needed; it can be more compact for text that is predominantly East Asian, but UTF-8 remains the safer default.

How do you handle right-to-left (RTL) languages like Arabic and Hebrew?

RTL languages such as Arabic and Hebrew require bidirectional text support and mirrored layouts. Don't assume left-to-right flow. Use CSS direction and unicode-bidi properties, load locale-specific stylesheets, and test layouts thoroughly. East Asian languages may also require vertical writing mode support.

How do you give translators enough context to translate software strings accurately?

Context is critical in software localization. A string like Contact could be a button label (verb) or a section header (noun) — translators need to know which. Add code comments and localization notes directly in your resource files, provide screenshots of the UI, and include a style guide and glossary.

Should images in software contain text?

Avoid embedding text inside images wherever possible. Text baked into graphics is difficult and expensive to translate, and slows down the localization process. If text must be associated with an image, render it as a separate HTML/UI element so it can be translated independently. Also check that images are culturally appropriate for each target market.

When should you start thinking about localization in the software development process?

As early as possible. Localization issues discovered late in development — or after launch — are expensive to fix, and source content errors get replicated across every language version. Build internationalization (i18n) in from the start, use automated tests for encoding and string files, and run localization QA on every release, not just at the end.


10 Common Software Localization Mistakes

Matt Owen

Senior Content Manager at Phrase

Last updated on July 24, 2026.

Localization is about more than translating words from one language into another. It’s about cultural awareness and adapting your software to the preferences, habits, and expectations of your target users.

Below are 10 of the most common localization mistakes we have seen (and helped to fix).

1. Embedding text directly into the code of your software

Embedding text directly into the code can slow down the localization process tremendously, as the translator needs to actually read the code to determine which segments need translation and which ones don’t. It also makes localization more costly than necessary, and the consistency of the translation will be difficult—if not impossible—to maintain.

Files containing hard-coded localizable content are also difficult to version control and maintain, so make sure to keep all your text in external files.

Use separate resource files

Translatable strings include titles, product names, error messages, and any other text that users might see when using your app/software. You should get all of these user-facing strings out of your code and place them into resource files, giving each string a unique name (think of it as an identifier or a key).

These resource files will be loaded by a library that uses a combination of language and country (also known as the “locale”) to identify the right string.

Once you’ve placed your strings in external resource files, you can send these files to your translation vendor and get back translated files for each locale that your application is going to support.

Be careful when choosing key IDs for your strings: The IDs should always describe the string’s role in the interface (title, button label, etc.). You should also make sure that you aren’t duplicating an existing ID when adding new strings.

Various file formats make suitable resource files. Popular choices are JSON, XML, gettext, and YAML. Depending on the programming language or framework you are using, there will usually be a de facto standard format.

In Python, the GNU gettext system is quite a popular choice. A .po resource file containing the translatable strings is created for each locale:

# ./locales/en_US/LC_MESSAGES/messages.po
msgid "button_order"
msgstr "Order Now"
msgid "login_message"
msgstr "Welcome back!"

# ./locales/de_DE/LC_MESSAGES/messages.po
msgid "button_order"
msgstr "Jetzt bestellen"
msgid "login_message"
msgstr "Willkommen zurück!"

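The translation is then loaded with gettext, and the installed _() function returns the appropriate string:

import gettext

# gettext loads compiled .mo catalogs, so compile each messages.po to messages.mo first (e.g., with msgfmt)
de_DE = gettext.translation('messages', localedir='locales', languages=['de_DE'])
# install() makes the translation function available globally as _()
de_DE.install()
print(_("login_message"))
# Willkommen zurück!
print(_("button_order"))
# Jetzt bestellen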

2. Not accounting for varying language lengths

Don’t assume every language is as concise as English. English text is often very compact in comparison to other languages—like German or Finnish—and translations can vary considerably in length and density.

If you don’t prepare for this and there isn’t enough space, your strings might overlap with other controls and the interface will require editing after translation.

Design for +50% and give strings room to grow and shrink

The size of the interface must be adjustable to accommodate the length of translations provided at runtime.

You can solve this problem by leaving extra space after each label for the string to grow. However, by doing so, the labels and controls might appear pretty far apart from each other in compact languages. Some developers give their labels room to grow and shrink by aligning them to the right or by placing them above the controls.

You can also use layout managers that understand how locale affects a UI and manage the pixel positioning of widgets for you at runtime, so your interface will adjust properly.
Another way to solve this issue is by storing the dimensions for a label in the locale resource file.
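
One way to check whether a layout survives the +50% guideline is pseudo-localization: pad each source string before any real translation exists, then look for clipped or overlapping labels. The sketch below is a minimal illustration and not a feature of any particular library; the function name `pseudo_localize`, the bracket markers, and the tilde padding are assumptions made up for this example.

```python
import math


def pseudo_localize(text: str, expansion: float = 0.5) -> str:
    """Pad a UI string to simulate a longer translation.

    Hypothetical helper: the brackets make truncation easy to spot,
    and the tilde padding stands in for the extra translated text.
    """
    padding = "~" * math.ceil(len(text) * expansion)
    return f"[{text}{padding}]"


print(pseudo_localize("Order Now"))
# [Order Now~~~~~]
```

If the closing bracket is cut off in the rendered interface, the control is likely too narrow for longer languages.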

3. Specifying a language but not a country

Sometimes a language differs depending on the country in which it’s spoken because different regions may speak and spell a shared language with nuanced differences (e.g., British English differs from American English). Specifying a language but not a country code can make localization difficult.

Always use a full locale

Be as precise as possible, and always use a full locale property instead of just a language. A locale combines a language code with a country code, such as fr-FR (French in France) or en-GB (English in the United Kingdom). This allows your app to support alternate spellings, date formats, and other differences between two countries with a shared language.

# ./locales/en_US/LC_MESSAGES/messages.po
msgid "login_message"
msgstr "Hi there!"

# ./locales/en_AU/LC_MESSAGES/messages.po
msgid "login_message"
msgstr "G'Day Mate!"
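
Using full locales does not mean every string has to be duplicated for every country. A common pattern is to look in the region-specific catalog first and fall back to the bare language. The sketch below illustrates that idea with in-memory dictionaries; the `CATALOGS` data and the helper names `fallback_chain` and `lookup` are hypothetical, and a real project would rely on its localization library for this.

```python
# Hypothetical in-memory catalogs; a real project would load these from resource files.
CATALOGS = {
    "en": {"login_message": "Welcome back!", "button_order": "Order Now"},
    "en_AU": {"login_message": "G'Day Mate!"},
}


def fallback_chain(locale: str) -> list[str]:
    """Return the locale followed by its bare language, e.g. en_AU -> [en_AU, en]."""
    normalized = locale.replace("-", "_")
    language = normalized.split("_")[0]
    return [normalized] if normalized == language else [normalized, language]


def lookup(key: str, locale: str) -> str:
    """Return the most specific translation available, or the key itself."""
    for candidate in fallback_chain(locale):
        if key in CATALOGS.get(candidate, {}):
            return CATALOGS[candidate][key]
    return key


print(lookup("login_message", "en_AU"))  # G'Day Mate!
print(lookup("button_order", "en-AU"))   # Order Now
```

Only the strings that actually differ in Australia need an en_AU entry; everything else comes from the shared English catalog.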

4. Concatenating strings

Some developers love to create concatenated pieces of sentences using placeholders, where the order of words and phrases is hard-coded.

Splitting sentences into several keys presumes specific grammar rules and a fixed sentence structure. If you use conditional statements to swap single terms or parts of a sentence, that fine granularity can cause confusion during the translation process.

In this (intentionally bad) example, the structure is fixed and the sentence is broken up into tiny strings:

msgid "welcome_back_msg_start"
msgstr "Hey "
msgid "welcome_back_msg_end"
msgstr ", welcome back!"

username = "John"
# _() is the translation function installed by gettext's install()
print(_('welcome_back_msg_start') + username + _('welcome_back_msg_end'))
# Hey John, welcome back!

These word puzzles are very hard and sometimes almost impossible to translate, and will give translators a bitter hatred for your shenanigans, as they may only see parts of the sentence while translating and have to guess what belongs together.

Nobody likes guessing games!

Don’t assume grammar structures and be careful with granularity in conditional text

The structure of the sentence will often be completely different in another language. Therefore, it’s best to create strings that are complete sentences.

Translators must be able to control the structure of a sentence, change the order freely, and insert all kinds of prefixes, suffixes, and any other grammar elements.

If a string contains a placeholder, always explain what each placeholder means and allow the translator to change the word order if necessary. Sometimes you are safer setting a condition at the sentence level.

Considering the above, here is a better example. The translator can freely move the placeholder and fully control the structure of the sentence:

msgid "welcome_back_msg"
msgstr "Hey %(username)s, welcome back!"

# Fetch the full sentence, then fill in the named placeholder
print(_('welcome_back_msg') % {'username': 'John'})
# Hey John, welcome back!
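
To see why named placeholders matter, the sketch below formats the same two values with templates whose word order differs. The templates are stand-ins for translated msgstr values, and the German wording is only illustrative.

```python
# Stand-ins for translated msgstr values; the wording is illustrative only.
templates = {
    "en_US": "%(username)s has %(count)d new messages.",
    "de_DE": "%(count)d neue Nachrichten für %(username)s.",
}

values = {"username": "John", "count": 3}

for locale, template in templates.items():
    # The calling code is identical; only the template decides the word order.
    print(locale, "->", template % values)
# en_US -> John has 3 new messages.
# de_DE -> 3 neue Nachrichten für John.
```

Because the placeholders are named rather than positional, the translator can move them anywhere in the sentence without any change to the code.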

5. Not supporting Unicode

If you use the wrong character encoding, or your source code handles strings with a data type that cannot represent Unicode, translations will break. Many tools and languages read and write files using the system’s default encoding.

If your server defaults to a Western encoding and your users are browsing in Chinese, characters will get corrupted.

Always use UTF-8

Therefore, another of our localization best practices is to use UTF-8. It’s almost always the best choice because it standardizes the encoding across browsers and servers.
Ideally, every layer in your stack should use UTF-8: HTML, HTTP server, database, and the application itself. UTF-16 is worth considering only in special cases, for example when text is predominantly East Asian and storage size matters.

Specify the charset in the <head> of your HTML document:

<meta http-equiv="content-type" content="text/html; charset=utf-8">

Verify your HTTP server is sending the correct HTTP Content-Type header:

Content-Type: text/html; charset=utf-8

Use UTF-8 in your database:

# MySQL: utf8mb4 is full UTF-8; the older utf8 alias stores at most 3 bytes per character
CREATE DATABASE dbname CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
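
The corruption described above is easy to reproduce. The following sketch encodes a string as UTF-8 and then decodes the same bytes with a different charset, which is roughly what happens when two layers of a stack disagree about encoding. Latin-1 is just an example of a mismatched charset, chosen for illustration.

```python
text = "Willkommen zurück!"

# Encode once as UTF-8, the encoding recommended above.
raw = text.encode("utf-8")

# Decoding with the wrong charset does not fail; it silently corrupts the text.
garbled = raw.decode("latin-1")
print(garbled)  # Willkommen zurÃ¼ck!

# Decoding with the matching charset restores the original string.
assert raw.decode("utf-8") == text
```

In application code, passing `encoding="utf-8"` explicitly to `open()` avoids depending on the system default.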

6. Hard-coding numbers, units, dates, and times

Hard-coded date, time, or currency formats will cause trouble during the translation process, as languages and countries differ in how they write them. 26.04.2015 or 04/26/2015? 14:00 or 2 p.m.? 1,000 miles or 1,609 kilometers?

Use a library to support different locales

Never hard-code numbers, units, dates, and times on the assumption that they don’t need localization. Use localizable strings or locale-aware formatting instead, so that each language gets the conventions its users expect.

You can store all dates and times in a standard ISO format and use a library to format them for the given locale. It will also help to convert time to different time zones.

The same applies to currencies and other number formats. So, always use a library with localized files for each of the locales your software needs to support.

Here’s an example using Python’s Babel library:

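from datetime import datetime
from babel.dates import format_datetime
from babel.numbers import format_currency
# Pass an explicit datetime; without one, format_datetime formats the current time
print(format_datetime(datetime(2013, 7, 26, 15, 48, 18), locale='ru_RU'))
# 26 июля 2013 г., 15:48:18 (exact wording may vary with the Babel/CLDR version)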
print(format_currency(10.50, 'EUR', locale='de_DE'))
# 10,50 €
print(format_currency(10.50, 'USD', locale='en_AU'))
# US$10.50
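
Storing timestamps in ISO format and converting them for display can be sketched with the standard library alone. The example below assumes timestamps are stored as ISO 8601 strings in UTC; the fixed UTC+2 offset is a simplification to keep the sketch self-contained, and real code would look up a named time zone instead.

```python
from datetime import datetime, timedelta, timezone

# Store timestamps as ISO 8601 strings in UTC.
stored = "2015-04-26T14:00:00+00:00"

# Parse the stored value into a timezone-aware datetime.
moment = datetime.fromisoformat(stored)

# Convert for display; a fixed UTC+2 offset keeps this sketch free of any
# time zone database. Real code would look up a named zone instead.
utc_plus_2 = timezone(timedelta(hours=2))
local = moment.astimezone(utc_plus_2)

print(local.isoformat())  # 2015-04-26T16:00:00+02:00
```

The stored value never changes; only the presentation is adapted, which can then be passed to a locale-aware formatter.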

7. Not considering vertical writing and right-to-left languages

Arabic, Hebrew, and some other languages are written from right to left, and East Asian languages that use Chinese characters (or traditional Mongolian, if you feel adventurous) have a long history of vertical writing.

Prepare for a complex text flow

Don’t assume that the same rules apply to all languages. Expect to implement specialized versions for complex text flow, such as vertical writing, and plan for languages that read right to left.

In vertical writing, strings are not simply rotated by 90 degrees. Instead, the individual characters are placed one below another.

You can include a direction string in your resource strings and use it to load a different stylesheet based on the current locale. There’s also a direction property in CSS.
Here’s an example:

h1 {
    direction: rtl;
}

<h1>
  Read me from right-to-left.
</h1>
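
Choosing a stylesheet based on the locale's text direction could look like the following sketch. The set of right-to-left language codes is a small hand-maintained assumption, and the stylesheet file names are hypothetical.

```python
# Assumption: a small hand-maintained set of right-to-left language codes.
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}


def text_direction(locale: str) -> str:
    """Return 'rtl' or 'ltr' based on the language part of the locale."""
    language = locale.replace("-", "_").split("_")[0].lower()
    return "rtl" if language in RTL_LANGUAGES else "ltr"


def stylesheet_for(locale: str) -> str:
    """Pick a stylesheet name; the file names are hypothetical."""
    return f"styles.{text_direction(locale)}.css"


print(stylesheet_for("ar_EG"))  # styles.rtl.css
print(stylesheet_for("de_DE"))  # styles.ltr.css
```

Storing the direction alongside the translated strings, as the text suggests, would achieve the same result without a hard-coded language list.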

8. Creating ambiguity due to lack of context

When strings include variables, are used in a specific context, or have ambiguous wording, your translation vendor will likely have a hard time deciphering them. Translators usually work on files and strings in a context-free format. So, how will a translator know whether the single term “Contact” is a verb for a button or a noun for a label?

Provide localization notes and use code comments

Remember to add comments and notes to your localizable files.
Aside from glossaries and style guides, you can provide context information to translators directly in your source files. The more context you give—by writing notes for translators and providing alternate phrasings—the better.

If you’re working with content in text-based code files (XML, HTML, JSON and so on), make sure to use code comments. If you handle your translations in a spreadsheet, you can easily add a column for context notes. For an even better understanding, provide screenshots.

Remember that context is king when it comes to software translation and localization—the more context, the better!
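
Context can also be made part of the lookup itself. gettext supports this through msgctxt entries, and Python's gettext module exposes it as `pgettext` (Python 3.8 and later). The sketch below only mimics that idea with a plain dictionary so it runs without compiled catalogs; the `GERMAN` catalog, the `translate` helper, and the German wording are illustrative assumptions.

```python
# Hypothetical catalog keyed by (context, source text), mirroring how
# gettext pairs msgctxt with msgid.
GERMAN = {
    ("button", "Contact"): "Kontaktieren",
    ("heading", "Contact"): "Kontakt",
}


def translate(context: str, text: str, catalog: dict) -> str:
    """Look up text within a context, falling back to the source text."""
    return catalog.get((context, text), text)


print(translate("button", "Contact", GERMAN))   # Kontaktieren
print(translate("heading", "Contact", GERMAN))  # Kontakt
```

With the context attached to the key, the same English word can receive two different translations, and the translator sees which role each one plays.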

9. Using images that contain text

Images are a great way to save localization costs as they cut down the word count for translation and may even make your product easier to understand—not to mention they are visually more appealing to the reader.

However, sometimes images that contain text can be a serious pain for translators and can slow down and otherwise hinder the translation process. In some cases, it could even result in you paying more money.

Separate text from graphics

If a text needs to be associated with a graphic, try to separate your text from the image and create the text as a separate component.

If the text is separable, managing localized versions becomes a lot simpler.
Ideally, images should not contain text at all, because that eliminates the need to translate them. Pay attention to cross-cultural differences too, as not all images and symbols carry the same meaning across borders.

10. Not worrying about localization until it’s too late

Small mistakes can prevent your software from working in other languages. Errors in source content can be replicated, or worse, amplified across language versions, and this can result in months of work fixing localization bugs.

Don’t let this happen to you!

Test localizability early and often

You can save yourself a lot of trouble in the long run when you start testing for localization early and often.

As a developer, you can write automated tests that check translation files and character encoding for the localized versions of your software.

Always test your patches not just for code errors, but also for grammar errors, capitalization inconsistencies, and localizability issues in strings.
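
An automated check of translation files might look like the sketch below. It compares a translated catalog against the source catalog and reports missing keys and mismatched named placeholders. The `check_catalog` helper and the sample catalogs are hypothetical, and the regular expression only covers `%(name)s` and `%(name)d` placeholders.

```python
import re

PLACEHOLDER = re.compile(r"%\((\w+)\)[sd]")


def check_catalog(source: dict, translated: dict) -> list[str]:
    """Return human-readable problems found in a translated catalog."""
    problems = []
    for key, text in source.items():
        if key not in translated:
            problems.append(f"missing key: {key}")
            continue
        expected = set(PLACEHOLDER.findall(text))
        actual = set(PLACEHOLDER.findall(translated[key]))
        if expected != actual:
            problems.append(f"placeholder mismatch in {key}")
    return problems


source = {"welcome_back_msg": "Hey %(username)s, welcome back!", "button_order": "Order Now"}
german = {"welcome_back_msg": "Hey %(user)s, willkommen zurück!"}

print(check_catalog(source, german))
# ['placeholder mismatch in welcome_back_msg', 'missing key: button_order']
```

Running a check like this on every commit catches broken placeholders before they reach users, where they would otherwise surface as runtime formatting errors.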

Keeping localization in mind when creating the original software eases the localization process considerably. If you avoid these 10 common pitfalls and diligently follow proven localization best practices, your software should be fully localizable and ready for the global market.

FAQs

What happens if you embed text directly into your software code?

Embedding text directly in code makes localization slow and costly. Translators must read through code to find translatable strings, version control becomes difficult, and translation consistency is hard to maintain. The fix is to move all user-facing strings into separate resource files (JSON, XML, YAML, or gettext .po files) with unique key IDs.

Why do translated strings break the UI layout?

Translated text is often longer than the English source — German and Finnish can run 30–50% longer. If the UI doesn’t accommodate growth, strings overlap controls. Design layouts to allow at least +50% string expansion, use flexible layout managers, or store label dimensions in your locale resource files.

Should you specify a language code or a full locale in your software?

Always use a full locale (language + country code), such as fr-FR or en-GB, not just a language code. Different regions speaking the same language may have different spelling conventions, date formats, and idioms. A full locale lets your app handle those differences correctly.

Why is string concatenation a localization problem?

Concatenating sentence fragments with hard-coded word order assumes all languages share the same grammar structure — they don’t. Translators who only see fragments can’t reconstruct the meaning. Use complete sentence strings with named placeholders (e.g. %(username)s) so translators can reorder words freely to fit their language’s grammar.

How should dates, times, and numbers be handled in localization?

Never hard-code date, time, currency, or number formats. Different locales use different conventions — 26.04.2025 vs. 04/26/2025, or 1.000,50 vs. 1,000.50. Store dates in ISO format and use a locale-aware library (such as Python’s Babel) to format them correctly for each target locale at runtime.
