
The ICU message format syntax is used by a significant number of i18n libraries and solutions. You may have used the format yourself.
    {
      "Hello World": "Hola Mundo",
      "An example": "{n, plural, one {Un ejemplo} other {# ejemplos}}"
    }
A basic message and a plural message in ICU message format
The syntax is intuitive: if you have any familiarity with i18n/l10n, you can probably tell what's going on in the translation file above. Still, a few things about the message format can be confusing. Different i18n libraries implement different subsets of ICU, and what ICU actually is can itself be a source of confusion. In this article, we aim to clear up some of these mysteries and cover the practical usage of ICU messages when internationalizing and localizing our apps.
According to the official documentation, ICU stands for International Components for Unicode: a set of portable libraries that are meant to make working with i18n easier for Java and C/C++ developers.
🗒 Note » Since ICU's inception, the libraries' implementations have expanded beyond Java and C/C++, and can now be found in other languages. See Which i18n Libraries are Using the ICU Message Format? for more info.
ICU libraries cover a great deal more than translation messages. As you can probably infer from the name, ICU is closely tied to the Unicode international character encoding standard. The ICU library suite provides utilities for working with Unicode in Java and C/C++. It also provides functionality for i18n.
The following is a summarized overview of some of the different modules that make up the ICU library suite.
es_MX, or Mexican Spanish, resource bundle, and the retrieval of these bundles' contents.

🔗 Resource » We're just presenting some of the ICU modules here. Check out the ICU User Guide for a comprehensive look at everything ICU has to offer.
ICU itself is a general set of libraries for Unicode and i18n. One of these libraries/modules deals with i18n text formatting, and it provides the ICU message format syntax. The ICU message format is powerful and flexible enough to have grown out of its Java and C/C++ origins, and it has been ported to several other languages and platforms. The next section, Which i18n Libraries are Using the ICU Message Format?, lists some of these ported implementations.

We saw the ICU message format syntax earlier. It allows for basic messages, interpolation, and general selection based on value. The format also ties into the given library's date, time, and number formatting functions. Here's the example from the ICU documentation, put in a YAML file for some realistic context:
    host_invites_guests_to_party: >
      {gender_of_host, select,
        female {
          {num_guests, plural, offset:1
            =0 {{host} does not give a party.}
            =1 {{host} invites {guest} to her party.}
            =2 {{host} invites {guest} and one other person to her party.}
            other {{host} invites {guest} and # other people to her party.}}}
        male {
          {num_guests, plural, offset:1
            =0 {{host} does not give a party.}
            =1 {{host} invites {guest} to his party.}
            =2 {{host} invites {guest} and one other person to his party.}
            other {{host} invites {guest} and # other people to his party.}}}
        other {
          {num_guests, plural, offset:1
            =0 {{host} does not give a party.}
            =1 {{host} invites {guest} to their party.}
            =2 {{host} invites {guest} and one other person to their party.}
            other {{host} invites {guest} and # other people to their party.}}}}
Let's say we're using some JavaScript implementation of ICU message formatting that provides a function called format(). We may be able to display the message above using a call like the following.

    format('host_invites_guests_to_party', {
      gender_of_host: 'female',
      num_guests: 2,
      host: 'Maria',
      guest: 'Tamer',
    });
The output of the above would be "Maria invites Tamer and one other person to her party."
If we wanted to translate the above message to Spanish, we would provide a parameterized message, perhaps a bit like the following.
    host_invites_guests_to_party: >
      {gender_of_host, select,
        female {
          {num_guests, plural, offset:1
            =0 {{host} no da una fiesta.}
            =1 {{host} invita a {guest} a su fiesta.}
            =2 {{host} invita a {guest} y a otra persona a su fiesta.}
            other {{host} invita a {guest} y a otras # personas a su fiesta.}}}
        # ...
      }

The same format() call above, given the Spanish translation message, would output "Maria invita a Tamer y a otra persona a su fiesta."
This gives translators a great deal of flexibility when working with messages, and it separates the concerns of translators and programmers. A programmer doesn't need to worry about the nuances of a language to develop their software. They just put one function call in their code and depend on the translator to handle the linguistic minutiae. The only thing the programmer and the translator need to agree on is the contract of the message, i.e. its ID and parameters. In later sections, we'll dive deeper into the different formatting options the syntax provides.
Across programming languages and platforms, different i18n libraries have implemented ICU message format support. What follows is a list of some of these libs.
✋🏽 Heads up » Different ports implement different subsets of ICU message format. In all cases, when adopting an ICU message format port be sure to read the documentation carefully and know what ICU features are supported by the library.
JavaScript doesn't have an official, first-party i18n message format. But this is JavaScript we're talking about, so of course you have your pick of third-party libraries to use for both browsers and Node.
🔗 Resource » Check out our Ultimate Guide to JavaScript Localization for all the steps you need to make your JS app accessible to international users.
🗒 Note » The first-party PHP i18n message solution uses gettext.
🗒 Note » The first-party Python i18n message solution uses gettext.
If you've perused the documentation of some of the above libraries, or the docs of the original ICU project, you may have seen references to the CLDR. And you may have wondered what that was. Well, CLDR stands for Common Locale Data Repository, and it's the official Unicode collection of l10n data. For a given locale, the CLDR can give you the locale's script, its preferred calendar, number system, date formats, pluralization rules, and more. The CLDR is used by the main ICU project and other libraries that implement ICU features.
🔗 Resource » Check out the official CLDR documentation for more info.
✋🏽 Heads up » We often use CLDR data without thinking about it since it's baked into some of the i18n libraries we use. Other times, however, we need to manually pull in CLDR data for the locales our apps support. Be sure to read the documentation of the i18n library you're using to see if you need to manually fetch CLDR data for your locales.
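To get a feel for the kind of data the CLDR provides, here is a small sketch using JavaScript's built-in Intl API, which in browsers and Node.js is typically backed by ICU and CLDR data. This is only an illustration: it assumes a runtime that ships full locale data, and the locales are arbitrary picks of ours.

```javascript
// Likely script and region for a bare language tag ("likely subtags" data).
const traditionalChinese = new Intl.Locale('zh-TW').maximize();
console.log(traditionalChinese.script); // "Hant"

const english = new Intl.Locale('en').maximize();
console.log(english.toString()); // "en-Latn-US"

// Default calendar and numbering system that the runtime resolves for a locale.
const thaiDefaults = new Intl.DateTimeFormat('th-TH').resolvedOptions();
console.log(thaiDefaults.calendar, thaiDefaults.numberingSystem);
```

If your i18n library asks you to load locale data yourself, this is the kind of information it is asking for.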
So what does working with ICU messages actually look like? It's fairly straightforward. Let's take a look at the features the syntax gives us.

🗒 Note » In the examples below, we're assuming that our translation messages are stored in YAML or JSON files. We're also assuming that we're using a JavaScript i18n library with ICU message format support and a format() function to display our messages. This is just for demonstration's sake, and you will probably want to take a look at how your i18n library handles message files, and the exact function or method it uses to display messages. The formats presented here should hold no matter which library you use, however.

🔗 Resource » If you want to play around with the following examples, or your own, we recommend the Online ICU Message Editor by Andy VanWagoner.
A basic ICU message is just plain text in a given locale.
    # in messages_en.yaml
    basic_message: Hello World

    # in messages_fr.yaml
    basic_message: Bonjour tout le monde

    format('basic_message');
    // in French => "Bonjour tout le monde"
Dynamic text is denoted by curly braces {}.

    // in messages_en.json
    {
      "user_greeting": "Good day, {username}"
    }

    // in messages_es.json
    {
      "user_greeting": "Buenos días, {username}"
    }

    // in UI
    format('user_greeting', { username: 'Adam' });
    // in Spanish => "Buenos días, Adam"
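To make the idea concrete, here is a deliberately minimal sketch of what simple interpolation boils down to. The interpolate() helper is hypothetical and only handles plain {name} placeholders; real ICU libraries parse the whole message, including plurals and selects, rather than relying on a regular expression.

```javascript
// Hypothetical helper: replaces simple {name} placeholders with values.
// Unknown placeholders are left untouched so missing data is easy to spot.
function interpolate(message, values) {
  return message.replace(/\{(\w+)\}/g, (placeholder, name) =>
    Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])
      : placeholder
  );
}

console.log(interpolate('Buenos días, {username}', { username: 'Adam' }));
// "Buenos días, Adam"
console.log(interpolate('Good day, {username}', {}));
// "Good day, {username}"
```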
Pluralized forms can appear anywhere within a message, and have the form {n, plural, ...forms}, where n is the count variable, and forms is one or more plural forms for the phrase.

    # in messages_en.yaml
    awards_won: >
      The film won {n, plural,
        one {# award}
        other {# awards}}

    # in messages_ar.yaml
    awards_won: >
      الفلم {n, plural,
        zero {لم يحوز على جوائز}
        one {حاز على جائزة #}
        two {حاز على جائزتين}
        few {حاز على # جوائز}
        other {حاز على # جائزة}}
The special # symbol will display the given count in the active locale's number system. We're using n to denote the count here, but that's just a matter of convention. We can use any name we like as our count variable.

🗒 Note » Pluralization generally uses CLDR rules for the given locale. For example, according to the CLDR, English has two plural forms, one and other. Arabic has six forms. Each locale file can specify its own plural forms for a given message.

    // in UI
    format('awards_won', { n: 27 });
    // in English => "The film won 27 awards"
    // in Arabic => "الفلم حاز على ٢٧ جائزة"
🗒 Note » The other variant is always required in plural expressions.
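If you're curious which plural category a locale assigns to a given count, JavaScript's built-in Intl.PluralRules exposes the CLDR rules directly. The following is a quick sketch that assumes a runtime with full locale data (the default in current browsers and Node.js); the sample counts are our own.

```javascript
const englishRules = new Intl.PluralRules('en');
const arabicRules = new Intl.PluralRules('ar');

const counts = [0, 1, 2, 3, 11, 100];

// English only distinguishes "one" from "other".
console.log(counts.map((n) => englishRules.select(n)));
// [ 'other', 'one', 'other', 'other', 'other', 'other' ]

// Arabic uses all six CLDR categories.
console.log(counts.map((n) => arabicRules.select(n)));
// [ 'zero', 'one', 'two', 'few', 'many', 'other' ]

// The categories a locale uses can also be listed directly.
const arabicCategories = arabicRules.resolvedOptions().pluralCategories;
console.log(arabicCategories.length); // 6
```

Note that 27 falls into Arabic's many category. A message that doesn't define a many form, like the Arabic one above, falls back to other for that count.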
Also, we can nest other interpolated values within our plural messages. The following message is perfectly legal.
    film_won_awards: >
      All said, {n, plural,
        one {{film} won # award}
        other {{film} won # awards}}

    // in UI
    format('film_won_awards', { film: 'The Godfather', n: 27 });
    // in English => "All said, The Godfather won 27 awards"
We can customize our messages using single number specifiers. We use the =42 syntax to provide these custom forms, which override CLDR locale forms like one or other.

    film_won_awards: >
      All said, {n, plural,
        =0 {{film} did not win any awards}
        one {{film} won # award}
        other {{film} won # awards}}

    // in UI
    format('film_won_awards', { film: 'The Godfather', n: 0 });
    // in English => "All said, The Godfather did not win any awards"
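The precedence at play can be sketched in a few lines. The pickPluralKey() helper below is hypothetical and works on an already-parsed list of form keys; it is meant to show the order of checks, not how any particular library is implemented.

```javascript
// Hypothetical helper: decides which plural form key applies to a count.
function pickPluralKey(locale, n, availableKeys) {
  const exactKey = `=${n}`;
  if (availableKeys.includes(exactKey)) {
    return exactKey; // 1. an explicit value such as =0 always wins
  }
  const category = new Intl.PluralRules(locale).select(n);
  // 2. otherwise use the CLDR category, 3. falling back to "other"
  return availableKeys.includes(category) ? category : 'other';
}

const keys = ['=0', 'one', 'other'];
console.log(pickPluralKey('en', 0, keys));  // "=0"
console.log(pickPluralKey('en', 1, keys));  // "one"
console.log(pickPluralKey('en', 27, keys)); // "other"
```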
An optional offset specifier can be added to ICU plural messages. Explicit forms such as =0 are always matched against the plain value of n. If none of them matches, the offset is subtracted from the given count, and the difference is used both to select the keyword form (one, other, and so on) and as the value of #.

    # in messages_en.yaml
    trains_found: >
      {n, plural, offset:2
        =0 {No (#) trains are available}
        one {# train is available}
        other {# trains are available}}

    format('trains_found', { n: -1 }); // => "-3 trains are available"
    format('trains_found', { n: 0 });  // => "No (-2) trains are available"
    format('trains_found', { n: 1 });  // => "-1 train is available"
    format('trains_found', { n: 2 });  // => "0 trains are available"
✋🏽 Heads up » Because offsets can obfuscate the expected display of a message, they may cause confusion when used on a team. Use your own judgment here.
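Since offsets are easy to get wrong, here is a rough sketch of the lookup as we understand it from the ICU PluralFormat documentation: explicit values are compared with the raw number, while keyword selection and # both use the number minus the offset. The renderPluralWithOffset() helper and its forms object are hypothetical, and individual ports may behave differently, so test against your own library.

```javascript
// Hypothetical helper working on an already-parsed plural expression.
function renderPluralWithOffset(locale, n, offset, forms) {
  const has = (key) => Object.prototype.hasOwnProperty.call(forms, key);
  const shifted = n - offset;

  let template;
  if (has(`=${n}`)) {
    template = forms[`=${n}`]; // explicit values see the raw number
  } else {
    const category = new Intl.PluralRules(locale).select(shifted);
    template = has(category) ? forms[category] : forms.other;
  }

  // "#" always shows the number minus the offset.
  const shown = new Intl.NumberFormat(locale).format(shifted);
  return template.replace(/#/g, () => shown);
}

const trains = {
  '=0': 'No (#) trains are available',
  one: '# train is available',
  other: '# trains are available',
};

console.log(renderPluralWithOffset('en', 0, 2, trains)); // "No (-2) trains are available"
console.log(renderPluralWithOffset('en', 3, 2, trains)); // "1 train is available"
console.log(renderPluralWithOffset('en', 5, 2, trains)); // "3 trains are available"
```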
A select expression is a conditional branch and is a lot like the switch statement found in many programming languages. It can be placed anywhere in a message and is denoted by {arg, select, ...forms}, where arg is the argument variable to switch on, and forms is one or more alternatives.

    user_ran_first_100: >
      {username} has run {gender, select,
        female {her}
        male {his}
        other {their}} first 100km 🚀

🗒 Note » The other variant is always required in select expressions.

    format('user_ran_first_100', { username: 'Ava', gender: 'female' });
    // in English => "Ava has run her first 100km 🚀"
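Conceptually, a select is a keyed lookup with a mandatory fallback. A minimal sketch, using a hypothetical pickSelectForm() helper on already-parsed forms:

```javascript
// Hypothetical helper: returns the form matching `value`, or the "other" form.
function pickSelectForm(value, forms) {
  return Object.prototype.hasOwnProperty.call(forms, value)
    ? forms[value]
    : forms.other;
}

const possessive = { female: 'her', male: 'his', other: 'their' };

console.log(pickSelectForm('female', possessive));    // "her"
console.log(pickSelectForm('male', possessive));      // "his"
console.log(pickSelectForm('undisclosed', possessive)); // "their"
```

This is why the other variant is required: any value the translator didn't anticipate still needs somewhere to land.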
Plural forms and select expressions can be nested within themselves or each other. Dynamic strings can be nested in plurals and selects as well. Here's the example from the ICU docs:
    host_invites_guests_to_party: >
      {gender_of_host, select,
        female {
          {num_guests, plural, offset:1
            =0 {{host} does not give a party.}
            =1 {{host} invites {guest} to her party.}
            =2 {{host} invites {guest} and one other person to her party.}
            other {{host} invites {guest} and # other people to her party.}}}
        male {
          {num_guests, plural, offset:1
            =0 {{host} does not give a party.}
            =1 {{host} invites {guest} to his party.}
            =2 {{host} invites {guest} and one other person to his party.}
            other {{host} invites {guest} and # other people to his party.}}}
        other {
          {num_guests, plural, offset:1
            =0 {{host} does not give a party.}
            =1 {{host} invites {guest} to their party.}
            =2 {{host} invites {guest} and one other person to their party.}
            other {{host} invites {guest} and # other people to their party.}}}}

    format('host_invites_guests_to_party', {
      gender_of_host: 'male',
      num_guests: 3,
      host: 'Javier',
      guest: 'Malak',
    });
    // in English => "Javier invites Malak and 2 other people to his party."
ICU message format supports predefined number formats: percent and currency. We specify number formats in our messages via the syntax {n, number, format}, where n is the number argument, and format is either percent or currency.

    boopies: In two years, {n, number, percent} of boopies will continue to shmoopie.

The percent format expects the number argument to be a decimal fraction, typically between 0 and 1, which is multiplied by 100 for display.

    format('boopies', { n: 0.2 });
    // in English => "In two years, 20% of boopies will continue to shmoopie."
🗒 Note » The format specifier is optional: a bare {n, number} displays the given number in the number system of the active locale.
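Many JavaScript ICU libraries delegate this work to the built-in Intl.NumberFormat, so it can help to see roughly what the predefined formats correspond to. The mapping below is our assumption for illustration, not a statement about any specific library, and the locales and currency are arbitrary.

```javascript
// Percent: the input is a fraction, so 0.2 is rendered as 20%.
const percent = new Intl.NumberFormat('en-US', { style: 'percent' });
console.log(percent.format(0.2)); // "20%"

// Currency: the currency code has to be supplied by the caller.
const euros = new Intl.NumberFormat('en-US', { style: 'currency', currency: 'EUR' });
console.log(euros.format(1234.5)); // "€1,234.50"

// Plain number, with Arabic-Indic digits requested explicitly
// through the "nu" locale extension.
const arabicDigits = new Intl.NumberFormat('ar-u-nu-arab');
console.log(arabicDigits.format(27)); // "٢٧"
```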
You may find the above options limited for your needs. There are, of course, ways to have more fine-grained control over your number formatting. The official ICU Java libraries, for example, offer a NumberFormatter class, which allows you to do things like the following.

    NumberFormatter.with()
        .notation(Notation.compactShort())
        .unit(Currency.getInstance("EUR"))
        .precision(Precision.maxDigits(2))
        .locale(...)
        .format(1234)
        .toString(); // €1.2K in en-US
Every ICU library implements its number formatting a bit differently, so check the documentation of the library you're using to see how you can better control your number formats.
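In JavaScript, one way to get comparable control is the built-in Intl.NumberFormat. The sketch below is our own rough equivalent of a compact, two-significant-digit euro amount; whether and how your i18n library lets you pass such options through an ICU message is something to confirm in its docs.

```javascript
// Compact notation with at most two significant digits, in euros.
const compactEuros = new Intl.NumberFormat('en-US', {
  style: 'currency',
  currency: 'EUR',
  notation: 'compact',
  compactDisplay: 'short',
  maximumSignificantDigits: 2,
});

const compactAmount = compactEuros.format(1234);
console.log(compactAmount); // "€1.2K"
```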
ICU has four predefined date formats: short, medium, long, and full. Date formats are specified using the syntax {myDate, date, format}, where myDate is the date value argument, and format is one of the predefined formats.

    published_on: Publié le {published, date, short}

    format('published_on', { published: new Date('2019-10-31') });
    // => "Publié le 31/10/19"
The official ICU spec contains support for pattern characters, called Date Field Symbols, that allow for precise control over date formatting. Again, every i18n library does its custom date/time formatting a bit differently, and you may need to supplement your i18n library with another date-specific solution for your date formats. Peruse the docs of your i18n library to see how it achieves or allows for fine-grained control over date formatting.
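To see how the four predefined styles differ, we can lean on the built-in Intl.DateTimeFormat and its dateStyle option. Treat this as an approximation: libraries don't necessarily map short, medium, long, and full to dateStyle, so the exact output of your library may differ (for example, in how many digits the year gets). The formatDate() helper is ours, and we pin the time zone to UTC so the result doesn't depend on where the code runs.

```javascript
const published = new Date('2019-10-31T00:00:00Z');

// Hypothetical helper wrapping Intl.DateTimeFormat for a given style.
const formatDate = (locale, dateStyle) =>
  new Intl.DateTimeFormat(locale, { dateStyle, timeZone: 'UTC' }).format(published);

for (const style of ['short', 'medium', 'long', 'full']) {
  console.log(style, '=>', formatDate('en-US', style));
}
// short => 10/31/19
// medium => Oct 31, 2019
// long => October 31, 2019
// full => Thursday, October 31, 2019

console.log(formatDate('fr-FR', 'long')); // "31 octobre 2019"
```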
We've covered what we think are the most commonly used formats here. However, the ICU spec and third-party ICU libraries (to varying degrees) offer much more functionality, including ordinals (e.g. 1st, 2nd), units (e.g. km, lbs), durations, relative dates (e.g. yesterday), and more. Check out the documentation of your i18n library of choice to see which advanced formats are supported.
The ICU message format is certainly one of the de facto standards for translation messages in i18n. We hope that we've shed some light on some of your questions about ICU and that you've enjoyed our short guide to ICU and the ICU message format.
As soon as you start working with a team on a software internationalization project, few things will make you more efficient than a complete localization platform like the Phrase Localization Suite. Its dedicated software localization product, Phrase Strings, supports the ICU message format and gives your translators ICU syntax checking and highlighting.
ICU message format support in Phrase: syntax highlighting and easy plurals
Phrase Strings is a fully-featured software localization solution that helps product managers track their l10n progress, allows translators to work in an intuitive UI, gives developers the ability to sync translation files through the CLI, and much more. Leave the i18n pipeline to Phrase and stay focused on your product. Check out all Phrase features for developers, and see for yourself how they can help you take your software global easily.
Last updated on May 9, 2023.