
The Missing Guide to the ICU Message Format

The ICU message format syntax is used by a number of i18n libraries and can often become a source of confusion itself. Let's clear some facts up about it!

The ICU message format syntax is used by a significant number of i18n libraries and solutions. You may have used the format yourself.

{
    "Hello World": "Hola Mundo",
    "An example": "{n, plural, one {Un ejemplo} other {# ejemplos}}"
}


A basic message and a plural message in ICU message format

The syntax is intuitive: if you have any familiarity with i18n/l10n, you can probably tell what's going on in the translation file above. Still, a few things about the message format can be confusing. Different i18n libraries implement different subsets of ICU, and what ICU actually is can itself be a source of confusion. In this article, we mean to clear up some of these mysteries and to cover the practical usage of ICU messages when internationalizing and localizing our apps.

What is ICU?

According to the official documentation, ICU stands for International Components for Unicode: a set of portable libraries that are meant to make working with i18n easier for Java and C/C++ developers.

🗒 Note » Since ICU's inception, the libraries' implementations have expanded beyond Java and C/C++, and can now be found in other languages. See Which i18n Libraries are Using the ICU Message Format? for more info.

ICU libraries cover a great deal more than translation messages. As you can probably infer from the name, ICU is closely tied to the Unicode international character encoding standard. The ICU library suite provides utilities for working with Unicode in Java and C/C++. It also provides functionality for i18n.

The official ICU libraries

The following is a summarized overview of some of the different modules that make up the ICU library suite.

  • Unicode Strings — Provides macros and utilities for working with Unicode strings.
  • Conversion — Handles conversion between Unicode and non-Unicode character encoding.
  • Locale — Deals with the i18n concept of a locale (a language along with optional country and script variant) as well as information relevant to that locale, such as its calendar, currency, etc. Also deals with fallback logic when a locale is not supported.
  • Resources — Handles resource bundles, which are effectively translation message files—e.g. the es_MX or Spanish Mexican resource bundle—and the retrieval of these bundles' contents.
  • Date/Time Services — Handles the representation of time zones and provides logic to work with various kinds of calendars.
  • Formatting — Deals with the display of text, particularly when internationalizing, focusing on displaying numbers, dates, times, and messages (translated strings). This module, of course, describes the ICU message format and is of particular interest to us.

🔗 Resource » We're just presenting some of the ICU modules here. Check out the ICU User Guide for a comprehensive look at everything ICU has to offer.

The ICU message format

ICU itself is a general set of libraries for Unicode and i18n. One of these modules deals with i18n text formatting and defines the ICU message format syntax. The format has proven powerful and flexible enough to outgrow its Java and C/C++ origins, and it has been ported to several other languages and platforms. The next section, Which i18n Libraries are Using the ICU Message Format?, lists some of these ports.

We saw the ICU message format syntax earlier. It allows for basic messages, interpolation, plurals, and general selection based on a value. The format also ties into the given library's date, time, and number formatting functions. Here's the example from the ICU documentation, put in a YAML file for some realistic context:

host_invites_guests_to_party: >
    {gender_of_host, select,
        female {
            {num_guests, plural, offset:1
                =0 {{host} does not give a party.}
                =1 {{host} invites {guest} to her party.}
                =2 {{host} invites {guest} and one other person to her party.}
                other {{host} invites {guest} and # other people to her party.}}}
        male {
            {num_guests, plural, offset:1
                =0 {{host} does not give a party.}
                =1 {{host} invites {guest} to his party.}
                =2 {{host} invites {guest} and one other person to his party.}
                other {{host} invites {guest} and # other people to his party.}}}
        other {
            {num_guests, plural, offset:1
                =0 {{host} does not give a party.}
                =1 {{host} invites {guest} to their party.}
                =2 {{host} invites {guest} and one other person to their party.}
                other {{host} invites {guest} and # other people to their party.}}}}

Let's say we're using some JavaScript implementation of ICU message formatting that provides a function called format(). We may be able to display the message above using a call like the following.

format('host_invites_guests_to_party', {
    gender_of_host: 'female',
    num_guests: 2,
    host: 'Maria',
    guest: 'Tamer',
});

The output of the above would be "Maria invites Tamer and one other person to her party." If we wanted to translate the above message to Spanish, we would provide a parameterized message, perhaps a bit like the following.

host_invites_guests_to_party: >
    {gender_of_host, select,
        female {
            {num_guests, plural, offset:1
                =0 {{host} no da una fiesta.}
                =1 {{host} invita a {guest} a su fiesta.}
                =2 {{host} invita a {guest} y a otra persona a su fiesta.}
                other {{host} invita a {guest} y a otras # personas a su fiesta.}}}
        # ...


The same format() call above, given the Spanish translation message, would output "Maria invita a Tamer y a otra persona a su fiesta." This gives translators a great deal of flexibility when working with messages and separates concerns between translators and programmers. A programmer doesn't need to worry about the nuances of a language to develop their software. They just put one function call in their code and depend on the translator to handle the linguistic minutiae. The only thing the programmer and the translator need to agree on is the contract of the message, i.e. its ID and parameters. In later sections, we'll dive deeper into the different formatting options the syntax provides.

Which i18n libraries are using the ICU message format?

Across programming languages and platforms, different i18n libraries have implemented ICU message format support. What follows is a list of some of these libs.

✋🏽 Heads up » Different ports implement different subsets of ICU message format. In all cases, when adopting an ICU message format port be sure to read the documentation carefully and know what ICU features are supported by the library.


C/C++

  • ICU4C—the papa spec and implementation; this and the Java library are where ICU started. Of course, ICU4C is a complete implementation of ICU.

Dart / Flutter

  • intl—the first-party Dart i18n package implements ICU message formatting.
  • Flutter i18n—based on Dart's intl, Flutter's first-party i18n library also uses ICU message formats.

Java

  • ICU4J—the mama spec and implementation; this and the C/C++ library are the ICU originals. And it goes without saying that ICU4J is a complete implementation of ICU.

JavaScript

JavaScript doesn't have an official, first-party i18n message format. But this is JavaScript we're talking about, so of course you have your pick of third-party libraries to use for both browsers and Node.

  • Globalize—a well-rounded implementation of the ICU message format.
  • messageformat—resembling the original ICU implementations, messageformat provides message compilation to JavaScript for better performance.
  • i18next with ICU module—an official ICU extension for the robust i18next library.
  • Angular—Google's popular front-end framework uses ICU expressions in its first-party i18n solution.
  • react-intl—part of the FormatJS family, the i18n library for React uses the ICU message syntax.

🔗 Resource » Check out our Ultimate Guide to JavaScript Localization for all the steps you need to make your JS app accessible to international users.

PHP

  • Symfony—the popular web framework supports ICU messages.

🗒 Note » The first-party PHP i18n message solution uses gettext.

Python

  • PyICU—Python wrappers for the ICU C++ libraries.

🗒 Note » The first-party Python i18n message solution uses gettext.

What is CLDR?

If you've perused the documentation of some of the above libraries, or the docs of the original ICU project, you may have seen references to the CLDR. And you may have wondered what that was. Well, CLDR stands for Common Locale Data Repository, and it's the official Unicode collection of l10n data. For a given locale, the CLDR can give you the locale's script, its preferred calendar, number system, date formats, pluralization rules, and more. The CLDR is used by the main ICU project and other libraries that implement ICU features.

🔗 Resource » Check out the official CLDR documentation for more info.

✋🏽 Heads up » We often use CLDR data without thinking about it since it's baked into some of the i18n libraries we use. Other times, however, we need to manually pull in CLDR data for the locales our apps support. Be sure to read the documentation of the i18n library you're using to see if you need to manually fetch CLDR data for your locales.
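
To get a feel for the kind of data the CLDR provides, the following sketch asks the JavaScript runtime for the plural categories of two locales. It assumes a runtime whose built-in Intl API ships with full locale data (recent Node.js releases and modern browsers generally do). The helper name pluralCategoriesFor is made up for this illustration.

```javascript
// Hypothetical helper: lists the CLDR cardinal plural categories for a locale,
// as exposed by the runtime's built-in Intl.PluralRules.
function pluralCategoriesFor(locale) {
  const rules = new Intl.PluralRules(locale, { type: 'cardinal' });
  // Sort a copy so the result does not depend on the engine's ordering.
  return [...rules.resolvedOptions().pluralCategories].sort();
}

console.log(pluralCategoriesFor('en')); // two categories: one, other
console.log(pluralCategoriesFor('ar')); // six categories: few, many, one, other, two, zero
```

If the Arabic list comes back shorter than expected, the runtime was probably built with reduced locale data, which is exactly the situation where you may need to pull in CLDR data yourself.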

Working with the ICU message format

So what does working with ICU messages actually look like? It's fairly straightforward, actually. Let's take a look at the features the syntax gives us.

🗒 Note » In the examples below, we're assuming that our translation messages are stored in YAML or JSON files. We're also assuming that we're using a JavaScript i18n library with ICU message format support and a format() function to display our messages. This is just for demonstration's sake, and you will probably want to take a look at how your i18n library handles message files, and the exact function or method it uses to display messages. The formats presented here should hold no matter which library you use, however.

🔗 Resource » If you want to play around with the following examples, or your own, we recommend the Online ICU Message Editor by Andy VanWagoner.

Basic messages

A basic ICU message is just plain text in a given locale.

# in messages_en.yaml
basic_message: Hello World

# in messages_fr.yaml
basic_message: Bonjour tout le monde

// in UI
format('basic_message');
// in French => "Bonjour tout le monde"


Interpolation

Dynamic values are interpolated into a message by wrapping the parameter name in curly braces, e.g. {username}.

// in messages_en.json
{
    "user_greeting": "Good day, {username}"
}

// in messages_es.json
{
    "user_greeting": "Buenos días, {username}"
}

// in UI
format('user_greeting', { username: 'Adam' });

// in Spanish => "Buenos días, Adam"
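
To make the lookup-then-substitute idea concrete, here is a minimal sketch of what a format()-style function might do for simple placeholders. The catalogs object and the formatSimple name are hypothetical, and the sketch deliberately ignores plurals, selects, typed arguments, and ICU's quoting rules.

```javascript
// Hypothetical in-memory catalogs; a real library would load these from files.
const catalogs = {
  en: { user_greeting: 'Good day, {username}' },
  es: { user_greeting: 'Buenos días, {username}' },
};

// Looks up a message by locale and ID, then replaces each simple {name}
// placeholder with the matching parameter. Unknown placeholders are left as-is.
function formatSimple(locale, id, params = {}) {
  const message = catalogs[locale][id];
  return message.replace(/\{(\w+)\}/g, (placeholder, name) =>
    name in params ? String(params[name]) : placeholder
  );
}

console.log(formatSimple('es', 'user_greeting', { username: 'Adam' }));
// => "Buenos días, Adam"
```

Note how the calling code only depends on the message ID and the parameter name, not on the wording of either translation.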


Plurals

Pluralized forms can appear anywhere within a message and have the form {n, plural, ...forms}, where n is the count variable and forms is one or more plural forms for the phrase.

# in messages_en.yaml
awards_won: >
    The film won {n, plural,
        one {# award}
        other {# awards}}

# in messages_ar.yaml
awards_won: >
    الفلم {n, plural,
        zero {لم يحوز على جوائز}
        one {حاز على جائزة #}
        two {حاز على جائزتين}
        few {حاز على # جوائز}
        other {حاز على # جائزة}}

The special # symbol will display the given count in the active locale's number system. We're using n to denote the count here, but that's just a matter of convention. We can use any name we like as our count variable.

🗒 Note » Pluralization generally uses CLDR rules for the given locale. For example, according to the CLDR, English has two plural forms, one and other. Arabic has six forms. Each locale file can specify its own plural forms for a given message.

// in UI
format('awards_won', { n: 27 });
// in English => "The film won 27 awards"
// in Arabic => "الفلم حاز على ٢٧ جائزة"

🗒 Note » The other variant is always required in plural expressions.
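
The selection step can be sketched with the runtime's built-in Intl.PluralRules, which is backed by CLDR data. This is only an approximation of what an ICU message library does internally; pickPluralForm is a made-up name, and the forms are passed as a plain object rather than parsed from a message string.

```javascript
// Picks a plural form using CLDR-backed Intl.PluralRules, falling back to
// "other" when the message does not define the selected category.
function pickPluralForm(locale, count, forms) {
  const category = new Intl.PluralRules(locale).select(count);
  const template = category in forms ? forms[category] : forms.other;
  // Replace # with the count, formatted for the locale.
  return template.replace(/#/g, new Intl.NumberFormat(locale).format(count));
}

const awards = { one: '# award', other: '# awards' };
console.log(pickPluralForm('en', 1, awards));  // => "1 award"
console.log(pickPluralForm('en', 27, awards)); // => "27 awards"

// For Arabic, 27 falls in the CLDR "many" category. The Arabic message above
// defines no "many" form, so a formatter would fall back to "other".
console.log(new Intl.PluralRules('ar').select(27)); // => "many"
```

The fallback in the second line of the function is why the other form is mandatory: it is the only form guaranteed to exist for every count.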

Also, we can nest other interpolated values within our plural messages. The following message is perfectly legal.

film_won_awards: >
    All said, {n, plural,
        one {{film} won # award}
        other {{film} won # awards}}

// in UI
format('film_won_awards', { film: 'The Godfather', n: 27 });
// in English => "All said, The Godfather won 27 awards"

Overriding CLDR plural rules

We can customize our messages for specific numbers. An explicit form, written =N (e.g. =0 or =42), matches that exact value and takes precedence over CLDR category forms like one or other.

film_won_awards: >
    All said, {n, plural,
        =0 {{film} did not win any awards}
        one {{film} won # award}
        other {{film} won # awards}}

// in UI
format('film_won_awards', { film: 'The Godfather', n: 0 });
// in English => "All said, The Godfather did not win any awards"


Offsets

An optional offset specifier can be added to ICU plural messages. Explicit =N forms are always matched against the plain value of n. If no explicit form matches, the offset is subtracted from n, and the difference is used both to select the plural form (one, other, etc.) and as the value of #.

# in messages_en.yaml
trains_found: >
    {n, plural, offset:2
        =0 {No (#) trains are available}
        one {# train is available}
        other {# trains are available}}

format('trains_found', { n: -1 }); // => "-3 trains are available"
format('trains_found', { n: 0 });  // => "No (-2) trains are available"
format('trains_found', { n: 1 });  // => "-1 train is available"
format('trains_found', { n: 2 });  // => "0 trains are available"

✋🏽 Heads up » Because offsets can obfuscate the expected display of a message, they may cause confusion when used on a team. Use your own judgment here.
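
The interplay between explicit values and offsets is easier to see in code. In ICU's own implementation, explicit =N forms are compared against the plain count, while keyword forms such as one and other are chosen using the count minus the offset. The sketch below follows that order; resolvePlural and the guests message are hypothetical, and the forms are given as a plain object instead of being parsed.

```javascript
// Minimal sketch of ICU-style plural resolution with explicit values and an offset.
function resolvePlural(locale, n, forms, offset = 0) {
  const shown = n - offset; // the value that # displays
  const exactKey = `=${n}`; // explicit forms match the plain value of n
  const key = exactKey in forms
    ? exactKey
    : new Intl.PluralRules(locale).select(shown); // keywords use n minus offset
  const template = key in forms ? forms[key] : forms.other;
  return template.replace(/#/g, new Intl.NumberFormat(locale).format(shown));
}

// Hypothetical message, loosely based on the party example from the ICU docs.
const guests = {
  '=0': 'Nobody is coming',
  '=1': 'Only Tamer is coming',
  one: 'Tamer and # other guest are coming',
  other: 'Tamer and # other guests are coming',
};

console.log(resolvePlural('en', 0, guests, 1)); // => "Nobody is coming"
console.log(resolvePlural('en', 1, guests, 1)); // => "Only Tamer is coming"
console.log(resolvePlural('en', 2, guests, 1)); // => "Tamer and 1 other guest are coming"
console.log(resolvePlural('en', 5, guests, 1)); // => "Tamer and 4 other guests are coming"
```

This is the typical use case for offsets: one participant is named explicitly, so the displayed number counts only the remaining ones.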

Switching with select

A select expression is a conditional branch and is a lot like the switch statement found in many programming languages. It can be placed anywhere in a message and is denoted by {arg, select, ...forms} where arg is the argument variable to switch on, and forms is one or more alternatives.

user_ran_first_100: >
    {username} has run
    {gender, select,
        female {her}
        male {his}
        other {their}}
    first 100km 🚀

🗒 Note » The other variant is always required in select expressions.

format('user_ran_first_100', { username: 'Ava', gender: 'female' });
// in English => "Ava has run her first 100km 🚀"
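
Conceptually, a select expression is a keyed lookup with a mandatory fallback. The following sketch shows that behavior in isolation; pickSelectForm is a hypothetical helper, not part of any particular library.

```javascript
// A select expression behaves like a keyed lookup with a required "other" fallback.
function pickSelectForm(value, forms) {
  if (!('other' in forms)) {
    throw new Error('A select expression requires an "other" form');
  }
  const hasForm = Object.prototype.hasOwnProperty.call(forms, value);
  return hasForm ? forms[value] : forms.other;
}

const pronouns = { female: 'her', male: 'his', other: 'their' };
console.log(pickSelectForm('female', pronouns));    // => "her"
console.log(pickSelectForm('nonbinary', pronouns)); // => "their"
console.log(pickSelectForm(undefined, pronouns));   // => "their"
```

Unlike plural, select does no locale-aware processing: the argument is compared to the form names as plain text.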


Nesting

Plural forms and select expressions can be nested within themselves or each other. Dynamic strings can be nested in plurals and selects as well. Here's the example from the ICU docs:

host_invites_guests_to_party: >
    {gender_of_host, select,
        female {
            {num_guests, plural, offset:1
                =0 {{host} does not give a party.}
                =1 {{host} invites {guest} to her party.}
                =2 {{host} invites {guest} and one other person to her party.}
                other {{host} invites {guest} and # other people to her party.}}}
        male {
            {num_guests, plural, offset:1
                =0 {{host} does not give a party.}
                =1 {{host} invites {guest} to his party.}
                =2 {{host} invites {guest} and one other person to his party.}
                other {{host} invites {guest} and # other people to his party.}}}
        other {
            {num_guests, plural, offset:1
                =0 {{host} does not give a party.}
                =1 {{host} invites {guest} to their party.}
                =2 {{host} invites {guest} and one other person to their party.}
                other {{host} invites {guest} and # other people to their party.}}}}

format('host_invites_guests_to_party', {
    gender_of_host: 'male',
    num_guests: 3,
    host: 'Javier',
    guest: 'Malak',
});

// in English => "Javier invites Malak and 2 other people to his party."
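
Nesting works because every branch of a select or plural is itself a message. One way to picture this is a recursive evaluator over an already-parsed message. The tree shape, the render function, and the invites helper below are all assumptions made for this sketch; real libraries use their own parsers and internal representations.

```javascript
// Evaluates a hypothetical pre-parsed message tree.
// Strings are literal text; objects are arguments, selects, plurals, or #.
function render(parts, args, locale, pound = '') {
  return parts
    .map((part) => {
      if (typeof part === 'string') return part;
      if (part.type === 'pound') return pound; // the # symbol
      const value = args[part.name];
      if (part.type === 'arg') return String(value);
      if (part.type === 'select') {
        const branch = value in part.forms ? part.forms[value] : part.forms.other;
        return render(branch, args, locale, pound);
      }
      if (part.type === 'plural') {
        const shown = value - (part.offset || 0);
        const exact = `=${value}`;
        const key = exact in part.forms ? exact : new Intl.PluralRules(locale).select(shown);
        const branch = key in part.forms ? part.forms[key] : part.forms.other;
        // Branches are messages themselves, so recurse with the new # value.
        return render(branch, args, locale, new Intl.NumberFormat(locale).format(shown));
      }
      throw new Error(`Unknown part type: ${part.type}`);
    })
    .join('');
}

const host = { type: 'arg', name: 'host' };
const guest = { type: 'arg', name: 'guest' };
const poundSign = { type: 'pound' };

// Builds the plural branch for a given possessive pronoun.
const invites = (pronoun) => [{
  type: 'plural',
  name: 'num_guests',
  offset: 1,
  forms: {
    '=0': [host, ' does not give a party.'],
    '=1': [host, ' invites ', guest, ` to ${pronoun} party.`],
    '=2': [host, ' invites ', guest, ` and one other person to ${pronoun} party.`],
    other: [host, ' invites ', guest, ' and ', poundSign, ` other people to ${pronoun} party.`],
  },
}];

const partyMessage = [{
  type: 'select',
  name: 'gender_of_host',
  forms: { female: invites('her'), male: invites('his'), other: invites('their') },
}];

const partyArgs = { gender_of_host: 'male', num_guests: 3, host: 'Javier', guest: 'Malak' };
console.log(render(partyMessage, partyArgs, 'en'));
// => "Javier invites Malak and 2 other people to his party."
```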

Number formatting

ICU message format supports a few predefined number formats, including integer, percent, and currency. We specify number formats in our messages via the syntax {n, number, format}, where n is the number argument and format is one of the predefined formats.

boopies: In two years, {n, number, percent} of boopies will continue to shmoopie.

The percent format expects the number argument to be a fraction: the value is multiplied by 100 for display, so 0.2 is shown as 20%.

format('boopies', { n: 0.2 });
// in English => "In two years, 20% of boopies will continue to shmoopie."

🗒 Note » The format specifier is optional, and a number format will display the given number in the number system of the active locale.
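
JavaScript ICU message libraries often hand number arguments to the runtime's built-in Intl.NumberFormat. As a rough sketch of what the predefined formats resolve to, and assuming a runtime with full locale data, the calls below use that API directly rather than going through a message. The USD currency code is an assumption for the example: the ICU message itself does not name a currency, so it has to come from the caller or the library's configuration.

```javascript
// Rough equivalents of {n, number}, {n, number, percent}, and
// {n, number, currency}, using the built-in Intl.NumberFormat directly.
const plainNumber = new Intl.NumberFormat('en-US').format(1234.5);
const percentNumber = new Intl.NumberFormat('en-US', { style: 'percent' }).format(0.2);
// Assumption: the currency code is supplied by the caller or library config.
const currencyNumber = new Intl.NumberFormat('en-US', {
  style: 'currency',
  currency: 'USD',
}).format(1234.5);

console.log(plainNumber);    // => "1,234.5"
console.log(percentNumber);  // => "20%"
console.log(currencyNumber); // => "$1,234.50"
```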

Custom number formats

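You may find the above options limited for your needs. There are, of course, ways to have more fine-grained control over your number formatting. The official ICU Java libraries, for example, offer a NumberFormatter class, which allows you to do things like the following.

// Classes come from ICU4J's com.ibm.icu.number and com.ibm.icu.util packages
NumberFormatter.with()
     .notation(Notation.compactShort())
     .unit(Currency.getInstance("EUR"))
     .precision(Precision.maxSignificantDigits(2))
     .locale(ULocale.US)
     .format(1234)
     .toString();  // €1.2K in en-US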

Every ICU library implements its number formatting a bit differently, so check the documentation of the library you're using to see how you can better control your number formats.
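
For comparison, JavaScript's built-in Intl.NumberFormat offers a similar level of control through its options object. The sketch below aims for the same kind of compact currency output as the ICU4J NumberFormatter mentioned above; it assumes a runtime recent enough to support the notation option and is not tied to any specific ICU message library.

```javascript
// Compact currency formatting with at most two significant digits.
const compactEuro = new Intl.NumberFormat('en-US', {
  style: 'currency',
  currency: 'EUR',
  notation: 'compact',
  compactDisplay: 'short',
  maximumSignificantDigits: 2,
});

console.log(compactEuro.format(1234)); // => "€1.2K"
```

Whether such options can be referenced from inside a message string (for example through a custom format name or a number skeleton) depends on the library.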

Date/time formatting

ICU has four predefined date formats: short, medium, long, and full. Date formats are specified using the syntax, {myDate, date, format}, where myDate is the date value argument, and format is one of the predefined formats.

published_on: Publié le {published, date, short}

format('published_on', { published: new Date('2019-10-31') });

// => "Publié le 31/10/19"
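
The predefined date formats map closely onto the dateStyle option of JavaScript's built-in Intl.DateTimeFormat. The following sketch shows a rough equivalent of the short French format, assuming a runtime that supports dateStyle. Pinning the time zone to UTC is a choice made for this example so that the result does not depend on where the code runs.

```javascript
// Rough equivalent of {published, date, short} for French.
// A date-only ISO string is parsed as midnight UTC, so we format in UTC too;
// otherwise a local time zone west of UTC would show the previous day.
const published = new Date('2019-10-31');
const shortFrench = new Intl.DateTimeFormat('fr', {
  dateStyle: 'short',
  timeZone: 'UTC',
});

// Day/month/year order; whether the year is printed with two or four digits
// depends on the CLDR data bundled with the runtime.
console.log(shortFrench.format(published));
```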

Custom date/time formatting

The official ICU spec contains support for pattern characters, called Date Field Symbols, that allow for precise control over date formatting. Again, every i18n library does its custom date/time formatting a bit differently, and you may need to supplement your i18n library with another date-specific solution for your date formats. Peruse the docs of your i18n library to see how it achieves or allows for fine-grained control over date formatting.

Even more

We've covered what we think are the most commonly used formats here. However, the ICU spec and, to varying degrees, third-party ICU libraries offer much more functionality, including ordinals (e.g. 1st, 2nd), units (e.g. km, lbs), durations, relative dates (e.g. yesterday), and more. Check out the documentation of your i18n library of choice to see which advanced formats are supported.

Wrapping up

The ICU message format is certainly one of the de-facto standards of translation messages in i18n. We hope that we've shed some light on some of your questions about ICU and that you've enjoyed our short guide to ICU and the ICU message format.

As soon as you start working with a team on a software internationalization project, few things will make you more efficient than a complete localization platform like the Phrase Localization Suite. Its dedicated software localization product, Phrase Strings, supports the ICU message format and gives your translators ICU syntax checking and highlighting.

ICU message format support in Phrase: syntax highlighting and easy plurals

Phrase Strings is a fully-featured software localization solution that helps product managers track their l10n progress, allows translators to work in an intuitive UI, gives developers the ability to sync translation files through the CLI, and much more. Leave the i18n pipeline to Phrase and stay focused on your product. Check out all Phrase features for developers, and see for yourself how they can help you take your software global easily.