
A Practical Guide to the ICU Message Format

The ICU message format syntax is used by many i18n libraries, and it can be a source of confusion in its own right. Let’s clear up some facts about it!

The International Components for Unicode (ICU) is a dependable open-source library suite in the world of software internationalization (i18n) and localization (l10n). Born from the innovative minds at Taligent, IBM, and Sun Microsystems and initially designed for Java and C/C++, ICU has since expanded its reach to encompass all major programming environments. Its standout feature is a comprehensive toolkit for Unicode text processing, complete with language-specific functions and a wealth of locale data. Plus, ICU shines when it comes to formatting plurals and selectors in translation messages. And it’s no slouch when it comes to parsing and formatting dates, times, and numbers, making it a gem in global software development.

Indeed, ICU has many features, and they can at times seem overwhelming. So this guide focuses on practical aspects of ICU (the portion you’re likely to use day-to-day). We’ll work through pragmatic examples in JavaScript, but you don’t need to be a JS expert to follow along. And we’ll provide links to ICU implementations in other programming languages.

🗒️ Internationalization (i18n) and localization (l10n) allow us to make our apps available in different languages and to different regions, often for more profit. If you’re new to i18n and l10n, check out our guide to internationalization.

How does ICU work with i18n libraries?

Let’s hit the ground running with our first example. Say we’re localizing an app: We often want to decouple our strings from the UI and put them into message files, one per locale.

# Part of our project folder structure
.
├── src
│   ├── feature1
│   └── feature2
└── translations
    ├── en-US.json
    ├── ar-EG.json
    └── fr-FR.json

// translations/en-US.json
{
  "hello": "Hello, world!"
}

// translations/ar-EG.json
{
  "hello": "مرحبا بالعالم!"
}

// translations/fr-FR.json
{
  "hello": "Bonjour, monde!"
}

// In our UI code

// We assume t() is provided by our i18n library,
// and resolves a translation message for the
// active locale given its key.
t("hello");

// en-US => "Hello, world!"
// fr-FR => "Bonjour, monde!"

Our i18n library then manages an active locale and displays our app translated in that locale using the translations from the appropriate file.

Our message displayed in the translation of the active locale.

Again, locale management and switching are usually handled by an i18n library like i18next or react-intl (JavaScript), the first-party Flutter intl library, or the i18n capabilities of Java Spring. ICU itself handles the formatting of messages. We’ve only seen plain text messages so far, so let’s explore what ICU formatting adds on top of them.
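To make the division of labor concrete, here is a minimal sketch of what a t() function could look like. The in-memory translations object, the setLocale() helper, and the fallback-to-key behavior are assumptions made for illustration; real i18n libraries load message files and manage the active locale in their own ways.

```javascript
// Hypothetical in-memory catalogs mirroring the JSON files above.
const translations = {
  "en-US": { hello: "Hello, world!" },
  "ar-EG": { hello: "مرحبا بالعالم!" },
  "fr-FR": { hello: "Bonjour, monde!" },
};

// The active locale is plain module state in this sketch.
let activeLocale = "en-US";

// Switch the active locale, refusing locales we have no messages for.
function setLocale(locale) {
  if (!Object.prototype.hasOwnProperty.call(translations, locale)) {
    throw new Error(`No translations loaded for locale "${locale}"`);
  }
  activeLocale = locale;
}

// Resolve a message by key for the active locale,
// falling back to the key itself when the message is missing.
function t(key) {
  return translations[activeLocale][key] ?? key;
}

console.log(t("hello")); // "Hello, world!"
setLocale("fr-FR");
console.log(t("hello")); // "Bonjour, monde!"
```

Everything here is lookup and bookkeeping. The formatting of the message itself is where ICU comes in.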

A note on locales

A locale defines a language, a region, and sometimes more. Locales are typically written as IETF BCP 47 language tags, which start with a language code like en for English, fr for French, or es for Spanish. Adding a region using its ISO 3166-1 alpha-2 code (e.g., BH for Bahrain, CN for China, US for the United States) is recommended for accurate date and number localization. So a complete locale might look like en-US for American English or zh-CN for Chinese as used in China.

🔗 Explore more language tags on Wikipedia and find country codes through the ISO’s search tool.
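If you want to take a locale tag apart in code, JavaScript’s built-in Intl.Locale can do it. This is a small sketch using only standard Intl APIs; the tags are the examples from the note above.

```javascript
// Intl.Locale parses a BCP 47 tag into its parts.
const chinese = new Intl.Locale("zh-CN");
console.log(chinese.language); // "zh"
console.log(chinese.region); // "CN"

// A language-only tag is still valid; its region is undefined.
const french = new Intl.Locale("fr");
console.log(french.language); // "fr"
console.log(french.region); // undefined

// Canonicalization normalizes casing, which helps when
// comparing tags that come from user input or config files.
console.log(Intl.getCanonicalLocales("EN-us")); // [ 'en-US' ]
```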

ICU Message Format

For simple translation messages like the ones above you might opt for a basic i18n library or even create your own, with no need for ICU. However, ICU’s strengths become evident in the complexity it can handle within translation messages. To showcase this, let’s dive into plurals. English has straightforward plural rules (e.g., “one tree” vs. “many trees”), but other languages may have no plural forms or extremely complex plural rules. ICU is equipped to handle all these variations.

// translations/en-US.js
{
  // English only has two plural forms,
  // one and other.
  "user_messages": `
    {count, plural,
      one {You have # message.}
      other {You have # messages.}
    }
  `
}

// translations/ar-EG.js
{
  // The same message in Arabic needs to
  // cover six plural forms.
  "user_messages": `
    {count, plural,
       zero {ليس لديك رسائل}
       one {لديك رسالة واحدة}
       two {لديك رسالتين}
       few {لديك # رسائل}
       many {لديك # رسالة}
       other {لديك # رسالة}
   }
`
}

// In our UI code
t("user_messages", { count: 9 });

// en-US => "You have 9 messages."
// ar-EG => "لديك ٩ رسائل"

The English plural message rendered for different count values.
The Arabic plural message rendered for different count values.

This module of ICU is known as the ICU Message Format, and it manages plurals, interpolation, and other complex logic within translation messages. We’ll cover plurals in more detail, as well as the other capabilities of the ICU Message Format syntax, a bit later in this guide. Our goal here is to give you a taste of the power of ICU.

The demo

The screen captures above are from an interactive ICU demo that we’ve built as a companion to this article. It’s a React app using the React Intl i18n library, which has excellent ICU support. You don’t need to know React or React Intl to play with the demo, however.

🔗 Play with ICU features in our interactive demo on StackBlitz.

🔗 You can grab the code of the demo from GitHub and run it on your machine instead. Just make sure you have a recent version of Node.js installed.

ICU-enabled libraries

It’s important to clarify at this point that while ICU is canonically a suite of libraries in Java and C++, it’s also become an unofficial standard. So odds are ICU support will either be built into your i18n library, or there will be ICU plugins/packages for your i18n library or programming language.

For example, the demo we built for this article uses the React Intl library, which sits on top of Format.JS, a JavaScript i18n solution that adheres well to the ICU “standard”.

So before we continue our round-up of ICU features, let’s go over some of the ICU implementations in a variety of programming environments.

C/C++

ICU4C — an official spec and implementation; this and the Java library are where ICU started. Of course, ICU4C is a complete implementation of ICU.

Java

ICU4J — the other official spec and implementation; this and the C/C++ library are the ICU originals. ICU4J is a complete implementation of ICU.

Java Guides

🔗 The Java i18n Guide You’ve Been Waiting For covers Java i18n with the ICU.

JavaScript

JavaScript doesn’t have an official, first-party i18n message format. But this is JavaScript we’re talking about, so of course you have your pick of third-party libraries to use for both browsers and Node.

  • Format.JS — built on the ICU message syntax standard, among others, Format.JS is a popular collection of JavaScript i18n libraries.
  • React Intl — another part of the Format.JS suite, React Intl provides React components and hooks for easy ICU integration with React apps. (The ICU demo playground we built for this article was made with React Intl).
  • Vue Intl — you guessed it: yet another part of Format.JS, Vue Intl eases the usage of ICU in Vue apps.
  • Next Intl — while not a part of Format.JS, next-intl does use the ICU Message Format and easily integrates into Next.js apps (both Pages Router and App Router).
  • Angular — Google’s popular front-end framework uses ICU in its first-party i18n solution.
  • i18next ICU module — an official ICU extension for the massively popular i18next library.

JavaScript Guides

🔗 A Guide to Localizing React Apps with react-intl/FormatJS

🔗 A Deep Dive into Next.js App Router Localization with next-intl

🔗 The Ultimate Guide to Angular Localization

Dart/Flutter

  • intl — the first-party Dart i18n package implements ICU message formatting.
  • Flutter i18n — based on Dart’s intl, Flutter’s first-party i18n library also uses ICU message formats.

Flutter Guides

🔗 The Ultimate Guide to Flutter Localization

PHP

Symfony — the popular web framework has first-party support for ICU messages.

PHP Guides

🔗 Symfony Internationalization: A Step-by-Step Guide

.NET

.NET globalization and ICU — Microsoft’s .NET environment has first-party ICU support.

Rust

ICU4X — a robust implementation of ICU for Rust.

This is just a small sample of ICU-enabled i18n libraries and packages. If you didn’t find something that works for your environment in the list above, make sure to search around more. There are many ICU implementations out there!

Interpolation

Let’s get back to how we use ICU. First, we’ll take a look at how the ICU message syntax allows us to inject runtime values into our translation messages, otherwise known as interpolation.

// translations/en-US.json
{
  "hello_user": "Hello, {name}!"
}

// translations/ar-EG.json
{
  "hello_user": "مرحبا {name}!"
}

// translations/fr-FR.json
{
  "hello_user": "Bonjour, {name}!"
}

// In our UI code.
// (i18n libraries often provide a
// key/value for injecting values
// at runtime).
t("hello_user", { name: "Adam" });

// en-US => "Hello, Adam!"
// ar-EG => "مرحبا Adam!"
// fr-FR => "Bonjour, Adam!"

We can have as many {variable}s in our messages as we want.

Adding runtime variables to a translation message and providing values for them.
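Under the hood, simple interpolation amounts to swapping each {variable} for its runtime value. The interpolate() helper below is a hypothetical sketch of that idea, not how any particular library implements it. It only understands plain {name} arguments and leaves unknown placeholders untouched.

```javascript
// Hypothetical helper: replaces {name}-style placeholders with values.
// It does not understand plural, select, or other ICU expressions.
function interpolate(message, values) {
  return message.replace(/\{(\w+)\}/g, (placeholder, name) =>
    Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])
      : placeholder
  );
}

console.log(interpolate("Hello, {name}!", { name: "Adam" }));
// "Hello, Adam!"

console.log(
  interpolate("{greeting}, {name}!", { greeting: "Bonjour", name: "Adam" })
);
// "Bonjour, Adam!"
```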

Plurals

We brushed over plurals a bit earlier, so let’s cover them in more detail. As we mentioned before, plurals in many languages are more complex than the relatively simple kind found in English: “one street” (the one form) and “three streets” (the other form). Each language has a different number of plural forms, and the rules around which form to use can get complicated.

🔗 The CLDR Language Plural Rules listing is the canonical source for languages’ plural forms. (We’ll cover the CLDR in the following section).

The ICU Message Format has full support for these plural rules. Let’s look at an English example first. The general syntax here is {counterVariable, plural, ...pluralForms}.

{
  "pokemon_count":
    `{count, plural,
      one {You have your first Pokémon!}
      other {You have collected # Pokémon!}
    }`
}

🗒️ Generally speaking, the other form is always required.

In our code, we would use this message like this:

t("pokemon_count", { count: 1 });

// => "You have your first Pokémon!"

The one and other forms of English plurals rendering for different counts.

Note that the special # character is swapped in for the value of count.

But what if we wanted a special case for zero Pokémon? No problem. The ICU plural syntax allows overriding any arbitrary value with the special form specifier, =n, where n is the integer case we’re overriding.

// Our translation message
{
  "pokemon_count":
    `{count, plural,
+     =0 {Go talk to Professor Oak!}
      one {You have your first Pokémon!}
      other {You have collected # Pokémon!}
    }`
}

// Our view code
t("pokemon_count", { count: 0 });

// => "Go talk to Professor Oak!"

We could of course use any integer in the =n specifier and have as many =n specifiers as we want in a message.
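To see how these pieces fit together, here is a rough sketch of the resolution order: an exact =n match wins, then the locale’s plural category, then other. The formatPlural() helper and its forms object are hypothetical and stand in for what an ICU message library does after parsing the message. The category lookup uses JavaScript’s built-in Intl.PluralRules.

```javascript
// Hypothetical helper, not a real library API. `forms` maps ICU-style
// selectors ("=0", "one", "other", ...) to message text.
function formatPlural(locale, count, forms) {
  const exact = `=${count}`;
  const category = new Intl.PluralRules(locale).select(count);

  // Exact matches win, then the plural category, then "other".
  const template = forms[exact] ?? forms[category] ?? forms.other;

  // Swap "#" for the count, formatted for the locale.
  const formattedCount = new Intl.NumberFormat(locale).format(count);
  return template.replace(/#/g, () => formattedCount);
}

const pokemonCount = {
  "=0": "Go talk to Professor Oak!",
  one: "You have your first Pokémon!",
  other: "You have collected # Pokémon!",
};

console.log(formatPlural("en-US", 0, pokemonCount));
// "Go talk to Professor Oak!"
console.log(formatPlural("en-US", 1, pokemonCount));
// "You have your first Pokémon!"
console.log(formatPlural("en-US", 5, pokemonCount));
// "You have collected 5 Pokémon!"
```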

As for the sometimes complicated rules around languages’ plural forms, ICU handles those too. Here’s the Arabic example from earlier:

The Arabic plural message rendered for different count values.

Arabic has six plural forms and its few, many, and other forms are not straightforward to resolve. For example, a count of 103 resolves to few, while 102 resolves to other. Arabic translators understand these nuances, and they can use the plural forms afforded by the ICU Message Format to ensure that Arabic readers get the best possible plural translation.
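You can check how a locale maps counts to plural categories with JavaScript’s built-in Intl.PluralRules. This is a quick exploration sketch. It relies on the plural data that ships with your JavaScript runtime rather than on an ICU message library, and the sample counts are arbitrary.

```javascript
// Cardinal plural rules for Arabic.
const arabicRules = new Intl.PluralRules("ar");

// Print the plural category each sample count resolves to.
for (const count of [0, 1, 2, 3, 11, 100, 102, 103]) {
  console.log(count, arabicRules.select(count));
}
// 0 zero
// 1 one
// 2 two
// 3 few
// 11 many
// 100 other
// 102 other
// 103 few
```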

🔗 Feel free to play with the plural messages on our StackBlitz demo. (Alternatively, grab the demo code from GitHub and run it on your machine).

🗒️ We cover plurals in more detail in our Guide to Localizing Plurals.

What is the CLDR?

Let’s take a brief sidebar and go over the CLDR (Common Locale Data Repository). The CLDR is the official Unicode collection of localization data. For a given locale, the CLDR can give you the locale’s script (letters), preferred calendar, number system, date formats, pluralization rules, and more. The CLDR is used by the main ICU project and other libraries that implement the ICU “standard”.

🗒️ CLDR data is commonly embedded in i18n libraries, and we often use it without much thought. However, CLDR data must sometimes be manually imported for specific app locales. Always check your i18n library’s documentation to determine if this step is necessary.
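In runtimes such as Node.js and modern browsers, the built-in Intl API is typically backed by CLDR data, so you can peek at some of it without installing anything. The snippet below is an exploratory sketch. The exact values printed for calendars and numbering systems depend on the CLDR version bundled with your runtime, so no output is shown for those.

```javascript
// Likely script for a locale, from CLDR's "likely subtags" data.
console.log(new Intl.Locale("ar-EG").maximize().script); // "Arab"

// Plural categories the runtime knows for English and Arabic.
const englishCategories = new Intl.PluralRules("en").resolvedOptions()
  .pluralCategories;
const arabicCategories = new Intl.PluralRules("ar").resolvedOptions()
  .pluralCategories;
console.log(englishCategories.length, arabicCategories.length); // 2 6

// Default calendar and numbering system resolved for a locale.
const resolved = new Intl.DateTimeFormat("ar-SA").resolvedOptions();
console.log(resolved.calendar, resolved.numberingSystem);
```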

🔗 Check out the official CLDR documentation for more info.

Ordinal plurals

Alright, back to the ICU Message Format, and what we can do with it. The plurals we covered above are technically called cardinal plurals. Another kind is ordinal plurals: These represent number ranks (e.g., first, second, third), and languages like English have special ordinal forms (e.g., 1st, 2nd, 3rd).

Just as with cardinals, languages vary in the number of ordinal forms they have, if any. For example, English has four ordinal forms (one, two, few, and other). Many languages have 2-3 ordinal forms; some have only the other form.

🔗 The CLDR Language Plural Rules listing covers both languages’ cardinal and ordinal plural forms.

🗒️ We cover ordinal plurals in more detail in our Guide to Localizing Plurals.

As you’ve probably guessed, the ICU Message Format has excellent support for ordinal plurals. We designate an ordinal in an ICU message using the selectordinal keyword.

// English message
{
  "trees_planted":
  `{count, selectordinal,
     one {We planted our #st tree.}
     two {We planted our #nd tree.}
     few {We planted our #rd tree.}
     other {We planted our #th tree.}
  }`
}

Just like cardinal plurals, the # character is replaced with the given count at render time.

// In our view code
t("trees_planted", { count: 3 });

// => "We planted our 3rd tree."
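The one, two, few, and other branches in the message above correspond to the ordinal categories a locale defines. As a sketch of how those categories are chosen, JavaScript’s built-in Intl.PluralRules can be asked for ordinal rules with the type option. The toOrdinal() helper and the suffix table are illustrative and only make sense for English.

```javascript
// Ordinal (not cardinal) plural rules for English.
const ordinalRules = new Intl.PluralRules("en-US", { type: "ordinal" });

// English suffix for each ordinal category.
const suffixes = { one: "st", two: "nd", few: "rd", other: "th" };

// Hypothetical helper: append the English ordinal suffix to a number.
function toOrdinal(n) {
  return `${n}${suffixes[ordinalRules.select(n)]}`;
}

console.log([1, 2, 3, 4, 11, 21, 22, 23].map(toOrdinal).join(", "));
// "1st, 2nd, 3rd, 4th, 11th, 21st, 22nd, 23rd"
```

Note how 11 resolves to other while 21 resolves to one. That is the kind of rule you get for free from the locale data instead of hand-coding it.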

Select

A general conditional expression in ICU messages is select: It functions like switch or match in programming languages.

Just like plural and selectordinal, for select we provide a runtime variable and a few branches in the message. What’s different here is that we determine what the branches are and what they mean.

// English message
{
  "fruit_picking":
    `{fruit, select,
       apple {Let's pick apples!}
       cherry {Let's pick cherries!}
       other {Let's pick fruits!}
     }`
}

// In our view code

t("fruit_picking", { fruit: "cherry" });

// => "Let's pick cherries!"

The select keyword is used to designate the message as a conditional. The other branch is a fallback (like the default case in a switch statement). Again, the keywords to resolve the branches are entirely up to us here. In the above message, we went with fruits (apple | cherry). Here’s a more typical example:

Using select to choose the proper gender pronoun.

🗒️ Note that in the above example, the select expression is part of a bigger message. This can be done with plural and selectordinal expressions as well.
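For readers who can’t see the screenshot, here is a sketch of the same idea in plain JavaScript. The message wording and the select() helper are hypothetical and are not taken from the demo. The helper mimics how a select expression picks a branch and falls back to other, and the surrounding template string plays the role of the bigger message.

```javascript
// A comparable ICU message might read (hypothetical wording):
// "{gender, select, male {He} female {She} other {They}} replied to your comment."

// Hypothetical helper: pick the branch matching `value`, else "other".
function select(value, branches) {
  return Object.prototype.hasOwnProperty.call(branches, value)
    ? branches[value]
    : branches.other;
}

// The select result is embedded in a larger message.
function replyNotice(gender) {
  const pronoun = select(gender, { male: "He", female: "She", other: "They" });
  return `${pronoun} replied to your comment.`;
}

console.log(replyNotice("female")); // "She replied to your comment."
console.log(replyNotice("unknown")); // "They replied to your comment."
```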

🔗 Feel free to play with the previous examples in our StackBlitz demo. (Alternatively, grab the demo code from GitHub and run it on your machine).

Number formatting

Let’s switch gears and look at numbers. We’ll start with numbers formatted outside of messages. For this, we’ll need to look at an official ICU implementation like ICU4J (Java). This is because our demo app, built with React Intl, formats our numbers with JavaScript’s Intl.NumberFormat, which isn’t an ICU API (although it offers similar capabilities).

Alright, let’s look at a Java example.

import java.util.Locale;
import com.ibm.icu.number.NumberFormatter;
import com.ibm.icu.number.Notation;
import com.ibm.icu.util.Currency;

class Main {
  public static void main(String[] args) {
    Locale[] locales = {
        new Locale("zh", "CN"), // Mandarin Chinese (China)
        new Locale("es", "ES"), // Spanish (Spain)
        new Locale("en", "US"), // English (United States)
        new Locale("hi", "IN"), // Hindi (India)
        new Locale("bn", "BD"), // Bengali (Bangladesh)
        new Locale("pt", "BR"), // Portuguese (Brazil)
        new Locale("ru", "RU"), // Russian (Russia)
        new Locale("ja", "JP"), // Japanese (Japan)
        new Locale("mr", "IN"), // Marathi (India)
        new Locale("fr", "FR"), // French (France)
        new Locale("ar", "SA"), // Arabic (Saudi Arabia)
        new Locale("id", "ID"), // Indonesian (Indonesia)
    };

    // NumberFormat
    System.out.println("Format a number");
    System.out.println("---------------");

    for (var locale : locales) {
      var formattedNumber = NumberFormatter.with()
          .notation(Notation.compactShort())
          .unit(Currency.getInstance("EUR"))
          .locale(locale)
          .format(1234)
          .toString();

      var output = String.format("%s: %s", locale, formattedNumber);

      System.out.println(output);
    }
  }
}

🗒️ Make sure to install ICU4J if you want to run the code above yourself.

Here’s the output:

Format a number
---------------
zh_CN: €1200
es_ES: 1,2 mil €
en_US: €1.2K
hi_IN: €1.2 हज़ार
bn_BD: ১.২ হা€
pt_BR: € 1,2 mil
ru_RU: 1,2 тыс. €
ja_JP: €1200
mr_IN: €१.२ ह
fr_FR: 1,2 k €
ar_SA: ١٫٢ ألف €
id_ID: €1,2 rb

Note how we were able to use Notation.compactShort() to show an abbreviated version of the number (1.2K). Of course, we’re also showing the number as a currency (Euro) value using .unit(Currency.getInstance("EUR")).

ICU is extremely powerful and flexible when it comes to number formatting. We can achieve the above formatting much more succinctly using number skeletons, which are compact tokens that define a format.

import java.util.Locale;
import com.ibm.icu.number.NumberFormatter;

class Main {
  public static void main(String[] args) {
    Locale[] locales = {
        new Locale("zh", "CN"),
        new Locale("es", "ES"),
        // ...
    };

    //...

    for (var locale : locales) {
      var formatter = NumberFormatter
        // Use a number skeleton.
        // "K" is equivalent to `Notation.compactShort()`
        .forSkeleton("currency/EUR K")
        .locale(locale);

      var output = formatter.format(1234).toString();

      System.out.println(output);
    }
  }
}

This gives us the same formatted numbers we got before (this version just doesn’t print the locale prefix).

We can use number skeletons inside of ICU messages as well. We just need to use the special syntax, {variable, number, ::skeleton}. We can use these skeletons with React Intl (and many other libraries that support the ICU Message Format).

// English message
// We use a percent skeleton here.
{
  "completion": "{completion, number, ::percent}"
}

// We can use the above message for all our locale
// translation files, which we're assuming here. We
// can alternatively use a different number format
// for each locale.

// In our view code:
t("completion", { completion: 0.85 });

// en-US => "85%"
// bn-BD => "৮৫%"
// ar-SA => "٨٥٪؜"

Here’s another example from our demo playground:

Currency and compact short skeletons used to render a number in various locales.

🗒️ Keep in mind that our demo playground utilizes React Intl, which supports ICU number skeletons in translation messages, but leverages JavaScript’s Intl.NumberFormat under the hood to format them. While many of ICU’s number skeletons are likely to be compatible with React Intl, be aware that any unsupported ones might be due to limitations in Intl.NumberFormat.
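To give a feel for what happens underneath, here is a sketch of Intl.NumberFormat options that roughly correspond to the currency/EUR K and percent skeletons used in this section. The mapping is our approximation, not React Intl’s actual translation code, and exact output (spacing, symbols, digits) varies by locale and runtime version, so no output is shown.

```javascript
const locales = ["en-US", "fr-FR", "ar-SA"];

for (const locale of locales) {
  // Roughly what the "currency/EUR K" skeleton asks for:
  // a Euro amount in short compact notation.
  const compactEuro = new Intl.NumberFormat(locale, {
    style: "currency",
    currency: "EUR",
    notation: "compact",
    compactDisplay: "short",
  });

  // Roughly what the "percent" skeleton asks for.
  const percent = new Intl.NumberFormat(locale, { style: "percent" });

  console.log(locale, compactEuro.format(1234), percent.format(0.85));
}
```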

🗒️ Dive deeper into number localization with our Concise Guide to Number Localization.

Date formatting

ICU’s flexibility and power extend fully to date formatting. Let’s again start by formatting dates outside of messages, returning to the official Java implementation, ICU4J. (Our demo app, built with React Intl, uses JavaScript’s Intl.DateTimeFormat, which, like Intl.NumberFormat, offers capabilities akin to ICU but does not fully conform to it.)

Here’s an example in Java:

import java.util.Date;
import java.util.Locale;
import com.ibm.icu.text.SimpleDateFormat;

class Main {
  public static void main(String[] args) {
    Locale[] locales = {
        new Locale("zh", "CN"), // Mandarin Chinese (China)
        new Locale("es", "ES"), // Spanish (Spain)
        new Locale("en", "US"), // English (United States)
        new Locale("hi", "IN"), // Hindi (India)
        new Locale("bn", "BD"), // Bengali (Bangladesh)
        new Locale("pt", "BR"), // Portuguese (Brazil)
        new Locale("ru", "RU"), // Russian (Russia)
        new Locale("ja", "JP"), // Japanese (Japan)
        new Locale("mr", "IN"), // Marathi (India)
        new Locale("fr", "FR"), // French (France)
        new Locale("ar", "SA"), // Arabic (Saudi Arabia)
        new Locale("id", "ID"), // Indonesian (Indonesia)
    };

    // SimpleDateFormat
    System.out.println("Format a date");
    System.out.println("-------------");

    Date currentDate = new Date();

    for (var locale : locales) {
      // Use a date pattern to define the
      // format.
      var dateFormat = new SimpleDateFormat(
          "yyyy.MMMM.dd GGG hh:mm aaa", locale);

      var formattedDate = dateFormat.format(currentDate);

      var output = String.format("%s: %s", locale, formattedDate);

      System.out.println(output);
    }
  }
}

🗒️ Make sure to install ICU4J if you want to run the above code.

This will output the current date in different formats based on the locale:

Format a date
-------------
zh_CN: 2023.十二月.22 公元 12:00 下午
es_ES: 2023.diciembre.22 d. C. 12:00 p. m.
en_US: 2023.December.22 AD 12:00 PM
hi_IN: 2023.दिसंबर.22 ईस्वी 12:00 pm
bn_BD: ২০২৩.ডিসেম্বর.২২ খৃষ্টাব্দ ১২:০০ PM
pt_BR: 2023.dezembro.22 d.C. 12:00 PM
ru_RU: 2023.декабря.22 н. э. 12:00 PM
ja_JP: 2023.12月.22 西暦 12:00 午後
mr_IN: २०२३.डिसेंबर.२२ इ. स. १२:०० PM
fr_FR: 2023.décembre.22 ap. J.-C. 12:00 PM
ar_SA: ١٤٤٥.جمادى الآخرة.٠٩ هـ ١٢:٠٠ م
id_ID: 2023.Desember.22 M 12:00 PM

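Note that we passed SimpleDateFormat a date pattern above, so the field order and the literal dots stay the same in every locale while names and digits are localized. ICU also supports date skeletons, which name only the fields to show and leave their order and punctuation to the locale. There’s a wide variety of formatting options available here, and skeletons can be used in ICU messages as well. As you may have guessed, we need a special syntax in our messages to format dates: {variable, date, ::skeleton}.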

Let’s see an example in JavaScript:

// Translation message
{
  "eventDate": "{date, date, ::yyyyMMMd}"
}

// Again, we assume that we're using the same
// message for each locale (but we could use
// a different date format for each locale if
// we wanted to).

// In our view code:
t("eventDate", { date: new Date("2024-03-12") });

// en-US => "Mar 12, 2024"
// es-ES => "12 mar 2024"
// zh-CN => "2024年3月12日"

🔗 In our demo playground, you can see different date formats rendered for various locales.

🗒️ Much like with number formatting, our demo playground only supports a subset of ICU datetime skeletons. This is because our demo is built with React Intl, which uses JavaScript’s Intl.DateTimeFormat for its actual formatting. So some ICU datetime skeletons may not be supported due to Intl.DateTimeFormat’s limitations.
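As a sketch of what that looks like underneath, here are Intl.DateTimeFormat options that roughly correspond to the yyyyMMMd skeleton (year, abbreviated month, day). The mapping is our approximation rather than React Intl’s actual code. One detail worth knowing: a date-only string like "2024-03-12" is parsed as midnight UTC, so we format in UTC here to keep the calendar day from shifting in time zones west of UTC.

```javascript
// Date-only ISO strings are parsed as midnight UTC.
const eventDate = new Date("2024-03-12");

// Roughly what the "yyyyMMMd" skeleton asks for.
const eventDateOptions = {
  year: "numeric",
  month: "short",
  day: "numeric",
  // Format in UTC so the day does not depend on the machine's time zone.
  timeZone: "UTC",
};

for (const locale of ["en-US", "es-ES", "zh-CN"]) {
  const formatter = new Intl.DateTimeFormat(locale, eventDateOptions);
  console.log(locale, formatter.format(eventDate));
}
```

The field order and punctuation come from each locale’s data, so the same options produce differently arranged dates per locale.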

🔗 We’re just giving you a taste of what the ICU can offer regarding dates. The libraries can also format relative datetimes, work with time zones, parse dates, and much more.

🔗 Feel free to play with the previous examples in our StackBlitz demo. (Alternatively, grab the demo code from GitHub and run it on your machine).

A bridge across language barriers

That brings us to the end of this ICU guide. In our connected world, the role of tools like ICU in i18n and l10n is crucial. With its proficiency in handling plurals, selectors, and complex formats, ICU can be an incredible tool for developers aiming to present software effectively across different languages and regions. And we’ve just scratched the surface of what ICU can do. We hope that this guide has been helpful to you as a start to your work with ICU as you develop software for everyone.