Software localization

What’s the Best Database Structure to Keep Multilingual Data?

There are many ways of designing databases to accommodate localized, multilingual data. Let's explore the most common ones and try to choose the best one.
Software localization blog category featured image | Phrase

There are many ways to design our databases to accommodate localized data. Modeling can be simple or sophisticated depending on our app's needs, and there are general considerations for any database that works with a localized app.

A Simple Approach: Localized Columns on the Model

The most basic approach to localizing relational database data is localizing within the model itself. This is as simple as having title_en and title_fr columns instead of a title column, for example.

CREATE TABLE `articles_i18n_simple` (

  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,

  `published_at` datetime NOT NULL,

  `title_en` varchar(2000) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',

  `title_fr` varchar(2000) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',

  `body_en` text COLLATE utf8mb4_unicode_ci,

  `body_fr` text COLLATE utf8mb4_unicode_ci,

  PRIMARY KEY (`id`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

We could then query our table like normally do (without i18n) and then refine our data in the application layer, extracting the title and body that match the active locale in the app. This is a good solution for small apps, especially ones where we're sure that our supported locales won't change.

A Sophisticated Approach: Separate Translation Tables

While our simple approach above is quick and easy to use, it's a bit inflexible. If we added Hebrew to the above table, we would have to add new Hebrew versions for every localizable column. A solution that scales better is having translations in a separate table per model.

CREATE TABLE `articles_i18n` (

  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,

  `published_at` datetime NOT NULL,

  PRIMARY KEY (`id`)

) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE `articles_i18n_translations` (

  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,

  `article_id` int(11) NOT NULL,

  `locale` varchar(5) COLLATE utf8mb4_unicode_ci NOT NULL,

  `title` varchar(2000) COLLATE utf8mb4_unicode_ci NOT NULL,

  `body` text COLLATE utf8mb4_unicode_ci,

  PRIMARY KEY (`id`)

) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Of course, this makes our queries a bit more complex and expensive, since we may have to join or perform two queries to get all our model data.

SElECT articles_i18n.id AS id, published_at, title, body

FROM articles_i18n

INNER JOIN articles_i18n_translations ON article_id = articles_i18n.id

WHERE locale = "fr";

With that complexity, however, comes flexibility: we can have as many locales as we want, and it wouldn't change the basic structure of our models.

General Considerations for Database i18n

Unicode

Notice that the tables and text fields above are defined with utfmb4_unicode_ci. This collation gives us full, case-insensitive Unicode – the storage and sorting of our text data will include all characters in all languages supported by the Unicode standard. Other collations can cause problems with certain characters, so do make sure you know which locales your app is supporting when selecting a collation if you're not going with utfmb4_unicode_ci.

Dates and Times

It's a good idea to store dates and times in UTC, and when reading the data, add or subtract a time zone offset to get the date or time in the user's time zone. Optionally, if you want to log the time zone the data was stored in, include a second time zone offset column for each date/time column, while still keeping the date and time column themselves in UTC.

Further reading

If you're interested in how to speed up your ActiveRecord SQL queries for your internationalized database, have a look at our guide.

For a relaxed i18n experience, check out Phrase. Built by developers for developers, Phrase is a one-stop-shop for all your i18n needs. With its GitHub, GitLab, and Bitbucket sync, and flexible CLI and API, you can tailor your workflow to automatically push translation strings to Phrase. Your translation team picks up the strings in a well-designed web interface, localizes them, and saves them. You can then seamlessly pull these translations down to your dev machine and get back to the creative code you love. Phrase supports a comprehensive collection of platforms. And if you happen to be working in iOS or Android, you can push your translations over-the-air (OTA) without app reviews. Yes, that's your developer's heart smiling, and that's why we're here. Check out all of Phrase's features and try Phrase for free.