Detecting a User’s Locale in a Web App

One of the most common issues in web app development is detecting a user's locale. This is how to do it the right way.

Whether we're building a simple blog or a sophisticated single-page application (SPA), internationalizing (i18n) a web app raises an important question early on: how do we detect a user's language preference? We want to provide the best user experience, so if the user has defined a set of preferred languages in the browser, we should do our best to present our content in those languages.

In this article, we'll go through three ways of detecting a user's locale: the browser's navigator.languages object (on the client), the Accept-Language HTTP header (on the server), and geolocation using the user's IP address (on the server).

Client-side: The navigator.languages Object

Modern browsers provide a navigator.languages array that lists the preferred languages the user has set in the browser, in order of preference.


The language settings in Firefox

Given the settings above, if we were to open the Firefox console and check the value of navigator.languages, we would get the following:


The codes for the locales match the ones in our browser settings

navigator.languages is available in all modern web browsers and is generally safe to rely on. So let's write a reusable JavaScript function that tells us the preferred language(s) of the current user.

function getBrowserLocales(options = {}) {
  const defaultOptions = {
    languageCodeOnly: false,
  };
  const opt = {
    ...defaultOptions,
    ...options,
  };
  // Prefer the full list; fall back on the single navigator.language value
  const browserLocales =
    navigator.languages === undefined
      ? [navigator.language]
      : navigator.languages;
  if (!browserLocales) {
    return undefined;
  }
  return browserLocales.map(locale => {
    const trimmedLocale = locale.trim();
    // "en-CA" or "en_CA" becomes "en" when languageCodeOnly is set
    return opt.languageCodeOnly
      ? trimmedLocale.split(/-|_/)[0]
      : trimmedLocale;
  });
}

getBrowserLocales() checks the navigator.languages array, falling back on navigator.language if the array isn't available. It's worth noting that in some browsers, like Chrome, navigator.language will be the UI language, which is likely the language the operating system is set to. This is different from navigator.languages, which holds the preferred languages the user has set in the browser itself.

✋🏽 Heads up » If you're supporting Internet Explorer, you will need to use the navigator.userLanguage and navigator.browserLanguage properties. You will also need to replace the modern syntax in the code above (const, default parameters, object spread, and the arrow function) with ES5 equivalents, or run it through a transpiler.
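For older browsers, a rough ES5-only variant might look like the sketch below. The name getBrowserLocalesLegacy and the choice to pass the navigator object in as a parameter are our own assumptions for illustration, and unlike the function above it returns an empty array rather than undefined when nothing is found.

```javascript
// ES5-only sketch: no const, arrow functions, spread, or default parameters.
// `nav` is passed in so the function can also be exercised outside a browser;
// in a page you would call getBrowserLocalesLegacy(window.navigator).
function getBrowserLocalesLegacy(nav, languageCodeOnly) {
  var locales = [];
  if (nav.languages && nav.languages.length) {
    // Copy the array-like value into a plain array
    locales = Array.prototype.slice.call(nav.languages);
  } else {
    // Older browsers: try each single-value property in turn
    var single = nav.language || nav.userLanguage || nav.browserLanguage;
    if (single) {
      locales = [single];
    }
  }
  var result = [];
  for (var i = 0; i < locales.length; i++) {
    // Regex trim, since very old engines lack String.prototype.trim
    var trimmed = String(locales[i]).replace(/^\s+|\s+$/g, '');
    result.push(languageCodeOnly ? trimmed.split(/-|_/)[0] : trimmed);
  }
  return result;
}
```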

Our function also has a convenient languageCodeOnly option, which will trim off the country codes of locales before it returns them. This can be handy when our app isn't really handling the regional nuances of a language, e.g. we just have one version of English content.


With languageCodeOnly: true, we get the languages without countries
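Once we have the user's preferred locales, we usually need to match them against the locales our app actually ships. Here is a minimal sketch of one way to do that. The helper name pickSupportedLocale, the supported list, and the 'en' fallback are all hypothetical and not part of any browser API.

```javascript
// Hypothetical helper: pick the first preferred locale that the app supports.
// `preferred` is an array like the one getBrowserLocales() returns;
// `supported` is the list of locales our app ships (an assumption here).
function pickSupportedLocale(preferred, supported, fallback = 'en') {
  const normalize = (locale) => locale.trim().toLowerCase().replace(/_/g, '-');
  const supportedByKey = new Map(supported.map((s) => [normalize(s), s]));

  for (const locale of preferred || []) {
    const key = normalize(locale);
    // 1. Exact match, e.g. "en-CA" -> "en-CA"
    if (supportedByKey.has(key)) return supportedByKey.get(key);
    // 2. Language-only match, e.g. "ar-EG" -> "ar"
    const languageOnly = key.split('-')[0];
    if (supportedByKey.has(languageOnly)) return supportedByKey.get(languageOnly);
  }
  return fallback;
}

pickSupportedLocale(['fr-CA', 'ar-EG', 'en'], ['en', 'ar']); // => 'ar'
```

Walking the preferred list in order respects the user's ranking: here French isn't supported, so the second choice, Arabic, wins over English.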

Server-side: The Accept-Language HTTP Header

If the user sets his or her language preferences in a modern browser, the browser will, in turn, send an HTTP header that relays these language preferences to the server with each request. This is the Accept-Language header, and it often looks something like this: Accept-Language: en-CA,ar-EG;q=0.5.

The header lists the user's preferred languages, each with a weight given by a q value between 0 and 1. When an explicit q value is not specified, a default of 1.0 is assumed. So, in the above header value, the client is indicating that the user prefers Canadian English (with a weight of q = 1.0), then Egyptian Arabic (with a weight of q = 0.5).

We can use this standard HTTP header to determine the user's preferred locales. Let's write a class called HttpAcceptLanguageHeaderLocaleDetector to do this. We'll use PHP here, but you can use any language you like; the Accept-Language header should be the same (or similar enough) in all environments.

<?php
class HttpAcceptLanguageHeaderLocaleDetector
{
  const HTTP_ACCEPT_LANGUAGE_HEADER_KEY = 'HTTP_ACCEPT_LANGUAGE';

  public static function detect()
  {
    $httpAcceptLanguageHeader = static::getHttpAcceptLanguageHeader();
    if ($httpAcceptLanguageHeader == null) {
      return [];
    }
    $locales = static::getWeightedLocales($httpAcceptLanguageHeader);
    $sortedLocales = static::sortLocalesByWeight($locales);
    return array_map(function ($weightedLocale) {
      return $weightedLocale['locale'];
    }, $sortedLocales);
  }

  private static function getHttpAcceptLanguageHeader()
  {
    if (isset($_SERVER[static::HTTP_ACCEPT_LANGUAGE_HEADER_KEY])) {
      return trim($_SERVER[static::HTTP_ACCEPT_LANGUAGE_HEADER_KEY]);
    } else {
      return null;
    }
  }

  private static function getWeightedLocales($httpAcceptLanguageHeader)
  {
    if (strlen($httpAcceptLanguageHeader) == 0) {
      return [];
    }
    $weightedLocales = [];
    // We break up the string 'en-CA,ar-EG;q=0.5' along the commas,
    // and iterate over the resulting array of individual locales. Once
    // we're done, $weightedLocales should look like
    // [['locale' => 'en-CA', 'q' => 1.0], ['locale' => 'ar-EG', 'q' => 0.5]]
    foreach (explode(',', $httpAcceptLanguageHeader) as $locale) {
      // separate the locale key ("ar-EG") from its weight ("q=0.5")
      $localeParts = explode(';', $locale);
      // trim so that 'en-CA, ar-EG' (space after the comma) parses cleanly
      $weightedLocale = ['locale' => trim($localeParts[0])];
      if (count($localeParts) == 2) {
        // explicit weight e.g. 'q=0.5'
        $weightParts = explode('=', $localeParts[1]);
        // grab the '0.5' bit and parse it to a float; if the weight is
        // malformed (no '='), fall back on the implicit weight
        $weightedLocale['q'] = count($weightParts) == 2
          ? floatval($weightParts[1])
          : 1.0;
      } else {
        // no weight given in string, i.e. implicit weight of 'q=1.0'
        $weightedLocale['q'] = 1.0;
      }
      $weightedLocales[] = $weightedLocale;
    }
    return $weightedLocales;
  }

  /**
   * Sort by high to low `q` value
   */
  private static function sortLocalesByWeight($locales)
  {
    usort($locales, function ($a, $b) {
      // usort will cast float values that we return here into integers,
      // which can mess up our sorting. So instead of subtracting the `q`
      // values and returning the difference, we compare the `q` values and
      // explicitly return integer values.
      if ($a['q'] == $b['q']) {
        return 0;
      }
      if ($a['q'] > $b['q']) {
        return -1;
      }
      return 1;
    });
    return $locales;
  }
}

This long bit of code is actually not very complicated. In the only public method, detect(), our class does the following:

  1. Gets the raw string value of the Accept-Language header, e.g. "en-CA,ar-EG;q=0.5"
  2. Uses the helper method getWeightedLocales() to parse the header string into an array that looks like [['locale' => 'en-CA', 'q' => 1.0], ['locale' => 'ar-EG', 'q' => 0.5]].
  3. Uses the helper method sortLocalesByWeight() to sort the above array from highest to lowest q value.
  4. Plucks the locale values from the sorted array, returning an array that looks like ['en-CA', 'ar-EG'].

We can now use our new class to get a nice, consumable array of locale codes based on the Accept-Language HTTP header.

<?php
$locales = HttpAcceptLanguageHeaderLocaleDetector::detect();
// => ['en-CA', 'ar-EG']
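Since the header has the same shape in every environment, the same parsing can be sketched in JavaScript, for example on a Node.js server. This is a rough equivalent and not a full implementation of the HTTP specification: the function name parseAcceptLanguage is our own, malformed weights are treated as 0, and the wildcard value * is passed through like any other entry.

```javascript
// Sketch: parse an Accept-Language value into locales sorted by q (high to low).
function parseAcceptLanguage(header) {
  if (typeof header !== 'string' || header.trim() === '') {
    return [];
  }
  return header
    .split(',')
    .map((part, index) => {
      const [locale, ...params] = part.split(';').map((s) => s.trim());
      // Look for a "q=..." parameter; default to 1.0 when absent
      const qParam = params.find((p) => p.toLowerCase().startsWith('q='));
      const q = qParam ? parseFloat(qParam.slice(2)) : 1.0;
      return { locale, q: Number.isNaN(q) ? 0 : q, index };
    })
    .filter((entry) => entry.locale !== '')
    // Sort by weight; keep the header's order for equal weights
    .sort((a, b) => b.q - a.q || a.index - b.index)
    .map((entry) => entry.locale);
}

parseAcceptLanguage('en-CA,ar-EG;q=0.5'); // => ['en-CA', 'ar-EG']
```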

Server-side: Geolocation by IP Address

Sometimes the Accept-Language header won't be present in the requests to our server. In these cases we might want to use the user's IP address to determine the user's country, and infer the locale or language from this country.

✋🏽 Heads up » Geolocation should be used as a last resort when detecting the user's locale, as it can often lead to an incorrect locale determination. For example, if we see that our user is coming from Canada, do we assume that his or her preferred language is English or French? Both are official and widely used languages in the country. And, of course, the user could belong to an Arabic-speaking minority, or be a Spanish-speaking visitor.

Using MaxMind for Geolocation

In order to determine the user's country by the request's IP address, we'll use the MaxMind PHP API and the MaxMind geolocation database. MaxMind is a company that offers a few IP-related products, and among them are two that are of interest to us here:

  • The GeoIP2 Databases: MaxMind's commercial geolocation databases, which are low-latency and subscription-based. You may want to upgrade to these if you need more up-to-date or faster databases.
  • The GeoLite2 Databases: MaxMind's free geolocation databases. While reportedly less accurate than their commercial counterparts, they're more than enough to get started with. We'll be using a GeoLite2 database here. Do note that you will need to credit MaxMind on your public web page and link back to their site if you use one of their free databases.

To install the database, just sign up for a free MaxMind account. You'll receive an email with a sign-in link. Follow the link and sign in. Once you do, you should land on your Account Summary page.


Click the Download Databases link on the Account Summary page

This will take you to a page with the list of free GeoLite2 databases. Grab the country binary database from there.


We want the country binary database for our purposes

Place the file you downloaded somewhere in your project.

We'll also need the MaxMind PHP API to work with the database. We can install that with Composer.

composer require geoip2/geoip2:~2.0

Peter Kahl's Country-to-Locale Package

We'll need one more package before we get to our code. In order to determine the locales or languages of a country, we'll use Peter Kahl's country-to-locale package. We can install it using Composer as well.

composer require peterkahl/country-to-locale

The IP Address Locale Detector Class

With our setup in place, we can get to our own class, IpAddressLocaleDetector.

<?php
require '../vendor/autoload.php';

use GeoIp2\Database\Reader;
use peterkahl\locale\locale;

class IpAddressLocaleDetector
{
  const MAX_MIND_DB_FILEPATH =
    __DIR__ . '/GeoLite2-Country_20200121/GeoLite2-Country.mmdb';

  private static $maxMindDbReader;

  public static function detect()
  {
    $ipAddress = static::getIpAddress();
    try {
      $record = static::getMaxMindDbReader()->country($ipAddress);
      $locales = locale::country2locale($record->country->isoCode);
      $normalizedLocales = str_replace('_', '-', $locales);
      return explode(',', $normalizedLocales);
    } catch (Exception $ex) {
      // Return an empty array, like HttpAcceptLanguageHeaderLocaleDetector,
      // so that callers can safely count() the result
      return [];
    }
  }

  private static function getIpAddress()
  {
    return $_SERVER['REMOTE_ADDR'];
  }

  private static function getMaxMindDbReader()
  {
    // Create the database reader once and reuse it on later calls
    if (static::$maxMindDbReader == null) {
      static::$maxMindDbReader = new Reader(static::MAX_MIND_DB_FILEPATH);
    }
    return static::$maxMindDbReader;
  }
}

Our class is relatively straightforward. Much like HttpAcceptLanguageHeaderLocaleDetector, it has one public method, detect(), which does the following:

  1. Gets the request's IP address from the global $_SERVER array.
  2. Feeds this IP address to the MaxMind database Reader's country method, which attempts to geolocate a country based on the IP address.
  3. Uses Peter Kahl's locale::country2locale() to get the languages of the given country.
  4. Normalizes the acquired locales, so that "en_CA,ar_EG" becomes "en-CA,ar-EG".
  5. Returns the locales it normalized as an array, e.g. ["en-CA", "ar-EG"].
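
To make the last three steps concrete, here is a small JavaScript sketch of the same lookup and normalization. The SAMPLE_COUNTRY_LOCALES table is a made-up stand-in for the country-to-locale package's data, and the function names are hypothetical. Only the string handling mirrors what the PHP class does.

```javascript
// Hypothetical sample data, NOT the real package's table.
const SAMPLE_COUNTRY_LOCALES = {
  CA: 'en_CA,fr_CA',
  EG: 'ar_EG',
};

// Steps 4 and 5: turn "en_CA,fr_CA" into ["en-CA", "fr-CA"].
function normalizeLocaleList(commaSeparated) {
  return String(commaSeparated || '')
    .split(',')
    .map((locale) => locale.trim().replace(/_/g, '-'))
    .filter((locale) => locale.length > 0);
}

// Step 3 onward: look up a country's locales, or [] if the country is unknown.
function localesForCountry(isoCode) {
  return normalizeLocaleList(SAMPLE_COUNTRY_LOCALES[isoCode]);
}

localesForCountry('CA'); // => ['en-CA', 'fr-CA']
```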

📖 Go deeper » The MaxMind Reader has many more methods. Check out the official API documentation if you want to dive a bit deeper into the info available in the MaxMind databases.

Server-side: Cascading Locale Detection

Given the two server-side detection strategies we covered above, we can write a little detect_user_locales() function that attempts the HTTP header strategy first and falls back on IP geolocation only if the header yields nothing.

<?php
require './HttpAcceptLanguageHeaderLocaleDetector.php';
require './IpAddressLocaleDetector.php';

function detect_user_locales()
{
  $locales = HttpAcceptLanguageHeaderLocaleDetector::detect();
  // empty() treats both [] and null as "nothing detected"
  if (empty($locales)) {
    $locales = IpAddressLocaleDetector::detect();
  }
  if (empty($locales)) {
    // fall back on some default locale, English in this case
    $locales = ['en'];
  }
  return $locales;
}

If HTTP Header detection fails, detect_user_locales() will try IP geolocation detection. If the latter bears no fruit, the function will fall back on some default locale.
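
The same cascade generalizes to any number of strategies. As a rough sketch in JavaScript, we can pass in an ordered list of detector functions and return the first non-empty result. The name detectWithFallback and the ['en'] default are assumptions for illustration.

```javascript
// Sketch: try each detector in order and return the first non-empty result.
function detectWithFallback(detectors, defaultLocales = ['en']) {
  for (const detect of detectors) {
    let locales;
    try {
      locales = detect();
    } catch (err) {
      // A failing detector should not break the cascade
      locales = [];
    }
    if (Array.isArray(locales) && locales.length > 0) {
      return locales;
    }
  }
  return defaultLocales;
}

// The header strategy finds nothing, so the geolocation result is used.
detectWithFallback([() => [], () => ['en-CA', 'fr-CA']]); // => ['en-CA', 'fr-CA']
```

Keeping the detectors in a list makes the priority order explicit, with the least reliable strategy (geolocation) last.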

If handled carefully, detecting the user's locale can help provide a better user experience in our web apps. Thankfully, the navigator.languages object and Accept-Language HTTP header are available to reduce our guesswork when it comes to locale detection.

If you and your team are working on an internationalized web app, check out Phrase for a professional, developer-friendly i18n platform. Featuring a flexible CLI and API, translation syncing with GitHub and Bitbucket integration, over-the-air (OTA) translations, and much more, Phrase has your i18n covered, so you can focus on your business logic.

Check out all Phrase features for developers and see for yourself how it can streamline your software localization workflows.