Linguistics for Developers: Real-World i18n Challenges

Internationalization is one of those tricky problems that many of us put off until the last moment. It feels like it’s going to be difficult. And often, that’s true. It’s easy to forget that while English dominates web development, it no longer dominates the web itself.

The numbers show the trend: the web is becoming increasingly multilingual. At the start of 2015, 55.6% of web content was in English. By January 2025, that had fallen to 49.3%—a drop of 6.3 percentage points.

But the learning curve is steep. Suddenly you’re not just solving technical problems—you’re doing a crash course in linguistics, discovering that many of the assumptions from your native tongue are entirely upended in other languages.

So, in this guide we’ll look at some of the key linguistic concepts you’ll need to consider when localizing your software.


Linguistics you need to know for i18n

Here’s that crash course in linguistics. If you’re an English monoglot, you’re in for some surprises. That’s because English grammar is relatively simple. Sure, the spelling is pretty rough and you must be thorough to avoid the hiccoughs. (See what I did there?)

But grammatically, it’s a breeze even compared to the Anglo-Saxon that preceded it. (Why? Because when Danes, Angles, Saxons, and Norse settlers needed to communicate, people naturally dropped the complicated bits to find a common tongue.)

So, let’s look at some of the issues you’ll need to consider.

Text will expand and shrink in ways that might break your UI

The days of pixel-perfect web design went out of the window (pardon the pun) with responsive layouts. But our fluid, flexible design systems need to take into account not just different screen sizes but also differences in text lengths.

Take a pretty common example:

English: Retry (5 characters)
German: Erneut versuchen (16 characters)
Chinese: 重试 (2 characters)

Imagine we have a sign-up form. Our frontend gets an error message back from the backend so we invite the user to try again.

If we build our UI with the English string in mind, we might set a fixed width for the Retry button. This works fine for “Retry”. But in German the text becomes “Erneut versuchen”. So, the text has to wrap and our button becomes twice as tall. There’s a chance that might not be a problem but if we haven’t planned for it we could end up with layout shifts, overlapping elements, or buttons that break out of their containers.

In this example, we’re lucky: these particular German words wrap quite neatly in the space provided. But if either word were longer, we might end up with an ugly word break.

To avoid that, let’s go with a flexible width so our button expands to accommodate the longer German text.

But, of course, this is frontend development so it’s not as simple as that. Let’s take a look at what happens to the button when we select Chinese:

It shrinks with the width of the text. And, yes, that’s what we want but only up to a point. The Web Content Accessibility Guidelines (WCAG) have specific requirements when it comes to target size. To make our button comfortably usable on touch interfaces, it should be at least 44 by 44 CSS pixels, which is the target size in WCAG’s Level AAA criterion (2.5.5). So we must prevent our buttons from shrinking too far.

In Tailwind, that might look a bit like this:

<button class="min-w-[44px] min-h-[44px] px-4 py-2">
  Retry
</button>

Setting a minimum width means the button will always meet the accessibility requirement and it makes your layout more predictable.

Getting creative with translations

Making your UI flexible is good practice, whether or not you want to handle localized strings. But that flexibility can only get you so far.

What do you do when a translated string is so long that you can’t accommodate it without compromising other aspects of the interface? In a photo sharing app, you might have a feature to share an image with the user’s friends:

English: “Share with friends” (18 characters)
German: “Mit Freunden teilen” (19 characters)
Russian: “Поделиться с друзьями” (21 characters)
Finnish: “Jaa ystävien kanssa” (19 characters)

The English is already pretty long. But the Russian might just push past the limits of your UI. So, what do you do?

Your first step should be to find a way to accommodate the longer text. If you really want to support speakers of a particular language, then you need to treat them as first-class citizens within your app.

Realistically, though, there might be strong reasons preventing you from doing that. The next option is to change the string to something shorter that preserves enough of the meaning. In Russian, that could be “Поделиться”, or “Share”, instead. The “with friends” part would be obvious to most users from context.

Alternatively, using an icon instead of text could do away with the problem altogether. Just be certain that the icon will be understood the same way in all your target markets.

As a last resort, you might truncate the string with an ellipsis. But be aware that this creates a diminished experience. It also somewhat misses the point of localization, which is to provide a natural, smooth experience; truncated text adds friction instead.

Good news! You can predict what space different languages require

You can, though, prepare for these scenarios in advance.

Over the years, localization teams have developed rough ratios to estimate how much longer or shorter a language tends to be compared to its English equivalent.

One set of ratios comes from translation service company Eriksen. While not foolproof, these ratios can give you an idea of how much flexibility you need to build into your UIs:

Language	From English	To English
Arabic	+25%	-20% to -25%
Chinese	varies	varies
Danish	-10% to -15%	+10% to +15%
Finnish	-25% to -40%	+30% to +40%
French	+15% to +20%	-10% to -15%
German	+10% to +35%	-20% to -35%
Greek	+5% to +10%	-5% to -20%
Japanese	varies	+10% to +55%
Korean	-10% to -15%	+10% to +20%
Norwegian	-10% to -15%	+10% to +15%
Portuguese	+15% to +30%	-5% to -15%
Spanish	+20% to +25%	-10% to -20%
Swedish	-10% to -15%	+10% to +15%

That way, you should have fewer surprises when reviewing localized versions of your software.
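
One way to put ratios like these to work is to turn them into a rough character budget when you design a component. The sketch below is only an illustration: the values in `maxExpansion` are the upper ends of the “From English” ranges in the table above, while the function name and the 35% fallback for unlisted locales are assumptions of ours rather than anything Eriksen prescribes.

```typescript
// Upper end of the "From English" range for a few locales, as a fraction.
// Negative values mean the translation tends to be shorter than English.
const maxExpansion: Record<string, number> = {
  ar: 0.25,
  fr: 0.2,
  de: 0.35,
  pt: 0.3,
  es: 0.25,
  da: -0.1,
};

// Hypothetical fallback for locales that are missing from the map above.
const DEFAULT_EXPANSION = 0.35;

// Estimate how many characters a translated string may need,
// given the English source string and the target locale.
function estimateTranslatedLength(english: string, locale: string): number {
  const ratio = maxExpansion[locale] ?? DEFAULT_EXPANSION;
  return Math.ceil(english.length * (1 + ratio));
}

// Rough character budget for the German version of a short label.
console.log(estimateTranslatedLength("Share with friends", "de"));
```

Treat the result as a planning number rather than a guarantee. Very short strings, such as single-word button labels, can expand far more than these averages suggest.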

Right to left is about more than the direction of your text

You probably already know that some languages run right to left rather than left to right. But if you’re targeting speakers of Arabic, Hebrew, Persian, and Urdu, amongst other languages, there’s a little more to the problem than simply adding an RTL modifier to your components.

That’s because the entire UI—not just the text—must adapt to right-to-left (RTL) layouts.

Directional CSS is culturally biased: Those margin-left and float: right properties aren’t just styling—they’re cultural assumptions. Modern CSS offers direction-agnostic properties like margin-inline-start that automatically flip in RTL contexts. For existing projects, tools like rtlcss can help.
Icons are directional, too: Directional icons like arrows, pagination controls, and media playback buttons might need to be flipped for RTL—but not all icons should be mirrored. A “share” icon or checkbox should remain the same, while a “next” arrow should be reversed. Similarly, elements like progress bars need to run from right to left.
Bidirectional text needs careful handling: Mix Arabic with English product names or numbers, and watch your layouts break. The browser has to constantly switch direction mid-paragraph, creating bizarre text flow issues. Putting <bdi> tags around mixed content can help, though.

Try it for yourself by adding dir="rtl" to your <html> tag and watch your UI fall apart in creative ways. You’ll see forms break, tooltips vanish off-screen, and plenty more to add to your to-do list.
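
Setting the direction by hand is fine for an experiment, but in an app you would normally derive it from the active locale. Here is a minimal sketch of that idea. The list of right-to-left language codes is a hand-picked assumption covering the languages mentioned above, not an exhaustive list, and `directionFor` is a hypothetical helper name.

```typescript
type Direction = "ltr" | "rtl";

// Hand-picked, non-exhaustive set of right-to-left language subtags.
const RTL_LANGUAGES = new Set(["ar", "he", "fa", "ur"]);

// Return the text direction for a locale tag such as "ar-EG".
function directionFor(locale: string): Direction {
  const language = locale.split(/[-_]/)[0].toLowerCase();
  return RTL_LANGUAGES.has(language) ? "rtl" : "ltr";
}

console.log(directionFor("ar-EG")); // "rtl"
console.log(directionFor("en-GB")); // "ltr"
```

In the browser, you could then apply the result with `document.documentElement.dir = directionFor(locale)`.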

Pluralization is complex, to say the least

In English, we tend to think of pluralization as a simple matter of adding “s” or maybe “es” to the end of a noun. But even English is more involved than that. For example, the plural of ox is oxen but the plural of fox is foxes. Similarly, child becomes children. And then there’s a particular type of person who enjoys saying “octopi” even though octopus is from Greek and would technically be octopodes—but in modern English, it’s just plain old octopuses.

Outside of English, pluralization becomes much more involved. While in English a word is singular or plural, other languages might have different word endings depending on whether there are two, three, a few, or a large number of items.

You’re most likely to come across complex plural forms in Slavic languages, such as Russian and Polish. For example, in Polish:

1 plik (file)
2 pliki (a different plural form for numbers ending in 2-4, except 12-14)
5 plików (yet another form for 5-21 and most other numbers)

Arabic takes this even further. Unicode’s CLDR plural rules give it six categories: zero, one, two, few (numbers ending in 03 to 10), many (numbers ending in 11 to 99), and other. So that simple “X items in your cart” message can suddenly require six different translation strings.

Ranges create another level of complexity. When displaying “5-10 items remaining,” some languages might need different grammatical forms for the beginning and end of the range.

This means you can’t just write components that insert a different word depending on whether it’s singular or plural. In fact, there’s a general rule of string handling when it comes to localization: avoid concatenation in favor of interpolation.

Building sentences by joining fragments together (like "You have " + count + " new " + itemType) creates rigid structures that often can’t handle the varied grammars of different languages. Languages put numbers, subjects, and verbs in different orders, and adjectives might need to agree with the noun’s gender and count. Instead, use complete sentence templates with placeholders that translators can rearrange as needed for their language.
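
To make the difference concrete, here is a minimal sketch of interpolation with named placeholders. The `interpolate` helper and the `uploadProgress` messages are hypothetical, and the Japanese string is our own illustrative translation. Real i18n libraries offer richer versions of the same idea.

```typescript
// Replace {placeholders} in a complete sentence template with values.
// Unknown placeholders are left untouched so that mistakes stay visible.
function interpolate(
  template: string,
  values: Record<string, string | number>
): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    key in values ? String(values[key]) : match
  );
}

// Each locale gets a whole sentence, so translators can reorder the parts.
const uploadProgress: Record<string, string> = {
  en: "{count} of {total} files uploaded",
  ja: "{total}件中{count}件をアップロードしました",
};

console.log(interpolate(uploadProgress.en, { count: 3, total: 10 }));
// "3 of 10 files uploaded"
console.log(interpolate(uploadProgress.ja, { count: 3, total: 10 }));
// "10件中3件をアップロードしました"
```

Notice that the Japanese template puts the total before the count. With concatenation, that reordering would need a code change; with a template, it is just a different string.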

But there is some help to be found. Plural forms are a problem that many people have already considered and the International Components for Unicode project, as well as your preferred frontend framework, can point you in the right direction.
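
In JavaScript and TypeScript, for instance, the built-in `Intl.PluralRules` API exposes these plural categories directly, so you don’t have to encode the rules yourself. The sketch below uses the Polish forms from earlier. The `fileForms` map and `formatFileCount` helper are hypothetical names, and the code assumes a runtime that ships full locale data, as current browsers and Node.js builds do.

```typescript
// Polish forms of "file", keyed by CLDR plural category.
const fileForms: Record<string, string> = {
  one: "plik",
  few: "pliki",
  many: "plików",
  other: "pliku", // used for fractions such as 1.5
};

const polishRules = new Intl.PluralRules("pl");

// Pick the noun form that matches the number.
function formatFileCount(count: number): string {
  const category = polishRules.select(count);
  return `${count} ${fileForms[category] ?? fileForms.other}`;
}

console.log(formatFileCount(1)); // "1 plik"
console.log(formatFileCount(2)); // "2 pliki"
console.log(formatFileCount(5)); // "5 plików"
console.log(formatFileCount(22)); // "22 pliki"
```

The last line shows why a simple “one, a few, many” threshold isn’t enough: 22 takes the same form as 2, because the rule depends on the final digits of the number.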

Do you need to store the gender of every user?

If you’ve built your UI for English, Chinese, or Turkish first, you’ve probably focused on gendered language mainly for pronouns or salutations. But there’s a bigger challenge: roughly half the world’s languages have grammatical gender systems that affect much more than just pronouns.

When localizing your interface, you’ll come across two main ways that gender impacts your text strings:

Word endings that change based on the subject’s gender.
Gendered forms for articles, adjectives, and even inanimate objects.

Take Spanish, for example. The simple message “Alice is online” becomes “Alice está conectada” if Alice identifies as female, but for a male it would be something like “Bob está conectado.” The participle changes based on the person’s gender. French works similarly: “Alice est connectée” versus “Bob est connecté.”

So, this leaves you with a decision. How do you localize your app in a way that is grammatically correct in a gendered language? Do you capture the gender of every user? Perhaps, but not all users identify as strictly male or female.

Some of your options include:

Embrace gender: Create separate message templates for different gender variations, while using the locale-specific approach to serving your non-binary users. Of course, this means asking for and storing the gender identity of everyone who uses your software.
Restructure your sentences: Rephrase to avoid gendered constructions when possible.
Use existing neutral alternatives: Many languages already have accepted gender-neutral forms.
Consider emerging inclusive forms: Some communities use alternatives like the “-e” ending in Spanish (conectade), though adoption varies widely by region and context.
Try hybrid approaches: Something like “Usuario en línea: Alice” minimizes gendered language while maintaining correct grammar.
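
Several of these options can be combined in code. The sketch below keeps gendered variants for users who have chosen to share their gender and falls back to the hybrid phrasing for everyone else. The `Gender` type, the `onlineMessages` map, and the `onlineStatus` function are hypothetical names; the Spanish strings are the ones used above.

```typescript
type Gender = "female" | "male";

// Spanish "is online" variants, plus a fallback that avoids the
// gendered participle.
const onlineMessages = {
  female: "{name} está conectada",
  male: "{name} está conectado",
  neutral: "Usuario en línea: {name}",
};

// gender is optional: users who have not shared it get the fallback phrasing.
function onlineStatus(userName: string, gender?: Gender): string {
  const template = gender ? onlineMessages[gender] : onlineMessages.neutral;
  return template.replace("{name}", userName);
}

console.log(onlineStatus("Alice", "female")); // "Alice está conectada"
console.log(onlineStatus("Bob", "male")); // "Bob está conectado"
console.log(onlineStatus("Sam")); // "Usuario en línea: Sam"
```

Making the gender parameter optional means you never have to force users to declare one just to render a status message.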

But gender isn’t just for people

User gender shows that there’s more to localization than just storing and displaying some alternative strings. But if you thought that was tricky, then gendered objects take it a step further.

In German, “the red car” is “das rote Auto” because “Auto” is neuter. But “the red flower” is “die rote Blume” because “Blume” is feminine. Here only the article changes, but with an indefinite article the adjective ending changes too: “ein rotes Auto” versus “eine rote Blume”. That’s perhaps easier if you have a known set of strings and you can plan in advance.

But it can get messy with user generated content. Imagine a notification system: “Alice gave you a banana.” In many languages, the form of “gave” might change based on who’s doing the giving and what’s being given.

In Polish, this might be:

“Alice dała ci banana” (Alice gave you a banana)
“Bob dał ci banana” (Bob gave you a banana)

The verb “dał/dała” (gave) changes form based on the giver’s gender. But it can be even more complex. In some languages, the verb also agrees with its object, so the gender of the recipient or of the thing being given can matter too.

So what’s the solution? Two key approaches:

Give your translators all possible context: Screenshots, workflow descriptions, and more will give your translators the best chance of constructing grammatically correct sentences for each language, where they can account for gender agreement and other language-specific rules.
Store metadata alongside your strings: Many i18n systems now support this approach, allowing you to keep track of the gender of known items.

For example:

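const items = {
  banana: {
    // Grammatical gender of the translated noun in each locale
    gender: {
      de: "feminine", // die Banane
      fr: "feminine", // la banane
      es: "masculine", // el plátano
      pl: "masculine", // banan
    },
  },
};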
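
As a rough illustration of how metadata like this could be used, the sketch below picks the German indefinite article and adjective ending for “a red ...” from a noun’s stored gender. The names `germanNouns`, `aRed`, and `aRedThing` are hypothetical, and the code only covers the nominative case. In practice you would lean on translators and an i18n library rather than hand-written grammar.

```typescript
type GrammaticalGender = "masculine" | "feminine" | "neuter";

// Known items with their German translation and grammatical gender.
const germanNouns: Record<string, { word: string; gender: GrammaticalGender }> = {
  car: { word: "Auto", gender: "neuter" },
  flower: { word: "Blume", gender: "feminine" },
  apple: { word: "Apfel", gender: "masculine" },
};

// Nominative "a red" for each gender: article and adjective ending both vary.
const aRed: Record<GrammaticalGender, string> = {
  masculine: "ein roter",
  feminine: "eine rote",
  neuter: "ein rotes",
};

function aRedThing(item: string): string {
  const noun = germanNouns[item];
  if (!noun) throw new Error(`Unknown item: ${item}`);
  return `${aRed[noun.gender]} ${noun.word}`;
}

console.log(aRedThing("car")); // "ein rotes Auto"
console.log(aRedThing("flower")); // "eine rote Blume"
console.log(aRedThing("apple")); // "ein roter Apfel"
```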

Grammatical roles

If object genders weren’t challenging enough, some languages transform names and nouns—a process called inflection—depending on their role in the sentence.

English handles this differently. Consider the words “to” or “for”. These prepositions do the work that would be done by inflections in other languages.

Let’s look at an example from Finnish, which is known for its involved approach to inflection:

“Alice on verkossa” (Alice is online)
“Lähetä viesti Alicelle” (Message Alice)

To an English speaker, the idea of someone’s name changing depending on its role in the sentence might seem a bit odd at first. But we do the same in English all the time. “Bob’s computer is offline”: here we’re adding apostrophe s to Bob’s name to show he owns the computer. Go back far enough in the history of English and you’ll find that the possessive apostrophe wasn’t always there. The “s” on Bob’s name is a remnant of older English’s case system; it’s one of the few forms of inflection we still use today in English, so we hardly think about it.

Finnish has 15 grammatical cases, each adding different endings to names and nouns depending on their role in the sentence. As well as Finnish, Polish and many other languages build meaning into different word endings that in English would be conveyed by separate words like prepositions.

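ICU MessageFormat can help out here. It has no dedicated argument type for grammatical case, but its select argument lets you keep the different forms of a word together (the “-lle” ending above is the Finnish allative case):

{case, select,
  nominative {Alice}
  allative {Alicelle}
  genitive {Alicen}
  other {Alice}
}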

You can then insert the appropriate form into complete translated sentences.
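
One simple way to do that is to store each name’s forms keyed by case and let every sentence template declare which case it needs. This is a minimal sketch with hypothetical names: `aliceForms`, `fillName`, and the `{name:case}` placeholder syntax are our own invention for illustration, not part of ICU. The Finnish strings are the ones shown above.

```typescript
// Inflected forms of one name. In Finnish, the "-lle" ending marks the
// allative case.
const aliceForms: Record<string, string> = {
  nominative: "Alice",
  allative: "Alicelle",
  genitive: "Alicen",
};

// Replace {name:case} with the matching form, falling back to the nominative.
function fillName(template: string, forms: Record<string, string>): string {
  return template.replace(
    /\{name:(\w+)\}/g,
    (_match: string, grammaticalCase: string) =>
      forms[grammaticalCase] ?? forms.nominative
  );
}

console.log(fillName("{name:nominative} on verkossa", aliceForms));
// "Alice on verkossa"
console.log(fillName("Lähetä viesti {name:allative}", aliceForms));
// "Lähetä viesti Alicelle"
```

The hard part is producing the inflected forms in the first place. For user-supplied names, that usually means rephrasing the sentence so the name can stay in its base form.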

Linguistics doesn’t have to be hard

The linguistic aspect of localization is, perhaps, an added complexity that you might not have expected. It’s not just about swapping one string for another but, instead, navigating the quirks of human language.

The good news? You don’t need to become a linguist overnight. Modern i18n libraries handle much of the complexity we’ve discussed—from text expansion to RTL layouts, from pluralization rules to grammatical cases. Your job is to build flexibility into your UI, provide clear context for translators, and remember that concatenated strings are a localization nightmare waiting to happen.

As the web continues its multilingual expansion, software that respects linguistic diversity will simply work better for more people. It’s a bit more work upfront, sure. But when your Finnish users aren’t confused by broken case endings or your Arabic users don’t have to deal with a backward UI, you’ll be glad you put in the effort.