As AI continues to evolve, so do the expectations placed on localization teams. Large Language Models (LLMs) have moved from being a buzzword to a practical tool that can drive scalable, brand-consistent multilingual content. But turning that potential into production-ready workflows is a real challenge.
In our recent webinar, “Making LLMs Work for Scalable, Brand-Consistent Multilingual Content,” Phrase’s VP of AI Solutions, Semih Altinay, and Principal Enterprise AI & Automation Architect Miklós Urbán unpacked how LLMs are reshaping localization.
From live demos to strategy insights, the session explored how companies can go beyond experimentation and begin operationalizing AI in a safe, repeatable, and governed way.
Watch the full webinar recording:
Why LLMs are a game changer for localization
One of the first things our hosts made clear was just how far machine translation has evolved… and how much further large language models are pushing that evolution.
Semih began by outlining the progression from early rule-based systems—those rigid frameworks built on hand-coded linguistic rules—to the more flexible but still limited era of statistical machine translation.
From there, he described the rise of neural machine translation (NMT), which introduced deep learning into the mix and brought with it a new level of fluency. For many teams, NMT still plays a dependable role, especially when high volume and fast turnaround are priorities.
But as Semih noted, even with these improvements, NMT remains confined by its design. It processes one segment at a time, which can limit coherence across longer content. It’s effective at converting language, but doesn’t really adapt it. There’s no awareness of tone, audience expectations, or stylistic choices.
This is where large language models start to shift the dynamic. Rather than just translating, LLMs can genuinely interpret. They can adjust phrasing for a specific readership, reframe intent, or alter tone to suit regional expectations. This means they offer a more flexible kind of language handling, one that moves beyond simple substitution and into transformation.

To illustrate this shift, Semih shared a story from his own experience. Preparing for a trip to Türkiye, he wanted to send a note in Turkish to his Airbnb host. Although Turkish is his native language, he’d been living in the U.S. for decades and often writes more naturally in English. So he used ChatGPT to help.
“I was going to write a note to the Airbnb person. I just wanted to see if there’s a grocery store basically around the house that we’re going to rent out. I used ChatGPT, and what I got back was just this incredible translation.
I was so blown away, because it didn’t just translate what I was saying into those questions, but it really understood the cultural nuances. We have two different grocery store names, or types, in Turkey: there’s a small mom-and-pop shop, and there’s a big market. And when I said, ‘Is there a grocery store around the place?’, it actually translated it as, ‘Is there this or that in that location?’”
— Semih Altinay, VP of AI Solutions at Phrase
This wasn’t just a question of accurate vocabulary. The model responded with phrasing that reflected local customs and made the message feel as though it had been written by someone familiar with that part of the country.
That example set the tone for the rest of the session: As LLMs bring new capabilities, they’re changing expectations. And as our speakers made clear, this isn’t some distant trend. It’s already shaping how content is created, localized, and evaluated across real production workflows.
Phrase is adapting its platform around that shift. As the session later demonstrated, tools like Auto Adapt, Phrase Next GenMT, and AI-assisted quality review are being built to make LLMs usable in practice. The result is a system that can modify copy for different age groups, cultural norms, or linguistic registers, all while preserving brand alignment and glossary fidelity.
(To hear more about when and why LLMs outperform traditional MT—and when they don’t—you can also read Semih’s post on LinkedIn: GenAI vs. NMT for Translation.)
From raw models to production-ready workflows
Another central theme in the webinar was the gap between what large language models are capable of in theory, and what they can actually deliver in a high-volume, high-stakes localization environment.
As Semih explained, early interactions with LLMs (like the first public versions of ChatGPT) captured attention for good reason. The fluency was impressive. The ease of generating multilingual content felt new. But for enterprise teams managing complex workflows, that surface-level brilliance didn’t go far enough.
These early models often failed to produce consistent results. The same prompt might return different answers each time. There was no mechanism to enforce terminology rules, apply a preferred tone, or stick to an approved glossary. The outputs were plausible, but unpredictable. And without secure workflows, they weren’t ready for production.

That drew a clear line in the conversation: capability alone isn’t enough. You need control.
During the session, both Semih and Miklós highlighted the difference between raw language models and productized solutions built for localization teams. It’s not just about plugging in an API: rather than general-purpose interfaces, teams need task-specific systems designed to support real work at scale. What Phrase has developed over the past year reflects that more grounded approach.
“The future isn’t really prompt engineering. It’s operationalized AI, as we call it, and the modern era of LLM-embedded workflows.”
— Semih Altinay
That shift was evident in a live demo presented by Miklós, who walked through how Auto Adapt processes entire documents rather than segments, and how it applies tone, demographic preferences, and even measurement conversions with a level of consistency that’s hard to achieve manually.
These workflows both contain and direct the model, with structures for glossary adherence, style guide alignment, and brand voice preservation. And when changes are made, they’re traceable, offering a way to use them within established governance frameworks.
The three pillars: Generate, adapt, evaluate
One of the most useful structures introduced during the webinar was a simple but practical framework for applying LLMs across the localization workflow.
Rather than treating large language models as a single solution, the speakers broke their application into three distinct functions: generating content, adapting it for context, and evaluating its quality.
This framing allowed the discussion to move beyond theory and into how these models are already being applied in day-to-day scenarios.
1. Generate: Moving beyond segment-by-segment translation
The first pillar focused on generation—specifically, how Phrase’s machine translation engine handles input differently from traditional MT. Rather than processing one sentence at a time, it works across multiple segments. This change may seem small on the surface, but it directly affects fluency, cohesion, and the natural rhythm of a translated text.
Semih explained that the engine draws on a Retrieval-Augmented Generation (RAG) approach, meaning it doesn’t just rely on the model’s base training. It actively retrieves relevant content and applies that context when generating output. Glossaries and style guides aren’t layered on after the fact—they’re part of the process from the beginning.
“We’re no longer looking at one segment at a time—we’re looking above, below, and around it.”
— Semih Altinay
This shift is particularly helpful for content types that demand consistency, like help articles, onboarding flows, or product documentation, where individual sentences need to sit comfortably within a broader structure.
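To make the retrieval-and-context idea above concrete, here is a minimal sketch in Python of how surrounding segments and matching glossary entries can be assembled into a single translation prompt. It illustrates the general RAG pattern only; the sample glossary, the keyword-based lookup, and the prompt wording are assumptions for the example, not how Phrase Next GenMT is actually built.

```python
# Minimal sketch of retrieval-augmented, document-level translation prompting.
# The glossary, the keyword-based lookup, and the prompt wording are
# illustrative stand-ins, not Phrase's actual engine.

GLOSSARY = {
    "checkout": "Kasse",              # approved target terms (example data)
    "order history": "Bestellverlauf",
}

def retrieve_glossary(segment: str) -> dict[str, str]:
    """Pull only the glossary entries that actually occur in this segment."""
    lowered = segment.lower()
    return {src: tgt for src, tgt in GLOSSARY.items() if src in lowered}

def build_prompt(segments: list[str], index: int, style_guide: str) -> str:
    """Translate one segment while showing the model the segments around it."""
    context = "\n".join(segments[max(0, index - 2): index + 3])
    terms = retrieve_glossary(segments[index])
    term_lines = "\n".join(f"- {s} -> {t}" for s, t in terms.items()) or "- (none)"
    return (
        "Translate the TARGET segment into German.\n"
        f"Style guide: {style_guide}\n"
        f"Required terminology:\n{term_lines}\n"
        f"Surrounding document context:\n{context}\n"
        f"TARGET: {segments[index]}"
    )

if __name__ == "__main__":
    doc = [
        "Welcome to your account.",
        "Open the order history to review past purchases.",
        "Proceed to checkout when you are ready.",
    ]
    # The resulting prompt would then be sent to whichever LLM you use.
    print(build_prompt(doc, 1, style_guide="Formal tone, second person"))
```

The point of the pattern is the one Semih described: glossary and style constraints enter the prompt before generation, alongside the segments above and below, rather than being patched on afterwards.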
2. Adapt: Reshaping content for different requirements
Miklós walked through how Auto Adapt reshapes content in a way that goes beyond linguistic correctness, adjusting tone, formality, and register, and accounting for regional norms.
This functionality matters when content needs to feel familiar to its reader. For example, the system can switch between formal and informal tones based on language or audience preferences.
It can also make geographic or cultural adjustments, such as converting measurements or swapping phrases that might be idiomatic in one region but unclear in another.
Miklós showed this in a fun example, using the fictional menu from Milliways, the Restaurant at the End of the Universe from Douglas Adams’ Hitchhiker’s Guide to the Galaxy series:
“So, for example, ‘dining at the edge of time, an unforgettable experience’ – that’s just fine for elderly people, but then it changed ‘dining’ to ‘feasting’ for a younger audience. Also, the use of ‘totally’, right? It’s just so young. It’s just ‘totally this’ and ‘totally that’, whereas for elderly people it’s more like ‘I had the pleasure of sampling the Pan Galactic Gargle Blaster’.”
— Miklós Urbán, Principal Enterprise AI & Automation Architect, Phrase
The session included several live comparisons that demonstrated this in action. One particularly telling example involved content rewritten for UK English. The system not only adapted spellings and punctuation, but also rephrased elements to better align with local usage.

What stood out was the way this adaptability could be scaled. Miklós showed how a shared Google Sheet can drive tone and message changes dynamically across domains or languages without scripting. For many teams, this introduces a new level of consistency, especially when managing large volumes of content across multiple markets.
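To give a rough sense of how sheet-driven parameters can steer adaptation, the sketch below pairs a deterministic unit conversion with a profile-based rewrite prompt. The profile table, helper names, and prompt wording are hypothetical; in the demo the equivalent settings came from a shared Google Sheet rather than code, and this is not Auto Adapt’s implementation.

```python
# Minimal sketch of parameter-driven adaptation: audience, tone, and locale
# settings (which could come from a shared sheet) shape a single rewrite prompt.
# Profile values, helper names, and prompt wording are illustrative only.

import re

PROFILES = {
    "uk_young": {"locale": "en-GB", "tone": "casual", "units": "metric"},
    "us_senior": {"locale": "en-US", "tone": "formal", "units": "imperial"},
}

def convert_miles_to_km(text: str) -> str:
    """Deterministic pre-step: convert '5 miles' style distances to km."""
    def repl(m: re.Match) -> str:
        km = round(float(m.group(1)) * 1.60934, 1)
        return f"{km} km"
    return re.sub(r"(\d+(?:\.\d+)?)\s*miles?", repl, text)

def build_adapt_prompt(text: str, profile_name: str) -> str:
    p = PROFILES[profile_name]
    if p["units"] == "metric":
        text = convert_miles_to_km(text)
    return (
        f"Rewrite the text for a {p['tone']} register in {p['locale']}.\n"
        "Keep meaning, brand names, and glossary terms unchanged.\n"
        f"Text: {text}"
    )

if __name__ == "__main__":
    source = "Dining at the edge of time, just 2 miles from the spaceport."
    print(build_adapt_prompt(source, "uk_young"))
```

Running mechanical changes like measurement conversion as a deterministic step before the model is one way to keep facts from depending on the LLM getting arithmetic right, while the rewrite handles tone and register.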
3. Evaluate: Scaling quality control without scaling workload
The third pillar—evaluation—focused on how LLMs can support review processes without displacing the human input that still matters. Auto LQA was designed to assist, not replace, and during the session, Semih described how it’s being used to support enterprise-grade QA at scale.
He shared an example involving a travel services company that needed to assess 28 million words across thousands of projects. Manual review was too slow and resource-heavy.
By using automated quality scoring built into Phrase’s systems, they were able to dramatically cut review time without losing oversight.
“The impact was a 40% cost reduction. The annualized amount was about 80,000 euros. LQA efficiency was boosted from hours down to minutes, with an increase in content scoring in the workflow and faster time to market without quality issues.”
— Semih Altinay
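As an illustration of the scoring pattern behind automated review (not Phrase’s Auto LQA itself), the sketch below asks an LLM to grade a translation against its source and return a structured verdict that can gate what humans look at. The categories, the 0–100 scale, the review threshold, and the canned response are assumptions for the example.

```python
# Minimal sketch of LLM-assisted quality scoring with a human-review gate.
# Categories, score scale, threshold, and the canned response are illustrative.

import json

def build_lqa_prompt(source: str, translation: str, glossary: dict[str, str]) -> str:
    terms = "\n".join(f"- {s} -> {t}" for s, t in glossary.items()) or "- (none)"
    return (
        "You are a translation quality reviewer.\n"
        "Score the translation from 0 to 100 and list issues by category\n"
        "(accuracy, terminology, style, fluency).\n"
        'Respond as JSON with keys "score" and "issues".\n'
        f"Required terminology:\n{terms}\n"
        f"Source: {source}\n"
        f"Translation: {translation}"
    )

def llm_complete(prompt: str) -> str:
    # Placeholder so the sketch runs end to end; in practice this would
    # call your LLM provider and return its raw response.
    return '{"score": 92, "issues": [{"category": "style", "note": "minor"}]}'

def score_segment(source: str, translation: str, glossary: dict[str, str]) -> dict:
    result = json.loads(llm_complete(build_lqa_prompt(source, translation, glossary)))
    # Route anything below the threshold to a human reviewer instead of auto-passing.
    result["needs_review"] = result["score"] < 85
    return result

if __name__ == "__main__":
    print(score_segment(
        source="Proceed to checkout when you are ready.",
        translation="Gehen Sie zur Kasse, wenn Sie bereit sind.",
        glossary={"checkout": "Kasse"},
    ))
```

The gate is the same principle the speakers stressed: automation handles the bulk scoring, and reviewers spend their time on the segments that actually need judgment.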
The discussion also covered MT Optimize, a feature that quietly improves raw machine translation output by correcting formatting, terminology, and basic grammar issues without needing manual input. It isn’t designed to rewrite content entirely, but to handle the routine fixes that often slow down review cycles.
By separating generation, adaptation, and evaluation into distinct functions, the speakers were able to present a more grounded, less abstract picture of how LLMs fit into a modern localization workflow. Each function solves a different part of the problem. Each requires its own level of control. And each is already being applied in production settings where deadlines, scale, and brand consistency matter.
Real-world results: Use cases in action
The session closed with several examples showing how these tools are already being used across industries. Rather than hypothetical scenarios, these were grounded cases involving real companies addressing familiar challenges—tight timelines, high volumes, and the need for precision across languages.
Healthcare platform: Adaptive messaging at speed
One healthcare company needed to localize sensitive content for different demographic groups. Adapting tone, language, and formality typically required specialist linguists and long lead times. Using Auto Adapt, they were able to reduce that workload significantly, automating much of the transformation while maintaining message integrity.
- Result: 15% cost reduction, 3x increase in automation, and turnaround times that dropped from weeks to hours.
Crypto and travel brands: Localizing variants, not just languages
Several companies used Auto Adapt to fine-tune content for regional variants—converting from U.S. English to U.K. English, while preserving tone and making subtle adjustments that go beyond spelling.

This ability to respond to context, whether it’s punctuation style or unit conversion, helps teams manage localized variants without relying entirely on human rewriting.
LSPs: Reducing post-edit fatigue
For one language service provider working with complex, tag-heavy machine translation, MT Optimize offered a way to reduce manual clean-up. The system automatically handled formatting and terminology issues, allowing human reviewers to focus on higher-value work.
- Result: Shorter delivery timelines, improved margins, and less time spent on repetitive post-editing tasks.
Each of these examples reinforced a broader point made throughout the webinar: the goal isn’t to remove people from the process, but to focus their time where it counts. When applied selectively, these tools can extend what localization teams can achieve, without requiring more hours or larger budgets.
Where we go from here
Toward the end of the session, the conversation shifted from tools and tactics to something more fundamental: how the role of localization is changing, and what that means for teams working under increasing pressure to deliver more, faster, and with tighter alignment to brand expectations.
This shift was echoed in the Q&A, where several questions centered on integration, security, and model control. One participant asked whether it was possible to plug in a custom LLM—something trained on a company’s own data.
The answer, as Semih explained, was not simply about capability. It was about complexity: prompt strategies, fallback logic, error handling, and quality safeguards all need to be rebuilt and validated around that specific model. It’s a reminder that the conversation isn’t just about which engine you use, but how you govern it.
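To give a sense of what that rebuilding involves, here is a small hypothetical sketch of the kind of validation and fallback logic a custom-model integration has to carry around every call. The engine functions, the retry count, and the sanity check are placeholders, not Phrase’s internals.

```python
# Minimal sketch of the fallback logic a custom-model integration needs:
# validate the output, retry once, then fall back to a secondary engine.
# Engine functions, retry count, and the validation rule are illustrative.

def translate_with_custom_llm(text: str) -> str:
    raise TimeoutError("custom model unavailable")  # simulate a transient failure

def translate_with_fallback_engine(text: str) -> str:
    return f"[fallback translation of] {text}"

def is_valid(output: str, source: str) -> bool:
    # A trivial safeguard: non-empty and not wildly longer than the source.
    return bool(output.strip()) and len(output) < 3 * max(len(source), 1)

def translate(text: str, retries: int = 1) -> str:
    for _ in range(retries + 1):
        try:
            candidate = translate_with_custom_llm(text)
            if is_valid(candidate, text):
                return candidate
        except Exception:
            continue  # transient failure: try again, then fall back
    return translate_with_fallback_engine(text)

if __name__ == "__main__":
    print(translate("Is there a grocery store near the house?"))
```

Every piece of this, plus prompt strategies and quality safeguards, has to be revalidated whenever the underlying model changes, which is the complexity Semih was pointing to.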
That tension between capability and control, speed and quality, and experimentation and production was present throughout the session.
Our Chief Product Officer, Simone Bohnenberger-Rich, PhD, recently wrote about this in MultiLingual Magazine, describing this exact moment as one that demands not just new tools, but new ways of thinking about localization’s role within the broader content ecosystem.

She makes the case that localization isn’t a side channel—it’s infrastructure. And the companies that treat it that way, with the same operational seriousness as software or logistics, are the ones most likely to succeed when entering new markets or scaling globally.
The webinar reflected that same mindset. Using LLMs in localization isn’t about chasing novelty. It’s about making systems that support better decision-making, more consistent messaging, and localization that keeps pace with the rest of the business.
For teams navigating these changes, the message was clear: don’t wait for perfect. Start where you are. Build processes that can scale. Keep humans in the loop. And treat language not just as a surface feature, but as a strategic layer in how your brand shows up around the world.
Conclusion: LLMs as a strategic advantage
LLMs have moved beyond trial runs and speculative use cases. As the webinar made clear, they are now being embedded into real workflows, supporting teams that need to produce more content, across more markets, without losing consistency or control.
This isn’t about replacing translators or reviewers. It’s about giving them better tools—tools that understand tone, recognize nuance, and scale across formats and audiences. Whether refining phrasing for regulatory documentation or adapting style for younger readers, LLMs are already reshaping how content moves through the localization process.
“With LLMs, we’re not just translating. We’re transforming how global content is created and delivered.”
— Semih Altinay
For localization teams, this represents a new kind of leverage—one that helps meet rising demand without sacrificing quality or voice.
The work ahead lies not in choosing the flashiest tool, but in building thoughtful systems around the technology. That’s where real value is created. And that’s where the most forward-thinking teams are now focused.
Watch the full webinar
Want to see these tools in action? Check out the full webinar recording, including Miklós Urbán’s live demo.