As large language models continue to reshape how content is created and adapted, many of the workflows built around earlier generations of machine translation are starting to show their age. In Part 1, I argued that the dominant MTPE paradigm no longer aligns well with the capabilities of modern AI. If that workflow is increasingly inadequate, the natural question is: what replaces it?
The answer points toward a different conception of translation: not as a one-shot generation task followed by correction, but as an adaptive, interactive, and collaborative process between humans and AI systems.
From static output to adaptive systems
One of the most important shifts enabled by LLMs is the evolution of automated translation beyond a static, single-step process.
Traditionally, achieving high-quality automated translation in specialized domains required building dedicated systems trained or fine-tuned over long periods of time. This was the dominant paradigm for many years. Back in 2009, I founded and led an MT technology company, Safaba, focused on exactly this challenge. We developed dedicated custom MT systems adapted to the specific content domain and language used by enterprise customers. After Safaba’s acquisition by Amazon in 2015, this paradigm of automated static adaptation at scale delivered significant value for large-scale content translation at Amazon and became widely adopted across the industry, including here at Phrase, where our Custom AI production models still serve many of our customers.
Today, modern LLM-based models can increasingly adapt at inference time, incorporating instructions and contextual information dynamically. Rather than translating in isolation, systems can now be guided with background knowledge about the author or domain, examples of desired style or tone, explicit translation strategies, and other relevant contextual materials.
In principle, this allows much closer alignment with the requirements of a specific translation task. In practice, however, this capability remains underutilized. The bottleneck is not the underlying model capability, but the orchestration layer around it. Supplying this level of context manually is difficult and time-consuming. The real opportunity lies in automating the adaptation layer, enabling systems to gather, organize, and apply relevant context with minimal friction.
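To make the idea of an automated adaptation layer concrete, here is a minimal sketch in Python. It assumes the context has already been gathered and only shows how it might be organized and applied to a single request. `TranslationContext` and `build_prompt` are hypothetical names chosen for illustration, not part of any existing product or library.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationContext:
    """Hypothetical container for context gathered automatically for one job."""
    domain: str = ""
    style_notes: list[str] = field(default_factory=list)
    glossary: dict[str, str] = field(default_factory=dict)
    examples: list[tuple[str, str]] = field(default_factory=list)


def build_prompt(source: str, target_lang: str, ctx: TranslationContext) -> str:
    """Assemble one prompt string; empty context fields are skipped."""
    parts = [f"Translate the text below into {target_lang}."]
    if ctx.domain:
        parts.append(f"Domain: {ctx.domain}")
    if ctx.style_notes:
        parts.append("Style: " + "; ".join(ctx.style_notes))
    if ctx.glossary:
        # Sorted so the same glossary always produces the same prompt.
        terms = ", ".join(f"{s} -> {t}" for s, t in sorted(ctx.glossary.items()))
        parts.append(f"Required terminology: {terms}")
    for src, tgt in ctx.examples:
        parts.append(f"Example source: {src}\nExample translation: {tgt}")
    parts.append(f"Text: {source}")
    return "\n".join(parts)
```

The resulting string would be sent to whichever model is in use. The point of the sketch is that the translator never has to write this prompt by hand: the system fills in the context fields and the prompt follows from them.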
From post-editing to interaction
A second shift concerns the role of the human translator.
In the traditional MTPE workflow, the human is positioned as a reactive editor, entering only after a full draft has been produced. Large language models make it possible to move beyond this constraint.
Instead of editing a completed output, a translator can engage earlier and more selectively in the process. This includes reviewing initial segments, identifying systematic issues such as tone or stylistic consistency, and providing targeted feedback that guides the system toward improved outputs. This creates an iterative feedback loop in which the system adapts based on expert human input.
The goal is not to require interaction at every sentence, which would simply recreate the inefficiencies of post-editing. Rather, the objective is to enable high-leverage interaction, where relatively small amounts of expert input can drive meaningful improvements across larger portions of the text. Achieving this requires better interfaces, more predictable model behavior, and clearer mechanisms for incorporating feedback. But the conceptual shift is clear: from editing outputs to steering systems.
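One way to picture high-leverage interaction is the sketch below, written in Python. It assumes a simple flow: the system drafts a small sample, the expert reviews it once, and the resulting guidance is applied to the whole document. The function name, the two callable signatures, and the `sample_size` parameter are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical signature: (source_segment, accumulated_instructions) -> translation
Translator = Callable[[str, list[str]], str]
# Hypothetical signature: (draft translations of the sample) -> new instructions
Reviewer = Callable[[list[str]], list[str]]


def translate_with_steering(
    segments: list[str],
    translate: Translator,
    review: Reviewer,
    sample_size: int = 2,
) -> list[str]:
    """Draft a small sample, collect expert feedback once, then apply it everywhere."""
    instructions: list[str] = []
    sample = [translate(s, instructions) for s in segments[:sample_size]]
    instructions.extend(review(sample))  # the only point of human interaction
    # Re-translate the whole document, sample included, under the new guidance.
    return [translate(s, instructions) for s in segments]
```

Here the expert sees only `sample_size` drafts, yet the feedback shapes every segment. A real system would likely allow several such rounds and would need the model to follow the instructions predictably, which is exactly the requirement noted above.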
Toward multi-agent translation workflows
A third dimension involves the structure of the system itself.
Rather than relying on a single model to perform all aspects of translation, we can begin to think in terms of multi-agent workflows, where different components specialize in complementary roles. One component may focus on generating candidate translations, another on evaluating their quality, and yet another on refining or correcting them. These components may interact iteratively before presenting results to a human expert.
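The generate, evaluate, and refine roles described here can be sketched as a short loop. The Python below is an illustration under stated assumptions, not a description of any particular system: each role is passed in as a plain function, the evaluator returns a list of issue descriptions, and `run_workflow` and `max_rounds` are hypothetical names.

```python
from typing import Callable

Generate = Callable[[str], str]                 # source -> candidate translation
Evaluate = Callable[[str, str], list[str]]      # (source, candidate) -> issues found
Refine = Callable[[str, str, list[str]], str]   # (source, candidate, issues) -> revision


def run_workflow(
    source: str,
    generate: Generate,
    evaluate: Evaluate,
    refine: Refine,
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Iterate until the evaluator finds no issues or the round budget runs out.

    Returns the final candidate and any unresolved issues, so that a human
    expert can focus on what the automated components could not fix.
    """
    candidate = generate(source)
    for _ in range(max_rounds):
        issues = evaluate(source, candidate)
        if not issues:
            return candidate, []
        candidate = refine(source, candidate, issues)
    # Budget exhausted: report whatever the evaluator still flags.
    return candidate, evaluate(source, candidate)
```

The loop is only as good as its `evaluate` step. If the evaluator misses problems or flags things that are not problems, the refinement rounds will either stop too early or push the translation in the wrong direction.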
This naturally raises a critical question. Can automated systems perform nuanced, context-aware evaluation reliably enough to guide such a process?
Recent work on LLM-as-a-judge and, more broadly, agents-as-judges suggests that we are beginning to move in that direction. These approaches aim to go beyond scalar scoring and toward structured, explainable assessments that identify specific issues and trade-offs.
This is also central to ongoing work in the WMT shared tasks on translation quality evaluation systems, which I have been involved in organizing and leading. Recent efforts have focused on unifying reference-based metrics and quality estimation approaches, and on evaluating systems under more realistic and challenging conditions.
At the same time, it is important to be clear about the current state of the field. Evaluation, especially nuanced, domain-sensitive evaluation, remains far from solved. This is not a minor detail; it is a fundamental constraint. Multi-agent systems depend not only on improved generation capabilities, but on reliable, context-aware evaluation. Without that, the feedback loops that these systems rely on cannot function robustly.
Why this isn’t fully working yet
If these capabilities are emerging, why haven’t they already transformed translation workflows?
The answer lies primarily in systems integration. Most existing tools and platforms are still built around earlier paradigms, assuming a single-pass generation step followed by downstream editing, with limited support for iterative feedback or contextual adaptation. As a result, even when more advanced capabilities are available at the model level, they remain difficult to access and apply effectively in practice.
There is also a broader ecosystem challenge. The development of translation technology spans multiple layers, from foundational model providers to application-level technology developers to end users such as enterprise localization managers, translators and language service providers. These layers have not always been well aligned, particularly when it comes to incorporating the expertise of professional translators into system design and operation.
Bridging this gap more effectively will be critical to realizing the full potential of these new approaches.
This is the gap that the Phrase Language Intelligence Platform is designed to close, bringing together automated context, orchestration, and human expertise in a single composable system.
Implications for translators and the industry
These shifts have important implications for how roles and responsibilities evolve across the industry.
For translators, the role is unlikely to disappear, but it will change in meaningful ways. Increasingly, the value of human expertise will lie in guiding and supervising AI systems, ensuring that outputs meet domain-specific quality requirements, intervening in high-risk or high-value cases, and contributing to the design of translation workflows and tools themselves.
At the same time, the balance between human and machine effort will vary significantly across domains. In highly standardized technical content, automation may dominate. In creative domains such as literary translation, human involvement will remain central, although the nature of that involvement may shift. In high-risk domains such as medical or financial translation, hybrid models combining automation with targeted human oversight are likely to persist.
For technology providers, the challenge is no longer simply to improve model performance, but to design systems that support interaction, incorporate context, and integrate human expertise effectively and reliably into the process.
An uneven but inevitable transition
As with previous waves of innovation in translation automation, the transition to these new paradigms will be gradual and uneven.
The science fiction author William Gibson famously observed that “the future is already here—it’s just not very evenly distributed.” This insight applies directly to the current state of translation technology.
Some domains, language pairs, and use cases are already benefiting from highly advanced GenAI capabilities, while others continue to rely on more traditional approaches. Even within the same organization, different teams may be operating at very different points along this spectrum. Over time, this uneven distribution will gradually shift, but it is unlikely to completely disappear for the foreseeable future.
In practice, this means that new workflows will be adopted incrementally. Many practitioners will experiment with interactive or agent-based approaches and conclude, at least initially, that they do not yet deliver sufficient value. That is a natural part of the transition.
What matters is that the space of what is possible has expanded dramatically, and will continue to do so.
Looking ahead
The move from post-editing to adaptive, interactive workflows represents more than an incremental improvement. It reflects a deeper shift in how we conceptualize translation itself.
Translation is no longer best understood as a task performed either by humans or by machines. Instead, it is increasingly a collaborative process, shaped by the interaction between human expertise and increasingly capable AI systems.
Realizing this vision will require advances not only in models, but in dedicated platforms and systems, and in how we design human–AI collaboration.
The future of translation will not be decided at the model layer. It will be decided by the systems built around those models: the platforms that apply context intelligently, evaluate quality reliably, and integrate human expertise where it matters most.
The organizations that recognize this shift and invest accordingly will not just translate faster. They will communicate more effectively across every market they operate in.





