What Is Segmentation in Translation?

Localization Director at Phrase, with 15+ years of experience in the language industry

Last updated on January 30, 2024.

Segmentation in translation is the process of breaking a source text down into smaller units for translation. These units are defined by segmentation rules, which can be chosen per language pair and which serve as the basis for creating and editing translation memories. Once the rules are configured, the system applies them automatically during the translation workflow, which makes segmentation an important part of translation automation.

Why is segmentation important?

With the rise of computer-assisted translation (CAT) tools, translators, reviewers, and outsourcers have gained access to software that uses automation to significantly cut the time required for translation-related tasks. Segmentation is one of the first steps that source content undergoes before a translator starts working on it.

With segmentation, the source content is broken down into translation units called segments. These segments can be:

Phrases,
Paragraphs,
Bullet points,
Descriptions,
Titles, etc.

The segments are generated automatically according to specific segmentation rules. They help build and customize the translation memory (TM) for a specific project or client. Segmentation is therefore the foundation for reusing existing translations later.

The generated segments also simplify the work of the translator and the reviewer. Staying focused and remembering the exact wording of an earlier translation can be challenging in a high-volume project. Segmentation, combined with storage in a translation memory, reduces this problem because previously translated segments are saved and can be retrieved.

Once translated segments are stored in the TM, the CAT tool compares each new source segment against it. If the new segment is identical to a stored one, the tool inserts the existing translation automatically (auto-propagation). If it is only partially the same, the tool offers the closest stored translation for the translator to adapt. This matching process assigns each hit a score, and the hits are grouped into match bands (e.g. 100% match, 95-99% match, etc.).
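The matching step can be illustrated with a minimal Python sketch. It is not how any particular CAT tool scores matches: the word-level similarity from the standard library's difflib stands in for a real fuzzy-matching algorithm, and the function names, the 75-94% band, and the default threshold are assumptions made for the example.

```python
from difflib import SequenceMatcher


def match_score(new_source, tm_source):
    """Return a 0-100 similarity score based on word-level overlap.

    Splitting on whitespace means extra spaces are ignored; real tools
    use their own scoring, so treat this as a stand-in.
    """
    a = new_source.split()
    b = tm_source.split()
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def match_band(score):
    """Group a score into an illustrative match band."""
    if score == 100:
        return "100%"
    if score >= 95:
        return "95-99%"
    if score >= 75:
        return "75-94%"  # assumed band, not taken from the article
    return "no match"


def best_match(tm, new_source, threshold=75):
    """Return (score, tm_source, tm_target) for the best TM hit, or None."""
    best = None
    for tm_source, tm_target in tm.items():
        score = match_score(new_source, tm_source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, tm_source, tm_target)
    return best


# A toy translation memory: source segment -> stored translation.
tm = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "Open the file.": "Ouvrez le fichier.",
}

exact = best_match(tm, "Click the Save button.")
fuzzy = best_match(tm, "Click the Cancel button.", threshold=50)
print(exact, match_band(exact[0]))
print(fuzzy, match_band(fuzzy[0]))
```

An identical segment scores 100 and its translation could be inserted as is, while the "Cancel" variant scores lower and would be offered as a suggestion to edit.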

What are segmentation rules?

Segmentation rules can be adjusted for each project. When a project is created, the CAT tool lets the user choose between the pre-configured segmentation rules and custom ones.

The logic behind the segmentation rules can vary. The rules are stored in an SRX (Segmentation Rules eXchange) file, an XML-based exchange format. Exceptions to the rules can also be defined, for example to prevent a break after an abbreviation such as "e.g.". Some commonly used rules are the following:

each segment ends with a full stop
each segment ends with any punctuation mark
a new segment is automatically created after a paragraph break (hard break)
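The rules above, together with an exception, can be sketched in Python as follows. The sketch borrows the SRX idea of pairing a "before break" pattern with an "after break" pattern, but it does not read SRX files; the rule list, the abbreviation list, and the segment function are assumptions made for illustration.

```python
import re

# Each rule is (is_break, before_pattern, after_pattern).
# Rules are checked in order and the first one that matches wins,
# so exceptions (is_break=False) are listed before the general rules.
RULES = [
    # Exception: no break after a few common abbreviations (illustrative list).
    (False, r"\b(?:Mr|Mrs|Dr|e\.g|i\.e)\.", r"\s"),
    # Break after sentence-final punctuation followed by whitespace.
    (True, r"[.?!]", r"\s"),
    # Break after a paragraph (hard) break.
    (True, r"\n", r""),
]

# Anchor each "before" pattern to the end of the text seen so far.
COMPILED = [
    (is_break, re.compile(f"(?:{before})\\Z"), re.compile(after))
    for is_break, before, after in RULES
]


def segment(text):
    """Split text into segments by testing every position against the rules."""
    segments = []
    start = 0
    for pos in range(1, len(text)):
        for is_break, before, after in COMPILED:
            # "before" must match text ending at pos, "after" text starting at pos.
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # first matching rule decides, whether break or exception
    segments.append(text[start:].strip())
    return [s for s in segments if s]


sample = "Dr. Smith reviewed the file. It was fine!\nNew paragraph here"
for seg in segment(sample):
    print(seg)
```

Because the abbreviation exception is checked first, "Dr." does not end a segment, while the full stop, the exclamation mark, and the hard break each do.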

Segmentation can also follow the nature of the source content. For example, software content typically uses string-based segmentation, gaming content cell-based segmentation, and documents sentence-based segmentation.
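The difference between these content-driven approaches can be shown with a small Python sketch. The input formats are assumptions chosen for the example (a flat JSON resource file for software strings, a CSV sheet for game text, plain text for documents), and the sentence splitter is deliberately simplified.

```python
import csv
import io
import json
import re


def string_segments(json_text):
    """String-based: each value in a flat key/value resource file is one segment."""
    return list(json.loads(json_text).values())


def cell_segments(csv_text):
    """Cell-based: each non-empty spreadsheet cell is one segment."""
    rows = csv.reader(io.StringIO(csv_text))
    return [cell for row in rows for cell in row if cell.strip()]


def sentence_segments(text):
    """Sentence-based: split after sentence-final punctuation (no exceptions handled)."""
    return re.split(r"(?<=[.?!])\s+", text.strip())


print(string_segments('{"btn.save": "Save", "btn.cancel": "Cancel"}'))
print(cell_segments('Quest,Find the key.\nHint,"Look under the mat, then knock."\n'))
print(sentence_segments("First sentence. Second one? Third!"))
```

Note that the quoted game cell stays a single segment even though it contains a comma, because the unit is the cell rather than the sentence.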

How to leverage segmentation in translation?

Segmentation makes translators more efficient within a CAT tool. It is the basis for creating and editing the translation memory for a language pair, client, or project. The core benefit is the time saved by not having to search for sentences and other text units that have already been translated and recur in the text. Once the rules are set up in the system, segmentation runs automatically.

Segmentation also makes previously translated units available for future reuse, which cuts costs for clients. The time saved means faster project turnarounds, and reusing approved translations improves consistency and quality. Over time, the user builds a project-specific or general translation memory that can be used in future translation projects, reducing redundant translation.
