Segmentation in translation is the process of breaking a source text down into smaller units for translation. How the text is split is determined by segmentation rules chosen for a given language pair, and the resulting units serve as the basis for creating and editing translation memories. Once the rules are defined, the system recognizes them and applies them automatically during the translation workflow, which makes segmentation an important step in translation automation.
Why is segmentation important for translation?
With the rise of computer-assisted translation (CAT) tools, translators, reviewers, and outsourcers have gained access to software that automates translation-related tasks and significantly cuts the time they require. Segmentation is one of the first steps that source content undergoes before a translator starts the translation process.
With segmentation, the source content is broken down into translation units called segments. These segments can be:
- Bullet points,
- Titles, etc.
The segments are generated automatically according to specific segmentation rules. They help build and customize the translation memory (TM) for a specific project or client. Segmentation is therefore the foundation for reusing existing translated content in the future.
The generated segments also simplify the work of the translator and reviewer. Staying focused and remembering the exact wording of a translated segment in a high-volume project can be challenging. Segmentation, combined with storage in a translation memory, helps eliminate these issues by saving previously translated segments.
Using the TM built from previously translated segments, the CAT tool automatically inserts an existing translation wherever the same source segment recurs (auto-propagation) and offers it as a suggestion wherever a segment is only partially the same. This is done through a matching process that sorts results into match bands (e.g. 100% match, 95-99% match).
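The matching step can be illustrated with a minimal sketch. It assumes a TM held as a plain dictionary of source-to-target segments and uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure; commercial CAT tools use their own scoring, so the percentages here are only indicative. The names `match_score`, `best_match`, and the 75% threshold are hypothetical.

```python
from difflib import SequenceMatcher


def match_score(a: str, b: str) -> int:
    """Similarity of two segments as a whole percentage (0-100)."""
    return int(SequenceMatcher(None, a, b).ratio() * 100)


def best_match(segment: str, tm: dict, threshold: int = 75):
    """Return (score, source, target) for the closest TM entry, or None.

    The threshold is an assumed cut-off below which no suggestion is made.
    """
    best = None
    for source, target in tm.items():
        score = match_score(segment, source)
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


# A one-entry translation memory (English to French), for illustration only.
tm = {"Click the Save button.": "Cliquez sur le bouton Enregistrer."}

exact = best_match("Click the Save button.", tm)      # identical source: 100% match
fuzzy = best_match("Click the Save button now.", tm)  # similar source: fuzzy match
miss = best_match("Restart your computer.", tm)       # too different: no suggestion

print(exact, fuzzy, miss)
```

An identical segment scores 100 and could be auto-propagated, while the slightly longer variant falls into a lower band and would be shown to the translator for editing.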
What are segmentation rules?
Segmentation rules can be adjusted for every single project. When a project is created, the CAT tool offers an option to either use the pre-configured segmentation rules or define specific ones.
The logic behind the segmentation rules can vary. All of the defined rules are stored in an SRX (Segmentation Rules eXchange) file, an XML-based format. Exceptions to the rules can also be defined, for example to prevent a break after an abbreviation that ends in a full stop. Some commonly used rules are the following:
- Each segment ends with a full stop,
- Each segment ends with any punctuation mark,
- A new segment is automatically created after a paragraph break (hard break).
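The rules listed above, together with an exception list, can be sketched in a few lines. This is a simplified illustration using regular expressions, not the way any particular CAT tool or SRX engine implements them; the function `segment`, the abbreviation list `NO_BREAK_AFTER`, and the choice of `.`, `!`, and `?` as sentence-ending punctuation are assumptions made for the example.

```python
import re

# Hypothetical exception list: abbreviations that end in a full stop
# but should not close a segment.
NO_BREAK_AFTER = ("e.g.", "i.e.", "Dr.", "Mr.")

# Candidate break: whitespace that follows sentence-ending punctuation.
BREAK = re.compile(r"(?<=[.!?])\s+")


def segment(text: str) -> list:
    segments = []
    # Rule: a paragraph break (hard break) always starts a new segment.
    for paragraph in re.split(r"\n+", text):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        start = 0
        for m in BREAK.finditer(paragraph):
            candidate = paragraph[start:m.start()]
            # Exception: do not break right after a listed abbreviation.
            if candidate.endswith(NO_BREAK_AFTER):
                continue
            segments.append(candidate)
            start = m.end()
        tail = paragraph[start:]
        if tail:
            segments.append(tail)
    return segments


result = segment("Dr. Smith arrived. He was late!\nNew paragraph here")
print(result)
# ['Dr. Smith arrived.', 'He was late!', 'New paragraph here']
```

Without the exception list, the first sentence would be cut after "Dr.", which is the kind of false break that rule exceptions exist to prevent.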
Segmentation can also follow the nature of the source content. For example, software is typically segmented by string, gaming content by cell, and documents by sentence.
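The difference between these approaches can be shown with a small sketch. The input formats are assumptions chosen for illustration: a `key=value` resource file for software strings and a CSV sheet for cell-based content. The function names are hypothetical.

```python
import csv
import io
import re


def segment_strings(resource_text: str) -> list:
    """String-based: each value in a key=value resource file is one segment."""
    segments = []
    for line in resource_text.splitlines():
        if "=" in line:
            _key, value = line.split("=", 1)
            segments.append(value.strip())
    return segments


def segment_cells(csv_text: str) -> list:
    """Cell-based: each non-empty spreadsheet cell is one segment."""
    segments = []
    for row in csv.reader(io.StringIO(csv_text)):
        segments.extend(cell.strip() for cell in row if cell.strip())
    return segments


def segment_sentences(text: str) -> list:
    """Sentence-based: break after ., ! or ? followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


strings = segment_strings("btn.save=Save\nbtn.cancel=Cancel. Are you sure?")
cells = segment_cells('"Welcome, hero.","Take this sword. You will need it."\n')
sentences = segment_sentences("Take this sword. You will need it.")

print(strings)    # ['Save', 'Cancel. Are you sure?']
print(cells)      # ['Welcome, hero.', 'Take this sword. You will need it.']
print(sentences)  # ['Take this sword.', 'You will need it.']
```

Note that the same text, "Take this sword. You will need it.", stays a single segment under cell-based segmentation but becomes two segments under sentence-based segmentation.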
How to leverage segmentation in translation?
Segmentation is a valuable feature of a CAT tool that makes the translator more efficient. It is the basis for creating and editing the translation memory for a language pair, client, or project. The core benefit is the time saved by not having to search for already translated sentences and other text units that are repeated within the text. Once the rules are set up in the system, segmentation is a fully automated process.
Segmentation also makes previously translated units available for future reuse, which cuts costs for clients. In addition, the time savings mean faster project turnarounds, while reuse improves translation consistency and quality. This helps the user build a project-specific or general translation memory that can be used in future translation projects, eliminating redundant translation.
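To give a rough sense of how much repetition segmentation can expose, the following sketch counts repeated segments in an already segmented text. The function `repetition_stats` and the sample segments are hypothetical, and real CAT tools report repetitions in their own analysis formats.

```python
from collections import Counter


def repetition_stats(segments: list) -> dict:
    """Count how many segments repeat an earlier, identical segment."""
    counts = Counter(segments)
    total = len(segments)
    unique = len(counts)
    return {"total": total, "unique": unique, "repeated": total - unique}


stats = repetition_stats(["Click OK.", "Click Cancel.", "Click OK.", "Click OK."])
print(stats)
# {'total': 4, 'unique': 2, 'repeated': 2}
```

In this example, two of the four segments would only need to be translated once and could then be filled in from the translation memory.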