Artificial Intelligence and Machine Learning Statement
Are we engaging the supplier (Phrase in this case) for a service or technology solution that uses artificial intelligence or machine learning?
We have AI-enabled solutions, and customers control which ones to enable and when, either across all projects or for selected projects.
With a dedicated AI research team, in addition to our development teams, we are fully committed to the ongoing improvement of our AI/ML competencies. The key solutions described below are in current releases or active development.
We would be happy to have a follow-up discussion to better understand your use cases and concerns, so we may incorporate them as considerations for future releases.
What is the AI/ML solution intended to do and how will customers use the AI/ML solution? (e.g., what decisions or actions will the AI/ML solution drive?)
There are several AI/ML-enabled Phrase solutions; details are below.
A. Phrase Language AI, MT autoselect
Customers can use this autoselect feature to identify the highest-performing engine, or specify the desired engine for specific projects. Autoselect gives customers the flexibility to use the highest-performing engine for each project without the hassle of configuring each one.
Details: We automatically match the most suitable machine translation engine based on content domain and language pair. We scan the customer’s content for domain-specific keywords to aid the selection. After post-editing is completed, we use the quality score and the post-editing effort required to improve our quality estimation score (described in feature B).
Data usage: Customer’s content for translation is scanned briefly for domain identification (by matching keywords) and for performance evaluation (to assess the required post-editing effort).
Data protection: Customer’s content for translation is not retained or shared during this process.
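For illustration only, the keyword-based domain matching described above could be sketched roughly as follows. This is a minimal sketch, not the actual Phrase implementation: the keyword lists, engine names, scores, and function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword lists per domain (assumption, for illustration only).
DOMAIN_KEYWORDS = {
    "legal": {"contract", "liability", "clause"},
    "medical": {"patient", "dosage", "clinical"},
}

# Hypothetical historical scores per (domain, language pair) and engine.
ENGINE_SCORES = {
    ("legal", "en-de"): {"engine_a": 0.81, "engine_b": 0.77},
    ("medical", "en-de"): {"engine_a": 0.70, "engine_b": 0.84},
}
DEFAULT_ENGINE = "engine_a"  # fallback when no domain or score is known


def detect_domain(text):
    """Return the domain with the most keyword hits, or None if there are none."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        counts[domain] = sum(1 for token in tokens if token in keywords)
    domain, hits = counts.most_common(1)[0]
    return domain if hits > 0 else None


def select_engine(text, language_pair):
    """Pick the best-scoring engine for the detected domain and language pair."""
    scores = ENGINE_SCORES.get((detect_domain(text), language_pair))
    if not scores:
        return DEFAULT_ENGINE
    return max(scores, key=scores.get)
```

In this sketch the content is only tokenized in memory to count keyword hits; nothing is stored, which mirrors the "scanned briefly" behaviour described above.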
B. QPS, Quality Performance Score
Customers can use this to plan resources, based on expected machine translation quality and post-editing workload. The quality estimation gives customers a more precise budgeting and planning process by anticipating the overall workload, cost, and time required. QPS can also be used within the TMS product as a mechanism for automatically confirming the highest-quality jobs for direct delivery, without the need for human review.
Details: We use customer data to train the current estimation model and output an estimate of the Multidimensional Quality Metrics (MQM) score that the translation would receive were it sent through human LQA for review.
Data usage: Customer’s content for translation and the translated output from the machine translation engine are used by the estimation model to compute the quality estimation.
In addition, to compute the machine-translation quality score, we use both the output from the machine-translation engine and the final post-editing result. Finally, to improve the quality estimation model, we conduct ongoing training using only the machine-translation quality score.
Data protection: Customer’s data will be used to train the quality estimation model but will never be accessible (by extraction or reverse engineering) by anyone, even if they gain access to this model.
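As a rough illustration of the two ideas above, the sketch below compares machine-translation output with its post-edited version and routes a job by an estimated score. It is a sketch under stated assumptions: the similarity measure (Python's `difflib`), the 0 to 100 score scale, the threshold of 90, and the function names are hypothetical and do not describe how QPS is actually computed.

```python
from difflib import SequenceMatcher


def post_edit_similarity(mt_output: str, post_edited: str) -> float:
    """Return a value in [0, 1]; 1.0 means the linguist left the MT output unchanged.

    Assumption: character-level similarity stands in for post-editing effort.
    """
    return SequenceMatcher(None, mt_output, post_edited).ratio()


def route_job(estimated_score: float, threshold: float = 90.0) -> str:
    """Auto-confirm jobs whose estimated score meets the (hypothetical) threshold."""
    return "auto_confirm" if estimated_score >= threshold else "human_review"
```

A customer-chosen threshold of this kind is what would let high-scoring jobs go to direct delivery while everything else is sent for human review.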
C. Identification of Non-translatables
Customers can use this to plan resources, based on the expected workload after skipping non-translatable segments. This feature and the quality estimation (feature B) both give customers a more precise budgeting and planning process by anticipating the overall workload, cost, and time required.
Details: Similar to B, we use customer data to improve the accuracy of identifying segments that require no translation, such as company/product names, for the associated language pair(s).
Data usage: Customer’s content for translation is used by the non-translatable identification model to predict the probability of skipping a segment. Additionally, the content for translation along with the translated outputs are used for periodic training of the Phrase non-translatable identification model.
Data protection: Customer’s data will be used to train the model but will never be accessible (by extraction or reverse engineering) by anyone, even if they gain access to this model.
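To show how a per-segment skip probability could feed into workload planning, here is a minimal sketch. The `Segment` type, the 0.9 threshold, and the whitespace-based word count are assumptions made for illustration; the probabilities would come from the non-translatable identification model, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    text: str
    skip_probability: float  # hypothetical model output in [0, 1]


def remaining_word_count(segments: List[Segment], threshold: float = 0.9) -> int:
    """Count words in segments that are NOT confidently non-translatable."""
    return sum(
        len(segment.text.split())
        for segment in segments
        if segment.skip_probability < threshold
    )
```

For example, a product name predicted as non-translatable with probability 0.97 would be excluded from the count, while an ordinary sentence with probability 0.02 would still contribute its words to the expected workload.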
D. Phrase NextMT, Phrase’s in-house translation engine
Customers could utilize the current version of Phrase NextMT, the Phrase in-house machine translation engine. This allows customers to improve the quality and speed of localization projects by providing linguists with machine-translated content optimized for post-editing by professional translators.
Details: The Phrase NextMT engine is tailored to professional translation, including support for tag placement, advanced glossary integration (including morphological inflection), and translation memory adaptation (fuzzy matches). We train the current version of Phrase NextMT with publicly available data. We then use the customer’s translation memory to improve the translation, for example through better style matches. In future releases of Phrase NextMT, we plan to provide customers with the ability to generate custom engines. Customers will then be able to use a custom-trained “customer-only Phrase NextMT engine” that no other customers will be able to access.
Data usage: Only publicly available data is used to train the current generic Phrase NextMT translation engines. During the translation process, the trained Phrase NextMT translation engines have access to the content being translated, along with the customer’s translation memory (for relevant fuzzy matches) and the customer’s MT glossary.
Data protection: Customer’s data and the custom-trained engine will not be accessible to anyone else.
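As a generic illustration of what retrieving a fuzzy match from a translation memory means, consider the sketch below. It uses Python's `difflib` for similarity and a 0.75 cut-off; both are assumptions chosen for the example, and the function name and data layout are hypothetical rather than part of Phrase NextMT.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def best_fuzzy_match(
    source: str, translation_memory: Dict[str, str], min_score: float = 0.75
) -> Optional[Tuple[str, str, float]]:
    """Return (tm_source, tm_target, score) for the closest TM entry, or None.

    translation_memory maps previously translated source text to its target text.
    """
    best = None
    for tm_source, tm_target in translation_memory.items():
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score >= min_score and (best is None or score > best[2]):
            best = (tm_source, tm_target, score)
    return best
```

A match found this way could then be supplied to the engine as additional context at translation time, without being used to train the generic engine.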
E. Phrase Custom AI, Dataset creation and custom engine training
Customers can create custom training data from their translation memories and use it to generate custom NextMT engines for specific use cases. Customers can then use a custom-trained “customer-only Phrase NextMT engine” that no other customers will be able to access.
Details: Customers can create their own datasets from their own translation memories using automated cleaning filters. Customers can then take one of our pre-trained, generic NextMT engines and train their own customized version of that engine, which is then accessible only to that customer. We also provide analytics that demonstrate the success of model training and the utility of the resulting engine.
Data usage: During the dataset creation and cleaning process and the subsequent training of the Phrase NextMT engine, the system has access to the customer’s translation memory. Once trained, the resulting engine and datasets are restricted so that they can be accessed and used only by that customer.
Data protection: Customer’s data and the custom-trained engine will not be accessible to anyone else.
F. MT data cleaning, Translation memory cleaning
Customers can use this to prepare data for training their own translation engines. Cleaner training data allows customers to optimize training results.
Details: We cleanse the customer’s translation memory using rule-based filters and machine learning algorithms.
Data usage: We ingest customer’s translation memory and provide a cleaned-up version.
Data protection: Customer’s data (before and after the cleansing) will not be accessible to anyone else.
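The rule-based part of such cleaning could look roughly like the sketch below. The specific rules (dropping empty entries, untranslated copies, extreme length mismatches, and exact duplicates) and the length-ratio limit of 3.0 are assumptions for illustration; the actual filters, and the machine learning component, are not shown here.

```python
from typing import List, Tuple


def clean_translation_memory(
    pairs: List[Tuple[str, str]], max_length_ratio: float = 3.0
) -> List[Tuple[str, str]]:
    """Apply simple, hypothetical rule-based filters to (source, target) pairs."""
    seen = set()
    cleaned = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if not source or not target:
            continue  # empty side
        if source == target:
            continue  # assumed untranslated copy
        ratio = max(len(source), len(target)) / min(len(source), len(target))
        if ratio > max_length_ratio:
            continue  # implausible length mismatch, likely misaligned
        if (source, target) in seen:
            continue  # exact duplicate
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned
```

The function returns a new list and leaves the input untouched, matching the description above of ingesting a translation memory and providing a cleaned-up version.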
G. Automated linguist selection
Customers can use this to expedite project assignment decisions by assigning linguists with past experience working on similar documents. This allows customers to improve the quality and speed of localization projects by optimizing linguist assignments and reducing repetitive project management decisions.
Details: We use suitable vector representations to categorize documents, followed by fast approximate similarity searches to curate the recommended list of linguists. In future releases, we plan to utilize similar mechanisms to automate other project management decisions.
Data usage: Customer’s content for translation is scanned briefly for domain identification.
Data protection: Customer’s data (before and after the analysis) will not be accessible to anyone else.
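The general idea of representing documents as vectors and recommending linguists by similarity can be sketched as follows. This is a simplified stand-in, not the production approach: it uses bag-of-words vectors and an exact brute-force cosine search instead of learned vector representations and approximate search, and all names and the data layout are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy vector representation: word counts (assumption, for illustration)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def recommend_linguists(
    new_document: str, history: List[Tuple[str, str]], top_k: int = 2
) -> List[str]:
    """Rank linguists by their most similar past document.

    history is a list of (linguist, past_document_text) pairs.
    """
    query = embed(new_document)
    best = {}
    for linguist, past_text in history:
        score = cosine(query, embed(past_text))
        best[linguist] = max(score, best.get(linguist, 0.0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [name for name, score in ranked[:top_k] if score > 0]
```

At larger scale, the brute-force loop would typically be replaced by an approximate nearest-neighbour index, which is the role of the fast approximate similarity search mentioned above.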
Will the third party (Phrase in this case) custom develop AI/ML models exclusively for customers or will they provide generic AI/ML models?
We offer both. Customers may utilize any of the 30+ generic engines and additional custom MT models. Custom MT models would be tailored for the particular customer, using customer data. Please refer to section E in question 2 for more details.
MT engines integrated via customer’s own API key
We have no control over custom model training of MT engines provided by third parties with whom you maintain a direct relationship, that is, where you use your own API key to integrate the MT engine with the Phrase Solution.