Security and legal
Phrase security statement
This security statement applies to the products, services and applications offered by Phrase. The protection and reliability of customer data is our utmost priority. Our security system is based on the principles of high resilience, transparency and third-party evaluation in accordance with globally recognized security standards. We believe that the Phrase architecture, based on a public cloud service with a multi-tenant model and logical access controls, provides the best value and protection for our customers’ confidential data, such as translations and translation memory files.
Phrase a.s. (formerly Memsource a.s.) is certified to ISO 27001, which confirms that the information security management system (ISMS) we have introduced conforms to the ISO standard. The ISO certificate was renewed for the years 2020-2023.
We use Amazon Web Services (AWS) as our cloud provider. AWS is compliant with a wide range of security standards, including SOC 1/ISAE 3402, SOC 2, SOC 3, FISMA, DIACAP, FedRAMP, PCI DSS Level 1, ISO 9001, ISO 27001, ISO 27017, and ISO 27018.
We use a third-party payment provider that is PCI DSS compliant and uses additional security mechanisms such as MasterCard SecureCode, Verified by VISA and SafeKey.
Audits and Vulnerability Detection
Phrase services pass through third-party penetration tests each year. The tests are conducted in accordance with the OWASP ASVS standard.
We operate a third-party hosted vulnerability disclosure program allowing independent researchers to responsibly disclose any vulnerabilities they may find in our applications and services.
We use a third-party service for monthly automated vulnerability scans.
Our information security management system is subject to annual internal audits and third-party audits verifying our compliance with the ISO 27001 standard.
Data that you upload to Phrase is completely private. It is reserved for you and for the users in your organization whom you create and grant access to the data. The privacy of your data is guaranteed by technical means as well as by our Terms of Service.
Phrase’s approach to customer data is fully compliant with governmental regulations such as GDPR and CCPA. Phrase will only interact with customer data with explicit customer consent for data processing. We will not use your content for any purpose other than to keep you informed, provide you with Phrase services and enhance our services and product offerings.
Transactional and marketing communication from Phrase from which users can unsubscribe.
Providing customers with the Phrase services that they have subscribed to, such as uploading a file for translation or exporting a TMX file.
This consent is related to machine learning in the Phrase TMS Services. Let us explain in more detail below.
Phrase uses data for training the artificial intelligence (AI) algorithms to provide better service to our customers. Use of data by the AI team is guided by the following principles:
- Full GDPR compliance
All aspects of Phrase machine learning, from data training to implementation, are fully GDPR and CCPA compliant. Phrase treats all training data as potentially including personal data. Therefore, any training data is discarded within 90 days in line with GDPR guidelines.
- Internal use only
Phrase provides AI-powered features to Phrase customers and does not sell them to third parties.
- No training data reconstruction
We use aggregate data to train machine learning models that do not output text. The output is metadata such as an MTQE score (50%, 75%, 100%). This approach rules out the possibility of data leaks through the AI models and also prevents training data reconstruction.
- No mingling of customer data for features that output text
For any AI features that output text (machine translation, translation hints, autocomplete, etc.), Phrase does not mingle customer data. Models are trained per customer and only from that customer’s data, potentially complemented with publicly available data.
The data in your Phrase account is protected. Only users to whom you have granted appropriate user rights have access to your content. Instead of emailing data, users access data upon authentication in Phrase (see Access Control) and all user actions are logged.
All stored data is encrypted using Linux LUKS (aes-xts-plain64:sha256) or AWS encryption (AES256).
Data Centers and Locations
The Phrase service is hosted on the Amazon Web Services (AWS) platform. The physical servers are located in AWS data centers. User content can also be found in backups, stored in AWS S3.
We maintain separate and distinct development, QA, pre-production and production environments.
To access the Phrase production environment, authorized and trained members of the Phrase Engineering team use a VPN and authenticate using unique strong passwords and 2FA.
Phrase uses a formalized IT change management process designed to ensure that changes are authorized and operate as intended.
The change management system in Phrase follows these principles:
- All software development follows the best practices documented in Phrase policies and documentation of particular components.
- All changes are documented and approved by the relevant team lead.
- All changes are tested in the QA and pre-production environments prior to deployment to the production environment. Changes are approved only if they fulfill predetermined criteria. The development and QA environments use testing data and do not include real customer data.
- All changes which affect the applied security measures or the risk profile of the Phrase service are assessed from a security standpoint.
- In case of a major change, penetration tests and/or vulnerability tests are performed.
Access management in Phrase is guided by the following principles:
- Principle of Least Privilege
Access privileges for any user should be limited to resources absolutely essential for completion of assigned duties or functions, and nothing more.
- Principle of Segregation of Duties
Whenever practical, no single person should be responsible for completing or controlling a task, or set of tasks, from beginning to end when it involves the potential for fraud, abuse, or other harm.
- Personalized profiles
Whenever possible, user profiles are personalized, i.e. tied to the identity of one specific user.
- Single identity
Wherever possible, user profiles use a single authentication provider (such as Google ID) and single credentials. Multi-factor authentication is enabled when supported by the authentication provider.
- User responsibility
The user is responsible for the protection of the authentication means (username, password, means of multi-factor authentication) and all actions performed under their profile. The administrator of the IT system / application is responsible for the use and protection of technical profiles.
Our audit logs meet NIST SP 800-53 AU-3 requirements. We store logs related to system and application events as well as to any user activity within a Phrase account. Log management is centralized in a third-party service.
Audit logs are available to Phrase engineers and can be provided upon request.
Login history (including IP address, country and user agent identification) is available to each user and accessible via the UI.
All communication is encrypted in Phrase by default. This includes communication between Phrase servers and the user’s web browser, the Phrase CAT desktop editor, and the mobile app.
The connection to Phrase is encrypted in line with current security standards and best practices, and uses TLS 1.2. The identity of the Phrase servers is verified by a certificate issued by a trusted certification authority.
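On the client side, the TLS requirement described above can be enforced as well. The following is a minimal sketch, not part of any Phrase tooling, that uses Python's standard `ssl` module to build a client context which verifies the server certificate and refuses anything older than TLS 1.2. The function names are illustrative assumptions.

```python
import socket
import ssl
from typing import Optional


def make_tls12_context() -> ssl.SSLContext:
    """Build a client context that verifies certificates and requires TLS 1.2 or newer."""
    # create_default_context() enables CA verification and hostname checking.
    context = ssl.create_default_context()
    # Reject protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def negotiated_tls_version(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Connect to host and return the negotiated protocol version string."""
    context = make_tls12_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_tls_version("app.phrase.com")` opens a real network connection, so the result depends on what the server and the local OpenSSL build negotiate at that moment.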
Redundancy and Backups
Redundant architecture ensures high service up-time. All data is kept in several redundant database instances. All data is backed up through near real-time incremental backups as well as daily full backups to highly durable storage hosted in AWS S3. Backups are encrypted using Linux LUKS (aes-xts-plain64:sha256) or AWS encryption (AES256).
Disaster Recovery and Incident Response
We apply disaster recovery and incident response policies that ensure timely and effective reactions to incidents. Thanks to redundant architecture and rapid incident response we were able to reach 99.99% availability long-term. Thanks to a robust backup system, we are able to guarantee swift recovery and minimal data loss. The performance of our disaster recovery is measured by Recovery Time Objective (RTO) and Recovery Point Objective (RPO).
RTO is the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity. Phrase guarantees an 8-hour RTO for all components of its service.
RPO is the maximum acceptable amount of data loss measured in time. It is the age of the files or data in backup storage required to resume normal operations if a computer system or network failure occurs. RPO covers incidents that require complete recovery of all database instances. If only one database instance is affected by the incident, the production environment seamlessly switches to another instance. Phrase guarantees a 4-hour RPO even in case of a catastrophic failure.
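As an illustration of how the two objectives are measured, the sketch below checks a hypothetical incident against the 8-hour RTO and 4-hour RPO stated above. The function names and timestamps are assumptions made for the example and do not describe Phrase's internal monitoring.

```python
from datetime import datetime, timedelta

RTO = timedelta(hours=8)  # maximum time to restore service (from the statement above)
RPO = timedelta(hours=4)  # maximum age of the data that can be lost (from the statement above)


def data_loss_window(last_backup: datetime, incident: datetime) -> timedelta:
    """Time span of changes made after the last usable backup and before the incident."""
    return incident - last_backup


def meets_rpo(last_backup: datetime, incident: datetime, rpo: timedelta = RPO) -> bool:
    """True if the data lost in the incident is no older than the RPO allows."""
    return data_loss_window(last_backup, incident) <= rpo


def meets_rto(incident: datetime, restored: datetime, rto: timedelta = RTO) -> bool:
    """True if the service was restored within the RTO."""
    return restored - incident <= rto


# Hypothetical incident at noon with the last backup taken at 09:30.
incident = datetime(2024, 1, 1, 12, 0)
print(meets_rpo(datetime(2024, 1, 1, 9, 30), incident))
print(meets_rto(incident, datetime(2024, 1, 1, 19, 0)))
```

Note that RPO is measured backwards from the incident (how much data is lost), while RTO is measured forwards (how long the outage lasts).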
Although most of the assets of Phrase are cloud-based, company policy ensures the protection of the physical premises as well as the information assets stored therein.
Our premises are protected by a security service that is present 24/7. The entrance to the building is monitored by CCTV. Security controls all access points to the building, including emergency doors.
In general, Phrase premises are only accessible to Phrase employees and long-term contractors. These persons are holders of tokens granting access to the general office area, excluding restricted areas.
Visitors are registered at the reception desk that operates 24/7. Based on their registration, they are only given access to the lift area. To access Phrase premises, they must be accompanied at all times by a Phrase employee. All Phrase employees are responsible for keeping their visitors accompanied at all times during their visit and not granting them any unnecessary access to any information assets belonging to Phrase.
Hard copies of classified information may be stored only in locked closets located in the Phrase office. Access to those documents is granted only to employees who require it for the performance of their duties.
Classified IT assets are stored in the server room. Access to the server room is only granted following confirmation by a designated Phrase employee. Phrase’s information assets are stored separately from the equipment of other tenants in locked racks.
Users are obliged to act in line with legislation, rules and procedures described in this and related policy documents. They are responsible for the security of assets entrusted to them by Phrase. Any misconduct or violation of the aforementioned obligations may lead to disciplinary measures according to applicable labor legislation.
A centrally managed and automatically updated anti-malware solution is installed on all computers. All devices have full disk encryption enabled and are protected by a strong password and/or biometrics. Phrase users have to follow these policies even when using their own devices. The clean desk policy provides rules for securing devices when unattended and for storing internal and classified information only in the designated protected areas.
Users have to create unique, complex and hard-to-guess passwords for all work-related accounts. Remote access to the internal Phrase network is only possible through a company-managed VPN.
All prospective Phrase employees and contractors are subject to background checks in line with privacy legislation. Security awareness training is part of our on-boarding process and is repeated annually. All employees and contractors have a signed NDA as part of their contract.
Artificial intelligence and machine learning
How machine translation and artificial intelligence are used at Phrase and how they relate to data privacy and processing.
Data privacy and security
Data in machine learning models
Data uploaded by Phrase clients (including metadata) can potentially be used for training machine learning models. This data is not shared with other users, nor is it possible to extract it from the models as they do not produce any content.
All content is treated as if it contained personal information, so all data for ML is handled in accordance with the rules imposed by GDPR. Client data (or anyone’s data) is not resold for profit. This data can’t be reconstructed or reverse engineered. None of the AI features generate any textual content; they only label content with metadata (e.g. MT quality category, non-translatable, etc.).
When training models, all relevant data is aggregated from Phrase and models are constructed from it. After no more than 90 days (as required by GDPR), all data is deleted and only the models remain. These models do not contain customer data, as they do not store sentences. The neural network model is a complex mathematical formula that calculates a quality score based on the source sentence and its translation. Training the model involves adjusting the parameters of the formula until it provides desired results.
Training data is processed by the machine learning algorithm to create (train) a model. This model is used in the feature to predict non-translatables, MT quality (MTQE) or to recommend an optimal MT engine.
While the training data may contain personal data, the resulting model does not. Any personal data is anonymized during the training process.
Machine learning models and new content
When new content is processed with MT, a numerical representation is created and fed to the formula, and a score is calculated. If the content is identical to training sentences, this indicates that the formula is already optimized to generate the correct translation.
The formula is designed to learn and identify patterns in data; to generalize. It will, for example, learn that when it sees cat in the English source, the MT should contain gatto in Italian to be considered good. MT that doesn’t contain it is bad. These learned patterns are then applied to any newly submitted sentences and MT output.
For general information on the engines supported by Phrase, their performance, and factors to consider when getting started with machine translation, see these resources:
Data security and privacy with MT providers
When submitting source content to machine translation providers, Phrase encrypts the data in transit. When processed by the MT engine, the data is subject to the MT provider’s terms of service and privacy policies.
- Phrase collects training data from already-processed segments (source, MT output, postedit).
- Based on the similarity between MT output and post-edit, a quality category is calculated (100, 99, 75, 0).
- The similarity is calculated using an in-house metric that is partially based on chrf3 (a popular MT evaluation metric).
- The tuples (source, MT output, score) are fed into a deep neural network, which is trained to predict the quality category. One neural net per language pair is in use.
- In production, the neural net gets a source sentence and the MT output as input, and it predicts the most probable quality category.
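The labeling step in the list above can be sketched as follows. This is a simplified illustration only: it uses a plain character n-gram F-score with beta = 3 (the idea behind chrf3) rather than Phrase's in-house metric, and the thresholds that map a similarity score to the categories 100, 99, 75 and 0 are hypothetical values chosen for the example.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams; whitespace is ignored as a simplification."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 3.0) -> float:
    """Simplified character n-gram F-score in [0, 1]; beta = 3 weights recall over precision."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # text too short to contain n-grams of this order
        overlap = sum((hyp & ref).values())
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


def quality_category(mt_output: str, post_edit: str) -> int:
    """Map similarity to a category. The thresholds are hypothetical, not Phrase's."""
    score = chrf(mt_output, post_edit)
    if score >= 1.0:
        return 100
    if score >= 0.95:
        return 99
    if score >= 0.75:
        return 75
    return 0


def training_tuple(source: str, mt_output: str, post_edit: str) -> tuple:
    """Build one (source, MT output, category) training example."""
    return (source, mt_output, quality_category(mt_output, post_edit))
```

An MT output that the post-editor left unchanged lands in category 100, while output that shares no characters with the post-edit lands in category 0. In production the post-edit is not available, which is why a model is trained to predict the category from the source and MT output alone.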
chrf3 vs BLEU or TER
Better results have been observed with chrf3, and it is more reliable when scoring individual sentences. It also handles various language types, such as morphologically rich languages or CJK languages.
No MTQE scores
If a score is not provided, the segment is likely not worth post-editing, but it may also mean that the model is not confident enough to answer. Separating these two situations is in development.
Defining and identifying domains
Phrase collects training data from existing segments. An algorithm is applied that identifies the different domains making up the training data. Although obtained automatically, the detected domains correspond well to standardized domains (e.g. medical, travel/hospitality, software, etc.).
For each language pair and domain, the performance of machine translation engines is monitored. When a new document is uploaded for translation, the model detects which of these domains are present. The most relevant domain is selected (e.g. if a document is 60% legal and 40% medical, it is categorized as legal) and the current best engine for that domain and the corresponding language pair is recommended.
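The selection step described above can be sketched as a lookup. The domain shares, the engine names, and the shape of the performance table are all hypothetical; the sketch only shows the logic of picking the dominant domain and then the best engine for that domain and language pair.

```python
from typing import Dict, Optional, Tuple


def recommend_engine(
    domain_shares: Dict[str, float],
    language_pair: str,
    best_engine: Dict[Tuple[str, str], str],
) -> Tuple[str, Optional[str]]:
    """Return the dominant domain and the best known engine for it, if any."""
    if not domain_shares:
        raise ValueError("no domains detected in the document")
    # The domain with the largest share of the document wins.
    domain = max(domain_shares, key=domain_shares.get)
    # Look up the currently best-performing engine for this pair and domain.
    return domain, best_engine.get((language_pair, domain))


# Hypothetical monitoring results: best engine per (language pair, domain).
best_engine = {
    ("en-it", "legal"): "engine_a",
    ("en-it", "medical"): "engine_b",
}

# A document detected as 60% legal and 40% medical is treated as legal.
print(recommend_engine({"legal": 0.6, "medical": 0.4}, "en-it", best_engine))
```

If no engine has been evaluated for the detected domain and language pair, the lookup returns `None`, leaving the fallback behavior to the caller.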
Phrase is available in two separate locations: the EU data center is hosted on Amazon AWS in Ireland (eu-west-1), and the US data center, launched at the beginning of 2022, is hosted on Amazon AWS in the United States (us-east-1). We refer to these as the European Union and United States data centers. Their home URLs are https://cloud.memsource.com and https://app.phrase.com/ for the European Union data center, and https://us.cloud.memsource.com and https://us.app.phrase.com/ for the United States data center.
New organizations can choose the data center location during the sign-up process. All profiles created on the EU data center fully reside in the EU; no data is shared with the US data center, and vice versa. The data centers are separate cloud infrastructures and there is no data sharing, integration, or migration path between them. Accordingly, we declare that Phrase will not transfer any data from customers who have chosen the EU data center to the US data center, and will not transfer any data from customers who have chosen the US data center to the EU data center. However, if the customer decides to use the Phrase Translate Add-on for post-editing, several machine translation engines may use their own data centers in different locations. The Phrase Translate Add-on for post-editing, which offers unlimited machine translation at a flat fee, is also available in the US data center. Some of the fully managed MT engines, however, cannot guarantee that the customer’s data won’t be transferred outside the US.
To prevent a potential data transfer to another region, use only MT engines with the tag Data region policy and turn off all MT engines that don’t have this data region guarantee.
Any Phrase subscription, regardless of the data center the customer has chosen, is governed by the same Terms of Service, including Service Level Agreements if these are part of the customer’s subscription. The list of sub-processors is the same for both data centers, but some of the sub-processors may be involved only in processing activities within the EU data center.
The service up-time and performance metrics are published on https://status.phrase.com/.
The functionality of both instances is identical, except for the following:
- Snowflake data cloud is only available for the EU data center.
For IP addresses the servers use, refer to Phrase Servers IP Addresses.
This article covers the practical aspects of data storage in Phrase. The legal aspects are covered by our Terms of Service and the security aspects are covered by our Security Statement.
Data in the recycle bin
Data in the recycle bin is stored for 30 days and is subsequently deleted permanently. The recycle bin can also be purged immediately if required.
- Projects that are inactive (no new jobs added or current jobs not updated) for 12 months are automatically deleted by the system (a warning message is displayed on the project page 6 months prior to deletion). When a project is deleted, the project’s jobs, analyses, and reference files are also deleted and are stored for 30 days in the recycle bin, after which they are permanently deleted. The translation memories and term bases selected for the project are not deleted.
- Lifetime of shared projects is controlled by criteria from the buyer’s end. Once a shared project is no longer available in the buyer’s profile, it also disappears from the vendor’s organization.
- The Privacy Notice details how other data is stored.
Extending the project lifetime
Creating a new job within a project or modifying the underlying job file (manually or via API) within a project extends the project lifetime.
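The lifetime rules above can be worked through with a small date calculation. This is a sketch under stated assumptions: "12 months" and "6 months" are approximated as 365 and 182 days, and the function name and returned keys are made up for the example. The exact dates applied by the system may differ.

```python
from datetime import date, timedelta

INACTIVITY_LIMIT = timedelta(days=365)      # assumption: "12 months" taken as 365 days
WARNING_LEAD = timedelta(days=182)          # assumption: "6 months" taken as 182 days
RECYCLE_BIN_RETENTION = timedelta(days=30)  # 30 days in the recycle bin, as stated above


def project_timeline(last_activity: date) -> dict:
    """Estimate key dates for a project, counted from its last job creation or update."""
    moved_to_recycle_bin = last_activity + INACTIVITY_LIMIT
    return {
        "warning_from": moved_to_recycle_bin - WARNING_LEAD,
        "moved_to_recycle_bin": moved_to_recycle_bin,
        "permanently_deleted": moved_to_recycle_bin + RECYCLE_BIN_RETENTION,
    }


# Any new job or job update resets the clock: recompute from the new activity date.
timeline = project_timeline(date(2023, 1, 1))
print(timeline["moved_to_recycle_bin"], timeline["permanently_deleted"])
```

Because activity extends the lifetime, adding a job on any date before `moved_to_recycle_bin` simply produces a new timeline starting from that date.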
Project age definition
A project’s age (how old a project is) is based on the project creation date.
Translation Memory storage
With a few exceptions, translation memories are stored permanently unless deleted by users.
Deletion of translation memories from expired and unused profiles
Translation memories are deleted from Phrase profiles that have been expired for at least 6 months (the account has not been logged into for at least 6 months).
- If a profile has been expired for at least 6 months, the administrator user of the expired profile will be notified by email that the profile’s translation memories will be moved to the profile’s recycle bin.
- The administrator user can disregard the message and the translation memories will be deleted permanently from the recycle bin after 30 days. To avoid deletion, the administrator user may log into the profile, extend the subscription and restore the translation memories from the profile’s recycle bin.
Term Base storage
The policy for term base storage is the same as the policy for translation memory storage.
Phrase carries out a system-wide data backup on a daily basis. Users may back up data in the following ways:
Manual data backup
To back up essential project data, download the following:
- Original files
- MXLIFF files (or a single MXLIFF joined file)
- Completed files
Automated data backup
Users with the Ultimate plan and higher can use Phrase APIs to automate data backup.