Rails Validation: Pitfalls in Validating Uniqueness with Active Record
The tagging of translation keys was added as a feature in the very early stages of Phrase's product development. The feature was continuously improved over time, and its performance was very reliable. Then one day a customer contacted us about a validation error.
The Problem
This customer had used the tagging feature rather extensively, automatically tagging all uploaded keys with a GitHub pull request, and ran into a validation error on an existing tag. We investigated the issue and discovered that the error was caused by a tag name that was not unique within its project. So how could that happen? We had used Active Record's uniqueness validation, plain Rails, and had never skipped validations.
Code State
The Tag model had a name field, belonged to a project, and carried an Active Record uniqueness validation scoped so that the combination of project_id and name was unique:
State of Codebase
class Tag < ActiveRecord::Base
has_and_belongs_to_many :translation_keys
...
belongs_to :project
belongs_to :account
...
validates_uniqueness_of :name, scope: :project_id
...
end
Our code to create new tags looked something like this:
class Keys::TagService
def self.create_or_find_tag_by_name_for_project(project, tag_name, user, ...)
...
tag = TagRepository.tags_for_project(project).find_or_initialize_by(name: tag_name)
if tag.new_record? && tag.save
...
end
tag
end
end
State within Database
Checking our database revealed that there were 40k records that were not unique:
SELECT COUNT(*) FROM
(SELECT COUNT(*)
FROM tags
GROUP BY project_id, name
HAVING COUNT(*) > 1) AS g
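The same duplicate check can be sketched in plain Ruby (the data here is hypothetical; the real check ran as the SQL above):

```ruby
# Sketch: count (project_id, name) pairs that occur more than once,
# mirroring the GROUP BY ... HAVING COUNT(*) > 1 query above.
tags = [
  { project_id: 1, name: "upload" },
  { project_id: 1, name: "upload" },  # duplicate within project 1
  { project_id: 2, name: "upload" },  # same name, different project: fine
]

duplicate_groups = tags
  .group_by { |t| [t[:project_id], t[:name]] }
  .count { |_pair, group| group.size > 1 }

puts duplicate_groups # => 1
```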
Furthermore, we saw that all of the duplicated records were tags generated automatically by our upload feature. Uploads are processed asynchronously by concurrent workers, and our client's push command triggers an upload for each locale. On any upload the user can assign a tag, which all uploaded keys should receive. This combination dramatically increases the probability of two processes trying to create the same tag at the same time.
What happened?
With uniqueness validation enabled, Rails looks for existing records before performing Tag.create, Tag.save, Tag.update, and similar operations. If a record is found, the validation fails and the transaction is rolled back; if not, the record is saved. Here is an example of a failing Rails validation:
pry(main)> Tag.create(name: "test", project_id: 1)
(0.2ms) BEGIN
...
Tag Exists (0.2ms) SELECT 1 AS one FROM `tags` WHERE (`tags`.`name` = BINARY 'test' AND `tags`.`project_id` = 1) LIMIT 1
(0.1ms) ROLLBACK
And here's an example of a successful Rails validation:
pry(main)> Tag.create(name: "test", project_id: 1)
(0.2ms) BEGIN
### Some other validations ###
..
Tag Exists (0.3ms) SELECT 1 AS one FROM `tags` WHERE (`tags`.`name` = BINARY 'test' AND `tags`.`project_id` = 1) LIMIT 1
..
### some before_save and before_create hooks ###
..
SQL (0.2ms) INSERT INTO `tags` (`name`, `project_id`, `slug`, `created_at`, `updated_at`) VALUES ('test', 1, 'test', '2017-02-13 12:56:07', '2017-02-13 12:56:07')
(14.7ms) COMMIT
These examples show that there is no lock on the table between the SELECT and the INSERT statements. In this time gap, another concurrent process can create a record with the same name and the same project_id without any validation failing. As you can also see, hooks are triggered between the validation and the INSERT statement; as hooks grow in number and complexity, this gap widens and the probability of creating non-unique records increases.
Solution and Learnings
The solution to preventing the creation of non-unique records is a unique database index. There was already an index on project_id and name, but it was not unique. Making this index unique without downtime requires the following steps:
- Make sure no new duplicated records can be created
- Clean up the database
- Remove the old index
- Create a new unique index
Preventing the Creation of Non-unique Records between Cleanup and Index Migration
Adding a unique index to a table that contains non-unique entries raises an exception, so we need to clean up the database before adding the index. However, if new duplicated entries are introduced between the cleanup and the adding of the index, the migration will fail. We therefore first have to make sure that no new duplicated records can be created. You could solve this with temporary tables, but we chose another approach: since MySQL ignores NULL values in unique indexes, we add a new column whose value, for new records, is the concatenation of project_id and name. The default value of this column is NULL, so we can add a unique index on it without affecting existing rows.
class AddTmpFieldToTags < ActiveRecord::Migration
def change
add_column :tags, :tmp_field, :string, null: true, default: nil
add_index :tags, [:tmp_field], unique: true
end
end
Since we never create tag records with hooks skipped, we can add a before_save hook that sets tmp_field to the concatenation of project_id and name. This index then prevents the creation of non-unique tag entries. The application will now raise an exception every time it tries to create a duplicated entry, so we have to change our tag-creation code to handle this. Rails raises an ActiveRecord::RecordNotUnique exception in this case, which we can rescue in order to select and return the existing tag. Make sure to reload here, so as to prevent stale query-cache results.
...
def self.create_or_find_tag_by_name_for_project(project, tag_name, user)
tag = TagRepository.tags_for_project(project).find_or_initialize_by(name: tag_name)
begin
if tag.new_record? && tag.save
...
end
rescue ActiveRecord::RecordNotUnique
tag = TagRepository.tags_for_project(project).reload.find_by(name: tag_name)
end
tag
end
...
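The before_save hook itself is not shown above; a minimal sketch of the value it writes could look like this (the hook name and the separator are assumptions, not the actual Phrase implementation):

```ruby
# Sketch of the value the before_save hook writes into tmp_field.
# In the model this would be wired up roughly as:
#
#   class Tag < ActiveRecord::Base
#     before_save :set_tmp_field   # hook name assumed
#
#     def set_tmp_field
#       self.tmp_field = tmp_field_value(project_id, name)
#     end
#   end
#
# The "-" separator is an assumption; since project_id is numeric,
# the first "-" always delimits the pair unambiguously.
def tmp_field_value(project_id, name)
  "#{project_id}-#{name}"
end

puts tmp_field_value(1, "test") # => "1-test"
```

Two tags with the same project_id and name produce the same tmp_field value, so the unique index on tmp_field rejects the second INSERT while existing rows, whose tmp_field is NULL, stay untouched.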
Cleanup and Index Change
After the changes are deployed, we can clean up the database, either via a migration or a script; we decided on a migration. During the cleanup it is important to reassign the associations correctly, because we do not want to lose the links between tags and their translation keys. Once the cleanup is complete, the index can be changed to be unique.
class AddUniqueIndexOnProjectIdAndNameToTags < ActiveRecord::Migration
  def change
    remove_index :tags, [:project_id, :name]
    cleanup
    add_index :tags, [:project_id, :name], unique: true
  end

  private

  def cleanup
    dub_tags = ActiveRecord::Base.connection.exec_query(
      "SELECT name, project_id, COUNT(*) FROM tags " \
      "GROUP BY name, project_id HAVING COUNT(*) > 1"
    ).to_hash
    dub_tags.each do |tag|
      unify_tag(tag["name"], tag["project_id"])
    end
  end

  def unify_tag(name, project_id)
    tags = Tag.where(project_id: project_id, name: name).order(:id)
    return if tags.count < 2

    Tag.transaction do
      keys_to_tag = []
      tag_to_keep = tags.to_a.first
      tags.each do |tag|
        keys_to_tag.concat(tag.translation_keys.to_a)
        tag.destroy if tag.id != tag_to_keep.id
      end
      tag_to_keep.translation_keys = keys_to_tag.uniq { |key| key.id }
      tag_to_keep.save
    end
  end
end
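The core of unify_tag, keeping the oldest tag and giving it the deduplicated union of all duplicates' keys, can be exercised in isolation with plain Ruby data (the ids here are hypothetical):

```ruby
# Sketch: given duplicate tags (sorted by id) with their translation key ids,
# keep the first tag and assign it the union of all keys, as unify_tag does.
def unify(tags_with_keys)
  keeper = tags_with_keys.first
  merged_keys = tags_with_keys.flat_map { |t| t[:key_ids] }.uniq
  { id: keeper[:id], key_ids: merged_keys }
end

duplicates = [
  { id: 10, key_ids: [1, 2] },
  { id: 11, key_ids: [2, 3] },  # duplicate created by a concurrent worker
]

puts unify(duplicates)[:key_ids].inspect # => [1, 2, 3]
```

Key 2, tagged by both duplicates, survives exactly once on the kept tag, so no key loses its tagging during the merge.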
After this migration has run, the temporary field and the hook that fills it can be removed.
Lessons Learned
- Why Rails uniqueness validation needs a unique database index to back it up
- How to conduct a clean-up and add a unique index with zero downtime on MySQL