Rails Validation: Pitfalls in Validating Uniqueness with Active Record

Pitfalls In The Validation For Uniqueness Using Rails ActiveRecord

Active Record Validation in Rails provides an easy way to validate records for uniqueness. With the help of a real-world example, we'll try to explain why relying solely on Rails validation could become a problem and demonstrate a way to clean up your database without downtime.

The tagging of translation keys was added as a feature in the very early stages of Phrase’s product development. The feature was continuously improved over time and its performance – very reliable. Until a customer contacted us one day due to a validation error.

The Problem

The tagging feature had been used rather extensively by this customer to automatically tag all uploaded keys with a GitHub pull request, leading to a validation error with an existing tag. So we investigated the issue and discovered that the error was caused by a tag name which was not unique to the project. So how could that happen? We had used Active Record uniqueness validation and only Rails without skipping the validation.

Code State

The last state was that the Tag should have a name field and belong to a project as well as have an Active Record validation, so that a tag was unique to the project_id and name:

State of Codebase

Our code to create new tags looked something like this:

State within Database

Checking our database revealed that there were 40k records that were not unique.

Furthermore, we saw that all of the duplicated records were system generated tags by our upload feature. Uploads are processed asynchronously by concurrent workers. By using our client’s push command an upload for each locale is triggered. On any upload the user can allocate a tag, the keys should also be tagged. This combination dramatically increases the probability of two processes trying to create a tag at the same time.

What happened?

If uniqueness validation is enabled, Rails will look for existing records before performing Tag.create, Tag.save, Tag.update ... operations. If a record was found the validation fails and the transaction will be rolled back, if not the record will be saved. Example of how Rails validation fails:

And here’s an example of a successful Rails validation:

Through this example you can see that there is no lock on the table between the SELECT and the INSERT statement. In this time gap another concurrent process can also create a record with the same name and the same project_id without any failing validation. As you can see hooks are also triggered between the validation and INSERT statement. The increase of the number and the complexity of hooks, will also lead to an increased probability of the creation of non unique records.

Solution and Learnings

The solution in preventing the creation non unique records would therefore be setting up a unique database index. There was already an index on project_id and name but the index was not unique. To make this index unique multiple without downtime, these steps are needed:

  1. Make sure no new duplicated records can be created
  2. Cleanup Database
  3. Remove old index
  4. Create a new unique index

Preventing the creation of non unique Records between Cleanup and index migration

Adding a unique index to a table which contains non unique entries will raise an exception. Therefore, we will need to cleanup the database before adding the index. If new duplicated entries are introduced between the cleanup and the adding of the index, the migration will fail. So we first have to make sure that no new duplicated records can be created. You can solve this by using temporary tables, but we chose another approach. As MySQL ignores NULL values for unique indexes, we add a new column that we will set to project_id and the name concatenated to the new records. The default value of this column should be NULL. On this field we are able to add a unique index.

When we are not creating any tag record skipping hooks, we can add a before_save hook that sets the tmp_field to the concatenation of project_id and name. This index will prevent the creation of non unique tag entries. Now the application will raise an exception every time while trying to create a duplicated entry, in order to avoid this we should change our code for creating tags to handle this issue. Rails will raise a RecordNotUnique exception in this case so we can catch this and select and return the existing tag. Here you should make sure to reload, so as to prevent strange query caching issues.

Cleanup and Index change

After the changes are deployed, we can cleanup the database. Here you can go for migration or a script. We decided to do this by a migration. During the cleanup it is important to set the correct associations, because we do not want to lose the association between tags and the translation keys. After the cleanup is complete the index can be changed to be unique.

After this migration has run. The temporary field and the hook to fill it can be removed.

Lessons Learned

  • When to use a unique index with Rails validation for uniqueness
  • How to conduct a clean-up and add a unique index with zero downtime on MySQL
Rate this post