Software localization

ActiveRecord: How to Speed Up Your SQL Queries

Find out how to improve and debug generated SQL queries in ActiveRecord and learn how to work with Rails and ActiveRecord to write a more performant code.
Software localization blog category featured image | Phrase

Today, I want to share some knowledge about improving the performance of ActiveRecord queries by selecting only the needed data and reducing the number of instantiated objects. There are many cases in your Rails application where you only need a small subset of data and not the whole record. For example, if you want to render a user-select box where you only need the "id" and "name" fields you can use something similar to the following code:

<%=  f.select(:user_id, current_account.users.map{ |user| [user.name, user.id] }%>

But if we take a closer look here it is obvious that we instantiate an object for every user found on this account and also fetch all fields of the user table from the database. For a small amount of found records, this is not a big deal but imagine an account having thousands of users.

ActiveRecord provides methods that solve this issue without writing crazy custom SQL queries.

Load only the Data you need!

If you are using `Model.find`, `Model.where`, etc. Active Record will generate SQL-like `SELECT models.* WHERE (condition)`. Most of the time this is tolerable but if you select a lot of records and don't need all fields this will load a lot of unneeded data into the memory.

In the end, this results in a longer DB time and, moreover, wastes a lot of memory. You can use the pluck and select method to increase overall performance.

pluck

The pluck method allows you to select only a subset of requested fields and stores them in an array.

For example:

User.where(active: true).pluck(:name)

will return all names of active users in an array like:

["Barny Gumble", "Bart Simpson", ...]

Try using pluck to select multiple fields. The returned Data will be structured in a two-dimensional array.

Let's assume you want also select ids of your active user:

User.where(active: true).pluck(:id, :name)

[

  [1, "Barny Gumble`"],

  [2, "Bart Simpson"],

]

In my opinion a typical use case for pluck is the selection of data for select boxes, using ids as sub-selects in a where clause. Typically you should think aboutpluck in every case you see something like `collection.map { |record| record.field }`.

By usingpluck your allocated memory and the database time is reduced. Often, you can remove time constructs like `.map{ ... }` and make your code more compact.

select

The select Method is another way to limit the fields selected from your database. The main difference to pluck is that an ActiveRecord model object is created, instead of returning an array of the selected fields. This allows you to call methods on this model.

> Translation.select(:id, :content)

Translation Load (0.6ms) SELECT `translations`.`id`, `translations`.`content` FROM `translations`

=> [#<Translation:0x000000055cf0f8 id: 1, content: "My Translation">,

...

]

Be careful: if you try to access a field that was not selected it will raise an ActiveModel::MissingAttributeError.

> Translation.select(:id).first.unverified?

Translation Load (0.8ms) SELECT `translations`.`id` FROM `translations` ORDER BY `translations`.`id` ASC LIMIT 1

ActiveModel::MissingAttributeError: missing attribute: changed_in_main_locale

I recommend using select for models with large text fields or lots of fields you don't need to load. A typical use case at Phrase is our translations table. When we only want to verify a batch of translations we don't need to load the whole content.

Benchmark

Finally, a simple benchmark approves this. This benchmark compares the time needed to load 10 000 ids of a record into an array using (pluck,select and loading full records). As expected the pluck solution is the fastest followed by select (10 times slower thanpluck). The solution without selecting only specific fields takes the last place being 20 times slower thanpluck.

require "rails_helper"

require "benchmark"

describe "performance" do

  before do

    FactoryGirl.create_list(:user, 10000, deleted_at: nil)

    FactoryGirl.create_list(:user, 2000, deleted_at: Date.today - 1.day)

  end

  specify do

    Benchmark.bmbm do |bm|

      User.connection.clear_query_cache

      bm.report("select") do

        user_ids = User.active.select(:id).map(&:id)

        user_ids

      end

      User.connection.clear_query_cache

      bm.report("map") do

        user_ids = User.active.map(&:id)

        user_ids

      end

      User.connection.clear_query_cache

      bm.report("pluck") do

        user_ids = User.active.pluck(:id)

        user_ids

      end

    end

  end

end

$ docker-dev/spec.sh spec/perf_spec.rb

Rehearsal ------------------------------------------

select   0.120000   0.000000   0.120000 (  0.120574)

map      0.210000   0.030000   0.240000 (  0.269985)

pluck    0.010000   0.000000   0.010000 (  0.011394)

--------------------------------- total: 0.370000sec

             user     system      total        real

select   0.100000   0.000000   0.100000 (  0.104537)

map      0.220000   0.020000   0.240000 (  0.268657)

pluck    0.010000   0.000000   0.010000 (  0.015573)

.

Finished in 1 minute 1.21 seconds (files took 0.64633 seconds to load)

1 example, 0 failures

Further Reading

If you are interested in the ActiveRecord topic, also make sure to check out our overview of pitfalls in the validation of uniqueness using Rails ActiveRecord.