Startup hacks and engineering miracles from your exhausted friends at Faraday

How to use Postgres cursors and multi-insert in Ruby with Sequel

Seamus Abshere

This post is part of our PostgreSQL series.

SQL cursors are a great way to stream data out of a database without loading it all into memory.
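Under the hood, a cursor is just the standard Postgres DECLARE/FETCH protocol inside a transaction; roughly, something like this (a sketch — `widget_cursor` is a name invented here, and the fetch size mirrors the `rows_per_fetch` option used below):

```sql
BEGIN;
DECLARE widget_cursor CURSOR FOR
  SELECT id, name, power FROM widgets;
-- repeat this FETCH until it returns no rows
FETCH FORWARD 1000 FROM widget_cursor;
CLOSE widget_cursor;
COMMIT;
```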

Reading using a cursor

Sequel has use_cursor:

BATCH_SIZE = 1000
DB[:widgets].
  select(:id, :name, :power).
  use_cursor(rows_per_fetch: BATCH_SIZE).each do |row|
    row[:id] # whatever
  end

Writing using cursors and multi-insert

Here's a real-world example: we had to copy rows from widgets to widgets_copy.

We thought we were being really clever, keeping all the data inside Postgres:

# NOTE: this is not the recommended method, but it works too
batches = (desired_count.to_f / BATCH_SIZE).ceil  
batches.times do |i|  
  offset = i * BATCH_SIZE
  DB.run <<-SQL
    INSERT INTO widgets_copy (
      id,
      name,
      power,
      created_at,
      updated_at
    ) (
      SELECT
        uuid_generate_v4(),
        name,
        power,
        now(),
        now()
      FROM widgets
      ORDER BY id
      LIMIT #{BATCH_SIZE}
      OFFSET #{offset}
    )
  SQL
end  

Even with 244 GB of RAM, this sometimes stopped dead 75% of the way through. We rewrote it, pulling the data into Ruby, and the new version is both faster and runs to completion. It uses Sequel's multi_insert.

batch = []
now = Sequel.function :now
uuid_generate_v4 = Sequel.function :uuid_generate_v4
DB[:widgets].
  select(:name, :power).
  order(:id).
  use_cursor(rows_per_fetch: BATCH_SIZE).each do |row|
    batch << row.merge!(
      id: uuid_generate_v4,
      updated_at: now,
      created_at: now
    )
    if batch.length == BATCH_SIZE
      DB[:widgets_copy].multi_insert batch
      batch.clear
    end
  end
# don't forget the final partial batch
DB[:widgets_copy].multi_insert batch if batch.any?
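If the flush-when-full bookkeeping feels fiddly, `each_slice` gives you the same batching with less state to track. Here's the core idea in plain Ruby (no database needed); the point is that `each_slice` yields full batches plus one final partial batch, so no rows are left behind:

```ruby
# Plain-Ruby sketch of the batching logic in the loop above.
batch_size = 1000
rows = (1..2500).map { |i| { name: "widget #{i}" } }
batches = rows.each_slice(batch_size).to_a
puts batches.map(&:length).inspect # => [1000, 1000, 500]
```

Since Sequel datasets are Enumerable, the copy loop above should collapse to `use_cursor(rows_per_fetch: BATCH_SIZE).each_slice(BATCH_SIZE) { |batch| DB[:widgets_copy].multi_insert batch }`, at the cost of building each row hash inside the block instead.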

Thanks to @dkastner and @devTristan who promoted these ideas internally!