AI, startup hacks, and engineering miracles from your friends at Faraday

How to read CREATE TABLE SQL with pg_query

Seamus Abshere

pg_query is a really cool Ruby library that uses Postgres's query parser to give you programmatic access to SQL queries.
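For a quick feel for what it gives you, here's a minimal sketch (the tables helper and the exact tree shape vary with your pg_query version, so treat this as illustrative):

require 'pg_query'

parsed = PgQuery.parse("SELECT name, power FROM widgets WHERE power > 9000")
parsed.tables # convenience helper listing referenced tables, e.g. ["widgets"]
parsed.tree   # the raw parse tree, straight from Postgres's own parser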

One thing you can do with it is read a CREATE TABLE statement to map column names to column types:

sql = <<-SQL
  CREATE TABLE example (
    my_text text,
    my_int int,
    my_float float
  )
SQL

require 'pg_query'

# dig down to the column definitions (tableElts) of the CREATE TABLE statement
column_defs = PgQuery.parse(sql).tree[0].dig('RawStmt', 'stmt', 'CreateStmt', 'tableElts').map do |elt|
  elt.fetch('ColumnDef')
end

# map each column name to its type, skipping the "pg_catalog" schema qualifier
types = column_defs.inject({}) do |memo, col|
  type_node = col.dig('typeName', 'TypeName', 'names').detect { |n| n.dig('String', 'str') != 'pg_catalog' }
  memo[col.fetch('colname')] = type_node.dig('String', 'str')
  memo
end

The output will be:

{
  "my_text"  => "text",
  "my_int"   => "int4",
  "my_float" => "float8"
}
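Notice that the aliases we typed (int, float) come back as Postgres's internal names (int4, float8). One way you might use the mapping, sketched here with a hypothetical cast helper that is not part of pg_query:

# hypothetical helper: cast a raw string to a Ruby value based on the
# Postgres type name extracted above
def cast(value, pg_type)
  case pg_type
  when 'int2', 'int4', 'int8'        then Integer(value)
  when 'float4', 'float8', 'numeric' then Float(value)
  else value # leave text (and anything unrecognized) as a string
  end
end

cast('42', types.fetch('my_int'))     # => 42
cast('3.14', types.fetch('my_float')) # => 3.14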

Thanks to Lukas Fittl for this gem!

How to use Postgres cursors and multi-insert in Ruby with Sequel

Seamus Abshere

This post is part of our PostgreSQL series.

SQL cursors are a great way to stream data out of a database without loading it all into memory.

Reading using a cursor

Sequel has use_cursor:

BATCH_SIZE = 1000
DB[:widgets].
  select(:id, :name, :power).
  use_cursor(rows_per_fetch: BATCH_SIZE).each do |row|
    row[:id] # whatever
  end
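Under the hood this is roughly the DECLARE/FETCH dance you would otherwise write yourself. A sketch of the manual equivalent (the cursor name and batch handling here are ours, not Sequel's):

DB.transaction do # a cursor only lives inside a transaction
  DB.run "DECLARE widgets_cursor NO SCROLL CURSOR FOR SELECT id, name, power FROM widgets"
  loop do
    rows = DB["FETCH FORWARD #{BATCH_SIZE} FROM widgets_cursor"].all
    break if rows.empty?
    rows.each { |row| row[:id] } # whatever
  end
  DB.run "CLOSE widgets_cursor"
end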

Writing using cursors and multi-insert

Here's a real-world example - we had to copy things from widgets to widgets_copy.

We thought we were being really clever, keeping all the data inside Postgres:

# NOTE: this is not the recommended method - see below for why
batches = (desired_count.to_f / BATCH_SIZE).ceil
batches.times do |i|
  offset = i * BATCH_SIZE
  DB.run <<-SQL
    INSERT INTO widgets_copy (
      id,
      name,
      power,
      created_at,
      updated_at
    ) (
      SELECT
        uuid_generate_v4(),
        name,
        power,
        now(),
        now()
      FROM widgets
      ORDER BY id
      LIMIT #{BATCH_SIZE}
      OFFSET #{offset}
    )
  SQL
end

Even with 244 GB of RAM, this sometimes stopped dead 75% of the way through: every batch makes Postgres scan and discard all the rows before its OFFSET, so each batch is slower than the last. We rewrote it, pulling the data into Ruby, even, and the new version is both faster and doesn't stall partway through. We're using Sequel's multi_insert.

batch = []
# let Postgres generate these values itself via now() and uuid_generate_v4()
now = Sequel.function :now
uuid_generate_v4 = Sequel.function :uuid_generate_v4
DB[:widgets].
  select(:name, :power).
  order(:id).
  use_cursor(rows_per_fetch: BATCH_SIZE).each do |row|
    batch << row.merge!(
      id: uuid_generate_v4,
      updated_at: now,
      created_at: now
    )
    if batch.length == BATCH_SIZE
      DB[:widgets_copy].multi_insert batch
      batch.clear
    end
  end
# don't forget the final partial batch
DB[:widgets_copy].multi_insert batch if batch.any?
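Two optional touches (general Sequel usage, not part of the original recipe): wrap the loop in DB.transaction if you want the copy to be all-or-nothing, and sanity-check the result when it finishes:

# cheap sanity check after the copy: row counts should line up
expected = DB[:widgets].count
copied   = DB[:widgets_copy].count
raise "expected #{expected} rows, copied #{copied}" unless copied == expected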

Thanks to @dkastner and @devTristan who promoted these ideas internally!