AI, startup hacks, and engineering miracles from your friends at Faraday

Buy or build: things we built

Seamus Abshere on

Here are Faraday's contributions to open source that we use every day in production. No experiments here; this is the stuff that we looked for on the shelf, found the options wanting, and built ourselves.

A new standard for secrets: Secretfile

secret_garden (Ruby), vault-env (JS), and credentials-to-env (Rust) all implement a standard we call Secretfile(s):

# /app/Secretfile
DATABASE_URL secrets/database/$VAULT_ENV:url
REDIS_URL secrets/redis/$VAULT_ENV:url

Then you use it like this SecretGarden.fetch('DATABASE_URL').
Clients implementing this standard are meant to first check the environment for DATABASE_URL, then failing that look up the secret in Hashicorp Vault (interpolating $VAULT_ENV into production, staging, etc. first). It's very useful for development where your DATABASE_URL is just postgres://seamus@ - you can save this in a local .env file and only mess with Vault in production/staging.

Lightning fast CSV processing: catcsv and scrubcsv

catcsv is a very fast CSV concatenation tool that gracefully handles headers and compression. It also supports Google's Snappy compression. We store everything on S3 and GCS szip'ed using burntsushi's szip.

$ cat a.csv

$ cat b.csv

$ szip a.csv

$ ls

$ catcsv b.csv

Of course, before you cat files, sometimes you need to clean them up with scrubcsv:

$ scrubcsv giant.csv > scrubbed.csv
3000001 rows (1 bad) in 51.58 seconds, 72.23 MiB/sec

Lightning-fast fixed-width to CSV: fixed2csv

fixed2csv converts fixed-width files to CSV very fast. You start with this:

first     last      middle
John      Smith     Q
Sally     Jones

You should be able to run:

$ fixed2csv -v 10 10 6 < input.txt

World's fastest geocoder: node_smartystreets

node_smartystreets is the world's fastest geocoder client. We shell out to its binary rather than using it as a library. It will do 10k records/second against the smartystreets geocoding API. If you don't have an Unlimited plan, use it with extreme caution.

Better caching: lock_and_cache

lock_and_cache (Ruby) and lock_and_cache_js (JS) go beyond normal caching libraries: they lock the calculation while it's being performed. Most caching libraries don't do locking, meaning that >1 process can be calculating a cached value at the same time. Since you presumably cache things because they cost CPU, database reads, or money, doesn't it make sense to lock while caching?

def expensive_thing
  @expensive_thing ||= LockAndCache.lock_and_cache("expensive_thing/#{id}", expires: 30) do
    # do expensive calculation

It uses Redis for distributed caching and locking, so this is not only cross-process but also cross-machine.

Better state machine: status_workflow

status_workflow handles state transitions with distributed locking using Redis. Most state machine libraries either don't do locking or use Postgres advisory locks.

class Document < ActiveRecord::Base
  include StatusWorkflow
    archive_requested: [:archiving],
    archiving: [:archived],

Then you can do


It's safe to use in a horizontally sharded environment because it uses distributed locking - the second process that tries to do this will get a InvalidTransition error even if it's the same microsecond.

Rust build tools: rust-musl-builder and heroku-buildpack-rust

rust-musl-builder is how we build Rust apps on top of Alpine. It also drives heroku-buildpack-rust, the preeminent way of running Rust on Heroku.

Minimal postgres for node: simple-postgres

simple-postgres (JS) is just the essentials to talk to Postgres from Node. We particularly love its use of template literals for apparently magical escaping:

let account = await db.row`
  FROM accounts
  WHERE id = ${id}

Yes, that's safe!

Minimal HTTP server: srvr

srvr (JS) is a small HTTP server that speaks for itself:

  • everything express does
  • better
  • less code
  • no dependencies
  • websockets

Proper Docker API support for Rust: boondock

boondock is a rewrite of rust-docker to be more correct.

Coordinate docker-compose: cage

cage boots multiple docker-compose.ymls, each as a pod. It's sortof like a local k8s. You configure it with a bunch of docker-compose files:

├── admin.yml (a pod containing adminweb and horse)
├── common.env (common env vars)
├── donkey.yml (a pod containing donkey)
├── placeholders.yml (development-only pod with redis, db, etc.)

Local development looks like this:

$ cage pull
==== Fetching secrets from vault into config/secrets.yml
==== Logging into ECR
Fetching temporary AWS 'administrator' credentials from vault
Pulling citus        ... done
Pulling citusworker1 ... done
Pulling citusworker2 ... done
Pulling queue        ... done
Pulling redis        ... done
Pulling s3           ... done
Pulling smtp         ... done
Pulling vault        ... done
Pulling horse        ... done
Pulling adminweb     ... done
$ cage up
Starting fdy_citusworker2_1 ... done
Starting fdy_smtp_1         ... done
Starting fdy_citus_1        ... done
Starting fdy_vault_1        ... done
Starting fdy_citusworker1_1 ... done
Starting fdy_queue_1        ... done
Starting fdy_s3_1           ... done
Starting fdy_redis_1        ... done
Starting fdy_horse_1 ... done
Starting fdy_adminweb_1 ... done
$ cage stop
Stopping fdy_citusworker2_1 ... done
Stopping fdy_vault_1        ... done
Stopping fdy_citus_1        ... done
Stopping fdy_s3_1           ... done

Fixed up rust crates: rust-amqp

rust-amqp@tokio (Rust) is our rewrite of the internals of the rust-amqp crate in proper tokio. It is much more reliable and needs to be merged upstream.

(beta release) 3rd gen batch processing on k8s: falconeri

falconeri is a distributed batch job runner for kubernetes (k8s). It is compatible with Pachyderm pipeline definitions, but is simpler and handles autoscaling, etc. properly.

(alpha release) Seamless transfer between Postgres/Citus and BigQuery: dbcrossbar

dbcrossbar handles all the details of transferring tables and data to and from Postgres and Google BigQuery. Additionally, it knows about citus, the leading Postgres horizontal sharding solution - so it can do highly efficient transfers between Citus clusters and BigQuery.


That's it. I only mentioned tools that we use every day.

How to read CREATE TABLE SQL with pg_query

Seamus Abshere on

pg_query is a really cool Ruby library that uses Postgres's query parser to give you programmatic access to SQL queries.

One thing you can do with it is read a CREATE TABLE statement to map column names to column types:

sql = <<-SQL
  CREATE TABLE example (
    my_text text,
    my_int int,
    my_float float

require 'pg_query'
types = PgQuery.parse(sql).tree[0].dig('RawStmt', 'stmt', 'CreateStmt', 'tableElts').map do |c|
end.inject({}) do |memo, c|
  memo[c.fetch('colname')] = c.dig('typeName', 'TypeName', 'names').detect { |td| td.dig('String', 'str') != 'pg_catalog' }.dig('String', 'str')

The output will be:

  "my_text"  => "text",
  "my_int"   => "int4",
  "my_float" => "float8"

Thanks to Lukas Fittl for this gem!

How to generate UUID ranges for SQL queries

Seamus Abshere on

So you've got a table with a UUID primary key. How do you search for only 1/4, or 1/8, or 1/128 of it?

BETWEEN '00000000-0000-0000-0000-000000000000' AND '3fffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '40000000-0000-0000-0000-000000000000' AND '7fffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '80000000-0000-0000-0000-000000000000' AND 'bfffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'c0000000-0000-0000-0000-000000000000' AND 'ffffffff-ffff-ffff-ffff-ffffffffffff'

And how do you generate those ranges? Here it is in Ruby, with n=4:

UUID_MAX = 2**128
def to_uuid(int)
  memo = int.to_s(16)
  memo = memo.rjust(32, '0')
  [ 8, 13, 18, 23 ].each do |p|
    memo = memo.insert(p, '-')
def sql_uuid_ranges(n)
  chunk = UUID_MAX / n
  (n+1) do |i|
    [chunk*i, chunk*i-1]
  end.each_cons(2).map do |bottom, top|
    a1, _ = bottom
    _, b2 = top
    "BETWEEN '#{to_uuid(a1)}' AND '#{to_uuid(b2)}'"
puts sql_uuid_ranges(4)

Note that we have to do some fancy stuff with chunk*i-1 because BETWEEN is inclusive on both sides. Not that that will cause a collision any time before the heat death of the universe, but still.

Here's the same thing, with n=64:

BETWEEN '00000000-0000-0000-0000-000000000000' AND '03ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '04000000-0000-0000-0000-000000000000' AND '07ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '08000000-0000-0000-0000-000000000000' AND '0bffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '0c000000-0000-0000-0000-000000000000' AND '0fffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '10000000-0000-0000-0000-000000000000' AND '13ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '14000000-0000-0000-0000-000000000000' AND '17ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '18000000-0000-0000-0000-000000000000' AND '1bffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '1c000000-0000-0000-0000-000000000000' AND '1fffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '20000000-0000-0000-0000-000000000000' AND '23ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '24000000-0000-0000-0000-000000000000' AND '27ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '28000000-0000-0000-0000-000000000000' AND '2bffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '2c000000-0000-0000-0000-000000000000' AND '2fffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '30000000-0000-0000-0000-000000000000' AND '33ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '34000000-0000-0000-0000-000000000000' AND '37ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '38000000-0000-0000-0000-000000000000' AND '3bffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '3c000000-0000-0000-0000-000000000000' AND '3fffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '40000000-0000-0000-0000-000000000000' AND '43ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '44000000-0000-0000-0000-000000000000' AND '47ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '48000000-0000-0000-0000-000000000000' AND '4bffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '4c000000-0000-0000-0000-000000000000' AND '4fffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '50000000-0000-0000-0000-000000000000' AND '53ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '54000000-0000-0000-0000-000000000000' AND '57ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '58000000-0000-0000-0000-000000000000' AND '5bffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '5c000000-0000-0000-0000-000000000000' AND '5fffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '60000000-0000-0000-0000-000000000000' AND '63ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '64000000-0000-0000-0000-000000000000' AND '67ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '68000000-0000-0000-0000-000000000000' AND '6bffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '6c000000-0000-0000-0000-000000000000' AND '6fffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '70000000-0000-0000-0000-000000000000' AND '73ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '74000000-0000-0000-0000-000000000000' AND '77ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '78000000-0000-0000-0000-000000000000' AND '7bffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '7c000000-0000-0000-0000-000000000000' AND '7fffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '80000000-0000-0000-0000-000000000000' AND '83ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '84000000-0000-0000-0000-000000000000' AND '87ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '88000000-0000-0000-0000-000000000000' AND '8bffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '8c000000-0000-0000-0000-000000000000' AND '8fffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '90000000-0000-0000-0000-000000000000' AND '93ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '94000000-0000-0000-0000-000000000000' AND '97ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '98000000-0000-0000-0000-000000000000' AND '9bffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN '9c000000-0000-0000-0000-000000000000' AND '9fffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'a0000000-0000-0000-0000-000000000000' AND 'a3ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'a4000000-0000-0000-0000-000000000000' AND 'a7ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'a8000000-0000-0000-0000-000000000000' AND 'abffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'ac000000-0000-0000-0000-000000000000' AND 'afffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'b0000000-0000-0000-0000-000000000000' AND 'b3ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'b4000000-0000-0000-0000-000000000000' AND 'b7ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'b8000000-0000-0000-0000-000000000000' AND 'bbffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'bc000000-0000-0000-0000-000000000000' AND 'bfffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'c0000000-0000-0000-0000-000000000000' AND 'c3ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'c4000000-0000-0000-0000-000000000000' AND 'c7ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'c8000000-0000-0000-0000-000000000000' AND 'cbffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'cc000000-0000-0000-0000-000000000000' AND 'cfffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'd0000000-0000-0000-0000-000000000000' AND 'd3ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'd4000000-0000-0000-0000-000000000000' AND 'd7ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'd8000000-0000-0000-0000-000000000000' AND 'dbffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'dc000000-0000-0000-0000-000000000000' AND 'dfffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'e0000000-0000-0000-0000-000000000000' AND 'e3ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'e4000000-0000-0000-0000-000000000000' AND 'e7ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'e8000000-0000-0000-0000-000000000000' AND 'ebffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'ec000000-0000-0000-0000-000000000000' AND 'efffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'f0000000-0000-0000-0000-000000000000' AND 'f3ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'f4000000-0000-0000-0000-000000000000' AND 'f7ffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'f8000000-0000-0000-0000-000000000000' AND 'fbffffff-ffff-ffff-ffff-ffffffffffff'
BETWEEN 'fc000000-0000-0000-0000-000000000000' AND 'ffffffff-ffff-ffff-ffff-ffffffffffff'

You get the idea.

ActiveRecord ↔︎ Sequel raw sql cheat sheet

Seamus Abshere on

UPDATED with fixes from @jeremyevans

Going back and forth between ActiveRecord and Sequel can be very confusing... especially because the Sequel documentation assumes that you use "datasets" instead of raw sql.

What ActiveRecord Sequel
One value ActiveRecord::Base.connection.select_value(sql) DB[sql].get
One row ActiveRecord::Base.connection.select_row(sql) DB[sql].first
One column across many rows ActiveRecord::Base.connection.select_values(sql) DB[sql].select_map
Many rows as hashes ActiveRecord::Base.connection.select_all(sql) DB[sql].all
Quote value ActiveRecord::Base.connection.quote(value) DB.literal(value)
Quote identifier (e.g. table name, column name) ActiveRecord::Base.connection.quote_ident(ident) DB.quote_identifier(ident)

Please email us if you think of any other ones!

Antipattern: Using Ruby's Hash#[]

Seamus Abshere on

This is part of our antipatterns series. Ouch! Updated for 2017!

Ask yourself why you're using Hash#[]. It is is a great way to introduce silent bugs into your app.

Use Hash#fetch if you expect the value to exist

That way you get sensible error messages.

#> params = {}

#> params.fetch('really').fetch('important')
KeyError: key not found: "really"

Use Hash#dig if you don't care

Because you don't get idiotic, non-semantic NoMethodError: undefined method '[]' for nil:NilClass errors.

#> params.dig('really', 'important')
=> nil

Avoid Hash#[] because... just... why?

#> params['really']['important']
NoMethodError: undefined method `[]' for nil:NilClass

Special case: ENV

The Twelve-Factor App has us all using environment variables. But most of us default to ENV#[] to look stuff up... even if it's critical. Bad idea! Use fetch!

#> ENV['REALLY_IMPORTANT'] == 'thing'
=> false # well i hope you didn't need that

#> ENV.fetch('REALLY_IMPORTANT') == 'thing'
KeyError: key not found: "REALLY_IMPORTANT"