Startup hacks and engineering miracles from your exhausted friends at Faraday

Customer Success, cookies, and diminishing returns: don't announce features all at once

Thomas Bryenton on

Chocolate chip cookies

Right this moment, how much would you pay for a cookie at a coffeeshop? Let's say the amount is $3.


(I could really go for one, so my personal estimate might be higher. I mean, check out how good that looks.)

How much would you be open to paying the coffeeshop for a subsequent cookie? A third cookie? A fifth?

Probably it would be steadily less than $3. Why? Later cookies have lower value.

Well . . . for now at least. If you come back to the coffeeshop tomorrow, you could be ready for more!

Customer Success

A related principle applies in Customer Success: people love hearing when you ship a feature they wanted. But tell them about a third feature or a fifth? Their eyes will glaze right over.

(Speaking of glaze, if cookies aren't your thing, how about these glazed scones?)


Sure, sometimes an engineering team works in bursts. Progress can come all at once. Then, the temptation for Customer Success is to "catch up" by reporting things to clients exactly when they happen.

This doesn't mean that as a Customer Success team you can't be strategic about who to announce features to, and when.


  • Notice each client's rhythm of hunger for new features
  • Keep a stash of "feature snacks" to give clients
  • Treat yourself to a literal or figurative cookie (or scone)

How to aggregate JSONB in PostgreSQL 9.5+

Seamus Abshere on

This is part of our series on PostgreSQL and things that are obvious once you see them. It's a 2017 update to our 2016 article How to merge JSON fields in Postgres Nice!

Update We used to call this jsonb_collect, but then we realized it was very similar to json_object_agg(name, value)... except that there is no version of that function that just takes an expression. So, copying jsonb_agg(expression), we give you...

How do you aggregate (aka merge aka combine aka collect) JSONB fields in Postgres 9.5+? You define a new aggregate jsonb_object_agg(expression), used here:

# select * from greetings;
(3 rows)

# select jsonb_object_agg(data) from greetings;
 { "es" : "Saludos", "en" : "Hello", "ru" : "Здравствуйте" }
(1 row)

jsonb_object_agg is not built into Postgres. You have to create it yourself:

CREATE AGGREGATE jsonb_object_agg(jsonb) (  
  SFUNC = 'jsonb_concat',
  STYPE = jsonb,
  INITCOND = '{}'

Note that jsonb_concat(jsonb, jsonb) is the function behind || that was introduced in Postgres 9.5.

It's just that simple!

Be not afraid of ZCTAs

Bill Morris on

This post is part of our practical cartography series.

Most American geographers will note that - as much as we'd like it to be otherwise - ZIP Codes are not polygons. Rather, they're constantly-changing lines used by the USPS to coordinate delivery in an efficient network. Many of us polygon-happy mappers use ZIP Code Tabulation Areas (ZCTAs) instead; these are provided by the US Census as a reasonable open data alternative to ZIPs. They're particularly nice for thematic mapping (though their shortcomings have also been well-documented):


But why use ZCTAs if they can never be reconciled with their ground-truth ZIP cousins?

Because the difference is small.

Faraday has address and location records for every household in the country, and it was straightforward to check for disagreement between the ZIP Code of each physical address and the ZCTA polygon that contains it.

Here are the results, broken down by state

The national error rate of ZCTAs is 1.4%. That might be too high for some use cases, but perfectly acceptable for others. There's some regional variation, too: you're usually safe to use ZCTAs in Hawaii and Maine, but might want to exercise caution in Oregon and Utah.

Happy mapping!

How to finally use headless Chrome to power your automated tests

Derek Kastner on

Google Chrome version 59 will ship with the headless
. This means you can test your web applications using chrome without needing xvfb. One problem: the latest chromedriver (version 2.29) doesn't support versions of Chrome higher than 58.

The solution is to build the latest chromedriver that supports the latest chrome/chromium. Google does not make nightly builds of chromedriver public, you have to download the chromium source and build
chromedriver yourself.

How all the pieces work together

Cucumber uses the capybara gem to send commands to selenium-webdriver. Selenium-webdriver in turn starts up your local copy of chromedriver, which then starts up chrome and controls the browser through a special debug port.

To get the latest chromium, I used a tool on GitHub that downloads the latest snapshot compiled for Linux.

To get the latest chromedriver, I followed the
build instructions.

Configuring the tests

When configuring capybara, you need to tell selenium-webdriver the path to your custom chromium binary and send the --headless flag, along with other flags
you'll likely need in a CI build node environment.

For running in docker:

Capybara.register_driver :headless_chromium do |app|  
  caps =                       
    "chromeOptions" => {                                                         
      'binary' => "/chromium-latest-linux/466395/chrome-linux/chrome",           
      'args' => %w{headless no-sandbox disable-gpu}                              
  driver =                                       
    browser: :chrome,                                                            
    desired_capabilities: caps                                                   

Capybara.default_driver = :headless_chromium  

Now, capybara will drive a headless chromium!

Two magic words for greater independence and communication

Thomas Bryenton on

What if there were two words you could add to any email to get your team to weigh in quickly?

There are: DEFAULT DO

How default-do works

  1. Write up a final version of what you'll be doing (your "default do")
  2. Tell teammates you're about to do this thing
  3. Do the thing

Just default do it

I use default-do every day to keep Faraday moving:

  • Emails
  • Code
  • Mockups
  • Decisions
  • Policies
  • Blog posts!

It doesn't matter what it is: if I'm confident it's the right thing to do, I'll tell my team I'm about to do it, pause, then just do it.

Don't wear it out. If you don't really want feedback, don't ask, and if you need it, your default isn't "do," it's "don't."

Use headless chromium with capybara and selenium webdriver - today!

Seamus Abshere on

UPDATE: we have a new version of this post out that resolves some of the gotchas below.

Here's a Rubyist's magic incantation to run headless chromium with selenium-webdriver and capybara: (it will be similar in other languages)

require 'selenium-webdriver'

Capybara.register_driver :headless_chromium do |app|  
  caps =
    "chromeOptions" => {
      'binary' => '/home/myuser/chrome-linux-440004/chrome',
      'args' => ['headless', 'disable-gpu']
  driver =
    browser: :chrome,
    desired_capabilities: caps


  1. You need a chromium (chrome) binary that reports version 57 (version 59 is too new). For example, snapshot 440004 - just download and unzip.
  2. You need a recent chromedriver binary with support for headless chrome. For example, snapshot 460342 - just download and unzip.
  3. If you get the error unrecognized chrome version, then see (1) above - you probably have a too-recent chromium.

Thanks to @dkastner!

Antipattern: Using Ruby's Hash#[]

Seamus Abshere on

This is part of our antipatterns series. Ouch! Updated for 2017!

Ask yourself why you're using Hash#[]. It is is a great way to introduce silent bugs into your app.

Use Hash#fetch if you expect the value to exist

That way you get sensible error messages.

#> params = {}

#> params.fetch('really').fetch('important')
KeyError: key not found: "really"  

Use Hash#dig if you don't care

Because you don't get idiotic, non-semantic NoMethodError: undefined method '[]' for nil:NilClass errors.

#> params.dig('really', 'important')
=> nil

Avoid Hash#[] because... just... why?

#> params['really']['important']
NoMethodError: undefined method `[]' for nil:NilClass  

Special case: ENV

The Twelve-Factor App has us all using environment variables. But most of us default to ENV#[] to look stuff up... even if it's critical. Bad idea! Use fetch!

#> ENV['REALLY_IMPORTANT'] == 'thing'
=> false # well i hope you didn't need that

#> ENV.fetch('REALLY_IMPORTANT') == 'thing'
KeyError: key not found: "REALLY_IMPORTANT"  

Plancha: how to flatten multi-sheet excel workbooks

Bill Morris on

This is part of our series on data science because it belongs in your toolchain.

If you work with data long enough - actually scratch that; if you work with data for more than a week - you'll run into the dreaded multi sheet (or tab) excel workbook. Sometimes the sheets are unrelated, but other times they should really all be stacked together in the same table, ideally in a more-interoperable format than .xlsx:


Enter plancha. Named for the trusty tortilla press, we built this simple CLI tool to flatten multi-sheet excel files, resolve header mismatches, and return a pipeline-friendly csv, like this:



This is a node.js tool, so use npm:

npm install plancha -g


Just feed it an input .xlsx file:

plancha -i myfile.xlsx

Happy data-pressing!

scrubcsv: now with null value removal

Seamus Abshere on

This is part of our series on data science because it belongs in your toolchain. Happy Null Removal!

The latest version of scrubcsv has built-in null value removal:

$ cat a.csv

$ scrubcsv -n 'null|n/a' a.csv

See how null and n/a went away?

Get the latest version with

$ cargo install scrubcsv -f

How to export a Dataiku DSS (or any scikit-learn) model to PMML

Andy Rossmeissl on

This post is part of our data science series

At Faraday we use Dataiku to do ad hoc exploratory data science work, and especially for investigating new predictive techniques before building them into our platform.

Dataiku is awesome and has an incredibly responsive team. One drawback for me, however, has been Dataiku's lack of support for PMML, a standard serialization format for predictive models and their associated apparatus.

Luckily with a little hacking you can export a Dataiku model to PMML. And this technique can work anywhere you have a scikit-learn-based model you're trying to export.


We're going to use Dataiku's built-in Python environment, which lives in your DSS data directory (generally /Users/username/Library/DataScienceStudio/dss_home on a Mac). We need to add a couple libraries first:

$ ./bin/pip install sklearn_pandas
$ ./bin/pip install git+

You'll also need a working JDK. If this doesn't work:

$ java -version
java version "1.8.0_121"  

Then install a JDK. (On Mac: brew cask install java.)

Locate your classifier

OK, now let's get our hands on the model you're trying to export. Maybe it's already in memory, but more likely it's pickled on disk. With Dataiku, you'll find your pickled classifier in a path that looks like this:


There it is, clf.pkl. It's helpful to copy this file into your working dir so we don't accidentally disturb it.

Export the model to PMML

Now let's start up an interactive Python console — again using Dataiku's built-in environment:

$ ./bin/python
Python 2.7.10 (default, Oct 23 2015, 19:19:21)  

First let's load up some libraries:

>>> from sklearn.externals import joblib
>>> from sklearn2pmml import PMMLPipeline
>>> from sklearn2pmml import sklearn2pmml

Now we'll unmarshal the model using joblib, a pickle-compatible serialization library:

>>> clf = joblib.load('/path/to/clf.pkl')

Here's the only tricky part: we have to wrap the trained estimator in a Pipeline-like object that sklearn2pmml understands. (This is likely to get less tricky soon.)

>>> pipeline = PMMLPipeline([
...   ("estimator", clf)
... ])

And finally perform the export:

>>> sklearn2pmml(pipeline, "clf.pmml")
INFO: Parsing PKL..  
INFO: Marshalled PMML in 714 ms.  

All done! The heavy lifting here is done by sklearn2pmml, which wraps the JPMML-SkLearn library. Thanks to Villu Ruusmann in particular for his help.