AI, startup hacks, and engineering miracles from your friends at Faraday

How to get U.S. Census data as CSV — censusapi2csv

Bill Morris on

This post is part of our data science series.

The U.S. Census and American Community Survey (ACS) are the crown jewels of open data (bother your Representative today to make sure they stay that way), but working with data from the Census API isn't always intuitive. Here's an example response to an API call for ACS per capita income data:

[["B19301_001E","state","county","tract","block group"],
["25611","50","007","000100","1"],
["36965","50","007","000100","2"],
["29063","50","007","000200","1"],
. . .

It's not a CSV, and it's only barely JSON (an array of arrays, with the column names in the first row). It's just . . . data. We tend to use CSVs as our basic building blocks, so we built a tool to nudge this response into clean CSV. Here's how to use it:
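For a sense of what that conversion involves, here is a minimal Python sketch that takes a response shaped like the one above (a JSON array whose first inner array is the header row) and writes it out as CSV. This is not censusapi2csv's own code; the function name `census_json_to_csv` is hypothetical.

```python
import csv
import io
import json


def census_json_to_csv(response_text: str) -> str:
    """Convert a Census API-style response (a JSON array of arrays whose
    first inner array holds the column names) into CSV text.

    Hypothetical helper for illustration; not censusapi2csv's own code.
    """
    rows = json.loads(response_text)
    if not rows:
        return ""
    out = io.StringIO()
    # Unix line endings; the csv module defaults to "\r\n".
    writer = csv.writer(out, lineterminator="\n")
    # Header row first, then data rows. Values stay strings, so FIPS codes
    # like "007" keep their leading zeros; JSON nulls become empty cells.
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    sample = """[["B19301_001E","state","county","tract","block group"],
["25611","50","007","000100","1"],
["36965","50","007","000100","2"]]"""
    print(census_json_to_csv(sample), end="")
```

Keeping every value as a string is deliberate: parsing "007" as a number would silently turn the county code into 7.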

Install

npm install censusapi2csv -g

Usage

Let's grab a few things from the ACS API: total population (B01001) and per capita income (B19301), for every block group in Chittenden County, Vermont:

censusapi2csv -l 'block group' -f B01001,B19301 -s 50 -c 007

. . . we can even pipe this into our favorite CSV-parsing tool, xsv:

censusapi2csv -l 'block group' -f B01001,B19301 -s 50 -c 007 | xsv table

. . . and we get a formatted look at the data:

B01001_001E  B19301_001E  state  county  tract   block group
3057         25611        50     007     000100  1
1200         36965        50     007     000100  2
1641         29063        50     007     000200  1
1882         28104        50     007     000200  2
699          61054        50     007     000200  3
. . .
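Once the data is plain CSV, it drops straight into whatever comes next. As one hypothetical example, this Python sketch reads censusapi2csv-style CSV and combines per capita income across block groups by weighting each one by its total population (per capita income is aggregate income divided by population, so multiplying back approximately recovers aggregate income). The helper name is ours, margins of error are ignored, and the sample is only the five rows shown above, not the whole county.

```python
import csv
import io

# The first five block-group rows from the table above, as CSV
# (the real output continues past these rows).
SAMPLE_CSV = """B01001_001E,B19301_001E,state,county,tract,block group
3057,25611,50,007,000100,1
1200,36965,50,007,000100,2
1641,29063,50,007,000200,1
1882,28104,50,007,000200,2
699,61054,50,007,000200,3
"""


def weighted_per_capita_income(csv_text: str) -> float:
    """Population-weighted mean of per capita income (B19301_001E),
    weighting each block group by total population (B01001_001E).

    Hypothetical helper for illustration; ignores margins of error.
    """
    total_income = 0
    total_pop = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        pop = int(row["B01001_001E"])
        pci = int(row["B19301_001E"])
        # Per capita income x population approximates aggregate income.
        total_income += pop * pci
        total_pop += pop
    if total_pop == 0:
        raise ValueError("no population in input")
    return total_income / total_pop


if __name__ == "__main__":
    print(round(weighted_per_capita_income(SAMPLE_CSV), 2))
```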

This is just one small step in working with census data, and there are many alternative approaches, but we thought it was worth sharing.

How to reverse geocode in bulk

Bill Morris on


This post is part of our practical cartography series.

We just rebuilt our Argo reverse-geocoding module as a proper command-line tool. Got a pile of coordinates in a table?

Run them through argo to attach address context to each coordinate pair:

npm install argo-geo -g
argo -i myfile.csv -a "blahblahmapzenauthtoken"

Using Mapzen search, that'll churn through your table at 6 queries per second, appending the results to each coordinate pair until it's done.
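The loop described above (read each coordinate pair, query a reverse geocoder, append the result, stay under a request rate) can be sketched in Python roughly like this. It is not argo's implementation: the input column names (`lat`, `lon`), the output column name, and the `reverse_geocode` callable, which stands in for an HTTP request to a service such as Mapzen search, are all assumptions for illustration.

```python
import csv
import io
import time
from typing import Callable

# Hypothetical geocoder signature: (lat, lon) -> some label, e.g. an address.
ReverseGeocoder = Callable[[float, float], str]


def reverse_geocode_csv(
    csv_text: str,
    reverse_geocode: ReverseGeocoder,
    rate_per_sec: float = 6.0,
    lat_col: str = "lat",      # assumed input column names
    lon_col: str = "lon",
    out_col: str = "address",  # hypothetical output column name
) -> str:
    """Append a reverse-geocoded column to every row of a CSV table,
    making at most `rate_per_sec` geocoder calls per second."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + [out_col]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()

    min_interval = 1.0 / rate_per_sec
    next_slot = time.monotonic()
    for row in reader:
        # Wait for the next request slot so the rate limit is never exceeded.
        now = time.monotonic()
        if now < next_slot:
            time.sleep(next_slot - now)
        next_slot = max(now, next_slot) + min_interval
        row[out_col] = reverse_geocode(float(row[lat_col]), float(row[lon_col]))
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Stub geocoder for demonstration; a real one would call a
    # reverse-geocoding API and return the nearest address.
    def stub(lat: float, lon: float) -> str:
        return f"somewhere near {lat:.3f} {lon:.3f}"

    print(reverse_geocode_csv("id,lat,lon\n1,0.0,0.0\n2,1.0,1.0\n", stub), end="")
```

Passing the geocoder in as a function keeps the throttling and CSV handling independent of any particular service.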

We built this to process millions of rooftop coordinates that a vendor provided to us without addresses, but you could just as easily use it for any position-only dataset, such as:

  • Bird sightings from the field
  • Cars auto-extracted from imagery
  • GPS tracks from that pub crawl where you forgot the names of the bars
  • Mobile-collected reports of voter intimidation

We named it "Argo" to follow the Greek-mythology naming pattern of Mapzen's geocoding engine, "Pelias". Google and Mapbox each offer reverse-geocoding services as well, but those are just that: services. Their terms of use (TOUs) restrict caching of the results, and man, did we want to cache these. The good folks at Mapzen built their search architecture on some truly amazing open datasets, and they honor the spirit of those sources by allowing storage and repurposing.
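Caching is what the licensing question comes down to. As a hedged illustration of what being allowed to store results makes possible, this minimal Python sketch wraps any reverse-geocoding function with a persistent SQLite cache keyed on the coordinate pair, so each location only hits the API once. The `cached_geocoder` name, the cache file name, the table layout, and rounding keys to six decimal places (roughly 0.1 m) are all assumptions, not a description of argo's internals.

```python
import sqlite3
from typing import Callable

# Hypothetical geocoder signature: (lat, lon) -> some label, e.g. an address.
ReverseGeocoder = Callable[[float, float], str]


def cached_geocoder(
    reverse_geocode: ReverseGeocoder,
    db_path: str = "geocode_cache.sqlite",  # hypothetical cache file
) -> ReverseGeocoder:
    """Wrap a reverse-geocoding function with a persistent SQLite cache,
    so each distinct coordinate pair is only looked up once."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (coord TEXT PRIMARY KEY, result TEXT NOT NULL)"
    )

    def lookup(lat: float, lon: float) -> str:
        # Key on coordinates rounded to 6 decimal places (an assumed
        # precision), so nearly identical floats share one cache entry.
        coord = f"{lat:.6f},{lon:.6f}"
        hit = conn.execute(
            "SELECT result FROM cache WHERE coord = ?", (coord,)
        ).fetchone()
        if hit is not None:
            return hit[0]
        result = reverse_geocode(lat, lon)
        conn.execute("INSERT INTO cache (coord, result) VALUES (?, ?)", (coord, result))
        conn.commit()
        return result

    return lookup
```

Because the wrapper has the same signature as the function it wraps, it can be dropped in wherever the raw geocoder was called, and reruns over the same table only pay for coordinates they haven't seen.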

Thanks, Mapzen!