Startup hacks and engineering miracles from your exhausted friends at Faraday

How to reverse geocode in bulk

Bill Morris on

This post is part of our practical cartography series.

We just rebuilt our Argo reverse-geocoding module as a proper command-line tool. Got a pile of coordinates in a table like this?

Pipe them through argo to get the context of an address assigned to each of them:

npm install argo-geo -g
argo -i myfile.csv -a "blahblahmapzenauthtoken"

Using Mapzen search, that'll churn through your table at 6 queries per second, appending results to each coordinate pair until it's done:

We built this to process millions of rooftop coordinates that a vendor provided to us without addresses, but you could just as easily use it for any position-only datasets:

  • Bird sightings from the field
  • Cars auto-extracted from imagery
  • GPS tracks from that pub crawl where you forgot the names of the bars
  • Mobile-collected reports of voter intimidation
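To put the 6-queries-per-second pace in perspective for datasets like these, here's a minimal sketch of a throttled loop plus a back-of-the-envelope runtime estimate. This is not Argo's actual code: `estimateHours`, `throttledMap`, and the `lookup` callback are hypothetical names we made up for illustration, and `lookup` stands in for whatever reverse-geocoding request you'd make.

```javascript
// Rough wall-clock estimate for a bulk run at a fixed request rate.
function estimateHours(rowCount, perSecond = 6) {
  return rowCount / perSecond / 3600;
}

// Run `lookup` on each row in order, spacing calls so that no more than
// `perSecond` of them start per second. `lookup` is a hypothetical stand-in
// for a reverse-geocoding request; it is not Argo's actual implementation.
async function throttledMap(rows, lookup, perSecond = 6) {
  const intervalMs = 1000 / perSecond;
  const results = [];
  for (const row of rows) {
    const started = Date.now();
    const found = await lookup(row);
    // Append the lookup result to the original coordinate pair.
    results.push({ ...row, ...found });
    // Sleep off whatever is left of this row's time slot.
    const wait = intervalMs - (Date.now() - started);
    if (wait > 0) {
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
  }
  return results;
}
```

By that arithmetic, a million rows at 6 queries per second works out to a bit over 46 hours, so plan on leaving a big job running for a while.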

We named it "Argo" to follow the Greek mythology pattern of Mapzen's geocoding engine "Pelias". Google and Mapbox each offer reverse-geocoding services as well, but those are just that: services. They include TOUs that restrict caching of the results, and man, did we want to cache these. The good folks at Mapzen built their search architecture on some truly amazing open datasets, and they match the spirit of the source by allowing storage and repurposing.

Thanks, Mapzen!

Getting bite-sized chunks of OpenStreetMap

Bill Morris on

At Faraday, we dig OSM.

OpenStreetMap (OSM) is the foundation of our basemap and a model of the power of open data. It guides customers on our platform to their ideal audiences . . .

baemap

. . . and it serves as building blocks for geospatial analysis, both the kind we already do and the kind we want to do more of.

The problem is that it's big. The entire OSM database is portable, but at 50GB it's not very friendly. Sometimes we just want the driveway network of one county, or the building footprints in a zip code. Whole companies have sprung up around this workflow, but we have a few tried-and-true-and-cheap tools that we rely on:

  • Mapzen-hosted metro extracts - If your desired zone is on the list of regularly-updated cities, just grab the shapefiles and go!
  • OSM vector tiles - Use these with toolsets like tile-reduce for distributed geoprocessing at tile scale.
  • Overpass API - This tuneable endpoint works great for specific queries in minutely-defined regions (e.g. find all the one-way streets in Park Slope), but it can be a bit opaque. Use the query-overpass node module to spit out GeoJSON with minimal fuss.
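Since Overpass queries can look opaque at first, here's a rough sketch of what the one-way-streets example might look like as an Overpass QL query built from a bounding box. The `oneWayStreetsQuery` helper is a hypothetical name of ours, and the Park Slope coordinates in the comment are approximate, chosen only for illustration. Overpass expects bounding boxes in (south, west, north, east) order.

```javascript
// Build an Overpass QL query for one-way streets inside a bounding box.
// Hypothetical helper for illustration; tweak the tag filters to taste.
function oneWayStreetsQuery({ south, west, north, east }) {
  // Overpass QL bounding boxes are ordered south, west, north, east.
  const bbox = [south, west, north, east].join(',');
  return [
    '[out:json][timeout:25];',
    // Any way tagged as a highway that is also tagged oneway=yes.
    `way["highway"]["oneway"="yes"](${bbox});`,
    // Emit the ways, then recurse down to their nodes for geometry.
    'out body;',
    '>;',
    'out skel qt;',
  ].join('\n');
}

// Approximate, illustrative box around Park Slope:
// const query = oneWayStreetsQuery({ south: 40.66, west: -73.99, north: 40.68, east: -73.97 });
//
// Assuming query-overpass's documented (query, callback) call shape:
// require('query-overpass')(query, (err, geojson) => { /* use geojson */ });
```

Building the query as a string keeps the fiddly part (bounding-box order and tag filters) in one place, so you can swap in a different neighborhood or tag without touching the request code.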

Happy mapping!