Startup hacks and engineering miracles from your exhausted friends at Faraday

How to crunch lots of geodata in parallel

Bill Morris on

This post is part of our data science and practical cartography series.

GNU parallel + ogr2ogr = happy data scientists

These power tools in combination make it very easy to process lots of geodata at once, in as many parallel operations as your local machine or server can support.

Reprojecting in bulk

Here's an example, assuming you have a folder full of shapefiles you want to reproject into Geographic coordinates. Make a directory for the output, then pipe every shapefile through ogr2ogr in parallel:

mkdir wgs84  
ls *.shp | parallel ogr2ogr -t_srs 'EPSG:4326' wgs84/{} {}  

Running a sequence of commands on many files

In order to build whole data workflows, you can wrap your sequence of commands in a bash function. Here's an example, where we:

  1. Download each state landmarks file from the census FTP
  2. Extract each file
  3. Create a new file for each consisting of only airport landmarks, projected to WGS84
# grab this handy list of all state FIPS codes
wget -c https://gist.githubusercontent.com/wboykinm/6c514e9caf1fc3158e350fa926ea02bd/raw/f742515fd06824dafd0a88c62b4de11fa1e39fa1/state_fips_codes.txt

# define the function
get_airports() {  
  # grab the data from the census server
  wget -c http://www2.census.gov/geo/tiger/TIGER2016/POINTLM/tl_2016_$1_pointlm.zip
  unzip tl_2016_$1_pointlm.zip
  # extract just airports (code K2451) and reproject to WGS84
  ogr2ogr -t_srs "EPSG:4326" -where "MTFCC = 'K2451'" tl_2016_$1_airports.shp tl_2016_$1_pointlm.shp
  echo "done with state $1"
}
export -f get_airports

# kick off the parallel processing!
cat state_fips_codes.txt | parallel get_airports {}

This crunches through 52 states and territories in 21.8 seconds on a small ec2 server, limited only by network speed.

airports

Install the tools

  • GNU parallel
    • OSX: brew install parallel
    • Ubuntu: apt-get install parallel
  • ogr2ogr
    • OSX: brew install gdal --HEAD
    • Ubuntu: sudo apt-get install gdal-bin

Bonus toolkit: From Derek Watkins, here are a few dozen examples of the awesome geoprocessing you can you with GDAL/OGR.

Happy mapping!