Startup hacks and engineering miracles from your exhausted friends at Faraday

How to permanently delete versioned objects from S3

Seamus Abshere on

This is part of our cloud security and things that are obvious once you see them series. Duhh... safe!

Amazon's explanation of deleting a versioned object and the SDK documentation do not give an example of permanently deleting a versioned object. Here's how to do it:

require 'aws-sdk'

s3 = Aws::S3::Resource.new(  
  region: 'us-east-1',
  access_key_id: ACCESS_KEY_ID,
  secret_access_key: SECRET_ACCESS_KEY
)
bucket = s3.bucket('my-versioned-bucket')

bucket.objects.each do |object_summary|
  o = bucket.object object_summary.key
  # this is the secret: specify the version while deleting
  # (o.version_id is the current version; older versions are left in place)
  o.delete version_id: o.version_id
end

If you don't specify the version, you get a delete marker, which you can proceed to delete infinite times and it will not go away :)
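
The loop above works from bucket.objects, which lists only current versions. If you also want older versions and leftover delete markers gone, here is a minimal sketch that walks bucket.object_versions instead. The helper name purge_all_versions is made up for illustration, and it assumes the bucket behaves like Aws::S3::Bucket (each listed version responds to object_key, id, and delete):

```ruby
# Hypothetical helper: permanently removes everything that
# bucket.object_versions lists, including old versions.
# Assumes `bucket` quacks like Aws::S3::Bucket.
def purge_all_versions(bucket)
  deleted = []
  bucket.object_versions.each do |version|
    # Each listed version deletes itself by its own version id,
    # so no new delete marker is created.
    version.delete
    deleted << [version.object_key, version.id]
  end
  deleted
end

# Usage against a real bucket (assumes the aws-sdk gem and credentials):
#   require 'aws-sdk'
#   bucket = Aws::S3::Resource.new(region: 'us-east-1').bucket('my-versioned-bucket')
#   purge_all_versions(bucket)
```

This is irreversible, so try it on a throwaway bucket first.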

How to enable S3 server-side encryption for existing objects

Seamus Abshere on

This is part of our cloud security series.

Do you have unencrypted S3 objects lying around? Don't! Here's the safe way to retroactively enable server-side encryption:

Step 1: Make a backup bucket

The AWS management console is easiest. Call it [my-bucket]-backup.

Step 2: Copy one way

require 'aws-sdk'

s3 = Aws::S3::Resource.new(region: 'us-east-1', access_key_id: ACCESS_KEY_ID, secret_access_key: SECRET_ACCESS_KEY)  
b1 = s3.bucket('my-bucket')  
b2 = s3.bucket('my-bucket-backup')

# or no prefix if you want everything
b1.objects(prefix: 'xyz').each do |object_summary|  
  o1 = b1.object object_summary.key
  o2 = b2.object object_summary.key
  o1.copy_to o2, server_side_encryption: 'AES256'
end  

Step 3: Sanity check

Now look at [my-bucket]-backup - it's probably 100% perfect, but just reassure yourself.
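
If you would rather check in code than by eye, here is a minimal sketch. The helper name keys_missing_encryption is made up for illustration, and it assumes the bucket behaves like Aws::S3::Bucket (objects yields summaries with a key, and object(key) exposes server_side_encryption):

```ruby
# Hypothetical helper: returns the keys whose server-side encryption
# setting is not the expected one. Assumes `bucket` quacks like
# Aws::S3::Bucket.
def keys_missing_encryption(bucket, expected: 'AES256')
  bucket.objects.each_with_object([]) do |object_summary, missing|
    object = bucket.object(object_summary.key)
    missing << object.key unless object.server_side_encryption == expected
  end
end

# Usage with the b2 bucket from Step 2:
#   keys_missing_encryption(b2)
```

An empty array means every object in the backup bucket reports AES256.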

Step 4: Copy back over

There are 2 changes from Step 2 (we loop over b2 with no prefix, and the copy runs from o2 to o1), so you might want to copy-paste:

b2.objects.each do |object_summary|  
  o1 = b1.object object_summary.key
  o2 = b2.object object_summary.key
  o2.copy_to o1, server_side_encryption: 'AES256'
end  

Step 5: (optional) Clean up

Delete [my-bucket]-backup.

Postgres strftime (or: how to group by month)

Seamus Abshere on

This post is part of our data science and PostgreSQL series.

"How do you do strftime in postgres?"

The answer: to_char(date, format).
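
If you think in strftime directives, a small translation table helps. The sketch below is illustrative only: the helper name strftime_to_pg is made up, and it covers just six common directives, passing everything else through untouched.

```ruby
# Maps a handful of strftime directives to PostgreSQL to_char patterns.
# Only the directives listed here are translated.
STRFTIME_TO_PG = {
  '%Y' => 'YYYY', # 4-digit year
  '%m' => 'MM',   # month, 01-12
  '%d' => 'DD',   # day of month, 01-31
  '%H' => 'HH24', # hour, 00-23
  '%M' => 'MI',   # minute, 00-59
  '%S' => 'SS'    # second, 00-59
}.freeze

def strftime_to_pg(format)
  format.gsub(/%[YmdHMS]/) { |directive| STRFTIME_TO_PG.fetch(directive) }
end

puts strftime_to_pg('%Y-%m')             # YYYY-MM
puts strftime_to_pg('%Y-%m-%d %H:%M:%S') # YYYY-MM-DD HH24:MI:SS
```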

If you want to group by month, this is what you're looking for:

psql=> select count(*), to_char(created_at, 'YYYY-MM') from employees group by to_char(created_at, 'YYYY-MM') order by to_char(created_at,'YYYY-MM') desc;  
 count | to_char
-------+---------
    27 | 2016-08
    32 | 2016-07
    58 | 2016-06
    17 | 2016-05
    57 | 2016-04
    44 | 2016-03
    28 | 2016-02
    45 | 2016-01
    10 | 2015-12
    10 | 2015-11
    24 | 2015-10
    15 | 2015-09
    32 | 2015-08
    38 | 2015-07
    31 | 2015-06
    18 | 2015-05
    19 | 2015-04
     5 | 2015-03
     8 | 2015-02
    10 | 2015-01
     7 | 2014-12
    22 | 2014-11
(22 rows)

That's it.

Cache headers served by Google, Facebook, AWS in 2016

Seamus Abshere on

Recently I realized that our single-page app was misbehaving: our JS and CSS had cache-busting digests in their URLs, but browsers were still caching the underlying index.html.

What do the big apps send?

 App                        | Cache-Control                                  | Expires                       | Pragma
----------------------------+------------------------------------------------+-------------------------------+----------
 AWS management console     | no-cache, no-store, must-revalidate            | -1                            | no-cache
 Facebook                   | private, no-cache, no-store, must-revalidate   | Sat, 01 Jan 2000 00:00:00 GMT | no-cache
 Google (the search engine) | private, max-age=0                             | -1                            |
 Gmail                      | no-cache, no-store, max-age=0, must-revalidate | Mon, 01 Jan 1990 00:00:00 GMT | no-cache
 Google Drive               | no-cache, no-store, max-age=0, must-revalidate | Mon, 01 Jan 1990 00:00:00 GMT | no-cache
 Intercom                   | max-age=0, private, must-revalidate            |                               |
 LinkedIn                   | no-cache, no-store                             | Thu, 01 Jan 1970 00:00:00 GMT | no-cache

Implementation in nginx

In the end, I just went with nginx's built-in epoch setting:

expires epoch;  

That produced

Cache-Control: no-cache  
Expires: Thu, 01 Jan 1970 00:00:01 GMT  

:D
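
If your app serves its own static files instead of leaving it to nginx, the same split can be sketched in Ruby. This is only an illustration: the helper name, the digest pattern (a dash followed by 8 or more hex characters before .js or .css), and the one-year max-age are all assumptions, not what our app does.

```ruby
# Assumption: digested assets look like "app-<8+ hex chars>.js" or ".css".
DIGESTED_ASSET = /-[0-9a-f]{8,}\.(?:js|css)\z/

# Hypothetical helper: picks cache headers for a static file by its path.
def cache_headers_for(path)
  if path.match?(DIGESTED_ASSET)
    # The URL changes whenever the content does, so cache for a year.
    { 'Cache-Control' => 'public, max-age=31536000' }
  else
    # Same headers that nginx's "expires epoch;" produces.
    { 'Cache-Control' => 'no-cache', 'Expires' => 'Thu, 01 Jan 1970 00:00:01 GMT' }
  end
end

# Usage:
#   cache_headers_for('/index.html')
#   cache_headers_for('/assets/app-3f2a9c1b7d.js')
```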

Mountains of Census geodata for all

Bill Morris on

U.S. Census data gives our modeling a good predictive boost, and it's a robust quality assurance tool for all the third-party data we've got flowing through our wires.

The Census offers its geographic data in easy-to-get, familiar formats via the TIGER portal, but distribution is split up by state for the largest datasets: blocks and block groups. There's a pretty simple reason for this: they're big. The census block shapefile for Indiana alone is 116MB compressed.

Ours is probably not a common use case, but we need all of the blocks and block groups in our database - merged, indexed and queryable. It took a significant amount of work to get them there, so in case anyone else needs them too, we're sharing national 2015 datasets in PostGIS dumpfile format, downloadable and ready to use here:


Census block groups

.pg_dump (426MB) | .sql (1.2GB) bg


Census blocks

.pg_dump (4.7GB) | .sql (12GB) b


Add these to your local PostgreSQL database like so:

pg_restore --no-owner --no-privileges --dbname <dbname> <filename>.pg_dump

# OR

psql <dbname> -f <filename>.sql  

To keep things simple, these are just geometries and GeoIDs (CREATE TABLE schemas can be perused here). Detailed analysis will require joining attributes separately.

Side note: I can't recommend censusreporter.org enough for census-based sanity checks.

Happy mapping!

The big picture: exportable maps

Bill Morris on

Today we're introducing exportable audience maps . . .

Do you use maps in your Faraday workflow? Prints? Presentations? Twitter? Let us know how these maps can help, and how we can improve them.

Getting a map image from the platform is now simple:

From any saved audience, just click the "Export" button and you'll be able to download a high-res image of the geography in context. Share it with colleagues and partners, add it to reports, or put it on your wall.

Cover letters are writing samples

Andy Rossmeissl on

Right, right, we know that cover letters are dead and that nobody reads them, etc.

Don't listen to these clowns. Whether you're hiring or applying, please don't skip the cover letter. That's because (repeat after me):

Cover letters are writing samples

In many cases it's the only shot you're going to get to prove (if you're applying) or learn (if you're hiring) that the candidate knows how to convey complex ideas (a whole person!) briefly and persuasively (you're selling yourself, after all).

How to migrate your Hubspot blog to GitHub Pages, Jekyll, or somewhere else

Andy Rossmeissl on

Hubspot can be a great tool depending on the size/structure of your business, but it's not for everybody. If you find yourself wanting to move your blog off of Hubspot's COS, you've probably already found Hubspot's export documentation—and probably (like me) found the resulting data lacking.

(If you haven't done this, here's a spoiler: Hubspot exports each post as a separate html file, fully rendered complete with all of your template code. This isn't very helpful if you've built a new template somewhere else that you want to insert your old post content in.)

Here's how to migrate your blog

There are two pieces of info you need before you get started:

  1. Hubspot API key — You can get this here
  2. Hubspot blog ID — From your Hubspot dashboard, choose Content → Blog and choose the blog you want to export from the dropdown at the top. You'll find the blog's numeric ID in the URL. (The URL will have 2 numbers in it: you want the second, likely larger, number, as the first is your Hubspot account ID.)

Next, you should download this gist by clicking the "Download ZIP" button and unzip it somewhere. You should review the code here to make sure I'm not doing anything nefarious to your Hubspot data.

Then, to perform the export, you'll run the following commands in your shell (assuming you have a modern Ruby with Bundler installed):

$ cd path-to-gist
$ bundle
$ HUBSPOT_API_KEY=XXX HUBSPOT_BLOG_ID=YYY bundle exec ruby export_hubspot_blog_posts.rb

There should now be a blog directory in there with a markdown file for each post you exported.

Customizing for your migration target

By default the script assumes you're trying to move to GitHub Pages or some other Jekyll-powered blog host. If you're trying to go somewhere else with your post content, you may find the script to be a good starting point. Inside the loop, you have easy access to all of the data you need to generate a corresponding new post within the new blog.
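
To give a feel for what that per-post step can look like, here is a minimal sketch that turns one post into a Jekyll-style document (YAML front matter followed by the body). It is not the gist's code: the helper name and the hash keys 'title', 'date', and 'body' are hypothetical stand-ins for whatever fields you pull from the export.

```ruby
require 'yaml'

# Hypothetical helper: builds the contents of a markdown file with
# Jekyll front matter from one exported post. The keys 'title', 'date'
# (a Time), and 'body' are illustrative, not Hubspot's field names.
def jekyll_document(post)
  front_matter = {
    'title' => post.fetch('title'),
    'date'  => post.fetch('date').strftime('%Y-%m-%d %H:%M:%S %z')
  }
  # Hash#to_yaml emits the opening "---" line; we add the closing one.
  "#{front_matter.to_yaml}---\n\n#{post.fetch('body')}\n"
end

# Usage:
#   post = { 'title' => 'Hello world', 'date' => Time.now, 'body' => 'Some text' }
#   File.write('blog/hello-world.md', jekyll_document(post))
```

For a different target, swap the front matter for whatever metadata format that platform expects.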

Moving posts to GitHub Pages/Jekyll

Just copy the blog directory over to your blog repo. You'll probably want to make sure all these posts get put inside a layout by putting something like this in your _config.yml:

defaults:  
  -
    scope:
      path: "blog"
    values:
      layout: "post"

And that's it!

Suggestions?

If you have ideas to make this better (or have found an error) please tweet us at @faradayio.

Hiring a Sales Development Rep

Seamus Abshere on

If you can hustle leads, we want you.

What's Faraday?

We're a venture-backed software startup that uses maps, data, and predictions to help companies find and reach the right families for important purchases. 8 of the top 10 solar companies in the U.S. use Faraday (it's our biggest industry), but we're branching out into other meaningful verticals this year: higher ed, insurance, travel, fitness, etc. 12-person team, about half Midd grads, great office on Pine St in Burlington (AKA the tech/art/brewery district).

What's an SDR?

Stands for Sales Development Representative: you use email/phone/social to generate leads, which you then hand off to a salesperson to close. It's a great way to get into the tech/SaaS world and really doesn't require any previous experience . . . just tenacity, comfort talking to strangers, comfort with technology, and eagerness to hustle.

We pay well, have good benefits, and have a fancy lock on our office door that you control with your PHONE. Seriously.

Check out the SDR job description and get in touch.