Startup hacks and engineering miracles from your exhausted friends at Faraday

How to aggregate JSONB in PostgreSQL 9.5+

Seamus Abshere on

This is part of our series on PostgreSQL and things that are obvious once you see them. It's a 2017 update to our 2016 article How to merge JSON fields in Postgres Nice!

Update We used to call this jsonb_collect, but then we realized it was very similar to json_object_agg(name, value)... except that there is no version of that function that just takes an expression. So, copying jsonb_agg(expression), we give you...

How do you aggregate (aka merge aka combine aka collect) JSONB fields in Postgres 9.5+? You define a new aggregate jsonb_object_agg(expression):

CREATE AGGREGATE jsonb_object_agg(jsonb) (  
  SFUNC = 'jsonb_concat',
  STYPE = jsonb,
  INITCOND = '{}'

Here's how you use it:

# select * from greetings;
(3 rows)

# select jsonb_object_agg(data) from greetings;
 { "es" : "Saludos", "en" : "Hello", "ru" : "Здравствуйте" }
(1 row)

If you're curious about the aggregate we just added, note that jsonb_concat(jsonb, jsonb) is the function backing || that was introduced in Postgres 9.5.

It's just that simple!

scrubcsv: clean CSVs, drop bad lines

Seamus Abshere on

This is part of our series on things that are obvious once you see them - and our data science series because it belongs in your toolchain.

Lies, damn lies, and commercial CSV export modules. Who wrote these things? On what planet would this be acceptable? Whatever.

Name,What's wrong  
"Robert "Bob" Smith",quotes inside quotes
Robert "Bob" Smith,quotes in the middle  
Robert Bob" Smith,unmatched quote  

Ruby dies immediately trying to read it:

$ irb
irb(main):001:0> require 'csv'  
=> true
CSV::MalformedCSVError: Missing or stray quote in line 2  

Introducing scrubcsv, a is a lightning-fast static binary written in Rust that best-effort parses CSV and then immediately dumps back out 100% guaranteed standards-compliant CSV. Top speed? About 67mb/s.

$ scrubcsv broken.csv > fixed.csv
4 rows (0 bad) in 0.00 seconds, 787.13 KiB/sec

$ cat fixed.csv
Name,What's wrong  
"Robert Bob"" Smith""",quotes inside quotes
"Robert ""Bob"" Smith",quotes in the middle
"Robert Bob"" Smith",unmatched quote

It uses BurntSushi's world-beating CSV parser which is almost certainly faster than your SSD.

How to permanently delete versioned objects from S3

Seamus Abshere on

This is part of our cloud security and things that are obvious once you see them series. Duhh... safe!

Amazon's explanation of deleting a versioned object and the SDK documentation do not give an example of permanently deleting a versioned object. Here's how to do it:

require 'aws-sdk'

s3 =  
  region: 'us-east-1',
  access_key_id: ACCESS_KEY_ID,
  secret_access_key: SECRET_ACCESS_KEY
bucket = s3.bucket('my-versioned-bucket')

bucket.objects.each do |object_summary|  
  o = bucket.object object_summary.key
  # this is the secret: specify the version while deleting
  o.delete version_id: o.version_id

If you don't specify the version, you get a delete marker, which you can proceed to delete infinite times and it will not go away :)

Obvious once you see it: 1-click Excel autofilter + split

Seamus Abshere on

This is part of our series on things that are obvious once you see them. Duh!

Excel's autofilter and freeze panes are a big reason we keep buying Microsoft Office. Add both with 1 click (hint: the smiley face)

the macro in action

Sub faraday_io_autofilter_split()  
  ActiveWindow.FreezePanes = True
  Cells(1, 1).Select
End Sub  
  1. Tools -> Macro -> Visual Basic Editor -> double click Module1 of Personal Macro Workbook
  2. Paste the code above (starts with Sub faraday_io_autofilter_split())
  3. File -> Save Personal Macro Workbook
  4. Excel -> Close and Return to Microsoft Excel
  5. Right-click toolbar -> Customize Toolbars and Menus -> Commands -> Macros -> drag Custom Button to the menubar (it's a ☺)
  6. Right-click new button -> Assign Macro... -> select 'Personal Macro Workbook'!faraday_io_autofilter_split

a screencast of the instructions

How to merge JSON fields in PostgreSQL

Seamus Abshere on

This is part of our series on PostgreSQL and things that are obvious once you see them. Nice!

Deprecated! Please see our new article How to aggregate JSONB in PostgreSQL 9.5+.

How do you combine (merge) JSON fields in Postgres? You define a new function json_collect()

# select * from greetings;
(3 rows)

# select json_collect(data) from greetings;
 { "es" : "Saludos", "en" : "Hello", "ru" : "Здравствуйте" }
(1 row)

json_collect is not built into Postgres, though. You have to create it yourself:

-- needed by json_collect
CREATE FUNCTION json_merge(data json, merge_data json) RETURNS json LANGUAGE sql IMMUTABLE  
AS $$  
  SELECT json_object_agg(key, value)
  FROM (
    WITH to_merge AS (
      SELECT * FROM json_each(merge_data)
    SELECT *
    FROM json_each(data)
    WHERE key NOT IN (SELECT key FROM to_merge)
    SELECT * FROM to_merge
  ) t;

CREATE AGGREGATE json_collect(json) (  
  SFUNC = json_merge,
  STYPE = json,
  INITCOND = '{}'

Thanks to @matheusoliveira!