AI, startup hacks, and engineering miracles from your friends at Faraday

How to aggregate JSONB in PostgreSQL 9.5+

Seamus Abshere on

This is part of our series on PostgreSQL and things that are obvious once you see them. It's a 2017 update to our 2016 article How to merge JSON fields in Postgres Nice!

Update We used to call this jsonb_collect, but then we realized it was very similar to json_object_agg(name, value)... except that there is no version of that function that just takes an expression. So, copying jsonb_agg(expression), we give you...

How do you aggregate (aka merge aka combine aka collect) JSONB fields in Postgres 9.5+? You define a new aggregate jsonb_object_agg(expression):

CREATE AGGREGATE jsonb_object_agg(jsonb) (  
  SFUNC = 'jsonb_concat',
  STYPE = jsonb,
  INITCOND = '{}'
);

Here's how you use it:

# select * from greetings;
         data
-----------------------
 {"es":"Saludos"}
 {"en":"Hello"}
 {"ru":"Здравствуйте"}
(3 rows)

# select jsonb_object_agg(data) from greetings;
                        jsonb_object_agg
-------------------------------------------------------------
 { "es" : "Saludos", "en" : "Hello", "ru" : "Здравствуйте" }
(1 row)

If you're curious about the aggregate we just added, note that jsonb_concat(jsonb, jsonb) is the function backing || that was introduced in Postgres 9.5.

It's just that simple!

scrubcsv: clean CSVs, drop bad lines

Seamus Abshere on

This is part of our series on things that are obvious once you see them - and our data science series because it belongs in your toolchain.

Lies, damn lies, and commercial CSV export modules. Who wrote these things? On what planet would this be acceptable? Whatever.

Name,What's wrong  
"Robert "Bob" Smith",quotes inside quotes
Robert "Bob" Smith,quotes in the middle  
Robert Bob" Smith,unmatched quote  

Ruby dies immediately trying to read it:

$ irb
irb(main):001:0> require 'csv'  
=> true
irb(main):002:0> CSV.read('broken.csv')  
CSV::MalformedCSVError: Missing or stray quote in line 2  

Introducing scrubcsv, a is a lightning-fast static binary written in Rust that best-effort parses CSV and then immediately dumps back out 100% guaranteed standards-compliant CSV. Top speed? About 67mb/s.

$ scrubcsv broken.csv > fixed.csv
4 rows (0 bad) in 0.00 seconds, 787.13 KiB/sec

$ cat fixed.csv
Name,What's wrong  
"Robert Bob"" Smith""",quotes inside quotes
"Robert ""Bob"" Smith",quotes in the middle
"Robert Bob"" Smith",unmatched quote

It uses BurntSushi's world-beating CSV parser which is almost certainly faster than your SSD.

How to permanently delete versioned objects from S3

Seamus Abshere on

This is part of our cloud security and things that are obvious once you see them series. Duhh... safe!

Amazon's explanation of deleting a versioned object and the SDK documentation do not give an example of permanently deleting a versioned object. Here's how to do it:

require 'aws-sdk'

s3 = Aws::S3::Resource.new(  
  region: 'us-east-1',
  access_key_id: ACCESS_KEY_ID,
  secret_access_key: SECRET_ACCESS_KEY
)
bucket = s3.bucket('my-versioned-bucket')

bucket.objects.each do |object_summary|  
  o = bucket.object object_summary.key
  # this is the secret: specify the version while deleting
  o.delete version_id: o.version_id
end  

If you don't specify the version, you get a delete marker, which you can proceed to delete infinite times and it will not go away :)