Startup hacks and engineering miracles from your exhausted friends at Faraday

Good, strict default warnings for Rust code with Clippy

Eric Kidd on

Do you maintain a project written in Rust? Would you like to enable really aggressive warnings to make code review easier? Here's what it looks like:

We can set this up using a combination of Rust's built-in warnings and the excellent Clippy package. For best results with Clippy, it's easiest to install the nightly release of Rust:

curl https://sh.rustup.rs -sSf | sh  
rustup update nightly  

You can then update your Cargo.toml file as follows:

[features]
unstable = ["clippy"]

[dependencies]
clippy = { version = "0.0.*", optional = true }  

Near the top of your project's src/lib.rs file, add this enormous list of warnings:

// Enable clippy if our Cargo.toml file asked us to do so.
#![cfg_attr(feature="clippy", feature(plugin))]
#![cfg_attr(feature="clippy", plugin(clippy))]

// Enable as many useful Rust and Clippy warnings as we can stand.  We'd
// also enable `trivial_casts`, but we're waiting for
// https://github.com/rust-lang/rust/issues/23416.
#![warn(missing_copy_implementations,
        missing_debug_implementations,
        missing_docs,
        trivial_numeric_casts,
        unsafe_code,
        unused_extern_crates,
        unused_import_braces,
        unused_qualifications)]
#![cfg_attr(feature="clippy", warn(cast_possible_truncation))]
#![cfg_attr(feature="clippy", warn(cast_possible_wrap))]
#![cfg_attr(feature="clippy", warn(cast_precision_loss))]
#![cfg_attr(feature="clippy", warn(cast_sign_loss))]
#![cfg_attr(feature="clippy", warn(missing_docs_in_private_items))]
#![cfg_attr(feature="clippy", warn(mut_mut))]
// Disallow `println!`. Use `debug!` for debug output
// (which is provided by the `log` crate).
#![cfg_attr(feature="clippy", warn(print_stdout))]
// We only warn about `unwrap` on `Result`. This still allows `unwrap` on
// `Option` values (because doing so makes working with Regex matches much
// nicer) and when compiling in test mode (because using it in tests is
// idiomatic).
#![cfg_attr(all(not(test), feature="clippy"), warn(result_unwrap_used))]
#![cfg_attr(feature="clippy", warn(unseparated_literal_suffix))]
#![cfg_attr(feature="clippy", warn(wrong_pub_self_convention))]

For documentation about individual warnings, run rustc -W help and look at the Clippy lint list. This will show you all available warnings, and explain what each one does.

If you also want to turn warnings into compilation errors, you can add:

// Fail hard on warnings.  This will be automatically disabled when we're
// used as a dependency by other crates, thanks to Cargo magic.
#![deny(warnings)]

To see your new warnings, run:

rustup run nightly cargo build --features unstable  

This will compile your code using the nightly build of Rust, activating the unstable feature that we defined in Cargo.toml, which will in turn activate the clippy feature. Surprisingly, even with all these warnings activated, I see very few false positives in practice.

Because Clippy uses nightly Rust, you may occasionally get compilation errors. If this happens, get the latest nightly build and the latest version of Clippy:

rustup update nightly  
cargo update  

If this still doesn't work, just leave off --features unstable and try again tomorrow.

How to convert a fixed-width file into CSV

Seamus Abshere on

This is part of our data science series. How predictive!

(The more valuable and massive a data set is, the less likely it's in a format you can just parse. Has anybody else noticed that?)

Here's how to convert a fixed-width file to CSV with gawk, the GNU implementation of awk:

Theoretical

Thanks to Stack Overflow (reproduced verbatim):

gawk '$1=$1' OFS=, FIELDWIDTHS='4 2 5 1 1' infile > outfile.csv  

Where FIELDWIDTHS is a space-separated list of field widths (a gawk extension) and OFS is the output field separator.
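To make the idea of field widths concrete, here is a minimal Ruby sketch of the same splitting step. The helper name `split_fixed_width` and the sample record are made up for illustration; only the widths come from the command above.

```ruby
# Widths copied from FIELDWIDTHS='4 2 5 1 1' in the gawk command.
WIDTHS = [4, 2, 5, 1, 1].freeze

# Hypothetical helper: cut one fixed-width record into its fields.
def split_fixed_width(line, widths = WIDTHS)
  offset = 0
  widths.map do |width|
    # to_s covers records that are shorter than the declared widths.
    field = line[offset, width].to_s
    offset += width
    field
  end
end

record = "ABCD12HELLOXY" # made-up sample record
puts split_fixed_width(record).join(",")
# prints ABCD,12,HELLO,X,Y
```

Joining with a comma plays the role of OFS here.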

Real life

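In real life, fixed-width files contain commas and double quotes, so every field needs to be wrapped in double quotes and any embedded double quote needs to be doubled.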

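# put this in a file called fixed2csv.awk
{
  for (i=1;i<=NF;i++) {
    # strip the trailing padding from the field
    sub(/\s+$/,"",$i)
    # double every embedded double quote (gsub, not sub, so all are escaped)
    gsub("\"","\"\"",$i)
    # wrap the field in quotes; OFS between fields, ORS after the last one
    printf "\"%s\"%s", $i, (i<NF?OFS:ORS)
  }
}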

Then run it on your data:

gawk -f fixed2csv.awk OFS=, FIELDWIDTHS='4 2 5 1 1' infile > outfile.csv  
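If awk isn't your thing, the quoting rule itself is small enough to sketch in Ruby. This is only an illustration of the same three steps (trim, double the quotes, wrap); the helper names `csv_quote` and `to_csv_row` and the sample fields are made up.

```ruby
# Hypothetical helper: quote one field the way the awk script does.
def csv_quote(field)
  trimmed = field.sub(/\s+\z/, "")      # strip trailing padding
  '"' + trimmed.gsub('"', '""') + '"'   # double embedded quotes, then wrap
end

# Hypothetical helper: build one CSV row from already-split fields.
def to_csv_row(fields)
  fields.map { |field| csv_quote(field) }.join(",")
end

puts to_csv_row(['Smith, J. ', 'say "hi"', 'x'])
# prints "Smith, J.","say ""hi""","x"
```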

Thanks to Ed Morton on Stack Overflow for inspiration!

How to permanently delete versioned objects from S3

Seamus Abshere on

This is part of our cloud security and things that are obvious once you see them series. Duhh... safe!

Amazon's explanation of deleting a versioned object and the SDK documentation do not give an example of permanently deleting a versioned object. Here's how to do it:

require 'aws-sdk'

s3 = Aws::S3::Resource.new(  
  region: 'us-east-1',
  access_key_id: ACCESS_KEY_ID,
  secret_access_key: SECRET_ACCESS_KEY
)
bucket = s3.bucket('my-versioned-bucket')

bucket.objects.each do |object_summary|  
  o = bucket.object object_summary.key
  # this is the secret: specify the version while deleting
  o.delete version_id: o.version_id
end  

If you don't specify the version, you get a delete marker, which you can proceed to delete infinite times and it will not go away :)
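Note that `bucket.objects` lists only the current version of each key, so older versions survive the loop above. Here is a minimal sketch for clearing out everything, assuming the SDK's `Bucket#object_versions` collection (which, as far as I know, also yields delete markers) and `ObjectVersion#delete`. The helper name `purge_all_versions` is made up for illustration.

```ruby
# Hypothetical helper: permanently delete every stored version in a bucket.
# `bucket` is expected to behave like the Aws::S3::Bucket built above.
def purge_all_versions(bucket)
  deleted = 0
  bucket.object_versions.each do |version|
    # Each item carries its own version id, so the delete targets that
    # exact version instead of adding a new delete marker.
    version.delete
    deleted += 1
  end
  deleted
end

# Usage, assuming `bucket = s3.bucket('my-versioned-bucket')` as above:
#   puts "removed #{purge_all_versions(bucket)} versions"
```

Try it on a scratch bucket first: there is no undo.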

How to enable S3 server-side encryption for existing objects

Seamus Abshere on

This is part of our cloud security series.

Do you have unencrypted S3 objects lying around? Don't! Here's the safe way to retroactively enable server-side encryption:

Step 1: Make a backup bucket

The AWS Management Console is the easiest way. Call it [my-bucket]-backup.

Step 2: Copy one way

require 'aws-sdk'

s3 = Aws::S3::Resource.new(region: 'us-east-1', access_key_id: ACCESS_KEY_ID, secret_access_key: SECRET_ACCESS_KEY)  
b1 = s3.bucket('my-bucket')  
b2 = s3.bucket('my-bucket-backup')

# or no prefix if you want everything
b1.objects(prefix: 'xyz').each do |object_summary|  
  o1 = b1.object object_summary.key
  o2 = b2.object object_summary.key
  o1.copy_to o2, server_side_encryption: 'AES256'
end  

Step 3: Sanity check

Now look at [my-bucket]-backup - it's probably 100% perfect, but just reassure yourself.
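If you'd rather not eyeball it, a rough sketch of an automated check: compare the keys in the two buckets. The helper name `missing_from_backup` is made up, and it only compares key names, not contents or encryption status.

```ruby
require 'set'

# Hypothetical helper: keys present in the source bucket but absent from the
# backup. Both arguments are expected to behave like Aws::S3::Bucket.
def missing_from_backup(source, backup, prefix: nil)
  options = prefix ? { prefix: prefix } : {}
  backup_keys = backup.objects(options).map(&:key).to_set
  source.objects(options).map(&:key).reject { |key| backup_keys.include?(key) }
end

# Usage, assuming b1 and b2 from Step 2:
#   missing = missing_from_backup(b1, b2, prefix: 'xyz')
#   puts missing.empty? ? 'backup looks complete' : missing
```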

Step 4: Copy back over

There are 2 changes from Step 2 (we loop over the backup bucket with no prefix, and the copy goes from o2 to o1), so you might want to copy-paste:

b2.objects.each do |object_summary|  
  o1 = b1.object object_summary.key
  o2 = b2.object object_summary.key
  o2.copy_to o1, server_side_encryption: 'AES256'
end  

Step 5: (optional) Clean up

Delete [my-bucket]-backup.

Postgres strftime (or: how to group by month)

Seamus Abshere on

This is part of our data science series. How predictive!

"How do you do strftime in postgres?"

The answer: to_char(date, format).

If you want to group by month, this is what you're looking for:

psql=> select count(*), to_char(created_at, 'YYYY-MM') from employees group by to_char(created_at, 'YYYY-MM') order by to_char(created_at,'YYYY-MM') desc;  
 count | to_char
-------+---------
    27 | 2016-08
    32 | 2016-07
    58 | 2016-06
    17 | 2016-05
    57 | 2016-04
    44 | 2016-03
    28 | 2016-02
    45 | 2016-01
    10 | 2015-12
    10 | 2015-11
    24 | 2015-10
    15 | 2015-09
    32 | 2015-08
    38 | 2015-07
    31 | 2015-06
    18 | 2015-05
    19 | 2015-04
     5 | 2015-03
     8 | 2015-02
    10 | 2015-01
     7 | 2014-12
    22 | 2014-11
(22 rows)
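For comparison, here is the same grouping done with plain strftime in Ruby, as a small sketch. The dates are made-up sample data standing in for the created_at column; `'%Y-%m'` is the strftime spelling of to_char's `'YYYY-MM'`.

```ruby
require 'date'

# Made-up sample dates standing in for employees.created_at.
created_at = [
  Date.new(2016, 8, 3),
  Date.new(2016, 8, 21),
  Date.new(2016, 7, 9)
]

# Group by the formatted month, count each group, newest month first.
counts = created_at
  .group_by { |date| date.strftime('%Y-%m') }
  .map { |month, dates| [month, dates.size] }
  .sort_by { |month, _count| month }
  .reverse

counts.each { |month, count| puts "#{count} | #{month}" }
# prints:
# 2 | 2016-08
# 1 | 2016-07
```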

That's it.