Startup hacks and engineering miracles from your exhausted friends at Faraday

How to convert a fixed-width file into CSV

Seamus Abshere on

This is part of our data science series. How predictive!

(The more valuable and massive a data set is, the less likely it's in a format you can just parse. Has anybody else noticed that?)

Here's how to convert a fixed-width file to CSV with the standard GNU unix tool gawk:

Theoretical

Thanks to stackoverflow: (reproducing verbatim)

gawk '$1=$1' OFS=, FIELDWIDTHS='4 2 5 1 1' infile > outfile.csv  

Where FIELDWIDTHS is a list of field widths and OFS is the output file separator.

Real life

In real life, fixed width files contain commas and double quotes.

# put this in a file called fixed2csv.awk
{
  for (i=1;i<=NF;i++) {
    sub(/\s+$/,"",$i)
    sub("\"","\"\"",$i)
    printf "\"%s\"%s", $i, (i<NF?OFS:ORS)
  }
}

Then run it on your data:

gawk -f fixed2csv.awk OFS=, FIELDWIDTHS='4 2 5 1 1' infile > outfile.csv  

Thanks to Ed Morton on Stackoverflow for inspiration!