How to clean up your data in the command line

Open data brain

I work part-time as a data auditor. Think of me as a proofreader who works with tables of data rather than pages of prose. The tables are exported from relational databases and are usually fairly modest in size: 100,000 to 1,000,000 records and 50 to 200 fields.

I haven’t seen an error-free data table, ever. The messiness isn’t limited, as you might think, to duplicate records, spelling and formatting errors, and data items placed in the wrong field. I also find:

read more

Author: dasuberworm

Standing just over 2 meters and hailing from о́стров Ратма́нова, Dasuberworm is a professional cryptologist, entrepreneur and cage fighter. When he's not breaking cyphers and punching people in the face, Das enjoys receiving ominous DHL packages at one of his many drop sites in SE Asia.

Share This Post On