sed for csv debug

Instead of fancy and sophisticated thoughts lets share a daily experience 😉

You’ve collected data.csv:

field1,filed2,filed3
A,B,C
D,E,F
f,a,i,l
H,I,J
L,M,N

and while importing to a DB (sqlite in this example) you’ve got:

sqlite> .mode csv
sqlite> .import data.csv data
data.csv:4: expected 3 columns but found 4 - extras ignored

Now, how to find faulty records (while actually source csv is 5G heavy)?

$ sed -n "/\(.*,\)\{3,\}/p" data.csv
f,a,i,l 
Notice that: 
\ - are escape characters for parentheses
( ) - that are enclosing a seeking pattern .*,
{ } - sets a condition of 3 cases at least

One may ask in what circumstances it is relevant? Consider a collection of financial transactions originated from many countries and locales before a normalization – data.csv could look like:

country,currency,value
UK,GBP,1,000.00
IT,EUR,9.999,99
SE,SEK,99999,99