Instead of fancy and sophisticated thoughts lets share a daily experience 😉
You’ve collected data.csv:
field1,filed2,filed3 A,B,C D,E,F f,a,i,l H,I,J L,M,N
and while importing to a DB (sqlite in this example) you’ve got:
sqlite> .mode csv
sqlite> .import data.csv data
data.csv:4: expected 3 columns but found 4 - extras ignored
Now, how to find faulty records (while actually source csv is 5G heavy)?
$ sed -n "/\(.*,\)\{3,\}/p" data.csv f,a,i,l
Notice that:
\ - are escape characters for parentheses
( ) - that are enclosing a seeking pattern .*,
{ } - sets a condition of 3 cases at least
One may ask in what circumstances it is relevant? Consider a collection of financial transactions originated from many countries and locales before a normalization – data.csv could look like:
country,currency,value UK,GBP,1,000.00 IT,EUR,9.999,99 SE,SEK,99999,99