Written by Jakub2021-10-19
sed for csv debug
Instead of fancy and sophisticated thoughts lets share a daily experience 😉
You’ve collected data.csv:
field1,filed2,filed3 A,B,C D,E,F f,a,i,l H,I,J L,M,N
and while importing to a DB (sqlite in this example) you’ve got:
sqlite> .mode csv
sqlite> .import data.csv data
data.csv:4: expected 3 columns but found 4 - extras ignored
Now, how to find faulty records (while actually source csv is 5G heavy)?
$ sed -n "/\(.*,\)\{3,\}/p" data.csv f,a,i,l
Notice that:
\ - are escape characters for parentheses
( ) - that are enclosing a seeking pattern .*,
{ } - sets a condition of 3 cases at least
One may ask in what circumstances it is relevant? Consider a collection of financial transactions originated from many countries and locales before a normalization – data.csv could look like:
country,currency,value UK,GBP,1,000.00 IT,EUR,9.999,99 SE,SEK,99999,99
Archives
Calendar
M | T | W | T | F | S | S |
---|---|---|---|---|---|---|
1 | 2 | 3 | 4 | |||
5 | 6 | 7 | 8 | 9 | 10 | 11 |
12 | 13 | 14 | 15 | 16 | 17 | 18 |
19 | 20 | 21 | 22 | 23 | 24 | 25 |
26 | 27 | 28 | 29 | 30 | 31 |