My CSV files contains a header with 16 columns. The data lines contains 16 values separated with ",".
Just found that some lines contains values within "" that contains ,. This is confusing the parser. Instead of expecting 15 commas, it finds 18. One example below:
"23210","Cosmetic","Lancome","Eyes Virtuose Palette Makeup","**7,2g**","W","Decorative range","5x**1,2**g Eye Shadow + **1,2**g Powder","http://image.jpg","","3660732000104","","No","","1","1"
How can make the parser ignore the comma sign within ""?
My code looks like this:
import pandas
csv1 = pandas.read_csv('Produktlista.csv', quoting=3)
csv2 = pandas.read_csv('Prislista.csv', quoting= 3)
merged = csv1.merge(csv2, on='id')
merged.to_csv("output.csv", index=False, quoting=3)
解决方案
Pass param quotechar='"'. From the Pandas Documentation:
quotechar : str (length 1), optional
The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.
e.g.:
In [9]:
t='''"23210","Cosmetic","Lancome","Eyes Virtuose Palette Makeup","7,2g","W","Decorative range","5x1,2g Eye Shadow + 1,2g Powder","http://image.jpg","","3660732000104","","No","","1","1"'''
df = pd.read_csv(io.StringIO(t), quotechar='"', header=None)
df
Out[9]:
0 1 2 3 4 5 \
0 23210 Cosmetic Lancome Eyes Virtuose Palette Makeup 7,2g W
6 7 8 9 \
0 Decorative range 5x1,2g Eye Shadow + 1,2g Powder http://image.jpg NaN
10 11 12 13 14 15
0 3660732000104 NaN No NaN 1 1