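pandas is a tool built on top of NumPy, created for data analysis tasks. It incorporates a large number of libraries and some standard data models, and provides the tools needed to work with large datasets efficiently. pandas offers many functions and methods that let us process data quickly and conveniently. You will soon find that it is one of the key factors that make Python a powerful and efficient data analysis environment.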
pandas.read_csv:
pandas.read_csv(filepath_or_buffer, *,
    sep=_NoDefault.no_default,
    delimiter=None,
    header='infer',
    names=_NoDefault.no_default,
    index_col=None,
    usecols=None,
    dtype=None,
    engine=None,
    converters=None,
    true_values=None,
    false_values=None,
    skipinitialspace=False,
    skiprows=None,
    skipfooter=0,
    nrows=None,
    na_values=None,
    keep_default_na=True,
    na_filter=True,
    verbose=False,
    skip_blank_lines=True,
    parse_dates=None,
    infer_datetime_format=_NoDefault.no_default,
    keep_date_col=False,
    date_parser=_NoDefault.no_default,
    date_format=None,
    dayfirst=False,
    cache_dates=True,
    iterator=False,
    chunksize=None,
    compression='infer',
    thousands=None,
    decimal='.',
    lineterminator=None,
    quotechar='"',
    quoting=0,
    doublequote=True,
    escapechar=None,
    comment=None,
    encoding=None,
    encoding_errors='strict',
    dialect=None,
    on_bad_lines='error',
    delim_whitespace=False,
    low_memory=True,
    memory_map=False,
    float_precision=None,
    storage_options=None,
    dtype_backend=_NoDefault.no_default)
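Most calls only use a handful of these parameters. Here is a minimal sketch of a typical call; the CSV text and the column names (id, name, score) are made up for illustration and are read from memory instead of from a real file.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real file on disk.
csv_text = "id,name,score\n1,alice,3.5\n2,bob,4.0\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=",",                  # field delimiter
    header=0,                 # the first line holds the column names
    usecols=["id", "score"],  # load only these columns
    dtype={"id": "int64", "score": "float64"},
)
print(df)
```

Passing a file path instead of io.StringIO(...) works the same way.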
# python generate_data.py data/data.csv data/
Traceback (most recent call last):
  File "/notebooks/xx/generate_data.py", line 62, in <module>
    generate_similarity_data(sys.argv[1], sys.argv[2])
  File "/notebooks/xx/generate_data.py", line 9, in generate_similarity_data
    df = pd.read_csv(path, index_col=False, sep=',', low_memory=False)
  File "/root/miniconda3/envs/text_label/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/root/miniconda3/envs/text_label/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 583, in _read
    return parser.read(nrows)
  File "/root/miniconda3/envs/text_label/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1704, in read
    ) = self._engine.read( # type: ignore[attr-defined]
  File "/root/miniconda3/envs/text_label/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 239, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 796, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 884, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 861, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "pandas/_libs/parsers.pyx", line 2029, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 17 fields in line 112, saw 18
The key error:
error: Expected 17 fields in line 112, saw 18
From the error message above we can tell:
Line 112 of the original CSV file contains 18 fields, while the header defines only 17 columns.
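To see the same kind of error on a small scale, here is a sketch that triggers it on purpose. The data is hypothetical: a 3-column header and a row that ends up with 4 fields because of an unquoted comma in its last value.

```python
import io

import pandas as pd

# Hypothetical data: the header has 3 columns, but the last row splits into
# 4 fields because the comma inside "fine, thanks" is not quoted.
bad_csv = "id,name,comment\n1,alice,ok\n2,bob,fine, thanks\n"

try:
    pd.read_csv(io.StringIO(bad_csv), sep=",")
    error_message = None
except pd.errors.ParserError as exc:
    # Same error class as in the traceback above.
    error_message = str(exc)

print(error_message)
```

The message reports the expected field count, the line number in the file, and the number of fields actually found, which is exactly the information needed to locate the bad row.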
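Solution:
Fix the offending line by hand: for example, correct the handling of escape or quote characters (such as an unquoted comma inside a value), or, if the line simply has an extra column, delete it.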
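When the file is large, finding every bad line by eye is tedious. The following is a rough sketch, not the only way to do it, that uses the standard csv module to list the rows whose field count differs from the header, then repairs the sample by quoting the value. The helper find_bad_lines and the sample data are hypothetical names made up for this example.

```python
import csv
import io

import pandas as pd

# Hypothetical data: the last row has an unquoted comma in its final value.
raw_text = "id,name,comment\n1,alice,ok\n2,bob,fine, thanks\n"


def find_bad_lines(text, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    expected = len(next(reader))  # number of columns in the header
    bad = []
    for row in reader:
        # reader.line_num is the 1-based number of the last physical line read.
        if len(row) != expected:
            bad.append((reader.line_num, len(row)))
    return bad


bad_lines = find_bad_lines(raw_text)
print(bad_lines)  # [(3, 4)]

# Manual repair: quote the value that contains the delimiter.
fixed_text = raw_text.replace("fine, thanks", '"fine, thanks"')
df = pd.read_csv(io.StringIO(fixed_text))
print(df)
```

For a real file, open it with the right encoding and pass the file object to csv.reader instead of io.StringIO.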
Solved!
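If editing the file is not an option, the on_bad_lines parameter listed in the signature above offers another way out. This is only a sketch with made-up data, and it assumes pandas 1.3 or later, where on_bad_lines is available. Be aware that "skip" silently drops the malformed rows, so it only makes sense when losing them is acceptable.

```python
import io

import pandas as pd

# Hypothetical data: the second data row has one field too many.
bad_csv = "id,name,comment\n1,alice,ok\n2,bob,fine, thanks\n3,carol,good\n"

# on_bad_lines="skip" drops rows with too many fields instead of raising;
# on_bad_lines="warn" does the same but also emits a warning per bad row.
df = pd.read_csv(io.StringIO(bad_csv), on_bad_lines="skip")
print(df)
```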
1. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
2. https://baike.baidu.com/item/pandas/17209606?fr=ge_ala