Pandas read_csv(): Reading a CSV File into a DataFrame

The Pandas read_csv() function reads a CSV file into a DataFrame object. A CSV file is like a two-dimensional table whose values are separated by a delimiter.

1. Pandas read_csv() Example

Let’s say we have a CSV file “employees.csv” with the following content.

Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author

Let’s see how to read it into a DataFrame using Pandas read_csv() function.

import pandas

emp_df = pandas.read_csv('employees.csv')

print(emp_df)

Output:

   Emp ID      Emp Name Emp Role
0       1  Pankaj Kumar    Admin
1       2     David Lee   Editor
2       3      Lisa Ray   Author

Recommended Reading: Python Pandas Tutorial
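
The same call can be tried without creating a file, because read_csv() also accepts file-like objects. The sketch below is only an illustration: it assumes the CSV content is held in a string (the csv_text name is made up for this example) and then inspects the resulting DataFrame.

```python
import io

import pandas

# Same content as employees.csv, held in memory so no file is needed.
# csv_text is a name chosen for this illustration only.
csv_text = """Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

emp_df = pandas.read_csv(io.StringIO(csv_text))

print(emp_df.shape)          # (number of rows, number of columns)
print(list(emp_df.columns))  # column labels taken from the header row
print(emp_df.dtypes)         # type inferred for each column
```

Checking shape and dtypes right after loading is a quick way to confirm that the header and the delimiter were interpreted as expected.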

2. Specifying the Delimiter with the Pandas read_csv() Function

By default, read_csv() expects a comma as the delimiter, but it can read files that use any other delimiter too. Let’s say our CSV file uses # as the delimiter.

Emp ID#Emp Name#Emp Role
1#Pankaj Kumar#Admin
2#David Lee#Editor
3#Lisa Ray#Author

In this case, we can specify the sep parameter while calling read_csv() function.

import pandas

emp_df = pandas.read_csv('employees.csv', sep='#')

print(emp_df)

Output:

   Emp ID      Emp Name Emp Role
0       1  Pankaj Kumar    Admin
1       2     David Lee   Editor
2       3      Lisa Ray   Author
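
To see why sep matters, here is a small sketch that reads the same record with two different delimiters and also shows what happens when sep is left out. The in-memory strings and variable names are assumptions made for this illustration.

```python
import io

import pandas

# The same record written with two different delimiters (illustrative data).
hash_text = "Emp ID#Emp Name#Emp Role\n1#Pankaj Kumar#Admin\n"
tab_text = "Emp ID\tEmp Name\tEmp Role\n1\tPankaj Kumar\tAdmin\n"

hash_df = pandas.read_csv(io.StringIO(hash_text), sep='#')
tab_df = pandas.read_csv(io.StringIO(tab_text), sep='\t')

# Without sep, the default comma is used and each line becomes a single column.
wrong_df = pandas.read_csv(io.StringIO(hash_text))

print(hash_df.equals(tab_df))
print(wrong_df.shape)
```

A DataFrame with a single wide column is usually the first sign that the delimiter was not the one read_csv() expected.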

3. Reading Only Specific Columns from the CSV File

We can specify the usecols parameter to read only specific columns from the CSV file. This is very helpful when the CSV file has many columns but we are interested in only a few of them.

import pandas

emp_df = pandas.read_csv('employees.csv', usecols=['Emp Name', 'Emp Role'])

print(emp_df)

Output:

       Emp Name Emp Role
0  Pankaj Kumar    Admin
1     David Lee   Editor
2      Lisa Ray   Author
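
Besides column labels, usecols also accepts zero-based column positions or a callable that is evaluated against each column label. The following sketch compares the three forms on in-memory data; the variable names are made up for this illustration.

```python
import io

import pandas

csv_text = """Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

# Select by column label, as in the example above.
by_name = pandas.read_csv(io.StringIO(csv_text), usecols=['Emp Name', 'Emp Role'])

# Select by zero-based column position.
by_position = pandas.read_csv(io.StringIO(csv_text), usecols=[1, 2])

# Select with a callable: it receives each column label and returns True to keep it.
by_callable = pandas.read_csv(
    io.StringIO(csv_text), usecols=lambda label: label != 'Emp ID'
)

print(by_name.equals(by_position) and by_name.equals(by_callable))
```

The callable form is convenient when it is easier to describe the columns to drop than to list the ones to keep.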

4. Reading a CSV File without a Header

It’s not mandatory to have a header row in the CSV file. If the CSV file doesn’t have a header row, we can still read it by passing header=None to the read_csv() function.

Let’s say our employees.csv file has the following content.

1,Pankaj Kumar,Admin
2,David Lee,Editor

Let’s see how to read this CSV file into a DataFrame object.

import pandas

emp_df = pandas.read_csv('employees.csv', header=None)

print(emp_df)

Output:

   0             1       2
0  1  Pankaj Kumar   Admin
1  2     David Lee  Editor

Notice that the column labels are auto-assigned as integers from 0 to N-1, where N is the number of columns. We can pass these labels in the usecols parameter to read specific columns.

import pandas

emp_df = pandas.read_csv('employees.csv', header=None, usecols=[1])

print(emp_df)

Output:

              1
0  Pankaj Kumar
1     David Lee
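
Integer labels are not very descriptive. One option, sketched below, is to supply our own labels through the names parameter of read_csv(). The label values are assumptions borrowed from the earlier examples, not something stored in the header-less file.

```python
import io

import pandas

# Header-less content, matching the two-row file used in this section.
csv_text = """1,Pankaj Kumar,Admin
2,David Lee,Editor
"""

# header=None says the first line is data; names supplies the column labels.
emp_df = pandas.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=['Emp ID', 'Emp Name', 'Emp Role'],
)

print(emp_df)
```

With names in place, later code can refer to columns such as emp_df['Emp Name'] instead of emp_df[1].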

5. Specifying the Header Row in the CSV File

We can also specify which row holds the header. Any rows before the header row are discarded. Let’s say the CSV file has the following data.

# some random data
invalid data
Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author

The header is in the 3rd row. Row numbers are zero-based, so we have to pass header=2 to read the CSV data from the file.

import pandas

emp_df = pandas.read_csv('employees.csv', header=2)

print(emp_df)

Output:

   Emp ID      Emp Name Emp Role
0       1  Pankaj Kumar    Admin
1       2     David Lee   Editor
2       3      Lisa Ray   Author
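
For this particular file, skipping the first two lines with skiprows=2 should give the same result as header=2, because the header is then inferred from the first remaining line. The sketch below compares the two on in-memory data; it is an illustration under that assumption, not the only way to handle leading junk lines.

```python
import io

import pandas

csv_text = """# some random data
invalid data
Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

# Option 1: point at the header row by its zero-based position.
by_header = pandas.read_csv(io.StringIO(csv_text), header=2)

# Option 2: skip the first two lines, then let the header be inferred.
by_skiprows = pandas.read_csv(io.StringIO(csv_text), skiprows=2)

print(by_header.equals(by_skiprows))
```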

6. Skipping CSV Rows

We can pass the skiprows parameter to skip rows from the CSV file. Let’s say we want to skip the 3rd and 4th lines of our original CSV file. Line numbers are zero-based and include the header line, so we pass skiprows=[2, 3].

import pandas

emp_df = pandas.read_csv('employees.csv', skiprows=[2, 3])

print(emp_df)

Output:

   Emp ID      Emp Name Emp Role
0       1  Pankaj Kumar    Admin
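
The skiprows parameter also accepts a single integer (skip that many lines from the top) or a callable evaluated against each line index. The following sketch contrasts the three forms on in-memory data; the variable names are illustrative.

```python
import io

import pandas

csv_text = """Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

# A list skips exactly those zero-based line indices (line 0 is the header).
by_list = pandas.read_csv(io.StringIO(csv_text), skiprows=[2, 3])

# A callable returns True for every line index that should be skipped.
by_callable = pandas.read_csv(io.StringIO(csv_text), skiprows=lambda i: i in (2, 3))

# An integer skips that many lines from the top, header included,
# so the first remaining line is treated as the header.
first_two = pandas.read_csv(io.StringIO(csv_text), skiprows=2)

print(by_list.equals(by_callable))
print(list(first_two.columns))
```

Note the difference between skiprows=2 and skiprows=[2]: the integer form drops the header line as well, which is rarely what we want for a file like this one.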

7. Specifying the Parser Engine for the Pandas read_csv() Function

Let’s say our CSV file delimiter is ‘##’, i.e. it consists of multiple characters.

Emp ID##Emp Name##Emp Role
1##Pankaj Kumar##Admin
2##David Lee##Editor
3##Lisa Ray##Author

Let’s see what happens when we try to read this CSV file.

import pandas

emp_df = pandas.read_csv('employees.csv', sep='##')

print(emp_df)

Output:

/Users/pankaj/Documents/PycharmProjects/AskPython/hello-world/journaldev/pandas/pandas_read_csv.py:5: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  emp_df = pandas.read_csv('employees.csv', sep='##')
   Emp ID      Emp Name Emp Role
0       1  Pankaj Kumar    Admin
1       2     David Lee   Editor
2       3      Lisa Ray   Author

The data is still read correctly, but pandas warns that it had to fall back to the python engine. We can avoid the warning by passing engine='python' to the read_csv() function.

emp_df = pandas.read_csv('employees.csv', sep='##', engine='python')

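The two parser engines relevant here are c and python. The C engine is the default and is faster, but the python engine is more feature-complete; for example, it supports regex separators such as the multi-character ‘##’ above. Newer pandas versions (1.4 and later) also offer an experimental pyarrow engine.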
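
The effect of the engine parameter can be checked by recording warnings, as in the sketch below. The helper count_parser_warnings() and the in-memory data are made up for this illustration. The last call also shows that, because multi-character separators are interpreted as regular expressions, characters such as | need to be escaped.

```python
import io
import warnings

import pandas

csv_text = "Emp ID##Emp Name##Emp Role\n1##Pankaj Kumar##Admin\n2##David Lee##Editor\n"


def count_parser_warnings(**kwargs):
    """Read csv_text with sep='##' and count the ParserWarning instances raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        pandas.read_csv(io.StringIO(csv_text), sep='##', **kwargs)
    return sum(issubclass(w.category, pandas.errors.ParserWarning) for w in caught)


default_count = count_parser_warnings()                # falls back from c to python
python_count = count_parser_warnings(engine='python')  # python engine requested explicitly

print(default_count, python_count)

# A multi-character separator is a regex, so '||' must be escaped.
pipe_df = pandas.read_csv(io.StringIO("a||b\n1||2\n"), sep=r'\|\|', engine='python')
print(list(pipe_df.columns))
```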

8. References

  • pandas read_csv() API Doc

Translated from: https://www.journaldev.com/33316/pandas-read-csv-reading-csv-file-to-dataframe
