The pandas read_csv() function reads a CSV file into a DataFrame object. A CSV file is like a two-dimensional table in which the values are separated by a delimiter.
Let’s say we have a CSV file “employees.csv” with the following content.
Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
Let’s see how to read it into a DataFrame using Pandas read_csv() function.
import pandas
emp_df = pandas.read_csv('employees.csv')
print(emp_df)
Output:
   Emp ID      Emp Name Emp Role
0       1  Pankaj Kumar    Admin
1       2     David Lee   Editor
2       3      Lisa Ray   Author
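To try read_csv() without creating a file on disk, the sketch below assumes the same employee data is stored in an in-memory string (csv_text is a hypothetical name) and wraps it in io.StringIO, since read_csv() accepts file-like objects as well as paths. It then inspects the result, which is a quick way to confirm that the header and column types were inferred as expected.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of employees.csv, so the example runs without a file.
csv_text = """Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

# read_csv() accepts any file-like object, not just a file path.
emp_df = pd.read_csv(io.StringIO(csv_text))

print(emp_df.shape)          # (number of rows, number of columns)
print(list(emp_df.columns))  # column labels taken from the header row
print(emp_df.dtypes)         # 'Emp ID' is inferred as an integer column
```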
By default, read_csv() expects a comma as the delimiter, but other delimiters work too. Suppose our CSV file uses # as the delimiter.
Emp ID#Emp Name#Emp Role
1#Pankaj Kumar#Admin
2#David Lee#Editor
3#Lisa Ray#Author
In this case, we specify the delimiter with the sep parameter when calling the read_csv() function.
import pandas
emp_df = pandas.read_csv('employees.csv', sep='#')
print(emp_df)
Output:
   Emp ID      Emp Name Emp Role
0       1  Pankaj Kumar    Admin
1       2     David Lee   Editor
2       3      Lisa Ray   Author
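A minimal sketch of why sep matters, assuming the #-delimited data above is held in a hypothetical in-memory string rather than a file: without sep, read_csv() finds no commas and puts each whole line into a single column.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the '#'-delimited employees.csv.
csv_text = """Emp ID#Emp Name#Emp Role
1#Pankaj Kumar#Admin
2#David Lee#Editor
3#Lisa Ray#Author
"""

# Default sep=',': no commas are found, so each line stays in one column.
wrong_df = pd.read_csv(io.StringIO(csv_text))
print(wrong_df.shape)

# sep='#': each line is split into the three expected columns.
emp_df = pd.read_csv(io.StringIO(csv_text), sep='#')
print(emp_df.shape)
print(list(emp_df.columns))
```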
We can pass the usecols parameter to read only specific columns from the CSV file (here, our original comma-separated employees.csv). This is helpful when the file has many columns but we need only a few of them.
import pandas
emp_df = pandas.read_csv('employees.csv', usecols=['Emp Name', 'Emp Role'])
print(emp_df)
Output:
      Emp Name Emp Role
0  Pankaj Kumar    Admin
1     David Lee   Editor
2      Lisa Ray   Author
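Besides a list of names, usecols also accepts a callable that is evaluated against each column name. The sketch below, using a hypothetical in-memory copy of the original comma-separated file, keeps every column except Emp ID. Note that usecols selects columns but does not reorder them; to get a specific order, index the result afterwards.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the original comma-separated employees.csv.
csv_text = """Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

# Callable usecols: a column is kept when the function returns True for its name.
emp_df = pd.read_csv(io.StringIO(csv_text), usecols=lambda col: col != 'Emp ID')
print(list(emp_df.columns))

# usecols keeps the file's column order, so reorder explicitly after reading.
wanted = ['Emp Role', 'Emp Name']
reordered = pd.read_csv(io.StringIO(csv_text), usecols=wanted)[wanted]
print(list(reordered.columns))
```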
A header row is not mandatory in a CSV file. If the file has no header row, we can still read it by passing header=None to the read_csv() function.
Let’s say our employees.csv file has the following content.
1,Pankaj Kumar,Admin
2,David Lee,Editor
Let’s see how to read this CSV file into a DataFrame object.
import pandas
emp_df = pandas.read_csv('employees.csv', header=None)
print(emp_df)
Output:
   0             1       2
0  1  Pankaj Kumar   Admin
1  2     David Lee  Editor
Notice that the columns are automatically labeled with integers starting from 0 (here 0, 1 and 2). We can pass these numbers in the usecols parameter to read specific columns; usecols interprets integers as column positions, which coincide with these labels.
import pandas
emp_df = pandas.read_csv('employees.csv', header=None, usecols=[1])
print(emp_df)
Output:
              1
0  Pankaj Kumar
1     David Lee
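When the file has no header row, the names parameter can supply meaningful column labels instead of the default 0, 1, 2. A minimal sketch, assuming the two-row header-less data above is held in a hypothetical in-memory string:

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the header-less employees.csv.
csv_text = """1,Pankaj Kumar,Admin
2,David Lee,Editor
"""

# header=None: treat the first line as data; names: labels for the columns.
emp_df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=['Emp ID', 'Emp Name', 'Emp Role'],
)
print(emp_df)
```

According to the pandas documentation, passing names without header already behaves like header=None, so header=None is spelled out here only for clarity.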
We can also specify which row holds the header. Any rows before the header row are discarded. Suppose the CSV file contains the following data.
# some random data
invalid data
Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
The header is on the 3rd line. Because header takes a zero-based row index, we pass header=2 to read the CSV data from the file.
import pandas
emp_df = pandas.read_csv('employees.csv', header=2)
print(emp_df)
Output:
   Emp ID      Emp Name Emp Role
0       1  Pankaj Kumar    Admin
1       2     David Lee   Editor
2       3      Lisa Ray   Author
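For this file, header=2 has the same effect as skipping the first two lines with skiprows=2 and letting read_csv() use the next line as the header. The sketch below checks that with DataFrame.equals(), assuming the data above is held in a hypothetical in-memory string.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the file with two junk lines before the header.
csv_text = """# some random data
invalid data
Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

# header=2: line index 2 (the 3rd line) is the header; lines 0 and 1 are dropped.
by_header = pd.read_csv(io.StringIO(csv_text), header=2)

# skiprows=2: drop the first two lines, then the next line is used as the header.
by_skiprows = pd.read_csv(io.StringIO(csv_text), skiprows=2)

print(by_header.equals(by_skiprows))
```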
We can pass the skiprows parameter to skip lines of the CSV file. The line numbers are zero-based and include the header line, so to skip the 3rd and 4th lines of our original comma-separated CSV file we pass skiprows=[2, 3].
import pandas
emp_df = pandas.read_csv('employees.csv', skiprows=[2, 3])
print(emp_df)
Output:
   Emp ID      Emp Name Emp Role
0       1  Pankaj Kumar    Admin
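skiprows also accepts a callable, which receives each zero-based line index (the header line is index 0) and returns True for lines that should be skipped. A sketch, assuming a hypothetical in-memory copy of the original comma-separated file:

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the original comma-separated employees.csv.
csv_text = """Emp ID,Emp Name,Emp Role
1,Pankaj Kumar,Admin
2,David Lee,Editor
3,Lisa Ray,Author
"""

# Equivalent to skiprows=[2, 3]: skip line indexes 2 and 3.
same_as_list = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i in (2, 3))
print(same_as_list['Emp Name'].tolist())

# Keep the header (index 0) and skip every even-numbered line after it.
every_other = pd.read_csv(io.StringIO(csv_text), skiprows=lambda i: i > 0 and i % 2 == 0)
print(every_other['Emp Name'].tolist())
```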
Now suppose our CSV file uses ## as the delimiter, i.e. a delimiter made of more than one character.
Emp ID##Emp Name##Emp Role
1##Pankaj Kumar##Admin
2##David Lee##Editor
3##Lisa Ray##Author
Let’s see what happens when we try to read this CSV file.
import pandas
emp_df = pandas.read_csv('employees.csv', sep='##')
print(emp_df)
Output:
/Users/pankaj/Documents/PycharmProjects/AskPython/hello-world/journaldev/pandas/pandas_read_csv.py:5: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
emp_df = pandas.read_csv('employees.csv', sep='##')
   Emp ID      Emp Name Emp Role
0       1  Pankaj Kumar    Admin
1       2     David Lee   Editor
2       3      Lisa Ray   Author
We can avoid the warning by passing engine='python' to the read_csv() function.
emp_df = pandas.read_csv('employees.csv', sep='##', engine='python')
read_csv() has two long-standing parser engines, c and python (pandas 1.4 also added an experimental pyarrow engine). The C engine is the default and is faster, while the Python engine is more feature-complete; for example, it supports the multi-character (regex) separators used above.
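To see the difference the engine parameter makes, the sketch below reads the ##-delimited data (held in a hypothetical in-memory string) with and without engine='python' and counts the ParserWarning instances raised. The helper read_and_count_warnings() is illustrative, not part of pandas.

```python
import io
import warnings

import pandas as pd
from pandas.errors import ParserWarning

# Hypothetical in-memory copy of the '##'-delimited employees.csv.
csv_text = """Emp ID##Emp Name##Emp Role
1##Pankaj Kumar##Admin
2##David Lee##Editor
3##Lisa Ray##Author
"""


def read_and_count_warnings(**kwargs):
    """Illustrative helper: read csv_text with sep='##' and count ParserWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning instead of printing it
        df = pd.read_csv(io.StringIO(csv_text), sep='##', **kwargs)
    count = sum(issubclass(w.category, ParserWarning) for w in caught)
    return df, count


# Default engine: pandas falls back to the python engine and emits a warning.
df_default, n_default = read_and_count_warnings()

# Explicit python engine: same DataFrame, no fallback warning.
df_python, n_python = read_and_count_warnings(engine='python')

print(n_default, n_python)
print(df_default.equals(df_python))
```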
Source: https://www.journaldev.com/33316/pandas-read-csv-reading-csv-file-to-dataframe