Loading XLS Data in Python
Microsoft Excel is a widely used spreadsheet program. Its ease of use and rich feature set make it a common tool in data science. The pandas library can read an Excel file in full or in part, selecting only a chosen subset of the data, and it can also read files that contain multiple sheets. We use the read_excel function to read data from an Excel file.
We create an Excel file with multiple sheets on Windows. The data in each sheet is shown below.
You can create this file with the Excel program on Windows. Save the file as input.xlsx.
# Data in Sheet1
id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Tusar,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Rasmi,578,2013-05-21,IT
7,Pranab,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance
# Data in Sheet2
id name zipcode
1 Rick 301224
2 Dan 341255
3 Tusar 297704
4 Ryan 216650
5 Gary 438700
6 Rasmi 665100
7 Pranab 341211
8 Guru 347480
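If Excel is not available, the same workbook can be produced from Python. The following is a minimal sketch, not part of the original tutorial: it assumes an Excel writer engine such as openpyxl is installed, and it writes to a temporary directory (the variable name workbook_path is only illustrative) rather than to a fixed location.

```python
import os
import tempfile

import pandas as pd

# Sheet1: the employee records listed above.
sheet1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "name": ["Rick", "Dan", "Tusar", "Ryan", "Gary", "Rasmi", "Pranab", "Guru"],
    "salary": [623.3, 515.2, 611.0, 729.0, 843.25, 578.0, 632.8, 722.5],
    "start_date": pd.to_datetime([
        "2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
        "2015-03-27", "2013-05-21", "2013-07-30", "2014-06-17",
    ]),
    "dept": ["IT", "Operations", "IT", "HR", "Finance", "IT", "Operations", "Finance"],
})

# Sheet2: the same ids and names with a zipcode column.
sheet2 = sheet1[["id", "name"]].copy()
sheet2["zipcode"] = [301224, 341255, 297704, 216650, 438700, 665100, 341211, 347480]

# Assumption: a temporary directory stands in for wherever you keep the file.
workbook_path = os.path.join(tempfile.mkdtemp(), "input.xlsx")

# index=False keeps the DataFrame's row index out of the sheets.
with pd.ExcelWriter(workbook_path) as writer:
    sheet1.to_excel(writer, sheet_name="Sheet1", index=False)
    sheet2.to_excel(writer, sheet_name="Sheet2", index=False)

print(workbook_path)
```

Point the later examples at the printed path, or copy the file to the location they expect.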
The read_excel function of the pandas library reads the content of an Excel file into Python as a pandas DataFrame. The function locates the file through the path you pass to it. By default, it reads the first sheet in the workbook (here, Sheet1).
import pandas as pd

# Replace 'path/input.xlsx' with the actual location of the workbook.
data = pd.read_excel('path/input.xlsx')
print(data)
When we execute the above code, it produces the following result. Note the additional leftmost column starting at zero: it is the DataFrame's row index, which pandas generates automatically, not a column of the sheet.
   id    name  salary  start_date        dept
0   1    Rick  623.30  2012-01-01          IT
1   2     Dan  515.20  2013-09-23  Operations
2   3   Tusar  611.00  2014-11-15          IT
3   4    Ryan  729.00  2014-05-11          HR
4   5    Gary  843.25  2015-03-27     Finance
5   6   Rasmi  578.00  2013-05-21          IT
6   7  Pranab  632.80  2013-07-30  Operations
7   8    Guru  722.50  2014-06-17     Finance
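To make the role of that generated index concrete, here is a small sketch that builds a few rows of Sheet1 in memory (an assumption made so the snippet runs without the workbook) and then uses the id column as the row labels instead. The read_excel function also has an index_col parameter that does the same thing at read time, for example index_col=0 for the first column.

```python
import pandas as pd

# A few rows of Sheet1, built in memory so the sketch runs without the file.
data = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Rick", "Dan", "Tusar"],
    "salary": [623.3, 515.2, 611.0],
})

# The generated index (0, 1, 2) is separate from the sheet's columns.
print(list(data.columns))  # ['id', 'name', 'salary']
print(list(data.index))    # [0, 1, 2]

# Use the "id" column as the row labels instead of the generated index.
by_id = data.set_index("id")
print(by_id.loc[2, "name"])  # Dan
```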
As with the CSV file in the previous chapter, once read_excel has loaded a sheet into a DataFrame we can select specific rows and columns from it. We use the label-based, multi-axis indexer .loc[] for this purpose. Here we display the salary and name columns for rows 1, 3, and 5.
import pandas as pd

data = pd.read_excel('path/input.xlsx')

# Use the multi-axis indexer: row labels first, then column labels
print(data.loc[[1, 3, 5], ['salary', 'name']])
When we execute the above code, it produces the following result.
   salary   name
1   515.2    Dan
3   729.0   Ryan
5   578.0  Rasmi
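Because .loc[] selects by label, it matches positional selection only while the DataFrame still has its default zero-based index. The sketch below illustrates the difference using the position-based indexer .iloc[]; the rows are built in memory (an assumption, so the snippet runs without the workbook) and the variable names are only illustrative.

```python
import pandas as pd

# The first six rows of Sheet1, built in memory.
data = pd.DataFrame({
    "name": ["Rick", "Dan", "Tusar", "Ryan", "Gary", "Rasmi"],
    "salary": [623.3, 515.2, 611.0, 729.0, 843.25, 578.0],
    "dept": ["IT", "Operations", "IT", "HR", "Finance", "IT"],
})

# Label-based and position-based selection of the same cells.
by_label = data.loc[[1, 3, 5], ["salary", "name"]]
by_position = data.iloc[[1, 3, 5], [1, 0]]

# With the default index, labels and positions coincide.
print(by_label.equals(by_position))  # True

# After filtering, the surviving rows keep their original labels (0, 2, 5).
it_only = data[data["dept"] == "IT"]
print(it_only.loc[5, "name"])    # label 5    -> Rasmi
print(it_only.iloc[1]["name"])   # position 1 -> Tusar
```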
Multiple sheets with different data layouts can also be read by combining read_excel with a wrapper class named ExcelFile. The ExcelFile object loads the workbook into memory only once, so each sheet can be read from it without reopening the file. In the example below we read Sheet1 and Sheet2 into two DataFrames and print the first five values of one column from each.
import pandas as pd

# Open the workbook once and read both sheets from the same ExcelFile object.
with pd.ExcelFile('C:/Users/Rasmi/Documents/pydatasci/input.xlsx') as xls:
    df1 = pd.read_excel(xls, sheet_name='Sheet1')
    df2 = pd.read_excel(xls, sheet_name='Sheet2')

print("****Result Sheet 1****")
print(df1[0:5]['salary'])
print("")
print("***Result Sheet 2****")
print(df2[0:5]['zipcode'])
When we execute the above code, it produces the following result.
****Result Sheet 1****
0    623.30
1    515.20
2    611.00
3    729.00
4    843.25
Name: salary, dtype: float64

***Result Sheet 2****
0    301224
1    341255
2    297704
3    216650
4    438700
Name: zipcode, dtype: int64
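When every sheet is needed, read_excel can also return all of them in one call: passing sheet_name=None yields a dictionary that maps each sheet name to its DataFrame. The sketch below is an illustration rather than part of the original tutorial. It assumes the openpyxl engine is installed and builds a small two-sheet workbook in memory so that it runs without input.xlsx.

```python
import io

import pandas as pd

# Build a small two-sheet workbook in memory (a stand-in for input.xlsx).
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"id": [1, 2], "salary": [623.3, 515.2]}).to_excel(
        writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"id": [1, 2], "zipcode": [301224, 341255]}).to_excel(
        writer, sheet_name="Sheet2", index=False)
buffer.seek(0)  # rewind so the reader starts at the beginning

# sheet_name=None returns {sheet name: DataFrame} for every sheet.
sheets = pd.read_excel(buffer, sheet_name=None)
for sheet_name, frame in sheets.items():
    print(sheet_name, frame.shape)
```

With a file on disk, the same call takes the path in place of the buffer.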
Translated from: https://www.tutorialspoint.com/python_data_science/python_processing_xls_data.htm