Loading XLS Data in Python


Python - Processing XLS Data

Microsoft Excel is a very widely used spreadsheet program. Its user-friendliness and appealing features make it a frequently used tool in data science. The pandas library provides functions for reading an Excel file in full or in part, restricted to a selected subset of the data. It can also read an Excel file that contains multiple sheets. We use the read_excel function to read data from such files.

Input as Excel File

We create an Excel file with multiple sheets using the Excel program on Windows and save it as input.xlsx. The data in the two sheets is shown below.


# Data in Sheet1

id,name,salary,start_date,dept
1,Rick,623.3,2012-01-01,IT
2,Dan,515.2,2013-09-23,Operations
3,Tusar,611,2014-11-15,IT
4,Ryan,729,2014-05-11,HR
5,Gary,843.25,2015-03-27,Finance
6,Rasmi,578,2013-05-21,IT
7,Pranab,632.8,2013-07-30,Operations
8,Guru,722.5,2014-06-17,Finance

# Data in Sheet2

id	name	zipcode
1	Rick	301224
2	Dan	341255
3	Tusar	297704
4	Ryan	216650
5	Gary	438700
6	Rasmi	665100
7	Pranab	341211
8	Guru	347480
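
For readers without access to Excel, the same workbook can probably be produced from Python instead. The following is a minimal sketch, assuming pandas is installed together with an Excel writer engine such as openpyxl; the variable names and the choice to write into the current directory are illustrative assumptions, not part of the original tutorial.

```python
import pandas as pd

# Sheet1: employee records, copied from the listing above.
sheet1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "name": ["Rick", "Dan", "Tusar", "Ryan", "Gary", "Rasmi", "Pranab", "Guru"],
    "salary": [623.3, 515.2, 611, 729, 843.25, 578, 632.8, 722.5],
    "start_date": pd.to_datetime([
        "2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
        "2015-03-27", "2013-05-21", "2013-07-30", "2014-06-17",
    ]),
    "dept": ["IT", "Operations", "IT", "HR",
             "Finance", "IT", "Operations", "Finance"],
})

# Sheet2: zip codes for the same employees.
sheet2 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "name": ["Rick", "Dan", "Tusar", "Ryan", "Gary", "Rasmi", "Pranab", "Guru"],
    "zipcode": [301224, 341255, 297704, 216650, 438700, 665100, 341211, 347480],
})

# Assumption: the file is written to the current working directory.
output_path = "input.xlsx"

# ExcelWriter lets us place several DataFrames into one workbook.
# index=False keeps the pandas row index out of the sheets.
with pd.ExcelWriter(output_path) as writer:
    sheet1.to_excel(writer, sheet_name="Sheet1", index=False)
    sheet2.to_excel(writer, sheet_name="Sheet2", index=False)
```

The resulting file should have the same two sheets as the hand-made workbook, so the examples in the following sections can be run against it.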

Reading an Excel File

The read_excel function of the pandas library reads the content of an Excel file into the Python environment as a pandas DataFrame. The function takes the path to the file on disk. By default, it reads the first sheet of the workbook, which is Sheet1 in our file.


import pandas as pd
# Replace 'path/input.xlsx' with the actual location of the file.
data = pd.read_excel('path/input.xlsx')
print(data)

When we execute the above code, it produces the following result. Note the additional leftmost column of integers starting at zero: this is the default index that pandas assigns to the DataFrame rows, not a column from the Excel file.


   id    name  salary  start_date        dept
0   1    Rick  623.30  2012-01-01          IT
1   2     Dan  515.20  2013-09-23  Operations
2   3   Tusar  611.00  2014-11-15          IT
3   4    Ryan  729.00  2014-05-11          HR
4   5    Gary  843.25  2015-03-27     Finance
5   6   Rasmi  578.00  2013-05-21          IT
6   7  Pranab  632.80  2013-07-30  Operations
7   8    Guru  722.50  2014-06-17     Finance
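
The default behaviour described above (first sheet, automatic integer index) can be changed through the sheet_name and index_col parameters of read_excel. The sketch below is only illustrative: it builds a small two-row workbook in a temporary directory so that it runs on its own, and the variable names are assumptions made for this example. It also assumes an Excel engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook so the example is self-contained;
# the rows are a subset of the tutorial data.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "input.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame(
        {"id": [1, 2], "name": ["Rick", "Dan"], "salary": [623.3, 515.2]}
    ).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame(
        {"id": [1, 2], "name": ["Rick", "Dan"], "zipcode": [301224, 341255]}
    ).to_excel(writer, sheet_name="Sheet2", index=False)

# Default: sheet_name=0, i.e. the first sheet in the workbook.
first = pd.read_excel(path)

# Select a sheet by name instead of by position.
second = pd.read_excel(path, sheet_name="Sheet2")

# Use the first column (id) as the row index instead of 0, 1, 2, ...
by_id = pd.read_excel(path, index_col=0)

print(first)
print(second)
print(by_id)
```

Passing index_col is mostly a matter of taste here; it is useful when the sheet already contains a natural key such as id.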


Reading Specific Columns and Rows

As with the CSV file in the previous chapter, we can restrict the data read by read_excel to specific columns and rows. For this purpose we use the label-based multi-axis indexer .loc[] on the resulting DataFrame. Here we display the salary and name columns for some of the rows.


import pandas as pd
data = pd.read_excel('path/input.xlsx')

# Use the multi-axis indexer .loc to select rows 1, 3 and 5 and two columns
print(data.loc[[1, 3, 5], ['salary', 'name']])

When we execute the above code, it produces the following result.


   salary   name
1   515.2    Dan
3   729.0   Ryan
5   578.0  Rasmi
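
The .loc approach selects from a DataFrame that has already been loaded in full. As an alternative, read_excel itself accepts the usecols and nrows parameters, which limit what is parsed in the first place. The following is a hedged sketch on a small temporary workbook; the file, its contents and the variable names are assumptions made for this illustration, and an Excel engine such as openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Build a one-sheet workbook so the example is self-contained;
# the rows are a subset of the tutorial data.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "input.xlsx")
pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Rick", "Dan", "Tusar", "Ryan", "Gary"],
    "salary": [623.3, 515.2, 611.0, 729.0, 843.25],
    "dept": ["IT", "Operations", "IT", "HR", "Finance"],
}).to_excel(path, sheet_name="Sheet1", index=False)

# usecols keeps only the named columns; nrows stops after three data rows.
first_rows = pd.read_excel(path, usecols=["name", "salary"], nrows=3)
print(first_rows)
```

For a small file the difference is negligible, but on large sheets limiting the columns and rows at read time may save memory compared with loading everything and slicing afterwards.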

Reading Multiple Excel Sheets

Multiple sheets with different data layouts can also be read with read_excel, with the help of a wrapper class named ExcelFile. With ExcelFile, the file is read into memory only once, and each sheet is then loaded from the same open workbook. In the example below we read Sheet1 and Sheet2 into two DataFrames and print the first five values of one column from each.


import pandas as pd
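with pd.ExcelFile('C:/Users/Rasmi/Documents/pydatasci/input.xlsx') as xls:
    # Both sheets are read from the same open workbook.
    df1 = pd.read_excel(xls, sheet_name='Sheet1')
    df2 = pd.read_excel(xls, sheet_name='Sheet2')

# Print the first five values of one column from each sheet.
print("****Result Sheet 1****")
print(df1['salary'].head(5))
print("")
print("***Result Sheet 2****")
print(df2['zipcode'].head(5))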

When we execute the above code, it produces the following result.


****Result Sheet 1****
0    623.30
1    515.20
2    611.00
3    729.00
4    843.25
Name: salary, dtype: float64

***Result Sheet 2****
0    301224
1    341255
2    297704
3    216650
4    438700
Name: zipcode, dtype: int64
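
When the sheet names are not known in advance, the ExcelFile object can be asked for them through its sheet_names attribute, and each sheet can be loaded with its parse method. The sketch below shows one possible way to collect every sheet into a dictionary; the temporary workbook and the variable names are assumptions made for this example, and an Excel engine such as openpyxl is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook so the example is self-contained;
# the rows are a subset of the tutorial data.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "input.xlsx")
with pd.ExcelWriter(path) as writer:
    pd.DataFrame(
        {"id": [1, 2], "name": ["Rick", "Dan"], "salary": [623.3, 515.2]}
    ).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame(
        {"id": [1, 2], "name": ["Rick", "Dan"], "zipcode": [301224, 341255]}
    ).to_excel(writer, sheet_name="Sheet2", index=False)

# Open the workbook once, list its sheets, and load each into a DataFrame.
with pd.ExcelFile(path) as xls:
    print(xls.sheet_names)
    frames = {name: xls.parse(name) for name in xls.sheet_names}

for name, frame in frames.items():
    print(name, frame.shape)
```

A similar dictionary can be obtained in one call with pd.read_excel(path, sheet_name=None); the ExcelFile form is shown here because it matches the example above.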


Translated from: https://www.tutorialspoint.com/python_data_science/python_processing_xls_data.htm
