The read_excel() method can read Excel 2003 (.xls) and Excel 2007+ (.xlsx) files using the xlrd Python module. The to_excel() instance method is used for saving a DataFrame to Excel. Generally the semantics are similar to working with CSV data. See the cookbook for some advanced strategies.
In the most basic use case, read_excel takes a path to an Excel file and the sheet_name indicating which sheet to parse.
import pandas as pd

# Returns a DataFrame
pd.read_excel('path_to_file.xls', sheet_name='Sheet1')
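As a rough illustration of how to_excel() and read_excel() fit together, the sketch below writes a small DataFrame to a temporary file and reads it back. The sample data, the file name example.xlsx, and the temporary directory are made up for this example, and it assumes a recent pandas with an engine for .xlsx files (such as openpyxl) installed.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Write to a temporary .xlsx file (assumes an .xlsx engine such as
# openpyxl is available), then read the same sheet back.
path = os.path.join(tempfile.mkdtemp(), "example.xlsx")
df.to_excel(path, sheet_name="Sheet1", index=False)

round_tripped = pd.read_excel(path, sheet_name="Sheet1")
print(round_tripped)
```

Passing index=False keeps the row index out of the file, so the frame read back has the same two columns as the one that was written.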
ExcelFile class
To facilitate working with multiple sheets from the same file, the ExcelFile class can be used to wrap the file and can be passed into read_excel. Reading multiple sheets this way is faster because the file is read into memory only once.
xlsx = pd.ExcelFile('path_to_file.xls')
df = pd.read_excel(xlsx, 'Sheet1')
The ExcelFile class can also be used as a context manager.
with pd.ExcelFile('path_to_file.xls') as xls:
    df1 = pd.read_excel(xls, 'Sheet1')
    df2 = pd.read_excel(xls, 'Sheet2')
The sheet_names property returns a list of the sheet names in the file.
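A minimal sketch of inspecting sheet_names follows. The two-sheet workbook is built on the fly purely for illustration (its file name and contents are hypothetical), and the example assumes an .xlsx engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "two_sheets.xlsx")

# Build a hypothetical two-sheet workbook to inspect.
with pd.ExcelWriter(path) as writer:
    pd.DataFrame({"a": [1]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [2]}).to_excel(writer, sheet_name="Sheet2", index=False)

# sheet_names lists the sheets in workbook order.
with pd.ExcelFile(path) as xls:
    names = xls.sheet_names

print(names)
```

Checking sheet_names first can be handy when the sheet layout of a file is not known in advance.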
The primary use case for an ExcelFile is parsing multiple sheets with different parameters:
data = {}
# For when Sheet1's format differs from Sheet2
with pd.ExcelFile('path_to_file.xls') as xls:
    data['Sheet1'] = pd.read_excel(xls, 'Sheet1', index_col=None, na_values=['NA'])
    data['Sheet2'] = pd.read_excel(xls, 'Sheet2', index_col=1)
Note that if the same parsing parameters are used for all sheets, a list of sheet names can simply be passed to read_excel with no loss in performance.
# using the ExcelFile class
data = {}
with pd.ExcelFile('path_to_file.xls') as xls:
    data['Sheet1'] = pd.read_excel(xls, 'Sheet1', index_col=None, na_values=['NA'])
    data['Sheet2'] = pd.read_excel(xls, 'Sheet2', index_col=None, na_values=['NA'])
# equivalent using the read_excel function
data = pd.read_excel('path_to_file.xls', ['Sheet1', 'Sheet2'], index_col=None, na_values=['NA'])
Specifying Sheets
Note
The second argument is sheet_name, not to be confused with ExcelFile.sheet_names.
Note
An ExcelFile's attribute sheet_names provides access to a list of sheets.
The sheet_name argument allows specifying the sheet or sheets to read.
The default value for sheet_name is 0, indicating to read the first sheet.
Pass None to return a dictionary of all available sheets.
# Returns a DataFrame
pd.read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
Using the sheet index:
# Returns a DataFrame
pd.read_excel('path_to_file.xls', 0, index_col=None, na_values=['NA'])
Using all default values:
# Returns a DataFrame
pd.read_excel('path_to_file.xls')
Using None to get all sheets:
# Returns a dictionary of DataFrames
pd.read_excel('path_to_file.xls', sheet_name=None)
Using a list to get multiple sheets:
# Returns the 1st and 4th sheet, as a dictionary of DataFrames.
pd.read_excel('path_to_file.xls', sheet_name=['Sheet1', 3])
read_excel can read more than one sheet by setting sheet_name to either a list of sheet names, a list of sheet positions, or None to read all sheets. Sheets can be specified by sheet index or sheet name, using an integer or string, respectively.
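To make the dictionary return value more concrete, here is a small sketch that reads several sheets at once and looks at the resulting keys. The four-sheet workbook, its file name, and its contents are hypothetical and exist only for this example; it assumes an .xlsx engine such as openpyxl is installed.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "four_sheets.xlsx")

# Build a hypothetical workbook with sheets named Sheet1 through Sheet4.
with pd.ExcelWriter(path) as writer:
    for i in range(4):
        pd.DataFrame({"x": [i]}).to_excel(
            writer, sheet_name=f"Sheet{i + 1}", index=False
        )

# A list may mix names and zero-based positions; the dictionary is
# keyed by the values that were passed in.
some = pd.read_excel(path, sheet_name=["Sheet1", 3])

# None reads every sheet; the dictionary is keyed by sheet name.
all_sheets = pd.read_excel(path, sheet_name=None)

print(list(some))
print(list(all_sheets))
```

Because the keys mirror what was requested, a sheet asked for by position is looked up by that integer (some[3] here) rather than by its name.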