aws python库
Many Python developers in the financial world are tasked with creating Excel documents for analysis by non-technical users.
金融界的许多Python开发人员的任务是创建Excel文档,以供非技术用户进行分析。
This is actually a lot harder than it sounds. From sourcing the data to formatting the spreadsheet to deploying the final doc in a central location, there are plenty of steps involved in the process.
这实际上比听起来要难得多。 从获取数据到格式化电子表格,再到在中心位置部署最终文档,该过程涉及很多步骤。
In this tutorial, I'm going to show you how to create Excel spreadsheets using Python that:
在本教程中,我将向您展示如何使用Python创建Excel电子表格:
Automatically update daily using the cron
command line utility
使用cron
命令行实用程序每天自动更新
IEX Cloud is the data provider subsidiary of the IEX stock exchange.
IEX Cloud是IEX股票交易所的数据提供商子公司。
In case you're unfamiliar with IEX, it is an acronym for "The Investor's Exchange". IEX was founded by Brad Katsuyama to build a better stock exchange that avoids investor-unfriendly behavior like front-running and high-frequency trading. Katsuyama's exploits were famously chronicled in Michael Lewis' best-selling book Flash Boys.
如果您不熟悉IEX,则它是“投资者交易所”的缩写。 IEX由布拉德·胜山(Brad Katsuyama)创立,旨在建立一个更好的证券交易所,避免投资者不友好的行为,例如前期交易和高频交易。 胜山的功绩在迈克尔·刘易斯(Michael Lewis)的畅销书《 Flash Boys 》中被记载。
I have investigated many financial data providers and IEX Cloud has the best combination of:
我调查了许多金融数据提供商,并且IEX Cloud具有以下最佳组合:
Their prices are below:
其价格如下:
The $9/month Launch plan is plenty for many use cases.
每月9美元的启动计划可满足许多用例。
A warning on using IEX Cloud (and any other pay-per-use data provider): it is very important that you set usage budgets from the beginning. These budgets lock you out of your account once you hit a certain dollar cost for the month.
关于使用IEX Cloud(和任何其他按使用量付费数据提供者)的警告:从一开始就设置使用预算非常重要。 一旦您当月达到一定的美元费用,这些预算就会使您无法进入帐户。
When I first started using IEX Cloud, I accidentally created an infinite loop on a Friday afternoon that contained an API call to IEX Cloud. These API calls are priced on a cost-per-call basis...which resulted in a terrifying email from IEX:
刚开始使用IEX Cloud时,我意外地在星期五下午创建了一个无限循环,其中包含对IEX Cloud的API调用。 这些API调用是按每次调用费用定价的,这导致来自IEX的电子邮件令人恐惧:
It is a testament to IEX's customer-centricity that they agreed to reset my usage as long as I set usage budgets moving forward. Go IEX!
IEX以客户为中心的证明是,只要我设定未来的使用预算,他们就同意重置我的使用情况。 去IEX!
As with most API subscriptions, the main benefit of creating an IEX Cloud account is having an API key.
与大多数API订阅一样,创建IEX Cloud帐户的主要好处是拥有API密钥。
For obvious reasons, I will not share an API key in this article.
出于明显的原因,我不会在本文中共享API密钥。
However, you can still work through this tutorial with your own API key as long as you assign it to the following variable name:
但是,只要将其分配给以下变量名,您仍然可以使用自己的API密钥来完成本教程:
IEX_API_Key
You will see the blank IEX_API_Key
variable in my code blocks throughout the rest of this tutorial.
在本教程的其余部分中,您将在我的代码块中看到空白的IEX_API_Key
变量。
Now that you have access to the API key that you'll need to gather financial data, it's time to write your Python script.
现在您已经可以访问收集财务数据所需的API密钥,现在该编写Python脚本了。
This will be the longest section of this tutorial. It is also the most flexible - we are going to create a Python script that satisfies certain pre-specified criteria, but you could modify this section to really create any spreadsheet you want!
这将是本教程中最长的部分。 它也是最灵活的-我们将创建一个满足某些预先指定条件的Python脚本,但是您可以修改此部分以真正创建所需的任何电子表格!
To start, let's lay out our goal posts. We are going to write a Python script that generates an Excel file of stock market data with the following characteristics:
首先,让我们布置目标职位。 我们将编写一个Python脚本,该脚本生成具有以下特征的股市数据的Excel文件:
It will be formatted such that the header's background color is #135485
and text is white, while the spreadsheet body's background is #DADADA
and the font color is black (the default).
格式将设置为:标题的背景颜色为#135485
,文本为白色,而电子表格正文的背景为#DADADA
,字体颜色为黑色(默认值)。
Let's start by importing our first package.
让我们从导入第一个包开始。
Since spreadsheets are essentially just data structures with rows and columns, then the pandas
library - including its built-in DataFrame
object - is a perfect candidate for manipulating data in this tutorial.
由于电子表格本质上只是具有行和列的数据结构,因此pandas
库(包括其内置的DataFrame
对象)是本教程中处理数据的理想选择。
We'll start by importing pandas
under the alias pd
like this:
我们将以这样的别名pd
导入pandas
:
import pandas as pd
Next, we'll specify our IEX Cloud API key. As I mentioned before, I'm not going to really include my API key, so you'll have to grab your own API key from your IEX account and include it here:
接下来,我们将指定IEX Cloud API密钥。 如前所述,我不会真正包含我的API密钥,因此您必须从IEX帐户中获取自己的API密钥并将其包含在此处:
IEX_API_Key = ''
Our next step is to determine the ten largest companies in the United States.
我们的下一步是确定美国十大公司。
You can answer this question with a quick Google search.
您可以通过快速的Google搜索回答此问题。
For brevity, I have included the companies (or rather, their stock tickers) in the following Python list:
为简便起见,我将公司(或更确切地说,它们的股票行情自动收录器)包括在以下Python列表中:
Next, it is time to figure out how to ping the IEX Cloud API to pull in the metrics we need for each company.
接下来,是时候弄清楚如何对IEX Cloud API进行ping操作,以获取每个公司所需的指标了。
The IEX Cloud API returns JSON objects in response to HTTP requests. Since we are working with more than 1 ticker in this tutorial, we will use IEX Cloud's batch API call functionality, which allows you to request data on more than one ticker at a time. Using batch API calls has two benefits:
IEX Cloud API返回JSON对象以响应HTTP请求。 由于我们在本教程中使用多个代码,因此我们将使用IEX Cloud的批处理API调用功能,该功能可让您一次在多个代码上请求数据。 使用批处理API调用有两个好处:
Here is an example of what the HTTP request might look like, with a few placeholder words where we'll need to customize the request:
这是HTTP请求的外观示例,其中有一些占位符,我们需要自定义该请求:
https://cloud.iexapis.com/stable/stock/market/batch?symbols=TICKERS&types=ENDPOINTS&range=RANGE&token=IEX_API_Key
In this URL, we'll replace these variables with the following values:
在此URL中,我们将这些变量替换为以下值:
TICKERS
will be replaced by a string that contains each of our tickers separated by a comma.
TICKERS
将替换为包含以逗号分隔的每个TICKERS
代码的字符串。
ENDPOINTS
will be replaced by a string that contains each of the IEX Cloud endpoints we want to hit, separated by a comma.
ENDPOINTS
将替换为包含我们要命中的每个IEX Cloud端点的字符串,并用逗号分隔。
RANGE
will be replaced by 1y
. These endpoints each contain point-in-time data and not time series data, so this range can really be whatever you want.
RANGE
将替换为1y
。 这些端点每个都包含时间点数据而不是时间序列数据,因此此范围实际上可以是您想要的任何范围。
Let's put this URL into a variable called HTTP_request
for us to modify later:
让我们将此URL放入一个名为HTTP_request
的变量中,以便我们稍后进行修改:
HTTP_request = 'https://cloud.iexapis.com/stable/stock/market/batch?symbols=TICKERS&types=ENDPOINTS&range=RANGE&token=IEX_API_Key'
Let's work through each of these variables one-by-one to determine the exact URL that we need to hit.
让我们逐一研究这些变量,以确定需要命中的确切URL。
For the TICKERS
variable, we can generate a real Python variable (and not just a placeholder word) with a simple for
loop:
对于TICKERS
变量,我们可以使用简单的for
循环生成一个真实的Python变量(而不仅仅是占位符):
#Create an empty string called `ticker_string` that we'll add tickers and commas to
ticker_string = ''
#Loop through every element of `tickers` and add them and a comma to ticker_string
for ticker in tickers:
ticker_string += ticker
ticker_string += ','
#Drop the last comma from `ticker_string`
ticker_string = ticker_string[:-1]
Now we can interpolate our ticker_string
variable into the HTTP_request
variable that we created earlier using an f-string:
现在,我们可以将我们的ticker_string
变量插值到我们先前使用f字符串创建的HTTP_request
变量中:
HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types=ENDPOINTS&range=RANGE&token=IEX_API_Key'
Next, we need to determine which IEX Cloud endpoints we need to ping.
接下来,我们需要确定需要ping的IEX Cloud端点。
Some quick investigation into the IEX Cloud documentation reveals that we only need the price
and stats
endpoints to create our spreadsheet.
对IEX Cloud文档的一些快速调查显示,我们只需要price
和stats
端点即可创建电子表格。
Thus, we can replace the placeholder ENDPOINTS
word from our original HTTP request with the following variable:
因此,我们可以将原始HTTP请求中的占位符ENDPOINTS
单词替换为以下变量:
endpoints = 'price,stats'
Like we did with our ticker_string
variable, let's substitute the endpoints
variable into the ticker_string
variable:
就像我们对ticker_string
变量所做的一样,让我们将endpoints
变量替换为ticker_string
变量:
HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=RANGE&token=IEX_API_Key'
The last placeholder we need to replace is RANGE
. We will not replace with this a variable. Instead, we can hardcode a 1y
directly into the URL path like this:
我们需要替换的最后一个占位符是RANGE
。 我们不会用这个替换变量。 相反,我们可以将1y
直接硬编码到URL路径中,如下所示:
https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=1y&token=IEX_API_Key
We've done a lot so far, so let's recap our code base:
到目前为止,我们已经做了很多工作,所以让我们回顾一下代码库:
import pandas as pd
IEX_API_Key = ''
#Specify the stock tickers that will be included in our spreadsheet
tickers = [
'MSFT',
'AAPL',
'AMZN',
'GOOG',
'FB',
'BRK.B',
'JNJ',
'WMT',
'V',
'PG'
]
#Create an empty string called `ticker_string` that we'll add tickers and commas to
ticker_string = ''
#Loop through every element of `tickers` and add them and a comma to ticker_string
for ticker in tickers:
ticker_string += ticker
ticker_string += ','
#Drop the last comma from `ticker_string`
ticker_string = ticker_string[:-1]
#Create the endpoint strings
endpoints = 'price,stats'
#Interpolate the endpoint strings into the HTTP_request string
HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=1y&token={IEX_API_Key}'
It is now time to ping the API and save its data into a data structure within our Python application.
现在是时候ping API并将其数据保存到我们的Python应用程序中的数据结构中了。
We can read JSON objects with pandas' read_json
method. In our case, we'll save the JSON data to a pandas DataFrame
called raw_data
, like this:
我们可以使用pandas的read_json
方法读取JSON对象。 在本例中,我们将JSON数据保存到名为raw_data
的pandas DataFrame
,如下所示:
raw_data = pd.read_json(HTTP_request)
Let's take a moment now to make sure that the data has been imported in a nice format for our application.
现在让我们花一点时间来确保已为我们的应用程序以一种不错的格式导入了数据。
If you're working through this tutorial in a Jupyter Notebook, you can simply type the name of the pandas DataFrame
variable on the last line of a code cell, and Jupyter will nicely render an image of the data, like this:
如果您正在Jupyter Notebook中完成本教程,则只需在代码单元的最后一行键入pandas DataFrame
变量的名称,Jupyter就会很好地呈现数据图像,如下所示:
As you can see, the pandas DataFrame
contains a column for each stock ticker and two rows: one for the stats
endpoint and one for the price
endpoint. We will need to parse this DataFrame to get the four metrics we want. Let's work through the metrics one-by-one in the steps below.
如您所见, pandas DataFrame
的每个股票行情pandas DataFrame
包含一列,两行分别是: stats
端点和price
端点。 我们将需要解析此DataFrame以获得所需的四个指标。 让我们在以下步骤中一步一步地研究指标。
This step is very straightforward since the stock tickers are contained in the columns of the pandas DataFrame
. We can access them through the columns
attribute of the pandas DataFrame
like this:
由于股票行情收录器包含在pandas DataFrame
的列中,因此这一步骤非常简单。 我们可以通过pandas DataFrame
的columns
属性访问它们,如下所示:
raw_data.columns
To access the other metrics in raw_data
, we will create a for
loop that loops through each ticker in raw_data.columns
. In each iteration of the loop we will add the data to a new pandas DataFrame
object called output_data
.
要访问raw_data
的其他指标,我们将创建一个for
循环,循环遍历raw_data.columns
每个报价器。 在循环的每次迭代中,我们都将数据添加到名为output_data
的新pandas DataFrame
对象中。
First we'll need to create output_data
, which should be an empty pandas DataFrame
with four columns. Here's how to do this:
首先,我们需要创建output_data
,它应该是一个具有四列的空pandas DataFrame
。 这样做的方法如下:
output_data = pd.DataFrame(pd.np.empty((0,4)))
This creates an empty pandas DataFrame
with 0 rows and 4 columns.
这将创建一个空的pandas DataFrame
其中包含0行和4列。
Now that this object has been created, here's how we can structure this for
loop:
现在已经创建了该对象,这是我们如何构造此for
循环的方法:
for ticker in raw_data.columns:
#Parse the company's name - not completed yet
company_name = ''
#Parse the company's stock price - not completed yet
stock_price = 0
#Parse the company's dividend yield - not completed yet
dividend_yield = 0
new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
output_data = output_data.append(new_column, ignore_index = True)
Next, let's determine how to parse the company_name
variable from the raw_data
object.
接下来,让我们确定如何从raw_data
对象中解析company_name
变量。
The company_name
variable is the first variable will need to be parsed from the raw_data
object. As a quick recap, here's what raw_data
looks like:
company_name
变量是第一个需要从raw_data
对象解析的变量。 快速回顾一下, raw_data
如下所示:
The company_name
variable is held within the stats
endpoint under the dictionary key companyName
. To parse this data point out of raw_data
, we can use these indexes:
company_name
变量保存在stats
端点内的字典键companyName
。 要从raw_data
解析此数据点,我们可以使用以下索引:
raw_data[ticker]['stats']['companyName']
Including this in our for
loop from before gives this:
从之前将其包含在我们的for
循环中将得到以下结果:
output_data = pd.DataFrame(pd.np.empty((0,4)))
for ticker in raw_data.columns:
#Parse the company's name - not completed yet
company_name = raw_data[ticker]['stats']['companyName']
#Parse the company's stock price - not completed yet
stock_price = 0
#Parse the company's dividend yield - not completed yet
dividend_yield = 0
new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
output_data = output_data.append(new_column, ignore_index = True)
Let's move on to parsing stock_price
.
让我们继续分析stock_price
。
The stock_price
variable is contained within the price
endpoint, which returns only a single value. This means we do not need to chain together indexes like we did with company_name
.
stock_price
变量包含在price
端点内,该端点仅返回单个值。 这意味着我们不需要像对company_name
那样将索引链接在一起。
Here's how we could parse stock_price
from raw_data
:
这是我们从raw_data
解析stock_price
:
raw_data[ticker]['price']
Including this in our for
loop gives us:
在我们的for
循环中包含它for
使我们:
output_data = pd.DataFrame(pd.np.empty((0,4)))
for ticker in raw_data.columns:
#Parse the company's name - not completed yet
company_name = raw_data[ticker]['stats']['companyName']
#Parse the company's stock price - not completed yet
stock_price = raw_data[ticker]['price']
#Parse the company's dividend yield - not completed yet
dividend_yield = 0
new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
output_data = output_data.append(new_column, ignore_index = True)
The last metric we need to parse is dividend_yield
.
我们需要解析的最后一个指标是dividend_yield
。
Like company_name
, dividend_yield
is contained in the stats
endpoint. It is held under the dividendYield
dictionary key.
像company_name
一样, dividend_yield
端点也包含在stats
端点中。 它被保存在dividendYield
字典关键字下。
Here is how we could parse it out of raw_data
:
这是我们可以从raw_data
解析出来的方法:
raw_data[ticker]['stats']['dividendYield']
Adding this to our for
loop gives us:
将其添加到我们的for
循环中可以使我们:
output_data = pd.DataFrame(pd.np.empty((0,4)))
for ticker in raw_data.columns:
#Parse the company's name - not completed yet
company_name = raw_data[ticker]['stats']['companyName']
#Parse the company's stock price - not completed yet
stock_price = raw_data[ticker]['price']
#Parse the company's dividend yield - not completed yet
dividend_yield = raw_data[ticker]['stats']['dividendYield']
new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
output_data = output_data.append(new_column, ignore_index = True)
Let's print out our output_data
object to see what the data looks like:
让我们打印出output_data
对象,看看数据是什么样的:
So far so good! The next two steps are to name the columns of the pandas DataFrame
and to change its index.
到目前为止,一切都很好! 接下来的两个步骤是命名pandas DataFrame
的列并更改其索引。
We can update the column names of our output_data
object by creating a list of column names and assigning it to the output_data.columns
attribute, like this:
我们可以通过创建列名列表并将其分配给output_data.columns
属性来更新output_data
对象的列名,如下所示:
output_data.columns = ['Ticker', 'Company Name', 'Stock Price', 'Dividend Yield']
Let's print out our output_data
object to see what the data looks like:
让我们打印出output_data
对象,看看数据是什么样的:
Much better! Let's change the index of output_data
next.
好多了! 接下来让我们更改output_data
的索引。
The index of a pandas DataFrame
is a special column that is somewhat similar to the primary key of a SQL database table. In our output_data
object, we want to set the Ticker
column as the DataFrame
's index.
pandas DataFrame
的索引是一个特殊的列,与SQL数据库表的主键有些相似。 在我们的output_data
对象中,我们想要将Ticker
列设置为DataFrame
的索引。
Here's how we can do this using the set_index
method:
使用set_index
方法的方法如下:
output_data.set_index('Ticker', inplace=True)
Let's print out our output_data
object to see what the data looks like:
让我们打印出output_data
对象,看看数据是什么样的:
Another incremental improvement!
另一个增量的改进!
Next, let's deal with the missing data in output_data
.
接下来,让我们处理output_data
丢失的数据。
If you take a close look at output_data
, you will notice that there are several None
values in the Dividend Yield
column:
如果仔细查看output_data
,您会注意到“ Dividend Yield
列中有多个“ None
值:
These None
values simply indicate that the company for that row does not currently pay a dividend. While None
is one way of representing a non-dividend stock, it is more common to show a Dividend Yield
of 0
.
这些“ None
值仅表示该行的公司当前不支付股息。 虽然None
是表示非股息股票的一种方式,但更常见的是将Dividend Yield
显示为0
。
Fortunately, the fix for this is quite straightforward. The pandas
library includes an excellent fillna
method that allows us to replace missing values in a pandas DataFrame
.
幸运的是,解决方法非常简单。 pandas
库包含出色的fillna
方法,该方法使我们可以替换pandas DataFrame
缺失值。
Here's how we can use the fillna
method to replace our Dividend Yield
column's None
values with 0
:
这是我们可以使用fillna
方法将“ Dividend Yield
列的None
值替换为0
:
output_data['Dividend Yield'].fillna(0,inplace=True)
The output_data
object looks much cleaner now:
现在, output_data
对象看起来更加干净:
We are now ready to export our DataFrame to an Excel document! As a quick recap, here is our Python script to date:
现在,我们准备将DataFrame导出到Excel文档中! 快速回顾一下,这是迄今为止的Python脚本:
import pandas as pd
IEX_API_Key = ''
#Specify the stock tickers that will be included in our spreadsheet
tickers = [
'MSFT',
'AAPL',
'AMZN',
'GOOG',
'FB',
'BRK.B',
'JNJ',
'WMT',
'V',
'PG'
]
#Create an empty string called `ticker_string` that we'll add tickers and commas to
ticker_string = ''
#Loop through every element of `tickers` and add them and a comma to ticker_string
for ticker in tickers:
ticker_string += ticker
ticker_string += ','
#Drop the last comma from `ticker_string`
ticker_string = ticker_string[:-1]
#Create the endpoint strings
endpoints = 'price,stats'
#Interpolate the endpoint strings into the HTTP_request string
HTTP_request = f'https://cloud.iexapis.com/stable/stock/market/batch?symbols={ticker_string}&types={endpoints}&range=1y&token={IEX_API_Key}'
#Create an empty pandas DataFrame to append our parsed values into during our for loop
output_data = pd.DataFrame(pd.np.empty((0,4)))
for ticker in raw_data.columns:
#Parse the company's name
company_name = raw_data[ticker]['stats']['companyName']
#Parse the company's stock price
stock_price = raw_data[ticker]['price']
#Parse the company's dividend yield
dividend_yield = raw_data[ticker]['stats']['dividendYield']
new_column = pd.Series([ticker, company_name, stock_price, dividend_yield])
output_data = output_data.append(new_column, ignore_index = True)
#Change the column names of output_data
output_data.columns = ['Ticker', 'Company Name', 'Stock Price', 'Dividend Yield']
#Change the index of output_data
output_data.set_index('Ticker', inplace=True)
#Replace the missing values of the 'Dividend Yield' column with 0
output_data['Dividend Yield'].fillna(0,inplace=True)
#Print the DataFrame
output_data
There are multiple ways to export an xlsx
file from a pandas DataFrame
.
有多种方法可以从pandas DataFrame
导出xlsx
文件。
The easiest way is to use the built-in function to_excel
. As an example, here's how we could export output_data
to an Excel file:
最简单的方法是使用内置函数to_excel
。 例如,这是我们将output_data
导出到Excel文件的方法:
output_data.to_excel('my_excel_document.xlsx)
The problem with this approach is that the Excel file has no format whatsoever. The output looks like this:
这种方法的问题在于Excel文件没有任何格式。 输出如下所示:
The lack of formatting in this document makes it hard to interpret.
本文档缺乏格式,使其难以解释。
What is the solution?
解决办法是什么?
We can use the Python package XlsxWriter
to generate nicely-formatted Excel files. To start, we'll want to add the following import to the beginning of our Python script:
我们可以使用Python包XlsxWriter
生成格式正确的Excel文件。 首先,我们要在Python脚本的开头添加以下导入:
import xlsxwriter
Next, we need to create our actual Excel file. The XlsxWriter package actually has a dedicated documentation page for how to work with pandas DataFrames
, which is available here.
接下来,我们需要创建实际的Excel文件。 XlsxWriter软件包实际上有一个专门的文档页面,介绍如何使用pandas DataFrames
,可在此处找到 。
Our first step is to call the pd.ExcelWriter
function and pass in the desired name of our xlsx
file as the first argument and engine='xlsxwriter
as the second argument. We will assign this to a variable called writer
:
我们的第一步是调用pd.ExcelWriter
函数,并将我们的xlsx
文件的所需名称作为第一个参数传递,并将engine='xlsxwriter
作为第二个参数engine='xlsxwriter
。 我们将其分配给一个称为writer
的变量:
writer = pd.ExcelWriter('stock_market_data.xlsx', engine='xlsxwriter')
From there, we need to call the to_excel
method on our pandas DataFrame
. This time, instead of passing in the name of the file that we're trying to export, we'll pass in the writer
object that we just created:
从那里,我们需要在pandas DataFrame
上调用to_excel
方法。 这次,我们将传递刚刚创建的writer
对象,而不是传递我们试图导出的文件名:
output_data.to_excel(writer, sheet_name='Sheet1')
Lastly, we will call the save
method on our writer
object, which saves the xlsx
file to our current working directory. When all this is done, here is the section of our Python script that saves output_data
to an Excel file.
最后,我们将在writer
对象上调用save
方法,该方法将xlsx
文件保存到当前工作目录中。 完成所有这些操作后,这就是我们的Python脚本部分,该部分将output_data
保存到Excel文件。
writer = pd.ExcelWriter('stock_market_data.xlsx', engine='xlsxwriter')
output_data.to_excel(writer, sheet_name='Sheet1')
writer.save()
All of the formatting code that we will include in our xlsx
file needs to be contained between the creation of the ExcelWriter
object and the writer.save()
statement.
在创建ExcelWriter
对象和writer.save()
语句之间,必须包含将包含在xlsx
文件中的所有格式代码。
xlsx
文件设置样式 (How to Style an xlsx
File Created with Python)It is actually harder than you might think to style an Excel file using Python.
实际上,这比使用Python设置Excel文件样式的难度要大。
This is partially because of some of the limitations of the XlsxWriter package. Its documentation states:
部分原因是XlsxWriter软件包的某些限制。 其文档指出:
'XlsxWriter and Pandas provide very little support for formatting the output data from a dataframe apart from default formatting such as the header and index cells and any cells that contain dates or datetimes. In addition it isn’t possible to format any cells that already have a default format applied.
“ XlsxWriter和Pandas除了默认格式(如标头和索引单元格以及任何包含日期或日期时间的单元格)外,几乎没有支持格式化数据帧中的输出数据。 此外,无法格式化已应用默认格式的任何单元格。
If you require very controlled formatting of the dataframe output then you would probably be better off using Xlsxwriter directly with raw data taken from Pandas. However, some formatting options are available.'
如果您需要对数据帧输出进行严格控制的格式设置,那么最好直接使用Xlsxwriter处理来自Pandas的原始数据。 但是,有些格式选项可用。
In my experience, the most flexible way to style cells in an xlsx
file created by XlsxWriter is to use conditional formatting that only applies styling when a cell is not equal to None
.
以我的经验,在XlsxWriter创建的xlsx
文件中设置单元格样式的最灵活的方法是使用条件格式,该条件格式仅在单元格不等于None
时才应用样式设置。
This has three advantages:
这具有三个优点:
You do not need to manually loop through each data point and import them into the writer
object one-by-one.
您无需手动遍历每个数据点,并将它们一个接一个地导入到writer
对象中。
It allows you to easily see when None
values have made their way into your finalized xlsx
files, since they'll be missing the required formatting.
它使您可以轻松地查看None
值何时进入了最终的xlsx
文件,因为它们将缺少所需的格式。
To apply styling using conditional formatting, we first need to create a few style templates. Specifically, we will need four templates:
要使用条件格式应用样式,我们首先需要创建一些样式模板。 具体来说,我们将需要四个模板:
One header_template
that will be applied to the column names at the top of the spreadsheet
一个header_template
将应用于电子表格顶部的列名称
One string_template
that will be applied to the Ticker
and Company Name
columns
一个将应用于“ Ticker
和“ Company Name
列的string_template
One dollar_template
that will be applied to the Stock Price
column
一dollar_template
将应用于“ Stock Price
列
One percent_template
that will be applied to the Dividend Yield
column
将应用于“ Dividend Yield
列的一个percent_template
Each of these format templates need to be added to the writer
object in dictionaries that resemble CSS syntax. Here's what I mean:
这些格式模板中的每一个都需要以类似于CSS语法的字典添加到writer
对象。 这就是我的意思:
header_template = writer.book.add_format(
{
'font_color': '#ffffff',
'bg_color': '#135485',
'border': 1
}
)
string_template = writer.book.add_format(
{
'bg_color': '#DADADA',
'border': 1
}
)
dollar_template = writer.book.add_format(
{
'num_format':'$0.00',
'bg_color': '#DADADA',
'border': 1
}
)
percent_template = writer.book.add_format(
{
'num_format':'0.0%',
'bg_color': '#DADADA',
'border': 1
}
)
To apply these formats to specific cells in our xlsx
file, we need to call the package's conditional_format
method on writer.sheets['Stock Market Data']
. Here is an example:
要将这些格式应用于xlsx
文件中的特定单元格,我们需要在writer.sheets['Stock Market Data']
上调用程序包的conditional_format
方法。 这是一个例子:
writer.sheets['Stock Market Data'].conditional_format('A2:B11',
{
'type': 'cell',
'criteria': '<>',
'value': '"None"',
'format': string_template
}
)
If we generalize this formatting to the other three formats we're applying, here's what the formatting section of our Python script becomes:
如果将这种格式概括为我们将要应用的其他三种格式,这就是Python脚本的格式部分:
writer = pd.ExcelWriter('stock_market_data.xlsx', engine='xlsxwriter')
output_data.to_excel(writer, sheet_name='Stock Market Data')
header_template = writer.book.add_format(
{
'font_color': '#ffffff',
'bg_color': '#135485',
'border': 1
}
)
string_template = writer.book.add_format(
{
'bg_color': '#DADADA',
'border': 1
}
)
dollar_template = writer.book.add_format(
{
'num_format':'$0.00',
'bg_color': '#DADADA',
'border': 1
}
)
percent_template = writer.book.add_format(
{
'num_format':'0.0%',
'bg_color': '#DADADA',
'border': 1
}
)
#Format the header of the spreadsheet
writer.sheets['Stock Market Data'].conditional_format('A1:D1',
{
'type': 'cell',
'criteria': '<>',
'value': '"None"',
'format': header_template
}
)
#Format the 'Ticker' and 'Company Name' columns
writer.sheets['Stock Market Data'].conditional_format('A2:B11',
{
'type': 'cell',
'criteria': '<>',
'value': '"None"',
'format': string_template
}
)
#Format the 'Stock Price' column
writer.sheets['Stock Market Data'].conditional_format('C2:C11',
{
'type': 'cell',
'criteria': '<>',
'value': '"None"',
'format': dollar_template
}
)
#Format the 'Dividend Yield' column
writer.sheets['Stock Market Data'].conditional_format('D2:D11',
{
'type': 'cell',
'criteria': '<>',
'value': '"None"',
'format': percent_template
}
)
writer.save()
Let's take a look at our Excel document to see how its looking:
让我们看一下我们的Excel文档,看看它的外观:
So far so good! The last incremental improvement that we can make to this document is to make its columns a bit wider.
到目前为止,一切都很好! 我们可以对本文档进行的最后一个增量改进是使其列更宽。
We can specify column widths by calling the set_column
method on writer.sheets['Stock Market Data']
.
我们可以通过调用writer.sheets['Stock Market Data']
的set_column
方法来指定列宽。
Here's what we'll add to our Python script to do this:
这是我们将添加到Python脚本中的操作:
#Specify all column widths
writer.sheets['Stock Market Data'].set_column('B:B', 32)
writer.sheets['Stock Market Data'].set_column('C:C', 18)
writer.sheets['Stock Market Data'].set_column('D:D', 20)
Here's the final version of the spreadsheet:
这是电子表格的最终版本:
Voila! We are good to go! You can access the final version of this Python script on GitHub here. The file is named stock_market_data.py
.
瞧! 我们很好走! 您可以在GitHub上访问此Python脚本的最终版本 。 该文件名为stock_market_data.py
。
Your Python script is finalized and ready to run.
您的Python脚本已完成并可以运行。
However, we do not want to simply run this on our local machine on an ad hoc basis.
但是,我们不想临时在本地计算机上简单地运行它。
Instead, we are going to set up a virtual machine using Amazon Web Services' Elastic Compute Cloud (EC2) service.
相反,我们将使用Amazon Web Services的Elastic Compute Cloud (EC2)服务来设置虚拟机。
You'll need to create an AWS account first if you do not already have one. To do this, navigate to this URL and click the "Create an AWS Account" in the top-right corner:
如果您还没有一个AWS账户,则需要先创建一个。 为此,请导航至该URL,然后单击右上角的“创建AWS账户”:
AWS' web application will guide you through the steps to create an account.
AWS的Web应用程序将指导您完成创建帐户的步骤。
Once your account is created, you'll need to create an EC2 instance. This is simply a virtual server for running code on AWS infrastructure.
创建帐户后,您需要创建一个EC2实例。 这只是一个用于在AWS基础架构上运行代码的虚拟服务器。
EC2 instances come in various operating systems and sizes, ranging from very small servers that qualify for AWS' free tier to very large servers capable of running complex applications.
EC2实例具有各种操作系统和大小,范围从符合AWS免费层标准的小型服务器到能够运行复杂应用程序的大型服务器。
We will use AWS' smallest server to run the Python script that we wrote in this article. To get started, navigate to EC2 within the AWS management console. Once you've arrived within EC2, click Launch Instance
:
我们将使用AWS最小的服务器来运行我们在本文中编写的Python脚本。 要开始使用,请在AWS管理控制台中导航到EC2。 进入EC2后,点击Launch Instance
:
This will bring you to a screen that contains all of the available instance types within AWS EC2. Any machine that qualifies for AWS' free tier will be sufficient.
这将带您到一个包含AWS EC2内所有可用实例类型的屏幕。 符合AWS免费套餐条件的任何计算机就足够了。
I chose the Amazon Linux 2 AMI (HVM)
:
我选择了Amazon Linux 2 AMI (HVM)
:
Click Select
to proceed.
单击Select
继续。
On the next page, AWS will ask you to select the specifications for your machine. The fields you can select include:
在下一页上,AWS将要求您选择计算机的规格。 您可以选择的字段包括:
Family
Family
Type
Type
vCPUs
vCPUs
Memory
Memory
Instance Storage (GB)
Instance Storage (GB)
EBS-Optimized
EBS-Optimized
Network Performance
Network Performance
IPv6 Support
IPv6 Support
For the purpose of this tutorial, we simply want to select the single machine that is free tier eligible. It is characterized by a small green label that looks like this:
就本教程而言,我们只想选择符合免费套餐资格的单台计算机。 它的特征是带有一个小的绿色标签,如下所示:
Once you have selected a free tier eligible machine, click Review and Launch
at the bottom of the screen to proceed. The next screen will present the details of your new instance for you to review. Quickly review the machine's specifications, then click Launch
in the bottom right-hand corner.
选择符合条件的免费套餐计算机后,请单击屏幕底部的“ Review and Launch
”以继续。 下一个屏幕将显示新实例的详细信息供您查看。 快速查看机器的规格,然后单击右下角的Launch
。
Clicking the Launch
button will trigger a popup that asks you to Select an existing key pair or create a new key pair
. A key pair is comprised of a public key that AWS holds and a private key that you must download and store within a .pem
file. You must have access to that .pem
file in order to access your EC2 instance (typically via SSH). You also have the option to proceed without a key pair, but this is not recommended for security reasons.
单击Launch
按钮将触发一个弹出窗口,要求您Select an existing key pair or create a new key pair
。 密钥对由AWS持有的公共密钥和必须下载并存储在.pem
文件中的私有密钥组成。 您必须有权访问该.pem
文件才能访问您的EC2实例(通常通过SSH)。 您还可以选择不使用密钥对继续进行操作,但是出于安全原因, 不建议这样做。
Once you have selected or created a key pair for this EC2 instance and click the radio button for I acknowledge that I have access to the selected private key file (data-feeds.pem), and that without this file, I won't be able to log into my instance
, you can click Launch Instances
to proceed.
为该EC2实例选择或创建密钥对后,请单击单选按钮,因为I acknowledge that I have access to the selected private key file (data-feeds.pem), and that without this file, I won't be able to log into my instance
,您可以单击Launch Instances
继续。
Your instance will now begin to launch. It can take some time for these instances to boot up, but once its ready, its Instance State
will show as running
in your EC2 dashboard.
您的实例现在将开始启动。 这些实例启动可能需要一些时间,但是一旦准备就绪,其Instance State
将在您的EC2仪表板中显示为running
。
Next, you will need to push your Python script into your EC2 instance. Here is a generic command state statement that allows you to move a file into an EC2 instance:
接下来,您将需要将Python脚本推送到EC2实例中。 这是一条通用的命令状态语句,它使您可以将文件移动到EC2实例中:
scp -i path/to/.pem_file path/to/file username@host_address.amazonaws.com:/path_to_copy
Run this statement with the necessary replacements to move stock_market_data.py
into the EC2 instance.
运行此语句并进行必要的替换,以将stock_market_data.py
移入EC2实例。
Trying to run stock_market_data.py
at this point will actually result in an error because the EC2 instance does not come with the necessary Python packages.
此时尝试运行stock_market_data.py
实际上会导致错误,因为EC2实例未附带必需的Python软件包。
To fix this, you can either export a requirements.txt
file and import the proper packages using pip
, or you can simply run the following:
要解决此问题,您可以导出requirements.txt
文件并使用pip
导入适当的软件包,也可以简单地运行以下命令:
sudo yum install python3-pip
pip3 install pandas
pip3 install xlsxwriter
Once this is done, you can SSH into the EC2 instance and run the Python script from the command line with the following statement:
完成此操作后,您可以通过SSH进入EC2实例,并使用以下语句从命令行运行Python脚本:
python3 stock_market_data.py
With the work that we have completed so far, our Python script can be executed inside of our EC2 instance.
到目前为止,我们已经完成了工作,可以在EC2实例内部执行Python脚本。
The problem with this is that the xlsx
file will be saved to the AWS virtual server.
问题是xlsx
文件将保存到AWS虚拟服务器。
It is not accessible to anyone but us in that server, which limits its usefulness.
该服务器中除我们以外的任何人都无法访问它,这限制了它的实用性。
To fix this, we are going to create a public bucket on AWS S3 where we can save the xlsx
file. Anyone who has the right URL will be able to download this file once this change is made.
为了解决这个问题,我们将在AWS S3上创建一个公共存储桶,在其中可以保存xlsx
文件。 进行此更改后,拥有正确URL的任何人都可以下载此文件。
To start, navigate to AWS S3 from within the AWS Management Console. Click Create bucket
in the top right:
首先,从AWS管理控制台中导航到AWS S3。 点击右上角的Create bucket
:
On the next screen, you will need to pick a name for your bucket and an AWS region for the bucket to be hosted in. The bucket name must be unique and cannot contain spaces or uppercase letters. The region does not matter much for the purpose of this tutorial, so I will be using the default region of US East (Ohio) us-east-2)
.
在下一个屏幕上,您将需要为存储桶选择一个名称,并为要托管的存储桶选择一个AWS区域。存储桶名称必须唯一,并且不能包含空格或大写字母。 该区域对于本教程而言并不重要,因此我将使用US East (Ohio) us-east-2)
的默认区域。
You will need to change the Public Access settings in the next section to match this configuration:
您将需要在下一部分中更改“公共访问”设置以匹配此配置:
Click Create bucket
to create your bucket and conclude this step of this tutorial!
单击Create bucket
以创建您的存储桶,并完成本教程的这一步!
Our AWS S3 bucket is now ready to hold our finalized xlsx
document. We will now make a small change to our stock_market_data.py
file to push the finalized document to our S3 bucket.
现在,我们的AWS S3存储桶已准备好保存我们最终确定的xlsx
文档。 现在,我们stock_market_data.py
文件进行少量更改,以将最终文档推送到我们的S3存储桶中。
We will need to use the boto3
package to do this. boto3
is the AWS Software Development Kit (SDK) for Python, allowing Python developers to write software that connects to AWS services. To start, you'll need to install boto3
on your EC2 virtual machine. Run the following command line statement to do this:
我们将需要使用boto3
软件包来执行此操作。 boto3
是用于Python的AWS软件开发套件(SDK),允许Python开发人员编写连接到AWS服务的软件。 首先,您需要在EC2虚拟机上安装boto3
。 运行以下命令行语句以执行此操作:
pip3 install boto3
You will also need to import the library into stock_market_data.py
by adding the following statement to the top of the Python script.
您还需要通过将以下语句添加到Python脚本的顶部,将库导入stock_market_data.py
。
import boto3
We will need to add a few lines of code to the end of stock_market_data.py
to push the final document to AWS S3.
我们将需要在stock_market_data.py
的末尾添加几行代码,以将最终文档推送到AWS S3。
s3 = boto3.resource('s3')
s3.meta.client.upload_file('stock_market_data.xlsx', 'my-S3-bucket', 'stock_market_data.xlsx', ExtraArgs={'ACL':'public-read'})
The first line of this code, s3 = boto3.resource('s3')
, allows our Python script to connect to Amazon Web Services.
此代码的第一行s3 = boto3.resource('s3')
允许我们的Python脚本连接到Amazon Web Services。
The second line of code calls a method from boto3
that actually uploads our file to S3. It takes four arguments:
第二行代码从boto3
调用一个方法,该方法实际上将我们的文件上传到S3。 它包含四个参数:
stock_market_data.xlsx
- the name of the file on our local machine.
stock_market_data.xlsx
我们本地计算机上文件的名称。
my-S3-bucket
- the name of the S3 bucket that we're uploading our file to.
my-S3-bucket
我们要将文件上传到的S3存储桶的名称。
stock_market_data.xlsx
- the desired name of the file within the S3 bucket. In most cases, this will have the same value as the first argument passed into this method.
stock_market_data.xlsx
-S3存储桶中文件的所需名称。 在大多数情况下,该值与传递给此方法的第一个参数的值相同。
ExtraArgs={'ACL':'public-read'}
- this is an optional argument that tells AWS to make the uploaded file publicly-readable.
ExtraArgs={'ACL':'public-read'}
-这是一个可选参数,告诉AWS使上传的文件公开可读。
So far, we have completed the following:
到目前为止,我们已经完成了以下工作:
Created an S3 bucket where we can push the final xlsx
document
创建了一个S3存储桶,我们可以在其中推送最终的xlsx
文档
Modified the original Python script to upload the finalized stock_market_data.xlsx
file to an AWS S3 bucket
修改了原始Python脚本以将最终的stock_market_data.xlsx
文件上传到AWS S3存储桶
The only step that is left is to schedule the Python script to run periodically.
剩下的唯一步骤是安排Python脚本定期运行。
We can do this using a command-line utility called cron
. To start, we will need to create a cron
expression that tells the utility when to run the code. The crontab guru website is an excellent resource for this.
我们可以使用名为cron
的命令行实用程序来执行此操作。 首先,我们将需要创建一个cron
表达式来告诉实用程序何时运行代码。 crontab大师网站是一个很好的资源。
Here's how you can use crontab guru to get cron
expression that means every day at noon
:
这是使用crontab专家获取cron
表达式的方法,这意味着every day at noon
:
Now we need to instruct our EC2 instance's cron
daemon to run stock_market_data.py
at this time each day.
现在,我们需要指示EC2实例的cron
守护进程每天每天的这个时候运行stock_market_data.py
。
To do this, we will first create a new file in our EC2 instance called stock_market_data.cron
.
为此,我们将首先在EC2实例中创建一个名为stock_market_data.cron
的新文件。
Open up this file and type in our cron expression followed by the statement that should be executed at the command line at that specified time.
打开此文件,然后键入cron表达式,然后输入应在指定时间在命令行执行的语句。
Our command line statement is python3 stock_market_data.py
, so here is what should be contained in stock_market_data.cron
:
我们的命令行语句是python3 stock_market_data.py
,所以这是stock_market_data.cron
应包含的stock_market_data.cron
:
00 12 * * * python3 stock_market_data.py
If you run an ls
command in your EC2 instance, you should now see two files:
如果在EC2实例中运行ls
命令,现在应该看到两个文件:
stock_market_data.py stock_market_data.cron
The last step of this tutorial is to load stock_market_data.cron
into the crontab
. You can think of the crontab
as a file that contains commands and instructions for the cron
daemon to execute. In other words, the crontab
contains batches of cron
jobs.
本教程的最后一步是将stock_market_data.cron
加载到crontab
。 您可以将crontab
视为一个文件,其中包含cron
守护程序要执行的命令和说明。 换句话说, crontab
包含一批cron
作业。
First, let's see what's in our crontab
. It should be empty since we have not put anything in it! You can view the contents of your crontab
with the following command:
首先,让我们看看crontab
。 它应该是空的,因为我们没有在其中放任何东西! 您可以使用以下命令查看crontab
的内容:
crontab -l
To load stock_market_data.cron
into the crontab
, run the following statement on the command line:
要将stock_market_data.cron
加载到crontab
,请在命令行上运行以下语句:
crontab stock_market_data.cron
Now when you run crontab -l
, you should see:
现在,当您运行crontab -l
,您应该看到:
00 12 * * * python3 stock_market_data.py
Our stock_market_data.py
script will now run at noon every day on our AWS EC2 virtual machine!
现在,我们的stock_market_data.py
脚本每天中午将在我们的AWS EC2虚拟机上运行!
In this article, you learned how to create automatically-updating Excel spreadsheets of financial data using Python, IEX Cloud, and Amazon Web Services.
在本文中,您学习了如何使用Python,IEX Cloud和Amazon Web Services创建自动更新财务数据的Excel电子表格。
Here are the specific steps we covered in this tutorial:
以下是我们在本教程中介绍的具体步骤:
How to schedule code to run using the cron
software utility
如何安排代码使用cron
软件实用程序运行
This article was published by Nick McCullum, who teaches people how to code on his website.
这篇文章由尼克·麦卡鲁姆(Nick McCullum)发表,他教人们如何在其网站上进行编码 。
翻译自: https://www.freecodecamp.org/news/auto-updating-excel-python-aws/
aws python库