A pivot table is a table of statistics that summarizes the data of a more extensive table. The summary of data is reached through various aggregate functions – sum, average, min, max, etc.
数据透视表是统计表,汇总了更广泛的表的数据。 数据汇总可通过各种汇总函数获得-总和,平均值,最小值,最大值等。
A pivot table is a data processing technique to derive useful information from a table.
数据透视表是一种从表中获取有用信息的数据处理技术。
Pandas pivot_table() function is used to create pivot table from a DataFrame object. We can generate useful information from the DataFrame rows and columns. The pivot_table() function syntax is:
熊猫的pivot_table()函数用于从DataFrame对象创建数据透视表。 我们可以从DataFrame的行和列中生成有用的信息。 ivot_table()函数的语法为:
def pivot_table(
data,
values=None,
index=None,
columns=None,
aggfunc="mean",
fill_value=None,
margins=False,
dropna=True,
margins_name="All",
observed=False,
)
It’s better to use real-life data to understand the actual benefit of pivot tables. I have downloaded a sample CSV file from this link. Here is the direct download link for the CSV file.
最好使用实际数据来了解数据透视表的实际好处。 我已经从此链接下载了示例CSV文件。 这是CSV文件的直接下载链接。
The CSV file is a listing of 1,460 company funding records reported by TechCrunch. The below image shows the sample data from the file.
CSV文件列出了TechCrunch报告的1,460家公司资金记录。 下图显示了文件中的样本数据。
We are interested in the columns – ‘company’, ‘city’, ‘state’, ‘raisedAmt’, and ’round’. Let’s create some pivot tables to generate useful statistics from this data.
我们对“公司”,“城市”,“州”,“ raisedAmt”和“回合”列感兴趣。 让我们创建一些数据透视表,以根据这些数据生成有用的统计信息。
Let’s try to create a pivot table for average funding by the state.
让我们尝试创建一个枢纽分析表,以了解各州的平均资助情况。
import pandas as pd
import numpy as np
df = pd.read_csv('TechCrunchcontinentalUSA.csv', usecols=['company', 'city', 'state', 'raisedAmt', 'round'])
print('DataFrame Records:\n', df.head(6))
# average funding by State
df1 = pd.pivot_table(df, values='raisedAmt', columns='state')
print('\nAverage Funding by State:\n', df1)
Output:
输出:
DataFrame Records:
company city state raisedAmt round
0 LifeLock Tempe AZ 6850000 b
1 LifeLock Tempe AZ 6000000 a
2 LifeLock Tempe AZ 25000000 c
3 MyCityFaces Scottsdale AZ 50000 seed
4 Flypaper Phoenix AZ 3000000 a
5 Infusionsoft Gilbert AZ 9000000 a
Average Funding by State:
state AZ CA ... VA WA
raisedAmt 5613750.0 1.072324e+07 ... 1.158261e+07 8.140103e+06
[1 rows x 33 columns]
We can also call pivot_table() function directly on the DataFrame object. The above pivot table can be generated using the below code snippet too.
我们还可以直接在DataFrame对象上调用ivot_table()函数。 上面的数据透视表也可以使用下面的代码段生成。
df1 = df.pivot_table(values='raisedAmt', columns='state')
The default aggregate function is numpy.mean
. We can specify the aggregate function as numpy.sum
to generate the total funding by the state.
默认的聚合函数是numpy.mean
。 我们可以将合计函数指定为numpy.sum
来生成州的总资金。
df1 = pd.pivot_table(df, values='raisedAmt', columns='state', aggfunc=np.sum)
print('\nTotal Funding by State:\n', df1)
Output:
输出:
Total Funding by State:
state AZ CA CO ... UT VA WA
raisedAmt 50523750 9361385000 126470000 ... 153080000 266400000 789590000
[1 rows x 33 columns]
df1 = pd.pivot_table(df, values='raisedAmt', columns='company', aggfunc=np.sum)
print('\nTotal Funding by Company:\n', df1)
Output:
输出:
Total Funding by Company:
company 23andMe 3Jam 4HomeMedia ... vbs tv x+1 xkoto
raisedAmt 9000000 4000000 2850000 ... 10000000 16000000 7500000
[1 rows x 909 columns]
Let’s try to create a pivot table for the average funding by round grouped by the state. The trick is to generate a pivot table with ’round’ as the index column.
让我们尝试创建一个按州分组的平均融资的数据透视表。 诀窍是生成一个以“ round”作为索引列的数据透视表。
df1 = pd.pivot_table(df, values='raisedAmt', columns='state', index='round')
print('\nAverage Funding by round in State:\n', df1)
Output:
输出:
Average Funding by round in State:
state AZ CA ... VA WA
round ...
a 6.000000e+06 7.158314e+06 ... 9910000.0 6.570476e+06
angel 2.337500e+05 1.006784e+06 ... NaN 8.935714e+05
b 6.850000e+06 1.238483e+07 ... 9850000.0 1.187826e+07
c 2.500000e+07 2.369708e+07 ... 19500000.0 1.592222e+07
d NaN 3.012188e+07 ... 20000000.0 8.500000e+06
debt_round NaN 1.660833e+07 ... NaN NaN
e NaN 3.132500e+07 ... NaN 2.200000e+07
seed 1.466667e+05 8.778214e+05 ... 350000.0 7.800000e+05
unattributed NaN 1.933000e+07 ... NaN 2.050000e+07
[9 rows x 33 columns]
df1 = pd.pivot_table(df, values='raisedAmt', columns='state', index='round', aggfunc=np.sum, fill_value=0)
print('\nTotal Funding by round in State:\n', df1)
Output:
输出:
Total Funding by round in State:
state AZ CA CO ... UT VA WA
round ...
a 18000000 2526885000 25650000 ... 31800000 99100000 275960000
angel 233750 74502000 3950000 ... 0 0 12510000
b 6850000 2898050000 66900000 ... 67200000 68950000 273200000
c 25000000 2109040000 28850000 ... 54000000 78000000 143300000
d 0 963900000 0 ... 0 20000000 17000000
debt_round 0 199300000 500000 ... 0 0 0
e 0 250600000 0 ... 0 0 44000000
seed 440000 49158000 620000 ... 80000 350000 3120000
unattributed 0 289950000 0 ... 0 0 20500000
[9 rows x 33 columns]
Let’s look at a more complex example. We will create a pivot table of total funding per company per round, state wise.
让我们看一个更复杂的例子。 我们将在州级明智的情况下创建每个公司每轮总资金的数据透视表。
df1 = pd.pivot_table(df, values='raisedAmt', columns='state', index=['company', 'round'], aggfunc=np.sum, fill_value=0)
print('\nTotal Funding by company and round in State:\n', df1)
Output:
输出:
Total Funding by round in State:
state AZ CA CO CT DC FL GA ... PA RI TN TX UT VA WA
company round ...
23andMe a 0 9000000 0 0 0 0 0 ... 0 0 0 0 0 0 0
3Jam a 0 4000000 0 0 0 0 0 ... 0 0 0 0 0 0 0
4HomeMedia a 0 2850000 0 0 0 0 0 ... 0 0 0 0 0 0 0
5min a 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
angel 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
... .. ... .. .. .. .. .. ... .. .. .. .. .. .. ..
uber b 0 7600000 0 0 0 0 0 ... 0 0 0 0 0 0 0
utoopia seed 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
vbs tv seed 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
x+1 a 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
xkoto b 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
[1405 rows x 33 columns]
翻译自: https://www.journaldev.com/33521/pandas-pivot_table-dataframe-data-analysis