Python pivot tables
One of the biggest challenges when facing a new data set is knowing where to start and what to focus on. Being able to quickly summarize hundreds of rows and columns can save you a lot of time and frustration. A simple tool you can use to achieve this is a pivot table, which helps you slice, filter, and group data at the speed of inquiry and represent the information in a visually appealing way.
You may already be familiar with the concept of pivot tables from Excel, where they were introduced in 1994 under the trademarked name PivotTable. This tool enabled users to automatically sort, count, total, or average the data stored in one table. In the image below we used the PivotTable functionality to quickly summarize the Titanic data set. The larger table below displays the first ~30 rows of the data set, and the smaller tables are the PivotTables we created.
The pivot table on the left groups the data by the `Sex` and `Survived` columns. As a result, this table displays the percentage of each gender within each survival status (`0`: didn’t survive, `1`: survived). This allows us to quickly see that women had a better chance of survival than men. The table on the right also uses the `Survived` column, but this time the data is grouped by `Class`.
We used Excel for the above examples, but this post will demonstrate the advantages of the built-in pandas function `pivot_table`. We’ll use the World Happiness Report, which is a survey about the state of global happiness. The report ranks more than 150 countries by their happiness levels, and has been published almost every year since 2012. We’ll use data collected in the years 2015, 2016, and 2017, which is available for download if you’d like to follow along. We’re running Python 3.6 and pandas 0.19.
Some interesting questions we might like to answer are:
Let’s import our data and take a quick first look:
import pandas as pd
import numpy as np

# read the data
data = pd.read_csv('data.csv', index_col=0)

# sort the DataFrame by ascending year and descending happiness score
data.sort_values(['Year', "Happiness Score"], ascending=[True, False], inplace=True)

# display the first 10 rows
data.head(10)
| | Country | Region | Happiness Rank | Happiness Score | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 141 | Switzerland | Western Europe | 1.0 | 7.587 | 1.39651 | 1.34951 | 0.94143 | 0.66557 | 0.41978 | 0.29678 | 2.51738 | 2015 |
| 60 | Iceland | Western Europe | 2.0 | 7.561 | 1.30232 | 1.40223 | 0.94784 | 0.62877 | 0.14145 | 0.43630 | 2.70201 | 2015 |
| 38 | Denmark | Western Europe | 3.0 | 7.527 | 1.32548 | 1.36058 | 0.87464 | 0.64938 | 0.48357 | 0.34139 | 2.49204 | 2015 |
| 108 | Norway | Western Europe | 4.0 | 7.522 | 1.45900 | 1.33095 | 0.88521 | 0.66973 | 0.36503 | 0.34699 | 2.46531 | 2015 |
| 25 | Canada | North America | 5.0 | 7.427 | 1.32629 | 1.32261 | 0.90563 | 0.63297 | 0.32957 | 0.45811 | 2.45176 | 2015 |
| 46 | Finland | Western Europe | 6.0 | 7.406 | 1.29025 | 1.31826 | 0.88911 | 0.64169 | 0.41372 | 0.23351 | 2.61955 | 2015 |
| 102 | Netherlands | Western Europe | 7.0 | 7.378 | 1.32944 | 1.28017 | 0.89284 | 0.61576 | 0.31814 | 0.47610 | 2.46570 | 2015 |
| 140 | Sweden | Western Europe | 8.0 | 7.364 | 1.33171 | 1.28907 | 0.91087 | 0.65980 | 0.43844 | 0.36262 | 2.37119 | 2015 |
| 103 | New Zealand | Australia and New Zealand | 9.0 | 7.286 | 1.25018 | 1.31967 | 0.90837 | 0.63938 | 0.42922 | 0.47501 | 2.26425 | 2015 |
| 6 | Australia | Australia and New Zealand | 10.0 | 7.284 | 1.33358 | 1.30923 | 0.93156 | 0.65124 | 0.35637 | 0.43562 | 2.26646 | 2015 |
Each country’s `Happiness Score` is calculated by summing the seven other variables in the table. Each of these variables reveals a population-weighted average score on a scale running from 0 to 10 that is tracked over time and compared against other countries.
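As a quick sanity check of the summation described above, the sketch below adds up the seven component values shown for Switzerland in the first row of the table. The dictionary is typed in by hand from that row; it is not read from `data.csv`.

```python
# Component values copied by hand from the Switzerland (2015) row above
switzerland_2015 = {
    "Economy (GDP per Capita)": 1.39651,
    "Family": 1.34951,
    "Health (Life Expectancy)": 0.94143,
    "Freedom": 0.66557,
    "Trust (Government Corruption)": 0.41978,
    "Generosity": 0.29678,
    "Dystopia Residual": 2.51738,
}

total = sum(switzerland_2015.values())
print(round(total, 3))  # 7.587, matching the Happiness Score column
```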
These variables are:

- `Economy`: real GDP per capita
- `Family`: social support
- `Health`: healthy life expectancy
- `Freedom`: freedom to make life choices
- `Trust`: perceptions of corruption
- `Generosity`: perceptions of generosity
- `Dystopia`: each country is compared against a hypothetical nation that represents the lowest national averages for each key variable and is, along with residual error, used as a regression benchmark

Each country’s `Happiness Score` determines its `Happiness Rank`, which is its relative position among other countries in that specific year. For example, the first row indicates that Switzerland was ranked the happiest country in 2015 with a happiness score of 7.587. Switzerland ranked just ahead of Iceland, which scored 7.561. Denmark was ranked third in 2015, and so on. It’s interesting to note that Western Europe took seven of the top eight rankings in 2015.
We’ll concentrate on the final `Happiness Score` to demonstrate the technical aspects of pivot tables.
Our data has 495 rows and 12 columns
Are there missing values? True
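The two summary lines above could come from a check along the following lines. This is only a sketch on a small made-up DataFrame (the `toy` name and its values are illustrative assumptions, not the real `data.csv`).

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real data set; one score is deliberately missing
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Happiness Score": [7.5, 6.1, np.nan, 4.2],
    "Year": [2015, 2015, 2016, 2016],
})

n_rows, n_cols = toy.shape
has_missing = bool(toy.isnull().values.any())

print("Our data has {} rows and {} columns".format(n_rows, n_cols))
print("Are there missing values? {}".format(has_missing))

# describe() then summarizes every numeric column (count, mean, std, ...)
summary = toy.describe()
print(summary)
```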
| | Happiness Rank | Happiness Score | Economy (GDP per Capita) | Family | Health (Life Expectancy) | Freedom | Trust (Government Corruption) | Generosity | Dystopia Residual | Year |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 470.000000 | 470.000000 | 470.000000 | 470.000000 | 470.000000 | 470.000000 | 470.000000 | 470.000000 | 470.000000 | 495.000000 |
| mean | 78.829787 | 5.370728 | 0.927830 | 0.990347 | 0.579968 | 0.402828 | 0.134790 | 0.242241 | 2.092717 | 2016.000000 |
| std | 45.281408 | 1.136998 | 0.415584 | 0.318707 | 0.240161 | 0.150356 | 0.111313 | 0.131543 | 0.565772 | 0.817323 |
| min | 1.000000 | 2.693000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.328580 | 2015.000000 |
| 25% | 40.000000 | 4.509000 | 0.605292 | 0.793000 | 0.402301 | 0.297615 | 0.059777 | 0.152831 | 1.737975 | 2015.000000 |
| 50% | 79.000000 | 5.282500 | 0.995439 | 1.025665 | 0.630053 | 0.418347 | 0.099502 | 0.223140 | 2.094640 | 2016.000000 |
| 75% | 118.000000 | 6.233750 | 1.252443 | 1.228745 | 0.768298 | 0.516850 | 0.173161 | 0.315824 | 2.455575 | 2017.000000 |
| max | 158.000000 | 7.587000 | 1.870766 | 1.610574 | 1.025250 | 0.669730 | 0.551910 | 0.838075 | 3.837720 | 2017.000000 |
The `describe()` method reveals that `Happiness Rank` ranges from 1 to 158, which means that the largest number of surveyed countries for a given year was 158. It’s worth noting that `Happiness Rank` was originally of type `int`. The fact that it’s displayed as a float here implies we have `NaN` values in this column (we can also determine this from the `count` row, which only amounts to 470 as opposed to the 495 rows in our data set).
The `Year` column doesn’t have any missing values. First, it’s displayed in the data set as `int`; second, the count for `Year` amounts to 495, which is the number of rows in our data set. By comparing the `count` value for `Year` to the other columns, it seems we can expect 25 missing values in each column (495 in `Year` vs. 470 in all other columns).
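The reasoning above relies on two pandas behaviors that are easy to verify on a toy example: an integer column is upcast to float as soon as it holds a `NaN`, and `count` skips missing entries. The Series and DataFrame below are made up for illustration.

```python
import numpy as np
import pandas as pd

complete_ranks = pd.Series([1, 2, 3])
ranks_with_gap = pd.Series([1, 2, np.nan])

print(complete_ranks.dtype)   # an integer dtype
print(ranks_with_gap.dtype)   # float64, because NaN is a float

toy = pd.DataFrame({
    "Happiness Rank": [1, 2, np.nan, 4],
    "Year": [2015, 2015, 2016, 2016],
})

# Missing values per column: total rows minus the non-null count
missing_per_column = len(toy) - toy.count()
print(missing_per_column)
```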
Categorizing the data by Year and Region

The fun thing about the pandas `pivot_table` function is that you can get another point of view on your data with only one line of code. Most of the `pivot_table` parameters use default values, so the only mandatory parameters you must add are `data` and `index`. Though it isn’t mandatory, we’ll also use the `values` parameter in the next example.
- `data` is self-explanatory: it’s the DataFrame you’d like to use
- `index` is the column, grouper, array (or list of the previous) you’d like to group your data by. It will be displayed in the index column (or columns, if you’re passing in a list)
- `values` (optional) is the column you’d like to aggregate. If you do not specify this, the function will aggregate all numeric columns.

Let’s first look at the output, and then explain how the table was produced:
pd.pivot_table(data, index='Year', values="Happiness Score")
| Year | Happiness Score |
|---|---|
| 2015 | 5.375734 |
| 2016 | 5.382185 |
| 2017 | 5.354019 |
By passing `Year` as the `index` parameter, we chose to group our data by `Year`. The output is a pivot table that displays the three different values for `Year` as the index, and the `Happiness Score` as values. It’s worth noting that the default aggregation is the mean (average), so the values displayed in the `Happiness Score` column are the yearly average for all countries. The table shows the average for all countries was highest in 2016 and lowest in 2017, the most recent of the three years.
Here’s a detailed diagram of how this pivot table was created:
Next, let’s use the `Region` column as `index`:
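The call that produces this kind of table is not shown here, but it presumably mirrors the `Year` example with `Region` passed as `index`. A minimal sketch on three rows copied from the 2015 table above (the `sample` name is just for illustration):

```python
import pandas as pd

# Three rows copied from the first table; not the full data set
sample = pd.DataFrame({
    "Country": ["Switzerland", "Iceland", "Canada"],
    "Region": ["Western Europe", "Western Europe", "North America"],
    "Happiness Score": [7.587, 7.561, 7.427],
    "Year": [2015, 2015, 2015],
})

# Group by Region; the default aggregation is the mean
by_region = pd.pivot_table(sample, index="Region", values="Happiness Score")
print(by_region)
```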
| Region | Happiness Score |
|---|---|
| Australia and New Zealand | 7.302500 |
| Central and Eastern Europe | 5.371184 |
| Eastern Asia | 5.632333 |
| Latin America and Caribbean | 6.069074 |
| Middle East and Northern Africa | 5.387879 |
| North America | 7.227167 |
| Southeastern Asia | 5.364077 |
| Southern Asia | 4.590857 |
| Sub-Saharan Africa | 4.150957 |
| Western Europe | 6.693000 |
The numbers displayed in the `Happiness Score` column in the pivot table above are the mean, just as before, but this time it’s each region’s mean for all years documented (2015, 2016, 2017). This display makes it easier to see that `Australia and New Zealand` has the highest average score, while `North America` is ranked close behind. It’s interesting that despite the initial impression we got from reading the data, which showed `Western Europe` in most of the top places, `Western Europe` is actually ranked third when calculating the average for the past three years. The lowest ranked region is `Sub-Saharan Africa`, and close behind is `Southern Asia`.
You may have used `groupby()` to achieve some of the pivot table functionality (we’ve previously demonstrated how to use groupby() to analyze your data). However, the `pivot_table()` built-in function offers straightforward parameter names and default values that can help simplify complex procedures like multi-indexing.
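To make the comparison concrete, here is a small sketch showing that a basic `pivot_table` call and the corresponding `groupby()` chain give the same averages. The DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Region": ["North", "North", "South", "South"],
    "Year": [2015, 2016, 2015, 2016],
    "Happiness Score": [7.0, 7.2, 5.0, 5.4],
})

# Single grouping column
pivoted = pd.pivot_table(toy, index="Region", values="Happiness Score")
grouped = toy.groupby("Region")["Happiness Score"].mean()

# Two grouping columns produce a MultiIndex in both approaches
pivoted_multi = pd.pivot_table(toy, index=["Region", "Year"], values="Happiness Score")
grouped_multi = toy.groupby(["Region", "Year"])["Happiness Score"].mean()

same_single = np.allclose(pivoted["Happiness Score"], grouped)
same_multi = np.allclose(pivoted_multi["Happiness Score"], grouped_multi)
print(same_single, same_multi)
```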
In order to group the data by more than one column, all we have to do is pass in a list of column names. Let’s categorize the data by `Region` and `Year`.
pd.pivot_table(data, index=['Region', 'Year'], values="Happiness Score")
| Region | Year | Happiness Score |
|---|---|---|
| Australia and New Zealand | 2015 | 7.285000 |
| | 2016 | 7.323500 |
| | 2017 | 7.299000 |
| Central and Eastern Europe | 2015 | 5.332931 |
| | 2016 | 5.370690 |
| | 2017 | 5.409931 |
| Eastern Asia | 2015 | 5.626167 |
| | 2016 | 5.624167 |
| | 2017 | 5.646667 |
| Latin America and Caribbean | 2015 | 6.144682 |
| | 2016 | 6.101750 |
| | 2017 | 5.957818 |
| Middle East and Northern Africa | 2015 | 5.406900 |
| | 2016 | 5.386053 |
| | 2017 | 5.369684 |
| North America | 2015 | 7.273000 |
| | 2016 | 7.254000 |
| | 2017 | 7.154500 |
| Southeastern Asia | 2015 | 5.317444 |
| | 2016 | 5.338889 |
| | 2017 | 5.444875 |
| Southern Asia | 2015 | 4.580857 |
| | 2016 | 4.563286 |
| | 2017 | 4.628429 |
| Sub-Saharan Africa | 2015 | 4.202800 |
| | 2016 | 4.136421 |
| | 2017 | 4.111949 |
| Western Europe | 2015 | 6.689619 |
| | 2016 | 6.685667 |
| | 2017 | 6.703714 |
These examples also reveal where the pivot table got its name: it allows you to rotate, or pivot, the summary table, and this rotation gives us a different perspective on the data, one that can help you quickly gain valuable insights.
This is one way to look at the data, but we can use the `columns` parameter to get a better display:

- `columns` is the column, grouper, array, or list of the previous you’d like to group your data by. Using it will spread the different values horizontally.

Using `Year` as the `columns` argument will display the different values for `Year` as column headers, and will make for a much better display, like so:
| Year | 2015 | 2016 | 2017 |
|---|---|---|---|
| Region | | | |
| Australia and New Zealand | 7.285000 | 7.323500 | 7.299000 |
| Central and Eastern Europe | 5.332931 | 5.370690 | 5.409931 |
| Eastern Asia | 5.626167 | 5.624167 | 5.646667 |
| Latin America and Caribbean | 6.144682 | 6.101750 | 5.957818 |
| Middle East and Northern Africa | 5.406900 | 5.386053 | 5.369684 |
| North America | 7.273000 | 7.254000 | 7.154500 |
| Southeastern Asia | 5.317444 | 5.338889 | 5.444875 |
| Southern Asia | 4.580857 | 4.563286 | 4.628429 |
| Sub-Saharan Africa | 4.202800 | 4.136421 | 4.111949 |
| Western Europe | 6.689619 | 6.685667 | 6.703714 |
Visualizing the pivot table using plot()

If you want to look at a visual representation of the previous pivot table we created, all you need to do is add `plot()` at the end of the `pivot_table` function call (you’ll also need to import the relevant plotting libraries).
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns

# use Seaborn styles
sns.set()

pd.pivot_table(data, index='Region', columns='Year', values="Happiness Score").plot(kind='bar')
# the bars show the mean Happiness Score, so label the axis accordingly
plt.ylabel("Happiness Score")
The visual representation helps reveal that the differences are minor. Having said that, it also shows a steady decrease in the happiness score of both regions located in the Americas (`North America` and `Latin America and Caribbean`).
Manipulating the data using aggfunc

Up until now we’ve used the average to get insights about the data, but there are other important values to consider. Time to experiment with the `aggfunc` parameter:

- `aggfunc` (optional) accepts a function or list of functions you’d like to use on your group (default: `numpy.mean`). If a list of functions is passed, the resulting pivot table will have hierarchical columns whose top level are the function names.

Let’s add the median, minimum, maximum, and the standard deviation for each region. This can help us evaluate how representative the average is of the real picture.
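A list of aggregations can be passed like this. The sketch uses string names, which pandas also accepts for `aggfunc`, and a made-up DataFrame:

```python
import pandas as pd

toy = pd.DataFrame({
    "Region": ["West", "West", "West", "East", "East"],
    "Happiness Score": [7.0, 6.0, 5.0, 4.0, 5.0],
})

stats = pd.pivot_table(
    toy,
    index="Region",
    values="Happiness Score",
    aggfunc=["mean", "median", "min", "max", "std"],
)

# The columns are hierarchical: function name on top, value column below
print(stats.columns.tolist())
print(stats)
```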
| | mean | median | min | max | std |
|---|---|---|---|---|---|
| | Happiness Score | Happiness Score | Happiness Score | Happiness Score | Happiness Score |
| Region | | | | | |
| Australia and New Zealand | 7.302500 | 7.2995 | 7.284 | 7.334 | 0.020936 |
| Central and Eastern Europe | 5.371184 | 5.4010 | 4.096 | 6.609 | 0.578274 |
| Eastern Asia | 5.632333 | 5.6545 | 4.874 | 6.422 | 0.502100 |
| Latin America and Caribbean | 6.069074 | 6.1265 | 3.603 | 7.226 | 0.728157 |
| Middle East and Northern Africa | 5.387879 | 5.3175 | 3.006 | 7.278 | 1.031656 |
| North America | 7.227167 | 7.2175 | 6.993 | 7.427 | 0.179331 |
| Southeastern Asia | 5.364077 | 5.2965 | 3.819 | 6.798 | 0.882637 |
| Southern Asia | 4.590857 | 4.6080 | 3.360 | 5.269 | 0.535978 |
| Sub-Saharan Africa | 4.150957 | 4.1390 | 2.693 | 5.648 | 0.584945 |
| Western Europe | 6.693000 | 6.9070 | 4.857 | 7.587 | 0.777886 |
It looks like some regions have extreme values that might affect our average more than we’d like. For example, the `Middle East and Northern Africa` region has a high standard deviation, so we might want to remove extreme values. Let’s see how many values we’re calculating for each region, since this might affect the representation we’re seeing. For example, `Australia and New Zealand` has a very low standard deviation and is ranked happiest for all three years, but we can also assume it only accounts for two countries.

`pivot_table` allows you to pass your own custom aggregation functions as arguments. You can either use a lambda function, or create a function. Let’s calculate the average number of countries in each region in a given year. We can do this easily using a lambda function, like so:
pd.pivot_table(data, index='Region', values="Happiness Score",
               aggfunc=[np.mean, min, max, np.std, lambda x: x.count()/3])
| | mean | min | max | std | `<lambda>` |
|---|---|---|---|---|---|
| | Happiness Score | Happiness Score | Happiness Score | Happiness Score | Happiness Score |
| Region | | | | | |
| Australia and New Zealand | 7.302500 | 7.284 | 7.334 | 0.020936 | 2.000000 |
| Central and Eastern Europe | 5.371184 | 4.096 | 6.609 | 0.578274 | 29.000000 |
| Eastern Asia | 5.632333 | 4.874 | 6.422 | 0.502100 | 6.000000 |
| Latin America and Caribbean | 6.069074 | 3.603 | 7.226 | 0.728157 | 22.666667 |
| Middle East and Northern Africa | 5.387879 | 3.006 | 7.278 | 1.031656 | 19.333333 |
| North America | 7.227167 | 6.993 | 7.427 | 0.179331 | 2.000000 |
| Southeastern Asia | 5.364077 | 3.819 | 6.798 | 0.882637 | 8.666667 |
| Southern Asia | 4.590857 | 3.360 | 5.269 | 0.535978 | 7.000000 |
| Sub-Saharan Africa | 4.150957 | 2.693 | 5.648 | 0.584945 | 39.000000 |
| Western Europe | 6.693000 | 4.857 | 7.587 | 0.777886 | 21.000000 |
Both of the highest-ranking regions, which also have the lowest standard deviations, account for only two countries each. `Sub-Saharan Africa`, on the other hand, has the lowest `Happiness Score`, but it accounts for an average of 39 countries per year. An interesting next step would be to remove extreme values from the calculation to see if the ranking changes significantly. Let’s create a function that only considers the values that fall between the 0.25 and 0.75 quantiles. We’ll use this function to calculate the average for each region and check whether the ranking stays the same.
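The function itself is not reproduced in the text. One possible implementation, named `remove_outliers` to match the column header in the table that follows, is sketched below. Keeping the boundary values (an inclusive range) is an assumption here, and the DataFrame is made up.

```python
import pandas as pd

def remove_outliers(values):
    """Mean of the values lying between the 0.25 and 0.75 quantiles (inclusive)."""
    lower, upper = values.quantile([0.25, 0.75])
    return values[values.between(lower, upper)].mean()

toy = pd.DataFrame({
    "Region": ["A"] * 5 + ["B"] * 5,
    "Happiness Score": [1.0, 2.0, 3.0, 4.0, 100.0, 5.0, 6.0, 7.0, 8.0, 9.0],
})

# The custom function is passed alongside a built-in aggregation
table = pd.pivot_table(
    toy,
    index="Region",
    values="Happiness Score",
    aggfunc=["mean", remove_outliers],
)
print(table)
```

In region `A` the extreme value 100 pulls the plain mean far above the trimmed one, which is the effect the function is meant to dampen.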
| | mean | remove_outliers | `<lambda>` |
|---|---|---|---|
| | Happiness Score | Happiness Score | Happiness Score |
| Region | | | |
| Australia and New Zealand | 7.302500 | 7.299125 | 2.000000 |
| Central and Eastern Europe | 5.371184 | 5.449250 | 29.000000 |
| Eastern Asia | 5.632333 | 5.610125 | 6.000000 |
| Latin America and Caribbean | 6.069074 | 6.192750 | 22.666667 |
| Middle East and Northern Africa | 5.387879 | 5.508500 | 19.333333 |
| North America | 7.227167 | 7.244875 | 2.000000 |
| Southeastern Asia | 5.364077 | 5.470125 | 8.666667 |
| Southern Asia | 4.590857 | 4.707500 | 7.000000 |
| Sub-Saharan Africa | 4.150957 | 4.128000 | 39.000000 |
| Western Europe | 6.693000 | 6.846500 | 21.000000 |
Removing the outliers mostly affected the regions with a higher number of countries, which makes sense. We can see `Western Europe` (average of 21 countries surveyed per year) improved its average score. Unfortunately, `Sub-Saharan Africa` (average of 39 countries surveyed per year) received an even lower score when we removed the outliers.
Up until now we’ve grouped our data according to the categories in the original table. However, we can search the strings in the categories to create our own groups. For example, it would be interesting to look at the results by continent. We can do this by looking for region names that contain `Asia`, `Europe`, etc. To do this, we can first assign our pivot table to a variable, and then add our filter:
table = pd.pivot_table(data, index='Region', values="Happiness Score",
                       aggfunc=[np.mean, remove_outliers])
table[table.index.str.contains('Asia')]
| | mean | remove_outliers |
|---|---|---|
| | Happiness Score | Happiness Score |
| Region | | |
| Eastern Asia | 5.632333 | 5.610125 |
| Southeastern Asia | 5.364077 | 5.470125 |
| Southern Asia | 4.590857 | 4.707500 |
Let’s see the results for `Europe`:
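The Europe filter is not shown explicitly; assuming it follows the same pattern as the `Asia` filter above, it could look like this. The region averages in the sketch are typed in from the earlier table rather than computed from `data.csv`.

```python
import pandas as pd

# Regional means copied from the earlier pivot table, indexed by region name
table = pd.DataFrame(
    {"Happiness Score": [5.371184, 5.632333, 4.590857, 6.693000]},
    index=pd.Index(
        ["Central and Eastern Europe", "Eastern Asia", "Southern Asia", "Western Europe"],
        name="Region",
    ),
)

# Boolean mask: True for index labels that contain the substring 'Europe'
europe = table[table.index.str.contains("Europe")]
print(europe)
```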
| | mean | remove_outliers |
|---|---|---|
| | Happiness Score | Happiness Score |
| Region | | |
| Central and Eastern Europe | 5.371184 | 5.44925 |
| Western Europe | 6.693000 | 6.84650 |
The results show that the gap in happiness score between the two European regions is larger than the gaps among the Asian regions. In most cases, removing outliers makes the score higher, but not in Eastern Asia.
If you’d like to extract specific values from more than one column, it’s better to use `df.query`, because the previous method won’t work for conditioning on multi-indexes. For example, we can choose to view specific years and specific regions in the Africa area.
table = pd.pivot_table(data, index=['Region', 'Year'], values='Happiness Score',
                       aggfunc=[np.mean, remove_outliers])
table.query('Year == [2015, 2017] and Region == ["Sub-Saharan Africa", "Middle East and Northern Africa"]')
| | | mean | remove_outliers |
|---|---|---|---|
| | | Happiness Score | Happiness Score |
| Region | Year | | |
| Middle East and Northern Africa | 2015 | 5.406900 | 5.515875 |
| | 2017 | 5.369684 | 5.425500 |
| Sub-Saharan Africa | 2015 | 4.202800 | 4.168375 |
| | 2017 | 4.111949 | 4.118000 |
In this example the differences are minor, but an interesting exercise would be to compare information from previous years since the survey has reports since 2012.
We’ve covered the most powerful parameters of `pivot_table` thus far, so you can already get a lot out of it if you experiment with this method on your own project. Having said that, it’s useful to quickly go through the remaining parameters (which are all optional and have default values). The first thing to talk about is missing values.
dropna is a boolean that indicates whether to exclude columns whose entries are all NaN (default: True).
fill_value is a scalar used to replace missing values (default: None).
We don't have any columns where all entries are NaN, but it's worth knowing that if we did, pivot_table would drop them by default, according to the dropna definition.
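Since our data has no all-NaN columns, the effect of dropna is easier to see on a tiny made-up frame. The sketch below assumes a hypothetical Score column in which every 2016 value is missing, and compares the default behaviour with dropna=False.

```python
import numpy as np
import pandas as pd

# Made-up data: every score recorded for 2016 is missing.
data = pd.DataFrame({
    "Region": ["A", "A", "B", "B"],
    "Year": [2015, 2016, 2015, 2016],
    "Score": [1.0, np.nan, 3.0, np.nan],
})

# Default (dropna=True): the all-NaN 2016 column is left out of the result.
with_drop = pd.pivot_table(data, index="Region", columns="Year",
                           values="Score", aggfunc="mean")

# dropna=False: the 2016 column is kept, filled with NaN.
without_drop = pd.pivot_table(data, index="Region", columns="Year",
                              values="Score", aggfunc="mean", dropna=False)

print(with_drop)
print(without_drop)
```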
We have been letting pivot_table treat our NaNs according to the default settings. The default fill_value is None, which means we didn't replace missing values in our data set. To demonstrate this, we'll need to produce a pivot table with NaN values. We can split the Happiness Score of each region into three quantiles and check how many countries fall into each of them (hoping at least one of the quantiles will have missing values in it).
To do this, we'll use qcut(), a built-in pandas function that lets you split your data into any number of quantiles you choose. For example, specifying pd.qcut(data["Happiness Score"], 4) will result in four quantiles:
Region | Happiness Score (bin) | Happiness Score (count) |
---|---|---|
Australia and New Zealand | (2.692, 4.509] | NaN |
Australia and New Zealand | (4.509, 5.283] | NaN |
Australia and New Zealand | (5.283, 6.234] | NaN |
Australia and New Zealand | (6.234, 7.587] | 6.0 |
Central and Eastern Europe | (2.692, 4.509] | 10.0 |
Central and Eastern Europe | (4.509, 5.283] | 28.0 |
Central and Eastern Europe | (5.283, 6.234] | 46.0 |
Central and Eastern Europe | (6.234, 7.587] | 3.0 |
Eastern Asia | (2.692, 4.509] | NaN |
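To see how qcut() assigns values to bins, here is a minimal sketch on eight made-up numbers rather than the happiness data. With four quantiles, each bin should receive the same number of values.

```python
import pandas as pd

# Made-up values 1..8 for illustration only.
values = pd.Series(range(1, 9))

# Split into four quantile-based bins; each value is labelled with its interval.
bins = pd.qcut(values, 4)

# Count how many values fall into each bin, in bin order.
counts = bins.value_counts().sort_index()
print(counts)
```

Because the bin edges are derived from the data's own quantiles, the intervals differ from data set to data set, which is why the edges in the tables here change when the number of quantiles changes.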
Regions with no countries in a specific quantile show NaN. This isn't ideal, because a count of NaN doesn't give us any useful information. It's less confusing to display 0, so let's replace NaN with zeros using fill_value:
# splitting the happiness score into 3 quantiles
score = pd.qcut(data["Happiness Score"], 3)
pd.pivot_table(data, index=['Region', score], values="Happiness Score", aggfunc='count',
               fill_value=0)
Region | Happiness Score (bin) | Happiness Score (count) |
---|---|---|
Australia and New Zealand | (2.692, 4.79] | 0 |
Australia and New Zealand | (4.79, 5.895] | 0 |
Australia and New Zealand | (5.895, 7.587] | 6 |
Central and Eastern Europe | (2.692, 4.79] | 15 |
Central and Eastern Europe | (4.79, 5.895] | 58 |
Central and Eastern Europe | (5.895, 7.587] | 14 |
Eastern Asia | (2.692, 4.79] | 0 |
Eastern Asia | (4.79, 5.895] | 11 |
Eastern Asia | (5.895, 7.587] | 7 |
Latin America and Caribbean | (2.692, 4.79] | 4 |
Latin America and Caribbean | (4.79, 5.895] | 19 |
Latin America and Caribbean | (5.895, 7.587] | 45 |
Middle East and Northern Africa | (2.692, 4.79] | 18 |
Middle East and Northern Africa | (4.79, 5.895] | 20 |
Middle East and Northern Africa | (5.895, 7.587] | 20 |
North America | (2.692, 4.79] | 0 |
North America | (4.79, 5.895] | 0 |
North America | (5.895, 7.587] | 6 |
Southeastern Asia | (2.692, 4.79] | 6 |
Southeastern Asia | (4.79, 5.895] | 12 |
Southeastern Asia | (5.895, 7.587] | 8 |
Southern Asia | (2.692, 4.79] | 13 |
Southern Asia | (4.79, 5.895] | 8 |
Southern Asia | (5.895, 7.587] | 0 |
Sub-Saharan Africa | (2.692, 4.79] | 101 |
Sub-Saharan Africa | (4.79, 5.895] | 16 |
Sub-Saharan Africa | (5.895, 7.587] | 0 |
Western Europe | (2.692, 4.79] | 0 |
Western Europe | (4.79, 5.895] | 12 |
Western Europe | (5.895, 7.587] | 51 |
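The effect of fill_value can also be checked on a toy example. The sketch below uses made-up data in which a hypothetical region B has no rows for 2016, and spreads Year across the columns so that the empty combination is guaranteed to appear as a missing cell.

```python
import pandas as pd

# Made-up data: region "B" has no row for 2016.
data = pd.DataFrame({
    "Region": ["A", "A", "A", "B"],
    "Year": [2015, 2016, 2016, 2015],
    "Country": ["a1", "a1", "a2", "b1"],
})

# Without fill_value, the missing (B, 2016) combination shows up as NaN.
counts = pd.pivot_table(data, index="Region", columns="Year",
                        values="Country", aggfunc="count")

# With fill_value=0, the same cell is reported as a count of zero.
counts_filled = pd.pivot_table(data, index="Region", columns="Year",
                               values="Country", aggfunc="count", fill_value=0)

print(counts)
print(counts_filled)
```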
The last two parameters are both optional and mostly useful for improving the display:
margins is a boolean that lets you add an All row / column, e.g. for subtotals / grand totals (default: False).
margins_name is a string that sets the name of the row / column that will contain the totals when margins is True (default: 'All').
Let's use these to add a total to our last table.
Region | Happiness Score (bin) | Happiness Score (count) |
---|---|---|
Australia and New Zealand | (2.692, 4.79] | 0.0 |
Australia and New Zealand | (4.79, 5.895] | 0.0 |
Australia and New Zealand | (5.895, 7.587] | 6.0 |
Central and Eastern Europe | (2.692, 4.79] | 15.0 |
Central and Eastern Europe | (4.79, 5.895] | 58.0 |
Central and Eastern Europe | (5.895, 7.587] | 14.0 |
Eastern Asia | (2.692, 4.79] | 0.0 |
Eastern Asia | (4.79, 5.895] | 11.0 |
Eastern Asia | (5.895, 7.587] | 7.0 |
Latin America and Caribbean | (2.692, 4.79] | 4.0 |
Latin America and Caribbean | (4.79, 5.895] | 19.0 |
Latin America and Caribbean | (5.895, 7.587] | 45.0 |
Middle East and Northern Africa | (2.692, 4.79] | 18.0 |
Middle East and Northern Africa | (4.79, 5.895] | 20.0 |
Middle East and Northern Africa | (5.895, 7.587] | 20.0 |
North America | (2.692, 4.79] | 0.0 |
North America | (4.79, 5.895] | 0.0 |
North America | (5.895, 7.587] | 6.0 |
Southeastern Asia | (2.692, 4.79] | 6.0 |
Southeastern Asia | (4.79, 5.895] | 12.0 |
Southeastern Asia | (5.895, 7.587] | 8.0 |
Southern Asia | (2.692, 4.79] | 13.0 |
Southern Asia | (4.79, 5.895] | 8.0 |
Southern Asia | (5.895, 7.587] | 0.0 |
Sub-Saharan Africa | (2.692, 4.79] | 101.0 |
Sub-Saharan Africa | (4.79, 5.895] | 16.0 |
Sub-Saharan Africa | (5.895, 7.587] | 0.0 |
Western Europe | (2.692, 4.79] | 0.0 |
Western Europe | (4.79, 5.895] | 12.0 |
Western Europe | (5.895, 7.587] | 51.0 |
Total count | | 470.0 |
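As a minimal sketch of how margins and margins_name fit together, the example below counts rows per region in a small made-up frame and appends a total row. The data and the label Total count are chosen for illustration; this is not necessarily the exact call behind the table above.

```python
import pandas as pd

# Made-up scores for illustration only; not real survey data.
data = pd.DataFrame({
    "Region": ["A", "A", "A", "B", "B"],
    "Happiness Score": [5.1, 6.3, 4.8, 7.0, 6.9],
})

# margins=True appends a totals row; margins_name sets its label.
table = pd.pivot_table(data, index="Region", values="Happiness Score",
                       aggfunc="count", margins=True,
                       margins_name="Total count")
print(table)
```

The extra row holds the count over the whole frame, so it should equal the sum of the per-region counts.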
Translated from: https://www.pybloggers.com/2017/09/explore-happiness-data-using-python-pivot-tables/