Lynda: Pandas Essential Training chap06-chap08

Lynda: Pandas Essential Training chap06-chap08

  • chap06
    • Index
    • set_index()
    • reset_index()
    • sort_index()
    • loc()
    • iloc()
  • chap07
    • groupby
    • functions
    • agg()
  • chap08
    • unstack()
    • stack()

chap06

Pandas包和数据的导入

import pandas as pd
oo = pd.read_csv('../data/olympics.csv',skiprows=4)

Index

Index是Pandasl两种数据类型Series和DataFrame的一个属性序列。如果为指定在创建数据类型时候会按照记录的顺序给予一个正整数序列。就是说,第一行Index的值是1,第二行是2,以此类推。使用type(oo.index)可以查看。
视频中提到了,index是不可以直接赋值oo.index[100] = 5这种操作会出现错误。
以下是三种对index进行操作的方法:

  • DataFrame.set_index(columnName,inplace = False)
  • DataFrame.reset_index(inplace = False)
  • DataFrame.sort_index(inplace = False, ascending = True)

set_index()

set_index()是一个内置在DataFrame/Series的一种方法。DataFrame.set_index(columnName,inplace = False)可以返回一个将对应column设置成为index的DataFrame。但是这种改变不是inplace的变化,对DataFrame本身没有影响。但是,如果传入inplace = True,那么这种改变是对DataFrame生效的。

reset_index()

DataFrame.reset_index(inplace = False)
这是在使用set_index()之后,进行undo的一步操作。之前被设置为index的column还原成普通的column而index依然是正整数序列。这个函数同样默认inplace = Fasle

sort_index()

DataFrame.sort_index(inplace = False, ascending = True)
这是将记录按照index进行排序的一个内置方法。

loc()

loc是按照索引进行查找的一个方式,支持单个索引,切片器操作和boolean indexing。
Lynda: Pandas Essential Training chap06-chap08_第1张图片

oo.set_index('Athlete',inplace=True)
oo.loc['BOLT, Usain']

出现的是,
Lynda: Pandas Essential Training chap06-chap08_第2张图片
这样的结果。
而,

oo.reset_index(inplace=True)
oo.loc[oo.Athlete == 'BOLT, Usain']

出现的是,
Lynda: Pandas Essential Training chap06-chap08_第3张图片

iloc()

iloc()则是最初的正整数顺序进行查找的一个方式,同样支持单个索引,切片器操作和boolean indexing。
Lynda: Pandas Essential Training chap06-chap08_第4张图片
oo.iloc[[1542,2390,6000,15000]]
Lynda: Pandas Essential Training chap06-chap08_第5张图片
oo.iloc[1:4]
在这里插入图片描述

chap07

group by 是来源于SQL中的分组操作。在pandas中DataFrame.groupby(‘colunms’) 会得到一个变异的数据类型:pandas.core.groupby.DataFrameGroupBy。

groupby

DataFrame.groupby([column list])可以得到按照不同的column值或者是columns值的组合进行分组的数据结构。

for group_key, group_value in oo.groupby('Edition'):
    print(group_key)
    print(group_value)

这个DataFrameGroupBy数据结构包含两个部分的内容(分组的key,组内元素),上面这个代码的运行结果中的第一组为:

1896
       City  Edition          Sport       Discipline  \
0    Athens     1896       Aquatics         Swimming   
1    Athens     1896       Aquatics         Swimming   
2    Athens     1896       Aquatics         Swimming   
3    Athens     1896       Aquatics         Swimming   
4    Athens     1896       Aquatics         Swimming   
5    Athens     1896       Aquatics         Swimming   
6    Athens     1896       Aquatics         Swimming   
7    Athens     1896       Aquatics         Swimming   
8    Athens     1896       Aquatics         Swimming   
9    Athens     1896       Aquatics         Swimming   
10   Athens     1896       Aquatics         Swimming   
11   Athens     1896      Athletics        Athletics   
12   Athens     1896      Athletics        Athletics   
13   Athens     1896      Athletics        Athletics   
14   Athens     1896      Athletics        Athletics   
15   Athens     1896      Athletics        Athletics   
16   Athens     1896      Athletics        Athletics   
17   Athens     1896      Athletics        Athletics   
18   Athens     1896      Athletics        Athletics   
19   Athens     1896      Athletics        Athletics   
20   Athens     1896      Athletics        Athletics   
21   Athens     1896      Athletics        Athletics   
22   Athens     1896      Athletics        Athletics   
23   Athens     1896      Athletics        Athletics   
24   Athens     1896      Athletics        Athletics   
25   Athens     1896      Athletics        Athletics   
26   Athens     1896      Athletics        Athletics   
27   Athens     1896      Athletics        Athletics   
28   Athens     1896      Athletics        Athletics   
29   Athens     1896      Athletics        Athletics   
..      ...      ...            ...              ...   
121  Athens     1896       Shooting         Shooting   
122  Athens     1896       Shooting         Shooting   
123  Athens     1896       Shooting         Shooting   
124  Athens     1896       Shooting         Shooting   
125  Athens     1896       Shooting         Shooting   
126  Athens     1896       Shooting         Shooting   
127  Athens     1896       Shooting         Shooting   
128  Athens     1896       Shooting         Shooting   
129  Athens     1896       Shooting         Shooting   
130  Athens     1896       Shooting         Shooting   
131  Athens     1896       Shooting         Shooting   
132  Athens     1896         Tennis           Tennis   
133  Athens     1896         Tennis           Tennis   
134  Athens     1896         Tennis           Tennis   
135  Athens     1896         Tennis           Tennis   
136  Athens     1896         Tennis           Tennis   
137  Athens     1896         Tennis           Tennis   
138  Athens     1896         Tennis           Tennis   
139  Athens     1896         Tennis           Tennis   
140  Athens     1896         Tennis           Tennis   
141  Athens     1896         Tennis           Tennis   
142  Athens     1896  Weightlifting    Weightlifting   
143  Athens     1896  Weightlifting    Weightlifting   
144  Athens     1896  Weightlifting    Weightlifting   
145  Athens     1896  Weightlifting    Weightlifting   
146  Athens     1896  Weightlifting    Weightlifting   
147  Athens     1896  Weightlifting    Weightlifting   
148  Athens     1896      Wrestling  Wrestling Gre-R   
149  Athens     1896      Wrestling  Wrestling Gre-R   
150  Athens     1896      Wrestling  Wrestling Gre-R   

                         Athlete  NOC Gender  \
0                  HAJOS, Alfred  HUN    Men   
1               HERSCHMANN, Otto  AUT    Men   
2              DRIVAS, Dimitrios  GRE    Men   
3             MALOKINIS, Ioannis  GRE    Men   
4             CHASAPIS, Spiridon  GRE    Men   
5          CHOROPHAS, Efstathios  GRE    Men   
6                  HAJOS, Alfred  HUN    Men   
7               ANDREOU, Joannis  GRE    Men   
8          CHOROPHAS, Efstathios  GRE    Men   
9                  NEUMANN, Paul  AUT    Men   
10             PEPANOS, Antonios  GRE    Men   
11                 LANE, Francis  USA    Men   
12              SZOKOLYI, Alajos  HUN    Men   
13                 BURKE, Thomas  USA    Men   
14                HOFMANN, Fritz  GER    Men   
15                CURTIS, Thomas  USA    Men   
16            GOULDING, Grantley  GBR    Men   
17             LERMUSIAUX, Albin  FRA    Men   
18                  FLACK, Edwin  AUS    Men   
19                 BLAKE, Arthur  USA    Men   
20               GMELIN, Charles  GBR    Men   
21                 BURKE, Thomas  USA    Men   
22              JAMISON, Herbert  USA    Men   
23            GOLEMIS, Dimitrios  GRE    Men   
24                  FLACK, Edwin  AUS    Men   
25                  DANI, Nandor  HUN    Men   
26              VERSIS, Sotirios  GRE    Men   
27               GARRETT, Robert  USA    Men   
28   PARASKEVOPOULOS, Panagiotis  GRE    Men   
29                 CLARK, Ellery  USA    Men   
..                           ...  ...    ...   
121         PHRANGOUDIS, Joannis  GRE    Men   
122         ORPHANIDIS, Georgios  GRE    Men   
123         PHRANGOUDIS, Joannis  GRE    Men   
124                PAINE, Sumner  USA    Men   
125              NIELSEN, Holger  DEN    Men   
126           TRIKUPIS, Nicolaos  GRE    Men   
127         KARASEVDAS, Pantelis  GRE    Men   
128             PAVLIDIS, Pavlos  GRE    Men   
129                JENSEN, Viggo  DEN    Men   
130         ORPHANIDIS, Georgios  GRE    Men   
131         PHRANGOUDIS, Joannis  GRE    Men   
132                 FLACK, Edwin  ZZX    Men   
133     ROBERTSON, George Stuart  ZZX    Men   
134                 BOLAND, John  ZZX    Men   
135             TRAUN, Friedrich  ZZX    Men   
136         KASDAGLIS, Dionysios  ZZX    Men   
137     PETROKOKKINOS, Demetrios  ZZX    Men   
138       PASPATIS, Konstantinos  GRE    Men   
139         TAPAVICZA, Momcsillo  HUN    Men   
140                 BOLAND, John  GBR    Men   
141         KASDAGLIS, Dionysios  GRE    Men   
142     NIKOLOPOULOS, Alexandros  GRE    Men   
143          ELLIOTT, Launceston  GBR    Men   
144                JENSEN, Viggo  DEN    Men   
145             VERSIS, Sotirios  GRE    Men   
146                JENSEN, Viggo  DEN    Men   
147          ELLIOTT, Launceston  GBR    Men   
148     CHRISTOPOULOS, Stephanos  GRE    Men   
149               SCHUMANN, Carl  GER    Men   
150             TSITAS, Georgios  GRE    Men   

其中,group_value是一个DataFrame类型。

functions

对于分组中的数值变量,我们可以利用一些函数来进行描述性统计。可用的函数包括,max, min, mean, sum, size等,其中key值就是生成结果的index。
oo.groupby('Edition').size()
可以计算每届奥运会的奖牌数。

agg()

上面用到的function只能进行一种统计变量的计算,利用agg()可以进行制定数据列的指定描述性统计(或者其他函数的计算)。

oo.groupby(['Edition','NOC','Medal']).agg(['min','max','count'])

这段代码指定对所有,除了groupby中的数据列,都进行分组的求最小,最大,计数等描述性统计。

oo.groupby(['Edition','NOC','Medal']).agg({'Edition' :['min','max','count']})

下面的代码则以字典的形式指定对‘Edition’列进行分组的min max count的函数运算。

chap08

紧接着上一节讨论groupby的操作。unstack()这个操作是将选择了多个group_key值的一些key值维度放在行上面的操作。可以类比的是Excel的pivot操作,首先你把很多的字段放在了row这个选项中,然后在把某些字段移动到column那边去。unstack()做的就是这个操作。按照视频里面的原话来说就是把数据表变扁平。

unstack()

mw = oo[(oo.Edition == 2008) & ( (oo.Event == '100m') | (oo.Event == '200m'))]
## 回顾一下以前的进行数据筛选的操作
g = mw.groupby(['NOC','Gender','Discipline','Event']).size()
## groupby得到统计计数的表格

g的结构是:
Lynda: Pandas Essential Training chap06-chap08_第6张图片

df = g.unstack(['Discipline','Event'])

df的结构是:
Lynda: Pandas Essential Training chap06-chap08_第7张图片
看到Event以及转到行上面去了。
!注意这里有一个NaN的数据,对此,我们可以在unstack函数值加入fill_value = 0

stack()

stack()是unstack()的逆向操作。
df.stack()
会得到:
Lynda: Pandas Essential Training chap06-chap08_第8张图片
这里会把最后堆叠起来的Event展开到列。有点像pandas里面的melt方法。

你可能感兴趣的:(lynda,python,pandas,基础,learning,notes)