Datawhale Task02: pandas Basics Check-in

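As a beginner, I got a lot out of this task. It was a lot to take in at once, so I will come back to the exercises later. P.S. The authors of the tutorial are really impressive; hats off to them.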

import numpy as np
import pandas as pd 
pd.__version__
'1.1.5'

2.1 Reading and writing files

2.1.1 Reading files

df_csv = pd.read_csv(r'D:\Ajupyter\iris.csv'
                     ,usecols = ['Sepal_Length','Sepal_Width']
                     ,index_col = ['Sepal_Length']
                    )
df_csv
Sepal_Width
Sepal_Length
5.1 3.5
4.9 3.0
4.7 3.2
4.6 3.1
5.0 3.6
... ...
6.7 3.0
6.3 2.5
6.5 3.0
6.2 3.4
5.9 3.0

150 rows × 1 columns

df_txt = pd.read_csv(r'C:\Users\86198\Desktop\数据\平台查询的企业.txt')
df_txt
企业名
0 北京爱钱帮财富科技有限公司
1 烟台艾利互金网络信息服务有限公司
2 成都伟品信息技术服务有限公司
3 北京朴素磐石投资管理有限公司
4 宝蓝财富科技有限公司
... ...
126 北京中金丰联信息技术股份有限公司
127 广州中青金服互联网金融信息服务有限公司
128 上海顽色投资管理有限公司
129 杭州上陈金融服务外包有限公司
130 杭州飞牛科技有限公司

131 rows × 1 columns

df_excel = pd.read_excel(r'C:\Users\86198\Desktop\数据\批量查询_859.xls',parse_dates = ['成立日期'],nrows = 3)
df_excel
企业名称 登记状态 法定代表人 注册资本 成立日期 核准日期 所属省份 所属城市 所属区县 电话 ... 注册号 组织机构代码 参保人数 企业类型 所属行业 曾用名 网址 企业地址 最新年报地址 经营范围
0 宝蓝财富(天津)科技有限公司 存续 胡德荣 5500万元人民币 2014-04-04 2018-04-04 天津市 天津市 滨海新区 022-23757986 ... 120193000087395 09366780-7 43 有限责任公司 科技推广和应用服务业 - http://www.batiaoyu.com 天津滨海高新区华苑产业区(环外)海泰创新六路2号3-2-601 天津市河西区解放南路与浯水道交口喜年广场5-201 软件技术开发、咨询、服务、转让;商务信息咨询;计算机系统集成;财务咨询;企业管理咨询;批发和...
1 深圳市兴荣欣科技有限责任公司 存续 汪超 50万元人民币 2014-09-01 2020-08-04 广东省 深圳市 龙岗区 13802572770 ... 440306111217767 31197532-3 - 有限责任公司 批发业 - - 深圳市龙岗区南湾街道南岭村社区南园路4号文峰华庭1栋B座210 深圳市光明新区公明街道长圳村长圳路西八巷10号 一般经营项目是:LED灯饰的销售;国内贸易;货物及技术进出口;软件的开发与销售;游戏软件的开...
2 北京顺信益信息技术有限公司 存续 王维虎 10000万元人民币 2015-07-13 2018-06-01 北京市 北京市 海淀区 010-56855134 ... 110108019480505 34438049-X 12 有限责任公司(自然人投资或控股) 软件和信息技术服务业 - - 北京市海淀区万寿路甲12号北京万寿宾馆B座北侧三层1325 北京市海淀区马甸东路19号15层1815 技术开发、技术服务、技术咨询、技术推广;销售机械设备、电子产品、工艺品;经济贸易咨询;企业策...

3 rows × 25 columns

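Summary: header=None means the first row is not used as column names; usecols selects which columns to read (all columns by default); nrows is the number of data rows to read; index_col uses one or more columns as the index; parse_dates converts the listed columns to datetime columns.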
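To make the summary above concrete, here is a minimal sketch that exercises each parameter on a small in-memory CSV. The data and the names `raw`, `df_demo_read`, and `no_header` are made up for illustration; they are not the files used in this post.

```python
import io
import pandas as pd

# Made-up CSV text standing in for a real file on disk.
raw = io.StringIO(
    "name,founded,capital\n"
    "A,2014-04-04,5500\n"
    "B,2014-09-01,50\n"
    "C,2015-07-13,10000\n"
    "D,2016-01-01,300\n"
)

df_demo_read = pd.read_csv(
    raw,
    usecols=["name", "founded"],  # read only these columns
    index_col="name",             # use this column as the row index
    parse_dates=["founded"],      # convert this column to a datetime column
    nrows=3,                      # read only the first 3 data rows
)
print(df_demo_read)
print(df_demo_read.dtypes)

# header=None: the first line is treated as data and columns are numbered 0, 1, ...
no_header = pd.read_csv(io.StringIO("1,2\n3,4\n"), header=None)
print(no_header.columns.tolist())
```

Reading from `io.StringIO` keeps the example independent of any local path; the same keyword arguments apply when a file path is passed instead.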

pd.read_table(r'C:\Users\86198\Desktop\joyful-pandas-master\data\my_table_special_sep.txt')

col1 |||| col2
0 TS |||| This is an apple.
1 GQ |||| My name is Bob.
2 WT |||| Well done!
3 PT |||| May I help you?
pd.read_table(r'C:\Users\86198\Desktop\joyful-pandas-master\data\my_table_special_sep.txt'
             ,sep = r'\|\|\|\|',engine = 'python') # sep is interpreted as a regular expression, so use a raw string
col1 col2
0 TS This is an apple.
1 GQ My name is Bob.
2 WT Well done!
3 PT May I help you?
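The regex separator can be tried without the tutorial file. The sketch below uses a made-up two-row text in the same layout; the second call, which also swallows the whitespace around the separator, is my own variation and not part of the tutorial.

```python
import io
import pandas as pd

# Made-up text in the same "col1 |||| col2" layout as the file above.
text = (
    "col1 |||| col2\n"
    "TS |||| This is an apple.\n"
    "GQ |||| My name is Bob.\n"
)

# Four literal pipes; each "|" is escaped because it means "or" in a regex.
df_a = pd.read_table(io.StringIO(text), sep=r"\|\|\|\|", engine="python")
print(df_a.columns.tolist())  # spaces around the separator may remain in the names

# Variation: let the separator also consume the surrounding whitespace.
df_b = pd.read_table(io.StringIO(text), sep=r"\s*\|\|\|\|\s*", engine="python")
print(df_b.columns.tolist())
```

The `engine='python'` argument matters here because multi-character regex separators are handled by the Python parsing engine.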

2.1.2 Writing data

df_csv.to_csv(r'C:\Users\86198\Desktop\joyful-pandas-master\data\my_csv_saved.csv',index = None)  # these write to my own files
df_excel.to_excel(r'C:\Users\86198\Desktop\joyful-pandas-master\data\my_excel_saved.xlsx',index = None)  
df_txt.to_csv(r'C:\Users\86198\Desktop\joyful-pandas-master\data\my_txt_saved.txt',index = None)
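As a quick check of what dropping the index does when writing, the following sketch writes a tiny made-up frame to an in-memory buffer and reads it back. The names `df_small`, `buf`, and `df_back` are hypothetical, and `index=False` is used here as the more conventional spelling of `index = None`.

```python
import io
import pandas as pd

df_small = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Write without the row index, then read the text back.
buf = io.StringIO()
df_small.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)

df_back = pd.read_csv(io.StringIO(csv_text))
print(df_back)

# With the default index=True, the index becomes an extra unnamed first column.
with_index = df_small.to_csv()  # returns a string when no path is given
print(with_index)
```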

2.2 Basic data structures

2.2.1 Series

s = pd.Series(data = [100,'a',{
     '阿信':'最帅'}]   # my_index is the name of the index; name is the name of the Series
              ,index = pd.Index(['id1',20,'third'],name = 'my_index')
              ,dtype = 'object'
              ,name = 'my_name'
              
)
s   
my_index
id1               100
20                  a
third    {'阿信': '最帅'}
Name: my_name, dtype: object
s.values
array([100, 'a', {'阿信': '最帅'}], dtype=object)
s.dtype
dtype('O')
s.index
Index(['id1', 20, 'third'], dtype='object', name='my_index')
s.name
'my_name'
s.shape
(3,)

2.2.2 DataFrame

data = [[1,'a',1.2],[2,'b',2.2],[3,'c',3.2]]
df = pd.DataFrame(data = data
                 ,index = ['row_%d'%i for i in range(3)]  # build the row index labels
                 ,columns=['col_0','col_1','col_2'])
df
col_0 col_1 col_2
row_0 1 a 1.2
row_1 2 b 2.2
row_2 3 c 3.2
df = pd.DataFrame(data = {
     'col_0':[1,2,3],'col_1':list('abc'),'col_2':[1.2,2.2,3.2]}
                 ,index = ['row_%d'%i for i in range(3)])
df
col_0 col_1 col_2
row_0 1 a 1.2
row_1 2 b 2.2
row_2 3 c 3.2
df['col_0']
row_0    1
row_1    2
row_2    3
Name: col_0, dtype: int64
df[['col_0','col_1']]

col_0 col_1
row_0 1 a
row_1 2 b
row_2 3 c
df.values
array([[1, 'a', 1.2],
       [2, 'b', 2.2],
       [3, 'c', 3.2]], dtype=object)
df.index
Index(['row_0', 'row_1', 'row_2'], dtype='object')
df.columns
Index(['col_0', 'col_1', 'col_2'], dtype='object')
df.dtypes
col_0      int64
col_1     object
col_2    float64
dtype: object
df.shape
(3, 3)
df.T
row_0 row_1 row_2
col_0 1 2 3
col_1 a b c
col_2 1.2 2.2 3.2

2.3 Common basic functions

import pandas as pd
df = pd.read_csv('joyful-pandas-master/data/learn_pandas.csv')
df
School Grade Name Gender Height Weight Transfer Test_Number Test_Date Time_Record
0 Shanghai Jiao Tong University Freshman Gaopeng Yang Female 158.9 46.0 N 1 2019/10/5 0:04:34
1 Peking University Freshman Changqiang You Male 166.5 70.0 N 1 2019/9/4 0:04:20
2 Shanghai Jiao Tong University Senior Mei Sun Male 188.9 89.0 N 2 2019/9/12 0:05:22
3 Fudan University Sophomore Xiaojuan Sun Female NaN 41.0 N 2 2020/1/3 0:04:08
4 Fudan University Sophomore Gaojuan You Male 174.0 74.0 N 2 2019/11/6 0:05:22
... ... ... ... ... ... ... ... ... ... ...
195 Fudan University Junior Xiaojuan Sun Female 153.9 46.0 N 2 2019/10/17 0:04:31
196 Tsinghua University Senior Li Zhao Female 160.9 50.0 N 3 2019/9/22 0:04:03
197 Shanghai Jiao Tong University Senior Chengqiang Chu Female 153.9 45.0 N 1 2020/1/5 0:04:48
198 Shanghai Jiao Tong University Senior Chengmei Shen Male 175.3 71.0 N 2 2020/1/7 0:04:58
199 Tsinghua University Sophomore Chunpeng Lv Male 155.7 51.0 N 1 2019/11/6 0:05:05

200 rows × 10 columns

df.columns
Index(['School', 'Grade', 'Name', 'Gender', 'Height', 'Weight', 'Transfer',
       'Test_Number', 'Test_Date', 'Time_Record'],
      dtype='object')
#df = df[df.columns[:7]]
df.columns[:7]  # names of the first 7 columns
Index(['School', 'Grade', 'Name', 'Gender', 'Height', 'Weight', 'Transfer'], dtype='object')
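df[df.columns[:7]] # select the first 7 columns by name; assigning the result back (df = df[df.columns[:7]]) keeps only these columns, as in the outputs that follow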
School Grade Name Gender Height Weight Transfer
0 Shanghai Jiao Tong University Freshman Gaopeng Yang Female 158.9 46.0 N
1 Peking University Freshman Changqiang You Male 166.5 70.0 N
2 Shanghai Jiao Tong University Senior Mei Sun Male 188.9 89.0 N
3 Fudan University Sophomore Xiaojuan Sun Female NaN 41.0 N
4 Fudan University Sophomore Gaojuan You Male 174.0 74.0 N
... ... ... ... ... ... ... ...
195 Fudan University Junior Xiaojuan Sun Female 153.9 46.0 N
196 Tsinghua University Senior Li Zhao Female 160.9 50.0 N
197 Shanghai Jiao Tong University Senior Chengqiang Chu Female 153.9 45.0 N
198 Shanghai Jiao Tong University Senior Chengmei Shen Male 175.3 71.0 N
199 Tsinghua University Sophomore Chunpeng Lv Male 155.7 51.0 N

200 rows × 7 columns

2.3.1 Summary functions

df.head(2)
School Grade Name Gender Height Weight Transfer
0 Shanghai Jiao Tong University Freshman Gaopeng Yang Female 158.9 46.0 N
1 Peking University Freshman Changqiang You Male 166.5 70.0 N
df.tail(3)
School Grade Name Gender Height Weight Transfer
197 Shanghai Jiao Tong University Senior Chengqiang Chu Female 153.9 45.0 N
198 Shanghai Jiao Tong University Senior Chengmei Shen Male 175.3 71.0 N
199 Tsinghua University Sophomore Chunpeng Lv Male 155.7 51.0 N
df.head()
School Grade Name Gender Height Weight Transfer
0 Shanghai Jiao Tong University Freshman Gaopeng Yang Female 158.9 46.0 N
1 Peking University Freshman Changqiang You Male 166.5 70.0 N
2 Shanghai Jiao Tong University Senior Mei Sun Male 188.9 89.0 N
3 Fudan University Sophomore Xiaojuan Sun Female NaN 41.0 N
4 Fudan University Sophomore Gaojuan You Male 174.0 74.0 N
df.tail()
School Grade Name Gender Height Weight Transfer
195 Fudan University Junior Xiaojuan Sun Female 153.9 46.0 N
196 Tsinghua University Senior Li Zhao Female 160.9 50.0 N
197 Shanghai Jiao Tong University Senior Chengqiang Chu Female 153.9 45.0 N
198 Shanghai Jiao Tong University Senior Chengmei Shen Male 175.3 71.0 N
199 Tsinghua University Sophomore Chunpeng Lv Male 155.7 51.0 N

Summary: head and tail return the first n and last n rows respectively; if n is not given, they return the first 5 or last 5 rows.

df.info()

RangeIndex: 200 entries, 0 to 199
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   School    200 non-null    object 
 1   Grade     200 non-null    object 
 2   Name      200 non-null    object 
 3   Gender    200 non-null    object 
 4   Height    183 non-null    float64
 5   Weight    189 non-null    float64
 6   Transfer  188 non-null    object 
dtypes: float64(2), object(5)
memory usage: 11.1+ KB
df.describe()
Height Weight
count 183.000000 189.000000
mean 163.218033 55.015873
std 8.608879 12.824294
min 145.400000 34.000000
25% 157.150000 46.000000
50% 161.900000 51.000000
75% 167.500000 65.000000
max 193.900000 89.000000

Summary: info returns basic information about the table, and describe returns the main summary statistics of the numeric columns.

2.3.2 Descriptive statistics functions

df_demo = df[['Height','Weight']]
df_demo.mean() # mean
Height    163.218033
Weight     55.015873
dtype: float64
df_demo.max() # maximum
Height    193.9
Weight     89.0
dtype: float64
df_demo.quantile() # quantile; the default q=0.5 is the median
Height    161.9
Weight     51.0
Name: 0.5, dtype: float64
df_demo.quantile(0.75) # upper quartile
Height    167.5
Weight     65.0
Name: 0.75, dtype: float64
df_demo.count() # number of non-missing values
Height    183
Weight    189
dtype: int64
df_demo.idxmax()  # index label of the maximum value
Height    193
Weight      2
dtype: int64
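# Aggregation functions take an axis argument: the default axis=0 aggregates each column, axis=1 aggregates each row
df_demo.mean(axis = 1).head()  # the mean of one student's height and weight, which is not a meaningful quantity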
0    102.45
1    118.25
2    138.95
3     41.00
4    124.00
dtype: float64
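The effect of `axis` is easier to see on a frame small enough to check by hand. This is a sketch on three made-up rows (the name `demo` is hypothetical); it also shows that missing values are skipped by default, which is why a row with a missing Height still gets a mean.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "Height": [158.9, 166.5, np.nan],
    "Weight": [46.0, 70.0, 41.0],
})

# axis=0 (default): one result per column, NaN skipped.
col_means = demo.mean()
print(col_means)

# axis=1: one result per row; the last row averages only the value that is present.
row_means = demo.mean(axis=1)
print(row_means)
```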

2.3.3 Unique-value functions

df['School'].unique()
array(['Shanghai Jiao Tong University', 'Peking University',
       'Fudan University', 'Tsinghua University'], dtype=object)
df['School'].nunique()
4

Summary: unique returns the distinct values in a column (which categories occur), and nunique returns how many distinct values there are.

df['School'].value_counts()  # value_counts returns each distinct value with its frequency
Tsinghua University              69
Shanghai Jiao Tong University    57
Fudan University                 40
Peking University                34
Name: School, dtype: int64
df_demo = df[['Gender','Transfer','Name']]
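df_demo.drop_duplicates(['Gender','Transfer'],keep = 'last')  # drop duplicates by the combination of Gender and Transfer, keeping the last row of each combination; it runs on df_demo, so the Name column stays in the result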
Gender Transfer Name
147 Male NaN Juan You
150 Male Y Chengpeng You
169 Female Y Chengquan Qin
194 Female NaN Yanmei Qian
197 Female N Chengqiang Chu
199 Male N Chunpeng Lv
df_demo.drop_duplicates(['Name','Gender'],keep = False).head(100)  # keep=False drops every row that has a duplicate
Gender Transfer Name
0 Female N Gaopeng Yang
1 Male N Changqiang You
2 Male N Mei Sun
4 Male N Gaojuan You
5 Female N Xiaoli Qian
... ... ... ...
115 Female N Gaofeng Sun
116 Male N Feng Zhao
117 Male N Chunli Zhao
119 Female N Peng Zhang
120 Female NaN Peng Han

100 rows × 3 columns

df['School'].drop_duplicates() # also works on a Series
0    Shanghai Jiao Tong University
1                Peking University
3                 Fudan University
5              Tsinghua University
Name: School, dtype: object
df_demo.duplicated(['Gender','Transfer']).head(100) # True marks a row that repeats an earlier one, False otherwise; drop_duplicates drops the rows marked True and returns the rest
0     False
1     False
2      True
3      True
4      True
      ...  
95     True
96     True
97     True
98     True
99     True
Length: 100, dtype: bool
df['School'].duplicated().head(100)  # also works on a Series
0     False
1     False
2      True
3     False
4      True
      ...  
95     True
96     True
97     True
98     True
99     True
Name: School, Length: 100, dtype: bool
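The three `keep` options and their relation to `duplicated` can be compared side by side on a tiny frame. The sketch below uses four made-up rows (the name `toy` and its values are hypothetical), small enough to verify by eye.

```python
import pandas as pd

toy = pd.DataFrame({
    "Gender":   ["F", "M", "M", "F"],
    "Transfer": ["N", "N", "N", "Y"],
    "Name":     ["A", "B", "C", "D"],
})
keys = ["Gender", "Transfer"]

first = toy.drop_duplicates(keys)                   # keep='first' is the default
last = toy.drop_duplicates(keys, keep="last")       # keep the last row of each combination
no_dups = toy.drop_duplicates(keys, keep=False)     # drop every row that has a duplicate

# duplicated marks rows that repeat an earlier combination.
dup = toy.duplicated(keys)

print(first.index.tolist(), last.index.tolist(), no_dups.index.tolist())
print(dup.tolist())

# Filtering with the negated mask gives the same rows as the default drop_duplicates.
same = toy[~dup].equals(first)
print(same)
```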

2.3.4 Replacement functions

Summary: mapping replacement (replace), conditional replacement (where, mask), and numeric replacement (abs, clip, round).

df['Gender'].replace({
     'Female':0,'Male':1}).head(100)
0     0
1     1
2     1
3     0
4     1
     ..
95    1
96    0
97    0
98    1
99    1
Name: Gender, Length: 100, dtype: int64
df['Gender'].replace(['Female','Male'],[0,1]).head(100)
0     0
1     1
2     1
3     0
4     1
     ..
95    1
96    0
97    0
98    1
99    1
Name: Gender, Length: 100, dtype: int64
s = pd.Series(['a',1,'b',2,1,1,'a'])
s
0    a
1    1
2    b
3    2
4    1
5    1
6    a
dtype: object
s.replace([1,2],method='ffill') # replace 1 and 2 with the nearest preceding value that is not itself being replaced: index 1 gets 'a', indexes 3 to 5 get 'b'
0    a
1    a
2    b
3    b
4    b
5    b
6    a
dtype: object
s.replace([1,2],method='bfill') # same idea with the nearest following value: index 1 gets 'b', indexes 3 to 5 get 'a'
0    a
1    b
2    b
3    a
4    a
5    a
6    a
dtype: object
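The forward-fill and backward-fill replacement can also be spelled out in two explicit steps: blank out the values to be replaced, then fill. This is a sketch of an equivalent formulation, not the tutorial's method; as far as I know, the `method` argument of `replace` is deprecated in recent pandas releases, so this form may be the safer one on newer versions.

```python
import pandas as pd

s_mixed = pd.Series(['a', 1, 'b', 2, 1, 1, 'a'])

# Step 1: turn every 1 or 2 into a missing value.
blanked = s_mixed.mask(s_mixed.isin([1, 2]))

# Step 2: fill each gap from the nearest surviving neighbour.
ffilled = blanked.ffill()  # take the previous kept value
bfilled = blanked.bfill()  # take the next kept value

print(ffilled.tolist())
print(bfilled.tolist())
```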
# Conditional replacement
s = pd.Series([-1,2,100,-50])
s.where(s<0) # replaces where the condition is False (with NaN by default)
0    -1.0
1     NaN
2     NaN
3   -50.0
dtype: float64
s.where(s<0,100)
0     -1
1    100
2    100
3    -50
dtype: int64
s.mask(s<0) # replaces where the condition is True (with NaN by default)
0      NaN
1      2.0
2    100.0
3      NaN
dtype: float64
s.mask(s<0,10)
0     10
1      2
2    100
3     10
dtype: int64
s_condition = pd.Series([True,False,False,True],index = s.index)
s.mask(s_condition,-50)  # where s_condition is True the value becomes -50; where it is False the original value is kept
0    -50
1      2
2    100
3    -50
dtype: int64
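Since `where` and `mask` are mirror images, one can be written in terms of the other by negating the condition. The sketch below checks this on the same four numbers and, as an aside of my own, compares with `numpy.where`, which takes both branches explicitly.

```python
import numpy as np
import pandas as pd

s_num = pd.Series([-1, 2, 100, -50])
cond = s_num < 0

kept_neg = s_num.where(cond, 100)   # keep where cond is True, else 100
same_thing = s_num.mask(~cond, 100) # replace where ~cond is True with 100

print(kept_neg.tolist())
print(kept_neg.equals(same_thing))

# numpy.where(condition, value_if_true, value_if_false) returns a plain array.
arr = np.where(cond, s_num, 100)
print(arr.tolist())
```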
# Numeric replacement
s = pd.Series([-1,3.5515,100,-50])
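s.round() # the argument is the number of decimal places; the default 0 rounds to the nearest integer (exact halves go to the nearest even value, as in NumPy)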
0     -1.0
1      4.0
2    100.0
3    -50.0
dtype: float64
s.abs()
0      1.0000
1      3.5515
2    100.0000
3     50.0000
dtype: float64
s
0     -1.0000
1      3.5515
2    100.0000
3    -50.0000
dtype: float64
s.clip(0,5) # values below 0 become 0, values above 5 become 5
0    0.0000
1    3.5515
2    5.0000
3    0.0000
dtype: float64

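Parameters: lower : float or array_like, default None

Minimum threshold. All values below this threshold are set to it.

upper : float or array_like, default None

Maximum threshold. All values above this threshold are set to it.

axis : int or str axis name, optional

Align the object with lower and upper along the given axis.

inplace : bool, default False

Whether to perform the operation in place on the data.

Returns:

Series or DataFrame

Same type as the calling object, with values outside the clip boundaries replaced.

Reference: https://www.cjavapy.com/article/330/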

import pandas as pd
data = {
     'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}  
df = pd.DataFrame(data)
df
col_0 col_1
0 9 -2
1 -3 -7
2 0 6
3 -1 8
4 5 -5

t = pd.Series([2, -4, -1, 6, 3])
t
0    2
1   -4
2   -1
3    6
4    3
dtype: int64
df
col_0 col_1
0 9 -2
1 -3 -7
2 0 6
3 -1 8
4 5 -5
df.clip(t, t + 4, axis=0)  # an example found online that puzzled me at first: with axis=0, row i is clipped to the range [t[i], t[i] + 4]
col_0 col_1
0 6 2
1 -3 -4
2 0 3
3 6 8
4 5 3
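One way to make sense of the `clip` call with Series bounds is to redo it by hand: with `axis=0` the bounds are aligned with the row index, so every value in row i is limited to the interval from `t[i]` to `t[i] + 4`. The sketch below rebuilds the same frame and checks the manual computation against `clip`; the names `df_clip` and `manual` are mine.

```python
import pandas as pd

df_clip = pd.DataFrame({'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]})
t = pd.Series([2, -4, -1, 6, 3])

result = df_clip.clip(t, t + 4, axis=0)

# Manual version: clamp each value between its row's lower and upper bound.
manual = df_clip.copy()
for col in manual.columns:
    manual[col] = [
        min(max(value, lower), upper)
        for value, lower, upper in zip(df_clip[col], t, t + 4)
    ]

print(result)
print(result.values.tolist() == manual.values.tolist())
```

For example, row 0 has bounds 2 and 6, so 9 is pulled down to 6 and -2 is pulled up to 2; row 3 has bounds 6 and 10, so -1 becomes 6 while 8 is left alone.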

2.3.5 Sorting functions

df = pd.read_csv('joyful-pandas-master/data/learn_pandas.csv')
df_demo = df[['Grade','Name','Height','Weight']].set_index(['Grade','Name'])
df_demo.sort_values('Height').head()
Height Weight
Grade Name
Junior Xiaoli Chu 145.4 34.0
Senior Gaomei Lv 147.3 34.0
Sophomore Peng Han 147.8 34.0
Senior Changli Lv 148.7 41.0
Sophomore Changjuan You 150.5 40.0
df_demo.sort_values('Height',ascending = False).head()
Height Weight
Grade Name
Senior Xiaoqiang Qin 193.9 79.0
Mei Sun 188.9 89.0
Gaoli Zhao 186.5 83.0
Freshman Qiang Han 185.3 87.0
Senior Qiang Zheng 183.9 87.0
df_demo.sort_values(['Weight','Height'],ascending=[True,False]).head(100)  # Weight ascending; within equal Weight, Height descending

Height Weight
Grade Name
Sophomore Peng Han 147.8 34.0
Senior Gaomei Lv 147.3 34.0
Junior Xiaoli Chu 145.4 34.0
Sophomore Qiang Zhou 150.5 36.0
Freshman Yanqiang Xu 152.4 38.0
Qiang Han 151.8 38.0
Senior Chengpeng Zheng 151.7 38.0
Sophomore Mei Xu 154.2 39.0
Freshman Xiaoquan Sun 154.6 40.0
Sophomore Qiang Sun 154.3 40.0
Senior Juan You 154.0 40.0
Sophomore Changjuan You 150.5 40.0
Senior Yanli Zhang 154.2 41.0
Changli Lv 148.7 41.0
Sophomore Xiaojuan Sun NaN 41.0
Freshman Gaojuan Qin NaN 41.0
Gaoquan Sun 156.8 42.0
Senior Xiaopeng Chu 156.5 42.0
Junior Qiang Lv 152.1 42.0
Sophomore Xiaoqiang Feng 157.0 43.0
Junior Gaoqiang Zhou 156.8 43.0
Freshman Xiaoli Xu 156.5 43.0
Sophomore Feng Qian 156.4 43.0
Freshman Quan Chu 154.7 43.0
Junior Xiaoquan Lv 153.2 43.0
Freshman Qiang Zhang 152.7 43.0
Gaofeng Zhao 152.2 43.0
Sophomore Changmei Xu 151.6 43.0
Freshman Chunmei Wang 151.2 43.0
Feng Yang 158.9 44.0
Senior Quan Xu 157.0 44.0
Junior Mei Zhang 156.5 44.0
Chengpeng Zhao 156.0 44.0
Freshman Li Lv 155.2 44.0
Junior Gaojuan Qian 154.8 44.0
Chunmei Han 153.2 44.0
Sophomore Chunpeng Shi 152.9 44.0
Senior Gaojuan Zhao 151.5 44.0
Junior Yanpeng Han NaN 44.0
Freshman Changquan Chu 159.6 45.0
Junior Xiaofeng You 158.5 45.0
Sophomore Xiaoquan Zhang 158.3 45.0
Senior Chengqiang Chu 153.9 45.0
Freshman Xiaoli Lv 152.5 45.0
Junior Xiaofeng Zhao 159.9 46.0
Freshman Gaopeng Yang 158.9 46.0
Gaoli Feng 157.4 46.0
Senior Changmei Sun 155.3 46.0
Xiaopeng Qian 154.3 46.0
Junior Xiaojuan Sun 153.9 46.0
Changjuan You 161.4 47.0
Freshman Chengquan Chu 161.3 47.0
Senior Juan Zhao 161.2 47.0
Junior Yanli Zhang 160.6 47.0
Senior Juan Zhang 159.9 47.0
Chunjuan Xu 159.8 47.0
Junior Chunjuan Zhang 158.9 47.0
Senior Xiaopeng Lv 158.4 47.0
Sophomore Xiaomei Shi 157.9 47.0
Senior Juan Qin 156.0 47.0
Gaoli Wu 155.7 47.0
Feng Zhou 155.6 47.0
Freshman Changli Zhang 163.0 48.0
Gaopeng Shi 162.9 48.0
Junior Gaofeng Sun 162.8 48.0
Sophomore Yanfeng Qian 160.1 48.0
Junior Qiang Wang 157.5 48.0
Gaoli Xu 157.3 48.0
Senior Peng You NaN 48.0
Junior Yanli You NaN 48.0
Freshman Yanjuan Han 163.7 49.0
Senior Feng Zheng 162.6 49.0
Junior Xiaojuan Zhao 160.3 49.0
Senior Yanmei Qian 160.3 49.0
Junior Changjuan Xu 159.6 49.0
Yanjuan Lv 159.3 49.0
Freshman Xiaomei Yang 159.3 49.0
Xiaofeng Qian 158.5 49.0
Changqiang Yang 156.0 49.0
Senior Qiang Chu 162.4 50.0
Junior Gaoqiang Qian 161.9 50.0
Senior Mei Zheng 161.1 50.0
Li Zhao 160.9 50.0
Xiaojuan Qian 160.6 50.0
Junior Mei Sun 159.5 50.0
Senior Quan Qian 159.0 50.0
Junior Feng Zheng 165.6 51.0
Li Chu 165.2 51.0
Xiaojuan Qian 164.7 51.0
Freshman Li Wu 164.3 51.0
Yanqiang Feng 162.3 51.0
Junior Chengquan Shi 160.8 51.0
Xiaopeng Zhou 160.2 51.0
Feng Zhao 159.0 51.0
Freshman Xiaoli Qian 158.0 51.0
Junior Gaoquan Shen 158.0 51.0
Sophomore Chunpeng Lv 155.7 51.0
Senior Mei Feng NaN 51.0
Junior Gaoquan Chu NaN 51.0
Senior Feng Yang 167.0 52.0
df_demo.sort_values(['Height','Weight'],ascending=[True,False]).head(100)  # Height ascending; within equal Height, Weight descending
Height Weight
Grade Name
Junior Xiaoli Chu 145.4 34.0
Senior Gaomei Lv 147.3 34.0
Sophomore Peng Han 147.8 34.0
Senior Changli Lv 148.7 41.0
Sophomore Changjuan You 150.5 40.0
Qiang Zhou 150.5 36.0
Freshman Chunmei Wang 151.2 43.0
Senior Gaojuan Zhao 151.5 44.0
Sophomore Changmei Xu 151.6 43.0
Senior Chengpeng Zheng 151.7 38.0
Freshman Qiang Han 151.8 38.0
Junior Qiang Lv 152.1 42.0
Freshman Gaofeng Zhao 152.2 43.0
Yanqiang Xu 152.4 38.0
Xiaoli Lv 152.5 45.0
Qiang Zhang 152.7 43.0
Sophomore Chunpeng Shi 152.9 44.0
Junior Chunmei Han 153.2 44.0
Xiaoquan Lv 153.2 43.0
Senior Mei Chen 153.6 NaN
Junior Xiaojuan Sun 153.9 46.0
Senior Chengqiang Chu 153.9 45.0
Juan You 154.0 40.0
Yanli Zhang 154.2 41.0
Sophomore Mei Xu 154.2 39.0
Senior Xiaopeng Qian 154.3 46.0
Sophomore Qiang Sun 154.3 40.0
Freshman Xiaoquan Sun 154.6 40.0
Quan Chu 154.7 43.0
Junior Gaojuan Qian 154.8 44.0
Freshman Li Lv 155.2 44.0
Senior Changmei Sun 155.3 46.0
Feng Zhou 155.6 47.0
Sophomore Chunpeng Lv 155.7 51.0
Senior Gaoli Wu 155.7 47.0
Freshman Changqiang Yang 156.0 49.0
Senior Juan Qin 156.0 47.0
Junior Chengpeng Zhao 156.0 44.0
Sophomore Feng Qian 156.4 43.0
Junior Mei Zhang 156.5 44.0
Freshman Xiaoli Xu 156.5 43.0
Senior Xiaopeng Chu 156.5 42.0
Junior Gaoqiang Zhou 156.8 43.0
Freshman Gaoquan Sun 156.8 42.0
Senior Quan Xu 157.0 44.0
Sophomore Xiaoqiang Feng 157.0 43.0
Junior Gaoli Xu 157.3 48.0
Freshman Gaoli Feng 157.4 46.0
Junior Qiang Wang 157.5 48.0
Senior Qiang Shi 157.7 NaN
Sophomore Xiaomei Shi 157.9 47.0
Freshman Xiaoli Qian 158.0 51.0
Junior Gaoquan Shen 158.0 51.0
Sophomore Xiaoquan Zhang 158.3 45.0
Senior Xiaopeng Lv 158.4 47.0
Freshman Xiaofeng Qian 158.5 49.0
Junior Xiaofeng You 158.5 45.0
Chunjuan Zhang 158.9 47.0
Freshman Gaopeng Yang 158.9 46.0
Feng Yang 158.9 44.0
Junior Feng Zhao 159.0 51.0
Senior Quan Qian 159.0 50.0
Junior Yanjuan Lv 159.3 49.0
Freshman Xiaomei Yang 159.3 49.0
Senior Gaopeng Qin 159.4 52.0
Junior Mei Sun 159.5 50.0
Changjuan Xu 159.6 49.0
Freshman Changquan Chu 159.6 45.0
Senior Chunjuan Xu 159.8 47.0
Juan Zhang 159.9 47.0
Junior Xiaofeng Zhao 159.9 46.0
Xiaopeng Shen 160.1 53.0
Sophomore Yanfeng Qian 160.1 48.0
Junior Xiaopeng Zhou 160.2 51.0
Xiaojuan Zhao 160.3 49.0
Senior Yanmei Qian 160.3 49.0
Junior Quan Zhao 160.6 53.0
Senior Xiaojuan Qian 160.6 50.0
Junior Yanli Zhang 160.6 47.0
Chengquan Qin 160.7 52.0
Sophomore Xiaoqiang Qin 160.8 54.0
Junior Chengquan Shi 160.8 51.0
Qiang Sun 160.8 NaN
Senior Li Zhao 160.9 50.0
Freshman Xiaopeng Zhao 161.0 53.0
Senior Mei Zheng 161.1 50.0
Juan Zhao 161.2 47.0
Freshman Chengquan Chu 161.3 47.0
Junior Changjuan You 161.4 47.0
Senior Li Xu 161.5 53.0
Chunpeng Qian 161.6 NaN
Junior Xiaopeng Sun 161.9 54.0
Gaoqiang Qian 161.9 50.0
Chunquan Xu 162.1 54.0
Freshman Yanqiang Feng 162.3 51.0
Xiaojuan Chu 162.4 58.0
Senior Qiang Chu 162.4 50.0
Freshman Peng Wu 162.5 53.0
Qiang Chu 162.5 52.0
Senior Feng Zheng 162.6 49.0
df_demo.sort_index(level=['Grade','Name'],ascending=[True,False]).head(100) # sort by the index levels: Grade ascending, then Name descending
Height Weight
Grade Name
Freshman Yanquan Wang 163.5 55.0
Yanqiang Xu 152.4 38.0
Yanqiang Feng 162.3 51.0
Yanpeng Lv NaN 65.0
Yanli Zhang 165.1 52.0
Yanjuan Zhao NaN 53.0
Yanjuan Han 163.7 49.0
Xiaoquan Sun 154.6 40.0
Xiaopeng Zhou 174.1 74.0
Xiaopeng Zhao 161.0 53.0
Xiaopeng Han 164.1 53.0
Xiaomei Yang 159.3 49.0
Xiaoli Xu 156.5 43.0
Xiaoli Qian 158.0 51.0
Xiaoli Lv 152.5 45.0
Xiaojuan Qin NaN 79.0
Xiaojuan Chu 162.4 58.0
Xiaofeng Qian 158.5 49.0
Quan Chu 154.7 43.0
Qiang Zhang 152.7 43.0
Qiang Shi 164.5 52.0
Qiang Han 185.3 87.0
Qiang Han 151.8 38.0
Qiang Feng 178.9 80.0
Qiang Chu 162.5 52.0
Peng Zhang 163.1 NaN
Peng Wu 162.5 53.0
Li Wu 164.3 51.0
Li Lv 155.2 44.0
Juan Zhang 168.6 55.0
Gaoquan Xu NaN 52.0
Gaoquan Sun 156.8 42.0
Gaoqiang Qin 170.2 63.0
Gaopeng Yang 158.9 46.0
Gaopeng Shi 162.9 48.0
Gaoli Zhao 175.4 78.0
Gaoli Feng 157.4 46.0
Gaojuan Qin NaN 41.0
Gaofeng Zhao 152.2 43.0
Feng Yang 158.9 44.0
Feng Wang 176.3 74.0
Chunmei Wang 151.2 43.0
Chunmei Shi 164.9 52.0
Chunli Zhao 180.2 83.0
Chengquan Chu 161.3 47.0
Changquan Chu 159.6 45.0
Changqiang You 166.5 70.0
Changqiang Yang 156.0 49.0
Changpeng Zhao 181.3 83.0
Changmei Lv 172.2 75.0
Changmei Feng 163.8 56.0
Changli Zhang 163.0 48.0
Junior Yanpeng Han NaN 44.0
Yanmei Yang 167.7 57.0
Yanli Zhang 160.6 47.0
Yanli You NaN 48.0
Yanli Wang 169.9 67.0
Yanjuan Lv 159.3 49.0
Yanfeng Qian 178.7 75.0
Xiaoquan Lv 153.2 43.0
Xiaoqiang Qin 170.1 68.0
Xiaopeng Zhou 160.2 51.0
Xiaopeng Sun 161.9 54.0
Xiaopeng Shen 160.1 53.0
Xiaoli Wang 171.4 70.0
Xiaoli Chu 145.4 34.0
Xiaojuan Zhao 160.3 49.0
Xiaojuan Sun 153.9 46.0
Xiaojuan Qian 164.7 51.0
Xiaofeng Zhao 159.9 46.0
Xiaofeng You 158.5 45.0
Quan Zhao 160.6 53.0
Qiang You 170.0 56.0
Qiang Wang 157.5 48.0
Qiang Sun 163.1 53.0
Qiang Sun 160.8 NaN
Qiang Lv 152.1 42.0
Peng Wang 162.8 65.0
Mei Zhang 156.5 44.0
Mei Sun 159.5 50.0
Li Sun 166.6 54.0
Li Chu 165.2 51.0
Juan Xu 164.8 NaN
Gaoquan Zhou 166.8 70.0
Gaoquan Shen 158.0 51.0
Gaoquan Chu NaN 51.0
Gaoqiang Zhou 156.8 43.0
Gaoqiang Qin 167.1 71.0
Gaoqiang Qian 161.9 50.0
Gaoli Xu 157.3 48.0
Gaojuan Qian 154.8 44.0
Gaofeng Sun 162.8 48.0
Feng Zheng 165.6 51.0
Feng Zhao 159.0 51.0
Chunquan Xu 162.1 54.0
Chunqiang Chu 168.6 72.0
Chunmei Han 153.2 44.0
Chunjuan Zhang 158.9 47.0
Chunfeng Zhao 173.4 72.0
Chengquan Shi 160.8 51.0
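The long outputs above make the tie-breaking hard to follow, so here is a sketch of the same three sorts on four made-up rows (the name `toy_sort` and its values are hypothetical). Each `ascending` entry pairs with the key in the same position.

```python
import pandas as pd

toy_sort = pd.DataFrame({
    "Grade":  ["Junior", "Freshman", "Junior", "Freshman"],
    "Name":   ["A", "B", "C", "D"],
    "Height": [150.5, 147.8, 150.5, 145.4],
    "Weight": [40.0, 34.0, 36.0, 34.0],
}).set_index(["Grade", "Name"])

# Weight ascending; ties on Weight broken by Height descending.
by_weight = toy_sort.sort_values(["Weight", "Height"], ascending=[True, False])
print(by_weight)

# Height ascending; ties on Height broken by Weight descending.
by_height = toy_sort.sort_values(["Height", "Weight"], ascending=[True, False])
print(by_height)

# Sort by the index levels instead of the columns.
by_index = toy_sort.sort_index(level=["Grade", "Name"], ascending=[True, False])
print(by_index)
```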

2.3.6 The apply method

df_demo = df[['Height','Weight']]
def my_mean(x):
    res = x.mean()
    return res
df_demo.apply(my_mean)
Height    163.218033
Weight     55.015873
dtype: float64
df_demo.apply(lambda x:x.mean())
Height    163.218033
Weight     55.015873
dtype: float64
df_demo.apply(lambda x:x.mean(),axis = 1).head()
0    102.45
1    118.25
2    138.95
3     41.00
4    124.00
dtype: float64
df_demo.apply(lambda x:(x-x.mean()).abs().mean()) # a different function from the ones above: the mean absolute deviation
Height     6.707229
Weight    10.391870
dtype: float64
df_demo.mad()
Height     6.707229
Weight    10.391870
dtype: float64
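The mean absolute deviation is a useful `apply` example because the built-in `mad()` used above exists in pandas 1.1.5 but was removed in pandas 2.0. The sketch below, on made-up numbers with a hypothetical helper `mean_abs_dev`, computes it through `apply` and through a plain vectorized expression, which should agree.

```python
import pandas as pd

demo_mad = pd.DataFrame({
    "Height": [150.0, 160.0, 170.0],
    "Weight": [40.0, 50.0, 90.0],
})

def mean_abs_dev(col):
    """Average absolute distance of a column's values from the column mean."""
    return (col - col.mean()).abs().mean()

# apply calls the function once per column (axis=0 by default).
via_apply = demo_mad.apply(mean_abs_dev)
print(via_apply)

# The same computation without apply, using whole-frame arithmetic.
vectorized = (demo_mad - demo_mad.mean()).abs().mean()
print(vectorized)
```

When a vectorized expression like the second one is available, it is usually the faster choice; `apply` is most useful for logic that has no built-in equivalent.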

2.4 Window objects

https://www.gairuo.com/p/pandas-window-functions

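A "window" can be thought of as a set: each window is one group of values. Statistical analysis often needs different kinds of windows. For example, a department may be divided into groups, and averages or rankings are then computed per group. Likewise, for ordered data such as time series, we might group every 5 days or every month and then compute rankings, medians, and so on.
rolling(10) looks a lot like groupby, but it does not partition the data; instead it creates a sliding window object that moves along the data and is 10 positions (for example, days) wide. Statistics are then computed on each window.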

2.4.1 Rolling windows

rolling returns a rolling window object; its most important parameter is the window size, window.

s = pd.Series([1,2,3,4,5])
roller = s.rolling(window = 3)
roller
Rolling [window=3,center=False,axis=0]
roller.mean() # see https://blog.csdn.net/qsx123432/article/details/111396542 for a clear explanation
0    NaN
1    NaN
2    2.0
3    3.0
4    4.0
dtype: float64
roller.sum()
0     NaN
1     NaN
2     6.0
3     9.0
4    12.0
dtype: float64
s2 = pd.Series([1,2,6,16,30])
roller.cov(s2)
0     NaN
1     NaN
2     2.5
3     7.0
4    12.0
dtype: float64
roller.corr(s2)
0         NaN
1         NaN
2    0.944911
3    0.970725
4    0.995402
dtype: float64
# Pass a custom function through apply
roller.apply(lambda x:x.mean())

0    NaN
1    NaN
2    2.0
3    3.0
4    4.0
dtype: float64
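To see where the leading NaN values come from, the rolling mean can be reproduced by slicing the Series by hand: position i uses the values at positions i-2 to i, and the first two positions do not have three values yet. This is a sketch; `manual_means` and `relaxed` are names of my own, and `min_periods=1` is shown as one way to get results for incomplete windows.

```python
import pandas as pd

s_roll = pd.Series([1, 2, 3, 4, 5])
rolled = s_roll.rolling(window=3).mean()

# Manual version: mean of each complete 3-element slice ending at position i.
manual_means = [s_roll.iloc[i - 2:i + 1].mean() for i in range(2, len(s_roll))]
print(rolled.tolist())
print(manual_means)

# min_periods=1 lets incomplete windows at the start produce a value.
relaxed = s_roll.rolling(window=3, min_periods=1).mean()
print(relaxed.tolist())
```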
a = pd.Series([1,3,6,10,15])
a.shift(2)  # the value 2 positions earlier
0    NaN
1    NaN
2    1.0
3    3.0
4    6.0
dtype: float64
a.diff(3) # difference from the value 3 positions earlier
0     NaN
1     NaN
2     NaN
3     9.0
4    12.0
dtype: float64
a.pct_change() # growth rate relative to the value n positions earlier (n=1 by default)
0         NaN
1    2.000000
2    1.000000
3    0.666667
4    0.500000
dtype: float64
a.shift(-1) # the value 1 position later
0     3.0
1     6.0
2    10.0
3    15.0
4     NaN
dtype: float64
a.diff(-2) # difference from the value 2 positions later
0   -5.0
1   -7.0
2   -9.0
3    NaN
4    NaN
dtype: float64
a

0     1
1     3
2     6
3    10
4    15
dtype: int64
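Both `diff` and `pct_change` can be expressed through `shift`, which is a handy way to remember what they compute. The sketch below checks these identities on the same Series; the variable names are mine.

```python
import numpy as np
import pandas as pd

a_demo = pd.Series([1, 3, 6, 10, 15])

# diff(n) is the value minus the value n positions earlier.
diff_by_shift = a_demo - a_demo.shift(3)
print(diff_by_shift.tolist())

# pct_change() is the ratio to the previous value, minus 1.
pct_by_shift = a_demo / a_demo.shift(1) - 1
print(pct_by_shift.tolist())

# A negative n looks forward instead of backward.
forward_diff = a_demo - a_demo.shift(-2)
print(forward_diff.tolist())

checks = [
    np.allclose(a_demo.diff(3), diff_by_shift, equal_nan=True),
    np.allclose(a_demo.pct_change(), pct_by_shift, equal_nan=True),
    np.allclose(a_demo.diff(-2), forward_diff, equal_nan=True),
]
print(checks)
```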

2.4.2 Expanding windows

s = pd.Series([1,3,6,10])
s.expanding().mean()
0    1.000000
1    2.000000
2    3.333333
3    5.000000
dtype: float64
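An expanding window starts at the first element and grows by one element at each step, so its statistics are cumulative. As a sketch, the expanding mean above should equal the cumulative sum divided by the number of elements seen so far; the names below are mine.

```python
import numpy as np
import pandas as pd

s_exp = pd.Series([1, 3, 6, 10])

expanding_mean = s_exp.expanding().mean()

# Cumulative sum divided by the running count 1, 2, 3, 4.
running_count = np.arange(1, len(s_exp) + 1)
manual_mean = s_exp.cumsum() / running_count

print(expanding_mean.tolist())
print(manual_mean.tolist())

# Likewise, the expanding sum is just the cumulative sum.
expanding_sum = s_exp.expanding().sum()
print(expanding_sum.tolist())
```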

