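As a beginner, I got a lot out of this chapter. I took in too much in one sitting, so I will come back to the exercises later. P.S. The author of the tutorial is really impressive; much respect.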
import numpy as np
import pandas as pd
pd.__version__
'1.1.5'
2.1 Reading and writing files
2.1.1 Reading files
df_csv = pd.read_csv(r'D:\Ajupyter\iris.csv'
,usecols = ['Sepal_Length','Sepal_Width']
,index_col = ['Sepal_Length']
)
df_csv
Sepal_Length  Sepal_Width
5.1           3.5
4.9           3.0
4.7           3.2
4.6           3.1
5.0           3.6
...           ...
6.7           3.0
6.3           2.5
6.5           3.0
6.2           3.4
5.9           3.0
150 rows × 1 columns
df_txt = pd.read_csv(r'C:\Users\86198\Desktop\数据\平台查询的企业.txt')
df_txt
     企业名
0    北京爱钱帮财富科技有限公司
1    烟台艾利互金网络信息服务有限公司
2    成都伟品信息技术服务有限公司
3    北京朴素磐石投资管理有限公司
4    宝蓝财富科技有限公司
...  ...
126  北京中金丰联信息技术股份有限公司
127  广州中青金服互联网金融信息服务有限公司
128  上海顽色投资管理有限公司
129  杭州上陈金融服务外包有限公司
130  杭州飞牛科技有限公司
131 rows × 1 columns
df_excel = pd.read_excel(r'C:\Users\86198\Desktop\数据\批量查询_859.xls',parse_dates = ['成立日期'],nrows = 3)
df_excel
|
企业名称 |
登记状态 |
法定代表人 |
注册资本 |
成立日期 |
核准日期 |
所属省份 |
所属城市 |
所属区县 |
电话 |
... |
注册号 |
组织机构代码 |
参保人数 |
企业类型 |
所属行业 |
曾用名 |
网址 |
企业地址 |
最新年报地址 |
经营范围 |
0 |
宝蓝财富(天津)科技有限公司 |
存续 |
胡德荣 |
5500万元人民币 |
2014-04-04 |
2018-04-04 |
天津市 |
天津市 |
滨海新区 |
022-23757986 |
... |
120193000087395 |
09366780-7 |
43 |
有限责任公司 |
科技推广和应用服务业 |
- |
http://www.batiaoyu.com |
天津滨海高新区华苑产业区(环外)海泰创新六路2号3-2-601 |
天津市河西区解放南路与浯水道交口喜年广场5-201 |
软件技术开发、咨询、服务、转让;商务信息咨询;计算机系统集成;财务咨询;企业管理咨询;批发和... |
1 |
深圳市兴荣欣科技有限责任公司 |
存续 |
汪超 |
50万元人民币 |
2014-09-01 |
2020-08-04 |
广东省 |
深圳市 |
龙岗区 |
13802572770 |
... |
440306111217767 |
31197532-3 |
- |
有限责任公司 |
批发业 |
- |
- |
深圳市龙岗区南湾街道南岭村社区南园路4号文峰华庭1栋B座210 |
深圳市光明新区公明街道长圳村长圳路西八巷10号 |
一般经营项目是:LED灯饰的销售;国内贸易;货物及技术进出口;软件的开发与销售;游戏软件的开... |
2 |
北京顺信益信息技术有限公司 |
存续 |
王维虎 |
10000万元人民币 |
2015-07-13 |
2018-06-01 |
北京市 |
北京市 |
海淀区 |
010-56855134 |
... |
110108019480505 |
34438049-X |
12 |
有限责任公司(自然人投资或控股) |
软件和信息技术服务业 |
- |
- |
北京市海淀区万寿路甲12号北京万寿宾馆B座北侧三层1325 |
北京市海淀区马甸东路19号15层1815 |
技术开发、技术服务、技术咨询、技术推广;销售机械设备、电子产品、工艺品;经济贸易咨询;企业策... |
3 rows × 25 columns
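Summary: header=None means the first row is not used as column names; usecols selects which columns to read (all columns by default); nrows is the number of data rows to read; index_col sets one or more columns as the index; parse_dates converts the listed columns to datetime columns.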
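To check my understanding of these parameters, here is a minimal sketch that reads from an in-memory string instead of a file. The data, the column names (id, name, joined, score) and the variable names are all made up for illustration.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = """id,name,joined,score
1,Ann,2020-01-05,3.5
2,Bob,2020-02-10,4.0
3,Cid,2020-03-15,2.5
"""

# usecols picks columns, index_col promotes one of them to the index,
# parse_dates converts a column to datetime, nrows limits the rows read.
df_small = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "joined", "score"],
    index_col="id",
    parse_dates=["joined"],
    nrows=2,
)
print(df_small)
print(df_small.dtypes)

# header=None treats the first line as data; columns are numbered 0, 1, 2, ...
df_raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(df_raw.head(2))
```

With header=None the original header line shows up as row 0 of the data, which is an easy way to see what the option does.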
pd.read_table(r'C:\Users\86198\Desktop\joyful-pandas-master\data\my_table_special_sep.txt')
   col1 |||| col2
0  TS |||| This is an apple.
1  GQ |||| My name is Bob.
2  WT |||| Well done!
3  PT |||| May I help you?
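# sep is treated as a regular expression here, so each | must be escaped;
# a raw string keeps the backslashes from being read as string escapes
pd.read_table(r'C:\Users\86198\Desktop\joyful-pandas-master\data\my_table_special_sep.txt'
,sep = r'\|\|\|\|',engine = 'python')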
   col1  col2
0  TS    This is an apple.
1  GQ    My name is Bob.
2  WT    Well done!
3  PT    May I help you?
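The same idea can be tried without the file on disk. This is a small sketch with an in-memory copy of two of the rows; the variable names are my own, and I include the surrounding spaces in the pattern (an assumption about how the file is laid out) so that they do not end up inside the values.

```python
import io
import pandas as pd

text = "col1 |||| col2\nTS |||| This is an apple.\nGQ |||| My name is Bob.\n"

# r"\|" matches a literal "|"; the pattern also swallows the spaces around
# the separator. A multi-character regex separator needs engine="python".
df_sep = pd.read_table(io.StringIO(text), sep=r" \|\|\|\| ", engine="python")
print(df_sep)
```

Without the escaping, "|" would mean "or" in a regular expression and the split would not behave as intended.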
2.1.2 Writing data
df_csv.to_csv(r'C:\Users\86198\Desktop\joyful-pandas-master\data\my_csv_saved.csv',index = False)
df_excel.to_excel(r'C:\Users\86198\Desktop\joyful-pandas-master\data\my_excel_saved.xlsx',index = False)
df_txt.to_csv(r'C:\Users\86198\Desktop\joyful-pandas-master\data\my_txt_saved.txt',index = False)
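A quick way to convince myself that writing works is a round trip: write, read back, compare. The sketch below uses a temporary directory and made-up file names and data, and also writes a tab-separated text file through the sep argument of to_csv.

```python
import os
import tempfile
import pandas as pd

df_out = pd.DataFrame({"name": ["a", "b"], "value": [1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "saved.csv")   # hypothetical file names
    txt_path = os.path.join(tmp, "saved.txt")

    # index=False leaves the row labels out of the file
    df_out.to_csv(csv_path, index=False)
    df_out.to_csv(txt_path, sep="\t", index=False)

    df_back_csv = pd.read_csv(csv_path)
    df_back_txt = pd.read_table(txt_path)       # tab is read_table's default separator

print(df_back_csv.equals(df_out))
print(df_back_txt.equals(df_out))
```

If the index were written (the default), reading the file back would produce an extra unnamed column unless index_col is passed.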
2.2 Basic data structures
2.2.1 Series
s = pd.Series(data = [100,'a',{
'阿信':'最帅'}]
,index = pd.Index(['id1',20,'third'],name = 'my_index')
,dtype = 'object'
,name = 'my_name'
)
s
my_index
id1 100
20 a
third {'阿信': '最帅'}
Name: my_name, dtype: object
s.values
array([100, 'a', {'阿信': '最帅'}], dtype=object)
s.dtype
dtype('O')
s.index
Index(['id1', 20, 'third'], dtype='object', name='my_index')
s.name
'my_name'
s.shape
(3,)
2.2.2 DataFrame
data = [[1,'a',1.2],[2,'b',2.2],[3,'c',3.2]]
df = pd.DataFrame(data = data
,index = ['row_%d'%i for i in range(3)]
,columns=['col_0','col_1','col_2'])
df
       col_0  col_1  col_2
row_0  1      a      1.2
row_1  2      b      2.2
row_2  3      c      3.2
df = pd.DataFrame(data = {
'col_0':[1,2,3],'col_1':list('abc'),'col_2':[1.2,2.2,3.2]}
,index = ['row_%d'%i for i in range(3)])
df
       col_0  col_1  col_2
row_0  1      a      1.2
row_1  2      b      2.2
row_2  3      c      3.2
df['col_0']
row_0 1
row_1 2
row_2 3
Name: col_0, dtype: int64
df[['col_0','col_1']]
       col_0  col_1
row_0  1      a
row_1  2      b
row_2  3      c
df.values
array([[1, 'a', 1.2],
[2, 'b', 2.2],
[3, 'c', 3.2]], dtype=object)
df.index
Index(['row_0', 'row_1', 'row_2'], dtype='object')
df.columns
Index(['col_0', 'col_1', 'col_2'], dtype='object')
df.dtypes
col_0 int64
col_1 object
col_2 float64
dtype: object
df.shape
(3, 3)
df.T
       row_0  row_1  row_2
col_0  1      2      3
col_1  a      b      c
col_2  1.2    2.2    3.2
2.3 Common basic functions
import pandas as pd
df = pd.read_csv('joyful-pandas-master/data/learn_pandas.csv')
df
     School                         Grade      Name            Gender  Height  Weight  Transfer  Test_Number  Test_Date   Time_Record
0    Shanghai Jiao Tong University  Freshman   Gaopeng Yang    Female  158.9   46.0    N         1            2019/10/5   0:04:34
1    Peking University              Freshman   Changqiang You  Male    166.5   70.0    N         1            2019/9/4    0:04:20
2    Shanghai Jiao Tong University  Senior     Mei Sun         Male    188.9   89.0    N         2            2019/9/12   0:05:22
3    Fudan University               Sophomore  Xiaojuan Sun    Female  NaN     41.0    N         2            2020/1/3    0:04:08
4    Fudan University               Sophomore  Gaojuan You     Male    174.0   74.0    N         2            2019/11/6   0:05:22
...  ...                            ...        ...             ...     ...     ...     ...       ...          ...         ...
195  Fudan University               Junior     Xiaojuan Sun    Female  153.9   46.0    N         2            2019/10/17  0:04:31
196  Tsinghua University            Senior     Li Zhao         Female  160.9   50.0    N         3            2019/9/22   0:04:03
197  Shanghai Jiao Tong University  Senior     Chengqiang Chu  Female  153.9   45.0    N         1            2020/1/5    0:04:48
198  Shanghai Jiao Tong University  Senior     Chengmei Shen   Male    175.3   71.0    N         2            2020/1/7    0:04:58
199  Tsinghua University            Sophomore  Chunpeng Lv     Male    155.7   51.0    N         1            2019/11/6   0:05:05
200 rows × 10 columns
df.columns
Index(['School', 'Grade', 'Name', 'Gender', 'Height', 'Weight', 'Transfer',
'Test_Number', 'Test_Date', 'Time_Record'],
dtype='object')
df.columns[:7]
Index(['School', 'Grade', 'Name', 'Gender', 'Height', 'Weight', 'Transfer'], dtype='object')
df = df[df.columns[:7]]  # keep only the first seven columns for the rest of this section
df
     School                         Grade      Name            Gender  Height  Weight  Transfer
0    Shanghai Jiao Tong University  Freshman   Gaopeng Yang    Female  158.9   46.0    N
1    Peking University              Freshman   Changqiang You  Male    166.5   70.0    N
2    Shanghai Jiao Tong University  Senior     Mei Sun         Male    188.9   89.0    N
3    Fudan University               Sophomore  Xiaojuan Sun    Female  NaN     41.0    N
4    Fudan University               Sophomore  Gaojuan You     Male    174.0   74.0    N
...  ...                            ...        ...             ...     ...     ...     ...
195  Fudan University               Junior     Xiaojuan Sun    Female  153.9   46.0    N
196  Tsinghua University            Senior     Li Zhao         Female  160.9   50.0    N
197  Shanghai Jiao Tong University  Senior     Chengqiang Chu  Female  153.9   45.0    N
198  Shanghai Jiao Tong University  Senior     Chengmei Shen   Male    175.3   71.0    N
199  Tsinghua University            Sophomore  Chunpeng Lv     Male    155.7   51.0    N
200 rows × 7 columns
2.3.1 Summary functions
df.head(2)
     School                         Grade      Name            Gender  Height  Weight  Transfer
0    Shanghai Jiao Tong University  Freshman   Gaopeng Yang    Female  158.9   46.0    N
1    Peking University              Freshman   Changqiang You  Male    166.5   70.0    N
df.tail(3)
     School                         Grade      Name            Gender  Height  Weight  Transfer
197  Shanghai Jiao Tong University  Senior     Chengqiang Chu  Female  153.9   45.0    N
198  Shanghai Jiao Tong University  Senior     Chengmei Shen   Male    175.3   71.0    N
199  Tsinghua University            Sophomore  Chunpeng Lv     Male    155.7   51.0    N
df.head()
     School                         Grade      Name            Gender  Height  Weight  Transfer
0    Shanghai Jiao Tong University  Freshman   Gaopeng Yang    Female  158.9   46.0    N
1    Peking University              Freshman   Changqiang You  Male    166.5   70.0    N
2    Shanghai Jiao Tong University  Senior     Mei Sun         Male    188.9   89.0    N
3    Fudan University               Sophomore  Xiaojuan Sun    Female  NaN     41.0    N
4    Fudan University               Sophomore  Gaojuan You     Male    174.0   74.0    N
df.tail()
     School                         Grade      Name            Gender  Height  Weight  Transfer
195  Fudan University               Junior     Xiaojuan Sun    Female  153.9   46.0    N
196  Tsinghua University            Senior     Li Zhao         Female  160.9   50.0    N
197  Shanghai Jiao Tong University  Senior     Chengqiang Chu  Female  153.9   45.0    N
198  Shanghai Jiao Tong University  Senior     Chengmei Shen   Male    175.3   71.0    N
199  Tsinghua University            Sophomore  Chunpeng Lv     Male    155.7   51.0    N
Summary: head and tail return the first n rows and the last n rows respectively; when n is not given in the parentheses, it defaults to 5.
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 School 200 non-null object
1 Grade 200 non-null object
2 Name 200 non-null object
3 Gender 200 non-null object
4 Height 183 non-null float64
5 Weight 189 non-null float64
6 Transfer 188 non-null object
dtypes: float64(2), object(5)
memory usage: 11.1+ KB
df.describe()
       Height      Weight
count  183.000000  189.000000
mean   163.218033  55.015873
std    8.608879    12.824294
min    145.400000  34.000000
25%    157.150000  46.000000
50%    161.900000  51.000000
75%    167.500000  65.000000
max    193.900000  89.000000
Summary: info returns basic information about the table, and describe returns the main summary statistics of its numeric columns.
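Two details of describe are easy to miss in the output above: non-numeric columns are left out by default, and count ignores missing values. A tiny sketch with made-up data and variable names shows both.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["a", "b", "c", "d"],            # text column: skipped by describe()
    "Height": [150.0, 160.0, None, 180.0],   # one missing value
})

summary = people.describe()
print(summary)

# The 50% row is the median of the non-missing values.
print(summary.loc["50%", "Height"], people["Height"].median())
```

This matches what the real data shows: Height has count 183 rather than 200 because the missing heights are not counted.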
2.3.2 Feature statistics functions
df_demo = df[['Height','Weight']]
df_demo.mean()
Height 163.218033
Weight 55.015873
dtype: float64
df_demo.max()
Height 193.9
Weight 89.0
dtype: float64
df_demo.quantile()
Height 161.9
Weight 51.0
Name: 0.5, dtype: float64
df_demo.quantile(0.75)
Height 167.5
Weight 65.0
Name: 0.75, dtype: float64
df_demo.count()
Height 183
Weight 189
dtype: int64
df_demo.idxmax()
Height 193
Weight 2
dtype: int64
df_demo.mean(axis = 1).head()
0 102.45
1 118.25
2 138.95
3 41.00
4 124.00
dtype: float64
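Row 3 above has a missing Height, yet its row mean is 41.00, which is just its Weight. That is because these statistics skip NaN by default. A minimal sketch with two rows copied from the table (variable names are mine):

```python
import numpy as np
import pandas as pd

hw = pd.DataFrame({"Height": [158.9, np.nan], "Weight": [46.0, 41.0]})

row_mean = hw.mean(axis=1)                       # NaN skipped: second row gives 41.0
row_mean_strict = hw.mean(axis=1, skipna=False)  # NaN propagates to the result

print(row_mean)
print(row_mean_strict)
```

Passing skipna=False is a simple way to make incomplete rows stand out instead of silently averaging over fewer values.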
2.3.3 Unique-value functions
df['School'].unique()
array(['Shanghai Jiao Tong University', 'Peking University',
'Fudan University', 'Tsinghua University'], dtype=object)
df['School'].nunique()
4
Summary: unique returns the distinct values in a column (which categories occur), and nunique returns how many distinct values there are.
df['School'].value_counts()
Tsinghua University 69
Shanghai Jiao Tong University 57
Fudan University 40
Peking University 34
Name: School, dtype: int64
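The three functions fit together neatly: value_counts has one entry per unique value, so its length equals nunique. A small sketch with a made-up Series:

```python
import pandas as pd

schools = pd.Series(["A", "B", "A", "C", "A", "B"], name="School")

uniq = schools.unique()          # distinct values, in order of first appearance
n_uniq = schools.nunique()       # number of distinct values
counts = schools.value_counts()  # frequency of each value, most frequent first

print(uniq, n_uniq)
print(counts)
```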
df_demo = df[['Gender','Transfer','Name']]
df_demo.drop_duplicates(['Gender','Transfer'],keep = 'last')
     Gender  Transfer  Name
147  Male    NaN       Juan You
150  Male    Y         Chengpeng You
169  Female  Y         Chengquan Qin
194  Female  NaN       Yanmei Qian
197  Female  N         Chengqiang Chu
199  Male    N         Chunpeng Lv
df_demo.drop_duplicates(['Name','Gender'],keep = False).head(100)
     Gender  Transfer  Name
0    Female  N         Gaopeng Yang
1    Male    N         Changqiang You
2    Male    N         Mei Sun
4    Male    N         Gaojuan You
5    Female  N         Xiaoli Qian
...  ...     ...       ...
115  Female  N         Gaofeng Sun
116  Male    N         Feng Zhao
117  Male    N         Chunli Zhao
119  Female  N         Peng Zhang
120  Female  NaN       Peng Han
100 rows × 3 columns
df['School'].drop_duplicates()
0 Shanghai Jiao Tong University
1 Peking University
3 Fudan University
5 Tsinghua University
Name: School, dtype: object
df_demo.duplicated(['Gender','Transfer']).head(100)
0 False
1 False
2 True
3 True
4 True
...
95 True
96 True
97 True
98 True
99 True
Length: 100, dtype: bool
df['School'].duplicated().head(100)
0 False
1 False
2 True
3 False
4 True
...
95 True
96 True
97 True
98 True
99 True
Name: School, Length: 100, dtype: bool
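The keep argument took me a moment to digest, so here is a small sketch on made-up data. It also shows how drop_duplicates relates to duplicated: dropping duplicates is the same as keeping the rows where duplicated is False.

```python
import pandas as pd

people = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F"],
    "Transfer": ["N", "N", "N", "Y", "N"],
    "Name": ["a", "b", "c", "d", "e"],
})
keys = ["Gender", "Transfer"]

first_kept = people.drop_duplicates(subset=keys)               # keep="first" is the default
last_kept = people.drop_duplicates(subset=keys, keep="last")   # keep the last row of each combination
none_kept = people.drop_duplicates(subset=keys, keep=False)    # drop every row that has a twin

# Same as first_kept, built from the boolean mask.
via_mask = people[~people.duplicated(subset=keys)]

print(first_kept["Name"].tolist())
print(last_kept["Name"].tolist())
print(none_kept["Name"].tolist())
```

Note that the original row labels are preserved, which is why the outputs above show indexes such as 147 and 150 rather than 0, 1, 2.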
2.3.4 Replacement functions
Summary: mapping replacement (replace), logical replacement (where, mask), and numeric replacement (abs, clip, round).
df['Gender'].replace({
'Female':0,'Male':1}).head(100)
0 0
1 1
2 1
3 0
4 1
..
95 1
96 0
97 0
98 1
99 1
Name: Gender, Length: 100, dtype: int64
df['Gender'].replace(['Female','Male'],[0,1]).head(100)
0 0
1 1
2 1
3 0
4 1
..
95 1
96 0
97 0
98 1
99 1
Name: Gender, Length: 100, dtype: int64
s = pd.Series(['a',1,'b',2,1,1,'a'])
s
0 a
1 1
2 b
3 2
4 1
5 1
6 a
dtype: object
s.replace([1,2],method='ffill')
0 a
1 a
2 b
3 b
4 b
5 b
6 a
dtype: object
s.replace([1,2],method='bfill')
0 a
1 b
2 b
3 a
4 a
5 a
6 a
dtype: object
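As far as I know, the method argument of replace was deprecated in pandas 2.1, so the two calls above may warn or fail on newer versions than the 1.1.5 used here. A sketch of one possible alternative, not necessarily the recommended one: blank out the targeted values with mask and then fill from a neighbour. Variable names are mine.

```python
import pandas as pd

s_mixed = pd.Series(["a", 1, "b", 2, 1, 1, "a"])
to_replace = s_mixed.isin([1, 2])   # True where the value should be replaced

# Set the targeted positions to NaN, then fill them from a neighbour.
filled_forward = s_mixed.mask(to_replace).ffill()    # take the previous valid value
filled_backward = s_mixed.mask(to_replace).bfill()   # take the next valid value

print(filled_forward.tolist())
print(filled_backward.tolist())
```

On this Series the two results agree with the ffill and bfill outputs shown above.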
s = pd.Series([-1,2,100,-50])
s.where(s<0)
0 -1.0
1 NaN
2 NaN
3 -50.0
dtype: float64
s.where(s<0,100)
0 -1
1 100
2 100
3 -50
dtype: int64
s.mask(s<0)
0 NaN
1 2.0
2 100.0
3 NaN
dtype: float64
s.mask(s<0,10)
0 10
1 2
2 100
3 10
dtype: int64
s_condition = pd.Series([True,False,False,True],index = s.index)
s.mask(s_condition,-50)
0 -50
1 2
2 100
3 -50
dtype: int64
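where and mask are mirror images: where keeps the values for which the condition is True and replaces the rest, while mask replaces the values for which the condition is True. A short sketch of that relationship on the same numbers (variable names are mine):

```python
import pandas as pd

s_num = pd.Series([-1, 2, 100, -50])
cond = s_num < 0

kept = s_num.where(cond, 100)   # keep where cond is True, put 100 elsewhere
same = s_num.mask(~cond, 100)   # put 100 where the negated cond is True

print(kept.tolist())
print(kept.equals(same))
```

So choosing between the two is mostly a matter of which condition reads more naturally.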
s = pd.Series([-1,3.5515,100,-50])
s.round()
0 -1.0
1 4.0
2 100.0
3 -50.0
dtype: float64
s.abs()
0 1.0000
1 3.5515
2 100.0000
3 50.0000
dtype: float64
s
0 -1.0000
1 3.5515
2 100.0000
3 -50.0000
dtype: float64
s.clip(0,5)
0 0.0000
1 3.5515
2 5.0000
3 0.0000
dtype: float64
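Parameters: lower : float or array_like, default None
Minimum threshold. All values below this threshold are set to it.
upper : float or array_like, default None
Maximum threshold. All values above this threshold are set to it.
axis : int or str axis name, optional
Align the object with lower and upper along the given axis.
inplace : bool, default False
Whether to perform the operation in place on the data.
Returns:
Series or DataFrame
Same type as the calling object, with values outside the clip boundaries replaced.
Reference: https://www.cjavapy.com/article/330/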
import pandas as pd
data = {
'col_0': [9, -3, 0, -1, 5], 'col_1': [-2, -7, 6, 8, -5]}
df = pd.DataFrame(data)
df
   col_0  col_1
0  9      -2
1  -3     -7
2  0      6
3  -1     8
4  5      -5
t = pd.Series([2, -4, -1, 6, 3])
t
0 2
1 -4
2 -1
3 6
4 3
dtype: int64
df
   col_0  col_1
0  9      -2
1  -3     -7
2  0      6
3  -1     8
4  5      -5
df.clip(t, t + 4, axis=0)
   col_0  col_1
0  6      2
1  -3     -4
2  0      3
3  6      8
4  5      3
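The result of clip with a Series as threshold is easier to read once you see that, with axis=0, every row gets its own interval. The sketch below reuses the first three rows of the example and reproduces the result column by column; the variable names are mine.

```python
import pandas as pd

frame = pd.DataFrame({"col_0": [9, -3, 0], "col_1": [-2, -7, 6]})
lower = pd.Series([2, -4, -1])

# axis=0 aligns the threshold Series with the row index:
# row i is clipped to the interval [lower[i], lower[i] + 4].
clipped = frame.clip(lower, lower + 4, axis=0)

# The same thing spelled out one column at a time with Series.clip.
manual = frame.apply(lambda col: col.clip(lower, lower + 4))

print(clipped)
```

For row 0 the interval is [2, 6], so 9 becomes 6 and -2 becomes 2, as in the output above.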
2.3.5 Sorting functions
df = pd.read_csv('joyful-pandas-master/data/learn_pandas.csv')
df_demo = df[['Grade','Name','Height','Weight']].set_index(['Grade','Name'])
df_demo.sort_values('Height').head()
Grade      Name           Height  Weight
Junior     Xiaoli Chu     145.4   34.0
Senior     Gaomei Lv      147.3   34.0
Sophomore  Peng Han       147.8   34.0
Senior     Changli Lv     148.7   41.0
Sophomore  Changjuan You  150.5   40.0
df_demo.sort_values('Height',ascending = False).head()
Grade     Name           Height  Weight
Senior    Xiaoqiang Qin  193.9   79.0
          Mei Sun        188.9   89.0
          Gaoli Zhao     186.5   83.0
Freshman  Qiang Han      185.3   87.0
Senior    Qiang Zheng    183.9   87.0
df_demo.sort_values(['Weight','Height'],ascending=[True,False]).head(100)
|
|
Height |
Weight |
Grade |
Name |
|
|
Sophomore |
Peng Han |
147.8 |
34.0 |
Senior |
Gaomei Lv |
147.3 |
34.0 |
Junior |
Xiaoli Chu |
145.4 |
34.0 |
Sophomore |
Qiang Zhou |
150.5 |
36.0 |
Freshman |
Yanqiang Xu |
152.4 |
38.0 |
Qiang Han |
151.8 |
38.0 |
Senior |
Chengpeng Zheng |
151.7 |
38.0 |
Sophomore |
Mei Xu |
154.2 |
39.0 |
Freshman |
Xiaoquan Sun |
154.6 |
40.0 |
Sophomore |
Qiang Sun |
154.3 |
40.0 |
Senior |
Juan You |
154.0 |
40.0 |
Sophomore |
Changjuan You |
150.5 |
40.0 |
Senior |
Yanli Zhang |
154.2 |
41.0 |
Changli Lv |
148.7 |
41.0 |
Sophomore |
Xiaojuan Sun |
NaN |
41.0 |
Freshman |
Gaojuan Qin |
NaN |
41.0 |
Gaoquan Sun |
156.8 |
42.0 |
Senior |
Xiaopeng Chu |
156.5 |
42.0 |
Junior |
Qiang Lv |
152.1 |
42.0 |
Sophomore |
Xiaoqiang Feng |
157.0 |
43.0 |
Junior |
Gaoqiang Zhou |
156.8 |
43.0 |
Freshman |
Xiaoli Xu |
156.5 |
43.0 |
Sophomore |
Feng Qian |
156.4 |
43.0 |
Freshman |
Quan Chu |
154.7 |
43.0 |
Junior |
Xiaoquan Lv |
153.2 |
43.0 |
Freshman |
Qiang Zhang |
152.7 |
43.0 |
Gaofeng Zhao |
152.2 |
43.0 |
Sophomore |
Changmei Xu |
151.6 |
43.0 |
Freshman |
Chunmei Wang |
151.2 |
43.0 |
Feng Yang |
158.9 |
44.0 |
Senior |
Quan Xu |
157.0 |
44.0 |
Junior |
Mei Zhang |
156.5 |
44.0 |
Chengpeng Zhao |
156.0 |
44.0 |
Freshman |
Li Lv |
155.2 |
44.0 |
Junior |
Gaojuan Qian |
154.8 |
44.0 |
Chunmei Han |
153.2 |
44.0 |
Sophomore |
Chunpeng Shi |
152.9 |
44.0 |
Senior |
Gaojuan Zhao |
151.5 |
44.0 |
Junior |
Yanpeng Han |
NaN |
44.0 |
Freshman |
Changquan Chu |
159.6 |
45.0 |
Junior |
Xiaofeng You |
158.5 |
45.0 |
Sophomore |
Xiaoquan Zhang |
158.3 |
45.0 |
Senior |
Chengqiang Chu |
153.9 |
45.0 |
Freshman |
Xiaoli Lv |
152.5 |
45.0 |
Junior |
Xiaofeng Zhao |
159.9 |
46.0 |
Freshman |
Gaopeng Yang |
158.9 |
46.0 |
Gaoli Feng |
157.4 |
46.0 |
Senior |
Changmei Sun |
155.3 |
46.0 |
Xiaopeng Qian |
154.3 |
46.0 |
Junior |
Xiaojuan Sun |
153.9 |
46.0 |
Changjuan You |
161.4 |
47.0 |
Freshman |
Chengquan Chu |
161.3 |
47.0 |
Senior |
Juan Zhao |
161.2 |
47.0 |
Junior |
Yanli Zhang |
160.6 |
47.0 |
Senior |
Juan Zhang |
159.9 |
47.0 |
Chunjuan Xu |
159.8 |
47.0 |
Junior |
Chunjuan Zhang |
158.9 |
47.0 |
Senior |
Xiaopeng Lv |
158.4 |
47.0 |
Sophomore |
Xiaomei Shi |
157.9 |
47.0 |
Senior |
Juan Qin |
156.0 |
47.0 |
Gaoli Wu |
155.7 |
47.0 |
Feng Zhou |
155.6 |
47.0 |
Freshman |
Changli Zhang |
163.0 |
48.0 |
Gaopeng Shi |
162.9 |
48.0 |
Junior |
Gaofeng Sun |
162.8 |
48.0 |
Sophomore |
Yanfeng Qian |
160.1 |
48.0 |
Junior |
Qiang Wang |
157.5 |
48.0 |
Gaoli Xu |
157.3 |
48.0 |
Senior |
Peng You |
NaN |
48.0 |
Junior |
Yanli You |
NaN |
48.0 |
Freshman |
Yanjuan Han |
163.7 |
49.0 |
Senior |
Feng Zheng |
162.6 |
49.0 |
Junior |
Xiaojuan Zhao |
160.3 |
49.0 |
Senior |
Yanmei Qian |
160.3 |
49.0 |
Junior |
Changjuan Xu |
159.6 |
49.0 |
Yanjuan Lv |
159.3 |
49.0 |
Freshman |
Xiaomei Yang |
159.3 |
49.0 |
Xiaofeng Qian |
158.5 |
49.0 |
Changqiang Yang |
156.0 |
49.0 |
Senior |
Qiang Chu |
162.4 |
50.0 |
Junior |
Gaoqiang Qian |
161.9 |
50.0 |
Senior |
Mei Zheng |
161.1 |
50.0 |
Li Zhao |
160.9 |
50.0 |
Xiaojuan Qian |
160.6 |
50.0 |
Junior |
Mei Sun |
159.5 |
50.0 |
Senior |
Quan Qian |
159.0 |
50.0 |
Junior |
Feng Zheng |
165.6 |
51.0 |
Li Chu |
165.2 |
51.0 |
Xiaojuan Qian |
164.7 |
51.0 |
Freshman |
Li Wu |
164.3 |
51.0 |
Yanqiang Feng |
162.3 |
51.0 |
Junior |
Chengquan Shi |
160.8 |
51.0 |
Xiaopeng Zhou |
160.2 |
51.0 |
Feng Zhao |
159.0 |
51.0 |
Freshman |
Xiaoli Qian |
158.0 |
51.0 |
Junior |
Gaoquan Shen |
158.0 |
51.0 |
Sophomore |
Chunpeng Lv |
155.7 |
51.0 |
Senior |
Mei Feng |
NaN |
51.0 |
Junior |
Gaoquan Chu |
NaN |
51.0 |
Senior |
Feng Yang |
167.0 |
52.0 |
df_demo.sort_values(['Height','Weight'],ascending=[True,False]).head(100)
|
|
Height |
Weight |
Grade |
Name |
|
|
Junior |
Xiaoli Chu |
145.4 |
34.0 |
Senior |
Gaomei Lv |
147.3 |
34.0 |
Sophomore |
Peng Han |
147.8 |
34.0 |
Senior |
Changli Lv |
148.7 |
41.0 |
Sophomore |
Changjuan You |
150.5 |
40.0 |
Qiang Zhou |
150.5 |
36.0 |
Freshman |
Chunmei Wang |
151.2 |
43.0 |
Senior |
Gaojuan Zhao |
151.5 |
44.0 |
Sophomore |
Changmei Xu |
151.6 |
43.0 |
Senior |
Chengpeng Zheng |
151.7 |
38.0 |
Freshman |
Qiang Han |
151.8 |
38.0 |
Junior |
Qiang Lv |
152.1 |
42.0 |
Freshman |
Gaofeng Zhao |
152.2 |
43.0 |
Yanqiang Xu |
152.4 |
38.0 |
Xiaoli Lv |
152.5 |
45.0 |
Qiang Zhang |
152.7 |
43.0 |
Sophomore |
Chunpeng Shi |
152.9 |
44.0 |
Junior |
Chunmei Han |
153.2 |
44.0 |
Xiaoquan Lv |
153.2 |
43.0 |
Senior |
Mei Chen |
153.6 |
NaN |
Junior |
Xiaojuan Sun |
153.9 |
46.0 |
Senior |
Chengqiang Chu |
153.9 |
45.0 |
Juan You |
154.0 |
40.0 |
Yanli Zhang |
154.2 |
41.0 |
Sophomore |
Mei Xu |
154.2 |
39.0 |
Senior |
Xiaopeng Qian |
154.3 |
46.0 |
Sophomore |
Qiang Sun |
154.3 |
40.0 |
Freshman |
Xiaoquan Sun |
154.6 |
40.0 |
Quan Chu |
154.7 |
43.0 |
Junior |
Gaojuan Qian |
154.8 |
44.0 |
Freshman |
Li Lv |
155.2 |
44.0 |
Senior |
Changmei Sun |
155.3 |
46.0 |
Feng Zhou |
155.6 |
47.0 |
Sophomore |
Chunpeng Lv |
155.7 |
51.0 |
Senior |
Gaoli Wu |
155.7 |
47.0 |
Freshman |
Changqiang Yang |
156.0 |
49.0 |
Senior |
Juan Qin |
156.0 |
47.0 |
Junior |
Chengpeng Zhao |
156.0 |
44.0 |
Sophomore |
Feng Qian |
156.4 |
43.0 |
Junior |
Mei Zhang |
156.5 |
44.0 |
Freshman |
Xiaoli Xu |
156.5 |
43.0 |
Senior |
Xiaopeng Chu |
156.5 |
42.0 |
Junior |
Gaoqiang Zhou |
156.8 |
43.0 |
Freshman |
Gaoquan Sun |
156.8 |
42.0 |
Senior |
Quan Xu |
157.0 |
44.0 |
Sophomore |
Xiaoqiang Feng |
157.0 |
43.0 |
Junior |
Gaoli Xu |
157.3 |
48.0 |
Freshman |
Gaoli Feng |
157.4 |
46.0 |
Junior |
Qiang Wang |
157.5 |
48.0 |
Senior |
Qiang Shi |
157.7 |
NaN |
Sophomore |
Xiaomei Shi |
157.9 |
47.0 |
Freshman |
Xiaoli Qian |
158.0 |
51.0 |
Junior |
Gaoquan Shen |
158.0 |
51.0 |
Sophomore |
Xiaoquan Zhang |
158.3 |
45.0 |
Senior |
Xiaopeng Lv |
158.4 |
47.0 |
Freshman |
Xiaofeng Qian |
158.5 |
49.0 |
Junior |
Xiaofeng You |
158.5 |
45.0 |
Chunjuan Zhang |
158.9 |
47.0 |
Freshman |
Gaopeng Yang |
158.9 |
46.0 |
Feng Yang |
158.9 |
44.0 |
Junior |
Feng Zhao |
159.0 |
51.0 |
Senior |
Quan Qian |
159.0 |
50.0 |
Junior |
Yanjuan Lv |
159.3 |
49.0 |
Freshman |
Xiaomei Yang |
159.3 |
49.0 |
Senior |
Gaopeng Qin |
159.4 |
52.0 |
Junior |
Mei Sun |
159.5 |
50.0 |
Changjuan Xu |
159.6 |
49.0 |
Freshman |
Changquan Chu |
159.6 |
45.0 |
Senior |
Chunjuan Xu |
159.8 |
47.0 |
Juan Zhang |
159.9 |
47.0 |
Junior |
Xiaofeng Zhao |
159.9 |
46.0 |
Xiaopeng Shen |
160.1 |
53.0 |
Sophomore |
Yanfeng Qian |
160.1 |
48.0 |
Junior |
Xiaopeng Zhou |
160.2 |
51.0 |
Xiaojuan Zhao |
160.3 |
49.0 |
Senior |
Yanmei Qian |
160.3 |
49.0 |
Junior |
Quan Zhao |
160.6 |
53.0 |
Senior |
Xiaojuan Qian |
160.6 |
50.0 |
Junior |
Yanli Zhang |
160.6 |
47.0 |
Chengquan Qin |
160.7 |
52.0 |
Sophomore |
Xiaoqiang Qin |
160.8 |
54.0 |
Junior |
Chengquan Shi |
160.8 |
51.0 |
Qiang Sun |
160.8 |
NaN |
Senior |
Li Zhao |
160.9 |
50.0 |
Freshman |
Xiaopeng Zhao |
161.0 |
53.0 |
Senior |
Mei Zheng |
161.1 |
50.0 |
Juan Zhao |
161.2 |
47.0 |
Freshman |
Chengquan Chu |
161.3 |
47.0 |
Junior |
Changjuan You |
161.4 |
47.0 |
Senior |
Li Xu |
161.5 |
53.0 |
Chunpeng Qian |
161.6 |
NaN |
Junior |
Xiaopeng Sun |
161.9 |
54.0 |
Gaoqiang Qian |
161.9 |
50.0 |
Chunquan Xu |
162.1 |
54.0 |
Freshman |
Yanqiang Feng |
162.3 |
51.0 |
Xiaojuan Chu |
162.4 |
58.0 |
Senior |
Qiang Chu |
162.4 |
50.0 |
Freshman |
Peng Wu |
162.5 |
53.0 |
Qiang Chu |
162.5 |
52.0 |
Senior |
Feng Zheng |
162.6 |
49.0 |
df_demo.sort_index(level=['Grade','Name'],ascending=[True,False]).head(100)
|
|
Height |
Weight |
Grade |
Name |
|
|
Freshman |
Yanquan Wang |
163.5 |
55.0 |
Yanqiang Xu |
152.4 |
38.0 |
Yanqiang Feng |
162.3 |
51.0 |
Yanpeng Lv |
NaN |
65.0 |
Yanli Zhang |
165.1 |
52.0 |
Yanjuan Zhao |
NaN |
53.0 |
Yanjuan Han |
163.7 |
49.0 |
Xiaoquan Sun |
154.6 |
40.0 |
Xiaopeng Zhou |
174.1 |
74.0 |
Xiaopeng Zhao |
161.0 |
53.0 |
Xiaopeng Han |
164.1 |
53.0 |
Xiaomei Yang |
159.3 |
49.0 |
Xiaoli Xu |
156.5 |
43.0 |
Xiaoli Qian |
158.0 |
51.0 |
Xiaoli Lv |
152.5 |
45.0 |
Xiaojuan Qin |
NaN |
79.0 |
Xiaojuan Chu |
162.4 |
58.0 |
Xiaofeng Qian |
158.5 |
49.0 |
Quan Chu |
154.7 |
43.0 |
Qiang Zhang |
152.7 |
43.0 |
Qiang Shi |
164.5 |
52.0 |
Qiang Han |
185.3 |
87.0 |
Qiang Han |
151.8 |
38.0 |
Qiang Feng |
178.9 |
80.0 |
Qiang Chu |
162.5 |
52.0 |
Peng Zhang |
163.1 |
NaN |
Peng Wu |
162.5 |
53.0 |
Li Wu |
164.3 |
51.0 |
Li Lv |
155.2 |
44.0 |
Juan Zhang |
168.6 |
55.0 |
Gaoquan Xu |
NaN |
52.0 |
Gaoquan Sun |
156.8 |
42.0 |
Gaoqiang Qin |
170.2 |
63.0 |
Gaopeng Yang |
158.9 |
46.0 |
Gaopeng Shi |
162.9 |
48.0 |
Gaoli Zhao |
175.4 |
78.0 |
Gaoli Feng |
157.4 |
46.0 |
Gaojuan Qin |
NaN |
41.0 |
Gaofeng Zhao |
152.2 |
43.0 |
Feng Yang |
158.9 |
44.0 |
Feng Wang |
176.3 |
74.0 |
Chunmei Wang |
151.2 |
43.0 |
Chunmei Shi |
164.9 |
52.0 |
Chunli Zhao |
180.2 |
83.0 |
Chengquan Chu |
161.3 |
47.0 |
Changquan Chu |
159.6 |
45.0 |
Changqiang You |
166.5 |
70.0 |
Changqiang Yang |
156.0 |
49.0 |
Changpeng Zhao |
181.3 |
83.0 |
Changmei Lv |
172.2 |
75.0 |
Changmei Feng |
163.8 |
56.0 |
Changli Zhang |
163.0 |
48.0 |
Junior |
Yanpeng Han |
NaN |
44.0 |
Yanmei Yang |
167.7 |
57.0 |
Yanli Zhang |
160.6 |
47.0 |
Yanli You |
NaN |
48.0 |
Yanli Wang |
169.9 |
67.0 |
Yanjuan Lv |
159.3 |
49.0 |
Yanfeng Qian |
178.7 |
75.0 |
Xiaoquan Lv |
153.2 |
43.0 |
Xiaoqiang Qin |
170.1 |
68.0 |
Xiaopeng Zhou |
160.2 |
51.0 |
Xiaopeng Sun |
161.9 |
54.0 |
Xiaopeng Shen |
160.1 |
53.0 |
Xiaoli Wang |
171.4 |
70.0 |
Xiaoli Chu |
145.4 |
34.0 |
Xiaojuan Zhao |
160.3 |
49.0 |
Xiaojuan Sun |
153.9 |
46.0 |
Xiaojuan Qian |
164.7 |
51.0 |
Xiaofeng Zhao |
159.9 |
46.0 |
Xiaofeng You |
158.5 |
45.0 |
Quan Zhao |
160.6 |
53.0 |
Qiang You |
170.0 |
56.0 |
Qiang Wang |
157.5 |
48.0 |
Qiang Sun |
163.1 |
53.0 |
Qiang Sun |
160.8 |
NaN |
Qiang Lv |
152.1 |
42.0 |
Peng Wang |
162.8 |
65.0 |
Mei Zhang |
156.5 |
44.0 |
Mei Sun |
159.5 |
50.0 |
Li Sun |
166.6 |
54.0 |
Li Chu |
165.2 |
51.0 |
Juan Xu |
164.8 |
NaN |
Gaoquan Zhou |
166.8 |
70.0 |
Gaoquan Shen |
158.0 |
51.0 |
Gaoquan Chu |
NaN |
51.0 |
Gaoqiang Zhou |
156.8 |
43.0 |
Gaoqiang Qin |
167.1 |
71.0 |
Gaoqiang Qian |
161.9 |
50.0 |
Gaoli Xu |
157.3 |
48.0 |
Gaojuan Qian |
154.8 |
44.0 |
Gaofeng Sun |
162.8 |
48.0 |
Feng Zheng |
165.6 |
51.0 |
Feng Zhao |
159.0 |
51.0 |
Chunquan Xu |
162.1 |
54.0 |
Chunqiang Chu |
168.6 |
72.0 |
Chunmei Han |
153.2 |
44.0 |
Chunjuan Zhang |
158.9 |
47.0 |
Chunfeng Zhao |
173.4 |
72.0 |
Chengquan Shi |
160.8 |
51.0 |
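The long outputs above make two things hard to spot: how ties are broken when sorting by several columns, and where the NaN rows go. A compact sketch on four made-up rows (names are mine):

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame(
    {"Height": [150.5, np.nan, 150.5, 147.8], "Weight": [40.0, 41.0, 36.0, 34.0]},
    index=["a", "b", "c", "d"],
)

# Height ascending; equal heights are ordered by Weight descending.
# Rows with a missing Height go last by default.
by_height = scores.sort_values(["Height", "Weight"], ascending=[True, False])

# na_position="first" moves the missing values to the top instead.
nan_first = scores.sort_values("Height", na_position="first")

print(by_height)
print(nan_first)
```

In the real data this is why the rows with NaN in the sort column only appear at the very end of a group of ties or of the whole result.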
2.3.6 The apply method
df_demo = df[['Height','Weight']]
def my_mean(x):
    # x is one column of df_demo (a Series)
    res = x.mean()
    return res
df_demo.apply(my_mean)
Height 163.218033
Weight 55.015873
dtype: float64
df_demo.apply(lambda x:x.mean())
Height 163.218033
Weight 55.015873
dtype: float64
df_demo.apply(lambda x:x.mean(),axis = 1).head()
0 102.45
1 118.25
2 138.95
3 41.00
4 124.00
dtype: float64
df_demo.apply(lambda x:(x-x.mean()).abs().mean())
Height 6.707229
Weight 10.391870
dtype: float64
df_demo.mad()  # mean absolute deviation; DataFrame.mad was deprecated in pandas 1.5 and removed in 2.0
Height 6.707229
Weight 10.391870
dtype: float64
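Since the built-in mad is gone from recent pandas releases, the apply version is the one that keeps working. A minimal sketch with a named function and made-up numbers, applied both per column and per row:

```python
import pandas as pd

body = pd.DataFrame({"Height": [150.0, 160.0, 170.0], "Weight": [40.0, 50.0, 70.0]})

def mean_abs_dev(values):
    """Mean of the absolute deviations from the mean of `values`."""
    return (values - values.mean()).abs().mean()

mad_cols = body.apply(mean_abs_dev)           # one value per column (axis=0 is the default)
mad_rows = body.apply(mean_abs_dev, axis=1)   # one value per row

print(mad_cols)
print(mad_rows)
```

The function receives a whole column (or a whole row when axis=1) as a Series, which is why Series methods such as mean and abs can be used inside it.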
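2.4 Window objects
https://www.gairuo.com/p/pandas-window-functions
A "window" can be thought of as a set: each window is one set of observations. Statistical analysis often needs different windows. For example, a department may be split into groups, and averages or rankings are then computed per group. Likewise, for ordered data such as time series, we might put every 5 days or every month into one group and then sort or take the median within it.
rolling(10) looks a lot like groupby, but it does not partition the data. Instead it creates a sliding-window object that spans 10 positions (for example, 10 days) and moves along the data; a statistic is then computed over each window.
2.4.1 Rolling windows
rolling returns a rolling-window object; its most important parameter is the window size, window.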
s = pd.Series([1,2,3,4,5])
roller = s.rolling(window = 3)
roller
Rolling [window=3,center=False,axis=0]
roller.mean()
0 NaN
1 NaN
2 2.0
3 3.0
4 4.0
dtype: float64
roller.sum()
0 NaN
1 NaN
2 6.0
3 9.0
4 12.0
dtype: float64
s2 = pd.Series([1,2,6,16,30])
roller.cov(s2)
0 NaN
1 NaN
2 2.5
3 7.0
4 12.0
dtype: float64
roller.corr(s2)
0 NaN
1 NaN
2 0.944911
3 0.970725
4 0.995402
dtype: float64
roller.apply(lambda x:x.mean())
0 NaN
1 NaN
2 2.0
3 3.0
4 4.0
dtype: float64
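To see what a window of size 3 actually contains, the sketch below recomputes the rolling mean by slicing each window by hand. It also shows min_periods, which I am adding for illustration (it is not used in the notes above): it lets the incomplete windows at the start produce a value instead of NaN.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])

# Window of 3: each result uses the current element and the two before it.
rolled = values.rolling(window=3).mean()

# The same by hand: slice out each window explicitly.
by_hand = [
    values.iloc[i - 2 : i + 1].mean() if i >= 2 else float("nan")
    for i in range(len(values))
]

# min_periods=1 accepts windows with fewer than 3 elements.
partial = values.rolling(window=3, min_periods=1).mean()

print(rolled.tolist())
print(partial.tolist())
```

The first two NaN values in the rolling outputs above come from exactly this: positions 0 and 1 do not yet have three elements to look back on.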
a = pd.Series([1,3,6,10,15])
a.shift(2)
0 NaN
1 NaN
2 1.0
3 3.0
4 6.0
dtype: float64
a.diff(3)
0 NaN
1 NaN
2 NaN
3 9.0
4 12.0
dtype: float64
a.pct_change()
0 NaN
1 2.000000
2 1.000000
3 0.666667
4 0.500000
dtype: float64
a.shift(-1)
0 3.0
1 6.0
2 10.0
3 15.0
4 NaN
dtype: float64
a.diff(-2)
0 -5.0
1 -7.0
2 -9.0
3 NaN
4 NaN
dtype: float64
a
0 1
1 3
2 6
3 10
4 15
dtype: int64
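shift, diff and pct_change are closely related: the last two can be written in terms of the first. A short sketch that checks this on the same Series (variable names are mine):

```python
import pandas as pd

a = pd.Series([1, 3, 6, 10, 15])

diff_manual = a - a.shift(3)       # difference with the value 3 positions earlier, like a.diff(3)
back_manual = a - a.shift(-2)      # difference with the value 2 positions later, like a.diff(-2)
pct_manual = a / a.shift(1) - 1    # relative change from the previous value, like a.pct_change()

print(diff_manual.tolist())
print(back_manual.tolist())
print(pct_manual.tolist())
```

A positive argument looks backwards and leaves NaN at the start; a negative one looks forwards and leaves NaN at the end.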
2.4.2 Expanding windows
s = pd.Series([1,3,6,10])
s.expanding().mean()
0 1.000000
1 2.000000
2 3.333333
3 5.000000
dtype: float64
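An expanding window starts at the first element and grows by one at each step, so its statistics are cumulative ones. A small sketch comparing it with the cumulative functions (variable names are mine):

```python
import pandas as pd

s_exp = pd.Series([1, 3, 6, 10])

expanding_mean = s_exp.expanding().mean()

# Cumulative sum divided by the number of elements seen so far.
seen = pd.Series(range(1, len(s_exp) + 1), index=s_exp.index)
manual_mean = s_exp.cumsum() / seen

# The expanding maximum is the cumulative maximum.
expanding_max = s_exp.expanding().max()

print(expanding_mean.tolist())
print(manual_mean.tolist())
print(expanding_max.tolist(), s_exp.cummax().tolist())
```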