Dataframe的相关知识点总结

dataframe相关知识点总结表.png

（1）Dataframe的基本概念

data = {'name':['Jack','Tom','Mary'],
        'age':[18,19,20],
       'gender':['m','m','w']} # 字典dict
frame = pd.DataFrame(data)
print(frame,type(frame))
# 用pd.DataFrame()生成Dataframe
# Dataframe带有index（行标签）和columns（列标签）

print('--------')
print('查看frame的行标签和行标签类型:\n',frame.index,type(frame.index))
# .index查看行标签
print('查看frame的列标签和列标签类型:\n',frame.columns,type(frame.columns))
# .columns查看列标签
print('查看frame的值和值的数据类型:\n',frame.values,type(frame.values))
# .values查看值，数据类型为ndarra

输出结果：

   age gender  name
0   18      m  Jack
1   19      m   Tom
2   20      w  Mary 
--------
查看frame的行标签和行标签类型:
 RangeIndex(start=0, stop=3, step=1) 
查看frame的列标签和列标签类型:
 Index(['age', 'gender', 'name'], dtype='object') 
查看frame的值和值的数据类型:
 [[18 'm' 'Jack']
 [19 'm' 'Tom']
 [20 'w' 'Mary']]

（2）Dataframe的创建方法

创建Dataframe用到 pd.DataFrame() 一共有5种创建Dataframe的方法。【创建Dataframe较多使用方法1，2，3】

--- >>> 方法1：由数组/list组成的字典创建Dataframe，columns为字典key，index为默认数字标签，字典的value的长度必须保持一致。

# 由数组/list组成的字典 创建Dataframe，columns为字典key，index为默认数字标签
# 字典的值的长度必须保持一致！
data1={'a':list(range(3)),
      'b':list(range(4,7)),
      'c':list(range(7,10))}
data2={'one':np.random.rand(3),
      'two':np.random.rand(3)}
print(data1)
print(data2)
# data1是list组成的字典，data2是ndarray数组组成的字典
df1=pd.DataFrame(data1)
df2=pd.DataFrame(data2)
print(df1)
print(df2)
print('----------------')
# 重新定义columns列标签
df1=pd.DataFrame(data1,columns=list('cdab'))
print(df1)
# 重新定义的columns比原数据多，现有数据中没有该列（比如'd'），则产生NaN值
df1=pd.DataFrame(data1,columns=list('ac'))
print(df1)
# 如果columns重新指定时候，列的数量可以少于原数据
print('----------------')
# 重新定义index行标签
df2=pd.DataFrame(data2,index=list('abe'))
print(df2)
# 重新定义的index行标签，必须与原数据一样多，不能多不能少，否则报错

输出结果：

{'a': [0, 1, 2], 'c': [7, 8, 9], 'b': [4, 5, 6]}
{'two': array([ 0.89216661,  0.31251123,  0.53794988]), 'one': array([ 0.75671784,  0.06891223,  0.60083476])}
   a  b  c
0  0  4  7
1  1  5  8
2  2  6  9
        one       two
0  0.756718  0.892167
1  0.068912  0.312511
2  0.600835  0.537950
----------------
   c    d  a  b
0  7  NaN  0  4
1  8  NaN  1  5
2  9  NaN  2  6
   a  c
0  0  7
1  1  8
2  2  9
----------------
        one       two
a  0.756718  0.892167
b  0.068912  0.312511
e  0.600835  0.537950

--->>> 方法2：由Seris组成的字典创建Dataframe，columns为字典key，index为Series的标签；若Series没有重新制定标签，则是默认数字标签。

df1=pd.DataFrame({'one':pd.Series(np.random.rand(2)),
                 'two':pd.Series(np.random.rand(3))})
print(df1)
df2=pd.DataFrame({'one':pd.Series(np.random.rand(2),index=list('ab')),
                 'two':pd.Series(np.random.rand(3),index=list('abc'))})
print(df2)
# df1没有重新定义index标签，使用Series的默认数字标签
# df2重新定义了index标签
# 由Series组成的字典，Series可以长度不一样，生成的Dataframe会出现NaN值

输出结果：

        one       two
0  0.764246  0.276496
1  0.877065  0.385122
2       NaN  0.083968
        one       two
a  0.043106  0.609206
b  0.175361  0.457400
c       NaN  0.082632

--->>> 方法3：通过二维数组直接创建Dataframe，得到一样形状的结果数据，如果不指定index和columns，两者均返回默认数字格式，并且index和colunms指定长度须与原数组保持一致。

# 创建二维数组ndarray
ar=np.reshape(np.random.rand(10),(2,5))
print(ar)
# 通过二维数组用pd.DataFrame()创建Dataframe
# 若没有重新制定index行标签和columns列标签，默认均使用数字标签
df1=pd.DataFrame(ar)
print(df1)
# 重新定义index和columns
# 重新定义的行标签和列标签必须与原数据长度一致，否则多了少了都报错
df2=pd.DataFrame(ar,index=list('ab'),columns=list('rkefi'))
print(df2)

输出结果：

[[ 0.31690795  0.4554707   0.46110939  0.49108045  0.93137807]
 [ 0.08933855  0.13430606  0.45611142  0.40476408  0.72615298]]
          0         1         2         3         4
0  0.316908  0.455471  0.461109  0.491080  0.931378
1  0.089339  0.134306  0.456111  0.404764  0.726153
          r         k         e         f         i
a  0.316908  0.455471  0.461109  0.491080  0.931378
b  0.089339  0.134306  0.456111  0.404764  0.726153

--->>> 方法4：由字典组成的列表创建Dataframe，columns为字典的key，index不做指定则为默认数组标签

# 创建字典组成的列表
lst=[{'one':1,'two':2},
    {'one':9,'two':12,'three':13}]
print(lst)
# 由字典组成的列表创建Dataframe
df1=pd.DataFrame(lst)
print(df1)
print('----------')
# 重新定义index，长度必须与原数据一致
df2=pd.DataFrame(lst,index=list('ab'))
print(df2)
# 重新定义columns，columns参数可以增加和减少现有列，如出现新的列，值为NaN
df3=pd.DataFrame(lst,columns=['one','two','four'])
print(df3)

输出结果：

[{'two': 2, 'one': 1}, {'two': 12, 'one': 9, 'three': 13}]
   one  three  two
0    1    NaN    2
1    9   13.0   12
----------
   one  three  two
a    1    NaN    2
b    9   13.0   12
   one  two  four
0    1    2   NaN
1    9   12   NaN

--->>> 方法5：由字典组成的字典创建Dataframe，columns为字典的key
columns参数可以增加和减少现有列，如出现新的列，值为NaN；
index在这里和之前不同，并不能重新定义新的index，如果指向新的标签，值为NaN （非常重要！）

data = {'Jack':{'math':90,'english':89,'art':78},
       'Marry':{'math':82,'english':95,'art':92},
       'Tom':{'math':78,'english':67}}
df1 = pd.DataFrame(data)
print(df1)
print('----------')
df2 = pd.DataFrame(data, columns = ['Jack','Tom','Bob'])
df3 = pd.DataFrame(data, index = ['a','b','c'])
print(df2)
print(df3)
# df2 columns参数可以增加和减少现有列，如出现新的列，值为NaN
# df3 重新定义index了，因此返回的都是NaN

输出结果：

         Jack  Marry   Tom
art        78     92   NaN
english    89     95  67.0
math       90     82  78.0
----------
         Jack   Tom  Bob
art        78   NaN  NaN
english    89  67.0  NaN
math       90  78.0  NaN
   Jack  Marry  Tom
a   NaN    NaN  NaN
b   NaN    NaN  NaN
c   NaN    NaN  NaN

（3）Dataframe的索引和切片

--->>> df[columns]默认选择列，[]中写列标签名

# 创建Dataframe
df=pd.DataFrame(np.random.rand(12).reshape(3,4)*100,
               index=list('abc'),columns=['one','two','three','four'])
print(df)

# df[columns]默认选择列，[]中写列标签名
# 因而一般数据colunms都会单独制定，不会用默认数字列名，以免和index冲突
data1=df['one']
print(data1,type(data1))
print('---------')
# 选择单列时，单选列为Series，print结果为Series格式
data2=df[['two','four']]
print(data2,type(data2))
print('---------')
# 选择多列时，多选列为Dataframe，print结果为Dataframe格式

# 【注意！！！！】
# df[]中填数字时，则默认选择行，且只能进行切片的选择，不能单独选择（df[0]）
data3=df[0:1]
print(data3,type(data3))
data4=df[:2]
print(data4)
# 输出结果即便只选择一行,类型为Dataframe
# df[]若用来选择行，则[]内指能填数字，若填写的是索引标签名来选择行(df['one'])，则报错
# 核心重点：一般选择列用df[col]，[]中写列名，选择行有其他方法。

输出结果：

         one        two      three       four
a  90.914477  19.150946  13.451741   7.575419
b  24.002117  84.665548  21.014130  59.794000
c  67.605976  23.585457  55.870082  37.634451
a    90.914477
b    24.002117
c    67.605976
Name: one, dtype: float64 
---------
         two       four
a  19.150946   7.575419
b  84.665548  59.794000
c  23.585457  37.634451 
---------
         one        two      three      four
a  90.914477  19.150946  13.451741  7.575419 
         one        two      three       four
a  90.914477  19.150946  13.451741   7.575419
b  24.002117  84.665548  21.014130  59.794000

--->>> df.loc[] - 按index选择行

# df.loc[] - 按index选择行
df1 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   index = ['one','two','three','four'],
                   columns = ['a','b','c','d'])
df2 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   columns = ['a','b','c','d'])
print(df1)
print(df2)
print('-----')
data1=df1.loc['one']
data2=df2.loc[2]
print(data1,type(data1))
print(data2,type(data2))
# 选择单行时，返回Series，且原数据columns变为Series的index
print('-----')
data3=df1.loc[['one','three']]
data4=df2.loc[[3,0,4]]
print(data3,type(data3))
print(data4,type(data4))
# 选择多行时，返回Dataframe，若标签不存在，则返回NaN
# 顺序可变
print('-----')

# 做切片索引
data5=df1.loc['one':'three']
print(data5)
# df.loc[index]做切片，末端包含
data6=df2.loc[0:2]
print(data6)
#核心重点：df.loc[label]主要针对index选择行，同时支持指定index，及默认数字index

输出结果：

               a          b          c          d
one    25.422364  20.795395  11.336709  90.864404
two    86.156487  11.198670  54.739204  45.817514
three  83.311279  37.328900  84.881397  26.402103
four    8.671409  61.604665  71.907952  66.940001
           a          b          c          d
0  48.645209  99.572800  27.856945  92.607217
1  79.674493  73.053547  18.308831  88.747135
2  25.372909  62.037611  85.714760  49.503700
3  71.654423  24.761800  11.173785  46.230001
-----
a    25.422364
b    20.795395
c    11.336709
d    90.864404
Name: one, dtype: float64 
a    25.372909
b    62.037611
c    85.714760
d    49.503700
Name: 2, dtype: float64 
-----
               a          b          c          d
one    25.422364  20.795395  11.336709  90.864404
three  83.311279  37.328900  84.881397  26.402103 
           a        b          c          d
3  71.654423  24.7618  11.173785  46.230001
0  48.645209  99.5728  27.856945  92.607217
4        NaN      NaN        NaN        NaN 
-----
               a          b          c          d
one    25.422364  20.795395  11.336709  90.864404
two    86.156487  11.198670  54.739204  45.817514
three  83.311279  37.328900  84.881397  26.402103
           a          b          c          d
0  48.645209  99.572800  27.856945  92.607217
1  79.674493  73.053547  18.308831  88.747135
2  25.372909  62.037611  85.714760  49.503700

--->>> df.iloc[] - 按照整数位置（从轴的0到length-1）选择行，类似list的索引，其顺序就是dataframe的整数位置，从0开始计

# df.iloc[] - 按照整数位置（从轴的0到length-1）选择行
# 类似list的索引，其顺序就是dataframe的整数位置，从0开始计
df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   index = ['one','two','three','four'],
                   columns = ['a','b','c','d'])
print(df)
print('------')

data1=df.iloc[0]
print(data1,type(data1))
data2=df.iloc[-1]
print(data2,type(data2))
print('------')
# 单位置索引,与list的索引类似

data3=df.iloc[[0,2]]
print(data3,type(data3))
#data4=df.iloc[[2,1,4]]
#print(data4)输出会报错，因为索引为4的行不存在
# 和loc索引不同，不能索引超出数据行数的整数位置
# 多位置索引，顺序可变

# df.iloc[索引]做切片，末端不包含
data4=df.iloc[:2]
print(data4)
# data4定位索引为0-1的行

输出结果：

               a          b          c          d
one    74.980847  68.412788  55.553170  70.115846
two    33.516484  79.838096  54.487476  66.310558
three  86.773031  73.986408  23.077972  56.278295
four   79.205418  19.199874  10.778306  74.669667
------
a    74.980847
b    68.412788
c    55.553170
d    70.115846
Name: one, dtype: float64 
a    79.205418
b    19.199874
c    10.778306
d    74.669667
Name: four, dtype: float64 
------
               a          b          c          d
one    74.980847  68.412788  55.553170  70.115846
three  86.773031  73.986408  23.077972  56.278295 
             a          b          c          d
one  74.980847  68.412788  55.553170  70.115846
two  33.516484  79.838096  54.487476  66.310558

--->>> 多重索引，同时索引行和列。应先选择列再选择行 —— 相当于对于一个数据，先筛选字段，再选择数据量。

df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   index = ['one','two','three','four'],
                   columns = ['a','b','c','d'])
print(df)
print('------')
print(df['a'].loc['two'])
print('------')
# 从Dataframe中索引选中指定的一个数据
# 选择a列，two行

print(df[['b','d']].loc[['one','three']])
print('------')
# 从Dataframe中索引选中指定的不连续的行和列
# 选择b、d列，one、three行

print(df[['a','d']].loc['two':'four'])
print('------')
# df[]选择多列是用逗号分隔
# df.loc[]选择多行，用切片索引，用冒号分隔

print(df[df['a'] < 50].iloc[:2])   # 选择满足判断索引的前两行数据

输出结果：

               a          b          c          d
one    23.625057  14.995706  63.663427  67.586236
two     6.870833  59.007275  66.547176  50.959152
three  48.086678  50.274425  59.094988  42.759351
four   58.284649  82.886278  16.476423  27.154450
------
6.87083343585
------
               b          d
one    14.995706  67.586236
three  50.274425  42.759351
------
               a          d
two     6.870833  50.959152
three  48.086678  42.759351
four   58.284649  27.154450
------
             a          b          c          d
one  23.625057  14.995706  63.663427  67.586236
two   6.870833  59.007275  66.547176  50.959152

--->>> 布尔型索引和Series原理相同。

# 布尔型索引
# 和Series原理相同
df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   index = ['one','two','three','four'],
                   columns = ['a','b','c','d'])
print(df)
print('------')

b1 = df < 20
print(b1,type(b1))
print(df[b1])  # 也可以书写为 df[df < 20]
print('------')
# 不做索引则会对数据每个值进行判断
# 索引结果保留 所有数据：True返回原数据，False返回值为NaN

b2 = df['a'] > 50
print(b2,type(b2))
print(df[b2])  # 也可以书写为 df[df['a'] > 50]
print('------')
# 单列做判断
# 索引结果保留 单列判断为True的行数据，包括其他列

b3 = df[['a','b']] > 50
print(b3,type(b3))
print(df[b3])  # 也可以书写为 df[df[['a','b']] > 50]
print('------')
# 多列做判断
# 索引结果保留 所有数据：True返回原数据，False返回值为NaN

b4 = df.loc[['one','three']] < 50
print(b4,type(b4))
print(df[b4])  # 也可以书写为 df[df.loc[['one','three']] < 50]
print('------')
# 多行做判断
# 索引结果保留 所有数据：True返回原数据，False返回值为NaN

输出结果：

               a          b          c          d
one    19.185849  20.303217  21.800384  45.189534
two    50.105112  28.478878  93.669529  90.029489
three  35.496053  19.248457  74.811841  20.711431
four   24.604478  57.731456  49.682717  82.132866
------
           a      b      c      d
one     True  False  False  False
two    False  False  False  False
three  False   True  False  False
four   False  False  False  False 
               a          b   c   d
one    19.185849        NaN NaN NaN
two          NaN        NaN NaN NaN
three        NaN  19.248457 NaN NaN
four         NaN        NaN NaN NaN
------
one      False
two       True
three    False
four     False
Name: a, dtype: bool 
             a          b          c          d
two  50.105112  28.478878  93.669529  90.029489
------
           a      b
one    False  False
two     True  False
three  False  False
four   False   True 
               a          b   c   d
one          NaN        NaN NaN NaN
two    50.105112        NaN NaN NaN
three        NaN        NaN NaN NaN
four         NaN  57.731456 NaN NaN
------
          a     b      c     d
one    True  True   True  True
three  True  True  False  True 
               a          b          c          d
one    19.185849  20.303217  21.800384  45.189534
two          NaN        NaN        NaN        NaN
three  35.496053  19.248457        NaN  20.711431
four         NaN        NaN        NaN        NaN
------

（4）Dataframe的基本技巧

--->>> .head()和.tail()分别查看头部数据与尾部数据；.T转置数据

# 数据查看、转置
df = pd.DataFrame(np.random.rand(16).reshape(8,2)*100,
                   columns = ['a','b'])
print(df.head(2))
print(df.tail())
# .head()查看头部数据
# .tail()查看尾部数据
# 默认查看5条
print(df.T)
# .T 转置 原形状是(3,4)置换后会变为(4,3)，且dataframe的index与columns互换

输出结果：

           a          b
0   7.949040  32.642699
1  92.208337  10.828741
           a          b
3  69.807085  57.420046
4  79.708661   0.590644
5  65.844373  47.775450
6   4.988854  64.613866
7   8.913846  87.750012
           0          1          2          3          4          5  \
a   7.949040  92.208337  82.494079  69.807085  79.708661  65.844373   
b  32.642699  10.828741  90.075549  57.420046   0.590644  47.775450   

           6          7  
a   4.988854   8.913846  
b  64.613866  87.750012

--->>> 给Dataframe添加数据和修改数据

df = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   columns = ['a','b','c','d'])
print(df)

# 【添加数据】
df['g']=100
df.loc[5]=200
print(df)
print('----------')
# df['g']=100，新增列g，值均为100
# df.loc[5]=200，新增行index为5的行，值均为200

# 【修改数据】
df['c']=120
df.loc[0:1]=250
print(df)
# df['c']=120，将c列的值均修改为120
# df.loc[0:1]=250，将index为0-1的行的值均修改为250

输出结果：

           a          b          c          d
0  81.933629  58.281026  97.975374  92.070574
1  61.370852  56.591523  40.667676  32.055902
2  60.103233   5.406372  66.602369  84.946211
3  84.100587  13.442995  16.667553  13.657240
            a           b           c           d    g
0   81.933629   58.281026   97.975374   92.070574  100
1   61.370852   56.591523   40.667676   32.055902  100
2   60.103233    5.406372   66.602369   84.946211  100
3   84.100587   13.442995   16.667553   13.657240  100
5  200.000000  200.000000  200.000000  200.000000  200
----------
            a           b    c           d    g
0  250.000000  250.000000  250  250.000000  250
1  250.000000  250.000000  250  250.000000  250
2   60.103233    5.406372  120   84.946211  100
3   84.100587   13.442995  120   13.657240  100
5  200.000000  200.000000  120  200.000000  200

--->>> 给Dataframe删除数据，del语句删除列，drop()删除行、列

df1 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   columns = ['a','b','c','d'],index=list('njkr'))
df2 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   columns = ['a','b','c','d'],index=[1,2,3,4])
print(df1)
print(df2)
print('-----')
del df1['a']
print(df1)
print('-----')
# del 语句删除列

print(df1.drop(['n']))
print(df2.drop(1))
print(df1)
print(df2)
print('-----')
# drop(labe),括号内填写行index标签，删除行
# drop()删除行，默认inplace=False → 删除后生成新的数据，不改变原数据

# drop()默认删除行，
# drop()若要删除列，需要加上axis = 1
print(df1.drop(['d'],axis=1))
print(df1)

输出结果：

           a          b          c          d
n  99.781228  25.472789  77.134248  88.759165
j  30.835066  97.793713   5.044348  63.591559
k  91.380903  54.532245  64.167292  15.215125
r  11.363368  22.048608  11.202747  94.795838
           a          b          c          d
1  81.331983  72.663489  87.489846  39.120100
2  17.028264  47.835036  95.802606   0.556850
3  97.790851  68.518948  42.980309  46.952173
4  40.470713   2.216181  15.092647  95.034669
-----
           b          c          d
n  25.472789  77.134248  88.759165
j  97.793713   5.044348  63.591559
k  54.532245  64.167292  15.215125
r  22.048608  11.202747  94.795838
-----
           b          c          d
j  97.793713   5.044348  63.591559
k  54.532245  64.167292  15.215125
r  22.048608  11.202747  94.795838
           a          b          c          d
2  17.028264  47.835036  95.802606   0.556850
3  97.790851  68.518948  42.980309  46.952173
4  40.470713   2.216181  15.092647  95.034669
           b          c          d
n  25.472789  77.134248  88.759165
j  97.793713   5.044348  63.591559
k  54.532245  64.167292  15.215125
r  22.048608  11.202747  94.795838
           a          b          c          d
1  81.331983  72.663489  87.489846  39.120100
2  17.028264  47.835036  95.802606   0.556850
3  97.790851  68.518948  42.980309  46.952173
4  40.470713   2.216181  15.092647  95.034669
-----
           b          c
n  25.472789  77.134248
j  97.793713   5.044348
k  54.532245  64.167292
r  22.048608  11.202747
           b          c          d
n  25.472789  77.134248  88.759165
j  97.793713   5.044348  63.591559
k  54.532245  64.167292  15.215125
r  22.048608  11.202747  94.795838

--->>> Dataframe数据的对齐计算，

df1 = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])
print(df1 + df2)
# DataFrame对象之间的数据自动按照列和索引（行标签）对齐

输出结果：

          A         B         C   D
0  0.375689  0.154600  0.588799 NaN
1  0.314006  0.322111  1.521050 NaN
2 -0.617338  1.621051  1.019352 NaN
3  0.079740  1.035736 -0.069641 NaN
4  0.362620  1.040316 -3.178383 NaN
5  1.162456  1.628615  1.699686 NaN
6 -1.306092 -0.410553 -3.453950 NaN
7       NaN       NaN       NaN NaN
8       NaN       NaN       NaN NaN
9       NaN       NaN       NaN NaN

--->>> Dataframe的数据排序：
【排序1 - 按值排序，.sort_values（）】

# 排序1 - 按值排序 .sort_values
# 同样适用于Series
df1 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                   columns = ['a','b','c','d'])
print(df1)
print(df1.sort_values(['a'], ascending = True))  # 升序
print(df1.sort_values(['a'], ascending = False))  # 降序
print('------')
# ascending参数：设置升序降序，默认升序
# 单列排序


df2 = pd.DataFrame({'a':[1,1,1,1,2,2,2,2],
                  'b':list(range(8)),
                  'c':list(range(8,0,-1))})
print(df2)
print(df2.sort_values(['a','c']))
# 多列排序，按列顺序排序
# 前提是Dataframe的数据本身是可以排序的
# df2的多列排序，先排序a升序，1，2升序，然后排序c从1开始排序5，6，7，8升序

输出结果：

           a          b          c          d
0  21.476998  94.124307  74.651624  86.206047
1  64.418365  51.001043  67.371010  57.975594
2  98.097496  42.493171  18.868006  96.989554
3  50.945730   5.336877  45.237293  48.395052
           a          b          c          d
0  21.476998  94.124307  74.651624  86.206047
3  50.945730   5.336877  45.237293  48.395052
1  64.418365  51.001043  67.371010  57.975594
2  98.097496  42.493171  18.868006  96.989554
           a          b          c          d
2  98.097496  42.493171  18.868006  96.989554
1  64.418365  51.001043  67.371010  57.975594
3  50.945730   5.336877  45.237293  48.395052
0  21.476998  94.124307  74.651624  86.206047
------
   a  b  c
0  1  0  8
1  1  1  7
2  1  2  6
3  1  3  5
4  2  4  4
5  2  5  3
6  2  6  2
7  2  7  1
   a  b  c
3  1  3  5
2  1  2  6
1  1  1  7
0  1  0  8
7  2  7  1
6  2  6  2
5  2  5  3
4  2  4  4

【排序2 - 索引排序 .sort_index（）】

# 排序2 - 索引排序 .sort_index
df1 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                  index = [5,4,3,2],
                   columns = ['a','b','c','d'])
df2 = pd.DataFrame(np.random.rand(16).reshape(4,4)*100,
                  index = ['h','s','x','g'],
                   columns = ['a','b','c','d'])
print(df1)
data1=df1.sort_index()
print(data1)
print(df1)
print('--------')
print(df2)
data2=df2.sort_index()
print(data2)
print(df2)
# 按照index排序
# 默认 ascending=True升序, inplace=False生成新数据排序

输出结果：

           a          b          c          d
5  46.148715   3.841598  28.772054  30.769325
4  62.050299  72.191961  96.291796  86.469963
3  34.731134  22.328996  96.434859  96.952085
2  14.084726  64.354711  45.780757  82.532313
           a          b          c          d
2  14.084726  64.354711  45.780757  82.532313
3  34.731134  22.328996  96.434859  96.952085
4  62.050299  72.191961  96.291796  86.469963
5  46.148715   3.841598  28.772054  30.769325
           a          b          c          d
5  46.148715   3.841598  28.772054  30.769325
4  62.050299  72.191961  96.291796  86.469963
3  34.731134  22.328996  96.434859  96.952085
2  14.084726  64.354711  45.780757  82.532313
--------
           a          b          c          d
h  93.225186  74.185120  83.354408  98.871708
s  69.450625  14.009624  98.526514  32.786719
x  95.268889  97.052805  70.099924  22.170984
g  96.089474  79.845508  66.001147  46.291457
           a          b          c          d
g  96.089474  79.845508  66.001147  46.291457
h  93.225186  74.185120  83.354408  98.871708
s  69.450625  14.009624  98.526514  32.786719
x  95.268889  97.052805  70.099924  22.170984
           a          b          c          d
h  93.225186  74.185120  83.354408  98.871708
s  69.450625  14.009624  98.526514  32.786719
x  95.268889  97.052805  70.099924  22.170984
g  96.089474  79.845508  66.001147  46.291457

笔记：Python之Pandas的数据结构-Dataframe

Dataframe的相关知识点总结

（1）Dataframe的基本概念

（2）Dataframe的创建方法

（3）Dataframe的索引和切片

（4）Dataframe的基本技巧

你可能感兴趣的:(笔记：Python之Pandas的数据结构-Dataframe)