pandas dataframe汇总和计算方法

Dataframe汇总计算的主要方法有:

pandas dataframe汇总和计算方法_第1张图片pandas dataframe汇总和计算方法_第2张图片

Pandas 统计的一些常用方法:

  1. frame.idxmax(): 列的最大值 输出每列最大值的索引
np.random.seed(38754)
data=np.random.randint(0,15,15).reshape(5,3)
frame=DataFrame(data,index=['a','b','c','d','e'],columns=['x','y','z'])
result=frame.idxmax()
print(result)
#输出:
x    b
y    a
z    e
  1. frame.cumsum() :返回行或列的累加值的series,默认列累加

语法:DataFrame.cumsum(axis=None, dtype=None, out=None, skipna=True, **kwargs)

axis=0: 行 axis=1:列(默认)
skipna:是否跳过空值,默认为True

np.random.seed(38754)
data=np.random.randint(0,15,15).reshape(5,3)
frame=DataFrame(data,index=['a','b','c','d','e'],columns=['x','y','z'])
result=frame.cumsum(axis=1)
print(frame)
print(result)
#输出:
    x   y   z
a   5  14   7
b  12  12   6
c   3   7   8
d  11  10   0
e   1  10  10
    x   y   z
a   5  19  26
b  12  24  30
c   3  10  18
d  11  21  21
e   1  11  21
  1. frame.describe():描述列的一些值:
np.random.seed(38754)
data=np.random.randint(0,15,15).reshape(5,3)
frame=DataFrame(data,index=['a','b','c','d','e'],columns=['x','y','z'])
print(frame.describe())
#输出:
               x          y          z
count   5.000000   5.000000   5.000000 #每列非NAN数的个数
mean    6.400000  10.600000   6.200000 #每列平均值
std     4.878524   2.607681   3.768289 #标准差
min     1.000000   7.000000   0.000000
25%     3.000000  10.000000   6.000000 # 第一四分位数 (Q1),又称“较小四分位数”,等于该样本中所有数值由小到大排列后第25%的数字。
50%     5.000000  10.000000   7.000000 #中位数
75%    11.000000  12.000000   8.000000
max    12.000000  14.000000  10.000000
  1. 缺失数据处理:dataframe.dropna() ,默认删除所有存在na的行
    语法:DataFrame.dropna(axis=0, how=‘any’, thresh=None, subset=None, inplace=False)
    axis=0,默认对行作用,axis=1即对列作用
    how=any/all:
     any : 如果存在任何NA值,则删除该行
     all : 如果该行所有的值都为NA,则删除该行
#代码含义:对列作用,当整列都为NA值时才放弃列
np.random.seed(38754)
data=np.random.randint(0,15,15).reshape(5,3)
frame=DataFrame(data,index=['a','b','c','d','e'],columns=['x','y','z'])

#设置na值
frame.loc['a','x']=np.nan
frame.loc['b','y']=np.nan
frame.loc[0,0]=np.nan

print(frame)
print(frame.dropna(axis=1,how='all'))
#输出:
      x     y     z   0
a   NaN  14.0   7.0 NaN
b  12.0   NaN   6.0 NaN
c   3.0   7.0   8.0 NaN
d  11.0  10.0   0.0 NaN
e   1.0  10.0  10.0 NaN
0   NaN   NaN   NaN NaN
      x     y     z
a   NaN  14.0   7.0
b  12.0   NaN   6.0
c   3.0   7.0   8.0
d  11.0  10.0   0.0
e   1.0  10.0  10.0
0   NaN   NaN   NaN
  1. 填充缺失值:frame.fillna()
    语法:DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None, **kwargs)

value: 变量, 字典, Series, or DataFrame
#字典填充:{‘a’:1,‘b’:2} a列填充1,b列填充2,字典填充只能按列填充
method=‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None (见实例)
limit: int 针对连续缺失值,指定填充数量

#实例
np.random.seed(38754)
data=np.random.randint(0,15,15).reshape(5,3)
frame=DataFrame(data,index=['a','b','c','d','e'],columns=['x','y','z'])
frame.loc['a','x']=np.nan
frame.loc['b','y']=np.nan
frame.loc[0,0]=np.nan

print(frame)
print(frame.fillna('new',))
print(frame.fillna('new',axis=0))
print('向前或向后填充',frame.fillna(method='ffill'))
print('对连续na值限制填充次数',frame.fillna('new',limit=3))
print('字典填充',frame.fillna(value={'x':11,'y':22,'z':33}))
#输出

#原始frame
      x     y     z   0
a   NaN  14.0   7.0 NaN
b  12.0   NaN   6.0 NaN
c   3.0   7.0   8.0 NaN
d  11.0  10.0   0.0 NaN
e   1.0  10.0  10.0 NaN
0   NaN   NaN   NaN NaN

#用new填充na
     x    y    z    0
a  new   14    7  new
b   12  new    6  new
c    3    7    8  new
d   11   10    0  new
e    1   10   10  new
0  new  new  new  new

向前或向后填充 #用前一行(axis=0)/前一列(axis=1)的数值填充na,如果前一行或前一列也是na,则不填充
    x     y     z   0
a   NaN  14.0   7.0 NaN
b  12.0  14.0   6.0 NaN
c   3.0   7.0   8.0 NaN
d  11.0  10.0   0.0 NaN
e   1.0  10.0  10.0 NaN
0   1.0  10.0  10.0 NaN

对连续na值限制填充次数     
	 x    y    z    0
a  new   14    7  new
b   12  new    6  new
c    3    7    8  new
d   11   10    0  NaN
e    1   10   10  NaN
0  new  new  new  NaN

字典填充 #只能填充列     
	x     y     z   0
a  11.0  14.0   7.0 NaN
b  12.0  22.0   6.0 NaN
c   3.0   7.0   8.0 NaN
d  11.0  10.0   0.0 NaN
e   1.0  10.0  10.0 NaN
0  11.0  22.0  33.0 NaN
  1. dataframe.head(n) :查看前n行数据(默认是前5行)
  2. dataframe.tail(n) :查看后n行数据(默认是前5行)

你可能感兴趣的:(python,入门)