Python pandas: counting distinct values after grouping

  • Method 1
  • Method 2

Given the DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan]
})
print(df)

#    group param
# 0      1     a
# 1      1     a
# 2      2     b
# 3      3   NaN
# 4      3     a
# 5      3     a
# 6      4   NaN

Desired result (for each param value, the number of distinct groups it appears in, ignoring NaN):

# a    2
# b    1

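Method 1

Group by param and count the distinct group values with nunique(). groupby drops NaN keys by default, so the NaN rows are excluded.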

print(df.groupby('param')['group'].nunique())
# param
# a    2
# b    1
# Name: group, dtype: int64
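
As a cross-check on Method 1, the same counts can be sketched without groupby: keep one row per (group, param) pair, then count the rows per param. This is an alternative illustration, not part of the original answer; the variable names `pairs`, `counts`, and `expected` are chosen here for the example. value_counts() skips NaN by default, which matches the groupby behavior above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan]
})

# Keep a single row for every distinct (group, param) combination.
pairs = df.drop_duplicates(['group', 'param'])

# Count how many distinct groups each param value appears in.
# NaN is excluded because value_counts() defaults to dropna=True.
counts = pairs['param'].value_counts()

# Result of Method 1, for comparison.
expected = df.groupby('param')['group'].nunique()

print(counts.sort_index())
print(expected.sort_index())
```

Both printed Series should contain the same two entries, one for a and one for b.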

Method 2

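  1. unique(): after dropping NaN rows, collect the distinct param values in each group
  2. DataFrame.from_records(): build a new DataFrame from those per-group arrays
  3. stack(): reshape it into a single Series
  4. value_counts(): count how many groups each value appears in

# Steps 1 to 4 in order; notnull() filters out the NaN rows first.
a = df[df.param.notnull()].groupby('group')['param'].unique()
print(pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())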
# a    2
# b    1
# dtype: int64
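
Steps 2 and 3 of Method 2 can probably be shortened with Series.explode(), which turns each per-group array into one row per element. The following is a minimal sketch under that assumption (it needs pandas 0.25 or later, where explode() was introduced) and is not taken from the original answer; the names `per_group` and `counts` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan]
})

# Step 1: distinct param values per group, with NaN rows removed first.
per_group = df.dropna(subset=['param']).groupby('group')['param'].unique()

# Steps 2-3 replaced: explode() flattens the arrays into one value per row.
# Step 4: count how many groups each value appears in.
counts = per_group.explode().value_counts()

print(counts)
```

Group 4 contains only NaN, so it disappears after dropna() and does not contribute to any count.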

Source: Stack Overflow