pandas and numpy: counting, sorting, and categorizing

numpy:

np.unique(nd)

from collections import defaultdict

counts = defaultdict(int)
for key in seq:  # seq: any iterable of hashable items, e.g. a list
    counts[key] += 1
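
A minimal sketch of the two counting approaches above, using made-up sample values. `return_counts=True` and `collections.Counter` are not in the original notes; they are shown here as common shortcuts for the same tally.

```python
from collections import Counter, defaultdict

import numpy as np

data = np.array([3, 1, 2, 3, 1, 3])  # made-up sample values

# np.unique returns the sorted distinct values;
# return_counts=True also returns how often each one occurs.
values, freq = np.unique(data, return_counts=True)

# The same tally with the defaultdict loop.
counts = defaultdict(int)
for key in data.tolist():
    counts[key] += 1

# Counter does the loop in one call.
counter = Counter(data.tolist())

print(values, freq)   # [1 2 3] [2 1 3]
print(dict(counts))   # {3: 3, 1: 2, 2: 1}
```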

np.sort(arr,axis=0,kind='quicksort')
np.searchsorted(sorted_arr,values,side='left')
np.argmax(arr)
np.argmin(arr)
np.argsort(nd,axis), returns the indices that would sort nd; the result has the same shape as nd
ind = np.lexsort((b,a)) # Sort by a, then by b
np.where(condition,x,y)
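
The numpy calls listed above can be tried out as follows. This is a small sketch on made-up arrays; the variable names are illustrative only.

```python
import numpy as np

arr = np.array([30, 10, 20])  # made-up sample values

sorted_arr = np.sort(arr)     # [10 20 30]
order = np.argsort(arr)       # [1 2 0], so arr[order] equals sorted_arr

# Insertion points that keep sorted_arr sorted.
pos_left = np.searchsorted(sorted_arr, 20, side='left')    # 1
pos_right = np.searchsorted(sorted_arr, 20, side='right')  # 2

hi = np.argmax(arr)  # 0, index of the largest value
lo = np.argmin(arr)  # 1, index of the smallest value

# lexsort treats the LAST key as the primary one: sort by a, then by b.
a = np.array([1, 0, 1, 0])
b = np.array([9, 8, 7, 6])
ind = np.lexsort((b, a))      # [3 1 2 0]

# Keep values above 15, replace the rest with 0.
masked = np.where(arr > 15, arr, 0)  # [30  0 20]
```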

pandas:

pd.unique(series)

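category=df[key].astype('category'), converts the series to categorical dtype and returns it
category.values.categories, returns the categories without duplicates (equivalent to unique)
category.values.codes, returns the integer code of each sample's category (0, 1, 2, ...; -1 for missing values)
category.cat.categories, same as category.values.categories
category.cat.codes, same as category.values.codes, returned as a series
category.cat.set_categories(categories_list), sets the categories
category.cat.remove_unused_categories(), removes categories that no sample uses
pd.get_dummies(category), one-hot encodes the category, returning one column per category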
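
A short sketch of the categorical workflow above. The `color` column and its values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'red', 'green']})  # made-up data

category = df['color'].astype('category')

cats = list(category.cat.categories)  # ['blue', 'green', 'red']
codes = category.cat.codes.tolist()   # [2, 0, 2, 1]

# Add a category that no row uses, then drop it again.
widened = category.cat.set_categories(['blue', 'green', 'red', 'yellow'])
trimmed = widened.cat.remove_unused_categories()

# One indicator column per category.
dummies = pd.get_dummies(category)
print(dummies)
```

Categories are sorted when inferred from strings, which is why `'red'` gets code 2 even though it appears first.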

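df[key].value_counts(), counts the occurrences of each distinct value in the series, most frequent first
pd.value_counts(df[key]), older top-level form of the same call, deprecated in recent pandas releases; prefer the method above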
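
For example, on a made-up series (the `normalize=True` option is an addition not mentioned in the notes above):

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])  # made-up sample values

counts = s.value_counts()                # a: 3, b: 2, c: 1
shares = s.value_counts(normalize=True)  # relative frequencies instead of counts

print(counts)
```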

df.sort_values(by=['column1','col2'],axis=0,ascending=True,na_position='first')
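series.searchsorted(value,side='left'), returns the index at which value would be inserted into the sorted series; if value is already present, 'left' gives the index of its leftmost occurrence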
series.nsmallest(3) series.nlargest(3)
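
The sorting helpers above, sketched on made-up data. The column names follow the `sort_values` line; the values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column1': [2, 1, 2, np.nan],
                   'col2': [5, 7, 3, 1]})  # made-up data

# Sort by column1, break ties with col2, and put NaN rows first.
ordered = df.sort_values(by=['column1', 'col2'],
                         ascending=True, na_position='first')

series = pd.Series([10, 20, 20, 30])  # must already be sorted for searchsorted

left = series.searchsorted(20, side='left')    # 1, before the first 20
right = series.searchsorted(20, side='right')  # 3, after the last 20

smallest = series.nsmallest(3)  # values 10, 20, 20
largest = series.nlargest(3)    # values 30, 20, 20
```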

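pd.qcut(series,[0,0.25,0.5,0.75,1.]), bins the values into quartiles and returns each sample's interval, e.g. (a, b], (b, c], ...
pd.qcut(series,4,labels=['a','b','c','d']), bins the values into quartiles and returns each sample's label, e.g. 'b', 'a', 'c', 'd', ...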
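
A quick sketch of both `pd.qcut` forms on a made-up series of the integers 1 to 8:

```python
import pandas as pd

series = pd.Series(range(1, 9))  # made-up sample values 1..8

# Explicit quantile edges: each sample is mapped to its interval.
intervals = pd.qcut(series, [0, 0.25, 0.5, 0.75, 1.0])

# Number of bins plus labels: each sample is mapped to its label.
labeled = pd.qcut(series, 4, labels=['a', 'b', 'c', 'd'])

print(intervals.cat.categories)  # the four quartile intervals
print(labeled.tolist())          # ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd']
```

Both calls return a categorical series, so the `.cat` accessor applies to the result.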
