1. Converting a dict to a DataFrame:
import pandas as pd

temp = {'user_id': ['a', 'b', 'c', 'd'],
        'age': [23, 34, 18, 20],
        'sex': ['f', 'm', 'm', 'f'],
        'click': [1, 0, 0, 1]}
# orient defaults to 'columns': each dict key becomes a DataFrame column name
df = pd.DataFrame.from_dict(temp)
print(df)
  user_id  age sex  click
0       a   23   f      1
1       b   34   m      0
2       c   18   m      0
3       d   20   f      1
# orient='index': each dict key becomes a row label instead
df1 = pd.DataFrame.from_dict(temp, orient='index')
print(df1)
          0   1   2   3
user_id   a   b   c   d
age      23  34  18  20
sex       f   m   m   f
click     1   0   0   1
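The two orientations can be compared side by side. The sketch below reuses the sample dict from above; the row_names list and the variable names are made up for the example. It also shows the optional columns argument, which from_dict accepts only together with orient='index'.

```python
import pandas as pd

temp = {'user_id': ['a', 'b', 'c', 'd'],
        'age': [23, 34, 18, 20],
        'sex': ['f', 'm', 'm', 'f'],
        'click': [1, 0, 0, 1]}

by_columns = pd.DataFrame.from_dict(temp)                 # keys -> column names
by_index = pd.DataFrame.from_dict(temp, orient='index')   # keys -> row labels

# Hypothetical labels for the four positions (not part of the original data)
row_names = ['u0', 'u1', 'u2', 'u3']
labelled = pd.DataFrame.from_dict(temp, orient='index', columns=row_names)

print(by_columns.shape)         # 4 rows, one column per dict key
print(list(by_index.index))     # the dict keys, now used as row labels
print(list(labelled.columns))   # the labels passed in via `columns`
```

With orient='index' every column mixes strings and numbers, so the per-column dtypes of the default orientation are lost.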
2. Converting a DataFrame to a dict (turning df back into temp):
df:
  user_id  age sex  click
0       a   23   f      1
1       b   34   m      0
2       c   18   m      0
3       d   20   f      1
temp = df.to_dict(orient = 'list')
{'user_id': ['a', 'b', 'c', 'd'],
'age': [23, 34, 18, 20],
'sex': ['f', 'm', 'm', 'f'],
'click': [1, 0, 0, 1]}
Note: df.to_dict() with no arguments uses the default orient='dict', which returns a nested dict of the form {column -> {index -> value}}. The result of df.to_dict() is:
{'user_id': {0: 'a', 1: 'b', 2: 'c', 3: 'd'},
'age': {0: 23, 1: 34, 2: 18, 3: 20},
'sex': {0: 'f', 1: 'm', 2: 'm', 3: 'f'},
'click': {0: 1, 1: 0, 2: 0, 3: 1}}
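Besides 'list' and 'dict', to_dict also accepts other orient values such as 'records' and 'index'. A small sketch on a two-row frame (the frame and the variable names are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'user_id': ['a', 'b'], 'age': [23, 34]})

records = df.to_dict(orient='records')   # one dict per row
by_row = df.to_dict(orient='index')      # {index -> {column -> value}}

print(records)
print(by_row)

# A list of row dicts can be passed straight back to the DataFrame constructor
restored = pd.DataFrame(records)
print(restored.equals(df))
```

orient='records' is often the convenient choice when each row has to be handed to something that expects one dict per item.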
3. Turning a DataFrame into a nested dict:
data:
  gaid tag  score
0    a  AA      1
1    a  BB      2
2    a  CC      1
3    b  AA      1
4    b  CC      1
5    c  BB      1
The goal is a dict that maps each user (gaid) to that user's score for every tag:
{'a': {'AA': 1.0, 'BB': 2.0, 'CC': 1.0},
'b': {'AA': 1.0, 'BB': nan, 'CC': 1.0},
'c': {'AA': nan, 'BB': 1.0, 'CC': nan}}
Python code:
temp = data.groupby(['tag', 'gaid'])['score'].sum().unstack().to_dict()
Then remove the NaN entries:
# iterate over copies of the keys so entries can be deleted inside the loop
for k in list(temp):
    for k1 in list(temp[k]):
        if pd.isna(temp[k][k1]):
            del temp[k][k1]
{'a': {'AA': 1.0, 'BB': 2.0, 'CC': 1.0},
'b': {'AA': 1.0, 'CC': 1.0},
'c': {'BB': 1.0}}
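An alternative sketch that never creates the NaN entries: group by gaid first and build each inner dict separately. This is one possible variation, not the method used above, and the name nested is made up for the example.

```python
import pandas as pd

data = pd.DataFrame({'gaid': ['a', 'a', 'a', 'b', 'b', 'c'],
                     'tag': ['AA', 'BB', 'CC', 'AA', 'CC', 'BB'],
                     'score': [1, 2, 1, 1, 1, 1]})

# One inner dict per gaid; tags a user never scored are simply absent
nested = {gaid: grp.groupby('tag')['score'].sum().to_dict()
          for gaid, grp in data.groupby('gaid')}
print(nested)
```

Because no NaN is ever introduced, the scores stay integers here instead of being converted to floats.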
Next, turn the dict back into a DataFrame:
df = pd.DataFrame(temp).T
Out[104]:
    AA   BB   CC
a  1.0  2.0  1.0
b  1.0  NaN  1.0
c  NaN  1.0  NaN
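Going further, the columns can be turned back into rows to recover the layout of the original data.
df.stack() can be thought of as standing each row on end and stacking the pieces on top of one another, with the row label as the first (outer) index level. In the output below the NaN entries have been dropped: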
pd.DataFrame(temp).T.stack()
Out[101]:
a  AA    1.0
   BB    2.0
   CC    1.0
b  AA    1.0
   CC    1.0
c  BB    1.0
dtype: float64
df.unstack() can be thought of as stacking each column vertically, with the column name as the first (outer) index level:
pd.DataFrame(temp).T.unstack()
Out[105]:
AA  a    1.0
    b    1.0
    c    NaN
BB  a    2.0
    b    NaN
    c    1.0
CC  a    1.0
    b    1.0
    c    NaN
dtype: float64
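To get all the way back to a flat gaid / tag / score table, the stacked Series can be given index names and reset. A minimal sketch, assuming the cleaned dict from above; the names wide and flat are illustrative.

```python
import pandas as pd

temp = {'a': {'AA': 1.0, 'BB': 2.0, 'CC': 1.0},
        'b': {'AA': 1.0, 'CC': 1.0},
        'c': {'BB': 1.0}}

wide = pd.DataFrame(temp).T   # rows: gaid, columns: tag

# dropna() is explicit because whether stack() drops NaN can depend on the pandas version
flat = (wide.stack()
            .dropna()
            .rename_axis(['gaid', 'tag'])
            .reset_index(name='score'))
print(flat)
```

The scores come back as floats, since the wide table had to hold NaN for the missing combinations.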
4. Sorting a nested dict:
Keep the top 5 entries of each inner dict, sorted by value in descending order:
S = {}
for k, v in temp.items():
    S[k] = dict(sorted(v.items(), key=lambda x: x[1], reverse=True)[:5])
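The same idea can be wrapped in a small helper so the cutoff is a parameter. The function name top_n and the sample scores are hypothetical, chosen only to make the effect visible with n=2.

```python
def top_n(nested, n):
    """Return a new dict keeping the n highest-valued items of each inner dict."""
    return {k: dict(sorted(v.items(), key=lambda kv: kv[1], reverse=True)[:n])
            for k, v in nested.items()}

# Hypothetical scores, not taken from the data above
scores = {'a': {'AA': 1.0, 'BB': 2.0, 'CC': 3.0},
          'b': {'AA': 5.0, 'CC': 4.0}}

print(top_n(scores, 2))
# {'a': {'CC': 3.0, 'BB': 2.0}, 'b': {'AA': 5.0, 'CC': 4.0}}
```

The ordering inside each inner dict survives because dicts keep insertion order in Python 3.7 and later.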
Writing the dict to a local file:
SS = {'a': {'AA': 1.0, 'BB': 2.0, 'CC': 1.0},
      'b': {'AA': 1.0, 'CC': 1.0},
      'c': {'BB': 1.0}}
# the with block closes the file even if a write fails
with open('E:\\working_file\\simatrix', 'w') as file:
    for k, v in SS.items():
        file.write(str(k) + ' ' + str(v) + '\n')
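If the file has to be read back into a dict later, writing JSON instead of str() output may be easier to parse. A sketch under that assumption, using a temporary directory and a made-up file name rather than the path above:

```python
import json
import os
import tempfile

SS = {'a': {'AA': 1.0, 'BB': 2.0, 'CC': 1.0},
      'b': {'AA': 1.0, 'CC': 1.0},
      'c': {'BB': 1.0}}

# Hypothetical file name inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'simatrix.json')

with open(path, 'w', encoding='utf-8') as f:
    json.dump(SS, f)

with open(path, encoding='utf-8') as f:
    loaded = json.load(f)

print(loaded == SS)
```

JSON object keys are always strings, so this round trip is lossless here only because the keys of SS are already strings.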