name | status | number | message |
matt | active | 12345 | [job: , money: none, wife: none] |
james | active | 23456 | [group: band, wife: yes, money: 10000] |
adam | inactive | 34567 | [job: none, money: none, wife: , kids: one, group: jail] |
方法一:
首先通过replace(\s+代表一个及以上空格),将list of dict 转化为set of dict 然后使用ast
import ast
df.message = df.message.replace([':\s+,','\[', '\]', ':\s+', ',\s+'], ['":"none","', '{"', '"}', '":"', '","'], regex=True)
df.message = df.message.apply(ast.literal_eval)
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print (df1)
kids money group job money wife 0 NaN none NaN none NaN none 1 NaN NaN band NaN 10000 yes 2 one NaN jail none none none
问题来了因为‘money’在第二行的message中是第三个dict,不同于其他两行在第二个dict,
因此会产生两列‘money’。这时候需要我们手动修改,不展开了。
所以按正常的操作得到如下:
df=pd.concat([df,df1],axis=1)
print(df)
name status number kids money group job money wife 0 matt active 12345 NaN none NaN none NaN none 1 james active 23456 NaN NaN band NaN 10000 yes 2 adam inactive 34567 one NaN jail none none none
方法二:
使用yaml包
import yaml
df.message = df.message.replace(['\[','\]'],['{','}'], regex=True).apply(yaml.load)
df1 = pd.DataFrame(df.pop('message').values.tolist(), index=df.index)
print (df1)
group job kids money wife 0 NaN None NaN none none 1 band NaN NaN 10000 True 2 jail none one none Nonedf = pd.concat([df, df1], axis=1)
print (df)
name status number group job kids money wife 0 matt active 12345 NaN None NaN none none 1 james active 23456 band NaN NaN 10000 True 2 adam inactive 34567 jail none one none Non