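Introduction: the pandas to_dict method converts a DataFrame into a dictionary (or, with one option, a list of dictionaries).
The orient parameter selects one of six output layouts: 'dict', 'list', 'series', 'split', 'records', and 'index'. Each is covered in turn below.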
Help on method to_dict in module pandas.core.frame:
to_dict(orient='dict') method of pandas.core.frame.DataFrame instance
Convert DataFrame to dictionary.
Parameters
----------
orient : str {'dict', 'list', 'series', 'split', 'records', 'index'}
Determines the type of the values of the dictionary.
- dict (default) : dict like {column -> {index -> value}}
- list : dict like {column -> [values]}
- series : dict like {column -> Series(values)}
- split : dict like
{index -> [index], columns -> [columns], data -> [values]}
- records : list like
[{column -> value}, ... , {column -> value}]
- index : dict like {index -> {column -> value}}
.. versionadded:: 0.17.0
Abbreviations are allowed. `s` indicates `series` and `sp`
indicates `split`.
Returns
-------
result : dict like {column -> {index -> value}}
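1. orient='dict'
'dict' is the default. Here data is a DataFrame, and the result is a nested (two-level) dictionary of the form {column -> {index -> value}}:
- for each column, the index labels and values are collected into an inner dictionary
- the column name then serves as the outer key, with that inner dictionary as its value
Lookup takes the form data_dict[key1][key2], where:
- data_dict is the result of calling to_dict with orient='dict'
- key1 is the column name (outer key)
- key2 is the index label (inner key)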
data
Out[9]:
pclass age embarked home.dest sex
1086 3rd 31.194181 UNKNOWN UNKNOWN male
12 1st 31.194181 Cherbourg Paris, France female
1036 3rd 31.194181 UNKNOWN UNKNOWN male
833 3rd 32.000000 Southampton Foresvik, Norway Portland, ND male
1108 3rd 31.194181 UNKNOWN UNKNOWN male
562 2nd 41.000000 Cherbourg New York, NY male
437 2nd 48.000000 Southampton Somerset / Bernardsville, NJ female
663 3rd 26.000000 Southampton UNKNOWN male
669 3rd 19.000000 Southampton England male
507 2nd 31.194181 Southampton Petworth, Sussex male
In[10]: data_dict=data.to_dict(orient= 'dict')
In[11]: data_dict
Out[11]:
{
'age': {
12: 31.19418104265403,
437: 48.0,
507: 31.19418104265403,
562: 41.0,
663: 26.0,
669: 19.0,
833: 32.0,
1036: 31.19418104265403,
1086: 31.19418104265403,
1108: 31.19418104265403},
'embarked': {
12: 'Cherbourg',
437: 'Southampton',
507: 'Southampton',
562: 'Cherbourg',
663: 'Southampton',
669: 'Southampton',
833: 'Southampton',
1036: 'UNKNOWN',
1086: 'UNKNOWN',
1108: 'UNKNOWN'},
'home.dest': {
12: 'Paris, France',
437: 'Somerset / Bernardsville, NJ',
507: 'Petworth, Sussex',
562: 'New York, NY',
663: 'UNKNOWN',
669: 'England',
833: 'Foresvik, Norway Portland, ND',
1036: 'UNKNOWN',
1086: 'UNKNOWN',
1108: 'UNKNOWN'},
'pclass': {
12: '1st',
437: '2nd',
507: '2nd',
562: '2nd',
663: '3rd',
669: '3rd',
833: '3rd',
1036: '3rd',
1086: '3rd',
1108: '3rd'},
'sex': {
12: 'female',
437: 'female',
507: 'male',
562: 'male',
663: 'male',
669: 'male',
833: 'male',
1036: 'male',
1086: 'male',
1108: 'male'}}
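The two-level lookup described above can be sketched as follows. This is a minimal illustration, assuming a three-row stand-in for the article's data (same index labels and values, fewer rows and columns); the names age_of_833 and sex_of_12 are introduced here only for the example.

```python
import pandas as pd

# Small stand-in for the article's `data` (assumption: three rows, three columns).
data = pd.DataFrame(
    {
        "pclass": ["3rd", "1st", "3rd"],
        "age": [31.19418104265403, 31.19418104265403, 32.0],
        "sex": ["male", "female", "male"],
    },
    index=[1086, 12, 833],
)

data_dict = data.to_dict(orient="dict")

# Outer key: column name. Inner key: index label (not a position).
age_of_833 = data_dict["age"][833]
sex_of_12 = data_dict["sex"][12]
print(age_of_833, sex_of_12)
```

Because the inner keys are index labels, data_dict["age"][0] would raise a KeyError here: no row carries the label 0.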
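2. orient='list'
Similar to case 1, except that the inner container is a list: {column -> [values]}. The index labels are dropped, so values are looked up by position.
Lookup takes the form data_list[key][position]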
In[19]: data_list=data.to_dict(orient='list')
In[20]: data_list
Out[20]:
{
'age': [31.19418104265403,
31.19418104265403,
31.19418104265403,
32.0,
31.19418104265403,
41.0,
48.0,
26.0,
19.0,
31.19418104265403],
'embarked': ['UNKNOWN',
'Cherbourg',
'UNKNOWN',
'Southampton',
'UNKNOWN',
'Cherbourg',
'Southampton',
'Southampton',
'Southampton',
'Southampton'],
'home.dest': ['UNKNOWN',
'Paris, France',
'UNKNOWN',
'Foresvik, Norway Portland, ND',
'UNKNOWN',
'New York, NY',
'Somerset / Bernardsville, NJ',
'UNKNOWN',
'England',
'Petworth, Sussex'],
'pclass': ['3rd',
'1st',
'3rd',
'3rd',
'3rd',
'2nd',
'2nd',
'3rd',
'3rd',
'2nd'],
'sex': ['male',
'female',
'male',
'male',
'male',
'male',
'female',
'male',
'male',
'male']}
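A short sketch of positional lookup with orient='list', again assuming a three-row stand-in for the article's data. The variable names third_age and second_pclass are hypothetical, chosen for the example.

```python
import pandas as pd

# Small stand-in for the article's `data` (assumption: three rows, three columns).
data = pd.DataFrame(
    {
        "pclass": ["3rd", "1st", "3rd"],
        "age": [31.19418104265403, 31.19418104265403, 32.0],
        "sex": ["male", "female", "male"],
    },
    index=[1086, 12, 833],
)

data_list = data.to_dict(orient="list")

# The index labels 1086, 12, 833 are gone; only row order remains.
third_age = data_list["age"][2]         # third row, originally labelled 833
second_pclass = data_list["pclass"][1]  # second row, originally labelled 12
print(third_age, second_pclass)
```

This layout is convenient when only the column values matter and the row labels can be discarded.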
3. orient='series'
The result has the form {column -> Series(values)}.
Lookup takes the form data_series[key1][key2] for a single value, or data_series[key1] for the whole Series
In[21]: data_series=data.to_dict(orient='series')
In[22]: data_series
Out[22]:
{
'age': 1086 31.194181
12 31.194181
1036 31.194181
833 32.000000
1108 31.194181
562 41.000000
437 48.000000
663 26.000000
669 19.000000
507 31.194181
Name: age, dtype: float64, 'embarked': 1086 UNKNOWN
12 Cherbourg
1036 UNKNOWN
833 Southampton
1108 UNKNOWN
562 Cherbourg
437 Southampton
663 Southampton
669 Southampton
507 Southampton
Name: embarked, dtype: object, 'home.dest': 1086 UNKNOWN
12 Paris, France
1036 UNKNOWN
833 Foresvik, Norway Portland, ND
1108 UNKNOWN
562 New York, NY
437 Somerset / Bernardsville, NJ
663 UNKNOWN
669 England
507 Petworth, Sussex
Name: home.dest, dtype: object, 'pclass': 1086 3rd
12 1st
1036 3rd
833 3rd
1108 3rd
562 2nd
437 2nd
663 3rd
669 3rd
507 2nd
Name: pclass, dtype: object, 'sex': 1086 male
12 female
1036 male
833 male
1108 male
562 male
437 female
663 male
669 male
507 male
Name: sex, dtype: object}
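With orient='series' each value is still a pandas Series, so Series methods remain available. The sketch below assumes a three-row stand-in for the article's data and uses .loc for the label lookup to make the intent explicit; the names oldest and oldest_label are hypothetical.

```python
import pandas as pd

# Small stand-in for the article's `data` (assumption: three rows, three columns).
data = pd.DataFrame(
    {
        "pclass": ["3rd", "1st", "3rd"],
        "age": [31.19418104265403, 31.19418104265403, 32.0],
        "sex": ["male", "female", "male"],
    },
    index=[1086, 12, 833],
)

data_series = data.to_dict(orient="series")

# data_series[key1] is a whole column as a Series.
age = data_series["age"]

# A single value is looked up by index label.
age_of_833 = age.loc[833]

# Series methods still work on each dictionary value.
oldest = age.max()
oldest_label = age.idxmax()
print(age_of_833, oldest, oldest_label)
```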
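4. orient='split'
The result has the form {index -> [index], columns -> [columns], data -> [values]}: the index, the column names, and the data are pulled apart into three entries of one dictionary.
The entries are accessed as data_split['index'], data_split['data'], and data_split['columns']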
data_split=data.to_dict(orient='split')
data_split
Out[38]:
{
'columns': ['pclass', 'age', 'embarked', 'home.dest', 'sex'],
'data': [['3rd', 31.19418104265403, 'UNKNOWN', 'UNKNOWN', 'male'],
['1st', 31.19418104265403, 'Cherbourg', 'Paris, France', 'female'],
['3rd', 31.19418104265403, 'UNKNOWN', 'UNKNOWN', 'male'],
['3rd', 32.0, 'Southampton', 'Foresvik, Norway Portland, ND', 'male'],
['3rd', 31.19418104265403, 'UNKNOWN', 'UNKNOWN', 'male'],
['2nd', 41.0, 'Cherbourg', 'New York, NY', 'male'],
['2nd', 48.0, 'Southampton', 'Somerset / Bernardsville, NJ', 'female'],
['3rd', 26.0, 'Southampton', 'UNKNOWN', 'male'],
['3rd', 19.0, 'Southampton', 'England', 'male'],
['2nd', 31.19418104265403, 'Southampton', 'Petworth, Sussex', 'male']],
'index': [1086, 12, 1036, 833, 1108, 562, 437, 663, 669, 507]}
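One use of the 'split' layout is that its three entries map directly onto the arguments of the DataFrame constructor. The following is a minimal sketch of that round trip, assuming a three-row stand-in for the article's data; rebuilt is a name introduced only for this example.

```python
import pandas as pd

# Small stand-in for the article's `data` (assumption: three rows, three columns).
data = pd.DataFrame(
    {
        "pclass": ["3rd", "1st", "3rd"],
        "age": [31.19418104265403, 31.19418104265403, 32.0],
        "sex": ["male", "female", "male"],
    },
    index=[1086, 12, 833],
)

data_split = data.to_dict(orient="split")

# Each entry is a plain list; 'data' holds one inner list per row.
first_row = data_split["data"][0]
print(data_split["columns"], data_split["index"], first_row)

# Rebuild a DataFrame from the three entries.
rebuilt = pd.DataFrame(
    data_split["data"],
    index=data_split["index"],
    columns=data_split["columns"],
)
print(rebuilt)
```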
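5. orient='records'
The result has the form [{column -> value}, ... , {column -> value}].
The outer container is a list; each element is a dictionary built from one row of the original data. The index labels are not kept.
Lookup takes the form data_records[position][key1]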
data_records=data.to_dict(orient='records')
data_records
Out[41]:
[{
'age': 31.19418104265403,
'embarked': 'UNKNOWN',
'home.dest': 'UNKNOWN',
'pclass': '3rd',
'sex': 'male'},
{
'age': 31.19418104265403,
'embarked': 'Cherbourg',
'home.dest': 'Paris, France',
'pclass': '1st',
'sex': 'female'},
{
'age': 31.19418104265403,
'embarked': 'UNKNOWN',
'home.dest': 'UNKNOWN',
'pclass': '3rd',
'sex': 'male'},
{
'age': 32.0,
'embarked': 'Southampton',
'home.dest': 'Foresvik, Norway Portland, ND',
'pclass': '3rd',
'sex': 'male'},
{
'age': 31.19418104265403,
'embarked': 'UNKNOWN',
'home.dest': 'UNKNOWN',
'pclass': '3rd',
'sex': 'male'},
{
'age': 41.0,
'embarked': 'Cherbourg',
'home.dest': 'New York, NY',
'pclass': '2nd',
'sex': 'male'},
{
'age': 48.0,
'embarked': 'Southampton',
'home.dest': 'Somerset / Bernardsville, NJ',
'pclass': '2nd',
'sex': 'female'},
{
'age': 26.0,
'embarked': 'Southampton',
'home.dest': 'UNKNOWN',
'pclass': '3rd',
'sex': 'male'},
{
'age': 19.0,
'embarked': 'Southampton',
'home.dest': 'England',
'pclass': '3rd',
'sex': 'male'},
{
'age': 31.19418104265403,
'embarked': 'Southampton',
'home.dest': 'Petworth, Sussex',
'pclass': '2nd',
'sex': 'male'}]
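Since 'records' yields one dictionary per row, the result can be processed with ordinary list operations. A minimal sketch, assuming a three-row stand-in for the article's data; the names second_pclass and males are hypothetical.

```python
import pandas as pd

# Small stand-in for the article's `data` (assumption: three rows, three columns).
data = pd.DataFrame(
    {
        "pclass": ["3rd", "1st", "3rd"],
        "age": [31.19418104265403, 31.19418104265403, 32.0],
        "sex": ["male", "female", "male"],
    },
    index=[1086, 12, 833],
)

data_records = data.to_dict(orient="records")

# Outer lookup is by position, inner lookup is by column name.
second_pclass = data_records[1]["pclass"]

# Rows are plain dicts, so a list comprehension can filter them.
males = [row for row in data_records if row["sex"] == "male"]
print(second_pclass, len(males))
```

If the index labels are needed in each record, one option is to call data.reset_index() before to_dict so that the index becomes an ordinary column.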
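6. orient='index'
The result has the form {index -> {column -> value}}. Lookup is the reverse of the 'dict' case: the index label is the outer key and the column name is the inner key. Working out the exact form is left to the reader.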
data_index=data.to_dict(orient='index')
data_index
Out[43]:
{
12: {
'age': 31.19418104265403,
'embarked': 'Cherbourg',
'home.dest': 'Paris, France',
'pclass': '1st',
'sex': 'female'},
437: {
'age': 48.0,
'embarked': 'Southampton',
'home.dest': 'Somerset / Bernardsville, NJ',
'pclass': '2nd',
'sex': 'female'},
507: {
'age': 31.19418104265403,
'embarked': 'Southampton',
'home.dest': 'Petworth, Sussex',
'pclass': '2nd',
'sex': 'male'},
562: {
'age': 41.0,
'embarked': 'Cherbourg',
'home.dest': 'New York, NY',
'pclass': '2nd',
'sex': 'male'},
663: {
'age': 26.0,
'embarked': 'Southampton',
'home.dest': 'UNKNOWN',
'pclass': '3rd',
'sex': 'male'},
669: {
'age': 19.0,
'embarked': 'Southampton',
'home.dest': 'England',
'pclass': '3rd',
'sex': 'male'},
833: {
'age': 32.0,
'embarked': 'Southampton',
'home.dest': 'Foresvik, Norway Portland, ND',
'pclass': '3rd',
'sex': 'male'},
1036: {
'age': 31.19418104265403,
'embarked': 'UNKNOWN',
'home.dest': 'UNKNOWN',
'pclass': '3rd',
'sex': 'male'},
1086: {
'age': 31.19418104265403,
'embarked': 'UNKNOWN',
'home.dest': 'UNKNOWN',
'pclass': '3rd',
'sex': 'male'},
1108: {
'age': 31.19418104265403,
'embarked': 'UNKNOWN',
'home.dest': 'UNKNOWN',
'pclass': '3rd',
'sex': 'male'}}
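For readers who want to check their answer, the sketch below compares the 'index' and 'dict' layouts side by side. It assumes a three-row stand-in for the article's data; row_833 is a name introduced only for this example.

```python
import pandas as pd

# Small stand-in for the article's `data` (assumption: three rows, three columns).
data = pd.DataFrame(
    {
        "pclass": ["3rd", "1st", "3rd"],
        "age": [31.19418104265403, 31.19418104265403, 32.0],
        "sex": ["male", "female", "male"],
    },
    index=[1086, 12, 833],
)

data_index = data.to_dict(orient="index")
data_dict = data.to_dict(orient="dict")

# 'index': outer key is the index label, inner key is the column name.
row_833 = data_index[833]
print(row_833)

# The same cell is reached with the two keys swapped in the 'dict' layout.
print(data_index[833]["age"] == data_dict["age"][833])
```

The 'index' layout suits row-wise access by label, whereas 'dict' suits column-wise access.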