Python dict to DataFrame: converting a list of standard Python key-value dictionaries to a PySpark DataFrame

Suppose I have a list of Python dictionaries whose keys correspond to the column names of a table. How can I convert the list below into a PySpark DataFrame with two columns, arg1 and arg2?

[{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""}]

How can I use the following construct to do it?

df = sc.parallelize([
    ...
]).toDF()

Where should arg1 and arg2 go in the code above (in place of the ...)?

Solution

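Old way (pass the dicts in directly and let Spark infer the schema; older Spark releases warn that inferring a schema from dicts is deprecated in favor of pyspark.sql.Row):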

sc.parallelize([{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""}]).toDF()

New way (convert each dict to a Row before calling toDF):

from collections import OrderedDict

from pyspark.sql import Row


def convert_to_row(d: dict) -> Row:
    # Sort the keys so every Row lists its fields in the same order
    return Row(**OrderedDict(sorted(d.items())))


sc.parallelize([{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""},{"arg1": "", "arg2": ""}]) \
    .map(convert_to_row) \
    .toDF()
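If going through an RDD is not required, another option is to skip dict-based inference of column names and hand Spark plain tuples plus an explicit list of column names. The following is only a minimal sketch under some assumptions: a SparkSession is available as `spark`, and the helper names `dicts_to_rows` and `to_dataframe` are hypothetical, introduced here for illustration and not part of PySpark.

```python
from typing import Any, Dict, List, Sequence, Tuple


def dicts_to_rows(records: Sequence[Dict[str, Any]],
                  columns: Sequence[str]) -> List[Tuple[Any, ...]]:
    """Turn each dict into a tuple ordered by `columns`.

    Keys missing from a record become None, so every tuple has the
    same length and the same field order.
    """
    return [tuple(rec.get(col) for col in columns) for rec in records]


def to_dataframe(spark, records, columns):
    """Build a DataFrame from a list of dicts (hypothetical helper).

    `spark` is assumed to be an existing pyspark.sql.SparkSession.
    createDataFrame accepts a list of tuples and a list of column names;
    the column types are still inferred from the data.
    """
    return spark.createDataFrame(dicts_to_rows(records, list(columns)),
                                 schema=list(columns))


# Usage, assuming `spark` already exists:
# data = [{"arg1": "", "arg2": ""}, {"arg1": "", "arg2": ""}]
# df = to_dataframe(spark, data, ["arg1", "arg2"])
```

Naming the columns explicitly fixes their order regardless of how each dict was built, which is the same problem convert_to_row solves by sorting the keys.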
