ValueError: Some of types cannot be determined by the first 100 rows, please try again with sampling

In PySpark, after running resultrdd = rdd.filter(...) and then calling toDF(*columns), the exception from the title was raised:

ValueError: Some of types cannot be determined by the first 100 rows, please try again with sampling

It then turned out that resultrdd.count() was only 1:

resultrdd.collect()
[(None, 1876)]

Because the first column is None in every row, schema inference cannot determine that column's type, which is what triggers the error in the title.
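To see why an all-None column trips this up, here is a minimal pure-Python sketch (not Spark's actual implementation) of how row-based schema inference behaves: it inspects the first N rows and takes the first non-None value's type for each column, so a column that is None in every sampled row cannot be typed.

```python
def infer_column_types(rows, sample_size=100):
    """Loosely mimic Spark's row-based schema inference: look at the
    first `sample_size` rows and use the first non-None value's type
    for each column."""
    sample = rows[:sample_size]
    ncols = len(sample[0])
    types = []
    for i in range(ncols):
        inferred = None
        for row in sample:
            if row[i] is not None:
                inferred = type(row[i])
                break
        if inferred is None:
            # Mirrors the message PySpark raises in this situation.
            raise ValueError(
                "Some of types cannot be determined by the first "
                f"{sample_size} rows, please try again with sampling"
            )
        types.append(inferred)
    return types

# A single row whose first column is None -> inference fails:
try:
    infer_column_types([(None, 1876)])
except ValueError as e:
    print(e)

# Add one row with a non-None first column and inference succeeds:
print(infer_column_types([(None, 1876), ("", 1245)]))
```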

But if the RDD contains at least one row whose first column is not None, for example:

resultrdd.collect()
[(None, 1876), ("", 1245)]

then inference can pick the type up from the non-None value, and the error in the title is not raised.
