ValueError: Some of types cannot be determined by the first 100 rows, please try again with sampling

Resolutions:
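1. Increase the sampling ratio, so that type inference looks at a sample of the whole RDD instead of only the first 100 rows, e.g.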

sqlContext.createDataFrame(rdd, samplingRatio=0.2)

2. Give Spark an explicit schema, so that no type inference is needed, e.g.

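from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Declare each column's name, type, and nullability up front.
schema = StructType([
    StructField("column_1", StringType(), True),
    StructField("column_2", IntegerType(), True),
])
df = sqlContext.createDataFrame(rdd, schema=schema)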
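
To decide which resolution fits, it can help to see which column is causing the problem. As far as I can tell, the error is raised when a column is None in every one of the rows that inference inspects, so Spark has no value to infer a type from. The snippet below is a minimal plain-Python sketch of that check. The helper name undetermined_columns and the sample data are hypothetical, not part of the Spark API, and the check only looks at top-level None values.

```python
def undetermined_columns(rows, names, limit=100):
    """Return the names of columns that are None in all of the first `limit` rows."""
    head = rows[:limit]
    return [
        name
        for i, name in enumerate(names)
        if all(row[i] is None for row in head)
    ]


# Hypothetical sample standing in for the first rows of the RDD.
sample = [("a", None), ("b", None), (None, None)]
print(undetermined_columns(sample, ["column_1", "column_2"]))
```

With a real RDD of tuples, the rows could come from rdd.take(100). Any column the helper reports is a candidate for an explicit type in the schema from resolution 2.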
