Spark job on the standalone cluster manager fails with "Randomness of hash of string should be disabled via PYTHONHASHSEED"

1 Cause of the error

In [7]: c.collect()
19/03/09 10:58:41 WARN TaskSetManager: Lost task 1.0 in stage 3.0 (TID 8, 10.0.2.13): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/hadoop/spark-2.0.2/python/lib/pyspark.zip/pyspark/worker.py", line 172, in main
    process()
  File "/home/hadoop/spark-2.0.2/python/lib/pyspark.zip/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/hadoop/spark-2.0.2/python/lib/pyspark.zip/pyspark/serializers.py", line 133, in dump_stream
    for obj in iterator:
  File "/home/hadoop/spark-2.0.2/python/pyspark/rdd.py", line 1719, in add_shuffle_key
    buckets[partitionFunc(k) % numPartitions].append((k, v))
  File "/home/hadoop/spark-2.0.2/python/lib/pyspark.zip/pyspark/rdd.py", line 74, in portable_hash
    raise Exception("Randomness of hash of string should be disabled via PYTHONHASHSEED")
Exception: Randomness of hash of string should be disabled via PYTHONHASHSEED

	at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
    ......
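The traceback comes from PySpark's `portable_hash` during the shuffle: each worker computes `partitionFunc(k) % numPartitions` for every key, so all Python worker processes must hash a given string to the same value. Python 3 randomizes string hashing per process by default, which would scatter the same key into different buckets on different executors, so PySpark raises this exception unless `PYTHONHASHSEED` is fixed. The sketch below (illustrative, not from the original post; the test string "spark" and seed 321 are arbitrary) demonstrates the underlying behavior by hashing the same string in fresh interpreter processes:

```python
import os
import subprocess
import sys

def hash_in_fresh_interpreter(seed):
    """Compute hash("spark") in a new Python process started with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", 'print(hash("spark"))'],
        capture_output=True, text=True, env=env, check=True,
    )
    return result.stdout.strip()

# With a fixed seed, every process computes the same hash, so
# partitionFunc(k) % numPartitions routes a key to the same bucket everywhere.
fixed_a = hash_in_fresh_interpreter("321")
fixed_b = hash_in_fresh_interpreter("321")

# With "random" (Python 3's default), separate processes almost certainly
# disagree -- exactly what would corrupt a distributed shuffle.
rand_a = hash_in_fresh_interpreter("random")
rand_b = hash_in_fresh_interpreter("random")

print("fixed seed consistent:", fixed_a == fixed_b)
print("random seed consistent:", rand_a == rand_b)
```

This is why the fix in the next section works: it pins the same hash seed on every executor.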

2 Solution

Add a `--conf` parameter when starting pyspark, as follows:

pyspark --master spark://master:7077 --conf spark.executorEnv.PYTHONHASHSEED=321
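If you don't want to pass the flag on every launch, the same setting can live in Spark's configuration files. A sketch, assuming a standard Spark layout (the value 321 is arbitrary; any fixed integer works, as long as all executors use the same one):

```
# conf/spark-defaults.conf -- applied to every application submitted
# from this installation
spark.executorEnv.PYTHONHASHSEED  321
```

Alternatively, `export PYTHONHASHSEED=321` in `conf/spark-env.sh` on every worker node achieves the same effect at the environment level.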

Run the job again and it completes successfully.

Done!
