PySpark error: TypeError: an integer is required (got type bytes)

After installing and configuring PySpark, the following error was raised when running a computation:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/09/02 23:52:02 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Traceback (most recent call last):
  File "C:\Users\hx\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 184, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "C:\Users\hx\AppData\Local\Programs\Python\Python38\lib\runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "", line 991, in _find_and_load
  File "", line 975, in _find_and_load_unlocked
  File "", line 655, in _load_unlocked
  File "", line 618, in _load_backward_compatible
  File "", line 259, in load_module
  File "D:\spark-2.4.3-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\__init__.py", line 51, in <module>
  File "", line 991, in _find_and_load
  File "", line 975, in _find_and_load_unlocked
  File "", line 655, in _load_unlocked
  File "", line 618, in _load_backward_compatible
  File "", line 259, in load_module
  File "D:\spark-2.4.3-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\context.py", line 31, in <module>
  File "", line 991, in _find_and_load
  File "", line 975, in _find_and_load_unlocked
  File "", line 655, in _load_unlocked
  File "", line 618, in _load_backward_compatible
  File "", line 259, in load_module
  File "D:\spark-2.4.3-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\accumulators.py", line 97, in <module>
  File "", line 991, in _find_and_load
  File "", line 975, in _find_and_load_unlocked
  File "", line 655, in _load_unlocked
  File "", line 618, in _load_backward_compatible
  File "", line 259, in load_module
  File "D:\spark-2.4.3-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\serializers.py", line 71, in <module>
  File "", line 991, in _find_and_load
  File "", line 975, in _find_and_load_unlocked
  File "", line 655, in _load_unlocked
  File "", line 618, in _load_backward_compatible
  File "", line 259, in load_module
  File "D:\spark-2.4.3-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\cloudpickle.py", line 145, in <module>
  File "D:\spark-2.4.3-bin-hadoop2.7\python\lib\pyspark.zip\pyspark\cloudpickle.py", line 126, in _make_cell_set_template_code
TypeError: an integer is required (got type bytes)
[Stage 0:>                                                          (0 + 1) / 1]23/09/02 23:52:14 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Python worker failed to connect back.
	at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:170)
	at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:97)
	at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
	at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:108)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:103)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:121)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Accept timed out
	at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
	at java.net.DualStackPlainSocketImpl.socketAccept(DualStackPlainSocketImpl.java:131)
	at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:535)
	at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:189)
	at java.net.ServerSocket.implAccept(ServerSocket.java:545)
	at java.net.ServerSocket.accept(ServerSocket.java:513)
	at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:164)
	... 18 more

Solution:

The Spark-side error ("Python worker failed to connect back" / SocketTimeoutException) is just a symptom: the Python worker crashes on startup with the TypeError shown above, so it never connects back to the JVM. The underlying cause is that the cloudpickle bundled with Spark 2.4.x does not support Python 3.8 (in 3.8, types.CodeType gained a new posonlyargcount argument, which breaks _make_cell_set_template_code).

In my case the machine had multiple Python installations (3.8, 3.7, and Anaconda's), but PySpark was only configured against 3.7. Even after switching the IDE's interpreter to 3.7, execution still picked up another Python's runpy.py and failed. Reordering the PATH environment variables didn't help, so I simply uninstalled the other Python versions, which solved the problem.
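Once PySpark is pointed at a supported interpreter, a quick way to confirm that the driver and the workers agree is a sketch like the following (the master string and app name are illustrative, not from the original setup):

import sys
from pyspark import SparkContext

sc = SparkContext("local[1]", "version-check")
# Interpreter the driver runs under
print("driver:", sys.executable, sys.version_info[:3])
# Interpreter the worker processes actually launch
worker_exe = (sc.parallelize([0], 1)
              .map(lambda _: __import__("sys").executable)
              .first())
print("worker:", worker_exe)
sc.stop()

If the two paths differ, you are back in the interpreter-mismatch scenario described above.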

Other solutions I found later:

Option 1: add an environment variable: PYSPARK_PYTHON="path to your Python interpreter".
Option 2: add the following two lines at the very top of your script (a fuller sketch follows the snippet):

import os
os.environ['PYSPARK_PYTHON'] = r"path to your Python interpreter"  # set before the SparkContext is created
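For completeness, here is a minimal end-to-end sketch of option 2 (the interpreter path below is a placeholder, and PYSPARK_DRIVER_PYTHON is optional but keeps the driver on the same interpreter as the workers):

import os

# Placeholder path: point this at the interpreter that actually has PySpark configured.
# Both variables must be set before the SparkContext is created.
os.environ['PYSPARK_PYTHON'] = r"C:\Users\hx\AppData\Local\Programs\Python\Python37\python.exe"
os.environ['PYSPARK_DRIVER_PYTHON'] = os.environ['PYSPARK_PYTHON']

from pyspark import SparkContext

sc = SparkContext("local[*]", "pyspark-env-check")
print(sc.parallelize(range(10)).sum())  # prints 45 if the workers start correctly
sc.stop()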
