PySpark fails with Py4JJavaError [Python 3.9 ==> PySpark 3.0]

Detailed error message:

Py4JJavaError                             Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_8396/2169931463.py in <module>
----> 1 user_categorical_encoder.fit(feat_df)

~\AppData\Local\Temp/ipykernel_8396/3161698003.py in fit(self, df)
     11 
     12         genre_string_indexer = StringIndexer(inputCol='genre_item', outputCol='genre_index')
---> 13         indexer_model = genre_string_indexer.fit(exploded_df)
     14 
     15         # get mapping from string indexer

c:\Program_Files_AI\Anaconda3531\envs\anime\lib\site-packages\pyspark\ml\base.py in fit(self, dataset, params)
    159                 return self.copy(params)._fit(dataset)
    160             else:
--> 161                 return self._fit(dataset)
    162         else:
    163             raise ValueError("Params must be either a param map or a list/tuple of param maps, "

c:\Program_Files_AI\Anaconda3531\envs\anime\lib\site-packages\pyspark\ml\wrapper.py in _fit(self, dataset)
    333 
    334     def _fit(self, dataset):
--> 335         java_model = self._fit_java(dataset)
    336         model = self._create_model(java_model)
    337         return self._copyValues(model)
...
	at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:551)
	at java.base/java.net.ServerSocket.accept(ServerSocket.java:519)
	at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:174)
	... 41 more

Fix
In the Python installation directory, make a copy of python.exe and rename the copy to python3.exe.
[Figure 1]
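Why this helps: Spark 3.x looks for a python3 executable when launching its Python workers by default, and a standard Windows Python installation only ships python.exe, so PythonWorkerFactory times out while waiting for the worker to connect back. An alternative that avoids copying files is to point Spark at the current interpreter explicitly. A minimal sketch, assuming PySpark is installed in the active environment (the app name is arbitrary):

import os
import sys

# Point both the driver and the worker processes at the current interpreter,
# so PythonWorkerFactory does not go looking for a "python3" executable.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

from pyspark.sql import SparkSession

# "py4j-error-check" is an arbitrary name used only for this sketch.
spark = SparkSession.builder.appName("py4j-error-check").getOrCreate()
print(spark.version)

The variables must be set before getOrCreate(), because they are read when the JVM and the Python workers are launched; an already-running SparkSession will not pick them up.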

II. Cause 02

Cause: the Java version in use is too new; switching to Java 1.8 resolved the problem (a setup-and-check sketch follows the version notes below).
P.S. Python/PySpark pairings used:

  • Python 3.7 with PySpark 2.3.1
  • Python 3.9 with PySpark 3.0

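Spark 3.0 supports Java 8/11, and the Py4JJavaError above can surface when the driver picks up a newer JDK. Below is a minimal sketch for pinning the JVM and double-checking the pairings listed above; the JDK path is an assumption, substitute your actual Java 8 installation:

import os
import sys
import pyspark

# Print the versions actually in use, to compare against the pairs above.
print("Python :", sys.version.split()[0])
print("PySpark:", pyspark.__version__)

# Point Spark at a Java 8 installation before the JVM is started.
# The path below is a placeholder -- replace it with your real JDK 8 home.
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_301"

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("java-version-check").getOrCreate()

# _jvm is a PySpark-internal handle; used here only to confirm the Java version.
print("Java   :", spark.sparkContext._jvm.java.lang.System.getProperty("java.version"))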


References:
Error when testing PySpark in IDEA 2020 (IDEA2020中测试PySpark的运行出错)
pyspark error: Py4JJavaError (pysaprk报错:Py4JJavaError)
