Common PySpark errors, problems, and their fixes (continuously updated).

1. Error: Py4JError: An error occurred while calling o46.fit


Environment: CentOS 7, Python 3.7, Spark 2.4.6, Java 1.8.0_211, Scala 2.11.12
Code that triggers the error:

from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import HashingTF, Tokenizer

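# `spark` is assumed to be an existing SparkSession (as in the pyspark shell).
# Training data columns: (id, text, label)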
training = spark.createDataFrame([
    (0, "a b c d e spark", 1.0),
    (1, "b d", 0.0),
    (2, "spark f g h", 1.0),
    (3, "hadoop mapreduce", 0.0)
], ["id", "text", "label"])
training.show()

# Pipeline stages: tokenizer, hashingTF, and lr.
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression(maxIter=10, regParam=0.001)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

model = pipeline.fit(training)

The error output looks roughly like the following (pick out the key parts yourself):

Exception happened during processing of request from ('127.0.0.1', 48756)
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/root/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:44278)
py4j.protocol.Py4JError: An error occurred while calling o46.fit
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:
py4j.protocol.Py4JError: An error occurred while calling o46.fit

During handling of the above exception, another exception occurred:

Py4JError: An error occurred while calling o46.fit

Analysis:
When the code runs, this is the line that raises the error:

model = pipeline.fit(training)

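The error occurred on an Alibaba Cloud ECS instance (CentOS, student tier). I searched Baidu and tried many of the causes and fixes suggested online, but none of them worked. I then ran the same code on a local virtual machine and it succeeded. The local VM and the Alibaba Cloud student instance have identical software environments; only the hardware specs differ. So the most likely cause is that the cloud instance's specs are too low.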
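If the cause really is a lack of resources, one plausible reading of "Answer from Java side is empty" is that the JVM behind the Py4J gateway exited, for example because the machine ran out of memory. That is an assumption, not something the log confirms. As a rough check, the sketch below reads how much memory is available on a Linux host before calling fit. The helper names parse_meminfo and available_mib are made up for this example, and the sample text is hypothetical; on a real machine you would pass in the contents of /proc/meminfo.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Keep only well-formed "Key:   12345 kB" lines.
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def available_mib(text):
    """Return MemAvailable in MiB, or None if the field is missing."""
    kb = parse_meminfo(text).get("MemAvailable")
    return None if kb is None else kb // 1024


# Hypothetical sample; on a real host use open("/proc/meminfo").read() instead.
sample = (
    "MemTotal:        1882064 kB\n"
    "MemFree:          101232 kB\n"
    "MemAvailable:     204800 kB\n"
)
print(available_mib(sample))  # 204800 kB // 1024 = 200 MiB
```

If the number reported on the cloud instance is far below what the local VM has, that would support the low-spec explanation, although it does not prove it.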

2. Error: NameError: name 'long' is not defined


Environment: CentOS 7, Python 3.7, Spark 2.4.6, Java 1.8.0_211, Scala 2.11.12

Cause: Python 3.x has no long type, only int. Python 2.x has both long and int.

Fix: change long to int.
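When the offending code has to keep working on both Python 2 and Python 3, an alternative to editing every call site is a small compatibility alias. This is only a sketch of that idea, not necessarily how the failing code should be patched; the variable name value is just for illustration.

```python
import sys

if sys.version_info[0] >= 3:
    # Python 3 has no separate long type; int already has arbitrary precision.
    long = int

value = long(2) ** 64
print(value)  # 18446744073709551616
```

On Python 3 the alias simply makes long another name for int, so existing calls such as long(x) keep working unchanged.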
