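Spark error java.io.IOException: Cannot run program "python": CreateProcess error=2, The system cannot find the file specified

After setting up a single-machine Spark installation on Windows 10, running code that uses the ml package fails with an error: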

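from pyspark.ml.classification import GBTClassifier

# Gradient-boosted tree classifier; the values below are the ones used in this run
GBT = GBTClassifier(featuresCol="features",
                    labelCol="label", predictionCol="prediction",
                    maxDepth=3, maxBins=32, minInstancesPerNode=1,
                    minInfoGain=0.0, maxMemoryInMB=256, cacheNodeIds=False,
                    checkpointInterval=10, lossType="logistic", maxIter=10,
                    stepSize=0.1, seed=None)
# spark_train is the training DataFrame prepared earlier; the error is raised by this call
model = GBT.fit(spark_train)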

Error message:

py4j.protocol.Py4JJavaError: An error occurred while calling o250.fit.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): java.io.IOException: Cannot run program "python": CreateProcess error=2, The system cannot find the file specified.

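After some investigation, the cause turned out to be that Spark cannot find the Python interpreter at runtime: the worker tries to launch a program named "python", and Windows cannot locate it.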
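
To check this diagnosis before changing anything, a quick lookup along the following lines may help. It is only a sketch: the helper name find_worker_python is made up for illustration, and it assumes that PySpark falls back to the bare command name "python" when PYSPARK_PYTHON is unset, which matches the error message above but may differ in newer Spark versions.

```python
import os
import shutil


def find_worker_python(env=None):
    """Return the path of the interpreter Spark would likely launch, or None.

    Hypothetical helper for illustration, not part of PySpark.
    """
    env = os.environ if env is None else env
    # Assumption: with PYSPARK_PYTHON unset, the command name is just "python"
    candidate = env.get("PYSPARK_PYTHON", "python")
    # shutil.which searches PATH the same way the OS would resolve the command
    return shutil.which(candidate, path=env.get("PATH"))


if __name__ == "__main__":
    resolved = find_worker_python()
    if resolved is None:
        print("No interpreter found; Spark would probably fail with CreateProcess error=2")
    else:
        print("Spark workers would probably use:", resolved)
```

If this prints that no interpreter was found, the environment seen by the process (for example the PyCharm run configuration) has neither PYSPARK_PYTHON set nor a python executable on PATH.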
Solution:
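In PyCharm, click Edit Configurations in the upper-right corner,
then add the following entries under Environment variables. Adjust the paths to match your own Python and Spark installations: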

PYSPARK_PYTHON = D:\Anaconda3\python.exe
SPARK_HOME = D:\spark-1.6.3-bin-hadoop2.6
PYTHONUNBUFFERED = 1
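
If editing the run configuration is not convenient, the same variables can probably be set from the script itself, as long as this happens before the SparkContext is created. The sketch below assumes exactly that; the function name configure_pyspark_env is hypothetical, and defaulting to sys.executable (the interpreter running the script) is my own choice rather than something from the original setup.

```python
import os
import sys


def configure_pyspark_env(python_exe=None, spark_home=None):
    """Set the Spark-related variables for the current process.

    Hypothetical helper; call it before creating any SparkContext.
    """
    # Assumption: fall back to the interpreter that is running this script
    os.environ["PYSPARK_PYTHON"] = python_exe or sys.executable
    if spark_home is not None:
        os.environ["SPARK_HOME"] = spark_home
    return os.environ["PYSPARK_PYTHON"]


# With the paths from this post, the call would look like:
# configure_pyspark_env(r"D:\Anaconda3\python.exe", r"D:\spark-1.6.3-bin-hadoop2.6")
chosen = configure_pyspark_env()
print("PYSPARK_PYTHON =", chosen)
```

Create the SparkContext only after this call; variables set later are unlikely to reach workers that have already been started.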
