A Spark example written in Python: the error it raised and how to fix it

For setting up the Spark runtime environment, see: https://blog.csdn.net/max_cola/article/details/78902597

The corresponding environment variables:

#java
export JAVA_HOME=/usr/local/jdk1.8.0_181  
export PATH=$JAVA_HOME/bin:$PATH
#python
export PYTHON_HOME=/usr/local/python3
export PATH=$PYTHON_HOME/bin:$PATH
#spark
export SPARK_HOME=/usr/local/spark
export PATH=$SPARK_HOME/bin:$PATH
#add spark to python
export PYTHONPATH=/usr/local/spark/python
#add pyspark to jupyter
# Two Python versions are installed on this machine, so PYSPARK_PYTHON must be set explicitly; otherwise pyspark fails when it runs a program.
export PYSPARK_PYTHON=/usr/local/python3/bin/python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook --allow-root'
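
Before launching pyspark, it can help to confirm that these variables are actually visible to the process. The following is only a minimal sketch: the helper name `missing_vars` is hypothetical, and the list of names simply mirrors the exports above.

```python
import os

# Variable names taken from the exports above.
REQUIRED_VARS = ["JAVA_HOME", "SPARK_HOME", "PYTHONPATH", "PYSPARK_PYTHON"]


def missing_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All expected variables are set.")
```

Run it from the same shell that will start pyspark, after sourcing the profile file that contains the exports.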

A Spark example written in Python:

# -*- coding: utf-8 -*-
from __future__ import print_function
from pyspark import *
if __name__ == '__main__':
    # Run locally with 4 worker threads
    sc = SparkContext("local[4]")
    sc.setLogLevel("WARN")
    # An RDD holding one line of text
    rdd = sc.parallelize(["hello Pyspark world"])
    # Split each line into words, pair each word with 1, sum per word, print
    rdd \
       .flatMap(lambda line: line.split(" ")) \
       .map(lambda word: (word, 1)) \
       .reduceByKey(lambda a, b: a + b) \
       .foreach(print)
    sc.stop()
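
For reference, a word count over the same input string can be reproduced in plain Python without a Spark installation, which is handy for checking what a flatMap / map / reduceByKey chain should return. This is a sketch only; `word_count` is a hypothetical helper, not part of pyspark.

```python
from collections import Counter


def word_count(lines):
    """Count words across lines, mirroring flatMap(split) -> map -> reduceByKey."""
    # flatMap step: every line contributes each of its words
    words = (word for line in lines for word in line.split(" "))
    # map to (word, 1) and reduceByKey(add) collapse into one Counter
    return Counter(words)


if __name__ == "__main__":
    for pair in word_count(["hello Pyspark world"]).items():
        print(pair)
```

Each of the three words appears once in the sample input, so every pair carries a count of 1.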

Running it produces the following error:

Traceback (most recent call last):
  File "test1.py", line 3, in <module>
    from pyspark import *
  File "/usr/local/spark/python/pyspark/__init__.py", line 46, in <module>
    from pyspark.context import SparkContext
  File "/usr/local/spark/python/pyspark/context.py", line 29, in <module>
    from py4j.protocol import Py4JError
ImportError: No module named py4j.protocol

Solution:

# go to Python's site-packages directory
cd /usr/local/python3/lib/python3.6/site-packages

# copy the py4j package that ships with Spark
cp /usr/local/spark/python/lib/py4j-0.10.7-src.zip ./

# unzip it
unzip py4j-0.10.7-src.zip
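
An alternative that avoids copying files into site-packages is to put the py4j zip bundled with Spark on `sys.path`, since Python can import directly from zip archives. The sketch below assumes the zip sits under `$SPARK_HOME/python/lib`, as in the `cp` command above; `find_py4j_zip` and `add_py4j_to_path` are hypothetical helper names.

```python
import glob
import os
import sys


def find_py4j_zip(spark_home):
    """Return the path of the py4j source zip under spark_home, or None."""
    pattern = os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")
    matches = sorted(glob.glob(pattern))
    return matches[-1] if matches else None


def add_py4j_to_path(spark_home):
    """Put the py4j zip on sys.path so `import py4j` can resolve it."""
    zip_path = find_py4j_zip(spark_home)
    if zip_path and zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    return zip_path


if __name__ == "__main__":
    # The default matches the SPARK_HOME export earlier in this post.
    print(add_py4j_to_path(os.environ.get("SPARK_HOME", "/usr/local/spark")))
```

Call `add_py4j_to_path` before `from pyspark import *`. The same effect can be had from the shell by appending the zip's path to `PYTHONPATH`.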


 
