Download the required packages, put them in the appropriate directories on Linux, then configure the environment variables. Edit the configuration file as follows:
vim ~/.bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/.local/bin:$HOME/bin
export PATH
#java setting
export JAVA_HOME=/home/handoop/app/jdk1.8.0_91
export PATH=$JAVA_HOME/bin:$PATH
#scala setting
export SCALA_HOME=/home/handoop/app/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH
#hadoop setting
export HADOOP_HOME=/home/handoop/app/hadoop-2.6.0-cdh5.7.0
export PATH=$HADOOP_HOME/bin:$PATH
#maven setting
export MAVEN_HOME=/home/handoop/app/apache-maven-3.3.9
export PATH=$MAVEN_HOME/bin:$PATH
#spark setting
export SPARK_HOME=/home/handoop/app/spark-2.3.0-bin-2.6.0-cdh5.7.0
export PATH=$SPARK_HOME/bin:$PATH
#python interpreter used to launch pyspark
export PYSPARK_PYTHON=python3
#make pyspark importable when running .py scripts
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.6-src.zip
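After editing, apply the file with `source ~/.bash_profile`. The PYTHONPATH export is what makes `import pyspark` work in plain Python scripts; a minimal sketch of the equivalent in-process setup (the default path below is this guide's layout, adjust to your install):

```python
import os
import sys

# Equivalent of the PYTHONPATH export above: put Spark's Python bindings
# and the bundled py4j zip on the module search path. The fallback path
# is the layout used in this guide; adjust it to your install.
spark_home = os.environ.get(
    "SPARK_HOME", "/home/handoop/app/spark-2.3.0-bin-2.6.0-cdh5.7.0"
)
for entry in (
    os.path.join(spark_home, "python"),
    os.path.join(spark_home, "python", "lib", "py4j-0.10.6-src.zip"),
):
    if entry not in sys.path:
        sys.path.insert(0, entry)

# After this, `import pyspark` resolves without pip-installing pyspark,
# because Python can import directly from the directory and the zip file.
```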
Using Spark from PyCharm on Windows
Configure the environment variables:
1. JAVA_HOME:
JAVA_HOME: C:\Program Files\Java\jdk1.8.0_321 (Java must be 1.8 or later; I ran into trouble using 1.5)
System variables:
PATH:
C:\Program Files\Java\jdk1.8.0_321\bin
E:\WooPython\Python_Spark\hadoop-2.6.0\bin
HADOOP_HOME:E:\WooPython\Python_Spark\hadoop-2.6.0
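Before wiring up PyCharm, it is worth checking that the variables above actually point at real directories. A small helper sketch (`check_env` is a name of my own, not part of any tool):

```python
import os

def check_env(var, expect_subdir="bin"):
    """Return True if environment variable `var` is set and its value is a
    directory containing `expect_subdir` (e.g. JAVA_HOME should have bin\\)."""
    root = os.environ.get(var)
    return bool(root) and os.path.isdir(os.path.join(root, expect_subdir))

# Usage against the variables set above (values from this guide):
#   check_env("JAVA_HOME")    # C:\Program Files\Java\jdk1.8.0_321
#   check_env("HADOOP_HOME")  # E:\WooPython\Python_Spark\hadoop-2.6.0
```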
In PyCharm's Edit Configurations dialog, set Environment variables:
PYTHONUNBUFFERED=1;SPARK_HOME=E:\WooPython\Python_Spark\spark-2.3.0-bin-2.6.0-cdh5.7.0;PYTHONPATH=E:\WooPython\Python_Spark\spark-2.3.0-bin-2.6.0-cdh5.7.0\python
Then via File / Settings / Project Structure / Add Content Root, add:
E:\WooPython\Python_Spark\spark-2.3.0-bin-2.6.0-cdh5.7.0\python\lib\py4j-0.10.6-src.zip
E:\WooPython\Python_Spark\spark-2.3.0-bin-2.6.0-cdh5.7.0\python\lib\pyspark.zip
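Once the content roots are added, a quick way to confirm the run configuration is a script that sets the same variables programmatically before touching Spark, so it also works when launched outside PyCharm (the Windows paths are the ones from this guide; swap in your own):

```python
import os
import sys

# Mirror the PyCharm run-configuration variables in code. The paths are
# this guide's layout; adjust them to your machine.
os.environ.setdefault(
    "SPARK_HOME", r"E:\WooPython\Python_Spark\spark-2.3.0-bin-2.6.0-cdh5.7.0"
)
os.environ.setdefault("PYSPARK_PYTHON", sys.executable)
sys.path.insert(0, os.path.join(os.environ["SPARK_HOME"], "python"))

# With the environment in place, a minimal Spark smoke test would look like:
#   from pyspark import SparkContext
#   sc = SparkContext("local[2]", "smoke-test")
#   print(sc.parallelize(range(10)).sum())  # expect 45
#   sc.stop()
```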