解决Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/ FSDataInputStream

我们将 Hadoop 的 classhpath 信息添加到 CLASSPATH 变量中,在 ~/.bashrc 中增加如下几行:

export HADOOP_HOME=/home/hadoop/app/hadoop

export JAVA_HOME=/home/hadoop/app/java/jdk

exportSCALA_HOME=/home/hadoop/app/scala
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH

spark-env.sh中添加环境变量:

export JAVA_HOME=/home/hadoop/app/jdk
export SCALA_HOME=/home/hadoop/app/scala
export SPARK_MASTER_IP=zhangge
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/home/hadoop/app/hadoop/etc/hadoop
export SPARK_DIST_CLASSPATH=$(hadoop classpath)//非常重要的变量。可参考下面的链接解释

https://spark.apache.org/docs/latest/hadoop-provided.html#using-sparks-hadoop-free-build

Using Spark's "Hadoop Free" Build
Spark uses Hadoop client libraries for HDFS and YARN. Starting in version Spark 1.4, the project packages “Hadoop free” builds that lets you more easily connect a single Spark binary to any Hadoop version. To use these builds, you need to modify SPARK_DIST_CLASSPATH to include Hadoop’s package jars. The most convenient place to do this is by adding an entry in conf/spark-env.sh.

This page describes how to connect Spark to Hadoop for different types of distributions.

Apache Hadoop
For Apache distributions, you can use Hadoop’s ‘classpath’ command. For instance:

Apache Hadoop

For Apache distributions, you can use Hadoop’s ‘classpath’ command. For instance:

### in conf/spark-env.sh ###

# If 'hadoop' binary is on your PATH
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# With explicit path to 'hadoop' binary
export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)

# Passing a Hadoop configuration directory
export SPARK_DIST_CLASSPATH=$(hadoop --config /path/to/configs classpath)

你可能感兴趣的:(spark,hadoop)