Installing Spark 1.0 on Hadoop 2.2.0

 
 

Prerequisites

Server OS: Ubuntu 12.04

JDK

Download

tar -zxvf ... -C /usr/local
ln -s /usr/local/jdk1.7.0_60 /usr/local/jvm

Set environment variables

export JAVA_HOME=/usr/local/jvm
export JRE_HOME=${JAVA_HOME}/jre 
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib 
export PATH=$JAVA_HOME/bin:$PATH
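The guide lists these exports without saying which file they go in. Later sections put them in /etc/profile, so this minimal sketch assumes that file. `append_once` is a hypothetical helper that adds a line only if an identical line is not already present, so re-running the setup does not stack duplicate exports. The JAVA_HOME value follows the /usr/local/jvm symlink created above.

```shell
# append_once FILE LINE (hypothetical helper): add LINE to FILE unless an
# identical line is already there.
append_once() (
  file=$1
  line=$2
  touch "$file"
  # -x whole-line match, -F literal string (no regex), -q no output
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
)

# Demo on a scratch file; on a real server the target would be /etc/profile (as root).
profile=$(mktemp)
append_once "$profile" 'export JAVA_HOME=/usr/local/jvm'
append_once "$profile" 'export JRE_HOME=${JAVA_HOME}/jre'
append_once "$profile" 'export PATH=$JAVA_HOME/bin:$PATH'
append_once "$profile" 'export JAVA_HOME=/usr/local/jvm'   # already present: skipped
cat "$profile"
```

After editing, `source /etc/profile` loads the variables into the current shell.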

Scala

Download

wget http://www.scala-lang.org/files/archive/scala-2.10.3.tgz
tar -zxf scala-2.10.3.tgz -C /usr/local/
ln -s /usr/local/scala-2.10.3 /usr/local/scala

Set environment variables

export SCALA_HOME=/usr/local/scala
export PATH=$SCALA_HOME/bin:$PATH

Maven

Download

wget http://mirror.bit.edu.cn/apache/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz
tar xvzf apache-maven-3.1.1-bin.tar.gz -C /usr/local
ln -s /usr/local/apache-maven-3.1.1 /usr/local/maven

Set environment variables

export MAVEN_HOME=/usr/local/maven
export PATH=$MAVEN_HOME/bin:$PATH

Protobuf installation (master node)

Install dependencies

apt-get install g++ autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev make

Download and install

wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
tar xvzf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure --prefix=/usr/local/protobuf
make && make install

Set environment variables

export PATH=/usr/local/protobuf/bin:$PATH
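Hadoop 2.2.0's build requires protoc from Protocol Buffers 2.5.0 specifically. If a different protoc comes earlier on PATH (for example a distribution package), the mismatch only shows up partway through the Maven build. Below is a small pre-check, sketched under the assumption that `protoc --version` prints `libprotoc 2.5.0` for this release. `protoc_version_ok` is a hypothetical helper name.

```shell
# protoc_version_ok VERSION_STRING (hypothetical helper): succeed only for 2.5.0.
protoc_version_ok() (
  [ "$1" = "libprotoc 2.5.0" ]
)

# Ask whichever protoc is first on PATH; fall back to a marker if none is found.
found=$(protoc --version 2>/dev/null || echo "protoc not found")
if protoc_version_ok "$found"; then
  echo "ok: $found"
else
  echo "need libprotoc 2.5.0, found: $found" >&2
fi
```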

Hadoop 2.2.0 installation

References

http://blog.changecong.com/2013/10/ubuntu-%E7%BC%96%E8%AF%91%E5%AE%89%E8%A3%85-hadoop-2-2-0/
http://blog.csdn.net/licongcong_0224/article/details/12972889

Create the hadoop user and set up passwordless SSH login

namenode

useradd -s /bin/bash -m hadoop
mkdir -p /home/hadoop/.ssh
chown hadoop.hadoop /home/hadoop/.ssh
su - hadoop
ssh-keygen -t rsa
# If the master node also serves as a datanode, also run:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

datanode

useradd -s /bin/bash -m hadoop
mkdir -p /home/hadoop/.ssh
chown hadoop.hadoop /home/hadoop/.ssh

echo "<contents of id_rsa.pub from the master node>" > /home/hadoop/.ssh/id_rsa.pub
cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys
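The key-copy steps above can be wrapped in one idempotent helper. This sketch uses a hypothetical `add_authorized_key` helper. It appends the master node's key only once and sets permissions that satisfy sshd's StrictModes check: 700 on `.ssh` and 600 on `authorized_keys`. When it runs as root on a datanode, follow it with `chown -R hadoop.hadoop /home/hadoop/.ssh` so the hadoop user owns the files.

```shell
# add_authorized_key PUBKEY_FILE SSH_DIR (hypothetical helper)
add_authorized_key() (
  pub=$1
  dir=$2
  mkdir -p "$dir"
  chmod 700 "$dir"                     # no group/world access to .ssh
  touch "$dir/authorized_keys"
  chmod 600 "$dir/authorized_keys"     # owner read/write only
  key=$(cat "$pub")
  # Append the key only if that exact line is not already present.
  grep -qxF -- "$key" "$dir/authorized_keys" || printf '%s\n' "$key" >> "$dir/authorized_keys"
)

# Demo on scratch paths; on a datanode the arguments would be
# /home/hadoop/.ssh/id_rsa.pub and /home/hadoop/.ssh.
demo_home=$(mktemp -d)
echo "ssh-rsa AAAAexamplekey hadoop@namenode" > "$demo_home/id_rsa.pub"   # placeholder key
add_authorized_key "$demo_home/id_rsa.pub" "$demo_home/.ssh"
add_authorized_key "$demo_home/id_rsa.pub" "$demo_home/.ssh"              # second call: no-op
```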

Disable the firewall on every node

ufw disable

Hadoop installation issues

[ERROR] class file for org.mortbay.component.AbstractLifeCycle not found

Fix: edit hadoop-common-project/hadoop-auth/pom.xml and add the dependency:

<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty-util</artifactId>
  <scope>test</scope>
</dependency>

Check /etc/hosts: 192.168.137.100 must map to namenode, i.e. the file needs the line
192.168.137.100 namenode
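Every node has to agree on the namenode address, so it helps to check what a hosts file actually maps the name to. Below is a minimal awk sketch using a hypothetical `hosts_ip` helper. Only the 192.168.137.100/namenode pair comes from this guide.

```shell
# hosts_ip NAME [FILE] (hypothetical helper): print the address a hosts file
# gives NAME (first match wins), skipping comment lines. FILE defaults to /etc/hosts.
hosts_ip() (
  name=$1
  file=${2:-/etc/hosts}
  awk -v n="$name" '
    /^[ \t]*#/ { next }                 # skip comment lines
    { for (i = 2; i <= NF; i++)         # fields 2..NF are hostnames/aliases
        if ($i == n) { print $1; exit } }
  ' "$file"
)

# Demo on a scratch file; on a node, call: hosts_ip namenode
hosts=$(mktemp)
printf '%s\n' '127.0.0.1 localhost' '# cluster nodes' '192.168.137.100 namenode' > "$hosts"
hosts_ip namenode "$hosts"    # prints 192.168.137.100
```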

Spark 1.0 installation

Download the Spark 1.0 source

Download page: http://spark.apache.org/downloads.html
wget http://d3kbcqa49mib13.cloudfront.net/spark-1.0.0.tgz

Configuration

Extract and link into place

sudo tar zxf spark-1.0.0.tgz
sudo cp -r spark-1.0.0 /usr/local/
cd /usr/local
sudo ln -s spark-1.0.0/ spark

Set environment variables

Add the following to /etc/profile (editing it requires sudo):
export SPARK_HOME=/usr/local/spark
export SCALA_HOME=/usr/local/scala
export PATH=$SCALA_HOME/bin:$PATH
then reload it:
source /etc/profile

Dependencies

apt-get install unzip

Build Spark

cd $SPARK_HOME
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
# Build with Maven:
#mvn -Dyarn.version=2.2.0 -Dhadoop.version=2.2.0 -Pnew-yarn -DskipTests package
mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
# Or build with sbt instead:
#sudo ./sbt/sbt assembly
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true ./sbt/sbt assembly

Copy the built files to each data node

Files to copy:
1. ./bin/
2. ./sbin/
3. ./assembly/...
4. ./conf/

scp -r spark/sbin hadoop@<datanode>:/home/hadoop/spark
scp -r spark/bin hadoop@<datanode>:/home/hadoop/spark
scp -r spark/conf hadoop@<datanode>:/home/hadoop/spark
scp -r spark/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop2.2.0.jar hadoop@<datanode>:/home/hadoop/spark/assembly/target/scala-2.10
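The four copy steps have to be repeated for every data node, so they can be driven from conf/slaves. This sketch makes a few assumptions. The remote directory is /home/hadoop/spark, as in the scp commands above, and the remote user is assumed to be hadoop. The command runs from the directory that contains `spark/`. `SPARK_SRC`, `DEST`, `REMOTE_USER` and `RUN` are hypothetical override variables, and setting `RUN=echo` prints the commands instead of running them.

```shell
# distribute_spark SLAVES_FILE (hypothetical helper)
distribute_spark() (
  slaves_file=$1
  src=${SPARK_SRC:-spark}
  dest=${DEST:-/home/hadoop/spark}
  user=${REMOTE_USER:-hadoop}
  jar_dir=assembly/target/scala-2.10
  jar=spark-assembly-1.0.0-hadoop2.2.0.jar
  # Skip blank and comment lines. Hostnames contain no spaces, so a for loop over
  # the list is safe, and unlike while-read it keeps ssh from eating the list on stdin.
  for host in $(grep -Ev '^[[:space:]]*(#|$)' "$slaves_file"); do
    # scp does not create missing parent directories, so create them first.
    $RUN ssh "$user@$host" mkdir -p "$dest/$jar_dir"
    for d in bin sbin conf; do
      $RUN scp -r "$src/$d" "$user@$host:$dest"
    done
    $RUN scp "$src/$jar_dir/$jar" "$user@$host:$dest/$jar_dir"
  done
)

# Dry run against a scratch slaves file listing the guide's datanode1.
slaves=$(mktemp)
printf '%s\n' '# Spark workers' 'datanode1' > "$slaves"
RUN=echo distribute_spark "$slaves"
```

With passwordless SSH in place, running it without `RUN=echo` executes the same commands for real.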

Verify the build

After a successful build, two files are generated under the spark directory:
1. examples/target/scala-2.10/spark-examples-1.0.0-hadoop2.2.0.jar
2. assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop2.2.0.jar
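Before copying anything to the data nodes, a quick check can confirm that the build produced the two jars listed above. `check_build` is a hypothetical helper. It takes the Spark directory as its argument and falls back to $SPARK_HOME.

```shell
# check_build [SPARK_DIR] (hypothetical helper): report each expected jar,
# exiting non-zero if any is missing.
check_build() (
  home=${1:-$SPARK_HOME}
  status=0
  for jar in \
      examples/target/scala-2.10/spark-examples-1.0.0-hadoop2.2.0.jar \
      assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop2.2.0.jar; do
    if [ -f "$home/$jar" ]; then
      echo "found   $jar"
    else
      echo "missing $jar"
      status=1
    fi
  done
  exit $status
)

# Demo against an empty stand-in tree: both jars are reported missing.
fake_home=$(mktemp -d)
check_build "$fake_home" || echo "build output incomplete"
```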

Configure the runtime scripts

1. sudo cp conf/spark-env.sh.template conf/spark-env.sh
   and add the following:
   export JAVA_HOME=/usr/local/jvm
   export SCALA_HOME=/usr/local/scala
   export HADOOP_HOME=/home/hadoop/hadoop
2. sudo cp conf/log4j.properties.template conf/log4j.properties
3. cd $SPARK_HOME
   vi conf/slaves
   and add the data node: datanode1

Set the Spark path

Add the following to /etc/profile (vi /etc/profile):
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
then reload it:
source /etc/profile

Running tests

Local mode

Standard cluster mode

With HDFS

./bin/spark-shell
val file = sc.textFile("hdfs://namenode:9000/user/spark/hdfs.cmd")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
count.collect()
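To sanity-check the spark-shell result on a small file, you can compute the same word count locally. This is a rough cross-check, not what Spark does internally. awk splits on runs of whitespace, while `line.split(" ")` turns leading or repeated spaces into empty-string words, so counts can differ on such input. `wordcount` is a hypothetical helper name, and the output imitates the `(word,count)` tuples that `count.collect()` shows, sorted for comparison.

```shell
# wordcount [FILE...] (hypothetical helper): count whitespace-separated words
# from the files or stdin, print (word,count) pairs, sorted.
wordcount() {
  awk '{ for (i = 1; i <= NF; i++) c[$i]++ }
       END { for (w in c) printf "(%s,%d)\n", w, c[w] }' "$@" | sort
}

printf 'a b\nb c\n' | wordcount
```

Against the file from the example: `hdfs dfs -cat hdfs://namenode:9000/user/spark/hdfs.cmd | wordcount`.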

YARN mode

In the Spark directory:
mkdir test && cd test
vi run_spark_shell.sh
and add:
SPARK_JAR=../assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop2.2.0.jar \
spark-class org.apache.spark.deploy.yarn.Client \
  --jar ../examples/target/scala-2.10/spark-examples-1.0.0-hadoop2.2.0.jar \
  --class org.apache.spark.examples.SparkPi \
  --num-workers 3 \
  --master-memory 1g \
  --worker-memory 1g \
  --worker-cores 1
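Spark's YARN documentation asks for HADOOP_CONF_DIR or YARN_CONF_DIR to point at the Hadoop client configuration. The guard below could go at the top of run_spark_shell.sh, before the spark-class call. `require_yarn_conf` is hypothetical, and it additionally assumes that yarn-site.xml lives in that directory.

```shell
# require_yarn_conf (hypothetical helper): print the first of HADOOP_CONF_DIR /
# YARN_CONF_DIR that contains yarn-site.xml, or fail with a message.
require_yarn_conf() (
  for dir in "$HADOOP_CONF_DIR" "$YARN_CONF_DIR"; do
    if [ -n "$dir" ] && [ -f "$dir/yarn-site.xml" ]; then
      echo "$dir"
      exit 0
    fi
  done
  echo "set HADOOP_CONF_DIR or YARN_CONF_DIR to the directory holding yarn-site.xml" >&2
  exit 1
)

# In run_spark_shell.sh:  require_yarn_conf >/dev/null || exit 1
# Demo with a scratch config directory:
demo_conf=$(mktemp -d)
touch "$demo_conf/yarn-site.xml"
HADOOP_CONF_DIR=$demo_conf require_yarn_conf
```

In Hadoop 2.2.0 the configuration directory is etc/hadoop under the install. With the HADOOP_HOME used in spark-env.sh, that would be `export HADOOP_CONF_DIR=/home/hadoop/hadoop/etc/hadoop`.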

Monitoring

When ./bin/spark-shell starts, it logs a line like:
14/06/23 14:36:43 INFO SparkUI: Started SparkUI at http://namenode:4040
so the web UI is available at http://192.168.137.100:4040/
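You can also pull the UI address out of a saved startup log instead of reading it by eye. Below is a minimal sed sketch using a hypothetical `spark_ui_url` helper that matches the log line format shown above. namenode resolves to 192.168.137.100 through /etc/hosts, which is why both URLs reach the same page.

```shell
# spark_ui_url (hypothetical helper): read log lines on stdin and print the
# URL from any "Started SparkUI at ..." line.
spark_ui_url() {
  sed -n 's/.*INFO SparkUI: Started SparkUI at \(http[^ ]*\).*/\1/p'
}

# Real use on a saved log (hypothetical file name):  spark_ui_url < spark-shell.log
echo '14/06/23 14:36:43 INFO SparkUI: Started SparkUI at http://namenode:4040' | spark_ui_url
# prints http://namenode:4040
```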

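Shark installation

Adding a DataNode

Passwordless login

Update the hostname and the hadoop/slaves and spark/slaves files

Sync the Hadoop files

Sync the Spark files

Sync the environment variables in /etc/profile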
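You can update the hadoop/slaves and spark/slaves files in one step so the two lists stay in sync. Below is a sketch with a hypothetical `add_slave` helper; `datanode2` is an example name, not a host from this guide. On a real cluster the files would be $HADOOP_HOME/etc/hadoop/slaves (the Hadoop 2.x location) and $SPARK_HOME/conf/slaves.

```shell
# add_slave HOST FILE... (hypothetical helper): append HOST to each slaves file
# that does not already list it.
add_slave() (
  host=$1
  shift
  for f in "$@"; do
    touch "$f"
    grep -qxF -- "$host" "$f" && continue
    # If the file does not end with a newline, add one before appending.
    if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
      echo >> "$f"
    fi
    printf '%s\n' "$host" >> "$f"
  done
)

# Demo on scratch copies of the two slaves files.
hadoop_slaves=$(mktemp)
spark_slaves=$(mktemp)
printf 'datanode1' > "$hadoop_slaves"       # no trailing newline on purpose
printf 'datanode1\n' > "$spark_slaves"
add_slave datanode2 "$hadoop_slaves" "$spark_slaves"
add_slave datanode2 "$hadoop_slaves" "$spark_slaves"   # already listed: no change
cat "$hadoop_slaves"
```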

Installation issues

The spark directory must be owned by the hadoop user

chown -R hadoop.hadoop /usr/local/spark-1.0.0
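The build writes into many subdirectories such as target/. To confirm that the ownership change reached every file, list anything under the Spark tree that hadoop does not own. This is a minimal sketch; `not_owned_by` is a hypothetical name, and `find -H` follows the /usr/local/spark symlink given on the command line.

```shell
# not_owned_by USER DIR (hypothetical helper): print every path under DIR not
# owned by USER (a name or numeric uid). Empty output means ownership is correct.
not_owned_by() (
  user=$1
  dir=$2
  find -H "$dir" ! -user "$user"
)

# Real use:  not_owned_by hadoop /usr/local/spark
# Demo on a scratch directory owned by the current user (prints nothing):
demo_dir=$(mktemp -d)
touch "$demo_dir/file"
not_owned_by "$(id -u)" "$demo_dir"
```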

SCALA_HOME not set

When building Spark as the spark user, this error appeared even though SCALA_HOME was set; after switching to the root account, the problem did not occur.

JAVA_HOME is not set

Add export JAVA_HOME=/usr/local/jvm to conf/spark-env.sh.
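Both errors come from variables that the scripts cannot see. The hypothetical `check_env` helper below reports which variables are missing from the current shell's environment. It uses printenv, so only exported variables count, which matches what child processes like mvn, sbt and the Spark scripts receive. Run it as the same user and in the same way the build is started, because sudo resets most environment variables by default (its env_reset option).

```shell
# check_env VAR... (hypothetical helper): list unset or empty exported variables.
check_env() (
  missing=0
  for v in "$@"; do
    if [ -z "$(printenv "$v")" ]; then
      echo "$v is not set"
      missing=1
    fi
  done
  exit $missing
)

check_env JAVA_HOME SCALA_HOME SPARK_HOME || echo "add the missing exports to /etc/profile or conf/spark-env.sh"
```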

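wrap: java.lang.reflect.InvocationTargetException: PermGen space

This error means the JVM ran out of PermGen space during the build because the space allocated to it was too small. With the officially recommended export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" the error still occurred; raising it to MaxPermSize=1g fixed it.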
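Here is the MAVEN_OPTS setting that resolved the error, written out. `-XX:MaxPermSize` is honoured only by JDK 7 and earlier, such as the jdk1.7.0_60 installed here. JDK 8 removed the permanent generation and ignores the flag with a warning. The Maven build command used earlier is left commented, so the fragment can be sourced on its own.

```shell
# PermGen raised from the recommended 512M to 1g; the other two values are unchanged.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1g -XX:ReservedCodeCacheSize=512m"
# cd "$SPARK_HOME" && mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
```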

WARN NativeCodeLoader: Unable to load native-hadoop library for your platform

WARN TaskSchedulerImpl: Initial job has not accepted any resources

This is caused by insufficient worker memory. Adjust the number of workers on each datanode and the memory assigned to each worker.
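In standalone mode this warning typically means that no worker advertises enough free memory (or cores) for a single executor of the job. The worker limits are set in conf/spark-env.sh on each datanode with SPARK_WORKER_INSTANCES, SPARK_WORKER_MEMORY and SPARK_WORKER_CORES, which come from Spark's spark-env.sh.template. The values below are placeholders. On the job side the relevant setting is spark.executor.memory, which defaults to 512m in Spark 1.0. `to_mb` and `fits` are hypothetical helpers that check whether one executor fits inside one worker.

```shell
# conf/spark-env.sh (placeholder values; keep INSTANCES * MEMORY below the node's RAM)
export SPARK_WORKER_INSTANCES=1   # worker processes per node
export SPARK_WORKER_MEMORY=2g     # memory each worker can give to executors
export SPARK_WORKER_CORES=2       # cores each worker can give to executors

# to_mb SIZE (hypothetical): integer JVM-style size with m/g suffix -> megabytes.
to_mb() (
  n=${1%[mMgG]}
  case $1 in
    *[gG]) echo $((n * 1024)) ;;
    *[mM]) echo "$n" ;;
    *) echo "unsupported size: $1" >&2; exit 1 ;;
  esac
)

# fits EXECUTOR_MEM WORKER_MEM (hypothetical): true if one executor fits in one worker.
fits() ( [ "$(to_mb "$1")" -le "$(to_mb "$2")" ] )

fits 512m "$SPARK_WORKER_MEMORY" && echo "512m executors fit in $SPARK_WORKER_MEMORY workers"
```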

Appendix

1. Spark on YARN environment setup: http://sofar.blog.51cto.com/353572/1352713
