spark study note 1: Environment Setup

grep -A 5 'UPDATE ddt_frequency_car' pub.log    # show the 5 lines after each match

ip addr    # show IP addresses
service network restart or /etc/init.d/network restart    # restart the network service

Passwordless SSH login
Generate a key pair with the command "ssh-keygen -t rsa".
This creates a ".ssh" directory in the user's home directory.
Copy the public key to the target host with ssh-copy-id.
Command: ssh-copy-id -i ~/.ssh/id_rsa.pub <target host>
Example:
[root@test .ssh]# ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.91.135
[email protected]'s password:
Now try logging into the machine, with "ssh '192.168.91.135'", and check in:
.ssh/authorized_keys
to make sure we haven't added extra keys that you weren't expecting.
[root@test .ssh]# ssh [email protected]
Last login: Mon Oct 10 01:25:49 2016 from 192.168.91.133
[root@localhost ~]#
    Common error:
      [root@test ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.91.135
      -bash: ssh-copy-id: command not found    // the command is not installed
      Fix: yum -y install openssh-clients
cat id_rsa.pub >> authorized_keys    # append the public key to authorized_keys; authorized_keys holds the public keys that are allowed to log in to this host
If A sends its public key to B, that does not mean B can access A; it means A can now log in to B without a password.
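If ssh-copy-id is not available, the same thing can be done by hand with scp and cat; a minimal sketch, assuming the remote user is root and the target host is Slave1:
scp ~/.ssh/id_rsa.pub root@Slave1:/tmp/id_rsa.pub.master
ssh root@Slave1 "mkdir -p ~/.ssh && chmod 700 ~/.ssh \
  && cat /tmp/id_rsa.pub.master >> ~/.ssh/authorized_keys \
  && chmod 600 ~/.ssh/authorized_keys"
ssh root@Slave1     # should now log in without a password prompt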

vi /etc/hosts
192.168.20.75 Master
192.168.20.76 Slave1
192.168.20.77 Slave2
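Each node's own hostname should match its /etc/hosts entry, otherwise the daemons will register under the wrong name. A quick check (hostnamectl is assumed to be available, as on CentOS 7):
hostnamectl set-hostname Master     # run the corresponding command on each node
ping -c 1 Slave1                    # verify name resolution from Master
ping -c 1 Slave2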

Set static IPs
Comment out BOOTPROTO=dhcp with #
IPADDR=192.168.60.101    # static IP
GATEWAY=192.168.20.1     # default gateway
NETMASK=255.255.255.0    # netmask
DNS1=192.168.1.10        # DNS

IPADDR=192.168.60.102    # static IP
GATEWAY=192.168.20.1     # default gateway
NETMASK=255.255.255.0    # netmask
DNS1=192.168.1.10        # DNS

IPADDR=192.168.60.103    # static IP
GATEWAY=192.168.20.1     # default gateway
NETMASK=255.255.255.0    # netmask
DNS1=192.168.1.10        # DNS
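Assembled into one file, a node's interface config might look like the sketch below; the file path and interface name (ifcfg-ens33) are assumptions, so adjust them to the actual NIC:
# /etc/sysconfig/network-scripts/ifcfg-ens33   (path and interface name are assumptions)
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static          # dhcp commented out / replaced
IPADDR=192.168.60.101     # static IP of this node
GATEWAY=192.168.20.1      # default gateway
NETMASK=255.255.255.0     # netmask
DNS1=192.168.1.10         # DNS
# apply the change: service network restart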

Install JDK and Scala, then set environment variables
rpm -ivh jdk-8u144-linux-x64.rpm    # install the JDK
rpm -ivh scala-2.11.8.rpm           # install Scala
vi /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_144
export PATH=$PATH:${JAVA_HOME}/bin
export SCALA_HOME=/usr/share/scala
export PATH=$SCALA_HOME/bin:$PATH
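After reloading the profile, a quick check that both runtimes are on the PATH:
source /etc/profile
java -version      # should report 1.8.0_144
scala -version     # should report 2.11.8
echo $JAVA_HOME    # /usr/java/jdk1.8.0_144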

Move and extract Hadoop, then set environment variables
mv hadoop-2.7.4.tar.gz /opt
cd /opt && tar -zxvf hadoop-2.7.4.tar.gz
Add the following to /etc/profile:
export HADOOP_HOME=/opt/hadoop-2.7.4/
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_ROOT_LOGGER=INFO,console
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
source /etc/profile
Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh and set JAVA_HOME as follows:
export JAVA_HOME=/usr/java/jdk1.8.0_144
Edit $HADOOP_HOME/etc/hadoop/slaves, remove the existing localhost entry, and replace it with:
Slave1
Slave2
Edit $HADOOP_HOME/etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-2.7.4/tmp</value>
    </property>
</configuration>
Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop-2.7.4/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop-2.7.4/hdfs/data</value>
    </property>
</configuration>
Copy the template to create mapred-site.xml:
cp mapred-site.xml.template mapred-site.xml
Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master:19888</value>
    </property>
</configuration>
Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master:8088</value>
    </property>
</configuration>
Copy the Hadoop configuration to Slave1 and Slave2:
scp -r /opt/hadoop-2.7.4/etc/hadoop root@Slave1:/opt/hadoop-2.7.4/etc
scp -r /opt/hadoop-2.7.4/etc/hadoop root@Slave2:/opt/hadoop-2.7.4/etc
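Before formatting, it can help to create the directories referenced in core-site.xml and hdfs-site.xml on every node up front (a sketch; HDFS would create most of them itself, but doing it early surfaces permission problems):
mkdir -p /opt/hadoop-2.7.4/tmp          # hadoop.tmp.dir
mkdir -p /opt/hadoop-2.7.4/hdfs/name    # dfs.namenode.name.dir (used on Master)
mkdir -p /opt/hadoop-2.7.4/hdfs/data    # dfs.datanode.data.dir (used on Slave1/Slave2)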
On the Master node, format the namenode before starting the cluster for the first time:
hadoop namenode -format
Start the cluster:
/opt/hadoop-2.7.4/sbin/start-all.sh
Web UIs:
http://192.168.20.75:8088    # YARN ResourceManager
http://Master:50070          # HDFS NameNode
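After start-all.sh, jps on each node should roughly show the daemons below; note that the JobHistory server configured in mapred-site.xml is not started by start-all.sh and, if its UI is wanted, has to be launched separately (a sketch):
# expected on Master
jps
#   NameNode
#   SecondaryNameNode
#   ResourceManager
# expected on Slave1 / Slave2
jps
#   DataNode
#   NodeManager

# optional: start the JobHistory server on Master (UI on port 19888)
/opt/hadoop-2.7.4/sbin/mr-jobhistory-daemon.sh start historyserver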

Install Spark 2.1.0
mv spark-2.1.0-bin-hadoop2.7.tgz /opt
tar -zxvf spark-2.1.0-bin-hadoop2.7.tgz
Edit /etc/profile and add the following:
export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7/
export PATH=$PATH:$SPARK_HOME/bin
cd /opt/spark-2.1.0-bin-hadoop2.7/conf
Copy spark-env.sh.template to spark-env.sh:
cp spark-env.sh.template spark-env.sh
Edit $SPARK_HOME/conf/spark-env.sh and add the following:
export JAVA_HOME=/usr/java/jdk1.8.0_144
export SCALA_HOME=/usr/share/scala
export HADOOP_HOME=/opt/hadoop-2.7.4
export HADOOP_CONF_DIR=/opt/hadoop-2.7.4/etc/hadoop
export SPARK_MASTER_IP=192.168.20.75
export SPARK_MASTER_HOST=192.168.20.75
export SPARK_LOCAL_IP=192.168.20.75
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=2
export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7
export SPARK_DIST_CLASSPATH=$(/opt/hadoop-2.7.4/bin/hadoop classpath)
Copy slaves.template to slaves:
cp slaves.template slaves
Edit $SPARK_HOME/conf/slaves and add the following:
Master
Slave1
Slave2
Copy the configured Spark directory to the Slave1 and Slave2 nodes:
scp -r /opt/spark-2.1.0-bin-hadoop2.7 root@Slave1:/opt
scp -r /opt/spark-2.1.0-bin-hadoop2.7 root@Slave2:/opt
On Slave1 and Slave2, edit /etc/profile and add the same Spark environment variables.
On Slave1 and Slave2, edit $SPARK_HOME/conf/spark-env.sh and change export SPARK_LOCAL_IP=192.168.20.75 to the IP of that node.
Start the cluster from the Master node:
/opt/spark-2.1.0-bin-hadoop2.7/sbin/start-all.sh
Check whether the cluster started successfully:
jps
On Master, jps shows one new process in addition to the Hadoop daemons:
Master
On each Slave, jps shows one new process in addition to the Hadoop daemons:
Worker
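A simple smoke test of the standalone cluster from Master; the examples jar path follows the usual Spark 2.1.0 binary layout (treat it as an assumption and adjust if the file name differs):
/opt/spark-2.1.0-bin-hadoop2.7/bin/spark-submit \
  --master spark://192.168.20.75:7077 \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar 100
# the driver output should contain a line like "Pi is roughly 3.14..."
# Spark master web UI: http://192.168.20.75:8080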

firewalld:
Start:            systemctl start firewalld
Check status:     systemctl status firewalld
Stop:             systemctl stop firewalld
Disable at boot:  systemctl disable firewalld
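If the firewall is to stay on, the specific ports used above could be opened instead of disabling it entirely (a sketch using firewall-cmd; this is a partial list derived from the configs in this note, and inter-daemon ports such as 8030-8033, 50090, and the DataNode/ZooKeeper ports would need the same treatment):
firewall-cmd --permanent --add-port=9000/tcp     # HDFS fs.defaultFS
firewall-cmd --permanent --add-port=50070/tcp    # HDFS NameNode web UI
firewall-cmd --permanent --add-port=8088/tcp     # YARN ResourceManager web UI
firewall-cmd --permanent --add-port=7077/tcp     # Spark standalone master
firewall-cmd --permanent --add-port=8080/tcp     # Spark master web UI
firewall-cmd --reload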

ZooKeeper installation
Edit conf/zoo.cfg and add the server list:
server.0=Master:2288:3388
server.1=Slave1:2288:3388
server.2=Slave2:2288:3388
On each node, create a myid file under dataDir (touch myid); its content must be that node's server.N id from zoo.cfg.
Add to /etc/profile:
export ZOOKEEPER_HOME=/opt/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin
/opt/zookeeper-3.4.10/bin/zkServer.sh start
/opt/zookeeper-3.4.10/bin/zkServer.sh status
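Pulling the pieces together, a fuller zoo.cfg sketch plus the per-node myid setup (the dataDir path below is an assumption; use whatever dataDir is actually configured):
# /opt/zookeeper-3.4.10/conf/zoo.cfg   (dataDir is an assumed path)
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/opt/zookeeper-3.4.10/data
server.0=Master:2288:3388
server.1=Slave1:2288:3388
server.2=Slave2:2288:3388

# the myid file under dataDir must contain that node's server.N id:
echo 0 > /opt/zookeeper-3.4.10/data/myid   # on Master
echo 1 > /opt/zookeeper-3.4.10/data/myid   # on Slave1
echo 2 > /opt/zookeeper-3.4.10/data/myid   # on Slave2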
