Recommended reading for beginners
First, a few general thoughts. Create a single directory for big data software and put every big data program inside it; once one machine is configured, you only need to copy that whole directory to the other servers.
Don't chase the latest versions of big data products: first, they tend to be unstable; second, there are version constraints between the products.
For example, different products and different versions each have their own JDK requirements.
hadoop: depends on jdk
spark: depends on jdk, hadoop, scala
hive: depends on jdk, hadoop, mysql
hbase: depends on jdk, hadoop
kylin: depends on jdk, hadoop, hive, hbase
If kylin, hive, and hbase do their computation not with Hadoop's MapReduce but with Spark, which can be up to 100x faster, then they naturally depend on the Spark version as well.
These version dependencies are tightly interlocked and I hit quite a few pitfalls; in general, picking product versions from about a year ago is good enough, and the matching tutorials are both plentiful and stable.
At the application level, setting up the software really just amounts to editing each product's configuration files and copying them to every server.
But every product also has its own commands and debugging examples, and those should not be ignored.
Every big data product in use today has particular strengths under particular conditions; there is no need to deploy all of them in practice, so choose based on the project's requirements. The more complex the architecture, the heavier the operations workload, and each product is a book in its own right.
Edit hosts in the /etc directory and add the following:
172.16.2.147 Master
172.16.2.148 Slave1
172.16.2.149 Slave2
Set the hostname on each machine with the hostname command:
hostname Master    # on 172.16.2.147
hostname Slave1    # on 172.16.2.148
hostname Slave2    # on 172.16.2.149
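Note: the hostname command only changes the name until the next reboot. On a systemd-based distribution (an assumption; the post does not say which OS is used), the change can be made permanent with hostnamectl, for example:

hostnamectl set-hostname Master    # run the matching command on Slave1 and Slave2 as well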
Generate a key pair on each of Master, Slave1, and Slave2:
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
On Slave1:
cd /root/.ssh
scp id_rsa.pub root@Master:/root/.ssh/id_rsa.pub.Slave1
On Slave2:
cd /root/.ssh
scp id_rsa.pub root@Master:/root/.ssh/id_rsa.pub.Slave2
On Master:
cd /root/.ssh
cat id_rsa.pub >> authorized_keys
cat id_rsa.pub.Slave1 >> authorized_keys
cat id_rsa.pub.Slave2 >> authorized_keys
scp authorized_keys root@Slave1:/root/.ssh/authorized_keys
scp authorized_keys root@Slave2:/root/.ssh/authorized_keys
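As a quick sanity check (not part of the original steps), passwordless login can be verified from Master; neither command should prompt for a password:

ssh root@Slave1 hostname    # should print Slave1
ssh root@Slave2 hostname    # should print Slave2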
Locate the existing JDK:
http://www.cnblogs.com/kerrycode/archive/2015/08/27/4762921.html
/usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
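The linked post explains how to find where an already-installed OpenJDK lives; a minimal sketch of the usual approach on an RPM-based system (assumed from the /usr/lib/jvm path above):

which java                      # often a symlink into /etc/alternatives
readlink -f "$(which java)"     # resolve to the real JDK path under /usr/lib/jvm
rpm -qa | grep -i openjdk       # list the installed OpenJDK packages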
If a JDK is already present, proceed with the environment configuration.
In the .bashrc below, the java/hadoop/scala/spark/classpath/path settings are all configured in one place.
vim ~/.bashrc
# add:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0
export JRE_HOME=${JAVA_HOME}/jre
export HADOOP_HOME=/usr/local/bigdata/hadoop-2.8.2
export SCALA_HOME=/usr/local/bigdata/scala-2.11.0
export SPARK_HOME=/usr/local/bigdata/spark-2.0.0-bin-hadoop2.7
export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${SPARK_HOME}/bin:${SCALA_HOME}/bin:${HADOOP_HOME}/bin:${JAVA_HOME}/bin:$PATH

scp ~/.bashrc root@Slave1:~/.bashrc
scp ~/.bashrc root@Slave2:~/.bashrc
source ~/.bashrc
java -version
All big data framework software is installed under /usr/local/bigdata, so the whole stack can be copied or moved as a unit.
scp -r /usr/local/bigdata/* root@Slave1:/usr/local/bigdata
scp -r /usr/local/bigdata/* root@Slave2:/usr/local/bigdata
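Because scp -r with multiple sources requires the target directory to exist, it may be worth creating /usr/local/bigdata on the slaves first (an extra step not in the original):

ssh root@Slave1 "mkdir -p /usr/local/bigdata"
ssh root@Slave2 "mkdir -p /usr/local/bigdata"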
1. Master: copy the downloaded "hadoop-2.8.2.tar.gz" into "/usr/local/bigdata/" and extract it
http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.8.2/
cd /usr/local/bigdata
tar -xzf hadoop-2.8.2.tar.gz
rm -rf ./hadoop-2.8.2.tar.gz
2. Create the folders dfs (with name and data subdirectories) and tmp
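The commands for this step are not shown in the original; based on the paths referenced in step 4 (dfs/name, dfs/data, and tmp under the Hadoop directory), something like:

cd /usr/local/bigdata/hadoop-2.8.2
mkdir -p dfs/name dfs/data tmp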
3. Modify the 8 configuration files
Copy the configuration files into /usr/local/bigdata/hadoop-2.8.2/etc/hadoop
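The original does not list the 8 files; for Hadoop 2.x they are typically hadoop-env.sh, yarn-env.sh, mapred-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. Purely as an illustration (the 9000 port and property values are assumptions, not taken from the post), a minimal core-site.xml consistent with the cleanup steps in step 4 could be written like this:

cat > /usr/local/bigdata/hadoop-2.8.2/etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <!-- hadoop.tmp.dir: the directory step 4 says to wipe before re-formatting -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/bigdata/hadoop-2.8.2/tmp</value>
  </property>
  <!-- fs.defaultFS: the NameNode address; hdfs://Master:9000 is an assumed value -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
  </property>
</configuration>
EOF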
Copy the whole Master directory to Slave1 and Slave2
4. Start up on Master
Format the NameNode:
cd /usr/local/bigdata/hadoop-2.8.2/bin
hadoop namenode -format

If jps shows that the NameNode did not start: before every re-format you must first empty the tmp directory configured in core-site.xml:
cd /usr/local/bigdata/hadoop-2.8.2/sbin
./stop-all.sh
cd /usr/local/bigdata/hadoop-2.8.2/tmp
rm -rf *

If the DataNode did not start: before every re-format you must also empty the directory pointed to by dfs.datanode.data.dir in hdfs-site.xml, i.e. run:
cd /usr/local/bigdata/hadoop-2.8.2/dfs/data
rm -rf *

cd /usr/local/bigdata/hadoop-2.8.2/sbin
./start-all.sh
jps    # check the result
Start HDFS, YARN, and the job history server:
./start-dfs.sh
./start-yarn.sh
./mr-jobhistory-daemon.sh start historyserver
A warning like "Unable to load native-hadoop library for your platform... using builtin-java classes where applicable" may appear; see:
http://blog.csdn.net/jack85986370/article/details/51902871
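The warning is harmless on most setups. The linked post discusses the fix; one commonly used workaround (a sketch, and not necessarily what that link recommends) is to point Hadoop at its bundled native library directory in ~/.bashrc:

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"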
hadoop fs -mkdir -p /data/wordcount
hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar wordcount /data/wordcount /output/wordcount
hadoop fs -cat /output/wordcount/part-r-00000 | head
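Note that the wordcount job needs input files under /data/wordcount before the jar command above is run; a sketch, assuming a local text file /root/test.txt (a hypothetical name):

hadoop fs -put /root/test.txt /data/wordcount/

Also, the /output/wordcount directory must not already exist, otherwise the job will refuse to start.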
http://master:50070/ (HDFS web console, for checking the state of the HDFS cluster)
http://master:8088/ (YARN ResourceManager)
http://slave1:8042/ (NodeManager on Slave1)
http://slave2:8042/ (NodeManager on Slave2)
http://slave1:50075/datanode.html (DataNode on Slave1)
http://slave2:50075/datanode.html (DataNode on Slave2)
http://master:19888/jobhistory (MapReduce job history)
http://master:50070/explorer.html#/ (HDFS file browser)
1. Master
Extract and set .bashrc:
cd /usr/local/bigdata
tar -zxf scala-2.11.0.tgz
vim ~/.bashrc      # add SCALA_HOME and PATH (already included in the .bashrc above)
source ~/.bashrc

# test
scala -version
scala
9*9
:quit
2. Copy from Master to Slave1 and Slave2
1) Master: put spark-2.0.0-bin-hadoop2.7.tgz under /usr/local/bigdata, extract it, and set .bashrc (SPARK_HOME and PATH, as above)
2) In /usr/local/bigdata/spark-2.0.0-bin-hadoop2.7/conf, set up slaves and spark-env.sh (a sketch of both follows the copy commands below)
3) Slave1\Slave2: sync the directory
scp -r /usr/local/bigdata/spark-2.0.0-bin-hadoop2.7 root@Slave1:/usr/local/bigdata
scp -r /usr/local/bigdata/spark-2.0.0-bin-hadoop2.7 root@Slave2:/usr/local/bigdata
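The contents of conf/slaves and conf/spark-env.sh are not shown in the original; a minimal sketch that stays consistent with the paths set in ~/.bashrc above (the exact variables needed can vary with the Spark version):

# conf/slaves: one worker hostname per line
Slave1
Slave2

# conf/spark-env.sh: minimal settings for a standalone cluster
export JAVA_HOME=/usr/lib/jvm/java-1.7.0
export SCALA_HOME=/usr/local/bigdata/scala-2.11.0
export HADOOP_HOME=/usr/local/bigdata/hadoop-2.8.2
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_MASTER_HOST=Master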
Master: start Spark
cd /usr/local/bigdata/spark-2.0.0-bin-hadoop2.7/sbin
./start-all.sh
cd /usr/local/bigdata/spark-2.0.0-bin-hadoop2.7/bin
./spark-shell
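To check that the standalone cluster actually accepts jobs (a sanity check that is not in the original), the bundled SparkPi example can be submitted against the master; the examples jar name below is the one shipped with the 2.0.0 binary distribution and may differ on other builds:

cd /usr/local/bigdata/spark-2.0.0-bin-hadoop2.7
./bin/spark-submit --master spark://Master:7077 \
  --class org.apache.spark.examples.SparkPi \
  examples/jars/spark-examples_2.11-2.0.0.jar 10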
http://Master:8080 (Spark master web UI)
http://master:4040/jobs/ (web UI of the running spark-shell application)
Exit the shell with :quit
Problem 1: running a test job fails with a permission denied error
chmod -R 777 /usr/local/bigdata/spark-2.0.0-bin-hadoop2.7
Problem 2: Spark cannot connect to master:7077
17/11/08 23:54:27 WARN worker.Worker:Failed to connect to master 7080:7077
[root@Slave1 name]# nc -z -w 172.16.2.147 7077
nc: timeout cannot be negative
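The nc error above comes from the missing timeout value: -w expects a number of seconds, so the IP address was swallowed as the timeout argument. With the option filled in (5 seconds is an arbitrary choice), the port check would be:

nc -z -w 5 172.16.2.147 7077

If this times out as well, confirm that the Spark master process is listening on 7077 and that no firewall blocks the port between the nodes.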