At the command line, open the network interface configuration file and set a static IP for the system. The parameters that need to be set are:
BOOTPROTO set to static
the IP address, subnet mask, gateway and DNS server
ONBOOT set to yes
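A sketch of the interface configuration file, assuming a CentOS-style system and an interface named ens33 (the file name, interface name and addresses are placeholders; adjust them to your network):
# /etc/sysconfig/network-scripts/ifcfg-ens33
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.31.10
NETMASK=255.255.255.0
GATEWAY=192.168.31.1
DNS1=192.168.31.1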
When editing is finished, reboot the machine or run: service network restart so that the configuration takes effect
1. Configure the hosts file on master
2. Send the hosts file to the other hosts
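A sketch of the resulting /etc/hosts (the IP addresses are placeholders and must match your network), plus the copy step:
192.168.31.10  master
192.168.31.11  slave1
192.168.31.12  slave2
192.168.31.13  slave3
scp /etc/hosts root@slave1:/etc/hosts   (repeat for slave2 and slave3)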
Principle: send the public key generated on master to the other slave hosts.
If the permissions are not checked, passwordless SSH login will fail when the cluster is started under the hadoop user.
Switch to the root user and edit the sudoers file: vi /etc/sudoers
Add: hadoop ALL=(root) NOPASSWD:ALL
Generate the key pair (ssh-keygen -t rsa), then append the public key: cat id_rsa.pub >> authorized_keys
chmod g-w /home/hadoop
chmod 700 /home/hadoop/.ssh
chmod 600 /home/hadoop/.ssh/authorized_keys
ssh-copy-id hadoop@slave1
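Repeat ssh-copy-id for slave2 and slave3. Afterwards a login from master should succeed without a password prompt:
ssh hadoop@slave1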
Configure the JDK environment on the master host
Configure the JDK environment on the slaves
Test whether the JDK is installed correctly on master and all slaves with the command: java -version
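A minimal sketch of the JDK environment variables, assuming the JDK is unpacked to /usr/local/jdk (the same path the Flume section below uses for JAVA_HOME); append to /etc/profile or ~/.bashrc and source it:
export JAVA_HOME=/usr/local/jdk
export PATH=$JAVA_HOME/bin:$PATH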
Master host configuration
The configuration files are: hadoop-env.sh, core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml and slaves
core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/tmp/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/tmp/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
slaves (one worker hostname per line):
slave1
slave2
slave3
After configuring these Hadoop files on master, send the configuration directory to the slaves.
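A sketch of the copy, assuming Hadoop is installed under /usr/local/hadoop (the same path used for SPARK_DIST_CLASSPATH later in this document); repeat for slave2 and slave3:
scp -r /usr/local/hadoop/etc/hadoop hadoop@slave1:/usr/local/hadoop/etc/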
Configure Hive on the master host
Configure the environment variables:
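A minimal sketch, assuming Hive is installed under /usr/local/hive (the same path the JDBC-driver copy below uses); append to /etc/profile or the hadoop user's ~/.bashrc and source it:
export HIVE_HOME=/usr/local/hive
export PATH=$HIVE_HOME/bin:$PATH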
Modify the configuration file hive-site.xml:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://(MySQL server IP):3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
</configuration>
Copy the MySQL JDBC driver into Hive's lib directory:
cp mysql-connector-java-5.1.40-bin.jar /usr/local/hive/lib
mysql> grant all privileges on *.* to root@master identified by 'root';
(Grants all privileges on all databases and tables to the root user; this user and the password 'root' are the connection user name and password set in hive-site.xml above.)
mysql> flush privileges;   (refresh the privileges)
Start Hive
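With $HIVE_HOME/bin on the PATH (see the environment variables above), Hive is started from the shell:
hive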
If startup fails, recheck the MySQL privileges and the connection settings in hive-site.xml.
Spark installation
Configure the relevant file (spark-env.sh), adding:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
Test
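One quick check is the bundled SparkPi example, run from the Spark install directory (assumed here to be /usr/local/spark, as in the cluster section below):
$cd /usr/local/spark
$bin/run-example SparkPi 2>&1 | grep "Pi is"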
Spark cluster installation
Configure the environment variables
Spark configuration
$cd /usr/local/spark
$cp ./conf/slaves.template ./conf/slaves
Replace the default localhost with the worker hostnames slave1 and slave2
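The conf/slaves file then contains just the worker hostnames, one per line (they must match /etc/hosts):
slave1
slave2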
$cp ./conf/spark-env.sh.template ./conf/spark-env.sh
Edit spark-env.sh and add:
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_MASTER_IP=xxx.xxx.xxx.xxx
$scp -r /usr/local/spark root@slave1:/usr/local/
$scp -r /usr/local/spark root@slave2:/usr/local/
$sudo chown -R hadoop /usr/local/spark   (run on each slave)
Start the Spark cluster
$sbin/start-master.sh
$sbin/start-slaves.sh
Visit http://master:8080
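A quick check (jps ships with the JDK): on master, jps should list a Master process, and on slave1 and slave2 a Worker process; the web page at http://master:8080 should show both workers registered.
$jps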
export FLUME_HOME=/home/hadoop/local/flume
export PATH=$FLUME_HOME/bin:$PATH
export JAVA_HOME=/usr/local/jdk
# Name the components on this agent
a1.sources=r1
a1.channels=c1
a1.sinks=k1
a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=55555
a1.sources.r1.max-line-length=1000000
a1.sources.r1.channels=c1
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.sinks.k1.type=hdfs
# NameNode address taken from fs.defaultFS configured earlier (hdfs://master:9000)
a1.sinks.k1.hdfs.path=hdfs://master:9000/Initial_Data/%Y%m%d
a1.sinks.k1.hdfs.filePrefix=%Y%m%d-
a1.sinks.k1.hdfs.fileType=DataStream
# Use the agent's local time to resolve %Y%m%d, since the netcat source adds no timestamp header
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.rollSize=1048576
a1.sinks.k1.hdfs.rollInterval=0
a1.sinks.k1.channel=c1
Once everything is connected, start the Flume agent with the flume-ng command:
flume-ng agent -n a1 -c conf -f /home/hadoop/local/flume/conf/demoagent.conf -Dflume.root.logger=INFO,console
(The listening port 55555 comes from the netcat source definition in the configuration file, not from a flume-ng option.)
On Windows, you can also connect from the cmd prompt with: telnet <host IP> <port>; cmd then acts as the client connecting to the netcat source on port 55555.
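For example (note that the source above binds to localhost, so a client on another machine can only connect if the bind address is changed to 0.0.0.0 or the host's own IP):
telnet 192.168.31.190 55555
Every line you type is acknowledged with OK by the netcat source. The Python script below does the same thing programmatically.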
import time
import socket
import datetime

class Flume_test(object):
    def __init__(self):
        # Address and port of the Flume netcat source (see the agent configuration above)
        self.flume_host = '192.168.31.190'
        self.flume_port = 55555
    def gen_conn(self):
        # Create a TCP client socket
        tcp_cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        return tcp_cli
    def gen_data(self):
        # One timestamped test line; the trailing newline terminates the Flume event
        return 'Flume test ,datetime:[%s]\n' % datetime.datetime.now()
    def main(self):
        cli = self.gen_conn()
        cli.connect((self.flume_host, self.flume_port))
        while 1:
            data = self.gen_data()
            print(data)
            cli.sendall(bytes(data, encoding="utf8"))
            recv = cli.recv(1024)  # the netcat source answers "OK" for every event
            print(recv)
            time.sleep(1)

if __name__ == '__main__':
    ft = Flume_test()
    ft.main()
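Save the script as, say, flume_test.py (the file name is arbitrary) and run it with python flume_test.py while the agent is running; it sends one timestamped line per second, and the events should appear under the HDFS path configured for the sink.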
1. This step must be done on all four machines
export ZOOKEEPER_HOME=/home/hadoop/local/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH
dataDir=/home/hadoop/local/zookeeper/data
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
server.4=slave3:2888:3888
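For reference, a minimal complete zoo.cfg built around the lines above; tickTime, initLimit, syncLimit and clientPort are the usual defaults and are assumptions, not values taken from this document:
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/home/hadoop/local/zookeeper/data
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888
server.4=slave3:2888:3888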
Copy /etc/profile to the other machines with scp; for this you must log in to the remote machine as root.
This causes a permission problem; solve it by giving the hadoop user read, write and execute permission:
sudo chmod u+rwx /home/hadoop/local/zookeeper
Copy the ZooKeeper installation with scp; logging in as the hadoop user is sufficient for this.
echo "1" > myid   (create the file in the dataDir on master; the other three nodes get their corresponding ids, see the sketch below)
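A sketch of the per-node commands, matching the server.N numbering in zoo.cfg:
# on master
echo "1" > /home/hadoop/local/zookeeper/data/myid
# on slave1
echo "2" > /home/hadoop/local/zookeeper/data/myid
# on slave2
echo "3" > /home/hadoop/local/zookeeper/data/myid
# on slave3
echo "4" > /home/hadoop/local/zookeeper/data/myid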
firewall-cmd --state
systemctl stop firewalld.service
zkServer.sh start
Subcommands accepted by zkServer.sh: {start|start-foreground|stop|restart|status|upgrade|print-cmd}
zkServer.sh status   (check each node's role: leader or follower)
If this step reports an error, check the following:
1. zoo.cfg is misconfigured: the directory specified by dataLogDir has not been created.
2. The integer in the myid file has the wrong format, or does not match the server.N numbers in zoo.cfg.
3. The firewall is not turned off (this is what I ran into).
4. The port is already in use.
5. A hostname in zoo.cfg is wrong.
6. The hosts file maps the local hostname twice; keep only the hostname-to-IP mapping.