Notes on a production incident in a Hadoop big data cluster

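I had been gradually expanding our existing Hadoop and HBase clusters, adding several nodes without restarting anything in between. This morning I found that one HRegionServer service had stopped, so I started it again. Unexpectedly, data access kept failing after it came back up, and when I tried to restart the whole HBase cluster I got the following errors: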

$ start-hbase.sh

master running as process 112580. Stop it first.

The authenticity of host 'szc-l0104567 (192.168.1.81)' can't be established.

RSA key fingerprint is 76:e5:12:90:de:59:e1:da:02:f3:f1:2a:9a:a6:f8:c4.

Are you sure you want to continue connecting (yes/no)? The authenticity of host 'szc-l0104566 (192.168.1.80)' can't be established.

RSA key fingerprint is cd:d1:ad:98:ca:36:b5:ec:c3:1d:be:b8:8c:ae:bc:80.

Are you sure you want to continue connecting (yes/no)? The authenticity of host 'szc-l0124500 (192.168.1.95)' can't be established.

RSA key fingerprint is ec:3e:83:b0:bf:f0:3b:6d:7e:fa:e8:1d:7e:67:ed:27.

 

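Taken literally, the messages point to a host verification problem: when start-hbase.sh connects to each node by hostname over passwordless SSH, ssh does not yet trust that host's key and waits for someone to type "yes" before it will continue.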

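I had set up passwordless login by copying the authorized_keys file into the .ssh directory on each target node. That only covered connections made by IP address, which no longer asked for "yes". ssh records an accepted host key under the name used to connect, so the first passwordless connection by hostname still asks for "yes". When adding an HRegionServer node, you therefore either have to make that first connection by hostname manually, or set up passwordless login using one of the methods below.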

 

Method 2:

ssh-copy-id -i ~/.ssh/id_rsa.pub username@IP

ssh-copy-id -i ~/.ssh/id_rsa.pub username@HOSTNAME

 

Method 3:

After completing the first step (copying authorized_keys, or running ssh-copy-id -i ~/.ssh/id_rsa.pub username@IP), run the following:

ssh -o StrictHostKeyChecking=no HOSTNAME
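
When several nodes are added at once, repeating Methods 2 and 3 by hand for every hostname and IP is easy to get wrong. Below is a minimal sketch of a helper that only prints the commands for each new node, so they can be reviewed before being run. The function name gen_trust_cmds, the "HOSTNAME IP" input format, and the user name hadoop are assumptions made for illustration; they are not part of the original setup.

```shell
#!/usr/bin/env bash
# Sketch: print (dry run) the commands from Methods 2 and 3 for every new node,
# once for the hostname and once for the IP.
# Assumed input on stdin: one "HOSTNAME IP" pair per line; "#" starts a comment line.

# Usage: gen_trust_cmds USER < hostlist
gen_trust_cmds() {
  local user="$1" host ip target
  while read -r host ip; do
    # Skip blank lines and comment lines.
    [ -z "$host" ] && continue
    case "$host" in \#*) continue ;; esac
    for target in "$host" "$ip"; do
      # A line may list a hostname only; skip the missing IP.
      [ -z "$target" ] && continue
      # Method 2: install the public key on the node.
      printf 'ssh-copy-id -i ~/.ssh/id_rsa.pub %s@%s\n' "$user" "$target"
      # Method 3: connect once, accepting the host key without a prompt.
      printf 'ssh -o StrictHostKeyChecking=no %s@%s true\n' "$user" "$target"
    done
  done
}
```

For example, with a file new_nodes.txt containing the line "szc-l0104567 192.168.1.81", running gen_trust_cmds hadoop < new_nodes.txt prints four commands, two for the hostname and two for the IP. Keep in mind that StrictHostKeyChecking=no accepts whatever key the host presents, so this is only reasonable on a network you trust.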
