Analyzing the "Solr initialize failed" Error on the CDH Platform, and a Brute-Force Fix

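After setting up a CDH cluster, I accidentally deleted some files (I am not sure which ones). From then on, adding the Solr service failed during the initialization stage with "Solr initialize failed". Reinstalling the service, and even reinstalling the whole CDH platform several times, gave the same result. The log showed the following: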

15/Sep/2018 18:52:53 +0000 org.apache.solr.common.cloud.ZkStateReader$3 process
WARNING: ZooKeeper watch triggered, but Solr cannot talk to ZK
15/Sep/2018 18:52:53 +0000 org.apache.solr.cloud.LeaderElector$1 process
WARNING:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/test/leader_elect/slice3/election
at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1249)
at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:266)
at org.apache.solr.common.cloud.SolrZkClient$6.execute(SolrZkClient.java:263)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
at org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:263)
at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:92)
at org.apache.solr.cloud.LeaderElector.access$000(LeaderElector.java:57)
at org.apache.solr.cloud.LeaderElector$1.process(LeaderElector.java:121)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:507)

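The log shows that a ZooKeeper watch was triggered, but Solr could not talk to ZooKeeper. The Solr source excerpted below shows that Solr creates a ZkStateReader instance, which is mainly responsible for holding the state stored in ZooKeeper and for registering watchers. In this code, the warning "ZooKeeper watch triggered, but Solr cannot talk to ZK" is logged when a KeeperException with code SESSIONEXPIRED or CONNECTIONLOSS is caught. The cause is therefore likely an expired session or a lost connection, and one thing to try is increasing the ZooKeeper session timeout.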

synchronized (getUpdateLock()) {
      cmdExecutor.ensureExists(CLUSTER_STATE, zkClient);
      cmdExecutor.ensureExists(ALIASES, zkClient);
     
      log.info("Updating cluster state from ZooKeeper... ");
      
      zkClient.exists(CLUSTER_STATE, new Watcher() {
       
        @Override
        public void process(WatchedEvent event) {
          // session events are not change events,
          // and do not remove the watcher
          if (EventType.None.equals(event.getType())) {
            return;
          }
          log.info("A cluster state change: {}, has occurred - updating... (live nodes size: {})", (event) , ZkStateReader.this.clusterState == null ? 0 : ZkStateReader.this.clusterState.getLiveNodes().size());
          try {
           
            // delayed approach
            // ZkStateReader.this.updateClusterState(false, false);
            synchronized (ZkStateReader.this.getUpdateLock()) {
              // remake watch
              final Watcher thisWatch = this;
              Stat stat = new Stat();
              
              byte[] data = zkClient.getData(CLUSTER_STATE, thisWatch, stat ,
                  true);
              Set<String> ln = ZkStateReader.this.clusterState.getLiveNodes();
              ClusterState clusterState = ClusterState.load(stat.getVersion(), data, ln,ZkStateReader.this);
              // update volatile
              ZkStateReader.this.clusterState = clusterState;
            }
          } catch (KeeperException e) {
            if (e.code() == KeeperException.Code.SESSIONEXPIRED
                || e.code() == KeeperException.Code.CONNECTIONLOSS) {
              log.warn("ZooKeeper watch triggered, but Solr cannot talk to ZK");
              return;
            }
            log.error("", e);
            throw new ZooKeeperException(SolrException.ErrorCode.SERVER_ERROR,
                "", e);
          } catch (InterruptedException e) {
            // Restore the interrupted status
            Thread.currentThread().interrupt();
            log.warn("", e);
            return;
          }
        }
       
      }, true);
    }

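The source above is part of how ZkStateReader is initialized from data that already exists in ZooKeeper. I therefore opened the ZooKeeper command-line client with zkCli.sh and looked for the Solr-related data, and found that there was no /solr znode under the root. My guess is that some ZooKeeper data had been deleted by mistake, so that Solr could not create the /solr znode and store its state there during initialization. Solr then had nothing to read back from ZooKeeper, kept polling, and eventually hit the session timeout. In this situation, increasing the session timeout does not help. In the end I wiped CDH completely and reinstalled it. When I added the Solr service again, the /solr znode and its initial data were created automatically in ZooKeeper, and Solr initialized successfully.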
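
The znode check described above can be scripted. The following is a minimal sketch, not part of the original troubleshooting: `has_znode` is a hypothetical helper that only parses the bracketed child list that `zkCli.sh` prints for `ls /`, and the host, port, and znode name in the usage comment are assumptions to adjust for your cluster.

```shell
#!/usr/bin/env bash
# has_znode LISTING NAME
# Succeeds if NAME appears in a zkCli.sh child listing such as
# "[zookeeper, hbase, solr]". LISTING may contain other output lines;
# only the last line that is entirely a bracketed list is inspected.
has_znode() {
  local listing="$1" name="$2"
  printf '%s\n' "$listing" \
    | grep '^\[.*\]$' | tail -n 1 \
    | sed 's/^\[//; s/\]$//' \
    | tr ',' '\n' \
    | sed 's/^ *//; s/ *$//' \
    | grep -qxF -- "$name"
}

# Usage against a live cluster (host, port, and znode name are assumptions):
#   listing="$(zkCli.sh -server localhost:2181 ls / 2>/dev/null)"
#   has_znode "$listing" solr || echo "/solr znode is missing"
```

Keeping the parsing separate from the `zkCli.sh` call makes the helper easy to try on captured output before pointing it at a real ensemble.
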
The steps for completely uninstalling CDH are as follows:

  1. Stop all services in the cluster from the CDH web management console.
  2. Stop the Cloudera daemons.
    On the server node:
    service cloudera-scm-server stop
    On the agent nodes:
    service cloudera-scm-agent stop
  3. Uninstall the packages.
    rpm -qa | grep cloudera
    for f in $(rpm -qa | grep cloudera); do rpm -e "${f}"; done   # run again if any packages remain
  4. Remove the directories belonging to the installed services.
    umount /var/run/cloudera-scm-agent/process
    rm -rf /usr/share/cmf /var/lib/cloudera* /var/cache/yum/x86_64/6/cloudera* /var/log/cloudera* /var/run/cloudera* /etc/cloudera*
  5. Remove the installed files.
    rm -rf /var/lib/hadoop-* /var/lib/impala /var/lib/solr /var/lib/zookeeper /var/lib/hue /var/lib/oozie /var/lib/pgsql /var/lib/sqoop2 /data/dfs/ /data/impala/ /data/yarn/ /dfs/ /impala/ /yarn/ /var/run/hadoop-*/ /var/run/hdfs-*/ /usr/bin/hadoop* /usr/bin/zookeeper* /usr/bin/hbase* /usr/bin/hive* /usr/bin/hdfs /usr/bin/mapred /usr/bin/yarn /usr/bin/sqoop* /usr/bin/oozie /etc/hadoop* /etc/zookeeper* /etc/hive* /etc/hue /etc/impala /etc/sqoop* /etc/oozie /etc/hbase* /etc/hcatalog
    rm -rf $(find /var/lib/alternatives/* ! -name "mta" ! -name "print" ! -name "zlibrary-ui" -mtime -3)
    rm -rf /etc/alternatives/*
  6. Kill the supervisor process.
    ps aux | grep super
    kill -9 <pid>   # <pid> is the supervisord process ID shown by the previous command
  7. Delete the distributed parcel files and the unpacked parcels.
    rm -rf /opt/cloudera/parcel-cache /opt/cloudera/parcels

Once these steps are complete, the CDH platform can be reinstalled.
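
Before reinstalling, it can be useful to confirm that the directories removed in the steps above are really gone. The following is a minimal sketch under that assumption: `list_leftovers` is a hypothetical helper, not a Cloudera tool, and the paths in the usage comment are only a sample taken from the commands above.

```shell
#!/usr/bin/env bash
# list_leftovers PATH...
# Prints every argument that still exists on disk (including dangling
# symlinks) and returns 1 if at least one was found, 0 otherwise.
list_leftovers() {
  local found=0 p
  for p in "$@"; do
    if [ -e "$p" ] || [ -L "$p" ]; then
      printf '%s\n' "$p"
      found=1
    fi
  done
  return "$found"
}

# Usage with a few of the paths removed during the uninstall:
#   if list_leftovers /usr/share/cmf /var/lib/zookeeper /var/lib/solr \
#        /opt/cloudera/parcels /opt/cloudera/parcel-cache; then
#     echo "no leftovers, safe to reinstall"
#   else
#     echo "the paths listed above still exist"
#   fi
```

The helper only reports paths and never deletes anything, so it can be run repeatedly on each node.
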
