Common Kudu startup errors and how to fix them

When Kudu starts, an OK on the console does not guarantee that kudu-master and kudu-tserver are actually running. Check that both processes exist with ps -ef | grep kudu:

[root@node01 ~]# ps -ef | grep kudu
kudu      9644     1 18 00:43 ?        00:00:02 /usr/lib/kudu/sbin/kudu-master --server_dump_info_path=/var/run/kudu/kudu-master-kudu.json --flagfile=/etc/kudu/conf/master.gflagfile
kudu      9747     1 25 00:43 ?        00:00:02 /usr/lib/kudu/sbin/kudu-tserver --server_dump_info_path=/var/run/kudu/kudu-tserver-kudu.json --flagfile=/etc/kudu/conf/tserver.gflagfile
root     10052  3510  0 00:43 pts/2    00:00:00 grep kudu
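
The manual check above can also be scripted. The following is only a minimal sketch: the function name missing_kudu_daemons is hypothetical, and it assumes the daemons show up in the listing with a full path, as they do in the output above.

```shell
# Hypothetical helper: read a `ps -ef` listing on stdin and print the name of
# each Kudu daemon that is NOT present. Returns 0 only if both are running.
missing_kudu_daemons() {
    local listing daemon status
    listing=$(cat)
    status=0
    for daemon in kudu-master kudu-tserver; do
        # Assumption: the binary appears with a full path (".../sbin/kudu-master"),
        # so the "grep kudu" line in the listing is not counted as a daemon.
        if ! printf '%s\n' "$listing" | grep -Eq "/${daemon}( |\$)"; then
            echo "$daemon"
            status=1
        fi
    done
    return "$status"
}
```

Possible usage: ps -ef | missing_kudu_daemons || echo "not all Kudu daemons are running"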

Startup has succeeded only when both processes are running. If startup fails or reports an error, look in the log files:

[root@node01 kudu]# pwd
/var/log/kudu
[root@node01 kudu]# ll
total 21100
lrwxrwxrwx 1 kudu kudu      55 Oct 29 16:13 kudu-master.ERROR -> kudu-master.node01.kudu.log.ERROR.20191029-161311.45731
lrwxrwxrwx 1 kudu kudu      55 Oct 29 12:12 kudu-master.FATAL -> kudu-master.node01.kudu.log.FATAL.20191029-121251.23666
lrwxrwxrwx 1 kudu kudu      54 Oct 29 14:41 kudu-master.INFO -> kudu-master.node01.kudu.log.INFO.20191029-144152.45731

The error log is usually the place to find the failure message: tail -f kudu-master.ERROR
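
Kudu logs use the glog line format, in which every entry begins with a severity letter (I, W, E or F) followed by the month and day. As a rough sketch, a filter like the one below pulls out only the fatal entries; the function name fatal_lines is hypothetical, and continuation lines of a wrapped message are not captured.

```shell
# Hypothetical helper: print only the FATAL entries (lines such as
# "F0810 09:04:08.354552 ...") from glog-format input.
# Reads the files given as arguments, or stdin when none are given.
fatal_lines() {
    grep -E '^F[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.' "$@"
}
```

Possible usage: fatal_lines /var/log/kudu/kudu-master.FATAL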

  • 1. Error when granting a user sudo privileges

    sudo: /etc/sudoers is world writable
    

    Fix (as root): chmod 440 /etc/sudoers
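
To confirm the condition sudo complains about before and after the fix, a small check along these lines may help. It is a sketch only, and the function name is_world_writable is hypothetical.

```shell
# Hypothetical helper: succeed if FILE has the "write by others" bit set,
# which is what the "world writable" message refers to.
is_world_writable() {
    # -prune keeps find from descending; -perm -0002 matches the other-write bit.
    [ -n "$(find "$1" -prune -perm -0002 2>/dev/null)" ]
}
```

Possible usage: is_world_writable /etc/sudoers && echo "mode still needs fixing". Running visudo -c afterwards checks the file's syntax, owner and mode.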

  • 2. Error when starting Kudu

    Failed to start Kudu Master Server. Return value: 1 [FAILED]
    

    The log file shows that the clocks of the cluster machines are not synchronized:

    F0810 09:04:08.354552 4866 master_main.cc:68] Check failed: _s.ok() Bad status:
    Service unavailable: Cannot initialize clock: Error reading clock. Clock considered
    unsynchronized
    

    Fix:
    Step 1: check whether ntp is installed; if it is not, install it with:

    yum -y install ntp
    

    Step 2: start ntpd and enable it at boot:

    service ntpd start
    chkconfig ntpd on
    

    Step 3: configure time synchronization for the cluster (the master node node01 syncs with an external time source; the other nodes sync with node01)

    • Method 1: cron jobs
      On the master node node01, add a cron job that resyncs every minute (crontab -e):
    */1 * * * * /usr/sbin/ntpdate -u ntp4.aliyun.com;
    

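    On the other nodes, add a cron job that resyncs every minute against the master node (crontab -e); 192.168.100.110 is the master node's IP address:

    */1 * * * * /usr/sbin/ntpdate -u 192.168.100.110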
    
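    • Method 2: edit the configuration file (the file is long; only the active settings are shown here, everything else is commented out)
      On the master node node01: vim /etc/ntp.conf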

      driftfile /var/lib/ntp/drift
      restrict default kod nomodify notrap nopeer noquery
      restrict -6 default kod nomodify notrap nopeer noquery
      restrict 127.0.0.1
      restrict -6 ::1
      restrict 192.168.100.0  mask  255.255.255.0 nomodify  notrap  # allow hosts on the 192.168.100.0 network (netmask 255.255.255.0) to sync time from this server
      server ntp1.aliyun.com prefer
      server  127.127.1.0  #  localclock
      includefile /etc/ntp/crypto/pw
      keys /etc/ntp/keys
      

      On the other nodes: vim /etc/ntp.conf

      driftfile /var/lib/ntp/drift
      restrict default kod nomodify notrap nopeer noquery
      restrict -6 default kod nomodify notrap nopeer noquery
      restrict 127.0.0.1
      restrict -6 ::1
      server 192.168.100.110  # master node IP
      includefile /etc/ntp/crypto/pw
      keys /etc/ntp/keys
      

      After changing the configuration, restart the ntpd service: service ntpd restart

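      Check the synchronization status with ntpstat. On success the output looks like this (master node first, then one of the other nodes):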

      synchronised to NTP server (120.25.115.20) at stratum 3
         time correct to within 32 ms
         polling server every 128 s
      
      synchronised to NTP server (192.168.100.110) at stratum 7
         time correct to within 60 ms
         polling server every 1024 s
      

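      With method 2, synchronization can take 5 to 10 minutes, and it may fail if the initial time offset is too large. It is best to sync once by hand with ntpdate -u xxx first; if it still fails, restart the service and try again.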
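
Because synchronization can take several minutes, it may be convenient to poll for it before starting Kudu rather than retrying by hand. The sketch below is a generic retry loop; the name wait_until is hypothetical. The usage line relies on ntpstat exiting with status 0 only when the clock is synchronised (as described in its man page) and assumes the init service is named kudu-master.

```shell
# Hypothetical helper: run COMMAND up to ATTEMPTS times, sleeping DELAY seconds
# between tries. Returns 0 as soon as the command succeeds, 1 if it never does.
# usage: wait_until ATTEMPTS DELAY COMMAND [ARGS...]
wait_until() {
    local attempts delay i
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Output of the polled command is discarded; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

Possible usage: wait_until 20 30 ntpstat && service kudu-master start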

  • 3. Error during startup

    F0810 21:31:12.620932 20143 master_main.cc:71] Check failed: _s.ok() Bad status:
    Invalid argument: Unable to initialize catalog manager: Failed to initialize sys tables
    async: on-disk master list
    

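    Fix:
    Step 1: stop the master and the tserver
    Step 2: delete everything under the master and tserver directories (this wipes all existing Kudu data on the node)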

    rm -rf /export/servers/kudu/master/*
    rm -rf /export/servers/kudu/tserver/*
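
Since rm -rf cannot be undone, a more cautious variant is to move the contents aside first and delete the backup later. This is a sketch of that alternative, not the original procedure; the function name set_aside and the .bak.<timestamp> naming are hypothetical. It should only be run after both daemons are stopped.

```shell
# Hypothetical helper: move everything inside DIR into a new sibling directory
# DIR.bak.<timestamp> and print that backup path. DIR itself is kept in place,
# so its owner and mode are unchanged.
set_aside() {
    local dir backup entry
    dir=${1%/}
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "not a directory: $1" >&2
        return 1
    fi
    backup="${dir}.bak.$(date +%Y%m%d-%H%M%S)"
    mkdir "$backup" || return 1
    # The second pattern picks up hidden entries; unmatched patterns are skipped.
    for entry in "$dir"/* "$dir"/.[!.]*; do
        if [ -e "$entry" ] || [ -L "$entry" ]; then
            mv "$entry" "$backup"/ || return 1
        fi
    done
    echo "$backup"
}
```

Possible usage: set_aside /export/servers/kudu/master, then the same for the tserver directory.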
    
  • 4. Error during startup

    F0913 15:12:00.628237 20859 master_main.cc:74] Check failed: _s.ok() Bad status: IO
    error: Could not create new FS layout: unable to create file system roots: unable to
    write instance metadata: Call to mkstemp() failed on name template
    /export/servers/kudu/master/instance.kudutmp.XXXXXX: Permission denied (error 13)
    

    Fix:

    Kudu runs as the kudu user by default, so the directory may have ownership that this user cannot write to. Change the owner of the directory:

    chown -R kudu:kudu /export/servers/kudu
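
To see which entries have the wrong owner, before or after the chown, something like the following can be used. It is a small sketch and the function name not_owned_by is hypothetical.

```shell
# Hypothetical helper: list every entry under DIR (including DIR itself) that
# is not owned by OWNER, given as a user name or a numeric uid.
# usage: not_owned_by OWNER DIR
not_owned_by() {
    find "$2" ! -user "$1"
}
```

Possible usage: not_owned_by kudu /export/servers/kudu. Empty output means every entry already belongs to the kudu user.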
    
  • 5. kudu-master fails to start

    F1029 10:09:52.183449 44724 master_main.cc:77] Check failed: _s.ok() Bad status: Not found: Unable to initialize catalog manager: Failed to initialize sys tables async: Unable to load consensus metadata for tablet 00000000000000000000000000000000: Unable to load consensus metadata for tablet 00000000000000000000000000000000: /export/servers/kudu/master/consensus-meta/00000000000000000000000000000000: No such file or directory (error 2)
    

    Cause in this case: in /etc/kudu/conf/master.gflagfile, the WAL and data directories were configured with the same path

    --fs_wal_dir=/export/servers/kudu/master
    --fs_data_dirs=/export/servers/kudu/master
    

    Fix:
    Step 1: edit /etc/kudu/conf/master.gflagfile

    --fs_wal_dir=/export/servers/kudu/master/wal
    --fs_data_dirs=/export/servers/kudu/master/data
    

    Step 2: delete everything under the master directory

    rm -rf /export/servers/kudu/master/*
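
Following the diagnosis above, a quick way to see how the two flags are currently set is to read them from the flag file. This is a sketch under the assumption that each flag sits on its own line in --name=value form, as in the snippets above; the function names flag_value and wal_shared_with_data are hypothetical.

```shell
# Hypothetical helper: print the value of --FLAG=value from a gflags file
# (if the flag appears more than once, the last occurrence is printed).
# usage: flag_value FLAG FILE
flag_value() {
    sed -n "s/^--$1=//p" "$2" | tail -n 1
}

# Hypothetical helper: succeed if the fs_wal_dir path is also listed in
# fs_data_dirs. A trailing slash is ignored when comparing.
wal_shared_with_data() {
    local wal data dir
    wal=$(flag_value fs_wal_dir "$1")
    data=$(flag_value fs_data_dirs "$1")
    # fs_data_dirs may hold a comma-separated list, so compare each entry.
    for dir in $(printf '%s' "$data" | tr ',' ' '); do
        if [ "${dir%/}" = "${wal%/}" ]; then
            return 0
        fi
    done
    return 1
}
```

Possible usage: wal_shared_with_data /etc/kudu/conf/master.gflagfile && echo "WAL and data directories share a path"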
    
