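Today let's talk about a clock synchronization problem on a CDH 5.14.2 cluster running CentOS 7.5.
Recently I found that on CentOS 7 systems, even after syncing time with ntp, the CDH cluster still raised alerts, as shown in the figure below:
At that point every server in the cluster was syncing via ntp against Alibaba Cloud's time servers, and ntptime returned OK on all of them: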
[root@wjltony parcel-repo]# ntptime
ntp_gettime() returns code 0 (OK)
time e0d8ffcd.417db000 Wed, Jul 17 2019 10:00:45.255, (.255824),
maximum error 58773 us, estimated error 12 us, TAI offset 0
ntp_adjtime() returns code 0 (OK)
modes 0x0 (),
offset 0.000 us, frequency 37.103 ppm, interval 1 s,
maximum error 58773 us, estimated error 12 us,
status 0x0 (),
time constant 7, precision 1.000 us, tolerance 500 ppm,
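To run this check on many nodes instead of reading the output by eye, a small helper like the one below could work. It is a minimal sketch. It assumes the ntptime output format shown above (from the ntp package on CentOS 7). The function name `ntptime_ok` is made up for illustration. As this post shows, a passing result here did not by itself make the CDH alert go away.

```shell
# Hypothetical helper (not part of ntp or CDH): reads `ntptime` output on
# stdin and succeeds only if both ntp_gettime() and ntp_adjtime() report
# "code 0 (OK)", i.e. the kernel does not flag the clock as in error.
ntptime_ok() {
  local out
  out=$(cat)
  printf '%s\n' "$out" | grep -qF 'ntp_gettime() returns code 0 (OK)' &&
    printf '%s\n' "$out" | grep -qF 'ntp_adjtime() returns code 0 (OK)'
}

# Usage on a node with the ntp package installed:
#   ntptime | ntptime_ok && echo "kernel clock OK" || echo "kernel clock NOT OK"
```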
Running timedatectl showed nothing abnormal either:
[root@wjltony parcel-repo]# timedatectl
Local time: Wed 2019-07-17 10:02:11 CST
Universal time: Wed 2019-07-17 02:02:11 UTC
RTC time: Wed 2019-07-17 02:02:11
Time zone: Asia/Shanghai (CST, +0800)
NTP enabled: yes
NTP synchronized: yes
RTC in local TZ: no
DST active: n/a
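The same idea applies to the timedatectl output. The sketch below is a hypothetical helper, `timedatectl_synced`. It assumes the "NTP synchronized:" label printed by the systemd version on CentOS 7; newer systemd releases word this line differently, so the pattern may need adjusting elsewhere.

```shell
# Hypothetical helper: reads `timedatectl` output on stdin and succeeds
# only if the "NTP synchronized:" line says "yes". Leading spaces are
# allowed because timedatectl right-aligns its labels.
timedatectl_synced() {
  grep -Eq '^[[:space:]]*NTP synchronized:[[:space:]]*yes[[:space:]]*$'
}

# Usage across nodes (host names are placeholders):
#   for h in node1 node2 node3; do
#     ssh "$h" timedatectl | timedatectl_synced || echo "$h: not synchronized"
#   done
```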
Based on what had worked for me a few times before, I tried switching time synchronization to chronyd:
[root@wjltony ~]# systemctl stop ntpd
[root@wjltony ~]# cat /etc/chrony.conf
server 192.168.0.100 iburst
stratumweight 0
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
[root@wjltony ~]# systemctl start chronyd
[root@wjltony ~]#
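Note that `systemctl stop ntpd` only lasts until the next reboot. To make the switch permanent you would typically also disable ntpd and enable chronyd; those commands appear as comments below. The sketch also includes a hypothetical helper, `chrony_synced`, which reads `chronyc tracking` output. It assumes the `Leap status` line reads `Not synchronised` when chrony has no usable source.

```shell
# Make the ntpd -> chronyd switch persistent across reboots (as root, on every node):
#   systemctl disable ntpd
#   systemctl enable chronyd
#   systemctl restart chronyd
#
# Hypothetical helper: reads `chronyc tracking` output on stdin and succeeds
# if a "Leap status" line is present and is not "Not synchronised".
chrony_synced() {
  local status
  # Split on ':' and trim the value that follows "Leap status".
  status=$(awk -F':' '/^Leap status/ {
      sub(/^[ \t]+/, "", $2); sub(/[ \t]+$/, "", $2); print $2; exit
    }')
  [ -n "$status" ] && [ "$status" != "Not synchronised" ]
}

# Usage: chronyc tracking | chrony_synced && echo "chrony synchronized"
```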
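Sure enough, once time was synced with chronyd, the CDH alert went away.
If clock synchronization is not set up properly from the very beginning, Kudu will not run normally, as shown in the figure below:
While running, Kudu checks the kernel's clock synchronization status (the same status ntptime reports). If that check returns ERROR, Kudu may stop. An unsynchronized node looks like this: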
[root@wjltony ~]# ntptime
ntp_gettime() returns code 5 (ERROR)
time e09a4178.ee80be80 Thu, May 30 2019 19:48:08.931, (.931652319),
maximum error 16000000 us, estimated error 16000000 us, TAI offset 0
ntp_adjtime() returns code 5 (ERROR)
modes 0x0 (),
offset 0.000 us, frequency 36.192 ppm, interval 1 s,
maximum error 16000000 us, estimated error 16000000 us,
status 0x2041 (PLL,UNSYNC,NANO),
time constant 3, precision 0.001 us, tolerance 500 ppm,
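For a rough pre-flight check before starting Kudu, the sketch below parses ntptime output. It fails if the kernel status carries the UNSYNC flag, as in the output above, or if the reported maximum error exceeds a limit. The 10-second default limit is an assumption based on Kudu's `--max_clock_sync_error_usec` flag; verify the default for the Kudu version bundled with your CDH release. `kudu_clock_ok` is a hypothetical name. It only approximates Kudu's internal check and is not Kudu's own code.

```shell
# Hypothetical pre-flight check: reads `ntptime` output on stdin.
# Optional $1 = maximum allowed error in microseconds
# (assumed default: 10 s, mirroring Kudu's --max_clock_sync_error_usec).
kudu_clock_ok() {
  local limit out max_err
  limit=${1:-10000000}
  out=$(cat)
  # The kernel sets the UNSYNC bit in its status word when the clock is
  # not synchronized; treat that as a hard failure.
  if printf '%s\n' "$out" | grep -q 'status .*UNSYNC'; then
    return 1
  fi
  # Take the first "maximum error N us" figure that ntptime reports.
  max_err=$(printf '%s\n' "$out" | grep -oE 'maximum error [0-9]+ us' |
    head -n 1 | awk '{print $3}')
  [ -n "$max_err" ] && [ "$max_err" -le "$limit" ]
}

# Usage: ntptime | kudu_clock_ok || echo "clock not ready for Kudu"
```

With the unsynchronized output above, this fails on the UNSYNC flag alone. Its 16000000 us maximum error would also exceed the assumed 10 s limit.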