[2013-05-13 19:06:17] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:3
[2013-05-13 19:06:18] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:2
[2013-05-13 19:06:18] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:2
[2013-05-13 19:06:18] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:3
[2013-05-13 19:06:19] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:2
[2013-05-13 19:06:19] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:2
[2013-05-13 19:06:19] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:3
[2013-05-13 19:06:20] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:2
[2013-05-13 19:06:20] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:2
[2013-05-13 19:06:20] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:3
[2013-05-13 19:06:21] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:2
[2013-05-13 19:06:21] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:2
[2013-05-13 19:06:21] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:3
[2013-05-13 19:06:22] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:2
[2013-05-13 19:06:22] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:2
[2013-05-13 19:06:22] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:3
[2013-05-13 19:06:23] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:2
[2013-05-13 19:06:23] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:2
[2013-05-13 19:06:23] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:3
So the error log on the dataserver alone is often not enough to pinpoint the problem. In that case, we need to look at the error log on the configserver to find out why things failed.
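When the dataserver log is flooded with the same heartbeat line, it can help to collapse the log into counts before reading it. Below is a minimal Python sketch of that idea. The regular expression is inferred only from the log lines quoted in this post, and the function name summarize_errors is made up for illustration; neither is part of Tair.

```python
import re
from collections import Counter

# Layout inferred from the samples in this post (an assumption, not a Tair spec):
# [timestamp] LEVEL function (file:line) [thread id] message
LOG_LINE = re.compile(
    r"\[(?P<time>[^\]]+)\]\s+(?P<level>\w+)\s+(?P<func>\w+)\s+"
    r"\((?P<src>[^)]+)\)\s+\[(?P<thread>\d+)\]\s+(?P<msg>.*)"
)


def summarize_errors(lines):
    """Count ERROR entries per (source location, message) pair."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        # Lines that do not fit the layout are skipped silently.
        if match and match.group("level") == "ERROR":
            counts[(match.group("src"), match.group("msg"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[2013-05-13 19:06:17] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:3",
        "[2013-05-13 19:06:18] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:2",
        "[2013-05-13 19:06:18] ERROR handlePacket (heartbeat_thread.cpp:138) [1143060800] ControlPacket, cmd:2",
    ]
    # Most frequent entries first.
    for (src, msg), n in summarize_errors(sample).most_common():
        print(n, src, msg)
```

Running the same function over the configserver log makes the rarer, more informative errors stand out from the repeated ones.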
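While setting up Tair myself I ran into two cases, though I am sure there are more than these two. If you have hit similar problems, feel free to share them. The first one looks like this: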
[2013-05-13 18:56:46] ERROR load (config.cpp:124) [1165531456] 不能打开配置文件: ../etc/group.conf
[2013-05-13 18:56:46] ERROR load_group_file (server_conf_thread.cpp:125) [1165531456] load config file ../etc/group.conf error
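The cause: the group_file path in configserver.conf is wrong (the Chinese message in the first log line means "cannot open config file"). For example: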
[configserver]
port=5198
log_file=logs/config.log
pid_file=logs/config.pid
log_level=info
group_file=group.conf
data_dir=data/data
The value in group_file=group.conf must be the actual path of the group.conf file on the configserver.
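One way to catch this before starting the service is to read the config and check that the file it names exists. The following is a rough Python sketch, not a Tair tool. The function name resolve_group_file is hypothetical, and it assumes a relative group_file is resolved against a directory you pass in as base_dir; how Tair itself resolves relative paths is not confirmed here.

```python
import configparser
import os


def resolve_group_file(conf_text, base_dir):
    """Return (path, exists) for the group_file entry in [configserver].

    base_dir is an assumption: the directory a relative group_file is
    taken to be relative to. An absolute group_file is used unchanged.
    """
    # strict=False tolerates repeated keys; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    group_file = parser.get("configserver", "group_file")
    path = os.path.normpath(os.path.join(base_dir, group_file))
    return path, os.path.isfile(path)


if __name__ == "__main__":
    conf = """
[configserver]
port=5198
group_file=group.conf
"""
    path, exists = resolve_group_file(conf, os.getcwd())
    print(path, "found" if exists else "MISSING")
```

If the check reports the file as missing, fix group_file before looking any further in the logs.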
-----------------------cut-----------------------------
The other case shows up in the log like this:
[2013-05-13 18:59:23] ERROR rebuild (group_info.cpp:624) [1099381056] can not get enough data servers. need 1 lef 0
[2013-05-13 19:06:09] ERROR rebuild (group_info.cpp:624) [1106565440] can not get enough data servers. need 1 lef 0
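The configserver simply cannot find any dataserver. The thing to watch here is the startup order: always start the service on the dataserver first, and only then start the service on the configserver.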
Another cause of the same error: the network interface setting (dev_name) in both configserver.conf and dataserver.conf must be correct. For example,
#slave config server
config_server=10.210.214.136:5198
dev_name=eth1
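If eth1 is configured here but your server happens to have no eth1 interface, Tair will also fail to run. That is all for now; when I run into more problems, I will share them here.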
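The dev_name check can also be done up front. Here is a small Python sketch that compares every dev_name in a config against the interfaces on the local machine, using the standard library call socket.if_nameindex(). The function name find_missing_devices and the [public] section header in the sample are assumptions for illustration; the excerpt above does not show which section these keys live in.

```python
import configparser
import socket


def find_missing_devices(conf_text, available=None):
    """Return (section, dev_name) pairs whose interface is not present.

    available defaults to the interface names reported by the OS;
    pass a set explicitly to check a config meant for another host.
    """
    if available is None:
        available = {name for _, name in socket.if_nameindex()}
    # strict=False because a config may repeat keys such as config_server.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    missing = []
    for section in parser.sections():
        dev = parser.get(section, "dev_name", fallback=None)
        if dev is not None and dev not in available:
            missing.append((section, dev))
    return missing


if __name__ == "__main__":
    # The section name below is assumed, not taken from a real Tair config.
    conf = """
[public]
#slave config server
config_server=10.210.214.136:5198
dev_name=eth1
"""
    for section, dev in find_missing_devices(conf):
        print("interface", dev, "from section", section, "does not exist on this host")
```

An empty result means every configured interface exists locally; it does not prove the interface has the address the other servers expect.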