statsD + graphite cluster migration notes

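This article records how I migrated an old cluster. For convenience I packed up the original configuration files and deployed them directly to the new machines, excluding the data and log directories, so some of the steps below create directories and files by hand (a few of them do not actually need to be created manually). Otherwise the procedure is much the same as building a new cluster. I am keeping it as a record and as a reference for others.

Note: apply the following configuration on all three nodes.

1. Install the packages: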

apt-get update
## Some of these packages depend on each other, so the list may contain duplicates:
apt-get -y install build-essential libpq-dev libxml2-dev libxslt1-dev libldap2-dev libsasl2-dev libffi-dev libssl-dev python-django-tagging python-simplejson python-memcache python-ldap python-cairo python-pysqlite2 python-support python-pip python-dev python-rrdtool zlib1g-dev gunicorn nodejs wget curl  nginx supervisor git devscripts debhelper software-properties-common lftp

pip install Django==1.7.7
pip install Twisted==11.1.0
pip install whisper 
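
To confirm that the pinned versions really ended up installed, the output of pip freeze can be compared against the pins. A minimal sketch: the helper name check_pins is made up for illustration, and it only reads text on standard input, so it changes nothing on the system.

```shell
# check_pins: read `pip freeze` output on stdin and report whether each
# pinned requirement passed as an argument is present.
# (check_pins is an illustrative helper, not part of pip.)
check_pins() {
    frozen=$(cat)
    status=0
    for pin in "$@"; do
        # -x: whole-line match, -F: fixed string, -i: ignore case in the package name
        if printf '%s\n' "$frozen" | grep -qixF "$pin"; then
            echo "ok      $pin"
        else
            echo "MISSING $pin"
            status=1
        fi
    done
    return "$status"
}

# Usage: pip freeze | check_pins Django==1.7.7 Twisted==11.1.0
```

The function returns non-zero if any pin is missing, so it can gate the later steps in a script.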

2. Extract the original configuration files and create the data and log directories (which were too large to copy):

tar -pzxvf grap-1.tar.gz -C /
tar -pzxvf grap-2.tar.gz -C /
tar -pzxf statsd.tar.gz -C /
tar -pzxvf supervisor.tar.gz -C /
mv /etc/supervisor/conf.d/{es.conf,es_monitor.conf} /etc/supervisor/conf.d/bak/

mkdir -p /data/graphite/storage/log/{carbon-cache,carbon-relay,webapp} /data/graphite/storage/whisper/b_statsd/{timers,counter} /data/graphite/storage/whisper/monitor /data/log/{statsd,graphite} 
chown www-data:www-data /data/graphite/storage/log/
touch /data/graphite/storage/log/webapp/{exception.log,info.log}
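
The directory layout above can be rehearsed in a scratch location before touching a real node. This is only a sketch: the function name make_graphite_dirs and its ROOT parameter are my own additions, while the paths are the ones from the commands above.

```shell
# make_graphite_dirs ROOT: create the data and log tree from the step above
# under ROOT (pass / on a real node). The function name and ROOT parameter
# are illustrative; the paths are copied from the commands above.
make_graphite_dirs() {
    root=${1%/}
    for d in \
        data/graphite/storage/log/carbon-cache \
        data/graphite/storage/log/carbon-relay \
        data/graphite/storage/log/webapp \
        data/graphite/storage/whisper/b_statsd/timers \
        data/graphite/storage/whisper/b_statsd/counter \
        data/graphite/storage/whisper/monitor \
        data/log/statsd \
        data/log/graphite
    do
        mkdir -p "$root/$d" || return 1
    done
    touch "$root/data/graphite/storage/log/webapp/exception.log" \
          "$root/data/graphite/storage/log/webapp/info.log"
}

# Dry run in a scratch directory, then inspect the result:
#   make_graphite_dirs /tmp/graphite-test && find /tmp/graphite-test -type d
```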

3. Edit each project's configuration files:

  • Edit the statsd configuration files (change the node IP addresses):
    /data/statsd/statsd.js*
ls *statsd.js | xargs sed -i 's/172\.18\.20\.57/172.16.0.208/g'
ls *statsd.js | xargs sed -i 's/172\.18\.20\.58/172.16.33.238/g'
ls *statsd.js | xargs sed -i 's/172\.18\.20\.59/172.16.17.195/g'
## If the third node is not needed, skip the previous command and delete that node's line instead:
ls *statsd.js | xargs sed -i '/172\.18\.20\.59/d'
  • Edit the carbon configuration files (tune the parameters and change the node IPs):
    /opt/graphite/conf/carbon.conf
DESTINATIONS = 172.16.0.208:2014:a,172.16.33.238:2014:b,172.16.17.195:2014:c

    /opt/graphite/conf/relay-rules.conf

destinations = 172.16.0.208:2004:a,172.16.33.238:2004:b,172.16.17.195:2004:c
  • Edit the graphite-web configuration file:
    /opt/graphite/webapp/graphite/local_settings.py
CLUSTER_SERVERS = ["172.16.0.208:80","172.16.33.238:86","172.16.17.195:86"]
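
For the statsd address rewrite in this step, it can help to apply the whole mapping in one pass and then confirm that no old address is left behind. A rough sketch, assuming GNU sed (for -i) and using a made-up function name, rewrite_node_ips; the address mapping is the one given above.

```shell
# rewrite_node_ips DIR: replace the old node addresses with the new ones in
# every *statsd.js file under DIR, then report any old address that remains.
# The function name is hypothetical; the address mapping is the one above.
rewrite_node_ips() {
    dir=$1
    for f in "$dir"/*statsd.js; do
        [ -f "$f" ] || continue
        # Dots are escaped so that "." matches only a literal dot.
        sed -i \
            -e 's/172\.18\.20\.57/172.16.0.208/g' \
            -e 's/172\.18\.20\.58/172.16.33.238/g' \
            -e 's/172\.18\.20\.59/172.16.17.195/g' \
            "$f"
    done
    # grep succeeds only if an old address is still present, which is a failure here.
    if grep -n '172\.18\.20\.' "$dir"/*statsd.js 2>/dev/null; then
        echo "old addresses remain" >&2
        return 1
    fi
}

# Usage: rewrite_node_ips /data/statsd
```

Trying it on a copy of the configuration directory first makes it easy to diff the result against the originals.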

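4. Configure nginx as a reverse proxy for graphite:

Make sure www-data has the right permissions on the relevant directories; extracting the archives with their original attributes preserved takes care of this.

Note: the business machines send data through the SLB in front to the statsD cluster behind it.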

  • cluster-node1:
cat /etc/nginx/conf.d/graphite.conf

server { 
server_name  grap.bilibili.co 172.16.0.208;
listen 80; 
charset utf-8; 
location / { 
  proxy_pass http://127.0.0.1:8000;
} 
}
  • cluster-node2:
cat /etc/nginx/conf.d/graphite.conf

server { 
server_name  grap.bilibili.co 172.16.33.238;
listen 86; 
charset utf-8; 
location / { 
  proxy_pass http://127.0.0.1:8000;
} 
}
  • cluster-node3:
cat /etc/nginx/sites-enabled/01-graphite
server { 
server_name  grap.bilibili.co 172.16.17.195;
listen 86; 
charset utf-8; 
location / { 
  proxy_pass http://127.0.0.1:8000;
} 
}
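
The three server blocks above differ only in the node address and the listen port, so they can be generated from one template. A small sketch; render_graphite_vhost is a hypothetical helper name, and the directives are exactly the ones shown above.

```shell
# render_graphite_vhost IP PORT: print the nginx server block for one node.
# render_graphite_vhost is an illustrative helper; the directives are the
# ones used in the three files above.
render_graphite_vhost() {
    cat <<EOF
server {
    server_name  grap.bilibili.co $1;
    listen $2;
    charset utf-8;
    location / {
        proxy_pass http://127.0.0.1:8000;
    }
}
EOF
}

# Example: render_graphite_vhost 172.16.17.195 86 > /etc/nginx/conf.d/graphite.conf
```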

5. With the configuration migrated, start the services:

systemctl restart supervisor.service
nginx -t
nginx -s reload
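
The two nginx commands above can be chained so that the reload only happens when the syntax check passes. A minimal sketch, with an illustrative function name:

```shell
# reload_nginx_if_valid: run the syntax check first and reload only when it
# succeeds (the function name is illustrative).
reload_nginx_if_valid() {
    if nginx -t; then
        nginx -s reload
    else
        echo "nginx -t failed; not reloading" >&2
        return 1
    fi
}

# Usage: reload_nginx_if_valid
```

This way a broken configuration file leaves the running nginx untouched.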

6. Troubleshooting and testing:

  • supervisorctl fails to open:
# supervisorctl  
unix:///var/run/supervisor.sock no such file
supervisor>

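This error occurs because the supervisord parent process was not started beforehand.

When bringing the services up, start supervisord first and then manage the child processes through supervisorctl. A successful start looks like this: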

# systemctl restart supervisor.service
# supervisorctl 
activity-node1                   RUNNING    pid 31429, uptime 0:19:56
activity-node2                   RUNNING    pid 31417, uptime 0:19:56
activity_statsd                  RUNNING    pid 31420, uptime 0:19:56
aso-node1                        RUNNING    pid 31419, uptime 0:19:56
aso-node2                        RUNNING    pid 31423, uptime 0:19:56
aso_statsd                       RUNNING    pid 31418, uptime 0:19:56
carbon-cache                     RUNNING    pid 31915, uptime 0:11:54
carbon-relay                     RUNNING    pid 31919, uptime 0:11:48
graphite                         RUNNING    pid 31974, uptime 0:10:29
supervisor> 
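  • The graphite process fails to start:
    Sometimes graphite fails to start because Django's dependencies are incompletely installed or at the wrong versions. Run pip freeze on a server where graphite works to capture its package versions, then install them here with pip install:
pip freeze > /tmp/django.txt  # run on the working server, then copy the file to this server
pip install -r /tmp/django.txt  # set up the Django environment
## In my case django-tagging was too old (0.3.x), which made graphite fail to start
pip install django-tagging==0.4
  • Data transfer test: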
echo "test.logstash.num:100|c" | nc -w 1 -u $IP $port
echo "test.logstash.num:100|c" | nc -w 1 -u 127.0.0.1 8921
echo "test.logstash.num:200|c" | nc -w 1 -u 127.0.0.1 7798

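If the installation and configuration are correct, the following path and metric appear in the tree on the left side of the graphite UI: metrics->b_stats->counters->test->logstash->num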
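
The test lines above all follow the statsD counter format name:value|c, so they can be wrapped in a small helper when several ports need to be exercised. A sketch only: both function names are made up for illustration, and the nc flags are the same ones used above.

```shell
# statsd_counter NAME VALUE: print a counter sample in the statsD line
# format "<name>:<value>|c" used in the test above.
statsd_counter() {
    printf '%s:%s|c\n' "$1" "$2"
}

# send_counter HOST PORT NAME VALUE: send one sample over UDP with nc,
# using the same nc flags as above. Both function names are illustrative.
send_counter() {
    statsd_counter "$3" "$4" | nc -w 1 -u "$1" "$2"
}

# Example: hit both local statsD ports from the test above.
#   for port in 8921 7798; do send_counter 127.0.0.1 "$port" test.logstash.num 100; done
```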


That's all.
