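ZooKeeper is essentially a distributed, real-time key-value store, and it is widely deployed in modern commercial systems.
I have set up ZooKeeper by hand many times. It is not difficult, but the steps are tedious: building a 5-node cluster took at least an hour each time.
I later switched to Ansible and packaged the deployment as a standalone Ansible role, which turned it into a standard procedure. Users only need to set a few parameters, so it is very convenient to use.
In my own use, a full deployment takes about 8 minutes and has succeeded 100% of the time, regardless of the operator's mood. The actual deployment time depends mainly on network speed and on the number of nodes.
For commercial use, I recommend deploying at least 5 nodes. A 3-node cluster works but is fragile: it loses its majority (quorum) as soon as two nodes are down, whereas a 5-node cluster keeps running with two nodes down.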
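The 3-versus-5 recommendation follows from majority arithmetic: a ZooKeeper ensemble stays available only while more than half of its servers are up. A minimal sketch of that calculation (the function names are illustrative, not part of ZooKeeper or of the role):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nodes: quorum={quorum_size(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Note that 4 nodes tolerate no more failures than 3, which is why odd ensemble sizes are the usual choice.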
The deployment process and the environment requirements are described below.
The code is available on my GitHub: https://github.com/HappyFreeAngel/zookeeper-cluster-offline-install.git
| Component | Version | Required | Download link |
| --- | --- | --- | --- |
| Operating system | CentOS 7 1608 | Yes | See the official site |
| JDK | JDK 8 | Yes | See the official site |
| ZooKeeper | 3.5.4-beta | Yes | https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.5.4-beta/zookeeper-3.5.4-beta.tar.gz |
| lsof | | No | |
| nc | | No | |
| ssh-passwordless-login | 1.0.0 | No | https://github.com/HappyFreeAngel/passwordless-ssh-login.git |
| No. | VM name | IP |
| --- | --- | --- |
| 1 | zkb1 | 10.20.3.51 |
| 2 | zkb2 | 10.20.3.52 |
| 3 | zkb3 | 10.20.3.53 |
| 4 | zkb4 | 10.20.3.54 |
| 5 | zkb5 | 10.20.3.55 |
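Installation steps:
1. Prepare the virtual or physical machines to install on. Create the machines, assign their IP addresses, and make sure they can ping each other.
2. Deploy ZooKeeper.
3. Test to confirm that the deployment succeeded.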
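Step 3 can be scripted instead of typed by hand. The sketch below sends ZooKeeper's `ruok` four-letter command over a plain TCP socket and checks for the `imok` reply, which is what `echo ruok | nc <host> 2181` does. It assumes the command is whitelisted on the server and that the client port is 2181; the function names are illustrative and not part of the role.

```python
import socket


def four_letter_word(host: str, port: int, command: str,
                     timeout: float = 3.0) -> str:
    """Send a four-letter command and return the server's raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command.encode("ascii"))
        chunks = []
        while True:
            # The server closes the connection after replying,
            # so an empty read marks the end of the answer.
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def is_serving(host: str, port: int = 2181) -> bool:
    """True if the server answers 'imok' to 'ruok'; False on any socket error."""
    try:
        return four_letter_word(host, port, "ruok").strip() == "imok"
    except OSError:
        return False
```

For example, `is_serving("10.20.3.51")` would check the first node of the cluster described here.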
- name: zookeeper-cluster offline install playbook include many books.
  hosts: localhost
  gather_facts: False
  # become: yes
  # become_method: sudo
  vars:
    projectinfo: "{{ lookup('file','input.yml') | from_yaml }}"
    vm_host_list: []
    domain_group_dict: {}
  pre_tasks:
    - set_fact: task_startup_timestamp="{{lookup('pipe','date \"+%Y-%m-%d %H:%M:%S\"')}}"

    - name: "This task runs before every other task."
      shell: echo "Starting... checking that the required files exist."; ./before-run.sh;

    - name: "Check that the files in the local project folder exist"
      shell: ./check-file-exist-status.sh
      register: files_status

    - name: "if stdout check failed,interrupt execution"
      fail: msg="Error, a file link is broken or a file does not exist"
      when: '"does not exist" in files_status.stdout'

    - name: "Check that role dependencies are present and at the right version" #todo
      shell: ./check-role-dependency.sh
      register: role_dependency_status

    - name: "Role dependency missing"
      fail: msg="There is a problem with the role dependencies"
      when: '"role does not exist" in role_dependency_status.stdout'

    - name: "set projectroot short hand hostdict"
      set_fact: projectroot="{{projectinfo['project_root']}}"

    - name: "set commonsetting short hand vars"
      set_fact: commonsetting="{{projectroot['common']}}"

    - name: "set hostdict short hand vars"
      set_fact: hostdict="{{projectroot['all_hosts']}}"

    - name: "set hostconfig short hand vars"
      set_fact: hostconfig="{{projectroot['host_config']}}"

    - name: "set zookeeperconfig short hand vars"
      set_fact: zookeeperconfig="{{projectroot['host_config']['zookeeper_config']}}"

    - name: "vcenterconfig"
      set_fact: vcenterconfig="{{projectroot['vsphere_platform']['vmware_esxi']}}"

    - name: "set fact"
      set_fact: virtualbox_template_name="{{projectroot['host_config']['vagrant_config']['virtualbox_template_name']}}"
      when: projectroot['deploy_vsphere_platform']=='vmware_esxi'

    - name: "set fact"
      set_fact: vm_bridge_nic_name="eth1"

    - name: "Merge the per-group host lists into one list"
      set_fact: vm_host_list="{{ vm_host_list }} + {{ hostdict[item] }}"
      with_items: "{{hostdict.keys()}}"
      when: hostdict[item][0].ismaster == true

    - name: "Generate the temporary group-domain-ip mapping file /tmp/group_domain_ip_user_password.txt"
      template: src=templates/group_domain_ip_user_password.txt.j2 dest=/tmp/group_domain_ip_user_password.txt

    - name: "Load the content of /tmp/group_domain_ip_user_password.txt into a registered variable"
      shell: cat /tmp/group_domain_ip_user_password.txt
      register: group_domain_ip_user_password

    # Note: usernames and passwords must not contain ':' or ','. These are the field separators, so the split would go wrong.
    #hadoop-namenode-hosts:hadoop-namenode1.yourdomain.com:10.20.2.1:centos:yourpassword,hadoop-namenode-hosts:hadoop-namenode2.yourdomain.com:10.20.2.2:centos:yourpassword,hadoop-namenode-hosts:hadoop-namenode3.yourdomain.com:10.20.2.3:centos:yourpassword,hadoop-datanode-hosts:hadoop-datanode1.yourdomain.com:10.20.2.11:centos:yourpassword,hadoop-datanode-hosts:hadoop-datanode2.yourdomain.com:10.20.2.12:centos:yourpassword,hadoop-datanode-hosts:hadoop-datanode3.yourdomain.com:10.20.2.13:centos:yourpassword
    - set_fact: group_domain_ip_user_password_list={{ group_domain_ip_user_password.stdout.split(',') }}

    - add_host:
        hostname: "{{item.split(':')[1]}}"
        groups: "{{item.split(':')[0]}}"
        ansible_host: "{{item.split(':')[2]}}"
        # ansible_port: 22
        ansible_user: "{{item.split(':')[3]}}"
        ansible_ssh_pass: "{{item.split(':')[4]}}"
      with_items: "{{group_domain_ip_user_password_list}}"
      # Note: every host uses the root user here; the hadoop user has not been created yet.

    - name: "set short hand vars"
      set_fact: dnsconfig="{{hostconfig['dns_config']}}"

    - name: "Dynamically create/update the DNS record (DDNS), only when the name does not resolve or resolves incorrectly. the current host is {{ansible_hostname}}. create A record {{ item.name }}-->ip:{{ item.ip }}"
      nsupdate:
        key_name: "{{dnsconfig['key_name']}}"
        key_secret: "{{dnsconfig['dns_update_key']}}"
        server: "{{commonsetting['citybox_work_network']['dnsserver1']}}"
        zone: "{{dnsconfig['zone']}}"
        record: "{{item.name.split('.')[0]}}"
        value: "{{ item.ip }}"
      with_items: "{{hostdict['zookeeper-hosts']}}"
      when: lookup('dig', item.name) != item.ip

  # Top-level playbook include, not a task include
  roles:
    - role: vmware-del-vm
      user_vcenterconfig: "{{ vcenterconfig }}"
      user_host_list: "{{ hostdict['zookeeper-hosts'] }}" # this name must not be appconfig; it would conflict.
      async: 300
      poll: 0
      when: projectroot['deploy_vsphere_platform']=="vmware_esxi"
      # when: inventory_hostname.find('zookeeper')!=-1

    - role: vmware-create-vm
      user_vcenterconfig: "{{ vcenterconfig }}"
      user_host_list: "{{ hostdict['zookeeper-hosts'] }}" # this name must not be user_host_list; it would conflict.
      user_vm_network: "{{commonsetting['citybox_work_network']}}"
      async: 600
      poll: 0
      when: projectroot['deploy_vsphere_platform']=="vmware_esxi"

    - role: wait-in-second
      max_wait_time_in_seconds: "{{ (hostdict['zookeeper-hosts'] | length | int )* 30 + 150 }}"

    - role: vmware-poweredon-vm
      user_vcenterconfig: "{{ vcenterconfig }}"
      user_host_list: "{{ hostdict['zookeeper-hosts'] }}"
      async: 240
      poll: 0

    - role: waitfor-vm-startup
      max_wait_time_in_seconds: "{{ (hostdict['zookeeper-hosts'] | length | int )* 30 + 150 }}"
      user_host_list: "{{ hostdict['zookeeper-hosts'] }}"

    - role: system-storage-increase
      host_list: "{{ hostdict['zookeeper-hosts'] }}"
      target_device: "/dev/sda"
      virtual_machine_template_disk_size_in_gb: "{{ vcenterconfig['virtual_machine_template_disk_size_in_gb'] }}"
      file_system: "xfs"
      mount_dir: "/var/server"

    - role: dns-resolve
      host_list: "{{ hostdict['zookeeper-hosts'] }}"
      dns_server_ip: "{{vcenterconfig['dnsserver1']}}"

    - role: dns-resolve
      host_list: "{{ hostdict['zookeeper-hosts'] }}"
      dns_server_ip: "8.8.8.8"

##- import_playbook: tasks/test-password-less-login.yml
- import_playbook: tasks/system-performance-tune.yml
- import_playbook: tasks/zookeeper.yml
- import_playbook: tasks/reboot-host-and-wait-for-host-up.yml host_list="{{ hostdict['zookeeper-hosts'] }}" max_wait_time_in_seconds=200
- import_playbook: tasks/notify.yml
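The playbook builds its dynamic inventory by splitting a `group:hostname:ip:user:password` string on `,` and `:`, which is why usernames and passwords must not contain those characters. A small sketch of the same parsing logic, written in Python for illustration only (the playbook itself does this with Jinja2 `split` and `add_host`; `parse_host_entries` and `FIELDS` are hypothetical names):

```python
from typing import Dict, List

# Field order used by the group_domain_ip_user_password.txt records.
FIELDS = ("group", "hostname", "ip", "user", "password")


def parse_host_entries(raw: str) -> List[Dict[str, str]]:
    """Split 'group:hostname:ip:user:password,...' into one dict per host.

    Raises ValueError when a record does not have exactly five fields,
    for example when a password contains ':' or ','.
    """
    entries = []
    for record in raw.strip().split(","):
        parts = record.split(":")
        if len(parts) != len(FIELDS):
            raise ValueError(
                f"expected {len(FIELDS)} fields, got {len(parts)}: {record!r}"
            )
        entries.append(dict(zip(FIELDS, parts)))
    return entries


if __name__ == "__main__":
    sample = (
        "zookeeper-hosts:zkb1.yourdomain.com:10.20.3.51:root:yourpassword,"
        "zookeeper-hosts:zkb2.yourdomain.com:10.20.3.52:root:yourpassword"
    )
    for host in parse_host_entries(sample):
        print(host["group"], host["hostname"], host["ip"])
```

Unlike the index-based `item.split(':')[4]` in the playbook, which would silently truncate a password containing `:`, this sketch fails loudly on a malformed record.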
##### Configuration file format
---
#config file version-1.1.0 2018-08-22
project_root: # children of a dictionary are indented 2 spaces; items of a list are indented 2 spaces.
  project_info:
    project_descripton: "Offline automated deployment of a ZooKeeper cluster"
    version: "1.0"
    source_code: "your-git-download-link"
    created_date: "2017-06-01"
    author_list:
      - name: "author"
        phone: "dianhua"
        email: "[email protected]"
        weixin: "todo"
        QQ: "todo"

  vsphere_platform:
    virtualbox:
      vagrant_offline_install_file: "vagrant_2.0.2_x86_64.rpm"
      virtualbox_offline_install_file: "VirtualBox-5.2-5.2.6_120293_el7-1.x86_64.rpm"
      vagrant_box_name: "centos1708-kernel4.4.116-docker-17.12.0-jre9-ce-go1.9"
    vmware_esxi:
      vcenterhostname: "" #vcenter.yourdomain.com; if the name does not resolve, an entry in the control machine's hosts file also works
      vcenterusername: "[email protected]"
      vcenterpassword: ""
      datacenter: ""
      default_datastore: "cw_m4_sas_datastore" #"cw_m4_pcie_datastore2 cw_m4_sas_datastore"
      template: "centos1611_docker_jdk8_template"
      virtual_machine_template_disk_size_in_gb: 30
      resource_pool: "hadoopcluster"
      folder: "/vm"
      dnsserver1: "10.20.1.1" # the IP that create-dns-record.yml needs to reach; also dns-host[0].ip
      dnsserver2: "114.114.114.114"
      state: "poweredon"
      esxi_nic_network:
        vlan: "VM Network" #"192.100.x.x"
        gateway: "10.20.0.1" # sudo route add -net 11.23.3.0 -netmask 255.255.255.128 11.23.3.1
        netmask: "255.255.0.0"
        dnsserver1: "10.20.1.1"
        dnsserver2: "114.114.114.114"
      datastore:
        rabbitmq_datastore: "cw_m4_sas_datastore"
    vmware_workstation:
    openstack:
    huawei_fusion_vsphere:

  deploy_vsphere_platform: "vmware_esxi"

  common:
    vm_platform: "vmware-vsphere" #vagrant, vmware-vsphere,huawei-vsphere
    period_force_time_sync: "yes"
    nic_name: "eens160" #ens160 enp0s3
    is_internet_up: false
    rabbitmq_datastore: "cw_m4_sas_datastore"
    software_root_dir: "/var/server" # the settings below depend on this; if you change it, the related directories below must change too.
    citybox_work_network:
      vlan: "10.20.0.0_10G-port" #"10.20.x.x"
      gateway: "10.20.0.1" #10.20.1.1 to do
      netmask: "255.255.0.0"
      dnsserver1: "10.20.1.1"
      dnsserver2: "114.114.114.114"
      network: "10.20.0.0/16"

  host_config:
    mail_agent_info:
      host: "smtp.mxhichina.com"
      secure_smtp_port_ipv4: "465"
      secure: "always"
      username: "[email protected]"
      password: ""
      sender: "[email protected]"

    mail_notify_info:
      receiver_name: "Happy"
      to: "[email protected]"
      bcc: "[email protected]"
      cc: "[email protected]"
      charset: "utf-8"
      subject: "Ansible automated Hadoop cluster creation report"
      body: "The Hadoop cluster project has been created successfully."

    dns_config:
      zone: "yourdomain.com"
      key_name: "yourdomain.com"
      dns_update_key: ""

    docker_config:
      docker_default_data_path: "/var/lib/docker"
      docker_data_folder_name: "docker-data" # placed under /var/server by default

    vagrant_config:
      app_home: "/Volumes/linyingjie/mesos-test" # "/var/server/mesos-test"
      # virtualbox_template_file_path: "centos1708-kernel4.4.116-docker-17.12.0-jre9-ce-go1.9.box"
      virtualbox_template_name: "centos1708-kernel4.4.116-docker-17.12.0-jre9-ce-go1.9"
      vm_bridge_nic_name: "ens1f0"

    java_config:
      #app_home: "/var/server/jre"
      #jre-8u181-linux-x64.tar.gz
      jre_targz: "jre-8u181-linux-x64.tar.gz" #jre-10.0.1_linux-x64_bin.tar.gz #tar -zxvf jre-9.0.4_linux-x64_bin.tar.gz -C jre9 --strip-components=1
      jre_foldername: "jre"
      jre_version: "1.8"
      jdk_targz: "jdk-8u131-linux-x64.tar.gz"
      jdk_foldername: "jdk"
      jdk_version: "1.8"

    go_config:
      app_home: "/var/server/go"
      app_foldername: "go"
      install_filename: "go1.10.linux-amd64.tar.gz"
      version: "1.10"

    ansible_config:
      app_home: "/var/server/ansible"
      app_foldername: "ansible"
      install_filename_rpm_tgz: "ansible-offline-install-2.6.0.rpms.tgz"
      version: "2.6.0"

    ntp_config:
      app_home: "/var/server/ntp"
      timezone: "Asia/Shanghai"
      port: "123"
      ntp_server_list:
        - hostname: 10.20.1.1
          command: iburst
        - hostname: 1.asia.pool.ntp.org
          command: iburst
        # - hostname: 0.asia.pool.ntp.org
        #   command: iburst
        #
        # - hostname: 1.asia.pool.ntp.org
        #   command: iburst

    zookeeper_config:
      zookeeper_username: "zookeeper"
      zookeeper_salt_password: "$1$SomeSalt$.uTwnphKwuihqy2S2/v2l/"
      root_salt_password: "$1$SomeSalt$.uTwnphKwuihqy2S2/v2l/"
      app_home: "/var/server/zookeeper"
      zookeeper_tgz: "zookeeper-3.5.4-beta.tar.gz"
      docker_image_name: "docker.yourdomain.com/ascs/zookeeper"
      docker_image_version: "3.5.3-beta-alpine"
      docker_compressed_image_tgz: "zookeeper-3.5.3-beta-alpine.image.tgz"
      # Note: the paths below depend on the image; different images may use different paths.
      conf_dir: "/var/server/zookeeper/conf"
      data_dir: "/var/server/zookeeper/data"
      data_log_dir: "/var/server/zookeeper/log"
      # conf_dir: "/conf"
      # data_dir: "/data"
      # data_log_dir: "/datalog"
      open_port_list:
        - port_type: tcp
          port_number: 2181
          immediate: True
          permanent: True
          state: enabled # one of four options: enabled, disabled, present, absent
          description: ""
        - port_type: tcp
          port_number: 2888
          immediate: True
          permanent: True
          state: enabled # one of four options: enabled, disabled, present, absent
          description: ""
        - port_type: tcp
          port_number: 3888
          immediate: True
          permanent: True
          state: enabled # one of four options: enabled, disabled, present, absent
          description: ""
      zookeeper_client_connection_tcp_port_ipv4: "2181"
      zookeeper_peer_communication_tcp_port_ipv4: "2888"
      zookeeper_leader_select_tcp_port_ipv4: "3888"
      #ENV ZOO_USER=zookeeper \
      #    ZOO_CONF_DIR=/conf \
      #    ZOO_DATA_DIR=/data \
      #    ZOO_DATA_LOG_DIR=/datalog \
      #    ZOO_PORT=2181 \
      #    ZOO_TICK_TIME=2000 \
      #    ZOO_INIT_LIMIT=5 \
      #    ZOO_SYNC_LIMIT=2 \
      #    ZOO_MAX_CLIENT_CNXNS=60 \
      #    ZOO_STANDALONE_ENABLED=false
      a_4lw_commands_whitelist: "stat, ruok, conf, isro,wchs, wchc, wchp, cons, dump, envi, reqs"
      # echo ruok | nc 127.0.0.1 2181 : tests whether the server is running; a reply of imok means it is up (ruok = "are you ok").
      # echo dump | nc 127.0.0.1 2181 : lists outstanding sessions and ephemeral nodes.
      # echo kill | nc 127.0.0.1 2181 : shuts down the server.
      # echo conf | nc 127.0.0.1 2181 : prints details of the serving configuration.
      # echo cons | nc 127.0.0.1 2181 : lists full connection/session details for every client connected to the server.
      # echo envi | nc 127.0.0.1 2181 : prints details of the serving environment (distinct from conf).
      # echo reqs | nc 127.0.0.1 2181 : lists outstanding requests.
      # echo wchs | nc 127.0.0.1 2181 : lists information about the server's watches.
      # echo wchc | nc 127.0.0.1 2181 : lists watch details by session; the output is a list of sessions with their associated watches.
      # echo wchp | nc 127.0.0.1 2181 : lists watch details by path; the output is a list of paths with their associated sessions.

  all_hosts:
    zookeeper-hosts:
      - name: "zkb1.yourdomain.com"
        uuid: "zkb1.yourdomain.com"
        ip: "10.20.3.51"
        cpu: "1"
        memory: "4096" # at least 600 MB
        disk: 30
        username: "root"
        password: "yourpassword"
        datastore: "cw_m4_pcie_datastore1"
        host_machine: "192.168.3.11"
        ismaster: true

      - name: "zkb2.yourdomain.com"
        uuid: "zkb2.yourdomain.com"
        ip: "10.20.3.52"
        cpu: "1"
        memory: "4096"
        disk: 30
        username: "root"
        password: "yourpassword"
        datastore: "cw_m4_pcie_datastore2"
        host_machine: "192.168.3.11"
        ismaster: true

      - name: "zkb3.yourdomain.com"
        uuid: "zkb3.yourdomain.com"
        ip: "10.20.3.53"
        cpu: "1"
        memory: "4096"
        disk: 30
        username: "root"
        password: "yourpassword"
        datastore: "cw_m4_pcie_datastore1"
        host_machine: "192.168.3.11"
        ismaster: true

      - name: "zkb4.yourdomain.com"
        uuid: "zkb4.yourdomain.com"
        ip: "10.20.3.54"
        cpu: "1"
        memory: "4096"
        disk: 30
        username: "root"
        password: "yourpassword"
        datastore: "cw_m4_pcie_datastore2"
        host_machine: "192.168.3.11"
        ismaster: true

      - name: "zkb5.yourdomain.com"
        uuid: "zkb5.yourdomain.com"
        ip: "10.20.3.55"
        cpu: "1"
        memory: "4096"
        disk: 30
        username: "root"
        password: "yourpassword"
        datastore: "cw_m4_pcie_datastore1"
        host_machine: "192.168.3.11"
        ismaster: true
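To show how the `zookeeper-hosts` list and the two peer ports relate to ZooKeeper's own configuration, here is a rough sketch that renders the `server.N=host:peerPort:electionPort` lines of `zoo.cfg`. It assumes server IDs are assigned in list order starting from 1; the role's actual template may number or format them differently, and `render_server_lines` is a hypothetical helper.

```python
from typing import Dict, List


def render_server_lines(hosts: List[Dict[str, str]],
                        peer_port: int = 2888,
                        election_port: int = 3888) -> List[str]:
    """Build one 'server.N=host:peer:election' line per host.

    Assumption: IDs follow the order of the host list, starting at 1.
    Each node's myid file must then contain the matching N.
    """
    return [
        f"server.{server_id}={host['name']}:{peer_port}:{election_port}"
        for server_id, host in enumerate(hosts, start=1)
    ]


if __name__ == "__main__":
    # Host names taken from the all_hosts section of the config.
    zookeeper_hosts = [
        {"name": f"zkb{i}.yourdomain.com", "ip": f"10.20.3.5{i}"}
        for i in range(1, 6)
    ]
    print("\n".join(render_server_lines(zookeeper_hosts)))
```

The defaults mirror `zookeeper_peer_communication_tcp_port_ipv4` (2888) and `zookeeper_leader_select_tcp_port_ipv4` (3888) from the config.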
[root@zkb3 ~]# more /etc/hosts
# Ansible managed
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
# Format example:
#192.168.12.21 master.yourdomain master
10.20.3.51 zkb1.yourdomain zkb1
10.20.3.52 zkb2.yourdomain zkb2
10.20.3.53 zkb3.yourdomain zkb3
10.20.3.54 zkb4.yourdomain zkb4
10.20.3.55 zkb5.yourdomain zkb5
happy:~ happy$ echo stat | nc 10.20.3.51 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
/192.168.2.33:51162[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x300000002
Mode: follower
Node count: 16
happy:~ happy$ echo stat | nc 10.20.3.52 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
/192.168.2.33:51163[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x300000002
Mode: follower
Node count: 16
happy:~ happy$ echo stat | nc 10.20.3.53 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
/192.168.2.33:51164[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x300000002
Mode: leader
Node count: 16
Proposal sizes last/min/max: 32/32/32
happy:~ happy$ echo stat | nc 10.20.3.54 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
/192.168.2.33:51167[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x300000002
Mode: follower
Node count: 16
happy:~ happy$ echo stat | nc 10.20.3.55 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
/192.168.2.33:51169[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x300000002
Mode: follower
Node count: 16
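The output above shows what a healthy 5-node ensemble looks like: exactly one `Mode: leader` and four `Mode: follower`. A small sketch that checks this automatically from captured `stat` output (the helper names are illustrative; the text would come from `echo stat | nc <ip> 2181` or an equivalent socket call):

```python
import re
from collections import Counter
from typing import Dict


def parse_mode(stat_output: str) -> str:
    """Extract the value of the 'Mode:' line from 'stat' output."""
    match = re.search(r"^Mode:\s*(\w+)\s*$", stat_output, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Mode:' line found in stat output")
    return match.group(1)


def ensemble_looks_healthy(outputs: Dict[str, str]) -> bool:
    """True if exactly one node is leader and all others are followers."""
    modes = Counter(parse_mode(text) for text in outputs.values())
    return modes["leader"] == 1 and modes["follower"] == len(outputs) - 1


if __name__ == "__main__":
    follower = "Zxid: 0x300000002\nMode: follower\nNode count: 16\n"
    leader = "Zxid: 0x300000002\nMode: leader\nNode count: 16\n"
    outputs = {f"10.20.3.5{i}": follower for i in (1, 2, 4, 5)}
    outputs["10.20.3.53"] = leader
    print(ensemble_looks_healthy(outputs))
```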