Automated ZooKeeper Cluster Deployment with Ansible: A Hands-On Guide

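ZooKeeper is, at its core, a distributed key-value store with a hierarchical namespace, used mainly for coordination. It is widely deployed in modern commercial systems.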

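I had set up ZooKeeper by hand many times. It is not difficult, but the steps are tedious: building a 5-node cluster took at least an hour each time.

I later switched to Ansible and packaged the procedure as a standalone Ansible role, which turned it into a repeatable standard. Users only need to set a few parameters, so it is very convenient to use.

In my own use, a full deployment takes about 8 minutes and has succeeded every time, regardless of the operator's mood. The actual time depends mainly on network speed and on the number of nodes being deployed.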

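For commercial use, I recommend deploying at least 5 nodes. A 3-node ensemble works but is fragile: it can tolerate the loss of only one node.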
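The difference between 3 and 5 nodes comes from ZooKeeper's majority quorum: the ensemble stays available only while more than half of its servers can reach each other. Below is a minimal Python sketch of that arithmetic. The function names are illustrative and are not part of ZooKeeper or of the role described here.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


# Print ensemble size, quorum size, and tolerated failures side by side.
for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Note that 4 nodes tolerate no more failures than 3, which is why odd ensemble sizes are the usual choice.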

 

The deployment process and environment requirements are described below.

The code is available on my GitHub:   https://github.com/HappyFreeAngel/zookeeper-cluster-offline-install.git

 

ZooKeeper cluster deployment environment

Component               | Version      | Required | Download link
Operating system        | centos7 1608 |          | see the official site
JDK                     | jdk8         |          | see the official site
zookeeper               | 3.5.4-beta   |          | https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.5.4-beta/zookeeper-3.5.4-beta.tar.gz
lsof                    |              |          |
nc                      |              |          |
ssh-passwordless-login  | 1.0.0        |          | https://github.com/HappyFreeAngel/passwordless-ssh-login.git

No. | VM name | IP
1   | zkb1    | 10.20.3.51
2   | zkb2    | 10.20.3.52
3   | zkb3    | 10.20.3.53
4   | zkb4    | 10.20.3.54
5   | zkb5    | 10.20.3.55

 

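Installation steps:

1. Prepare the virtual or physical machines to install on: create them, assign their IP addresses, and make sure they can ping each other.

2. Deploy ZooKeeper.

3. Test and confirm that the deployment succeeded.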

 

- name: zookeeper-cluster offline install playbook, includes many playbooks.
  hosts: localhost
  gather_facts: False

#  become: yes
#  become_method: sudo

  vars:
    projectinfo: "{{ lookup('file','input.yml') | from_yaml }}"
    vm_host_list: []
    domain_group_dict: {}

  pre_tasks:
    - set_fact: task_startup_timestamp="{{lookup('pipe','date \"+%Y-%m-%d %H:%M:%S\"')}}"

    - name: "This task runs before all other tasks."
      shell: echo "Starting; checking that the required files exist."; ./before-run.sh;

    - name: "Check that the files in the local project folder exist"
      shell: ./check-file-exist-status.sh
      register: files_status

    - name: "if stdout check failed,interrupt execution"
      fail: msg="Failed, a file link is broken and the file does not exist"
      when: '"does not exist" in files_status.stdout'

    - name: "Check that the role dependencies are present and at the right version"  #todo
      shell: ./check-role-dependency.sh
      register: role_dependency_status

    - name: "Role dependency missing"
      fail: msg="There is a problem with the role dependencies"
      when: '"role does not exist" in role_dependency_status.stdout'

    - name: "set projectroot short hand vars"
      set_fact: projectroot="{{projectinfo['project_root']}}"

    - name: "set commonsetting short hand vars"
      set_fact: commonsetting="{{projectroot['common']}}"

    - name: "set hostdict short hand vars"
      set_fact: hostdict="{{projectroot['all_hosts']}}"

    - name: "set hostconfig short hand vars"
      set_fact: hostconfig="{{projectroot['host_config']}}"

    - name: "set zookeeperconfig short hand vars"
      set_fact: zookeeperconfig="{{projectroot['host_config']['zookeeper_config']}}"

    - name: "vcenterconfig"
      set_fact: vcenterconfig="{{projectroot['vsphere_platform']['vmware_esxi']}}"

    - name: "set fact"
      set_fact: virtualbox_template_name="{{projectroot['host_config']['vagrant_config']['virtualbox_template_name']}}"
      when: projectroot['deploy_vsphere_platform']=='vmware_esxi'

    - name: "set fact"
      set_fact: vm_bridge_nic_name="eth1"

    - name: "Merge the per-group host lists into one list"
      set_fact: vm_host_list="{{ vm_host_list + hostdict[item] }}"
      with_items: "{{ hostdict.keys() | list }}"
      when: hostdict[item][0].ismaster == true

    - name: "Generate the temporary group-domain-ip mapping file /tmp/group_domain_ip_user_password.txt"
      template: src=templates/group_domain_ip_user_password.txt.j2 dest=/tmp/group_domain_ip_user_password.txt

    - name: "Read the contents of /tmp/group_domain_ip_user_password.txt into a registered variable"
      shell: cat /tmp/group_domain_ip_user_password.txt
      register: group_domain_ip_user_password

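    #Note: user names and passwords must not contain ':' or ','. These characters are the field and record separators, so the string could not be split correctly.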
    #hadoop-namenode-hosts:hadoop-namenode1.yourdomain.com:10.20.2.1:centos:yourpassword,hadoop-namenode-hosts:hadoop-namenode2.yourdomain.com:10.20.2.2:centos:yourpassword,hadoop-namenode-hosts:hadoop-namenode3.yourdomain.com:10.20.2.3:centos:yourpassword,hadoop-datanode-hosts:hadoop-datanode1.yourdomain.com:10.20.2.11:centos:yourpassword,hadoop-datanode-hosts:hadoop-datanode2.yourdomain.com:10.20.2.12:centos:yourpassword,hadoop-datanode-hosts:hadoop-datanode3.yourdomain.com:10.20.2.13:centos:yourpassword
    - set_fact: group_domain_ip_user_password_list={{ group_domain_ip_user_password.stdout.split(',') }}

    - add_host:
        hostname: "{{item.split(':')[1]}}"
        groups: "{{item.split(':')[0]}}"
        ansible_host: "{{item.split(':')[2]}}"
       # ansible_port: 22
        ansible_user: "{{item.split(':')[3]}}"
        ansible_ssh_pass: "{{item.split(':')[4]}}"
      with_items: "{{group_domain_ip_user_password_list}}"
    #Important: every host is accessed as root at this point; the hadoop user has not been created yet.
    - name: "set short hand vars"
      set_fact: dnsconfig="{{hostconfig['dns_config']}}"


    - name: "Dynamically create/update DNS records (DDNS); a record is added only when the name does not resolve or resolves incorrectly. the current host is {{ansible_hostname}}. create A record {{ item.name }}-->ip:{{ item.ip }}"
      nsupdate:
        key_name: "{{dnsconfig['key_name']}}"
        key_secret: "{{dnsconfig['dns_update_key']}}"
        server: "{{commonsetting['citybox_work_network']['dnsserver1']}}"
        zone: "{{dnsconfig['zone']}}"
        record: "{{item.name.split('.')[0]}}"
        value: "{{ item.ip }}"
      with_items: "{{hostdict['zookeeper-hosts']}}"
      when:  lookup('dig', item.name) != item.ip



    #This is a top-level playbook include, not a task include
  roles:
    - role: vmware-del-vm
      user_vcenterconfig: "{{ vcenterconfig }}"
      user_host_list: "{{ hostdict['zookeeper-hosts'] }}"  #this variable must not be named appconfig; it would conflict.
      async: 300
      poll: 0
      when: projectroot['deploy_vsphere_platform']=="vmware_esxi"
 #     when: inventory_hostname.find('zookeeper')!=-1

    - role: vmware-create-vm
      user_vcenterconfig: "{{ vcenterconfig }}"
      user_host_list:  "{{ hostdict['zookeeper-hosts'] }}"  #this variable must not be named user_host_list; it would conflict.
      user_vm_network: "{{commonsetting['citybox_work_network']}}"
      async: 600
      poll: 0
      when: projectroot['deploy_vsphere_platform']=="vmware_esxi"

    - role: wait-in-second
      max_wait_time_in_seconds: "{{ (hostdict['zookeeper-hosts'] | length | int )* 30  + 150 }}"

    - role: vmware-poweredon-vm
      user_vcenterconfig: "{{ vcenterconfig }}"
      user_host_list: "{{ hostdict['zookeeper-hosts'] }}"
      async: 240
      poll: 0

    - role: waitfor-vm-startup
      max_wait_time_in_seconds: "{{ (hostdict['zookeeper-hosts'] | length | int )* 30  + 150 }}"
      user_host_list: "{{ hostdict['zookeeper-hosts'] }}"

    - role: system-storage-increase
      host_list: "{{ hostdict['zookeeper-hosts'] }}"
      target_device: "/dev/sda"
      virtual_machine_template_disk_size_in_gb: "{{ vcenterconfig['virtual_machine_template_disk_size_in_gb'] }}"
      file_system: "xfs"
      mount_dir: "/var/server"

    - role: dns-resolve
      host_list: "{{ hostdict['zookeeper-hosts'] }}"
      dns_server_ip: "{{vcenterconfig['dnsserver1']}}"

    - role: dns-resolve
      host_list: "{{ hostdict['zookeeper-hosts'] }}"
      dns_server_ip: "8.8.8.8"

##- import_playbook: tasks/test-password-less-login.yml

- import_playbook: tasks/system-performance-tune.yml
- import_playbook: tasks/zookeeper.yml
- import_playbook: tasks/reboot-host-and-wait-for-host-up.yml host_list="{{ hostdict['zookeeper-hosts'] }}" max_wait_time_in_seconds=200
- import_playbook: tasks/notify.yml
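
The `add_host` task above relies on a flat `group:hostname:ip:user:password` string, with records separated by commas. The Python sketch below mirrors that split outside Ansible to show why a `:` or `,` in a user name or password breaks it. The function name, the field names, and the sample values are illustrative assumptions, not code from the repository.

```python
from typing import Dict, List

# Field order used by the add_host task: item.split(':')[0] .. [4]
FIELDS = ("group", "hostname", "ip", "user", "password")


def parse_host_records(raw: str) -> List[Dict[str, str]]:
    """Split 'group:hostname:ip:user:password,...' into one dict per host."""
    records = []
    for entry in raw.strip().split(","):
        parts = entry.split(":")
        if len(parts) != len(FIELDS):
            # A ':' or ',' inside a user name or password makes the
            # field count wrong for at least one entry.
            raise ValueError(
                f"expected {len(FIELDS)} fields, got {len(parts)}: {entry!r}"
            )
        records.append(dict(zip(FIELDS, parts)))
    return records


sample = (
    "zookeeper-hosts:zkb1.yourdomain.com:10.20.3.51:root:yourpassword,"
    "zookeeper-hosts:zkb2.yourdomain.com:10.20.3.52:root:yourpassword"
)
for record in parse_host_records(sample):
    print(record["group"], record["hostname"], record["ip"], record["user"])
```

Validating the field count up front turns a silently wrong inventory entry into an immediate error.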

 

 

##### The configuration file format follows

---    #config file version-1.1.0 2018-08-22
  project_root:  #dictionary children are indented 2 spaces; list items are indented 2 spaces as well.
    project_info:
      project_descripton: "ZooKeeper cluster offline automated deployment"
      version: "1.0"
      source_code: "your-git-download-link"
      created_date: "2017-06-01"
      author_list:
        - name: "author"
          phone: "dianhua"
          email: "[email protected]"
          weixin: "todo"
          QQ: "todo"

    vsphere_platform:
      virtualbox:
        vagrant_offline_install_file: "vagrant_2.0.2_x86_64.rpm"
        virtualbox_offline_install_file: "VirtualBox-5.2-5.2.6_120293_el7-1.x86_64.rpm"
        vagrant_box_name: "centos1708-kernel4.4.116-docker-17.12.0-jre9-ce-go1.9"

      vmware_esxi:
        vcenterhostname: ""      #vcenter.yourdomain.com; if the domain name does not resolve, adding it to the hosts file on the control machine also works
        vcenterusername: "[email protected]"
        vcenterpassword: ""
        datacenter: ""
        default_datastore: "cw_m4_sas_datastore"    #"cw_m4_pcie_datastore2 cw_m4_sas_datastore"
        template: "centos1611_docker_jdk8_template"
        virtual_machine_template_disk_size_in_gb: 30
        resource_pool: "hadoopcluster"
        folder: "/vm"

        dnsserver1: "10.20.1.1"   #the IP that create-dns-record.yml needs to reach; it is also dns-host[0].ip
        dnsserver2: "114.114.114.114"
        state: "poweredon"

        esxi_nic_network:
          vlan: "VM Network"      #"192.100.x.x"
          gateway: "10.20.0.1"  # sudo route  add -net 11.23.3.0 -netmask 255.255.255.128 11.23.3.1
          netmask: "255.255.0.0"
          dnsserver1: "10.20.1.1"
          dnsserver2: "114.114.114.114"

        datastore:
          rabbitmq_datastore: "cw_m4_sas_datastore"

      vmware_workstation:

      openstack:

      huawei_fusion_vsphere:

    deploy_vsphere_platform: "vmware_esxi"
    common:
      vm_platform: "vmware-vsphere"  #vagrant, vmware-vsphere,huawei-vsphere
      period_force_time_sync: "yes"
      nic_name: "ens160" #ens160 enp0s3
      is_internet_up: false

      rabbitmq_datastore: "cw_m4_sas_datastore"
      software_root_dir: "/var/server"    #the settings below depend on this value; if you change it, the related directories below must be changed too.
      citybox_work_network:
        vlan: "10.20.0.0_10G-port"  #"10.20.x.x"
        gateway: "10.20.0.1" #10.20.1.1 to do
        netmask: "255.255.0.0"
        dnsserver1: "10.20.1.1"
        dnsserver2: "114.114.114.114"
        network: "10.20.0.0/16"

    host_config:
      mail_agent_info:
        host: "smtp.mxhichina.com"
        secure_smtp_port_ipv4: "465"
        secure: "always"
        username: "[email protected]"
        password: ""
        sender: "[email protected]"

      mail_notify_info:
        receiver_name: "Happy"
        to: "[email protected]"
        bcc: "[email protected]"
        cc: "[email protected]"
        charset: "utf-8"
        subject: "Ansible automated Hadoop cluster creation report"
        body: "The Hadoop cluster for the project has been created successfully."

      dns_config:
        zone: "yourdomain.com"
        key_name: "yourdomain.com"
        dns_update_key: ""

      docker_config:
        docker_default_data_path: "/var/lib/docker"
        docker_data_folder_name: "docker-data"   # placed under /var/server by default

      vagrant_config:
        app_home: "/Volumes/linyingjie/mesos-test"  #  "/var/server/mesos-test" #
        virtualbox_template_file_path: "centos1708-kernel4.4.116-docker-17.12.0-jre9-ce-go1.9.box"
        virtualbox_template_name: "centos1708-kernel4.4.116-docker-17.12.0-jre9-ce-go1.9"
        vm_bridge_nic_name: "ens1f0"

      java_config:
        #app_home: "/var/server/jre"   #jre-8u181-linux-x64.tar.gz
        jre_targz: "jre-8u181-linux-x64.tar.gz"  #jre-10.0.1_linux-x64_bin.tar.gz #tar -zxvf jre-9.0.4_linux-x64_bin.tar.gz  -C jre9 --strip-components=1
        jre_foldername: "jre"
        jre_version: "1.8"


        jdk_targz: "jdk-8u131-linux-x64.tar.gz"
        jdk_foldername: "jdk"
        jdk_version: "1.8"

      go_config:
        app_home: "/var/server/go"
        app_foldername: "go"
        install_filename: "go1.10.linux-amd64.tar.gz"
        version: "1.10"

      ansible_config:
        app_home: "/var/server/ansible"
        app_foldername: "ansible"
        install_filename_rpm_tgz: "ansible-offline-install-2.6.0.rpms.tgz"
        version: "2.6.0"

      ntp_config:
        app_home: "/var/server/ntp"
        timezone: "Asia/Shanghai"
        port: "123"
        ntp_server_list:
          - hostname: 10.20.1.1
            command: iburst

          - hostname: 1.asia.pool.ntp.org
            command: iburst

#          - hostname: 0.asia.pool.ntp.org
#            command: iburst
#
#          - hostname: 1.asia.pool.ntp.org
#            command: iburst

      zookeeper_config:
        zookeeper_username: "zookeeper"
        zookeeper_salt_password: "$1$SomeSalt$.uTwnphKwuihqy2S2/v2l/"
        root_salt_password: "$1$SomeSalt$.uTwnphKwuihqy2S2/v2l/"


        app_home: "/var/server/zookeeper"
        zookeeper_tgz: "zookeeper-3.5.4-beta.tar.gz"
        docker_image_name: "docker.yourdomain.com/ascs/zookeeper"
        docker_image_version: "3.5.3-beta-alpine"
        docker_compressed_image_tgz: "zookeeper-3.5.3-beta-alpine.image.tgz"

        #Important: the paths below depend on the image; different images may use different paths.
        conf_dir: "/var/server/zookeeper/conf"
        data_dir: "/var/server/zookeeper/data"
        data_log_dir: "/var/server/zookeeper/log"
#        conf_dir: "/conf"
#        data_dir: "/data"
#        data_log_dir: "/datalog"

        open_port_list:
          - port_type: tcp
            port_number: 2181
            immediate: True
            permanent: True
            state: enabled # 4 options: enabled, disabled, present, absent
            description: ""

          - port_type: tcp
            port_number: 2888
            immediate: True
            permanent: True
            state: enabled # 4 options: enabled, disabled, present, absent
            description: ""

          - port_type: tcp
            port_number: 3888
            immediate: True
            permanent: True
            state: enabled # 4 options: enabled, disabled, present, absent
            description: ""

        zookeeper_client_connection_tcp_port_ipv4: "2181"
        zookeeper_peer_communication_tcp_port_ipv4: "2888"
        zookeeper_leader_select_tcp_port_ipv4: "3888"
         #ENV ZOO_USER=zookeeper \
         #    ZOO_CONF_DIR=/conf \
         #    ZOO_DATA_DIR=/data \
         #    ZOO_DATA_LOG_DIR=/datalog \
         #    ZOO_PORT=2181 \
         #    ZOO_TICK_TIME=2000 \
         #    ZOO_INIT_LIMIT=5 \
         #    ZOO_SYNC_LIMIT=2 \
         #    ZOO_MAX_CLIENT_CNXNS=60 \
         #    ZOO_STANDALONE_ENABLED=false
        a_4lw_commands_whitelist: "stat, ruok, conf, isro,wchs, wchc, wchp, cons, dump, envi, reqs"

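#        Use echo ruok | nc 127.0.0.1 2181 to test whether the server is running; a reply of imok means it is up. ruok = "are you ok"
#        echo dump | nc 127.0.0.1 2181 : lists outstanding sessions and ephemeral nodes.
#        echo kill | nc 127.0.0.1 2181 : shuts down the server.
#        echo conf | nc 127.0.0.1 2181 : prints details of the serving configuration.
#        echo cons | nc 127.0.0.1 2181 : lists full connection/session details for all clients connected to this server.
#        echo envi | nc 127.0.0.1 2181 : prints details of the serving environment (as opposed to the conf command).
#        echo reqs | nc 127.0.0.1 2181 : lists outstanding requests.
#        echo wchs | nc 127.0.0.1 2181 : lists brief information on the server's watches.
#        echo wchc | nc 127.0.0.1 2181 : lists detailed watch information by session; the output is a list of sessions with their associated watches.
#        echo wchp | nc 127.0.0.1 2181 : lists detailed watch information by path; the output is a list of paths with their associated sessions.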

    all_hosts:
      zookeeper-hosts:
      - name: "zkb1.yourdomain.com"
        uuid: "zkb1.yourdomain.com"
        ip: "10.20.3.51"
        cpu: "1"
        memory: "4096"  # at least 600 MB
        disk: 30
        username: "root"
        password: "yourpassword"
        datastore: "cw_m4_pcie_datastore1"
        host_machine: "192.168.3.11"
        ismaster: true

      - name: "zkb2.yourdomain.com"
        uuid: "zkb2.yourdomain.com"
        ip: "10.20.3.52"
        cpu: "1"
        memory: "4096"
        disk: 30
        username: "root"
        password: "yourpassword"
        datastore: "cw_m4_pcie_datastore2"
        host_machine: "192.168.3.11"
        ismaster: true

      - name: "zkb3.yourdomain.com"
        uuid: "zkb3.yourdomain.com"
        ip: "10.20.3.53"
        cpu: "1"
        memory: "4096"
        disk: 30
        username: "root"
        password: "yourpassword"
        datastore: "cw_m4_pcie_datastore1"
        host_machine: "192.168.3.11"
        ismaster: true

      - name: "zkb4.yourdomain.com"
        uuid: "zkb4.yourdomain.com"
        ip: "10.20.3.54"
        cpu: "1"
        memory: "4096"
        disk: 30
        username: "root"
        password: "yourpassword"
        datastore: "cw_m4_pcie_datastore2"
        host_machine: "192.168.3.11"
        ismaster: true

      - name: "zkb5.yourdomain.com"
        uuid: "zkb5.yourdomain.com"
        ip: "10.20.3.55"
        cpu: "1"
        memory: "4096"
        disk: 30
        username: "root"
        password: "yourpassword"
        datastore: "cw_m4_pcie_datastore1"
        host_machine: "192.168.3.11"
        ismaster: true
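
The role's own zoo.cfg template is not shown in this post. As a hedged illustration of how the `zookeeper-hosts` list and the two cluster ports (2888 and 3888) typically end up in the ensemble definition, the Python sketch below builds standard `server.<id>=<host>:<peer port>:<election port>` lines. The function name and the rule "ids follow list order, starting at 1" are assumptions. Whatever rule is used, each id must match the `myid` file on the corresponding node.

```python
from typing import Iterable, List


def server_lines(
    hostnames: Iterable[str],
    peer_port: int = 2888,
    election_port: int = 3888,
) -> List[str]:
    """Build zoo.cfg 'server.<id>=...' lines, assigning ids 1, 2, 3, ... in order."""
    return [
        f"server.{server_id}={host}:{peer_port}:{election_port}"
        for server_id, host in enumerate(hostnames, start=1)
    ]


# Host names taken from the all_hosts section of the configuration file.
hosts = [f"zkb{i}.yourdomain.com" for i in range(1, 6)]
print("\n".join(server_lines(hosts)))
```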

 

 

 

[root@zkb3 ~]# more /etc/hosts
# Ansible managed
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

# Format example:
#192.168.12.21 master.yourdomain  master
10.20.3.51  zkb1.yourdomain  zkb1
10.20.3.52  zkb2.yourdomain  zkb2
10.20.3.53  zkb3.yourdomain  zkb3
10.20.3.54  zkb4.yourdomain  zkb4
10.20.3.55  zkb5.yourdomain  zkb5


 

 

happy:~ happy$ echo stat | nc 10.20.3.51 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
 /192.168.2.33:51162[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x300000002
Mode: follower
Node count: 16
happy:~ happy$ echo stat | nc 10.20.3.52 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
 /192.168.2.33:51163[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x300000002
Mode: follower
Node count: 16
happy:~ happy$ echo stat | nc 10.20.3.53 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
 /192.168.2.33:51164[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x300000002
Mode: leader
Node count: 16
Proposal sizes last/min/max: 32/32/32
happy:~ happy$ echo stat | nc 10.20.3.54 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
 /192.168.2.33:51167[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x300000002
Mode: follower
Node count: 16
happy:~ happy$ echo stat | nc 10.20.3.55 2181
Zookeeper version: 3.5.4-beta-7f51e5b68cf2f80176ff944a9ebd2abbc65e7327, built on 05/11/2018 16:27 GMT
Clients:
 /192.168.2.33:51169[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x300000002
Mode: follower
Node count: 16
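
The output above shows a healthy ensemble: one leader (10.20.3.53) and four followers. This check can also be automated. The Python sketch below parses the `Mode:` line from `stat` output and verifies that exactly one server is the leader. Fetching the text (for example with `nc`, as above) is left out, the sample strings are abbreviated from the real output, and the function names are illustrative.

```python
import re
from typing import Dict, Optional


def parse_mode(stat_output: str) -> Optional[str]:
    """Return the value of the 'Mode:' line (e.g. 'leader'), or None if absent."""
    match = re.search(r"^Mode:\s*(\S+)", stat_output, flags=re.MULTILINE)
    return match.group(1) if match else None


def has_single_leader(modes: Dict[str, Optional[str]]) -> bool:
    """True when exactly one server is leader and all the others are followers."""
    values = list(modes.values())
    return values.count("leader") == 1 and values.count("follower") == len(values) - 1


# Abbreviated copies of the stat output shown above.
follower_sample = "Zxid: 0x300000002\nMode: follower\nNode count: 16\n"
leader_sample = "Zxid: 0x300000002\nMode: leader\nNode count: 16\n"

modes = {f"10.20.3.{i}": parse_mode(follower_sample) for i in (51, 52, 54, 55)}
modes["10.20.3.53"] = parse_mode(leader_sample)
print(has_single_leader(modes))
```

A server that is down or still electing yields `None` or another mode, so the check fails rather than reporting a partial ensemble as healthy.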

 

 

 
