Using Ansible: The Playbooks Approach

A Guide to Using Playbooks

1. Hosts and Users

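In the YAML file, hosts names a host group or one or more host patterns, separated by commas.
remote_user names the remote user that runs the tasks; sudo makes that user run commands with sudo privileges. Since Ansible 1.9 the equivalent keywords are become and become_user, and recent releases no longer accept the sudo keywords.
Note: each task can also define its own remote user. sudo can likewise be enabled on a single task without enabling it for the whole play.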

---
- hosts: webservers
  remote_user: pe
  vars:
    http_port: 80
    max_clients: 200
#  sudo: yes
  tasks:
    - name: Test connectivity
      ping:
      sudo: yes
    - name: Sync the nginx configuration (port {{ http_port }})   # variables defined in the play-level vars can be referenced anywhere
      template: src=/srv/nginx.j2 dest=/etc/nginx.conf
      #service: name=nginx state=restarted
      sudo: yes             # use sudo for this task only
      sudo_user: supdev     # use sudo to run the task as another user
      notify:               # run the handlers listed below when this task reports a change
        - restart nginx     # "restart nginx" is defined in the handlers section at the end

    - name: Change the SELinux mode
      command: /sbin/setenforce 0
      ignore_errors: True   # continue even if this command fails
    - name: Run a command whose successful exit code is not 0
      shell: /usr/bin/somecommand || /bin/true   # ignore_errors: True would also work here
    - name: Copy a file
      copy: src=/etc/ansible/hosts dest=/etc/ansible/hosts
        owner=root group=root mode=0644
  handlers:
    - name: restart nginx
      service: name=nginx state=restarted

Note: if sudo requires a password, add --ask-sudo-pass when running ansible-playbook (on newer releases the option is --ask-become-pass, or -K for short).
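
For readers on a newer Ansible release, the privilege-escalation parts of the play above can be written with the become keywords. The following is a minimal sketch under that assumption; it reuses the webservers group, the pe user, and the supdev account from the example, and the become on the handler is an assumption (restarting a service normally needs root), not something the original play specifies.

```yaml
---
- hosts: webservers
  remote_user: pe
  vars:
    http_port: 80
  tasks:
    - name: Test connectivity
      ping:
      become: yes                # replaces "sudo: yes"
    - name: Sync the nginx configuration (port {{ http_port }})
      template:
        src: /srv/nginx.j2
        dest: /etc/nginx.conf
      become: yes
      become_user: supdev        # replaces "sudo_user: supdev"
      notify:
        - restart nginx
  handlers:
    - name: restart nginx
      service:
        name: nginx
        state: restarted
      become: yes                # assumption: restarting the service needs root
```

If the escalation needs a password, such a play would be run with ansible-playbook playbooks.yml --ask-become-pass.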

2. The Tasks List

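Notes:

1. Each play contains a list of tasks. A task must finish on all of its target hosts before the next task starts.
2. A playbook runs from top to bottom. If a task fails on a host, that host is removed from the rotation for the rest of the playbook.
3. The goal of each task is to execute one module, usually with specific arguments; variables can be used in those arguments.
Example: modules such as shell, command, user, template (copy), service, and yum, each followed by that module's own arguments.
4. Each task should have a name, so that the output for each task is easy to identify at run time.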
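
The points above can be sketched as a short task list: every task has a name, calls exactly one module, and passes arguments that may reference variables. The pkg_name variable and its value are hypothetical and only serve the illustration.

```yaml
---
- hosts: webservers
  vars:
    pkg_name: httpd              # hypothetical variable used in the module arguments below
  tasks:
    - name: Install {{ pkg_name }}
      yum:
        name: "{{ pkg_name }}"
        state: present
    - name: Make sure {{ pkg_name }} is running
      service:
        name: "{{ pkg_name }}"
        state: started
    - name: Run a command that may exit non-zero
      shell: /usr/bin/somecommand || /bin/true
```

The tasks run in the order listed, and each one completes on every matching host before the next begins.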

3. Handlers: Operations That Run on Change

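When a task that modifies a configuration file reports a change, the notify keyword on that task (a task keyword, not a module) triggers the operations defined in handlers. A handler runs only if the task actually changed something, and it runs once at the end of the play even when several tasks notify it.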

tasks:
  - name: template configuration file
    template: src=template.j2 dest=/etc/foo.conf
    notify:
      - restart memcached
      - restart apache
handlers:
  - name: restart memcached
    service: name=memcached state=restarted
  - name: restart apache
    service: name=apache state=restarted

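Note: handlers run in the order in which they are declared in the handlers section, not the order in which they are notified. The best use for a handler is restarting a service or triggering a system reboot.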
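
To illustrate the ordering rule, here is a small sketch of a play in which the notify list and the handlers section name the same two handlers in opposite order. The play wrapper and the webservers group are assumptions added to make the fragment complete.

```yaml
---
- hosts: webservers
  tasks:
    - name: template configuration file
      template:
        src: template.j2
        dest: /etc/foo.conf
      notify:
        - restart apache         # listed first in notify ...
        - restart memcached
  handlers:
    - name: restart memcached    # ... but declared first, so it runs first
      service:
        name: memcached
        state: restarted
    - name: restart apache
      service:
        name: apache
        state: restarted
```

If the template task reports a change, both handlers run at the end of the play, memcached before apache.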

4. Running a Playbook

ansible-playbook playbooks.yml -f 10 # run the playbook with a parallelism level (forks) of 10

5. Using Ansible-Pull (Pulling Configuration)

ansible-pull is a small script that checks out a repository of configuration instructions from git and then runs ansible-playbook against that content.
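
One common way to use ansible-pull is to let each host schedule it with cron. The sketch below assumes a hypothetical repository URL and checkout directory, and a playbook named local.yml at the root of that repository; none of these appear in the original setup.

```yaml
---
- hosts: webservers
  tasks:
    - name: Schedule ansible-pull every 15 minutes
      cron:
        name: ansible-pull
        minute: "*/15"
        # -U is the repository to check out, -d the local checkout directory (both hypothetical here)
        job: "ansible-pull -U https://git.example.com/ops/config.git -d /var/lib/ansible/local local.yml"
```

With this in place, every host pulls the repository on its own schedule and applies local.yml to itself, instead of waiting for a push from a control machine.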

6. Tips and Tricks

When running playbooks, use the --verbose flag if you want to see the output of modules that succeeded (otherwise only failed modules produce output).
Before running a playbook, you can check which hosts it would affect like this:

ansible-playbook playbook.yml --list-hosts

Appendix: Playbook Case Studies

1. Using playbooks to adjust application JVM settings

Directory layout:

sh-4.1$ tree 
.
├── playbooks.yml
├── start.sh.j2
├── stop.sh.j2
└── vars.yml

playbooks.yml configuration

---
#file: playbooks.yml
- hosts: local
#  remote_user: pe
#  sudo: yes
  vars:
    service: "Nginx service"
  vars_files:
    - vars.yml
  tasks:
  - name: "{{ service }} connectivity test {{ ansible_date_time.iso8601 }}"
    ping:
  - name: Update the tomcat startup configuration
    remote_user: pe
    sudo: yes
    template:
#      src: "start.sh.j2"
#      dest: "/tmp/start{{ ansible_date_time.iso8601_basic }}.sh"
#      src: "stop.sh.j2"
#      dest: "/tmp/stop{{ ansible_date_time.iso8601_basic }}.sh"
      src: "{{ item.src }}"
      dest: "{{ item.dest }}"
      owner: admin
      group: admin
      mode: "0755"    # quoted so that YAML does not treat the mode as a number
    with_items:
      - { src: "start.sh.j2", dest: "/tmp/start{{ ansible_date_time.iso8601 }}.sh" }
      - { src: "stop.sh.j2", dest: "/tmp/stop{{ ansible_date_time.iso8601 }}.sh" }

Variable definition file vars.yml

---
# Tomcat version
tomcat_version: tomcat6.0.33
# JDK version
jdk_version: jdk1.6.0_25

# Application name
app_name: xxbandy.test.local
# Server ID
server_id: 1

start.sh.j2 template file

#!/bin/bash

#chown 555 -R /export/home/tomcat/domains/
export CATALINA_HOME=/export/servers/{{ tomcat_version }}
export CATALINA_BASE=/export/Domains/{{ app_name }}/server{{ server_id }}
export CATALINA_PID=$CATALINA_BASE/work/catalina.pid
export LANG=zh_CN.UTF-8
###JAVA
export JAVA_HOME=/export/servers/{{ jdk_version }}
export JAVA_BIN=/export/servers/{{ jdk_version }}/bin
export PATH=$JAVA_BIN:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/bin
export CLASSPATH=.:/lib/dt.jar:/lib/tools.jar
export JAVA_OPTS="-Djava.library.path=/usr/local/lib -server -Xms2048m -Xmx2048m -XX:MaxPermSize=512m -XX:+UnlockExperimentalVMOptions -Djava.awt.headless=true -Dsun.net.client.defaultConnectTimeout=60000 -Dsun.net.client.defaultReadTimeout=60000 -Djmagick.systemclassloader=no -Dnetworkaddress.cache.ttl=300 -Dsun.net.inetaddr.ttl=300 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$CATALINA_BASE/logs -XX:ErrorFile=$CATALINA_BASE/logs/java_error_%p.log"
export JAVA_HOME JAVA_BIN PATH CLASSPATH JAVA_OPTS
$CATALINA_HOME/bin/startup.sh -config $CATALINA_BASE/conf/server.xml

2. Using playbooks to update the telegraf configuration of the Docker monitoring client

telegraf.conf template file:

[global_tags]
  dc = "bigdata-1"
  # dc = "us-east-1" # will tag all metrics with dc=us-east-1
  # rack = "1a"
  ## Environment variables can be used as tags, and throughout the config file
  # user = "$USER"
[agent]
  ## Default data collection interval for all inputs
  # Collection interval
  interval = "10s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  # Align collection times to the interval
  round_interval = true
  ## Telegraf will send metrics to outputs in batches of at
  ## most metric_batch_size metrics.
  # Maximum number of metrics sent to an output in one batch
  metric_batch_size = 1000
  ## For failed writes, telegraf will cache metric_buffer_limit metrics for each
  ## output, and will flush this buffer on a successful write. Oldest metrics
  ## are dropped first when this buffer fills.
  # Metric buffer kept for each output
  metric_buffer_limit = 10000
  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  # Collection jitter, so that multiple inputs do not all collect at the same moment
  collection_jitter = "0s"
  ## Default flushing interval for all outputs. You shouldn't set this below
  ## interval. Maximum flush_interval will be flush_interval + flush_jitter
  # Default interval for flushing data to the outputs (at most flush_interval + flush_jitter)
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  # Flush jitter
  flush_jitter = "0s"
  ## By default, precision will be set to the same timestamp order as the
  ## collection interval, with the maximum being 1s.
  ## Precision will NOT be used for service inputs, such as logparser and statsd.
  ## Valid values are "ns", "us" (or "µs"), "ms", "s".
  precision = ""
  ## Run telegraf in debug mode
  debug = false
  ## Run telegraf in quiet mode
  quiet = false
  ## Override default hostname, if empty use os.Hostname()
  hostname = ""
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false
[[outputs.influxdb]]
  ## The full HTTP or UDP endpoint URL for your InfluxDB instance.
  ## Multiple urls can be specified as part of the same cluster,
  ## this means that only ONE of the urls will be written to each interval.
  # urls = ["udp://localhost:8089"] # UDP endpoint example
  urls = ["http://10.0.0.1:8086"] # required
  ## The target database for metrics (telegraf will create it if not exists).
  database = "bigdata" # required
  ## Retention policy to write to. Empty string writes to the default rp.
  retention_policy = ""
  ## Write consistency (clusters only), can be: "any", "one", "quorum", "all"
  write_consistency = "any"
  ## Write timeout (for the InfluxDB client), formatted as a string.
  ## If not provided, will default to 5s. 0s means no timeout (not recommended).
  timeout = "5s"
  # username = "telegraf"
  # password = "metricsmetricsmetricsmetrics"
  ## Set the user agent for HTTP POSTs (can be useful for log differentiation)
  # user_agent = "telegraf"
  ## Set UDP payload size, defaults to InfluxDB UDP Client default (512 bytes)
  # udp_payload = 512
  ## Optional SSL Config
  # ssl_ca = "/etc/telegraf/ca.pem"
  # ssl_cert = "/etc/telegraf/cert.pem"
  # ssl_key = "/etc/telegraf/key.pem"
  ## Use SSL but skip chain & host verification
  # insecure_skip_verify = false
[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## Comment this line if you want the raw CPU time metrics
  fielddrop = ["time_*"]
[[inputs.disk]]
  ## By default, telegraf gather stats for all mountpoints.
  ## Setting mountpoints will restrict the stats to the specified mountpoints.
  mount_points = ["/export"]
  fieldpass = ["inodes*"]
  ## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
  ## present on /run, /var/run, /dev/shm or /dev).
  ## By default, telegraf will gather stats for all devices including
  ## disk partitions.
  ## Setting devices will restrict the stats to the specified devices.
  # devices = ["sda", "sdb"]
  ## Uncomment the following line if you need disk serial numbers.
  # skip_serial_number = false
[[inputs.mem]]
  # no configuration
[[inputs.docker]]
  endpoint = "tcp://127.0.0.1:5256"
  container_names = []

Playbook file that updates the configuration and restarts telegraf:

---
#file: playbooks.yml
- hosts: bigdata
  remote_user: root
  vars:
    service: "dockers telegraf update"
  tasks:
  - name: "{{ service }} connectivity test {{ ansible_date_time.iso8601 }}"
    ping:
  tasks:    # duplicate key: only this second tasks list is used, so the connectivity test above never runs
  - name: "{{ service }} (update configuration file)"
    template:
      src: "telegraf.j2"
      dest: "/etc/telegraf/telegraf.conf"
    notify: restart telegraf

  handlers:
    - name: restart telegraf
      service: name=telegraf state=restarted

Execution result:

sh-4.2# ansible-playbook telegraf.yml 
 [WARNING]: While constructing a mapping from /export/ansible/telegraf.yml, line 3, column 3, found a duplicate dict key (tasks). Using last
defined value only.


PLAY [bigdata] *****************************************************************

TASK [setup] *******************************************************************
ok: [10.0.0.1]
ok: [10.0.0.2]
ok: [10.0.0.3]
ok: [10.0.0.4]
ok: [10.0.0.5]

TASK [dockers telegraf update (update configuration file)] *********************
ok: [10.0.0.1]
ok: [10.0.0.2]
ok: [10.0.0.3]
ok: [10.0.0.4]
ok: [10.0.0.5]

PLAY RECAP *********************************************************************
10.0.0.1             : ok=2    changed=0    unreachable=0    failed=0
10.0.0.2             : ok=2    changed=0    unreachable=0    failed=0
10.0.0.3             : ok=2    changed=0    unreachable=0    failed=0
10.0.0.4             : ok=2    changed=0    unreachable=0    failed=0
10.0.0.5             : ok=2    changed=0    unreachable=0    failed=0

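Because the configuration file had not been modified (changed=0 on every host), the restart telegraf handler was not triggered. The warning at the top of the output appears because the playbook declares the tasks key twice: only the last definition is used, which is why the connectivity test does not show up in the run.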
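
The duplicate-key warning in the output goes away if both tasks live in a single tasks list. The following is a sketch of the playbook rearranged that way; the file name telegraf.yml follows the command line shown in the run, the task names are simplified, and everything else is taken from the playbook above.

```yaml
---
# file: telegraf.yml
- hosts: bigdata
  remote_user: root
  vars:
    service: "dockers telegraf update"
  tasks:
    - name: "{{ service }} connectivity test"
      ping:
    - name: "{{ service }} (update configuration file)"
      template:
        src: "telegraf.j2"
        dest: "/etc/telegraf/telegraf.conf"
      notify: restart telegraf
  handlers:
    - name: restart telegraf
      service:
        name: telegraf
        state: restarted
```

With a single tasks list, the ping task runs before the template task, and the handler still fires only when the rendered telegraf.conf differs from the file already on the host.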