How L3 and DHCP agents HA work in Red Hat OSP7
25 November 2015

In Red Hat OpenStack Platform 7 (OSP7), the l3-agent and dhcp-agent run in active-active mode on every controller node, instead of the active-standby setup used in OSP6.
[root@overcloud-controller-2 ~]# pcs status | grep 'l3-agent\|dhcp-agent' -A1
 Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
--
 Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
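The same can be cross-checked from the Neutron API side. A quick illustrative check (output omitted here) is to list the agents and confirm all three controllers report both agent types as alive:

# List the L3 and DHCP agents known to Neutron; every controller should
# appear with alive = :-) for both agent types.
neutron agent-list | grep -i 'l3 agent\|dhcp agent'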
Let's look into more details of how tenant dhcp server and vrouter work in HA.
dhcp-agent HA
Let's create a tenant network with dhcp enabled:
[root@overcloud-controller-2 ~]# neutron net-create testdhcp
[root@overcloud-controller-2 ~]# neutron subnet-create --name subnet-testdhcp testdhcp 192.168.200.0/24
Because we have this line in /etc/neutron/neutron.conf:
[root@overcloud-controller-0 ~]# grep ^dhcp_agents_per_network /etc/neutron/neutron.conf
dhcp_agents_per_network = 3
So here we have three DHCP servers, one per controller node, and the same qdhcp namespace gets created on each of them.
[root@overcloud-controller-0 ~]# for i in 0 1 2; do ssh overcloud-controller-$i ip netns; done
qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72
qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72
qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72
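We could also ask Neutron which DHCP agents are hosting this network; with dhcp_agents_per_network = 3 it should list all three controllers (illustrative command, output omitted here):

# Show the DHCP agents scheduled for the testdhcp network.
neutron dhcp-agent-list-hosting-net testdhcp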
We can see three dnsmasq processes, one per controller, all running with the same hosts file:
[root@overcloud-controller-0 ~]# for i in 0 1 2; do echo "On overcloud-controller-$i:" ; ssh overcloud-controller-$i ip netns exec qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72 ps -ef|grep dnsmasq; done
On overcloud-controller-0:
root      1181 27974  0 09:05 pts/0    00:00:00 grep --color=auto dnsmasq
nobody    6585     1  0 08:27 ?        00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tapf1509d9f-57 --except-interface=lo --pid-file=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/host --addn-hosts=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/opts --dhcp-leasefile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/leases --dhcp-range=set:tag0,192.168.200.0,static,86400s --dhcp-lease-max=256 --conf-file=/etc/neutron/dnsmasq-neutron.conf --domain=openstacklocal
On overcloud-controller-1:
nobody    6714     1  0 08:27 ?        00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap97155ac6-9b --except-interface=lo --pid-file=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/host --addn-hosts=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/opts --dhcp-leasefile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/leases --dhcp-range=set:tag0,192.168.200.0,static,86400s --dhcp-lease-max=256 --conf-file=/etc/neutron/dnsmasq-neutron.conf --domain=openstacklocal
On overcloud-controller-2:
nobody   21166     1  0 08:27 ?        00:00:00 dnsmasq --no-hosts --no-resolv --strict-order --bind-interfaces --interface=tap05b08c68-49 --except-interface=lo --pid-file=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/pid --dhcp-hostsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/host --addn-hosts=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/addn_hosts --dhcp-optsfile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/opts --dhcp-leasefile=/var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/leases --dhcp-range=set:tag0,192.168.200.0,static,86400s --dhcp-lease-max=256 --conf-file=/etc/neutron/dnsmasq-neutron.conf --domain=openstacklocal

[root@overcloud-controller-0 ~]# for i in 0 1 2; do echo "On overcloud-controller-$i:" ;ssh overcloud-controller-$i ip netns exec qdhcp-3f3f6372-0c96-4521-9d60-2524c139ab72 cat /var/lib/neutron/dhcp/3f3f6372-0c96-4521-9d60-2524c139ab72/host ; done
On overcloud-controller-0:
fa:16:3e:98:b5:e4,host-192-168-200-2.openstacklocal,192.168.200.2
fa:16:3e:f8:8c:4e,host-192-168-200-4.openstacklocal,192.168.200.4
fa:16:3e:d7:e4:22,host-192-168-200-3.openstacklocal,192.168.200.3
On overcloud-controller-1:
fa:16:3e:98:b5:e4,host-192-168-200-2.openstacklocal,192.168.200.2
fa:16:3e:f8:8c:4e,host-192-168-200-4.openstacklocal,192.168.200.4
fa:16:3e:d7:e4:22,host-192-168-200-3.openstacklocal,192.168.200.3
On overcloud-controller-2:
fa:16:3e:98:b5:e4,host-192-168-200-2.openstacklocal,192.168.200.2
fa:16:3e:f8:8c:4e,host-192-168-200-4.openstacklocal,192.168.200.4
fa:16:3e:d7:e4:22,host-192-168-200-3.openstacklocal,192.168.200.3
We can also see three Neutron ports, one for each DHCP server IP:
[root@overcloud-controller-0 ~]# neutron port-list
+--------------------------------------+------+-------------------+---------------------------------------------------------------------------------------+
| id                                   | name | mac_address       | fixed_ips                                                                             |
+--------------------------------------+------+-------------------+---------------------------------------------------------------------------------------+
| 05b08c68-4963-4003-ba83-46175bb72d24 |      | fa:16:3e:98:b5:e4 | {"subnet_id": "183a7323-a015-4eb5-8108-9e1295dfbe42", "ip_address": "192.168.200.2"}  |
| 97155ac6-9bc6-42ce-98a1-cb2c868868eb |      | fa:16:3e:f8:8c:4e | {"subnet_id": "183a7323-a015-4eb5-8108-9e1295dfbe42", "ip_address": "192.168.200.4"}  |
| f1509d9f-5704-4114-a81b-e328d1076419 |      | fa:16:3e:d7:e4:22 | {"subnet_id": "183a7323-a015-4eb5-8108-9e1295dfbe42", "ip_address": "192.168.200.3"}  |
+--------------------------------------+------+-------------------+---------------------------------------------------------------------------------------+
Now we know that in OSP7, tenant DHCP HA is achieved by running three DHCP servers at the same time; if one controller goes down, the other two keep running and serving DHCP requests.
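A simple way to exercise this is to take one dhcp-agent out of service and confirm DHCP still works. A minimal sketch using the Pacemaker clone shown earlier (assumes cluster admin access on a controller):

# Ban the dhcp-agent clone from one controller; the two remaining dnsmasq
# instances keep answering DHCP requests for the tenant network.
pcs resource ban neutron-dhcp-agent-clone overcloud-controller-0

# Remove the constraint again once finished testing.
pcs resource clear neutron-dhcp-agent-clone overcloud-controller-0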
vRouter (l3-agent) HA
Let's create a vRouter:
[root@overcloud-controller-0 ~]# neutron router-create testrouter
Created a new router:
+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| distributed           | False                                |
| external_gateway_info |                                      |
| ha                    | True                                 |
| id                    | 2be95cbd-efee-4908-90cf-622fcef8cae8 |
| name                  | testrouter                           |
| routes                |                                      |
| status                | ACTIVE                               |
| tenant_id             | c5cb88bd612949a5afaed8acf79350ef     |
+-----------------------+--------------------------------------+
We can see ha is True, because we have this in /etc/neutron/neutron.conf:
[root@overcloud-controller-0 ~]# grep ^l3_ha /etc/neutron/neutron.conf
l3_ha = True
l3_ha_net_cidr = 169.254.192.0/18
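With l3_ha = True, new routers are HA by default. The HA behaviour can also be requested explicitly at creation time; a small sketch (testrouter2 is just an example name, and setting the flag requires admin privileges):

# Explicitly create an HA router instead of relying on the l3_ha default.
neutron router-create --ha True testrouter2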
And a "HA network" is created using 169.254.192.0/18
network as configured in neutron.conf
:
[root@overcloud-controller-0 ~]# neutron net-list
+--------------------------------------+----------------------------------------------------+--------------------------------------------------------+
| id                                   | name                                               | subnets                                                |
+--------------------------------------+----------------------------------------------------+--------------------------------------------------------+
| 3f3f6372-0c96-4521-9d60-2524c139ab72 | testdhcp                                           | 183a7323-a015-4eb5-8108-9e1295dfbe42 192.168.200.0/24  |
| d160978b-fa7d-4d3e-bb45-a9ba1d98439f | HA network tenant c5cb88bd612949a5afaed8acf79350ef | 6ecc49ee-2dd9-47eb-9593-fa6fc55469a3 169.254.192.0/18  |
+--------------------------------------+----------------------------------------------------+--------------------------------------------------------+
We can see 3 ports created for 3 controllers in this HA network:
[root@overcloud-controller-0 ~]# neutron router-port-list testrouter
+--------------------------------------+-------------------------------------------------+-------------------+---------------------------------------------------------------------------------------+
| id                                   | name                                            | mac_address       | fixed_ips                                                                             |
+--------------------------------------+-------------------------------------------------+-------------------+---------------------------------------------------------------------------------------+
| 2dba7c38-0547-4d77-806c-e9bf9acb8a55 | HA port tenant c5cb88bd612949a5afaed8acf79350ef | fa:16:3e:3d:5b:95 | {"subnet_id": "6ecc49ee-2dd9-47eb-9593-fa6fc55469a3", "ip_address": "169.254.192.2"}  |
| 5d00bea2-3dbe-4ea6-97bb-65ce74fb056b | HA port tenant c5cb88bd612949a5afaed8acf79350ef | fa:16:3e:fa:8c:8b | {"subnet_id": "6ecc49ee-2dd9-47eb-9593-fa6fc55469a3", "ip_address": "169.254.192.1"}  |
| 835b136c-ec0b-4f5d-a96c-145de332efb6 | HA port tenant c5cb88bd612949a5afaed8acf79350ef | fa:16:3e:e7:22:d0 | {"subnet_id": "6ecc49ee-2dd9-47eb-9593-fa6fc55469a3", "ip_address": "169.254.192.3"}  |
+--------------------------------------+-------------------------------------------------+-------------------+---------------------------------------------------------------------------------------+
A keepalived/VRRP process is running for this HA router:
[root@overcloud-controller-0 ~]# ps -ef | grep keepalived
neutron  22025     1  0 02:35 ?        00:00:00 /usr/bin/python2 /bin/neutron-keepalived-state-change --router_id=2be95cbd-efee-4908-90cf-622fcef8cae8 --namespace=qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 --conf_dir=/var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8 --monitor_interface=ha-835b136c-ec --monitor_cidr=169.254.0.1/24 --pid_file=/var/lib/neutron/external/pids/2be95cbd-efee-4908-90cf-622fcef8cae8.monitor.pid --state_path=/var/lib/neutron --user=998 --group=996
root     22041 20724  0 03:15 pts/0    00:00:00 grep --color=auto keepalived
root     22047     1  0 02:35 ?        00:00:00 keepalived -P -f /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8/keepalived.conf -p /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8.pid -r /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8.pid-vrrp
root     22049 22047  0 02:35 ?        00:00:00 keepalived -P -f /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8/keepalived.conf -p /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8.pid -r /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8.pid-vrrp
Let's check the keepalived.conf of this HA router:
[root@overcloud-controller-0 ~]# cat /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8/keepalived.conf
vrrp_instance VR_1 {
    state BACKUP
    interface ha-835b136c-ec
    virtual_router_id 1
    priority 50
    garp_master_repeat 5
    garp_master_refresh 10
    nopreempt
    advert_int 2
    track_interface {
        ha-835b136c-ec
    }
    virtual_ipaddress {
        169.254.0.1/24 dev ha-835b136c-ec
    }
}
We can see there is one internal virtual_ipaddress, 169.254.0.1, defined for this router. All three keepalived instances start in BACKUP state with the same priority (50) and nopreempt, so whichever node wins the initial VRRP election keeps the VIP until it fails; right now that is overcloud-controller-0:
[root@overcloud-controller-0 ~]# ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a | grep " inet 16"
    inet 169.254.192.3/18 brd 169.254.255.255 scope global ha-835b136c-ec
    inet 169.254.0.1/24 scope global ha-835b136c-ec
From the other two controllers we cannot see this VIP; only the keepalived/VRRP interface address is present:
[root@overcloud-controller-0 ~]# ssh overcloud-controller-1 ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a | grep " inet 16"
    inet 169.254.192.2/18 brd 169.254.255.255 scope global ha-2dba7c38-05
[root@overcloud-controller-0 ~]# ssh overcloud-controller-2 ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a | grep " inet 16"
    inet 169.254.192.1/18 brd 169.254.255.255 scope global ha-5d00bea2-3d
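A quick way to find which controller currently holds the VRRP VIP is to loop over the nodes and look for 169.254.0.1 inside the qrouter namespace; a minimal sketch reusing the hostnames and router ID from above:

# Report active/standby per controller based on who owns the 169.254.0.1 VIP.
for i in 0 1 2; do
  echo -n "overcloud-controller-$i: "
  if ssh overcloud-controller-$i ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 \
       ip -4 addr | grep -q '169\.254\.0\.1'; then
    echo active
  else
    echo standby
  fi
done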
The same can be confirmed with neutron l3-agent-list-hosting-router:
[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router testrouter
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| a7b76ad9-83a7-4c2c-a9ef-59e60e175a81 | overcloud-controller-1.localdomain | True           | :-)   | standby  |
| 85836d82-a187-424a-9816-a3db79e0bb8b | overcloud-controller-2.localdomain | True           | :-)   | standby  |
| 51e840f0-5d52-4b9c-a1a5-52b976abff7d | overcloud-controller-0.localdomain | True           | :-)   | active   |
+--------------------------------------+------------------------------------+----------------+-------+----------+
We can see the testrouter is active on overcloud-controller-0 and standby on the other two controllers.
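The neutron-keepalived-state-change helper seen in the ps output also records the VRRP state locally. Assuming the conf_dir layout from its command line (the exact file name is an assumption and may differ between Neutron releases), it can be read directly on each controller:

# "master" on the active node, "backup" elsewhere. The 'state' file name is
# assumed from the helper's --conf_dir argument and may vary by release.
cat /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8/state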
Now let's add the testdhcp network to testrouter:
[root@overcloud-controller-0 ~]# neutron router-interface-add testrouter subnet-testdhcp
Added interface 946f1e38-2ef3-4747-8a81-60b14909d8c0 to router testrouter.
Check the output of ip a in the vRouter namespace on overcloud-controller-0:
[root@overcloud-controller-0 ~]# ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a | grep " inet "
    inet 127.0.0.1/8 scope host lo
    inet 169.254.192.3/18 brd 169.254.255.255 scope global ha-835b136c-ec
    inet 169.254.0.1/24 scope global ha-835b136c-ec
    inet 192.168.200.1/24 scope global qr-946f1e38-2e
We can see that the gateway IP of the testdhcp network, 192.168.200.1, is now running on overcloud-controller-0.
We should also see that keepalived.conf gets updated: the new gateway and link-local addresses are added under virtual_ipaddress_excluded, so keepalived configures them on the VRRP master but does not advertise them inside the VRRP packets:
[root@overcloud-controller-0 ~]# cat /var/lib/neutron/ha_confs/2be95cbd-efee-4908-90cf-622fcef8cae8/keepalived.conf
vrrp_instance VR_1 {
    state BACKUP
    interface ha-835b136c-ec
    virtual_router_id 1
    priority 50
    garp_master_repeat 5
    garp_master_refresh 10
    nopreempt
    advert_int 2
    track_interface {
        ha-835b136c-ec
    }
    virtual_ipaddress {
        169.254.0.1/24 dev ha-835b136c-ec
    }
    virtual_ipaddress_excluded {
        192.168.200.1/24 dev qr-946f1e38-2e
        fe80::f816:3eff:fe1a:58d8/64 dev qr-946f1e38-2e scope link
    }
}
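One consequence of the excluded VIPs is that the qr- interface exists on every controller but only carries the gateway address on the VRRP master. A small check on a standby node (reusing the interface name from above) would show the interface without 192.168.200.1:

# On a standby controller the qr- port is plugged into the namespace,
# but keepalived has not configured the gateway IP on it.
ssh overcloud-controller-1 ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 \
  ip addr show qr-946f1e38-2e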
The way L3-agent HA works can be illustrated as follows:
[Diagram: keepalived/VRRP-based L3-agent HA across the three controllers]
Let's see how the vRouter switches over when the active controller goes down. We shut down the currently active node, overcloud-controller-0:
[root@overcloud-controller-0 ~]# shutdown now
Connection to 192.0.2.7 closed by remote host.
Now let's check which controller takes over the testrouter:
[root@overcloud-controller-1 ~]# neutron l3-agent-list-hosting-router testrouter
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| a7b76ad9-83a7-4c2c-a9ef-59e60e175a81 | overcloud-controller-1.localdomain | True           | :-)   | active   |
| 85836d82-a187-424a-9816-a3db79e0bb8b | overcloud-controller-2.localdomain | True           | :-)   | standby  |
| 51e840f0-5d52-4b9c-a1a5-52b976abff7d | overcloud-controller-0.localdomain | True           | xxx   | active   |
+--------------------------------------+------------------------------------+----------------+-------+----------+
We can see that overcloud-controller-0 is no longer alive and overcloud-controller-1 is now active. Let's check whether the virtual IPs have moved to the new active node:
[root@overcloud-controller-1 ~]# ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a | grep " inet "
    inet 127.0.0.1/8 scope host lo
    inet 169.254.192.2/18 brd 169.254.255.255 scope global ha-2dba7c38-05
    inet 169.254.0.1/24 scope global ha-2dba7c38-05
    inet 192.168.200.1/24 scope global qr-946f1e38-2e
We can see the virtual IP 192.168.200.1 is now running on overcloud-controller-1.
For this vRouter, Neutron still reports overcloud-controller-0 as active even though it is not alive; the dead agent cannot report its state change, so its entry stays stale until the node returns, while the virtual IPs have already moved to overcloud-controller-1. Now let's bring overcloud-controller-0 back online and see what happens.
When it's online, check again:
[root@overcloud-controller-0 ~]# neutron l3-agent-list-hosting-router testrouter
+--------------------------------------+------------------------------------+----------------+-------+----------+
| id                                   | host                               | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------+----------------+-------+----------+
| a7b76ad9-83a7-4c2c-a9ef-59e60e175a81 | overcloud-controller-1.localdomain | True           | :-)   | active   |
| 85836d82-a187-424a-9816-a3db79e0bb8b | overcloud-controller-2.localdomain | True           | :-)   | standby  |
| 51e840f0-5d52-4b9c-a1a5-52b976abff7d | overcloud-controller-0.localdomain | True           | :-)   | standby  |
+--------------------------------------+------------------------------------+----------------+-------+----------+
[root@overcloud-controller-0 ~]# ip netns exec qrouter-2be95cbd-efee-4908-90cf-622fcef8cae8 ip a | grep " inet "
    inet 127.0.0.1/8 scope host lo
    inet 169.254.192.3/18 brd 169.254.255.255 scope global ha-835b136c-ec
Now overcloud-controller-0 is alive again and, as expected, in standby state: keepalived runs with nopreempt and equal priorities, so the recovered node does not take the VIPs back. Inside its qrouter namespace the virtual IPs are gone as well.
Source: http://kimizhang.com/how-l3-and-dhcp-agents-ha-work-in-red-hat-osp7/