Reposted from: https://oldwiki.archive.openwrt.org/doc/recipes/high-availability
Bookmarking this for now; I'll go through it properly when I have time.
--------------------------------------------------------
High availability is a term that can be used to refer to systems that are designed to remain functional despite some hardware and/or software failures and/or planned maintenance (e.g. upgrades). Actual measured availability (e.g. percentage of time or requests that succeed) can vary.
In this howto, we'll be describing a simple 2 router setup, in an active/backup configuration. The devices will share a virtual ip address that hosts on the lan can use as a gateway to reach the internet. In case the active router fails or is rebooted, a backup router will take over.
We will be using keepalived to implement healthchecking and ip failover, and conntrack-tools to implement firewall/nat syncing.
Most (but not all) of the required OpenWrt configuration can also be done from the LuCI web UI.
You have two OpenWrt routers and a static WAN IP (this could also be a private IP plus DMZ).
If you're not doing NAT or connection tracking based firewalling, skip the conntrackd/conntrack-tools sections.
DHCP dynamic WAN IP is possible with keepalived, but requires extra scripting and is not going to be described here.
VPNs and tunnel setups and failing those over is not covered.
Failing over a PPPoE WAN is not implemented. Best bet: let the modem do PPPoE and point its DMZ at your virtual WAN IP.
First router:
LAN IP: 192.168.1.2/24 (changed from the default so that 192.168.1.1 stays available for the initial configuration of the second router)
WAN IP, gateway: static 192.168.0.2/24, gw 192.168.0.1, metric 10 (using double NAT / DMZ on the ISP-provided router)
DHCP on defaults is fine; we'll configure it later.
Second router:
LAN IP: 192.168.1.3/24 (changed from the default so that you can still reach and configure it once it is connected to the same network)
WAN IP, gateway: static 192.168.0.3/24, gw 192.168.0.1, metric 10 (using double NAT / DMZ on the ISP-provided router)
DHCP on defaults is fine for now. If you have any static leases or fixed host entries in DHCP, make sure they are the same as on the first router.
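The addressing for both routers can also be applied from the command line instead of LuCI. The following is a minimal sketch, assuming the default OpenWrt interface section names `lan` and `wan` and a /24 netmask on both sides. `router_net_batch` is a hypothetical helper name; it only prints commands for `uci batch` and changes nothing by itself.

```shell
#!/bin/sh
# router_net_batch LAN_IP WAN_IP WAN_GW
# Print `uci batch` commands that give a router its static LAN and WAN
# addresses. Assumption: interface sections are named 'lan' and 'wan'
# and both networks use a /24 mask, as in the addressing plan above.
router_net_batch() {
    lan_ip="$1"; wan_ip="$2"; wan_gw="$3"
    cat <<EOF
set network.lan.ipaddr='${lan_ip}'
set network.lan.netmask='255.255.255.0'
set network.wan.proto='static'
set network.wan.ipaddr='${wan_ip}'
set network.wan.netmask='255.255.255.0'
set network.wan.gateway='${wan_gw}'
set network.wan.metric='10'
commit network
EOF
}
```

On the first router this could be used as `router_net_batch 192.168.1.2 192.168.0.2 192.168.0.1 | uci batch`, followed by `/etc/init.d/network restart`; the second router would use 192.168.1.3 and 192.168.0.3. Review the printed commands before piping them, since section names differ between devices.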
verification and troubleshooting
Change a client to use gateway 192.168.1.3 and DNS 192.168.1.3 to make sure the second router works as well.
Hosts that got their IP from one dnsmasq might not be resolvable through the other dnsmasq; assigning static leases helps.
keepalived is a linux daemon that uses VRRP (Virtual Router Redundancy Protocol) to healthcheck and elect a router on the network that will serve a particular IP. We'll be using a small subset of its features in our use case.
opkg update
opkg install keepalived
The following configuration in /etc/keepalived/keepalived.conf assumes the routers are symmetrical, i.e. they have the same priority, they start up in backup state, and they will not preempt the other router until they establish that it is gone. You will need to adjust the interface names to match your device.
! Configuration File for keepalived

! failover E1 and I1 at the same time
vrrp_sync_group G1 {
    group {
        E1
        I1
    }
}

! internal
vrrp_instance I1 {
    state BACKUP
    interface br-lan
    virtual_router_id 51
    priority 101
    advert_int 1
    virtual_ipaddress {
        192.168.1.4/24
    }
    authentication {
        auth_type PASS
        auth_pass s3cret
    }
    nopreempt
}

! external
vrrp_instance E1 {
    state BACKUP
    interface eth0.2
    virtual_router_id 51
    priority 101
    advert_int 1
    virtual_ipaddress {
        192.168.0.4/24
    }
    virtual_routes {
        src 192.168.0.4 to 0.0.0.0/0 via 192.168.0.1 dev eth0.2 metric 5
    }
    authentication {
        auth_type PASS
        auth_pass s3cret
    }
    nopreempt
}
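Once keepalived runs on both routers, exactly one of them should hold the virtual address at any time. A small sketch for checking that follows; `holds_vip` is a hypothetical helper name, and the interface and address in the usage line are assumptions to be replaced with whatever your `virtual_ipaddress` block assigns.

```shell
#!/bin/sh
# holds_vip VIP: succeed if the address listing on stdin contains VIP.
# Expects the output of `ip -4 addr show`, where each address appears
# as "inet <address>/<prefix>".
holds_vip() {
    vip="$1"
    # Match "inet <VIP>/" so that 192.168.1.4 does not match 192.168.1.40.
    grep -Fq "inet ${vip}/"
}
```

On a router this might be run as `ip -4 addr show dev br-lan | holds_vip 192.168.1.4 && echo active`. Running it on both routers before and after rebooting the active one shows whether the address actually moves.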
This step is optional: keepalived will fail over the IP address with or without conntrackd. However, NAT relies on tracking connection state in a table that links external ip:port with internal ip:port (per protocol, TCP or UDP), so established connections might break on failover to the backup OpenWrt instance, because the backup will not have the table entries needed to map their packets back to the right internal host. New connections (such as application-level reconnects) will work just fine.
Below is a simple config file for conntrackd. It is advisable to rename the original config in /etc/conntrackd/ first and create a brand-new conntrackd.conf, so that you can go back to the old file for reference.
Sync {
    Mode FTFW {
        DisableExternalCache Off
        CommitTimeout 1800
        PurgeTimeout 5
    }
    UDP {
        IPv4_address "ip addr of host router"
        IPv4_Destination_Address "ip addr of partner router"
        Port 3780
        Interface eth*
        SndSocketBuffer 1249280
        RcvSocketBuffer 1249280
        Checksum on
    }
}

General {
    Nice -20
    HashSize 32768
    HashLimit 131072
    LogFile on
    Syslog on
    LockFile /var/lock/conntrack.lock
    UNIX {
        Path /var/run/conntrackd.ctl
        Backlog 20
    }
    NetlinkBufferSize 2097152
    NetlinkBufferSizeMaxGrowth 8388608
    Filter From Userspace {
        Protocol Accept {
            TCP
            UDP
            ICMP # This requires a Linux kernel >= 2.6.31
        }
        Address Ignore {
            IPv4_address 127.0.0.1 # loopback
        }
    }
}
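With the external cache enabled, as in the config above, the synced entries are only injected into the kernel table when conntrackd is told to commit them, so the daemon needs to hear about VRRP state changes. The sketch below follows the command sequence of the primary-backup.sh example shipped with conntrack-tools. The function names and the idea of hooking it into keepalived through a `notify` script are assumptions, not part of the original howto.

```shell
#!/bin/sh
# Sketch of a keepalived notify hook for conntrackd.
# Assumption: keepalived calls the script with the new state
# (MASTER, BACKUP or FAULT) as its third argument.

# Print the conntrackd flags to run for a given VRRP state, in order.
conntrackd_flags_for() {
    case "$1" in
        MASTER) printf '%s\n' -c -f -R -B ;;  # commit cache, flush, resync from kernel, bulk send
        BACKUP) printf '%s\n' -t -n ;;        # shorten kernel timers, request resync
        FAULT)  printf '%s\n' -t ;;           # shorten kernel timers only
        *)      return 1 ;;
    esac
}

# Run those flags against the local daemon, logging any failure.
apply_state() {
    flags=$(conntrackd_flags_for "$1") || return 1
    for flag in $flags; do
        conntrackd "$flag" || logger -t ha-notify "conntrackd $flag failed"
    done
}
```

A script built from this would end with `apply_state "$3"` and be referenced from the `vrrp_sync_group` block with a `notify` line pointing at it. Test it by hand on a spare pair of routers before relying on it.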
Run a few simple commands to verify functionality:
Show statistics: conntrackd -s
Request a resync from the other node: conntrackd -n
You'll want DHCP (dnsmasq) to hand out 192.168.1.4 (the LAN VIP address) to hosts on the LAN, both as their gateway and as their DNS server. Here's an excerpt from /etc/config/dhcp that instructs dnsmasq to do that.
...
config dhcp 'lan'
    ...
    option force '1'
    list dhcp_option '3,192.168.1.4'
    list dhcp_option '6,192.168.1.4'
    ...
option force '1' is needed for dnsmasq to not deactivate when it sees the other dhcp server. dhcp_option 3 is gateway, dhcp_option 6 is DNS.
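The same DHCP settings can be applied with uci rather than by editing the file. This is a minimal sketch, assuming the DHCP section is named `lan` as in the excerpt; `dhcp_vip_batch` is a hypothetical helper that only prints commands for `uci batch`.

```shell
#!/bin/sh
# dhcp_vip_batch VIP
# Print `uci batch` commands that make dnsmasq hand out VIP as both
# gateway (DHCP option 3) and DNS server (DHCP option 6).
# Note: add_list appends, so running the result twice duplicates entries.
dhcp_vip_batch() {
    vip="$1"
    cat <<EOF
set dhcp.lan.force='1'
add_list dhcp.lan.dhcp_option='3,${vip}'
add_list dhcp.lan.dhcp_option='6,${vip}'
commit dhcp
EOF
}
```

Usage on each router could look like `dhcp_vip_batch 192.168.1.4 | uci batch`, followed by `/etc/init.d/dnsmasq restart`.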
Add the following directories to /etc/sysupgrade.conf (this can be done from LuCI as well):
...
/etc/keepalived/
/etc/conntrackd/
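If you script the setup, the entries can be appended without creating duplicates on repeated runs. A small sketch follows; `keep_on_upgrade` is a hypothetical helper name.

```shell
#!/bin/sh
# keep_on_upgrade CONF ENTRY
# Append ENTRY to CONF unless that exact line is already present.
keep_on_upgrade() {
    conf="$1"; entry="$2"
    grep -qxF -- "$entry" "$conf" 2>/dev/null || printf '%s\n' "$entry" >> "$conf"
}
```

For example: `keep_on_upgrade /etc/sysupgrade.conf /etc/keepalived/` and `keep_on_upgrade /etc/sysupgrade.conf /etc/conntrackd/`.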
TODO(risk): restarting keepalived with logread -f open, pulling cables with ssh / telnet / http sessions open, forcing dhcp renewal with tcpdump running, ensure