This documentation covers the ip utility from IPROUTE2. The utility was written by Alexey N. Kuznetsov, who also wrote the IPv4 and IPv6 routing code for Linux 2.2. It is the utility he uses for manipulating the Linux 2.2-2.6 network interface code.
We will begin by explaining where to obtain the utility collection and how to compile it. After it is compiled we will cover the utilities that are created and where on the system they should reside. This includes all of the utilities in the IPROUTE2 suite.
Then we will begin extensive coverage of the ip command with documentation of usage and examples. This section draws heavily upon Alexey's own documentation of the command, with additional discussion and examples. Some usages of the command, such as multicast and IPv6-specific usage, are deferred for now, but we will extend this document with that coverage as time goes on. While this material would normally be found in man pages, no man pages currently exist for the ip command, and Alexey's own current documentation is only available in LaTeX format. With Alexey's permission we have edited and expanded the LaTeX documentation into the sections found here. If there are errors in these sections they probably belong to Matthew's translation and should be addressed to him first.
To tie together what we have learned about the ip utility we will list a few working examples. These include several longer script examples from Alexey along with some everyday uses of the utility. The Table of Contents then lists a set of real-life examples that are collected here.
Obtaining & Compiling IPROUTE2
The ip utility is just one of the utilities in the IPROUTE2 utility package from Alexey. The primary FTP site was located in Russia at ftp://ftp.inr.ac.ru/ip-routing/ but is no longer running. The most complete mirror is located at http://www.linuxgrill.com/anonymous/iproute2/ with the newest OSDL source code in the http://www.linuxgrill.com/anonymous/iproute2/NEW-OSDL/ directory. We will assume that you have obtained the latest package, usually called iproute2-current and symlinked to the latest dated version. The version we primarily cover here is the 1999-06-30 version of IPROUTE2.
Once the utility has been obtained you need to unpack it into whatever directory you use for compiling source code. The default is to use /usr/src. When you have the package untarred you can enter the directory and just type make. You must have the kernel source code that was used to compile your current running kernel located in /usr/src/linux. You do want to compile a version of your own unless you are using a distribution that includes the utility and you have not rebuilt your kernel. Since one of the best tuning and security measures you can perform on your system is to obtain and compile your own specific kernel, you will want to compile this utility too, as it is the single most important utility in the IP configuration of your system.
After you have typed make the utility suite will compile. Then we have to install the various parts. There is no install target in the makefile. All of the utilities in this package should be installed into the /sbin directory so that they are available even before your /usr directory is mounted. The package also contains an etc/iproute2 directory that holds sample definition files. If you do not have an /etc/iproute2/ directory on your system then create one and copy the contents of the package directory into it. If an /etc/iproute2/ directory already exists and you do not know what it is being used for, find out whether the files in that directory have some meaning to the system you are running. If not, then replacing them with the files in the package directory will not hurt.
In a nutshell we want to perform the following steps:
1. Compile the utilities by typing make
2. Check /etc/iproute2/ with ls -l /etc/iproute2
3. If needed create /etc/iproute2/ with mkdir /etc/iproute2/
4. Populate it with cp ./etc/iproute2/* /etc/iproute2/
5. Change into the ip directory with cd ip
6. cp ifcfg ip routef routel rtacct rtmon rtpr /sbin
7. Change into the tc directory with cd ../tc
8. cp tc /sbin
This will compile the utility and copy the configuration files and the executables into the appropriate directories. We should now be able to execute the ip utility from anywhere on the system by typing ip. To test whether this worked, type ip addr and you should get a list of the interfaces and addresses on your system.
IP Command Set
In this section we will present a comprehensive description of the ip utility from Alexey Kuznetsov's IPROUTE2 package. We will start by going through most of the ip command in extreme detail. We will cover the link, addr, route, rule, neigh, tunnel, and monitor parts of the ip command. The multicast sections will be covered in a "to be added later" section on IPv6 and multicasting.
We will first go through all of the command syntax of the ip command. This is because, as of February 2000, there are no man pages for ip and the documentation is only available in LaTeX format. If you have read the ip-cref.tex document that Alexey wrote, as included in the 1999-06-30 distribution of IPROUTE2, then feel free to just skim through most of this section. Matthew has extended the discussion and examples somewhat but the core is taken from ip-cref.tex. If you have any questions or comments about the examples or statements in this section please direct them to Matthew. Note also that by the time you read this the ip command may have changed for 2.3/2.4. As it changes we will attempt to keep this document current.
IP Global Command Syntax
The generic form of the ip command is
ip [ OPTIONS ] OBJECT [ COMMAND [ ARGUMENTS ]]
OPTIONS:
OPTIONS is a multivalued set of modifiers that affect the general behaviour and output of the ip utility. All options begin with the "-" character and may be used in either long or abbreviated form. Currently the following options are available:
-V, -Version --- print the version of the ip utility and exit.
-s, -stats, -statistics --- output more information.
This option may be repeated to increase the verbosity level of the output. As a rule the additional information is device or function statistics or values. In many cases the values output should be read in the same way as output from the /proc/ directory, where the name of a value is not directly related to the value itself. We will see this later when we run this option with different network device drivers.
-f, -family {inet, inet6, link} --- enforce which protocol family to use.
If this option is not present, the protocol family to use is guessed from the other command line arguments. If the rest of the command line does not provide sufficient information to guess a protocol family, the ip command falls back to a default family of inet in the case of network protocols, or to any. Link is a special family identifier meaning that no networking protocol is involved. There are several shortcuts for this option:
-4 --- shortcut for -family inet.
-6 --- shortcut for -family inet6.
-0 --- shortcut for -family link.
-o, -oneline --- format the output records as single lines by replacing any line feeds with the "\" character.
This option provides a convenient method for sending the command output through a pipe, for example when you want to count the number of output records with wc or grep through the output. As of 1999-06-30 the IPROUTE2 utility package includes the trivial script rtpr to convert the output back to the original readable form.
-r, -resolve --- use system name resolution to output DNS names.
Do not use this option if you are reporting bugs with the ip utility or asking for usage advice. ip itself never uses DNS to resolve names to addresses. This option exists for the administrator's convenience only.
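As a rough sketch of why the -oneline form is convenient in a pipe, the script below counts and filters records and then unfolds them again. The two records are illustrative sample text written for this example, not captured output, and the helper names count_records, count_matching and unfold are hypothetical. The unfold helper is only a stand-in for what the text says rtpr does.

```shell
#!/bin/sh
# Illustrative -oneline style records (sample text, not captured output).
sample='2: dummy: <BROADCAST,NOARP> mtu 1500 qdisc noop \    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100 \    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff'

# Each record is exactly one line, so wc -l counts records.
count_records() { printf '%s\n' "$1" | wc -l | tr -d ' '; }

# grep -c counts the records that contain the given text.
count_matching() { printf '%s\n' "$1" | grep -c "$2"; }

# Turn every backslash back into a line feed (stand-in for rtpr).
unfold() { printf '%s\n' "$1" | tr '\\' '\n'; }

count_records "$sample"
count_matching "$sample" 'UP>'
unfold "$sample"
```

On a live system the same pipes would be fed from the real command, for instance ip -o link | wc -l.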
OBJECT:
OBJECT is the object type that you wish to operate on or obtain information about. The object types understood by the current ip utility are link, address, neighbour, route, rule, maddress, mroute, and tunnel.
link --- physical or logical network device.
address --- protocol (IPv4 or IPv6) address on a device.
neighbour --- ARP or NDISC cache entry.
route --- routing table entry.
rule --- rule in routing policy database.
maddress --- multicast address.
mroute --- multicast routing cache entry.
tunnel --- tunnel over IP.
The names of all of the objects may be written in full or in abbreviated form. For example, address may be abbreviated as addr or just a. However, if you use these commands within scripts you should make it a habit to always use the full name. Abbreviations are easy to type on the command line but make the logic within scripts hard to understand. Since you may not be the only person who ever has to deal with your scripts, you should strive to make them as complete as possible.
COMMAND:
COMMAND specifies the action to perform on the object. The set of possible actions depends on the object type. Typically it is possible to add, delete, and show (list) the object(s), but some objects do not allow all of these operations and many have additional actions and commands. Note that the command syntax help, which is available for all objects, prints out the full list of available commands and argument syntax conventions. If no command is given a default command is assumed. The default command is usually show (list), but if the objects of the class cannot be listed then the default is to print out the command syntax help.
ARGUMENTS:
ARGUMENTS is the list of command options specific to the command. The arguments depend on the command and the object. There are two types of arguments:
--- flags, each consisting of a single keyword
--- parameters, each consisting of a keyword followed by a value
Each command has a default parameter whose keyword may be omitted. For example, dev is the default parameter for the ip link command, so ip link list eth0 is equivalent to ip link list dev eth0. Within all the command descriptions below we mark default parameters with the marker (default).
As we mentioned above for the names of objects, all keywords may be abbreviated to their first letter or first few unique letters. These shortcuts are convenient when ip is used interactively, but they are not recommended for use in scripts, and please do not use them when reporting bugs or asking for help. Officially allowed abbreviations are listed along with the first mention of the command.
Error Conditions
The ip command most commonly fails for the following reasons:
* Wrong command line syntax
This is often due to using an unknown keyword, a wrongly formatted IP address, a wrong keyword argument for the command, etc. In this case the ip command exits without performing any actions and prints out an error message containing information about the reason for failure. In some cases it prints out the command syntax help.
* The arguments did not pass self-consistency verification
* ip failed to compile a kernel request from the arguments due to insufficient user-provided information
* Kernel returned an error to a syscall. In this case ip prints the error message as it was output from perror(3), prefixed with a comment and the syscall identifier.
* Kernel returned an error to a RTNETLINK request. In this case ip prints the error message as it was output from perror(3), prefixed with "RTNETLINK answers".
Note that all ip command operations are atomic. This means that if the ip command fails it does not change anything in the system. One harmful exception is the ip link command, which may change only part of the device parameters given on the command line. We will mention this again in the section on ip link usage and recommend that all ip link actions be performed individually. This is actually the preferred way to use the ip command in general. If you need to repeat the command many times, use a script or a script loop, since any generated error message can then be associated with the ip command action that caused it.
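One way to follow this advice is sketched below: a small wrapper runs each change as its own command and names the step that failed. The helper run_step, the variables IP and DEV and the chosen settings are all assumptions made for this illustration. With IP=echo, which is the default here, the commands are only printed, so the sketch is safe to run without privileges.

```shell
#!/bin/sh
# run_step executes one command and names it on stderr if it fails.
failed=0
run_step() {
    if ! "$@"; then
        echo "FAILED: $*" >&2
        failed=$((failed + 1))
    fi
}

# DEV and the settings below are hypothetical. With IP=echo (the default)
# the commands are only printed; set IP=ip and run as root to apply them.
IP=${IP:-echo}
DEV=dummy

# One change per invocation, so an error points at exactly one action.
run_step $IP link set dev "$DEV" mtu 1500
run_step $IP link set dev "$DEV" multicast on
run_step $IP link set dev "$DEV" up
echo "steps failed: $failed"
```

Because every change is a separate invocation, a failure leaves no doubt about which parameters were applied and which were not.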
It is difficult to list all possible error messages, especially the syntax errors. As a rule their meaning should be clear from the context of the command that was issued. For example, if we issue the command ip link sub eth0, with the obvious misspelling of set, we get the error message 'Command "sub" is unknown, try "ip link help"', which should prompt us to check our command syntax.
The ip command needs several facilities to be present in order to perform its functions. It talks to the kernel through the NETLINK interface, which is turned on by the NETLINK options enabled when the kernel is compiled. If the ip command does not work or you get an error message, you may not have the needed functions configured, or the kernel you are running is not the one you compiled. The most common mistakes are:
* NETLINK is not configured in the kernel. The error message is
"Cannot open netlink socket Invalid value"
* RTNETLINK is not configured in the kernel.
In this case one of the following messages may be printed, depending on the actual command issued:
"Cannot talk to rtnetlink Connection refused"
"Cannot send dump request Connection refused"
ip link - network device configuration
A link refers to a network device. The ip link object and the corresponding command set allow viewing and manipulating the state of network devices. There are just two commands for the link object, set and show.
ip link set --- change device attributes.
Abbreviations: set, s
Warning
You can request multiple parameter changes with ip link. If you request multiple parameter changes and any ONE change fails, then ip aborts immediately after the failure. The parameter changes made before the failure have already completed and are not backed out on abort. This is the only case where using the ip command can leave your system in an unpredictable state. The solution is to avoid changing multiple parameters with one ip link set call. Use as many individual ip link set commands as necessary to perform the actions you desire.
Arguments:
* dev NAME (default) --- NAME specifies the network device to operate on
* up / down --- change the state of the device to UP or to DOWN
* arp on / arp off --- change NOARP flag status on the device
Note that this operation is not allowed if the device is already in the UP state. However, since neither the ip utility nor the kernel checks for this condition, you can get very unpredictable results by changing the flag while the device is running. It is better to set the device down and then issue this command.
* multicast on / multicast off --- change MULTICAST flag on the device.
* dynamic on / dynamic off --- change DYNAMIC flag on the device.
* name NAME --- change name of the device.
Note that this operation is not recommended if the device is running or has some addresses already configured. You can break your system's security and screw up other networking daemons and programs by changing the device name while the device is running or has addressing assigned.
* txqueuelen NUMBER / txqlen NUMBER --- change transmit queue length of the device
* mtu NUMBER --- change MTU of the device.
* address LLADDRESS --- change station address of the interface.
* broadcast LLADDRESS, brd LLADDRESS or peer LLADDRESS --- change link layer broadcast address, or peer address in the case of a POINTOPOINT interface
Note that for most physical network devices (Ethernet, TokenRing, etc.) changing the link layer broadcast address will break networking. Do not use this argument if you do not understand what this operation really does.
* The ip command does not allow changing the PROMISC or ALLMULTI flags, as these flags are considered obsolete and should not be changed administratively.
Examples:
ip link set dummy address 00:00:00:00:00:01 --- change station address of the interface dummy.
ip link set dummy up --- start the interface dummy.
ip link show --- look at device attributes.
Abbreviations: show, list, lst, sh, ls, l
Arguments:
* dev NAME (default) --- NAME specifies network device to show.
If this argument is omitted, the command lists all the devices.
* up --- display only running interfaces.
Output:
kuznet@alisa:~ $ ip link ls dummy
2: dummy: <BROADCAST,NOARP> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
The number followed by a colon is the interface index or ifindex. This number uniquely identifies the interface. If you look at the output from cat /proc/net/dev you will see that the network devices are listed in the same order as the numbering you see here. After the ifindex is the interface name (eth0, sit0 etc.). The interface name is also unique at any given moment. However, an interface may disappear from the list, such as when the corresponding driver module is unloaded, and another interface with the same name may be created later. Additionally, the system administrator may change the device's name with the ip link set DEVICE name NEWNAME command.
The interface name may also have another name or the keyword NONE appended after an "@" sign. This signifies that this device is bound to another device in a master/slave device relationship. Packets sent through this device are then encapsulated and forwarded on via the master device. If the name is NONE, then the master device is unknown.
After the interface name and flags we see the interface mtu (maximum transmission unit), which determines the maximum size of data that can be sent as a single packet over this interface.
The qdisc (queuing discipline) shows which queuing algorithm is used on the interface. In particular the keyword noqueue means that this interface does not queue anything, and the keyword noop indicates that the interface is in blackhole mode, in which all of the packets sent to it are immediately discarded.
The qlen indicates the default transmit queue length of the device, measured in packets.
Between the interface name and the mtu is a section within angle brackets. This is where the interface flags are summarized. The most relevant flags are as follows:
UP --- this device is turned on, ready to accept packets for transmission onto the network, and it may receive packets from other nodes on the network.
LOOPBACK --- the interface does not communicate with other hosts. All the packets which are sent through it are returned to the sender, and nothing but bounced-back packets can be received.
BROADCAST --- this device has the facility to send packets to all other hosts sharing the same physical link. Example: Ethernet
POINTOPOINT --- the network has only two ends with two nodes attached. All the packets sent to the link will reach the peer, and all packets received originate from the peer.
If none of LOOPBACK, BROADCAST and POINTOPOINT is set, the interface is assumed to be an NBMA (Non-Broadcast Multi-Access) link. NBMA is the most generic type of device and also the most complicated, because a host attached to an NBMA link cannot send information to any other host without additional manually provided configuration information.
MULTICAST --- an advisory flag noting that the interface is aware of multicasting. Broadcasting is a particular case of multicasting where the multicast group contains all of the nodes on the link as members. Note that software must NOT interpret the absence of this flag as the inability of the interface to multicast. Any POINTOPOINT or BROADCAST link is multicasting by definition, because we have direct access to all the link neighbours and thus to any particular group of them. The use of high bandwidth multicast transfers is not recommended on broadcast-only networks due to the high cost of the transmission, but such use is not strictly prohibited.
PROMISC --- the device listens to all of the traffic on the link and feeds it to the kernel. This includes every packet on the network that passes our transceiver. Usually this mode exists only on broadcast links and is used by bridges and network monitoring devices.
ALLMULTI --- the device receives all multicast packets wandering on the link. This mode is used by multicast routers.
NOARP --- this flag is different from the other flags. It has no invariant value and its interpretation depends on the network protocols involved. As a rule it indicates that the device does not need any address resolution and that the software or hardware knows how to deliver packets without any help from the protocol stacks.
DYNAMIC --- an advisory flag marking this interface as dynamically created and destroyed.
SLAVE --- this interface is bonded to other interfaces in order to share link capacities.
Other flags do exist and can be seen within the angle brackets, but they are either obsolete (NOTRAILERS), not implemented (DEBUG), or specific to certain devices (MASTER, AUTOMEDIA and PORTSEL). We will not discuss them here. Additionally, the values of the PROMISC and ALLMULTI flags as shown by the ifconfig utility and by the ip utility are different. The ip link list command provides the current true device state, whereas ifconfig shows the flag state which was set through ifconfig itself.
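The flag section is easy to pick apart in a script. The sketch below is only an illustration: the helper names link_flags, has_flag and link_kind are hypothetical, and the input is the first line of a record copied from the sample output in this document. It extracts the flags and applies the rule described above that a link with none of LOOPBACK, BROADCAST or POINTOPOINT set is treated as NBMA.

```shell
#!/bin/sh
# link_flags prints the flags found between < and > on the first line of
# an "ip link" record, one flag per line.
link_flags() {
    printf '%s\n' "$1" | sed -n 's/^[^<]*<\([^>]*\)>.*$/\1/p' | tr ',' '\n'
}

# has_flag succeeds when flag $2 is among the flags of record line $1.
has_flag() {
    link_flags "$1" | grep -qx "$2"
}

# link_kind reports the first of LOOPBACK, BROADCAST or POINTOPOINT that
# is set, and NBMA when none of them is.
link_kind() {
    for flag in LOOPBACK BROADCAST POINTOPOINT; do
        if has_flag "$1" "$flag"; then
            echo "$flag"
            return
        fi
    done
    echo NBMA
}

line='3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100'
link_flags "$line"
link_kind "$line"
```

Matching whole flag names with grep -x avoids false hits, for example mistaking part of a longer flag name for UP.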
The second line of the output from the example contains information about the link layer addresses associated with the device. The first word (ether, sit) defines the interface hardware type, which determines the format and semantics of the addresses and thus is logically part of the address itself. The default format of station and broadcast addresses (or peer addresses for pointopoint links) is a sequence of hexadecimal bytes separated by colons. However, some link types have their own natural address formats which are used in the presentation. For example, the addresses of IP tunnels are printed as dotted-quad IP addresses. While NBMA links have no well-defined broadcast or peer address, this field may contain useful information such as the address of a broadcast relay or the address of an ARP server. Multicast addresses are not shown by this command; see the ip maddr list output.
When given the option -statistics, ip will print the interface statistics as additional information in the listing. Note that you can give this option multiple times, with each repetition increasing the verbosity of the output.
kuznet@alisa:~ $ ip -s link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2449949362 2786187  0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    178558497  1783945  332     0       332     35172
The RX and TX lines summarize receiver and transmitter statistics. The information output breaks down into:
bytes --- total number of bytes received or transmitted on the interface.
This number wraps when the maximum value of the natural data type on the architecture is exceeded. To obtain correct long term data, these statistics must be monitored continuously, which requires a user level daemon that samples the output periodically.
packets --- total number of packets received or transmitted on the interface.
errors --- total number of receiver or transmitter errors.
dropped --- total number of packets dropped because of lack of resources.
overrun --- total number of receiver overruns resulting in packet drops. As a rule, if the interface is overrun you have a serious problem: either something is wrong within the kernel or your machine is too slow to handle the speed of this interface.
mcast --- total number of received multicast packets. This statistic is supported only on certain devices.
carrier --- total number of link media failures such as those due to lost carrier.
collsns --- total number of collision events on Ethernet-like media. This number has different interpretations on other link types.
compressed --- total number of compressed packets. It is available only for links using VJ header compression.
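A sampling daemon of the kind mentioned for the bytes counter has to cope with the wrap when it subtracts one sample from the next. The function below is a minimal sketch of that arithmetic, not part of IPROUTE2. The name counter_delta is hypothetical, the 32-bit default width is an assumption about the counter, and the result is only right if the counter wrapped at most once between the two samples.

```shell
#!/bin/sh
# counter_delta PREVIOUS CURRENT [BITS]
# Difference between two samples of a counter that wraps at 2^BITS.
# BITS defaults to 32 (an assumption) and must stay below 63 so that the
# shell arithmetic itself does not overflow.
counter_delta() {
    bits=${3:-32}
    modulus=$((1 << bits))
    echo $(( ($2 - $1 + modulus) % modulus ))
}

# Two hypothetical pairs of samples.
counter_delta 4294967000 200     # wrapped in between: prints 496
counter_delta 1000 1500          # no wrap: prints 500
```

Sampling often enough that at most one wrap can occur between readings is what keeps this calculation valid.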
When you issue the -statistics option more than once you get additional output depending on the statistics supported by the device itself, as in the following example with Ethernet:
kuznet@alisa:~ $ ip -s -s link ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast
    2449949362 2786187  0       0       0       0
    RX errors: length   crc     frame   fifo    missed
               0        0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    178558497  1783945  332     0       332     35172
    TX errors: aborted  fifo    window  heartbeat
               0        0       0       332
In this case the error names are pure Ethernetisms. Other devices may have non-zero fields in these positions, but the headers are generated independently of the device responses. It is up to the device driver to send more appropriate error messages to the system logging facility, as is done by the TokenRing driver.
ip address - protocol address management
Abbreviations: address, addr, a
Commands: add, delete, flush, show (list)
The address refers to a protocol (IPv4 or IPv6) address attached to a network device. Each device must have at least one address in order to use the corresponding protocol. It is possible to have several different addresses attached to one device. These addresses are not discriminated within the protocol structure, so the term alias is not quite appropriate for such multiple addresses and we will not refer to this situation in those terms.
The ip addr command allows you to look at the addresses on an interface and their properties. You can add new addresses and delete old ones without regard to any ordering. Later on we will discuss the concept of primary and secondary addresses as applied to Linux.
ip address add --- add new protocol address.
Abbreviations: add, a
Arguments:
dev NAME --- name of the device to which we add the address
local ADDRESS (default) --- address of the interface.
The format of the address depends on the protocol. IPv4 uses dotted quad notation and IPv6 uses a sequence of hexadecimal halfwords separated by colons. The ADDRESS may be followed by a slash and a decimal number, which encodes the network prefix (netmask) length in CIDR notation. If no prefix length is specified then the command assumes a host address (/32 mask).
peer ADDRESS --- address of remote endpoint for pointopoint interfaces. Again, the ADDRESS may be followed by a slash and a decimal number encoding the network prefix length. If a peer address is specified then the local address cannot have a network prefix length, as the network prefix is associated with the peer rather than with the local address. In other words, when both peer and local addresses are specified, a netmask can only be assigned to the peer address.
broadcast ADDRESS --- broadcast address on the interface.
The special symbols "+" and "-" can be used instead of specifying the broadcast address. In this case the broadcast address is derived either by setting all of the interface host bits to one (+) or by setting all of the interface host bits to zero (-). In most modern implementations of IPv4 networking you will want to use the (+) setting. See the ipup init script in Chapter 15. Unlike ifconfig, the ip command does not set a broadcast address unless explicitly requested.
label NAME --- each address may be tagged with a label string.
In order to preserve compatibility with Linux-2.0 net aliases, this string must either coincide with the name of the device or be prefixed with the device name followed by a colon (for example eth0:duh).
scope SCOPE_VALUE --- scope of the area within which this address is valid.
The available scopes are listed in the file /etc/iproute2/rt_scopes.
The predefined scope values are:
global --- the address is globally valid.
site --- (IPv6 only) address is site local, valid only inside this site.
link --- the address is link local, valid only on this device.
host --- the address is valid only inside this host.
Examples:
ip addr add 127.0.0.1/8 dev lo brd + scope host
--- adds the usual loopback address to the loopback device. The device must be enabled before this address will show up.
ip addr add 10.0.0.1/24 brd + dev eth0
--- adds address 10.0.0.1 with prefix length 24 (netmask 255.255.255.0) and standard broadcast to interface eth0
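To make the meaning of brd + and brd - in these examples concrete, the sketch below works out the broadcast address that each symbol stands for. It is illustrative arithmetic only and not code taken from ip. The helper names ip_to_int, int_to_ip and broadcast_of are hypothetical, and the sketch handles IPv4 only.

```shell
#!/bin/sh
# ip_to_int A.B.C.D -> the address as a 32-bit integer
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# int_to_ip N -> dotted quad
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# broadcast_of ADDRESS PREFIXLEN SYMBOL
# SYMBOL "+" sets all host bits to one, "-" sets all host bits to zero.
broadcast_of() {
    addr=$(ip_to_int "$1")
    hostmask=$(( (1 << (32 - $2)) - 1 ))
    case $3 in
        +) int_to_ip $(( addr | hostmask )) ;;
        -) int_to_ip $(( addr & ~hostmask & 0xFFFFFFFF )) ;;
    esac
}

broadcast_of 10.0.0.1 24 +    # prints 10.0.0.255
broadcast_of 10.0.0.1 24 -    # prints 10.0.0.0
```

For the 10.0.0.1/24 example the + form therefore stands for 10.0.0.255, the usual choice on modern IPv4 networks.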
ip address delete --- delete protocol address.
Abbreviations: delete, del, d
Arguments:
The arguments coincide with the arguments of ip addr add. The device name is a required argument; the rest are optional. If no other arguments are given, the first address listed is deleted.
Examples:
ip addr del 127.0.0.1/8 dev lo
--- deletes the loopback address from the loopback device.
Alexey states:
"It would be better not to try to repeat this experiment 8-}"
Delete all IPv4 addresses on interface eth0:
while ip -f inet addr del dev eth0; do
  : nothing
done
Another method to disable IP on an interface, using ip addr flush, is discussed later.
ip address show --- look at protocol addresses.
Abbreviations: show, list, lst, sh, ls, l
Arguments:
dev NAME (default) --- name of the device.
scope SCOPE_VAL --- list only addresses with this scope.
to PREFIX --- list only addresses matching this prefix.
label PATTERN --- list only addresses with labels matching the PATTERN.
PATTERN is a shell-style (glob) pattern.
dynamic / permanent --- (IPv6 only) list only addresses installed due to stateless address configuration, or list only the permanent (not dynamic) addresses.
tentative --- (IPv6 only) list only addresses which did not pass duplicate address detection.
deprecated --- (IPv6 only) list only deprecated addresses.
primary / secondary --- list only primary (or secondary) addresses.
Example:
kuznet@alisa:~ $ ip addr ls eth0
3: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc cbq qlen 100
    link/ether 00:a0:cc:66:18:78 brd ff:ff:ff:ff:ff:ff
    inet 193.233.7.90/24 brd 193.233.7.255 scope global eth0
    inet6 3ffe:2400:0:1:2a0:ccff:fe66:1878/64 scope global dynamic
       valid_lft forever preferred_lft 604746sec
    inet6 fe80::2a0:ccff:fe66:1878/10 scope link
The first two lines coincide with the output of ip link list, as it is only natural to interpret link layer addresses as being addresses of the protocol family AF_PACKET. The list of IPv4 and IPv6 addresses follows, accompanied by additional attributes such as scope value, flags, and address label. Address flags are set by the kernel and cannot be changed administratively. Currently the following flags are defined:
secondary --- this address is not used when selecting the default source address for outgoing packets. An IP address becomes secondary if another address within the same prefix (network) already exists. The first address within the prefix is primary and is the tag address for the group of all the secondary addresses. When the primary address is deleted all of the secondaries are purged too. See the examples for how this works in practice.
dynamic --- the address was created due to stateless autoconfiguration. In this case the output also contains information on the times for which the address remains valid. After the preferred lifetime (preferred_lft) expires the address is moved to the deprecated state, and after the valid lifetime (valid_lft) expires the address is finally invalidated.
deprecated --- the address is deprecated. It is still valid but cannot be used by newly created connections. See dynamic above.
tentative --- the address is not used because duplicate address detection is still not complete or has failed.
IP Interface Primary and Secondary Addressing:
To explain the actualrelationship between primary and secondary addresses we will run thefollowing experiment.
ip addr add 10.1.1.1/24dev dummy
ip addr add 10.1.1.2/24dev dummy
Now look at the output:
ip addr list dummy
3: dummy: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.1/24 scope global dummy
    inet 10.1.1.2/24 scope global secondary dummy
Now add some addresses that still fall within that network but carry different prefix lengths:
ip addr add 10.1.1.3/32 dev dummy
ip addr add 10.1.1.4/25 dev dummy
And run our list command:
ip addr list dummy
3: dummy: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.1/24 scope global dummy
    inet 10.1.1.3/32 scope global dummy
    inet 10.1.1.4/25 scope global dummy
    inet 10.1.1.2/24 scope global secondary dummy
And finally delete the primary address:
ip addr del 10.1.1.1/24 dev dummy
Run the list command:
ip addr list dummy
3: dummy: <BROADCAST,MULTICAST,NOARP> mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    inet 10.1.1.3/32 scope global dummy
    inet 10.1.1.4/25 scope global dummy
Note that the most important part of what we said above about secondary and primary addresses is the prefix (netmask) length. Even though technically you can consider the address 10.1.1.3 to belong within the network prefix 10.1.1.0/24, the actual prefix associated with the address is /32, so this address is treated independently of the initial primary address. If you are still uncertain why, sit down and calculate the networks and masks in the example above.
What we are showing here is that, unlike the behaviour of the horrid eth0:xx style aliasing in the 2.0 series kernels, multiple addresses on an interface are not necessarily related. So if you want to (and we will show an example in the howto section) you can enter all of your IP addresses without network masks and treat them completely independently.
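For readers who would rather not calculate the networks by hand, the experiment can be replayed with Python's ipaddress module. This is a sketch of the behaviour described in this section, not of the kernel implementation; the helper names classify and delete are invented for the illustration.

```python
import ipaddress


def classify(addresses):
    """Return (address, role) pairs, in the order the addresses were added."""
    seen = set()
    result = []
    for text in addresses:
        network = ipaddress.ip_interface(text).network
        # The first address in a given prefix is primary, later ones secondary.
        role = "secondary" if network in seen else "primary"
        seen.add(network)
        result.append((text, role))
    return result


def delete(addresses, victim):
    """Remove victim; if it was primary, purge its secondaries as well."""
    roles = dict(classify(addresses))
    network = ipaddress.ip_interface(victim).network
    if roles[victim] == "primary":
        return [a for a in addresses
                if ipaddress.ip_interface(a).network != network]
    return [a for a in addresses if a != victim]


if __name__ == "__main__":
    addrs = ["10.1.1.1/24", "10.1.1.2/24", "10.1.1.3/32", "10.1.1.4/25"]
    for address, role in classify(addrs):
        print(address, ipaddress.ip_interface(address).network, role)
    print(delete(addrs, "10.1.1.1/24"))
```

Only 10.1.1.2/24 shares its exact prefix (10.1.1.0/24) with an earlier address, so it is the only secondary. Deleting 10.1.1.1/24 therefore leaves just 10.1.1.3/32 and 10.1.1.4/25, as in the listing above.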
ip address flush --- flush protocol addresses.
Abbreviations: flush, f
Arguments:
This command flushes protocol addresses selected by some criteria. It has the same arguments as show. The major difference is that this command will not run if no arguments are given; otherwise you could delete all of your addresses by mistake. This command (and the other flush commands described below) is very dangerous. If you make a mistake the command does not ask or forgive but will cruelly purge all of your addresses. Be warned!
With the option -statistics the command becomes verbose and prints out the number of deleted addresses and the number of processing rounds made in order to flush the address list. If the -statistics option is given twice then ip addr flush also dumps all of the deleted addresses in the full format as described in the ip addr list section.
Examples:
Delete all the addresses from private network 10.0.0.0/8:
netadm@amber~ # ip -stat -stat addr flush to 10/8
2: dummy inet 10.7.7.7/16 brd 10.7.255.255 scope global dummy
3: eth0 inet 10.10.7.7/16 brd 10.10.255.255 scope global eth0
4: eth1 inet 10.8.7.7/16 brd 10.8.255.255 scope global eth1
*** Round 1, deleting 3 addresses ***
*** Flush is complete after 1 round ***
Another instructive example is deleting all IPv4 addresses from all Ethernet interfaces in the system:
netadm@amber~ # ip -4 addr flush label "eth*"
And the last example shows how to flush all the IPv6 addresses acquired by the host from stateless address autoconfiguration after enabling forwarding or disabling autoconfiguration.
netadm@amber~ # ip -6 addr flush dynamic
ip neighbour --- neighbour/arp table management.
Abbreviations: neighbour, neighbor, neigh, n
The neighbour table objects establish bindings between protocol addresses and link layer addresses for hosts sharing the same physical link. Neighbour object entries are organized into tables. The IPv4 neighbour object table is known under another name as the ARP table. These commands allow you to look at the neighbour table bindings and their properties, to add new neighbour table entries, and to delete old ones.
Arguments:
add, change, replace, delete, flush and show (list)
ip neighbour add --- add new neighbour entry
ip neighbour change --- change existing entry
ip neighbour replace --- add new or change existing entry
Abbreviations: add, a; change, chg; replace, repl.
These commands create new neighbour records or update existing ones.
to ADDRESS (default) --- protocol address of the neighbour. It is either an IPv4 or IPv6 address.
dev NAME --- the interface to which this neighbour is attached.
lladdr LLADDRESS --- link layer address of the neighbour. LLADDRESS can be null.
nud NUD_STATE --- state of the neighbour entry. nud is an abbreviation for "Neighbour Unreachability Detection". This state can take one of the following values:
permanent --- the neighbour entry is valid forever and can be removed only administratively.
noarp --- the neighbour entry is valid, no attempts to validate this entry will be made but it can be removed when its lifetime expires.
reachable --- the neighbour entry is valid until the reachability timeout expires.
stale --- the neighbour entry is valid, but suspicious. This option to ip neighbour does not change the neighbour state if the entry was valid and the address has not been changed by this command.
Examples:
ip neigh add 10.0.0.3 lladdr 0:0:0:0:0:1 dev eth0 nud perm
--- add permanent ARP entry for neighbour 10.0.0.3 on the device eth0.
ip neigh chg 10.0.0.3 dev eth0 nud reachable
--- change its state to reachable.
ip neighbour delete --- delete neighbour entry.
Abbreviations: delete, del, d.
This command invalidates a neighbour entry.
The arguments are the same as with ip neigh add, only lladdr and nud are ignored.
Example:
ip neigh del 10.0.0.3 dev eth0
--- invalidate ARP entry for neighbour 10.0.0.3 on the device eth0.
A deleted neighbour entry will not disappear from the tables immediately; if it is in use it cannot be deleted until the last client releases it, otherwise it will be destroyed during the next garbage collection.
WARNING!
Attempts to delete or manually change a noarp entry created by the kernel may result in unpredictable behaviour. More specifically, the kernel may start trying to resolve this address even on NOARP interfaces or change the address to multicast or broadcast.
ip neighbour show --- list neighbour entries.
Abbreviations: show, list, sh, ls.
This command displays neighbour tables.
Arguments:
to ADDRESS (default) --- prefix selecting neighbours to list.
dev NAME --- list only neighbours attached to this device.
unused --- list only neighbours which are not currently in use.
nud NUD_STATE --- list only neighbour entries in this state. NUD_STATE takes the values listed below after the example or the special value all, which means all the states. This option may occur more than once. If this option is absent, ip lists all the entries except for none and noarp.
Example:
kuznet@alisa~ $ ip neigh ls
:: dev lo lladdr 00:00:00:00:00:00 nud noarp
fe80::200:cff:fe76:3f85 dev eth0 lladdr 00:00:0c:76:3f:85 router nud stale
0.0.0.0 dev lo lladdr 00:00:00:00:00:00 nud noarp
193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 nud reachable
193.233.7.85 dev eth0 lladdr 00:e0:1e:63:39:00 nud stale
kuznet@alisa~ $
The first word of each line is the protocol address of the neighbour, followed by the device name. The rest of the line describes the contents of the neighbour entry identified by the pair (device, address).
lladdr is the link layer address of the neighbour.
nud is the state of "neighbour unreachability detection" for this entry. The full list of the possible nud states with minimal descriptions is:
none --- state of the neighbour is void.
incomplete --- the neighbour is in process of resolution.
reachable --- the neighbour is valid and apparently reachable.
stale --- the neighbour is valid, but is probably already unreachable, so the kernel will try to check it at the first transmission.
delay --- a packet has been sent to the stale neighbour, the kernel waits for confirmation.
probe --- delay timer expired, but no confirmation was received. The kernel has started to probe the neighbour with ARP/NDISC messages.
failed --- resolution has failed.
noarp --- the neighbour is valid, no attempts to check the entry will be made.
permanent --- it is a noarp entry, but only the administrator may remove the entry from the neighbour table.
Link layer address is valid in all the states except for none, failed and incomplete.
IPv6 neighbours can be marked with the additional flag router, which means that the neighbour introduced itself as an IPv6 router.
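The two rules stated in this subsection (which states carry a valid link layer address, and which states the show command hides when no nud option is given) can be written down as a small lookup. This is a sketch only: the names below are invented for the illustration and the sample entries use documentation addresses, not output captured from a real host.

```python
NUD_STATES = ["none", "incomplete", "reachable", "stale", "delay",
              "probe", "failed", "noarp", "permanent"]

# States in which the link layer address is not valid.
LLADDR_INVALID = {"none", "failed", "incomplete"}

# States skipped by the show command when no nud option is given.
HIDDEN_BY_DEFAULT = {"none", "noarp"}


def lladdr_is_valid(state):
    """True if an entry in this state holds a usable link layer address."""
    return state not in LLADDR_INVALID


def default_listing(entries):
    """Filter (address, state) pairs the way the default listing is described."""
    return [entry for entry in entries if entry[1] not in HIDDEN_BY_DEFAULT]


if __name__ == "__main__":
    # Hypothetical table contents for illustration.
    table = [("192.0.2.1", "reachable"), ("192.0.2.2", "noarp"),
             ("192.0.2.3", "failed"), ("192.0.2.4", "permanent")]
    for address, state in default_listing(table):
        print(address, state, "lladdr valid:", lladdr_is_valid(state))
```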
Option -statistics provides some usage statistics:
kuznet@alisa~ $ ip -s n ls 193.233.7.254
193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
    nud reachable
kuznet@alisa~ $
Here ref is the number of users of this entry, and used is a triplet of time intervals in seconds separated by slashes. The triplet of numbers is coded as {used/confirmed/updated}. In this example they show that:
The entry was used 12 seconds ago.
The entry was confirmed 13 seconds ago.
The entry was updated 20 seconds ago.
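If you need these counters in a script, the ref value and the used triplet can be pulled out with a regular expression. The following is a minimal sketch that assumes the line layout shown in the example above; the function name parse_neigh_stats is hypothetical and the pattern has not been checked against every version of the utility.

```python
import re


def parse_neigh_stats(line):
    """Extract ref and the used/confirmed/updated triplet from one entry."""
    match = re.search(r"ref (\d+) used (\d+)/(\d+)/(\d+)", line)
    if match is None:
        return None
    ref, used, confirmed, updated = map(int, match.groups())
    return {"ref": ref, "used": used,
            "confirmed": confirmed, "updated": updated}


if __name__ == "__main__":
    sample = ("193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 "
              "ref 5 used 12/13/20 nud reachable")
    print(parse_neigh_stats(sample))
```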
ip neighbour flush --- flush neighbour entries.
Abbreviations: flush, f.
This command flushes the neighbour tables. Entries may be selected to flush by various criteria.
This command has the same arguments as show. Note that it will not run when no arguments are given, and that the default neighbour states to be flushed do not include permanent or noarp.
With the option -statistics the command becomes verbose and prints out the number of deleted neighbours and the number of rounds made in flushing the neighbour table. If the option is given twice, ip neigh flush also dumps all the deleted neighbours in the format described in the previous subsection.
netadm@alisa~ # ip -s -s n f 193.233.7.254
193.233.7.254 dev eth0 lladdr 00:00:0c:76:3f:85 ref 5 used 12/13/20 \
    nud reachable
*** Round 1, deleting 1 entries ***
*** Flush is complete after 1 round ***
ip route - routing table management.
Abbreviations: route, ro, r.
This command manages the route entries within the kernel routing tables. The kernel routing tables keep information about protocol paths to other networked nodes.
Each route entry has a key consisting of the protocol prefix, which is the pairing of the network address and network mask length, and optionally the Type of Service (TOS) value. An IP packet matches the route if the highest bits of the packet's destination address are equal to the route prefix at least up to the prefix length and if the TOS of the route is zero or equal to the TOS of the packet.
If several routes match the packet, the following pruning rules are used to select the best one:
1. The longest matching prefix is selected, all shorter ones are dropped.
2. If the TOS of some route with the longest prefix is equal to the TOS of the packet then routes with a different TOS are dropped.
3. If no exact TOS match was found and routes with TOS=0 exist, the rest of the routes are pruned. Otherwise the route lookup fails.
4. If several routes remain after steps 1-3 have been tried then routes with the best preference value are selected.
5. If we still have several routes then the first of them is selected.
Note the ambiguity of action 5. Unfortunately, Linux historically allowed such a bizarre situation. The sense of the word "the first" depends on the literal order in which the routes were added to the routing table, and it is practically impossible to maintain a bundle of such routes in any such order.
For simplicity we will limit ourselves to the case wherein such a situation is impossible and routes are uniquely identified by the triplet of {prefix, tos, preference}. Using the ip command for route creation and manipulation makes it impossible to create such non-unique routes.
One useful exception to this rule is the default route on non-forwarding hosts. It is "officially" allowed to have several fallback routes in cases when several routers are present on directly connected networks. In this case Linux performs "dead gateway detection", as controlled by neighbour unreachability detection and references from the transport protocols, to select the working router; thus the ordering of the routes is not essential. However, in this specific case it is not recommended that you manually fiddle with default routes but instead use the Router Discovery protocol. Actually Linux IPv6 does not even allow user level applications access to default routes.
Of course the route selection steps above are not performed in exactly this sequence. The routing table in the kernel is kept in a data structure which allows achieving the final result with minimal cost. Without depending on any particular routing algorithm implemented in the kernel we can summarize the sequence above as: a route is identified by the triplet {prefix, tos, preference} key which uniquely locates the route in the routing table.
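The pruning rules listed earlier in this section can be followed literally in a short Python sketch. It mirrors the description only, not the kernel's data structures; the Route tuple and select_route function are invented for the illustration, and it assumes that the "best" preference is the numerically lowest one.

```python
import ipaddress
from collections import namedtuple

# Hypothetical record for the illustration: the key triplet plus a label.
Route = namedtuple("Route", "prefix tos preference name")


def select_route(routes, dst, tos):
    """Pick a route for destination dst and packet TOS by the pruning rules."""
    dst = ipaddress.ip_address(dst)
    # A route matches if its prefix covers dst and its TOS is zero
    # or equal to the TOS of the packet.
    cands = [r for r in routes
             if dst in ipaddress.ip_network(r.prefix) and r.tos in (0, tos)]
    if not cands:
        return None
    # 1. Keep only the longest matching prefix.
    longest = max(ipaddress.ip_network(r.prefix).prefixlen for r in cands)
    cands = [r for r in cands
             if ipaddress.ip_network(r.prefix).prefixlen == longest]
    # 2./3. Prefer an exact TOS match, otherwise fall back to TOS=0 routes.
    exact = [r for r in cands if r.tos == tos]
    cands = exact if exact else [r for r in cands if r.tos == 0]
    # 4. Keep the best preference (assumed here: the lowest number).
    best = min(r.preference for r in cands)
    cands = [r for r in cands if r.preference == best]
    # 5. If several routes remain, the first one wins.
    return cands[0]


if __name__ == "__main__":
    table = [
        Route("10.0.0.0/8", 0, 0, "a"),
        Route("10.1.0.0/16", 0, 10, "b"),
        Route("10.1.0.0/16", 0x10, 0, "c"),
        Route("10.1.0.0/16", 0, 5, "d"),
        Route("0.0.0.0/0", 0, 0, "default"),
    ]
    print(select_route(table, "10.1.2.3", 0x10).name)
    print(select_route(table, "10.1.2.3", 0).name)
    print(select_route(table, "192.0.2.1", 0x08).name)
```

In this made-up table a packet to 10.1.2.3 with TOS 0x10 takes route c (exact TOS match on the longest prefix), the same destination with TOS 0 takes route d (better preference than b), and an address outside 10/8 falls through to the default route.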
Route attributes: Each route key refers to a routing information record. The routing information record contains the data required to deliver IP packets, such as output device and next hop router, and additional optional attributes, such as path MTU or the preferred source address for communicating to that destination.
Route types: It is important that the set of required and optional attributes depends on the route type. The most important route type is a unicast route, which describes real paths to other hosts. As a general rule, common routing tables only contain unicast routes. However, other route types with different semantics do exist. The full list of types understood by the Linux 2.2 kernel is:
unicast --- the route entry describes real paths to the destinations covered by the route prefix.
unreachable --- these destinations are unreachable; packets are discarded and the ICMP message host unreachable (ICMP Type 3 Code 1) is generated. The local senders get error EHOSTUNREACH.
blackhole --- these destinations are unreachable; packets are silently discarded. The local senders get error EINVAL.
prohibit --- these destinations are unreachable; packets are discarded and the ICMP message communication administratively prohibited (ICMP Type 3 Code 13) is generated. The local senders get error EACCES.
local --- the destinations are assigned to this host, the packets are looped back and delivered locally.
broadcast --- the destinations are broadcast addresses, the packets are sent as link broadcasts.
throw --- special control route used together with policy rules. If a throw route is selected then lookup in this particular table is terminated, pretending that no route was found. Without any policy routing it is equivalent to the absence of the route in the routing table: the packets are dropped and the ICMP message net unreachable (ICMP Type 3 Code 0) is generated. The local senders get error ENETUNREACH.
nat --- special NAT route. Destinations covered by the prefix are considered as dummy (or external) addresses, which require translation to real (or internal) ones before forwarding. The addresses to translate to are selected with the attribute via.
anycast --- (not implemented) the destinations are anycast addresses assigned to this host. They are mainly equivalent to local addresses, with the difference that such addresses are invalid to be used as the source address of any packet.
multicast --- special type, used for multicast routing. It is not present in normal routing tables.
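The error behaviour of the rejecting route types listed above can be collected into a small table. The sketch below simply restates the list in Python using the standard errno module; the dictionary and function names are made up for the illustration.

```python
import errno

# route type -> (errno seen by local senders, (ICMP type, ICMP code) or None)
ROUTE_TYPE_ERRORS = {
    "unreachable": (errno.EHOSTUNREACH, (3, 1)),
    "blackhole": (errno.EINVAL, None),      # silently discarded, no ICMP
    "prohibit": (errno.EACCES, (3, 13)),
    "throw": (errno.ENETUNREACH, (3, 0)),   # when no policy routing is used
}


def describe(route_type):
    """Return a one-line summary of what senders see for this route type."""
    code, icmp = ROUTE_TYPE_ERRORS[route_type]
    name = errno.errorcode[code]
    if icmp is None:
        return "%s: local error %s, no ICMP message" % (route_type, name)
    return "%s: local error %s, ICMP type %d code %d" % (
        route_type, name, icmp[0], icmp[1])


if __name__ == "__main__":
    for route_type in ROUTE_TYPE_ERRORS:
        print(describe(route_type))
```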
Route tables: Linux can place routes within multiple routing tables identified by a number in the range from 1 to 255 or by a name taken from the file /etc/iproute2/rt_tables. By default all normal routes are inserted into the table main (ID 254) and the kernel uses only this table when calculating routes.
Actually another routing table always exists which is invisible but even more important. It is the local table (ID 255). This table consists of routes for local and broadcast addresses. The kernel maintains this table automatically; administrators should not modify it and do not even need to look at it in normal operation.
The multiple routing tables come into play when policy routing is used. In policy routing the routing table identifier effectively becomes one more parameter added to the key triplet {prefix, tos, preference}. Thus under policy routing the route is identified uniquely by {tableid, key triplet}. So you can have several identical routes in different tables that will not conflict in the way we mentioned above in the description of "the first" mechanism.
ip route add --- add new route
ip route change --- change route
ip route replace --- change route or add new one.
Abbreviations: add, a; change, chg; replace, repl.
Arguments:
to PREFIX or to TYPE PREFIX (default) --- destination prefix of the route. If TYPE is omitted, ip assumes type unicast. Other values of TYPE are listed above. PREFIX is an IPv4 or IPv6 address optionally followed by a slash and the prefix length. If the length of the prefix is missing, ip assumes a full-length host route. Also there is one special PREFIX --- default --- which is equivalent to IP 0/0 or to IPv6 ::/0.
tos TOS or dsfield TOS --- Type Of Service (TOS) key. This key has no associated mask, and the longest match is understood as follows: first compare the TOS of the route and of the packet; if they are not equal, the packet may still match a route with zero TOS. TOS is either an 8-bit hexadecimal number or an identifier from /etc/iproute2/rt_dsfield.
metric NUMBER or preference NUMBER --- preference value of the route. NUMBER is an arbitrary 32-bit number.
table TABLEID --- table to add this route to. TABLEID may be a number or a string from the file /etc/iproute2/rt_tables. If this parameter is omitted, ip assumes table main, with the exception of local, broadcast and nat routes, which are put into table local by default.
dev NAME --- the output device name.
via ADDRESS --- the address of the nexthop router. Actually, the sense of this field depends on the route type. For normal unicast routes it is either the true nexthop router or, if it is a direct route installed in BSD compatibility mode, it can be a local address of the interface. For NAT routes it is the first address of the block of translated IP destinations.
src ADDRESS --- the source address to prefer using when sending to the destinations covered by the route prefix. This address must be defined on a local machine interface. This will come into play when routes and rules are combined with the masquerade rules of the ipchains firewall we discuss later.
realm REALMID --- the realm which this route is assigned to. REALMID may be a number or a string from the file /etc/iproute2/rt_realms.
mtu MTU or mtu lock MTU --- the MTU along the path to the destination. If the modifier lock is not used, the MTU may be updated by the kernel due to Path MTU Discovery. If the modifier lock is used then no path MTU discovery will be performed and all the packets will be sent without the DF bit set for the IPv4 case or fragmented to the MTU for the IPv6 case.
window NUMBER --- the maximal advertised window for TCP to these destinations, measured in bytes. This parameter limits the maximal data bursts our TCP peers are allowed to send to us.
rtt NUMBER --- the initial RTT ("Round Trip Time") estimate. Actually, in Linux 2.2 and 2.0 it is not RTT but the initial TCP retransmission timeout. The kernel forgets it as soon as it receives the first valid ACK from the peer. Alas, this means that this attribute affects only the connection retry rate and is hence useless.
nexthop NEXTHOP --- nexthop of a multipath route. NEXTHOP is a complex value with its own syntax as follows:
via ADDRESS is the nexthop router.
dev NAME is the output device.
weight NUMBER is the weight of this element of the multipath route, reflecting its relative bandwidth or quality.
scope SCOPE_VAL --- scope of the destinations covered by the route prefix. SCOPE_VAL may be a number or a string from the file /etc/iproute2/rt_scopes. If this parameter is omitted, ip assumes scope global for all gatewayed unicast routes, scope link for direct unicast routes and broadcasts, and scope host for local routes.
protocol RTPROTO --- routing protocol identifier of this route. RTPROTO may be a number or a string from the file /etc/iproute2/rt_protos. If the routing protocol ID is not given ip assumes the protocol is boot, i.e. the route is assumed to have been added by someone who does not understand what they are doing. Several of these protocol values have a fixed interpretation.
redirect --- route was installed due to ICMP redirect.
kernel --- route was installed by the kernel during autoconfiguration.
boot --- route was installed during the bootup sequence. If a routing daemon starts, it will purge all of them. This is the value assigned to manually inserted routes that do not have a protocol specified.
static --- route was installed by the administrator to override dynamic routing. Routing daemon(s) will respect them and advertise them if so configured.
ra --- route was installed by the Router Discovery protocol.
Note that the rest of the values are not reserved and the administrator is free to assign or not assign protocol tags. Routing daemons at least should take care to set some unique protocol values for themselves, such as those assigned in rtnetlink.h or in the rt_protos database.
onlink --- pretend that the nexthop is directly attached to this link, even if it does not match any interface prefix. One application of this option may be found in ip tunnels between dissimilar addresses.
equalize --- allow packet by packet randomization on multipath routes. Without this modifier the route will be frozen to one selected nexthop, so that load splitting will occur only on a per-flow basis. Equalize works only if the appropriate kernel configuration option is chosen or if the kernel is patched.
Two more commands, prepend and append, also exist. Prepend does the same thing as the classic route add command by adding the route even if another route to the same destination already exists. The opposite case is append, which adds the route to the end of the list. We strongly recommend that you avoid using these commands.
Unfortunately, IPv6 currently only understands the append command correctly, all the rest of the set translating to append. Certainly, this will change in the future.
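The abbreviated PREFIX forms used throughout this document (10/8, 10.0.0/24, default, or a bare address) can be confusing at first. The following Python sketch expands them for the IPv4 case under the assumption that missing trailing octets are zero and that a missing length means a full-length host route, as stated in the description of the to argument. The helper expand_prefix is hypothetical and is not how the ip utility itself parses its arguments.

```python
import ipaddress


def expand_prefix(text):
    """Expand an abbreviated IPv4 PREFIX into a full network (sketch only)."""
    if text == "default":
        return ipaddress.ip_network("0.0.0.0/0")
    address, _, length = text.partition("/")
    octets = address.split(".")
    # Assumption: omitted trailing octets are zero, so 10.0.0 means 10.0.0.0.
    octets += ["0"] * (4 - len(octets))
    if not length:
        # No length given: treat it as a full-length host route.
        length = "32"
    return ipaddress.ip_network(".".join(octets) + "/" + length)


if __name__ == "__main__":
    for text in ("10.0.0/24", "10/8", "0/0", "default", "193.233.7.82"):
        print(text, "->", expand_prefix(text))
```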
Examples:
Add a plain route to network 10.0.0/24 via gateway 193.233.7.65
ip route add 10.0.0/24 via 193.233.7.65
Change it to a direct route via device dummy
ip ro chg 10.0.0/24 dev dummy
Add default multipath route splitting load between ppp0 and ppp1
ip route add default scope global nexthop dev ppp0 nexthop dev ppp1
Note the scope value, which is not necessary but prompts the kernel that this route is gatewayed rather than direct. Actually, if you know the addresses of the remote endpoints it would be better to specify them using the parameter via.
NAT the address 192.203.80.142 to 193.233.7.83 before forwarding
ip route add nat 192.203.80.142 via 193.233.7.83
Note that the reverse NAT translation is set up with policy rules as described in the policy routing section.
ip route delete
Abbreviations: delete, del, d.
ip route del has the same arguments as ip route add but their semantics are a bit different.
Key values (dest, tos, preference and table) select the route to delete. If any optional attributes are present, ip verifies that they coincide with the attributes of the route to delete. If no route with the given key and attributes is found then ip route del fails.
Linux kernel 2.0 had the ability to delete a route selected only by the prefix address while ignoring its netmask. This option does not exist anymore due to the ambiguous nature of the selection. If you wish to have such functionality then look at the ip route flush command, which provides a richer set of capabilities.
Examples:
Delete the multipath route created by the add example previously
ip route del default scope global nexthop dev ppp0 nexthop dev ppp1
ip route show
Abbreviations: show, list, sh, ls, l.
This format of the command allows viewing the routing table contents and looking at route(s) selected by some criteria.
Arguments:
to SELECTOR (default) --- select routes only from the given range of destinations. SELECTOR has optional modifiers (root, match or exact) and a prefix.
root PREFIX selects routes with prefixes not shorter than PREFIX. For example, root 0/0 selects the entire routing table.
match PREFIX selects routes with prefixes not longer than PREFIX. match 10.0/16 selects 10.0/16, 10/8 and 0/0, but it does not select 10.1/16 and 10.0.0/24.
exact PREFIX (or just PREFIX) selects routes with exactly this prefix.
If none of these options are present then the ip command assumes root 0/0, which lists the entire table.
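The three selector modifiers are easy to mix up, so here is a small Python sketch of their semantics as described above, using the ipaddress module (subnet_of requires Python 3.7 or later). The function name select and the sample table are invented for the illustration; prefixes are written out in full dotted form.

```python
import ipaddress


def select(routes, mode, prefix):
    """Filter route prefixes by a root, match or exact selector (sketch)."""
    selector = ipaddress.ip_network(prefix)
    chosen = []
    for text in routes:
        network = ipaddress.ip_network(text)
        if mode == "root":
            # Not shorter than the selector: the route lies inside it.
            keep = network.subnet_of(selector)
        elif mode == "match":
            # Not longer than the selector: the route covers it.
            keep = selector.subnet_of(network)
        else:
            # exact: the very same prefix.
            keep = network == selector
        if keep:
            chosen.append(text)
    return chosen


if __name__ == "__main__":
    table = ["0.0.0.0/0", "10.0.0.0/8", "10.0.0.0/16",
             "10.1.0.0/16", "10.0.0.0/24"]
    print("match:", select(table, "match", "10.0.0.0/16"))
    print("root: ", select(table, "root", "10.0.0.0/16"))
    print("exact:", select(table, "exact", "10.0.0.0/16"))
```

The match line reproduces the example from the text: 0/0, 10/8 and 10.0/16 are selected, while 10.1/16 and 10.0.0/24 are not.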
tos TOS or dsfield TOS --- select only routes with the given TOS.
table TABLEID --- show routes from this table(s). The default setting is to show table main (ID 254). TABLEID may be either the ID of a real table or one of the special values:
all --- list all the tables.
cache --- dump the routing cache.
IPv6 has only a single table, however splitting into main, local, and cache is emulated by the ip utility.
cloned or cached --- list cloned routes, which are routes dynamically forked off of other routes because some route attribute (like MTU) was updated. It is equivalent to table cache.
from SELECTOR --- the same syntax as to SELECTOR but bounds the source address range rather than the destination. Note that the from option only works with cloned routes.
protocol RTPROTO --- list only routes of this protocol.
scope SCOPE_VAL --- list only routes with this scope.
type TYPE --- list only routes of this type.
dev NAME --- list only routes going via this device.
via PREFIX --- list only routes going via the nexthop routers selected by PREFIX.
src PREFIX --- list only routes with preferred source addresses selected by PREFIX.
realm REALMID or realms FROMREALM/TOREALM --- list only routes with these realms.
Using this command is best explained by running through some examples.
Example: Let us count the routes of protocol gated/bgp on a router
kuznet@amber~ $ ip route list proto gated/bgp | wc
1413 9891 79010
kuznet@amber~ $
To count the size of the routing cache we have to use option -o, because cached attributes can take more than one line of the output
kuznet@amber~ $ ip -o route list cloned | wc
159 2543 18707
kuznet@amber~ $
The output of this command consists of per route records separated by line feeds. However, some records may consist of more than one line, particularly when the route is cloned or you have requested additional statistics. If the option -o is given, then line feeds separating lines inside records are replaced with a backslash sign.
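To see why -o matters when counting with wc, consider the following Python sketch. It models each record as a string that may contain internal line feeds and emulates the replacement described above; the function oneline and the sample records are illustrative only and do not reproduce the exact spacing the utility prints.

```python
def oneline(record):
    """Replace line feeds inside one record with a backslash sign."""
    return record.replace("\n", "\\")


if __name__ == "__main__":
    # Two sample cache records; the first spans two lines.
    records = [
        "193.233.7.82 dev eth0 src 193.233.7.65 realms inr.ac\n"
        "    cache mtu 1500 rtt 300",
        "193.233.7.254 dev eth0 src 193.233.7.65",
    ]
    plain_lines = sum(len(r.split("\n")) for r in records)
    oneline_lines = len([oneline(r) for r in records])
    print("lines without -o:", plain_lines)
    print("lines with -o:   ", oneline_lines)
```

Without -o a line count overstates the number of routes (three lines for two records here); with -o every record occupies exactly one line.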
The output has the same syntax as the arguments given to ip route add, so that it can be understood easily.
kuznet@amber~ $ ip route list 193.233.7/24
193.233.7.0/24 dev eth0 proto gated/conn scope link \
    src 193.233.7.65 realms inr.ac
kuznet@amber~ $
If you list cloned entries the output contains other attributes, which are evaluated during route calculation and updated during the route lifetime. An example of the output is:
kuznet@amber~ $ ip route list 193.233.7.82 table cache
193.233.7.82 from 193.233.7.82 dev eth0 src 193.233.7.65 \
    realms inr.ac/inr.ac
    cache <src-direct,redirect> mtu 1500 rtt 300 iif eth0
193.233.7.82 dev eth0 src 193.233.7.65 realms inr.ac
    cache mtu 1500 rtt 300
kuznet@amber~ $
This route looks a bit strange, does it not? Did you notice that this is the path from 193.233.7.82 back to 193.233.7.82? In the section on ip route get you will see how this route is created.
The second line, which starts with the word cache, shows the additional attributes which normal routes do not possess. The cache flags contained within the angle brackets are:
local --- packets are delivered locally. It stands for loopback unicast routes, for broadcast routes, and for multicast routes if this host is a member of the corresponding group.
reject --- the path is bad. Any attempt to use it results in an error. See attribute error below.
mc --- the destination is multicast.
brd --- the destination is broadcast.
src-direct --- the source is on a directly connected interface.
redirected --- the route was created by an ICMP Redirect.
redirect --- packets going via this route will trigger an ICMP redirect.
fastroute --- route is eligible to be used for fastroute.
equalize --- make packet by packet randomization along this path.
dst-nat --- destination address requires translation.
src-nat --- source address requires translation.
masq --- source address requires masquerading.
notify --- (not implemented) change/deletion of this route will trigger RTNETLINK notification.
The following are optional attributes that may be present:
error --- on reject routes this is the error code returned to local senders when they try to use this route. These error codes are translated to ICMP error codes sent to remote senders according to the rules described above in the subsection devoted to route types.
expires --- this entry will expire after this timeout.
iif --- the packets for this path are expected to arrive on this interface.
Giving the option -statistics will show further information about this route:
users --- number of users of this entry.
age --- shows when this route was last used.
used --- number of lookups of this route since its creation.
ip route flush - allows group deletion of routes
Abbreviations: flush, f.
This command allows flushing routes selected by some criteria.
The arguments have the same syntax and semantics as the arguments of ip route show, but the routing tables are purged rather than listed. The only difference is the default action performed. Where the ip route show command dumps the main IP routing table, ip route flush prints the help page. The reason for this difference does not require an explanation, does it?
With the option -statistics the command becomes verbose and prints out the number of deleted routes and the number of rounds needed to flush the routing table. If the option is given twice then ip route flush also dumps all deleted routes in the format described in the previous subsection.
Examples:
The first example flushes all the gatewayed routes from the main table, such as after a routing daemon crash.
netadm@amber~ # ip -4 ro flush scope global type unicast
This option deserved to be put into the scriptlet routef available within the IPROUTE2 utility distribution. This option was described in the route(8) man page as borrowed from BSD but was never implemented in Linux.
The second example is flushing all IPv6 cloned routes:
netadm@amber~ # ip -6 -s -s ro flush cache
3ffe:2400::220:afff:fef4:c5d1 via 3ffe:2400::220:afff:fef4:c5d1 \
    dev eth0 metric 0
    cache used 2 age 12sec mtu 1500 rtt 300
3ffe:2400::280:adff:feb7:8034 via 3ffe:2400::280:adff:feb7:8034 \
    dev eth0 metric 0
    cache used 2 age 15sec mtu 1500 rtt 300
3ffe:2400::280:c8ff:fe59:5bcc via 3ffe:2400::280:c8ff:fe59:5bcc \
    dev eth0 metric 0
    cache users 1 used 1 age 23sec mtu 1500 rtt 300
3ffe:2400:0:1:2a0:ccff:fe66:1878 via 3ffe:2400:0:1:2a0:ccff:fe66:1878 \
    dev eth1 metric 0
    cache used 2 age 20sec mtu 1500 rtt 300
3ffe:2400:0:1:a00:20ff:fe71:fb30 via 3ffe:2400:0:1:a00:20ff:fe71:fb30 \
    dev eth1 metric 0
    cache used 2 age 33sec mtu 1500 rtt 300
ff02::1 via ff02::1 dev eth1 metric 0
    cache users 1 used 1 age 45sec mtu 1500 rtt 300
*** Round 1, deleting 6 entries ***
*** Flush is complete after 1 round ***
netadm@amber~ # ip -6 -s -s ro flush cache
Nothing to flush.
The third example is flushing BGP routing tables after gated death.
netadm@amber~ # ip ro ls proto gated/bgp | wc
1408 9856 78730
netadm@amber~ # ip -s ro f proto gated/bgp
*** Round 1, deleting 1408 entries ***
*** Flush is complete after 1 round ***
netadm@amber~ # ip ro f proto gated/bgp
Nothing to flush.
netadm@amber~ # ip ro ls proto gated/bgp
ip route get - obtain route pathing
Abbreviations: get, g.
This command gets a single route to a destination and prints its contents exactly as the kernel sees it.
Arguments:
to ADDRESS (default) --- destination address.
from ADDRESS --- source address.
tos TOS or dsfield TOS --- Type Of Service.
iif NAME --- the device from which this packet is expected to arrive.
oif NAME --- enforce the output device on which this packet will be routed out.
connected --- if no source address (option from) was given, relookup the route with the source address set to the preferred address as received from the first lookup. If policy routing is used this may be a different route.
Note that this operation is not equivalent to ip route show. ip route show shows the existing routes, whereas ip route get resolves them and creates new clones if necessary. Essentially, ip route get is equivalent to actually sending a packet along this path. If the argument iif is not given then the kernel creates a route to output packets towards the requested destination. This is equivalent to pinging the destination and then running ip route list cache, but in the case of ip route get no packets are actually sent. With the argument iif present the kernel pretends that a packet has arrived from this interface and searches for a path to forward the packet. This command outputs routes in the same format as ip route ls.
Examples:
Find route to output packets to 193.233.7.82:
kuznet@amber~ $ ip route get 193.233.7.82
193.233.7.82 dev eth0 src 193.233.7.65 realms inr.ac
    cache mtu 1500 rtt 300
kuznet@amber~ $
Find route to forward packets arriving on eth0 from 193.233.7.82 and destined to 193.233.7.82:
kuznet@amber~ $ ip route get 193.233.7.82 from 193.233.7.82 iif eth0
193.233.7.82 from 193.233.7.82 dev eth0 src 193.233.7.65 \
    realms inr.ac/inr.ac
    cache <src-direct,redirect> mtu 1500 rtt 300 iif eth0
kuznet@amber~ $
This is the operation that created the funny route in the examples for ip route list, with 193.233.7.82 looped back to 193.233.7.82. Note the redirect flag present in the output.
Find the multicast route for packets arriving on eth0 from host 193.233.7.82 and destined to multicast group 224.2.127.254. This assumes that a multicast routing daemon is running; in this case we are running pimd.
kuznet@amber~ $ ip route get 224.2.127.254 from 193.233.7.82 iif eth0
multicast 224.2.127.254 from 193.233.7.82 dev lo \
    src 193.233.7.65 realms inr.ac/cosmos
    cache <mc> iif eth0 Oifs eth1 pimreg
kuznet@amber~ $
This route differs from the ones seen before. It contains a normal part and a multicast part. The normal part is used to deliver or not deliver the packet to local IP listeners. In this case the router is not acting as a member of the multicast group so the route has no local flag and only forwards packets. The output device for such entries is always loopback. The multicast part consists of an additional Oifs list showing the output interfaces.
Now it is time for a more complicated example. Let us add an invalid gatewayed route for a destination which is really directly connected.
netadm@alisa~ # ip route add 193.233.7.98 via 193.233.7.254
netadm@alisa~ # ip route get 193.233.7.98
193.233.7.98 via 193.233.7.254 dev eth0 src 193.233.7.90
    cache mtu 1500 rtt 3072
and probe it with ping:
netadm@alisa~ # ping -n 193.233.7.98
PING 193.233.7.98 (193.233.7.98) from 193.233.7.90 56 data bytes
From 193.233.7.254 Redirect Host(New nexthop 193.233.7.98)
64 bytes from 193.233.7.98 icmp_seq=0 ttl=255 time=3.5 ms
From 193.233.7.254 Redirect Host(New nexthop 193.233.7.98)
64 bytes from 193.233.7.98 icmp_seq=1 ttl=255 time=2.2 ms
64 bytes from 193.233.7.98 icmp_seq=2 ttl=255 time=0.4 ms
64 bytes from 193.233.7.98 icmp_seq=3 ttl=255 time=0.4 ms
64 bytes from 193.233.7.98 icmp_seq=4 ttl=255 time=0.4 ms
^C
--- 193.233.7.98 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.4/1.3/3.5 ms
What occurred? The router at 193.233.7.254 understood that we have a much better path to the destination and sent us an ICMP redirect message. We now retry ip route get to see what we have in our routing tables.
netadm@alisa~ # ip route get 193.233.7.98
193.233.7.98 dev eth0 src 193.233.7.90
    cache <redirected> mtu 1500 rtt 3072
ip rule --- routing policy database management.
Abbreviations: rule, ru.
Rules in the routing policy database control the route selection algorithm.
Classic routing algorithms used in the Internet make routing decisions based only on the destination address of packets and in theory, but not in practice, on the TOS field. In some circumstances we want to route packets differently depending not only on the destination addresses, but also on other packet fields such as source address, IP protocol, transport protocol ports or even packet payload. This task is called "policy routing".
"policy routing" != "routing policy"
"policy routing" = "cunning routing"
"routing policy" = "routing tactics" or "routing plan"
To solve this task the conventional destination based routing table, ordered according to the longest match rule, is replaced with the "routing policy database" or RPDB, which selects the appropriate route through execution of some set of rules. These rules may have many keys of different natures and therefore they have no natural ordering except that which is imposed by the network administrator. In Linux the RPDB is a linear list of rules ordered by a numeric priority value. The RPDB explicitly allows matching packet source address, packet destination address, TOS, incoming interface (which is packet metadata, rather than a packet field), and using fwmark values for matching IP protocols and transport ports.
Each routing policy rule consists of a selector and an action predicate. The RPDB is scanned in order of increasing numeric priority value, with the selector of each rule applied to the source address, destination address, incoming interface, TOS, and fwmark. If the selector matches the packet the action is performed. The action predicate may return success, in which case the rule output provides either a route or a failure indication and the RPDB lookup terminates. Otherwise, the RPDB program continues on to the next rule.
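The scan described above can be sketched as a small shell model. This is only an illustration under loud assumptions: the RULES list, the table_has_route function and the destination 10.0.0.1 are hypothetical, the selector is reduced to a textual match on the source address, and no kernel state is touched.

```shell
#!/bin/sh
# Toy model of an RPDB scan. Each rule line is "PRIORITY FROM TABLE".
# FROM is either "all" or a textual address prefix (a simplification:
# real selectors match CIDR prefixes, TOS, iif and fwmark as well).
RULES='32766 all main
0 all local
220 192.203.80. inr.ruhep'

# Hypothetical stand-in for a routing table lookup: succeeds only if
# the named table "contains" the destination.
table_has_route() {
    case "$1:$2" in
        local:127.0.0.1)    return 0 ;;
        inr.ruhep:10.0.0.1) return 0 ;;
        main:*)             return 0 ;;   # main holds a default route here
        *)                  return 1 ;;
    esac
}

# rpdb_lookup SRC DST: print the table whose rule selected the route.
rpdb_lookup() {
    src=$1 dst=$2
    printf '%s\n' "$RULES" | sort -n | while read -r prio from table; do
        case "$from" in
            all) ;;                        # selector matches anything
            *)   case "$src" in
                     "$from"*) ;;          # source matches the prefix
                     *) continue ;;        # selector failed, next rule
                 esac ;;
        esac
        if table_has_route "$table" "$dst"; then
            echo "$table"
            break                          # lookup terminates on success
        fi
    done
}
```

With these definitions, rpdb_lookup 192.203.80.5 10.0.0.1 prints inr.ruhep, while the same destination looked up from source 193.233.7.65 falls through to main.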
What is the action semantically? The natural action is to select the nexthop and output device. This is the way a packet path route is selected by Cisco IOS, let us call it "match & set". In Linux the approach is more flexible as the action includes lookups in destination-based routing tables and selecting a route from these tables according to the classic longest match algorithm. The "match & set" approach then becomes the simplest case of Linux route selection, realized when the second level routing table contains a single default route. Remember that Linux supports multiple routing tables managed with the ip route command.
At startup the kernel configures a default RPDB consisting of three rules:
1. Priority 0: Selector = match anything
Action = lookup routing table local (ID 255).
The table local is the special routing table containing high priority control routes for local and broadcast addresses.
Rule 0 is special, it cannot be deleted or overridden.
2. Priority 32766: Selector = match anything
Action = lookup routing table main (ID 254).
The table main is the normal routing table containing all non-policy routes. This rule may be deleted or overridden with other rules.
3. Priority 32767: Selector = match anything
Action = lookup routing table default (ID 253).
The table default is empty and reserved for post-processing if the previous default rules did not select the packet. This rule may also be deleted.
Do not confuse routing tables with rules. Rules point to routing tables, several rules may refer to one routing table and some routing tables may have no rules pointing to them. If you delete all the rules referring to a table then the table is not used but still exists. A routing table will disappear only after all the routes contained within it are deleted.
Rule attributes: Each RPDB entry has additional attributes attached. Each rule has a pointer to some routing table. NAT and masquerading rules have an attribute to select the new IP address to translate/masquerade to. Additionally rules have some of the optional attributes which routes have, such as realms. These values do not override those contained in routing tables, they are used only if the route did not select any of those attributes.
Rule types: The RPDB may contain rules of the following types.
unicast --- the rule prescribes returning the route found in the routing table referenced by the rule.
blackhole --- the rule prescribes silently dropping the packet.
unreachable --- the rule prescribes generating the error "Network is unreachable".
prohibit --- the rule prescribes generating the error "Communication is administratively prohibited".
nat --- the rule prescribes translating the source address of the IP packet to some other value.
ip rule add --- insert new rule
Abbreviations: add, a; delete, del, d.
Arguments:
type TYPE (default) --- type of this rule. The list of valid types was given in the previous subsection.
from PREFIX --- select source prefix to match.
to PREFIX --- select destination prefix to match.
iif NAME --- select incoming device to match. If the interface is loopback, the rule matches only packets originated by this host. It means that you may create separate routing tables for forwarded and local packets and, hence, completely segregate them.
tos TOS or dsfield TOS --- select TOS value to match.
fwmark MARK --- select value of fwmark to match.
priority PREFERENCE --- priority of this rule. Each rule should have an explicitly set unique priority value. Priority is an unsigned 32 bit number, thus we have 4294967296 possible rules.
WARNING!
For historical reasons ip rule add does not require any priority value and allows the priority value to be non-unique. If the user does not supply a priority value then one is assigned by the kernel. If the user requests creating a rule with a priority value which already exists then the kernel does not reject the request and adds the new rule before all old rules of the same priority. This is a mistake in the current design, nothing more. It should be fixed by the time you read this so please do not rely on this feature. You should always use explicit priorities when creating rules.
table TABLEID --- routing table identifier to lookup if the rule selector matches.
realms FROM/TO --- realms to select if the rule matched and the routing table lookup succeeded. Realm TO is used only if the route returned did not select any realm.
nat ADDRESS --- the base of the IP address block to translate the source address to. The ADDRESS may be either the start of a block of NAT addresses as selected by NAT routes, a local host address, or even zero. In the last two cases the Linux router does not NAT translate the packets but masquerades them to this address.
Changes to the RPDB made with these commands do not become active immediately. You should run ip route flush cache to flush out the routing cache after inserting rules.
Examples:
Route packets with source addresses from 192.203.80/24 according to routing table inr.ruhep
ip rule add from 192.203.80.0/24 table inr.ruhep prio 220
Translate packet source 193.233.7.83 to 192.203.80.144 and route it according to table #1 (Table #1 is defined in /etc/iproute2/rt_tables as inr.ruhep)
ip rule add from 193.233.7.83 nat 192.203.80.144 table 1 prio 320
Delete unused default rule
ip rule del prio 32767
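Since rule changes only take effect once the routing cache is flushed, it can be convenient to pair the two steps. The wrapper below is a sketch, not part of IPROUTE2: the function name add_rule_and_flush and the IP_CMD variable are our own, and by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# IP_CMD defaults to a dry run that only prints the commands; set
# IP_CMD=/sbin/ip to apply them for real (requires root).
IP_CMD=${IP_CMD:-echo ip}

# add_rule_and_flush PRIO RULE-ARGS...: insert a rule with an explicit
# priority, then flush the routing cache so the change takes effect.
add_rule_and_flush() {
    prio=$1; shift
    $IP_CMD rule add "$@" prio "$prio" || return 1
    $IP_CMD route flush cache
}

add_rule_and_flush 220 from 192.203.80.0/24 table inr.ruhep
```

Passing the priority as a mandatory first argument makes it hard to forget the explicit priority recommended in the warning above.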
ip rule show - list policy rules
Abbreviations: show, list, sh, ls, l.
Good news - this is the only command which has no arguments. Here is the example:
kuznet@amber~ $ ip rule list
0: from all lookup local
200: from 192.203.80.0/24 to 193.233.7.0/24 lookup main
210: from 192.203.80.0/24 to 192.203.80.0/24 lookup main
220: from 192.203.80.0/24 lookup inr.ruhep realms inr.ruhep/radio-msu
300: from 193.233.7.83 to 193.233.7.0/24 lookup main
310: from 193.233.7.83 to 192.203.80.0/24 lookup main
320: from 193.233.7.83 lookup inr.ruhep map-to 192.203.80.144
32766: from all lookup main
Each line starts with the rule priority value followed by a colon. The selectors follow, with each key prefixed by the keyword used to create the rule.
The keyword lookup is followed by the routing table identifier as recorded in the file /etc/iproute2/rt_tables.
If the rule does NAT, as in rule #320, it is shown by the keyword map-to followed by the start of the block of addresses to map.
The sense of this example is pretty simple. The prefixes 192.203.80.0/24 and 193.233.7.0/24 form an internal network but each prefix is routed differently. Additionally, the host 193.233.7.83 is translated into another prefix, appearing as 192.203.80.144, when talking to the outside world.
ip tunnel - ip tunnelling configuration
Abbreviations: tunnel, tunl.
The tunnel objects are tunnels encapsulating packets within IPv4 packets and sending them over the IP infrastructure.
ip tunnel add - creating tunnels
Abbreviations: add, a
Arguments:
name NAME (default) --- select tunnel device name.
mode MODE --- set tunnel mode. Three modes are available: ipip, sit, gre
remote ADDRESS --- set remote endpoint of the tunnel.
local ADDRESS --- set fixed local address for tunneled packets. It must be an address on another interface of this host.
ttl N --- set fixed TTL N on tunneled packets. N is a number in the range 1--255. 0 is a special value, meaning that packets inherit the TTL value. Default value is inherit.
tos TOS or dsfield TOS --- set fixed TOS on tunneled packets. Default value is inherit.
dev NAME --- bind tunnel to device NAME, so that tunneled packets will be routed only via this device and will not be able to escape to another device when the route to the endpoint changes.
nopmtudisc --- disable Path MTU Discovery on this tunnel. It is enabled by default. Note that a fixed ttl is incompatible with this option. A tunnel with fixed ttl always performs pmtu discovery.
key K, ikey K, okey K --- (GRE only) use keyed GRE with key K. K is either a number or an IP address-like dotted quad. The parameter key sets the key to use in both directions, ikey and okey allow setting different keys for input and output.
csum, icsum, ocsum --- (GRE only) checksum tunneled packets. The flag ocsum orders checksumming outgoing packets, icsum requires that all the input packets have a correct checksum. csum is equivalent to the combination "icsum ocsum".
seq, iseq, oseq --- (GRE only) serialize packets. The flag oseq enables sequencing outgoing packets, iseq requires that all the input packets are serialized. seq is equivalent to the combination "iseq oseq".
I think this option does not work. At least, I did not test it, did not debug it and do not even understand how it is supposed to work and for what purpose Cisco planned to use it. Do not use it. -- Alexey
Examples:
Create a point-to-point IPv6 tunnel with a maximal TTL of 32.
ip tunl add Cisco mode sit remote 192.31.7.104 local 192.203.80.142 ttl 32
ip tunnel show - list tunnel attributes
Abbreviations: show, list, sh, ls, l.
Example:
kuznet@amber~ $ ip tunl ls Cisco
Cisco: ipv6/ip remote 192.31.7.104 local 192.203.80.142 ttl 32
The line starts with the tunnel device name terminated by a colon, followed by the tunnel mode. The parameters of the tunnel are listed with the same keywords which were used at tunnel creation.
kuznet@amber~ $ ip -s tunl ls Cisco
Cisco: ipv6/ip remote 192.31.7.104 local 192.203.80.142 ttl 32
RX Packets  Bytes    Errors  CsumErrs  OutOfSeq  Mcasts
   12566    1707516  0       0         0         0
TX Packets  Bytes    Errors  DeadLoop  NoRoute   NoBufs
   13445    1879677  0       0         0         0
Essentially these numbers are the same as those printed by ip -s link show, but the tags differ to reflect tunnel specific features. These features are:
CsumErrs --- total number of packets dropped because of checksum failures for a GRE tunnel with checksumming enabled.
OutOfSeq --- total number of packets dropped because they arrived out of sequence for a GRE tunnel with serialization enabled.
Mcasts --- total number of multicast packets received on a broadcast GRE tunnel.
DeadLoop --- total number of packets which were not transmitted because the tunnel is looped back to itself.
NoRoute --- total number of packets which were not transmitted because there is no IP route to the remote endpoint.
NoBufs --- total number of packets which were not transmitted because the kernel failed to allocate a buffer.
ip monitor and rtmon --- route state monitoring
The ip utility allows monitoring the state of devices, addresses, and routes continuously. This option has a different format in that the command monitor is first on the command line, followed by the object list.
ip monitor [ file FILE ] [ all | OBJECT-LIST ]
OBJECT-LIST is the list of object types which we want to monitor. It may contain link, address, and route. If no file argument is given, ip opens RTNETLINK, listens to it and dumps the state changes in the format described in the previous sections.
If a file name is given ip does not listen to RTNETLINK but opens the file, which is assumed to contain RTNETLINK messages saved in binary format, and dumps them. Such a history file can be generated with the utility rtmon. This utility has a command line syntax similar to ip monitor. Ideally, rtmon should be started before the first network configuration command is issued. It is possible to start rtmon at any time as it prepends the history with a snapshot of the system state dumped at the moment of startup.
rtacct - route realms and policy propagation
On routers using OSPF ASE or especially the BGP protocol, the routing tables may be huge. If we want to classify or account for the packets per route, we will have to keep lots of information. Even worse, if we want to distinguish the packets not only by their destination, but also by their source, the task presents a quadratic complexity and its solution is physically impossible.
One approach for propagating the policy from routing protocols to the forwarding engine has been proposed. Essentially, Cisco Policy Propagation via BGP is based on the fact that dedicated routers have the entire RIB (Routing Information Base) close to the forwarding engine so that policy routing rules can check all the route attributes including ASPATH information and community strings.
Within the Linux architecture, where we have a split between the RIB as maintained by a user level daemon and the kernel based FIB (Forwarding Information Base), we cannot allow such a simplistic approach.
Fortunately there exists another solution allowing an even more flexible policy with rich semantics. Routes can be clustered together in user space based on their attributes. For example, a BGP router knows the route ASPATH or its community whereas an OSPF router knows the route tag or its area. A network administrator adding routes manually knows the nature of those routes. Provided that the number of such aggregates, which we call realms, is low, the task of full classification both by source and destination becomes quite manageable.
So each route may be assigned to a realm. It is assumed that this identification is made by a routing daemon, but static routes may also be assigned manually through ip route.
Currently there exists a patch to gated allowing it to classify routes to realms over all the set of policy rules. This classification is implemented within gated by prefix, ASPATH, origin, tag, etc.
To facilitate this construction in the case when the routing daemon is not aware of realms, missing realms may be completed with routing policy rules.
For each packet the kernel calculates the tuple of realms, source realm and destination realm, using the following algorithm:
1. If the route has a realm, the destination realm of the packet is set to it.
2. If the rule has a source realm, the source realm of the packet is set to it.
3. If the destination realm was not obtained from the route and the rule has a destination realm, set the destination realm from the rule.
4. If at least one of the realms is still unknown, the kernel finds the reversed route to the source of the packet.
5. If the source realm is still unknown, get it from the reversed route.
6. If one of the realms is still unknown, swap realms of reversed routes and apply step 2 again.
After this procedure is completed, we know what realm the packet arrived from and the realm where it is going to propagate to. If any of the realms is unknown, it is initialized to zero (or realm unknown).
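The first three steps and the final fallback to zero can be sketched in shell as follows. This is a deliberately partial model: the function name resolve_realms is hypothetical, an empty string stands for "realm not set", and the reverse route lookup of steps 4 to 6 is left out entirely.

```shell
#!/bin/sh
# resolve_realms ROUTE_REALM RULE_SRC_REALM RULE_DST_REALM
# An empty argument means "not set". Prints "SRC DST".
resolve_realms() {
    route_realm=$1 rule_src=$2 rule_dst=$3
    src=
    dst=
    # Step 1: a realm on the route becomes the destination realm.
    [ -n "$route_realm" ] && dst=$route_realm
    # Step 2: a source realm on the rule becomes the source realm.
    [ -n "$rule_src" ] && src=$rule_src
    # Step 3: the rule's destination realm is only a fallback.
    [ -z "$dst" ] && [ -n "$rule_dst" ] && dst=$rule_dst
    # Steps 4-6 (reverse route lookup) are omitted in this sketch.
    # Anything still unknown is reported as realm 0.
    echo "${src:-0} ${dst:-0}"
}

# Rule with "realms inr.ruhep/radio-msu", route without a realm:
resolve_realms "" inr.ruhep radio-msu
# Same rule, but the route itself carries realm russia:
resolve_realms russia inr.ruhep radio-msu
```

The second call shows the precedence described above: the realm on the route wins over the destination realm of the rule.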
The main application of realms is in conjunction with the tc route classifier, where they are used to help assign packets to traffic classes for accounting, policing, and scheduling them according to the classification.
A much simpler but still very useful application is packet path accounting by realms. The kernel gathers a packet statistics summary which can be viewed with the utility rtacct.
kuznet@amber~ $ rtacct russia
Realm BytesTo PktsTo BytesFrom PktsFrom
russia 20576778 169176 47080168 153805
This output shows that this router has received 153805 packets from realm russia and forwarded 169176 packets to russia. The realm russia consists of routes with ASPATHs not leaving russia.
Note that locally originated packets are not accounted here as rtacct shows incoming packets only. Using the route classifier you can get even more detailed accounting information about outgoing packets, optionally summarizing traffic not only by source or destination, but by any pair of source and destination realms.
IP Utility Summary
We have presented in this section coverage of the ip utility from the IPROUTE2 utility suite. As we have shown, this is the replacement under Linux for the ifconfig and route utilities for performing advanced IP network manipulation. While the standard utilities will suffice for simple setups, we recommend using the ip utility instead, both to become familiar with its usage and to be able to utilize the vast power of this utility. Linux possesses one of the most complete and powerful implementations of IP networking facilities available. We will now cover some of the basics of using the ip utility within scripts.
IP Usage in Scripting
In this section we will use what we have learned about the ip utility to create and learn from several scripts. First we will create ipup and ipdown scripts for our system. Then we will cover the operation of Alexey's ifcfg script from IPROUTE2 that uses the ip utilities to provide a stronger version of ifconfig. Finally we will cover an example of creating multiple route tables for splitting up outgoing traffic.
IPUP & IPDOWN
In this section we will create some custom networking scripts along with the core /etc/rc.d/init.d/network script using the IPROUTE2 utility suite.
First let us consider how we would manually configure the interfaces with the ip utility. The first interface is lo and it was configured under ifconfig as: ifconfig lo 127.0.0.1 netmask 255.0.0.0 broadcast 127.255.255.255. Rewriting this in ip we get two lines because of the granularity of control. So we have:
ip addr add 127.0.0.1/8 dev lo broadcast +
ip link set lo up
If we tried to substitute this directly into the ipup script we would fail, as the format of the configuration file, ifcfg-lo, is different from the information we need to configure the interface using ip. Also remember that the ip utility allows us to set multiple addresses on a single interface and our ipup script should allow us to take full advantage of that facility without requiring it.
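Because ifconfig style configuration stores a dotted netmask while ip addr expects a prefix length, a small conversion helper is handy when migrating old configuration files. The function below is a sketch only: the name mask_to_prefix is our own, and it does not verify that the mask bits are contiguous.

```shell
#!/bin/sh
# mask_to_prefix NETMASK: print the CIDR prefix length for a
# dotted-quad netmask, e.g. 255.0.0.0 gives 8.
# Note: contiguity of the mask bits is not checked.
mask_to_prefix() {
    bits=0
    old_ifs=$IFS
    IFS=.
    set -- $1                 # split the mask into its four octets
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    for octet in "$@"; do
        case "$octet" in
            255) bits=$((bits + 8)) ;;
            254) bits=$((bits + 7)) ;;
            252) bits=$((bits + 6)) ;;
            248) bits=$((bits + 5)) ;;
            240) bits=$((bits + 4)) ;;
            224) bits=$((bits + 3)) ;;
            192) bits=$((bits + 2)) ;;
            128) bits=$((bits + 1)) ;;
            0)   ;;
            *)   echo "bad netmask octet: $octet" 1>&2; return 1 ;;
        esac
    done
    echo "$bits"
}

# The loopback example from above, rewritten for ip addr add:
echo "127.0.0.1/$(mask_to_prefix 255.0.0.0)"
```

The last line prints 127.0.0.1/8, which is the address form used in the ip addr add command shown earlier.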
To configure multiple addresses on our network interface we will use a loop over all the possible values of addresses within a variable. Additionally we may want to allow for renaming the device before assigning addresses so that the output of our listings makes better logical sense. So first we should think about what variables we would require in a configuration script. Then we can start writing the ipup script to take advantage of the ip utility functions and our configuration variables.
Consider the following interface configuration file:
#!/bin/sh
#>>>Device type: ethernet
#>>>Variable declarations:
DEVICE=eth0
DEVNAME=inet0
IPCIDR="192.168.1.1/24
10.3.123.1/28"
STARTME=1
#>>>End variable declarations
We have ip addresses recorded in the CIDR format that is used by the ip addr command. We have the actual kernel boot supplied interface name and also a variable for renaming the device. Finally we have the on-boot initialization switch. We want our ipup script to allow on-boot init as well as after boot init functions. Since we can define more than one IPv4 address within this configuration we need a loop function to iterate the address assignment. Combining all of these needs we get the following script.
***Begin Listing - ipup script***
#!/bin/bash
cd /etc/sysconfig/network-scripts/
. $1
if [ "$STARTME" -eq 1 ] || [ "$2" = "now" ]
then
    /sbin/ip link set $DEVICE down
    DEV=$DEVICE
    # DEVNAME is quoted so that an empty name is not treated as set
    if [ -n "$DEVNAME" ]; then
        /sbin/ip link set $DEVICE name $DEVNAME
        DEV=$DEVNAME
    fi
    for ADDRESS in $IPCIDR
    do
        /sbin/ip addr add $ADDRESS broadcast + dev $DEV
    done
    /sbin/ip link set $DEV arp on
    /sbin/ip link set $DEV up
fi
***End Listing***
Note that we allow for both changing or not changing the device name. The inner loop assigns all addresses that are listed in the IPCIDR variable to the device. Thus with a simple config file and a short ipup script we can set up our network devices with custom names and multiple addresses.
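One shell detail matters for the optional rename: the -n test only works as intended when the variable is quoted. The short demonstration below is independent of any network configuration and simply contrasts the two forms for an empty DEVNAME.

```shell
#!/bin/sh
DEVNAME=
# Unquoted: the empty variable vanishes and the test becomes [ -n ],
# which is true because "-n" is itself a non-empty string.
if [ -n $DEVNAME ]; then unquoted=yes; else unquoted=no; fi
# Quoted: the test sees an empty operand and correctly fails.
if [ -n "$DEVNAME" ]; then quoted=yes; else quoted=no; fi
echo "unquoted=$unquoted quoted=$quoted"
```

This prints unquoted=yes quoted=no, so a configuration file that leaves DEVNAME empty would trigger a rename attempt unless the test is quoted.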
Let us take a quick look at the related ipdown script that uses the ip utility.
***Begin Listing - ipdown script***
#!/bin/bash
cd /etc/sysconfig/network-scripts/
. $1
DEV=$DEVICE
# DEVNAME is quoted so that an empty name is not treated as set
if [ -n "$DEVNAME" ]; then
    DEV=$DEVNAME
fi
for ADDRESS in $IPCIDR
do
    /sbin/ip addr del $ADDRESS dev $DEV
done
/sbin/ip link set $DEV down
/sbin/ip link set $DEV arp off
if [ -n "$DEVNAME" ]; then
    /sbin/ip link set $DEVNAME name $DEVICE
fi
***End Listing***
Note that we change the device name back to the original kernel defined name. That way we can switch between using any set of utilities we want, as any particular set will restore the device to the same state as it started from.
IP Network Init Script
Now that we have new ipup, ipdown, and ipcfg-xxx files, let us turn our attention to the init file that runs the ipup script on system bootup. On our systems this file is called ipnetwork and resides in the /etc/rc.d/init.d/ directory. We will consider the final format of this file as it is written with the IPROUTE2 utilities in mind.
***Begin Listing - /etc/rc.d/init.d/ipnetwork***
#
# IP network    Turn on/off IP networking
#
# Source function library.
. /etc/rc.d/init.d/pakinit.functions
. /etc/sysconfig/ipnetwork
cd /etc/sysconfig/network-scripts
# See how we were called.
case "$1" in
  start)
    pakcmd "ipup.mon" "Starting Monitor" exec /sbin/rtmon file \
        /var/log/iproute.log &
    pakcmd "ipup.lo" "Starting LoopBack " ./ipup ipcfg-lo
    for IF in $INTERFACES; do
        for i in ipcfg-$IF[0-9]; do
            pakcmd "ipup.$i" " Starting IP Interface $i " ./ipup $i
        done
        if [ -r ipcfg-routes ]; then
            pakcmd "ipup.2" "Starting IP Static Routes " ./ipcfg-routes
        fi
    done
    ;;
  stop)
    for IF in $INTERFACES; do
        for i in ipcfg-$IF[0-9]; do
            pakcmd "ipdown.$i" "Downing IP Interface $i " ./ipdown $i
        done
    done
    ;;
  *)
    echo "Usage: ipnetwork {start|stop}"
    exit 1
esac
exit 0
***End Listing***
We now possess a complete set of IP configuration scripts that will use the ip utility to create and destroy IP interfaces. Note that when we consider IPv6 these scripts can also be used with only minor changes. Then we will have configuration files and scripts that cover both protocols.
ifcfg script
We will now dissect a shell script provided in the IPROUTE2 package. This script is called ifcfg and Alexey wrote it as a replacement for ifconfig. Here is the full text of the script:
***Begin Listing - ifcfg script***
#! /bin/bash
CheckForwarding () {
  local sbase fwd
  sbase=/proc/sys/net/ipv4/conf
  fwd=0
  if [ -d $sbase ]; then
    for dir in $sbase/*/forwarding; do
      fwd=$[$fwd + `cat $dir`]
    done
  else
    fwd=2
  fi
  return $fwd
}
RestartRDISC () {
  killall -HUP rdisc || rdisc -fs
}
ABCMaskLen () {
  local class;
  class=${1%%.*}
  if [ "$1" = "" -o $class -eq 0 -o $class -ge 224 ]; then return 0
  elif [ $class -ge 224 ]; then return 0
  elif [ $class -ge 192 ]; then return 24
  elif [ $class -ge 128 ]; then return 16
  else return 8; fi
}
label="label $1"
ldev="$1"
dev=${1%:*}
if [ "$dev" = "" -o "$1" = "help" ]; then
  echo "Usage: ifcfg DEV [[add|del [ADDR[/LEN]] [PEER] | stop]" 1>&2
  echo "       add - add new address" 1>&2
  echo "       del - delete address" 1>&2
  echo "       stop - completely disable IP" 1>&2
  exit 1
fi
shift
CheckForwarding
fwd=$?
if [ $fwd -ne 0 ]; then
  echo "Forwarding is ON or its state is unknown ($fwd). OK, No RDISC." 1>&2
fi
deleting=0
case "$1" in
add) shift ;;
stop)
  if [ "$ldev" != "$dev" ]; then
    echo "Cannot stop alias $ldev" 1>&2
    exit 1;
  fi
  ip -4 addr flush dev $dev $label || exit 1
  if [ $fwd -eq 0 ]; then RestartRDISC; fi
  exit 0 ;;
del*)
  deleting=1; shift ;;
*)
esac
ipaddr=
pfxlen=
if [ "$1" != "" ]; then
  ipaddr=${1%/*}
  if [ "$1" != "$ipaddr" ]; then
    pfxlen=${1#*/}
  fi
  if [ "$ipaddr" = "" ]; then
    echo "$1 is bad IP address." 1>&2
    exit 1
  fi
fi
shift
peer=$1
if [ "$peer" != "" ]; then
  if [ "$pfxlen" != "" -a "$pfxlen" != "32" ]; then
    echo "Peer address with non-trivial netmask." 1>&2
    exit 1
  fi
  pfx="$ipaddr peer $peer"
else
  if [ "$pfxlen" = "" ]; then
    ABCMaskLen $ipaddr
    pfxlen=$?
  fi
  pfx="$ipaddr/$pfxlen"
fi
if [ "$ldev" = "$dev" -a "$ipaddr" != "" ]; then
  label=
fi
if [ $deleting -ne 0 ]; then
  ip addr del $pfx dev $dev $label || exit 1
  if [ $fwd -eq 0 ]; then RestartRDISC; fi
  exit 0
fi
if ! ip link set up dev $dev ; then
  echo "Error: cannot enable interface $dev." 1>&2
  exit 1
fi
if [ "$ipaddr" = "" ]; then exit 0; fi
if ! arping -q -c 2 -w 3 -D -I $dev $ipaddr ; then
  echo "Error: some host already uses address $ipaddr on $dev." 1>&2
  exit 1
fi
if ! ip address add $pfx brd + dev $dev $label; then
  echo "Error: failed to add $pfx on $dev." 1>&2
  exit 1
fi
arping -q -A -c 1 -I $dev $ipaddr
noarp=$?
( sleep 2 ;
  arping -q -U -c 1 -I $dev $ipaddr ) >& /dev/null </dev/null &
ip route add unreachable 224.0.0.0/24 >& /dev/null
ip route add unreachable 255.255.255.255 >& /dev/null
if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
  ip route add 224.0.0.0/4 dev $dev scope global >& /dev/null
fi
if [ $fwd -eq 0 ]; then
  if [ $noarp -eq 0 ]; then
    ip ro append default dev $dev metric 30000 scope global
  elif [ "$peer" != "" ]; then
    if ping -q -c 2 -w 4 $peer ; then
      ip ro append default via $peer dev $dev metric 30001
    fi
  fi
  RestartRDISC
fi
exit 0
***End Listing***
We will take this script apart piece by piece and explain what it is doing. At the end of this you should have a good understanding of the way an IP address can be checked for correct operation.
First off notice that there are several functions defined early in the script. The first one, CheckForwarding(), performs a check using the integer values present within the interface forwarding sysctl. The second one, RestartRDISC(), is for restarting the router discovery daemon. The third one, ABCMaskLen(), is just for making assumptions about the standard class netmask.
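The forwarding check can be tried in isolation with a variant that takes the sysctl directory as an argument. This is a sketch in the spirit of CheckForwarding(), not Alexey's code: the name count_forwarding and the optional BASE argument are our own, and the result is printed rather than passed back through the exit status.

```shell
#!/bin/sh
# count_forwarding [BASE]: print the sum of all per-interface
# forwarding flags under BASE, or 2 if BASE does not exist
# (state unknown), in the spirit of CheckForwarding().
count_forwarding() {
    sbase=${1:-/proc/sys/net/ipv4/conf}
    fwd=0
    if [ -d "$sbase" ]; then
        for f in "$sbase"/*/forwarding; do
            [ -r "$f" ] || continue     # skip an unmatched glob
            fwd=$((fwd + $(cat "$f")))
        done
    else
        fwd=2
    fi
    echo "$fwd"
}
```

Calling count_forwarding without arguments reads the live sysctl tree, and a result of 0 means that forwarding is off on every interface. Pointing it at a scratch directory makes the logic easy to test without touching the system.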
The script begins by assigning the first argument as the device name and then performing some error checking. If the arguments are incorrect or the help switch was provided then the usage for the command is printed out. Note from the usage statement that this command expects the netmask to be in CIDR format. If the netmask is not given in CIDR format then the class assumption is made. Having assigned the device name to the ldev variable we shift the arguments and check on our forwarding setup. We assign the forwarding result to the fwd variable and, if forwarding is on, print a message and continue.
We next take up the case statement that determines what operations we will perform on the interface. In the case of add we shift arguments and continue, in the case of del we set a variable then shift and continue. The case of stop brings up a quick flush of the entire interface ip addressing. Note that in the stop routine we first check to make sure we are not dealing with legacy aliased devices. Such devices use the dev:# format and should not be used anymore due to the new multiple address structure available for IP. Note also that after we flush the interface addressing we restart the router discovery daemon if our forwarding sysctl is equal to zero. If we are running a router then we will have set the forwarding status to ensure that other devices can interoperate with us. See Chapter 4 on sysctl for more information.
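The alias check relies on a single parameter expansion, which can be seen on its own in the sketch below. The function name is_alias is hypothetical, but the expansion is the same ${1%:*} that the script uses to derive dev from its first argument.

```shell
#!/bin/sh
# is_alias NAME: succeed if NAME uses the legacy dev:# alias form.
is_alias() {
    ldev=$1
    dev=${1%:*}          # strip a trailing ":suffix", if any
    [ "$ldev" != "$dev" ]
}

for name in eth0 eth0:1; do
    if is_alias "$name"; then
        echo "$name is an alias of ${name%:*}"
    else
        echo "$name is a plain device"
    fi
done
```

For a plain device name the expansion changes nothing, so the comparison fails and the name is treated as a real device.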
In the case where we are adding or deleting an interface address we continue on through checking the given address and mask length. Then we check on the peer address and determine if it is a single valued ip address. Once we pass these checks we test the netmask to determine if we can safely use it. If the netmask does not exist then we call the standard class netmask function to determine the standard class for the given ip address. This function will return the class netmask as a CIDR mask value based solely on the first octet of the address. Once we either have a defined netmask or have generated one from our address we can then define our ip address completely using CIDR format.
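The address parsing just described can be sketched as two small functions. The names classful_len and split_prefix are our own; the parameter expansions and the class boundaries follow the ifcfg script, except that results are printed instead of being returned through the exit status.

```shell
#!/bin/sh
# classful_len ADDR: print the classful prefix length implied by the
# first octet (0 when no sensible default exists).
classful_len() {
    class=${1%%.*}
    if [ -z "$1" ] || [ "$class" -eq 0 ] || [ "$class" -ge 224 ]; then echo 0
    elif [ "$class" -ge 192 ]; then echo 24
    elif [ "$class" -ge 128 ]; then echo 16
    else echo 8; fi
}

# split_prefix ADDR[/LEN]: print "ADDR LEN", falling back to the
# classful length when /LEN is absent.
split_prefix() {
    ipaddr=${1%/*}
    pfxlen=
    [ "$1" != "$ipaddr" ] && pfxlen=${1#*/}
    [ -z "$pfxlen" ] && pfxlen=$(classful_len "$ipaddr")
    echo "$ipaddr $pfxlen"
}

split_prefix 192.168.1.1/24
split_prefix 10.3.123.1
```

The second call has no explicit length, so the first octet 10 selects the class A default and the output is 10.3.123.1 8.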
We have now completed parsing our arguments and start into the actual work of manipulating the interface. We first cover the case where we are deleting the ip address from the interface. After deleting the address we again restart the router discovery daemon if our forwarding sysctl is equal to zero, to ensure the update of the routing tables. If we are not deleting an address from an interface we start running the verification testing. This is where we can learn how to create better scripts for our own interface addressing.
The first test run simply verifies that the interface device can be enabled. If not then the script aborts, because without a running device we cannot do any configuration of the addressing. After determining that the ip address we will use is non-null we run a duplicate address test. This is an important check to see if the address we want to use already exists on the local network. This uses the arping utility, which can manipulate arp functions. This utility is very powerful and provides quite a few functions for determining and using the ip network structure. We will diverge a bit here to discuss this utility.
arping utility
The arping utility is one of several helpful ip utilities provided in the iputils package from Alexey Kuznetsov. The utilities in this package include arping, clockdiff, ping, ping6, rdisc, tracepath, tracepath6, and traceroute6. The collection should be installed on any machine where you will be running any of the advanced ip networking functions. These commands can be used to disrupt the network so caution must be exercised in their use and accessibility.
Arping itself provides an IPv4 ping utility that uses ARP packets for communication. This is very useful for manipulating arp tables on other local network devices. The arping utility can provide duplicate address testing on the local network and two types of unsolicited ARP output to enable quick updating of local network device arp tables. This latter functionality can be used to create hot standby servers on a network that allow failover of identical ip addresses to alternate devices. It can also be used to wreak havoc on a local network that is not configured to prevent sabotage.
WARNING!
If you do not understand how these functions work then we strongly recommend that you obtain a copy of TCP/IP Illustrated Vol. 1 by W. Richard Stevens and read it. Without a firm understanding of the basic mechanisms of TCP/IP v.4 network communication most of the utilities we discuss and procedures we execute can cause severe disruption of your network.
Now that we understand what this utility provides let us return to the discussion of the ifcfg script.
We have now checked that the ip address provided on the command line is not already in use on our local networks. We now assign the ip address to the associated device and check that the command completed correctly. Once this assignment has successfully completed we use the unsolicited ARP mechanism of arping to update the arp caches on all of the neighboring devices. This provides instant access to our ip device from any of the local network ip devices.
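The check, assign and announce sequence can be sketched as follows. This is a minimal illustration under our own assumptions and not the script's actual code: assign_address and the ARPING and IP variables are hypothetical names, introduced only so that the logic can be tried with stand-in commands before it is pointed at a real network. The arping options used are -D (duplicate address detection, exit status 0 when no reply is seen), -A (unsolicited ARP reply) and -U (unsolicited ARP request).

```shell
# ARPING and IP default to the real tools. They are variables only so the
# sequence can be dry-run with harmless stand-ins.
ARPING=${ARPING:-arping}
IP=${IP:-ip}

# assign_address DEV ADDR LEN  (hypothetical helper)
#   DEV  interface name, ADDR bare IPv4 address, LEN prefix length
assign_address() {
    dev=$1 addr=$2 len=$3

    # Duplicate address detection: a zero exit status means nobody
    # answered for ADDR, so it looks free on this link.
    if ! $ARPING -q -c 2 -w 3 -D -I "$dev" "$addr"; then
        echo "Error: $addr is already in use on $dev" >&2
        return 1
    fi

    # Assign the address in full CIDR form.
    $IP addr add "$addr/$len" dev "$dev" || return 1

    # Announce it with one unsolicited reply and one unsolicited request
    # so that neighbours refresh their ARP caches.
    $ARPING -q -A -c 1 -I "$dev" "$addr"
    $ARPING -q -U -c 1 -I "$dev" "$addr"
    return 0
}
```

A call would look like assign_address eth0 192.168.1.254 24. Keep the warnings in this section in mind before running the announce step on a network you do not control.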
WARNING!
If you have Win95/98/NT machines in your network be warned that the Windows TCP/IP network stack performs the duplicate address testing specified by RFC-2131 incorrectly. The Windows TCP/IP stack sends out a gratuitous ARP immediately on starting TCP/IP. This forces interruption of IP networking services if the IP address is already in use and prevents the TCP/IP stack on Windows from starting. There is no known way to work around this fatal bug in Windows TCP/IP stacks. Do not use proxy ARP or the arping functions in a Windows TCP/IP environment.
Sadly enough the arping utility is very popular among disgruntled network people as an anonymous way of preventing NT servers from starting up. It is ridiculously simple to have the Linux machine watch for GratArp requests and issue an arping response thus preventing the NT machine from enabling the network card. And since ARP must be specially watched for by almost all network management systems it is rarely detected that this trick is being played. And even if you are watching you REALLY need to know what is going on as it looks "normal" for the original transaction to take place.
Quoting Alexey here as he replied to a MCSE who brought this up on LinNetDev:
"You have learned your networking from a broken pile of crap and you expect me to break my system so that you in your dumbness will be happy?"
Of course I almost did not include this here as there are enough problems in this world without purposefully baiting the stupid.. 8-} - Enough said.
Now that we have installed our address and updated the local network hosts we turn to setting up a corrected routing structure. We will first deal with routes to the multicast address class and the broadcast class. We start by sending both of these routes into the table main with a setting of host unreachable. We then test our link device for multicast capabilities. If the link is multicast enabled we allow the route for the multicast address class to be assigned to the device.
Finally we test again whether the interface is forwarding. In the case of no forwarding we further test for the arp capabilities and peer addresses. If our interface has arp capabilities then we place a default route with a high metric out our device. In the case where we have a peer address we test that the peer is present and insert a default route via our peer address with a somewhat high metric. In either case with no forwarding we restart router discovery as the final step.
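The decisions in the last two paragraphs can be laid out as a small planning function. This is a sketch under stated assumptions rather than the script's code: plan_routes and its arguments are hypothetical, the two metric values are purely illustrative, and the two multicast steps are collapsed into a single choice. The function only prints the ip commands so they can be reviewed before anything is changed, and it leaves out the reachability test of the peer.

```shell
# plan_routes DEV MULTICAST FORWARDING ARP [PEER]   (hypothetical helper)
#   MULTICAST, ARP: "yes" or "no" as reported for the link
#   FORWARDING:     value of the forwarding sysctl (0 or 1)
#   PEER:           optional peer address for point-to-point links
# Prints the route commands instead of running them.
plan_routes() {
    dev=$1 multicast=$2 forwarding=$3 arp=$4 peer=${5:-}

    # Broadcast is always marked unreachable.
    echo "ip route add unreachable 255.255.255.255"

    # Multicast is either handed to a capable device or marked unreachable.
    if [ "$multicast" = yes ]; then
        echo "ip route add 224.0.0.0/4 dev $dev"
    else
        echo "ip route add unreachable 224.0.0.0/4"
    fi

    # A forwarding machine is a router and obtains its routes elsewhere.
    [ "$forwarding" = 0 ] || return 0

    # Fallback defaults with deliberately high (illustrative) metrics.
    if [ "$arp" = yes ]; then
        echo "ip route add default dev $dev metric 30000"
    elif [ -n "$peer" ]; then
        echo "ip route add default via $peer dev $dev metric 30001"
    fi
    echo "# restart router discovery (rdisc) here"
}

plan_routes eth0 yes 0 yes
```

The high metrics are the point of the design: these defaults should only be used when nothing better, such as a route learned through router discovery, is available.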
Now that we have covered the script operation let us look at the utilities and logic behind it with an eye towards modifying our own interface address assignment scripts. First of all we will note the use of the router discovery agent. This agent is one of the reasons we stressed in the parts on ip route why you should always add a protocol level to your routes. We stressed that if you will be using ip route to add routes to the tables you should code them with protocol static to enable the kernel to know that they are valid static routes. Here is one of the reasons why this is important. Under router discovery the rdisc daemon can override routes that are not protocol tagged. So if we had just placed a default route into our table and we then start router discovery we will find that our route is not being used unless we coded the protocol. This is even more important if we will use any of the routing daemons such as zebra or gated.
Note that even in this script we need to try and use CIDR notation format for our IP addressing. This is actually a very good requirement as it speaks directly to the function of address masking. IP address masks, and IPX address masks as well, require that the mask portion be contiguous. When we write out a mask using the old style dotted decimal it is impossible to indicate the continuity of the mask. Consider the address mask 255.252.255.0. If you do not understand that masks must be contiguous then this looks like it could be a valid mask. We say that knowing that many people configuring IP systems rely on the numbers belonging to the set of good numbers. This set is: 255,254,252,248,240,224,192,128. So the assumption is that if these numbers are present then the mask must be valid. Using CIDR style notation we indicate the number of contiguous ones starting from the left in the mask. In this manner it is impossible to specify an invalid mask. Additionally you can readily see the scope of the address mask in CIDR notation thus making it easier to see where a route would be a more specific or general set of another route. So our choice of using CIDR address notation within our configuration file turns out to be the best way of specifying our addressing.
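To make the point about contiguity concrete, here is a small sketch that converts a dotted decimal mask to a CIDR length and rejects masks such as 255.252.255.0. The function name mask_to_prefix is our own invention for this illustration. It uses only plain shell features.

```shell
# mask_to_prefix MASK  (hypothetical helper)
# Prints the CIDR prefix length of a dotted-decimal mask, or fails when
# the one bits are not contiguous from the left.
mask_to_prefix() {
    len=0
    seen_partial=0               # set once an octet other than 255 appears
    old_ifs=$IFS; IFS=.
    set -- $1                    # split the mask into its octets
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    for octet in "$@"; do
        # After any octet below 255, only zero octets may follow.
        if [ "$seen_partial" -eq 1 ] && [ "$octet" -ne 0 ]; then
            return 1
        fi
        case $octet in
            255) bits=8 ;;
            254) bits=7 ;;
            252) bits=6 ;;
            248) bits=5 ;;
            240) bits=4 ;;
            224) bits=3 ;;
            192) bits=2 ;;
            128) bits=1 ;;
            0)   bits=0 ;;
            *)   return 1 ;;     # not one of the "good numbers"
        esac
        [ "$octet" -eq 255 ] || seen_partial=1
        len=$((len + bits))
    done
    echo "$len"
}

mask_to_prefix 255.255.255.0                     # a valid mask
mask_to_prefix 255.252.255.0 || echo "invalid mask"
```

Every octet of 255.252.255.0 is in the set of good numbers, yet the mask fails because a one bit follows a zero bit. The CIDR form cannot express that mistake at all.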
As far as considering the uses of the arping utility, as we mentioned in the warning above Microsoft Windows IPv4 stacks do not handle duplicate address detection correctly thus using arping on such a network can be problematic. We would not recommend using it except for the duplicate address detection. In our case we will not want to place it into our scripts except in cases where we know that we will need to perform duplicate address detection due to oddly configured DHCP servers and other such high levels of keyboard-seat interface errors.
Policy Routing - Multiple Route Tables Example
We will now consider what is arguably the most powerful feature in the Linux kernel routing code: the use of multiple routing tables combined with policy based routing. In the following text we will present an example of a system acting as a router for three disparate networks. We will return to this example and consider the ipchains utility in the latter half of this book where we cover Linux security and firewalls.
First off look at the diagram of the network we have under consideration:
Multiple Route Tables Network Diagram
Note that we have three external networks attached to our external ethernet interface. Each of these networks has its own router and its own IP address space that we need to use. Note that two of these address spaces overlap thus adding in a degree of complexity. We will want to set up our routing tables to allow the following connectivity.
* All traffic from any internal network may go to the Internet
* Traffic from Internal B may go to Network A
* Traffic from Internal A may go to Network C
* Traffic from Internal A Hosts 33-62 may go to Network A
* Traffic from Internal B Hosts 65-78 may go to Network C
First we will want to set up our external IP addressing. We will set up two addresses on our DMZ ethernet interface.
***Begin Listing***
ip addr add 10.254.254.2/30 dev eth0
ip addr add 172.17.1.128/24 dev eth0
***End Listing***
Next we will cover what route tables we will want to create.
One of the best ways to look at this is to consider that policy routing enables us to determine which routing table to use based on the source address. So the rules should enable us to segment the internal networks. Then we can set up normal destination based routes within the tables. So let us create two new route tables and then discuss the ramifications of this decision.
First we will name the tables by editing the /etc/iproute2/rt_tables file. We will end up with a file that looks like the following:
***Begin Listing***
#
# reserved values
#
255 local
254 main
253 default
0 unspec
#
# Local Tables
#
1 networka
2 networkb
***End Listing***
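If you would rather script this edit than make it by hand, the following sketch appends a table name only when it is not already listed. The helper add_rt_table is hypothetical and written for this illustration. It does not check whether the numeric id is already taken by another name, so treat it as a starting point only.

```shell
# add_rt_table FILE ID NAME  (hypothetical helper)
# Appends "ID NAME" to FILE unless NAME is already defined there.
# FILE would normally be /etc/iproute2/rt_tables.
add_rt_table() {
    file=$1 id=$2 name=$3
    # Look for NAME as the second field of any non-comment line.
    if awk -v n="$name" '$1 !~ /^#/ && $2 == n { found = 1 }
                         END { exit !found }' "$file"; then
        return 0                 # already present, nothing to do
    fi
    printf '%s\t%s\n' "$id" "$name" >> "$file"
}
```

For the file shown above the calls would be add_rt_table /etc/iproute2/rt_tables 1 networka and add_rt_table /etc/iproute2/rt_tables 2 networkb, run as root.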
Now we can refer to these tables in the rule commands. First we will set up the routes in each of these tables.
When you go about setting up the routes within tables there is an approach you can take which will help clarify the steps. Imagine that you are configuring a router that only has two interfaces. The outgoing interface attaches to any outbound router and the incoming interface already only has the packets you want to route. Then it is simply a matter of setting up the routing that you would want within that scope. Let us illustrate by running through the setup for table 1, networka. Here are the commands we would need to input to get the networka routing table configured:
***create networka routing table***
ip route add 10.10.0.0/16 via 10.254.254.1 table networka proto static
ip route add default via 172.17.1.254 table networka proto static
***End***
Now what is the command sequence for table networkb? Exactly.
***create networkb routing table***
ip route add 172.18.0.0/16 via 172.17.1.1 table networkb proto static
ip route add default via 172.17.1.254 table networkb proto static
***End***
Let us step back a moment and discuss why we have only two tables with three destinations. Notice that we have duplicated the default destination for the Internet into both tables. Why not place this into a third table that we will refer to? The best way to answer this is to consider the interaction between the rules and the tables. Remember that the rules define the policy based routing structure. Multiple rules may point to the same table. However once you are in a table you will need to either obtain a route or be returned via a throw route to the rule list. So if we have matched a rule we would like to assume that we have a correct match of our policy. Thus we would like all possible routes for that packet to be present in the routing table to which the packet is sent. In the case where we have three tables we would need additional rules that actually look at the destination of the packet. But looking at the packet destination is the function of a standard route. So why have rules for every possible combination of source AND destination? By using the tables we have we can create a few rules that will serve our purpose. Of course the flexibility of the system allows doing it the other way around or even through granular inspection of the packets themselves. You should work through all of these scenarios for yourself and decide what works best for you. Enough theorizing, onwards to the action.
***ip rule set #1***
ip rule add from 192.168.1.32/27 to 10.10.0.0/16 pref 15000 table networka
ip rule add from 192.168.2.64/28 to 172.18.0.0/16 pref 15001 table networkb
ip rule add from 192.168.1.0/24 pref 15002 table networkb
ip rule add from 192.168.2.0/24 pref 15003 table networka
***End***
Note that we have used the preference settings to run our rules from most detailed to most general. Remember from the discussion of ip rule that there are two default rules with numerically higher preference values (32766 for table main and 32767 for table default) present to catch whatever we do not specify here. These two default rules are very important. Think about what would happen if we forgot about those rules and gave our own rules priorities such as 65535. Would our rules ever be used?
So now consider what will happen to a packet incoming from one of our internal networks. First it will be passed through the rule at priority 0, which consults the local table and will pass on any packet not addressed to the router itself. Then it hits rule priority 15000. If it matches it will be routed according to table networka. If not it runs through rule 15001, then 15002, and finally 15003. Will such a packet ever continue on beyond rule priority 15003?
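When working through questions like these it helps to check which source selector a given host actually falls under. The prefix 192.168.1.32/27 spans addresses 32 through 63, so it covers Internal A hosts 33-62, and 192.168.2.64/28 spans 64 through 79, covering Internal B hosts 65-78. The following sketch tests such membership with shell arithmetic. The helper names ip_to_int and in_prefix are hypothetical, and the code assumes a shell with at least 64 bit arithmetic, as is usual on Linux.

```shell
# ip_to_int ADDR: print a dotted-quad IPv4 address as one integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1                    # split into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_prefix ADDR NET/LEN: succeed when ADDR lies inside NET/LEN.
in_prefix() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")   # part before the slash
    len=${2#*/}                  # part after the slash
    mask=$(( (0xffffffff << (32 - len)) & 0xffffffff ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Which of the source selectors would these hosts match?
for host in 192.168.1.40 192.168.1.200 192.168.2.70; do
    for sel in 192.168.1.32/27 192.168.2.64/28 192.168.1.0/24 192.168.2.0/24; do
        if in_prefix "$host" "$sel"; then
            echo "$host matches from $sel"
        fi
    done
done
```

A host may match more than one selector. The rule with the lowest preference value among the matches is the one consulted first, which is why the most detailed rules carry the lowest numbers.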
Now let us confuse the issue. We will redo our tables and rules from a different angle just to illustrate the range of flexibility we have to specify our routing structure.
First let us provide some details about our Linux server. Our Linux server has the following network interfaces and addresses:
eth0 - DMZ ethernet - addresses:10.254.254.2/30, 172.17.1.128/24
eth1 - Internal A - addresses: 192.168.1.254/24
eth2 - Internal B - addresses: 192.168.2.254/24
Now we will run through the route and rule creation assuming we are starting from the beginning. First edit /etc/iproute2/rt_tables:
#/etc/iproute2/rt_tables
# reserved values
#
255 local
254 main
253 default
0 unspec
#
# Local Tables
#
1 int1
2 int2
#
Create the routes and rules.
***ip rule set #2***
ip route add 10.10.0.0/16 via 10.254.254.1 table int1 proto static
ip route add throw 0/0 table int1 proto static
ip route add 172.18.0.0/16 via 172.17.1.1 table int2 proto static
ip route add throw 0/0 table int2 proto static
ip route add 0/0 via 172.17.1.254 table main proto static
ip rule add pref 15000 table int1 iif eth1
ip rule add pref 15001 table int2 iif eth2
ip rule add pref 15002 to 10.10.0.0/16 table int1
ip rule add pref 15003 to 172.18.0.0/16 table int2
***End***
This set of routes and rules will perform the same operations as set #1. Study them until you see why. Hint: Do not forget the default rules. Later we will see how to use policy routing to perform miraculous tricks with packet paths.