首页 > 代码库 > iproute2学习笔记

iproute2学习笔记

一、替代arp, ifconfig, route等命令

显示网卡和IP地址

root@openstack:~# ip link list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 64:31:50:43:57:fa brd ff:ff:ff:ff:ff:ff
4: br-ex: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 64:31:50:43:57:fa brd ff:ff:ff:ff:ff:ff
7: br-int: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether a2:99:53:93:1b:47 brd ff:ff:ff:ff:ff:ff
10: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether fe:54:00:68:e0:04 brd ff:ff:ff:ff:ff:ff
35: br-tun: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 42:9b:ec:6c:f6:41 brd ff:ff:ff:ff:ff:ff
71: qbrf38a666d-f5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 96:e0:4d:68:c2:6b brd ff:ff:ff:ff:ff:ff
72: qvof38a666d-f5: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 66:9e:9a:e1:25:37 brd ff:ff:ff:ff:ff:ff
73: qvbf38a666d-f5: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbrf38a666d-f5 state UP qlen 1000
    link/ether 96:e0:4d:68:c2:6b brd ff:ff:ff:ff:ff:ff
74: tapf38a666d-f5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbrf38a666d-f5 state UNKNOWN qlen 500
    link/ether fe:16:3e:3d:68:e4 brd ff:ff:ff:ff:ff:ff
root@openstack:~# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 64:31:50:43:57:fa brd ff:ff:ff:ff:ff:ff
    inet 16.158.165.152/22 brd 16.158.167.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::6631:50ff:fe43:57fa/64 scope link
       valid_lft forever preferred_lft forever
4: br-ex: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 64:31:50:43:57:fa brd ff:ff:ff:ff:ff:ff
    inet 16.158.165.102/22 brd 16.158.167.255 scope global br-ex
       valid_lft forever preferred_lft forever
    inet6 fe80::905e:c9ff:fe4b:36ef/64 scope link
       valid_lft forever preferred_lft forever
7: br-int: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether a2:99:53:93:1b:47 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::9036:18ff:fe6f:39bb/64 scope link
       valid_lft forever preferred_lft forever
35: br-tun: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether 42:9b:ec:6c:f6:41 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::90c0:c4ff:fed2:3cfd/64 scope link
       valid_lft forever preferred_lft forever
71: qbrf38a666d-f5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 96:e0:4d:68:c2:6b brd ff:ff:ff:ff:ff:ff
    inet6 fe80::a811:6aff:fe0f:667f/64 scope link
       valid_lft forever preferred_lft forever
72: qvof38a666d-f5: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 66:9e:9a:e1:25:37 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::649e:9aff:fee1:2537/64 scope link
       valid_lft forever preferred_lft forever
73: qvbf38a666d-f5: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbrf38a666d-f5 state UP qlen 1000
    link/ether 96:e0:4d:68:c2:6b brd ff:ff:ff:ff:ff:ff
    inet6 fe80::94e0:4dff:fe68:c26b/64 scope link
       valid_lft forever preferred_lft forever
74: tapf38a666d-f5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbrf38a666d-f5 state UNKNOWN qlen 500
    link/ether fe:16:3e:3d:68:e4 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc16:3eff:fe3d:68e4/64 scope link
      valid_lft forever preferred_lft forever

显示路由

root@openstack:~# ip route show
default via 16.158.164.1 dev br-ex
16.158.164.0/22 dev br-ex  proto kernel  scope link  src 16.158.165.102
16.158.164.0/22 dev eth0  proto kernel  scope link  src 16.158.165.152
192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.1

显示ARP

root@openstack:~# ip neigh show
16.158.165.47 dev br-ex lladdr e4:11:5b:53:62:00 STALE
192.168.122.61 dev virbr0 lladdr 52:54:00:68:e0:04 STALE
16.158.164.1 dev br-ex lladdr 00:00:5e:00:01:15 DELAY
16.158.166.177 dev br-ex lladdr 00:26:99:d0:12:a9 STALE
16.158.164.3 dev br-ex lladdr 20:fd:f1:e4:c9:e8 STALE
16.158.165.87 dev br-ex lladdr 70:5a:b6:b3:dd:a5 STALE
16.158.166.150 dev br-ex  FAILED
16.158.164.2 dev br-ex lladdr 20:fd:f1:e4:c9:b1 STALE

二、Rules: Routing Policy

Routing Table其实有三个:local, main, default

root@openstack:~# ip rule list
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default

原来的route命令修改的是main和local表

root@openstack:~# ip route list table local
broadcast 16.158.164.0 dev br-ex  proto kernel  scope link  src 16.158.165.102
broadcast 16.158.164.0 dev eth0  proto kernel  scope link  src 16.158.165.152
local 16.158.165.102 dev br-ex  proto kernel  scope host  src 16.158.165.102
local 16.158.165.152 dev eth0  proto kernel  scope host  src 16.158.165.152
broadcast 16.158.167.255 dev br-ex  proto kernel  scope link  src 16.158.165.102
broadcast 16.158.167.255 dev eth0  proto kernel  scope link  src 16.158.165.152
broadcast 127.0.0.0 dev lo  proto kernel  scope link  src 127.0.0.1
local 127.0.0.0/8 dev lo  proto kernel  scope host  src 127.0.0.1
local 127.0.0.1 dev lo  proto kernel  scope host  src 127.0.0.1
broadcast 127.255.255.255 dev lo  proto kernel  scope link  src 127.0.0.1
broadcast 192.168.122.0 dev virbr0  proto kernel  scope link  src 192.168.122.1
local 192.168.122.1 dev virbr0  proto kernel  scope host  src 192.168.122.1
broadcast 192.168.122.255 dev virbr0  proto kernel  scope link  src 192.168.122.1
root@openstack:~# ip route list table main
default via 16.158.164.1 dev br-ex
16.158.164.0/22 dev br-ex  proto kernel  scope link  src 16.158.165.102
16.158.164.0/22 dev eth0  proto kernel  scope link  src 16.158.165.152
192.168.122.0/24 dev virbr0  proto kernel  scope link  src 192.168.122.1
root@openstack:~# ip route list table default

Simple source policy routing

我们来考虑下面的场景,我家里接了两个外网,一个到网通(用的光纤),一个到电信(电话拨号),这两个Modem都连到我的NAT Router上,我把房子出租出去,有很多的室友,其中有一个室友仅仅访问Email,因而想少付费,我想让他仅仅使用电信的线,那么我应该如何配置我的NAT Router呢?

image

原来的配置是这样的

[ahu@home ahu]$ ip route list table main 
195.96.98.253 dev ppp2  proto kernel  scope link  src 212.64.78.148 
212.64.94.1 dev ppp0  proto kernel  scope link  src 212.64.94.251 
10.0.0.0/8 dev eth0  proto kernel  scope link  src 10.0.0.1 
127.0.0.0/8 dev lo  scope link 
default via 212.64.94.1 dev ppp0 

默认都走快的路由

下面我添加一个Table,名字叫John

# echo 200 John >> /etc/iproute2/rt_tables
# ip rule add from 10.0.0.10 table John
# ip rule ls
0:	from all lookup local 
32765:	from 10.0.0.10 lookup John
32766:	from all lookup main 
32767:	from all lookup default

并设定规则从10.0.0.10来的包都查看John这个路由表

在John路由表中添加规则

# ip route add default via 195.96.98.253 dev ppp2 table John
# ip route flush cache

默认的路由走慢的,达到了我的需求。

Routing for multiple uplinks/providers

image

 

$IF1是第一个Interface,它的IP是$IP1

$IF2是第二个Interface,它的IP是$IP2

$P1是Provider1的Gateway,Provider1的网络$P1_NET

$P2是Provider2的Gateway,Provider2的网络$P2_NET

我们要做的第一个事情是Split access.

创建两个routing table, T1和T2,添加到/etc/iproute2/rt_tables.

	  ip route add $P1_NET dev $IF1 src $IP1 table T1
	  ip route add default via $P1 table T1
	  ip route add $P2_NET dev $IF2 src $IP2 table T2
	  ip route add default via $P2 table T2

在T1中设定,如果要到达$P1_NET,需要从网卡$IF1出去

在T2中设定,如果要到达$P2_NET,需要从网卡$IF2出去

设置main table

	    ip route add $P1_NET dev $IF1 src $IP1
	    ip route add $P2_NET dev $IF2 src $IP2
	    ip route add default via $P1
 

添加Rules

	    ip rule add from $IP1 table T1
	    ip rule add from $IP2 table T2

 

第二件事情是Load balancing

default gateway不能总是一个

ip route add default scope global nexthop via $P1 dev $IF1 weight 1 nexthop via $P2 dev $IF2 weight 1
 

GRE tunneling

image

在Router A上做如下配置:

ip tunnel add netb mode gre remote 172.19.20.21 local 172.16.17.18 ttl 255
ip link set netb up
ip addr add 10.0.1.1 dev netb
ip route add 10.0.2.0/24 dev netb

创建一个名为netb的tunnel,模式是GRE,远端是172.19.20.21,此端是172.16.17.18

所有向10.0.2.0的包都通过这个Tunnel转发

在Router B上做如下配置:

ip tunnel add neta mode gre remote 172.16.17.18 local 172.19.20.21 ttl 255
ip link set neta up
ip addr add 10.0.2.1 dev neta
ip route add 10.0.1.0/24 dev neta

Queueing Disciplines for Bandwidth Management

With queueing we determine the way in which data is SENT. It is important to realise that we can only shape data that we transmit.

With the way the Internet works, we have no direct control of what people send us.

我们只能控制发送,无法控制接收,所以发送叫shaping,我们可以控制我们的输出流的形态,接收只能设置policy,拒绝或者接受。

Simple, classless Queueing Disciplines

pfifo_fast

First In, First Out

说是先入先出,实际上一个Queue包含三个Band,每个Band都是先入先出,Band 0优先级最高,它不处理完毕,Band 1不处理,其次是Band 2

在IP头里面有TOS (Type of service),有一个priomap,是一个映射,将不同的TOS映射给不同的Bind。

image

image

root@openstack:~# tc qdisc show dev eth0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1

txqueuelen

The length of this queue is gleaned from the interface configuration, which you can see and set with ifconfig and ip.

Token Bucket Filter

Token按照一定的速度来,每个Token都带走一个Packet,当Packet比Token快的时候,会保证按照Token的速度发送,不至于发送太快。

当Packet的速度比Token慢的时候,Token会累积,但是不会无限累积,累积到Bucket大小为止。如果累积的太多了,忽然来了大量的数据,导致瞬时间有大量的包发送。有了Bucket限制,即便积累满了Bucket,大量数据来的时候,最多带走所有的Bucket的Token,然后又按照Token到来的速度慢慢发送了。

limit or latency

Limit is the number of bytes that can be queued waiting for tokens to become available.

burst/buffer/maxburst

Size of the bucket, in bytes.

rate

The speedknob.

peakrate

If tokens are available, and packets arrive, they are sent out immediately by default.
        That may not be what you want, especially if you have a large bucket.
        The peakrate can be used to specify how quickly the bucket is allowed to be depleted.

# tc qdisc add dev ppp0 root tbf rate 220kbit latency 50ms burst 1540

Stochastic Fairness Queueing

随机公平队列

A TCP/IP flow can be uniquely identified by the following parameters within a certain time period:
Source and Destination IP address
Source and Destination Port
Layer 4 Protocol (TCP/UDP/ICMP)

有很多的FIFO的队列,TCP Session或者UDP stream会被分配到某个队列。包会RoundRobin的从各个队列中取出发送。

这样不会一个Session占据所有的流量。

但不是每一个Session都有一个队列,而是有一个Hash算法,将大量的Session分配到有限的队列中。

这样两个Session会共享一个队列,也有可能互相影响。

Hash函数会经常改变,从而session不会总是相互影响。

perturb

Reconfigure hashing once this many seconds.

quantum

Amount of bytes a stream is allowed to dequeue before the next queue gets a turn.

limit

The total number of packets that will be queued by this SFQ

# tc qdisc add dev ppp0 root sfq perturb 10
# tc -s -d qdisc ls
qdisc sfq 800c: dev ppp0 quantum 1514b limit 128p flows 128/1024 perturb 10sec 
 Sent 4812 bytes 62 pkts (dropped 0, overlimits 0) 

The number 800c: is the automatically assigned handle number, limit means that 128 packets can wait in this queue. There are 1024 hashbuckets available for accounting, of which 128 can be active at a time (no more packets fit in the queue!) Once every 10 seconds, the hashes are reconfigured.

Classful Queueing Disciplines

When traffic enters a classful qdisc, The filters attached to that qdisc then return with a decision, and the qdisc uses this to enqueue the packet into one of the classes. Each subclass may try other filters to see if further instructions apply. If not, the class enqueues the packet to the qdisc it contains.

The qdisc family: roots, handles, siblings and parents:

Each interface has one egress ‘root qdisc‘.

Each qdisc and class is assigned a handle, which can be used by later configuration statements to refer to that qdisc.

The handles of these qdiscs consist of two parts, a major number and a minor number : <major>:<minor>.

The PRIO qdisc

它和FIFO Fast很类似,也分多个Band,但是它的每个Band其实是一个Class,而且数目可以改变。默认是三个Band。

每一个Band也不一定是FIFO,而是任何类型的qdisc.

默认也是根据TOS来决定去那个Class,Band是0-2,而Class是1-3.

当然也可以使用filter来决定去哪个Class

ands

Number of bands to create. Each band is in fact a class. If you change this number, you must also change:

priomap

If you do not provide tc filters to classify traffic, the PRIO qdisc looks at the TC_PRIO priority to decide how to enqueue traffic.

image

# tc qdisc add dev eth0 root handle 1: prio 
## This *instantly* creates classes 1:1, 1:2, 1:3
  
# tc qdisc add dev eth0 parent 1:1 handle 10: sfq
# tc qdisc add dev eth0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
# tc qdisc add dev eth0 parent 1:3 handle 30: sfq   
# tc -s qdisc ls dev eth0 
qdisc sfq 30: quantum 1514b 
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 

 qdisc tbf 20: rate 20Kbit burst 1599b lat 667.6ms 
 Sent 0 bytes 0 pkts (dropped 0, overlimits 0) 

 qdisc sfq 10: quantum 1514b 
 Sent 132 bytes 2 pkts (dropped 0, overlimits 0) 

 qdisc prio 1: bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 174 bytes 3 pkts (dropped 0, overlimits 0) 

Hierarchical Token Bucket

# tc qdisc add dev eth0 root handle 1: htb default 30

# tc class add dev eth0 parent 1: classid 1:1 htb rate 6mbit burst 15k

# tc class add dev eth0 parent 1:1 classid 1:10 htb rate 5mbit burst 15k
# tc class add dev eth0 parent 1:1 classid 1:20 htb rate 3mbit ceil 6mbit burst 15k
# tc class add dev eth0 parent 1:1 classid 1:30 htb rate 1kbit ceil 6mbit burst 15k

The author then recommends SFQ for beneath these classes:

# tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
# tc qdisc add dev eth0 parent 1:20 handle 20: sfq perturb 10
# tc qdisc add dev eth0 parent 1:30 handle 30: sfq perturb 10

Add the filters which direct traffic to the right classes:

# U32="tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32"
# $U32 match ip dport 80 0xffff flowid 1:10
# $U32 match ip sport 25 0xffff flowid 1:20

HTB certainly looks wonderful - if 10: and 20: both have their guaranteed bandwidth, and more is left to divide, they borrow in a 5:3 ratio, just as you would expect.

Unclassified traffic gets routed to 30:, which has little bandwidth of its own but can borrow everything that is left over.

A fundamental part of the HTB qdisc is the borrowing mechanism. Children classes borrow tokens from their parents once they have exceeded rate. A child class will continue to attempt to borrow until it reaches ceil, at which point it will begin to queue packets for transmission until more tokens/ctokens are available. As there are only two primary types of classes which can be created with HTB the following table and diagram identify the various possible states and the behaviour of the borrowing mechanisms.

Table 2. HTB class states and potential actions taken

type of class

class state

HTB internal state

action taken

leaf

< rate

HTB_CAN_SEND

Leaf class will dequeue queued bytes up to available tokens (no more than burst packets)

leaf

> rate, < ceil

HTB_MAY_BORROW

Leaf class will attempt to borrow tokens/ctokens from parent class. If tokens are available, they will be lent in quantum increments and the leaf class will dequeue up to cburst bytes

leaf

> ceil

HTB_CANT_SEND

No packets will be dequeued. This will cause packet delay and will increase latency to meet the desired rate.

inner, root

< rate

HTB_CAN_SEND

Inner class will lend tokens to children.

inner, root

> rate, < ceil

HTB_MAY_BORROW

Inner class will attempt to borrow tokens/ctokens from parent class, lending them to competing children in quantum increments per request.

inner, root

> ceil

HTB_CANT_SEND

Inner class will not attempt to borrow from its parent and will not lend tokens/ctokens to children classes.

This diagram identifies the flow of borrowed tokens and the manner in which tokens are charged to parent classes. In order for the borrowing model to work, each class must have an accurate count of the number of tokens used by itself and all of its children. For this reason, any token used in a child or leaf class is charged to each parent class until the root class is reached.

Any child class which wishes to borrow a token will request a token from its parent class, which if it is also over its rate will request to borrow from its parent class until either a token is located or the root class is reached. So the borrowing of tokens flows toward the leaf classes and the charging of the usage of tokens flows toward the root class.

clip_image002

Note in this diagram that there are several HTB root classes. Each of these root classes can simulate a virtual circuit.

7.1.4. HTB class parameters

default

An optional parameter with every HTB qdisc object, the default default is 0, which cause any unclassified traffic to be dequeued at hardware speed, completely bypassing any of the classes attached to the root qdisc.

rate

Used to set the minimum desired speed to which to limit transmitted traffic. This can be considered the equivalent of a committed information rate (CIR), or the guaranteed bandwidth for a given leaf class.

ceil

Used to set the maximum desired speed to which to limit the transmitted traffic. The borrowing model should illustrate how this parameter is used. This can be considered the equivalent of “burstable bandwidth”.

burst

This is the size of the rate bucket (see Tokens and buckets). HTB will dequeue burst bytes before awaiting the arrival of more tokens.

cburst

This is the size of the ceil bucket (see Tokens and buckets). HTB will dequeue cburst bytes before awaiting the arrival of more ctokens.

quantum

This is a key parameter used by HTB to control borrowing. Normally, the correct quantum is calculated by HTB, not specified by the user. Tweaking this parameter can have tremendous effects on borrowing and shaping under contention, because it is used both to split traffic between children classes over rate (but below ceil) and to transmit packets from these same classes.

r2q

Also, usually calculated for the user, r2q is a hint to HTB to help determine the optimal quantum for a particular class.

mtu

prio

 

Netfilter & iproute - marking packets

我们可以在iptable中设置mark,然后在route的时候使用mark

this command marks all packets destined for port 25, outgoing mail:

# iptables -A PREROUTING -i eth0 -t mangle -p tcp --dport 25  -j MARK --set-mark 1

We‘ve already marked the packets with a ‘1‘, we now instruct the routing policy database to act on this:

# echo 201 mail.out >> /etc/iproute2/rt_tables
# ip rule add fwmark 1 table mail.out
# ip rule ls
0:	from all lookup local 
32764:	from all fwmark        1 lookup mail.out 
32766:	from all lookup main 
32767:	from all lookup default 

Now we generate a route to the slow but cheap link in the mail.out table:

# /sbin/ip route add default via 195.96.98.253 dev ppp0 table mail.out

The u32 classifier

The U32 filter is the most advanced filter available in the current implementation.

# tc filter add dev eth0 protocol ip parent 1:0 pref 10 u32   match u32 00100000 00ff0000 at 0 flowid 1:10