
Linux Fundamentals Notes [018]: Network Bonding

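  EtherChannel was first proposed by Cisco. It aggregates multiple physical links into a single logical link to provide high availability and higher throughput. PAgP (Port Aggregation Protocol, Cisco proprietary) and LACP (IEEE 802.3ad) are the two most widely used implementations. The Linux implementation is called Bonding; achieving HA requires close cooperation between system-level bonding and the physical switch.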

  http://www.mjmwired.net/kernel/Documentation/networking/bonding.txt


 1. The Seven Bonding Modes


 mode=0 balance-rr

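  • Round-robin mode: packets are sent through the slaves in turn, so every slave carries an equal share
  • Pros: the highest theoretical bandwidth of the 7 modes; if any slave fails, its load is shared evenly by the remaining slaves
  • Cons: sending through different ports in rotation easily causes out-of-order delivery, the peer then requests retransmission, and throughput suffers; requires a port channel configured on the switch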

mode=1 active-backup

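  • Active-backup mode: a backup device becomes the primary only when the primary device goes down
  • Pros: no switch requirements; can be attached to any link
  • Cons: the lowest device utilization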

mode=2 balance-xor

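  • XOR hash algorithm: all packets sent to the same destination MAC address are carried by the same port; in a single-switch environment it is therefore equivalent to active-backup and cannot increase bandwidth
  • Pros: in a multi-switch environment it may deliver better throughput than balance-rr; bonding itself causes no out-of-order delivery
  • Cons: no efficiency gain in a single-switch environment; requires a port channel configured on the switch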
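
A minimal sketch of how an XOR policy pins each peer to one port. It assumes the simplified layer2 formula given in the kernel bonding documentation (last byte of the source MAC XOR last byte of the destination MAC XOR the packet type ID, modulo the slave count). The function name xor_slave and the MAC addresses are illustrative; the real driver's result depends on the kernel version and the configured xmit_hash_policy.

```shell
# Illustrative only: simplified layer2 transmit hash.
# xor_slave SRC_MAC DST_MAC ETHERTYPE SLAVE_COUNT -> prints the slave index
xor_slave() {
    src_last=${1##*:}   # last byte of the source MAC, e.g. 55
    dst_last=${2##*:}   # last byte of the destination MAC
    echo $(( (0x$src_last ^ 0x$dst_last ^ $3) % $4 ))
}

# The same peer always maps to the same slave; another peer may map elsewhere.
xor_slave 00:11:22:33:44:55 00:11:22:33:44:66 0x0800 2
xor_slave 00:11:22:33:44:55 00:11:22:33:44:67 0x0800 2
```

Because the result depends only on the address pair, traffic to a single peer never spreads over more than one slave.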

mode=3 broadcast

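  • Every packet is copied N times and sent out of every port simultaneously
  • Pros: the best network fault tolerance; none of the packet loss that other modes show while switching ports (the business sees no downtime); suited to finance and other fields with very high stability requirements
  • Cons: consumes N times the network bandwidth, which reduces overall throughput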

mode=4 802.3ad

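  • IEEE standard: all peers that implement 802.3ad can interoperate effectively
  • Pros: the switch usually needs only a little configuration; frames are delivered in order, so misordering does not normally occur
  • Cons: relatively few devices support 802.3ad; all slaves are usually required to have the same speed and duplex mode; as with every mode other than balance-rr, no single connection can use more than one interface's worth of bandwidth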

mode=5 balance-tlb

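  • Balances outgoing traffic across the slaves; suited to multi-switch environments
  • Pros: no special switch configuration needed; when traffic is not funneled through a single router, it balances with a better algorithm than XOR; slaves may run at different speeds
  • Cons: cannot balance incoming traffic; does not support ARP monitoring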

mode=6 balance-alb

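  • An improvement on mode=5: balances incoming IPv4 traffic through ARP negotiation
  • Pros: the advantages of mode=5, plus incoming load balancing
  • Cons: shows a significant advantage only in large cluster environments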

  In summary, the commonly used modes are balance-rr (mode=0), active-backup (mode=1), broadcast (mode=3) and balance-alb (mode=6); balance-alb can be regarded as an improved version of balance-xor and balance-tlb.
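
As a quick reference, the mode numbers and the switch requirements stated in this article can be collected in a small lookup. This is only a sketch: the function name bond_mode_info and the labels port-channel, lacp and none are made up for illustration.

```shell
# bond_mode_info MODE_NUMBER -> prints "<mode name> <switch requirement>"
bond_mode_info() {
    case "$1" in
        0) echo "balance-rr port-channel" ;;
        1) echo "active-backup none" ;;
        2) echo "balance-xor port-channel" ;;
        3) echo "broadcast port-channel" ;;
        4) echo "802.3ad lacp" ;;
        5) echo "balance-tlb none" ;;
        6) echo "balance-alb none" ;;
        *) echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

bond_mode_info 6
```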


 2. Bonding Management: create, change, destroy, monitor


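   The three most common ways to manage bonding are iproute2, sysfs, and the distribution's network configuration files. The first two are more portable and work across Linux distributions, so they are the focus of this article.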

Creating Bonds:

  • modprobe bonding (optional; once loaded, the module creates a master named bond0 by default)
  • ip link add dev bond0 type bond
  • OR
  • echo +bond0 > /sys/class/net/bonding_masters

Show all existing bonds:

  • ip link | grep MASTER
  • OR
  • cat /sys/class/net/bonding_masters

Show the status of bonds:

  • cat /proc/net/bonding/bondX
  • NOTE: Each bonding device has a read-only file residing in the /proc/net/bonding directory. The file contents include information about the bonding configuration, options and state of each slave. Notice all slaves of bond0 have the same MAC address (HWaddr) as bond0 for all modes except TLB and ALB that require a unique MAC address for each slave.
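
The status file is plain text, so per-slave link state can be pulled out with awk. The sketch below assumes the usual "Slave Interface:" and "MII Status:" line prefixes; the function name slave_status is hypothetical, and the sample input is hand-written rather than real output.

```shell
# slave_status [FILE] -> prints "<slave> <MII status>" for each slave.
# Reads FILE, or standard input when no file is given.
slave_status() {
    awk '
        /^Slave Interface:/ { slave = $3; next }
        /^MII Status:/ && slave != "" { print slave, $3; slave = "" }
    ' "$@"
}

# Hand-written sample in the style of /proc/net/bonding/bond0.
slave_status <<'EOF'
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
EOF
```

On a live system the call would be slave_status /proc/net/bonding/bond0. The bond-level "MII Status" line is skipped because no slave has been seen yet at that point.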

Remove an existing bond:

  • ip link del bond0
  • OR
  • echo -bond0 > /sys/class/net/bonding_masters

  NOTE: due to 4K size limitation of sysfs files, this list may be truncated if you have more than a few hundred bonds. This is unlikely to occur under normal operating conditions.

Adding Slaves:

  • ip link set bond0 up
  • ip link set eth0 down
  • ip link set eth0 nomaster
  • ip link set dev eth0 master bond0
  • OR
  • echo +eth0 > /sys/class/net/bond0/bonding/slaves
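
The iproute2 steps above can be sketched as a small generator that prints the commands for review instead of running them. The function name bond_setup_cmds is hypothetical, and setting the mode at creation time with "type bond mode ..." is an addition not shown in the list above.

```shell
# bond_setup_cmds BOND MODE SLAVE... -> prints the commands, does not run them
bond_setup_cmds() {
    bond=$1
    mode=$2
    shift 2
    echo "ip link add dev $bond type bond mode $mode"
    echo "ip link set $bond up"
    for slave in "$@"; do
        # A slave must be down before it can be enslaved.
        echo "ip link set $slave down"
        echo "ip link set dev $slave master $bond"
    done
}

bond_setup_cmds bond0 active-backup eth0 eth1
```

After checking the output, it can be piped to sh as root to apply it. If the bonding module has already created bond0, the first command fails and can be dropped.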

Show all slaves belong to bond0:

  • ip link | grep -P 'master.*bond0'
  • OR
  • cat /sys/class/net/bond0/bonding/slaves

Free slave eth0 from bond bond0:

  • ip link set eth0 nomaster
  • OR
  • echo -eth0 > /sys/class/net/eth0/master/bonding/slaves
  • #The two operations above will free eth0 from whatever bond it is enslaved to, regardless of the name of the bond interface
  • OR
  • echo -eth0 > /sys/class/net/bond0/bonding/slaves

  NOTE: When an interface is enslaved to a bond, symlinks between the two are created in the sysfs filesystem. In this case, you would get /sys/class/net/bond0/slave_eth0 pointing to /sys/class/net/eth0, and /sys/class/net/eth0/master pointing to /sys/class/net/bond0. This means that you can tell quickly whether or not an interface is enslaved by looking for the master symlink.
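
A minimal sketch of the symlink check described in the note. The function name bond_master_of and its optional second argument (an alternative directory in place of /sys/class/net, handy for trying it out) are assumptions for illustration. A master symlink can also belong to other stacked devices such as a bridge, so the result is the name of whatever master the interface has.

```shell
# bond_master_of IFACE [SYSFS_NET_DIR] -> prints the master's name,
# or returns non-zero when IFACE has no master symlink.
bond_master_of() {
    net_dir=${2:-/sys/class/net}
    link="$net_dir/$1/master"
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}

if master=$(bond_master_of eth0); then
    echo "eth0 is enslaved to $master"
else
    echo "eth0 has no master"
fi
```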

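Changing a Bond's Configuration

  Each bond may be configured individually by manipulating the files located in /sys/class/net/<bond name>/bonding. The file names correspond to the bonding module parameters, so writing a value with echo has the same effect as setting it in a configuration file or as a command parameter. Except for arp_ip_target, which takes addresses prefixed with + or -, the files accept the same values as those parameters.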

To configure bond0 for balance-alb mode:

  • ip link set bond0 down
  • echo 6 > /sys/class/net/bond0/bonding/mode
  • OR
  • echo balance-alb > /sys/class/net/bond0/bonding/mode
  • NOTE: The bond interface must be down before the mode can be changed

To enable MII monitoring on bond0 with a 1 second interval:

  • echo 1000 > /sys/class/net/bond0/bonding/miimon
  • NOTE: If ARP monitoring is enabled, it will be disabled when MII monitoring is enabled, and vice-versa

To add ARP targets:

  • echo +192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
  • echo +192.168.0.101 > /sys/class/net/bond0/bonding/arp_ip_target
  • NOTE: up to 16 target addresses may be specified

To remove an ARP target:

  • echo -192.168.0.100 > /sys/class/net/bond0/bonding/arp_ip_target
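
When several targets are involved, the sysfs writes can be generated rather than typed. This sketch prints one write per target and refuses more than the 16 targets the driver accepts; the function name arp_target_cmds is hypothetical.

```shell
# arp_target_cmds BOND IP... -> prints one sysfs write per ARP target
arp_target_cmds() {
    bond=$1
    shift
    if [ "$#" -gt 16 ]; then
        echo "too many ARP targets: $# (maximum is 16)" >&2
        return 1
    fi
    for ip in "$@"; do
        echo "echo +$ip > /sys/class/net/$bond/bonding/arp_ip_target"
    done
}

arp_target_cmds bond0 192.168.0.100 192.168.0.101
```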

To configure the interval between learning packet transmits:

  • echo 12 > /sys/class/net/bond0/bonding/lp_interval
  • NOTE: the lp_interval is the number of seconds between instances where the bonding driver sends learning packets to each slave's peer switch. The default interval is 1 second. The lp_interval has effect only in balance-alb and balance-tlb modes

Configuration Files:

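  Boot-time startup scripts differ between distributions. On rhel5 and rhel6, add the commands to /etc/rc.d/rc.local; on gentoo_openrc, create a foo.start script under /etc/local.d/; on systemd systems (debian, rhel7, gentoo_systemd, suse12 and so on), write a foo.service file, place it in /etc/systemd/system/, and run systemctl enable foo.service.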
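
For the systemd case, a oneshot unit that runs a setup script at boot is usually enough. The sketch below prints such a unit; the function name bond_unit and the script path /usr/local/sbin/bond-setup.sh are hypothetical, and a real deployment may need ordering dependencies that are omitted here.

```shell
# bond_unit SCRIPT_PATH -> prints a oneshot unit that runs SCRIPT_PATH at boot
bond_unit() {
    cat <<EOF
[Unit]
Description=Bonding setup (illustrative)

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=$1

[Install]
WantedBy=multi-user.target
EOF
}

# Review the output, then save it as /etc/systemd/system/foo.service
bond_unit /usr/local/sbin/bond-setup.sh
```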


 3. Switch Configuration


   In this section, "switch" refers to whatever system the bonded devices are directly connected to (i.e., where the other end of the cable plugs into). This may be an actual dedicated switch device, or it may be another regular system (e.g., another computer running Linux). The active-backup, balance-tlb and balance-alb modes do not require any specific configuration of the switch.

  The 802.3ad mode requires that the switch have the appropriate ports configured as an 802.3ad aggregation. The precise method used to configure this varies from switch to switch, but, for example, a Cisco 3550 series switch requires that the appropriate ports first be grouped together in a single etherchannel instance, then that etherchannel is set to mode "lacp" to enable 802.3ad (instead of standard EtherChannel).
  The balance-rr, balance-xor and broadcast modes generally require that the switch have the appropriate ports grouped together. The nomenclature for such a group differs between switches; it may be called an "etherchannel" (as in the Cisco example, above), a "trunk group" or some other similar variation. For these modes, each switch will also have its own configuration options for the switch's transmit policy to the bond. Typical choices include XOR of either the MAC or IP addresses. The transmit policy of the two peers does not need to match. For these three modes, the bonding mode really selects a transmit policy for an EtherChannel group; all three will interoperate with another EtherChannel group.


 4. Link Monitoring


   The bonding driver at present supports two schemes for monitoring a slave device's link state: the ARP monitor and the MII monitor. At the present time, due to implementation restrictions in the bonding driver itself, it is not possible to enable both ARP and MII monitoring simultaneously.

  Miimon only checks for the device's carrier state. It has no way to determine the state of devices on or beyond other ports of a switch, or if a switch is refusing to pass traffic while still maintaining carrier on.
  Load the bonding driver before any network drivers participating in a bond.
  When bonding is configured, it is important that the slave devices not have routes that supersede routes of the master (or, generally, not have routes at all).
  The ARP monitor (and ARP itself) may become confused by such routes, because ARP requests (generated by the ARP monitor) will be sent on one interface (bond0 etc.), but the corresponding reply will arrive on a different interface (eth0 etc.). This reply looks to ARP as an unsolicited ARP reply (because ARP matches replies on an interface basis), and is discarded. The MII monitor is not affected by the state of the routing table.
  Ensure that slaves do not have routes of their own, and if for some reason they must, those routes do not supersede routes of their master. This should generally be the case, but unusual configurations or errant manual or automatic static route additions may cause trouble.
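
One way to spot stray slave routes is to filter the routing table by device. The sketch below reads the output of ip route show on standard input and prints every route bound to one of the named slaves. The function name slave_routes is hypothetical, and the sample table is hand-written rather than real output.

```shell
# slave_routes SLAVE... -> reads `ip route show` output on stdin and
# prints every route whose "dev" is one of the named slaves.
slave_routes() {
    awk -v slaves="$*" '
        BEGIN {
            n = split(slaves, names, " ")
            for (i = 1; i <= n; i++) is_slave[names[i]] = 1
        }
        {
            for (i = 1; i < NF; i++)
                if ($i == "dev" && ($(i + 1) in is_slave)) { print; break }
        }
    '
}

# Hand-written sample routing table.
slave_routes eth0 eth1 <<'EOF'
default via 192.168.0.1 dev bond0
192.168.0.0/24 dev bond0 proto kernel scope link src 192.168.0.5
10.0.0.0/8 dev eth0 scope link
EOF
```

On a live system the call would be ip route show | slave_routes eth0 eth1; empty output means no route is attached directly to a slave.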


 5. Promiscuous mode


   When running network monitoring tools, e.g., tcpdump, it is common to enable promiscuous mode on the device, so that all traffic is seen (instead of seeing only traffic destined for the local host). The bonding driver handles promiscuous mode changes to the bonding master device (e.g., bond0), and propagates the setting to the slave devices.

  • For the balance-rr, balance-xor, broadcast, and 802.3ad modes, the promiscuous mode setting is propagated to all slaves.
  • For the active-backup, balance-tlb and balance-alb modes, the promiscuous mode setting is propagated only to the active slave.
  • For balance-tlb mode, the active slave is the slave currently receiving inbound traffic.
  • For balance-alb mode, the active slave is the slave used as a "primary." This slave is used for mode-specific control traffic, for sending to peers that are unassigned or if the load is unbalanced.
  • For the active-backup, balance-tlb and balance-alb modes, when the active slave changes (e.g., due to a link failure), the promiscuous setting will be propagated to the new active slave.

 6. Configuring Bonding for High Availability


   High Availability refers to configurations that provide maximum network availability by having redundant or backup devices, links or switches between the host and the rest of the world. The goal is to provide the maximum availability of network connectivity (i.e., the network always works), even though other configurations could provide higher throughput.


7. High Availability in a Multiple Switch Topology (interface && switch)


   In multiple switch topologies, there is a trade off between network availability and usable bandwidth.

HA Bonding Mode Selection for Multiple Switch Topology:

  • active-backup: This is generally the preferred mode, particularly if the switches have an ISL (inter-switch link) and play together well. If the network configuration is such that one switch is specifically a backup switch (e.g., has lower capacity, higher cost, etc), then the primary option can be used to ensure that the preferred link is always used when it is available.
  • broadcast: This mode is really a special purpose mode, and is suitable only for very specific needs. For example, if the two switches are not connected (no ISL), the networks beyond them are totally independent, and some specific one-way traffic must reach both independent networks, then the broadcast mode may be suitable.

HA Link Monitoring Selection for Multiple Switch Topology:
  The choice of link monitoring ultimately depends upon your switch. If the switch can reliably fail ports in response to other failures, then either the MII or ARP monitors should work.
  In general, however, in a multiple switch topology, the ARP monitor can provide a higher level of reliability in detecting end to end connectivity failures (which may be caused by the failure of any individual component to pass traffic for any reason). Additionally, the ARP monitor should be configured with multiple targets (at least one for each switch in the network). This will ensure that, regardless of which switch is active, the ARP monitor has a suitable target to query.


8. MT (Maximum Throughput) Bonding Mode Selection for Single Switch Topology


 As below:

  • balance-rr: This mode is the only mode that will permit a single TCP/IP connection to stripe traffic across multiple interfaces. It is therefore the only mode that will allow a single TCP/IP stream to utilize more than one interface's worth of throughput. This comes at a cost, however: the striping generally results in peer systems receiving packets out of order, causing TCP/IP's congestion control system to kick in, often by retransmitting segments. This mode requires the switch to have the appropriate ports configured for "etherchannel" or "trunking."
  • active-backup: There is not much advantage in this network topology to the active-backup mode, as the inactive backup devices are all connected to the same peer as the primary. In this case, a load balancing mode (with link monitoring) will provide the same level of network availability, but with increased available bandwidth. On the plus side, active-backup mode does not require any configuration of the switch, so it may have value if the hardware available does not support any of the load balance modes.
  • balance-xor: This mode will limit traffic such that packets destined for specific peers will always be sent over the same interface. Since the destination is determined by the MAC addresses involved, this mode works best in a "local" network configuration, with destinations all on the same local network. This mode is likely to be suboptimal if all your traffic is passed through a single router (i.e., a "gatewayed" network configuration). As with balance-rr, the switch ports need to be configured for "etherchannel" or "trunking."
  • broadcast: Like active-backup, there is not much advantage to this mode in this type of network topology.
  • 802.3ad: This mode can be a good choice for this type of network topology. The 802.3ad mode is an IEEE standard, so all peers that implement 802.3ad should interoperate well. The 802.3ad protocol includes automatic configuration of the aggregates, so minimal manual configuration of the switch is needed (typically only to designate that some set of devices is available for 802.3ad). The 802.3ad standard also mandates that frames be delivered in order (within certain limits), so in general single connections will not see misordering of packets. The 802.3ad mode does have some drawbacks: the standard mandates that all devices in the aggregate operate at the same speed and duplex. Also, as with all bonding load balance modes other than balance-rr, no single connection will be able to utilize more than a single interface's worth of bandwidth. Additionally, the linux bonding 802.3ad implementation distributes traffic by peer (using an XOR of MAC addresses and packet type ID), so in a "gatewayed" configuration, all outgoing traffic will generally use the same device. Incoming traffic may also end up on a single device, but that is dependent upon the balancing policy of the peer's 802.3ad implementation. In a "local" configuration, traffic will be distributed across the devices in the bond. Finally, the 802.3ad mode mandates the use of the MII monitor; therefore, the ARP monitor is not available in this mode.
  • balance-tlb: The balance-tlb mode balances outgoing traffic by peer. Since the balancing is done according to MAC address, in a "gatewayed" configuration (all traffic passing through a single router), this mode will send all traffic across a single device. However, in a "local" network configuration, this mode balances multiple local network peers across devices in a vaguely intelligent manner (not a simple XOR as in balance-xor or 802.3ad mode), so that mathematically unlucky MAC addresses (i.e., ones that XOR to the same value) will not all "bunch up" on a single interface. Unlike 802.3ad, interfaces may be of differing speeds, and no special switch configuration is required. On the down side, in this mode all incoming traffic arrives over a single interface, certain ethtool support is required in the network device driver of the slave interfaces, and the ARP monitor is not available.
  • balance-alb: This mode is everything that balance-tlb is, and more. It has all of the features (and restrictions) of balance-tlb, and will also balance incoming traffic from local network peers. The only additional down side to this mode is that the network device driver must support changing the hardware address while the device is open.

MT Link Monitoring for Single Switch Topology:

  The choice of link monitoring may largely depend upon which mode you choose to use. The more advanced load balancing modes do not support the use of the ARP monitor, and are thus restricted to using the MII monitor (which does not provide as high a level of end to end assurance as the ARP monitor).


9. MT Bonding Mode Selection for Multiple Switch Topology


   In actual practice, the bonding mode typically employed in configurations of this type is balance-rr.

MT Link Monitoring for Multiple Switch Topology:

  Again, in actual practice, the MII monitor is most often used in this configuration, as performance is given preference over availability. The ARP monitor will function in this topology, but its advantages over the MII monitor are mitigated by the volume of probes needed as the number of systems involved grows (remember that each host in the network is configured with bonding).


 10. FAQ


 Where does a bonding device get its MAC address from?

  • When using slave devices that have fixed MAC addresses, or when the fail_over_mac option is enabled, the bonding device's MAC address is the MAC address of the active slave. For other configurations, if not explicitly configured (with ifconfig or ip link), the MAC address of the bonding device is taken from its first slave device. This MAC address is then passed to all following slaves and remains persistent (even if the first slave is removed) until the bonding device is brought down or reconfigured. You can change the MAC address with ip link.

Which switches/systems does it work with?

  • The full answer to this depends upon the desired mode. In the basic balance modes (balance-rr and balance-xor), it works with any system that supports etherchannel (also called trunking). Most managed switches currently available have such support, and many unmanaged switches as well. The advanced balance modes (balance-tlb and balance-alb) do not have special switch requirements, but do need device drivers that support specific features. In 802.3ad mode, it works with systems that support IEEE 802.3ad Dynamic Link Aggregation. Most managed and many unmanaged switches currently available support 802.3ad. The active-backup mode should work with any Layer-II switch.

What happens when a slave link dies?

  • If link monitoring is enabled, then the failing device will be disabled. The active-backup mode will fail over to a backup link, and other modes will ignore the failed link. The link will continue to be monitored, and should it recover, it will rejoin the bond (in whatever manner is appropriate for the mode). Link monitoring can be enabled via either the miimon or arp_interval parameters. In general, miimon monitors the carrier state as sensed by the underlying network device, and the arp monitor (arp_interval) monitors connectivity to another host on the local network. If no link monitoring is configured, the bonding driver will be unable to detect link failures, and will assume that all links are always available. This will likely result in lost packets, and a resulting degradation of performance. The precise performance loss depends upon the bonding mode and network configuration.
