It's always rough when you follow directions and something doesn't work out, even more so when you're familiar with what you're trying to do. While setting up a new server on December 14, 2012, I believe I found a bug somewhere in Debian's Ethernet configuration (or in the ifenslave-2.6 package). Once I've finished writing this post I'm off to find where to submit an issue, to see if I can save someone else from similar madness.
I was setting up a new install (Debian 6.0.6 i686) at work and struggling to configure Ethernet bonding. I've done it before, and newer versions of Debian have made it easier than ever, so I was really stumped as to why it wasn't working.
Entries like these from dmesg suggest that bonding is working, but ping shows that something is clearly wrong:
Dec 14 14:20:09 ferrari kernel: [ 4.595988] bonding: bond0: setting mode to active-backup (1).
Dec 14 14:20:09 ferrari kernel: [ 4.596045] bonding: bond0: Setting MII monitoring interval to 100.
Dec 14 14:20:09 ferrari kernel: [ 4.596087] bonding: bond0: Setting up delay to 200.
Dec 14 14:20:09 ferrari kernel: [ 4.596121] bonding: bond0: Setting down delay to 200.
Dec 14 14:20:09 ferrari kernel: [ 4.658073] bonding: bond0: doing slave updates when interface is down.
Dec 14 14:20:09 ferrari kernel: [ 4.658079] bonding: bond0: Adding slave eth0.
Dec 14 14:20:09 ferrari kernel: [ 4.658082] bonding bond0: master_dev is not up in bond_enslave
Dec 14 14:20:09 ferrari kernel: [ 4.676526] tg3 0000:03:06.0: firmware: requesting tigon/tg3_tso.bin
Dec 14 14:20:09 ferrari kernel: [ 4.923645] bonding: bond0: enslaving eth0 as a backup interface with a down link.
Dec 14 14:20:09 ferrari kernel: [ 4.934060] bonding: bond0: doing slave updates when interface is down.
Dec 14 14:20:09 ferrari kernel: [ 4.934066] bonding: bond0: Adding slave eth1.
Dec 14 14:20:09 ferrari kernel: [ 4.934069] bonding bond0: master_dev is not up in bond_enslave
Dec 14 14:20:09 ferrari kernel: [ 4.956523] tg3 0000:03:08.0: firmware: requesting tigon/tg3_tso.bin
Dec 14 14:20:09 ferrari kernel: [ 5.208291] bonding: bond0: enslaving eth1 as a backup interface with a down link.
Dec 14 14:20:09 ferrari kernel: [ 5.212315] ADDRCONF(NETDEV_UP): bond0: link is not ready
Dec 14 14:20:11 ferrari kernel: [ 7.813163] tg3 0000:03:08.0: eth1: Link is up at 1000 Mbps, full duplex
Dec 14 14:20:11 ferrari kernel: [ 7.813167] tg3 0000:03:08.0: eth1: Flow control is on for TX and on for RX
Dec 14 14:20:11 ferrari kernel: [ 7.912012] bonding: bond0: link status up for interface eth1, enabling it in 0 ms.
Dec 14 14:20:11 ferrari kernel: [ 7.912016] bonding: bond0: link status definitely up for interface eth1.
Dec 14 14:20:11 ferrari kernel: [ 7.912020] bonding: bond0: making interface eth1 the new active one.
Dec 14 14:20:11 ferrari kernel: [ 7.912044] bonding: bond0: first active interface up!
Dec 14 14:20:11 ferrari kernel: [ 7.912172] ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
Dec 14 14:20:11 ferrari kernel: [ 8.148079] tg3 0000:03:06.0: eth0: Link is up at 1000 Mbps, full duplex
Dec 14 14:20:11 ferrari kernel: [ 8.148084] tg3 0000:03:06.0: eth0: Flow control is on for TX and on for RX
Dec 14 14:20:11 ferrari kernel: [ 8.212012] bonding: bond0: link status up for interface eth0, enabling it in 200 ms.
Dec 14 14:20:11 ferrari kernel: [ 8.412010] bonding: bond0: link status definitely up for interface eth0.
While trying to figure this out I noticed some strange entries in both the routing table and the output of /sbin/ifconfig.
/sbin/route:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.0.0     *               255.255.255.0   U     0      0        0 eth0
192.168.0.0     *               255.255.255.0   U     0      0        0 bond0
default         192.168.0.1     0.0.0.0         UG    0      0        0 bond0
As you can see, eth0 still has an entry in the routing table for some reason. That looked like a problem, so I tried to delete it, with no success. Below you'll see that eth0, while marked "RUNNING SLAVE", also still has the old IP address it held before that address was reassigned to bond0.
/sbin/ifconfig:
bond0     Link encap:Ethernet  HWaddr 00:0b:db:e2:ce:db
          inet addr:192.168.0.215  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::20b:dbff:fee2:cedb/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:220 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:23522 (22.9 KiB)  TX bytes:2028 (1.9 KiB)

eth0      Link encap:Ethernet  HWaddr 00:0b:db:e2:ce:db
          inet addr:192.168.0.215  Bcast:192.168.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:142 errors:0 dropped:0 overruns:0 frame:0
          TX packets:21 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:15158 (14.8 KiB)  TX bytes:1344 (1.3 KiB)
          Interrupt:28

eth1      Link encap:Ethernet  HWaddr 00:0b:db:e2:ce:db
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
          RX packets:78 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8364 (8.1 KiB)  TX bytes:684 (684.0 B)
          Interrupt:29
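The duplicate subnet route in the routing table shown earlier is easy to miss by eye. As a rough sketch, a small shell helper can read the output of /sbin/route and list any interface that carries a route to the same destination and netmask as the bond. The name stale_routes is made up for this post, and the helper assumes the eight-column net-tools layout (Destination, Gateway, Genmask, Flags, Metric, Ref, Use, Iface); it is a diagnostic aid only and changes nothing.

```shell
#!/bin/sh
# stale_routes BOND
# Hypothetical helper (not part of any Debian package).
# Reads `/sbin/route` or `/sbin/route -n` style output on stdin and prints
# "<iface> <destination>/<netmask>" for every non-bond interface that has a
# route to the same destination/netmask as the bond device BOND.
stale_routes() {
    bond="$1"
    awk -v bond="$bond" '
        # Data rows have 8 fields; skip the column header row.
        NF == 8 && $1 != "Destination" {
            key = $1 "/" $3
            if ($8 == bond)
                onbond[key] = 1
            else
                other[key] = other[key] " " $8
        }
        END {
            # Report only destinations that the bond also routes.
            for (k in other)
                if (k in onbond) {
                    n = split(other[k], ifs, " ")
                    for (i = 1; i <= n; i++)
                        print ifs[i], k
                }
        }'
}
```

Run as `/sbin/route -n | stale_routes bond0`. Fed the table from this post, it prints `eth0 192.168.0.0/255.255.255.0`; on a healthy bond it prints nothing.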
At first I thought this might just be odd notation for bonded interfaces, but the more I thought about it the more wrong it seemed. After some searching I ended up reading http://www.kernel.org/doc/Documentation/networking/bonding.txt and found that section 8.1, "Adventures in Routing", describes exactly the issue I was having. For reasons unknown to me, I was not able to delete the route I wanted to delete. In the end, what worked was getting the bonded connection set up and then rebooting. Only then did eth0 disappear from the routing table, along with the IP address on eth0 as reported by /sbin/ifconfig.
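One thing that may be worth trying before a reboot is removing the leftover address from the slave itself, since the kernel drops an interface's connected route when its address goes away. The sketch below is untested on the server described here and is an assumption about what might help, not a confirmed fix. flush_slaves and the APPLY variable are names invented for this post; the only real command involved is iproute2's `ip addr flush dev`. By default it only prints what it would run.

```shell
#!/bin/sh
# flush_slaves DEV...
# Hypothetical helper (not part of ifenslave-2.6 or iproute2).
# Dry run by default: prints the commands instead of running them.
# Set APPLY=1 and run as root to actually flush the addresses.
flush_slaves() {
    for dev in "$@"; do
        if [ "${APPLY:-0}" = "1" ]; then
            # Remove every address configured on the slave interface.
            ip addr flush dev "$dev"
        else
            echo "ip addr flush dev $dev"
        fi
    done
}
```

For example, `flush_slaves eth0` prints `ip addr flush dev eth0`, and `APPLY=1 flush_slaves eth0` runs it. Printing first is deliberate: changing addresses on a remote machine can cut off the session, so it is safer to do this from the console.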
I went through some trials using ifup and ifdown to get rid of the eth0 entry in the routing table, and I even put a short line in the interfaces file:
iface eth0 inet manual
Bringing eth0 up and down removed the errant entries, but restarting networking brought them back, even with eth0 removed from the interfaces file apart from the slave command.
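For context, a bonding stanza in /etc/network/interfaces in the style of the Debian wiki page looks roughly like the following. This is a reconstruction and not a copy of the actual file from this server: the address, netmask and gateway are taken from the ifconfig and route output above, the bonding values from the kernel log, and the option names are the ones used in the wiki example, so check them against the documentation shipped with your ifenslave-2.6 version.

```text
# /etc/network/interfaces (illustrative reconstruction, see caveats above)
auto bond0
iface bond0 inet static
    address 192.168.0.215
    netmask 255.255.255.0
    gateway 192.168.0.1
    # Interfaces to enslave; they get no address stanzas of their own.
    slaves eth0 eth1
    # Values matching the kernel log: active-backup, 100 ms MII polling,
    # 200 ms up and down delays.
    bond-mode active-backup
    bond-miimon 100
    bond-downdelay 200
    bond-updelay 200
```

The point of the layout is that only bond0 carries IP configuration; if eth0 still shows an address afterwards, it is left over from an earlier configuration rather than coming from this file.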
So far the only thing that has worked is a reboot, after which bonding works fine.
References:
- I followed the instructions here: http://wiki.debian.org/Bonding
- More Bonding Information: http://www.kernel.org/doc/Documentation/networking/bonding.txt
- Debian Bug Reporting: http://www.debian.org/Bugs/Reporting