LLDP traffic and Linux bridges

In my previous post I described my Cumulus VX lab environment which is based on Fedora and KVM. One of the first things I noticed after bringing up the setup is that although I have got L3 connectivity between the emulated Cumulus switches, I can’t get LLDP to operate properly between the devices.

For example, a basic ICMP ping between the directly connected interfaces of leaf1 and spine3 is successful, but no LLDP neighbor shows up:

cumulus@leaf1$ ping 13.0.0.3
PING 13.0.0.3 (13.0.0.3) 56(84) bytes of data.
64 bytes from 13.0.0.3: icmp_req=1 ttl=64 time=0.210 ms
64 bytes from 13.0.0.3: icmp_req=2 ttl=64 time=0.660 ms
64 bytes from 13.0.0.3: icmp_req=3 ttl=64 time=0.635 ms
cumulus@leaf1$ lldpcli show neighbors 

LLDP neighbors:
-------------------------------------

Reading through the Cumulus Networks documentation, I discovered that LLDP is turned on by default on all active interfaces. It is possible to tweak things such as timers, but the basic neighbor discovery functionality should be there out of the box.
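
For example, the transmit interval can be changed with lldpcli (syntax per the lldpd documentation; the 10-second value here is just for illustration):

cumulus@leaf1$ sudo lldpcli configure lldp tx-interval 10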

Looking at the output from lldpcli show statistics, I also discovered that LLDP messages are being sent out of the interfaces but never received:

cumulus@leaf1$ lldpcli show statistics 

Interface:    eth0
  Transmitted:  11
  Received:     0
  Discarded:    0
  Unrecognized: 0
  Ageout:       0
  Inserted:     0
  Deleted:      0

Interface:    swp1
  Transmitted:  11
  Received:     0
  Discarded:    0
  Unrecognized: 0
  Ageout:       0
  Inserted:     0
  Deleted:      0

Interface:    swp2
  Transmitted:  11
  Received:     0
  Discarded:    0
  Unrecognized: 0
  Ageout:       0
  Inserted:     0
  Deleted:      0

So what’s going on?

Remember that leaf1 and spine3 are not really directly connected. They are bridged together using a Linux bridge device.

This is where I discovered that, by design, Linux bridges silently drop LLDP messages (sent to the LLDP_Multicast address 01-80-C2-00-00-0E) and other control frames in the 01-80-C2-00-00-xx range.

The explanation can be found in the 802.1AB standard, which states that “the destination address shall be 01-80-C2-00-00-0E. This address is within the range reserved by IEEE Std 802.1D-2004 for protocols constrained to an individual LAN, and ensures that the LLDPDU will not be forwarded by MAC Bridges that conform to IEEE Std 802.1D-2004.”

It is possible to change this behavior on a per-bridge basis, though, by using:

# echo 16384 > /sys/class/net/<bridge_name>/bridge/group_fwd_mask
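
The value is not arbitrary: group_fwd_mask is a bitmask in which bit N, when set, tells the bridge to forward frames destined to 01-80-C2-00-00-0N instead of consuming them. LLDP’s destination address ends in 0x0E (14 decimal), so we need bit 14:

$ printf '%d\n' $((1 << 0x0E))
16384

(A few of these addresses, such as 01-80-C2-00-00-00 used by STP, are restricted by the kernel and can never be unmasked.)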

Retesting with leaf1 and spine3:

# echo 16384 > /sys/class/net/virbr1/bridge/group_fwd_mask
cumulus@leaf1$ lldpcli show neighbor
LLDP neighbors:

Interface:    swp1, via: LLDP, RID: 1, Time: 0 day, 00:00:02  
  Chassis:     
    ChassisID:    mac 00:00:00:00:00:33
    SysName:      spine3
    SysDescr:     Cumulus Linux version 2.5.5 running on  QEMU Standard PC (i440FX + PIIX, 1996)
    MgmtIP:       3.3.3.3
    Capability:   Bridge, off
    Capability:   Router, on
  Port:        
    PortID:       ifname swp1
    PortDescr:    swp1
cumulus@leaf1$ lldpcli show statistics 

Interface:      eth0
  Transmitted:  117
  Received:     0
  Discarded:    0
  Unrecognized: 0
  Ageout:       0
  Inserted:     0
  Deleted:      0

Interface:      swp1
  Transmitted:  117
  Received:     72
  Discarded:    0
  Unrecognized: 0
  Ageout:       0
  Inserted:     1
  Deleted:      0

Interface:      swp2
  Transmitted:  117
  Received:     0
  Discarded:    0
  Unrecognized: 0
  Ageout:       0
  Inserted:     0
  Deleted:      0


LLDP now operates as expected between leaf1 and spine3. Remember that this is a per-bridge setting, so to get this fixed across the entire setup, the command needs to be issued for the rest of the bridges (virbr2, virbr3, virbr4) as well.
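
A quick loop from the hypervisor covers the remaining bridges in one shot:

# for br in virbr2 virbr3 virbr4; do
>     echo 16384 > /sys/class/net/$br/bridge/group_fwd_mask
> done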

Hands on with Fedora, KVM and Cumulus VX

Cumulus Linux is a network operating system based on Debian that runs on top of industry standard networking hardware. By providing a software-only solution, Cumulus is enabling disaggregation of data center switches similar to the x86 server hardware/software disaggregation. In addition to the networking features you would expect from a network operating system like L2 bridging, Spanning Tree Protocol, LLDP, bonding/LAG, L3 routing, and so on, it enables users to take advantage of the latest Linux applications and automation tools, which is in my opinion its true power.

Cumulus VX is a community-supported virtual appliance that enables network engineers to preview and test Cumulus Networks technology. The appliance is available in different formats (for VMware, VirtualBox, KVM, and Vagrant environments), and since I am running Fedora on my laptop the easiest thing for me was to use the KVM qcow2 image to try it out.

My goal is to build a four node leaf/spine topology. To form the fabric, each leaf will be connected to each spine, so we will end up with two “fabric facing” interfaces on each switch. In addition, I want to have a separate management interface on each device I can use for SSH access as well as automation purposes (Ansible being an immediate suspect), and a loopback interface to be used as the router-id.

[figure: base_topology]

Prerequisites

  • Install KVM and related virtualization packages. I am running Fedora 22 and used yum groupinstall “Virtualization*” to obtain the latest versions of libvirt, virt-manager, qemu-kvm and associated dependencies.
  • From the Virtual Machine Manager, create four basic isolated networks (without IP, DHCP, or NAT settings). These will serve as transport for the point-to-point links between our switches; if you prefer the command line over the GUI, see the virsh sketch after this list. I named them as follows:
    • net1
    • net2
    • net3
    • net4
  • Download the KVM qcow2 image from the Cumulus website. At the time of writing the image is based on Cumulus Linux v2.5.5. You will want to copy it four times and name the copies as follows:
    • leaf1.qcow2
    • leaf2.qcow2
    • spine3.qcow2
    • spine4.qcow2
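
For the command-line route, the same isolated networks can be defined with virsh. A minimal sketch (the XML deliberately omits any <ip> or <forward> element, which is what makes the network isolated; repeat for net2 through net4):

$ cat > net1.xml <<EOF
<network>
  <name>net1</name>
</network>
EOF
$ sudo virsh net-define net1.xml
$ sudo virsh net-start net1
$ sudo virsh net-autostart net1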

Creating the VMs

While creating each VM you will need to specify the network settings: in particular, which interfaces should be created, which networks they should be part of, and what their L2 (MAC) information is. To ease troubleshooting, I came up with my own convention for the interface MAC addresses: the management interface ends with the node number repeated (e.g. 00:00:00:00:00:11 for leaf1), while on the fabric interfaces the third octet carries the local node number and the last octet carries the local and peer node numbers (e.g. 00:00:01:00:00:13 on leaf1’s link towards spine3).

Leaf1:

  • Leaf1 should have three interfaces:
    • One belonging to the “default” network – a network created by virt-manager with DHCP and NAT enabled, which will be used for management access.
    • One belonging to net1, which is going to be used for the connection between leaf1 and spine3. Behind the scenes, virt-manager created a Linux bridge for this network.
    • One belonging to net2, which is going to be used for the connection between leaf1 and spine4. Behind the scenes, virt-manager created a Linux bridge for this network.
  • Make sure to adjust the path to specify the location of the image.
sudo virt-install --os-variant=generic --ram=256 --vcpus=1 --network=default,model=virtio,mac=00:00:00:00:00:11 --network network=net1,model=virtio,mac=00:00:01:00:00:13 --network network=net2,model=virtio,mac=00:00:01:00:00:14 --boot hd --disk path=/home/nyechiel/Downloads/VX/leaf1.qcow2,format=qcow2 --name=leaf1

Leaf2:

  • Leaf2 should have three interfaces:
    • One belonging to the “default” network – a network created by virt-manager with DHCP and NAT enabled, which will be used for management access.
    • One belonging to net3, which is going to be used for the connection between leaf2 and spine3. Behind the scenes, virt-manager created a Linux bridge for this network.
    • One belonging to net4, which is going to be used for the connection between leaf2 and spine4. Behind the scenes, virt-manager created a Linux bridge for this network.
  • Make sure to adjust the path to specify the location of the image.
sudo virt-install --os-variant=generic --ram=256 --vcpus=1 --network=default,model=virtio,mac=00:00:00:00:00:22 --network network=net3,model=virtio,mac=00:00:02:00:00:23 --network network=net4,model=virtio,mac=00:00:02:00:00:24 --boot hd --disk path=/home/nyechiel/Downloads/VX/leaf2.qcow2,format=qcow2 --name=leaf2

Spine3:

  • Spine3 should have three interfaces:
    • One belonging to the “default” network – a network created by virt-manager with DHCP and NAT enabled, which will be used for management access.
    • One belonging to net1, which is going to be used for the connection between leaf1 and spine3. Behind the scenes, virt-manager created a Linux bridge for this network.
    • One belonging to net3, which is going to be used for the connection between leaf2 and spine3. Behind the scenes, virt-manager created a Linux bridge for this network.
  • Make sure to adjust the path to specify the location of the image.
sudo virt-install --os-variant=generic --ram=256 --vcpus=1 --network=default,model=virtio,mac=00:00:00:00:00:33 --network network=net1,model=virtio,mac=00:00:03:00:00:31 --network network=net3,model=virtio,mac=00:00:03:00:00:32 --boot hd --disk path=/home/nyechiel/Downloads/VX/spine3.qcow2,format=qcow2 --name=spine3

Spine4:

  • Spine4 should have three interfaces:
    • One belonging to the “default” network – a network created by virt-manager with DHCP and NAT enabled, which will be used for management access.
    • One belonging to net2, which is going to be used for the connection between leaf1 and spine4. Behind the scenes, virt-manager created a Linux bridge for this network.
    • One belonging to net4, which is going to be used for the connection between leaf2 and spine4. Behind the scenes, virt-manager created a Linux bridge for this network.
  • Make sure to adjust the path to specify the location of the image.
sudo virt-install --os-variant=generic --ram=256 --vcpus=1 --network=default,model=virtio,mac=00:00:00:00:00:44 --network network=net2,model=virtio,mac=00:00:04:00:00:41 --network network=net4,model=virtio,mac=00:00:04:00:00:42 --boot hd --disk path=/home/nyechiel/Downloads/VX/spine4.qcow2,format=qcow2 --name=spine4

Verifying the hypervisor topology

Before we log in to any of the newly created VMs, I first want to verify the configuration and make sure that we have the right connectivity in place. Using ifconfig on my Fedora system and looking at the MAC addresses, I correlated the Linux bridges created by virt-manager (virbr0, virbr1, virbr2, virbr3, virbr4) with the virtual Ethernet devices (vnet). This gives me the hypervisor’s point of view, which is going to be really useful for troubleshooting purposes. I came up with this topology:

[figure: hypervisor_view]

Useful commands to use here are brctl show and brctl showmacs. For example, let’s examine the link between leaf1 and spine3 (note that libvirt derives each vnet device’s MAC from the configured guest MAC address, with the high byte set to 0xFE):

$ ip link show vnet1 | grep link
   link/ether fe:00:01:00:00:13 brd ff:ff:ff:ff:ff:ff
$ ip link show vnet10 | grep link
   link/ether fe:00:03:00:00:31 brd ff:ff:ff:ff:ff:ff
$ brctl show virbr1
bridge name     bridge id           STP enabled     interfaces
virbr1          8000.525400d32feb   yes             virbr1-nic
                                                    vnet1
                                                    vnet10
$ brctl showmacs virbr1
port no    mac addr             is local?    ageing timer
  2        fe:00:01:00:00:13    yes          18.34
  3        fe:00:03:00:00:31    yes          24.61
  1        52:54:00:d3:2f:eb    yes          0.00
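
To dump all of the vnet MACs in one go, a little convenience loop works too (assuming libvirt’s default vnetN naming):

$ for dev in /sys/class/net/vnet*; do
>     echo "$(basename $dev): $(cat $dev/address)"
> done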

Verifying the fabric topology

Now that we have the basic networking setup between the VMs and we understand the topology, we can jump into the switches and confirm their view. The switches can be accessed with the username “cumulus” and the password “CumulusLinux!”. This is also the password for root.

Using console access to the VMs and the ifconfig command we can learn a couple of things:

  1. eth0 is the base interface on each switch, used for management purposes. It picked up an address from the 192.168.122.0/24 range, which is what virt-manager used to set up the “default” network. SSH to this address is enabled by default on the standard TCP port 22.
  2. The “fabric” interfaces are swp1 and swp2.

Based on this information we can build up our final topology, which is a representation of the actual fabric:

[figure: fabric_topology]

Now what?

Now that we have the basic topology set up and the right diagrams to support us, we can go on and configure things. Cumulus has a good level of documentation, so I will let you take it from here. You can configure things manually using the CLI (which is really a bash shell with standard Linux commands) or use automation tools to control the switch.

Using the CLI and following the documentation, it was pretty straightforward for me to configure hostnames and IP addresses, and to bring up OSPF and BFD (using Quagga) between the switches. Next I plan to play with the more advanced stuff (personally I want to test out BGP and IPv6 configurations), and to try to automate things using Ansible. Happy testing!

 

IPv6 address assignment – stateless, stateful, DHCP… oh my!

People don’t like changes. IPv6 could help solve a lot of the burden in networks deployed today, which are still mostly based on the original version of the Internet Protocol, aka version 4. But the time has come, and even the old tricks like throwing network address translation (NAT) everywhere are not going to help anymore, simply because we are out of IP addresses. It may take some more time, and people will do everything they can to keep delaying it, but believe me – there is no way around it – IPv6 is here to replace IPv4. IPv6 is also a critical part of the promise of the cloud and the Internet of Things (IoT). If you want to connect everything to the network, you had better plan for massive scale and have enough addresses to use.

One of the trickiest things with IPv6, though, is the fact that it’s pretty different from IPv4. While some of the concepts remain the same, there are some fundamental differences between IPv4 and IPv6, and it definitely takes some time to get used to some of the IPv6 basics, including the terms being used. Experienced IPv4 engineers will probably need to change their mindset, and as I stated before, people don’t really like changes…

In this post, I want to highlight the address assignment options available with IPv6, which is in my view one of the most fundamental things in IP networking, and where things are pretty different compared to IPv4. I am going to assume you have some basic background on IPv6; while I will cover the theory, I will also show the command line interface and demonstrate some of the configuration options, focusing on SLAAC and stateless DHCPv6. I am going to use a simple topology with two Cisco routers directly connected to each other via their GigabitEthernet 1/0 interfaces. Both routers are running IOS 15.2(4).

Let’s get the party started

With IPv6 an interface can have multiple prefixes and IP addresses, and unlike IPv4, all of them are primary. Every interface will have a Link-Local address, which is the address used to implement many of the control plane functions. If you don’t manually set the Link-Local address, one will automatically be generated for you. Note that the IPv6 protocol stack will not become operational on an interface until a Link-Local address has been assigned or generated and has passed Duplicate Address Detection (DAD) verification. In Cisco IOS, we first need to enable IPv6 routing globally using the ipv6 unicast-routing command. We can then enable IPv6 on the interface using the ipv6 enable command:

ipv6 unicast-routing
!
interface GigabitEthernet1/0
 ipv6 enable
!

Now IPv6 is enabled on the interface, and we should get a Link-Local address assigned automatically:

show ipv6 interface g1/0 | include link

IPv6 is enabled, link-local address is FE80::C800:51FF:FE2F:1C
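
As an aside, this Link-Local address illustrates the EUI-64 process discussed below: take the interface’s 48-bit MAC (working backwards from the address, it would be ca00.512f.001c here), insert FF:FE between its two halves, flip the universal/local bit, and prepend the FE80::/64 prefix:

ca:00:51:2f:00:1c              interface MAC (inferred from the address)
ca:00:51:ff:fe:2f:00:1c        FF:FE inserted in the middle
c8:00:51:ff:fe:2f:00:1c        universal/local bit flipped (0xCA ^ 0x02 = 0xC8)
FE80::C800:51FF:FE2F:1C        FE80::/64 prefix prepended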

IPv6 address assignment options

A little bit of theory, as promised. When it comes to IPv6 address assignment, there are several options you can use:

  • Static (manual) address assignment – exactly as with IPv4, you can go ahead and apply the address yourself. I believe this is straightforward and therefore I am not going to demonstrate it.
  • Stateless Address Auto Configuration (SLAAC) – nodes listen for ICMPv6 Router Advertisement (RA) messages, which are periodically sent out by routers on the local link or requested by the node using a Router Solicitation message. A node can then create a global unicast IPv6 address by combining its EUI-64 interface identifier (based on the MAC address on Ethernet interfaces) with the link prefix obtained via the Router Advertisement. This feature is unique to IPv6 and provides simple “plug & play” networking. By default, SLAAC does not provide the client with anything beyond an IPv6 address and a default gateway. SLAAC is discussed in detail in RFC 4862.
  • Stateless DHCPv6 – with this option SLAAC is still used to get the IP address, but DHCP is used to obtain “other” configuration options, usually things like DNS, NTP, etc. The advantage here is that the DHCP server is not required to store any dynamic state about individual clients. In large networks with a huge number of endpoints attached, implementing stateless DHCPv6 greatly reduces the number of DHCPv6 messages needed for address state refreshment.
  • Stateful DHCPv6 – functions much like DHCP in IPv4, in that hosts receive both their IPv6 address and additional parameters from the DHCP server. As with DHCP for IPv4, the components of a DHCPv6 infrastructure are DHCPv6 clients that request configuration, DHCPv6 servers that provide configuration, and DHCPv6 relay agents that convey messages between clients and servers when clients are on subnets that do not have a DHCPv6 server. You can learn more about DHCP for IPv6 in RFC 3315.

NOTE: The only way to get a default gateway in IPv6 is via an RA message. DHCPv6 does not carry default route information at this time.

Putting it all together

An IPv6 host performs stateless address autoconfiguration (SLAAC) by default, and decides whether to also use a configuration protocol such as DHCPv6 based on the following flags in the Router Advertisement message sent by a neighboring router:

  • Managed Address Configuration Flag, the ‘M’ flag. When set to 1, this flag instructs the host to use a configuration protocol to obtain stateful IPv6 addresses.
  • Other Stateful Configuration Flag, the ‘O’ flag. When set to 1, this flag instructs the host to use a configuration protocol to obtain other configuration settings, e.g., DNS, NTP, etc.

Combining the values of the M and O flags can yield the following:

  • Both M and O Flags are set to 0. This combination corresponds to a network without a DHCPv6 infrastructure. Hosts use Router Advertisements for non-link-local addresses and other methods (such as manual configuration) to configure other parameters.
  • Both M and O Flags are set to 1. DHCPv6 is used for both addresses and other configuration settings, aka stateful DHCPv6.
  • The M Flag is set to 0 and the O Flag is set to 1. DHCPv6 is not used to assign addresses, only to assign other configuration settings. Neighboring routers are configured to advertise non-link-local address prefixes from which IPv6 hosts derive stateless addresses. This combination is known as stateless DHCPv6.
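
On Cisco IOS, these flags are set per interface with the ipv6 nd commands shown below (the other-config-flag one also appears in the stateless DHCPv6 example later in this post):

interface GigabitEthernet1/0
 ! advertise the M flag in outgoing RAs (stateful DHCPv6):
 ipv6 nd managed-config-flag
 ! advertise the O flag in outgoing RAs (stateless DHCPv6):
 ipv6 nd other-config-flag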

Examining the configuration

SLAAC

Client configuration:

interface GigabitEthernet1/0
 ipv6 address autoconfig
 ipv6 enable

Server configuration:

interface GigabitEthernet1/0
 ipv6 address 2001:1111:1111::1/64
 ipv6 enable

We can see the server sending the RA message with the prefix that was configured:

ICMPv6-ND: Request to send RA for FE80::C801:51FF:FE2F:1C
ICMPv6-ND: Setup RA from FE80::C801:51FF:FE2F:1C to FF02::1 on GigabitEthernet1/0 
ICMPv6-ND: MTU = 1500
ICMPv6-ND: prefix = 2001:1111:1111::/64 onlink autoconfig 
ICMPv6-ND: 2592000/604800 (valid/preferred)

And the client receiving the message and calculating an address using EUI-64:

ICMPv6-ND: Received RA from FE80::C801:51FF:FE2F:1C on GigabitEthernet1/0 
ICMPv6-ND: Prefix : 2001:1111:1111::
ICMPv6-ND: Update on-link prefix 2001:1111:1111::/64 on GigabitEthernet1/0
IPV6ADDR: Generating IntfID for 'eui64', prefix 2001:1111:1111::/64 
ICMPv6-ND: IPv6 Address Autoconfig 2001:1111:1111:0:C800:51FF:FE2F:1C 

R1#show ipv6 interface brief
GigabitEthernet1/0     [up/up]
    FE80::C800:51FF:FE2F:1C
    2001:1111:1111:0:C800:51FF:FE2F:1C

Stateless DHCP

Client configuration:

No changes are required on the client side. The client is still configured to use SLAAC via the autoconfig option:

interface GigabitEthernet1/0
 ipv6 address autoconfig
 ipv6 enable

Server configuration:

ipv6 dhcp pool STATELESS_DHCP
 dns-server 2001:1111:1111::10
 domain-name test.com
!
interface GigabitEthernet1/0
 ipv6 address 2001:1111:1111::1/64
 ipv6 enable
 ipv6 nd other-config-flag
 ipv6 dhcp server STATELESS_DHCP

We can see the client keeping the same IP address, but now obtaining DNS settings through DHCP:

IPv6 DHCP: Adding server FE80::C801:51FF:FE2F:1C
IPv6 DHCP: Processing options
IPv6 DHCP: Configuring DNS server 2001:1111:1111::10
IPv6 DHCP: Configuring domain name test.com
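
For completeness, although this post focuses on SLAAC and stateless DHCPv6, a stateful DHCPv6 setup would look roughly like this on the server side (a sketch along the same lines as above, which I have not tested in this lab; the pool name and addresses are illustrative):

ipv6 dhcp pool STATEFUL_DHCP
 address prefix 2001:1111:1111::/64
 dns-server 2001:1111:1111::10
 domain-name test.com
!
interface GigabitEthernet1/0
 ipv6 address 2001:1111:1111::1/64
 ipv6 nd managed-config-flag
 ipv6 dhcp server STATEFUL_DHCP

Remember the note above: even with stateful DHCPv6, clients still learn their default gateway from the RA messages.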