NFV and Open Networking with RHEL OpenStack Platfrom

(This is a summary version of a talk I gave at Intel Israel Telecom and NFV event on December 2nd, 2015. Slides are available here)

I was honored to be invited to speak on a local Intel event about Red Hat and what we are doing in the NFV space. I only had 30 minutes, so I tried to provide a high level overview of our offering, covering some main points:

  • Upstream first approach and why we believe it is a fundamental piece in the NFV journey; this is not a marketing pitch but really how we deliver our entire product portfolio
  • NFV and OpenStack; I touched on the fact that many service providers are asking for OpenStack-based solutions, and that OpenStack is the de-facto choice for VIM. That said, there are some limitations today (both cultural and technical) with OpenStack and clearly we have a way to go to make it a better engine for the telco needs
  • Full open source approach to NFV; it’s not just OpenStack but also other key projects such as QEMU/KVM, Open vSwitch, DPDK, libvirt, and the underlying Linux operating system. It’s hard to coordinate across these different communities, but this is what we are trying to do, with active participants on all of those
  • Red Hat product focus and alignment with OPNFV
  • Main use-cases we see in the market (atomic VNFs, vCPE, vEPC) with a design example of vPGW using SR-IOV
  • What telco and NFV specific features were introduced in RHEL OpenStack Platform 7 (Kilo) and what is planned for OpenStack Platform 8 (Liberty); as a VIM provider we want to offer our customers and the Network Equipment Providers (NEPs) maximum flexibility for packet processing options with PCI Passthrough, SR-IOV, Open vSwitch and DPDK-accelerated Open vSwitch based solutions.

Thanks to Intel Israel for a very interesting and well-organized event!

 

Advertisements

OpenStack Networking with Neutron: What Plugin Should I Deploy?

(This is a summary version of a talk I gave at OpenStack Israel event on June 15th, 2015. Slides are available here)

Neutron is probably one of the most pluggable projects in OpenStack today. The theory is very simple and goes like this: Neutron is providing just an API layer and you have got to choose the backend implementation you want. But in reality, there are plenty of plugins (or drivers) to choose from and the plugin architecture is not always so clear.

The plugin is a critical piece of the deployment and directly affects the feature set you are going to get, as well as the scale, performance, high availability, and supported network topologies. In addition, different plugins offer different approaches for managing and operating the networks.

So what is a Neutron plugin?

The Neutron API exposed via the Neutron server is splitted into two buckets: the core (L2) API and the API extensions. While the core API consists only of the fundamental Neutron definitions (Network, Subnet, Port), the API extension is where the interesting stuff get to be defined, and where you can deal with constructs like L3 router, provider networks, or L4-L7 services such as FWaaS, LBaaS or VPNaaS.

In order to match this design, the plugin architecture is built out of a “core” plugin (which implements the core API) and one or more “service” plugins (to implement additional “advanced” services defined in the API extensions). To make things more interesting, these advanced network services can also be provided by the core plugin by implementing the relevant extensions.

What plugins are out there?

There are many plugins out there, each with its own approach. But when trying to categorize them, I found that usually it boils down to “software centric” plugins versus “hardware centric” plugins.

With the software centric ones, the assumption is that the network hardware is general-purpose, and the functionality is offered, as the name implies, with software only. This is where we get to see most of the overlay networking approaches with the virtual tunnel end-points (VTEP) implemented in the Compute/Hypervisor nodes. The requirements from the physical fabric is to provide only basic IP routing/switching. The plugin can use an SDN approach to provision the overly tunnels in an optimal manner, and handle broadcast, unknown unicast and multicast (BUM) traffic efficiently.

With the hardware centric ones, the assumption is that a dedicated network hardware is in place. This is where the traditional network vendors usually offer a combined software/hardware solution taking advantage of their network gear. The advantages of this design is better performance (if you offload certain network function to the hardware) and the promise of better manageability and control of the physical fabric.

And what is there by default?    

There are efforts in the Neutron community to completely separate the API (or control-plane components) from the plugin or actual implementation. The vision is to position Neutron as a platform, and not as any specific implementation. That being said, Neutron was really developed out of the Open vSwitch plugin, and some good amount of the upstream development today is still focused around that. Open vSwitch (with the OVS ML2 driver) is what you get by default, and this is by far the most common plugin deployed in production (see the recent user survey here). This solution is not perfect and has pros and cons like any other of the solutions out there.

While Open vSwitch is used on the Compute nodes to provide connectivity for VM instances, some of the key components with this solution are actually not related to Open vSwitch. L3 routing, DHCP, and other services are implemented using dedicated software agents using Linux tools such as network namespaces (ip netns), dnsmasq, or iptables.

So how one should choose a plugin?

I am sorry, but there is no easy answer here. From my experience, the best way is to develop a methodological approach:

1. Evaluate the default Open vSwitch based solution first. Even if you end up not choosing it for your production environment, it should at least get you familiar with the Neutron constructs, definitions and concepts

2. Get to know your business needs, and collect technical requirements. Some key questions to answer:

  • Are you building a greenfield deployment?
  • What level of interaction is expected with your existing network?
  • What type of applications are going to run in your cloud?
  • Is self-service required?
  • Who are the end-users?
  • What level of isolation and security is required?
  • What level of QoS is expected?
  • Are you building a multi cloud/multi data-center or an hybrid deployment?

3. Test things up yourself. Don’t rely on vendor presentations and other marketing materials

What’s Coming in OpenStack Networking for the Kilo Release

A post I wrote for the Red Hat Stack blog on what’s coming in OpenStack Networking for the Kilo release

Red Hat Stack

KiloOpenStack  Kilo, the 11th release of the open source project, was officially released in April, and now is a good time to review some of the changes we saw in the OpenStack Networking (Neutron) community during this cycle, as well as some of the key new networking features introduced in the project.

Scaling the Neutron development community

The Kilo cycle brings two major efforts which are meant to better expand and scale the Neutron development community: core plugin decomposition and advanced services split. These changes should not directly impact OpenStack users but are expected to reduce code footprint, improve feature velocity, and ultimately bring faster innovation speed. Let’s take a look at each individually:

Neutron core plugin decomposition

Neutron, by design, has a pluggable architecture which offers a custom backend implementation of the Networking API. The plugin is a core piece of the deployment and acts as the “glue”…

View original post 2,596 more words

An Overview of Link Aggregation and LACP

The concept of Link Aggregation (LAG) is well known in the networking industry by now, and people usually consider it as a basic functionality that just works out of the box. With all of the SDN hype that’s going on out there, I sometimes feel that we tend to neglect some of the more “traditional” stuff like this one. As with many networking technologies and protocols, things may not just work out of the box, and it’s important to master the details to be able to design things properly, know what to expect to (i.e., what the normal behavior is) and ultimately being able to troubleshoot in case of a problem.

The basic concept of LAG is that multiple physical links are combined into one logical bundle. This provides two major benefits, depending on the LAG configuration:

  1. Increased capacity – traffic may be balanced across the member links to provide aggregated throughput
  2. Redundancy – the LAG bundle can survive the loss of one or more member links

LAG is defined by the IEEE 802.1AX-2008 standard, which states, “Link Aggregation allows one or more links to be aggregated together to form a Link Aggregation Group, such that a MAC client can treat the Link Aggregation Group as if it were a single link”. This layer 2 transparency is achieved by the LAG using a single MAC address for all the device’s ports in the LAG group. The individual port members must be of the same speed, so you cannot bundle for example a 1G and 10G interfaces. The ports should also have the same duplex settings, encapsulation type (i.e., access/untagged or 802.1q tagged with the exact same number of VLANs) as well as MTU.

LAG can be configured as either static (manually) or dynamic by using a protocol to negotiate the LAG formation, with LACP being the standard-based one. There is also the Port Aggregation Protocol (PAgP), which is similar in many regards to LACP, but is Cisco proprietary and not in common usage anymore.

LAG blog (1)

Wait… LAG, bond, bundle, team, trunk, EtherChannel, Port Channel?

Let’s clear this right away – there are several acronyms used to describe LAG which are sometimes used interchangeably. While LAG is the standard name defined by the IEEE specification, different vendors and operating systems came up with their own implementation and terminology. Bond, for example, is really known on Linux-based systems, following the name of the kernel driver. Team (or NIC teaming) is also pretty common across Windows systems, and lately Linux systems as well. EtherChannel is one of the famous terms, being used on Cisco’s IOS. Interesting enough, Cisco have changed the term in their IOS-XR software to bundles, and in their NX-OS systems to Port Channels. Oh… I love the standardization out there!

LAG can also be used as a general term to describe link aggregation with different technologies (such as MLPPP for PPP links) which can cause some confusion, while Ethernet is the de facto standard and the focus of the IEEE spec.

Use cases

Today, Link Aggregations can be found in many network designs, and across different portions of the network. LAG can be found across the Enterprise, Data Center, and Service Provider networks. In the cloud and virtualization space, it’s also common to want to use multiple network connections in your hypervisors to support Virtual Machine traffic. So you can have LAG configured between different network devices (for e.g., switch to switch, router to router), or between an end host or hypervisor and the upstream network device (usually some sort of a ToR switch).

L2 LAG and STP

From Spanning Tree Protocol (STP) perspective, no matter how many physical ports are being used to form the LAG, there is going to be only one logical interface representing each LAG bundle. The individual ports are not part of the STP topology, but only the one logical interface. STP is still going to be active on the LAG interface and should not be turned off, so that if there are multiple LAGs configured between two adjacent nodes, STP will block one of them.

LAG blog (2)

LAG blog (3)

L3 LAG

While LAG is extremely common across L2 network designs, and sometimes even seen as a partial replacement for Spanning Tree Protocol (STP), it is important to mention that LAG can also operate at L3, i.e, by assigning an IPv4 or IPv6 subnet to the aggregated link. You can then setup static or dynamic routing over the LAG like any other routed interface.

LAG versus MC-LAG

By definition, LAG is formed across two adjacent nodes which are directly connected to each other. The two nodes must be configured properly to form the LAG, so that traffic would be transferred properly between the nodes without a fear of creating traffic loops between the individual members for example.

MC-LAG, or Multi-Chassis Link Aggregation Group, is a type of LAG with constituent ports that terminate on separate chassis, thereby providing node-level redundancy. Unlike link aggregation in general, MC-LAG is not covered under IEEE standard, and its implementation varies by vendor. Cisco’s vPC is a good example for a MC-LAG implementation. The real challenge with MC-LAG is to maintain a consistent control plane state across the LAG setup, which is why the various multi-chassis mechanisms insist on countermeasures such as peer links or out of band connectivity between the redundant chassis.

LAG blog (4)

Load sharing operation

Traffic is not randomly placed across the LAG members, but instead shared using a deterministic hash algorithm. Depending on the platform and the configuration, a number of parameters may feed into the algorithm, including for example the ingress interface, source and/or destination MAC address, source and/or destination IP address, source and/or destination L4 (TCP/UDP) port numbers, MPLS labels, and so on.

Ultimately the hash will take in some combination of parameters to identify a flow and decide to which member link the frame should be placed in. It is important to note that all traffic for a particular flow will always be placed on the same link. That’s also means that traffic for a single flow (e.g., source and destination MAC) cannot exceed the bandwidth of a single member link. It is also important to note that each node (or chassis) performs the hash calculations locally itself, so that upstream and downstream traffic for a single flow will not necessarily traverse the same link.

Static configuration

The basic way to form a LAG is to simply specify the member ports on each node manually. This method does not involve any protocols to negotiate and form the LAG. Depending on the platform, the user can also control the hash algorithm on each side. As soon as a port becomes physically up it becomes a member of the LAG bundle. The major advantage of this is that the configuration is very simple. The disadvantage is that there is no method to detect any kind of cabling or configuration errors, which is most vendors would recommend a LACP configuration instead.

LACP configuration

LACP is the standards based protocol used to signal LAGs. It detects and protects the network from a variety of misconfiguration, ensuring that links are only aggregated into a bundle if they are consistently configured and cabled. LACP can be configured in one of two modes:

  • Active mode – the device immediately sends LACP messages (LACP PDUs) when the port comes up
  • Passive mode – Places a port into a passive negotiating state, in which the port only responds to LACP PDUs it receives but does not initiate LACP negotiation

If both sides are configured as active, LAG can be formed assuming successful negotiation of the other parameters. If one side is configured as active and the other one as passive, LAG can be formed as the passive port will respond to the LACP PDUs received from the active side. If both sides are passive, LACP will fail to negotiate the bundle. In practice it is rare to find passive mode used as it should be clearly and consistently defined which links will use LACP/LAG ahead of deployment. There are even vendors who does not offer the passive mode option at all.

With LACP, you can also control the timeout interval in which LACP PDUs will be sent. The standard defines two intervals: fast (1 second) and slow (30 seconds). Note that the timeout value does not have to agree between peers. While it is not a recommended configuration, it is possible to bring up a LAG with one end sending every 1 second and the other sending every 30 seconds. Depending on the platform and configuration, it is also possible to use Bidirectional Forwarding Detection (BFD) for fast detection of link failures.

The need for Network Overlays – part II

In the previous post, I covered some of the basic concepts behind network overlays, primarily highlighting the need to move into a more robust, L3 based, network environments. In this post I would like to cover network overlays in more detail, going over the different encapsulation options and highlighting some of the key points to consider when deploying an overlay-based solution.

Underlying fabric considerations

While network overlays give you the impression that networks are suddenly all virtualized, we still need to consider the physical underlying network. No matter what overlay solution you might pick, it’s still going to be the job of the underlying transport network to switch or route the traffic from source to destination (and vice versa).

Like any other network design, there are several options to choose from when building the underlying network. Before picking up a solution, it’s important to analyze the requirements – namely the scale, amount of virtual machines (VMs), size of the network as well as the amount of traffic. Yes, there are some fancy network fabric solutions out there from any of your favorite vendors, but simple L3 Clos network will do just fine. The big news here is that the underlying network should no longer be a L2 bridged network, but can be configured as a L3 routed network. Clos topology with ECMP routing can provide efficient non-blocking forwarding with a quick convergence time in a case of a failure. Known protocols such as OSPF, IS-IS, and BGP, with the addition of a protocol like BFD, can provide a good standard-based foundation for such a network. One thing I do want to highlight when it comes to the underlying network, is the requirement to support Jumbo frames. No matter what overlay encapsulation you may choose to implement, extra bytes of header will be added to the frames, resulting in a need for high MTU support from the physical network.

For the virtualization/cloud admin, with overlay networks, the data network used to carry the overly traffic is no longer a special network that requires careful VLAN configuration. It is now just one more infrastructure network used to provide simple TCP/IP connectivity.

Encapsulation

When it comes to the overlay data-plane encapsulation, the amount of discussions, comparisons and debate out there is amazing. There are several options and standards available, all of them have the same goal: provide an emulated L2 networks over IP infrastructure. The main difference between them is the encapsulation format itself and their approach to the control plane – which is essentially the way to obtain MAC-to-IP mapping information for the tunnel end-points.

It all started with the well-known Generic Routing Encapsulation (GRE) protocol that was rebranded as NVGRE. GRE is a simple point-to-point tunneling protocol which is being used in todays networks to solve various design challenges and therfore is well understood by many network engineers. With NVGRE, the inner frame is being encapsulated with GRE encapsulation as specified in RFC 2784 and RFC 2890. The Key field (32 bits) in the GRE header is used to carry the Tenant Network Identifier (TNI) and is used to isolate different logical segments. One thing to note about GRE is the fact that it uses IP protocol number 47 for communication, i.e., it does not use TCP or UDP – which make it hard to create header entropy. Header entropy is something that you really want to have if you are using a flow-based ECMP network to carry the overlay traffic. Interesting enough, the authors of NVGRE do not cover the control plane part but only the data-plane considerations.

Other option would be Virtual Extensible LAN (VXLAN). Unlike NVGRE, VXLAN is a new protocol that was designed to solve the overlay networks use case. It uses UDP for communication (port 4789) and a 24-bit segment ID known as the VXLAN network identifier (VNID). With VXLAN, a hash of the inner frame’s header is used as the VXLAN source UDP port. As a result, a VXLAN flow can be unique, with the IP addresses and UDP ports combination in its outer header while traversing the underlay physical network. Therefore, the hashed source UDP port introduces a desirable level of entropy for ECMP load balancing. When it comes to the control plane, VXLAN does not provide any solution, but instead relies on flooding emulated with IP multicast. The original standard recommends to create an IP multicast group per VNI to handle broadcast traffic within a segment. This requires support for IP multicast on the underlying physical network as well as proper configuration and maintenance of the various multicast trees. This approach may work for small scale environments, but for large environments with good number of logical VXLAN segments this is probably not a good idea. It also important to note here that while IP multicast is a clever way to handle IP traffic, it is not commonly implemented in Data Center networks today, and the requirement to deploy an IP multicast network (which can be fairly complex) just to introduce VXLAN is not something that is accepted in most cases. These days, it is common to see “unicast mode” VXLAN implementations that do not require any kind of multicast support.

You may also have heard about Stateless Transport Tunneling Protocol (STT) which was originally introduced by Nicira (now VMware NSX). The main reason I decided to mention STT here is one of its benefits: the ability to leverage TCP offloading capabilities from existing physical NICs, resulting in improved performance. STT uses a header that looks just like the TCP header to the NIC. The NIC is thus able to perform Large Segment Offload (LSO) on what it thinks is a simple TCP datagram. That said, new generation NICs also offer offload capabilities for NVGRE and VXLAN, so this is not a unique benefit of STT anymore.

Last but not least, I would also like to introduce Geneve: Generic Network Virtualization Encapsulation, which looks to take a more holistic view of tunneling. From a first look, Geneve looks pretty much similar to VXLAN. It uses a UDP-based header and a 24 bit Virtual Network Identifier. So what is unique about Geneve? The fact that it uses an extendable header format, similar to (long-living) protocols such as BGP, LLDP, and IS-IS. The idea is that Geneve can evolve over time with new capabilities, not by revising the base protocol, but by adding new optional capabilities. The protocol has a set of fixed header, parameters and values, but then leave room for non-defined optional fields. New fields can be added to the protocol by simply defining and publishing them. The protocol is created in such a way that implementations know there may be optional fields that they may or may not understand. Although the protocol is new, there is some work to enable Open vSwitch support as well as NIC vendors announcing support for offloading capabilities.

I also want to leave room here for some other protocols that can be used as an encapsulation option. There is nothing wrong with MPLS for example, other than the fact that it requires to be enabled throughout the underlying transport network.

So should I pick a winner? probably not. As you can see you have got some options to choose from, but let’s make it clear: all protocols discussed above are ignoring the real problems (hint: control-plane operations) and providing a nice framework for data-plane encapsulation, which is just part of the deal. If I need to pick one, I would say that it looks like VXLAN and Geneve are here to stay (but we should let the market decide).

Tunnel End Point

I have already mentioned the term tunnel end-point, sometimes refer to as VTEP, earlier. But what is this end-point, and more importantly, where is it located? The function of VTEP is to encapsulate the VM traffic within an IP header to send across the underlying IP network. With the most common implementations, the VMs are unaware of the VTEP. They just send untagged or VLAN-tagged traffic that needs to classified and associated with a VTEP. Initial designs (which are still the most common ones) implemented the VTEP functionality within the hypervisor which houses the VMs, usually in the software vSwitch. While this is a valid solution that is probably here to stay, it also worth mentioning an alternative design in which the VTEP functionality is implemented in hardware, for e.g., within a top-of-rack (ToR) switch. This makes sense is some environments, especially where performance and throughput is critical.

Control plane or flooding

Probably the most interesting question to ask when picking an overlay network solution is what’s going on with the control plane and how the network is going to handle Broadcast, Unknown unicast and Multicast traffic (sometimes refer to as BUM traffic). I am not to going to provide easy answers here, simply because of the fact that there are plenty of solutions out there, each addresses this problem differently. I just want to emphasize that the protocol you are going to use to form the overly network (e.g., NVGRE, VXLAN, or what have you) is essentially taking care only for the data-plane encapsulation. For control plane you will need to rely either on flooding (basically continue to learn MAC addresses via the “flood and learn” method to ensure that the packet reaches all the other tunnel end-points), or consulting some sort of database which includes the MAC to IP bindings in the network (e.g., an SDN controller).

Connectivity with the outside world

Another factor to consider is the connectivity with the outside world – or how can a VM within an overlay network communicate with a device resides outside of the network. No matter how much overlays would be popular throughout the network, there are still going to be devices inside and outside of the Data Center that speaks only native IP or understand just 802.1Q VLANs. In order to communicate with those the overlay packet will need to get into some kind of a gateway that is capable of bridging or routing the traffic correctly. This gateway should handle the encapsulation/decapsulation function and provide the required connectivity. As with the control plane considerations, this part is not really covered in any of the encapsulation standards. Common ways to solve this challenge is by using virtual gateways, essentially logical routers/switches implemented in software (take a look on Neutron’s l3-agent to see how OpenStack handle this), or by introducing dedicated physical gateway devices.


Are overlays the only option?

I would like to summarize this post by emphasizing that overlays are an exciting technology which probably makes sense in certain environments. As you saw, an overly-based solution needs to be carefully designed, and as always depends on your business and network requirements. I also would like to emphasize that overlays are not the only option to scale-out networking, and I have seen some cool proposals lately which are probably deserve their own post.

The need for Network Overlays – part I

The IT industry has gained significant efficiency and flexibility as a direct result of virtualization. Organizations are moving toward a virtual datacenter model, and flexibility, speed, scale and automation are central to their success. While compute, memory resources and operating systems were successfully virtualized in the last decade, primarily due to the x86 server architecture, networks and network services have not kept pace.

The traditional solution: VLANs

Way before the era of server virtualization, Virtual LANs (or 802.1q VLANs) were used to partition different logical networks (or broadcast domains) over the same physical fabric. Instead of wiring a separate physical infrastructure for each group, VLANs were used efficiently to isolate the traffic from different groups or applications based on the business needs, with a unique identifier allocated to each logical network. For years, a physical server represented one end-point from the network perspective and was attached to an “access” (i.e., untagged) port in the network switch. The access switch was responsible to enforce the VLAN ID as well as other security and network settings (e.g., quality of service). The VLAN ID is a 12-bit field allowing a theoretical limit of 4096 unique logical networks. In practice though, most switch vendors support much lower number to be configured. You should remember that for each active VLAN in a switch, a VLAN database need to be maintained for proper mapping of the physical interfaces and the MAC addresses associated with the VLAN. Furthermore, some vendors would also create a different spanning-tree (STP) instance for each active VLAN on the switch which require additional memory cycles.

VLANs are a perfect solution for small-scale environments, where the number of end-points (and MAC addresses respectively) is small and controlled. With virtualization though, one server, now called hypervisor, can host many virtual machines and many network end-points. As I stated before, the networks have not kept pace, and the easiest (and also rational) thing to do was to reuse the good-old VLANs. We were essentially adding an additional layer of software access switch in the hypervisor to link the different virtual machines on the host, and those server “access” ports in the physical switch that traditionally were untagged, are now expecting tagged traffic with different VLAN IDs differentiating between the virtual machine networks. The main issue here is the fact that the virtual machines MAC addresses must be visible end-to-end throughout the network core. Reminder: VLANs must be properly configured on each switch along the path, as well as on the appropriate interfaces to get end-to-end MAC learning and connectivity.

In a virtualized world, where the number of end-points is constantly increasing and can be very high, VLANs is a limited solution that does not follow one of the main participles beyond virtualization: use of software application to divide one physical resource into multiple isolated virtual environments. Yes, VLANs does offer segmentation of different logical networks (or broadcast domains) over the same physical fabric, but you still need to manually provision the network and make sure the VLANs are properly configured across the network devices. This start to become a management and configuration nightmare and simply does not scale.

Where network vendors started to be (really) creative

 

At this point, when there was no doubt that VLANs and traditional L2 based networks are not suitable for large virtualized environments, plenty of network solutions were raised. I don’t really want to go into detail on any of those, but you can look for 802.1Qbg, VM-FEX, FabricPath, TRILL, 802.1ad (QinQ), and 802.1ah (PBB) to name a few. In my view, these are over complicating the network while ignoring the main problem – L2-based solution is a bad thing to begin with, and we should have looked for something completely different (hint: L3 routing is your friend).

Overlays to the rescue

 

L3 routing is a scalable and well-known solution (it runs the Internet, isn’t it?). With proper planning, routing domains can handle massive number of routes/networks, keeping the broadcast (and failure) domains small. Furthermore, most modern routing operating systems can utilize equal-cost multi-path (ECMP) routing, effectively load-sharing the traffic across all available routed links. In contrast, by default spanning-tree protocol (STP) blocks redundant L2 switched links to avoid switching loops, simply because that there is no way to handle loops within a switched environment (there is no “time-to-live” field within an Ethernet frame).

Routing sounds a lot better, but note that L2 adjacency is required by most applications running inside the virtual machines. L2 connectivity between the virtual machines is also required for virtual machine mobility (e.g., Live Migration in VMware terminology). This is where overlay networks enter the picture; an overlay network is a computer network which is built on top of another network. Using an overlay, we can build a L2 switched network on top of a L3 routed network. Don’t get it wrong – overlays are not a new networking concept and are already used extensively to solve many network challenges (see GRE tunneling and MPLS L2/L3 VPNs for some examples and use cases).

In the next post I will bring the second part of this article, diving into the theory behind network overlays and the way they tend to solve the network virtualization case.