High Availability and Redundancy in a Single ExpressRoute Circuit

Microsoft has designed its ExpressRoute circuits to support high availability and redundancy.

Physical diversity

Redundancy starts with basic physical diversity. This protects you against network outages due to equipment failure or hardware maintenance.

  • Each ExpressRoute circuit is pre-configured to include a primary and secondary connection.
  • At the physical level, the primary and secondary connections originate from two different ports on two physically separated Microsoft Enterprise Edge (MSEE) routers. From there, two separate cables complete the cross connects to PacketFabric equipment.
  • As a Microsoft ExpressRoute connectivity provider, we are required to match their redundancy measures at the MSEE.

Figure: Azure ExpressRoute physical diversity in the peering location

Physical diversity beyond the cloud on-ramp

The redundancy that has been put in place at the cloud on-ramp should extend into your on-premises network.

There are two main failure points to consider before the connections terminate:

  • Your PacketFabric source ports (the ports connected to your equipment via cross connect)

    Do not select the same source port for both hosted cloud connections. However, both ports should be in the same metro. If they are in the same facility, they should be in different PacketFabric availability zones.

    NOTE: Because the primary and secondary connections are load balanced to operate in active-active mode, you should keep them within the same geographic location. If you spread them out, the differences in latency could adversely affect performance.
  • Your on-premises equipment

    If possible, you should run your physical cross connects to two separate devices.

Figure: Azure ExpressRoute physical diversity extending to on-premises

Provisioning the secondary connection

To provision the secondary circuit connection, simply create another PacketFabric hosted cloud connection using the same service key.

Enter the same VLAN IDs, capacity, and on-ramp location. For better redundancy, you should select a different source port. That port should be in a different availability zone than the primary connection.

NOTE: Although you are automatically allocated a primary and secondary circuit connection within the Microsoft Enterprise Edge (MSEE), you are not required to use both.

We recommend that you set up both to avoid critical network outages, but it is ultimately up to you to determine your network design.

You will not “lose” the secondary connection if you do not configure it immediately. It will remain available to you as long as your ExpressRoute remains active.

Active-active mode

By default, the primary and secondary connections work in active-active mode, and Microsoft handles load balancing accordingly.

You can use route advertisements to force the connections to work in active-passive mode. However, Microsoft does not recommend doing so. For more information, see Microsoft - Active-active connections.
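
If you do decide to run active-passive, one common way to influence the path is AS path prepending on the connection you want to act as the backup. The following is only a sketch for a Cisco router: the route-map name PREPEND-SECONDARY and the secondary neighbor address 192.168.10.5 are hypothetical placeholders, and the local AS (65514) is reused from the BGP timeout example in this article. It only affects traffic flowing from Microsoft toward your network; check Microsoft's routing documentation before relying on it.

router bgp 65514
 ! Hypothetical secondary peer: apply the prepend policy to outbound advertisements
 neighbor 192.168.10.5 route-map PREPEND-SECONDARY out
!
route-map PREPEND-SECONDARY permit 10
 ! A longer AS path makes routes learned over the secondary connection less preferred
 set as-path prepend 65514 65514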

BGP timeout

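Microsoft sets its default BGP keepalive interval to 60 seconds and its hold time to 180 seconds. If no keepalive is received for three consecutive intervals (180 seconds), the BGP session is declared down and all traffic fails over to the other router.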

If 180 seconds is too long, you can use timers to configure a shorter timeout period on your on-premises router.

For example, to send keepalives every 30 seconds and fail over after 90 seconds, you would specify the following (Cisco router):

router bgp 65514
 neighbor 192.168.10.1 timers 30 90

BFD

Bidirectional Forwarding Detection (BFD) can detect network failures in less than a second, without consuming as many resources as a low-threshold BGP timer. BFD is fully supported in ExpressRoute and across all PacketFabric connections.

For more information, see Microsoft - Configure BFD over ExpressRoute.
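
As a rough sketch, enabling BFD for the same BGP neighbor on a Cisco router could look like the following. The subinterface name is a hypothetical placeholder, and the timer values (300 ms transmit and receive intervals, multiplier of 3) are illustrative assumptions rather than values taken from this article; confirm the currently supported values in Microsoft's BFD documentation.

interface GigabitEthernet0/0/0.100
 ! Hypothetical subinterface facing the ExpressRoute peering
 bfd interval 300 min_rx 300 multiplier 3
!
router bgp 65514
 ! Tear down the BGP session as soon as BFD reports the neighbor unreachable
 neighbor 192.168.10.1 fall-over bfd

With these assumed values, a failure would be declared after three missed 300 ms intervals, or roughly 900 ms.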

Microsoft availability zones and zone-redundant gateways

A Microsoft availability zone comprises one or more unique data centers within a region. This means that each availability zone is completely physically separated from the others, including vital support systems such as electricity and cooling.

You can leverage availability zones in your ExpressRoute architecture through zone-redundant gateways. Each VNet can have only one ExpressRoute gateway, but a gateway can run multiple instances, and these instances can be distributed across availability zones:

Figure: Microsoft Azure availability zones (Source: Microsoft)

Requirements

To implement zone-redundant gateways, the following conditions must be met:

  • Your ExpressRoute gateway is in one of the following regions:

    Central US
    East US
    East US 2
    West US 2
    France Central
    North Europe
    UK South
    West Europe
    Japan East
    Southeast Asia
    Australia East

  • Each ExpressRoute gateway is associated with a Microsoft Public IP Address (this is an Azure resource, not a generic IP address). You can create this at any time, or create one while you are creating the gateway.

    The public IP address needs to have a Standard SKU. The default SKU is Basic.

  • The public IP address must have the same SKU as the Azure Load Balancer. This means that if you are planning on using an Azure Load Balancer, it will also need to have the Standard SKU. While the Basic SKU load balancer is free, the Standard one is not.

    An Azure Load Balancer is not required, but it can be useful for routing traffic within a VNet and for configuring outbound traffic. For more information, see Microsoft - What is Azure Load Balancer?

  • The ExpressRoute gateway must have one of the following SKUs:

    ErGw1AZ
    ErGw2AZ
    ErGw3AZ

For more information, see Microsoft - About ExpressRoute virtual network gateways.