Virtual Switching System (VSS)

Traditional enterprise campus and data center designs, which have grown extensively over the last 5 years, run into a few challenges. These networks were built for scalability, layered services and high availability. But, traditional as they are, they depend on loop-prevention protocols such as STP, RSTP, PVST+ and Rapid-PVST+.
Adding campus building blocks to an enterprise network leads to a large increase in routing peers, longer routing reconvergence times, and asymmetric routing.
Adding access building blocks based largely on Layer-2 links does not scale at all. It even increases L2 convergence times, and you eventually end up with a network where at most 50% of your available bandwidth can be used, due to blocked links.
And then we have server access virtualization: numerous dual-homed servers, hosting up to 50 virtual servers per physical blade.

VSS

This is where VSS (Virtual Switching System) comes in. It virtualizes 2 physical chassis switches (preferably Catalyst 6500 with Sup720, or 4500-E or 4500-X) into one fabric: one virtualised control and data plane, tied together via a VSL (Virtual Switch Link) running VSLP (Virtual Switch Link Protocol).
Downstream we can use L3 ECMP (Equal Cost Multi-Path) or L2/L3 MEC (Multichassis EtherChannel), depending on the complete topology and requirements.
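As a rough idea of what that looks like on a 6500, here is a minimal sketch of the VSL setup on the first chassis, followed by a MEC once the pair runs as one switch. The domain number, port-channel numbers and interface numbers are assumptions made up for illustration, not values from a real design.

```
! --- Chassis 1, before conversion (all numbers are illustrative assumptions) ---
switch virtual domain 100
 switch 1
!
interface Port-channel10
 description VSL to chassis 2
 switch virtual link 1
 no shutdown
!
interface range TenGigabitEthernet5/4 - 5
 channel-group 10 mode on
 no shutdown
!
! Chassis 2 gets the same domain with "switch 2", its own port-channel
! number and "switch virtual link 2". Conversion is then started from
! privileged EXEC on each chassis with: switch convert mode virtual
!
! --- After conversion: a MEC with one member port per chassis ---
! Interfaces are now numbered switch/slot/port.
interface range GigabitEthernet1/2/1 , GigabitEthernet2/2/1
 channel-group 30 mode active
```

The downstream device just sees one LACP neighbor, so no link has to be blocked by spanning tree.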

Before we discuss L3 routing with ECMP or L2/L3 MECs, let’s look at the control and data plane first.
Why use a single supervisor per chassis, and why should you consider 2 supervisors per chassis? What is SSO?

Stateful Switchover (SSO)

Consider having only one chassis with two supervisors in your enterprise access or distribution network. Without SSO, a failover from a failed supervisor or Route Processor (RP) could take as long as the full boot time of the supervisor, which in most cases is too long and causes an outage. With SSO, one supervisor is active for the control plane and both supervisors are active for the data plane. The two supervisors synchronize a few important items:
• Startup-config
• Vlan.dat
• BOOT ROMMON variable
• CONFIG_FILE ROMMON variable
• BOOTLDR ROMMON variable
• DIAG ROMMON variable
• SWITCH_NUMBER ROMMON variable
Let’s call these things SSO-aware or HA-aware.

There are a few restrictions though:
• Both supervisors or RPs must run the same IOS software version
• There is no support for load-sharing between RPs
• Enhanced Object Tracking is not SSO-aware, so don’t use it for your FHRP
• Multicast is not SSO-aware and will restart when a failover occurs
• Make sure the configuration register value is the same on both RPs; otherwise the failover will fail and cause a longer outage.
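A quick way to check the version and config-register restrictions before relying on SSO is sketched below. The value 0x2102 is only an example (it happens to be the one in the output further down); use whatever your design calls for, as long as both RPs match.

```
! Compare IOS image, BOOT variable and config register of both supervisors
CORE-01# show version
CORE-01# show bootvar
!
! Set the register explicitly if the two values differ (0x2102 is an example)
CORE-01# configure terminal
CORE-01(config)# config-register 0x2102
CORE-01(config)# end
CORE-01# copy running-config startup-config
```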

Try to see SSO as a switch-over for L2 HA-aware applications.

Let’s configure the sucker:

CORE-01>enable
CORE-01#config t
CORE-01(config)# redundancy
CORE-01(config-red)# mode sso
CORE-01(config-red)# end
!
CORE-01# show redundancy 
Redundant System Information :
------------------------------
       Available system uptime = 7 days, 33 minutes
Switchovers system experienced = 0
              Standby failures = 1
        Last switchover reason = none

                 Hardware Mode = Duplex
    Configured Redundancy Mode = sso
     Operating Redundancy Mode = sso
              Maintenance Mode = Disabled
                Communications = Up

Current Processor Information :
-------------------------------
               Active Location = slot 5
        Current Software state = ACTIVE
       Uptime in current state = 7 days, 33 minutes
                 Image Version = Cisco IOS Software, s2t54 Software ... 
Synced to ... 
Copyright (c) 1986-2011 by Cisco Systems, Inc.
Compiled ... 
                          BOOT = disk0:0726_c4,12
                   CONFIG_FILE = 
                       BOOTLDR = 
        Configuration register = 0x2102

Peer Processor Information :
----------------------------
              Standby Location = slot 6
        Current Software state = STANDBY HOT 
       Uptime in current state = 7 hours, 31 minutes
                 Image Version = Cisco IOS Software, s2t54 Software ... 
Synced to ... 
Copyright (c) 1986-2012 by Cisco Systems, Inc.
Compiled ...
                          BOOT = disk0:0726_c4,12
                   CONFIG_FILE = 
                       BOOTLDR = 
        Configuration register = 0x2102

CORE-01#
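
To see whether the standby really takes over, a controlled switchover can be triggered by hand. A minimal sketch, assuming a maintenance window and a standby in STANDBY HOT state as in the output above:

```
! Confirm the peer is ready before touching anything
CORE-01# show redundancy states
!
! Force the active supervisor to hand over to the standby
CORE-01# redundancy force-switchover
!
! After the switchover, check which slot is active now
CORE-01# show redundancy
```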

Dual Supervisor setup in VSS

Consider the following diagram:
[Image: VSS-CORE]
If one supervisor within the VSS fails, the following will occur:
• Local data plane failure
• Interfaces on linecards local to the failed supervisor will be shut down
• A 50% decrease in available bandwidth
• Single-attached devices will be locked out of the active network
• Recovery relies on a manual process (install, configure, etc.)
• Recovery of the failed supervisor is non-deterministic, depending on a field engineer to drive over and fix the problem.

Expect the problem to take up to 8 hours after the failure to be resolved.

Quad Supervisor setup in VSS

Each chassis in the VSS contains 2 supervisors. Sounds expensive! It is expensive!
But let’s look at the failure process.
[Image: VSS_CORE_QUAD]
What will happen?

As soon as the second supervisor is inserted into each chassis, it will boot up as a linecard with all ports active. Please note that this is only supported from IOS version 12.2(33)SXI4 and up. In earlier IOS versions the supervisor won’t boot beyond ROMMON, because the IOS version doesn’t support dual supervisors per chassis in VSS.
SSO makes sure that a failover to another supervisor happens statefully, meaning with a synchronized forwarding table (via VSLP) and an active data plane.

Then the active supervisor fails:
• SSO will ensure the hot-standby becomes active within 0 to 3 seconds
• The data plane will remain active because the supervisors have already synced their forwarding tables.

Problem is solved in a matter of seconds without human intervention. Consider what could have been the financial loss without two standby supervisors.

When the reload of the failed supervisor is finished, it will come up as a standby supervisor in RPR-Warm mode and will remain in that role.
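
In a VSS the supervisor roles can be checked with the commands below. This is only a sketch of where to look; output is left out because it differs per software release.

```
! Role (active/standby) and priority of each chassis in the virtual switch
CORE-01# show switch virtual role
!
! Redundancy mode and state of the supervisors in both chassis
CORE-01# show switch virtual redundancy
!
! State of the VSL between the two chassis
CORE-01# show switch virtual link
```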

Let’s have a look at the control plane level:

[Image: VSS-Control-Plane]

RPR-Warm means the supervisor has booted as a normal DFC linecard and is syncing some of its features with the in-chassis active supervisor. Please note that RPR-Warm doesn’t synchronize cross-chassis.
Features that are synced:
• Startup-config
• Vlan.dat
• BOOT ROMMON variable
• CONFIG_FILE ROMMON variable
• BOOTLDR ROMMON variable
• DIAG ROMMON variable
• SWITCH_NUMBER ROMMON variable -> otherwise the supervisor needs to be made VSS-aware manually

Data plane level:

[Image: VSS-Data-Plane]

Forwarding tables are in sync and data plane is active for all uplinks.

Bandwidth during Switch-over

Not only is the failover close to seamless, your bandwidth is also restored much more quickly.

Dual Sup Setup

[Image: Dual_SUP]

Quad Sup Setup

[Image: Quad-SUP]

Remember: The data plane switch-over will take between 0 and 3 seconds.

Non-Stop Forwarding

Where SSO is mostly used for L2 protocols, NSF is used in conjunction with SSO for Layer-3 routing protocols. The routing protocols must be NSF-capable or NSF-aware (HA-aware) to react properly to a supervisor switchover. NSF-capable means the router is actually configured for NSF and can keep forwarding through its own switchover; an NSF-aware router is not configured for it itself, but helps an NSF-capable neighbor through the restart. Remember that the RP (control plane) runs the routing protocols, while packet forwarding happens on the data plane. When an RP fails, the main concern is that packet forwarding continues. At that moment nothing is more important than continuous packet forwarding.

NSF relies on CEF or dCEF: the FIB keeps forwarding packets while the routing protocols reconverge. With CEF enabled, the line cards perform express forwarding themselves instead of the RP. CEF is enabled on most common platforms, such as the Catalyst 6500, the Cisco 7500 and the Cisco 12000 Internet router. Check it with the command ‘show ip cef’.
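Two commands for a quick CEF check are sketched here. The prefix is a made-up example, not one from this network.

```
! Overall CEF state and number of prefixes in the FIB
CORE-01# show ip cef summary
!
! FIB entry for one prefix (10.1.1.0/24 is an assumed example)
CORE-01# show ip cef 10.1.1.0 255.255.255.0
```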

The following routing protocols are HA-aware: OSPF, BGP, IS-IS and EIGRP, but each has its own restrictions. NSF is configured at the routing protocol level and shows up as follows:

CORE-01# show ip protocols
*** IP Routing is NSF aware ***
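
Turning NSF on is done under each routing process. A minimal sketch, where the process IDs and AS numbers are assumptions and the exact keywords can differ per IOS release:

```
router ospf 1
 nsf
!
router eigrp 100
 nsf
!
router bgp 65000
 bgp graceful-restart
!
router isis
 nsf cisco
```

The neighbors have to be at least NSF-aware for this to help, otherwise they drop the adjacency during the switchover and the routes disappear anyway.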

NSF still needs to be tuned per protocol; enabling it alone is not sufficient. Check the convergence timers of each protocol.

VSS Support for Service Modules

The following service modules are supported for a VSS setup:

Service Module | Minimum Cisco IOS Release | Minimum Module Release
Network Analysis Module (NAM-1 and NAM-2) (WS-SVC-NAM-1 and WS-SVC-NAM-2) | 12.2(33)SXH1 | 3.6(1a)
Application Control Engine (ACE10 and ACE20) (ACE10-6500-K9 and ACE20-MOD-K9) | 12.2(33)SXI | A2(1.3)
Intrusion Detection System Services Module (IDSM-2) (WS-SVC-IDSM2-K9) | 12.2(33)SXI | 6.0(2)E1
Wireless Services Module (WiSM) (WS-SVC-WISM-1-K9) | 12.2(33)SXI | 3.2.171.6
Firewall Services Module (FWSM) (WS-SVC-FWM-1-K9) | 12.2(33)SXI | 4.0.4

A few links:

VSS Deployment Best Practice

Cisco Stateful Switchover

Cisco Nonstop Forwarding
