EEM on Cisco Nexus 7000

I was recently confronted with an issue in one of our customers’ datacenters. The datacenter consists of Nexus 7000, 5000 and 4000 switches.

Via MRTG reports we saw that the CPU of both Nexus 7000 platforms reaches over 50% utilization. Not mind-blowing, but it is interesting to see what causes utilization spikes like these. The customer doesn’t even notice this in their application landscape, so we have time to investigate proactively.

How can we get the “show proc cpu sort” output at the moment the spike hits 50% or more? Yes, we let the switch send an SNMP trap based on a threshold of 50%. OK, then we know when it happens, but are we fast enough to get the results from the switch? I guess not.

Can EEM give us a hand? Well, actually it can! By applying a relatively simple action-based script we can define all sorts of things. At a minimum we need a message telling us that it happened and an action to capture the actual process list to troubleshoot the issue.

Here we go:

NX7K-SWITCH1# sh ver
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Documents: http://www.cisco.com/en/US/products/ps9372/tsd_products_support_series_home.html
Copyright (c) 2002-2011, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php

Software
  BIOS:      version 3.22.0
  kickstart: version 5.1(3)
  system:    version 5.1(3)
  BIOS compile time:       02/20/10
  kickstart image file is: bootflash:///n7000-s1-kickstart-npe.5.1.3.bin
  kickstart compile time:  12/25/2020 12:00:00 [03/11/2011 08:42:56]
  system image file is:    bootflash:///n7000-s1-dk9-npe.5.1.3.bin
  system compile time:     1/21/2011 19:00:00 [03/11/2011 09:37:35]
Hardware
  cisco Nexus7000 C7010 (10 Slot) Chassis ("Supervisor module-1X")
  Intel(R) Xeon(R) CPU         with 4109560 kB of memory.
  Processor Board ID JAF1446ARES

  Device name: NX7K-SWITCH1
  bootflash:    2029608 kB
  slot0:              0 kB (expansion flash)

Kernel uptime is 572 day(s), 18 hour(s), 15 minute(s), 14 second(s)

Last reset
  Reason: Unknown
  System version: 5.1(3)
  Service:

plugin
  Core Plugin, Ethernet Plugin
CMP (Module 5) ok
 CMP Software
  CMP BIOS version:        02.01.05
  CMP Image version:       5.1(1) [build 5.0(0.66)]
  CMP BIOS compile time:   8/ 4/2008 19:39:40
  CMP Image compile time:  1/21/2011 19:00:00

CMP (Module 6) ok
 CMP Software
  CMP BIOS version:        02.01.05
  CMP Image version:       5.1(1) [build 5.0(0.66)]
  CMP BIOS compile time:   8/ 4/2008 19:39:40
  CMP Image compile time:  1/21/2011 19:00:00

NX7K-SWITCH1#conf t
NX7K-SWITCH1(config)#event manager applet HIGH-CPU
NX7K-SWITCH1(config-applet)#event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.6.1 get-type exact entry-op ge entry-val 50 poll-interval 1
NX7K-SWITCH1(config-applet)#action 0.1 syslog msg HIGH CPU DETECTED, Process list written to flash
NX7K-SWITCH1(config-applet)#action 0.2 cli show process cpu sort >> bootflash:highcpu.txt
NX7K-SWITCH1(config-applet)#action 0.3 cli show process cpu hist >> bootflash:highcpu.txt
NX7K-SWITCH1(config-applet)#action 1.0 cli exit
NX7K-SWITCH1(config)#

To test the script, I would suggest temporarily lowering the threshold value from “50” to “10” or lower, so that the applet triggers under normal load.

NX7K-SWITCH1# dir
      49148    Jul 11 09:49:27 2011  as
      45187    Apr 28 16:30:47 2011  ftp
     351773    Feb 04 15:27:22 2013  highcpu.txt
      16384    Nov 20 13:39:40 2010  lost+found/
  146558933    Feb 11 18:30:42 2011  n7000-s1-dk9-npe.5.1.2.bin
  147087748    Apr 20 17:16:32 2011  n7000-s1-dk9-npe.5.1.3.bin
  107369112    Nov 20 13:42:31 2010  n7000-s1-dk9.5.0.2a.bin
   13564350    Apr 20 17:20:34 2011  n7000-s1-epld.5.1.1.img
   30674944    Apr 20 17:18:59 2011  n7000-s1-kickstart-npe.5.1.3.bin
   23613440    Nov 20 13:41:55 2010  n7000-s1-kickstart.5.0.2a.bin
        880    Feb 03 16:52:52 2011  test.cap
       4096    Nov 20 14:42:00 2010  vdc_2/
       4096    Nov 20 14:42:00 2010  vdc_3/
       4096    Nov 20 14:42:00 2010  vdc_4/

Copy the file to your FTP/TFTP server and view it:

PID    Runtime(ms)  Invoked   uSecs  1Sec    Process
-----  -----------  --------  -----  ------  -----------
 2998          179        51   3526    2.9%  netstack
    1       337902  10126403     33    0.0%  init
    2            2       268      9    0.0%  kthreadd
    3         6464   1017073      6    0.0%  migration/0
    4      3350498  938593011      3    0.0%  ksoftirqd/0
    5       283644   4082651     69    0.0%  watchdog/0
    6         5110    734897      6    0.0%  migration/1
    7      3478527  887067327      3    0.0%  ksoftirqd/1
    8        17289   4098761      4    0.0%  watchdog/1
    9       289762  50711910      5    0.0%  events/0
   10       272210  51359928      5    0.0%  events/1
   11            0        12     17    0.0%  khelper
   12         4184     15603    268    0.0%  kblockd/0
   13         3924     14479    271    0.0%  kblockd/1
   14            0         2      0    0.0%  kacpid
   15            0         2      0    0.0%  kacpi_notify
   16            0         4     21    0.0%  ksuspend_usbd
   17            0         5      2    0.0%  khubd
   18        75620   3610347     20    0.0%  pdflush
   19       148563   7119056     20    0.0%  pdflush
   20            0         2      2    0.0%  kswapd0
<OUTPUT OMITTED for readability>
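
Because every trigger appends another snapshot to highcpu.txt, the file grows quickly (over 350 kB in the listing above). A minimal Python sketch for picking out the busy processes is shown here. It assumes the column layout printed above (PID, Runtime(ms), Invoked, uSecs, 1Sec, Process); the function name busy_processes and the min_pct parameter are made up for this example and are not part of any Cisco tooling.

```python
import re

# Matches one data row of "show process cpu sort" as printed above:
# PID, Runtime(ms), Invoked, uSecs, 1Sec (with a trailing %), Process name.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+(?:\.\d+)?)%\s+(\S.*?)\s*$"
)


def busy_processes(text, min_pct=1.0):
    """Return (process, pid, pct) tuples with 1Sec CPU >= min_pct, busiest first."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, separator, graph or other command output
        pct = float(m.group(5))
        if pct >= min_pct:
            rows.append((m.group(6), int(m.group(1)), pct))
    return sorted(rows, key=lambda r: r[2], reverse=True)


# A few rows copied from the capture above (assumed layout).
SAMPLE = """\
PID    Runtime(ms)  Invoked   uSecs  1Sec    Process
-----  -----------  --------  -----  ------  -----------
 2998          179        51   3526    2.9%  netstack
    1       337902  10126403     33    0.0%  init
    4      3350498  938593011      3    0.0%  ksoftirqd/0
"""

if __name__ == "__main__":
    for name, pid, pct in busy_processes(SAMPLE):
        print(f"{name} (PID {pid}): {pct}%")
```

On the sample rows this reports only netstack, the one process above the 1% cut-off. To run it on the real capture, read highcpu.txt into a string and pass that in instead of SAMPLE.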

    121 111    1   11   111 2    1      1 1 21  1  11 1     1  1
    702801688872989089980108269870769769571966893783785978782882
100                                                            
 90                                                            
 80                                                            
 70                                                            
 60                                                            
 50                                                            
 40                                                            
 30                                         #                  
 20 ##    #         #       #           #   ##      # #        
 10 ############################################################
    0....5....1....1....2....2....3....3....4....4....5....5....
              0    5    0    5    0    5    0    5    0    5   

               CPU% per second (last 60 seconds)
                      # = average CPU%

    222321222321222121132222122213111121111321212232344332222122
    634409643268503738763899900184897708856759864226076121811793
100                                                            
 90                                                            
 80                                                            
 70                                                            
 60                                                            
 50                                                  **        
 40                    *                   *         **        
 30 *  *  *  ** *      * ***     *         ** *   ***#*** *   *
 20 *********************##**************************###********
 10 ##################*################*#**###*################*
    0....5....1....1....2....2....3....3....4....4....5....5....
              0    5    0    5    0    5    0    5    0    5   

               CPU% per minute (last 60 minutes)
              * = maximum CPU%   # = average CPU%

    669755655554555556464556555455656555644445565454545687555546545554456574
    810278139745411380956117351196110633306382825026135312200161772734982604
100                                                                        
 90   *                                                                    
 80   *                                                 *                  
 70 * **               *   *                            **                *
 60 ******* **      ** *   * *  *** **  *     ***     ****     **  *   ****
 50 *************************** ********* * ***** *** *************** *****
 40 ************************************************************************
 30 ************************************************************************
 20 ************************************************************************
 10 ########################################################################
    0....5....1....1....2....2....3....3....4....4....5....5....6....6....7.
              0    5    0    5    0    5    0    5    0    5    0    5    0

                   CPU% per hour (last 72 hours)
                  * = maximum CPU%   # = average CPU%
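
The rows of digits above each graph are read vertically: each character column is one sample, with the upper row holding the tens and the lower row the units. A small Python sketch that turns those header rows into plain numbers follows. It assumes both rows keep the indentation shown above so that the columns line up; the function name decode_history_header is hypothetical.

```python
def decode_history_header(rows):
    """Decode the vertically printed numbers above a CPU history graph.

    rows: the header lines from top to bottom (e.g. tens row, units row).
    Each character column holds one value; a blank means "no digit here".
    """
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]  # make all rows equally long
    values = []
    for col in range(width):
        digits = "".join(r[col] for r in padded).replace(" ", "")
        if digits:  # skip the left margin, where every row is blank
            values.append(int(digits))
    return values


if __name__ == "__main__":
    # First eight columns of the per-second graph shown above.
    tens = "    121 111"
    units = "    70280168"
    print(decode_history_header([tens, units]))
    # prints [17, 20, 12, 8, 10, 11, 16, 8]
```

These decoded values agree with the bars: the samples of 17, 20 and 16 are the ones drawn up to the 20 line in the per-second graph.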

For more on related issues, check the following wiki: http://docwiki.cisco.com/wiki/Cisco_Nexus_7000_Series_NX-OS_Troubleshooting_Guide_–_Troubleshooting_Tools_and_Methodology


One thought on “EEM on Cisco Nexus 7000”

  1. jschlooz

    I adjusted the script to get more insight into the issue. The previous script worked fine and gave us information about the high CPU usage, which was caused by the netstack process.
    ‘netstack’ is the software process that implements the IP/TCP stack for received frames hitting the control plane.
    If netstack CPU is high for an extended period, it implies that excessive traffic is hitting the control plane.

    Thanks to the guys at http://www.gossamer-threads.com/lists/cisco/nsp/143300?do=post_view_threaded

    The following script has been applied and we are awaiting the results:
    !
    event manager applet HIGH-CPU
    event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.6.1 get-type exact entry-op ge entry-val 45 poll-interval 1
    action 0.1 syslog msg HIGH CPU DETECTED, Process list written to flash
    action 0.2 cli show clock >> bootflash:highcpu.txt
    action 0.3 cli show system resources >> bootflash:highcpu.txt
    action 0.4 cli show process cpu sort >> bootflash:highcpu.txt
    action 0.5 cli show process cpu hist >> bootflash:highcpu.txt
    action 0.6 cli show hardware internal cpu-mac inband stats >> bootflash:highcpu.txt
    action 0.7 cli ethanalyzer local interface inband limit-captured-frames 200 >> bootflash:highcpu.txt
    action 1.0 cli exit
    !
