Wednesday, March 27, 2013

Building a capture ACL/sniffing on the frontend of a Cisco ASA

I was debugging a VIP on a Citrix Netscaler a while back and needed to verify that traffic was passing with no loss and that connections were being attempted or established through a Cisco ASA firewall. To do that, I built a capture on the outside interface with a non-circular buffer (it does not wrap, so it has to be cleared). I made a named extended ACL to match traffic to one particular VIP (1.1.1.1); since there was very little communication at the time, that was enough to check connectivity:

asa# conf term
asa(config)# access-list snifftest extended permit tcp any host 1.1.1.1 eq https
asa(config)# capture snifftest access-list snifftest interface outside

To show buffer data for the "snifftest" capture:
 
asa# show capture snifftest
2 packets captured
   1: 15:59:24.543169 2.2.2.2.53360 > 1.1.1.1.443: S 2744539342:2744539342(0) win 5840 <mss 1380,sackOK,timestamp 2400379488 0,nop,wscale 1>
   2: 15:59:24.600158 2.2.2.2.53360 > 1.1.1.1.443: . ack 3969450766 win 5840
2 packets shown
asa#
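
Once a capture grows past a handful of packets, it can help to paste the "show capture" output into a file on a workstation and tally packets by TCP flag. This is only a rough sketch: the function name count_capture_flags and the file name capture.txt are hypothetical, and it assumes every packet line has the same layout as the output above (flag field right after the destination):

```shell
#!/bin/bash
# Hypothetical helper: tally the TCP flag field from saved "show capture" text.
# Assumes packet lines look like:
#    1: 15:59:24.543169 2.2.2.2.53360 > 1.1.1.1.443: S 2744539342:...
# Reads the named files, or stdin when no file is given.
count_capture_flags() {
    awk '$1 ~ /^[0-9]+:$/ && $4 == ">" { flags[$6]++ }
         END { for (f in flags) print f, flags[f] }' "$@" | LC_ALL=C sort
}

# Usage (capture.txt is an assumed file holding the pasted output):
#   count_capture_flags capture.txt
```

A SYN count that keeps climbing with no matching ACKs would suggest connections are being attempted but not established.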

To clear the buffer:

asa# clear capture snifftest
asa# show capture snifftest
0 packet captured
0 packet shown

To remove the ongoing capture:
 
asa# conf term
asa(config)# no capture snifftest
asa# show capture
asa#

To remove the ACL added:

asa# conf term
asa(config)# no access-list snifftest line 1 extended permit tcp any host 1.1.1.1 eq https

asa# show access-list snifftest
ERROR: access-list <snifftest> does not exist
asa#

This was the first step in checking the path for packet loss. The next steps included sniffing the Netscaler DMZ interface (nstrace) and the backend web server interface to ensure that simple TCP connections were being established before adding load. It, of course, gets tricky at the load balancer, since the backend server IPs/ports and even the frontend inbound IPs/ports change and you have to follow the stream across them...

Determining individual CPU utilization on Checkpoint with SNMP

One of the biggest problems I find in identifying client issues is trying to correlate the time an issue is reported with log data. The data usually isn't granular enough, or it wasn't collected due to peak capacity or connectivity loss. Recently, we had an issue with high load causing high latency and connectivity loss. We use Solarwinds for trending/historical data, among other things, and I wanted to correlate when database connections were lost with that monitoring data. I'm still learning my way around Solarwinds, and the first thing I noticed in our setup is that you need a microscope to pick out individual data points when viewing one day of data at 1-minute intervals. There are also gaps in data collection at high-load times, which doesn't help identify what load the hardware was actually seeing.

So, you see a confetti of data points and a few gaps in the data, but the gaps could just be network connectivity issues caused by the load. I did notice that one particular CPU was pegged and wanted to focus on capturing the actual CPU utilization to rule out high memory or interface utilization as the cause. This happened to be a Checkpoint 12600 firewall with performance issues. The operating system is based on the Linux kernel, which opens many doors from an SNMP monitoring standpoint based on system MIBs. I didn't want to spend my time actively watching "top" and waiting. After doing some "research," I was able to locate the OIDs for identifying the CPU hardware and polling the total utilization of individual CPUs, shown below:

[Expert@12600:0]# snmpwalk -c <community> -v2c 127.0.0.1 HOST-RESOURCES-MIB::hrDeviceDescr
...
HOST-RESOURCES-MIB::hrDeviceDescr.768 = STRING: GenuineIntel: Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz
HOST-RESOURCES-MIB::hrDeviceDescr.769 = STRING: GenuineIntel: Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz
HOST-RESOURCES-MIB::hrDeviceDescr.770 = STRING: GenuineIntel: Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz

...
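
The numeric suffix on each hrDeviceDescr row (768, 769, 770 above) is the index to use with hrProcessorLoad. As a rough sketch, the walk output can be filtered down to just those indexes. The helper name cpu_indexes is hypothetical, and matching on the string "CPU" is an assumption based on the descriptions shown above, so other hardware may need a different pattern:

```shell
#!/bin/bash
# Hypothetical helper: read hrDeviceDescr walk output on stdin and print
# the index of every row whose description mentions "CPU" (an assumption
# based on the Xeon descriptions above).
cpu_indexes() {
    sed -n 's/^HOST-RESOURCES-MIB::hrDeviceDescr\.\([0-9][0-9]*\) = .*CPU.*$/\1/p'
}

# Usage on the firewall (community string elided as in the post):
#   snmpwalk -c <community> -v2c 127.0.0.1 HOST-RESOURCES-MIB::hrDeviceDescr | cpu_indexes
```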

To determine current utilization for the specific CPU:

[Expert@12600:0]# snmpget -c <community> -v2c 127.0.0.1 HOST-RESOURCES-MIB::hrProcessorLoad.768
HOST-RESOURCES-MIB::hrProcessorLoad.768 = INTEGER: 79       <--Percent Utilization 

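From that, I built this Bash script to poll continuously over a particular period for correlation. (The MIB defines hrProcessorLoad as an average over the last minute, so per-second samples may be smoother than what "top" shows.)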

[Expert@12600:0]# cat /tmp/cpu1.sh
#!/bin/bash

# Poll one CPU's load once per second, prefixing each sample with a timestamp.
while true; do
        atime=$(date)
        cpu=$(snmpwalk -c <community> -v2c 127.0.0.1 HOST-RESOURCES-MIB::hrProcessorLoad.768)
        /bin/echo "$atime $cpu"
        sleep 1
done


[Expert@12600:0]#

I redirected this script's output to a temporary file, which let me gather granular data every second without constantly watching. After talking with support and looking at the trending data, we were told that our version of code doesn't distribute load evenly enough to fully utilize the CPUs.
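
To make the collected samples easier to correlate, the log can be reduced to a sample count, average, and peak. This is a minimal sketch rather than what was actually run at the time: summarize_cpu_log and the log path /tmp/cpu1.log are hypothetical, and it assumes each line ends with the "INTEGER: <value>" field that the polling script prints:

```shell
#!/bin/bash
# Hypothetical helper: summarize a log produced by the polling loop.
# Assumes each sample line ends in "... = INTEGER: <percent>".
# Reads the named files, or stdin when no file is given.
summarize_cpu_log() {
    awk 'NF >= 2 && $(NF-1) == "INTEGER:" {
             n++; sum += $NF
             if ($NF + 0 > max) max = $NF + 0
         }
         END {
             if (n == 0) { print "no samples"; exit }
             printf "samples=%d avg=%.1f max=%d\n", n, sum / n, max
         }' "$@"
}

# Usage (paths are assumptions):
#   /tmp/cpu1.sh > /tmp/cpu1.log &    # collect in the background
#   summarize_cpu_log /tmp/cpu1.log   # summarize later
```

Lines that don't end in an INTEGER value (timeouts, blank lines) are skipped, so gaps in polling show up as a lower sample count rather than skewing the average.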