Graphing Cisco Nexus interface statistics via Python, Telegraf and Grafana
In this post we look at how to fetch, store and visualize interface counters on Cisco Nexus switches using Cisco NX-API. The tools are a Python script that fetches the data via NX-API, Telegraf to run the script on a schedule, InfluxDB to store the data and Grafana to visualize it.
These examples cover basic interface metrics such as bit and packet rates and error counters. You can easily modify the script to gather any other data that is available via NX-API. This post only covers the NX-OS side of things (not Cisco ACI).
Please see the Cisco NX-API documentation here and here for further information and definitions of the APIs.
Prerequisites
I'm running all the software on an Ubuntu 20.04 server with Python 3.10.2. Older Python 3 versions should also work, as long as they are 3.7 or later (the script uses time.time_ns()). Python 2 has not been tested and should no longer be used.
- Cisco Nexus switches with NX-API enabled. (NX-OS version 9.3(1) - 9.3(6) tested)
- InfluxDB 2.x installed (1.x will also work but the Telegraf config is a bit different)
- Telegraf installed
- Grafana installed
Get Cisco Nexus Interface statistics using Python script
This Python script fetches the statistics via NX-API, parses the JSON-formatted result and outputs it in either JSON or InfluxDB line protocol format. The switches and interfaces to poll are set in a data structure inside the script (switch_array). You can of course change the script to read the switch list from a JSON file or any other input.
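As a rough sketch of the JSON-file option, the loader below reads an inventory with the same shape as switch_array and checks that each entry has the keys the script relies on. The function name load_switches and the file layout are assumptions made for illustration; they are not part of the original script.

```python
import json
import tempfile


def load_switches(path):
    """Read a switch inventory from a JSON file (hypothetical helper).

    The file is assumed to hold the same structure as switch_array:
    {"switch1": {"mgmt": "...", "name": "...", "interfaces": ["1/1", ...]}, ...}
    """
    with open(path, encoding="utf-8") as handle:
        switches = json.load(handle)
    # Fail early if an entry lacks a key the polling loop expects
    for label, switch in switches.items():
        missing = {"mgmt", "name", "interfaces"} - set(switch)
        if missing:
            raise ValueError(f"{label}: missing keys {sorted(missing)}")
    return switches


if __name__ == "__main__":
    # Write a sample inventory to a temporary file, then load it back
    sample = {"switch1": {"mgmt": "192.168.1.1", "name": "switch1", "interfaces": ["1/1", "1/2"]}}
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
        json.dump(sample, tmp)
    switch_array = load_switches(tmp.name)
    print(switch_array["switch1"]["interfaces"])
```

In the script, the hard-coded switch_array could then be replaced by a call such as load_switches("switches.json"), where the file name is again only an example.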
Note that I'm monitoring only Ethernet interfaces in this case. If you want to monitor, for example, port-channel or vPC interfaces, you must query the "sys/intf/aggr-" path instead of "sys/intf/phys-".
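A small helper can make the choice between the two paths explicit. This is only a sketch: the function name counter_urls and the aggregated flag are made up for illustration, the "po10" interface ID in the usage example is an assumed naming form, and it also assumes that aggregated interfaces expose the same dbgIfIn and dbgIfOut children as physical ones, which is worth verifying against your NX-OS release.

```python
def counter_urls(mgmt_ip, interface_id, aggregated=False):
    """Return the (input, output) counter URLs for one interface.

    interface_id is the ID inside the brackets, e.g. "eth1/1".
    "phys-" and "aggr-" are the two path prefixes named in the post.
    """
    prefix = "aggr-" if aggregated else "phys-"
    base = f"https://{mgmt_ip}/api/node/mo/sys/intf/{prefix}[{interface_id}]"
    return base + "/dbgIfIn.json", base + "/dbgIfOut.json"


if __name__ == "__main__":
    # Physical Ethernet port, same URL form as the script uses
    print(counter_urls("192.168.1.1", "eth1/1"))
    # Port-channel; the "po10" ID form is an assumption, check it on your switch
    print(counter_urls("192.168.1.1", "po10", aggregated=True))
```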
Note: You can easily modify the script to save the data anywhere you like or for example perform data calculations and such before outputting the data.
import argparse
import textwrap
import requests
import json
import urllib3
import time
import sys
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
# Log in to NX-API; returns the HTTP status code and the authentication cookie
def aaa_login(username, password, ip_addr):
    # Define the payload
    payload = {
        'aaaUser': {
            'attributes': {
                'name': username,
                'pwd': password
            }
        }
    }
    url = "https://" + ip_addr + "/api/mo/aaaLogin.json"
    auth_cookie = {}
    # Send the login request
    response = requests.request("POST", url, data=json.dumps(payload), verify=False)
    # Parse the login response and grab the authentication cookie
    if response.status_code == requests.codes.ok:
        data = json.loads(response.text)['imdata'][0]
        token = str(data['aaaLogin']['attributes']['token'])
        auth_cookie = {"APIC-cookie": token}
    # Return status code and authentication cookie
    return response.status_code, auth_cookie


# Log out from NX-API
def aaa_logout(username, ip_addr, auth_cookie):
    # Define the payload
    payload = {
        'aaaUser': {
            'attributes': {
                'name': username
            }
        }
    }
    url = "https://" + ip_addr + "/api/mo/aaaLogout.json"
    # Send the logout request
    response = requests.request("POST", url, data=json.dumps(payload), cookies=auth_cookie, verify=False)


# NX-API GET function
def nxapi_get(url, auth_cookie):
    response = requests.request("GET", url, cookies=auth_cookie, verify=False)
    return response


# Main function
if __name__ == '__main__':
    # Print help to CLI and parse the arguments
    parser = argparse.ArgumentParser(
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description=textwrap.dedent('''\
            Cisco Nexus get interface counters via NX-API
            --------------------------------
            Required parameter is the output format (-f json/influxdb)
            Examples for usage:
            python nxapi_interfacestats.py -f json
            '''))
    parser.add_argument('-f', required=True, type=str, dest='arg_format', help='Output format')
    args = parser.parse_args()
    output_format = args.arg_format
    # Verify that correct output format is set (exact match, not a substring test)
    if output_format not in ("json", "influxdb"):
        print("Output format not set correctly. Use -f json or influxdb as an argument.")
        sys.exit(1)
    # Username and password
    username = 'USERNAME'
    password = 'PASSWORD'
    # Dictionary of the switches and interfaces to get the data from
    switch_array = {
        'switch1':
        {
            'mgmt': '192.168.1.1',
            'name': 'switch1',
            'interfaces': ['1/1', '1/2']
        },
        'switch2':
        {
            'mgmt': '192.168.1.2',
            'name': 'switch2',
            'interfaces': ['1/1', '1/2']
        }
    }
    # Get the timestamp (in UTC, to match the trailing "Z") and define needed variables
    ts = time.strftime('%Y-%m-%dT%H:%M:%SZ', time.gmtime())
    unixtime = time.time_ns()
    interface_stats = {'time': ts, 'interfaces': []}
    # Counter attributes read from both the input (rmonIfIn) and output (rmonIfOut) objects
    counters = ['discards', 'errors', 'multicastPkts', 'broadcastPkts',
                'ucastPkts', 'octets', 'octetRate', 'packetRate']
    # Parse switches from the input dictionary
    for key, value in switch_array.items():
        # Call the login function
        status, auth_cookie = aaa_login(username, password, value['mgmt'])
        if status == requests.codes.ok:
            for port in value['interfaces']:
                base_url = "https://" + value['mgmt'] + "/api/node/mo/sys/intf/phys-[eth" + port + "]"
                # Get input stats for the interface
                result_in = nxapi_get(base_url + "/dbgIfIn.json", auth_cookie)
                # Get output stats for the interface
                result_out = nxapi_get(base_url + "/dbgIfOut.json", auth_cookie)
                attrs_in = result_in.json()["imdata"][0]["rmonIfIn"]["attributes"]
                attrs_out = result_out.json()["imdata"][0]["rmonIfOut"]["attributes"]
                # If output format is JSON, collect the counters into the output dictionary
                if output_format == "json":
                    entry = {'switch': value['name'], 'interface': "Ethernet" + port}
                    for counter in counters:
                        entry[counter + '_in'] = attrs_in[counter]
                        entry[counter + '_out'] = attrs_out[counter]
                    interface_stats['interfaces'].append(entry)
                # If output format is InfluxDB, print one line protocol line per counter
                elif output_format == "influxdb":
                    tags = "nexusinterface,switch=" + value['name'] + ",interface=Ethernet" + port
                    for counter in counters:
                        print(tags + " " + counter + "_in=" + attrs_in[counter] + " " + str(unixtime))
                        print(tags + " " + counter + "_out=" + attrs_out[counter] + " " + str(unixtime))
            # Log out after getting the statistics
            aaa_logout(username, value['mgmt'], auth_cookie)
    # Print out the data in JSON format if requested in the arguments
    if output_format == "json":
        print(json.dumps(interface_stats, indent=2))
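The script prints one line per counter, which works but repeats the measurement and tags every time. InfluxDB line protocol also allows several fields on a single line, so a possible variation is sketched below. The helper names escape_key and to_line are hypothetical, and the sketch assumes every counter value can be parsed as a number. Values are written as floats (no "i" suffix), which is also how InfluxDB interprets the script's unsuffixed values.

```python
def escape_key(value):
    """Escape commas, equals signs and spaces, which are special in tag and field keys."""
    return str(value).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line(measurement, tags, fields, timestamp_ns):
    """Build one line protocol line: measurement,tags fields timestamp."""
    tag_part = ",".join(
        f"{escape_key(key)}={escape_key(val)}" for key, val in sorted(tags.items())
    )
    # float() rejects non-numeric counter values instead of writing a broken line
    field_part = ",".join(
        f"{escape_key(key)}={float(val)}" for key, val in fields.items()
    )
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    line = to_line(
        "nexusinterface",
        {"switch": "switch1", "interface": "Ethernet1/1"},
        {"discards_in": "0", "errors_in": "3"},
        1700000000000000000,
    )
    print(line)
```

With all fields of an interface on one line, each poll produces one line per interface instead of sixteen.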
Automate the data gathering with Telegraf
In this example Telegraf calls the Python script at a regular interval and writes the returned data to InfluxDB.
The configuration below defines only the output and the input. The rest of the Telegraf configuration must also be present.
# Define InfluxDB v2 output
[[outputs.influxdb_v2]]
urls = ["http://INFLUXDB:8086"]
token = "TOKEN"
organization = "ORG"
bucket = "nxos_counters"
tagexclude = ["tag1"]
[outputs.influxdb_v2.tagpass]
tag1 = ["nexusinterface"]
# Cisco Nexus interface statistics poller which polls every 30 seconds
[[inputs.exec]]
command = "/usr/bin/python3 /pollers/nxapi_interfacestats.py -f influxdb"
data_format = "influx"
timeout = "15s"
interval = "30s"
[inputs.exec.tags]
tag1 = "nexusinterface"
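Before handing the script to Telegraf, it can help to run it by hand under a similar time limit and check that it prints something. The sketch below is only a local test harness, not how Telegraf itself works: run_poller is a hypothetical helper and the 15-second default simply mirrors the timeout value in the configuration above.

```python
import subprocess
import sys


def run_poller(command, timeout_s=15):
    """Run a poller command (list of arguments) and return its non-empty stdout lines.

    Raises subprocess.TimeoutExpired if the command runs longer than timeout_s
    and subprocess.CalledProcessError if it exits with a non-zero status.
    """
    result = subprocess.run(
        command, capture_output=True, text=True, timeout=timeout_s, check=True
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    # Stand-in command for demonstration; replace it with the real poller, e.g.
    # ["/usr/bin/python3", "/pollers/nxapi_interfacestats.py", "-f", "influxdb"]
    demo = [sys.executable, "-c", "print('nexusinterface,switch=switch1 errors_in=0 1')"]
    for line in run_poller(demo):
        print(line)
```

If the real script regularly gets close to the time limit, polling fewer switches per script instance or raising the timeout are the obvious adjustments.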
Verify data input in InfluxDB browser
Browse the InfluxDB Data Explorer to verify that the data arrives in the correct bucket and is written correctly.
Visualize data in Grafana
Below are two example graphs and the InfluxDB Flux queries that fetch and format the data for them. I'm also using dashboard variables (for example switches and interfaces), which you must configure in the dashboard variable settings.
However, you don't have to use the data variables. You can change / remove the filters for switches and interfaces in the Flux queries below. The queries will then display everything in the same view, without the ability to filter the data in Grafana.
// Cisco Nexus - Packet Rate Graph - 5min aggregate data (Time series graph)
from(bucket: "nxos_counters")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "nexusinterface")
|> filter(fn: (r) => r["switch"] == "${switches}")
|> filter(fn: (r) => r["_field"] == "packetRate_in" or r["_field"] == "packetRate_out")
|> filter(fn: (r) => r["interface"] =~ /${interfaces:regex}$/)
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> yield(name: "last")
// Cisco Nexus - Packet Rate Graph - 30sec data (Time series graph)
from(bucket: "nxos_counters")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "nexusinterface")
|> filter(fn: (r) => r["switch"] == "${switches}")
|> filter(fn: (r) => r["_field"] == "broadcastPkts_in" or r["_field"] == "multicastPkts_in" or r["_field"] == "ucastPkts_in")
|> filter(fn: (r) => r["interface"] =~ /${interfaces:regex}$/)
|> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
|> map(fn: (r) => ({ r with _value: (r.broadcastPkts_in + r.multicastPkts_in + r.ucastPkts_in) / 30.0 }))
|> derivative(unit: 30s, nonNegative: true, columns: ["_value"], timeColumn: "_time")
|> rename(columns: {_value: "packetRateIn_Calculated"})
// Cisco Nexus - Bitrate Graph - 5min aggregate data (Time series graph)
from(bucket: "nxos_counters")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "nexusinterface")
|> filter(fn: (r) => r["switch"] == "${switches}")
|> filter(fn: (r) => r["_field"] == "octetRate_out" or r["_field"] == "octetRate_in")
|> filter(fn: (r) => r["interface"] =~ /${interfaces:regex}$/)
|> map(fn: (r) => ({
r with _value: int(v: r._value) * 8
}))
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
|> yield(name: "last")
// Cisco Nexus - Bitrate Graph - 30sec data (Time series graph)
from(bucket: "nxos_counters")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "nexusinterface")
|> filter(fn: (r) => r["switch"] == "${switches}")
|> filter(fn: (r) => r["_field"] == "octets_in" or r["_field"] == "octets_out")
|> filter(fn: (r) => r["interface"] =~ /${interfaces:regex}$/)
|> map(fn: (r) => ({
r with _value: int(v: r._value) * 8 / 30
}))
|> derivative(unit: 30s, nonNegative: true, columns: ["_value"], timeColumn: "_time")
|> yield(name: "last")
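The 30-second queries above turn ever-increasing counters into rates: the difference between two consecutive samples is divided by the polling interval, and octets are multiplied by 8 to get bits. The arithmetic can be sketched in Python as follows. The function name per_second_rate is made up for illustration. When the counter goes backwards (for example after a reload), the sketch treats the previous value as zero, which is how I read the Flux documentation for nonNegative: true.

```python
def per_second_rate(previous, current, interval_s=30.0, scale=1.0):
    """Convert two consecutive counter samples into a per-second rate.

    scale=8 turns an octet counter into bits per second.
    """
    if current < previous:
        # Counter went backwards (reset): assume it restarted from zero
        previous = 0
    return (current - previous) * scale / interval_s


if __name__ == "__main__":
    # 3000 packets in 30 seconds
    print(per_second_rate(1000, 4000))
    # 3000 octets in 30 seconds, expressed in bits per second
    print(per_second_rate(0, 3000, scale=8))
    # Counter reset between the two samples
    print(per_second_rate(5000, 600))
```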
Below is an example of how to populate the dashboard variables from the InfluxDB data for use in the Flux queries. Configure these under Dashboard -> Variables.
from(bucket: "nxos_counters")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "nexusinterface")
|> keep(columns: ["switch"])
|> distinct(column: "switch")
|> keep(columns: ["_value"])
from(bucket: "nxos_counters")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "nexusinterface")
|> filter(fn: (r) => r["switch"] == "${switches}")
|> keep(columns: ["interface"])
|> distinct(column: "interface")
|> keep(columns: ["_value"])
Conclusion
This was a quick example of how to gather and visualize some very informative data from Cisco Nexus switches via Cisco NX-API. The 30-second interval data in particular lets you see spikes and anomalies in the traffic in more detail.
The Python script above can be found in the Git repository.
Later on I will be posting an example on how to get the "real" Telemetry data from Cisco Nexus switches using a Telemetry listener and Grafana.