As Network Engineers, we get to use plenty of open source tools to help with our workflows. Open source projects like Linux, Python, and Netmiko are not built by the code maintainers alone; many generous engineers contribute back to the source code. If you find yourself using any open source tool extensively, it’s good practice (and even good karma) to try to give something back to the project. However, contributing to something like the Python source code can seem daunting if you’ve never contributed to an open source project before. No need to fear, though, because there are projects with a simple way to get started! All you need to know is a little Python and Regex. Sound exciting? Then let’s get you started on your way to contributing to your first open source project!
I am of course talking about the wonderful world of CLI parsers. Projects like NTC Templates and Cisco Genie Parser aim to take raw CLI output from our network devices and convert it into nice, clean structured data for other programs or scripts to use. They achieve this feat by using custom Regex patterns for each and every command that could possibly be sent to network devices.
If this is your first time ever hearing about the magic of CLI Parsers, then boy am I excited for you! Our very own Mikhail Yohman has created an entire series to help you dive deep into Parsing Strategies! He goes very in depth on how these parsers work under the hood, so I won’t go into specifics here.
What I do want to highlight is that these projects encourage developers to submit custom Regex patterns for commands that are not in the project currently. This is an awesome opportunity to not only give to the community, but also to get your code into a major open source project! This post will walk you through how to develop a Regex pattern that can match an entire line of output.
Okay, in order to write custom Regex patterns that can grab entire outputs, we first need to go over some Regex syntax that wasn’t covered in my previous post.
\s\d
First up are some shortcuts we call metacharacters that are built into Regex. The two we are going to talk about are \s and \d.
\s is a shortcut for any whitespace character. This, of course, includes the space character, but it can also capture characters like tab and newline!
\d is a quick one to understand because it just means any digit character. So instead of writing [0-9], we can just drop in \d to represent the same thing.
These shorthand metacharacters can be capitalized to indicate you want to capture the opposite. So just as \s matches a whitespace character, a capital \S matches any NON-whitespace character. Super helpful when you want to grab something that may contain both letters and numbers!
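Here is a minimal sketch of these shortcuts using Python's built-in re module. The sample string is made up purely for illustration.

```python
import re

# Made-up sample text: a tab after "Vlan10" and two spaces after "up".
sample = "Vlan10\tup  1500"

# \s matches any single whitespace character (space, tab, newline, ...).
whitespace = re.findall(r"\s", sample)

# \d matches any single digit character.
digits = re.findall(r"\d", sample)

# \S is the opposite of \s: any single non-whitespace character.
non_whitespace = re.findall(r"\S", sample)

print(whitespace)               # ['\t', ' ', ' ']
print(digits)                   # ['1', '0', '1', '5', '0', '0']
print("".join(non_whitespace))  # Vlan10up1500
```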
{}*+
If you want to match something 1 to 3 times, you can use the {} quantifier and write {1,3}. But you don’t have to supply both numbers. If you omit the last number, as in {3,}, you match a pattern three or more times.
Regex has some cool shorthand for common quantifiers. We can use + instead of {1,} to match something one or more times and * instead of {0,} for zero or more times.
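A quick way to see the quantifiers in action is re.fullmatch, which only succeeds when the whole string fits the pattern. The strings below are arbitrary examples, not taken from any device output.

```python
import re

# {1,3}: the preceding token must appear between one and three times.
octet = re.fullmatch(r"\d{1,3}", "192")
too_long = re.fullmatch(r"\d{1,3}", "1921")

# {3,}: three or more times, with no upper limit.
three_or_more = re.fullmatch(r"\d{3,}", "19216811")

# + is shorthand for {1,}, so it needs at least one digit.
plus_on_empty = re.fullmatch(r"\d+", "")

# * is shorthand for {0,}, so an empty string is still a match.
star_on_empty = re.fullmatch(r"\d*", "")

print(octet is not None)          # True
print(too_long is not None)       # False
print(three_or_more is not None)  # True
print(plus_on_empty is not None)  # False
print(star_on_empty is not None)  # True
```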
(?P<name>)
Capture groups are a really awesome feature in Regex because they allow us to assign the data that we parse with a pattern to a number so we can refer back to it later. Named capture groups are the same thing, but they let us assign a named key to a value instead of a number. We can let Regex know we want to use a named capture group by adding ?P<some_name> to the start of a normal capture group (). Altogether it would look something like this: (?P<some_name>)
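To make the difference concrete, here is a small sketch comparing numbered and named groups in Python. The input line and the group names interface and status are hypothetical, chosen only for this example.

```python
import re

# Hypothetical input line, invented for this example.
line = "Vlan10 up"

# Numbered groups: values are retrieved by position (1, 2, ...).
numbered = re.match(r"(\S+)\s+(\S+)", line)

# Named groups: the same pattern, but each value gets a key.
named = re.match(r"(?P<interface>\S+)\s+(?P<status>\S+)", line)

print(numbered.group(1))   # Vlan10
print(named.group("status"))  # up
print(named.groupdict())   # {'interface': 'Vlan10', 'status': 'up'}
```

The groupdict() call is what makes named groups so handy for parsers: the match comes back as ready-made structured data.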
Let’s use the MAC address 0000.0c59.f892 to help explain. As we discussed before, \S can match any non-whitespace character, so it is perfect for this case since MAC addresses can contain letters, numbers, and other special characters. We can then use the pattern (?P<mac_address>\S+) to capture the value and assign it a key name. This will produce the following key/value pair:
"mac_address": "0000.0c59.f892"
|
One last thing I want to go over is the OR operator. Sometimes you want to match more than one thing for a particular capture group. In those cases we can use the OR operator, represented with |. So if you wanted to find Arista OR Cisco, you can use (Arista|Cisco) to search for both cases.
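As a rough illustration, the snippet below runs (Arista|Cisco) against a few made-up strings. The strings themselves are assumptions and not real device banners.

```python
import re

VENDOR = re.compile(r"(Arista|Cisco)")

# Made-up strings for illustration only.
lines = ["Cisco IOS Software", "Arista vEOS", "Juniper Junos"]

vendors = []
for line in lines:
    match = VENDOR.search(line)
    # group(1) holds whichever alternative matched; None means neither did.
    vendors.append(match.group(1) if match else None)

print(vendors)  # ['Cisco', 'Arista', None]
```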
All right, with all that down, let’s give it a try with the output from the command show ip arp. I like to use Regex101 to test out new parsers, so let’s drop our output in the text box.
In order to tackle the output, we just go piece by piece. Most Regex projects parse line by line, so we can actually ignore the header of this output for now. We want to capture the value under Protocol, which is Internet. That looks like a good candidate for \S+, meaning any non-whitespace character occurring one or more times.
Whoa! Everything lit up. But really that makes sense, given what \S+ means. Don’t fret though, let’s keep going. Next is the whitespace, which we can capture with \s+.
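The "everything lit up" effect is easy to reproduce in Python. This sketch assumes a single ARP entry in the usual Cisco IOS layout; the line is an illustrative sample, not the exact output from the screenshot.

```python
import re

# Illustrative ARP entry (assumed sample, typical IOS column layout).
line = "Internet  172.16.233.229  -  0000.0c59.f892  ARPA  Ethernet0/0"

# On its own, \S+ matches every run of non-whitespace on the line.
tokens = re.findall(r"\S+", line)

# Matching from the start of the line and consuming the whitespace
# after the first value pins the group to the first column only.
first = re.match(r"(?P<protocol>\S+)\s+", line)

print(len(tokens))              # 6
print(first.group("protocol"))  # Internet
print(line[first.end():])       # the rest of the line, still unparsed
```

Each piece we add to the pattern consumes a bit more of the line, which is why going column by column works.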
Let’s move on to the value under Address. We can use the handy \d+ to grab any digit one or more times and use \. to escape the . in the IPv4 address with a backslash.
Keep in mind that using \d+ is just a quick way to represent one octet in an IPv4 address. If you are validating user input, you might want to be more precise with the pattern. We can use it in this case because a network device should not output a nonsensical IPv4 address like 999.999.999.999.
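To show the trade-off, here is a sketch that contrasts the loose \d+ pattern with a stricter check. Using Python's standard ipaddress module for validation is one possible approach for user input, not something the parser projects require, and the helper name is_valid_ipv4 is made up for this example.

```python
import ipaddress
import re

# Loose pattern: four runs of digits separated by literal dots.
IPV4_LOOSE = re.compile(r"\d+\.\d+\.\d+\.\d+")


def is_valid_ipv4(text):
    """Hypothetical helper: True only if text is a real IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
    except ValueError:
        return False
    return True


# The loose pattern accepts anything with the right shape.
print(bool(IPV4_LOOSE.fullmatch("172.16.233.229")))   # True
print(bool(IPV4_LOOSE.fullmatch("999.999.999.999")))  # True

# The stricter check rejects octets outside 0-255.
print(is_valid_ipv4("172.16.233.229"))   # True
print(is_valid_ipv4("999.999.999.999"))  # False
```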
But awesome, look at that! For the next few values, I’m going to use our friend \S+ and match the names of our keys to the headers in the output. Remember to include the \s+ for the whitespace in between values.
Looking good! That last line isn’t matching because Interface has a case where there may be no output. No worries, we can account for this by using the | operator to handle cases with no output.
Awesome! Our final pattern can now match an entire line of different patterns!
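Putting the pieces together, a full-line pattern might look something like the sketch below. Everything here is an assumption made for illustration: the sample output (with the last entry deliberately missing its interface), the key names, and the exact form of the pattern, which may differ from the one built in Regex101 above.

```python
import re

# Assumed sample in the usual "show ip arp" layout; the last entry
# intentionally has no Interface value.
SAMPLE_OUTPUT = """\
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  172.16.233.229          -   0000.0c59.f892  ARPA   Ethernet0/0
Internet  172.16.168.254          9   0000.0c36.6965  ARPA   Ethernet0/0
Internet  172.16.168.11           4   0000.0c63.1300  ARPA
"""

# One named group per column. The trailing (\S+|) lets the interface
# group match either a value or nothing at all.
ARP_LINE = re.compile(
    r"^(?P<protocol>\S+)\s+"
    r"(?P<address>\d+\.\d+\.\d+\.\d+)\s+"
    r"(?P<age>\S+)\s+"
    r"(?P<mac_address>\S+)\s+"
    r"(?P<type>\S+)\s*"
    r"(?P<interface>\S+|)$"
)


def parse_arp(output):
    """Return one dict per ARP entry; lines that do not match are skipped."""
    entries = []
    for line in output.splitlines():
        match = ARP_LINE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


entries = parse_arp(SAMPLE_OUTPUT)
for entry in entries:
    print(entry)
```

The header line is skipped without any special handling because "Address" does not fit the \d+\.\d+\.\d+\.\d+ piece of the pattern, and the entry with no interface comes back with an empty string for that key.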
I hope that was fun! If this interested you at all, I encourage you to take a look at the contribution pages for NTC Templates and Cisco Genie Parser. Both projects use Regex in very different ways, so you’re going to have to get a little familiar with the particular project you want to contribute to before starting your first parser. Cisco Genie Parser uses the Regex library that is built into Python, which is what we walked through in the example above. NTC Templates leverages TextFSM, which has a language all its own. A lot of the same concepts apply; you just may need to get familiar with TextFSM before starting.
If you want to go a little more in depth than what we did here, again, Mikhail Yohman has an entire series on different parsing strategies and even talks specifically about both NTC Templates and Cisco Genie Parser.
Knox Hutchinson also has an awesome video on Genie Parser from SCRATCH, in which he walks you through the entire process of making the parser and actually creating a pull request to have it merged into the source code.
I hope you enjoyed this quick peek into the wonderful world of parsers. This post is just meant to give you some inspiration to make your first contribution to an open source project. It can be a rewarding and fulfilling process that I think every engineer should experience in their career. Thanks for reading, and I look forward to seeing whatever you create!
-Robert
Edit – 03/10/2021 – Another great resource you should check out is Juhi Mahajan’s fantastic step-by-step video: How to write a Genie parser for Cisco!