Contributing to Open Source Parsers


As Network Engineers, we get to use plenty of open source tools to help with our workflows. Open source projects like Linux, Python, and Netmiko are not built by their maintainers alone; many generous engineers contribute back to the source code. If you find yourself using any open source tools extensively, it’s good practice (and even good karma) to try to give something back to these projects. However, contributing to something like the Python source code can seem daunting if you’ve never contributed to an open source project before. No need to fear, though, because there are projects with a simple way to get started! All you need to know is a little Python and Regex. Sound exciting? Then let’s get you started on your way to contributing to your first open source project!

Open Source Parsers

I am of course talking about the wonderful world of CLI parsers. Projects like NTC Templates and Cisco Genie Parser aim to take raw CLI outputs from our network devices and convert them into nice and clean structured data for other programs or scripts to use. They achieve this amazing feat by using custom Regex patterns for each and every command that could possibly be sent to network devices.

If this is your first time ever hearing about the magic of CLI Parsers, then boy am I excited for you! Our very own Mikhail Yohman has created an entire series to help you dive deep into Parsing Strategies! He goes very in depth on how these parsers work under the hood, so I won’t go into specifics here.

What I do want to highlight is that these projects encourage developers to submit custom Regex patterns for commands that are not in the project currently. This is an awesome opportunity to not only give to the community, but also to get your code into a major open source project! This post will walk you through how to develop a Regex pattern that can match an entire line of output.

Some Advanced Regex Commands

Okay, in order to write custom Regex patterns that can grab entire outputs, we first need to go over some Regex syntax that wasn’t covered in my previous post.

Metacharacters \s \d

First up are some shortcuts we call metacharacters that are built into Regex. The two we are going to talk about are \s and \d.

\s is a shortcut for any whitespace character. This, of course, includes the space character; but it can also capture characters like tab and new line!

\d is a quick one to understand because it just means any digit character. So instead of writing [0-9], we can just drop in \d to represent the same thing.

Each of these shortcuts can be capitalized to indicate you want to capture the opposite. So just as \s matches a whitespace character, a capital \S matches any NON-whitespace character, and a capital \D matches any NON-digit character. \S is super helpful when you want to grab something that contains a mix of letters and numbers!
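
To see these shortcuts in action, here is a minimal sketch using Python's built-in re module. The sample text is made up for illustration; it just mixes a tab, a newline, and a space between a few network-looking values.

```python
import re

# Hypothetical sample text: a tab, a newline, and a space separate the values.
text = "Vlan10\t192.168.1.1\n0000.0c59.f892 up"

# \d+ grabs each run of one or more digits.
digit_runs = re.findall(r"\d+", text)

# \s matches every single whitespace character (tab, newline, space).
whitespace = re.findall(r"\s", text)

# \S+ grabs each run of non-whitespace characters, whatever they contain.
tokens = re.findall(r"\S+", text)

print(digit_runs)
print(whitespace)
print(tokens)
```

Notice that \S+ keeps the whole MAC address together, while \d+ breaks it apart wherever a letter or a period shows up.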

Quantifiers {} * +

If you want to match something 1 to 3 times, you can use quantifiers {} and write {1,3}. But you don’t have to be limited to just those two numbers. If you omit the last number like {3,}, you would match a pattern three or more times.

Regex has some cool shorthand for common quantifiers. We can use + instead of {1,} to match something one or more times and * instead of {0,} for zero or more times.
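
Here is a quick sketch of those quantifiers in Python. The interface-style strings are only examples, and re.fullmatch is used so that the whole string has to fit the pattern.

```python
import re

# {1,3}: one to three digits.
fits_range = re.fullmatch(r"\d{1,3}", "255") is not None
too_long = re.fullmatch(r"\d{1,3}", "2555") is not None

# {3,}: three or more digits, with no upper limit.
three_or_more = re.fullmatch(r"\d{3,}", "65535") is not None

# + is shorthand for {1,}: at least one digit must follow "Gi".
plus_needs_digit = re.fullmatch(r"Gi\d+", "Gi") is not None

# * is shorthand for {0,}: digits after "Gi" are optional.
star_allows_none = re.fullmatch(r"Gi\d*", "Gi") is not None

print(fits_range, too_long, three_or_more, plus_needs_digit, star_allows_none)
```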

Named capture groups (?P<name>)

Capture groups are a really awesome feature in Regex because they allow us to assign the data that we parse with a pattern to a number so we can refer back to it later. Named capture groups are the same thing, but they let us assign a named key to a value instead of a number. We can let Regex know we want to use a named capture group by adding ?P<some_name> to the start of a normal capture group (). Altogether it would look something like this: (?P<some_name>pattern)

Let’s use the MAC address 0000.0c59.f892 to help explain. Like we discussed before, \S can match any non-whitespace character, so it is perfect for this case since MAC addresses can contain letters, numbers, and other special characters. We can then use the pattern (?P<mac_address>\S+) to capture the value and assign it a key name. This will produce the following key/value pair:

"mac_address": "0000.0c59.f892"

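As a small sketch of how that looks in Python's re module, the named group can be read back with groupdict(), which returns a dictionary keyed by group name. The variable names here are my own.

```python
import re

line = "0000.0c59.f892"

# One named capture group that grabs one or more non-whitespace characters.
match = re.search(r"(?P<mac_address>\S+)", line)

# groupdict() returns every named group as a key/value pair.
result = match.groupdict()
print(result)
```
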
OR operator |

One last thing I want to go over is the OR operator. Sometimes you want to match more than one thing for a particular capture group. In those cases we can use the OR operator represented with |. So if you wanted to find Arista OR Cisco, you can use (Arista|Cisco) to search for both cases.
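
A minimal sketch of the OR operator in Python follows. The inventory lines are invented for the example.

```python
import re

# Hypothetical inventory lines, just for illustration.
inventory = ["Arista DCS-7050", "Cisco C9300", "Juniper EX4300"]

# The capture group matches either "Arista" or "Cisco".
pattern = re.compile(r"(Arista|Cisco)")

vendors = []
for line in inventory:
    match = pattern.search(line)
    if match:  # the Juniper line matches neither option, so it is skipped
        vendors.append(match.group(1))

print(vendors)
```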

Let’s Try It Out

All right, with all that down, let’s give it a try with an output from the command show ip arp. I like to use Regex101 to test out new parsers, so let’s drop our output in the text box.

In order to tackle the output, we just go piece by piece. Most parser projects work line by line, so we can actually ignore the header of this output for now. We want to capture the value under Protocol, which is Internet. Looks like a good candidate for \S+ for any non-whitespace character occurring one or more times.

Whoa! Everything lit up. But really that makes sense, given what \S+ means. Don’t fret though, let’s keep going. Next is the whitespace, which we can capture with \s+.

Let’s move on to the value under Address. We can use the handy \d+ to grab any digit one or more times and use \. to escape the . in the IPv4 address.

Keep in mind that using \d+ is just a quick way to represent one octet in an IPv4 address. If you are validating user input, you might want to be more precise with the pattern. We can use it in this case because a network device should not output any weird IPv4 address like 999.999.999.999.

But awesome—look at that! For the next few values, I’m going to use our friend \S+ and match the names of our keys to the headers in the output. Remember to include the \s+ for the whitespace in between values.

Looking good! That last line isn’t working because Interface has a case where there may be no output. No worries, we can account for this by using | so the pattern accepts either an interface name or nothing at all.

Awesome! Our final pattern can now match an entire line of different patterns!
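
To tie the walkthrough together, here is a rough sketch of what a finished pattern could look like in Python. Everything specific in it is an assumption on my part: the sample show ip arp output is made up, and the group names are simply lowercase versions of the column headers, not the names either project actually uses.

```python
import re

# Illustrative "show ip arp" output; the addresses and interfaces are made up.
SHOW_IP_ARP = """\
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.1.1.1                -   0000.0c59.f892  ARPA   GigabitEthernet0/0
Internet  10.1.1.2               12   0000.0c07.ac00  ARPA
"""

# One named group per column, with \s+ for the whitespace in between.
ARP_LINE = re.compile(
    r"(?P<protocol>\S+)\s+"
    r"(?P<address>\d+\.\d+\.\d+\.\d+)\s+"
    r"(?P<age>\S+)\s+"
    r"(?P<mac_address>\S+)\s+"
    r"(?P<type>\S+)"
    r"(?:\s+(?P<interface>\S+)|\s*)$"  # an interface name OR nothing at all
)

entries = []
for line in SHOW_IP_ARP.splitlines():
    match = ARP_LINE.match(line)
    if match:  # the header line does not match, so it is skipped
        entries.append(match.groupdict())

for entry in entries:
    print(entry)
```

When a line has no interface, the interface key comes back as None, so whatever consumes the data can decide how to treat the missing value.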

Contributing to Projects

I hope that was fun! If this at all interested you, I encourage you to take a look at the contribution pages for NTC Templates and Cisco Genie Parser. Both projects use Regex in very different ways, so you’re going to have to get a little familiar with the particular project you want to contribute to before starting your first parser. Cisco Genie Parser uses the Regex library that is built into Python, which is what we walked through in the example above. NTC Templates leverages TextFSM, which has a language all its own. A lot of the same concepts apply, you just may need to get yourself familiar with TextFSM before starting.

Go More in Depth

If you want to go a little more in depth than what we did here, again, Mikhail Yohman has an entire series on different parsing strategies and even talks specifically about both NTC Templates and Cisco Genie Parser.

Knox Hutchinson also has an awesome video on Genie Parser from SCRATCH, in which he walks you through the entire process of making the parser and actually creating a pull request to have it merged into the source code.


Conclusion

I hope you enjoyed this quick peek into the wonderful world of parsers. This post is just meant to give you some inspiration to make your first contribution to an open source project. It can be a rewarding and fulfilling process that I think every engineer should experience in their career. Thanks for reading, and I look forward to seeing whatever you create!

-Robert

Edit – 03/10/2021 – Another great resource you should check out is Juhi Mahajan’s fantastic step-by-step video: How to write a Genie parser for Cisco!




Regex for Network Engineers


Every IT professional will encounter Regex at some point in their career. Contrary to how it looks, Regex is not just some funny strings of random characters. You can in fact find it powering some of the world’s most critical technology infrastructure. Regex is supported out of the box in many different technologies, so learning just the basics can really accelerate your automation workflow. This post will teach you the basics and give you some real-world applications of Regex for Network Engineers.

Sounds good! But what is Regex?

Well, this is Regex:

[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}

Yeah, I know, looks kind of like hieroglyphics, right? Well, this may look like a foreign language, especially if you are just starting out, but as with any kind of new language, we need to work through the basics first.

Okay, for real this time. What is Regex?

Regex, or regular expression, is a pattern matching engine used to find or parse text and outputs for specified patterns. Regex is built into tools like Vim, grep, and even Python! We can use Regex to parse text and outputs so we can get only the data we need.

Why is it important for Network Engineers?

Well, there are patterns all around us in the networking world. A MAC address is nothing more than a pattern of some characters that can be A-F and 0-9 followed by a period or colon (depending on who you ask). And take the IPv4 address for example. If you had to teach someone how to spot an IPv4 address in a random list of characters, what would you tell them?

Well, you might say an IPv4 address is:

  • One to three digits followed by a period
  • One to three digits followed by another period
  • One to three digits followed by a third period
  • One to three digits NOT followed by a period.

Is this starting to make sense?

[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}

Well, hold your horses, we’re almost there. Let’s quickly go over some basics.

Basics

Many people, myself included, use the website Regex101 when making and testing new patterns. With this we can actually see what the Regex engine is thinking when it matches our patterns.

regex1

You can see that since I typed “text” in the Regular Expression box, it will highlight only the words that match “text”. Regex101 also includes a handy Explanation and other information on the right.

Let’s go into the different syntax you’ll be using to create Regex patterns.

Character classes [ ]

First things first, we need to learn about character classes. A character class in Regex is represented by brackets [], and it is where you put all the characters like 0-9 or A-Z that you want to match.
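
If you would rather try this outside of Regex101, here is a small sketch using Python's built-in re module. The sample strings are invented for the example.

```python
import re

text = "Gi0/1 is UP"

# [0-9] matches any single digit.
digits = re.findall(r"[0-9]", text)

# [A-Z] matches any single uppercase letter.
uppercase = re.findall(r"[A-Z]", text)

# Ranges can be combined: [0-9a-f] matches any lowercase hex character.
hex_chars = re.findall(r"[0-9a-f]", "0c59.f892")

print(digits)
print(uppercase)
print(hex_chars)
```

Each character class matches exactly one character at a time, which is why every result is a single character.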

Quantifiers { }

A quantifier in Regex is represented by curly braces {}, which are used to express how many times you want to match your characters. For example, if you want something to match one to three times, you use {1,3}. The comma , acts as a “through” in Regex.
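
Here is a quick sketch of a character class combined with a quantifier in Python. The numbers in the sample string are arbitrary.

```python
import re

# One to three digits in a row.
pattern = r"[0-9]{1,3}"

matches = re.findall(pattern, "10 192 65535")
print(matches)
```

Notice that the five-digit number gets split into two matches, because the quantifier stops after three digits and then starts a fresh match with whatever is left.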

Escape characters \

As you may have noticed, Regex uses a lot of regular everyday characters like .[]{}/ as part of its pattern syntax. But what if you want to literally match something like the period .? Well, any character like that can be matched by adding an escape character before it. In Regex this is represented by the backslash \.

So . becomes \. to match a literal period. And, yes, you can even escape the escape character itself by using a double backslash \\.
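
To see why escaping matters, here is a small Python sketch. The sample text is made up: one real IP address and one look-alike string with letters where the periods should be.

```python
import re

text = "10.1.1.1 10a1b1c1"

# Unescaped, each . matches ANY character, so the look-alike matches too.
unescaped = re.findall(r"10.1.1.1", text)

# Escaped, each \. matches only a literal period.
escaped = re.findall(r"10\.1\.1\.1", text)

# A double backslash matches one literal backslash.
backslashes = re.findall(r"\\", r"C:\temp")

print(unescaped)
print(escaped)
print(backslashes)
```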

Put these skills to work!

So that is a very basic overview of Regex. So enough talking—let’s get to work! Let’s look at our IPv4 instructions again:

  • One to three digits followed by a period
  • One to three digits followed by another period
  • One to three digits followed by a third period
  • One to three digits NOT followed by a period.

For digits we’re going to make a character class for all possible digits. This would be written as [0-9].

regex2

For one to three we’re going to use our handy quantifier to express that range. This would be written as {1,3}.

regex3

Finally, the period . is a special Regex syntax, so we’ll need to use the escape character \. This would be written as \..

regex4

Okay, now let’s just repeat that—don’t forget to omit the last period .!

regex5

Look at that!

[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}

Not so scary now, is it? So let’s put this new pattern to work. The following is a simple show ip route that, of course, contains IPv4 addresses.

regex6

Let’s drop our pattern and see what happens.

regex7

SUCCESS!! Regex with our pattern can now match any IPv4 address we throw at it!
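
The same experiment can be sketched in Python with re.findall. The routing table text below is invented for illustration and is not the output shown in the screenshots.

```python
import re

IPV4 = r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"

# Illustrative "show ip route" lines; the routes are made up.
output = """\
C    10.1.1.0/24 is directly connected, GigabitEthernet0/0
S    192.168.50.0/24 [1/0] via 10.1.1.254
O    172.16.0.0/16 [110/2] via 10.1.1.253, 00:12:34, GigabitEthernet0/0
"""

# findall returns every non-overlapping match in the text.
addresses = re.findall(IPV4, output)
print(addresses)
```

The prefix lengths, metrics, timers, and interface names are all ignored, because none of them contain four groups of digits separated by periods.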


Conclusion

Thanks for reading and spending some time learning Regex with me. This, of course, is a very basic overview that just scratches the surface. We can continue to improve our pattern to make it more and more accurate at matching IPv4 addresses. For example, 999.999.999.999 would be a valid match with our pattern, but not a valid IPv4 address. This article is just meant to be a good jumping off point for you to make your own patterns.
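
As one possible next step, here is a sketch of a stricter pattern that only accepts octets from 0 to 255. This is just one common way to write it, not the only one, and the names OCTET and STRICT_IPV4 are my own.

```python
import re

# One octet: 250-255, 200-249, 100-199, or 0-99 with no leading zero.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"

# Four octets separated by literal periods.
STRICT_IPV4 = re.compile(rf"{OCTET}\.{OCTET}\.{OCTET}\.{OCTET}")

def is_valid_ipv4(candidate):
    """Return True only if the whole string is a valid dotted-quad address."""
    return STRICT_IPV4.fullmatch(candidate) is not None

print(is_valid_ipv4("192.168.1.1"))
print(is_valid_ipv4("999.999.999.999"))
```

If you are validating input in Python rather than practicing Regex, the standard library's ipaddress module is also worth a look.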

And I hope you do feel empowered to go out and make your own Regex patterns! There are so many novel and unique ways Network Engineers are using Regex today. One of the most popular use cases is parsing the CLI outputs from our network devices. There are many open source projects that need custom Regex patterns for the many different network devices out in the wild. We’ll go through some of those and how you can contribute to projects like NTC Templates and Cisco’s Genie Parser in a later post.

In the meantime, can you work out a pattern to match MAC addresses? Try it out, and let me know your solution in the comments below!


