Parsing Strategies – An Introduction

Welcome to the first post in this series on parsing unstructured data into structured data. When beginning your automation journey, you may start with quick wins that don't need to act on operational data from show commands, but as you progress you will find you need to parse the unstructured data obtained from your devices into structured data.

Unfortunately, at this time not all of us have been able to replace our “legacy” network equipment with the newer networking products that come with APIs, streaming telemetry, and other features that help us programmatically interact with our network.

There are several parsing strategies that we will cover in greater detail throughout this series, along with methods to consume them.

We’ve covered parsing lightly in previous posts that parse unstructured data, such as this post, to transform the data into something usable by other systems. This series will take us deeper into the “how” of parsing unstructured data.

Before we start diving too deep into the implementations, let’s discuss why parsing unstructured data into structured data is beneficial.

Why Do I Need Structured Data From My CLI?

Parsing is the act of translating one language (unstructured data that humans can easily read) into another (structured data that a computer can easily read). Below is an example of how we’d do some form of validation with unstructured data:

>>> unstructured_data = """
... Capability codes:
...     (R) Router, (B) Bridge, (T) Telephone, (C) DOCSIS Cable Device
...     (W) WLAN Access Point, (P) Repeater, (S) Station, (O) Other
... 
... Device ID           Local Intf     Hold-time  Capability      Port ID
... S2                  Fa0/13         120        B               Gi0/13
... Cisco-switch-1      Gi1/0/7        120                        Gi0/1
... Juniper-switch1     Gi2/0/1        120        B,R             666
... Juniper-switch1     Gi1/0/1        120        B,R             531
... 
... Total entries displayed: 4
"""
>>> neighbors = [
...     "S2",
...     "Cisco-switch-1",
...     "Juniper-switch1",
... ]
>>> for neighbor in neighbors:
...     if neighbor in unstructured_data:
...         print(f"{neighbor} on router")
S2 on router
Cisco-switch-1 on router
Juniper-switch1 on router
>>> neighbors = [
...     {"name": "S2", "intf": "Fa0/13"},
...     {"name": "Cisco-switch-1", "intf": "Gi1/0/7"},
...     {"name": "Juniper-switch1", "intf": "Gi2/0/1"},
...     {"name": "Juniper-switch1", "intf": "Gi1/0/1"},
... ]
>>> for neighbor in neighbors:
...     for cfg_line in unstructured_data.splitlines():
...         if neighbor["name"] in cfg_line and neighbor["intf"] in cfg_line:
...             print(f"Neighbor {neighbor["name"]} is seen on {neighbor["intf"]}")
Neighbor S2 is seen on Fa0/13
Neighbor Cisco-switch-1 is seen on Gi1/0/7
Neighbor Juniper-switch1 is seen on Gi2/0/1
Neighbor Juniper-switch1 is seen on Gi1/0/1

Luckily, we can parse this data and, once it is transformed into structured data, perform meaningful comparisons on it. This gives us the ability to assert, with confidence, that the neighbors seen match the expected interfaces. This check can be critical in making sure the correct configuration exists on the correct interfaces for each device.
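
To give a taste of where this series is headed, below is a deliberately naive sketch that uses only the standard library and keys off the fixed layout of the sample output above (in this sample, the hold-time value "120" in the third column marks a neighbor row); the parsing libraries covered later in this series are far more robust. It turns the output into a list of dictionaries and then uses set comparisons to find missing and unexpected neighbors:

>>> parsed = []
>>> for line in unstructured_data.splitlines():
...     columns = line.split()
...     # Only neighbor rows in this sample have the hold-time "120" as the third column.
...     if len(columns) >= 4 and columns[2] == "120":
...         parsed.append({"name": columns[0], "intf": columns[1], "port_id": columns[-1]})
...
>>> expected = {("S2", "Fa0/13"), ("Cisco-switch-1", "Gi1/0/7")}
>>> seen = {(neighbor["name"], neighbor["intf"]) for neighbor in parsed}
>>> sorted(expected - seen)  # expected neighbors that are missing
[]
>>> sorted(seen - expected)  # neighbors we didn't expect to see
[('Juniper-switch1', 'Gi1/0/1'), ('Juniper-switch1', 'Gi2/0/1')]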

Here is a short list of use cases for turning your unstructured data into structured data.

  • Storing the structured data in a Time Series Database (TSDB) for telemetry and analytics that can help you quickly determine the root cause of an issue the network is experiencing.
  • Performing specific actions depending on the operational data retrieved from the device, such as bringing down an interface or bringing up a BGP peer.
  • Making sure each device is compliant operationally, such as determining that each device sees the proper neighbors on each of its interfaces.

Summary

Each of the following posts will work with the unstructured LLDP data obtained from csr1000v routers and will assert that the neighbors each device sees are valid neighbors, per a variable we will define in the next post. That variable determines which neighbors we expect to see connected to each router. We will want to do two different checks: that each neighbor is one we expect to see, and that there aren’t any extra neighbors we’re not expecting to see.

After reading these posts, you should be able to parse any unstructured data obtained from devices into structured data that is meaningful to you along your network automation journey!


Conclusion

The next post in this series will go over the topology we’ll be using throughout this series and take a dive into NTC Templates with Ansible.

-Mikhail




Intro to Data Structures

From an automation standpoint, it is important to understand what data structures are, how to use them, and more importantly how to access the data within the structure.

This post won’t go into how to build data structures, as that is a whole topic in and of itself, but rather how to obtain the data we want from a data structure that a system such as Ansible or a web API provides us. We’re going to use Python when dissecting a data structure, since it can tell us the type of the overall structure or of any specific piece within it.

There are two main data types that we will discuss, dictionaries and lists, as they’re the most common.

Let’s start with looking at what a dictionary is and how we can get data from a dictionary using the built-in Python interactive interpreter.

Dictionaries

A dictionary is referred to as a mapping in other programming languages and is made up of key-value pairs. The key must be a hashable object, most commonly a string or an integer, while the value may be any type of object. Dictionaries are used when you want to look up specific data by a key rather than by position. Let’s take a look at a dictionary.

>>> my_dict = {
...     'test_one': 'My first key:value',
...     'test': 'My second key:value',
...     10: 'Look at my key',
... }
>>> my_dict
{'test_one': 'My first key:value', 'test': 'My second key:value', 10: 'Look at my key'}
>>> my_dict.keys()
dict_keys(['test_one', 'test', 10])
>>> my_dict.values()
dict_values(['My first key:value', 'My second key:value', 'Look at my key'])
>>> type(my_dict)
<class 'dict'>

Note that we retrieve values from a dictionary by key rather than by position (as of Python 3.7 dictionaries do preserve insertion order, but position is not how you access data), which means the keys are significant in our ability to extract the values stored within the dictionary.

Let’s take a look at how to access data within a dictionary.

>>> my_dict['test_one']
'My first key:value'
>>> my_dict[10]
'Look at my key'
>>> my_dict['test_two']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'test_two'

Notice if we try and access a key that does not exist, we get a KeyError. This error can be avoided by using the .get() method on the dictionary. This will attempt to get the key from the dictionary and, if it doesn’t exist, will return None. It also accepts an argument that it will return if the key does not exist.

>>> test = my_dict.get("test_two")
>>> test
>>> type(test)
<class 'NoneType'>
>>> test = my_dict.get("test_two", "Return this value")
>>> test
'Return this value'
>>> if my_dict.get('test_one'):
...     print('It exists!')
...
It exists!
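
Dictionaries are also iterable. A pattern you will see constantly when working with structured data is looping over the .items() method, which yields each key-value pair (shown here as a quick sketch with the same my_dict from above):

>>> for key, value in my_dict.items():
...     print(f"{key}: {value}")
...
test_one: My first key:value
test: My second key:value
10: Look at my key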

Now that we understand how dictionaries work and how we can obtain data from a dictionary, let’s move onto lists.

Lists

A list is referred to as an array in other programming languages and is an ordered collection of objects, each stored at an index within the list. A list can consist of objects of any type (strings, integers, dictionaries, tuples, etc.). A list maintains the order in which it was created, and the data can be obtained by accessing the indexes of the list.

Let’s take a look at creating a list.

>>> my_list = [
...     'index one',
...     {'test': 'dictionary'},
...     [1, 2, 3],
... ]
>>> my_list
['index one', {'test': 'dictionary'}, [1, 2, 3]]
>>> for item in my_list:
...     type(item)
...
<class 'str'>
<class 'dict'>
<class 'list'>

As you can see, the list can store different data types, and the items remain in the order in which we constructed the list. We iterated over the list to access each item, but we can also access each item by the index it’s stored at.

>>> my_list[0]
'index one'
>>> my_list[1]
{'test': 'dictionary'}
>>> my_list[2]
[1, 2, 3]
>>> my_list[3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> type(my_list)
<class 'list'>

List indexes start at zero and increment by one for each item. Let’s add a new item to the list and validate that the order is still intact.

>>> my_list.append(5)
>>> my_list
['index one', {'test': 'dictionary'}, [1, 2, 3], 5]
>>> len(my_list)
4
>>> my_list[3]
5
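
Python also supports negative indexes, which count backwards from the end of the list (-1 is the last item). This is handy when you want the most recently appended item without knowing the list’s length:

>>> my_list[-1]
5
>>> my_list[-2]
[1, 2, 3]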

Now that we understand how lists work and how we can obtain data from a list, let’s move on and put this all together when we encounter data structures in the wild!

How to Navigate Data Structures

After the above sections, it may seem like navigating and obtaining the information from a data structure is a no-brainer, but it can be intimidating when you come across a more complex data structure.

facts = {
    "ansible_check_mode": False,
    "ansible_diff_mode": False,
    "ansible_facts": {
        "_facts_gathered": True,
        "discovered_interpreter_python": "/usr/bin/python",
        "net_all_ipv4_addresses": [
            "192.168.1.1",
            "10.111.41.12",
            "172.16.133.1",
            "172.16.130.1",
        ],
        "net_filesystems": [
            "bootflash:"
        ],
        "net_filesystems_info": {
            "bootflash:": {
                "spacefree_kb": 5869720.0,
                "spacetotal_kb": 7712692.0
            }
        },
        "net_gather_network_resources": [],
        "net_gather_subset": [
            "hardware",
            "default",
            "interfaces",
            "config"
        ],
        "net_hostname": "csr1000v",
        "net_image": "bootflash:packages.conf",
        "net_interfaces": {
            "GigabitEthernet1": {
                "bandwidth": 1000000,
                "description": "MANAGEMENT INTERFACE - DON'T TOUCH ME",
                "duplex": "Full",
                "ipv4": [
                    {
                        "address": "10.10.20.48",
                        "subnet": "24"
                    }
                ],
                "lineprotocol": "up",
                "macaddress": "0050.56bb.e14e",
                "mediatype": "Virtual",
                "mtu": 1500,
                "operstatus": "up",
                "type": "CSR vNIC"
            }
        }
    }
}

The above data structure is what we get from gathering facts in Ansible. We’re going to deal with the data structure outside of Ansible so we can break down each data type in the structure. This is a great example, as it’s a real-world data structure and has nesting that we will need to traverse to get the data we want.

Let’s start by looking at the data type of the initial structure and then see how we can get the ansible_check_mode data.

>>> type(facts)
<class 'dict'>
>>> facts.get('ansible_check_mode')
False

As you can see, the initial data structure is a dictionary, and since ansible_check_mode is in this initial dictionary, it is a key. We can get the value of ansible_check_mode by using the .get() method.

What if we want to loop over all the IP addresses within net_all_ipv4_addresses? Let’s see how we can do that.

>>> type(facts['ansible_facts'])
<class 'dict'>
>>> facts['ansible_facts'].keys()
dict_keys(['_facts_gathered', 'discovered_interpreter_python', 'net_all_ipv4_addresses', 'net_filesystems', 'net_filesystems_info', 'net_gather_network_resources', 'net_gather_subset', 'net_hostname', 'net_image', 'net_interfaces'])
>>> type(facts['ansible_facts']['net_all_ipv4_addresses'])
<class 'list'>
>>> for ip in facts['ansible_facts']['net_all_ipv4_addresses']:
...     print(ip)
...
192.168.1.1
10.111.41.12
172.16.133.1
172.16.130.1

As we can see above, net_all_ipv4_addresses is a key within the ansible_facts dictionary. We have to navigate through two nested dictionaries to get to the list of IPv4 addresses we want to print out.

Let’s move on and obtain the IP address and subnet mask on GigabitEthernet1.

>>> type(facts['ansible_facts']['net_interfaces'])
<class 'dict'>
>>> type(facts['ansible_facts']['net_interfaces']['GigabitEthernet1'])
<class 'dict'>
>>> type(facts['ansible_facts']['net_interfaces']['GigabitEthernet1']['ipv4'])
<class 'list'>
>>> len(facts['ansible_facts']['net_interfaces']['GigabitEthernet1']['ipv4'])
1
>>> type(facts['ansible_facts']['net_interfaces']['GigabitEthernet1']['ipv4'][0])
<class 'dict'>
>>> gi1_subnet = facts['ansible_facts']['net_interfaces']['GigabitEthernet1']['ipv4'][0]['subnet']
>>> gi1_address = facts['ansible_facts']['net_interfaces']['GigabitEthernet1']['ipv4'][0]['address']
>>> f"{gi1_address}/{gi1_subnet}"
'10.10.20.48/24'

This is definitely a more complex data structure to traverse to get the data we need, so let’s walk through it step by step.

We can see that net_interfaces is a dictionary, so we’ll use a key to traverse to the next level. The data we’re interested in is under the GigabitEthernet1 key. That value is also a dictionary, so to get to the next level in the hierarchy we use another key, ipv4. The ipv4 data is stored within a list whose length is one, which means we can access it at index zero. The data at index zero is a dictionary, which means we access the address and subnet via their respective keys.

We store the address and the subnet in their own variables that tell the story of how we got to the data we want.

dictionary[dictionary][dictionary][dictionary][list][dictionary]

Now that you understand how each data type works, you can tackle any complex data structure you encounter.
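
For example, here is a quick sketch that walks every interface under net_interfaces and prints each IPv4 address. The .get('ipv4', []) is a defensive assumption on our part: this sample only contains one interface, and an interface without an address may not have an ipv4 key at all.

>>> for intf_name, intf_data in facts['ansible_facts']['net_interfaces'].items():
...     for ip in intf_data.get('ipv4', []):
...         print(f"{intf_name}: {ip['address']}/{ip['subnet']}")
...
GigabitEthernet1: 10.10.20.48/24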

Let’s take a look at another example and keep flexing this muscle memory. Let’s determine how much space has been used on bootflash: by subtracting the spacefree_kb value from the spacetotal_kb value.

>>> type(facts['ansible_facts'])
<class 'dict'>
>>> type(facts['ansible_facts']['net_filesystems_info'])
<class 'dict'>
>>> type(facts['ansible_facts']['net_filesystems_info']['bootflash:'])
<class 'dict'>
>>> space_free = facts['ansible_facts']['net_filesystems_info']['bootflash:']['spacefree_kb']
>>> space_free
5869720.0
>>> space_total = facts['ansible_facts']['net_filesystems_info']['bootflash:']['spacetotal_kb']
>>> space_total
7712692.0
>>> space_used = space_total - space_free
>>> space_used
1842972.0

As you can see, we had to navigate through four dictionaries, including the initial structure, but we didn’t have to navigate through any lists this time to get the data we wanted.
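
When a path is this deep, a missing key anywhere along the way raises a KeyError. One defensive pattern, sketched below against the same facts structure, is chaining .get() calls with empty dictionaries as the defaults, so a missing level simply yields None instead of an exception:

>>> space_free = (
...     facts.get('ansible_facts', {})
...          .get('net_filesystems_info', {})
...          .get('bootflash:', {})
...          .get('spacefree_kb')
... )
>>> space_free
5869720.0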

Remember that these complex data structures can be intimidating, but breaking down each data type within the structure helps us deconstruct them into smaller chunks to navigate and process until we get to the data we want.
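
If you find yourself traversing the same kinds of deep structures over and over, a small helper can encode that chunk-by-chunk navigation. The deep_get function below is a hypothetical utility (not part of Ansible or any library mentioned here) that walks a path of keys and indexes and returns None if any step is missing:

>>> def deep_get(data, path):
...     """Walk nested dicts/lists one step at a time; None if any step fails."""
...     for step in path:
...         try:
...             data = data[step]
...         except (KeyError, IndexError, TypeError):
...             return None
...     return data
...
>>> deep_get(facts, ['ansible_facts', 'net_interfaces', 'GigabitEthernet1', 'ipv4', 0, 'address'])
'10.10.20.48'
>>> deep_get(facts, ['ansible_facts', 'no_such_key'])  # returns None, so nothing prints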

-Mikhail




Intro to Ansible Collections

The intention of this blog post is to provide an overview of Ansible Collections, including the problem they are solving, how they work, and how to use them.

The Problem

Ansible has seen tremendous growth within the last several years, and with growth comes growing pains. Ansible has accumulated several hundred modules in the core repository, all gated by the Ansible development team to ensure quality. This has become a huge burden to support and makes it prohibitive to keep up with PRs for new modules or bug fixes to existing modules. Just looking at the open issues and PRs on the Ansible GitHub repository indicates the tremendous burden: currently there are over 4,000 open issues and over 2,000 open PRs.

Solving the Problem

Ansible Collections are Ansible’s attempt at alleviating the bottleneck created by having the Ansible development team manage PRs, allowing maintainers of modules to develop them outside of Ansible’s typical release cycle and to ship features and bug fixes more quickly. Collections also attempt to solve a few other problems, such as plugin/role name collisions and difficult code sharing for most plugins. Both are solved by Ansible Collections’ use of namespaces, which prevents collisions between similarly named plugins, modules, etc., and makes each collection available by its Fully Qualified Collection Name (FQCN). These namespaces manifest in how you install a specific collection, as well as in module development, since you can import plugins/module_utils from other collections by using their namespace. We won’t be diving into any development topics in this blog.

Click the following link to learn more about Ansible Galaxy Namespaces.

When Ansible first announced Collections it was very exciting news, but it received mixed reviews from community members. There are some concerns over the quality of modules that will be moved out of Ansible core and into Collections, and over how this will affect companies with more stringent requirements for obtaining Open Source Software (OSS) from the Internet. However, there are ways around pulling Ansible Collections from the Internet, such as hosting your own Galaxy server internally. If an organization already has an internal Ansible Galaxy server, this should be as simple as upgrading it to support collections.

Ansible is addressing some of the concerns about module quality by introducing a partner program that entails a certification process to validate that collections retain the quality of the modules that exist within core. The certified modules will be available in both the new Automation Hub and Galaxy.

Some of the certified partners can also be found here.

Keep in mind that Ansible Collections were released with Ansible 2.8 and are still in their infancy. There will most likely be many more changes to them as feedback is provided by the community. Eventually, the majority of modules will be moved out of Ansible core and into Ansible Collections.

Ansible Collections Structure

Ansible Collections are able to include any kind of plugin available within Ansible Core along with roles, playbooks, etc. Below is an example structure of a collection:

collection/
├── docs/
├── galaxy.yml
├── plugins/
│   ├── modules/
│   │   └── module1.py
│   ├── inventory/
│   └── .../
├── README.md
├── roles/
│   ├── role1/
│   ├── role2/
│   └── .../
├── playbooks/
│   ├── files/
│   ├── vars/
│   ├── templates/
│   └── tasks/
└── tests/

As you can see, playbooks and roles can be included within the collection as well as other plugins. These are all available via the namespacing method discussed above.

Here are some links to the Ansible documentation regarding Ansible Collections: Using Collections and Developing Collections.

Install Ansible Collections

Let’s take a look at several methods to install Ansible Collections:

Install with default values (collections install to ~/.ansible/collections/ansible_collections):

ansible-galaxy collection install fragmentedpacket.netbox_modules

Install a specific version

ansible-galaxy collection install fragmentedpacket.netbox_modules:0.0.8

Install a version within a specific version range

ansible-galaxy collection install "fragmentedpacket.netbox_modules:>0.0.7,<0.0.9"

Install via a requirements.yml file:

---
collections:
# With just the collection name. This downloads the latest version of the collection
- fragmentedpacket.netbox_modules

# With the collection name, version, and source options
- name: fragmentedpacket.netbox_modules
  version: '0.8.0'
  source: 'The Galaxy URL to pull the collection from (default: ``--api-server`` from cmdline)'

Run the following command to install using the requirements file: ansible-galaxy collection install -r requirements.yml

Install with non-default path

NOTE: this path must be specified within ansible.cfg (collections_paths under [defaults]) or via the COLLECTIONS_PATHS environment variable.

ansible-galaxy collection install fragmentedpacket.netbox_modules -p /other/path/to/collections
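
As a sketch, the matching ansible.cfg entry might look like the following (using the example path from the command above; multiple paths are colon-separated):

[defaults]
collections_paths = /other/path/to/collections:~/.ansible/collections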

Install Ansible Collection from source git repo
(ansible) [root@54924523a23c src]# git clone git@github.com:fragmentedpacket/netbox_modules.git
(ansible) [root@54924523a23c src]# cd netbox_modules
(ansible) [root@54924523a23c src]# ansible-galaxy collection build .
Created collection for fragmentedpacket.netbox_modules at /src/cloned-repos/ansible_collections/fragmentedpacket/netbox_modules
fragmentedpacket-netbox_modules-0.1.0.tar.gz
(ansible) [root@54924523a23c src]# ansible-galaxy collection install fragmentedpacket-netbox_modules-0.1.0.tar.gz
Process install dependency map
Starting collection install process
Installing 'fragmentedpacket.netbox_modules:0.1.0' to '/root/.ansible/collections/ansible_collections/fragmentedpacket/netbox_modules'

Now that we have the Ansible Collection installed, it is time to use it. We’ll explain how in the next section.

Using Ansible Collections

The first method uses the collections keyword at the play level:

---
- hosts: localhost
  connection: local
  gather_facts: no
  collections:
    - fragmentedpacket.netbox_modules

  tasks:
    - name: "Test collections"
      netbox_device:
        netbox_url: "http://localhost"
        netbox_token: "1234"
        data:
          name: "test"

    - import_role:
        name: netbox_modules_role

    - debug:
        msg: "{{ lookup('netbox', 'param1') | collection_filter_plugin('test') }}"

The other method that can be used is using the Fully Qualified Collection Name (FQCN):

---
- hosts: localhost
  connection: local
  gather_facts: no

  tasks:
    - name: "Test collections"
      fragmentedpacket.netbox_modules.netbox_device:
        netbox_url: "http://localhost"
        netbox_token: "1234"
        data:
          name: "test"

    - import_role:
        name: fragmentedpacket.netbox_modules.collection_role

    - debug:
        msg: "{{ lookup('fragmentedpacket.netbox_modules.netbox', 'param1') | fragmentedpacket.netbox_modules.collection_filter('test') }}"

If attempting to use a collection within a role, note that roles do not currently appear to inherit the play-level collections directive, so the collection will have to be specified at the task level within the role:

- name: "Test collections"
  fragmentedpacket.netbox_modules.netbox_device:
    netbox_url: "http://localhost"
    netbox_token: "1234"
    data:
      name: "test"

OR

- name: "Test collections"
  collections:
    - fragmentedpacket.netbox_modules
  netbox_device:
    netbox_url: "http://localhost"
    netbox_token: "1234"
    data:
      name: "test"

We can validate that the module from the Ansible Collection is being used by running the playbook with -vvv and looking for Using module file within the output.

Using module file /src/cloned-repos/ansible_collections/fragmentedpacket/netbox_modules/plugins/modules/netbox_device.py

-Mikhail


