Network Automation Architecture – Part 04


Over the last two years, in our Telemetry blog post series we have discussed many telemetry and observability concepts and shown the characteristics of modern network telemetry. The telemetry stack and its architectural components (collector, database, visualization, and alerting) make network telemetry and observability a real evolution of network monitoring. You have probably also already heard from us about Telegraf, Prometheus, Data Enrichment, and Data Normalization; each of these functions has already been introduced in our blog series.

Introduction to Architecture of the Network Telemetry and Observability

In this blog post, we will focus on the architecture of telemetry and observability. Over the last years at Network to Code we developed the Network Automation Framework, which includes a Telemetry & Observability element. The network telemetry and observability stack is a critical piece of any network automation strategy and is a prerequisite for building advanced workflows and enabling event-based network automation. While I mentioned a few of the tools above, it is important to note that not every telemetry stack is the same: the elements are composable. Thanks to rapid development in this space, many interesting and valuable tools have become available in recent years.

We have already introduced the architecture elements (collector, database, visualization), so please refer to Nikos’ blog post for the details. In this particular blog, let’s discuss what we take into consideration while architecting a telemetry and observability solution.

The process of architecting a telemetry system starts with the analysis of requirements. The most common challenges with respect to telemetry systems are as follows:

  • Heterogeneous data – data coming from different sources, in different formats (CLI, SNMP, gNMI, other)
  • Quality of the data within telemetry system (e.g., decommissioned devices, lack of normalization and enrichment)
  • Quality of the exposed data (i.e., lack of meaningful dashboards)
  • Lack of correlation between events
  • Number of tools involved (including legacy tools that have not yet been deprecated)
  • System configuration overhead (i.e., missing devices)

As you might notice, most of these challenges stem from data quality or complexity, not necessarily from the tools or software used. Those challenges are often the triggers for a telemetry system overhaul or even a complete replacement.

Architecting the Telemetry System

Telemetry Stack Components

During the architecture process, we follow the stack architecture presented below. We consider the stack to be composed of the following elements: collector, database, visualization, and alerting. For detailed information about each of them, please refer to our previous blog posts.

Understanding Requirements

To start the architecture process, we have to define and understand constraints, dependencies, and requirements. Not every system is the same; each one has unique needs and serves a unique purpose.

Dividing the requirements by component allows us to view the system as a set of functions, each serving a different purpose. Below, I present just a set of example requirements; while the list is not exhaustive, it might give you an idea of how many different architectures we could design, with different components fitting different use cases. Telemetry stacks are customizable: each of the functions can be implemented in a number of ways, including the integrations between components.

General Requirements – Examples

  • What is the data to be collected? (Logs? Flows? Metrics?)
  • What is the extensibility of the designed system?
  • What is the scalability of the designed system? Is horizontal scalability needed?
  • What is the expected access? (API? UI? CLI?)
  • Who will use the system, and how will they use it? (Capacity Planning Team? NOC? Ad hoc users?)
  • How will the system’s configuration be generated? (Collectors?)
  • How will the system’s load be distributed? (Regional pods?)
  • How does the organization deploy new applications?
  • How are users trained to use new applications?

Collector

  • What is the expected data resolution?
  • What is the expected data collection method? (gNMI? SNMP?)
  • What is the expected data? (BGP? System metrics?)
  • What is the deployment model? (Container on the network device? Stand-alone?)
  • Are the synthetic metrics needed?

Data Distribution and Processing

  • Which data will be enriched and normalized?
  • What are the needed methods to perform data manipulations? (Regex? Enum?)
  • How will the data flow between systems? (Kafka?)
  • How will the data be validated?

Database

  • What is the preferred query language? (Influx? PromQL?)
  • What are the backfilling requirements?
  • What are the storage requirements? (Retention period?)
  • What is the preferred database type? (Relational? TSDB?)

Visualization

  • Can we correlate events displayed?
  • Can we create meaningful, role-based, useful dashboards?
  • Can we automatically generate dashboards? (IaC?)
  • Can we use source-of-truth data (e.g., site names) in the dashboards?

Alerting

  • What are the available integrations? (Automation Orchestrator? Email? Slack?)
  • How will the alerts be managed?
  • Can we use source-of-truth data (e.g., interface descriptions, SLAs) with the alerts?

Designing the System

The process of designing a telemetry system is preceded by understanding and collecting specific requirements, preparing the proof-of-concept (PoC) plan, and delivering the PoC itself. The PoC phase allows us to verify the requirements, test the integrations, and visually present the planned solution. The PoC is accompanied by the design documentation, where we record all the necessary details of the architected telemetry and observability system and answer and justify all the requirements: constraints, needs, and dependencies.

Implementing the System

Implementing a telemetry system requires us to collaborate with various teams. As we introduce the new application, we typically have to communicate with:

  • Network Engineering (system users)
  • Security (access requirements)
  • Platform (system deployment and operations)
  • Monitoring (system users)

Telemetry and observability systems are critical to every company, so we must ensure the implemented system meets all of the organization’s requirements. Not only do we have to map existing functionality into the new system (e.g., existing alerts), we also have to ensure all the integrations work as expected.

Telemetry and observability implementation involves application deployment and configuration management. To achieve the best user experience, we can leverage Source of Truth (SoT) systems while managing the configurations. This means a modern telemetry and observability solution has the Source of Truth at its center: the configuration files are generated programmatically, using data fetched from the SoT so that only information within the scope of the SoT is used to configure, enrich, or normalize the telemetry and observability system.
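
To make this concrete, below is a minimal sketch of SoT-driven configuration generation. It assumes a generic HTTP device-inventory endpoint and renders a Telegraf-style SNMP input stanza; the URL, token, payload shape, and template are illustrative placeholders, not part of any specific product.

#!/usr/bin/env python
"""Sketch: render a collector configuration from Source of Truth data.
The SoT endpoint, token, and template below are assumptions for illustration."""
import requests
from jinja2 import Template

SOT_URL = "https://sot.example.com/api/dcim/devices/?status=active"  # assumed endpoint
TEMPLATE = Template(
    '[[inputs.snmp]]\n'
    '  agents = [{% for device in devices %}"{{ device }}"{{ ", " if not loop.last }}{% endfor %}]\n'
    '  version = 2\n'
    '  community = "public"\n'
)


def generate_collector_config() -> str:
    # Only devices present (and active) in the SoT end up in the collector configuration.
    response = requests.get(SOT_URL, headers={"Authorization": "Token <redacted>"})
    response.raise_for_status()
    devices = [item["name"] for item in response.json()["results"]]  # assumed payload shape
    return TEMPLATE.render(devices=devices)


if __name__ == "__main__":
    print(generate_collector_config())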

Using the System

Once the system is implemented, we work on ensuring that it is used properly. There are several use cases for telemetry and observability; some of the usage examples involve:

  • Collecting from a new data source or new data (metric)
  • Scaling the collector system for a new planned capacity
  • Presenting new data on a dashboard or building a new dashboard
  • Adding a new alert or modifying an existing one
  • Receiving and handling (silencing, aggregating) an alert

Conclusion

As we recognize the potential challenges of any new system being introduced, we ensure the system’s functions are well known to its users. This is critical for telemetry and observability systems, as they typically introduce a set of protocols, standards, and solutions that might be new in a given environment.

-Marek




Nautobot Application: BGP Models


We are happy to announce the release of a new application for Nautobot. With this application, it’s now possible to model your ASNs and BGP Peerings (internal and external) within Nautobot!

This is the first application of the Network Data Models family, which gave us a great opportunity to test some new capabilities of the application framework introduced by Nautobot. Data modeling is an interesting exercise, and with BGP being a complex ecosystem, this has been an interesting project. This blog will present the application and some of the design principles that we had in mind when it was developed.

Nautobot

The development of this application was initially sponsored by the Riot Direct team at Riot Games. Thanks to them for contributing it back to the community.

Overview

This application adds the following new data models into Nautobot:

  • BGP Routing Instance : device-specific BGP process
  • Autonomous System : network-wide description of a BGP autonomous system (AS)
  • Peer Group Template : network-wide template for Peer Group objects
  • Peer Group : device-specific configuration for a group of functionally related BGP peers
  • Address Family : device-specific configuration of a BGP address family (AFI-SAFI)
  • Peering and Peer Endpoints : A BGP Peering is represented by a Peering object and two endpoints, each representing the configuration of one side of the BGP peering. A Peer Endpoint must be associated with a BGP Routing Instance.
  • Peering Role : describes the valid options for PeerGroup, PeerGroupTemplate, and/or Peering roles

With these new models, it’s now possible to populate the Source of Truth (SoT) with any BGP peerings, internal or external, regardless of whether both endpoints are fully defined in the Source of Truth.

The minimum requirement to define a BGP peering is two IP addresses and one or two autonomous systems (one ASN for iBGP, two ASNs for eBGP).
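
For illustration, a hedged sketch of creating one of these objects through the Nautobot REST API is shown below. The plugin endpoint path, token, and payload fields are assumptions; consult the application documentation and the API browser for the authoritative names.

#!/usr/bin/env python
"""Sketch: create an Autonomous System object via the Nautobot REST API.
The endpoint path, token, and payload fields are assumed for illustration."""
import requests

NAUTOBOT_URL = "https://nautobot.example.com"
HEADERS = {"Authorization": "Token <redacted>", "Content-Type": "application/json"}

response = requests.post(
    f"{NAUTOBOT_URL}/api/plugins/bgp/autonomous-systems/",  # assumed plugin endpoint
    headers=HEADERS,
    json={"asn": 65001, "status": "active"},
)
response.raise_for_status()
print(response.json())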

Peering

Autonomous Systems

Peer Endpoint

Peer Group

Peering Roles

Installing the Application

The application is available as a Python package in PyPI and can be installed atop an existing Nautobot installation using pip:

$ pip3 install nautobot-bgp-models

This application is compatible with Nautobot 1.3.0 and higher.

Once installed, the application needs to be enabled in the nautobot_config.py file:

# nautobot_config.py
PLUGINS = [
    # ...,
    "nautobot_bgp_models",
]

Design Principles

BGP is a protocol with a long and rich history of implementations. Understanding the limitations of existing data models for this protocol, we had to find the right solutions, balancing innovation and improvement. In this section we explain our approach to the BGP data models.

Network View and Relationship First

One of the advantages of a Source of Truth is that it captures how all objects are related to each other and then exposes those relationships via the UI and API, making it easy for users to consume that information.

Instead of modeling a BGP session from a device point of view, with a local IP address and a remote IP address, we chose to model a BGP peering as a relationship between two endpoints. This way, each endpoint has a complete understanding of what is connected on the other side, and information is not duplicated when a session between two devices exists in the SoT.

This design also accounts for external peering sessions where the remote device is not present in Nautobot, as is often the case when you are peering with a transit provider.

Start Simple

For the first version, we decided to focus on the main building blocks that compose a BGP peering. Over time the BGP application will evolve to support more information: routing policies, communities, etc. Before increasing the complexity, we’d love to see how our customers and the community leverage the application.

Inheritance

Many Border Gateway Protocol implementations are based on the concept of inheritance. It’s possible to centralize almost all information in a Peer Group Template model, and all BGP endpoints associated with this Peer Group Template will inherit all of its attributes.

The concept is very applicable to automation, and we wanted to have a similar concept in the SoT. As such, we implemented an inheritance system between some models:

  • PeerGroup inherits from PeerGroupTemplate.
  • PeerEndpoint inherits from PeerGroup, PeerGroupTemplate, and BGPRoutingInstance.

As an example, a PeerEndpoint associated with a PeerGroup will automatically inherit attributes of the PeerGroup that haven’t been defined at the PeerEndpoint level. If an attribute is defined on both, the value defined on the PeerEndpoint will be used.

(*) Refer to the application documentation for all details about the implemented inheritance pattern.

The inherited values will be automatically displayed in the UI and can be retrieved from the REST API with the additional ?include_inherited=true parameter.
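
As a small illustration, a request like the following could fetch an object with its inherited attributes resolved; the endpoint path and UUID are placeholders, and only the include_inherited parameter comes from the application itself.

import requests

# Hypothetical peer endpoint UUID and Nautobot URL; only the query parameter is documented behavior.
url = "https://nautobot.example.com/api/plugins/bgp/peer-endpoints/<uuid>/"
response = requests.get(
    url,
    headers={"Authorization": "Token <redacted>", "Accept": "application/json"},
    params={"include_inherited": "true"},  # resolve attributes through the inheritance chain
)
print(response.json())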

Inheritance

Extra Attributes

Extra attributes allow users to describe the models provided by the application with additional information. We made a design decision to allow application users to abstract their configuration parameters and store contextual information in this special field. What makes it very special is the support for inheritance: extra attributes are not only inherited but also intelligently deep-merged, allowing attributes from related objects to be inherited and overridden.

Integration with the Core Data Model

With Nautobot, one of our goals is to make it easy to extend the data model of the Source of Truth, not only by making it easy to introduce new models but also by allowing applications to extend the core data model. In multiple places, the BGP application is leveraging existing Core Data models.

Extensibility

We designed the BGP models to provide a sane baseline that will fit most use cases, and we encourage everyone to leverage all the extensibility features provided by Nautobot to store and organize the additional information you need under each model, or to capture any relationship that is important for your organization.

All models introduced by this application support the same extensibility features supported by Nautobot, which include:

  • Custom fields
  • Custom links
  • Relationships
  • Change logging
  • Custom data validation logic
  • Webhooks, in addition to REST API and GraphQL support

An example can be seen in the Nautobot Sandbox where a relationship between a circuit and a BGP session was added to track the association between a BGP session and a given circuit.


Conclusion

More information on this application can be found at Nautobot BGP Plugin. You can also get a hands-on feel by visiting the public Nautobot Sandbox.

As usual, we would like to hear your feedback. Feel free to reach out to us on Network to Code’s Slack Channel!

-Damien & Marek




Exploring IOS-XE and NX-OS based RESTCONF Implementations with YANG and Openconfig


Here at Network to Code we work with network devices’ APIs every day. APIs are critical for enabling our customers’ network automation, and they are usually the first choice (where available) in our solutions – even when that’s the hard way.

As we push network devices’ APIs to their limits, we go beyond the examples available on GitHub or in vendor documentation. One of our past use cases involved using the RESTCONF (*) protocol on the Cisco NX-OS 9000 series. While this was an opportunity to work with YANG, Openconfig, Postman, Python, etc., it was also a chance to understand the differences in how the protocol is implemented across Cisco’s platforms. For our tests we used NX-OS 9.3(1) and IOS-XE 16.9.3.

Note: Jason Edelman described NETCONF and RESTCONF’s principles in 2016 in a blog post.

Examining IOS-XE RESTCONF and NX-OS RESTCONF

The NX-OS implementation is based on draft-ietf-netconf-restconf-10. This draft was published in March 2016; however, RFC 8040 was published in January 2017. One of the changes between these versions was a change in the headers needed in the HTTP request.

For IOS-XE, the headers are declared using a dash notation in yang-data:

restconf_headers = {
    'Accept': 'application/yang-data+json',
    'Content-Type': 'application/yang-data+json'
}   

However, for NX-OS, which is based on the draft, Cisco implemented a dot notation in yang.data:

restconf_headers = {
    'Accept': 'application/yang.data+json',
    'Content-Type': 'application/yang.data+json'
}

YANG models are different

The supported YANG models also differ between IOS-XE and NX-OS devices. This means native models are platform-specific, not just vendor-specific. The first step when examining supported models is to retrieve the advertised capabilities from the device (you might use Hank Preston’s get_capabilities.py code to do that).
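
If you want a quick look yourself, a minimal NETCONF-based sketch using ncclient is shown below; the host and credentials are placeholders, and some platforms may require additional connection arguments (e.g., device_params).

#!/usr/bin/env python
"""Sketch: list the capabilities advertised by a device over NETCONF (host and credentials are placeholders)."""
from ncclient import manager

with manager.connect(
    host="nxos1",
    port=830,
    username="admin",
    password="admin",
    hostkey_verify=False,
) as conn:
    # Each capability string identifies a supported YANG module or protocol feature.
    for capability in conn.server_capabilities:
        print(capability)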

Once we confirm a particular capability is supported, let’s see how to programmatically get BGP peers from NX-OS:

#!/usr/bin/env python
import requests

username = 'admin'
password = ''
device = 'nxos1'

restconf_headers = {
    'Accept': 'application/yang.data+json',
    'Content-Type': 'application/yang.data+json'
}

bgp_url = 'https://{device}/restconf/data/Cisco-NX-OS-device:System/bgp-items/inst-items/dom-items/Dom-list=default/peer-items/Peer-list'

get_response = requests.get(bgp_url.format(device=device),
                            auth=(username, password),
                            headers=restconf_headers,
                            verify=False,
                            )
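
Before writing any parsing logic, it can help to inspect what actually came back; the short check below only assumes the device honored the JSON Accept header.

# Inspect status and top-level structure of the returned data.
print(get_response.status_code)
if get_response.ok:
    data = get_response.json()
    print(list(data.keys()))  # top-level keys differ between the NX-OS and IOS-XE models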

The same operation for an IOS-XE based platform would require a change in the bgp_url variable:

#!/usr/bin/env python
import requests

username = 'admin'
password = ''
device = 'csr1'

restconf_headers = {
    'Accept': 'application/yang-data+json',
    'Content-Type': 'application/yang-data+json'
}

bgp_url = 'https://{device}/restconf/data/Cisco-IOS-XE-bgp-oper:bgp-state-data'

get_response = requests.get(bgp_url.format(device=device),
                            auth=(username, password),
                            headers=restconf_headers,
                            verify=False,
                            )

You might notice the difference in the requested URI (endpoint), and you would also notice a difference in the RESTCONF response – the two responses have different data structures, meaning the data cannot be accessed in the same way.

That being said, to add a new BGP peer on NX-OS we could use the native module and create a data structure as follows:

#!/usr/bin/env python

import json

import requests

username = 'admin'
password = ''
device = 'nxos1'

restconf_headers = {
    'Accept': 'application/yang.data+json',
    'Content-Type': 'application/yang.data+json'
}


def bgp_add(peer_address, peer_asn):
    bgp_payload = {
        "bgp-items": {
            "inst-items": {
                "dom-items": {
                    "Dom-list": [
                        {
                            "name": "default",
                            "peer-items": {
                                "Peer-list": [
                                    {
                                        "addr": peer_address,
                                        "asn": peer_asn
                                    }
                                ]
                            }
                        }
                    ]
                }
            }
        }
    }

    bgp_url = 'https://{device}/restconf/data/Cisco-NX-OS-device:System'

    requests.patch(bgp_url.format(device=device),
                   auth=(username, password),
                   headers=restconf_headers,
                   verify=False,
                   data=json.dumps(bgp_payload)
                   )
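
An example invocation of this function could look like the following; the peer address and ASN are illustrative values only.

if __name__ == "__main__":
    # Merge a single BGP neighbor into the default VRF (values are illustrative).
    bgp_add(peer_address="192.0.2.10", peer_asn="65002")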

The same operation on IOS-XE would be similar; however, the data payload and the URI itself look much simpler when using XE’s native YANG model:

#!/usr/bin/env python

import json

import requests

username = 'admin'
password = ''
device = 'csr1'

restconf_headers = {
    'Accept': 'application/yang-data+json',
    'Content-Type': 'application/yang-data+json'
}


def bgp_add(peer_address, peer_asn):
    bgp_payload = {"Cisco-IOS-XE-bgp:neighbor": {'id': peer_address,
                                                 'remote-as': peer_asn}}

    # 65000 in the URL represents the ASN number
    bgp_url = 'https://{device}/restconf/data/Cisco-IOS-XE-native:native/Cisco-IOS-XE-native:router=bgp/65000/neighbor'

    requests.patch(bgp_url.format(device=device),
                   auth=(username, password),
                   headers=restconf_headers,
                   verify=False,
                   data=json.dumps(bgp_payload)
                   )

The reason for the differences is the following:

In NX-OS, it is the “device” module that describes BGP. In IOS-XE, however, it is the native BGP module, which is structured differently from the NX-OS one.

Note: The list of supported YANG modules is available on GitHub: https://github.com/YangModels/yang/tree/master/vendor/cisco

Openconfig Adoption is Progressing

The Openconfig working group produces a set of vendor-neutral, vendor-agnostic YANG modules – the overall idea of the Openconfig project is that you can use the same YANG model to communicate with different vendors (and platforms) while keeping the same data structures. Openconfig might therefore sound like an alternative to dealing with the differences between Cisco’s native modules on NX-OS and IOS-XE.

To understand the similarities and differences when using Openconfig on different platforms, let’s examine the existing loopback200 interface on the nxos1 device (NX-OS):

nxos1# show run int lo200

!Command: show running-config interface loopback200
!Running configuration last done at: Thu Oct 24 12:06:42 2019
!Time: Thu Oct 24 12:07:04 2019

version 9.3(1) Bios:version  

interface loopback200
  description NTC
  ip address 192.0.2.2/32

Sending a GET request to https://nxos1/restconf/data/openconfig-interfaces:interfaces/interface=lo200 results in the following response:

{
   "interface": [
      {
         "name": "lo200",
         "config": {
            "description": "NTC",
            "enabled": "true",
            "name": "lo200",
            "type": "softwareLoopback"
         },
         "state": {
            "admin-status": "UP",
            "ifindex": "335544520",
            "oper-status": "UP",
            "description": "NTC",
            "enabled": "true",
            "mtu": "1500",
            "type": "softwareLoopback"
         }
      }
   ]
}

The nxos1 response contained information about the configuration and state, structured according to the openconfig-interfaces model.

A similar configuration is present on the csr1 device (IOS-XE):

csr1#show run int lo200
Building configuration...

Current configuration : 89 bytes
!
interface Loopback200
 description NTC - XE
 ip address 192.0.2.2 255.255.255.255
end

Sending a GET request to https://csr1/restconf/data/openconfig-interfaces:interfaces/interface=Loopback200 results in the following response:

{
  "openconfig-interfaces:interface": {
    "name": "Loopback200",
    "config": {
      "type": "iana-if-type:softwareLoopback",
      "name": "Loopback200",
      "description": "NTC - XE",
      "enabled": true
    },
    "state": {
      "type": "iana-if-type:softwareLoopback",
      "name": "Loopback200",
      "description": "NTC - XE",
      "enabled": true,
      "ifindex": 29,
      "admin-status": "UP",
      "oper-status": "UP",
      "last-change": "2019-10-23T01:12:11.000147+00:00",
      "counters": {
        "in-octets": "0",
        "in-unicast-pkts": "0",
        "in-broadcast-pkts": "0",
        "in-multicast-pkts": "0",
        "in-discards": "0",
        "in-errors": "0",
        "in-unknown-protos": 0,
        "out-octets": "0",
        "out-unicast-pkts": "0",
        "out-broadcast-pkts": "0",
        "out-multicast-pkts": "0",
        "out-discards": "0",
        "out-errors": "0",
        "last-clear": "2019-10-22T23:13:05.000807+00:00"
      }
    },
    "subinterfaces": {
      "subinterface": [
        {
          "index": 0,
          "config": {
            "index": 0,
            "name": "Loopback200",
            "description": "NTC - XE",
            "enabled": true
          },
          "state": {
            "index": 0,
            "name": "Loopback200.0",
            "description": "NTC - XE",
            "enabled": true,
            "admin-status": "UP",
            "oper-status": "UP",
            "last-change": "2019-10-23T01:12:11.000147+00:00",
            "counters": {
              "in-octets": "0",
              "in-unicast-pkts": "0",
              "in-broadcast-pkts": "0",
              "in-multicast-pkts": "0",
              "in-discards": "0",
              "in-errors": "0",
              "out-octets": "0",
              "out-unicast-pkts": "0",
              "out-broadcast-pkts": "0",
              "out-multicast-pkts": "0",
              "out-discards": "0",
              "out-errors": "0",
              "last-clear": "2019-10-22T23:13:05.000807+00:00"
            }
          },
          "openconfig-if-ip:ipv4": {
            "addresses": {
              "address": [
                {
                  "ip": "192.0.2.2",
                  "config": {
                    "ip": "192.0.2.2",
                    "prefix-length": 32
                  },
                  "state": {
                    "ip": "192.0.2.2",
                    "prefix-length": 32
                  }
                }
              ]
            }
          },
          "openconfig-if-ip:ipv6": {
            "config": {
              "enabled": false
            },
            "state": {
              "enabled": false
            }
          }
        }
      ]
    }
  }
}

You might notice that some parts of the responses have the same data structures, with IOS-XE being more verbose in its response (state, counters, and subinterfaces with IP addressing). This is due to deviations from the openconfig-interfaces model – since not all features can always be supported, YANG models can declare a particular capability as “not supported”.

While version 9 of NX-OS supports tens of Openconfig YANG modules, there is still a difference between NX-OS and IOS-XE support for Openconfig, with IOS-XE offering broader coverage. The difference in supported modules should be considered when planning Openconfig usage in your network automation journey. Please also note that Openconfig support is characterized by a list of guidelines, limitations, and deviations from the published models.

PATCH vs. PUT

Some of the implementation details are worth checking during your code development process. Typically, in REST APIs, a PUT method represents a “create or replace” operation, while PATCH represents a “merge” operation. On NX-OS we noticed a difference in the behaviour of the PUT method, which did not behave as per RFC 8040 (or draft 10).

According to RFC 8040, the PUT method is “create or replace”:

The RESTCONF server MUST support the PUT method. The PUT method is sent by the client to create or replace the target data resource. A request message-body MUST be present, representing the new data resource, or the server MUST return a “400 Bad Request” status-line. The error-tag value “invalid-value” is used in this case.

Using the PUT method on NX-OS had the same result as using the PATCH method during our deployment – it merged our BGP changes into the existing BGP configuration. PUT on IOS-XE was indeed a “create or replace” operation, as the RFC defines it.
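
A simple way to check this behavior on a given platform is to PUT a minimal resource and read it back: if attributes that were not part of the payload survive, the server merged rather than replaced. Below is a hedged outline of such a test, reusing the header and URL patterns from the examples above.

import json

import requests


def put_then_read(url, payload, headers, auth):
    """Send a PUT and return the resource as the device reports it afterwards."""
    requests.put(url, auth=auth, headers=headers, verify=False, data=json.dumps(payload))
    return requests.get(url, auth=auth, headers=headers, verify=False).json()

# If the returned resource still contains attributes that were absent from `payload`,
# the platform treated PUT as a merge (PATCH-like) rather than a replace.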

The moral of the story is that these APIs are still relatively new on these platforms, which clearly shows the importance of testing network automation elements like APIs yourself.


Conclusion

The examples presented here are just some of the differences we identified during our work. While we are glad to see model-driven programmability support in both device types, there are many factors to consider before using a particular protocol in your Network Automation system.

-Marek


