Automation Principles – Data Normalization

Ken Celenza

April 12, 2022

This is part of a series of posts to help establish sound Network Automation Principles.

Providing a common method to interact between various different systems is a fairly pervasive idea throughout technology. Within your first days of learning about traditional networking, you will inevitably hear about the OSI model. The concept being that each layer provides an interface from one layer and to another layer. The point is, there must be an agreement between those interfaces.

The same is true with data, which poses a problem within the Network space, as most interfaces to the network are via vendor- specific CLI, API, etc. This is what makes a uniform YANG model via an open modeling standard such as Open Config or IETF models so attractive.

The problem for adoption for such a standard is multifaceted. While cynics believe it is a vendor ploy to keep vendor lock-in, I think it is a bit more nuanced than that. Without spending too much time on the subject, here are some points to consider.

Vendor-neutral models are by nature complex, as they should contain the superset of all features
Complexity makes it more difficult to use the product
Despite the complexity, vendor-neutral models always seem to lack a core feature to any given vendor
Vendors have to extend models to support “their differentiators” or features, which is complex and subject to future issues
Data does not always map easily from the vendor’s model to any other model
All of this makes it complex to actually build out, if these features are not built from within the OS to start with

That’s a huge topic, with a 30,000-foot view of some pros/cons, no reason to dive deeper now.

Data Normalization in Computer Science

The construct on agreed upon interfaces has many names in Computer Science, depending on the context. While not all are related to data specifically, the concept remains the same.

Interface – as the generic term to define how two systems connect, not to be confused with a “network interface”.
API (or Application Programming Interface) – which is not always a REST API, is an agreed upon interface.
Contract – as a term to to reinforce the idea that there is an agreed upon standard between two systems.
Signature – as a type enforced definition of a function.

As mentioned, some of these terms are specific to a context, such as signature being more associated with a function, but these are all terms you will here often that describe the basic concepts.

Data Normalization in NAPALM

NAPALM provides a series of “getters”; these are basically what a Network Engineer would call “show commands” in structured data and normalized. Let’s observe the following example, taken from the get_arp_table doc string.

Returns a list of dictionaries having the following set of keys:
    * interface (string)
    * mac (string)
    * ip (string)
    * age (float)

Example::

    [
        {
            'interface' : 'MgmtEth0/RSP0/CPU0/0',
            'mac'       : '5C:5E:AB:DA:3C:F0',
            'ip'        : '172.17.17.1',
            'age'       : 1454496274.84
        },
        {
            'interface' : 'MgmtEth0/RSP0/CPU0/0',
            'mac'       : '5C:5E:AB:DA:3C:FF',
            'ip'        : '172.17.17.2',
            'age'       : 1435641582.49
        }
    ]

What you will observe here is there is no mention of vendor, and there is seemingly nothing unique about this data to tie it to any single vendor. This allows the developer to make programmatic decisions in a single way, regardless of vendor. The way in which data is normalized is up to the author of the specific NAPALM driver.

import sys
from napalm import get_network_driver
from my_custom_inventory import get_device_details

network_os, ip, username, password = get_device_details(sys.argv[1])

driver = get_network_driver(network_os)
with driver(ip, username, password) as device:
    arp_table = device.get_arp_table()

for arp_entry in arp_table:
    if arp_entry['interface'].startswith("TenGigabitEthernet"):
        print(f"Found 10Gb port {arp_entry['interface']}")

From the above snippet, you can see that regardless of what the fictional function get_device_details returns for a valid network OS, the process will remain the same. The hard work of performing the data normalization still has to happen within the respective NAPALM driver. That may mean connecting to the device, running CLI commands, then parsing; or that could mean making an API call and transposing the data structure from the vendors to what NAPALM expects.

Configuration Data Normalization

Considerations for building out your own normalized data model:

Do not follow the vendor’s syntax, this simply pushes the problem along
Having thousands of configuration “nerd nobs” requires expert-level understanding of a data model that will not likely be as well documented or understood as the vendor’s CLI. The point is to remove complexity, not shift complexity
Express the business intention, not the vendor configuration
Abstract the uniqueness of the OS implementation away from from intent
Express the greatest amount of configuration state in the least amount of actual data

Generally speaking, when I am building a data model, I try to build it to be normalized. While not always achievable on day 1 (due to lacking complete understanding of requirements or lacking imagination) the thought process is always there. Even if dealing with a single vendor, the first question I will ask is “would this work for another vendor?”

Reviewing the following configurations from multiple vendors:

I will pick out what is unique from the configuration, and thus a variable.

Based on observation of the above configurations, the following normalized data structure was created.

bgp:
  asn: 6500
  networks:
    - "1.1.1.0/24"
    - "1.1.2.0/24"
    - "1.1.3.0/24"
  neighbors:
    - description: "NYC-RT02"
      ip: "10.10.10.2"
    - description: "NYC-RT03"
      ip: "10.10.10.3"
    - description: "NYC-RT04"
      ip: "10.10.10.4"

With this data-normalized model in mind, you can quickly see how the below template can be applied.

router bgp {{ bgp['asn'] }}
  router-id {{ bgp['id'] }}
  address-family ipv4 unicast
{% for net in bgp['networks'] %}
    network {{ net }}
{% endfor %}
{% for neighbor in bgp['neighbors'] %}
  neighbor {{ neighbor['ip'] }} remote-as {{ neighbor['asn'] }}
    description {{ neighbor['description'] }}
    address-family ipv4 unicast
{% endfor %}

In this example, the Jinja template provides the glue between a normalized data model and the vendor-specific configuration. Jinja is just used as an example; this could just as easily be converted to an operation with REST API, NETCONF, or any other vendor’s syntax.

The Case for Localized Simple Normalized Data Models

Within Network to Code, we have found that simple normalized data models tend to get more traction than the more complex ones. While it is clear that each enterprise building its own normalized data model is not exactly efficient either—since each organization is having to reinvent the wheel—the adoption tends to offset that inefficiency.

Perhaps there is room within the community for some improvement here, such as creating a venue to easily publish data models and have others consume those data models. This can serve as inspiration, a starting point, and a means of comparison of various different normalized data models without the rigor that is required for solidified data models that would come from OC/IETF.

Data Normalization Enforcement

Enforcing normalized data models is filled with tools. This will be covered in more detail within the Data Model blog, but here are a few:

JSON Schema
Any relational database
Kwalify

You may even find a utility called Schema Enforcer valuable if you’re looking at using JSON Schema for data model enforcement within a CI pipeline. Check out this intro blog if you’re interested.

Conclusion

There are many ways to normalize data, and many times when the vendor syntax or output is nearly the same. You may be able to reuse the exact same code from one vendor to another. By creating normalized data models, you can better prepare for future use cases, remove some amount of vendor lock-in, and provide a consistent developer experience across vendors.

Creating normalized data models takes some practice to get right, but it is a skill that can be honed over time and truly provide a richer experience.

-Ken

Tags :

automation automation-architecture automation-concepts automation-journey netdevops

Does this all sound amazing? Want to know more about how Network to Code can help you do this, reach out to our sales team. If you want to help make this a reality for our clients, check out our careers page.

Contact Us to Learn More

Share details about yourself & someone from our team will reach out to you ASAP!

Author

Chiara Geronzi

View all posts

Cookie	Duration	Description
__hssc	30 minutes	HubSpot sets this cookie to keep track of sessions and to determine if HubSpot should increment the session number and timestamps in the __hstc cookie.
__hssrc	session	This cookie is set by Hubspot whenever it changes the session cookie. The __hssrc cookie set to 1 indicates that the user has restarted the browser, and if the cookie does not exist, it is assumed to be a new session.
cookielawinfo-checkbox-advertisement	1 year	Set by the GDPR Cookie Consent plugin, this cookie records the user consent for the cookies in the "Advertisement" category.
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
CookieLawInfoConsent	1 year	CookieYes sets this cookie to record the default button state of the corresponding category and the status of CCPA. It works only in coordination with the primary cookie.
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Duration	Description
__cf_bm	30 minutes	Cloudflare set the cookie to support Cloudflare Bot Management.
li_gc	5 months 27 days	Linkedin set this cookie for storing visitor's consent regarding using cookies for non-essential purposes.
lidc	1 day	LinkedIn sets the lidc cookie to facilitate data center selection.
UserMatchHistory	1 month	LinkedIn sets this cookie for LinkedIn Ads ID syncing.

Cookie	Duration	Description
__hstc	5 months 27 days	Hubspot set this main cookie for tracking visitors. It contains the domain, initial timestamp (first visit), last timestamp (last visit), current timestamp (this visit), and session number (increments for each subsequent session).
_ga	1 year 1 month 4 days	Google Analytics sets this cookie to calculate visitor, session and campaign data and track site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognise unique visitors.
_gat_gtag_UA_*	1 minute	Google Analytics sets this cookie to store a unique user ID.
_gid	1 day	Google Analytics sets this cookie to store information on how visitors use a website while also creating an analytics report of the website's performance. Some of the collected data includes the number of visitors, their source, and the pages they visit anonymously.
AnalyticsSyncHistory	1 month	Linkedin set this cookie to store information about the time a sync took place with the lms_analytics cookie.
CONSENT	2 years	YouTube sets this cookie via embedded YouTube videos and registers anonymous statistical data.
hubspotutk	5 months 27 days	HubSpot sets this cookie to keep track of the visitors to the website. This cookie is passed to HubSpot on form submission and used when deduplicating contacts.
ln_or	1 day	Linkedin sets this cookie to registers statistical data on users' behaviour on the website for internal analytics.

Cookie	Duration	Description
bcookie	1 year	LinkedIn sets this cookie from LinkedIn share buttons and ad tags to recognize browser IDs.
bscookie	1 year	LinkedIn sets this cookie to store performed actions on the website.
li_sugr	3 months	LinkedIn sets this cookie to collect user behaviour data to optimise the website and make advertisements on the website more relevant.
VISITOR_INFO1_LIVE	5 months 27 days	YouTube sets this cookie to measure bandwidth, determining whether the user gets the new or old player interface.
YSC	session	Youtube sets this cookie to track the views of embedded videos on Youtube pages.
yt-remote-connected-devices	never	YouTube sets this cookie to store the user's video preferences using embedded YouTube videos.
yt-remote-device-id	never	YouTube sets this cookie to store the user's video preferences using embedded YouTube videos.
yt.innertube::nextId	never	YouTube sets this cookie to register a unique ID to store data on what videos from YouTube the user has seen.
yt.innertube::requests	never	YouTube sets this cookie to register a unique ID to store data on what videos from YouTube the user has seen.

Automation Principles – Data Normalization

Data Normalization in Computer Science

Data Normalization in NAPALM

Configuration Data Normalization

The Case for Localized Simple Normalized Data Models

Data Normalization Enforcement

Conclusion

Tags :

Share :

Contents

Recent Posts

December 11, 2024

December 5, 2024

November 25, 2024

November 15, 2024

October 4, 2024

Contact Us to Learn More

Author

Nautobot

What we do

How we do it

Company

Community

Resources

Contact us