A Journey in Golden Config


On the surface, all network engineers have a good understanding of what configuration compliance, or golden config, is. However, I’ve found the devil is in the details, and there are many different ideas as to what it should be. In this article, we will explore some of those views and their pros and cons.

Bulk Matching

Bulk matching works best for “global configurations” such as NTP, DNS, SNMP, etc. The idea is that you have a standard for what the configurations should be, and you ensure the actual configurations are the same.

This works well if the configurations are truly identical across devices, as described in this pseudocode:

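expected_config = "ntp server 10.1.1.1\nntp server 10.1.1.2\nntp server 10.1.1.3 prefer"
# run_ios_command is a placeholder for however you collect command output
actual_config = run_ios_command("show run | in ^ntp server")

# Compliant only when the two strings are exactly the same
is_compliant = expected_config == actual_config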

This tends not to scale well when you have a lot of regionality (e.g., different standards per region, such as eu, am, ep, etc.) or exceptions to the rule (e.g., if a site has a local NTP server, use it). It also does not account for configurations that will always differ, such as IPs on interfaces, BGP ASNs, VLANs, etc.
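
One small step beyond a strict equality check is to compare the two configurations line by line, so the result says what is missing or unexpected rather than only True or False. The sketch below assumes the command output has already been collected as a string; the function name bulk_match and the sample values are illustrative and not part of any tool discussed here.

```python
def bulk_match(expected_config, actual_config):
    """Compare two flat configuration snippets line by line.

    Returns a dict listing lines that are missing from, or extra in, the
    actual configuration. Line order is ignored in this sketch.
    """
    expected = {line.strip() for line in expected_config.splitlines() if line.strip()}
    actual = {line.strip() for line in actual_config.splitlines() if line.strip()}
    return {
        "compliant": expected == actual,
        "missing": sorted(expected - actual),
        "extra": sorted(actual - expected),
    }


expected_config = "ntp server 10.1.1.1\nntp server 10.1.1.2\nntp server 10.1.1.3 prefer"
# Hypothetical device output: the third server differs from the standard.
actual_config = "ntp server 10.1.1.1\nntp server 10.1.1.2\nntp server 10.9.9.9"

result = bulk_match(expected_config, actual_config)
print(result)
```

This gives better feedback, but it carries the same limitations: there is still one expected string for an entire population of devices.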

Regex Pattern Matching

Sometimes, you are not as concerned about the actual data, such as whether or not the NTP servers are specific IPs, but instead are concerned that there are NTP servers configured. In such situations you can set up regex to match against your configurations.

Let’s take a look at an example, as described in this Micro Focus documentation.

Block Start: interface (.*)
Block End: !

Condition A: Config Block
must not contain
ip address (10\..*)\s(.*)

Condition B: Config Text
must contain only:
Must contain these lines:
ntp server 169\.243\.103\.34
ntp server 170\.242\.62\.16
ntp server 170\.242\.62\.17
ntp server 169\.243\.226\.94
But must not have any additional lines containing:
ntp server(.*)

Logic: A AND B

As you can see, you can use regex to perform “greedy” matches, such as ip address (10\..*)\s(.*), as well as specific matches, such as ntp server 169\.243\.103\.34.

There are also some scaling concerns with this approach: tracking which devices a template applies to and which templates are applied to a device, complex regex matching that quickly gets out of control, and the lack of a path to “fix” the configurations.
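
To make Condition B more concrete, here is a rough Python approximation of “must contain these lines, but no additional lines matching a pattern”. It is only a sketch of the idea using the standard re module, not how the referenced product evaluates its rules; the names REQUIRED, NO_ADDITIONAL, and check_ntp, as well as the sample configuration, are assumptions made for illustration.

```python
import re

# Each required pattern must fully match at least one configuration line.
REQUIRED = [
    r"ntp server 169\.243\.103\.34",
    r"ntp server 170\.242\.62\.16",
    r"ntp server 170\.242\.62\.17",
    r"ntp server 169\.243\.226\.94",
]
# Any line matching this pattern that is not a required line is a violation.
NO_ADDITIONAL = r"ntp server(.*)"


def check_ntp(config_text):
    """Return (compliant, missing_patterns, additional_lines)."""
    lines = [line.strip() for line in config_text.splitlines()]
    missing = [
        pattern for pattern in REQUIRED
        if not any(re.fullmatch(pattern, line) for line in lines)
    ]
    additional = [
        line for line in lines
        if re.fullmatch(NO_ADDITIONAL, line)
        and not any(re.fullmatch(pattern, line) for pattern in REQUIRED)
    ]
    return (not missing and not additional, missing, additional)


# Hypothetical device configuration: one required server is replaced.
config = """ntp server 169.243.103.34
ntp server 170.242.62.16
ntp server 170.242.62.17
ntp server 10.0.0.1
"""

compliant, missing, additional = check_ntp(config)
print(compliant, missing, additional)
```

Even in this small form, the rule says nothing about which devices it applies to or how to correct a failure, which is where the scaling concerns above come from.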

Profiling Configurations

As you will note, the previous options are difficult to scale beyond the global configurations. Developing compliance checks for configurations such as interfaces can be rather difficult, if not impossible, with them.

By profiling the configurations, we can build strategies to pull out the relevant data and confirm that a configuration rebuilt from that data matches the actual configuration. Well, that was a confusing mouthful, so let’s break this down a bit.

  • Grab a piece of configuration, such as all configuration under “interface GigabitEthernet0/1”.
    • We will call this actual_configuration.
  • Grab the detail from that configuration that you would use to profile it, such as the description and VLAN.
  • Use that data with predefined templates and process through a templating engine.
    • We call this expected_configuration.
  • Compare actual_configuration and expected_configuration and see whether they are the same.
    • If there are configs in actual_configuration and not in expected_configuration, there are unexpected configurations.
    • If there are configs in expected_configuration and not in actual_configuration, there are missing configurations.

Well, this is still a bit much, how about a diagram?

All make sense? If not, one more effort with actual code:

Basic Functions

>>>
>>> import jinja2
>>> import difflib
>>>
>>> def parse_cfg(actual_configuration):
...     parsed_data = []
...     interface = ""
...     description = ""
...     vlan = 0
...     for line in actual_configuration.splitlines():
...         if line.startswith("interface "):
...             if interface:
...                 parsed_data.append({"interface": interface, "description": description, "vlan": vlan})
...             interface = line.split()[1]
...             description = ""
...             vlan = 0
...         if line.startswith(" description "):
...             description = line[13:]
...         if line.startswith(" switchport access vlan "):
...             vlan = line[24:]
...     parsed_data.append({"interface": interface, "description": description, "vlan": vlan})
...     return parsed_data
...
>>> def regen_cfg(vars):
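...     # Template reconstructed to match the rendered configuration used in the examples
...     template_str = """
... {%- for intf in interface_vars %}
... interface {{ intf.interface }}
...  description {{ intf.description }}
...  switchport mode access
...  switchport access vlan {{ intf.vlan }}
...  snmp trap mac-notification change added
...  snmp trap mac-notification change removed
...  auto qos trust dscp
...  no mdix auto
...  spanning-tree portfast
...  spanning-tree guard root
... {%- endfor %}"""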
...     environment = jinja2.Environment()
...     template = environment.from_string(template_str)
...     return template.render(**vars)
...
>>> def compare_cfg(actual_configuration, expected_configuration):
...     if actual_configuration == expected_configuration:
...         return True
...     else:
...         for text in difflib.unified_diff(actual_configuration.split("\n"), expected_configuration.split("\n")):
...             print(text)
...         return False
...
>>> 

Example of Compliant Configuration

>>> compliance_actual_configuration = """
... interface GigabitEthernet0/1
...  description USER PORT
...  switchport mode access
...  switchport access vlan 205
...  snmp trap mac-notification change added
...  snmp trap mac-notification change removed
...  auto qos trust dscp
...  no mdix auto
...  spanning-tree portfast
...  spanning-tree guard root"""
>>>
>>> compliant_vars = {}
>>> compliant_vars['interface_vars'] = parse_cfg(compliance_actual_configuration)
>>> compliant_vars['interface_vars']
[{'interface': 'GigabitEthernet0/1', 'description': 'USER PORT', 'vlan': '205'}]
>>> compliant_expected_configuration = regen_cfg(compliant_vars)
>>>
>>> compare_cfg(compliance_actual_configuration, compliant_expected_configuration)
True
>>> 

Example of Non-Compliant Configurations

>>> non_compliant_actual_configuration = """
... interface GigabitEthernet0/1
...  description USER PORT
...  switchport access vlan 205
...  spanning-tree portfast
...  spanning-tree guard root
... """
>>>
>>> non_compliant_vars = {}
>>> non_compliant_vars['interface_vars'] = parse_cfg(non_compliant_actual_configuration)
>>> non_compliant_expected_configuration = regen_cfg(non_compliant_vars)
>>>
>>> compare_cfg(non_compliant_actual_configuration, non_compliant_expected_configuration)
---

+++

@@ -1,7 +1,11 @@


 interface GigabitEthernet0/1
  description USER PORT
+ switchport mode access
  switchport access vlan 205
+ snmp trap mac-notification change added
+ snmp trap mac-notification change removed
+ auto qos trust dscp
+ no mdix auto
  spanning-tree portfast
  spanning-tree guard root
-
False
>>>

With this approach you solve many of the challenges of comparing different types of configurations. That being said, numerous challenges remain.

  • Each configuration stanza requires some custom code
  • Exception management is difficult
    • For example, if you want to add broadcast suppression on twenty interfaces within your entire org, how do you handle that?
  • In some cases you do not care about the current configuration and simply want it to match the expected configuration, so you must support this solution and another solution as well.

Intended State vs Actual State

Having built many such solutions, I have found that comparing the actual state against the intended state is the most logical approach. In such a design, the lion’s share of the work is generating configurations in a “traditional” Infrastructure as Code (IaC) approach. With IaC in networking, you generally generate your configurations by combining data with Jinja templates.

Let’s break down this process a bit.

  • Obtain the actual configuration from the backup
    • Parse out the relevant configuration, often by breaking up into features (think stanza levels of configurations)
    • We will call this actual_configuration.
  • Generate the intended configuration
    • We will call this intended_configuration.
  • Compare the two configuration parts

To help bring this to life, here is a diagram of how this works:

In pursuing this approach, you get the direct benefits for configuration compliance of:

  • Having a single solution for any CLI-based configurations, regardless of vendor
  • Limiting the amount of code for any given configuration (to nearly zero)
  • Providing a platform for exception management
  • Providing a path to fix the configurations

Note: Though outside the scope of this blog, the ability to remedy configurations is generally predicated on having both an actual and an intended state.

Additionally, there are the collateral benefits of:

  • Providing a reason to develop an IaC solution
  • Providing a reason to build out configurations
  • Providing a reason to populate a Source of Truth

To finally drive home how this works, let’s review some code.

Basic Setup

>>> import jinja2
>>> from netutils.config import compliance
>>>
>>> def regen_cfg(vars):
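...     # Template reconstructed to match the intended configuration shown in the output
...     template_str = """
... {%- for intf in interface_vars %}
... interface {{ intf.interface }}
...  description {{ intf.description }}
...  switchport mode access
...  switchport access vlan {{ intf.vlan }}
...  snmp trap mac-notification change added
...  snmp trap mac-notification change removed
...  auto qos trust dscp
...  no mdix auto
...  spanning-tree portfast
...  spanning-tree guard root
... {%- endfor %}"""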
...     environment = jinja2.Environment()
...     template = environment.from_string(template_str)
...     return template.render(**vars)
...
>>> # This is our pseudo SoT
>>> vars = {}
>>> vars['interface_vars'] = [{'interface': 'GigabitEthernet0/1', 'description': 'USER PORT', 'vlan': '205'}]
>>>
>>> network_os = "cisco_ios"
>>> features = [
...     {"name": "interface", "ordered": True, "section": ["interface "]},
... ]
>>>

Example of Compliant Configuration

>>> compliant_backup_cfg = """
... interface GigabitEthernet0/1
...  description USER PORT
...  switchport mode access
...  switchport access vlan 205
...  snmp trap mac-notification change added
...  snmp trap mac-notification change removed
...  auto qos trust dscp
...  no mdix auto
...  spanning-tree portfast
...  spanning-tree guard root"""
>>>
>>> compliant_intended_cfg = regen_cfg(vars)
>>>
>>> compliance.compliance(features, compliant_backup_cfg, compliant_intended_cfg, network_os, "string")
{'interface': {'compliant': True, 'missing': '', 'extra': '', 'cannot_parse': True, 'unordered_compliant': True, 'ordered_compliant': True, 'actual': 'interface GigabitEthernet0/1\n description USER PORT\n switchport mode access\n switchport access vlan 205\n snmp trap mac-notification change added\n snmp trap mac-notification change removed\n auto qos trust dscp\n no mdix auto\n spanning-tree portfast\n spanning-tree guard root', 'intended': 'interface GigabitEthernet0/1\n description USER PORT\n switchport mode access\n switchport access vlan 205\n snmp trap mac-notification change added\n snmp trap mac-notification change removed\n auto qos trust dscp\n no mdix auto\n spanning-tree portfast\n spanning-tree guard root'}}
>>>
# simplified yaml view of same data
# interface:
#   compliant: true
#   missing: ''
#   extra: ''
#   cannot_parse: true
#   unordered_compliant: true
#   ordered_compliant: true
#   actual: <omitted>
#   intended: <omitted>

Example of Non-Compliant Configuration

>>> non_compliant_backup_cfg = """
... interface GigabitEthernet0/1
...  description USER PORT
...  switchport access vlan 205
...  spanning-tree portfast
...  spanning-tree guard root
... """
>>>
>>> non_compliant_intended_cfg = regen_cfg(vars)
>>>
>>> compliance.compliance(features, non_compliant_backup_cfg, non_compliant_intended_cfg, network_os, "string")
{'interface': {'compliant': False, 'missing': 'interface GigabitEthernet0/1\n switchport mode access\n snmp trap mac-notification change added\n snmp trap mac-notification change removed\n auto qos trust dscp\n no mdix auto', 'extra': '', 'cannot_parse': True, 'unordered_compliant': False, 'ordered_compliant': False, 'actual': 'interface GigabitEthernet0/1\n description USER PORT\n switchport access vlan 205\n spanning-tree portfast\n spanning-tree guard root', 'intended': 'interface GigabitEthernet0/1\n description USER PORT\n switchport mode access\n switchport access vlan 205\n snmp trap mac-notification change added\n snmp trap mac-notification change removed\n auto qos trust dscp\n no mdix auto\n spanning-tree portfast\n spanning-tree guard root'}}
>>>
# simplified yaml view of same data
# interface:
#   compliant: false
#   missing: <omitted>
#   extra: ''
#   cannot_parse: true
#   unordered_compliant: false
#   ordered_compliant: false
#   actual: <omitted>
#   intended: <omitted>

It may not be immediately obvious, but the key is in the feature definition features = [{"name": "interface", "ordered": True, "section": ["interface "]}]. This is the only thing that needs to change when adding additional features. This truly becomes powerful once you have an SoT and have built out your IaC processes.

This process is the underlying principle on which the Nautobot Golden Config app is built.

The app provides tooling and ease of use around the processes, which makes it more consumable, but the crux of what is happening is described in these last few paragraphs and code snippets.
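
To show why the feature definition carries so much weight, here is a simplified, standard-library-only stand-in for section matching. It is not how netutils or the Nautobot Golden Config app is implemented; extract_section and simple_compliance are hypothetical helpers, the "ordered" key is ignored, and lines are compared without regard to which stanza they belong to. The point is only that adding a feature means adding one entry to the list rather than writing new parsing code.

```python
def extract_section(config, prefixes):
    """Return top-level lines starting with any prefix, plus their indented children."""
    section = []
    keep = False
    for line in config.splitlines():
        if not line.strip():
            continue
        if not line.startswith(" "):
            # A new top-level line decides whether this stanza is kept.
            keep = any(line.startswith(prefix) for prefix in prefixes)
        if keep:
            section.append(line)
    return section


def simple_compliance(features, actual_cfg, intended_cfg):
    """Compare each feature's section; ordering is ignored in this sketch."""
    results = {}
    for feature in features:
        actual = extract_section(actual_cfg, feature["section"])
        intended = extract_section(intended_cfg, feature["section"])
        missing = [line for line in intended if line not in actual]
        extra = [line for line in actual if line not in intended]
        results[feature["name"]] = {
            "compliant": not missing and not extra,
            "missing": missing,
            "extra": extra,
        }
    return results


features = [
    {"name": "interface", "ordered": True, "section": ["interface "]},
    # Adding a feature is one more entry, with no new parsing code.
    {"name": "ntp", "ordered": False, "section": ["ntp "]},
]

# Hypothetical intended and actual configurations.
intended_cfg = """
ntp server 10.1.1.1
ntp server 10.1.1.2
interface GigabitEthernet0/1
 description USER PORT
 switchport mode access
"""
actual_cfg = """
ntp server 10.1.1.1
interface GigabitEthernet0/1
 description USER PORT
 switchport mode access
"""

results = simple_compliance(features, actual_cfg, intended_cfg)
print(results["interface"]["compliant"], results["ntp"])
```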

Custom Business Logic

While it is not as strictly defined, it is important to cover custom business logic. There are times when you may only care that certain features are applied, without checking anything beyond that. Let’s take an example used in the Nautobot Golden Config custom compliance engine.

# sample_config = '''router bgp 400
#  no synchronization
#  bgp log-neighbor-changes
#  neighbor 70.70.70.70 remote-as 400
#  neighbor 70.70.70.70 password cisco
#  neighbor 70.70.70.70 update-source Loopback80
#  no auto-summary
# '''
import re

# Raw strings keep the regex escapes (\s, \d, \.) from being treated as string escapes
BGP_PATTERN = re.compile(r"\s*neighbor (?P<ip>\d+\.\d+\.\d+\.\d+) .*")
BGP_SECRET = re.compile(r"\s*neighbor (?P<ip>\d+\.\d+\.\d+\.\d+) password (\S+).*")


def custom_compliance_func(obj):
    # Sets are defined up front so they exist even when the rule does not apply,
    # and so duplicates and ordering do not affect the comparison below
    neighbors = set()
    secrets = set()
    if obj.rule == 'bgp' and obj.device.platform.slug == 'ios':
        for line in obj.actual.splitlines():
            match = BGP_PATTERN.search(line)
            if match:
                neighbors.add(match.group("ip"))
            secret_match = BGP_SECRET.search(line)
            if secret_match:
                secrets.add(secret_match.group("ip"))
    if secrets != neighbors:
        # At least one neighbor has no password configured
        compliance_int = 0
        compliance = False
        ordered = False
        missing = f"neighbors found: {sorted(neighbors)}\nneighbors with secrets found: {sorted(secrets)}"
        extra = ""
    else:
        compliance_int = 1
        compliance = True
        ordered = True
        missing = ""
        extra = ""
    return {
        "compliance": compliance,
        "compliance_int": compliance_int,
        "ordered": ordered,
        "missing": missing,
        "extra": extra,
    }

In the above case, you are simply enforcing that every BGP neighbor found, such as 70.70.70.70 in the sample, has a password configured. The obvious downside is that every situation must be handled with custom code, and you are not reviewing all, or even a majority, of the configuration.

Linting

In some cases you are truly only looking for certain conditions. This can be nice, since it applies more generically and is useful for ensuring that security configurations are applied correctly. STIG checks and tools such as Cisco Config Analysis Tool are largely based on the same concept, which is to confirm that a piece of configuration is present or specifically absent.

We can take a look at netlint, which was built by fellow Network to Coder Leo Kirchner.

>>> from netlint.checks.checker import Checker
>>> from netlint.checks.utils import NOS
>>>
>>> configuration = [
...   "feature ssh",
...   "feature bgp",
...   "hostname test.local"
... ]
>>>
>>> checker = Checker()
>>>
>>> checker.run_checks(configuration, NOS.CISCO_NXOS)
False
>>> checker.check_results
{'NXOS101': None, 'NXOS102': CheckResult(text='BGP enabled but never used.', lines=['feature bgp']), 'NXOS103': None, 'NXOS104': None, 'VAR101': None, 'VAR102': None, 'VAR103': None}
>>>
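
To illustrate the concept behind a lint rule, here is a minimal sketch of a single check written with only the standard library. It is not netlint’s implementation; the function names, the CHECKS registry, and the result format are assumptions chosen to mirror the output shown above.

```python
def check_bgp_enabled_but_unused(config_lines):
    """Flag 'feature bgp' when no 'router bgp' stanza is configured."""
    enabled = [line for line in config_lines if line.strip() == "feature bgp"]
    configured = any(line.startswith("router bgp ") for line in config_lines)
    if enabled and not configured:
        return {"text": "BGP enabled but never used.", "lines": enabled}
    return None


# Hypothetical registry mapping a check name to its function.
CHECKS = {"bgp_unused": check_bgp_enabled_but_unused}


def run_checks(config_lines):
    """Run every registered check; None means the check found nothing."""
    return {name: check(config_lines) for name, check in CHECKS.items()}


configuration = ["feature ssh", "feature bgp", "hostname test.local"]
results = run_checks(configuration)
print(results)
```

Each check looks for one condition being present or absent, which is why this style generalizes well across devices but never reviews the configuration as a whole.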

Conclusion

Throughout my career, I have personally deployed and built each of these types of systems. However, I truly believe that the only long-term method to scale is provided in the “Intended State vs Actual State” section and used within Nautobot Golden Config. Any other method is likely good to get quick results, but tends to not work in situations more complicated than the initial POC. The collateral benefits are also equally compelling in themselves.

That being said, I would love to hear your feedback. Are there any other types of golden config that I have missed? If so, I look forward to seeing you in the comments.

-Ken


