Network Automation Architecture – Automation Engine

In the previous blogs of this series about Network Automation Architecture, 1, 2, and 3, we presented the key architectural components and their respective functions. This blog expands on the role of the Automation Engine. The automation engine is the component that contains all the tasks that interact with the network to change its state via configuration management processes.

Introduction

The automation engine is flexible and can be built on top of many different languages, such as Python, Golang, and Rust. Each language has its own frameworks and tools: Ansible, Salt, Nornir, NAPALM, and Netmiko in the Python ecosystem; Scrapligo for Golang; Terraform for cloud; and validation-focused tools like Batfish. The automation engine handles tasks such as configuration backup, rendering, compliance, and provisioning, as well as Zero Touch Provisioning (ZTP), and it supports NetDevOps principles such as Continuous Integration/Continuous Delivery (CI/CD).

The automation engine is the component that manages the network state and performs network tasks. Because this component actively makes changes to the network state, connectivity between the automation engine and in-scope devices must be permitted by security policy.

Automation Engine

There are some challenges that the automation engine has to solve; interacting with network devices is complicated. Command line interfaces (CLIs) have been the main interface for network engineers to modify and manage network equipment. More recent trends involve an API to interact with specific devices, e.g., Arista eAPI. Alternatively, some vendors are moving toward element managers that offer APIs and handle device connections within their own frameworks. Finally, YANG via NETCONF/RESTCONF/gNMI was developed to enable vendor-independent automation, but it is still working toward mass adoption.

CLIs were not built for automation, but over the years many open source projects have been built to help solve this problem. Some of these were mentioned in the introduction; for the sake of clarity, Nornir, Scrapli(go), NAPALM, and Netmiko are all examples of frameworks that interact with CLIs and automate these tasks.

These projects generally require a few pieces of metadata:

  • Device platform – which is used to map the platform (or OS) to the network driver for the given framework in use.
  • Device credentials – how the automation engine authenticates to the network device.
  • Management IP address – IP address/FQDN that the automation engine can use to reach a network device.

Note: These are the bare minimum attributes, and they should be stored within the Source of Truth (SoT) component. The automation engine should have a method to query for the information.
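As a rough illustration of how these three attributes fit together, the sketch below models a device record as it might come back from the SoT and turns it into connection parameters. The `Device` record, `DRIVER_MAP`, and the environment variable naming are assumptions made for this example; the driver names mirror Netmiko-style device types and would need to match whichever framework is actually in use.

```python
import os
from dataclasses import dataclass

# Hypothetical mapping from SoT platform names to driver names.
# Values mirror Netmiko-style device types; adjust for your framework.
DRIVER_MAP = {
    "ios": "cisco_ios",
    "eos": "arista_eos",
    "junos": "juniper_junos",
}


@dataclass(frozen=True)
class Device:
    """Bare minimum attributes the automation engine needs from the SoT."""
    name: str
    platform: str        # used to select the network driver
    mgmt_address: str    # management IP address or FQDN
    credential_ref: str  # name of a credential set, never the secret itself


def connection_params(device, env=os.environ):
    """Build connection parameters for a device.

    Credentials are looked up by reference (here from environment
    variables, an assumption for this sketch) so that secrets are
    never stored alongside the inventory data.
    """
    if device.platform not in DRIVER_MAP:
        raise ValueError(f"No driver mapped for platform {device.platform!r}")
    prefix = device.credential_ref.upper()
    return {
        "device_type": DRIVER_MAP[device.platform],
        "host": device.mgmt_address,
        "username": env[f"{prefix}_USERNAME"],
        "password": env[f"{prefix}_PASSWORD"],
    }
```

Storing only a credential reference in the SoT record keeps the secret values in whatever secrets system the organization already uses.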

While APIs have aided the adoption of automation and made interaction with these devices simpler, each vendor's API is implemented differently. The automation engine must provide a flexible interface that is capable of manipulating parameters and reading multiple returned data formats (e.g., XML, JSON).
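One way to picture "reading multiple returned data formats" is a small normalization layer that turns either JSON or XML into the same Python structure before the rest of the engine sees it. This is a minimal sketch using only the standard library; the function names and the sample payloads are made up for illustration, and XML attributes and namespaces are deliberately ignored.

```python
import json
import xml.etree.ElementTree as ET


def _element_to_value(element):
    """Convert an XML element into a string (leaf) or a dict (branch)."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = _element_to_value(child)
        if child.tag in result:
            # Repeated tags become a list of values.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def parse_response(body, content_type):
    """Normalize a device API response body into a plain dict."""
    if "json" in content_type:
        return json.loads(body)
    if "xml" in content_type:
        root = ET.fromstring(body)
        return {root.tag: _element_to_value(root)}
    raise ValueError(f"Unsupported content type: {content_type}")
```

With this in place, downstream tasks can work against one data shape regardless of which vendor API produced it.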

Main Challenges

Some of the key challenges to consider for configuration management, and for the automation engine in general, are listed below. This is not an exhaustive list.

  • Configuration Management:
    • Configuration Rendering: A few topics to consider: full configuration rendering, partial configuration rendering, and secrets interpolation.
      • Secrets Management: How to pull secrets from an external secrets management system, Ansible Vault, other?
    • Configuration Remediation: It’s one thing to do a diff and understand what is extra and what is missing. (As an example, this is solved in Nautobot Golden Config App.) It’s a completely different challenge to remediate those configurations.
    • Configuration Deployment: The process of deploying a rendered configuration onto an element.
  • Configuration Provisioning: Creating objects, such as creating an EC2 instance, Network Functions Virtualization (NFV) Appliance, or network service (such as an AWS IGW).
  • System Load Distribution:
    • What security posture do we need to adhere to?
    • Only certain subnets can speak to management networks?
    • Only certain communication protocols are allowed?
  • Operational Actions:
    • Rebooting a device.
    • Resetting an IPsec tunnel.
    • Bouncing an interface.
    • Bouncing a BGP neighbor.
  • Operational Compliance and Checks: What operational data should be collected, how should the data be transformed?
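To make the configuration rendering and secrets management items above a little more concrete, here is a minimal sketch of rendering a snippet with a secret injected at render time and redacted before logging. It uses the standard library's `string.Template` to stay self-contained (Jinja2 is the more common choice in practice); the template text, the variable names, and the `redact` helper are assumptions for illustration only.

```python
from string import Template

# Illustrative template; the SNMP line uses IOS-style syntax.
TEMPLATE = Template(
    "hostname $hostname\n"
    "snmp-server community $snmp_community RO\n"
)


def render(template, data, secrets):
    """Render a configuration from SoT data plus secret values.

    Secrets are passed in separately so they can be fetched from an
    external secrets system just before rendering.
    """
    return template.substitute({**data, **secrets})


def redact(text, secrets):
    """Mask secret values so rendered output is safe to log or diff."""
    for value in secrets.values():
        text = text.replace(value, "<redacted>")
    return text
```

Keeping the secret values in a separate mapping makes it straightforward to redact them wherever the rendered configuration is displayed.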

For some of the more advanced topics mentioned above, the next section provides additional details and considerations.

Challenges Clarified

Let’s dive deep into the nuances of some of these topics.

  • Full vs Partial configuration deployments: This challenge may seem simple but it’s actually quite complex. Before you can push a configuration you must be able to render the configuration; before you can render it you must have the source of truth data. This is truly a crawl, walk, run situation. What are some things you need to consider?
    • Merge vs. Replace
      • Replace at what level? A full configuration replace is generally easier than a partial configuration merge. Junos allows stanza-level replacements, but most OSs do not.
    • How to push a subset of the configuration. Identify configuration snippets that are least impactful, but provide a great Return on Investment (ROI).
    • How to validate a configuration deployment via a CI/CD pipeline (Fail Fast).
      • This is also an iterative approach. Start simple and grow the complexity.
      • Check out Batfish.
  • Secrets interpolation: There are configuration lines in most vendors that require credential/secret values to be populated. The rendering of configuration by the automation engine must be flexible and secure enough to do this without exposing the secrets to unintended audiences.
  • Remediating a configuration: Remediation of a configuration based on a diff of actual and intended state comes with business requirements that depend on the business’s confidence level, e.g., remediating the configuration completely (including removing “extras”) vs. just adding the “missing” configuration elements.
    • An engine like hier_config can provide a remediation plan.
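The difference between adding only the "missing" lines and also removing the "extras" can be sketched with a simple line-based diff. This is a deliberately flat simplification: it only handles global-level lines, and negating a line by prefixing `no` is an IOS-style assumption that does not hold for every command or platform. Hierarchical configurations need a purpose-built engine such as hier_config; the function names here are hypothetical.

```python
def diff_config(intended, actual):
    """Return (missing, extra) lines between intended and actual config."""
    intended_lines = [l.strip() for l in intended.splitlines() if l.strip()]
    actual_lines = [l.strip() for l in actual.splitlines() if l.strip()]
    intended_set, actual_set = set(intended_lines), set(actual_lines)
    missing = [l for l in intended_lines if l not in actual_set]
    extra = [l for l in actual_lines if l not in intended_set]
    return missing, extra


def remediation_plan(intended, actual, remove_extras=False):
    """Build a list of commands to move actual toward intended state.

    remove_extras=False only adds missing lines (lower risk);
    remove_extras=True also negates lines that should not be there.
    """
    missing, extra = diff_config(intended, actual)
    plan = list(missing)
    if remove_extras:
        # Assumption: "no <line>" negates the command (IOS-style).
        plan += [f"no {line}" for line in extra]
    return plan
```

Exposing `remove_extras` as an explicit switch lets the business start with additive remediation and enable full remediation once confidence grows.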

As you can see from the challenges above, there are many questions you must answer. Once these questions are answered, it becomes much easier to try to choose an automation engine that fits your organization’s goals.

Choosing an Automation Engine

One of the biggest challenges with the Automation Engine component of this architecture is picking the right tool(s) for the job. There is no shortage of open source tools that fit this component of the architecture; furthermore, there is an ever-expanding catalog of closed source / vendor specific tools that aim to accomplish the tasks.

This is an interesting topic. Throughout the years, NTC has engaged with many customers. Even customers at the most basic entry into their network automation journey are already using the automation engine element. A simple one-off script that goes and collects data off of a device fits this component. Since this component in most cases is one of the first to be selected, it’s not always easy convincing a client that other options exist.

For these and many other reasons, we’ve found that most of the automation engine options available can achieve great results if you have the rest of the automation architecture in place. Selecting the right engine for your business comes down to skill set, previous adoption, willingness to learn, and in some cases having product support, which many large enterprises rely on today.

Regardless of the application/framework in use, the automation engine communicates with network devices. And as mentioned in Network Automation Architecture – The Components, it’s important to understand the automation engine not as an isolated component, but as the final executor of the outcome of the other components.

Furthermore, there will be situations where a single automation engine does not meet the business requirements. In these circumstances multiple automation engines can be used, but every effort should be made to keep the number of different automation engines to a minimum; otherwise, the learning curve and skill set needed to operate and maintain this component become too complex, which slows adoption.

Some of the characteristics to consider are mentioned below:

  • Does the tool have an API?
    • Most automation engines have an API, but is it robust? Is it RESTful, and does it support all the CRUD operations? Are there other types of APIs, like GraphQL?
  • Does the tool integrate with the SoT?
  • Does the tool have a User Interface (UI)?
  • Is the tool flexible enough to meet the customer’s RBAC requirements?
  • Credential Management
  • The ability to create rich and complex Forms
  • Job Isolation
  • Network Device Support
  • Secrets Integration
  • Scheduler
  • Traceability / Logging

Advanced Concepts

One of the biggest challenges related to the automation engine is the connectivity conundrum that exists in enterprises. The security of networks continues to grow in complexity; the management control plane of network devices is no different. In many cases centralized applications aren’t allowed to connect to network devices. Whether that is due to DMZ design, geolocation issues, or mergers and acquisitions, the automation engine must be flexible enough to run inside those pods.

Here are some of the existing solutions to this problem.

Automation Engine    Solution
Ansible              Execution Environments
Python               Celery, Redis (RQ), Taskmaster
SaltStack            Master/Minion
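Whichever of these solutions is used, the underlying idea is to hand each job to a worker that already sits inside the restricted zone. The sketch below shows one hypothetical way to pick a worker queue from a device's management address; the zone names, subnets, and queue names are all invented for illustration. With a task queue such as Celery, the returned name could then be passed as the `queue` option when dispatching the task.

```python
import ipaddress

# Hypothetical zones: each queue is served by a worker running inside
# that zone, so only the worker needs access to the management subnets.
ZONE_QUEUES = {
    "dmz": [ipaddress.ip_network("192.0.2.0/24")],
    "acquired-co": [ipaddress.ip_network("198.51.100.0/24")],
}
DEFAULT_QUEUE = "central"


def queue_for(mgmt_ip):
    """Return the name of the worker queue that can reach this address."""
    address = ipaddress.ip_address(mgmt_ip)
    for queue, networks in ZONE_QUEUES.items():
        if any(address in network for network in networks):
            return queue
    return DEFAULT_QUEUE
```

Because the routing decision depends only on SoT data (the management address), the central application never needs direct connectivity to the devices themselves.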

Closing

To close out this blog, I want to show what a release process with validation steps might look like in a high-level diagram. This diagram comes directly from one of the webinars Ken Celenza and I did, Community Webinar: Using Batfish for Network & Routing Verification.


Conclusion

Keep an eye out for the remaining parts of this series!

Cheers, -Jeff


