Automate Your Circuit Maintenances!

One true fact about every network is that it is made up of a myriad of circuits and links that interconnect the network devices. Some of them are private links (for example, a cross-connect between two devices in your data center), but, because an isolated network is not very useful, you very likely have third-party circuits, too (for instance, connecting your border routers to the internet).

It’s easy to guess that these third-party circuits have their own life cycle, so after being provisioned they will require maintenances to handle planned operations, such as fixing or improving the service. In these situations, the Network Service Provider (NSP) should notify you in advance about the coming disruption, so you can implement whatever mitigation actions you consider necessary.

Even today, the usual way to contact the customer is to send an email with whatever details the NSP considers relevant. On the customer side, a network operator usually checks the email inbox periodically, works out what the notification is about, and defines a plan to mitigate it.

How This Is Impacting You

Ideally, the NSP should take all the preventive actions needed to avoid affecting your service during the maintenance. For example, if you have a circuit with a BGP service, the NSP should first bring the BGP session down so that traffic moves to other available routes, and only afterwards shut down the circuit, without dropping traffic. But what if this is not done properly? Obviously, this could impact your business.

Moreover, even if we assume that the NSP will mitigate the impact on their side, we would like to minimize the operational burden on our side, for example by muting the alerting related to the state of the circuit, since it’s a planned operation, and so avoiding a page in the middle of the night.

Obviously, to take all the previously mentioned actions, someone from the network team has to read and understand what is going to happen, and determine when and which circuits are affected. Then, an action plan must be defined for “pre” and “post” maintenance. But you can’t take a notification as the last word: updates normally follow the first notification, changing the impact or the dates, and therefore your action plan.

This is not for us, we don’t have too many maintenances

Are you sure? My hands-on experience with Internet Transit circuits, as well as others, gives me a rough estimate of 3 maintenances per circuit per year (this depends on multiple factors).

Assuming one engineering hour of work to handle one maintenance (including reading the notification, defining the action plan, implementing the actions before the start time, and doing the same after the end time), you can get an approximate estimate of how much time is spent on maintenances and of the impact on your operations.

As an example, one of our customers with a lot of circuits reported up to 15 maintenances per day, so on average, two engineers would be focused on handling circuit maintenances.
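
To make the estimate concrete, here is a small back-of-the-envelope calculation. It uses the figures from the text (3 maintenances per circuit per year, one engineering hour per maintenance, 15 maintenances per day); the 200-circuit network and the 8-hour working day are assumptions added only for illustration.

```python
def yearly_maintenance_hours(
    circuits: int,
    maintenances_per_circuit_per_year: float = 3.0,  # rough estimate from the text
    hours_per_maintenance: float = 1.0,              # one engineering hour, as in the text
) -> float:
    """Engineering hours spent per year on circuit maintenances."""
    return circuits * maintenances_per_circuit_per_year * hours_per_maintenance


def engineers_needed(
    maintenances_per_day: float,
    hours_per_maintenance: float = 1.0,
    working_hours_per_day: float = 8.0,  # assumption: an 8-hour working day
) -> float:
    """Full-time engineers required to keep up with the daily volume."""
    return maintenances_per_day * hours_per_maintenance / working_hours_per_day


# Assumption for illustration: a network with 200 third-party circuits.
print(yearly_maintenance_hours(200))  # 600.0 hours per year
print(engineers_needed(15))           # 1.875, i.e. roughly two engineers
```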

Changing the Paradigm

But, don’t worry, network automation comes to our rescue!

Let’s ask ourselves: “what if” we could automatically fetch and understand those notifications and convert them to data that we could leverage to automate what we had been doing manually?

[Figure: summary]

In this picture, instead of relying only on the network operator to analyze a maintenance notification and then manually perform the required operations, we propose enhancing the Source of Truth (SoT), where we store the intended state of the network, so that it automatically fetches the notifications, parses them to understand their data, and populates that data into the existing SoT data models. Eventually, when the network automation process attempts to match the intended state to the current state, the necessary changes will be rolled out.

Notice that we are moving from statically defining the network intent to a dynamic approach, where we delegate part of the intent to external sources. As a result, new challenges arise, such as curating this data to limit its scope.

Previous Work

Obviously, we are not the first ones trying to solve this problem. You have probably already created some homegrown scripts that help you with this.

In 2016 a group of people got together and created a NANOG BCOP (Best Current Operational Practice): A machine-parsable standard for formatting maintenance notifications. They were focused on solving a big problem: how to use a common format that would make maintenance notifications parsable by a machine. The proposed format uses the iCalendar format (RFC 5545) and defines the parameters that describe a maintenance and its properties. The BCOP was proposed as an IETF draft and, even though it did not become an actual standard, it has gotten some traction, and a number of providers have currently adopted it.
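
To give a feel for why this format is machine-friendly, here is a minimal sketch that reads a notification using only the Python standard library. The property names are taken from the sample notification shown later in this post. The helper names (`parse_line`, `parse_maintenance`) are made up for this illustration, and the sketch deliberately ignores iCalendar details such as folded lines and quoted parameter values, so it is not a replacement for a real parser.

```python
from datetime import datetime, timezone

SAMPLE = """\
BEGIN:VCALENDAR
BEGIN:VEVENT
DTSTART;VALUE=DATE-TIME:20151010T080000Z
DTEND;VALUE=DATE-TIME:20151010T100000Z
X-MAINTNOTE-PROVIDER:example.com
X-MAINTNOTE-MAINTENANCE-ID:WorkOrder-31415
X-MAINTNOTE-OBJECT-ID;X-MAINTNOTE-OBJECT-IMPACT=OUTAGE:acme-widgets-as-a-service-2
X-MAINTNOTE-STATUS:TENTATIVE
END:VEVENT
END:VCALENDAR
"""


def parse_line(line):
    """Split 'NAME;PARAM=VAL:value' into (name, params, value)."""
    head, _, value = line.partition(":")
    name, *raw_params = head.split(";")
    params = dict(p.split("=", 1) for p in raw_params)
    return name, params, value


def parse_maintenance(text):
    """Collect the maintenance fields of a single-event notification."""
    result = {"circuits": []}
    for line in text.splitlines():
        name, params, value = parse_line(line)
        if name in ("DTSTART", "DTEND"):
            moment = datetime.strptime(value, "%Y%m%dT%H%M%SZ")
            epoch = int(moment.replace(tzinfo=timezone.utc).timestamp())
            result["start" if name == "DTSTART" else "end"] = epoch
        elif name == "X-MAINTNOTE-OBJECT-ID":
            impact = params.get("X-MAINTNOTE-OBJECT-IMPACT")
            result["circuits"].append({"circuit_id": value, "impact": impact})
        elif name.startswith("X-MAINTNOTE-"):
            # e.g. X-MAINTNOTE-MAINTENANCE-ID -> maintenance_id
            key = name[len("X-MAINTNOTE-"):].lower().replace("-", "_")
            result[key] = value
    return result


maintenance = parse_maintenance(SAMPLE)
print(maintenance["maintenance_id"], maintenance["status"])
print(maintenance["circuits"])
```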

But until all NSPs follow the recommended format, we have a problem, as some notifications are difficult to parse automatically. To tackle this issue, an interesting open source project, Janitor, was created. It is a Flask application with an embedded parser for circuit maintenance notification emails, some following the recommended format and some not.

Proposed Solution

At Network to Code (NTC), after reviewing all the previous work, we propose the following approach:

[Figure: proposed_architecture]

The key point is splitting the problem into two parts:

  • A Parser library, with the goal of taking whatever format the maintenance notification may have and converting it to an object that follows the proposed standard format. For an NSP that follows the standard, this is a direct mapping. For the others, only a custom parser that translates the custom format into the standard one is required.
  • An SoT Plugin, which takes care of fetching the notifications, parsing them with the parser library, and storing the returned objects in the SoT, which eventually offers them to external integrations via multiple APIs.

This divide-and-conquer approach provides a key benefit: each component will have a different road map. So if you are using another SoT, you can still benefit from the parser without having to embrace the proposed SoT. Obviously, if you want a turnkey solution, you can directly adopt the proposed SoT that is using the parser.
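
The split can be illustrated with a rough sketch: a custom parser translates a non-standard notification into the common structure, and the consumer (any SoT) only ever sees that structure. Everything here is hypothetical and made up for the example, including the free-text notification format, the provider key, and the function names; it is not the API of the parser library.

```python
import re
from datetime import datetime, timezone
from typing import Callable, Dict

# Hypothetical free-text notification from a provider that does not follow the standard.
CUSTOM_TEXT = """\
Maintenance ID: CHG-0001
Circuit: acme-widgets-as-a-service
Impact: OUTAGE
Start (UTC): 2015-10-10 08:00
End (UTC): 2015-10-10 10:00
"""


def to_epoch(text: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM' (UTC) to epoch seconds."""
    moment = datetime.strptime(text, "%Y-%m-%d %H:%M")
    return int(moment.replace(tzinfo=timezone.utc).timestamp())


def parse_custom_provider(raw: str) -> dict:
    """Custom parser: translate the made-up format into the common structure."""
    fields = dict(re.findall(r"^([^:\n]+):[ \t]*(.+)$", raw, flags=re.MULTILINE))
    return {
        "maintenance_id": fields["Maintenance ID"],
        "circuits": [{"circuit_id": fields["Circuit"], "impact": fields["Impact"]}],
        "start": to_epoch(fields["Start (UTC)"]),
        "end": to_epoch(fields["End (UTC)"]),
    }


# One parser per provider; the consumer never deals with provider-specific formats.
PARSERS: Dict[str, Callable[[str], dict]] = {"custom-provider": parse_custom_provider}


def describe(provider: str, raw: str) -> str:
    """Stand-in for any SoT: it works only with the common structure."""
    parsed = PARSERS[provider](raw)
    minutes = (parsed["end"] - parsed["start"]) // 60
    return f"{parsed['maintenance_id']}: {len(parsed['circuits'])} circuit(s), {minutes} min"


print(describe("custom-provider", CUSTOM_TEXT))  # CHG-0001: 1 circuit(s), 120 min
```

Supporting one more provider in this sketch means adding one more function to `PARSERS`, while the consuming side stays untouched.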

Circuit Maintenance Parser

The first component is the circuit-maintenance-parser Python library with a single goal: convert heterogeneous circuit maintenance notification formats to the well-defined standard format proposed above.

The library will take a raw text notification as input and will return a parsed object (dict) compliant with the BCOP notification format. The provider type can be specified, but if not, the standard iCal format will be used.

To install it from PyPI: pip install circuit-maintenance-parser

And to use it:

from circuit_maintenance_parser import init_parser

raw_text = """BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Maint Note//https://github.com/maint-notification//
BEGIN:VEVENT
SUMMARY:Maint Note Example
DTSTART;VALUE=DATE-TIME:20151010T080000Z
DTEND;VALUE=DATE-TIME:20151010T100000Z
DTSTAMP;VALUE=DATE-TIME:20151010T001000Z
UID:42
SEQUENCE:1
X-MAINTNOTE-PROVIDER:example.com
X-MAINTNOTE-ACCOUNT:137.035999173
X-MAINTNOTE-MAINTENANCE-ID:WorkOrder-31415
X-MAINTNOTE-IMPACT:OUTAGE
X-MAINTNOTE-OBJECT-ID;X-MAINTNOTE-OBJECT-IMPACT=NO-IMPACT:acme-widgets-as-a-service
X-MAINTNOTE-OBJECT-ID;X-MAINTNOTE-OBJECT-IMPACT=OUTAGE:acme-widgets-as-a-service-2
X-MAINTNOTE-STATUS:TENTATIVE
ORGANIZER;CN="Example NOC":mailto:noone@example.com
END:VEVENT
END:VCALENDAR
"""

data = {
  "subject": "this is a circuit maintenance from some NSP",
  "sender": "support@networkserviceprovider.com",
  "source": "gmail",
  "raw": raw_text,
}

parser = init_parser(**data)

parsed_notifications = parser.process()

print(parsed_notifications[0].to_json())
# Output:
{
  "account": "137.035999173",
  "circuits": [
    {
      "circuit_id": "acme-widgets-as-a-service",
      "impact": "NO-IMPACT"
    },
    {
      "circuit_id": "acme-widgets-as-a-service-2",
      "impact": "OUTAGE"
    }
  ],
  "end": 1444471200,
  "maintenance_id": "WorkOrder-31415",
  "organizer": "mailto:noone@example.com",
  "provider": "example.com",
  "sequence": 1,
  "stamp": 1444435800,
  "start": 1444464000,
  "status": "TENTATIVE",
  "summary": "Maint Note Example",
  "uid": "42"
}
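
The start, end, and stamp values in this output are Unix epoch timestamps in seconds, matching the DTSTART, DTEND, and DTSTAMP values of the notification. A short sketch, using only the standard library and the values copied from the output above, converts them back to readable UTC times and computes the length of the maintenance window:

```python
from datetime import datetime, timezone

# Values copied from the parsed output above.
parsed = {"start": 1444464000, "end": 1444471200, "stamp": 1444435800}

start = datetime.fromtimestamp(parsed["start"], tz=timezone.utc)
end = datetime.fromtimestamp(parsed["end"], tz=timezone.utc)

print(start.isoformat())  # 2015-10-10T08:00:00+00:00
print(end.isoformat())    # 2015-10-10T10:00:00+00:00
print(end - start)        # 2:00:00
```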

Currently, only a few providers are supported, but adding new parsers is easy, especially for providers that follow the standard format. So we expect that, along with community adoption, more parsers will come.

Circuit Maintenance Nautobot Plugin

NTC released Nautobot a few months ago as a fork of the NetBox SoT, extending it with several new features that laid the foundation for building the new Circuit Maintenance Plugin, leveraging its microkernel architecture.

[Figure: sot_architecture]

  • The Notifications Handler is an asynchronous job that fetches notifications from several Notification Sources (either a mailbox or an external API endpoint). When a new notification is received, it uses the circuit-maintenance-parser library to parse it and obtain the standard object.
  • Two relevant data models have been added: Notification and Circuit Maintenance. The Circuit Maintenance is the key component; it is the result of parsing the notifications and linking them to the Circuits that are part of the core data models. The plugin creates and updates Circuit Maintenances automatically from the notification objects.
  • The plugin also extends two core data models, Provider and Circuit, to make them aware of the new plugin data.
  • The plugin also registers new API objects for the REST and GraphQL endpoints, to offer them to external network automation integrations.

Circuit Maintenance UI

[Figure: circuit_maintenance]

For each maintenance notification, we capture all the relevant information, such as the start and end time and the status of the maintenance (e.g., CONFIRMED).

The Circuits table lists all the affected circuits with the specific impact, so we can understand the expected impact in a granular way.

In the Notes section, a network operator can add manual notes about the maintenance. The section also includes automatically generated notes that warn when a circuit ID referenced in the notification is not in the Circuits database, in case this was an error.

At the bottom, you can also see all the Notifications related to this circuit maintenance, to understand where all this information is coming from and the life cycle of the maintenance, from its creation and through subsequent updates.
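
The behavior described in this section (create a maintenance on the first notification, update it on later ones, and add a note for unknown circuit IDs) can be sketched roughly as follows. This is not the plugin's code or its real data models: the `CircuitMaintenance` dataclass, the in-memory `store`, and the choice of (provider, maintenance ID) as the key are all assumptions made for the illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class CircuitMaintenance:
    """Hypothetical, simplified stand-in for a Circuit Maintenance record."""
    provider: str
    maintenance_id: str
    status: str
    start: int
    end: int
    circuit_impacts: Dict[str, str] = field(default_factory=dict)
    notes: List[str] = field(default_factory=list)
    notifications: int = 0  # how many notifications contributed to this record


Store = Dict[Tuple[str, str], CircuitMaintenance]


def apply_notification(store: Store, known_circuits: Set[str], parsed: dict) -> CircuitMaintenance:
    """Create or update a maintenance from one parsed notification."""
    key = (parsed["provider"], parsed["maintenance_id"])  # assumed unique key
    maintenance = store.get(key)
    if maintenance is None:
        maintenance = CircuitMaintenance(
            provider=parsed["provider"],
            maintenance_id=parsed["maintenance_id"],
            status=parsed["status"],
            start=parsed["start"],
            end=parsed["end"],
        )
        store[key] = maintenance
    else:
        # A later notification may change the status or the dates.
        maintenance.status = parsed["status"]
        maintenance.start, maintenance.end = parsed["start"], parsed["end"]
    for circuit in parsed["circuits"]:
        circuit_id = circuit["circuit_id"]
        if circuit_id in known_circuits:
            maintenance.circuit_impacts[circuit_id] = circuit["impact"]
        else:
            note = f"Circuit ID {circuit_id} is not in the Circuits database"
            if note not in maintenance.notes:
                maintenance.notes.append(note)
    maintenance.notifications += 1
    return maintenance


store: Store = {}
known = {"acme-widgets-as-a-service"}
first = {
    "provider": "example.com",
    "maintenance_id": "WorkOrder-31415",
    "status": "TENTATIVE",
    "start": 1444464000,
    "end": 1444471200,
    "circuits": [
        {"circuit_id": "acme-widgets-as-a-service", "impact": "NO-IMPACT"},
        {"circuit_id": "acme-widgets-as-a-service-2", "impact": "OUTAGE"},
    ],
}
update = dict(first, status="CONFIRMED")  # a follow-up notification

apply_notification(store, known, first)
result = apply_notification(store, known, update)
print(result.status, result.notifications, result.notes)
```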

Everything that can be done via the UI is also available via the API (REST and GraphQL). Besides retrieving data, you can also create Circuit Maintenances or trigger Notifications Handler jobs.

How Can This Help Me?

Automating the handling of your Circuit Maintenances could bring several benefits that you could progressively adopt:

  • Reporting: Using the relationships between the data models, and between the Circuit Maintenance and the affected Circuits, it’s easy to correlate and understand how many maintenances a provider has had over a period of time, or the associated downtime.
  • Alerting: Checking from a Site perspective, you could automatically get an alert if, by chance, all your Internet Transit circuits in a PoP are going to be affected by overlapping maintenances from different NSPs, and take preventive actions.
  • Automate network operations: Taking the preventive actions on your side before a circuit goes into a maintenance (muting alerting for the interface state or shutting down BGP to drain traffic) and then recovering the circuit when the maintenance is over.
  • Implement a Self-Healing strategy: Create custom Circuit Maintenances by observing your network performance and relying on the network automation to tune your network as needed.
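
As an illustration of the alerting idea above, the following sketch finds the time windows in which every circuit of a site is under maintenance at the same time. The circuit names and the second maintenance window are invented for the example, and the function assumes that a single circuit's own windows do not overlap each other; how the windows are retrieved from the SoT is left out.

```python
from typing import Dict, List, Tuple

Window = Tuple[int, int]  # (start, end) in epoch seconds


def intersect(first: List[Window], second: List[Window]) -> List[Window]:
    """Return the time ranges covered by both lists of windows."""
    common = []
    for start_a, end_a in first:
        for start_b, end_b in second:
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:
                common.append((start, end))
    return sorted(common)


def full_outage_windows(windows_by_circuit: Dict[str, List[Window]]) -> List[Window]:
    """Windows during which all circuits are in maintenance at once."""
    window_lists = list(windows_by_circuit.values())
    if not window_lists:
        return []
    common = window_lists[0]
    for windows in window_lists[1:]:
        common = intersect(common, windows)
    return common


# Hypothetical site with two Internet Transit circuits from different NSPs.
site = {
    "transit-nsp-a": [(1444464000, 1444471200)],  # 08:00-10:00 UTC
    "transit-nsp-b": [(1444467600, 1444474800)],  # 09:00-11:00 UTC
}
print(full_outage_windows(site))  # [(1444467600, 1444471200)]
```

A non-empty result is the case worth alerting on: between 09:00 and 10:00 UTC the site in this example would have no transit circuit outside a maintenance window.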

Conclusion

Handling Circuit Maintenances manually has always been a pain for every network operations team, so NTC has taken a step forward by releasing these two open source projects, which address that problem through a network automation strategy.

The projects were released a few days ago at RIPE 82, and their success greatly depends on community adoption, which is key to extending the Parser with new providers and adding more interesting features to the SoT Plugin.

We encourage you to check it out and provide feedback, so we can grow the community.

-Christian
