One true fact about every network is that it is composed of a myriad of circuits/links interconnecting the network devices. Some of them are private links (for example, a cross-connect between two devices in your Data Center), but, because an isolated network is not very useful, you most likely have third-party circuits, too (for instance, connecting your border routers to the internet).
It’s easy to guess that these third-party circuits have their own life cycle, so after being provisioned they will require maintenance for planned operations, such as fixing or improving the service. In these situations, the Network Service Provider (NSP) should notify you in advance about the upcoming disruption, so you can implement the mitigation actions you consider necessary.
Still, the usual way to notify the customer is an email containing all the details the NSP considers relevant. On the customer side, a network operator typically checks the email inbox periodically, works out what the notification is about, and defines a plan to mitigate its impact.
Ideally, the NSP should take all the preventive actions needed to avoid affecting your service during the work. For example, if you have a circuit with a BGP service, before dropping the circuit it should bring the BGP session down to let the traffic move to other available routes, and only then shut down the circuit without dropping traffic. But “what if” this is not done properly? Obviously, this could impact your business.
Moreover, even if we assume that the NSP will mitigate the impact on their side, we would like to minimize any operational burden on ours, for example by muting the alerting related to the state of the circuit, since it’s a planned operation, thereby avoiding a page in the middle of the night.
Obviously, to take all the previously mentioned actions, someone from the network team has to read and understand what is going to happen, and determine when it will happen and which circuits are affected. Then, an action plan must be defined for “pre” and “post” maintenance. And a notification is rarely the last word: it is usually followed by updates that change the impact or the dates, and therefore your action plan.
This is not for us, we don’t have too many maintenances
Are you sure? My hands-on experience with Internet Transit circuits, as well as others, suggests a rough estimate of 3 maintenances per circuit per year (although this depends on multiple factors).
Considering one engineering hour of work to handle one maintenance (including reading the notification, defining the action plan, implementing the actions before the start time, and doing the same after the end time), you can get an approximate idea of how much time is spent on maintenances and how that affects your operations.
For example, one of our customers with a large number of circuits reported up to 15 maintenances per day, so, on average, two engineers would be busy just handling circuit maintenances.
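The back-of-the-envelope estimate above fits in a few lines of Python. The defaults (3 maintenances per circuit per year, one engineering hour each) come from the text. The 8-hour workday used to turn hours into engineers and the 200-circuit network in the usage lines are assumptions made only for this sketch.

```python
# Back-of-the-envelope estimate of the effort spent handling circuit maintenances.
# Defaults come from the text (3 maintenances per circuit per year, 1 hour each);
# the 8-hour workday is an assumption used only to convert hours into engineers.


def yearly_maintenance_hours(circuits, maintenances_per_circuit=3, hours_per_maintenance=1):
    """Engineering hours per year spent on circuit maintenances."""
    return circuits * maintenances_per_circuit * hours_per_maintenance


def engineers_needed(maintenances_per_day, hours_per_maintenance=1, workday_hours=8):
    """Full-time engineers needed to keep up with a daily maintenance load."""
    return maintenances_per_day * hours_per_maintenance / workday_hours


if __name__ == "__main__":
    print(yearly_maintenance_hours(200))  # 600 hours per year for a hypothetical 200 circuits
    print(engineers_needed(15))           # 1.875, i.e. roughly two engineers
```

With 15 maintenances per day, the result lands close to the two engineers mentioned above.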
But, don’t worry, network automation comes to our rescue!
Let’s ask ourselves: “what if” we could automatically fetch and understand those notifications and convert them to data that we could leverage to automate what we had been doing manually?
In this approach, instead of relying only on the network operator to analyze a maintenance notification and then manually perform the required operations, we propose enhancing the Source of Truth (SoT), where we store the intended state of the network, so that it automatically fetches the notifications, parses them to extract their data, and populates that data into the existing SoT data models. Then, when the network automation process reconciles the intended state with the current state, the necessary changes are rolled out.
Notice that we are moving from statically defining the network intent to a dynamic approach, where part of the intent is delegated to external sources. This raises new challenges, such as curating this data to limit its scope.
Obviously, we are not the first ones trying to solve this problem. You have probably already created some homegrown scripts to help with this.
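As a minimal sketch of this idea, assume the SoT holds maintenance records (circuit ID, start, end, impact) like the ones produced by a parser. The intended state can then be derived from that data, for example which circuits should have their alerting muted at a given moment. The Maintenance record and the circuits_to_mute policy are hypothetical illustrations, not part of the projects described in this post.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Maintenance:
    """Hypothetical SoT record for one circuit affected by a maintenance."""
    circuit_id: str
    start: datetime
    end: datetime
    impact: str  # e.g. "NO-IMPACT" or "OUTAGE"


def circuits_to_mute(maintenances, now):
    """Return the circuit IDs whose alerts should be muted at `now`.

    Only circuits with an expected outage inside an active window are muted;
    NO-IMPACT entries keep their alerting so that real failures still page.
    """
    return {
        m.circuit_id
        for m in maintenances
        if m.impact == "OUTAGE" and m.start <= now < m.end
    }


if __name__ == "__main__":
    utc = timezone.utc
    window = (datetime(2015, 10, 10, 8, tzinfo=utc), datetime(2015, 10, 10, 10, tzinfo=utc))
    sot = [
        Maintenance("acme-widgets-as-a-service", *window, "NO-IMPACT"),
        Maintenance("acme-widgets-as-a-service-2", *window, "OUTAGE"),
    ]
    print(circuits_to_mute(sot, datetime(2015, 10, 10, 9, tzinfo=utc)))
    # {'acme-widgets-as-a-service-2'}
```

The automation process would then compare this set with the alerting that is currently muted and apply the difference, which is the same intended-versus-current reconciliation described above.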
In 2016 a group of people got together and created a NANOG BCOP (Best Current Operational Practice): A machine-parsable standard for formatting maintenance notifications. They focused on solving a big problem: agreeing on a common format that would make maintenance notifications parsable by a machine. The proposed format builds on the iCalendar format (RFC 5545) and defines the properties that describe a maintenance. The BCOP was submitted as an IETF draft and, even though it never became an actual standard, it has gained some traction, and a number of providers have already adopted it.
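To illustrate why an iCalendar-based format is machine-friendly, here is a minimal sketch that reads the X-MAINTNOTE-* properties of a BCOP-style event using only the Python standard library. It handles RFC 5545 line unfolding and property parameters in a deliberately simplified way (no escaping rules, a single VEVENT, no value types). A real implementation should rely on a proper iCalendar library, and the function names here are made up for illustration.

```python
# Minimal reader for X-MAINTNOTE-* properties in a BCOP-style iCalendar event.
# Simplified on purpose: no escaping rules, a single VEVENT, no value types.


def unfold(text):
    """Undo RFC 5545 line folding: a line starting with a space or tab continues the previous one."""
    lines = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        elif raw:
            lines.append(raw)
    return lines


def parse_content_line(line):
    """Split 'NAME;PARAM=VAL:value' into (name, params, value)."""
    # The value starts at the first colon that is not inside a quoted parameter value.
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes
        elif ch == ":" and not in_quotes:
            head, value = line[:i], line[i + 1:]
            break
    else:
        raise ValueError(f"not a content line: {line!r}")
    name, *raw_params = head.split(";")
    params = dict(p.split("=", 1) for p in raw_params)
    return name.upper(), params, value


def maintnote_fields(text):
    """Collect X-MAINTNOTE-* properties; OBJECT-ID may repeat, so it becomes a list."""
    fields = {"objects": []}
    for line in unfold(text):
        name, params, value = parse_content_line(line)
        if name == "X-MAINTNOTE-OBJECT-ID":
            impact = params.get("X-MAINTNOTE-OBJECT-IMPACT")
            fields["objects"].append((value, impact))
        elif name.startswith("X-MAINTNOTE-"):
            fields[name[len("X-MAINTNOTE-"):].lower()] = value
    return fields


# Trimmed event based on the example further down, with one long line folded per RFC 5545.
sample = """BEGIN:VEVENT
X-MAINTNOTE-PROVIDER:example.com
X-MAINTNOTE-MAINTENANCE-ID:WorkOrder-31415
X-MAINTNOTE-OBJECT-ID;X-MAINTNOTE-OBJECT-IMPACT=OUTAGE:acme-widgets-as-a-servi
 ce-2
END:VEVENT"""

if __name__ == "__main__":
    print(maintnote_fields(sample))
    # {'objects': [('acme-widgets-as-a-service-2', 'OUTAGE')], 'provider': 'example.com', 'maintenance-id': 'WorkOrder-31415'}
```

Because every field has a fixed property name, no provider-specific heuristics are needed. That is exactly what free-form emails lack.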
But until all NSPs follow the recommended format, we have a problem: some notifications are difficult to parse automatically. To tackle this issue, an interesting open source project, Janitor, was created. It is a Flask application with an embedded parser for circuit maintenance notification emails, covering both notifications that follow the recommended format and ones that do not.
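At Network to Code (NTC), after reviewing all the previous work, we are proposing an approach whose key point is splitting the problem into two parts: a parser library that converts provider notifications into a standard format, and a SoT plugin that consumes the parsed data.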
This divide-and-conquer approach provides a key benefit: each component can have its own road map. So if you are using another SoT, you can still benefit from the parser without having to embrace the proposed SoT. And if you want a turnkey solution, you can directly adopt the proposed SoT, which uses the parser.
The first component is the circuit-maintenance-parser Python library, which has a single goal: converting heterogeneous circuit maintenance notification formats into the well-defined standard format proposed by the BCOP.
The library takes a raw text notification as input and returns a parsed object (dict) compliant with the BCOP notification format. The provider type can be specified; if it is not, the standard iCal format is used.
To install it from PyPI: pip install circuit-maintenance-parser
And to use it:
from circuit_maintenance_parser import init_parser
raw_text = """BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Maint Note//https://github.com/maint-notification//
BEGIN:VEVENT
SUMMARY:Maint Note Example
DTSTART;VALUE=DATE-TIME:20151010T080000Z
DTEND;VALUE=DATE-TIME:20151010T100000Z
DTSTAMP;VALUE=DATE-TIME:20151010T001000Z
UID:42
SEQUENCE:1
X-MAINTNOTE-PROVIDER:example.com
X-MAINTNOTE-ACCOUNT:137.035999173
X-MAINTNOTE-MAINTENANCE-ID:WorkOrder-31415
X-MAINTNOTE-IMPACT:OUTAGE
X-MAINTNOTE-OBJECT-ID;X-MAINTNOTE-OBJECT-IMPACT=NO-IMPACT:acme-widgets-as-a-service
X-MAINTNOTE-OBJECT-ID;X-MAINTNOTE-OBJECT-IMPACT=OUTAGE:acme-widgets-as-a-service-2
X-MAINTNOTE-STATUS:TENTATIVE
ORGANIZER;CN="Example NOC":mailto:noone@example.com
END:VEVENT
END:VCALENDAR
"""
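data = {
    "subject": "this is a circuit maintenance from some NSP",
    "sender": "support@networkserviceprovider.com",
    "source": "gmail",
    "raw": raw_text,
}
# No provider is specified, so the standard iCal (BCOP) parser is used
parser = init_parser(**data)
# process() returns a list of parsed maintenance notifications
parsed_notifications = parser.process()
print(parsed_notifications[0].to_json())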
{
  "account": "137.035999173",
  "circuits": [
    {
      "circuit_id": "acme-widgets-as-a-service",
      "impact": "NO-IMPACT"
    },
    {
      "circuit_id": "acme-widgets-as-a-service-2",
      "impact": "OUTAGE"
    }
  ],
  "end": 1444471200,
  "maintenance_id": "WorkOrder-31415",
  "organizer": "mailto:noone@example.com",
  "provider": "example.com",
  "sequence": 1,
  "stamp": 1444435800,
  "start": 1444464000,
  "status": "TENTATIVE",
  "summary": "Maint Note Example",
  "uid": "42"
}
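Since the output is plain JSON with epoch timestamps, it is easy to consume downstream. The following sketch works only on the JSON printed above (it does not call the library). It converts start and end into UTC datetimes and lists the circuits that expect an outage. The summarize helper is hypothetical, and the field names are taken from the example output.

```python
import json
from datetime import datetime, timezone

# JSON as printed by parsed_notifications[0].to_json() in the example above (trimmed).
notification_json = """
{
  "circuits": [
    {"circuit_id": "acme-widgets-as-a-service", "impact": "NO-IMPACT"},
    {"circuit_id": "acme-widgets-as-a-service-2", "impact": "OUTAGE"}
  ],
  "end": 1444471200,
  "maintenance_id": "WorkOrder-31415",
  "start": 1444464000,
  "status": "TENTATIVE"
}
"""


def summarize(raw):
    """Turn epoch timestamps into UTC datetimes and pick out circuits with an OUTAGE impact."""
    data = json.loads(raw)
    return {
        "maintenance_id": data["maintenance_id"],
        "start": datetime.fromtimestamp(data["start"], tz=timezone.utc),
        "end": datetime.fromtimestamp(data["end"], tz=timezone.utc),
        "outage_circuits": [c["circuit_id"] for c in data["circuits"] if c["impact"] == "OUTAGE"],
    }


if __name__ == "__main__":
    summary = summarize(notification_json)
    print(summary["start"].isoformat(), "->", summary["end"].isoformat())
    # 2015-10-10T08:00:00+00:00 -> 2015-10-10T10:00:00+00:00
    print(summary["outage_circuits"])  # ['acme-widgets-as-a-service-2']
```

The converted times match the DTSTART and DTEND values of the original iCalendar input (08:00 to 10:00 UTC), which is a quick sanity check that no timezone information was lost.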
Currently, only a few providers are supported, but adding new parsers is easy, especially for providers whose notifications already follow the standard format. So we expect more parsers to come as the community adopts the library.
NTC released Nautobot a few months ago as a fork of the NetBox SoT, extending it with several new features that laid the foundation for building the new Circuit Maintenance Plugin on top of its microkernel architecture.
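To give an idea of what a provider-specific parser has to do, here is a toy example for a hypothetical provider that sends plain-text emails instead of iCalendar. It maps the email onto the same field names used in the output above. The email layout, the regular expression, and parse_acme_email are invented for illustration. This is not how parsers are implemented inside circuit-maintenance-parser.

```python
import re
from datetime import datetime, timezone

# Hypothetical plain-text notification from a provider that does not follow the BCOP.
EMAIL_BODY = """\
Dear customer,
Ticket: MNT-2021-0042
Start: 2021-05-20 02:00 UTC
End: 2021-05-20 04:00 UTC
Affected circuits: CID-1001, CID-1002
Expected impact: outage
"""

# "Key: value" lines; the key is letters and spaces, the value runs to the end of the line.
_FIELD = re.compile(r"^(?P<key>[A-Za-z ]+):\s*(?P<value>.+)$", re.MULTILINE)


def _to_epoch(value):
    """Parse 'YYYY-MM-DD HH:MM UTC' into epoch seconds."""
    dt = datetime.strptime(value, "%Y-%m-%d %H:%M UTC").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def parse_acme_email(body, provider="acme"):
    """Map the hypothetical email format onto BCOP-style fields."""
    fields = {m["key"].strip().lower(): m["value"].strip() for m in _FIELD.finditer(body)}
    impact = fields["expected impact"].upper()
    return {
        "provider": provider,
        "maintenance_id": fields["ticket"],
        "start": _to_epoch(fields["start"]),
        "end": _to_epoch(fields["end"]),
        "circuits": [
            {"circuit_id": cid.strip(), "impact": impact}
            for cid in fields["affected circuits"].split(",")
        ],
        "status": "CONFIRMED",  # assumption: this hypothetical provider only mails confirmed work
    }


if __name__ == "__main__":
    print(parse_acme_email(EMAIL_BODY)["circuits"])
```

Once every provider's format is mapped onto the same fields, everything downstream (the SoT, alerting, reporting) can ignore where a notification came from.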
The plugin uses the circuit-maintenance-parser library to parse each notification and obtain the standard object. For each maintenance notification, we capture all the relevant information, such as the start and end time and the status of the maintenance (e.g., CONFIRMED).
The Circuits table lists all the affected circuits with their specific impact, so we can understand the expected impact at a granular level.
In the Notes section, a network operator can add manual notes about the maintenance. The section also includes automatically generated notes, such as a warning when the notification references a circuit ID that is not in the Circuits database, in case this was an error.
At the bottom, you can also see all the Notifications related to this circuit maintenance, which show where the information comes from and trace the life cycle of the maintenance, from its creation through subsequent updates.
Everything that can be done via the UI is also available via the API (REST and GraphQL). Besides retrieving data, you can also create Circuit Maintenances or trigger Notifications Handler jobs.
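The automatically generated warning boils down to a set difference between the circuit IDs referenced in a notification and the circuits known to the SoT. Here is a minimal sketch, where known_circuits stands in for a query against the Circuits database and the note wording is made up. It is not the plugin's actual code.

```python
def unknown_circuit_notes(parsed, known_circuits):
    """Return warning notes for circuit IDs in a parsed notification that the SoT does not know."""
    # dict.fromkeys removes duplicate IDs while keeping their original order.
    referenced = dict.fromkeys(c["circuit_id"] for c in parsed.get("circuits", []))
    return [
        f"Circuit ID {cid} referenced by maintenance {parsed.get('maintenance_id')} "
        "is not in the Circuits database"
        for cid in referenced
        if cid not in known_circuits
    ]


if __name__ == "__main__":
    parsed = {
        "maintenance_id": "WorkOrder-31415",
        "circuits": [
            {"circuit_id": "acme-widgets-as-a-service", "impact": "NO-IMPACT"},
            {"circuit_id": "acme-widgets-as-a-service-2", "impact": "OUTAGE"},
        ],
    }
    for note in unknown_circuit_notes(parsed, {"acme-widgets-as-a-service"}):
        print(note)
```

Flagging the mismatch, rather than silently dropping it, leaves the final call to the operator: the provider may have a typo, or the SoT may be missing a circuit.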
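Because updates keep arriving for the same maintenance, the SoT has to decide which notification is authoritative. iCalendar (RFC 5545) uses SEQUENCE for this purpose: a revision with a higher value supersedes a lower one. The following minimal sketch applies that rule to parsed notifications. Grouping by provider and maintenance_id and breaking ties with stamp are assumptions of this sketch, and the plugin's actual logic may differ.

```python
def latest_per_maintenance(notifications):
    """Keep only the most recent revision of each maintenance.

    Notifications are grouped by (provider, maintenance_id); within a group the
    one with the highest `sequence` wins, and `stamp` breaks ties.
    """
    latest = {}
    for n in notifications:
        key = (n["provider"], n["maintenance_id"])
        current = latest.get(key)
        if current is None or (n["sequence"], n["stamp"]) > (current["sequence"], current["stamp"]):
            latest[key] = n
    return latest


if __name__ == "__main__":
    # Illustrative history: the update arrives before the original notification.
    history = [
        {"provider": "example.com", "maintenance_id": "WorkOrder-31415",
         "sequence": 2, "stamp": 1444450000, "status": "CONFIRMED"},
        {"provider": "example.com", "maintenance_id": "WorkOrder-31415",
         "sequence": 1, "stamp": 1444435800, "status": "TENTATIVE"},
    ]
    state = latest_per_maintenance(history)
    print(state[("example.com", "WorkOrder-31415")]["status"])  # CONFIRMED
```

Relying on the sequence rather than on arrival order keeps the result correct even when emails are delivered or processed out of order. All notifications are still kept for the history view.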
Automating the handling of your Circuit Maintenances can bring several benefits, which you can adopt progressively.
Handling Circuit Maintenances manually has always been a pain for every network operations team, so NTC has taken a step forward by releasing these two open source projects, which aim to solve this problem through a network automation strategy.
The projects were released a few days ago at RIPE82, and their success depends greatly on community adoption, which is key to extending the Parser with new providers and adding more interesting features to the SoT Plugin.
We encourage you to check them out and provide feedback, so we can grow the community.
-Christian