Circuit Maintenance Parser Powered by AI/ML
More than two years ago, NTC released the circuit-maintenance-parser Python library to ease the arduous job of making sense of the circuit maintenance notifications that network service providers send without any normalized format. We explained the why and how in these two blogs: 1 and 2. The library has proven useful, but recently we challenged ourselves: how could technology like Artificial Intelligence and Machine Learning (AI/ML) make it even better?
Recap of Two Years, and What’s Next?
The circuit-maintenance-parser library provides parsers that transform circuit maintenance notifications from many network providers into a normalized format, making them easy to digest programmatically.
Over these two years, new parsers have been added, together with updates and fixes for the existing ones. You can check the complete list of currently supported providers in the repository README, but to name a few: NTT, AWS, Equinix, Cogent, COLT, and EXA. We have also heard from many users of the library worldwide!
An example of an application leveraging the library is the Nautobot Circuit Maintenance App that fetches emails from network providers, parses them, and updates the related circuits in Nautobot.
The parsers can work on many different data types (e.g., ICal, plain text, HTML, CSV). There is also a generic implementation that works on the reference format proposed in this BCOP.
To better understand the new changes, it helps to first explain the four basic entities of the library:
- Provider: represents a network service provider that can leverage several Processors, in a specific order (if one fails, it tries the next).
- Processor: combines the structured data parsed by one or several Parsers to create one or several Maintenances.
- Parser: extracts structured data from a raw notification.
- Maintenance: the outcome of the parsing process; it adheres to the reference format mentioned above.
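To make these entities concrete, below is a minimal sketch of using the library programmatically. The helper names (init_provider, NotificationData.init_from_raw, get_maintenances, to_json) follow the project's documented usage, but verify them against the README of your installed version; the file path is just a placeholder.
# Minimal sketch of direct library usage; verify the helper names against the README.
from circuit_maintenance_parser import init_provider, NotificationData

# Provider: bundles the Processors and Parsers for a given network provider.
provider = init_provider("ntt")

# Wrap the raw notification so the Parsers know which data type they received
# (here an ICal payload read from a placeholder file path).
with open("notification.ics", "rb") as notification_file:
    data = NotificationData.init_from_raw("ical", notification_file.read())

# Maintenance: the normalized outcome of the parsing process.
for maintenance in provider.get_maintenances(data):
    print(maintenance.to_json())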
So far, so good. The library has been able to evolve and adapt to new requirements. However, every update requires a human to modify or create a parser (i.e., developing the logic, opening a PR, and getting it accepted and released).
Nowadays, with the explosion of Large Language Models (LLMs) as a subset of Machine Learning technologies, text processing is being transformed by new opportunities, and we believe the circuit-maintenance-parser library is a great use case to explore them. So, let's see how we approached it.
Understanding How LLM Parsers Work
In short, a circuit maintenance notification is text that contains key information that needs to be extracted and normalized according to the library's requirements. This is what we set out to solve, following these guidelines:
- A new Parser, called LLM, has been created to implement the logic required to ask the question that should produce the parsed response. The LLM parser needs to be implemented for a specific platform (e.g., OpenAI) to interact with it using the predefined hooks (i.e., to craft the API calls that each platform provides).
- Every Provider can include, as a last resort, a Processor that contains an LLM parser implementation, when certain conditions are met. Thus, the LLM parser is never the first parsing option: human-defined parsers are used first, and only if all of them fail are the LLM parsers taken into account (see the sketch after this list).
- The Maintenance object comes with a new Metadata attribute that provides information about the Provider, Processor, and Parsers used in the information extraction. This is very important for library users to consider when using the data, because the level of confidence is not the same for all parsers.
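To picture the ordering described in the second point, here is a small standalone sketch of the fallback pattern. It is only an illustration, not the library's internal code; every name in it (parse_with_fallback, processors, llm_processor) is made up for the example.
# Illustrative sketch of the "LLM as last resort" ordering; NOT the library's
# internal code, and all names here are hypothetical.
def parse_with_fallback(processors, llm_processor, raw_data):
    # Human-defined processors are always tried first, in their defined order.
    for processor in processors:
        try:
            return processor.process(raw_data), {"generated_by_llm": False}
        except Exception:
            continue  # on failure, try the next processor
    # Only when every deterministic processor has failed is the LLM processor used.
    return llm_processor.process(raw_data), {"generated_by_llm": True}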
Hopefully this makes sense to you; now it's time to see it in action.
Let’s Use It
First, we need to install the library with the openai extra (it's the only LLM platform implemented for now).
pip install circuit-maintenance-parser[openai]
Then, using the built-in CLI tool (i.e., circuit-maintenance-parser), we can see how it works, leveraging example data from the tests. You could reproduce the same results by interacting directly with the library, but the CLI offers a simpler interface for demonstration purposes.
Before getting into the magic of LLM, let’s see how the library works without LLM-powered parsers (the default option).
$ circuit-maintenance-parser --data-file tests/unit/data/aws/aws1.eml --data-type email --provider-type aws -v
Circuit Maintenance Notification #0
{
"account": "0000000000001",
"circuits": [
{
"circuit_id": "aaaaa-00000001",
"impact": "OUTAGE"
},
{
"circuit_id": "aaaaa-00000002",
"impact": "OUTAGE"
},
{
"circuit_id": "aaaaa-00000003",
"impact": "OUTAGE"
},
{
"circuit_id": "aaaaa-00000004",
"impact": "OUTAGE"
},
{
"circuit_id": "aaaaa-00000005",
"impact": "OUTAGE"
},
{
"circuit_id": "aaaaa-00000006",
"impact": "OUTAGE"
}
],
"end": 1621519200,
"maintenance_id": "15faf02fcf2e999792668df97828bc76",
"organizer": "aws-account-notifications@amazon.com",
"provider": "aws",
"sequence": 1,
"stamp": 1620337976,
"start": 1621497600,
"status": "CONFIRMED",
"summary": "Planned maintenance has been scheduled on an AWS Direct Connect router in A Block, New York, NY from Thu, 20 May 2021 08:00:00 GMT to Thu, 20 May 2021 14:00:00 GMT for 6 hours. During this maintenance window, your AWS Direct Connect services listed below may become unavailable.",
"uid": "0"
}
Metadata #0
provider='aws' processor='CombinedProcessor' parsers=['EmailDateParser', 'TextParserAWS1', 'SubjectParserAWS1'] generated_by_llm=False
At this point, you can see that the parsing ran successfully, producing one Maintenance, with the new Metadata providing info about how it was parsed.
Notice that we used the provider-type option to tell the library which provider to use (aws). Without this information, the library can't parse the notification properly, because it defaults to the GenericProvider, which only understands the ICal data type using the BCOP-recommended format. Let's try it:
$ circuit-maintenance-parser --data-file tests/unit/data/aws/aws1.eml --data-type email -v
Provider processing failed: Failed creating Maintenance notification for GenericProvider.
Details:
- Processor SimpleProcessor from GenericProvider failed due to: None of the supported parsers for processor SimpleProcessor (ICal) was matching any of the provided data types (email-header-date, email-header-subject, text/plain).
Now, let's see how the new OpenAI parser (implementing the LLM parser) can help us. The only mandatory step to activate it is setting the PARSER_OPENAI_API_KEY environment variable:
export PARSER_OPENAI_API_KEY="use your token here"
By default, it uses the GPT-3.5 model, but you can change it with the PARSER_OPENAI_MODEL environment variable. To see all the available options (including options to customize the question sent to the LLM), check the docs.
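For example, to switch to another model (the model name below is only an illustration; use whichever model your OpenAI account supports):
export PARSER_OPENAI_MODEL="gpt-4"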
At this point, every Provider will have the OpenAI parser as its last resort.
Let's repeat the previous example without providing the provider-type (your output can differ; it's not deterministic), and notice the Metadata associated with this output, which mentions the parsers being used. You will also see that this takes slightly longer than before because the OpenAI API is being called.
$ circuit-maintenance-parser --data-file tests/unit/data/aws/aws1.eml --data-type email -v
Circuit Maintenance Notification #0
{
"account": "Amazon Web Services",
"circuits": [
{
"circuit_id": "aaaaa-00000001",
"impact": "OUTAGE"
},
{
"circuit_id": "aaaaa-00000002",
"impact": "OUTAGE"
},
{
"circuit_id": "aaaaa-00000003",
"impact": "OUTAGE"
},
{
"circuit_id": "aaaaa-00000004",
"impact": "OUTAGE"
},
{
"circuit_id": "aaaaa-00000005",
"impact": "OUTAGE"
},
{
"circuit_id": "aaaaa-00000006",
"impact": "OUTAGE"
}
],
"end": 1621519200,
"maintenance_id": "aaaaa-00000001",
"organizer": "unknown",
"provider": "genericprovider",
"sequence": 1,
"stamp": 1620337976,
"start": 1621497600,
"status": "CONFIRMED",
"summary": "Planned maintenance has been scheduled on an AWS Direct Connect router in A Block, New York, NY for 6 hours.",
"uid": "0"
}
Metadata #0
provider='genericprovider' processor='CombinedProcessor' parsers=['EmailDateParser', 'OpenAIParser'] generated_by_llm=True
The output provides a "similar" successful parsing to the one above. However, a closer look reveals some differences; some of them may be acceptable, and others not. With the metadata (including the generated_by_llm boolean), the library user can choose how this information should be handled, perhaps adding extra validation before accepting it.
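For instance, a consumer of the library could gate LLM-derived results behind an extra review step. Here is a rough sketch; the metadata attribute and the init_from_email_bytes helper are assumptions based on the CLI output above and the documented API, so double-check them against the library docs.
# Rough sketch of gating LLM-generated results; the names flagged below are
# assumptions to verify against the library documentation.
# PARSER_OPENAI_API_KEY must be exported for the OpenAI parser to be available.
from circuit_maintenance_parser import init_provider, NotificationData

with open("tests/unit/data/aws/aws1.eml", "rb") as email_file:
    raw_email = email_file.read()

provider = init_provider("genericprovider")  # assumed provider-type string, per the CLI output above
data = NotificationData.init_from_email_bytes(raw_email)  # assumed helper for email payloads

for maintenance in provider.get_maintenances(data):
    metadata = maintenance.metadata  # assumed attribute exposing the Metadata entity
    if metadata.generated_by_llm:
        # LLM-derived data: route it to extra validation or human review.
        print(f"Needs review: {maintenance.maintenance_id}")
    else:
        print(f"Trusted: {maintenance.maintenance_id}")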
If you use any of the available tools to diff JSON objects (such as https://www.jsondiff.com/), you can see the differences (you may get a different output depending on your results). Keep in mind that you may need to discard or adjust some of the information.
{
"account": "Amazon Web Services",
"maintenance_id": "aaaaa-00000001",
"organizer": "unknown",
"provider": "genericprovider",
"summary": "Planned maintenance has been scheduled on an AWS Direct Connect router in A Block, New York, NY for 6 hours."
}
And, if you are wondering what would happen if you set the provider type properly, the result would be exactly the same as before, because the aws provider knows how to parse the notification and the LLM parser is never actually hit.
Conclusion
At NTC, we are constantly considering how to leverage AI/ML technologies to support network automation use cases for all the different components of our recommended architecture (more info in this blog series), and this new feature is an example of how our open source projects can be powered by them.
We would like to encourage you to give it a try, and provide constructive feedback in the form of Issues or Feature Requests in the library repository.
Thanks for reading!
-Christian