Network to Code Video Case Study: Automating a National Network With Open-Source Nautobot

Video Description

In this episode of "Simplified Network Automation," experts Michael Milaitis and Nikos Kallergis share their experience automating the Greek government's vast network infrastructure, spanning 35,000 sites. They discuss overcoming challenges, transitioning to open-source solutions, and implementing adaptable, reliable automation tools to streamline network management and reduce human error.

Transcript Highlights

Introduction and Background of Guests

Michael Milaitis and Nikos Kallergis share their backgrounds in networking and automation, detailing their transition from traditional roles to leveraging automation for mitigating human errors.

Triggers and Motivations for Adopting Network Automation

The guests discuss how manual work and errors drove their shift to automation, with Michael developing Python scripts and Nikos learning key lessons from early automation attempts.

The Greek Government Network Automation Project Overview

The guests outline the Greek government’s 35,000-site network automation project, focusing on the shift from licensed to open-source solutions for scalability and maintainability.

Architecture, Design, and Tooling of the Automation Solution

They describe the redesigned network architecture, detailing the use of Nautobot, GitLab, Nornir, the Elastic stack, and HashiCorp Vault to create a flexible and scalable automation platform.

Live Demonstration of Automation Workflows Using Nautobot

A live demo showcases the automation platform’s ability to generate device configurations and ensure network compliance through zero-touch provisioning and continuous monitoring.

Challenges, Ongoing Operations, and Future Prospects

The guests discuss challenges like engineer resistance to full automation, ongoing operational tasks, and plans to open-source portions of their automation work for community benefit.

00:02
Hello everyone and welcome to another episode of Simplified Network Automation with NetGru. I’m your host, and today we have two special guests, Michael Milaitis and Nikos Kallergis. Welcome, guys, both of them from Greece. Introduce yourselves. Nikos, why don’t we start with you? Sure, of course. Hello there. Thanks for the invitation and having us here. I’m Nikos. I live and work in Athens, Greece, as you said. Career-wise I have been a network engineer for almost 20 years now. Multi-discipline, I want to say; I never
00:42
stuck to one thing. I have done routing, switching, firewalls, wireless, IP telephony. I have been in pre-sales, not very proud of that, but that is what gave me the time to start diving into automation, maybe eight years ago, close to ten even. For the last ten I have been very enthusiastic about it; automation has been my primary focus for five years now, three and a half of which I have been with NTC. My current role is network automation architect with NTC, and I guess that pretty much rounds up the introduction for me. Nice. And NTC
01:26
is Network to Code, right? Just making sure. NTC is Network to Code. Yeah, we use the abbreviation a lot. Yeah. Sorry for that. No, no worries. Cool. Cool. Welcome. Happy to have you here, Nikos, and then Michael, welcome. Oh, did you want to say something, Nikos, before? I do. I do. I just saw Yiannis is jumping in, and a huge shout-out, not only for helping set this interview up but for his contribution to everything that we will discuss during this hour. Yes, Yiannis, a big thank you for everything. Yiannis is a great friend. So
02:04
super happy that he’s joining us and looking forward to meeting again. I mean, we met at Cisco Live Amsterdam, and that’s how we kind of started this conversation about having you guys on. But yeah, Yiannis, thank you for everything that you do. Thank you. And then Michael, welcome, tell us a bit about yourself. What do you do? Thank you very much for the invitation. I’m Michael Milaitis. I’m a senior automation consultant at Network to Code. My background is mainly in large-scale networks. I spent over 10 years working
02:42
in the network engineering departments of some of the biggest ISPs in Greece. Over the past few years, I shifted my focus to network automation. In my previous role, I built and led a team focused on network automation, working on projects such as Nautobot deployments, ACI automation, services, things like that. One of the highlights was delivering the automation solution for the Greek state in collaboration with Network to Code. We also designed and implemented a
03:24
hybrid network solution, ACI automation frameworks with CI/CD pipelines, and NSO services for a major service provider, and I’m really, really proud of the team that we built at the time. So yeah, that’s my introduction. Nice. Nice. All right. So both of you guys, right, well, all three of us, pretty much network people at heart, have been around for a while and then kind of jumped on the automation bandwagon. How did that happen? What made the transition for you? Because I mean we’ve been doing networking for
04:01
many, many years, but then at some point it’s like, okay, something must have happened for you to say, okay, automation, I kind of see it as valuable, let me get on this. So what was the trigger for each one of you to adopt automation and start working with it? Michalis, you started touching on that, please feel free. Okay. So actually, in one of my previous roles, as I said, I worked in service providers. So we had very frequent maintenance windows, and most of our operations were done outside of
04:45
regular working hours. And what I realized during that time is that no matter how careful you are, you are still human and mistakes happen. When you’re working on a live network serving thousands and thousands of customers, and important customers like banks, those mistakes can turn into major problems. If you have worked at a provider, you can understand me. That’s when I started using automation to generate the configuration we needed. My first automation, if you can call it automation, was a Python script with print commands.
05:27
It really helped me reduce errors, and that is when I began to understand the true value of automation, not just for configuration but also for validation, because when you’re migrating an interface on a core router that serves hundreds or thousands of MPLS circuits, it’s impossible to test it manually. So bringing automation to situations like this, even with very simple solutions, very, very simple Python scripts, you realize how essential it is for efficiency and for reliable network operation.
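A script in the spirit of the one Michael describes might look roughly like this; the interface names and VLAN IDs below are invented for illustration, not values from his actual script:

```python
# Minimal sketch of a "print-based" config generator. The data is hypothetical.
INTERFACES = [
    {"name": "GigabitEthernet0/0/1", "description": "CUSTOMER-A", "vlan": 110},
    {"name": "GigabitEthernet0/0/2", "description": "CUSTOMER-B", "vlan": 120},
]

for intf in INTERFACES:
    # Each iteration prints a consistent, reviewable block of CLI commands.
    print(f"interface {intf['name']}")
    print(f" description {intf['description']}")
    print(f" switchport access vlan {intf['vlan']}")
    print(" no shutdown")
    print("!")
```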
06:08
So that shifted my focus to automation, and I realized the importance of it. Nice. What about you, Nikos? I should have gone first; my story is silly. So, one of the things that is crucial, I believe, for both Michalis’s and my story, incidentally, is that we were both trained at the same university. We were not friends at the time, but the focus of the university was development, coding. So for four years we had pretty large exposure to the world of code, and I believe that played
06:55
a key role going forward. Incidentally, when I finished, we had this apprenticeship program at the university where, in the last semester, they sent you to a company for six months, and there I met a now-common friend, Manolis. Manolis and I were fresh out of school. C# was not a thing at the time, it was C++, and I was dabbling in Java, and he took me by the hand. He was the lead engineer at the oil company where I went for my apprenticeship, and he decided that it
07:37
was a good idea to create an ad hoc NMS with Microsoft Visio, Excel, Perl and some Visual Basic, and we did that. It was hilarious, of course, but it was a really good exercise, and by the end we had become so arrogant about what we had just done that we even started sending commands to switches, in a very trial-and-error fashion, on the reliability theme that Michalis also touched on. There were two big rings in this installation, an oil-rig kind of setup, and I was able to overflow two huge tanks, because the
08:25
switches that had the controllers on them got completely bricked, because I was doing VTP without setting up spanning tree properly. And so I learned that automation is really consistent at making mistakes. That was my first dive into automation. In all reality, after that I started working as a network engineer, and the huge inspiration came around 2015, maybe a little bit earlier, when people started talking about this new thing. The big name that I had been reading before was Ivan Pepelnjak with the ipSpace
09:07
portal. I guess he was the biggest inspiration for me. But also Matt Oswalt; he had this initiative from the Juniper side, creating some labs around automation. And obviously Jason Edelman and Damien Garros, who were huge, among others. Well, when I joined Network to Code, it was a huge thing, five years after reading these, you know, almost extraterrestrial people, knowing such weird things that are networking, but also not networking. I interviewed with Damien Garros, my first meeting was with
09:54
Jason, and a little bit after that I met Josh, and it was more or less like that. For me, the topics of network engineering, I had touched on all of them, so I was a little bit bored at the company I worked for at the time. I was a sales engineer, which was apparently not a good choice for me, but as I said, that gave me the time to dive into automation, and this is how it came to be. Nice. So with a bit of free time, and trying to make the work easier for you, there you go, those are the triggers for automation,
10:39
mostly. Yes. And a lot of nice words around it at the time. The thing was SDN, a lot of hype behind that. Yes. OpenFlow is going to put us all out of business. OpenFlow, oh my god. Yes, OpenFlow was huge at the time. Everyone was talking about that. I was so angry with Mr. Pepelnjak for bashing it at the time. I thought it was the future; how can you say that? And it was pretty much at the same time that there was a guy commenting in his comment section who called him a grumpy
11:19
old man in his ivory tower. And he was so sarcastic about that, I loved him so much. And at that moment I started thinking, you know, maybe he’s right about that. Maybe, you know, centralized control does not make that much sense. Two years after, the momentum of OpenFlow kind of died, but there were still some really nice ideas that played their role. It’s not pure automation, but the whole overlay stuff that pretty much ended up in VXLAN EVPN was another direction that was
12:01
really interesting. So much complexity that it invited automation and I guess that was one of the things that yeah also played a huge part into that. Yeah, I remember exactly because I was also reading Ivan’s blogs. I still do. I remember exactly what you’re talking about when everybody was on the SDN bandwagon and he’s like, “Hold on a second. This is not going to scale, right?” I was like, “This is not going to work.” At the time, I was a pre-sales engineer. So I was selling
12:35
the FEXes, the fabric extenders, and all the similar ideas that Cisco had at the time, central switches with remote line cards and so on, like crazy. Everyone was so enthusiastic about that. And it was a really cool solution at the time. But yeah, it comes back to you, I guess. Yep. Yeah, it always does. All right. So then let’s talk a bit about this project, because I know you guys have been working on automating the Greek government, all the infrastructure, and I mean this is a
13:14
huge project, right? We’re talking about hospitals and schools and everything, the major places all around Greece. So how did you get to work on this? Who started this project, first of all? How did you get started and how did you get involved? Michalis, do you want to start? There are some slides that we have prepared in order to dive in. Yeah, we have two or three of them. Such a huge setup, right? I mean, is this the largest network in Greece
14:06
besides ISPs? Besides ISPs, yes, besides ISPs, because it involves more than 35,000 branches of the Greek state. Actually this project involves deploying one network to interconnect all of the public sector points of presence, which are essentially more than 35,000 branches, including schools, and these branches are grouped by tenants, since each tenant represents a separate administrative domain. So I believe it’s a huge network; if you exclude service providers, I think it is one
14:55
of the biggest networks that you can see in Greece. Yeah. And some history behind that. In Greece we had a public sector ISP. It was called OTE. It was run by the government until the mid ’90s, if I’m not mistaken. And by constitution, they had the responsibility to provide circuits for all the entities of the public sector in Greece. At some point it was privatized; I am not exactly sure if I recall correctly, but slightly before 2000. And this is how the Greek state version
15:40
one project came to be. It was way smaller. It began in 2001, if I’m not mistaken. It covered 5,000 points of presence, Michalis, or 10,000, something like that, right? Yeah, 5,000. It was way smaller in impact and size, of course, and 10 years later everyone understood that it was time to refresh it and make it larger, to include all the other organizations that were left outside in the first iteration. So around 2010, a committee started designing what eventually came to be the
16:30
Greek state project. A tender came out. Thank you. A tender came out around 2014. It was a thing that went on for almost four years until the bid was won. Eventually, to cut the story short, most of the big integrators in Greece, among which the two that Michalis and I were working for, formed a union because of the size of the project and split the project between them in order to start executing it. So it was put in action. They started spinning it up around the end of
17:21
2018, and this is where we come into the picture. Michalis was assigned the lead role for the networking part of the project, and I was assigned the lead role for the automation part of it. The guy connecting the two was a colleague of Michalis’s, Tasos, really crucial to everything that will come from here on. We started forming a team of seven to eight people, discussing how we could implement Michalis’s design with automation, because the whole thing was procured in 2014 and the solutions were,
18:11
let’s say, not optimal. Some of the Cisco equipment that was to be bought was at end of life, so it had to be refreshed. And the most crucial part was that of the network controller. The first version of the Greek state network was based on Check Point firewalls for the most part, very simple specifications. For those not familiar, Check Point firewalls have a management server that sits in the middle and pretty much configures the gateways that are in the points of presence. So you
18:49
have centralized control and some goodies that come with the closed-source nature of vendors like Check Point. We did not have that. It was Cisco routers for us, which also had to double as firewalls, and we had to somehow manage them centrally. There was a software product that was somewhat hot around 2014 or 2015, but five years later it was nowhere to be found in the market. It was closed source. It was licensed per device per year, which means that the costs would be pretty large. So we
19:30
were in a strange position. And one afternoon, Tasos, that friend of ours, called me, and we had a meeting in the afternoon, and somewhere around the third beer, I want to say, we had this revelation: you know what, let’s do it with NetBox or Nautobot, get some services with the money that we would put toward the licenses, and we will have a really full open-source solution. We’d be really proud of ourselves, we will become better engineers, the whole team will benefit, and also, you know, we’d leave something nice
20:17
behind, like a big open-source solution, no fees going on for all these organizations and so on. So we had this big idea, and this is where Yiannis comes into play. He was the middleman for contacting NTC. Yes, that’s it. Tasos and Yiannis contacted NTC for us. We came to them with a proposal: we have this much money that we would have given for licenses; is it enough to do the managed services, the professional services, to set this up? We found a middle ground and we started
21:08
doing that. And this was, Nikos, around the 2019 time frame? Late ’19, maybe early ’20. We got approval from the authorities. They were enthusiastic. Luckily, the people that were reviewing our change proposal were big proponents of open source, and they really liked the approach that instead of going with a proprietary tool we would go with open source, making everything more supportable and maintainable going forward. And once we got the sign-off, we embarked on the journey to
21:55
start designing the automation solution with Network to Code. Yeah, pretty much that’s the story of how we came to the project. Gotcha. And on the network architecture, Michael, were there any changes compared to the original design? I would imagine yes, right? Because back when the design started in 2010, things have evolved since. So by 2020, did you have to rearchitect everything, or were there just small changes? There were a lot of changes. I wasn’t involved in 2010, so I got
22:35
involved in the recent redesign of the project. There were lots of changes. Some device models have changed, the networking solutions have changed, the VPN architecture that we used changed, so we redesigned everything from scratch based on the new requirements, because the requirements themselves had also changed since 2010. So we redesigned everything from the beginning. Wow. All right. Yeah, it was a very interesting project. I can say we learned lots and lots of new things. Yeah. And what
23:24
also made it a little bit more interesting is that six months after we got the sign-off to go open source, I jumped ship first and joined NTC. Luckily, my first project was the Greek state, so I continued doing pretty much the same thing, but from the NTC side. Even more luckily, Michalis was moved into my position. So we were peering, as we were pretty much the lead engineers from the Network to Code side and from the Greek state side. So it was really easy
24:06
communication. We have a rapport, it’s easy, and we had a really cool team around us too, and not a big one, four or five people. This is how we started working on providing these services, pretty much. Yeah. Okay. So the services provided, I see here, are internet access, then Voice over IP, DNS, email, web hosting. So pretty much everything for these locations, from a network perspective. Yes. So then you guys, wow. I mean, having to rearchitect everything, having
24:51
to change from this Check Point license model that’s going to cost an arm and a leg to going open source. I mean, these are massive, massive changes, right? Big decisions that you had to go through. So then what happened next? I’m on the edge of my seat over here. It was very fun. You make it sound like there was anxiety or anything; at the time, at least in my head, there wasn’t. Michalis, please keep me honest here. And the idea was that once we had this done and set, first we would be
25:32
very proud of what we would have accomplished, but we would also have positioned ourselves in a really good place going forward as engineers; we would pretty much be in the place where our icons are, to do some name-dropping like before. That was the motive, at least for me, what pushed me forward. All right. So then how did you approach this? You decided you’re going with providing these services, you’re going to go with these tools and do the rearchitecture. So how do you approach it? How do you
26:16
get started? With requirements gathering? Exactly. This is how we started, by gathering the requirements, pretty much. You know, Greek tenders tend to be overly verbose. The one that allotted this thing to the union of integrators was around, I don’t know, some thousands of pages, detailing every single aspect of what had to be done, plus it was already 10 years old, so, you know, it was somewhat out of fashion. So we started a round of gathering information from the documents but also from the union companies that
26:59
were to run it. I’m a little bit ashamed of that, but this is what actually happened. Network to Code has a program called seda, and the idea is that through a series of interviews we try to formalize what is to be built and provide a blueprint with an architecture, especially focused on automation, but also on the workflows that we want to implement with said automation. So we started by gathering the requirements. I will not stick to the requirements here; the demo that Michalis will present later
27:42
pretty much goes through all of that. There were a lot of requirements, though, and eventually we came down to an architectural document of around 100-200 pages that also details the tooling that we decided to use. For that we went with Nautobot, with no actual second thoughts; with the involvement of NTC, and at the time it was new and very flashy, so we adopted it. And, you know, with everything that is database-backed, you come to realize that, okay, integrity is cool, but
28:32
you need something like Git. So for our sources of truth, it was Nautobot and Git, and a lot of knowledge went into the Jinja templates; that was, to a very large extent, Michalis’s work, really elaborate, intricate Jinja templates to do this whole management of Cisco devices. For the automation piece, it was mostly Python in the form of Nautobot jobs and plugins.
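As a rough illustration of the Jinja-plus-source-of-truth approach Nikos describes, rendering a Cisco-style config from a template could look like this minimal sketch; the template and data are invented stand-ins, nothing like the project’s real templates:

```python
# Tiny Jinja2 rendering example; hostnames, interfaces and addresses are made up.
from jinja2 import Template  # pip install jinja2

CISCO_TEMPLATE = Template(
    """hostname {{ hostname }}
{% for intf in interfaces -%}
interface {{ intf.name }}
 ip address {{ intf.ip }} {{ intf.mask }}
 no shutdown
!
{% endfor %}"""
)

# Example data of the kind that would come out of the source of truth.
device = {
    "hostname": "pop-demo-rtr-01",
    "interfaces": [
        {"name": "GigabitEthernet0/0/0", "ip": "203.0.113.1", "mask": "255.255.255.252"},
    ],
}

print(CISCO_TEMPLATE.render(**device))
```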
29:20
There was also a big telemetry piece centered around the Elastic stack; the main reason we went with that was a requirement to build something like a SOC, and Elastic did provide that. We used GitLab for source control and CI/CD, and HashiCorp Vault for secrets management. All of these are pretty tightly integrated together as a platform. Okay. So I see some familiar tools in here, of course. And is that OpenWRT I see? No OpenWRT. No, it was mostly Cisco, but not the Linksys variety. Yeah. All right. Cool. What next? A year and a half of work setting up and trying to pretty much standardize what we had to do across 35,000 sites. Michalis knows this
30:18
piece way better than I do, because at the time he was really hands-on in it too. It started with this big idea that we would separate the points of presence into t-shirt sizes, like small, medium, large, extra large, and that would be the only differentiator. Every t-shirt size would get the same equipment, the same circuits, the same configuration and so on. Obviously this is the fantasy world; it’s not realistic. And so we started making exceptions, until we came to a point where we had to somehow model
31:06
these exceptions and be able to do overrides at multiple levels, pretty much: the type of the organization, be it a municipality, a school, a university, a police station, you name it; the sizes, telephony sizes, which we apparently came to understand are different from the network sizes; and drilling down to the site level specifically. So just to show that, I will very quickly go through the idea, and that will also explain why we’re doing it this way. Due to contractual reasons, and I don’t have a better explanation
31:52
than that, this whole thing ended up being three systems. What we call InfoSoc, short for Information Society; that’s the authority that pretty much designed the whole thing, or at least played a very central role. There was another system doing hostmaster/IPAM tasks. And then the piece that we were mostly working on, CSMS, short for Centralized Security Management System. Each system provided information to the next one, so from the first we got sites, prefixes and inventory; from the hostmaster we
32:29
enriched that with VRFs and tenants; and in CSMS we generated the actual artifacts that would eventually lead to the configurations. The reason I’m saying all that is that, because we were receiving the sites, the inventory and the base prefixes, the hostmaster-level prefixes, from this hostmaster entity into CSMS, we had most of the information needed in order to do what we call data generation: from that minimum information we generated all the resources that are in this funny box here. So data sync plus the blueprints, which I will
33:12
show right now, led to data generation, the artifacts like devices, interfaces and so on in Nautobot; from there, populating Jinja templates in order to drive config generation; and then a really elaborate mechanism that Michalis and some of our colleagues worked on to do the ZTP. I will leave that aside for now and go to the actual blueprints. All right. So I hope the size is okay. This is the actual, well, please just say if I can zoom in. Of course. Yeah. Let me try that. I hope that’s
34:00
better. That’s better. Yes. Perfect. Okay. But yeah, no worries. It’s not something overly useful to read in detail. The idea here is that this is the actual repo of blueprints that we’re using in order to do data generation. The t-shirt sizes are under access class; this is the internal code we used for the size of the points of presence, and these were the t-shirt sizes that I mentioned before. I forgot ASYMMETRIC, which means very small points of presence that used asymmetric DSL, ADSL, and then we have small, medium, large. The idea here is
34:44
that we were defining, in YAML format, what was to be created. So for the very small ones, that’s one rack that is common for both wiring and networking, some VLANs, very simple structures, a device with its interfaces and so on. Whereas for the larger ones, we have separate racks, VLANs, devices with more interfaces and circuits and so on. And we have a lot of overrides. So for example, the asymmetric tenants are not one size after all; they have intricacies. The metropolitan area networks have a different way to connect;
35:42
there aren’t enough ports for them, and there are some others that do have switches, so we have extra VLANs, extra devices and so on. The idea is that overrides are merged as we go to the more specific levels, and eventually, with this structure, we’re able to do standardization and also go down to the site level. I’m doing spoilers now; this will be the demo that Michalis will be delivering. So that’s me. I had forgotten about that. But the idea is that, for this specific site, this is our
36:31
lab. So it has some special attributes we can override for the specific site: that it needs this and that interface, these ranges and so on. So yeah, that was the first piece of the implementation: synchronization with other systems, and the definition of what is to be created in Nautobot, what we call data generation, with these YAML structures. This is a living repo, and Nautobot would pull from it and run some elaborate jobs that we managed to create in order to do the data generation. I can talk for hours about that.
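To make the blueprint idea concrete, here is a rough sketch of how size-level defaults and site-level overrides could be deep-merged before data generation; the YAML keys, values, and the merge helper are invented for illustration, not the project’s actual blueprint logic:

```python
import yaml  # pip install pyyaml

# Invented "small t-shirt size" defaults.
SIZE_SMALL = yaml.safe_load("""
racks: 1
devices:
  - role: router
    interfaces: 4
vlans: [10, 20]
""")

# Invented site-specific override.
SITE_OVERRIDE = yaml.safe_load("""
devices:
  - role: router
    interfaces: 8        # this site needs more ports
vlans: [10, 20, 30]      # extra VLAN for a local requirement
""")

def deep_merge(base, override):
    """Recursively merge override onto base; on conflicts (including lists), override wins."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

print(deep_merge(SIZE_SMALL, SITE_OVERRIDE))
```

In the real setup the merge runs across more levels, as described above: tenant type, t-shirt size, and finally the individual site.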
37:19
But I really believe that the demo Michalis has prepared is way more interesting than my face spewing words. So, if I may, I will stop sharing, and Michalis, all yours. Okay, let me start sharing. All right, so we have the templates defined, right? So then Michalis takes those templates, loads them up in Nautobot, and then creates the configuration for every single site, the sites that we choose, pretty much. Again, the size of this thing is pretty huge, 35K, so there were no clean phases, like we
38:07
did this thing first for all of them and then the next one; everything was parallelized. So while we were getting information about new sites, there was a discovery with field engineers before they were onboarded, we were running data generation for the existing sites, and then these were being actually installed, ZTP’d and so on, and the pipeline ran again and again, tens of operators doing that to this day, if I’m not mistaken. And with that I will stop talking. Michalis, all yours. One thing to add on top of what
38:45
Nikos said: the network design was still in progress when we started the automation solution, so we had to be very flexible to integrate new requirements and new features, and this made the project environment more complicated, because we always had to keep changing our solution to fit the networking team’s requirements. It’s like building a car while you drive it, right? Something like that analogy. Yes. So yeah, this is a demo lab environment. As you can see, there are no devices
39:32
populated in our Nautobot instance. What we have here is the data that we have been syncing from the other systems, as Nikos described. So we have a location; this is our site, a demo site. The site has been populated and synced from another Nautobot instance that is the source of truth for location information. This site at the time had only circuits, so we have one circuit; we have prefixes, which again have been synced from the IPAM Nautobot instance, which is a
40:19
different Nautobot instance from the automation platform; and we also have information about the device in JSON format, like the device serial number. So based on this information, the location, the circuits, the prefixes, we’re going to run a job from our Nautobot instance, and this job is going to populate the device. So here, when we run the job, we expect to see a new device. What a Nautobot job actually is, is a Python script that you can write and run through Nautobot, and this job is going to create, based
41:07
on our design, the network device, and based on this network device populating Nautobot, we’re going to build the network configuration. So I’m going to generate the device object, and I will also create the DNS records, because that was a requirement: when we create a new device, we should automatically create the DNS record on the DNS server. So I will run the job. Michalis, one tiny warning: I guess your internet connection is not perfect, so there are some visual artifacts when you are scrolling very fast.
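For readers who have not written one, here is a heavily stripped-down sketch of what such a job can look like, written against the Nautobot 1.x jobs API; the model lookups, naming convention, and the cpe-router role are assumptions for illustration, not the project’s actual job:

```python
# Illustrative Nautobot 1.x job; lookups and names are assumptions, not project code.
from nautobot.dcim.models import Device, DeviceRole, DeviceType, Site
from nautobot.extras.jobs import Job, ObjectVar, StringVar
from nautobot.extras.models import Status


class GeneratePopDevice(Job):
    """Create the device object for a point of presence from already-synced data."""

    site = ObjectVar(model=Site)
    serial = StringVar(description="Serial number delivered with the router")

    class Meta:
        name = "Generate PoP device (illustrative)"

    def run(self, data, commit):
        device = Device.objects.create(
            name=f"{data['site'].slug}-rtr-01",
            site=data["site"],
            serial=data["serial"],
            device_type=DeviceType.objects.get(model="ISR1121"),
            device_role=DeviceRole.objects.get(slug="cpe-router"),
            status=Status.objects.get(slug="active"),
        )
        self.log_success(obj=device, message="Device object generated")
        # The real job goes on to create interfaces, assign IPs from the site's
        # prefixes, terminate circuits, and register DNS records.
```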
41:47
If you could scroll slowly. Okay. Sorry about that. Yeah, absolutely. No worries. It’s understandable, just when you scroll. Yep. Okay. So we run the object generation job, and the device has been populated. So here you can see the device. Okay. So it’s an ISR 1121. It shows up there. You have the serial number, the device type, the rack, the tenant, and you can also see the interfaces with the IPAM information based on the prefixes that we defined, and the circuit has also been
42:40
terminated on the correct interface, so we also have the circuit termination here. So what this job did is create the device and create the interfaces, and now we are able to render our Jinja template and produce the device configuration. Michalis, before you go into that, if you may, I know I said please don’t scroll, but okay, if you could go to that GigabitEthernet interface, that incidentally kind of demos the overrides that we were discussing. So if you could scroll a tiny bit further, up a bit, a bit more, yeah, so
43:24
that we see, there it is. That’s the result of the overrides that we saw in the YAML documents just before. So for this specific site, the data generation took the overrides into consideration and did a very silly configuration for that GigabitEthernet interface. Okay. So now I’m going to run the generate intended configuration job to render the Jinja templates. I will choose the device and run the job, and hopefully, through the device, we can see the intended config, the config based on our
44:05
design that should be on the device. So here let me refresh. So you’re creating all this before even the device is deployed to the site right? You just load it up in Nautobot. You have the serial number already added. You generate the configuration, the device is not even there, for example. It’s just being, you know, ordered from Cisco. And once it gets there, you start the ZTP process, I would imagine. Yeah, exactly. We’ll demo that. Tiny note on what you’re asking. It was a really big thing
44:41
that we did not want to go through a staging phase, because we did the calculations, and the logistics of storing, staging, shipping and doing the ZTP process for 35,000 devices were a nightmare. So instead we had to make this whole process that Michalis demos very independent, let’s say, from the physical aspect of the device. Pretty much the story is that from the Cisco factory the devices were delivered to the point of presence, and they were ZTP’d there, as Michalis will show directly. So yeah. So yeah, actually this
45:28
is the device configuration; this is actually the rendered config. So, one priority requirement was to eliminate the need for manual device staging, so we needed to develop a zero-touch provisioning solution that would allow field engineers to automatically get the device configuration from Nautobot. What we essentially need to do is get the device configuration based on the serial number. Mhm. So I will show you this API request; this is an API request to Nautobot, and our input is the serial number of the
46:09
device, and if we send this API request we are going to get the full device configuration. Okay, so this Nautobot API call can provide the device configuration, but the requirement was to get this device configuration directly from the device, and you cannot do a curl like this from the device. So what we need to do is use the copy https command; the requirement is to use a command like copy https ... running-config. So a field engineer, with a single command, can
47:01
get the device configuration through Nautobot. So what we did is we created this URI. I’m going to copy the configuration to a test.cfg file, just to not affect the device configuration. What this URI does is encode the token and the serial number of the device. This gets interpreted by an intermediate reverse proxy, which transforms it into a Nautobot API request. So this copy HTTPS command will bring the device configuration down to the device. It will take a couple of seconds, because it triggers the Nautobot jobs to run, produce the configuration and render the templates, and then sends them over HTTPS directly to the device.
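Conceptually, the component in the middle might look something like the following Flask sketch; the URL layout, hostname, and the stubbed intended-config call are assumptions for illustration, not the project’s actual proxy or endpoints:

```python
# Conceptual reverse-proxy sketch: the router issues a plain "copy https:" request,
# the proxy extracts a token and serial from the URL, looks the device up in
# Nautobot, and returns the intended configuration as the HTTP body.
import requests
from flask import Flask, Response

app = Flask(__name__)
NAUTOBOT_URL = "https://nautobot.example.internal"  # placeholder


def fetch_intended_config(device_id: str, headers: dict) -> str:
    """Stand-in for the call that triggers intended-config rendering.

    In the real platform this is where the Nautobot jobs run and the Jinja
    templates are rendered; here we return a placeholder so the sketch is
    self-contained.
    """
    return f"! intended configuration for device {device_id}\n"


@app.route("/ztp/<token>/<serial>/running-config")
def ztp_config(token: str, serial: str):
    headers = {"Authorization": f"Token {token}"}
    # Look the device up by serial number through the standard REST API.
    lookup = requests.get(
        f"{NAUTOBOT_URL}/api/dcim/devices/",
        params={"serial": serial},
        headers=headers,
        timeout=30,
    )
    lookup.raise_for_status()
    results = lookup.json()["results"]
    if not results:
        return Response("Unknown serial", status=404)
    return Response(fetch_intended_config(results[0]["id"], headers),
                    mimetype="text/plain")
```

A router would then pull its configuration with something along the lines of `copy https://<proxy>/ztp/<token>/<serial>/running-config test.cfg`; the exact URI layout used in the project differs.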
47:48
So with one command you can get the full device configuration, the intended configuration produced from our design, onto the device. I don’t know if you want to ask something about that. Yeah, I have a question about the previous step: when you instantiated the device in Nautobot, where did you get the IP addresses from? Is that part of the Nautobot job? Is there a
48:26
database in the back end? Like, what’s going on there? How did you populate those, you know, IPv4, IPv6, all those VLANs? Blood, sweat, and tears. [Laughter] Yeah, it’s automatically generated. Yeah, please. Sorry. Yeah. No, no, I showed you the prefixes there. So each prefix has a role. If I go to prefixes, we have those prefixes, and each prefix has a role: this is a WAN prefix, this is a data prefix, this is a loopback prefix. So based on the prefixes assigned to the specific site we can
49:11
produce the required IP addresses. I see. I see. And we get those prefixes from the IPAM Nautobot instance. And just to conclude with the ZTP: this command, the copy HTTPS command, has finished, so I can show you that the test.cfg file has the running configuration of the device. But yeah, as for your question around IP addresses, not only the IP addresses but also circuit information, a lot of credentials and stuff like that, there is a lot of logic hidden in
49:57
the blueprints that we tried to show before. So in the most complex situation, a really big organization would get a /21 or something, and that’s the only information that we have, along with the serial number of the device. From that we can extrapolate the devices, the interfaces, the prefixes from the /21 down to the /24s, /29s and so on, what has to be assigned to the circuit interfaces, credentials. Anyway, this job that Michalis clicks a button on and runs in five seconds took us weeks to build, and months even.
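The "extrapolation" Nikos mentions is easy to picture with Python's ipaddress module; this is only a toy illustration of carving a site's /21 into smaller role-based prefixes, not the blueprint logic itself:

```python
import ipaddress

# Toy illustration: a big organization is allotted a /21, and smaller role-based
# prefixes are carved out of it. Roles and sizes are invented; the real rules
# live in the YAML blueprints.
site_block = ipaddress.ip_network("10.20.0.0/21")

subnets = list(site_block.subnets(new_prefix=24))     # eight /24s inside the /21
lan_prefixes = subnets[:-1]                           # seven /24s for user LANs
infra_block = subnets[-1]                             # last /24 reserved for infrastructure
wan_links = list(infra_block.subnets(new_prefix=29))  # /29s for circuit interfaces

print(lan_prefixes[0], wan_links[0])
```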
50:42
Yeah. And it was in progress for many more months after that, because the changes kept coming here and there. So yeah. So we have the intended configuration. Now we’re going to get the backup configuration, which is actually the current configuration of the device. This job logs in to the device and gets the actual configuration. So one job renders our Jinja templates to produce the intended config; this job gets the actual config from the device, which is the backup configuration. And then we need to perform configuration
51:23
compliance. So we need to check the actual device configuration against the intended and see how compliant we are with our design. This is the configuration compliance job. Michalis and I may have done this so many hundreds of times that, yeah, the story here is that we did automate a huge part of that. When we’re running the ZTP, the state of the device is 100% what we intended it to be. But one of the main challenges of automating such a large network is that you need a lot of engineers to
52:05
operate it and not a lot of them are on board with automation. So it’s really really common and a lot of times really necessary that they will jump in, nighttime, after hours, you name it and do changes right and they do manual changes because at that time you do what is needed to bring the site up. This is where the configuration compliance comes into play. Okay. Exactly. And here you can see the compliance report. Here are all the features that we have defined. With green you can see the compliant
52:41
features. With red you can see the non-compliant features, and in the non-compliant ones you can see what configuration is missing, what the remediation is, and the remediating configuration, so what we should configure in order to bring the device configuration to the intended state. For example, if we look at some access list, we will see that this configuration is missing and this configuration is extra, so it is on the device but it’s not in our Jinja templates.
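The compliance idea itself is easy to sketch: compare the intended and actual configuration lines per feature and report what is missing or extra. The snippet below is a bare-bones illustration using set differences, not the Golden Config plugin's actual implementation:

```python
# Per-feature compliance illustration; the sample config lines are invented.
intended_logging = {
    "logging buffered 64000",
    "logging history size 20",
    "logging host 192.0.2.10",
}
actual_logging = {
    "logging buffered 64000",
    "logging history size 40",   # someone changed this by hand
    "logging host 192.0.2.10",
}

missing = intended_logging - actual_logging   # must be (re)applied
extra = actual_logging - intended_logging     # present on the box, not in the design

compliant = not missing and not extra
print(f"compliant={compliant}")
print("remediation:", sorted(missing))
print("extra config:", sorted(extra))
```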
53:16
In order to remediate the configuration, we need to apply this command. So if I can show you a very simple example: if we check logging, for example, logging at this time is compliant, because the intended config is the same as the actual. So if we go to the device, and, sorry, let me open the correct terminal... And do you run these compliance jobs on a daily basis, like on a schedule, just to make sure the device is okay? It’s a scheduled job that runs every night. Yeah. So if I go and change the configuration,
54:04
say I put 40 here, this is going to make our logging feature non-compliant. So I will go and run everything again: the backup, intended and compliance jobs. Let me find this one. Since I changed the device configuration, we expect the logging feature to be non-compliant. This job is going to get the configuration again and produce the compliance report. So this is an incredible amount of work, just to build everything on top of Nautobot and have all the jobs, all the ZTP, the compliance,
55:02
and you said it’s all open source. So can people find these tasks and these jobs somewhere? Can they have their own instances of Nautobot and kind of build on and use what you guys have done here? If I’m not mistaken, Michalis, we have only used open-source plugins and everything. The actual implementation, we have not publicized it; there is a lot of project-specific logic in it. So I wonder whether it would make sense to make it public for other people to get. But at least a couple of jobs or
55:44
tasks, right? Like, okay, the ZTP piece or the config compliance, kind of sanitized, removing any, you know, credentials, stuff like that, but, hey, this is the Python task for Nautobot to do config compliance, you just run it on a schedule, it connects to the device, gets the config, does a diff. Yep. And it shows you. I think that’d be super helpful. Well, incidentally, all the jobs that you saw, intended, backups, compliance, also the candidate configuration with secrets and everything, these come
56:22
out of the box with the Nautobot Golden Config plugin. I feel like an advertiser, sorry. But that was actually the reason that we went with Network to Code in the first place, to be totally honest, because they had the mechanisms to make all that available almost out of the box. So we started with the big advantage that half of our work was already there for us to implement. The ZTP, that’s something Michalis implemented with colleagues of ours; the data generation, that’s 5,000 lines of code plus the blueprints. Anyway, so yeah, there are
57:04
pieces that we should consider open sourcing. That’s actually a really good remark. So, as you can see, the intended configuration is different from the actual, because the history size is 40 instead of 20, and we can see that this is the remediating configuration: we should bring the logging history size back to 20, because this is our design, and we can apply this to our device. So we go to config plans and add a config plan; this is the remediation config plan. Let’s put a random change control ID.
57:53
The feature is logging, and we pick the device. This is going to produce the configuration plan. So here we have a configuration plan; we can see the config set, the actual commands that we will send to the device. We need to open this config plan and change its status to approved so that we can proceed and deploy the configuration to the network. And now we can deploy it. This will spin up a new Nautobot job that will connect through Nornir to our network device and change the
58:46
device configuration based on the remediated config. So again we go to the device to see if it is compliant now. No, not yet, because I need to run the jobs again to get the actual config from the device. So I need to go again and run the get-backup job, because I need to get the actual configuration of the device again so that Nautobot is aware of this change, and then this is going to be compliant. You’re doing that like in the movies, where the bomb timer ticks down to two seconds. It’s exactly
59:36
top of the hour and you will see that we comply, but yeah, there you go. Okay, let’s see if logging got compliant. Yeah, so it’s 20 again. So we make our device compliant with our design with just two clicks. And this is, that’s our network configuration management solution: we generate the config through the Jinja templates, we get the config from the device, we compare them and produce the compliance report, and then we can remediate the device configuration. And I really hope it did
01:00:18
not go unnoticed that we did not show the Jinja templates, because this is where we’re hiding the dirt. But yeah. They are huge. They’re huge. Yeah. I mean, there’s a lot of hidden stuff, right, to make this so easy and seamless, just having everything managed through Nautobot. There’s a lot of heavy lifting happening in the back, I’m sure, because it’s so smooth and the experience is nice and everything makes sense. So there’s a place where these things happen in the back. So then let me
01:00:50
ask you guys, what’s next for this, right? So you’ve got the network, you’ve got the automation in place; what’s coming up next for this project? Version three, I guess, in some years. Right now the main guy running the automation piece is our friend Yordanis. We would have invited him here, but he’s vacationing in Vienna, so, I guess, good for him. There are a few tens of operators running the platform every day, and it’s still in the deployment phase; not all the sites are yet in production. So more
01:01:35
and more are being done. Thank you, Panagiotis. We were involved in this project up to a month ago at Network to Code; we’re still supporting the whole thing. We were delivering new automations; we had created a really cool single-site synchronization that had been a big ask for a year or two. Now there are a few other workflows that the operators are asking for, and these are being handled right now by Yordanis. He is pretty much in the shoes of Michalis, since Michalis joined Network to Code. And I believe this is the current
01:02:23
state of the project. Also, there are lots of other features that we have implemented besides configuration management: we implemented firewall rule management, automatic OS upgrades, device lifecycle, where we get the CVEs for each software version that we have in the network automatically from the Cisco portal, and VPN management, so we can create VPNs on demand. But we don’t have the time to show you all this functionality. There are lots of other things that we have implemented,
01:03:02
but we don’t have time; maybe another time we can show you. No, I mean, yes, I’d love to have you guys back actually to go into a bit more detail if you have time, and also, if we can publish something, to make something on GitHub public for folks. I mean, Yiannis was suggesting here the process to trigger the ZTP. Yeah, it would be interesting maybe to publicize that. So if we can get, you know, a GitHub repo, a public one, with just a bit of the story, like exactly what we’ve
01:03:36
talked about today, and then a couple of jobs for Nautobot, like the ZTP, and even the config drift or Golden Config piece, if it’s built into Nautobot, just to mention it, I think that’d be fantastic, because we can then point people to it: you have the video, here is how we showed what you folks have done, and if you want to do the ZTP, it’s right there. You know how it goes. I mean, we’re doing that every other day now that we’re at Network to Code, and we tend to forget sometimes how magical it can feel. It’s a given for us.
01:04:15
I mean, if someone asks me about, you know, configuration management, I would say, yeah, Golden Config does it, and I don’t even think about the effort that has gone into that. Thank you, Ken. That’s my director; he’s the creator of Golden Config. But yeah. All right. All right. Well, we’re a bit over time here. Thank you so much, you guys, for this. Any final words before we wrap up? I have mentioned all the people that I had written down on my little piece of paper, so I’m good. Thank you very, very much, Adrian, for this, though. It was a really nice
01:04:50
experience. I really appreciate it. No, definitely, really, really happy that you guys made it. Like I said, let’s make sure we bring you back, but next time let’s have a public GitHub repo with at least some of the jobs to show folks. I think that’d be fantastic. All right. Perfect. Makes sense. Well, thank you all for watching. See you next week; we’re going to have another guest, and we’re going to talk NSO automation, creating services with NSO, all that config management. So see you then. But in
01:05:28
the meantime, big thanks, Milaitis, Nikos. Thank you again so much. Thanks everyone for watching this. See you guys on the next one. Take care, folks. Bye. Bye. Thank you very much. Take care. Bye. Bye. Bye everyone.
