Network automation has become prevalent in the network industry over the last few years, and yet we have little data on the state of the market today. There is a lot of discussion about Ansible and Python, but beyond that there is no good source for those seeking to understand which tools different companies use, which operations people automate the most and the least, or even how long it takes on average to learn network automation.
The NetDevOps Survey project was started in 2016 to address these questions and more by surveying the network automation industry. Because network automation is deeply rooted in open source, we decided to make the project open and collaborative, following the best practices of open source projects. The survey was intended to be both anonymous and vendor-neutral.
When the initiative started in 2016, 20 of us came together to define the first set of questions. At the time, I was working at Juniper and Jason Edelman was in the early days of Network to Code, but we worked together collaboratively on the project.
After a few years of inactivity, the second edition of the survey was released in October 2019. This is in large part thanks to Francois Caen, who pushed to bring it back and helped organize this new edition.
As we worked on updating the survey for the 2019 edition, we tried to reuse the same questions as much as possible so that we could compare how the responses evolved over time. We also added a completely new section to understand how organizations and individuals are transitioning into network automation. This section was suggested by the community and was a welcome addition; the insights we are getting from it have been very interesting.
Participants in the 2019 NetDevOps Survey
The 2019 edition resulted in 300 responses, which is about the same number as the first edition in 2016.
The first set of questions was designed to give a better understanding of the type of networks and environments the participants come from.
Looking at the graphs below, there is a good distribution of participants both in terms of environment types and network sizes, with an average of around 1,000 devices. It’s interesting to note that while most of the coverage of network automation (blogs, podcasts, press) focuses on data centers, 60% of the participants in this survey also manage Campus and/or WAN networks. The data center is still the environment mentioned by most participants (~75%), although this number has declined slightly since 2016, when data centers were mentioned by 80% of participants. These numbers are in line with the migration to the cloud that we here at Network to Code have observed with our customers.
State of Network Automation Through Automation
The main section of the survey is meant to understand which day-to-day operations are currently automated and which tools are used for each use case. We put together a list of 13 of the most common operations spanning topics such as configuration management, troubleshooting, and software upgrades.
While the 3 main operations automated today are focused on configuration management, it is interesting to see a significant increase in compliance checks and pre/post-change checks. At the bottom of the graph, we are also seeing a noticeable increase in responses for troubleshooting and software qualification.
Configuration Management
If we look specifically at configuration management, it’s interesting to see that 60% of the participants are using Ansible and roughly the same percentage are also using some scripts at different levels of abstraction. Nornir and Saltstack are both used by ~10% of the participants, an impressive achievement for these 2 open source projects that have been mainly driven/promoted by the community. Kudos to David Barroso, Mircea Ulinic, Kirk Byers, and Dmitry Figol.
Note: the graph is a little misleading because we split scripts into 2 categories this year; if we add responses #2 and #3 together, we are close to 60%.
Interestingly, participants selected more than 2 responses to this question on average, which means that many participants are using multiple solutions to generate and deploy configuration. This got me curious, so I decided to dive deeper into the responses to understand which tools people most often use in addition to Ansible.
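As a rough illustration of the arithmetic behind that statement, the sketch below computes the mean number of options selected per participant on a multi-select question, plus the share of participants who picked at least two. It assumes each answer is stored as a Python set of option labels; the sample responses are invented and are not survey data.

```python
from statistics import mean

# Invented sample: each set is one participant's selections for a
# multi-select question such as "How do you generate and deploy configuration?"
responses = [
    {"Ansible", "Python scripts"},
    {"Ansible", "Nornir", "Python scripts"},
    {"SaltStack"},
    {"Ansible", "Vendor tool", "Python scripts"},
]


def average_selections(answers):
    """Mean number of options selected per participant."""
    return mean([len(a) for a in answers])


def share_with_multiple(answers, minimum=2):
    """Fraction of participants who selected at least `minimum` options."""
    return sum(len(a) >= minimum for a in answers) / len(answers)


print(average_selections(responses))   # 2.25
print(share_with_multiple(responses))  # 0.75
```

An average above 2 does not by itself say how many people use several tools, since a few heavy users can pull the mean up; that is why the second number is worth looking at alongside it.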
In the graph below, I narrowed down the responses to only the participants who selected Ansible. It is interesting to note that 12% of them are also using Nornir and more than 60% are using some scripts in addition to Ansible. There is not enough information to truly explain the reasoning behind these results, but I think it would be interesting to investigate further in the next edition.
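Narrowing the responses down to one group, as done for Ansible users above, boils down to filtering and then counting co-occurrences. A minimal sketch, again assuming each participant's answer is a set of option labels and using invented sample data rather than actual survey responses:

```python
from collections import Counter

# Invented sample data, not actual survey responses.
responses = [
    {"Ansible", "Python scripts"},
    {"Ansible", "Nornir", "Python scripts"},
    {"SaltStack", "Python scripts"},
    {"Ansible", "Python scripts"},
    {"Ansible"},
]


def co_usage(answers, anchor):
    """Among participants who selected `anchor`, return the fraction
    that also selected each other option (highest share first)."""
    subset = [a for a in answers if anchor in a]
    if not subset:
        return {}
    counts = Counter(opt for a in subset for opt in a if opt != anchor)
    return {opt: n / len(subset) for opt, n in counts.most_common()}


print(co_usage(responses, "Ansible"))  # {'Python scripts': 0.75, 'Nornir': 0.25}
```

Note that the denominator is the number of participants who selected the anchor tool, not the whole survey population, which is what makes "12% of Ansible users" a different number from "12% of all participants".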
As a side note, there are a lot of interesting analytics that haven’t been done yet on the data, such as diving deeper into each response or exploring how certain groups of participants respond to specific questions. If you are interested in doing some analysis on your own, the database and some tools are available on GitHub.
Maturity level / Automated Changes
At Network to Code, we often refer to network automation as a journey, one that takes a couple of years on average. As part of the survey, I was personally interested in understanding the current level of maturity of our industry. How fast or how slowly is the market evolving? In the graph below, we can see that 37% of the participants have been leveraging automation in a significant way for less than 1 year and another 29% for 1 to 2 years. These numbers will be interesting to monitor year over year.
Another way to measure the level of maturity is to look at whether manual and automated changes coexist within an organization. Usually, in the most advanced environments, manual changes are completely forbidden. Two questions in the survey give some good insight on this topic:
- Do you allow configuration to be manually changed in the CLI in addition to automated deployment?
- Have you automated the decision to deploy a new configuration?
To the first question, 14.5% of the participants indicated that they don’t allow manual changes in addition to automated deployment. This marks a significant increase from 2016, when only 8.8% of the participants responded “No”. To the second question, 46% of the participants indicated that they have fully or partially automated the decision to deploy a new configuration.
Anomaly Detection / Telemetry & Analytics
There has been an increase in conversations and projects surrounding telemetry and analytics in the last few years. Many of my friends and colleagues working for webscale companies have reported using or building new telemetry and analytics stacks that are becoming an integral part of automation platforms.
Interestingly, the two questions related to anomaly detection and telemetry/analytics paint a different picture. The majority of participants are still leveraging traditional monitoring solutions based on SNMP/Syslog and relying mostly on Up/Down signals to detect issues in the network. Only 40% of the participants leverage flow data, and only 10% use end-to-end probes.
My personal takeaway is that today, telemetry and analytics are where network automation was 3-4 years ago, with a significant disconnect between the most advanced companies and traditional enterprises.
A few years ago, network automation was not even a topic for most enterprise engineers, while a handful of companies were already all-in. At the pace at which the industry is moving these days, I think telemetry and analytics will make some progress in the enterprise space in the next couple of years.
Transition to Network Automation
As mentioned earlier, based on the input of the community, we added a new section to understand how both organizations and individuals are transitioning to network automation: how long it is taking, what strategies they are adopting, and more.
Team / Org
The results of the question “What actions did your team take to transition to network automation?” show that most enterprises don’t have a concrete strategy and are relying on their existing staff to learn on their own or are just sending them to training. Less than 20% of the participants mentioned hiring a dedicated resource for network automation, and less than 10% mentioned working with a consulting firm to help them in their automation journey.
Individual
As individuals, most participants (81%) estimated that it took them less than 1,000 hours to learn network automation, and 25% even estimated less than 200 hours. The majority of participants had to invest some personal time to learn new skills, while 40% were able to learn on the job, either part-time or full-time.
Overall, 34% of the participants mentioned that it took them less than 1 year to make the transition, and another 45% estimated the transition at 1 to 2 years.
Industry Trends
The last section of the survey focuses on trends. What topics and tools are, or are not, top of mind right now? For this section, we selected a dozen tools and another dozen topics. For each of them we asked the participants if they are:
- Already using them in production (dark green)
- Currently evaluating them (green)
- Thinking about it (light green)
- Not interested (grey)
- No idea (orange)
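To make the scale concrete, here is a minimal sketch of how per-tool answers could be tallied into the five categories above. The `Stage` enum and the sample answers are hypothetical, invented for illustration; they do not reflect the project’s actual data model or any real result.

```python
from collections import Counter
from enum import Enum


class Stage(Enum):
    """Hypothetical encoding of the five answer categories."""
    PRODUCTION = "Already using in production"
    EVALUATING = "Currently evaluating"
    THINKING = "Thinking about it"
    NOT_INTERESTED = "Not interested"
    NO_IDEA = "No idea"


def stage_distribution(answers):
    """Percentage of participants per stage, rounded to one decimal.
    Stages nobody selected are reported as 0.0."""
    counts = Counter(answers)
    total = len(answers)
    return {stage: round(100 * counts[stage] / total, 1) for stage in Stage}


# Invented answers from 8 participants for one hypothetical tool.
sample = [
    Stage.PRODUCTION, Stage.PRODUCTION, Stage.EVALUATING, Stage.THINKING,
    Stage.NOT_INTERESTED, Stage.PRODUCTION, Stage.NO_IDEA, Stage.THINKING,
]

for stage, pct in stage_distribution(sample).items():
    print(f"{stage.value}: {pct}%")
```

Iterating over the enum (rather than over the counter) guarantees every category appears in the output in a fixed order, which is what a stacked bar chart like the ones in this post needs.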
There is a lot of information in the graphs below, so it is hard to cover everything, but my personal takeaways are:
- 35% of the participants are already using a Source of Truth (SoT) in production and another 50% are either evaluating one or thinking about it. In our experience at Network to Code, a SoT (or SoT strategy) is a critical component of a network automation strategy, and it often seems like the topic is not getting enough attention. It’s very encouraging to see such a high level of interest in this topic.
- The level of adoption for ChatOps is still relatively low, with only ~15% of the participants using it in production and almost 30% of the participants expressing no interest. At NTC, we are seeing a lot of interesting use cases that can be solved with ChatOps, and we expect this technology to be adopted more broadly in the future.
- DevOps, Infrastructure as Code (IaC), and CI/CD are getting a lot of interest and are getting used in production more and more.
On the tools side, there is even more going on. My personal takeaways are:
- Git and Ansible are used in production at massive scale: both solutions are used in production by ~70% of the participants.
- Modern monitoring tools like ELK, Grafana, Prometheus & Influx are used in production by more than 30% of the participants. These numbers are encouraging but don’t necessarily align with the previous responses to the anomaly detection questions. This could be explained if both new and legacy solutions are coexisting right now and the new solutions are still mostly used for visibility but are not used yet for alerting.
- Nornir and Network Verification Software (Batfish, Forward Networks, etc.) show a disproportionate ratio between production deployments and the number of participants evaluating or considering them. These two technologies will be interesting to monitor in the upcoming months and years.
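One way to put a number on that ratio is to divide the count of production deployments by the count of participants who are evaluating or thinking about a tool. This is a hypothetical metric chosen for illustration, not one defined by the survey, and the tool names and counts below are invented:

```python
import math


def production_ratio(counts):
    """Production deployments per participant evaluating or considering
    the tool. Returns math.inf when nobody is evaluating or considering it."""
    pipeline = counts.get("evaluating", 0) + counts.get("thinking", 0)
    if pipeline == 0:
        return math.inf
    return counts.get("production", 0) / pipeline


# Invented counts for two hypothetical tools (not survey results).
tools = {
    "tool_a": {"production": 30, "evaluating": 40, "thinking": 50},
    "tool_b": {"production": 20, "evaluating": 5, "thinking": 5},
}

for name, counts in tools.items():
    print(name, round(production_ratio(counts), 2))
# tool_a 0.33
# tool_b 2.0
```

A ratio far above or below the other tools in the same chart is what would make a technology stand out as worth monitoring.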
Evolution over Time
Another interesting way to look at these results is to examine the evolution of the responses between 2016 and 2019. I selected a few below that I found the most interesting/surprising.
Looking at Git and Ansible, it’s interesting to see that for both technologies the level of interest was already very high in 2016, but production deployments were significantly lower. Both have gained significant market share in the last few years.
On the other hand, solutions like Chef and Puppet have followed the opposite trajectory, with a significant decrease in interest and in production deployments among participants over the last three years.
The results surrounding event-driven automation (EDA) are surprising because, while the level of interest was already very high in 2016, the number of production deployments did not increase significantly between 2016 and 2019. One explanation could be that EDA requires a higher level of maturity and expertise to be properly deployed in production. Based on the previous results, with 2/3 of the participants using automation for less than 2 years, it’s likely that the market has not reached this level of maturity yet.
Last but not least, it’s interesting to visualize the progression of Infrastructure as Code, CI/CD, and NAPALM over the last few years. Increased interest in these topics confirms what we are witnessing every day with our customers.
NetDevOps Survey
If you’re interested in learning more about the NetDevOps Survey project, you can find the project on GitHub or join the conversation in the #netdevops_survey channel of the Network to Code Slack.
All the results are available on GitHub in different formats:
- Raw TSV files
- SQLite Database with a Python library to query it
- 150+ Graphs similar to the one used in this blog
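For anyone who wants to start from the raw TSV files rather than the provided tooling, the standard library is enough to get going. The sketch below assumes a hypothetical long-format layout (one row per participant, question, and answer) and a hypothetical `responses` table; the actual files and database in the repository may be organized differently, and this does not use the project’s own Python library.

```python
import csv
import io
import sqlite3

# Hypothetical long-format TSV: one row per (participant, question, answer).
# The real files in the project repository may use a different layout.
SAMPLE_TSV = """participant_id\tquestion\tanswer
1\tconfig_tools\tAnsible
1\tconfig_tools\tNornir
2\tconfig_tools\tAnsible
3\tconfig_tools\tSaltStack
"""


def load_tsv(text, conn):
    """Load long-format TSV rows into a `responses` table."""
    conn.execute(
        "CREATE TABLE responses (participant_id INTEGER, question TEXT, answer TEXT)"
    )
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    conn.executemany(
        "INSERT INTO responses VALUES (:participant_id, :question, :answer)",
        reader,
    )


def answer_share(conn, question):
    """Share of participants who answered `question` that picked each answer,
    highest share first (ties broken alphabetically)."""
    return conn.execute(
        """
        SELECT answer,
               COUNT(DISTINCT participant_id) * 1.0 /
               (SELECT COUNT(DISTINCT participant_id)
                  FROM responses WHERE question = ?)
          FROM responses
         WHERE question = ?
         GROUP BY answer
         ORDER BY 2 DESC, answer
        """,
        (question, question),
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_tsv(SAMPLE_TSV, conn)
for answer, share in answer_share(conn, "config_tools"):
    print(f"{answer}: {share:.0%}")
# Ansible: 67%
# Nornir: 33%
# SaltStack: 33%
```

Counting distinct participants (rather than rows) keeps the denominator correct for multi-select questions, where one participant contributes several rows.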
The plan is to start working on the 2020 Edition around August 2020 to have it ready to accept responses by October 2020.
How to help
If you’re interested in helping with the project or providing feedback, the best way to reach us is to open an issue on GitHub or join us in Slack.
At this point one of our biggest concerns is increasing the visibility of the project. The more participants we can get for the next edition, the deeper the insights and the better the project. Being community driven, we’ve been lacking marketing support to reach a broader and more diverse audience. Anything you can do to help here would go a long way.
Conclusion
Thanks for reading all the way to the end and for your interest in this project. If you are interested in diving deeper, the complete results of the 2019 Edition are available online.
I am personally looking forward to reading more analysis and hearing more perspectives on these results. I’m also looking forward to the next edition.
-Damien (@damgarros)