This is part of a series of posts to understand Network Automation Principles.
Every engineer sets out to build “the right system” the first time, but is that even achievable? Do you even know your requirements? Do you really know your requirements? No, seriously, do you really really know what your requirements are? Enter YAGNI and Premature Optimization.
YAGNI: “You Ain’t Gonna Need It” is a principle of extreme programming that states a programmer should not add functionality until deemed necessary, and “do the simplest thing that could possibly work.” (DTSTTCPW) – Wikipedia
The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming. – Donald Knuth, “Computer Programming as an Art”
Knuth said this in 1974, and it still holds true today.
A year ago I was having a design conversation with a colleague, who reminded me that Big-O notation does not consider constants.
Big-O notation does not care about constants because Big-O notation only describes the long-term growth rate of functions, rather than their absolute magnitudes.
This is a good rule to consider when given two nearly equally weighted choices, such as a hash/dictionary lookup vs. a list lookup, which compares O(1) with O(n) and where both options are well understood and easily accomplished. However, introducing optimizations such as threading or multiprocessing before you need them inherently makes the code more difficult to support. It greatly increases the design considerations, the likelihood of race conditions, and the difficulty of troubleshooting.
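As a minimal sketch of the dictionary vs. list comparison, assuming a hypothetical inventory (the hostnames, addresses, and function names are made up for illustration), both versions are about equally easy to write and to support:

```python
# Hypothetical inventory; hostnames and addresses are made up for illustration.
devices = [
    {"hostname": "nyc-rt01", "mgmt_ip": "10.0.0.1"},
    {"hostname": "nyc-sw01", "mgmt_ip": "10.0.0.2"},
    {"hostname": "lon-rt01", "mgmt_ip": "10.1.0.1"},
]


def find_in_list(hostname):
    """O(n): scan the list until a hostname matches."""
    for device in devices:
        if device["hostname"] == hostname:
            return device
    return None


# Build the index once; each later lookup is O(1) on average.
devices_by_hostname = {}
for device in devices:
    devices_by_hostname[device["hostname"]] = device


def find_in_dict(hostname):
    """Average-case constant-time lookup through the dictionary index."""
    return devices_by_hostname.get(hostname)


print(find_in_list("lon-rt01") == find_in_dict("lon-rt01"))
```

Because neither choice adds meaningful complexity, picking the dictionary costs nothing, which is not the case for threading or multiprocessing.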
While it is understood why constants do not matter in Big-O notation, in practical application they certainly matter. Take the worst time complexity, O(n!) (factorial time), in a case where “n” never grows above 2. Additionally, optimizing a process that takes only 0.1% of your real-world time adds little value. However, it is hard to see where optimizations will be needed until you get there. Finally, optimizing your process may not lead to any additional savings, such as when the actual bottleneck is waiting for a human approval.
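The arithmetic behind the 0.1% point can be sketched as follows. The formula is the standard Amdahl's law relation, which the post itself does not name, and the function name and the sample fractions are assumptions chosen for illustration:

```python
def overall_speedup(fraction, local_speedup):
    """Overall speedup when `fraction` of the run time becomes `local_speedup` times faster."""
    return 1 / ((1 - fraction) + fraction / local_speedup)


# Making a step that is 0.1% of the run time 100x faster barely moves the total.
small_step = overall_speedup(fraction=0.001, local_speedup=100)

# Making a step that is 80% of the run time only 2x faster helps far more.
large_step = overall_speedup(fraction=0.8, local_speedup=2)

print(f"0.1% of run time, 100x faster: {small_step:.4f}x overall")
print(f"80% of run time, 2x faster: {large_step:.4f}x overall")
```

The same relation explains the human-approval case: if most of the elapsed time is spent waiting on a person, no amount of code optimization changes the total much.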
Another small example is using Python constructs such as lambdas and list comprehensions over simple for loops. Especially in development teams where programming has not been the primary focus, the added complexity generally exceeds the efficiency gained in building network automation solutions.
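To make the comparison concrete, here is a small sketch with hypothetical interface data (the names and values are invented). All three forms produce the same list; they differ only in how much the reader has to unpack:

```python
# Hypothetical interface data, made up for illustration.
interfaces = [
    {"name": "Ethernet1", "enabled": True},
    {"name": "Ethernet2", "enabled": False},
    {"name": "Ethernet3", "enabled": True},
]

# Simple for loop: longer, but every step is explicit.
enabled_loop = []
for interface in interfaces:
    if interface["enabled"]:
        enabled_loop.append(interface["name"])

# List comprehension: the same result in one line.
enabled_comprehension = [i["name"] for i in interfaces if i["enabled"]]

# lambda with filter/map: the same result again, and the densest to read.
enabled_lambda = list(
    map(lambda i: i["name"], filter(lambda i: i["enabled"], interfaces))
)

print(enabled_loop == enabled_comprehension == enabled_lambda)
```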
Mistakes, I’ve made a few, and I will likely continue to make a few more. I have gone down rabbit holes of introducing non-functional requirements (such as authentication and logging) too early, adding features before they were needed, and making software “frameworky.” I’ve been happy that I was able to use the optimized code, only to never see the software leave the proof-of-concept phase. Said another way, I spent a lot of time on code that was never used in any meaningful way, based on self-imposed requirements and optimizations.
In network automation, speed is something you will hear about often. In fact, it is probably the thing I hear most from my peers in the industry, and these are people that I deem intelligent and respect. However, in the use cases I am dealing with on a daily basis, it is rarely a major factor. Most use cases fall into a few primary categories.
These examples are in my experience the most common in the enterprise space and have little time sensitivity, where a connection taking milliseconds or minutes will have little effect on the overall product.
I have been working with a customer on a daily process to run backups and configuration compliance for 8000+ devices. Several in the community have clearly documented that Ansible is slower than, say, Nornir [1][2]. However, the places where optimizations would provide the most benefit were not the places you would expect, and the solutions were non-obvious as well.
The requirements as they relate to this blog post include:
The initial roll-out of this process was a success, but as devices and features were added, the process took longer and longer. A quick analysis showed that backing up configurations and generating configurations were taking too long.
One issue encountered over the years was overloading the TACACS server with too many requests at once, making it impossible to scale too wide too quickly without additional infrastructure changes. For this reason, a multithreading solution was not a good fit.
While there were clear issues with scaling too wide too fast, that did not mean we could not make use of scaling wider. The resource utilization that comes with managing a large number of devices in Ansible was slowing the process down. The reasons for the inefficiencies are less important, as the solution was rather simple: create multiple workers in Ansible AWX/Tower and distribute the load by OS type for NXOS and EOS devices, and per region for IOS devices. This optimization was relatively easy to make and caused few, if any, design changes.
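The post does not show how the groups were defined in AWX/Tower, so the following is only a sketch of the partitioning idea itself. The inventory records, the "os" and "region" keys, and the group names are all assumptions:

```python
from collections import defaultdict

# Hypothetical inventory records; the "os" and "region" keys are assumptions.
devices = [
    {"hostname": "dc1-nx01", "os": "nxos", "region": "amer"},
    {"hostname": "dc1-eos01", "os": "eos", "region": "amer"},
    {"hostname": "nyc-rt01", "os": "ios", "region": "amer"},
    {"hostname": "lon-rt01", "os": "ios", "region": "emea"},
]


def worker_group(device):
    """Group IOS devices per region; group every other OS by OS type."""
    if device["os"] == "ios":
        return f"ios_{device['region']}"
    return device["os"]


def split_by_worker_group(inventory):
    """Return a mapping of group name to the hostnames that group would process."""
    groups = defaultdict(list)
    for device in inventory:
        groups[worker_group(device)].append(device["hostname"])
    return dict(groups)


print(split_by_worker_group(devices))
```

Each resulting group could then be handed to its own worker, which widens the run without opening every connection at once.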
We noticed that configuration generation was taking a long time. There were two primary hypotheses we had for why:
Swapping out Jinja would have been difficult; however, swapping out how idempotency was implemented was a simple change. We simply needed to replace the Ansible template module with the Ansible template lookup and save the file with a custom filter.
Original Task:
- name: "GENERATE FILES FROM JINJA TEMPLATES"
  template:
    src: "{{ src_file }}"
    dest: "{{ dst_file }}"
New Task:
- name: "GENERATE FILES FROM JINJA TEMPLATES"
  set_fact:
    save_to_file: "{{ lookup('template', src_file) | save_output_to_file(dst_file) }}"
Filter Plugin:
def save_output_to_file(content, destination):
    """Save content to file."""
    with open(destination, "w") as fh:
        fh.write(content)
    return destination


class FilterModule(object):
    """Ansible filter class."""

    def filters(self):
        """List of filters."""
        return {
            "save_output_to_file": save_output_to_file,
        }
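Since the filter is an ordinary Python function, it can be exercised outside of Ansible. The sketch below repeats the function so it runs on its own; the rendered text and the file name are hypothetical stand-ins for what the template lookup and `dst_file` would provide:

```python
import os
import tempfile


def save_output_to_file(content, destination):
    """Save content to file and return the destination path."""
    with open(destination, "w") as fh:
        fh.write(content)
    return destination


# Stand-in for the text that lookup('template', src_file) would return.
rendered = "hostname nyc-rt01\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical destination file, created in a throwaway directory.
    dst_file = os.path.join(tmp_dir, "nyc-rt01.cfg")
    returned = save_output_to_file(rendered, dst_file)
    with open(dst_file) as fh:
        written = fh.read()

print(written == rendered)
```

Note that the function overwrites the destination on every call without comparing it to what is already there, which is the work the new task avoids.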
It turns out that we gained roughly 7x efficiency with this change, going from about 5 hours of processing to about 40 minutes. Furthermore, we lost nothing by giving up idempotency in this use case, since the intention was to track changes via GitHub. With hundreds of devices, knowing whether a task reported a change was not helpful in the long run, whereas seeing the changes in version control was.
Had we proceeded to optimize before understanding the actual issue, we would almost certainly have looked at multithreading, which would have caused us to introduce new technologies that would ultimately not work for this use case. Attacking the issues with actual data (task run times) allowed us to understand the problems we faced. No features were introduced that were not immediately needed.
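The post does not say how the task run times were collected or analyzed. As one possible sketch, assuming the timings are already available as (task name, seconds) pairs, totaling them per task is enough to show where the time goes. The task names and numbers below are invented:

```python
from collections import defaultdict
from operator import itemgetter

# Hypothetical (task name, seconds) records; the numbers are invented for illustration.
task_timings = [
    ("BACKUP CONFIGURATION", 41.0),
    ("GENERATE FILES FROM JINJA TEMPLATES", 95.5),
    ("BACKUP CONFIGURATION", 39.0),
    ("GENERATE FILES FROM JINJA TEMPLATES", 104.5),
    ("COMMIT TO GIT", 3.0),
]


def slowest_tasks(timings):
    """Total the seconds per task name and sort from slowest to fastest."""
    totals = defaultdict(float)
    for name, seconds in timings:
        totals[name] += seconds
    return sorted(totals.items(), key=itemgetter(1), reverse=True)


for name, seconds in slowest_tasks(task_timings):
    print(f"{seconds:8.1f}s  {name}")
```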
Great!! Then go for it, and introduce the correct amount of complexity needed to solve your problem. These principles do not change that; they simply challenge each engineer to really consider the options. For every reason not to introduce a feature there will be reasons to introduce it; some may even say “It Depends.”
Personal experience says that, much to the dismay of an automator, network engineers tend to struggle to consolidate on standard configurations and designs of networking devices. Automation engineers struggle to consolidate on solution sets and tools for similar reasons. Put another way, “network snowflakes” are bad, but “automation snowflakes” are justified. With that in mind, I would urge automators to lean towards not introducing new tools, technologies, or complexities.
Ultimately, this is a nuanced topic without a singular answer. For every example I have where I wish I had not spent time building something, I have another example where I wish I had “done it right the first time.” Given that the intended audience is those building network automation solutions in the enterprise space, it is important to consider the context.
I am not advocating for never considering tomorrow’s problem and I am personally a fan of building out with modularity in mind. That being said, a balance always needs to be considered.
-Ken