Automation Principles - YAGNI / Premature Optimizations

Ken Celenza

June 16, 2020

This is part of a series of posts to understand Network Automation Principles.

Every engineer sets out to build “the right system” the first time, but is that even achievable? Do you even know your requirements? Do you really know your requirements? No, seriously, do you really really know what your requirements are? Enter YAGNI and Premature Optimization.

YAGNI: “You Ain’t Gonna Need It” is a principle of extreme programming that states a programmer should not add functionality until deemed necessary, and “do the simplest thing that could possibly work.” (DTSTTCPW) – Wikipedia

The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming. – Donald Knuth, “Computer Programming as an Art”

Knuth wrote this in 1974, and it still holds true today.

Computer Science Theory

A year ago I was having a design conversation with a colleague, where he reminded me that Big-O notation does not consider constants.

Big-O notation does not care about constants because Big-O notation only describes the long-term growth rate of functions, rather than their absolute magnitudes.

This is a good rule to consider when given two nearly equally weighted choices, such as a hash/dictionary lookup vs a list lookup, which compares O(1) vs O(n), where both are well understood and easily accomplished. However, introducing optimizations such as threading or multiprocessing before you need to inherently makes the code more difficult to support. It greatly increases the design considerations, the likelihood of race conditions, and the difficulty of troubleshooting.
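As a small illustration of the dictionary-versus-list choice, the sketch below looks up a device by hostname both ways. The inventory records and the field names (`hostname`, `platform`) are made up for the example.

```python
# Hypothetical inventory records; the field names are assumptions for this sketch.
devices = [
    {"hostname": "nyc-rt01", "platform": "ios"},
    {"hostname": "nyc-sw01", "platform": "nxos"},
    {"hostname": "lon-sw01", "platform": "eos"},
]


def find_in_list(hostname, inventory):
    """O(n): scan every record until the hostname matches."""
    for device in inventory:
        if device["hostname"] == hostname:
            return device
    return None


# O(n) once to build the index, then O(1) on average for each lookup.
devices_by_hostname = {}
for device in devices:
    devices_by_hostname[device["hostname"]] = device


def find_in_dict(hostname, index):
    """O(1) on average: a single hash lookup."""
    return index.get(hostname)


print(find_in_list("lon-sw01", devices))
print(find_in_dict("lon-sw01", devices_by_hostname))
```

Both calls return the same record. Neither version is harder to write or support than the other, which is why this kind of choice is cheap to get right up front.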

While it is understood why constants do not matter in Big-O notation, in practical application they certainly matter. Take an algorithm with the worst growth rate, O(n!) (factorial time), where “n” may never exceed 2 in practice: its poor growth rate never comes into play. Additionally, optimizing a process that takes only 0.1% of your real-world time adds little value. However, it is hard to see where optimizations will be needed until you get there. Finally, optimizing your process may not lead to any additional savings, such as when the actual bottleneck is waiting for a human approval.
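To put a number on the 0.1% point, the sketch below applies Amdahl's law. The law is not named in the text above, but it expresses the same idea: the overall gain is limited by the share of time the optimized step actually takes. The function name and the sample speedup factors are chosen for illustration only.

```python
def overall_speedup(fraction, step_speedup):
    """Amdahl's law: overall speedup when `fraction` of the run time
    (0.0 to 1.0) is made `step_speedup` times faster."""
    return 1 / ((1 - fraction) + fraction / step_speedup)


# A step that takes 0.1% of the run time, made 100x faster (illustrative factor).
print(round(overall_speedup(0.001, 100), 5))

# A step that takes 90% of the run time, made only 2x faster.
print(round(overall_speedup(0.9, 2), 5))
```

Even a 100x improvement to the small step leaves the whole run essentially unchanged, while a modest 2x on the dominant step comes close to halving it.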

Another small example is using Python constructs such as lambdas and list comprehensions over simple for loops. Especially in development teams where programming has not been the primary focus, the added complexity generally exceeds the efficiency gained in building network automation solutions.
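For a concrete comparison, here is the same filtering step written both ways. The interface data is invented for the example, and both versions produce the same list.

```python
# Hypothetical interface data for illustration.
interfaces = [
    {"name": "Ethernet1", "enabled": True},
    {"name": "Ethernet2", "enabled": False},
    {"name": "Ethernet3", "enabled": True},
]

# Compact version: a lambda, filter(), and a list comprehension together.
is_enabled = lambda intf: intf["enabled"]
compact = [intf["name"] for intf in filter(is_enabled, interfaces)]

# Plain loop: more lines, but each step is easy to read and debug.
explicit = []
for intf in interfaces:
    if intf["enabled"]:
        explicit.append(intf["name"])

print(compact == explicit)
```

On a handful of interfaces any speed difference is negligible, so the deciding factor is which version the team can comfortably maintain.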

My Personal Journey

Mistakes: I’ve made a few, and I will likely continue to make a few more. I have gone down rabbit holes of introducing non-functional requirements (such as authentication and logging) too early, adding features before they were needed, and making software “frameworky.” I’ve been happy that I was able to optimize the code, only to never see the software leave the proof-of-concept phase. Said another way, I spent a lot of time on code that was not used in any meaningful way, based on self-imposed requirements and optimizations.

Network Automation Use Case

In network automation, speed is something you will hear about often. In fact, it is probably the thing I hear most from my peers in the industry, and these are people that I deem intelligent and respect. However, in the use cases I am dealing with on a daily basis, it is rarely a major factor. Most use cases fall into a few primary categories.

Small incremental changes connecting to 1-5 devices at a time during a change window.
Daily tasks such as backup configurations, configuration compliance, and operational data collection that is not time sensitive.
Infrequent global changes such as updating SNMP ACLs or updating NTP server IP addresses.

These examples are, in my experience, the most common in the enterprise space and have little time sensitivity: whether a connection takes milliseconds or minutes has little effect on the overall outcome.

Real Life Example

I have been working with a customer on a daily process to run backups and configuration compliance for 8000+ devices. Several in the community have clearly documented that Ansible is slower than, say, Nornir (1, 2). However, the places where optimizations would provide the most benefit were not the places you would likely expect, and the solutions were non-obvious as well.

The Requirements

The requirements as they relate to this blog post include:

Complete all jobs in less than 24 hours, including:
- Backup configurations of networking devices.
- Intended configuration generation, integrating with an external Source of Truth (SoT).
- Configuration compliance comparing per feature (BGP, NTP, SNMP, etc.) on each device.
- Roll up reporting of the above.
Ability to run jobs for individual devices

The initial roll-out of this process was a success, but as devices and features were added, the process took longer and longer. A quick analysis showed that backup configuration and configuration generation were taking too long.

Multithreading

One issue encountered over the years was overloading the TACACS server with too many requests at once, making it impossible to scale too wide too quickly without additional infrastructure changes. For this reason, a multithreading solution was not a good fit.

Scaling Wide

While there were clear issues with scaling too wide too fast, that did not mean we could not make use of scaling wider. The resource utilization that comes with managing a large number of devices in Ansible was slowing the process down. The reasons for the inefficiencies are less important, as the solution was rather simple: create multiple workers in Ansible AWX/Tower and distribute the load by OS type for NXOS and EOS devices, and per region for IOS devices. This optimization was relatively easy to make and caused few, if any, design changes.
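The distribution rule described above can be sketched as a simple grouping function. This is only an illustration of the idea, not the customer's AWX/Tower configuration; the device records, field names, and group naming are all assumptions.

```python
from collections import defaultdict


def worker_group(device):
    """Pick a worker group: per region for IOS, per OS type for NXOS and EOS."""
    if device["platform"] == "ios":
        return "ios_" + device["region"]
    return device["platform"]


def distribute(devices):
    """Return a mapping of worker group name to a list of hostnames."""
    groups = defaultdict(list)
    for device in devices:
        groups[worker_group(device)].append(device["hostname"])
    return dict(groups)


# Hypothetical inventory used only to show the grouping.
inventory = [
    {"hostname": "nyc-rt01", "platform": "ios", "region": "amer"},
    {"hostname": "lon-rt01", "platform": "ios", "region": "emea"},
    {"hostname": "nyc-sw01", "platform": "nxos", "region": "amer"},
    {"hostname": "lon-sw01", "platform": "eos", "region": "emea"},
]

for group, hostnames in distribute(inventory).items():
    print(group, hostnames)
```

Each group can then be handed to its own worker, which spreads the load without raising the number of simultaneous connections any single worker opens.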

Configuration Generation

We noticed that configuration generation was taking a long time. There were two primary hypotheses we had for why:

An inefficiency in how Ansible uses Jinja.
An inefficiency in how Ansible idempotently checks for file configuration changes.

Swapping out Jinja would have been difficult; swapping out how idempotency was implemented, however, was a simple change. We simply needed to replace the Ansible template module with the Ansible template lookup and save the file with a custom filter.

Original Task:

- name: "GENERATE FILES FROM JINJA TEMPLATES"
  template:
    src: "{{ src_file }}"
    dest: "{{ dst_file }}"

New Task:

- name: "GENERATE FILES FROM JINJA TEMPLATES"
  set_fact:
    save_to_file: "{{ lookup('template', src_file) | save_output_to_file(dst_file) }}"

Filter Plugin:

def save_output_to_file(content, destination):
    """Save content to file."""
    with open(destination, "w") as fh:
        fh.write(content)
    return destination

class FilterModule(object):
    """Ansible filter class."""

    def filters(self):
        """List of filters."""
        return {
            "save_output_to_file": save_output_to_file,
        }
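Because the filter is plain Python, it can be checked outside of Ansible. The following is a minimal sketch that repeats the function so the snippet stands alone and writes to a temporary directory; the file name and sample content are made up.

```python
import os
import tempfile


def save_output_to_file(content, destination):
    """Save content to file (same function as the filter plugin above)."""
    with open(destination, "w") as fh:
        fh.write(content)
    return destination


with tempfile.TemporaryDirectory() as tmp_dir:
    dst_file = os.path.join(tmp_dir, "device.cfg")
    # Made-up rendered configuration, standing in for the template lookup output.
    returned = save_output_to_file("ntp server 192.0.2.1\n", dst_file)
    with open(dst_file) as fh:
        written = fh.read()

print(returned == dst_file, written == "ntp server 192.0.2.1\n")
```

Note that the function always rewrites the file and never compares it with what is already on disk, which is exactly the changed-state check this approach gives up.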

It turns out that we gained roughly 7x efficiency with this change, decreasing from about 5 hours of processing to about 40 minutes. Furthermore, idempotency gained us nothing in this use case, since the intention was to track changes via GitHub. With hundreds of devices, knowing whether the task reported a change was not helpful in the long run; seeing the changes in version control was.

Lessons Learned

Had we proceeded to optimize before understanding the actual issue, we would almost certainly have looked at multithreading, which would have caused us to introduce new technologies that would ultimately not work for this use case. Understanding the issues and attacking them with actual data (task run times) allowed us to understand the problems we faced. No features were introduced that were not immediately needed.
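One low-cost way to collect that kind of data in Python tooling is to time each stage and rank the results. The sketch below is a generic illustration, not what was used on this project; the stage names and the `time.sleep` calls are placeholders for real work.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(stage):
    """Record how long the wrapped block takes, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - start


# Placeholder stages; time.sleep stands in for the real work.
with timed("backup"):
    time.sleep(0.01)
with timed("generate"):
    time.sleep(0.03)
with timed("compliance"):
    time.sleep(0.02)

# Slowest stage first, so the optimization target is obvious.
for stage, seconds in sorted(timings.items(), key=lambda item: item[1], reverse=True):
    print(f"{stage}: {seconds:.3f}s")
```

With numbers like these in hand, the decision about where to spend effort rests on measurements rather than on assumptions about which tool is slow.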

But My Use Case Is Unique!

Great!! Then go for it: introduce the correct amount of complexity needed to solve your problem. These principles do not change that; they simply challenge each engineer to really consider the options. For every reason not to introduce a feature there will be reasons to introduce it. Some may even say “It Depends.”

Personal experience says that, much to the dismay of an automator, network engineers tend to struggle to consolidate on standard configurations and designs of networking devices. Automation engineers struggle to consolidate on solution sets and tools for similar reasons. Put another way: “network snowflakes” are bad, but “automation snowflakes” are justified. With that in mind, I would urge automators to lean towards not introducing new tools, technologies, or complexities.

Conclusion

Ultimately, this is a nuanced topic without a singular answer. For every example I have where I wish I had not spent time building something, I have another example where I wish I had “done it right the first time.” Given that the intended audience is those building network automation solutions in the enterprise space, it is important to consider the context.

I am not advocating for never considering tomorrow’s problem and I am personally a fan of building out with modularity in mind. That being said, a balance always needs to be considered.

-Ken

Tags :

automation automation-concepts automation-journey netdevops tutorial
