Mutable vs Immutable Infrastructure
The DevOps and Platform Engineering communities have been discussing immutability for years. Even so, I think it is worth writing a few words about mutable and immutable infrastructure (not to be confused with mutable and immutable Python objects) and how the idea applies to network infrastructure across on-prem, virtual, and cloud environments.
This is part of a series of posts intended to provide an understanding of Network Automation Principles.
Mutable vs Immutable Infrastructure
In the networking realm, we are used to managing our infrastructure with a mutable approach. However, improved network management interfaces and the network-as-a-service pattern have introduced the immutable approach to network management. Here is a quick definition of each:
- Mutable: Traditionally, infrastructure has been managed like our beloved pets. When we need to change something, we “groom” them, updating them in place. Those changes, whatever they may be, are applied incrementally (one step after the other) to patch the existing configuration. Because of this, rollbacks are complex and error-prone when many changes have to be undone. A typical example of mutable infrastructure is the traditional bare-metal server, where you log in and configure it per use case.
- Immutable: Conversely, with the increase in cloud usage and automation, infrastructure management took a turn. We shifted to the “cattle” mentality, “killing” old instances and replacing them with new ones. Whenever a change is required, instead of jumping into a machine and applying it, a new instance (e.g., a VM) is created with the necessary changes already applied. The old instances are replaced entirely, and the state of the running infrastructure remains static after deployment. If the change is faulty, you simply fall back to the previous instance or image. Containers, for example, are handled as immutable infrastructure: every time a new version needs to be deployed, the container is re-created.
Not What, but How
For the purposes of this post, I want to present an example showing that the same type of infrastructure can be handled either as a named pet or as numbered cattle. A good example is how you manage virtual machines: either you deploy and configure them step by step, or you have tools that create a new VM and replace the old one. Notice that this is not about doing it manually versus in an automated way; both are compatible with either approach. So whether infrastructure is mutable or immutable is, in reality, dictated by the management approach (although, obviously, the technology has to support it).
In the next figure, you can observe both approaches:
Networking Realm
Now, focusing on the networking realm, the easiest equivalent of the VM example is probably the virtualized appliance, often referred to as Network Function Virtualization (NFV). For example, if you want to upgrade a Network Virtual Appliance (NVA) in a cloud provider, an immutable approach would be to deploy a new one with the intended version and shift the traffic to it. If everything works as expected, destroy the old instance; otherwise, shift the traffic back.
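As a rough illustration of that flow, here is a minimal Python sketch. The `Appliance` record, the `upgrade_immutably` function, and the `is_healthy` callback are hypothetical names invented for this example; the deploy and traffic-shift steps are only logged, because the real calls depend on the cloud provider.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Appliance:
    """Hypothetical record for a deployed NVA; frozen so it is never edited in place."""
    name: str
    version: str


def upgrade_immutably(
    active: Appliance,
    new_version: str,
    is_healthy: Callable[[Appliance], bool],
    log: List[str],
) -> Appliance:
    """Deploy a replacement, shift traffic to it, and keep whichever instance works."""
    # A brand-new instance is created; the active one is left untouched.
    candidate = Appliance(name=f"nva-{new_version}", version=new_version)
    log.append(f"deploy {candidate.name}")
    log.append(f"shift traffic {active.name} -> {candidate.name}")

    if is_healthy(candidate):
        # Success: the old instance is no longer needed.
        log.append(f"destroy {active.name}")
        return candidate

    # Failure: the old instance still exists, so rollback is just a traffic shift.
    log.append(f"shift traffic {candidate.name} -> {active.name}")
    return active
```

Because the old instance is never modified, falling back does not require undoing anything on it.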
Another topic is configuration management, where replacing the whole device is obviously not possible. Due to the consequences of misconfigurations, the network industry often hesitates to adopt configuration approaches other than the traditional CLI. Furthermore, novel approaches like network automation could lean either way, similar to the virtual machine example above. For example, configuring a device using commands that alter its configuration can be considered a mutable approach. On the other hand, rendering the whole configuration and replacing the running one resembles an approach based on immutability, e.g., the config replace function in Cisco IOS and Arista EOS. That may sound simple, and it is not a new request (see RFC 3535, which summarizes the 2002 IAB Network Management Workshop), but not all platforms provide the required functionality.
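To make the difference concrete, here is a toy Python model, not real device behavior. A configuration is treated as a plain list of lines, and `merge` and `replace` are hypothetical helpers written for this example. The point is that patching depends on the starting state, while replacing does not.

```python
from typing import List


def merge(running: List[str], commands: List[str]) -> List[str]:
    """Mutable style: patch the running config one command at a time."""
    result = list(running)
    for cmd in commands:
        if cmd.startswith("no "):
            # A "no ..." command removes the matching line, if present.
            target = cmd[3:]
            result = [line for line in result if line != target]
        elif cmd not in result:
            result.append(cmd)
    return result


def replace(running: List[str], intended: List[str]) -> List[str]:
    """Immutable style: the starting state is ignored; the intended config wins."""
    return list(intended)


intended = ["hostname edge1", "ntp server 10.0.0.2"]
commands = ["ntp server 10.0.0.2", "no ntp server 10.0.0.1"]

# Two devices that should be identical, but one carries a leftover line.
clean = ["hostname edge1", "ntp server 10.0.0.1"]
drifted = ["hostname edge1", "ntp server 10.0.0.1", "snmp-server community public RO"]

print(merge(clean, commands) == intended)     # True
print(merge(drifted, commands) == intended)   # False: the leftover line survives
print(replace(drifted, intended) == intended) # True
```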
So, let’s understand the pros and cons of each approach.
Benefits
A clear and immediate gain of adopting the immutable approach is the easy and clean rollback. If even a single line of the configuration is not properly applied, the whole device rolls back to a healthy state (i.e., the previous configuration). Many seasoned engineers will relate to the fear of losing SSH access to a device because of a partially applied ACL. On the bright side, the whole idea is based on having quick rollback functionality in place, which eliminates the need for manual interventions and late-night calls.
Another benefit is that deployments tend to increase consistency after a while because they enforce repeatability. That also gives a predictable behavior across the organization, eliminates configuration drift, and reduces issues between the various environments. As a result, testing and troubleshooting benefit from the lack of snowflake exceptions.
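Configuration drift is easy to picture as a diff between the intended and the running configuration. The following is a minimal sketch using Python's standard `difflib` module; `config_drift` is a hypothetical helper name, and the sample configurations are made up for illustration.

```python
import difflib
from typing import List


def config_drift(intended: str, running: str) -> List[str]:
    """Return only the lines that differ between intended and running config."""
    diff = difflib.ndiff(intended.splitlines(), running.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ";
    # unchanged lines ("  ") and hint lines ("? ") are dropped.
    return [line for line in diff if line.startswith(("- ", "+ "))]


intended = "hostname edge1\nntp server 10.0.0.2\n"
running = "hostname edge1\nntp server 10.0.0.2\nsnmp-server community public RO\n"

print(config_drift(intended, running))   # ['+ snmp-server community public RO']
print(config_drift(intended, intended))  # []
```

An empty result means the device matches its intended state, which is the condition an immutable workflow tries to keep true after every deployment.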
This way scalability of the infrastructure gets a big boost, and you can easily scale to meet demand. The deployment of new instances becomes easier and faster as no decision mechanism is involved; every new one adheres to the same architectural decisions. Moreover, every improvement is adopted across the organization—ensuring compliance with configuration and security standards.
Challenges
However, as with everything in life, there are some challenges when adopting something new. First and foremost, you need something in place to start with, and it requires investment both in tools and in processes. This by itself adds complexity throughout the infrastructure. However, the induced friction can be reduced with proper strategy and becomes easier to consume by adopting it in stages.
A second point to note is that network equipment is often legacy, in which case replacement operations may not be supported. Another challenge is the implicit state in the network infrastructure (e.g., adjacencies). Neither plays well with immutable infrastructure concepts, and each deserves a deeper discussion.
Finally, the most significant challenge: monitoring and testing become a crucial part of the organization and of the engineers’ everyday life. Every workflow should start and end with testing, because an immutable approach forces you to be more aware of the outcome and impact of a change; it is a binary operation that either works or fails as a whole. Whether this is really a new challenge depends on your starting point. In my opinion, traditional manual configuration should also have monitoring in place, yet it is common to lag behind in those areas because the process relies mostly on manual intervention. In immutable environments, on the other hand, the blast radius of a change can grow to catastrophic levels. With proper testing and monitoring, however, the risk can be reduced and confidence gained with each deployment.
Principles
In order to adopt a more immutable approach for networking, some requirements need to be defined.
- Configuration should be defined in a declarative manner. So the data to generate the intended configuration state should be structured in files (e.g., YAML) or stored in a Source of Truth—in any case, outside of the actual device.
- Change failures must be mitigated by rollbacks. So in order to be able to do it and trace configuration changes across time, configuration artifacts (including config templates and rendered configs) should be versioned. That makes me think that a version control system is ideal for storing configuration artifacts, which also adds the benefit of having mechanisms for reverting back in time and easily visualizing changes.
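The two requirements above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `INTENDED` data would normally be loaded from YAML files or a Source of Truth rather than inlined, `render_config` stands in for a real templating step, and the content hash from `artifact_id` is only a stand-in for the commit identifier a version control system would provide.

```python
import hashlib
from typing import Any, Dict

# Hypothetical intended state, kept outside the device.
INTENDED: Dict[str, Any] = {
    "hostname": "edge1",
    "ntp_servers": ["10.0.0.1", "10.0.0.2"],
}


def render_config(data: Dict[str, Any]) -> str:
    """Render the full device configuration from structured data."""
    lines = [f"hostname {data['hostname']}"]
    lines += [f"ntp server {ip}" for ip in data["ntp_servers"]]
    return "\n".join(lines) + "\n"


def artifact_id(config: str) -> str:
    """Derive a short, repeatable identifier for a rendered config."""
    return hashlib.sha256(config.encode("utf-8")).hexdigest()[:12]


# Keep every rendered config, keyed by its identifier, so that an earlier
# version can be retrieved and deployed again as a rollback.
history: Dict[str, str] = {}
config = render_config(INTENDED)
history[artifact_id(config)] = config
```

Rendering the same data always yields the same configuration and the same identifier, which is what makes a change traceable over time.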
The move to an immutable approach clearly adds complexity, but it also strives to minimize manual configuration operations, which put immutability at risk. This highlights the importance of using automation frameworks to configure the actual devices. Once the tools are in place, use them for testing and validating the outcome as well, to reduce risk and downtime. If that validation fails, roll the whole configuration back to a previous, stable state. As you can see, automation supports the effort to minimize manual configuration updates.
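The replace, validate, and roll back loop described here might look like the following sketch. `FakeDevice` is an in-memory stand-in invented for this example, not a real driver; an actual automation framework would supply the connection and its own replace operation, and `validate` would run real post-change tests.

```python
from typing import Callable


class FakeDevice:
    """In-memory stand-in for a device that supports full config replace."""

    def __init__(self, config: str) -> None:
        self.running = config

    def replace_config(self, config: str) -> None:
        self.running = config


def deploy(
    device: FakeDevice,
    candidate: str,
    validate: Callable[[FakeDevice], bool],
) -> bool:
    """Replace the whole config, validate, and restore the old one on failure."""
    previous = device.running          # keep the last known-good state
    device.replace_config(candidate)   # all-or-nothing change
    if validate(device):
        return True
    device.replace_config(previous)    # rollback is just another replace
    return False


device = FakeDevice("hostname edge1\n")
ok = deploy(device, "hostname edge1\nntp server 10.0.0.2\n",
            validate=lambda d: "ntp server" in d.running)
print(ok)  # True
```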
Conclusion
I understand that immutable infrastructure concepts are not as widely applicable in networking as they are in other IT disciplines. From my perspective, in recent years we have seen many of those practices being adopted by infrastructure in general. And to be fair, an immutable approach helps in adopting DevOps practices, so why not explore the concept in networking?
All that said, I will finish this post with a question: Do you have anything in your networking or network automation solution that can be immutable? If yes, what’s holding you back from implementation?
-Gerasimos