Automation Principles – Robustness Principle
Robustness Principle History
This is part of a series of posts to understand Network Automation Principles.
For a traditional network engineer, the robustness principle has an interesting history. It is often called Postel’s Law, after Jon Postel, the author of RFC 761, his initial RFC for TCP. Specifically, in section 2.10, he writes:
TCP implementations should follow a general principle of robustness: be conservative in what you do, be liberal in what you accept from others.
Which is often rewritten as:
Be conservative in what you send, be liberal in what you accept.
Jon’s goal was to ensure resilient and flexible communication between different networked systems. While this was initially aimed at protocol design, it has broader applications across other domains such as automation, software development, and network engineering.
Robustness in Network Automation
Network automation requires interactions between various systems (e.g., devices and controllers) via diverse APIs. Following the Robustness Principle can help make automation workflows more resilient.
Let’s explore how this applies to different aspects of network automation.
Be Strict in What You Send
When developing automation scripts, ensure that the data you generate and send to APIs, devices, or external systems adheres to a well-defined schema. This prevents ambiguity and makes troubleshooting easier.
For example, if your automation script sends API requests to a Source of Truth (SoT), make sure your payloads conform to the expected format:
{
    "device_name": "nyc-core01",
    "interface": "GigabitEthernet0/1",
    "description": "Uplink to ISP"
}
Avoid sending loosely structured data that could introduce inconsistencies. The next payload carries the same data (i.e., values) but uses a different schema (i.e., keys, plus an abbreviated interface name), so the receiver has to interpret it differently.
{
    "device": "nyc-core01",
    "desc": "Uplink to ISP",
    "if_name": "Gig0/1"
}
In this second example, we introduce variability in key names and formatting, which can lead to failures or unintended behavior.
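One way to enforce this before a payload leaves your script is a small check against the expected keys. The following is a minimal sketch under stated assumptions: REQUIRED_FIELDS and validate_payload are hypothetical names introduced here for illustration, and a real Source of Truth would define its own schema.

```python
# Hypothetical schema for illustration: field name -> expected type.
REQUIRED_FIELDS = {
    "device_name": str,
    "interface": str,
    "description": str,
}


def validate_payload(payload: dict) -> dict:
    """Return the payload unchanged if it matches REQUIRED_FIELDS, else raise ValueError."""
    missing = sorted(set(REQUIRED_FIELDS) - set(payload))
    extra = sorted(set(payload) - set(REQUIRED_FIELDS))
    if missing or extra:
        raise ValueError(f"Schema mismatch: missing={missing}, unexpected={extra}")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(payload[field], expected_type):
            raise ValueError(f"Field '{field}' must be of type {expected_type.__name__}")
    return payload


good = {"device_name": "nyc-core01", "interface": "GigabitEthernet0/1", "description": "Uplink to ISP"}
bad = {"device": "nyc-core01", "desc": "Uplink to ISP", "if_name": "Gig0/1"}

validate_payload(good)  # Matches the schema, so nothing is raised
try:
    validate_payload(bad)  # Uses different keys, so this raises
except ValueError as err:
    print(err)
```

Rejecting the second payload on the sending side keeps the inconsistency from ever reaching the receiving system.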
Be Liberal in What You Accept
Devices, APIs, and data sources are not always consistent, especially over different versions. Handling inconsistencies gracefully makes automation workflows more resilient. A strict parser that rejects any unexpected data might cause unnecessary failures, whereas a more tolerant parser can correct minor issues.
For instance, suppose you’re processing API responses that are typically a list of dictionaries, since a response generally contains multiple entries, but the API may send a single dictionary when there is only one (often a challenge in XML to JSON conversion). Similarly, the dictionaries’ values may usually be strings, but sometimes you could get integers. To ensure consistency, convert single-dictionary input into a dictionary where the key is the interface name, and typecast VLANs to integers:
# Assume response_data is retrieved from an API
response_data = {
    "interface": "GigabitEthernet0/1",
    "description": " Uplink to ISP ",  # Extra spaces
    "data_vlan": "100",  # As string
    "voice_vlan": 200,  # As integer
}
# Normalize response_data into a dictionary keyed by interface name,
# whether the API returned a single dictionary or a list of them
if isinstance(response_data, dict):
    response_data = [response_data]
response_data = {entry["interface"]: entry for entry in response_data}
# Type cast VLANs and strip description spaces
for iface, entry in response_data.items():
    entry["description"] = entry.get("description", "").strip()
    entry["data_vlan"] = int(entry.get("data_vlan", 0))
    entry["voice_vlan"] = int(entry.get("voice_vlan", 0))
print(response_data)
By ensuring a structured format and casting VLANs to integers regardless of input type, your automation remains predictable and resilient.
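The same tolerance can apply to key names. As a rough sketch, assuming you know which alternate keys a sender might use, a small alias table can map them onto your canonical schema. KEY_ALIASES and normalize_keys are hypothetical names used only for illustration.

```python
# Hypothetical alias table for illustration: alternate key -> canonical key.
KEY_ALIASES = {
    "device": "device_name",
    "if_name": "interface",
    "desc": "description",
}


def normalize_keys(record: dict) -> dict:
    """Return a copy of record with known alias keys renamed to canonical keys."""
    return {KEY_ALIASES.get(key, key): value for key, value in record.items()}


loose = {"device": "nyc-core01", "desc": "Uplink to ISP", "if_name": "Gig0/1"}
print(normalize_keys(loose))
```

This only renames keys. Expanding abbreviated values such as Gig0/1 would be a separate normalization step.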
Avoid Propagating Bad Data
Being “liberal” does not mean blindly accepting and forwarding incorrect data. Instead, apply validation and normalization to ensure that incorrect data does not cascade into larger problems.
When processing VLAN information from different sources, enforce valid VLAN ranges to prevent misconfigurations:
def validate_vlan(vlan):
    if not (1 <= vlan <= 4094):
        raise ValueError(f"Invalid VLAN: {vlan}")
    return vlan

# Example usage
try:
    vlan_id = validate_vlan(5000)  # This will raise an error
except ValueError as e:
    print(e)
This prevents automation from pushing invalid VLAN configurations to the network.
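Being liberal on input and strict on validation can be combined in one step. The sketch below is one possible approach, not a definitive implementation: parse_vlan is a hypothetical helper that accepts a VLAN as either an integer or a numeric string, coerces it, and then applies the same 1 to 4094 range check.

```python
def parse_vlan(value) -> int:
    """Coerce a VLAN given as an int or numeric string to int, then range-check it."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so reject it explicitly.
        raise ValueError(f"Invalid VLAN: {value!r}")
    try:
        vlan = int(str(value).strip())
    except ValueError:
        raise ValueError(f"Invalid VLAN: {value!r}") from None
    if not (1 <= vlan <= 4094):
        raise ValueError(f"Invalid VLAN: {vlan}")
    return vlan


# Example usage with a mix of acceptable and unacceptable inputs
for raw in ("100", 200, " 300 ", "5000", "voice"):
    try:
        print(parse_vlan(raw))
    except ValueError as err:
        print(err)
```

Minor formatting differences are absorbed, while values that cannot be a valid VLAN are still rejected before they reach a device.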
Handle Failures Gracefully
Network automation should be robust against failures and avoid leaving systems in an inconsistent state. Implementing retries, logging, session clearing, and rollback mechanisms can help.
In this example, the socket is closed regardless of any error:
import socket

def tcp_ping(ip: str, port: int, timeout: int = 1) -> bool:
    sckt = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sckt.settimeout(int(timeout))
    try:
        sckt.connect((ip, int(port)))
        sckt.shutdown(socket.SHUT_RDWR)
        return True
    except socket.timeout:
        return False
    finally:
        # Most importantly, we ensure that we handle any issues gracefully.
        # Other errors (e.g., connection refused) still propagate to the
        # caller, but the socket is closed first.
        sckt.close()
This ensures that there are no hanging sessions, because in either case (success or failure) the TCP socket is closed.
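Retries are another of the mechanisms mentioned above. Here is a minimal sketch, assuming the check being retried returns a truthy value on success. The retry helper and its parameters are hypothetical names for illustration, and the sleep function is injectable so the behavior can be tested without real delays.

```python
import time


def retry(func, attempts: int = 3, delay: float = 0.5, sleep=time.sleep) -> bool:
    """Call func() up to `attempts` times, sleeping `delay` seconds between tries.

    Return True as soon as func() returns a truthy value, or False if
    every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        if func():
            return True
        if attempt < attempts:
            sleep(delay)  # Wait before the next attempt, but not after the last
    return False


# Simulated check that fails twice before succeeding (stands in for a real probe)
results = iter([False, False, True])
print(retry(lambda: next(results), attempts=3, delay=0))
```

With the tcp_ping function above, the call could look like retry(lambda: tcp_ping("192.0.2.1", 22)). A fixed delay keeps the sketch short; a production workflow might prefer a growing delay between attempts.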
Logging Relevant Data Events
When accepting potentially bad data, it is not always clear what the best option is. You may want to fail early and fail often, or you may want to ignore the issue but log it and inform users. This is often a challenge when systems are updated, as some part of the data that was required in one version is no longer required in the next.
Let’s build on one of the prior examples:
import logging

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Define expected keys
expected_keys = {"description", "data_vlan", "voice_vlan"}

# response_data is the dictionary keyed by interface name from the earlier example
# Type cast VLANs, strip description spaces, and log unexpected keys
for iface, entry in response_data.items():
    entry["description"] = entry.get("description", "").strip()
    entry["data_vlan"] = int(entry.get("data_vlan", 0))
    entry["voice_vlan"] = int(entry.get("voice_vlan", 0))
    # Log unexpected keys
    unknown_keys = set(entry.keys()) - expected_keys
    for key in unknown_keys:
        logging.warning(f"Unexpected key '{key}' found in interface {iface}, value: {entry[key]}")
print(response_data)
# Result:
# 2025-07-16 03:32:20,840 - WARNING - Unexpected key 'interface' found in interface GigabitEthernet0/1, value: GigabitEthernet0/1
This allows the user to be aware of what errors exist without breaking normal data operations.
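The choice between failing early and logging can also be left to the caller. The following is a sketch of that idea under stated assumptions: check_keys, EXPECTED_KEYS, and the strict flag are hypothetical names for illustration, and the "mtu" key stands in for any field the code does not expect.

```python
import logging

logger = logging.getLogger(__name__)

EXPECTED_KEYS = {"description", "data_vlan", "voice_vlan"}


def check_keys(iface: str, entry: dict, strict: bool = False) -> set:
    """Return unexpected keys in entry; raise if strict, otherwise log a warning."""
    unknown_keys = set(entry) - EXPECTED_KEYS
    if unknown_keys and strict:
        raise ValueError(f"Unexpected keys in interface {iface}: {sorted(unknown_keys)}")
    for key in sorted(unknown_keys):
        logger.warning("Unexpected key '%s' found in interface %s", key, iface)
    return unknown_keys


entry = {"description": "Uplink to ISP", "data_vlan": 100, "voice_vlan": 200, "mtu": 9000}

check_keys("GigabitEthernet0/1", entry)  # Tolerant mode: logs a warning and continues
try:
    check_keys("GigabitEthernet0/1", entry, strict=True)  # Strict mode: raises
except ValueError as err:
    print(err)
```

A tolerant default with an opt-in strict mode lets day-to-day runs continue while still giving tests or CI a way to catch drift in the data.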
Conclusion
Final Thoughts
A pattern of always being strict and “fail early and fail often” can be a frustrating experience. For example, a recent change to docker-compose started reporting “docker-compose.yml: version is obsolete”, which led to hundreds of issues that were, and still are, rather annoying. It’s an innocuous setting that causes no real issue and could simply be silently ignored.
Careful consideration should be used to make your tools and automation user-friendly, and the Robustness Principle can help you on your journey.
-Ken