One of the domains that I have worked in during my career is network security. And in this space, when it comes to firewalls, I’ve seen many problem areas such as:
- Human error – ACL updates in which entries are added incorrectly, i.e. in the wrong position within the list (think “deny any any” or “permit any any” at position 1!), resulting in either service outages or unintended access into your network.
- ACL clutter – Over time, rules are added to the rule base. Often, newly added rules encompass existing rules, which are then never hit and are no longer required. This not only creates unnecessary clutter for anyone reading the ACL, but also costs the firewall extra cycles to process.
- Bad actors – An individual, group, or organization interested in attacking IT systems that has managed to get into the firewall, via a vulnerability or another attack vector, and has opened ports to allow themselves access into the network.
Based on these pain points I wanted to write an ACL auditing tool based on Batfish, that would automate the checks needed to prevent these issues from occurring, whilst also providing you with a springboard into the world of Batfish and network security automation.
Why Batfish? Batfish provides a great open source, vendor agnostic way to validate ACLs, as we will dive into later.
Note: To fully follow this guide you will need to have both Docker and Docker Compose installed.
Let’s begin…
Batfish 101
What is Batfish?
Batfish is an open-source network configuration analysis tool that provides the ability to validate configuration data, query network adjacencies, verify firewall ACL rule sets and also analyze routing/flow paths.1
Batfish runs as a service, i.e. as a Docker container. Snapshots of your network are then uploaded to the Batfish service. A snapshot is a collection of information that represents your network, such as device configurations, link/connectivity data and server details such as IP and iptables settings. Batfish therefore requires no direct access to your network, and operates on a purely offline model.
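To make the idea of a snapshot more concrete, here is a minimal sketch that assembles a snapshot folder on disk. It assumes the conventional Batfish layout in which device configurations live in a `configs` subdirectory; the helper name, hostname, and configuration text are invented for illustration.

```python
import tempfile
from pathlib import Path


def build_snapshot_dir(root: Path, configs: dict) -> Path:
    """Write each device config to <root>/configs/<device>.cfg."""
    configs_dir = root / "configs"
    configs_dir.mkdir(parents=True, exist_ok=True)
    for device_name, config_text in configs.items():
        (configs_dir / f"{device_name}.cfg").write_text(config_text)
    return root


# Hypothetical device name and config text, purely for illustration.
snapshot_root = Path(tempfile.mkdtemp(prefix="snapshot-"))
build_snapshot_dir(snapshot_root, {"fw1": "hostname fw1\n"})

# List the config files relative to the snapshot root.
snapshot_files = sorted(
    path.relative_to(snapshot_root).as_posix()
    for path in snapshot_root.rglob("*.cfg")
)
print(snapshot_files)
```

A folder like this is what gets uploaded to the Batfish service; no connection to the live devices is involved.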
Batfish then ingests your network snapshot and builds a series of internal vendor agnostic models about your network. These models not only include configuration, but also control plane state such as BGP sessions etc. Questions are then issued to the Batfish service about your network via the Python SDK (pybatfish) or an Ansible Batfish role. Available question types include:
- Configuration Properties
- Topology
- Routing Protocols
- Routing and Forwarding Tables
- Packet Forwarding
- Access-lists and firewall rules
- Snapshot Input
- VXLAN and EVPN
- Resolving Specifiers
- Differential Questions
Furthermore, Batfish also supports uploading multiple snapshots, which you can then compare against one another, as we will see later. Below is an example of using pybatfish to check the status of BGP sessions.
>>> bfq.bgpSessionStatus(nodes="/spine|leaf/").answer().frame()
status: TERMINATEDNORMALLY
.... Wed Jun 26 15:01:16 2019 DST Begin job.
Node VRF Local_AS Local_Interface Local_IP Remote_AS Remote_Node Remote_Interface Remote_IP Session_Type Established_Status
0 leaf1 default 64521 None 3.3.3.3 64520 spine1 None 1.1.1.1 EBGP_MULTIHOP ESTABLISHED
1 leaf1 default 64521 None 3.3.3.3 64520 spine2 None 2.2.2.2 EBGP_MULTIHOP ESTABLISHED
2 leaf2 default 64522 None 4.4.4.4 64520 spine1 None 1.1.1.1 EBGP_MULTIHOP ESTABLISHED
3 leaf2 default 64522 None 4.4.4.4 64520 spine2 None 2.2.2.2 EBGP_MULTIHOP ESTABLISHED
4 spine1 default 64520 None 1.1.1.1 64521 leaf1 None 3.3.3.3 EBGP_MULTIHOP ESTABLISHED
5 spine1 default 64520 None 1.1.1.1 64522 leaf2 None 4.4.4.4 EBGP_MULTIHOP ESTABLISHED
6 spine2 default 64520 None 2.2.2.2 64521 leaf1 None 3.3.3.3 EBGP_MULTIHOP ESTABLISHED
7 spine2 default 64520 None 2.2.2.2 64522 leaf2 None 4.4.4.4 EBGP_MULTIHOP ESTABLISHED
Installation
To install Batfish the following commands are run to pull down and then run our Batfish container image.2
docker pull batfish/allinone
docker run --name batfish -d -v batfish-data:/data -p 8888:8888 -p 9997:9997 -p 9996:9996 batfish/allinone
However, for this tutorial we can use a pre-built environment via docker-compose using the following commands.
git clone git@github.com:networktocode/ntc-soteria.git -b v0.1
cd ntc-soteria
docker-compose build
docker-compose up -d
docker-compose exec ntc-soteria bash
Once run, you will have 2 running containers (Batfish and ntc-soteria) and will be placed into the shell of the ntc-soteria container. This container has pybatfish installed and has access to the Batfish container.
We will use ntc-soteria again when building our ACL auditor, and will dive into it further later in this guide.
Example
Let’s look at a small example. From the ntc-soteria repo previously cloned, we will use an example Cisco ASA configuration and run a question against our Batfish service.
Next, we fire up our Python interpreter, import the required pybatfish modules and create a snapshot from the ASA configuration contained within the ./data directory.
from pybatfish.client.commands import bf_session
from pybatfish.question import bfq
from pybatfish.question.question import load_questions
from acl_auditor.helpers import read_file
asa_config = read_file('data/asa.cfg')
bf_session.host = 'batfish'
load_questions()
bf_session.init_snapshot_from_text(asa_config, snapshot_name="base", overwrite=True)
We can now start asking questions about our snapshot. Below we use the ipOwners question to get the IP details of the device. Note: answer() runs the question and returns the answer in a JSON format. frame() wraps the answer as a pandas DataFrame, which provides us with a data structure and various methods to parse, manipulate and iterate over the results.
>>> bfq.ipOwners().answer().frame()
status: TRYINGTOASSIGN
.... no task information
status: TERMINATEDNORMALLY
.... 2020-07-03 08:56:32.506000+01:00 Begin job.
Node VRF Interface IP Mask Active
0 fw1 default webfarm 10.0.1.254 24 True
1 fw1 default mgmt 172.29.132.100 24 False
2 fw1 default inside 10.0.0.13 30 True
3 fw1 default outside 192.168.0.254 24 True
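As a rough illustration of working with these results, the sketch below picks out the active interfaces from the rows shown above. To keep it runnable without a Batfish service, the rows are hard-coded as a list of dictionaries, which is roughly the shape that pandas' `DataFrame.to_dict("records")` would give you; the helper name is made up for the example.

```python
# Rows copied from the ipOwners output above, hard-coded so that the
# example runs without a Batfish service.
ip_owners = [
    {"Node": "fw1", "Interface": "webfarm", "IP": "10.0.1.254", "Mask": 24, "Active": True},
    {"Node": "fw1", "Interface": "mgmt", "IP": "172.29.132.100", "Mask": 24, "Active": False},
    {"Node": "fw1", "Interface": "inside", "IP": "10.0.0.13", "Mask": 30, "Active": True},
    {"Node": "fw1", "Interface": "outside", "IP": "192.168.0.254", "Mask": 24, "Active": True},
]


def active_interfaces(rows):
    """Return the interface names whose 'Active' column is True."""
    return [row["Interface"] for row in rows if row["Active"]]


active = active_interfaces(ip_owners)
print(active)
```

With a live answer you could apply the same filter directly to the DataFrame rather than converting it to records first.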
As mentioned in the previous section, there are numerous questions available. This can also be seen by printing the names (questions) within the bfq namespace, like so:
>>> from pprint import pprint
>>> pprint(dir(bfq))
['__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'aaaAuthenticationLogin',
'bgpEdges',
'bgpPeerConfiguration',
'bgpProcessConfiguration',
'bgpSessionCompatibility',
'bgpSessionStatus',
'bidirectionalReachability',
'bidirectionalTraceroute',
'compareFilters',
'definedStructures',
'detectLoops',
'differentialReachability',
'edges',
'eigrpEdges',
'evpnL3VniProperties',
'f5BigipVipConfiguration',
'fileParseStatus',
'filterLineReachability',
'filterTable',
'findMatchingFilterLines',
'initIssues',
'interfaceMtu',
'interfaceProperties',
'ipOwners',
'ipsecEdges',
'ipsecSessionStatus',
'isisEdges',
...
From this list you will see 2 questions – filterLineReachability and compareFilters. These questions will form the basis of our ACL auditor.
Creating an ACL Auditor
We will now look at how to build an ACL auditor. We will be using the pre-built environment from the ntc-soteria repo (https://github.com/networktocode/ntc-soteria) that we used previously to run a simple Batfish example.
Many of you may be asking, what’s with the strange name ntc-soteria? Well, in Greek mythology, Soteria was the goddess or spirit (daimon) of safety and salvation, deliverance, and preservation from harm.
ACL Auditor Overview
Our ACL auditor will be a CLI based tool, written in Python, powered by Batfish and will provide two types of audits:
- Differential – Compares and reports the differences between a set of reference flows and a configured (implemented) ACL. Reference flows are 5-tuple policy definitions that define what should be permitted or denied by the firewall. By calculating and reporting the difference between the reference flows and implemented flows, we can ensure no unintended traffic is being permitted (or denied) by the firewall. This will be performed via the Batfish question compareFilters.
- Unreachable Entries – Reports any entries within an ACL that will never be hit due to being shadowed by prior lines within the ACL. This will be performed via the Batfish question filterLineReachability.
Audit Types
Let’s look at each audit type in more detail.
Differential
This audit takes 3 pieces of information: a single YAML file containing a set of reference flows, the configuration of your firewall, and the name of the ACL in question. It then compares your reference flows and implemented flows to provide you with a set of results showing the differences. The results include:
- flows that your firewall IS permitting or denying but should not as they are not included in your reference flow definition.
- flows that your firewall IS NOT permitting or denying but should as they are included in your reference flow definition.
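Conceptually, these two result types are two set differences. The sketch below illustrates this with flows modelled as simple tuples that mirror the fields of the reference flow file; it is a simplification for intuition only, since Batfish compares the full set of packets each filter matches rather than literal tuples. The `Flow` type, the `diff_flows` helper, and the sample flows are invented for this example.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    source_ip: str
    dest_ip: str
    dest_port: int
    proto: str
    action: str


def diff_flows(reference, implemented):
    """Return (unexpected, missing) as sets of flows."""
    reference_set, implemented_set = set(reference), set(implemented)
    unexpected = implemented_set - reference_set  # on the firewall, not in the policy
    missing = reference_set - implemented_set     # in the policy, not on the firewall
    return unexpected, missing


reference_flows = [
    Flow("10.0.1.1/32", "8.8.8.8/32", 53, "udp", "permit"),
    Flow("10.0.1.1/32", "10.200.1.1/32", 3306, "tcp", "permit"),
]
implemented_flows = [
    Flow("10.0.1.1/32", "8.8.8.8/32", 53, "udp", "permit"),
    Flow("10.0.1.1/32", "8.8.4.4/32", 53, "udp", "permit"),
]

unexpected, missing = diff_flows(reference_flows, implemented_flows)
print("Unexpected:", unexpected)
print("Missing:", missing)
```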
Some use cases for this audit include:
- Preventing human error during firewall changes. For example, incorrect addition of an “ip any any.”
- Adding to the previous point, you can also add this to your ACL CI pipelines.
- Allows you to run routine scripted checks against your ACL base to ensure no flows are opened incorrectly (for example by bad actors).
Unreachable
This check takes a firewall configuration containing your ACL rule sets. It then reports on any lines that will not match any packet, because of being shadowed by prior lines. The key use cases for this are:
- Prevent human error during firewall changes. For example, incorrect placement of an encompassing deny rule.
- Adding to the previous point, you can also add this to your ACL CI pipelines.
- Assist in keeping your ACL rule sets minimal and free of unnecessary lines. This helps in clarity and also can reduce firewall overhead.
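For the CI pipeline use case, one possible approach is to turn the audit findings into a process exit code, so that a pipeline stage fails whenever anything is reported. The sketch below assumes the findings arrive as a list of rows; the function name and the row fields are hypothetical and are not part of ntc-soteria.

```python
def audit_exit_code(findings):
    """Return 0 when the audit is clean and 1 when it reported findings."""
    for finding in findings:
        print(f"FAIL: {finding['line']} (blocked by {finding['blocked_by']})")
    return 1 if findings else 0


# A clean audit and a failing audit, using made-up finding rows.
clean_code = audit_exit_code([])
failed_code = audit_exit_code(
    [
        {
            "line": "permit udp host 10.0.2.1 host 8.8.8.8 eq domain",
            "blocked_by": "deny ip any4 any4",
        }
    ]
)
print(clean_code, failed_code)

# In a real pipeline script you would finish with sys.exit(<exit code>),
# so that the CI job is marked as failed on any finding.
```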
The Code
Code Layout/Files
From the shell you entered during the earlier Batfish example, you will see the following code structure for our tool.
tree .
.
├── Dockerfile // How to assemble the Docker image.
├── Makefile // Set of shell shortcuts. See avail via `make`.
├── README.md // Details about repo.
├── acl_auditor
│ ├── __init__.py
│ ├── auditor.py // Main script file.
│ ├── helpers.py // Various helpers (file, acl generators).
│ ├── report.j2 // HTML report jinja2 template.
│ └── reporter.py // Formats outputs, and renders outputs/report.
├── data
│ ├── asa.cfg // Example ASA configuration.
│ ├── csr.cfg // Example CSR configuration.
│ ├── flows.yml // Example flow reference.
│ ├── report-example.png // Example image of HTML report.
│ └── report.html // Example HTML report.
├── docker-compose.yml // Docker environment definition.
├── poetry.lock // Package management file for Poetry.
├── pyproject.toml // Package management file for Poetry.
└── tests
    ├── __init__.py
    ├── test_config.cfg // Test config for unit tests.
    ├── test_flows.yml // Test flows for unit tests.
    └── unit
        ├── __init__.py
        └── test_helpers.py // Unit tests
Based on the files above, at a high level we will:
- Via the CLI, run the auditor.py module and pass in a set of inputs. Example inputs have been included within the data directory. auditor.py contains a class, ACLAuditor. This class contains various methods for performing the required Batfish actions.
- Once the Batfish operations have been performed, the results will be parsed and formatted via the reporter.py module, for output via the CLI and/or HTML.
A visual representation of this is below.
Unreachable Entry Audit
Let’s now look at how we build our unreachable audit. As mentioned previously this audit will report on any ACL entries that are shadowed by another ACL rule, and therefore would never be hit. To calculate this result we will use the Batfish question:
bfq.filterLineReachability().answer().frame()
Below shows an overview of the steps that we will perform within this audit.
Build Batfish Session
As with the differential audit covered later, the Batfish session is created at the point of ACLAuditor instantiation. Like so:
./acl_auditor/auditor.py
...
class ACLAuditor:
    def __init__(self, config_file):
        bf_session.host = "batfish"
        load_questions()
        self.config_file = config_file
Create Snapshot
Next, we need to create a snapshot using our device configuration. We use the same method as we used before, as shown below:
./acl_auditor/auditor.py
...
    def _create_base_snapshot(self):
        bf_session.init_snapshot_from_text(
            self.config_file, snapshot_name="base", overwrite=True
        )
Query Batfish
We now query Batfish via the bfq.filterLineReachability() question, like so:
./acl_auditor/auditor.py
...
    def get_unreachable_lines(self):
        ...
        return bfq.filterLineReachability().answer()
Output Reports
We then pass our results into various reporting functions within reporter.py, which format the output and also deal with rendering the HTML template using Jinja2.
Example
We will again use the example ASA configuration supplied. Within this configuration let’s focus in on the following ACL:
access-list acl-inside extended deny ip any4 any4
access-list acl-inside extended permit udp host 10.0.2.1 host 8.8.8.8 eq domain
access-list acl-inside extended permit udp host 10.0.2.1 host 8.8.4.4 eq domain
When we run the audit, we get the following results:
./acl_auditor/auditor.py -c unreachable -d data/asa.cfg
+---------------------+-------------------------------------------------+---------------------------+-----------------------+----------------+
| Sources | Unreachable Line | Unreachable Line Action | Blocking Lines | Reason |
|---------------------+-------------------------------------------------+---------------------------+-----------------------+----------------+
| ['fw1: acl-inside'] | permit udp host 10.0.2.1 host 8.8.4.4 eq domain | PERMIT | ['deny ip any4 any4'] | BLOCKING_LINES |
| ['fw1: acl-inside'] | permit udp host 10.0.2.1 host 8.8.8.8 eq domain | PERMIT | ['deny ip any4 any4'] | BLOCKING_LINES |
+---------------------+-------------------------------------------------+---------------------------+-----------------------+----------------+
Here we can see that the line deny ip any4 any4 is blocking the 2 lines for DNS access out to Google. Great!
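To build some intuition for why those two lines are reported, here is a deliberately simplified shadowing check. It only understands a protocol plus source and destination networks, ignores ports entirely, and models `any4` as 0.0.0.0/0; Batfish's own analysis is far more thorough. The `AclLine` type and both helper functions are invented for this sketch.

```python
from ipaddress import ip_network
from typing import NamedTuple


class AclLine(NamedTuple):
    action: str
    proto: str  # "ip" is treated as matching every protocol
    src: str
    dst: str


def covers(earlier: AclLine, later: AclLine) -> bool:
    """True if every packet matched by `later` is also matched by `earlier`."""
    proto_ok = earlier.proto == "ip" or earlier.proto == later.proto
    src_ok = ip_network(later.src).subnet_of(ip_network(earlier.src))
    dst_ok = ip_network(later.dst).subnet_of(ip_network(earlier.dst))
    return proto_ok and src_ok and dst_ok


def find_shadowed(acl):
    """Return (line, first blocking line) pairs for lines that can never match."""
    shadowed = []
    for index, line in enumerate(acl):
        blockers = [earlier for earlier in acl[:index] if covers(earlier, line)]
        if blockers:
            shadowed.append((line, blockers[0]))
    return shadowed


# The acl-inside example from above, with any4 written as 0.0.0.0/0.
acl_inside = [
    AclLine("deny", "ip", "0.0.0.0/0", "0.0.0.0/0"),
    AclLine("permit", "udp", "10.0.2.1/32", "8.8.8.8/32"),
    AclLine("permit", "udp", "10.0.2.1/32", "8.8.4.4/32"),
]

shadowed = find_shadowed(acl_inside)
for line, blocker in shadowed:
    print(f"{line} is shadowed by {blocker}")
```

Because ACLs are evaluated top-down with first match wins, any line fully covered by an earlier line can never be hit, which is exactly what the leading deny does to both permit entries here.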
Differential Audit
So how do we use Batfish to perform a differential audit? That is, how do we compare and report on the differences between a set of reference flows and an ACL? In short, we use the Batfish question bfq.compareFilters(). The question takes a node name, along with 2 snapshots containing your ACLs, and then returns the differences.
bfq.compareFilters(nodes='rtr-with-acl').answer(snapshot='filters-change',reference_snapshot='filters').frame()
Unlike the previous audit this one is a little more advanced. Below shows the steps involved. To summarize we will:
- Create a snapshot from our device configuration.
- Convert our YAML reference flows into an ACL.
- Create a reference snapshot using the reference ACL.
- Compare the 2 snapshots.
- Return the results.
Let’s step through the key steps and code:
Build Batfish Session
Our Batfish session will be built within the constructor of the ACLAuditor class. Like so:
./acl_auditor/auditor.py
class ACLAuditor:
    def __init__(self, config_file):
        bf_session.host = "batfish"
        load_questions()
        self.config_file = config_file
    ...
Convert Reference Flows
First we take a set of reference flows, that we have defined as YAML (as shown below), and convert them into an ACL based format.
./data/flows.yml
---
- source_ip: 10.0.1.1/32
  dest_ip: 8.8.8.8/32
  dest_port: 53
  proto: udp
  action: permit
- source_ip: 10.0.1.1/32
  dest_ip: 10.200.1.1/32
  dest_port: 3306
  proto: tcp
  action: permit
For this we use the YAML-to-ACL converter helper functions found within helpers.py – generate_acl_syntax_juniper_srx().
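The real conversion lives in `helpers.py`, and the sketch below is not that code. It is a hypothetical converter that turns flows of the shape above into Junos-style firewall filter `set` commands, purely to illustrate the idea of rendering structured flows as ACL text. The function name, the term naming, and the exact output format are all assumptions.

```python
def flows_to_junos_filter(flows, filter_name):
    """Render flow dictionaries as Junos-style firewall filter 'set' lines."""
    actions = {"permit": "accept", "deny": "discard"}
    lines = []
    for index, flow in enumerate(flows, start=1):
        prefix = f"set firewall family inet filter {filter_name} term flow{index}"
        lines.append(f"{prefix} from source-address {flow['source_ip']}")
        lines.append(f"{prefix} from destination-address {flow['dest_ip']}")
        lines.append(f"{prefix} from protocol {flow['proto']}")
        lines.append(f"{prefix} from destination-port {flow['dest_port']}")
        lines.append(f"{prefix} then {actions[flow['action']]}")
    return lines


# The reference flows from flows.yml, written as the list of dictionaries
# that a YAML parser would typically return.
reference_flows = [
    {"source_ip": "10.0.1.1/32", "dest_ip": "8.8.8.8/32",
     "dest_port": 53, "proto": "udp", "action": "permit"},
    {"source_ip": "10.0.1.1/32", "dest_ip": "10.200.1.1/32",
     "dest_port": 3306, "proto": "tcp", "action": "permit"},
]

acl_lines = flows_to_junos_filter(reference_flows, "acl-webfarm")
print("\n".join(acl_lines))
```

Rendering the reference flows in a vendor syntax that Batfish can parse is what allows them to be loaded as a snapshot of their own and compared against the real device configuration.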
Create Snapshots
We now have our reference flows in an ACL based format. We will use this ACL to generate a reference snapshot. We will then use our device config to generate a base snapshot.
Like so:
...
    def _create_base_snapshot(self):
        bf_session.init_snapshot_from_text(
            self.config_file, snapshot_name="base", overwrite=True
        )

    def _create_reference_snapshot(self, hostname):
        platform = "juniper_srx"
        reference_acl = create_acl_from_yaml(
            self.flows_file, hostname, self.acl_name, platform
        )
        bf_session.init_snapshot_from_text(
            reference_acl,
            platform=platform,
            snapshot_name="reference",
            overwrite=True,
        )
        self.validate_reference_snapshot()
Query Batfish
With the 2 snapshots created, we can run our bfq.compareFilters() question, as shown below.
    def get_acl_differences(self, flows_file, acl_name):
        ...
        return bfq.compareFilters().answer(
            snapshot="base", reference_snapshot="reference"
        )
Output Reports
Once done, we pass our results into various reporting functions within reporter.py, which format the output and also deal with rendering the HTML template using Jinja2.
Example
Let’s take our reference flows, which are shown below. These are the flows that should be configured; nothing more, nothing less.
---
- source_ip: 10.0.1.1/32
  dest_ip: 8.8.8.8/32
  dest_port: 53
  proto: udp
  action: permit
- source_ip: 10.0.1.1/32
  dest_ip: 10.200.1.1/32
  dest_port: 3306
  proto: tcp
  action: permit
In this case, we will use an ASA configuration as our device config. Below shows the ACL in question:
access-list acl-webfarm extended permit tcp any host 10.0.2.1 eq 3306
access-list acl-webfarm extended permit udp host 10.0.1.1 host 8.8.8.8 eq domain
access-list acl-webfarm extended permit udp host 10.0.1.1 host 8.8.4.4 eq domain
access-list acl-webfarm extended deny ip any4 any4
When we run the audit we get the following results:
+------------------------+---------------------------------------------------------+---------------------------+-------------------------------------------------+
| Reference Flow Index | Reference Flow Content | Implemented Flow Action | Implemented Flow Content |
|------------------------+---------------------------------------------------------+---------------------------+-------------------------------------------------+
| 1 | "flow2 (10.0.1.1/32 any 10.200.1.1/32 3306 tcp permit)" | DENY | deny ip any4 any4 |
| No Match | | PERMIT | permit tcp any host 10.0.2.1 eq 3306 |
| No Match | | PERMIT | permit udp host 10.0.1.1 host 8.8.4.4 eq domain |
+------------------------+---------------------------------------------------------+---------------------------+-------------------------------------------------+
So the audit has returned 3 differences (failures). Let’s step through each one, line by line:
- The reference flow 10.0.1.1/32 any 10.200.1.1/32 3306 tcp permit is not permitted due to the implemented line deny ip any4 any4.
- The implemented ACL contains the line permit tcp any host 10.0.2.1 eq 3306. However, no match for this flow is found within the reference flows.
- Likewise, the implemented ACL contains the line permit udp host 10.0.1.1 host 8.8.4.4 eq domain. However, no match (No Match) for this flow is found within the reference flows.
Great, we have detected flows that should have been implemented and also flows that were incorrectly implemented.
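If you wanted to post-process results like these yourself, the rows could be split into the two failure types. The sketch below hard-codes the three rows from the table above as dictionaries and treats a `No Match` reference index as an implemented flow that has no counterpart in the reference flows; the shortened field names and the `classify` helper are hypothetical.

```python
# The three result rows from the table above, with shortened field names.
results = [
    {"reference_index": "1", "implemented_action": "DENY",
     "implemented_line": "deny ip any4 any4"},
    {"reference_index": "No Match", "implemented_action": "PERMIT",
     "implemented_line": "permit tcp any host 10.0.2.1 eq 3306"},
    {"reference_index": "No Match", "implemented_action": "PERMIT",
     "implemented_line": "permit udp host 10.0.1.1 host 8.8.4.4 eq domain"},
]


def classify(rows):
    """Split rows into reference flows not honoured and unexpected lines."""
    unexpected = [row for row in rows if row["reference_index"] == "No Match"]
    not_honoured = [row for row in rows if row["reference_index"] != "No Match"]
    return {"not_honoured": not_honoured, "unexpected": unexpected}


summary = classify(results)
print(len(summary["not_honoured"]), "reference flow(s) not honoured")
print(len(summary["unexpected"]), "unexpected implemented line(s)")
```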
HTML Report
We previously ran the audits individually, with output to just the CLI. However, I’ve also included the option to output the results as an HTML report, as shown below:
This report is generated via an additional -o html option when running the audits. For example:
./acl_auditor/auditor.py -c all -d data/asa.cfg -r data/flows.yml -a acl-inside -o html
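For readers curious how such a command line might be wired up, below is a minimal `argparse` sketch that accepts the same short flags used in the commands in this article. The long option names, the list of choices (including `differential`), and the defaults are assumptions made for the example and may differ from the actual `auditor.py`.

```python
import argparse


def build_parser():
    """Build a parser for the short flags seen in this article (sketch only)."""
    parser = argparse.ArgumentParser(description="ACL auditor (illustrative sketch)")
    parser.add_argument("-c", "--check", required=True,
                        choices=["differential", "unreachable", "all"],
                        help="which audit to run")
    parser.add_argument("-d", "--device-config", required=True,
                        help="path to the firewall configuration")
    parser.add_argument("-r", "--reference-flows",
                        help="path to the YAML reference flows")
    parser.add_argument("-a", "--acl-name",
                        help="name of the ACL to audit")
    parser.add_argument("-o", "--output", choices=["cli", "html"], default="cli",
                        help="output format")
    return parser


# Parse the same arguments as the command shown above.
args = build_parser().parse_args(
    ["-c", "all", "-d", "data/asa.cfg", "-r", "data/flows.yml",
     "-a", "acl-inside", "-o", "html"]
)
print(args.check, args.device_config, args.output)
```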
A detailed dive into how the template is constructed and rendered is outside the scope of this article. But the key points are:
- The report is rendered using a Jinja2 template (report.j2) via the reporter.py module.
- The HTML report generated uses the following Material/Bootstrap framework: https://fezvrasta.github.io/bootstrap-material-design/.
Thanks
A thanks goes out to Ratul Mahajan and Dan Halperin at Intentionet for their help and input into this tool.
References
- “A Hands-on Guide to Multi-Tiered Firewall … – PacketFlow.” 13 Dec. 2019, https://www.packetflow.co.uk/a-hands-on-guide-to-multi-tiered-firewall-changes-with-ansible-and-batfish-part-1/. Accessed 23 Jun. 2020.
- “A Hands-on Guide to Multi-Tiered Firewall … – PacketFlow.” 13 Dec. 2019, https://www.packetflow.co.uk/a-hands-on-guide-to-multi-tiered-firewall-changes-with-ansible-and-batfish-part-1/. Accessed 24 Jun. 2020.
Conclusion
I hope you have enjoyed reading this article as much as I have enjoyed writing it. When it comes to Batfish, I have only really scratched the surface of what you can do with flow validation. For example, this audit could be extended to check flows across multiple devices (think dual layer firewall topologies).
I hope this has provided you with a springboard into the world of Batfish, and network security based automation.
Thanks for reading.
-Rick Donato (@rickjdon)