One of the domains that I have worked in during my career is network security. And in this space, when it comes to firewalls, I’ve seen many problem areas such as:
Based on these pain points I wanted to write an ACL auditing tool based on Batfish, that would automate the checks needed to prevent these issues from occurring, whilst also providing you with a springboard into the world of Batfish and network security automation.
Why Batfish? Batfish provides a great open source, vendor agnostic way to validate ACLs, as we will dive into later.
Note: To fully follow this guide you will need to have both Docker and Docker Compose installed.
Let’s begin…
Batfish is an open-source network configuration analysis tool that provides the ability to validate configuration data, query network adjacencies, verify firewall ACL rule sets and also analyze routing/flow paths.1
Batfish runs as a service, i.e., as a Docker container. Snapshots of your network are then uploaded to the Batfish service. A snapshot is a collection of information that represents your network, such as device configurations, link/connectivity data, and server details such as IP addresses and iptables settings. Batfish therefore requires no direct access to your network and operates on a purely offline model.
Batfish then ingests your network snapshot and builds a series of internal vendor-agnostic models of your network. These models include not only configuration but also control plane state, such as BGP sessions. Questions are then issued to the Batfish service about your network via the Python SDK (pybatfish) or an Ansible Batfish role. Available question types include:
Furthermore, Batfish supports uploading multiple snapshots, which you can then compare against one another, as we will see later. Below is an example of using pybatfish to check the session status of BGP.
>>> bfq.bgpSessionStatus(nodes="/spine|leaf/").answer().frame()
status: TERMINATEDNORMALLY
.... Wed Jun 26 15:01:16 2019 DST Begin job.
Node VRF Local_AS Local_Interface Local_IP Remote_AS Remote_Node Remote_Interface Remote_IP Session_Type Established_Status
0 leaf1 default 64521 None 3.3.3.3 64520 spine1 None 1.1.1.1 EBGP_MULTIHOP ESTABLISHED
1 leaf1 default 64521 None 3.3.3.3 64520 spine2 None 2.2.2.2 EBGP_MULTIHOP ESTABLISHED
2 leaf2 default 64522 None 4.4.4.4 64520 spine1 None 1.1.1.1 EBGP_MULTIHOP ESTABLISHED
3 leaf2 default 64522 None 4.4.4.4 64520 spine2 None 2.2.2.2 EBGP_MULTIHOP ESTABLISHED
4 spine1 default 64520 None 1.1.1.1 64521 leaf1 None 3.3.3.3 EBGP_MULTIHOP ESTABLISHED
5 spine1 default 64520 None 1.1.1.1 64522 leaf2 None 4.4.4.4 EBGP_MULTIHOP ESTABLISHED
6 spine2 default 64520 None 2.2.2.2 64521 leaf1 None 3.3.3.3 EBGP_MULTIHOP ESTABLISHED
7 spine2 default 64520 None 2.2.2.2 64522 leaf2 None 4.4.4.4 EBGP_MULTIHOP ESTABLISHED
To install Batfish the following commands are run to pull down and then run our Batfish container image.2
docker pull batfish/allinone
docker run --name batfish -d -v batfish-data:/data -p 8888:8888 -p 9997:9997 -p 9996:9996 batfish/allinone
However, for this tutorial we can use a pre-built environment via docker-compose, using the following commands.
git clone git@github.com:networktocode/ntc-soteria.git -b v0.1
cd ntc-soteria
docker-compose build
docker-compose up -d
docker-compose exec ntc-soteria bash
Once run, you will have 2 running containers (Batfish and ntc-soteria) and will be placed into the shell of the ntc-soteria container. This container has pybatfish installed and has access to the Batfish container.
We will use ntc-soteria again when building our ACL auditor, and dive into it in more detail later in this guide.
Let’s look at a small example. From the ntc-soteria repo previously cloned, we will use an example Cisco ASA configuration and run a question against our Batfish service.
Next, we fire up our Python interpreter, import the required pybatfish modules and create a snapshot from the ASA configuration contained within the ./data directory.
from pybatfish.client.commands import bf_session
from pybatfish.question import bfq
from pybatfish.question.question import load_questions
from acl_auditor.helpers import read_file
asa_config = read_file('data/asa.cfg')
bf_session.host = 'batfish'
load_questions()
bf_session.init_snapshot_from_text(asa_config, snapshot_name="base", overwrite=True)
We can now start asking questions about our snapshot. Below we use the ipOwners question to get the IP details of the device. Note: answer() runs the question and returns the answer in a JSON format. frame() wraps the answer as a pandas DataFrame, which provides us with a data structure and various methods to parse, manipulate and iterate over the results.
>>> bfq.ipOwners().answer().frame()
status: TRYINGTOASSIGN
.... no task information
status: TERMINATEDNORMALLY
.... 2020-07-03 08:56:32.506000+01:00 Begin job.
Node VRF Interface IP Mask Active
0 fw1 default webfarm 10.0.1.254 24 True
1 fw1 default mgmt 172.29.132.100 24 False
2 fw1 default inside 10.0.0.13 30 True
3 fw1 default outside 192.168.0.254 24 True
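Because frame() returns a pandas DataFrame, the rows can be handed to ordinary Python once they are converted to dictionaries (for example with the DataFrame's to_dict("records") method). The sketch below is a minimal illustration and not part of ntc-soteria: inactive_interfaces is a hypothetical helper, and the sample rows are copied by hand from the output above so that it runs without a Batfish service.

```python
# Hypothetical helper for illustration; not part of pybatfish or ntc-soteria.
def inactive_interfaces(records):
    """Return (node, interface, ip) for every row whose Active flag is False."""
    return [
        (row["Node"], row["Interface"], row["IP"])
        for row in records
        if not row["Active"]
    ]


# Rows copied from the ipOwners output above. Against a live service they
# could instead come from: bfq.ipOwners().answer().frame().to_dict("records")
ip_owner_rows = [
    {"Node": "fw1", "VRF": "default", "Interface": "webfarm", "IP": "10.0.1.254", "Mask": 24, "Active": True},
    {"Node": "fw1", "VRF": "default", "Interface": "mgmt", "IP": "172.29.132.100", "Mask": 24, "Active": False},
    {"Node": "fw1", "VRF": "default", "Interface": "inside", "IP": "10.0.0.13", "Mask": 30, "Active": True},
    {"Node": "fw1", "VRF": "default", "Interface": "outside", "IP": "192.168.0.254", "Mask": 24, "Active": True},
]

print(inactive_interfaces(ip_owner_rows))
```

With the sample rows, only the mgmt interface is reported, which matches the single False value in the Active column above.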
As mentioned in the previous section, there are numerous questions available. These can be seen by printing the names (questions) within the bfq namespace, like so:
>>> from pprint import pprint
>>> pprint(dir(bfq))
['__builtins__',
'__cached__',
'__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'aaaAuthenticationLogin',
'bgpEdges',
'bgpPeerConfiguration',
'bgpProcessConfiguration',
'bgpSessionCompatibility',
'bgpSessionStatus',
'bidirectionalReachability',
'bidirectionalTraceroute',
'compareFilters',
'definedStructures',
'detectLoops',
'differentialReachability',
'edges',
'eigrpEdges',
'evpnL3VniProperties',
'f5BigipVipConfiguration',
'fileParseStatus',
'filterLineReachability',
'filterTable',
'findMatchingFilterLines',
'initIssues',
'interfaceMtu',
'interfaceProperties',
'ipOwners',
'ipsecEdges',
'ipsecSessionStatus',
'isisEdges',
...
From this list you will see 2 questions: filterLineReachability and compareFilters. These questions will form the basis of our ACL auditor.
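The listing above also includes module internals such as __doc__. A small filter keeps only the question names. This is a sketch under stated assumptions: public_names is a hypothetical helper, and a SimpleNamespace stands in for bfq so that the snippet runs without pybatfish. With a live session you would pass bfq itself after calling load_questions().

```python
from types import SimpleNamespace


# Hypothetical helper for illustration; not part of pybatfish.
def public_names(namespace):
    """Return the sorted attribute names that do not start with an underscore."""
    return sorted(name for name in dir(namespace) if not name.startswith("_"))


# Stand-in for bfq (an assumption made only so the example is self-contained).
fake_bfq = SimpleNamespace(
    compareFilters=None, filterLineReachability=None, ipOwners=None
)

print(public_names(fake_bfq))
```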
We will now look at how to build an ACL auditor. We will be using the pre-built ntc-soteria repo (https://github.com/networktocode/ntc-soteria) that we used previously to run a simple Batfish example.
Many of you may be asking, what’s with the strange name ntc-soteria? Well, in Greek mythology, Soteria was the goddess or spirit (daimon) of safety and salvation, deliverance, and preservation from harm.
Our ACL auditor will be a CLI-based tool, written in Python and powered by Batfish, and will provide two types of audits:
- Differential, based on the Batfish question compareFilters.
- Unreachable, based on the Batfish question filterLineReachability.
Let’s look at each audit type in more detail.
The differential audit takes 3 pieces of information: a single YAML file containing a set of reference flows, the configuration of your firewall, and the name of the ACL in question. It then compares your reference flows against the implemented flows and provides a set of results showing the differences. The results include:
Some use cases for this audit include:
The unreachable audit takes a firewall configuration containing your ACL rule sets. It then reports on any lines that will never match a packet because they are shadowed by prior lines. The key use cases for this are:
From the shell you entered during the earlier Batfish example, you will be presented with the following code structure for our tool.
tree .
.
├── Dockerfile // How to assemble the Docker image.
├── Makefile // Set of shell shortcuts. See avail via `make`.
├── README.md // Details about repo.
├── acl_auditor
│ ├── __init__.py
│ ├── auditor.py // Main script file.
│ ├── helpers.py // Various helpers (file, acl generators).
│ ├── report.j2 // HTML report jinja2 template.
│ └── reporter.py // Formats outputs, and renders outputs/report.
├── data
│ ├── asa.cfg // Example ASA configuration.
│ ├── csr.cfg // Example CSR configuration.
│ ├── flows.yml // Example flow reference.
│ ├── report-example.png // Example image of HTML report.
│ └── report.html // Example HTML report.
├── docker-compose.yml // Docker environment definition.
├── poetry.lock // Package management file for Poetry.
├── pyproject.toml // Package management file for Poetry.
└── tests
    ├── __init__.py
    ├── test_config.cfg // Test config for unit tests.
    ├── test_flows.yml // Test flows for unit tests.
    └── unit
        ├── __init__.py
        └── test_helpers.py // Unit tests
Based on the files above, at a high level we will:
- Run the auditor.py module and pass in a set of inputs. Example inputs have been included within the data directory.
- auditor.py contains a class, ACLAuditor. This class contains various methods for performing the required Batfish actions.
- The results are then passed to the reporter.py module, for output via the CLI and/or HTML.
A visual representation of this is below.
Let’s now look at how we build our unreachable audit. As mentioned previously this audit will report on any ACL entries that are shadowed by another ACL rule, and therefore would never be hit. To calculate this result we will use the Batfish question:
bfq.filterLineReachability().answer().frame()
Below shows an overview of the steps that we will perform within this audit.
As with our differential audit, the Batfish session is created at the point of ACLAuditor instantiation, like so:
./acl_auditor/auditor.py
...
class ACLAuditor:
    def __init__(self, config_file):
        bf_session.host = "batfish"
        load_questions()
        self.config_file = config_file
Next, we need to create a snapshot using our device configuration. We use the same method as we used before, as shown below:
./acl_auditor/auditor.py
...
    def _create_base_snapshot(self):
        bf_session.init_snapshot_from_text(
            self.config_file, snapshot_name="base", overwrite=True
        )
We now query Batfish via the bfq.filterLineReachability() question, like so:
./acl_auditor/auditor.py
...
    def get_unreachable_lines(self):
        ...
        return bfq.filterLineReachability().answer()
As with the differential audit, we then pass our results into various reporting functions within reporter.py, which format the output and also handle rendering the HTML template using Jinja2.
We will again use the example ASA configuration supplied. Within this configuration let’s focus in on the following ACL:
access-list acl-inside extended deny ip any4 any4
access-list acl-inside extended permit udp host 10.0.2.1 host 8.8.8.8 eq domain
access-list acl-inside extended permit udp host 10.0.2.1 host 8.8.4.4 eq domain
When we run the audit, we get the following results:
./acl_auditor/auditor.py -c unreachable -d data/asa.cfg
+---------------------+-------------------------------------------------+---------------------------+-----------------------+----------------+
| Sources | Unreachable Line | Unreachable Line Action | Blocking Lines | Reason |
|---------------------+-------------------------------------------------+---------------------------+-----------------------+----------------+
| ['fw1: acl-inside'] | permit udp host 10.0.2.1 host 8.8.4.4 eq domain | PERMIT | ['deny ip any4 any4'] | BLOCKING_LINES |
| ['fw1: acl-inside'] | permit udp host 10.0.2.1 host 8.8.8.8 eq domain | PERMIT | ['deny ip any4 any4'] | BLOCKING_LINES |
+---------------------+-------------------------------------------------+---------------------------+-----------------------+----------------+
Here we can see that the line deny ip any4 any4 is blocking the 2 lines for DNS access out to Google. Great!
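To make "shadowed" concrete, here is a deliberately simplified, pure-Python model of first-match ACL evaluation. It is not how Batfish works internally, and the Rule type and the covers and shadowed_lines functions are hypothetical names invented for this sketch. It only detects the case where a single earlier line fully covers a later one, it assumes IPv4, and it ignores ports entirely.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class Rule:
    """Toy ACL line (hypothetical type, for illustration only)."""
    action: str  # "permit" or "deny"
    proto: str   # "ip" matches every protocol
    src: str     # CIDR notation, e.g. "10.0.2.1/32"
    dst: str     # CIDR notation


def covers(earlier, later):
    """True if every packet matched by `later` is also matched by `earlier`."""
    proto_ok = earlier.proto == "ip" or earlier.proto == later.proto
    src_ok = ip_network(later.src).subnet_of(ip_network(earlier.src))
    dst_ok = ip_network(later.dst).subnet_of(ip_network(earlier.dst))
    return proto_ok and src_ok and dst_ok


def shadowed_lines(rules):
    """Return (index, blocking_index) for each rule covered by an earlier rule."""
    found = []
    for i, rule in enumerate(rules):
        for j in range(i):
            if covers(rules[j], rule):
                found.append((i, j))
                break  # the first blocking line is enough for this sketch
    return found


# The acl-inside lines from above, with "any4" written as 0.0.0.0/0.
acl_inside = [
    Rule("deny", "ip", "0.0.0.0/0", "0.0.0.0/0"),
    Rule("permit", "udp", "10.0.2.1/32", "8.8.8.8/32"),
    Rule("permit", "udp", "10.0.2.1/32", "8.8.4.4/32"),
]

print(shadowed_lines(acl_inside))
```

In this toy model both permit lines are reported as covered by line 0, mirroring the two rows in the audit output. Moving the deny line to the end of the list leaves nothing shadowed.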
So how do we use Batfish to perform a differential audit? That is, how do we compare and report on the differences between a set of reference flows and an ACL? In short, we use the Batfish question bfq.compareFilters(). The question takes a node name, along with 2 snapshots containing your ACLs, and returns the differences.
bfq.compareFilters(nodes='rtr-with-acl').answer(snapshot='filters-change',reference_snapshot='filters').frame()
Unlike the previous audit this one is a little more advanced. Below shows the steps involved. To summarize we will:
Let’s step through the key steps and code:
Our Batfish session will be built within the constructor of the ACLAuditor class, like so:
./acl_auditor/auditor.py
class ACLAuditor:
    def __init__(self, config_file):
        bf_session.host = "batfish"
        load_questions()
        self.config_file = config_file
    ...
First we take a set of reference flows, that we have defined as YAML (as shown below), and convert them into an ACL based format.
./data/flows.yml
---
- source_ip: 10.0.1.1/32
  dest_ip: 8.8.8.8/32
  dest_port: 53
  proto: udp
  action: permit
- source_ip: 10.0.1.1/32
  dest_ip: 10.200.1.1/32
  dest_port: 3306
  proto: tcp
  action: permit
For this we use the YAML-to-ACL converter helper functions found within helpers.py (generate_acl_syntax_juniper_srx()).
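To illustrate the idea of turning a reference flow into ACL syntax, here is a minimal sketch. It is not the repo's generate_acl_syntax_juniper_srx() helper, whose output is not shown in this article. Instead it targets the Cisco ASA style of line seen elsewhere in this guide, and asa_address and flow_to_asa_line are hypothetical names. The flows are given as plain dictionaries, as a YAML loader would produce them from flows.yml, and IPv4 is assumed.

```python
from ipaddress import ip_network


def asa_address(cidr):
    """Render an IPv4 CIDR string the way ASA access-lists spell addresses."""
    net = ip_network(cidr)
    if net.prefixlen == 0:
        return "any4"
    if net.prefixlen == net.max_prefixlen:
        return f"host {net.network_address}"
    return f"{net.network_address} {net.netmask}"


def flow_to_asa_line(acl_name, flow):
    """Build one extended access-list line from a reference flow dictionary."""
    return (
        f"access-list {acl_name} extended {flow['action']} {flow['proto']} "
        f"{asa_address(flow['source_ip'])} {asa_address(flow['dest_ip'])} "
        f"eq {flow['dest_port']}"
    )


# The two reference flows from flows.yml, already parsed into dictionaries.
reference_flows = [
    {"source_ip": "10.0.1.1/32", "dest_ip": "8.8.8.8/32",
     "dest_port": 53, "proto": "udp", "action": "permit"},
    {"source_ip": "10.0.1.1/32", "dest_ip": "10.200.1.1/32",
     "dest_port": 3306, "proto": "tcp", "action": "permit"},
]

for flow in reference_flows:
    print(flow_to_asa_line("acl-webfarm", flow))
```

Generating the reference ACL as text is what allows it to be loaded as its own snapshot and compared against the device configuration.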
We now have our reference flows in an ACL based format. We will use this ACL to generate a reference snapshot. We will then use our device config to generate a base snapshot.
Like so:
...
    def _create_base_snapshot(self):
        bf_session.init_snapshot_from_text(
            self.config_file, snapshot_name="base", overwrite=True
        )

    def _create_reference_snapshot(self, hostname):
        platform = "juniper_srx"
        reference_acl = create_acl_from_yaml(
            self.flows_file, hostname, self.acl_name, platform
        )
        bf_session.init_snapshot_from_text(
            reference_acl,
            platform=platform,
            snapshot_name="reference",
            overwrite=True,
        )
        self.validate_reference_snapshot()
With the 2 snapshots created, we can run our bfq.compareFilters() question, as shown below.
    def get_acl_differences(self, flows_file, acl_name):
        ...
        return bfq.compareFilters().answer(
            snapshot="base", reference_snapshot="reference"
        )
Once done, we then pass our results into various reporting functions within reporter.py, which format the output and also handle rendering the HTML template using Jinja2.
Let’s take our reference flows, which are shown below. These are the flows that should be configured; nothing more, nothing less.
---
- source_ip: 10.0.1.1/32
  dest_ip: 8.8.8.8/32
  dest_port: 53
  proto: udp
  action: permit
- source_ip: 10.0.1.1/32
  dest_ip: 10.200.1.1/32
  dest_port: 3306
  proto: tcp
  action: permit
In this case, we will use an ASA configuration as our device config. Below shows the ACL in question:
access-list acl-webfarm extended permit tcp any host 10.0.2.1 eq 3306
access-list acl-webfarm extended permit udp host 10.0.1.1 host 8.8.8.8 eq domain
access-list acl-webfarm extended permit udp host 10.0.1.1 host 8.8.4.4 eq domain
access-list acl-webfarm extended deny ip any4 any4
When we run the audit we get the following results:
+------------------------+---------------------------------------------------------+---------------------------+-------------------------------------------------+
| Reference Flow Index | Reference Flow Content | Implemented Flow Action | Implemented Flow Content |
|------------------------+---------------------------------------------------------+---------------------------+-------------------------------------------------+
| 1 | "flow2 (10.0.1.1/32 any 10.200.1.1/32 3306 tcp permit)" | DENY | deny ip any4 any4 |
| No Match | | PERMIT | permit tcp any host 10.0.2.1 eq 3306 |
| No Match | | PERMIT | permit udp host 10.0.1.1 host 8.8.4.4 eq domain |
+------------------------+---------------------------------------------------------+---------------------------+-------------------------------------------------+
So we have 3 differences (failures) that the audit has returned. Let’s step through each one, line by line:
- The reference flow 10.0.1.1/32 any 10.200.1.1/32 3306 tcp permit is not permitted, due to the implemented line deny ip any4 any4.
- The ACL implements the line permit tcp any host 10.0.2.1 eq 3306. However, no match for this flow is found within the reference flows.
- The ACL implements the line permit udp host 10.0.1.1 host 8.8.4.4 eq domain. However, no match for this flow is found within the reference flows.
Great, we have detected flows that should have been implemented and also flows that were incorrectly implemented.
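The shape of this result can be mimicked with a toy exact-match model: treat each permitted flow as a tuple and take set differences in both directions. This is only a sketch of the idea. Batfish's compareFilters reasons about the packets each filter matches rather than comparing tuples, and diff_flows is a hypothetical name.

```python
def diff_flows(reference, implemented):
    """Return (missing, unexpected) permitted flows using exact tuple matching."""
    missing = sorted(set(reference) - set(implemented))      # wanted, not configured
    unexpected = sorted(set(implemented) - set(reference))   # configured, not wanted
    return missing, unexpected


# (source, destination, protocol, destination port)
reference = [
    ("10.0.1.1/32", "8.8.8.8/32", "udp", 53),
    ("10.0.1.1/32", "10.200.1.1/32", "tcp", 3306),
]

# The permit lines of acl-webfarm, with "any" written as 0.0.0.0/0.
implemented = [
    ("0.0.0.0/0", "10.0.2.1/32", "tcp", 3306),
    ("10.0.1.1/32", "8.8.8.8/32", "udp", 53),
    ("10.0.1.1/32", "8.8.4.4/32", "udp", 53),
]

missing, unexpected = diff_flows(reference, implemented)
print("missing:", missing)
print("unexpected:", unexpected)
```

One missing flow plus two unexpected flows gives the same 3 differences that the audit reported. The exact-match shortcut would miss partial overlaps, which is where a semantic comparison earns its keep.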
We previously ran the audits individually out to just the CLI. However, I’ve also included the option to output the results as an HTML template, as shown below:
This report is generated via an additional -o html option when running both audits. For example:
./acl_auditor/auditor.py -c all -d data/asa.cfg -r data/flows.yml -a acl-inside -o html
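This article does not show how auditor.py parses its options, so the following is only a sketch of how such a command line could be modelled with argparse from the standard library. The short flags come from the commands shown in this guide. The long option names, the "differential" and "cli" choices, and the defaults are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the short flags used in this guide."""
    parser = argparse.ArgumentParser(description="ACL auditor (illustrative sketch)")
    # "unreachable" and "all" appear in this guide; "differential" is assumed.
    parser.add_argument("-c", "--check", required=True,
                        choices=["differential", "unreachable", "all"])
    parser.add_argument("-d", "--device-config", required=True,
                        help="path to the firewall configuration")
    parser.add_argument("-r", "--reference-flows",
                        help="path to the YAML reference flows")
    parser.add_argument("-a", "--acl-name",
                        help="name of the ACL to audit")
    # "html" appears in this guide; "cli" as the default is assumed.
    parser.add_argument("-o", "--output", choices=["cli", "html"], default="cli")
    return parser


args = build_parser().parse_args(
    ["-c", "all", "-d", "data/asa.cfg", "-r", "data/flows.yml",
     "-a", "acl-inside", "-o", "html"]
)
print(args.check, args.device_config, args.acl_name, args.output)
```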
A detailed dive into how the template is constructed and rendered is outside the scope of this article. But the key points are:
- The HTML report is rendered from a Jinja2 template (report.j2) via the reporter.py module.
A thanks goes out to Ratul Mahajan and Dan Halperin at Intentionet for their help and input into this tool.
I hope you have enjoyed reading this article as much as I have enjoyed writing it. When it comes to Batfish, I have only really scratched the surface in what you can perform when it comes to flow validation. For example, this audit could be extended to check flows across multiple devices (think dual layer firewall topologies).
I hope this has provided you with a springboard into the world of Batfish, and network security based automation.
Thanks for reading.
-Rick Donato (@rickjdon)