Ansible – BGP Networking Troubleshooting Guide

Network troubleshooting is a common automation use case. Network outages are costly, and diagnosing them often requires engineers to log in to network equipment and investigate by hand. While working on network operations teams, I quickly noticed that troubleshooting network problems follows a playbook of repeatable steps, which makes it a natural fit for automation with Ansible.

Use Case – BGP

Troubleshooting Layer 3 connectivity tends to lead an operations engineer to jump into multiple routers and check routing. Let’s say internet access has been lost from the WAN edge. If I were troubleshooting this, my instincts would tell me to go to my edge router(s) and check the BGP neighbor going towards my ISP.

east-rtr#show ip bgp summary

<...output omitted...>

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
4.4.4.4      4   400       0       0        0    0    0 08:11    Idle

The output of show ip bgp summary reveals the issue: BGP is down toward the ISP. This is a simplified example with one router and one WAN connection, but what happens if you have 10, 15, or more BGP relationships to check? Manually logging in to each router to check the status of BGP is costly. How can Ansible help?

Checking BGP with Ansible

Here is a sequential listing of what the Ansible playbook is doing.

  1. Run show ip bgp summary on the ISP routers and collect the output.
  2. Use napalm-ansible to get BGP facts on the neighbors for easy reporting.
  3. Render an easy-to-consume BGP neighbor status report for each device using a Jinja2 template.
  4. Assemble all the device reports into a single overview report.
  5. Iterate through the neighbors and, if a neighbor is down, ping its IP address with napalm_ping to verify Layer 3 reachability.

Pre-req

There needs to be a valid Ansible inventory, either a static inventory file or dynamic inventories utilizing an existing SoT (Source of Truth). For demonstration purposes a static file will be used.

inventory.cfg

[isp_routers]

[isp_routers:vars]
ansible_network_os=ios

[isp_routers:children]
east_isp
west_isp

[east_isp]
east-rtr

[west_isp]
west-rtr

For help building an inventory file, see Ansible Inventory.
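The same groups can also be written in Ansible's YAML inventory format. The sketch below mirrors inventory.cfg; the filename inventory.yml is only an assumption for illustration.

```yaml
# inventory.yml (hypothetical filename): YAML equivalent of inventory.cfg
all:
  children:
    isp_routers:
      vars:
        ansible_network_os: ios
      children:
        east_isp:
          hosts:
            east-rtr:
        west_isp:
          hosts:
            west-rtr:
```

Either format can be passed to ansible-playbook with the -i option.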

Step 1

Create a simple playbook to execute show ip bgp summary on all of the routers in the isp_routers group.

---
- name: "PLAY:1 - GET BGP SUMMARY"
  gather_facts: False
  connection: "network_cli"
  hosts: "isp_routers"
  tasks:
  - name: "TASK:1 - 'SHOW IP BGP SUMMARY'"
    ios_command:
      commands: "show ip bgp summary"
    register: "output_ios"
  - name: "TASK:2 - PRINT BGP OUTPUT"
    debug:
      msg: "{{ output_ios.stdout[0] }}"

Running the playbook results in the following output.

▶ ansible-playbook pb.yml -u ntc -k
SSH password: 

PLAY [PLAY:1 - GET BGP SUMMARY] **************************************************************************************************************************************************************************************

TASK [TASK:1 - 'SHOW IP BGP SUMMARY'] ********************************************************************************************************************************************************************************

ok: [east-rtr]
ok: [west-rtr]

TASK [TASK:2 - PRINT BGP OUTPUT] *************************************************************************************************************************************************************************************
ok: [east-rtr] => {
    "msg": "BGP router identifier 1.1.1.1, local AS number 100\nBGP table version is 416, main routing table version 416\n28 network entries using 6944 bytes of memory\n41 path entries using 5576 bytes of memory\n8/7 BGP path/bestpath attribute entries using 2304 bytes of memory\n4 BGP AS-PATH entries using 128 bytes of memory\n0 BGP route-map cache entries using 0 bytes of memory\n0 BGP filter-list cache entries using 0 bytes of memory\nBGP using 14952 total bytes of memory\nBGP activity 124/96 prefixes, 232/191 paths, scan interval 60 secs\n32 networks peaked at 23:40:21 Jan 7 2021 UTC (6w5d ago)\n\nNeighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd\n4.4.4.4      4   400       0       0        0    0    0 08:21    Idle"
}
ok: [west-rtr] => {
    "msg": "BGP router identifier 2.2.2.2, local AS number 100\nBGP table version is 579, main routing table version 579\n28 network entries using 6944 bytes of memory\n41 path entries using 5576 bytes of memory\n8/7 BGP path/bestpath attribute entries using 2304 bytes of memory\n4 BGP AS-PATH entries using 128 bytes of memory\n0 BGP route-map cache entries using 0 bytes of memory\n0 BGP filter-list cache entries using 0 bytes of memory\nBGP using 14952 total bytes of memory\nBGP activity 158/130 prefixes, 267/226 paths, scan interval 60 secs\n32 networks peaked at 23:40:21 Jan 7 2021 UTC (6w5d ago)\n\nNeighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd\n8.8.8.8      4   400       0       0        0    0    0 18:52    1"
}

PLAY RECAP ***********************************************************************************************************************************************************************************************************
east-rtr   : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
west-rtr   : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0    

At this point you have a single pane to quickly check all the BGP neighbors; however, the raw output is hard to read. To take this playbook to the next level, the command output can be turned into structured data using one of the various CLI parsing modules.
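As one possible illustration of CLI parsing, the sketch below uses the cli_parse module with the ntc_templates parser. It is not part of the original playbook and assumes the ansible.utils and ansible.netcommon collections and the ntc-templates Python package are installed on the control node. The names bgp_parsed and the explicit os value are assumptions made for this example.

```yaml
# Sketch only: requires the ansible.utils and ansible.netcommon collections
# plus the ntc-templates Python package on the control node.
- name: "PARSE 'SHOW IP BGP SUMMARY' INTO STRUCTURED DATA"
  ansible.utils.cli_parse:
    command: "show ip bgp summary"
    parser:
      name: ansible.netcommon.ntc_templates
      os: cisco_ios  # assumed platform key used to look up the template
  register: "bgp_parsed"  # hypothetical variable name

- name: "PRINT THE PARSED RESULT"
  debug:
    var: bgp_parsed.parsed
```

The exact field names in the parsed result depend on the template version, so it is worth printing the structure once before building logic on top of it.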

What if more information is needed? You could check route counts or layer 3 reachability.

Let’s dig into this use case further.

Step 2

Use the napalm_get_facts module from napalm-ansible to gather BGP neighbor facts.

Note: For readability, the remaining tasks are placed in PLAY:2 of the playbook.

- name: "PLAY:2 - USE NAPALM BGP FACTS"
  gather_facts: False
  connection: "network_cli"
  hosts: "isp_routers"
  tasks:
  - name: "TASK:1 - 'GET BGP FACTS'"
    napalm_get_facts:
      filter:
        - "bgp_neighbors"
    register: "bgp"
  - debug: var=bgp

Results in:

▶ ansible-playbook napalm_pb.yml -u ntc -k
SSH password: 

PLAY [PLAY:2 - USE NAPALM BGP FACTS] *********************************************************************************************************************************************************************************

TASK [TASK:1 - 'GET BGP FACTS'] **************************************************************************************************************************************************************************************
ok: [east-rtr]
ok: [west-rtr]

TASK [debug] *********************************************************************************************************************************************************************************************************
ok: [east-rtr] => {
    "bgp": {
        "ansible_facts": {
            "napalm_bgp_neighbors": {
                "global": {
                    "peers": {
                        "4.4.4.4": {
                            "address_family": {
                                "ipv4 unicast": {
                                    "accepted_prefixes": 0,
                                    "received_prefixes": 0,
                                    "sent_prefixes": 0
                                }
                            },
                            "description": "",
                            "is_enabled": true,
                            "is_up": false,
                            "local_as": 100,
                            "remote_as": 400,
                            "remote_id": "4.4.4.4",
                            "uptime": 0
                        }
                    },
                    "router_id": "1.1.1.1"
                }
            }
        },
        "changed": false,
        "failed": false
    }
}
ok: [west-rtr] => {
    "bgp": {
        "ansible_facts": {
            "napalm_bgp_neighbors": {
                "global": {
                    "peers": {
                        "8.8.8.8": {
                            "address_family": {
                                "ipv4 unicast": {
                                    "accepted_prefixes": 1,
                                    "received_prefixes": 1,
                                    "sent_prefixes": 11
                                }
                            },
                            "description": "",
                            "is_enabled": true,
                            "is_up": true,
                            "local_as": 100,
                            "remote_as": 400,
                            "remote_id": "8.8.8.8",
                            "uptime": 1641600
                        }
                    },
                    "router_id": "2.2.2.2"
                }
            }
        },
        "changed": false,
        "failed": false
    }
}

PLAY RECAP ***********************************************************************************************************************************************************************************************************
east-rtr   : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   
west-rtr   : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

The playbook returns structured operational data on the BGP neighbors. This data can easily be used to build a report.
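Because the data is structured, it can also be filtered directly with Jinja2 filters before any report is written. A minimal sketch, assuming the bgp variable registered in PLAY:2; the fact name down_neighbors is hypothetical.

```yaml
# Sketch only: builds a per-host list of peer IPs whose is_up flag is false.
- name: "COLLECT NEIGHBORS THAT ARE DOWN"
  set_fact:
    down_neighbors: >-
      {{ bgp['ansible_facts']['napalm_bgp_neighbors']['global']['peers']
         | dict2items
         | rejectattr('value.is_up')
         | map(attribute='key')
         | list }}

- name: "PRINT DOWN NEIGHBORS"
  debug:
    var: down_neighbors
```

With the facts shown above, east-rtr would end up with a list containing 4.4.4.4 and west-rtr with an empty list.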

Step 3

Create a report of BGP neighbor status.

We can take the registered data from TASK:1 and pass it to the template module where a Jinja2 template can be used to create a report.

We will add TASK:2 to PLAY:2.

- name: "TASK:2 - 'GENERATE REPORTS'"
  template:
    src: "./templates/bgp_report.j2"
    dest: "./build/{{ inventory_hostname }}.txt"

An example of a Jinja2 template can be seen below:

bgp_report.j2


Hostname: {{ inventory_hostname }}
-----------------
{% for neighbor, details in bgp["ansible_facts"]["napalm_bgp_neighbors"]["global"]["peers"].items() %}
Neighbor:           {{ neighbor }}
Enabled:            {{ details["is_enabled"] }}
Neighbor_UP:        {{ details["is_up"] }}
Accepted Prefixes:  {{ details['address_family']['ipv4 unicast']['accepted_prefixes'] }}
Received Prefixes:  {{ details['address_family']['ipv4 unicast']['received_prefixes'] }}
Sent Prefixes:      {{ details['address_family']['ipv4 unicast']['sent_prefixes'] }}

{% endfor %}

The Jinja2 template will render the structured data into a human-readable format.

Example:

Hostname: east-rtr
-----------------
Neighbor:          4.4.4.4
Enabled:           True
Neighbor_UP:       False
Accepted Prefixes: 0
Received Prefixes: 0
Sent Prefixes:     0

Step 4

With multiple devices in our inventory group, one file per device is written. Reading through many separate files slows down troubleshooting, so the next task merges them into one all-encompassing report.

The Ansible assemble module will be used to merge all the reports together.

- name: "TASK:3 - ASSEMBLE REPORTING FROM HOST DETAILS"
  assemble:
    src: "./build"  # Directory with files to merge.
    dest: "./reports/report.txt"  # Merged output filename.
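Every host in the play would otherwise perform the same merge, so the task can be limited to a single execution. The variant below is a sketch rather than the original task; the delimiter string is an arbitrary choice to make device boundaries easier to spot.

```yaml
# Sketch only: merge the per-device files once and separate them visually.
- name: "TASK:3 - ASSEMBLE REPORTING FROM HOST DETAILS"
  assemble:
    src: "./build"
    dest: "./reports/report.txt"
    delimiter: "================="  # arbitrary separator between device reports
  run_once: true
```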

Once TASK:3 executes, one report is generated with the following output:

Hostname: east-rtr
-----------------
Neighbor:          4.4.4.4
Enabled:           True
Neighbor_UP:       False
Accepted Prefixes: 0
Received Prefixes: 0
Sent Prefixes:     0

Hostname: west-rtr
-----------------
Neighbor:          8.8.8.8
Enabled:           True
Neighbor_UP:       True
Accepted Prefixes: 1
Received Prefixes: 1
Sent Prefixes:     11

Now a single easy-to-read file exists to look at neighbors. We see east-rtr has a BGP neighbor that is DOWN.

Step 5

Check whether any DOWN neighbor is reachable via ping.

- name: "TASK:4 - PING BGP NEIGHBORS THAT ARE DOWN"
  napalm_ping:
    hostname: "{{ inventory_hostname }}"
    username: "{{ ansible_user }}"
    password: "{{ ansible_password }}"
    dev_os: "{{ ansible_network_os }}"
    destination: "{{ item.key }}"
  loop: "{{ bgp['ansible_facts']['napalm_bgp_neighbors']['global']['peers'] | dict2items }}"
  when: "not item.value.is_up"
  register: neighbor_down

After the reachability check is completed, print the results for the DOWN neighbors.

- name: "TASK:5 - PRINT PING RESULTS FOR DOWN NEIGHBORS"
  debug:
    msg: "{{ item['ping_results'] }}"
  loop: "{{ neighbor_down['results'] }}"
  when: "item['ping_results'] is defined"

TASK:5 example output:

    "msg": {
        "success": {
            "packet_loss": 5,
            "probes_sent": 5,
            "results": [],
            "rtt_avg": 0.0,
            "rtt_max": 0.0,
            "rtt_min": 0.0,
            "rtt_stddev": 0.0
        }
    }
}
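Note that the success key only indicates that the ping ran; in NAPALM's ping result, packet_loss is the number of lost probes, so 5 lost out of 5 sent means the neighbor did not answer at all. A minimal sketch that turns this into a one-line summary per neighbor, assuming the neighbor_down variable registered in TASK:4 (the task name and the stats variable are hypothetical):

```yaml
# Sketch only: report how many probes were answered for each DOWN neighbor.
- name: "SUMMARIZE REACHABILITY OF DOWN NEIGHBORS"
  vars:
    stats: "{{ item['ping_results']['success'] }}"
  debug:
    msg: "{{ item['item']['key'] }}: {{ stats['probes_sent'] - stats['packet_loss'] }} of {{ stats['probes_sent'] }} probes answered"
  loop: "{{ neighbor_down['results'] }}"
  when: "item['ping_results'] is defined and 'success' in item['ping_results']"
```

For the example output above, this would report 0 of 5 probes answered for 4.4.4.4.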

Playbook Summary

Valuable troubleshooting data was gathered by running this playbook. A BGP neighbor is down on east-rtr. Details about all neighbors were also collected, including: enabled state, current neighbor state, and sent/received route counts. Finally, for any DOWN neighbors a reachability check using ping was performed. Most importantly, all this data was assembled across all our isp_routers in just seconds. This was still a simplified example with only two routers, but extrapolating this across tens, hundreds, or more routers is very powerful.

It is important to mention that additional tasks could be added to this playbook to troubleshoot further, for example:

  • Check the routing to the neighbor IP.
  • Grab the next-hop IP from the route entry.
  • Verify that the ARP table for the next-hop IP has a MAC entry.
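The first of these checks could be sketched as follows, reusing the bgp facts from PLAY:2. This is an illustration and not part of the original playbook; the task names and the route_check variable are hypothetical.

```yaml
# Sketch only: look up the route toward each neighbor that is DOWN.
- name: "CHECK THE ROUTE TOWARD EACH DOWN NEIGHBOR"
  ios_command:
    commands: "show ip route {{ item.key }}"
  loop: "{{ bgp['ansible_facts']['napalm_bgp_neighbors']['global']['peers'] | dict2items }}"
  when: "not item.value.is_up"
  register: "route_check"

- name: "PRINT ROUTE LOOKUP RESULTS"
  debug:
    msg: "{{ item.stdout[0] }}"
  loop: "{{ route_check.results }}"
  when: "item.stdout is defined"
```

Extracting the next-hop IP and checking the ARP table would additionally require parsing this output, which is left out here.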

Full Playbook

- name: "PLAY:1 - GET BGP SUMMARY"
  gather_facts: False
  connection: "network_cli"
  hosts: "isp_routers"
  tasks:
    - name: "TASK:1 - 'SHOW IP BGP SUMMARY'"
      ios_command:
        commands: "show ip bgp summary"
      register: "output_ios"
    - name: "TASK:2 - PRINT BGP OUTPUT"
      debug:
        msg: "{{ output_ios.stdout[0] }}"
- name: "PLAY:2 - USE NAPALM BGP FACTS"
  gather_facts: False
  connection: "network_cli"
  hosts: "isp_routers"
  tasks:
    - name: "TASK:1 - 'GET BGP FACTS'"
      napalm_get_facts:
        filter:
          - "bgp_neighbors"
      register: "bgp"
    - debug: var=bgp
    - name: "TASK:2 - 'GENERATE REPORT'"
      template:
        src: "./templates/bgp_report.j2"
        dest: "./build/{{ inventory_hostname }}.txt"
    - name: "TASK:3 - ASSEMBLE REPORTING FROM HOST DETAILS"
      assemble:
        src: "./build"
        dest: "./reports/report.txt"
    - name: "TASK:4 - PING BGP NEIGHBORS THAT ARE DOWN"
      napalm_ping:
        hostname: "{{ inventory_hostname }}"
        username: "{{ ansible_user }}"
        password: "{{ ansible_password }}"
        dev_os: "{{ ansible_network_os }}"
        destination: "{{ item['key'] }}"
      with_dict: "{{ bgp['ansible_facts']['napalm_bgp_neighbors']['global']['peers'] }}"
      when: "not item['value']['is_up']"
      register: "neighbor_down"
    - name: "TASK:5 - PRINT PING RESULTS FOR DOWN NEIGHBORS"
      debug:
        msg: "{{ item['ping_results'] }}"
      loop: "{{ neighbor_down['results'] }}"
      when: "item['ping_results'] is defined"

Conclusion

BGP troubleshooting is just one of many operational playbooks that can be run to diagnose connectivity issues. Applying the same steps to other use cases can greatly improve MTTR on network issues and outages. Furthermore, these playbooks can be extended with a module that updates ITSM ticket notes, or run as part of an existing daily network readiness task.

-Jeff


