Network troubleshooting is a common automation use case. Network outages are costly and time-consuming, and they often require network engineers to log in to network equipment and manually investigate issues. Working on network operations teams, I quickly noticed that troubleshooting network problems follows a playbook of repeatable steps, hence the rationale for automating network troubleshooting with Ansible.
Use Case – BGP
Troubleshooting Layer 3 connectivity tends to lead an operations engineer to jump into multiple routers and check routing. Let’s say internet access has been lost from the WAN edge. If I were troubleshooting this, my instincts would tell me to go to my edge router(s) and check the BGP neighbor going towards my ISP.
east-rtr#show ip bgp summary
<...output omitted...>
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
4.4.4.4         4          400       0       0        0    0    0 08:11    Idle
From the output of show ip bgp summary, the issue can be determined: BGP is down toward the ISP. This is a simplified example with one router and one WAN connection, but what happens if you have 10, 15, or more BGP relationships to check? Manually logging in to each router to check the status of BGP is costly. How can Ansible help?
Checking BGP with Ansible
Here is a sequential listing of what the Ansible playbook is doing.
Run show ip bgp summary on the ISP routers and collect the output.
Use napalm-ansible to get BGP facts on the neighbors for easy reporting.
Use a Jinja2 template to create an easy-to-consume report of BGP neighbor status for each device.
Assemble all the device reports into a single overview report.
Iterate through the neighbors and, if a neighbor is down, attempt to ping the destination IP to verify Layer 3 reachability using the napalm_ping module.
Prerequisites
There needs to be a valid Ansible inventory, either a static inventory file or dynamic inventories utilizing an existing SoT (Source of Truth). For demonstration purposes a static file will be used.
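As a minimal sketch, a static inventory in YAML format might look like the following. The group name isp_routers and the hostnames east-rtr and west-rtr come from this post; the file name, the ansible_host addresses (taken from the 192.0.2.0/24 documentation range), and setting ansible_network_os at the group level are assumptions for illustration.

```yaml
# inventory.yml (hypothetical file name)
all:
  children:
    isp_routers:
      hosts:
        east-rtr:
          ansible_host: 192.0.2.1  # placeholder address, replace with the real management IP
        west-rtr:
          ansible_host: 192.0.2.2  # placeholder address, replace with the real management IP
      vars:
        ansible_network_os: ios  # assumed platform, since the playbook uses ios_command
```

With a file like this, the playbook could be run with ansible-playbook -i inventory.yml pb.yml -u ntc -k.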
Create a simple playbook to execute show ip bgp summary on all of the routers in the group called isp_routers.
---
- name: "PLAY:1 - GET BGP SUMMARY"
  gather_facts: False
  connection: "network_cli"
  hosts: "isp_routers"
  tasks:
    - name: "TASK:1 - 'SHOW IP BGP SUMMARY'"
      ios_command:
        commands: "show ip bgp summary"
      register: "output_ios"

    - name: "TASK:2 - PRINT BGP OUTPUT"
      debug:
        msg: "{{ output_ios.stdout[0] }}"
Running the playbook results in the following output.
▶ ansible-playbook pb.yml -u ntc -k
SSH password:

PLAY [PLAY:1 - GET BGP SUMMARY] ************************************************

TASK [TASK:1 - 'SHOW IP BGP SUMMARY'] ******************************************
ok: [east-rtr]
ok: [west-rtr]

TASK [TASK:2 - PRINT BGP OUTPUT] ***********************************************
ok: [east-rtr] => {
    "msg": "BGP router identifier 1.1.1.1, local AS number 100\nBGP table version is 416, main routing table version 416\n28 network entries using 6944 bytes of memory\n41 path entries using 5576 bytes of memory\n8/7 BGP path/bestpath attribute entries using 2304 bytes of memory\n4 BGP AS-PATH entries using 128 bytes of memory\n0 BGP route-map cache entries using 0 bytes of memory\n0 BGP filter-list cache entries using 0 bytes of memory\nBGP using 14952 total bytes of memory\nBGP activity 124/96 prefixes, 232/191 paths, scan interval 60 secs\n32 networks peaked at 23:40:21 Jan 7 2021 UTC (6w5d ago)\n\nNeighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd\n4.4.4.4 4 400 0 0 0 0 0 08:21 Idle"
}
ok: [west-rtr] => {
    "msg": "BGP router identifier 2.2.2.2, local AS number 100\nBGP table version is 579, main routing table version 579\n28 network entries using 6944 bytes of memory\n41 path entries using 5576 bytes of memory\n8/7 BGP path/bestpath attribute entries using 2304 bytes of memory\n4 BGP AS-PATH entries using 128 bytes of memory\n0 BGP route-map cache entries using 0 bytes of memory\n0 BGP filter-list cache entries using 0 bytes of memory\nBGP using 14952 total bytes of memory\nBGP activity 158/130 prefixes, 267/226 paths, scan interval 60 secs\n32 networks peaked at 23:40:21 Jan 7 2021 UTC (6w5d ago)\n\nNeighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd\n8.8.8.8 4 400 0 0 0 0 0 18:52 1"
}

PLAY RECAP *********************************************************************
east-rtr : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
west-rtr : ok=2 changed=0 unreachable=0 failed=0 skipped=0 rescued=0 ignored=0
At this point you have a single pane to quickly check all the BGP neighbors; however, the raw output is hard to read. To take this playbook to the next level, we can take the command output and create structured data using one of the various CLI parsing modules.
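As one possible sketch of that idea, the tasks below use the ansible.utils.cli_parse module with the ntc_templates parser to turn the same command into structured data. This assumes the ansible.utils and ansible.netcommon collections and the ntc-templates Python package are installed on the control node; the fact name bgp_summary_parsed is hypothetical. It is an alternative illustration only, since the rest of this post relies on NAPALM for structured BGP data.

```yaml
- name: "PARSE 'SHOW IP BGP SUMMARY' INTO STRUCTURED DATA"
  ansible.utils.cli_parse:
    command: "show ip bgp summary"
    parser:
      name: "ansible.netcommon.ntc_templates"  # assumes ntc-templates is installed
    set_fact: "bgp_summary_parsed"  # hypothetical fact name

- name: "PRINT PARSED BGP SUMMARY"
  debug:
    var: bgp_summary_parsed
```

The exact keys in the parsed result depend on the installed template version, so it is worth printing the structure once before building conditions on it.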
With multiple devices in our inventory group, a file per device will be written. Parsing through multiple files can slow down the time to resolution; therefore, merging all these files together into one all-encompassing report will be done in the next task.
The Ansible assemble module will be used to merge all the reports together.
- name: "TASK:3 - ASSEMBLE REPORTING FROM HOST DETAILS"
  assemble:
    src: "./build"  # Directory with files to merge.
    dest: "./reports/report.txt"  # Merged output filename.
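The assemble task expects its source directory to exist and needs a writable destination. A hedged sketch of a preparatory task follows; it is not part of the original playbook. It assumes the report files live on the Ansible control node, so the task is delegated to localhost and run once rather than once per router.

```yaml
- name: "ENSURE BUILD AND REPORT DIRECTORIES EXIST"
  file:
    path: "{{ item }}"
    state: "directory"
  loop:
    - "./build"    # per-device report files
    - "./reports"  # merged report
  delegate_to: "localhost"
  run_once: true
```

A task like this would need to run before the per-device reports are generated. Relative paths resolve against the working directory of the run, so absolute paths may be a safer choice in practice.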
Once TASK:3 executes, one report is generated with the following output:
After the reachability check is completed, print the results for the DOWN neighbors.
- name: "TASK:5 - PRINT PING RESULTS FOR DOWN NEIGHBORS"
  debug:
    msg: "{{ item['ping_results'] }}"
  loop: "{{ neighbor_down['results'] }}"
  when: "item['ping_results'] is defined"
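For a quick overview, the DOWN neighbors can also be collected into a list per router. The sketch below reuses the bgp variable registered from napalm_get_facts and the is_up key that TASK:4 in the full playbook relies on; the fact name down_neighbors is hypothetical.

```yaml
- name: "BUILD A LIST OF DOWN BGP NEIGHBORS"
  set_fact:
    # dict2items turns the peers dictionary into a list of key/value pairs;
    # rejectattr drops every peer whose is_up value is true.
    down_neighbors: >-
      {{ bgp['ansible_facts']['napalm_bgp_neighbors']['global']['peers']
      | dict2items
      | rejectattr('value.is_up')
      | map(attribute='key')
      | list }}

- name: "PRINT DOWN BGP NEIGHBORS"
  debug:
    var: down_neighbors
```

An empty list for a router means all of its BGP neighbors were reported as up.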
Valuable troubleshooting data was gathered by running this playbook. A BGP neighbor is down on east-rtr. Details about all neighbors were also collected, including: enabled state, current neighbor state, and sent/received route counts. Finally, for any DOWN neighbors a reachability check using ping was performed. Most importantly, all this data was assembled across all our isp_routers in just seconds. This was still a simplified example with only two routers, but extrapolating this across tens, hundreds, or more routers is very powerful.
It is important to mention that additional tasks could be added to this playbook to troubleshoot further, for example:
Check the routing to the neighbor IP.
Grab the next-hop IP from the route entry.
Verify that the ARP table for the next-hop IP has a MAC entry.
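The first of these steps could be sketched as follows, reusing the loop and condition from TASK:4. The task names and the route_check variable are hypothetical, and the sketch assumes the devices accept show ip route followed by a destination address. Extracting the next-hop IP and checking the ARP table would build on this registered output and are left out here.

```yaml
- name: "TASK:6 - CHECK ROUTING TO DOWN NEIGHBORS"
  ios_command:
    commands: "show ip route {{ item['key'] }}"
  with_dict: "{{ bgp['ansible_facts']['napalm_bgp_neighbors']['global']['peers'] }}"
  when: "not item['value']['is_up']"
  register: "route_check"  # hypothetical variable name

- name: "TASK:7 - PRINT ROUTE LOOKUP RESULTS"
  debug:
    msg: "{{ item['stdout'][0] }}"
  loop: "{{ route_check['results'] }}"
  when: "item['stdout'] is defined"  # skipped (up) neighbors have no stdout
```

If a device has no matching route, the command may return an error string or cause the task to fail for that item, which is itself a useful troubleshooting signal.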
Full Playbook
- name: "PLAY:1 - GET BGP SUMMARY"
  gather_facts: False
  connection: "network_cli"
  hosts: "isp_routers"
  tasks:
    - name: "TASK:1 - 'SHOW IP BGP SUMMARY'"
      ios_command:
        commands: "show ip bgp summary"
      register: "output_ios"

    - name: "TASK:2 - PRINT BGP OUTPUT"
      debug:
        msg: "{{ output_ios.stdout[0] }}"

- name: "PLAY:2 - USE NAPALM BGP FACTS"
  gather_facts: False
  connection: "network_cli"
  hosts: "isp_routers"
  tasks:
    - name: "TASK:1 - 'GET BGP FACTS'"
      napalm_get_facts: filter="bgp_neighbors"
      register: "bgp"

    - debug: var=bgp

    - name: "TASK:2 - 'GENERATE REPORT'"
      template:
        src: "./templates/bgp_report.j2"
        dest: "./build/{{ inventory_hostname }}.txt"

    - name: "TASK:3 - ASSEMBLE REPORTING FROM HOST DETAILS"
      assemble:
        src: "./build"
        dest: "./reports/report.txt"

    - name: "TASK:4 - PING BGP NEIGHBORS THAT ARE DOWN"
      napalm_ping:
        hostname: "{{ inventory_hostname }}"
        username: "{{ ansible_user }}"
        password: "{{ ansible_password }}"
        dev_os: "{{ ansible_network_os }}"
        destination: "{{ item['key'] }}"
      with_dict: "{{ bgp['ansible_facts']['napalm_bgp_neighbors']['global']['peers'] }}"
      when: "not item['value']['is_up']"
      register: "neighbor_down"

    - name: "TASK:5 - PRINT PING RESULTS FOR DOWN NEIGHBORS"
      debug:
        msg: "{{ item['ping_results'] }}"
      loop: "{{ neighbor_down['results'] }}"
      when: "item['ping_results'] is defined"
Conclusion
BGP troubleshooting is just one of many operational playbooks that could be run when troubleshooting connectivity issues. Applying the same approach to other use cases can greatly improve MTTR on network issues and outages. Furthermore, these playbooks can be extended with a module that updates ITSM ticket notes, or even run as part of an existing daily network readiness task.
Does this all sound amazing? Want to know more about how Network to Code can help you do this? Reach out to our sales team. If you want to help make this a reality for our clients, check out our careers page.