Over the past several posts, I have discussed how to gather metrics about your infrastructure and web applications. The natural progression is to move into alerting with Prometheus. This post builds on the previous post on gathering website and DNS responses. I will walk through how to set up a rule that fires whenever a website returns a response other than 200 OK. To accomplish this, we will look at the metric http_response_http_response_code gathered via Telegraf.
Prometheus Setup
You configure rules in files and reference those file names within the Prometheus configuration. A common practice is to name the file alert.rules within the /etc/prometheus/ directory.
The following outlines what the file will contain. The alert rules are defined in a YAML file that specifies the alert name (alert), the expression (expr) that Prometheus evaluates, and the duration (for) that the condition must hold before the alert fires. Additional keys are available as well, such as labels and annotations, as demonstrated below:
groups:
  - name: websites
    rules:
      - alert: WebsiteDown
        expr: http_response_http_response_code != 200
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} is not responding with 200 OK."
The rules file created above is then referenced in the prometheus.yml file, added to the array under the key rule_files. This allows multiple rule files to be processed by Prometheus.
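A minimal sketch of that prometheus.yml addition, assuming the rules file lives at /etc/prometheus/alert.rules; the alerting block pointing Prometheus at a local AlertManager is an assumption for this demo (AlertManager's listening port of 9093 appears later in this post):

rule_files:
  - "alert.rules"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]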
Once the rules are loaded, you can verify them by going to the Prometheus URL – http://<hostname_or_ip>:9090/rules – where you will see which rules are loaded:
Prometheus AlertManager
Now you have a configuration for the alerts, but how do you actually manage them? You’ll need to add another application into the environment: Prometheus AlertManager. AlertManager is where you handle the silencing, deduplicating, grouping, and routing of alerts to the appropriate outputs. These destinations can include, but are not limited to, Slack, email, or webhooks. The AlertManager configuration page has the details on how to configure each of these:
Email
HipChat
PagerDuty
Pushover
Slack
OpsGenie
Webhook
VictorOps
WeChat
AlertManager Installation
Installation can be done in several ways: there are binaries available for many common platforms, Docker containers, and installation from source. In this demo, I will install the binary directly, using wget to download the release archive.
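As a sketch, the download step could look like the following; the URL follows the AlertManager GitHub release naming for version 0.20.0, matching the archive expanded below:

wget https://github.com/prometheus/alertmanager/releases/download/v0.20.0/alertmanager-0.20.0.linux-amd64.tar.gz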
Once the file is downloaded, we will expand it within the directory:
tar -xzf alertmanager-0.20.0.linux-amd64.tar.gz
AlertManager Configuration
The AlertManager configuration is handled in the alertmanager.yml file. An example may look like:
route:
  group_by: [Alertname]
  # Send all notifications to me.
  receiver: email-me

receivers:
  - name: email-me
    email_configs:
      - to: $GMAIL_ACCOUNT
        from: $GMAIL_ACCOUNT
        smarthost: smtp.gmail.com:587
        auth_username: "$GMAIL_ACCOUNT"
        auth_identity: "$GMAIL_ACCOUNT"
        auth_password: "$GMAIL_AUTH_TOKEN"
AlertManager Execution
To start this test instance of AlertManager, execute ./alertmanager --config.file="alertmanager.yml":
You can see that the application starts up and then displays the listening address, indicating that in this instance AlertManager is listening on port 9093.
Prometheus Alerts in Action
Now that the configuration has been covered, let’s take a look at how it all works together.
To see the status of the alerts within the Prometheus environment, navigate to the Alerts menu item, or to the URL http://<hostname_or_ip>:9090/alerts. Once there, the page shows the status of each rule loaded from the rule files.
At this point there are no websites down. To confirm this, you can search for ALERTS within the graph application of Prometheus. You should get the message No datapoints found. if nothing is alerting. This helps you determine whether an alert is being suppressed or whether something else is wrong with the configuration.
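Prometheus exposes active alerts through the built-in ALERTS time series, which carries alertname and alertstate labels. For example, a query along these lines shows only alerts that are actively firing:

ALERTS{alertstate="firing"}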
At this point I am going to have my DNS server deny access to the ServiceNow website. This will simulate the service being unavailable.
Prometheus Alerts Pending
After some time the website becomes nonresponsive. Within the Alerts management page we can see that Prometheus first placed the alert in a pending status: the website was down but had not yet crossed the time threshold that was set (1 minute). You see the 1 pending rule.
Prometheus Alerts Firing
Once the threshold that was defined has passed, the alert will move from a Pending state to a Firing state. In this state Prometheus has sent the alert off to AlertManager to handle the processing of the alert.
First, let’s take a look at the Prometheus Alerts page. This page shows that the alert has moved through to the Firing phase. It has the same information that you saw in the Pending state, but now shown in red.
Now, moving on to the Graph section of Prometheus and searching for ALERTS, you can see the state of the alert over time.
The first graph shows the mouse cursor hovering over the section where the event was in a Pending state; the second graph shows hovering over the Firing state. Each gives you additional information to help debug alerts that are not reaching their destination.
Prometheus AlertManager Firing
The last image is the view from the AlertManager perspective. This shows which alerts have been triggered and which labels are attached to each alert.
Summary
This wraps up (for now) this series of posts focused on leveraging Telegraf, Prometheus, and Grafana to monitor your environment. Take a look at the post list below for the others in the series and jump on into the Network to Code Slack Telemetry channel to start a conversation on what you are doing, what you want to do, or just to talk network telemetry!
In network service delivery, the network exists to carry applications; yes, even voice is considered an application when it rides over the top of the network. In previous posts we explored how to get telemetry data from your network devices to understand how they are performing from a device perspective. In this post, I will move on to monitoring web applications and DNS using Telegraf, Prometheus, and Grafana. Often your operations teams will receive reports of websites not working for a user, or you may simply want more visibility into your own web services. The following method can provide more insight into the network and the name resolution required for those applications.
There are also several other Telegraf inputs available including ping (ICMP) and TCP tests. As of this post in May 2020 there are 181 different input plugins available to choose from. Take a look at the Telegraf plugins for more details and explore what other plugins you may be able to use to monitor your environment.
I will not be going into the setup of these tools, as this is already covered in a previous post. The previous posts in the series are:
These posts can help you get up and running when it comes to monitoring your network devices via CLI, SNMP, and gNMI.
Blackbox exporter from Prometheus is also a valid choice for this process, and I encourage you to try both the Telegraf and Blackbox exporters in your environment.
Sequence Diagram
Telegraf Setup – HTTP Response
Telegraf has the HTTP Response plugin that does exactly what we are looking for when gathering metrics about an HTTP response. It lets you define the list of websites you wish to monitor and set options for proxy, response timeout, method, any data you may want to include in the body, and various responses. Take a look at the plugin documentation for more details. Here is the configuration that will be set up for this demonstration:
#######################################################
# Check on status of URLs
#######################################################
[[inputs.http_response]]
  urls = [
    "https://www.networktocode.com",
    "https://blog.networktocode.com",
    "https://www.service-now.com"
  ]
  method = "GET"
  follow_redirects = true

#######################################################
# Export Information to Prometheus
#######################################################
[[outputs.prometheus_client]]
  listen = ":9012"
  metric_version = 2
Upon executing this, here are the relevant Prometheus metrics that we are gathering:
# HELP http_response_content_length Telegraf collected metric
# TYPE http_response_content_length untyped
http_response_content_length{method="GET",result="success",result_type="success",server="https://blog.networktocode.com",status_code="200"} 1.791348e+06
http_response_content_length{method="GET",result="success",result_type="success",server="https://www.networktocode.com",status_code="200"} 123667
http_response_content_length{method="GET",result="success",result_type="success",server="https://www.service-now.com",status_code="200"} 478636
# HELP http_response_http_response_code Telegraf collected metric
# TYPE http_response_http_response_code untyped
http_response_http_response_code{method="GET",result="success",result_type="success",server="https://blog.networktocode.com",status_code="200"} 200
http_response_http_response_code{method="GET",result="success",result_type="success",server="https://www.networktocode.com",status_code="200"} 200
http_response_http_response_code{method="GET",result="success",result_type="success",server="https://www.service-now.com",status_code="200"} 200
# HELP http_response_response_time Telegraf collected metric
# TYPE http_response_response_time untyped
http_response_response_time{method="GET",result="success",result_type="success",server="https://blog.networktocode.com",status_code="200"} 0.371015121
http_response_response_time{method="GET",result="success",result_type="success",server="https://www.networktocode.com",status_code="200"} 0.186775794
http_response_response_time{method="GET",result="success",result_type="success",server="https://www.service-now.com",status_code="200"} 0.658694795
# HELP http_response_result_code Telegraf collected metric
# TYPE http_response_result_code untyped
http_response_result_code{method="GET",result="success",result_type="success",server="https://blog.networktocode.com",status_code="200"} 0
http_response_result_code{method="GET",result="success",result_type="success",server="https://www.networktocode.com",status_code="200"} 0
http_response_result_code{method="GET",result="success",result_type="success",server="https://www.service-now.com",status_code="200"} 0
Several fields come back right away, including:
content_length: The size of the response content, in bytes
response_code: The HTTP response code
response_time: How long the request took to complete, in seconds
result_code: A value computed by Telegraf that maps an OK response to 0
Telegraf – DNS Check
On top of this, I also want to show how to add a second input. We will add a DNS query to test the name resolution of the sites and verify that DNS lookups are working as expected. This could also be extended to test and verify DNS from a user perspective within your environment.
#######################################################
# Check on status of URLs
#######################################################
[[inputs.http_response]]
  urls = [
    "https://www.networktocode.com",
    "https://blog.networktocode.com",
    "https://www.service-now.com"
  ]
  method = "GET"
  follow_redirects = true

[[inputs.dns_query]]
  servers = ["8.8.8.8"]
  domains = [
    "blog.networktocode.com",
    "www.networktocode.com",
    "www.servicenow.com"
  ]

#######################################################
# Export Information to Prometheus
#######################################################
[[outputs.prometheus_client]]
  listen = ":9012"
  metric_version = 2
In the plugin definition we use the Google public DNS resolver. The domains we are going to verify are blog.networktocode.com, www.networktocode.com, and the popular ITSM tool ServiceNow.
Here is what gets added to the Prometheus Client output:
# HELP dns_query_query_time_ms Telegraf collected metric
# TYPE dns_query_query_time_ms untyped
dns_query_query_time_ms{domain="blog.networktocode.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 70.950858
dns_query_query_time_ms{domain="www.networktocode.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 48.118903
dns_query_query_time_ms{domain="www.servicenow.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 48.552328
# HELP dns_query_rcode_value Telegraf collected metric
# TYPE dns_query_rcode_value untyped
dns_query_rcode_value{domain="blog.networktocode.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 0
dns_query_rcode_value{domain="www.networktocode.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 0
dns_query_rcode_value{domain="www.servicenow.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 0
# HELP dns_query_result_code Telegraf collected metric
# TYPE dns_query_result_code untyped
dns_query_result_code{domain="blog.networktocode.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 0
dns_query_result_code{domain="www.networktocode.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 0
dns_query_result_code{domain="www.servicenow.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 0
The corresponding values gathered from the dns_query input are:
dns_query_query_time_ms: Amount of time it took for the query to complete, in milliseconds
dns_query_rcode_value: Numeric value of the DNS return code (0 = NOERROR)
dns_query_result_code: Code defined by Telegraf for the response, with a successful lookup mapping to 0
Prometheus
The configuration for Prometheus at this point has a single addition to gather the statistics for each of the websites:
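As a sketch, that addition is a scrape job pointed at the Telegraf Prometheus client output configured earlier; the job name is arbitrary, and the target assumes Telegraf is listening on port 9012 on the same host:

scrape_configs:
  - job_name: 'telegraf'
    static_configs:
      - targets: ['localhost:9012']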
When you navigate to the Prometheus graph page to check how the data polling is going, you can build a basic graph. Here you see that all three sites appear on the graph with their response times:
Grafana
What does it look like to get this information into a graph on Grafana?
Grafana – Websites
This chart requires only a small configuration. In the Metrics section I only put the query http_response_response_time. I set the legend to {{ server }} to display the website address in the table legend.
In the visualization section, the only thing that needs to be done is to set the Left Y axis Unit to seconds (s) to provide the proper Y-axis metric.
Grafana – DNS
This is another small panel configuration, similar to the previous one. In the Metrics section the corresponding query to get response time is dns_query_query_time_ms. You then set the legend to {{ domain }} to match the label in the query output shown above.
In the visualization section, you should use the Unit of milliseconds (ms). If you copied the panel from the Website panel, don’t forget to change this: the unit of measure is different, and the time scale would otherwise be off.
Conclusion
Hopefully this post will help you gain some insight into your environment. We have been using this process internally at Network to Code already, keeping an eye on our key services that we rely on to understand if there is an individual issue or an issue with the service. Let us know your thoughts and comments! To continue the conversation, check out the #Telemetry channel inside the Network to Code Slack. Sign up at slack.networktocode.com.
This post, the second in a series focused on using Telegraf, Prometheus, and Grafana for Network Telemetry, will focus on transforming data and making additional graphs within Grafana. This post will cover the following topics:
Gathering streaming telemetry data with gNMI
Transforming and normalizing data with Telegraf processors (enum, rename, and regex)
Building per-device Grafana dashboards with variables and plugins
Here is where you can find the first post in the series, on how to gather data from SNMP-based devices.
Purpose
The intent of this post is to demonstrate how to bring multiple telemetry-gathering methods into one. In our experience, a successful telemetry & analytics stack should be able to collect data transparently from SNMP, streaming telemetry (gNMI), and CLI/API. We covered SNMP and CLI gathering in previous posts; this post will focus on gathering telemetry data with gNMI. Beyond collection, when we gather the same type of data from multiple sources, it’s important to ensure that the data has a consistent format in the database. We’ll look at how Telegraf can help normalize and decorate the data before sending it to the database.
Network Topology
In the topology there is a mix of devices per the table below:
Device Name  | Device Type  | Telemetry Source
------------ | ------------ | ----------------
houston      | Cisco IOS-XE | SNMP
amarillo     | Cisco NXOS   | SNMP
austin       | Cisco IOS-XR | gNMI
el-paso      | Cisco IOS-XR | gNMI
san-antonio  | Cisco IOS-XR | gNMI
dallas       | Cisco IOS-XR | gNMI
This blog post was created based on a Cisco-only environment, but if you’re interested in a multi-vendor approach, check out @damgarros’s NANOG 77 presentation on YouTube. That video shows how to use gNMI alone to collect data from Arista, Juniper, and Cisco devices in a single place. The topology used here is meant to show collection from multiple sources (SNMP + gNMI) in one.
Application Installs Note
Software installation was covered in the previous post in this series, and I recommend taking a look at either that post for the particular installation instructions, or heading over to the product page referenced in the introduction.
Overview
Here is the sequence of events addressed in this post. Telegraf starts by gathering gNMI data from the network devices. That data is processed into Prometheus metrics, which are scraped by a Prometheus server. Grafana then generates graphs from the gathered and processed data.
gNMI Introduction
gNMI stands for gRPC Network Management Interface. gRPC (gRPC Remote Procedure Calls) is a standard developed by Google that leverages HTTP/2 for transport using Protocol Buffers. gNMI is a gRPC-based protocol to get configuration and telemetry from a network device. All messages are defined as protocol buffers, which keep the data definitions as small, and therefore as efficient, as possible. The device serializes the data into the proper format and sends it off; the receiver deserializes and reads it. You can take a look at the gNMI reference for more detailed information.
gNMI can handle not only the telemetry data that this post is about, but is also intended to transport configuration for the device.
So why use gNMI? gRPC is incredibly fast and efficient at transmitting data, and by extension gNMI is also fast and efficient.
gNMI Cisco Configuration
gNMI is supported by many of today’s leading network vendors. As an example, here are the configuration lines needed to enable gNMI on a Cisco IOS-XR device in this demo environment:
grpc
 port 50000
 no-tls
Pretty straight to the point. If you wish to create a subscription model within Cisco IOS-XR, there are more detailed configuration options available; take a look at Cisco’s Guide to Configure Model-driven Telemetry.
Telegraf
Gathering Streaming Data With gNMI
The first step I will walk through is setting up Telegraf to subscribe to gNMI data, specifically to collect telemetry data from the IOS-XR devices in this lab scenario. With gNMI, like other streaming telemetry subscriptions, you tell the network device that you want to subscribe to receive the data. The device then sends periodic updates of telemetry data to the receiver, and the subscriber sends periodic keep-alive messages to keep the subscription active.
gNMI Telegraf Configuration
Telegraf has a plugin that will take care of the subscription and the input section looks like the code below. Note that the subscription port is defined within the addresses section.
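A sketch of that input section is below, assuming the newer [[inputs.gnmi]] plugin name (earlier Telegraf releases shipped it as cisco_telemetry_gnmi); the device addresses, credentials, and subscription paths are illustrative for this lab:

[[inputs.gnmi]]
  ## gNMI targets; the subscription port is part of each address
  addresses = ["dallas:50000", "austin:50000"]
  username = "admin"
  password = "admin"
  ## Redial after this interval if the connection drops
  redial = "10s"

  [[inputs.gnmi.subscription]]
    name = "bgp_neighbor"
    origin = "openconfig"
    path = "/network-instances/network-instance/protocols/protocol/bgp/neighbors/neighbor/state"
    subscription_mode = "sample"
    sample_interval = "10s"

  [[inputs.gnmi.subscription]]
    name = "interface"
    origin = "openconfig-interfaces"
    path = "/interfaces/interface/state/counters"
    subscription_mode = "sample"
    sample_interval = "10s"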
The configuration defines the address, username, and password, along with a redial setting in case of a failure. The original configuration also excluded particular subscriptions from the request, which the sketch above omits.
There are two subscriptions that we are subscribing to in this instance:
BGP neighbor state
Interface counters
In each of these cases the sampling in this demo is every 10 seconds, which means the device will send the statistics, and new metrics will be available to be scraped by Prometheus, every 10 seconds. The sample interval and the Prometheus scrape interval should be set to the same value.
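On the Prometheus side, that alignment is a per-job setting; a sketch (the job name and target port are assumptions for this demo):

scrape_configs:
  - job_name: 'telegraf-gnmi'
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:9012']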
To collect the telemetry for this demo we are once again using the Prometheus client output from Telegraf. Telegraf will collect, process, and format the data that will then be scraped by a Prometheus server. Let’s take a look at what that output looks like next.
gNMI Output – BGP
I’m only going to look at a few of the items in the output here; there are too many to show them all, and they would take up too much screen real estate to be worthwhile.
# HELP bgp_neighbor_messages_received_UPDATE Telegraf collected metric
# TYPE bgp_neighbor_messages_received_UPDATE untyped
bgp_neighbor_messages_received_UPDATE{device="dallas",identifier="BGP",name="default",neighbor_address="10.0.0.1",peer_type="EXTERNAL",role="leaf"} 9
bgp_neighbor_messages_received_UPDATE{device="dallas",identifier="BGP",name="default",neighbor_address="10.0.0.17",peer_type="EXTERNAL",role="leaf"} 0
bgp_neighbor_messages_received_UPDATE{device="dallas",identifier="BGP",name="default",neighbor_address="10.0.0.25",peer_type="EXTERNAL",role="leaf"} 9
bgp_neighbor_messages_received_UPDATE{device="dallas",identifier="BGP",name="default",neighbor_address="10.0.0.9",peer_type="EXTERNAL",role="leaf"} 9
Some items were removed from the output to assist in readability.
The output is what you would expect: a list of the neighbors, identified by the neighbor_address key in the tags. With the BGP subscription you get:
bgp_neighbor_established_transitions
bgp_neighbor_last_established
bgp_neighbor_messages_received_NOTIFICATION
bgp_neighbor_messages_received_UPDATE
bgp_neighbor_messages_sent_NOTIFICATION
bgp_neighbor_messages_sent_UPDATE
bgp_neighbor_peer_as
bgp_neighbor_queues_output
bgp_neighbor_session_state
gNMI Output – Interfaces
There are a lot of statistics sent back with the interface subscription. We’ll be taking a look at just one of them, interface_state_counters_in_octets, in this instance. We get a look at each interface and its associated counter in the data.
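As an illustration of the shape of that data (the device, interface name, and counter value below are made up):

interface_state_counters_in_octets{device="dallas",name="GigabitEthernet0/0/0/0",role="leaf"} 3567217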
This is great information, and we have seen something similar with SNMP. Now to the transformations that Telegraf offers.
Changing data with Enum and Replacement
Telegraf has a couple of different processors available to process the data and get it into a format that is appropriate and consistent for your environment. Let’s take a look at a couple of them and how they are used in the use case here.
Telegraf – Enum
The first processor is used within the BGP data collection. When the data comes back from the subscription for a BGP session state, it comes back as a string value. That is great for reading the current state, but not very helpful for a Time Series Database (TSDB). A TSDB expects the data to be represented as a number of some sort, either an integer or a float; the whole point is to measure information at a point in time.
To accommodate this, the enum processor is put into action. The following is added to the configuration:
[[processors.enum]]
  [[processors.enum.mapping]]
    ## Name of the field to map
    field = "session_state"
    [processors.enum.mapping.value_mappings]
      IDLE = 1
      CONNECT = 2
      ACTIVE = 3
      OPENSENT = 4
      OPENCONFIRM = 5
      ESTABLISHED = 6
Within session_state, any instance of the string IDLE is replaced with the integer 1, which can then be stored long term within a TSDB. The same applies to all of the other states, with ESTABLISHED stored as the integer 6. Later, in Grafana, these numbers will be mapped back to words for representation on a graph.
Telegraf – Rename
The second processor used in this demo is the rename processor, which replaces field names. Below is the configuration used to rename the counters collected from SNMP devices so they match the names used for gNMI.
[[processors.rename]]
  [[processors.rename.replace]]
    field = "ifHCInOctets"
    dest = "state_counters_in_octets"
  [[processors.rename.replace]]
    field = "ifHCOutOctets"
    dest = "state_counters_out_octets"
This states that wherever Telegraf finds the field ifHCInOctets, it replaces it with state_counters_in_octets; likewise, ifHCOutOctets becomes state_counters_out_octets. Once Telegraf has renamed those fields, you can use the data gathered with SNMP and the data gathered with gNMI in the same queries!
Tagging Data
Tagging data is one of the biggest favors that you can do for yourself. Tagging gives flexibility for future analysis, comparison, and graphing data points. For instance if you tag your BGP neighbors with the upstream peer provider, you will be able to easily identify the interfaces which belong to that particular peer. If you have four geographically diverse interfaces, this will allow you to quickly identify the interfaces based on the tag rather than manually deciding later at the time of graphing or alerting.
This brings us to the third Telegraf processor in this post, the regex processor. This processor takes a regex search pattern and performs the replacement. Something new here: if you use the result_key option, a new key is created rather than replacing what is there, resulting in a whole new field or tag. This regex replacement adds a new tag, intf_role, with server as the value.
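A sketch of what that could look like; the tag key and interface-name pattern below are assumptions for this lab, while result_key and replacement follow the behavior described above:

[[processors.regex]]
  [[processors.regex.tags]]
    ## Tag to run the pattern against (interface name)
    key = "name"
    ## Interfaces facing servers in this lab (illustrative pattern)
    pattern = "^GigabitEthernet0/0/0/[01]$"
    replacement = "server"
    ## Write the result into a new tag instead of overwriting "name"
    result_key = "intf_role"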
Throughout the upcoming Grafana section you will see a number of PromQL (Prometheus Query Language) queries. Take a look at the Prometheus.io basics page for full documentation of the available queries. These are the queries Grafana executes to populate the data in the graphs.
Grafana
Through the next several sections you will see how to build dashboards using PromQL and variable substitution, among other topics, on a per-device basis. The two device dashboards shown differ in the number of interfaces and neighbors displayed, but they are born out of the same dashboard configuration.
Variables
First, you’ll need to set up the device variable seen in the upper left-hand corner of the dashboard. This may be one of the most important skills when looking to level up your Grafana dashboards, as it allows you to get a significant amount of value while reducing the rework of adding additional devices into a panel.
Variables – Adding to Your Dashboard
To add a dashboard-wide variable, follow these steps:
Navigate into your dashboard
Click on the gear icon in the upper right hand navigation section
Click on Variables
Click the green New button on the right hand side
(This image already had a variable added: device.)
Once in the new screen, you will see the following:
Here you will be defining a PromQL query to build out your device list. In the bottom section of the screen you see the heading of Preview of values. Here you will be able to observe a sample of what the query will result in for your variables.
The fields that you need to fill in include:
Field       | Information Needed
----------- | ------------------
Name        | Name of the variable you wish to use
Type        | Query
Data source | Prometheus
Refresh     | When to refresh the variables; use the dropdown to select what fits your org best
Query       | PromQL to get the data points
Regex       | Regex search to reduce the search results
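As an illustration, a common pattern for a device variable with a Prometheus data source is Grafana’s label_values() helper; this assumes your metrics carry a device label, as the gNMI metrics above do:

label_values(device)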
You can experiment with the rest of the fields as you see fit to get your variables defined properly.
Once you have your search pattern set, make sure to click Save on the left hand side of Grafana.
To reference a variable once it is created, put a dollar sign ($) in front of its name and Grafana will substitute it within a query. Within the Legend area, the Jinja-like double curly braces identify a label value to display.
Grafana Plugins
Grafana is extensible via plugins. There are quite a few plugins available for Grafana, and I encourage you to take a look at the plugin page to search for what you may want to use. There are three types of plugins (Panel, Data Source, and App) that help extend Grafana’s capabilities.
Grafana Discrete Plugin (BGP State Over Time)
The next table to take a look at uses a Grafana plugin. I’ll look at how we built out the graph. This can help you identify issues quickly in your environment just by looking at the dashboard. Take a look at this example where a BGP neighbor was down: it is quickly identifiable on the dashboard that action is needed.
The two panels in the top row use a Grafana plugin called Discrete. This displays data values over time, in colors defined within the configuration, and the panel gives you the ability to hover over it to see the changes over time. You install the plugin with the grafana-cli command:
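A sketch of the install, assuming the plugin’s published ID of natel-discrete-panel (restart Grafana afterward to load it):

grafana-cli plugins install natel-discrete-panel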
Once installed you can set up a new panel with the panel type of Discrete.
The panel will be created with the following parameters:
BGP Session Status – Discrete Panel 1
Query: Prometheus data source
Key        | Value
---------- | -----
Metrics    | bgp_neighbor_session_state{device="$device"}
Legend     | {{ device }} {{ neighbor_address }}
Min step   |
Resolution | 1/1
Format     | Time series
Instant    | Unchecked
You will note that in the Metrics section the variable reference is $device, marked by the dollar sign. The Legend includes two labels, device and neighbor_address; this is what gets displayed in the discrete panel for each line.
Critical Interface State – Discrete Panel 2
Now, because the interfaces have been assigned a label, a discrete panel can be generated to show the interface state along with the role. For demonstration, we are naming this panel Critical Interfaces; interfaces toward servers or uplinks to other network devices have been labeled with server or uplink accordingly. By querying for any role we can get this information into the panel. The legend has the value {{device}} {{name}} > {{intf_role}} > {{neighbor}} to show the appropriate mappings. This is the resulting panel:
To get to this panel, we can use discrete panel settings similar to the sketch below. This panel build is a little smaller, but it packs a lot of information into one panel!
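As a sketch of the query side (the metric name here is illustrative; substitute whichever interface state metric your collection produces, keeping the role-based filter):

Metrics: interface_state{device="$device",intf_role=~"server|uplink"}
Legend: {{device}} {{name}} > {{intf_role}} > {{neighbor}}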
Device Dashboards vs Environment Dashboards
This is not a pick-one-over-the-other segment; rather, both should be present in your Grafana dashboard setup.
In this post I have gone through and shown a lot of device-specific panels. The value here is that you are able to get to a device-by-device view very quickly, without having to create a separate page for each and every device in your environment. The panels are expanded through the use of variables to identify individual devices.
You should also look at using an environment dashboard, where you bring together specific pieces of information to match your needs. Need to know what application performance looks like across Network, Server, Storage, and Application layers? You can build these dashboards by hand, but that takes longer. As you leverage tags in the gathering of telemetry into your TSDB, you will be on your way to building dashboards in an automated fashion that give you the big picture very quickly.
Conclusion
Hopefully this has been helpful. Again, check out the first post in the series if you need more information on these tools generally. In the next post, I will cover how to advance your Prometheus environment by monitoring remote sites, and I’ll discuss a couple of methodologies to enable alerting within the environment.
The next post will include how to alert using this technology stack.
Does this all sound amazing? Want to know more about how Network to Code can help you do this? Reach out to our sales team. If you want to help make this a reality for our clients, check out our careers page.