Monitoring Websites with Telegraf and Prometheus

In network service delivery, the network exists so that applications can ride on it. Yes, even voice is considered an application when it rides over the top of the network. Previous posts explored how to get telemetry data from your network devices to understand how they are performing from a device perspective. In this post, I will move on to monitoring web applications and DNS using Telegraf, Prometheus, and Grafana. Often your operations teams will receive reports of a website not working for a user, or you may simply want more visibility into your own web services. The following method can give you more insight into the network and the name resolution those applications require.

There are also several other Telegraf inputs available including ping (ICMP) and TCP tests. As of this post in May 2020 there are 181 different input plugins available to choose from. Take a look at the Telegraf plugins for more details and explore what other plugins you may be able to use to monitor your environment.

I will not go into the setup of these tools, as that is already covered in the previous posts in this series. Those posts can help you get up and running when it comes to monitoring your network devices with CLI, SNMP, and gNMI.

Blackbox exporter from Prometheus is also a valid choice for this process, and I encourage you to try both the Telegraf and Blackbox exporters in your environment.

Sequence Diagram

Telegraf Setup – HTTP Response

Telegraf has the HTTP Response plugin, which does exactly what we need for gathering metrics about an HTTP response. It lets you define the list of websites that you wish to monitor and set options for proxy, response timeout, method, any data you may want to include in the body, and the expected responses. Take a look at the plugin documentation for more details. Here is the configuration that is going to get set up for this demonstration:

#####################################################
#
# Check on status of URLs
#
#####################################################
[[inputs.http_response]]
  urls = ["https://www.networktocode.com", "https://blog.networktocode.com", "https://www.service-now.com"]
  method = "GET"
  follow_redirects = true

#####################################################
#
# Export Information to Prometheus
#
#####################################################
[[outputs.prometheus_client]]
  listen = ":9012"
  metric_version = 2

Upon executing this, here are the relevant Prometheus metrics that we are gathering:

# HELP http_response_content_length Telegraf collected metric
# TYPE http_response_content_length untyped
http_response_content_length{method="GET",result="success",result_type="success",server="https://blog.networktocode.com",status_code="200"} 1.791348e+06
http_response_content_length{method="GET",result="success",result_type="success",server="https://www.networktocode.com",status_code="200"} 123667
http_response_content_length{method="GET",result="success",result_type="success",server="https://www.service-now.com",status_code="200"} 478636
# HELP http_response_http_response_code Telegraf collected metric
# TYPE http_response_http_response_code untyped
http_response_http_response_code{method="GET",result="success",result_type="success",server="https://blog.networktocode.com",status_code="200"} 200
http_response_http_response_code{method="GET",result="success",result_type="success",server="https://www.networktocode.com",status_code="200"} 200
http_response_http_response_code{method="GET",result="success",result_type="success",server="https://www.service-now.com",status_code="200"} 200
# HELP http_response_response_time Telegraf collected metric
# TYPE http_response_response_time untyped
http_response_response_time{method="GET",result="success",result_type="success",server="https://blog.networktocode.com",status_code="200"} 0.371015121
http_response_response_time{method="GET",result="success",result_type="success",server="https://www.networktocode.com",status_code="200"} 0.186775794
http_response_response_time{method="GET",result="success",result_type="success",server="https://www.service-now.com",status_code="200"} 0.658694795
# HELP http_response_result_code Telegraf collected metric
# TYPE http_response_result_code untyped
http_response_result_code{method="GET",result="success",result_type="success",server="https://blog.networktocode.com",status_code="200"} 0
http_response_result_code{method="GET",result="success",result_type="success",server="https://www.networktocode.com",status_code="200"} 0
http_response_result_code{method="GET",result="success",result_type="success",server="https://www.service-now.com",status_code="200"} 0

Several pieces of data come back right away, including:

content_length: Length of the response body, in bytes
http_response_code: HTTP status code returned by the site
response_time: How long the request took to complete, in seconds
result_code: A numeric code set by Telegraf, where a successful check maps to 0
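
To work with these values outside of Prometheus, the text output can be parsed directly. The following is a minimal Python sketch, not part of the original setup: the function names parse_samples and response_times are hypothetical, and the parser only handles the labeled sample lines shown above rather than the full Prometheus exposition format.

```python
import re

# Matches sample lines such as: name{label="value",...} 0.371015121
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
# Matches one label="value" pair inside the braces.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # lines without a label set are ignored in this sketch
        labels = dict(LABEL_RE.findall(match.group("labels")))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def response_times(text):
    """Map each monitored server to its response time in seconds."""
    return {
        labels["server"]: value
        for name, labels, value in parse_samples(text)
        if name == "http_response_response_time"
    }


# Two of the sample lines from the output above.
EXAMPLE = '''
# HELP http_response_response_time Telegraf collected metric
# TYPE http_response_response_time untyped
http_response_response_time{method="GET",result="success",result_type="success",server="https://blog.networktocode.com",status_code="200"} 0.371015121
http_response_response_time{method="GET",result="success",result_type="success",server="https://www.networktocode.com",status_code="200"} 0.186775794
'''

if __name__ == "__main__":
    for server, seconds in response_times(EXAMPLE).items():
        print(f"{server}: {seconds:.3f}s")
```

In practice the text would come from the Telegraf Prometheus client listener on port 9012 rather than from an embedded string.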

Telegraf – DNS Check

On top of this, I want to also show how to add in a second input. We will add in a DNS query to test the name resolution of the sites as well to verify that the DNS lookup is working as expected. This could also be extended to test and verify DNS from a user perspective within your environment.

#####################################################
#
# Check on status of URLs
#
#####################################################
[[inputs.http_response]]
  urls = ["https://www.networktocode.com", "https://blog.networktocode.com", "https://www.service-now.com"]
  method = "GET"
  follow_redirects = true

[[inputs.dns_query]]
  servers = ["8.8.8.8"]
  domains = ["blog.networktocode.com", "www.networktocode.com", "www.servicenow.com"]

#####################################################
#
# Export Information to Prometheus
#
#####################################################
[[outputs.prometheus_client]]
  listen = ":9012"
  metric_version = 2

The new section is:

[[inputs.dns_query]]
  servers = ["8.8.8.8"]
  domains = ["blog.networktocode.com", "www.networktocode.com", "www.servicenow.com"]

The plugin configuration points the queries at the Google public DNS resolver (8.8.8.8). The domains being verified are blog.networktocode.com, www.networktocode.com, and www.servicenow.com, the site of the popular ITSM tool ServiceNow. Since no record_type is set, the plugin uses its default, which is why the output shows record_type="NS".

Here is what gets added to the Prometheus Client output:

# HELP dns_query_query_time_ms Telegraf collected metric
# TYPE dns_query_query_time_ms untyped
dns_query_query_time_ms{domain="blog.networktocode.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 70.950858
dns_query_query_time_ms{domain="www.networktocode.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 48.118903
dns_query_query_time_ms{domain="www.servicenow.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 48.552328
# HELP dns_query_rcode_value Telegraf collected metric
# TYPE dns_query_rcode_value untyped
dns_query_rcode_value{domain="blog.networktocode.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 0
dns_query_rcode_value{domain="www.networktocode.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 0
dns_query_rcode_value{domain="www.servicenow.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 0
# HELP dns_query_result_code Telegraf collected metric
# TYPE dns_query_result_code untyped
dns_query_result_code{domain="blog.networktocode.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 0
dns_query_result_code{domain="www.networktocode.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 0
dns_query_result_code{domain="www.servicenow.com",rcode="NOERROR",record_type="NS",result="success",server="8.8.8.8"} 0

The corresponding values gathered from the dns_query input are:

dns_query_query_time_ms: Amount of time it took for the query to respond
dns_query_rcode_value: Return code value for a DNS entry
dns_query_result_code: Code defined by Telegraf for the response
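
The numeric dns_query_rcode_value is easier to read once it is mapped back to a name. Here is a small Python sketch of that mapping using the standard response codes from RFC 1035. The helper names describe_rcode and failing_domains are hypothetical, and the sample data simply mirrors the all-zero values in the output above.

```python
# Standard DNS response codes from RFC 1035, section 4.1.1.
RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def describe_rcode(value):
    """Translate a numeric rcode (int or float) into its DNS name."""
    code = int(value)
    return RCODE_NAMES.get(code, f"RCODE{code}")


def failing_domains(rcode_by_domain):
    """Return the sorted domains whose rcode is anything other than NOERROR."""
    return sorted(
        domain for domain, value in rcode_by_domain.items() if int(value) != 0
    )


# Values taken from the dns_query_rcode_value samples shown above.
SAMPLES = {
    "blog.networktocode.com": 0,
    "www.networktocode.com": 0,
    "www.servicenow.com": 0,
}

if __name__ == "__main__":
    for domain, value in SAMPLES.items():
        print(f"{domain}: {describe_rcode(value)}")
    print("failing:", failing_domains(SAMPLES))
```

A domain that stopped resolving would typically show up with a value of 3 (NXDOMAIN), while a resolver problem would more likely surface as 2 (SERVFAIL).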

Prometheus

The configuration for Prometheus at this point has a single addition to gather the statistics for each of the websites:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'telegraf website'
    scrape_interval: 10s
    static_configs:
      - targets:
        - "localhost:9012"

When you navigate to the Prometheus web interface to check on how the polling is going, you can build a basic graph. Graphing the response time there shows all three sites.
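
The same series can also be requested from the Prometheus HTTP API instead of the web interface. The Python sketch below only builds the instant-query URL, so it runs without a live server. The helper name instant_query_url is hypothetical, and the base URL assumes Prometheus is listening on localhost:9090, as in the scrape configuration above.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Build a URL for the Prometheus instant query endpoint (/api/v1/query)."""
    # urlencode takes care of escaping braces, quotes, and spaces in PromQL.
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    # Assumed location of the Prometheus server; adjust for your environment.
    base = "http://localhost:9090"
    print(instant_query_url(base, "http_response_response_time"))
    print(instant_query_url(base, 'dns_query_query_time_ms{domain="blog.networktocode.com"}'))
```

Fetching either URL with any HTTP client should return a JSON document containing the current value of each matching series.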

Grafana

What does it look like to get this information into a graph on Grafana?

Grafana – Websites

Building this chart takes only a small amount of configuration. In the Metrics section, the only query is http_response_response_time. I set the legend to {{ server }} so that the website address appears as the legend for each series.

In the visualization section, the only thing that needs to be done is to set the Left Y Axis Unit to seconds (s) so that the Y axis is labeled correctly.

Grafana – DNS

This is another small panel configuration, similar to the previous one. In the Metrics section, the query for response time is dns_query_query_time_ms. Set the legend to {{ domain }} to match the domain label in the metrics shown above.

In the visualization section, set the Unit to milliseconds (ms). If you copied the panel from the Website panel, don’t forget to change this: the HTTP metric is reported in seconds while the DNS metric is reported in milliseconds, so the time scale would otherwise be off.
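
To make the unit difference concrete, here is a short Python sketch that converts one sample from each metric above into the other unit. The helper names are hypothetical. The point is only that the two metrics differ by a factor of 1000.

```python
def seconds_to_ms(seconds):
    """Convert a value in seconds (http_response_response_time) to milliseconds."""
    return seconds * 1000.0


def ms_to_seconds(milliseconds):
    """Convert a value in milliseconds (dns_query_query_time_ms) to seconds."""
    return milliseconds / 1000.0


if __name__ == "__main__":
    http_seconds = 0.371015121  # blog.networktocode.com HTTP response time
    dns_ms = 70.950858          # blog.networktocode.com DNS query time
    print(f"HTTP: {http_seconds} s = {seconds_to_ms(http_seconds):.3f} ms")
    print(f"DNS:  {dns_ms} ms = {ms_to_seconds(dns_ms):.6f} s")
```

Leaving the DNS panel set to seconds would display a query of roughly 71 ms as if it had taken about 71 seconds.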

Conclusion

Hopefully this post will help you gain some insight into your environment. We have already been using this process internally at Network to Code to keep an eye on the key services we rely on, so that we can tell whether a problem is isolated to an individual or lies with the service itself. Let us know your thoughts and comments! To continue the conversation, check out the #Telemetry channel inside the Network to Code Slack. Sign up at slack.networktocode.com.

-Josh
