Nautobot and Django QuerySet Annotations – Part 3

Blog Detail

This is Part 3 of the series on Django QuerySet annotations. In Part 1 we talked about what the annotations are and why you would want to use them. In Part 2 we looked at subqueries and more annotation examples.

In this post, we’ll learn how to use annotations spanning multiple tables, and we will go through more advanced use cases.

In the first part of this series we looked at examples of using Count aggregator with annotations to count the number of related objects. We took advantage of reverse relationships1 created by Django and used value of the related_name argument for accessing related objects.

However, not all related objects have a reverse relationship. When a model is defined, it’s possible to explicitly disable the reverse relationship by using the value + (plus sign) for the related_name. Or perhaps our models are indirectly related, i.e., there is no ForeignKey pointing directly from one model to another.

In these cases we have to use subqueries2 in our annotations to process related objects. The resulting queries tend to be a bit more complex; but once you get the hang of it, it all starts to make sense.

Let’s go back to the example where we counted number of circuits for each provider. We’ll try to build an equivalent query that doesn’t use the explicit relationship but uses a subquery instead. This subquery will leverage the fact that the Circuit model has ForeignKey pointing to the Provider model.

This is the query we used previously:

from django.db.models import Count

providers = Provider.objects.annotate(circuit_count=Count("circuits"))

Let’s modify it. First, we build the skeleton of the outer query:

from django.db.models import Subquery, OuterRef

Provider.objects.annotate(circuit_count=Subquery(...))

We keep the same annotation field name but instead of using the Count aggregator we insert a result returned by the subquery.

We now need to focus on building that subquery:

circuits = Circuit.objects \
                  .filter(provider=OuterRef("pk")) \
                  .values("provider") \
                  .order_by() \
                  .annotate(c=Count("provider")) \
                  .values("c")

That’s a bit more complex than using Count aggregator, but sometimes this is the only available option.

Before we combine the subquery and the outer query to get our final result, let’s have a closer look at all of the components of the subquerry to make sure we understand what they do.

  • filter(provider=OuterRef("pk")) — We want only Circuit objects with provider equal to the provider in the outer query. Field pk in OuterRef3 refers to the primary key of the Provider model.
  • values("provider") — Using values clause before annotation will trigger grouping behavior of the annotation as discussed in Part 2 of this series. We will count number of circuits for given provider, so we specify provider field here.
  • order_by() — We clear the default ordering to ensure grouping operation returns correct results.
  • annotate(c=Count("provider")) — Now we apply our annotation with the Count aggregator. New field, named c, will hold number of circuits for each provider group. We applied filtering at the beginning, so there will be only one such group.
  • values("c") — Finally, we specify that only one column is returned, which is the column added by the annotation.

Our subquery must return only one row and one column so that it can be used in the subsequent annotation. This is why we had to use values("c"); otherwise two columns would be returned, and we would get an error.

In this particular subquery only one row will be returned because of the filtering we applied. That won’t always be the case, so we will talk later about how to explicitly force subquery to return one row only.

We built the subquery and talked through how it works. We now need to use it in the main query to get circuit count for each of the providers:

providers = Provider.objects \
                    .annotate(circuit_count=Subquery(circuits)) \
                    .values("name", "circuit_count")

That’s it. This should have done the job. Let’s investigate the resulting values:

>>> pprint(list(providers))
[{'circuit_count': None, 'name': 'AT&T'},
 {'circuit_count': None, 'name': 'Cogent'},
 {'circuit_count': None, 'name': 'Deutsche Telekom'},
 {'circuit_count': None, 'name': 'GTT'},
 {'circuit_count': None, 'name': 'Liberty Global'},
 {'circuit_count': None, 'name': 'Lumen Technologies'},
 {'circuit_count': 40, 'name': 'NTT'},
 {'circuit_count': None, 'name': 'Orange'},
 {'circuit_count': None, 'name': 'PCCW Global'},
 {'circuit_count': None, 'name': 'Sprint'},
 {'circuit_count': None, 'name': 'Tata Communications'},
 {'circuit_count': None, 'name': 'Telecom Italia'},
 {'circuit_count': 40, 'name': 'Telia Carrier'},
 {'circuit_count': None, 'name': 'Telxius'},
 {'circuit_count': None, 'name': 'Verizon'},
 {'circuit_count': None, 'name': 'Zayo'}]

Something is not quite right here. Why is None showing up for some of the providers? When we used Count with reverse relationships, the value 0 was shown for the providers that don’t have circuits.

We get None here because when the subquery doesn’t return any results for given provider, a value of None will be used when annotating the Provider objects.

Fortunately, Django provides a function that can help us fix this. We’ll look at it now.

Coalesce Function

We saw that when subquery does not return anything, the value assigned to the annotation field is None. This is generally not what we want. If subquery doesn’t return anything, we would like to define a default value that is assigned to a field instead.

It so happens that Django provides a function that does exactly that. It’s a function named Coalesce4. Imagine it working like a dictionary get() method in Python, with default return value provided.

Coalesce accepts a list of at least two field names or expressions and returns the first non-null value. This means we could use multiple subqueries and wrap them in the Coalesce function. The first non-empty result from the ones returned by subqueries will be used. The caveat here is that the arguments must be of similar type. You can’t mix numbers with text, for example.

So now we know we have to use Coalesce function to fix our query. Let’s go ahead and feed it the subquery we created and an integer 0, which will be our default value.

from django.db.models.functions import Coalesce

providers = Provider.objects \
                    .annotate(
                        circuit_count=Coalesce(Subquery(circuits), 0)
                    ) \
                    .values("name", "circuit_count")

Let’s investigate the resulting values:

>>> pprint(list(providers))
[{'circuit_count': 0, 'name': 'AT&T'},
 {'circuit_count': 0, 'name': 'Cogent'},
 {'circuit_count': 0, 'name': 'Deutsche Telekom'},
 {'circuit_count': 0, 'name': 'GTT'},
 {'circuit_count': 0, 'name': 'Liberty Global'},
 {'circuit_count': 0, 'name': 'Lumen Technologies'},
 {'circuit_count': 40, 'name': 'NTT'},
 {'circuit_count': 0, 'name': 'Orange'},
 {'circuit_count': 0, 'name': 'PCCW Global'},
 {'circuit_count': 0, 'name': 'Sprint'},
 {'circuit_count': 0, 'name': 'Tata Communications'},
 {'circuit_count': 0, 'name': 'Telecom Italia'},
 {'circuit_count': 40, 'name': 'Telia Carrier'},
 {'circuit_count': 0, 'name': 'Telxius'},
 {'circuit_count': 0, 'name': 'Verizon'},
 {'circuit_count': 0, 'name': 'Zayo'}]

Great! We fixed our problem, and now providers that don’t have any circuits have the value of the field circuit_count set to 0.

Limiting Subquery Results to One Column and One Row

As I mentioned in the previous example, a subquery will often return more than one row and column, which cannot be used in some annotations.

Consider the below example of annotating Location objects with the name of the most recently added device within each location:

recently_added = Device.objects \
                       .filter(location=OuterRef("pk")) \
                       .order_by("-last_updated") \
                       .values("name")

locations = Location.objects \
                    .annotate(newest_device=Subquery(recently_added))
>>> print(locations)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  ... (cut for brevity)
django.db.utils.ProgrammingError: more than one row returned by a subquery used as an expression

We defined subquery recently_added, which returns devices located in a given location, ordered by the date they were last updated.

When we try to use this subquery in an annotation, we get an error. Django helpfully tells us that subquery returned too many rows. This happens because there could be multiple devices located in any given location, and each row contains device record.

To fix this, we use one element slice [:1] on the QuerySet returned by the subquery. Subquery expects to receive a QuerySet that produces one row, which is why we are using a slice. Note that we cannot use [0] index or first() function here, as these return a single instance of an object, or a dictionary when used with values.

recently_added = Device.objects \
                       .filter(location=OuterRef("pk")) \
                       .order_by("-last_updated") \
                       .values("name")[:1]

locations = Location.objects \
                    .annotate(newest_device=Subquery(recently_added)) \
                    .values("name", "newest_device")
>>> pprint(list(locations))
[{'name': 'Jersey City', 'newest_device': 'jcy-spine-02.infra.ntc.com'},
 {'name': 'New York City', 'newest_device': 'nyc-spine-02.infra.ntc.com'},
 {'name': 'ams', 'newest_device': 'ams-leaf-08'},
 {'name': 'atl', 'newest_device': 'atl-leaf-08'},
 {'name': 'bkk', 'newest_device': 'bkk-leaf-08'},
 {'name': 'can', 'newest_device': 'can-leaf-08'},
 {'name': 'cdg', 'newest_device': 'cdg-leaf-08'},
 {'name': 'del', 'newest_device': 'del-leaf-10'},
 {'name': 'den', 'newest_device': 'den-leaf-08'},
 {'name': 'dfw', 'newest_device': 'dfw-leaf-08'},
 {'name': 'dxb', 'newest_device': 'dxb-leaf-08'},
 {'name': 'fra', 'newest_device': 'fra-leaf-08'},
 {'name': 'hkg', 'newest_device': 'hkg-leaf-08'},
 {'name': 'hnd', 'newest_device': 'hnd-leaf-08'},
 {'name': 'icn', 'newest_device': 'icn-leaf-04'},
 {'name': 'jfk', 'newest_device': 'jfk-leaf-08'},
 {'name': 'lax', 'newest_device': 'lax-leaf-10'},
 {'name': 'lhr', 'newest_device': 'lhr-leaf-08'},
 {'name': 'ord', 'newest_device': 'ord-leaf-08'},
 {'name': 'pek', 'newest_device': 'pek-leaf-08'},
 {'name': 'pvg', 'newest_device': 'pvg-leaf-04'},
 {'name': 'sin', 'newest_device': 'sin-leaf-08'}]

And there you have it — our annotation is now working as expected.

Annotations Across Many Tables

A word of warning when using annotations across more than two tables: Due to the way Django translates queries with annotations to underlying SQL, using aggregator functions like Count or Avg across multiple tables will give incorrect results.

For instance, if we try annotating Location records with the count of related prefixes and the count of related devices, we get incorrect numbers:

>>> locations = Location.objects.annotate(pfx_count=Count("prefixes"))
>>> locations.get(name="lax").pfx_count
5
>>> locations = Location.objects.annotate(device_count=Count("devices"))
>>> locations.get(name="lax").device_count
12
>>> locations = Location.objects.annotate(pfx_count=Count("prefixes"), device_count=Count("devices"))
>>> locations.get(name="lax").device_count
60
>>> locations.get(name="lax").pfx_count
60

As you can see above, when using a single annotation with Count, we get correct results. But when we try to annotate Location objects with related Prefix and Device objects in a single query, we get back an incorrect value for both fields5.

Note that the in the case of Count, we can address this problem by providing argument distinct=True when invoking Count. However, this will not work for other aggregation functions.

>>> locations = Location.objects \
                        .annotate(
                            pfx_count=Count("prefixes", distinct=True),
                            device_count=Count("devices", distinct=True)
                        )
>>> locations.get(name="lax").device_count
12
>>> locations.get(name="lax").pfx_count
5

We will see how we can solve this problem for the generic case by using our trusty subqueries. In the example below, each of the subqueries will be run independently behind the scenes, and the results of each will be assigned to the given field.

from django.db.models import Subquery, OuterRef

pfx_count_sq = Location.objects \
                       .filter(pk=OuterRef("pk")) \
                       .annotate(c=Count("prefixes")) \
                       .values("c")

device_count_sq = Location.objects \
                          .filter(pk=OuterRef("pk")) \
                          .annotate(c=Count("devices")) \
                          .values("c")

locations = Location.objects \
                    .annotate(
                        pfx_count=Subquery(pfx_count_sq), \
                        device_count=Subquery(device_count_sq) \
                    )
>>> locations.get(name="lax").pfx_count
5
>>> locations.get(name="lax").device_count
12

First, we defined two subqueries, pfx_count_sq and device_count_sq. These could be included directly inside of the annotate() method but defining them separately makes the code easier to read.

In each of the subqueries, we used OuterRef class, which allows us to use the pk field from the outer Location object when Subquery is run.

Each subquery uses annotation with a Count aggregator. We use values("c") here to return just one column c, otherwise we could not use the result of the subquery in the outer query annotation.

You also need to ensure only one row is returned. In our case, we are guaranteed to have only one match. If your subquery returns more than one row, you’d have to use slice [:1] to get back only one item.

Lastly, we wrap subqueries in SubQuery class and plug them into the annotate() method in the main query to get the final result.

As you can see, the resulting values now match the numbers we got when using two separate Count annotations.

Note that using Count and other Django aggregation functions is usually more performant than using subqueries because of the underlying database optimizations. However, subqueries are generally preferable to expressing the equivalent logic in Python code. In case of complex subqueries, you should profile underlying database queries to ensure their performance is acceptable.

More Examples

We’ll now have a look at some more examples of annotations, using different Django functions.

Using Length Function

Let’s compute the length of each of the device names and sort the results in descending order.

from django.db.models import F
from django.db.models.functions import Length

devices = Device.objects \
                .annotate(name_len=Length(F("name"))) \
                .order_by("-name_len")
>>> pprint(list(devices)[:20])
[<Device: jcy-spine-02.infra.ntc.com>,
 <Device: nyc-spine-01.infra.ntc.com>,
 <Device: jcy-spine-01.infra.ntc.com>,
 <Device: nyc-spine-02.infra.ntc.com>,
 <Device: nyc-leaf-02.infra.ntc.com>,
 <Device: nyc-leaf-01.infra.ntc.com>,
 <Device: jcy-rtr-01.infra.ntc.com>,
 <Device: nyc-rtr-01.infra.ntc.com>,
 <Device: jcy-rtr-02.infra.ntc.com>,
 <Device: nyc-rtr-02.infra.ntc.com>,
 <Device: jcy-bb-01.infra.ntc.com>,
 <Device: nyc-bb-01.infra.ntc.com>,
 <Device: den-leaf-02>,
 <Device: dfw-leaf-05>,
 <Device: dxb-leaf-02>,
 <Device: dxb-leaf-05>,
 <Device: fra-leaf-04>,
 <Device: fra-leaf-08>,
 <Device: hkg-edge-01>,
 <Device: hkg-leaf-06>]

We made use of Length6 function and F7 object. F object is used to pass value of the device’s name field to the Length function. Length function will return number of characters in the passed argument.

Filtering Values of the Annotation Field

In the below example, we use Count in the annotation. Then, we filter the values in the annotation field vlans_count to get back only locations that have at least one VLAN assigned to them.

locations = Location.objects \
                    .annotate(vlans_count=Count("vlans")) \
                    .filter(vlans_count__gt=1) \
                    .values("name", "vlans_count")
>>> pprint(list(locations))
[{'name': 'ams', 'vlans_count': 16},
 {'name': 'atl', 'vlans_count': 16},
 {'name': 'bkk', 'vlans_count': 16},
 {'name': 'can', 'vlans_count': 16},
 {'name': 'cdg', 'vlans_count': 16},
 {'name': 'del', 'vlans_count': 20},
 {'name': 'den', 'vlans_count': 16},
 {'name': 'dfw', 'vlans_count': 16},
 {'name': 'dxb', 'vlans_count': 16},
 {'name': 'fra', 'vlans_count': 16},
 {'name': 'hkg', 'vlans_count': 16},
 {'name': 'hnd', 'vlans_count': 16},
 {'name': 'icn', 'vlans_count': 8},
 {'name': 'jfk', 'vlans_count': 16},
 {'name': 'lax', 'vlans_count': 20},
 {'name': 'lhr', 'vlans_count': 16},
 {'name': 'ord', 'vlans_count': 16},
 {'name': 'pek', 'vlans_count': 16},
 {'name': 'pvg', 'vlans_count': 8},
 {'name': 'sin', 'vlans_count': 16}]

Using Exists Subquery

We can use Exists8 subquery subclass to get back a boolean True if subquery returns results or False if it doesn’t. For instance, we might want to check whether a given location has devices that are assigned edge role.

from django.db.models import Exists, OuterRef

edge_dev_sq = Subquery(
                Device.objects \
                      .filter(
                          location=OuterRef("pk"),
                          device_role__slug="edge"
                      )
              )

locs_w_edge_dev = Location.objects \
                          .annotate(has_edge_devices=Exists(edge_dev_sq)) \
                          .values("name", "has_edge_devices")
>>> pprint(list(locs_w_edge_dev))
[{'has_edge_devices': False, 'name': 'Jersey City'},
 {'has_edge_devices': False, 'name': 'New York City'},
 {'has_edge_devices': True, 'name': 'ams'},
 {'has_edge_devices': True, 'name': 'atl'},
 {'has_edge_devices': True, 'name': 'bkk'},
 {'has_edge_devices': True, 'name': 'can'},
 {'has_edge_devices': True, 'name': 'cdg'},
 {'has_edge_devices': True, 'name': 'del'},
 {'has_edge_devices': True, 'name': 'den'},
 {'has_edge_devices': True, 'name': 'dfw'},
 {'has_edge_devices': True, 'name': 'dxb'},
 {'has_edge_devices': True, 'name': 'fra'},
 {'has_edge_devices': True, 'name': 'hkg'},
 {'has_edge_devices': True, 'name': 'hnd'},
 {'has_edge_devices': True, 'name': 'icn'},
 {'has_edge_devices': True, 'name': 'jfk'},
 {'has_edge_devices': True, 'name': 'lax'},
 {'has_edge_devices': True, 'name': 'lhr'},
 {'has_edge_devices': True, 'name': 'ord'},
 {'has_edge_devices': True, 'name': 'pek'},
 {'has_edge_devices': True, 'name': 'pvg'},
 {'has_edge_devices': True, 'name': 'sin'}]

Using Python Expressions in Annotations

You can also use Python expressions in your annotations. We’ve seen an example of that in Part 2, where we computed percentage of free interfaces on devices. Here is an example that shows how to annotate Device objects with the time elapsed since the object was created.

import datetime

today = datetime.date.today()

devices = Device.objects \
                .annotate(
                    created_delta=today - F("created")
                )
>>> devices[0].created_delta
datetime.timedelta(days=47)
>>> devices[0].created_delta.days
47

We use datetime.datetime.date.today() to get today’s date. Then, we use this value in the annotation to subtract the date the object was created from today’s date. The result is a datetime.timedelta object, which we can query for the number of days since the device was created.

Count distinct Argument

In our final example we will use Count’s distinct argument to compute number of distinct device roles used by devices in each location.

When we use distinct=True, we’re telling Count to compute number of unique values for the given argument. Here, we’re giving it devices__device_role, which will access device role for the devices in each of the locations.

locations = Location.objects \
                    .annotate(
                         devrole_count=Count(
                            "devices__device_role", distinct=True
                         )
                    ) \
                    .values("name", "devrole_count")
>>> pprint(list(locations))
[{'devrole_count': 3, 'name': 'Jersey City'},
 {'devrole_count': 4, 'name': 'New York City'},
 {'devrole_count': 2, 'name': 'ams'},
 {'devrole_count': 2, 'name': 'atl'},
 {'devrole_count': 2, 'name': 'bkk'},
 {'devrole_count': 2, 'name': 'can'},
 {'devrole_count': 2, 'name': 'cdg'},
 {'devrole_count': 2, 'name': 'del'},
 {'devrole_count': 2, 'name': 'den'},
 {'devrole_count': 2, 'name': 'dfw'},
 {'devrole_count': 2, 'name': 'dxb'},
 {'devrole_count': 2, 'name': 'fra'},
 {'devrole_count': 2, 'name': 'hkg'},
 {'devrole_count': 2, 'name': 'hnd'},
 {'devrole_count': 2, 'name': 'icn'},
 {'devrole_count': 2, 'name': 'jfk'},
 {'devrole_count': 2, 'name': 'lax'},
 {'devrole_count': 2, 'name': 'lhr'},
 {'devrole_count': 2, 'name': 'ord'},
 {'devrole_count': 2, 'name': 'pek'},
 {'devrole_count': 2, 'name': 'pvg'},
 {'devrole_count': 2, 'name': 'sin'}]

References

  1. Django docs – Related objects reference 
  2. Django docs – Query Expressions – SubQuery 
  3. Django docs – Query Expressions – OuterRef 
  4. Django docs – Database Functions – Coalesce 
  5. Django docs – Aggregation – Combining multiple aggregations 
  6. Django docs – Database Functions – Length 
  7. Django docs – Query Expressions – F 
  8. Django docs – Query Expressions – Exists 

Conclusion

This post concludes the series on Nautobot and Django QuerySet annotations. We learned about annotations and subqueries and their use cases. We explored different functions that can be used with annotations and worked through a number of practical examples which showcase their power.

With this acquired knowledge, you should be able to write more powerful and efficient queries.

I hope you enjoyed this series and are looking forward to writing some awesome Nautobot code!

-Przemek Rogala



ntc img
ntc img

Contact Us to Learn More

Share details about yourself & someone from our team will reach out to you ASAP!

Nautobot and Django QuerySet Annotations – Part 2

Blog Detail

This is Part 2 of the series on Django QuerySet annotations. In Part 1 we talked about what the annotations are and why you would want to use them. We then worked through a few examples showing simple use cases.

In this post we’ll look at the behavior of annotations when used with the values clause. We’ll then discuss subqueries and how these can be used with annotations.

UsingvaluesClause with Annotations

Normally, annotation adds one field per each object in the QuerySet. This behavior changes when values1 clause is used before the annotation is applied.

In this case, original results are grouped according to the combinations specified in the values clause. The annotation is then provided for each unique value group.

One of the use cases for this is counting number of members in each group. For instance, we could count how many devices are in each of the possible states (Active, Planned, etc.) for a given site.

When using values, we will usually have to clear the default ordering to stop it from interfering with the grouping operation. You do this by applying order_by()2 method without specifying any fields.

Order of applying values and annotations matters. If annotation goes first, it will be computed over all objects, and values will just return specified column or columns. We’ve seen this behavior in the examples in the previous post.

If what you want is to apply annotation to groups of values, you need to use values before using annotations.

Let’s see how this looks in action:

from django.db.models import Count

device_role_count = Device.objects.values("device_role__name") \
    .order_by() \
    .annotate(count=Count("device_role__name"))
>>> from pprint import pprint
>>> pprint(list(device_role_count))
[{'count': 4, 'device_role__name': 'spine'},
 {'count': 4, 'device_role__name': 'Router'},
 {'count': 158, 'device_role__name': 'leaf'},
 {'count': 2, 'device_role__name': 'Backbone'},
 {'count': 40, 'device_role__name': 'edge'}]
  • values("device_role__name") gets device role names for all of the devices.
  • order_by() clears the default ordering.
  • annotate(count=Count("device_role__name")) annotates the result with the count of the number of devices for each of the device role names.

Computing Ratios between Results ofCountAggregator

Another use case where annotations with values() are handy is when computing ratios between two or more counts.

For instance, you want to know the percentage of free interfaces (i.e., interfaces that are not connected) for each of your devices. To get that information you could use the below query.

from django.db.models import F, Q

devices = Device.objects.filter(interfaces__isnull=False) \
    .annotate(intf_total=Count("interfaces"), 
              notconn_intf=Count("interfaces", filter=Q(interfaces__cable=None)), 
              intf_free_perc=100 * F("notconn_intf") / F("intf_total")) \
    .values("name", "intf_total", "notconn_intf", "intf_free_perc") \
    .order_by("intf_free_perc")
>>> pprint(list(devices)[:5])
[{'intf_free_perc': 70,
  'intf_total': 10,
  'name': 'jcy-rtr-01.infra.ntc.com',
  'notconn_intf': 7},
 {'intf_free_perc': 70,
  'intf_total': 10,
  'name': 'jcy-rtr-02.infra.ntc.com',
  'notconn_intf': 7},
 {'intf_free_perc': 70,
  'intf_total': 10,
  'name': 'jcy-bb-01.infra.ntc.com',
  'notconn_intf': 7},
 {'intf_free_perc': 77,
  'intf_total': 62,
  'name': 'lax-edge-02',
  'notconn_intf': 48},
 {'intf_free_perc': 77,
  'intf_total': 62,
  'name': 'lax-edge-01',
  'notconn_intf': 48}]

There are a lot of moving parts here, so let’s break this query down.

  • filter(interfaces__isnull=False) – first we filter out devices that don’t have any interfaces.

Next we create annotations. The first two annotations are examples of counts that we’ve seen before.

  • intf_total=Count("interfaces") – gives us the total number of interfaces per device.
  • notconn_intf=Count("interfaces", filter=Q(interfaces__cable=None)) – returns count of the interfaces that are not connected to a cable.
  • intf_free_perc=100 * F("notconn_intf") / F("intf_total") – this is the interesting bit. We’re making use of F3 object, which allows us to refer to the value of another field of the model when running queries. Dynamically computed fields added by annotations can also be referred to here. We take advantage of that fact and compute the ratio between the fields we just computed, notconn_intf and intf_total. To get the value as a percentage, we multiply the result by 100.
  • values("name", "intf_total", "notconn_intf", "intf_free_perc") – in the final result, we’re only interested in the three annotation fields displayed next to the name of each of the devices.
  • order_by("intf_free_perc") – to top it off, we sort the results by the percentage of free interfaces per device in ascending order.

Even though this query looks complicated, it really is quite readable once you break it down into the individual components.

Subqueries

Before we move on, we need to briefly touch upon the concept of subqueries4.

Subqueries are queries with results that can be used inside of other queries, as well as in annotations.

You would use a subquery to express logic that would be difficult or impossible to express with a simple query.

Let’s have a look at an example so you can see how useful they can be.

Imagine that you want to find devices belonging to the site with the most recently updated VLAN. To do that, you first define a query to get the most recently updated VLAN. We return only one value, the site of the most recently updated VLAN.

most_recent_vlan_update = VLAN.objects.order_by("-last_updated").values("site")[:1]

We then explicitly wrap this query in a Subquery object and use it in the outer query.

from django.db.models import Subquery

devices = Device.objects.filter(site=Subquery(most_recent_vlan_update))

This query is run against Device objects. We filter these objects to only the ones with a site field value that matches the value returned by the subquery.

<span role="button" tabindex="0" data-code=">>> pprint(list(devices)) [<Device: jfk-edge-01>, <Device: jfk-edge-02>, <Device: jfk-leaf-01>, <Device: jfk-leaf-02>, <Device: jfk-leaf-03>, <Device: jfk-leaf-04>, <Device: jfk-leaf-05>, <Device: jfk-leaf-06>, <Device: jfk-leaf-07>,
>>> pprint(list(devices))
[<Device: jfk-edge-01>,
 <Device: jfk-edge-02>,
 <Device: jfk-leaf-01>,
 <Device: jfk-leaf-02>,
 <Device: jfk-leaf-03>,
 <Device: jfk-leaf-04>,
 <Device: jfk-leaf-05>,
 <Device: jfk-leaf-06>,
 <Device: jfk-leaf-07>,
 <Device: jfk-leaf-08>]

The end result is devices matching the site returned by the subquery.

There is another reason for using subqueries. In most cases, their use results in performance improvements as your application makes fewer calls to the database.

The above query, where we used a subquery, resulted in a single call to the database. Django internally translated that query to a single SQL expression.

Compare this to the below code, where we make two separate queries to the database. The first query evaluates instantly, and the result is assigned to the site variable. We then use its value in the second query.

site = VLAN.objects.order_by("-last_updated").first().site
devices = Device.objects.filter(site=site)

The efficiency gains can be huge, especially in the cases where we replace Python loops containing queries with optimized queries containing subqueries and annotations.

Having this short introduction to subqueries, let’s return to our discussion of annotations.

Using Subqueries in Annotations

We can use subqueries in annotations. This means we can take the result of the subquery and assign it to a dynamically generated field on the model in the outer query.

Let’s take the subquery we defined above, this time returning VLAN name, and use it in the annotation.

most_recent_vlan_update = VLAN.objects.order_by("-last_updated").values("name")[:1]

devices = Device.objects \
            .annotate(most_recent_vlan_update=Subquery(most_recent_vlan_update)) \
            .values("name", "most_recent_vlan_update")
>>> pprint(list(devices)[::20])
[{'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'ams-edge-01'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'bkk-edge-01'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'cdg-edge-01'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'del-leaf-09'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'dfw-leaf-07'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'fra-leaf-07'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'hnd-leaf-07'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'jfk-leaf-06'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'lhr-leaf-04'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'ord-leaf-07'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'sin-leaf-01'}]

The idea here is to add to each device a new field that would hold the name of the most recently updated VLAN.

But this doesn’t quite work as it should. Each device gets assigned the same VLAN. That is, the subquery returns the most recently updated VLAN across all of the sites, and this is what is used in the annotation.

Is there a way of matching the site of the device with the site used in the subquery?

Yes, there is!

Subqueries and OuterRef

To fix our query we need to use OuterRef5 expression.

OuterRef acts like an F object, except it works across the query-subquery boundary. With OuterRef, you can refer (inside the subquery) to the field from the outer query.

Let’s modify our subquery so that only VLANs assigned to the same site as the device are taken into account in the annotation.

from django.db.models import OuterRef

most_recent_vlan_update = VLAN.objects \
                            .filter(site=OuterRef("site")) \
                            .order_by("-last_updated") \
                            .values("name")[:1]

devices = Device.objects \
            .annotate(most_recent_vlan_update=Subquery(most_recent_vlan_update)) \
            .values("name", "most_recent_vlan_update")
>>> pprint(list(devices)[::20])
[{'most_recent_vlan_update': 'ams-108-mgmt', 'name': 'ams-edge-01'},
 {'most_recent_vlan_update': 'bkk-108-mgmt', 'name': 'bkk-edge-01'},
 {'most_recent_vlan_update': 'cdg-108-mgmt', 'name': 'cdg-edge-01'},
 {'most_recent_vlan_update': 'del-110-mgmt', 'name': 'del-leaf-09'},
 {'most_recent_vlan_update': 'dfw-108-mgmt', 'name': 'dfw-leaf-07'},
 {'most_recent_vlan_update': 'fra-108-mgmt', 'name': 'fra-leaf-07'},
 {'most_recent_vlan_update': 'hnd-108-mgmt', 'name': 'hnd-leaf-07'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'jfk-leaf-06'},
 {'most_recent_vlan_update': 'lhr-108-mgmt', 'name': 'lhr-leaf-04'},
 {'most_recent_vlan_update': 'ord-108-mgmt', 'name': 'ord-leaf-07'},
 {'most_recent_vlan_update': 'sin-108-mgmt', 'name': 'sin-leaf-01'}]

Much better. Now each device is assigned the most recently updated VLAN with a site matching the site of the device.

References

  1. Django docs – QuerySet – values() 
  2. Django docs – QuerySet – order_by() 
  3. Django docs – Query Expressions – F() expressions 
  4. Django docs – Query Expressions – Subquery() expressions 
  5. Django docs – Query Expressions – Referencing columns from the outer queryset 

Conclusion

In this post, we learned how the values clause changes the way Django QuerySet annotations work. We also learned about subqueries and how they can help us build more advanced annotation. In the next part of this series, we will look at more examples of using subqueries with annotations. Stay tuned!

-Przemek



ntc img
ntc img

Contact Us to Learn More

Share details about yourself & someone from our team will reach out to you ASAP!

Nautobot ChatOps Adds Support for Socket Mode in Slack

Blog Detail

The Nautobot ChatOps plugin just recently received a new feature for Slack that has been requested for a long time. If you are using Nautobot ChatOps plugin v1.10.0 or greater with Slack, you now have the ability to use Slack’s socket mode for connecting compared to their traditional webhook method!

In this blog post, we will dive into what exactly this means, how you can leverage socket mode, and why you may want to use it over the current method of connecting. Let’s dive in!

How Does ChatOps Connect Today?

Nautobot ChatOps today currently supports four major platforms: Slack, Microsoft Teams, Webex, and Mattermost. All four of these platforms support incoming webhooks, so this is the default setup method for them.

Let’s take a look at how this traffic flow currently works using this diagram from the official documentation page.

  1. When a user sends a message to the bot via the client application (phone or desktop app), the connection is initiated from the client, and the message gets sent to Slack’s cloud servers.
  2. The Slack servers then initiate a connection to the ChatOps plugin running on the Nautobot server/container.
  3. Since this is a new incoming connection, it must be port forwarded and allowed through any firewalls in your network between the internet and your Nautobot server.
  4. Since your externally facing firewall is stateful, Nautobot then replies back to Slack using the established TCP session.
  5. Slack then returns the response to the client.

The important step to notice here is step 3. In order for Slack’s web services to communicate to Nautobot with the request originating from the user/client, traffic must be explicitly allowed inbound. Normally this wouldn’t be much of an issue on a firewall. However, the issue is made more complicated for three reasons:

  1. A port must be opened and forwarded from the public internet to your Nautobot appliciation.
  2. It is not viable to simply white-list Slack’s IP addresses as the API requests originate from within AWS, so the IP ranges are quite extensive and will encompass other services on AWS besides Slack (aka the rest of the internet).
  3. Slack requires a valid Third-Party SSL Certificate before it will make the connection.

There are a few ways to secure inbound ChatOps access from Slack, such as adding verification that the webhook is legitimate, using an externally facing API Gateway, etc. One example Nautobot currently uses is verifying all inbound requests from Slack using a signed secret.

While we won’t get into all of them here, you can read about some of them in Slack’s documentation “Best practices for security”.

In the end, this still requires enabling inbound access from the public internet to your internal Nautobot instance, which tends to make InfoSec a little nervous.

Introducing socket mode.

What Is Socket Mode?

Socket mode allows traffic to no longer originate from Slack on the public internet, but instead from internally going out.

  1. The Nautobot server initiates and maintains a connection to the Slack cloud servers. Because the traffic originates from inside the network, externally facing firewalls do not need to allow inbound traffic from Slack to the Nautobot server. They only need to allow traffic outbound from the Nautobot server to Slack on the internet. This connection remains open through this entire process.
  2. A user will send a message via their desktop/mobile Slack client to the Nautobot ChatOps bot.
  3. Slack forwards the message to Nautobot over the existing WebSocket connection.
  4. Nautobot then replies over the existing WebSocket connection to Slack.
  5. Slack forwards the response to the client.

This reduces the externally facing Nautobot footprint out to the internet and does not require opening specific inbound access to Nautobot from the internet just to get ChatOps working.

Additionally, once Nautobot establishes a connection to Slack it keeps the connection active, eliminating the need to reestablish a new connection each time (as with webhooks).

Should I Use Socket Mode?

Either setup option works for Nautobot ChatOps with Slack. However, with the number of customers we’ve worked with at NTC that express concern over exposing Nautobot to the public internet and do not have an API gateway they can leverage to lock down inbound access, we expect to see many users prefer to use socket mode for security and compliance reasons.

If you currently have Nautobot ChatOps working with Slack, there is no real need to move over to socket mode. However, if you are setting it up new for the first time and your InfoSec team has concerns with enabling inbound access, socket mode may work better for your situation.

Setup Instructions

Support for socket mode with Slack was added to Nautobot v1.10.0, so if you are on an older version you will need to upgrade your Nautobot ChatOps plugin first. Since this plugin acts independently from other Nautobot ChatOps extensions (e.g., Meraki ChatOpsAnsible ChatOps, etc.), configuring this base plugin for socket mode will enable these other plugin extensions to leverage it as well without needing to specifically upgrade them (provided no other dependency conflicts exist).

Instructions for setting up Nautobot ChatOps for Slack can be found here. Note that the Nautobot application requires an additional management command to be run at startup time, nautobot-server start_slack_socket, which can be added as a systemd service so it automatically runs each time the app is restarted. Details can be found in section “Startup Slack Sockets”.

There is also an additional credential slack_app_token needing to be added to the Nautobot ChatOps plugin config section in nautobot_config.py. Details for this are found in section “Post App-Creation Steps” in the install docs as well.

Note: While socket mode support is only added to the Slack platform so far, we are working on adding in support for socket mode with the other providers (where supported) in the future.


Conclusion

I hope you enjoyed learning about our new support for socket mode with Slack. If you try it out, let me know what you think of it using the comment section below. Or feel free to submit a feature request, bug report, or even a code contribution via PR in our Nautobot ChatOps plugin public repo on GitHub.

 

Thanks for reading, and happy automating!

-Matt



ntc img
ntc img

Contact Us to Learn More

Share details about yourself & someone from our team will reach out to you ASAP!