Nautobot and Django QuerySet Annotations – Part 2

This is Part 2 of the series on Django QuerySet annotations. In Part 1 we talked about what the annotations are and why you would want to use them. We then worked through a few examples showing simple use cases.

In this post we’ll look at the behavior of annotations when used with the values clause. We’ll then discuss subqueries and how these can be used with annotations.

UsingvaluesClause with Annotations

Normally, annotation adds one field per each object in the QuerySet. This behavior changes when values1 clause is used before the annotation is applied.

In this case, original results are grouped according to the combinations specified in the values clause. The annotation is then provided for each unique value group.

One of the use cases for this is counting number of members in each group. For instance, we could count how many devices are in each of the possible states (Active, Planned, etc.) for a given site.

When using values, we will usually have to clear the default ordering to stop it from interfering with the grouping operation. You do this by applying order_by()2 method without specifying any fields.

Order of applying values and annotations matters. If annotation goes first, it will be computed over all objects, and values will just return specified column or columns. We’ve seen this behavior in the examples in the previous post.

If what you want is to apply annotation to groups of values, you need to use values before using annotations.

Let’s see how this looks in action:

from django.db.models import Count

device_role_count = Device.objects.values("device_role__name") \
    .order_by() \
    .annotate(count=Count("device_role__name"))
>>> from pprint import pprint
>>> pprint(list(device_role_count))
[{'count': 4, 'device_role__name': 'spine'},
 {'count': 4, 'device_role__name': 'Router'},
 {'count': 158, 'device_role__name': 'leaf'},
 {'count': 2, 'device_role__name': 'Backbone'},
 {'count': 40, 'device_role__name': 'edge'}]
  • values("device_role__name") gets device role names for all of the devices.
  • order_by() clears the default ordering.
  • annotate(count=Count("device_role__name")) annotates the result with the count of the number of devices for each of the device role names.

Computing Ratios between Results ofCountAggregator

Another use case where annotations with values() are handy is when computing ratios between two or more counts.

For instance, you want to know the percentage of free interfaces (i.e., interfaces that are not connected) for each of your devices. To get that information you could use the below query.

from django.db.models import F, Q

devices = Device.objects.filter(interfaces__isnull=False) \
    .annotate(intf_total=Count("interfaces"), 
              notconn_intf=Count("interfaces", filter=Q(interfaces__cable=None)), 
              intf_free_perc=100 * F("notconn_intf") / F("intf_total")) \
    .values("name", "intf_total", "notconn_intf", "intf_free_perc") \
    .order_by("intf_free_perc")
>>> pprint(list(devices)[:5])
[{'intf_free_perc': 70,
  'intf_total': 10,
  'name': 'jcy-rtr-01.infra.ntc.com',
  'notconn_intf': 7},
 {'intf_free_perc': 70,
  'intf_total': 10,
  'name': 'jcy-rtr-02.infra.ntc.com',
  'notconn_intf': 7},
 {'intf_free_perc': 70,
  'intf_total': 10,
  'name': 'jcy-bb-01.infra.ntc.com',
  'notconn_intf': 7},
 {'intf_free_perc': 77,
  'intf_total': 62,
  'name': 'lax-edge-02',
  'notconn_intf': 48},
 {'intf_free_perc': 77,
  'intf_total': 62,
  'name': 'lax-edge-01',
  'notconn_intf': 48}]

There are a lot of moving parts here, so let’s break this query down.

  • filter(interfaces__isnull=False) – first we filter out devices that don’t have any interfaces.

Next we create annotations. The first two annotations are examples of counts that we’ve seen before.

  • intf_total=Count("interfaces") – gives us the total number of interfaces per device.
  • notconn_intf=Count("interfaces", filter=Q(interfaces__cable=None)) – returns count of the interfaces that are not connected to a cable.
  • intf_free_perc=100 * F("notconn_intf") / F("intf_total") – this is the interesting bit. We’re making use of F3 object, which allows us to refer to the value of another field of the model when running queries. Dynamically computed fields added by annotations can also be referred to here. We take advantage of that fact and compute the ratio between the fields we just computed, notconn_intf and intf_total. To get the value as a percentage, we multiply the result by 100.
  • values("name", "intf_total", "notconn_intf", "intf_free_perc") – in the final result, we’re only interested in the three annotation fields displayed next to the name of each of the devices.
  • order_by("intf_free_perc") – to top it off, we sort the results by the percentage of free interfaces per device in ascending order.

Even though this query looks complicated, it really is quite readable once you break it down into the individual components.

Subqueries

Before we move on, we need to briefly touch upon the concept of subqueries4.

Subqueries are queries with results that can be used inside of other queries, as well as in annotations.

You would use a subquery to express logic that would be difficult or impossible to express with a simple query.

Let’s have a look at an example so you can see how useful they can be.

Imagine that you want to find devices belonging to the site with the most recently updated VLAN. To do that, you first define a query to get the most recently updated VLAN. We return only one value, the site of the most recently updated VLAN.

most_recent_vlan_update = VLAN.objects.order_by("-last_updated").values("site")[:1]

We then explicitly wrap this query in a Subquery object and use it in the outer query.

from django.db.models import Subquery

devices = Device.objects.filter(site=Subquery(most_recent_vlan_update))

This query is run against Device objects. We filter these objects to only the ones with a site field value that matches the value returned by the subquery.

<span role="button" tabindex="0" data-code=">>> pprint(list(devices)) [<Device: jfk-edge-01>, <Device: jfk-edge-02>, <Device: jfk-leaf-01>, <Device: jfk-leaf-02>, <Device: jfk-leaf-03>, <Device: jfk-leaf-04>, <Device: jfk-leaf-05>, <Device: jfk-leaf-06>, <Device: jfk-leaf-07>,
>>> pprint(list(devices))
[<Device: jfk-edge-01>,
 <Device: jfk-edge-02>,
 <Device: jfk-leaf-01>,
 <Device: jfk-leaf-02>,
 <Device: jfk-leaf-03>,
 <Device: jfk-leaf-04>,
 <Device: jfk-leaf-05>,
 <Device: jfk-leaf-06>,
 <Device: jfk-leaf-07>,
 <Device: jfk-leaf-08>]

The end result is devices matching the site returned by the subquery.

There is another reason for using subqueries. In most cases, their use results in performance improvements as your application makes fewer calls to the database.

The above query, where we used a subquery, resulted in a single call to the database. Django internally translated that query to a single SQL expression.

Compare this to the below code, where we make two separate queries to the database. The first query evaluates instantly, and the result is assigned to the site variable. We then use its value in the second query.

site = VLAN.objects.order_by("-last_updated").first().site
devices = Device.objects.filter(site=site)

The efficiency gains can be huge, especially in the cases where we replace Python loops containing queries with optimized queries containing subqueries and annotations.

Having this short introduction to subqueries, let’s return to our discussion of annotations.

Using Subqueries in Annotations

We can use subqueries in annotations. This means we can take the result of the subquery and assign it to a dynamically generated field on the model in the outer query.

Let’s take the subquery we defined above, this time returning VLAN name, and use it in the annotation.

most_recent_vlan_update = VLAN.objects.order_by("-last_updated").values("name")[:1]

devices = Device.objects \
            .annotate(most_recent_vlan_update=Subquery(most_recent_vlan_update)) \
            .values("name", "most_recent_vlan_update")
>>> pprint(list(devices)[::20])
[{'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'ams-edge-01'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'bkk-edge-01'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'cdg-edge-01'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'del-leaf-09'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'dfw-leaf-07'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'fra-leaf-07'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'hnd-leaf-07'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'jfk-leaf-06'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'lhr-leaf-04'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'ord-leaf-07'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'sin-leaf-01'}]

The idea here is to add to each device a new field that would hold the name of the most recently updated VLAN.

But this doesn’t quite work as it should. Each device gets assigned the same VLAN. That is, the subquery returns the most recently updated VLAN across all of the sites, and this is what is used in the annotation.

Is there a way of matching the site of the device with the site used in the subquery?

Yes, there is!

Subqueries and OuterRef

To fix our query we need to use OuterRef5 expression.

OuterRef acts like an F object, except it works across the query-subquery boundary. With OuterRef, you can refer (inside the subquery) to the field from the outer query.

Let’s modify our subquery so that only VLANs assigned to the same site as the device are taken into account in the annotation.

from django.db.models import OuterRef

most_recent_vlan_update = VLAN.objects \
                            .filter(site=OuterRef("site")) \
                            .order_by("-last_updated") \
                            .values("name")[:1]

devices = Device.objects \
            .annotate(most_recent_vlan_update=Subquery(most_recent_vlan_update)) \
            .values("name", "most_recent_vlan_update")
>>> pprint(list(devices)[::20])
[{'most_recent_vlan_update': 'ams-108-mgmt', 'name': 'ams-edge-01'},
 {'most_recent_vlan_update': 'bkk-108-mgmt', 'name': 'bkk-edge-01'},
 {'most_recent_vlan_update': 'cdg-108-mgmt', 'name': 'cdg-edge-01'},
 {'most_recent_vlan_update': 'del-110-mgmt', 'name': 'del-leaf-09'},
 {'most_recent_vlan_update': 'dfw-108-mgmt', 'name': 'dfw-leaf-07'},
 {'most_recent_vlan_update': 'fra-108-mgmt', 'name': 'fra-leaf-07'},
 {'most_recent_vlan_update': 'hnd-108-mgmt', 'name': 'hnd-leaf-07'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'jfk-leaf-06'},
 {'most_recent_vlan_update': 'lhr-108-mgmt', 'name': 'lhr-leaf-04'},
 {'most_recent_vlan_update': 'ord-108-mgmt', 'name': 'ord-leaf-07'},
 {'most_recent_vlan_update': 'sin-108-mgmt', 'name': 'sin-leaf-01'}]

Much better. Now each device is assigned the most recently updated VLAN with a site matching the site of the device.

References

  1. Django docs – QuerySet – values() 
  2. Django docs – QuerySet – order_by() 
  3. Django docs – Query Expressions – F() expressions 
  4. Django docs – Query Expressions – Subquery() expressions 
  5. Django docs – Query Expressions – Referencing columns from the outer queryset 

Conclusion

In this post, we learned how the values clause changes the way Django QuerySet annotations work. We also learned about subqueries and how they can help us build more advanced annotation. In the next part of this series, we will look at more examples of using subqueries with annotations. Stay tuned!

-Przemek



ntc img
ntc img

Contact Us to Learn More

Share details about yourself & someone from our team will reach out to you ASAP!

Author