Nautobot and Django QuerySet Annotations – Part 3

Blog Detail

This is Part 3 of the series on Django QuerySet annotations. In Part 1 we talked about what the annotations are and why you would want to use them. In Part 2 we looked at subqueries and more annotation examples.

In this post, we’ll learn how to use annotations spanning multiple tables, and we will go through more advanced use cases.

In the first part of this series we looked at examples of using Count aggregator with annotations to count the number of related objects. We took advantage of reverse relationships1 created by Django and used value of the related_name argument for accessing related objects.

However, not all related objects have a reverse relationship. When a model is defined, it’s possible to explicitly disable the reverse relationship by using the value + (plus sign) for the related_name. Or perhaps our models are indirectly related, i.e., there is no ForeignKey pointing directly from one model to another.

In these cases we have to use subqueries2 in our annotations to process related objects. The resulting queries tend to be a bit more complex; but once you get the hang of it, it all starts to make sense.

Let’s go back to the example where we counted number of circuits for each provider. We’ll try to build an equivalent query that doesn’t use the explicit relationship but uses a subquery instead. This subquery will leverage the fact that the Circuit model has ForeignKey pointing to the Provider model.

This is the query we used previously:

from django.db.models import Count

providers = Provider.objects.annotate(circuit_count=Count("circuits"))

Let’s modify it. First, we build the skeleton of the outer query:

from django.db.models import Subquery, OuterRef

Provider.objects.annotate(circuit_count=Subquery(...))

We keep the same annotation field name but instead of using the Count aggregator we insert a result returned by the subquery.

We now need to focus on building that subquery:

circuits = Circuit.objects \
                  .filter(provider=OuterRef("pk")) \
                  .values("provider") \
                  .order_by() \
                  .annotate(c=Count("provider")) \
                  .values("c")

That’s a bit more complex than using Count aggregator, but sometimes this is the only available option.

Before we combine the subquery and the outer query to get our final result, let’s have a closer look at all of the components of the subquerry to make sure we understand what they do.

  • filter(provider=OuterRef("pk")) — We want only Circuit objects with provider equal to the provider in the outer query. Field pk in OuterRef3 refers to the primary key of the Provider model.
  • values("provider") — Using values clause before annotation will trigger grouping behavior of the annotation as discussed in Part 2 of this series. We will count number of circuits for given provider, so we specify provider field here.
  • order_by() — We clear the default ordering to ensure grouping operation returns correct results.
  • annotate(c=Count("provider")) — Now we apply our annotation with the Count aggregator. New field, named c, will hold number of circuits for each provider group. We applied filtering at the beginning, so there will be only one such group.
  • values("c") — Finally, we specify that only one column is returned, which is the column added by the annotation.

Our subquery must return only one row and one column so that it can be used in the subsequent annotation. This is why we had to use values("c"); otherwise two columns would be returned, and we would get an error.

In this particular subquery only one row will be returned because of the filtering we applied. That won’t always be the case, so we will talk later about how to explicitly force subquery to return one row only.

We built the subquery and talked through how it works. We now need to use it in the main query to get circuit count for each of the providers:

providers = Provider.objects \
                    .annotate(circuit_count=Subquery(circuits)) \
                    .values("name", "circuit_count")

That’s it. This should have done the job. Let’s investigate the resulting values:

>>> pprint(list(providers))
[{'circuit_count': None, 'name': 'AT&T'},
 {'circuit_count': None, 'name': 'Cogent'},
 {'circuit_count': None, 'name': 'Deutsche Telekom'},
 {'circuit_count': None, 'name': 'GTT'},
 {'circuit_count': None, 'name': 'Liberty Global'},
 {'circuit_count': None, 'name': 'Lumen Technologies'},
 {'circuit_count': 40, 'name': 'NTT'},
 {'circuit_count': None, 'name': 'Orange'},
 {'circuit_count': None, 'name': 'PCCW Global'},
 {'circuit_count': None, 'name': 'Sprint'},
 {'circuit_count': None, 'name': 'Tata Communications'},
 {'circuit_count': None, 'name': 'Telecom Italia'},
 {'circuit_count': 40, 'name': 'Telia Carrier'},
 {'circuit_count': None, 'name': 'Telxius'},
 {'circuit_count': None, 'name': 'Verizon'},
 {'circuit_count': None, 'name': 'Zayo'}]

Something is not quite right here. Why is None showing up for some of the providers? When we used Count with reverse relationships, the value 0 was shown for the providers that don’t have circuits.

We get None here because when the subquery doesn’t return any results for given provider, a value of None will be used when annotating the Provider objects.

Fortunately, Django provides a function that can help us fix this. We’ll look at it now.

Coalesce Function

We saw that when subquery does not return anything, the value assigned to the annotation field is None. This is generally not what we want. If subquery doesn’t return anything, we would like to define a default value that is assigned to a field instead.

It so happens that Django provides a function that does exactly that. It’s a function named Coalesce4. Imagine it working like a dictionary get() method in Python, with default return value provided.

Coalesce accepts a list of at least two field names or expressions and returns the first non-null value. This means we could use multiple subqueries and wrap them in the Coalesce function. The first non-empty result from the ones returned by subqueries will be used. The caveat here is that the arguments must be of similar type. You can’t mix numbers with text, for example.

So now we know we have to use Coalesce function to fix our query. Let’s go ahead and feed it the subquery we created and an integer 0, which will be our default value.

from django.db.models.functions import Coalesce

providers = Provider.objects \
                    .annotate(
                        circuit_count=Coalesce(Subquery(circuits), 0)
                    ) \
                    .values("name", "circuit_count")

Let’s investigate the resulting values:

>>> pprint(list(providers))
[{'circuit_count': 0, 'name': 'AT&T'},
 {'circuit_count': 0, 'name': 'Cogent'},
 {'circuit_count': 0, 'name': 'Deutsche Telekom'},
 {'circuit_count': 0, 'name': 'GTT'},
 {'circuit_count': 0, 'name': 'Liberty Global'},
 {'circuit_count': 0, 'name': 'Lumen Technologies'},
 {'circuit_count': 40, 'name': 'NTT'},
 {'circuit_count': 0, 'name': 'Orange'},
 {'circuit_count': 0, 'name': 'PCCW Global'},
 {'circuit_count': 0, 'name': 'Sprint'},
 {'circuit_count': 0, 'name': 'Tata Communications'},
 {'circuit_count': 0, 'name': 'Telecom Italia'},
 {'circuit_count': 40, 'name': 'Telia Carrier'},
 {'circuit_count': 0, 'name': 'Telxius'},
 {'circuit_count': 0, 'name': 'Verizon'},
 {'circuit_count': 0, 'name': 'Zayo'}]

Great! We fixed our problem, and now providers that don’t have any circuits have the value of the field circuit_count set to 0.

Limiting Subquery Results to One Column and One Row

As I mentioned in the previous example, a subquery will often return more than one row and column, which cannot be used in some annotations.

Consider the below example of annotating Location objects with the name of the most recently added device within each location:

recently_added = Device.objects \
                       .filter(location=OuterRef("pk")) \
                       .order_by("-last_updated") \
                       .values("name")

locations = Location.objects \
                    .annotate(newest_device=Subquery(recently_added))
>>> print(locations)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  ... (cut for brevity)
django.db.utils.ProgrammingError: more than one row returned by a subquery used as an expression

We defined subquery recently_added, which returns devices located in a given location, ordered by the date they were last updated.

When we try to use this subquery in an annotation, we get an error. Django helpfully tells us that subquery returned too many rows. This happens because there could be multiple devices located in any given location, and each row contains device record.

To fix this, we use one element slice [:1] on the QuerySet returned by the subquery. Subquery expects to receive a QuerySet that produces one row, which is why we are using a slice. Note that we cannot use [0] index or first() function here, as these return a single instance of an object, or a dictionary when used with values.

recently_added = Device.objects \
                       .filter(location=OuterRef("pk")) \
                       .order_by("-last_updated") \
                       .values("name")[:1]

locations = Location.objects \
                    .annotate(newest_device=Subquery(recently_added)) \
                    .values("name", "newest_device")
>>> pprint(list(locations))
[{'name': 'Jersey City', 'newest_device': 'jcy-spine-02.infra.ntc.com'},
 {'name': 'New York City', 'newest_device': 'nyc-spine-02.infra.ntc.com'},
 {'name': 'ams', 'newest_device': 'ams-leaf-08'},
 {'name': 'atl', 'newest_device': 'atl-leaf-08'},
 {'name': 'bkk', 'newest_device': 'bkk-leaf-08'},
 {'name': 'can', 'newest_device': 'can-leaf-08'},
 {'name': 'cdg', 'newest_device': 'cdg-leaf-08'},
 {'name': 'del', 'newest_device': 'del-leaf-10'},
 {'name': 'den', 'newest_device': 'den-leaf-08'},
 {'name': 'dfw', 'newest_device': 'dfw-leaf-08'},
 {'name': 'dxb', 'newest_device': 'dxb-leaf-08'},
 {'name': 'fra', 'newest_device': 'fra-leaf-08'},
 {'name': 'hkg', 'newest_device': 'hkg-leaf-08'},
 {'name': 'hnd', 'newest_device': 'hnd-leaf-08'},
 {'name': 'icn', 'newest_device': 'icn-leaf-04'},
 {'name': 'jfk', 'newest_device': 'jfk-leaf-08'},
 {'name': 'lax', 'newest_device': 'lax-leaf-10'},
 {'name': 'lhr', 'newest_device': 'lhr-leaf-08'},
 {'name': 'ord', 'newest_device': 'ord-leaf-08'},
 {'name': 'pek', 'newest_device': 'pek-leaf-08'},
 {'name': 'pvg', 'newest_device': 'pvg-leaf-04'},
 {'name': 'sin', 'newest_device': 'sin-leaf-08'}]

And there you have it — our annotation is now working as expected.

Annotations Across Many Tables

A word of warning when using annotations across more than two tables: Due to the way Django translates queries with annotations to underlying SQL, using aggregator functions like Count or Avg across multiple tables will give incorrect results.

For instance, if we try annotating Location records with the count of related prefixes and the count of related devices, we get incorrect numbers:

>>> locations = Location.objects.annotate(pfx_count=Count("prefixes"))
>>> locations.get(name="lax").pfx_count
5
>>> locations = Location.objects.annotate(device_count=Count("devices"))
>>> locations.get(name="lax").device_count
12
>>> locations = Location.objects.annotate(pfx_count=Count("prefixes"), device_count=Count("devices"))
>>> locations.get(name="lax").device_count
60
>>> locations.get(name="lax").pfx_count
60

As you can see above, when using a single annotation with Count, we get correct results. But when we try to annotate Location objects with related Prefix and Device objects in a single query, we get back an incorrect value for both fields5.

Note that the in the case of Count, we can address this problem by providing argument distinct=True when invoking Count. However, this will not work for other aggregation functions.

>>> locations = Location.objects \
                        .annotate(
                            pfx_count=Count("prefixes", distinct=True),
                            device_count=Count("devices", distinct=True)
                        )
>>> locations.get(name="lax").device_count
12
>>> locations.get(name="lax").pfx_count
5

We will see how we can solve this problem for the generic case by using our trusty subqueries. In the example below, each of the subqueries will be run independently behind the scenes, and the results of each will be assigned to the given field.

from django.db.models import Subquery, OuterRef

pfx_count_sq = Location.objects \
                       .filter(pk=OuterRef("pk")) \
                       .annotate(c=Count("prefixes")) \
                       .values("c")

device_count_sq = Location.objects \
                          .filter(pk=OuterRef("pk")) \
                          .annotate(c=Count("devices")) \
                          .values("c")

locations = Location.objects \
                    .annotate(
                        pfx_count=Subquery(pfx_count_sq), \
                        device_count=Subquery(device_count_sq) \
                    )
>>> locations.get(name="lax").pfx_count
5
>>> locations.get(name="lax").device_count
12

First, we defined two subqueries, pfx_count_sq and device_count_sq. These could be included directly inside of the annotate() method but defining them separately makes the code easier to read.

In each of the subqueries, we used OuterRef class, which allows us to use the pk field from the outer Location object when Subquery is run.

Each subquery uses annotation with a Count aggregator. We use values("c") here to return just one column c, otherwise we could not use the result of the subquery in the outer query annotation.

You also need to ensure only one row is returned. In our case, we are guaranteed to have only one match. If your subquery returns more than one row, you’d have to use slice [:1] to get back only one item.

Lastly, we wrap subqueries in SubQuery class and plug them into the annotate() method in the main query to get the final result.

As you can see, the resulting values now match the numbers we got when using two separate Count annotations.

Note that using Count and other Django aggregation functions is usually more performant than using subqueries because of the underlying database optimizations. However, subqueries are generally preferable to expressing the equivalent logic in Python code. In case of complex subqueries, you should profile underlying database queries to ensure their performance is acceptable.

More Examples

We’ll now have a look at some more examples of annotations, using different Django functions.

Using Length Function

Let’s compute the length of each of the device names and sort the results in descending order.

from django.db.models import F
from django.db.models.functions import Length

devices = Device.objects \
                .annotate(name_len=Length(F("name"))) \
                .order_by("-name_len")
>>> pprint(list(devices)[:20])
[<Device: jcy-spine-02.infra.ntc.com>,
 <Device: nyc-spine-01.infra.ntc.com>,
 <Device: jcy-spine-01.infra.ntc.com>,
 <Device: nyc-spine-02.infra.ntc.com>,
 <Device: nyc-leaf-02.infra.ntc.com>,
 <Device: nyc-leaf-01.infra.ntc.com>,
 <Device: jcy-rtr-01.infra.ntc.com>,
 <Device: nyc-rtr-01.infra.ntc.com>,
 <Device: jcy-rtr-02.infra.ntc.com>,
 <Device: nyc-rtr-02.infra.ntc.com>,
 <Device: jcy-bb-01.infra.ntc.com>,
 <Device: nyc-bb-01.infra.ntc.com>,
 <Device: den-leaf-02>,
 <Device: dfw-leaf-05>,
 <Device: dxb-leaf-02>,
 <Device: dxb-leaf-05>,
 <Device: fra-leaf-04>,
 <Device: fra-leaf-08>,
 <Device: hkg-edge-01>,
 <Device: hkg-leaf-06>]

We made use of Length6 function and F7 object. F object is used to pass value of the device’s name field to the Length function. Length function will return number of characters in the passed argument.

Filtering Values of the Annotation Field

In the below example, we use Count in the annotation. Then, we filter the values in the annotation field vlans_count to get back only locations that have at least one VLAN assigned to them.

locations = Location.objects \
                    .annotate(vlans_count=Count("vlans")) \
                    .filter(vlans_count__gt=1) \
                    .values("name", "vlans_count")
>>> pprint(list(locations))
[{'name': 'ams', 'vlans_count': 16},
 {'name': 'atl', 'vlans_count': 16},
 {'name': 'bkk', 'vlans_count': 16},
 {'name': 'can', 'vlans_count': 16},
 {'name': 'cdg', 'vlans_count': 16},
 {'name': 'del', 'vlans_count': 20},
 {'name': 'den', 'vlans_count': 16},
 {'name': 'dfw', 'vlans_count': 16},
 {'name': 'dxb', 'vlans_count': 16},
 {'name': 'fra', 'vlans_count': 16},
 {'name': 'hkg', 'vlans_count': 16},
 {'name': 'hnd', 'vlans_count': 16},
 {'name': 'icn', 'vlans_count': 8},
 {'name': 'jfk', 'vlans_count': 16},
 {'name': 'lax', 'vlans_count': 20},
 {'name': 'lhr', 'vlans_count': 16},
 {'name': 'ord', 'vlans_count': 16},
 {'name': 'pek', 'vlans_count': 16},
 {'name': 'pvg', 'vlans_count': 8},
 {'name': 'sin', 'vlans_count': 16}]

Using Exists Subquery

We can use Exists8 subquery subclass to get back a boolean True if subquery returns results or False if it doesn’t. For instance, we might want to check whether a given location has devices that are assigned edge role.

from django.db.models import Exists, OuterRef

edge_dev_sq = Subquery(
                Device.objects \
                      .filter(
                          location=OuterRef("pk"),
                          device_role__slug="edge"
                      )
              )

locs_w_edge_dev = Location.objects \
                          .annotate(has_edge_devices=Exists(edge_dev_sq)) \
                          .values("name", "has_edge_devices")
>>> pprint(list(locs_w_edge_dev))
[{'has_edge_devices': False, 'name': 'Jersey City'},
 {'has_edge_devices': False, 'name': 'New York City'},
 {'has_edge_devices': True, 'name': 'ams'},
 {'has_edge_devices': True, 'name': 'atl'},
 {'has_edge_devices': True, 'name': 'bkk'},
 {'has_edge_devices': True, 'name': 'can'},
 {'has_edge_devices': True, 'name': 'cdg'},
 {'has_edge_devices': True, 'name': 'del'},
 {'has_edge_devices': True, 'name': 'den'},
 {'has_edge_devices': True, 'name': 'dfw'},
 {'has_edge_devices': True, 'name': 'dxb'},
 {'has_edge_devices': True, 'name': 'fra'},
 {'has_edge_devices': True, 'name': 'hkg'},
 {'has_edge_devices': True, 'name': 'hnd'},
 {'has_edge_devices': True, 'name': 'icn'},
 {'has_edge_devices': True, 'name': 'jfk'},
 {'has_edge_devices': True, 'name': 'lax'},
 {'has_edge_devices': True, 'name': 'lhr'},
 {'has_edge_devices': True, 'name': 'ord'},
 {'has_edge_devices': True, 'name': 'pek'},
 {'has_edge_devices': True, 'name': 'pvg'},
 {'has_edge_devices': True, 'name': 'sin'}]

Using Python Expressions in Annotations

You can also use Python expressions in your annotations. We’ve seen an example of that in Part 2, where we computed percentage of free interfaces on devices. Here is an example that shows how to annotate Device objects with the time elapsed since the object was created.

import datetime

today = datetime.date.today()

devices = Device.objects \
                .annotate(
                    created_delta=today - F("created")
                )
>>> devices[0].created_delta
datetime.timedelta(days=47)
>>> devices[0].created_delta.days
47

We use datetime.datetime.date.today() to get today’s date. Then, we use this value in the annotation to subtract the date the object was created from today’s date. The result is a datetime.timedelta object, which we can query for the number of days since the device was created.

Count distinct Argument

In our final example we will use Count’s distinct argument to compute number of distinct device roles used by devices in each location.

When we use distinct=True, we’re telling Count to compute number of unique values for the given argument. Here, we’re giving it devices__device_role, which will access device role for the devices in each of the locations.

locations = Location.objects \
                    .annotate(
                         devrole_count=Count(
                            "devices__device_role", distinct=True
                         )
                    ) \
                    .values("name", "devrole_count")
>>> pprint(list(locations))
[{'devrole_count': 3, 'name': 'Jersey City'},
 {'devrole_count': 4, 'name': 'New York City'},
 {'devrole_count': 2, 'name': 'ams'},
 {'devrole_count': 2, 'name': 'atl'},
 {'devrole_count': 2, 'name': 'bkk'},
 {'devrole_count': 2, 'name': 'can'},
 {'devrole_count': 2, 'name': 'cdg'},
 {'devrole_count': 2, 'name': 'del'},
 {'devrole_count': 2, 'name': 'den'},
 {'devrole_count': 2, 'name': 'dfw'},
 {'devrole_count': 2, 'name': 'dxb'},
 {'devrole_count': 2, 'name': 'fra'},
 {'devrole_count': 2, 'name': 'hkg'},
 {'devrole_count': 2, 'name': 'hnd'},
 {'devrole_count': 2, 'name': 'icn'},
 {'devrole_count': 2, 'name': 'jfk'},
 {'devrole_count': 2, 'name': 'lax'},
 {'devrole_count': 2, 'name': 'lhr'},
 {'devrole_count': 2, 'name': 'ord'},
 {'devrole_count': 2, 'name': 'pek'},
 {'devrole_count': 2, 'name': 'pvg'},
 {'devrole_count': 2, 'name': 'sin'}]

References

  1. Django docs – Related objects reference 
  2. Django docs – Query Expressions – SubQuery 
  3. Django docs – Query Expressions – OuterRef 
  4. Django docs – Database Functions – Coalesce 
  5. Django docs – Aggregation – Combining multiple aggregations 
  6. Django docs – Database Functions – Length 
  7. Django docs – Query Expressions – F 
  8. Django docs – Query Expressions – Exists 

Conclusion

This post concludes the series on Nautobot and Django QuerySet annotations. We learned about annotations and subqueries and their use cases. We explored different functions that can be used with annotations and worked through a number of practical examples which showcase their power.

With this acquired knowledge, you should be able to write more powerful and efficient queries.

I hope you enjoyed this series and are looking forward to writing some awesome Nautobot code!

-Przemek Rogala



ntc img
ntc img

Contact Us to Learn More

Share details about yourself & someone from our team will reach out to you ASAP!

Nautobot and Django QuerySet Annotations – Part 2

Blog Detail

This is Part 2 of the series on Django QuerySet annotations. In Part 1 we talked about what the annotations are and why you would want to use them. We then worked through a few examples showing simple use cases.

In this post we’ll look at the behavior of annotations when used with the values clause. We’ll then discuss subqueries and how these can be used with annotations.

UsingvaluesClause with Annotations

Normally, annotation adds one field per each object in the QuerySet. This behavior changes when values1 clause is used before the annotation is applied.

In this case, original results are grouped according to the combinations specified in the values clause. The annotation is then provided for each unique value group.

One of the use cases for this is counting number of members in each group. For instance, we could count how many devices are in each of the possible states (Active, Planned, etc.) for a given site.

When using values, we will usually have to clear the default ordering to stop it from interfering with the grouping operation. You do this by applying order_by()2 method without specifying any fields.

Order of applying values and annotations matters. If annotation goes first, it will be computed over all objects, and values will just return specified column or columns. We’ve seen this behavior in the examples in the previous post.

If what you want is to apply annotation to groups of values, you need to use values before using annotations.

Let’s see how this looks in action:

from django.db.models import Count

device_role_count = Device.objects.values("device_role__name") \
    .order_by() \
    .annotate(count=Count("device_role__name"))
>>> from pprint import pprint
>>> pprint(list(device_role_count))
[{'count': 4, 'device_role__name': 'spine'},
 {'count': 4, 'device_role__name': 'Router'},
 {'count': 158, 'device_role__name': 'leaf'},
 {'count': 2, 'device_role__name': 'Backbone'},
 {'count': 40, 'device_role__name': 'edge'}]
  • values("device_role__name") gets device role names for all of the devices.
  • order_by() clears the default ordering.
  • annotate(count=Count("device_role__name")) annotates the result with the count of the number of devices for each of the device role names.

Computing Ratios between Results ofCountAggregator

Another use case where annotations with values() are handy is when computing ratios between two or more counts.

For instance, you want to know the percentage of free interfaces (i.e., interfaces that are not connected) for each of your devices. To get that information you could use the below query.

from django.db.models import F, Q

devices = Device.objects.filter(interfaces__isnull=False) \
    .annotate(intf_total=Count("interfaces"), 
              notconn_intf=Count("interfaces", filter=Q(interfaces__cable=None)), 
              intf_free_perc=100 * F("notconn_intf") / F("intf_total")) \
    .values("name", "intf_total", "notconn_intf", "intf_free_perc") \
    .order_by("intf_free_perc")
>>> pprint(list(devices)[:5])
[{'intf_free_perc': 70,
  'intf_total': 10,
  'name': 'jcy-rtr-01.infra.ntc.com',
  'notconn_intf': 7},
 {'intf_free_perc': 70,
  'intf_total': 10,
  'name': 'jcy-rtr-02.infra.ntc.com',
  'notconn_intf': 7},
 {'intf_free_perc': 70,
  'intf_total': 10,
  'name': 'jcy-bb-01.infra.ntc.com',
  'notconn_intf': 7},
 {'intf_free_perc': 77,
  'intf_total': 62,
  'name': 'lax-edge-02',
  'notconn_intf': 48},
 {'intf_free_perc': 77,
  'intf_total': 62,
  'name': 'lax-edge-01',
  'notconn_intf': 48}]

There are a lot of moving parts here, so let’s break this query down.

  • filter(interfaces__isnull=False) – first we filter out devices that don’t have any interfaces.

Next we create annotations. The first two annotations are examples of counts that we’ve seen before.

  • intf_total=Count("interfaces") – gives us the total number of interfaces per device.
  • notconn_intf=Count("interfaces", filter=Q(interfaces__cable=None)) – returns count of the interfaces that are not connected to a cable.
  • intf_free_perc=100 * F("notconn_intf") / F("intf_total") – this is the interesting bit. We’re making use of F3 object, which allows us to refer to the value of another field of the model when running queries. Dynamically computed fields added by annotations can also be referred to here. We take advantage of that fact and compute the ratio between the fields we just computed, notconn_intf and intf_total. To get the value as a percentage, we multiply the result by 100.
  • values("name", "intf_total", "notconn_intf", "intf_free_perc") – in the final result, we’re only interested in the three annotation fields displayed next to the name of each of the devices.
  • order_by("intf_free_perc") – to top it off, we sort the results by the percentage of free interfaces per device in ascending order.

Even though this query looks complicated, it really is quite readable once you break it down into the individual components.

Subqueries

Before we move on, we need to briefly touch upon the concept of subqueries4.

Subqueries are queries with results that can be used inside of other queries, as well as in annotations.

You would use a subquery to express logic that would be difficult or impossible to express with a simple query.

Let’s have a look at an example so you can see how useful they can be.

Imagine that you want to find devices belonging to the site with the most recently updated VLAN. To do that, you first define a query to get the most recently updated VLAN. We return only one value, the site of the most recently updated VLAN.

most_recent_vlan_update = VLAN.objects.order_by("-last_updated").values("site")[:1]

We then explicitly wrap this query in a Subquery object and use it in the outer query.

from django.db.models import Subquery

devices = Device.objects.filter(site=Subquery(most_recent_vlan_update))

This query is run against Device objects. We filter these objects to only the ones with a site field value that matches the value returned by the subquery.

<span role="button" tabindex="0" data-code=">>> pprint(list(devices)) [<Device: jfk-edge-01>, <Device: jfk-edge-02>, <Device: jfk-leaf-01>, <Device: jfk-leaf-02>, <Device: jfk-leaf-03>, <Device: jfk-leaf-04>, <Device: jfk-leaf-05>, <Device: jfk-leaf-06>, <Device: jfk-leaf-07>,
>>> pprint(list(devices))
[<Device: jfk-edge-01>,
 <Device: jfk-edge-02>,
 <Device: jfk-leaf-01>,
 <Device: jfk-leaf-02>,
 <Device: jfk-leaf-03>,
 <Device: jfk-leaf-04>,
 <Device: jfk-leaf-05>,
 <Device: jfk-leaf-06>,
 <Device: jfk-leaf-07>,
 <Device: jfk-leaf-08>]

The end result is devices matching the site returned by the subquery.

There is another reason for using subqueries. In most cases, their use results in performance improvements as your application makes fewer calls to the database.

The above query, where we used a subquery, resulted in a single call to the database. Django internally translated that query to a single SQL expression.

Compare this to the below code, where we make two separate queries to the database. The first query evaluates instantly, and the result is assigned to the site variable. We then use its value in the second query.

site = VLAN.objects.order_by("-last_updated").first().site
devices = Device.objects.filter(site=site)

The efficiency gains can be huge, especially in the cases where we replace Python loops containing queries with optimized queries containing subqueries and annotations.

Having this short introduction to subqueries, let’s return to our discussion of annotations.

Using Subqueries in Annotations

We can use subqueries in annotations. This means we can take the result of the subquery and assign it to a dynamically generated field on the model in the outer query.

Let’s take the subquery we defined above, this time returning VLAN name, and use it in the annotation.

most_recent_vlan_update = VLAN.objects.order_by("-last_updated").values("name")[:1]

devices = Device.objects \
            .annotate(most_recent_vlan_update=Subquery(most_recent_vlan_update)) \
            .values("name", "most_recent_vlan_update")
>>> pprint(list(devices)[::20])
[{'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'ams-edge-01'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'bkk-edge-01'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'cdg-edge-01'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'del-leaf-09'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'dfw-leaf-07'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'fra-leaf-07'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'hnd-leaf-07'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'jfk-leaf-06'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'lhr-leaf-04'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'ord-leaf-07'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'sin-leaf-01'}]

The idea here is to add to each device a new field that would hold the name of the most recently updated VLAN.

But this doesn’t quite work as it should. Each device gets assigned the same VLAN. That is, the subquery returns the most recently updated VLAN across all of the sites, and this is what is used in the annotation.

Is there a way of matching the site of the device with the site used in the subquery?

Yes, there is!

Subqueries and OuterRef

To fix our query we need to use OuterRef5 expression.

OuterRef acts like an F object, except it works across the query-subquery boundary. With OuterRef, you can refer (inside the subquery) to the field from the outer query.

Let’s modify our subquery so that only VLANs assigned to the same site as the device are taken into account in the annotation.

from django.db.models import OuterRef

most_recent_vlan_update = VLAN.objects \
                            .filter(site=OuterRef("site")) \
                            .order_by("-last_updated") \
                            .values("name")[:1]

devices = Device.objects \
            .annotate(most_recent_vlan_update=Subquery(most_recent_vlan_update)) \
            .values("name", "most_recent_vlan_update")
>>> pprint(list(devices)[::20])
[{'most_recent_vlan_update': 'ams-108-mgmt', 'name': 'ams-edge-01'},
 {'most_recent_vlan_update': 'bkk-108-mgmt', 'name': 'bkk-edge-01'},
 {'most_recent_vlan_update': 'cdg-108-mgmt', 'name': 'cdg-edge-01'},
 {'most_recent_vlan_update': 'del-110-mgmt', 'name': 'del-leaf-09'},
 {'most_recent_vlan_update': 'dfw-108-mgmt', 'name': 'dfw-leaf-07'},
 {'most_recent_vlan_update': 'fra-108-mgmt', 'name': 'fra-leaf-07'},
 {'most_recent_vlan_update': 'hnd-108-mgmt', 'name': 'hnd-leaf-07'},
 {'most_recent_vlan_update': 'jfk-108-mgmt', 'name': 'jfk-leaf-06'},
 {'most_recent_vlan_update': 'lhr-108-mgmt', 'name': 'lhr-leaf-04'},
 {'most_recent_vlan_update': 'ord-108-mgmt', 'name': 'ord-leaf-07'},
 {'most_recent_vlan_update': 'sin-108-mgmt', 'name': 'sin-leaf-01'}]

Much better. Now each device is assigned the most recently updated VLAN with a site matching the site of the device.

References

  1. Django docs – QuerySet – values() 
  2. Django docs – QuerySet – order_by() 
  3. Django docs – Query Expressions – F() expressions 
  4. Django docs – Query Expressions – Subquery() expressions 
  5. Django docs – Query Expressions – Referencing columns from the outer queryset 

Conclusion

In this post, we learned how the values clause changes the way Django QuerySet annotations work. We also learned about subqueries and how they can help us build more advanced annotation. In the next part of this series, we will look at more examples of using subqueries with annotations. Stay tuned!

-Przemek



ntc img
ntc img

Contact Us to Learn More

Share details about yourself & someone from our team will reach out to you ASAP!

Developing Nautobot Plugins – Part 4

Blog Detail

This is part 4 of the tutorial series on writing Nautobot plugins. Nautobot plugins are a way to extend the base functionality of Nautobot. Plugins can extend the database schema, add custom pages, and even update existing pages within Nautobot; the possibilities are nearly endless. In this blog series, we are developing a plugin for modeling and managing DNS zone data within Nautobot. In the previous posts, we covered setting up a development environment (part 1), creating models, views and navigation (part 2), and creating forms and tables (part 3). In this post, we will create filters used in GUI views and API calls. We will also add a search panel to the GUI list views for our models.

For coding along with this blog post, please clone part 3 of nautobot-example-dns-manager and use that as a starting point.

The completed part 4 version of nautobot-example-dns-manager is also available as a reference.

Defining Filters

To implement filtering of the records used by our plugin we will create FilterSet classes for model classes. FilterSet classes provide a mechanism for searching through database records and returning only those that matched the constraints defined by the operator.

FilterSet classes used by the plugin are placed in the filters.py file. By the Nautobot convention, we name FilterSet classes by appending FilterSet to the name of the model class. For example, filter class for DnsZoneModel will be named DnsZoneModelFilterSet. Note that the internal machinery of Nautobot, including unit test helpers, expects the filter classes to follow this convention.

We will start with the FilterSet class for DnsZoneModel:

class DnsZoneModelFilterSet(NautobotFilterSet):
    """Filter for filtering DnsZoneModel objects."""

    q = SearchFilter(
        filter_predicates={
            "name": "icontains",
            "mname": "icontains",
            "rname": "icontains",
        },
    )
    ttl__gte = django_filters.NumberFilter(field_name="ttl", lookup_expr="gte")
    ttl__lte = django_filters.NumberFilter(field_name="ttl", lookup_expr="lte")

    class Meta:
        model = DnsZoneModel
        fields = "__all__"

Let’s break this code down.

We follow Nautobot’s best practices, defined here, and ask for filters to be generated automatically for all the fields defined on the model:

    class Meta:
        model = DnsZoneModel
        fields = "__all__"

Next, we create an additional filter named q. By convention, the q filter is used for free text search and is placed as the first filter on the list of filters in the GUI. To define this filter we use the SearchFilter helper class provided by Nautobot for this use case.

from nautobot.utilities.filters import SearchFilter

This class expects an argument named filter_predicates, which is a dictionary with keys being field names of the model that we want to be searched. Corresponding key values define the field lookup type that will be applied to the field values when searching. We will use an icontains lookup, which performs a case-insensitive way to see whether any field contains the search term.

Below is the completed code for the q field:

    q = SearchFilter(
        filter_predicates={
            "name": "icontains",
            "mname": "icontains",
            "rname": "icontains",
        },
    )

See Django field lookups docs for a full list of available filter lookups.

We will also add extra filters for ttl field, one for searching ttl equal to or greater than some value, and one for less than or equal to some value.

To do that we explicitly define two new filters, ttl__gte and ttl__lte. These will use the django_filters.NumberFilter type, which is for filtering numeric values. For each field, we need to define which underlying model attribute we are mapping to; in this case it is ttl. We also need to specify the lookup expression that will be applied to each of the filters. This is done by assigning the expression name to the lookup_expr argument. In our case, these expressions are gte and lte.

    ttl__gte = django_filters.NumberFilter(field_name="ttl", lookup_expr="gte")
    ttl__lte = django_filters.NumberFilter(field_name="ttl", lookup_expr="lte")

This completes the FilterForm for the DnsZoneModel:

class DnsZoneModelFilterSet(NautobotFilterSet):
    """Filter for filtering DnsZoneModel objects."""

    q = SearchFilter(
        filter_predicates={
            "name": "icontains",
            "mname": "icontains",
            "rname": "icontains",
        },
    )
    ttl__gte = django_filters.NumberFilter(field_name="ttl", lookup_expr="gte")
    ttl__lte = django_filters.NumberFilter(field_name="ttl", lookup_expr="lte")

    class Meta:
        model = DnsZoneModel
        fields = "__all__"

CNameRecordModel and ARecordModel link to DnsZoneModel via the zone attribute. For the filtering to work correctly for this field we need to make this lookup use the NaturalKeyOrPKMultipleChoiceFilter class. This lookup type needs a queryset argument to know which model instances it should be filtering against. In our case, it is DnsZoneModel; so we provide the queryset that returns all instances of this model.

We also define a label that tells the user this filter takes slug or id.

Finalized filter field for zone attribute:

zone = NaturalKeyOrPKMultipleChoiceFilter(
    queryset=DnsZoneModel.objects.all(),
    label="DNS Zone (slug or ID)",
)

The other fields and definitions replicate the code we wrote for DnsZoneModelFilterSet. Using that code we complete the ARecordModelFilterSet and CNameRecordModelFilterSet classes:

class ARecordModelFilterSet(NautobotFilterSet):
    """Filter for filtering ARecordModel objects."""

    q = SearchFilter(
        filter_predicates={
            "name": "icontains",
        },
    )
    zone = NaturalKeyOrPKMultipleChoiceFilter(
        queryset=DnsZoneModel.objects.all(),
        label="DNS Zone (slug or ID)",
    )
    ttl__gte = django_filters.NumberFilter(field_name="ttl", lookup_expr="gte")
    ttl__lte = django_filters.NumberFilter(field_name="ttl", lookup_expr="lte")

    class Meta:
        model = ARecordModel
        fields = "__all__"
class CNameRecordModelFilterSet(NautobotFilterSet):
    """Filter for filtering CNameRecordModel objects."""

    q = SearchFilter(
        filter_predicates={
            "name": "icontains",
            "value": "icontains",
        },
    )
    zone = NaturalKeyOrPKMultipleChoiceFilter(
        queryset=DnsZoneModel.objects.all(),
        label="DNS Zone (slug or ID)",
    )
    ttl__gte = django_filters.NumberFilter(field_name="ttl", lookup_expr="gte")
    ttl__lte = django_filters.NumberFilter(field_name="ttl", lookup_expr="lte")

    class Meta:
        model = CNameRecordModel
        fields = "__all__"

The final step needed for FilterSet to take effect is to point to the newly defined classes from the UIViewSet classes.

We do this by assigning each of the FilterSet classes to the filterset_class class attribute of the corresponding UIViewSet class.

For example, here we add FilterSet class to the DnsZoneModelUIViewSet view:

class DnsZoneModelUIViewSet(
    view_mixins.ObjectListViewMixin,
    view_mixins.ObjectDetailViewMixin,
    view_mixins.ObjectEditViewMixin,
    view_mixins.ObjectDestroyViewMixin,
    view_mixins.ObjectBulkDestroyViewMixin,
):
    queryset = DnsZoneModel.objects.all()
    table_class = DnsZoneModelTable
    form_class = DnsZoneModelForm
    serializer_class = serializers.DnsZoneModelSerializer
    filterset_class = DnsZoneModelFilterSet

Building Filter Forms

We have completed FilterSet classes, which define the filtering logic. To expose filtering in the GUI, we need to create FilterForm classes for each of our models.

Filter form describes how the form will appear in the GUI and will handle input validation before passing the values to the filtering logic.

By convention, we create FilterForm classes in the forms.py file. The class names should follow the <ModelClassName>FilterForm format. For example, DnsZoneModelFilterForm is the FilterForm class we will define for the DnsZoneModel model.

In the FilterForm class we define the model class the form is meant for. We then define each of the model fields we want to expose in the form. These fields will have to be assigned a form field type that matches the field type defined in the model.

For example, the name field, which is a models.CharField on the model, becomes forms.CharField in the form.

Selecting the correct form field class is important, as it will present the operator with the matching UI field. It will also define the validation logic applied before the entered value is passed to the filter sets.

Form field classes are listed in the Form Fields Django docs.

Let’s define the filter form class for DnsZoneModel, and then we’ll walk through the code.

class DnsZoneModelFilterForm(NautobotFilterForm):
    """Filtering/search form for `DnsZoneModelForm` objects."""

    model = DnsZoneModel

    q = forms.CharField(required=False, label="Search")
    name = forms.CharField(required=False)
    mname = forms.CharField(required=False, label="Primary server")
    rname = forms.EmailField(required=False, label="Admin email")
    refresh = forms.IntegerField(required=False, min_value=300, max_value=2147483647, label="Refresh timer")
    retry = forms.IntegerField(required=False, min_value=300, max_value=2147483647, label="Retry timer")
    expire = forms.IntegerField(required=False, min_value=300, max_value=2147483647, label="Expiry timer")
    ttl = forms.IntegerField(required=False, min_value=300, max_value=2147483647, label="Time to Live")
    ttl__gte = forms.IntegerField(required=False, label="TTL Greater/equal than")
    ttl__lte = forms.IntegerField(required=False, label="TTL Less/equal than")

Form field class initializers take arguments; some are shared across all types, and some are type specific.

We don’t want any fields to be required by default, so we’ll pass the argument required=False to the field initializers.

We will also define custom labels to replace auto-generated ones, which by default use the name of the model field. Labels are provided to the label argument.

For IntegerField fields, it’s a good idea to provide minimum and maximum allowed values if the model defines them. This will provide additional validation of the values at a UI layer. To do that, use min_value and max_value arguments in the IntegerField initializers.

Finally, notice that we included ttl__gte and ttl__lte fields, which match the custom filter fields defined earlier.

Form classes for ARecordModel and CNameRecordModel follow a similar pattern. The one difference is the zone field, which we want to be a multiple-choice field to allow an operator to select one or more DnsZoneModel instances to filter against.

This is done by defining the zone field to be of the DynamicModelMultipleChoiceField type provided by Nautobot in nautobot.utilities.forms. Choices presented in the GUI are provided by the queryset defined in the queryset argument. Here we want all of the DnsZoneModel instances to be available, so we use the DnsZoneModel.objects.all() queryset.

Additionally, we want the value of slug field to be used in the queries. By providing slug as the value to the to_field_name argument, we change the default (which is to use the model’s primary key). It’s important that the value of the field chosen for this purpose is unique for each instance of the model.

With zone form field defined, we complete FilterForm classes for ARecordModel and CNameRecordModel:

class ARecordModelFilterForm(NautobotFilterForm):
    """Filtering/search form for `ARecordModelForm` objects."""

    model = ARecordModel

    q = forms.CharField(required=False, label="Search")
    name = forms.CharField(required=False)
    zone = DynamicModelMultipleChoiceField(required=False, queryset=DnsZoneModel.objects.all(), to_field_name="slug")
    ttl = forms.IntegerField(required=False, min_value=300, max_value=2147483647, label="Time to Live")
    ttl__gte = forms.IntegerField(required=False, label="TTL Greater/equal than")
    ttl__lte = forms.IntegerField(required=False, label="TTL Less/equal than")
class CNameRecordModelFilterForm(NautobotFilterForm):
    """Filtering/search form for `CNameRecordModelForm` objects."""

    model = CNameRecordModel

    q = forms.CharField(required=False, label="Search")
    name = forms.CharField(required=False)
    zone = DynamicModelMultipleChoiceField(required=False, queryset=DnsZoneModel.objects.all(), to_field_name="slug")
    value = forms.CharField(required=False, label="Redirect FQDN")
    ttl = forms.IntegerField(required=False, min_value=300, max_value=2147483647, label="Time to Live")
    ttl__gte = forms.IntegerField(required=False, label="TTL Greater/equal than")
    ttl__lte = forms.IntegerField(required=False, label="TTL Less/equal than")

Once we have our FilterForm classes defined, we need to link them to the corresponding UIViewSet class.

We do it by assigning FilterSet class to the filterset_form_class class attribute of the corresponding UIViewSet class.

UIViewSet classes, including FilterSet and FilterForm references:

class ARecordModelUIViewSet(
    view_mixins.ObjectListViewMixin,
    view_mixins.ObjectDetailViewMixin,
    view_mixins.ObjectEditViewMixin,
    view_mixins.ObjectDestroyViewMixin,
    view_mixins.ObjectBulkDestroyViewMixin,
):
    queryset = ARecordModel.objects.all()
    table_class = ARecordModelTable
    form_class = ARecordModelForm
    serializer_class = serializers.ARecordModelSerializer
    filterset_class = ARecordModelFilterSet
    filterset_form_class = ARecordModelFilterForm
class CNameRecordModelUIViewSet(
    view_mixins.ObjectListViewMixin,
    view_mixins.ObjectDetailViewMixin,
    view_mixins.ObjectEditViewMixin,
    view_mixins.ObjectDestroyViewMixin,
    view_mixins.ObjectBulkDestroyViewMixin,
):
    queryset = CNameRecordModel.objects.all()
    table_class = CNameRecordModelTable
    form_class = CNameRecordModelForm
    serializer_class = serializers.CNameRecordModelSerializer
    filterset_class = CNameRecordModelFilterSet
    filterset_form_class = CNameRecordModelFilterForm
class DnsZoneModelUIViewSet(
    view_mixins.ObjectListViewMixin,
    view_mixins.ObjectDetailViewMixin,
    view_mixins.ObjectEditViewMixin,
    view_mixins.ObjectDestroyViewMixin,
    view_mixins.ObjectBulkDestroyViewMixin,
):
    queryset = DnsZoneModel.objects.all()
    table_class = DnsZoneModelTable
    form_class = DnsZoneModelForm
    serializer_class = serializers.DnsZoneModelSerializer
    filterset_class = DnsZoneModelFilterSet
    filterset_form_class = DnsZoneModelFilterForm

Filtering in GUI

With all the code in place, we start Nautobot in our local dev environment and navigate to the list views for models.

For each of the models, you should see a view similar to the ones below. Notice the search panel on the right-hand side with the fields we defined.

References


Conclusion
In this blog post, we learned how to build filtering logic for models defined in our plugin. We then exposed these filters using the search form displayed in the list view for each of the models. In the next installment of this series, we will learn how to add REST and GraphQL APIs to the plugin.
-Przemek


ntc img
ntc img

Contact Us to Learn More

Share details about yourself & someone from our team will reach out to you ASAP!