Nautobot and Django QuerySet Annotations – Part 1
In this post, we’re going to have a look at Django QuerySet annotations. We’ll answer questions about what they are and why you would want to use them. Following that, we’ll go through a few examples illustrating different use cases.
Briefly on QuerySets
Before we get to annotations, let’s have a quick refresher on Django QuerySets.
QuerySets represent a collection of objects from a database. You get QuerySet by interacting with the Django model’s Manager
. All Django models come with at least one Manager
, called objects
by default.
For example, Device.objects.all()
returns a QuerySet containing all Device objects in the database.
Most QuerySet methods return new QuerySets. This allows chaining of filters and other operations, like annotations, on them.
At its most basic level, a QuerySet translates to SQL SELECT statements, with other operations, like filters and annotations, translated to other SQL statements accordingly.
It’s worth emphasizing that QuerySets are lazy, meaning the database query is only executed when needed1. This means you can apply filters and modify your query as much as you want and Django won’t run it until your code triggers evaluation. For example, when you iterate over the objects in QuerySet or call a list()
on it.
In Nautobot you will have to care about QuerySets only when you write Plugins and Jobs or otherwise interact directly with the Nautobot models. If you’re just consuming data via GUI or APIs, you will not be interacting directly with QuerySets, as Nautobot will handle them for you.
What Are Annotations
Django annotations2 are a way of enriching the objects returned in QuerySets. That is, when you run queries against your models you can ask for new fields, whose values will be dynamically computed, to be added when evaluating the query. These fields will be accessible as if they were normal attributes of a model.
Why Use Annotations?
QuerySet annotations are very useful for computing attributes that can span relationships or record a result of aggregate functions.
One common use case is to count a number of related objects. For instance, say we want to retrieve sites from Nautobot, and for each of the sites we want to know how many devices are assigned to that site. This is a perfect fit for annotation utilizing the aggregate function Count
.
One question you might ask is, why not do all of this in Python? After all, you can get the results of a query and process them in Python using functions, loops, and conditionals.
There are a few reasons why that’s not always a good idea:
- With Python, the data needs to be loaded into the memory for processing. Annotations are translated behind the scenes into SQL queries, so in most cases, they will be much more performant than Python code. Databases are highly optimized for processing SQL and unless you’re doing a lot of subqueries, i.e., you have complex annotation, you are better off using annotations.
- Many Django apps and constructs expect to receive QuerySet as their input. It’s not always trivial to convert from Python list/dictionary back to the QuerySet. By using annotations you stay in the world of QuerySets and can use them wherever QuerySets are expected as an input.
Setup Examples
If you want to follow the examples in this post you can spin up a local instance of Nautobot by following instructions in the following repository: Nautobot Lab .
Once you have Nautobot up and running, simply launch nbshell
with the following command:
docker exec -it nautobot nautobot-server nbshell
Then run the below code to modify some of the objects to better fit our examples.
status_active = Status.objects.get(name="Active")
for c in Circuit.objects.filter(provider__name="NTT")[:34]:
c.status = status_active
c.save()
for c in Circuit.objects.filter(provider__name="Telia Carrier")[:29]:
c.status = status_active
c.save()
Annotations with Count
Aggregator Examples
Let’s now have a look at a simple annotation and its syntax.
Here we are asking for Provider
records. For each provider, we’re adding the field circuit_count
. That field will record a number of the circuits linked to each of the providers.
from django.db.models import Count
providers = Provider.objects.annotate(circuit_count=Count("circuits"))
Let’s break down this code.
- We imported an aggregation function
Count
3. This will allow us to compute the number of related objects. - Then we use the
annotate
method on theProvider.objects
Manager. - In the call to
annotate
we have a named argumentcircuit_count
. This is the name of the field that will be added to each of the returned objects. - We use the
Count
aggregator function as the value of the argument.Count("circuits")
means we want to get back a number of objects related toProvider
through thecircuits
relationship.
And this is what we get back when this code is run.
>>> type(providers)
<class 'nautobot.utilities.querysets.RestrictedQuerySet'>
>>>
>>> from pprint import pprint
>>> pprint(vars(providers.get(name="NTT")))
{'_custom_field_data': {},
'_state': <django.db.models.base.ModelState object at 0x7fccef3fc8e0>,
'account': '',
'admin_contact': '',
'asn': 2914,
'circuit_count': 40,
'comments': '',
'created': datetime.date(2021, 4, 16),
'id': UUID('ba6a30e7-c68f-477d-9a47-d766493fdba4'),
'last_updated': datetime.datetime(2021, 4, 16, 13, 32, 26, 186917, tzinfo=<UTC>),
'name': 'NTT',
'noc_contact': '',
'portal_url': '',
'slug': 'ntt'}
>>> providers.get(name="NTT").circuit_count
40
As you can see, providers
refers to a RestrictedQuerySet containing Provider
objects. These objects have a new field called circuit_count
, which can be accessed like any other field4. In the example above, we retrieved the provider named NTT
and the value of its circuit_count
field.
RestrictedQuerySet is a Nautobot-specific subclass of QuerySet that automatically limits the results of a query to only those objects that a given user has permission to access.
Python Equivalent Examples
If we wanted to do the same using Python we would do something like this:
providers = Provider.objects.values()
for provider in providers:
provider["circuit_count"] = Circuit.objects.filter(provider=provider["id"]).count()
>>> pprint(providers[6])
{'_custom_field_data': {},
'account': '',
'admin_contact': '',
'asn': 2914,
'circuit_count': 40,
'comments': '',
'created': datetime.date(2021, 4, 16),
'id': UUID('ba6a30e7-c68f-477d-9a47-d766493fdba4'),
'last_updated': datetime.datetime(2021, 4, 16, 13, 32, 26, 186917, tzinfo=<UTC>),
'name': 'NTT',
'noc_contact': '',
'portal_url': '',
'slug': 'ntt'}
You can probably see the problem with that code. We are forcing evaluation of the database query when iterating over the QuerySet. Additionally this code will make potentially thousands of SQL queries inside the for
loop, which can drastically increase the time it takes for your code to run.
What is even worse, we can no longer use QuerySet filtering and other operations on the resulting data.
Back to Annotations Examples
Now that we established that annotations are a better choice for this kind of use case, let’s have a look at what more we can do.
Since the original query with the annotation returned a QuerySet, that means we can refer to the new field in filtering, ordering, or any other operations that QuerySet allows us to do.
So let’s order the results using the circuit_count
field we just added.
providers = Provider.objects.annotate( \
circuit_count=Count("circuits")) \
.order_by("-circuit_count")
>>> pprint(vars(providers[0]))
{'_custom_field_data': {},
'_state': <django.db.models.base.ModelState object at 0x7fccef3fc820>,
'account': '',
'admin_contact': '',
'asn': 2914,
'circuit_count': 40,
'comments': '',
'created': datetime.date(2021, 4, 16),
'id': UUID('ba6a30e7-c68f-477d-9a47-d766493fdba4'),
'last_updated': datetime.datetime(2021, 4, 16, 13, 32, 26, 186917, tzinfo=<UTC>),
'name': 'NTT',
'noc_contact': '',
'portal_url': '',
'slug': 'ntt'}
>>>
>>> [p.circuit_count for p in providers]
[40, 40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
We applied order_by()
clause and passed it the name of the field created by annotations, circuit_count
.
By adding -
character in front of the field name, we ask for the items to be returned in descending order.
Next, we can ask for the query to return only selected fields and their values. We’ll do this by appending the values()
method.
providers = Provider.objects.annotate( \
circuit_count=Count("circuits")) \
.order_by("-circuit_count") \
.values("name", "circuit_count")
>>> pprint(list(providers))
[{'circuit_count': 40, 'name': 'NTT'},
{'circuit_count': 40, 'name': 'Telia Carrier'},
{'circuit_count': 0, 'name': 'Telecom Italia'},
{'circuit_count': 0, 'name': 'AT&T'},
{'circuit_count': 0, 'name': 'PCCW Global'},
{'circuit_count': 0, 'name': 'Tata Communications'},
{'circuit_count': 0, 'name': 'Verizon'},
{'circuit_count': 0, 'name': 'Telxius'},
{'circuit_count': 0, 'name': 'Deutsche Telekom'},
{'circuit_count': 0, 'name': 'Cogent'},
{'circuit_count': 0, 'name': 'Lumen Technologies'},
{'circuit_count': 0, 'name': 'Liberty Global'},
{'circuit_count': 0, 'name': 'Orange'},
{'circuit_count': 0, 'name': 'Sprint'},
{'circuit_count': 0, 'name': 'Zayo'},
{'circuit_count': 0, 'name': 'GTT'}]
We asked here for name
and circuit_count
fields to be returned.
Field name
came from the Provider
model while the circuit_count
is the dynamically computed field created by the annotation.
How about we take it one step further and do some filtering. We’ll narrow down the results to the providers that have at least one circuit.
providers = Provider.objects.annotate( \
circuit_count=Count("circuits")) \
.filter(circuit_count__gt=0) \
.order_by("-circuit_count") \
.values("name", "circuit_count")
>>> providers
<RestrictedQuerySet [{'name': 'NTT', 'circuit_count': 40}, {'name': 'Telia Carrier', 'circuit_count': 40}]>
We added a call to the filter()
method where we make use of the circuit_count
annotation.
filter(circuit_count__gt=0)
means that we want Provider
objects with the value of newly computed field circuit_count
being greater than 0.
But that’s not all — we can also do filtering in the annotation itself. For instance, we’ll count related circuits that are in a state other than Active
.
from django.db.models import Q
providers = Provider.objects.annotate( \
inactive_circuit_count= \
Count("circuits", filter=~Q(circuits__status__name="Active"))) \
.filter(inactive_circuit_count__gt=0) \
.order_by("-inactive_circuit_count") \
.values("name", "inactive_circuit_count")
>>> providers
<RestrictedQuerySet [{'name': 'Telia Carrier', 'inactive_circuit_count': 11}, {'name': 'NTT', 'inactive_circuit_count': 6}]>
- First, the
Q
object5 is imported to allow us to perform negation logic in the filter. - We passed an argument
filter
to theCount
aggregator. - Filter value of
~Q(circuits__status__name="Active")
tellsCount
to take into account only circuits whose status name is not equal toActive
.
You can even create multiple annotations for the same field with different filters.
providers = Provider.objects.annotate( \
c_total_count=Count("circuits"), \
c_active_count= \
Count("circuits", filter=Q(circuits__status__name="Active")), \
c_inactive_count= \
Count("circuits", filter=~Q(circuits__status__name="Active"))) \
.filter(c_total_count__gt=0) \
.values("name", "c_total_count", "c_active_count", "c_inactive_count")
>>> pprint(list(providers))
[{'c_active_count': 34,
'c_inactive_count': 6,
'c_total_count': 40,
'name': 'NTT'},
{'c_active_count': 29,
'c_inactive_count': 11,
'c_total_count': 40,
'name': 'Telia Carrier'}]
Here we created three annotations.
c_active_count
counts number of circuits related toProvider
that are in theActive
state.c_inactive_count
counts number of circuits related toProvider
that are in the state other thanActive
.c_total_count
counts all related circuits regardless of their status.
Examples of Annotations and Reverse Relationships
Now you might be wondering, how did I know that you need to use the name circuits
in Count
? After all, Provider
model doesn’t have that field defined.
However, if you look at the Circuit
model, you can see that it has a provider
field defined which is a ForeignKey
linking to the Provider
model.
class Circuit(PrimaryModel, StatusModel):
...
provider = models.ForeignKey(to="circuits.Provider", on_delete=models.PROTECT, related_name="circuits")
...
In that field definition, you have an argument named related_name
with a value of circuits
. This tells Django to create a reverse relationship from the Provider
back to Circuit
using the name specified by related_name
, that is, circuits
.
This is why in my annotations I referred to the circuits
name.
Technically the name to traverse reverse relationship used in filters and annotations is specified in the argument called related_query_name
. However, if that argument is not specified explicitly, but related_name
has been defined, then related_query_name
will default to the value of related_name
argument.
If neither related_name
or related_query_name
are defined, Django will by default use the name of the model for the related_query_name
.
It is also possible to prevent Django from creating a reverse relationship by setting related_name
to +
. In those cases, we will have to use subqueries to provide annotations for the related objects. We’ll have a look at how this is done in the next post of this series.
If you want to read more about related_name
, refer to the Django docs6.
Examples of Annotations and Filtering
It’s worth mentioning that the order of filtering and annotations matters. Annotations are applied over the state of the query when the annotation is requested. If you do filtering before annotations then annotation will be computed only over the filtered subset. This is especially important when using Count
and Avg
aggregators, as it can skew the results7.
Conclusion
In this post, we learned about Django QuerySet annotations. They are a powerful tool that you can use to enrich the results of your queries. This becomes very useful when writing custom queries in Nautobot plugins and gives you a performance boost compared to using Python code for data manipulation.
I hope the few examples I have presented here already show you how much can be done with annotations and will provide some inspiration for your adventures in the world of Django and Nautobot.
In the next part of this series, we will look at more advanced use cases and we will continue the discussion of things to look out for when using annotations. Stay tuned!
-Przemek Rogala
References
- Django docs – When QuerySets are evaluated ↩
- Django docs – Generating aggregates for each item in a QuerySet ↩
- Django docs –
Count
function ↩ - Django docs – field lookups ↩
- Django docs – Complex lookups with Q objects ↩
- Django docs – ForeignKey.related_name ↩
- Django docs – Order of annotate() and filter() clauses ↩
Tags :
Contact Us to Learn More
Share details about yourself & someone from our team will reach out to you ASAP!