Nautobot and Django QuerySet Annotations – Part 1

In this post, we’re going to have a look at Django QuerySet annotations. We’ll answer questions about what they are and why you would want to use them. Following that, we’ll go through a few examples illustrating different use cases.

Briefly on QuerySets

Before we get to annotations, let’s have a quick refresher on Django QuerySets.

QuerySets represent a collection of objects from a database. You get QuerySet by interacting with the Django model’s Manager. All Django models come with at least one Manager, called objects by default.

For example, Device.objects.all() returns a QuerySet containing all Device objects in the database.

Most QuerySet methods return new QuerySets. This allows chaining of filters and other operations, like annotations, on them.

At its most basic level, a QuerySet translates to SQL SELECT statements, with other operations, like filters and annotations, translated to other SQL statements accordingly.

It’s worth emphasizing that QuerySets are lazy, meaning the database query is only executed when needed1. This means you can apply filters and modify your query as much as you want and Django won’t run it until your code triggers evaluation. For example, when you iterate over the objects in QuerySet or call a list() on it.

In Nautobot you will have to care about QuerySets only when you write Plugins and Jobs or otherwise interact directly with the Nautobot models. If you’re just consuming data via GUI or APIs, you will not be interacting directly with QuerySets, as Nautobot will handle them for you.

What Are Annotations

Django annotations2 are a way of enriching the objects returned in QuerySets. That is, when you run queries against your models you can ask for new fields, whose values will be dynamically computed, to be added when evaluating the query. These fields will be accessible as if they were normal attributes of a model.

Why Use Annotations?

QuerySet annotations are very useful for computing attributes that can span relationships or record a result of aggregate functions.

One common use case is to count a number of related objects. For instance, say we want to retrieve sites from Nautobot, and for each of the sites we want to know how many devices are assigned to that site. This is a perfect fit for annotation utilizing the aggregate function Count.

One question you might ask is, why not do all of this in Python? After all, you can get the results of a query and process them in Python using functions, loops, and conditionals.

There are a few reasons why that’s not always a good idea:

  • With Python, the data needs to be loaded into the memory for processing. Annotations are translated behind the scenes into SQL queries, so in most cases, they will be much more performant than Python code. Databases are highly optimized for processing SQL and unless you’re doing a lot of subqueries, i.e., you have complex annotation, you are better off using annotations.
  • Many Django apps and constructs expect to receive QuerySet as their input. It’s not always trivial to convert from Python list/dictionary back to the QuerySet. By using annotations you stay in the world of QuerySets and can use them wherever QuerySets are expected as an input.

Setup Examples

If you want to follow the examples in this post you can spin up a local instance of Nautobot by following instructions in the following repository: Nautobot Lab .

Once you have Nautobot up and running, simply launch nbshell with the following command:

docker exec -it nautobot nautobot-server nbshell

Then run the below code to modify some of the objects to better fit our examples.

status_active = Status.objects.get(name="Active")

for c in Circuit.objects.filter(provider__name="NTT")[:34]:
   c.status = status_active
   c.save()

for c in Circuit.objects.filter(provider__name="Telia Carrier")[:29]:
    c.status = status_active
    c.save()

Annotations with Count Aggregator Examples

Let’s now have a look at a simple annotation and its syntax.

Here we are asking for Provider records. For each provider, we’re adding the field circuit_count. That field will record a number of the circuits linked to each of the providers.

from django.db.models import Count

providers = Provider.objects.annotate(circuit_count=Count("circuits"))

Let’s break down this code.

  • We imported an aggregation function Count3. This will allow us to compute the number of related objects.
  • Then we use the annotate method on the Provider.objects Manager.
  • In the call to annotate we have a named argument circuit_count. This is the name of the field that will be added to each of the returned objects.
  • We use the Count aggregator function as the value of the argument. Count("circuits") means we want to get back a number of objects related to Provider through the circuits relationship.

And this is what we get back when this code is run.

<span role="button" tabindex="0" data-code=">>> type(providers) <class 'nautobot.utilities.querysets.RestrictedQuerySet'> >>> >>> from pprint import pprint >>> pprint(vars(providers.get(name="NTT"))) {'_custom_field_data': {}, '_state': <django.db.models.base.ModelState object at 0x7fccef3fc8e0>, 'account': '', 'admin_contact': '', 'asn': 2914, 'circuit_count': 40, 'comments': '', 'created': datetime.date(2021, 4, 16), 'id': UUID('ba6a30e7-c68f-477d-9a47-d766493fdba4'), 'last_updated': datetime.datetime(2021, 4, 16, 13, 32, 26, 186917, tzinfo=
>>> type(providers)
<class 'nautobot.utilities.querysets.RestrictedQuerySet'>
>>>
>>> from pprint import pprint
>>> pprint(vars(providers.get(name="NTT")))
{'_custom_field_data': {},
 '_state': <django.db.models.base.ModelState object at 0x7fccef3fc8e0>,
 'account': '',
 'admin_contact': '',
 'asn': 2914,
 'circuit_count': 40,
 'comments': '',
 'created': datetime.date(2021, 4, 16),
 'id': UUID('ba6a30e7-c68f-477d-9a47-d766493fdba4'),
 'last_updated': datetime.datetime(2021, 4, 16, 13, 32, 26, 186917, tzinfo=<UTC>),
 'name': 'NTT',
 'noc_contact': '',
 'portal_url': '',
 'slug': 'ntt'}

>>> providers.get(name="NTT").circuit_count
40

As you can see, providers refers to a RestrictedQuerySet containing Provider objects. These objects have a new field called circuit_count, which can be accessed like any other field4. In the example above, we retrieved the provider named NTT and the value of its circuit_count field.

RestrictedQuerySet is a Nautobot-specific subclass of QuerySet that automatically limits the results of a query to only those objects that a given user has permission to access.

Python Equivalent Examples

If we wanted to do the same using Python we would do something like this:

providers = Provider.objects.values()

for provider in providers:
    provider["circuit_count"] = Circuit.objects.filter(provider=provider["id"]).count()
<span role="button" tabindex="0" data-code=">>> pprint(providers[6]) {'_custom_field_data': {}, 'account': '', 'admin_contact': '', 'asn': 2914, 'circuit_count': 40, 'comments': '', 'created': datetime.date(2021, 4, 16), 'id': UUID('ba6a30e7-c68f-477d-9a47-d766493fdba4'), 'last_updated': datetime.datetime(2021, 4, 16, 13, 32, 26, 186917, tzinfo=
>>> pprint(providers[6])
{'_custom_field_data': {},
 'account': '',
 'admin_contact': '',
 'asn': 2914,
 'circuit_count': 40,
 'comments': '',
 'created': datetime.date(2021, 4, 16),
 'id': UUID('ba6a30e7-c68f-477d-9a47-d766493fdba4'),
 'last_updated': datetime.datetime(2021, 4, 16, 13, 32, 26, 186917, tzinfo=<UTC>),
 'name': 'NTT',
 'noc_contact': '',
 'portal_url': '',
 'slug': 'ntt'}

You can probably see the problem with that code. We are forcing evaluation of the database query when iterating over the QuerySet. Additionally this code will make potentially thousands of SQL queries inside the for loop, which can drastically increase the time it takes for your code to run.

What is even worse, we can no longer use QuerySet filtering and other operations on the resulting data.

Back to Annotations Examples

Now that we established that annotations are a better choice for this kind of use case, let’s have a look at what more we can do.

Since the original query with the annotation returned a QuerySet, that means we can refer to the new field in filtering, ordering, or any other operations that QuerySet allows us to do.

So let’s order the results using the circuit_count field we just added.

providers = Provider.objects.annotate( \
  circuit_count=Count("circuits")) \
  .order_by("-circuit_count")
<span role="button" tabindex="0" data-code=">>> pprint(vars(providers[0])) {'_custom_field_data': {}, '_state': <django.db.models.base.ModelState object at 0x7fccef3fc820>, 'account': '', 'admin_contact': '', 'asn': 2914, 'circuit_count': 40, 'comments': '', 'created': datetime.date(2021, 4, 16), 'id': UUID('ba6a30e7-c68f-477d-9a47-d766493fdba4'), 'last_updated': datetime.datetime(2021, 4, 16, 13, 32, 26, 186917, tzinfo=
>>> pprint(vars(providers[0]))
{'_custom_field_data': {},
 '_state': <django.db.models.base.ModelState object at 0x7fccef3fc820>,
 'account': '',
 'admin_contact': '',
 'asn': 2914,
 'circuit_count': 40,
 'comments': '',
 'created': datetime.date(2021, 4, 16),
 'id': UUID('ba6a30e7-c68f-477d-9a47-d766493fdba4'),
 'last_updated': datetime.datetime(2021, 4, 16, 13, 32, 26, 186917, tzinfo=<UTC>),
 'name': 'NTT',
 'noc_contact': '',
 'portal_url': '',
 'slug': 'ntt'}
>>>
>>> [p.circuit_count for p in providers]
[40, 40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

We applied order_by() clause and passed it the name of the field created by annotations, circuit_count.

By adding - character in front of the field name, we ask for the items to be returned in descending order.

Next, we can ask for the query to return only selected fields and their values. We’ll do this by appending the values() method.

providers = Provider.objects.annotate( \
  circuit_count=Count("circuits")) \
  .order_by("-circuit_count") \
  .values("name", "circuit_count")
>>> pprint(list(providers))
[{'circuit_count': 40, 'name': 'NTT'},
 {'circuit_count': 40, 'name': 'Telia Carrier'},
 {'circuit_count': 0, 'name': 'Telecom Italia'},
 {'circuit_count': 0, 'name': 'AT&T'},
 {'circuit_count': 0, 'name': 'PCCW Global'},
 {'circuit_count': 0, 'name': 'Tata Communications'},
 {'circuit_count': 0, 'name': 'Verizon'},
 {'circuit_count': 0, 'name': 'Telxius'},
 {'circuit_count': 0, 'name': 'Deutsche Telekom'},
 {'circuit_count': 0, 'name': 'Cogent'},
 {'circuit_count': 0, 'name': 'Lumen Technologies'},
 {'circuit_count': 0, 'name': 'Liberty Global'},
 {'circuit_count': 0, 'name': 'Orange'},
 {'circuit_count': 0, 'name': 'Sprint'},
 {'circuit_count': 0, 'name': 'Zayo'},
 {'circuit_count': 0, 'name': 'GTT'}]

We asked here for name and circuit_count fields to be returned.

Field name came from the Provider model while the circuit_count is the dynamically computed field created by the annotation.

How about we take it one step further and do some filtering. We’ll narrow down the results to the providers that have at least one circuit.

providers = Provider.objects.annotate( \
    circuit_count=Count("circuits")) \
    .filter(circuit_count__gt=0) \
    .order_by("-circuit_count") \
    .values("name", "circuit_count")
<span role="button" tabindex="0" data-code=">>> providers
>>> providers
<RestrictedQuerySet [{'name': 'NTT', 'circuit_count': 40}, {'name': 'Telia Carrier', 'circuit_count': 40}]>

We added a call to the filter() method where we make use of the circuit_count annotation.

filter(circuit_count__gt=0) means that we want Provider objects with the value of newly computed field circuit_count being greater than 0.

But that’s not all — we can also do filtering in the annotation itself. For instance, we’ll count related circuits that are in a state other than Active.

from django.db.models import Q

providers = Provider.objects.annotate( \
    inactive_circuit_count= \
    Count("circuits", filter=~Q(circuits__status__name="Active"))) \
    .filter(inactive_circuit_count__gt=0) \
    .order_by("-inactive_circuit_count") \
    .values("name", "inactive_circuit_count")
<span role="button" tabindex="0" data-code=">>> providers
>>> providers
<RestrictedQuerySet [{'name': 'Telia Carrier', 'inactive_circuit_count': 11}, {'name': 'NTT', 'inactive_circuit_count': 6}]>
  • First, the Q object5 is imported to allow us to perform negation logic in the filter.
  • We passed an argument filter to the Count aggregator.
  • Filter value of ~Q(circuits__status__name="Active") tells Count to take into account only circuits whose status name is not equal to Active.

You can even create multiple annotations for the same field with different filters.

providers = Provider.objects.annotate( \
    c_total_count=Count("circuits"), \
    c_active_count= \
    Count("circuits", filter=Q(circuits__status__name="Active")), \
    c_inactive_count= \
    Count("circuits", filter=~Q(circuits__status__name="Active"))) \
    .filter(c_total_count__gt=0) \
    .values("name", "c_total_count", "c_active_count", "c_inactive_count")
>>> pprint(list(providers))
[{'c_active_count': 34,
  'c_inactive_count': 6,
  'c_total_count': 40,
  'name': 'NTT'},
 {'c_active_count': 29,
  'c_inactive_count': 11,
  'c_total_count': 40,
  'name': 'Telia Carrier'}]

Here we created three annotations.

  • c_active_count counts number of circuits related to Provider that are in the Active state.
  • c_inactive_count counts number of circuits related to Provider that are in the state other than Active.
  • c_total_count counts all related circuits regardless of their status.

Examples of Annotations and Reverse Relationships

Now you might be wondering, how did I know that you need to use the name circuits in Count? After all, Provider model doesn’t have that field defined.

However, if you look at the Circuit model, you can see that it has a provider field defined which is a ForeignKey linking to the Provider model.

class Circuit(PrimaryModel, StatusModel):
    ...

    provider = models.ForeignKey(to="circuits.Provider", on_delete=models.PROTECT, related_name="circuits")

    ...

In that field definition, you have an argument named related_name with a value of circuits. This tells Django to create a reverse relationship from the Provider back to Circuit using the name specified by related_name, that is, circuits.

This is why in my annotations I referred to the circuits name.

Technically the name to traverse reverse relationship used in filters and annotations is specified in the argument called related_query_name. However, if that argument is not specified explicitly, but related_name has been defined, then related_query_name will default to the value of related_name argument.

If neither related_name or related_query_name are defined, Django will by default use the name of the model for the related_query_name.

It is also possible to prevent Django from creating a reverse relationship by setting related_name to +. In those cases, we will have to use subqueries to provide annotations for the related objects. We’ll have a look at how this is done in the next post of this series.

If you want to read more about related_name, refer to the Django docs6.

Examples of Annotations and Filtering

It’s worth mentioning that the order of filtering and annotations matters. Annotations are applied over the state of the query when the annotation is requested. If you do filtering before annotations then annotation will be computed only over the filtered subset. This is especially important when using Count and Avg aggregators, as it can skew the results7.


Conclusion

In this post, we learned about Django QuerySet annotations. They are a powerful tool that you can use to enrich the results of your queries. This becomes very useful when writing custom queries in Nautobot plugins and gives you a performance boost compared to using Python code for data manipulation.

I hope the few examples I have presented here already show you how much can be done with annotations and will provide some inspiration for your adventures in the world of Django and Nautobot.

In the next part of this series, we will look at more advanced use cases and we will continue the discussion of things to look out for when using annotations. Stay tuned!

-Przemek Rogala

References



ntc img
ntc img

Contact Us to Learn More

Share details about yourself & someone from our team will reach out to you ASAP!

Author