
[dev] API Query Profiling / Improvements #11002

@SchrodingersGat

Description


Overview

While investigating various profiling options for #10996, I have found a number of places where we need to seriously consider the (in)efficiency of our API.

Many of our API endpoints (some more than others) need a lot of optimization to reduce the total query response time. I have identified a number of shortcomings in our current design. Some will be easy(ish) to fix; others may require larger architectural changes.

Related Issues

Related Pull Requests

Examples

Part API

Consider the following API request: http://localhost:8000/api/part/?format=json&limit=500

This has a round-trip time of ~900ms (which increases further at higher limit values).

The biggest culprit here is our immense queryset aggregation:

(screenshot of the Part queryset annotation code)

This expands to the SQL in the attached file: part_query.sql

There are a number of other identified bottlenecks in this endpoint (and others).

Problem Scope

There are a number of things we need to address:

Complex Annotations

We have many database queries with complex annotations - these exist to avoid the N+1 problem, which has historically been the cause of extremely large query times (> 30 seconds).

However, the annotations (especially those related to "order requirements" on the Part model) are now so complex that simply executing the generated SQL accounts for the largest part of the API request time.

We need to address this somehow - although I'm not sure how we can do so without either:

  • Re-introducing N+1 issues
  • Losing existing functionality in the API
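One direction worth exploring - sketched below as a minimal illustration only, not the actual InvenTree code (the `order_detail` parameter and related names like `builds` are hypothetical) - is making the heaviest annotations opt-in, so that clients which do not need the "order requirements" data never pay for that SQL:

```python
from django.db.models import Count, Q, QuerySet

from part.models import Part  # existing model


def annotate_order_requirements(queryset: QuerySet) -> QuerySet:
    """Apply the expensive 'order requirements' aggregations.

    The related names and filters here are illustrative only.
    """
    return queryset.annotate(
        building=Count('builds', filter=Q(builds__active=True)),
        ordering=Count('supplier_parts', filter=Q(supplier_parts__active=True)),
    )


def part_list_queryset(query_params: dict) -> QuerySet:
    queryset = Part.objects.all()

    # Only pay the aggregation cost when the client explicitly asks for it
    if str(query_params.get('order_detail', '')).lower() in ('1', 'true'):
        queryset = annotate_order_requirements(queryset)

    return queryset
```

The serializer would then expose the annotated fields only when present, keeping the default list endpoint fast without actually removing functionality from the API.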

Annotate Before Filter

There are a number of places in the code where we annotate before filtering - which means the database calls are slow because the annotations are computed across all rows in the table, even rows which will later be removed by the filter.
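To illustrate the pattern (field and related names here are assumed, not the real schema), the fix is usually just reordering the queryset composition:

```python
from django.db.models import Count

from part.models import Part

# Slow pattern: the annotation is composed before the filter,
# so it is evaluated for rows which the filter then throws away
parts = Part.objects.annotate(
    stock_count=Count('stock_items'),
).filter(category__name='Resistors')

# Faster pattern: filter first, so the aggregation only touches
# the rows which will actually be returned
parts = Part.objects.filter(
    category__name='Resistors',
).annotate(
    stock_count=Count('stock_items'),
)
```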

Table Indexes

There are a number of places where we should add custom indexes to the tables to aid lookups. We do not currently define any custom indexes, even though there are many cross-table references that are used very frequently.
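As a sketch (the model and field names below are illustrative, not the real InvenTree definitions), Django makes this straightforward via Meta.indexes, with the index created by a normal migration:

```python
from django.db import models


class StockItem(models.Model):
    """Illustrative model only - not the real InvenTree definition."""

    part = models.ForeignKey('part.Part', on_delete=models.CASCADE)
    status = models.PositiveIntegerField(default=10)
    quantity = models.DecimalField(max_digits=15, decimal_places=5, default=1)

    class Meta:
        indexes = [
            # Composite index for the very common "stock for a part,
            # filtered by status" lookup (single-column FK indexes
            # already exist by default)
            models.Index(fields=['part', 'status']),
        ]
```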

taggit issues

Ref: #10996

Currently, the taggit library introduces overhead on any API query that references a model with a tags field - even if the serialized data does not include the tags field.
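One possible mitigation (a sketch only, following the "dynamic fields" pattern from the DRF docs; the `include_tags` keyword is hypothetical) is to drop the field entirely unless the client asks for it, so the tag machinery is never touched on the default code path:

```python
from rest_framework import serializers


class TaggedModelSerializer(serializers.ModelSerializer):
    """Base class to mix into a concrete ModelSerializer (with a Meta class)
    for models which carry a 'tags' field."""

    def __init__(self, *args, **kwargs):
        include_tags = kwargs.pop('include_tags', False)
        super().__init__(*args, **kwargs)

        if not include_tags:
            # Removing the field up-front means no tag-related queries
            # are issued for this serialization pass
            self.fields.pop('tags', None)
```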

Search Function

The search functionality becomes noticeably slow as the database grows to any significant size. This needs to be investigated - perhaps better database indexing would help here?
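On PostgreSQL specifically, one option worth investigating (sketch only - it requires the pg_trgm extension and does not apply to other backends; the fields shown are illustrative) is a trigram index on the searched columns, which can accelerate icontains-style lookups:

```python
from django.contrib.postgres.indexes import GinIndex
from django.db import models


class Part(models.Model):
    """Illustrative fields only."""

    name = models.CharField(max_length=100)
    description = models.CharField(max_length=250)

    class Meta:
        indexes = [
            # Speeds up `name__icontains` searches on PostgreSQL; the
            # pg_trgm extension must be enabled first (e.g. via the
            # TrigramExtension migration operation)
            GinIndex(
                name='part_name_trigram',
                fields=['name'],
                opclasses=['gin_trgm_ops'],
            ),
        ]
```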

Table Caching

There are a number of data annotations which are computed very often, even though the underlying data changes infrequently. The "requirements" for a given part, for instance, could be computed whenever the underlying data changes, and then pulled directly from a table with a foreign-key link back to the part instance.

For example:

  • number required for build orders / sales orders
  • number allocated to build orders / sales orders
  • total number in stock

We already cache the pricing data, and recalculate it when needed.

Caching the "requirements" data in a separate table would represent a huge speedup, and would allow us to simplify the queryset annotations significantly.
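A sketch of what such a cache table could look like (the schema below is entirely hypothetical, mirroring how pricing is already cached):

```python
from django.db import models


class PartRequirements(models.Model):
    """Denormalized per-part requirement totals, recalculated whenever
    the underlying build / sales / stock data changes."""

    part = models.OneToOneField(
        'part.Part',
        on_delete=models.CASCADE,
        related_name='requirements',
    )

    required_build_orders = models.DecimalField(max_digits=19, decimal_places=6, default=0)
    required_sales_orders = models.DecimalField(max_digits=19, decimal_places=6, default=0)
    allocated_build_orders = models.DecimalField(max_digits=19, decimal_places=6, default=0)
    allocated_sales_orders = models.DecimalField(max_digits=19, decimal_places=6, default=0)
    in_stock = models.DecimalField(max_digits=19, decimal_places=6, default=0)

    updated = models.DateTimeField(auto_now=True)
```

The API queryset would then reduce to a simple select_related('requirements'), with the heavy recalculation pushed into a background task triggered when the source data changes.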

Approach

I am hoping that there is someone with more advanced Django / database knowledge who can assist with this, as it is pushing the boundaries of my optimization knowledge!

I am hoping to break these down into achievable tasks, so that we can gradually work towards a much speedier API. We should also look at adding some more advanced regression tests to CI, to ensure that we do not introduce new performance bottlenecks to the API in the future.
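For example (a minimal sketch - the URL works against our API, but the query budget of 10 is illustrative), Django's assertNumQueries can act as a cheap performance regression gate in CI:

```python
from django.test import TestCase


class PartAPIQueryCountTest(TestCase):
    """Fail CI if the part list endpoint starts issuing extra queries."""

    def test_part_list_query_count(self):
        # The budget (10) is illustrative - it should be set to the
        # current known-good query count for the endpoint
        with self.assertNumQueries(10):
            response = self.client.get('/api/part/', {'limit': 100})
            self.assertEqual(response.status_code, 200)
```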
