
[dev] API Query Profiling / Improvements #11002

@SchrodingersGat

Description


Overview

While investigating various profiling options for #10996, I have found a number of places where we need to seriously consider the (in)efficiency of our API.

Many of our API endpoints (some more than others) need a lot of optimization to reduce the total query response time. I have identified a number of shortcomings in our current design. Some will be easy(ish) to fix; others may require larger architectural changes.

Related Issues

Related Pull Requests

Examples

Part API

Consider the following API request: http://localhost:8000/api/part/?format=json&limit=500

This has a round-trip time of ~900ms (which increases further at higher limit values).

The biggest culprit here is our immense queryset aggregation:

(screenshot of the Part queryset annotation code)

This expands to the SQL in the attached file: part_query.sql

There are a number of other identified bottlenecks in this endpoint (and others).

Problem Scope

There are a number of things we need to address:

Complex Annotations

We have many database queries with complex annotations - these exist to avoid the N+1 problem, which has historically been the cause of extremely large query times (> 30 seconds).

However, the annotations (especially those related to "order requirements" on the Part model) are now so complex that simply executing the generated SQL accounts for the largest part of the API request time.

We need to address this somehow - although I'm not sure how we can do so without either:

  • Re-introducing N+1 issues
  • Losing existing functionality in the API
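One direction worth exploring - sketched below as a minimal illustration only, not the actual InvenTree code (the `order_detail` parameter and related names like `builds` are hypothetical) - is making the heaviest annotations opt-in, so that clients which do not need the "order requirements" data never pay for that SQL:

```python
from django.db.models import Count, Q, QuerySet

from part.models import Part  # existing model


def annotate_order_requirements(queryset: QuerySet) -> QuerySet:
    """Apply the expensive 'order requirements' aggregations.

    The related names and filters here are illustrative only.
    """
    return queryset.annotate(
        building=Count('builds', filter=Q(builds__active=True)),
        ordering=Count('supplier_parts', filter=Q(supplier_parts__active=True)),
    )


def part_list_queryset(query_params: dict) -> QuerySet:
    queryset = Part.objects.all()

    # Only pay the aggregation cost when the client explicitly asks for it
    if str(query_params.get('order_detail', '')).lower() in ('1', 'true'):
        queryset = annotate_order_requirements(queryset)

    return queryset
```

The serializer would then expose the annotated fields only when present, keeping the default list endpoint fast without actually removing functionality from the API.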

Annotate Before Filter

There are a number of places in the code where we annotate before filtering - which means the database calls are slow because the annotations are computed across all rows in the table, even rows which will later be removed by the filter.
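To illustrate the pattern (field and related names here are assumed, not the real schema), the fix is usually just reordering the queryset composition:

```python
from django.db.models import Count

from part.models import Part

# Slow pattern: the annotation is composed before the filter,
# so it is evaluated for rows which the filter then throws away
parts = Part.objects.annotate(
    stock_count=Count('stock_items'),
).filter(category__name='Resistors')

# Faster pattern: filter first, so the aggregation only touches
# the rows which will actually be returned
parts = Part.objects.filter(
    category__name='Resistors',
).annotate(
    stock_count=Count('stock_items'),
)
```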

Table Indexes

There are a number of places where we should add custom indexes to the tables to aid lookups. We do not currently define any custom indexes, even though there are many cross-table references that are used very frequently.
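As a sketch (the model and field names below are illustrative, not the real InvenTree definitions), Django makes this straightforward via Meta.indexes, with the index created by a normal migration:

```python
from django.db import models


class StockItem(models.Model):
    """Illustrative model only - not the real InvenTree definition."""

    part = models.ForeignKey('part.Part', on_delete=models.CASCADE)
    status = models.PositiveIntegerField(default=10)
    quantity = models.DecimalField(max_digits=15, decimal_places=5, default=1)

    class Meta:
        indexes = [
            # Composite index for the very common "stock for a part,
            # filtered by status" lookup (single-column FK indexes
            # already exist by default)
            models.Index(fields=['part', 'status']),
        ]
```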

taggit issues

Ref: #10996

Currently, the taggit library introduces overhead on any API query that references a model with a tags field - even if the serialized data does not include the tags field.
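One possible mitigation (a sketch only, following the "dynamic fields" pattern from the DRF docs; the `include_tags` keyword is hypothetical) is to drop the field entirely unless the client asks for it, so the tag machinery is never touched on the default code path:

```python
from rest_framework import serializers


class TaggedModelSerializer(serializers.ModelSerializer):
    """Base class to mix into a concrete ModelSerializer (with a Meta class)
    for models which carry a 'tags' field."""

    def __init__(self, *args, **kwargs):
        include_tags = kwargs.pop('include_tags', False)
        super().__init__(*args, **kwargs)

        if not include_tags:
            # Removing the field up-front means no tag-related queries
            # are issued for this serialization pass
            self.fields.pop('tags', None)
```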

Search Function

The search functionality becomes noticeably slow as the database grows to any significant size. This needs to be investigated - perhaps better database indexing would help here?
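On PostgreSQL specifically, one option worth investigating (sketch only - it requires the pg_trgm extension and does not apply to other backends; the fields shown are illustrative) is a trigram index on the searched columns, which can accelerate icontains-style lookups:

```python
from django.contrib.postgres.indexes import GinIndex
from django.db import models


class Part(models.Model):
    """Illustrative fields only."""

    name = models.CharField(max_length=100)
    description = models.CharField(max_length=250)

    class Meta:
        indexes = [
            # Speeds up `name__icontains` searches on PostgreSQL; the
            # pg_trgm extension must be enabled first (e.g. via the
            # TrigramExtension migration operation)
            GinIndex(
                name='part_name_trigram',
                fields=['name'],
                opclasses=['gin_trgm_ops'],
            ),
        ]
```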

Table Caching

There are a number of data annotations which are computed very often, even though the underlying data changes infrequently. The "requirements" for a given part, for instance, could be computed whenever the underlying data changes, and then pulled directly from a table with a foreign-key link back to the part instance.

For example:

  • number required for build orders / sales orders
  • number allocated to build orders / sales orders
  • total number in stock

We already cache the pricing data, and recalculate it when needed.

Caching the "requirements" data in a separate table would represent a huge speedup, and would allow us to simplify the queryset annotations significantly.
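A sketch of what such a cache table could look like (the schema below is entirely hypothetical, mirroring how pricing is already cached):

```python
from django.db import models


class PartRequirements(models.Model):
    """Denormalized per-part requirement totals, recalculated whenever
    the underlying build / sales / stock data changes."""

    part = models.OneToOneField(
        'part.Part',
        on_delete=models.CASCADE,
        related_name='requirements',
    )

    required_build_orders = models.DecimalField(max_digits=19, decimal_places=6, default=0)
    required_sales_orders = models.DecimalField(max_digits=19, decimal_places=6, default=0)
    allocated_build_orders = models.DecimalField(max_digits=19, decimal_places=6, default=0)
    allocated_sales_orders = models.DecimalField(max_digits=19, decimal_places=6, default=0)
    in_stock = models.DecimalField(max_digits=19, decimal_places=6, default=0)

    updated = models.DateTimeField(auto_now=True)
```

The API queryset would then reduce to a simple select_related('requirements'), with the heavy recalculation pushed into a background task triggered when the source data changes.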

Approach

I am hoping that there is someone with more advanced Django / database knowledge who can assist with this, as it is pushing the boundaries of my optimization knowledge!

I am hoping to break these down into achievable tasks, so that we can gradually work towards a much speedier API. We should also look at adding some more advanced regression tests to CI, to ensure that we do not introduce new performance bottlenecks to the API in the future.
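For example (a minimal sketch - the URL works against our API, but the query budget of 10 is illustrative), Django's assertNumQueries can act as a cheap performance regression gate in CI:

```python
from django.test import TestCase


class PartAPIQueryCountTest(TestCase):
    """Fail CI if the part list endpoint starts issuing extra queries."""

    def test_part_list_query_count(self):
        # The budget (10) is illustrative - it should be set to the
        # current known-good query count for the endpoint
        with self.assertNumQueries(10):
            response = self.client.get('/api/part/', {'limit': 100})
            self.assertEqual(response.status_code, 200)
```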
