Optimize Process list by batching #6831

BartChris · 2025-12-19T15:50:24Z

This pull request further optimizes the process list in Kitodo.Production. The changes address bottlenecks that were identified earlier (see #6649 (comment)) while preserving existing behavior.

The linked issue identified the following bottlenecks, all of which stem from executing SQL logic once per process in the list (100 times for a list at maximum size):

| Query (simplified) | Executions | Avg (ms) | Max (ms) | Total (s) | % of DB time |
|---|---|---|---|---|---|
| `tasks0_ ... WHERE process_id=?` | 100 | 0.45 | 0.95 | 0.045 | 42% |
| `t.processingStatus ... WITH RECURSIVE process_children` | 100 | 0.27 | 0.49 | 0.027 | 25% |
| `process0_.id ... parent_id=?` | 100 | 0.19 | 0.36 | 0.019 | 18% |
| `comments0_ ... JOIN user` | 3 | 0.59 | 0.81 | 0.0018 | 1.7% |
| `batches0_.process_id IN (...)` | 3 | 0.19 | 0.27 | 0.0006 | 0.5% |
| Other queries | ~10 | n/a | n/a | ~0.015 | ~13% |

The queries identified there can all be made more efficient by executing them only once for all processes, caching the result and reusing the cached result for the view.

The first optimization extends an idea introduced in #5360 (see especially #5360 (comment)). To calculate the progress recursively for all processes in the list (including parents), we rely on native SQL queries with recursive common table expressions (`WITH RECURSIVE`), which current versions of MySQL and MariaDB support. The changes here go one step further and calculate the progress for all processes in the list at once.
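As a rough illustration of the "all processes at once" idea, the sketch below pairs a recursive query with the Java code that folds its result rows into one map per listed process. It is not the code from this PR: the table names `process` and `task`, the integer type of `processingStatus`, the `:processIds` parameter, and the class `BatchedProgressSketch` are assumptions made for the example; only `process_children`, `parent_id`, `process_id` and `processingStatus` appear in the queries quoted above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one recursive query for every process on the page. */
final class BatchedProgressSketch {

    /** Assumed table and column names; adjust them to the real schema. */
    static final String PROGRESS_SQL = String.join("\n",
        "WITH RECURSIVE process_children (root_id, id) AS (",
        "    SELECT id, id FROM process WHERE id IN (:processIds)",
        "    UNION ALL",
        "    SELECT pc.root_id, p.id",
        "    FROM process p JOIN process_children pc ON p.parent_id = pc.id",
        ")",
        "SELECT pc.root_id, t.processingStatus, COUNT(*)",
        "FROM process_children pc JOIN task t ON t.process_id = pc.id",
        "GROUP BY pc.root_id, t.processingStatus");

    /**
     * Folds result rows of the shape (root_id, processingStatus, count)
     * into a map: listed process ID -> (status -> number of tasks).
     */
    static Map<Integer, Map<Integer, Long>> groupByRoot(List<Object[]> rows) {
        Map<Integer, Map<Integer, Long>> result = new HashMap<>();
        for (Object[] row : rows) {
            int rootId = ((Number) row[0]).intValue();
            int status = ((Number) row[1]).intValue();
            long count = ((Number) row[2]).longValue();
            result.computeIfAbsent(rootId, id -> new HashMap<>())
                  .merge(status, count, Long::sum);
        }
        return result;
    }
}
```

Carrying `root_id` through the recursion is what lets a single result set be split back into per-process progress values, including tasks of descendants at any depth.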

The second optimization targets the calculation of the titles of a process's open or in-work tasks, which are shown in a tooltip in the list. We can use plain HQL to retrieve this information for all processes at once and cache it for reuse in the view. The same is true for identifying all processes with children, which can also be done in one batch query.

The same general pattern has also been applied in another PR to optimize the user list (#6803): Calculate the values for all processes in the derived LazyBeanModel for this view and store them in a HashMap which serves as a cache, which is accessed by the view.
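The pattern described above could look roughly like the following sketch. `PageValueCache` and its method names are hypothetical and not taken from the PR; the `batchQuery` function stands in for whatever single HQL or SQL query returns the values for all process IDs on the page.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-page cache: filled once per page load, then read by the view. */
final class PageValueCache<V> {
    private final Map<Integer, V> valuesByProcessId = new HashMap<>();
    private int batchQueryCount = 0;

    /** Replaces the cached values using one batch lookup for all IDs of the page. */
    void load(List<Integer> processIds,
              Function<Collection<Integer>, Map<Integer, V>> batchQuery) {
        valuesByProcessId.clear();
        if (!processIds.isEmpty()) {
            valuesByProcessId.putAll(batchQuery.apply(processIds));
            batchQueryCount++;
        }
    }

    /** Called by the view for each row; never touches the database. */
    V get(int processId, V defaultValue) {
        return valuesByProcessId.getOrDefault(processId, defaultValue);
    }

    /** Number of batch lookups executed so far (useful for checking the effect). */
    int getBatchQueryCount() {
        return batchQueryCount;
    }
}
```

In this sketch the lazy model would call `load` once after fetching a page, so the number of queries no longer grows with the number of rows rendered.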

To assess whether this actually improves performance, maybe @solth or @henning-gerhardt can give it a try.

BartChris · 2025-12-22T12:59:11Z

Another optimization to look into in general: when filtering for tasks and their state, we join the task table, which is probably not strictly necessary.

When filtering by task name and state, the constructed query joins a potentially very large task table and usually looks like this:

SELECT process
FROM Process AS process
INNER JOIN process.tasks AS task
  WITH task.processingStatus = :queryObject
    AND task.title = :userFilter2
WHERE process.project.client.id = :sessionClientId
  AND process.id NOT IN (:id)
  AND process.id IN (:userFilter1query1)
  AND process.id IN (:userFilter1query2)
  AND (process.sortHelperStatus IS NULL OR process.sortHelperStatus != :completedState)
  AND process.project.id IN (:projectIDs)
ORDER BY process.id ASC

based on the logic defined here.

kitodo-production/Kitodo/src/main/java/org/kitodo/production/services/data/FilterField.java

Lines 45 to 48 in 75ed87a

    
    TASK_READY("tasks AS task WITH task.processingStatus = :queryObject AND task.title",
            "~.processingStatus = :queryObject AND ~.title", LikeSearch.NO,
            "tasks AS task WITH task.processingStatus = :queryObject AND task.id",
            "processingStatus = :queryObject AND id", TaskStatus.OPEN, null, -1),

I think for tasks we can use EXISTS queries as well, which are more efficient. We only want to answer the question of whether a process has tasks with those attributes or not, so the query could be something like this:

SELECT process
FROM Process AS process
WHERE process.project.client.id = :sessionClientId
  AND process.id NOT IN (:id)
  AND process.id IN (:userFilter1query1)
  AND process.id IN (:userFilter1query2)
  AND (process.sortHelperStatus IS NULL
       OR process.sortHelperStatus != :completedState)
  AND process.project.id IN (:projectIDs)
  AND EXISTS (
      SELECT 1
      FROM Task task
      WHERE task.process = process
        AND task.processingStatus = :queryObject
        AND task.title = :userFilter2
  )
ORDER BY process.id ASC
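To make the difference between the two query shapes concrete, here is a small in-memory analogy in Java. It is only an illustration under assumptions: the class `JoinVersusExistsSketch` and the map of task titles per process are made up for the example, and it says nothing about how the database actually plans either query. It shows that a join yields one row per matching task, while an existence check yields each process at most once and can stop at the first match.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** In-memory analogy only; hypothetical names, no database involved. */
final class JoinVersusExistsSketch {

    /** Join-like: one result per matching task, so a process ID may repeat. */
    static List<Integer> joinStyle(Map<Integer, List<String>> titlesByProcess, String title) {
        return titlesByProcess.entrySet().stream()
            .flatMap(e -> e.getValue().stream()
                .filter(title::equals)
                .map(t -> e.getKey()))
            .sorted()
            .collect(Collectors.toList());
    }

    /** EXISTS-like: each process appears at most once; anyMatch stops at the first hit. */
    static List<Integer> existsStyle(Map<Integer, List<String>> titlesByProcess, String title) {
        return titlesByProcess.entrySet().stream()
            .filter(e -> e.getValue().stream().anyMatch(title::equals))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }
}
```

For a process with two matching tasks, `joinStyle` returns its ID twice and `existsStyle` returns it once, which is the answer the filter actually needs.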

BartChris · 2025-12-29T17:52:38Z

Selecting or deselecting rows also triggers a lot of queries. The more processes are selected, the more queries are triggered. Maybe we can also cache the row data, which is retrieved anew (for all selected rows) whenever a row selection is made:

@Override
public Object getRowData() {
    Stopwatch stopwatch = new Stopwatch(this, "getRowData");
    List<Object> data = getWrappedData();
    if (isRowAvailable()) {
        return stopwatch.stop(data.get(getRowIndex()));
    } else {
        return stopwatch.stop(null);
    }
}
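One way the row data caching suggested above might be approached is a small memoizing helper keyed by row index. This is a sketch under assumptions, not part of the PR: `SelectionRowCache` and the `loader` callback are hypothetical, and the cache would have to be invalidated whenever the page, filter or sort order changes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical memoizing helper for row data, keyed by row index. */
final class SelectionRowCache<T> {
    private final Map<Integer, T> rowsByIndex = new HashMap<>();
    private int loadCount = 0;

    /** Returns the cached row, calling the loader only on the first request. */
    T get(int rowIndex, IntFunction<T> loader) {
        // containsKey is used so that a null row is cached as well
        if (!rowsByIndex.containsKey(rowIndex)) {
            rowsByIndex.put(rowIndex, loader.apply(rowIndex));
            loadCount++;
        }
        return rowsByIndex.get(rowIndex);
    }

    /** Must be called when the underlying page data changes. */
    void invalidate() {
        rowsByIndex.clear();
    }

    /** Number of times the loader was actually invoked. */
    int getLoadCount() {
        return loadCount;
    }
}
```

With such a helper, repeated selection events for already selected rows would hit the map instead of loading the same row again.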

BartChris force-pushed the process_list_batching branch 14 times, most recently from c686e82 to 34f2e4f Compare December 21, 2025 02:11

BartChris force-pushed the process_list_batching branch 2 times, most recently from f6728a2 to 6c133ff Compare December 23, 2025 10:18

BartChris added 2 commits December 23, 2025 12:20

Optimize task status calculation

91887c4

More optimizations

dfce9b3

BartChris force-pushed the process_list_batching branch from 6c133ff to d52fe8c Compare December 23, 2025 11:21

Delete old methods

105d7c5

BartChris force-pushed the process_list_batching branch from d52fe8c to 105d7c5 Compare December 23, 2025 11:31

BartChris mentioned this pull request Dec 23, 2025

[3.9] Optimize Process list by batching #6834

Draft
