Problem: MCPServer stalling, no task/transfer execution #1791
Description
Expected behaviour
Archivematica runs without requiring regular manual intervention (restarts).
Current behaviour
We have to manually restart the MCPServer in order to continue processing.
Steps to reproduce
We send transfers from a Django application using the AMClient.
Initially, we sent 100 parallel requests and had to restart the MCPServer every couple of hours. We then reduced this to 10 parallel requests, which allowed the system to run for several days before stalling again (after completing thousands of transfers).
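For context, our submission side looks roughly like the sketch below. `submit_transfer` is a hypothetical stand-in for the actual AMClient call our Django application makes; the only part taken from our setup is the throttle, which we reduced from 100 to 10 in-flight requests.

```python
import concurrent.futures

# Hypothetical stand-in for our AMClient call that submits one
# transfer to the Archivematica API; the real call is made from
# our Django application.
def submit_transfer(name: str) -> str:
    return f"submitted {name}"

MAX_PARALLEL = 10  # reduced from 100 to 10; stalls became less frequent

def submit_all(names):
    """Submit transfers with at most MAX_PARALLEL requests in flight."""
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        futures = [pool.submit(submit_transfer, n) for n in names]
        for future in concurrent.futures.as_completed(futures):
            results.append(future.result())
    return results
```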
The logs do not show any obvious exceptions or errors that would explain this state.
The last MCPServer log lines before it stalled:
INFO ThreadPoolExecutor-1_0 2026-03-27 18:23:10 client:run:106: Running Move to generate transfer tree (package 6a7c09c1-deba-473e-8b71-555ccfe49c62)
DEBUG ThreadPoolExecutor-1_0 2026-03-27 18:23:10 gearman_backend:update_task_results:261: Task fc7a6e6a-a5ff-482d-9b6b-b5919c0f4837 finished! Result COMPLETE - 0
DEBUG ThreadPoolExecutor-1_0 2026-03-27 18:23:10 gearman_backend:update_task_results:261: Task e66cb271-0f94-435e-97b8-ea307f5be23c finished! Result COMPLETE - 0
DEBUG ThreadPoolExecutor-1_0 2026-03-27 18:23:10 gearman_backend:update_task_results:261: Task 8acba6b2-803a-4674-86f5-5163872a7d87 finished! Result COMPLETE - 0
DEBUG ThreadPoolExecutor-1_0 2026-03-27 18:23:10 gearman_backend:update_task_results:261: Task 4a36ca71-ff21-473d-aaa7-bb4cfbbc820b finished! Result COMPLETE - 0
DEBUG ThreadPoolExecutor-1_0 2026-03-27 18:23:10 gearman_backend:update_task_results:261: Task 12eb2c75-640a-4e71-822c-fa1d34b3169e finished! Result COMPLETE - 0
DEBUG ThreadPoolExecutor-1_0 2026-03-27 18:23:10 gearman_backend:update_task_results:261: Task 249f486a-2340-44de-8a94-b6ab7be7d77c finished! Result COMPLETE - 0
DEBUG ThreadPoolExecutor-1_0 2026-03-27 18:23:10 gearman_backend:update_task_results:261: Task fb22de16-c5ab-464f-9efb-186ceccf5b63 finished! Result COMPLETE - 0
DEBUG ThreadPoolExecutor-1_0 2026-03-27 18:23:10 gearman_backend:update_task_results:261: Task 91553207-8ec9-46f1-9582-c8667204e64d finished! Result COMPLETE - 0
DEBUG ThreadPoolExecutor-1_0 2026-03-27 18:23:10 gearman_backend:update_task_results:261: Task d76af8ea-6302-42ee-8200-425911ba5401 finished! Result COMPLETE - 0
DEBUG ThreadPoolExecutor-1_0 2026-03-27 18:23:10 gearman_backend:submit:219: Submitted gearman job 44c54361-7203-4ccf-be36-83a3748e33d0 (movetransfer_v0.0)
DEBUG ThreadPoolExecutor-1_0 2026-03-27 18:23:10 gearman_backend:update_task_results:261: Task 21a81349-7a40-446b-8e3a-186b5ba1ea7f finished! Result COMPLETE - 0
DEBUG ThreadPoolExecutor-1_0 2026-03-27 18:23:10 chain:__next__:110: Done with chain 97be337c-ff27-4869-bf63-ef1abc9df15d for package 6a7c09c1-deba-473e-8b71-555ccfe49c62
After this point, no new tasks are spawned. Our application continues sending new transfers, but they are not processed. For example:
DEBUG RPCServer-2 2026-03-30 06:23:15 packages:create_package:349: Transfer object created: d4cd02fc-b8f3-4801-8384-09ca4f3c75dc
DEBUG RPCServer-2 2026-03-30 06:23:15 packages:create_package:354: Package d4cd02fc-b8f3-4801-8384-09ca4f3c75dc: starting transfer (('Archive_30447_Step_202105', 'standard', '<transfer-source-id>:/Archive_30447/', '/var/archivematica/sharedDirectory/tmp/tmpxuhf_70q'))
DEBUG ThreadPoolExecutor-1_0 2026-03-30 06:23:15 packages:_start_package_transfer_with_auto_approval:412: Package d4cd02fc-b8f3-4801-8384-09ca4f3c75dc: determined vars transfer_rel=tmp/tmpxuhf_70q/Archive_30447_Step_202105, filepath=/var/archivematica/sharedDirectory/tmp/tmpxuhf_70q/Archive_30447_Step_202105, path=<transfer-source-id>:/Archive_30447/.
DEBUG ThreadPoolExecutor-1_0 2026-03-30 06:23:15 packages:_start_package_transfer_with_auto_approval:420: Package d4cd02fc-b8f3-4801-8384-09ca4f3c75dc: copying chosen contents from transfer sources (from=<transfer-source-id>:/Archive_30447/., to=tmp/tmpxuhf_70q/Archive_30447_Step_202105)
DEBUG ThreadPoolExecutor-1_0 2026-03-30 06:23:15 storageService:get_location:193: Storage locations returned: [{'description': None, 'enabled': True, 'path': '/var/archivematica/sharedDirectory', 'pipeline': ['/api/v2/pipeline/<pipeline-id>/'], 'purpose': 'CP', 'quota': None, 'relative_path': 'var/archivematica/sharedDirectory/', 'resource_uri': '/api/v2/location/<currently-processing-id>/', 'space': '/api/v2/space/<space-id>/', 'used': 0, 'uuid': '<currently-processing-id>'}]
DEBUG ThreadPoolExecutor-1_0 2026-03-30 06:23:15 storageService:get_location:193: Storage locations returned: [{'description': 'Default transfer source', 'enabled': True, 'path': '/sip-path', 'pipeline': ['/api/v2/pipeline/<pipeline-id>/'], 'purpose': 'TS', 'quota': None, 'relative_path': '/sip-path', 'resource_uri': '/api/v2/location/<transfer-source-id>/', 'space': '/api/v2/space/<space-id>/', 'used': 0, 'uuid': '<transfer-source-id>'}]
DEBUG ThreadPoolExecutor-1_0 2026-03-30 06:23:15 packages:_copy_from_transfer_sources:247: source: Archive_30447/., destination: /var/archivematica/sharedDirectory/tmp/tmpxuhf_70q/Archive_30447_Step_202105/.
DEBUG ThreadPoolExecutor-1_0 2026-03-30 06:23:16 packages:_start_package_transfer_with_auto_approval:428: Package d4cd02fc-b8f3-4801-8384-09ca4f3c75dc: moving package to processing directory
DEBUG ThreadPoolExecutor-1_0 2026-03-30 06:23:16 packages:_start_package_transfer_with_auto_approval:433: Package d4cd02fc-b8f3-4801-8384-09ca4f3c75dc: starting workflow processing
DEBUG ThreadPoolExecutor-1_0 2026-03-30 06:23:16 chain:__init__:86: Creating JobChain 6953950b-c101-4f4c-a0c3-0cd0684afe5e for package d4cd02fc-b8f3-4801-8384-09ca4f3c75dc (initial link 045c43ae-d6cf-44f7-97d6-c8a602748565)
After this log line, the JobChain never actually executes any jobs. Instead, similar log blocks simply repeat for other incoming transfers.
In this example, we waited two days to see whether the system would recover on its own, but it did not. In earlier cases, processing sometimes resumed after 1–2 hours.
In the UI, many transfers and ingests remain incomplete, and no new tasks are spawned. The Gearman queue is empty. The stall often happens around the "Move to generate transfer tree" step.
While the MCPServer is in this stuck state, new transfers are lost: they never reach the Storage Service and are not recovered after a restart. Packages that were already in progress, however, do complete.
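After a restart, we identify the lost transfers by diffing the transfer UUIDs our application submitted against the package UUIDs the Storage Service actually knows about. A minimal sketch of that comparison; how the two UUID lists are fetched (from our Django records and from the Storage Service API) is left out and assumed to happen elsewhere:

```python
def find_lost_transfers(submitted_uuids, known_package_uuids):
    """Return submitted transfer UUIDs that never became a known package."""
    return sorted(set(submitted_uuids) - set(known_package_uuids))

# Example with illustrative placeholder UUIDs:
lost = find_lost_transfers(
    submitted_uuids=["uuid-a", "uuid-b", "uuid-c"],
    known_package_uuids=["uuid-b"],
)
# "uuid-a" and "uuid-c" were submitted while the MCPServer was stuck
```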
One other thing to note: we added a CronJob that runs every 6 hours to clean up transient processing data. This increased the time before a stall, but under continuous load the MCPServer eventually gets stuck anyway.
python -m archivematica.dashboard.manage purge_transient_processing_data --keep-failed
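The cleanup runs as an OpenShift CronJob wrapping the command above. A sketch of the spec, where the resource name and image are illustrative and would need to match your deployment:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: am-purge-transient-data        # illustrative name
spec:
  schedule: "0 */6 * * *"              # every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: purge
              image: archivematica-dashboard:latest   # your dashboard image
              command:
                - python
                - -m
                - archivematica.dashboard.manage
                - purge_transient_processing_data
                - --keep-failed
```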
Your environment (version of Archivematica, operating system, other relevant details)
Archivematica v1.18.0, Storage Service v0.24.0
We are using Archivematica on OpenShift with CephFS persistent storage for the processing directories. Could the CephFS storage be an issue? The transfer source is EOS (CERN storage) mounted on the pod.
The Docker images are the ones provided by Archivematica. For MCPClient, we added LibreOffice, pstoedit, and a newer version of GhostScript.
For Artefactual use:
Before you close this issue, you must check off the following:
- All pull requests related to this issue are properly linked
- All pull requests related to this issue have been merged
- A testing plan for this issue has been implemented and passed (testing plan information should be included in the issue body or comments)
- Documentation regarding this issue has been written and merged (if applicable)
- Details about this issue have been added to the release notes (if applicable)