Currently the Digger orchestrator checks out the repo first in order to parse digger.yml and determine the impacted projects. This approach is problematic for two reasons:
- The orchestrator has to be granted a higher level of privilege than it otherwise would need. And if encrypted sensitive data is stored in the repo (e.g. with sops), then the orchestrator would also need decryption privileges, which would effectively make it "ring zero" in terms of trust level.
- Filesystem operations on the orchestrator (git pull) can lead to performance issues and make it harder to achieve an HA setup compared to a service that does not rely on the filesystem. In the case of heavy repos this also increases the memory footprint.
We could solve both problems by doing the initial parsing within the CI instead of the orchestrator:
- Orchestrator receives a request via webhook, but does not do any parsing
- Orchestrator triggers a "warmup job" that does parsing (with decryption if needed) and determines which projects are affected
- The warmup job reports its findings back to the orchestrator, e.g. "projects A, C, D and Z need to be planned"
- The orchestrator triggers 3 additional jobs to plan projects C, D and Z
- The warmup job continues with project A
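The split of work between the warmup job and the orchestrator can be sketched in Go. All type and function names below are hypothetical, not Digger's actual API: the warmup job posts a report of impacted projects, and the orchestrator dispatches fresh plan jobs for every project except the first, which the warmup job continues with itself.

```go
package main

import "fmt"

// WarmupReport is a hypothetical payload the warmup job sends back to
// the orchestrator after parsing digger.yml (and decrypting, if needed)
// inside the CI environment.
type WarmupReport struct {
	RepoFullName     string
	ImpactedProjects []string
}

// DispatchPlans splits the impacted projects: the warmup job continues
// with the first one, and the orchestrator triggers fresh jobs for the rest.
func DispatchPlans(r WarmupReport) (continuedByWarmup string, freshJobs []string) {
	if len(r.ImpactedProjects) == 0 {
		return "", nil
	}
	return r.ImpactedProjects[0], r.ImpactedProjects[1:]
}

func main() {
	report := WarmupReport{
		RepoFullName:     "acme/infra",
		ImpactedProjects: []string{"A", "C", "D", "Z"},
	}
	first, rest := DispatchPlans(report)
	fmt.Printf("warmup job continues with project %s\n", first)
	for _, p := range rest {
		// In a real implementation this would call the CI provider's API
		// (e.g. a workflow dispatch endpoint) to start a plan job.
		fmt.Printf("triggering plan job for project %s\n", p)
	}
}
```

With the example report above, the warmup job continues with project A and three additional jobs are triggered for C, D and Z, matching the flow described in the bullets.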
The continuation of the first job is actually optional - it only saves some init time. We could just as well trigger 4 fresh jobs, one per project. It can be argued that this approach is actually cleaner, because it makes failures easier to understand and debug: each job does exactly one thing.
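The "no continuation" variant is even simpler to sketch: every impacted project, including the one the warmup job parsed, gets its own fresh job. The function name and job-label format here are illustrative only.

```go
package main

import "fmt"

// dispatchAllFresh is a hypothetical sketch of the variant where the
// warmup job only parses and reports, and the orchestrator triggers one
// fresh plan job per impacted project.
func dispatchAllFresh(projects []string) []string {
	jobs := make([]string, 0, len(projects))
	for _, p := range projects {
		// Each job does exactly one thing: plan a single project.
		jobs = append(jobs, "plan:"+p)
	}
	return jobs
}

func main() {
	for _, j := range dispatchAllFresh([]string{"A", "C", "D", "Z"}) {
		fmt.Println(j)
	}
}
```

The trade-off versus the continuation variant is one extra job init (checkout, decryption) in exchange for uniform, single-purpose jobs.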