Description
Is your feature request related to a problem? Please describe.
At the moment, Optimus works only as an orchestrator that transpiles basic job instructions into Airflow-parsable DAGs. It doesn't have anything to execute jobs on its own. Although Airflow is pretty stable and widely used, it is built as a generic job executor. I believe we can do a lot more if we have our own scheduling & execution engine, but for now we will start with just the execution engine.
Note: I am not suggesting deprecating the use of Airflow; this will work in tandem with it to enhance the existing workflow.
Describe the solution you'd like
I have already decided (long back) to call this `prime`, and it will be used as follows:
- User should be able to simply request a run on demand via `optimus job run <job-name>` and the job should start executing (a minimal sketch of this execution path follows this list).
- Optimus API will allow any external service to execute on-demand jobs instead of scheduling them via Airflow if needed.
- Users should be able to leverage on-demand execution for ad hoc operations like DML queries, one-time syncs, cleanups, etc.
- User should be able to run a complete pipeline of jobs end to end, i.e. when a job depends on another job, and so on.
- Running a whole pipeline will allow users to write tests for data pipelines which can execute in a different execution environment than production, similar to how we write integration tests for code.
- Running a pipeline will allow replay execution to be done on Optimus clusters.
- Optimus should work as a distributed cluster and, from the start, should be capable of scaling horizontally on demand.
- Optimus should be able to run as highly available (except the postgres db).
- No extra moving parts like Airflow has (Redis, Celery, etc.).
- Local development of Optimus will be really easy, as developers won't need anything other than Optimus and a postgres db (we should start supporting sqlite as well to even get rid of postgres).
Describe alternatives you've considered
There are no alternatives to this at the moment.
Additional context
We can break this into different milestones as follows:
v1
- user will be able to run a single job and get the final status
- a job can have multiple hooks, and Optimus should execute everything in that job in the correct order (see the sketch after this list)
- we will start by supporting the local Docker host as the execution engine
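A rough sketch of the v1 ordering, assuming hooks are split into pre and post phases around the main transformation. The `ContainerRunner`, `Hook`, and `JobSpec` names are made up for illustration and are not the actual Optimus types; a real implementation would back `ContainerRunner` with the Docker Engine API.

```go
package prime

import (
	"context"
	"fmt"
)

// ContainerRunner is a hypothetical wrapper around the local Docker host.
type ContainerRunner interface {
	RunContainer(ctx context.Context, image string, args []string) error
}

// Hook is a hypothetical pre/post step attached to a job.
type Hook struct {
	Name  string
	Image string
	Pre   bool // true = run before the transformation, false = after
}

// JobSpec is a hypothetical single job: one transformation plus its hooks.
type JobSpec struct {
	Name      string
	TaskImage string
	Hooks     []Hook
}

// ExecuteJob runs everything in a job in order: pre hooks, the
// transformation task, then post hooks, failing fast on the first error.
func ExecuteJob(ctx context.Context, r ContainerRunner, job JobSpec) error {
	for _, h := range job.Hooks {
		if h.Pre {
			if err := r.RunContainer(ctx, h.Image, nil); err != nil {
				return fmt.Errorf("pre hook %s: %w", h.Name, err)
			}
		}
	}
	if err := r.RunContainer(ctx, job.TaskImage, nil); err != nil {
		return fmt.Errorf("task %s: %w", job.Name, err)
	}
	for _, h := range job.Hooks {
		if !h.Pre {
			if err := r.RunContainer(ctx, h.Image, nil); err != nil {
				return fmt.Errorf("post hook %s: %w", h.Name, err)
			}
		}
	}
	return nil
}
```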
v2
- support for executing a set of jobs that are interdependent on each other (see the ordering sketch after this list)
- proper user commands to interact with the Optimus server to fetch things like execution logs
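For v2, runs of interdependent jobs have to be ordered by their dependencies; a standard topological sort (Kahn's algorithm) over the job graph is one way to do it. This is a generic sketch, not the actual Optimus dependency resolver.

```go
package prime

import "fmt"

// TopoOrder returns jobs in an order where every job comes after the jobs
// it depends on. deps maps a job name to the names of jobs it depends on.
func TopoOrder(deps map[string][]string) ([]string, error) {
	indegree := map[string]int{}
	dependents := map[string][]string{}
	for job, ds := range deps {
		if _, ok := indegree[job]; !ok {
			indegree[job] = 0
		}
		for _, d := range ds {
			if _, ok := indegree[d]; !ok {
				indegree[d] = 0
			}
			indegree[job]++
			dependents[d] = append(dependents[d], job)
		}
	}
	var queue, order []string
	for job, n := range indegree {
		if n == 0 {
			queue = append(queue, job)
		}
	}
	for len(queue) > 0 {
		j := queue[0]
		queue = queue[1:]
		order = append(order, j)
		for _, next := range dependents[j] {
			indegree[next]--
			if indegree[next] == 0 {
				queue = append(queue, next)
			}
		}
	}
	if len(order) != len(indegree) {
		return nil, fmt.Errorf("cycle detected in job dependencies")
	}
	return order, nil
}
```

Jobs with no dependency between them can then be scheduled concurrently once all of their dependencies have finished.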
v3
- support for Kubernetes as the execution engine
- running replay on the Optimus cluster
v4
- support for writing testable pipelines
- support for our own scheduling engine, capable of doing everything Airflow can do except the UI
v5 (this goes beyond the scope of this feature request, but I wanted to list the vision)
- get rid of postgres and keep the Optimus state in the local filesystem to make Optimus truly independent and highly available