Description
Is your feature request related to a problem? Please describe.
At the moment, Optimus works only as an orchestrator that transpiles basic job instructions into Airflow-parsable DAGs. It doesn't have anything to execute jobs on its own. Although Airflow is pretty stable and widely used, it is built as a generic job executor. I believe we can do a lot more if we have our own scheduling & execution engine, but for now we will start with just the execution engine.
Note: I am not suggesting deprecating the use of Airflow; this will work in tandem with it to enhance the existing workflow.
Describe the solution you'd like
I have already decided (long back) to call this `prime`, and it will be used as follows:
- User should be able to simply request a run on demand via `optimus job run <job-name>` and the job should start executing (a minimal sketch of this execution path follows this list).
- Optimus API will allow any external service to execute on-demand jobs instead of scheduling them via Airflow if needed.
- Users should be able to leverage on-demand execution for ad hoc operations like DML queries, one-time syncs, cleanups, etc.
- User should be able to run a complete pipeline of jobs end to end, i.e. when a job depends on another job, and so on.
- Running a whole pipeline will allow users to write tests for data pipelines which can execute in a different execution environment than production, similar to how we write integration tests for code.
- Running a pipeline will allow replay execution to be done on Optimus clusters.
- Optimus should work as a distributed cluster and, from the start, should be capable of scaling horizontally on demand.
- Optimus should be able to run as highly available (except the postgres db).
- No extra moving parts like Airflow has (Redis, Celery, etc.).
- Local development of Optimus will be really easy, as developers won't need anything other than Optimus and a postgres db (we should start supporting sqlite as well to even get rid of postgres).
Describe alternatives you've considered
There are no alternatives to this at the moment.
Additional context
We can break this into different milestones as follows:
v1
- user will be able to run a single job and get the final status
- a job can have multiple hooks, and Optimus should execute everything in that job in the correct order (see the sketch after this list)
- we will start by supporting the local Docker host as the execution engine
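A rough sketch of the v1 ordering, assuming hooks are split into pre and post phases around the main transformation. The `ContainerRunner`, `Hook`, and `JobSpec` names are made up for illustration and are not the actual Optimus types; a real implementation would back `ContainerRunner` with the Docker Engine API.

```go
package prime

import (
	"context"
	"fmt"
)

// ContainerRunner is a hypothetical wrapper around the local Docker host.
type ContainerRunner interface {
	RunContainer(ctx context.Context, image string, args []string) error
}

// Hook is a hypothetical pre/post step attached to a job.
type Hook struct {
	Name  string
	Image string
	Pre   bool // true = run before the transformation, false = after
}

// JobSpec is a hypothetical single job: one transformation plus its hooks.
type JobSpec struct {
	Name      string
	TaskImage string
	Hooks     []Hook
}

// ExecuteJob runs everything in a job in order: pre hooks, the
// transformation task, then post hooks, failing fast on the first error.
func ExecuteJob(ctx context.Context, r ContainerRunner, job JobSpec) error {
	for _, h := range job.Hooks {
		if h.Pre {
			if err := r.RunContainer(ctx, h.Image, nil); err != nil {
				return fmt.Errorf("pre hook %s: %w", h.Name, err)
			}
		}
	}
	if err := r.RunContainer(ctx, job.TaskImage, nil); err != nil {
		return fmt.Errorf("task %s: %w", job.Name, err)
	}
	for _, h := range job.Hooks {
		if !h.Pre {
			if err := r.RunContainer(ctx, h.Image, nil); err != nil {
				return fmt.Errorf("post hook %s: %w", h.Name, err)
			}
		}
	}
	return nil
}
```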
v2
- support for executing a set of jobs that are interdependent on each other (see the ordering sketch after this list)
- proper user commands to interact with the Optimus server to fetch things like execution logs
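For v2, runs of interdependent jobs have to be ordered by their dependencies; a standard topological sort (Kahn's algorithm) over the job graph is one way to do it. This is a generic sketch, not the actual Optimus dependency resolver.

```go
package prime

import "fmt"

// TopoOrder returns jobs in an order where every job comes after the jobs
// it depends on. deps maps a job name to the names of jobs it depends on.
func TopoOrder(deps map[string][]string) ([]string, error) {
	indegree := map[string]int{}
	dependents := map[string][]string{}
	for job, ds := range deps {
		if _, ok := indegree[job]; !ok {
			indegree[job] = 0
		}
		for _, d := range ds {
			if _, ok := indegree[d]; !ok {
				indegree[d] = 0
			}
			indegree[job]++
			dependents[d] = append(dependents[d], job)
		}
	}
	var queue, order []string
	for job, n := range indegree {
		if n == 0 {
			queue = append(queue, job)
		}
	}
	for len(queue) > 0 {
		j := queue[0]
		queue = queue[1:]
		order = append(order, j)
		for _, next := range dependents[j] {
			indegree[next]--
			if indegree[next] == 0 {
				queue = append(queue, next)
			}
		}
	}
	if len(order) != len(indegree) {
		return nil, fmt.Errorf("cycle detected in job dependencies")
	}
	return order, nil
}
```

Jobs with no dependency between them can then be scheduled concurrently once all of their dependencies have finished.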
v3
- support for Kubernetes as the execution engine
- running replay on the Optimus cluster
v4
- support for writing testable pipelines
- support for our own scheduling engine, capable of doing everything Airflow can do except the UI
v5 (this goes beyond the scope of this feature request, but I wanted to list the vision)
- get rid of postgres and keep the Optimus state in the local filesystem to make Optimus truly independent and highly available