[EPIC] Enable AI Services when watchlist is created or updated. #1144
Description
Issues:
- Update AI services to fetch parameters from model-training-parameters NATS JetStream KV #1146
- Update /model/train endpoint to enable AI services before sending over workloads. #1147
- Update AIOps gateway plugin to update model-training-parameters NATS JetStream KV storage. #1148
- Update AI services to launch training jobs upon startup of service #1233
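
A minimal sketch of how the four issues above might fit together. Everything here is an assumption made for illustration: `InMemoryKV` is a plain dictionary standing in for the model-training-parameters KV bucket (a real service would use a NATS client), and the key name, function names, and callback parameters are hypothetical rather than taken from the Opni code base.

```python
import json


class InMemoryKV:
    """Hypothetical stand-in for the model-training-parameters KV bucket."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Returns None when the key has never been written.
        return self._data.get(key)


# Assumed key name; the source does not specify one.
PARAMS_KEY = "modelTrainingParameters"


def save_training_parameters(kv, workloads):
    """Gateway plugin side (#1148): persist the watchlist for the AI services."""
    kv.put(PARAMS_KEY, json.dumps(workloads).encode())


def load_training_parameters(kv):
    """AI service side (#1146): fetch the watchlist, or None if never set."""
    raw = kv.get(PARAMS_KEY)
    return json.loads(raw) if raw is not None else None


def handle_model_train(kv, workloads, enable_ai_services, send_workloads):
    """/model/train (#1147): enable services first, then hand over workloads."""
    enable_ai_services()
    save_training_parameters(kv, workloads)
    send_workloads(workloads)


def on_service_startup(kv, launch_training_job):
    """AI service startup (#1233): launch training if parameters already exist."""
    params = load_training_parameters(kv)
    if params:
        launch_training_job(params)
        return True
    return False
```

Storing the parameters in the KV bucket means a restarted AI service can pick up the last watchlist without the user resubmitting it, which is the behavior #1233 asks for.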
Summary:
Currently, to enable AI services, a user goes to the Opni Admin Dashboard. They must first enable Logging; once that is done, they open the AIOps panel and check the "Enable GPU Services" checkbox. When they click "Save", the GPU services are installed in the Kubernetes cluster: the workload DRAIN service, the training controller service, the GPU Controller service, and the CPU Inferencing service. This extra step can be avoided by detecting whether a GPU is available in the cluster and installing the GPU services when the user creates or updates the workload log anomaly watchlist to train a Deep Learning model, rather than through a checkbox.
Use case:
This change removes the "Enable GPU Services" checkbox. The GPU services are instead installed the first time the user updates the watchlist with workloads, so Opni GPU services come up automatically when a workload log anomaly watchlist is created or updated.
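
The trigger described here can be sketched as a small decision function. This is an illustration under stated assumptions, not the planned implementation: `ClusterState`, `should_install_gpu_services`, and `on_watchlist_change` are hypothetical names, the watchlist is simplified to a list of workload names, and how GPU availability is actually detected is left out.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    # Both fields are assumed inputs; detection logic is out of scope here.
    gpu_available: bool
    gpu_services_installed: bool


def should_install_gpu_services(state, watchlist):
    """Install only when a GPU exists, services are absent, and workloads are listed."""
    return (
        state.gpu_available
        and not state.gpu_services_installed
        and len(watchlist) > 0
    )


def on_watchlist_change(state, watchlist, install):
    """Called on watchlist create or update; returns True if an install was triggered."""
    if should_install_gpu_services(state, watchlist):
        install()
        state.gpu_services_installed = True
        return True
    return False
```

With this shape, only the first non-empty watchlist update on a GPU-equipped cluster triggers the install; later updates and clusters without a GPU leave the installed services unchanged.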
Benefits:
- Improves the usability of Opni AIOps log anomaly detection by removing the manual "Enable GPU Services" step
Level of Effort:
- Code implementation: <= 3 days
- Testing and debugging: <= 2 days
- Documentation: 2 days