This repository was archived by the owner on Feb 28, 2025. It is now read-only.

[EPIC] Enable AI Services when watchlist is created or updated. #1144

Open
@AmartC


OEP: https://github.com/rancher/opni/blob/main/enhancements/aiops/20230216-enable-gpu-services-training.md

Issues:

Summary:

Currently, to enable AI services, a user goes to the Opni Admin Dashboard. They must first enable Logging; once that is done, they go to the AIOps panel and check the "Enable GPU Services" checkbox. When they click "Save", the GPU services are installed in the Kubernetes cluster: the workload DRAIN service, the training controller service, the GPU controller service, and the CPU inferencing service. This extra step can be avoided by detecting whether a GPU is available in the cluster and installing the GPU services when the user creates or updates the workload log anomaly watchlist to train a deep learning model, rather than through a checkbox.

Use case:

This change removes the "Enable GPU Services" checkbox. The GPU services are instead installed the first time the user updates the watchlist with workloads: Opni GPU services come up automatically when a workload log anomaly watchlist is created or updated.
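
The proposed behavior can be sketched roughly as follows. This is an illustrative Python model only, not Opni code: the `Cluster` class, the `on_watchlist_change` function, and the service identifiers are hypothetical names chosen to mirror the services listed in the summary. It also assumes that nothing is installed when no GPU is detected, which this issue does not specify.

```python
from dataclasses import dataclass, field

# Service names mirror the summary above; the identifiers are hypothetical.
GPU_SERVICES = (
    "workload-drain",
    "training-controller",
    "gpu-controller",
    "cpu-inferencing",
)


@dataclass
class Cluster:
    """Toy stand-in for cluster state (hypothetical, not an Opni type)."""
    gpu_count: int = 0
    installed: set = field(default_factory=set)


def on_watchlist_change(cluster: Cluster, workloads: list) -> list:
    """Return the services installed in response to a watchlist create/update."""
    if not workloads:
        # An empty watchlist gives the model nothing to train on.
        return []
    if cluster.gpu_count < 1:
        # Assumption: without a detected GPU, no services are installed.
        return []
    # Install only what is missing, so repeated updates are no-ops.
    missing = [s for s in GPU_SERVICES if s not in cluster.installed]
    cluster.installed.update(missing)
    return missing
```

In this sketch the first watchlist update that contains workloads installs all four services, and later updates install nothing further, which matches the "very first time" wording above.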

Benefits:

  • Improves the usability of Opni AIOps log anomaly detection

Level of Effort:

  • Code implementation: <= 3 days
  • Testing and debugging: <= 2 days
  • Documentation: 2 days
