[EPIC] Filter out anomalous keywords from training dataset of workload logs #1091
Description
Summary:
Currently, when a user wants to train a Deep Learning model on a watchlist of workloads, the logs from the last hour for all of the specified workloads are fetched from OpenSearch. The OpenSearch query omits any logs that were previously marked as anomalous, but no check is made for workload logs that contain keywords typically associated with anomalous logs. I propose that we maintain a list of anomalous keywords and exclude from the training dataset any log message fetched from OpenSearch that contains at least one word from this list.
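The proposed check could run as a post-processing step on the messages returned by the OpenSearch query. Below is a minimal Python sketch. It assumes the messages arrive as plain strings and that a "word" is a case-insensitive token delimited by punctuation or whitespace. The keyword set and the function names (`filter_training_logs`, `contains_anomalous_keyword`, `tokenize`) are hypothetical and are not part of the existing codebase.

```python
import re
from typing import FrozenSet, Iterable, List

# Hypothetical starting list; the real list of anomalous keywords would be
# maintained by the project and is not specified in this issue.
ANOMALOUS_KEYWORDS: FrozenSet[str] = frozenset(
    {"error", "exception", "fatal", "failed", "panic", "timeout"}
)

_WORD_RE = re.compile(r"\w+")


def tokenize(message: str) -> FrozenSet[str]:
    """Lowercase a log message and split it into words, dropping punctuation,
    so that "ERROR:" is treated as the word "error"."""
    return frozenset(_WORD_RE.findall(message.lower()))


def contains_anomalous_keyword(message: str, keywords: FrozenSet[str]) -> bool:
    """Return True if at least one word of the message is in `keywords`
    (which is expected to be lowercase)."""
    return not tokenize(message).isdisjoint(keywords)


def filter_training_logs(
    messages: Iterable[str], keywords: Iterable[str] = ANOMALOUS_KEYWORDS
) -> List[str]:
    """Keep only the messages that contain none of the anomalous keywords."""
    normalized = frozenset(k.lower() for k in keywords)
    return [m for m in messages if not contains_anomalous_keyword(m, normalized)]


if __name__ == "__main__":
    fetched_logs = [
        "Started container nginx",
        "ERROR: connection refused",
        "Request served in 12ms",
        "Health check failed",
    ]
    # prints ['Started container nginx', 'Request served in 12ms']
    print(filter_training_logs(fetched_logs))
```

The sketch matches whole words rather than substrings. A benign message such as "0 errors" therefore does not trip the "error" keyword. The trade-off is that compound tokens such as `NullPointerException` are not caught. Whether to use word or substring matching is a design decision for whoever implements this.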
Use case:
This will filter out workload log messages with anomalous keywords from the training data of the Deep Learning model.
Benefits:
- Acts as a safeguard against adding clearly anomalous log messages to the training dataset.
- Improves the insights provided by the Deep Learning model, since it is trained on fewer anomalous examples.
Level of Effort:
- Implement and test the changes in the codebase: 1 day