This repository was archived by the owner on Feb 28, 2025. It is now read-only.

[EPIC] Filter out anomalous keywords from training dataset of workload logs #1091

Open
@sanjay920


Summary:

Currently, when a user trains a Deep Learning model on a watchlist of workloads, the logs from the last hour for all of the specified workloads are fetched from OpenSearch. The OpenSearch query already omits any logs that were previously marked as anomalous, but nothing checks for workload logs that contain keywords typically associated with anomalous logs. I propose that we maintain a list of anomalous keywords and exclude from the training dataset any log message fetched from OpenSearch that contains at least one word from that list.
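The proposed filter could be sketched roughly as follows. This is only an illustration under stated assumptions: the keyword list, the function names `contains_anomalous_keyword` and `filter_training_logs`, and the choice of case-insensitive whole-word matching are all hypothetical and not taken from the existing code base.

```python
import re
from typing import Iterable, List

# Hypothetical keyword list; the actual list would need to be agreed on.
ANOMALOUS_KEYWORDS = frozenset({"error", "fail", "fatal", "exception", "panic"})

# Split a message into lowercase alphanumeric tokens.
_TOKEN_RE = re.compile(r"[a-z0-9]+")


def contains_anomalous_keyword(
    message: str, keywords: Iterable[str] = ANOMALOUS_KEYWORDS
) -> bool:
    """Return True if any whole word in the message is in the keyword list."""
    keyword_set = set(keywords)
    tokens = _TOKEN_RE.findall(message.lower())
    return any(token in keyword_set for token in tokens)


def filter_training_logs(
    messages: Iterable[str], keywords: Iterable[str] = ANOMALOUS_KEYWORDS
) -> List[str]:
    """Keep only the messages that contain none of the anomalous keywords."""
    keyword_set = set(keywords)
    return [m for m in messages if not contains_anomalous_keyword(m, keyword_set)]
```

Because this sketch matches whole words only, a message containing "failed" would not be caught by the keyword "fail". Whether to match substrings or stems instead is a design decision left open here.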

Use case:

This will filter out workload log messages with anomalous keywords from the training data of the Deep Learning model.

Benefits:

  • Acts as a safeguard against adding clearly anomalous log messages to the training dataset.
  • Improves the insights provided by the Deep Learning model by keeping its training data cleaner.

Level of Effort:

  • Implement the changes in the code base and test them: 1 day

Issues:
