This project applies concepts learned in the Knowledge Engineering course, within the available time and resources. Assumptions were made about model behavior, and observations were drawn by comparing the obtained results with the expected outcomes.
The case study is structured into four main sections, each analyzing a different aspect:
- Application of Machine Learning Models (Random Forest, k-NN, and XGBoost) using four different approaches:
  - Baseline results on the original dataset
  - Class weighting to handle class imbalance
  - Random undersampling to balance class sizes
  - Random oversampling to balance class sizes
- Feature Scaling applied to probabilistic learning using a Gaussian Naïve Bayes classifier, followed by result analysis.
- Feature Selection for decision trees and evaluation of its impact.
- Clustering Analysis using the elbow method to assess how the data distribution relates to the binary classification.
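The three imbalance-handling approaches listed above (class weighting, random undersampling, random oversampling) can be illustrated with a minimal, standard-library sketch. It is not the project's implementation: the toy data, the function names `balanced_class_weights` and `random_resample`, and the 95/5 class split are all assumptions chosen for illustration. The weighting formula `n_samples / (n_classes * class_count)` is the common "balanced" heuristic.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def random_resample(rows, labels, target_size, rng):
    """Resample every class to exactly target_size rows.

    Classes with at least target_size rows are sampled without replacement
    (undersampling); smaller classes are kept whole and topped up with
    random duplicates (oversampling).
    """
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target_size:
            chosen = rng.sample(members, target_size)
        else:
            extra = rng.choices(members, k=target_size - len(members))
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target_size)
    return out_rows, out_labels


# Hypothetical toy data: 95 legitimate rows (label 0), 5 fraudulent rows (label 1).
labels = [0] * 95 + [1] * 5
rows = [[float(i)] for i in range(len(labels))]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

counts = Counter(labels)
weights = balanced_class_weights(labels)

# Undersampling shrinks every class to the minority size,
# oversampling grows every class to the majority size.
under_rows, under_labels = random_resample(rows, labels, min(counts.values()), rng)
over_rows, over_labels = random_resample(rows, labels, max(counts.values()), rng)
```

Class weighting leaves the data untouched and changes only how much each error costs during training, whereas the two resampling approaches change the training set itself. Undersampling discards majority-class rows, and oversampling repeats minority-class rows.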
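The elbow method mentioned in the last item can be sketched in the same spirit. The example below runs a basic k-means (Lloyd's algorithm) for several values of k on synthetic two-dimensional points and records the inertia, that is, the sum of squared distances from each point to its nearest center. The synthetic data, the helper names, and the fixed iteration count are assumptions made for illustration, not details taken from the project.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def kmeans_inertia(points, k, rng, n_iter=20):
    """Run a basic k-means and return the final inertia."""
    centers = rng.sample(points, k)  # initialise centers on k distinct data points
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [centroid(cl) if cl else centers[i] for i, cl in enumerate(clusters)]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Hypothetical data: two well-separated groups of 50 points each.
rng = random.Random(0)
points = [
    (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
    for cx, cy in [(0.0, 0.0), (10.0, 10.0)]
    for _ in range(50)
]

inertias = {k: kmeans_inertia(points, k, rng) for k in range(1, 6)}
# Reduction in inertia gained by adding one more cluster.
drops = {k: inertias[k - 1] - inertias[k] for k in range(2, 6)}
```

The "elbow" is the value of k after which the entries of `drops` become small compared with the earlier ones. With two clearly separated groups, as in this toy data, the large reduction is expected when going from one cluster to two.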
The dataset under study contains a large volume of financial transactions recorded under various conditions. These transactions serve as the basis for detecting and classifying fraudulent activity.