A Survey of Data Poisoning Techniques for Evasion Attacks on a Financial Dataset
Christian Kyle Ranon
Bachelor of Business and Data Analytics
IE University – School of Science and Technology
Supervised by: Luis Angel Galindo ([email protected])
Submission Date: April 30, 2025
This project investigates the vulnerability of machine learning models in the financial domain to Adversarial Machine Learning (AML), specifically data poisoning attacks that occur during the training phase. It aims to showcase how easily the predictive performance of risk assessment models can be undermined and offers insights into mitigation and defense strategies.
- Name: Financial Risk Assessment Dataset
- Source: Kaggle Dataset Link
- EDA Report: View EDA Report
- Languages: Python
- Libraries:
  - scikit-learn: preprocessing, modeling
  - AIJack: adversarial simulation (SVM poisoning)
  - ydata-profiling: exploratory data analysis
  - pandas, numpy, matplotlib: data manipulation & visualization
Adversarial Model Design
- Based on NIST's taxonomy (2023), we define the adversary's goal (availability attack), knowledge (white-box), and capabilities (full data access for simulation).
Algorithm Under Attack
- Support Vector Machines (SVM) using a linear kernel.
- Attacks follow the approach defined in Biggio et al. (2012).
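Biggio et al. (2012) craft poisoning points by gradient ascent on the model's validation loss; as a minimal, hedged sketch of the same threat (training-set poisoning against a linear SVM), the simpler label-flipping variant on synthetic stand-in data looks like:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary data standing in for the financial dataset
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

def flip_labels(y, fraction, rng):
    """Flip the labels of a randomly chosen fraction of training points."""
    y_poisoned = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]  # binary labels: 0 <-> 1
    return y_poisoned

rng = np.random.default_rng(0)
y_poisoned = flip_labels(y_train, fraction=0.2, rng=rng)

# Train one SVM on clean labels and one on poisoned labels, compare on test data
clean_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
pois_acc = SVC(kernel="linear").fit(X_train, y_poisoned).score(X_test, y_test)
print(f"clean: {clean_acc:.3f}  poisoned: {pois_acc:.3f}")
```

This is only an illustration of the attack surface; the project's actual experiments use the gradient-based attack via AIJack rather than random flips.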
Dataset Processing
- Missing data imputed with `KNNImputer`
- Categorical data encoded via `OrdinalEncoder` and one-hot encoding
- Numerical features standardized with `StandardScaler`
- Class balancing: 1,500 samples each for Low, Medium, and High Risk
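The steps above can be wired together with a scikit-learn pipeline. The sketch below uses hypothetical column names (the real dataset's schema may differ), applies `KNNImputer` and `StandardScaler` to numeric columns, and one-hot encodes nominal categoricals; truly ordinal columns would instead go through `OrdinalEncoder`:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names -- placeholders for the dataset's real schema
numeric_cols = ["Income", "LoanAmount", "CreditScore"]
categorical_cols = ["EmploymentStatus", "LoanPurpose"]

preprocessor = ColumnTransformer([
    ("num", Pipeline([
        ("impute", KNNImputer(n_neighbors=5)),   # fill gaps from similar rows
        ("scale", StandardScaler()),             # zero mean, unit variance
    ]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

def balance_classes(df, target="RiskRating", n_per_class=1500, seed=42):
    """Resample each risk class to the same size (upsampling with
    replacement when a class has fewer than n_per_class rows)."""
    parts = [
        group.sample(n=n_per_class, replace=len(group) < n_per_class,
                     random_state=seed)
        for _, group in df.groupby(target)
    ]
    return pd.concat(parts).reset_index(drop=True)
```

Balancing before the train/test split keeps the three risk classes at 1,500 rows each, so accuracy drops after poisoning cannot be blamed on class imbalance.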
The poisoned dataset significantly reduced model performance, demonstrating the fragility of financial classification models under adversarial pressure. The discussion outlines attack effects and highlights potential defenses.
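As one example of the kind of defense discussed, label sanitization filters suspicious training points before fitting. The sketch below (a hedged illustration, not the project's implementation; `k` and `agreement` are hypothetical thresholds) drops points whose label disagrees with most of their nearest neighbours:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_label_sanitize(X, y, k=5, agreement=0.6):
    """Drop training points whose label agrees with fewer than `agreement`
    of their k nearest neighbours. Assumes no duplicate rows, so each
    point's nearest neighbour is itself and is discarded."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    neigh_idx = nn.kneighbors(X, return_distance=False)[:, 1:]  # drop self
    agree = (y[neigh_idx] == y[:, None]).mean(axis=1)
    keep = agree >= agreement
    return X[keep], y[keep]
```

Such filters cheaply remove isolated flipped labels, though a gradient-based attacker who places poisoning points in dense regions of the wrong class can partially evade them.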
The paper argues for greater attention to adversarial robustness in the financial industry. It shows how easily adversarial bias can be introduced into training data, especially the tabular datasets used in credit risk scoring.