The current XGBoost model has not been tested with any additional hyperparameter settings. Moreover, it trains on only 260k samples. That should be enough, but the current training data is contaminated with test samples, and the sample count could be increased once all test samples are removed from the training data.
One candidate dataset is https://huggingface.co/datasets/m4vic/prompt-injection-dataset
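As a starting point for the decontamination part, here is a minimal sketch of dropping training rows whose text also appears in the test set. It assumes each sample is a dict with a `text` field, which is a guess and not necessarily how the notebook or the dataset above names its columns. The helper names (`normalize`, `fingerprint`, `remove_test_overlap`) are hypothetical, and it only catches exact matches after lowercasing and whitespace normalization, not paraphrases.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial variants compare equal.
    return re.sub(r"\s+", " ", text.strip().lower())


def fingerprint(text: str) -> str:
    # Hash the normalized text so the test set can be held as a compact set.
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def remove_test_overlap(train_rows, test_rows, text_key="text"):
    """Return (clean_train_rows, number_removed).

    `text_key` is an assumed column name; adjust it to the real schema.
    """
    test_hashes = {fingerprint(row[text_key]) for row in test_rows}
    kept = [
        row for row in train_rows
        if fingerprint(row[text_key]) not in test_hashes
    ]
    return kept, len(train_rows) - len(kept)


# Usage with toy rows (not real data from the project):
train = [
    {"text": "Ignore previous  instructions", "label": 1},
    {"text": "What is the weather today?", "label": 0},
]
test = [{"text": "ignore previous instructions", "label": 1}]

clean_train, removed = remove_test_overlap(train, test)
```

Running the same check after merging in any new dataset would help keep the contamination from coming back.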
This issue is pretty big, so I'm asking for pull requests that work towards it, no matter how small or large, whether that means rewriting the Jupyter Notebook so it doesn't use test samples or adding hyperparameter sweeps.
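For the hyperparameter sweep part, a small random search could look roughly like the sketch below. The search space values are illustrative guesses, not tuned recommendations, and `evaluate` is a hypothetical callback that a contributor would write: it should train a model with the given parameters on the decontaminated training split and return a validation score (higher is better). The parameter names are chosen to match keyword arguments accepted by `xgboost.XGBClassifier`, so `evaluate` could pass them through as `XGBClassifier(**params)`.

```python
import random

# Illustrative search space; the ranges are assumptions, not project settings.
SEARCH_SPACE = {
    "max_depth": [4, 6, 8, 10],
    "learning_rate": [0.03, 0.1, 0.3],
    "n_estimators": [200, 400, 800],
    "subsample": [0.7, 0.85, 1.0],
    "colsample_bytree": [0.7, 0.85, 1.0],
    "min_child_weight": [1, 5, 10],
}


def sample_configs(space, n_trials, seed=0):
    # Draw one value per hyperparameter for each trial, reproducibly.
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_trials)
    ]


def random_search(evaluate, space=SEARCH_SPACE, n_trials=20, seed=0):
    """Return (best_params, best_score) over n_trials random configurations.

    `evaluate(params) -> float` is supplied by the caller and should score
    on a validation split, never on the test set.
    """
    best_params, best_score = None, float("-inf")
    for params in sample_configs(space, n_trials, seed):
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Usage with a stand-in scoring function (replace with real training code):
def dummy_evaluate(params):
    return -abs(params["max_depth"] - 6) - abs(params["learning_rate"] - 0.1)


best_params, best_score = random_search(dummy_evaluate, n_trials=30)
```

Random search is used here only because it is simple to review in a small pull request; a grid search or a dedicated tuning library would fit the same `evaluate` interface.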