Research artifact for the paper "Efficient Detection of Intermittent Job Failures Using Few-Shot Learning", accepted at the IEEE 41st International Conference on Software Maintenance and Evolution (ICSME 2025), Industry Track.
This artifact has been awarded the "Open Research Object" and "Research Object Reviewed" badges at ICSME 2025 Artifact Evaluation Track. It includes:
- SLID: source code for creating and evaluating few-shot fine-tuned Small Language models for Intermittent job failures Detection.
- Experimental Results including raw results from running the experiment on the Veloren project.
- Jupyter Notebooks used for conducting the study.
For the purpose of the original study, we collected CI job data from GitLab projects using the glbuild Python library. For confidentiality reasons, the data collected from TELUS projects are not included. However, we included the build job dataset collected and manually labeled from the open-source software (OSS) project Veloren to facilitate reproducibility and reuse.
1.) notebooks/
includes the Jupyter notebooks used to prepare the data and answer our RQs. These notebooks are not executable; they are provided for read-only reference.
2.) data/
includes the datasets of the studied OSS project Veloren:
- Prepared Dataset (`prepared.zip`) with automated labels and features for baseline replication
- Sample Dataset (`sampled.zip`) for performing manual labeling
- Labeled Sample Dataset (`labeled.zip`) including the manual and automated labels. This dataset is the input of the FSL model for the OSS project.
- Raw Sampled Logs (`logs/raw.zip`) of each job in the sampled dataset. Each log file in the directory is named as follows:
  `{projectId}_{jobId}_{automatedLabel}_{manualLabel}_{failureCategoryId}.log`
  where `failureCategoryId` maps to the categories in the `failure_reasons.csv` file.
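The log filename convention above can be parsed programmatically. Below is a minimal sketch; the example filename is hypothetical, not an actual file from the archive:

```python
from pathlib import Path

def parse_log_name(path):
    """Split a sampled-log filename into its five components:
    {projectId}_{jobId}_{automatedLabel}_{manualLabel}_{failureCategoryId}."""
    parts = Path(path).stem.split("_")
    keys = ("projectId", "jobId", "automatedLabel",
            "manualLabel", "failureCategoryId")
    return dict(zip(keys, parts))

# Hypothetical filename following the convention above.
info = parse_log_name("10174980_123456_1_1_3.log")
print(info["failureCategoryId"])  # -> 3
```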
3.) src/
contains the source code for:
- Creating and evaluating an FSL model: `models/run.py`
- Creating and evaluating a baseline model: `models/baselines/sota_brown_detector.py`
- FSL hyperparameter search module: `models/hp_search.py`
- FSL model evaluator module: `models/evaluator.py`
- Log pre-processing utilities: `preprocessing/log.py`
```shell
poetry self add poetry-plugin-shell
poetry install
poetry shell
unzip data/prepared.zip -d .
```

Optionally, also unzip `data/sampled.zip`, `data/labeled.zip`, and `data/logs/raw.zip`.
Here is an example of one-shot fine-tuning using the OSS project's CI job data included in this package. The `seed` argument can be changed for another reproducible repeat.

NOTE: We recommend 16GB or more of GPU memory and a Linux-based operating system for fast training (~5 min for one-shot training).

```shell
python src/models/run.py --project veloren --shots 1 --seed 1
```

FSL results are appended to the `data/results/runs/veloren.csv` file. FSL results obtained on the Veloren project during our experiments are recorded in `data/results/runs/veloren_saved.csv`.
Expected results content is described in the following table:

| 0_precision | 0_recall | 1_precision | 1_recall | 1_f1_score | random_seed | num_shots | training_time |
|---|---|---|---|---|---|---|---|
| 0.78 | 0.96 | 0.91 | 0.57 | 0.70 | 1 | 1 | 0.41 |
| 0.95 | 0.36 | 0.48 | 0.97 | 0.64 | 4 | 1 | 0.74 |
| 0.75 | 0.87 | 0.72 | 0.52 | 0.61 | 2 | 1 | 0.50 |
| 0.79 | 0.98 | 0.95 | 0.60 | 0.73 | 3 | 1 | 0.48 |
| 0.80 | 0.95 | 0.90 | 0.63 | 0.74 | 5 | 1 | 0.39 |
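Since results accumulate across repeats, they can be aggregated per shot count. A minimal sketch with pandas, assuming the column names shown in the table above (the inline rows here mirror the example results, not the saved CSV):

```python
import io
import pandas as pd

# A few illustrative rows in the same shape as data/results/runs/veloren.csv.
csv = io.StringIO("""0_precision,0_recall,1_precision,1_recall,1_f1_score,random_seed,num_shots,training_time
0.78,0.96,0.91,0.57,0.70,1,1,0.41
0.95,0.36,0.48,0.97,0.64,4,1,0.74
0.75,0.87,0.72,0.52,0.61,2,1,0.50
""")
runs = pd.read_csv(csv)

# Mean F1 for the failure class (label 1), averaged across seeds per shot count.
mean_f1 = runs.groupby("num_shots")["1_f1_score"].mean()
print(mean_f1)
```

The same two lines work on the real `veloren.csv` once several `--shots`/`--seed` combinations have been run.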
During our experiments, we used the following values for each argument:
- `project`: A, B, C, D, E, veloren
- `shots`: 1 to 15
- `seed`: 1 to 100
Run the SOTA brown job detector on the project `veloren` for comparison:

```shell
python src/models/baselines/sota_brown_detector.py --project veloren --seed 1
```

Baseline results are appended to the `data/results/baselines/veloren.csv` file. Baseline results obtained on the Veloren project during our experiments are recorded in `data/results/baselines/veloren_saved.csv`.
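The FSL and baseline result files can then be compared side by side. The sketch below uses inline stand-in data and assumes both CSVs expose a `1_f1_score` column; the baseline file's actual schema may differ:

```python
import io
import pandas as pd

# Illustrative stand-ins for data/results/runs/veloren.csv and
# data/results/baselines/veloren.csv; the real files grow as runs are appended.
fsl = pd.read_csv(io.StringIO("1_f1_score,random_seed\n0.70,1\n0.73,3\n"))
baseline = pd.read_csv(io.StringIO("1_f1_score,random_seed\n0.55,1\n0.60,3\n"))

# Difference in mean F1 for the failure class between the two approaches.
delta = fsl["1_f1_score"].mean() - baseline["1_f1_score"].mean()
print(f"FSL mean F1 minus baseline mean F1: {delta:.2f}")
```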