This project enables experiments with large language models (LLMs) for classification tasks. It supports processing data using predefined configurations, handling multiple model setups, and generating evaluation reports.
- Run Experiments with Configurable Inputs: Run classification tasks using CSV input files and configuration settings.
- Support for Investigator and Model Modes:
  - Investigator Mode: Execute experiments for a specific investigator using predefined configurations.
  - Models Mode: Execute experiments for multiple models with their respective configurations.
- Generative Model Integration: Utilizes LLMs for predictions with user-defined prompts.
- Partial Result Handling: Saves intermediate results to prevent data loss during lengthy executions.
- Evaluation Metrics: Includes evaluation functionality such as edit distance analysis for classification performance.
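To make the edit distance metric concrete, here is a minimal sketch of Levenshtein distance between a predicted label and the true label. The function names `edit_distance` and `normalized_similarity` are hypothetical and chosen for illustration; the project's own evaluation code may compute this differently or use a library.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def normalized_similarity(predicted: str, expected: str) -> float:
    """Scale the distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(predicted), len(expected))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(predicted, expected) / longest
```

A normalized score can be easier to compare across labels of different lengths than the raw distance, which is why the sketch includes both.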
- Clone the repository:
    git clone https://github.com/diverso-lab/ConfigurationLLMClassificator
    cd ConfigurationLLMClassificator
- Install dependencies:
    pip install -r requirements.txt
- Investigator Mode: Execute experiments for a specific investigator using their configuration:
    python main.py --mode i --investigator investigatorName
- Models Mode: Run experiments for multiple models, optionally filtering by specific model names:
    python main.py --mode models --models model1 model2
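The two invocations above could be parsed along the following lines. This is only a sketch using the standard `argparse` module. The flag names and the mode values `i` and `models` come from the commands above, but whether `main.py` actually uses `argparse`, accepts other mode values, or validates arguments this way is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Run LLM classification experiments"
    )
    # Only the two mode values shown in the README are accepted here (assumption).
    parser.add_argument("--mode", required=True, choices=["i", "models"])
    parser.add_argument("--investigator", help="investigator name (mode i)")
    # Zero or more model names; an empty list is treated as "no filter".
    parser.add_argument("--models", nargs="*", default=[])
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.mode == "i" and not args.investigator:
        parser.error("--investigator is required when --mode is i")
    return args
```

With this sketch, `parse_args(["--mode", "models", "--models", "model1", "model2"])` yields a namespace whose `models` attribute is the list of the two names.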
A JSON file (e.g., configs/investigatorName_config.json) defines the settings for a single investigator:
{
  "csv_path": "path/to/data.csv",
  "model": "model_name",
  "system_prompt": "Define classification prompt",
  "max_tokens": 256,
  "temperature": 1,
  "true_column": "class"
}
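A configuration like the one above can be checked before an experiment starts, so that a missing key fails early and not halfway through a long run. The sketch below assumes all six keys shown are required. The helper names `validate_config` and `load_investigator_config` are hypothetical and not taken from the repository.

```python
import json

# Keys that appear in the example configuration (assumed to all be required).
REQUIRED_KEYS = {
    "csv_path", "model", "system_prompt",
    "max_tokens", "temperature", "true_column",
}


def validate_config(config: dict) -> dict:
    """Return the config unchanged, or raise ValueError describing the problem."""
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"Missing config keys: {sorted(missing)}")
    if not isinstance(config["max_tokens"], int) or config["max_tokens"] <= 0:
        raise ValueError("max_tokens must be a positive integer")
    return config


def load_investigator_config(path: str) -> dict:
    """Read a JSON config file and validate it."""
    with open(path, encoding="utf-8") as handle:
        return validate_config(json.load(handle))
```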
A JSON file (e.g., configs/models_config.json) contains settings for multiple models:
[
  {
    "csv_path": "path/to/data1.csv",
    "model": "model1",
    "system_prompt": "Define prompt",
    "max_tokens": 256,
    "temperature": 1,
    "true_column": "class"
  },
  {
    "csv_path": "path/to/data2.csv",
    "model": "model2",
    "system_prompt": "Define another prompt",
    "max_tokens": 512,
    "temperature": 1,
    "true_column": "label"
  }
]
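The optional filtering by model name in Models Mode can be pictured as selecting entries from this list by their `model` field. The following is a sketch under that assumption. `select_model_configs` is a hypothetical helper, and the project may match names differently (for example, case-insensitively).

```python
import json


def select_model_configs(configs: list, model_names=None) -> list:
    """Keep only entries whose "model" is in model_names; keep all if no filter."""
    if not model_names:
        return list(configs)
    wanted = set(model_names)
    return [config for config in configs if config["model"] in wanted]


# Trimmed-down version of the example file above, inlined for illustration.
example = json.loads("""
[
  {"csv_path": "path/to/data1.csv", "model": "model1", "true_column": "class"},
  {"csv_path": "path/to/data2.csv", "model": "model2", "true_column": "label"}
]
""")

selected = select_model_configs(example, ["model2"])
print([config["csv_path"] for config in selected])
```

Names that match no entry are silently ignored in this sketch; a stricter version could raise an error so that typos in `--models` are caught.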
- Results Directory: Results are saved in the output/ directory with a unique hash based on the configuration.
- Files:
  - config.csv: Saves the configuration used for this experiment.
  - results.csv: Predicted labels for each instance.
  - report.csv: Performance metrics and evaluation results.
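One way to derive a stable hash from a configuration is to serialize it canonically and digest the result. The sketch below is an assumption about how such a scheme could work, not a description of this project's code. The choice of SHA-256, the 12-character truncation, the one-subdirectory-per-hash layout, and the names `config_hash` and `experiment_dir` are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def config_hash(config: dict, length: int = 12) -> str:
    """Hash a config so that key order does not change the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return digest[:length]


def experiment_dir(config: dict, root: str = "output") -> Path:
    """Assumed layout: one subdirectory of output/ per configuration hash."""
    return Path(root) / config_hash(config)
```

Sorting the keys before hashing means two configuration files that differ only in key order map to the same directory, while any change to a value (such as `temperature`) produces a new one.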