**`examples/llm-agent/singletask_learning_bench/README.md`** (180 additions, 70 deletions)
# Ianvs LLM-Agent Benchmark
### Setup & Reproduction Guide
> KubeEdge Ianvs · singletask_learning_bench · bloom-1b4-zh

---

## 1. Overview

This guide documents all the changes required to get the Ianvs LLM-Agent benchmark pipeline running end-to-end with a locally hosted 1.4B-parameter BLOOM model. It covers environment setup, dataset creation, path fixes, configuration file creation, Python code modifications, and metric evaluation.

By the end of this guide, the pipeline will:

- Load the `bloom-1b4-zh` model weights from local disk
- Parse a custom JSONL dataset via `JsonlDataParse`
- Fine-tune the model using LoRA via the HuggingFace Trainer
- Evaluate predictions using ROUGE-1, ROUGE-2, and ROUGE-L metrics
- Output a ranked results table via the Ianvs benchmarking framework

---

## 2. Prerequisites

### 2.1 System Requirements

- Ubuntu 20.04 or later (tested on Ubuntu on a Dell G15 laptop)
- Python 3.10 or 3.12
- At least 8 GB RAM (16 GB recommended for the 1.4B model)
- At least 10 GB free disk space for model weights
- Internet access for initial pip installs (model weights downloaded separately)

### 2.2 Base Ianvs Installation

Ensure Ianvs is cloned and its core package is installed in your virtual environment before proceeding:

```bash
git clone https://github.com/kubeedge/ianvs.git
cd ianvs
pip install -r requirements.txt
pip install .
```

---

## 3. Environment Setup

### 3.1 Create and Activate Virtual Environment

```bash
python3 -m venv ianvs_env
source ianvs_env/bin/activate
```

### 3.2 Install Required Libraries

Install all machine learning dependencies into the virtual environment. These are **not** included in the base Ianvs requirements:

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install transformers peft datasets evaluate rouge_score
```

| Library | Purpose |
|---|---|
| `torch` | PyTorch backend for model training and inference |
| `transformers` | HuggingFace model loading, tokenizer, Trainer API |
| `peft` | LoRA-based parameter-efficient fine-tuning |
| `datasets` | In-memory dataset construction for the Trainer |
| `evaluate` | Metric loading framework |
| `rouge_score` | Direct ROUGE scoring without remote downloads |
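
As a quick sanity check that the scoring stack works offline, `rouge_score` can be called directly. This is a minimal sketch; the two strings are purely illustrative:

```python
from rouge_score import rouge_scorer

# Compute the same three metrics the benchmark reports.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score("repetitive arm swing", "arm swing while walking")
print({name: round(s.fmeasure, 3) for name, s in scores.items()})
```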

### 3.3 Download Model Weights Locally

The `Langboat/bloom-1b4-zh` model must be downloaded to local disk ahead of time: the HuggingFace Hub's on-the-fly downloader can crash when handed relative paths, and pointing the pipeline at a pre-downloaded local copy avoids the issue entirely.

```bash
pip install huggingface_hub
huggingface-cli download Langboat/bloom-1b4-zh \
--local-dir ianvs/examples/llm-agent/pretrains/Langboat/bloom-1b4-zh
```

Expected files after download:

- `config.json`
- `tokenizer.json` / `tokenizer_config.json` / `special_tokens_map.json`
- `pytorch_model.bin` (or sharded safetensors files)
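
A minimal sketch to verify the download before wiring it into the pipeline; the path is assumed to match the `--local-dir` used above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "ianvs/examples/llm-agent/pretrains/Langboat/bloom-1b4-zh"

# Loading from a local directory never touches the Hub downloader.
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)
print(f"{model.num_parameters() / 1e9:.1f}B parameters")  # expect ~1.4B
```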

---

## 4. Dataset Setup

### 4.1 Background

The original KubeEdge dataset URL referenced in the example returns a 404 error. A custom 10-item dataset must be created manually to satisfy the pipeline.

### 4.2 Create the Dataset File

Create the file at:

```
ianvs/examples/llm-agent/dataset/activity_classification.jsonl
```

Each line must be a valid JSON object with exactly the keys `question` and `answer`. The `JsonlDataParse` class inside Ianvs hardcodes these key names:

```jsonl
{"question": "What activity is being performed?", "answer": "Walking"}
{"question": "Describe the motion pattern.", "answer": "Repetitive arm swing"}
```

Add at least 8 more lines following the same format.

> **Important:** The file must use `.jsonl` format (one JSON object per line), not `.json`.
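
Since `JsonlDataParse` hardcodes the key names, a short validation script (a sketch, assuming the dataset path above) catches malformed lines before a training run does:

```python
import json

PATH = "ianvs/examples/llm-agent/dataset/activity_classification.jsonl"

with open(PATH, encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        record = json.loads(line)  # raises ValueError on a non-JSON line
        assert set(record) == {"question", "answer"}, f"unexpected keys on line {lineno}"
print("dataset OK")
```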

---

## 5. Configuration Files

### 5.1 testenv.yaml

**Location:** `examples/llm-agent/singletask_learning_bench/testenv/testenv.yaml`

| Field | Old Value | New Value |
|---|---|---|
| Dataset keys | `train_index` / `test_index` | `train_data` / `test_data` |
| Dataset file | `dummy.txt` | `activity_classification.jsonl` |
| Dataset format | `TxtDataParse` | `JsonlDataParse` |
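
Putting those rows together, the dataset block might look like the sketch below. Only the two key names come from the table above; the surrounding structure and the reuse of a single file for both splits are assumptions:

```yaml
testenv:
  dataset:
    # both splits point at the Section 4.2 dataset for this small run
    train_data: "ianvs/examples/llm-agent/dataset/activity_classification.jsonl"
    test_data: "ianvs/examples/llm-agent/dataset/activity_classification.jsonl"
```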

### 5.2 algorithm.yaml

**Location:** `examples/llm-agent/singletask_learning_bench/testalgorithms/algorithm.yaml`

- Replace all relative `./examples/LLM-Agent-Benchmark/` paths with absolute paths
- Fix incorrect capitalisation in folder names (`LLM-Agent-Benchmark` → `llm-agent`)
- Remove any invisible trailing whitespace at the end of the `train_config` path; it causes a silent crash (a quick check script follows below)
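
Trailing whitespace is hard to spot by eye; a throwaway check such as this sketch (the path is assumed from the location above) flags it:

```python
# Print any line of algorithm.yaml that ends in invisible whitespace.
path = "examples/llm-agent/singletask_learning_bench/testalgorithms/algorithm.yaml"

with open(path, encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        text = line.rstrip("\n")
        if text != text.rstrip():
            print(f"line {lineno}: trailing whitespace -> {text!r}")
```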

### 5.3 config.json (create from scratch)

**Location:** `examples/llm-agent/config/config.json`

Create this file with the following fields. The paths below are written relative to the `ianvs` checkout for brevity; use absolute paths on your machine:

```json
{
"tokenizer_dir": "ianvs/examples/llm-agent/pretrains/Langboat/bloom-1b4-zh",
"data_dir": "ianvs/dataset/",
"output_dir": "ianvs/examples/llm-agent/output/",
"auth_token": "",
"token_factor": 1,
"device": "cpu",
"trust_remote": true
}
```

### 5.4 train_config.json (create from scratch)

**Location:** `examples/llm-agent/config/train_config.json`

- Use `true` (JSON boolean), not `"True"` (string); a string value silently breaks the Trainer
- Set `output_dir` to an absolute local path
- Include `"half_lora": true` to reduce memory usage on CPU

---
## 6. Running the Benchmark

### 6.1 Full Command

```bash
source ianvs_env/bin/activate
cd ianvs
ianvs -f examples/llm-agent/singletask_learning_bench/benchmarkingjob.yaml
```

### 6.2 Expected Output

A successful run produces a ranked results table:

```
+------+-----------+--------+--------+--------+--------------------+
| rank | algorithm | rouge1 | rouge2 | rougeL | paradigm |
+------+-----------+--------+--------+--------+--------------------+
| 1 | LLM_agent | 0.048 | 0.0 | 0.048 | singletasklearning |
+------+-----------+--------+--------+--------+--------------------+
```

> Low scores are expected with only 10 training samples. See Section 7 for how to improve them.

---

## 7. Improving ROUGE Scores

The low baseline scores have four fixable causes:

| Issue | Current Setting | Recommended Fix |
|---|---|---|
| Too few training samples | 10 items | Expand dataset to 100+ items |
| Too few training epochs | 2 epochs | Set `num_train_epochs` to 10+ |
| Weak LoRA config | `r=8, lora_alpha=1` | Set `r=16, lora_alpha=32` |
| Prompt not stripped from output | Full sequence returned by `predict()` | Slice off input tokens before decoding |
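
The stronger LoRA settings from the table map onto `peft` as in this sketch. The `target_modules` value is an assumption: BLOOM fuses its attention projections into a single `query_key_value` layer:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

MODEL_DIR = "ianvs/examples/llm-agent/pretrains/Langboat/bloom-1b4-zh"
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)

# r=16 / lora_alpha=32 per the recommendation above.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # LoRA weights should be a tiny fraction
```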

The most impactful single fix is stripping the input prompt from the model output before ROUGE scoring. Without this, the prediction string contains the full prompt plus the generated answer, inflating mismatches against the clean reference string.
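
A sketch of that fix, assuming the local model directory from Section 3.3. For causal LMs, `generate()` returns the prompt tokens followed by the new tokens, so the prompt length gives the slice point:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "ianvs/examples/llm-agent/pretrains/Langboat/bloom-1b4-zh"
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)

prompt = "What activity is being performed?"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)

# Slice off the prompt tokens so ROUGE only sees the generated answer.
answer_ids = output_ids[0][inputs["input_ids"].shape[1]:]
prediction = tokenizer.decode(answer_ids, skip_special_tokens=True)
print(prediction)
```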


## 8. Final Directory Layout

Key files after all changes are applied:

```
ianvs/
  examples/llm-agent/
    dataset/
      activity_classification.jsonl     ← custom dataset
    config/
      config.json                       ← created from scratch
      train_config.json                 ← created from scratch
    pretrains/Langboat/bloom-1b4-zh/    ← downloaded weights
    singletask_learning_bench/
      benchmarkingjob.yaml
      testalgorithms/
        algorithm.yaml                  ← paths fixed
        basemodel.py                    ← 6 code changes
      testenv/
        testenv.yaml                    ← keys + format fixed
        rouge.py                        ← EOF removed, scorer rewritten
```
**`examples/llm-agent/singletask_learning_bench/requirements.txt`** (new file, 6 additions)

```
torch
transformers
peft
datasets
evaluate
rouge_score
```