**`examples/llm-agent/singletask_learning_bench/README.md`** (180 additions, 70 deletions)
# Ianvs LLM-Agent Benchmark
### Setup & Reproduction Guide
> KubeEdge Ianvs · singletask_learning_bench · bloom-1b4-zh

---

## 1. Overview

This guide documents all the changes required to get the Ianvs LLM-Agent benchmark pipeline running end-to-end with a locally hosted 1.4B-parameter BLOOM model. It covers environment setup, dataset creation, path fixes, configuration file creation, Python code modifications, and metric evaluation.

By the end of this guide, the pipeline will:

- Load the `bloom-1b4-zh` model weights from local disk
- Parse a custom JSONL dataset via `JsonlDataParse`
- Fine-tune the model using LoRA via the HuggingFace Trainer
- Evaluate predictions using ROUGE-1, ROUGE-2, and ROUGE-L metrics
- Output a ranked results table via the Ianvs benchmarking framework

---

## 2. Prerequisites

### 2.1 System Requirements

- Ubuntu 20.04 or later (tested on Ubuntu on a Dell G15 laptop)
- Python 3.10 or 3.12
- At least 8 GB RAM (16 GB recommended for the 1.4B model)
- At least 10 GB free disk space for model weights
- Internet access for initial pip installs (model weights downloaded separately)

### 2.2 Base Ianvs Installation

Ensure Ianvs is cloned and its core package is installed in your virtual environment before proceeding:

```bash
git clone https://github.com/kubeedge/ianvs.git
cd ianvs
pip install -r requirements.txt
pip install .
```

---

## 3. Environment Setup

### 3.1 Create and Activate Virtual Environment

```bash
python3 -m venv ianvs_env
source ianvs_env/bin/activate
```

### 3.2 Install Required Libraries

Install all machine learning dependencies into the virtual environment. These are **not** included in the base Ianvs requirements:

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install transformers peft datasets evaluate rouge_score
```

| Library | Purpose |
|---|---|
| `torch` | PyTorch backend for model training and inference |
| `transformers` | HuggingFace model loading, tokenizer, Trainer API |
| `peft` | LoRA-based parameter-efficient fine-tuning |
| `datasets` | In-memory dataset construction for the Trainer |
| `evaluate` | Metric loading framework |
| `rouge_score` | Direct ROUGE scoring without remote downloads |
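
As a quick sanity check that the scoring stack works offline, `rouge_score` can be called directly. This is a minimal sketch; the two strings are purely illustrative:

```python
from rouge_score import rouge_scorer

# Compute the same three metrics the benchmark reports.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score("repetitive arm swing", "arm swing while walking")
print({name: round(s.fmeasure, 3) for name, s in scores.items()})
```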

### 3.3 Download Model Weights Locally

The `Langboat/bloom-1b4-zh` model must be downloaded to local disk ahead of time: the HuggingFace Hub's on-the-fly downloader can crash when handed relative paths, and pointing the pipeline at a pre-downloaded local copy avoids the issue entirely.

```bash
pip install huggingface_hub
huggingface-cli download Langboat/bloom-1b4-zh \
--local-dir ianvs/examples/llm-agent/pretrains/Langboat/bloom-1b4-zh
```

Expected files after download:

- `config.json`
- `tokenizer.json` / `tokenizer_config.json` / `special_tokens_map.json`
- `pytorch_model.bin` (or sharded safetensors files)
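
A minimal sketch to verify the download before wiring it into the pipeline; the path is assumed to match the `--local-dir` used above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "ianvs/examples/llm-agent/pretrains/Langboat/bloom-1b4-zh"

# Loading from a local directory never touches the Hub downloader.
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)
print(f"{model.num_parameters() / 1e9:.1f}B parameters")  # expect ~1.4B
```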

---

## 4. Dataset Setup

### 4.1 Background

The original KubeEdge dataset URL referenced in the example returns a 404 error. A custom 10-item dataset must be created manually to satisfy the pipeline.

### 4.2 Create the Dataset File

Create the file at:

```
ianvs/examples/llm-agent/dataset/activity_classification.jsonl
```

Each line must be a valid JSON object with exactly the keys `question` and `answer`. The `JsonlDataParse` class inside Ianvs hardcodes these key names:

```jsonl
{"question": "What activity is being performed?", "answer": "Walking"}
{"question": "Describe the motion pattern.", "answer": "Repetitive arm swing"}
```

Add at least 8 more lines following the same format.

> **Important:** The file must use `.jsonl` format (one JSON object per line), not `.json`.
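
Since `JsonlDataParse` hardcodes the key names, a short validation script (a sketch, assuming the dataset path above) catches malformed lines before a training run does:

```python
import json

PATH = "ianvs/examples/llm-agent/dataset/activity_classification.jsonl"

with open(PATH, encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        record = json.loads(line)  # raises ValueError on a non-JSON line
        assert set(record) == {"question", "answer"}, f"unexpected keys on line {lineno}"
print("dataset OK")
```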

---

## 5. Configuration Files

### 5.1 testenv.yaml

**Location:** `examples/llm-agent/singletask_learning_bench/testenv/testenv.yaml`

| Field | Old Value | New Value |
|---|---|---|
| Dataset keys | `train_index` / `test_index` | `train_data` / `test_data` |
| Dataset file | `dummy.txt` | `activity_classification.jsonl` |
| Dataset format | `TxtDataParse` | `JsonlDataParse` |
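
Putting those rows together, the dataset block might look like the sketch below. Only the two key names come from the table above; the surrounding structure and the reuse of a single file for both splits are assumptions:

```yaml
testenv:
  dataset:
    # both splits point at the Section 4.2 dataset for this small run
    train_data: "ianvs/examples/llm-agent/dataset/activity_classification.jsonl"
    test_data: "ianvs/examples/llm-agent/dataset/activity_classification.jsonl"
```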

### 5.2 algorithm.yaml

**Location:** `examples/llm-agent/singletask_learning_bench/testalgorithms/algorithm.yaml`

- Replace all relative `./examples/LLM-Agent-Benchmark/` paths with absolute paths
- Fix incorrect capitalisation in folder names (`LLM-Agent-Benchmark` → `llm-agent`)
- Remove any invisible trailing whitespace at the end of the `train_config` path; it causes a silent crash (a quick check script follows below)
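
Trailing whitespace is hard to spot by eye; a throwaway check such as this sketch (the path is assumed from the location above) flags it:

```python
# Print any line of algorithm.yaml that ends in invisible whitespace.
path = "examples/llm-agent/singletask_learning_bench/testalgorithms/algorithm.yaml"

with open(path, encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        text = line.rstrip("\n")
        if text != text.rstrip():
            print(f"line {lineno}: trailing whitespace -> {text!r}")
```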

### 5.3 config.json (create from scratch)

**Location:** `examples/llm-agent/config/config.json`

Create this file with the following fields. The paths below are written relative to the `ianvs` checkout for brevity; use absolute paths on your machine:

```json
{
"tokenizer_dir": "ianvs/examples/llm-agent/pretrains/Langboat/bloom-1b4-zh",
"data_dir": "ianvs/dataset/",
"output_dir": "ianvs/examples/llm-agent/output/",
"auth_token": "",
"token_factor": 1,
"device": "cpu",
"trust_remote": true
}
```

### 5.4 train_config.json (create from scratch)

**Location:** `examples/llm-agent/config/train_config.json`

- Use `true` (JSON boolean), not `"True"` (string); a string value silently breaks the Trainer
- Set `output_dir` to an absolute local path
- Include `"half_lora": true` to reduce memory usage on CPU

---
## 6. Running the Benchmark

### 6.1 Full Command

```bash
source ianvs_env/bin/activate
cd ianvs
ianvs -f examples/llm-agent/singletask_learning_bench/benchmarkingjob.yaml
```

### 6.2 Expected Output

A successful run produces a ranked results table:

```
+------+-----------+--------+--------+--------+--------------------+
| rank | algorithm | rouge1 | rouge2 | rougeL | paradigm |
+------+-----------+--------+--------+--------+--------------------+
| 1 | LLM_agent | 0.048 | 0.0 | 0.048 | singletasklearning |
+------+-----------+--------+--------+--------+--------------------+
```

> Low scores are expected with only 10 training samples. See Section 7 for how to improve them.

---

## 7. Improving ROUGE Scores

The low baseline scores have four fixable causes:

| Issue | Current Setting | Recommended Fix |
|---|---|---|
| Too few training samples | 10 items | Expand dataset to 100+ items |
| Too few training epochs | 2 epochs | Set `num_train_epochs` to 10+ |
| Weak LoRA config | `r=8, lora_alpha=1` | Set `r=16, lora_alpha=32` |
| Prompt not stripped from output | Full sequence returned by `predict()` | Slice off input tokens before decoding |
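
The stronger LoRA settings from the table map onto `peft` as in this sketch. The `target_modules` value is an assumption: BLOOM fuses its attention projections into a single `query_key_value` layer:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

MODEL_DIR = "ianvs/examples/llm-agent/pretrains/Langboat/bloom-1b4-zh"
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)

# r=16 / lora_alpha=32 per the recommendation above.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # LoRA weights should be a tiny fraction
```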

The most impactful single fix is stripping the input prompt from the model output before ROUGE scoring. Without this, the prediction string contains the full prompt plus the generated answer, inflating mismatches against the clean reference string.
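
A sketch of that fix, assuming the local model directory from Section 3.3. For causal LMs, `generate()` returns the prompt tokens followed by the new tokens, so the prompt length gives the slice point:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "ianvs/examples/llm-agent/pretrains/Langboat/bloom-1b4-zh"
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)

prompt = "What activity is being performed?"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)

# Slice off the prompt tokens so ROUGE only sees the generated answer.
answer_ids = output_ids[0][inputs["input_ids"].shape[1]:]
prediction = tokenizer.decode(answer_ids, skip_special_tokens=True)
print(prediction)
```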


## 8. Final Directory Layout

Key files after all changes are applied:

```
ianvs/
  examples/llm-agent/
    dataset/
      activity_classification.jsonl     ← custom dataset
    config/
      config.json                       ← created from scratch
      train_config.json                 ← created from scratch
    pretrains/Langboat/bloom-1b4-zh/    ← downloaded weights
    singletask_learning_bench/
      benchmarkingjob.yaml
      testalgorithms/
        algorithm.yaml                  ← paths fixed
        basemodel.py                    ← 6 code changes
      testenv/
        testenv.yaml                    ← keys + format fixed
        rouge.py                        ← EOF removed, scorer rewritten
```
**`examples/llm-agent/singletask_learning_bench/requirements.txt`** (new file, 6 additions)

```
torch
transformers
peft
datasets
evaluate
rouge_score
```