113 changes: 112 additions & 1 deletion examples/llm-edge-benchmark-suite/single_task_bench/README.md
@@ -1,2 +1,113 @@
Large Language Model Edge Benchmark Suite: Implementation on KubeEdge-lanvs
# llm-edge-benchmark-suite single_task_bench

This guide covers the complete setup, configuration, and execution process for running the Large Language Model (LLM) benchmarking suite on [Ianvs](https://github.com/kubeedge/ianvs), KubeEdge's distributed synergy AI benchmarking framework.

This specific environment is configured to run the **Single Task Learning** paradigm, evaluating LLM inference performance (latency, throughput, and Time-To-First-Token) using `llama-cpp-python` with quantized models (e.g., Qwen 1.5 0.5B GGUF).

---

## 📋 Prerequisites

Before running the benchmark, ensure you have the following ready:
1. **Ianvs Framework**: Installed and configured.
2. **Virtual Environment**: Your active Ianvs virtual environment (e.g., `ianvs_env`).
3. **C++ Build Tools**: Required for compiling `llama.cpp` bindings (e.g., `build-essential` on Ubuntu).

---

## 🛠️ Step 1: Environment Setup

First, activate your Ianvs virtual environment:
```bash
source /path/to/your/ianvs_env/bin/activate
```

Navigate to the benchmark directory:
```bash
cd /home/nishant/LOCAL_DISK_D/ianvs/examples/llm-edge-benchmark-suite/single_task_bench
```

Install the required dependencies using the provided `requirements.txt`:
```bash
pip install -r requirements.txt
```
*(Note: If `requirements.txt` is missing, ensure you install `llama-cpp-python>=0.2.20`, `torch`, `transformers`, `pyyaml`, and `psutil`).*
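
To confirm the environment before running anything, a quick import smoke test (package names as listed in the note above) can save a failed run later:

```python
# Verify that the core dependencies import cleanly in the active venv.
import llama_cpp
import torch
import transformers
import yaml    # provided by pyyaml
import psutil

print("llama-cpp-python:", llama_cpp.__version__)
print("torch:", torch.__version__)
```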

---

## 📥 Step 2: Download the Model

The benchmark requires a local `.gguf` model file. By default, this suite uses the `Qwen1.5-0.5B-Chat` model.

Create the target directory and download with resume support (`wget -c`), so an interrupted transfer can be resumed instead of leaving a truncated file:

```bash
mkdir -p /home/nishant/LOCAL_DISK_D/ianvs/models/qwen
wget -c -O /home/nishant/LOCAL_DISK_D/ianvs/models/qwen/qwen_1_5_0_5b.gguf https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat-GGUF/resolve/main/qwen1_5-0_5b-chat-q4_k_m.gguf
```
**Verification:** Ensure the downloaded file is approximately `398MB` to confirm it is not corrupted.
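
To script that check, a minimal sketch (same path and expected size as above):

```python
import os

# Path from the wget command in Step 2; adjust if you downloaded elsewhere.
MODEL_PATH = "/home/nishant/LOCAL_DISK_D/ianvs/models/qwen/qwen_1_5_0_5b.gguf"

size_mb = os.path.getsize(MODEL_PATH) / (1024 ** 2)
print(f"Model size: {size_mb:.1f} MB")  # a complete q4_k_m download is ~398 MB
if size_mb < 390:
    raise SystemExit("File looks truncated -- re-run 'wget -c' to resume the download.")
```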

---

## ⚙️ Step 3: Configuration Alignment

Ianvs requires strict configuration alignment. Double-check the following YAML files to ensure paths and paradigms are correct:

### 1. Test Environment (`testenv/testenv.yaml`)
Ensure all dataset paths are set to **absolute paths**. Relative paths will cause parsing errors.
```yaml
dataset:
  train_data: "/home/nishant/LOCAL_DISK_D/ianvs/dataset/data.jsonl"  # Must be absolute
```

### 2. Algorithm Configuration (`testalgorithms/algorithm.yaml`)
Ensure the `paradigm_type` is correctly set for standard single-task inference, and the model path is absolute:
```yaml
paradigm_type: singletasklearning  # Do NOT use 'singletasklearningwithcompression' here
modules:
  basemodel:
    model_path: "/home/nishant/LOCAL_DISK_D/ianvs/models/qwen/qwen_1_5_0_5b.gguf"
```
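
Because both files must agree on absolute paths, a small sanity check can catch mistakes before a run. This is an illustrative script, not part of the suite; it assumes the two YAML file names shown above and flags string values that look like relative paths (URLs may trigger false positives):

```python
import yaml  # provided by pyyaml

def warn_relative_paths(node, trail):
    """Recursively flag string values that contain '/' but are not absolute."""
    if isinstance(node, dict):
        for key, value in node.items():
            warn_relative_paths(value, f"{trail}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            warn_relative_paths(value, f"{trail}[{i}]")
    elif isinstance(node, str) and "/" in node and not node.startswith("/"):
        print(f"WARNING: possibly relative path at {trail}: {node}")

for cfg in ("testenv/testenv.yaml", "testalgorithms/algorithm.yaml"):
    with open(cfg) as f:
        warn_relative_paths(yaml.safe_load(f), cfg)
```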

---

## 🧩 Step 4: Algorithm Script (`basemodel.py`)

The Ianvs `SingleTaskLearning` paradigm requires the model class to strictly adhere to the machine learning lifecycle contract. Your `LlamaCppModel` class in `basemodel.py` must include:

1. **Pipeline Methods:** `preprocess` and `postprocess` with optional (defaulted) arguments, so Ianvs can invoke them with varying signatures without raising `TypeError`.
2. **Training Bypass:** A safe no-op `train` method, since inference runs on pre-trained weights.
3. **TTFT Measurement:** A `predict` method that calls the model with `stream=True` so that `prefill_latency` (Time-To-First-Token) can be measured accurately; see the condensed sketch after the snippet below.

*Example Snippet of required methods:*
```python
def preprocess(self, data=None, **kwargs):
    # Pass-through: prompts are already plain text
    return data

def postprocess(self, predict_output=None, **kwargs):
    # Pass-through: metric computation happens downstream in Ianvs
    return predict_output

def train(self, train_data, valid_data=None, **kwargs):
    # No-op: inference uses pre-trained GGUF weights; return the model
    # path so the Ianvs pipeline has an artifact to hand to inference
    return kwargs.get("model_path", "")
```
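
For requirement 3, here is a condensed sketch of the streaming measurement. It mirrors the `predict` logic shown in the diff further below; `measure_ttft` is an illustrative helper name, not part of the suite:

```python
import time
from llama_cpp import Llama

def measure_ttft(model: Llama, prompt: str, max_tokens: int = 32):
    """Stream tokens so the first chunk timestamps the end of the prefill phase."""
    start = time.time()
    text, prefill_ms, first = "", 0.0, True
    for chunk in model(prompt=prompt, max_tokens=max_tokens, stream=True):
        if first:
            prefill_ms = (time.time() - start) * 1000  # Time-To-First-Token, in ms
            first = False
        text += chunk["choices"][0].get("text", "")
    total_ms = (time.time() - start) * 1000
    return text, prefill_ms, total_ms
```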

---

## 🚀 Step 5: Execution

Once the setup and configurations are validated, run the benchmarking job from your terminal:

```bash
ianvs -f /home/nishant/LOCAL_DISK_D/ianvs/examples/llm-edge-benchmark-suite/single_task_bench/benchmarkingjob.yaml
```

### Expected Output
The Ianvs core will parse the configurations, load the `LlamaCppModel`, and execute the inference loop. Upon completion, a `workspace` directory will be generated containing the logs and a final leaderboard table (`rank.csv`).

You should see an output table similar to this:
```text
+------+-----------+---------+------------+-----------------+--------------------+---------------+
| rank | algorithm | latency | throughput | prefill_latency | paradigm           | basemodel     |
+------+-----------+---------+------------+-----------------+--------------------+---------------+
| 1    | llama-cpp | 171.29  | 0.0058     | 171.27          | singletasklearning | LlamaCppModel |
+------+-----------+---------+------------+-----------------+--------------------+---------------+
```
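
Since `rank.csv` is plain CSV, you can also inspect it programmatically. A sketch (the exact workspace layout depends on your `benchmarkingjob.yaml` settings, so the glob pattern below is an assumption):

```python
import glob
import pandas as pd

# Search the generated workspace for the leaderboard file.
matches = glob.glob("workspace/**/rank.csv", recursive=True)
if not matches:
    raise SystemExit("No rank.csv found -- check the workspace path in benchmarkingjob.yaml.")

df = pd.read_csv(matches[0])
print(df.sort_values("latency").to_string(index=False))
```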
12 changes: 12 additions & 0 deletions examples/llm-edge-benchmark-suite/single_task_bench/requirements.txt
@@ -0,0 +1,12 @@
# LLM Core Execution
llama-cpp-python>=0.2.20

# Machine Learning & Neural Network Basics
torch>=2.0.0
transformers>=4.35.0
numpy>=1.24.0

# Ianvs Utilities & Data Handling
pyyaml>=6.0
pandas>=2.0.0
requests>=2.31.0
basemodel.py
@@ -22,6 +22,7 @@ def __init__(self, **kwargs):
quantization_type = kwargs.get("quantization_type", None)
if quantization_type:
logging.info(f"Using quantization type: {quantization_type}")

# Init LLM model
self.model = Llama(
model_path=model_path,
@@ -35,22 +36,24 @@ def __init__(self, **kwargs):
embedding=kwargs.get("embedding", False),
)

# 1. FIXED: Optional arguments for Ianvs pipeline
def preprocess(self, data=None, **kwargs):
"""
Pass-through for text data.
"""
return data

def predict(self, data, input_shape=None, **kwargs):
data = data[:10]
process = psutil.Process(os.getpid())
start_time = time.time()

results = []
total_times = []
prefill_latencies = []
mem_usages = []

for prompt in data:
prompt_start_time = time.time()

f = io.StringIO()
with redirect_stderr(f):
output = self.model(
data = data[:10]
process = psutil.Process(os.getpid())

results = []

for prompt in data:
prompt_start_time = time.time()

# Run model with stream=True to measure exact TTFT
output_stream = self.model(
prompt=prompt,
max_tokens=kwargs.get("max_tokens", 32),
stop=kwargs.get("stop", ["Q:", "\n"]),
@@ -59,31 +62,44 @@ def predict(self, data, input_shape=None, **kwargs):
top_p=kwargs.get("top_p", 0.95),
top_k=kwargs.get("top_k", 40),
repeat_penalty=kwargs.get("repeat_penalty", 1.1),
stream=True  # Stream tokens so TTFT can be measured directly
)
stdout_output = f.getvalue()

# parse timing info
timings = self._parse_timings(stdout_output)
prefill_latency = timings.get('prompt_eval_time', 0.0) # ms
generated_text = output['choices'][0]['text']

prompt_end_time = time.time()
prompt_total_time = (prompt_end_time - prompt_start_time) * 1000 # convert to ms

result_with_time = {
"generated_text": generated_text,
"total_time": prompt_total_time,
"prefill_latency": prefill_latency,
"mem_usage":process.memory_info().rss,
}

results.append(result_with_time)

predict_dict = {
"results": results,
}

return predict_dict

generated_text = ""
prefill_latency = 0.0
first_token = True

# Iterate through the stream as the model generates it
for chunk in output_stream:
if first_token:
# Elapsed time to the first streamed chunk is the prefill latency (TTFT)
prefill_latency = (time.time() - prompt_start_time) * 1000
first_token = False

# Piece the text back together
if "text" in chunk["choices"][0]:
generated_text += chunk["choices"][0]["text"]

prompt_end_time = time.time()
prompt_total_time = (prompt_end_time - prompt_start_time) * 1000 # convert to ms

result_with_time = {
"generated_text": generated_text,
"total_time": prompt_total_time,
"prefill_latency": prefill_latency,
"mem_usage": process.memory_info().rss,
}

results.append(result_with_time)

return {"results": results}

# 2. FIXED: Optional arguments for Ianvs pipeline
def postprocess(self, predict_output=None, **kwargs):
"""
Pass-through for prediction output.
"""
return predict_output

def _parse_timings(self, stdout_output):
import re
@@ -131,5 +147,11 @@ def save(self, model_path):
def load(self, model_url):
pass

# 3. FIXED: Safe no-op for training pre-trained models
def train(self, train_data, valid_data=None, **kwargs):
return
"""
Dummy train method.
Returns the model path to satisfy Ianvs pipeline requirements.
"""
logging.info("Training step bypassed: Using pre-trained weights for LLM inference.")
return kwargs.get("model_path", "")