51 changes: 38 additions & 13 deletions docs/user_interface/how-to-config-testenv.md
@@ -13,29 +13,60 @@ For example:

```yaml
testenv:
  # dataset configuration
  dataset:
    ...
  # metrics configuration for test case's evaluation; list type;
  metrics:
    ...
```

### The configuration of dataset

| Property | Required | Description |
|----------|----------|-------------|
|train_url|yes|The URL of the train dataset index; Type: string|
|test_url|yes|The URL of the test dataset index; Type: string|

#### Supported Data Formats

The dataset index files can be provided in any of the following formats: **TXT**, **CSV**, **JSON**, or **JSONL**.

Prepare the data files as follows, depending on the format:

##### 1. TXT Format
In TXT format, each line holds a single data record or a path to a data file, optionally followed by its label, separated by a space.
```txt
/path/to/image1.jpg dog
/path/to/image2.jpg cat
```
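
As an illustration of the TXT layout above, the following is a minimal sketch of how such an index could be parsed. The function name `parse_txt_index` is hypothetical and is not part of Ianvs; the sketch also assumes that the data field itself contains no spaces.

```python
from typing import List, Optional, Tuple


def parse_txt_index(text: str) -> List[Tuple[str, Optional[str]]]:
    """Parse lines of the form '<data or path> [label]' (hypothetical helper)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace; the label is optional.
        parts = line.split(maxsplit=1)
        label = parts[1] if len(parts) == 2 else None
        records.append((parts[0], label))
    return records


sample = (
    "/path/to/image1.jpg dog\n"
    "/path/to/image2.jpg cat\n"
    "/path/to/image3.jpg\n"
)
records = parse_txt_index(sample)
```

A line without a label yields `None` in the second position, which matches the "optionally followed by" wording of the format description.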

##### 2. CSV Format
In CSV format, the file contains comma-separated values. It usually starts with a header row, where one column holds the data (or the path to the data) and another holds the label.
```csv
image_path,label
/path/to/image1.jpg,dog
/path/to/image2.jpg,cat
```
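
The CSV layout above can be read with Python's standard `csv` module. This is only a sketch: the helper name `parse_csv_index` is hypothetical, and the column names `image_path` and `label` are taken from the example rather than from any fixed Ianvs schema.

```python
import csv
import io


def parse_csv_index(text, data_column="image_path", label_column="label"):
    """Read a CSV index with a header row into (data, label) pairs.

    Hypothetical helper; the column names default to those in the example.
    """
    reader = csv.DictReader(io.StringIO(text))
    # row.get() returns None if the label column is absent from the header.
    return [(row[data_column], row.get(label_column)) for row in reader]


sample = (
    "image_path,label\n"
    "/path/to/image1.jpg,dog\n"
    "/path/to/image2.jpg,cat\n"
)
rows = parse_csv_index(sample)
```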

##### 3. JSON / JSONL Format
A JSON file contains a single array of objects. A JSON Lines (JSONL) file contains one valid JSON object per line.
```json
[
{"image": "/path/to/image1.jpg", "label": "dog"},
{"image": "/path/to/image2.jpg", "label": "cat"}
Comment on lines +54 to +55
medium

For consistency with the CSV example which uses image_path, it would be clearer to also use image_path here as the key for the path to the data.

Suggested change
{"image": "/path/to/image1.jpg", "label": "dog"},
{"image": "/path/to/image2.jpg", "label": "cat"}
{"image_path": "/path/to/image1.jpg", "label": "dog"},
{"image_path": "/path/to/image2.jpg", "label": "cat"}

]
```

Or JSONL:
```json
{"image": "/path/to/image1.jpg", "label": "dog"}
{"image": "/path/to/image2.jpg", "label": "cat"}
Comment on lines +61 to +62
medium

For consistency with the CSV example, please use image_path here as well instead of image.

Suggested change
{"image": "/path/to/image1.jpg", "label": "dog"}
{"image": "/path/to/image2.jpg", "label": "cat"}
{"image_path": "/path/to/image1.jpg", "label": "dog"}
{"image_path": "/path/to/image2.jpg", "label": "cat"}

```
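
The difference between the two layouts can be sketched with the standard `json` module: a JSON file is parsed as one document, while a JSONL file is parsed line by line. The helper name `parse_json_index` is hypothetical, and the `image` key simply mirrors the example above.

```python
import json


def parse_json_index(text: str, jsonl: bool = False) -> list:
    """Return a list of record dicts from JSON or JSONL text (hypothetical helper)."""
    if jsonl:
        # JSONL: every non-blank line is an independent JSON document.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of objects")
    return data


json_text = """[
  {"image": "/path/to/image1.jpg", "label": "dog"},
  {"image": "/path/to/image2.jpg", "label": "cat"}
]"""
jsonl_text = (
    '{"image": "/path/to/image1.jpg", "label": "dog"}\n'
    '{"image": "/path/to/image2.jpg", "label": "cat"}\n'
)

from_json = parse_json_index(json_text)
from_jsonl = parse_json_index(jsonl_text, jsonl=True)
```

Both calls produce the same list of records, so downstream code does not need to care which of the two layouts was used.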

For example:

```yaml
# dataset configuration
dataset:
  # the url address of train dataset index; string type;
  train_index: "./dataset/mmlu-5-shot/train_data/data.json"
  # the url address of test dataset index; string type;
  test_index: "./dataset/mmlu-5-shot/test_data/metadata.json"
```
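
Before running a benchmark, it can be useful to check that the configured index paths use one of the supported formats. The sketch below infers the format from the file extension; the helper `detect_format` and the extension mapping are assumptions made for illustration, not the way Ianvs itself resolves formats.

```python
from pathlib import Path

# Assumed mapping from file extension to the formats listed in this document.
SUPPORTED_FORMATS = {".txt": "txt", ".csv": "csv", ".json": "json", ".jsonl": "jsonl"}


def detect_format(index_path: str) -> str:
    """Infer the data format from the file extension (hypothetical helper)."""
    suffix = Path(index_path).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported dataset format: {suffix or '(none)'}")
    return SUPPORTED_FORMATS[suffix]


# The dataset section from the example above, written as a plain dict.
dataset = {
    "train_index": "./dataset/mmlu-5-shot/train_data/data.json",
    "test_index": "./dataset/mmlu-5-shot/test_data/metadata.json",
}
formats = {key: detect_format(path) for key, path in dataset.items()}
```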

@@ -65,18 +96,12 @@ You can select multiple metrics in `examples/cloud-edge-collaborative-inference-
```yaml
# testenv.yaml
testenv:
  # dataset configuration
  dataset:
    # the url address of train dataset index; string type;
    train_data: "./dataset/mmlu-5-shot/train_data/data.json"
    # the url address of test dataset index; string type;
    test_data_info: "./dataset/mmlu-5-shot/test_data/metadata.json"

  # metrics configuration for test case's evaluation; list type;
  metrics:
    # metric name; string type;
    - name: "Accuracy"
      # the url address of python file
      url: "./examples/cloud-edge-collaborative-inference-for-llm/testenv/accuracy.py"

    - name: "Edge Ratio"