
Commit 089f6f1 (1 parent: 284c2f6)

dataset instructions changed from hugging face to kaggle

changes in readme of cloud-edge-collaborative-inference done to use kaggle instead of huggingface

Signed-off-by: Aryan Nanda <nandaaryan823@gmail.com>
Signed-off-by: Aryan <nandaaryan823@gmail.com>

3 files changed: 43 additions & 39 deletions

examples/cloud-edge-collaborative-inference-for-llm/Dockerfile (16 additions, 12 deletions)

```diff
@@ -9,8 +9,11 @@ RUN apt-get update && apt-get install -y \
     curl \
     gnupg \
     git \
-    git-lfs && \
-    git lfs install
+    unzip
+
+# Copy kaggle.json (Make sure this file is in the same directory as your Dockerfile)
+COPY kaggle.json /root/.kaggle/kaggle.json
+RUN chmod 600 /root/.kaggle/kaggle.json
 
 # Clone Ianvs repo
 RUN git clone https://github.com/kubeedge/ianvs.git
@@ -26,16 +29,17 @@ RUN /bin/bash -c "source activate $CONDA_ENV && \
     pip install -r examples/cloud-edge-collaborative-inference-for-llm/requirements.txt && \
     python setup.py install"
 
-# Download and move dataset (still run inside /ianvs)
+# Download Kaggle CLI
+RUN pip install kaggle
+
+# Download dataset
 RUN cd /ianvs && \
-    git clone https://huggingface.co/datasets/FuryMartin/Ianvs-MMLU-5-shot && \
-    git lfs install && \
-    cd Ianvs-MMLU-5-shot && \
-    git lfs pull && \
-    mkdir -p /ianvs/dataset && \
-    mv mmlu-5-shot /ianvs/dataset/ && \
-    mv workspace-mmlu /ianvs/ && \
-    rm -rf Ianvs-MMLU-5-shot  # Optional cleanup
+    kaggle datasets download -d kubeedgeianvs/ianvs-mmlu-5shot && \
+    kaggle datasets download -d kubeedgeianvs/ianvs-gpqa-diamond && \
+    unzip -o ianvs-mmlu-5shot.zip && \
+    unzip -o ianvs-gpqa-diamond.zip && \
+    rm -rf ianvs-mmlu-5shot.zip && \
+    rm -rf ianvs-gpqa-diamond.zip
 
 # Set final working directory
-WORKDIR /ianvs
+WORKDIR /ianvs
```
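The `COPY`/`chmod 600` pair in the Dockerfile above exists because the Kaggle CLI expects the token file to be readable only by its owner. A minimal sketch of the same setup outside Docker, using a throwaway directory and a placeholder token (the username/key values are illustrative, not real credentials):

```shell
set -euo pipefail

# Throwaway config dir standing in for /root/.kaggle in the image.
tmp="$(mktemp -d)"
mkdir -p "$tmp/.kaggle"

# Placeholder token; a real kaggle.json holds your Kaggle username and API key.
printf '{"username":"placeholder","key":"0000"}\n' > "$tmp/.kaggle/kaggle.json"

# Restrict the token to owner read/write, as the Dockerfile does.
chmod 600 "$tmp/.kaggle/kaggle.json"
stat -c '%a' "$tmp/.kaggle/kaggle.json"   # prints 600 on Linux

rm -rf "$tmp"
```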

examples/cloud-edge-collaborative-inference-for-llm/README.md (26 additions, 26 deletions)

````diff
@@ -99,35 +99,47 @@ Before using this example, you need to have the device ready:
 
 The Docker-based setup assumes you have Docker installed on your system and are using an Ubuntu-based Linux distribution.
 
-*If you don't have Docker installed, follow the Docker Engine installation guide [here](https://docs.docker.com/engine/install/ubuntu/).*
+**Note**:
+- If you don't have Docker installed, follow the Docker Engine installation guide [here](https://docs.docker.com/engine/install/ubuntu/).
+- To download datasets from Kaggle inside your Docker container, you need to configure the Kaggle CLI authentication token. Follow the [official Kaggle API documentation](https://www.kaggle.com/docs/api#:~:text=is%20%24PYTHON_HOME/Scripts.-,Authentication,-In%20order%20to) to download your `kaggle.json` token. Once downloaded, move the file into the `~/ianvs/examples/cloud-edge-collaborative-inference-for-llm/` directory after completing step 1 (cloning the Ianvs repo):
 
-1. From the root directory of Ianvs, build the `cloud-edge-collaborative-inference-for-llm` Docker image:
+   ```bash
+   mv /path/to/kaggle.json ~/ianvs/examples/cloud-edge-collaborative-inference-for-llm/
+   ```
+
+1. Clone the Ianvs repo:
+   ```bash
+   git clone https://github.com/kubeedge/ianvs.git
+   cd ianvs
+   ```
+
+2. From the root directory of Ianvs, build the `cloud-edge-collaborative-inference-for-llm` Docker image:
 
    **Note**: If you have already built the image, move on to the next step directly.
 
   ```bash
   docker build -t ianvs-experiment-image ./examples/cloud-edge-collaborative-inference-for-llm/
   ```
 
-2. Run the image in an interactive shell:
+3. Run the image in an interactive shell:
   ```bash
   docker run -it ianvs-experiment-image /bin/bash
   ```
 
-3. Activate the ianvs-experiment Conda environment:
+4. Activate the ianvs-experiment Conda environment:
   ```bash
   conda activate ianvs-experiment
   ```
 
-4. Set the required environment variables for the API (use either OpenAI or GROQ credentials):
+5. Set the required environment variables for the API (use either OpenAI or GROQ credentials):
   ```bash
   export OPENAI_BASE_URL="https://api.openai.com/v1"
   export OPENAI_API_KEY=sk_xxxxxxxx
   ```
 
 `Alternatively, for GROQ, use GROQ_BASE_URL and GROQ_API_KEY.`
 
-5. Run the Ianvs benchmark:
+6. Run the Ianvs benchmark:
   ```bash
   ianvs -f examples/cloud-edge-collaborative-inference-for-llm/benchmarkingjob.yaml
   ```
````
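The credential step in the list above only takes effect in the shell that later runs `ianvs`, so a small guard before launching the benchmark saves a run that would fail partway through. A sketch, with placeholder values standing in for real keys:

```shell
set -u

# Placeholder credentials standing in for the real values from the export step.
export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_API_KEY="sk_xxxxxxxx"

# Fail fast with a clear message if either variable is missing or empty.
for var in OPENAI_BASE_URL OPENAI_API_KEY; do
  if [ -z "${!var:-}" ]; then
    echo "error: $var is not set" >&2
    exit 1
  fi
done
echo "API credentials present"
```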
````diff
@@ -171,23 +183,14 @@ If you want to use speculative decoding models like [EAGLE](https://github.com/S
 
 ##### Dataset Configuration
 
-Here, we provide the `MMLU-5-shot` dataset and the `GPQA-diamond` dataset for testing. The following are the preparation instructions for `MMLU-5-shot`; `GPQA-diamond` follows the same process.
-
-1. Download `mmlu-5-shot` from [Ianvs-MMLU-5-shot](https://huggingface.co/datasets/FuryMartin/Ianvs-MMLU-5-shot) (or [Ianvs-GPQA-diamond](https://huggingface.co/datasets/FuryMartin/Ianvs-GPQA-diamond)), which is a transformed MMLU-5-shot dataset formatted to fit Ianvs's requirements.
-
-   ```bash
-   git clone https://huggingface.co/datasets/FuryMartin/Ianvs-MMLU-5-shot
-   git lfs install
-   cd Ianvs-MMLU-5-shot
-   git lfs pull
-   cd ..
-   ```
-
-2. Create a `dataset` folder in the root directory of Ianvs and move `mmlu-5-shot` into the `dataset` folder.
+Here, we provide the `MMLU-5-shot` dataset and the `GPQA-diamond` dataset for testing. The following are the preparation instructions for `MMLU-5-shot`; `GPQA-diamond` follows the same process.
 
+1. Download `mmlu-5-shot` into the root directory of Ianvs from [Ianvs-MMLU-5-shot](https://www.kaggle.com/datasets/kubeedgeianvs/ianvs-mmlu-5shot), which is a transformed MMLU-5-shot dataset formatted to fit Ianvs's requirements.
+   **Note**: To download datasets from Kaggle, you need to configure the Kaggle CLI authentication token. Follow the [official Kaggle API documentation](https://www.kaggle.com/docs/api#:~:text=is%20%24PYTHON_HOME/Scripts.-,Authentication,-In%20order%20to) to download your `kaggle.json` token.
   ```bash
-  mkdir dataset
-  mv Ianvs-MMLU-5-shot/mmlu-5-shot/ dataset/
+  kaggle datasets download -d kubeedgeianvs/ianvs-mmlu-5shot
+  unzip -o ianvs-mmlu-5shot.zip
+  rm -rf ianvs-mmlu-5shot.zip
  ```
 
 3. Then, check the path of `train_data` and `test_data` in
````
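The final step in the hunk above asks you to check the `train_data` and `test_data` paths. A quick sanity check can list each configured path and flag any that do not exist on disk; the `testenv.yaml` fragment below is illustrative, not the example's actual config file:

```shell
set -euo pipefail
tmp="$(mktemp -d)"

# Illustrative fragment; the real file lives under the example's testenv directory.
cat > "$tmp/testenv.yaml" <<'EOF'
testenv:
  dataset:
    train_data: "/ianvs/dataset/mmlu-5-shot/train_data/data.jsonl"
    test_data: "/ianvs/dataset/mmlu-5-shot/test_data/data.jsonl"
EOF

# Print each configured path with OK/MISSING depending on whether it exists.
grep -E 'train_data|test_data' "$tmp/testenv.yaml" \
  | awk -F'"' '{print $2}' \
  | while read -r p; do
      [ -e "$p" ] && echo "OK      $p" || echo "MISSING $p"
    done

rm -rf "$tmp"
```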
````diff
@@ -342,11 +345,8 @@ The testing process may take much time, depending on the number of test cases an
 
 To enable you to get the results directly, here we provide a workspace folder with cached results of `Qwen/Qwen2.5-1.5B-Instruct`, `Qwen/Qwen2.5-3B-Instruct`, `Qwen/Qwen2.5-7B-Instruct` and `gpt-4o-mini`.
 
-You can download the `workspace-mmlu` folder from [Ianvs-MMLU-5-shot](https://huggingface.co/datasets/FuryMartin/Ianvs-MMLU-5-shot) and put it under your `ianvs` folder.
-
-```bash
-mv Ianvs-MMLU-5-shot/workspace-mmlu/ .
-```
+You can download the `workspace-mmlu` folder from [Ianvs-MMLU-5-shot](https://www.kaggle.com/datasets/kubeedgeianvs/ianvs-mmlu-5shot) and put it under your `ianvs` folder.
+- Since the `Ianvs-MMLU-5-shot` dataset was already downloaded during dataset configuration, there is no need to do this again.
 
 ##### Run Joint Inference example
 
````

examples/cloud-edge-collaborative-inference-for-llm/requirements.txt (1 addition, 1 deletion)

```diff
@@ -3,5 +3,5 @@ transformers
 openai
 accelerate
 datamodel_code_generator
-git-lfs
+kaggle
 groq
```
