
[vllm] Repair stale Meta Llama GCS caches flattened into Mistral layout #4356

@ahmeda14960

Description


Describe the bug
Some historically cached meta-llama model prefixes in GCS were downloaded with a flattened layout: files that should live under original/ were written at the model root. When a cached Llama prefix contains a root params.json and root consolidated*.pth files, native TPU vLLM parses the checkpoint as a Mistral-format export and resolves MistralForCausalLM instead of LlamaForCausalLM.

To Reproduce

  1. Inspect any broken prefix below and confirm it has root params.json and root consolidated*.pth, with no original/params.json.
  2. Launch native vllm serve against that GCS prefix on TPU.
  3. Check startup logs for Resolved architecture: MistralForCausalLM.
  4. Compare with a repaired prefix, where params.json and consolidated*.pth live under original/ and vLLM resolves LlamaForCausalLM.
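The layout check in steps 1 and 4 can be sketched as a small classifier over an object listing. This is a minimal sketch; the function name and the listings are illustrative, not part of the Marin codebase:

```python
import fnmatch


def classify_llama_prefix(object_names: list[str]) -> str:
    """Classify a cached Meta Llama prefix as 'broken' (flattened),
    'repaired' (native files under original/), or 'unknown'.

    object_names are object paths relative to the model prefix root,
    e.g. ["config.json", "original/params.json", ...].
    """
    root_params = "params.json" in object_names
    # fnmatch matches the whole string, so "original/consolidated.00.pth"
    # does NOT match "consolidated*.pth" -- only root-level files do.
    root_consolidated = any(
        fnmatch.fnmatch(name, "consolidated*.pth") for name in object_names
    )
    nested_params = "original/params.json" in object_names

    # Flattened layout: native-checkpoint files at the root, no original/.
    if (root_params or root_consolidated) and not nested_params:
        return "broken"
    if nested_params:
        return "repaired"
    return "unknown"
```

For example, a listing of ["params.json", "consolidated.00.pth", "config.json"] classifies as broken, while one containing "original/params.json" classifies as repaired.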

Expected behavior
Meta Llama caches should preserve original/params.json and original/consolidated*.pth, and native TPU vLLM should resolve LlamaForCausalLM.

Additional context
Current download code in lib/marin/src/marin/datakit/download/huggingface.py:99 preserves nested HF paths, and experiments/models.py:35 writes stable model prefixes. The bug is therefore not in the current code; it affects stale historical caches that were already written under those stable prefixes.
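The difference between the current path-preserving behavior and the historical flattening can be sketched as follows. Both function names and signatures are illustrative, not the actual Marin API:

```python
import os


def cache_object_name(model_prefix: str, repo_path: str) -> str:
    """Hypothetical sketch of the current nested-path-preserving mapping:
    the repo-relative path, e.g. 'original/params.json', is appended
    verbatim under the stable model prefix."""
    return model_prefix.rstrip("/") + "/" + repo_path


def flattened_object_name(model_prefix: str, repo_path: str) -> str:
    """The buggy historical behavior: only the basename survives, so
    'original/params.json' lands at the prefix root as 'params.json'."""
    return model_prefix.rstrip("/") + "/" + os.path.basename(repo_path)
```

With a stable prefix, both mappings write into the same location across runs, which is why the old flattened objects are still sitting next to (or in place of) the corrected layout.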

Why delete and re-download is required:

  • re-running the current downloader into the same prefix does not remove stale root params.json or stale root consolidated*.pth
  • the stable output path means malformed objects remain mixed with the corrected layout unless the prefix is cleaned first
  • deleting the whole prefix is safer than partial cleanup and also clears executor bookkeeping such as .executor_status
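The first two points above reduce to simple set arithmetic over object listings. A minimal sketch with illustrative listings (not real bucket contents):

```python
def stale_after_redownload(existing: set[str], fresh: set[str]) -> set[str]:
    """Objects that would remain after re-running the downloader into the
    same prefix without cleaning first: everything already present that
    the corrected download does not overwrite."""
    return existing - fresh


# Illustrative listings: a flattened cache vs. a correct fresh download.
existing = {"config.json", "params.json", "consolidated.00.pth", ".executor_status"}
fresh = {"config.json", "original/params.json", "original/consolidated.00.pth"}

# Re-download alone leaves the malformed root files and stale executor
# bookkeeping behind, which is why the whole prefix must be deleted first.
leftover = stale_after_redownload(existing, fresh)
```

Here leftover still contains the root params.json, the root consolidated*.pth file, and .executor_status, so the prefix would remain ambiguous to vLLM's architecture resolution.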

Affected broken prefixes found in the accessible Marin buckets:

  • gs://marin-us-central1/models/meta-llama--Llama-3-1-8B--d04e592/
  • gs://marin-us-central1/models/meta-llama--Llama-3-1-8B--main/
  • gs://marin-us-central1/models/meta-llama--Llama-3-1-8B-Instruct--0e9e39f/
  • gs://marin-us-central1/models/meta-llama--Llama-3-2-1B--4e20de3/
  • gs://marin-us-central1/models/meta-llama--Llama-3-2-1B--main/
  • gs://marin-us-central1/models/meta-llama--Llama-3-3-70B-Instruct--6f6073b/
  • gs://marin-us-central2/models/meta-llama--Llama-3-1-8B-Instruct--0e9e39f/
  • gs://marin-us-central2/models/meta-llama--Llama-3-3-70B-Instruct--6f6073b/
  • gs://marin-us-east1/models/meta-llama--Llama-3-1-8B-Instruct--0e9e39f/
  • gs://marin-us-east5/models/meta-llama--Llama-3-2-1B--main/

Already repaired / clean:

  • gs://marin-us-east5/models/meta-llama--Llama-3-1-8B-Instruct--0e9e39f/
  • gs://marin-us-east5/models/meta-llama--Llama-3-3-70B-Instruct--6f6073b/

No meta-llama model prefixes were found in gs://marin-eu-west4/models/ or gs://marin-us-west4/models/.

Validated repair: after deleting and re-downloading gs://marin-us-east5/models/meta-llama--Llama-3-1-8B-Instruct--0e9e39f/, native TPU vLLM resolved LlamaForCausalLM and a 100-request smoke test succeeded.


Labels: agent-generated, bug
