🤖
**Describe the bug**
Some historical cached meta-llama model prefixes in GCS were downloaded with a flattened layout: files that should have lived under `original/` were written at the model root. When a cached Llama prefix contains a root `params.json` and root `consolidated*.pth`, native TPU vLLM can parse the checkpoint as Mistral and resolve `MistralForCausalLM` instead of `LlamaForCausalLM`.
**To Reproduce**
- Inspect any broken prefix below and confirm it has root `params.json` and root `consolidated*.pth`, with no `original/params.json`.
- Launch native `vllm serve` against that GCS prefix on TPU.
- Check startup logs for `Resolved architecture: MistralForCausalLM`.
- Compare with a repaired prefix, where `params.json` and `consolidated*.pth` live under `original/` and vLLM resolves `LlamaForCausalLM`.
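The layout check in the first step can be expressed as a small classifier over a prefix's object listing. This is a sketch: the helper name `classify_llama_prefix` is illustrative, and it assumes you have already fetched the listing (e.g. via `gsutil ls -r` or `gcsfs`) as paths relative to the model prefix root.

```python
from fnmatch import fnmatch


def classify_llama_prefix(object_names):
    """Classify a cached Llama prefix as 'flattened' (broken) or 'nested' (ok).

    object_names: object paths relative to the model prefix root,
    e.g. ["params.json", "consolidated.00.pth", "config.json"].
    """
    root_params = "params.json" in object_names
    root_consolidated = any(fnmatch(n, "consolidated*.pth") for n in object_names)
    nested_params = "original/params.json" in object_names
    # The broken layout: Meta checkpoint files at the root, nothing under original/.
    if root_params and root_consolidated and not nested_params:
        return "flattened"  # vLLM may resolve MistralForCausalLM here
    return "nested"
```

With a flattened listing this returns `"flattened"`; a repaired prefix whose `params.json` lives under `original/` returns `"nested"`.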
**Expected behavior**
Meta Llama caches should preserve `original/params.json` and `original/consolidated*.pth`, and native TPU vLLM should resolve `LlamaForCausalLM`.
**Additional context**
Current download code in `lib/marin/src/marin/datakit/download/huggingface.py:99` preserves nested HF paths, and `experiments/models.py:35` writes stable model prefixes. The issue is stale historical caches already written under those stable prefixes.
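The difference between the historical flattened writes and the current behavior comes down to how a repo-relative file path is mapped to a GCS object path. A minimal sketch of that mapping (the function and its `flatten` flag are illustrative, not the real downloader code):

```python
from pathlib import PurePosixPath


def dest_path(prefix: str, repo_relpath: str, flatten: bool = False) -> str:
    """Map an HF repo-relative file path to an object path under a model prefix.

    flatten=True reproduces the old buggy behavior that dropped the original/
    directory; flatten=False matches the current downloader, which preserves
    nested paths (see lib/marin/src/marin/datakit/download/huggingface.py).
    """
    # The flattened layout kept only the basename, so original/params.json
    # collided into the model root as params.json.
    name = PurePosixPath(repo_relpath).name if flatten else repo_relpath
    return f"{prefix.rstrip('/')}/{name}"
```

Under the old behavior `original/params.json` lands at the model root, which is exactly the layout that trips vLLM's Mistral detection.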
Why delete and re-download is required:

- Re-running the current downloader into the same prefix does not remove stale root `params.json` or stale root `consolidated*.pth`.
- The stable output path means malformed objects remain mixed with the corrected layout unless the prefix is cleaned first.
- Deleting the whole prefix is safer than partial cleanup and also clears executor bookkeeping such as `.executor_status`.
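The repair flow for one prefix can be sketched as below. This is a dry-run sketch, not the canonical procedure: the prefix is an example from the list in this issue, and the re-download invocation is a placeholder (the real entry point lives in `lib/marin/src/marin/datakit/download/huggingface.py` and is normally driven by the executor).

```shell
#!/usr/bin/env sh
# DRY_RUN=1 (the default here) prints each command instead of executing it,
# so the script is safe to run without GCS credentials.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Example broken prefix from this issue; substitute each affected prefix.
PREFIX="gs://marin-us-central1/models/meta-llama--Llama-3-1-8B--main/"

# 1. Delete the entire prefix, which also clears .executor_status bookkeeping.
run gsutil -m rm -r "$PREFIX"

# 2. Re-run the downloader so the corrected nested layout (original/...) is
#    re-materialized. Placeholder invocation; use the project's actual
#    download step for this model.
run python -m marin.datakit.download.huggingface --output "$PREFIX"
```

Deleting first is what guarantees no stale root-level `params.json` survives alongside the corrected layout.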
Affected broken prefixes found in the accessible Marin buckets:
- `gs://marin-us-central1/models/meta-llama--Llama-3-1-8B--d04e592/`
- `gs://marin-us-central1/models/meta-llama--Llama-3-1-8B--main/`
- `gs://marin-us-central1/models/meta-llama--Llama-3-1-8B-Instruct--0e9e39f/`
- `gs://marin-us-central1/models/meta-llama--Llama-3-2-1B--4e20de3/`
- `gs://marin-us-central1/models/meta-llama--Llama-3-2-1B--main/`
- `gs://marin-us-central1/models/meta-llama--Llama-3-3-70B-Instruct--6f6073b/`
- `gs://marin-us-central2/models/meta-llama--Llama-3-1-8B-Instruct--0e9e39f/`
- `gs://marin-us-central2/models/meta-llama--Llama-3-3-70B-Instruct--6f6073b/`
- `gs://marin-us-east1/models/meta-llama--Llama-3-1-8B-Instruct--0e9e39f/`
- `gs://marin-us-east5/models/meta-llama--Llama-3-2-1B--main/`
Already repaired / clean:
- `gs://marin-us-east5/models/meta-llama--Llama-3-1-8B-Instruct--0e9e39f/`
- `gs://marin-us-east5/models/meta-llama--Llama-3-3-70B-Instruct--6f6073b/`
No meta-llama model prefixes were found in `gs://marin-eu-west4/models/` or `gs://marin-us-west4/models/`.
Validated repair: after deleting and re-downloading `gs://marin-us-east5/models/meta-llama--Llama-3-1-8B-Instruct--0e9e39f/`, native TPU vLLM resolved `LlamaForCausalLM` and a 100-request smoke test succeeded.