
[vllm] Repair stale Meta Llama GCS caches flattened into Mistral layout #4356

@ahmeda14960

Description


Describe the bug
Some historically cached meta-llama model prefixes in GCS were downloaded with a flattened layout: files that should live under original/ were written at the model root. When a cached Llama prefix contains a root params.json and root consolidated*.pth files, native TPU vLLM parses the checkpoint as a Mistral-format export and resolves MistralForCausalLM instead of LlamaForCausalLM.

To Reproduce

  1. Inspect any broken prefix below and confirm it has root params.json and root consolidated*.pth, with no original/params.json.
  2. Launch native vllm serve against that GCS prefix on TPU.
  3. Check startup logs for Resolved architecture: MistralForCausalLM.
  4. Compare with a repaired prefix, where params.json and consolidated*.pth live under original/ and vLLM resolves LlamaForCausalLM.
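The layout check in steps 1 and 4 can be sketched as a small classifier over an object listing. This is a minimal sketch; the function name and the listings are illustrative, not part of the Marin codebase:

```python
import fnmatch


def classify_llama_prefix(object_names: list[str]) -> str:
    """Classify a cached Meta Llama prefix as 'broken' (flattened),
    'repaired' (native files under original/), or 'unknown'.

    object_names are object paths relative to the model prefix root,
    e.g. ["config.json", "original/params.json", ...].
    """
    root_params = "params.json" in object_names
    # fnmatch matches the whole string, so "original/consolidated.00.pth"
    # does NOT match "consolidated*.pth" -- only root-level files do.
    root_consolidated = any(
        fnmatch.fnmatch(name, "consolidated*.pth") for name in object_names
    )
    nested_params = "original/params.json" in object_names

    # Flattened layout: native-checkpoint files at the root, no original/.
    if (root_params or root_consolidated) and not nested_params:
        return "broken"
    if nested_params:
        return "repaired"
    return "unknown"
```

For example, a listing of ["params.json", "consolidated.00.pth", "config.json"] classifies as broken, while one containing "original/params.json" classifies as repaired.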

Expected behavior
Meta Llama caches should preserve original/params.json and original/consolidated*.pth, and native TPU vLLM should resolve LlamaForCausalLM.

Additional context
Current download code in lib/marin/src/marin/datakit/download/huggingface.py:99 preserves nested HF paths, and experiments/models.py:35 writes stable model prefixes. The bug is therefore not in the current code; it affects stale historical caches that were already written under those stable prefixes.
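The difference between the current path-preserving behavior and the historical flattening can be sketched as follows. Both function names and signatures are illustrative, not the actual Marin API:

```python
import os


def cache_object_name(model_prefix: str, repo_path: str) -> str:
    """Hypothetical sketch of the current nested-path-preserving mapping:
    the repo-relative path, e.g. 'original/params.json', is appended
    verbatim under the stable model prefix."""
    return model_prefix.rstrip("/") + "/" + repo_path


def flattened_object_name(model_prefix: str, repo_path: str) -> str:
    """The buggy historical behavior: only the basename survives, so
    'original/params.json' lands at the prefix root as 'params.json'."""
    return model_prefix.rstrip("/") + "/" + os.path.basename(repo_path)
```

With a stable prefix, both mappings write into the same location across runs, which is why the old flattened objects are still sitting next to (or in place of) the corrected layout.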

Why delete and re-download is required:

  • re-running the current downloader into the same prefix does not remove stale root params.json or stale root consolidated*.pth
  • the stable output path means malformed objects remain mixed with the corrected layout unless the prefix is cleaned first
  • deleting the whole prefix is safer than partial cleanup and also clears executor bookkeeping such as .executor_status
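The first two points above reduce to simple set arithmetic over object listings. A minimal sketch with illustrative listings (not real bucket contents):

```python
def stale_after_redownload(existing: set[str], fresh: set[str]) -> set[str]:
    """Objects that would remain after re-running the downloader into the
    same prefix without cleaning first: everything already present that
    the corrected download does not overwrite."""
    return existing - fresh


# Illustrative listings: a flattened cache vs. a correct fresh download.
existing = {"config.json", "params.json", "consolidated.00.pth", ".executor_status"}
fresh = {"config.json", "original/params.json", "original/consolidated.00.pth"}

# Re-download alone leaves the malformed root files and stale executor
# bookkeeping behind, which is why the whole prefix must be deleted first.
leftover = stale_after_redownload(existing, fresh)
```

Here leftover still contains the root params.json, the root consolidated*.pth file, and .executor_status, so the prefix would remain ambiguous to vLLM's architecture resolution.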

Affected broken prefixes found in the accessible Marin buckets:

  • gs://marin-us-central1/models/meta-llama--Llama-3-1-8B--d04e592/
  • gs://marin-us-central1/models/meta-llama--Llama-3-1-8B--main/
  • gs://marin-us-central1/models/meta-llama--Llama-3-1-8B-Instruct--0e9e39f/
  • gs://marin-us-central1/models/meta-llama--Llama-3-2-1B--4e20de3/
  • gs://marin-us-central1/models/meta-llama--Llama-3-2-1B--main/
  • gs://marin-us-central1/models/meta-llama--Llama-3-3-70B-Instruct--6f6073b/
  • gs://marin-us-central2/models/meta-llama--Llama-3-1-8B-Instruct--0e9e39f/
  • gs://marin-us-central2/models/meta-llama--Llama-3-3-70B-Instruct--6f6073b/
  • gs://marin-us-east1/models/meta-llama--Llama-3-1-8B-Instruct--0e9e39f/
  • gs://marin-us-east5/models/meta-llama--Llama-3-2-1B--main/

Already repaired / clean:

  • gs://marin-us-east5/models/meta-llama--Llama-3-1-8B-Instruct--0e9e39f/
  • gs://marin-us-east5/models/meta-llama--Llama-3-3-70B-Instruct--6f6073b/

No meta-llama model prefixes were found in gs://marin-eu-west4/models/ or gs://marin-us-west4/models/.

Validated repair: after deleting and re-downloading gs://marin-us-east5/models/meta-llama--Llama-3-1-8B-Instruct--0e9e39f/, native TPU vLLM resolved LlamaForCausalLM and a 100-request smoke test succeeded.


Labels: agent-generated, bug
