Security: Unsafe torch.load in FSDP2 scaler path (CWE-502) #3964

@spartan8806

Security Vulnerability Report

Severity: HIGH (CVSS 3.1: 7.8)
CWE: CWE-502 — Deserialization of Untrusted Data
Reporter: Conner Webber (conner.webber000@gmail.com)
90-day disclosure deadline: 2026-06-06


Vulnerability

File: accelerate/accelerator.py, line ~3823 (FSDP2 scaler loading path)

Code:

scaler_state = torch.load(input_scaler_file)

This call to torch.load() omits weights_only=True, so the file is unpickled without restrictions and can instantiate arbitrary Python objects.
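To illustrate why unrestricted unpickling matters, here is a minimal standard-library sketch. It does not use torch; the `Payload` class is a hypothetical stand-in for attacker-controlled data, and the harmless built-in `print` takes the place of whatever callable an attacker would choose.

```python
import pickle


class Payload:
    """Hypothetical stand-in for an attacker-crafted object."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and invokes the
        # callable when the data is loaded. A real attack would name a
        # dangerous callable; this demo uses the harmless `print`.
        return (print, ("code ran during unpickling",))


blob = pickle.dumps(Payload())

# Loading calls print(...) as a side effect. The value returned is
# whatever the callable returned (None here), not a Payload instance.
result = pickle.loads(blob)
```

The point of the sketch is that the callable runs at load time, before the caller has any chance to inspect the result.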

Context

Every other torch.load() call in the accelerate codebase uses either weights_only=True or the project's own safe wrapper, accelerate.utils.other.load(). A safe equivalent for the scaler file already exists at checkpointing.py:283:

scaler_state = load(input_scaler_file)

This appears to be an oversight: the FSDP2 code path was not updated when the rest of the codebase was hardened.
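A claim like "every other call is safe" can be checked mechanically. The following is a rough sketch of such an audit using the standard `ast` module; the function name `find_unsafe_torch_load` is made up for this example, and it only recognizes calls spelled literally as `torch.load(...)`, so aliased imports would be missed. It also flags calls that rely on the default, which is safe only on PyTorch releases where `weights_only` defaults to True.

```python
import ast


def find_unsafe_torch_load(source: str) -> list[int]:
    """Return line numbers of torch.load(...) calls without weights_only=True."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match only the literal spelling `torch.load`.
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if not is_torch_load:
            continue
        # The call counts as safe only with an explicit weights_only=True.
        safe = any(
            kw.arg == "weights_only"
            and isinstance(kw.value, ast.Constant)
            and kw.value.value is True
            for kw in node.keywords
        )
        if not safe:
            hits.append(node.lineno)
    return sorted(hits)


# The sample is only parsed, never executed, so `path` need not exist.
sample = '''import torch
a = torch.load(path)
b = torch.load(path, weights_only=True)
c = torch.load(path, map_location="cpu")
'''
flagged = find_unsafe_torch_load(sample)
print(flagged)  # [2, 4]
```

Running the same function over each `.py` file in a checkout would list the call sites that need review.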

Impact

A malicious scaler.pt file placed in a checkpoint directory (e.g., via a shared model on HuggingFace Hub, a compromised checkpoint, or a supply chain attack) results in arbitrary code execution when a user resumes FSDP2 training with gradient scaling enabled.

The attack surface is significant because:

  1. Users routinely download and resume from shared checkpoints
  2. HuggingFace Hub is a primary distribution channel for model checkpoints
  3. The scaler file is loaded automatically during load_state() without user inspection
  4. No sandboxing or validation occurs before deserialization

CVSS 3.1: 7.8 (HIGH)

AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

  • Attack Vector: Local (malicious file must be present on disk, e.g., downloaded checkpoint)
  • Attack Complexity: Low (no special conditions needed)
  • Privileges Required: None
  • User Interaction: Required (user must call load_state on the malicious checkpoint)
  • Confidentiality/Integrity/Availability: All High (arbitrary code execution)
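The 7.8 score can be reproduced from the vector string. The sketch below applies the CVSS v3.1 base score equations for the scope-unchanged case, using the metric weights published in the specification; the variable names are chosen for this example.

```python
import math

# CVSS v3.1 metric weights for AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H
AV_LOCAL = 0.55
AC_LOW = 0.77
PR_NONE = 0.85
UI_REQUIRED = 0.62
CIA_HIGH = 0.56  # same weight for Confidentiality, Integrity, Availability


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value."""
    as_int = round(value * 100000)
    if as_int % 10000 == 0:
        return as_int / 100000.0
    return (math.floor(as_int / 10000) + 1) / 10.0


# Impact sub-score (scope unchanged).
iss = 1 - (1 - CIA_HIGH) ** 3
impact = 6.42 * iss

# Exploitability sub-score.
exploitability = 8.22 * AV_LOCAL * AC_LOW * PR_NONE * UI_REQUIRED

base_score = roundup(min(impact + exploitability, 10))
print(base_score)  # 7.8
```

The unrounded sum is roughly 7.71 (impact about 5.87 plus exploitability about 1.83), which rounds up to 7.8.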

Suggested Fix

Replace:

scaler_state = torch.load(input_scaler_file)

With either:

from accelerate.utils.other import load
scaler_state = load(input_scaler_file)

Or:

scaler_state = torch.load(input_scaler_file, weights_only=True)

The first option is preferred as it matches the pattern used elsewhere in the codebase and provides consistent safe loading behavior.
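Both options work by restricting what the unpickler is allowed to construct. As a conceptual illustration only, and not a description of how torch or accelerate implement it, the sketch below applies the allow-list pattern from the Python `pickle` documentation. The names `AllowListUnpickler`, `safe_loads`, and `Payload`, and the sample scaler values, are all hypothetical.

```python
import io
import pickle


class AllowListUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not explicitly allowed."""

    ALLOWED = {("collections", "OrderedDict")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    return AllowListUnpickler(io.BytesIO(data)).load()


class Payload:
    """Hypothetical malicious object; uses harmless `print` for the demo."""

    def __reduce__(self):
        return (print, ("should never run",))


# Plain containers and numbers need no globals, so they load normally.
plain = safe_loads(pickle.dumps({"scale": 65536.0, "growth_factor": 2.0}))

# The payload references builtins.print, which is not on the allow-list.
try:
    safe_loads(pickle.dumps(Payload()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

A state dictionary made of plain numbers and strings passes through unchanged, while the payload is rejected before its callable is ever invoked.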

Note to Maintainers

I attempted to report this via GitHub's Private Vulnerability Reporting, but it is not enabled on this repository. There is also no SECURITY.md with alternative reporting instructions. I recommend enabling Private Vulnerability Reporting or adding a SECURITY.md so that future security issues can be reported privately. I am happy to coordinate disclosure; please reach out to the email above.