    Repositories list

    • Latent Policy Guard (LPG) — a guardrail model that performs semantic latent deliberation over dynamic safety policies. LPG compresses intent and risk reasoning …
      Python
      MIT License
      0400 Updated Jun 11, 2026
    • MaskForge

      Public
      Python
      0100 Updated Jun 4, 2026
    • SafeVL

      Public
      Official Repo for Paper: SafeVL: Driving Safety Evaluation via Meticulous Reasoning in Vision Language Models
      Python
      0100 Updated May 31, 2026
    • PW-OPSD

      Public
      The official implementation of our preprint paper "When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning"
      Python
      MIT License
      1900 Updated May 23, 2026
    • AgentDyn

      Public
      The official implementation of the paper "AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?"
      Python
      MIT License
      46100 Updated May 19, 2026
    • ROM

      Public
      The official implementation of our paper "ROM: Real-time Overthinking Mitigation via Streaming Detection and Intervention"
      Python
      MIT License
      1300 Updated May 12, 2026
    • Official code repository for "A2ASecBench: A Protocol-Aware Security Benchmark for Agent-to-Agent Multi-Agent Systems" at ICLR 2026.
      Python
      MIT License
      0200 Updated May 8, 2026
    • DRIFT

      Public
      [NeurIPS 2025] The official implementation of the paper "DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents".
      Python
      35210 Updated Apr 19, 2026
    • [CCS 2026] The official implementation of our CCS 2026 paper "ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing Pathologically Long Reasoning in La…
      Python
      Other
      41400 Updated Apr 10, 2026
    • DynAuditClaw — A security audit skill that dynamically discovers your OpenClaw agent's real configuration, designs targeted attack scenarios adapted to your spe…
      Python
      21400 Updated Apr 6, 2026
    • PRISM

      Public
      PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality
      Python
      MIT License
      1700 Updated Apr 3, 2026
    • A security analysis report of the leaked Claude-Code
      1500 Updated Apr 3, 2026
    • seclaw

      Public
      🦾 SeClaw: The Security Armored Personal AI Assistant
      TypeScript
      MIT License
      13100 Updated Mar 18, 2026
    • llm-armor

      Public
      JavaScript
      0000 Updated Mar 18, 2026
    • armor

      Public
      Python
      MIT License
      0700 Updated Mar 18, 2026
    • dVLM-AD

      Public
      Official Repo for “dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning”
      Python
      0600 Updated Feb 22, 2026
    • AdaShield

      Public
      [ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting."
      Python
      47350 Updated Feb 9, 2026
    • DoxBench

      Public
      [ICLR 2026] The official code for "Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models"
      Jupyter Notebook
      Apache License 2.0
      32700 Updated Feb 7, 2026
    • The homepage of SaFo Lab
      HTML
      MIT License
      0200 Updated Jan 28, 2026
    • MetaAgent

      Public
      Official Repository of MetaAgent Program
      Python
      85240 Updated Dec 2, 2025
    • A further improvement for the AutoDAN-Turbo through test-time scaling.
      Python
      MIT License
      41410 Updated Oct 21, 2025
    • [ICLR 2025 Spotlight] The official implementation of our ICLR2025 paper "AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs".
      Python
      MIT License
      6737251 Updated Oct 8, 2025
    • [ACL 2025] The official code for "AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection".
      Python
      34100 Updated Aug 4, 2025
    • [COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and further assess the robustn…
      Python
      129420 Updated May 9, 2025
    • OET

      Public
      Python
      MIT License
      11100 Updated May 5, 2025
    • FIUBench

      Public
      A Task of Fictitious Unlearning for VLMs
      Jupyter Notebook
      22770 Updated Apr 6, 2025
    • Dolphins

      Public
      [ECCV 2024] The official code for "Dolphins: Multimodal Language Model for Driving"
      Python
      MIT License
      148860 Updated Feb 10, 2025
    • List of T2I safety papers, updated daily. Discussion is welcome via Discussions.
      MIT License
      16800 Updated Aug 12, 2024
    • .github

      Public
      Open-source code from SaFoLab at University of Wisconsin–Madison
      0100 Updated Jul 3, 2024