29 repositories

Latent_Policy_Guard
Public
Latent Policy Guard (LPG) — a guardrail model that performs semantic latent deliberation over dynamic safety policies. LPG compresses intent and risk reasoning …
Python
•
MIT License
•0•4•0•0•Updated Jun 11, 2026
MaskForge
Public
Python
•0•1•0•0•Updated Jun 4, 2026
SafeVL
Public
Official Repo for Paper: SafeVL: Driving Safety Evaluation via Meticulous Reasoning in Vision Language Models
Python
•0•1•0•0•Updated May 31, 2026
PW-OPSD
Public
The official implementation of our preprint paper "When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning"
Python
•
MIT License
•1•9•0•0•Updated May 23, 2026
AgentDyn
Public
The official implementation of the paper "AgentDyn: Are Your Agent Security Defenses Deployable in Real-World Dynamic Environments?"
Python
•
MIT License
•4•61•0•0•Updated May 19, 2026
ROM
Public
The official implementation of our paper "ROM: Real-time Overthinking Mitigation via Streaming Detection and Intervention"
Python
•
MIT License
•1•3•0•0•Updated May 12, 2026
A2ASecBench
Public
Official code repository for "A2ASecBench: A Protocol-Aware Security Benchmark for Agent-to-Agent Multi-Agent Systems" at ICLR 2026.
Python
•
MIT License
•0•2•0•0•Updated May 8, 2026
DRIFT
Public
[NeurIPS 2025] The official implementation of the paper "DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents".
Python
•3•52•1•0•Updated Apr 19, 2026
ReasoningBomb
Public
[CCS 2026] The official implementation of our CCS 2026 paper "ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing Pathologically Long Reasoning in La…
Python
•
Other
•4•14•0•0•Updated Apr 10, 2026
DynAuditClaw
Public
DynAuditClaw — A security audit skill that dynamically discovers your OpenClaw agent's real configuration, designs targeted attack scenarios adapted to your spe…
Python
•2•14•0•0•Updated Apr 6, 2026
PRISM
Public
PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality
safety vlm vlm-reasoning
Python
•
MIT License
•1•7•0•0•Updated Apr 3, 2026
Claude-Code-Security-Analysis
Public
A security analysis report of the leaked Claude-Code
1•5•0•0•Updated Apr 3, 2026
seclaw
Public
🦾 SeClaw: The Security Armored Personal AI Assistant
TypeScript
•
MIT License
•1•31•0•0•Updated Mar 18, 2026
llm-armor
Public
JavaScript
•0•0•0•0•Updated Mar 18, 2026
armor
Public
Python
•
MIT License
•0•7•0•0•Updated Mar 18, 2026
dVLM-AD
Public
Official Repo for “dVLM-AD: Enhance Diffusion Vision-Language-Model for Driving via Controllable Reasoning”
Python
•0•6•0•0•Updated Feb 22, 2026
AdaShield
Public
[ECCV 2024] The official code for "AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting."
Python
•4•73•5•0•Updated Feb 9, 2026
DoxBench
Public
[ICLR 2026] The official code for "Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models"
Jupyter Notebook
•
Apache License 2.0
•3•27•0•0•Updated Feb 7, 2026
SaFo-Lab.github.io
Public
The homepage of SaFo Lab
HTML
•
MIT License
•0•2•0•0•Updated Jan 28, 2026
MetaAgent
Public
Official Repository of MetaAgent Program
Python
•8•52•4•0•Updated Dec 2, 2025
AutoDAN-Reasoning
Public
A further improvement for the AutoDAN-Turbo through test-time scaling.
Python
•
MIT License
•4•14•1•0•Updated Oct 21, 2025
AutoDAN-Turbo
Public
[ICLR 2025 Spotlight] The official implementation of our ICLR2025 paper "AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs".
Python
•
MIT License
•67•372•5•1•Updated Oct 8, 2025
AGrail4Agent
Public
[ACL 2025] The official code for "AGrail: A Lifelong Agent Guardrail with Effective and Adaptive Safety Detection".
Python
•3•41•0•0•Updated Aug 4, 2025
JailBreakV_28K
Public
[COLM 2024] JailBreakV-28K: A comprehensive benchmark designed to evaluate the transferability of LLM jailbreak attacks to MLLMs, and further assess the robustn…
jailbreakv-28k
Python
•12•94•2•0•Updated May 9, 2025
OET
Public
Python
•
MIT License
•1•11•0•0•Updated May 5, 2025
FIUBench
Public
A Task of Fictitious Unlearning for VLMs
Jupyter Notebook
•2•27•7•0•Updated Apr 6, 2025
Dolphins
Public
[ECCV 2024] The official code for "Dolphins: Multimodal Language Model for Driving"
Python
•
MIT License
•14•88•6•0•Updated Feb 10, 2025
Awesome-T2I-safety-Papers
Public
A list of T2I safety papers, updated daily. Questions and suggestions are welcome in Discussions.
MIT License
•1•68•0•0•Updated Aug 12, 2024
.github
Public
Open-source code from SaFoLab at the University of Wisconsin–Madison
0•1•0•0•Updated Jul 3, 2024

SaFoLab : Security and Safe Foundation Model Systems
