26 repositories
ReflCtrl (Public)
- [EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length is encoded in the model’s representation space.
- [CVPR 2025] Concept Bottleneck Autoencoder (CB-AE): efficiently transforms any pretrained (black-box) image generative model into an interpretable generative concept bottleneck model (CBM) with minimal concept supervision, while preserving image quality
- A new training framework for Trustworthy Large Reasoning Models
Concept-Bottleneck-LLM (Public)
- [ICML 24] S-DQN and S-PPO: Robust smoothed deep RL agents without sacrificing performance
CB-LLMs (Public)
[ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness.
- [ICML 25] A unified mathematical framework to evaluate neuron explanations of deep learning models with sanity tests
efficient_neuron_eval (Public)
- [NeurIPS 24] A new training and evaluation framework for learning interpretable deep vision models and benchmarking different interpretable concept-bottleneck-models (CBMs)
- [ICML 24] A novel automated neuron explanation framework that can accurately describe polysemantic concepts in deep neural networks
- [NAACL 25] Two novel, lightweight, and training-free skill unlearning methods for LLMs
RAT_MisD (Public)
Boosting misclassification detection ability by radius-aware training (RAT)
- [TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models
- [ECCV 24] A new and low-cost test-time defense for DNNs based on neuron-level interpretability methods
Audio_Network_Dissection (Public)
[ICML 24] AND: the first framework to provide automatic natural language explanations for deep acoustic networks
DSC-210-NLA-FA22 (Public)
NN-LPK (Public)
- [ICLR 24] This work proposes RSCP+ to provide a robustness guarantee in evaluation, and two novel methods, PTT and RCT, that robustify conformal prediction with improved efficiency through post-hoc transformation and training.
Label-free-CBM (Public)
[ICLR 23] A new framework to transform any neural network into an interpretable concept-bottleneck-model (CBM) without needing labeled concept data
- [NeurIPS'23 ATTRIB] An efficient framework to generate neuron explanations for LLMs
- [ICLR 23 spotlight] An automatic and efficient tool to describe functionalities of individual neurons in DNNs
- [ICCV 23] Evaluating robustness of neuron explanation methods