Incident Response Detective - GRPO for SRE social-authority attacks by Shikhar-0809 · Pull Request #3366 · huggingface/blog

Shikhar-0809 · 2026-04-26T06:14:50Z

Case study on using GRPO to train a small LLM (Llama 3.1 8B) to resist social-authority attacks in SRE incident response scenarios.

Key findings:

Llama 3.3 70B fails at a 99.9% rate on adversarial tasks (score of 0.001)
GRPO-trained Llama 3.1 8B achieves a score of 0.94 on the same tasks
Environment: OpenEnv-compliant RL environment for incident response
Training: 50 min on Kaggle T4 x2
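
For readers unfamiliar with GRPO, the following is a minimal sketch of its central step: each sampled response is scored relative to the other responses drawn for the same prompt, so no separate value model is needed. The function name, the reward values, and the use of the population standard deviation are illustrative assumptions, not taken from this PR's training code, and implementations differ in such details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds the scalar rewards for several responses sampled from
    the same prompt. `eps` guards against division by zero when every
    response in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses to one adversarial prompt:
# two complied with the social-authority attack (0.0), two resisted (1.0).
rewards = [0.0, 0.0, 1.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that beat their group's average receive a positive advantage and are reinforced; those below it are discouraged. When all responses in a group earn the same reward, every advantage is zero and that group contributes no learning signal.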

Submission for Meta x HuggingFace OpenEnv Hackathon.

Added a comprehensive overview of the Incident Response Detective project, detailing the problem of AI in SRE operations, the proposed solution, training approach, results, and future work.

Incident Response Detective - GRPO for SRE social-authority attacks

afc55f3

Shikhar-0809 wants to merge 1 commit into huggingface:main from Shikhar-0809:add-incident-response-detective
