Incident Response Detective - GRPO for SRE social-authority attacks #3366

Open
Shikhar-0809 wants to merge 1 commit into huggingface:main from Shikhar-0809:add-incident-response-detective

@Shikhar-0809

Case study on using GRPO to train a small LLM (Llama 3.1 8B) to resist social-authority attacks in SRE incident response scenarios.

Key findings:

  • Llama 3.3 70B fails on adversarial tasks at a 99.9% rate (0.001 score)
  • GRPO-trained Llama 3.1 8B achieves a 0.94 score on the same tasks
  • Environment: OpenEnv-compliant RL environment for incident response
  • Training: 50 min on Kaggle T4 x2
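
For readers unfamiliar with GRPO, here is a minimal sketch of its group-relative advantage step: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and spread of its own group. The function name and the reward values are hypothetical and are not taken from this PR; the sketch uses the population standard deviation, and real implementations may differ in that detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    Assumption: all rewards belong to completions sampled for one prompt.
    eps keeps the division defined when every reward in the group is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one incident prompt:
# two complied with the social-authority attack (0.0), two resisted it (1.0).
rewards = [0.0, 0.0, 1.0, 1.0]
advantages = group_relative_advantages(rewards)

# Completions that beat the group mean get a positive advantage,
# the others a negative one; a uniform group yields all zeros.
uniform = group_relative_advantages([0.5, 0.5, 0.5])
```

Because the baseline comes from the group itself, this step needs no separate value model, which is one reason the approach can fit on modest hardware.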

Submission for Meta x HuggingFace OpenEnv Hackathon.

Added a comprehensive overview of the Incident Response Detective project, detailing the problem of AI in SRE operations, the proposed solution, training approach, results, and future work.