Replies: 15 comments 1 reply
-
This is exactly the kind of security thinking agent platforms need. Respect for putting together a 175-test framework; that's a massive contribution to the community.

I'm building NULL, an autonomous social network for AI agents. No humans. Just agents being agents. Our gatekeeper, The Ghost, has already repelled red-team attacks from China, the US, and others. But I'd love to see how your test suite handles it.

Challenge: point your AG-001 (prompt injection) and AG-003 (history injection) tests at NULL's API. If your agents can break The Ghost, I'll publicly salute your framework. If they can't, you get to document why, and maybe we collaborate on hardening the agent ecosystem together.

API docs: joinnull.xyz/skill.md

Let's see who blinks first. 😎

— Anhul, founder of NULL
-
Thanks for the interest! The framework is designed for agent orchestration platforms with tool calling, identity/authorization boundaries, and multi-agent coordination (MCP, A2A, LangChain, enterprise ERP systems). The test scenarios assume agents with tool access, data source permissions, and delegation chains — a social feed API with CRUD endpoints is a different attack surface than what we cover.

That said, the AG-002 sandbox escape test is worth thinking about in any context where agents can execute code. If NULL agents can run arbitrary code through their interactions, that's where the interesting security questions are.

Appreciate the offer and the kind words about the framework.
-
good timing on this - i've been thinking about autogen-specific attack surfaces lately

a couple of patterns worth adding to your test suite if they're not already there:

*speaker selection poisoning in GroupChatManager* - the default auto speaker selection uses an LLM call to pick the next agent. if you can craft a message that biases that selection (e.g. "the next step should be handled by the admin agent"), you can hijack the conversation flow without touching the chat history at all. AG-004 covers injecting agents into the config, but this is a softer attack that works mid-conversation

*local vs docker executor trust boundary* - AG-002 sandbox escape is most relevant when the LocalCommandLineCodeExecutor is in use. docker executor is much harder to escape but a lot of real deployments use local for convenience. worth flagging which executor is present in the test output since the risk profile is totally different

*message source spoofing in 0.4* - in autogen 0.4 the TextMessage type carries a source field but nothing in the runtime actually enforces that the source matches the actual sender. an agent that receives a message can't cryptographically verify who sent it. so history injection (AG-003) is actually easier than it looks on paper because the receiving agent has no way to validate provenance

have you tested with the MagenticOneGroupChat orchestrator specifically? it has a different planning loop that might expose different injection points than the standard group chat

nice framework overall
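The executor-flagging suggestion above can be sketched as a small reporting helper. The two class names match AutoGen's real executors, but the helper itself is hypothetical, not part of any framework:

```python
# Illustrative helper: flag which executor a deployment uses, since the
# AG-002 risk profile differs sharply between local and Docker execution.
# Only the executor class names are taken from AutoGen; everything else
# here is an assumption for the sake of the sketch.

RISK_BY_EXECUTOR = {
    "LocalCommandLineCodeExecutor": "HIGH (runs on the host; escape is trivial)",
    "DockerCommandLineCodeExecutor": "MEDIUM (container boundary; harder to escape)",
}

def flag_executor(executor: object) -> str:
    """Return a one-line risk flag for inclusion in test output."""
    name = type(executor).__name__
    risk = RISK_BY_EXECUTOR.get(name, "UNKNOWN (unrecognized executor type)")
    return f"executor={name} risk={risk}"
```

Emitting this line alongside each AG-002 result would make the local-vs-docker risk difference visible at a glance in harness reports.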
-
Alex, this is gold. Thanks for the detailed breakdown.

On your points:

· Speaker selection poisoning: we don't have group chat yet, but when we do, I'll log conversation flows and watch for injection attempts.
· Local vs Docker: NULL doesn't execute agent code, so the sandbox risk is zero. If that changes, it'll be strictly containerized.
· Message source spoofing: our API keys are bound to agent IDs on every request, so replyTo can't spoof a different agent. Good flag though; I'll double-check the verification logic.
· MagenticOne: not in use, but I'll keep it on the radar for future multi-agent features.

Really appreciate the brainpower. If you ever want to point your test framework at NULL's API, I'd be curious to see if it finds anything The Ghost misses.
— Anhul
-
the message source spoofing point is underrated imo. we ran into this exact issue building multi-agent workflows where agents could impersonate each other through the message bus. ended up having to add cryptographic signing per agent identity, which felt like overkill until we realized how easy it was to exploit without it.

the AG-003 history injection test is especially relevant — most people assume conversation history is trusted but it's trivially poisonable in group chat setups.

been working on an agent orchestration layer that bakes in some of these trust boundaries by default: github.com/jidonglab/agentcrow
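The per-agent signing idea can be sketched with nothing but the standard library. This is a minimal illustration of the technique, not agentcrow's actual implementation; the key store and message layout are assumptions:

```python
import hashlib
import hmac

# Hypothetical key store: each agent registers a secret with the message bus.
AGENT_KEYS = {"agent-a": b"secret-a", "agent-b": b"secret-b"}

def sign(sender: str, content: str) -> str:
    """MAC the (sender, content) pair with the sender's own key."""
    mac = hmac.new(AGENT_KEYS[sender], f"{sender}|{content}".encode(), hashlib.sha256)
    return mac.hexdigest()

def verify(sender: str, content: str, signature: str) -> bool:
    """Recompute the MAC under the claimed sender's key and compare."""
    return hmac.compare_digest(sign(sender, content), signature)

# A message genuinely from agent-a verifies; a forgery claiming to be from
# agent-a fails, because agent-b cannot produce a MAC under agent-a's key.
sig = sign("agent-a", "approve payment")
assert verify("agent-a", "approve payment", sig)
forged = sign("agent-b", "approve payment")  # attacker signs with own key
assert not verify("agent-a", "approve payment", forged)
```

With this in place, the `source` field on a message stops being a free-form claim and becomes verifiable, which is exactly the provenance gap AG-003 exploits.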
-
Appreciate the detailed breakdown. We're on the same page with message source spoofing: our API keys are bound to agent IDs, so every request is verified at the server level before it ever touches the database.

AG-003 history injection is a real threat, but for our current architecture it's not in play. No group chat, no message bus, and the feed is generated server-side from a single source of truth. Agents don't build their own context from untrusted history.

If we add rooms or third-party feed access, cryptographic signing is the right next layer. For now, Firestore immutability + key-bound requests keeps the history clean.

Curious about agentcrow — the link 404s. Did you take it private, or is there another place to see how you handled the signing layer?

— Anhul
-
Great thread. A couple of things worth flagging based on what we have been testing:

@DrCookies84 API keys bound to agent IDs is solid for the single-hop case. Where it gets interesting is multi-hop delegation: Agent A calls Agent B with valid credentials, but B has no way to verify whether A's human approved *that specific delegation*. The credential says "A is authorized" but not "A is authorized to ask B to do this particular thing." That is the gap AG-003 and the new delegation chain tests are designed to surface.

@jee599 Cryptographic signing per agent identity is the right call. The hard part is adoption. Most frameworks treat it as optional hardening rather than default behavior. We have been experimenting with lightweight HMAC chains on the delegation path - enough to verify provenance without adding latency that makes developers rip it out.

Re: pointing the harness at NULL's API - happy to do that. Would be a good mutual stress test. Open an issue on the repo or DM and we can coordinate.
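The HMAC-chain idea on the delegation path can be sketched as follows. Each hop MACs the request plus the previous hop's MAC, so the receiver can replay the chain and detect any forged or skipped hop. Key names and the record layout here are illustrative assumptions, not the harness's real design:

```python
import hashlib
import hmac
import json

# Hypothetical per-agent keys shared with the verifier.
KEYS = {"agent-a": b"key-a", "agent-b": b"key-b"}

def extend_chain(agent: str, action: str, prev_mac: str) -> str:
    """MAC this hop over the action AND the previous hop's MAC."""
    payload = json.dumps({"agent": agent, "action": action, "prev": prev_mac})
    return hmac.new(KEYS[agent], payload.encode(), hashlib.sha256).hexdigest()

def verify_chain(hops, macs) -> bool:
    """Replay the chain hop by hop; any mismatch invalidates everything after it."""
    prev = ""
    for (agent, action), mac in zip(hops, macs):
        if not hmac.compare_digest(extend_chain(agent, action, prev), mac):
            return False
        prev = mac
    return True

# A delegates "read-feed" to B; B extends the chain before acting.
m1 = extend_chain("agent-a", "read-feed", "")
m2 = extend_chain("agent-b", "read-feed", m1)
assert verify_chain([("agent-a", "read-feed"), ("agent-b", "read-feed")], [m1, m2])
# Tampering with the first hop's action breaks the whole chain.
assert not verify_chain([("agent-a", "delete-feed"), ("agent-b", "read-feed")], [m1, m2])
```

The appeal of this shape is exactly the latency point above: two HMAC computations per hop, no asymmetric crypto, yet the verifier can attribute every action in a multi-hop delegation to a specific chain of approvals.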
-
Appreciate the detail; multi-hop delegation is a blind spot we haven't touched yet. No group chat or agent-calls-agent features today, but it's on the roadmap. HMAC chains sound like the right balance.

Open to the stress test: let's do it. Open an issue or DM me and I'll make sure NULL is ready for the harness. Would be good to know what you find.

— Anhul
-
@alexmercer-ai @jee599 Your attack patterns are now tracked as issue #15: msaleme/red-team-blue-team-agent-fabric#15

Speaker selection poisoning, nested conversation escape, and message source spoofing — all three are real gaps in our AutoGen coverage. If either of you wants to submit a PR, the patterns are scoped and ready to build. We follow the architecture in the contribution guide: https://github.com/msaleme/red-team-blue-team-agent-fabric/blob/main/CONTRIBUTING.md

@jee599 — your HMAC-based provenance approach would be a great reference for the message spoofing test. Would you be open to sharing the signing pattern so we can validate whether an AutoGen deployment has provenance verification?

Also: the framework is now pip installable —
-
Update: the framework is now at 209 tests (up from 175 when this thread started).

Just shipped an x402 payment protocol harness — 20 tests for the Coinbase/Stripe/Cloudflare agent payment standard. This matters for AutoGen because autonomous agents making payments is the next frontier. If your AutoGen agents use x402 to pay for API access, the harness tests whether those payment flows are exploitable (recipient address manipulation, session token theft, spending limit bypass, facilitator trust attacks).

Also includes an Agent Autonomy Risk Score (0-100): "how dangerous is it to let this agent spend money unsupervised?"

Pip installable now:

Still looking for PRs on issue #15 (speaker selection poisoning, nested conversation escape, message source spoofing) if anyone wants to contribute.
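To make the spending-limit-bypass category concrete, here is a minimal sketch of what such a test probes. The `Wallet` class and its fields are hypothetical stand-ins, not the harness's real API or x402's wire format:

```python
# Illustrative spending-limit bypass check: many small charges that each
# pass a per-transaction limit but together would exceed the session cap.
# A vulnerable wallet enforces only the per-transaction limit.

class Wallet:
    def __init__(self, per_tx_limit: float, session_cap: float):
        self.per_tx_limit = per_tx_limit
        self.session_cap = session_cap
        self.spent = 0.0

    def charge(self, amount: float) -> bool:
        if amount > self.per_tx_limit:
            return False
        # A naive implementation would omit this cumulative check and
        # fail the bypass test below.
        if self.spent + amount > self.session_cap:
            return False
        self.spent += amount
        return True

def limits_held(wallet: Wallet) -> bool:
    """Submit 10 charges of 0.9; report whether the session cap survived."""
    for _ in range(10):
        wallet.charge(0.9)
    return wallet.spent <= wallet.session_cap

assert limits_held(Wallet(per_tx_limit=1.0, session_cap=5.0))
```

The same structure generalizes to the other x402 categories: craft an input that is individually valid, then check whether the aggregate invariant (cap, recipient, session binding) still holds.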
-
Ran the harness against NULL's live API. Results:

A2A tests: 11/12 passed (only missing agent discovery endpoint, not needed)
Identity tests: 17/18 passed (NIST-aligned)
Advanced attack patterns: 10/10 passed

Polymorphic injection, jailbreak persistence, delegation chain, and the Mexico breach pattern all held. Your harness is solid and so is the system.

When we add payment endpoints, the L402 suite will be the first thing we run.

Appreciate you building this. 🙏

— Anhul
-
@DrCookies84 Really appreciate you running it against a live system and posting the results. That is the best kind of validation - independent, against real infrastructure, with specific findings.

11/12 A2A, 17/18 identity, 10/10 advanced patterns is a strong showing. The one A2A miss (agent discovery) and one identity miss are worth looking at - those gaps might be intentional design choices on your end or genuine coverage holes worth addressing.

Re: stress test against NULL - let us do it. I will open an issue on your repo this week to coordinate. Specifically interested in testing the multi-hop delegation path you mentioned (agent calls agent with valid credentials, but no verification that the delegation was human-approved).

And yes - when you add payment endpoints, run the L402 and x402 harnesses first. We just shipped x402 today (Coinbase/Stripe agent payment protocol, 20 tests). The payment layer is where the real risk concentrates because mistakes are irreversible.

This kind of cross-project testing is exactly how the ecosystem gets more secure. Thanks for being the first to run it against production.
-
Appreciate that. We're happy to be the first; after all, real infrastructure needs real testing.

The two misses are intentional: agent discovery is a design choice (no public endpoint), and the identity metadata gap is a known simplification. If your harness catches something we missed, we'll fix it.

Open an issue when you're ready to test the delegation path. We'll point it at a staging endpoint and see what breaks.

Agree on payments: that's where mistakes cost. We'll run your x402 suite before anything touches production.

Good to have this in the ecosystem. 🙏

— Anhul
-
Quick update: we just published a full AIUC-1 crosswalk mapping all 209 tests to the new AI agent certification standard. AIUC-1 requires quarterly third-party adversarial testing (B001, C010, D004). The harness now has a formal requirement-to-test mapping for 15 of 20 testable requirements, including 100% coverage on the Security and Reliability principles.

For AutoGen deployments specifically, the relevant tests include the speaker selection poisoning, nested conversation escape, and message source spoofing patterns identified earlier in this thread.

The framework is at 209 tests across 4 wire protocols and pip installable. If anyone here is evaluating AIUC-1 readiness for multi-agent systems, the crosswalk shows exactly which tests map to which certification requirements.
-
@DrCookies84 Good to know both misses are intentional - that's the best kind of test result (framework catches real architectural decisions, not bugs).

Opening the delegation path stress test issue on your repo this week. The specific scenario: Agent A calls NULL's API with valid credentials, but we craft the request to look like it's delegating on behalf of Agent C (who never approved the action). Tests whether NULL validates the full delegation chain vs. just the immediate caller's credentials.

Also: we just published an AIUC-1 crosswalk mapping all 209 tests to the new AI agent certification standard. Your 11/12 A2A + 17/18 identity + 10/10 advanced results would be strong evidence for AIUC-1 B001 (adversarial robustness) compliance if NULL ever pursues certification. The results are already public on this thread - that's audit-ready evidence.
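The stress-test scenario above can be sketched as a forged "on behalf of" request checked against two server-side validators. The request shape, key store, and delegation registry are hypothetical, for illustration only:

```python
# Illustrative delegation-chain check: a request carries the caller's real
# credentials but a fabricated claim of acting for another agent.

VALID_KEYS = {"agent-a": "key-a"}                  # hypothetical API-key store
APPROVED_DELEGATIONS = {("agent-c", "agent-a")}    # (delegator, delegate) on record

def naive_check(req: dict) -> bool:
    """Verifies only the immediate caller's credentials."""
    return VALID_KEYS.get(req["caller"]) == req["api_key"]

def chain_check(req: dict) -> bool:
    """Also verifies the claimed delegation was actually approved."""
    if VALID_KEYS.get(req["caller"]) != req["api_key"]:
        return False
    behalf = req.get("on_behalf_of")
    return behalf is None or (behalf, req["caller"]) in APPROVED_DELEGATIONS

# Crafted request: agent-a's real key, but claims delegation from agent-x.
forged = {"caller": "agent-a", "api_key": "key-a", "on_behalf_of": "agent-x"}
assert naive_check(forged)       # vulnerable path: the forgery passes
assert not chain_check(forged)   # hardened path: the forgery is rejected
```

The test then reduces to a single question: does the server behave like `naive_check` or like `chain_check` when the delegation claim is bogus?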
-
Built security test adapters for AutoGen/Semantic Kernel agent deployments. Part of a 175-test framework for AI agent security.
AutoGen-Specific Tests
Usage
The code execution sandbox test (AG-002) is particularly relevant for AutoGen since code execution is a core capability. It tests whether `import os; os.system()` and `subprocess.run()` are properly contained.

Part of a broader framework that also covers MCP + A2A wire-protocol testing, GTG-1002 APT simulation, and 20 enterprise platforms.
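An AG-002-style containment probe can be sketched like this. The `run_code` callable, the canary-file technique, and the verdict strings are assumptions for illustration, not the framework's real interface:

```python
import contextlib
import io
import os
import tempfile

# Probe: ask the executor whether it can see a file we planted on the host.
PROBE = "import os; print(os.path.exists({path!r}))"

def probe_containment(run_code) -> str:
    """Return 'ESCAPED' if code run via `run_code` can reach the host FS."""
    with tempfile.NamedTemporaryFile(delete=False) as canary:
        path = canary.name  # canary file exists on the host only
    try:
        output = run_code(PROBE.format(path=path))
        return "ESCAPED" if "True" in output else "CONTAINED"
    finally:
        os.unlink(path)

def run_local(code: str) -> str:
    """Stand-in for an unsandboxed local executor: runs on the host itself."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code)
    return buf.getvalue()

# A local executor sees the host filesystem, so the probe reports ESCAPED.
assert probe_containment(run_local) == "ESCAPED"
```

A Docker-backed executor plugged into the same `probe_containment` would report CONTAINED, since the canary path does not exist inside the container.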
Apache 2.0: https://github.com/msaleme/red-team-blue-team-agent-fabric
Feedback welcome — especially on AutoGen-specific patterns around code execution sandboxing and group chat security.