Guidance on using Model Armor with streaming enabled #4251
marttinslucas started this conversation in General
Hi everyone,
I’m looking for guidance on the recommended usage of Model Armor when streaming responses are enabled in the Google ADK / GenAI stack.
In streaming mode, model outputs are delivered as chunked events, which makes it difficult to evaluate the full semantic context of a response from a Model Armor perspective. The analysis ends up being applied to partial text fragments rather than the complete model output.
I tested the Model Armor plugin available in adk-samples, but it does not seem to provide built-in support for aggregating streamed chunks before performing a consolidated evaluation.
As a result, I’m observing the following behavior:
• Some events are blocked by Model Armor mid-stream
• Subsequent text chunks continue to be emitted after the block
• The final response is therefore “broken”: moderation blocks some events, yet the user still receives partial or continuing output
This raises a few questions:
1. Is there an official recommendation from Google on how Model Armor should be used when streaming is enabled?
2. Is the expected pattern to:
• Buffer and reassemble all chunks before running Model Armor, or
• Apply Model Armor incrementally per chunk (and if so, how should blocking be handled)?
3. Are there any reference implementations or best practices for combining Model Armor with streaming in production systems?
4. Is this a known limitation of streaming + moderation, or is there a configuration we might be missing?
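For concreteness, here is a minimal Python sketch of the two patterns from question 2. The `moderate(text) -> bool` callable is a hypothetical stand-in for a real Model Armor sanitize call, and the blocklist check is purely illustrative; this is not the adk-samples plugin's API.

```python
from typing import Callable, Iterable, Iterator

def buffered_moderation(
    chunks: Iterable[str],
    moderate: Callable[[str], bool],
    fallback: str = "[response blocked]",
) -> Iterator[str]:
    """Pattern A: buffer and reassemble all chunks, then run one
    consolidated moderation check on the full text. Preserves full
    semantic context but sacrifices streaming latency."""
    full_text = "".join(chunks)
    yield full_text if moderate(full_text) else fallback

def incremental_moderation(
    chunks: Iterable[str],
    moderate: Callable[[str], bool],
    fallback: str = "[response blocked]",
) -> Iterator[str]:
    """Pattern B: moderate the accumulated text on every chunk and
    stop emitting as soon as a check fails. Keeps streaming latency,
    but already-emitted chunks cannot be retracted, so the user may
    still see a truncated partial response."""
    emitted: list[str] = []
    for chunk in chunks:
        if not moderate("".join(emitted) + chunk):
            yield fallback
            return
        emitted.append(chunk)
        yield chunk

# Illustrative stand-in check: block any text containing "forbidden".
check = lambda text: "forbidden" not in text

safe = list(buffered_moderation(["Hello, ", "world!"], check))
blocked = list(buffered_moderation(["some ", "forbidden ", "text"], check))
partial = list(incremental_moderation(["safe ", "forbidden"], check))
```

Note that `partial` ends up as `["safe ", "[response blocked]"]`, which reproduces the broken-response behavior described above: the incremental pattern leaks already-emitted text before the block lands, which is why a buffering (or windowed-buffering) step before consolidated evaluation seems necessary unless Model Armor offers native stream aggregation.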
Any guidance, documentation, or examples would be greatly appreciated.
Thanks in advance!