This ADR documents the design and decisions for extending the detector API with additional fields, to accommodate various outputs from models in the guardrails ecosystem.
This serves as an extension to ADR 003 - Detector API design. The detector API can be found at this GitHub page.
Libraries like the vllm-detector-adapter provide the detector API and make it easy to serve LLMs like Granite Guardian and Llama Guard as detectors through vLLM.
As the models undergo development, more information is being provided in model output. Llama Guard provides "unsafe" categories when input has been categorized as "unsafe", e.g. "unsafe\nS1" ref, and Granite Guardian 3.2 began to provide additional information such as confidence, e.g. "No\n<confidence> High </confidence>" to indicate No risk and High confidence in that decision ref.
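As a rough illustration of the extra information involved, the sketch below pulls category codes and a confidence value out of raw outputs shaped like the two examples above. The function name `parse_extra_fields` and the returned key names are hypothetical, and the parsing rules (including comma-separated categories) are assumptions based on those example strings, not the actual logic of any adapter library.

```python
import re
from typing import Any


def parse_extra_fields(raw_output: str) -> dict[str, Any]:
    """Extract additional information from a raw detector model output.

    Hypothetical helper: the formats handled here are inferred from the
    example strings quoted in this ADR only.
    """
    extra: dict[str, Any] = {}
    lines = raw_output.strip().splitlines()
    if not lines:
        return extra

    # Llama Guard style: first line is the verdict, second line lists categories.
    if lines[0].strip() == "unsafe" and len(lines) > 1:
        extra["categories"] = [c.strip() for c in lines[1].split(",") if c.strip()]

    # Granite Guardian 3.2 style: confidence wrapped in <confidence> tags.
    match = re.search(r"<confidence>\s*(.*?)\s*</confidence>", raw_output)
    if match:
        extra["confidence"] = match.group(1)

    return extra
```

Outputs that carry neither pattern simply yield an empty dictionary, so the helper does not assume every model provides extra information.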
At the time of writing, the detector API endpoints return two types of responses:
- `/text/contents` returns a list of lists of detections with spans. Each list of detections corresponds to the respective content in `contents` provided in the user request, so `contents` with 2 texts would return a list of 2 lists.
- Other endpoints return a list of detections, without spans.
Any placement of additional detector fields should account for the two types of responses.
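To make the two shapes concrete, here is a minimal sketch with made-up values. The span field names (`start`, `end`, `text`) and the helper `iter_detections` are assumptions for illustration rather than a copy of the API schema. It shows that a field placed on each detection object can be reached the same way in both response types.

```python
from typing import Any, Iterator

# /text/contents: one inner list per input text; each detection carries a span.
contents_response: list[list[dict[str, Any]]] = [
    [
        {
            "start": 0,
            "end": 5,
            "text": "goose",
            "detection_type": "animal",
            "detection": "goose",
            "score": 0.2,
        }
    ],
    [],  # the second input text produced no detections
]

# Other endpoints: a flat list of detections, without spans.
other_response: list[dict[str, Any]] = [
    {"detection_type": "animal", "detection": "goose", "score": 0.2}
]


def iter_detections(response: list[Any]) -> Iterator[dict[str, Any]]:
    """Yield each detection object from either response shape."""
    for item in response:
        if isinstance(item, list):
            # Nested shape: flatten the per-content list.
            yield from item
        else:
            yield item
```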
- This ADR considers fields that apply on each particular detection or decision made by the detector model.
- Knowledge of future model plans is being restricted here, so only a few examples are given with already released model functionalities.
- We will add a new high-level field of `metadata` to account for additional information from detector models. This field will provide a dictionary with string keys and arbitrary values, so that values are not constrained to particular types like strings or floats. This will enable flexibility and is how APIs like Llama Stack provide additional information, whether on datasets or models.
Example

    {
        "detection_type": "animal",
        "detection": "goose",
        "score": 0.2,
        "evidence": [],
        "metadata": {
            "confidence": "High",
            "key": 0.3,
            "categories": ["bird"]
        }
    }

- To distinguish `metadata` from the existing `evidence` field, any attributes under `evidence` are meant to help answer: "Why was this decision made?"
- `metadata` will just present information, and the orchestrator will not alter workflow directions based on any information within the `metadata`. The orchestrator is currently not designed to take any action or decision based on model outputs, as the API is designed to present information to the orchestrator API user or consuming application. The user or application can then decide what to do with the information, whether doing another generation call, masking text, or further presenting the info to that consuming application's users.
- The updates will affect what endpoints of the detector API return, and changes will be reflected on the orchestrator API as well.
- To keep the experience consistent among various detector API endpoints, i.e. `/text/contents` vs. others, any added fields will be on the same level, e.g. on the same level as `detection`.
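One way to picture the decision is a detection object whose `metadata` is typed as a dictionary with string keys and unconstrained values. The `Detection` dataclass below is a hypothetical model written for this illustration, not the API's actual schema definition.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Detection:
    """Hypothetical model of one detection object, for illustration only."""

    detection_type: str
    detection: str
    score: float
    evidence: list[dict[str, Any]] = field(default_factory=list)
    # String keys with arbitrary JSON-serializable values, as this ADR proposes.
    metadata: dict[str, Any] = field(default_factory=dict)


detection = Detection(
    detection_type="animal",
    detection="goose",
    score=0.2,
    metadata={"confidence": "High", "key": 0.3, "categories": ["bird"]},
)

# `metadata` sits on the same level as `detection` in the serialized object.
payload = json.dumps(asdict(detection), indent=2)
```

Because `metadata` defaults to an empty dictionary, detectors that have nothing extra to report would not need to change how they construct a detection in this sketch.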
A few alternate strategies were considered with pros and cons documented.
a. features
Example

    {
        "detection_type": "animal",
        "detection": "goose",
        "score": 0.2,
        "evidence": [],
        "features": {
            "confidence": "High",
            "key": 0.3,
            "categories": ["bird"]
        }
    }

Pros:
- Slightly more descriptive than just `metadata`

Cons:
- `features` might not be appropriate for all attributes
- May be confused with model 'features' in relation to data
- Similar to the `metadata` case, addition of `features` could create potential confusion with `evidence`
- Similar to the `metadata` case, arbitrary keys and values will be difficult to validate, but implementations also do not have to validate.
b. attributes - This is a more general term than features but can be considered more restricting than metadata. Not all fields may be considered attributes of the decision.
c. controls - This concept may be too Granite specific ref. Fields like confidence are also not "controlled" or requested by the user.
Currently, a list of `evidence` can be provided, with arbitrary string attributes as `name` and a corresponding string `value` and float `score`, with nested evidence as necessary. `value` and `score` may not be appropriate for each field or attribute case and can be optional.
Example

    {
        "detection_type": "animal",
        "detection": "goose",
        "score": 0.2,
        "evidence": [
            {
                "name": "confidence",
                "value": "High"
            },
            {
                "name": "categories",
                "value": "bird"
            }
        ]
    }

Pros:
- The current API can remain the same
- Generally flexible to various attributes and values that are strings, floats

Cons:
- Not all fields or attributes are necessarily appropriate as `evidence` or explanatory toward the particular detection but may be providing more information
- `value` is constrained to string and `score` is constrained to float currently. For some fields, `value` may be more appropriate as another data format.
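The type constraint in the last point can be sketched as follows. The helper `to_evidence` is hypothetical and assumes, as described above, that `value` must be a string and `score` a float. Anything else has to be encoded somehow, here as a JSON string that consumers would then need to decode again.

```python
import json
from typing import Any


def to_evidence(fields: dict[str, Any]) -> list[dict[str, Any]]:
    """Map arbitrary fields onto evidence entries (hypothetical helper)."""
    evidence: list[dict[str, Any]] = []
    for name, value in fields.items():
        if isinstance(value, str):
            evidence.append({"name": name, "value": value})
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            evidence.append({"name": name, "score": float(value)})
        else:
            # Lists, dicts, booleans, etc. fit neither `value` nor `score` directly.
            evidence.append({"name": name, "value": json.dumps(value)})
    return evidence


entries = to_evidence({"confidence": "High", "key": 0.3, "categories": ["bird"]})
```

In this sketch the `categories` list ends up as an encoded string, which is the kind of workaround a dictionary with arbitrary values avoids.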
Alternatively, additional fields could be added directly at the same level as `evidence`, without a wrapping field.
Example

    {
        "detection_type": "animal",
        "detection": "goose",
        "score": 0.2,
        "evidence": [],
        "confidence": "High"
    }

Pros:
- Similar to the `metadata` case, enables a lot of flexibility
- Potentially slightly less confusion than introducing a completely separate field alongside `evidence`, but would still require delineating what is not `evidence`

Cons:
- More so than the `metadata` case, this will be more difficult to document in the API and will make it harder for users to know which fields to expect on responses, especially if different detectors provide different fields
- Arbitrary keys and values will be difficult to validate
- Still raises the question of what goes under the existing `evidence` or is put at the higher level
- Both detector API users and orchestrator API users will see additional fields reflected with detection results.
- The APIs will handle additional model outputs as model versions are released.
- API users will be able to parse the `metadata` field to receive additional model information.
- Implementers of the detector API can use the `metadata` field to provide additional model information.
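On the consuming side, an application might read the field defensively, since keys can differ between detectors and `metadata` may be absent or empty. This is a minimal sketch; the function `get_confidence` and the choice of the `confidence` key are illustrative assumptions, and what the application does with the value remains its own decision.

```python
from typing import Any, Optional


def get_confidence(detection: dict[str, Any]) -> Optional[str]:
    """Return the confidence string from a detection's metadata, if present."""
    # Treat a missing or null `metadata` field as an empty dictionary.
    metadata = detection.get("metadata") or {}
    confidence = metadata.get("confidence")
    # Values are arbitrary, so check the type before relying on it.
    return confidence if isinstance(confidence, str) else None
```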
Proposed