Important
This package is in early development and subject to rapid changes. Breaking changes between versions are likely as the project evolves.
The framework provides useful classes and helpers for creating DIAL Interceptors in Python for chat completion and embedding models.
An interceptor could be thought of as a middleware that
- modifies an incoming DIAL request received from the client (or it may leave it as is)
- calls upstream DIAL application (the upstream for short) with the modified request
- modifies the response from the upstream (or it may leave it as is)
- returns the modified response to the client
The upstream is encapsulated behind a special deployment id interceptor. This deployment id is resolved by the DIAL Core into an appropriate deployment id.
Interceptors could be classified into the following categories:
- Pre-interceptors that only modify the incoming request from the client (e.g. rejecting requests following certain criteria)
- Post-interceptors that only modify the response received from the upstream (e.g. censoring the response)
- Generic interceptors that modify both the incoming request and the response from the upstream (e.g. caching the responses)
To create chat completion interceptor one needs to implement instance of the class ChatCompletionInterceptor and for embedding interceptor - EmbeddingsInterceptor.
See example interceptor implementations for more details.
Copy .env.example to .env and customize it for your environment:
| Variable | Default | Description |
|---|---|---|
| LOG_LEVEL | INFO | Log level. Use DEBUG for dev purposes and INFO in prod |
| WEB_CONCURRENCY | 1 | Number of workers for the server |
| DIAL_URL | The URL of the DIAL Core server |
This project uses Python>=3.11 and Poetry>=2.1.1 as a dependency manager.
Check out Poetry's documentation on how to install it on your system before proceeding.
To install requirements:
poetry installThis will install all requirements for running the package, linting, formatting and tests.
The recommended IDE is VSCode. Open the project in VSCode and install the recommended extensions.
The VSCode is configured to use PEP-8 compatible formatter Black.
Alternatively you can use PyCharm.
Set-up the Black formatter for PyCharm manually or install PyCharm>=2023.2 with built-in Black support.
As of now, Windows distributions do not include the make tool. To run make commands, the tool can be installed using the following command (since Windows 10):
winget install GnuWin32.MakeFor convenience, the tool folder can be added to the PATH environment variable as C:\Program Files (x86)\GnuWin32\bin.
The command definitions inside Makefile should be cross-platform to keep the development environment setup simple.
To run the linting before committing:
make lintTo auto-fix formatting issues run:
make formatTo run unit tests:
make testTo remove the virtual environment and build artifacts:
make cleanThe repository also provides examples of various DIAL Interceptors all packed into a DIAL service.
Keep in mind, that the following example interceptors aren't ready for a production use. They are provided solely as examples to demonstrate basic use cases of interceptors. Use at your discretion.
| Interceptor name | Category | Description |
|---|---|---|
| reply-as-pirate | Pre | Injects systems prompt Reply as a pirate to the request |
| reject-external-links | Pre | Rejects any URL in DIAL attachments which do not point to DIAL Core storage |
| reject-blacklisted-words | Generic | Rejects the request if it contains any blacklisted words |
| image-watermark | Post | Stamps "EPAM DIAL" watermark on all image attachments in the response. Demonstrates how to work with files stored on DIAL File Storage. |
| statistics-reporter | Post | Collects statistics on the response stream (tokens/sec, finish reason, completion tokens etc) and reports it in a new stage when response is finished |
| spacy-anonymizer | Generic | Anonymizes PII in the request via Spacy library, calls the upstream, deanonymizes the response. The list of anonymized entities is configurable via SPACY_ANONYMIZER_LABELS_TO_REDACT env variable |
| google-dlp-anonymizer | Generic | Anonymizes PII in the request via Google DLP API, calls the upstream, deanonymizes the response. The list of anonymized entities could be specified in the interceptor configuration |
| langfuse | Generic | Integration with Langfuse |
| replicator:N | Generic | Calls the upstream N times and combines the N response into a single response. Could be useful for stabilization of model's output, since certain models aren't deterministic. |
| cache | Generic | Caches incoming chat completion requests. Not ready for production use. Use at your discretion |
| no-op | Generic | No-op interceptor - does not modify the request or the response, simply proxies the upstream |
| Interceptor name | Category | Description |
|---|---|---|
| reject-blacklisted-words | Pre | Rejects the request if it contains any blacklisted words |
| normalize-vector | Post | Normalizes the vector in the response |
| project-vector:N | Post | Changes the dimensionality of the vectors in the response to N, where N is an integer path parameter. |
| no-op | Generic | No-op interceptor - does not modify the request or the response, simply proxies the upstream |
Copy .env.example to .env and customize it for your environment:
| Variable | Default | Description |
|---|---|---|
| SPACY_ANONYMIZER_LABELS_TO_REDACT | PERSON,ORG,GPE,PRODUCT | Comma-separated list of spaCy entity types to redact. Find the full list of entities here. |
| GOOGLE_DLP_ANONYMIZER_INFO_TYPES_TO_DE_IDENTIFY | PHONE_NUMBER,FIRST_NAME,LAST_NAME | Comma-separated list of Google info types to de-identify. The full list of InfoType's for anonymization could be found in the Google DLP documentation. Alternatively, info types could be configured on per-deployment basis in the DAIL Core Config. |
| GCP_PROJECT_ID | GCP project ID used by google-dlp-anonymizer interceptor. The required IAM Role to access the DLP de-identify API is DLP User. |
|
| LANGFUSE_SECRET_KEY | Langfuse secret key | |
| LANGFUSE_PUBLIC_KEY | Langfuse public key | |
| LANGFUSE_HOST | Langfuse server host |
To run the server with examples using pip:
pip install uvicorn python-dotenv "aidial-interceptors-sdk[examples]"
echo "DIAL_URL=URL" > .env
uvicorn "aidial_interceptors_sdk.examples.app:app" --host "0.0.0.0" --port 5000 --env-file ./.envDon't forget to set the appropriate DIAL_URL in the .env file.
The command will start the server on http://localhost:5000 exposing endpoints for each of the interceptors like the following:
http://localhost:5000/openai/deployments/spacy-anonymizer/chat/completionshttp://localhost:5000/openai/deployments/normalize-vector/embeddings
First clone the repository:
git clone https://github.com/epam/ai-dial-interceptors-sdk.git
cd ai-dial-interceptors-sdk
echo "DIAL_URL=URL" > .envThen run dev server with examples:
make examples_serveOr run the server from Docker container:
make examples_docker_serveThe interceptor endpoints are defined in the interceptors section of the DIAL Core configuration like this:
{
"interceptors": {
"chat-reply-as-pirate": {
"endpoint": "${INTERCEPTOR_SERVICE_URL}/openai/deployments/reply-as-pirate/chat/completions"
},
"chat-statistics-reporter": {
"endpoint": "${INTERCEPTOR_SERVICE_URL}/openai/deployments/statistics-reporter/chat/completions"
},
"chat-google-dlp-anonymizer": {
"endpoint": "${INTERCEPTOR_SERVICE_URL}/openai/deployments/google-dlp-anonymizer/chat/completions"
}
}
}where INTERCEPTOR_SERVICE_URL is the URL of the interceptor service, which is http://localhost:5000 when run locally, or the interceptor service URL when deployed within Kubernetes.
The declared interceptors could be then attached to particular models and applications:
{
"models": {
"anthropic.claude-v3-haiku": {
"type": "chat",
"iconUrl": "anthropic.svg",
"endpoint": "${BEDROCK_ADAPTER_SERVICE_URL}/openai/deployments/anthropic.claude-3-haiku-20240307-v1:0/chat/completions",
"interceptors": [
"chat-statistics-reporter",
"chat-reply-as-pirate"
]
}
}
}Make sure that
- chat completion interceptors are only used in chat models or applications,
- embeddings interceptors are only used in embeddings models.
The stack of interceptors in DIAL works similarly to a stack of middlewares in Express.js or Django:
Client -> (original request) ->
Interceptor 1 -> (modified request #1) ->
Interceptor 2 -> (modified request #2) ->
Upstream -> (original response) ->
Interceptor 2 -> (modified response #1) ->
Interceptor 1 -> (modified response #2) ->
ClientEvery request/response in the diagram above goes through the DIAL Core. This is hidden from the diagram for brevity.
If an interceptor support configuration, it must expose /configuration endpoint which must return JSON schema of the configuration. This configuration endpoint must be specified under feature.configurationEndpoint fields in the DIAL Core configuration.
The interceptor configuration could be preset on the per-interceptor basis in DIAL Core configuration via the defaults field:
{
"interceptors": {
"chat-google-dlp-anonymizer": {
"endpoint": "${INTERCEPTOR_ORIGIN}/openai/deployments/google-dlp-anonymizer/chat/completions",
"features": {
"configurationEndpoint": "${INTERCEPTOR_ORIGIN}/openai/deployments/google-dlp-anonymizer/configuration",
},
"defaults": {
"custom_fields": {
"interceptor_configuration": "$interceptor_configuration"
}
}
}
},
"models": {
"anthropic.claude-v3-haiku": {
"type": "chat",
"iconUrl": "anthropic.svg",
"endpoint": "${BEDROCK_ADAPTER_SERVICE_URL}/openai/deployments/anthropic.claude-3-haiku-20240307-v1:0/chat/completions",
"interceptors": [
"chat-google-dlp-anonymizer"
]
}
}
}The given defaults field means that DIAL Core will enrich chat completion request sent to the interceptor with custom_fields.interceptor_configuration field equal to $interceptor_configuration. This JSON value must follow the JSON schema exposed by the /configuration endpoint.
The interceptor allows to configure the entities in the text that are going to be identified and replaced with placeholders.
Here is an example of $interceptor_configuration for the interceptor:
{
"deidentification_config": {
"info_types": [
"PHONE_NUMBER",
"FIRST_NAME",
"LAST_NAME"
]
}
}The full list of targets for anonymization (aka info-types) could be found in the Google DLP documentation.
The list of info types in the DIAL Core config overrides over the one configured in the GOOGLE_DLP_ANONYMIZER_INFO_TYPES_TO_DE_IDENTIFY environment variable.