Description
I am trying to use a custom chat template and tool parser plugin with the djl-serving container, but I am not sure how to correctly specify and load these files using OPTION_CHAT_TEMPLATE and OPTION_TOOL_PARSER_PLUGIN.
I placed the chat template and the custom tool parser in the model artifact:
s3://<path-to-model>/ -+- config.json
                       +- my_template.jinja
                       +- my_tool_calling_parser.py
                       +- (other files)
How to Reproduce?
I tried to deploy the endpoint using the following code.
import sagemaker
from sagemaker.djl_inference import DJLModel, DJLPredictor
import json
import os
role = sagemaker.get_execution_role()
sess = sagemaker.Session()
djl_image_uri = "<myaccount>.dkr.ecr.ap-northeast-1.amazonaws.com/bedrock/djl-serving:0.33.0-lmi"
instance_type = "ml.g6.12xlarge"
endpoint_base_name = "my-endpoint"
# model s3 uri
model_id = "s3://<path-to-model>/"
envs = {
    "VLLM_ATTENTION_BACKEND": "FLASHINFER",
    "VLLM_USE_V1": "0",
    "OPTION_ROLLING_BATCH": "vllm",
    "OPTION_MAX_ROLLING_BATCH_SIZE": "64",
    "OPTION_ENABLE_STEAMING": "true",
    "OPTION_KV_CACHE_DTYPE": "fp8",
    "OPTION_ENABLE_AUTO_TOOL_CHOICE": "true",
    "OPTION_CHAT_TEMPLATE": "./my_template.jinja",
    "OPTION_TOOL_PARSER_PLUGIN": "./my_tool_calling_parser.py",
    "OPTION_TOOL_CALL_PARSER": "my_parser",
}
model = DJLModel(
    model_id=model_id,
    image_uri=djl_image_uri,
    tensor_parallel_degree=1,
    job_queue_size=1024,
    model_loading_timeout=300,
    role=role,
    env=envs,
)
endpoint_name = sagemaker.utils.name_from_base(endpoint_base_name)
predictor = model.deploy(
    kms_key="",
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=endpoint_name,
    container_startup_health_check_timeout=600,
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer(),
)
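Before deploying, it may help to confirm that the files named in the environment settings are actually present in the artifact. The following is a minimal sketch, not part of my original deployment code: `find_missing_files` is a hypothetical helper, and it assumes a local copy of the model artifact directory (for example, one synced down from S3). It only checks that the files exist; it says nothing about how the container resolves relative paths.

```python
from pathlib import Path

# Environment keys from the deployment above whose values name files in the artifact.
FILE_OPTION_KEYS = ("OPTION_CHAT_TEMPLATE", "OPTION_TOOL_PARSER_PLUGIN")


def find_missing_files(envs, artifact_dir):
    """Return {env_key: path} for file options that do not exist under artifact_dir.

    Hypothetical helper: `artifact_dir` is assumed to be a local copy of the
    model artifact, and relative values are resolved against it.
    """
    root = Path(artifact_dir)
    missing = {}
    for key in FILE_OPTION_KEYS:
        value = envs.get(key)
        if value is None:
            continue
        # "./my_template.jinja" resolves to <artifact_dir>/my_template.jinja
        candidate = root / value
        if not candidate.is_file():
            missing[key] = str(candidate)
    return missing
```

An empty result means both files were found in the local copy, so a load failure would then more likely come from path resolution or plugin registration inside the container.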
Error Message
This error occurred after executing model.deploy:
[INFO ] PyProcess - W-156-2bd0e06687b290c-stdout: Value error, Invalid tool call parser: my_parser (chose from { granite-20b-fc,granite,hermes,internlm,jamba,llama4_json,llama3_json,mistral,phi4_mini_json,pythonic }) [type=value_error, input_value={'handler': 'handle', 'ch...32b7b82c0df2c0b917edc7'}, input_type=dict]
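The error lists the parser names the container recognized when it validated the configuration, and `my_parser` is not among them, which suggests the plugin had not registered that name when the check ran. Below is a minimal sketch for pulling that list out of a log line; `registered_parsers` is a hypothetical helper, and the regular expression assumes the exact `(chose from { ... })` wording shown above.

```python
import re

# Matches the "(chose from { a,b,c })" fragment of the validation error.
_CHOICES_RE = re.compile(r"chose from \{\s*([^}]*?)\s*\}")


def registered_parsers(log_line):
    """Return the parser names listed in the error, or an empty list if absent."""
    match = _CHOICES_RE.search(log_line)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


line = (
    "Value error, Invalid tool call parser: my_parser (chose from "
    "{ granite-20b-fc,granite,hermes,internlm,jamba,llama4_json,"
    "llama3_json,mistral,phi4_mini_json,pythonic })"
)
names = registered_parsers(line)
print("my_parser" in names)  # False
print(len(names))  # 10
```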
Question
How can I correctly specify and use a custom chat template and a tool parser plugin file in the djl-serving container? Is there a specific way to reference these files, for example a particular base directory for relative paths?
Thank you in advance for your help!