
[Question] How to use OPTION_CHAT_TEMPLATE and OPTION_TOOL_PARSER_PLUGIN. #2860

@kanno-go

Description

I am trying to use a custom chat template and tool parser plugin with the djl-serving container, but I am not sure how to correctly specify and load these files using OPTION_CHAT_TEMPLATE and OPTION_TOOL_PARSER_PLUGIN.

I prepared the chat template and the custom tool parser in the model artifact:

<s3://<path-to-model>/> -+- config.json
                         +- my_template.jinja
                         +- my_tool_calling_parser.py
                         +- (other files)
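
For reference, my_tool_calling_parser.py registers the name "my_parser" with vLLM's ToolParserManager, and my_template.jinja is an ordinary Jinja chat template. Below is a simplified sketch of the parser file (the class name and the JSON-only parsing logic are placeholders, assuming vLLM's ToolParser / ToolParserManager plugin interface):

# my_tool_calling_parser.py -- simplified sketch; the real file registers
# the parser under the name "my_parser" used in OPTION_TOOL_CALL_PARSER.
import json

from vllm.entrypoints.openai.protocol import (ChatCompletionRequest,
                                              ExtractedToolCallInformation,
                                              FunctionCall, ToolCall)
from vllm.entrypoints.openai.tool_parsers.abstract_tool_parser import (
    ToolParser, ToolParserManager)


@ToolParserManager.register_module("my_parser")
class MyToolCallingParser(ToolParser):
    # Placeholder logic: treats the whole model output as one JSON tool call,
    # e.g. {"name": "get_weather", "arguments": {"city": "Tokyo"}}
    def extract_tool_calls(
            self, model_output: str,
            request: ChatCompletionRequest) -> ExtractedToolCallInformation:
        try:
            call = json.loads(model_output)
            return ExtractedToolCallInformation(
                tools_called=True,
                tool_calls=[
                    ToolCall(function=FunctionCall(
                        name=call["name"],
                        arguments=json.dumps(call["arguments"])))
                ],
                content=None)
        except (json.JSONDecodeError, KeyError, TypeError):
            # Not a tool call; return the raw text as normal content.
            return ExtractedToolCallInformation(
                tools_called=False, tool_calls=[], content=model_output)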
 

How to Reproduce?

I tried to deploy the endpoint using the following code.

import sagemaker
from sagemaker.djl_inference import DJLModel, DJLPredictor
import json
import os
 
role = sagemaker.get_execution_role()
sess = sagemaker.Session()
 
djl_image_uri = "<myaccount>.dkr.ecr.ap-northeast-1.amazonaws.com/bedrock/djl-serving:0.33.0-lmi"
 
instance_type = "ml.g6.12xlarge"
 
endpoint_base_name = "my-endpoint"
 
# model s3 uri
model_id = "s3://<path-to-model>/"
 
envs = {
    "VLLM_ATTENTION_BACKEND": "FLASHINFER",
    "VLLM_USE_V1": "0",
    "OPTION_ROLLING_BATCH": "vllm",
    "OPTION_MAX_ROLLING_BATCH_SIZE": "64",
    "OPTION_ENABLE_STREAMING": "true",
    "OPTION_KV_CACHE_DTYPE": "fp8",
    "OPTION_ENABLE_AUTO_TOOL_CHOICE": "true",
    "OPTION_CHAT_TEMPLATE": "./my_template.jinja",
    "OPTION_TOOL_PARSER_PLUGIN": "./my_tool_calling_parser.py",
    "OPTION_TOOL_CALL_PARSER": "my_parser",
}
 
model = DJLModel(
    model_id=model_id,
    image_uri=djl_image_uri,
    tensor_parallel_degree=1,
    job_queue_size=1024,
    model_loading_timeout=300,
    role=role,
    env=envs
)
 
endpoint_name = sagemaker.utils.name_from_base(endpoint_base_name)
predictor = model.deploy(
    kms_key="",
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=endpoint_name,
    container_startup_health_check_timeout=600,
    serializer=sagemaker.serializers.JSONSerializer(),
    deserializer=sagemaker.deserializers.JSONDeserializer(),
)
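
For context, once the template and parser load I intend to call the endpoint with an OpenAI-style chat completions payload along these lines (a sketch; the get_weather tool is a made-up example, and I am assuming the chat completions schema that the vLLM rolling batch exposes):

# Hypothetical request showing why the chat template and tool parser matter.
payload = {
    "messages": [
        {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
    "max_tokens": 256,
}

response = predictor.predict(payload)
print(response)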

Error Message

This error occurred after executing model.deploy:

[INFO ] PyProcess - W-156-2bd0e06687b290c-stdout:   Value error, Invalid tool call parser: my_parser (chose from { granite-20b-fc,granite,hermes,internlm,jamba,llama4_json,llama3_json,mistral,phi4_mini_json,pythonic }) [type=value_error, input_value={'handler': 'handle', 'ch...32b7b82c0df2c0b917edc7'}, input_type=dict]

Question

How can I correctly specify and load the custom chat template and tool parser plugin files in the djl-serving container? Is there a specific way to reference these files (for example, relative vs. absolute paths)?

Thank you in advance for your help!
