Hi, I hit a boundary-condition issue when hosting an SGLang server.
Error message:
Requested token count exceeds the model's maximum context length of 16384 tokens. You requested a total of 16384 tokens: 14860 tokens from the input messages and 1524 tokens for the completion.
In this case, input_token_num + max_new_tokens == 16384 (exactly equal to the max context length), but the request is still rejected.
Could you explain why the check is >= _max_req_len rather than > _max_req_len? Thank you.
The check in question is in sglang/python/sglang/srt/managers/tokenizer_manager.py, lines 746 to 771 (commit e6d5a21).
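For context, here is a minimal sketch of the kind of boundary check being asked about. The names and structure are illustrative only, not the actual tokenizer_manager.py code: with >=, a request whose total exactly equals the context length is rejected, whereas > would accept it.

```python
# Hypothetical sketch of the length check; `input_token_num`, `max_new_tokens`,
# and `context_len` follow the wording of the error message, not the SGLang source.
def validate_request(input_token_num: int, max_new_tokens: int, context_len: int) -> None:
    total = input_token_num + max_new_tokens
    # With `>=`, the exact-boundary case (14860 + 1524 == 16384) is rejected;
    # a `>` comparison would let it through.
    if total >= context_len:
        raise ValueError(
            f"Requested token count exceeds the model's maximum context length of "
            f"{context_len} tokens. You requested a total of {total} tokens: "
            f"{input_token_num} tokens from the input messages and "
            f"{max_new_tokens} tokens for the completion."
        )


try:
    validate_request(14860, 1524, 16384)
except ValueError as e:
    print(e)  # reproduces the rejection at the exact boundary
```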