Conversation

hiento09 (Contributor)
This pull request updates the container startup command in the apps/jan-inference-model/Dockerfile to enhance model serving performance and resource management. The new command introduces several runtime options to better control memory usage, execution mode, and model input size.

Model serving configuration improvements (a sketch of the combined command follows this list):

  • Added --gpu-memory-utilization 0.65 to cap GPU memory usage at 65%, helping prevent out-of-memory errors and leaving headroom for other processes.
  • Enabled --enforce-eager to force eager execution mode, which can simplify debugging and improve compatibility, usually at some cost to throughput.
  • Set --max_model_len 32768 to allow a context length of up to 32,768 tokens (prompt plus generated output), supporting larger prompts or documents.
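
These flags match vLLM's OpenAI-compatible server CLI, so the combined CMD would look roughly like the sketch below. This is a minimal illustration under that assumption, not the actual Dockerfile from this PR: the base image, model name, and port are placeholders.

```dockerfile
# Hypothetical sketch only; the real apps/jan-inference-model/Dockerfile is not shown in this thread.
# Assumes the official vLLM OpenAI-compatible server image as the base.
FROM vllm/vllm-openai:latest

# The vllm/vllm-openai image sets the API server as its ENTRYPOINT,
# so CMD supplies only the arguments. Model name and port are placeholders.
CMD ["--model", "org/model-name", \
     "--port", "8000", \
     "--gpu-memory-utilization", "0.65", \
     "--enforce-eager", \
     "--max_model_len", "32768"]
```

On a base image without that entrypoint, the same flags would be passed to an explicit command instead, e.g. `python3 -m vllm.entrypoints.openai.api_server` with identical arguments.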

@hiento09 self-assigned this Aug 12, 2025

@Minh141120 (Member) commented:

LGTM!

@hiento09 merged commit 1518fb0 into dev Aug 12, 2025
1 check passed
@hiento09 deleted the fix/dockerfile-inference-model branch August 12, 2025 03:29
jjchen01 added a commit that referenced this pull request Aug 12, 2025
* feat: add ci workflows (#18)

* chore: update jan inference model Dockerfile cmd (#22)

---------

Co-authored-by: hiento09 <[email protected]>