
Conversation

@simone-chen simone-chen commented Jan 27, 2026

Overview:

Set cuda_graph_batch_sizes to a logarithmic series of batch sizes rather than capturing a graph for every batch size with step 1.

Details:

  • Refactored the _eval function in the rule engine to support a wider range of Python expressions (e.g. list comprehensions) for better flexibility. Instead of using jinja's expression compiler, the rules are now evaluated in Python with asteval.
    • Why: The jinja expression compiler supports only a limited subset of Python expressions and cannot express the creation of a logarithmic series (see the sketch after this list). Without this change, the series would have to be defined in every extra_engine_args*.yaml.j2.
      >>> from jinja2 import Environment
      >>> env = Environment()
      >>> expr = env.compile_expression("[x * 2 for x in numbers]")
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/home/simonec/virtual_envs/venv_aic/lib/python3.12/site-packages/jinja2/environment.py", line 812, in compile_expression
          self.handle_exception(source=source)
        File "/home/simonec/virtual_envs/venv_aic/lib/python3.12/site-packages/jinja2/environment.py", line 942, in handle_exception
          raise rewrite_traceback_stack(source=source)
        File "<unknown>", line 1, in template
      jinja2.exceptions.TemplateSyntaxError: expected token ',', got 'for'
    
    • Intention: All computation logic should be centralized in .rule files instead of being scattered across jinja templates, ensuring consistency and maintainability.
  • Set cuda_graph_batch_sizes to a logarithmic series of batch sizes rather than capturing all batch sizes with step 1. This avoids excessive resource consumption at runtime.
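For contrast with the jinja failure above, here is a minimal interactive sketch of the same expression evaluating cleanly under asteval (assumes the asteval package is installed; the numbers symbol is a stand-in for illustration, not part of the actual rules):

      >>> from asteval import Interpreter
      >>> aeval = Interpreter()
      >>> aeval.symtable["numbers"] = [1, 2, 3]  # bind a sample variable into the evaluator's symbol table
      >>> aeval("[x * 2 for x in numbers]")      # the list comprehension jinja rejects
      [2, 4, 6]

asteval evaluates against its own restricted symbol table rather than the full Python environment, which fits the rule-engine use case better than a bare eval().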

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

copy-pr-bot bot commented Jan 27, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the fix label Jan 27, 2026
…on syntax support

Signed-off-by: Simone Chen <simonec@nvidia.com>
@simone-chen simone-chen force-pushed the simonec/trtc-199-aic-generator-fix-cuda-graph-batch-sizes branch from e8cbb5e to 740ad71 on January 27, 2026 at 20:07
agg enable_mixed_chunk = true

- agg_prefill_decode cuda_graph_batch_sizes = ((range(1, max_batch_size + 1) | list) if max_batch_size else [])
+ agg_prefill_decode cuda_graph_batch_sizes = ([x for x in [1,2,4,8,16,32,64,128,256,512,1024] if x < max_batch_size] + [max_batch_size] if max_batch_size else [])
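As a quick sanity check, the new expression can be evaluated by hand (max_batch_size = 48 is a hypothetical value chosen for illustration):

      >>> max_batch_size = 48
      >>> [x for x in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024] if x < max_batch_size] + [max_batch_size] if max_batch_size else []
      [1, 2, 4, 8, 16, 32, 48]

With this input the engine would capture 7 graphs instead of the 48 produced by the old range(1, max_batch_size + 1) rule.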
A reviewer (Contributor) commented:
This part is still pending discussion in PR #245.
We need a meeting to finalize this fix.
