- Testing duration $\ge$ 10 mins.
- Sample concatenation permutation is enabled.

## Plugin System for `mlperf-inf-mm-q3vl benchmark`

The `mlperf-inf-mm-q3vl` package supports a plugin system that allows third-party
packages to register additional subcommands under `mlperf-inf-mm-q3vl benchmark`. This
uses Python's standard entry points mechanism.

The purpose of this feature is to let benchmark result submitters customize and fit
`mlperf-inf-mm-q3vl` to the inference system that they would like to benchmark,
**without** modifying the source code of `mlperf-inf-mm-q3vl`, which is
frozen once the benchmark is finalized.

### How it works

1. **Plugin Discovery**: When the CLI starts, it automatically discovers all registered
   plugins via the `mlperf_inf_mm_q3vl.benchmark_plugins` entry point group.
2. **Plugin Loading**: Each plugin's entry point function is called to retrieve either a
   single command or a Typer app.
3. **Command Registration**: The plugin's commands are automatically added to the
   `benchmark` subcommand group.

### Example: creating a `mlperf-inf-mm-q3vl-foo` plugin package for `mlperf-inf-mm-q3vl benchmark foo`

#### Step 1: Package Structure

Create a new Python package with the following structure:

```
mlperf-inf-mm-q3vl-foo/
├── pyproject.toml
└── src/
    └── mlperf_inf_mm_q3vl_foo/
        ├── __init__.py
        └── plugin.py
```
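One way to scaffold this layout from the shell (directory and file names follow the example above):

```shell
mkdir -p mlperf-inf-mm-q3vl-foo/src/mlperf_inf_mm_q3vl_foo
cd mlperf-inf-mm-q3vl-foo
touch pyproject.toml \
      src/mlperf_inf_mm_q3vl_foo/__init__.py \
      src/mlperf_inf_mm_q3vl_foo/plugin.py
```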

#### Step 2: Implement the `mlperf-inf-mm-q3vl-foo` plugin

Create your plugin entry point function in `plugin.py`:

```python
"""Plugin to support benchmarking the Foo inference system."""

from collections.abc import Callable
from typing import Annotated

from loguru import logger
from typer import Option

from mlperf_inf_mm_q3vl.log import setup_loguru_for_benchmark
from mlperf_inf_mm_q3vl.schema import Dataset, Settings, Verbosity

from .schema import FooEndpoint


def register_foo_benchmark() -> Callable[..., None]:
    """Entry point for the plugin to benchmark the Foo inference system.

    This function is called when the CLI discovers the plugin.
    It should return either:
    - A single command function (decorated with appropriate options)
    - A tuple of (Typer app, command name) for more complex hierarchies
    """

    def benchmark_foo(
        *,
        settings: Settings,
        dataset: Dataset,
        # Add your foo-specific parameters here
        foo: FooEndpoint,
        custom_param: Annotated[
            int,
            Option(help="Custom parameter for the Foo backend."),
        ] = 2,
        random_seed: Annotated[
            int,
            Option(help="The seed for the random number generator."),
        ] = 12345,
        verbosity: Annotated[
            Verbosity,
            Option(help="The verbosity level of the logger."),
        ] = Verbosity.INFO,
    ) -> None:
        """Deploy and benchmark using the Foo backend.

        This command deploys a model using the Foo backend
        and runs the MLPerf benchmark against it.
        """
        # Lazy imports keep CLI startup fast; adjust the `run_benchmark`
        # import path to wherever the core package actually exposes it.
        from mlperf_inf_mm_q3vl.benchmark import run_benchmark

        from .deploy import FooDeployer

        setup_loguru_for_benchmark(settings=settings, verbosity=verbosity)
        logger.info(
            "Start benchmarking the Foo inference system with endpoint spec {} and custom param {}",
            foo,
            custom_param,
        )
        # Your implementation here
        with FooDeployer(endpoint=foo, settings=settings, custom_param=custom_param):
            # FooDeployer ensures that Foo is deployed and currently healthy.
            # Run the benchmark using the core run_benchmark function.
            run_benchmark(
                settings=settings,
                dataset=dataset,
                endpoint=foo,
                random_seed=random_seed,
            )

    # Return the command function.
    # The entry point name will be used as the subcommand name.
    return benchmark_foo
```

#### Step 3: Configure `pyproject.toml`

Register the plugin in its package's `pyproject.toml`:

```toml
[project]
name = "mlperf-inf-mm-q3vl-foo"
version = "0.1.0"
description = "Enable mlperf-inf-mm-q3vl to benchmark the Foo inference system."
requires-python = ">=3.12"
dependencies = [
    "mlperf-inf-mm-q3vl @ git+https://github.com/mlcommons/inference.git#subdirectory=multimodal/qwen3-vl/",
    # Add your backend-specific dependencies here
]

[project.entry-points."mlperf_inf_mm_q3vl.benchmark_plugins"]
# The key here becomes the subcommand name.
foo = "mlperf_inf_mm_q3vl_foo.plugin:register_foo_benchmark"

[build-system]
requires = ["setuptools>=80"]
build-backend = "setuptools.build_meta"
```

#### Step 4: Install and use `mlperf-inf-mm-q3vl benchmark foo`

```bash
# Install your plugin package
pip install mlperf-inf-mm-q3vl-foo

# The new subcommand is now available
mlperf-inf-mm-q3vl benchmark foo --help
mlperf-inf-mm-q3vl benchmark foo \
    --settings-file settings.toml \
    --dataset shopify-global-catalogue \
    --custom-param 3
```

#### Advanced: Nested Subcommands

If you want to create multiple subcommands under a single plugin (e.g.,
`mlperf-inf-mm-q3vl benchmark foo standard` and
`mlperf-inf-mm-q3vl benchmark foo optimized`), return a tuple of `(Typer app, name)`:

```python
# Imported at module level so the return annotation resolves at definition time.
from pydantic_typer import Typer


def register_foo_benchmark() -> tuple[Typer, str]:
    """Entry point that creates nested subcommands."""
    # Create a Typer app for your plugin
    foo_app = Typer(help="Benchmarking options for the Foo inference system.")

    @foo_app.command(name="standard")
    def foo_standard() -> None:  # parameters elided for brevity
        """Run the standard Foo benchmark."""
        ...

    @foo_app.command(name="optimized")
    def foo_optimized() -> None:  # parameters elided for brevity
        """Run the optimized Foo benchmark with maximum performance."""
        ...

    # Return a tuple of (app, command name)
    return (foo_app, "foo")
```

This will create:

- `mlperf-inf-mm-q3vl benchmark foo standard`
- `mlperf-inf-mm-q3vl benchmark foo optimized`

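On the CLI side, the two return shapes (a bare command function from Step 2, or the `(app, name)` tuple above) can be told apart roughly as follows. This is only a sketch of the registration logic: `attach_plugin` and `benchmark_group` are hypothetical names, with the method names mirroring Typer's `command`/`add_typer` API:

```python
def attach_plugin(benchmark_group, plugin_result, entry_point_name: str) -> None:
    """Register a plugin's return value under the `benchmark` group.

    An (app, name) tuple is mounted as a nested sub-app; a bare function
    becomes a single subcommand named after its entry point.
    """
    if isinstance(plugin_result, tuple):
        app, name = plugin_result
        benchmark_group.add_typer(app, name=name)
    else:
        benchmark_group.command(name=entry_point_name)(plugin_result)
```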
### Best Practices

1. Dependencies: Declare `mlperf-inf-mm-q3vl` as a dependency in your plugin package.
2. Documentation: Provide clear docstrings for your plugin commands; they appear in
   `--help` output.
3. Schema Reuse: Reuse the core `Settings`, `Dataset`, and other schemas from
   `mlperf_inf_mm_q3vl.schema` for consistency and to minimize boilerplate code.
4. Lazy Imports: If your plugin has heavy dependencies, import them inside functions
   rather than at module level to avoid slowing down CLI startup.
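The lazy-import pattern from item 4, sketched with the stdlib `json` module standing in for a heavy backend SDK:

```python
def benchmark_foo_cmd() -> str:
    """Command body; the heavy dependency is imported only when it runs."""
    # Lazy import: `json` is a stand-in for a heavy backend SDK here. CLI
    # startup (which only needs the function object for --help) never pays
    # this import cost; it is paid on the first actual invocation.
    import json

    return json.dumps({"status": "ok"})
```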

## Developer Guide
