
perf: short-circuit limited dataset JSON previews #936

Merged
Zigfreidish merged 1 commit into main from perf/cron-registered-python-slice-202605121224 on May 12, 2026

@Zigfreidish
Collaborator

Summary

  • Short-circuit limited .json dataset previews: when a limit is set, incrementally decode rows from a canonical top-level array (or a top-level rows / data array) instead of fully materializing the payload and only then applying the limit.
  • Add focused regression coverage for the limited JSON preview path and edge coverage for fallback behavior.
  • Extend the registered dataset-registry-preview-limit-short-circuit focused commands to include the new regression tests.
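The incremental decode described above can be sketched with the standard library's `json.JSONDecoder.raw_decode`, which decodes one JSON value starting at a given index and returns it together with the index where it ended. This is a minimal sketch, not the PR's `_limited_rows_from_json_text`: the names `limited_rows` and `_skip_ws` are hypothetical, and the sketch only handles a top-level array (the PR also locates a `rows` / `data` array inside a top-level object). Returning `None` stands in for "use the full-decode fallback".

```python
from __future__ import annotations

import json
from typing import Any

_DECODER = json.JSONDecoder()
_JSON_WHITESPACE = " \t\n\r"  # the four whitespace characters JSON allows


def _skip_ws(text: str, cursor: int) -> int:
    while cursor < len(text) and text[cursor] in _JSON_WHITESPACE:
        cursor += 1
    return cursor


def limited_rows(text: str, limit: int) -> list[Any] | None:
    """Decode at most `limit` elements of a top-level JSON array.

    Returns None when the text is not a top-level array or is malformed
    before the limit is reached, so the caller can fall back to json.loads.
    """
    cursor = _skip_ws(text, 0)
    if cursor >= len(text) or text[cursor] != "[":
        return None
    cursor = _skip_ws(text, cursor + 1)
    rows: list[Any] = []
    if cursor < len(text) and text[cursor] == "]":
        return rows  # empty array
    while len(rows) < limit:
        try:
            # raw_decode parses one value and returns (value, end_index).
            value, cursor = _DECODER.raw_decode(text, cursor)
        except json.JSONDecodeError:
            return None
        rows.append(value)
        cursor = _skip_ws(text, cursor)
        if cursor < len(text) and text[cursor] == ",":
            cursor = _skip_ws(text, cursor + 1)
        elif cursor < len(text) and text[cursor] == "]":
            break  # array ended before the limit
        else:
            return None  # missing separator or truncated text
    return rows
```

In this sketch nothing after the first `limit` elements is parsed, so `limited_rows("[1, 2, {broken", 2)` returns `[1, 2]`. The input string itself is still held in full; the saving comes from not building Python objects for the remaining rows.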

Plan or Spec

  • docs/plans/2026-05-08-dataset-json-preview-limit.md
  • Registered probe: dataset-registry-preview-limit-short-circuit in infra/perf/pr_scoped_probes.json.

Commands Run

# Baseline before implementation, from origin/main worktree
PYTHONPATH="$PWD:$PWD/services/mlx-worker-python" uv run --project services/mlx-worker-python python3 scripts/dataset_registry_preview_limit_probe.py
exit 0
{"elapsed_ms_mean": 132.835093, "elapsed_ms_min": 126.472688, "file_count": 50000.0, "peak_bytes_mean": 17588055.286, "rows_returned": 1.0, "sample_count": 7.0}

PYTHONPATH="$PWD:$PWD/services/mlx-worker-python" uv run --project services/mlx-worker-python pytest -q services/mlx-worker-python/tests/test_dataset_registry.py::test_dataset_catalog_json_row_reader_limit_uses_incremental_decode
exit 0
1 passed in 0.14s

# Registered probe test_command for dataset-registry-preview-limit-short-circuit
python3 - <<'PY'
import json, subprocess
from pathlib import Path
cmd=[x for x in json.loads(Path('infra/perf/pr_scoped_probes.json').read_text()) if x['id']=='dataset-registry-preview-limit-short-circuit'][0]['test_command']
raise SystemExit(subprocess.run(cmd, shell=True).returncode)
PY
exit 0
14 passed in 0.55s

# Registered probe coverage_command after adding the helper edge tests
python3 - <<'PY'
import json, subprocess
from pathlib import Path
cmd=[x for x in json.loads(Path('infra/perf/pr_scoped_probes.json').read_text()) if x['id']=='dataset-registry-preview-limit-short-circuit'][0]['coverage_command']
raise SystemExit(subprocess.run(cmd, shell=True).returncode)
PY
exit 0
15 passed in 0.45s
TOTAL 111 0 100%

# Registered probe_command after implementation
python3 - <<'PY'
import json, subprocess
from pathlib import Path
cmd=[x for x in json.loads(Path('infra/perf/pr_scoped_probes.json').read_text()) if x['id']=='dataset-registry-preview-limit-short-circuit'][0]['probe_command']
raise SystemExit(subprocess.run(cmd, shell=True).returncode)
PY
exit 0
{"elapsed_ms_mean": 2.523591, "elapsed_ms_min": 2.352464, "file_count": 50000.0, "peak_bytes_mean": 5361759.571, "rows_returned": 1.0, "sample_count": 7.0}

python3 -m json.tool infra/perf/pr_scoped_probes.json >/tmp/pr_scoped_probes.valid.json
exit 0

git diff --check
exit 0
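The three inline heredocs above differ only in which field they read from the probe registry. A small reusable helper could replace them. This is a hypothetical sketch (`registered_command` and `run_registered_command` are illustrative names) that assumes, as the heredocs do, that infra/perf/pr_scoped_probes.json is a JSON array of objects keyed by `id`.

```python
import json
import subprocess
from pathlib import Path

# Path taken from the commands above; the registry layout is an assumption.
REGISTRY = Path("infra/perf/pr_scoped_probes.json")


def registered_command(probe_id: str, field: str, registry: Path = REGISTRY) -> str:
    """Return one command string (e.g. "test_command") for a registered probe."""
    probes = json.loads(registry.read_text())
    for probe in probes:
        if probe.get("id") == probe_id:
            return probe[field]
    raise KeyError(f"probe {probe_id!r} not found in {registry}")


def run_registered_command(probe_id: str, field: str, registry: Path = REGISTRY) -> int:
    """Run the command through the shell, as the heredocs do, and return its exit code."""
    command = registered_command(probe_id, field, registry)
    return subprocess.run(command, shell=True).returncode
```

For example, `raise SystemExit(run_registered_command("dataset-registry-preview-limit-short-circuit", "probe_command"))` would mirror the last heredoc. Raising `KeyError` for an unknown id gives a clearer error than the `IndexError` that the inline `[0]` lookup raises when nothing matches.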

Coverage and Metrics

  • Changed-scope coverage: 100% from the registered coverage_command (TOTAL 111 0 100%, i.e. 111 statements, 0 missed).
  • Local Linux registered probe (dataset-registry-preview-limit-short-circuit):
    • old_mean=132.835093 ms, new_mean=2.523591 ms, delta_ms=-130.311502, speedup=52.64x, elapsed_reduction=98.10%.
    • old_peak_bytes_mean=17588055.286, new_peak_bytes_mean=5361759.571, peak_delta_bytes=-12226295.715, peak_reduction=69.51%.
  • CI PR-scoped performance validation is required before merge and should report the registered probe result for this PR.
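The derived figures above follow from the two probe outputs: delta is new minus old, speedup is old divided by new, and reduction is the magnitude of the delta as a percentage of the old value. A small sketch (the `compare` name is illustrative) reproduces them from the baseline and post-implementation probe means.

```python
def compare(old: float, new: float) -> dict[str, float]:
    """Derive delta, speedup and percentage reduction from two probe means."""
    delta = new - old
    return {
        "delta": delta,
        "speedup": old / new,
        "reduction_pct": -delta / old * 100.0,
    }


elapsed = compare(132.835093, 2.523591)    # elapsed_ms_mean, before vs after
peak = compare(17588055.286, 5361759.571)  # peak_bytes_mean, before vs after
print(f"delta_ms={elapsed['delta']:.6f} speedup={elapsed['speedup']:.2f}x "
      f"elapsed_reduction={elapsed['reduction_pct']:.2f}%")
print(f"peak_delta_bytes={peak['delta']:.3f} peak_reduction={peak['reduction_pct']:.2f}%")
```

Running it prints the same delta_ms, speedup, elapsed_reduction, peak_delta_bytes and peak_reduction values listed in the bullets above.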

Known Gaps

  • Local validation is Linux/Python-only. No Swift runtime behavior is touched.
  • The incremental JSON path intentionally handles only canonical payloads: a top-level array, or a top-level object whose rows or data key holds an array. Other JSON shapes keep the existing full-decode fallback.

Evidence Checklist

  • Relevant plan or spec is identified.
  • Behavior changes are reflected in the relevant docs.
  • Protocol changes include regenerated generated artifacts. N/A: no protocol changes.
  • Dependency changes include updated lockfiles. N/A: no dependency changes.
  • Relevant tests were run.
  • A metrics report is included, or N/A is stated explicitly with the reason.
  • Deferred work and known gaps are stated explicitly.

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces an incremental JSON decoding mechanism to the dataset registry catalog to optimize row reading when a limit is specified. By using json.JSONDecoder().raw_decode, the system can now extract a specific number of rows from large JSON files without parsing the entire payload, which significantly improves performance for data previews. The changes include the implementation of _limited_rows_from_json_text and _json_text_first_array_start, along with comprehensive unit tests and updated performance probe configurations. The reviewer suggested refactoring the repeated whitespace-skipping logic into a dedicated helper function to improve maintainability.
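The approach relies on two properties of `json.JSONDecoder.raw_decode`: it decodes only the first JSON value at the given index and returns that value with the index just past it, and, unlike `json.loads`, it does not skip leading whitespace. The second point is why the quoted helper advances past whitespace before every `raw_decode` call. A short demonstration using only the standard library:

```python
import json

decoder = json.JSONDecoder()

# Decodes one value and reports where it ended; trailing text is not examined.
value, end = decoder.raw_decode('{"a": 1} trailing text')
print(value, end)  # {'a': 1} 8

# Decoding can start mid-string, e.g. at the first element of an array.
first, end = decoder.raw_decode("[10, 20]", 1)
print(first, end)  # 10 3

# Leading whitespace is not skipped, so this raises JSONDecodeError.
try:
    decoder.raw_decode("  1")
except json.JSONDecodeError as exc:
    print("error:", exc.msg)  # error: Expecting value
```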

Comment on lines +666 to +712
```python
def _json_text_first_array_start(json_text: str) -> int | None:
    cursor = 0
    text_length = len(json_text)
    while cursor < text_length and json_text[cursor].isspace():
        cursor += 1
    if cursor >= text_length:
        return None
    if json_text[cursor] == "[":
        return cursor
    if json_text[cursor] != "{":
        return None

    decoder = json.JSONDecoder()
    cursor += 1
    while cursor < text_length:
        while cursor < text_length and json_text[cursor].isspace():
            cursor += 1
        if cursor >= text_length or json_text[cursor] == "}":
            return None
        try:
            key, cursor = decoder.raw_decode(json_text, cursor)
        except json.JSONDecodeError:
            return None
        if not isinstance(key, str):
            return None
        while cursor < text_length and json_text[cursor].isspace():
            cursor += 1
        if cursor >= text_length or json_text[cursor] != ":":
            return None
        cursor += 1
        while cursor < text_length and json_text[cursor].isspace():
            cursor += 1
        if key in {"rows", "data"} and cursor < text_length and json_text[cursor] == "[":
            return cursor
        try:
            _, cursor = decoder.raw_decode(json_text, cursor)
        except json.JSONDecodeError:
            return None
        while cursor < text_length and json_text[cursor].isspace():
            cursor += 1
        if cursor < text_length and json_text[cursor] == ",":
            cursor += 1
            continue
        if cursor < text_length and json_text[cursor] == "}":
            return None
        return None
    return None
```
Contributor


Priority: medium

The whitespace-skipping loop, `while cursor < text_length and json_text[cursor].isspace(): cursor += 1`, is repeated several times in this function and also in `_limited_rows_from_json_text`. To improve maintainability and reduce duplication, consider extracting it into a small helper function.

For example:

def _skip_whitespace(text: str, cursor: int, text_length: int) -> int:
    while cursor < text_length and text[cursor].isspace():
        cursor += 1
    return cursor

You could then replace the repeated while loops with a call to this helper, like cursor = _skip_whitespace(json_text, cursor, text_length).

@github-actions

Melix PR Scoped Performance Report

  • Status: ok
  • Changed files: 3
  • Selected probes: 100
  • Direct/gated probes: 3
  • Regressions: 0
  • Context regressions: 6
  • Verification failures: 0

Changed Files

  • infra/perf/pr_scoped_probes.json
  • services/mlx-worker-python/tests/test_dataset_registry.py
  • services/mlx-worker-python/worker/dataset_registry/catalog.py

Dataset registry split match string stem

  • Status: ok
  • Gate: direct
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 134.768 135.923 +1.155 (+0.86%) neutral
path_constructor_calls_mean 0.000 0.000 +0.000 neutral
peak_bytes_mean 42353.400 42353.400 +0.000 (+0.00%) neutral
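In these tables, Delta appears to be Head minus Base followed by the change as a percentage of Base, with the percentage omitted when Base is 0 (as in the path_constructor_calls_mean row above). A hypothetical formatter that reproduces that layout is handy for checking a row by hand; it is inferred from the report, not taken from the reporting tool. The last digit can differ from the report, because the report appears to compute Delta from unrounded values (for example, 282.305 minus 273.526 is 8.779, while the Text family config row shows +8.780).

```python
def format_delta(base: float, head: float) -> str:
    """Format a Delta cell the way the report tables appear to.

    Assumption inferred from the report: Delta = head - base, with the
    percentage relative to base, omitted when base is zero.
    """
    delta = head - base
    if base == 0:
        return f"{delta:+.3f}"
    return f"{delta:+.3f} ({delta / base * 100:+.2f}%)"


print(format_delta(6.843, 10.124))  # MLX-VLM family-config cache, elapsed_ms_mean
print(format_delta(4.746, 4.068))   # Deterministic rerank query-context reuse, elapsed_ms_mean
print(format_delta(0.0, 0.0))       # zero base: no percentage
```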

Hub catalog tag normalization single pass

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 256.365 253.966 -2.399 (-0.94%) neutral
tag_normalization_calls_mean 5000.000 5000.000 +0.000 (+0.00%) neutral

Hub catalog next cursor fast parse

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 1552.959 1557.598 +4.639 (+0.30%) neutral
cursor_parse_calls_mean 50000.000 50000.000 +0.000 (+0.00%) neutral
peak_bytes_mean 11604.800 11078.400 -526.400 (-4.54%) neutral

Dataset registry snapshot inference single pass

  • Status: ok
  • Gate: direct
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 745.920 739.328 -6.592 (-0.88%) neutral
legacy_inference_helper_calls_mean 0.000 0.000 +0.000 neutral
peak_bytes_mean 1044929.600 1046969.600 +2040.000 (+0.20%) neutral

Multimodal fast-path signature top-level key cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 10424.389 10458.469 +34.080 (+0.33%) neutral

Multimodal preprocessing local URI parse elision

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 285.542 288.021 +2.479 (+0.87%) neutral
urlparse_calls_mean 5000.000 5000.000 +0.000 (+0.00%) neutral
read_bytes_calls_mean 5000.000 5000.000 +0.000 (+0.00%) neutral

Runtime utils kwarg signature cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 19.482 20.146 +0.664 (+3.41%) neutral
inspect_signature_calls_mean 1.000 1.000 +0.000 (+0.00%) neutral

Runtime utils package version cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 8.514 8.760 +0.246 (+2.89%) neutral
metadata_version_calls_mean 3.000 3.000 +0.000 (+0.00%) neutral

Runtime utils top-level weight streaming

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 459.490 457.937 -1.553 (-0.34%) neutral
peak_bytes_mean 216650.000 216650.000 +0.000 (+0.00%) neutral

MLX text stop kwarg signature cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 29.195 27.547 -1.648 (-5.64%) improvement
inspect_signature_calls_mean 2.000 2.000 +0.000 (+0.00%) neutral
stream_signature_calls_mean 1.000 1.000 +0.000 (+0.00%) neutral

MLX text stop-filter prefix cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 1472.068 1458.676 -13.392 (-0.91%) neutral
peak_bytes_mean 500956.000 500956.000 +0.000 (+0.00%) neutral
prefix_length_computations_mean 1.000 1.000 +0.000 (+0.00%) neutral

Text family config copy elision

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 273.526 282.305 +8.780 (+3.21%) neutral
peak_bytes_mean 3884.200 3884.200 +0.000 (+0.00%) neutral
config_copy_calls_mean 0.000 0.000 +0.000 neutral

MLX audio speech signature cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 10.356 10.399 +0.043 (+0.41%) neutral
inspect_signature_calls_mean 0.000 0.000 +0.000 neutral

Stream assembler parser-mode cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 4.118 4.057 -0.061 (-1.47%) neutral

Deterministic image edit digest reuse

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 15.093 15.967 +0.874 (+5.79%) neutral
digest_calls_mean 2.000 2.000 +0.000 (+0.00%) neutral

Deterministic image output byte accounting

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 19.840 19.590 -0.250 (-1.26%) neutral
output_byte_scan_calls_mean 0.000 0.000 +0.000 neutral

Deterministic embedding duplicate input cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 5.285 5.493 +0.208 (+3.94%) neutral
embed_text_calls_mean 512.000 512.000 +0.000 (+0.00%) neutral

Stream assembler structural prefix cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 975.134 980.758 +5.624 (+0.58%) neutral
peak_bytes_mean 182.000 182.000 +0.000 (+0.00%) neutral
prefix_identity_hits 1750000.000 1750000.000 +0.000 (+0.00%) neutral

Stream assembler token-byte fast decode

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 1851.526 1900.667 +49.141 (+2.65%) neutral
peak_bytes_mean 6128021.000 6128021.000 +0.000 (+0.00%) neutral
generated_token_count_mean 80000.000 80000.000 +0.000 (+0.00%) neutral

Benchmark evaluation report running aggregates

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (98.0%)
Metric Base Head Delta Status
load_input_ms_mean 7.562 6.470 -1.092 (-14.44%) improvement
elapsed_ms_mean 349.932 341.983 -7.949 (-2.27%) neutral
peak_bytes_mean 13625369.600 13625374.400 +4.800 (+0.00%) neutral

Benchmark export run-scan single pass

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 92.438 90.742 -1.696 (-1.83%) neutral
per_run_ms_mean 0.385 0.378 -0.007 (-1.83%) neutral
csv_elapsed_ms_mean 2.136 2.153 +0.017 (+0.82%) neutral

Benchmark queue decoded-record cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
cold_elapsed_ms 7.084 6.802 -0.282 (-3.99%) neutral
warm_elapsed_ms_mean 0.992 0.956 -0.037 (-3.68%) neutral
warm_json_loads_mean 0.000 0.000 +0.000 neutral

Benchmark store matrix streaming

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
peak_bytes_mean 178410.700 178482.700 +72.000 (+0.04%) neutral

Closure audit probe-source short circuit

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 3.488 3.504 +0.016 (+0.46%) neutral
peak_bytes_mean 33427.000 33355.000 -72.000 (-0.22%) neutral
probe_file_reads_mean 1.000 1.000 +0.000 (+0.00%) neutral

Phase8 metrics closure-audit reuse

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 1.055 1.054 -0.001 (-0.05%) neutral
closure_audit_calls_mean 1.000 1.000 +0.000 (+0.00%) neutral
sample_count 5.000 5.000 +0.000 (+0.00%) neutral

Evaluation job-id high-water mark

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 20.206 19.531 -0.675 (-3.34%) neutral
per_call_ms_mean 0.101 0.098 -0.003 (-3.34%) neutral

Evaluation sample probe aggregation

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 23.149 23.040 -0.109 (-0.47%) neutral
per_call_ms_mean 0.001 0.001 -0.000 (-0.43%) neutral

Evaluation answer normalization fast path

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 659.494 655.666 -3.827 (-0.58%) neutral
numeric_extract_calls_mean 300.000 300.000 +0.000 (+0.00%) neutral
option_extract_calls_mean 300.000 300.000 +0.000 (+0.00%) neutral

Evaluation latency percentile vector reuse

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 263.803 263.952 +0.149 (+0.06%) neutral
sorted_calls_mean 1.000 1.000 +0.000 (+0.00%) neutral

Evaluation dialogue diagnostics top-k

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 330.449 327.581 -2.867 (-0.87%) neutral
peak_bytes_mean 3132185.600 3132185.600 +0.000 (+0.00%) neutral

Evaluation compare target lookup early stop

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 1.760 1.758 -0.002 (-0.13%) neutral
get_loaded_model_calls_mean 12.000 12.000 +0.000 (+0.00%) neutral

Evaluation store compare summary CSV streaming

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 476.231 477.064 +0.833 (+0.17%) neutral
peak_bytes_mean 25997.000 25997.000 +0.000 (+0.00%) neutral

Evaluation store samples CSV streaming

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 1675.905 1583.773 -92.132 (-5.50%) improvement
peak_bytes_mean 160481.700 160388.300 -93.400 (-0.06%) neutral

Evaluation compare target lookup short-circuit

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 10.867 10.866 -0.001 (-0.01%) neutral
get_loaded_model_calls_mean 3.000 3.000 +0.000 (+0.00%) neutral

Evaluation final-result cache-hit materialization

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 2.377 2.352 -0.025 (-1.06%) neutral
peak_bytes_mean 2102670.000 2102670.000 +0.000 (+0.00%) neutral
sample_count 15000.000 15000.000 +0.000 (+0.00%) neutral
read_rows_calls_mean 0.000 0.000 +0.000 neutral
cache_hit_count 5.000 5.000 +0.000 (+0.00%) neutral

Evaluation final-result JSON typed-score running aggregate

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 1327.869 1295.347 -32.522 (-2.45%) neutral
peak_bytes_mean 871437.000 871437.000 +0.000 (+0.00%) neutral
score_checksum 35.000 35.000 +0.000 (+0.00%) neutral
key_count 2000.000 2000.000 +0.000 (+0.00%) neutral
iteration_count 40.000 40.000 +0.000 (+0.00%) neutral

Training config target-module cache

  • Status: regression
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 2229.685 2253.907 +24.222 (+1.09%) neutral
peak_bytes_mean 1675.143 1915.143 +240.000 (+14.33%) regression
checksum 900000.000 900000.000 +0.000 (+0.00%) neutral
iteration_count 50000.000 50000.000 +0.000 (+0.00%) neutral
case_count 4.000 4.000 +0.000 (+0.00%) neutral
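The summary counts at the top of the report (Regressions: 0, Context regressions: 6) suggest that a probe is marked as a regression when any one of its metrics regresses, as with the peak_bytes_mean row above, and that such probes are tallied separately by gate. A sketch of that roll-up follows; it is inferred from the report rather than taken from the tool, and `ProbeResult` and `summarize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    name: str
    gate: str  # "direct" or "context", as labelled in the report
    metric_statuses: list[str]  # per-metric Status column values

    @property
    def status(self) -> str:
        # Assumption: one regressing metric marks the whole probe as a regression.
        return "regression" if "regression" in self.metric_statuses else "ok"


def summarize(results: list[ProbeResult]) -> dict[str, int]:
    """Tally regressing probes by gate (assumed mapping to the summary counters)."""
    regressing = [r for r in results if r.status == "regression"]
    return {
        "regressions": sum(r.gate == "direct" for r in regressing),
        "context_regressions": sum(r.gate == "context" for r in regressing),
    }


report = summarize([
    ProbeResult("Dataset registry split match string stem", "direct",
                ["neutral", "neutral", "neutral"]),
    ProbeResult("Training config target-module cache", "context",
                ["neutral", "regression", "neutral", "neutral", "neutral"]),
])
print(report)  # {'regressions': 0, 'context_regressions': 1}
```

Under this reading, context-gated regressions are reported but do not count toward Regressions, which is consistent with the report's overall Status: ok.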

LoRA experiment run-dir name scan

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
peak_bytes_mean 3028014.000 3028014.000 +0.000 (+0.00%) neutral
path_attr_reads_mean 0.000 0.000 +0.000 neutral
run_dir_count 8000.000 8000.000 +0.000 (+0.00%) neutral

Training dataset quality/token summary

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 349.712 347.509 -2.203 (-0.63%) neutral
peak_bytes_mean 2202812.000 2202812.000 +0.000 (+0.00%) neutral

Training dataset validation split partial selection

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 799.310 800.561 +1.251 (+0.16%) neutral
peak_bytes_mean 620968.000 620968.000 +0.000 (+0.00%) neutral
validation_count 1500.000 1500.000 +0.000 (+0.00%) neutral
checksum 22440493.000 22440493.000 +0.000 (+0.00%) neutral

Training dataset validation sample-limit loading

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 0.536 0.544 +0.008 (+1.43%) neutral
peak_bytes_mean 29710.200 29710.200 +0.000 (+0.00%) neutral
validation_sample_count_mean 1.000 1.000 +0.000 (+0.00%) neutral

Maintenance bench report readback

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 51.280 47.105 -4.175 (-8.14%) improvement
bench_report_read_calls_mean 0.000 0.000 +0.000 neutral
request_id_calls_mean 1.000 1.000 +0.000 (+0.00%) neutral

Maintenance percentile vector reuse

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 306.409 294.174 -12.235 (-3.99%) neutral
sort_calls_mean 300.000 300.000 +0.000 (+0.00%) neutral

Maintenance prompt shape vector repeat

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 151.034 148.771 -2.263 (-1.50%) neutral
token_count_mean 5160960.000 5160960.000 +0.000 (+0.00%) neutral

Maintenance benchmark parameter normalization single convert

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 848.811 832.303 -16.508 (-1.94%) neutral
calls_per_value_mean 1.000 1.000 +0.000 (+0.00%) neutral
peak_bytes_mean 814430.600 814430.600 +0.000 (+0.00%) neutral

Upload receipt published-files scandir

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 5.713 5.744 +0.030 (+0.53%) neutral
special_entry_follow_dir_checks_mean 0.000 0.000 +0.000 neutral

Download pipeline snapshot manifest base reuse

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 73.806 72.815 -0.992 (-1.34%) neutral
elapsed_ms_min 72.296 71.243 -1.052 (-1.46%) neutral

Worker registry resident-bytes accumulator

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 0.022 0.023 +0.001 (+4.74%) neutral
loaded_model_listing_elapsed_ms_mean 8.572 8.337 -0.235 (-2.74%) neutral
loaded_model_listing_sort_calls_mean 1.000 1.000 +0.000 (+0.00%) neutral
request_stats_elapsed_ms_mean 0.015 0.015 +0.001 (+3.48%) neutral

Job registry derived-model single pass

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
active_manifest_elapsed_ms_mean 0.019 0.011 -0.008 (-43.11%) improvement
resolve_target_elapsed_ms_mean 0.002 0.002 -0.000 (-4.79%) neutral
manifest_path_elapsed_ms_mean 0.034 0.033 -0.000 (-1.40%) neutral
restore_elapsed_ms_mean 49.922 48.133 -1.789 (-3.58%) neutral

MLX-LM structured result tail parse

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 0.165 0.165 +0.000 (+0.13%) neutral
peak_bytes_mean 1767.000 1781.400 +14.400 (+0.81%) neutral

MLX-VLM family-config cache

  • Status: regression
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 6.843 10.124 +3.281 (+47.95%) regression
resolve_calls_mean 1.000 1.000 +0.000 (+0.00%) neutral

MLX-VLM Gemma4 weight-presence single-pass scan

  • Status: regression
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 1016.429 986.719 -29.709 (-2.92%) neutral
peak_bytes_mean 188.800 198.400 +9.600 (+5.08%) regression
visited_names_mean 1999960.000 1999960.000 +0.000 (+0.00%) neutral

Job registry restore sort elision

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
restore_elapsed_ms_mean 130.003 129.247 -0.755 (-0.58%) neutral
per_manifest_ms_mean 0.009 0.009 -0.000 (-0.58%) neutral

Deterministic rerank query-context reuse

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 4.746 4.068 -0.678 (-14.29%) improvement
query_context_builds_mean 1.000 1.000 +0.000 (+0.00%) neutral
score_calls_mean 64.000 64.000 +0.000 (+0.00%) neutral
tokenize_calls_mean 65.000 65.000 +0.000 (+0.00%) neutral
unique_document_count 64.000 64.000 +0.000 (+0.00%) neutral

Rerank core bounded top-k selection

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 2477.928 2480.137 +2.209 (+0.09%) neutral
peak_bytes_mean 710.857 710.857 +0.000 (+0.00%) neutral

PR-scoped performance registry cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
load_probe_registry_ms_mean 7.110 7.229 +0.120 (+1.69%) neutral
cold_load_probe_registry_ms_mean 156.393 157.506 +1.114 (+0.71%) neutral
build_scope_report_ms_mean 8.058 8.066 +0.008 (+0.10%) neutral

PR-scoped performance scope changed-files JSON read bytes

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 0.934 0.935 +0.001 (+0.07%) neutral
elapsed_ms_min 0.858 0.853 -0.005 (-0.56%) neutral

PR-scoped performance scope matcher

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
build_scope_report_ms_mean 2.684 2.178 -0.505 (-18.83%) improvement
selected_probe_count_mean 7.000 7.000 +0.000 (+0.00%) neutral
force_all_selected_mean 0.000 0.000 +0.000 neutral

PR-scoped performance report results scandir

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 34.057 33.697 -0.359 (-1.05%) neutral
elapsed_ms_min 32.120 31.223 -0.897 (-2.79%) neutral

Package macOS fallback build product scandir

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 1.205 1.212 +0.008 (+0.64%) neutral
elapsed_ms_min 1.174 1.186 +0.011 (+0.97%) neutral
cli_elapsed_ms_mean 1.198 1.205 +0.008 (+0.67%) neutral
cli_elapsed_ms_min 1.177 1.188 +0.011 (+0.93%) neutral

Dev-up MLX Metal dist-info scandir

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 1.689 1.687 -0.003 (-0.15%) neutral
elapsed_ms_min 1.667 1.662 -0.005 (-0.28%) neutral

Quantization gate manifest event streaming

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
events_consumed_mean 12.000 12.000 +0.000 (+0.00%) neutral
elapsed_ms_mean 0.241 0.234 -0.007 (-2.81%) neutral
elapsed_ms_min 0.207 0.196 -0.011 (-5.52%) improvement

Model ops bundle artifact byte accounting

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 2.325 1.972 -0.353 (-15.18%) improvement
bundle_scandir_calls_mean 0.000 0.000 +0.000 neutral

Model registry plain-local generation-config stat elision

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 127.779 127.928 +0.149 (+0.12%) neutral
generation_config_stat_calls_mean 0.000 0.000 +0.000 neutral
manifest_is_file_calls_mean 0.000 0.000 +0.000 neutral
config_load_calls_mean 400.000 400.000 +0.000 (+0.00%) neutral
manifest_parse_calls_mean 0.000 0.000 +0.000 neutral

Real model support HF cache latest snapshot

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 19.784 19.606 -0.178 (-0.90%) neutral
peak_bytes_mean 4733.900 4733.900 +0.000 (+0.00%) neutral

Swift CLI JSON envelope encoding

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass
Metric Base Head Delta Status
elapsed_ms_mean 349959.085 53689.178 -296269.907 (-84.66%) improvement

Code evaluation code-block last-match streaming

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 0.109 0.105 -0.003 (-2.85%) neutral
peak_bytes_mean 305.000 305.000 +0.000 (+0.00%) neutral

Code evaluation payload JSON byte loading

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 258.427 258.706 +0.279 (+0.11%) neutral
peak_bytes_mean 60425.000 60425.000 +0.000 (+0.00%) neutral

Code evaluation stdio tail single stat

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
stdio_stat_calls_mean 6000.000 6000.000 +0.000 (+0.00%) neutral
elapsed_ms_mean 44.076 43.982 -0.094 (-0.21%) neutral
sandbox_profile_elapsed_ms_mean 73.228 72.653 -0.575 (-0.78%) neutral
sandbox_profile_static_builds_mean 1.000 1.000 +0.000 (+0.00%) neutral

Code evaluation runner script cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 68.561 67.903 -0.657 (-0.96%) neutral
dedent_calls_mean 1.000 1.000 +0.000 (+0.00%) neutral
peak_bytes_mean 32731.143 32731.143 +0.000 (+0.00%) neutral
identity_reuse_mean 1.000 1.000 +0.000 (+0.00%) neutral
config_load_elapsed_ms_mean 68.977 68.643 -0.335 (-0.49%) neutral

Code evaluation fallback test count line scan

  • Status: regression
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 1.920 2.155 +0.235 (+12.21%) regression
peak_bytes_mean 24991.714 24991.714 +0.000 (+0.00%) neutral

Code evaluation nonblank test count streaming

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
peak_bytes_mean 112.000 112.000 +0.000 (+0.00%) neutral
elapsed_ms_mean 49.967 50.686 +0.718 (+1.44%) neutral
nonblank_line_count_mean 48000.000 48000.000 +0.000 (+0.00%) neutral

Changed-scope coverage empty-path short circuit

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 0.180 0.179 -0.002 (-1.03%) neutral
source_read_calls_mean 0.000 0.000 +0.000 neutral

Changed-scope coverage diff parser

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 3.454 3.470 +0.016 (+0.46%) neutral
changed_line_count 7680.000 7680.000 +0.000 (+0.00%) neutral
line_count 16080.000 16080.000 +0.000 (+0.00%) neutral

LoRA reward summary candidate min/max reuse

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 47.447 48.434 +0.987 (+2.08%) neutral
sorted_calls_mean 2.000 2.000 +0.000 (+0.00%) neutral

Deterministic embedding projection allocation

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 20.274 20.157 -0.117 (-0.58%) neutral
peak_bytes_mean 66938.333 66938.333 +0.000 (+0.00%) neutral

Deterministic OCR token count scan

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)
Metric Base Head Delta Status
elapsed_ms_mean 301.906 299.116 -2.790 (-0.92%) neutral
peak_bytes_mean 163.200 163.200 +0.000 (+0.00%) neutral

Deterministic VLM completion token scan

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| split_calls_mean | 0.000 | 0.000 | +0.000 | neutral |
| elapsed_ms_mean | 60.786 | 59.955 | -0.831 (-1.37%) | neutral |
| peak_bytes_mean | 111103.600 | 111998.000 | +894.400 (+0.81%) | neutral |
| completion_tokens | 6000.000 | 6000.000 | +0.000 (+0.00%) | neutral |

Statistical evidence bootstrap single sort

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 154.572 | 154.907 | +0.335 (+0.22%) | neutral |
| peak_bytes_mean | 47134.400 | 47206.400 | +72.000 (+0.15%) | neutral |
| sorted_calls_mean | 0.000 | 0.000 | +0.000 | neutral |

Statistical evidence category breakdown single pass

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 13.109 | 13.271 | +0.162 (+1.24%) | neutral |
| peak_bytes_mean | 16456.000 | 16456.000 | +0.000 (+0.00%) | neutral |

Video preprocessing URI byte length reuse

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 1115.293 | 1114.513 | -0.780 (-0.07%) | neutral |
| byte_length_getattrs_per_call | 1.000 | 1.000 | +0.000 (+0.00%) | neutral |
| parse_calls_per_call | 1.000 | 1.000 | +0.000 (+0.00%) | neutral |

Startup signals lazy worker log excerpts

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| conflict_elapsed_ms_mean | 1.406 | 1.405 | -0.001 (-0.04%) | neutral |
| conflict_log_reads_mean | 0.000 | 0.000 | +0.000 | neutral |
| control_crash_elapsed_ms_mean | 12.280 | 11.839 | -0.440 (-3.59%) | neutral |
| control_crash_log_reads_mean | 1.000 | 1.000 | +0.000 (+0.00%) | neutral |
| direct_control_crash_elapsed_ms_mean | 1.615 | 1.527 | -0.088 (-5.43%) | neutral |
| direct_control_crash_log_reads_mean | 0.000 | 0.000 | +0.000 | neutral |
| worker_crash_elapsed_ms_mean | 12.257 | 12.047 | -0.211 (-1.72%) | neutral |
| worker_crash_log_reads_mean | 1.000 | 1.000 | +0.000 (+0.00%) | neutral |
| tail_scan_elapsed_ms_mean | 134.162 | 134.660 | +0.499 (+0.37%) | neutral |
| tail_scan_peak_bytes_mean | 21436.200 | 21436.200 | +0.000 (+0.00%) | neutral |
| trailing_whitespace_bytes | 80000.000 | 80000.000 | +0.000 (+0.00%) | neutral |

Startup version comparison single pass

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 23.529 | 23.558 | +0.029 (+0.12%) | neutral |
| peak_bytes_mean | 112.000 | 112.000 | +0.000 (+0.00%) | neutral |
| comparison_total | -48.000 | -48.000 | +0.000 (-0.00%) | neutral |

Release gates M9 failure count single pass

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 6.856 | 6.948 | +0.092 (+1.33%) | neutral |
| endswith_checks_mean | 0.000 | 0.000 | +0.000 | neutral |
| failure_count_mean | 12800.000 | 12800.000 | +0.000 (+0.00%) | neutral |

Event extraction alignment accepted-edge cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 2146.798 | 2096.346 | -50.452 (-2.35%) | neutral |
| similarity_elapsed_ms_mean | 1.631 | 1.624 | -0.007 (-0.40%) | neutral |
| accepted_edges | 28.000 | 28.000 | +0.000 (+0.00%) | neutral |

Event extraction semantic value-group cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 9966.289 | 9666.536 | -299.753 (-3.01%) | neutral |
| peak_bytes_mean | 23414.400 | 23425.600 | +11.200 (+0.05%) | neutral |
| combination_build_calls_mean | 8.000 | 8.000 | +0.000 (+0.00%) | neutral |
| group_count_per_sample | 21200000.000 | 21200000.000 | +0.000 (+0.00%) | neutral |

Event extraction group actor alias cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 5.337 | 5.131 | -0.205 (-3.85%) | neutral |
| peak_bytes_mean | 5333.200 | 5333.200 | +0.000 (+0.00%) | neutral |
| normalize_calls_mean | 0.600 | 0.600 | +0.000 (+0.00%) | neutral |
| value_count | 200.000 | 200.000 | +0.000 (+0.00%) | neutral |

Event extraction fenced JSON trim

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 1268.698 | 1285.242 | +16.544 (+1.30%) | neutral |
| peak_bytes_mean | 3227320.800 | 3227320.800 | +0.000 (+0.00%) | neutral |
| event_count | 1600.000 | 1600.000 | +0.000 (+0.00%) | neutral |

MLX audio WAV PCM streaming

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 360.964 | 364.769 | +3.805 (+1.05%) | neutral |
| peak_bytes_mean | 699680.600 | 699680.600 | +0.000 (+0.00%) | neutral |

MLX audio local URI zero-copy preprocess

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 0.178 | 0.176 | -0.002 (-1.13%) | neutral |
| peak_bytes_mean | 2478.800 | 2522.000 | +43.200 (+1.74%) | neutral |
| local_uri_read_bytes_calls_mean | 0.000 | 0.000 | +0.000 | neutral |

Training dataset chunker top-level base copy

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 24.367 | 24.588 | +0.221 (+0.91%) | neutral |
| peak_bytes_mean | 774473.714 | 774473.714 | +0.000 (+0.00%) | neutral |

MLX audio generate signature cache

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| signature_calls_mean | 0.000 | 0.000 | +0.000 | neutral |
| elapsed_ms_mean | 69.951 | 69.052 | -0.899 (-1.29%) | neutral |

Dataset registry preview limit short-circuit

  • Status: ok
  • Gate: direct
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 156.706 | 2.286 | -154.420 (-98.54%) | improvement |
| peak_bytes_mean | 17588183.286 | 5361949.714 | -12226233.572 (-69.51%) | improvement |
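This direct-gated probe is the one this PR targets. The implementation itself is not shown on this page; the PR summary only says that limited `.json` previews now decode the canonical top-level `rows` / `data` arrays incrementally instead of materializing the whole payload before applying `limit`. A minimal sketch of that idea follows, using only the standard library's `json.JSONDecoder.raw_decode` to decode one array element at a time and stop once `limit` rows exist. `preview_rows`, `_decode_array_prefix`, `_full_decode_preview`, and the fallback rules are hypothetical. In this sketch the raw text is still held in memory and any values that precede the row array are still decoded; the saving comes from never building Python objects for rows past the limit.

```python
import json
import re
from typing import Any

# Insignificant whitespace as defined by the JSON grammar (RFC 8259).
_WS = re.compile(r"[ \t\n\r]*")
# Assumed canonical row keys, taken from the PR summary ("rows" / "data").
_ROW_KEYS = ("rows", "data")


def _skip_ws(text: str, idx: int) -> int:
    return _WS.match(text, idx).end()


def _full_decode_preview(text: str, limit: int) -> list[Any]:
    """Hypothetical fallback: materialize the whole payload, then slice."""
    payload = json.loads(text)
    if isinstance(payload, list):
        return payload[:limit]
    if isinstance(payload, dict):
        # Document order, matching the incremental path below.
        for key, value in payload.items():
            if key in _ROW_KEYS and isinstance(value, list):
                return value[:limit]
    return []


def _decode_array_prefix(
    decoder: json.JSONDecoder, text: str, idx: int, limit: int
) -> list[Any]:
    """Decode elements of the array at text[idx] == "[" until `limit` rows or "]"."""
    rows: list[Any] = []
    idx = _skip_ws(text, idx + 1)
    if text.startswith("]", idx):
        return rows
    while True:
        row, idx = decoder.raw_decode(text, idx)
        rows.append(row)
        if len(rows) >= limit:
            return rows  # short-circuit: later elements are never decoded
        idx = _skip_ws(text, idx)
        if text.startswith("]", idx):
            return rows
        if not text.startswith(",", idx):
            raise json.JSONDecodeError("Expecting ',' delimiter", text, idx)
        idx = _skip_ws(text, idx + 1)


def preview_rows(text: str, limit: int) -> list[Any]:
    """Return at most `limit` rows from a top-level {"rows": [...]} / {"data": [...]} object."""
    if limit <= 0:
        return []
    decoder = json.JSONDecoder()
    idx = _skip_ws(text, 0)
    if not text.startswith("{", idx):
        return _full_decode_preview(text, limit)  # not the canonical object shape
    idx = _skip_ws(text, idx + 1)
    while text.startswith('"', idx):
        key, idx = decoder.raw_decode(text, idx)  # object key
        idx = _skip_ws(text, idx)
        if not text.startswith(":", idx):
            break  # malformed: let the full decoder report the error
        idx = _skip_ws(text, idx + 1)
        if key in _ROW_KEYS and text.startswith("[", idx):
            return _decode_array_prefix(decoder, text, idx, limit)
        _, idx = decoder.raw_decode(text, idx)  # other values: decoded, then discarded
        idx = _skip_ws(text, idx)
        if not text.startswith(",", idx):
            break  # "}" or garbage: fall back for full validation
        idx = _skip_ws(text, idx + 1)
    return _full_decode_preview(text, limit)
```

Because the incremental path returns as soon as it holds `limit` rows, a malformed tail after those rows is never inspected, whereas the full-decode fallback would raise. Whether that relaxed validation is acceptable for a preview is a design decision this sketch does not settle.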

Hub catalog size hint regex precompile

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 1796.741 | 1774.073 | -22.668 (-1.26%) | neutral |
| size_hint_calls_mean | 40000.000 | 40000.000 | +0.000 (+0.00%) | neutral |

Multimodal preprocessing image URI single parse

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 47.925 | 48.965 | +1.039 (+2.17%) | neutral |
| urlparse_calls_mean | 320.000 | 320.000 | +0.000 (+0.00%) | neutral |

Quantization indexed shard min single pass

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 2.655 | 2.712 | +0.057 (+2.13%) | neutral |
| sorted_calls_mean | 0.000 | 0.000 | +0.000 | neutral |
| peak_bytes_mean | 232886.800 | 232949.200 | +62.400 (+0.03%) | neutral |

Quantization QAT source scan scandir

  • Status: regression
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 28.754 | 28.834 | +0.080 (+0.28%) | neutral |
| rglob_calls_mean | 0.000 | 0.000 | +0.000 | neutral |
| peak_bytes_mean | 1388809.200 | 1388814.000 | +4.800 (+0.00%) | neutral |
| source_stats_elapsed_ms_mean | 811.426 | 961.243 | +149.818 (+18.46%) | regression |
| source_stats_peak_bytes_mean | 2055170.000 | 2055448.400 | +278.400 (+0.01%) | neutral |
| source_stats_byte_count | 4000000.000 | 4000000.000 | +0.000 (+0.00%) | neutral |

Engine generate usage token elision

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| prompt_token_count_calls_per_request | 0.000 | 0.000 | +0.000 | neutral |
| request_state_append_calls_per_request | 0.000 | 0.000 | +0.000 | neutral |
| elapsed_ms_mean | 20.580 | 20.616 | +0.036 (+0.17%) | neutral |
| fallback_elapsed_ms_mean | 296.090 | 297.714 | +1.624 (+0.55%) | neutral |
| fallback_peak_bytes_mean | 73287.400 | 72883.200 | -404.200 (-0.55%) | neutral |

Vision family prompt token count scan

  • Status: ok
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| split_calls_mean | 0.000 | 0.000 | +0.000 | neutral |
| peak_bytes_mean | 235.429 | 235.429 | +0.000 (+0.00%) | neutral |
| token_count | 1309.000 | 1309.000 | +0.000 (+0.00%) | neutral |

Integration Swift binary resolution scandir fallback

  • Status: regression
  • Gate: context
  • Targeted tests: pass
  • Coverage: pass (100.0%)

| Metric | Base | Head | Delta | Status |
|---|---|---|---|---|
| elapsed_ms_mean | 89.407 | 86.421 | -2.986 (-3.34%) | neutral |
| delta_ms_mean | -76.891 | -73.117 | +3.774 (-4.91%) | regression |
| peak_bytes_mean | 3700.000 | 3743.200 | +43.200 (+1.17%) | neutral |
| candidate_count | 1501.000 | 1501.000 | +0.000 (+0.00%) | neutral |
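The Delta column in these probe tables is consistent with `delta = head - base` plus a percentage taken relative to the signed base. That explains why the `delta_ms_mean` row shows a positive delta with a negative percentage (its base is negative), why `comparison_total` reads `(-0.00%)`, and why rows with a zero base carry no percentage at all. This is inferred from the reported numbers, not from the probe tool's source; `format_delta` is a hypothetical helper, and the thresholds that decide the Status column are not modeled.

```python
def format_delta(base: float, head: float) -> str:
    """Hypothetical Delta-cell formatter: "+D.DDD (+P.PP%)", percent relative to signed base."""
    delta = head - base
    if base == 0:
        # Rows with a zero base show only the absolute delta.
        return f"{delta:+.3f}"
    # Dividing by a negative base flips the sign of the percentage,
    # and 0.0 / -48.0 yields -0.0, which formats as "-0.00".
    return f"{delta:+.3f} ({delta / base * 100:+.2f}%)"


if __name__ == "__main__":
    print(format_delta(156.706, 2.286))    # dataset registry preview elapsed_ms_mean
    print(format_delta(-76.891, -73.117))  # Swift binary resolution delta_ms_mean
```

Because the tables print rounded values, recomputing from them can differ in the last digit: `format_delta(1.920, 2.155)` gives `+0.235 (+12.24%)`, while the first table in this section reports `+12.21%`, presumably computed from unrounded means.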

Zigfreidish merged commit b5515b3 into main on May 12, 2026.
112 checks passed.
Zigfreidish deleted the perf/cron-registered-python-slice-202605121224 branch on May 12, 2026 at 13:31.