Commit 1ebbf10

lalalune and claude committed
wip(chip,robot): post-merge background churn — npu/asap7 evidence, robot bridge backends
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b29e424 commit 1ebbf10

17 files changed

Lines changed: 762 additions & 37 deletions

packages/chip/docs/arch/npu.md

Lines changed: 2 additions & 2 deletions
@@ -827,7 +827,7 @@ executing; it rejects sequences whose GEMM preamble, descriptor submission, or
 and rejects completion metadata that does not require the done
 bit or reject the error bit. `stage_prepared_descriptor_batch` validates an
 `eliza.e1_npu_prepared_descriptor_batch.v1` package, checks `descriptor_base`,
-`arena_base`, `arena_total_bytes`, `arena_alignment_bytes`,
+`batch_index`, `arena_base`, `arena_total_bytes`, `arena_alignment_bytes`,
 `required_runtime_steps`, `descriptor_memory_writes`, and
 `mmio_preamble_writes` against the packaged `descriptor_image` and
 `op_mmio_preamble`, then returns
@@ -839,7 +839,7 @@ returns `eliza.e1_npu_prepared_descriptor_execution_batches_stage_result.v1`.
 Before writing descriptor memory or MMIO, it checks every descriptor image base
 and `DESC_BASE` submission value against the package-level `descriptor_base +
 execution_batch_index * descriptor_stride_bytes` contract, checks
-`arena_base consistency` and arena sizing across the outer package, inner packages, and
+`batch_index`/`execution_batch_index` identity, `arena_base consistency` and arena sizing across the outer package, inner packages, and
 descriptor images, checks `required_runtime_steps` on the outer and inner packages, and checks
 `descriptor_memory_writes` exactly match the packaged `descriptor_image`.
 It also checks `mmio_preamble_writes` match `op_mmio_preamble`, including op
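The execution-batch contract in this hunk can be sketched as a pre-staging check. This is a minimal illustration, not the runtime's code: `ExecutionBatch`, `check_execution_batches`, and the field names `descriptor_image_base` and `desc_base_write` are hypothetical stand-ins for whatever the package schema actually uses. It raises before anything is written, matching the "before writing descriptor memory or MMIO" ordering the text describes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExecutionBatch:
    # Hypothetical record; field names echo the identifiers in the doc text.
    batch_index: int
    execution_batch_index: int
    descriptor_image_base: int  # base address of this batch's descriptor image
    desc_base_write: int        # value the sequence writes to DESC_BASE


def check_execution_batches(descriptor_base: int,
                            descriptor_stride_bytes: int,
                            batches: List[ExecutionBatch]) -> None:
    """Fail closed: raise ValueError before any MMIO or memory write happens."""
    for position, batch in enumerate(batches):
        # Identity check added by this commit: the two indices must agree.
        if batch.batch_index != batch.execution_batch_index:
            raise ValueError(
                f"batch {position}: batch_index {batch.batch_index} != "
                f"execution_batch_index {batch.execution_batch_index}")
        # Address contract: descriptor_base + execution_batch_index * stride.
        expected = descriptor_base + batch.execution_batch_index * descriptor_stride_bytes
        for label, actual in (("descriptor image base", batch.descriptor_image_base),
                              ("DESC_BASE write", batch.desc_base_write)):
            if actual != expected:
                raise ValueError(
                    f"batch {position}: {label} {actual:#x} != expected {expected:#x}")
```

Raising on the first violation keeps the check fail-closed; a variant that collects every message would suit a report-style tool better.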

packages/chip/docs/evidence/process/asap7/big_core_shell_shape.json

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
 "mapper": "abc",
 "memory_inference": true,
 "param_overrides": [],
-"wall_clock_s": 22.52
+"wall_clock_s": 18.04
 },
 "abc_target_period_ps": 4000,
 "gate_count_total": 63814,

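For reading this shape report: `abc_target_period_ps: 4000` corresponds to a 250 MHz ABC timing target, and the only edited field is `wall_clock_s`, which by its name appears to record the synthesis run's elapsed time rather than a property of the design (an assumption). A quick check of both numbers, using two small hypothetical helpers:

```python
def period_ps_to_mhz(period_ps: float) -> float:
    # f = 1 / T; with T in picoseconds, 1e12 / T gives Hz and 1e6 / T gives MHz.
    return 1e6 / period_ps


def relative_change(old: float, new: float) -> float:
    # Signed fractional change; negative means the new value is smaller.
    return (new - old) / old


print(period_ps_to_mhz(4000))                   # 250.0
print(round(relative_change(22.52, 18.04), 3))  # -0.199
```

So the run got roughly 20% faster while `gate_count_total` (63814) sits in unchanged context.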
packages/chip/docs/npu/2028-targets.md

Lines changed: 3 additions & 2 deletions
@@ -91,7 +91,7 @@ the done bit, requires the done bit, and rejects the error bit. The
 `stage_prepared_descriptor_batch` helper validates
 `eliza.e1_npu_prepared_descriptor_batch.v1` packages and returns
 `eliza.e1_npu_prepared_descriptor_batch_stage_result.v1` after checking
-`arena_base`, `arena_total_bytes`, `arena_alignment_bytes`,
+`batch_index`, `arena_base`, `arena_total_bytes`, `arena_alignment_bytes`,
 `required_runtime_steps`, `descriptor_base`, `descriptor_memory_writes`, and
 `mmio_preamble_writes` against the packaged `descriptor_image` and
 `op_mmio_preamble`. The
@@ -100,7 +100,8 @@ the done bit, requires the done bit, and rejects the error bit. The
 `eliza.e1_npu_prepared_descriptor_execution_batches_stage_result.v1` after
 checking each descriptor image and `DESC_BASE` submission against
 `descriptor_base + execution_batch_index * descriptor_stride_bytes`, and
-checking `arena_base consistency`, arena sizing, `required_runtime_steps`, and
+checking `batch_index`/`execution_batch_index` identity, `arena_base consistency`,
+arena sizing, `required_runtime_steps`, and
 `descriptor_memory_writes` exactly match the packaged
 `descriptor_image`. It also checks `mmio_preamble_writes` match
 `op_mmio_preamble` before staging. The
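The single-package checks listed for `stage_prepared_descriptor_batch` might look roughly like the sketch below. The specific rules (power-of-two alignment, non-negative `batch_index`, the image starting at `descriptor_base`, ordered equality between writes and image) are assumptions chosen for illustration; the text only names the fields that get checked. `check_prepared_batch` and `_words` are hypothetical helpers, and the package is modeled as a parsed-JSON dict.

```python
from typing import Any, Dict, List, Tuple


def _words(seq: Any) -> List[Tuple[int, int]]:
    # Normalize [[addr, value], ...] (e.g. parsed JSON) into (addr, value) tuples.
    return [(int(addr), int(value)) for addr, value in seq]


def check_prepared_batch(pkg: Dict[str, Any]) -> None:
    """Hypothetical fail-closed checks for one prepared descriptor batch package."""
    batch_index = pkg["batch_index"]
    if not isinstance(batch_index, int) or batch_index < 0:
        raise ValueError("batch_index must be a non-negative integer")
    align = pkg["arena_alignment_bytes"]
    if align <= 0 or align & (align - 1):
        raise ValueError("arena_alignment_bytes must be a power of two")
    if pkg["arena_base"] % align:
        raise ValueError("arena_base is not aligned to arena_alignment_bytes")
    if pkg["arena_total_bytes"] <= 0:
        raise ValueError("arena_total_bytes must be positive")
    image = _words(pkg["descriptor_image"])
    if image and image[0][0] != pkg["descriptor_base"]:
        raise ValueError("descriptor_image does not start at descriptor_base")
    # "Exactly match": same addresses, same values, same order.
    if _words(pkg["descriptor_memory_writes"]) != image:
        raise ValueError("descriptor_memory_writes differ from descriptor_image")
```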

packages/chip/docs/spec-db/e1-npu-runtime-contract.json

Lines changed: 1 addition & 1 deletion
@@ -796,7 +796,7 @@
 "descriptor_type": "NpuStreamDescriptor",
 "max_entries": 7,
 "descriptor_bytes": 16,
-"staging": "CommandBuffer.descriptor_image returns a deterministic word-addressed descriptor image and CommandBuffer.stage writes that image through a caller-provided 32-bit memory writer before submit arms the existing descriptor ring; stage_host_runtime_sequence validates GEMM preamble, descriptor submission, and completion_poll register metadata labels/addresses plus completion_poll requires_done_bit/rejects_error_bit metadata, replays a prepared-batch host_runtime_sequence through caller-provided MMIO and descriptor-memory writers, and returns eliza.e1_npu_host_runtime_sequence_stage_result.v1 without polling or executing; stage_prepared_descriptor_batch validates an eliza.e1_npu_prepared_descriptor_batch.v1 package, checks arena_base, arena_total_bytes, arena_alignment_bytes, required_runtime_steps, descriptor_base, descriptor_memory_writes, and mmio_preamble_writes against descriptor_image/op_mmio_preamble metadata, and returns eliza.e1_npu_prepared_descriptor_batch_stage_result.v1; stage_prepared_descriptor_execution_batches validates eliza.e1_npu_prepared_descriptor_execution_batches.v1 packages, checks arena_base consistency and arena sizing across the outer package, inner batches, and descriptor images, checks required_runtime_steps, checks descriptor_base plus execution_batch_index times descriptor_stride_bytes against each descriptor image and DESC_BASE submission write, checks descriptor_memory_writes match descriptor_image, checks mmio_preamble_writes match op_mmio_preamble before staging, stages each ordered execution-batch host_runtime_sequence, and returns eliza.e1_npu_prepared_descriptor_execution_batches_stage_result.v1",
+"staging": "CommandBuffer.descriptor_image returns a deterministic word-addressed descriptor image and CommandBuffer.stage writes that image through a caller-provided 32-bit memory writer before submit arms the existing descriptor ring; stage_host_runtime_sequence validates GEMM preamble, descriptor submission, and completion_poll register metadata labels/addresses plus completion_poll requires_done_bit/rejects_error_bit metadata, replays a prepared-batch host_runtime_sequence through caller-provided MMIO and descriptor-memory writers, and returns eliza.e1_npu_host_runtime_sequence_stage_result.v1 without polling or executing; stage_prepared_descriptor_batch validates an eliza.e1_npu_prepared_descriptor_batch.v1 package, checks batch_index, arena_base, arena_total_bytes, arena_alignment_bytes, required_runtime_steps, descriptor_base, descriptor_memory_writes, and mmio_preamble_writes against descriptor_image/op_mmio_preamble metadata, and returns eliza.e1_npu_prepared_descriptor_batch_stage_result.v1; stage_prepared_descriptor_execution_batches validates eliza.e1_npu_prepared_descriptor_execution_batches.v1 packages, checks batch_index/execution_batch_index identity, arena_base consistency and arena sizing across the outer package, inner batches, and descriptor images, checks required_runtime_steps, checks descriptor_base plus execution_batch_index times descriptor_stride_bytes against each descriptor image and DESC_BASE submission write, checks descriptor_memory_writes match descriptor_image, checks mmio_preamble_writes match op_mmio_preamble before staging, stages each ordered execution-batch host_runtime_sequence, and returns eliza.e1_npu_prepared_descriptor_execution_batches_stage_result.v1",
 "completion": "Submit arms DESC_BASE/DESC_HEAD/DESC_TAIL once through submit_descriptors and waits for one descriptor completion proof for the staged batch",
 "claim_boundary": "command_buffer_descriptor_batching_smoke_only_not_scheduler_iommu_or_production_dma_runtime",
 "not_claimed": "No graph scheduler, dependency tracking, coherent IOMMU staging, production DMA allocator, Android delegate command stream, or queue-depth proof beyond the current 3-bit local RTL ring is implemented"

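The `staging` entry describes a word-addressed descriptor image written through a caller-provided 32-bit writer. A minimal sketch under stated assumptions: descriptors are treated as opaque 16-byte blobs (per `descriptor_bytes`), split into little-endian 32-bit words, and keyed by byte address in 4-byte steps; if the runtime keys the image by word index instead, divide the addresses by 4. `descriptor_image` and `stage` here are illustrative functions, not the `CommandBuffer` methods themselves.

```python
import struct
from typing import Callable, List, Sequence, Tuple

DESCRIPTOR_BYTES = 16  # "descriptor_bytes": 16 in the spec entry
MAX_ENTRIES = 7        # "max_entries": 7 in the spec entry


def descriptor_image(base: int, descriptors: Sequence[bytes]) -> List[Tuple[int, int]]:
    """Split 16-byte descriptors into (byte_address, 32-bit word) pairs."""
    if len(descriptors) > MAX_ENTRIES:
        raise ValueError(f"at most {MAX_ENTRIES} descriptors per batch")
    image = []
    for i, desc in enumerate(descriptors):
        if len(desc) != DESCRIPTOR_BYTES:
            raise ValueError(f"descriptor {i} is {len(desc)} bytes, expected {DESCRIPTOR_BYTES}")
        # Assumption: little-endian 32-bit words, packed back to back.
        for w, (word,) in enumerate(struct.iter_unpack("<I", desc)):
            image.append((base + i * DESCRIPTOR_BYTES + 4 * w, word))
    return image


def stage(image: List[Tuple[int, int]], write32: Callable[[int, int], None]) -> None:
    # Replay the image through the caller-provided 32-bit writer, in image order.
    for addr, word in image:
        write32(addr, word)


# A dict's __setitem__ stands in for a memory writer in this example.
mem = {}
stage(descriptor_image(0x1000, [bytes(range(16))]), mem.__setitem__)
print(hex(mem[0x1000]))  # 0x3020100
```

Because the image is a plain ordered list, the same value can be compared word-for-word against a package's `descriptor_memory_writes`.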
packages/chip/pd/asap7/README.md

Lines changed: 46 additions & 23 deletions
@@ -40,34 +40,41 @@ project those shapes to TSMC N2P / A14 / Intel 14A class using

 Two flow modes coexist in this lane:

-1. **ORFS post-route** (full PnR) — drives `big_core_shell`, `npu_tile`,
-`slc_slice`, `npu_tile_rf_leaf`. Gated by an ORFS local checkout or docker
-image. The operator runs ORFS for the block and copies the post-route
-shape JSON into `docs/evidence/process/asap7/<block>_shape.json`.
-2. **Yosys + ABC synth-only** (no ORFS dependency) — drives `tage_table` and
-any other leaf block whose `config.asap7.yaml` entry sets
-`flow_mode: yosys_abc_synth_only`. The runner invokes
-`scripts/run_asap7_leaf_synth.py`, which:
+1. **Yosys + ABC synth-only** (no ORFS dependency, default) — drives every
+block currently declared in `config.asap7.yaml`: `tage_table`,
+`npu_tile_rf_leaf`, `npu_tile`, `big_core_shell`, `slc_slice`. The runner
+invokes `scripts/run_asap7_leaf_synth.py`, which:
 1. unpacks the ASAP7 7p5t RVT TT NLDM libraries from
 `external/pdks/asap7/asap7sc7p5t_27/LIB/NLDM/*.lib.7z` into
 `build/asap7/lib/` (via `scripts/extract_asap7_libs.py` + the
 bundled `py7zr`),
 2. runs `yosys 0.64 + slang` with the per-block `synth_params`
-overrides,
+overrides and the per-block `rtl_top` as the SystemVerilog top,
 3. ABC-maps the design with `abc -fast` and the ORFS-published
 `DONT_USE_CELLS` exclusion set
 (`*x1p*_ASAP7*`, `*xp*_ASAP7*`, `SDF*`, `ICG*`),
 4. emits a shape JSON tagged
 `evidence_class: predictive_finfet_shape_only_not_signoff` that the
 downstream `scripts/project_ppa_to_n2p.py` ingests verbatim.
+2. **ORFS post-route** (full PnR, opt-in) — available for any block whose
+`config.asap7.yaml` entry omits `flow_mode: yosys_abc_synth_only`. Gated
+by an ORFS local checkout or docker image. The operator runs ORFS for the
+block and copies the post-route shape JSON into
+`docs/evidence/process/asap7/<block>_shape.json`. No block currently
+ships with this mode by default; it is the upgrade path once full PnR
+is required for a leaf.

 ```sh
-make -C pd/asap7 check # preflight: PDK reachable?
-make -C pd/asap7 clone-asap7 # one-shot ASAP7 PDK clone
-make -C pd/asap7 clone-orfs # one-shot ORFS clone (tier-1 blocks)
-make -C pd/asap7 all # run every ORFS block
-make -C pd/asap7 leaf-shape MODULE=tage_table # yosys+ABC synth-only leaf shape
-make ppa-projection # project all shapes to N2P/A14/Intel-14A/SF2P
+make -C pd/asap7 check # preflight: PDK reachable?
+make -C pd/asap7 clone-asap7 # one-shot ASAP7 PDK clone
+make -C pd/asap7 clone-orfs # one-shot ORFS clone (opt-in PnR)
+make -C pd/asap7 leaf-shape MODULE=tage_table # yosys+ABC synth-only leaf shape
+make -C pd/asap7 leaf-shape MODULE=npu_tile_rf_leaf # NPU weight-buffer SRAM leaf shape
+make -C pd/asap7 leaf-shape MODULE=npu_tile # full e1_npu monolithic tile shape
+make -C pd/asap7 leaf-shape MODULE=big_core_shell # CPU subsystem stub leaf shape
+make -C pd/asap7 leaf-shape MODULE=slc_slice # SLC bank slice shape (shrunk geom)
+make -C pd/asap7 big_core_shell-shape # equivalent per-block target
+make ppa-projection # project all shapes to N2P/A14/Intel-14A/SF2P
 ```

 The block list is defined in `config.asap7.yaml` and mirrors the OpenLane
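Step 3 of the synth-only flow excludes cells matching the `DONT_USE_CELLS` patterns. The sketch below reads those patterns as case-sensitive shell-style globs via Python's `fnmatch.fnmatchcase`, which is an assumption about how the flow interprets them; the cell names are illustrative ASAP7-style names, not checked against the unpacked `.lib` files, and `usable_cells` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Exclusion patterns as listed in the README hunk.
DONT_USE_CELLS = ["*x1p*_ASAP7*", "*xp*_ASAP7*", "SDF*", "ICG*"]


def usable_cells(cells: Iterable[str]) -> List[str]:
    # Keep a cell only if no exclusion pattern matches it.
    return [c for c in cells
            if not any(fnmatchcase(c, pattern) for pattern in DONT_USE_CELLS)]


cells = ["INVx1_ASAP7_75t_R", "INVxp33_ASAP7_75t_R", "NAND2x1p5_ASAP7_75t_R",
         "DFFHQNx1_ASAP7_75t_R", "SDFHx1_ASAP7_75t_R", "ICGx1_ASAP7_75t_R"]
print(usable_cells(cells))  # ['INVx1_ASAP7_75t_R', 'DFFHQNx1_ASAP7_75t_R']
```

Under this reading, the fractional-drive (`x1p*`, `xp*`), scan-flop (`SDF*`), and clock-gate (`ICG*`) families drop out, so the mapper only sees the remaining cells.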
@@ -105,14 +112,30 @@ tional logic area.

 ### Block tiers

-- **Tier 1 — Wrapper blocks** (`big_core_shell`, `npu_tile`, `slc_slice`)
-drive the existing module tops and emit a `*_shape.json` post-route shape
-report under `docs/evidence/process/asap7/`.
-- **Tier 2 — Leaf sub-blocks** (`npu_tile_rf_leaf`) characterize a single
-representative sub-block (the first 64-entry NPU register-file slice
-inside `e1_npu`). Outputs are tagged `leaf_only` so they are not aggregated
-into top-level NPU area. The intent is to land a first sub-block shape
-before the full tile closes.
+Every block currently uses the yosys + ABC synth-only flow. The block list
+mirrors the per-domain leaf shape needed for advanced-node area projection:
+
+- **`tage_table`** — BPU TAGE tagged-table primitive (128-entry leaf-shape
+proxy of the production 4096-entry geometry).
+- **`npu_tile_rf_leaf`** — NPU weight-staging register file
+(`e1_weight_buffer_sram`, 512x32-bit + write-mask, behavioral path). The
+hard-macro swap point under `E1_HAVE_HARD_SRAM`.
+- **`npu_tile`** — full `e1_npu` monolithic NPU tile (16x32-bit scratch RF,
+16 opcodes including INT8/INT4/INT2/FP8 dot products, GEMM/vector engines,
+AXI-Lite descriptor engine, perf counters) at production geometry.
+- **`big_core_shell`**`e1_cpu_subsystem_stub`, the self-contained
+32x64-bit RV64I subset in-order microcontroller stub (decoder + ALU +
+architectural RF + AXI4-Lite master FSM). Substitutes for the
+Ascalon / Kunminghu / CVA6 big-core RTL until those land.
+- **`slc_slice`** — shrunk `e1_slc` (2 KiB, 2-way, 2 banks, 64 B line)
+covering the cache lookup/install FSM, BDI compression form classifier,
+QoS-aware victim selection, and display-RT reservation counter.
+
+For every block, the storage cost in the leaf shape is the flat-flop cost.
+Production silicon swaps storage for vendor SRAM macros at vendor density
+(38.1 Mb/mm² HD at N2 per TSMC) — `scripts/project_ppa_to_n2p.py` carries
+the logic-only band; reviewers must add macro area separately when sizing
+real cache or scratch arrays.

 ### Fail-closed contract

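The storage note in this hunk asks reviewers to add macro area separately. A back-of-envelope sketch of that addition, assuming the 38.1 Mb/mm² figure applies to the whole macro and that "Mb" means 10^6 bits; real macros add periphery and redundancy, and the RF's write-mask bits and the SLC's tag array are left out here, so treat the results as lower bounds. `macro_area_um2` is a hypothetical helper.

```python
MACRO_DENSITY_MB_PER_MM2 = 38.1  # README figure: HD SRAM at N2 per TSMC
BITS_PER_MB = 1_000_000          # assumption: decimal megabits (use 2**20 if binary)


def macro_area_um2(bits: int) -> float:
    """Density-only area estimate in um^2; ignores periphery and redundancy."""
    area_mm2 = bits / BITS_PER_MB / MACRO_DENSITY_MB_PER_MM2
    return area_mm2 * 1e6  # 1 mm^2 = 1e6 um^2


rf_bits = 512 * 32       # npu_tile_rf_leaf data bits, write-mask excluded
slc_bits = 2 * 1024 * 8  # slc_slice 2 KiB data array, tags excluded
print(round(macro_area_um2(rf_bits), 1))   # 430.0
print(round(macro_area_um2(slc_bits), 1))  # 430.0
```

Both arrays hold 16,384 data bits, so both land at roughly 430 µm² of SRAM on top of the logic-only band.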
packages/chip/research/alpha_chip_macro_placement/01_sources/ai_eda_automation_readiness.yaml

Lines changed: 8 additions & 5 deletions
@@ -1058,6 +1058,8 @@ stages:
 - sigrok-cli
 - ml-boot-failure-debug
 - llm4sechw-debug
+- llm4sechw-oshd
+- chipbench-ai-aided-design
 local_lane: p1-post-silicon-validation-target-capture
 current_artifacts:
 - scripts/ai_eda/capture_post_silicon_validation_targets.py
@@ -1067,9 +1069,10 @@ stages:
 Post-silicon and lab automation is essential but cannot be simulated into
 existence. RISC-V compliance tests, riscv-dv, QED-style tests,
 trace-debug methods, cross-target on-device tests, RISC-V debug/OpenOCD
-flows, sigrok lab capture, and ML boot-failure triage can guide future
+flows, sigrok lab capture, ML boot-failure triage, LLM hardware-debug
+datasets, and tougher LLM chip-debug benchmarks can guide future
 bring-up, but E1 currently has no silicon, no lab trace corpus, no
-executed compliance suite, and no approved hardware action workflow. Keep
-this lane as target capture until QEMU/Renode, FPGA, board/package,
-debug-policy, manufacturing, real-world, benchmark, and lab evidence
-exists.
+executed compliance suite, no approved external debug corpus, and no
+approved hardware action workflow. Keep this lane as target capture until
+QEMU/Renode, FPGA, board/package, debug-policy, manufacturing,
+real-world, benchmark, corpus-governance, and lab evidence exists.
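The rule that this lane stays in target capture until every listed evidence class exists amounts to a set gate. A sketch with hypothetical evidence labels derived from the list in the hunk and a hypothetical `ready_for_review` state name; the readiness YAML does not define either.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical labels for the evidence classes named in the readiness note.
REQUIRED_EVIDENCE: FrozenSet[str] = frozenset({
    "qemu_renode", "fpga", "board_package", "debug_policy", "manufacturing",
    "real_world", "benchmark", "corpus_governance", "lab",
})


def lane_state(available: set) -> Tuple[str, List[str]]:
    """Stay in target capture while any required evidence class is missing."""
    missing = sorted(REQUIRED_EVIDENCE - set(available))
    return ("target_capture" if missing else "ready_for_review", missing)
```

Returning the sorted missing list alongside the state makes the gate's reason visible in a report rather than only its verdict.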

packages/chip/research/alpha_chip_macro_placement/01_sources/ai_eda_integration_backlog.yaml

Lines changed: 9 additions & 3 deletions
@@ -1627,20 +1627,25 @@ work_items:
 - sigrok-cli
 - ml-boot-failure-debug
 - llm4sechw-debug
+- llm4sechw-oshd
+- chipbench-ai-aided-design
 target_area:
 - post_silicon_validation
 - silicon_bringup
 - fpga_bringup
 - riscv_compliance
 - lab_debug
 - boot_failure_triage
+- hardware_debug_benchmark
+- debug_dataset_governance
 owner_scope: dry-run target capture only; no hardware actions, generated lab scripts, test binaries, FPGA bitstreams, or compliance claims
 objective: >
 Track post-silicon debug methods, QED/trace-analysis approaches, RISC-V
 compatibility tests, random-instruction validation, cross-target
-on-device tests, RISC-V debug/lab capture tooling, and ML/XAI
-boot-failure triage while tying any future E1 use to QEMU/Renode, FPGA,
-manufacturing, real-world, and lab evidence.
+on-device tests, RISC-V debug/lab capture tooling, ML/XAI boot-failure
+triage, LLM hardware-debug datasets, and LLM chip-debug benchmarks while
+tying any future E1 use to QEMU/Renode, FPGA, manufacturing, real-world,
+corpus-governance, benchmark, and lab evidence.
 deliverables:
 - scripts/ai_eda/capture_post_silicon_validation_targets.py
 - build/ai_eda/post_silicon_validation_targets/<run-id>/targets_report.json
@@ -1663,6 +1668,7 @@ work_items:
 - No trace schema for reset, boot, UART, JTAG, power, thermal, FPGA, or board observations
 - No pinned OpenOCD board configuration, RISC-V debug module transcript, probe inventory, or sigrok acquisition profile
 - No labeled boot-failure, post-silicon debug, or lab anomaly corpus for ML/XAI triage
+- No approved LLM hardware-debug dataset or benchmark import with pinned revisions, licenses, task manifests, non-overlap review, replay logs, and reviewer disposition
 - No approved workflow for AI-generated lab scripts, test binaries, FPGA bitstreams, or hardware actions
 acceptance_criteria:
 - Report hashes local RISC-V compliance, QEMU/Renode, FPGA, package, board, benchmark, and release-gate inputs.
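The acceptance criterion that the report hashes its local inputs could be met with a streaming SHA-256 over each file. A minimal sketch; `hash_inputs` is hypothetical, and the real `capture_post_silicon_validation_targets.py` may choose a different digest, path normalization, or extra metadata.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def hash_inputs(paths: Iterable[str]) -> Dict[str, str]:
    """Map each input path to its SHA-256 hex digest, in sorted path order."""
    report: Dict[str, str] = {}
    for name in sorted(paths):
        digest = hashlib.sha256()
        # A missing input raises FileNotFoundError, so the report fails closed.
        with Path(name).open("rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 16), b""):
                digest.update(chunk)
        report[name] = digest.hexdigest()
    return report
```

Sorting the paths keeps the report byte-stable across runs, which makes two reports diffable when an input changes.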

packages/chip/research/alpha_chip_macro_placement/01_sources/ai_eda_provenance_matrix.yaml

Lines changed: 18 additions & 0 deletions
@@ -935,6 +935,24 @@ entries:
 license_status: review_required
 release_use: blocked_pending_dataset_license_snapshot_and_non_overlap_review
 allowed_current_use: dataset_governance_reference_only
+- source_id: llm4sechw-debug
+asset_type: hardware_debug_llm_framework
+asset_url: https://arxiv.org/abs/2401.16448
+license_status: paper_assets_review_required
+release_use: blocked_pending_model_dataset_defect_patch_and_review_harness
+allowed_current_use: hardware_debug_method_reference_only
+- source_id: llm4sechw-oshd
+asset_type: open_source_hardware_debug_dataset
+asset_url: https://huggingface.co/datasets/KSU-HW-SEC/LLM4SecHW-OSHD
+license_status: review_required
+release_use: blocked_pending_dataset_snapshot_license_and_contamination_review
+allowed_current_use: dataset_governance_reference_only
+- source_id: chipbench-ai-aided-design
+asset_type: llm_chip_design_debug_reference_model_benchmark
+asset_url: https://github.com/zhongkaiyu/ChipBench
+license_status: review_required
+release_use: blocked_pending_benchmark_license_split_overlap_and_replay_review
+allowed_current_use: benchmark_governance_reference_only
 - source_id: archgym
 asset_type: architecture_dse_framework
 asset_url: https://github.com/srivatsankrishnan/oss-arch-gym
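Every entry added in this hunk pairs a `review_required`-style license status with a `blocked_*` release use and a `*_reference_only` current use. A hypothetical lint over parsed entries could keep that pairing from drifting; the two rules below are inferred from these entries, not taken from a documented schema, and `policy_violations` is an illustrative name.

```python
from typing import Dict, List


def policy_violations(entries: List[Dict[str, str]]) -> List[str]:
    """Hypothetical consistency rules for provenance-matrix entries."""
    problems = []
    for entry in entries:
        sid = entry.get("source_id", "<missing source_id>")
        blocked = entry.get("release_use", "").startswith("blocked_")
        reference_only = entry.get("allowed_current_use", "").endswith("_reference_only")
        under_review = "review_required" in entry.get("license_status", "")
        if under_review and not blocked:
            problems.append(f"{sid}: license under review but release_use is not blocked")
        if blocked and not reference_only:
            problems.append(f"{sid}: release blocked but current use is not reference-only")
    return problems
```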
