`src/apps/` contains BitFly-managed application overlays that are synced into `ara/apps/`. This tree is the software-side definition of the workload matrix used for correctness checks and paper-style benchmarking.
Use this directory to answer three questions:
- which apps represent the proposed BMPMM path versus the RVV baseline
- how one app maps to one workload slice
- where shared generator and template logic should live
| Category | Directories | Role |
|---|---|---|
| proposed benchmark path | `bmpmm_*` | BMPMM-based implementation under evaluation |
| baseline benchmark path | `rvv_*` | RVV implementation used for comparison |
| correctness regression | `bmpu_verify` | focused validation of BMPU packing and low-bit execution behavior |
| shared infrastructure | `common` | generators, case definitions, templates, and common helpers |
| separate inference experiment | `llama2` | exploratory inference flow outside the main benchmark matrix |
For model-split evaluation, one benchmark app corresponds to one workload slice:
`<implementation>_<precision>_<model>`
Examples:
- `bmpmm_binary_gemma3_270m`
- `bmpmm_INT2_opt_13b`
- `rvv_INT4_qwen25_15b`
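As a sketch of how downstream tooling might consume that naming contract, the fields decompose mechanically. The `parse_app_name` helper below is hypothetical, not a repository function, and it assumes that only the model field may itself contain underscores (e.g. `gemma3_270m`):

```python
# Hypothetical helper: split a model-split app name into its three fields.
# Assumes <implementation>_<precision>_<model>, where only the model part
# may contain further underscores.

def parse_app_name(name: str) -> dict:
    implementation, precision, model = name.split("_", 2)
    return {"implementation": implementation, "precision": precision, "model": model}

print(parse_app_name("bmpmm_INT2_gemma3_270m"))
# {'implementation': 'bmpmm', 'precision': 'INT2', 'model': 'gemma3_270m'}
```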
This structure keeps:
- generated tensors scoped to one workload slice
- simulator logs scoped to one app
- runtime summaries straightforward to aggregate into paper figures
The main comparison is always `bmpmm_*` as the proposed path versus `rvv_*` as the baseline path, under the same model-derived shape set.
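Because both paths share the `<precision>_<model>` suffix, the baseline counterpart of any proposed app can be derived by swapping the implementation prefix. A minimal sketch under that assumption; `baseline_counterpart` is an illustrative name, not a repository helper:

```python
# Hypothetical: map a proposed-path app to its baseline counterpart by
# swapping the implementation prefix and keeping precision and model fixed.

def baseline_counterpart(app: str) -> str:
    assert app.startswith("bmpmm_"), "expected a proposed-path app"
    return "rvv_" + app[len("bmpmm_"):]

print(baseline_counterpart("bmpmm_INT4_qwen25_15b"))  # rvv_INT4_qwen25_15b
```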
| Form | Example | Intended Use |
|---|---|---|
| generic | `bmpmm_INT2`, `rvv_binary` | short regression, bring-up, or local debugging |
| model-split | `bmpmm_INT2_gemma3_270m` | formal benchmark campaigns and reported comparisons |
The batch runner primarily targets the model-split apps.
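How the runner actually selects those apps is defined by its own script; the following is a hedged sketch of one possible filter, assuming any `bmpmm_*` or `rvv_*` directory whose name carries a model suffix beyond `<implementation>_<precision>` counts as model-split:

```python
from pathlib import Path

# Hypothetical filter: treat an app as model-split if its name has a model
# suffix after the <implementation>_<precision> prefix. The two-underscore
# heuristic is an assumption, not a rule enforced by the repository.

def model_split_apps(apps_root: str) -> list[str]:
    apps = []
    for entry in sorted(Path(apps_root).iterdir()):
        if not entry.is_dir():
            continue
        name = entry.name
        if name.startswith(("bmpmm_", "rvv_")) and name.count("_") >= 2:
            apps.append(name)
    return apps
```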
For model-split apps, the directory name itself is part of the experiment contract:
`<implementation>_<precision>_<model>`
That name should stay aligned with:
- the generator inputs
- the emitted tensors under `kernel/`
- the reporting keys written into benchmark summaries
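One way to keep that alignment from eroding is a small lint over the summaries. In the sketch below, the `summary.json` filename and its `implementation`/`precision`/`model` keys are assumptions about the reporting schema, shown only to illustrate the check:

```python
import json
from pathlib import Path

# Hypothetical lint: verify an app's summary keys agree with its directory
# name. The summary path and key names are assumed for illustration, and the
# split mirrors the <implementation>_<precision>_<model> naming contract.

def check_name_alignment(app_dir: str) -> bool:
    name = Path(app_dir).name
    implementation, precision, model = name.split("_", 2)
    summary = json.loads((Path(app_dir) / "summary.json").read_text())
    return (summary.get("implementation") == implementation
            and summary.get("precision") == precision
            and summary.get("model") == model)
```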
Most benchmark app directories contain:
- `main.c`: app entry point and case-level logging
- `kernel/`: implementation code plus generated tensors and case metadata
- `tests.c` / `tests.h`: local helpers or validation logic where needed
- `script/gen_data.py`: app-specific data generator
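A minimal layout lint against that list might look as follows; the helper itself is hypothetical, and `tests.c`/`tests.h` are treated as optional because the text describes them as present only where needed:

```python
from pathlib import Path

# Hypothetical layout check for a benchmark app directory. tests.c/tests.h
# are skipped because they exist only where an app needs them.

def check_app_layout(app_dir: str) -> list[str]:
    root = Path(app_dir)
    missing = []
    if not (root / "main.c").is_file():
        missing.append("main.c")
    if not (root / "kernel").is_dir():
        missing.append("kernel/")
    if not (root / "script" / "gen_data.py").is_file():
        missing.append("script/gen_data.py")
    return missing  # empty list means the expected layout is present
```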
| Change Type | Edit Location |
|---|---|
| shared case-selection or generator policy | `common/` |
| one app's workload tensor generation | that app's `script/gen_data.py` |
| one app's kernel implementation | that app's `kernel/` |
| one app's logging or control flow | that app's `main.c` |
| correctness-oriented checks for BMPU behavior | `bmpu_verify/` |
| local inference experiments outside the benchmark matrix | `llama2/` |
- `bmpu_verify` is for correctness, not paper-performance reporting.
- `bmpmm_*` versus `rvv_*` is the main reported comparison.
- `llama2/` is not the same thing as the benchmark matrix used by `run_model_split_apps.sh`.
- `common/` is the right place to factor out behavior shared across multiple apps.
When you introduce a new maintained workflow under `src/apps/`, document it at the shared-tree level first unless the workflow is truly local to one app directory; this keeps the many repetitive benchmark directories from drifting out of sync.
Those repetitive benchmark app directories are deliberately covered by the shared rules here rather than by duplicating boilerplate README files per app.