# Showcase Demo Quality Assurance Checklist

**Document ID:** PRES-QA-001
**Version:** 1.0
**Status:** For Red Team Review
**Prepared For:** Toyota ML Engineering Review Team

---

## Preamble: The Toyota Way Applied to ML Systems

This checklist applies the 14 principles of the Toyota Way [1] to machine learning visualization systems. Every claim must be verified through **Genchi Genbutsu** (go and see for yourself). We reject vanity metrics and demand reproducible, measurable evidence.

> "The root of the Toyota Way is to be dissatisfied with the status quo; you have to ask constantly, 'Why are we doing this?'" — Taiichi Ohno

**Review Philosophy:**
- Assume all claims are false until proven with evidence
- Measure everything; opinions are not data
- One defect discovered in production costs 100x more than one caught in review
- Respect the reviewer's time: provide reproducible commands for every claim

---

## References

[1] Liker, J.K. (2004). *The Toyota Way: 14 Management Principles*. McGraw-Hill. ISBN: 978-0071392310

[2] Haas, A. et al. (2017). "Bringing the Web up to Speed with WebAssembly." *PLDI '17: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation*, pp. 185-200. DOI: 10.1145/3062341.3062363

[3] Jangda, A. et al. (2019). "Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code." *USENIX ATC '19*, pp. 107-120. https://www.usenix.org/conference/atc19/presentation/jangda

[4] Sculley, D. et al. (2015). "Hidden Technical Debt in Machine Learning Systems." *NeurIPS 2015*, pp. 2503-2511. https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems

[5] Amershi, S. et al. (2019). "Software Engineering for Machine Learning: A Case Study." *ICSE-SEIP '19*, pp. 291-300. DOI: 10.1109/ICSE-SEIP.2019.00042

[6] Kenwright, B. (2012). "A Beginner's Guide to Dual-Quaternions." *WSCG '12*, pp. 1-10. (GPU animation fundamentals)

[7] McSherry, F. et al. (2015). "Scalability! But at what COST?" *HotOS XV*. https://www.usenix.org/conference/hotos15/workshop-program/presentation/mcsherry

[8] Ratanaworabhan, P. et al. (2010). "JSMeter: Comparing the Behavior of JavaScript Benchmarks with Real Web Applications." *WebApps '10*. https://www.usenix.org/conference/webapps-10

[9] Xu, T. et al. (2016). "Early Detection of Configuration Errors to Reduce Failure Damage." *OSDI '16*, pp. 619-634. https://www.usenix.org/conference/osdi16/technical-sessions/presentation/xu

[10] Paleyes, A. et al. (2022). "Challenges in Deploying Machine Learning: A Survey of Case Studies." *ACM Computing Surveys*, Vol. 55, Issue 6, Article 114. DOI: 10.1145/3533378

---

## Checklist Categories

| Category | Points | Focus Area |
|----------|--------|------------|
| A. Performance Claims | 20 | Frame rate, latency, throughput |
| B. Size & Efficiency Claims | 15 | Bundle size, memory, startup |
| C. Data Format Integrity | 15 | .apr/.ald correctness |
| D. Visualization Accuracy | 15 | Chart rendering fidelity |
| E. Animation & Interaction | 10 | Smoothness, responsiveness |
| F. Cross-Platform | 10 | Browser/device compatibility |
| G. Code Quality | 10 | Tests, documentation, security |
| H. Claim Substantiation | 5 | Marketing vs. reality |

---

## A. Performance Claims (20 Points)

**Principle: Genchi Genbutsu — Measure at the source, not from marketing materials**

### Frame Rate Claims

| # | Check | Command/Method | Pass Criteria | Ref |
|---|-------|----------------|---------------|-----|
| A1 | Measure actual FPS in Chrome DevTools | `Performance tab → Record 10s → Analyze frames` | Mean ≥ 55fps; 99% of frames ≥ 45fps | [8] |
| A2 | Measure actual FPS in Firefox | `about:performance` or Performance Monitor | Mean ≥ 55fps | [8] |
| A3 | Measure FPS under CPU throttling (4x slowdown) | `DevTools → Performance → CPU: 4x slowdown` | Mean ≥ 30fps | [3] |
| A4 | Verify no frame drops during particle burst (100 particles) | Click 5x rapidly, observe frame timeline | No frames >32ms | [6] |
| A5 | Verify FPS counter accuracy | Compare DevTools FPS vs displayed FPS | Within ±5fps | [8] |
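Checks A1-A3 reduce to summarizing a frame-time trace. A minimal sketch, assuming per-frame durations (in milliseconds) have been exported by hand from a DevTools recording; `fps_summary` and the sample trace are illustrative and not part of the demo:

```python
def fps_summary(frame_times_ms):
    """Summarize a frame-time trace (milliseconds per frame).

    Returns mean FPS and the fraction of frames rendered at >= 45fps,
    i.e. with a frame time of at most 1000/45 ms.
    """
    fps = [1000.0 / t for t in frame_times_ms]
    mean_fps = sum(fps) / len(fps)
    cutoff_ms = 1000.0 / 45.0
    frac_fast = sum(1 for t in frame_times_ms if t <= cutoff_ms) / len(frame_times_ms)
    return mean_fps, frac_fast

# Hypothetical trace: a steady ~60fps run with one dropped frame (33ms)
trace = [16.7] * 99 + [33.0]
mean_fps, frac_fast = fps_summary(trace)
```

Against the A1 criterion, this trace passes: the mean stays above 55fps and 99% of frames beat the 45fps bar.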

### Latency Claims

| # | Check | Command/Method | Pass Criteria | Ref |
|---|-------|----------------|---------------|-----|
| A6 | Measure click-to-render latency | `performance.now()` around click handler | <16ms mean | [2] |
| A7 | Measure animation start latency | Time from button click to first visual change | <50ms | [2] |
| A8 | Measure data update propagation | Randomize Data → measure chart update time | <100ms to 90% complete | [2] |
| A9 | Verify no jank during theme toggle | Record Performance trace during toggle | No long tasks >50ms | [8] |
| A10 | Measure inference button response | Time from click to model card flash | <100ms | [5] |

### Throughput Claims

| # | Check | Command/Method | Pass Criteria | Ref |
|---|-------|----------------|---------------|-----|
| A11 | Verify 100 candlesticks render correctly | Count rendered bars visually | Exactly 100 visible | [7] |
| A12 | Verify 500 particle capacity | Emit 500 particles, count in memory | particles.length === 500 | [6] |
| A13 | Measure draw call count per frame | Instrument Canvas2D calls | <50 draw calls/frame | [6] |
| A14 | Verify no memory growth over 5 minutes | `performance.memory.usedJSHeapSize` every 30s | <10% growth | [4] |
| A15 | Stress test: 1000 rapid clicks | Automated click script | No crash, FPS recovers | [7] |

### Rust/WASM Performance

| # | Check | Command/Method | Pass Criteria | Ref |
|---|-------|----------------|---------------|-----|
| A16 | Run Rust example benchmarks | `cargo run --example showcase_gpu` | 60fps reported | [2] |
| A17 | Verify WASM build succeeds | `cargo build --example showcase_gpu --target wasm32-unknown-unknown --release` | Exit code 0 | [2] |
| A18 | Measure WASM instantiation time | `performance.now()` around WebAssembly.instantiate | <50ms | [3] |
| A19 | Compare native vs WASM execution | Run same benchmark native and WASM | WASM within 2x of native | [3] |
| A20 | Profile WASM with Chrome DevTools | `Performance → Bottom-Up → WASM functions` | No single function >10% | [3] |
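The A19 comparison is a single ratio check. A sketch with made-up benchmark timings; only the 2x factor comes from the pass criterion, and `within_factor` is an illustrative helper, not project code:

```python
def within_factor(native_ms: float, wasm_ms: float, factor: float = 2.0) -> bool:
    """A19 pass criterion: WASM runtime no worse than `factor` times native."""
    return wasm_ms <= native_ms * factor

# Hypothetical timings: native 12ms per iteration, WASM 19ms per iteration
wasm_ok = within_factor(12.0, 19.0)
```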

---

## B. Size & Efficiency Claims (15 Points)

**Principle: Muda elimination — Every byte must justify its existence**

### Bundle Size Claims

| # | Check | Command/Method | Pass Criteria | Ref |
|---|-------|----------------|---------------|-----|
| B1 | Measure WASM binary size | `ls -la target/wasm32-unknown-unknown/release/examples/*.wasm` | <500KB | [2] |
| B2 | Measure HTML/JS size | `wc -c web/showcase/index.html` | <50KB | [8] |
| B3 | Measure total transfer size | DevTools Network tab, disable cache, reload | <600KB total | [2] |
| B4 | Verify gzip compression ratio | `gzip -c file.wasm \| wc -c` | >50% reduction | [2] |
| B5 | Compare to Gradio bundle | Download Gradio app, measure | Presentar <1% of Gradio | [7] |
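The `gzip -c … | wc -c` check in B4 can be cross-checked from Python's standard library. A sketch; the repetitive sample stands in for a real `.wasm` file, which compresses less dramatically:

```python
import gzip

def gzip_reduction(data: bytes) -> float:
    """Fraction of bytes saved by gzip (B4 expects > 0.5 for the WASM binary)."""
    compressed = gzip.compress(data, compresslevel=9)
    return 1.0 - len(compressed) / len(data)

# Hypothetical payload; substitute the bytes of the built .wasm file
sample = b"presentar" * 10_000
reduction = gzip_reduction(sample)
```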

### Memory Claims

| # | Check | Command/Method | Pass Criteria | Ref |
|---|-------|----------------|---------------|-----|
| B6 | Measure initial heap size | `performance.memory.usedJSHeapSize` on load | <20MB | [4] |
| B7 | Measure heap after 1 minute | Same metric after 1 min interaction | <50MB | [4] |
| B8 | Check for detached DOM nodes | DevTools Memory → Heap snapshot → Detached | 0 detached nodes | [4] |
| B9 | Verify no canvas memory leaks | Create/destroy canvases, check memory | Stable after GC | [4] |
| B10 | Measure particle array memory | `sizeof(particles) * particles.length` estimate | <1MB at 500 particles | [6] |
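A14 and B6-B7 both reduce to comparing heap samples collected with `performance.memory.usedJSHeapSize`. A sketch with invented sample values; the 10% threshold comes from the A14 criterion:

```python
def heap_growth_pct(samples_bytes):
    """Percent heap growth from the first sample to the last.

    A14 expects < 10% growth over a 5-minute run sampled every 30s.
    """
    first, last = samples_bytes[0], samples_bytes[-1]
    return 100.0 * (last - first) / first

# Hypothetical 30s heap samples, in bytes
samples = [18_000_000, 18_400_000, 18_900_000, 19_200_000]
growth = heap_growth_pct(samples)
```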

### Startup Claims

| # | Check | Command/Method | Pass Criteria | Ref |
|---|-------|----------------|---------------|-----|
| B11 | Measure Time to First Paint | DevTools Performance → FP marker | <200ms | [8] |
| B12 | Measure Time to Interactive | Lighthouse audit | <500ms | [8] |
| B13 | Measure First Contentful Paint | DevTools Performance → FCP marker | <300ms | [8] |
| B14 | Cold start with cache disabled | Hard reload (Ctrl+Shift+R) | <1s to interactive | [8] |
| B15 | Compare to Streamlit startup | Time Streamlit hello world | Presentar 10x faster | [7] |

---

## C. Data Format Integrity (15 Points)

**Principle: Jidoka — Build quality in; stop and fix problems immediately**

### .apr Model Format

| # | Check | Command/Method | Pass Criteria | Ref |
|---|-------|----------------|---------------|-----|
| C1 | Verify magic bytes | `hexdump -C demo/assets/sentiment_mini.apr \| head -1` | Starts with `APR\0` | [9] |
| C2 | Parse model in Rust | `cargo test --package presentar-yaml -- formats` | All tests pass | [9] |
| C3 | Verify layer count | Load model, check `model.layers.len()` | Exactly 2 | [5] |
| C4 | Verify parameter count | `model.param_count()` | Exactly 867 | [5] |
| C5 | Verify weight initialization | Check weight distribution | Xavier-like variance | [5] |
| C6 | Verify metadata | Check `model.metadata` for task, classes | All keys present | [5] |
| C7 | Roundtrip test | Save → Load → Compare | Byte-identical | [9] |

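C1's hexdump inspection can also be scripted. A sketch: only the four magic bytes `APR\0` come from the checklist; everything after them in the sample header is invented for illustration, since the real `.apr` layout is not specified here.

```python
APR_MAGIC = b"APR\x00"

def has_apr_magic(blob: bytes) -> bool:
    """True if the buffer starts with the .apr magic bytes (check C1)."""
    return blob[:4] == APR_MAGIC

# Hypothetical header: magic bytes followed by made-up payload bytes
header = APR_MAGIC + b"\x02\x00"
```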
### .ald Dataset Format

| # | Check | Command/Method | Pass Criteria | Ref |
|---|-------|----------------|---------------|-----|
| C8 | Verify magic bytes | `hexdump -C demo/assets/timeseries_100.ald \| head -1` | Starts with `ALD\0` | [9] |
| C9 | Parse dataset in Rust | `AldDataset::load()` succeeds | No errors | [9] |
| C10 | Verify tensor count | `dataset.tensors.len()` | Exactly 5 | [9] |
| C11 | Verify tensor shapes | All tensors have shape `[100]` | True | [9] |
| C12 | Verify high/low ordering | `high >= low` for all rows | True for 100/100 | [9] |
| C13 | Verify body within wick | `high >= max(open, close)` | True for 100/100 | [9] |
| C14 | Verify positive prices | All values > 0 | True | [9] |
| C15 | Roundtrip test | Save → Load → Compare | Byte-identical | [9] |
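The C12-C14 invariants are cheap to check on any OHLC table. A sketch over invented rows; the lower-bound check on `low` is the implied dual of C13, not a listed item:

```python
def ohlc_valid(row):
    """Check the C12-C14 invariants for one (open, high, low, close) row."""
    o, h, l, c = row
    return (
        h >= l                          # C12: wick ordering
        and h >= max(o, c)              # C13: body top within wick
        and l <= min(o, c)              # implied dual of C13
        and all(v > 0 for v in row)     # C14: positive prices
    )

# Hypothetical rows; the real data lives in timeseries_100.ald
rows = [(100.0, 105.0, 98.0, 102.0), (102.0, 103.5, 101.0, 101.5)]
all_valid = all(ohlc_valid(r) for r in rows)
```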

---

## D. Visualization Accuracy (15 Points)

**Principle: Standardized work — Every chart must render identically every time**

### Candlestick Chart

| # | Check | Command/Method | Pass Criteria | Ref |
|---|-------|----------------|---------------|-----|
| D1 | Verify candlestick count | Visual count or DOM inspection | Exactly 100 | [8] |
| D2 | Verify green/red coloring | Up days green, down days red | Correct for sample | [8] |
| D3 | Verify Y-axis scale | Compare displayed prices to data | Within 1% | [8] |
| D4 | Verify current price line | Matches last close value | Exact match | [8] |
| D5 | Verify wick rendering | High-low range visible | All wicks visible | [6] |

### Bar Chart

| # | Check | Command/Method | Pass Criteria | Ref |
|---|-------|----------------|---------------|-----|
| D6 | Verify bar count | Visual inspection | Exactly 6 bars | [8] |
| D7 | Verify bar heights proportional | Tallest bar = highest value | Correct | [8] |
| D8 | Verify value labels | Labels match bar heights | Within ±1% | [8] |
| D9 | Verify month labels | Jan-Jun displayed correctly | Correct order | [8] |
| D10 | Verify animation easing | Bars ease-out, not linear | Visually smooth | [6] |

### Donut Chart

| # | Check | Command/Method | Pass Criteria | Ref |
|---|-------|----------------|---------------|-----|
| D11 | Verify segment count | Visual inspection | Exactly 5 segments | [8] |
| D12 | Verify segment proportions | Arc lengths proportional to values | Within ±5% | [8] |
| D13 | Verify center total | Sum of all segments | Correct sum displayed | [8] |
| D14 | Verify rotation animation | Donut rotates smoothly | Continuous rotation | [6] |
| D15 | Verify color consistency | Same colors across refresh | Deterministic | [8] |
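D12 can be made concrete by converting segment values to expected arc angles and comparing against measured arcs. A sketch with invented segment values; the ±5% tolerance is taken from the table, interpreted here as 5% of the full 360°:

```python
def segment_angles(values):
    """Expected arc angle in degrees for each donut segment."""
    total = sum(values)
    return [360.0 * v / total for v in values]

def within_tolerance(rendered_deg, expected_deg, tol_pct=5.0):
    """D12: each rendered arc within ±tol_pct of the full circle."""
    limit = 360.0 * tol_pct / 100.0
    return all(abs(r - e) <= limit for r, e in zip(rendered_deg, expected_deg))

# Hypothetical segment values for the 5-segment donut
expected = segment_angles([30, 25, 20, 15, 10])
```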

---

## E. Animation & Interaction (10 Points)

**Principle: Heijunka — Smooth, level flow without bursts or stalls**

| # | Check | Command/Method | Pass Criteria | Ref |
|---|-------|----------------|---------------|-----|
| E1 | Verify bar animation smoothness | Record slow-mo, check for jumps | No discontinuities | [6] |
| E2 | Verify particle physics | Particles fall with gravity | Realistic arc | [6] |
| E3 | Verify particle fade | Alpha decreases over lifetime | Smooth fade | [6] |
| E4 | Verify click-to-emit | Click donut area | Particles spawn at click position | [8] |
| E5 | Verify button hover states | Mouse over buttons | Visual feedback | [8] |
| E6 | Verify Randomize Data | Click button | All charts update | [8] |
| E7 | Verify Run Inference | Click button | Model card flashes 3x | [5] |
| E8 | Verify Emit Particles | Click button | 30 particles spawn | [6] |
| E9 | Verify no interaction blocking | Rapid button clicks | All register | [8] |
| E10 | Verify cleanup | Wait 5s after particles | All particles gone | [6] |
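E3's "smooth fade" has a testable core: sampled alpha values should never increase over a particle's lifetime. The demo's actual easing curve is unknown, so this sketch assumes a simple linear fade; both helper names are illustrative:

```python
def fade_curve(lifetime_frames, start_alpha=1.0):
    """Assumed linear alpha fade over a particle's lifetime, one value per frame."""
    return [start_alpha * (1.0 - i / lifetime_frames) for i in range(lifetime_frames + 1)]

def monotonically_fading(alphas):
    """E3 expects alpha to decrease smoothly, never jump back up."""
    return all(a >= b for a, b in zip(alphas, alphas[1:]))

alphas = fade_curve(60)
```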

---

## F. Cross-Platform Compatibility (10 Points)

**Principle: Challenge everything — "It works on my machine" is not acceptance criteria**

| # | Check | Command/Method | Pass Criteria | Ref |
|---|-------|----------------|---------------|-----|
| F1 | Chrome (latest) | Manual test | All features work | [2] |
| F2 | Firefox (latest) | Manual test | All features work | [2] |
| F3 | Safari (latest) | Manual test on macOS | All features work | [2] |
| F4 | Edge (latest) | Manual test | All features work | [2] |
| F5 | Mobile Chrome (Android) | Touch interactions work | Tap emits particles | [8] |
| F6 | Mobile Safari (iOS) | Touch interactions work | Tap emits particles | [8] |
| F7 | 4K display (3840x2160) | No blurriness | Crisp rendering | [6] |
| F8 | 1366x768 display | No overflow/clipping | All content visible | [8] |
| F9 | Dark mode OS setting | No conflicts | Renders correctly | [8] |
| F10 | Reduced motion preference | `prefers-reduced-motion` | Animations reduced or disabled | [8] |

---

## G. Code Quality (10 Points)

**Principle: Respect for people — Clean code respects the next developer's time**

| # | Check | Command/Method | Pass Criteria | Ref |
|---|-------|----------------|---------------|-----|
| G1 | All Rust tests pass | `cargo test --example showcase_gpu` | 48/48 pass | [5] |
| G2 | All Rust tests pass | `cargo test --example generate_demo_assets` | 17/17 pass | [5] |
| G3 | No clippy warnings | `cargo clippy --example showcase_gpu` | 0 errors | [5] |
| G4 | No JavaScript console errors | DevTools Console | 0 errors | [8] |
| G5 | No JavaScript console warnings | DevTools Console | 0 warnings | [8] |
| G6 | HTML validates | W3C Validator | 0 errors | [8] |
| G7 | No hardcoded secrets | `grep -r "password\|secret\|key" web/` | 0 matches | [10] |
| G8 | Deterministic output | Run generator twice, compare | Identical files | [9] |
| G9 | Comments explain "why" | Code review | Non-trivial logic commented | [1] |
| G10 | No TODO/FIXME in production | `grep -r "TODO\|FIXME" web/showcase/` | 0 matches | [4] |
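G8's "run twice, compare" is most robust as a digest comparison rather than eyeballing file sizes. A sketch; the byte strings stand in for the actual generator output files:

```python
import hashlib

def digest(blob: bytes) -> str:
    """SHA-256 hex digest; two generator runs must produce identical digests (G8)."""
    return hashlib.sha256(blob).hexdigest()

# Hypothetical outputs of two back-to-back generator runs
run_a = b"example generated asset"
run_b = b"example generated asset"
deterministic = digest(run_a) == digest(run_b)
```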

---

## H. Claim Substantiation (5 Points)

**Principle: Say what you mean; mean what you say — Marketing claims must match reality**

| # | Claim | Verification Method | Evidence Required | Ref |
|---|-------|---------------------|-------------------|-----|
| H1 | "60fps" | Measured FPS from A1-A2 | Screenshot of DevTools | [8] |
| H2 | "450KB bundle" | Measured from B1-B3 | `ls -la` output | [2] |
| H3 | "80ms startup" | Measured from B11-B13 | Lighthouse report | [8] |
| H4 | "32MB memory" | Measured from B6-B7 | DevTools screenshot | [4] |
| H5 | "10X better" | Each comparison measured | Data table with sources | [7] |
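Each H-row compares one claimed number against one measurement. A sketch of that comparison; the 10% run-to-run slack is a reviewer policy assumed here, not something the checklist specifies, and the 460KB figure is invented:

```python
def claim_holds(claimed, measured, better="lower", slack_pct=10.0):
    """A marketing claim holds if the measurement is at least as good as
    claimed, allowing `slack_pct` headroom for run-to-run noise (assumption)."""
    slack = claimed * slack_pct / 100.0
    if better == "lower":                   # e.g. bundle size, startup, memory
        return measured <= claimed + slack
    return measured >= claimed - slack      # e.g. frame rate

# Hypothetical: the "450KB bundle" claim (H2) vs a measured 460KB build
bundle_ok = claim_holds(450.0, 460.0, better="lower")
```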

---

## Scoring

| Grade | Score | Interpretation |
|-------|-------|----------------|
| A+ | 95-100 | Production ready, Toyota Quality |
| A | 90-94 | Minor issues, safe to ship |
| B | 80-89 | Significant issues, needs iteration |
| C | 70-79 | Major issues, do not ship |
| F | <70 | Fundamental problems, redesign required |
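The grade bands above map directly to a small function, useful if review scores are tallied in a script (the function name is illustrative):

```python
def grade(score):
    """Map a 0-100 review score to the letter grades in the Scoring table."""
    if score >= 95:
        return "A+"
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    return "F"
```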

---

## Sign-Off

| Role | Name | Date | Score | Signature |
|------|------|------|-------|-----------|
| QA Lead | | | /100 | |
| ML Engineer | | | /100 | |
| Performance Engineer | | | /100 | |
| Security Reviewer | | | /100 | |

---

## Appendix: Quick Verification Commands

```bash
# A. Performance
cargo run --example showcase_gpu
cargo build --example showcase_gpu --target wasm32-unknown-unknown --release

# B. Size
ls -la target/wasm32-unknown-unknown/release/examples/showcase_gpu.wasm
wc -c web/showcase/index.html

# C. Data Integrity
cargo test --package presentar-yaml -- formats
hexdump -C demo/assets/sentiment_mini.apr | head -1
hexdump -C demo/assets/timeseries_100.ald | head -1

# D-F. Manual verification
cd web/showcase && python3 -m http.server 8080
# Open http://localhost:8080 in each browser

# G. Code Quality
cargo test --example showcase_gpu
cargo test --example generate_demo_assets
cargo clippy --example showcase_gpu 2>&1 | grep -c "error"
```

---

*"Quality is not an act, it is a habit." — Aristotle*

*"The Toyota Way is not about perfection. It is about pursuing perfection while accepting that you will never achieve it." — Jeffrey Liker [1]*