This repository was archived by the owner on May 5, 2026. It is now read-only.

Commit 6c4c0bd

noahgift and claude committed
Add 100-point QA checklist for showcase demo red team
Toyota Way-inspired quality assurance document:
- 100 verification points across 8 categories
- 10 peer-reviewed citations (PLDI, USENIX, NeurIPS, ICSE)
- Reproducible commands for every claim
- Designed for skeptical ML engineer review

Categories:
- A. Performance Claims (20 pts) - FPS, latency, throughput
- B. Size & Efficiency (15 pts) - Bundle, memory, startup
- C. Data Format Integrity (15 pts) - .apr/.ald validation
- D. Visualization Accuracy (15 pts) - Chart correctness
- E. Animation & Interaction (10 pts) - Smoothness
- F. Cross-Platform (10 pts) - Browser compatibility
- G. Code Quality (10 pts) - Tests, linting
- H. Claim Substantiation (5 pts) - Evidence for claims

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent c683060 commit 6c4c0bd

1 file changed

Lines changed: 336 additions & 0 deletions

@@ -0,0 +1,336 @@
1+
# Showcase Demo Quality Assurance Checklist
2+
3+
**Document ID:** PRES-QA-001
4+
**Version:** 1.0
5+
**Status:** For Red Team Review
6+
**Prepared For:** Toyota ML Engineering Review Team
7+
8+
---
9+
10+
## Preamble: The Toyota Way Applied to ML Systems
11+
12+
This checklist embodies the 14 principles of the Toyota Way [1] applied to machine learning visualization systems. Every claim must be verified through **Genchi Genbutsu** (go and see for yourself). We reject vanity metrics and demand reproducible, measurable evidence.
13+
14+
> "The root of the Toyota Way is to be dissatisfied with the status quo; you have to ask constantly, 'Why are we doing this?'" — Katsuaki Watanabe
15+
16+
**Review Philosophy:**
17+
- Assume all claims are false until proven with evidence
18+
- Measure everything; opinions are not data
19+
- A defect discovered in production is often cited as costing up to 100x more than one caught in review
20+
- Respect the reviewer's time: provide reproducible commands for every claim
21+
22+
---
23+
24+
## References
25+
26+
[1] Liker, J.K. (2004). *The Toyota Way: 14 Management Principles from the World's Greatest Manufacturer*. McGraw-Hill. ISBN: 978-0071392310
27+
28+
[2] Haas, A. et al. (2017). "Bringing the Web up to Speed with WebAssembly." *PLDI '17: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation*, pp. 185-200. DOI: 10.1145/3062341.3062363
29+
30+
[3] Jangda, A. et al. (2019). "Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code." *USENIX ATC '19*, pp. 107-120. https://www.usenix.org/conference/atc19/presentation/jangda
31+
32+
[4] Sculley, D. et al. (2015). "Hidden Technical Debt in Machine Learning Systems." *NeurIPS 2015*, pp. 2503-2511. https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems
33+
34+
[5] Amershi, S. et al. (2019). "Software Engineering for Machine Learning: A Case Study." *ICSE-SEIP '19*, pp. 291-300. DOI: 10.1109/ICSE-SEIP.2019.00042
35+
36+
[6] Kenwright, B. (2012). "A Beginners Guide to Dual-Quaternions." *WSCG '12*, pp. 1-10. (GPU animation fundamentals)
37+
38+
[7] McSherry, F. et al. (2015). "Scalability! But at what COST?" *HotOS XV*. https://www.usenix.org/conference/hotos15/workshop-program/presentation/mcsherry
39+
40+
[8] Ratanaworabhan, P. et al. (2010). "JSMeter: Comparing the Behavior of JavaScript Benchmarks with Real Web Applications." *WebApps '10*, pp. 3-3. https://www.usenix.org/conference/webapps-10
41+
42+
[9] Xu, T. et al. (2016). "Early Detection of Configuration Errors to Reduce Failure Damage." *OSDI '16*, pp. 619-634. https://www.usenix.org/conference/osdi16/technical-sessions/presentation/xu
43+
44+
[10] Paleyes, A. et al. (2022). "Challenges in Deploying Machine Learning: A Survey of Case Studies." *ACM Computing Surveys*, Vol. 55, Issue 6, Article 114. DOI: 10.1145/3533378
45+
46+
---
47+
48+
## Checklist Categories
49+
50+
| Category | Points | Focus Area |
51+
|----------|--------|------------|
52+
| A. Performance Claims | 20 | Frame rate, latency, throughput |
53+
| B. Size & Efficiency Claims | 15 | Bundle size, memory, startup |
54+
| C. Data Format Integrity | 15 | .apr/.ald correctness |
55+
| D. Visualization Accuracy | 15 | Chart rendering fidelity |
56+
| E. Animation & Interaction | 10 | Smoothness, responsiveness |
57+
| F. Cross-Platform | 10 | Browser/device compatibility |
58+
| G. Code Quality | 10 | Tests, documentation, security |
59+
| H. Claim Substantiation | 5 | Marketing vs. reality |
60+
61+
---
62+
63+
## A. Performance Claims (20 Points)
64+
65+
**Principle: Genchi Genbutsu — Measure at the source, not from marketing materials**
66+
67+
### Frame Rate Claims
68+
69+
| # | Check | Command/Method | Pass Criteria | Ref |
70+
|---|-------|----------------|---------------|-----|
71+
| A1 | Measure actual FPS in Chrome DevTools | `Performance tab → Record 10s → Analyze frames` | Mean ≥ 55fps; 99% of frames at ≥ 45fps (frame time ≤ 22.2ms) | [8] |
72+
| A2 | Measure actual FPS in Firefox | DevTools Performance panel (Firefox Profiler), record 10s | Mean ≥ 55fps | [8] |
73+
| A3 | Measure FPS under CPU throttling (4x slowdown) | `DevTools → Performance → CPU: 4x slowdown` | Mean ≥ 30fps | [3] |
74+
| A4 | Verify no frame drops during particle burst (100 particles) | Click 5x rapidly, observe frame timeline | No frames >32ms | [6] |
75+
| A5 | Verify FPS counter accuracy | Compare DevTools FPS vs displayed FPS | Within ±5fps | [8] |
76+
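The frame-rate criteria above can be checked from an exported list of per-frame durations. The following is a minimal sketch, not the project's own tooling: `frame_stats` and `passes_a1` are hypothetical helper names, the trace values are invented, and reading "P99 ≥ 45fps" as "the 99th-percentile frame time is at most 1000/45 ms" is an assumption about what check A1 intends.

```rust
/// Summary of a frame-time trace.
#[derive(Debug)]
struct FrameStats {
    mean_fps: f64,
    p99_frame_ms: f64,
}

/// Computes mean FPS and the 99th-percentile frame time (nearest-rank method)
/// from per-frame durations in milliseconds. Returns None for an empty trace.
fn frame_stats(frame_ms: &[f64]) -> Option<FrameStats> {
    if frame_ms.is_empty() {
        return None;
    }
    let total_ms: f64 = frame_ms.iter().sum();
    let mean_fps = 1000.0 * frame_ms.len() as f64 / total_ms;

    let mut sorted = frame_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest-rank percentile: rank = ceil(0.99 * n), computed in integers.
    let rank = (99 * sorted.len() + 99) / 100;
    let p99_frame_ms = sorted[rank - 1];

    Some(FrameStats { mean_fps, p99_frame_ms })
}

/// Assumed reading of check A1: mean >= 55 fps and P99 frame time <= 1000/45 ms.
fn passes_a1(stats: &FrameStats) -> bool {
    stats.mean_fps >= 55.0 && stats.p99_frame_ms <= 1000.0 / 45.0
}

fn main() {
    // Hypothetical trace: 99 frames at 16 ms plus a single 40 ms hitch.
    let mut trace = vec![16.0; 99];
    trace.push(40.0);
    let stats = frame_stats(&trace).expect("trace is non-empty");
    println!(
        "mean = {:.1} fps, p99 frame = {:.1} ms, A1 pass = {}",
        stats.mean_fps,
        stats.p99_frame_ms,
        passes_a1(&stats)
    );
}
```

Working in frame times rather than instantaneous FPS keeps the percentile well defined: a slow frame is a large value, so "worst 1%" sits at the top of the sorted list.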
77+
### Latency Claims
78+
79+
| # | Check | Command/Method | Pass Criteria | Ref |
80+
|---|-------|----------------|---------------|-----|
81+
| A6 | Measure click-to-render latency | `performance.now()` around click handler | <16ms mean | [2] |
82+
| A7 | Measure animation start latency | Time from button click to first visual change | <50ms | [2] |
83+
| A8 | Measure data update propagation | Randomize Data → measure chart update time | <100ms to 90% complete | [2] |
84+
| A9 | Verify no jank during theme toggle | Record Performance trace during toggle | No long tasks >50ms | [8] |
85+
| A10 | Measure inference button response | Time from click to model card flash | <100ms | [5] |
86+
87+
### Throughput Claims
88+
89+
| # | Check | Command/Method | Pass Criteria | Ref |
90+
|---|-------|----------------|---------------|-----|
91+
| A11 | Verify 100 candlesticks render correctly | Count rendered bars visually | Exactly 100 visible | [7] |
92+
| A12 | Verify 500 particle capacity | Emit 500 particles, count in memory | particles.length === 500 | [6] |
93+
| A13 | Measure draw call count per frame | Instrument Canvas2D calls | <50 draw calls/frame | [6] |
94+
| A14 | Verify no memory growth over 5 minutes | `performance.memory.usedJSHeapSize` (non-standard, Chromium only) every 30s | <10% growth | [4] |
95+
| A15 | Stress test: 1000 rapid clicks | Automated click script | No crash, FPS recovers | [7] |
96+
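For check A14, the "<10% growth" criterion can be evaluated from the recorded heap samples. This is a small sketch under assumptions: `heap_growth_percent` and `passes_a14` are hypothetical names, the sample values are invented, and growth is taken as first sample versus last sample, which the checklist does not specify.

```rust
/// Percentage growth from the first to the last heap sample (bytes).
/// Returns None with fewer than two samples or a zero baseline.
fn heap_growth_percent(samples_bytes: &[u64]) -> Option<f64> {
    let first = *samples_bytes.first()?;
    let last = *samples_bytes.last()?;
    if samples_bytes.len() < 2 || first == 0 {
        return None;
    }
    Some((last as f64 - first as f64) / first as f64 * 100.0)
}

/// Assumed reading of check A14: total growth stays below 10%.
fn passes_a14(samples_bytes: &[u64]) -> bool {
    matches!(heap_growth_percent(samples_bytes), Some(g) if g < 10.0)
}

fn main() {
    // Hypothetical samples taken every 30 s (values in bytes).
    let samples = [20_000_000u64, 20_400_000, 20_900_000, 21_000_000];
    match heap_growth_percent(&samples) {
        Some(g) => println!("heap growth = {:.2}%, A14 pass = {}", g, passes_a14(&samples)),
        None => println!("not enough samples"),
    }
}
```

Because garbage collection makes the heap sawtooth, a first-versus-last comparison can mislead; forcing a collection before each sample, or comparing post-GC minima, would give a steadier signal.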
97+
### Rust/WASM Performance
98+
99+
| # | Check | Command/Method | Pass Criteria | Ref |
100+
|---|-------|----------------|---------------|-----|
101+
| A16 | Run Rust example benchmarks | `cargo run --example showcase_gpu` | 60fps reported | [2] |
102+
| A17 | Verify WASM build succeeds | `cargo build --example showcase_gpu --target wasm32-unknown-unknown --release` | Exit code 0 | [2] |
103+
| A18 | Measure WASM instantiation time | `performance.now()` around WebAssembly.instantiate | <50ms | [3] |
104+
| A19 | Compare native vs WASM execution | Run same benchmark native and WASM | WASM within 2x of native | [3] |
105+
| A20 | Profile WASM with Chrome DevTools | `Performance → Bottom-Up → WASM functions` | No single function >10% | [3] |
106+
107+
---
108+
109+
## B. Size & Efficiency Claims (15 Points)
110+
111+
**Principle: Muda elimination — Every byte must justify its existence**
112+
113+
### Bundle Size Claims
114+
115+
| # | Check | Command/Method | Pass Criteria | Ref |
116+
|---|-------|----------------|---------------|-----|
117+
| B1 | Measure WASM binary size | `ls -la target/wasm32-unknown-unknown/release/examples/*.wasm` | <500KB | [2] |
118+
| B2 | Measure HTML/JS size | `wc -c web/showcase/index.html` | <50KB | [8] |
119+
| B3 | Measure total transfer size | DevTools Network tab, disable cache, reload | <600KB total | [2] |
120+
| B4 | Verify gzip compression ratio | `gzip -c file.wasm \| wc -c` | >50% reduction | [2] |
121+
| B5 | Compare to Gradio bundle | Download Gradio app, measure | Presentar <1% of Gradio | [7] |
122+
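Check B4 states its criterion as a percentage reduction, so the two byte counts (from `wc -c` on the raw and gzipped file) have to be turned into a ratio. A minimal sketch of that arithmetic, with a hypothetical helper name and invented sizes:

```rust
/// Size reduction achieved by compression, as a percentage of the original.
/// Returns None when the original size is zero.
fn reduction_percent(original_bytes: u64, compressed_bytes: u64) -> Option<f64> {
    if original_bytes == 0 {
        return None;
    }
    Some((1.0 - compressed_bytes as f64 / original_bytes as f64) * 100.0)
}

fn main() {
    // Hypothetical sizes; substitute the measured byte counts.
    let original = 450_000u64;
    let gzipped = 200_000u64;
    if let Some(r) = reduction_percent(original, gzipped) {
        println!("gzip reduction = {:.1}% (B4 requires > 50%)", r);
    }
}
```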
123+
### Memory Claims
124+
125+
| # | Check | Command/Method | Pass Criteria | Ref |
126+
|---|-------|----------------|---------------|-----|
127+
| B6 | Measure initial heap size | `performance.memory.usedJSHeapSize` (non-standard, Chromium only) on load | <20MB | [4] |
128+
| B7 | Measure heap after 1 minute | Same metric after 1 min interaction | <50MB | [4] |
129+
| B8 | Check for detached DOM nodes | DevTools Memory → Heap snapshot → Detached | 0 detached nodes | [4] |
130+
| B9 | Verify no canvas memory leaks | Create/destroy canvases, check memory | Stable after GC | [4] |
131+
| B10 | Measure particle array memory | `sizeof(Particle) * particles.length` estimate (bytes per particle × count) | <1MB at 500 particles | [6] |
132+
133+
### Startup Claims
134+
135+
| # | Check | Command/Method | Pass Criteria | Ref |
136+
|---|-------|----------------|---------------|-----|
137+
| B11 | Measure Time to First Paint | DevTools Performance → FP marker | <200ms | [8] |
138+
| B12 | Measure Time to Interactive | Lighthouse audit | <500ms | [8] |
139+
| B13 | Measure First Contentful Paint | DevTools Performance → FCP marker | <300ms | [8] |
140+
| B14 | Cold start with cache disabled | Hard reload (Ctrl+Shift+R) | <1s to interactive | [8] |
141+
| B15 | Compare to Streamlit startup | Time Streamlit hello world | Presentar 10x faster | [7] |
142+
143+
---
144+
145+
## C. Data Format Integrity (15 Points)
146+
147+
**Principle: Jidoka — Build quality in; stop and fix problems immediately**
148+
149+
### .apr Model Format
150+
151+
| # | Check | Command/Method | Pass Criteria | Ref |
152+
|---|-------|----------------|---------------|-----|
153+
| C1 | Verify magic bytes | `hexdump -C demo/assets/sentiment_mini.apr \| head -1` | Starts with `APR\0` | [9] |
154+
| C2 | Parse model in Rust | `cargo test --package presentar-yaml -- formats` | All tests pass | [9] |
155+
| C3 | Verify layer count | Load model, check `model.layers.len()` | Exactly 2 | [5] |
156+
| C4 | Verify parameter count | `model.param_count()` | Exactly 867 | [5] |
157+
| C5 | Verify weight initialization | Check weight distribution | Xavier-like variance | [5] |
158+
| C6 | Verify metadata | Check `model.metadata` for task, classes | All keys present | [5] |
159+
| C7 | Roundtrip test | Save → Load → Compare | Byte-identical | [9] |
160+
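The magic-byte checks (C1 here, and C8 for `.ald`) can also be automated rather than read off a `hexdump`. The sketch below assumes only what the checklist states, that the files start with `APR\0` or `ALD\0`; the helper names are hypothetical and nothing beyond the first four bytes of either format is assumed.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

const APR_MAGIC: [u8; 4] = *b"APR\0";
const ALD_MAGIC: [u8; 4] = *b"ALD\0";

/// True if `bytes` begins with the given 4-byte magic.
fn has_magic(bytes: &[u8], magic: &[u8; 4]) -> bool {
    bytes.starts_with(magic)
}

/// Reads the first four bytes of a file and compares them to `magic`.
/// A file shorter than four bytes is reported as a mismatch, not an error.
fn file_has_magic(path: &Path, magic: &[u8; 4]) -> io::Result<bool> {
    let mut header = [0u8; 4];
    match File::open(path)?.read_exact(&mut header) {
        Ok(()) => Ok(&header == magic),
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() {
    // In-memory example; the bytes after the magic are placeholders.
    let sample = [b'A', b'P', b'R', 0, 1, 2, 3, 4];
    println!("APR magic: {}", has_magic(&sample, &APR_MAGIC));
    println!("ALD magic: {}", has_magic(&sample, &ALD_MAGIC));

    // Path taken from check C1; the file may not exist where this is run.
    let path = Path::new("demo/assets/sentiment_mini.apr");
    match file_has_magic(path, &APR_MAGIC) {
        Ok(ok) => println!("{}: magic ok = {}", path.display(), ok),
        Err(e) => println!("{}: could not read ({})", path.display(), e),
    }
}
```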
161+
### .ald Dataset Format
162+
163+
| # | Check | Command/Method | Pass Criteria | Ref |
164+
|---|-------|----------------|---------------|-----|
165+
| C8 | Verify magic bytes | `hexdump -C demo/assets/timeseries_100.ald \| head -1` | Starts with `ALD\0` | [9] |
166+
| C9 | Parse dataset in Rust | `AldDataset::load()` succeeds | No errors | [9] |
167+
| C10 | Verify tensor count | `dataset.tensors.len()` | Exactly 5 | [9] |
168+
| C11 | Verify tensor shapes | All tensors have shape `[100]` | True | [9] |
169+
| C12 | Verify OHLC validity | `high >= low` for all rows | True for 100/100 | [9] |
170+
| C13 | Verify OHLC validity | `high >= max(open, close)` | True for 100/100 | [9] |
171+
| C14 | Verify positive prices | All values > 0 | True | [9] |
172+
| C15 | Roundtrip test | Save → Load → Compare | Byte-identical | [9] |
173+
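Checks C12 to C14 amount to three per-row invariants on the price data. The following sketch expresses them as a predicate; the `Candle` struct and function names are hypothetical, since the checklist does not show how `AldDataset` exposes its tensors, and only the four price fields are validated here.

```rust
/// One OHLC row. Hypothetical layout; the real dataset stores columns as tensors.
#[derive(Debug, Clone, Copy)]
struct Candle {
    open: f64,
    high: f64,
    low: f64,
    close: f64,
}

/// Applies checks C12-C14: high >= low, high >= max(open, close), all prices > 0.
/// NaN values fail every comparison and are therefore rejected.
fn candle_is_valid(c: &Candle) -> bool {
    let all_positive = [c.open, c.high, c.low, c.close].iter().all(|v| *v > 0.0);
    all_positive && c.high >= c.low && c.high >= c.open.max(c.close)
}

/// Number of rows that violate at least one invariant.
fn count_invalid(candles: &[Candle]) -> usize {
    candles.iter().filter(|c| !candle_is_valid(c)).count()
}

fn main() {
    // Invented rows: the second has high < close, the third a negative low.
    let rows = [
        Candle { open: 100.0, high: 105.0, low: 98.0, close: 103.0 },
        Candle { open: 100.0, high: 101.0, low: 99.0, close: 102.0 },
        Candle { open: 50.0, high: 55.0, low: -1.0, close: 52.0 },
    ];
    println!("{} of {} rows invalid", count_invalid(&rows), rows.len());
}
```

A stricter validator would also require `low <= min(open, close)`; that condition is not part of the checklist, so it is left out of the sketch.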
174+
---
175+
176+
## D. Visualization Accuracy (15 Points)
177+
178+
**Principle: Standardized work — Every chart must render identically every time**
179+
180+
### Candlestick Chart
181+
182+
| # | Check | Command/Method | Pass Criteria | Ref |
183+
|---|-------|----------------|---------------|-----|
184+
| D1 | Verify candlestick count | Visual count or DOM inspection | Exactly 100 | [8] |
185+
| D2 | Verify green/red coloring | Up days green, down days red | Correct for sample | [8] |
186+
| D3 | Verify Y-axis scale | Compare displayed prices to data | Within 1% | [8] |
187+
| D4 | Verify current price line | Matches last close value | Exact match | [8] |
188+
| D5 | Verify wick rendering | High-low range visible | All wicks visible | [6] |
189+
190+
### Bar Chart
191+
192+
| # | Check | Command/Method | Pass Criteria | Ref |
193+
|---|-------|----------------|---------------|-----|
194+
| D6 | Verify bar count | Visual inspection | Exactly 6 bars | [8] |
195+
| D7 | Verify bar heights proportional | Tallest bar = highest value | Correct | [8] |
196+
| D8 | Verify value labels | Labels match bar heights | Within ±1% | [8] |
197+
| D9 | Verify month labels | Jan-Jun displayed correctly | Correct order | [8] |
198+
| D10 | Verify animation easing | Bars ease-out, not linear | Visually smooth | [6] |
199+
200+
### Donut Chart
201+
202+
| # | Check | Command/Method | Pass Criteria | Ref |
203+
|---|-------|----------------|---------------|-----|
204+
| D11 | Verify segment count | Visual inspection | Exactly 5 segments | [8] |
205+
| D12 | Verify segment proportions | Arc lengths proportional to values | Within ±5% | [8] |
206+
| D13 | Verify center total | Sum of all segments | Correct sum displayed | [8] |
207+
| D14 | Verify rotation animation | Donut rotates smoothly | Continuous rotation | [6] |
208+
| D15 | Verify color consistency | Same colors across refresh | Deterministic | [8] |
209+
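Check D12 compares rendered arc lengths with the values they represent. The expected sweep angle of each segment is its share of the total times a full turn, which the sketch below computes. The function names and sample values are hypothetical, and treating "±5%" as a relative tolerance on each angle is an assumption.

```rust
use std::f64::consts::TAU;

/// Expected sweep angle (radians) for each donut segment: value / total * 2π.
/// Returns None for empty input, negative values, or a non-positive total.
fn segment_angles(values: &[f64]) -> Option<Vec<f64>> {
    let total: f64 = values.iter().sum();
    if values.is_empty() || !(total > 0.0) || values.iter().any(|v| *v < 0.0) {
        return None;
    }
    Some(values.iter().map(|v| v / total * TAU).collect())
}

/// True if `measured` is within a relative tolerance of `expected`.
fn within_tolerance(expected: f64, measured: f64, rel_tol: f64) -> bool {
    (measured - expected).abs() <= rel_tol * expected.abs()
}

fn main() {
    // Invented segment values; five entries to match check D11.
    let values = [30.0, 25.0, 20.0, 15.0, 10.0];
    let angles = segment_angles(&values).expect("valid input");
    for (v, a) in values.iter().zip(&angles) {
        println!("value {:>5.1} -> {:.3} rad ({:.1} deg)", v, a, a.to_degrees());
    }
    // A measured arc of 1.9 rad against the first expected angle, at 5% tolerance.
    println!("first arc ok: {}", within_tolerance(angles[0], 1.9, 0.05));
}
```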
210+
---
211+
212+
## E. Animation & Interaction (10 Points)
213+
214+
**Principle: Heijunka — Smooth, level flow without bursts or stalls**
215+
216+
| # | Check | Command/Method | Pass Criteria | Ref |
217+
|---|-------|----------------|---------------|-----|
218+
| E1 | Verify bar animation smoothness | Record slow-mo, check for jumps | No discontinuities | [6] |
219+
| E2 | Verify particle physics | Particles fall with gravity | Realistic arc | [6] |
220+
| E3 | Verify particle fade | Alpha decreases over lifetime | Smooth fade | [6] |
221+
| E4 | Verify click-to-emit | Click donut area | Particles spawn at click position | [8] |
222+
| E5 | Verify button hover states | Mouse over buttons | Visual feedback | [8] |
223+
| E6 | Verify Randomize Data | Click button | All charts update | [8] |
224+
| E7 | Verify Run Inference | Click button | Model card flashes 3x | [5] |
225+
| E8 | Verify Emit Particles | Click button | 30 particles spawn | [6] |
226+
| E9 | Verify no interaction blocking | Rapid button clicks | All register | [8] |
227+
| E10 | Verify cleanup | Wait 5s after particles | All particles gone | [6] |
228+
229+
---
230+
231+
## F. Cross-Platform Compatibility (10 Points)
232+
233+
**Principle: Challenge everything — "It works on my machine" is not an acceptance criterion**
234+
235+
| # | Check | Command/Method | Pass Criteria | Ref |
236+
|---|-------|----------------|---------------|-----|
237+
| F1 | Chrome (latest) | Manual test | All features work | [2] |
238+
| F2 | Firefox (latest) | Manual test | All features work | [2] |
239+
| F3 | Safari (latest) | Manual test on macOS | All features work | [2] |
240+
| F4 | Edge (latest) | Manual test | All features work | [2] |
241+
| F5 | Mobile Chrome (Android) | Touch interactions work | Tap emits particles | [8] |
242+
| F6 | Mobile Safari (iOS) | Touch interactions work | Tap emits particles | [8] |
243+
| F7 | 4K display (3840x2160) | No blurriness | Crisp rendering | [6] |
244+
| F8 | 1366x768 display | No overflow/clipping | All content visible | [8] |
245+
| F9 | Dark mode OS setting | No conflicts | Renders correctly | [8] |
246+
| F10 | Reduced motion preference | Enable `prefers-reduced-motion: reduce` (OS setting or DevTools emulation) | Animations are reduced or disabled | [8] |
247+
248+
---
249+
250+
## G. Code Quality (10 Points)
251+
252+
**Principle: Respect for people — Clean code respects the next developer's time**
253+
254+
| # | Check | Command/Method | Pass Criteria | Ref |
255+
|---|-------|----------------|---------------|-----|
256+
| G1 | All Rust tests pass | `cargo test --example showcase_gpu` | 48/48 pass | [5] |
257+
| G2 | All Rust tests pass | `cargo test --example generate_demo_assets` | 17/17 pass | [5] |
258+
| G3 | No clippy warnings | `cargo clippy --example showcase_gpu -- -D warnings` | 0 warnings (exit code 0) | [5] |
259+
| G4 | No JavaScript console errors | DevTools Console | 0 errors | [8] |
260+
| G5 | No JavaScript console warnings | DevTools Console | 0 warnings | [8] |
261+
| G6 | HTML validates | W3C Validator | 0 errors | [8] |
262+
| G7 | No hardcoded secrets | `grep -r "password\|secret\|key" web/` | 0 matches after discounting false positives (e.g. `keydown`, `keyframes`) | [10] |
263+
| G8 | Deterministic output | Run generator twice, compare | Identical files | [9] |
264+
| G9 | Comments explain "why" | Code review | Non-trivial logic commented | [1] |
265+
| G10 | No TODO/FIXME in production | `grep -r "TODO\|FIXME" web/showcase/` | 0 matches | [4] |
266+
267+
---
268+
269+
## H. Claim Substantiation (5 Points)
270+
271+
**Principle: Say what you mean; mean what you say — Marketing claims must match reality**
272+
273+
| # | Claim | Verification Method | Evidence Required | Ref |
274+
|---|-------|---------------------|-------------------|-----|
275+
| H1 | "60fps" | Measured FPS from A1-A2 | Screenshot of DevTools | [8] |
276+
| H2 | "450KB bundle" | Measured from B1-B3 | `ls -la` output | [2] |
277+
| H3 | "80ms startup" | Measured from B11-B13 | Lighthouse report | [8] |
278+
| H4 | "32MB memory" | Measured from B6-B7 | DevTools screenshot | [4] |
279+
| H5 | "10X better" | Each comparison measured | Data table with sources | [7] |
280+
281+
---
282+
283+
## Scoring
284+
285+
| Grade | Score | Interpretation |
286+
|-------|-------|----------------|
287+
| A+ | 95-100 | Production ready, Toyota Quality |
288+
| A | 90-94 | Minor issues, safe to ship |
289+
| B | 80-89 | Significant issues, needs iteration |
290+
| C | 70-79 | Major issues, do not ship |
291+
| F | <70 | Fundamental problems, redesign required |
292+
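The grade bands in the table above, together with the per-category points, can be encoded directly. This is an illustrative sketch with hypothetical names, useful mainly as a sanity check that the categories sum to 100 and that every score maps to exactly one grade.

```rust
/// Maximum points per category (A through H), as listed in the checklist.
const CATEGORY_POINTS: [(&str, u32); 8] = [
    ("A", 20), ("B", 15), ("C", 15), ("D", 15),
    ("E", 10), ("F", 10), ("G", 10), ("H", 5),
];

/// Maps a total score to its grade band; None if the score exceeds 100.
fn grade(score: u32) -> Option<&'static str> {
    match score {
        95..=100 => Some("A+"),
        90..=94 => Some("A"),
        80..=89 => Some("B"),
        70..=79 => Some("C"),
        0..=69 => Some("F"),
        _ => None,
    }
}

fn main() {
    let max_total: u32 = CATEGORY_POINTS.iter().map(|(_, pts)| pts).sum();
    println!("maximum score = {}", max_total);

    // Hypothetical review result: number of checks passed.
    let passed = 92;
    println!("score {} -> grade {:?}", passed, grade(passed));
}
```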
293+
---
294+
295+
## Sign-Off
296+
297+
| Role | Name | Date | Score | Signature |
298+
|------|------|------|-------|-----------|
299+
| QA Lead | | | /100 | |
300+
| ML Engineer | | | /100 | |
301+
| Performance Engineer | | | /100 | |
302+
| Security Reviewer | | | /100 | |
303+
304+
---
305+
306+
## Appendix: Quick Verification Commands
307+
308+
```bash
309+
# A. Performance
310+
cargo run --example showcase_gpu
311+
cargo build --example showcase_gpu --target wasm32-unknown-unknown --release
312+
313+
# B. Size
314+
ls -la target/wasm32-unknown-unknown/release/examples/showcase_gpu.wasm
315+
wc -c web/showcase/index.html
316+
317+
# C. Data Integrity
318+
cargo test --package presentar-yaml -- formats
319+
hexdump -C demo/assets/sentiment_mini.apr | head -1
320+
hexdump -C demo/assets/timeseries_100.ald | head -1
321+
322+
# D-F. Manual verification
323+
cd web/showcase && python3 -m http.server 8080
324+
# Open http://localhost:8080 in each browser
325+
326+
# G. Code Quality
327+
cargo test --example showcase_gpu
328+
cargo test --example generate_demo_assets
329+
cargo clippy --example showcase_gpu -- -D warnings
330+
```
331+
332+
---
333+
334+
*"Quality is not an act, it is a habit." — commonly attributed to Aristotle*
335+
336+
*"The Toyota Way is not about perfection. It is about pursuing perfection while accepting that you will never achieve it." — Jeffrey Liker [1]*
