
Commit 5b15818

Codebase cleanup: schema fixes, CI, and documentation improvements
Schema fixes:
- Add missing allOf conditions for horizontalRule and break block types
- Fix tableCell children to require blocks only (not textNode), resolving oneOf ambiguity
- Fix cross-schema references to use full URLs instead of relative paths

CI/CD:
- Add GitHub Action for automated schema and example validation
- Handle cross-schema references with -r flag for annotations and phantoms schemas

Documentation:
- Create comprehensive example document demonstrating all block types
- Update spec to reflect tableCell block-only children requirement
- Add tableCell content rule and update examples in spec
- Update Editors field in introduction
- Add strategic insights summary to design-decisions.md
- Archive strategy conversation notes to docs/archive/
1 parent ff88972 commit 5b15818
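The tableCell fix in the commit message turns on JSON Schema's `oneOf` semantics: an instance is valid only if exactly one subschema matches it, so a child that satisfies both the textNode and the block subschema is rejected. The minimal Python sketch below illustrates this. The predicates `is_text_node` and `is_block` are hypothetical, deliberately simplified stand-ins; the real subschema definitions are not shown in this commit view.

```python
from typing import Any, Callable

Node = dict[str, Any]
Check = Callable[[Node], bool]


def is_text_node(node: Node) -> bool:
    # Hypothetical stand-in: treat any object with a string "value" as a textNode.
    return isinstance(node.get("value"), str)


def is_block(node: Node) -> bool:
    # Hypothetical stand-in: treat any object with a string "type" as a block.
    return isinstance(node.get("type"), str)


def one_of(node: Node, checks: list[Check]) -> bool:
    """JSON Schema oneOf: valid only if exactly one subschema matches."""
    return sum(1 for check in checks if check(node)) == 1


children = [
    {"type": "paragraph", "value": "Cell text"},  # satisfies both predicates
    {"value": "Cell text"},                       # bare text node
    {"type": "paragraph", "children": []},        # plain block
]

# Before the fix: each child may be a textNode or a block.
before = [one_of(child, [is_text_node, is_block]) for child in children]

# After the fix: children must be blocks only, so bare text must be wrapped in a block.
after = [one_of(child, [is_block]) for child in children]
```

With two overlapping alternatives, the first child fails `oneOf` even though it looks like a perfectly reasonable block. Restricting children to a single alternative removes that overlap.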

13 files changed

Lines changed: 1133 additions & 22 deletions


Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
```yaml
name: Validate Schemas

on:
  push:
    paths:
      - 'schemas/**'
      - 'examples/**'
      - '.github/workflows/validate-schemas.yml'
  pull_request:
    paths:
      - 'schemas/**'
      - 'examples/**'
      - '.github/workflows/validate-schemas.yml'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install ajv-cli
        run: npm install -g ajv-cli ajv-formats

      - name: Validate schemas compile
        run: |
          # Compile standalone schemas
          for f in schemas/anchor.schema.json schemas/asset-index.schema.json \
                   schemas/content.schema.json schemas/dublin-core.schema.json \
                   schemas/manifest.schema.json schemas/precise-layout.schema.json \
                   schemas/presentation.schema.json schemas/provenance.schema.json; do
            echo "Compiling $f..."
            ajv compile -s "$f" --spec=draft2020 --strict=false
          done

          # Compile schemas with cross-references (need -r for referenced schemas)
          echo "Compiling schemas/annotations.schema.json..."
          ajv compile -s schemas/annotations.schema.json \
            -r schemas/anchor.schema.json --spec=draft2020 --strict=false
          echo "Compiling schemas/phantoms.schema.json..."
          ajv compile -s schemas/phantoms.schema.json \
            -r schemas/anchor.schema.json --spec=draft2020 --strict=false

      - name: Validate example documents
        run: |
          # Validate simple-document
          ajv validate -s schemas/manifest.schema.json \
            -d examples/simple-document/manifest.json \
            --spec=draft2020 --strict=false
          ajv validate -s schemas/content.schema.json \
            -d examples/simple-document/content/document.json \
            --spec=draft2020 --strict=false
          ajv validate -s schemas/dublin-core.schema.json \
            -d examples/simple-document/metadata/dublin-core.json \
            --spec=draft2020 --strict=false

          # Validate signed-document
          ajv validate -s schemas/manifest.schema.json \
            -d examples/signed-document/manifest.json \
            --spec=draft2020 --strict=false
          ajv validate -s schemas/content.schema.json \
            -d examples/signed-document/content/document.json \
            --spec=draft2020 --strict=false
          ajv validate -s schemas/dublin-core.schema.json \
            -d examples/signed-document/metadata/dublin-core.json \
            --spec=draft2020 --strict=false

          # Validate comprehensive-document (if exists)
          if [ -d examples/comprehensive-document ]; then
            ajv validate -s schemas/manifest.schema.json \
              -d examples/comprehensive-document/manifest.json \
              --spec=draft2020 --strict=false
            ajv validate -s schemas/content.schema.json \
              -d examples/comprehensive-document/content/document.json \
              --spec=draft2020 --strict=false
            ajv validate -s schemas/dublin-core.schema.json \
              -d examples/comprehensive-document/metadata/dublin-core.json \
              --spec=draft2020 --strict=false
          fi
```
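The workflow passes `-r schemas/anchor.schema.json` when compiling the annotations and phantoms schemas, and the commit switches their cross-schema `$ref`s to full URLs. In JSON Schema, a relative `$ref` is resolved against the base URI of the schema that contains it (its `$id` if present, otherwise the location it was retrieved from). As a result, the same relative reference can point to different places depending on that base, while an absolute URL always resolves to itself. The Python sketch below illustrates this with the standard library's `urljoin`. The `example.org` URIs are placeholders, not the project's actual `$id` values. The dictionary registry is only a rough stand-in for what `-r` gives Ajv: the referenced schema is available up front, keyed by its `$id`.

```python
from urllib.parse import urljoin

# Placeholder URI (an assumption, not the project's real $id).
ANCHOR_ID = "https://example.org/schemas/anchor.schema.json"

# Rough stand-in for a schema registered via `-r`: looked up by its $id.
REGISTRY = {ANCHOR_ID: {"$id": ANCHOR_ID, "type": "object"}}


def resolve_ref(base_uri: str, ref: str) -> str:
    """Resolve a $ref against the base URI of the schema that contains it."""
    return urljoin(base_uri, ref)


def lookup(base_uri: str, ref: str):
    """Return the registered schema a $ref points to, or None if it is unresolved."""
    return REGISTRY.get(resolve_ref(base_uri, ref))


# The same annotations schema, seen under two different base URIs.
BASE_LOCAL = "file:///repo/schemas/annotations.schema.json"
BASE_PUBLISHED = "https://example.org/schemas/annotations.schema.json"

RELATIVE_REF = "anchor.schema.json"
FULL_URL_REF = ANCHOR_ID

# True where the reference lands on a registered schema.
results = {
    (base, ref): lookup(base, ref) is not None
    for base in (BASE_LOCAL, BASE_PUBLISHED)
    for ref in (RELATIVE_REF, FULL_URL_REF)
}
```

Only the relative reference under the local base fails to resolve. The full-URL form finds the registered schema regardless of where the referencing schema was loaded from.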
File renamed without changes.
File renamed without changes.

docs/design-decisions.md

Lines changed: 77 additions & 0 deletions
@@ -499,3 +499,80 @@ Should the format support DRM?
**Status**: Explicitly out of scope
**Considerations**: Opens governance issues; encryption provides confidentiality

---

## Strategic Insights

This section captures key strategic insights from early design discussions that inform the specification's direction and adoption approach. Full discussion notes are archived in `docs/archive/`.

### SI-001: Technical Merit vs Adoption

**Insight**: The technical problems are real and documented, but technical merit accounts for only approximately 20% of what determines a format's success. The other 80% is ecosystem, timing, and adoption strategy.

**Evidence**:
- PDF signature vulnerabilities demonstrated in published security research (21 of 22 desktop viewers vulnerable)
- The view/edit divide creates universal workflow friction
- A multi-billion-dollar AI extraction industry exists because PDF structure is unreliable

**Implication**: The technical case is necessary but not sufficient. A spec without robust tooling is just documentation.

---

### SI-002: Beachhead Strategy (Academia First)

**Insight**: Academia is the optimal initial adoption target, with legal/enterprise as a secondary market.

**Why Academia**:
- Lower switching costs (no enterprise contracts, IT approval chains)
- Cultural alignment with open standards
- Acute pain points: citations as flat text, figures as inaccessible blobs, unreliable text extraction
- LaTeX users prove academics tolerate complexity for better output
- Natural integration points: Overleaf, Zotero, Pandoc, Jupyter
- Long-term pipeline: grad student → professor → journal editor → department mandate (10-15 year arc)

**Why Legal Secondary**:
- High pain point but high friction (entrenched tooling, tech-averse users)
- Better play: become the "killer feature" for a tooling vendor entering the legal market
- "If lawyers trust it for contracts" provides powerful social proof

---

### SI-003: Development Philosophy

**Insight**: Start solo, design for OSS. Empty repos don't attract contributors; working code does.

**Approach**:
1. Begin implementation alone to move fast and establish patterns
2. Design for OSS from day one (clear architecture, good docs, contribution points)
3. Open implementation once there's something functional to contribute to
4. The spec is already open; that's the legitimacy part

**Rationale**: Speed now, community later. Avoid "design by committee" early.

---

### SI-004: Implementation Priorities

**Insight**: Pandoc integration is the highest-leverage early move.

**Build Order**:
1. **cdx-core** (Rust library): the foundation everything else builds on
2. **cdx-cli**: dogfoods the core library; essential for tooling development
3. **Pandoc writer**: Markdown → Codex (the academia unlock)
4. **Web viewer**: cdx-core compiled to WASM (zero-install demonstration)

**Why Pandoc**: Academics don't adopt new editors; they adopt new export targets. A Pandoc writer fits existing workflows with zero friction for authors.

---

### SI-005: Spec Evolution Principles

**Insight**: Specs accumulate ad-hoc solutions. Catching inconsistencies early (pre-v1.0) and unifying them is much cheaper than fixing them later.

**Lessons Learned**:
- "Where does this live?" is the critical question for new features (inside vs. outside the hashing boundary)
- Contradictions hide in prose: normative algorithms trump aspirational text
- Each fix pulls on connected threads: changes are individually clean but interdependent

**Example**: Three extensions independently invented sub-block addressing. Unifying them into the anchor system prevented fragmentation.
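The "inside vs. outside the hashing boundary" question can be made concrete with a minimal Python sketch. Everything here is hypothetical: the package layout, the `INSIDE_BOUNDARY` tuple, and the JSON canonicalization are illustrative assumptions, not the specification's actual hashing algorithm. The sketch shows the consequence of the placement decision. Data placed outside the boundary (here, annotations) can change without altering the document hash, while any change inside it produces a new hash.

```python
import hashlib
import json
from typing import Any

# Hypothetical: which top-level parts of a package sit inside the hashing boundary.
INSIDE_BOUNDARY = ("content",)


def canonical_bytes(value: Any) -> bytes:
    # Illustrative canonical form (sorted keys, compact separators), not the spec's algorithm.
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def document_hash(package: dict[str, Any]) -> str:
    """SHA-256 over only the parts of the package inside the boundary."""
    inside = {key: package[key] for key in INSIDE_BOUNDARY}
    return hashlib.sha256(canonical_bytes(inside)).hexdigest()


package = {
    "content": {"blocks": [{"type": "paragraph", "text": "Hello"}]},
    "annotations": [],
}
original = document_hash(package)

# Annotations are outside the boundary: adding one leaves the hash unchanged.
package["annotations"].append({"anchor": "block-0", "body": "Looks good"})
after_annotation = document_hash(package)

# Content is inside the boundary: editing it changes the hash.
package["content"]["blocks"][0]["text"] = "Hello, world"
after_edit = document_hash(package)
```

Anything computed over the hash, such as a signature, inherits the same split. This is why a new feature's location has to be decided before its data model.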
