Commit bf23325

docs: add developer documentation for incremental builds
Document the incremental build infrastructure including:

- Architecture overview and component responsibilities
- How to use IncrementalBuildState in consumer applications
- Persistence format and cache invalidation strategies
- Security considerations and resource limits
1 parent cd31272 commit bf23325

3 files changed, +581 -0 lines changed

Lines changed: 304 additions & 0 deletions
@@ -0,0 +1,304 @@
====================================
Incremental Builds: Architecture
====================================

This document describes the internal architecture, design decisions, and
security considerations of the incremental build system. For usage documentation,
see :doc:`incremental-builds`.

Design Goals
============

The incremental build system was designed with these priorities:

1. **Correctness** - Never skip a document that needs re-rendering
2. **Performance** - O(1) operations where possible, efficient memory usage
3. **Security** - Prevent resource exhaustion and path traversal attacks
4. **Parallelization** - Support parallel compilation workflows

Architecture Overview
=====================

.. code-block:: text

    ┌─────────────────────────────────────────────────────────────┐
    │                    IncrementalBuildCache                    │
    │  (Orchestrates caching, persistence, and state management)  │
    └─────────────────────────────────────────────────────────────┘
              │                    │                    │
              ▼                    ▼                    ▼
    ┌───────────────────┐ ┌──────────────────┐ ┌──────────────────┐
    │  DependencyGraph  │ │ DocumentExports  │ │ CacheVersioning  │
    │  (Import/export   │ │ (Per-document    │ │ (Version         │
    │   relationships)  │ │  public API)     │ │  validation)     │
    └───────────────────┘ └──────────────────┘ └──────────────────┘
              │                    │
              ▼                    ▼
    ┌───────────────────┐ ┌──────────────────┐
    │  DirtyPropagator  │ │ ChangeDetector   │
    │  (Cascade dirty   │ │ (File-based      │
    │   state)          │ │  detection)      │
    └───────────────────┘ └──────────────────┘

Component Responsibilities
==========================

IncrementalBuildCache
---------------------

The central orchestrator for cache persistence. It uses **sharded storage** with
256 buckets (the first two hex characters of the document path's MD5 hash) so
that incremental saves stay cheap.

**Design decisions:**

- Sharded storage: Only modified documents are rewritten, not the entire cache
- Hash-based filenames: Prevents path traversal and handles special characters
- Separate metadata file: ``_build_meta.json`` is always loaded; exports are lazy-loaded
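
The sharding scheme can be sketched as follows. This is a minimal illustration
rather than the actual implementation: the function name ``exportFilePath()``
and the cache directory are hypothetical, while the
``_exports/<hash-prefix>/<hash>.json`` layout follows the Cache Format section
of this document.

.. code-block:: php

    <?php
    declare(strict_types=1);

    /**
     * Hypothetical helper: map a document path to its export file.
     * Only the MD5 hash of the path ever reaches the filesystem.
     */
    function exportFilePath(string $cacheDir, string $docPath): string
    {
        $hash = md5($docPath);         // 32 lowercase hex characters
        $shard = substr($hash, 0, 2);  // one of 256 buckets, "00" to "ff"

        return rtrim($cacheDir, '/') . '/_exports/' . $shard . '/' . $hash . '.json';
    }

    // A hostile-looking document path still resolves inside the cache directory.
    echo exportFilePath('/tmp/cache', '../../etc/passwd'), PHP_EOL;

Because the shard is derived from the hash, saving one changed document only
touches one file in one of the 256 directories.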

DependencyGraph
---------------

Bidirectional graph tracking import/dependent relationships. Uses keyed arrays
for O(1) lookup performance.

**Design decisions:**

- Bidirectional: Stores both ``imports[A] = [B, C]`` and ``dependents[B] = [A]``
- Keyed arrays: ``$imports[$doc][$target] = true`` for O(1) add/remove/lookup
- Depth-limited traversal: Maximum 100 levels to prevent stack overflow on cycles
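
A minimal sketch of the bidirectional keyed-array idea is shown below. The
class ``SketchGraph`` is hypothetical and deliberately omits the limits and
depth checks of the real ``DependencyGraph``; it only illustrates why
add, remove, and lookup are constant-time.

.. code-block:: php

    <?php
    declare(strict_types=1);

    // Illustrative only: not the real DependencyGraph class.
    final class SketchGraph
    {
        /** @var array<string, array<string, true>> doc => [target => true] */
        private array $imports = [];

        /** @var array<string, array<string, true>> target => [doc => true] */
        private array $dependents = [];

        public function addImport(string $doc, string $target): void
        {
            // Keyed writes are idempotent, so duplicates cost nothing.
            $this->imports[$doc][$target] = true;
            $this->dependents[$target][$doc] = true;
        }

        public function removeImport(string $doc, string $target): void
        {
            unset($this->imports[$doc][$target], $this->dependents[$target][$doc]);
        }

        /** @return list<string> */
        public function getImports(string $doc): array
        {
            // PHP turns numeric string keys into ints, so cast them back.
            return array_map('strval', array_keys($this->imports[$doc] ?? []));
        }

        /** @return list<string> */
        public function getDependents(string $doc): array
        {
            return array_map('strval', array_keys($this->dependents[$doc] ?? []));
        }
    }

Storing both directions doubles the memory per edge, but it means the question
"who depends on this document?" never requires scanning every import list.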

DirtyPropagator
---------------

Propagates dirty state through the dependency graph when exports change.

**Design decisions:**

- Uses ``SplQueue`` for O(1) enqueue/dequeue (vs ``array_shift`` which is O(n))
- Export comparison: Only propagates when *exports* change, not just content
- Visited tracking: Prevents infinite loops in cyclic dependencies
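
The three decisions above can be combined in a short breadth-first sketch.
The function name, its parameters, and the shape of the hash arrays are
assumptions made for illustration; the real ``DirtyPropagator`` API may differ.

.. code-block:: php

    <?php
    declare(strict_types=1);

    /**
     * Hypothetical sketch of dirty propagation.
     *
     * @param array<string, array<string, true>> $dependents target => [dependent => true]
     * @param array<string, string> $oldExportHashes exports hash per document, from the cache
     * @param array<string, string> $newExportHashes exports hash per document, from this build
     * @param list<string> $changedDocs documents whose content changed
     * @return list<string> documents that need re-rendering
     */
    function propagateDirtySketch(
        array $dependents,
        array $oldExportHashes,
        array $newExportHashes,
        array $changedDocs
    ): array {
        $dirty = [];
        $queue = new SplQueue();

        foreach ($changedDocs as $doc) {
            $dirty[$doc] = true; // a changed document is always re-rendered
            // Cascade only when the public exports differ.
            if (($oldExportHashes[$doc] ?? null) !== ($newExportHashes[$doc] ?? null)) {
                $queue->enqueue($doc);
            }
        }

        $visited = [];
        while (!$queue->isEmpty()) {
            $doc = $queue->dequeue();
            if (isset($visited[$doc])) {
                continue; // already expanded, which also breaks cycles
            }
            $visited[$doc] = true;

            foreach (array_keys($dependents[$doc] ?? []) as $dependent) {
                $dependent = (string) $dependent;
                $dirty[$dependent] = true;
                $queue->enqueue($dependent);
            }
        }

        return array_map('strval', array_keys($dirty));
    }

Each document is expanded at most once and each edge is followed at most once,
which is where the O(V + E) bound comes from.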

GlobalInvalidationDetector
--------------------------

Detects changes that require a full rebuild (config, theme, toctree structure).

**Design decisions:**

- Configurable patterns: Default patterns can be overridden per-project
- Directory patterns: Must match complete path segments (``foo/`` matches
  ``path/foo/bar`` but not ``prefix_foo/bar``)
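
One way to get the "complete path segment" behaviour is to wrap both the path
and the pattern in slashes before searching. The helper below is a
hypothetical sketch (the name ``matchesDirectoryPattern()`` is not taken from
the code base) and requires PHP 8.0 or later for ``str_contains()``.

.. code-block:: php

    <?php
    declare(strict_types=1);

    /**
     * Hypothetical helper: does $path contain the directory named by $pattern
     * as a complete segment? Note that this simple version also matches when
     * the last segment of $path carries that name.
     */
    function matchesDirectoryPattern(string $pattern, string $path): bool
    {
        $dir = trim($pattern, '/');
        if ($dir === '') {
            return false; // an empty pattern must not match everything
        }

        // Surrounding slashes force the match to start and end on a boundary.
        return str_contains('/' . trim($path, '/') . '/', '/' . $dir . '/');
    }

    var_dump(matchesDirectoryPattern('foo/', 'path/foo/bar'));    // bool(true)
    var_dump(matchesDirectoryPattern('foo/', 'prefix_foo/bar'));  // bool(false)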

Security Model
==============

The incremental build system processes untrusted cache files and must defend
against malicious input.

Resource Limits
---------------

All components enforce consistent limits to prevent memory exhaustion:

.. code-block:: php

    // Consistent across all classes
    // (numeric literal separators require PHP 7.4 or later)
    const MAX_DOCUMENTS = 100_000;
    const MAX_EXPORTS = 100_000;
    const MAX_OUTPUT_PATHS = 100_000;
    const MAX_PROPAGATION_VISITS = 100_000;

    // DependencyGraph-specific
    const MAX_TOTAL_EDGES = 2_000_000;
    const MAX_IMPORTS_PER_DOCUMENT = 1_000;

    // GlobalInvalidationDetector
    const MAX_PATTERN_LENGTH = 256;

**Important:** These limits are intentionally kept in sync. If you change one,
consider whether related limits should also change.

Input Validation
----------------

All ``fromArray()`` deserialization methods validate:

1. **Type checking**: All values must match expected types
2. **Size limits**: Arrays must not exceed maximum sizes
3. **Format validation**: Hashes must be valid hex strings
4. **Character validation**: Document paths reject control characters
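
The four checks can be illustrated with a small standalone validator. This is
a sketch under stated assumptions: the function ``validateExportData()``, the
constant ``SKETCH_MAX_ANCHORS``, and the choice of fields are hypothetical and
only mirror the export file format shown later in this document.

.. code-block:: php

    <?php
    declare(strict_types=1);

    // Hypothetical limit, used by this sketch only.
    const SKETCH_MAX_ANCHORS = 100_000;

    /**
     * Hypothetical validator for one decoded export file.
     *
     * @param array<string, mixed> $data
     * @throws InvalidArgumentException when any check fails
     */
    function validateExportData(array $data): void
    {
        $path = $data['documentPath'] ?? null;
        $hash = $data['contentHash'] ?? null;
        $anchors = $data['anchors'] ?? null;

        // 1. Type checking
        if (!is_string($path) || !is_string($hash) || !is_array($anchors)) {
            throw new InvalidArgumentException('Unexpected field type');
        }

        // 2. Size limits
        if (count($anchors) > SKETCH_MAX_ANCHORS) {
            throw new InvalidArgumentException('Too many anchors');
        }

        // 3. Format validation (\A and \z anchor the whole string)
        if (preg_match('/\A[0-9a-f]+\z/', $hash) !== 1) {
            throw new InvalidArgumentException('contentHash is not a hex string');
        }

        // 4. Character validation: reject ASCII control characters
        if ($path === '' || preg_match('/[\x00-\x1f\x7f]/', $path) === 1) {
            throw new InvalidArgumentException('Invalid documentPath');
        }
    }

Running the cheap type and size checks first keeps the cost of rejecting an
oversized or malformed file low.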

Path Traversal Prevention
-------------------------

The sharded cache system prevents path traversal attacks:

.. code-block:: php

    // Shard directory names are validated with a regex. The D modifier makes
    // "$" match only at the very end, so a name like "ab\n" is rejected.
    private function isValidShardName(string $name): bool
    {
        return preg_match('/^[0-9a-f]{2}$/D', $name) === 1;
    }

    // Document paths become hash-based filenames
    $hash = md5($docPath);          // 32 hex chars, e.g., "d41d8cd98f00b204e9800998ecf8427e"
    $prefix = substr($hash, 0, 2);  // e.g., "d4"
    $filename = $hash . '.json';    // Full hash as filename

Thread Safety
=============

The incremental build classes are **NOT thread-safe**. They are designed for
single-threaded build processes.

For parallel builds, use the extract/merge pattern:

.. code-block:: php

    // Parent process
    $cache = new IncrementalBuildCache($versioning);
    $cache->load($outputDir);

    // Fork child processes, each with its own copy of the state.
    // $chunks is the list of document batches, one per child.
    foreach ($chunks as $chunk) {
        $childState = $cache->extractState();
        // Pass $childState to child process
    }

    // After children complete, merge results sequentially.
    // $childResults holds the state returned by each child.
    foreach ($childResults as $result) {
        $cache->mergeState($result);
    }

    $cache->save($outputDir);

Algorithm Complexity
====================

.. list-table::
    :header-rows: 1

    * - Operation
      - Complexity
      - Notes
    * - ``DependencyGraph::addImport()``
      - O(1)
      - Keyed array insertion
    * - ``DependencyGraph::getImports()``
      - O(1)
      - Direct array access
    * - ``DependencyGraph::propagateDirty()``
      - O(V + E)
      - BFS traversal
    * - ``DirtyPropagator::propagate()``
      - O(V + E)
      - Uses SplQueue
    * - ``IncrementalBuildCache::save()``
      - O(dirty)
      - Only writes changed exports
    * - ``ChangeDetector::detectChanges()``
      - O(n)
      - Checks each document
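
As an illustration of the O(n) bound for change detection, the sketch below
classifies documents with one pass over the current hashes and one key
comparison against the cached ones. It is hypothetical: the function name,
the hash-map inputs, and the result categories are assumptions, and the real
``ChangeDetector`` works on files rather than on prepared arrays.

.. code-block:: php

    <?php
    declare(strict_types=1);

    /**
     * Hypothetical sketch of hash-based change detection.
     *
     * @param array<string, string> $cachedHashes  document => content hash from the cache
     * @param array<string, string> $currentHashes document => content hash on disk now
     * @return array{new: list<string>, changed: list<string>, unchanged: list<string>, deleted: list<string>}
     */
    function detectChangesSketch(array $cachedHashes, array $currentHashes): array
    {
        $result = ['new' => [], 'changed' => [], 'unchanged' => [], 'deleted' => []];

        // One keyed lookup per current document.
        foreach ($currentHashes as $doc => $hash) {
            $doc = (string) $doc;
            if (!array_key_exists($doc, $cachedHashes)) {
                $result['new'][] = $doc;
            } elseif ($cachedHashes[$doc] !== $hash) {
                $result['changed'][] = $doc;
            } else {
                $result['unchanged'][] = $doc;
            }
        }

        // Anything cached but no longer present was deleted.
        foreach (array_keys(array_diff_key($cachedHashes, $currentHashes)) as $doc) {
            $result['deleted'][] = (string) $doc;
        }

        return $result;
    }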

Cache Format
============

The cache is stored in two kinds of JSON file:

Metadata File (``_build_meta.json``)
------------------------------------

Always loaded. Contains version info, dependency graph, and output paths.

.. code-block:: json

    {
        "metadata": {
            "version": 1,
            "phpVersion": "8.1.0",
            "packageVersion": "1.0.0",
            "settingsHash": "abc123...",
            "createdAt": 1706140800
        },
        "dependencies": {
            "imports": {"doc1": {"doc2": true}},
            "dependents": {"doc2": {"doc1": true}}
        },
        "outputs": {
            "doc1": "/output/doc1.html"
        }
    }
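
The ``metadata`` block is what makes version validation possible. The sketch
below shows one plausible check. Which fields the real ``CacheVersioning``
compares, and how strictly, is not stated here, so the exact-match rule and
the function name ``isCacheMetadataCompatible()`` are assumptions.

.. code-block:: php

    <?php
    declare(strict_types=1);

    /**
     * Hypothetical check: treat the cache as stale unless every versioning
     * field matches the running environment exactly.
     *
     * @param array<string, mixed> $metadata the "metadata" object from _build_meta.json
     * @param array<string, mixed> $expected values computed for the current build
     */
    function isCacheMetadataCompatible(array $metadata, array $expected): bool
    {
        foreach (['version', 'phpVersion', 'packageVersion', 'settingsHash'] as $key) {
            if (!array_key_exists($key, $metadata) || !array_key_exists($key, $expected)) {
                return false; // a missing field is never trusted
            }
            if ($metadata[$key] !== $expected[$key]) {
                return false;
            }
        }

        return true;
    }

When a check like this fails, discarding the cache and doing a full rebuild is
the safe fallback, in line with the correctness-first design goal.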

Export Files (``_exports/<hash-prefix>/<hash>.json``)
-----------------------------------------------------

Loaded on demand. One file per document, sharded into 256 directories.

.. code-block:: json

    {
        "path": "getting-started",
        "documentPath": "getting-started",
        "contentHash": "a1b2c3...",
        "exportsHash": "d4e5f6...",
        "anchors": {"installation": "Installation"},
        "sectionTitles": {"installation": "Installation"},
        "citations": [],
        "lastModified": 1706140800,
        "documentTitle": "Getting Started"
    }

Testing Guidelines
==================

When modifying the incremental build system:

1. **Security tests**: Add tests for limit enforcement and validation
2. **Edge cases**: Test cycles, empty graphs, maximum sizes
3. **Serialization round-trips**: Test ``toArray()``/``fromArray()`` compatibility
4. **Algorithm correctness**: Verify dirty propagation finds all affected documents

Example test patterns:

.. code-block:: php

    // Test limit enforcement
    public function testRejectsExcessiveDocuments(): void
    {
        $this->expectException(InvalidArgumentException::class);
        // ... create data exceeding MAX_DOCUMENTS
    }

    // Test cycle handling
    public function testHandlesCyclicDependencies(): void
    {
        $graph = new DependencyGraph(); // assumes a no-argument constructor
        $graph->addImport('a', 'b');
        $graph->addImport('b', 'c');
        $graph->addImport('c', 'a'); // Cycle!

        $result = $graph->propagateDirty(['a']);
        // Should not loop forever; should find all three documents
    }

Extending the System
====================

Adding New Dependency Types
---------------------------

To track a new type of cross-reference:

1. Update ``DependencyGraphPass`` to detect the new reference type
2. Call ``$graph->addImport($source, $target)`` for each reference
3. Add tests for the new reference detection

Adding New Export Types
-----------------------

To track additional exported symbols:

1. Update ``DocumentExports`` to include the new field
2. Update ``ExportsCollectorPass`` to collect the new data
3. Update ``ContentHasher::hashExports()`` to include in the hash
4. Add tests for export change detection
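
Step 3 matters because a field that is left out of the exports hash can change
without marking dependents dirty. The sketch below shows the idea with a
standalone function. It is not ``ContentHasher::hashExports()``: the function
name, the SHA-256 choice, the key normalisation, and the ``footnotes`` field
in the usage lines are all assumptions made for illustration.

.. code-block:: php

    <?php
    declare(strict_types=1);

    /**
     * Hypothetical exports hash: every exported field contributes, and key
     * order is normalised so equal exports always hash equally.
     *
     * @param array<string, mixed> $exports
     */
    function hashExportsSketch(array $exports): string
    {
        ksort($exports);
        foreach ($exports as $field => $value) {
            if (is_array($value)) {
                ksort($value);
                $exports[$field] = $value;
            }
        }

        return hash('sha256', json_encode($exports, JSON_THROW_ON_ERROR));
    }

    $before = ['anchors' => ['installation' => 'Installation'], 'documentTitle' => 'Getting Started'];
    $after = $before + ['footnotes' => ['1' => 'See the changelog']]; // hypothetical new field

    // The new field changes the hash, so dependents would be marked dirty.
    var_dump(hashExportsSketch($before) === hashExportsSketch($after)); // bool(false)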
