Purpose: Create W3C Web Annotations on resources — manually by selecting text, or via AI-assisted detection that identifies highlights, assessments, comments, tags, and entity references. Both human and AI agents participate as peers in annotation creation.
Related Documentation:
- W3C Web Annotation Data Model - Complete W3C specification implementation
- W3C Selectors - TextPositionSelector and TextQuoteSelector details
- Knowledge System - Event store, view storage, graph database flow
- Frontend Annotations - UI patterns and component architecture
- CodeMirror Integration - Position accuracy and CRLF handling
- @semiont/make-meaning - Detection API and job workers
- Make-Meaning Job Workers - Worker implementation details
- Make-Meaning API Reference - AnnotationDetection methods
The Mark flow adds structured metadata to resources. The application labels, tags, categorizes, and enriches content through auto-tagging, key term highlighting, priority flagging, and status indicators. AI agents perform named entity recognition, topic classification, and semantic enrichment; human collaborators perform triage and classification — determining what each passage means, its urgency, and where it belongs. The resulting annotations serve as anchors for downstream linking, context assembly, and navigation.
Semiont creates W3C-compliant annotations through two complementary paths: manual annotation (a human selects text and chooses a motivation) and AI-assisted detection (an AI agent scans the document and proposes annotations). Both paths produce identical W3C Web Annotations and flow through the same event-sourced pipeline. This system combines:
- W3C Web Annotation Data Model - Standards-compliant annotation structure with dual selectors
- AI Inference - LLM-powered text analysis with configurable prompts and user instructions
- Backend Event Architecture - Event Store → View Storage → Graph Database flow with <50ms latency
- Frontend UI - Real-time progress display with SSE streaming and visual feedback
Supported Formats: Currently available for text-based formats (text/plain, text/markdown). Support for images and PDFs is planned for future releases
Manual annotation — create an annotation directly. The mark
namespace emits mark:create-request via the bus gateway; the backend
annotation-assembly handler builds the full W3C annotation from the
intent (using the authenticated user's DID as the creator) and passes
it to Stower.
const { annotationId } = await client.mark.annotation(resourceId, {
motivation: 'highlighting',
target: {
source: resourceId,
selector: {
type: 'TextQuoteSelector',
exact: 'Ouranos',
prefix: 'In the beginning, ',
suffix: ' ruled the universe',
},
},
// highlighting carries no body — motivation + target is the whole
// annotation per the W3C Web Annotation Model.
});AI-assisted annotation — long-running job that streams progress
events. client.mark.assist() returns an Observable; internally it
emits job:create (with jobType derived from the motivation) on the
bus gateway. A worker claims the job, runs detection, and publishes
job:start / job:report-progress / job:complete / job:fail
events as it goes (filtered by jobId).
// Detect highlights with AI
client.mark.assist(resourceId, 'highlighting', {
instructions: 'Focus on key technical points',
density: 5,
}).subscribe({
next: (event) => console.log('progress:', event),
complete: () => console.log('done'),
});
// Detect entity references
client.mark.assist(resourceId, 'linking', {
entityTypes: ['Person', 'Location'],
includeDescriptiveReferences: false,
}).subscribe({ /* ... */ });| Motivation | W3C Spec | Purpose | Body Content | User Control |
|---|---|---|---|---|
highlighting |
W3C §3.1 | Mark important passages | Empty array [] |
Optional instructions (max 500 chars) + density (1-15) |
assessing |
W3C §3.1 | Evaluate and assess content | Assessment text as TextualBody |
Optional instructions (max 500 chars) + tone + density (1-10) |
commenting |
W3C §3.1 | Add explanatory comments | Comment text as TextualBody with purpose: "commenting" |
Optional instructions (max 500 chars) + tone + density (2-12) |
tagging |
W3C §3.1 | Identify structural roles | Dual-body structure: category (purpose: "tagging") + schema ID (purpose: "describing") |
Selected schema (IRAC/IMRAD/Toulmin) + categories |
linking |
W3C §3.1 | Extract entity references | Entity type tags as TextualBody with purpose: "tagging" |
Selected entity types from registry + include descriptive references option |
All types create annotations with:
- Target: Text selection with dual selectors (TextPositionSelector + TextQuoteSelector)
- Body: Empty for highlights, assessment text for assessments, comment text for comments, entity type tags for references
- Creator: W3C Agent identifying who requested the annotation
- Generator: W3C SoftwareAgent identifying the worker and inference model that produced it (present when a worker did the work, absent when an agent annotated directly)
- Created: ISO 8601 timestamp
Mark events split into two shapes for the purpose of concurrent-write semantics:
-
Immutable appends —
mark:create(a new annotation) andmark:archived(annotation removed) carry their own annotation identity and never collide. Two participants creating annotations on the same passage concurrently produce two distinct annotations; nothing is rejected, nothing merges. Two participants archiving the same annotation concurrently each produce amark:archivedevent; the projection sees the annotation archived (idempotent). -
Body updates —
mark:update-bodyevents arrive atEventStore.appendEvent(packages/event-sourcing/src/event-store.ts) in some order, are persisted to the event log, and replayed throughapplyBodyOperations(packages/core/src/annotation-assembly.ts) in arrival order. Each operation runs against the body produced by the previous event — not the body the originator saw when they issued the command. There is no version field, noIf-Match, no rejection of stale writes. Both writes succeed; the resulting body reflects sequential application of both operation sets.
The Bind flow's bind:update-body forwards to mark:update-body; see BIND.md § Concurrent Binds for the per-operation semantics (add is idempotent on equal items, remove drops first match, replace keys on oldItem). Workflows that need single-writer semantics enforce it at the application layer (typically via a coordination signal like bind:initiate) rather than expecting the protocol to reject concurrent writers.
Every detected annotation follows the W3C Web Annotation Data Model:
{
"@context": "http://www.w3.org/ns/anno.jsonld",
"type": "Annotation",
"id": "http://localhost:4000/annotations/abc123",
"motivation": "highlighting",
"creator": {
"id": "did:web:localhost:users:alice",
"type": "Person",
"name": "Alice"
},
"generator": {
"@type": "SoftwareAgent",
"name": "Highlight Worker / Anthropic claude-sonnet-4-6",
"worker": "Highlight Worker",
"inferenceProvider": "anthropic",
"model": "claude-sonnet-4-6"
},
"created": "2025-12-04T10:30:00Z",
"target": {
"type": "SpecificResource",
"source": "http://localhost:4000/resources/doc-456",
"selector": [
{
"type": "TextPositionSelector",
"start": 52,
"end": 59
},
{
"type": "TextQuoteSelector",
"exact": "Ouranos",
"prefix": "In the beginning, ",
"suffix": " ruled the universe"
}
]
},
"body": []
}Reference annotation example (with entity type tags):
{
"motivation": "linking",
"body": [
{
"type": "TextualBody",
"value": "Person",
"purpose": "tagging"
},
{
"type": "TextualBody",
"value": "Deity",
"purpose": "tagging"
}
]
}Comment annotation example (with explanatory text):
{
"motivation": "commenting",
"body": [
{
"type": "TextualBody",
"value": "Ouranos (also spelled Uranus) is the primordial Greek deity personifying the sky. In Hesiod's Theogony, he is the son and husband of Gaia (Earth) and father of the Titans.",
"purpose": "commenting",
"format": "text/plain",
"language": "en"
}
]
}Implementation:
- Highlights: packages/jobs/src/workers/highlight-annotation-worker.ts
- Assessments: packages/jobs/src/workers/assessment-annotation-worker.ts
- Comments: packages/jobs/src/workers/comment-annotation-worker.ts
- References: packages/jobs/src/workers/reference-annotation-worker.ts
- Tags: packages/jobs/src/workers/tag-annotation-worker.ts
See Job Workers Documentation for complete details on worker architecture and dependency injection.
Every detected annotation uses both W3C selector types (W3C §4.2):
TextPositionSelector (W3C §4.2.1):
- Character offsets from document start:
{ "start": 52, "end": 59 } - Fast, precise lookup when document unchanged
- Required by detection workers to create annotations
TextQuoteSelector (W3C §4.2.4):
- Exact text with prefix/suffix context
- Enables fuzzy anchoring when content shifts
- AI provides 32 characters of prefix/suffix context
- Disambiguates multiple occurrences of same text
Why Dual Selectors?
- Position-based anchoring works when content unchanged
- Text-based anchoring recovers from content edits, line ending changes (CRLF ↔ LF)
- Prefix/suffix enables finding text even when LLM positions are approximate
See W3C-SELECTORS.md for complete selector documentation.
Frontend uses fuzzy anchoring (CODEMIRROR-INTEGRATION.md) to handle:
- Documents edited after annotation creation
- Character position shifts from insertions/deletions
- Line ending normalization (CRLF → LF)
- Multiple occurrences of same text
Implementation: packages/api-client/src/utils/fuzzy-anchor.ts with comprehensive tests.
Detection workers use structured prompts optimized for each annotation type:
Detection workers use the AnnotationDetection class from @semiont/make-meaning for all AI-powered detection logic. Workers handle job orchestration and progress tracking, while detection methods handle prompt construction and response parsing.
Highlight Detection:
- Detection Method: AnnotationDetection.detectHighlights()
- Worker: HighlightAnnotationWorker
- Task: Identify important/noteworthy passages
- Input: First 8000 characters + optional user instructions
- Output: JSON array with
exact,start,end,prefix,suffix - Model params: max_tokens=2000, temperature=0.3
Assessment Detection:
- Detection Method: AnnotationDetection.detectAssessments()
- Worker: AssessmentAnnotationWorker
- Task: Assess and evaluate key passages
- Input: First 8000 characters + optional user instructions
- Output: JSON array with
exact,start,end,prefix,suffix,assessment - Model params: max_tokens=2000, temperature=0.3
Comment Detection:
- Detection Method: AnnotationDetection.detectComments()
- Worker: CommentAnnotationWorker
- Task: Identify passages needing explanatory comments
- Input: First 8000 characters + optional user instructions + optional tone (scholarly/explanatory/conversational/technical)
- Output: JSON array with
exact,start,end,prefix,suffix,comment - Model params: max_tokens=3000 (higher to allow for comment generation), temperature=0.4 (higher for creative context)
- Guidelines: Emphasis on selectivity (3-8 comments per 2000 words), value beyond restating text, focus on context/background/clarification
Tag Detection:
- Detection Method: AnnotationDetection.detectTags()
- Worker: TagAnnotationWorker
- Task: Detect and extract structured tags using ontology schemas
- Input: Full document content + schema ID + category
- Output: JSON array with
exact,start,end,prefix,suffix,category - Model params: max_tokens=2000, temperature=0.3
Reference/Entity Detection:
- Worker: ReferenceAnnotationWorker
- Task: Identify entity references by type (Person, Location, Concept, etc.)
- Input: Full document content + selected entity types (with optional examples)
- Output: JSON array with
exact,entityType,startOffset,endOffset,prefix,suffix - Model params: max_tokens=4000, temperature=0.3
All detection types support various parameters to customize AI behavior and control output.
Optional free-text guidance (max 500 characters) to influence what the AI detects:
Highlight Examples:
- "Focus on key technical points"
- "Highlight definitions and important concepts"
- "Find passages related to security"
Assessment Examples:
- "Evaluate claims for accuracy"
- "Assess the strength of evidence"
- "Focus on methodology"
Comment Examples:
- "Focus on technical terminology"
- "Explain historical references"
- "Clarify complex concepts"
Controls the writing style of generated text:
- Analytical (Assessments): Objective, evidence-based evaluation
- Critical (Assessments): Rigorous examination, identifies weaknesses
- Balanced (Assessments): Fair consideration of strengths and limitations
- Constructive (Assessments): Improvement-focused, actionable feedback
- Scholarly (Comments): Academic style with citations and formal language
- Explanatory (Comments): Clear, educational explanations for general audience
- Conversational (Comments): Casual, friendly style for approachable learning
- Technical (Comments): Precise, detailed technical explanations for expert audience
Controls the target number of annotations per 2000 words:
| Type | Range | Default | Sparse (Low) | Dense (High) |
|---|---|---|---|---|
| Highlights | 1-15 | 5 | 1-3 per 2000 words | 13-15 per 2000 words |
| Assessments | 1-10 | 4 | 1-2 per 2000 words | 8-10 per 2000 words |
| Comments | 2-12 | 5 | 2-3 per 2000 words | 10-12 per 2000 words |
Implementation: Density is communicated to the AI via prompt guidance. The AI aims for the specified density but may vary based on content (e.g., fewer highlights if content lacks noteworthy passages).
UI: Density selector includes:
- Checkbox to enable/disable (enabled by default)
- Slider control with numeric display
- Labels showing "sparse" at minimum, "dense" at maximum
- Current value displayed as "X per 2000 words"
Selection: Users select from entity type registry (Person, Location, Organization, Event, Concept, etc.)
- Multiple types can be selected in a single detection run
- Optional examples can be provided per entity type
- Detection runs once per selected entity type
Purpose: Also detect descriptive references in addition to proper names.
Checkbox option (default: unchecked):
- Unchecked (default): Only detect explicit entity names (e.g., "Einstein", "Paris", "IBM")
- Checked: Also detect descriptive references like "the physicist", "the city", "the tech giant"
Example:
- Text: "Albert Einstein was born in Ulm. The physicist later moved to Switzerland."
- Without descriptive refs: Detects "Albert Einstein", "Ulm", "Switzerland"
- With descriptive refs: Also detects "the physicist" (referencing Einstein)
Use Cases:
- Academic writing with frequent pronoun/description usage
- Historical documents using titles and descriptive phrases
- Technical documents with role-based references ("the CEO", "the lead developer")
Prompt Impact: When enabled, the AI is instructed to find both explicit names and descriptive references that clearly refer to entities.
| Detection Type | Content Limit | Rationale |
|---|---|---|
| Highlights | 8000 chars (~2000 words) | LLM context, response time, cost |
| Assessments | 8000 chars (~2000 words) | LLM context, response time, cost |
| Comments | 8000 chars (~2000 words) | LLM context, response time, cost (higher max_tokens for comment generation) |
| References | Full document | Entity extraction needs complete context |
Impact:
- Highlights/assessments/comments: Only first ~2000 words analyzed, long documents incomplete
- References: Full document processed, but may hit max_tokens (4000) on very long documents
Future Improvements:
- Chunking strategy with sliding window for highlights/assessments/comments
- User-controlled excerpt selection
- Multi-pass detection for long documents
All detection types use similar validation:
Implementation: packages/jobs/src/workers/highlight-annotation-worker.ts
// Parse LLM response
const cleaned = llmResponse.trim().replace(/^```(?:json)?\n?|\n?```$/g, '');
const parsed = JSON.parse(cleaned);
// Validate structure
if (!Array.isArray(parsed)) {
return [];
}
// Filter valid entries
return parsed.filter((h: any) =>
h &&
typeof h.exact === 'string' &&
typeof h.start === 'number' &&
typeof h.end === 'number'
);Validation Strategy:
- Remove markdown code fences if present
- Ensure response is JSON array
- Filter malformed entries
- Does NOT validate positions against content (relies on fuzzy anchoring)
Reference detection additionally validates and corrects positions using prefix/suffix context (entity-extractor.ts).
LLM Position Challenges:
- Character counting can be imprecise (±5 characters typical)
- Multi-byte characters (emojis, Unicode) cause offsets
- Whitespace handling varies
Mitigation Strategy:
- LLM provides BOTH positions AND exact text
- LLM provides prefix/suffix context (32 chars each)
- Reference detection validates and corrects positions before creating annotations
- Fuzzy anchoring finds correct position even if LLM positions wrong
- Frontend validates and corrects positions during rendering
User clicks ✨ button or selects entity types
↓
Frontend → client.mark.assist(rId, motivation, options) emits job:create
via /bus/emit with jobType derived from motivation
(highlight-annotation | assessment-annotation | comment-annotation |
tag-annotation | reference-annotation)
↓
Backend job:create handler builds a PendingJob, persists to queue,
returns job:created { jobId }
↓
Worker (separate process, subscribed to job:queued) claims via job:claim
↓
Worker runs detection, emits mark:progress / mark:assist-finished /
mark:assist-failed via /bus/emit (scoped to resourceId)
↓
Worker also emits mark:create per annotation; Stower persists and
EventStore publishes enriched mark:added events
↓
Every connected frontend receives events on /bus/subscribe;
BrowseNamespace invalidates caches; UI updates in real-time (<50ms)
Commands and result channels:
| Trigger | Request | Success | Failure |
|---|---|---|---|
client.mark.assist(..., 'highlighting', ...) |
job:create (jobType: highlight-annotation) |
mark:assist-finished |
mark:assist-failed |
client.mark.assist(..., 'assessing', ...) |
job:create (jobType: assessment-annotation) |
mark:assist-finished |
mark:assist-failed |
client.mark.assist(..., 'commenting', ...) |
job:create (jobType: comment-annotation) |
mark:assist-finished |
mark:assist-failed |
client.mark.assist(..., 'tagging', ...) |
job:create (jobType: tag-annotation) |
mark:assist-finished |
mark:assist-failed |
client.mark.assist(..., 'linking', ...) |
job:create (jobType: reference-annotation) |
mark:assist-finished |
mark:assist-failed |
All progress events flow on mark:progress scoped to the resource.
All annotation workers follow the same pattern, inheriting from JobWorker base class in @semiont/jobs. See Job Workers Documentation for complete architecture details.
Highlights Worker: HighlightAnnotationWorker
Processing Stages:
- Load Resource (10%): Fetch from Materialized Views → load content via Content Store → charset-aware decoding
- AI Detection (30%): Call
AnnotationDetection.detectHighlights()→ parse validated matches - Create Annotations (60-100%): For each highlight → create W3C annotation → emit
mark:createon EventBus
Assessments Worker: AssessmentAnnotationWorker
Processing Stages: Same as highlights, but calls AnnotationDetection.detectAssessments() and includes assessment text in body
Comments Worker: CommentAnnotationWorker
Processing Stages:
- Load Resource (10%): Fetch from Materialized Views → load content via Content Store → charset-aware decoding
- AI Detection (30%): Call
AnnotationDetection.detectComments()with tone parameter → parse validated matches - Create Annotations (60-100%): For each comment → create W3C annotation with
purpose: "commenting"→ emitmark:createon EventBus
Tags Worker: TagAnnotationWorker
Processing Stages:
- Load Resource (10%): Fetch from Materialized Views → load full content
- Per-Category Detection: For each category → call
AnnotationDetection.detectTags()→ parse validated matches - Create Annotations (60-100%): For each tag → create W3C annotation with dual-body structure (category + schema ID) → emit
mark:createon EventBus
References Worker: ReferenceAnnotationWorker
Processing Stages:
- Load Resource: Fetch from Materialized Views → load full content (no truncation)
- Per-Entity-Type Detection: For each selected entity type → perform AI inference → validate/correct positions
- Create Annotations: For each entity → create W3C annotation with entity type tags → emit
mark:createon EventBus - Progress Updates: Emit progress after each entity type completes
Event Emission: All workers emit job:start, job:progress, job:complete, or job:failed events to the EventBus. The Stower subscribes to these events and persists them to the Event Store. Workers receive dependencies (JobQueue, EventBus, EnvironmentConfig) via constructor parameters, not singletons.
Detection events flow through the bus gateway's single SSE connection, enabling real-time UI updates for every connected participant:
Progress Updates: Workers emit mark:progress on the resource-scoped
EventBus. The frontend's SemiontClient subscribes to these events
via /bus/subscribe; MarkStateUnit surfaces them through an Observable.
Annotation Creation: When a worker emits mark:create on the bus:
- Stower persists to the Event Store.
- The EventStore enrichment callback attaches the post-materialization annotation to the published event.
- Every connected frontend receives the enriched
mark:addedvia the bus subscription. - BrowseNamespace updates its cached Observable in place — no HTTP refetch needed.
See EVENT-BUS.md and CHANNELS.md for the bus protocol and channel inventory.
Event Store → View Storage → Graph Database (Knowledge System):
Worker emits mark:create on EventBus
↓
Stower persists to Event Store (filesystem JSONL - immutable append-only log)
↓
Stower emits mark:created on EventBus
↓
View Materializer updates Materialized Views (fast single-doc queries)
↓
Graph Consumer updates Graph Database (relationship traversal - backlinks, connections)
Storage Locations:
data/events/shards/ab/cd/documents/doc-sha256:abc123/events-000042-{timestamp}.jsonl
data/views/shards/ab/cd/doc-sha256:abc123.jsonl
Neptune/In-Memory graph: (Document)-[:HAS_ANNOTATION]->(Annotation)
Job Failures:
- Worker logs detailed error to backend console
- Generic error message sent to frontend ("Detection failed. Please try again later.")
- Job status preserved in queue for debugging
- Frontend shows user-friendly error toast
Client Disconnection:
- Job continues running even if client disconnects
- Annotations still created and saved to Event Store
- User sees result on page refresh (from View Storage)
Retry Strategy:
- Max 1 retry on transient failures
- Permanent failures marked as
status: 'failed' - No retry on validation errors or missing resources
DetectSection (Highlights/Assessments/Comments): apps/frontend/src/components/resource/panels/DetectSection.tsx
Shared component for HighlightPanel, AssessmentPanel, and CommentsPanel:
- Optional instructions textarea (max 500 characters with counter)
- Optional tone selector dropdown (assessments: analytical/critical/balanced/constructive; comments: scholarly/explanatory/conversational/technical)
- Optional density slider (checkbox + slider control, enabled by default)
- Highlights: 1-15 (default 5)
- Assessments: 1-10 (default 4)
- Comments: 2-12 (default 5)
- Sparkle button (✨) triggers detection
- Real-time progress display during detection
- Color-coded by motivation (yellow/amber for highlights, red/pink for assessments, purple/indigo for comments)
ReferencesPanel: apps/frontend/src/components/resource/panels/ReferencesPanel.tsx
Entity type selection UI:
- Checkbox list of available entity types
- Select all/none buttons
- "Include descriptive references" checkbox (finds descriptive phrases like "the physicist" in addition to proper names)
- Detection progress widget showing per-entity-type progress
- Completion log showing counts per entity type
File: packages/api-client/src/namespaces/mark.ts
The mark.assist() Observable handles the full detection lifecycle —
command emission, progress delivery, completion, and failure — over
the bus gateway. Components subscribe with RxJS operators; cleanup is
automatic on unsubscribe.
const subscription = client.mark.assist(resourceId, 'highlighting', {
instructions: 'Focus on key technical points',
density: 5,
}).subscribe({
next: (progress) => {
setDetectionProgress({
status: progress.status,
percentage: progress.percentage,
message: progress.message,
});
},
complete: () => {
toast.success('Detection complete');
// BrowseNamespace auto-invalidates on mark:added events — no
// explicit refetch needed.
},
error: (err) => {
toast.error(err.message);
setIsDetecting(false);
},
});
// Cleanup
subscription.unsubscribe();Highlighting:
- 10%: Loading resource...
- 30%: Analyzing text with AI...
- 60%: Creating N annotations...
- 100%: Complete! Created N highlights
Assessment:
- 10%: Loading resource...
- 30%: Analyzing text with AI...
- 60%: Creating N annotations...
- 100%: Complete! Created N assessments
Comments:
- 10%: Loading resource...
- 30%: Analyzing text and generating comments...
- 60%: Creating N annotations...
- 100%: Complete! Created N comments
References:
- Per-entity-type progress: "Detecting Person... (1/5)"
- Completion: "Found X Person, Y Location, Z Organization"
UI Feedback:
- Border changes to yellow/red/purple/blue during detection
- Animated icons (✨ for highlights/assessments/comments, 🔵 for references)
- Progress percentage or entity type status
- Real-time message updates
- Completion toast notification
After detection completes:
- Frontend refetches annotations from backend (Materialized Views)
- Annotations converted to TextSegments with positions
- CRLF → LF position conversion applied (CODEMIRROR-INTEGRATION.md)
- Visual feedback (sparkle animation for new annotations)
- Annotations render at correct positions with appropriate styling
Styling (from Annotation Registry):
- Highlights: Yellow background with hover darkening
- Assessments: Red underline with hover opacity change
- Comments: Dashed outline with hover background change
- References: Gradient cyan-to-blue with link icon
- Position accuracy: Annotations render at correct character positions
- Fuzzy anchoring: Finds correct text even when LLM positions are wrong by searching for exact text and using prefix/suffix context for disambiguation
- CRLF handling: Windows line endings normalized correctly (CODEMIRROR-INTEGRATION.md)
- Content limits: Highlights/assessments/comments process first 8000 chars, references process full document
- User instructions: Influence LLM detection results as expected (highlights/assessments/comments)
- Tone selection: Tone influences writing style as expected
- Assessment tones: analytical/critical/balanced/constructive
- Comment tones: scholarly/explanatory/conversational/technical
- Density control: Annotation count roughly matches density setting (±20% variance acceptable)
- Highlights: 1-15 per 2000 words
- Assessments: 1-10 per 2000 words
- Comments: 2-12 per 2000 words
- Descriptive references: When enabled, detects both proper names and descriptive phrases
- Comment quality: Comments add value beyond restating text, provide context/background
- Entity type selection: References detect only selected types
- W3C compliance: Annotations validate against W3C schema
- Event Store persistence: Annotations survive backend restart
- Content truncation: Highlights/assessments/comments only analyze first 8000 characters (long documents incomplete)
- Position approximation: LLM positions may be ±5 characters off (fuzzy anchoring and validation compensate)
- Single-pass processing: No iterative refinement or confidence scores
- No batch position validation: Highlights/assessments/comments don't validate positions before creating annotations (rely on fuzzy anchoring)
- Comment selectivity: AI may occasionally over-comment or under-comment (target is 3-8 per 2000 words)
- Reference max tokens: Very long documents may hit 4000 token limit, truncating entity extraction response
- AnnotationDetection API - Detection methods
- Job Workers Documentation - Worker architecture and dependency injection
- HighlightAnnotationWorker - Highlight worker
- AssessmentAnnotationWorker - Assessment worker
- CommentAnnotationWorker - Comment worker
- TagAnnotationWorker - Tag worker
- ReferenceAnnotationWorker - Reference/entity detection worker
- Make-Meaning Examples - Usage examples
- apps/backend/src/routes/resources/routes/detect-highlights-stream.ts - Highlight detection route
- apps/backend/src/routes/resources/routes/detect-assessments-stream.ts - Assessment detection route
- apps/backend/src/routes/resources/routes/detect-comments-stream.ts - Comment detection route
- apps/backend/src/routes/resources/routes/detect-annotations-stream.ts - Reference detection route
- apps/frontend/src/components/resource/panels/DetectSection.tsx - Shared UI for highlights/assessments/comments (with tone selector)
- apps/frontend/src/components/resource/panels/CommentsPanel.tsx - Comments panel with detection UI
- apps/frontend/src/components/resource/panels/ReferencesPanel.tsx - Reference detection UI
- packages/api-client/src/sse/index.ts - SSE streaming client (detectComments method)
- apps/frontend/src/lib/fuzzy-anchor.ts - Fuzzy anchoring implementation
- apps/frontend/src/lib/annotation-registry.ts - Annotation type metadata
- W3C Web Annotation Data Model - Complete W3C implementation
- W3C Selectors - Dual selector strategy
- Knowledge System - Event store architecture
- Frontend Annotations - UI patterns and components
- CodeMirror Integration - CRLF position handling