docs: add reference catalog and semiotic DQ theoretical foundation

brockwebb · brockwebb · commit 81d6037b4c80 · 2026-02-07T16:13:09.000-05:00
diff --git a/.gitignore b/.gitignore
@@ -17,6 +17,9 @@ tmp/
 # Private notes (unredacted, strategic, not for public)
 notes/
 
+# Reference PDFs (large files, optional to track)
+docs/references/**/*.pdf
+
 # Keep directory structure
 !knowledge-base/source-docs/.gitkeep
 !knowledge-base/rules/.gitkeep
diff --git a/docs/references/CATALOG.md b/docs/references/CATALOG.md
@@ -0,0 +1,83 @@
+# Reference Document Catalog
+
+> **Principle:** No vaporware. Every document we cite must exist locally.
+
+## Status Legend
+- 🟢 Downloaded & verified
+- 🟡 URL identified, not downloaded
+- 🔴 Needed, not yet sourced
+
+---
+
+## ACS Documentation
+
+### Core Handbooks
+
+| ID | Title | Version | Source URL | Local Path | Status |
+|----|-------|---------|------------|------------|--------|
+| ACS-GEN-001 | Understanding and Using ACS Data: What All Data Users Need to Know | 2020 | [census.gov](https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020.pdf) | `acs/acs_general_handbook_2020.pdf` | 🟡 |
+| ACS-RES-001 | Understanding and Using ACS Data: What Researchers Need to Know | 2020 | [census.gov](https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_researchers_handbook_2020.pdf) | `acs/acs_researchers_handbook_2020.pdf` | 🟡 |
+| ACS-PUMS-001 | Understanding and Using ACS PUMS Files | 2020 | [census.gov](https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_pums_handbook_2020.pdf) | `acs/acs_pums_handbook_2020.pdf` | 🟡 |
+
+### Technical Documentation
+
+| ID | Title | Source URL | Local Path | Status |
+|----|-------|------------|------------|--------|
+| ACS-TECH-001 | ACS Technical Documentation Portal | [census.gov](https://www.census.gov/programs-surveys/acs/technical-documentation.html) | N/A (web portal) | 🟡 |
+| ACS-METH-001 | ACS Research & Methodology | [census.gov](https://www.census.gov/programs-surveys/acs/methodology.html) | N/A (web portal) | 🟡 |
+| ACS-SF-001 | ACS Summary File Handbook | [nhgis](https://assets.nhgis.org/original-data/acs/acs_summary-file_handbook_2019.pdf) | `acs/acs_summary_file_handbook_2019.pdf` | 🟡 |
+
+### Subject & Code Documentation
+
+| ID | Title | Source URL | Status |
+|----|-------|------------|--------|
+| ACS-SUBJ-001 | Subject Definitions | [census.gov](https://www.census.gov/programs-surveys/acs/technical-documentation/code-lists.html) | 🟡 |
+| ACS-CODE-001 | Code Lists | [census.gov](https://www.census.gov/programs-surveys/acs/technical-documentation/code-lists.html) | 🟡 |
+
+---
+
+## CPS Documentation
+
+| ID | Title | Source URL | Local Path | Status |
+|----|-------|------------|------------|--------|
+| CPS-TECH-001 | CPS Technical Documentation | [census.gov](https://www.census.gov/programs-surveys/cps/technical-documentation.html) | N/A (web portal) | 🟡 |
+| BLS-HOM-001 | BLS Handbook of Methods, Ch. 1 (Current Population Survey) | [bls.gov](https://www.bls.gov/opub/hom/cps/) | `cps/bls_hom_cps.pdf` | 🟡 |
+
+---
+
+## Theory References
+
+| ID | Title | Source URL | Local Path | Status |
+|----|-------|------------|------------|--------|
+| THEORY-001 | Semiotic DQ Foundations | See semiotic_dq_foundations.md | `theory/semiotic_dq_foundations.md` | 🟢 |
+
+---
+
+## Download Instructions
+
+Priority downloads (curl or manual):
+```bash
+# ACS General Handbook (PRIMARY SOURCE)
+curl -fL --create-dirs -o docs/references/acs/acs_general_handbook_2020.pdf \
+  "https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020.pdf"
+
+# ACS Researchers Handbook
+curl -fL --create-dirs -o docs/references/acs/acs_researchers_handbook_2020.pdf \
+  "https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_researchers_handbook_2020.pdf"
+
+# ACS Summary File Handbook
+curl -fL --create-dirs -o docs/references/acs/acs_summary_file_handbook_2019.pdf \
+  "https://assets.nhgis.org/original-data/acs/acs_summary-file_handbook_2019.pdf"
+```
+
+After downloading a file, update its status to 🟢 and record its SHA256 hash.
+
+---
+
+## Extraction Priority
+
+For the pragmatics layer, extract from these documents in the following order:
+
+1. **ACS-GEN-001** - Population thresholds, margin of error (MOE) guidance, comparison rules, period estimates
+2. **ACS-RES-001** - Researcher-specific caveats, PUMS considerations
+3. **BLS-HOM-001** - CPS methodology for cross-survey pragmatics
diff --git a/docs/references/theory/semiotic_dq_foundations.md b/docs/references/theory/semiotic_dq_foundations.md
@@ -2,42 +2,46 @@
 
 References supporting the pragmatics layer architecture.
 
+## The Gap We're Filling
+
+> "Syntactic tests ask 'does this data obey the formal rules?', while pragmatic tests ask 'is this data actually good enough for this specific use and user?'"
+> — Semiotic DQ Thesis (2022)
+
+Existing tools cover syntax and semantics. **No standard tools exist for pragmatics.**
+
 ## Core Framework Papers
 
 ### Semiotic Principles for Metadata Auditing
 - **Source:** [research.amanote](https://research.amanote.com/publication/f5oI3HMBKQvf0BhivObD/semiotic-principles-for-metadata-auditing-and-evaluation)
-- **Relevance:** Concrete auditing framework (syntagm, sign-functions, corpus boundaries) - validates our thread traversal approach
+- **Validates:** Thread traversal as "syntagmatic rules over records"
 
 ### Semiotic DQ for Behavioral Data (2022 Thesis)
 - **Source:** [diva-portal.org](https://www.diva-portal.org/smash/get/diva2:1737820/FULLTEXT01.pdf)
-- **Relevance:** Operationalizes pragmatic indicators (task adequacy, interpretability, context completeness) - validates our latitude concept
+- **Validates:** Latitude concept maps to "unusable / usable with caveats / fit-for-purpose"
 
 ### DataKitchen: Syntax-Semantics-Pragmatics Gap
 - **Source:** [datakitchen.io](https://datakitchen.io/the-syntax-semantics-and-pragmatics-gap-in-data-quality-validate-testing/)
-- **Relevance:** Industry recognition that existing tools cover syntax/semantics but NOT pragmatics - validates our gap analysis
+- **Validates:** Industry recognition that existing tools miss pragmatics
 
 ### Semiotics in Scientific Data Quality
 - **Source:** [honghuang.myweb.usf.edu](http://honghuang.myweb.usf.edu/pub2/Huang_JIS.pdf)
-- **Relevance:** Sign-relations among data, models, interpretations - validates context-as-signs approach
+- **Validates:** Context-as-signs approach
+
+## Architecture Validation
 
-## Key Validations
+| Our Concept | Literature Support |
+|-------------|-------------------|
+| Pragmatics layer | "fitness-for-use from user/decision perspective" |
+| Latitude levels | "minimum viable quality thresholds per use" |
+| Thread traversal | "syntagmatic rules" + "corpus boundaries" |
+| Pack bundles | "metadata catalog with intended use, known unsuitable uses" |
+| LLM handles syntax/semantics | Schema validators and OWL reasoners already cover syntax and semantics; no equivalent tooling exists for pragmatics |
 
-| Our Architecture | Literature Support |
-|------------------|-------------------|
-| Pragmatics = fitness-for-use | Semiotic DQ thesis: "pragmatic tests ask 'is this data actually good enough for this specific use and user?'" |
-| Latitude levels (none→full) | Maps to pragmatic thresholds: "unusable," "usable with caveats," "fit-for-purpose" |
-| Thread traversal | Auditing framework's "syntagmatic rules over records" |
-| Pack as domain bundle | "metadata catalog extended with intended use, known unsuitable uses" |
-| LLM handles syntax/semantics | DataKitchen: schema validators + Great Expectations = syntactic; OWL/reasoners = semantic |
+## Existing Toolchains (What We Don't Build)
 
-## Toolchain Mapping
+- **Syntactic:** Great Expectations, dbt tests, JSON Schema, SDMX validators
+- **Semantic:** OWL/RDF, Protégé, SPARQL reasoners
 
-What exists (we don't build):
-- **Syntactic:** Schema validators, Great Expectations, dbt tests, SQL constraints
-- **Semantic:** OWL/RDF, Protégé, SPARQL reasoners
+## What We Build
 
-What we build (pragmatics layer):
-- Context items with latitude
-- Thread traversal for query-relevant context
-- Pack compilation for domain bundles
-- Docstring injection for LLM grounding
+**Pragmatic:** Context items, latitude, thread traversal, pack compilation, docstring injection