Commit 81d6037

docs: add reference catalog and semiotic DQ theoretical foundation
1 parent 539d02e commit 81d6037

3 files changed

Lines changed: 111 additions & 21 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -17,6 +17,9 @@ tmp/
1717
# Private notes (unredacted, strategic, not for public)
1818
notes/
1919

20+
# Reference PDFs (large files, optional to track)
21+
docs/references/**/*.pdf
22+
2023
# Keep directory structure
2124
!knowledge-base/source-docs/.gitkeep
2225
!knowledge-base/rules/.gitkeep

docs/references/CATALOG.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
1+
# Reference Document Catalog
2+
3+
> **Principle:** No vaporware. Every document we cite must exist locally.
4+
5+
## Status Legend
6+
- 🟢 Downloaded & verified
7+
- 🟡 URL identified, not downloaded
8+
- 🔴 Needed, not yet sourced
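
The "must exist locally" principle can be checked mechanically. The following is a minimal sketch, not part of this commit: `check_catalog` is a hypothetical helper, and it assumes each table row keeps its local path in backticks, relative to the directory that holds the catalog file.

```bash
# Sketch: report catalog rows marked 🟢 whose local file is missing.
# Assumption: each row keeps its local path in backticks, relative to
# the directory that contains the catalog file.
check_catalog() {
  local catalog="$1"
  local root missing=0 row path
  root="$(dirname "$catalog")"

  # Keep only table rows (leading "|") that carry the 🟢 marker.
  while IFS= read -r row; do
    # The first backticked value in the row is taken as the Local Path cell.
    path="$(printf '%s\n' "$row" | sed -n 's/^[^`]*`\([^`]*\)`.*$/\1/p')"
    [ -n "$path" ] || continue
    if [ ! -f "$root/$path" ]; then
      echo "MISSING: $path"
      missing=$((missing + 1))
    fi
  done < <(grep '^|' "$catalog" | grep '🟢')

  # Non-zero exit status when at least one verified file is absent.
  [ "$missing" -eq 0 ]
}
```

Run from the repository root as `check_catalog docs/references/CATALOG.md`. Rows without a backticked path (for example, web portals marked N/A) are skipped rather than reported.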
9+
10+
---
11+
12+
## ACS Documentation
13+
14+
### Core Handbooks
15+
16+
| ID | Title | Version | Source URL | Local Path | Status |
17+
|----|-------|---------|------------|------------|--------|
18+
| ACS-GEN-001 | Understanding and Using ACS Data: What All Data Users Need to Know | 2020 | [census.gov](https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020.pdf) | `acs/acs_general_handbook_2020.pdf` | 🟡 |
19+
| ACS-RES-001 | Understanding and Using ACS Data: What Researchers Need to Know | 2020 | [census.gov](https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_researchers_handbook_2020.pdf) | `acs/acs_researchers_handbook_2020.pdf` | 🟡 |
20+
| ACS-PUMS-001 | Understanding and Using ACS PUMS Files | 2020 | [census.gov](https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_pums_handbook_2020.pdf) | `acs/acs_pums_handbook_2020.pdf` | 🟡 |
21+
22+
### Technical Documentation
23+
24+
| ID | Title | Source URL | Local Path | Status |
25+
|----|-------|------------|------------|--------|
26+
| ACS-TECH-001 | ACS Technical Documentation Portal | [census.gov](https://www.census.gov/programs-surveys/acs/technical-documentation.html) | N/A (web portal) | 🟡 |
27+
| ACS-METH-001 | ACS Research & Methodology | [census.gov](https://www.census.gov/programs-surveys/acs/methodology.html) | N/A (web portal) | 🟡 |
28+
| ACS-SF-001 | ACS Summary File Handbook | [nhgis](https://assets.nhgis.org/original-data/acs/acs_summary-file_handbook_2019.pdf) | `acs/acs_summary_file_handbook_2019.pdf` | 🟡 |
29+
30+
### Subject & Code Documentation
31+
32+
| ID | Title | Source URL | Status |
33+
|----|-------|------------|--------|
34+
| ACS-SUBJ-001 | Subject Definitions | [census.gov](https://www.census.gov/programs-surveys/acs/technical-documentation/code-lists.html) | 🟡 |
35+
| ACS-CODE-001 | Code Lists | [census.gov](https://www.census.gov/programs-surveys/acs/technical-documentation/code-lists.html) | 🟡 |
36+
37+
---
38+
39+
## CPS Documentation
40+
41+
| ID | Title | Source URL | Local Path | Status |
42+
|----|-------|------------|------------|--------|
43+
| CPS-TECH-001 | CPS Technical Documentation | [census.gov](https://www.census.gov/programs-surveys/cps/technical-documentation.html) | N/A (web portal) | 🟡 |
44+
| BLS-HOM-001 | BLS Handbook of Methods Ch. 1 | [bls.gov](https://www.bls.gov/opub/hom/cps/) | `cps/bls_hom_cps.pdf` | 🟡 |
45+
46+
---
47+
48+
## Theory References
49+
50+
| ID | Title | Source URL | Local Path | Status |
51+
|----|-------|------------|------------|--------|
52+
| THEORY-001 | Semiotic DQ Foundations | See semiotic_dq_foundations.md | `theory/semiotic_dq_foundations.md` | 🟢 |
53+
54+
---
55+
56+
## Download Instructions
57+
58+
Priority downloads (curl or manual):
59+
```bash
60+
# ACS General Handbook (PRIMARY SOURCE)
61+
curl -fL --create-dirs -o docs/references/acs/acs_general_handbook_2020.pdf \
62+
"https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020.pdf"
63+
64+
# ACS Researchers Handbook
65+
curl -fL --create-dirs -o docs/references/acs/acs_researchers_handbook_2020.pdf \
66+
"https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_researchers_handbook_2020.pdf"
67+
68+
# ACS Summary File Handbook
69+
curl -fL --create-dirs -o docs/references/acs/acs_summary_file_handbook_2019.pdf \
70+
"https://assets.nhgis.org/original-data/acs/acs_summary-file_handbook_2019.pdf"
71+
```
72+
73+
After each download, update the entry's status to 🟢 and record the file's SHA-256 hash.
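
A minimal sketch of that verification step, assuming GNU `sha256sum` is available (on macOS, `shasum -a 256` is the usual substitute). `verify_pdf` is a hypothetical helper, not part of this commit. The catalog does not say where the hash is stored, so the sketch only prints it.

```bash
# Sketch: confirm a download looks like a PDF, then print its SHA-256.
# Assumes sha256sum from GNU coreutils is on PATH.
verify_pdf() {
  local file="$1"
  # A well-formed PDF starts with the bytes "%PDF-"; a saved HTML
  # error page does not.
  if [ "$(head -c 5 "$file")" != "%PDF-" ]; then
    echo "NOT A PDF: $file" >&2
    return 1
  fi
  # Prints "<hash>  <path>", the standard sha256sum output format.
  sha256sum "$file"
}
```

For example, `verify_pdf docs/references/acs/acs_general_handbook_2020.pdf` prints the hash only when the file begins with a PDF header, which guards against marking an error page as 🟢.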
74+
75+
---
76+
77+
## Extraction Priority
78+
79+
For the pragmatics layer, extract from these documents in order:
80+
81+
1. **ACS-GEN-001** - Population thresholds, MOE guidance, comparison rules, period estimates
82+
2. **ACS-RES-001** - Researcher-specific caveats, PUMS considerations
83+
3. **BLS-HOM-001** - CPS methodology for cross-survey pragmatics

docs/references/theory/semiotic_dq_foundations.md

Lines changed: 25 additions & 21 deletions
@@ -2,42 +2,46 @@
22

33
References supporting the pragmatics layer architecture.
44

5+
## The Gap We're Filling
6+
7+
> "Syntactic tests ask 'does this data obey the formal rules?', while pragmatic tests ask 'is this data actually good enough for this specific use and user?'"
8+
> — Semiotic DQ Thesis (2022)
9+
10+
Existing tools cover syntax and semantics. **No standard tools exist for pragmatics.**
11+
512
## Core Framework Papers
613

714
### Semiotic Principles for Metadata Auditing
815
- **Source:** [research.amanote](https://research.amanote.com/publication/f5oI3HMBKQvf0BhivObD/semiotic-principles-for-metadata-auditing-and-evaluation)
9-
- **Relevance:** Concrete auditing framework (syntagm, sign-functions, corpus boundaries) - validates our thread traversal approach
16+
- **Validates:** Thread traversal as "syntagmatic rules over records"
1017

1118
### Semiotic DQ for Behavioral Data (2022 Thesis)
1219
- **Source:** [diva-portal.org](https://www.diva-portal.org/smash/get/diva2:1737820/FULLTEXT01.pdf)
13-
- **Relevance:** Operationalizes pragmatic indicators (task adequacy, interpretability, context completeness) - validates our latitude concept
20+
- **Validates:** Latitude concept maps to "unusable / usable with caveats / fit-for-purpose"
1421

1522
### DataKitchen: Syntax-Semantics-Pragmatics Gap
1623
- **Source:** [datakitchen.io](https://datakitchen.io/the-syntax-semantics-and-pragmatics-gap-in-data-quality-validate-testing/)
17-
- **Relevance:** Industry recognition that existing tools cover syntax/semantics but NOT pragmatics - validates our gap analysis
24+
- **Validates:** Industry recognition that existing tools miss pragmatics
1825

1926
### Semiotics in Scientific Data Quality
2027
- **Source:** [honghuang.myweb.usf.edu](http://honghuang.myweb.usf.edu/pub2/Huang_JIS.pdf)
21-
- **Relevance:** Sign-relations among data, models, interpretations - validates context-as-signs approach
28+
- **Validates:** Context-as-signs approach
29+
30+
## Architecture Validation
2231

23-
## Key Validations
32+
| Our Concept | Literature Support |
33+
|-------------|-------------------|
34+
| Pragmatics layer | "fitness-for-use from user/decision perspective" |
35+
| Latitude levels | "minimum viable quality thresholds per use" |
36+
| Thread traversal | "syntagmatic rules" + "corpus boundaries" |
37+
| Pack bundles | "metadata catalog with intended use, known unsuitable uses" |
38+
| LLM handles syntax/semantics | Schema validators and OWL reasoners exist; no equivalent tooling exists for pragmatics |
2439

25-
| Our Architecture | Literature Support |
26-
|------------------|-------------------|
27-
| Pragmatics = fitness-for-use | Semiotic DQ thesis: "pragmatic tests ask 'is this data actually good enough for this specific use and user?'" |
28-
| Latitude levels (none→full) | Maps to pragmatic thresholds: "unusable," "usable with caveats," "fit-for-purpose" |
29-
| Thread traversal | Auditing framework's "syntagmatic rules over records" |
30-
| Pack as domain bundle | "metadata catalog extended with intended use, known unsuitable uses" |
31-
| LLM handles syntax/semantics | DataKitchen: schema validators + Great Expectations = syntactic; OWL/reasoners = semantic |
40+
## Existing Toolchains (What We Don't Build)
3241

33-
## Toolchain Mapping
42+
**Syntactic:** Great Expectations, dbt tests, JSON Schema, SDMX validators
43+
**Semantic:** OWL/RDF, Protégé, OWL reasoners, SPARQL queries
3444

35-
What exists (we don't build):
36-
- **Syntactic:** Schema validators, Great Expectations, dbt tests, SQL constraints
37-
- **Semantic:** OWL/RDF, Protégé, SPARQL reasoners
45+
## What We Build
3846

39-
What we build (pragmatics layer):
40-
- Context items with latitude
41-
- Thread traversal for query-relevant context
42-
- Pack compilation for domain bundles
43-
- Docstring injection for LLM grounding
47+
**Pragmatic:** Context items, latitude, thread traversal, pack compilation, docstring injection
