Add unmapped ingredients aggregation and tracking system
- Created LinkML schema for unmapped ingredients (9 classes, 5 enums)
- Implemented aggregation script to identify and track unmapped ingredients
- Added statistics reporting tool for prioritization analysis
- Generated comprehensive documentation and executive summary
- Updated README with system overview and usage commands
System identifies 136 unmapped ingredients across 522 media (4.9% of total),
totaling 3,084 instances requiring ontology term mapping. Supports automated
detection of numeric placeholders, generic terms, and chemical name extraction
from notes fields.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
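
The detection behaviour described in the commit message can be illustrated with a rough sketch. This is not the aggregation script from this commit: the function name `classify_unmapped`, the `GENERIC_TERMS` vocabulary, and the `Original name: ...` notes convention are all assumptions made up for illustration.

```python
import re

# Hypothetical vocabulary of generic terms; the real script's list is not shown here.
GENERIC_TERMS = {"solution", "supplement", "extract", "buffer", "trace elements"}

# Assumed notes convention, e.g. "Original name: Sodium chloride; source: X".
NOTES_NAME_RE = re.compile(r"original name:\s*([^;]+)", re.IGNORECASE)


def classify_unmapped(name, notes=""):
    """Return (category, extracted_name) for one unmapped ingredient entry."""
    stripped = name.strip()
    # Numeric placeholder: the "name" consists only of digits, e.g. "42".
    if re.fullmatch(r"\d+", stripped):
        match = NOTES_NAME_RE.search(notes)
        extracted = match.group(1).strip() if match else None
        return "numeric_placeholder", extracted
    # Generic term: too unspecific to ground to a single ontology term.
    if stripped.lower() in GENERIC_TERMS:
        return "generic_term", None
    # Anything else still needs manual or automated ontology mapping.
    return "unmapped", None
```

For a numeric placeholder, the sketch tries to recover a chemical name from the notes text and otherwise returns `None`, so entries without a recoverable name remain visible for manual review.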
 **Comprehensive Microbial Culture Media Knowledge Graph**

-A production-ready knowledge base containing **10,595 culture media recipes** from 10 major international repositories, with LinkML schema validation, ontology grounding, and browser-based exploration.
+A production-ready knowledge base containing **10,657 culture media recipes** from 10 major international repositories, with LinkML schema validation, ontology grounding, and browser-based exploration.
 - **[Data Sources Summary](docs/DATA_SOURCES_SUMMARY.md)** - All source repositories

-### Data Quality
+### Data Quality & Enrichment
 - **[Enrichment Guide](docs/ENRICHMENT_GUIDE.md)** - Data quality improvement workflow
+- **[Implementation Summary](IMPLEMENTATION_SUMMARY.md)** - Literature verification & enum normalization
+- **[Unmapped Ingredients Guide](docs/unmapped_ingredients_guide.md)** - System for tracking ingredients needing ontology mapping
+- **[Unmapped Ingredients Summary](UNMAPPED_INGREDIENTS_SUMMARY.md)** - Executive summary with statistics and priorities

 ## 🧬 Recipe Format
@@ -283,8 +311,8 @@ Recipes are stored as YAML files following the LinkML schema:
 ```yaml
 name: BG-11 Medium
 category: algae
-medium_type: complex
-physical_state: liquid
+medium_type: COMPLEX
+physical_state: LIQUID

 description: Standard cyanobacteria medium from UTEX Culture Collection
@@ -403,6 +431,65 @@ Every recipe includes:
 - Cross-references to original sources
 - PDF URLs for detailed protocols (CCAP/SAG)

+## 🔬 Literature Verification
+
+**NEW** (2026-02-20): CultureMech now includes a comprehensive literature verification system for validating cross-references through scientific papers.
+
+### 6-Tier Cascading PDF Retrieval
+
+The system attempts to retrieve PDFs from multiple sources in order:
+⚠️ **Important**: The Sci-Hub fallback tier is disabled by default and requires explicit opt-in. Use may violate publisher agreements or local laws. Users are responsible for compliance with institutional policies.
+
+**Safety features:**
+- Default: `use_fallback_pdf=False`
+- Legal sources exhausted first
+- Clear warnings when Sci-Hub is enabled
+- Full provenance tracking
+- No auto-distribution of PDFs
+
+See [IMPLEMENTATION_SUMMARY.md](IMPLEMENTATION_SUMMARY.md) for complete documentation.
+
 ## 🌐 Browser Interface

## 🌐 Browser Interface
 The faceted search browser (`app/index.html`) provides:
@@ -434,6 +521,42 @@ just gen-browser-data # Generate browser search data