CultureMech has migrated from a two-layer system to a three-tier system with clearer naming:
Old Structure (2 layers):
data/raw/ → Layer 1: Raw source files
kb/media/ → Layer 3: Normalized recipes
New Structure (3 tiers):
raw/ → Layer 1: Raw source files (renamed from data/raw/)
raw_yaml/ → Layer 2: Unnormalized YAML (NEW)
normalized_yaml/ → Layer 3: Normalized recipes (renamed from kb/media/)
| Old Path | New Path | Description |
|---|---|---|
data/raw/ |
raw/ |
Raw source files |
| (none) | raw_yaml/ |
NEW unnormalized YAML layer |
kb/media/ |
normalized_yaml/ |
Validated recipes |
data/processed/ |
data/processed/ |
Unchanged (for future use) |
| Old Command | New Command | Notes |
|---|---|---|
| Same | Same | Outputs to raw/ instead of data/raw/ |
Example:
# Commands are the same, but now write to raw/
just fetch-mediadive-raw # Now writes to raw/mediadive/
just fetch-togo-raw 50 # Now writes to raw/togo/These are new commands for generating Layer 2 (raw_yaml):
# Convert all sources to raw_yaml
just convert-to-raw-yaml all
# Convert specific sources
just convert-to-raw-yaml mediadive
just convert-to-raw-yaml togo
just convert-to-raw-yaml atcc
# Individual source commands
just convert-mediadive-raw-yaml
just convert-togo-raw-yaml
just convert-atcc-raw-yaml
# ... (one for each source)| Old Command | New Command | Notes |
|---|---|---|
| Same | Same | Reads from raw/, writes to normalized_yaml/ |
Example:
# Commands are the same
just import-mediadive 50 # Now: raw/ → normalized_yaml/
just import-togo 100 # Now: raw/ → normalized_yaml/| Old Command | New Command |
|---|---|
just validate kb/media/bacterial/LB_Broth.yaml |
just validate normalized_yaml/bacterial/LB_Broth.yaml |
just validate-all |
just validate-all (unchanged) |
just validate-schema kb/media/... |
just validate-schema normalized_yaml/... |
| Old Command | New Command |
|---|---|
just gen-browser-data |
just gen-browser-data (reads from normalized_yaml/) |
just gen-pages |
just gen-pages (reads from normalized_yaml/) |
just gen-page kb/media/... |
just gen-page normalized_yaml/... |
just kgx-export |
just kgx-export (reads from normalized_yaml/) |
| Old Command | New Command |
|---|---|
just count-recipes |
just count-recipes (counts normalized_yaml/) |
just list-recipes |
just list-recipes (lists normalized_yaml/) |
just show-raw-data-stats |
just show-raw-data-stats (shows raw/ stats) |
Default paths have changed for all importers, fetchers, and exporters:
Importers (8 files):
# OLD
parser.add_argument("-i", "--input", default="data/raw/mediadive")
parser.add_argument("-o", "--output", default="kb/media")
# NEW
parser.add_argument("-i", "--input", default="raw/mediadive")
parser.add_argument("-o", "--output", default="normalized_yaml")Fetchers (7 files):
# OLD
parser.add_argument("-o", "--output", default="data/raw/togo")
# NEW
parser.add_argument("-o", "--output", default="raw/togo")Exporters (2 files):
# OLD
parser.add_argument("-i", "--input", default="kb/media")
# NEW
parser.add_argument("-i", "--input", default="normalized_yaml")If you have custom scripts that reference old paths, update them:
# OLD
from pathlib import Path
kb_dir = Path("kb/media")
raw_dir = Path("data/raw")
# NEW
from pathlib import Path
normalized_yaml_dir = Path("normalized_yaml")
raw_dir = Path("raw")If you have custom justfile recipes:
# OLD
kb_dir := "kb/media"
raw_data_dir := "data/raw"
# NEW
normalized_yaml_dir := "normalized_yaml"
raw_dir := "raw"
raw_yaml_dir := "raw_yaml"kb/media/files were moved tonormalized_yaml/- Git history is preserved (files were moved, not deleted/recreated)
.gitignoreupdated to use new paths
✅ Tracked in Git:
normalized_yaml/**/*.yaml- All validated recipesraw/*/README.md- Provenance documentationraw_yaml/*/README.md- Conversion notes (if present)
❌ Not Tracked (in .gitignore):
raw/**/*.json,.tsv,.sql, etc. - Raw data filesraw_yaml/**/*.yaml- Unnormalized YAML (regenerable)
Check git status:
git status
# You should see:
# - Renamed files: kb/media/* → normalized_yaml/*
# - Modified: .gitignore, project.justfile, src/culturemech/**/*.py
# - New files: src/culturemech/convert/*, migration scriptThe migration has already been completed using scripts/migrate_to_three_tier.sh. Here's what happened:
- Created backup:
backups/pre-migration-YYYYMMDD-HHMMSS/ - Created new directories:
raw/,raw_yaml/,normalized_yaml/ - Moved raw data:
data/raw/*→raw/ - Moved normalized recipes:
kb/media/*→normalized_yaml/ - Created placeholders: Empty
raw_yaml/<source>/directories - Verified structure: Confirmed all moves succeeded
A backup was created before migration:
ls backups/
# pre-migration-20260127-095345/
# data/
# kb/To undo the migration:
# 1. Restore from backup
BACKUP_DIR="backups/pre-migration-20260127-095345" # Use your backup timestamp
# 2. Restore data/
rm -rf data/raw/
cp -r "$BACKUP_DIR/data/" .
# 3. Restore kb/
rm -rf kb/media/
cp -r "$BACKUP_DIR/kb/" .
# 4. Remove new directories
rm -rf raw/ raw_yaml/ normalized_yaml/
# 5. Revert code changes (if needed)
git checkout HEAD -- .gitignore project.justfile src/# Check new directories exist
ls -ld raw/ raw_yaml/ normalized_yaml/
# Count files in each tier
find raw/ -name "*.json" | wc -l
find raw_yaml/ -name "*.yaml" | wc -l
find normalized_yaml/ -name "*.yaml" | wc -l# Populate Layer 2
just convert-to-raw-yaml all
# Verify raw_yaml populated
ls raw_yaml/mediadive/ | head -10# Test import on small sample
just import-mediadive 5
# Verify output
ls normalized_yaml/bacterial/# Validate all recipes
just validate-all
# Should pass with no errors# Generate browser data
just gen-browser-data
# Verify app/data.js created
ls -lh app/data.js
# Test serve
just serve-browser
# Visit http://localhost:8000/app/# Check what changed in git
git diff --stat
# Review specific files
git diff .gitignore
git diff project.justfile
git diff src/culturemech/import/mediadive_importer.pyCause: Old path hardcoded in custom script
Fix: Update to use normalized_yaml/
# OLD
recipes_dir = Path("kb/media")
# NEW
recipes_dir = Path("normalized_yaml")Cause: Forgot to fetch raw data after migration
Fix: Fetch raw data
just fetch-raw-dataCause: Haven't run converters yet
Fix: Generate raw_yaml
just convert-to-raw-yaml allCause: Old importer cached or not updated
Fix: Reinstall package
just installA: Provides human-readable intermediate format for debugging and inspection without requiring LinkML knowledge.
A: No, it's optional. You can import directly from raw/ to normalized_yaml/. The raw_yaml/ layer is for inspection and debugging.
A: They're backed up but can be removed after verifying migration. The backup contains everything.
A: Commands are mostly unchanged, but they now use new paths internally. Any command with explicit paths (like just validate kb/media/...) needs path updates.
A: Yes, files were moved (not deleted), so git preserves history. Use git log --follow to see history across renames.
A: Search and replace:
data/raw→rawkb/media→normalized_yaml- Add
raw_yamlreferences if needed
A: The backup contains the pre-migration state. You can compare against it or restore if needed.
- DATA_LAYERS.md - Complete architecture documentation
- RAW_YAML_LAYER.md - Layer 2 details
- QUICK_START.md - Updated getting started guide
- scripts/migrate_to_three_tier.sh - Migration script source
- backups/ - Pre-migration backups
The three-tier migration provides:
- ✅ Clearer naming -
raw/,raw_yaml/,normalized_yaml/ - ✅ Better separation - Three distinct data layers
- ✅ Improved debugging - raw_yaml/ for inspection
- ✅ Maintained functionality - All commands still work
- ✅ Preserved history - Git history intact
- ✅ Backup safety - Pre-migration backup created
The migration is complete and all systems are operational!