@aditigopalan aditigopalan commented Jan 15, 2026

Documentation Improvements and Schema Fixes

Overview

This PR migrates documentation from MkDocs to Sphinx, adds CSV export functionality, fixes schema inheritance issues, and cleans up unused files.

Major Changes

📚 Documentation Migration (MkDocs → Sphinx)

  • Migrated documentation system from MkDocs to Sphinx with Read the Docs integration
  • Added Sphinx configuration (conf.py, .readthedocs.yaml)
  • Created custom CSS for improved content width
  • Restructured documentation to be self-contained per module (all inherited attributes shown on each page)
  • Added level-specific pages for multi-level modules (WES, scRNA-seq, MultiplexMicroscopy, SpatialOmics)

📊 CSV Export Functionality

  • Added CSV export for all modules and levels
  • CSV files include all attributes (including inherited ones) with Type, Pattern, Required, and Description
  • Fixed CSV generation to include all nested class attributes for Clinical module
  • CSV files available at docs/csv/ for download
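
As a rough illustration of the export described above, here is a minimal sketch of how attribute metadata might be written to CSV. This is not the code in scripts/generate_csv_docs.py. The `attributes_to_csv` helper, the `Attribute` name column, the `Yes`/`No` encoding, and the sample attribute metadata are all assumptions made for the example.

```python
import csv
import io

# Type, Pattern, Required, and Description come from the PR description;
# the leading "Attribute" name column is an assumption.
FIELDNAMES = ["Attribute", "Type", "Pattern", "Required", "Description"]


def attributes_to_csv(attributes):
    """Render a mapping of attribute name -> metadata dict as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for name, meta in attributes.items():
        writer.writerow({
            "Attribute": name,
            "Type": meta.get("range", "string"),
            "Pattern": meta.get("pattern", ""),
            "Required": "Yes" if meta.get("required") else "No",
            "Description": meta.get("description", ""),
        })
    return buffer.getvalue()


# Hypothetical attribute metadata, for illustration only.
example = {
    "FILENAME": {"range": "string", "required": True, "description": "Name of the file"},
    "FILE_FORMAT": {"range": "FileFormatEnum", "description": "Format of the file"},
}
csv_text = attributes_to_csv(example)
```

Passing the fully resolved attribute set (own plus inherited) to such a helper would give one row per attribute, which matches the "all attributes, including inherited ones" behavior listed above.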

🔧 Schema Fixes

  • Removed FILENAME pattern constraint from CoreFileAttributes in modules/CoreFile/domains/core.yaml
  • Fixed FileFormatLevel4 enum inheritance: Preserved child class attributes over parent class attributes in inheritance resolution (fixes missing enum in SpatialOmics Level 4 documentation)

🧹 Code Cleanup

  • Removed unused documentation files:
    • docs/identifiers.md
    • docs/corefile.md
    • docs/imaging.md
    • docs/sequencing.md
  • Removed cruft template tracking (.cruft.json and related Makefile targets)
  • Cleaned up obsolete MkDocs configuration

🐛 Bug Fixes

  • Fixed enum links to correctly match Sphinx auto-generated anchors
  • Fixed inheritance resolution to preserve child attributes (e.g., FILE_FORMAT with enum type) over parent attributes (e.g., FILE_FORMAT with string type)
  • Fixed CSV generation to include all attributes from all classes for Clinical module
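
The child-over-parent rule can be sketched as follows. This is a simplified illustration using plain dictionaries, not the PR's actual `resolve_inheritance_chain` implementation. The `resolve_attributes` helper and the `SpatialLevel4` class name are hypothetical.

```python
def resolve_attributes(class_name, classes):
    """Merge attributes along the is_a chain; child definitions win."""
    chain = []
    seen = set()
    current = class_name
    while current is not None and current not in seen:
        seen.add(current)  # guard against cyclic is_a references
        chain.append(current)
        current = classes[current].get("is_a")

    merged = {}
    # Walk from the most distant ancestor down to the class itself, so that
    # later (more specific) definitions overwrite earlier (inherited) ones.
    for name in reversed(chain):
        merged.update(classes[name].get("attributes", {}))
    return merged


# Hypothetical, simplified class definitions.
classes = {
    "CoreFileAttributes": {
        "attributes": {
            "FILENAME": {"range": "string", "required": True},
            "FILE_FORMAT": {"range": "string"},
        },
    },
    "SpatialLevel4": {
        "is_a": "CoreFileAttributes",
        "attributes": {"FILE_FORMAT": {"range": "FileFormatLevel4"}},
    },
}
resolved = resolve_attributes("SpatialLevel4", classes)
```

Here `FILE_FORMAT` keeps the child's enum range while `FILENAME` is still inherited from the parent. Merging in the opposite order would let the parent's string-typed `FILE_FORMAT` overwrite the enum, which is the kind of behavior the fix above addresses.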

Technical Details

Documentation Generation Scripts

  • scripts/generate_table_docs.py: Generates markdown documentation with inheritance resolution
  • scripts/generate_csv_docs.py: Generates CSV exports for all modules and levels
  • Both scripts properly handle inheritance chains and preserve child class attributes

Documentation Structure

  • Each module page is self-contained with all required attributes
  • Level pages (e.g., docs/wes/level-1.md) show all inherited attributes organized by source (Core File, Base Sequencing/Imaging, Module-Specific)
  • Enum sections are automatically generated for all used enums

Files Changed

  • Documentation infrastructure: conf.py, .readthedocs.yaml, index.rst, _static/custom.css
  • Schema: modules/CoreFile/domains/core.yaml
  • Scripts: scripts/generate_table_docs.py, scripts/generate_csv_docs.py
  • Generated docs: All docs/*.md files and docs/csv/*.csv files
  • Build: Makefile (removed cruft targets)

Testing

  • Documentation builds successfully with Sphinx
  • CSV files generated correctly for all modules and levels
  • Enum links work correctly in generated documentation
  • Inheritance resolution preserves child attributes correctly

Related Issues

  • Fixes enum inheritance issues in SpatialOmics Level 4
  • Addresses documentation width constraints
  • Adds CSV export functionality as requested

Fixes #109

- Add .readthedocs.yaml configuration
- Update pyproject.toml to include mkdocs-mermaid2-plugin in docs extra
- Remove old GitHub Pages deployment workflow (deploy-docs.yaml)
- Remove gh-deploy target from Makefile
- Create modern, organized documentation structure
- Add comprehensive module documentation pages
- Update mkdocs.yml with organized navigation
- Add getting started and contributing guides
- Organize modules by category (Record-Based vs File-Based)
- Add reference section for API documentation
- Modernize index page with clear overview

The old auto-generated attribute files remain in docs/ but are not
included in navigation. They can be archived or removed in a future PR.
- Remove all 233+ outdated auto-generated attribute files
- Generate fresh documentation from schemas using LinkML gen-doc
- Integrate module READMEs into documentation structure
- Update mkdocs.yml to use generated schema docs
- Create script to generate docs from all modules
- Fix navigation and links throughout

This creates documentation similar to LinkML's own docs:
https://linkml.io/linkml/intro/overview.html

All schema documentation is now auto-generated and up-to-date.
- Remove complex navigation structure
- Direct links to generated schema docs with all attributes
- Clean up unnecessary files
- Fix broken links
- Generate simple markdown with just attributes, classes, and enums
- No complex generated structure
- Clean, readable format
- Simple script to regenerate
aditigopalan and others added 4 commits January 15, 2026 14:57
…tion

- Add comprehensive tests for FILE_FORMAT and FILENAME pattern matching
- Update CSV documentation with new patterns
- Tests verify patterns match correctly across all modules:
  * WES (levels 1, 2, 3)
  * scRNA-seq (levels 1, 2, 3/4)
  * MultiplexMicroscopy (levels 2, 3, 4)
  * DigitalPathology
  * SpatialOmics (levels 1, 4)
- Use class_induced_slots() to properly resolve inherited slot requirements
- Fixes AssertionError where FILENAME required flag wasn't being detected
- All SpatialOmics tests now passing
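
A table-driven check is one way pattern tests like these might be structured. The sketch below is illustrative only: the regular expression is a made-up stand-in, not one of the schema's actual FILENAME or FILE_FORMAT patterns, and `check_pattern` is a hypothetical helper.

```python
import re

# Hypothetical pattern: a whitespace-free file name ending in a FASTQ
# extension, optionally gzipped. Not taken from the HTAN schema.
FILENAME_PATTERN = re.compile(r"^\S+\.(fastq|fq)(\.gz)?$")

# (candidate value, whether it is expected to match)
CASES = [
    ("sample_R1.fastq.gz", True),
    ("sample_R1.fq", True),
    ("sample_R1.bam", False),
    ("bad name.fastq", False),
]


def check_pattern(pattern, cases):
    """Return the cases whose match result differs from the expectation."""
    return [
        (value, expected)
        for value, expected in cases
        if bool(pattern.fullmatch(value)) != expected
    ]


failures = check_pattern(FILENAME_PATTERN, CASES)
```

An empty `failures` list means every case behaved as expected. In a real test suite the pattern would be read from the resolved schema rather than hardcoded.
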
Auto-generated Python classes from LinkML schema updates.

**Auto-generated by GitHub Actions workflow**
[skip ci]
@github-actions
Contributor

Python classes auto-updated!

The Python classes have been automatically regenerated and committed to this PR branch.

**Updated files:**
```
modules/Biospecimen/src/htan_biospecimen/datamodel/biospecimen.py

modules/Clinical/src/htan_clinical/datamodel/clinical.py
modules/DigitalPathology/src/htan_digitalpathology/datamodel/digital_pathology.py
modules/Imaging/src/htan_imaging/datamodel/imaging.py
modules/MultiplexMicroscopy/src/htan_multiplexmicroscopy/datamodel/multiplex_microscopy.py
modules/Sequencing/src/htan_sequencing/datamodel/sequencing.py
modules/SpatialOmics/src/htan_spatial/datamodel/spatial.py
modules/WES/src/htan_wes/datamodel/wes.py
modules/scRNA-seq/src/htan_scrna_seq/datamodel/scrna_seq.py
```

The generated classes are now up to date with the schema changes.

aditigopalan and others added 3 commits January 15, 2026 15:25
Auto-generated Python classes from LinkML schema updates.

**Auto-generated by GitHub Actions workflow**
[skip ci]

Contributor

Copilot AI left a comment

Pull request overview

This PR migrates documentation from MkDocs to Sphinx, adds CSV export functionality, fixes schema inheritance issues, and removes unused files. However, the changes include generated MkDocs HTML build artifacts in the site/ directory that should not be committed to the repository.

Changes:

  • Documentation migration to Sphinx with Read the Docs configuration
  • CSV export functionality for all modules
  • Schema fixes for CoreFile and FileFormat enum inheritance

Reviewed changes

Copilot reviewed 73 out of 289 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| site/*.html | Generated MkDocs HTML files (build artifacts) |
| site/sitemap.xml | Generated sitemap file |
| site/assets/* | Generated CSS, JavaScript, and other static assets |

Comment on lines +2 to +3
<!doctype html>
<html lang="en" class="no-js">

Copilot AI Jan 15, 2026

The entire site/ directory contains generated MkDocs build artifacts that should not be committed to version control. These files are build outputs and should be excluded via .gitignore. According to the PR description, the migration is to Sphinx (not MkDocs), yet these are MkDocs-generated files. These should be removed from the PR, and the build output directory should be added to .gitignore to prevent future commits of generated files.

aditigopalan and others added 3 commits January 15, 2026 15:33
Auto-generated Python classes from LinkML schema updates.

**Auto-generated by GitHub Actions workflow**
[skip ci]

aditigopalan and others added 3 commits January 15, 2026 15:37
Auto-generated Python classes from LinkML schema updates.

**Auto-generated by GitHub Actions workflow**
[skip ci]

aditigopalan and others added 3 commits January 15, 2026 15:44
Auto-generated Python classes from LinkML schema updates.

**Auto-generated by GitHub Actions workflow**
[skip ci]

Collaborator

@adamjtaylor adamjtaylor left a comment

I think we can merge to get this deployed, but this setup seems quite cumbersome. Let's dedicate time after release 8 to see how we can refine it.

"SpatialOmics": "modules/SpatialOmics/domains/spatial.yaml",
}

def resolve_inheritance_chain(class_def, all_classes, base_dir, visited=None):
Collaborator

This function is duplicated across generate_csv_docs and generate_table_docs. The same goes for get_conditional_requirements, MODULES, and the parent schema loading logic. Could you pull these into a common module that both scripts import?

from linkml_runtime.linkml_model.meta import SchemaDefinition
from linkml_runtime.loaders import yaml_loader

MODULES = {
Collaborator

These are hardcoded, which adds another step required when new modules are added. It's unclear where each of these steps is tracked. Can we automate this from the base LinkML schema?
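
One possible direction, sketched under assumptions: rather than deriving the list from the base LinkML schema as suggested here, this example simply scans the repository for schema files following the `modules/<name>/domains/*.yaml` layout seen in this PR. The `discover_modules` helper is hypothetical. It maps each module to a list of schema paths, since a module could have more than one domain file.

```python
from pathlib import Path
import tempfile


def discover_modules(repo_root):
    """Map each module name to its schema files under modules/<name>/domains/."""
    modules = {}
    for schema in sorted(Path(repo_root).glob("modules/*/domains/*.yaml")):
        module_name = schema.parts[-3]  # modules/<name>/domains/<file>.yaml
        relative = schema.relative_to(repo_root).as_posix()
        modules.setdefault(module_name, []).append(relative)
    return modules


# Demonstrate on a throwaway directory tree that mirrors paths from this PR.
with tempfile.TemporaryDirectory() as root:
    for rel in (
        "modules/CoreFile/domains/core.yaml",
        "modules/SpatialOmics/domains/spatial.yaml",
    ):
        path = Path(root, rel)
        path.parent.mkdir(parents=True)
        path.touch()
    discovered = discover_modules(root)
```

With discovery like this, adding a new module directory would be picked up without editing a hardcoded `MODULES` dictionary, though it would not capture any ordering or display names the current dictionary encodes.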

Comment on lines +22 to +25
def merge_readme_with_doc(doc_path: Path, readme_path: Path):
    """Skip README merging - just use the doc as-is (user wants just tables)."""
    # Don't merge READMEs anymore - user wants just the tables
    return True
Collaborator

This does nothing; it just returns True. Can we remove it?

@aditigopalan aditigopalan merged commit 2f9c84a into main Jan 20, 2026
@aditigopalan aditigopalan deleted the docs branch January 20, 2026 14:21
Development

Successfully merging this pull request may close these issues.

[docs] Update deploy docs

4 participants