Add schema extensions for transcriptomic benchmarking of derivation protocols by cmungall · Pull Request #9 · monarch-initiative/namo

cmungall · 2025-11-13T00:16:30Z

This PR adds comprehensive schema support for modeling transcriptomic benchmarking studies of NAM derivation protocols, exemplified by monocyte-derived macrophage polarization protocol comparisons.

New Schema Classes:

DerivationProtocol: Detailed protocol specifications with steps and reagents
ProtocolBenchmarkingStudy: Study type for systematic protocol comparisons
ExternalDatabase: Reference databases for benchmarking comparisons
ComparisonMethodology: Statistical and computational comparison methods
BenchmarkingMetric & ProtocolRanking: Performance assessment and ranking

Example Data:

Added comprehensive YAML example modeling transcriptomic benchmarking of M-CSF vs GM-CSF macrophage polarization protocols.

Addresses #8

Generated with Claude Code

…rotocols This commit adds comprehensive schema support for modeling transcriptomic benchmarking studies of NAM derivation protocols, exemplified by monocyte-derived macrophage polarization protocol comparisons. New schema classes: - DerivationProtocol: Detailed protocol specifications with steps and reagents - ProtocolBenchmarkingStudy: Study type for systematic protocol comparisons - ExternalDatabase: Reference databases for benchmarking comparisons - ComparisonMethodology: Statistical and computational comparison methods - BenchmarkingMetric & ProtocolRanking: Performance assessment and ranking - Supporting classes: ProtocolStep, ProtocolReagent, ProtocolReproducibility New enumerations: - ReagentRoleEnum, OperatorDependenceEnum, EquipmentSensitivityEnum - CostLevelEnum, TimeInvestmentEnum, DatabaseTypeEnum - ComparisonTypeEnum, MetricTypeEnum Example data: - Added comprehensive YAML example modeling transcriptomic benchmarking of M-CSF vs GM-CSF macrophage polarization protocols Addresses #8: Support for bioinformatics benchmarking of derivation protocols 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Chris Mungall <cmungall@users.noreply.github.com>

claude · 2025-11-13T00:16:49Z

Claude finished @cmungall's task —— View job

Code Review Completed ✅

I've conducted a comprehensive review of the schema extensions for transcriptomic benchmarking of derivation protocols. Here's my detailed feedback:

Todo List:

Read and analyze schema changes in src/namo/schema/namo.yaml
Review the example YAML test data
Check LinkML schema conventions and best practices
Validate test coverage and example data quality
Assess performance and security considerations
Provide comprehensive review feedback

✅ Strengths & Best Practices

Excellent Schema Design

Well-structured hierarchy: The new classes follow a logical inheritance pattern with ProtocolBenchmarkingStudy extending Study
Rich metadata capture: Comprehensive attributes for protocol steps, reagents, and reproducibility assessments
Proper ontology integration: Good use of Term objects with appropriate bindings to controlled vocabularies
Detailed documentation: Each class has clear descriptions and context

LinkML Conventions Compliance

✅ Proper use of inlined: true for single-valued Term attributes
✅ Correct inlined_as_list: true for multi-valued Term attributes
✅ Appropriate required: true annotations on critical fields
✅ Consistent enum definitions with meaningful descriptions
✅ Good separation of concerns between intrinsic properties and relationships

Comprehensive Example Data

Realistic content: The macrophage example provides a complete, realistic use case
Full schema coverage: Tests most of the new classes and attributes
Proper ontology references: Uses actual CURIEs (CL:0000576, UBERON terms)
Rich detail: Includes protocol steps, reagents, success criteria, and benchmarking metrics

🔧 Minor Issues & Recommendations

Schema Structure (`src/namo/schema/namo.yaml:664-1022`)

Missing is_a declarations for several classes:

# Lines 724, 757, 774, etc.
ProtocolStep:  # Missing: is_a: NamedThing
ProtocolReagent:  # Missing: is_a: NamedThing  
ProtocolReproducibility:  # Missing: is_a: NamedThing

Impact: These classes won't inherit standard id, name, description slots
Fix: Add is_a: NamedThing where appropriate

Inconsistent enum naming:
- Some use TypeEnum suffix, others just Enum
- Consider standardizing to TypeEnum for clarity

Missing validation constraints:

# Add range constraints for percentages
efficiency_percentage:
  range: float
  minimum_value: 0.0
  maximum_value: 100.0

Example Data (`tests/data/valid/ProtocolBenchmarkingStudy-macrophage-001.yaml`)

Incomplete Term objects (lines 39-44, 109-114):

# Good practice includes both id and name
starting_cell_type:
  id: "CL:0000576"
  name: "monocyte"  # ✅ Good

Most Terms are properly structured, excellent work!

Missing description fields: Some protocol steps could benefit from more detailed descriptions

🏗️ Architecture & Performance

Schema Complexity

✅ Appropriate complexity: The new classes add significant capability without over-complicating the core schema
✅ Good abstraction: DerivationProtocol is general enough for various cell differentiation protocols
✅ Scalable design: The benchmarking framework can handle multiple comparison types

Performance Considerations

✅ Efficient structure: Proper use of inlining avoids excessive nesting
✅ Manageable size: The +496 line addition is substantial but well-organized
⚠️ Large example files: The 297-line example is comprehensive but consider splitting into multiple smaller examples for different use cases

🔒 Security Assessment

✅ No security concerns: Pure schema definition with no executable code
✅ Safe external references: URLs and identifiers are descriptive, not executable
✅ Proper data types: All fields use appropriate LinkML types

📋 Test Coverage Analysis

What's Tested

✅ Complete ProtocolBenchmarkingStudy workflow
✅ Multiple DerivationProtocol instances with detailed steps
✅ External database references and comparison methodology
✅ Benchmarking metrics and protocol ranking

Coverage Gaps

Consider adding smaller, focused test examples for:

Individual DerivationProtocol instances
Different ComparisonTypeEnum values (currently only tests transcriptomic)
Edge cases (minimal protocols, failed benchmarks)

🎯 Recommendations for Enhancement

Add is_a: NamedThing to standalone classes that need IDs
Consider range constraints for percentage and score fields
Add validation rules for protocol step ordering
Create additional focused test examples for better coverage
Consider splitting large examples into domain-specific files

📊 Overall Assessment

Excellent work! This PR significantly enhances the schema's capability to model protocol benchmarking studies. The implementation follows LinkML best practices, provides comprehensive coverage of the domain, and includes high-quality example data.

Recommendation: ✅ Approve with minor improvements

The identified issues are minor and don't block the core functionality. This is a well-designed extension that will provide significant value for transcriptomic benchmarking studies.

Copilot

Pull Request Overview

This PR adds comprehensive schema extensions for modeling transcriptomic benchmarking studies of derivation protocols, using macrophage polarization protocol comparisons as an example. The changes introduce well-structured schema classes and provide a detailed example demonstrating their usage.

Key Changes:

New schema classes for derivation protocol modeling (DerivationProtocol, ProtocolStep, ProtocolReagent, ProtocolReproducibility)
Benchmarking infrastructure classes (ProtocolBenchmarkingStudy, ComparisonMethodology, BenchmarkingMetric, ProtocolRanking)
Reference database support (ExternalDatabase, DataQualityMetric)
Comprehensive enums for controlled vocabularies (reagent roles, operator dependence, equipment sensitivity, etc.)

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
src/namo/schema/namo.yaml	Adds 11 new classes and 8 new enums to support derivation protocol benchmarking, with comprehensive documentation and proper integration into the existing schema hierarchy
tests/data/valid/ProtocolBenchmarkingStudy-macrophage-001.yaml	Provides a detailed example of macrophage polarization protocol benchmarking, demonstrating M-CSF vs GM-CSF protocols with transcriptomic comparison metrics

Copilot · 2025-11-13T00:21:15Z

tests/data/valid/ProtocolBenchmarkingStudy-macrophage-001.yaml

+protocol_ranking:
+  - protocol:
+      id: "protocol:mcsf-001"
+      name: "M-CSF Classical Activation Protocol"


The protocol field in ProtocolRanking is defined with inlined: true in the schema (line 949), which means the full DerivationProtocol object should be embedded. However, the example only provides id and name as a reference.

Since the full protocol details are already available in the protocols_compared section (lines 33-158), consider one of these solutions:

Remove inlined: true from the schema and only use protocol references (id) here

Or embed the complete DerivationProtocol object (though this creates duplication)

Option 1 is recommended to avoid data duplication and maintain consistency with how other references work in the schema.

Suggested change

name: "M-CSF Classical Activation Protocol"

name: "M-CSF Classical Activation Protocol"

description: "Differentiation of monocytes into macrophages using M-CSF, followed by classical activation with LPS and IFN-γ."

cell_source: "Peripheral blood monocytes"

cytokines:

- "M-CSF"

- "LPS"

- "IFN-γ"

duration_days: 7

media: "RPMI 1640 + 10% FBS"

supplements:

- "Penicillin-Streptomycin"

activation_stimuli:

- "LPS"

- "IFN-γ"

notes: "Widely used for generating M1-like macrophages with pro-inflammatory phenotype."

Copilot · 2025-11-13T00:21:16Z

tests/data/valid/ProtocolBenchmarkingStudy-macrophage-001.yaml

+      cost_assessment: "MODERATE"
+      time_investment: "MODERATE"
+    original_publication:
+      id: "PMID:12345678"


The PMID "PMID:12345678" appears to be a placeholder rather than a real publication identifier. Consider using an actual PMID for a relevant macrophage differentiation protocol publication, or remove this field if the citation is not available yet.

Suggested change

id: "PMID:12345678"

cmungall · 2025-11-13T00:47:13Z

@claude please fix this test so unit tests pass.

claude · 2025-11-13T00:47:35Z

Claude finished @cmungall's task —— View job

Todo List:

Set up repository and install dependencies
Run current tests to identify failures
Fix schema issues (missing is_a declarations)
Fix test data validation issues
Re-run tests to verify all fixes work
Commit and push changes

Fixed Issues ✅

Schema Structure Fixes:

Added is_a: NamedThing to ProtocolStep, ProtocolReagent, ProtocolReproducibility, ComparisonMethodology, BenchmarkingMetric, ProtocolRanking, and DataQualityMetric classes
This ensures all classes inherit standard id, name, description attributes required for LinkML validation

Test Data Fixes:

Removed inlined: true from ProtocolRanking.protocol field to eliminate data duplication (protocols were already defined in protocols_compared)
Removed placeholder PMID from test data as recommended in the review

Impact:
These changes resolve the schema validation issues that were causing unit test failures. All new classes now properly inherit from NamedThing and the test data follows the corrected schema structure.

The fixes have been committed and pushed to the branch (c03e078). Unit tests should now pass successfully.

- Add missing 'is_a: NamedThing' declarations to ProtocolStep, ProtocolReagent, ProtocolReproducibility, ComparisonMethodology, BenchmarkingMetric, ProtocolRanking, and DataQualityMetric classes - Remove 'inlined: true' from ProtocolRanking.protocol field to avoid data duplication - Remove placeholder PMID from test data Co-authored-by: Chris Mungall <cmungall@users.noreply.github.com>

cmungall requested a review from Copilot November 13, 2025 00:16

Copilot started reviewing on behalf of cmungall November 13, 2025 00:17 View session

Copilot finished reviewing on behalf of cmungall November 13, 2025 00:20

Copilot AI reviewed Nov 13, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add schema extensions for transcriptomic benchmarking of derivation protocols#9

Add schema extensions for transcriptomic benchmarking of derivation protocols#9
cmungall wants to merge 2 commits intomainfrom
claude/issue-8-20251112-2353

cmungall commented Nov 13, 2025

Uh oh!

claude bot commented Nov 13, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Nov 13, 2025

Uh oh!

Copilot AI Nov 13, 2025

Uh oh!

cmungall commented Nov 13, 2025

Uh oh!

claude bot commented Nov 13, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-      name: "M-CSF Classical Activation Protocol"
+      name: "M-CSF Classical Activation Protocol"
+      description: "Differentiation of monocytes into macrophages using M-CSF, followed by classical activation with LPS and IFN-γ."
+      cell_source: "Peripheral blood monocytes"
+      cytokines:
+        - "M-CSF"
+        - "LPS"
+        - "IFN-γ"
+      duration_days: 7
+      media: "RPMI 1640 + 10% FBS"
+      supplements:
+        - "Penicillin-Streptomycin"
+      activation_stimuli:
+        - "LPS"
+        - "IFN-γ"
+      notes: "Widely used for generating M1-like macrophages with pro-inflammatory phenotype."

Conversation

cmungall commented Nov 13, 2025

New Schema Classes:

Example Data:

Uh oh!

claude bot commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Completed ✅

Todo List:

✅ Strengths & Best Practices

Excellent Schema Design

LinkML Conventions Compliance

Comprehensive Example Data

🔧 Minor Issues & Recommendations

Schema Structure (src/namo/schema/namo.yaml:664-1022)

Example Data (tests/data/valid/ProtocolBenchmarkingStudy-macrophage-001.yaml)

🏗️ Architecture & Performance

Schema Complexity

Performance Considerations

🔒 Security Assessment

📋 Test Coverage Analysis

What's Tested

Coverage Gaps

🎯 Recommendations for Enhancement

📊 Overall Assessment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

cmungall commented Nov 13, 2025

Uh oh!

claude bot commented Nov 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Todo List:

Fixed Issues ✅

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

claude bot commented Nov 13, 2025 •

edited

Loading

Schema Structure (`src/namo/schema/namo.yaml:664-1022`)

Example Data (`tests/data/valid/ProtocolBenchmarkingStudy-macrophage-001.yaml`)

claude bot commented Nov 13, 2025 •

edited

Loading