
Draft Pedigree schema v1: updated model, examples, and proto schema #50

Open
tripb wants to merge 6 commits into master from schema-v1


tripb commented Jun 17, 2026

Collaborator

Summary

Proposes the v1 Pedigree schema, updating the conceptual model from v0.1 and adding a formal proto3 schema modeled on phenopacket-schema conventions.

Individual changes

  • sex → sex_assigned_at_birth (now optional; categorical values added)
  • gender → gender_identity (categorical values added)
  • dateOfBirth → date_of_birth, populationDescriptors → population_descriptors (snake_case)
  • karyotypicSex removed (to be represented in linked genotypic data)
  • affected removed (to be represented in linked phenotypic data)
  • egg_parent and sperm_parent added as shorthand for gamete-provider parent links
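
The renames and removals above can be applied mechanically to an existing v0.1 record. The following is a minimal sketch, assuming Individuals are handled as plain dictionaries; the `migrate_individual` helper, the `id` key, and the sample values are hypothetical and not part of this PR. It only renames keys and does not translate values into the new categorical vocabularies.

```python
# Hypothetical helper: maps v0.1 Individual field names to their v1 names.
RENAMED = {
    "sex": "sex_assigned_at_birth",
    "gender": "gender_identity",
    "dateOfBirth": "date_of_birth",
    "populationDescriptors": "population_descriptors",
}
# Fields removed in v1; per the PR they belong in linked genotypic/phenotypic data.
REMOVED = {"karyotypicSex", "affected"}


def migrate_individual(v0: dict) -> dict:
    """Return a copy of a v0.1 Individual dict using v1 field names."""
    v1 = {}
    for key, value in v0.items():
        if key in REMOVED:
            continue  # dropped in v1
        v1[RENAMED.get(key, key)] = value  # unknown keys pass through unchanged
    return v1


# Sample record with placeholder values (not taken from the repository examples).
old = {"id": "I-1", "sex": "female", "dateOfBirth": "1990-01-01", "affected": True}
new = migrate_individual(old)
```

Keeping the mapping in two small tables makes it easy to compare against the field tables in pedigree-model.rst during review.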

ExternalIdentifier (new)

  • New concept linking an Individual to identifiers in external systems (EHR, Phenopackets, etc.) via external_id, external_id_system, external_system_endpoint

Relationship changes

  • Single relation field replaced by biological_relationship (0..*) and social_relationship (0..*) for independent multi-valued representation
  • twin_group added (monozygotic / dizygotic)
  • consanguinity (bool) and consanguinity_note (free text) added
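
To illustrate what independent multi-valued representation means in practice, here is a rough sketch of the Relationship shape as a Python dataclass. Only the five field names listed above come from this PR; the endpoint names `individual` and `relative`, the string encoding of `twin_group`, and the relationship labels are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Relationship:
    # Placeholder names for the two endpoints of the relationship.
    individual: str
    relative: str
    # Both lists are 0..* and are populated independently of each other.
    biological_relationship: List[str] = field(default_factory=list)
    social_relationship: List[str] = field(default_factory=list)
    # monozygotic / dizygotic per the PR; the exact encoding is assumed here.
    twin_group: Optional[str] = None
    consanguinity: Optional[bool] = None
    consanguinity_note: Optional[str] = None


# One record carries both kinds of relationship; labels are placeholders,
# not actual ontology terms.
rel = Relationship(
    individual="I-2",
    relative="I-1",
    biological_relationship=["aunt"],
    social_relationship=["guardian"],
)
```

With the single `relation` field from v0.1, the same situation would have required choosing one of the two terms or duplicating the record.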

Pedigree changes

  • indexPatients → index_patients (snake_case)

Proto schema (new)

  • src/main/proto/ga4gh/pedigree/v1/ — four proto3 files:
    • base.proto — OntologyClass, TimeElement, Age, AgeRange, GestationalAge
    • individual.proto — Individual, ExternalIdentifier, SexAssignedAtBirth enum, GenderIdentity enum
    • relationship.proto — Relationship, TwinType enum
    • pedigree.proto — Pedigree, PedigreeStatus enum
  • Package namespace: org.ga4gh.pedigree.v1, mirroring phenopacket-schema conventions

Docs

  • pedigree-model.rst — updated field tables with all v1 changes
  • examples.rst — all YAML examples updated to v1 field names; added consanguinity example
  • using-the-pedigree-model.rst — updated relationship direction section; new Biological vs. Social and Consanguinity subsections
  • schema.rst — new page documenting the proto schema and how to compile it
  • README.md — updated with v1 change summary and proto schema link

Test plan

  • Review all field names and descriptions in pedigree-model.rst against the v1 spec CSV
  • Verify YAML examples are consistent with updated model (especially relationship structure)
  • Compile proto files with protoc to confirm no syntax errors
  • Review proto field numbers and optional vs repeated choices
  • Verify egg_parent_id / sperm_parent_id consistency note is accurate
  • Confirm ExternalIdentifier embedding approach (embedded in Individual vs. separate collection)
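
For the protoc item in the test plan, one possible way to script the syntax check is sketched below. It only builds the command line; the assumption is that imports inside the four files resolve relative to `src/main/proto`, and the output file name `pedigree.desc` is arbitrary. Writing a descriptor set means no language plugin is required.

```python
from pathlib import Path
from typing import List

# Paths and file names as listed in this PR; the import root is an assumption.
PROTO_ROOT = Path("src/main/proto")
PROTO_DIR = PROTO_ROOT / "ga4gh" / "pedigree" / "v1"
PROTO_FILES = ["base.proto", "individual.proto", "relationship.proto", "pedigree.proto"]


def protoc_command(descriptor_out: str = "pedigree.desc") -> List[str]:
    """Build a protoc invocation that parses all four schema files."""
    return [
        "protoc",
        f"--proto_path={PROTO_ROOT.as_posix()}",
        f"--descriptor_set_out={descriptor_out}",
        *[(PROTO_DIR / name).as_posix() for name in PROTO_FILES],
    ]


cmd = protoc_command()
# To run the check (requires protoc on PATH); a non-zero exit raises an error:
#   import subprocess; subprocess.run(cmd, check=True)
```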

🤖 Generated with Claude Code

- Individual: rename sex→sex_assigned_at_birth (optional, +enum),
  gender→gender_identity (+enum), dateOfBirth→date_of_birth,
  populationDescriptors→population_descriptors; remove karyotypicSex
  and affected; add egg_parent and sperm_parent
- ExternalIdentifier: new concept linking individuals to external systems
- Relationship: replace single `relation` with biological_relationship
  and social_relationship (both 0..*); add twin_group, consanguinity,
  consanguinity_note
- Pedigree: rename indexPatients→index_patients
- Add proto3 schema under src/main/proto/ga4gh/pedigree/v1/ modeled on
  phenopacket-schema namespace conventions (base, individual,
  relationship, pedigree protos)
- Add schema.rst documentation page; update examples and using-the-model
  docs throughout; update README with v1 change summary

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
tripb requested review from BuehlerR1 and buske June 17, 2026 19:04
tripb and others added 5 commits June 17, 2026 14:07
RTD now requires an explicit .readthedocs.yaml at the repo root.
Adds docs/requirements.txt with sphinx and sphinx-rtd-theme pins.
Bumps conf.py version from 0.1 to 1.0 to match schema v1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- using-the-pedigree-model.rst: replace link to old third-party proto
  with reference to the new canonical GA4GH Pedigree v1 proto schema
- acknowledgements.rst: fix typo "Phenotpic" → "Phenotypic"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Explains how KIN biological and social subsets map to the
biological_relationship and social_relationship fields; documents
preferred downward direction for asymmetric terms with a reference
table; notes that inverse terms exist for convenience and should not
be double-asserted; adds YAML usage examples.
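
The rule against double-asserting inverse terms could be checked with something like the sketch below. The inverse table uses illustrative labels rather than real KIN identifiers, and the `(individual, term, relative)` tuple layout is an assumption for the example.

```python
from typing import Dict, List, Tuple

# Illustrative inverse pairs only; not actual KIN term identifiers.
INVERSE: Dict[str, str] = {"parent": "child", "grandparent": "grandchild"}
INVERSE.update({v: k for k, v in list(INVERSE.items())})

# (individual, term, relative), read as "individual is <term> of relative".
Assertion = Tuple[str, str, str]


def double_asserted(assertions: List[Assertion]) -> List[Tuple[Assertion, Assertion]]:
    """Return pairs where the same link is stated in both directions."""
    seen = set(assertions)
    pairs = []
    for a, term, b in assertions:
        inverse = INVERSE.get(term)
        mirrored = (b, inverse, a)
        # The ordering test reports each mirrored pair only once.
        if inverse and mirrored in seen and (a, term, b) < mirrored:
            pairs.append(((a, term, b), mirrored))
    return pairs


facts = [("mother", "parent", "proband"), ("proband", "child", "mother")]
dupes = double_asserted(facts)
```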

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n priority

- New 'Tree-based vs. Relationship-based Fields' section explicitly defines
  egg_parent/sperm_parent as tree-based and biological_relationship/
  social_relationship as relationship-based, with preference rules:
  prefer tree-based, use relationship-based as supplement or alternative,
  consistency required with tree-based taking precedence
- Direction of Relationships updated with three-tier priority:
  1. Proband-ascending (highest), 2. Downward/ancestor, 3. Consistent
- kin.rst direction note updated to reference the priority and explicitly
  call out the proband-ascending use case for inverse terms
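
The precedence rule (tree-based fields win when the two forms disagree) can be sketched as follows. This assumes parent links derived from relationship-based records have already been collected into a dictionary keyed by `egg_parent` / `sperm_parent`; that derivation step and the `reconcile_parents` helper are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def reconcile_parents(
    individual: Dict[str, Optional[str]],
    from_relationships: Dict[str, Optional[str]],
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Resolve parent links, preferring tree-based values.

    Returns the resolved links and the roles where the two sources conflict.
    """
    resolved: Dict[str, Optional[str]] = {}
    conflicts: List[str] = []
    for role in ("egg_parent", "sperm_parent"):
        tree_value = individual.get(role)
        rel_value = from_relationships.get(role)
        if tree_value is not None and rel_value is not None and tree_value != rel_value:
            conflicts.append(role)  # inconsistent data; flag for review
        # Tree-based takes precedence; relationship-based fills gaps only.
        resolved[role] = tree_value if tree_value is not None else rel_value
    return resolved, conflicts


resolved, conflicts = reconcile_parents(
    {"egg_parent": "I-1"},
    {"egg_parent": "I-9", "sperm_parent": "I-2"},
)
```

Reporting conflicts separately, instead of silently overriding, keeps the consistency requirement visible to whoever curates the data.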

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>