Redesign mapping file structure #3

@mbaudis

The current mapping file structure assumes that icdom::icdot code combinations are the unique source keys. However, as we know, some of the mapping targets are more specific than any existing icdom::icdot combination.

I suggest making this format more general by indicating which primary key is being used.
This would also allow arbitrary compositions of primary keys, and would even avoid having to specify "primary" and "equivalents". However, it would require adding an attribute to samples that indicates the original code attribution (this could default to icdom::icdot if not specified).

Current example:
equivalents:
  - label: Ductal Breast Carcinoma
    id: ncit:C4017
  - id: seer:26000
examples:
  - label: invasive breast adenocarcinoma
input:
  - label: Infiltrating duct carcinoma, NOS
    id: icdom:85003
  - label: Breast, NOS
    id: icdot:C50.9
New version:
pattern: "icdom::icdot"
examples:
  - label: "invasive breast adenocarcinoma"
close_matches:
  - label: Infiltrating duct carcinoma, NOS
    id: icdom:85003
  - label: Breast, NOS
    id: icdot:C50.9
  - label: Ductal Breast Carcinoma
    id: ncit:C4017
  - id: seer:26000
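
As a rough illustration of the restructuring, here is a minimal Python sketch that turns a record in the current layout into the proposed one. It assumes the records are already loaded as plain dictionaries; the function name `convert_record` and the choice to list the former `input` codes before the former `equivalents` are assumptions made for this example, not part of any existing tooling.

```python
def convert_record(old, pattern="icdom::icdot"):
    """Convert a current-style mapping record to the proposed layout.

    Hypothetical helper: the former `input` and `equivalents` lists are
    merged into one `close_matches` list, and `pattern` records which
    code system(s) acted as the primary key.
    """
    new = {"pattern": pattern}
    if "examples" in old:
        new["examples"] = list(old["examples"])
    # Former input codes first, then the former equivalents.
    new["close_matches"] = list(old.get("input", [])) + list(old.get("equivalents", []))
    return new


# The "current example" from this issue, written as a Python dict.
old_record = {
    "equivalents": [
        {"label": "Ductal Breast Carcinoma", "id": "ncit:C4017"},
        {"id": "seer:26000"},
    ],
    "examples": [{"label": "invasive breast adenocarcinoma"}],
    "input": [
        {"label": "Infiltrating duct carcinoma, NOS", "id": "icdom:85003"},
        {"label": "Breast, NOS", "id": "icdot:C50.9"},
    ],
}

new_record = convert_record(old_record)
```

Since all existing records use icdom::icdot as their source key, the default `pattern` would cover a one-off migration of the current files.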

The following would then be the specification of a sample with a primary assignment by NCIt, together with the derived ICD and SEER codes:

pattern: "ncit"
examples:
  - label: "breast carcinoma [from DCIS, basal-like, triple negative]"
close_matches:
  - label: Infiltrating duct carcinoma, NOS
    id: icdom:85003
  - label: Breast, NOS
    id: icdot:C50.9
  - label: Triple-Negative Breast Carcinoma
    id: ncit:C71732
  - id: seer:26000
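
To show how the `pattern` attribute could drive key lookup, here is a minimal Python sketch that derives the primary key of a record from its `close_matches`. The function name `primary_key`, the `prefix:code` joined with `::` key format, and the fallback to icdom::icdot when `pattern` is missing are assumptions for illustration only.

```python
def primary_key(record, default_pattern="icdom::icdot"):
    """Compose the primary key of a record from its `pattern`.

    Hypothetical helper: for each prefix named in the pattern, the first
    entry in `close_matches` whose id starts with that prefix is used.
    """
    pattern = record.get("pattern", default_pattern)
    parts = []
    for prefix in pattern.split("::"):
        match = next(
            (m["id"] for m in record.get("close_matches", [])
             if m.get("id", "").startswith(prefix + ":")),
            None,
        )
        if match is None:
            raise KeyError(f"no close match found for prefix '{prefix}'")
        parts.append(match)
    return "::".join(parts)


# The NCIt-primary example from this issue, written as a Python dict.
ncit_record = {
    "pattern": "ncit",
    "examples": [
        {"label": "breast carcinoma [from DCIS, basal-like, triple negative]"}
    ],
    "close_matches": [
        {"label": "Infiltrating duct carcinoma, NOS", "id": "icdom:85003"},
        {"label": "Breast, NOS", "id": "icdot:C50.9"},
        {"label": "Triple-Negative Breast Carcinoma", "id": "ncit:C71732"},
        {"id": "seer:26000"},
    ],
}

# The same codes, but without a pattern: falls back to icdom::icdot.
icd_record = {"close_matches": ncit_record["close_matches"]}

ncit_key = primary_key(ncit_record)
icd_key = primary_key(icd_record)
```

With this approach the same list of codes yields a different key depending only on `pattern`, which is what removes the need for separate "primary" and "equivalents" sections.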
