Skip to content

Standardize amount/level terminology: rename level→amount, add cross-synonyms#422

Open
Copilot wants to merge 12 commits into
masterfrom
copilot/clean-up-terms-with-amount-level
Open

Standardize amount/level terminology: rename level→amount, add cross-synonyms#422
Copilot wants to merge 12 commits into
masterfrom
copilot/clean-up-terms-with-amount-level

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 6, 2026

Quantitative OBA terms were inconsistently labeled with either amount or level, causing GWAS matching failures and user confusion. Resolution: amount becomes the canonical label; level moves to exact synonym.

Rules applied

  • X level [in Y] in label → renamed to X amount [in Y]; original level form added as exact_synonym
  • level of X [in Y] in label → renamed to X amount [in Y]; original level of X [in Y] form added as exact_synonym
  • amount in labellevel variant added as exact_synonym
  • Duplicate synonyms removed: where the new label matched an existing synonym, the duplicate was removed from the synonym list
  • Auto-generated rows (empty defined_class_name) with attribute = PATO:0000070 and known entity/location/role names: level synonym added, resolving entity labels from merged_import.owl where needed

Handles singular (level/amount) and plural (levels/amounts) at word boundaries.

Scale of changes

File Level→Amount renames Level synonyms added
entity_attribute_location.tsv 12,971 12,971 + 377 for explicit amount terms; 86 duplicate synonyms removed; 6 auto-generated rows given explicit label + level synonym; 995 auto-generated PR protein rows given level of {PR_label} in blood synonym (labels resolved from merged_import.owl)
attribute_ratio.tsv 2,824 2,824
entity_attribute.tsv 4 4 + 15 for explicit amount terms; 675 auto-generated rows given {entity_name} level synonym
attribute_location_during_activity.tsv 2 2
entity_response_quality_in_location.tsv 1 1
chemical_role_attribute.tsv 2 2 explicit + 5 auto-generated (level synonyms added via new exact_synonyms column)
chemical_role_attribute_location.tsv 4 10 (level synonyms added via new exact_synonyms column)
process_attribute_location.tsv 1 (level synonym added via new exact_synonyms column)

DOSDP patterns updated to support exact_synonyms data list var: chemical_role_attribute.yaml, chemical_role_attribute_location.yaml, process_attribute_location.yaml.

Examples

# Before (X level style)
OBA:1000957  carbon dioxide level  ...

# After
OBA:1000957  carbon dioxide amount  ...  carbon dioxide level

# Before (level of X in Y style)
OBA:2040177  level of ceramide  ...  ceramide amount
OBA:2040184  level of chordin in blood serum  ...

# After (duplicate removed, correct synonym only)
OBA:2040177  ceramide amount  ...  level of ceramide
OBA:2040184  chordin amount in blood serum  ...  level of chordin in blood serum

# Before (chemical_role_attribute_location, no synonym column)
OBA:2045211  nutrient level in blood  ...

# After (synonym column added)
OBA:2045211  nutrient amount in blood  ...  nutrient level in blood

# Before (entity_attribute.tsv, auto-generated label)
OBA:0002005  [auto: external genitalia amount]  ...  (no level synonym)

# After
OBA:0002005  [auto: external genitalia amount]  ...  external genitalia level

# Before (entity_attribute_location.tsv, auto-generated PR protein row)
OBA:2051966  [auto: amount of basement membrane-specific heparan sulfate proteoglycan core protein (human) in blood]  ...  (no level synonym)

# After (entity label resolved from merged_import.owl)
OBA:2051966  [auto: ...]  ...  level of basement membrane-specific heparan sulfate proteoglycan core protein (human) in blood

# Before (process_attribute_location.tsv, no synonym column)
OBA:2050069  serum lipase activity amount  ...

# After (synonym column added)
OBA:2050069  serum lipase activity amount  ...  serum lipase activity level

definitions.owl will be regenerated automatically by the DOSDP CI workflow (dosdp.yml) on merge.

Copilot AI linked an issue May 6, 2026 that may be closed by this pull request
Copilot AI and others added 3 commits May 6, 2026 12:25
- In all TSV pattern data files, rename defined_class_name values:
  'level' → 'amount', 'levels' → 'amounts' at word boundaries
- For renamed 'level' terms: add original 'level' name as exact synonym
- For existing 'amount' terms: add 'level' version as exact synonym
- Affects ~12,971 level→amount renames + ~377 amount synonyms in
  entity_attribute_location.tsv, plus ~2,824 renames in attribute_ratio.tsv
  and smaller changes in other pattern files
- definitions.owl will be regenerated by CI (DOSDP workflow)"

Agent-Logs-Url: https://github.com/obophenotype/bio-attribute-ontology/sessions/d0c41147-aff3-4e0f-bc5d-8d2ec6a09b88

Co-authored-by: aleixpuigb <94959119+aleixpuigb@users.noreply.github.com>
Copilot AI changed the title [WIP] Clean up terms with amount and level in EFO Standardize amount/level terminology: rename level→amount, add cross-synonyms May 6, 2026
Copilot AI requested a review from aleixpuigb May 6, 2026 12:33
@aleixpuigb
Copy link
Copy Markdown
Collaborator

@copilot Thank you for the change, but the 'amount' pattern is slightly different than 'level of'. Normally the label starts with 'level of', but if we use amount, it is at the end of the term label. Here are some examples:

carbon dioxide amount
biopterins amount
tumor necrosis factor receptor superfamily member 3 amount
blood corticotropin-releasing hormone amount
urine phosphate amount
skin water amount
blood carnitine amount

The 'level of X in Y' naming pattern should convert to 'X amount [in Y]'
(entity first, 'amount' at the end), not 'amount of X in Y'.

Fixed in:
- entity_attribute_location.tsv: 12,768 names
- attribute_location_during_activity.tsv: 2 names
- entity_response_quality_in_location.tsv: 1 name
- chemical_role_attribute_location.tsv: 1 name

Examples after fix:
- 'level of ceramide' → 'ceramide amount' (synonym: 'level of ceramide')
- 'level of chordin in blood serum' → 'chordin amount in blood serum'
- 'level of glucose in blood during fasting' → 'glucose amount in blood during fasting'

Agent-Logs-Url: https://github.com/obophenotype/bio-attribute-ontology/sessions/33c97092-13d3-456a-9f05-17e742a8a5da

Co-authored-by: aleixpuigb <94959119+aleixpuigb@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 6, 2026

@copilot Thank you for the change, but the 'amount' pattern is slightly different than 'level of'. Normally the label starts with 'level of', but if we use amount, it is at the end of the term label. ...

Fixed in cd3e1c0. The level of X [in Y] style terms now correctly transform to X amount [in Y] (entity first, "amount" at end), matching the existing naming convention:

  • level of ceramideceramide amount (synonym: level of ceramide)
  • level of chordin in blood serumchordin amount in blood serum (synonym: level of chordin in blood serum)
  • level of glucose in blood during fastingglucose amount in blood during fasting

The X level style terms (e.g., biopterins level) continue to transform to X amount (e.g., biopterins amount), which was already correct.

@aleixpuigb aleixpuigb marked this pull request as ready for review May 6, 2026 14:11
@aleixpuigb aleixpuigb requested a review from ar-ibrahim May 6, 2026 14:11
@ar-ibrahim
Copy link
Copy Markdown
Collaborator

@aleixpuigb There are some issues.

For example, for some terms, labels have not been added by the curator in the respective DOSDP tsv file and are auto-generated. Therefore changes requested by the agent in this PR have not been implemented. Example: for OBA:2052779 amount of retbindin (human) in blood, the label has remained the same and no 'level' synonym has been added.

For other terms like OBA:1000941 insulin level, changing the label to incorporate the attribute 'amount' has resulted in duplicated synonyms, so the term has changed to OBA:1000941 insulin amount with the exact_synonym insulin amount.

Also, out of the 14 patterns, only 5 have 'synonym' columns, so while the labels have been updated, no 'level' synonym has been added. For example OBA:2045211 nutrient level in blood of the chemical_role_attribute_location pattern has changed to OBA:2045211 nutrient amount in blood with the exact_synonym blood nutrient amount only (which has been auto-generated).

@aleixpuigb
Copy link
Copy Markdown
Collaborator

@copilot can you address @ar-ibrahim comments?

Note that for the auto-generated labels, we only want to add synonym to the ones that are using the term 'amount'

1. Duplicate synonyms: Remove label from exact_synonyms when it duplicates
   the new label (86 rows in entity_attribute_location.tsv, e.g.
   OBA:2040177 'ceramide amount' was listed in its own synonyms)

2. Auto-generated labels: For 6 rows with empty defined_class_name but
   known entity_name + location_name + attribute_name=amount, set explicit
   label (e.g. 'insulin amount in blood') and add level synonym

3. Missing synonym support: Add exact_synonyms data_list_var to
   chemical_role_attribute.yaml and chemical_role_attribute_location.yaml
   patterns, add exact_synonyms column to both TSV files, and populate
   level synonyms for all rows (e.g. 'blood thyroid hormone level' for
   OBA:VT0003351 'blood thyroid hormone amount'; 'thyroid hormone level'
   for auto-generated OBA:1000972)

Agent-Logs-Url: https://github.com/obophenotype/bio-attribute-ontology/sessions/8946373c-0106-42da-b062-f4970fe1292a

Co-authored-by: aleixpuigb <94959119+aleixpuigb@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 7, 2026

@copilot can you address @ar-ibrahim comments?

Note that for the auto-generated labels, we only want to add synonym to the ones that are using the term 'amount'

Addressed in 558b76c — three fixes applied:

  1. Duplicate synonyms: Removed the label from exact_synonyms when it was identical to the new label (86 rows in entity_attribute_location.tsv, e.g. OBA:2040177 now has ceramide amount as label and only level of ceramide as synonym).

  2. Auto-generated labels with known names: For the 6 rows with empty defined_class_name but known entity_name + location_name + attribute_name=amount, set the explicit label (e.g. insulin amount in blood) and added the level synonym (e.g. level of insulin in blood). The remaining ~995 auto-generated rows (no entity/location names in TSV) cannot be addressed without running the DOSDP pipeline to resolve ontology labels — per @aleixpuigb's note, we're only handling rows where 'amount' is confirmed.

  3. Missing synonym columns: Added exact_synonyms data_list_var to chemical_role_attribute.yaml and chemical_role_attribute_location.yaml DOSDP patterns, added the exact_synonyms column to both TSV files, and populated level synonyms for all rows — including auto-generated rows where role_name is known (e.g. OBA:1000972 gets synonym thyroid hormone level).

@aleixpuigb
Copy link
Copy Markdown
Collaborator

@copilot There are still many terms in entity_attribute.tsv that are missing the exact synonym 'level of'. For example OBA:0002005, OBA:0002009. It would be easier if you look for terms that use the PATO:0000070 term (amount) and check if they have a synonym with 'level of'. Check in other DOSDP as well in case there are more missing.

- entity_attribute.tsv: add '{entity_name} level' synonym to all 675
  auto-generated rows (empty defined_class_name) where attribute is
  PATO:0000070 (amount) and entity_name is known in TSV
  (e.g. OBA:0002005 'external genitalia amount' gets synonym
  'external genitalia level'; OBA:0002009 'renal collecting system
  amount' gets 'renal collecting system level')

- process_attribute_location.yaml: add exact_synonyms data_list_var
  support (mirrors pattern used in other DOSDP patterns)

- process_attribute_location.tsv: add exact_synonyms column;
  OBA:2050069 'serum lipase activity amount' gets synonym
  'serum lipase activity level'

Agent-Logs-Url: https://github.com/obophenotype/bio-attribute-ontology/sessions/dbefd126-990d-42c1-bb50-ef1a0771238e

Co-authored-by: aleixpuigb <94959119+aleixpuigb@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 7, 2026

@copilot There are still many terms in entity_attribute.tsv that are missing the exact synonym 'level of'. For example OBA:0002005, OBA:0002009. It would be easier if you look for terms that use the P...

Fixed in b876496:

  • entity_attribute.tsv: Added {entity_name} level synonym to all 675 auto-generated rows (empty defined_class_name) where attribute = PATO:0000070 and entity_name is known. OBA:0002005 now has synonym external genitalia level; OBA:0002009 now has renal collecting system level.

  • process_attribute_location.tsv: Added exact_synonyms column support (YAML pattern updated too). OBA:2050069 serum lipase activity amount now has synonym serum lipase activity level.

All other DOSDP files were checked — no further explicit-name terms with "amount" in the label are missing a level synonym.

@ar-ibrahim
Copy link
Copy Markdown
Collaborator

@copilot There are still 995 terms in the entity_attribute_location.tsv file with the attribute PATO:0000070 (amount) that do not have 'level' synonyms. These terms do have entity and location entries in the tsv file. Example: OBA:2051966.

…e_location.tsv

All 995 rows with empty defined_class_name/entity_name but known location_name
('blood') and attribute PATO:0000070 now have a 'level of {PR_label} in blood'
synonym, resolved from merged_import.owl.

Example:
  OBA:2051966 (PR:P98160 / blood)
    → synonym: 'level of basement membrane-specific heparan sulfate
       proteoglycan core protein (human) in blood'

Agent-Logs-Url: https://github.com/obophenotype/bio-attribute-ontology/sessions/9abb2232-24fc-486e-85c5-015d0af20310

Co-authored-by: ar-ibrahim <97462656+ar-ibrahim@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 11, 2026

@copilot There are still 995 terms in the entity_attribute_location.tsv file with the attribute PATO:0000070 (amount) that do not have 'level' synonyms. These terms do have entity and location entries...

Fixed in 370b4eb. All 995 rows resolved: the PR protein labels were looked up from merged_import.owl and level of {entity_label} in blood synonyms were added.

Example:

  • OBA:2051966 (PR:P98160) → synonym: level of basement membrane-specific heparan sulfate proteoglycan core protein (human) in blood
  • OBA:2051967 (PR:O43464) → synonym: level of serine protease HTRA2, mitochondrial (human) in blood

No auto-generated amount rows in entity_attribute_location.tsv remain without a level synonym.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants