
Commit ee665c8

realmarcin and claude committed
Fix false positives in metal/REE analysis from overly broad pattern matching
Critical bug fix for metal and rare earth element (REE) detection: substring matching of single-letter element symbols was causing massive false positives.

## The Bug

The contains_metal() function used TWO checks:

1. Word boundary regex: `\b{pattern}\b` (correct)
2. Substring check: `if pattern in name` (TOO BROAD)

The substring check caused:

- "Y" (yttrium) to match "Yeast extract"
- Result: 9,127 false positives for REE detection!

## The Fix

Modified contains_metal() to use substring matching only for:

- Longer patterns (>2 chars)
- Compound formulas (containing digits or special characters, like "FeCl3")

Short element symbols (1-2 chars) now match ONLY at word boundaries.

## Impact on Results

BEFORE (with bug):

- REE-containing media: 9,130
- High-REE media: 913
- Metal-containing media: 9,599
- High-metal media: 960

AFTER (corrected):

- REE-containing media: 3 ✓ (only genuine REE compounds)
- High-REE media: 3 ✓
- Metal-containing media: 8,263 ✓
- High-metal media: 828 ✓

## The 3 True REE Media

Only 3 media genuinely contain rare earth elements:

1. DSMZ_1741_METHYLOCYSTIS_MEDIUM_NLS - La(NO3)3, Ce(NO3)3
2. DSMZ_1738_METHYLOMONAS_MEDIUM_WSC - La(NO3)3, Ce(NO3)3
3. DSMZ_1739_METHYLOMONAS_MEDIUM_SURF - La(NO3)3, Ce(NO3)3

All three use lanthanum and cerium nitrates for methanotroph cultivation.

## Files Updated

- Fixed pattern matching logic in analyze_metal_concentrations.py
- Re-ran analysis with corrected algorithm
- Updated 1,707 media YAML files (removed false positive labels)
- Regenerated browser data with accurate counts

This also fixes a reported issue:

- HD-MEDIUM is no longer incorrectly labeled as high_ree

Other reported data quality issues (duplicate water, missing NaOH) are separate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
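A minimal Python sketch of the matching rule described above, contrasting the old two-check logic with the corrected one. The function names, the one-pattern-at-a-time signature, the "any non-letter means compound formula" test, and case-sensitive matching are assumptions for illustration; the actual contains_metal() in analyze_metal_concentrations.py is not shown in this commit view and may differ.

```python
import re

# Hypothetical sketch: names and signatures are illustrative assumptions,
# not the actual code in analyze_metal_concentrations.py.


def is_compound_formula(pattern: str) -> bool:
    """Assumed rule: any non-letter (digit, parenthesis, etc.) marks a formula, e.g. 'FeCl3'."""
    return re.search(r"[^A-Za-z]", pattern) is not None


def contains_metal_old(name: str, pattern: str) -> bool:
    """Buggy version: word-boundary match OR plain substring match for every pattern."""
    word_match = re.search(rf"\b{re.escape(pattern)}\b", name) is not None
    return word_match or pattern in name


def contains_metal(name: str, pattern: str) -> bool:
    """Corrected version: short symbols (1-2 chars) only match at word boundaries."""
    # Word-boundary match is always allowed; "Y" cannot match inside "Yeast".
    if re.search(rf"\b{re.escape(pattern)}\b", name):
        return True
    # Substring fallback only for longer patterns or compound formulas.
    if len(pattern) > 2 or is_compound_formula(pattern):
        return pattern in name
    return False


if __name__ == "__main__":
    print(contains_metal_old("Yeast extract", "Y"))  # True  (the false positive)
    print(contains_metal("Yeast extract", "Y"))      # False
    print(contains_metal("La(NO3)3", "La"))          # True
```

The word-boundary check still finds "La" in "La(NO3)3" because "(" is not a word character, so genuine lanthanum and cerium nitrates remain detected while "Y" no longer fires on "Yeast extract".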
1 parent 7ac9c13 commit ee665c8


1,710 files changed (+25021 -100121 lines changed)


app/data.js

Lines changed: 2104 additions & 2104 deletions

data/metal_ree_analysis.yaml

Lines changed: 16784 additions & 93119 deletions

data/normalized_yaml/archaea/DSMZ_1014_THIOALKALIVIBRIO_HALOPHILUS_MEDIUM.yaml

Lines changed: 6 additions & 4 deletions
@@ -126,10 +126,11 @@ ingredients:
 preparation_steps:
 - step_number: 1
 action: AUTOCLAVE
-description: Dissolve sodium chloride, hydrogenphosphate, ammonium chloride and trace elements and fill
-solution in screw capped Erlenmeyer flasks to 10% of volume. Autoclave at 110°C for 30 min. Add magnesium
-chloride, magnesium sulfate, thiosulfate and bicarbonat from sterile stock solutions and adjust pH
-of the medium to 8.0 - 8.5.
+description: Dissolve sodium chloride, hydrogenphosphate, ammonium chloride and
+trace elements and fill solution in screw capped Erlenmeyer flasks to 10% of volume.
+Autoclave at 110°C for 30 min. Add magnesium chloride, magnesium sulfate, thiosulfate
+and bicarbonat from sterile stock solutions and adjust pH of the medium to 8.0
+- 8.5.
 - step_number: 2
 action: MIX
 description: pH 3.0-4.0
@@ -140,3 +141,4 @@ curation_history:
 curator: mediadive-import
 action: Imported from MediaDive
 notes: 'Source: DSMZ, ID: 1014'
+high_metal: true

data/normalized_yaml/archaea/DSMZ_1058a_THIOHALOSPIRA_HALOPHILA_MEDIUM.yaml

Lines changed: 8 additions & 6 deletions
@@ -175,12 +175,13 @@ ingredients:
 preparation_steps:
 - step_number: 1
 action: AUTOCLAVE
-description: Dissolve sodium chloride, potassium hydrogenphosphate and ammonium chloride, then sparge
-solution with 80% N2 and 20% CO2 gas mixture for at least 30 - 45 min to remove dissolved oxygen and
-to saturate the solution with CO2. Dispense solution under air atmosphere in Erlenmeyer flasks to
-10% of volume, close with screw caps and autoclave. Add trace elements, calcium chloride, magnesium
-chloride, thiosulfate, bicarbonate and vitamins from sterile stock solutions and adjust pH of the
-medium to 7.0 - 7.2, if necessary. Incubate with shaking.
+description: Dissolve sodium chloride, potassium hydrogenphosphate and ammonium
+chloride, then sparge solution with 80% N2 and 20% CO2 gas mixture for at least
+30 - 45 min to remove dissolved oxygen and to saturate the solution with CO2.
+Dispense solution under air atmosphere in Erlenmeyer flasks to 10% of volume,
+close with screw caps and autoclave. Add trace elements, calcium chloride, magnesium
+chloride, thiosulfate, bicarbonate and vitamins from sterile stock solutions and
+adjust pH of the medium to 7.0 - 7.2, if necessary. Incubate with shaking.
 - step_number: 2
 action: MIX
 description: 'Note: Use at least 10% (v/v) as inoculum.'
@@ -194,3 +195,4 @@ curation_history:
 curator: mediadive-import
 action: Imported from MediaDive
 notes: 'Source: DSMZ, ID: 1058a'
+high_metal: true

data/normalized_yaml/archaea/DSMZ_1125_HALOPHILIC_MEDIUM.yaml

Lines changed: 1 addition & 0 deletions
@@ -82,3 +82,4 @@ curation_history:
 curator: mediadive-import
 action: Imported from MediaDive
 notes: 'Source: DSMZ, ID: 1125'
+high_metal: true

data/normalized_yaml/archaea/DSMZ_1184_MEDIUM_FOR_HALOPHILIC_ARCHAEA.yaml

Lines changed: 1 addition & 0 deletions
@@ -86,3 +86,4 @@ curation_history:
 curator: mediadive-import
 action: Imported from MediaDive
 notes: 'Source: DSMZ, ID: 1184'
+high_metal: true

data/normalized_yaml/archaea/DSMZ_1399_HALOPHILIC_MEDIUM.yaml

Lines changed: 3 additions & 2 deletions
@@ -93,12 +93,13 @@ ingredients:
 preparation_steps:
 - step_number: 1
 action: MIX
-description: Make up to 1000.0 ml. Adjust to pH 7.0 and sterilize by autoclaving. The medium may be
-solidified by adding 20.0 g/l agar.
+description: Make up to 1000.0 ml. Adjust to pH 7.0 and sterilize by autoclaving.
+The medium may be solidified by adding 20.0 g/l agar.
 applications:
 - Microbial cultivation
 curation_history:
 - timestamp: '2026-01-26T17:53:14.311815Z'
 curator: mediadive-import
 action: Imported from MediaDive
 notes: 'Source: DSMZ, ID: 1399'
+high_metal: true

data/normalized_yaml/archaea/DSMZ_1400_SALINIBACTER_HALOPHILIC_MEDIUM.yaml

Lines changed: 3 additions & 2 deletions
@@ -79,8 +79,8 @@ ingredients:
 preparation_steps:
 - step_number: 1
 action: MIX
-description: Adjust to pH 7.0 and sterilize by autoclaving. The medium may be solidified by adding 20.0
-g/l agar.
+description: Adjust to pH 7.0 and sterilize by autoclaving. The medium may be solidified
+by adding 20.0 g/l agar.
 applications:
 - Microbial cultivation
 curation_history:
@@ -92,3 +92,4 @@ curation_history:
 curator: chebi-enrichment
 action: Added CHEBI ontology terms to ingredients
 notes: Enriched using MicrobeMediaParam and MediaDive chemical mappings
+high_metal: true

data/normalized_yaml/archaea/DSMZ_1431_ALKALOPHILIC_HALOPHILE_MEDIUM.yaml

Lines changed: 5 additions & 3 deletions
@@ -81,9 +81,10 @@ ingredients:
 preparation_steps:
 - step_number: 1
 action: AUTOCLAVE
-description: Liquid medium may be prepared by adding all components together. Agar (2%) medium should
-be prepared without Na2CO3 and sterilised by autoclaving. Add 20-30 ml of a sterile 10% Na2CO3 solution
-to the autoclaved medium containing agar so that the final pH is 9.0.
+description: Liquid medium may be prepared by adding all components together. Agar
+(2%) medium should be prepared without Na2CO3 and sterilised by autoclaving. Add
+20-30 ml of a sterile 10% Na2CO3 solution to the autoclaved medium containing
+agar so that the final pH is 9.0.
 applications:
 - Microbial cultivation
 curation_history:
@@ -95,3 +96,4 @@ curation_history:
 curator: chebi-enrichment
 action: Added CHEBI ontology terms to ingredients
 notes: Enriched using MicrobeMediaParam and MediaDive chemical mappings
+high_metal: true

data/normalized_yaml/archaea/JCM_J1005_HALOPHILE_STARCH-CASEIN_MEDIUM.yaml

Lines changed: 4 additions & 2 deletions
@@ -63,12 +63,14 @@ ingredients:
 preparation_steps:
 - step_number: 1
 action: AUTOCLAVE
-description: Add components to distilled water and bring volume to 1.0 L. For preparation of solid medium,
-add 20.0 g/L agar. Autoclave and adjust pH to 7.2 - 7.5 with NaOH.
+description: Add components to distilled water and bring volume to 1.0 L. For preparation
+of solid medium, add 20.0 g/L agar. Autoclave and adjust pH to 7.2 - 7.5 with
+NaOH.
 applications:
 - Microbial cultivation
 curation_history:
 - timestamp: '2026-01-26T17:53:18.292941Z'
 curator: mediadive-import
 action: Imported from MediaDive
 notes: 'Source: JCM, ID: J1005'
+high_metal: true
