Commit ee665c8
Fix false positives in metal/REE analysis from overly broad pattern matching
Critical bug fix for metal and rare earth element detection that was causing
massive false positives from substring matching of single-letter element symbols.
## The Bug
The contains_metal() function used TWO checks:
1. Word boundary regex: `\b{pattern}\b` (correct)
2. Substring check: `if pattern in name` (TOO BROAD)
The substring check caused:
- "Y" (Yttrium) to match "Yeast extract"
- Result: 9,127 false positives for REE detection!
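A minimal sketch of how the two checks could interact to produce this false positive. The function name `contains_metal_buggy` and its two-argument signature are hypothetical; the actual code in analyze_metal_concentrations.py is not shown in this commit page.

```python
import re


def contains_metal_buggy(name: str, pattern: str) -> bool:
    """Hypothetical reconstruction of the old two-check logic."""
    # Check 1: whole-word match (correct on its own).
    if re.search(rf"\b{re.escape(pattern)}\b", name):
        return True
    # Check 2: plain substring match (too broad for short symbols).
    return pattern in name


# The word-boundary check alone rejects "Y" inside "Yeast" ...
print(re.search(r"\bY\b", "Yeast extract"))
# ... but the substring fallback accepts it, giving a false positive.
print(contains_metal_buggy("Yeast extract", "Y"))
```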
## The Fix
Modified contains_metal() to only use substring matching for:
- Longer patterns (>2 chars)
- Compound formulas (containing digits or special chars like "FeCl3")
Short element symbols (1-2 chars) now ONLY match at word boundaries.
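The rule above can be sketched as follows. This is an illustration under assumptions, not the committed implementation: the signature, the helper variable names, and the test for "compound formula" (any non-letter character in the pattern) are all hypothetical.

```python
import re


def contains_metal(name: str, pattern: str) -> bool:
    """Sketch of the corrected matching rule (assumed signature)."""
    # A whole-word match is accepted for every pattern.
    if re.search(rf"\b{re.escape(pattern)}\b", name):
        return True
    # Substring matching is reserved for patterns that are unlikely
    # to collide with ordinary words: longer than 2 characters, or
    # compound formulas containing digits or special characters.
    is_long = len(pattern) > 2
    is_compound = any(not ch.isalpha() for ch in pattern)
    if is_long or is_compound:
        return pattern in name
    # Short element symbols (1-2 chars) never fall back to substring.
    return False


print(contains_metal("Yeast extract", "Y"))         # False
print(contains_metal("Y(NO3)3", "Y"))               # True
print(contains_metal("La(NO3)3 x 6 H2O", "La(NO3)3"))  # True
print(contains_metal("Cobalamin", "Co"))            # False
```

In this sketch a short symbol embedded in a formula, such as `Zn` in `ZnSO4`, no longer matches on its own; the compound formula itself would need to be among the patterns.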
## Impact on Results
BEFORE (with bug):
- REE-containing media: 9,130
- High-REE media: 913
- Metal-containing media: 9,599
- High-metal media: 960
AFTER (corrected):
- REE-containing media: 3 ✓ (only genuine REE compounds)
- High-REE media: 3 ✓
- Metal-containing media: 8,263 ✓
- High-metal media: 828 ✓
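As a quick consistency check, the labels removed per category follow from the BEFORE and AFTER counts listed above. The dictionary keys are illustrative names, not identifiers from the codebase.

```python
# Counts copied from the BEFORE / AFTER lists in this commit message.
before = {"ree": 9130, "high_ree": 913, "metal": 9599, "high_metal": 960}
after = {"ree": 3, "high_ree": 3, "metal": 8263, "high_metal": 828}

# False positives removed per category = before - after.
removed = {key: before[key] - after[key] for key in before}
print(removed)
# {'ree': 9127, 'high_ree': 910, 'metal': 1336, 'high_metal': 132}
```

The `ree` difference of 9,127 agrees with the false-positive count quoted in the bug description.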
## The 3 True REE Media
Only 3 media genuinely contain rare earth elements:
1. DSMZ_1741_METHYLOCYSTIS_MEDIUM_NLS - La(NO3)3, Ce(NO3)3
2. DSMZ_1738_METHYLOMONAS_MEDIUM_WSC - La(NO3)3, Ce(NO3)3
3. DSMZ_1739_METHYLOMONAS_MEDIUM_SURF - La(NO3)3, Ce(NO3)3
All use lanthanum and cerium nitrates for methanotroph cultivation.
## Files Updated
- Fixed pattern matching logic in analyze_metal_concentrations.py
- Re-ran analysis with corrected algorithm
- Updated 1,707 media YAML files (removed false positive labels)
- Regenerated browser data with accurate counts
This also fixes one of the reported issues:
- HD-MEDIUM is no longer incorrectly labeled as high_ree
The other reported data quality issues (duplicate water, missing NaOH) are separate.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Parent commit: 7ac9c13
1,710 files changed (+25,021 / -100,121 lines)
Directories listed in the file tree: app, data, normalized_yaml, archaea, bacterial, fungal, specialized, scripts