Commit 299c57e

Move species filtering for 'phage' and 'sp.'
Removed filtering of species names that include 'phage' or 'sp.' from the DataFrame.
1 parent 6529654 commit 299c57e

1 file changed: +4 -4 lines changed

scripts/parse.py

Lines changed: 4 additions & 4 deletions
@@ -94,6 +94,10 @@ def _extract_species_name(classification: str) -> str:
         )
         return ""
 
+    # Filter out classifications we don't want
+    if "phage" in species or "sp." in species:
+        return ""
+
     return species
 
 
@@ -171,10 +175,6 @@ def parse_mash_winning_sorted_tab(
     # Extract species names
     df["species"] = df["full_classification"].apply(_extract_species_name)
 
-    # Filter out species names that include "phage"
-    df = df[~df["species"].str.contains("phage", case=False, na=False)]
-    df = df[~df["species"].str.contains("sp.", case=False, na=False)]
-
     # Filter by median multiplicity factor
     if df.empty:
         logger.debug("No mash hits after species filtering", extra={"path": str(fp)})
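
The removed DataFrame filter and the new check inside `_extract_species_name` may not match the same names. pandas `str.contains` treats its pattern as a regular expression by default, so `"sp."` there means "sp" followed by any character, and `case=False` makes it case-insensitive. The new `in` check is literal and case-sensitive. The sketch below approximates both using the standard `re` module as a stand-in for pandas. The helper names `old_filter_matches` and `new_filter_matches` and the sample species names are assumptions made for illustration and are not part of `scripts/parse.py`.

```python
import re


def old_filter_matches(species: str) -> bool:
    """Approximate the removed DataFrame filter: regex patterns, case-insensitive."""
    return any(
        re.search(pattern, species, flags=re.IGNORECASE) is not None
        for pattern in ("phage", "sp.")
    )


def new_filter_matches(species: str) -> bool:
    """Mirror the check added to _extract_species_name: literal, case-sensitive."""
    return "phage" in species or "sp." in species


# Hypothetical species names, chosen to show where the two checks disagree.
samples = [
    "Escherichia coli",
    "Escherichia phage T4",
    "Bacillus sp. X1",
    "Aspergillus niger",            # contains "spe", which the regex "sp." matches
    "Enterobacteria Phage lambda",  # capital "P" is missed by the literal check
]

for name in samples:
    print(f"{name!r}: old={old_filter_matches(name)} new={new_filter_matches(name)}")
```

Under these assumptions the two checks agree on the first three names and disagree on the last two. Note also that the old code dropped matching rows from `df`, while the new code returns an empty species string for them. Whether those rows are removed later is not visible in this diff.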
