Conversation
that no longer exists
propagating block rules down the taxonomic tree.
…to build_geofence_release.py
build_geofence_release
stefanistrate
left a comment
There was a problem hiding this comment.
Looks good overall. A couple of comments below.
|
|
||
| Examples: | ||
|
|
||
| This taxon would be allowed in RUS, SGP, THA, TWAN, and VNM. In the USA, |
There was a problem hiding this comment.
Did you mean TWN instead of TWAN?
| # taxonomy_release.txt (which requires GUIDs for parent taxa of the taxa that appear | ||
| # in the labels file). | ||
| wildlife_insights_page_size = 30000 | ||
| wildlife_insights_taxonomy_url = "https://api.wildlifeinsights.org/api/v1/taxonomy/taxonomies-all?fields=class,order,family,genus,species,authority,taxonomyType,uniqueIdentifier,commonNameEnglish&page[size]={}".format( |
There was a problem hiding this comment.
You could use an f-string here for nicer formatting over multiple lines, e.g.
... = (
"https://...?"
"fields=class,order,"
"family,genus,"
"..."
f"commonNameEnglish&page[size]={wildlife_insights_page_size}"
)|
|
||
|
|
||
| def _validate_taxon_string(taxon: str) -> bool: | ||
| """Validates a five-token taxon string. Errors in invalid |
|
|
||
| tokens = taxon.split(";") | ||
|
|
||
| if (not isinstance(taxon, str)) or (len(tokens) != 5): |
There was a problem hiding this comment.
Outer parenthesis not needed.
| return sorted(list(five_token_taxa)) | ||
|
|
||
|
|
||
| def generate_release_taxonomy_from_label_list( |
There was a problem hiding this comment.
I'm wondering if this long function could be split into several self-contained parts, so that we'd improve the code readability a bit?
| return sorted(list(five_token_taxa)) | ||
|
|
||
|
|
||
| def generate_release_taxonomy_from_label_list( |
There was a problem hiding this comment.
I don't see this function being called in the current script at all. Would it be better to split the utilities into one or more separate modules, and just use this script for a high level assembling of everything?
| return labels | ||
|
|
||
|
|
||
| def download_wildlife_insights_taxonomy( |
There was a problem hiding this comment.
Adding support for handling the WI taxonomy here implies that we'll make sure over time that it won't diverge in an incompatible way from SpeciesNet. I'm sure we'll be quick to keep updating SpeciesNet if necessary, but if anyone just wants to experiments with their own quick fixes (let's say in geofence_fixes.csv), I don't want to force them to deal with a (potentially) updated WI taxonomy. Will the current script (build_geofence_release.py) still allow users to only update geofence rules without having to deal with taxonomy changes outside of this repo?
|
Thanks, @stefanistrate. I made almost all of the stylistic changes you recommended, and for posterity, we clarified on another thread that this module is dependent on taxonomy_release.txt now, and includes the function to update it, but it's not dependent on the WI taxonomy API, because taxonomy_release.txt is now included in the repo and is used by default. Will merge this PR, update the model file on kaggle, create a new PR that changes the default model file, then run the geofencing tests against the new model package. I've already run the (new) validation functions (from this PR) against both the new geofence_release.json and the new taxonomy_release.txt. |
This PR does not change any inference-time code. It includes the following changes:
None of these changes will have any impact on users until a subsequent model package release.