Taxonomy geofence update #53

Merged
agentmorris merged 25 commits into main from
taxonomy-geofence-update
Dec 8, 2025

Conversation

@agentmorris
Collaborator

This PR does not change any inference-time code. It includes the following changes:

  • Added the labels file, geofence release file, and taxonomy release file (which are all included as part of the model package) to source control.
  • Taxonomic fixes described in issues 52 and 37.
  • The first update to the geofence since the initial SpeciesNet release, capturing several user-reported geofencing issues. Support for comments was added to geofence_fixes.csv, so it's a bit easier to see what changed.
  • Significant changes to build_geofence_release.py, which now updates taxonomy_release.txt as well as geofence_release.json.

None of these changes will have any impact on users until a subsequent model package release.

This was referenced Dec 8, 2025
Collaborator

@stefanistrate left a comment

Looks good overall. A couple of comments below.


Examples:

This taxon would be allowed in RUS, SGP, THA, TWAN, and VNM. In the USA,
Collaborator

Did you mean TWN instead of TWAN?
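A typo like this can be caught mechanically. The following is a minimal sketch, assuming the geofence uses ISO 3166-1 alpha-3 country codes (which are always three uppercase letters). The helper name `malformed_country_codes` is hypothetical and is not part of this PR; it only checks the shape of each code, not whether the code is actually assigned to a country.

```python
import re

# Assumption: country codes are ISO 3166-1 alpha-3, i.e. exactly three
# uppercase ASCII letters. This is a shape check only.
_ALPHA3_SHAPE = re.compile(r"[A-Z]{3}")


def malformed_country_codes(codes):
    """Return the codes that do not look like alpha-3 codes (hypothetical helper)."""
    return [code for code in codes if not _ALPHA3_SHAPE.fullmatch(code)]


# The codes listed in the docstring example under review.
codes_from_docstring = ["RUS", "SGP", "THA", "TWAN", "VNM"]
print(malformed_country_codes(codes_from_docstring))  # prints ['TWAN']
```

A stricter check would compare each code against the full list of assigned alpha-3 codes, which this sketch does not attempt.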

# taxonomy_release.txt (which requires GUIDs for parent taxa of the taxa that appear
# in the labels file).
wildlife_insights_page_size = 30000
wildlife_insights_taxonomy_url = "https://api.wildlifeinsights.org/api/v1/taxonomy/taxonomies-all?fields=class,order,family,genus,species,authority,taxonomyType,uniqueIdentifier,commonNameEnglish&page[size]={}".format(
Collaborator

You could use an f-string here for nicer formatting over multiple lines, e.g.

... = (
    "https://...?"
    "fields=class,order,"
    "family,genus,"
    "..."
    f"commonNameEnglish&page[size]={wildlife_insights_page_size}"
)
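Filled in with the URL from the line under review, the suggestion might look like the sketch below. It assumes the truncated `.format(` call passes `wildlife_insights_page_size` as its only argument. Adjacent string literals are concatenated by Python at compile time, so only the last piece needs the `f` prefix.

```python
wildlife_insights_page_size = 30000

# Adjacent string literals are joined into one string; the query string is
# split across lines purely for readability.
wildlife_insights_taxonomy_url = (
    "https://api.wildlifeinsights.org/api/v1/taxonomy/taxonomies-all?"
    "fields=class,order,family,genus,species,authority,"
    "taxonomyType,uniqueIdentifier,commonNameEnglish"
    f"&page[size]={wildlife_insights_page_size}"
)

print(wildlife_insights_taxonomy_url)
```

One trade-off: the f-string is evaluated once, when the assignment runs, so later changes to `wildlife_insights_page_size` are not picked up the way they would be with a template formatted at call time.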



def _validate_taxon_string(taxon: str) -> bool:
    """Validates a five-token taxon string. Errors in invalid
Collaborator

s/in/if


tokens = taxon.split(";")

if (not isinstance(taxon, str)) or (len(tokens) != 5):
Collaborator

Outer parentheses not needed.
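For illustration, here is a minimal sketch of the check without the extra parentheses. It is not the function from this PR: the name `validate_taxon_string_sketch`, the use of `ValueError`, and the error messages are all assumptions. The sketch also does the type check before calling `.split()`, since in the quoted snippet a non-string input would raise `AttributeError` on `taxon.split(";")` before the `isinstance` test is reached.

```python
def validate_taxon_string_sketch(taxon) -> bool:
    """Hypothetical sketch: return True for a five-token taxon string, raise otherwise."""
    # Check the type first so a non-string fails with a clear message
    # instead of an AttributeError from .split().
    if not isinstance(taxon, str):
        raise ValueError(f"Taxon must be a string, got {type(taxon).__name__}")

    tokens = taxon.split(";")

    # No parentheses are needed around the comparison.
    if len(tokens) != 5:
        raise ValueError(f"Expected 5 tokens, got {len(tokens)}: {taxon!r}")

    return True


# Assumed token order: class;order;family;genus;species
print(validate_taxon_string_sketch("mammalia;carnivora;felidae;panthera;leo"))
```

With the two checks on one line, `not isinstance(taxon, str) or len(tokens) != 5` parses the same way as the parenthesized version, because `not` binds more tightly than `or` and comparisons bind more tightly than `not`.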

return sorted(list(five_token_taxa))


def generate_release_taxonomy_from_label_list(
Collaborator

Could this long function be split into several self-contained parts to improve readability?

return sorted(list(five_token_taxa))


def generate_release_taxonomy_from_label_list(
Collaborator

I don't see this function being called in the current script at all. Would it be better to split the utilities into one or more separate modules, and use this script only for high-level assembly of everything?

return labels


def download_wildlife_insights_taxonomy(
Collaborator

Adding support for handling the WI taxonomy here implies that we'll make sure over time that it won't diverge in an incompatible way from SpeciesNet. I'm sure we'll be quick to keep updating SpeciesNet if necessary, but if anyone just wants to experiment with their own quick fixes (let's say in geofence_fixes.csv), I don't want to force them to deal with a (potentially) updated WI taxonomy. Will the current script (build_geofence_release.py) still allow users to update only the geofence rules without having to deal with taxonomy changes outside of this repo?

@agentmorris
Collaborator Author

Thanks, @stefanistrate. I made almost all of the stylistic changes you recommended. For posterity: we clarified on another thread that this module now depends on taxonomy_release.txt and includes the function to update it, but it does not depend on the WI taxonomy API, because taxonomy_release.txt is now included in the repo and is used by default. Next steps: merge this PR, update the model file on Kaggle, create a new PR that changes the default model file, then run the geofencing tests against the new model package. I've already run the new validation functions from this PR against both the new geofence_release.json and the new taxonomy_release.txt.
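The "local file by default, API only on request" arrangement described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in build_geofence_release.py: the function name `load_taxonomy_lines`, the `fetch_remote` parameter, and the one-entry-per-line reading of the file are all hypothetical.

```python
from pathlib import Path
from typing import Callable, List, Optional


def load_taxonomy_lines(
    local_path: str,
    fetch_remote: Optional[Callable[[], List[str]]] = None,
) -> List[str]:
    """Hypothetical sketch: read taxonomy entries from a checked-in file by default.

    The remote source is consulted only when the caller explicitly passes a
    fetch function, so editing geofence rules never requires network access.
    """
    if fetch_remote is not None:
        return fetch_remote()

    # Assumption: the file is plain text with one entry per line.
    text = Path(local_path).read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.strip()]
```

Keeping the remote fetch behind an explicit argument means someone who only edits geofence_fixes.csv works against the taxonomy that ships with the repo, which is the behavior the thread above asks for.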

@agentmorris merged commit 3d85c45 into main on Dec 8, 2025
18 checks passed
@agentmorris deleted the taxonomy-geofence-update branch on December 8, 2025 at 20:57