initial nabat models/api script #122

Open
wants to merge 46 commits into
base: main

BryonLewis
Collaborator

@BryonLewis BryonLewis commented Feb 4, 2025

resolves #121

NOTE: Use the ./scripts/USGS/sampleURL.txt for testing, just inject your apiToken.

  • Creates new /nabat under /models for nabat specific models
    • AcousticBatch -> Recording
    • NaBatSpectrogram -> Spectrogram
    • NaBatCompressedSpectrogram -> CompressedSpectrogram
  • Added a sample script to read from the NaBat GraphQL API
  • Under ./tasks/nabat/ added a tasks.py and a nabat_data_retrieval.py
    • The nabat_data_retrieval has a main acoustic_batch_initialize that does:
      • Using the ApiToken and BatchId it requests some basic information about the NaBat file for the batchId
      • Using a ProjectId and the filename it requests a presigned S3 URL for downloading the file
      • Downloads the wav file
      • Creates an AcousticBatch using the metadata that we have
      • Using the downloaded audio file it generates a spectrogram
      • From the spectrogram it generates a compressed spectrogram
      • It also creates an annotation if one exists for the file and uses the proper species
      • Then it runs the prediction on the compressed spectrogram and logs the results
  • Under the /views there is an /nabat sub directory that contains the new endpoints for the nabat integration
    • NABat access relies on the API token having access to the file being displayed
    • The endpoints are all public and don't require a Django user, but they do require that the NABat API token has access to the file
    • This is why many endpoints first request the presigned S3 URL for the file before allowing annotations to be viewed or edited.
  • Client Side
    • Added a new /nabat route that avoids the standard login because the API endpoints are all unprotected
    • There is a default /nabat/{recordingId}/?surveyEventId={surveyEventId}&apiToken={apiToken}
      • This endpoint will initially look for an existing NABatRecording with the recording ID.
      • If that is not found it will check for an in-progress processing task
      • If neither exists it starts a task for downloading and processing the spectrogram. This also creates a processingTask that can be used to check whether processing is already in progress
      • Finally, when complete, it will redirect to the /nabat/spectrogram/{id} route to display the spectrogram
    • The Spectrogram Annotation viewer has some minor changes
      • Users can only make a single annotation (this is an NABat limitation)
      • Only File Annotations are enabled currently
  • Removed references to flower and the container from the docker compose settings
  • Added a generateSpectrograms.py script to test the generation on a folder of files and log any errors that happen.
  • Added a configuration option for NABat that allows importing the species list from NABat.
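
The lookup order described above for the default /nabat/{recordingId} route can be sketched without any framework. This is only an illustration, not the code in this PR: `resolve_recording`, the dict-based stores, and the returned action names are hypothetical stand-ins for the NABatRecording model, the ProcessingTask model, and the Celery task.

```python
from typing import Callable, Dict, Tuple


def resolve_recording(
    recording_id: int,
    recordings: Dict[int, int],
    tasks_in_progress: Dict[int, str],
    start_task: Callable[[int], str],
) -> Tuple[str, str]:
    """Decide what to do for a requested NABat recording id.

    recordings maps a NABat recording id to a local spectrogram id (hypothetical).
    tasks_in_progress maps a NABat recording id to a running task id (hypothetical).
    """
    # 1. An existing recording wins: go straight to the spectrogram view.
    if recording_id in recordings:
        return ("redirect", f"/nabat/spectrogram/{recordings[recording_id]}")
    # 2. Otherwise reuse a processing task that is already running.
    if recording_id in tasks_in_progress:
        return ("wait", tasks_in_progress[recording_id])
    # 3. Otherwise start a new task and record it so later requests can poll it.
    task_id = start_task(recording_id)
    tasks_in_progress[recording_id] = task_id
    return ("started", task_id)
```

Recording the task before returning is what lets a second request for the same recording wait on the running task instead of starting a duplicate download.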

TODO:

  • Client Side

    • New endpoint that is used for nabat data
      • Takes in the URL API Key and BatchID
      • Uses this information to create an AcousticBatch with the information
      • After creation of the AcousticBatch it should download the audio file and create the spectrogram and compressed spectrogram
      • Finally it should run the AutoId onnx model we are using
    • The UI for NaBat spectrograms will need to be created; it is separate from the default one
      • Mostly the pulse and sequence annotations won't exist
      • The API will need a new /nabat header for getting information
        • An acoustic_batch/${id}/spectrogram endpoint for getting the spectrogram data
        • An acoustic_batch/${id}/spectrogram/compressed endpoint for getting the compressed spectrogram data
        • Endpoints for setting and configuring Acoustic Batch Annotations (Recording level annotations)
  • Back-End

    • /nabat URLs for getting information about AcousticBatch, Spectrograms and CompressedSpectrograms
    • Logic for getting a presigned URL for download that can be used to generate spectrograms and perform inference
    • Updating inference logic to properly use the new API for calculation
    • PROJECTID is static and needs to be retrieved using the batchId.
    • Better logging of the progress of downloading/processing/conversion (This may be a separate PR).
  • Clean up compressed spectrogram generation and prediction code. It's a bit duplicated in locations and could be improved.

20250404 TODO:

  • Update to utilize new Query for presigned URL
  • Update the endpoint to be outside of Use Credentials/Logging in
  • Extract the username and email from the API Token for associating a user with the system
  • Update the Front-End to respect the new UI updates
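
For the item about extracting the username and email from the API token: if the token is a standard three-part JWT, its payload can be read with the standard library alone. The sketch below is an assumption-heavy illustration. The helper name `jwt_claims` is hypothetical, the claim names carried by a NABat token are not shown in this PR, and the signature is deliberately not verified here.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


# Hypothetical usage; the "email" claim name is an assumption:
# email = jwt_claims(api_token).get("email")
```

Because nothing is verified, the decoded claims are only suitable for labelling who made an annotation; access control would still rest on NABat accepting the token.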

20250408 TODO:

  • Front-end Annotation UI Implementation
    • Handle the difference between model-suggested annotations and user annotations
    • Adding existing annotations on import (we only have the user email)
    • Testing of creation/deletion/updating file annotations
  • Integration of the Prediction settings to turn on/off automatic annotation calculation
  • Restrict to a single Annotation per file per user
  • Pushing annotations to NABat
  • Update the Species List from NaBat (maybe a configuration endpoint to run?)

20250411 TODO:

  • CONVERT THE PUSHING OF SPECIES ANNOTATIONS TO mutation instead of query
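
A minimal sketch of what the converted request body might look like, reusing the field and argument names from the VetQuery posted in this thread. It assumes the NABat server exposes `updateAcousticFileVet` as a mutation with the same arguments, which this PR does not confirm; `build_vet_mutation` and the operation name `VetMutation` are hypothetical.

```python
def build_vet_mutation(
    survey_event_id: int,
    acoustic_file_id: int,
    software_id: int,
    species_id: int,
) -> dict:
    """Build a GraphQL request body that sends updateAcousticFileVet as a mutation."""
    # int() rejects anything that is not a plain integer before it reaches the document.
    document = (
        "mutation VetMutation {\n"
        "  updateAcousticFileVet(\n"
        f"    surveyEventId: {int(survey_event_id)}\n"
        f"    acousticFileId: {int(acoustic_file_id)}\n"
        f"    softwareId: {int(software_id)}\n"
        f"    speciesId: {int(species_id)}\n"
        "  ) {\n"
        "    acousticFileBatchId\n"
        "  }\n"
        "}"
    )
    # GraphQL-over-HTTP bodies use the "query" key for mutations as well.
    return {"query": document}
```

Only the operation keyword changes relative to the query form; the selection set stays the same.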

BryonLewis and others added 28 commits February 4, 2025 14:32
…on to tasks, update NABat spectro generating process
* renames acoustic batch to recording based on conversations

* add migrations
Add additional settings to the admin page
@BryonLewis
Collaborator Author

Updated queries:
Added a basic query:
SpeciesId comes from the species list we use.
The email used when pushing the value is taken from the JWT token.
It assumes a manual species classifier list.

SoftwareId is a Kitware SoftwareId.

query VetQuery {
  updateAcousticFileVet(
    surveyEventId: 4481017
    acousticFileId: 174988990
    softwareId: 81
    speciesId: 29
  ) {
    acousticFileBatchId
  }
}

query PresignedUrlQuery {
  presignedUrlFromAcousticFile(acousticFileId: "174988990") {
    s3PresignedUrl
  }
}
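
Since several endpoints depend on the presigned URL lookup succeeding, here is a hedged sketch of pulling the URL out of a PresignedUrlQuery response. The response shape is assumed to mirror the query (a top-level `data` object and an optional `errors` list, as in standard GraphQL responses); `extract_presigned_url` is a hypothetical helper, not a function from this PR.

```python
from typing import Any, Dict


def extract_presigned_url(response: Dict[str, Any]) -> str:
    """Return s3PresignedUrl from a PresignedUrlQuery response or raise."""
    # GraphQL responses report failures in a top-level "errors" list.
    if response.get("errors"):
        raise RuntimeError(f"GraphQL errors: {response['errors']}")
    data = response.get("data") or {}
    node = data.get("presignedUrlFromAcousticFile") or {}
    url = node.get("s3PresignedUrl")
    if not url:
        # Assumed meaning: the API token cannot access this file.
        raise LookupError("no presigned URL returned for this file")
    return url
```

Treating a missing URL as an error lets a view return a permission failure before it touches any annotation data.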

@BryonLewis
Collaborator Author

Added a new query to get existing annotations and match them up with anything we currently have:

query GetFileBatchesBySurveyEventIdAndFileId {
  surveyEventById(id: "4481017") {
    acousticBatchesBySurveyEventId(filter: {softwareId: {equalTo:81}}) {
      nodes {
        id
        acousticFileBatchesByBatchId(filter: {fileId: {equalTo: "174988990"}}) {
          nodes {
            autoId
            manualId
            vetter
            speciesByManualId {
              speciesCode
            }
          }
        }
      }
    }
  }
}
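
To match these results against local annotations, the nested response has to be flattened first. The sketch below assumes the response mirrors the query's shape under a top-level `data` key; `collect_file_annotations` and the output key names are hypothetical.

```python
from typing import Any, Dict, List


def collect_file_annotations(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the nested batch/file-batch nodes into one row per file batch."""
    event = (response.get("data") or {}).get("surveyEventById") or {}
    batches = (event.get("acousticBatchesBySurveyEventId") or {}).get("nodes") or []
    rows: List[Dict[str, Any]] = []
    for batch in batches:
        file_batches = (batch.get("acousticFileBatchesByBatchId") or {}).get("nodes") or []
        for file_batch in file_batches:
            # speciesByManualId is null when no manual id has been set.
            species = file_batch.get("speciesByManualId") or {}
            rows.append({
                "batch_id": batch.get("id"),
                "auto_id": file_batch.get("autoId"),
                "manual_id": file_batch.get("manualId"),
                "vetter": file_batch.get("vetter"),
                "species_code": species.get("speciesCode"),
            })
    return rows
```

Each row then carries the vetter and species code needed to decide whether a local annotation already exists for that user.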

@BryonLewis BryonLewis requested a review from naglepuff April 17, 2025 17:39
Comment on lines 32 to 35
'NABatRecordingAnnotation',
'NABatCompressedSpectrogram',
'NABatSpectrogram',
'NABatRecording',

Collaborator

No action needed, just observing that these don't follow the convention of other classes here of ending with ...Admin. Not sure if you want to keep that convention for these or move forward as-is.

Collaborator Author

You're correct, I'll update it

Comment on lines 39 to 47
export interface NATBatFiles {
id: number,
recording_time: string;
recording_location: string | null;
file_name: string | null;
s3_verified: boolean | null;
length_ms: number | null;
size_bytes: number | null;
survey_event: null;

Collaborator

Is there an argument to be made for moving this into NABatApi.ts?

Collaborator Author

Probably a good idea.

Comment on lines 68 to 69
getSpectrogram,
getSpectrogramCompressed,

Collaborator

Do you think these functions should include NABat in their names? Is there a reason they share names with functions in the non-NABat api.ts?

Collaborator Author

Good idea.

Comment on lines 12 to 15
apiToken: {
type: String,
default: () => undefined,
},

Collaborator

For my own learning, why have the default be a function that returns undefined instead of just using the empty string as the default?

Collaborator Author

I should probably change it to String as PropType<string | undefined>;

@@ -85,7 +105,8 @@ export default defineComponent({
 confidence,
 comments,
 updateAnnotation,
-deleteAnno
+deleteAnno,

Collaborator

Would it be a big change/touch a ton of files if we changed this to deleteAnnotation?

Collaborator Author

This is being removed/disabled right now, but I'll change the name in the meantime.

if not request.user.is_authenticated or not request.user.is_superuser:
    return JsonResponse({'error': 'Permission denied'}, status=403)
existing_task = ProcessingTask.objects.filter(
    metadata__type='Updating Species',

Collaborator

Any value in using an enum for metadata.type at this point? Maybe at least for this one we can use a variable for the string 'Updating Species'

Collaborator Author

Implement this as an enum instead.
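
One possible shape for that enum, sketched with the standard library so it runs on its own. The class and member names are hypothetical; in the Django code base `models.TextChoices` could serve the same purpose.

```python
from enum import Enum


class ProcessingTaskType(str, Enum):
    """Hypothetical names for the values stored under metadata['type']."""

    UPDATING_SPECIES = "Updating Species"


# The str mixin keeps the member equal to the raw string already stored in the
# database, so existing rows keep matching without a data migration.
assert ProcessingTaskType.UPDATING_SPECIES == "Updating Species"

# Hypothetical use in the view quoted above:
# ProcessingTask.objects.filter(metadata__type=ProcessingTaskType.UPDATING_SPECIES.value)
```

Passing `.value` makes it explicit that the plain string is what gets stored and compared.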

Collaborator

Can the commented out code here be removed?


@app.task(bind=True)
def update_nabat_species(self):
    processing_task = ProcessingTask.objects.filter(celery_id=self.request.id)

Collaborator

Is this guaranteed to exist at this point or is that up to the code that starts this task?

Collaborator Author

Make sure to check that the task exists, or handle this differently in the calling request.
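
A framework-free sketch of that existence check. Everything here is a stand-in: `run_species_update`, the dict-based task record, and the status strings are hypothetical. In the Django task the lookup could be `ProcessingTask.objects.filter(celery_id=...).first()`, which returns None when no row matches.

```python
from typing import Callable, Optional


def run_species_update(
    celery_id: str,
    find_task: Callable[[str], Optional[dict]],
    do_update: Callable[[], None],
) -> str:
    """Run the update only when a tracking record exists for this Celery id."""
    task = find_task(celery_id)
    if task is None:
        # No tracking row: return early instead of failing on None later.
        return "missing"
    task["status"] = "Running"
    try:
        do_update()
    except Exception as exc:
        # Record the failure on the tracking row so the UI can surface it.
        task["status"] = "Error"
        task["error"] = str(exc)
        return "error"
    task["status"] = "Complete"
    return "complete"
```

Whether a missing record should skip the update, as here, or create the record on the fly is a design choice for the calling request.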

Comment on lines 20 to 26
# @router.get("/filtered", response=List[ProcessingTaskSerializer])
# def filtered_tasks(request, status: Optional[str] = None):
# if status and status not in ProcessingTask.Status.values:
# return {"error": f"Invalid status value. Allowed values are {ProcessingTask.Status.values}."}, 400

# tasks = ProcessingTask.objects.filter(status=status) if status else ProcessingTask.objects.all()
# return ProcessingTaskSerializer(tasks, many=True).data

Collaborator Author

Remove these lines.

Comment on lines 12 to 15
# class ProcessingTaskSerializer(serializers.ModelSerializer):
# class Meta:
# model = ProcessingTask
# fields = '__all__'

Collaborator Author

Remove these lines.

Successfully merging this pull request may close these issues.

NABat Integration Support
2 participants