[do not merge] Reusable requests functionality #9230

Open: wants to merge 103 commits into develop from mk/reusable_requests_functionality

Conversation

@Marishka17 Marishka17 commented Mar 19, 2025

Motivation and context

API changes:

  • POST /api/consensus/merges?rq_id=rq_id now returns a 410 status code; this endpoint no longer supports process status checking
  • GET /api/projects/id/dataset?action=import_status now returns a 410 status code; this endpoint no longer supports process status checking
  • POST /api/projects/backup?rq_id=rq_id now returns a 410 status code; this endpoint no longer supports process status checking
  • POST /api/tasks/backup?rq_id=rq_id now returns a 410 status code; this endpoint no longer supports process status checking
  • PUT /api/tasks/id/annotations?rq_id=rq_id&format=format now returns a 410 status code; this endpoint no longer supports process status checking
  • PUT /api/jobs/id/annotations?rq_id=rq_id&format=format now returns a 410 status code; this endpoint no longer supports process status checking
  • GET /api/events is deprecated in favor of the following API (a client sketch follows this list):
    • [new] POST /api/events/export (returns 202 with Request ID)
    • GET /api/requests/rq_id
    • [new] GET /api/events/download?rq_id=rq_id (private endpoint, should be used only as result_url)
  • POST /api/quality/reports?rq_id=rq_id is deprecated in favor of GET /api/requests/rq_id
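For reviewers, a minimal sketch of the new events export flow, assuming a `requests`-based HTTP client, a placeholder host and token, and that the 202 response body exposes the Request ID in an `id` field like other request objects:

```python
import time

import requests  # third-party HTTP client, assumed available

BASE = "https://cvat.example.com"  # hypothetical host
session = requests.Session()
session.headers["Authorization"] = "Token <api-token>"  # placeholder credentials

# 1. Start the export; the server replies with 202 and a Request ID.
resp = session.post(f"{BASE}/api/events/export")
resp.raise_for_status()
rq_id = resp.json()["id"]  # assumed field name

# 2. Poll the unified requests API until the background job settles.
while (rq := session.get(f"{BASE}/api/requests/{rq_id}").json())["status"] not in (
    "finished",
    "failed",
):
    time.sleep(5)

# 3. On success, download from result_url, which points at the private
#    GET /api/events/download endpoint.
if rq["status"] == "finished":
    events_file = session.get(rq["result_url"]).content
```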

Visible architecture changes:

  • Cache files containing events (created when the API is used to export events as a file) are stored in /data/cache/export/ instead of /data/tmp/. The _clear_export_cache function has been removed, since cache files are now deleted one day after creation (by default) by the cleanup_export_cache_directory cron job.
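A rough sketch of what a TTL-based job like cleanup_export_cache_directory has to do; the path and one-day default come from the bullet above, everything else is an illustrative assumption rather than the PR's actual code:

```python
import time
from pathlib import Path

EXPORT_CACHE_DIR = Path("/data/cache/export")  # path named above
EXPORT_CACHE_TTL = 24 * 60 * 60  # one day, the stated default

def cleanup_export_cache_directory() -> None:
    # Periodically invoked by cron; removes export cache files whose age
    # exceeds the TTL instead of cleaning up per request.
    now = time.time()
    for entry in EXPORT_CACHE_DIR.iterdir():
        if entry.is_file() and now - entry.stat().st_mtime > EXPORT_CACHE_TTL:
            entry.unlink(missing_ok=True)
```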

API backward-incompatible changes:

  • POST /api/projects|tasks|jobs/id/annotations|dataset|backup now accepts a unified body, {"file": ...}, when the file has been uploaded via TUS (see the sketch below)
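Concretely, once the TUS upload finishes, the finalizing request posts the same body shape for every subresource; a hedged sketch (host, task ID, format, and the uploaded-file reference are placeholders):

```python
import requests  # third-party HTTP client, assumed available

# Finalize an annotations import for a task by referencing the file that was
# previously uploaded via TUS; dataset and backup imports now take the same
# {"file": ...} shape.
resp = requests.post(
    "https://cvat.example.com/api/tasks/42/annotations",  # hypothetical host/id
    params={"format": "COCO 1.0"},
    json={"file": "uploaded-file-name"},  # placeholder reference
    headers={"Authorization": "Token <api-token>"},  # placeholder credentials
)
resp.raise_for_status()
```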

SDK backward-incompatible changes:

  • [requests_api.list] the action/target/subresource filters are now plain strings (see the sketch after this list)
  • Several types were removed: AnnotationFileRequest/ProjectFileRequest/DatasetFileRequest/TaskFileRequest; DatasetWriteRequest/BackupWriteRequest/TaskAnnotationsWriteRequest; JobAnnotationsUpdateRequest/TaskAnnotationsUpdateRequest
  • RqId renamed to RequestId
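For SDK users, a minimal sketch of the string-typed filters, assuming a deployment at a placeholder host and the generated requests_api client exposed by cvat_sdk:

```python
from cvat_sdk import make_client

with make_client("https://cvat.example.com", credentials=("user", "password")) as client:
    # The filters below are plain strings now, not enum wrappers.
    page, _ = client.api_client.requests_api.list(
        action="export",
        target="task",
        subresource="dataset",
    )
    for request in page.results:
        print(request.id, request.status)
```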

List of all possible request types:

  • Export requests:
    • action=export, target=project|task|job, target_id=target_id, subresource=dataset|annotations, format=format, user_id=uid
    • action=export, target=project|task, target_id=target_id, subresource=backup, user_id=uid
    • action=export, target=events, id=uuid, user_id=uid
  • Import requests:
    • action=import, target=task|job, target_id=target_id, subresource=annotations
    • action=import, target=project, target_id=target_id, subresource=dataset
    • action=import, target=project|task, id=uuid, subresource=backup
  • Creation requests:
    • action=create, target=task, target_id=target_id
  • Auto annotation requests:
    • action=autoannotate, target=task, target_id=target_id
  • Merging requests:
    • action=merge, target=task|job, target_id=target_id
  • Calculation requests:
    • action=calculate, target=task, target_id=target_id, subresource=quality

How has this been tested?

Checklist

  • I submit my changes into the develop branch
  • I have created a changelog fragment
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • I have linked related issues (see GitHub docs)

License

  • I submit my code changes under the same MIT License that covers the project.
    Feel free to contact the maintainers if that's a concern.

from django.conf import settings


class LayeredKeyDict(dict):
Contributor

IMO, using a custom class here obscures the underlying logic. I would suggest just adding an ordinary helper function:

def get_queue_for_triplet(action, target, subresource):
    # MAPPING is assumed to be keyed, most specific first, by
    # (action, target, subresource), (action, subresource), or plain action.
    if queue := MAPPING.get((action, target, subresource)):
        return queue
    if queue := MAPPING.get((action, subresource)):
        return queue
    return MAPPING[action]

Contributor Author

@Marishka17 Marishka17 Apr 23, 2025

I prefer to keep my implementation because using get_queue_for_triplet requires passing action, target, and subresource every time, but in most cases, just the action is enough to determine the corresponding queue.

Moreover, I want to retain the flexibility of accessing the mapping via ACTION_TO_QUEUE[action] or ACTION_TO_QUEUE[(action, subresource)], since the QUEUE_SELECTORS defined in the RequestId subclasses are used to determine the queue based on the known RequestIdSubclass.

So, the pipeline looks like this:

  1. There are several RequestId subclasses. Based on RQ_QUEUES, PARSED_JOB_ID_CLASS, and each RequestId subclass that has QUEUE_SELECTORS, we build ACTION_TO_QUEUE (queue_selector -> queue).
  2. ACTION_TO_QUEUE is used to determine the queue based on the parsed action, target, and subresource.
  3. ACTION_TO_QUEUE is also used when the legacy rq_id is passed. In that case, we know the subclass, but we need to determine the corresponding queue, so one of the subclass's QUEUE_SELECTORS is used.
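To make the layered lookup concrete, a minimal runnable sketch of a dict with this fallback behavior (an illustration, not the PR's actual implementation; the queue names are made up):

```python
class LayeredKeyDict(dict):
    """Falls back from (action, target, subresource) to (action, subresource) to action."""

    def __missing__(self, key):
        if isinstance(key, tuple):
            if len(key) == 3:
                # (action, target, subresource) -> (action, subresource)
                return self[(key[0], key[2])]
            if len(key) == 2:
                # (action, subresource) -> action
                return self[key[0]]
        raise KeyError(key)


ACTION_TO_QUEUE = LayeredKeyDict({
    "import": "import_queue",  # hypothetical queue names
    ("export", "backup"): "export_backup_queue",
})

assert ACTION_TO_QUEUE[("export", "task", "backup")] == "export_backup_queue"
assert ACTION_TO_QUEUE[("import", "task", "annotations")] == "import_queue"
```

This keeps both access styles from the comment above: the most specific key works when all parts are parsed, and a bare action still resolves directly.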

Contributor

I think everything you've described could be achieved by changing the signature to get_queue_to_selector(action, subresource=None, target=None). But since this is just a stylistic disagreement, I'm not going to insist.

queue_is_known = bool(queue)

if not queue_is_known:
    _parsed = _RequestIdForMapping(**params)
Contributor

What's the purpose of this? Couldn't we just use params["action"], etc. in the following line?

Contributor Author

> What's the purpose of this?

To run validations defined in the RequestId class. But okay, I've removed the helper class.

@Marishka17 Marishka17 force-pushed the mk/reusable_requests_functionality branch from ecb133b to 620b81b on April 23, 2025 09:42
@@ -2104,35 +2097,33 @@ def test_api_v2_export_import_dataset(self):
]:
    # TODO: fix bug for these formats
    continue
if upload_format_name == "Ultralytics YOLO Classification 1.0":
Contributor

Why did this workaround need to be added now? Your changes don't seem like they should impact whether the import succeeds or fails.

Contributor Author

They do have an impact, because now we check (by default) that the operation finished successfully.

[screenshot]

Before this change, the check was only that the import "initialization" response was 202 (I'm aware of sync mode).

return Response(serializer.data, status=status.HTTP_202_ACCEPTED)


@classmethod
def parse_filename(
cls, filename: str,
-    ) -> ParsedDatasetFilename | ParsedBackupFilename:
+    ) -> ParsedExportFilename | ParsedExportFilenameWithConstructedId:
Contributor

This return type is equivalent to just ParsedExportFilename.

Contributor Author

I don't think it is, because they have different types for the file_id attribute.
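To illustrate the typing point with hypothetical shapes (these are assumptions for illustration, not the PR's actual definitions): if the subclass redeclares file_id with a different type, the explicit union tells the caller to handle both shapes:

```python
from dataclasses import dataclass


@dataclass
class ConstructedFileId:  # hypothetical wrapper for an ID built by the parser
    value: str


@dataclass
class ParsedExportFilename:
    file_id: str  # e.g. taken verbatim from the filename


@dataclass
class ParsedExportFilenameWithConstructedId(ParsedExportFilename):
    file_id: ConstructedFileId  # the subclass changes the attribute's type
```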

where rq_id is obtained from the response of the initializing request.
"""))

class DeprecatedResponse(Response):
Contributor

Doesn't seem like a custom class is needed here - this could just be a function returning a Response.

Contributor Author

Are you talking about a function that accepts the same args as the Response constructor? In that case, what are the benefits of such an approach? DeprecatedResponse does not introduce a lot of logic, but it still reads as working with a DRF Response. IMHO, return DeprecatedResponse(msg, status=..., headers=...) looks better than return get_deprecated_response(msg, status=..., headers=...).
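For readers following along, a minimal sketch of the shape being debated (an assumption, not the PR's actual code): a DRF Response subclass that stamps deprecation metadata on construction and is used exactly like a plain Response:

```python
from rest_framework import status
from rest_framework.response import Response


class DeprecatedResponse(Response):
    """A regular DRF Response that also announces the endpoint's deprecation."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Signal the deprecation to clients via a response header.
        self["Deprecation"] = "true"


def handle_legacy_status_check():  # hypothetical call site
    return DeprecatedResponse(
        "This endpoint no longer supports process status checking",
        status=status.HTTP_410_GONE,
    )
```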

@Marishka17 Marishka17 mentioned this pull request Apr 24, 2025