[SYNPY-1590] Implement Submission(+Status, +Bundle) OOP model #1250
base: develop
thomasyu888
left a comment
Great work starting this, let's get the evaluation work across the finish line first so we don't spread ourselves too thin.
8b50d06 to
a061fb4
Compare
a061fb4 to
c748132
Compare
c748132 to
31d9583
Compare
…dle) OOP model (#1251) * tdd initial tests * initial intro of dataclasses * expose api services for submission object * style and update docstring * add submission and submissionstatus models * add submission status retrieval and update methods; remove empty submissionstatus file * pipe query params directly into restAPI httpx requests * new dataclass object submission_bundle * move submission services functions to evaluation_services.py * renaming imports, to_synapse_request, request body refactor * patching up store method signature * update docs * new suite of tests * submissionstatus rework as a mutable object * bug fix for Statuses: updated to_synapse_request to follow same pattern as evaluations design * replace != with is not for full object comparison (not just keys) * expose the is_private arg for to_submission_status_annotations ONLY FOR submission annotations * fixed submission status/submission annotations * add support for legacy annotations * remove debug prints * get_all_submission_statuses now returns a list of substat objects * docstring updates * update submissionbundle docstrings, add more examples * initial sync test for status. moved evaluation_id retrieval to fill_from_dict for submissionstatus calls * update submissionBundle submissionstatus with evaluation_id * patch sync substatus integ tests. style. * fix submissionStatus integ tests and has_changed attribute * new substatus async integ tests. can_cancel can now be modified by an organizer on the client. cancel request returns no response body. * new test class for submission cancel functionality * substatus async unit tests * remove compare=false for some attributes. 
update sync unit tests * add submissionBundle integration tests * add submissionBundle unit tests * remove unnecessary imports and add style * get_evaluation_submissions returns generator object * get_user_submissions returns generator object * submissionBundle methods return generators * address final todo: implement docker_tag, and note in docs that version_number can be ignored for docker submissions * import -> imports * change back to import. patch uses synapse logger instance. * lock mkdocstrings-python to >=2.0.0 * add Return description to create_submission * [SYNPY-1714] Fix Issues with Failing Tests (#1287) * always log warning * check assert called with * patch object on class instance * Revert "always log warning" This reverts commit ae63c81. * update patch target * re-add erroneously removed line * no need to build out Annotations object from scratch (remove evaluation_id attribute) * remove Dict, List imports * fix broken tests due to removed evaluation_id attr * style * import classes directly from typing. remove Dict and List. * style * add API reference links to _fetch_latest_entity * type hint should be logging.Logger instance (generic or Synapse client) * explicit import to follow the other imports * import order * minimize indenting by using if not: * raise ValueError(no etag for entity) sooner * remove docker submission async integration test * link Jira ticket to TODO item * remove old cancel submission test * remove old cancel submission test (async) * assert the SubmissionStatus object has not changed * valueError(msg) from e * ValueError -> LookupError * add evaluation_round_id and explicit optional_fields dict for organization * loop over optionalfields for SubmissionStatus * new global var: POSSIBLE_STATUSES * same for sync * remove validation using response * style * correctly pass synapse_client instance * use prior synapse_client instance. LookupError message --------- Co-authored-by: SageGJ <[email protected]>
    synapse_client: Optional[Synapse] = None,
) -> dict:
    """
    Update multiple SubmissionStatuses. The maximum batch size is 500.
Since 500 is the limit, should we enforce that in the python method?
This is the limit before the API starts a new batch. The request can still have >500 individual submission statuses, but once it exceeds 500, the API splits them into batches and the response we get back will include batch tokens to identify the batches.
https://rest-docs.synapse.org/rest/PUT/evaluation/evalId/statusBatch.html
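If client-side enforcement of the batch size were ever wanted, one option is to split the list before sending it. The following is a minimal sketch under that assumption; `chunk_statuses` and `MAX_BATCH_SIZE` are hypothetical names for illustration and are not part of synapseclient.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Batch size quoted in the docstring under review.
MAX_BATCH_SIZE = 500


def chunk_statuses(statuses: Sequence[T], size: int = MAX_BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive slices of `statuses`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(statuses), size):
        yield list(statuses[start:start + size])


# 1201 placeholder items split into batches of at most 500.
batches = list(chunk_statuses(list(range(1201))))
print([len(batch) for batch in batches])  # prints [500, 500, 201]
```

Whether this is needed depends on the reply above: if the API already splits oversized requests, the client can pass the full list through unchanged.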
BryanFauble
left a comment
I left some comments for areas to improve, but overall this is really excellent work @jaymedina. Thank you so much for your attention to detail to get these changes across the finish line.
linglp
left a comment
Hi @jaymedina ! I spent some time today testing different functions in this script. For some reason, a few of them hang for me, and I’m curious whether you’ve encountered similar issues.
In addition, there are two things you might want to consider:
- I’m not currently seeing any code that validates whether status is one of the allowed values listed here. It may be worth adding validation to ensure status is valid.
- In the unit tests under models/synchronous/unit_test_submission_bundle.py and models/synchronous/unit_test_submission.py, many tests still appear to be calling async methods instead of their sync counterparts (for example, cancel_async instead of cancel).
    Arguments:
        evaluation_id: The ID of the evaluation queue.
        status: Optionally filter submissions by a submission status.
            Submission status can be one of <https://rest-docs.synapse.org/rest/org/sagebionetworks/evaluation/model/SubmissionStatusEnum.html>
Since status is an optional parameter, I would expect Submission.get_evaluation_submissions(evaluation_id="9999999") to still work. However, when I tested it, omitting status seems to cause the code to hang. Could you try verifying this on your end?
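For reference, an optional filter like `status` is usually left out of the query string entirely when it is `None`. The sketch below shows that pattern; `build_query_params` and its `limit`/`offset` defaults are hypothetical and are not taken from the code under review, and the sketch does not explain the hang reported here.

```python
from typing import Any, Dict, Optional


def build_query_params(
    status: Optional[str] = None,
    limit: int = 20,
    offset: int = 0,
) -> Dict[str, Any]:
    """Build query parameters, omitting `status` when no filter is requested."""
    params: Dict[str, Any] = {"limit": limit, "offset": offset}
    if status is not None:
        # Only send the filter when the caller actually supplied one.
        params["status"] = status
    return params


print(build_query_params())                 # prints {'limit': 20, 'offset': 0}
print(build_query_params(status="SCORED"))  # prints {'limit': 20, 'offset': 0, 'status': 'SCORED'}
```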
What if a user passes in a status value that is not one of the enum values listed here? Can we add validation in the code to catch that?
Hmm, yeah. This method wraps the new async generator function. For evaluation IDs that exist, it works fine, and the generator object returns the submission objects as expected. But for invalid evaluation IDs, instead of raising an error like "Your evaluation_id does not exist", it silently returns nothing.
@BryanFauble is this the intended behavior for the wrap-async-to-sync generator function, or should we revisit? Do you see this as a blocker for this PR?
In [32]: subs = Submission.get_evaluation_submissions(eval_id)
In [33]: for i in subs:
...: print("submission: ", i.id)
...:
submission: 9742736
submission: 9742737
submission: 9743051
submission: 9743052
submission: 9746035
submission: 9746036
In [34]: Submission.get_evaluation_submissions("9999999")
Out[34]: <generator object SubmissionSynchronousProtocol.get_evaluation_submissions at 0x1055030b0>
In [35]: subs2 = Submission.get_evaluation_submissions("9999999")
In [36]: for i in subs:
...: print("submission: ", i.id)
...:
In [37]:
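Two general properties of Python generators are relevant when reading the transcript above. First, calling a generator function runs none of its body, so any error can only surface on the first iteration. Second, a generator that has been consumed yields nothing on a second pass; note that `In [36]` iterates `subs` (already consumed in `In [33]`) rather than `subs2`, so that particular run may not show what the invalid ID actually does. A minimal sketch with a hypothetical stand-in function and made-up IDs:

```python
from typing import Iterator

# Made-up data for illustration only.
KNOWN_EVALUATIONS = {"123": ["1001", "1002"]}


def get_submission_ids(evaluation_id: str) -> Iterator[str]:
    """Hypothetical stand-in for a generator-returning lookup."""
    if evaluation_id not in KNOWN_EVALUATIONS:
        raise LookupError(f"Evaluation {evaluation_id} does not exist")
    yield from KNOWN_EVALUATIONS[evaluation_id]


subs = get_submission_ids("123")
first_pass = list(subs)   # ['1001', '1002']
second_pass = list(subs)  # [] because the generator is already exhausted

missing = get_submission_ids("9999999")  # no error yet: the body has not run
try:
    next(missing)  # the LookupError is raised only here
except LookupError as exc:
    error_message = str(exc)

print(first_pass, second_pass, error_message)
```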
> What if a user passes in a status value that is not one of the enum values listed here? Can we add validation in the code to catch that?
When a user passes an invalid status, we leave it to the API to handle that. For example, this is the message a user would retrieve when trying to collect submissions with a status of 'jenny':
SynapseHTTPError: 400 Client Error: No enum constant org.sagebionetworks.evaluation.model.SubmissionStatusEnum.JENNY
If there is a way to grab the valid constants for SubmissionStatusEnum, then I think we can add some error-handling here to make a prettier ValueError message. Otherwise, I would say we continue letting the API handle this error, since the other option would mean having to maintain a hard-coded list of available statuses.
^ If there's no known way to do this I can investigate further when I come back from holidays.
I happened to have a discussion with @BryanFauble about Enum here. In short, we could support either a string or the enum, though using the enum is preferred. See this as an example.
I think something like this will work:
from enum import Enum


class SubmissionStatusEnum(str, Enum):
    OPEN = "OPEN"
    CLOSED = "CLOSED"
    SCORED = "SCORED"
    INVALID = "INVALID"
    VALIDATED = "VALIDATED"
    EVALUATION_IN_PROGRESS = "EVALUATION_IN_PROGRESS"
    RECEIVED = "RECEIVED"
    REJECTED = "REJECTED"
    ACCEPTED = "ACCEPTED"


def validate_status(input_status: str) -> SubmissionStatusEnum:
    try:
        return SubmissionStatusEnum(input_status)
    except ValueError:
        valid = ", ".join([e.value for e in SubmissionStatusEnum])
        raise ValueError(f"Invalid submission status: {input_status!r}. Must be one of: {valid}")


validate_status(SubmissionStatusEnum.SCORED)  # Valid
validate_status("INVALID")  # Valid
validate_status("UNKNOWN")  # Invalid, should raise ValueError
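A short sketch of what the `str` mixin in the proposal buys: members compare equal to plain strings and serialize to JSON without extra handling, which is what makes "either a string or the enum" workable. The two-member `Status` class below is a hypothetical stand-in, not the proposed class itself.

```python
import json
from enum import Enum


class Status(str, Enum):
    """Two-member stand-in for the enum proposed above."""

    SCORED = "SCORED"
    RECEIVED = "RECEIVED"


equals_plain_string = Status.SCORED == "SCORED"          # True: members are also str instances
same_member = Status("SCORED") is Status.SCORED          # True: lookup by value returns the member
payload = json.dumps({"status": Status.SCORED})          # '{"status": "SCORED"}'
as_text = str(Status.SCORED)                             # 'Status.SCORED', not 'SCORED'
query_value = Status.SCORED.value                        # 'SCORED'

print(equals_plain_string, same_member, payload, as_text, query_value)
```

Because `str()` on a member returns the qualified name, and f-string formatting of mixed-in enums has varied across Python versions, using `.value` explicitly is the safer choice when building a URL or query string.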
    )

    @classmethod
    def get_user_submissions(
I am not exactly sure what's going on, but when I tested the code on my end, I observed different outputs from the sync vs. async methods:
import asyncio

# `evaluation` is an Evaluation object created earlier in the session.


def get_user_submission():
    for submission in Submission.get_user_submissions(
        evaluation_id=evaluation.id,
        user_id="3443707"
    ):
        print("user submission", submission)


get_user_submission()


async def get_user_submission_async():
    async for submission in Submission.get_user_submissions_async(
        evaluation_id=evaluation.id,
        user_id="3443707"
    ):
        print("user submission async", submission)


asyncio.run(get_user_submission_async())
The async version is able to print out many submission records, while the sync function gets stuck.
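For context on the sync/async difference, one common way to drive an async generator from synchronous code is to step it on a private event loop. The sketch below illustrates that general technique only; `iterate_sync` is a hypothetical helper, and this is not necessarily how synapseclient's sync wrapper is implemented.

```python
import asyncio
from typing import AsyncIterator, Iterator, TypeVar

T = TypeVar("T")


async def _next_item(iterator: AsyncIterator[T]) -> T:
    """Await exactly one item; StopAsyncIteration propagates when exhausted."""
    return await iterator.__anext__()


def iterate_sync(async_iterable) -> Iterator[T]:
    """Drive an async iterator from synchronous code on a private event loop."""
    loop = asyncio.new_event_loop()
    iterator = async_iterable.__aiter__()
    try:
        while True:
            try:
                item = loop.run_until_complete(_next_item(iterator))
            except StopAsyncIteration:
                return
            yield item
    finally:
        # Finalize any half-consumed async generators, then release the loop.
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()


async def countdown(n: int):
    for i in range(n, 0, -1):
        await asyncio.sleep(0)  # stands in for an awaited network call
        yield i


print(list(iterate_sync(countdown(3))))  # prints [3, 2, 1]
```

A known limitation of this approach is that `run_until_complete` cannot be called from a thread whose event loop is already running, so a wrapper like this needs different handling inside an already-running loop.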
    @staticmethod
    async def get_submission_count_async(
        evaluation_id: str,
        status: Optional[str] = None,
Same comment as above: I think we should add validation to ensure that status is one of the allowed enum values.
    """Protocol defining the synchronous interface for SubmissionBundle operations."""

    @classmethod
    def get_evaluation_submission_bundles(
The following code works for me:
async def list_submission_bundle():
    print("Starting to fetch bundles...")
    bundles = []
    try:
        async for bundle in SubmissionBundle.get_evaluation_submission_bundles_async(
            evaluation_id=evaluation.id,
            status="SCORED",  # Add status filter
            synapse_client=syn
        ):
            print(f"Got a bundle: {bundle.submission.id if bundle.submission else 'None'}")
            bundles.append(bundle)
    except Exception as e:
        print(f"Error occurred: {e}")
        import traceback
        traceback.print_exc()
    print(f"Found {len(bundles)} submission bundles")
    return bundles
With the status filter in place, no submission bundles are printed. But if I remove status="SCORED", the code hangs again.
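When chasing a hang like this, it can help to turn an indefinite wait into an explicit error. The following is a debugging sketch only, using a stalling generator as a stand-in for the real call; `collect_with_timeout` and `stalls_after_one` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, List, TypeVar

T = TypeVar("T")


async def _next_item(iterator: AsyncIterator[T]) -> T:
    """Await exactly one item from the async iterator."""
    return await iterator.__anext__()


async def collect_with_timeout(source, per_item_timeout: float) -> List[T]:
    """Collect every item, raising asyncio.TimeoutError if any single item stalls."""
    iterator = source.__aiter__()
    items: List[T] = []
    while True:
        try:
            item = await asyncio.wait_for(_next_item(iterator), timeout=per_item_timeout)
        except StopAsyncIteration:
            return items
        items.append(item)


async def stalls_after_one():
    yield "first"
    await asyncio.sleep(3600)  # simulates a request that never returns
    yield "never reached"


async def main() -> str:
    try:
        await collect_with_timeout(stalls_after_one(), per_item_timeout=0.05)
    except asyncio.TimeoutError:
        return "timed out"
    return "completed"


result = asyncio.run(main())
print(result)  # prints timed out
```

Applied to the real async generator, a timeout plus the resulting traceback would at least show which await the call is stuck on.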
…OOP model (#1252) * tdd initial tests * initial intro of dataclasses * expose api services for submission object * style and update docstring * add submission and submissionstatus models * add submission status retrieval and update methods; remove empty submissionstatus file * pipe query params directly into restAPI httpx requests * new dataclass object submission_bundle * move submission services functions to evaluation_services.py * renaming imports, to_synapse_request, request body refactor * patching up store method signature * update docs * new suite of tests * submissionstatus rework as a mutable object * bug fix for Statuses: updated to_synapse_request to follow same pattern as evaluations design * replace != with is not for full object comparison (not just keys) * expose the is_private arg for to_submission_status_annotations ONLY FOR submission annotations * fixed submission status/submission annotations * add support for legacy annotations * remove debug prints * get_all_submission_statuses now returns a list of substat objects * docstring updates * update submissionbundle docstrings, add more examples * initial sync test for status. moved evaluation_id retrieval to fill_from_dict for submissionstatus calls * update submissionBundle submissionstatus with evaluation_id * patch sync substatus integ tests. style. * fix submissionStatus integ tests and has_changed attribute * new substatus async integ tests. can_cancel can now be modified by an organizer on the client. cancel request returns no response body. * new test class for submission cancel functionality * substatus async unit tests * remove compare=false for some attributes. 
update sync unit tests * add submissionBundle integration tests * add submissionBundle unit tests * remove unnecessary imports and add style * get_evaluation_submissions returns generator object * get_user_submissions returns generator object * submissionBundle methods return generators * address final todo: implement docker_tag, and note in docs that version_number can be ignored for docker submissions * add async page in api references * add initial submission, status, bundle docs * updated tutorial purpose. add api reference to navbar * new tutorial scripts * remove try/excepts. add line references. add resources and source code sections. * add reference to File model * deprecate old submission model * style * add submission status and bundle to references * no need to import Submission for submission_organizer tutorial * status should be CLOSED not CANCELLED. add Submission import back (its for the delete step). * set tutorial global vars to None * fix line references * remove limits * add 2 more examples for Submission docstring * use the same evaluation_id in all examples * elaborate on submissions, statuses, bundles, and their relationship to evaluations * remove AcessControllable from being inherited
Problem:
This work is part of the ongoing effort to refactor the Synapse Python Client using object-oriented principles, following the pattern established in https://sagebionetworks.jira.com/browse/SYNPY-1418.
We currently do not have an OOP model for communications to the Evaluation API services provided by the Synapse platform.
Solution:
The work has been separated into 3 feature branches, with the *main branch being intended for merge into develop and the other 2 branches merging into *main when ready.
The acceptance criteria have been split across these 3 branches in the following way.
Branch: synpy-1589-evaluation-model-main
Branch: synpy-1590-submission-model-functionality
Evaluation API Design Summary
The data model of the Evaluation API is built around two primary objects:
The data model includes additional objects to support scoring of Submissions and convenient data access:
Source: https://rest-docs.synapse.org/rest/#org.sagebionetworks.repo.web.controller.EvaluationController
The design for the Python client tools for communicating with this API will take into consideration the Evaluation API design summary above. This implementation will look like so:
evaluation_services.py
Submission operations:
Create submission: https://rest-docs.synapse.org/rest/POST/evaluation/submission.html
Get submission: https://rest-docs.synapse.org/rest/GET/evaluation/submission/subId.html
Get submissions for evaluation: https://rest-docs.synapse.org/rest/GET/evaluation/evalId/submission/all.html
Get user submissions: https://rest-docs.synapse.org/rest/GET/evaluation/evalId/submission.html
Get submission count: https://rest-docs.synapse.org/rest/GET/evaluation/evalId/submission/count.html
Delete submission: https://rest-docs.synapse.org/rest/DELETE/evaluation/submission/subId.html
Cancel submission: https://rest-docs.synapse.org/rest/PUT/evaluation/submission/subId/cancellation.html
SubmissionStatus operations:
Get submission status: https://rest-docs.synapse.org/rest/GET/evaluation/submission/subId/status.html
Update submission status: https://rest-docs.synapse.org/rest/PUT/evaluation/submission/subId/status.html
Get all submission statuses: https://rest-docs.synapse.org/rest/GET/evaluation/evalId/submission/status/all.html
Batch update statuses: https://rest-docs.synapse.org/rest/PUT/evaluation/evalId/statusBatch.html
Bundle operations:
Get submission bundles: https://rest-docs.synapse.org/rest/GET/evaluation/evalId/submission/bundle/all.html
Get user submission bundles: https://rest-docs.synapse.org/rest/GET/evaluation/evalId/submission/bundle.html
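The documentation URLs listed above encode both the HTTP method and the route. As a reading aid, the sketch below derives a method and path template from such a URL. `endpoint_from_doc_url` is a hypothetical helper, and treating every path segment that ends in `Id` as a placeholder is an assumption made for illustration, not a documented rule.

```python
from typing import Tuple
from urllib.parse import urlparse


def endpoint_from_doc_url(doc_url: str) -> Tuple[str, str]:
    """Derive (HTTP method, path template) from a rest-docs page URL."""
    parts = urlparse(doc_url).path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "rest" or not parts[-1].endswith(".html"):
        raise ValueError(f"Unrecognized documentation URL: {doc_url!r}")
    method = parts[1]
    segments = parts[2:]
    segments[-1] = segments[-1][: -len(".html")]
    # Assumption: segments ending in "Id" (evalId, subId) are path parameters.
    templated = ["{" + s + "}" if s.endswith("Id") else s for s in segments]
    return method, "/" + "/".join(templated)


print(endpoint_from_doc_url(
    "https://rest-docs.synapse.org/rest/GET/evaluation/evalId/submission/all.html"
))  # prints ('GET', '/evaluation/{evalId}/submission/all')
```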
*.py files for discoverability:
SubmissionBundle will inherit Submission and SubmissionStatus
Branch: synpy-1590-submission-model-docs
Testing: