-
Notifications
You must be signed in to change notification settings - Fork 26
[SCHEMATIC-183] Update tests - Use magic mock and add parentId #1554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
6ea80d8
98552d3
040cc01
8f454ff
6a0383c
498af2b
98d345b
c4676cb
8dd2d09
0ee2fd2
0d24776
a4a5dba
b8360cb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,12 +19,10 @@ | |
| import numpy as np | ||
| import pandas as pd | ||
| import synapseclient | ||
| import synapseutils | ||
| from opentelemetry import trace | ||
| from synapseclient import Annotations as OldAnnotations | ||
| from synapseclient import ( | ||
| Column, | ||
| Entity, | ||
| EntityViewSchema, | ||
| EntityViewType, | ||
| File, | ||
|
|
@@ -705,13 +703,16 @@ def getFilesInStorageDataset( | |
| ValueError: Dataset ID not found. | ||
| """ | ||
| file_list = [] | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Work with Bryan to see the difference in speeds between dev branch and prod branch within signoz.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is necessary to filter out difference between branches in gh runs. It's possible to get the data now, just a bit more difficult to filter it out. As of now the average duration of this function plotted over time:
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Feel free to pull develop into this feature branch and we can compare the develop branch to this feature branch performance.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks @BryanFauble. Done
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @thomasyu888 These are the results over the past 5 days. We can see your branch has a much better average execution time for this function. Although, similar to We can selected on a few fields to filter for what we want, perform an average of the duration for the function, then group by a few fields to get these results. |
||
|
|
||
| # HACK: must requery the fileview to get new files, since SynapseStorage will query the last state | ||
| # of the fileview which may not contain any new folders in the fileview. | ||
| # This is a workaround to fileviews not always containing the latest information | ||
| # self.query_fileview(force_requery=True) | ||
| # Get path to dataset folder by using childern to avoid cases where the dataset is the scope of the view | ||
| if self.storageFileviewTable.empty: | ||
| raise ValueError( | ||
| f"Fileview {self.storageFileview} is empty, please check the table and the provided synID and try again." | ||
| ) | ||
|
|
||
| # Get path to dataset folder by using children to avoid cases where the dataset is the scope of the view | ||
| child_path = self.storageFileviewTable.loc[ | ||
| self.storageFileviewTable["parentId"] == datasetId, "path" | ||
| ] | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -400,38 +400,31 @@ def test_mock_get_files_in_storage_dataset( | |
| with patch( | ||
| "schematic.store.synapse.CONFIG", return_value=TEST_CONFIG | ||
| ) as mock_config: | ||
| with patch.object(synapse_store, "syn") as mock_synapse_client: | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is me misleading Gianna, but there are times when we want a connection with Synapse and there are times we don't. This is probably a case where it is ok for there to a direct connection with Synapse to simplify the mocking - since you've included in your other |
||
| # AND the appropriate asset view | ||
| mock_config.synapse_master_fileview_id = asset_view | ||
| # AND the appropriately filtered filenames | ||
| if filenames: | ||
| files_to_remove = [] | ||
| for f in expected_files: | ||
| retain = False | ||
| for name in filenames: | ||
| if name in f[1]: | ||
| retain = True | ||
| if not retain: | ||
| files_to_remove.append(f) | ||
| # AND the appropriate asset view | ||
| mock_config.synapse_master_fileview_id = asset_view | ||
| # AND the appropriately filtered filenames | ||
| if filenames: | ||
| files_to_remove = [] | ||
| for f in expected_files: | ||
| retain = False | ||
| for name in filenames: | ||
| if name in f[1]: | ||
| retain = True | ||
| if not retain: | ||
| files_to_remove.append(f) | ||
|
|
||
| for file in files_to_remove: | ||
| expected_files.remove(file) | ||
| for file in files_to_remove: | ||
| expected_files.remove(file) | ||
|
|
||
| mock_table_dataFrame = pd.DataFrame( | ||
| expected_files, columns=["id", "path"] | ||
| ) | ||
| mock_table = build_table("Mock Table", "syn123", mock_table_dataFrame) | ||
| mock_synapse_client.tableQuery.return_value = mock_table | ||
|
|
||
| # WHEN getFilesInStorageDataset is called for the given dataset | ||
| dataset_files = synapse_store.getFilesInStorageDataset( | ||
| datasetId=dataset_id, fileNames=filenames | ||
| ) | ||
| # WHEN getFilesInStorageDataset is called for the given dataset | ||
| dataset_files = synapse_store.getFilesInStorageDataset( | ||
| datasetId=dataset_id, fileNames=filenames | ||
| ) | ||
|
|
||
| # THEN the expected files are returned | ||
| # AND there are no unexpected files | ||
| assert dataset_files == expected_files | ||
| # AND the (synId, path) order is correct | ||
| synapse_id_regex = re_compile(SYN_ID_REGEX) | ||
| if dataset_files: | ||
| assert synapse_id_regex.fullmatch(dataset_files[0][0]) | ||
| # THEN the expected files are returned | ||
| # AND there are no unexpected files | ||
| assert dataset_files == expected_files | ||
| # AND the (synId, path) order is correct | ||
| synapse_id_regex = re_compile(SYN_ID_REGEX) | ||
| if dataset_files: | ||
| assert synapse_id_regex.fullmatch(dataset_files[0][0]) | ||




There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're using vscode, you may see unused imports "grayed" out. It's ok to remove them in PRs, the intent is that each PR that we contribute should put the codebase in a better place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pylint would also catch this, but this module hasn't been gone through and fixed so that is passes, so it isn't included in the pylint check.