-
Notifications
You must be signed in to change notification settings - Fork 1
📝 Add quickstart notebook/tutorial #48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #48 +/- ##
=======================================
Coverage 83.63% 83.63%
=======================================
Files 8 8
Lines 605 605
=======================================
Hits 506 506
Misses 99 99 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks very good
docs/notebooks/example.ipynb
Outdated
" \"threading.max_workers\": 5,\n", | ||
" \"codec_pipeline.path\": \"zarrs.ZarrsCodecPipeline\",\n", | ||
" \"concurrency\": 4,\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either don't set the max_workers
and concurrency
or make it os.cpu_count
dependent. I just wouldn't set it tbh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although if you're going to have this big section, then maybe you can explain the settings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just double checked this, for me it doesn't give that much of a speed increase. Would be fine with removing it based on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also fine removing this yea
docs/notebooks/example.ipynb
Outdated
"# Download an example dataset from CELLxGENE\n", | ||
"!wget https://datasets.cellxgene.cziscience.com/866d7d5e-436b-4dbd-b7c1-7696487d452e.h5ad" | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should do two datasets? It's easy enough and highlights that we can handle the var_space
docs/notebooks/example.ipynb
Outdated
" \"866d7d5e-436b-4dbd-b7c1-7696487d452e.h5ad\",\n", | ||
" ],\n", | ||
" # Path to store the output collection\n", | ||
" output_path=\"tahoe100_FULL\",\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A different name?
docs/notebooks/example.ipynb
Outdated
"metadata": {}, | ||
"cell_type": "markdown", | ||
"source": [ | ||
"IMPORTANT:\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "IMPORTANT" should at least be bold
"* The `ZarrSparseDataset` yields batches of sparse tensors.\n", | ||
"* The conversion to dense tensors should be done on the GPU, as shown in the example below.\n", | ||
" * First call `.cuda()` and then `.to_dense()`\n", | ||
" * E.g. `x = x.cuda().to_dense()`\n", | ||
" * This is significantly faster than doing the dense conversion on the CPU.\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe mention preload_to_gpu
here - i.e., if you have a GPU and can spare some extra memory, you should use preload_to_gpu
and then you don't need to use .cuda()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the preload_to_gpu
option. Would leave the .cuda()
call. That might just confuse the user, and if everything is already correct the .cuda()
call doesn't do anything.
From the torch documentation:
"If this object is already in CUDA memory and on the correct device, then no copy is performed and the original object is returned."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"If this object is already in CUDA memory and on the correct device, then no copy is performed and the original object is returned."
Oh rad, then I would have been ok leaving out preload_to_gpu
but now that this is moving in the direction of a guide not on the README.md but on the docs, the extra detail is good then
docs/notebooks/example.ipynb
Outdated
"source": [ | ||
"The conversion code will take care of the following things:\n", | ||
"* Align the gene spaces across all datasets listed in `adata_paths`\n", | ||
" * The gene spaces are aligned based on the gene names provided in the `var_names` field of the individual `AnnData` objects.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's specifically an outer join, though - "aligned" is ambiguous
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With these changes, looks good. Probably want to make sure it renders right with the changes
docs/notebooks/example.ipynb
Outdated
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from arrayloaders import create_anndata_collection\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update imports :)))
Co-authored-by: Ilan Gold <[email protected]>
Co-authored-by: Ilan Gold <[email protected]>
Co-authored-by: Ilan Gold <[email protected]>
Co-authored-by: Ilan Gold <[email protected]>
Co-authored-by: Ilan Gold <[email protected]>
Co-authored-by: Ilan Gold <[email protected]>
Co-authored-by: Ilan Gold <[email protected]>
Co-authored-by: Ilan Gold <[email protected]>
Co-authored-by: Ilan Gold <[email protected]>
Co-authored-by: Ilan Gold <[email protected]>
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
7e3d9ea
to
ef2545c
Compare
That's a draft for a quickstart tutorial notebook. The idea was to walk the user through all steps needed to use the package, eg. from creating the collection to actually using the dataloader + explain the important settings to the user.
Let me know what you think and whether I missed something