Skip to content

anndata_dask_array.ipynb: copy tweaks / improvements#19

Open
ryan-williams wants to merge 3 commits into
scverse:mainfrom
ryan-williams:copy
Open

anndata_dask_array.ipynb: copy tweaks / improvements#19
ryan-williams wants to merge 3 commits into
scverse:mainfrom
ryan-williams:copy

Conversation

@ryan-williams
Copy link
Copy Markdown

@ryan-williams ryan-williams commented Oct 30, 2024

anndata_dask_array.ipynb

  • Seed da.random/np.random ⟹ allow regenerating the notebook deterministically.
  • Copy tweaks: some typos, some more opinionated rephrasing, couple places described things that didn't match the cell outputs (e.g. claiming a slice result was eager when it wasn't), couple places I added a cell and some copy.
  • Kernel name: python3.bakpython3

Dockerfile / regenerate.sh

Regenerate notebook files deterministically (in Docker / using juq to clean notebooks / canonicalize outputs), e.g.:

./regenerate.sh anndata_dask_array

I've used it here on anndata_dask_array.ipynb, but not any other notebooks.

@review-notebook-app
Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@ryan-williams ryan-williams force-pushed the copy branch 2 times, most recently from c400939 to 1eafd86 Compare October 30, 2024 21:31
@flying-sheep
Copy link
Copy Markdown
Member

Hi, thank you very much, this is a great idea. We really need to standardize a reproducibility story here, thank you for getting started here. I don‘t think pip install will install the same versions without pip-timemachine or a lockfile, so this is not completely reproducible. It would be amazing though if we found a way to get pixel-perfect graphics that don‘t generate a diff every time.

One thing about your solution: The font rendering is kind of ugly. Can you check if installing ipycytoscape instead of graphviz (and maybe also if setting dask’s config visualization.engine to it) improves this?

Comment thread regenerate.sh Outdated
@flying-sheep
Copy link
Copy Markdown
Member

Hia, I don’t know if you saw my comments above: Could you please take a look?

- build a Python 3.11.8 / Ubuntu Docker image
- mount in `$PWD`
- execute notebook(s)
- clean results (remove execution/timing metadata, canonicalize outputs)
@ryan-williams
Copy link
Copy Markdown
Author

ryan-williams commented Feb 7, 2025

Sorry, I finally got back to this.

One thing about your solution: The font rendering is kind of ugly. Can you check if installing ipycytoscape instead of graphviz (and maybe also if setting dask’s config visualization.engine to it) improves this?

ipycytoscape embeds "widgets" in the notebook, which render as placeholder text when I reload the notebook:

ipycytoscape widget text output

I'm assuming they also won't render when compiled into a docsite either. lmk if I missed something there.

On the Graphviz side, Dask's to_graphviz hard-codes fontname="helvetica", but it renders differently for me on macOS vs. Ubuntu (and slightly differently still in an Ubuntu Docker image on an Ubuntu host, which I used in this PR). The existing images seem to match what I see when I run it on my Macbook, and both Ubuntu versions have "uglier" fonts. Here are samples:

macOS

mac

Ubuntu Docker (this PR)

dkr

Ubuntu

ubuntu

Let me know how you want to proceed. I was also thinking I should add an example of reading from Dask (based on Scanpy's Dask tutorial, that @ivirshup pointed me at)

"One-liner" I used for extracting images (for reference, incl. my own)
r=copy  # Git ref
f=anndata_dask_array.ipynb  # notebook path
git show $r:$f \
| jq -r '.cells[] | (.outputs // [])[].data.["image/png"] | select(.)' \
| head -n1 \
| base64 -d \
> $r.png

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants