@vincerubinetti
Member

@vincerubinetti vincerubinetti commented Mar 6, 2025

  • switch to html-to-image (better maintained fork of dom-to-image than dom-to-image-more)
  • make basic MSA component
    • takes multiple string sequences, with labels
    • assigns each character in a sequence a particular color based on a "type" (provided by the consumer or auto-assigned per unique character)
    • combined summary row at top that shows percentage breakdown of chars in column
    • ability to wrap to new rows at N number of chars
    • print and raw export options
  • debounce network component resizing that was slightly slowing down page
  • when clicking number box inc/dec buttons, keep button under mouse
  • split up testbed page code. split each section into its own component, making it easier to re-arrange and comment/uncomment them. split fake data generation functions into separate file.
  • change color palette yet again. go back to hand-picked tailwind colors.
  • tweak some util funcs
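
To make the wrapping and auto-coloring items above concrete, here is a minimal sketch. The `Track` shape and both function names are hypothetical (the component's real props are not shown in this thread), and cycling through the palette when it runs out is an assumption, not necessarily what the component does.

```ts
// Hypothetical shape for one labeled sequence; not the component's actual props.
type Track = { label: string; sequence: string };

/** Split every sequence into consecutive chunks of at most `wrap` characters. */
function wrapTracks(tracks: Track[], wrap: number): Track[][] {
  if (!Number.isInteger(wrap) || wrap < 1) {
    throw new RangeError("wrap must be a positive integer");
  }
  const longest = Math.max(0, ...tracks.map((track) => track.sequence.length));
  const wrappedRows: Track[][] = [];
  for (let start = 0; start < longest; start += wrap) {
    // Shorter sequences simply yield an empty chunk once they run out.
    wrappedRows.push(
      tracks.map((track) => ({
        label: track.label,
        sequence: track.sequence.slice(start, start + wrap),
      })),
    );
  }
  return wrappedRows;
}

/** Give each unique character a palette color, in order of first appearance. */
function assignColors(tracks: Track[], palette: string[]): Record<string, string> {
  const colors: Record<string, string> = {};
  if (palette.length === 0) return colors;
  let assigned = 0;
  for (const track of tracks) {
    for (const char of track.sequence) {
      if (colors[char] !== undefined) continue;
      // Assumption: cycle through the palette when there are more characters than colors.
      colors[char] = palette[assigned % palette.length] ?? "";
      assigned++;
    }
  }
  return colors;
}
```

Each inner array returned by `wrapTracks` corresponds to one wrapped block of rows in the viz.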

I'll shortly be adding JPEG/PNG export to the MSA viz as well. I won't be adding SVG export because of the way the viz is currently constructed: it uses several separate SVGs. I could reconstruct the whole viz as one large SVG, which would have the following tradeoffs: allows a single SVG download, more straightforward but much more verbose positioning of elements, less flexible and harder to change the layout, and not easily made responsive to screen width.

@netlify

netlify bot commented Mar 6, 2025

Deploy Preview for molevolvr ready!

Name Link
🔨 Latest commit 2823b68
🔍 Latest deploy log https://app.netlify.com/sites/molevolvr/deploys/67feea60ed317800086c80c9
😎 Deploy Preview https://deploy-preview-50--molevolvr.netlify.app

@vincerubinetti vincerubinetti requested review from epbrenner, falquaddoomi and jananiravi and removed request for falquaddoomi March 7, 2025 18:21
Contributor

@falquaddoomi falquaddoomi left a comment

The graphic looks great IMO! I tried out the following features:

  • Keyboard navigation: I can tab through each panel and scroll them with the arrow keys as expected.
  • Wrapping: seems to work as expected, with additional rows being added as the wrap width is decreased. The window does indeed remain scrolled so that the mouse is under the button you're clicking, even as the height of the page is changing, which is pretty cool. 🙂
  • Width Toggle: works as expected, and IMO a nice quality-of-life feature.
  • Exports:
    • the PNG and JPEG exports appear to export the image as it appears in the page. One thing I noticed is that the PNG and JPEG previews appear to show the current view, but the panel in the resulting image is scrolled to 0, rather than showing the portion of the panel where the user had scrolled. (More thoughts in [1].)
    • The TSV export works well, although if it's not a standard format you might consider adding a header column.
    • The PDF export works, and IMO the result looks great.

[1] I don't know how big of a problem this is, if at all; I also don't know if there's a way to get the package you're using to manipulate the scroll position in the exported image. If there isn't, you could consider setting the wrap prior to export to a value for which the contents of the panel won't overflow and require scrolling.

@vincerubinetti
Member Author

although if it's not a standard format you might consider adding a header column.

I think in this case column headers aren't super useful? After the first column, it'd just be the sequence position. I guess I'd leave that call up to Janani & Co. It would just look something like:

Name,1,2,3,4,5,6,7,8,9,10,
Firmicutes A,A,G,T,C,C,T,A,T,G,
Firmicutes B,A,G,T,C,C,T,A,T,G,

Maybe it could be more useful if we include the top "combined" row to the tsv export. Not sure how to format that... just choose the most common character in each column? show all the %s like A 70%\nG 30% (or comma separated).
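
For reference, a rough sketch of the "show all the %s" option. `columnBreakdown` and `formatBreakdown` are hypothetical names, and rounding to whole percentages is an assumption; the combined row in the component may compute things differently.

```ts
/** Percentage of each character in one column, most common first. */
function columnBreakdown(sequences: string[], column: number): [string, number][] {
  const counts: Record<string, number> = {};
  let total = 0;
  for (const sequence of sequences) {
    const char = sequence.charAt(column);
    if (char === "") continue; // this sequence is shorter than the column index
    counts[char] = (counts[char] ?? 0) + 1;
    total++;
  }
  return Object.keys(counts)
    .map((char): [string, number] => [char, (100 * (counts[char] ?? 0)) / total])
    // Sort by percentage descending; break ties alphabetically for stable output.
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}

/** Render a breakdown like "A 67%, G 33%" (or newline-separated). */
function formatBreakdown(breakdown: [string, number][], separator = ", "): string {
  return breakdown
    .map(([char, percent]) => `${char} ${Math.round(percent)}%`)
    .join(separator);
}
```

The first entry of `columnBreakdown` is also the "most common character" option, so both formats could come from the same helper.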

@vincerubinetti
Member Author

Regarding the jpg/png export issue:

I didn't notice that scroll issue. I would've assumed that the library would preserve whatever was on screen. It looks like this is a known issue; see https://github.com/bubkoo/html-to-image/issues?q=is%3Aissue%20state%3Aopen%20scroll%20position for the open reports. I'm not sure the library could even fix it, though, as I think the problem lies in the browser's internal painting of the elements to canvas. One workaround: instead of bringing a particular element into view with the browser's normal overflow scrolling, you'd (temporarily) hard-code a fixed offset position for everything in the container with CSS, as described in some of those issues. That's very dirty, though, and I'd like to avoid it.

Also, my thought is that if a user wants something to be visible in the png and were going to manually scroll to it anyway, they could also manually set the wrap point to ensure it's visible without a scrollbar at all.

Another option is to temporarily automatically set the wrap point ourselves right before the user exports as png, then reset it afterward. This is what I'm doing with the print option. But in the case of printing, we can usually expect the screen size to be a standard US letter, pretty wide. But with the png, it depends on the user's window width. I'll try experimenting with this. An exact calculation of the wrap point would be complex, but I could either loosely base it on window width, or iteratively reduce it until there's no scrollbars.

@vincerubinetti
Member Author

could either loosely base it on window width, or iteratively reduce it until there's no scrollbars.

The iterative approach can work, but it causes a visible lag because you need to decrease the wrap, wait for it to render, check whether a scrollbar is there, decrease again, etc. The rough tuning seems to work well enough, though when CSS on the page changes (widths of stuff) in the future, a maintainer will have to remember to re-tune it.
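
The two strategies compared above could be sketched roughly as follows. All names and parameters are hypothetical, and the numeric constants a caller passes in (label width, per-character width) are exactly the values that would need re-tuning when the page CSS changes. The `overflows` callback is synchronous here for simplicity; in the browser each check would have to wait for a render, which is where the lag comes from.

```ts
/** Rough tuning: characters that fit in the width left over after the label column. */
function estimateWrap(
  availableWidth: number,
  labelWidth: number,
  charWidth: number,
  minWrap = 10,
): number {
  const fit = Math.floor((availableWidth - labelWidth) / charWidth);
  return Math.max(minWrap, fit);
}

/**
 * Iterative alternative: reduce the wrap until `overflows` reports no scrollbar.
 * `overflows` stands in for "apply wrap, render, measure" in the real page.
 */
function shrinkUntilFits(
  startWrap: number,
  overflows: (wrap: number) => boolean,
  step = 5,
  minWrap = 10,
): number {
  let wrap = startWrap;
  while (wrap > minWrap && overflows(wrap)) {
    wrap = Math.max(minWrap, wrap - step);
  }
  return wrap;
}
```

The estimate costs a single calculation, while the iterative version costs one render per step, so a hybrid (estimate first, then at most a few iterative corrections) would also be conceivable.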

@vincerubinetti
Member Author

@falquaddoomi good catch, the latest commit should improve the spacing there (but still not perfect because it's just a rough tuning).

@epbrenner

if the version without the header is an already-accepted format

I have no idea about this. @jananiravi @epbrenner, is there a particular format that's most standard or helpful when downloading as tsv? See my comment above

People going to download MSAs are usually expecting one of a few possible plaintext formats. These are usually Clustal, FASTA/Pearson, and sometimes NEXUS. If somebody wants to download the MSA not in a graphical format, they'll probably want it in a format like that so they can use it in other software downstream (like building their own custom phylogenetic trees).

The raw input MSA that goes into this is the simplest plaintext output to provide a user, but if they want the Combined consensus sequence, that's something not present in that original. Maybe the options for plaintext/tsv download could be "Original MSA" in the original upstream format MolEvolvR produces before visualization, and then "Consensus sequence" just as a FASTA file like:

>ConsensusSeq_$QueryName_$FiltersApplied
MWYFAWILTSTLWARE[...]

And this separate file would contain only the combined consensus sequence as a regular FASTA format sequence and no other sequences. At least that's what I think may be the most broadly applicable, but I'd like thoughts from @jananiravi too!
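
A minimal sketch of that "Consensus sequence" download, under some loudly stated assumptions: the consensus is taken as the most common character per column, gaps are counted like any other character, and ties go to whichever character reached the top count first. The function names are hypothetical; only the header pattern comes from the comment above.

```ts
/** Most common character in each column; ties go to the first character to reach the top count. */
function consensusSequence(sequences: string[], gap = "-"): string {
  const columns = Math.max(0, ...sequences.map((sequence) => sequence.length));
  let consensus = "";
  for (let column = 0; column < columns; column++) {
    const counts: Record<string, number> = {};
    let best = gap;
    let bestCount = 0;
    for (const sequence of sequences) {
      const char = sequence.charAt(column);
      if (char === "") continue; // sequence ends before this column
      const count = (counts[char] ?? 0) + 1;
      counts[char] = count;
      if (count > bestCount) {
        best = char;
        bestCount = count;
      }
    }
    consensus += best;
  }
  return consensus;
}

/** Single-record FASTA file using the header pattern suggested above. */
function consensusFasta(sequences: string[], queryName: string, filtersApplied: string): string {
  const header = `>ConsensusSeq_${queryName}_${filtersApplied}`;
  return `${header}\n${consensusSequence(sequences)}\n`;
}
```

Whether gaps should be excluded from the vote, or whether a threshold should apply before a column gets a consensus character, is a domain decision this sketch does not settle.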

@falquaddoomi
Contributor

falquaddoomi commented Mar 12, 2025

@falquaddoomi good catch, the latest commit should improve the spacing there (but still not perfect because it's just a rough tuning).

@vincerubinetti: Your latest change seems to have fixed it for me!

I haven't noticed much roughness myself, but IMO whoever's downloading the PNG/JPEG format can edit to their liking, if their intent is to use it for a publication or something. If they want a more predictable output format they can download it as a PDF, and for a more flexible format there's the TSV.

@falquaddoomi falquaddoomi self-requested a review March 12, 2025 22:03
@vincerubinetti
Member Author

@falquaddoomi After you approved, I noticed some critical bugs whose fixes I wanted to include, and I also ended up adding a legend and the clustal coloring scheme that was requested. I'm also still waiting to hear back about what TSV download format is desired, which I'll add to this PR as well.

@epbrenner Could you take a look at the code in msa-clustal.ts to verify that I've interpreted the table correctly? And please spot check that the coloring is right on the testbed, where I've hooked up the clustal theme by default.

@jananiravi
Member

I think in this case column headers aren't super useful? After the first column, it'd just be the sequence position. I guess I'd leave that call up to Janani & Co. It would just look something like:

Name,1,2,3,4,5,6,7,8,9,10,
Firmicutes A,A,G,T,C,C,T,A,T,G,
Firmicutes B,A,G,T,C,C,T,A,T,G,

Maybe it could be more useful if we include the top "combined" row to the tsv export. Not sure how to format that... just choose the most common character in each column? show all the %s like A 70%\nG 30% (or comma separated).

The MSA would be in a fasta format -- it will work later w/ any of these tools: https://www.ebi.ac.uk/jdispatcher/msa

Reg. @falquaddoomi's comment:

Well, I was just thinking it'd be "Name\tSequence", just so it's identified somehow; no need to create columns per position in my opinion. That said, if the version without the header is an already-accepted format, let's stick with that.

I agree that for a simple tsv, this should suffice
Name\tAccNum\tSequence

Reg. @epbrenner's comment:
yes, the most common use case is exporting the MSA in FASTA format so they can use it with a different tool downstream.

Was the MSA coloring part of a different PR or resolved comments above? I know Evan had some thoughts on those -- can't find that anymore.

@vincerubinetti
Member Author

Was the MSA coloring part of a different PR or resolved comments above? I know Evan had some thoughts on those -- can't find that anymore.

If you're referring to the clustal coloring, that is implemented and in the preview.

I agree that for a simple tsv, this should suffice
Name\tAccNum\tSequence

I've just made changes to download in this format. You can put label/name, sequence, and any additional field (not just accnumber) as a column in the download.
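
As an illustration of that format (not the code in this PR), a TSV writer with a header row and caller-chosen columns might look like the sketch below. `toTsv` is a hypothetical name, and replacing tabs and newlines inside values with spaces is an assumption about how to keep rows intact.

```ts
/** Build a TSV string with a header row; `columns` picks and orders the fields. */
function toTsv(records: Record<string, string>[], columns: string[]): string {
  // Assumption: tabs and line breaks inside values are replaced so rows stay intact.
  const clean = (value: string): string => value.replace(/[\t\r\n]+/g, " ");
  const lines = [columns.map(clean).join("\t")];
  for (const record of records) {
    lines.push(columns.map((column) => clean(record[column] ?? "")).join("\t"));
  }
  return lines.join("\n") + "\n";
}
```

For example, calling it with `["Name", "AccNum", "Sequence"]` gives the `Name\tAccNum\tSequence` header discussed above, and any extra field can be added by appending its key to `columns`.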

@falquaddoomi
Contributor

Hey @vincerubinetti, as @jananiravi mentioned above, in our prior meeting @epbrenner mentioned that having FASTA as an additional download type could be useful for downstream analyses. Sorry that I'm just mentioning this now, and hopefully it's not too hard to add.

FASTA is a text format in which each sequence is preceded by a "header" line. The header starts with ">", and the lines that follow it are the sequence for that header. Multiple sequences can be included in one file, each initiated by a header line. AFAIK it's common for a FASTA header to start with the accession number, but I'm unaware of specifics beyond that; maybe @jananiravi and @epbrenner can provide some advice.

In our case, it might look like

> esse velit nulla deserunt eu ut eu laborum nostrud
APLTMWOCAIUOTATCIRDOUEXEMFZRKDNKOLPGEYMBLSGNTIKQPFIADONYWUHFXJKQOZHXMDJVPTJZCKYW
PICOMLUFMBNJATTRIHMTRCWXSLBUUYFQFQOKXPVKOPQKALGWRFOFOFQWKJEHXIGENMNAIXYQAVELZXDU
WDZLTKXINRUNDVCBMXNAUIVGFVFZCHTSKSFVTQWUUMSKNPXUSSKLHOXYIZCTXOTZRJKVINROJKQDCOWQ
TPLBVOZOIRJJJRPVLOJBQMYXKRXGMOLRQNGFZCHCCOPJDIMAEJLJVHZHESWVLEHVFQEDOJOLXGEWPCIG
OPOELDZNTPATYUDIIYBAIRIMYMUQMDXKKCGVOPFFBQHCOCEAAPPAASCNNCCMCJQMYVGGZCTCZMRUNFIF
BGCPWXPTONMBXKJZOORFVZVBJKZFOBPSLMERPWXYWHLMQJOLQKSNKDOCJMPSPFPELJCCRNYYEEMSOSZR
JHKWZOFKZLGIYOMNVNDLEPCTCXDRRTBVUSMAFGWIINPZUSNZEFECWQESCBQKPLETADUTHMWVNUJFTYPJ
KMQTNGCGNHSJQBUMIZUVHTMQYZDMFRPXDNMIKHQHKTCLBINOLXLXRHVJQMKULDEVSEBJOOGSSSTUBDMP
RALSPAYWXFMYEPLOQLAKFNHSTPGKLKZCXGZIRANMAKRWYATIKBYVPAZXZRZITHOEMXJZYEQFKDUPWPBB
NNHNTMEESPFV

> nostrud nisi tempor amet incididunt qui est est est in
APLJMWOCAOUUTWRCIYDLUEXEMVZRGDNKWLBGESMBLSGNTIKQPFIADONYWUHFXUKQOZHXMDJVPTJACJYW
PICOMLULMXNJATTRIHMMRCEXSLBUUYFQFZOKXPVKOPQKALGWRFOFOFQWKJEHXIGENMPAIXYQASELZXDU
WDZLTKXINPUNDVCOBMXNAUIMGAVFZCHTSHSFVTQWUUMSKNPXUSSNLHOXYWECTXQTZZJKVINROIAQDCOW
QTPIBVOGOIPOJJRPVLOJBQMYXKEXGMOVRQNGFNCHCCPJZIMAEJLJVHZCEOWVLESVFQEDOJOLXGEWCIGO
POELDZNTPBTYUDIIYBAIRIMYMUQGBXGKCGVOPFFBQHCMCEAAPPAASCNNCSRMCJQMYVGBZCTCEMPGNFIF
BGPBOPTFVMBXKJZOORFVZVBJKZFOBPSLWERPWXPIHLMQJOLOKSZKDOCFMJSPFTELJCCRNYYEEMQOSZOJ
QKWZOFKZLGYYOMNJNONEPCTCXDRRTBVUSMEFGWLINPZUSNFEFECWEINBQMPLITADUTHMOVNUJZTYPJKM
QTNGCGNHSJRBULIZUVHTHYQYZDMFRPXDIMIKHQHKOCLBDNOLNLXRHNJMMQULDEVSEBJLOGSSTTUBDMQR
LSOYWXFMYEPLOWQLAKFNHSTYGKLKZCXMZICAIMABRWYATIKFYVPAZXZRZITOEMXJZYWQKDUPWFPVNHNT
MEESPFV
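
A minimal sketch of a serializer producing output shaped like the example above. `FastaRecord` and `toFasta` are hypothetical names; wrapping at 80 characters mirrors the example lines, and writing `>` immediately followed by the header text (no space) is an assumption about what downstream tools prefer.

```ts
// Hypothetical record shape: one header line plus its sequence.
type FastaRecord = { header: string; sequence: string };

/** Serialize records as FASTA, wrapping sequence lines at `lineWidth` characters. */
function toFasta(records: FastaRecord[], lineWidth = 80): string {
  if (!Number.isInteger(lineWidth) || lineWidth < 1) {
    throw new RangeError("lineWidth must be a positive integer");
  }
  const lines: string[] = [];
  for (const record of records) {
    // A header must stay on one line, so collapse any line breaks in it.
    lines.push(">" + record.header.replace(/[\r\n]+/g, " ").trim());
    for (let start = 0; start < record.sequence.length; start += lineWidth) {
      lines.push(record.sequence.slice(start, start + lineWidth));
    }
  }
  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}
```

If accession numbers are available, they could be placed at the start of `header` by the caller; the serializer itself does not need to know about them.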

@vincerubinetti
Member Author

I believe I've implemented the requests above, though I would like someone to take a look at the clustal coloring logic to make sure I've interpreted it correctly.

@epbrenner

Sorry again to keep you waiting on this @vincerubinetti! This looks good and ready to go. Will approve shortly.

The only (non-blocker) quirk I really see is when downloading the PDF in Dark Mode, the MSA goes to a white background (which I think is fine and a reasonable expected behavior), but it still retains the black background on amino acid positions that don't have a coloring assigned. Since you're using non-standard characters like J, this behavior isn't really an issue for them since you'll never see those in a biological sequence, but for gaps (-), these won't have properties so they'll have that black background normally. We could keep that, but I also think just going with the light mode white background for gaps would work well too.

tl;dr: The gap coloring is usually a lack of coloring because there's nothing there. The PDF gaps from a dark mode download are black against a white page background. That could be switched to white. Low priority.

Screenshot 2025-04-15 at 3 19 02 PM

@epbrenner epbrenner left a comment

Downloads and coloring logic all check out to me. Minor dark/light mode PDF download issue left in comments won't block merge. @ me if you'd like me to make an issue out of it to keep track of it for later!

@vincerubinetti
Member Author

Regarding the print background, I've changed it so it will respect the dark mode black background. But since we're just using the browser's built-in print dialog for this, the user will still have to allow background images/colors to be drawn. In Chrome, you have to manually check this option under "More settings", and there is no way we (via JavaScript) can check it for them:

Screenshot 2025-04-15 at 7 23 47 PM

@vincerubinetti vincerubinetti merged commit c7430e9 into main Apr 15, 2025
4 checks passed
@vincerubinetti vincerubinetti deleted the msa branch April 15, 2025 23:26
@vincerubinetti vincerubinetti mentioned this pull request Apr 15, 2025
vincerubinetti added a commit that referenced this pull request Jun 12, 2025
Unfortunately this became a big PR because everything here was
intertwined, and I wanted to (hopefully) solve all these related issues
permanently. So, the diff is very big and difficult to review, and the
individual commits won't be very clean or helpful. Instead I will mark
certain parts of the code to look at. And know that converting the old
SVGs to use the new chart wrapper component involved a lot of
copy-pasting large blocks of code. Most importantly, please re-test
every visualization component on the Netlify preview.

At a high level, here are the changes:

- implement upset plot
- make generic chart wrapper component that handles most things
automatically. auto-fits view box to content. SVG units match DOM units
1:1. see discussion below for how i landed on this particular design.
- abstract out download button with various download formats as its own
component. integrate into chart component.
- chart text can be truncated automatically by actual rendered width
(not number of chars)
- include chart titles in every chart
- include analysis id (or anything else we want) as part of every
filename download. add fake analysis id to testbed page.
- turn the legend component into fully SVG instead of DOM elements.
implement simple wrapping strategy.
- make MSA and IPR charts fully SVG, with no DOM elements
- MSA now auto-wraps by default, and accurately (instead of loose
estimates). when printing,
- make charts resizable
- add SVG group elements where appropriate for easier editing in vector
software
- consistently put settings vars and Props type at the top of each
component file for quick reference
- util func simplifications and refactors
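
To illustrate the "truncated by actual rendered width" item, here is a
minimal sketch, not the code in this commit. `truncateToWidth` is a
hypothetical name and `measure` is an injected stand-in for however the
rendered width is obtained (in a browser that could be something like
canvas `measureText`). It assumes width never shrinks as a prefix gets
longer.

```ts
/** Longest prefix of `text` that, with an ellipsis appended, fits in `maxWidth`. */
function truncateToWidth(
  text: string,
  maxWidth: number,
  measure: (candidate: string) => number,
  ellipsis = "…",
): string {
  if (measure(text) <= maxWidth) return text;
  // Binary search over prefix length; relies on width growing with length.
  let low = 0;
  let high = text.length;
  while (low < high) {
    const middle = Math.ceil((low + high) / 2);
    if (measure(text.slice(0, middle) + ellipsis) <= maxWidth) {
      low = middle;
    } else {
      high = middle - 1;
    }
  }
  // If not even one character fits, only the ellipsis is returned.
  return text.slice(0, low) + ellipsis;
}
```

Binary search keeps the number of measurements logarithmic in the text
length, which matters if each measurement touches the DOM.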

---

If you poke through the commit history, you can see some of the many
complicated approaches I experimented with for handling sizing and
positioning in the SVGs.

I wanted to scale the SVG shapes up/down based on the available width
but keep the font size matching the document's. This was already sort of
implemented before this PR, where the viz components made repeated use
of a useSvgTransform hook. But it was verbose, so I first attempted to
extract it out to a single hook, and then a component. It required
iteratively content-fitting and font-sizing back and forth, which would
(usually) converge to a stable configuration. But it was very difficult
to find a foolproof way to prevent infinite re-renders or weird edge
cases. I also tried leveraging simply defining font size in the root of
the SVG that matched the document's, then specifying positions/sizes in
terms of `em` units, which the browser could use synchronously with no
JS. But there are some limitations to this, like not being able to use
relative units in things like `transform`s, and some weird exceptions
that caused runaway calculations like when an `em`-sized element
exceeded the available width of the container. Trying to implement
automatic text truncation made this even more complicated.

The MSA & IPR viz's also required a "fill remaining space" feature, i.e.
have a column of labels on the left of a fixed width, and use all the
remaining space on the right for the tracks/sequences. Another small
problem to solve was that `getBBox`, the method used for fitting the
view box to the SVG contents, doesn't work with `clip-path`'ed elements,
which is needed for IPR's zoom area.
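
For context, fitting a view box to content amounts to taking the union
of the content's bounding boxes. The sketch below works on plain box
objects with the same fields `getBBox` returns (`x`, `y`, `width`,
`height`); `fitViewBox` and the `padding` parameter are hypothetical,
and this is not necessarily how the chart wrapper handles the
`clip-path` case.

```ts
// Same fields as the rectangle returned by SVG's getBBox().
type Box = { x: number; y: number; width: number; height: number };

/** Union of all boxes, expanded by `padding`, as a `viewBox` attribute value. */
function fitViewBox(boxes: Box[], padding = 0): string {
  if (boxes.length === 0) return "0 0 0 0";
  const left = Math.min(...boxes.map((box) => box.x)) - padding;
  const top = Math.min(...boxes.map((box) => box.y)) - padding;
  const right = Math.max(...boxes.map((box) => box.x + box.width)) + padding;
  const bottom = Math.max(...boxes.map((box) => box.y + box.height)) + padding;
  return [left, top, right - left, bottom - top].join(" ");
}
```

Because it takes plain objects, a caller could substitute a manually
computed box for any element whose measured box is unreliable.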

There's much more that I tried or thought about and didn't commit, which
has already left my brain, but I think it's not that valuable.

What I landed on keeps things much simpler. The SVG units always match
the DOM units, no exception (hopefully). If there isn't enough room on
the page to show the SVG's content, instead of trying to shrink
dynamically, it will just overflow and show scrollbars. Not only does
this avoid needing iterative fitting, it's probably better UX wise since
graphical elements stay in the same position and size -- instead of
shrinking to an unreadable size -- and you just need to scroll over to
see it all. Also this makes the testbed page much more performant. The
"fill remaining" feature is accomplished by providing the available
width to the consuming component, and letting it use it however it
wants.
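
A rough sketch of that "fill remaining" idea, with hypothetical names
and a hypothetical minimum track width: the consumer receives the
available width, gives the label column a fixed width, and hands the
rest to the tracks. When there is not enough room, the content keeps
its size and overflows rather than shrinking.

```ts
type ColumnLayout = {
  labelWidth: number;
  trackWidth: number;
  totalWidth: number;
  overflows: boolean; // true when the chart is wider than the space available
};

/** Fixed-width label column on the left; tracks take whatever width remains. */
function layoutColumns(
  availableWidth: number,
  labelWidth: number,
  minTrackWidth: number,
): ColumnLayout {
  // Never shrink the tracks below the minimum; overflow and scroll instead.
  const trackWidth = Math.max(minTrackWidth, availableWidth - labelWidth);
  const totalWidth = labelWidth + trackWidth;
  return { labelWidth, trackWidth, totalWidth, overflows: totalWidth > availableWidth };
}
```

Since SVG units match DOM units 1:1 in this design, the returned widths
can be used directly as SVG coordinates without a scale factor.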

---

Regarding the MSA auto-wrapping, currently the code is basically:

```ts
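setWrap(40); // temporarily force a fixed wrap point
print(); // open the browser's print dialog
setWrap(oldWrap); // restore the previous wrap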
```

This is just a loose, hard-coded wrapping point that would generally not
overflow the page width when printing portrait mode, US letter.

This PR doesn't change our inability to look inside the print dialog, as
discussed in #50 and [this
comment](#52 (comment)).
But I've taken out the hard-coded `40`. I figure now with the resize
handle available on every chart, the user can pre-size the chart to
whatever they want before opening the print dialog.

I wanted to mention this because, if you look at the preview on a normal
desktop or laptop screen width and have your print dialog on defaults,
you'll usually see the MSA overflow the page, and I'm aware of it.

5 participants