@alesan99 (Contributor) commented Jun 11, 2025

Fixes #6521, Fixes #4445, Fixes #6403, Fixes #6947

This PR makes the backend fetch all the attachments in a record set, so the "Download All" button in the gallery dialog no longer relies on the attachments already fetched by the frontend. This is somewhat related to #4445, but fetching all the attachments on the frontend isn't a good idea anyway (this PR does fix the loading, though!).

These are the changes:

  • The backend will fetch all the attachments in a record set, no longer relying on the frontend to load them all.
  • The backend will stream attachments.zip to the frontend as it's being created, meaning large attachment downloads are now possible without storage usage or request timeouts (uses a new dependency: https://pypi.org/project/zipstream-new/).
  • Scrolling in the attachment gallery on a record set should reliably load more attachments (this was broken before).
  • Scrolling in the attachment gallery when browsing in forms on query results should reliably load more attachments (this was broken before).
  • A dialog telling you to create a record set should appear if you attempt to download all attachments when browsing in forms before all the attachments are done loading.
  • If not all the attachments have finished loading, the dialog shows the count with a plus sign, e.g. (123+). Once all attachments are loaded, it displays the exact number, e.g. (123).
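
To illustrate the streaming idea in the second bullet, here is a minimal sketch. It is not the PR's code: the PR uses zipstream-new, while this sketch uses only the standard-library `zipfile` module writing into a non-seekable sink, and the names `_ChunkSink` and `stream_zip` are hypothetical. The point it shows is that archive bytes can be handed to the client as soon as they are produced, so the full zip never has to exist on disk or in memory.

```python
import io
import zipfile


class _ChunkSink(io.RawIOBase):
    """Write-only, non-seekable sink (hypothetical helper) that collects
    the bytes zipfile writes so they can be drained and sent on."""

    def __init__(self):
        super().__init__()
        self._chunks = []

    def writable(self):
        return True

    def write(self, data):
        self._chunks.append(bytes(data))
        return len(data)

    def drain(self):
        chunks, self._chunks = self._chunks, []
        return b"".join(chunks)


def stream_zip(files):
    """Yield zip bytes incrementally.

    `files` is an iterable of (archive_name, iterable_of_byte_chunks);
    this input shape is an assumption made for the sketch.
    """
    sink = _ChunkSink()
    with zipfile.ZipFile(sink, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, chunks in files:
            with archive.open(name, mode="w") as entry:
                for chunk in chunks:
                    entry.write(chunk)
                    data = sink.drain()
                    if data:
                        yield data
            # Closing the entry writes its data descriptor.
            data = sink.drain()
            if data:
                yield data
    # Closing the archive writes the central directory.
    data = sink.drain()
    if data:
        yield data
```

In a Django-based backend, a generator like this could plausibly be passed to `StreamingHttpResponse`, which accepts an iterator of byte chunks; how the PR actually wires up the response is not shown in this conversation.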

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add relevant issue to release milestone
  • Add PR to documentation list
  • Add automated tests
  • Add a reverse migration if a migration is present in the PR

Testing instructions

  • Create a Query on a table with a lot of attachments.
  • Example: [screenshot of an example query]
  • Run the query (make note of how many results there are) and create a record set
  • Click on the attachment gallery button on the record set
    [screenshot of the attachment gallery button]
  • You should see that the number displayed in the title is capped at 40 and does not reflect the actual number of attachments.
  • Verify the "Download All" button actually downloaded all the record set's attachments.
  • You should be able to scroll down and load more record set attachments.
  • Go back to the Query
  • Click "Browse in forms" and click on the gallery button.
  • You should see that the number of attachments is still off. It's probably not 40, but it's likely still off from the actual count.
  • Verify the "Download All" button prompts you to create a record set from the query if not all the attachments were loaded
  • You should be able to scroll down and load more query attachments.
  • Filter the query so it only results in a few records
  • Make sure "Download All" downloads all of the query's attachments.

alesan99 added 2 commits June 11, 2025 14:13
Fix missing parameter
Triggered by 28d620c on branch refs/heads/issue-6521
@alesan99 alesan99 marked this pull request as ready for review June 17, 2025 17:46
@alesan99 (Contributor Author) commented:

@specify/dev-testing Here's something I want to draw attention to:
Large record sets were represented with an array of the ids of loaded records, padded with undefined to stand for the unloaded records: [1, 2, 3, undef, undef, undef].
Large queries were represented with an array that only had the loaded record ids: [1, 2, 3].

This meant that the true number of records was misrepresented on large queries. (Also, though expected, the undefined values weren't entirely handled correctly by the attachment gallery.)

I fixed this by both padding the ids in queries and sending the true number of records to the attachment gallery (i.e. not relying on the length of the array).
Either solution works on its own, so they aren't both needed, but I left both in to ensure record sets and queries are consistent.

Should I leave in the padding? Large record sets pad their arrays already and it seems to work fine, but it still feels weird to have potentially massive arrays.
From my testing, it works okay, but if you have strong feelings one way or the other, let me know.
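
The two representations described above can be sketched as follows. This is only an illustration: the real code lives in the frontend, `None` stands in for `undefined`, and `pad_ids` and `loaded_count` are hypothetical helper names.

```python
from typing import List, Optional


def pad_ids(loaded_ids: List[int], total_count: int) -> List[Optional[int]]:
    """Pad the loaded ids with None so the list length equals total_count."""
    missing = max(0, total_count - len(loaded_ids))
    return list(loaded_ids) + [None] * missing


def loaded_count(ids: List[Optional[int]]) -> int:
    """Count only the records that have actually been loaded."""
    return sum(1 for record_id in ids if record_id is not None)


# Record-set style: the length reflects the true total, gaps are None.
padded = pad_ids([1, 2, 3], total_count=6)

# Old query style: the length only reflects what has been loaded so far,
# so the true total has to be passed along separately.
unpadded = [1, 2, 3]
true_total = 6
```

With padding, `len(padded)` gives the true total at the cost of a potentially large list; with a separate count, the list stays small but every consumer has to remember not to trust its length.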

@alesan99 alesan99 requested review from a team June 17, 2025 18:10
@alesan99 alesan99 changed the title from "Fetch attachment urls on the backend when download from record sets" to "Fetch attachments on the backend when downloading from record sets" Jun 17, 2025
@emenslin (Collaborator) left a comment

  • Verify the "Download All" button actually downloaded all the record set's attachments.
  • You should be able to scroll down and load more record set attachments.
  • Verify the "Download All" button prompts you to create a record set from the query if not all the attachments were loaded
  • You should be able to scroll down and load more query attachments.
  • Make sure "Download All" downloads all of the query's attachments.

So everything does work; however, downloading attachments takes forever even if there are not that many. I was testing on KUfish and most downloads seemed to take about 2 minutes, but on KUBirds a record set with about the same number of attachments would be done in 10 seconds. I'm not sure if this is just a KUfish problem or if it happens on more databases, but either way I think it should be looked into more.
Link to KUfish record set: https://kufish20250214-issue-6521.test.specifysystems.org/specify/view/collectionobject/35816/?recordsetid=214

Link to KUBirds record set: https://kubirds20240606-issue-6521.test.specifysystems.org/specify/view/collectionobject/129670/?recordsetid=231

Also, I found a problem on ojsmnh: when I query for attachments and open the attachment gallery, it shows the wrong number of attachments. I'm not sure if this happens on other databases or not. It might have something to do with downloading attachments as well, but I'm not sure.

07-08_10.33.mp4

@github-project-automation github-project-automation bot moved this from 📋Back Log to Dev Attention Needed in General Tester Board Jul 8, 2025
@alesan99 (Contributor Author) commented Jul 9, 2025

> So everything does work; however, downloading attachments takes forever even if there are not that many.

@emenslin Hm, I have no idea what could be causing this, but I optimized the backend asset server requests and queries to hopefully improve it.
Additionally, the zip files now get downloaded as they're being created, which should reduce download times greatly AND should allow for long downloads without experiencing a timeout error @pashiav

The PR is ready to test again with all issues resolved 👍👍 (except for the one below vv )

> Also, I found a problem on ojsmnh: when I query for attachments and open the attachment gallery, it shows the wrong number of attachments. I'm not sure if this happens on other databases or not. It might have something to do with downloading attachments as well, but I'm not sure.

I recreated this on main so it seems to be an existing issue. If a record appears more than once in the query results it will appear multiple times in "browse in forms", so its attachments will be multiplied when viewed in the gallery. While I could remove duplicate records from the gallery, I think this should be written up as a separate issue to also address records showing up multiple times in browse in forms.
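
For reference, removing duplicate records from the gallery while keeping the query's order could look roughly like the sketch below. This is a hypothetical helper, not something this PR implements, and it only covers the gallery side of the problem.

```python
def unique_record_ids(record_ids):
    """Drop repeated ids while preserving first-seen order.

    dict.fromkeys keeps insertion order, so the first occurrence of
    each id determines its position in the result.
    """
    return list(dict.fromkeys(record_ids))


# A record that appears twice in the query results is kept once,
# so its attachments would be listed once in the gallery.
deduplicated = unique_record_ids([5, 7, 5, 9, 7])
```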

chrome_mc54BJxI3y.mp4

https://kubirds20240606-main.test.specifysystems.org/specify/query/76/

@emenslin (Collaborator) left a comment

  • Verify the "Download All" button actually downloaded all the record set's attachments.
  • You should be able to scroll down and load more record set attachments.
  • Verify the "Download All" button prompts you to create a record set from the query if not all the attachments were loaded
  • You should be able to scroll down and load more query attachments.
  • Make sure "Download All" downloads all of the query's attachments.

Looks good. The only issue I ran into was when trying to download a large number of attachments (<1000): I would either get a "Failed to fetch" error or, if the download succeeded, the zip file would be invalid when I tried to open it. I'm going to approve because it seemed to work consistently with up to 500 attachments (not sure what the limit is for when it stops working consistently) and I don't know if users are ever going to want to download more than that. I just wanted to make note of this behavior.

Downloading attachments in KUfish is still slow but faster than before (the same record set is now done in about a minute and a half instead of two minutes).

@alesan99 (Contributor Author) commented Aug 5, 2025

I pushed a fix to correctly download files with paths in the name, like "C:/path/to/file.jpg". I believe such names could sometimes result in corrupted zip files.

All attachment names will now be sanitized to be filename-safe.
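
A minimal sketch of what such sanitizing might look like is below. The exact rules used in the PR are not shown in this conversation, so the function name `sanitize_filename`, the set of replaced characters, and the fallback name are all assumptions.

```python
import re


def sanitize_filename(name: str, fallback: str = "attachment") -> str:
    """Return a name that is safe to use as a single file name in a zip."""
    # Drop any directory part, whether written with "/" or "\".
    base = re.split(r"[\\/]", name)[-1]
    # Replace characters that are unsafe in file names on common platforms.
    base = re.sub(r'[<>:"|?*\x00-\x1f]', "_", base)
    # Leading/trailing dots and spaces cause trouble on some systems.
    base = base.strip(" .")
    return base or fallback


safe_name = sanitize_filename("C:/path/to/file.jpg")
```

Here `safe_name` is "file.jpg". A real implementation would probably also need to handle two attachments that sanitize to the same name, which this sketch does not attempt.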

@pashiav (Contributor) left a comment

  • Verify the "Download All" button actually downloaded all the record set's attachments.
  • You should be able to scroll down and load more record set attachments.
  • Verify the "Download All" button prompts you to create a record set from the query if not all the attachments were loaded
  • You should be able to scroll down and load more query attachments.
  • Make sure "Download All" downloads all of the query's attachments.

Looks good!

Note: I also get Failed to Fetch messages (same behavior as #6625 (review)) with large downloads.

@alesan99 alesan99 enabled auto-merge (squash) September 29, 2025 15:45
@alesan99 alesan99 merged commit 2d542aa into main Sep 29, 2025
14 checks passed
@alesan99 alesan99 deleted the issue-6521 branch September 29, 2025 15:46
@github-project-automation github-project-automation bot moved this from Dev Attention Needed to ✅Done in General Tester Board Sep 29, 2025