-
Notifications
You must be signed in to change notification settings - Fork 25
3427 relax import list limit #3739
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Replace text_field with text_area for the inat_ids input to prepare for accepting longer lists of iNaturalist observation IDs. This provides better UX for entering large comma-separated ID lists. - Update form to use f.text_area instead of f.text_field - Update controller test to expect textarea element - Remove size attribute (not applicable to textarea) - Maintain form-control class for consistent width Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Update validator to accept up to 9,984 characters in the inat_ids parameter, increasing capacity from ~28 IDs to ~900 IDs. The limit is based on Puma's default max query string length (10,240 chars) minus space reserved for other query parameters (~256 chars). - Update list_within_size_limits? validator from 255 to 9,984 chars - Add explanatory comments about calculation - Update test_create_too_many_ids_listed to validate new limit Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Allows user to specify around 900 ids to import in one job
This validation generated very slooow SQL queries for long lists of iNat ids. It's redundant of the iNat API query param without_field: "Mushroom Observer URL"; Pulk's python mirror script writes that field when it mirrors an MO Observation to iNat.
- Continues to import other listed ids after warning about previously imported ids - Removes a validator - Adds logic to controller to display flash
Actually changes the the max length of the string which lists the imports
A convenience to help deal with cut-and-pasted lists
After importing one page the tracker used current page `results` instead of `total_results`. It showed something like: `Imported 384 of 183 observations` I don't understand why that happened, but this fixes it.
- Shorten flash warning to show count instead of individual IDs - Add clean_inat_ids method to filter out previously imported IDs from `inat_ids` - Fix inat_id_list to return empty array when not listing IDs
|
|
||
| def clean_inat_ids | ||
| # clean trailing commas and whitespace | ||
| inat_ids = params[:inat_ids]&.sub(/[,\s]+\z/, "") |
Check failure
Code scanning / CodeQL
Polynomial regular expression used on uncontrolled data High
regular expression
user-provided value
This
regular expression
user-provided value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is low enough of a risk that it is not a blocker. However, I got this from CC:
⏺ Summary
Confirmed: Line 148 (and more critically line 97) use uncontrolled user input in regex operations, triggering a security warning.
Actual risk: Low
- The regex is not vulnerable to ReDoS
- Validation exists but happens after line 97
- Line 148 is called after validation passes
Recommended fix: Extract sanitization to a helper method that validates before using the regex:
def sanitize_inat_ids(ids)
return "" if ids.blank?
return "" if /[^\d ,]/.match?(ids) # Explicit validation
ids.sub(/[,\s]+\z/, "")
end
Then use sanitize_inat_ids(params[:inat_ids]) in both reload_form (line 97) and clean_inat_ids (line 148).
This satisfies the security scanner by ensuring validation happens at point-of-use, eliminates code duplication, and follows the principle of defense-in-depth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mo-nathan. 1. Thanks for the review.
2. The CC suggestion appears to be delusional.
But I don't understand how it solves the problem. Bad user input will still fall through to the 3rd line.
And the second line might make things worse. inat_ids should not end up having a number followed by a space followed by a comma.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR increases the maximum number of iNaturalist observation IDs that can be imported at once from ~10 to ~384, facilitating the import of NEMF 2025 iNat Observations to MO. The change primarily involves expanding the database field type and adjusting validation limits, while also refactoring the validation logic to allow previously imported observations to proceed rather than blocking the entire import.
Changes:
- Database schema updated to change
inat_idscolumn fromstringtotexttype - Validation logic refactored to warn about previous imports instead of blocking
- UI updated from text input to textarea to accommodate larger ID lists
- Bug fix for import tracker incorrectly using page observation count as total
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| db/migrate/20260114002432_change_inat_ids_to_text.rb | New migration to change inat_ids column from string to text |
| db/schema.rb | Updated schema with new migration version and text type for inat_ids |
| db/cache_schema.rb | Updated charset for solid_cache_entries table |
| app/controllers/inat_imports_controller/validators.rb | Added MAX_ID_LIST_SIZE constant, removed blocking validations for previous imports |
| app/controllers/inat_imports_controller.rb | Added warning for previous imports, added clean_inat_ids method to filter out previously imported IDs |
| app/views/controllers/inat_imports/new.html.erb | Changed input field to textarea |
| config/locales/en.txt | Updated user-facing messages for new import limit and behavior |
| test/controllers/inat_imports_controller_test.rb | Updated tests for textarea, added tests for max IDs and stripping, removed blocking tests |
| test/jobs/inat_import_job_test.rb | Clarified error message in test assertion |
| app/jobs/inat_import_job.rb | Fixed bug where importables used page count instead of total_results |
| app/classes/inat/page_parser.rb | Added explanatory comment about Mushroom Observer URL field |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Reverses #967a64d, hack to fix wrong status after 1st page The hack didn't work, so it's useless code. Also may make it hard to debug the underlying problem.
mo-nathan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to merge. I manually tested with 29 ids and it worked fine. Code looks good. I commented on the CodeQL with a possible fix created with Claude Code. That fix is not a blocker if you just want to move forward.
Allow user to enter up to 384 iNat ids to import at the same time.
seriesand secondary ranks above genus. The importer currently treats names at those ranks as Fungi.🔺 Optional Manual Testing 🔺
🔺 DANGER 🔺
Before manual testing, comment out
app/classes/inat/observation_importer.rbline 82# update_mushroom_observer_url_fieldIf you fail to do this, you will modify the iNat Observations (by adding an
Mushroom Observer URLobservation_field).I'm unaware of any way to repair that automatically.
The optional manual test is simply inputting many iNat id's into the appropriate field in the iNat Import form, then clicking Submit.
NOTE: It typically 6 seconds to import an observation locally. If you enter 384 id's you're looking at almost 40 minutes for the job to comple.