Skip to content

Conversation

@JoeCohen
Copy link
Member

@JoeCohen JoeCohen commented Jan 13, 2026

Allow user to enter up to 384 iNat ids to import at the same time.

  • The immediate goal is to facilitate importing NEMF 2025 iNat Observations to MO. This PR is sufficient to do that.
  • The PR does not address other existing issues relating to iNat imports, including:
    • Display of remaining time and observations is incorrect after importing 200 observations
    • MO does not support the rank series and secondary ranks above genus. The importer currently treats names at those ranks as Fungi.
    • For and Observation's Location, the importer uses the smallest existing MO Location encompassing the iNat error-expanded lat/lng.

🔺 Optional Manual Testing 🔺

🔺 DANGER 🔺
Before manual testing, comment out app/classes/inat/observation_importer.rb line 82

      # update_mushroom_observer_url_field

If you fail to do this, you will modify the iNat Observations (by adding an Mushroom Observer URL observation_field).
I'm unaware of any way to repair that automatically.

The optional manual test is simply inputting many iNat id's into the appropriate field in the iNat Import form, then clicking Submit.
NOTE: It typically 6 seconds to import an observation locally. If you enter 384 id's you're looking at almost 40 minutes for the job to comple.

JoeCohen and others added 3 commits January 13, 2026 13:23
Replace text_field with text_area for the inat_ids input to prepare
for accepting longer lists of iNaturalist observation IDs. This provides
better UX for entering large comma-separated ID lists.

- Update form to use f.text_area instead of f.text_field
- Update controller test to expect textarea element
- Remove size attribute (not applicable to textarea)
- Maintain form-control class for consistent width

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Update validator to accept up to 9,984 characters in the inat_ids
parameter, increasing capacity from ~28 IDs to ~900 IDs. The limit is
based on Puma's default max query string length (10,240 chars) minus
space reserved for other query parameters (~256 chars).

- Update list_within_size_limits? validator from 255 to 9,984 chars
- Add explanatory comments about calculation
- Update test_create_too_many_ids_listed to validate new limit

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@JoeCohen JoeCohen linked an issue Jan 13, 2026 that may be closed by this pull request
29 tasks
@coveralls
Copy link
Collaborator

coveralls commented Jan 13, 2026

Coverage Status

coverage: 95.661%. remained the same
when pulling 2be9712 on 3427-relax-import-list-limit
into c506e56 on main.

Allows user to specify around 900 ids to import in one job
This validation generated very slooow SQL queries for long lists of iNat ids.
It's redundant of the iNat API query param without_field: "Mushroom Observer URL";
Pulk's python mirror script writes that field when it mirrors an MO Observation to iNat.
- Continues to import other listed ids after  warning about previously imported ids
- Removes a validator
- Adds logic to controller to display flash
Actually changes the the max length of the string which lists the imports
A convenience to help deal with cut-and-pasted lists
After importing one page the tracker used current page `results` instead of `total_results`.
It showed something like:
`Imported 384 of 183 observations`
I don't understand why that happened, but this fixes it.
- Shorten flash warning to show count instead of individual IDs
- Add clean_inat_ids method to filter out previously imported IDs from `inat_ids`
- Fix inat_id_list to return empty array when not listing IDs

def clean_inat_ids
# clean trailing commas and whitespace
inat_ids = params[:inat_ids]&.sub(/[,\s]+\z/, "")

Check failure

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data High

This
regular expression
that depends on a
user-provided value
may run slow on strings with many repetitions of '\t'.
This
regular expression
that depends on a
user-provided value
may run slow on strings with many repetitions of '\t'.
Copy link
Member

@mo-nathan mo-nathan Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is low enough of a risk that it is not a blocker. However, I got this from CC:

⏺ Summary

Confirmed: Line 148 (and more critically line 97) use uncontrolled user input in regex operations, triggering a security warning.

Actual risk: Low

  • The regex is not vulnerable to ReDoS
  • Validation exists but happens after line 97
  • Line 148 is called after validation passes

Recommended fix: Extract sanitization to a helper method that validates before using the regex:

  def sanitize_inat_ids(ids)
    return "" if ids.blank?
    return "" if /[^\d ,]/.match?(ids)  # Explicit validation
    ids.sub(/[,\s]+\z/, "")
  end

Then use sanitize_inat_ids(params[:inat_ids]) in both reload_form (line 97) and clean_inat_ids (line 148).

This satisfies the security scanner by ensuring validation happens at point-of-use, eliminates code duplication, and follows the principle of defense-in-depth.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mo-nathan. 1. Thanks for the review.
2. The CC suggestion appears to be delusional.
But I don't understand how it solves the problem. Bad user input will still fall through to the 3rd line.
And the second line might make things worse. inat_ids should not end up having a number followed by a space followed by a comma.

@JoeCohen JoeCohen self-assigned this Jan 16, 2026
@JoeCohen JoeCohen added the iNat interaction with iNat label Jan 16, 2026
@JoeCohen JoeCohen marked this pull request as ready for review January 16, 2026 03:38
Copilot AI review requested due to automatic review settings January 16, 2026 03:38
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR increases the maximum number of iNaturalist observation IDs that can be imported at once from ~10 to ~384, facilitating the import of NEMF 2025 iNat Observations to MO. The change primarily involves expanding the database field type and adjusting validation limits, while also refactoring the validation logic to allow previously imported observations to proceed rather than blocking the entire import.

Changes:

  • Database schema updated to change inat_ids column from string to text type
  • Validation logic refactored to warn about previous imports instead of blocking
  • UI updated from text input to textarea to accommodate larger ID lists
  • Bug fix for import tracker incorrectly using page observation count as total

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
db/migrate/20260114002432_change_inat_ids_to_text.rb New migration to change inat_ids column from string to text
db/schema.rb Updated schema with new migration version and text type for inat_ids
db/cache_schema.rb Updated charset for solid_cache_entries table
app/controllers/inat_imports_controller/validators.rb Added MAX_ID_LIST_SIZE constant, removed blocking validations for previous imports
app/controllers/inat_imports_controller.rb Added warning for previous imports, added clean_inat_ids method to filter out previously imported IDs
app/views/controllers/inat_imports/new.html.erb Changed input field to textarea
config/locales/en.txt Updated user-facing messages for new import limit and behavior
test/controllers/inat_imports_controller_test.rb Updated tests for textarea, added tests for max IDs and stripping, removed blocking tests
test/jobs/inat_import_job_test.rb Clarified error message in test assertion
app/jobs/inat_import_job.rb Fixed bug where importables used page count instead of total_results
app/classes/inat/page_parser.rb Added explanatory comment about Mushroom Observer URL field

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@JoeCohen JoeCohen requested a review from mo-nathan January 16, 2026 03:49
Reverses #967a64d, hack to fix wrong status after 1st page
The hack didn't work, so it's useless code.
Also may make it hard to debug the underlying problem.
Copy link
Member

@mo-nathan mo-nathan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to merge. I manually tested with 29 ids and it worked fine. Code looks good. I commented on the CodeQL with a possible fix created with Claude Code. That fix is not a blocker if you just want to move forward.

@JoeCohen JoeCohen merged commit c1d3d3d into main Jan 16, 2026
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

iNat interaction with iNat

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Relax import list limit

4 participants