Collaborator

@ErykKul ErykKul commented Aug 14, 2025

What this PR does / why we need it:

  • Fix CORS to correctly handle multiple browser origins without proxy changes.
  • Echo the request Origin when it’s in the allowlist and add Vary: Origin so caches behave.
  • Sanitize CSV lists for methods/headers to avoid quoted/invalid values breaking preflight.
  • Standardize origin parsing to comma-separated; remain whitespace-tolerant.
  • Rely on JVM options/MicroProfile only for dataverse.cors.* config (deprecated DB fallback removed).
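
The echo behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual CorsFilter code: the class name `OriginEchoSketch` and the method `corsHeaders` are hypothetical, and the allowlist is assumed to be already split and trimmed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the origin-echo decision; not the actual CorsFilter code. */
final class OriginEchoSketch {

    /**
     * Returns the CORS response headers for one request.
     *
     * @param allowedOrigins parsed dataverse.cors.origin value (assumed already split and trimmed)
     * @param requestOrigin  value of the request's Origin header, or null when absent
     */
    static Map<String, String> corsHeaders(List<String> allowedOrigins, String requestOrigin) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (allowedOrigins.contains("*")) {
            // Wildcard mode: every caller gets the same value, so Vary: Origin is not added.
            headers.put("Access-Control-Allow-Origin", "*");
        } else if (requestOrigin != null && allowedOrigins.contains(requestOrigin)) {
            // Allowlist mode: echo exactly one origin and tell caches the response depends on it.
            headers.put("Access-Control-Allow-Origin", requestOrigin);
            headers.put("Vary", "Origin");
        }
        // No match: no CORS headers at all, so the browser blocks the cross-origin read.
        return headers;
    }

    public static void main(String[] args) {
        List<String> allowlist = List.of("https://app1.example", "https://app2.example");
        System.out.println(corsHeaders(allowlist, "https://app2.example"));
        System.out.println(corsHeaders(allowlist, "https://other.example"));
        System.out.println(corsHeaders(List.of("*"), "https://whatever.example"));
    }
}
```

Returning a single echoed origin rather than the whole list matters because Access-Control-Allow-Origin accepts only one origin (or "*"); a comma-separated list in that header is rejected by browsers.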

Which issue(s) this PR closes:

Special notes for your reviewer:

  • Wildcard "*" ACAO remains invalid with credentials; docs warn accordingly. Echo logic only applies when a concrete allowlist is configured.
  • Config keys used (JVM options/MicroProfile):
    • dataverse.cors.origin (comma-separated list or "*")
    • dataverse.cors.methods
    • dataverse.cors.headers.allow
    • dataverse.cors.headers.expose
  • Added Vary: Origin only when echoing a specific Origin. With "*", Vary is not added.
  • Unit tests cover wildcard, single/multiple origins, Vary behavior, and header sanitization.
  • An unrelated full-suite error (CacheFactoryBeanTest.init, cgroups) was observed locally; targeted CORS tests pass.

Suggestions on how to test this:

  1. Configure comma-separated origins via JVM options/MicroProfile, e.g.:

    • dataverse.cors.origin = https://app1.example, https://app2.example
    • dataverse.cors.methods = GET, POST, OPTIONS
    • dataverse.cors.headers.allow = Authorization, Content-Type
    • dataverse.cors.headers.expose = Content-Disposition
  2. From a client, send requests with different Origin headers and verify:

    • When Origin matches allowlist: Access-Control-Allow-Origin echoes that origin and response includes Vary: Origin.
    • When Origin doesn’t match: no ACAO header is returned.
    • With wildcard configured (dataverse.cors.origin = *): ACAO is "*"; note that credentialed requests will still fail per browser rules.
  3. Preflight check: Send OPTIONS with Origin + Access-Control-Request-Method. Verify Access-Control-Allow-Methods/Headers reflect sanitized config values.

  4. Automated tests: run CorsFilterTest (JUnit 5) to validate the above behaviors.

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

  • No UI changes.

Is there a release notes update needed for this change?:

  • Yes. Included: doc/release-notes/11744-cors-echo-origin-vary.md.

Additional documentation:

  • If helpful, I can add a short section to the installation/admin docs describing CORS configuration via JVM options/MicroProfile and the comma-separated origin convention.

Verification (Manual Tests)

Set DV_URL to your base URL and replace the example Origin values below with origins you configured.

Preflight (OPTIONS):

curl -i -X OPTIONS \
  -H "Origin: https://libis.github.io" \
  -H "Access-Control-Request-Method: GET" \
  "${DV_URL}/api/info/version"

Expect:

  • HTTP/1.1 200 (or appropriate success)
  • Access-Control-Allow-Origin: https://libis.github.io
  • Vary: Origin

Actual request:

curl -i \
  -H "Origin: https://libis.github.io" \
  "${DV_URL}/api/info/version"

Expect same echoed ACAO and Vary: Origin.

Multi-origin sanity check (second origin):

curl -i -H "Origin: https://gdcc.github.io" "${DV_URL}/api/info/version" | grep -i "Access-Control-Allow-Origin"

Should echo https://gdcc.github.io.

Wildcard mode check (after setting *):

curl -s -i -H "Origin: https://whatever.example" "${DV_URL}/api/info/version" | grep -i -E "^(Access-Control-Allow-Origin|Vary):"

Should show * (and no Vary: Origin).
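
If you want to script these checks rather than eyeball the curl output, the expectations above can be sketched as a small parser over the raw `curl -i` text. This is only an illustration under stated assumptions: `CorsResponseCheck` and its methods are hypothetical names, not part of the PR, and the parser assumes a single response with one header block.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for checking `curl -i` output; names are illustrative only. */
final class CorsResponseCheck {

    /** Parses the header block of a raw HTTP response into a case-insensitive map. */
    static Map<String, String> parseHeaders(String rawResponse) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        String[] lines = rawResponse.split("\r?\n");
        for (int i = 1; i < lines.length; i++) { // line 0 is the status line
            String line = lines[i];
            if (line.isBlank()) {
                break; // a blank line ends the header block
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                String name = line.substring(0, colon).trim();
                String value = line.substring(colon + 1).trim();
                // Repeated headers (e.g. two Vary lines) are merged into one comma-separated value.
                headers.merge(name, value, (a, b) -> a + ", " + b);
            }
        }
        return headers;
    }

    /** True when the Vary header lists Origin among its tokens. */
    static boolean variesOnOrigin(Map<String, String> headers) {
        for (String token : headers.getOrDefault("Vary", "").split(",")) {
            if (token.trim().equalsIgnoreCase("Origin")) {
                return true;
            }
        }
        return false;
    }

    /** Allowlist expectation: ACAO echoes expectedOrigin and Vary lists Origin. */
    static boolean echoesOrigin(String rawResponse, String expectedOrigin) {
        Map<String, String> headers = parseHeaders(rawResponse);
        return expectedOrigin.equals(headers.get("Access-Control-Allow-Origin"))
                && variesOnOrigin(headers);
    }

    /** Wildcard expectation: ACAO is "*" and Vary does not list Origin. */
    static boolean isWildcardMode(String rawResponse) {
        Map<String, String> headers = parseHeaders(rawResponse);
        return "*".equals(headers.get("Access-Control-Allow-Origin"))
                && !variesOnOrigin(headers);
    }
}
```

Header names are compared case-insensitively because HTTP/2 responses print them in lower case.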

…ists; prefer comma-separated origins; rely on JVM options/MicroProfile only; add tests and release notes
@github-actions github-actions bot added the Type: Bug (a defect) label Aug 14, 2025
@ErykKul ErykKul moved this to Ready for Review ⏩ in IQSS Dataverse Project Aug 14, 2025
@ErykKul ErykKul added the Consider For Next Release (a simple change, eg bug fix, that would be good to prioritize since it has been seen in the wild) and Original size: 3 labels Aug 14, 2025

@cmbz cmbz added the FY26 Sprint 4 (2025-08-13 - 2025-08-27) label Aug 16, 2025
@cmbz cmbz added the FY26 Sprint 5 (2025-08-27 - 2025-09-10) label Aug 28, 2025
@cmbz cmbz added the FY26 Sprint 6 (2025-09-10 - 2025-09-24) label Sep 14, 2025
Member

@qqmyers qqmyers left a comment

This looks okay to me w.r.t. the CORS changes themselves. I made one comment about the possibility of simplifying some of the parsing. The other request I'd make is to check in the Guides where CORS is mentioned and see if this changes anything, e.g. in the settings documentation (https://guides.dataverse.org/en/latest/installation/config.html#cors-settings and the settings list, where minimally the :AllowCors info should get deleted, but also the notes in the Big Data guide, etc.) There's more discussion of CORS at https://github.com/gdcc/dataverse-previewers/wiki/Using-Previewers-with-download-redirects-from-S3#if-you-are-interested-in-restricting-the-allowedorigins-and-allowedheaders - changes there wouldn't be part of this PR, but it would be good to get updated guidance for that for Dataverse v6.9+.

W.r.t. testing, I think there's a lot of useful guidance. I'd also suggest regression testing to make sure that things like direct up/download and previewers that rely on CORS still work after this change and after updating the config as described.

@qqmyers qqmyers moved this from Ready for Review ⏩ to In Review 🔎 in IQSS Dataverse Project Sep 23, 2025
@qqmyers qqmyers added this to the 6.9 milestone Sep 23, 2025
@cmbz cmbz added the FY26 Sprint 7 (2025-09-24 - 2025-10-08) label Sep 24, 2025
Collaborator Author

@ErykKul ErykKul left a comment

Short follow-up summary (reason for additional changes):

While addressing the CORS review feedback, we noticed scattered ad‑hoc regex CSV splits (",\s" / ",\s*") across core code (settings, PID providers, anonymized fields, workflow, tests). We replaced every remaining instance with the existing CsvUtil to unify trimming, quote cleanup, empty‑token removal, and ordering. Added small JvmSettings CSV helpers and refactored CORS + PID provider code to consume them. This is a low‑risk mechanical cleanup done now to prevent further drift, reduce future review overhead, and give us one choke point (CsvUtil) for any future parsing tweaks. Behaviour only changes for pathological inputs (extra commas/quotes) which are now safely normalized.
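
For readers unfamiliar with the normalization being described, here is a minimal sketch of that kind of splitting: trim each token, strip one pair of surrounding double quotes, drop empty tokens, and keep the original order. It is an assumption-laden illustration, not the code in this PR; `CsvSplitSketch` and `splitCsv` are hypothetical names and the exact quote handling of the real utility may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of CSV setting normalization; not the utility used in this PR. */
final class CsvSplitSketch {

    /** Splits a comma-separated setting into clean tokens, preserving their order. */
    static List<String> splitCsv(String raw) {
        List<String> tokens = new ArrayList<>();
        if (raw == null) {
            return tokens; // an unset setting yields an empty list
        }
        for (String part : raw.split(",")) {
            String token = part.trim();
            // Assumption: only one pair of surrounding double quotes is removed.
            if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
                token = token.substring(1, token.length() - 1).trim();
            }
            if (!token.isEmpty()) {
                tokens.add(token); // extra commas produce empty tokens, which are skipped
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(splitCsv(" \"GET\", POST,, OPTIONS ,"));
        System.out.println(splitCsv("https://app1.example, https://app2.example"));
    }
}
```

With a single helper like this, a value such as `"GET", POST,, OPTIONS` and a clean `GET, POST, OPTIONS` produce the same list, which is the "pathological inputs are normalized" behaviour mentioned above.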

@ErykKul ErykKul requested a review from qqmyers September 29, 2025 10:45
@coveralls

coveralls commented Sep 29, 2025

Coverage Status: changes unknown when pulling 83e4a10 on 11744-cors-echo-origin-vary into develop.

@ErykKul ErykKul assigned qqmyers and unassigned ErykKul Sep 29, 2025
@pdurbin pdurbin self-assigned this Sep 30, 2025
Member

@qqmyers qqmyers left a comment

This PR now contains two useful additions: the CORS changes and standardized parsing of CSV settings. I've left various comments about simplifying some code and removing references to the now unused :AllowCors setting.
I raised two questions in standup today:

  • are we OK with this as one PR (versus two)?
  • are there any concerns about stripping double quote marks from settings? @pdurbin has added a comment trying to get clarification about why @ErykKul thinks this is useful/needed.

My guess is that we can accept this as one PR to avoid extra work, so I think addressing the comments and making a group decision about whether double quotes should be stripped are the main actions.

@ErykKul ErykKul requested a review from Copilot October 14, 2025 17:21
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR modernizes CORS (Cross-Origin Resource Sharing) handling in Dataverse to support multiple browser origins without proxy changes. It implements proper origin echoing and caching behavior while standardizing CSV parsing across the codebase.

Key changes:

  • CORS now echoes the request Origin when it matches the allowlist and adds Vary: Origin for proper caching behavior
  • Introduces standardized comma-separated value parsing via ListSplitUtil across the entire codebase
  • Removes deprecated database-based CORS configuration in favor of JVM/MicroProfile settings only

Reviewed Changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/main/java/edu/harvard/iq/dataverse/filter/CorsFilter.java | Core CORS logic rewritten to echo origins and add Vary header |
| src/main/java/edu/harvard/iq/dataverse/util/ListSplitUtil.java | New utility for consistent CSV parsing across the application |
| src/main/java/edu/harvard/iq/dataverse/settings/JvmSettings.java | Added methods for CSV-aware setting lookups |
| src/test/java/edu/harvard/iq/dataverse/filter/CorsFilterTest.java | Comprehensive test coverage for new CORS behavior |
| Various implementation files | Updated to use standardized CSV parsing utility |
| Documentation files | Updated configuration guides and release notes |

Comments suppressed due to low confidence (1)

src/main/java/edu/harvard/iq/dataverse/settings/JvmSettings.java:1

  • Similar to the optional version, this method unnecessarily joins array values with commas only to split them again. Consider using Arrays.asList(values) or List.of(values) directly.
package edu.harvard.iq.dataverse.settings;

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:11744-cors-echo-origin-vary
ghcr.io/gdcc/configbaker:11744-cors-echo-origin-vary

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.

Labels

  • Consider For Next Release: A simple change (eg bug fix) that would be good to prioritize since it has been seen in the wild
  • FY26 Sprint 4 (2025-08-13 - 2025-08-27)
  • FY26 Sprint 5 (2025-08-27 - 2025-09-10)
  • FY26 Sprint 6 (2025-09-10 - 2025-09-24)
  • FY26 Sprint 7 (2025-09-24 - 2025-10-08)
  • FY26 Sprint 8 (2025-10-08 - 2025-10-22)
  • Original size: 3
  • Type: Bug (a defect)

Projects

Status: In Review 🔎

Development

Successfully merging this pull request may close these issues.

CORS: Multi-origin ACAO invalid — echo single Origin and set Vary: Origin

8 participants