Bug: Improve import/export service scalability for large exhibits #3643

@VanessaVenti

Description

Spotlight’s current import/export functionality does not scale well for large exhibits, particularly those with many items or uploaded images. This results in timeouts, crashes, and failed imports when JSON files become too large.

This issue combines two related concerns:

  1. Large JSON files (200MB+) cannot be imported (previously tracked in #2783, "import/export service: large JSON files cannot be imported")
  2. There is no way to export only CMS content (excluding exhibit items), which produces unnecessarily large export files (previously tracked in #3504, "Feature: Ability to export only the exhibit CMS content and not exhibit items")

Together, these issues make exhibit migration and backup workflows unreliable for larger sites.

Issue 1: Large JSON files cannot be imported

When an exhibit is exported and then re-imported using the Export/Import data buttons under General settings, importing a large JSON file (200MB+) crashes the application without logging any errors.

Investigation suggests the issue may stem from this code:

```ruby
def process_import
  if @exhibit.import(JSON.parse(import_exhibit_params.read)) && @exhibit.reindex_later(current_user)
    redirect_to spotlight.exhibit_dashboard_path(@exhibit), notice: t(:'helpers.submit.exhibit.updated', model: @exhibit.class.model_name.human.downcase)
  else
    render action: :import
  end
end
```

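import_exhibit_params.read loads the entire uploaded file into a single string, and JSON.parse then builds the whole parsed document (nested Ruby hashes, arrays, and strings) from it. Peak memory is therefore the raw file plus its parsed form, which can be several times larger than the file itself. If the file is large enough, the application may run out of memory and crash before logging any useful information.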
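For comparison, incremental parsing keeps memory bounded by the largest single record rather than by the whole file. JSON.parse works on a complete string, so incremental processing needs either a streaming parser or a different file layout. The minimal sketch below assumes a hypothetical JSON Lines export (one JSON object per line), which Spotlight does not currently produce; import_records_streaming and the record shape are illustrative names, not Spotlight APIs.

```ruby
require 'json'
require 'stringio'

# Hypothetical importer for a JSON Lines export (one JSON object per line).
# Only the current line and its parsed Hash are held in memory at a time, so
# peak usage depends on the largest single record, not the total file size.
def import_records_streaming(io)
  count = 0
  io.each_line.with_index(1) do |line, lineno|
    next if line.strip.empty? # tolerate blank lines such as a trailing newline

    begin
      record = JSON.parse(line)
    rescue JSON::ParserError => e
      # Report which line failed instead of aborting with no context.
      raise ArgumentError, "invalid JSON on line #{lineno}: #{e.message}"
    end

    yield record # hand each record to the caller (e.g. to save it)
    count += 1
  end
  count
end

# Usage: collect page titles from a two-record stream.
input = StringIO.new(%({"type":"page","title":"Home"}\n{"type":"page","title":"About"}\n))
titles = []
import_records_streaming(input) { |record| titles << record['title'] }
p titles # prints ["Home", "About"]
```

Switching formats would also change what the exporter writes, so this is only one option to weigh against a streaming parser for the existing single-document format.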

Questions to investigate:

  • Is this memory exhaustion the root cause?
  • Should large JSON files be processed in chunks or streamed instead?
  • Can better logging or error handling be added?
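On the logging question: if the operating system kills the process for using too much memory (on Linux, the OOM killer sends SIGKILL, which cannot be trapped), Ruby gets no chance to log anything, which would be consistent with the crash-without-logs symptom. A size check that runs before any reading or parsing can log and reject oversized uploads cleanly. This does not make large imports succeed; it only makes the failure visible. The sketch is plain Ruby, not Spotlight code: MAX_IMPORT_BYTES and its 100 MB value, safe_parse_import, and the result hash are hypothetical, and it assumes the upload responds to size and read, as File and StringIO do.

```ruby
require 'json'
require 'logger'
require 'stringio'

# Hypothetical limit; a real value would need tuning against available memory.
MAX_IMPORT_BYTES = 100 * 1024 * 1024

# Hypothetical helper: rejects oversized uploads before reading them, and turns
# malformed JSON into a logged, reportable error instead of an opaque failure.
# Assumes io responds to #size and #read, as File and StringIO do.
def safe_parse_import(io, logger:, max_bytes: MAX_IMPORT_BYTES)
  size = io.size
  if size > max_bytes
    logger.warn("Exhibit import rejected: #{size} bytes exceeds limit of #{max_bytes}")
    return { ok: false, error: :too_large }
  end

  { ok: true, data: JSON.parse(io.read) }
rescue JSON::ParserError => e
  logger.error("Exhibit import failed: invalid JSON (#{e.message})")
  { ok: false, error: :invalid_json }
end

# Usage: log to a StringIO so the messages can be inspected.
log = StringIO.new
logger = Logger.new(log)
p safe_parse_import(StringIO.new('{"title":"Home"}'), logger: logger)
p safe_parse_import(StringIO.new('x' * 20), logger: logger, max_bytes: 10)
```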

Issue 2: Need ability to export CMS-only content (exclude exhibit items)

Currently, an exhibit export includes all exhibit items, which for some exhibits means large numbers of records and uploaded images. For exhibits with many items:

  • Exports can time out
  • JSON files become too large to import
  • Migration workflows fail

Harvard encountered this when migrating from Spotlight 2 to 3 and had to manually recreate some exhibits because exports were too large to process.

There is a need for an option to:

  • Export only CMS content
  • Exclude harvested exhibit items from the export
  • Allow items to be reharvested separately after migration

This feature was carried over from the Spotlight Community Roadmap.

Proposed Improvements

  1. Improve import process to handle large JSON files more safely:
  • Stream or chunk JSON processing
  • Improve memory handling
  • Add meaningful error logging
  2. Add export options:
  • Full export (current behavior)
  • CMS-only export (no harvested items)
  • Potentially exclude uploaded images as an option
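The proposed export options could be modeled as named modes, each of which drops a set of top-level keys from the export hash before it is serialized. This is a minimal sketch only: the mode names, build_export, and especially the key names (resources, solr_document_sidecars, uploaded_images) are hypothetical and would need to be checked against the JSON Spotlight actually exports.

```ruby
require 'json'

# Hypothetical top-level export keys; the real names would need to be checked
# against the JSON Spotlight actually exports.
ITEM_KEYS = %w[resources solr_document_sidecars].freeze
UPLOAD_KEYS = %w[uploaded_images].freeze

# Each export mode maps to the top-level keys it leaves out.
EXPORT_MODES = {
  full: [],                                     # current behavior: everything
  cms_only: ITEM_KEYS,                          # no harvested items
  cms_only_no_uploads: ITEM_KEYS + UPLOAD_KEYS  # no items, no uploaded images
}.freeze

# Returns a copy of the export hash without the keys excluded by the mode.
def build_export(export_hash, mode: :full)
  excluded = EXPORT_MODES.fetch(mode) do
    raise ArgumentError, "unknown export mode: #{mode.inspect}"
  end
  export_hash.reject { |key, _value| excluded.include?(key) }
end

# Usage with made-up sample data.
exhibit = {
  'title' => 'Maps',
  'home_page' => { 'title' => 'Home' },
  'resources' => [{ 'url' => 'https://example.org/item/1' }],
  'uploaded_images' => [{ 'file' => 'map.jpg' }]
}
p build_export(exhibit, mode: :cms_only).keys            # prints ["title", "home_page", "uploaded_images"]
p build_export(exhibit, mode: :cms_only_no_uploads).keys # prints ["title", "home_page"]
puts JSON.generate(build_export(exhibit, mode: :cms_only_no_uploads))
```

Treating "no uploaded images" as a third mode keeps the option list short; an independent exclude-uploads flag that combines with either mode would be an equally reasonable design.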
