Problem: Dataverse Transfer Fails at "Parse Dataverse METS XML" Step When Processing RData Files #1745

@Meghanxuxx

Expected behaviour
When a Dataverse transfer is started with "Approve automatically" checked, Archivematica should parse the Dataverse METS XML correctly and generate the METS.xml documentation.

Current behaviour
For datasets containing tabular data files, processing in Archivematica fails at the "Parse Dataverse METS XML" step.

Error Message:

type: 'Item' using path: originalFormatStata/originalFormatStatacitation-endnote.xml
FSEntry(type='Item', path='originalFormatRdata/originalFormatRdata.RData', use='original', label='originalFormatRdata.RData', file_uuid='8db9f23d-1c63-4787-9767-8297052524c4', checksum='9b44d38dcaffacbdef5b358806af222f', checksumtype='MD5', fileid='file-8db9f23d-1c63-4787-9767-8297052524c4')
Traceback (most recent call last):
  File "/usr/lib/archivematica/MCPClient/job.py", line 103, in JobContext
    yield
  File "/usr/lib/archivematica/MCPClient/clientScripts/parse_dataverse_mets.py", line 321, in call
    job.set_status(init_parse_dataverse_mets(job))
  File "/usr/lib/archivematica/MCPClient/clientScripts/parse_dataverse_mets.py", line 307, in init_parse_dataverse_mets
    return parse_dataverse_mets(job, transfer_dir, transfer_uuid)
  File "/usr/lib/archivematica/MCPClient/clientScripts/parse_dataverse_mets.py", line 291, in parse_dataverse_mets
    create_db_entries(job, mapping, agent)
  File "/usr/lib/archivematica/MCPClient/clientScripts/parse_dataverse_mets.py", line 182, in create_db_entries
    original_uuid = mapping[entry.derived_from].uuid
KeyError: FSEntry(type='Item', path='originalFormatRdata/originalFormatRdata.RData', use='original', label='originalFormatRdata.RData', file_uuid='8db9f23d-1c63-4787-9767-8297052524c4', checksum='9b44d38dcaffacbdef5b358806af222f', checksumtype='MD5', fileid='file-8db9f23d-1c63-4787-9767-8297052524c4')

Steps to reproduce

  • Select Transfer Type - Dataverse
  • Browse and select "Archivematica Test on Demo Dataverse"
  • Choose sample test and upload
  • Make sure Approve automatically is checked
  • Error happens during Microservice: Parse external files - Job Parse Dataverse METS XML

Cause
This issue is related to the one described in issue 269. During the extract_and_remove_bundle step, RData files are treated as files that need to be extracted, and the original RData file is deleted after extraction. As a result, parse_dataverse_mets.py fails with the KeyError shown above whenever the transfer contains an RData file, because the code can no longer find the original file in the database.
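
To illustrate the failure mode, here is a minimal sketch of the lookup shape seen in the traceback (`mapping[entry.derived_from].uuid`). It is not the Archivematica code: `Entry`, `DbFile`, `find_original_uuid`, the derived file path, and the UUID value are all hypothetical stand-ins. It only shows how a derived entry whose original is missing from the mapping raises `KeyError`, and how a guarded lookup would avoid the crash.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for a METS file entry (not the real FSEntry)."""
    path: str
    derived_from: Optional["Entry"] = None


@dataclass
class DbFile:
    """Hypothetical stand-in for a database file record."""
    uuid: str


def find_original_uuid(mapping, entry):
    """Return the UUID of the file `entry` was derived from, or None.

    Uses dict.get so that a missing original does not raise KeyError.
    """
    original = mapping.get(entry.derived_from)
    return original.uuid if original is not None else None


original = Entry("originalFormatRdata/originalFormatRdata.RData")
# Hypothetical derived file; the real extracted file names may differ.
derived = Entry("hypothetical/derived_file.tab", derived_from=original)

# Assumption: the original was deleted after extraction, so only the
# derived file ends up in the mapping.
mapping = {derived: DbFile(uuid="hypothetical-uuid")}

try:
    mapping[derived.derived_from].uuid  # same lookup shape as in the traceback
except KeyError:
    print("KeyError: original entry is not in the mapping")

print(find_original_uuid(mapping, derived))
```

Whether the right fix is to guard the lookup like this, or to stop the original RData file from being deleted upstream, is a decision for the maintainers. The sketch only makes the cause concrete.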

Your environment (version of Archivematica, operating system, other relevant details)
Archivematica v1.16.0
Storage Service v0.22.0


For Artefactual use:

Before you close this issue, you must check off the following:

  • All pull requests related to this issue are properly linked
  • All pull requests related to this issue have been merged
  • A testing plan for this issue has been implemented and passed (testing plan information should be included in the issue body or comments)
  • Documentation regarding this issue has been written and merged (if applicable)
  • Details about this issue have been added to the release notes (if applicable)
