SPIKE: IA: metadata may not be used correctly #601

Open
@ryjen

Describe the bug

I'm not 100% sure, but it is probably worth looking into whether we are using Internet Archive (IA) metadata correctly.

Internet Archive generates its own metadata about uploads.

Currently the Save app just uploads a JSON file containing our own information, and IA then generates metadata for that file as well (metadata about metadata).

It looks like there may be a separate endpoint for metadata that we could use to update IA's own metadata for the item directly.
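As a starting point for the spike, here is a minimal sketch of what a request to such an endpoint might look like. It assumes IA's Metadata Write API, which as far as I know takes a form POST to `https://archive.org/metadata/<identifier>` with `-target`, `-patch` (a JSON Patch document), `access` and `secret` fields; this should be verified against IA's documentation before relying on it. The helper names and the example description value are hypothetical, and nothing is sent over the network here.

```python
import json


def build_metadata_patch(fields):
    """Build JSON Patch 'add' operations, skipping empty values.

    Field names are assumed to be simple (no '/' or '~'), so no
    JSON Pointer escaping is done.
    """
    return [
        {"op": "add", "path": f"/{name}", "value": value}
        for name, value in fields.items()
        if value not in ("", None, [])
    ]


def build_write_request(identifier, fields, access_key, secret_key):
    """Return (url, form_data) for a metadata write. Nothing is sent."""
    url = f"https://archive.org/metadata/{identifier}"
    form = {
        "-target": "metadata",  # assumed: edit the item-level metadata
        "-patch": json.dumps(build_metadata_patch(fields)),
        "access": access_key,   # IA S3-style credentials (placeholders below)
        "secret": secret_key,
    }
    return url, form


# Hypothetical values: the description text is illustrative only.
url, form = build_write_request(
    "E1j913BVoAMX59e-jpeg-zjpu",
    {"description": "Uploaded with Save", "creator": ""},
    access_key="ACCESS_KEY_PLACEHOLDER",
    secret_key="SECRET_KEY_PLACEHOLDER",
)
print(url)
print(form["-patch"])
```

If this approach holds up, the app could attach its information to the media item itself instead of uploading a second file.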

Expected behavior

No unnecessary files for metadata

Metadata does not include irrelevant information (for example originalFilePath or progress)
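One way to think about the second point is a filter that drops app-internal and empty fields before anything is sent to IA. The sketch below is only an illustration: which fields count as "app-internal" is my assumption (apart from `originalFilePath` and `progress`, which are named above), and the sample record is a trimmed copy of the Save app JSON in the Examples section.

```python
# Assumed list of fields that only make sense inside the app.
# Only originalFilePath and progress are named in this issue; the rest are guesses.
APP_INTERNAL_FIELDS = {
    "originalFilePath", "progress", "priority", "selected", "flag",
    "status", "statusMessage", "id", "projectId", "collectionId", "serverUrl",
}


def public_metadata(record):
    """Keep only non-empty fields that are not app-internal."""
    return {
        key: value
        for key, value in record.items()
        if key not in APP_INTERNAL_FIELDS and value not in ("", None, [])
    }


# Trimmed copy of the media file record shown in the Examples section.
sample = {
    "author": "",
    "contentLength": 1099011,
    "description": "",
    "mediaHash": [],
    "hash": "15e64238066bfa3ba2a5c88bfcb551ff88278d3543c5f4067d69348b77cd82ee",
    "contentType": "image/jpeg",
    "originalFilePath": "file:///data/user/0/net.opendasharchive.openarchive.release/cache/20240325_102309.E1j913BVoAMX59e.jpeg",
    "progress": 0,
    "status": 4,
    "originalFileName": "E1j913BVoAMX59e.jpeg",
}

cleaned = public_metadata(sample)
print(sorted(cleaned))
```

For this record only `contentLength`, `contentType`, `hash` and `originalFileName` survive, which shows how little of the uploaded JSON is actually useful to someone browsing the archive.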

Examples

Sample IA-generated metadata for content uploaded by the Save app:

<metadata>
<identifier>E1j913BVoAMX59e-jpeg-zjpu</identifier>
<collection>opensource_media</collection>
<language>eng</language>
<licenseurl>https://creativecommons.org/licenses/by/4.0/</licenseurl>
<mediatype>image</mediatype>
<title>E1j913BVoAMX59e.jpeg</title>
<uploader>[email protected]</uploader>
<publicdate>2024-03-25 17:23:58</publicdate>
<addeddate>2024-03-25 17:23:58</addeddate>
<curation>[curator][email protected][/curator][date]20240325172447[/date][comment]checked for malware[/comment]</curation>
</metadata>

Media file metadata uploaded by the Save app as a JSON file:

{"author":"","collectionId":6,"contentLength":1099011,"dateCreated":"Mar 25, 2024 10:23:09 AM","description":"","flag":false,"location":"","mediaHash":[],"hash":"15e64238066bfa3ba2a5c88bfcb551ff88278d3543c5f4067d69348b77cd82ee","contentType":"image/jpeg","originalFilePath":"file:///data/user/0/net.opendasharchive.openarchive.release/cache/20240325_102309.E1j913BVoAMX59e.jpeg","priority":0,"progress":0,"projectId":1,"selected":false,"serverUrl":"","status":4,"statusMessage":"","tags":"","originalFileName":"E1j913BVoAMX59e.jpeg","updateDate":"Mar 25, 2024 10:23:09 AM","uploadDate":"Mar 25, 2024 10:23:11 AM","id":6}

IA-generated metadata for that JSON file (the metadata about metadata):

<?xml version="1.0" encoding="UTF-8"?>
<metadata>
  <identifier>E1j913BVoAMX59e-jpeg-frih</identifier>
  <collection>opensource</collection>
  <language>eng</language>
  <mediatype>texts</mediatype>
  <uploader>[email protected]</uploader>
  <title>E1j913BVoAMX59e-jpeg-frih</title>
  <publicdate>2024-03-25 17:24:09</publicdate>
  <addeddate>2024-03-25 17:24:09</addeddate>
  <curation>[curator][email protected][/curator][date]20240325172944[/date][comment]checked for malware[/comment]</curation>
  <identifier-access>http://archive.org/details/E1j913BVoAMX59e-jpeg-frih</identifier-access>
  <identifier-ark>ark:/13960/s243q45kwcz</identifier-ark>
</metadata>

Environment (please complete the following information):

  • OS version: Any
  • Device: Any
  • App Version: 0.3.1

Additional context
