[hma] Storing and fetching metadata from banked content is unimplemented #1886

@Pezmc

POST /bank/<bank_name>/content accepts and stores metadata, but GET /bank/<bank_name>/content/<id> does not return it. The response includes original_media_uri and collab_metadata, but not the user-supplied metadata.

Repro

  1. POST /bank/MYBANK/content with metadata (content_id, content_uri, json).
  2. GET /bank/MYBANK/content/<id>.
  3. See that metadata is missing in the response.

Current code (abridged)

response: BankContentResponse = {
    "id": content_config.id,
    "disable_until_ts": content_config.disable_until_ts,
    "collab_metadata": content_config.collab_metadata,
    "original_media_uri": content_config.original_media_uri,
    "bank": content_config.bank,
}

Full `bank_get_content` method

def bank_get_content(bank_name: str, content_id: int):
    storage = persistence.get_storage()
    bank = storage.get_bank(bank_name)
    if not bank:
        abort(404, f"bank '{bank_name}' not found")
    include_signals = request.args.get("include_signals", "false").lower() == "true"
    content = storage.bank_content_get([content_id])
    if not content:
        abort(404, f"content '{content_id}' not found")
    content_config = content[0]
    # Create base response
    response: BankContentResponse = {
        "id": content_config.id,
        "disable_until_ts": content_config.disable_until_ts,
        "collab_metadata": content_config.collab_metadata,
        "original_media_uri": content_config.original_media_uri,
        "bank": content_config.bank,
    }
    # If signals were requested, fetch them separately and include in response
    if include_signals:
        signals = storage.bank_content_get_signals([content_id])
        if content_id in signals:
            response["signals"] = signals[content_id]
    return jsonify(response)

Expected

Return the stored user metadata:

"metadata": {
  "content_id": "...",
  "content_uri": "...",
  "json": { "..." : "..." }
}

Proposal

Include metadata in the response:

response["metadata"] = content_config.metadata

Optionally support ?include_metadata=true|false, similar to include_signals.
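
A minimal, framework-free sketch of the proposal, in case it helps the discussion. It is not the actual HMA implementation: `FakeContentConfig`, `parse_bool_arg`, and `build_response` are hypothetical names used only for illustration. It assumes the stored config exposes a `metadata` attribute, as the proposal does, and that `include_metadata` defaults to true (the default is not decided in this issue).

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class FakeContentConfig:
    """Hypothetical stand-in for the stored content config (illustration only)."""

    id: int
    disable_until_ts: int
    collab_metadata: Dict[str, Any]
    original_media_uri: Optional[str]
    bank: str
    # Assumption: user-supplied metadata is available on the config object.
    metadata: Optional[Dict[str, Any]] = None


def parse_bool_arg(raw: Optional[str], default: bool) -> bool:
    """Parse a query-string flag the same way include_signals is parsed."""
    if raw is None:
        return default
    return raw.lower() == "true"


def build_response(
    content_config: FakeContentConfig, include_metadata: bool = True
) -> Dict[str, Any]:
    """Build the GET response body, optionally attaching user metadata."""
    response: Dict[str, Any] = {
        "id": content_config.id,
        "disable_until_ts": content_config.disable_until_ts,
        "collab_metadata": content_config.collab_metadata,
        "original_media_uri": content_config.original_media_uri,
        "bank": content_config.bank,
    }
    if include_metadata:
        response["metadata"] = content_config.metadata
    return response
```

In the handler, the flag could presumably be read as parse_bool_arg(request.args.get("include_metadata"), default=True), mirroring the existing include_signals handling.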

Why

We've been banking descriptive metadata for curation and audit, and need it returned on read.

Related

#1681
