Manifest enrichment#699

Open
stephenwf wants to merge 10 commits into feature/IDA-893-enrichment-integration from feature/IDA-1055-enrich-manifest

@stephenwf

Member

No description provided.

Comment on lines +8 to +54
export const manifestEnrichmentPipeline: RouteMiddleware<{ id: number }> = async context => {
  const { siteId } = userWithScope(context, ['site.admin']);
  const site = await context.siteManager.getSiteById(siteId);
  const siteApi = api.asUser({ siteId });

  // 12-hour token.
  const webhook = await context.webhookExtension.generateWebhookUrl(
    site,
    manifestEnrichmentPipelineEvent.event_id,
    12 * 3600
  );
  context.response.body = await siteApi.enrichment.enrichManifestInternal(context.params.id, webhook);
};

export const manifestEnrichmentPipelineEvent: WebhookEventType = {
  event_id: 'manifest-enrichment-pipeline.complete',
  body_variables: ['id'],
};

export const manifestEnrichmentHook: IncomingWebhook = {
  type: 'manifest-enrichment-pipeline-task-ingest',
  event_id: 'manifest-enrichment-pipeline.complete',
  is_outgoing: false,
  execute: async (resp, siteApi) => {
    invariant(resp.id, 'Expected response to contain `id`');

    const task = await siteApi.enrichment.getEnrichmentTask(resp.id);
    invariant(task.subject, 'Missing subject on task');
    invariant(task.status === 3, 'Task is not yet complete');

    if (task.task_type === 'ocr_madoc_resource') {
      const parsed = parseUrn(task.subject);
      invariant(parsed, 'Invalid subject');
      invariant(parsed.type === 'canvas', 'Can only process canvases');

      if (task.state && task.state.ocr_resources && task.state.ocr_resources[0]) {
        const first = task.state.ocr_resources[0];
        const enrichmentPlaintext = await siteApi.enrichment.getEnrichmentPlaintext(first);
        invariant(enrichmentPlaintext, 'Missing plaintext from enrichment');
        if (enrichmentPlaintext.plaintext) {
          const canvasId = parsed.id; // ??
          return await siteApi.updateCanvasPlaintext(canvasId, enrichmentPlaintext.plaintext);
        }
      }
    }
  },
};

Member Author

This is the main part; the other changes mostly fill gaps in Madoc's API.

  • manifestEnrichmentPipeline is the API route handler called when an admin hits "Enrich"
    • Generates a webhook URL (12-hour token)
    • Creates the enrichment task
    • Returns the task (@mattmcgrattan it would be useful to omit callback_url from the state in the future)
  • manifestEnrichmentPipelineEvent is a short description of the webhook "type" and the fields expected in the response.
  • manifestEnrichmentHook is the function that is called when the task is complete. It receives the webhook post-body JSON and an instance of the siteApi (already mapped to the right site).
    • Fetch the enrichment task
    • Validate that it has a subject and is complete
    • If it's an ocr_madoc_resource:
      • Parse and validate the subject
      • Check for ocr_resources (@mattmcgrattan this will need to change if there is more than one here)
      • Fetch the plaintext and validate that it contains what we expect
      • Attach the plaintext to the canvas.
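
The validation steps in the list above can be sketched as a pure function with no network calls, which makes the branching easier to follow. This is only an illustration under stated assumptions: `TaskLike`, `parseMadocUrn`, and `pickCanvasOcrResource` are hypothetical names rather than Madoc's actual `EnrichmentTask` or `parseUrn`, the URN layout is inferred from the `urn:madoc:manifest:${id}` subject in this PR, and OCR resource identifiers are assumed to be strings.

```typescript
// Hypothetical, minimal shape of the fields the hook reads from a task.
interface TaskLike {
  subject?: string;
  status: number;
  task_type: string;
  state?: { ocr_resources?: string[] }; // assumption: resources are string identifiers
}

// Assumed URN layout `urn:madoc:<type>:<numeric id>`; not Madoc's real parseUrn.
function parseMadocUrn(urn: string): { type: string; id: number } | undefined {
  const match = /^urn:madoc:([a-z_-]+):(\d+)$/.exec(urn);
  if (!match) return undefined;
  const type = match[1];
  const id = Number(match[2]);
  if (type === undefined || !Number.isInteger(id)) return undefined;
  return { type, id };
}

// Returns the canvas id and first OCR resource, or undefined when there is
// nothing to attach. Throws where the hook in the diff uses `invariant`.
function pickCanvasOcrResource(task: TaskLike): { canvasId: number; resource: string } | undefined {
  if (!task.subject) throw new Error('Missing subject on task');
  if (task.status !== 3) throw new Error('Task is not yet complete'); // 3 = complete, per the diff
  if (task.task_type !== 'ocr_madoc_resource') return undefined;

  const parsed = parseMadocUrn(task.subject);
  if (!parsed) throw new Error('Invalid subject');
  if (parsed.type !== 'canvas') throw new Error('Can only process canvases');

  // Only the first resource is used, matching the current hook.
  const first = task.state?.ocr_resources?.[0];
  return first ? { canvasId: parsed.id, resource: first } : undefined;
}
```

Keeping the checks separate from the `siteApi` calls would also give one obvious place to change if more than one entry in `ocr_resources` needs handling later.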

Comment on lines +94 to 104
enrichManifestInternal(id: number, callback?: string) {
  return this.api.request<EnrichmentTask>(`/api/enrichment/tasks/madoc_manifest_enrichment_pipeline`, {
    method: 'POST',
    body: {
      task: {
        subject: `urn:madoc:manifest:${id}`,
        parameters: [{ callback_url: callback }],
      },
    },
  });
}

Member Author

This is the call that kicks off the enrichment pipeline.
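
As a rough illustration of the payload this call sends, the sketch below builds the same body as a plain object and serialises it. `EnrichmentTaskBody` and `buildEnrichmentTaskBody` are hypothetical names, not part of Madoc, and the example URL is a placeholder. It also shows what happens when `callback` is omitted, since the parameter is optional in the diff.

```typescript
// Hypothetical type and helper mirroring the body built in `enrichManifestInternal`.
interface EnrichmentTaskBody {
  task: {
    subject: string;
    parameters: Array<{ callback_url: string | undefined }>;
  };
}

function buildEnrichmentTaskBody(manifestId: number, callback?: string): EnrichmentTaskBody {
  return {
    task: {
      subject: `urn:madoc:manifest:${manifestId}`,
      parameters: [{ callback_url: callback }],
    },
  };
}

// JSON.stringify drops object keys whose value is undefined, so a missing
// callback is sent as an empty parameter object rather than `callback_url: null`.
const withCallback = JSON.stringify(buildEnrichmentTaskBody(42, 'https://example.org/hook'));
const withoutCallback = JSON.stringify(buildEnrichmentTaskBody(42));
```

Whether the enrichment service accepts an empty parameter object is not something this PR states, so it may be worth checking before relying on the optional argument.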

@github-actions

github-actions Bot commented Apr 4, 2023


Preview docker image available

docker pull ghcr.io/digirati-co-uk/madoc-api:pr-699

@Heather0K

Contributor

You might want to add word-break: break-all; here.

Screenshot 2023-04-04 at 18 13 27