Tree-based sectioning, structured content editor, and section thumbnails by elasticsounds · Pull Request #289 · unicef/adt-studio

elasticsounds · 2026-04-16T22:28:18Z

Summary

Extend page-sectioning to consume the recursive content tree from page-structuring via a new SectionContentNodePart variant; stage-runner detects tree-vs-flat data and routes accordingly, and web-rendering's expandParts flattens subtrees.
Replace the extract stage's legacy detail view with the shared ContentNodeBlock tree UI (drag/drop restructure, inline text editing, role/structure pills) and unify storyboard overview + per-section editing into one all-pages view.
Capture fixed-viewport PNG section thumbnails during every render/re-render using a shared Playwright ScreenshotRenderer; serve via new GET /books/:label/thumbnails/:filename and swap the scaled-iframe previews in SectioningOverview and ContentEditor for <img> tags.
Misc: storage gains thumbnails/ dir + helpers, new Lingui strings extracted and translated to es / pt-BR, prompt updates for tree node summaries.
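
As an illustration of the tree-vs-flat routing and subtree flattening described above, here is a minimal TypeScript sketch. The `ContentNode` and `FlatTextGroup` shapes, the `tree` / `groups` property names, and both function names are assumptions made for illustration; they are not taken from the adt-studio code.

```typescript
// Hypothetical shapes; the real schemas in adt-studio may differ.
interface ContentNode {
  id: string;
  role?: string;       // assumed to be set on leaves
  structure?: string;  // assumed to be set on containers
  text?: string;
  children?: ContentNode[];
}

interface FlatTextGroup {
  groupId: string;
  texts: string[];
}

interface RawPage {
  tree?: ContentNode;
  groups?: FlatTextGroup[];
}

type PageData =
  | { kind: "tree"; root: ContentNode }
  | { kind: "flat"; groups: FlatTextGroup[] };

// Assumed detection rule: a page with a `tree` property is tree data,
// anything else is treated as legacy flat data.
function detectPageData(raw: RawPage): PageData {
  if (raw.tree !== undefined) {
    return { kind: "tree", root: raw.tree };
  }
  return { kind: "flat", groups: raw.groups ?? [] };
}

// Depth-first flatten of a subtree into its text leaves, in document order.
function flattenLeaves(node: ContentNode): ContentNode[] {
  const children = node.children ?? [];
  if (children.length === 0) {
    return node.text !== undefined ? [node] : [];
  }
  const leaves: ContentNode[] = [];
  for (const child of children) {
    for (const leaf of flattenLeaves(child)) {
      leaves.push(leaf);
    }
  }
  return leaves;
}
```

Keeping detection in one place means downstream stages only branch on `kind`, which is one way to keep older flat books working next to tree-based ones.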

Test plan

pnpm typecheck passes
Run storyboard on a book with tree-based structuring and verify sections render with content_node parts
Verify thumbnails appear in the overview and content editor after render, and refresh after re-render
Drag/drop content across sections, save, and confirm re-render reflects the new structure
Confirm older flat text-classification books still section + render (backward compat)

…mbnails

- Extend page-sectioning pipeline to accept a content tree (SectionContentNodePart) alongside legacy text-group/image parts; stage-runner detects tree vs flat data and routes accordingly
- Migrate ExtractPageDetail to the shared ContentNodeBlock tree UI with drag/drop restructure and inline text editing
- Unify storyboard overview + content editing into a single all-pages view with per-section content trees
- Capture section preview PNG thumbnails after every render/re-render; serve via new GET /books/:label/thumbnails/:filename; replace scaled-iframe previews with img tags

Add an explicit rule for text-group rendering: when a group has multiple texts in a single reading flow, wrap the whole group in one block-level element and use an inner inline element per text, instead of emitting one block per text, which forces each sentence onto its own line.

Without this, the reviewer reads the text-only rule and splits a single block back into one block per sentence on every iteration, undoing the paragraph flow introduced by the generation prompt.

…guidance

Collapses the separate `role: "image"` leaf back into the `image_group` container that owns it, eliminating the schema-confusion errors where the LLM mixed up structure vs role on image nodes or duplicated image_id on both container and child. The container now carries `image_id` directly, with optional `children` for captions, labels, and overlaid text.

Also clarifies activity_option guidance after recurring failures where the LLM applied `activity_option` as a leaf role: the container description now spells out the inner shape, and the example shows an activity with three options (text-only and image+text), each wrapped in its own container even when it holds a single text leaf.

Plus: extract structure tab gains a JSON view, and consecutive same-type text groups are merged at section level so paragraph runs render as one block instead of fragmenting.

CI lint was failing on `DetailPanel` enum values like "textGroups" and "prunedImages" — these are internal state keys never displayed to users, but the existing ignore regex only matched all-lowercase identifiers. Add a complementary regex for camelCase identifier strings so similar state-key literals don't keep tripping the rule. Also drop a now-unused eslint-disable directive in ContentEditor that the rule no longer fires on.
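
A minimal sketch of the two-pattern idea, written in TypeScript. Both regexes and the `isInternalKey` helper are assumptions for illustration; the actual patterns in the project's ESLint configuration are not shown in this PR text.

```typescript
// Assumed shape of the pre-existing pattern: all-lowercase identifiers only.
const LOWERCASE_ID = /^[a-z]+$/;

// Complementary pattern: a lowercase start followed by one or more
// capitalised segments, e.g. "textGroups" or "prunedImages".
const CAMEL_CASE_ID = /^[a-z]+(?:[A-Z][a-z0-9]*)+$/;

// Hypothetical helper: true when a string literal looks like an internal
// state key rather than user-facing copy.
function isInternalKey(value: string): boolean {
  return LOWERCASE_ID.test(value) || CAMEL_CASE_ID.test(value);
}
```

Strings with spaces or a leading capital, such as "Save changes" or "Template", match neither pattern, so they would still be reported as unlocalized.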

Empty cells are a legitimate table pattern (column gaps, alignment) — the validator now accepts table_cell with no children, the same exemption image_group already has. Prompt updated to tell the LLM to preserve column structure rather than skipping blank cells. Also drop the global "tree nesting too deep" follow-up. It fired on any empty container, not on actual depth, and steered the LLM toward restructuring instead of fixing the real issue (an empty cell or a missing leaf). The per-node error already explains what's wrong.
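
The exemption could look roughly like the following TypeScript sketch. `StructNode`, `MAY_BE_EMPTY`, `validateNode`, and the error wording are hypothetical; only the `table_cell` and `image_group` names come from the commit message.

```typescript
// Hypothetical node shape: containers carry `structure`, leaves carry `role`.
interface StructNode {
  structure?: string;
  role?: string;
  children?: StructNode[];
}

// Containers that are allowed to have no children.
const MAY_BE_EMPTY = ["table_cell", "image_group"];

// Returns one error per offending node, each prefixed with the node's path,
// so no separate tree-wide follow-up message is needed.
function validateNode(node: StructNode, path: string = "root"): string[] {
  const errors: string[] = [];
  if (node.structure === undefined) {
    return errors; // leaves are not checked in this sketch
  }
  const children = node.children ?? [];
  if (children.length === 0 && MAY_BE_EMPTY.indexOf(node.structure) === -1) {
    errors.push(`${path}: "${node.structure}" container has no children`);
  }
  children.forEach((child, i) => {
    for (const err of validateNode(child, `${path}.children[${i}]`)) {
      errors.push(err);
    }
  });
  return errors;
}
```

Because each message names the exact node, the LLM gets a pointer to the empty container itself rather than a general hint about nesting depth.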

Replaces the image_group container with two distinct shapes mirroring HTML: an image leaf (role: "image" + image_id) for foreground images and an optional background_image_id on any container for backdrops. Lower the default min_side image filter to 10 so visually meaningful small content (signs, labels) reaches the structurer, allow the LLM to omit images embedded in the PDF but not visible on the page, and disable the self-review refinement loop by default for one-shot structuring.
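
To make the two shapes concrete, here is a hedged TypeScript sketch. The type names and the `collectImageIds` helper are illustrative assumptions; only `role: "image"`, `image_id`, and `background_image_id` appear in the commit message.

```typescript
// Hypothetical node types mirroring the two shapes described above.
interface TextLeaf {
  role: string;
  text: string;
}

interface ImageLeaf {
  role: "image";
  image_id: string; // foreground image
}

interface ContainerNode {
  structure: string;
  background_image_id?: string; // optional backdrop on any container
  children: PageNode[];
}

type PageNode = TextLeaf | ImageLeaf | ContainerNode;

interface ImageRefs {
  foreground: string[];
  background: string[];
}

// Walks the tree and separates foreground image ids from background ids.
function collectImageIds(
  node: PageNode,
  refs: ImageRefs = { foreground: [], background: [] }
): ImageRefs {
  if ("children" in node) {
    if (node.background_image_id !== undefined) {
      refs.background.push(node.background_image_id);
    }
    for (const child of node.children) {
      collectImageIds(child, refs);
    }
  } else if ("image_id" in node) {
    refs.foreground.push(node.image_id);
  }
  return refs;
}
```

Keeping the backdrop as a property of the container, rather than as a child node, is analogous to a CSS background versus an inline image element, which matches the "mirroring HTML" intent stated in the commit.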

elasticsounds added 13 commits April 16, 2026 15:18

Address review feedback and refine tree editing UX

31d67e5

- Wrap Template/AI badge labels and page/pages in Lingui macros
- Fix drag-reorder off-by-one in section overview and tree move helper
- Escape CSS selector id and surface re-render failures via console
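
The commit message does not say where the off-by-one occurred. One common cause in reorder helpers is forgetting that removing the dragged item shifts every later index down by one. The TypeScript sketch below shows that adjustment; `moveItem` is a hypothetical helper, not the project's actual move function.

```typescript
// Moves the item at `from` so that it lands immediately before the item
// currently at index `to`. Passing `to === items.length` moves it to the end.
function moveItem<T>(items: readonly T[], from: number, to: number): T[] {
  const next = items.slice();
  if (from < 0 || from >= next.length || to < 0 || to > next.length) {
    return next; // out-of-range indices leave the order unchanged
  }
  const removed = next.splice(from, 1);
  // After the removal, every index above `from` has shifted down by one,
  // so a target that was to the right of `from` must be decremented.
  const insertAt = from < to ? to - 1 : to;
  next.splice(insertAt, 0, removed[0]);
  return next;
}
```

Without the `to - 1` adjustment, dragging an item downward would drop it one slot later than the indicated position.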

Group sibling text leaves into one paragraph during flatten

d73ef20

Consecutive text-leaf children of a container now emit a single group part tagged with the container's structure, so a group/paragraph with multiple sentence leaves renders as one paragraph instead of one block per leaf.

Group same-role text leaves only when they share a container

f5f65ee

Scope the paragraph-flow grouping to consecutive same-role leaves within a single container, so a heading sibling stays separate and standalone leaves at the section level remain independent groups.
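
The grouping rule from this commit and the previous one can be sketched as follows in TypeScript. The `Leaf`, `Container`, and `GroupPart` shapes and the `flattenContainer` name are assumptions for illustration; the sketch covers a single container only and does not model standalone leaves at the section level.

```typescript
// Hypothetical tree shapes for illustration only.
interface Leaf {
  kind: "leaf";
  role: string;
  text: string;
}

interface Container {
  kind: "container";
  structure: string;
  children: TreeNode[];
}

type TreeNode = Leaf | Container;

interface GroupPart {
  structure: string; // inherited from the owning container
  role: string;
  texts: string[];
}

// Consecutive leaves with the same role inside one container merge into a
// single part; a role change or a nested container closes the current run.
function flattenContainer(container: Container): GroupPart[] {
  const parts: GroupPart[] = [];
  let open: GroupPart | null = null;
  for (const child of container.children) {
    if (child.kind === "container") {
      open = null;
      for (const part of flattenContainer(child)) {
        parts.push(part);
      }
      continue;
    }
    if (open !== null && open.role === child.role) {
      open.texts.push(child.text);
    } else {
      open = { structure: container.structure, role: child.role, texts: [child.text] };
      parts.push(open);
    }
  }
  return parts;
}
```

With a container holding one heading leaf followed by two body leaves, this yields two parts: the heading on its own and a single body part containing both sentences.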

Require paragraph grouping in page structuring prompt

bdc396b

The extract-phase tree was showing flat sentences while render grouped them, so the structurer LLM needed an explicit rule that sibling sentences of the same visual paragraph must share a group container.

Fix YAML parse error on activity_option description

0e69fce

The unquoted inline string contained a colon ("content:") that the YAML parser read as a nested mapping key, breaking config load on stage run. Use a folded block scalar so the description can include colons and backticks freely.

Forbid reusing image_id across sibling image_groups

95e10d6

Tells the structurer to nest shared-background content under one image_group and to emit plain leaves for backgrounds that don't match any extracted image_id, rather than borrowing an unrelated one.

elasticsounds wants to merge 13 commits into nicpottier/tree-text-extraction from elasticsounds/tree-editing-features