Tree-based sectioning, structured content editor, and section thumbnails #289

Open
elasticsounds wants to merge 13 commits into nicpottier/tree-text-extraction from elasticsounds/tree-editing-features

@elasticsounds
Contributor

Summary

  • Extend page-sectioning to consume the recursive content tree from page-structuring via a new SectionContentNodePart variant; stage-runner detects tree-vs-flat data and routes accordingly, and web-rendering's expandParts flattens subtrees.
  • Replace the extract stage's legacy detail view with the shared ContentNodeBlock tree UI (drag/drop restructure, inline text editing, role/structure pills) and unify storyboard overview + per-section editing into one all-pages view.
  • Capture fixed-viewport PNG section thumbnails during every render/re-render using a shared Playwright ScreenshotRenderer; serve via new GET /books/:label/thumbnails/:filename and swap the scaled-iframe previews in SectioningOverview and ContentEditor for <img> tags.
  • Misc: storage gains thumbnails/ dir + helpers, new Lingui strings extracted and translated to es / pt-BR, prompt updates for tree node summaries.
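
A minimal sketch of the tree-vs-flat routing and subtree flattening described in the first bullet. All names here (`PageData`, `toParts`, `flattenTexts`, `expandToTexts`) and the detection rule (a `tree` field on the page data) are assumptions for illustration, not the PR's actual types or functions.

```typescript
// Hypothetical shapes for illustration; the PR's real types may differ.
interface ContentNode {
  id: string;
  text?: string;
  children?: ContentNode[];
}

interface FlatTextPart {
  type: "text_group";
  texts: string[];
}

interface ContentNodePart {
  type: "content_node";
  node: ContentNode;
}

type SectionPart = FlatTextPart | ContentNodePart;

// Assumption: tree-based page data carries a `tree` root, flat data does not.
interface PageData {
  tree?: ContentNode;
  textGroups?: string[][];
}

function isTreeData(page: PageData): boolean {
  return page.tree !== undefined;
}

// Build section parts from either input shape.
function toParts(page: PageData): SectionPart[] {
  if (page.tree !== undefined) {
    return [{ type: "content_node", node: page.tree }];
  }
  return (page.textGroups ?? []).map(
    (texts): SectionPart => ({ type: "text_group", texts }),
  );
}

// Depth-first walk that collects the text of every leaf in reading order.
function flattenTexts(node: ContentNode, out: string[] = []): string[] {
  const children = node.children ?? [];
  if (children.length === 0) {
    if (node.text !== undefined) out.push(node.text);
    return out;
  }
  for (const child of children) flattenTexts(child, out);
  return out;
}

// Expand every part to plain strings, whatever shape it started in.
function expandToTexts(parts: SectionPart[]): string[] {
  const out: string[] = [];
  for (const part of parts) {
    if (part.type === "content_node") flattenTexts(part.node, out);
    else out.push(...part.texts);
  }
  return out;
}
```

Under these assumptions a tree-based page and a legacy flat page with the same sentences expand to the same ordered list of texts, which is the backward-compatibility property the test plan checks.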

Test plan

  • pnpm typecheck passes
  • Run storyboard on a book with tree-based structuring and verify sections render with content_node parts
  • Verify thumbnails appear in the overview and content editor after render, and refresh after re-render
  • Drag/drop content across sections, save, and confirm re-render reflects the new structure
  • Confirm older flat text-classification books still section + render (backward compat)

…mbnails

- Extend page-sectioning pipeline to accept a content tree (SectionContentNodePart) alongside legacy text-group/image parts; stage-runner detects tree vs flat data and routes accordingly
- Migrate ExtractPageDetail to the shared ContentNodeBlock tree UI with drag/drop restructure and inline text editing
- Unify storyboard overview + content editing into a single all-pages view with per-section content trees
- Capture section preview PNG thumbnails after every render/re-render; serve via new GET /books/:label/thumbnails/:filename; replace scaled-iframe previews with img tags
- Wrap Template/AI badge labels and page/pages in Lingui macros
- Fix drag-reorder off-by-one in section overview and tree move helper
- Escape CSS selector id and surface re-render failures via console
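
The commit list does not say where the drag-reorder off-by-one was. A common cause in move helpers is forgetting that removing the dragged item shifts later indices down by one; the sketch below shows that correction. `moveItem` and its "insert before original index `to`" convention are hypothetical, not the PR's helper.

```typescript
// Move the item at `from` so it lands just before the item that sat at
// index `to` in the original array (`to === items.length` means "at the end").
function moveItem<T>(items: readonly T[], from: number, to: number): T[] {
  if (from < 0 || from >= items.length || to < 0 || to > items.length) {
    throw new RangeError("index out of bounds");
  }
  const next = items.slice();
  const [moved] = next.splice(from, 1);
  // Removing an earlier item shifts every later index down by one,
  // so the drop target must be adjusted when moving forward.
  const target = from < to ? to - 1 : to;
  next.splice(target, 0, moved as T);
  return next;
}
```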

Consecutive text-leaf children of a container now emit a single group
part tagged with the container's structure, so a group/paragraph with
multiple sentence leaves renders as one paragraph instead of one block
per leaf.

Scope the paragraph-flow grouping to consecutive same-role leaves within
a single container, so a heading sibling stays separate and standalone
leaves at the section level remain independent groups.

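
The grouping rule above can be sketched as a single pass over one container's leaf children. The `Leaf` and `Group` shapes and the function name are assumptions made for this example; the actual part types in the PR may differ.

```typescript
// Hypothetical shapes, used only for this sketch.
interface Leaf {
  id: string;
  role: string;
  text: string;
}

interface Group {
  role: string;
  structure: string; // copied from the owning container
  leafIds: string[];
}

// Merge runs of consecutive same-role leaves of ONE container into groups.
// A leaf with a different role (for example a heading) starts a new group.
function groupConsecutiveLeaves(structure: string, leaves: Leaf[]): Group[] {
  const groups: Group[] = [];
  for (const leaf of leaves) {
    const last = groups[groups.length - 1];
    if (last !== undefined && last.role === leaf.role) {
      last.leafIds.push(leaf.id);
    } else {
      groups.push({ role: leaf.role, structure, leafIds: [leaf.id] });
    }
  }
  return groups;
}
```

Because the function only ever sees the children of one container, standalone leaves at the section level are never merged with each other by this pass.
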
Add explicit rule for text-group rendering: when a group has multiple
texts in a single reading flow, wrap the whole group in one block-level
element and use inner <span data-id=...> per text, instead of emitting
one block per text, which forces each sentence onto its own line.

Without this, the reviewer reads the text-only rule and splits a
<p><span data-id>..</span><span data-id>..</span></p> back into one
<p data-id> per sentence on every iteration, undoing the paragraph
flow introduced by the generation prompt.

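
To make the target markup concrete, here is a small sketch that produces the shape the rule asks for. In the PR the HTML comes from an LLM following the prompt, not from a function like this; `renderTextGroup`, `GroupText`, the choice of `<p>` as the block element, and the single-text fallback are all assumptions.

```typescript
interface GroupText {
  id: string;
  text: string;
}

// Escape the characters that matter inside HTML text and quoted attributes.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// One block-level element per group; one inner span per text in the flow.
function renderTextGroup(texts: GroupText[]): string {
  const [only] = texts;
  if (texts.length === 1 && only !== undefined) {
    // Assumed text-only case: the data-id sits on the block itself.
    return `<p data-id="${escapeHtml(only.id)}">${escapeHtml(only.text)}</p>`;
  }
  const spans = texts.map(
    (t) => `<span data-id="${escapeHtml(t.id)}">${escapeHtml(t.text)}</span>`,
  );
  return `<p>${spans.join(" ")}</p>`;
}
```
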
The extract-phase tree was showing flat sentences while render grouped
them; the structurer LLM needed an explicit rule that sibling
sentences of the same visual paragraph must share a group container.

…guidance

Collapses the separate `role: "image"` leaf back into the `image_group`
container that owns it — eliminating the schema-confusion errors where the
LLM mixed up structure vs role on image nodes or duplicated image_id on
both container and child. The container now carries `image_id` directly,
with optional `children` for captions, labels, and overlaid text.

Also clarifies activity_option guidance after recurring failures where
the LLM applied `activity_option` as a leaf role: the container
description now spells out the inner shape, and the example shows an
activity with three options (text-only and image+text) — each wrapped
in its own container even when it holds a single text leaf.

Plus: extract structure tab gains a JSON view, and consecutive same-type
text groups are merged at section level so paragraph runs render as one
block instead of fragmenting.

The unquoted inline string contained a colon ("content:") that the YAML
parser read as a nested mapping key, breaking config load on stage run.
Use a folded block scalar so the description can include colons and
backticks freely.

CI lint was failing on `DetailPanel` enum values like "textGroups" and
"prunedImages" — these are internal state keys never displayed to users,
but the existing ignore regex only matched all-lowercase identifiers.
Add a complementary regex for camelCase identifier strings so similar
state-key literals don't keep tripping the rule.
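
A sketch of what a complementary camelCase pattern could look like. The two regexes and the `isIgnorableLiteral` helper are assumptions for illustration; the lint config's real patterns are not shown in this PR text.

```typescript
// Assumed shape of the existing rule: all-lowercase identifiers only.
const LOWERCASE_ID = /^[a-z]+$/;

// Complementary rule: a lowercase start followed by one or more capitalized
// segments, with no spaces or punctuation, e.g. "textGroups".
const CAMEL_CASE_ID = /^[a-z]+(?:[A-Z][a-z0-9]*)+$/;

// True when a string literal looks like an internal key rather than UI copy.
function isIgnorableLiteral(value: string): boolean {
  return LOWERCASE_ID.test(value) || CAMEL_CASE_ID.test(value);
}
```

With these patterns, "textGroups" and "prunedImages" are ignored, while strings that start with a capital letter or contain spaces, such as "Template" or "Save changes", are still flagged for translation.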

Also drop a now-unused eslint-disable directive in ContentEditor that
the rule no longer fires on.

Empty cells are a legitimate table pattern (column gaps, alignment), so
the validator now accepts table_cell with no children, the same exemption
image_group already has. The prompt now tells the LLM to preserve
column structure rather than skipping blank cells.

Also drop the global "tree nesting too deep" follow-up. It fired on any
empty container, not on actual depth, and steered the LLM toward
restructuring instead of fixing the real issue (an empty cell or a
missing leaf). The per-node error already explains what's wrong.
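
A minimal sketch of a per-node emptiness check with the two exemptions named above. The `TreeNode` shape, the rule that containers are the nodes with a `structure` field, and the error wording are assumptions, not the PR's validator.

```typescript
// Hypothetical node shape: containers set `structure`, leaves set `role`.
interface TreeNode {
  id: string;
  structure?: string;
  role?: string;
  children?: TreeNode[];
}

// Containers that may legitimately have no children.
const MAY_BE_EMPTY = new Set<string>(["table_cell", "image_group"]);

// Collect one error per offending node; no global follow-up message.
function validateTree(node: TreeNode, errors: string[] = []): string[] {
  if (node.structure !== undefined) {
    const children = node.children ?? [];
    if (children.length === 0 && !MAY_BE_EMPTY.has(node.structure)) {
      errors.push(`${node.id}: container "${node.structure}" has no children`);
    }
    for (const child of children) validateTree(child, errors);
  }
  return errors;
}
```

Each error names the node that is actually empty, which is the information the removed "tree nesting too deep" message obscured.
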

Tells the structurer to nest shared-background content under one
image_group and to emit plain leaves for backgrounds that don't match
any extracted image_id, rather than borrowing an unrelated one.

Replaces the image_group container with two distinct shapes mirroring
HTML: an image leaf (role: "image" + image_id) for foreground images
and an optional background_image_id on any container for backdrops.
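
The two shapes can be written down as types. Only `role: "image"`, `image_id`, and `background_image_id` come from the commit message; the remaining field names, the type names, and the `collectImageIds` helper are assumptions for this sketch.

```typescript
// Foreground image: a leaf, comparable to an <img> element.
interface ImageLeaf {
  role: "image";
  image_id: string;
}

// Any other leaf carries text (assumed field name).
interface TextLeaf {
  role: string;
  text: string;
}

// Backdrop: an optional attribute on any container, comparable to a CSS
// background on a wrapping element.
interface Container {
  structure: string;
  background_image_id?: string;
  children: PageNode[];
}

type PageNode = ImageLeaf | TextLeaf | Container;

interface ImageIds {
  foreground: string[];
  background: string[];
}

// Walk the tree and separate foreground image ids from backdrop ids.
function collectImageIds(
  node: PageNode,
  acc: ImageIds = { foreground: [], background: [] },
): ImageIds {
  if ("children" in node) {
    if (node.background_image_id !== undefined) {
      acc.background.push(node.background_image_id);
    }
    for (const child of node.children) collectImageIds(child, acc);
  } else if ("image_id" in node) {
    acc.foreground.push(node.image_id);
  }
  return acc;
}
```
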

Lower the default min_side image filter to 10 so visually meaningful
small content (signs, labels) reaches the structurer, allow the LLM
to omit images embedded in the PDF but not visible on the page, and
disable the self-review refinement loop by default for one-shot
structuring.
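
A sketch of a `min_side` filter under one plausible reading: an image is kept when its shorter side is at least `min_side` pixels. That interpretation, the inclusive comparison, and the names below are assumptions; only the default value of 10 comes from the commit message.

```typescript
interface ExtractedImage {
  id: string;
  width: number;
  height: number;
}

// Default taken from the commit message.
const DEFAULT_MIN_SIDE = 10;

// Keep images whose shorter side is at least `minSide` (assumed semantics).
function filterByMinSide(
  images: ExtractedImage[],
  minSide: number = DEFAULT_MIN_SIDE,
): ExtractedImage[] {
  return images.filter((img) => Math.min(img.width, img.height) >= minSide);
}
```

With the default, a 12x40 label survives the filter, while a stricter threshold such as 50 would drop it before it reached the structurer.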