fix(notebook): stop streaming jumpiness — disable smooth markdown + stable adapter #695

Open
Movm wants to merge 6 commits into master from fix/notebook-chat-jumpiness

Conversation

@Movm Movm (Collaborator) commented Apr 26, 2026

Summary

Notebook chat answers visibly jump up and down during streaming on long, citation-dense responses. Two unrelated causes were compounding:

  1. MarkdownTextPrimitive smooth-streaming. assistant-ui's `smooth=true` default re-buffers SSE chunks and reveals them character-by-character via `requestAnimationFrame`. With notebook's citation-dense answers (placeholder `<sup>` badges via `processChildren(text, citationMap, true)`), this caused continuous line-wrap recalculation — observable as +400/-500 pixel oscillations mid-stream. Diagnosed via a temporary `ResizeObserver`-based viewport jump monitor: identical `+1256 / −1252` cancellation pairs landed on every `useSmooth` animation frame.
  2. `NotebookChatProvider.getConfig` recreated the adapter on every prop identity change. `useCallback` had seven deps; any upstream churn (e.g. `config.collections` rebuilt by `getNotebookConfig` per render) recreated the adapter, which forced `useLocalRuntime` to reinitialize and reset scroll/streaming state. Logged in console as the existing `[Notebook] ⚠ Adapter RECREATED` warning, firing before the first request.

General chat is unaffected — its answers are shorter, less citation-dense, and its provider doesn't churn the adapter the same way.

Fix

  • New `MarkdownStreamingContext`: per-thread switch for `smooth`. Default `true` (matches assistant-ui, preserves typewriter feel for general chat). `NotebookChatProvider` overrides to `false`. `CitationMarkdownText` reads from context.
  • Stabilized `getConfig`: same ref-pattern already in use for `threadId`/`getFilters`/`onComplete`. `getConfig` is now a `useCallback([])` that snapshots all inputs from refs at request time — adapter is created exactly once per provider mount.
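
The ref-snapshot idea behind the stabilized `getConfig` can be sketched in plain TypeScript (a minimal illustration — `makeStableGetConfig`, the `Ref` shape, and the `collections` field are hypothetical stand-ins, not the actual provider code):

```typescript
// A mutable ref carries the latest inputs; the callback is created once,
// so its identity is stable, yet it reads fresh values at call time.
// In React this would be a useCallback with an empty dep array reading
// from useRef; here a plain closure stands in for both.
type Ref<T> = { current: T };

interface AdapterConfig {
  collections: string[];
}

function makeStableGetConfig(inputsRef: Ref<AdapterConfig>): () => AdapterConfig {
  // Created exactly once; never recreated when inputs change.
  return () => ({ ...inputsRef.current });
}

const inputsRef: Ref<AdapterConfig> = { current: { collections: ["docs"] } };
const getConfig = makeStableGetConfig(inputsRef);

const before = getConfig();                              // snapshots ["docs"]
inputsRef.current = { collections: ["docs", "notes"] };  // upstream churn
const after = getConfig();                               // same identity, fresh data
```

Because `getConfig`'s identity never changes, anything keyed on it (here, the adapter passed to `useLocalRuntime`) is created once per mount instead of on every prop churn.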

Files

  • `packages/chat/src/context/MarkdownStreamingContext.tsx` (new)
  • `packages/chat/src/runtime/NotebookChatProvider.tsx` (refs + provider wrap)
  • `packages/chat/src/components/message-parts/CitationMarkdownText.tsx` (read context, pass `smooth`)
  • `packages/chat/src/index.ts` (export new context APIs)

Test plan

  • Notebook chat: send a long answer (3k+ chars, many citations). No mid-stream up/down jumps. End-of-stream snap is reduced to a single small delta.
  • Notebook chat: `[Notebook] ⚠ Adapter RECREATED` warning no longer fires before the first request.
  • General chat: typewriter animation still active for incoming chunks.
  • No regression in citation badges (placeholders during stream → real popovers after).

Movm added 6 commits April 25, 2026 19:50
Fixes the admin gate ("Kein Zugriff" page) caused by the legacy /status
handler stripping fields from the user response. The hand-picked literal
omitted is_admin and ~30 other UserProfile fields, so the frontend's
useAuthStore could never see is_admin=true even when the DB row had it.

Replaces the manual pick with toBetterAuthUser() — the same null-strip +
Zod-parse boundary already used by authMiddleware. The new ts-rest
contract enforces the response shape end-to-end, making this regression
class impossible to reintroduce.

- packages/contracts: authStatusContract + authStatusResponseSchema
  reusing userProfileSchema.nullable()
- apps/api: authStatusContractRouter mounted in routes.ts before the
  legacy authRouter; legacy /status removed from authCore.ts
- packages/shared: authStatus added to the typed ContractsClient
- middleware: toBetterAuthUser exported for reuse
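
The regression class being closed off can be sketched like this (illustrative only — `pickedUser` and `stripNulls` are hypothetical stand-ins, not the real `toBetterAuthUser`, and the real boundary additionally Zod-parses the result):

```typescript
// The buggy pattern: a hand-picked literal that silently forgets fields.
function pickedUser(row: Record<string, unknown>) {
  return { id: row.id, email: row.email }; // is_admin and ~30 others dropped
}

// The fixed pattern: pass the whole row through a null-strip boundary
// (the real code then validates the shape with a Zod schema).
function stripNulls(row: Record<string, unknown>) {
  return Object.fromEntries(
    Object.entries(row).filter(([, value]) => value !== null),
  );
}

const row = { id: "u1", email: "a@b.c", is_admin: true, banned_at: null };
const picked = pickedUser(row);   // loses is_admin
const stripped = stripNulls(row); // keeps is_admin, drops only null fields
```

Whole-object pass-through plus schema validation means a newly added profile field reaches the frontend by default, instead of requiring someone to remember to extend a literal.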
…page

Remove the two cards from the StatisticsSection grid so the startpage
focuses on document count and date range. Collapse the grid from
four columns to two so the remaining cards fill the row at all
breakpoints, and shrink the matching skeleton placeholders.
Regolo's gemma4-31b endpoint hangs upstream — every notebook chat with
the default Gemma model just spun forever because the AI SDK has no
built-in first-token deadline. This change:

- Adds litellmFetchWithThinkingDisabled (sibling to regoloThinkingFetch)
  injecting Ollama's `think: false` so LiteLLM-served gemma streams
  content instead of burning its entire token budget on `reasoning`.
- Re-routes the user-facing "Gemma 4" model from Regolo → LiteLLM. Old
  `gemma-regolo` ID is aliased server-side and migrated client-side
  (chatStore v6) to the new `gemma-litellm` ID.
- Adds Qwen 3.6 27B as a selectable model (already in the existing
  Regolo reasoning-stream allowlist, so no extra wiring).
- Introduces a 20s first-token deadline + single-step cross-provider
  fallback (gemma-litellm ↔ gpt-oss-regolo) in responseStreamingService.
  Qwen entries intentionally have no `fallback` field — preserving the
  Chinese-only-when-selected firewall (an informed-consent boundary,
  documented in ModelConfig).
- Fixes pre-existing bug: getModel('litellm', modelId) ignored the
  modelId arg and always used LITELLM_DEFAULT_MODEL.

Fallback is silent end-user-side: server emits a `fallback` SSE event,
both runtime adapters log it to the browser console, no UI banner.

Implementation notes:
- streamAndAccumulate / streamAndAccumulateWithReasoning now have a
  shared `wrapWithCompatCatch` factory and an `*OrThrow` internal layer
  used by streamWithFallback. Existing chat router callers see the same
  null-on-failure shape, plus the new deadline + empty-completion
  safety nets for free.
- Single shared deadline across initial-probe iterations (was
  accidentally giving 40s grace via per-call setTimeout).
- Reasoning streamer split into Phase-1 (race vs deadline until first
  text) + Phase-2 (drain without race) — eliminates wasted Promise.race
  microtask hops on every reasoning chunk after first content.
- Uses native AbortSignal.any() (Node 20.3+) instead of a hand-rolled
  composeAbortSignals helper.
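
The two-phase deadline pattern described above can be sketched roughly as follows (a simplified illustration under stated assumptions — `streamWithFirstTokenDeadline` is a hypothetical name, and the real `responseStreamingService` additionally wires in abort signals and the cross-provider fallback):

```typescript
// Phase 1 races only the first chunk against a deadline; Phase 2 drains
// the rest with no Promise.race microtask overhead per chunk.
async function streamWithFirstTokenDeadline(
  stream: AsyncIterable<string>,
  deadlineMs: number,
): Promise<string> {
  const it = stream[Symbol.asyncIterator]();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error("no first token before deadline")),
      deadlineMs,
    );
  });

  let text = "";
  try {
    // Phase 1: only the wait for the first chunk is raced.
    const first = await Promise.race([it.next(), deadline]);
    if (first.done) throw new Error("empty completion");
    text += first.value;
  } finally {
    clearTimeout(timer); // single shared deadline, cancelled on first token
  }

  // Phase 2: plain drain — no race on subsequent chunks.
  for (let r = await it.next(); !r.done; r = await it.next()) {
    text += r.value;
  }
  return text;
}
```

On a deadline rejection, the real service would retry once on the paired model (gemma-litellm ↔ gpt-oss-regolo) and emit the `fallback` SSE event.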
…+ stable adapter

Two unrelated causes worked together to make notebook chat answers visibly
jump up and down during streaming:

1. assistant-ui's MarkdownTextPrimitive defaults to smooth=true, which
   re-buffers SSE chunks and reveals them character-by-character via
   requestAnimationFrame. With notebook's citation-dense answers
   (placeholder <sup> badges via processChildren(..., true)), this caused
   continuous line-wrap recalculation — observable as +400/-500 pixel
   oscillations in mid-stream. General chat is fine because its answers
   are shorter and less citation-dense.

2. NotebookChatProvider's getConfig was wrapped in useCallback with seven
   deps. Any upstream prop identity churn (e.g. config.collections rebuilt
   by getNotebookConfig on every render) recreated the adapter, which
   forced useLocalRuntime to reinitialize and reset scroll/streaming
   state. Visible as the "[Notebook] ⚠ Adapter RECREATED" warning firing
   before the first request.

Fix:
- New per-thread MarkdownStreamingContext. Default smooth=true preserves
  the typewriter feel for general chat. NotebookChatProvider sets
  smooth=false. CitationMarkdownText reads from context.
- All getConfig inputs moved behind refs (same pattern already used for
  threadId/getFilters/onComplete). getConfig itself is now a stable
  useCallback([]) that snapshots refs at request time, so the adapter is
  created exactly once per provider mount.
