Localization (L10n) Developer Guide

PauseAI is multilingual. The website ships English source content and lets an LLM-powered pipeline generate translations into the configured locales. This guide tells you what you need to know — proportional to how much you're going to touch the system.

Status note (April 2026): no locales are currently enabled in default-settings.js on main, so production and main-targeted deploy previews are English-only. The l10n pipeline is exercised on dedicated preview branches (l10n-pl, l10n-es). Locales will go live in main in a later change.

TL;DR for Content Authors

You almost certainly don't need to read the rest of this document.

When you push a branch or open a PR:

  • CI automatically runs the l10n pipeline against your branch
  • Your Netlify deploy preview will include translations for all production locales (currently none, per the status note above)
  • You edit content in English; the rest happens for you

You only need to keep reading if:

  • You're adding a new locale to production
  • You're hitting an l10n-related build failure
  • You're working on the l10n system itself

Concepts

  • Locales: language/region combinations (en, de, nl, pl, es, …)
  • L10n Cage: a separate Git repository (PauseAI/paraglide) cloned locally as l10n-cage/. It stores every generated translation as a versioned cache so we don't pay the LLM for the same content twice.
  • Modes: the script picks one of three modes based on configuration:
    • en-only (no LLM, default for local dev)
    • dry-run (build a plan and estimate cost, no LLM calls)
    • perform (call LLMs for any pending items)
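
As a rough illustration of how the three modes relate, here is a minimal sketch. The function name, the input shape, and the exact precedence of the checks are assumptions made for this example; the real script may decide differently.

```typescript
// Hypothetical sketch: determineMode and ModeInputs are illustrative names,
// and the rule ordering is an assumption, not the real script's logic.
type Mode = "en-only" | "dry-run" | "perform";

interface ModeInputs {
  locales: string[]; // resolved locale list, e.g. ["en", "pl"]
  dryRun: boolean;   // true when --dryRun was passed
  apiKey?: string;   // value of L10N_OPENROUTER_API_KEY, if set
}

function determineMode(inputs: ModeInputs): Mode {
  const targets = inputs.locales.filter((l) => l !== "en");
  if (targets.length === 0) return "en-only"; // nothing to translate
  if (inputs.dryRun) return "dry-run";        // plan and estimate only
  // perform mode needs a usable key (the guide mentions 10+ characters).
  if (!inputs.apiKey || inputs.apiKey.length < 10) {
    throw new Error("API key too short");
  }
  return "perform";
}
```

With only `en` configured, no flag or key matters: the sketch settles on en-only before looking at anything else.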

Commands

Most developers will only ever invoke l10n indirectly:

pnpm dev     # development server, en-only on dev machines
pnpm build   # production build, runs l10n in whatever mode is configured

If you do need to run l10n directly:

pnpm l10n            # default mode (usually en-only locally)
pnpm l10n --dryRun   # plan and estimate cost, no LLM calls
pnpm l10n --verbose  # detailed file-by-file output

Flags

| Flag | Purpose |
| --- | --- |
| --dryRun | Skip LLM calls, just plan and estimate (camelCase, not --dry-run) |
| --verbose | Detailed file-by-file logging |
| --force <file> [<file>...] | Re-translate listed files even if cache appears fresh |
| --spend N | Authorize up to $N for this run (overrides defaults) |

--force takes a list of filenames (basenames). It does not accept glob patterns. Its main use is repairing a cage entry that was silently broken — see Translation Quality Guardrails below.

Environment

PARAGLIDE_LOCALES=en              # English only (default for local dev)
PARAGLIDE_LOCALES=en,nl,de        # Specific locales
PARAGLIDE_LOCALES=all             # All available locales (default in CI)
PARAGLIDE_LOCALES=-fr,es          # All except French and Spanish

L10N_OPENROUTER_API_KEY=...       # Required for perform mode

For preview branches named l10n-XX (two-letter locale code), the script auto-derives the locale when PARAGLIDE_LOCALES is unset, so pushing to l10n-pl builds for Polish provided pl is in default-settings.js.
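
The PARAGLIDE_LOCALES forms listed above can be read as a small grammar: a plain list, the keyword all, or an exclusion list introduced by a leading minus. The sketch below is one possible interpretation; resolveLocales is a hypothetical helper, and the treatment of the minus prefix is inferred from the -fr,es example rather than taken from the real parser.

```typescript
// Sketch only: resolveLocales is a hypothetical helper, and the handling of
// the "-" prefix is an assumption inferred from the examples in this guide.
function resolveLocales(spec: string, available: string[]): string[] {
  const value = spec.trim();
  if (value === "all") return available.slice();

  const names = value
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s !== "");

  if (value.charAt(0) === "-") {
    // Exclusion form: "-fr,es" means every available locale except fr and es.
    const excluded = names.map((n) => n.replace(/^-/, ""));
    return available.filter((l) => excluded.indexOf(l) === -1);
  }
  // Inclusion form: keep only locales that are actually configured.
  return available.filter((l) => names.indexOf(l) !== -1);
}
```

Filtering against `available` in both branches mirrors the rule that a locale only takes effect if it is present in default-settings.js.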

Adding a New Locale

Test it on a preview branch first

  1. Create a test branch named l10n-XX:

    git checkout -b l10n-fr origin/main
  2. Add the locale to project.inlang/default-settings.js:

    locales: ['en', 'de', 'nl', 'fr']
  3. (Optional) inspect cost locally before pushing:

    PARAGLIDE_LOCALES=en,fr pnpm l10n --dryRun --verbose
  4. Push to trigger the preview build:

    git push -u origin l10n-fr

    Your preview will be at https://l10n-fr--pauseai.netlify.app. The l10n cage will mirror your branch under the same name.

Promote to production

Once you're happy with the preview:

  1. Add the locale to default-settings.js on main
  2. Choose a translation strategy:
    • Fresh: let production regenerate translations from scratch
    • Preserve tested translations: first merge the cage's l10n-XX branch into the cage's main branch, then production will use what you already reviewed

Troubleshooting

Skipping <file>: LLM response incomplete (finish_reason: length)

  • The model output got truncated. Build still succeeds, but the file is left un-translated and renders English.
  • Fix: pnpm l10n --force <file> to retry. If it persists, the file may need splitting.

Estimated cost $X exceeds spend limit $Y

  • The work plan wants more than the local ($0.10) or CI ($0.50) cap allows.
  • Fix: re-run with --spend N where N ≥ the estimate, after sanity-checking that you actually meant to do that much work.

Cannot write to main branch

  • Branch safety prevents local writes to the cage's main branch.
  • Fix: work on a feature branch in the website repo, or use --dryRun.

API key too short

  • Set a valid L10N_OPENROUTER_API_KEY (10+ characters).

Git push authentication failed

  • Configure a GitHub Personal Access Token, or use --dryRun.

Page renders English on a translated branch

  • Possible causes: the file is in the cage but truncated (see The Cache-by-Commit-Date Footgun below), it was never translated, or the locale isn't enabled in default-settings.js.
  • Inspection: compare l10n-cage/md/<locale>/<file>.md against the source in src/posts/. If it's noticeably shorter or cut off mid-tag, force-retranslate it.

For deeper debugging, pnpm l10n --dryRun --verbose shows mode determination, cage state, and the per-file plan with reasons.


Inside the L10n System

The rest of this document is for people working on the l10n pipeline itself, or on a build failure that turned out to need real diagnosis. It explains how the pieces fit together, why they're shaped that way, and where the known footguns live.

L10n Cage Architecture

The l10n cage is a Cache Adopting Git's Engine: a separate Git repository at PauseAI/paraglide, cloned locally as l10n-cage/. Each translation is a tracked file under version control.

l10n-cage/
├── json/                       # Aggregated short messages
│   ├── de.json
│   ├── nl.json
│   └── ...
├── md/                         # Localized markdown pages
│   ├── de/
│   │   ├── faq.md
│   │   ├── proposal.md
│   │   └── ...
│   └── ...
└── work/                       # Work plans and completion records
    ├── todo-pl.json            # Pending plan for the pl locale (usually empty)
    ├── todo-es.json
    └── 2026-04-07T1514.json    # Completion record from a past run

Why a separate repo? It gives us version control and an audit trail for every translation decision, branch isolation between concurrent feature work, and free deduplication of LLM work, while keeping the noise and bulk of generated content out of the website's history.

The l10n script clones the cage on first run and fetches and pulls on subsequent runs. During a perform run it commits and pushes incrementally (after each batch), so partial progress survives a build timeout.

Branch Safety

Cage and website are linked by branch name. If your website branch is my-feature, the script uses (or creates) the cage's my-feature branch. This isolates l10n work between concurrent development streams.

To prevent accidental writes to production translations, the script will not let local development write to the cage's main branch. Work on a feature branch in the website repo and you'll get a matching cage branch automatically. CI is allowed to write to main because that's how production deploys land.

The branch-derived locale logic (branchLocale() in src/lib/env.ts) extends the same pattern to locales: a website branch named l10n-XX (two-letter locale code) auto-sets PARAGLIDE_LOCALES=XX.
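
To make the branch-name convention concrete, here is a hypothetical re-implementation. It is not the real branchLocale() from src/lib/env.ts, whose signature and details may differ; the configured-locale check reflects the rule that the locale must also appear in default-settings.js.

```typescript
// Hypothetical re-implementation for illustration; the real branchLocale()
// in src/lib/env.ts may differ in signature and details.
function branchLocaleSketch(branch: string, configured: string[]): string | undefined {
  // Only exact "l10n-XX" names with a two-letter lowercase code qualify.
  const match = /^l10n-([a-z]{2})$/.exec(branch);
  if (!match) return undefined;
  const locale: string | undefined = match[1];
  // The derived locale only counts if it is also configured.
  return locale !== undefined && configured.indexOf(locale) !== -1 ? locale : undefined;
}
```

Under these assumptions, `l10n-pl` yields `pl` when `pl` is configured, while `my-feature` or `l10n-pl-test` yield nothing and leave PARAGLIDE_LOCALES to its usual defaults.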

Work Plan System

A perform run splits into three phases:

Phase 1: Plan

Walks the source tree, compares each source file against its cached translation in the cage by git commit date, and assembles a list of WorkItems describing what needs (re-)translating. The plan is written to l10n-cage/work/todo-{locales}.json so it can be inspected or resumed. Per-locale-set todo files prevent two concurrent runs (e.g. pl and es) from clobbering each other's plans.
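
The planning rule can be sketched as a pure function over commit dates. Everything here is illustrative: the FileState and WorkItem shapes and the planWork name are assumptions, and the real planner reads the dates from git rather than taking them as input.

```typescript
// Illustrative only: these shapes and planWork() are assumptions,
// not the types used in the real planner.
interface FileState {
  file: string;        // e.g. "faq.md"
  sourceCommit: Date;  // last commit date of the English source
  cachedCommit?: Date; // last commit date of the cage translation, if any
}

interface WorkItem {
  file: string;
  locale: string;
  reason: "missing" | "stale" | "forced";
}

function planWork(locale: string, files: FileState[], forced: string[] = []): WorkItem[] {
  const items: WorkItem[] = [];
  for (const f of files) {
    if (forced.indexOf(f.file) !== -1) {
      items.push({ file: f.file, locale, reason: "forced" });
    } else if (!f.cachedCommit) {
      items.push({ file: f.file, locale, reason: "missing" });
    } else if (f.cachedCommit.getTime() <= f.sourceCommit.getTime()) {
      items.push({ file: f.file, locale, reason: "stale" });
    }
    // Otherwise the cached translation is newer than the source: treat as fresh.
  }
  return items;
}
```

Note that nothing in this check looks at the content of the cached file, which is why a broken translation with a recent commit date still counts as fresh.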

The plan summary in build output looks like:

Messages: 1 cached, 0 need translation
Markdown: 99 cached, 1 need translation

Work plan: 1 items, estimated $0.03
Model: meta-llama/llama-3.3-70b-instruct:nitro
Branch: l10n-pl
  pl: 1 files, $0.03

Phase 2: Spend Gate

If the plan's estimated cost exceeds the spend limit (default $0.10 locally, $0.50 in CI), the script aborts with a message telling you the exact --spend N value needed to proceed. This protects against runaway runs — for example, someone blowing away the cage and triggering a full re-translation by accident.
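
A minimal sketch of the gate, assuming a hypothetical checkSpendGate helper. The default caps and the --spend override come from this guide; the function name, option names, and the wording after "exceeds spend limit" are made up for the example.

```typescript
// Sketch under assumptions: function and option names are hypothetical; the
// defaults ($0.10 local, $0.50 CI) and the --spend override come from the guide.
function checkSpendGate(estimate: number, opts: { ci: boolean; spend?: number }): void {
  const limit = opts.spend ?? (opts.ci ? 0.5 : 0.1);
  if (estimate > limit) {
    // Round the suggestion up to the next cent so it always covers the estimate.
    const needed = (Math.ceil(estimate * 100) / 100).toFixed(2);
    throw new Error(
      `Estimated cost $${estimate.toFixed(2)} exceeds spend limit $${limit.toFixed(2)}; ` +
        `re-run with --spend ${needed} to proceed`
    );
  }
}
```

Throwing before any LLM call is what makes the gate cheap: an accidental full re-translation costs nothing until someone explicitly raises the limit.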

Phase 3: Execute

Calls the LLM for each plan item in sequence (BATCH_SIZE=1 currently). Commits the cage after each batch and pushes incrementally. Records completion in l10n-cage/work/<timestamp>.json.

Why split planning and execution at all? Because LLM calls are expensive enough that you want a chance to look at the bill before paying it, and because executing as a separate phase makes it possible to resume after a failure without re-doing the whole plan walk.

Cost Visibility

Billing Snapshot

Each perform run captures Limit Remaining from OpenRouter before any LLM work, and again at the end (in a try/finally so it fires on success, exception, or SIGTERM from a Netlify build timeout). The delta is printed at the end of the l10n stage:

💰 OpenRouter spend this run: $0.0144
   ($27.9154 → $27.9009 remaining)

If either query fails, the line falls back to "unable to determine".
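
The before/after pattern can be sketched as follows. This is an illustration under assumptions: readRemaining stands in for the OpenRouter query and returns null on failure, both function names are hypothetical, and the code is synchronous for brevity where the real queries would be asynchronous.

```typescript
// Sketch under assumptions: readRemaining stands in for the OpenRouter query
// and returns null on failure. Synchronous for brevity; real calls are async.
function formatSpend(before: number | null, after: number | null): string {
  if (before === null || after === null) {
    return "💰 OpenRouter spend this run: unable to determine";
  }
  return (
    `💰 OpenRouter spend this run: $${(before - after).toFixed(4)}\n` +
    `   ($${before.toFixed(4)} → $${after.toFixed(4)} remaining)`
  );
}

function withBillingSnapshot<T>(
  readRemaining: () => number | null,
  report: (line: string) => void,
  work: () => T
): T {
  const before = readRemaining();
  try {
    return work();
  } finally {
    // Runs whether work() returned normally or threw.
    report(formatSpend(before, readRemaining()));
  }
}
```

A try/finally on its own covers normal return and exceptions. Reaching the same path on SIGTERM additionally needs a signal handler, which this sketch leaves out.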

This is the ground truth for how much a run cost. Everything else (the planner's estimate, the cost-per-1000-words constant) is a prediction.

Estimator vs Actual

The cost estimator in scripts/l10n/dry-run.ts uses a single COST_PER_1000_WORDS figure. It's a crude predictor: the model entry is named for the bare meta-llama/llama-3.3-70b-instruct, but in practice we route via :nitro (faster providers, more expensive). The current calibration was derived from a single real measurement (April 2026 l10n-es full build, 58 successful items, ratio ~4.3× over the previous theoretical figure).

Treat the estimator output as an order-of-magnitude guide, not a quote. The billing snapshot in the build log is the ground truth, and recalibration will get easier as more snapshots accumulate.
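
For orientation, a word-count estimator of this kind might look like the sketch below. The function names are hypothetical and the constant is passed in as a placeholder, because the real value lives in scripts/l10n/dry-run.ts and is not reproduced here. The recalibrate helper shows one simple way a billing snapshot could feed back into the constant; it is an assumption, not existing behaviour.

```typescript
// Sketch only: names are hypothetical and costPer1000Words is a
// caller-supplied placeholder, not the real calibration value.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function estimateCost(sources: string[], costPer1000Words: number): number {
  const words = sources.reduce((sum, s) => sum + countWords(s), 0);
  return (words / 1000) * costPer1000Words;
}

// Recalibration from one billing snapshot: scale the constant by actual/estimated.
function recalibrate(current: number, estimated: number, actual: number): number {
  return current * (actual / estimated);
}
```

A single multiplicative constant cannot capture per-provider pricing or output length differences between locales, which is why the result is only an order-of-magnitude guide.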

Translation Quality Guardrails

LLMs sometimes return broken translations: truncated mid-output, content-filtered, or just garbage. The framework defends in three layers, each catching a different failure mode.

1. finish_reason Guard

scripts/l10n/llm-client.ts rejects any response where finish_reason !== 'stop', most importantly length (output token cap hit, response truncated) and content_filter. It throws an error rather than silently accepting partial content.
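
A minimal sketch of such a guard, assuming the OpenAI-style chat completion shape that OpenRouter returns. The ChatChoice interface and the assertComplete name are hypothetical; the real check in llm-client.ts may be structured differently.

```typescript
// Minimal sketch: assertComplete is a hypothetical name, and ChatChoice is a
// trimmed-down assumption about the OpenAI-style response shape.
interface ChatChoice {
  finish_reason: string | null;
  message: { content: string | null };
}

function assertComplete(choice: ChatChoice): string {
  if (choice.finish_reason !== "stop") {
    // Covers "length", "content_filter", and any other non-stop reason.
    throw new Error(
      `LLM response incomplete (finish_reason: ${choice.finish_reason}). ` +
        "Output may be truncated or filtered."
    );
  }
  return choice.message.content ?? "";
}
```

Checking for the one acceptable value instead of listing the bad ones means a finish reason nobody anticipated is still rejected.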

2. Skip Individual Failures

If one file fails (truncation, garbage, or any other LLM error), scripts/l10n/heart.ts logs a warning and skips to the next file rather than aborting the whole batch:

⚠️  Skipping local-organizing.md (es): LLM response incomplete (finish_reason: length). Output may be truncated or filtered.

The build still succeeds. The skipped file remains un-translated in the cage, so the production build serves the English fallback for that page until it's manually fixed with --force.
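
The skip-and-continue behaviour can be sketched as a loop with a per-item try/catch. The names here (Job, translateAll, the callbacks) are hypothetical and do not reflect the real heart.ts API; the loop is synchronous for brevity.

```typescript
// Sketch with hypothetical names; not the real heart.ts API.
// Kept synchronous for brevity, whereas real LLM calls are asynchronous.
interface Job {
  file: string;
  locale: string;
}

function translateAll(
  jobs: Job[],
  translate: (job: Job) => void, // would call the LLM and write into the cage
  warn: (msg: string) => void
): { translated: string[]; skipped: string[] } {
  const translated: string[] = [];
  const skipped: string[] = [];
  for (const job of jobs) {
    try {
      translate(job);
      translated.push(job.file);
    } catch (err) {
      // One bad file must not abort the rest of the batch.
      const reason = err instanceof Error ? err.message : String(err);
      warn(`⚠️  Skipping ${job.file} (${job.locale}): ${reason}`);
      skipped.push(job.file);
    }
  }
  return { translated, skipped };
}
```

Returning the skipped list, rather than only logging it, would make it easy to surface the files that still need a --force retry at the end of a run.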

3. The Cache-by-Commit-Date Footgun

The cache freshness check uses git commit dates: if the cage's translation file has a newer commit date than the source file, the cache entry is considered fresh and not re-translated.

This means a translation that was committed broken in the past (e.g. before the finish_reason guard existed, or as a force-translation of half a file) will sit in the cage forever, looking fresh, until something else trips on it. We hit this in April 2026 when a vite upgrade exposed a 12-day-old truncated markdown file that had been silently broken all that time — the build had been "succeeding" while quietly serving a half-finished page.

The fix is pnpm l10n --force <filename> to re-translate the bad cage entry. The finish_reason guard now ensures the new attempt either succeeds cleanly or fails loudly. After the cage commit lands, future builds will use the fresh content.

This footgun is a strong argument for the request-keyed cache discussed in Ongoing Evolution below.

Ongoing Evolution

This system is mid-evolution. A few directions we expect to explore:

Alternative models. We currently use meta-llama/llama-3.3-70b-instruct:nitro via OpenRouter. Translation quality is acceptable but we have no formal comparison against other models. The deploy permalinks for the current Polish and Spanish baselines are recorded so they can be reviewed side-by-side against, e.g., a Gemini Flash run on the same content. That comparison work is pending.

Request-keyed cache. The current cache freshness check is heuristic: it compares commit dates of source and translation. This is fast but lossy (the cache-by-commit-date footgun). A cleaner alternative is to key cache entries by a hash of the actual translation request (source content + prompt + model + locale). That eliminates the heuristic — a translation is fresh iff the exact request would produce it again — and removes the silent-truncation hazard entirely. Not yet committed work, but a serious candidate the next time the cache layer is touched.
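
To illustrate the request-keyed idea, here is a hypothetical sketch; none of it exists in the codebase. The key is a hash over the four request components named above. An FNV-1a-style hash over UTF-16 code units is used only to keep the example dependency-free; a real implementation would more likely use a cryptographic digest such as SHA-256.

```typescript
// Hypothetical sketch of a request-keyed cache. The FNV-1a-style hash is a
// stand-in to avoid dependencies; a real version would likely use SHA-256.
interface TranslationRequest {
  source: string;
  prompt: string;
  model: string;
  locale: string;
}

function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

function requestKey(req: TranslationRequest): string {
  // JSON array encoding keeps field boundaries unambiguous.
  return fnv1a(JSON.stringify([req.source, req.prompt, req.model, req.locale]));
}

function isFresh(cache: Record<string, string>, req: TranslationRequest): boolean {
  // Fresh only if this exact request has been answered before.
  return Object.prototype.hasOwnProperty.call(cache, requestKey(req));
}
```

Under this scheme, editing the source, the prompt, or the model changes the key, so the old entry simply stops matching and no date comparison is needed.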

Cost calibration. The current cost estimator is calibrated from a single data point. As more billing snapshots accumulate in CI logs, the calibration constant should be revisited — and the code that bridges the bare model name with the :nitro-routed reality should be tidied at the same time.

Production Considerations

The l10n system is designed to run automatically in CI/CD with minimal supervision:

  • Defaults to all locales in CI (no override needed)
  • Production API key comes from environment
  • Branch deploys use the branchLocale() derivation, so l10n-XX previews need no special config
  • Build timeouts are survivable thanks to incremental cage pushes and the SIGTERM billing snapshot
  • Spend gate prevents budget surprises

The CI spend cap ($0.50) is set just above one full re-translation of all current locales. If a build hits it, that's a signal to investigate before just bumping the cap.