Paperize

Distill unstructured sources into qualified goals via AI.



Point Paperize at a folder of notes, ideas, research, markdown files — and it generates actionable goals, ready for any project management system.

Works with Obsidian vaults, Zettelkasten collections, research dumps, brainstorm folders, or any pile of text files. Handles hundreds of files through an intelligent map-reduce pipeline that never truncates your content.

One command: npx paperize --source ~/notes



Quick Start

npx paperize --source ~/notes

That's it. Paperize scans your folder, calls Claude, and outputs goals as JSON.

Requires an ANTHROPIC_API_KEY — pass it inline or export it:

# Inline
ANTHROPIC_API_KEY=sk-ant-... npx paperize --source ~/notes

# Or export once
export ANTHROPIC_API_KEY=sk-ant-...

Install

npx paperize              # run directly (no install)
npm i -g paperize         # or install globally -> paperize

Requires Node.js 20+.


Usage

Headless mode (non-interactive)

Pass --source to skip the wizard. No TTY required — fully scriptable.

# Scan and generate goals
paperize --source ~/notes

# Steer the AI with guiding context
paperize --source ~/research --context "Focus on SaaS product ideas"

# Read context from a file
paperize --source ~/research --context-file brief.md

# Save output in different formats
paperize --source ./ideas --output goals.json
paperize --source ./ideas --output goals.md --format markdown
paperize --source ./ideas --output goals.yaml --format yaml

# Control creativity level
paperize --source ~/notes --vibe wild         # more ideas, speculative goals
paperize --source ~/notes --vibe focused      # strict, high-confidence only

# Dry run — scan only, no AI
paperize --source ~/notes --dry-run

# Use a different model
paperize --source ~/notes --model claude-opus-4-6
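
Because headless mode needs no TTY, it can also be driven from a Node.js script. The following is a minimal sketch, not part of Paperize itself: `buildPaperizeArgs` and its option names (`contextFile`, `dryRun`, `maxGoals`, and so on) are hypothetical, and it emits only flags documented in this README.

```javascript
// Hypothetical helper (not shipped with Paperize): turns an options object
// into an argv array using only the flags documented in this README.
function buildPaperizeArgs(opts) {
  if (!opts || !opts.source) {
    throw new Error("source is required in headless mode");
  }
  const args = ["--source", opts.source];
  if (opts.context) args.push("--context", opts.context);
  if (opts.contextFile) args.push("--context-file", opts.contextFile);
  if (opts.output) args.push("--output", opts.output);
  if (opts.format) args.push("--format", opts.format);
  if (opts.vibe) args.push("--vibe", opts.vibe);
  if (opts.model) args.push("--model", opts.model);
  if (opts.maxGoals != null) args.push("--max-goals", String(opts.maxGoals));
  if (opts.dryRun) args.push("--dry-run");
  return args;
}

const args = buildPaperizeArgs({
  source: "./ideas",
  output: "goals.md",
  format: "markdown",
});
console.log(["paperize", ...args].join(" "));
// prints: paperize --source ./ideas --output goals.md --format markdown
```

Passing the array to Node's `child_process.execFile("paperize", args, callback)` avoids shell quoting problems with free-text values such as `--context`.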

Interactive mode (TUI wizard)

paperize

The wizard walks you through six steps:

$ paperize

  Paperize — Goal distillation from unstructured sources

  Step 1 of 6 — Source
  Enter path to source folder: ~/notes

  Step 2 of 6 — Scan
  Found 247 files (1.8 MB, 892K chars)
  .md: 201  .txt: 38  .yaml: 8

  Step 3 of 6 — Context
  Add guiding context (optional): Focus on product roadmap items

  Step 4 of 6 — Analyze
  Strategy: map-reduce (9 batches)
  ✓ Batch 1/9 — extracted 12 ideas
  ✓ Batch 2/9 — extracted 8 ideas
  ...
  ✓ Synthesized 73 ideas into 7 goals

  Step 5 of 6 — Goals
  ❯ ✓ Build a real-time collaboration engine
    ✓ Implement usage-based billing system
    ✓ Design onboarding flow for enterprise users
    ...

  Step 6 of 6 — Done
  ✓ Wrote 7 goals to goals.json

How It Works

Paperize automatically chooses the right strategy based on source size:

Small sources (< 150K chars)

Single-shot — All files are combined into one document and sent to Claude in a single API call. Fast and cost-effective.
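
The size-based choice can be sketched as below. This is an illustration under assumptions, not Paperize's actual source: the function and constant names are made up, files are assumed to be `{ path, content }` objects, and since the headings only say "< 150K" and "> 150K", treating exactly 150K as a large source is a guess.

```javascript
// Threshold taken from the section headings; the name is an assumption.
const SINGLE_SHOT_LIMIT_CHARS = 150_000;

// Hypothetical sketch: pick a strategy from the total character count.
function chooseStrategy(files) {
  const totalChars = files.reduce((sum, f) => sum + f.content.length, 0);
  // The README does not say what happens at exactly 150K chars;
  // this sketch sends that case down the map-reduce path.
  return totalChars < SINGLE_SHOT_LIMIT_CHARS ? "single-shot" : "map-reduce";
}

console.log(chooseStrategy([{ path: "idea.md", content: "x".repeat(1000) }]));
// prints: single-shot
```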

Large sources (> 150K chars)

Map-reduce pipeline — A two-phase approach that handles arbitrarily large sources without truncation:

┌──────────────────────────────────────────────────────┐
│                    Source files                      │
│           (hundreds/thousands of files)              │
└──────────┬───────────┬───────────┬───────────────────┘
           │           │           │
     ┌─────▼─────┐ ┌───▼───┐ ┌────▼────┐
     │  Batch 1  │ │ Bat 2 │ │ Batch N │   Phase 1: Extract
     │  ~100K ch │ │       │ │         │   (parallel, up to 3)
     └─────┬─────┘ └───┬───┘ └────┬────┘
           │           │           │
           │    ideas + weights    │
           │     + attribution     │
           └───────────┼───────────┘
                       │
                ┌──────▼──────┐
                │  Synthesize │                Phase 2: Synthesize
                │  cluster &  │                (single call)
                │  prioritize │
                └──────┬──────┘
                       │
                ┌──────▼──────┐
                │    Goals    │
                │  title +    │
                │  description│
                └─────────────┘
  1. Extract — Files are split into ~100K-char batches. Each batch is processed in parallel to extract atomic ideas with source attribution and weight (strong/weak).

  2. Synthesize — All extracted ideas are merged and clustered into coherent goals. Strong ideas are prioritized. Related ideas from different batches are combined.
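
The two phases can be sketched roughly as follows. This is a simplified illustration, not the project's implementation: `makeBatches`, `mapWithLimit`, and `mapReduce` are hypothetical names, `extractIdeas` and `synthesizeGoals` stand in for the two model calls, and ideas are assumed to carry a `weight` of `"strong"` or `"weak"`. The sketch never splits a file, so a file larger than the batch size becomes its own batch; how Paperize handles that case is not stated here.

```javascript
const BATCH_CHARS = 100_000; // "~100K-char batches" per the text above
const MAX_PARALLEL = 3;      // "parallel, up to 3" per the diagram

// Greedily pack whole files into batches of roughly `limit` characters.
function makeBatches(files, limit = BATCH_CHARS) {
  const batches = [];
  let current = [];
  let size = 0;
  for (const file of files) {
    if (current.length > 0 && size + file.content.length > limit) {
      batches.push(current);
      current = [];
      size = 0;
    }
    current.push(file);
    size += file.content.length;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}

// Run `worker` over `items` with at most `limit` calls in flight,
// keeping results in input order.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  async function run() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  const runners = Array.from({ length: Math.min(limit, items.length) }, run);
  await Promise.all(runners);
  return results;
}

// Phase 1 extracts ideas per batch; Phase 2 synthesizes them in one call.
async function mapReduce(files, extractIdeas, synthesizeGoals) {
  const batches = makeBatches(files);
  const perBatch = await mapWithLimit(batches, MAX_PARALLEL, extractIdeas);
  const ideas = perBatch.flat();
  // Put strong ideas first so the synthesis step sees them with priority.
  const rank = (idea) => (idea.weight === "strong" ? 0 : 1);
  ideas.sort((a, b) => rank(a) - rank(b));
  return synthesizeGoals(ideas);
}
```

Both callbacks are expected to be async: `extractIdeas(batch)` resolves to an array of ideas, and `synthesizeGoals(ideas)` resolves to the final goal list.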


Goal Format

Each goal is self-contained and independently actionable:

{
  "title": "Build a real-time collaboration engine",
  "description": "Context: Several notes mention the need for...\n\nScope: ...\n\nSuccess criteria: ..."
}

| Field | Description |
| --- | --- |
| title | Concise, imperative voice, max ~80 chars |
| description | Context (why), scope (what), success criteria (how to measure); max 2000 words |
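
If you consume the output in your own tooling, a small sanity check along these lines may help. It is a sketch under assumptions: `validateGoal` is a hypothetical helper, the section labels (`Context:`, `Scope:`, `Success criteria:`) are inferred from the sample goal above, and the README describes the title limit as approximate, so treat the result as a warning rather than a hard failure.

```javascript
const MAX_TITLE_CHARS = 80;         // "max ~80 chars" (approximate in the README)
const MAX_DESCRIPTION_WORDS = 2000; // "max 2000 words"

// Hypothetical checker: returns a list of problems, empty if the goal looks fine.
function validateGoal(goal) {
  const problems = [];
  if (typeof goal?.title !== "string" || goal.title.trim() === "") {
    problems.push("title must be a non-empty string");
  } else if (goal.title.length > MAX_TITLE_CHARS) {
    problems.push(`title exceeds ${MAX_TITLE_CHARS} chars`);
  }
  if (typeof goal?.description !== "string" || goal.description.trim() === "") {
    problems.push("description must be a non-empty string");
  } else {
    const words = goal.description.trim().split(/\s+/).length;
    if (words > MAX_DESCRIPTION_WORDS) {
      problems.push(`description exceeds ${MAX_DESCRIPTION_WORDS} words`);
    }
    // Labels assumed from the sample goal, not from a documented schema.
    for (const label of ["Context:", "Scope:", "Success criteria:"]) {
      if (!goal.description.includes(label)) {
        problems.push(`description is missing "${label}"`);
      }
    }
  }
  return problems;
}

const sample = {
  title: "Build a real-time collaboration engine",
  description: "Context: ...\n\nScope: ...\n\nSuccess criteria: ...",
};
console.log(validateGoal(sample).length);
// prints: 0
```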

Output formats

JSON (default) — Array of goal objects.

Markdown — Each goal as an ## H2 with description body.

YAML — Structured YAML document with properly escaped strings.
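
As a rough illustration of the three formats, the sketch below renders the same goal array each way. It is not Paperize's own formatter: the function names are hypothetical, and the YAML branch relies on JSON string literals being valid YAML double-quoted scalars for typical text, which is a shortcut chosen here rather than something the README specifies.

```javascript
// Each goal becomes an H2 heading followed by its description.
function toMarkdown(goals) {
  return goals.map((g) => `## ${g.title}\n\n${g.description}\n`).join("\n");
}

// JSON.stringify handles quoting and escaping of the string values.
function toYaml(goals) {
  if (goals.length === 0) return "[]\n";
  return goals
    .map(
      (g) =>
        `- title: ${JSON.stringify(g.title)}\n` +
        `  description: ${JSON.stringify(g.description)}\n`
    )
    .join("");
}

// Hypothetical dispatcher mirroring the --format values json, markdown, yaml.
function formatGoals(goals, format = "json") {
  switch (format) {
    case "json":
      return JSON.stringify(goals, null, 2);
    case "markdown":
      return toMarkdown(goals);
    case "yaml":
      return toYaml(goals);
    default:
      throw new Error(`Unknown format: ${format}`);
  }
}

console.log(formatGoals([{ title: "Ship v1", description: "Context: ..." }], "markdown"));
```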


Options

Source

| Flag | Description | Default |
| --- | --- | --- |
| --source <path> | Path to folder with source material | (wizard prompt) |
| --context <text> | Guiding context/prompt for goal generation | |
| --context-file <path> | Read guiding context from a file | |

AI

| Flag | Description | Default |
| --- | --- | --- |
| --model <model> | Claude model for analysis | claude-sonnet-4-6 |
| --max-goals <n> | Maximum goals to generate | 10 |
| --vibe <level> | Creativity level: focused, balanced, wild | balanced |

Vibe levels:

| Vibe | Extraction | Synthesis | Typical goals |
| --- | --- | --- | --- |
| focused | Only clear, actionable ideas | Merge aggressively, strong evidence only | 1–5 |
| balanced | Standard extraction | Balanced merging | 5–15 |
| wild | Everything: speculative, half-baked, creative leaps | Preserve breadth, let unusual ideas stand alone | 10–20 |

Output

| Flag | Description | Default |
| --- | --- | --- |
| --output <path> | Write goals to file | stdout |
| --format <fmt> | Output format: json, markdown, yaml | json |
| --dry-run | Scan files only, skip AI analysis | off |

Environment variables

| Variable | Description |
| --- | --- |
| ANTHROPIC_API_KEY | Required for AI analysis |

Supported File Types

.md .txt .text .markdown .org .rst .adoc .csv .json .yaml .yml .xml .html .htm

Skips: hidden directories, node_modules, .obsidian, .trash, __pycache__. Max 512 KB per file.
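
The scan rules above amount to a simple per-file filter. The sketch below is one way to express them, under stated assumptions: `shouldInclude` is a hypothetical function, paths are taken relative to the source folder, 512 KB is read as 512 × 1024 bytes, extension matching is case-insensitive, and oversized files are assumed to be skipped rather than truncated.

```javascript
const EXTENSIONS = new Set([
  ".md", ".txt", ".text", ".markdown", ".org", ".rst", ".adoc",
  ".csv", ".json", ".yaml", ".yml", ".xml", ".html", ".htm",
]);
const SKIPPED_DIRS = new Set(["node_modules", ".obsidian", ".trash", "__pycache__"]);
const MAX_FILE_BYTES = 512 * 1024; // assumption: KB means 1024 bytes

// Hypothetical filter: relPath is relative to the source folder.
function shouldInclude(relPath, sizeBytes) {
  const parts = relPath.split(/[\\/]/).filter((p) => p !== "" && p !== ".");
  const fileName = parts.pop();
  if (fileName === undefined) return false;
  // Any hidden or explicitly skipped directory on the way excludes the file.
  for (const dir of parts) {
    if (dir.startsWith(".") || SKIPPED_DIRS.has(dir)) return false;
  }
  const dot = fileName.lastIndexOf(".");
  const ext = dot > 0 ? fileName.slice(dot).toLowerCase() : "";
  return EXTENSIONS.has(ext) && sizeBytes <= MAX_FILE_BYTES;
}

console.log(shouldInclude("projects/roadmap.md", 2048));
// prints: true
console.log(shouldInclude("node_modules/pkg/README.md", 2048));
// prints: false
```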


Development

git clone https://github.com/Yesterday-AI/paperize.git
cd paperize
npm install
npm run build        # esbuild: src/cli.jsx -> dist/cli.mjs
npm test             # node --test src/logic/*.test.js
npm run lint         # eslint src/
npm run format       # prettier --write src/

License

MIT (Yesterday)

About

CLI that distills unstructured text sources into qualified goals via AI — Ink TUI + headless mode
