| 1 | +# Release Notes |
| 2 | + |
| 3 | +## Version 0.2.0 (2025-01-20) |
| 4 | + |
| 5 | +This release adds document processing for PDFs, Microsoft Office files, and images, and marks a milestone in the tool's architectural maturity. |
| 6 | + |
| 7 | +### Major Features |
| 8 | + |
| 9 | +- **PDF Document Support:** Native processing via Claude API with joint text and visual analysis (32MB limit) |
| 10 | +- **Microsoft Office Support:** Automatic conversion of .docx and .pptx files with intelligent caching |
| 11 | +- **Image Processing:** Vision API integration with automatic resizing and validation |
| 12 | +- **Enhanced Documentation:** Complete user guides for document types and LibreOffice setup |
| 13 | + |
| 14 | +### Core Architecture & Unique Innovations |
| 15 | + |
| 16 | +#### 1. Git-like Project Discovery with Configuration Cascade |
| 17 | + |
| 18 | +The tool implements a sophisticated multi-tier configuration cascade (global → ancestors → project → workflow → CLI) with **pass-through inheritance**. Empty values automatically inherit from parent tiers, while explicit values override and become decoupled. This enables centralized defaults that cascade down but can be overridden at any level. Nested projects automatically inherit ALL ancestor configurations in the hierarchy. |
| 19 | + |
| 20 | +**Innovation:** Unlike most tools with simple config hierarchies, this provides transparent inheritance where changing a global default automatically affects all empty configs downstream. |
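The pass-through rule described above can be illustrated with a minimal sketch. It assumes each tier has already been reduced to a single value for one setting; `resolve_setting` and the `model-a`/`model-b` values are hypothetical names used only for illustration, not the tool's actual implementation.

```shell
# resolve_setting TIER_VALUE...
# Tiers are passed from lowest to highest priority
# (global, ancestors, project, workflow, CLI).
# An empty value passes through; the last non-empty value wins.
resolve_setting() {
    resolved=""
    for tier_value in "$@"; do
        if [ -n "$tier_value" ]; then
            resolved="$tier_value"
        fi
    done
    printf '%s\n' "$resolved"
}

# Hypothetical values: only the global and workflow tiers set a value.
resolve_setting "model-a" "" "" "model-b" ""   # prints: model-b
# With the workflow tier left empty, the global default passes through.
resolve_setting "model-a" "" "" "" ""          # prints: model-a
```

Because empty tiers never store a copy of the inherited value, changing the global value in this sketch changes the result for every call whose higher tiers are empty.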
| 21 | + |
| 22 | +#### 2. Semantic Content Separation |
| 23 | + |
| 24 | +Distinguishes between **INPUT documents** (primary materials to analyze/transform) and **CONTEXT materials** (supporting information) with three aggregation methods: glob patterns, explicit file lists, and workflow dependencies. |
| 25 | + |
| 26 | +**Innovation:** This semantic separation enables precise control over what the AI analyzes vs what provides background, with automatic ordering optimization. |
| 27 | + |
| 28 | +#### 3. Workflow Chaining and Dependencies |
| 29 | + |
| 30 | +Workflows can declare dependencies on other workflow outputs via `DEPENDS_ON`. Outputs are managed via hardlinks for efficient storage and atomic updates. Cross-format dependencies work seamlessly (JSON → Markdown → HTML pipelines). |
| 31 | + |
| 32 | +**Innovation:** Creates a DAG of processing stages where outputs automatically feed as context into dependent workflows, enabling complex multi-stage pipelines. |
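One way to derive a run order from such a dependency graph is a topological sort. The sketch below is an assumption about how this could be done, not a description of the tool's internals: `run_order` and the workflow names are hypothetical, and it relies on the standard `tsort` utility.

```shell
# run_order reads "dependency workflow" pairs on stdin, one pair per line,
# and prints the workflows so that every dependency comes before its dependents.
run_order() {
    tsort
}

# Hypothetical pipeline: extract (JSON) -> summarize (Markdown) -> publish (HTML)
printf '%s\n' "extract summarize" "summarize publish" | run_order
```

Each pair corresponds to one `DEPENDS_ON` relationship, written with the dependency first.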
| 33 | + |
| 34 | +#### 4. Dual Execution Modes |
| 35 | + |
| 36 | +- **Run mode:** Persistent workflows with configuration, context, dependencies, outputs |
| 37 | +- **Task mode:** Lightweight one-off execution without workflow directories |
| 38 | + |
| 39 | +Both modes share execution logic but are optimized for different use cases. |
| 40 | + |
| 41 | +**Innovation:** One tool handles both persistent iterative development and quick ad-hoc queries. |
| 42 | + |
| 43 | +#### 5. Advanced Document Processing |
| 44 | + |
| 45 | +- Automatic detection and processing of PDFs (32MB limit, ~2000 tokens/page) |
| 46 | +- Office file conversion (.docx, .pptx) via LibreOffice with smart caching |
| 47 | +- Vision API support for images (5MB limit, automatic resizing, base64 encoding) |
| 48 | +- Text files with embedded metadata |
| 49 | + |
| 50 | +**Innovation:** Unified handling of text, PDFs, Office files, and images with automatic format detection, conversion, caching, and optimal ordering. |
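As a rough sketch of format detection with size limits, the functions below classify a file by extension and check it against the limits listed above. The names `classify_file` and `within_limit`, the list of image extensions, and the reading of "MB" as 1024 × 1024 bytes are all assumptions for illustration; the tool's real detection logic may differ.

```shell
# classify_file PATH -> prints pdf, office, image, or text
# Detection here is by extension only; the image extensions are assumed.
classify_file() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *.pdf)                            echo pdf ;;
        *.docx|*.pptx)                    echo office ;;
        *.png|*.jpg|*.jpeg|*.gif|*.webp)  echo image ;;
        *)                                echo text ;;
    esac
}

# within_limit PATH -> succeeds if the file exists and fits the limit
# for its type (32 MB for PDFs, 5 MB for images; other types unchecked).
within_limit() {
    [ -f "$1" ] || return 1
    size=$(( $(wc -c < "$1") ))
    case "$(classify_file "$1")" in
        pdf)   [ "$size" -le $((32 * 1024 * 1024)) ] ;;
        image) [ "$size" -le $((5 * 1024 * 1024)) ] ;;
        *)     return 0 ;;
    esac
}
```

A caller could run `within_limit "$file"` before building the request and skip or report files that exceed their limit.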
| 51 | + |
| 52 | +#### 6. Prompt Caching Architecture |
| 53 | + |
| 54 | +Content is assembled with a JSON-first content block architecture, with cache breakpoints (max 4) placed strategically at semantic boundaries. Blocks are ordered from stable to volatile: system prompts → project descriptions → PDFs → text → images → task. Timestamps are date-only (not datetime) to prevent minute-by-minute cache invalidation. |
| 55 | + |
| 56 | +**Innovation:** Sophisticated caching strategy that can reduce the cost of cached content by up to 90% by carefully ordering content from most stable (system prompts) to most volatile (task), with PDFs placed before text per Anthropic optimization guidelines. |
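The stable-to-volatile ordering can be sketched as a rank-and-sort step. This is an illustration under assumptions: `stability_rank`, `order_blocks`, and the one-word type labels are hypothetical, and the real tool builds JSON content blocks rather than plain text lines.

```shell
# stability_rank TYPE -> lower numbers are more stable and go first
stability_rank() {
    case "$1" in
        system)  echo 1 ;;
        project) echo 2 ;;
        pdf)     echo 3 ;;
        text)    echo 4 ;;
        image)   echo 5 ;;
        task)    echo 6 ;;
        *)       echo 9 ;;
    esac
}

# order_blocks reads "TYPE NAME" lines on stdin and prints them
# stable-to-volatile; sort -s keeps the input order within one type.
order_blocks() {
    while read -r type name; do
        printf '%s %s %s\n' "$(stability_rank "$type")" "$type" "$name"
    done | sort -s -n -k1,1 | cut -d' ' -f2-
}

# Date-only stamp: identical for every run on the same day,
# so a prefix that embeds it stays unchanged between runs that day.
today=$(date +%Y-%m-%d)
```

A full datetime stamp would change on every run and alter any content that embeds it, which is why the date-only form matters for cache hits.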
| 57 | + |
| 58 | +#### 7. Citations Support |
| 59 | + |
| 60 | +Optional support for the Anthropic Citations API via the `--enable-citations` flag. When enabled, the tool generates a document map of citable sources (text and PDFs, not images), parses citations in the response and formats them, and creates sidecar citations files for reference. |
| 61 | + |
| 62 | +**Innovation:** Enables AI-generated content with proper source attribution. |
| 63 | + |
| 64 | +### Complete Feature List |
| 65 | + |
| 66 | +#### Core Workflow Management |
| 67 | + |
| 68 | +- `init` - Initialize project with .workflow/ structure |
| 69 | +- `new` - Create workflows with XML task skeleton |
| 70 | +- `edit` - Open workflow/project files in editor |
| 71 | +- `config` - View configuration cascade with source tracking |
| 72 | +- `run` - Execute workflows with full context aggregation |
| 73 | +- `task` - Lightweight one-off task execution |
| 74 | +- `cat` - Display output to stdout |
| 75 | +- `open` - Open output in default application (macOS) |
| 76 | +- `list` - List all workflows in project |
| 77 | + |
| 78 | +#### Configuration System |
| 79 | + |
| 80 | +- Multi-tier cascade: global → ancestors → project → workflow → CLI |
| 81 | +- Pass-through inheritance (empty values inherit, non-empty override) |
| 82 | +- Nested project support with automatic ancestor discovery |
| 83 | +- Subshell isolation for safe config sourcing |
| 84 | +- Source tracking (shows where each value comes from) |
| 85 | +- Cross-platform editor detection |
| 86 | + |
| 87 | +#### Context Aggregation |
| 88 | + |
| 89 | +- Three methods: glob patterns, explicit file lists, workflow dependencies |
| 90 | +- Semantic separation: INPUT vs CONTEXT materials |
| 91 | +- Project-relative paths in configs, PWD-relative in CLI |
| 92 | +- Brace expansion and recursive glob patterns |
| 93 | +- Automatic file type detection (text, PDF, Office, images) |
| 94 | + |
| 95 | +#### Document Processing |
| 96 | + |
| 97 | +- PDFs: Native support via Claude PDF API (32MB limit) |
| 98 | +- Office files: Automatic conversion via LibreOffice (.docx, .pptx) |
| 99 | +- Images: Vision API support with validation, resizing, caching |
| 100 | +- Text files: Multiple formats with metadata embedding |
| 101 | +- Smart caching with mtime validation |
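Cache validation by modification time can be sketched as follows. The function names are hypothetical, and the converter is passed in as an argument so that no specific conversion command is assumed; the test operator `-nt` is widely supported by shells (including Bash) but is not part of the strict POSIX `test` specification.

```shell
# cache_is_fresh SRC CACHED
# Succeeds if CACHED exists and SRC has not been modified after it.
cache_is_fresh() {
    [ -f "$2" ] && [ ! "$1" -nt "$2" ]
}

# cached_convert SRC CACHED CONVERTER [ARGS...]
# Runs "CONVERTER ARGS... SRC CACHED" only when the cache is stale,
# then prints the path of the cached result.
cached_convert() {
    src="$1"; cached="$2"; shift 2
    if ! cache_is_fresh "$src" "$cached"; then
        "$@" "$src" "$cached" || return 1
    fi
    printf '%s\n' "$cached"
}
```

For example, `cached_convert slides.pptx cache/slides.pdf my_converter` would call the hypothetical `my_converter` only when `slides.pptx` is newer than the cached PDF.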
| 102 | + |
| 103 | +#### API Integration |
| 104 | + |
| 105 | +- Anthropic Messages API with streaming and batch modes |
| 106 | +- Prompt caching with strategic breakpoint placement |
| 107 | +- Token estimation: dual approach (heuristic + exact API count) |
| 108 | +- Citations support with document mapping |
| 109 | +- Dry-run mode for prompt inspection |
| 110 | +- Large payload handling via jq --slurpfile |
| 111 | + |
| 112 | +#### Output Management |
| 113 | + |
| 114 | +- Automatic timestamped backups |
| 115 | +- Hardlinked copies for convenient access |
| 116 | +- Format-specific post-processing (mdformat, jq) |
| 117 | +- Multiple output formats (md, json, txt, html, etc.) |
| 118 | +- Atomic updates with trap-based cleanup |
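A minimal sketch of a backup-then-atomic-replace write with trap-based cleanup is shown below. The function name `write_output` and the backup naming scheme are assumptions, not the tool's actual layout. The function body runs in a subshell so the `EXIT` trap is scoped to a single call.

```shell
# write_output DEST   (new content is read from stdin)
# 1. Write to a temp file next to DEST so the final rename stays on
#    one filesystem, where rename is atomic.
# 2. Copy any existing DEST to a timestamped backup.
# 3. Rename the temp file over DEST.
write_output() (
    dest="$1"
    tmp=$(mktemp "$dest.XXXXXX") || exit 1
    trap 'rm -f "$tmp"' EXIT          # remove the temp file on any exit
    cat > "$tmp" || exit 1
    if [ -f "$dest" ]; then
        cp -p "$dest" "$dest.$(date +%Y%m%d-%H%M%S).bak" || exit 1
    fi
    mv -f "$tmp" "$dest"
)
```

Usage would look like `some_command | write_output output/report.md`. If the write is interrupted, the previous `report.md` is left untouched and the temp file is removed by the trap.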
| 119 | + |
| 120 | +#### Safety and Robustness |
| 121 | + |
| 122 | +- Automatic backups before overwriting |
| 123 | +- Atomic file operations |
| 124 | +- Trap-based cleanup on exit |
| 125 | +- Subshell isolation for config extraction |
| 126 | +- Git-like project boundary detection (stops at $HOME) |
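Git-like project discovery can be sketched as an upward directory walk. The function name `find_project_root` is hypothetical, and whether `$HOME` itself is checked before the walk stops is an assumption made here for illustration.

```shell
# find_project_root [START_DIR]
# Walk upward from START_DIR (default: current directory) until a
# directory containing .workflow/ is found. The walk stops at $HOME
# or / and returns non-zero if no project was found.
find_project_root() {
    dir=$(cd "${1:-.}" && pwd -P) || return 1
    while :; do
        if [ -d "$dir/.workflow" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
        # Assumption: $HOME is checked above, then treated as the boundary.
        if [ "$dir" = "$HOME" ] || [ "$dir" = "/" ]; then
            return 1
        fi
        dir=$(dirname "$dir")
    done
}
```

Running `find_project_root` from any subdirectory of a project prints the directory that holds `.workflow/`, much as Git locates the directory containing `.git/`.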
| 127 | + |
| 128 | +#### Developer Experience |
| 129 | + |
| 130 | +- XML task skeleton with structured sections |
| 131 | +- Named task templates (reusable task definitions) |
| 132 | +- Token estimation before API calls |
| 133 | +- Comprehensive help system (git-style) |
| 134 | +- Test suite of 205+ tests using Bats |
| 135 | +- Detailed documentation with MkDocs |
| 136 | + |
| 137 | +### Current API Support & Extensibility |
| 138 | + |
| 139 | +#### Anthropic API Support |
| 140 | + |
| 141 | +- Messages API (single and streaming) |
| 142 | +- Prompt Caching API (ephemeral cache control) |
| 143 | +- Token Counting API (exact token estimation) |
| 144 | +- PDF API (native document support) |
| 145 | +- Vision API (image processing) |
| 146 | +- Citations API (source attribution) |
| 147 | + |
| 148 | +#### Extensibility Architecture |
| 149 | + |
| 150 | +**Modular library structure** (lib/): |
| 151 | +- `api.sh` - API interaction |
| 152 | +- `config.sh` - Configuration loading |
| 153 | +- `core.sh` - Subcommand implementations |
| 154 | +- `execute.sh` - Shared execution logic |
| 155 | +- `utils.sh` - File processing utilities |
| 156 | +- `task.sh` - Task mode logic |
| 157 | +- `help.sh` - Help text |
| 158 | +- `edit.sh` - Editor selection |
| 159 | + |
| 160 | +**Configuration as code:** |
| 161 | +- Config files are bash scripts (can include logic) |
| 162 | +- Easy to extend with new variables |
| 163 | +- Pass-through mechanism scales automatically |
| 164 | + |
| 165 | +**Content block architecture:** |
| 166 | +- JSON-first design |
| 167 | +- Each file becomes a separate content block |
| 168 | +- Enables future format support (e.g., more document types) |
| 169 | +- Custom converter for pseudo-XML convenience views |
| 170 | + |
| 171 | +**Custom system prompts:** |
| 172 | +- User-definable prompts in ~/.config/workflow/prompts/ |
| 173 | +- Composable via SYSTEM_PROMPTS array |
| 174 | +- Project and workflow overrides |
| 175 | + |
| 176 | +**Named task templates:** |
| 177 | +- Reusable task definitions in ~/.config/workflow/tasks/ |
| 178 | +- Shareable across projects |
| 179 | + |
| 180 | +### Target Users & Use Cases |
| 181 | + |
| 182 | +#### Primary Audience |
| 183 | + |
| 184 | +Technical users with high AI/LLM familiarity and shell proficiency who need: |
| 185 | +- Reproducible AI workflows |
| 186 | +- Complex multi-stage processing pipelines |
| 187 | +- Project-aware context management |
| 188 | +- Cost-effective prompt caching |
| 189 | +- Integration with existing tools |
| 190 | + |
| 191 | +#### Key User Profiles |
| 192 | + |
| 193 | +**Research Scientists:** |
| 194 | +- Multi-stage analysis pipelines |
| 195 | +- Paper writing with context management |
| 196 | +- Data analysis and visualization workflows |
| 197 | +- Citation tracking for AI-generated content |
| 198 | + |
| 199 | +**Software Developers:** |
| 200 | +- Code review workflows |
| 201 | +- Documentation generation |
| 202 | +- Multi-file refactoring analysis |
| 203 | +- Architecture and design assistance |
| 204 | + |
| 205 | +**Technical Writers:** |
| 206 | +- Structured content generation |
| 207 | +- Document transformation (Office → PDF → Markdown) |
| 208 | +- Multi-stage editing workflows |
| 209 | +- Cross-referencing and citations |
| 210 | + |
| 211 | +**Data Scientists:** |
| 212 | +- Exploratory data analysis |
| 213 | +- Report generation from datasets |
| 214 | +- Multi-format output (JSON, Markdown, HTML) |
| 215 | +- Pipeline orchestration |
| 216 | + |
| 217 | +#### Enabled Workflows |
| 218 | + |
| 219 | +**Research Pipelines:** |
| 220 | +Context gathering from PDFs/papers → Outline generation → Section drafting → Review/refinement |
| 221 | + |
| 222 | +**Code Analysis:** |
| 223 | +Codebase exploration → Architecture analysis → Documentation generation → Review |
| 224 | + |
| 225 | +**Data Processing:** |
| 226 | +Data ingestion → Exploratory analysis → Statistical testing → Report generation → Visualization |
| 227 | + |
| 228 | +**Content Creation:** |
| 229 | +Research/context → Outline → Draft → Edit → Format conversion |
| 230 | + |
| 231 | +**Iterative Development:** |
| 232 | +Initial attempt → Review output → Refine task → Re-run (with automatic backup) |
| 233 | + |
| 234 | +### Pain Points Solved |
| 235 | + |
| 236 | +**Context Management Complexity:** |
| 237 | +- **Problem:** Managing large codebases/datasets as context for AI |
| 238 | +- **Solution:** Glob patterns, explicit files, and workflow dependencies with semantic separation |
| 239 | + |
| 240 | +**Cost of Repetitive Prompts:** |
| 241 | +- **Problem:** Paying for same system prompts and context repeatedly |
| 242 | +- **Solution:** Prompt caching with up to 90% cost reduction on cached content |
| 243 | + |
| 244 | +**Configuration Sprawl:** |
| 245 | +- **Problem:** Repeating settings across projects and workflows |
| 246 | +- **Solution:** Pass-through cascade enables change-once, affect-many |
| 247 | + |
| 248 | +**Multi-Stage Processing:** |
| 249 | +- **Problem:** Manually copying outputs between stages |
| 250 | +- **Solution:** Workflow dependencies with automatic context passing |
| 251 | + |
| 252 | +**Output Management:** |
| 253 | +- **Problem:** Losing previous versions when iterating |
| 254 | +- **Solution:** Automatic timestamped backups with hardlinked access |
| 255 | + |
| 256 | +**Format Fragmentation:** |
| 257 | +- **Problem:** Different tools for PDFs, images, Office files, text |
| 258 | +- **Solution:** Unified document processing with automatic detection and conversion |
| 259 | + |
| 260 | +**One-Off vs Persistent Workflows:** |
| 261 | +- **Problem:** Need both quick queries and persistent workflows |
| 262 | +- **Solution:** Dual execution modes optimized for each use case |
| 263 | + |
| 264 | +**Source Attribution:** |
| 265 | +- **Problem:** AI-generated content lacks proper citations |
| 266 | +- **Solution:** Citations API integration with document mapping |
| 267 | + |
| 268 | +### Unique Selling Points vs Other AI CLI Tools |
| 269 | + |
| 270 | +**vs aider/cursor/copilot:** |
| 271 | +- Not code-focused; document- and workflow-focused |
| 272 | +- Multi-stage pipelines with dependencies |
| 273 | +- Sophisticated context aggregation beyond current file |
| 274 | +- Prompt caching for cost optimization |
| 275 | + |
| 276 | +**vs chatgpt-cli/claude-cli:** |
| 277 | +- Persistent workflows with configuration |
| 278 | +- Project-aware with automatic discovery |
| 279 | +- Multi-tier config cascade |
| 280 | +- Workflow chaining and dependencies |
| 281 | +- Native document processing (PDFs, Office, images) |
| 282 | + |
| 283 | +**vs custom scripts:** |
| 284 | +- Structured configuration system |
| 285 | +- Built-in prompt caching |
| 286 | +- Safe output management |
| 287 | +- Cross-platform compatibility |
| 288 | +- Comprehensive documentation and testing |
| 289 | + |
| 290 | +#### Key Differentiators |
| 291 | + |
| 292 | +1. **Configuration sophistication:** Pass-through cascade is unique |
| 293 | +2. **Workflow chaining:** DAG-based pipeline orchestration |
| 294 | +3. **Document processing breadth:** Unified handling of text, PDFs, Office, images |
| 295 | +4. **Cost optimization:** Strategic prompt caching architecture |
| 296 | +5. **Git-like UX:** Familiar project discovery and structure |
| 297 | +6. **Dual execution modes:** Both persistent and ad-hoc in one tool |
| 298 | +7. **Citations support:** Source attribution for generated content |
| 299 | + |
| 300 | +### Technical Highlights |
| 301 | + |
| 302 | +- **Comprehensive test suite** of 205+ tests using the Bats testing framework |
| 303 | +- **Modular architecture** with 8 separate library modules |
| 304 | +- **Cross-platform support** (macOS, Linux, WSL) |
| 305 | +- **Safe execution** with atomic operations and automatic backups |
| 306 | +- **Efficient caching** for images and Office file conversions |
| 307 | +- **Robust error handling** with graceful degradation |
| 308 | + |
| 309 | +### Installation & Requirements |
| 310 | + |
| 311 | +**Required:** |
| 312 | +- Bash 4.0+ |
| 313 | +- curl |
| 314 | +- jq |
| 315 | +- Anthropic API key |
| 316 | + |
| 317 | +**Optional:** |
| 318 | +- LibreOffice (for .docx/.pptx support) |
| 319 | +- ImageMagick (for optimal image resizing) |
| 320 | +- mdformat (for Markdown formatting) |
| 321 | + |
| 322 | +### What's Next |
| 323 | + |
| 324 | +This pre-release (0.2.0) represents a mature foundation with comprehensive document processing. The tool is production-ready for personal and research use. Future directions may include: |
| 325 | + |
| 326 | +- Additional API provider support (OpenAI-compatible endpoints) |
| 327 | +- Batch API integration for cost-effective processing |
| 328 | +- Enhanced citation formatting options |
| 329 | +- Additional document format support |
| 330 | +- Web-based workflow visualization |
| 331 | + |
| 332 | +### Commits in This Release |
| 333 | + |
| 334 | +- bc12145: feat: Add PDF document support with optimized API ordering |
| 335 | +- 5023475: feat: Add Microsoft Office file support with PDF conversion |
| 336 | +- 45cac86: docs: Add comprehensive document type support documentation |
| 337 | +- 8af897e: fix: Correct context aggregation order in execution guide |
| 338 | + |
| 339 | +### Acknowledgments |
| 340 | + |
| 341 | +This tool represents a sophisticated approach to AI workflow management, built with attention to cost optimization, reproducibility, and developer experience. Special thanks to the Anthropic team for their excellent API documentation and the LibreOffice project for enabling Office file conversion. |