This repository was archived by the owner on Feb 11, 2026. It is now read-only.

Document expected timeouts and performance characteristics #220

@turbomam

Problem

Users don't know how long operations will take, especially slow ones like PDF conversion (30-60 seconds). This leads to:

  • Users thinking the tool has hung
  • Premature cancellation of slow operations
  • Poor UX when operations take longer than expected

Current Gaps

No documentation exists for:

  • Typical response times for each operation
  • Which operations are slow and which are fast
  • When to expect timeouts
  • Memory requirements for large files

Proposed Documentation

Performance Table

Add to USERS.md:

| Operation | Typical Time | Notes |
|---|---|---|
| DOI Metadata | 1-3s | Fast, cached by CrossRef |
| Europe PMC Search | 2-5s | Depends on result count |
| Full Text (PMC) | 3-10s | Depends on article length |
| PDF → Markdown | 30-60s | ⚠️ Slow; can take up to 2 min for large PDFs |
| PDF Text Extraction | 10-30s | Variable, depends on PDF complexity |
| Identifier Conversion | 2-5s | Usually cached |
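
One way the table could help users who think the tool has hung is to compare elapsed time against the upper end of each typical range. The sketch below is illustrative only: `TYPICAL_MAX_SECONDS` and `describe_wait` are hypothetical names, not part of the project, and the numbers are copied from the proposed table.

```python
# Upper bounds in seconds, copied from the proposed performance table.
# The dictionary and the helper below are hypothetical, for illustration only.
TYPICAL_MAX_SECONDS = {
    "DOI Metadata": 3,
    "Europe PMC Search": 5,
    "Full Text (PMC)": 10,
    "PDF → Markdown": 60,
    "PDF Text Extraction": 30,
    "Identifier Conversion": 5,
}


def describe_wait(operation, elapsed_seconds):
    """Say whether the elapsed time is within the typical range for an operation."""
    limit = TYPICAL_MAX_SECONDS[operation]
    if elapsed_seconds <= limit:
        return f"{operation}: {elapsed_seconds:.0f}s elapsed, within the typical {limit}s."
    return f"{operation}: {elapsed_seconds:.0f}s elapsed, longer than the typical {limit}s."


if __name__ == "__main__":
    print(describe_wait("PDF → Markdown", 45))
```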

Timeout Configuration

Document current timeouts:

  • Default request timeout: 30s
  • PDF operations: 120s
  • How to configure: ARTL_REQUEST_TIMEOUT env var
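
A minimal sketch of how these values might be resolved follows. It assumes that `ARTL_REQUEST_TIMEOUT`, when set to a positive number, overrides both defaults; the issue does not specify how the variable interacts with the PDF timeout, and `resolve_timeout` is a hypothetical helper, not the project's API.

```python
import os

DEFAULT_REQUEST_TIMEOUT = 30.0  # seconds, default from the list above
PDF_OPERATION_TIMEOUT = 120.0   # seconds, PDF default from the list above


def resolve_timeout(is_pdf_operation=False, env=None):
    """Return the timeout in seconds for one operation (hypothetical helper).

    Assumption: a positive ARTL_REQUEST_TIMEOUT overrides both defaults.
    """
    env = os.environ if env is None else env
    raw = env.get("ARTL_REQUEST_TIMEOUT")
    if raw is not None:
        try:
            value = float(raw)
            if value > 0:
                return value
        except ValueError:
            pass  # Assumption: an unparsable value falls back to the defaults.
    return PDF_OPERATION_TIMEOUT if is_pdf_operation else DEFAULT_REQUEST_TIMEOUT


if __name__ == "__main__":
    print(resolve_timeout(is_pdf_operation=True))
```

Whichever precedence the project actually uses, documenting it next to the env var would spare users from guessing whether raising the variable also affects PDF operations.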

Memory Requirements

Document memory needs:

  • Small operations: <100MB
  • PDF conversion: 200-500MB
  • Large PDFs (>10MB): May fail with insufficient memory
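
The documentation could pair these figures with a pre-flight check. The sketch below only inspects file size against the ">10MB" threshold above; `large_pdf_warning` is a hypothetical helper, and file size is at best a rough proxy for the memory a conversion will need.

```python
import os

LARGE_PDF_BYTES = 10 * 1024 * 1024  # the ">10MB" threshold from the list above


def large_pdf_warning(path):
    """Return a warning string for PDFs above the threshold, else None.

    Hypothetical helper; it checks file size only, not available memory.
    """
    size = os.path.getsize(path)
    if size > LARGE_PDF_BYTES:
        size_mb = size / (1024 * 1024)
        return (
            f"{path} is {size_mb:.1f} MB; conversion may fail with "
            "insufficient memory (PDF conversion typically needs 200-500MB)."
        )
    return None
```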

Progress Indicators

Note that progress indicators are planned (Issue to be created)

Where to Document

  1. USERS.md - Performance table in new "Performance & Timeouts" section
  2. README.md - Quick reference in Configuration section
  3. API docstrings - Note slow operations with time estimates
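
For item 3, a docstring note might look like the sketch below. The function name and signature are hypothetical and exist only to carry the docstring; the timings are the estimates from this issue.

```python
def convert_pdf_to_markdown(pdf_path):
    """Convert a PDF to Markdown (hypothetical function, for illustration only).

    Performance:
        Slow operation. Typically 30-60 seconds; large PDFs can take up
        to 2 minutes. The tool may appear to hang while it is working.

    Memory:
        Roughly 200-500MB during conversion.
    """
    raise NotImplementedError("Illustrative stub; only the docstring matters.")


if __name__ == "__main__":
    print(convert_pdf_to_markdown.__doc__)
```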

Example Documentation

## Performance & Timeouts

### Operation Speed Reference

**Fast Operations** (< 5 seconds):
- Metadata retrieval (DOI, PMID)
- Identifier conversion
- Literature search
- Abstract retrieval

**Medium Operations** (5-15 seconds):
- Full text retrieval from PMC
- Large search queries
- PDF download

**Slow Operations** (30-120 seconds):
⚠️ These operations may appear to hang but are still working:
- PDF to Markdown conversion: 30-60s typical, up to 2 min for complex PDFs
- Large PDF text extraction: 20-40s
- Batch operations on multiple papers

### What to Do for Slow Operations

If an operation seems slow:
1. **Wait**: PDF operations legitimately take 30-60+ seconds
2. **Check logs**: Use `--debug` flag to see progress
3. **Memory**: Ensure sufficient RAM for PDF processing
4. **Timeout**: PDF operations time out after 120s by default; other requests time out after 30s

### Configuring Timeouts

```bash
export ARTL_REQUEST_TIMEOUT=180  # Increase to 3 minutes
```

## Priority

**Medium** - Improves UX without code changes

## Related

- #219 (Retry logic)
- Future: Progress indicators issue
- Reliability assessment findings

Labels: documentation, enhancement
