Pseudocode, interfaces, state management, and testing strategy for building a fan-out/fan-in workflow.
SplitResult:
chunks: list of {id: string, data: any}
ExecutorResult:
chunk_id: string
status: "success" | "error"
output: any
error: string or null
latency_ms: integer
AggregateConfig:
strategy: "concatenate" | "merge" | "summarize" | "vote" | "rank" | "custom"
custom_aggregator: function(results) → output // Used when strategy = "custom"
summarize_prompt: string or null // Used when strategy = "summarize"
ParallelConfig:
max_concurrency: integer // Max simultaneous LLM calls (default: 5)
failure_mode: "fail_fast" | "best_effort" | "retry_then_continue"
max_retries_per_chunk: integer // Default: 1
timeout_ms: integer // Per-executor timeout
function parallel_execute(input, splitter, prompt, aggregate_config, config):
// Split
split = splitter(input)
chunks = split.chunks
// Fan-out with bounded concurrency
results = bounded_parallel_map(
items: chunks,
max_concurrency: config.max_concurrency,
executor: function(chunk):
return execute_chunk(chunk, prompt, config)
)
// Check failure mode
failures = [r for r in results if r.status == "error"]
if config.failure_mode == "fail_fast" and failures.length > 0:
return {status: "failed", errors: failures}
successes = [r for r in results if r.status == "success"]
if successes.length == 0:
return {status: "failed", error: "All chunks failed"}
// Fan-in
output = aggregate(successes, aggregate_config)
return {
status: "success",
output: output,
stats: {total: chunks.length, succeeded: successes.length, failed: failures.length}
}
function execute_chunk(chunk, prompt, config):
start_time = now()
for attempt in 1..(config.max_retries_per_chunk + 1):
try:
response = call_llm(
prompt: prompt.replace("{chunk}", chunk.data),
timeout: config.timeout_ms
)
return {
chunk_id: chunk.id,
status: "success",
output: response.text,
latency_ms: now() - start_time
}
catch error:
if attempt == config.max_retries_per_chunk + 1:
return {
chunk_id: chunk.id,
status: "error",
error: error.message,
latency_ms: now() - start_time
}
// Retry with backoff
wait(attempt * 1000)
function bounded_parallel_map(items, max_concurrency, executor):
results = []
semaphore = create_semaphore(max_concurrency)
parallel for item in items:
semaphore.acquire()
result = executor(item)
results.append(result)
semaphore.release()
return results
function aggregate(results, config):
// Sort by chunk_id to maintain original order
sorted = sort_by(results, "chunk_id")
if config.strategy == "concatenate":
return join([r.output for r in sorted], "\n\n")
if config.strategy == "merge":
merged = {}
for r in sorted:
merged = deep_merge(merged, parse_json(r.output))
return merged
if config.strategy == "summarize":
combined = join([r.output for r in sorted], "\n---\n")
return call_llm(
prompt: config.summarize_prompt.replace("{results}", combined)
).text
if config.strategy == "vote":
// Count occurrences of each answer, return most common
return majority_vote([r.output for r in sorted])
if config.strategy == "custom":
return config.custom_aggregator(sorted)
ParallelState:
chunks: list of {id, data}
pending: set of chunk_ids
completed: list of ExecutorResult
failed: list of ExecutorResult
start_time: timestamp
stateDiagram-v2
[*] --> Splitting: receive input
Splitting --> Executing: chunks created
Executing --> Executing: chunk completes
Executing --> Aggregating: all chunks done
Executing --> Failed: fail-fast triggered
Aggregating --> Completed: aggregation done
Completed --> [*]
Failed --> [*]
One prompt reused for each chunk:
System: Analyze the following text segment. Extract key themes, entities, and sentiment.
Return as JSON: {"themes": [...], "entities": [...], "sentiment": "positive"|"negative"|"neutral"}
User: {chunk}
Different prompts for the same input:
// Sentiment prompt:
System: Determine the sentiment of this text. Return: {"sentiment": "...", "confidence": 0.0-1.0}
// Entity prompt:
System: Extract all named entities. Return: {"people": [...], "orgs": [...], "locations": [...]}
// Summary prompt:
System: Summarize this text in 2-3 sentences.
System: Synthesize these analysis results into a single coherent report.
Each section below is an independent analysis of a portion of the input.
Combine them without losing key details, resolving any minor inconsistencies.
User: {results}
These are production-ready templates. Copy and adapt — replace {placeholders} with your specifics.
You process one segment of a larger input.
Your task: {single_focused_task — e.g. "extract the key claims from this text segment"}
Rules:
- Process only the text provided. Do not infer what other segments might say.
- If this segment does not contain relevant content, respond: NO_CONTENT
- Do not explain your reasoning — return only the result.
Output format: {exact_format}
You perform a specific type of analysis on the provided text.
Your role: {role — e.g. "sentiment analyst", "entity extractor", "summarizer"}
Your task: {specific_task}
Return your analysis in this exact format:
{output_format_with_example}
Do not add fields not listed above.
You combine analysis results from multiple independent sources into a single coherent output.
Original task: {original_user_task}
Rules:
- Each section below is an independent analysis of a separate portion of the input.
- Resolve minor contradictions by noting both perspectives.
- If two sections report the same fact, include it once.
- Preserve all significant findings — do not discard content for brevity.
- Output format: {final_output_format}
[Source 1]
{branch_1_output}
[Source 2]
{branch_2_output}
[Source N]
{branch_N_output}
Combine the above into a single {output_type}.
| Placeholder | What to put here |
|---|---|
{single_focused_task} |
One job only — "extract claims", "score relevance 0-10", "identify action items" |
{exact_format} |
Specify structure explicitly, including example values where helpful |
{original_user_task} |
Repeat the original request so the aggregator understands the goal |
{output_type} |
"summary", "ranked list", "JSON report", "executive overview" |
Splitter tests:
- Verify correct chunk count for known inputs
- Verify chunks are independent (no cross-references)
- Verify chunk IDs are unique
Aggregator tests:
- Test each strategy with known result sets
- Test with partial results (some failures)
- Test ordering is maintained
Full pipeline tests:
- Stub all LLM calls with predetermined responses
- Verify end-to-end flow: split → execute → aggregate
- Verify concurrency limit is respected (use timing assertions)
Failure mode tests:
- Fail one chunk → verify fail-fast aborts
- Fail one chunk → verify best-effort continues
- Fail one chunk → verify retry-then-continue retries first
- Verify the system handles N chunks without exceeding concurrency limits
- Verify rate limiting doesn't cause cascading failures
Problem: Sending all chunks simultaneously overwhelms the API with rate limit errors. Fix: Always use bounded concurrency. Start with 5–10 and adjust based on API limits.
Problem: Parallel LLM calls on similar data produce inconsistent outputs (different formats, conflicting answers). Fix: Use lower temperature for consistency. Add format validation per chunk. For critical consistency, use voting.
Problem: Aggregation assumes ordered results, but parallel execution completes out of order. Fix: Always tag results with chunk_id and sort before aggregation.
Problem: Summarization aggregation (LLM call on all results) chokes on large result sets. Fix: Hierarchical aggregation — first aggregate pairs, then aggregate aggregates.
Problem: Best-effort mode returns partial results, but the consumer doesn't know what's missing. Fix: Always return stats (succeeded/failed counts) alongside the output. Let the consumer decide if partial results are acceptable.
If you're processing items one at a time:
- Verify that items are independent (no inter-item dependencies)
- Wrap the existing per-item logic as an executor
- Add a splitter and aggregator
- Start with max_concurrency = 3 and increase
When you add a retrieval step before the parallel execution, you're building toward RAG:
- The "splitter" becomes a query generator
- The "executors" become retrieval queries
- The "aggregator" becomes context assembly + generation