[Feature] First-class functions and closures #1110

@AndreKurait

Is your feature request related to a problem? Please describe.

Amber has no way to pass behavior as a value. There is no callable type, so you can't
write a generic map/filter/reduce, a comparator-driven sort, a retry(fn), or any
callback API, and the stdlib can't grow higher-order helpers because their signatures
can't even be typed today (related to the builtin-set discussion #908 and the
it.each-style ergonomics wanted by the unit-test-framework discussion,
discussions/647). In a real ~6.7K-LOC Amber codebase, a test suite can't share
scenario setup across parameterized cases because there is no it.each(cases, fn), so
every case re-spells its setup inline.

Reading the compiler, the reason is structural, not incidental:

  • Type (src/modules/types.rs:8) has no callable variant.
  • ExprType (src/modules/expression/expr.rs:41) has FunctionInvocation (a call) but
    no function value/lambda.
  • Calls are 100% static: every call site emits a literal bash function name at compile
    time (invocation.rs:206, "{prefix}{name}__{id}_v{variant_id}"), with
    monomorphization keyed by arg-types (function_cache.rs). There is no runtime
    dispatch and no eval anywhere — which is good, and a design constraint to keep.

Describe the solution you'd like

Add a function type and a lambda expression, compiled in a way that stays inside
Amber's static-dispatch, no-eval, bash-3.2 model.

// function type
fun(Int, Int): Int          // takes two Ints, returns Int
fun(Text): Null
fun(Int): Int?              // failable callable

// lambda expression  (=> body; block body also allowed)
let inc  = (x: Int): Int => x + 1
let log  = (m: Text): Null => { eprintln(m) }
let base = 10
let addb = (x: Int): Int => x + base        // closes over `base`

// call a function value; pass one to a higher-order fn
fun apply(f: fun(Int): Int, v: Int): Int { return f(v) }
echo apply(addb, 5)        // 15
echo apply(double, 5)      // a bare top-level fun name is usable as a value

The compilation model (the key idea). A closure value is just Text:
"<id>\x1f<cap1>\x1f<cap2>…" — a small integer picking the lambda body, plus its
captured environment, joined by Amber's existing internal field separator \x1f. Each
lambda compiles to a normal monomorphized bash function taking its captures as leading
positional args; a call through a function-typed value routes through a
compiler-generated case-over-id dispatcher, one per function-type shape used in
the program. No eval, no ${!var} function-name indirection — the only runtime
variability is an integer case, the same branch class if-chains already emit.

# from: let addb = (x: Int): Int => x + base ; apply(addb, 5)
__lambda_0() { local base="$1"; local x="$2"; echo $(( x + base )); }
# one dispatcher per used shape — here (Int) -> Int
# (arm 0 has a single capture, so its whole env is passed as one argument)
__call_I_I() { local c="$1"; shift; local id="${c%%$'\x1f'*}"; local env="${c#*$'\x1f'}"
  case "$id" in 0) __lambda_0 "$env" "$@";; esac; }
apply__0_v0() { local f="$1"; local v="$2"; __call_I_I "$f" "$v"; }
base=10
addb="0"$'\x1f'"$base"
apply__0_v0 "$addb" 5      # -> 15

This reuses what already exists: monomorphization (function_cache.rs instances →
dispatcher arms keyed by type-shape, with capture types in the key), the per-arg local
binding from $1..$N (declaration.rs:67 set_args_as_variables, including its
ksh/zsh array handling), and \x1f as the field delimiter. Zero cost when unused: a
program with no lambda emits no __lambda_*/__call_* and is byte-identical to today.

A runnable prototype in the emitted shape (prototype/closures.sh) covers single +
multi capture, an escaping closure (returned from a factory, called later), array
capture, and multiple distinct environments. It is green and byte-identical on bash
3.2.57 and zsh, with dispatch via integer case only and no eval.
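
To give a feel for the multi-capture and escaping-closure cases, here is a minimal sketch in the same emitted shape. It is not the contents of prototype/closures.sh; the factory name make_affine, the lambda id 1, and the two captures a and b are made up for illustration. The dispatcher arm peels the two captures off the environment one field at a time before calling the body.

```shell
# Separator minted once and matched as quoted "$SEP" in expansions.
SEP=$(printf '\037')

# Body of a hypothetical lambda (x: Int): Int => a * x + b, capturing a and b.
__lambda_1() { local a="$1"; local b="$2"; local x="$3"; echo $(( a * x + b )); }

# Dispatcher for shape (Int) -> Int; arm 1 splits its env into two captures.
__call_I_I() {
  local c="$1"; shift
  local id="${c%%"$SEP"*}"
  local rest="${c#*"$SEP"}"
  case "$id" in
    1) local a="${rest%%"$SEP"*}"
       local b="${rest#*"$SEP"}"
       __lambda_1 "$a" "$b" "$@" ;;
  esac
}

# Hypothetical factory: the closure escapes as plain text "<id> SEP a SEP b".
make_affine() { printf '%s' "1${SEP}$1${SEP}$2"; }

f=$(make_affine 3 4)     # environment {a=3, b=4}
g=$(make_affine 10 0)    # a second, distinct environment
__call_I_I "$f" 5        # prints 19
__call_I_I "$g" 5        # prints 50
```

Both closures share one lambda body and one dispatcher arm; only the text value differs, which is what lets a closure outlive the factory call that built it.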

Describe alternatives you've considered

  • eval/${!var} runtime function-name dispatch — rejected: pulls eval into
    generated code and breaks Amber's static model; the case-over-id design avoids it.
  • A bash-AST manipulation pass to model callables — rejected: the closure value is
    just Text and dispatch is just a case, so no new IR is needed (this is the
    exact maintenance concern raised in the compiler-architecture discussion,
    discussions/544).
  • Reference capture (capture by alias) — deferred: capture is by value here,
    which matches bash string semantics; ref-capture can be a later extension.
  • Do nothing / keep using string-keyed dispatch tables by hand — that's what users
    do today; it doesn't type-check, doesn't compose, and can't back a stdlib map.
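
For comparison, the hand-written workaround from the last bullet looks roughly like this when expressed in shell terms. The names apply_op, double, and negate are hypothetical. The "function" is only a string key, so a misspelled key is caught at run time rather than by the typechecker.

```shell
# Hand-rolled dispatch table: behavior is selected by a plain string key.
apply_op() {
  local op="$1"; local v="$2"
  case "$op" in
    double) echo $(( v * 2 )) ;;
    negate) echo $(( -v )) ;;
    *) echo "unknown op: $op" >&2; return 1 ;;
  esac
}

apply_op double 5                                  # prints 10
apply_op dobule 5 || echo "failed at run time"     # the typo is only found here
```

Every new operation means editing the central case, and nothing ties a key to a parameter or return type, which is why this pattern can't back a typed stdlib map.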

Additional context

  • No prior closure/lambda issue exists; this is the missing primitive under several
    open threads: the builtin-set discussion ([Feature] New builtins proposals #908),
    the unit-test-framework discussion discussions/647 (it.each), and the method-call
    discussion discussions/652 (which notes "we are missing an OOP system"). Filing per
    the formal feature-proposal process, discussions/302.
  • Phased so the first PRs carry no codegen risk: (1) Type::Function + parse +
    ExprType::Lambda, typecheck-only; (2) free-variable capture analysis; (3) codegen
    (__lambda_* + closure value + per-shape dispatchers + dynamic call sites);
    (4) snapshot tests proving closure-free programs are byte-identical; (5) multi-shell
    runtime suite (bash 3.2/4.x/zsh/ksh); (6) stdlib map/filter/reduce (the payoff).
  • Open questions for discussion: function-type variance (propose invariant first);
    capture by-value vs a future ref-capture; lambda syntax (=> proposed, vs |x| or
    anonymous fun(){}); dispatcher granularity (per-shape proposed); confirm/define the
    "user Text never contains \x1f" invariant or escape on capture.

Appendix A — current architecture (file:line), why closures are hard

  • Static dispatch: invocation.rs:206 emits the callee's literal bash name
    "{prefix}{name}__{id}_v{variant_id}"; the name/variant is resolved at compile time
    at every call site. No runtime function dispatch exists.
  • Monomorphization: each fun is cached as N FunctionInstances keyed by the
    arg-type vector (function_cache.rs; invocation_utils.rs:229
    find(|fun| fun.args == args)); each instance is translated to its own bash function;
    args are bound from $1..$N as locals (declaration.rs:67), with special
    nameref/serialization handling for arrays under ksh/zsh/bash-legacy
    (declaration.rs:82-145).
  • No callable type: Type = Null/Text/Bool/Num/Int/Array/Union/Generic
    (types.rs:8).
  • No function value: ExprType (expr.rs:41) has only FunctionInvocation (a
    call); a function name cannot appear as a value.
  • No lexical environment: variables are positional-arg locals, function-body locals,
    or globals; renamed with a unique global_id at translation
    (get_variable_name, translate/fragments/mod.rs:52). Functions are global-scope-only
    (declaration.rs:381 is_global_scope guard); no nesting.
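
To make the static model concrete, here is a rough sketch of the shape that results when one function is called with two different arg-type vectors. The names follow the "{prefix}{name}__{id}_v{variant_id}" pattern quoted above with an empty prefix; the function show, its id 0, and both bodies are hypothetical and are not actual compiler output.

```shell
# Hypothetical fun show(x), instantiated once for Int and once for Text.
show__0_v0() { local x="$1"; echo "Int: $(( x + 0 ))"; }
show__0_v1() { local x="$1"; echo "Text: ${x}"; }

# Each call site names its instance literally; nothing is looked up at run time.
show__0_v0 42         # prints Int: 42
show__0_v1 "hello"    # prints Text: hello
```

Because the callee name is fixed in the emitted text, there is currently no slot where a value could choose which function runs, which is the gap the dispatcher fills.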

Appendix B — type system + AST + capture changes

Type (types.rs:8): add Function(Vec<Type>, Box<Type>, bool) = (params, return,
is_failable).

  • parse_type/try_parse_simple_type: recognise the fun( type form.
  • is_allowed_in: invariant param+return matching to start (sound + minimal; relax to
    contravariant-params/covariant-return later if wanted).
  • is_strictly_typed: a Function is strict iff params + return are; an un-inferable
    lambda param type is a compile error (like an untyped empty array).
  • Display: fun(Int, Int): Int.

ExprType (expr.rs:41): add Lambda (params, optional return, expr-or-block body)
and FunctionRef (a bare top-level fun name in value position → empty-env closure).
Generalize the invocation callee from the hardcoded name: String
(invocation.rs:14-27) to an expression: try the existing identifier/static path first
(unchanged), fall to dispatch when the identifier is a function-typed variable or the
callee is a non-identifier expression (get_op()(…), ops[0](x)).

Capture analysis (new src/modules/function/lambda.rs, mirroring declaration.rs):
at typecheck, collect every free variable in the lambda body (variables resolving
outside its params + locals) via the existing get_var_used + scope stack
(variable/mod.rs:110). Result is Vec<{name, global_id, type}>, captured by value
at the point the lambda is evaluated; capture types feed monomorphization exactly like
arg types do (invocation_utils.rs:115-149). Lambdas are allowed in non-global scope —
the is_global_scope guard (declaration.rs:381) applies to named fun only.

Appendix C — codegen + multi-shell notes

Per distinct lambda (id = its monomorphized instance index):

  1. emit __lambda_<id>() with captures as leading positional params then the lambda's
    params, bound to locals as set_args_as_variables does today (reusing the array
    nameref/serialization paths);
  2. at the lambda site, emit the value "<id>\x1f<cap1>\x1f<cap2>…" (a captured array
    serializes into one \x1f-joined field — same array-serialization Amber already has);
  3. per used function-type shape, emit one __call_<shape>() whose case arms cover
    every lambda of that shape (shape = stable mangling of param/return types: I_I,
    T_T, II_I, _I, …);
  4. a dynamic call f(args) emits __call_<shape> "$f" <args>;
  5. failable closures (fun(...):R?) propagate the body's status through the existing
    failure protocol (function/fail.rs, ret.rs); the ?/failed {} requirement is
    enforced in typecheck.

All five are gated on closures being used; otherwise none are emitted.
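
Steps 3 to 5 can be sketched as follows, with two lambdas sharing one dispatcher. This assumes failure is carried as a non-zero exit status purely for illustration; the real protocol lives in function/fail.rs and ret.rs and may differ. The proposal does not say how failability is mangled into the shape name, so the sketch simply reuses I_I.

```shell
SEP=$(printf '\037')

# Two lambdas of the same hypothetical shape (Int) -> Int?, ids 0 and 1.
__lambda_0() { local x="$1"; echo $(( x + 1 )); }      # no captures
__lambda_1() {                                         # captures divisor d
  local d="$1"; local x="$2"
  [ "$d" -ne 0 ] || return 1     # failure surfaces as a non-zero status
  echo $(( x / d ))
}

# One dispatcher for the shape; each arm forwards the body's exit status.
__call_I_I() {
  local c="$1"; shift
  local id="${c%%"$SEP"*}"
  case "$id" in
    0) __lambda_0 "$@" ;;
    1) __lambda_1 "${c#*"$SEP"}" "$@" ;;
    *) return 2 ;;               # unknown id; should be unreachable
  esac
}

inc="0"                # empty environment: the value is just the id
half="1${SEP}2"        # id 1 with d=2
bad="1${SEP}0"         # id 1 with d=0, fails when called

__call_I_I "$inc" 41                                   # prints 42
__call_I_I "$half" 10                                  # prints 5
__call_I_I "$bad" 10 || echo "failed with status $?"   # prints failed with status 1
```

The empty-environment arm never reads the env, so a closure with no captures can be encoded as the bare id without a trailing separator.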

Multi-shell (verified, byte-identical output on bash 3.2.57, bash 5.3, zsh 5.9, and
ksh 93u+): a run-all.sh runs the prototype under every available shell and asserts
the outputs agree. Two portability rules make one form work everywhere, both things
Amber's codegen already does (so they're not new burdens):

  • the \x1f separator is minted with printf '\037' and matched with quoted "$SEP"
    in parameter expansions, and a serialized array is peeled one field at a time
    (${rest%%"$SEP"*} / ${rest#*"$SEP"}), never IFS=$'\x1f' word-split (which
    errors under zsh) — the same IFS-avoidance Amber already practices for arrays;
  • function locals use typeset (accepted by bash/zsh/ksh); the real compiler emits the
    per-shell form via the existing ShellType::Ksh branches in declaration.rs.
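
The two rules above can be sketched as below. The helper sum_fields is hypothetical, and the sketch assumes a non-empty array of integer fields; empty fields and empty arrays would need extra handling.

```shell
SEP=$(printf '\037')     # mint the separator without $'...' quoting

# Sum the fields of one SEP-joined serialized array, peeling a field per step.
sum_fields() {
  typeset rest="$1" field total=0
  while [ -n "$rest" ]; do
    field="${rest%%"$SEP"*}"                  # take the first field
    case "$rest" in
      *"$SEP"*) rest="${rest#*"$SEP"}" ;;     # drop it and its separator
      *)        rest="" ;;                    # that was the last field
    esac
    total=$(( total + field ))
  done
  echo "$total"
}

arr="3${SEP}4${SEP}5"
sum_fields "$arr"        # prints 12
```

IFS is never touched, so there is no word-splitting or globbing surprise to restore afterwards.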

So the \x1f closure encoding does not add an unportable cross-shell surface — it's
the same delimiter + per-shell local handling the compiler already ships.
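
On the open question of whether user Text may contain \x1f, one possible escape-on-capture scheme is sketched here. The escape byte \x1e and the helper names capture_escape and capture_unescape are assumptions made for this sketch and are not part of the proposal. The helpers return values through command substitution, which strips trailing newlines, so a real implementation would need a different return path.

```shell
SEP=$(printf '\037')     # field separator used by the closure encoding
ESC=$(printf '\036')     # hypothetical escape byte, chosen for this sketch only

# Escape one captured Text value so it can never contain a raw SEP.
capture_escape() {
  local s="$1"
  local esc_e="${ESC}e"; local esc_s="${ESC}s"
  s=${s//"$ESC"/$esc_e}      # escape the escape byte first
  s=${s//"$SEP"/$esc_s}      # then the separator
  printf '%s' "$s"
}

# Inverse: restore separators first, then escape bytes.
capture_unescape() {
  local s="$1"
  local esc_e="${ESC}e"; local esc_s="${ESC}s"
  s=${s//"$esc_s"/$SEP}
  s=${s//"$esc_e"/$ESC}
  printf '%s' "$s"
}

raw="left${SEP}right"                        # user text containing the separator
closure="0${SEP}$(capture_escape "$raw")"    # id 0 plus one escaped capture
env="${closure#*"$SEP"}"                     # still exactly one field
```

The trade-off is two extra substitutions per Text capture and per use; defining the invariant instead keeps the emitted code shorter but pushes the burden onto validation.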
