-
Notifications
You must be signed in to change notification settings - Fork 0
Description
proposal
Sem-check symbols lazily, deferring semcheck of declaration and body until they're needed:
- a symbol declared gets added to current scope but isn't semchecked yet
- a reference to an overloaded symbol (ie a rountine call
fn(a,b)triggers semcheck of declarations of each overload offn(only the declaration, not the body) - once a call is resolved to a symbol, that symbol's body is semchecked, at which point a new scope is created (with parent scope = scope where symbol was declared) and semcheck recurses in this scope in same way as for the top-level scope
It works similarly to how nim implements dead code elimination in backend, by processing declarations top-down, which has the built-in property of skipping unused declarations (even if those appear in a cycle, eg if foo calls bar but no top-level code calls (transitively) either foo or bar, both foo and bar will be dead-code eliminated.
semcheck as depth first search
We define a processing stack containing declaration scopes (PScope), and semcheck consists in lazily processing statements in a scope and recursing when a non-declaration statement is visited that requires semchecking a symbol.
There is no need for fwd declarations nor complex data structures nor keeping track of scopes attached to declarations; all that's needed is a stack of scopes.
At any given time during semcheck, you only need to keep track of a stack of N scopes where N is the processing depth (eg: fn1 calls fn2 calls fn3 => N = 3). When a symbol f is declared in a scope S, f's scope will grow if new declarations occur after f is declared (in same lexical scope); when a statement triggers semcheck of f (delaration or body), a new scope is created (whose parent is S).
The parent relationship implicitly defines a tree (rooted at module top-level), but we never need to visit the children of a scope; all that's needed is walking up the scope when doing symbol resolution (ident => symbol).
proc a00=discard
proc a01=
proc a11=
a12()
proc a12=
a00()
a11()
a12()
a01()processing steps:
- register a00, a01; stack = [S0] # the scope stack
- visit a01; stack = [S0, S1] # S1.parent = S0
- register a11, a12; stack = [S0, S1]
- visit a12; stack = [S0, S1, S2]
- visit a00; stack = [S0, S1, S2, S3] # S3.parent = S0, not S2!
- visit a11; stack = [S0, S1, S2, S4] # s4.parent = S2
- visit a12(cached)
semcheck consists in doing depth first search in the symbol graph, where recursion is triggered for each statement that requires semchecking a declaration (a new declaration doesn't require semchecking nor recursion; instead the symbol just declared is merely added to the current scope). Each time a symbol is semchecked, a new scope is created (with an appropriate parent) and pushed to the stack; it is popped when semchecking for this symbol is completed.
this can be represented simply as follows:
type
PScope = ref object
parent: PScope
symbols: TStrTable
var stack: seq[PScope] # the processing stackin particular, the position in processingStack is unrelated to the parent field.
example
# m1.nim
type
A = object
B = typeof(fn3())
C = seq[B]
proc fn1(a: C) =
from m2 import fn2
fn2(a)
proc fn3(): int = 1
# until this point, the symbol table just contains shallow entries: `A(type) , B(type), C(type), fn1 (proc), fn3(proc)`;
# in particular, fn1, fn3, C have not been semchecked.
# If the module ended here, that would be the end of semcheck
discard fn3() # this triggers semcheck of fn3
var a: C # this triggers semcheck of C => B => fn3 (cached)
discard fn1(a) # this triggers semcheck of fn1 => registers m2; then fn2(a) triggers import of m2, and semcheck of `fn2`scoping rules
the declaration scope looks both before and after a declaration
proc fn1 = discard
# scope: fn1
block:
proc fn2 =
# if/when body is semchecked, scope = fn1, fn2, fn3
fn3()
proc fn3 = discard
# fn2 and fn3 are not in scope anymore (scope = fn1)behavior of declared
const a = declared(foo)
static: echo a # prints `false` because a top-level code triggers semcheck of `a` and `foo` isn't yet declared
proc foo() = discardcontrast with:
const a = declared(foo)
proc foo() = discard
static: echo a # prints true # the top-level code triggers after foo is declaredbehavior of lazy imports
# a.nim
proc a1=
from b import b1
b1()
proc a2=
from b import b2
b2()
a1()
a2()
# b.nim
proc b0 = discard
proc b1 = b0()
proc b2 = discardsemcheck steps:
- symbol table registers a1, a2 (without semchecking those)
- a1() is seen, triggers semcheck of a1
from b import b1is seen, registers symbols b, b1 (no semcheck nor import yet)- b1() is seen, triggers semcheck of b1, which triggers import of b, then of b1()
- b0() is seen, triggers semcheck of b0
- a2() is seen, triggers semcheck of a2
from b import b2is seen, registers symbols, b, b2- b2() is seen, triggers semcheck of b2
benefits
- this would solve cyclic dependencies
- forward declarations become un-needed
- this is a more principled and effective way to achieve
{.experimental: "codeReordering".}, and subsumes this feature - this would massively speed up compilation times, and possibly also binary sizes; the dead code elimination that nim currently does cannot recover from certain earlier decisions such as generating module initializations for modules that would be skipped thanks to lazy symbol resolution
- there would be no need for include files anymore (these are still pervasive in compiler code)
links
- revive my very old PR related to this
- => lazy symbol semantic checking nim-lang/RFCs#416