|
1 | 1 | # JuliaLowering
|
2 | 2 |
|
3 | 3 | [](https://github.com/c42f/JuliaLowering.jl/actions/workflows/CI.yml?query=branch%3Amain)
|
| 4 | + |
| 5 | +Experimental port of Julia's code "lowering" compiler passes into Julia. |
| 6 | + |
| 7 | +Lowering comprises four symbolic simplification steps: |
| 8 | +* Syntax desugaring - simplifying the rich surface syntax down to a small |
| 9 | + number of forms. |
| 10 | +* Scope analysis - analyzing identifier names used in the code to discover |
| 11 | + local variables and closure captures, and to associate global variables with the |
| 12 | + appropriate module. |
| 13 | +* Closure conversion - convert closures to types and deal with captured |
| 14 | + variables efficiently where possible. |
| 15 | +* Flattening to linear IR - convert code in hierarchical tree form to a |
| 16 | + flat array of statements and control flow into gotos. |
| 17 | + |
| 18 | +## Goals |
| 19 | + |
| 20 | +This work is intended to |
| 21 | +* Bring precise code provenance to Julia's lowered form (and eventually |
| 22 | + downstream in type inference, stack traces, etc). This has many benefits: |
| 23 | + - Talk to users precisely about their code via character-precise error and |
| 24 | + diagnostic messages from lowering |
| 25 | + - Greatly simplify the implementation of critical tools like Revise.jl |
| 26 | + which rely on analyzing how the user's source maps to the compiler's data |
| 27 | + structures |
| 28 | + - Allow tools like JuliaInterpreter to use type-inferred and optimized |
| 29 | + code, with the potential for huge speed improvements. |
| 30 | +* Bring improvements for macro authors |
| 31 | + - Prototype "automatic hygiene" (no more need for `esc()`!) |
| 32 | + - Precise author-defined error reporting from macros |
| 33 | + - Sketch better interfaces for syntax trees (hopefully!) |
| 34 | + |
| 35 | +# Design Notes |
| 36 | + |
| 37 | +A disorganized collection of design notes :) |
| 38 | + |
| 39 | +## Syntax trees |
| 40 | + |
| 41 | +We want something better than `JuliaSyntax.SyntaxNode`! `SyntaxTree` and |
| 42 | +`SyntaxGraph` provide this. (These will probably end up in `JuliaSyntax`.) |
| 43 | + |
| 44 | +We want to allow arbitrary attributes to be attached to tree nodes by analysis |
| 45 | +passes. This separates the analysis pass implementation from the data |
| 46 | +structure, allowing passes which don't know about each other to act on a shared |
| 47 | +data structure. |
| 48 | + |
| 49 | +Design and implementation inspiration comes in several analogies: |
| 50 | + |
| 51 | +Analogy 1: the ECS (Entity-Component-System) pattern for computer game design. |
| 52 | +This pattern is highly successful because it separates game logic (systems) |
| 53 | +from game objects (entities) by providing flexible storage |
| 54 | +* Compiler passes are "systems" |
| 55 | +* AST tree nodes are "entities" |
| 56 | +* Node attributes are "components" |
| 57 | + |
| 58 | +Analogy 2: The AoS to SoA transformation. But here we've got a kind of |
| 59 | +tree-of-structs-with-optional-attributes to struct-of-Dicts transformation. |
| 60 | +The data alignment / packing efficiency and concrete type safe storage benefits |
| 61 | +are similar. |
| 62 | + |
| 63 | +Analogy 3: Graph algorithms which represent graphs as a compact array of node |
| 64 | +ids and edges with integer indices, rather than using a linked data structure. |
| 65 | + |
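To make the three analogies concrete, here is a minimal sketch of attribute storage kept outside the tree nodes. It is an illustration under stated assumptions, not the actual `SyntaxGraph` implementation: the names `AttrGraph`, `newnode!`, `setattr!` and `getattr` are hypothetical, and a real implementation would likely use concretely typed storage per attribute rather than `Dict{Int,Any}`.

```julia
# Hypothetical sketch; NOT the actual SyntaxGraph API.
struct AttrGraph
    children::Vector{Vector{Int}}            # edges: node id -> child node ids
    attributes::Dict{Symbol,Dict{Int,Any}}   # attribute name -> (node id -> value)
end
AttrGraph() = AttrGraph(Vector{Vector{Int}}(), Dict{Symbol,Dict{Int,Any}}())

# Nodes ("entities") are plain integer ids.
function newnode!(g::AttrGraph, children::Vector{Int}=Int[])
    push!(g.children, children)
    return length(g.children)
end

# Attributes ("components") live in per-attribute tables, so a pass can add
# a new attribute without changing any node type.
function setattr!(g::AttrGraph, id::Int, name::Symbol, value)
    get!(() -> Dict{Int,Any}(), g.attributes, name)[id] = value
    return g
end

getattr(g::AttrGraph, id::Int, name::Symbol, default=nothing) =
    get(get(g.attributes, name, Dict{Int,Any}()), id, default)

g = AttrGraph()
x = newnode!(g)
y = newnode!(g)
call = newnode!(g, [x, y])
setattr!(g, x, :name, "x")       # written by one pass ("system")
setattr!(g, x, :scope, :local)   # written later by an unrelated pass
```

Two passes that know nothing about each other can attach `:name` and `:scope` to the same node id, which is the separation the ECS analogy describes.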
| 66 | +## Julia's existing lowering implementation |
| 67 | + |
| 68 | +### How does macro expansion work? |
| 69 | + |
| 70 | +`macroexpand(m::Module, x)` calls `jl_macroexpand` in ast.c: |
| 71 | + |
| 72 | +```c |
| 73 | +jl_value_t *jl_macroexpand(jl_value_t *expr, jl_module_t *inmodule) |
| 74 | +{ |
| 75 | + expr = jl_copy_ast(expr); |
| 76 | + expr = jl_expand_macros(expr, inmodule, NULL, 0, jl_world_counter, 0); |
| 77 | + expr = jl_call_scm_on_ast("jl-expand-macroscope", expr, inmodule); |
| 78 | + return expr; |
| 79 | +} |
| 80 | +``` |
| 81 | + |
| 82 | +First we copy the AST here. This is mostly a trivial deep copy of `Expr`s and |
| 83 | +shallow copy of their non-`Expr` children, except for when they contain |
| 84 | +embedded `CodeInfo/phi/phic` nodes which are also deep copied. |
| 85 | + |
| 86 | +Second we expand macros recursively by calling |
| 87 | + |
| 88 | +`jl_expand_macros(expr, inmodule, macroctx, onelevel, world, throw_load_error)` |
| 89 | + |
| 90 | +This relies on state indexed by `inmodule` and `world`, which gives it some |
| 91 | +funny properties: |
| 92 | +* `module` expressions can't be expanded: macro expansion depends on macro |
| 93 | + lookup within the module, but we can't do that without `eval`. |
| 94 | + |
| 95 | +Expansion proceeds from the outermost to innermost macros. So macros see any |
| 96 | +macro calls or quasiquote (`quote/$`) in their children as unexpanded forms. |
| 97 | + |
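The outermost-first order can be observed from Julia itself. The sketch below uses a hypothetical macro `@arg_head` that only reports the head of the expression it receives; `f` and `x` are placeholder names that are never evaluated.

```julia
# Hypothetical helper macro: returns the head of the expression it is given.
macro arg_head(ex)
    return QuoteNode(ex.head)
end

# The inner `@time f(x)` reaches `@arg_head` as an unexpanded `:macrocall` node.
# Because `@arg_head` discards its argument, `@time` is never expanded and
# `f` is never called.
head = @arg_head @time f(x)
```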
| 98 | +Things which are expanded: |
| 99 | +* `quote` is expanded using flisp code in `julia-bq-macro` |
| 100 | + - symbol / ssavalue -> `QuoteNode` (inert) |
| 101 | + - atom -> itself |
| 102 | + - at depth zero, `$` expands to its content |
| 103 | + - Expressions `x` without `$` expand to `(copyast (inert x))` |
| 104 | + - Other expressions containing a `$` expand to a call to `_expr` with all the |
| 105 | + args mapped through `julia-bq-expand-`. Roughly! |
| 106 | + - Special handling exists for multi-splatting arguments as in `quote quote $$(x...) end end` |
| 107 | +* `macrocall` proceeds with |
| 108 | + - Expand with `jl_invoke_julia_macro` |
| 109 | + - Call `eval` on the macro name (!!) to get the macro function. Look up |
| 110 | + the method. |
| 111 | + - Set up arguments for the macro calling convention |
| 112 | + - Wrap errors in macro invocation in `LoadError` |
| 113 | + - Return the expression, as well as the module in |
| 114 | + which that method of the macro was defined and the `LineNumberNode` where |
| 115 | + the macro was invoked in the source. |
| 116 | + - Deep copy the AST |
| 117 | + - Recursively expand child macros in the context of the module where the |
| 118 | + macrocall method was defined |
| 119 | + - Wrap the result in `(hygienic-scope ,result ,newctx.m ,lineinfo)` (except |
| 120 | + for special case optimizations) |
| 121 | +* `hygienic-scope` expands `args[1]` with `jl_expand_macros`, with the module |
| 122 | + of expansion set to `args[2]`. That is, it's the `Expr` representation of the |
| 123 | + module and expression arguments to `macroexpand`. The way this returns |
| 124 | + either `hygienic-scope` or unwraps is a bit confusing. |
| 125 | +* "`do` macrocalls" have their own special handling because the macrocall is |
| 126 | + the child of the `do`. This seems like a mess!! |
| 127 | + |
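The user-visible effects of the `quote` and `hygienic-scope` handling described above can be checked with a short sketch. The function and macro names here (`quoted`, `splice`, `@set_x_hygienic`, `@set_x_escaped`, `hygiene_demo`) are hypothetical and exist only for illustration.

```julia
# A quote without `$` yields a fresh copy of the quoted tree on each evaluation.
quoted() = :(a + b)
same_value = quoted() == quoted()     # structurally equal
same_object = quoted() === quoted()   # but not the same object

# `$` at depth zero splices the value of `b` into the constructed expression.
splice(b) = :(a + $b)

# Hypothetical macros showing the hygienic scope of an expansion.
macro set_x_hygienic(val)
    return :(x = $val)        # `x` is resolved within the macro's hygienic scope
end
macro set_x_escaped(val)
    return esc(:(x = $val))   # `esc` makes `x` refer to the caller's variable
end

function hygiene_demo()
    x = 1
    @set_x_hygienic 2   # assigns a renamed variable, not the caller's `x`
    before = x
    @set_x_escaped 3    # assigns the caller's `x`
    return (before, x)
end
```

Without `esc`, the assignment inside the expansion does not touch the caller's `x`, which is the behavior the `hygienic-scope` wrapper exists to implement.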
| 128 | + |
| 129 | +### Scope resolution |
| 130 | + |
| 131 | +Scopes are documented in the Julia documentation on [Scope of Variables](https://docs.julialang.org/en/v1/manual/variables-and-scoping/). |
| 132 | + |
| 133 | +This pass disambiguates variables which have the same name in different scopes |
| 134 | +and fills in the list of local variables within each lambda. |
| 135 | + |
| 136 | +#### Which data is needed to define a scope? |
| 137 | + |
| 138 | +A scope is a collection of variable names by category: |
| 139 | +* `argument` - arguments to a lambda |
| 140 | +* `local` - variables declared local (at top level) or implicitly local (in lambdas) or desugared to local-def |
| 141 | +* `global` - variables declared global (in lambdas) or implicitly global (at top level) |
| 142 | +* `static-parameter` - lambda type arguments from `where` clauses |
| 143 | + |
| 144 | +#### How does scope resolution work? |
| 145 | + |
| 146 | +We traverse the AST starting at the root, paying attention to certain nodes: |
| 147 | +* Nodes representing identifiers (Identifier, operators, var) |
| 148 | + - If a variable exists in the table, it's *replaced* with the value in the table. |
| 149 | + - If it doesn't exist, it becomes an `outerref` |
| 150 | +* Variable scoping constructs: `local`, `local-def` |
| 151 | + - collected by scope-block |
| 152 | + - removed during traversal |
| 153 | +* Scope metadata `softscope`, `hardscope` - just removed |
| 154 | +* New scopes |
| 155 | + - `lambda` creates a new scope containing itself and its arguments, |
| 156 | + otherwise copying the parent scope. It resolves the body with that new scope. |
| 157 | + - `scope-block` is really complicated - see below |
| 158 | +* Scope queries `islocal`, `locals`, `require-existing-local` |
| 159 | + - `islocal` - statically expand to true/false based on whether var name is a local var |
| 160 | + - `locals` - return list of locals - see `@locals` |
| 161 | + - `require-existing-local` - somewhat like `islocal`, but allows globals |
| 162 | + too (whaa?! naming) and produces a lowering error immediately if variable |
| 163 | + is not known. Should be called `require-in-scope` ?? |
| 164 | +* `break-block`, `symbolicgoto`, `symboliclabel` need special handling because |
| 165 | + one of their arguments is a non-quoted symbol. |
| 166 | +* Add static parameters for generated functions `with-static-parameters` |
| 167 | +* `method` - special handling for static params |
| 168 | + |
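As a small illustration of the `locals` scope query, `Base.@locals` returns the local variables defined at the call site, which relies on scope resolution knowing the set of locals statically. The function name `locals_demo` is hypothetical.

```julia
function locals_demo(a)
    b = a + 1
    # Returns a Dict{Symbol,Any} of the locals defined at this point,
    # including the function argument.
    return Base.@locals
end

vars = locals_demo(1)
```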
| 169 | +`scope-block` is the complicated bit. It's processed by |
| 170 | +* Searching the expressions within the block for any `local`, `local-def`, |
| 171 | + `global` and assigned vars. Searching doesn't recurse into `lambda`, |
| 172 | + `scope-block`, `module` and `toplevel` |
| 173 | +* Building lists of implicit locals or globals (depending on whether we're in a |
| 174 | + top level thunk) |
| 175 | +* Figuring out which local variables need to be renamed. This is any local variable |
| 176 | + with a name which has already occurred in processing one of the previous scope blocks |
| 177 | +* Check any conflicting local/global decls and soft/hard scope |
| 178 | +* Build new scope with table of renames |
| 179 | +* Resolve the body with the new scope, applying the renames |
| 180 | + |
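The renaming step matters whenever a nested scope block reuses a name. The sketch below shows only the observable behavior, not the renamed identifiers that lowering generates; `counter` and `scope_demo` are hypothetical names.

```julia
counter = 0   # a global variable defined at top level

function scope_demo()
    x = 1                 # implicitly local to the function
    let x = 2             # a new local `x` in a nested scope block
        x += 10           # only the inner `x` changes
    end
    global counter += 1   # explicitly declared global inside the function
    return x              # the outer `x` is untouched
end
```

For this to work, the two variables named `x` must end up as distinct variables in the lowered code, which is what the table of renames provides.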
| 181 | + |
| 182 | +### Lowered IR |
| 183 | + |
| 184 | +See https://docs.julialang.org/en/v1/devdocs/ast/#Lowered-form |
| 185 | + |
| 186 | +#### CodeInfo |
| 187 | + |
| 188 | +```julia |
| 189 | +mutable struct CodeInfo |
| 190 | + code::Vector{Any} # IR statements |
| 191 | + codelocs::Vector{Int32} # `length(code)` Vector of indices into `linetable` |
| 192 | + ssavaluetypes::Any # `length(code)` or Vector of inferred types after opt |
| 193 | + ssaflags::Vector{UInt32} # flag for every statement in `code` |
| 194 | + # 0 if meta statement |
| 195 | + # inbounds_flag - 1 bit (LSB) |
| 196 | + # inline_flag - 1 bit |
| 197 | + # noinline_flag - 1 bit |
| 198 | + # ... other 8 flags which are defined in compiler/optimize.jl |
| 199 | + # effects_flags - 9 bits |
| 200 | + method_for_inference_limit_heuristics::Any |
| 201 | + linetable::Any |
| 202 | + slotnames::Vector{Symbol} # names of parameters and local vars used in the code |
| 203 | + slotflags::Vector{UInt8} # vinfo flags from flisp |
| 204 | + slottypes::Any # nothing (used by typeinf) |
| 205 | + rettype::Any # Any (used by typeinf) |
| 206 | + parent::Any # nothing (used by typeinf) |
| 207 | + edges::Any |
| 208 | + min_world::UInt64 |
| 209 | + max_world::UInt64 |
| 210 | + inferred::Bool |
| 211 | + propagate_inbounds::Bool |
| 212 | + has_fcall::Bool |
| 213 | + nospecializeinfer::Bool |
| 214 | + inlining::UInt8 |
| 215 | + constprop::UInt8 |
| 216 | + purity::UInt16 |
| 217 | + inlining_cost::UInt16 |
| 218 | +end |
| 219 | +``` |
| 220 | + |
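A `CodeInfo` for an existing method can be inspected with `code_lowered`. The sketch below assumes a reasonably recent Julia 1.x release (it uses `only` and `Core.GotoIfNot`); the exact set of `CodeInfo` fields varies between Julia versions, so only `code` and `slotnames` are touched. The function `absval` is a hypothetical example.

```julia
absval(x) = x > 0 ? x : -x

# `code_lowered` returns a vector with one CodeInfo per matching method.
ci = only(code_lowered(absval, (Int,)))

# The flat statement array; the ternary becomes conditional gotos.
statements = ci.code
has_branch = any(s -> s isa Core.GotoIfNot, statements)

# Slot 1 is the function itself (`#self#`), followed by the arguments.
slots = ci.slotnames
```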