
Commit eb086f9

Initial code, copied from prototype in JuliaSyntax repo branch
Requires a custom branch of JuliaSyntax to run...
1 parent 685639e commit eb086f9

10 files changed, +2115 -2 lines changed

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2024 Claire Foster <[email protected]> and contributors
+Copyright (c) 2024 Julia Computing and contributors
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

Project.toml

Lines changed: 3 additions & 0 deletions
@@ -3,6 +3,9 @@ uuid = "f3c80556-a63f-4383-b822-37d64f81a311"
 authors = ["Claire Foster <[email protected]> and contributors"]
 version = "1.0.0-DEV"
 
+[deps]
+JuliaSyntax = "70703baa-626e-46a2-a12c-08ffd08c73b4"
+
 [compat]
 julia = "1"
 

README.md

Lines changed: 217 additions & 0 deletions
@@ -1,3 +1,220 @@
# JuliaLowering

[![Build Status](https://github.com/c42f/JuliaLowering.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/c42f/JuliaLowering.jl/actions/workflows/CI.yml?query=branch%3Amain)

Experimental port of Julia's code "lowering" compiler passes into Julia.

Lowering comprises four symbolic simplification steps:
* Syntax desugaring - simplifying the rich surface syntax down to a small
  number of forms.
* Scope analysis - analyzing identifier names used in the code to discover
  local variables and closure captures, and to associate global variables with
  the appropriate module.
* Closure conversion - converting closures to types and dealing with captured
  variables efficiently where possible.
* Flattening to linear IR - converting code in hierarchical tree form to a
  flat array of statements, and control flow into gotos.
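As a rough illustration of the last step, here is a toy flattening of nested calls into a linear list of statements. This is a hypothetical sketch, not JuliaLowering's code: `flatten_calls!` only handles `:call` expressions and uses `Core.SSAValue` to stand for "the result of statement n", in the spirit of Julia's lowered form.

```julia
# Hypothetical toy flattener (not JuliaLowering's implementation).
# Nested `:call` expressions are hoisted into `stmts` in evaluation order and
# replaced by a `Core.SSAValue` pointing at the statement that computes them.
function flatten_calls!(stmts::Vector{Any}, ex)
    if ex isa Expr && ex.head === :call
        # Flatten the arguments first so they are computed before this call
        args = Any[flatten_calls!(stmts, arg) for arg in ex.args]
        push!(stmts, Expr(:call, args...))
        return Core.SSAValue(length(stmts))
    end
    return ex  # symbols and literals stay as they are
end

stmts = Any[]
result = flatten_calls!(stmts, :(f(g(x), h(y))))
# stmts is now [:(g(x)), :(h(y)), :(f(%1, %2))], where %n is Core.SSAValue(n),
# and result == Core.SSAValue(3)
```

Real flattening also has to turn `if`, loops and short-circuiting operators into gotos, which this sketch leaves out.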

## Goals

This work is intended to:
* Bring precise code provenance to Julia's lowered form (and eventually
  further downstream in type inference, stack traces, etc.). This has many benefits:
  - Talk to users precisely about their code via character-precise error and
    diagnostic messages from lowering.
  - Greatly simplify the implementation of critical tools like Revise.jl,
    which rely on analyzing how the user's source maps to the compiler's data
    structures.
  - Allow tools like JuliaInterpreter to use type-inferred and optimized
    code, with the potential for huge speed improvements.
* Bring improvements for macro authors:
  - Prototype "automatic hygiene" (no more need for `esc()`!)
  - Precise author-defined error reporting from macros
  - Sketch better interfaces for syntax trees (hopefully!)

# Design Notes

A disorganized collection of design notes :)

## Syntax trees

We want something better than `JuliaSyntax.SyntaxNode`! `SyntaxTree` and
`SyntaxGraph` provide this. (These will probably end up in `JuliaSyntax`.)

We want to allow arbitrary attributes to be attached to tree nodes by analysis
passes. This separates the analysis pass implementation from the data
structure, allowing passes which don't know about each other to act on a shared
data structure.
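A minimal sketch of that separation, assuming integer node ids and one concretely typed `Dict` per attribute. `ToyGraph`, `newnode!`, `ensure_attr!`, `setattr!` and `getattr` are hypothetical names for illustration only, not the `SyntaxGraph` API.

```julia
# Hypothetical sketch: tree structure and node attributes are stored separately.
# Nodes are plain integer ids; each attribute is its own Dict{Int,T}.
struct ToyGraph
    children::Vector{Vector{Int}}           # children[id] = child node ids
    attributes::Dict{Symbol,AbstractDict}   # attribute name => Dict{Int,T}
end

ToyGraph() = ToyGraph(Vector{Int}[], Dict{Symbol,AbstractDict}())

# Create a node and return its id
function newnode!(g::ToyGraph, children::Vector{Int}=Int[])
    push!(g.children, children)
    return length(g.children)
end

# Register attribute `name` with value type `T` (no-op if it already exists)
ensure_attr!(g::ToyGraph, name::Symbol, ::Type{T}) where {T} =
    get!(() -> Dict{Int,T}(), g.attributes, name)

setattr!(g::ToyGraph, id::Int, name::Symbol, value) = (g.attributes[name][id] = value)
getattr(g::ToyGraph, id::Int, name::Symbol) = g.attributes[name][id]

# A "parsing" pass builds the tree for `x(y)` and records identifier names...
g = ToyGraph()
x = newnode!(g)
y = newnode!(g)
call_node = newnode!(g, [x, y])
ensure_attr!(g, :name, String)
setattr!(g, x, :name, "x")
setattr!(g, y, :name, "y")

# ...and a later, independent pass adds its own attribute without
# ToyGraph's definition needing to change.
ensure_attr!(g, :scope_id, Int)
setattr!(g, x, :scope_id, 1)
```

The trade-off in this layout is an extra `Dict` lookup per attribute access, in exchange for never having to edit the node type when a pass needs a new annotation.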

Design and implementation inspiration comes from several analogies:

Analogy 1: the ECS (Entity-Component-System) pattern for computer game design.
This pattern is highly successful because it separates game logic (systems)
from game objects (entities) by providing flexible storage:
* Compiler passes are "systems"
* AST tree nodes are "entities"
* Node attributes are "components"

Analogy 2: The AoS to SoA (array-of-structs to struct-of-arrays) transformation.
But here we've got a kind of tree-of-structs-with-optional-attributes to
struct-of-Dicts transformation. The benefits are similar: better data alignment
and packing efficiency, and concretely typed, type-safe storage.

Analogy 3: Graph algorithms which represent graphs as a compact array of node
ids and edges with integer indices, rather than using a linked data structure.
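Analogy 3 might look like the following CSR-style (compressed sparse row) layout, where all child ids live in one flat vector and each node owns a contiguous range of it. This is a hypothetical sketch; `FlatTree`, `addnode!` and `child_ids` are illustrative names, not part of JuliaSyntax or JuliaLowering.

```julia
# Hypothetical CSR-style tree: all child ids live in one flat `edges` vector,
# and node `id` owns the slice `edges[edge_ranges[id]]`.
struct FlatTree
    kinds::Vector{Symbol}                # node kind, indexed by node id
    edge_ranges::Vector{UnitRange{Int}}  # range of `edges` holding each node's children
    edges::Vector{Int}                   # child ids of all nodes, concatenated
end

FlatTree() = FlatTree(Symbol[], UnitRange{Int}[], Int[])

# Append a node whose children already exist (bottom-up construction); return its id
function addnode!(t::FlatTree, kind::Symbol, children::Vector{Int}=Int[])
    first_edge = length(t.edges) + 1
    append!(t.edges, children)
    push!(t.kinds, kind)
    push!(t.edge_ranges, first_edge:length(t.edges))
    return length(t.kinds)
end

# Children of node `id` as a non-allocating view into `edges`
child_ids(t::FlatTree, id::Int) = view(t.edges, t.edge_ranges[id])

# Build the tree for `f(x)` bottom-up
t = FlatTree()
f = addnode!(t, :Identifier)
x = addnode!(t, :Identifier)
call = addnode!(t, :call, [f, x])
```

Leaves get an empty range, so no per-node child vectors are allocated; nodes and edges are all plain integers stored in a few contiguous arrays.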

## Julia's existing lowering implementation

### How does macro expansion work?

`macroexpand(m::Module, x)` calls `jl_macroexpand` in ast.c:

```c
jl_value_t *jl_macroexpand(jl_value_t *expr, jl_module_t *inmodule)
{
    expr = jl_copy_ast(expr);
    expr = jl_expand_macros(expr, inmodule, NULL, 0, jl_world_counter, 0);
    expr = jl_call_scm_on_ast("jl-expand-macroscope", expr, inmodule);
    return expr;
}
```
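The same entry point is reachable from Julia through `macroexpand`. In this sketch (the macros `@inner_m` and `@outer_m` are hypothetical), the input `Expr` is left unchanged, as the initial `jl_copy_ast` suggests, and `recursive=false` expands only the outermost macro; we assume it maps onto the `onelevel` argument of `jl_expand_macros`.

```julia
# Hypothetical macros: @outer_m expands to a call of @inner_m
macro inner_m()
    :(1 + 1)
end

macro outer_m()
    :(@inner_m())
end

ex = :(@outer_m())
one_level = macroexpand(@__MODULE__, ex; recursive=false)  # expand @outer_m only
full = macroexpand(@__MODULE__, ex)                         # expand all the way down

@show ex.head          # :macrocall, the input was copied rather than mutated
@show one_level.head   # :macrocall, @inner_m has not been expanded yet
@show eval(full)       # 2
```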

First we copy the AST. This is mostly a trivial deep copy of `Expr`s and a
shallow copy of their non-`Expr` children, except when they contain
embedded `CodeInfo`/`phi`/`phic` nodes, which are also deep copied.

Second we expand macros recursively by calling

`jl_expand_macros(expr, inmodule, macroctx, onelevel, world, throw_load_error)`

This relies on state indexed by `inmodule` and `world`, which gives it some
funny properties:
* `module` expressions can't be expanded: macro expansion depends on macro
  lookup within the module, but we can't do that without `eval`.

Expansion proceeds from the outermost to the innermost macros, so macros see any
macro calls or quasiquotes (`quote`/`$`) in their children as unexpanded forms.
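A small experiment shows the outermost-first order: the outer macro receives the inner `@answer()` call as an unexpanded `:macrocall` expression. The macro names `@what_did_i_get` and `@answer` are hypothetical.

```julia
# Returns its argument as a quoted value, so we can see exactly what it received
macro what_did_i_get(ex)
    return QuoteNode(ex)
end

macro answer()
    return 42
end

seen = @what_did_i_get(@answer())
@show seen.head     # :macrocall, @answer was not expanded before the outer macro ran
@show seen.args[1]  # the macro name, Symbol("@answer")
```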

Things which are expanded:
* `quote` is expanded using flisp code in `julia-bq-macro`
  - symbol / ssavalue -> `QuoteNode` (inert)
  - atom -> itself
  - at depth zero, `$` expands to its content
  - Expressions `x` without `$` expand to `(copyast (inert x))`
  - Other expressions containing a `$` expand to a call to `_expr` with all the
    args mapped through `julia-bq-expand-`. Roughly!
  - Special handling exists for multi-splatting arguments as in `quote quote $$(x...) end end`
* `macrocall` proceeds with
  - Expand with `jl_invoke_julia_macro`
    - Call `eval` on the macro name (!!) to get the macro function. Look up
      the method.
    - Set up arguments for the macro calling convention
    - Wrap errors in macro invocation in `LoadError`
    - Return the expression, as well as the module in
      which that method of that macro was defined and the `LineNumberNode` where
      the macro was invoked in the source.
  - Deep copy the AST
  - Recursively expand child macros in the context of the module where the
    macrocall method was defined
  - Wrap the result in `(hygienic-scope ,result ,newctx.m ,lineinfo)` (except
    for special case optimizations)
* `hygienic-scope` expands `args[1]` with `jl_expand_macros`, with the module
  of expansion set to `args[2]`. I.e., it's the `Expr` representation of the
  module and expression arguments to `macroexpand`. The way this either returns
  `hygienic-scope` or unwraps it is a bit confusing.
* "`do` macrocalls" have their own special handling because the macrocall is
  the child of the `do`. This seems like a mess!!
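Some of these rules can be observed from ordinary Julia code. In the sketch below (the macros `@set_y` and `@set_y_escaped` are hypothetical), `$` splices a value into a freshly built `Expr`, a quote without `$` yields a new copy on every evaluation (the `copyast` case), and hygiene renames an unescaped assignment in a macro's output while leaving an `esc`'d one alone.

```julia
# 1. `$` splices a value into a freshly built Expr
name = :a
spliced = :(f($name, y))
@show spliced.args     # Any[:f, :a, :y]

# 2. A quote without `$` is re-copied on every evaluation (the copyast case)
make_expr() = :(g(1))
@show make_expr() == make_expr()    # true: structurally equal
@show make_expr() === make_expr()   # false: two distinct Expr objects

# 3. Hygiene: the same assignment with and without `esc`
macro set_y()
    :(y = 1)
end

macro set_y_escaped()
    esc(:(y = 1))
end

hygienic = macroexpand(@__MODULE__, :(@set_y()))
escaped = macroexpand(@__MODULE__, :(@set_y_escaped()))
@show hygienic   # assignment to a gensym'd name rather than `y`
@show escaped    # :(y = 1)
```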

### Scope resolution

Scopes are documented in the Julia documentation on [Scope of Variables](https://docs.julialang.org/en/v1/manual/variables-and-scoping/).

This pass disambiguates variables which have the same name in different scopes
and fills in the list of local variables within each lambda.

#### Which data is needed to define a scope?

A scope is a collection of variable names by category:
* `argument` - arguments to a lambda
* `local` - variables declared local (at top level), implicitly local (in lambdas), or desugared to `local-def`
* `global` - variables declared global (in lambdas) or implicitly global (at top level)
* `static-parameter` - lambda type arguments from `where` clauses
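A minimal sketch of such a scope record, assuming one `Set{Symbol}` per category plus a link to the enclosing scope. `ToyScope` and `lookup` are hypothetical names used only for illustration.

```julia
# Hypothetical scope record: one set of names per category plus a parent link
struct ToyScope
    arguments::Set{Symbol}
    locals::Set{Symbol}
    globals::Set{Symbol}
    static_parameters::Set{Symbol}
    parent::Union{ToyScope,Nothing}
end

# Empty scope nested inside `parent` (a root scope when `parent === nothing`)
ToyScope(parent::Union{ToyScope,Nothing}=nothing) =
    ToyScope(Set{Symbol}(), Set{Symbol}(), Set{Symbol}(), Set{Symbol}(), parent)

# Category of `name` in the innermost scope that knows it; names unknown to
# every enclosing scope are reported as :outerref
function lookup(scope::Union{ToyScope,Nothing}, name::Symbol)
    while scope !== nothing
        name in scope.arguments && return :argument
        name in scope.static_parameters && return :static_parameter
        name in scope.locals && return :local
        name in scope.globals && return :global
        scope = scope.parent
    end
    return :outerref
end

# Example: a lambda with argument `x` and local `y`, nested in a top-level scope
top = ToyScope()
push!(top.globals, :println)
lam = ToyScope(top)
push!(lam.arguments, :x)
push!(lam.locals, :y)
```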

#### How does scope resolution work?

We traverse the AST starting at the root, paying attention to certain nodes:
* Nodes representing identifiers (Identifier, operators, var)
  - If a variable exists in the table, it's *replaced* with the value in the table.
  - If it doesn't exist, it becomes an `outerref`.
* Variable scoping constructs: `local`, `local-def`
  - collected by `scope-block`
  - removed during traversal
* Scope metadata `softscope`, `hardscope` - just removed
* New scopes
  - `lambda` creates a new scope containing itself and its arguments,
    otherwise copying the parent scope. It resolves the body with that new scope.
  - `scope-block` is really complicated - see below
* Scope queries `islocal`, `locals`
  - `islocal` - statically expand to true/false based on whether the variable name is a local variable
  - `locals` - return the list of locals - see `@locals`
  - `require-existing-local` - somewhat like `islocal`, but allows globals
    too (whaa?! naming) and produces a lowering error immediately if the variable
    is not known. Should be called `require-in-scope` ??
* `break-block`, `symbolicgoto`, `symboliclabel` need special handling because
  one of their arguments is a non-quoted symbol.
* `with-static-parameters` - adds static parameters for generated functions
* `method` - special handling for static params
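The identifier rule can be sketched on plain `Expr` trees. `resolve_names` is hypothetical and deliberately ignores the other cases (declarations, new scopes, scope queries); it only shows replacement from the table and the `outerref` fallback.

```julia
# Hypothetical sketch of the identifier rule only: names in `table` are replaced
# by their (possibly renamed) value; any other name becomes Expr(:outerref, name).
function resolve_names(ex, table::Dict{Symbol,Symbol})
    if ex isa Symbol
        return get(table, ex, Expr(:outerref, ex))
    elseif ex isa Expr
        # Keep the head, resolve every child
        return Expr(ex.head, Any[resolve_names(arg, table) for arg in ex.args]...)
    else
        return ex   # literals and other atoms are left alone
    end
end

resolved = resolve_names(:(x + y), Dict(:x => Symbol("x#1")))
# resolved == Expr(:call, Expr(:outerref, :+), Symbol("x#1"), Expr(:outerref, :y))
```

Note that `+` also becomes an `outerref` here, since nothing in the table claims it.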

`scope-block` is the complicated bit. It's processed by
* Searching the expressions within the block for any `local`, `local-def`,
  `global` and assigned vars. Searching doesn't recurse into `lambda`,
  `scope-block`, `module` and `toplevel`.
* Building lists of implicit locals or globals (depending on whether we're in a
  top level thunk)
* Figuring out which local variables need to be renamed. This is any local variable
  with a name which has already occurred in processing one of the previous scope blocks.
* Checking for any conflicting local/global decls and soft/hard scope
* Building a new scope with the table of renames
* Resolving the body with the new scope, applying the renames
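The search and renaming steps might look roughly like this on surface-syntax `Expr`s. This is a hypothetical approximation: the real pass works on flisp's desugared forms, whereas here `function`, `->`, `let`, `for`, `while`, `module` and `toplevel` stand in for the nested scopes that are not searched. It also assumes we are inside a lambda, so assigned names default to implicit locals.

```julia
# Heads whose bodies form a nested scope and are not searched (approximation)
const NESTED_SCOPE_HEADS = (:function, :->, :let, :for, :while, :module, :toplevel)

# `x` or `x = value` inside a `local`/`global` declaration -> :x
function declared_name(a)
    a isa Symbol && return a
    if a isa Expr && a.head === :(=) && a.args[1] isa Symbol
        return a.args[1]
    end
    return nothing
end

# Collect explicit `local`/`global` declarations and assigned names,
# without recursing into nested scopes
function find_vars!(decls::Dict{Symbol,Set{Symbol}}, ex)
    ex isa Expr || return decls
    ex.head in NESTED_SCOPE_HEADS && return decls
    if ex.head === :local || ex.head === :global
        for a in ex.args
            name = declared_name(a)
            name === nothing || push!(decls[ex.head], name)
        end
    elseif ex.head === :(=) && ex.args[1] isa Symbol
        push!(decls[:assigned], ex.args[1])
    end
    for a in ex.args
        find_vars!(decls, a)
    end
    return decls
end

# The block's locals, plus renames for locals whose names were already used
# by a previously processed scope block (tracked in `seen`)
function scope_block_renames(block::Expr, seen::Set{Symbol}, counter::Ref{Int})
    decls = find_vars!(Dict(:local => Set{Symbol}(), :global => Set{Symbol}(),
                            :assigned => Set{Symbol}()), block)
    implicit_locals = setdiff(decls[:assigned], decls[:global], decls[:local])
    block_locals = union(decls[:local], implicit_locals)
    renames = Dict{Symbol,Symbol}()
    for name in block_locals
        if name in seen
            counter[] += 1
            renames[name] = Symbol(name, "#", counter[])
        end
        push!(seen, name)
    end
    return block_locals, renames
end

seen = Set([:x])   # `x` was already used by an earlier scope block
blk = quote
    x = 1
    global g = 2
    local z
    f = () -> (w = 3)
end
block_locals, renames = scope_block_renames(blk, seen, Ref(0))
# block_locals == Set([:x, :z, :f]); renames == Dict(:x => Symbol("x#1"))
```

`w` is not collected because it is only assigned inside the nested `->`, and `g` is excluded from the implicit locals by its `global` declaration.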


### Lowered IR

See https://docs.julialang.org/en/v1/devdocs/ast/#Lowered-form

#### CodeInfo

```julia
mutable struct CodeInfo
    code::Vector{Any}              # IR statements
    codelocs::Vector{Int32}        # `length(code)` Vector of indices into `linetable`
    ssavaluetypes::Any             # `length(code)` or Vector of inferred types after opt
    ssaflags::Vector{UInt32}       # flag for every statement in `code`
                                   #   0 if meta statement
                                   #   inbounds_flag - 1 bit (LSB)
                                   #   inline_flag   - 1 bit
                                   #   noinline_flag - 1 bit
                                   #   ... other 8 flags which are defined in compiler/optimize.jl
                                   #   effects_flags - 9 bits
    method_for_inference_limit_heuristics::Any
    linetable::Any
    slotnames::Vector{Symbol}      # names of parameters and local vars used in the code
    slotflags::Vector{UInt8}       # vinfo flags from flisp
    slottypes::Any                 # nothing (used by typeinf)
    rettype::Any                   # Any (used by typeinf)
    parent::Any                    # nothing (used by typeinf)
    edges::Any
    min_world::UInt64
    max_world::UInt64
    inferred::Bool
    propagate_inbounds::Bool
    has_fcall::Bool
    nospecializeinfer::Bool
    inlining::UInt8
    constprop::UInt8
    purity::UInt16
    inlining_cost::UInt16
end
```
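To look at real lowered IR, `code_lowered` returns `CodeInfo` objects for a method. The `CodeInfo` field set has changed between Julia versions, so this sketch only relies on `code` and `slotnames`; the function `absval` is just an example.

```julia
absval(x) = x > 0 ? x : -x

ci = only(code_lowered(absval, (Int,)))   # one CodeInfo for the single matching method

# `slotnames` lists `#self#` followed by the arguments and locals
@show ci.slotnames

# The ternary has become a conditional branch in the flat statement list
for (i, stmt) in enumerate(ci.code)
    println(i, ": ", stmt)
end
has_branch = any(stmt -> stmt isa Core.GotoIfNot, ci.code)
```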

src/JuliaLowering.jl

Lines changed: 14 additions & 1 deletion
@@ -1,5 +1,18 @@
 module JuliaLowering
 
-# Write your package code here.
+using JuliaSyntax
+
+using JuliaSyntax: SyntaxHead, highlight, Kind, GreenNode, @KSet_str
+using JuliaSyntax: haschildren, children, child, numchildren, head, kind, flags
+using JuliaSyntax: filename, first_byte, last_byte, source_location
+
+using JuliaSyntax: is_literal, is_number, is_operator, is_prec_assignment, is_infix_op_call, is_postfix_op_call
+
+include("syntax_graph.jl")
+include("utils.jl")
+
+include("desugaring.jl")
+include("scope_analysis.jl")
+include("linear_ir.jl")
 
 end
