Initial code, copied from prototype in JuliaSyntax repo branch
Requires a custom branch of JuliaSyntax to run...
c42f committed Mar 25, 2024
1 parent 685639e commit eb086f9
Showing 10 changed files with 2,115 additions and 2 deletions.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2024 Claire Foster <[email protected]> and contributors
Copyright (c) 2024 Julia Computing and contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
3 changes: 3 additions & 0 deletions Project.toml
@@ -3,6 +3,9 @@ uuid = "f3c80556-a63f-4383-b822-37d64f81a311"
authors = ["Claire Foster <[email protected]> and contributors"]
version = "1.0.0-DEV"

[deps]
JuliaSyntax = "70703baa-626e-46a2-a12c-08ffd08c73b4"

[compat]
julia = "1"

217 changes: 217 additions & 0 deletions README.md
@@ -1,3 +1,220 @@
# JuliaLowering

[![Build Status](https://github.com/c42f/JuliaLowering.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/c42f/JuliaLowering.jl/actions/workflows/CI.yml?query=branch%3Amain)

Experimental port of Julia's code "lowering" compiler passes into Julia.

Lowering comprises four symbolic simplification steps:
* Syntax desugaring - simplifying the rich surface syntax down to a small
  number of forms.
* Scope analysis - analyzing identifier names used in the code to discover
  local variables, closure captures, and associate global variables to the
  appropriate module.
* Closure conversion - convert closures to types and deal with captured
  variables efficiently where possible.
* Flattening to linear IR - convert code in hierarchical tree form to a
  flat array of statements and control flow into gotos.
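
As a rough illustration of where these steps end up, Julia's existing lowering can be invoked with `Meta.lower`. The sketch below uses the built-in lowering, not this package, and assumes Julia 1.6 or later, where conditional branches show up as `Core.GotoIfNot` statements. The exact statements produced vary between Julia versions.

```julia
# Lower a small expression with Julia's built-in lowering (not JuliaLowering).
ex = :(if x > 0
           "positive"
       else
           "non-positive"
       end)

lowered = Meta.lower(Main, ex)

# For non-trivial expressions the result wraps a `CodeInfo` in a `:thunk`.
ci = lowered.args[1]

# The hierarchical `if` has become a flat statement list with a conditional goto.
has_goto = any(stmt -> stmt isa Core.GotoIfNot, ci.code)
```

Printing `ci` in the REPL shows the flat statement list, which is the easiest way to see the effect of the final flattening step.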

## Goals

This work is intended to:
* Bring precise code provenance to Julia's lowered form (and eventually
  downstream in type inference, stack traces, etc). This has many benefits
  - Talk to users precisely about their code via character-precise error and
    diagnostic messages from lowering
  - Greatly simplify the implementation of critical tools like Revise.jl
    which rely on analyzing how the user's source maps to the compiler's data
    structures
  - Allow tools like JuliaInterpreter to use type-inferred and optimized
    code, with the potential for huge speed improvements.
* Bring improvements for macro authors
  - Prototype "automatic hygiene" (no more need for `esc()`!)
  - Precise author-defined error reporting from macros
  - Sketch better interfaces for syntax trees (hopefully!)

# Design Notes

A disorganized collection of design notes :)

## Syntax trees

We want something better than `JuliaSyntax.SyntaxNode`! `SyntaxTree` and
`SyntaxGraph` provide this. (These will probably end up in `JuliaSyntax`.)

We want to allow arbitrary attributes to be attached to tree nodes by analysis
passes. This separates the analysis pass implementation from the data
structure, allowing passes which don't know about each other to act on a shared
data structure.

Design and implementation inspiration comes from several analogies:

Analogy 1: the ECS (Entity-Component-System) pattern for computer game design.
This pattern is highly successful because it separates game logic (systems)
from game objects (entities) by providing flexible storage. In this design:
* Compiler passes are "systems"
* AST tree nodes are "entities"
* Node attributes are "components"

Analogy 2: The array-of-structs (AoS) to struct-of-arrays (SoA) transformation.
Here, though, the transformation is from a kind of
tree-of-structs-with-optional-attributes to a struct-of-Dicts.
The data alignment / packing efficiency and concrete, type-safe storage benefits
are similar.

Analogy 3: Graph algorithms which represent graphs as a compact array of node
ids and edges with integer indices, rather than using a linked data structure.
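
The three analogies can be combined in a small sketch. This is a toy model only: `ToyGraph` and its helper functions are hypothetical names invented for illustration and are not the `SyntaxGraph` API. Nodes are plain integer ids, edges live in one compact array, and each attribute is stored in its own `Dict` keyed by node id.

```julia
# Hypothetical sketch; names are illustrative, not the JuliaLowering API.
struct ToyGraph
    edge_ranges::Vector{UnitRange{Int}}      # per node: range of indices into `edges`
    edges::Vector{Int}                       # child node ids, stored contiguously
    attributes::Dict{Symbol,Dict{Int,Any}}   # attribute name => (node id => value)
end

ToyGraph() = ToyGraph(UnitRange{Int}[], Int[], Dict{Symbol,Dict{Int,Any}}())

# Create a node with no children and return its integer id.
function new_node!(g::ToyGraph)
    push!(g.edge_ranges, 1:0)
    return length(g.edge_ranges)
end

# Append the children to the shared edge array and record their range.
function set_children!(g::ToyGraph, id::Int, child_ids::Vector{Int})
    start = length(g.edges) + 1
    append!(g.edges, child_ids)
    g.edge_ranges[id] = start:length(g.edges)
    return g
end

child_ids(g::ToyGraph, id::Int) = g.edges[g.edge_ranges[id]]

# Attributes are optional: a pass only stores values for the nodes it cares about.
function set_attr!(g::ToyGraph, id::Int, name::Symbol, value)
    get!(() -> Dict{Int,Any}(), g.attributes, name)[id] = value
    return g
end

get_attr(g::ToyGraph, id::Int, name::Symbol, default=nothing) =
    get(get(g.attributes, name, Dict{Int,Any}()), id, default)

# Build the tree for `f(x)`, then let a later "pass" attach a new attribute.
g = ToyGraph()
call_node, f_node, x_node = new_node!(g), new_node!(g), new_node!(g)
set_children!(g, call_node, [f_node, x_node])
set_attr!(g, call_node, :kind, :call)
set_attr!(g, f_node, :name, "f")
set_attr!(g, x_node, :name, "x")
set_attr!(g, x_node, :scope, :local)   # added without touching the struct definition
```

The point of the layout is that the last line needs no change to `ToyGraph` itself, so passes which don't know about each other can share one data structure.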

## Julia's existing lowering implementation

### How does macro expansion work?

`macroexpand(m::Module, x)` calls `jl_macroexpand` in ast.c:

```c
jl_value_t *jl_macroexpand(jl_value_t *expr, jl_module_t *inmodule)
{
    expr = jl_copy_ast(expr);
    expr = jl_expand_macros(expr, inmodule, NULL, 0, jl_world_counter, 0);
    expr = jl_call_scm_on_ast("jl-expand-macroscope", expr, inmodule);
    return expr;
}
```

First we copy the AST here. This is mostly a trivial deep copy of `Expr`s and
shallow copy of their non-`Expr` children, except for when they contain
embedded `CodeInfo/phi/phic` nodes which are also deep copied.

Second we expand macros recursively by calling

`jl_expand_macros(expr, inmodule, macroctx, onelevel, world, throw_load_error)`

This relies on state indexed by `inmodule` and `world`, which gives it some
funny properties:
* `module` expressions can't be expanded: macro expansion depends on macro
lookup within the module, but we can't do that without `eval`.

Expansion proceeds from the outermost to innermost macros. So macros see any
macro calls or quasiquote (`quote/$`) in their children as unexpanded forms.
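
The outermost-first order can be observed from Julia itself. In this small sketch, `@outer` and `@inner` are hypothetical macros defined only for the demonstration. `@outer` inspects its argument and reports whether the nested macro call has been expanded yet.

```julia
# Hypothetical demo macros; they exist only to observe the expansion order.
macro outer(ex)
    if ex isa Expr && ex.head === :macrocall
        return "saw an unexpanded macrocall"
    else
        return "saw an already expanded value"
    end
end

macro inner()
    return 42
end

# `@outer` runs first and receives `@inner` as a raw `:macrocall` expression.
result = macroexpand(@__MODULE__, :(@outer @inner))
```

If expansion ran innermost-first instead, `@outer` would receive the literal `42` and take the other branch.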

Things which are expanded:
* `quote` is expanded using flisp code in `julia-bq-macro`
  - symbol / ssavalue -> `QuoteNode` (inert)
  - atom -> itself
  - at depth zero, `$` expands to its content
  - Expressions `x` without `$` expand to `(copyast (inert x))`
  - Other expressions containing a `$` expand to a call to `_expr` with all the
    args mapped through `julia-bq-expand-`. Roughly!
  - Special handling exists for multi-splatting arguments as in `quote quote $$(x...) end end`
* `macrocall` proceeds with
  - Expand with `jl_invoke_julia_macro`
    - Call `eval` on the macro name (!!) to get the macro function. Look up
      the method.
    - Set up arguments for the macro calling convention
    - Wraps errors in macro invocation in `LoadError`
    - Returns the expression, as well as the module at
      which that method of that macro was defined and `LineNumberNode` where
      the macro was invoked in the source.
  - Deep copy the AST
  - Recursively expand child macros in the context of the module where the
    macrocall method was defined
  - Wrap the result in `(hygienic-scope ,result ,newctx.m ,lineinfo)` (except
    for special case optimizations)
* `hygienic-scope` expands `args[1]` with `jl_expand_macros`, with the module
  of expansion set to `args[2]`. I.e., it's the `Expr` representation of the
  module and expression arguments to `macroexpand`. The way this returns
  either `hygienic-scope` or unwraps is a bit confusing.
* "`do` macrocalls" have their own special handling because the macrocall is
  the child of the `do`. This seems like a mess!!
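
Some of the `quote` rules in the list above are visible from ordinary Julia code. The following is a minimal sketch showing observable behavior only, not the flisp implementation. It covers `$` at depth zero, the copy made for expressions without `$`, and a `$` left alone inside a nested quote.

```julia
x = :a

# At depth zero, `$x` is replaced by the value of `x`.
interpolated = :(f($x))

# Without any `$`, the quoted expression is copied on every evaluation,
# so two evaluations are equal but are not the same object.
make_expr() = :(a + b)
first_copy, second_copy = make_expr(), make_expr()

# Inside a nested quote the `$` sits at depth one, so the outer quote leaves it alone.
nested = :(:($x))
```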


### Scope resolution

Scopes are documented in the Julia documentation on [Scope of Variables](https://docs.julialang.org/en/v1/manual/variables-and-scoping/).

This pass disambiguates variables which have the same name in different scopes
and fills in the list of local variables within each lambda.

#### Which data is needed to define a scope?

A scope is a collection of variable names by category:
* `argument` - arguments to a lambda
* `local` - variables declared local (at top level) or implicitly local (in lambdas) or desugared to local-def
* `global` - variables declared global (in lambdas) or implicitly global (at top level)
* `static-parameter` - lambda type arguments from `where` clauses
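
To make the four categories concrete, here is a small annotated function. The names `counter`, `bump`, `amount` and `total` are made up for the example, and the comments describe how each name would be classified according to the list above.

```julia
counter = 0   # implicitly global: assigned at top level

function bump(amount::T) where {T}   # `amount` is an argument, `T` a static parameter
    global counter                   # declared global inside the lambda
    total = counter + amount         # `total` is implicitly local
    counter = total
    return T
end

returned_type = bump(2)
```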

#### How does scope resolution work?

We traverse the AST starting at the root, paying attention to certain nodes:
* Nodes representing identifiers (Identifier, operators, var)
  - If a variable exists in the table, it's *replaced* with the value in the table.
  - If it doesn't exist, it becomes an `outerref`
* Variable scoping constructs: `local`, `local-def`
  - collected by scope-block
  - removed during traversal
* Scope metadata `softscope`, `hardscope` - just removed
* New scopes
  - `lambda` creates a new scope containing itself and its arguments,
    otherwise copying the parent scope. It resolves the body with that new scope.
  - `scope-block` is really complicated - see below
* Scope queries `islocal`, `locals`
  - `islocal` - statically expand to true/false based on whether var name is a local var
  - `locals` - return list of locals - see `@locals`
  - `require-existing-local` - somewhat like `islocal`, but allows globals
    too (whaa?! naming) and produces a lowering error immediately if variable
    is not known. Should be called `require-in-scope` ??
* `break-block`, `symbolicgoto`, `symboliclabel` need special handling because
  one of their arguments is a non-quoted symbol.
* Add static parameters for generated functions `with-static-parameters`
* `method` - special handling for static params

`scope-block` is the complicated bit. It's processed by
* Searching the expressions within the block for any `local`, `local-def`,
  `global` and assigned vars. Searching doesn't recurse into `lambda`,
  `scope-block`, `module` and `toplevel`
* Building lists of implicit locals or globals (depending on whether we're in a
  top level thunk)
* Figuring out which local variables need to be renamed. This is any local variable
  with a name which has already occurred in processing one of the previous scope blocks
* Checking for conflicting local/global decls and soft/hard scope
* Building a new scope with a table of renames
* Resolving the body with the new scope, applying the renames
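
The renaming step can be sketched with a toy model. This is not the flisp algorithm: `plan_renames` is a hypothetical helper, each scope block is reduced to the list of local names it declares, and the `name#n` naming scheme is an assumption made for readability. It only illustrates the rule that a local is renamed when its name was already seen in a previously processed scope block.

```julia
# Toy model (hypothetical): each scope block is given as the locals it declares.
function plan_renames(blocks::Vector{Vector{Symbol}})
    seen = Set{Symbol}()                # names used by earlier scope blocks
    plans = Dict{Symbol,Symbol}[]       # one rename table per scope block
    counter = 0
    for locals in blocks
        renames = Dict{Symbol,Symbol}()
        for name in locals
            if name in seen
                counter += 1
                renames[name] = Symbol(name, "#", counter)   # fresh, unique name
            else
                push!(seen, name)
            end
        end
        push!(plans, renames)
    end
    return plans
end

plans = plan_renames([[:x, :y], [:x], [:z, :x]])
```

Each returned table plays the role of the "table of renames" that is applied while resolving the body of the corresponding scope block.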


### Lowered IR

See https://docs.julialang.org/en/v1/devdocs/ast/#Lowered-form

#### CodeInfo

```julia
mutable struct CodeInfo
    code::Vector{Any}             # IR statements
    codelocs::Vector{Int32}       # `length(code)` Vector of indices into `linetable`
    ssavaluetypes::Any            # `length(code)` or Vector of inferred types after opt
    ssaflags::Vector{UInt32}      # flag for every statement in `code`
                                  #   0 if meta statement
                                  #   inbounds_flag - 1 bit (LSB)
                                  #   inline_flag - 1 bit
                                  #   noinline_flag - 1 bit
                                  #   ... other 8 flags which are defined in compiler/optimize.jl
                                  #   effects_flags - 9 bits
    method_for_inference_limit_heuristics::Any
    linetable::Any
    slotnames::Vector{Symbol}     # names of parameters and local vars used in the code
    slotflags::Vector{UInt8}      # vinfo flags from flisp
    slottypes::Any                # nothing (used by typeinf)
    rettype::Any                  # Any (used by typeinf)
    parent::Any                   # nothing (used by typeinf)
    edges::Any
    min_world::UInt64
    max_world::UInt64
    inferred::Bool
    propagate_inbounds::Bool
    has_fcall::Bool
    nospecializeinfer::Bool
    inlining::UInt8
    constprop::UInt8
    purity::UInt16
    inlining_cost::UInt16
end
```
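
A `CodeInfo` for an existing method can be inspected with `code_lowered`. The sketch below touches only the `code` and `slotnames` fields, since other fields in the listing above differ between Julia versions. It assumes Julia 1.6 or later for `Core.ReturnNode`, and `add_one` is a made-up function for the example.

```julia
add_one(x) = x + 1

# `code_lowered` returns one `CodeInfo` per matching method.
ci = only(code_lowered(add_one, (Int,)))

# Slots hold the function itself plus its arguments and local variables.
slot_names = ci.slotnames

# The flat statement list ends in an explicit return.
last_statement = ci.code[end]
```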

15 changes: 14 additions & 1 deletion src/JuliaLowering.jl
@@ -1,5 +1,18 @@
module JuliaLowering

# Write your package code here.
using JuliaSyntax

using JuliaSyntax: SyntaxHead, highlight, Kind, GreenNode, @KSet_str
using JuliaSyntax: haschildren, children, child, numchildren, head, kind, flags
using JuliaSyntax: filename, first_byte, last_byte, source_location

using JuliaSyntax: is_literal, is_number, is_operator, is_prec_assignment, is_infix_op_call, is_postfix_op_call

include("syntax_graph.jl")
include("utils.jl")

include("desugaring.jl")
include("scope_analysis.jl")
include("linear_ir.jl")

end