Commit 107da67

more docstrings
1 parent 9c90c94 commit 107da67

2 files changed

Lines changed: 65 additions & 17 deletions

src/EGraphs/egraph.jl

Lines changed: 47 additions & 16 deletions
@@ -1,9 +1,9 @@
 # Functional implementation of https://egraphs-good.github.io/
 # https://dl.acm.org/doi/10.1145/3434304

-##############################################
-# Interface to implement for custom analyses #
-##############################################
+# ==============================================================
+# Interface to implement for custom analyses
+# ==============================================================

 """
     modify!(eclass::EClass{Analysis})
@@ -31,9 +31,9 @@ Given an e-node `n`, `make` should return the corresponding analysis value.
 """
 function make end

-############
-# EClasses #
-############
+# ==============================================================
+# EClasses
+# ==============================================================

 """
     EClass{D}
@@ -132,13 +132,17 @@ not necessarily very informative, but you can access the terms of each e-node
 via `Metatheory.to_expr`.

 See the [egg paper](https://dl.acm.org/doi/pdf/10.1145/3434304)
-for implementation details.
+for implementation details. Of special note are the e-graph invariants,
+and when they do or do not hold. One of the main innovations of `egg` was to
+"batch" the maintenance of the e-graph invariants. We use the `clean` field
+on this struct to keep track of whether there is pending work to do in order
+to re-establish the e-graph invariants.
 """
 mutable struct EGraph{ExpressionType,Analysis}
   """
   stores the equality relations over e-class ids

-  The `(potentially non-root id) --> (root id)` mapping.
+  More specifically, the `(potentially non-root id) --> (root id)` mapping.
   """
   uf::UnionFind

@@ -170,12 +174,27 @@ mutable struct EGraph{ExpressionType,Analysis}
   pending::Vector{Pair{VecExpr,Id}}

   """
+  When an e-node is added to an e-graph for the first time, we add analysis data to the
+  newly-created e-class by calling [`make`]() on the head of the e-node and the analysis
+  data for the arguments to that e-node. However, the analysis data for the arguments to
+  that e-node could get updated at some point, as e-classes are merged.
+
+  This is a queue for e-nodes which have had the analysis of some of their arguments
+  updated, but have not updated the analysis of their parent e-class yet.
   """
   analysis_pending::UniqueQueue{Pair{VecExpr,Id}}
+
+  """
+  The Id of the e-class that we have built this e-graph to simplify.
+  """
   root::Id
+
   "a cache mapping signatures (function symbols and their arity) to e-classes that contain e-nodes with that function symbol."
   classes_by_op::Dict{IdKey,Vector{Id}}
+
+  "do we need to do extra work in order to re-establish the e-graph invariants"
   clean::Bool
+
   "If we use global buffers we may need to lock. Defaults to false."
   needslock::Bool
   lock::ReentrantLock
@@ -220,6 +239,8 @@ EGraph(e; kwargs...) = EGraph{typeof(e),Nothing}(e; kwargs...)
 @inline get_constant(@nospecialize(g::EGraph), hash::UInt64) = g.constants[hash]
 @inline has_constant(@nospecialize(g::EGraph), hash::UInt64)::Bool = haskey(g.constants, hash)

+# Why does one of these use `get!` and the other use `setindex!`?
+
 @inline function add_constant!(@nospecialize(g::EGraph), @nospecialize(c))::Id
   h = hash(c)
   get!(g.constants, h, c)
@@ -286,13 +307,17 @@ Returns the canonical e-class id for a given e-class.
 # new_n
 # end

+"""
+Make sure all of the arguments of `n` point to root nodes in the unionfind
+data structure for `g`.
+"""
 function canonicalize!(g::EGraph, n::VecExpr)
-  v_isexpr(n) || @goto ret
-  for i in (VECEXPR_META_LENGTH + 1):length(n)
-    @inbounds n[i] = find(g, n[i])
+  if v_isexpr(n)
+    for i in (VECEXPR_META_LENGTH + 1):length(n)
+      @inbounds n[i] = find(g, n[i])
+    end
+    v_unset_hash!(n)
   end
-  v_unset_hash!(n)
-  @label ret
   v_hash!(n)
   n
 end
@@ -391,8 +416,9 @@ function addexpr!(g::EGraph, se)::Id
 end

 """
-Given an [`EGraph`](@ref) and two e-class ids, set
-the two e-classes as equal.
+Given an [`EGraph`](@ref) and two e-class ids, merge the two corresponding e-classes.
+
+This includes merging the analysis data of the e-classes.
 """
 function Base.union!(
   g::EGraph{ExpressionType,AnalysisType},
@@ -435,6 +461,9 @@ function Base.union!(
   return true
 end

+"""
+Returns whether all of `ids...` refer to the same e-class in `g`.
+"""
 function in_same_class(g::EGraph, ids::Id...)::Bool
   nids = length(ids)
   nids == 1 && return true
@@ -563,7 +592,9 @@ end

 # Thanks to Max Willsey and Yihong Zhang

-
+"""
+Look up a grounded pattern.
+"""
 function lookup_pat(g::EGraph{ExpressionType}, p::PatExpr)::Id where {ExpressionType}
   @assert isground(p)

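The restructured `canonicalize!` in this file replaces the `@goto`/`@label` pair with a plain `if` block. As a rough illustration of what canonicalization does, here is a minimal, self-contained sketch using toy stand-ins. `ToyUnionFind`, `toy_find`, `toy_canonicalize!`, and `META_LENGTH` are hypothetical names for this example only and are not Metatheory's API. The cached-hash handling is likewise an assumption about how `v_unset_hash!` and `v_hash!` cooperate.

```julia
# Toy stand-ins for illustration; none of these names are Metatheory's API.
const ToyId = UInt64
const META_LENGTH = 4   # assumed number of metadata slots before the children

struct ToyUnionFind
  parents::Vector{ToyId}
end

# Follow parent pointers until reaching a root (an id that is its own parent).
function toy_find(uf::ToyUnionFind, id::ToyId)
  while uf.parents[id] != id
    id = uf.parents[id]
  end
  id
end

# Point every child slot at the root id of its class, then refresh the cached hash.
function toy_canonicalize!(uf::ToyUnionFind, n::Vector{ToyId}, isexpr::Bool)
  if isexpr
    for i in (META_LENGTH + 1):length(n)
      n[i] = toy_find(uf, n[i])
    end
    n[1] = 0   # invalidate the cached hash, since the children may have changed
  end
  if n[1] == 0
    n[1] = hash(@view n[2:end])   # recompute from everything except slot 1
  end
  n
end

# Usage: class 2 has been merged into class 1, class 3 is its own root.
uf = ToyUnionFind(ToyId[1, 1, 3])
n = ToyId[0, 0, 0, 0, 2, 3]    # children refer to classes 2 and 3
toy_canonicalize!(uf, n, true)
children = n[5:6]              # children now refer to the roots of their classes
```

Nodes without children skip the loop entirely, which is the case the original `@goto ret` handled; the `if` block expresses the same control flow without a jump.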
src/vecexpr.jl

Lines changed: 18 additions & 1 deletion
@@ -34,12 +34,29 @@ const Id = UInt64
 end

 An e-node is represented by `Vector{Id}` where:
-* Position 1 stores the hash of the `VecExpr`.
+* Position 1 stores the hash of the rest of the `VecExpr`.
 * Position 2 stores the bit flags (`isexpr` or `iscall`).
 * Position 3 stores the signature
 * Position 4 stores the hash of the `head` (if `isexpr`) or node value in the e-graph constants.
 * The rest of the positions store the e-class ids of the children nodes.

+The meaning of the bit flags `isexpr` and `iscall` can be best understood by looking at
+the source for `to_expr(g::EGraph, n::VecExpr)` in `src/EGraphs/egraph.jl`. Namely,
+e-nodes for which `isexpr` is false have no arguments; their only "data" is their head.
+E-nodes for which `isexpr` is true and `iscall` is also true correspond to
+`Expr(:call, head, args...)` expressions, and e-nodes for which `isexpr` is true but
+`iscall` is false correspond to `Expr(head, args...)` expressions. There should
+not be `VecExpr`s with `isexpr = false` but `iscall = true`.
+
+The "signature" of an expression seems in practice to be computed as the hash of the head combined
+with the number of arguments (the arity). See: [`addexpr!`]() in `src/EGraphs/egraph.jl`.
+Perhaps in the future, signatures could also involve type information, e.g. to disambiguate
+overloaded heads? Signatures are used in the `classes_by_op` dictionary in an e-graph,
+so that when you are matching for `(a + b)` you can iterate over all of the e-classes
+that have some e-node with `(+, 2)` as its signature.
+
+It also seems like the signature of a constant is `0`.
+
 The expression is represented as an array of integers to improve performance.
 The hash value for the VecExpr is cached in the first position for faster lookup performance in dictionaries.
 """

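The layout described in the `VecExpr` docstring above can be sketched with a toy vector. This is an illustration under stated assumptions, not the library's implementation: `NodeId`, `FLAG_ISEXPR`, `FLAG_ISCALL`, `toy_signature`, `toy_enode`, and `toy_constant` are hypothetical names, the flag bit values are made up, and the signature formula follows the docstring's own guess (hash of the head combined with the arity, `0` for constants).

```julia
const NodeId = UInt64

# Hypothetical flag bits; the real bit assignments live in src/vecexpr.jl.
const FLAG_ISEXPR = NodeId(0x01)
const FLAG_ISCALL = NodeId(0x02)

# Assumed signature: hash of the head combined with the arity.
toy_signature(head, arity::Int) = hash(head, hash(arity))

# Lay out an expression node as [hash, flags, signature, head hash, children...].
function toy_enode(head, children::Vector{NodeId}; iscall::Bool = true)
  flags = iscall ? (FLAG_ISEXPR | FLAG_ISCALL) : FLAG_ISEXPR
  n = NodeId[0, flags, toy_signature(head, length(children)), hash(head)]
  append!(n, children)
  n[1] = hash(@view n[2:end])   # cache the hash of everything after slot 1
  n
end

# Lay out a constant: no children, no flags, signature 0.
function toy_constant(value)
  n = NodeId[0, 0, 0, hash(value)]
  n[1] = hash(@view n[2:end])
  n
end

toy_isexpr(n::Vector{NodeId}) = (n[2] & FLAG_ISEXPR) != 0
toy_iscall(n::Vector{NodeId}) = (n[2] & FLAG_ISCALL) != 0

# Usage: `a + b`, with `a` in e-class 1 and `b` in e-class 2.
plus = toy_enode(:+, NodeId[1, 2])
c = toy_constant(42)

# A `classes_by_op`-style index keyed by signature (e-class id 7 is arbitrary).
classes_by_sig = Dict{NodeId,Vector{NodeId}}()
push!(get!(classes_by_sig, plus[3], NodeId[]), NodeId(7))
```

Keying the index by signature means a pattern such as `(a + b)` only has to visit e-classes registered under `toy_signature(:+, 2)` rather than every e-class in the graph.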