Add sourcemap support to the Aiken compiler #1250
Open: Quantumplation wants to merge 17 commits into main from pi/source-maps
Conversation
Adds a generic "Context" parameter that can be populated with data; any methods which work with terms will preserve the context, meaning it survives execution, optimization, etc. For now, we default it to (), but this is foundational work for source map support, because we can assign sourcemap locations as the context during codegen.
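The shape of this change can be sketched with a toy term type. Note that `Term`, `map_context`, and the variant names below are illustrative stand-ins, not the actual Aiken/uplc definitions: the point is just that the tree is generic over a context `C` (defaulting to `()`), and any rebuilding operation carries the context along.

```rust
// Toy sketch of a context-carrying term type; names are illustrative,
// not the real Aiken/uplc definitions.
#[derive(Debug, Clone, PartialEq)]
pub enum Term<C> {
    Var(String, C),
    Lambda { param: String, body: Box<Term<C>>, ctx: C },
    Apply { func: Box<Term<C>>, arg: Box<Term<C>>, ctx: C },
}

impl<C> Term<C> {
    // Rebuild the tree with a new context type; passes written against the
    // generic type preserve whatever context is attached to each node.
    pub fn map_context<D>(self, f: &impl Fn(C) -> D) -> Term<D> {
        match self {
            Term::Var(name, c) => Term::Var(name, f(c)),
            Term::Lambda { param, body, ctx } => Term::Lambda {
                param,
                body: Box::new(body.map_context(f)),
                ctx: f(ctx),
            },
            Term::Apply { func, arg, ctx } => Term::Apply {
                func: Box::new(func.map_context(f)),
                arg: Box::new(arg.map_context(f)),
                ctx: f(ctx),
            },
        }
    }
}
```

Erasing a context down to `()` is then just `term.map_context(&|_| ())`, which matches the default behaviour described above.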
Adds a new location field to Air and AirTree node variants, intended to track the closest span in the source code that generated that node. These are initially set to empty spans, but we can add more and more source coverage in future commits.
This now passes the location information we have down into the Term! It's not *super* useful yet; the compiler-generated terms that *should* be able to derive their span info from the things around them aren't yet being provided with one. But it's a good start, and all the tests pass!
For example, here's a little fibonacci test I put together:
=== Source code ===

fn fib(n: Int) -> Int {
  if n < 2 {
    n
  } else {
    fib(n - 1) + fib(n - 2)
  }
}

test fib_test() {
  fib(10) == 55
}
=== Term tree with source locations ===

Apply (no span)
  Apply (no span)
    Builtin(EqualsInteger) (no span)
    Apply (no span)
      Apply (no span)
        Lambda(test_module_fib) (no span)
          Apply (no span)
            Lambda(test_module_fib) (no span)
              Var(test_module_fib) @ 164..167 = "fib"
            Apply (no span)
              Var(test_module_fib) (no span)
              Var(test_module_fib) (no span)
        Lambda(__no_inline__) (no span)
          Lambda(test_module_fib) (no span)
            Lambda(n_id_0) (no span)
              Force (no span)
                Apply (no span)
                  Apply (no span)
                    Apply (no span)
                      Force (no span)
                        Builtin(IfThenElse) (no span)
                      Apply (no span)
                        Apply (no span)
                          Builtin(LessThanInteger) @ 42..47 = "n < 2"
                          Var(n_id_0) @ 42..43 = "n"
                        Constant(Integer(2)) (no span)
                    Delay (no span)
                      Var(n_id_0) @ 60..61 = "n"
                  Delay (no span)
                    Apply (no span)
                      Apply (no span)
                        Builtin(AddInteger) (no span)
                        Apply (no span)
                          Apply (no span)
                            Var(test_module_fib) @ 89..92 = "fib"
                            Var(test_module_fib) @ 89..92 = "fib"
                          Apply (no span)
                            Apply (no span)
                              Builtin(SubtractInteger) @ 93..98 = "n - 1"
                              Var(n_id_0) @ 93..94 = "n"
                            Constant(Integer(1)) (no span)
                      Apply (no span)
                        Apply (no span)
                          Var(test_module_fib) @ 102..105 = "fib"
                          Var(test_module_fib) @ 102..105 = "fib"
                        Apply (no span)
                          Apply (no span)
                            Builtin(SubtractInteger) @ 106..111 = "n - 2"
                            Var(n_id_0) @ 106..107 = "n"
                          Constant(Integer(2)) (no span)
      Constant(Integer(10)) (no span)
  Constant(Integer(55)) (no span)
- Add SourceMap type for mapping UPLC node indices to source locations
- Add the ability to generate a sourcemap when building, either externally or in the blueprint json
- Supports exporting tests (useful as a source of complex examples)
- Adds a --list flag to make identifying what exactly to export easier
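A rough sketch of what such a mapping from node index to source span might look like follows; the struct and method names here are hypothetical illustrations, not the PR's actual API.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch of a source map keyed by UPLC node index; the real
// type in this PR may differ in names and fields.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

#[derive(Debug, Default)]
pub struct SourceMap {
    // node index -> span in the original source
    locations: BTreeMap<usize, Span>,
}

impl SourceMap {
    pub fn record(&mut self, node: usize, span: Span) {
        self.locations.insert(node, span);
    }

    // A node with no recorded span (compiler-generated glue) returns None,
    // mirroring the "(no span)" entries in the term tree above.
    pub fn lookup(&self, node: usize) -> Option<Span> {
        self.locations.get(&node).copied()
    }
}
```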
Exposes an interface to run the CEK machine one step at a time (essentially just exposing a few functions). More importantly: now that we've added Context to terms and propagated it through compilation, running the machine used to require erasing the context down to unit (). That makes things like debugger support awkward, because the term being executed gets manipulated as the machine runs, so mapping it back to the generated source maps would need some kind of pattern matching system. Instead, if we make the Machine generic over context (i.e. preserve the context as we juggle the CEK machine), we can use that context to attach a post-order numbering to each node and use that number to index into the source maps.

To that end, we add a context parameter to Value, BuiltinRuntime, and Env. Of note, we don't make Error generic over Context. While that might be useful for better error messages on failure, it's a much bigger refactor and isn't critical for debugging steps; so for now, we just erase the context when constructing errors, and provide a utility for lifting Value<()> into the default context.
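The post-order numbering idea can be sketched like so, again with a toy term type; none of these names are the real uplc crate's, and the real traversal covers many more variants.

```rust
// Toy illustration of attaching a post-order index as each node's context,
// so a node encountered during execution can be looked up in a source map
// by number instead of by structural pattern matching.
#[derive(Debug, PartialEq)]
pub enum Term<C> {
    Var(String, C),
    Apply(Box<Term<C>>, Box<Term<C>>, C),
}

pub fn number_post_order(term: Term<()>, next: &mut usize) -> Term<usize> {
    match term {
        Term::Var(name, ()) => {
            let n = *next;
            *next += 1;
            Term::Var(name, n)
        }
        Term::Apply(func, arg, ()) => {
            // Children are numbered before the parent: post-order.
            let func = Box::new(number_post_order(*func, next));
            let arg = Box::new(number_post_order(*arg, next));
            let n = *next;
            *next += 1;
            Term::Apply(func, arg, n)
        }
    }
}
```

Because the numbering lives in the context, it survives the machine's term juggling, and each step of execution can report which numbered node it is working on.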
It doesn't do us much good to carry the source spans through the whole compilation process if optimizations just screw with the UPLC tree at the end. This ensures that the interning, shrinking, and other optimizations carry through the relevant context. It also fixes the order of operations to apply used functions before optimization.
- Add more source spans during code generation
- Reorder pipeline to ensure source maps survive optimization
- Unify the codegen paths to always use generic context
Spans aren't sufficient on their own to track source location: we were trying to match based on module, but inlining messes with that. So we pass the source file name all the way through, and introduce a SourceLocation type for this.
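A plausible shape for such a type is sketched below. The field names and the `contains` helper are guesses for illustration only; a later commit in this PR does note that the real type derives Default.

```rust
// Hypothetical sketch of SourceLocation: a span alone is ambiguous once
// inlining mixes code from several modules, so pair it with the file.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct SourceLocation {
    pub file: String, // path of the source file the node came from
    pub start: usize, // byte offset where the span begins
    pub end: usize,   // byte offset just past the span's last byte
}

impl SourceLocation {
    // True when `other` lies within this location in the same file; useful
    // for finding the narrowest enclosing span for a node. Illustrative.
    pub fn contains(&self, other: &SourceLocation) -> bool {
        self.file == other.file && self.start <= other.start && other.end <= self.end
    }
}
```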
Thread SourceLocation through the code generation to all UPLC term construction sites, including:
- Assignments and let bindings
- Boolean operators (and/or chains)
- Function application
This should avoid bugs with diverging implementations, the recurring theme of this PR.
Allows us to do compilation and codegen, but skip any optimizations that might screw with the execution codepath, making it easier to debug a contract
Adds a field to sourcemaps that maps vars and lambdas to the variable names that introduced them; this lets debuggers show original source variable names in the environment!
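Sketching the idea (hypothetical names and shape, not the PR's actual field): the map goes from node index to the original identifier, so a debugger can display `n` rather than a mangled unique name like `n_id_0`.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch of the vars/lambdas-to-names field; real field and
// method names in the PR may differ.
#[derive(Debug, Default)]
pub struct VariableNames {
    names: BTreeMap<usize, String>, // node index -> original source name
}

impl VariableNames {
    pub fn record(&mut self, node: usize, original: &str) {
        self.names.insert(node, original.to_string());
    }

    // Fall back to the mangled name when we have nothing better to show.
    pub fn display_name(&self, node: usize, mangled: &str) -> String {
        self.names
            .get(&node)
            .cloned()
            .unwrap_or_else(|| mangled.to_string())
    }
}
```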
Operates identically to the test command, but prints out a coverage report, using the new source map capabilities!
- Add is_empty() method to Env (len_without_is_empty)
- Allow type_complexity in builtin_curry_reducer
- Remove redundant iter cloning in script_context.rs
- Add #[allow(clippy::too_many_arguments)] to a few functions
- Rename to_string() to render()
- Add #[allow(clippy::type_complexity)] to collect_tests_for_coverage
- Fix a few unused variables
- Remove unused imports
- Fix needless_borrow in export command
- Use arrays instead of vec![] in tests
- Use !is_empty() instead of len() > 0
These lints weren't showing up for me because I had a different Rust version locally.
- Used derive(Default) on SourceLocation
- Allow some unused assignments, because they're used by miette
- Add result_large_err to existing allow attribute
Motivation
As Aiken matures, developers need better tooling for understanding what their contracts are actually doing at runtime. Three use cases drove this work: step-through debugging, test coverage reporting, and better error messages on failure. All three require the same fundamental capability: given a UPLC node during execution, answer "where did this come from in the source?"
Approach
The core insight is that UPLC's Term type can carry metadata through compilation and execution: we add a generic context parameter to terms, thread it through codegen and optimization, and preserve it in the CEK machine, so source locations survive all the way to runtime.
Alternative considered
We initially experimented with a trace-based approach: inject special Trace calls at key points during codegen, then strip them out before final output. The traces would carry source location strings that a debugger could intercept.
This had several problems, mainly that it didn't play well with optimizations and was very brittle.
The generic context approach is cleaner: the metadata is truly out-of-band and doesn't affect the compiled output when not needed.
What's included
This PR was heavily AI-assisted (Claude Code) because I was doing it over Christmas break; without that, given my busy schedule, it likely wouldn't have happened at all, so I'm hoping the output is coherent enough to offset the influence of AI.
The high-level design and approach were decided by @SupernaviX and @MicroProofs ahead of implementation, and all tests pass. Still, the volume of mechanical changes (particularly threading context through pattern matches) means a careful review is warranted.
The test suite passes and the feature works end-to-end with Gastronomy, but fresh eyes on the implementation details would be valuable.