-
Notifications
You must be signed in to change notification settings - Fork 2
Open
Description
Summary
The current vyperTestLib.sml mixes two concerns:
- Parsing JSON into HOL terms
- Making semantic decisions to fit the target
vyperAST
This issue tracks refactoring to introduce an intermediate AST that mirrors the JSON structure, with a separate HOL translation to vyperAST.
Problem
Examples of semantic work currently buried in the parser:
- Computing loop bounds from array types
- Constant folding (target AST assumes constants folded)
- Recognizing
msg.senderfrom nested JSON structure - Deducing
StringvsBytesbound from type field
Proposed Solution
JSON → jsonASTScript.sml (dumb 1:1 mapping)
↓
jsonToVyperScript.sml (HOL translation, all semantic decisions)
↓
vyperAST (clean, elaborated)
Agent-Optimized Implementation Plan
The following plan is written for an agent to execute. It contains detailed AST sketches, translation examples, and implementation phases.
# Implementation Plan: Intermediate JSON-matching AST
## Motivation
Currently `vyperTestLib.sml` does two things at once:
1. Parses JSON into HOL terms
2. Makes semantic decisions to fit the target `vyperAST`
Examples of semantic work in the parser:
- Computing loop bounds from array types
- Constant folding (target AST assumes constants folded)
- Recognizing `msg.sender` from nested JSON structure and emitting `Builtin (Env Sender) []`
- Deducing `String` vs `Bytes` bound from type field
This mixes concerns. The proposal: introduce an intermediate AST that mirrors JSON structure, then translate to `vyperAST` in HOL.
```
JSON → jsonASTScript.sml (dumb 1:1 mapping)
↓
jsonToVyperScript.sml (HOL translation, all semantic decisions)
↓
vyperAST (clean, elaborated)
```
## Intermediate AST Sketch
File: `jsonASTScript.sml`
```sml
Theory jsonAST
Ancestors
string words
(* ===== Type Information ===== *)
(* Mirrors the "type" field in JSON *)
Datatype:
json_typeclass
= TC_integer
| TC_bytes_m
| TC_static_array
| TC_dynamic_array
| TC_struct
| TC_flag
| TC_tuple
| TC_hashmap
| TC_interface
| TC_contract_function
| TC_builtin_function
| TC_module
| TC_other string (* catch-all *)
End
Datatype:
json_type
= JT_Named string (* name field only, e.g. "bool", "address" *)
| JT_Integer num bool (* bits, is_signed *)
| JT_BytesM num (* m for bytes1..bytes32 *)
| JT_String num (* length *)
| JT_Bytes num (* length *)
| JT_StaticArray json_type num (* value_type, length *)
| JT_DynArray json_type num (* value_type, length *)
| JT_Struct string (* name *)
| JT_Flag string (* name *)
| JT_Tuple (json_type list) (* member_types *)
| JT_HashMap json_type json_type (* key_type, value_type *)
| JT_None (* null type *)
End
(* ===== Binary/Unary Operators ===== *)
(* Direct mirror of ast_type for operators *)
Datatype:
json_binop
= JBop_Add | JBop_Sub | JBop_Mult | JBop_Div | JBop_FloorDiv
| JBop_Mod | JBop_Pow
| JBop_And | JBop_Or | JBop_BitAnd | JBop_BitOr | JBop_BitXor
| JBop_LShift | JBop_RShift
| JBop_Eq | JBop_NotEq | JBop_Lt | JBop_LtE | JBop_Gt | JBop_GtE
| JBop_In | JBop_NotIn
End
Datatype:
json_unaryop = JUop_USub | JUop_Not | JUop_Invert
End
Datatype:
json_boolop = JBoolop_And | JBoolop_Or
End
(* ===== Expressions ===== *)
(* Only keep type info where needed for translation *)
Datatype:
json_expr
(* Literals - need type for int bounds, string/bytes length *)
= JE_Int int json_type (* value, type for bounds *)
| JE_Decimal string (* value as string *)
| JE_Str num string (* length, value *)
| JE_Bytes num string (* length, hex value *)
| JE_Hex string (* hex value for fixed bytes *)
| JE_Bool bool (* True/False *)
(* Variables and access *)
| JE_Name string (* id *)
| JE_Attribute json_expr string (* value, attr *)
| JE_Subscript json_expr json_expr (* value, slice *)
(* Operators *)
| JE_BinOp json_expr json_binop json_expr (* left, op, right *)
| JE_BoolOp json_boolop (json_expr list) (* op, values *)
| JE_UnaryOp json_unaryop json_expr (* op, operand *)
| JE_IfExp json_expr json_expr json_expr (* test, body, orelse *)
(* Compound - need type for array element type, tuple handling *)
| JE_Tuple (json_expr list) (* elements *)
| JE_List (json_expr list) json_type (* elements, type needed for elem type *)
(* Calls - need type for builtins like concat/slice that embed return length *)
| JE_Call json_expr (json_expr list) (json_keyword list) json_type
End
;
json_keyword = JKeyword string json_expr (* arg, value *)
End
(* ===== Statements ===== *)
Datatype:
json_stmt
= JS_Pass
| JS_Break
| JS_Continue
| JS_Expr json_expr
| JS_Return (json_expr option)
| JS_Raise (json_expr option)
| JS_Assert json_expr (json_expr option) (* test, msg *)
| JS_Log string (json_expr list) (* event name, args *)
| JS_If json_expr (json_stmt list) (json_stmt list) (* test, body, orelse *)
| JS_For string json_type json_iter (json_stmt list) (* var, var_type, iter, body *)
| JS_Assign json_expr json_expr (* target, value *)
| JS_AnnAssign string json_type json_expr (* var name, type, value *)
| JS_AugAssign json_expr json_binop json_expr (* target, op, value *)
| JS_Append json_expr json_expr (* target, value *)
;
(* Iterator - need type info for computing bounds *)
json_iter
= JIter_Range (json_expr list) (num option) (* args, explicit bound if given *)
| JIter_Array json_expr json_type (* array expr, array type for bound *)
End
(* ===== Top-level Declarations ===== *)
Datatype:
json_decorator = JDec string (* decorator name: external, internal, view, etc *)
End
Datatype:
json_arg = JArg string json_type (* arg name, type *)
End
Datatype:
json_func_type = JFuncType (json_type list) json_type (* argument_types, return_type *)
End
Datatype:
json_toplevel
= JTL_FunctionDef string (json_decorator list) (json_arg list) json_func_type (json_stmt list)
(* name, decorators, args, func_type, body *)
| JTL_VariableDecl string json_type bool bool bool (json_expr option)
(* name, type, is_public, is_immutable, is_transient, value (for constants) *)
| JTL_EventDef string (json_arg list)
| JTL_StructDef string (json_arg list)
| JTL_FlagDef string (string list) (* name, member names *)
| JTL_InterfaceDef string (* ignored but parsed *)
End
(* ===== Module ===== *)
Datatype:
json_module = JModule (json_toplevel list)
End
```
## Translation to vyperAST
File: `jsonToVyperScript.sml`
This is where semantic decisions live:
```sml
Theory jsonToVyper
Ancestors
jsonAST vyperAST
(* ===== Type Translation ===== *)
Definition translate_type_def:
translate_type (JT_Named "bool") = BaseT BoolT ∧
translate_type (JT_Named "address") = BaseT AddressT ∧
translate_type (JT_Named "decimal") = BaseT DecimalT ∧
translate_type (JT_Integer bits F) = BaseT (UintT bits) ∧
translate_type (JT_Integer bits T) = BaseT (IntT bits) ∧
translate_type (JT_BytesM m) = BaseT (BytesT (Fixed m)) ∧
translate_type (JT_String n) = BaseT (StringT n) ∧
translate_type (JT_Bytes n) = BaseT (BytesT (Dynamic n)) ∧
translate_type (JT_StaticArray vt len) = ArrayT (translate_type vt) (Fixed len) ∧
translate_type (JT_DynArray vt len) = ArrayT (translate_type vt) (Dynamic len) ∧
translate_type (JT_Struct name) = StructT name ∧
translate_type (JT_Flag name) = FlagT name ∧
translate_type (JT_Tuple tys) = TupleT (MAP translate_type tys) ∧
translate_type JT_None = NoneT
Termination
...
End
(* ===== Expression Translation ===== *)
(* This is where msg.sender recognition etc. happens *)
Definition translate_expr_def:
(* msg.sender → Builtin (Env Sender) [] *)
translate_expr (JE_Attribute (JE_Name "msg") "sender") =
Builtin (Env Sender) [] ∧
(* msg.value → Builtin (Env ValueSent) [] *)
translate_expr (JE_Attribute (JE_Name "msg") "value") =
Builtin (Env ValueSent) [] ∧
(* block.timestamp → Builtin (Env TimeStamp) [] *)
translate_expr (JE_Attribute (JE_Name "block") "timestamp") =
Builtin (Env TimeStamp) [] ∧
(* block.number → Builtin (Env BlockNumber) [] *)
translate_expr (JE_Attribute (JE_Name "block") "number") =
Builtin (Env BlockNumber) [] ∧
(* self.x → TopLevelName x *)
translate_expr (JE_Attribute (JE_Name "self") attr) =
TopLevelName attr ∧
(* General attribute: e.attr *)
translate_expr (JE_Attribute e attr) =
Attribute (translate_expr e) attr ∧
(* Name *)
translate_expr (JE_Name id) = Name id ∧
(* Integer literal *)
translate_expr (JE_Int v ty) =
Literal (IntL (int_bound_of_type ty) v) ∧
(* ... etc ... *)
Termination
...
End
(* ===== Statement Translation ===== *)
(* Loop bounds computed here from iterator type *)
Definition get_iter_bound_def:
(* Range with explicit bound *)
get_iter_bound (JIter_Range args (SOME n)) = n ∧
(* Range without bound: end - start *)
get_iter_bound (JIter_Range [e] NONE) = ... ∧ (* 0 to e *)
get_iter_bound (JIter_Range [s;e] NONE) = ... ∧ (* s to e *)
(* Array: get length from type *)
get_iter_bound (JIter_Array _ (JT_StaticArray _ len)) = len ∧
get_iter_bound (JIter_Array _ (JT_DynArray _ len)) = len
End
Definition translate_iter_def:
translate_iter var_ty (JIter_Range args _) =
Range (translate_range_start var_ty args) (translate_range_end var_ty args) ∧
translate_iter var_ty (JIter_Array e _) =
Array (translate_expr e)
End
Definition translate_stmt_def:
translate_stmt JS_Pass = Pass ∧
translate_stmt JS_Break = Break ∧
translate_stmt JS_Continue = Continue ∧
translate_stmt (JS_For var ty iter body) =
For var (translate_type ty) (translate_iter ty iter)
(get_iter_bound iter) (MAP translate_stmt body) ∧
translate_stmt (JS_Append tgt e) =
Append (translate_base_target tgt) (translate_expr e) ∧
(* ... etc ... *)
End
```
## Implementation Steps
### Phase 1: Define jsonAST (1-2 days)
1. Create `jsonASTScript.sml` with datatypes above
2. Keep it simple - mirror JSON structure exactly
3. Add `cv_auto_trans` for all types
4. Build and verify it compiles
### Phase 2: SML JSON Parser (2-3 days)
1. Create `jsonASTLib.sml` - simplified parser that just maps JSON → jsonAST terms
2. Much simpler than current `vyperTestLib.sml` - no semantic decisions
3. Each JSON `ast_type` maps to one constructor
4. Type field parsing is mechanical
### Phase 3: HOL Translation (3-4 days)
1. Create `jsonToVyperScript.sml`
2. `translate_type` - straightforward
3. `translate_expr` - this is where `msg.sender` etc. recognition goes
4. `translate_stmt` - loop bound computation here
5. `translate_toplevel`
6. Prove termination for recursive functions
### Phase 4: Integration (1-2 days)
1. Modify `vyperTestLib.sml` to use new pipeline:
- Parse JSON → jsonAST term
- Apply `translate_module` (via cv_compute or similar)
- Result is vyperAST term
2. Verify all tests still pass
### Phase 5: Cleanup (1 day)
1. Remove old parsing code from `vyperTestLib.sml`
2. Document the new architecture
3. Update CLAUDE.md/AGENTS.md
## Files to Create
| File | Purpose |
|------|---------|
| `jsonASTScript.sml` | Intermediate AST datatypes |
| `jsonASTSyntax.sml` | SML syntax functions (auto-generated mostly) |
| `jsonASTLib.sml` | JSON → jsonAST parser (SML) |
| `jsonToVyperScript.sml` | jsonAST → vyperAST translation (HOL) |
## Design Decisions
1. **Trace/test infrastructure**: Keep as-is. The existing trace translation (call, deployment) is already minimal - no need for intermediate AST there.
2. **Granularity**: Only include fields we actually use. Skip `node_id`, `variable_reads`, `variable_writes`, `source_id`, etc. The intermediate AST is not a complete JSON mirror, just what's needed for translation.
3. **Folded values**: Use `folded_value` when available (in SML parser), don't represent both in intermediate AST. The intermediate AST holds the value we'll use.
4. **Error handling**: Catch malformed JSON in SML parser. Fail fast. The HOL translation assumes well-formed input.
## Benefits When Complete
1. **SML parser becomes trivial** - just field access, no logic
2. **Semantic decisions documented in HOL** - inspectable, modifiable
3. **JSON format explicitly specified** - jsonAST documents what we expect
4. **Easier to extend** - new JSON construct = new constructor + translation clause
5. **Potential for proving translation correct** - if ever desiredMetadata
Metadata
Assignees
Labels
No labels