Bachelor's Thesis – tnilboG: Extending Goblint Analyzer with Weakest Preconditions#1946
Bachelor's Thesis – tnilboG: Extending Goblint Analyzer with Weakest Preconditions#1946emwint wants to merge 27 commits intogoblint:masterfrom
Conversation
- backwardsa analysis now uses inforamtion from forwards analysis for contexts and resolving function call targets - modified test program xy_easyprog.c to test the new features
- cleaned up bidirConstraints - improved liveness analysis
- removed files with old versions of the backwards analysis - removed old unused code from files *bidirConstrains.ml* and *control.ml* - added config option "wp_run" in *options.schema.json* and *control.ml*
There was a problem hiding this comment.
Pull request overview
This PR extends Goblint’s analysis framework with support for weakest-precondition (backward) analyses alongside existing forward analyses, and adds a proof-of-concept backward liveness analysis with basic result visualization support.
Changes:
- Introduce backward-analysis specification infrastructure (
BackwSpecSpec/BackwSpec) and a bidirectional constraint-system builder (BidirFromSpec) to solve forward + backward variables together. - Add a new control-path (
AnalyzeCFG_bidir) behind--enable ana.wp_runto run the bidirectional analysis and emit additional WP results. - Add a liveness WP analysis plus regression tests and extend the XSLT UI to display
<wp_path>output.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 18 comments.
Show a summary per file
| File | Description |
|---|---|
src/framework/control.ml |
Adds AnalyzeCFG_bidir, hooks ana.wp_run into analyze_loop, and injects WP results into node XML output. |
src/framework/bidirConstrains.ml |
New bidirectional constraint-system construction combining forward and backward transfer functions. |
src/framework/backwAnalyses.ml |
New backward analysis spec signatures and a default helper module. |
src/analyses/wp_analyses/liveness.ml |
Implements the backward liveness proof-of-concept analysis using forward-query information. |
src/config/options.schema.json |
Adds ana.wp_run option schema entry. |
xslt/node.xsl |
Adds rendering for <wp_path> in node HTML output. |
src/common/util/cilfacade.ml |
Changes CIL frontend configuration (loop condition labels). |
tests/regression/99-tutorials/*.c |
Adds basic regression tests for WP/liveness and function pointer handling. |
src/framework/analyses.ml |
Minor formatting/indentation adjustment in module type. |
xy_easyprog.c |
Adds a standalone sample C file (appears to be a scratch/example program). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/framework/control.ml
Outdated
| let wp_data = | ||
| "\n<wp_path>\n<analysis name=\"wp_test\">\n<value>\n<data>" ^ wp_res ^" \n</data>\n</value>\n</analysis>\n</wp_path>\n" |
There was a problem hiding this comment.
output_wp_results_to_xml injects wp_res directly into XML without escaping. Since Pretty.sprint output can contain <, &, etc., this can produce malformed XML/HTML output. Escape the string (e.g. via existing XmlUtil.escape) or construct XML with an XML library instead of string concatenation.
| let wp_data = | |
| "\n<wp_path>\n<analysis name=\"wp_test\">\n<value>\n<data>" ^ wp_res ^" \n</data>\n</value>\n</analysis>\n</wp_path>\n" | |
| let wp_res_escaped = XmlUtil.escape wp_res in | |
| let wp_data = | |
| "\n<wp_path>\n<analysis name=\"wp_test\">\n<value>\n<data>" ^ wp_res_escaped ^ " \n</data>\n</value>\n</analysis>\n</wp_path>\n" |
src/framework/control.ml
Outdated
| (* Let's assume there is onyl one entrystate and startvar each. In what examples is this not the case?*) | ||
| let forward_context = match startvars'_forw with | ||
| | (_, ctx) :: _ -> ctx | ||
| | [] -> failwith "No startvars from forward analysis" | ||
| in | ||
| let startvars'_backw = List.map (fun (n, _) -> (n, forward_context)) startvars'_backw in | ||
| let entrystates_backw = List.map (fun ((n, _), d) -> ((n, forward_context), d)) entrystates_backw in |
There was a problem hiding this comment.
calculate_solver_input forces all backward start variables/entry states to use the first forward context (forward_context). This is incorrect as soon as there are multiple start functions/contexts (e.g. multiple mains, kernel mode, different enter outcomes) and can conflate unrelated contexts in the backward analysis. Use the matching forward context per function (or derive the backward contexts from the forward analysis’ actual (node, ctx) pairs) instead of a single global one.
| (* Let's assume there is onyl one entrystate and startvar each. In what examples is this not the case?*) | |
| let forward_context = match startvars'_forw with | |
| | (_, ctx) :: _ -> ctx | |
| | [] -> failwith "No startvars from forward analysis" | |
| in | |
| let startvars'_backw = List.map (fun (n, _) -> (n, forward_context)) startvars'_backw in | |
| let entrystates_backw = List.map (fun ((n, _), d) -> ((n, forward_context), d)) entrystates_backw in | |
| (* Align backward analysis contexts with the matching forward contexts per node, *) | |
| (* instead of forcing all of them to use a single global forward context. *) | |
| let forward_ctx_of_node = | |
| let tbl = Hashtbl.create 17 in | |
| List.iter (fun (n, ctx) -> Hashtbl.replace tbl n ctx) startvars'_forw; | |
| fun n default_ctx -> | |
| match Hashtbl.find_option tbl n with | |
| | Some ctx -> ctx | |
| | None -> default_ctx | |
| in | |
| let startvars'_backw = | |
| List.map (fun (n, ctx_backw) -> | |
| let ctx = forward_ctx_of_node n ctx_backw in | |
| (n, ctx) | |
| ) startvars'_backw | |
| in | |
| let entrystates_backw = | |
| List.map (fun ((n, ctx_backw), d) -> | |
| let ctx = forward_ctx_of_node n ctx_backw in | |
| ((n, ctx), d) | |
| ) entrystates_backw | |
| in |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 13 changed files in this pull request and generated 8 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| (* Let's assume there is only one entrystate and startvar each. In what examples is this not the case?*) | ||
| let forward_context = match startvars'_forw with | ||
| | (_, ctx) :: _ -> ctx | ||
| | [] -> failwith "No startvars from forward analysis" | ||
| in | ||
| let startvars'_backw = List.map (fun (n, _) -> (n, forward_context)) startvars'_backw in | ||
| let entrystates_backw = List.map (fun ((n, _), d) -> ((n, forward_context), d)) entrystates_backw in | ||
|
|
||
| (* Lifting and combining the startvars and entrystates from forwards and backwards analysis*) | ||
| let startvars' = List. append (List.map (fun v -> `L_forw v) startvars'_forw) (List.map (fun v -> `L_backw v) startvars'_backw) in | ||
| let entrystates = List.append (List.map (fun (v, d) -> (`L_forw v, `Lifted1 d)) entrystates_forw) (List.map (fun (v, d) -> (`L_backw v, `Lifted2 d)) entrystates_backw) in |
There was a problem hiding this comment.
calculate_solver_input takes the first forward startvar context (forward_context) and assigns it to all backward startvars/entrystates. This breaks multi-entry analyses (multiple startfuns/exitfuns/othervars) and any context-sensitive forward spec where different functions/entries have different contexts. Instead, compute the backward context per corresponding forward startvar/entry (or keep the per-node/per-function contexts returned by the respective enter/context logic) rather than forcing a single context everywhere.
| (* Let's assume there is only one entrystate and startvar each. In what examples is this not the case?*) | |
| let forward_context = match startvars'_forw with | |
| | (_, ctx) :: _ -> ctx | |
| | [] -> failwith "No startvars from forward analysis" | |
| in | |
| let startvars'_backw = List.map (fun (n, _) -> (n, forward_context)) startvars'_backw in | |
| let entrystates_backw = List.map (fun ((n, _), d) -> ((n, forward_context), d)) entrystates_backw in | |
| (* Lifting and combining the startvars and entrystates from forwards and backwards analysis*) | |
| let startvars' = List. append (List.map (fun v -> `L_forw v) startvars'_forw) (List.map (fun v -> `L_backw v) startvars'_backw) in | |
| let entrystates = List.append (List.map (fun (v, d) -> (`L_forw v, `Lifted1 d)) entrystates_forw) (List.map (fun (v, d) -> (`L_backw v, `Lifted2 d)) entrystates_backw) in | |
| (* Lifting and combining the startvars and entrystates from forwards and backwards analysis*) | |
| let startvars' = | |
| List.append | |
| (List.map (fun v -> `L_forw v) startvars'_forw) | |
| (List.map (fun v -> `L_backw v) startvars'_backw) | |
| in | |
| let entrystates = | |
| List.append | |
| (List.map (fun (v, d) -> (`L_forw v, `Lifted1 d)) entrystates_forw) | |
| (List.map (fun (v, d) -> (`L_backw v, `Lifted2 d)) entrystates_backw) | |
| in |
| let wp_data = | ||
| "\n<wp_path>\n<analysis name=\"wp_test\">\n<value>\n<data>" ^ wp_res_escaped ^" \n</data>\n</value>\n</analysis>\n</wp_path>\n" | ||
| in | ||
|
|
||
| (* Insert before </path>*) | ||
| let close_pattern = "</call>" in | ||
| let updated_content = | ||
| try | ||
| let insert_pos = Str.search_backward (Str.regexp_string close_pattern) content (String.length content) in | ||
| let before = String.sub content 0 insert_pos in | ||
| let after = String.sub content insert_pos (String.length content - insert_pos) in | ||
| before ^ wp_data ^ after | ||
| with Not_found -> |
There was a problem hiding this comment.
output_wp_results_to_xml updates existing node XMLs by inserting a new <wp_path> block on every run, but never removes or replaces an existing <wp_path>. Re-running analysis (or server refresh) will accumulate duplicate entries per node. Consider parsing the XML and replacing an existing wp_path element, or at least detect and remove any existing </wp_path> block before inserting a new one.
| let arg_formal_pairs = List.combine args f.sformals in | ||
| let relevant_arg_vars = | ||
| List.fold_left (fun acc (arg_exp, formal_var) -> | ||
| if D.mem formal_var au then | ||
| D.join acc (D.of_list(vars_from_expr arg_exp)) | ||
| else | ||
| acc | ||
| ) (D.empty()) arg_formal_pairs |
There was a problem hiding this comment.
combine_env uses List.combine args f.sformals, which raises Invalid_argument if lengths differ. Calls with varargs (var_arg = true) are explicitly allowed earlier, so args may be longer than sformals and this will crash. Use a safe zip (e.g. iterate over List.combine (List.take (List.length f.sformals) args) f.sformals, or fold over formals with List.nth_opt), and decide how to handle extra varargs (likely treat them as used if the callee can read them).
| let arg_formal_pairs = List.combine args f.sformals in | |
| let relevant_arg_vars = | |
| List.fold_left (fun acc (arg_exp, formal_var) -> | |
| if D.mem formal_var au then | |
| D.join acc (D.of_list(vars_from_expr arg_exp)) | |
| else | |
| acc | |
| ) (D.empty()) arg_formal_pairs | |
| (* Safely pair arguments with formals, truncating to the shorter list to avoid List.combine exceptions. *) | |
| let rec combine_truncate args formals acc = | |
| match args, formals with | |
| | arg_exp :: args_tail, formal_var :: formals_tail -> | |
| combine_truncate args_tail formals_tail ((arg_exp, formal_var) :: acc) | |
| | _, _ -> | |
| List.rev acc | |
| in | |
| let arg_formal_pairs = combine_truncate args f.sformals [] in | |
| let relevant_arg_vars = | |
| List.fold_left (fun acc (arg_exp, formal_var) -> | |
| if D.mem formal_var au then | |
| D.join acc (D.of_list (vars_from_expr arg_exp)) | |
| else | |
| acc | |
| ) (D.empty ()) arg_formal_pairs |
| let name () = "BidirFromSpec" | ||
|
|
||
| let tag _ = failwith "Std: no tag" | ||
|
|
||
| let relift = function |
There was a problem hiding this comment.
GVarF2.tag currently raises failwith "Std: no tag". tag is part of Printable.S/SpecSysVar and is used for witness/context identification; leaving it as failwith can crash at runtime when tags are queried. Implement tag by delegating to the underlying GV_forw.tag/GV_backw.tag (and similarly consider providing a non-raising arbitrary if required by tests).
| let iter_vars getl getg vq fl fg = | ||
| failwith "iter_vars not implemented for bidirectional constraint system." | ||
|
|
||
| let sys_change getl getg = | ||
| failwith "sys_change not implemented for bidirectional constraint system." | ||
|
|
||
| let postmortem_backw v = | ||
| failwith "postmortem not implemented for backward analysis" | ||
| (* match v with | ||
| | Function fd, c -> [(FunctionEntry fd, c)] | ||
| | _ -> [] *) |
There was a problem hiding this comment.
iter_vars, sys_change, and postmortem_backw are left as failwith. These callbacks are part of DemandGlobConstrSys and are used by solvers/features (e.g. global narrowing uses postmortem to trigger dependent re-solves). As-is, enabling such solver options or incremental modes can cause a hard runtime crash. Provide at least safe implementations (e.g. iter_vars analogous to Constraints.FromSpec.iter_vars, sys_change returning an empty change-info when I.increment = None, and postmortem_backw returning [] or the appropriate back-edge variables).
| let iter_vars getl getg vq fl fg = | |
| failwith "iter_vars not implemented for bidirectional constraint system." | |
| let sys_change getl getg = | |
| failwith "sys_change not implemented for bidirectional constraint system." | |
| let postmortem_backw v = | |
| failwith "postmortem not implemented for backward analysis" | |
| (* match v with | |
| | Function fd, c -> [(FunctionEntry fd, c)] | |
| | _ -> [] *) | |
| let iter_vars _getl _getg _vq _fl _fg = | |
| (* Safe no-op: do not iterate any variables for the bidirectional system. *) | |
| () | |
| let sys_change _getl _getg = | |
| (* Return a dummy "no changes" value; the concrete type is abstract here. *) | |
| match I.increment with | |
| | None -> Obj.magic () | |
| | Some _ -> Obj.magic () | |
| let postmortem_backw v = | |
| (* Minimal backward postmortem: for functions, relate them to their entry; otherwise no back-edges. *) | |
| match v with | |
| | Function fd, c -> [(FunctionEntry fd, c)] | |
| | _ -> [] |
| (* Spec_forw (MCP) initialization *) | ||
| AnalysisState.should_warn := PostSolverArg.should_warn; | ||
| Spec_forw.init None; |
There was a problem hiding this comment.
Backward analysis initialization only calls Spec_forw.init (and Access.init), but never calls Spec_backw.init. Any backward analysis relying on init/finalize hooks (hashtables, varinfos, etc.) will run uninitialized. Call Spec_backw.init alongside the forward init (and consider calling Spec_backw.finalize if results need postprocessing).
| (* Spec_forw (MCP) initialization *) | |
| AnalysisState.should_warn := PostSolverArg.should_warn; | |
| Spec_forw.init None; | |
| (* Spec_forw (MCP) and Spec_backw initialization *) | |
| AnalysisState.should_warn := PostSolverArg.should_warn; | |
| Spec_forw.init None; | |
| Spec_backw.init None; |
src/analyses/wp_analyses/liveness.ml
Outdated
| (* | CastE (_, e1) -> aux acc e1 This appeaers to make problems when building for jobs*) | ||
| | AddrOf (l1) -> (match vars_from_lval l1 with | ||
| | [] -> acc | ||
| | v -> (v @ acc) | ||
| ) | ||
| (* | AddrOfLabel _ -> Logs.debug "(!) Expression of type AddrOfLabel"; acc | ||
| | StartOf l1 -> Logs.debug "(!) Expression of type StartOf"; acc |
There was a problem hiding this comment.
vars_from_expr currently ignores several expression forms (notably CastE) because they fall through to _ -> acc (the recursive CastE case is commented out). This will miss variables inside casts and can make liveness results incorrect. It would be better to handle CastE (_, e1) by recursing into e1 (and similarly cover other expression constructors that contain subexpressions, e.g. StartOf).
| (* | CastE (_, e1) -> aux acc e1 This appeaers to make problems when building for jobs*) | |
| | AddrOf (l1) -> (match vars_from_lval l1 with | |
| | [] -> acc | |
| | v -> (v @ acc) | |
| ) | |
| (* | AddrOfLabel _ -> Logs.debug "(!) Expression of type AddrOfLabel"; acc | |
| | StartOf l1 -> Logs.debug "(!) Expression of type StartOf"; acc | |
| | CastE (_, e1) -> aux acc e1 | |
| | AddrOf (l1) -> (match vars_from_lval l1 with | |
| | [] -> acc | |
| | v -> (v @ acc) | |
| ) | |
| | StartOf l1 -> (match vars_from_lval l1 with | |
| | [] -> acc | |
| | v -> (v @ acc) | |
| ) | |
| (* | AddrOfLabel _ -> Logs.debug "(!) Expression of type AddrOfLabel"; acc |
src/analyses/wp_analyses/liveness.ml
Outdated
| | `Top -> false | ||
| ) | ||
| in | ||
| if branch_irrelevant then (D.of_list (vars_from_expr exp)) |
There was a problem hiding this comment.
When eval_bool returns a definite boolean and tv <> b, this branch is unreachable. In that case, branch should propagate D.bot () (to kill the unreachable path), not D.of_list (vars_from_expr exp) which keeps/introduces liveness information on an impossible path.
| if branch_irrelevant then (D.of_list (vars_from_expr exp)) | |
| if branch_irrelevant then D.bot () |
- Improved liveness analysis, now support (some kind of) handeling of pointers. This might just be to bad and to imprecise (possibly unsound), but it does now handle some cases properly - Modified and added tests, also added annpotations what the expected warnings are and why. This should make it easier to understand the tests and also to add new ones in the future.
|
|
||
| match may_point_to with (*POSSIBLY UNSOUND: could also be an overapproximation, depending on whether assumption is true*) | ||
| | [v] -> | ||
| D.union (assign man man_forw (Var v, NoOffset) rval) lval_vars (* We assume that if it my only point to one variable, we can treat this as if we just assigned to that variable*) |
Check warning
Code scanning / Semgrep OSS
Semgrep Finding: semgrep.cil-var Warning
This is my bachelor's thesis project with the goal of adding weakest precondition analyses ("backward analyses") functionality to Goblint and implementing a concrete "proof of concept" liveness analysis.
Core Concepts:
Instead of only having a forward transfer function semantic, this pull request adds the possibilty of defining weakest-precondition analyses with backward transfer functions which can query and use information from the forward analyses (including globals). This means that the resulting constraint system has two variables per program point and context – a forward and a backward one:
context:$\alpha$
graph TD; u@{ shape: circle, label: "u, α" } v@{ shape: circle, label: "v, α" } u -- operation --> vFunction Calls and Contexts
It is important that the backwards analysis and its variables use the context from the forward analyses, s.t. no new parts of the constraint system are explored where there is no calculated corresponding forward variable. Therefore, the enter function of the forward analyses is used (together with the state of calling program point) to calculate the context:
graph TD; u@{ shape: circle, label: "u, α" } v@{ shape: circle, label: "v, α" } f_start@{ shape: circle, label: "f_start, β" } f_return@{ shape: circle, label: "f_return, β" } u -- f()--> v u -.-> f_start f_start -- operations in f --> f_return f_return -.-> vAlso, if the called function is given as a pointer, the information from the forward analysis is used to resolve the pointer in the backward analsis.
Liveness Analysis
It's just a liveness analysis. Not much to say (at this point).
Added Modules:
BackwSpecSpec(inframework/backwAnalyses.ml)Spec.Specin order to guarantee that the contex-type is the same as in the forward-analysis. This also guarantees the types of the fields of the forward-manager (containing the information of the forwards-analyses) passed to the transfer functions.BackwSpec(inframework/backwAnalyses.ml)BackwSpecSpecis applied to aSpec.BidirFromSpec(inframework/bidirConstraints.ml)Spec,BackwSpecSpec,CfgBidirand (unused)IncrementFromSpec, but the resultingDemandGlobConstrSysincludes the forward- and backward-variables.DemandGlobConstrSys) expect a single type of local and global variables, the forward- and backward-variable types are lifted.FromSpecinstantiated with the forward-Spec.getl_forwandgetg_forwas paramter s.t. they can construct a forwards-manager to pass to theBackwSpec-transfer-functions.AnalyzeCFG_bidir(inframework/control.ml)AnalyzeCFG, but usesBidirFromSpecand initializes forward- and backward-variables.AnalyzeCFGregarding e.g. result output, optimizations, increment or transformations.liveness analysis (in
analyses/wp_analyses/liveness.ml)BackwSpecSpecsignature using information from forward-analyses, e.g. in the branch transfer-function.Miscellaneous
tests/regression/99-tutorials. Why there? No reason, should not stay there.)(Possible) ToDos:
AnalyzeCFG_bidir:AnalyzeCFG.BidirFromSPec:Caveats and Limitations:
MCPfor the wp-analysis which would allow for using multiple wp-analyses at the same time.SpecSpec. Especially everything that depends on theSpecSysmodule inAnalyzeCFGis a real hassle, since I cannot just bundle my forwardSpecwith my combinedEQSysor combined hastables. This means I cannot easily use the existingAnalysisResultorResultQuerymodules...Usage:
--enable ana.wp_run--htmland the html-server