Due Friday, March 21st, 11:59PM (Pacific Time)
Note: March 21st is the last day of the quarter, so this will be the absolute last day to submit this assignment. No late submissions will be accepted.
- Make sure you download the latest version of the reference manual using this link. You will need to refer to the chapter on type inference in the language reference manual.
- Either clone this directory, or download the zipped directory using this link.
- The files required for this assignment are located in this folder. In particular, the functions you need to implement are in `lamp/typeinfer.ml` and (for extra credit) `lamp/curry_howard.ml`.
- You must not change the type signatures of the original functions. Otherwise, your program will not compile. If you accidentally change the type signatures, you can refer to the corresponding `.mli` file to see what the expected type signatures are.
- Once you're done, run `make`, which will create an archive called `submission.zip` containing `lamp/{typeinfer.ml,curry_howard.ml}`. Submit the zip file to Gradescope. You do not need to submit any other file, including any `.mli` file or test code. The autograder will automatically compile your code together with our testing infrastructure and run the tests.
- If your program contains print statements, please remove them before submitting.
In this assignment, you will implement an extremely powerful type inference algorithm for λ+, which will allow you to type check even programs containing no type annotations whatsoever. Before we start, however, let's introduce an OCaml feature that we will use in this assignment. Although you won't be graded on this part, it is important to make sure you understand it before proceeding to the main part of the assignment.
You have been using the pure subset of OCaml. However, OCaml also supports imperative features, such as mutable state, which takes the form of reference cells in OCaml.
If t is an OCaml type, then a reference cell that holds a value of type t has type t ref. The ref type is a built-in type in OCaml. There are only three operations on reference cells:
- Creating a reference cell: For example, `ref 0` creates a fresh reference cell that initially holds the value `0`. The type of this reference cell is `int ref`, not `int`.
- Reading from a reference cell: `!r` reads the value from the reference cell `r`. If `r` has type `t ref`, then `!r` has type `t`. You can only read from a value of type `t ref`. For example, you cannot read from a value of type `int`.
- Writing to a reference cell: `r := e` writes the result of evaluating expression `e` to the reference cell `r`. The type of the overall expression `r := e` is `unit`, which is simply a dummy value that represents the result of performing some side-effect without returning a value. You can sequence multiple expressions of type `unit` with the `;` operator. For example, `r := 1; r := 2` first writes `1` to the reference cell `r`, and then writes `2` to the reference cell `r`. (This also works for printing: `print_endline "Hello"; print_endline "World"` will print "Hello" and then "World".)
If you're familiar with C or C++, reference cells are analogous to pointers:
- `int ref` in OCaml is like an `int*` in C
- `ref v` in OCaml is like `malloc` followed by initialization with value `v`
- `!r` in OCaml is like `*r` in C
- `r := e` in OCaml is like `*r = e` in C

But reference cells are much nicer to work with than pointers. In particular:
- There's no manual memory management via `free`. OCaml automatically garbage collects reference cells when they are no longer needed.
- There's no such thing as a `null` pointer. Every reference cell of type `t ref` is guaranteed to hold a value of type `t`.
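To make the three operations concrete, here is a small sketch that uses nothing beyond `ref`, `!`, `:=`, and `;`. The helper names `swap` and `sum` are chosen for illustration only and are not part of the assignment's starter code.

```ocaml
(* Swap the contents of two reference cells of the same type. *)
let swap (x : 'a ref) (y : 'a ref) : unit =
  let tmp = !x in   (* read, like tmp = *x in C *)
  x := !y;          (* write, like *x = *y in C *)
  y := tmp

(* Sum a list imperatively, using a reference cell as the accumulator. *)
let sum (xs : int list) : int =
  let acc = ref 0 in                        (* acc : int ref *)
  List.iter (fun x -> acc := !acc + x) xs;  (* each update has type unit *)
  !acc                                      (* the final read has type int *)

let () =
  let p = ref 1 and q = ref 2 in
  swap p q;
  assert (!p = 2 && !q = 1);
  assert (sum [1; 2; 3] = 6)
```

Note that `swap` returns `unit`: its only purpose is the side effect on the two cells.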
As an example, here's an efficient implementation of the Fibonacci function using reference cells:

    let fib (n: int) =
      let a = ref 0 in
      let b = ref 1 in
      let rec go (n: int) =
        if n = 0 then !a
        else
          (
            let a_prev = !a in
            a := !b;
            b := a_prev + !b;
            go (n - 1)
          )
      in go n

Note the use of `;`: given two expressions `e1` and `e2`, if `e1` has type `unit`, then `e1; e2` first evaluates `e1`, discards the result, and then evaluates `e2`. This is useful for sequencing side effects. In the example above, the `;` operator is used to sequence the updates to `a` and `b`.
Exercise (📝) Re-implement the timestamp function from hw1 using reference cells.
OCaml Tip: In later parts of the assignment, you may find it helpful to use `;` to sequence effectful computation, such as writing to reference cells or print statements. For example,

    Fmt.pr "[abstract_eval] e = %a\n%!" Pretty.pp_expr e;
    Fmt.pr "[abstract_eval] t = %a\n%!" Pretty.pp_ty t;
    ...
    Fmt.pr "[unify] constraint = %a\n%!" Pretty.pp_cons c;
    ...
Exercise: On a piece of paper, for each of the following expressions, write down the typing constraints that would be generated for it.
Hint: Feel free to use these expressions as test cases when you're implementing your constraint generation algorithm.
- `lambda x. x + 2 * x`
- `x`
- `(lambda x. x + 2 * y) 3`
- `10 < 10`
- `1::2::3+4::Nil`
- `(lambda x.x) :: (lambda x. x::Nil) :: Nil`
- `(lambda x.x) :: (lambda x. x+1) :: Nil`
- `if true then false else true`
- `match Nil with Nil -> Nil | _::_ -> Nil end`
- `(if 3>4 then 5 else 7+10*3) = 10`
- `let f = lambda x. if x then false else true in f (10 > 0)`
- `1 :: 10 :: Nil :: Nil`
- `(1::10::Nil) :: Nil :: Nil`
- `(Nil :: Nil) :: Nil :: Nil`
- `match 1::Nil with Nil -> 0 | hd::_ -> hd end`
- `match 1::Nil with Nil -> 0 | _::tl -> tl end`
- `match 1::2 with Nil -> 3 | x::y -> x+y end`
- `(fix recur is lambda n. if n < 1 then 1 else recur (n-1) + recur (n-2)) 2`
- `(fix recur is lambda xs: List['a]. match xs with Nil -> 0 | _::ys -> 1 + recur ys end) (false::true::Nil[Int])`
- `(fix recur is lambda xs: List['a]. match xs with Nil -> 0 | _::ys -> 1 + recur ys end) Nil`
- `(fix recur is lambda n. recur (n-1)) 10`
- `(fix recur is lambda n: Int. n-1) 10`
- `if true then 2 else false : Bool`
Exercise: For each of the expressions above, solve the corresponding constraints using the unification algorithm, and think about the following:
- Which constraint systems have a solution?
- Which constraint systems do not have any solution? What does this tell you about the original expression?
- Which constraint systems have multiple solutions? What does this tell you about the original expression?
Abstract syntax: Since we must introduce type variables, we have augmented ty with a new constructor called TVar that represents type variables using strings:

      type ty =
    +   | TVar of string (** Type variable *)
        | TInt
        | TBool
        | TFun of ty * ty
        | TList of ty
        ...

Concrete syntax: The parser has also been updated to parse type variables. A type variable has the form `'<ID>`, i.e., a single quote followed by an identifier. For example, `'a` is a type variable. Please see the test/examples folder for examples of the new syntax.
WARNING: Do not confuse expression variables (represented by `Var` in the `expr` type) with type variables (represented by `TVar` in the `ty` type). They are two different things. An expression variable can only be substituted with a value during runtime/evaluation, while a type variable can only be substituted with another type during type inference. For example, in the following $\lambda^+$ expression:

    fun rec map: List['b] with f: 'a -> 'b, l: List['a] =
      match l with
      | Nil -> Nil
      | x::xs ->
        let y = f x in
        let ys = map f xs in
        y::ys
      end
    in ...

`x`, `xs`, `y`, and `ys` are expression variables, while `'a` and `'b` are type variables. During runtime, `x` and `xs` will be substituted with the head and the tail of list `l`. During type inference, `'a` and `'b` will be substituted with, for example, `Int` if the `map` function is called with `List[Int]` as the type of the input list and `Int -> Int` as the type of the function `f`.
Problem: Your task is to complete the abstract_eval function in the Infer module of typeinfer.ml. This function will be used to generate typing constraints. You should refer to the type inference rules as presented in Section 6.4 - Figure 8 of the reference manual.
Important notes:
- Constraint generation works by traversing the AST expression and collecting the typing constraints into a global store. We will represent a constraint between two types as a pair:

      type cons = ty * ty

  and the function `pp_cons` formats a constraint in a human-readable format.

  The global constraint store is implemented using a reference cell:

      let _cs : cons list ref = ref []

  However, you don't need to modify `_cs` directly. We have provided the `(===) : ty -> ty -> unit` function, which adds a constraint between two types to the global store. Note that this function can be called using infix notation. For example, if you want to constrain `t1` to be equal to the integer type, and `t2` to be equal to `t1`, you can call:

      t1 === int;
      t2 === t1;
      <more code>

  Note the use of `;` to sequence expressions of `unit` type. Do NOT attempt to modify the accumulator variable `_cs` yourself. Also, you should never need to remove any constraint in this part of the assignment.

  We also provided some helper functions:
  - The `curr_cons_list : unit -> cons list` function returns the list of constraints collected so far. You should not need to call this function until Part 3, unless you want to print out all the constraints generated so far for debugging purposes.
  - We have defined the `fresh_var : unit -> ty` function to generate a fresh type variable. Call this function using `fresh_var ()`, which will return some `TVar str` where `str` is an automatically generated string that is guaranteed to be distinct from the results of all previous calls to `fresh_var`.
- As in HW4, the `abstract_eval` function has type `gamma -> expr -> ty`. Unlike HW4, in anticipation of Part 3 of this assignment, the typing environment has been defined to be

      type gamma = (string * ty) list

  That is, the typing environment maps expression variables to actual types (that may contain type variables).
- The structure of your `abstract_eval` should be extremely close to what you had in HW4. The only difference is that, where you checked type equality using `equal_ty` in HW4, you will now instead generate an equality constraint using the `===` operator, and generate fresh variables as needed using `fresh_var ()`, both of which are provided to you.
- Your `abstract_eval` must take type annotations into account. For example, you should use type annotations if they are provided (e.g., in `lambda x:Int. x`), or generate fresh type variables if they are absent.
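The notes above can be sketched on a deliberately tiny language. Everything in this block is an assumption made for illustration: the `ty` and `expr` types are cut-down stand-ins for the real ones in the starter code, the store is named `cs` rather than `_cs`, and `gen` is a hypothetical name for a constraint generator, not the required `abstract_eval`.

```ocaml
(* Toy types and expressions; NOT the assignment's real [ty] and [expr]. *)
type ty = TVar of string | TInt | TBool | TFun of ty * ty

type expr =
  | Num of int
  | Var of string
  | Add of expr * expr
  | Lam of string * ty option * expr  (* parameter with optional annotation *)
  | App of expr * expr

type cons = ty * ty

(* A global constraint store plus helpers that mimic the provided ones. *)
let cs : cons list ref = ref []
let ( === ) (t1 : ty) (t2 : ty) : unit = cs := (t1, t2) :: !cs

let counter = ref 0
let fresh_var () : ty =
  incr counter;
  TVar ("t" ^ string_of_int !counter)

(* Return a type for [e], recording equality constraints as a side effect. *)
let rec gen (gamma : (string * ty) list) (e : expr) : ty =
  match e with
  | Num _ -> TInt
  | Var x -> List.assoc x gamma  (* raises Not_found on unbound variables *)
  | Add (e1, e2) ->
      gen gamma e1 === TInt;
      gen gamma e2 === TInt;
      TInt
  | Lam (x, annot, body) ->
      (* Use the annotation when present, otherwise a fresh type variable. *)
      let tx = match annot with Some t -> t | None -> fresh_var () in
      TFun (tx, gen ((x, tx) :: gamma) body)
  | App (e1, e2) ->
      let t1 = gen gamma e1 in
      let t2 = gen gamma e2 in
      let tr = fresh_var () in
      t1 === TFun (t2, tr);
      tr
```

In this sketch, the unannotated function `Lam ("x", None, Add (Var "x", Num 1))` receives the type `TFun (TVar "t1", TInt)`, and the store ends up holding the constraints `'t1 = Int` and `Int = Int`. Where HW4 would have compared types directly, the sketch only records what must be equal and leaves the decision to the solver.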
OCaml Tip: Mutually recursive functions are defined using a combination of `let rec` and `and`. For example, the following code defines two mutually recursive functions `even` and `odd`:

    let rec even (n: int) : bool =
      if n = 0 then true else odd (n - 1)
    and odd (n: int) : bool =
      if n = 0 then false else even (n - 1)

Notice that you do not need to write `rec` after `and`.

In Part 3 of this assignment, constraint generation and constraint solving will be mutually recursive. Thus, be careful not to accidentally break the mutual recursion by removing `and` or `rec`, or inserting other functions in between the mutually recursive functions.
We're now ready to implement the unification-based constraint solver, solve : cons list -> soln, which takes a list of constraints collected so far, and returns a solution to the constraints. The soln type is defined as an association list of type variables to types:

    type soln = (string * ty) list

Alternatively, you can think of `soln` as representing a substitution that maps type variables to types.
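As a small illustration of the association-list view, the sketch below builds a solution by hand and looks up individual type variables. The cut-down `ty`, the value `example`, and the helper `lookup` are assumptions for this example only; `List.assoc_opt` is the standard library function.

```ocaml
(* Cut-down stand-in for the assignment's [ty]. *)
type ty = TVar of string | TInt | TBool | TFun of ty * ty

type soln = (string * ty) list

(* A hand-written solution: 'a maps to Int, 'b maps to Int -> Bool. *)
let example : soln = [ ("a", TInt); ("b", TFun (TInt, TBool)) ]

(* Find the type a variable is mapped to, if the solution mentions it. *)
let lookup (s : soln) (x : string) : ty option = List.assoc_opt x s
```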
The `solve` function first pattern-matches on the list of constraints `cs`:

    ...
    and solve (cs : cons list) : soln =
      match cs with
      | [] -> []
      | (t1, t2)::cs ->
        match t1, t2 with
        | _, _ -> part2 ()

- If the list is empty, then there are no constraints to solve, and the solution is the empty substitution, represented by the empty list `[]`.
- If the list is non-empty, where `(t1, t2)` is the first constraint, then we will pattern-match on `t1` and `t2` to determine how to proceed.
Please refer to Section 6.5 of the manual for a detailed description of the unification algorithm. The unify function should raise an error using the ty_err function if the constraints are unsatisfiable.
Before you proceed, let's first define some helper functions in the `Utils` module. Later, when you need to call these helper functions, you need to qualify the function names with the module name, as in `Utils.free_vars`, `Utils.subst`, and `Utils.apply_soln`. There will be no autograder tests for these helper functions, but it is crucial that you implement them correctly and test them thoroughly.
Complete the free_vars function that collects all type variables in a ty into a set.
Hint: Since monomorphic types do not have binding structures, you just need to `union` a bunch of things together.

In the `Utils` module, implement the `subst` function which replaces all occurrences of a type variable `string` with another type `ty`.
Using the higher-order function map, define a function subst_cs: string -> ty -> cons list -> cons list that substitutes a type variable with a type in a list of constraints.
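The three helpers described so far can be sketched on a cut-down type. This is only an illustration under stated assumptions: the real `ty` has more constructors, the set module used by the starter code may differ from the `VarSet` defined here with `Set.Make (String)`, and the argument order of the real `subst` is not specified above.

```ocaml
module VarSet = Set.Make (String)

(* Cut-down stand-in for the assignment's [ty]. *)
type ty = TVar of string | TInt | TBool | TFun of ty * ty | TList of ty
type cons = ty * ty

(* Collect every type variable that occurs in [t]. *)
let rec free_vars (t : ty) : VarSet.t =
  match t with
  | TVar x -> VarSet.singleton x
  | TInt | TBool -> VarSet.empty
  | TFun (t1, t2) -> VarSet.union (free_vars t1) (free_vars t2)
  | TList t1 -> free_vars t1

(* Replace every occurrence of type variable [x] in [t] with [u]. *)
let rec subst (x : string) (u : ty) (t : ty) : ty =
  match t with
  | TVar y -> if String.equal x y then u else t
  | TInt | TBool -> t
  | TFun (t1, t2) -> TFun (subst x u t1, subst x u t2)
  | TList t1 -> TList (subst x u t1)

(* Substitute on both sides of every constraint in the list. *)
let subst_cs (x : string) (u : ty) (cs : cons list) : cons list =
  List.map (fun (t1, t2) -> (subst x u t1, subst x u t2)) cs
```

Because there are no binders in these types, `subst` never has to worry about variable capture, which is why it is so much simpler than substitution on expressions.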
Define a function apply_soln: soln -> ty -> ty that applies a solution s to a type t by substituting all type variables in t with their solved types given by s.
Hint
1. Combine `subst` from Task 2.2 with either `fold_left` or `fold_right`.
2. If you did the extra-credit `subst_multi` from HW2, don't confuse that with this function. This function is *much* simpler. (In fact it might be the "incorrect solution" for `subst_multi` that many of you came up with first.)

- You can find unit test helpers in `test/test_typing.ml` (to test the overall type inference algorithm), and `test/test_solving.ml` (to test the constraint solver independently). You can execute unit tests with `dune runtest`. For convenience, we have provided two modules called `Expr` and `Ty` in `ast.ml` that will make it easier to manually construct expressions and types for testing.
- We highly recommend that you write unit tests for each helper function you defined.
- Make sure you test your constraint generation and your constraint solver separately. If you generate incorrect constraints, your unification will not produce the correct answers ("garbage in, garbage out").
- Afterwards, you should test your constraint generation and unification together, so that you can get a sense of how well they do as an actual type inference procedure (which is the goal of this assignment).
- Be careful when implementing unification; it is easy to end up with infinite recursion. In particular, you must implement the "occurs check" correctly or you will hang the autograder and get a score of zero.
- The reference interpreter on CSIL (`~junrui/lamp`) also implements type inference.
- If you would like to run the interpreter with your own type inference implementation locally, use `dune exec bin/repl.exe` or `dune exec bin/repl.exe -- <filename>`.
- Some good sources of test cases are:
  - Exercise 1.0 in this document.
  - OCaml exercises from HW1 & HW2 -- you can rewrite some of them in $\lambda^+$.
  - Programs that involve higher-order functions, such as `map`, `fold`, `filter`, etc.
  - Example programs in test/examples that you can use to stress-test your type inference engine.
Extra credit problems
Part 3: The Curry-Howard Correspondence
Problem 1 (1 point)
Reverse-engineer type inference for the unit (`()`), void (`!`), product (`*`), and sum (`+`) types by playing with the reference interpreter.
Problem 2 (1 point)
The unit, void, product, sum, and function types in $\lambda^+$ correspond to logical propositions and connectives as follows:
- The unit type `()` corresponds to the logical proposition `True`.
- The void type `!` corresponds to the logical proposition `False`.
- The product type `*` corresponds to logical conjunction `/\`.
- The sum type `+` corresponds to logical disjunction `\/`.
- The function type `->` corresponds to logical implication `->`.
For each of the following types `t`:
- Convert it to a logical proposition using the scheme above. Use the precedence rule that `*` binds more tightly than `+`, and `+` binds more tightly than `->`. For associativity, assume `*` and `+` are left-associative, and `->` is right-associative (as in OCaml), meaning that `A -> B -> C` is parsed as `A -> (B -> C)`.
- Determine if the proposition is always true (valid), sometimes true (satisfiable), or always false (unsatisfiable). Use an online truth table generator to verify your answer.
- Try to come up with a $\lambda^+$ expression `e` for which the type inference algorithm assigns `t` to `e` (the inferred type doesn't need to be exactly the same as `t`, as long as you can alpha-rename the type variables to get `t`). Don't use the `fix` operator for this problem.
- If the proposition is valid, are you able to come up with a corresponding expression?
- If you are able to come up with a corresponding expression, is the proposition valid?
- `always_true: ()`
- `always_false: !`
- `everything: 'p`
- `everything_implies_truth: 'p -> ()`
- `falsehood_implies_everything: ! -> 'q`
- `everything_implies_itself: 'p -> 'p`
- `modus_ponens: 'p * ('p -> 'q) -> 'q`
- `both_true_implies_left_true: 'p * 'q -> 'p`
- `either_true_implies_left_true: 'p + 'q -> 'p`
- `conjunction_is_commutative: 'p * 'q -> 'q * 'p`
- `disjunction_is_commutative: 'p + 'q -> 'q + 'p`
- `conjunction_distributes_over_disjunction: 'p * ('q + 'r) -> ('p * 'q) + ('p * 'r)`
- `disjunction_distributes_over_conjunction: 'p + ('q * 'r) -> ('p + 'q) * ('p + 'r)`
- `curry: ('p * 'q -> 'r) -> ('p -> ('q -> 'r))`
  - Intuitively, this type says that any two-argument function can be converted into an equivalent higher-order function that takes the first argument and returns a function that takes the second argument.
- `uncurry: ('p -> ('q -> 'r)) -> ('p * 'q -> 'r)`
Provide your solution in the `Problem2` module of `curry_howard.ml`.
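To see the idea at work outside of $\lambda^+$, the sketch below writes three OCaml functions whose types read as valid propositions. These particular types are chosen for illustration and are not among the ones listed in this problem. OCaml tuples play the role of the product type, and the names `weaken`, `syllogism`, and `pair_intro` are hypothetical.

```ocaml
(* 'p -> 'q -> 'p : "if P holds, then Q implies P". *)
let weaken : 'p -> 'q -> 'p = fun p _ -> p

(* ('p -> 'q) -> ('q -> 'r) -> 'p -> 'r : "if P implies Q and Q implies R,
   then P implies R". The program is just function composition. *)
let syllogism : ('p -> 'q) -> ('q -> 'r) -> 'p -> 'r =
  fun f g p -> g (f p)

(* 'p -> 'q -> 'p * 'q : "from P and Q, conclude P /\ Q". *)
let pair_intro : 'p -> 'q -> 'p * 'q = fun p q -> (p, q)
```

In each case the function body never inspects its arguments beyond passing them around, which is typical of programs that double as proofs of propositional tautologies.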
Problem 3 (8 points)
The miracle that we just witnessed is just the tip of the iceberg. It embodies a profound phenomenon in logic, computer science, and math, called the Curry-Howard correspondence. It states that, whenever you write a functional program that has a certain type, you are also implicitly proving that a mathematical theorem holds.
Different type systems allow you to express and prove different sorts of math theorems. The simple type system of
As we extend the power of the type system, the Curry-Howard correspondence can give us some really cool applications:
- Polymorphism (generics in OOP languages) corresponds to universal quantification ($\forall$). Data abstraction -- which hides the internal implementation of some function from the clients -- corresponds to existential quantification ($\exists$).
- Mathematical induction becomes just another recursive higher-order function that takes some value representing the base case and a function representing the inductive step. For example, induction on `list` is exactly `List.fold_right`.
- Once you throw in "dependent types", which allow you to talk about expressions in types and vice versa, you can do theorem proving just as functional programming! We can phrase an arbitrarily complicated math theorem, say the four-color theorem, as the type of a functional program. Proving the theorem amounts to writing down some (really long) program and asking the type checker to certify that the program indeed has the right type. In this way, the program becomes a proof of the theorem. Moreover,
  - since the proof is just another program, and proving is just coding, we can leverage LLMs or other AI techniques to help us write the proof!
  - Importantly, the program/proof is always validated by a type checker, so we can have an extremely high degree of confidence that the proof is correct, regardless of whether the proof is written by a human or hallucinated by LLMs.
- Program Verification: It is important for programs in security-critical systems to be categorically free of any sort of bugs. For example, you do not want aviation software to have integer overflows or to dereference null pointers. We can again phrase these properties as types, and write a machine-checked proof that the program of interest indeed has the desired property.
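The remark above that induction on lists is exactly `List.fold_right` can be made concrete with a short sketch. The wrapper `list_induction` and the two functions defined through it are hypothetical names for illustration; only `List.fold_right` comes from the standard library.

```ocaml
(* The "induction principle" for lists: supply a result for the empty list
   (the base case) and a way to extend a result for the tail to a result
   for the whole list (the inductive step). *)
let list_induction (base : 'r) (step : 'a -> 'r -> 'r) (xs : 'a list) : 'r =
  List.fold_right step xs base

(* Length: 0 for the empty list, one more than the tail's length otherwise. *)
let length xs = list_induction 0 (fun _ n -> n + 1) xs

(* Append: ys for the empty list, otherwise cons the head onto the result. *)
let append xs ys = list_induction ys (fun x acc -> x :: acc) xs
```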
If this sounds interesting at all, you may want to contact Prof. Yu Feng, whose lab does research on these topics.
In this problem, we'll give you a taste of an exciting area of research called program synthesis, which aims to automatically generate programs that are guaranteed to meet some user specification. Contrast this with LLM-based code generation, which always has the possibility of spitting out nonsense -- I don't know about you, but I personally don't feel safe riding an airplane if I learn that the software was LLM-generated.
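We'll take programs to be $\lambda^+$ expressions and specifications to be types. Your task is to implement `synthesize: ty -> expr option` that, given a type as specification, finds a $\lambda^+$ expression of that type, or returns `None` if it cannot find one. In implementing this function: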
- You should not use `Fix` in your synthesized programs.
- You can ignore `TList`.
Note that this is a challenging problem, as suggested by the high point value. You are not expected to solve this problem perfectly. The autograder will award points based on the percentage of types for which you can synthesize a program. A simple algorithm will be enough to solve a non-trivial portion, if not all, of the test cases. You will be given a 10-second timeout per test case.
To help you debug your implementation interactively, we have added a #synth <ty> command to the REPL interpreter. For example,
#synth 'a -> 'a
will run your synthesis function. Hopefully, this will return
[synthesis] ==> lambda x. x
inferred type ==> 't1 -> 't1
and the REPL will also run your type inference algorithm on the synthesized program so that you can check whether the synthesized program has the correct type. In this case, the synthesized program `lambda x. x` indeed has the type `'t1 -> 't1`, which is alpha-equivalent to `'a -> 'a`. Hopefully, this is also the answer you gave for `everything_implies_itself` in Problem 2.
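As a starting point, one simple type-directed search can be sketched as follows. It is an illustration under heavy assumptions, not a reference solution: the `ty` and `expr` types below are toy versions covering only type variables and functions (no unit, void, products, or sums), the names `synth` and `use` and the depth bound of 5 are arbitrary choices, and the variable names it produces differ from the REPL output shown above.

```ocaml
(* Toy types and expressions; NOT the assignment's real [ty] and [expr]. *)
type ty = TVar of string | TFun of ty * ty
type expr = Var of string | Lam of string * expr | App of expr * expr

(* Try to build a term of type [goal] from the hypotheses in [ctx].
   [depth] bounds the number of nested argument searches, so the search
   always terminates. *)
let rec synth (depth : int) (ctx : (string * ty) list) (goal : ty) : expr option =
  if depth = 0 then None
  else
    match goal with
    | TFun (a, b) ->
        (* To prove a -> b, assume a under a fresh name and prove b. *)
        let x = "x" ^ string_of_int (List.length ctx) in
        Option.map (fun body -> Lam (x, body)) (synth depth ((x, a) :: ctx) b)
    | TVar _ ->
        (* Try each hypothesis, either directly or applied to arguments. *)
        List.find_map (fun (x, t) -> use depth ctx (Var x) t goal) ctx

(* Try to turn [e], which has type [t], into a term of type [goal] by
   applying it to synthesized arguments. *)
and use depth ctx e t goal =
  if t = goal then Some e
  else
    match t with
    | TFun (a, b) -> (
        match synth (depth - 1) ctx a with
        | Some arg -> use depth ctx (App (e, arg)) b goal
        | None -> None)
    | TVar _ -> None

let synthesize (t : ty) : expr option = synth 5 [] t
```

For the goal `'a -> 'a` this sketch returns `Lam ("x0", Var "x0")`, and for the bare goal `'a` it returns `None`, matching the intuition that an arbitrary proposition has no proof. The depth bound trades completeness for guaranteed termination within the per-test timeout.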
Junrui TODO: Set up starter code for this part