Skip to content

Smarter integer pattern matching #45

@eduardoleon

Description

@eduardoleon

One of ML's greatest strengths is pattern matching, which allows us to perform sophisticated case analyses without ever branching on a boolean expression. When it comes to case-analyzing inductively defined types, no other general-purpose language is better than ML.

Unfortunately, ML is far less good at case-analyzing integers. Just last week I wrote the following datatype:

datatype 'a tree
  = Pure of 'a
  | Two of int * 'a tree * 'a tree
  | ThreeA of int * 'a tree * 'a tree * 'a tree
  | ThreeB of int * 'a tree * 'a tree * 'a tree
  | Four of int * 'a tree * 'a tree * 'a tree * 'a tree

and the following helper functions:

fun part (m, Two (n, a, b)) = if m mod 2 = 0 andalso n = m + 1 then SOME (a, b) else NONE
  | part _ = NONE
  
fun many (n, ab, cd) =
  case (part (n, ab), part (n, cd)) of
      (NONE, NONE) => Two (n, ab, cd)
    | (NONE, SOME (c, d)) => ThreeA (n, ab, c, d)
    | (SOME (a, b), NONE) => ThreeB (n, a, b, cd)
    | (SOME (a, b), SOME (c, d)) => Four (n, a, b, c, d)

What I wish I could have written instead is

fun many (2*n, Two (2*n+1, a, b), Two (2*n+1, c, d)) = Four (2*n, a, b, c, d)
  | many (2*n, Two (2*n+1, a, b), c) = ThreeB (2*n, a, b, c)
  | many (2*n, a, Two (2*n+1, b, c)) = ThreeA (2*n, a, b, c)
  | many args = Two args

The patterns 2*n and 2*n+1 already take care of matching two consecutive integers, the greater of which is the odd one. More generally, what I would want to see is

  • Pattern matching affine combinations of integer variables with constant coefficients. For example,

    fun sameParity (x+y, x-y) = true
      | sameParity _ = false
  • Patterns that only match nonnegative integers. For example,

    fun lessOrEq (n, n + nonneg k) = true
      | lessOrEq _ = false
  • Of course, the type checker should reject patterns that can be matched in two or more ways for the same input. For example, the snippet

    fun wrong (2*x + 3*y, 4*x + 6*y) = ...
      | wrong _ = ...

    is wrong because the matrix [2 3; 4 6] (sorry, MATLAB notation) is not invertible, whereas the snippet

    fun okay (2*x + 3*y, 3*x + 2*y) = ...
      | okay _ = ...

    is not wrong because the matrix [2 3; 3 2] is invertible.

Does this seem like a reasonable feature to want?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions