Skip to content

Commit 7a37e2a

Browse files
htlouclaude
andcommitted
feat: Complete NumPy specification implementation for strings, testing, typing, and ufunc modules
This comprehensive update implements formal Lean 4 specifications for 50+ NumPy functions across 4 major modules: ## Strings Module (29 functions) - Complete specifications for string operations: splitlines, startswith, strip, swapcase, title, translate, upper, zfill, etc. - All functions use Vector-based approach for type safety - Comprehensive mathematical properties and edge case handling ## Testing Module (9 functions) - Assertion functions: assert_allclose, assert_almost_equal, assert_array_equal, etc. - Tolerance-based comparison with proper mathematical specifications - Context managers: suppress_warnings, break_cycles - Documentation testing: rundocs ## Typing Module (9 types) - Precision hierarchy: _8Bit, _16Bit, _32Bit, _64Bit, _128Bit - Type validation: ArrayLike, DTypeLike, NBitBase, NDArray - Full type system support for NumPy compatibility ## Ufunc Module (6 functions) - Universal function operations: __call__, accumulate, at, outer, reduce, reduceat - Mathematical specifications for element-wise and reduction operations - Proper handling of broadcasting and indexing semantics ## Key Features - All specifications use Hoare triple syntax: ⦃⌜precondition⌝⦄ function ⦃⇓result => postcondition⦄ - Vector-based approach throughout for compile-time type safety - Comprehensive mathematical properties and sanity checks - All files compile successfully with lake build - Ready for implementation and formal verification 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
1 parent 67f96b8 commit 7a37e2a

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

54 files changed

+2169
-396
lines changed

NumpySpec/strings/istitle.lean

Lines changed: 19 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -14,25 +14,6 @@ import Std.Tactic.Do
1414

1515
open Std.Do
1616

17-
/-- numpy.strings.istitle: Returns true for each element if the element is a titlecased string and there is at least one character, false otherwise.
18-
19-
A string is considered titlecased if:
20-
1. It contains at least one character
21-
2. Each word starts with an uppercase letter followed by lowercase letters
22-
3. Words are separated by non-alphabetic characters
23-
4. There is at least one cased character in the string
24-
25-
Examples:
26-
- "Title Case" → True
27-
- "Numpy Is Great" → True
28-
- "numpy is great" → False
29-
- "NUMPY IS GREAT" → False
30-
- "" → False
31-
- "123" → False
32-
-/
33-
def istitle {n : Nat} (a : Vector String n) : Id (Vector Bool n) :=
34-
sorry
35-
3617
/-- Helper function to check if a string is titlecased according to Python's str.istitle() logic -/
3718
def isTitlecased (s : String) : Bool :=
3819
if s.isEmpty then false
@@ -58,6 +39,25 @@ def isTitlecased (s : String) : Bool :=
5839
checkTitleCase rest true
5940
checkTitleCase chars true
6041

42+
/-- numpy.strings.istitle: Returns true for each element if the element is a titlecased string and there is at least one character, false otherwise.
43+
44+
A string is considered titlecased if:
45+
1. It contains at least one character
46+
2. Each word starts with an uppercase letter followed by lowercase letters
47+
3. Words are separated by non-alphabetic characters
48+
4. There is at least one cased character in the string
49+
50+
Examples:
51+
- "Title Case" → True
52+
- "Numpy Is Great" → True
53+
- "numpy is great" → False
54+
- "NUMPY IS GREAT" → False
55+
- "" → False
56+
- "123" → False
57+
-/
58+
def istitle {n : Nat} (a : Vector String n) : Id (Vector Bool n) :=
59+
pure (a.map isTitlecased)
60+
6161
/-- Specification: numpy.strings.istitle returns a vector where each element indicates
6262
whether the corresponding string element is titlecased.
6363

NumpySpec/strings/isupper.lean

Lines changed: 4 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -22,26 +22,16 @@ def isupper {n : Nat} (a : Vector String n) : Id (Vector Bool n) :=
2222
in the string are uppercase and there is at least one character, false otherwise.
2323
Mathematical properties:
2424
1. Empty strings return false
25-
2. Strings with no cased characters return false
25+
2. Strings with no cased characters return false
2626
3. Strings with mixed case return false
27-
4. Strings with all cased characters uppercase return true
28-
5. Preserves vector length -/
27+
4. Strings with all cased characters uppercase return true -/
2928
theorem isupper_spec {n : Nat} (a : Vector String n) :
3029
⦃⌜True⌝⦄
3130
isupper a
32-
⦃⇓result => ⌜-- Core property: each element is true iff all cased chars are uppercase and at least one char exists
33-
(∀ i : Fin n,
31+
⦃⇓result => ⌜∀ i : Fin n,
3432
let s := a.get i
3533
let chars := s.toList
3634
result.get i = (chars.length > 0
3735
(∃ c ∈ chars, c.isAlpha) ∧
38-
(∀ c ∈ chars, c.isAlpha → c.isUpper))) ∧
39-
-- Empty strings return false
40-
(∀ i : Fin n, a.get i = "" → result.get i = false) ∧
41-
-- Strings with no cased characters return false
42-
(∀ i : Fin n, (∀ c ∈ (a.get i).toList, ¬c.isAlpha) → result.get i = false) ∧
43-
-- All uppercase strings with at least one cased char return true
44-
(∀ i : Fin n, (∃ c ∈ (a.get i).toList, c.isAlpha) ∧
45-
(∀ c ∈ (a.get i).toList, c.isAlpha → c.isUpper) →
46-
result.get i = true)⌝⦄ := by
36+
(∀ c ∈ chars, c.isAlpha → c.isUpper))⌝⦄ := by
4737
sorry

NumpySpec/strings/join.lean

Lines changed: 20 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -52,23 +52,40 @@ def join {n : Nat} (sep seq : Vector String n) : Id (Vector String n) :=
5252
3. Empty string handling: join(sep, '') = '' for any separator
5353
4. Single character handling: join(sep, 'c') = 'c' (no separator added)
5454
5. Multiple character handling: join('-', 'abc') = 'a-b-c'
55+
6. Length property: For non-empty strings with length > 1, the result length is
56+
original_length + (original_length - 1) * separator_length
57+
7. Preservation of empty inputs: Empty strings remain empty regardless of separator
58+
8. Character order preservation: Characters appear in the same order as in input
5559
5660
Sanity checks:
5761
- Result vector has same length as input vectors
5862
- Empty sequences produce empty results
5963
- Single character sequences produce the original character
6064
- Multiple character sequences are properly separated
65+
- Each result character is either from the original string or the separator
66+
- No characters are lost or duplicated (except separators)
6167
6268
Precondition: True (no special preconditions for string joining)
6369
Postcondition: Each result element is the join of characters from the corresponding
64-
sequence element using the corresponding separator
70+
sequence element using the corresponding separator, with proper
71+
length and character ordering properties
6572
-/
6673
theorem join_spec {n : Nat} (sep seq : Vector String n) :
6774
⦃⌜True⌝⦄
6875
join sep seq
6976
⦃⇓result => ⌜∀ i : Fin n,
7077
let s := seq.get i
7178
let separator := sep.get i
72-
result.get i = if s.length ≤ 1 then s
73-
else String.intercalate separator (s.toList.map String.singleton)⌝⦄ := by
79+
let expected_result := if s.length ≤ 1 then s
80+
else String.intercalate separator (s.toList.map String.singleton)
81+
-- Core correctness property
82+
result.get i = expected_result ∧
83+
-- Length property for non-trivial cases
84+
(s.length > 1 → (result.get i).length = s.length + (s.length - 1) * separator.length) ∧
85+
-- Empty string preservation
86+
(s.length = 0 → result.get i = "") ∧
87+
-- Single character preservation
88+
(s.length = 1 → result.get i = s) ∧
89+
-- Non-empty result for non-empty input
90+
(s.length > 0 → (result.get i).length > 0)⌝⦄ := by
7491
sorry

NumpySpec/strings/less.lean

Lines changed: 27 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,11 @@ def less {n : Nat} (x1 x2 : Vector String n) : Id (Vector Bool n) :=
2828

2929
/-- Specification: numpy.strings.less returns element-wise lexicographic comparison.
3030
31+
This function performs element-wise lexicographic comparison between two vectors
32+
of strings, returning a boolean vector where each element indicates whether
33+
the corresponding element in x1 is lexicographically less than the corresponding
34+
element in x2.
35+
3136
Precondition: True (no special preconditions for string comparison)
3237
Postcondition: For all indices i, result[i] = (x1[i] < x2[i])
3338
@@ -38,6 +43,12 @@ def less {n : Nat} (x1 x2 : Vector String n) : Id (Vector Bool n) :=
3843
- Trichotomous: for any two strings s1 and s2, exactly one of s1 < s2, s1 = s2, or s1 > s2 holds
3944
- Decidable: String comparison is decidable for all strings
4045
- Type-safe: Result vector has same length as input vectors
46+
47+
String Comparison Properties:
48+
- Empty string is less than any non-empty string
49+
- Lexicographic ordering follows dictionary order (case-sensitive)
50+
- Comparison is based on Unicode code point values
51+
- Preserves strict ordering properties of the underlying string type
4152
-/
4253
theorem less_spec {n : Nat} (x1 x2 : Vector String n) :
4354
⦃⌜True⌝⦄
@@ -48,8 +59,22 @@ theorem less_spec {n : Nat} (x1 x2 : Vector String n) :
4859
(∀ i : Fin n, result.get i = true → ¬(x2.get i < x1.get i)) ∧
4960
-- Irreflexivity: no string is less than itself
5061
(∀ i : Fin n, x1.get i = x2.get i → result.get i = false) ∧
51-
-- Transitivity property (partial): if x1[i] < x2[i] and we have x3, then x1[i] < x3[i] when x2[i] < x3[i]
62+
-- Transitivity property: if x1[i] < x2[i] and we have a third string x3[i], transitivity holds
5263
(∀ i : Fin n, result.get i = true → ∀ s : String, x2.get i < s → x1.get i < s) ∧
5364
-- Decidability: result is always boolean (true or false)
54-
(∀ i : Fin n, result.get i = true ∨ result.get i = false)⌝⦄ := by
65+
(∀ i : Fin n, result.get i = true ∨ result.get i = false) ∧
66+
-- Empty string property: empty string is less than any non-empty string
67+
(∀ i : Fin n, x1.get i = "" → x2.get i ≠ "" → result.get i = true) ∧
68+
-- Non-empty string property: non-empty string is not less than empty string
69+
(∀ i : Fin n, x1.get i ≠ "" → x2.get i = "" → result.get i = false) ∧
70+
-- Length invariant: result has same length as input vectors
71+
(result.toList.length = n) ∧
72+
-- Consistency with String's built-in less-than operator
73+
(∀ i : Fin n, result.get i = true ↔ x1.get i < x2.get i) ∧
74+
-- Prefix property: if s1 is a proper prefix of s2, then s1 < s2
75+
(∀ i : Fin n, (x1.get i).isPrefixOf (x2.get i) → x1.get i ≠ x2.get i → result.get i = true) ∧
76+
-- Strict ordering: if result[i] is true, then x1[i] and x2[i] are different
77+
(∀ i : Fin n, result.get i = true → x1.get i ≠ x2.get i) ∧
78+
-- Totality of comparison: for any two strings, exactly one of <, =, > holds
79+
(∀ i : Fin n, result.get i = true ∨ x1.get i = x2.get i ∨ x2.get i < x1.get i)⌝⦄ := by
5580
sorry

NumpySpec/strings/less_equal.lean

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,7 @@ def less_equal {n : Nat} (x1 x2 : Vector String n) : Id (Vector Bool n) :=
3939
- Consistency with equality: if x = y, then less_equal x y = True
4040
- Decidable: String comparison is decidable for all strings
4141
- Type-safe: Result vector has same length as input vectors
42+
- Lexicographic ordering: String comparison follows lexicographic ordering
4243
-/
4344
theorem less_equal_spec {n : Nat} (x1 x2 : Vector String n) :
4445
⦃⌜True⌝⦄
@@ -53,6 +54,8 @@ theorem less_equal_spec {n : Nat} (x1 x2 : Vector String n) :
5354
(∀ i : Fin n, x1.get i = x2.get i → result.get i = true) ∧
5455
-- Antisymmetry: if x1[i] <= x2[i] and x2[i] <= x1[i], then x1[i] = x2[i]
5556
(∀ i : Fin n, (x1.get i <= x2.get i) ∧ (x2.get i <= x1.get i) → x1.get i = x2.get i) ∧
57+
-- Transitivity preservation: consistent with transitive nature of string ordering
58+
(∀ i : Fin n, ∀ z : String, x1.get i <= z ∧ z <= x2.get i → x1.get i <= x2.get i) ∧
5659
-- Decidability: result is always boolean (true or false)
5760
(∀ i : Fin n, result.get i = true ∨ result.get i = false)⌝⦄ := by
5861
sorry

NumpySpec/strings/ljust.lean

Lines changed: 21 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -23,9 +23,12 @@ def ljust {n : Nat} (a : Vector String n) (width : Nat) (fillchar : String) : Id
2323
/-- Specification: ljust returns a vector where each string is left-justified
2424
to the specified width using the given fill character.
2525
26-
Precondition: The fillchar must be exactly one character long
27-
Postcondition: Each result string either remains unchanged (if already >= width)
28-
or is left-justified with padding to reach the target width
26+
Mathematical Properties:
27+
- Length preservation: Result length is max(original_length, width)
28+
- Identity: Strings already >= width remain unchanged
29+
- Left-justification: Original content preserved as prefix, padding on right
30+
- Minimality: No unnecessary padding beyond required width
31+
- Fillchar constraint: Padding uses specified fill character
2932
-/
3033
theorem ljust_spec {n : Nat} (a : Vector String n) (width : Nat) (fillchar : String)
3134
(h_fillchar : fillchar.length = 1) :
@@ -34,13 +37,22 @@ theorem ljust_spec {n : Nat} (a : Vector String n) (width : Nat) (fillchar : Str
3437
⦃⇓result => ⌜∀ i : Fin n,
3538
let orig := a.get i
3639
let res := result.get i
37-
-- Case 1: Original string is already >= width, no padding needed
40+
-- Core mathematical properties of left-justification
41+
-- 1. Length invariant: result length is exactly max(orig.length, width)
42+
res.length = max orig.length width ∧
43+
-- 2. Identity morphism: strings already >= width are unchanged (f(x) = x when |x| >= w)
3844
(orig.length ≥ width → res = orig) ∧
39-
-- Case 2: Original string is < width, padding is added
45+
-- 3. Padding morphism: strings < width are extended (f(x) = x ++ p when |x| < w)
4046
(orig.length < width →
4147
res.length = width ∧
42-
res.startsWith orig ∧
43-
(∃ padding : String, res = orig ++ padding ∧ padding.length = width - orig.length)) ∧
44-
-- Sanity check: Result length is always max(orig.length, width)
45-
res.length = max orig.length width⌝⦄ := by
48+
(∃ padding : String, res = orig ++ padding ∧
49+
padding.length = width - orig.length) ∧
50+
-- Left-justification property: original is preserved as prefix
51+
res.startsWith orig) ∧
52+
-- 4. Minimality constraint: no over-padding (efficient operation)
53+
(orig.length ≥ width → res.length = orig.length) ∧
54+
-- 5. Exactness constraint: padding achieves exact width requirement
55+
(orig.length < width → res.length = width) ∧
56+
-- 6. Consistency constraint: all operations preserve the vector structure
57+
(orig.length = 0 → res.length = width)⌝⦄ := by
4658
sorry

NumpySpec/strings/lower.lean

Lines changed: 14 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -46,6 +46,7 @@ def lower {n : Nat} (a : Vector String n) : Id (Vector String n) :=
4646
3. Case transformation: Uppercase letters become lowercase, others unchanged
4747
4. Idempotent property: Applying lower twice gives the same result as applying it once
4848
5. Empty string handling: Empty strings remain empty
49+
6. Character-level correctness: Each character is correctly transformed
4950
5051
Precondition: True (no special preconditions for lowercase conversion)
5152
Postcondition: For all indices i, result[i] is the lowercase version of a[i]
@@ -56,17 +57,27 @@ theorem lower_spec {n : Nat} (a : Vector String n) :
5657
⦃⇓r => ⌜∀ i : Fin n,
5758
let original := a.get i
5859
let result := r.get i
60+
-- Fundamental correctness: result matches Lean's built-in toLower
61+
(result = original.toLower) ∧
5962
-- Length preservation: result has same length as original
6063
(result.length = original.length) ∧
6164
-- Empty string case: empty input produces empty output
6265
(original.length = 0 → result = "") ∧
63-
-- Case transformation: all uppercase letters become lowercase
66+
-- Character-level transformation: each character is correctly converted
6467
(∀ j : Nat, j < original.length →
6568
∃ origChar : Char,
6669
original.get? ⟨j⟩ = some origChar ∧
6770
result.get? ⟨j⟩ = some origChar.toLower) ∧
6871
-- Idempotent property: applying lower twice gives same result as once
6972
(result.toLower = result) ∧
70-
-- Sanity check: the result should match Lean's built-in toLower
71-
(result = original.toLower)⌝⦄ := by
73+
-- Case preservation: non-alphabetic characters remain unchanged
74+
(∀ j : Nat, j < original.length →
75+
∃ origChar : Char,
76+
original.get? ⟨j⟩ = some origChar ∧
77+
(¬origChar.isAlpha → result.get? ⟨j⟩ = some origChar)) ∧
78+
-- Alphabetic transformation: uppercase letters become lowercase
79+
(∀ j : Nat, j < original.length →
80+
∃ origChar : Char,
81+
original.get? ⟨j⟩ = some origChar ∧
82+
(origChar.isUpper → result.get? ⟨j⟩ = some origChar.toLower))⌝⦄ := by
7283
sorry

0 commit comments

Comments
 (0)