Examples from HumanEval translated to Nagini. Nagini is a verification framework built on top of the Python language. Its purpose is to prove functional properties of programs.
We added such properties to the translated programs in Bench. In addition, the different parts of each program (proof annotations and executable code) are marked there, so that one can, for example, remove all invariants from the code and evaluate one's invariant-generation algorithm on this benchmark. We distinguish the following parts of the code: invariants, function preconditions, function postconditions, function implementations, and pure functions.
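As a rough illustration of how the marked parts can be removed, here is a minimal sketch. It assumes that each part is delimited by comment lines of the form `# <kind>-start` and `# <kind>-end` (for example `# invariants-start`). That marker syntax, the helper `strip_section`, and the sample program in `SAMPLE` are assumptions made for this example; check the files in Bench for the exact markers used there.

```python
# Sketch only: the "# <kind>-start" / "# <kind>-end" marker syntax is an
# assumption made for illustration, not necessarily the one used in Bench.

# A hypothetical Nagini-style program, kept as text so that this script
# runs without Nagini installed.
SAMPLE = '''\
from nagini_contracts.contracts import *

def sum_to_n(n: int) -> int:
    # pre-conditions-start
    Requires(n >= 0)
    # pre-conditions-end
    # post-conditions-start
    Ensures(Result() == n * (n + 1) // 2)
    # post-conditions-end
    # impl-start
    total = 0
    i = 0
    while i < n:
        # invariants-start
        Invariant(0 <= i and i <= n)
        Invariant(total == i * (i + 1) // 2)
        # invariants-end
        i += 1
        total += i
    return total
    # impl-end
'''


def strip_section(source: str, kind: str) -> str:
    """Return `source` without the lines between the markers of `kind`.

    The marker lines themselves are removed as well.
    """
    start = f"# {kind}-start"
    end = f"# {kind}-end"
    kept = []
    skipping = False
    for line in source.splitlines():
        marker = line.strip()
        if marker == start:
            skipping = True
            continue
        if marker == end:
            skipping = False
            continue
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Print the sample program with all loop invariants removed.
    print(strip_section(SAMPLE, "invariants"))
```

An invariant-generation tool could then be asked to fill the gap, and the result passed back to the verifier.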
In src we keep a copy of the Nagini verifier's code so that we can use it in our CI infrastructure. The CI checks that every program in Bench verifies within a time limit of 600 seconds. Currently, this check is failing because Nagini's results are inconsistent between runs and the same task sometimes needs more time to verify.
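The time-limited check can be sketched as follows. This is not the CI script from this repository; it only shows the idea of running a verifier command under a 600-second limit. The function name `run_with_time_limit`, the status strings, and the assumption that the verifier exits with code 0 on success are all hypothetical.

```python
import subprocess
import sys
from typing import List

TIME_LIMIT_SECONDS = 600  # the per-program limit mentioned above


def run_with_time_limit(command: List[str],
                        time_limit: float = TIME_LIMIT_SECONDS) -> str:
    """Run `command` and classify the outcome.

    Returns "ok" if the process exits with code 0 (assumed here to mean
    that verification succeeded), "failed" for any other exit code, and
    "timeout" if the process does not finish within `time_limit` seconds.
    """
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=time_limit,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return "timeout"
    return "ok" if completed.returncode == 0 else "failed"


if __name__ == "__main__":
    # Stand-in command so that the sketch runs without Nagini installed.
    # In CI, the command would invoke the verifier on one file from Bench.
    print(run_with_time_limit([sys.executable, "-c", "pass"]))
```

Because the verification time of a task varies between runs, a single "timeout" result from such a check does not necessarily mean that the program is incorrect.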
In WIP you can find examples from HumanEval that are in the process of being translated.
There is also a similar benchmark of HumanEval translated to Dafny.
Current status:
- 0. has_close_elements
- 1. separate_paren_groups
- 2. truncate
- 3. below_zero
- 4. mean_absolute_deviation
- 5. intersperse
- 6. parse_nested_parens
- 7. filter_by_substring
- 8. sum_product
- 9. rolling_max
- 10. is_palindrome
- 11. string_xor
- 12. longest
- 13. greatest_common_divisor
- 14. all_prefixes
- 15. string_sequence
- 16. count_distinct_characters
- 17. parse_music
- 18. how_many_times
- 19. sort_numbers
- 20. find_closest_elements
- 21. rescale_to_unit
- 22. filter_integers
- 23. strlen
- 24. largest_divisor
- 25. factorize
- 26. remove_duplicates
- 27. flip_case
- 28. concatenate
- 29. filter_by_prefix
- 30. get_positive
- 31. is_prime
- 32. poly
- 33. sort_third
- 34. unique
- 35. max_element
- 36. fizz_buzz
- 37. sort_even
- 38. encode_cyclic
- 39. prime_fib
- 40. triples_sum_to_zero
- 41. car_race_collision
- 42. incr_list
- 43. pairs_sum_to_zero
- 44. change_base
- 45. triangle_area
- 46. fib4
- 47. median
- 48. is_palindrome
- 49. modp
- 50. encode_shift
- 51. remove_vowels
- 52. below_threshold
- 53. add
- 54. same_chars
- 55. fib
- 56. correct_bracketing
- 57. monotonic
- 58. common
- 59. largest_prime_factor
- 60. sum_to_n
- 61. correct_bracketing
- 62. derivative
- 63. fibfib
- 64. vowels_count
- 65. circular_shift
- 66. digitSum
- 67. fruit_distribution
- 68. pluck
- 69. search
- 70. strange_sort_list
- 71. triangle_area
- 72. will_it_fly
- 73. smallest_change
- 74. total_match
- 75. is_multiply_prime
- 76. is_simple_power
- 77. iscube
- 78. hex_key
- 79. decimal_to_binary
- 80. is_happy
- 81. numerical_letter_grade
- 82. prime_length
- 83. starts_one_ends
- 84. solve
- 85. add
- 86. anti_shuffle
- 87. get_row
- 88. sort_array
- 89. encrypt
- 90. next_smallest
- 91. is_bored
- 92. any_int
- 93. encode
- 94. skjkasdkd
- 95. check_dict_case
- 96. count_up_to
- 97. multiply
- 98. count_upper
- 99. closest_integer
- 100. make_a_pile
- 101. words_string
- 102. choose_num
- 103. rounded_avg
- 104. unique_digits
- 105. by_length
- 106. f
- 107. even_odd_palindrome
- 108. count_nums
- 109. move_one_ball
- 110. exchange
- 111. histogram
- 112. reverse_delete
- 113. odd_count
- 114. minSubArraySum
- 115. max_fill
- 116. sort_array
- 117. select_words
- 118. get_closest_vowel
- 119. match_parens
- 120. maximum
- 121. solution
- 122. add_elements
- 123. get_odd_collatz
- 124. valid_date
- 125. split_words
- 126. is_sorted
- 127. intersection
- 128. prod_signs
- 129. minPath
- 130. tri
- 131. digits
- 132. is_nested
- 133. sum_squares
- 134. check_if_last_char_is_a_letter
- 135. can_arrange
- 136. largest_smallest_integers
- 137. compare_one
- 138. is_equal_to_sum_even
- 139. special_factorial
- 140. fix_spaces
- 141. file_name_check
- 142. sum_squares
- 143. words_in_sentence
- 144. simplify
- 145. order_by_points
- 146. specialFilter
- 147. get_max_triples
- 148. bf
- 149. sorted_list_sum
- 150. x_or_y
- 151. double_the_difference
- 152. compare
- 153. Strongest_Extension
- 154. cycpattern_check
- 155. even_odd_count
- 156. int_to_mini_roman
- 157. right_angle_triangle
- 158. find_max
- 159. eat
- 160. do_algebra
- 161. solve
- 162. string_to_md5
- 163. generate_integers