Skip to content

hammer with given lemmas#212

Open
Rw1nd wants to merge 4 commits into
lukaszcz:rocq-9.1from
Rw1nd:given_lemmas
Open

hammer with given lemmas#212
Rw1nd wants to merge 4 commits into
lukaszcz:rocq-9.1from
Rw1nd:given_lemmas

Conversation

@Rw1nd
Copy link
Copy Markdown

@Rw1nd Rw1nd commented Feb 9, 2026

Hello, CoqHammer is an excellent automated theorem proving tool that I have been using on Rocq. However, I notice that the current implementation only selects lemmas through clustering algorithms (by using predict command). This modification extends hammer to support user-specified lemmas.

I add support for an optional syntax to provide specific lemmas to the hammer tactic:

From Hammer Require Import Hammer.
From Stdlib Require Import Lists.List.
Import ListNotations.

Theorem demo: forall (A: Type) (l1 l2 : list A), 
  rev (l1 ++ l2) = rev l2 ++ rev l1.
Proof.
  intros.
  induction l1.
  - hammer [app_nil_r app_assoc_reverse_deprecated].
  - hammer [app_nil_r app_assoc_reverse_deprecated].
Qed.

The new syntax, hammer [lemma1 lemma2 ...]., allows users to explicitly specify which lemmas to use. I think this makes hammer more flexible.


Summary by cubic

Add optional lemma selection to hammer so users can guide premise choice. Default hammer behavior stays the same.

  • New Features
    • New syntax: hammer [lemma1 ...].
    • Resolves names from the env; errors if a lemma is missing.
    • Extracts deps for the goal and given lemmas, then runs ATPs (incl. greedy sequence). Exposed as Features.choose_given_lemmas.
    • Hyps are always passed; minimization still applies. Plain hammer uses prediction.

Written for commit 03f915e. Summary will update on new commits.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 4 files

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/plugin/features.mli">

<violation number="1" location="src/plugin/features.mli:14">
P3: The extract_given_lemmas return comment says it is a temporary file name, but the function actually returns an hhdef list of predicted dependencies. The interface contract is misleading.</violation>
</file>

<file name="src/plugin/features.ml">

<violation number="1" location="src/plugin/features.ml:230">
P2: extract_given_lemmas can throw a misleading "Predictor did not return advice." error when ths_deps is empty because it reads from an empty temp file. This is likely for empty user-provided lemmas and should return an empty selection instead of crashing.</violation>
</file>

<file name="src/plugin/hammer_main.ml">

<violation number="1" location="src/plugin/hammer_main.ml:263">
P2: Catch-all exception handler in get_argus will swallow Sys.Break interrupts (Ctrl+C), preventing users from interrupting a hanging tactic; re-raise Sys.Break or catch specific exceptions.</violation>

<violation number="2" location="src/plugin/hammer_main.ml:734">
P2: do_extraction duplicates the bulk of do_predict’s parallel execution, cleanup, and minimization logic. This makes maintenance error-prone because changes to prover orchestration must be kept in sync across two functions.</violation>
</file>

Since this is your first cubic review, here's how it works:

  • cubic automatically reviews your code and comments on bugs and improvements
  • Teach cubic by replying to its comments. cubic learns from your replies and gets better over time
  • Ask questions if you need clarification on any suggestion

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread src/plugin/features.ml Outdated
Comment thread src/plugin/hammer_main.ml Outdated
Comment thread src/plugin/hammer_main.ml Outdated
Comment thread src/plugin/features.mli Outdated
Comment thread src/plugin/g_hammer.mlg Outdated
| [ "hammer" "[" constr_list(l) "]" ] -> { hammer_tac l }
END


Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid whitespace-only changes

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove the whitespace-only changes

Comment thread src/plugin/features.ml Outdated
deps



Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid whitespace-only changes

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove the whitespace-only changes

Comment thread src/plugin/features.ml Outdated
(Str.split (Str.regexp " ")
(try input_line ic with End_of_file ->
close_in ic; Sys.remove oname;
raise (HammerError "Predictor did not return advice.")))
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message is misleading because there is no prediction going on here (we supply the lemmas, not predict them)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the code is redundant. I will remove it.

@lukaszcz
Copy link
Copy Markdown
Owner

lukaszcz commented Feb 9, 2026

Thank you for the contribution, but I don't quite see the point. If we know which lemmas we want to use, we can just use the best/sauto tactic with these lemmas. The hammer tactic first runs best (essentially), and if that fails it runs ATPs to select relevant lemmas which it then uses with best/sauto (essentially) to prove the theorem inside Rocq. If the user provides the lemmas they want to use up-front, there is not much point in passing them over to the ATPs, so one could as well use best. The only use-case seems to be if you want to minimize the number of dependencies from a small user-provided set, but then one could just write a function to run best for all possible subsets if the number of supplied lemmas is small.

@Rw1nd
Copy link
Copy Markdown
Author

Rw1nd commented Feb 10, 2026

Hello, my idea is to make hammer easier to integrate with other automated tools. For example, if a tool implements a new lemma retrieval algorithm, this syntax would allow integration with hammer without requiring modifications to hammer's internal algorithms. This new syntax could serve as a general-purpose interface between Rocq and ATPs and is better support for non-interactive, programmatic usage.

@lukaszcz
Copy link
Copy Markdown
Owner

lukaszcz commented Feb 10, 2026

Hello, my idea is to make hammer easier to integrate with other automated tools. For example, if a tool implements a new lemma retrieval algorithm, this syntax would allow integration with hammer without requiring modifications to hammer's internal algorithms. This new syntax could serve as a general-purpose interface between Rocq and ATPs and is better support for non-interactive, programmatic usage.

Yes, this makes sense if by lemma retrieval you mean quickly preselecting a large number of possibly relevant lemmas. The whole ATP backend of Hammer is nothing more than a lemma retrieval algorithm itself. So by providing lemmas to the hammer tactic all you're doing is feeding them into another (possibly more precise) lemma retrieval tool.

The only thing that ATPs do in a hammer is relevant lemma selection. The proof found by the ATPs is discarded.

If you don't need further relevant lemma selection, you should feed your retrieved lemmas to best which is the actual in-Rocq prover that hammer uses after selecting relevant lemmas.

If you have a tool that pre-selects a large number of lemmas (essentially a replacement for hammer's predict), then it may make sense to provide the lemmas to hammer. A preferred way to do this would be to replace the predict program, but this indeed may not always be easy or feasible.

If your lemma retrieval tool is precise enough to select e.g. around 10 relevant lemmas, then there is little point in feeding them further to the ATPs - you should feed them to best instead. If your lemma retrieval tool returns e.g. around 100 possibly relevant lemmas, then it should be implemented as a replacement for predict whenever possible - the in-Rocq interface from this PR makes sense only when this is too hard.

I think it makes sense to include this PR, but the use-cases seem to be rare. Also, the issues with the PR code need to be addressed.

Comment thread src/plugin/g_hammer.mlg Outdated

TACTIC EXTEND Hammer_tac
| [ "hammer" ] -> { hammer_tac () }
| [ "hammer" ] -> { hammer_tac [] }
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use empty list to indicate no extraction. Create a datatype with constructors for extraction and prediction.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the "lemma extraction" should be called something else to avoid confusion with feature extraction. Perhaps "lemma choice"?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, "lemma choice" is better.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I create a datatype hammer_mode for prediction and choice:

type hammer_mode = 
  | Prediction 
  | Choice of EConstr.t list

Co-authored-by: cubic-dev-ai[bot] <191113872+cubic-dev-ai[bot]@users.noreply.github.com>
Comment thread src/plugin/features.ml Outdated
let goal_deps = List.filter (fun a -> Hhlib.StringSet.mem a names) goal_deps in
let ths_deps = List.sort_uniq Stdlib.compare (goal_deps @ (List.concat (List.map write_def lems))) in

let oname = Filename.temp_file ("coqhammer_out" ^ atpname) "" in
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why we're writing to files here at all. We should just convert the lemmas to hhdefs without going through the type system. The only reason extract creates files is to interface with the separate predict program.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intended to reuse part of the previous code. Indeed, writing to a file is unnecessary here. I will remove this redundant code.

Comment thread src/plugin/features.ml Outdated
if !Opt.debug_mode then
Msg.info ("After filtering: " ^ string_of_int (List.length defss) ^ " Coq objects.");
let names = Hhlib.strset_from_lst (List.map get_hhdef_name defss) in
let write_def def =
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write_def is a confusing name for a function that doesn't do any IO

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, maybe choose_def is better.

Comment thread src/plugin/features.ml Outdated
fname

let extract_given_lemmas (hyps : hhdef list) (defs : hhdef list) (lems : hhdef list) (goal : hhdef) (atpname : string) =
Msg.info "Extracting definitions...";
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"extract" and "extracting" is poor naming, because it suggest connection to feature extraction and we're not doing that here. Perhaps "choice" and "choosing"?. Maybe this function should not be in the features.ml file, but a separate one?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, choose_given_lemmas might be a better name. I think we don't need a separate file, as this functionality is quite similar to the extract function in features.ml. Although we don't need to use features for prediction, the functions used are very similar. Having a separate file just for a single function seems a bit redundant.

@Rw1nd
Copy link
Copy Markdown
Author

Rw1nd commented Feb 11, 2026

Thank you for your suggestion. I will address the issues in the code.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/plugin/features.ml">

<violation number="1" location="src/plugin/features.ml:200">
P2: Empty lemma lists are now treated as an error, but review feedback requires graceful handling of empty lists. This rejects valid inputs like `hammer []` and violates the stated requirement.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread src/plugin/features.ml
Comment on lines +200 to +201
if lems = [] then
raise (HammerError ("No lemmas given for choice."));
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Empty lemma lists are now treated as an error, but review feedback requires graceful handling of empty lists. This rejects valid inputs like hammer [] and violates the stated requirement.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/plugin/features.ml, line 200:

<comment>Empty lemma lists are now treated as an error, but review feedback requires graceful handling of empty lists. This rejects valid inputs like `hammer []` and violates the stated requirement.</comment>

<file context>
@@ -196,46 +194,27 @@ let extract (hyps : hhdef list) (defs : hhdef list) (goal : hhdef) : string =
+let choose_given_lemmas (hyps : hhdef list) (defs : hhdef list) (lems : hhdef list) (goal : hhdef) (atpname : string) =
+  Msg.info "Choosing definitions...";
   let defss = List.filter is_nontrivial defs in
+  if lems = [] then
+     raise (HammerError ("No lemmas given for choice."));
   if !Opt.debug_mode then
</file context>
Suggested change
if lems = [] then
raise (HammerError ("No lemmas given for choice."));
if lems = [] then
Msg.info "No lemmas given for choice.";
Fix with Cubic

@lukaszcz lukaszcz self-requested a review February 17, 2026 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants