diff --git a/rules/rules26.tex b/rules/rules26.tex new file mode 100644 index 0000000..141eb1c --- /dev/null +++ b/rules/rules26.tex @@ -0,0 +1,1549 @@ +\documentclass[12pt]{article} + +\usepackage{color} +\usepackage{times} +\usepackage{fullpage} +\usepackage{hyperref} +\usepackage{xspace} +\usepackage{enumitem} +\usepackage{amsmath} + +%\usepackage{draftwatermark} +%\usepackage{showframe} % debug + +\hyphenation{data-types} + +\newcommand{\akey}[1]{\textbf{#1}\xspace} +\newcommand{\bkey}[1]{\textbf{{#1}}\xspace} + +\newcommand{\rem}[1]{\textcolor{red}{[#1]}} +\newcommand{\todo}[1]{\rem{TODO #1}} +\newcommand{\dd}[1]{\rem{#1 -- david}} +\newcommand{\dw}[1]{\rem{#1 -- dominik}} +\newcommand{\fb}[1]{\rem{#1 -- fran\c{c}ois}} +\newcommand{\mj}[1]{\rem{#1 -- martin}} + +\newcommand{\maintrack}{Single Query Track\xspace} +\newcommand{\inctrack}{Incremental Track\xspace} +\newcommand{\ucoretrack}{Unsat-Core Track\xspace} +\newcommand{\mvaltrack}{Model-Validation Track\xspace} +\newcommand{\challtrack}{Industry-Challenge Track\xspace} +\newcommand{\paralleltrack}{Parallel Track\xspace} + +\newcommand{\executionservice}{execution service\xspace} + +\newcommand{\rationale}[1]{\hskip .5em{\textit{Rationale:} #1}\xspace} + +\let\OLDthebibliography\thebibliography +\renewcommand\thebibliography[1]{ + \OLDthebibliography{#1} + \setlength{\parskip}{2pt} + \setlength{\itemsep}{1.5pt plus 0.3ex} +} + +\urlstyle{same} +% add chars ~,-,*,'," to break line at for long URLs +\makeatletter +\g@addto@macro{\UrlBreaks}{\UrlOrds} +\makeatother + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\begin{document} + +\date{\small This version revised \the\year-\the\month-\the\day} + +\title{21st International Satisfiability Modulo Theories Competition + (SMT-COMP 2026): Rules and Procedures} + +% [morgan] do our own layout of authors; the four-author layout spacing +% was screwed up... 
+\def\doauthor#1{{%
+  \hsize.5\hsize \advance\hsize by-1cm %
+  \def\\{\hss\egroup\hbox to\hsize\bgroup\strut\hss}%
+  \vbox{\hbox to\hsize\bgroup\strut\hss#1\hss\egroup}}}%
+
+\def\header#1{\medskip\noindent\textbf{#1}}
+
+\author{%
+Dominik Winterer\\
+University of Manchester\\
+United Kingdom\\
+{\small\href{mailto:dominik.winterer@manchester.ac.uk}{\texttt{dominik.winterer@manchester.ac.uk}}}
+\and
+Martin Jon\'{a}\v{s}\\
+Masaryk University\\
+Czechia\\
+{\small\href{mailto:martin.jonas@mail.muni.cz}{\texttt{martin.jonas@mail.muni.cz}}}
+\and
+Tomáš Kolárik\\
+Università della Svizzera italiana\\
+Switzerland\\
+{\small\href{mailto:tomas.kolarik@usi.ch}{\texttt{tomas.kolarik@usi.ch}}}
+}
+
+\maketitle
+
+\noindent Comments on this document should be sent to the SMT-COMP
+mailing list or the Zulip channel (see below) or, if necessary, directly to the organizers.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\section{Communication}
+
+Interested parties should subscribe to the SMT-COMP mailing list.
+Important late-breaking news and any necessary clarifications and
+edits to these rules will be announced there; it is the primary
+channel for such announcements.
+
+\begin{itemize}
+\item SMT-COMP mailing list:
+  \href{mailto:smt-comp@googlegroups.com}{\textrm{smt-comp@googlegroups.com}}
+\item Sign-up site for the mailing list:
+  \url{https://groups.google.com/g/smt-comp}
+\end{itemize}
+
+\noindent Additional material will be made available at the
+competition web site, \url{http://www.smtcomp.org}.
+
+\bigskip
+\noindent The main communication channel for the SMT-COMP community is:
+\begin{itemize}
+\item the \texttt{\#smtcomp} channel on the SMT-LIB Zulip server:
+\url{https://smtlib.zulipchat.com/}
+\end{itemize}
+This channel should be used for public discussions among the participants and
+organizers about topics such as the rules, competition-related issues,
+suggestions for improvements, inquiries about the status of the competition,
+or other topics that could be of interest to the whole SMT-COMP community.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\newpage
+
+\section{Important Dates}
+\label{sec:important}
+
+
+\begin{description}
+  \item[May~1] Final versions of competition tools (used by the organizers to run the participating solvers)
+  \item[May~27] Deadline for first versions of participating solvers (for all tracks), including preliminary system descriptions
+  \item[June~10] Deadline for final versions of participating solvers, including final system descriptions
+  \item[July~24--25] SMT Workshop (presentation of results)
+\end{description}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\section{Introduction}
+
+The annual Satisfiability Modulo Theories Competition~(SMT-COMP) is
+held to spur advances in SMT solver implementations on benchmark
+formulas of practical interest. Public competitions are a well-known
+means of stimulating advancement in software tools. For example, in
+automated reasoning, the CASC and SAT competitions for first-order and
+propositional reasoning tools, respectively, have spurred significant
+innovation in their fields~\cite{leberre+03,PSS02}.
+%
+Accordingly, researchers are highly encouraged to submit both new
+benchmarks and new or improved solvers to raise the level of
+competition and advance the state of the art in automated SMT problem
+solving.
More information
+on the history and motivation for SMT-COMP can be found at the
+competition web site, \url{http://www.smtcomp.org}, and in reports on
+previous
+competitions~\cite{SMTCOMP-JAR,SMTCOMP-FMSD,BDOS08,SMTCOMP-2008,CDW14,SMTCOMP-2012,CSW15}.
+
+SMT-COMP~2026 is part of the SMT Workshop~2026
+(\url{http://smt-workshop.cs.uiowa.edu/2026/}),
+which is affiliated with IJCAR~2026 (\url{https://www.floc26.org/ijcar}).
+The SMT Workshop will include a block of time to present the results of the
+competition.
+
+SMT-COMP 2026 will have the following tracks:
+\begin{enumerate}
+  \item the \maintrack (before 2019: Main Track),
+  \item the \inctrack (before 2019: Application Track),
+  \item the \ucoretrack,
+  \item the \mvaltrack,
+  \item the \paralleltrack{}.
+\end{enumerate}
+
+%
+Within each track there are multiple divisions, where each division
+uses benchmarks from a specific group of SMT-LIB logics.
+%
+We will recognize winners in all tracks.
+%
+The winners will be determined by the number of benchmarks solved (taking into account
+the weighting detailed in Section~\ref{sec:scoring}); we will also recognize
+solvers based on additional criteria.
+
+The rest of this document, revised from the previous
+version,\footnote{Earlier versions of this document include
+  contributions from Clark Barrett, Martin Bromberger, Roberto Bruttomesso, David Cok,
+  Sylvain Conchon, David D{\'e}harbe, Morgan Deters, Alberto Griggio,
+  Liana Hadarean,
+  Matthias Heizmann, Antti Hyvarinen, Jochen Hoenicke, Aina Niemetz, Albert Oliveras, Giles Reger, Aaron Stump,
+  and Tjark Weber.} describes the rules and competition procedures for
+SMT-COMP~2026.
+%
+
+As in previous years, we have revised the rules slightly.
+% Most changes are due to our move from StarExec to BenchExec.
+%
+The principal rule changes from the previous competition, effective at
+SMT-COMP~2026, are the following:
+\begin{itemize}
+\item {\bf Derived solver submission.} \emergencystretch.5cm Submitters of
+  derived solvers must now also submit the source code of their solver. \\[0.2cm]
+  \rationale{Derived solver submissions are important for the community as they
+  can push the state of the art. These improvements can be used by the
+  community and the authors of the original solvers only if the derived
+  solvers are open source.} %
+
+\item {\bf Derived solver scoring \& recognition.} \emergencystretch.5cm A
+  derived solver can win a division, track, or logic only if it
+  significantly outperforms its base solver, i.e., only if its PAR-2
+  score is at least $10\,\%$ better than that of its base solver. A derived
+  solver can win a competition-wide
+  recognition only if it significantly outperforms its base
+  solver in at least one division.\\[0.2cm]
+  %
+  % \rationale{The new rule prevents that derived solvers that only marginally
+  % outperform their base solvers can win awards.}
+  \rationale{If the contribution of the derived solver is only marginal, the
+  success of the solver should be attributed to the base solver, not to the
+  changes made in the derived solver. Note that all derived solvers will be
+  executed and their results will still be published as usual. We believe that
+  seeing the effect of the additions on top of the base solver provides the
+  community with useful insights into the algorithms and their potential
+  improvements.}
+  %
+
+\item {\bf Benchmark selection.} \emergencystretch.5cm In accordance with
+  discussions in the SMT community, we reduced the number of benchmarks to keep
+  SMT-COMP 2026 executable in a reasonable amount of time. Moreover,
+  the benchmarks will be drawn from the SMT-LIB repository of the preceding year
+  rather than the current year.
+  \\[0.2cm]
+  \rationale{SMT-COMP
+  2025 took 9.98 CPU years to execute, yielding little buffer for errors and
+  for checking the results between the solver submission deadline and the SMT
+  workshop. Drawing from the preceding year's benchmark set maintains fairness among
+  participants regarding exposure to benchmarks.}
+\end{itemize}
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\section{Entrants}
+\label{sec:entrants}
+
+
+\header{SMT Solver.}
+%
+A Satisfiability Modulo Theories (SMT) solver that can enter SMT-COMP is a tool that can determine the
+(un)satisfia\-bility of benchmarks from the SMT-LIB benchmark library
+(\url{https://smtlib.cs.uiowa.edu/benchmarks.shtml}).
+
+\header{Portfolio Solver.}
+%
+A \emph{portfolio solver} is a solver using a combination of two or
+more SMT solvers on the same input problem. Portfolio solvers are in
+general \textbf{not allowed}, except in the \paralleltrack.
+If you are unsure whether your tool is a portfolio
+solver according to this definition and you feel that it should be allowed,
+contact the organizers of SMT-COMP for clarification.
+
+\header{Wrapper Tool.}
+%
+A \emph{wrapper tool} is defined as any solver that calls one or more other SMT
+solvers (the \emph{wrapped solvers}) to solve \emph{different} subtasks
+(in contrast to portfolio solvers, which combine the results of several SMT
+solvers on the same task).
+Examples of such subtasks are preprocessing steps or solving a subproblem that
+belongs to a strict sublogic of the input logic.
+For example, a solver using one subsolver to solve the ground abstraction of a
+quantified problem would be a wrapper tool.
+A wrapper tool solving a benchmark of logic A
+is \textbf{not allowed} to call an SMT solver to solve a problem for logic A,
+even if the problem is just a subproblem.
+The system description of a wrapper tool \textbf{must}
+explicitly acknowledge and state the \textbf{exact} version of any solvers that
+it wraps.
It \textbf{must} further make clear the technical innovations by which
+the wrapper tool expects to improve on the wrapped solvers.
+
+\header{Derived Tool.}
+%
+A \emph{derived tool} is defined as any solver that is \emph{based on and
+extends} another SMT solver (the \emph{base solver}) from a different
+group of authors.
+In contrast to a wrapper tool, a derived tool solving a benchmark of logic A is allowed to call an SMT solver
+to solve a problem for logic A.
+However, it is \textbf{not allowed} to call more than one SMT solver to solve a problem over the input logic.
+Examples of derived tools are tools that change the strategy selection for the base solver,
+tools that add new preprocessing steps on top of the base solver,
+and tools that add new heuristics or incomplete procedures to a base solver.
+The system description of a derived tool
+\textbf{must} explicitly acknowledge
+the solver it is based on and extends. It \textbf{must} further make clear the
+technical innovations by which the derived tool expects to improve on the
+original solver. A derived tool \textbf{must} follow the \emph{naming convention}
+{[name of base solver]-[my solver name]}.
+Submitters of a derived tool \textbf{must} submit their tool's binary \emph{and} source code
+({[name of base solver]-[my solver name]}.tar.gz) along with the corresponding base
+tool's binary for the same track and logics as their derived tool.
+
+\header{SMT Solver Submission.}
+%
+An entrant to SMT-COMP is a solver submitted by its authors via
+a pull request to the SMT-COMP GitHub repository
+\url{https://github.com/SMT-COMP/smt-comp.github.io/tree/master/submissions}.
+The final solver version needs to be uploaded to Zenodo (\url{https://zenodo.org/}).
+In the case of derived tools, the Zenodo submission must include both the derived tool and the base tool.
+The solver submission is an archive that contains the precompiled executable (preferably statically linked).
+It will be executed on a computer that has the same installation as the following Docker image:
+\begin{center}
+  \footnotesize
+\texttt{\url{registry.gitlab.com/sosy-lab/benchmarking/competition-scripts/user:latest}}
+\end{center}
+
+
+\header{Solver Execution.}
+%
+BenchExec (\url{https://github.com/sosy-lab/benchexec}) is a framework for reliable benchmarking and resource
+measurement developed by LMU's Software and Computational Systems Lab (SoSy-Lab, \url{https://www.sosy-lab.org/}).
+The framework can be downloaded and used locally by anyone.
+The competition will be executed on a BenchExec cluster owned by SoSy-Lab,
+who are kind enough to support our competition with their computing power.
+To be more precise, the competition will be run on the 168 apollon nodes of the
+SoSy-Lab BenchExec cluster (for more details, see \url{https://vcloud.sosy-lab.org/cpachecker/webclient/master/info}) and on a 128-processor, 1\,TB RAM machine for the \paralleltrack{}.
+It is also possible to locally emulate and test the computing environment of
+the competition machines using the following instructions:
+\url{https://gitlab.com/sosy-lab/benchmarking/competition-scripts/#computing-environment-on-competition-machines}.
+
+\header{Participation in the Competition.}
+%
+For participation in SMT-COMP, the organizers must be informed of the solver's presence
+\emph{and the tracks and divisions which it enters}, by submitting a properly formatted
+JSON file to the SMT-COMP GitHub repository.
+
+All instructions for submissions are available at the following URL:
+\begin{center}
+  \url{https://smt-comp.github.io/2026/solver_submission/}
+\end{center}
+
+Note that independent of the tracks, the final solver version must be uploaded
+to Zenodo (\url{https://zenodo.org/}), together with the base solver in the case of a
+derived solver.
+
+\header{System description.}
+%
+As part of the submission, SMT-COMP entrants are \textbf{required} to provide a
+short (1--2 pages, excluding references) description of the system, which \textbf{must} explicitly
+acknowledge any solver it wraps or is based on in the case of a \emph{wrapper} or
+\emph{derived} tool (see above).
+In the case of a \emph{wrapper} tool, it \textbf{must} also explicitly state
+the exact version of each wrapped solver.
+A system description \textbf{must} further include the following information:
+\begin{itemize}[itemsep=0ex]
+  \item a list of all (current) authors of the system and their present institutional
+  affiliations,
+  \item the basic SMT solving approach employed,
+  \item in the case of a \emph{wrapper} or \emph{derived tool}: details of
+  technical innovations by which the wrapper or derived tool expects to improve
+  on the wrapped solvers or base solver, and
+  \item appropriate acknowledgment of tools other than SMT solvers called by
+  the system (e.g., SAT solvers) that are not written by the authors of the
+  submitted solver.
+\end{itemize}
+A system description \emph{should} further include the following information
+(unless there is a good reason otherwise):
+\begin{itemize}[itemsep=0ex]
+  \item details of any non-standard algorithmic techniques as well as
+  references to relevant literature (by the authors or others), and
+  \item a link to a website for the submitted tool.
+\end{itemize}
+System descriptions \textbf{must} be submitted \textbf{by the deadline for first versions
+of solvers}, and will be made publicly available on the competition website.
+The organizers will check that they contain sufficient information
+and may withdraw a system if its description is not sufficiently updated upon
+request. The updates must happen \textbf{by the deadline for final versions of solvers}.
+
+\header{Multiple versions.}
+%
+The intent of the organizers is to promote as wide a comparison among
+solvers and solver options as possible.
However, to keep the number of
+solver submissions low, each team should only provide multiple solvers
+if they are substantially different. A justification must be provided
+for the difference. We strongly encourage the teams to keep the number
+of solvers per team per category to at most two. By allowing
+up to two submissions we want to encourage the development of new,
+experimental techniques via an ``alternative solver'' while keeping
+the competition manageable.
+
+\header{Other solvers.}
+%
+The organizers reserve the right to include additional solvers of interest
+(such as participants in previous editions) in the competition, e.g., for comparison purposes.
+
+%\header{Attendance.}
+%
+%Submitters of an SMT-COMP entrant are not required (but encouraged) to be
+%physically present at the competition or the SMT Workshop to participate or
+%win.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\subsection*{Deadlines}
+
+SMT-COMP entrants must be submitted via a pull request as explained above by the end of
+{\bf May~27, 2026}, anywhere on Earth.
+After this date \emph{no new entrants} will be accepted.
+However, updates to existing entrants via pull requests
+will be accepted until the end of {\bf June~10, 2026}, anywhere on Earth.
+
+We strongly encourage participants to use this grace period
+\emph{solely} to address any bugs that may be identified,
+rather than adding new features. This is because there may be limited opportunities for
+thorough testing using the \executionservice or other methods after the initial deadline.
+
+The last solver links pushed to the repository at the conclusion of
+the grace period will be the ones used for the competition. Versions
+submitted after this time will not be used. The organizers reserve
+the right to start the competition itself at any time after the opening
+of the New York Stock Exchange on the day after the deadline for final versions of solvers.
+ +These deadlines and procedures apply equally to all tracks of the +competition. + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\section{Execution of Solvers} + +Solvers will be publicly evaluated in all tracks and divisions into +which they have been entered. +% +A solver enters a division in a track if it supports at least one +logic in this division. +% +A solver not supporting all logics in a division will not be run on +the benchmarks from the unsupported logics and will be scored as if it +returned the result \texttt{unknown} within zero time. +% +All results of the competition will be made public. Solvers will be +made publicly available and it is a minimum +license requirement that (i) solvers can be distributed in this way, +and (ii) all submitted solvers may be freely used for academic +evaluation purposes. + +%\an{intro of the section maybe, we should add there that solvers will be archived on the website and so on. Interestingly, we replied to the reviewer in the report that the solvers are publicly available in the StarExec space when asked about reproducibility, but in the past it was never defined under which terms.} + + +\subsection{Logistics} +\label{sec:logistics} + +\header{Dates of Competition.} +% +The bulk of the computation will take place during the weeks leading +up to SMT 2026. Intermediate results will be regularly posted to the +SMT-COMP website and the Zulip channel as the competition runs. +% +The organizers reserve the right to prioritize certain competition +tracks or divisions to ensure their timely completion, and, under +exceptional circumstances, to complete divisions after the SMT +Workshop. + +\header{Competition Website.} +The competition website (\url{www.smtcomp.org}) will be used as the main form +of communication for the competition. The website will be used to post updates, +link to these rules and other relevant information (e.g., the benchmarks), and +to announce the results. 
The website also archives previous
+competitions.
+%
+
+\header{Tools.} \label{tools}
+A number of tools and scripts are used to run the competition. In the
+following, we briefly describe these tools.
+\begin{itemize}
+  \item \textbf{smtcomp.} This tool combines the functionality of most of the
+  scripts used in previous years and more. It is available at
+  \url{https://github.com/SMT-COMP/smt-comp.github.io}. Some of its features are:
+  \begin{itemize}
+    \item \textbf{Benchmark Selection.} The tool has a command that executes the
+    benchmark selection policy described on page~\pageref{benchmark-selection}.
+    It takes a seed for the random benchmark selection. The same seed is used
+    for all tools requiring randomisation.
+    \item \textbf{Results processing.} The tool has commands that process and
+    summarize all the results from files generated by BenchExec (e.g., computing the number of \akey{check-sat} commands solved in the \inctrack from the output of the solver).
+    \item \textbf{Scoring.} The tool has a command that executes the scoring computation described in Section~\ref{sec:scoring}. It also includes the scoring
+    computations used in competitions since 2015.
+  \end{itemize}
+  For a full list of the capabilities of the smtcomp tool and an explanation of its
+  main commands,
+  see \url{https://github.com/SMT-COMP/smt-comp.github.io/blob/master/README.md}.
+  \item \textbf{Scrambler.} This tool is used to scramble benchmarks during the
+  competition to ensure that tools do not rely on syntactic features to
+  identify benchmarks. The scrambler can be found at
+  \url{https://github.com/SMT-COMP/scrambler}.
+  \item \textbf{Trace Executor.} This tool is used in the \inctrack to emulate
+  an online interaction between an SMT solver and a client application, and
+  is available at \url{https://github.com/SMT-COMP/trace-executor}.
+\end{itemize}
+
+\header{Input.}
+%
+In the \emph{\inctrack}, the \emph{trace executor} will send commands from an
+(incremental) benchmark file to the standard input channel of the solver. In
+\emph{all other tracks}, a participating solver must read a \emph{single}
+benchmark file, whose filename is presented as the first command-line argument
+of the solver.
+
+Benchmark files are in the concrete syntax of the SMT-LIB format
+version~2.6, though with a \emph{restricted} set of commands. A benchmark
+file is a text file containing a sequence of SMT-LIB commands that
+satisfies the following \emph{requirements}:
+%
+\begin{itemize}
+  \item
+  \bkey{(set-option ...)}
+  The input contains the following \akey{set-option} commands.
+  \begin{enumerate}[label=(\alph*)]
+    \item In the \emph{\inctrack}, the \akey{:print-success} option
+    must not be disabled. The trace executor will send an initial
+    \akey{(set-option :print-success true)} command to the solver.
+    \item In all other tracks, the scrambler will add an initial
+    \akey{(set-option :print-success false)} command to the benchmark.
+    \item In the \emph{\mvaltrack}, a benchmark file contains a single
+    \akey{(set-option :produce-models true)} command as the second command.
+    \item In the \emph{\ucoretrack}, a benchmark file contains a
+    single \akey{(set-option :produce-unsat-cores true)} command as
+    the second command.
+  \end{enumerate}
+  \item \bkey{(set-logic ...)}\\
+  A (single) \akey{set-logic} command is the \emph{first} command after
+  any \akey{set-option} commands.
+  \item \bkey{(set-info ...)}\\
+  A benchmark file may contain any number of \akey{set-info} commands.
+  During the competition all \akey{set-info} commands are removed from
+  the benchmark by the scrambler.
+ \item \bkey{(declare-sort ...)}\\ + A benchmark file may contain any number of \akey{declare-sort} and + \akey{define-sort} commands. All sorts declared or defined with these + commands must have zero arity. + \item \bkey{(declare-fun ...)} and \bkey{(define-fun ...)}\\ + A benchmark file may contain any number of \akey{declare-fun} and + \akey{define-fun} commands. + \item \bkey{(declare-datatype ...)} and \bkey{(declare-datatypes ...)}\\ + If the logic features algebraic datatypes, the benchmark file may + contain any number of \akey{declare-datatype(s)} commands. + \item \bkey{(assert ...)}\\ + A benchmark file may contain any number of \akey{assert} commands. All + formulas in the file belong in the declared logic, with any free symbols + declared in the file. + \item + \bkey{:named} + \begin{enumerate}[label=(\alph*)] + \vspace{-1ex} + \item In \emph{all} tracks \emph{except} the \ucoretrack, named + terms (i.e., terms with the \akey{:named} attribute) are \emph{not} + used. + \item In the \emph{\ucoretrack}, top-level assertions may be named. + \end{enumerate} + \item + \bkey{(check-sat)} + \begin{enumerate}[label=(\alph*)] + \vspace{-1ex} + \item In \emph{all} tracks \emph{except} the \inctrack, there is + \emph{exactly one} \akey{check-sat} command. + \item In the \emph{\inctrack}, there are one or more + \akey{check-sat} commands. There may also be zero or more + \akey{(push 1)} commands, and zero or more \akey{(pop 1)} commands, + consistent with the use of those commands in the SMT-LIB standard. + \end{enumerate} + \item \bkey{(get-unsat-core)}\\ + In the \emph{\ucoretrack}, the \akey{check-sat} command (which is + always issued in an unsatisfiable context) is followed by a single + \akey{get-unsat-core} command. + \item \bkey{(get-model)}\\ + In the \emph{\mvaltrack}, the \akey{check-sat} command (which is + always issued in a satisfiable context) is followed by a single + \akey{get-model} command. 
+  \item \bkey{(exit)}\\
+  A benchmark file may \emph{optionally} contain an \akey{exit} command as its
+  last command. In the \emph{\inctrack}, this command must not be
+  omitted.
+  \item \textbf{No other commands} besides the ones just mentioned may be used.
+\end{itemize}
+%
+The SMT-LIB format specification is available from the ``Standard''
+section of the SMT-LIB website~\cite{SMT-LIB}. Solvers will be given
+formulas only from the divisions into which they have been entered.
+
+\header{Output.}
+%
+In all tracks except the \inctrack, any \texttt{success} output will be
+ignored\footnote{SMT-LIB~2.6 requires solvers to produce a \texttt{success}
+  answer after each \akey{set-logic}, \akey{declare-sort}, \akey{declare-fun}
+  and \akey{assert} command (among others), unless the option
+  \akey{:print-success} is set to false. Ignoring the \texttt{success} outputs
+  allows for submitting fully SMT-LIB~2.6 compliant solvers without the need for
+  a wrapper script, while still allowing entrants of previous competitions to
+  run without changes.}. Solvers that exit before the time limit without
+reporting a result (e.g., due to exhausting memory or crashing) \emph{and} do
+not produce output that includes \texttt{sat}, \texttt{unsat}, \texttt{unknown},
+or other track-specific output as specified in the individual track sections,
+e.g., unsat cores or models, will be considered to have aborted. Note that there
+is no distinction between the output and error channels, and tools should not write
+any messages to the error channel, because they could be misinterpreted as a wrong
+result.
+
+\header{Time and Memory Limits.}
+%
+Each SMT-COMP solver will be executed on a dedicated processor of a
+competition machine, for each given benchmark, up to a fixed
+wall-clock time limit~$T$. The individual track descriptions on
+pages~\pageref{sec:exec:single}--\pageref{sec:exec:model} specify
+the time limit for each track. Each processor has 4 cores.
Detailed
+machine specifications are available on the competition web site.
+
+The \executionservice also limits the memory consumption of the solver
+processes. We expect the memory limit per solver/benchmark pair to be
+on the order of 30\,GB.
+% The values of both the time limit and the
+% memory limit are available to a solver process through environment
+% variables. See the user guide for the \executionservice for more information.
+
+The limits for the \paralleltrack{} are available at
+\url{https://smt-comp.github.io/2026/parallel-track.html}.
+
+\header{Persistent State.}
+%
+Solvers may create and write to files and directories during the
+course of an execution, but they must not read such files back during
+later executions. This is ensured by BenchExec, which executes each solver
+with the whole filesystem mounted read-only, with a writable overlay
+layer that is mounted as a RAM disk. Any generated files will therefore be
+written to the RAM disk. The storage used counts toward the
+memory limit. The temporary overlay layer is deleted after the job is complete.
+Solvers must not attempt to communicate with other machines, e.g., over the network.
+
+\subsection{\maintrack}
+\label{sec:exec:single}
+
+The \maintrack will consist of selected non-incremental benchmarks in
+each of the competitive logics. Each benchmark will be presented to
+the solver as its first command-line argument. The solver is then expected to
+report on its standard output channel whether the formula is satisfiable
+(\texttt{sat}) or unsatisfiable (\texttt{unsat}). A solver may also report
+\texttt{unknown} to indicate that it cannot determine satisfiability of the
+formula.
+
+\header{Benchmark Selection.} See page~\pageref{benchmark-selection}.
+
+\header{Time Limit.}
+This track will use a wall-clock time limit of 20 minutes per solver/benchmark
+pair.
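+
+For illustration, a minimal benchmark file satisfying the input requirements
+of Section~\ref{sec:logistics} could look as follows; the logic QF\_LIA and
+the concrete assertions are chosen purely as an example and are not taken
+from the SMT-LIB library:
+\begin{verbatim}
+(set-logic QF_LIA)
+(declare-fun x () Int)
+(declare-fun y () Int)
+(assert (> x 0))
+(assert (< (+ x y) 3))
+(check-sat)
+(exit)
+\end{verbatim}
+A correct solver is expected to print \texttt{sat} on its standard output
+channel for this benchmark (e.g., $x = 1$, $y = 0$ is a satisfying
+assignment).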
+
+\subsection{\inctrack}
+\label{sec:exec:app}
+
+The \inctrack evaluates SMT solvers when interacting with an
+external verification framework, e.g., a model checker. This
+interaction, ideally, happens by means of an online communication
+between the framework and the solver: the framework repeatedly sends
+queries to the SMT solver, which in turn answers either \texttt{sat}
+or \texttt{unsat}. In this interaction, an SMT solver is required to
+accept queries incrementally via its \emph{standard input channel}.
+
+In order to facilitate the evaluation of solvers in this track, we will set up
+a ``simulation'' of the aforementioned interaction. Each benchmark represents
+a realistic communication trace, containing multiple \akey{check-sat} commands
+(possibly with corresponding \akey{(push 1)} and \akey{(pop 1)} commands). It is
+parsed by a (publicly available) \emph{trace executor},
+which serves the following purposes:
+
+\begin{itemize}
+  \item simulating online interaction by sending single queries to the SMT
+  solver (through stdin),
+  \item preventing ``look-ahead'' behaviors of SMT solvers,
+  \item recording time and answers for each command,
+  \item guaranteeing a fair execution for all solvers by abstracting
+  from any possible crash, misbehavior, etc.\ that might happen in the
+  verification framework.
+\end{itemize}
+
+\header{Input and output.}
+Participating solvers should include a script
+\texttt{smtcomp\_run\_incremental} (if this is not present, the script
+for the default configuration will be used). This script will be called
+without arguments and will be connected to a trace executor, which
+will incrementally send commands to the standard input channel of the
+solver and read responses from the standard output channel of the
+solver. The commands will be taken from an SMT-LIB benchmark script
+that satisfies the requirements for incremental track scripts given in
+Section~\ref{sec:logistics}.
+% +Solvers must respond to each command sent by the trace executor with +the answers defined in the SMT-LIB format specification, that is, with +an answer of \texttt{sat}, \texttt{unsat}, or \texttt{unknown} for +\akey{check-sat} commands, and with a \texttt{success} answer for +other commands. +Solvers must not write anything to the standard error channel. + +\header{Benchmark Selection.} See page~\pageref{benchmark-selection}. + +\header{Time Limit.} +This track will use a wall-clock time limit of 20 minutes per solver/benchmark +pair. + +\header{Trace Executor.} This track will use the trace executor from +{\url{https://github.com/SMT-COMP/trace-executor}} +to execute a solver on an incremental benchmark file. + +% \subsection{\challtrack} +% \label{sec:exec:industry-challenge} + +% The \challtrack will include both non-incremental and incremental benchmarks. +% It will follow the same rules as the \maintrack and \inctrack, respectively, +% with two exceptions: benchmark selection and the time limit. + +% \header{Benchmark Selection.}\todo{fixme} +% This track will run on challenging industrial benchmarks provided by the +% community. This year, the \challtrack will include the complete set of provided +% benchmarks dedicated to this track, which consist of both incremental and +% single query benchmarks in the logics \todo{QF\_BV, QF\_ABV and QF\_AUFBV}. The +% complete list of benchmarks along with instructions on how to access them will +% be provided on the SMT-COMP website as soon as the benchmark library is +% released. + +% \header{Time Limit.} +% This track will use a wall-clock time limit of 12 hours per solver/benchmark +% pair. + +% \header{Post-Processor.} +% This track will use the post-processors from the \maintrack (for +% non-incremen\-tal benchmarks) and the \inctrack (for incremental benchmarks) +% to accumulate the results. 
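The stdin/stdout protocol described above can be sketched as follows. This is an illustrative skeleton only: the \texttt{check} stub and the one-command-per-line parsing are simplifying assumptions, not requirements imposed by the trace executor.

```python
"""Sketch of an incremental-track solver loop (illustrative only)."""
import sys


def check() -> str:
    # Hypothetical decision procedure; a real solver would act on the
    # assertion stack built up by the preceding commands.
    return "unknown"


def respond(command: str) -> str:
    cmd = command.strip()
    if cmd.startswith("(check-sat"):
        return check()        # sat, unsat, or unknown
    return "success"          # all other commands


def main() -> None:
    # Simplification: assumes one SMT-LIB command per input line.
    for line in sys.stdin:
        # Flush so the trace executor sees the answer immediately;
        # nothing may be written to standard error.
        print(respond(line), flush=True)


if __name__ == "__main__":
    main()
```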
\subsection{\ucoretrack}
\label{sec:exec:unsat-core}

The \ucoretrack will evaluate the capability of solvers to generate
unsatisfiable cores. Performance of solvers will be measured by the correctness
and the size of the unsatisfiable cores they provide.

\header{Benchmark Selection.}
This track will run on a selection of non-incremental \emph{unsat} benchmarks
(as described on page~\pageref{benchmark-selection}), modified
to use named top-level assertions of the form \akey{(assert (! t :named f))}.
Starting with SMT-COMP 2026, the benchmarks will be drawn from the SMT-LIB
repository of the preceding year rather than the current year.

\header{Input/Output.}
The SMT-LIB language provides a command \akey{(get-unsat-core)}, which asks
a solver to identify an unsatisfiable core after a \akey{check-sat}
command returns \texttt{unsat}.
This unsat core must consist of a list of names of top-level
assertions, in the format prescribed by the SMT-LIB standard.
%
Solvers must respond to each command in the benchmark script with the
answers defined in the SMT-LIB format specification. In particular,
solvers that respond \texttt{unknown} to the \akey{check-sat} command
must respond with an error to the following \akey{get-unsat-core}
command.

\header{Result.}
The result of a solver is considered \emph{erroneous} if (i) the
response to the \akey{check-sat} command is \texttt{sat} or (ii) the
returned unsatisfiable core is not unsatisfiable.
%
If the solver replies \texttt{unsat} to \akey{check-sat} but gives no
response to \akey{get-unsat-core}, this is considered as no reduction, i.e.,
as if the solver had returned the entire benchmark as an unsat
core.

\header{Validation.}
The organizers will use a selection of SMT solvers (the \emph{validation
solvers}) that participate in the \maintrack of this competition in order to
validate whether a given unsat core is indeed unsatisfiable.
For each division, the
organizers will use only solvers that have been sound (i.e., they did not
produce any erroneous result) in the \maintrack for this division. The
unsatisfiability of an unsat core is validated if the number of validation
solvers whose result is \texttt{unsat} is strictly greater than the number of
validation solvers whose result is \texttt{sat}. In particular, if no
validation solver produces \texttt{unsat}, the unsat core is not validated.

\header{Time Limit.} This track will use a wall-clock time limit of
20 minutes per solver/benchmark pair, both for unsatisfiable core generation and
for the subsequent validation.

\subsection{\mvaltrack}
\label{sec:exec:model}
The \mvaltrack will evaluate the capability of solvers to produce models for
satisfiable problems. Performance of solvers will be measured by the correctness
and well-formedness of the models they provide.

\header{Benchmark Selection.} This track has the divisions QF\_Bitvec,
QF\_DataTypes, QF\_Equality,
QF\_Equality+Bitvec,
QF\_Equality+(Non)LinearArith,
QF\_(Non)LinearIntArith, and QF\_(Non)\-LinearRealArith.
%
This year, all divisions with non-linear arithmetic, arrays, and datatypes
are experimental divisions.
%
The track will run on a
selection of non-incremental \emph{sat} benchmarks from these
logics (as described on page~\pageref{benchmark-selection}).

\header{Input/Output.}
The SMT-LIB language provides a command \akey{(get-model)} to request a
satisfying model after a \akey{check-sat} command returns \texttt{sat}. This
model must consist of definitions specifying all and only the current
user-declared function symbols, in the format prescribed by the SMT-LIB
standard.

\header{Result.}
The result of a solver is considered erroneous if the response to the
\akey{check-sat} command is \texttt{unsat}, if the returned model is not
well-formed (e.g.,
does not provide a definition for all the user-declared
function symbols), or if the returned model does not satisfy the benchmark.

\header{Validation.}
In order to check that the model satisfies the benchmark, the organizers will
use the model validation tool Dolmen, which can be built using the smtcomp tool.
As input, it expects a file containing the answer to the \akey{check-sat}
command followed by the solver's response to the \akey{get-model} command.
The model validator will output
\begin{enumerate}
  \item VALID\hskip .5em for a \texttt{sat} solver response
    followed by a full satisfying model;
  \item INVALID\hskip .5em for
  \begin{itemize}[noitemsep,topsep=0pt]
    \item an \texttt{unsat} solver response to \akey{check-sat} or
    \item models that do not satisfy the input problem;
  \end{itemize}
  \item UNKNOWN\hskip .5em for
  \begin{itemize}[noitemsep,topsep=0pt]
    \item no solver output (no response to both commands, or no response to
      \akey{get-model}),
    \item an \texttt{unknown} response to \akey{check-sat}, or
    \item malformed models, e.g., partial models.
  \end{itemize}
\end{enumerate}

The new experimental divisions require additional syntax to represent
their models. We propose the new syntax on the SMT-COMP website at
{\url{https://smt-comp.github.io/2026/model.html}}.

\header{Time Limit.}
This track will use a wall-clock time limit of 20 minutes per solver/benchmark
pair. The time limit for checking the satisfying assignment is yet to be
determined, but is anticipated to be around 15 minutes of wall-clock time.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\subsection{\paralleltrack{}}
The \paralleltrack{} will evaluate the capability of solvers to
determine the satisfiability of problems in a shared-memory parallel
computing environment. The track will be experimental.
\header{Benchmark Selection.}
We will select non-incremental benchmarks from the SMT-LIB divisions
based on the participating solvers. In total, 400 instances will be
chosen whose estimated run times are sufficiently high.

\header{Time Limit.}
This track will use a wall-clock time limit of 20 minutes per
solver/benchmark pair.

\section{Benchmarks and Problem Divisions}

\header{Divisions.}
%
Within each track there are multiple divisions, and each division selects
benchmarks from a specific group of SMT-LIB logics in the SMT-LIB benchmark
library.

\header{Competitive Divisions and Logics.} A division or a logic is \emph{competitive}
for a given track if at least two substantially different solvers (i.e.,
solvers from two different teams) were submitted to it. Although the organizers
may enter other solvers for comparison purposes, only solvers that are
explicitly submitted by their authors determine whether divisions and logics are
competitive, and only such solvers are eligible to be designated as winners. We
will \textbf{not} run \emph{non-competitive} divisions and logics.

\header{Benchmark sources.}
Benchmarks for each division will be drawn from the SMT-LIB benchmark
library from the previous year, i.e., from the 2025 release of the library.
The \maintrack and \paralleltrack{} will use a
subset of all \emph{non-incremental} benchmarks and the \inctrack will
use a subset of all \emph{incremental} benchmarks.
%
% The \challtrack will use all incremental and non-incremental benchmarks
% dedicated to this track by their submitters.
%
The \ucoretrack will use a selection of non-incremental
\emph{unsat} benchmarks with more than one top-level assertion, modified to use
named top-level assertions. The \mvaltrack will use a selection of
non-incremental benchmarks with status \texttt{sat} from the logics QF\_BV,
QF\_IDL, QF\_RDL, QF\_LIA, QF\_LRA, QF\_LIRA, QF\_UF, QF\_UFBV, QF\_UFIDL,
QF\_UFLIA, and QF\_UFLRA.
To determine whether a benchmark is sat or unsat, a combination of
the benchmark's status and the results of the \maintrack{} will be used.

\header{Benchmark demographics.}
%
The set of all SMT-LIB benchmarks in the logics of a given division can be
naturally partitioned into sets containing benchmarks that are similar from the
perspective of the user community.
%
Such benchmarks could all come from the same
application domain, be generated by the same tool, or have some other
obvious common identity.
%
The organizers try to identify a meaningful partitioning based on the
directory hierarchy in SMT-LIB. In many cases the hierarchy consists of
top-level directories, each corresponding to a submitter, who has
further imposed a hierarchy on the benchmarks.
%
The organizers believe that the submitters have the best information on the
common identity of their benchmarks and therefore partition each logic in a
division based on the bottom-level directories imposed by each submitter. These
partitions are referred to as \emph{families}.

\header{Benchmark selection.} \label{benchmark-selection}
The competition will use a large subset of SMT-LIB benchmarks, with some
guarantees on including new benchmarks, using the following selection process:
\begin{enumerate}
\item \emph{Remove inappropriate benchmarks.} The
  organizers may remove benchmarks that are deemed inappropriate or
  uninteresting for the competition, or cut the size of certain benchmark
  families to avoid their over-representation. SMT-COMP attempts to
  give preference to benchmarks that are ``real-world,'' in the sense
  of coming from or having some intended application outside SMT.
\item \emph{Remove easy / uninteresting benchmarks.}
  For the following tracks, all benchmarks that can be
  considered easy or uninteresting based on the following criteria
  will be removed.
  \begin{itemize}
  \item \emph{\maintrack.} All benchmarks that were solved by all
    solvers (including non-competitive solvers) in less than one second in
    the corresponding track in 2018--2024.
  \item \emph{\ucoretrack.} All benchmarks with only a single \texttt{assert} command.
  \end{itemize}

\item \emph{Remove benchmarks with inapplicable status.}
  For the \ucoretrack, all benchmarks with status
  \texttt{sat} are removed. We further remove benchmarks with status
  \texttt{unknown} that no sound solver reported to be
  \texttt{unsat}.
  %
  For the \mvaltrack, all benchmarks with status \texttt{unsat} are removed,
  as well as benchmarks with status
  \texttt{unknown} that no sound solver reported to be
  \texttt{sat}.

  In case of a dispute (some solver marks a benchmark as \texttt{sat}
  and some other solver as \texttt{unsat}), the benchmark may be retained in
  the selection.

\item \emph{Cap the number of instances in a division.}
  The number of benchmarks in a division will be limited based on the size
  of the corresponding logics in SMT-LIB. Let $n$ be the number of benchmarks
  in an SMT-LIB logic; then the benchmarks will be chosen as follows:
  %the number of chosen benchmarks is $min(n, max(300, 50n/100))$:
  \begin{enumerate}
  \vspace{-1ex}
  \item \label{bench-sel-300} if $n \le 300$ then all instances will be selected;
  \item \label{bench-sel-600} if $300 < n \leq 600$ then a subset of 300 instances
    from the logic will be selected;
  \item \label{bench-sel-600+1000-} if $600 < n \leq 1000$ then
    50\% of the benchmarks of the logic will be selected;
  \item \label{bench-sel-more} and if $n > 1000$ then $500 + \frac{n - 1000}{10}$
    of the benchmarks of the logic will be selected.
  \end{enumerate}
\end{enumerate}
%
The selection process in cases \ref{bench-sel-600+1000-} and \ref{bench-sel-more}
above will guarantee the inclusion of new benchmarks by first randomly picking
one benchmark from each new benchmark family.
Moreover, case
\ref{bench-sel-more} guarantees that the competition remains computable
in a reasonable amount of time. The rest of the benchmarks will be chosen
randomly from the remaining benchmarks using a uniform distribution.
%
The benchmark selection script will be publicly available at
\url{https://github.com/SMT-COMP/smt-comp.github.io} and
will use the same random seed as the rest of the competition. The set of
benchmarks selected for the competition will be published when the competition
begins.

\header{Heats.}
%
Since the organizers are at this point unsure how long running the set of
benchmarks may take (which will also depend on the number of solvers
submitted), the competition may be run in \emph{heats}. For each
track and division, the selected benchmarks may be randomly divided
into a number of (possibly unequal-sized) heats. Heats will be run in
order. If the organizers determine that there is adequate time, all
heats will be used for the competition. Otherwise, incomplete heats
will be ignored.

\header{Benchmark scrambling.}
%
Benchmarks will be slightly scrambled before the competition, using a simple
benchmark scrambler available at \url{https://github.com/SMT-COMP/scrambler}.
The benchmark scrambler will be made publicly available before the competition.
%
Naturally, solvers must not rely on previously determined identifying
syntactic characteristics of competition benchmarks in testing
satisfiability. Violation of this rule is considered cheating.

\header{Pseudo-random numbers.}
%
Pseudo-random numbers used, e.g., for the creation of heats or the
scrambling of benchmarks, will be generated using the standard C
library function \texttt{random()}, seeded (using \texttt{srandom()})
with the sum, modulo $2^{30}$, of the integer numbers provided in the
system submissions (see Section~\ref{sec:entrants}) by all SMT-COMP
entrants other than the organizers.
Additionally, the integer part
of one hundred times the opening value of the New York Stock Exchange
Composite Index on
the first day the exchange is open on or after the date specified in
the timeline (Section~\ref{sec:important}) will be added to the other
seeding values. This helps provide transparency, by guaranteeing that
the organizers cannot manipulate the seed in favor of or against any
particular submitted solver.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Scoring}
\label{sec:scoring}

\subsection{Benchmark scoring}
\label{sec:benchmark-scoring}

The \textbf{parallel benchmark score} of a solver is a tuple $\langle
e, n, \it{aw}, w, \it{ac}, c\rangle$ with
\begin{itemize}[noitemsep]
  \vspace{-1ex}
  \item \makebox[6em][l]{$e \in \{0, 1\}$}
    number of erroneous results (usually~$e = 0$)
  \item \makebox[6em][l]{$0 \leq n \leq N$}
    number of correct results (resp.~\emph{reduction} for the \ucoretrack)
  \item \makebox[6em][l]{$\it{aw} \in [0,T]$}
    actual wall-clock time in seconds (real-valued)
  \item \makebox[6em][l]{$w \in [0,T]$}
    wall-clock time score in seconds (real-valued)
  \item \makebox[6em][l]{$\it{ac} \in [0, mT]$}
    actual CPU time in seconds (real-valued)
  \item \makebox[6em][l]{$c \in [0, mT]$}
    CPU time score in seconds (real-valued)
\end{itemize}

\header{Error Score ($\mathbf{e}$).}
For the \maintrack, \inctrack, and \paralleltrack,
$e$ is the number of returned
statuses that disagree with the given expected status (as described above,
disagreements on benchmarks with unknown status lead to the benchmark being
disregarded). For the \ucoretrack, $e$ additionally includes the number of
returned unsat cores that are not, in fact, unsatisfiable (as
validated by a selection of other solvers chosen by the organizers). For the
\mvaltrack, $e$ additionally includes the number of returned models that are
not full satisfying models.
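As an informal sketch (the function and variable names are ours, not part of the rules), the per-benchmark error and correctness scores for the \maintrack, defined precisely in the track-specific subsections below, amount to the following:

```python
def main_track_e_n(answer, status):
    """Error score e and correct-result score n for one Single Query
    Track benchmark (simplified sketch, not normative).

    answer: solver output ("sat", "unsat", or anything else)
    status: expected benchmark status ("sat", "unsat", or "unknown")
    """
    if answer not in ("sat", "unsat"):
        return 0, 0   # abort or unknown: neither error nor credit
    if status == "unknown" or answer == status:
        return 0, 1   # correct answer, or status not known
    return 1, 0       # disagreement with a known status
```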
\header{Correctly Solved Score ($\mathbf{n}$).}
For the \maintrack, \inctrack, \mvaltrack, and \paralleltrack,
$N$ is defined as the number of \akey{check-sat} commands, and
$n$ is defined as the number of correct results.
For the \ucoretrack, $N$ is defined as the number of named top-level assertions,
and $n$ is defined as the \emph{reduction}, i.e., the difference between $N$
and the size of the unsat core.

\header{Actual Wall-Clock Time ($\mathbf{aw}$).}
The actual (real-valued) wall-clock time in seconds, until the time limit $T$
is reached or the solver process terminates.

\header{Wall-Clock Time Score ($\mathbf{w}$).}
For the \maintrack, \ucoretrack, \mvaltrack, and \paralleltrack,
the wall-clock time score $w$ is the same as
the actual (real-valued) wall-clock time $\mathit{aw}$,
except that it is zero if the benchmark was not correctly solved within the time limit $T$;
i.e., $w = 0$ if $e = 1$, if the process did not terminate within the time limit $T$, or
if it returned \texttt{unknown} or no result.
For the \inctrack, the wall-clock time score $w$
is the (real-valued) wall-clock time in seconds until the last time the process
returned sat/unsat within the time limit;
in particular, $w = 0$ if the process never returned sat/unsat within the time limit.

\header{Actual CPU Time ($\mathbf{ac}$).}
The (real-valued) CPU time in seconds, measured across all $m$ cores,
until the time limit $mT$ is reached or the solver process
terminates.

\header{CPU Time Score ($\mathbf{c}$).}
For the \maintrack, \ucoretrack, \mvaltrack, and \paralleltrack,
the CPU time score $c$ is the same as
the actual (real-valued) CPU time $\mathit{ac}$,
except that it is zero if the benchmark was not correctly solved within the time limit $mT$;
i.e., $c = 0$ if $e = 1$, if the process did not terminate within the time limit $mT$, or
if it returned \texttt{unknown} or no result.
For the \inctrack, the CPU time score $c$
is the (real-valued) CPU time in seconds until the last time the process
returned sat/unsat within the time limit;
in particular, $c = 0$ if the process never returned sat/unsat within the time limit.

\subsubsection{Sequential Benchmark Score}
\label{sec:sequential}

The parallel score as defined above favors parallel solvers, which may utilize
all available processor cores. To evaluate sequential performance, we derive a
\textbf{sequential score} by imposing a \emph{virtual} CPU time limit equal to
the wall-clock time limit~$T$. A solver result is taken into consideration for
the sequential score only if the solver process terminates \emph{within} this
CPU time limit. More specifically, for a given parallel performance $\langle
e, n, \mathit{aw}, w, \mathit{ac}, c\rangle$, the corresponding sequential performance is defined
as~$\langle e_S, n_S, c_S\rangle$, where
\begin{itemize}
\item $e_S = 0$, $n_S = 0$, and $c_S = 0$ if $c > T$;
\item $e_S = e$, $n_S = n$, and $c_S = c$
  otherwise.\footnote{Under this
  measure, a solver should not benefit from using multiple processor
  cores.
Conceptually, the sequential performance should be (nearly)
  unchanged if the solver was run on a single-core processor, up to a
  time limit of~$T$.}
\end{itemize}

\subsubsection{\maintrack and \paralleltrack}
 For the \maintrack and \paralleltrack, the error score
 $e$ and the correctly solved score $n$ are defined as
 \begin{itemize}
 \item $e=0$ and $n=0$ if
   \begin{itemize}[noitemsep,nolistsep]
   \item the solver aborts without a response, or
   \item the result of the \akey{check-sat} command is \texttt{unknown},
   \end{itemize}
 \item $e=0$ and $n=1$ if the result of the \akey{check-sat} command is
   \texttt{sat} or \texttt{unsat} and either
   \begin{itemize}[noitemsep,nolistsep]
   \item it agrees with the benchmark status, or
   \item the benchmark status
     is unknown,\footnote{If the benchmark status is unknown, we thus treat
     the solver's answer as correct. Disagreements between different
     solvers on benchmarks with unknown status are addressed in
     Section~\ref{sec:division-scoring}.}
   \end{itemize}
 \item $e=1$ and $n=0$ if the result of the \akey{check-sat} command is
   incorrect.
 \end{itemize}
%
Note that a (correct or incorrect) response is taken into
consideration even when the solver process terminates abnormally, or
does not terminate within the time limit. Solvers should take care
not to accidentally produce output that contains \texttt{sat} or
\texttt{unsat}.

\subsubsection{\inctrack}
%
An application benchmark may contain multiple \akey{check-sat}
commands. Solvers may partially solve the benchmark before timing
out. The benchmark is run by the trace executor, measuring the total
time (summed over all individual commands) taken by the solver to
respond to commands.\footnote{Times measured by the \executionservice may include
  time spent in the trace executor.
We expect that this time will
  likely be insignificant compared to the time spent in the solver, and
  nearly constant across solvers.} Most time will likely be spent in
response to \akey{check-sat} commands, but \akey{assert}, \akey{push},
or \akey{pop} commands might also entail a reasonable amount of
processing. For the \inctrack, we have
\begin{itemize}
\item $e=1$ and $n=0$ if the solver returns an incorrect result for any
  \akey{check-sat} command within the time limit,
\item otherwise, $e=0$ and $n$ is the number of correct results for
  \akey{check-sat} commands returned by the solver before the time
  limit is reached.
\end{itemize}

\subsubsection{\ucoretrack}
 For the \ucoretrack, the error score $e$ and the correctly solved score $n$
 are defined as
 \begin{itemize}
 \item $e=0$ and $n=0$ if
   \begin{itemize}[noitemsep,nolistsep]
   \item the solver aborts without a response to \akey{check-sat},
   \item the result of the \akey{check-sat} command is \texttt{unknown}, or
   \item the result of the \akey{get-unsat-core} command is not well-formed;
   \end{itemize}
 \item $e=1$ and $n=0$ if the result is erroneous according to
   Section~\ref{sec:exec:unsat-core},
 \item otherwise, $e=0$ and $n$ is the \emph{reduction} in the number of
   formulas, i.e., $n = N$ minus the number of formula names in the
   reported unsatisfiable core.
 \end{itemize}

\subsubsection{\mvaltrack}
 For the \mvaltrack, the error score $e$ and the correctly solved score $n$
 are defined as
 \begin{itemize}
 \item $e=0$ and $n=0$ if the result is UNKNOWN according to the output of
   the model validation tool described in Section~\ref{sec:exec:model},
 \item $e=1$ and $n=0$ if the result is INVALID according to the output of
   the model validation tool described in Section~\ref{sec:exec:model},
 \item otherwise, $e=0$ and $n=1$.
 \end{itemize}

\subsection{Division scoring}
\label{sec:division-scoring}

For each track and division, we compute a division score based on the parallel
performance of a solver (the \emph{parallel division score}). For the
\maintrack, \ucoretrack, and \mvaltrack, we also compute a division
score based on the sequential performance of a solver (the \emph{sequential
division score}). Additionally, for the \maintrack, we determine three
further scores based on parallel performance: the \emph{24-second score}
rewards solving performance within a wall-clock time limit of 24 seconds,
the \emph{sat score} rewards (parallel) performance on satisfiable
instances, and the \emph{unsat score} rewards (parallel) performance on
unsatisfiable instances.
%
Finally, in divisions composed of more than one logic, all the above scores will
be presented not only for the overall division but also for each logic in
the division.

\header{Sound Solver.}
A solver is \emph{sound} on benchmarks with \emph{known status} for a division
if its parallel performance (Section~\ref{sec:benchmark-scoring}) is of the
form $\langle 0, n, \it{aw}, w, \it{ac}, c\rangle$ for each benchmark in the division, i.e., if
it did not produce any erroneous results.

\header{Disagreeing Solvers.}
Two solvers \emph{disagree} on a benchmark if one of them reported \texttt{sat}
and the other reported \texttt{unsat}.

\header{Removal of Disagreements.}
Before division scores are computed for the \maintrack{},
benchmarks with \emph{unknown status} are removed from the competition results
if two (or more) solvers that are sound on benchmarks with known status
disagree on their result.
%
Only the remaining benchmarks are used in the following computation of division
scores (but the organizers \emph{will report disagreements} for informational
purposes).
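The removal-of-disagreements step above can be sketched as follows (an illustrative reading of the rule; the data layout and all names are ours):

```python
def remove_disagreements(results, sound_solvers, status):
    """Return the benchmarks kept for division scoring (sketch).

    results:       dict benchmark -> dict solver -> answer
    sound_solvers: solvers with no errors on known-status benchmarks
    status:        dict benchmark -> "sat" | "unsat" | "unknown"
    """
    kept = []
    for b, answers in results.items():
        if status[b] == "unknown":
            sound = {answers[s] for s in sound_solvers if s in answers}
            if "sat" in sound and "unsat" in sound:
                continue  # sound solvers disagree: drop the benchmark
        kept.append(b)
    return kept
```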
%\header{Derived solver scoring} A derived solver $S^+$ can only win a division
%iff it outperforms its base solver $S$ by at least $10\%$ on the PAR-2 score, i.e.,
%$\mathit{PAR2}(S^+) > 1.10 * \mathit{PAR2}(S)$.

\subsubsection{Parallel Score}

The parallel score for a division is computed for \emph{all} tracks. It is
defined for a participating solver in a division with $M$ benchmarks as the sum
of all the individual parallel benchmark scores:
$$\sum_{b\in M} \langle e_b, n_b, \it{aw}_b, w_b, \it{ac}_b, c_b\rangle.$$

\noindent
A parallel division score $\langle
e, n, \mathit{aw}, w, \mathit{ac}, c\rangle$ is better than a parallel
division score\newline $\langle e', n', \mathit{aw}', w', \mathit{ac}', c'\rangle$
iff $e < e'$, or ($e = e'$ and $n
> n'$), or ($e = e'$ and $n = n'$ and $w < w'$), or ($e = e'$ and $n = n'$ and $w
= w'$ and $c < c'$). That is, fewer errors take precedence over more correct
solutions, which take precedence over less wall-clock time taken, which takes
precedence over less CPU time taken.


\subsubsection{PAR-2 Score}
The PAR-2 score for a solver in a division is its parallel division score
$\langle e, n, \mathit{aw}, w, \mathit{ac}, c \rangle$ where the individual
benchmark score components $w_b$ and $c_b$ are penalized by a factor of 2 for
unsolved benchmarks:

$$w_b = \begin{cases} \mathit{aw}_b & \text{if } e_b = 0 \text{ and benchmark } +b \text{ is solved} \\ 2T & \text{otherwise} \end{cases}$$
and analogously $c_b = \mathit{ac}_b$ if solved, $2mT$ otherwise.


\subsubsection{Sequential Score}

The sequential score for a division is computed for \emph{all} tracks
\emph{except} the \inctrack and \paralleltrack.\footnote{Since incremental track benchmarks may be partially
solved, defining a useful sequential performance for the incremental track
would require information not provided by the parallel performance, e.g.,
detailed timing information for each result.
Due to the nature of the
\paralleltrack, we will not consider sequential scores for it.} It is defined for a
participating solver in a division with $M$ benchmarks as the sum of all the
individual sequential benchmark scores:
$$\sum_{b\in M} \langle e_b^s, n_b^s, \mathit{aw}_b^s, w_b^s, \mathit{ac}_b^s, c_b^s\rangle.$$

\noindent
A sequential division score $\langle e^s, n^s, c^s\rangle$ is better than a
sequential division score $\langle e^{s'}, n^{s'}, c^{s'}\rangle$ iff
$e^s < e^{s'}$, or ($e^s = e^{s'}$ and $n^s > n^{s'}$), or ($e^s = e^{s'}$ and
$n^s = n^{s'}$ and $c^s < c^{s'}$).
That is, fewer errors take precedence over more correct solutions,
which take precedence over less CPU time taken.

We will not make any comparisons between parallel and sequential performances,
as these are intended to measure fundamentally different performance
characteristics.

\subsubsection{24-Second Score (\maintrack)}

The 24-second score for a division is computed for the \maintrack as the
parallel division score with a wall-clock time limit $T$ of 24 seconds.

\subsubsection{Sat Score (\maintrack)}

The sat score for a division is computed for the \maintrack as the
parallel division score when only satisfiable instances are considered.

\subsubsection{Unsat Score (\maintrack)}
The unsat score for a division is computed for the \maintrack as the
parallel division score when only unsatisfiable instances are considered.

\subsection{Competition-Wide Recognitions}

Between 2014 and 2018, the SMT competition used a competition-wide scoring
that emphasized the breadth of solver participation by summing up a score for
each (competitive) division a solver competed in. This was discontinued in 2019
in favor of \emph{biggest lead} and \emph{largest contribution} rankings
to avoid favoring solvers that entered a large number of divisions. The two
rankings have been used since.
In SMT-COMP 2025, we reintroduced the \emph{best overall ranking} alongside the
two rankings to showcase both the overall qualities of the solvers and
their strengths in individual divisions. This ranking is also used in SMT-COMP
2026.

\subsubsection{Best Overall Ranking}
This ranking aims to select the solver that is \emph{most universal}, i.e., that
solved the largest number of benchmarks after taking the division sizes into
account.

Let $\langle e^D, n^D, \mathit{aw}^D, w^D, \mathit{ac}^D, c^D \rangle$ be the
parallel division score for the given solver in division $D$ (for a given
scoring system, e.g., number of correct results or reduction). Let $N^D$ be the
total number of benchmarks in division $D$ that were used in the competition.
The normalized correctness score $\mathit{nn}^D$ of the solver in division $D$
is defined as
\[
  \mathit{nn}^D =
  \begin{cases}
    \left( \frac{n^D}{N^D} \right)^2, &\text{if }e^D = 0, \\
    -2, &\text{if }e^D > 0.
  \end{cases}
\]
The \emph{overall score}\footnote{\emph{Rationale:} % This metric purposely
  % emphasizes breadth of solver participation---a solver participating in many
  % logics need not be the best in any one of them. The metric is quite simple and
  % is independent of the performance of other solvers.
  %
  The goal of the square is to give some advantage to solvers
  that complete close to all benchmarks in a division. Therefore, a
  solver still needs to do reasonably well compared to the winners to be
  able to catch up purely by the breadth of participation, i.e., the
  number of logics it supports.

  The constant penalty for errors reflects the fact that any error in a
  particular division renders a solver untrustworthy for that division.
  Moreover, the value~$2$ subtracts the equivalent of two other completely solved
  divisions and balances the community's strong interest in reliable and
  thoroughly tested solvers against the risk of stifling innovation.
Entering a
  possibly buggy solver that can solve all benchmarks has a positive expected
  value only if the probability of some soundness bug in each of the divisions is
  below $\frac{1}{1+2} \approx 33$\,\%.

  The log scaling adjusts the scores for the wide variety of numbers of
  benchmarks in different divisions. It seems a reasonable compromise between
  linearly combining numbers of benchmarks, which would overweigh large
  divisions, and simply summing the fraction of benchmarks solved, which would
  overweigh small divisions.} of the solver is then the sum of
$\mathit{nn}^D \cdot \log_{10} N^D$ over all competitive divisions $D$ into
which the solver has entered. The solvers are ranked based on the overall score,
and ties are resolved using the CPU time in sequential scoring and the wall-clock
time in parallel scoring.

\subsubsection{Biggest Lead Ranking}

This ranking aims to select the solver that \emph{won by the most} in
some competitive division. The winners of each division are ranked by
the distance between them and the next competitive solver in that
division.

Let $n_i^D$ be the correctness score of the $i$th solver (for a given scoring
system, e.g., number of correct results or reduction) in division $D$.
The \emph{correctness rank} of division $D$ is given as
\[
\frac{n_1^D+1}{n_2^D+1}.
\]
Let $c_i^D$ be the CPU time score of the $i$th solver in division $D$.
The \emph{CPU time rank} of division $D$ is given as
\vspace{-1ex}
\[
\frac{c_2^D+1}{c_1^D+1}.
\]
Let $w_i^D$ be the wall-clock time score of the $i$th solver in division $D$.
The \emph{wall-clock time rank} of division $D$ is given as
\vspace{-1ex}
\[
\frac{w_2^D+1}{w_1^D+1}.
\]
The \emph{biggest lead winner} is the winner of the division with the highest
(largest) correctness rank. In case of a tie, the winner is determined as the
solver with the higher corresponding CPU (resp. wall-clock) time rank for
sequential (resp. parallel) scoring.
+
+This can be computed per scoring system.
+
+\subsubsection{Largest Contribution Ranking}
+
+This ranking aims to select the solver that \emph{uniquely
+contributed} the most in some division, or, put another way, the
+solver that would be most missed. This is achieved by computing a
+solver's contribution to the \emph{virtual best solver} for a
+division.
+
+Let $\langle e^s, n^s, \mathit{aw}^s, w^s, \mathit{ac}^s, c^s \rangle$ be the parallel division score
+for solver $s$ (for a given scoring system, i.e., $n$ is either the number of
+correct results or the reduction).
+If the division error score $e^s > 0$, then solver $s$ is considered unsound
+and excluded from the ranking.
+If the set of sound competitive solvers~$S$ in a division $D$ satisfies
+$|S| \leq 2$, the division is excluded from the ranking.
+
+Let $\langle e_b^s, n_b^s, \mathit{aw}_b^s, w_b^s, \mathit{ac}_b^s, c_b^s \rangle$ be the parallel benchmark
+score for benchmark $b$ and solver $s$ (for a given scoring system).
+The virtual best solver \emph{correctness score} for a division $D$ with
+competitive sound solvers $S$ is given as
+\[
+\mathit{vbss}_n(D,S) = \sum_{b \in D} {\sf max}\{ n_b^s \mid s \in S \text{ and } n_b^s > 0 \}
+\]
+where the maximum of an empty set is 0 (i.e., no contribution if a benchmark
+is unsolved).
+
+The virtual best solver \emph{CPU time score} $\mathit{vbss}_c$ and
+the virtual best solver \emph{wall-clock time score} $\mathit{vbss}_w$ for a
+division $D$ with competitive sound solvers $S$ are given as
+\[
+\mathit{vbss}_c(D,S) = \sum_{b \in D} {\sf min}\{ c_b^s \mid s \in S \text{ and } n_b^s > 0 \}
+\]
+\[
+\mathit{vbss}_w(D,S) = \sum_{b \in D} {\sf min}\{ w_b^s \mid s \in S \text{ and } n_b^s > 0 \}
+\]
+where the minimum of an empty set is 1200 seconds
+(no solver was able to solve the benchmark). 
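As a hypothetical illustration of the three virtual-best-solver scores (the `results` layout and all names are assumptions made for this sketch, not part of the rules):

```python
# Illustrative sketch (not part of the rules) of the three vbss scores.
# results[s][b] is assumed to hold the triple (n, c, w): the correctness
# score, CPU time, and wall-clock time of solver s on benchmark b.

TIMEOUT = 1200  # seconds; charged for the time scores when no solver solves b

def vbss(results, benchmarks):
    solvers = list(results)
    total_n = total_c = total_w = 0
    for b in benchmarks:
        # Only solvers that solved b (n_b^s > 0) contribute to the max/min.
        solved = [results[s][b] for s in solvers if results[s][b][0] > 0]
        if solved:
            total_n += max(n for n, _, _ in solved)
            total_c += min(c for _, c, _ in solved)
            total_w += min(w for _, _, w in solved)
        else:
            # Max over an empty set is 0; min over an empty set is the timeout.
            total_c += TIMEOUT
            total_w += TIMEOUT
    return total_n, total_c, total_w
```

Recomputing these totals with one solver removed from `results` yields the quantities compared in the contribution ranks defined next.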
+
+In other words, for the single query track,
+$\mathit{vbss}_c(D,S)$ and $\mathit{vbss}_w(D,S)$ are, respectively, the
+smallest amounts of CPU time and wall-clock time needed to solve all benchmarks
+solved in division $D$ using all sound competitive solvers in $S$.
+
+Let $S$ be the set of competitive solvers competing in division $D$.
+The \emph{correctness rank} $\mathit{vbss}_n$, the \emph{CPU time rank}
+$\mathit{vbss}_c$ and the \emph{wall-clock time rank} $\mathit{vbss}_w$
+of solver $s \in S$ in division $D$ are then defined as
+\[
+1 - \frac{\mathit{vbss}_n(D,S-s)}{\mathit{vbss}_n(D,S)}
+\hspace{3em}
+1 - \frac{\mathit{vbss}_c(D,S)}{\mathit{vbss}_c(D,S-s)}
+\hspace{3em}
+1 - \frac{\mathit{vbss}_w(D,S)}{\mathit{vbss}_w(D,S-s)}
+\]
+i.e., the relative difference in virtual best solver score when removing $s$
+from the computation.
+
+These ranks will be numbers between 0 and 1, with 0 indicating that $s$ made no
+impact on the \emph{vbss} and 1 indicating that $s$ is the only solver that
+solved anything in the division.
+%
+The ranks for a division $D$ in a given track will be normalized by multiplying
+them with $\frac{n_D}{N}$, where $n_D$ is the number of competitive
+solver/benchmark pairs in division $D$ and $N$ is the overall number of
+competitive solver/benchmark pairs of this track.
+
+The \emph{largest contribution winner} is the solver across all divisions with
+the highest (largest) normalized correctness rank. Again, this can be computed
+per scoring system. In case of a tie, the winner is determined as the solver
+with the higher corresponding normalized CPU (resp. wall-clock) time rank for
+sequential (resp. parallel) scoring.
+
+\subsection{Other Recognitions}
+
+The organizers will also recognize the following contributions:
+%
+\begin{itemize}
+\item \emph{New entrants}. 
All new entrants (to be interpreted by the organizers, but broadly a significantly new tool that has not competed in the competition before) that beat an existing solver in some division will be awarded special commendations.
+\item \emph{Benchmarks}. Contributors of new benchmarks used in the competition will receive a special mention.
+\end{itemize}
+%
+These recognitions will be announced at the SMT workshop and published on the competition website.
+The organizers reserve the right to recognize other outstanding
+contributions that become apparent in the competition results.
+
+\subsection{Derived Solver Scoring \& Recognition}
+A derived solver is eligible for winning a logic, track, or division only if it
+substantially outperforms its base solver in the given logic, track, or
+division, respectively. In particular, the derived solver is eligible for
+winning only if it achieves \emph{at least a $10\,\%$ improvement on the PAR-2
+  score over its base solver}. Derived solvers can win the biggest lead, largest
+contribution, and best overall ranking if they achieve \emph{at least a $10\,\%$
+  improvement on the PAR-2 score over their base solvers in at least one
+  division}.
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\section{Judging}
+
+The organizers reserve the right, with careful deliberation, to remove
+a benchmark from the competition results if it is determined that the
+benchmark is faulty (e.g., syntactically invalid in a way that affects
+some solvers but not others), and to clarify ambiguities in these
+rules that are discovered in the course of the competition. Authors
+of solver entrants may appeal to the organizers to request such
+decisions. Organizers that are affiliated with solver entrants will
+be recused from these decisions. The organizers' decisions are final. 
+
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+\section{Acknowledgments}
+
+SMT-COMP 2026 is organized under the direction of the SMT Steering
+Committee. The organizing team is
+%
+\begin{itemize}
+\setlength{\itemsep}{0pt}
+\item \href{https://wintered.github.io/}{Dominik Winterer}~-- University of Manchester, United Kingdom (chair)
+\item \href{https://www.fi.muni.cz/~xjonas/}{Martin Jon\'{a}\v{s}}~-- Masaryk University, Czechia
+\item \href{https://usi.to/bisk}{Tomáš Kolárik}~-- Università della Svizzera italiana, Switzerland
+\end{itemize}
+%
+The competition chairs are responsible for policy and procedure decisions,
+such as these rules, with input from the co-organizers.
+
+Many others have contributed benchmarks, effort, and feedback. Clark Barrett,
+Pascal Fontaine, Aina Niemetz, Mathias Preiner, and Hans-Jörg Schurr maintain
+the SMT-LIB benchmark library.
+The competition uses a \href{https://github.com/sosy-lab/benchexec}{BenchExec} cluster
+owned by LMU's Software and Computational Systems Lab (SoSy-Lab, \url{https://www.sosy-lab.org}),
+which is kind enough to support our competition with its computing power. Dirk
+Beyer and Philipp Wendler provide essential BenchExec support.
+
+\header{Disclosure.}
+Dominik Winterer has been conducting large-scale testing campaigns for discovering and preventing unsoundness issues in SMT solvers. Martin Jon\'{a}\v{s} is part of the development team of the SMT solver Q3B~\cite{Q3B} and was a member of the development team of the SMT solver MathSAT5~\cite{MathSAT5}. Tomáš Kolárik is a developer of the SMT solvers OpenSMT~\cite{opensmt2} and SMTS. 
+ +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\pagebreak +\bibliographystyle{plain} +\bibliography{biblio} + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\end{document} + +%% Local Variables: +%% mode: latex +%% mode: flyspell +%% ispell-local-dictionary: "american" +%% LocalWords: arity Heizmann logics Reger satisfiability SMT StarExec Tjark +%% End: