Replies: 1 comment
I like this idea! I believe it would be better if we could be more precise.
Maybe we can come up with an "ontology" of how benchmarks relate. By "ontology" I mean something like the SZS ontology for solver return values. We don't need to go into that much detail, but we could define some common relations.
Some problems can be encoded in various ways, resulting in, e.g., many possible logics. For example, an LRA problem combined with traversing a graph can result in combinations such as LRA, LIRA, BVLRA, UFLRA, etc. Such encodings can result in quite different performance across solvers. Thanks to @hansjoergschurr, we have a database that may allow us to easily filter related benchmarks and compare their performance.
Generally speaking, different encodings are also possible within the same logic, while still potentially affecting performance. Not only for filtering, but also for the sake of keeping track of where the benchmarks come from, it would be reasonable to include explicit metadata in such similar benchmarks.
For example, say that an existing benchmark is encoded in LRA, and one wants to submit an alternative encoding of the same benchmark in LIRA. That would require adding a field such as a `derived-from` reference to the original benchmark.
This would require a unique identifier for the benchmark. A straightforward possibility is to use the relative path of the benchmark used in the repository, including the family name etc.
For example, given a benchmark `non-incremental/QF_LRA/AddamsFamily/bench1.smt2`, its derived benchmark `non-incremental/QF_LIRA/AddamsFamily/bench1.smt2` would include such a field, assuming that the derived benchmarks always stay in the same track.
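In SMT-LIB syntax, one way to express this would be a `set-info` attribute in the derived benchmark; the keyword name below is only a sketch, not an agreed-upon convention:

```smt2
; Hypothetical metadata field; the attribute name is a sketch, not a standard.
(set-info :derived-from "non-incremental/QF_LRA/AddamsFamily/bench1.smt2")
```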
The above is a bit verbose (only the logic differs); I am not sure about suitable abbreviations.
A "derived-from" relationship probably does not cover all possibilities of "similar" benchmarks, but I do not have a better suggestion at the moment.
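Given such a field, the database mentioned above could group related benchmarks and compare their performance across encodings. A minimal sketch in Python, where the record layout and timing values are purely illustrative:

```python
from collections import defaultdict

# Hypothetical records: (path, derived_from, solve_time_seconds).
# The schema and numbers are illustrative, not the actual database contents.
benchmarks = [
    ("non-incremental/QF_LRA/AddamsFamily/bench1.smt2", None, 1.2),
    ("non-incremental/QF_LIRA/AddamsFamily/bench1.smt2",
     "non-incremental/QF_LRA/AddamsFamily/bench1.smt2", 3.4),
]

def group_by_origin(records):
    """Group benchmarks by their origin: either the benchmark they
    derive from, or themselves if they are not derived."""
    groups = defaultdict(list)
    for path, derived_from, solve_time in records:
        groups[derived_from or path].append((path, solve_time))
    return dict(groups)

groups = group_by_origin(benchmarks)
for origin, encodings in groups.items():
    print(origin, "->", len(encodings), "encoding(s)")
```

Note that this only follows one level of derivation; chains of derived benchmarks would need a transitive closure, which is another argument for keeping the relation simple.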
What do you think?
@hansjoergschurr @mpreiner @aehyvari @bobot