Run containers attempt 3 #320
Implements and tests `insert` and `insert_range` methods on runs.
This fixes some failing tests and adds some `#[allow(todo)]` and `#[allow(unused)]` attributes.
I fixed some bugs found by the fuzzer, but found the following:
But the docs in the croaring crate state:
As such, fuzzing fails because the Rust implementation (currently) returns true if anything was optimized. One thing to decide: should we follow croaring's 'return true if the bitmap contains any run' or 'return true if we changed anything'? I'm more in favor of the current (second) option, since it makes more sense. |
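To make the two candidate return-value semantics concrete, here is a minimal sketch on a toy model. `Kind`, `OptimizeReport`, and `optimize_in_place` are hypothetical names used only for illustration; they are not roaring-rs or CRoaring APIs, and the per-container "run is strictly smaller" decision is passed in as an assumption rather than computed.

```rust
/// Toy stand-in for a Roaring container; only the container type matters here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Array,
    Bitmap,
    Run,
}

/// Both candidate return values of an `optimize`-style call, side by side.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct OptimizeReport {
    /// "Return true if the bitmap contains any run" (first option).
    contains_run: bool,
    /// "Return true if we changed anything" (second option).
    changed: bool,
}

/// Hypothetical optimize pass: converts a container to `Run` only when the
/// caller says a run encoding is strictly smaller for that container.
fn optimize_in_place(kinds: &mut [Kind], run_is_smaller: &[bool]) -> OptimizeReport {
    let mut changed = false;
    for (kind, &smaller) in kinds.iter_mut().zip(run_is_smaller) {
        if smaller && *kind != Kind::Run {
            *kind = Kind::Run;
            changed = true;
        }
    }
    let contains_run = kinds.iter().any(|k| *k == Kind::Run);
    OptimizeReport { contains_run, changed }
}
```

The two flags disagree exactly when a run container already existed before the call and nothing else was converted, which is the kind of input where a fuzzer comparing return values across implementations would report a mismatch.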
Thank you, @lucascool12. I am also more in favor of the second option. I am willing to change the behavior; to do so, we will have to document it in the breaking release. Would you mind updating the main PR message to list the breaking changes, please? |
Hey @Dr-Emann and @lucascool12 👋 I was looking at the cargo fuzz discovery and trying to reproduce it on my laptop. However, I haven't found a way to provide the seed to Have a lovely week 🦆 |
I'm not sure how you can use the seed. But you can recreate the exact error from these lines:
You can use this command to recreate the crash file locally:
echo 'E7AI/1ZMKZMn7f/+/wMA//8ZmzvcTAEAFf+Tk5OTk531CJNh' | base64 -d > fuzz/artifacts/against_croaring/crash-5496d655c8cc97f820e887fd9ab710723de50c8f
Then you can run this exact crash using the command suggested in the logs:
cargo fuzz run --sanitizer=none against_croaring fuzz/artifacts/against_croaring/crash-5496d655c8cc97f820e887fd9ab710723de50c8f
Thank you, @lucascool12, it works. I investigated a bit, and both serializations deserialize to the same bitmap: they compare equal. I must investigate further to understand why the serialized bytes differ. I'd bet on either an empty container or a container with a different type 🤔 |
You can run with |
Nice investigation @Dr-Emann and thanks for the help. What tool do you use to analyze serialized bitmaps?
Maybe we can change the optimization condition so that roaring-rs decides to use a run container as well? |
I wrote a Kaitai Struct definition for the format (RoaringBitmap/RoaringFormatSpec#17) and used it on https://ide.kaitai.io/devel/#, which gives that nice object-tree view of the parsed file format. |
@Kerollmops I think there's some confusion (or I'm confused), I think all the discussions about changing behavior is for the I just opened an issue and a PR against croaring-rs, the docs were just wrong, its behavior exactly matches the CRoaring behavior (because it's a thin wrapper). FWIW, I don't have any strong opinions on the better meaning of the return value of the |
Wait, what I said was false:
That's not true, a run container is a This isn't correct, an array container only takes However, even if that's fixed, we still have a problem, because of this: when there is a point where two container types are equally valid, roaring-rs and CRoaring both default to leaving the container as is. In order to match serialization exactly in all cases, we would need to always create the same container types with the same operations. @lucascool12 do you think that's feasible? If not, we can relax the fuzzing to not require serializations to exactly match. |
Unless I misunderstand Roaring bitmaps, for any set of u16 a container will be either a bitmap or an array, where the choice between the two is based on the number of values, right? But run containers are allowed to replace both, depending on implementation details. Unless one calls As such, shouldn't the optimize and remove-runs methods always produce the same results (barring bugs)? |
I have implemented this based on CRoaring's implementation in eff381a. |
I think the important factor is that:
There are cases where a container can be represented equally efficiently as either a range, or an {array/bitmap}. Both implementations (correctly imo) default to leaving the existing container type when converting to/from a run container is not strictly more efficient. Therefore, in these cases, the result of e.g. for the Roaring Bitmap containing [0, 1, 2], it could be represented in two ways
So e.g. both implementations have to match on the result type of container for all operations for all container types, e.g. |
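The tie described above can be checked with a little arithmetic. The sketch below assumes the serialized payload sizes from the Roaring format: 2 bytes per value for an array container, a fixed 8192 bytes for a bitmap container, and 2 bytes for the run count plus 4 bytes per run for a run container. The helper names are made up for this example and are not part of roaring-rs or CRoaring.

```rust
/// Serialized size in bytes of an array container holding `cardinality` values.
fn array_size(cardinality: usize) -> usize {
    2 * cardinality
}

/// Serialized size in bytes of a bitmap container (2^16 bits).
const BITMAP_SIZE: usize = 8192;

/// Serialized size in bytes of a run container: a 2-byte run count
/// followed by a (start, length) pair of u16 per run.
fn run_size(n_runs: usize) -> usize {
    2 + 4 * n_runs
}

/// Counts maximal runs of consecutive values in a sorted, deduplicated slice.
fn count_runs(sorted: &[u16]) -> usize {
    if sorted.is_empty() {
        return 0;
    }
    // Every gap larger than 1 between neighbours starts a new run.
    1 + sorted.windows(2).filter(|w| w[1] - w[0] != 1).count()
}

/// Size of the cheaper of the two non-run encodings.
fn non_run_size(cardinality: usize) -> usize {
    array_size(cardinality).min(BITMAP_SIZE)
}

/// True when the run encoding and the best non-run encoding cost the same,
/// i.e. when "leave the container as is" depends on its history.
fn is_tie(sorted: &[u16]) -> bool {
    run_size(count_runs(sorted)) == non_run_size(sorted.len())
}
```

For [0, 1, 2] there is one run, so the run encoding costs 2 + 4 = 6 bytes and the array encoding costs 3 * 2 = 6 bytes: neither is strictly smaller, so the container type that survives depends on which operations produced it.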
I see. I don't think it is feasible to ensure we also produce runs in exactly the same situations as CRoaring. Relaxing the serialization comparison would be the best option we have. |
Couldn't we call |
I ran the fuzzer with the following patch applied on croaring-rs and found nothing after letting it run for 45 minutes. Yeey!
diff --git i/croaring-sys/CRoaring/roaring.c w/croaring-sys/CRoaring/roaring.c
index d49cda5..ba61acb 100644
--- i/croaring-sys/CRoaring/roaring.c
+++ w/croaring-sys/CRoaring/roaring.c
@@ -1494,7 +1494,7 @@ bool array_container_validate(const array_container_t *v, const char **reason);
* Return the serialized size in bytes of a container having cardinality "card".
*/
static inline int32_t array_container_serialized_size_in_bytes(int32_t card) {
- return card * 2 + 2;
+ return card * 2;
}
/** |
Hey @lucascool12 and @Dr-Emann 👋 I hope you're good 😊 I was wondering whether the last thing we need before merging this PR is for RoaringBitmap/CRoaring#702 to be merged? And if so, what's actually missing for it to be merged? Have a nice day 🥬 |
I noticed that Also I think we are all in favour of the current semantics of |
Did find something in fuzzing: Fuzz input
Base64:
Looking a bit closer at https://github.com/lucascool12/roaring-rs/blob/c3ebe863e377b58a0732f0ba27da13dc8a1b987f/fuzz/fuzz_targets/arbitrary_ops/mod.rs#L280-L282
x.run_optimize();
y.optimize();
assert_eq!(x.remove_run_compression(), y.remove_run_compression());
I don't think we can do that assert: If we've got a container that can be either a run or {array/bitmap}, the Think we could either just not check the return values, or we could use the |
So using a statistics call before and after and then checking no run containers exist? I tried adding |
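As a rough illustration of the relaxed check discussed here, the sketch below uses a toy container model rather than the real fuzz target. `Container`, `values`, `remove_runs`, and `run_count` are hypothetical stand-ins: `remove_runs` plays the role of `remove_run_compression`, and `run_count` plays the role of a statistics call that reports the number of run containers.

```rust
/// Toy container for illustration only; not roaring-rs's real types.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Container {
    /// Sorted, deduplicated values.
    Array(Vec<u16>),
    /// Inclusive `(start, end)` intervals, sorted and non-overlapping.
    Run(Vec<(u16, u16)>),
}

/// Expands a container into its sorted values.
fn values(c: &Container) -> Vec<u16> {
    match c {
        Container::Array(v) => v.clone(),
        Container::Run(runs) => runs.iter().flat_map(|&(s, e)| s..=e).collect(),
    }
}

/// Toy counterpart of `remove_run_compression`: rewrites every run container
/// as an array and reports whether anything changed.
fn remove_runs(containers: &mut [Container]) -> bool {
    let mut changed = false;
    for c in containers.iter_mut() {
        if matches!(*c, Container::Run(_)) {
            let expanded = values(c);
            *c = Container::Array(expanded);
            changed = true;
        }
    }
    changed
}

/// Toy counterpart of a statistics call: number of run containers left.
fn run_count(containers: &[Container]) -> usize {
    containers
        .iter()
        .filter(|c| matches!(**c, Container::Run(_)))
        .count()
}
```

Starting from a run container on one side and an equal-sized array container on the other, `remove_runs` returns `true` for the first and `false` for the second, so asserting that the return values match would fail. Asserting that both sides hold the same values and that `run_count` is zero afterwards still passes.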
This PR continues on #66. My main goal is to move each part of the original branch to the new project layout, e.g. the `run_store.rs` file, or whatever it should be called. Each commit will move one such piece of code and also add tests for it (and then fix any resulting bugs).
Example of such a commit: a57aff1