-
-
Notifications
You must be signed in to change notification settings - Fork 217
Add split and split_once to bit_array module #803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
8536d3c
14b8688
2712885
f34f065
02a4db6
919f651
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -11,7 +11,8 @@ | |
| int_from_base_string/2, utf_codepoint_list_to_string/1, contains_string/2, | ||
| crop_string/2, base16_encode/1, base16_decode/1, string_replace/3, slice/3, | ||
| bit_array_to_int_and_size/1, bit_array_pad_to_bytes/1, index/2, list/5, | ||
| dict/1, int/1, float/1, bit_array/1, is_null/1 | ||
| dict/1, int/1, float/1, bit_array/1, is_null/1, bit_array_split/2, | ||
| bit_array_split_once/2 | ||
| ]). | ||
|
|
||
| %% Taken from OTP's uri_string module | ||
|
|
@@ -154,6 +155,21 @@ bit_array_slice(Bin, Pos, Len) -> | |
| catch error:badarg -> {error, nil} | ||
| end. | ||
|
|
||
| bit_array_split_once(Bin, Sub) -> | ||
| try | ||
| case binary:split(Bin, [Sub]) of | ||
| [<<>>, <<>>] -> {error, nil}; | ||
| [A, B] -> {ok, {A, B}}; | ||
| _ -> {error, nil} | ||
| end | ||
| catch error:badarg -> {error, nil} | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is this try-catch for?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It catches this case, where Erlang would raise. Now that I reread about the Erlang implementation I realize that it can also raise a |
||
| end. | ||
|
|
||
| bit_array_split(Bin, Sub) -> | ||
| try {ok, binary:split(Bin, [Sub], [global, trim_all])} | ||
| catch error:badarg -> {error, nil} | ||
| end. | ||
|
|
||
| base64_decode(S) -> | ||
| try {ok, base64:decode(S)} | ||
| catch error:_ -> {error, nil} | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -451,6 +451,80 @@ export function bit_array_slice(bits, position, length) { | |
| return new Ok(bitArraySlice(bits, start * 8, end * 8)); | ||
| } | ||
|
|
||
| export function bit_array_split_once(bits, pattern) { | ||
| try { | ||
| const patternEmpty = pattern.buffer.length < 1 | ||
| const patternLongerThanBits = pattern.buffer.length >= bits.buffer.length | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This may not be the length of the bit array itself. |
||
| const incorrectArguments = !(bits instanceof BitArray) || !(pattern instanceof BitArray) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No runtime type checking please 🙏 |
||
| if (incorrectArguments || patternEmpty || patternLongerThanBits) { | ||
| return new Error(Nil); | ||
| } | ||
|
|
||
| const n = bits.buffer.length - pattern.buffer.length + 1; | ||
| find: for (let i = 0; i < n; i++) { | ||
| for (let j = 0; j < pattern.buffer.length; j++) { | ||
| if (bits.buffer[i + j] !== pattern.buffer[j]) { | ||
| continue find; | ||
| } | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This looks like quite an expensive algorithm, it is checking bytes multiple times even when we know they could not match. There's a few established algorithms we could use https://en.wikipedia.org/wiki/String-searching_algorithm. Boyer–Moore–Horspool seems fairly straightforward, but two-way algorithm seems to be the most popular approach.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, it's a naive approach. I'll have a go at one of the more efficient algorithms. |
||
| } | ||
| const before = bits.buffer.slice(0, i); | ||
| const after = bits.buffer.slice(i + pattern.buffer.length); | ||
| return new Ok([new BitArray(before), new BitArray(after)]); | ||
| } | ||
|
|
||
| return new Error(Nil); | ||
| } catch (e) { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What's this try-catch for?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't remember if it catches a specific case. I'll check and remove it if not. |
||
| return new Error(Nil); | ||
| } | ||
| } | ||
|
|
||
| export function bit_array_split(bits, pattern) { | ||
| try { | ||
| const patternEmpty = pattern.buffer.length < 1 | ||
| const incorrectArguments = !(bits instanceof BitArray) || !(pattern instanceof BitArray) | ||
| if (incorrectArguments || patternEmpty) { | ||
| return new Error(Nil); | ||
| } | ||
|
|
||
| const bitsShorter = bits.buffer.length < pattern.buffer.length | ||
| if (bitsShorter) { | ||
| return new Ok(List.fromArray([bits])) | ||
| } | ||
|
|
||
| const results = []; | ||
| let lastIndex = 0; | ||
| const n = bits.buffer.length - pattern.buffer.length + 1; | ||
|
|
||
| find: for (let i = 0; i < n; i++) { | ||
| for (let j = 0; j < pattern.buffer.length; j++) { | ||
| if (bits.buffer[i + j] !== pattern.buffer[j]) { | ||
| continue find; | ||
| } | ||
| } | ||
|
|
||
| const bitsEqualsPattern = bits.buffer.length === pattern.buffer.length | ||
| if (bitsEqualsPattern) { | ||
| return new Ok(List.fromArray([])); | ||
| } | ||
|
|
||
| if (i > lastIndex) { | ||
| results.push(new BitArray(bits.buffer.slice(lastIndex, i))); | ||
| } | ||
|
|
||
| lastIndex = i + pattern.buffer.length; | ||
| i = lastIndex - 1; | ||
| } | ||
|
|
||
| if (lastIndex < bits.buffer.length) { | ||
| results.push(new BitArray(bits.buffer.slice(lastIndex))); | ||
| } | ||
|
|
||
| return new Ok(List.fromArray(results)) | ||
| } catch (e) { | ||
| return new Error(Nil); | ||
| } | ||
| } | ||
|
|
||
| export function codepoint(int) { | ||
| return new UtfCodepoint(int); | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not implement this in Gleam rather than Erlang? Could be a bunch nicer, and we wouldn't need to use any private APIs which should not be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of this comment: #629 (comment)
I can give a stab at implementing it in Gleam if you'd prefer it. The Erlang
binary:splitis a BIF so performance-wise it makes sense to use it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I made a typo here. I meant to say JavaScript rather than Erlang 😅