Add split and split_once to bit_array module #803
ackurat wants to merge 6 commits into gleam-lang:main
lpil left a comment:
Thank you! I've left some notes inline.
            [A, B] -> {ok, {A, B}};
            _ -> {error, nil}
        end
    catch error:badarg -> {error, nil}

It catches this case, where Erlang would raise.
Now that I've read about the Erlang implementation again, I realize it can also raise a nif_error, so maybe the catch should be more generic to avoid raising altogether.

      try {
        const patternEmpty = pattern.buffer.length < 1
        const patternLongerThanBits = pattern.buffer.length >= bits.buffer.length
        const incorrectArguments = !(bits instanceof BitArray) || !(pattern instanceof BitArray)

    export function bit_array_split_once(bits, pattern) {
      try {
        const patternEmpty = pattern.buffer.length < 1
        const patternLongerThanBits = pattern.buffer.length >= bits.buffer.length

This may not be the length of the bit array itself.
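
As a hedged illustration of how a byte count can differ from a bit array's real length, here is a minimal sketch. `SketchBitArray` and its `bitSize` field are hypothetical stand-ins made up for this example, not the actual class from the Gleam prelude; the only real APIs used are `Uint8Array` and `ArrayBuffer`.

```javascript
// Hypothetical stand-in for a bit array: whole bytes plus an explicit bit count.
class SketchBitArray {
  constructor(buffer, bitSize) {
    this.buffer = buffer; // Uint8Array holding the backing bytes
    this.bitSize = bitSize; // number of meaningful bits (assumed field)
  }
}

// 12 meaningful bits still occupy 2 whole backing bytes.
const unaligned = new SketchBitArray(Uint8Array.of(0xab, 0xc0), 12);
const bitsFromBytes = unaligned.buffer.length * 8;
console.log(bitsFromBytes, unaligned.bitSize); // prints "16 12"

// A Uint8Array can also be a window into a larger allocation, so the
// size of the underlying ArrayBuffer is not the size of the view.
const backing = new Uint8Array(8);
const view = backing.subarray(2, 5);
console.log(view.length, view.buffer.byteLength); // prints "3 8"
```

Under these assumptions, any length check based on bytes alone would treat the 12-bit value as if it were 16 bits long.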

        }

        return new Error(Nil);
      } catch (e) {

I don't remember if it catches a specific case. I'll check and remove it if not.

    /// // -> Error(Nil)
    /// ```
    @external(erlang, "gleam_stdlib", "bit_array_split_once")
    @external(javascript, "../gleam_stdlib.mjs", "bit_array_split_once")

Why not implement this in Gleam rather than Erlang? Could be a bunch nicer, and we wouldn't need to use any private APIs which should not be used.
Because of this comment: #629 (comment)
I can take a stab at implementing it in Gleam if you'd prefer. The Erlang binary:split is a BIF, so performance-wise it makes sense to use it.
Sorry, I made a typo here. I meant to say JavaScript rather than Erlang 😅

        for (let j = 0; j < pattern.buffer.length; j++) {
          if (bits.buffer[i + j] !== pattern.buffer[j]) {
            continue find;
          }

This looks like quite an expensive algorithm: it checks bytes multiple times even when we know they could not match.
There are a few established algorithms we could use: https://en.wikipedia.org/wiki/String-searching_algorithm. Boyer–Moore–Horspool seems fairly straightforward, but the two-way algorithm seems to be the most popular approach.
Yes, it's a naive approach. I'll have a go at one of the more efficient algorithms.
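
For reference, a minimal sketch of the Boyer–Moore–Horspool search mentioned above, written against plain `Uint8Array` values. The function name `indexOfBytes` is hypothetical, and treating an empty pattern as "no match" is an assumption of this sketch, not something the PR specifies.

```javascript
// Boyer–Moore–Horspool search over bytes (illustrative sketch only).
// Returns the index of the first occurrence of `pattern` in `haystack`,
// or -1 when there is none.
function indexOfBytes(haystack, pattern) {
  const n = haystack.length;
  const m = pattern.length;
  // Assumption: an empty pattern never matches.
  if (m === 0 || m > n) return -1;

  // Shift table: how far the window may slide, keyed by the byte that
  // is currently aligned with the last position of the pattern.
  const shift = new Array(256).fill(m);
  for (let i = 0; i < m - 1; i++) {
    shift[pattern[i]] = m - 1 - i;
  }

  let pos = 0;
  while (pos <= n - m) {
    // Compare the window against the pattern from right to left.
    let j = m - 1;
    while (j >= 0 && haystack[pos + j] === pattern[j]) j--;
    if (j < 0) return pos;
    // Skip ahead based on the last byte of the current window.
    pos += shift[haystack[pos + m - 1]];
  }
  return -1;
}

console.log(indexOfBytes(Uint8Array.of(1, 2, 3, 4, 5), Uint8Array.of(3, 4))); // prints 2
console.log(indexOfBytes(Uint8Array.of(1, 2, 3), Uint8Array.of(9))); // prints -1
```

The shift table lets the window jump by up to the pattern length after a mismatch, rather than advancing one byte at a time as the nested-loop version does.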
Hello,
This PR solves issue #568 and also adds a split function. Both split and split_once function as their counterparts in the gleam/string module, and have feature parity across the targets.

There are two older PRs (#571, #629) which address this, but they seem to be outdated/abandoned.

The split function could take options to give it feature parity with Erlang's binary:split/3, but I opted for a simpler version here. Let me know if you'd like me to add the options feature.
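
To make the intended behaviour concrete, here is a rough sketch of the split-once and split-all semantics over plain `Uint8Array` values. All names (`findBytes`, `splitOnce`, `split`) are hypothetical, `null` stands in for an error result, and the handling of an empty pattern is an assumption of this sketch rather than the PR's documented behaviour. The naive search is used only to keep the example short.

```javascript
// Naive byte search starting at `from`; returns an index or -1.
function findBytes(bits, pattern, from) {
  outer: for (let i = from; i + pattern.length <= bits.length; i++) {
    for (let j = 0; j < pattern.length; j++) {
      if (bits[i + j] !== pattern[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// Split at the first occurrence only; null when the pattern is absent.
function splitOnce(bits, pattern) {
  if (pattern.length === 0) return null; // assumption: empty pattern is an error
  const at = findBytes(bits, pattern, 0);
  if (at === -1) return null;
  return [bits.subarray(0, at), bits.subarray(at + pattern.length)];
}

// Split at every occurrence.
function split(bits, pattern) {
  if (pattern.length === 0) return [bits]; // assumption: nothing to split on
  const parts = [];
  let start = 0;
  let at;
  while ((at = findBytes(bits, pattern, start)) !== -1) {
    parts.push(bits.subarray(start, at));
    start = at + pattern.length;
  }
  parts.push(bits.subarray(start));
  return parts;
}

const input = Uint8Array.of(97, 44, 98, 44, 99); // the bytes of "a,b,c"
const comma = Uint8Array.of(44);
console.log(splitOnce(input, comma).map((p) => Array.from(p))); // [ [ 97 ], [ 98, 44, 99 ] ]
console.log(split(input, comma).map((p) => Array.from(p))); // [ [ 97 ], [ 98 ], [ 99 ] ]
```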