Skip to content

Conversation

@peternewman
Copy link
Contributor

Should this be targeted at main or develop?

Happy to hear other name suggestions, I've mirrored strlen for now.

I did wonder should I add push/pop at the same time (and possibly others...)

@dnmeid
Copy link
Member

dnmeid commented Feb 23, 2025

Here are my thoughts:
I think the name of strlen() is dated back to the time when variables had always and ever been strings and was chosen to make that more clear. I would keep strlen for legacy reasons and add a new generic function length(). That function length should work with strings (number of unicode graphemes), arrays (number of elements), JSON (number of properties). It could also work for integer numbers by returning the length of the string representation. We can then extend this if we introduce e.g. enums in the future.

As it is a feature that I think will not need super heaviy testing it should be targeted at main.

Feel free to add more useful functions.

Please write a little less minimalistic description in the getting started. Especially examples are very useful and the description needs to be suitable for non programmers. If you say a function "lenght" gives the lenght, that is not super useful. You need to explain more basic, like "count of elements".

@dnmeid
Copy link
Member

dnmeid commented Feb 26, 2025

@peternewman still there?

@peternewman
Copy link
Contributor Author

Here are my thoughts: I think the name of strlen() is dated back to the time when variables had always and ever been strings and was chosen to make that more clear. I would keep strlen for legacy reasons and add a new generic function length(). That function length should work with strings (number of unicode graphemes), arrays (number of elements), JSON (number of properties). It could also work for integer numbers by returning the length of the string representation. We can then extend this if we introduce e.g. enums in the future.

That sounds very sensible, I'll pivot to that.

As it is a feature that I think will not need super heaviy testing it should be targeted at main.

Great, I'll leave this targeted as is then.

Feel free to add more useful functions.

I realised pop would need to return both the modified array (or modify it in place) and the item that was popped, I'm not aware of other functions working like that currently. Obviously push would be pretty easy.

Please write a little less minimalistic description in the getting started. Especially examples are very useful and the description needs to be suitable for non programmers. If you say a function "lenght" gives the lenght, that is not super useful. You need to explain more basic, like "count of elements".

Sorry, lazy copy/paste of the strlen thing.

@peternewman still there?

Yeah sorry, just competing open source stuff and other commitments.

@dnmeid
Copy link
Member

dnmeid commented May 31, 2025

@peternewman What's the state of this PR? Currently I see some failing tests and conflicts. v4 is around the corner and I'd be happy to have this PR included.

@peternewman
Copy link
Contributor Author

@peternewman What's the state of this PR? Currently I see some failing tests and conflicts.

I've managed to resolve the conflicts.

I'm not sure what to do about the failing test though. As I mentioned in https://github.com/bitfocus/companion/pull/3317/files#r1981604888 it seems like it's a bug in unicode-segmenter or I'm not using it right or something? Do you have any suggestions?

v4 is around the corner and I'd be happy to have this PR included.

Yes, me too.

@peternewman peternewman changed the title feat: Add array length function, closes #3308 feat: Add generic length function, closes #3308 May 31, 2025
@Julusian
Copy link
Member

Julusian commented Jun 1, 2025

I've just had the thought, should this be considering unicode like this?
I ask because the other functions aren't which could lead to confusing results.

for example:
image

so if you are doing something that combines length and substr (or any other string function), that is going to give incredibly confusing and 'broken' behaviour.
A likely scenario, is the naive trim off the last character substr(my_str, 0, length(my_str)-1).

So I think that short term this should not consider multi character unicode.
And then a follow up can be done (I think its too late in 4.0 for this) which looks at all the functions and makes then consider unicode correctly. This could be a deep rabbithole, or could be pretty simple to achieve.

@dnmeid
Copy link
Member

dnmeid commented Jun 1, 2025

I've just had the thought, should this be considering unicode like this?

Yes, it should. That is a valid grapheme.

so if you are doing something that combines length and substr (or any other string function), that is going to give incredibly confusing and 'broken' behaviour.

substr is not unicode aware, so you should combine it with the correct counterpart strlen.
I think the correct counterpart for length is slice which we don't have yet.

@dnmeid
Copy link
Member

dnmeid commented Jun 2, 2025

@peternewman I found the issue with the count of 2
It is not a bug and we can keep this as a correct test result
The order of the two codepoints is wrong
U+0308 U+0061 should give 2 and
U+0061 U+0308 should give 1.
They are all looking the same, what makes it even more confusing considering that a grapheme should be what you see, but that is the unicode definition and the difference gets obvious when you add a regular char at the front. The combining mark has to be after the base char.

image

So you just need to adjust the test.

@Julusian
Copy link
Member

Julusian commented Jun 2, 2025

Ah, I forgot there was strlen already 🤦

Then my only request is that the docs should make it really clear which functions consider unicode correctly, and which ones are naive string versions

@peternewman
Copy link
Contributor Author

@peternewman I found the issue with the count of 2 It is not a bug and we can keep this as a correct test result The order of the two codepoints is wrong U+0308 U+0061 should give 2 and U+0061 U+0308 should give 1. They are all looking the same, what makes it even more confusing considering that a grapheme should be what you see, but that is the unicode definition and the difference gets obvious when you add a regular char at the front. The combining mark has to be after the base char.

Hmm, that feels a bit to me like the current rendering is broken (in the web-browser) and it should have shown it as ..a or that unicode-segmenter is broken, which doesn't sound like it from what you've said.

substr is not unicode aware, so you should combine it with the correct counterpart strlen. I think the correct counterpart for length is slice which we don't have yet.

This seems to imply that slice isn't fully unicode compatible:
https://stackoverflow.com/a/70303029

There was mention of a unicode substring type tool elsewhere.

I'm quite happy to implement slice if it actually works as we want with unicode.

I've just had the thought, should this be considering unicode like this? I ask because the other functions aren't which could lead to confusing results.

Then my only request is that the docs should make it really clear which functions consider unicode correctly, and which ones are naive string versions

I've attempted to improve on this. Let me know if we've got preferred terms for grapheme and byte that might be more user-friendly? I think I've caught all the other broken functions, but let me know if I've missed any.

@Julusian
Copy link
Member

Julusian commented Jun 2, 2025

This seems to imply that slice isn't fully unicode compatible:

our substr function is using slice internally, so Im pretty sure it isnt. But we could expose a method called slice (or whatever) that uses some other implementation internally.

@Julusian Julusian merged commit c70317d into bitfocus:main Jun 2, 2025
3 of 5 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Companion Plan Jun 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants