feat: Add generic length function, closes #3308 #3317

peternewman · 2025-02-23T22:05:30Z

Should this be targeted at main or develop?

Happy to hear other name suggestions, I've mirrored strlen for now.

I did wonder whether I should add push/pop at the same time (and possibly others...)

dnmeid · 2025-02-23T23:37:08Z

Here are my thoughts:
I think the name strlen() dates back to the time when variables were always strings, and it was chosen to make that clearer. I would keep strlen for legacy reasons and add a new generic function length(). That function should work with strings (number of Unicode graphemes), arrays (number of elements), and JSON (number of properties). It could also work for integers by returning the length of their string representation. We can then extend this if we introduce e.g. enums in the future.

As it is a feature that I think will not need super heavy testing, it should be targeted at main.

Feel free to add more useful functions.

Please write a less minimalistic description in the getting started guide. Examples especially are very useful, and the description needs to be suitable for non-programmers. Saying that a function "length" gives the length is not very useful. You need to explain in more basic terms, like "count of elements".
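
For readers following along, the behaviour proposed in this comment can be sketched roughly as below. This is only an illustration under stated assumptions, not the code merged in this PR: the names `genericLength` and `countGraphemes` are hypothetical, the built-in `Intl.Segmenter` stands in for whatever grapheme library the implementation uses, and returning 0 for undefined or null is a guess.

```typescript
// Hypothetical sketch of the proposed generic length(); not the merged implementation.
// Assumption: the built-in Intl.Segmenter is used for grapheme counting.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Count user-perceived characters (grapheme clusters) in a string.
function countGraphemes(text: string): number {
  return Array.from(segmenter.segment(text)).length;
}

function genericLength(value: unknown): number {
  // Assumption: missing values count as empty.
  if (value === undefined || value === null) return 0;
  // Strings: number of Unicode graphemes.
  if (typeof value === "string") return countGraphemes(value);
  // Arrays: number of elements.
  if (Array.isArray(value)) return value.length;
  // Objects (parsed JSON): number of own enumerable properties.
  if (typeof value === "object") return Object.keys(value).length;
  // Numbers and other primitives: length of the string representation.
  return countGraphemes(String(value));
}

console.log(genericLength("abc")); // 3
console.log(genericLength([10, 20, 30])); // 3
console.log(genericLength({ a: 1, b: 2 })); // 2
console.log(genericLength(1234)); // 4
```

Checking arrays before plain objects matters here, because `typeof []` is also `"object"`.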

dnmeid · 2025-02-26T12:38:20Z

@peternewman still there?

peternewman · 2025-02-26T13:07:43Z

> Here are my thoughts: I think the name strlen() dates back to the time when variables were always strings, and it was chosen to make that clearer. I would keep strlen for legacy reasons and add a new generic function length(). That function should work with strings (number of Unicode graphemes), arrays (number of elements), and JSON (number of properties). It could also work for integers by returning the length of their string representation. We can then extend this if we introduce e.g. enums in the future.

That sounds very sensible, I'll pivot to that.

> As it is a feature that I think will not need super heavy testing, it should be targeted at main.

Great, I'll leave this targeted as is then.

> Feel free to add more useful functions.

I realised pop would need to return both the modified array (or modify it in place) and the item that was popped; I'm not aware of other functions working like that currently. Obviously push would be pretty easy.

> Please write a less minimalistic description in the getting started guide. Examples especially are very useful, and the description needs to be suitable for non-programmers. Saying that a function "length" gives the length is not very useful. You need to explain in more basic terms, like "count of elements".

Sorry, lazy copy/paste of the strlen thing.

> @peternewman still there?

Yeah sorry, just competing open source stuff and other commitments.

dnmeid · 2025-05-31T12:57:06Z

@peternewman What's the state of this PR? Currently I see some failing tests and conflicts. v4 is around the corner and I'd be happy to have this PR included.

peternewman · 2025-05-31T23:21:10Z

> @peternewman What's the state of this PR? Currently I see some failing tests and conflicts.

I've managed to resolve the conflicts.

I'm not sure what to do about the failing test though. As I mentioned in https://github.com/bitfocus/companion/pull/3317/files#r1981604888, it seems like either it's a bug in unicode-segmenter or I'm not using it right. Do you have any suggestions?

> v4 is around the corner and I'd be happy to have this PR included.

Yes, me too.

Julusian · 2025-06-01T18:30:09Z

I've just had a thought: should this be considering unicode like this?
I ask because the other functions don't, which could lead to confusing results.

for example:

So if you are doing something that combines length and substr (or any other string function), that is going to give incredibly confusing and 'broken' behaviour.
A likely scenario is the naive trim of the last character: substr(my_str, 0, length(my_str)-1).

So I think that in the short term this should not consider multi-character unicode.
A follow-up can then be done (I think it's too late in 4.0 for this) which looks at all the functions and makes them consider unicode correctly. This could be a deep rabbit hole, or it could be pretty simple to achieve.
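
The mismatch described in this comment can be reproduced with a small sketch. It assumes a substr built on `String.prototype.slice` (which counts UTF-16 code units) and a grapheme-counting length built on the built-in `Intl.Segmenter`; `naiveSubstr` and `graphemeLength` are hypothetical stand-ins, not Companion's actual functions.

```typescript
// Hypothetical stand-ins illustrating the length/substr mismatch; not Companion's code.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Grapheme-aware length: counts user-perceived characters.
function graphemeLength(text: string): number {
  return Array.from(segmenter.segment(text)).length;
}

// Code-unit based substring, assumed to behave like String.prototype.slice.
function naiveSubstr(text: string, start: number, end: number): string {
  return text.slice(start, end);
}

// U+1F44D (thumbs up) is one grapheme but two UTF-16 code units.
const myStr = "\u{1F44D}ab";

console.log(graphemeLength(myStr)); // 3
console.log(myStr.length); // 4

// The "trim off the last character" idiom mixes the two units of measure:
const trimmed = naiveSubstr(myStr, 0, graphemeLength(myStr) - 1);
console.log(trimmed === "\u{1F44D}a"); // false: only the emoji survives
console.log(trimmed === "\u{1F44D}"); // true
```

The end index of 2 is measured in graphemes, but slice interprets it as code units, so both trailing letters are dropped instead of one.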

dnmeid · 2025-06-01T23:58:25Z

> I've just had a thought: should this be considering unicode like this?

Yes, it should. That is a valid grapheme.

> So if you are doing something that combines length and substr (or any other string function), that is going to give incredibly confusing and 'broken' behaviour.

substr is not unicode aware, so you should combine it with its correct counterpart, strlen.
I think the correct counterpart for length is slice, which we don't have yet.

dnmeid · 2025-06-02T00:41:32Z

@peternewman I found the issue with the count of 2.
It is not a bug, and we can keep this as a correct test result.
The order of the two codepoints is wrong:
U+0308 U+0061 should give 2, and
U+0061 U+0308 should give 1.
They all look the same, which makes it even more confusing considering that a grapheme should be what you see, but that is the Unicode definition, and the difference becomes obvious when you add a regular character at the front. The combining mark has to come after the base character.

So you just need to adjust the test.
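
The ordering rule can be checked with a short sketch. It uses the built-in `Intl.Segmenter` as a stand-in for the unicode-segmenter package discussed in this thread (an assumption; both are expected to follow the Unicode grapheme cluster rules), and `countGraphemes` is a hypothetical helper name.

```typescript
// Sketch using the built-in Intl.Segmenter; the PR itself discusses unicode-segmenter.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function countGraphemes(text: string): number {
  return Array.from(segmenter.segment(text)).length;
}

// Combining diaeresis first: the mark has no base character to attach to.
const markFirst = "\u0308\u0061";
// Base character first: the mark combines with the preceding "a".
const baseFirst = "\u0061\u0308";

console.log(countGraphemes(markFirst)); // 2
console.log(countGraphemes(baseFirst)); // 1

// Adding a regular character in front shows where the mark really attaches:
// in "x" + markFirst the diaeresis joins the "x", leaving "a" on its own.
const clusters = Array.from(segmenter.segment("x" + markFirst), (s) => s.segment);
console.log(clusters.length); // 2
console.log(clusters[0] === "x\u0308"); // true
```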

Julusian · 2025-06-02T07:54:54Z

Ah, I forgot there was strlen already 🤦

Then my only request is that the docs should make it really clear which functions consider unicode correctly, and which ones are naive string versions.

peternewman · 2025-06-02T15:05:36Z

> @peternewman I found the issue with the count of 2. It is not a bug, and we can keep this as a correct test result. The order of the two codepoints is wrong: U+0308 U+0061 should give 2, and U+0061 U+0308 should give 1. They all look the same, which makes it even more confusing considering that a grapheme should be what you see, but that is the Unicode definition, and the difference becomes obvious when you add a regular character at the front. The combining mark has to come after the base character.

Hmm, to me that feels like either the current rendering (in the web browser) is broken and it should have shown it as ..a, or unicode-segmenter is broken, which doesn't sound like the case from what you've said.

> substr is not unicode aware, so you should combine it with its correct counterpart, strlen. I think the correct counterpart for length is slice, which we don't have yet.

This seems to imply that slice isn't fully unicode compatible:
https://stackoverflow.com/a/70303029

There was mention of a unicode substring type tool elsewhere.

I'm quite happy to implement slice if it actually works as we want with unicode.

> I've just had a thought: should this be considering unicode like this? I ask because the other functions don't, which could lead to confusing results.

> Then my only request is that the docs should make it really clear which functions consider unicode correctly, and which ones are naive string versions.

I've attempted to improve on this. Let me know if we've got preferred terms for grapheme and byte that might be more user-friendly. I think I've caught all the other broken functions, but let me know if I've missed any.

Julusian · 2025-06-02T19:02:09Z

> This seems to imply that slice isn't fully unicode compatible:

Our substr function uses slice internally, so I'm pretty sure it isn't. But we could expose a method called slice (or whatever) that uses some other implementation internally.
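
One possible shape for such a method is sketched below. This is purely illustrative and not something implemented in this PR: `graphemeSlice` is a hypothetical name, and the built-in `Intl.Segmenter` is assumed as the segmentation backend.

```typescript
// Hypothetical grapheme-aware slice; a sketch, not part of this PR.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Slice by grapheme index rather than UTF-16 code unit index.
// Negative indices count from the end, mirroring Array.prototype.slice.
function graphemeSlice(text: string, start: number, end?: number): string {
  const graphemes = Array.from(segmenter.segment(text), (s) => s.segment);
  return graphemes.slice(start, end).join("");
}

// Dropping the last user-perceived character now works even after an emoji.
console.log(graphemeSlice("\u{1F44D}ab", 0, -1) === "\u{1F44D}a"); // true

// A base letter plus combining mark is kept together as one unit.
console.log(graphemeSlice("a\u0308bc", 0, 1) === "a\u0308"); // true
```

Splitting the whole string into graphemes on every call is simple but linear in the string length; whether that matters for expression evaluation is left open here.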

feat: Add array length function, closes bitfocus#3308

355661d

peternewman added 6 commits February 26, 2025 14:36

Try and switch to new behaviour

b87f922

Handle undefined variables in length function

7eecaef

Add some JSON tests to the length function too

b8c20ba

Handle objects within length function too

c4c333b

chore: Fix prettier

f00633c

chore: Add a decimal value and a negative number to the length tests

d6b668f

dnmeid reviewed Feb 26, 2025

shared-lib/test/expressions-functions.test.ts

dnmeid reviewed Feb 26, 2025

shared-lib/test/expressions-functions.test.ts

dnmeid reviewed Feb 26, 2025

shared-lib/lib/Expression/ExpressionFunctions.ts

peternewman added 6 commits February 26, 2025 21:10

Fix @dnmeid's comments/suggestions and add more tests

e8e43cf

Handle RegExp and prettier format

4babc0b

chore: Try and fix the RegExp test

89d945a

Fix some of the simpler edge case behaviour

c1d2435

Properly count UTF graphemes too

c5425bd

Add missing entry to yarn.lock

983f997

Julusian added this to Companion Plan May 31, 2025

github-project-automation bot moved this to In Progress in Companion Plan May 31, 2025

Julusian added this to the v4.0 milestone May 31, 2025

peternewman mentioned this pull request May 31, 2025

feat: Add length function, closes #3308 #3445

Closed

Merge branch 'main' into peternewman-array-length

ac650f2

chore: Prettier

2e32088

peternewman changed the title ~~feat: Add array length function, closes #3308~~ feat: Add generic length function, closes #3308 May 31, 2025

Unicode tests and grapheme orders for length function

4f3acb5

Julusian merged commit c70317d into bitfocus:main Jun 2, 2025
3 of 5 checks passed

github-project-automation bot moved this from In Progress to Done in Companion Plan Jun 2, 2025

3 participants