Skip to content

Make ScmFunction lookup 1-based#550

Merged
x87 merged 3 commits into
masterfrom
fix_scmfunction_0
Mar 13, 2026
Merged

Make ScmFunction lookup 1-based#550
x87 merged 3 commits into
masterfrom
fix_scmfunction_0

Conversation

@x87
Copy link
Copy Markdown

@x87 x87 commented Mar 10, 2026

  • Fixes an issue when the script uses return on the root level using a default ScmFunction Id (0).
{$CLEO}
script_name {name} "A"

foo()

debug_on
trace "after foo"
terminate_this_custom_script

function foo()
    while true
        wait {time} 0
        print_formatted_now {format} "inside foo" {time} 50
    end
end
{$CLEO}
script_name {name} "B"

wait {time} 5000
cleo_return 0

terminate_this_custom_script

Without fix:

"After foo" is printed, meaning script B is executing code from script A, plus destroying its call stack.

With fix:

Nothing is printed

@x87 x87 requested review from MiranDMC March 10, 2026 19:29
Copy link
Copy Markdown
Collaborator

@MiranDMC MiranDMC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is zero error checks for scenarios outside normal working conditions.

@x87
Copy link
Copy Markdown
Author

x87 commented Mar 10, 2026

Storage validates max capacity, it is the only thing it should care. Anything else is validated on the caller side.

Get should not throw errors, because it can be used, e.g. for counting of used/free spots.

@MiranDMC
Copy link
Copy Markdown
Collaborator

Storage validates max capacity, it is the only thing it should care. Anything else is validated on the caller side.

Get should not throw errors, because it can be used, e.g. for counting of used/free spots.

Then implement checks in other places. I'm bored of wasting time troubleshooting crashes that could be prevented by basic coding discipline.
Apparently you are not using CLEO to anything so problems like that do not affect you.

@x87
Copy link
Copy Markdown
Author

x87 commented Mar 10, 2026

Does this MR fix the issue with the script in the first post? What other crashes you think of?

@MiranDMC
Copy link
Copy Markdown
Collaborator

Storage validates max capacity, it is the only thing it should care. Anything else is validated on the caller side.

Get should not throw errors, because it can be used, e.g. for counting of used/free spots.

It should be method of the class then. Nobody has businesses to do things like that from outside.

@MiranDMC
Copy link
Copy Markdown
Collaborator

Does this MR fix the issue with the script in the first post? What other crashes you think of?

Any situation where invalid input arguments are used. Referencing not allocated/already deleted function ID is sign of serious problems. Currently situations like that are silently ignored, or some other function is returned if it happens to exists at that slot.

@x87
Copy link
Copy Markdown
Author

x87 commented Mar 10, 2026

Does this MR fix the issue with the script in the first post? What other crashes you think of?

Any situation where invalid input arguments are used. Referencing not allocated/already deleted function ID is sign of serious problems. Currently situations like that are silently ignored, or some other function is returned if it happens to exists at that slot.

I'm not fixing imaginary issues. I spent last 6 months tirelessly fixing concrete, difficult bugs, caused sometimes by careless refactoring. If you have a specific scenario, please share it. Talk is cheap.

Comment thread source/ScmFunction.cpp Outdated
{
ScmFunction* ScmFunction::store[Store_Size] = {0};
size_t ScmFunction::allocationPlace = 0;
const size_t Default_Index = 1;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two year of refactoring and you adding stuff like this. Why not declared in header file. Why it is member of CLEO namespace?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it is class related property. Single look at header gives idea about general contents of the entire class.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

constants belong to the logic in my book. why would I need to open header when I have my cpp file open? It's only used here

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how spaghetti is created. You either look at the header and have general idea what it is all about or spend time analyzing every function inside cpp file.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not very convincing, but ok

@MiranDMC
Copy link
Copy Markdown
Collaborator

MiranDMC commented Mar 10, 2026

Technically fixes the issue but entire solution is so lazy it is even below vibe-coding quality.
If it turn out for example the MoonLoader scripts has field SCMFunction initialized with value 13, then are you going to make slot 13 unused too?

I would give approval but I'm not looking forward for rebaseing my branch.

@x87
Copy link
Copy Markdown
Author

x87 commented Mar 10, 2026

Technically fixes the issue but entire solution is so lazy it is even below vibe-coding quality. If it turn out for example the MoonLoader scripts has field SCMFunction initialized with value 13, then are you going to make slot 13 unused too?

I would give approval but I'm not looking forward for rebaseing my branch.

lol.

Ok, show me how your BULLETPROOF coding solution helps to catch errors in this case

{$CLEO}
script_name {name} "A"

foo()

debug_on
trace "after foo"
terminate_this_custom_script

function foo()
    while true
        wait {time} 0
        print_formatted_now {format} "inside foo" {time} 50
    end
end
{$CLEO}
script_name {name} "B"

wait {time} 5000

40@ = 257 // ONLY THIS LINE ADDED TO ORIGINAL EXAMPLE
cleo_return 0

terminate_this_custom_script

@x87 x87 mentioned this pull request Mar 11, 2026
@MiranDMC
Copy link
Copy Markdown
Collaborator

MiranDMC commented Mar 11, 2026

Ok, show me how your BULLETPROOF coding solution helps to catch errors in this case

On master branch your example only allocates function with index 0 then tries to use index 1.

Updated the script:

{$CLEO}
script_name {name} "A"

foo(1)

debug_on
trace "after foo"
terminate_this_script

function foo(depth: int)
    if
        depth <= 0
    then
        while not is_key_just_pressed {keyCode} KeyCode.Num1
            wait {time} 0
            print_formatted_now {format} "Inside foo. 1 to exit" {time} 50
        end
    else
        depth--
        foo(depth)
    end
end

Key breaking infinite loop was added.

As previously "after foo" is executed by script B, but then after pressing 1 button error is shown:
image
It claims the problem is user scripting error in script A.

Meanwhile on the version from #537 error is thrown the moment B script calls return:
image
Reason is actually the script log trying to query current stack depth.

After disabling script logging error is thrown also the moment script B tries to return:
image
Bit unexpected is fact B is suspended but pressing 1 then causes
image

@MiranDMC
Copy link
Copy Markdown
Collaborator

MiranDMC commented Mar 11, 2026

{$CLEO}
int ptr, func
script_name {name} "B"

40@ = 0x100

ptr = get_this_script_struct
func = read_memory_with_offset {address} ptr {offset} 0xDD {size} 2 // CRunningScript::ScmFunction
debug_on
trace "ScmFunction is now %d" func

wait {time} 5000

cleo_return 0

trace "B not suspended"
terminate_this_script

Updated B script

@x87
Copy link
Copy Markdown
Author

x87 commented Mar 11, 2026

'Introduction to Debugging 101' by Miran:

ignore the original code and write completely new, unrelated one from scratch ✅

@x87
Copy link
Copy Markdown
Author

x87 commented Mar 11, 2026

I used version from #537 and it does not produce any errors in the code I posted. But it shows "after foo".

@MiranDMC
Copy link
Copy Markdown
Collaborator

MiranDMC commented Mar 11, 2026

I used version from #537 and it does not produce any errors in the code I posted. But it shows "after foo".

Right, it does after running second game session. Bug introduced during implementation of idea of skipping slot 0. I have to untangle it.

@MiranDMC
Copy link
Copy Markdown
Collaborator

ignore the original code and write completely new, unrelated one from scratch ✅

By doing 40@ = 257 you are also setting bIsMission to true

@x87
Copy link
Copy Markdown
Author

x87 commented Mar 11, 2026

ignore the original code and write completely new, unrelated one from scratch ✅

By doing 40@ = 257 you are also setting bIsMission to true

maybe, I did not validate that, the idea was to modify other two bytes

@MiranDMC
Copy link
Copy Markdown
Collaborator

MiranDMC commented Mar 11, 2026

It is not gonna to be bullet proof. But corrupted function id will not always target already allocated slots. Then the correct error message makes sense.
Currently it is exactly same as in case of using return without call. If script logging is active it will report incorrect stack depth in lucky scenario, in bad one it will get caught in loop causing stack overflow and exiting game without any info (no error message, no crash info etc.).

Per script storage would make it more likely to hit empty slots and would solve problem of cross execution. But there come external scripts that we do not know when they get terminated.

Another problem I spotted is after script termination its call stack is not cleared in the storage.

@x87
Copy link
Copy Markdown
Author

x87 commented Mar 12, 2026

If script logging is active it will report incorrect stack depth in lucky scenario, in bad one it will get caught in loop causing stack overflow and exiting game without any info (no error message, no crash info etc.).

Is that true on this branch? My understanding is that it should eventually hit id=0 and stop.

@MiranDMC
Copy link
Copy Markdown
Collaborator

Is that true on this branch? My understanding is that it should eventually hit id=0 and stop.

Case I had shows it is somehow possible. Take note the script can get corrupted id then go forward and call more functions.
I would not depend on anything to eventually happen.
Requesting function from storage with id other than 0 and getting nullptr is sign of serious problem inside CLEO.

@x87
Copy link
Copy Markdown
Author

x87 commented Mar 12, 2026

Is that true on this branch? My understanding is that it should eventually hit id=0 and stop.

Case I had shows it is somehow possible. Take note the script can get corrupted id then go forward and call more functions. I would not depend on anything to eventually happen. Requesting function from storage with id other than 0 and getting nullptr is sign of serious problem inside CLEO.

there is only one public function that allows it: CLEO_GetCleoCallStackSize. And It should handle this case better then, instead of default to 0. It can return -1 to the caller and let them decide how to handle this. The logger can put error message in the log, other library can ignore it, third library can show error message, etc.

@MiranDMC
Copy link
Copy Markdown
Collaborator

Should it really be caller role to decide if and how to report corruption inside the CLEO? CLEO_GetCleoCallStackSize takes as argument the script pointer, not function id. So there is no other scenario than severe problem already happened or is about to happen.

@MiranDMC
Copy link
Copy Markdown
Collaborator

MiranDMC commented Mar 12, 2026

There is another issue. Now I'm getting heap corruption error when:

  • script B executes "after foo"
  • then 1 is pressed to cause problem in A
  • new game session is started

Happens on master, this branch, and the #537 too.

Call stack leads to:

CCustomScript::~CCustomScript()
{
    if (BaseIP) delete[] BaseIP;

It also occurred before I rebased to current master (without recent custom mission BaseIP changes).

@x87
Copy link
Copy Markdown
Author

x87 commented Mar 12, 2026

Double free

@MiranDMC
Copy link
Copy Markdown
Collaborator

MiranDMC commented Mar 12, 2026

Well, BaseIP is also stored in ScmFunction stack entry. It is needed when switching modules.

I added fixes to the other branch, but it still does not solve this case. If script uses other thread's return it restores its BaseIP as its own.
No idea how to solve that. Maybe it is time to make it per script, or at least store script pointer and check it's correctness during return.

@MiranDMC
Copy link
Copy Markdown
Collaborator

Ok seems I squished all issues on the other branch. "after foo" no longer is executed by B, A is unaffected at all, double free and few other bugs found and fixed.

@x87
Copy link
Copy Markdown
Author

x87 commented Mar 12, 2026

Well, BaseIP is also stored in ScmFunction stack entry. It is needed when switching modules.

#552

@x87
Copy link
Copy Markdown
Author

x87 commented Mar 13, 2026

Moving forward with MVP for better stability

@x87 x87 merged commit 54fe1af into master Mar 13, 2026
3 checks passed
@x87 x87 deleted the fix_scmfunction_0 branch March 13, 2026 03:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants