Skip to content

Various fixes related to makePromise() pointing behind EOF #4266

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

techee
Copy link
Contributor

@techee techee commented Jun 2, 2025

For more info, see the individual commit messages and the discussion in geany/geany#4303

To reproduce the crash, use geany/geany@79fb415 from geany/geany#4303. Note that to reproduce this issue, it has to use CRLF line endings (might get lost when copy/pasting from the github web interface).

The issue can be reproduced using ctags --extras=+rg -o- wp-guest.php.

techee added 3 commits June 2, 2025 20:31
If indices passed to makePromise() are invalid (e.g. point behind EOF)
and this is detected during pushArea(), don't attempt to perform any
parsing with a subparser and discard the promise completely.
When a position passed to mio_seek() is behind EOF, set the internal
pointer to the position corresponding to EOF.

This is how normal fseek() behaves on my machine (probably because it
first has to read the necessary amount of bytes from the stream to get
to the right position). This could help in situations when the return value
of mio_seek() isn't checked and EOF position corresponds "more closely" to
the (invalid) position passed as the parameter instead of staying at
the same position.

Note that when using negative offset and the result is before the
beginning of the file, fseek() doesn't move its internal position
and stays at the same position (probably because it already knows how
many bytes have been read). So this case hasn't been updated in this
patch (though it might be worth considering).
@techee
Copy link
Contributor Author

techee commented Jun 2, 2025

Note that the MIO patches are more or less a "bonus" and not really essential. I think, regardless of what the exact problem of the offset miscalculation is, pushArea() should detect such situations and if something invalid is detected, discard the promise.

Copy link

codecov bot commented Jun 2, 2025

Codecov Report

Attention: Patch coverage is 80.95238% with 4 lines in your changes missing coverage. Please review.

Project coverage is 85.95%. Comparing base (c06d333) to head (f9969f5).

Files with missing lines Patch % Lines
main/read.c 75.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4266      +/-   ##
==========================================
- Coverage   85.95%   85.95%   -0.01%     
==========================================
  Files         246      246              
  Lines       63434    63445      +11     
==========================================
+ Hits        54525    54532       +7     
- Misses       8909     8913       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@b4n
Copy link
Member

b4n commented Jun 2, 2025

See also #4267 that adds a simpler version of the test case mentioned above. It seems that any CRLF that triggers a sub-parser will reveal the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this change is not a good idea. I see a few issues, which basically stem from the C standard.

First, the C standard does not specify this behavior. Quoting C17 N2176:

Description

The fseek function sets the file position indicator for the stream pointed to by stream. If a read or
write error occurs, the error indicator for the stream is set and fseek fails.

For a binary stream, the new position, measured in characters from the beginning of the file, is
obtained by adding offset to the position specified by whence. The specified position is the
beginning of the file if whence is SEEK_SET, the current value of the file position indicator if
SEEK_CUR, or end-of-file if SEEK_END. A binary stream need not meaningfully support fseek calls
with a whence value of SEEK_END.

For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier
successful call to the ftell function on a stream associated with the same file and whence shall be
SEEK_SET.

After determining the new position, a successful call to the fseek function undoes any effects of the
ungetc function on the stream, clears the end-of-file indicator for the stream, and then establishes
the new position. After a successful fseek call, the next operation on an update stream may be
either input or output.

With After determining the new position, a successful call to the fseek function […] then establishes
the new position
, one could even read that it should not change the position if the call is not successful (and can it be if it's not actually honored?).

But at any rate, as there is nothing specifying this behavior, it can't be relied upon for FILE-based streams, and thus I don't see any value of changing the behavior for memory streams, it just adds a more or less random behavior for a specific case. If it's not standard, the caller shouldn't rely on it, or it's gonna be brittle.

Additionally, it's not something my libc does. I doesn't behave like you suggest (here I have glibc 2.36):
Using the very crude

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  FILE *fp = fopen(argv[1], "rb");
  
  if (! fp)
    return 1;

  long off = atol(argv[2]);

  fprintf(stderr, "initial position: %ld\n", ftell(fp));
  fprintf(stderr, "seeking to %ld: %d\n", off, fseek(fp, off, SEEK_SET));
  fprintf(stderr, "ftell(): %ld\n", ftell(fp));
  fprintf(stderr, "seeking to +32: %d\n", fseek(fp, 32, SEEK_CUR));
  fprintf(stderr, "ftell(): %ld\n", ftell(fp));
  
  fprintf(stderr, "seeking to end: %d\n", fseek(fp, 0, SEEK_END));
  fprintf(stderr, "ftell(): %ld\n", ftell(fp));

  fclose(fp);
  
  return 0;
}

and running it:

$ ./seek.out seek.c 1000
initial position: 0
seeking to 1000: 0
ftell(): 1000
seeking to +32: 0
ftell(): 1032
seeking to end: 0
ftell(): 613

You can clearly see that it sought (hehe, seek()ed) past the end: when seeking to SEEK_END, I get 613, but if I seek to 1000 I get that 1000 (which is then past the end), and I can even further SEEK_CUR happily.


So in the end I'd rather not have this change any internal on failure, and leave it to the caller. Originally MIO was designed to mimic as closely as possible the equivalent C library calls, not adding any fanciness anywhere it, and I don't think it should deviate to what can be expected from the C library.

At some point it had tests to compare against the C library itself, maybe we should re-introduce this in uctags -- not that it would catch anything here, as the test would have not to depend on unspecified behavior, but we could verify it's not diverging on specified behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it was just an idea inspired by the fact that ctags uses mio_seek() and mio_setpos() without any checks of the returned value and that I thought that in the error cases it would be better to be "closer" to the provided value. The right thing of course is to properly check the return values and recover accordingly.

Let's discard this patch.

b4n added a commit to b4n/fishman-ctags that referenced this pull request Jun 2, 2025
@b4n
Copy link
Member

b4n commented Jun 2, 2025

Apart from the remark above, looks reasonable. I'll have to test it properly with a clean head though, it's 1:20am here and I'm not fresh anymore -- but what's left of my brain like the other changes 🙂

@techee
Copy link
Contributor Author

techee commented Jun 3, 2025

I've added a few more checks inside pushArea() - I believe all the mio_setpos() and mio_setpos() return values should be checked because the offsets coming from parsers could be wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants