Skip to content

Comments

main/read: revise the offset calculation on nested input stream#4212

Merged
masatake merged 17 commits intouniversal-ctags:masterfrom
masatake:main/read--revise-offset-calculation-in-nested-input-stream
May 6, 2025
Merged

main/read: revise the offset calculation on nested input stream#4212
masatake merged 17 commits intouniversal-ctags:masterfrom
masatake:main/read--revise-offset-calculation-in-nested-input-stream

Conversation

@masatake
Copy link
Member

Conceptually this pull request is derrived from #4097.
The commit logs are not updated yet.

See also #3727.
fpos_t makes our code too complicated.

@masatake masatake changed the title main/read: revise the offset calculation in nested input stream main/read: revise the offset calculation on nested input stream Mar 18, 2025
@masatake masatake force-pushed the main/read--revise-offset-calculation-in-nested-input-stream branch from b498b9b to b3b8694 Compare March 20, 2025 00:09
@masatake
Copy link
Member Author

There are too many inconsistencies in the nested-stream related code.

@masatake masatake force-pushed the main/read--revise-offset-calculation-in-nested-input-stream branch from b3b8694 to 0773776 Compare March 30, 2025 19:47
@codecov
Copy link

codecov bot commented Mar 30, 2025

Codecov Report

Attention: Patch coverage is 97.63314% with 4 lines in your changes missing coverage. Please review.

Project coverage is 85.86%. Comparing base (6e6c9a2) to head (edd51f3).
Report is 35 commits behind head on master.

Files with missing lines Patch % Lines
main/read.c 96.19% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4212      +/-   ##
==========================================
+ Coverage   85.85%   85.86%   +0.01%     
==========================================
  Files         242      242              
  Lines       62610    62693      +83     
==========================================
+ Hits        53752    53833      +81     
- Misses       8858     8860       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@masatake masatake mentioned this pull request Mar 30, 2025
@masatake masatake force-pushed the main/read--revise-offset-calculation-in-nested-input-stream branch from 0773776 to 55a26ca Compare April 7, 2025 02:04
@masatake masatake force-pushed the main/read--revise-offset-calculation-in-nested-input-stream branch from 55a26ca to 71088c3 Compare April 13, 2025 18:41
@masatake
Copy link
Member Author

This must be rebased on #4231.

@masatake masatake force-pushed the main/read--revise-offset-calculation-in-nested-input-stream branch from 25717dc to bc343c5 Compare April 21, 2025 21:11
@masatake
Copy link
Member Author

178140f may be applicable to PEG-based parsers.

A test case:

baz.md

## T
```toml
# Th
[o]
[d]
```
$ ~/bin/ctags --options=NONE --extras=+g /tmp/baz.md

An internal error is reported as a warning:

ctags: Notice: No options will be read from files or environment
ctags: Warning: given end line (1) for the tag (o) in the file (/tmp/baz.md) is smaller than its start line: 2

@leleliu008
Copy link
Member

The CI failes have been fixed. please rebase master branch.

@masatake masatake force-pushed the main/read--revise-offset-calculation-in-nested-input-stream branch 2 times, most recently from 1607ba1 to c917fcf Compare April 29, 2025 20:25
masatake and others added 12 commits May 2, 2025 03:16
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
…ere no data has been read yet

The original code didn't consider the case that the parser calling
getInputColumnNumber() run as a guest parser.

This commit considers the case and simplifies the logic.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
…neral case

The original code didn't consider the case that the parser calling
getInputColumnNumber() run as a guest parser.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
…F case

This was original proposed as a part of universal-ctags#4097 by @b4n.
However, the fix exposed various hidden breakage in guest parsing.

The previous computation could result in invalid values, which could
lead to invalid promise ranges and subsequent crashes.

This change considers two additional items to the original logic:

- the area offset, and
- ungetch buffer

Co-authored-by: Colomban Wendling <ban@herbesfolles.org>
Co-authored-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
…rent coordinate system

The original code returns a pos in the absolute coordinate system.
For the code expecting the function returns the absolute coordinate
system, this changes adds a more generic function:

   MIOPos getInputFilePositionForLineFull (unsigned int line,
                                           enum areaCoord areaCoord);

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
…s as a guest

Fix universal-ctags#4181.

This commit consists of two interdependent parts that could not be
separated into two individual commits. Introducing either part first
would surface a previously hidden crash—the first sequence triggers
one crash, while introducing the other part first triggers a different
one. Merging any crash-inducing commit would hamper future git bisect
debugging. Therefore, both parts are combined here into a single
commit.

The first part changes main/promise.c and main/read.c.

When a parser made a promise for the same area (thin area) of the
parser, the calculation of the area was delayed till the promise is
forced.

This delaying can be a bug if a guest parser makes a thin area because
the guest parser expects the thin area is the same as the current area
where the guest parser processes.  On the other hand, forcing the
promise is always done in the base area.

With this change, makePromise() does the calculation of the thin area
if the parser runs as a guest.

The second part changes main/lxpath.c and parsers/yaml.c.

The meta parsers user external libraries (libxml2 and libyaml) for
parsing. The libraries provides the way to get the current line
numbers. However, the coordinate system of the line number is the
current coordinate system. updateTagLine() and setTagEndLine() expect
a line number in the absolute coordinate system. In the 2nd part, we
utilize translateLineNumber() for convert the line numbers to the
values in the absolute coordinate system.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: Masatake YAMATO <yamato@redhat.com>
(Edited by @masatake)
Where ideal ctags should extract "post-", the current
implementation extacts  "post-<?".
Don't include (part of) the PHP open tag in the HTML input.
b4n and others added 2 commits May 2, 2025 03:25
Fix universal-ctags#4097

JSP tags shouldn't trigger the PHP parser.  This used to happen and
trigger an infinite loop of a ping-pong effect with the HTML parser
asking the PHP parser to parse the JSP block, and the PHP parser, not
recognizing the input, asking the HTML one to deal with it.  Rinse and
repeat.
This function is for fixing peg based parsers.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake masatake force-pushed the main/read--revise-offset-calculation-in-nested-input-stream branch from c917fcf to 8356581 Compare May 1, 2025 18:30
@masatake masatake marked this pull request as ready for review May 1, 2025 18:31
@masatake masatake requested a review from Copilot May 1, 2025 18:31
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR aims to revise the offset calculation on nested input streams by simplifying the use of fpos_t. The changes include updates to test input files for markdown and HTML to support the new stream handling.

Reviewed Changes

Copilot reviewed 23 out of 41 changed files in this pull request and generated 2 comments.

File Description
Units/parser-markdown.r/packcc-parser-running-within-code-block.d/input.md Adds a markdown input file with a code block intended for toml testing data; consider clarifying the purpose of the leading "T".
Units/parser-html.r/jsp.d/input.html Adds an HTML input file with a JSP expression; ensure the expression is correctly completed.
Files not reviewed (18)
  • Units/parser-html.r/jsp.d/README: Language not supported
  • Units/parser-html.r/jsp.d/args.ctags: Language not supported
  • Units/parser-markdown.r/frontmatter.d/expected.tags: Language not supported
  • Units/parser-markdown.r/packcc-parser-running-within-code-block.d/args.ctags: Language not supported
  • Units/parser-markdown.r/packcc-parser-running-within-code-block.d/expected.tags: Language not supported
  • Units/parser-markdown.r/packcc-parser-running-within-code-block.d/features: Language not supported
  • Units/parser-php.r/wp-guest.d/args.ctags: Language not supported
  • Units/parser-php.r/wp-guest.d/expected.tags: Language not supported
  • Units/parser-php.r/wp-guest.d/input.php: Language not supported
  • Units/parser-rmarkdown.r/frontmatter.d/expected.tags: Language not supported
  • Units/parser-sh.r/sh-heredoc-run-guest-parser-with-external-libs.d/args.ctags: Language not supported
  • Units/parser-sh.r/sh-heredoc-run-guest-parser-with-external-libs.d/expected.tags: Language not supported
  • Units/parser-sh.r/sh-heredoc-run-guest-parser-with-external-libs.d/features: Language not supported
  • Units/parser-sh.r/sh-heredoc-run-guest-parser-with-external-libs.d/input-0.sh: Language not supported
  • Units/parser-sh.r/sh-heredoc-run-guest-parser-with-external-libs.d/input.sh: Language not supported
  • Units/parser-sh.r/sh-heredoc-run-guest-parser-with-packcc.d/args.ctags: Language not supported
  • Units/parser-sh.r/sh-heredoc-run-guest-parser-with-packcc.d/expected.tags: Language not supported
  • Units/parser-sh.r/sh-heredoc-run-guest-parser-with-packcc.d/features: Language not supported

@@ -0,0 +1,6 @@
T
Copy link

Copilot AI May 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The single character 'T' at the start of the file is ambiguous. Please clarify its purpose or include additional context if this is intended test data.

Copilot uses AI. Check for mistakes.
… as a guest

Fix universal-ctags#4114.

setTagEndLine() takes a tag entry and a line number as arguments.
setTagEndLine() expects the line number in the absolute coordinate
system. However, a peg-based parser running as a guest parser passed a
line number in the current coordinate system. Using the wrong
coordinate system caused trouble; the end line number was smaller
than the start line number assigned to the tag.

The line numbers were derived from file offsets provided by
the code generated by packcc. The file offsets were in the current
coordinate system. As a result, the derived line numbers were in
the current coordinate system.

This change uses translateFileOffset() to convert the coordinate
system of the offset.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake masatake force-pushed the main/read--revise-offset-calculation-in-nested-input-stream branch from 8356581 to 036d135 Compare May 1, 2025 20:15
masatake added 2 commits May 4, 2025 01:26
…bugs

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
…guest parser has read nothing yet

getInputLineNumber() returned 1, not 0 if a parser had read nothing yet.
This is the expected behaviour IF the parser runs as a host parser.

If a parser runs as a guest parser, it should return the start line of
the area where the guest parser runs.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake masatake merged commit fc82393 into universal-ctags:master May 6, 2025
90 of 91 checks passed
@masatake masatake mentioned this pull request May 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants