Line number problems in C `# notranslate` exclusions #323

The following message seems to be missing from `data.table.pot`:

https://github.com/Rdatatable/data.table/blob/b7f2106efe038d93577f427f34c06d9c00b4c486/src/fread.c#L2775

The code seems to consider this message to be subject to a `# notranslate` exclusion:

https://github.com/MichaelChirico/potools/blob/0dc529285c4f54a86d0755317d9304d735c3858f/R/get_src_messages.R#L255
```
debug: src_messages = drop_excluded(src_messages, exclusions[is_outside_char_array(exclusion_pos,
    arrays)])
Browse[1]> src_messages[grepl('sep=', msgid)] # <-- row 3 here
                                                                      msgid msgid_plural   fname
                                                                     <char>       <list>  <char>
1:   sep='\\\\n' passed in meaning read lines as single character column\\n       [NULL] DTPRINT
2:                                             sep=',' so dec set to '.'\\n       [NULL] DTPRINT
3:                                                    %8.3fs (%3.0f%%) sep=       [NULL] DTPRINT
                                                                                     call array_start is_marked_for_translation line_number
                                                                                   <char>       <int>                    <lgcl>       <int>
1: DTPRINT(_("  sep='\\\\n' passed in meaning read lines as single character column\\n"))       71163                      TRUE        1674
2:                                           DTPRINT(_("  sep=',' so dec set to '.'\\n"))       83411                      TRUE        1892
3:           DTPRINT(_("%8.3fs (%3.0f%%) sep="), tLayout-tMap, 100.0*(tLayout-tMap)/tTot)      129888                      TRUE        2775
Browse[1]> n
<...>
Browse[1]> src_messages[grepl('sep=', msgid)] # <-- one row less now!
                                                                      msgid msgid_plural   fname
                                                                     <char>       <list>  <char>
1:   sep='\\\\n' passed in meaning read lines as single character column\\n       [NULL] DTPRINT
2:                                             sep=',' so dec set to '.'\\n       [NULL] DTPRINT
                                                                                     call array_start is_marked_for_translation line_number
                                                                                   <char>       <int>                    <lgcl>       <int>
1: DTPRINT(_("  sep='\\\\n' passed in meaning read lines as single character column\\n"))       71163                      TRUE        1674
2:                                           DTPRINT(_("  sep=',' so dec set to '.'\\n"))       83411                      TRUE        1892
Browse[1]> exclusions[is_outside_char_array(exclusion_pos, arrays)]
          file line1 capture_lengths
        <char> <int>           <int>
1: src/fread.c   438               0
2: src/fread.c  1366               0
3: src/fread.c  1733               0
4: src/fread.c  1783               0
5: src/fread.c  2111               0
6: src/fread.c  2119               0
7: src/fread.c  2305               0
8: src/fread.c  2775               0 # <-- why is line 2775 excluded?
9: src/fread.c  2794               0
Browse[1]> readChar(file, file.size(file)) |> substr(exclusion_pos[8]-32, exclusion_pos[8]+16)
[1] "\n      DTPRINT(\"  =====\\n\"); // # notranslate\n   " # <-- exclusion no.8 corresponds to a different line!
```

Since the exclusions are matched against the original, non-preprocessed file contents:
https://github.com/MichaelChirico/potools/blob/0dc529285c4f54a86d0755317d9304d735c3858f/R/get_src_messages.R#L75
...and the newlines are matched in the preprocessed file contents, where they have different offsets due to the comments being removed:
https://github.com/MichaelChirico/potools/blob/0dc529285c4f54a86d0755317d9304d735c3858f/R/get_src_messages.R#L77-L82
...the line numbers produced from `exclusion_pos` and `newlines_loc` end up being incorrect:
https://github.com/MichaelChirico/potools/blob/0dc529285c4f54a86d0755317d9304d735c3858f/R/get_src_messages.R#L250-L254
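
The mismatch can be reproduced outside potools with a toy file. The snippet below is only a sketch: the two `gsub()` calls are a crude stand-in for comment removal (an assumption, not the actual `preprocess()`), and `newline_offsets()` is a hypothetical helper that mirrors how `newlines_loc` is built.

```r
# Toy C source; line 2 carries the exclusion marker.
src_lines = c(
  "/* block comment removed by preprocessing, shifting offsets */",
  "DTPRINT(\"=====\\n\"); // # notranslate",
  "DTPRINT(_(\"message A\\n\"));",
  "DTPRINT(_(\"message B\\n\"));"
)
original = paste(src_lines, collapse = "\n")

# Crude stand-in for comment removal (assumption: NOT potools' preprocess()).
stripped = gsub("/\\*.*?\\*/", "", original, perl = TRUE)
stripped = gsub("//[^\n]*", "", stripped, perl = TRUE)

# Hypothetical helper: newline offsets with a leading 0, as for newlines_loc.
newline_offsets = function(x) {
  c(0L, as.integer(gregexpr("\n", x, fixed = TRUE)[[1L]]))
}

# The marker offset is measured in the ORIGINAL text...
exclusion_pos = as.integer(regexpr("# notranslate", original, fixed = TRUE))

# ...so looking it up among newline offsets of the STRIPPED text mixes two
# coordinate systems and lands on the wrong line.
line_mixed = findInterval(exclusion_pos, newline_offsets(stripped))
line_consistent = findInterval(exclusion_pos, newline_offsets(original))

src_lines[line_mixed]       # a translatable message, wrongly excluded
src_lines[line_consistent]  # the line that actually carries the marker
```

Because the stripped text is shorter, every newline in it sits at a smaller offset than in the original, so the mixed lookup overshoots, just as line 2113 is reported as 2775 above.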

Mapping `exclusion_pos` to line numbers using newline offsets taken from the original file would have given the correct line number:

```
Browse[1]> newlines_loc2 = c(0L, as.integer(gregexpr("\n", readChar(file, file.size(file)), fixed = TRUE)[[1L]]))
Browse[1]> data.table(
      file = file,
      line1 = findInterval(as.integer(exclusion_pos), newlines_loc2),
      capture_lengths = attr(exclusion_pos, "capture.length")[ , 1L]
    )[8]
          file line1 capture_lengths
        <char> <int>           <int>
1: src/fread.c  2113               0
Browse[1]> readLines(file)[2113]
[1] "      DTPRINT(\"  =====\\n\"); // # notranslate"
Browse[1]>
```

...but there must be a better solution, one that is compatible with preprocessing.
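
One possible direction, sketched here purely as an illustration and not as a proposed patch: have the comment removal preserve offsets by overwriting comment characters with spaces and keeping newlines. `blank_comments()` is a hypothetical helper, not part of potools, and its regex is deliberately naive (it would misfire on comment markers inside string literals such as `"http://"`).

```r
# Hypothetical helper: blank out C comments without changing any offsets.
blank_comments = function(x) {
  # Match /* ... */ (possibly spanning lines) or // up to the end of the line.
  m = gregexpr("/\\*(?s:.*?)\\*/|//[^\n]*", x, perl = TRUE)
  # Replace every non-newline character of each comment with a space.
  regmatches(x, m) = lapply(
    regmatches(x, m),
    function(hits) gsub("[^\n]", " ", hits)
  )
  x
}

original = paste(
  "/* multi-line",
  "   block comment */",
  "DTPRINT(\"=====\\n\"); // # notranslate",
  "DTPRINT(_(\"message A\\n\"));",
  sep = "\n"
)
blanked = blank_comments(original)

newline_offsets = function(x) {
  c(0L, as.integer(gregexpr("\n", x, fixed = TRUE)[[1L]]))
}

# The marker is still searched for in the original text, but its offset is
# now valid in the blanked text too, since both have identical newlines.
exclusion_pos = as.integer(regexpr("# notranslate", original, fixed = TRUE))
line1 = findInterval(exclusion_pos, newline_offsets(blanked))
```

With offsets preserved, positions found in either version of the text can be looked up against a single `newlines_loc` vector. Whether this fits the rest of the preprocessing is left open here.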

For reference, the preprocessing and newline-matching code linked above (`R/get_src_messages.R#L77-L82`):

	contents_char = preprocess(strsplit(contents, NULL)[[1L]])
	# as a single string
	contents = paste(contents_char, collapse = "")

	# NB: should still be fine to look only for \n on windows
	newlines_loc = c(0L, as.integer(gregexpr("\n", contents, fixed = TRUE)[[1L]]))

The line-number computation linked above (`R/get_src_messages.R#L250-L254`):

	exclusions = data.table(
	  file = file,
	  line1 = findInterval(as.integer(exclusion_pos), newlines_loc),
	  capture_lengths = attr(exclusion_pos, "capture.length")[ , 1L]
	)
