The following message seems to be missing from data.table.pot:
https://github.com/Rdatatable/data.table/blob/b7f2106efe038d93577f427f34c06d9c00b4c486/src/fread.c#L2775
The code seems to consider this message to be subject to a # notranslate exclusion:
potools/R/get_src_messages.R, line 255 in 0dc5292:
    src_messages = drop_excluded(src_messages, exclusions[is_outside_char_array(exclusion_pos, arrays)])
debug: src_messages = drop_excluded(src_messages, exclusions[is_outside_char_array(exclusion_pos,
arrays)])
Browse[1]> src_messages[grepl('sep=', msgid)] # <-- row 3 here
msgid msgid_plural fname
<char> <list> <char>
1: sep='\\\\n' passed in meaning read lines as single character column\\n [NULL] DTPRINT
2: sep=',' so dec set to '.'\\n [NULL] DTPRINT
3: %8.3fs (%3.0f%%) sep= [NULL] DTPRINT
call array_start is_marked_for_translation line_number
<char> <int> <lgcl> <int>
1: DTPRINT(_(" sep='\\\\n' passed in meaning read lines as single character column\\n")) 71163 TRUE 1674
2: DTPRINT(_(" sep=',' so dec set to '.'\\n")) 83411 TRUE 1892
3: DTPRINT(_("%8.3fs (%3.0f%%) sep="), tLayout-tMap, 100.0*(tLayout-tMap)/tTot) 129888 TRUE 2775
Browse[1]> n
<...>
Browse[1]> src_messages[grepl('sep=', msgid)] # <-- one row less now!
msgid msgid_plural fname
<char> <list> <char>
1: sep='\\\\n' passed in meaning read lines as single character column\\n [NULL] DTPRINT
2: sep=',' so dec set to '.'\\n [NULL] DTPRINT
call array_start is_marked_for_translation line_number
<char> <int> <lgcl> <int>
1: DTPRINT(_(" sep='\\\\n' passed in meaning read lines as single character column\\n")) 71163 TRUE 1674
2: DTPRINT(_(" sep=',' so dec set to '.'\\n")) 83411 TRUE 1892
Browse[1]> exclusions[is_outside_char_array(exclusion_pos, arrays)]
file line1 capture_lengths
<char> <int> <int>
1: src/fread.c 438 0
2: src/fread.c 1366 0
3: src/fread.c 1733 0
4: src/fread.c 1783 0
5: src/fread.c 2111 0
6: src/fread.c 2119 0
7: src/fread.c 2305 0
8: src/fread.c 2775 0 # <-- why is line 2775 excluded?
9: src/fread.c 2794 0
Browse[1]> readChar(file, file.size(file)) |> substr(exclusion_pos[8]-32, exclusion_pos[8]+16)
[1] "\n DTPRINT(\" =====\\n\"); // # notranslate\n " # <-- exclusion no.8 corresponds to a different line!
Since the exclusions are matched against the original, non-preprocessed file contents:
potools/R/get_src_messages.R, line 75 in 0dc5292:
    exclusion_pos = gregexpr("# notranslate( (start|end))?", contents, perl=TRUE)[[1L]]
...and the newlines are matched in the preprocessed file contents, where they have different offsets due to the comments being removed:
potools/R/get_src_messages.R, lines 77 to 82 in 0dc5292:
    contents_char = preprocess(strsplit(contents, NULL)[[1L]])
    # as a single string
    contents = paste(contents_char, collapse = "")

    # NB: should still be fine to look only for \n on windows
    newlines_loc = c(0L, as.integer(gregexpr("\n", contents, fixed = TRUE)[[1L]]))
...the line numbers produced from exclusion_pos and newlines_loc end up being incorrect (potools/R/get_src_messages.R, lines 250 to 254 in 0dc5292):
    exclusions = data.table(
      file = file,
      line1 = findInterval(as.integer(exclusion_pos), newlines_loc),
      capture_lengths = attr(exclusion_pos, "capture.length")[ , 1L]
    )
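The mismatch can be reproduced outside potools with a few lines of base R. This is only a minimal sketch: `strip_comments()` is a hypothetical stand-in for potools' `preprocess()` (a plain regex that ignores comment markers inside string literals), and the five-line C source is made up for illustration.

```r
# Toy C source: line 2 carries the marker, line 5 is a translatable message.
src_lines <- c(
  "/* a fairly long header comment on one line */",
  "p(\"=====\"); // # notranslate",
  "int a = 1;",
  "int b = 2;",
  "p(_(\"translate me\"));"
)
original <- paste(src_lines, collapse = "\n")

# Hypothetical stand-in for preprocess(): removes single-line comments, so
# every character after a comment moves to a smaller offset.
strip_comments <- function(x) gsub("/\\*.*?\\*/|//[^\n]*", "", x, perl = TRUE)
preprocessed <- strip_comments(original)

# Same construction as newlines_loc: a leading 0 plus each newline offset.
newline_offsets <- function(x) {
  c(0L, as.integer(gregexpr("\n", x, fixed = TRUE)[[1L]]))
}

# The marker offset is measured in the ORIGINAL text...
exclusion_pos <- as.integer(regexpr("# notranslate", original, fixed = TRUE))

# ...so it only lines up with newline offsets taken from the original text.
line_from_original     <- findInterval(exclusion_pos, newline_offsets(original))
line_from_preprocessed <- findInterval(exclusion_pos, newline_offsets(preprocessed))
```

In this toy case `line_from_original` is 2 (the line with the marker), while `line_from_preprocessed` is 5, so the exclusion lands on the translatable message instead. That is the same kind of shift as 2113 versus 2775 in fread.c.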
Computing the newline offsets from the original file as well would have given the correct line number:
Browse[1]> newlines_loc2 = c(0L, as.integer(gregexpr("\n", readChar(file, file.size(file)), fixed = TRUE)[[1L]]))
Browse[1]> data.table(
file = file,
line1 = findInterval(as.integer(exclusion_pos), newlines_loc2),
capture_lengths = attr(exclusion_pos, "capture.length")[ , 1L]
)[8]
file line1 capture_lengths
<char> <int> <int>
1: src/fread.c 2113 0
Browse[1]> readLines(file)[2113]
[1] " DTPRINT(\" =====\\n\"); // # notranslate"
Browse[1]>
...but there must be a better solution, one that is compatible with preprocessing.
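One direction that might stay compatible with preprocessing is to make comment removal length-preserving, so offsets in the original and preprocessed text coincide. The sketch below is an assumption, not potools code: `blank_comments()` is a hypothetical helper that overwrites every non-newline character of a comment with a space, and its regex does not handle comment markers inside string literals.

```r
# Hypothetical length-preserving alternative to removing comments:
# blank out comment characters but keep every newline in place.
blank_comments <- function(x) {
  m <- gregexpr("(?s)/\\*.*?\\*/|//[^\n]*", x, perl = TRUE)
  regmatches(x, m) <- lapply(
    regmatches(x, m),
    function(hit) gsub("[^\n]", " ", hit, perl = TRUE)
  )
  x
}

newline_offsets <- function(x) {
  c(0L, as.integer(gregexpr("\n", x, fixed = TRUE)[[1L]]))
}

# Made-up source with a multi-line block comment before the marker.
src <- paste(
  c(
    "/* block",
    "   comment */ int a;",
    "p(\"x\"); // # notranslate",
    "p(_(\"hi\"));"
  ),
  collapse = "\n"
)
blanked <- blank_comments(src)

# The marker is still searched for in the original text, but its offset is
# now valid against the newline offsets of the blanked text too.
marker_pos  <- as.integer(regexpr("# notranslate", src, fixed = TRUE))
marker_line <- findInterval(marker_pos, newline_offsets(blanked))
```

With this approach `nchar(blanked)` equals `nchar(src)` and both texts have identical newline offsets, so `marker_line` is 3 whichever text the newlines are taken from. Whether this fits the rest of the preprocessing step is an open question.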