% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/translate_package.R
\name{translate_package}
\alias{translate_package}
\title{Interactively provide translations for a package's messages}
\usage{
translate_package(
  dir = ".",
  languages = NULL,
  diagnostics = list(check_cracked_messages, check_untranslated_cat,
    check_untranslated_src),
  custom_translation_functions = list(R = NULL, src = NULL),
  max_translations = Inf,
  use_base_rules = package \%chin\% .potools$base_package_names,
  copyright = NULL,
  bugs = "",
  verbose = !is_testing()
)
}
\arguments{
\item{dir}{Character, default the present directory; a directory in which an
R package is stored.}
\item{languages}{Character vector; locale codes to which to translate.
Each must be a valid language accepted by gettext. This almost always takes
the form of (1) an ISO 639 2-letter language code; or (2) \code{ll_CC}, where
\code{ll} is an ISO 639 2-letter language code and \code{CC} is an ISO 3166 2-letter
country code, e.g. \code{es} for Spanish, \code{es_AR} for Argentinian Spanish, \code{ro}
for Romanian, etc. See \code{\link[base:locales]{base::Sys.getlocale()}} for some helpful tips
about how to tell which locales are currently available on your machine, and
see the References below for some web resources listing more locales.}
\item{diagnostics}{A \code{list} of diagnostic functions to be run on the
package's message data. See Details.}
\item{custom_translation_functions}{A \code{list} with either/both of two
components, \code{R} and \code{src}, together governing how to extract any
non-standard strings from the package. See Details.}
\item{max_translations}{Numeric; used for setting a cap on the number of
translations to be done for each language. Defaults to \code{Inf}, meaning
all messages in the package.}
\item{use_base_rules}{Logical; should internal behavior match base behavior
as strictly as possible? \code{TRUE} if being run on a base package (i.e.,
\code{base} or one of the default packages like \code{utils},
\code{graphics}, etc.). See Details.}
\item{copyright}{Character; passed on to \code{\link[=write_po_file]{write_po_file()}}.}
\item{bugs}{Character; passed on to \code{\link[=write_po_file]{write_po_file()}}.}
\item{verbose}{Logical, default \code{TRUE} (except during testing). Should
extra information about progress, etc. be reported?}
}
\value{
This function returns nothing invisibly. As a side effect, a
\file{.pot} file is written to the package's \file{po} directory (updated if
one already exists, or created from scratch otherwise), and a \file{.po}
file is written in the same directory for each element of \code{languages}.
}
\description{
This function handles the "grunt work" of building and updating translation
libraries. In addition to providing a friendly interface for supplying
translations, some internal logic is built to help make your package more
translation-friendly.
To get started, the package developer should run \code{translate_package()} on
the package's source to produce a template \code{.pot} file (or files, if the
package has both R and C/C++ messages to translate).
To add translations in your desired language, include the target language
in the call, e.g. \code{translate_package(languages = "es")}.
}
\section{Phases}{
\code{translate_package()} goes through roughly three "phases" of translation.
\enumerate{
\item Setup. \code{dir} is checked for existing translations
(toggling between "update" and "new" modes), and R files are parsed and
combed for user-facing messages.
\item Diagnostics. See the Diagnostics section below. Any
diagnostic detecting "unhealthy" messages will result in a yes/no prompt to
exit translation to address the issues before continuing.
\item Translation. All of the messages found in phase one are iterated over --
the user is shown a message in English and prompted for the translation
in the target language. This process is repeated for each language
in \code{languages}.
}
An attempt is made to provide hints for some translations that require
special care (e.g. that have escape sequences or use templates). For
templated messages (e.g., that use \verb{\%s}), the user-provided message
must match the templates of the English message. The templates \emph{don't}
have to be in the same order -- R understands template reordering, e.g.
\verb{\%2$s} says "interpret the second input as a string". See
\code{\link[=sprintf]{sprintf()}} for more details.
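
As a small illustration of template reordering, the sketch below formats the
same inputs with an English template and with a reordered one. Both template
strings and their inputs are made up for this example and do not come from any
package.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# hypothetical English template and a reordered "translation" of it
english <- "\%s has \%d items"
reordered <- "\%2$d items belong to \%1$s"

# the inputs are supplied in the same order for both templates
sprintf(english, "cart", 3L)
# [1] "cart has 3 items"
sprintf(reordered, "cart", 3L)
# [1] "3 items belong to cart"
}\if{html}{\out{</div>}}

Both templates consume one string and one integer, which is the sense in which
a translated template has to match the English one.
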
After each language is completed, a corresponding \file{.po} file is written
to the package's \file{po} directory (which is created if it does not yet
exist).
There are some discrepancies in the default behavior of
\code{translate_package} and the translation workflow used to generate the
\file{.po}/\file{.pot} files for R itself (mainly, the suite of functions
from \code{tools}, \code{\link[tools:update_pkg_po]{tools::update_pkg_po()}},
\code{\link[tools:xgettext]{tools::xgettext2pot()}}, \code{\link[tools:xgettext]{tools::xgettext()}}, and
\code{\link[tools:xgettext]{tools::xngettext()}}). They should only be superficial (e.g.,
whitespace or comments), but nevertheless may represent a barrier to
smoothly submitting patches to R Core. To make the process of translating
base R and the default packages (\code{tools}, \code{utils}, \code{stats},
etc.) as smooth as possible, set the \code{use_base_rules} argument to
\code{TRUE} and your resulting \file{.po}/\file{.pot}/\file{.mo} file will
match base's.
}
\section{Custom translation functions}{
\code{base} R provides several functions for messaging that are natively equipped
for translation (they all have a \code{domain} argument): \code{stop()}, \code{warning()},
\code{message()}, \code{gettext()}, \code{gettextf()}, \code{ngettext()}, and
\code{packageStartupMessage()}.
While handy, some developers may prefer to write their own functions, or to
write wrappers of the provided functions that provide some enhanced
functionality (e.g., templating or automatic wrapping). In this case,
the default R tooling for translation (\code{xgettext()}, \code{xngettext()},
\code{xgettext2pot()}, \code{update_pkg_po()} from \code{tools}) will not work, but
\code{translate_package()} and its workhorse \code{get_message_data()} provide an
interface to continue building translations for your workflow.
Suppose you wrote a function \code{stopf()} that is a wrapper of
\code{stop(gettextf())} used to build templated error messages in R, which makes
translation easier for translators (see below), e.g.:
\if{html}{\out{<div class="sourceCode R">}}\preformatted{stopf = function(fmt, ..., domain = NULL) \{
  stop(gettextf(fmt, ...), domain = domain, call. = FALSE)
\}
}\if{html}{\out{</div>}}
Note that \code{potools} itself uses just such a wrapper internally to build
error messages! To extract strings from calls in your package to \code{stopf()}
and mark them for translation, use the argument
\code{custom_translation_functions}:
\if{html}{\out{<div class="sourceCode R">}}\preformatted{get_message_data(
  '/path/to/my_package',
  custom_translation_functions = list(R = 'stopf:fmt|1')
)
}\if{html}{\out{</div>}}
This invocation tells \code{get_message_data()} to look for strings in the
\code{fmt} argument in calls to \code{stopf()}. \code{1} indicates that \code{fmt} is the
first argument.
This interface is inspired by the \code{--keyword} argument to the
\code{xgettext} command-line tool. This argument consists of a list with two
components, \code{R} and \code{src} (either can be excluded), owing to
differences between R and C/C++. Both components, if present, should consist
of a character vector.
For R, there are two types of input: one for named arguments, the other for
unnamed arguments.
\itemize{
\item Entries for \strong{named} arguments will look like \code{"fname:arg|num"} (singular
string) or \code{"fname:arg1|num1,arg2|num2"} (plural string). \code{fname}
gives the name of the function/call to be extracted from the R source,
\code{arg}/\code{arg1}/\code{arg2} specify the name of the argument to
\code{fname} from which strings should be extracted, and
\code{num}/\code{num1}/\code{num2} specify the \emph{order} of the named
argument within the signature of \code{fname}.
\item Entries for \strong{unnamed} arguments will look like
\code{"fname:...\\xarg1,...,xargn"}, i.e., \code{fname}, followed by
\code{:}, followed by \code{...} (three dots), followed by a backslash
(\verb{\\}), followed by a comma-separated list of argument names. All
strings within calls to \code{fname} \emph{except} those supplied to the
arguments named among \code{xarg1}, ..., \code{xargn} will be extracted.
}
To clarify, consider how we would (redundantly) specify
\code{custom_translation_functions} for some of the default messagers,
\code{gettext}, \code{gettextf}, and \code{ngettext}:
\code{custom_translation_functions = list(R = c("gettext:...\\domain", "gettextf:fmt|1", "ngettext:msg1|2,msg2|3"))}.
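
To make the named-argument format concrete, the following base R sketch splits
one such entry into its parts. It only illustrates how the string is laid out;
it is not how \code{potools} itself parses these entries, and the variable
names are chosen for this example.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# a plural entry: function name, then "argument|position" pairs
spec <- "ngettext:msg1|2,msg2|3"

# split off the function name at the colon
parts <- strsplit(spec, ":", fixed = TRUE)[[1L]]
fname <- parts[1L]

# split the remainder into pairs, then each pair into name and position
pairs <- strsplit(parts[2L], ",", fixed = TRUE)[[1L]]
pieces <- strsplit(pairs, "|", fixed = TRUE)
arg_names <- vapply(pieces, function(p) p[1L], character(1L))
arg_positions <- as.integer(vapply(pieces, function(p) p[2L], character(1L)))

fname
# [1] "ngettext"
arg_names
# [1] "msg1" "msg2"
arg_positions
# [1] 2 3
}\if{html}{\out{</div>}}

The positions agree with the signature of \code{ngettext()}, where \code{n}
comes first and \code{msg1} and \code{msg2} follow.
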
For src, there is only one type of input, which looks like
\code{"fname:num"}, which says to look at the \code{num} argument of calls
to \code{fname} for \code{char} arrays.
Note that there is a difference in how translation works for src vs. R -- in
R, all strings passed to certain functions are considered marked for
translations, but in src, all translatable strings must be explicitly marked
as such. So for \code{src} translations, \code{custom_translation_functions}
is not used to customize which strings are marked for translation, but
rather, to expand the set of calls which are searched for potentially
\emph{untranslated} arrays (i.e., arrays passed to the specified calls that
are not explicitly marked for translation). These can then be reported in
the \code{\link[=check_untranslated_src]{check_untranslated_src()}} diagnostic, for example.
}
\section{Diagnostics}{
\subsection{Cracked messages}{
A cracked message is one like:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{stop("There are ", n, " good things and ", m, " bad things.")
}\if{html}{\out{</div>}}
In its current state, translators will be asked to translate three messages
independently:
\itemize{
\item "There are"
\item "good things and"
\item "bad things."
}
The message has been cracked; it might not be possible to translate a string
as generic as "There are" into many languages -- context is key!
To keep the context, the error message should instead be built with
\code{gettextf} like so:
\if{html}{\out{<div class="sourceCode r">}}\preformatted{stop(domain=NA, gettextf("There are \%d good things and \%d bad things.", n, m))
}\if{html}{\out{</div>}}
Now there is only one string to translate! Note that this also allows the
translator to change the word order as they see fit -- for example, in
Japanese, the grammatical order usually puts the verb last (whereas in
English it usually comes right after the subject).
\code{translate_package} detects such cracked messages and suggests a
\code{gettextf}-based approach to fix them.
}
\subsection{Untranslated R messages produced by \code{cat()}}{
Only strings which are passed to certain \code{base} functions are eligible for
translation, namely \code{stop}, \code{warning}, \code{message}, \code{packageStartupMessage},
\code{gettext}, \code{gettextf}, and \code{ngettext} (all of which have a \code{domain} argument
that is key for translation).
However, it is common to also produce some user-facing messages using
\code{cat} -- if your package does so, it must first use \code{gettext} or \code{gettextf}
to translate the message before sending it to the user with \code{cat}.
\code{translate_package} detects strings produced with \code{cat} and suggests a
\code{gettext}- or \code{gettextf}-based fix.
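
A minimal before-and-after sketch of that fix, using a made-up message. When
run outside a package in an English session, the string should simply come
back unchanged.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# before: the string goes straight to cat() and is never offered for translation
cat("Processing complete.", fill = TRUE)

# after: translate first, then print the result
cat(gettext("Processing complete."), fill = TRUE)
}\if{html}{\out{</div>}}
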
}
\subsection{Untranslated C/C++ messages}{
This diagnostic detects any literal \code{char} arrays provided to common
messaging functions in C/C++, namely \code{ngettext()}, \code{Rprintf()}, \code{REprintf()},
\code{Rvprintf()}, \code{REvprintf()}, \code{R_ShowMessage()}, \code{R_Suicide()}, \code{warning()},
\code{Rf_warning()}, \code{error()}, \code{Rf_error()}, \code{dgettext()}, and \code{snprintf()}.
To actually translate these strings, pass them through the translation
macro \verb{_}.
NB: Translation in C/C++ requires some additional \verb{#include}s and
declarations, including defining the \verb{_} macro.
See the Internationalization section of Writing R Extensions for details.
}
}
\section{Custom diagnostics}{
A diagnostic is a function which takes as input a \code{data.table}
summarizing the translatable strings in a package (e.g. as generated by
\code{\link[=get_message_data]{get_message_data()}}), evaluates whether these messages are
"healthy" in some sense, and produces a digest of "unhealthy" strings and
(optionally) suggested replacements.
The diagnostic function must have an attribute named \code{diagnostic_tag}
that describes what the diagnostic does; it is reproduced in the format
\code{Found {nrow(result)} {diagnostic_tag}:}. For example,
\code{\link[=check_untranslated_cat]{check_untranslated_cat()}} has \code{diagnostic_tag = "untranslated messaging calls passed through cat()"}.
The output diagnostic result has the following schema:
\itemize{
\item \code{call}: \code{character}, the call identified as problematic
\item \code{file}: \code{character}, the file where \code{call} was found
\item \code{line_number}: \code{integer}, the line in \code{file} where \code{call} was found
\item \code{replacement}: \code{character}, \emph{optional}, a suggested fix to make the call
"healthy"
}
See \code{\link[=check_cracked_messages]{check_cracked_messages()}},
\code{\link[=check_untranslated_cat]{check_untranslated_cat()}}, and
\code{\link[=check_untranslated_src]{check_untranslated_src()}} for examples of diagnostics.
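
As a rough sketch of what a custom diagnostic could look like, the function
below flags messages written entirely in capital letters. The name
\code{check_all_caps} and its tag are hypothetical, and the sketch assumes the
message data has a \code{msgid} column holding the message text alongside
\code{call}, \code{file}, and \code{line_number}; inspect the output of
\code{get_message_data()} before relying on these names.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# hypothetical diagnostic: flag messages that contain letters but no lowercase
check_all_caps <- function(message_data)
  unique(message_data[
    !is.na(message_data$msgid) &
      grepl("[A-Z]", message_data$msgid) &
      message_data$msgid == toupper(message_data$msgid),
    c("call", "file", "line_number")
  ])

# the tag is reproduced in the "Found ... :" summary line
attr(check_all_caps, "diagnostic_tag") <- "messages written in all capitals"
}\if{html}{\out{</div>}}

Under these assumptions it could be supplied as
\code{diagnostics = list(check_all_caps)}. The optional \code{replacement}
column is left out because there is no obvious automatic fix.
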
}
\examples{
pkg <- system.file('pkg', package = 'potools')
# copy to a temporary location to be able to read/write/update below
tmp_pkg <- file.path(tempdir(), "pkg")
dir.create(tmp_pkg)
file.copy(pkg, dirname(tmp_pkg), recursive = TRUE)
# run translate_package() without any languages
# this will generate a .pot template file and en@quot translations (in UTF-8 locales)
# we can also pass empty 'diagnostics' to skip the diagnostic step
# (skip if gettext isn't available to avoid an error)
if (isTRUE(check_potools_sys_reqs())) {
  translate_package(tmp_pkg, diagnostics = NULL)
}
\dontrun{
# launches the interactive translation dialog for translations into Estonian:
translate_package(tmp_pkg, "et_EE", diagnostics = NULL, verbose = TRUE)
}
# cleanup
unlink(tmp_pkg, recursive = TRUE)
rm(pkg, tmp_pkg)
}
\references{
\url{https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Internationalization}
\cr
\url{https://cran.r-project.org/doc/manuals/r-release/R-admin.html#Internationalization}
\cr
\url{https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Internationalization-in-the-R-sources}
\cr \url{https://developer.r-project.org/Translations30.html} \cr
\url{https://isi-web.org/glossary} \cr
\url{https://www.gnu.org/software/gettext/} \cr
\url{https://www.gnu.org/software/gettext/manual/html_node/Usual-Language-Codes.html#Usual-Language-Codes}
\cr
\url{https://www.gnu.org/software/gettext/manual/html_node/Country-Codes.html#Country-Codes}
\cr \url{https://www.stats.ox.ac.uk/pub/Rtools/goodies/gettext-tools.zip}
\cr \url{https://saimana.com/list-of-country-locale-code/}
}
\seealso{
\code{\link[=get_message_data]{get_message_data()}}, \code{\link[=write_po_file]{write_po_file()}},
\code{\link[tools:xgettext]{tools::xgettext()}}, \code{\link[tools:update_pkg_po]{tools::update_pkg_po()}},
\code{\link[tools:checkPoFiles]{tools::checkPoFile()}}, \code{\link[base:gettext]{base::gettext()}}
}
\author{
Michael Chirico
}