\name{translate_package}
\alias{translate_package}
\title{
Interactively provide translations for a package's messages
}
\description{
This function handles the "grunt work" of building and updating translation libraries.
In addition to providing a friendly interface for supplying translations, some internal
logic is built in to help make your package more translation-friendly.

To do so, it builds on low-level command line tools from \code{gettext}. See Details.
}
\usage{
translate_package(
  dir='.', languages,
  diagnostics = list(check_cracked_messages, check_untranslated_cat, check_untranslated_src),
  src_translation_macros = c("_", "N_"),
  use_base_rules = package \%chin\% .potools$base_package_names,
  team_size = 1L, team_id = 1L,
  team_split_rule = c("equalize_char", "equalize_count", "equalize_files"),
  copyright = NULL, bugs = NULL, verbose=FALSE
)
}
\arguments{
  \item{dir}{ Character, default the present directory; a directory in which an R package is stored. }
  \item{languages}{ Character vector; locale codes to which to translate. See Details. }
  \item{diagnostics}{ A \code{list} of diagnostic functions to be run on the package's message data. See Details.}
  \item{src_translation_macros}{ Character; the macros used to indicate which \code{char} arrays are to be marked for translation in C/C++ files. The defaults, \code{_} and \code{N_}, are the ones used by R itself and recommended in R-exts and R-ints (see References). }
  \item{use_base_rules}{ Logical; Should internal behavior match base behavior as strictly as possible? \code{TRUE} if being run on a base package (i.e., \code{base} or one of the default packages like \code{utils}, \code{graphics}, etc.). See Details. }
  \item{team_size}{ Integer; how many translators are there for the (singular) language? See Details for this, \code{team_id}, and \code{team_split_rule}. }
  \item{team_id}{ Integer; which translator is currently working? }
  \item{team_split_rule}{ Character; how should the message base be split up among the team members? }
  \item{copyright}{ Character; passed on to \code{\link[tools]{update_pkg_po}}. }
  \item{bugs}{ Character; passed on to \code{\link[tools]{update_pkg_po}}. }
  \item{verbose}{ Logical, default \code{FALSE}. Should extra information about progress, etc. be reported? }
}
\details{
\code{translate_package} goes through roughly three "phases" of translation.

Phase one is setup -- \code{dir} is checked for existing translations (toggling between "update" and
"new" modes), and R files are parsed and combed for user-facing messages.

Phase two is for diagnostics; see the Diagnostics section below. Any diagnostic detecting "unhealthy" messages
will result in a yes/no prompt to exit translation to address the issues before continuing.

Phase three is translation. All of the messages found in phase one are iterated over -- the user
is shown a message in English and prompted for the translation in the target language. This process is repeated
for each domain in \code{languages}.

An attempt is made to provide hints for some translations that require special care (e.g. that have escape
sequences or use templates). For templated messages (e.g., that use \code{\%s}), the user-provided message
must match the templates of the English message. The templates \emph{don't} have to be in the same order --
R understands template reordering, e.g. \code{\%2$s} says "interpret the second input as a string". See
\code{\link{sprintf}} for more details.
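
As a small illustration of template reordering, the following sketch uses only base R and is
independent of \code{translate_package}; the Spanish string is just an example translation.

\preformatted{
# English template: inputs are consumed in the order given
english <- sprintf("\%s has \%d rows", "DT", 3L)

# A translation may reorder the inputs using n$ positions,
# as long as it uses the same set of templates
spanish <- sprintf("Hay \%2$d filas en \%1$s", "DT", 3L)
}

Both calls receive the same inputs in the same order; only the position markers in the translated
template change where each input appears.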

After each language is completed, a corresponding \file{.po} file is written to the package's \file{po}
directory (which is created if it does not yet exist).

There are some discrepancies between the default behavior of \code{translate_package} and the translation workflow used to generate the \file{.po}/\file{.pot} files for R itself (mainly, the suite of functions from \code{tools}: \code{\link[tools]{update_pkg_po}}, \code{\link[tools]{xgettext2pot}}, \code{\link[tools]{xgettext}}, and \code{\link[tools]{xngettext}}). They should only be superficial (e.g., whitespace or comments), but nevertheless may represent a barrier to smoothly submitting patches to R Core. To make the process of translating base R and the default packages (\code{tools}, \code{utils}, \code{stats}, etc.) as smooth as possible, set the \code{use_base_rules} argument to \code{TRUE} and your resulting \file{.po}/\file{.pot}/\file{.mo} files will match base's.

\bold{Teams:}

For packages with larger message bases to tackle (e.g., R itself or a large, currently-untranslated package), a divide-and-conquer approach may be preferable if a suitable team can be assembled. The arguments \code{team_size}, \code{team_id}, and \code{team_split_rule} are meant to facilitate the work of translation in this case. If \code{team_size > 1}, the set of messages that need translation is first divided "roughly" equally into \code{team_size} parts, each of which is assigned an "ID" from \code{1} to \code{team_size}. You can select which block of messages you'd like to translate by passing your \code{team_id} (the ID for each translator will need to be coordinated amongst the team members).

There are three ways the translation set can be split "equally", controlled by the \code{team_split_rule} argument:

\enumerate{
  \item \code{"equalize_char"}: Roughly assign each translator the same number of \emph{characters} of messages to translate (i.e., according to \code{\link{nchar}}).
  \item \code{"equalize_count"}: Roughly assign each translator the same number of \emph{messages} to translate. Specific implementation is analogous to \code{"equalize_char"}.
  \item \code{"equalize_files"}: Roughly assign each translator the same number of \emph{source files} to translate. The thinking here is to try and give one translator many similar messages in the hope that there are some efficiency gains from autocorrelation in the messages.
}

NB: \code{"equalize_char"} and \code{"equalize_files"} are implemented by sorting and slicing. For example, if \code{team_size=3}, for \code{"equalize_char"}, the first translator (\code{team_id=1}) will get the 1st, 4th, 7th, ... largest messages, the second (\code{team_id=2}) will get the 2nd, 5th, 8th, ... largest, and the third (\code{team_id=3}) will get the 3rd, 6th, 9th, ... largest. For \code{"equalize_count"}, messages are simply taken in alternating order, sorted as they are from \code{\link{get_message_data}}.

NB: this option only applies when a single language is specified for translation.
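
The sort-and-slice idea described above for \code{"equalize_char"} can be sketched in base R as follows.
This is only an illustration under simple assumptions, not the package's actual implementation;
\code{split_by_char} is a hypothetical helper and the messages are made up.

\preformatted{
# Hypothetical helper (not part of potools): deal messages out to
# translators in turn, starting from the longest message
split_by_char <- function(msgs, team_size) {
  # indices of msgs from longest to shortest
  ord <- order(nchar(msgs), decreasing = TRUE)
  # the k-th longest message goes to translator 1, 2, ..., team_size, 1, 2, ...
  team <- integer(length(msgs))
  team[ord] <- rep_len(seq_len(team_size), length(msgs))
  split(msgs, team)
}

msgs <- c("a", "bbbbbbb", "cc", "dddddd", "eee", "ffff", "ggggg")
parts <- split_by_char(msgs, team_size = 3L)
# characters assigned to each translator
chars <- vapply(parts, function(x) sum(nchar(x)), integer(1L))
}

With these seven toy messages, translator 1 receives the 1st, 4th, and 7th longest messages.
In practice, a translator with ID 2 on a three-person team would pass
\code{team_size = 3L, team_id = 2L} to \code{translate_package}.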

\bold{Diagnostics:}

A diagnostic is a function which takes as input a \code{data.table} summarizing the translatable strings in
a package (e.g. as generated by \code{\link{get_message_data}}), evaluates whether these messages are "healthy" in some sense, and produces a digest of "unhealthy"
strings and (optionally) suggested replacements.

The diagnostic function must have an attribute named \code{diagnostic_tag} that describes what the diagnostic does; it is reproduced in the format \code{Found {nrow(result)} {diagnostic_tag}:}. For example, \code{\link{check_untranslated_cat}} has \code{diagnostic_tag = "untranslated messaging calls passed through cat()"}.

The output diagnostic result has the following schema:

\enumerate{
  \item \code{call}, \code{character}, the call identified as problematic
  \item \code{file}, \code{character}, the file where \code{call} was found
  \item \code{line_number}, \code{integer}, the line in \code{file} where \code{call} was found
  \item \code{replacement}, \code{character}, \emph{optional}, a suggested fix to make the call "healthy"
}

See \code{\link{check_cracked_messages}}, \code{\link{check_untranslated_cat}}, and \code{\link{check_untranslated_src}} for examples of diagnostics.
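
A custom diagnostic following the contract above might look like the sketch below. The function
\code{check_shouting}, its tag, and the toy input are hypothetical and not part of the package; the
sketch also assumes the input table has \code{call}, \code{file}, and \code{line_number} columns.

\preformatted{
library(data.table)

# Hypothetical diagnostic: flag calls containing repeated exclamation marks
check_shouting <- function(message_data) {
  hits <- message_data[grepl("!!", call, fixed = TRUE)]
  # return the schema described above, including the optional replacement
  hits[, .(call, file, line_number, replacement = gsub("!+", "!", call))]
}
attr(check_shouting, "diagnostic_tag") <- "calls with repeated exclamation marks"

# Toy stand-in for the message data (assumed columns only)
toy <- data.table(
  call = c('stop("Bad input!!")', 'message("Done.")'),
  file = c("R/a.R", "R/b.R"),
  line_number = c(10L, 4L)
)
result <- check_shouting(toy)
}

Such a function could then be supplied through the \code{diagnostics} argument, e.g.
\code{diagnostics = list(check_shouting)}.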

\bold{Domains:}

The inputs to \code{languages} are the locale codes described in \code{\link[base]{Sys.getlocale}}, e.g.
\code{es} for Spanish, \code{es_AR} for Argentinian Spanish, \code{ro} for Romanian, etc. See that help
file for some helpful tips about how to tell which locales are currently available on your machine, and
see the References below for some web resources listing more locales.

Note also the advice given in the R Installation and Administration manual (also cited below) -- if you
are writing Spanish translations, a typical package should use \code{languages = "es"} to generate
Spanish translations for \emph{all} Spanish domains. If you want to add more regional flair to your
messaging, you can do so through supplemental \code{.po} files. For example, you can add some
Argentinian messages to \code{es_AR}; users running R in the \code{es_AR} locale will see messages
specifically written for \code{es_AR} first; absent that, the \code{es} message will be shown; and
absent that, the default message (i.e., in the language written in the source code, usually English).

Chinese is a slightly different case -- typically, the \code{zh_CN} domain is used to write
with simplified characters while \code{zh_TW} is used for traditional characters. In principle you
could leverage \code{zh_TW} for Taiwanisms and \code{zh_HK} for Hongkieisms.

Currently, translation is limited to the same set of domains as is available for base R: Danish, German,
English, British English, Spanish, Farsi, French, Italian, Japanese, Korean, Dutch, Polish,
Brazilian Portuguese, Russian, Turkish, Mainland Chinese, and Taiwanese Chinese.

This list can be expanded; please file an Issue request on GitHub.
}
\value{
This function returns nothing invisibly. As a side effect, a \file{.pot} file is written to the package's
\file{po} directory (updated if one already exists, or created from scratch otherwise), and a
\file{.po} file is written in the same directory for each element of \code{languages}.
}
\references{
\url{https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Internationalization} \cr
\url{https://cran.r-project.org/doc/manuals/r-release/R-admin.html#Internationalization} \cr
\url{https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Internationalization-in-the-R-sources} \cr
\url{https://developer.r-project.org/Translations30.html} \cr
\url{https://www.isi-web.org/publications/glossary-of-statistical-terms} \cr
\url{https://www.gnu.org/software/gettext/} \cr
\url{https://www.stats.ox.ac.uk/pub/Rtools/goodies/gettext-tools.zip} \cr
\url{https://saimana.com/list-of-country-locale-code}
}
\seealso{
\code{\link[tools]{xgettext}}, \code{\link[tools]{update_pkg_po}}, \code{\link[tools]{checkPoFile}},
\code{\link[base]{gettext}}
}
\author{Michael Chirico}