Skip to content

Commit 46746b9

Browse files
alan-yjamesk-PHSMoohan
authored
Feature/word sensitivity (#128)
* Extending read_sensitivity_label function to handle Word .docx files. * Updating documentation. * Extending apply_sensitivity_label functionality for Word documents * Updating documentation * Updated supporting file metadata. `openxlsx2` doesn't work here so reverted to original version of my function which works on our production processes which uses `xml2`. * Adding dependency for named list w/ actual label metadata (originally in @Moohan's phssensitivitylabels repo.) * Removing duplicate list call. * Incorporated relative file paths * Tweak doc wording for consistency * Update docs * Using internal object * Updating function to use temporary directory for zipping/unzipping. Also replaced paste0() for file.path() and removed suppressWarnigns(). * Adding/removing "/" separator as necessary for path string data. * Bug fix - file_ext elsewhere. * Bug fix - file_ext used elsewhere. * fixes after testing * zip to docx directly * sensitivity labels for templates * no label word doc fix * unlink whole zipdir * Apply suggestions from code review Co-authored-by: James Hayes (né McMahon) <james.mcmahon@phs.scot> * Style code (GHA) * xlsm and docm support * label param in templates * tests for docx * fix test error --------- Co-authored-by: James <james.kilgour2@phs.scot> Co-authored-by: James Hayes (né McMahon) <james.hayes2@phs.scot> Co-authored-by: James Hayes (né McMahon) <james.mcmahon@phs.scot>
1 parent faa1500 commit 46746b9

13 files changed

Lines changed: 337 additions & 51 deletions

File tree

DESCRIPTION

Lines changed: 7 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
Package: phstemplates
22
Title: PHS R Templates
3-
Version: 1.2.4
3+
Version: 1.3.0
44
Authors@R: c(
55
person("Public Health Scotland", , , "phs.datascience@phs.scot", role = "cph"),
66
person("Alan", "Yeung", , "alan.yeung@phs.scot", role = c("aut", "cre"),
@@ -10,15 +10,16 @@ Authors@R: c(
1010
person("James", "Hayes", , "James.Hayes2@phs.scot", role = "ctb",
1111
comment = c(ORCID = "0000-0002-5380-2029")),
1212
person("Ellie", "Bates", , "Ellie.Bates@phs.scot", role = "ctb"),
13-
person("Matthew", "Hoyle", , "matthew.hoyle@phs.scot", role = "ctb")
13+
person("Matthew", "Hoyle", , "matthew.hoyle@phs.scot", role = "ctb"),
14+
person("James", "Kilgour", , "james.kilgour2@phs.scot", role = "ctb")
1415
)
1516
Description: Templates for Public Health Scotland.
1617
License: MIT + file LICENSE
1718
URL: https://github.com/Public-Health-Scotland/phstemplates,
1819
https://public-health-scotland.github.io/phstemplates/
1920
BugReports: https://github.com/Public-Health-Scotland/phstemplates/issues
2021
Depends:
21-
R (>= 4.0.0)
22+
R (>= 4.1.0)
2223
Imports:
2324
cli,
2425
git2r (>= 0.36.2),
@@ -28,7 +29,9 @@ Imports:
2829
renv,
2930
rmarkdown (>= 2.12),
3031
rstudioapi,
31-
utils
32+
utils,
33+
xml2,
34+
zip
3235
Suggests:
3336
dplyr,
3437
flextable (>= 0.5.7),

NEWS.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,8 @@
1+
# phstemplates 1.3.0
2+
3+
* Added PHS Quarto HTML template, accessible through `create_phs_html()`.
4+
* Added `read_sensitivity_label()` and `apply_sensitivity_label()` functions to read and apply PHS sensitivity labels to Microsoft Word and Excel documents.
5+
16
# phstemplates 1.2.4
27

38
* Uses git2r functions for working with git.

R/phs_report_docx.R

Lines changed: 11 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -25,6 +25,7 @@
2525
#' @param cover_title Title to be used in the cover page.
2626
#' @param cover_subtitle Subtitle to be used in the cover page.
2727
#' @param cover_date Date to be used in the cover page.
28+
#' @inheritParams apply_sensitivity_label
2829
#' @return R Markdown output format to pass to \code{\link[rmarkdown]{render}}
2930
#' @examples
3031
#' \dontrun{
@@ -49,7 +50,8 @@ phs_report_docx <- function(
4950
cover_page = NULL,
5051
cover_title = "Title",
5152
cover_subtitle = "Subtitle",
52-
cover_date = "DD Month YYYY"
53+
cover_date = "DD Month YYYY",
54+
label = NULL
5355
) {
5456
resolve_highlight <- utils::getFromNamespace("resolve_highlight", "rmarkdown")
5557
highlighters <- utils::getFromNamespace("highlighters", "rmarkdown")
@@ -128,7 +130,8 @@ phs_report_docx <- function(
128130
title = cover_title,
129131
stitle = cover_subtitle,
130132
dt = cover_date,
131-
tocd = toc_depth
133+
tocd = toc_depth,
134+
slabel = label
132135
) {
133136
officer::read_docx(output_file) %>%
134137
officer::cursor_begin() %>%
@@ -165,6 +168,12 @@ phs_report_docx <- function(
165168
officer::body_add_xml(str = xml_elt) %>%
166169
officer::set_doc_properties(title = title) %>%
167170
print(output_file)
171+
172+
if (!is.null(slabel)) {
173+
apply_sensitivity_label(output_file, slabel)
174+
} else {
175+
invisible(output_file)
176+
}
168177
}
169178

170179
# return output format

R/sensitivity_labels.R

Lines changed: 167 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,9 @@
11
#' Read Sensitivity Label
2-
#' @description Reads the sensitivity label from an Excel file using
3-
#' openxlsx2::wb_get_mips. Returns the label name, 'no label' if none is found,
2+
#' @description Reads the sensitivity label from a Word or Excel file using
3+
#' openxlsx2 package functions. Returns the label name, 'no label' if none is found,
44
#' or errors if unexpected.
55
#'
6-
#' @param file Path to the Excel file (.xlsx or .xls)
6+
#' @param file Path to the file (.xlsx, .xlsm, .docx, docm)
77
#' @return The sensitivity label name, or 'no label' if none is found.
88
#' @examples
99
#' \dontrun{
@@ -12,6 +12,7 @@
1212
#' label <- read_sensitivity_label("myfile.xlsx")
1313
#' print(label) # "Personal"
1414
#' }
15+
#' @family Sensitivity Label functions
1516
#' @export
1617
read_sensitivity_label <- function(file) {
1718
# Parameter validation
@@ -40,23 +41,63 @@ read_sensitivity_label <- function(file) {
4041
)
4142
}
4243

43-
# Check file is Excel workbook
44-
file_extension <- tolower(tools::file_ext(file))
45-
if (!file_extension %in% c("xlsx", "xls")) {
44+
# Check file is valid
45+
file_ext <- tolower(tools::file_ext(file))
46+
if (!file_ext %in% c("xlsx", "xlsm", "docx", "docm")) {
4647
cli::cli_abort(
47-
"{.arg file} must be an Excel workbook with {.val .xlsx} or {.val .xls} extension, not {.val .{file_extension}}."
48+
"{.arg file} must be an Excel workbook or Word document with {.val .xlsx}, {.val .xlsm}, or {.val .docx}, {.val .docm} extension, not {.val .{file_ext}}."
4849
)
4950
}
5051

51-
wb <- openxlsx2::wb_load(file)
52-
mips <- openxlsx2::wb_get_mips(wb)
52+
## Extracting label within Excel workbooks ----
5353

54-
if (is.null(mips) || length(mips) == 0L || mips == "") {
55-
return("No label")
54+
if (file_ext %in% c("xlsx", "xlsm")) {
55+
wb <- openxlsx2::wb_load(file)
56+
mips <- openxlsx2::wb_get_mips(wb)
57+
58+
if (is.null(mips) || length(mips) == 0L || mips == "") {
59+
return("No label")
60+
}
61+
62+
# Try to extract label name from XML
63+
label_name <- which(sensitivity_label_xml == mips) |> names()
5664
}
5765

58-
# Try to extract label name from XML
59-
label_name <- which(sensitivity_label_xml == mips) |> names()
66+
## Extracting label within Word docs ----
67+
68+
if (file_ext %in% c("docx", "docm")) {
69+
file_name <- tools::file_path_sans_ext(basename(file))
70+
71+
zipdir <- file.path(tempdir(), file_name)
72+
73+
utils::unzip(file, exdir = zipdir)
74+
75+
label_file_exists <- file.exists(file.path(
76+
zipdir,
77+
"docMetadata",
78+
"LabelInfo.xml"
79+
))
80+
81+
if (label_file_exists) {
82+
mips <- xml2::read_xml(file.path(zipdir, "docMetadata", "LabelInfo.xml"))
83+
84+
label_node <- xml2::xml_find_first(mips, "//clbl:label")
85+
id_value <- xml2::xml_attr(label_node, "id")
86+
87+
# If the label ID is the same as the site ID, return no label
88+
if (id_value == xml2::xml_attr(label_node, "siteId")) {
89+
label_name <- "No label"
90+
} else {
91+
id_value <- substr(id_value, 2, nchar(id_value) - 1)
92+
label_id <- grep(id_value, unlist(sensitivity_label_xml), fixed = TRUE)
93+
label_name <- names(sensitivity_label_xml)[label_id]
94+
}
95+
} else {
96+
return("No label")
97+
}
98+
99+
unlink(zipdir, recursive = TRUE)
100+
}
60101

61102
if (length(label_name) == 0L) {
62103
cli::cli_abort(
@@ -69,18 +110,19 @@ read_sensitivity_label <- function(file) {
69110

70111

71112
#' Apply Sensitivity Label
72-
#' @description Applies a sensitivity label to an Excel file using openxlsx2
113+
#' @description Applies a sensitivity label to a Word or Excel file using openxlsx2
73114
#' and built-in XML. Supported labels are 'Personal', 'OFFICIAL', and
74115
#' 'OFFICIAL_SENSITIVE_VMO' (visual markings only).
75116
#'
76-
#' The function loads the Excel file, applies the specified sensitivity label
117+
#' The function loads the file, applies the specified sensitivity label
77118
#' using the appropriate XML, and saves the modified file. If successful, it
78119
#' silently returns the file path.
79120
#'
80-
#' @param file Path to the Excel file (.xlsx or .xls)
121+
#' @param file Path to the file (.xlsx, .xlsm, .docx, .docm)
81122
#' @param label Sensitivity label. One of: 'Personal', 'OFFICIAL',
82123
#' 'OFFICIAL_SENSITIVE_VMO'.
83124
#' @return Silently returns the file path if successful.
125+
#' @family Sensitivity Label functions
84126
#' @export
85127
#' @examples
86128
#' \dontrun{
@@ -143,19 +185,120 @@ apply_sensitivity_label <- function(file, label) {
143185
)
144186
}
145187

146-
# Check file is Excel workbook
147-
file_extension <- tolower(tools::file_ext(file))
148-
if (!file_extension %in% c("xlsx", "xls")) {
188+
# Check file is valid
189+
file_ext <- tolower(tools::file_ext(file))
190+
if (!file_ext %in% c("xlsx", "xlsm", "docx", "docm")) {
149191
cli::cli_abort(
150-
"{.arg file} must be an Excel workbook with {.val .xlsx} or {.val .xls} extension, not {.val .{file_extension}}."
192+
"{.arg file} must be an Excel workbook or Word document with{.val .xlsx}, {.val .xlsm}, or {.val .docx}, {.val .docm} extension, not {.val .{file_ext}}."
151193
)
152194
}
153195

154-
# Load workbook and apply label
155-
wb <- openxlsx2::wb_load(file)
196+
# Get label data
156197
xml <- sensitivity_label_xml[[label]]
157-
wb <- openxlsx2::wb_add_mips(wb, xml)
158-
openxlsx2::wb_save(wb, file)
198+
199+
## Apply label to Excel workbooks ----
200+
201+
if (file_ext %in% c("xlsx", "xlsm")) {
202+
wb <- openxlsx2::wb_load(file)
203+
wb <- openxlsx2::wb_add_mips(wb, xml)
204+
openxlsx2::wb_save(wb, file)
205+
}
206+
207+
## Apply label to Word docs ----
208+
209+
if (file_ext %in% c("docx", "docm")) {
210+
# Parsing input
211+
file_dir <- dirname(file)
212+
file_name <- tools::file_path_sans_ext(basename(file))
213+
214+
# Zipping process needs its own directory
215+
# Creates the temporary directory at the same time
216+
zipdir <- file.path(tempdir(), file_name)
217+
218+
# Unzip the file into the dir
219+
utils::unzip(file, exdir = zipdir)
220+
221+
# Formatting as xml, not character data.
222+
xml <- xml2::as_xml_document(xml)
223+
224+
# Create the dir using that name if needed
225+
if (!dir.exists(file.path(zipdir, "/docMetadata"))) {
226+
dir.create(
227+
file.path(zipdir, "/docMetadata"),
228+
showWarnings = FALSE,
229+
recursive = TRUE
230+
)
231+
}
232+
233+
# Write the XML data to the temp directory
234+
xml2::write_xml(
235+
xml,
236+
file.path(zipdir, "docMetadata", "LabelInfo.xml"),
237+
useBytes = TRUE
238+
)
239+
240+
# Update content file with new child node if needed
241+
content <- xml2::read_xml(file.path(zipdir, "/[Content_Types].xml"))
242+
content_required <- !grepl("docMetadata", content)
243+
244+
if (content_required) {
245+
new_node <- xml2::xml_add_child(content, .value = "Override")
246+
247+
xml2::xml_set_attrs(
248+
new_node,
249+
c(
250+
PartName = "/docMetadata/LabelInfo.xml",
251+
ContentType = "application/vnd.ms-office.classificationlabels+xml"
252+
)
253+
)
254+
255+
xml2::write_xml(
256+
content,
257+
file.path(zipdir, "/[Content_Types].xml"),
258+
useBytes = TRUE
259+
)
260+
}
261+
262+
# Update .rels file with new child node if needed
263+
rels_file <- xml2::read_xml(file.path(zipdir, "/_rels/.rels"))
264+
rels_required <- !grepl("docMetadata", rels_file)
265+
266+
if (rels_required) {
267+
xml2::xml_set_attrs(
268+
xml2::xml_add_child(rels_file, .value = "Relationship"),
269+
c(
270+
Id = "rId6",
271+
Type = "http://schemas.microsoft.com/office/2020/02/relationships/classificationlabels",
272+
Target = "docMetadata/LabelInfo.xml"
273+
)
274+
)
275+
276+
xml2::write_xml(
277+
rels_file,
278+
file.path(zipdir, "/_rels/.rels"),
279+
useBytes = TRUE
280+
)
281+
}
282+
283+
# Delete original file
284+
file.remove(file)
285+
286+
newzip <- file.path(
287+
normalizePath(file_dir),
288+
paste0(file_name, ".", tools::file_ext(file))
289+
)
290+
291+
zip::zip(
292+
zipfile = newzip,
293+
files = list.files(zipdir),
294+
recurse = TRUE,
295+
include_directories = FALSE,
296+
root = zipdir,
297+
mode = "mirror"
298+
)
299+
300+
unlink(zipdir, recursive = TRUE)
301+
}
159302

160303
cli::cli_alert_success(
161304
"Sensitivity label {.val {label}} successfully applied to {.path {basename(file)}}."

inst/rmarkdown/templates/phs-accstats-report/skeleton/skeleton.Rmd

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ output:
88
cover_subtitle: "My Subtitle"
99
cover_date: "DD MM YYYY"
1010
toc_depth: 3
11+
label: "OFFICIAL"
1112
---
1213

1314
```{r setup, include = FALSE}

inst/rmarkdown/templates/phs-mnginfo-report/skeleton/skeleton.Rmd

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ output:
88
cover_subtitle: "My subtitle"
99
cover_date: "DD MM YYYY"
1010
toc_depth: 3
11+
label: "OFFICIAL"
1112
---
1213

1314
```{r setup, include = FALSE}

inst/rmarkdown/templates/phs-offdev-report/skeleton/skeleton.Rmd

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ output:
88
cover_subtitle: "My Subtitle"
99
cover_date: "DD MM YYYY"
1010
toc_depth: 3
11+
label: "OFFICIAL"
1112
---
1213

1314
```{r setup, include = FALSE}

inst/rmarkdown/templates/phs-offstats-report/skeleton/skeleton.Rmd

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ output:
88
cover_subtitle: "My Subtitle"
99
cover_date: "DD MM YYYY"
1010
toc_depth: 3
11+
label: "OFFICIAL"
1112
---
1213

1314
```{r setup, include = FALSE}

man/apply_sensitivity_label.Rd

Lines changed: 8 additions & 3 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

0 commit comments

Comments
 (0)