Commit be75742

Merge pull request #242 from microsoft/copilot/fix-35
Add detection of text missing values in validation_report()
2 parents bc7242c + 93ffe5f commit be75742

432 files changed: +43930 -24 lines changed

.github/analyst_guide.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ To get the most out of **wpa**, make sure to leverage these additional resources
 
 1. Our official [**wpa** cheat sheet](https://github.com/microsoft/wpa/blob/main/man/figures/wpa%20cheatsheet.pdf).
 2. A growing list of [articles](https://microsoft.github.io/wpa/articles/) with detailed walkthroughs, written by multiple contributors.
-3. Our [Microsoft Learn module](https://docs.microsoft.com/en-us/learn/modules/workplace-analytics-r-package/) _Analyze Microsoft Workplace Analytics data using the wpa R package_, which takes you step-by-step through the R package and its key features.
+3. Our [Microsoft Learn module](https://learn.microsoft.com/en-us/training/modules/workplace-analytics-r-package/) _Analyze Microsoft Workplace Analytics data using the wpa R package_, which takes you step-by-step through the R package and its key features.
 
 ## Ready to go?
 

CRAN-SUBMISSION

Lines changed: 3 additions & 3 deletions
@@ -1,3 +1,3 @@
-Version: 1.9.1
-Date: 2024-06-06 12:00:40 UTC
-SHA: 0d960e8011577d9963a3be121dc52ef78cde61fb
+Version: 1.9.2
+Date: 2025-05-28 14:01:14 UTC
+SHA: d2bbd2a998182433f85e93dc51d36460b43e2ef7

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Package: wpa
 Type: Package
 Title: Tools for Analysing and Visualising Viva Insights Data
-Version: 1.9.1.9000
+Version: 1.9.2
 Authors@R: c(
     person(given = "Martin", family = "Chan", role = c("aut", "cre"), email = "[email protected]"),
     person(given = "Carlos", family = "Morales", role = "aut", email = "[email protected]"),

NEWS.md

Lines changed: 4 additions & 1 deletion
@@ -1,4 +1,7 @@
-# wpa (development version)
+# wpa 1.9.2
+
+- Improved missing value detection for `validation_report()`, `hrvar_count_all()` (#35)
+- Added support for logical outcome variables for `create_IV()` (#240)
 
 # wpa 1.9.1
 

R/hrvar_count_all.R

Lines changed: 59 additions & 5 deletions
@@ -23,6 +23,8 @@
 #' Default is 100.
 #' @param maxna The max percentage of NAs allowable for any column. Default is
 #' 20.
+#' @param na_values Character vector of values to be treated as missing. Default is
+#' c("NA", "N/A", "#N/A", " ").
 #'
 #' @import dplyr
 #'
@@ -40,12 +42,25 @@
 #' - `'message'`: outputs a message indicating which values are
 #' beyond the specified thresholds.
 #'
+#' @note
+#' As of v1.6.3, the function can detect and report text values like "NA",
+#' "N/A", "#N/A", and spaces that represent missing values, by treating them as
+#' NA values. You can customize which values are treated as missing with the
+#' `na_values` parameter.
+#' This can be validated as per:
+#' ```
+#' dv_data %>%
+#'   mutate(TempOrg = sample(c("NA", "#N/A", " "), size = nrow(.), replace = TRUE)) %>%
+#'   hrvar_count_all(return = "table")
+#' ```
+#'
 #' @export
 hrvar_count_all <- function(data,
                             n_var = 50,
                             return = "message",
                             threshold = 100,
-                            maxna = 20
+                            maxna = 20,
+                            na_values = c("NA", "N/A", "#N/A", " ")
                             ){
 
   ## Character vector of HR attributes
@@ -56,6 +71,11 @@ hrvar_count_all <- function(data,
       exclude_constants = FALSE
     )
 
+  # Ensure na_values is not NULL
+  if(is.null(na_values)){
+    na_values <- character(0)
+  }
+
   summary_table_n <-
     data %>%
     select(PersonId, extracted_chr) %>%
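
The `NULL` guard above suggests that passing `na_values = NULL` switches off text-based detection, leaving only true `NA` values to be counted. A minimal base R sketch of that reading follows; `normalise_na_values()` is a hypothetical helper written for this illustration and is not part of **wpa**.

```r
# Hypothetical helper (not part of wpa): mirrors the NULL guard in the hunk above.
normalise_na_values <- function(na_values) {
  if (is.null(na_values)) character(0) else na_values
}

x <- c("Sales", "NA", NA)

# With an empty vector, %in% matches nothing, so only the true NA is counted.
na_off <- normalise_na_values(NULL)
n_missing_off <- sum(is.na(x) | x %in% na_off)

# With placeholder strings supplied, the text "NA" is counted as well.
n_missing_on <- sum(is.na(x) | x %in% c("NA", "N/A"))

print(c(off = n_missing_off, on = n_missing_on))
```
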
@@ -67,23 +87,57 @@ hrvar_count_all <- function(data,
     select(PersonId, extracted_chr) %>%
     summarise_at(vars(extracted_chr),
                  list(`WPAn_unique` = ~n_distinct(., na.rm = TRUE), # Excludes NAs from unique count
-                      `WPAper_na` = ~(sum(is.na(.))/ nrow(data) * 100),
-                      `WPAsum_na` = ~sum(is.na(.)) # Number of missing values
-                 )) %>% # % of missing values
+                      `WPAper_na` = ~(sum(is.na(.) | . %in% na_values, na.rm = TRUE)/ nrow(data) * 100), # % of missing values including na_values
+                      `WPAsum_na` = ~sum(is.na(.) | . %in% na_values, na.rm = TRUE), # Number of missing values including na_values
+                      `WPAtext_na` = ~sum(!is.na(.) & . %in% na_values, na.rm = TRUE) # Number of text values considered as NA
+                 )) %>%
     tidyr::gather(attribute, values) %>%
     tidyr::separate(col = attribute, into = c("attribute", "calculation"), sep = "_WPA") %>%
     tidyr::spread(calculation, values)
 
+  # Initialize printMessage
+  printMessage <- ""
+
   ## Single print message
   if(sum(results$n_unique >= threshold)==0){
-
     printMessage <- paste("No attributes have greater than", threshold, "unique values.")
   }
 
   if(sum(results$per_na >= maxna)==0){
     newMessage <- paste("No attributes have more than", maxna, "percent NA values.")
     printMessage <- paste(printMessage, newMessage, collapse = "\n")
+  }
 
+  # Check for text NA values
+  if(length(na_values) > 0 && any(colnames(results) == "text_na")) {
+    total_text_na <- sum(results$text_na, na.rm = TRUE)
+
+    if(total_text_na > 0) {
+      # Find which NA values were actually found in the data
+      found_na_values <- c()
+      for(na_val in na_values) {
+        for(col in extracted_chr) {
+          if(col %in% names(data)) {
+            if(any(data[[col]] == na_val, na.rm = TRUE)) {
+              found_na_values <- c(found_na_values, na_val)
+              break
+            }
+          }
+        }
+      }
+
+      found_na_values <- unique(found_na_values)
+
+      if(length(found_na_values) > 0) {
+        newMessage <- paste0(
+          "There are ", total_text_na,
+          " values which may potentially represent missing values: ",
+          paste(found_na_values, collapse = ", "),
+          "."
+        )
+        printMessage <- paste(printMessage, newMessage, collapse = "\n")
+      }
+    }
   }
 
   for (i in 1:nrow(results)) {
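
The counting logic in the hunk above can be tried on a single character vector without **wpa** or dplyr. The sketch below is an illustration only: `count_missing()` and the sample `org` vector are hypothetical, the percentage is taken over `length(x)` rather than `nrow(data)`, and messages are collected in a vector and joined once, which is a simplification and not how the commit builds `printMessage`.

```r
# Hypothetical helper (not part of wpa): counts true NAs and text placeholders
# in one character vector, using the same predicates as the summarise_at() call.
count_missing <- function(x, na_values = c("NA", "N/A", "#N/A", " ")) {
  is_text_na <- !is.na(x) & x %in% na_values  # placeholder strings only
  is_missing <- is.na(x) | x %in% na_values   # true NA or placeholder string
  list(
    sum_na  = sum(is_missing),
    text_na = sum(is_text_na),
    per_na  = sum(is_missing) / length(x) * 100
  )
}

# Made-up sample data for the illustration
org <- c("Sales", "NA", NA, "#N/A", "Finance", " ", "HR", "Sales")
res <- count_missing(org)

# Which placeholder strings actually occur in the data (the string "NA" is
# distinct from a true NA, so the true NA is not reported here)
found <- intersect(c("NA", "N/A", "#N/A", " "), org)

# Collect messages in a vector, then join them once with newlines
msgs <- character(0)
if (res$text_na > 0) {
  msgs <- c(msgs, paste0(
    "There are ", res$text_na,
    " values which may potentially represent missing values: ",
    paste(found, collapse = ", "), "."
  ))
}
message(paste(msgs, collapse = "\n"))
```

Here the text placeholders are reported separately (`text_na`) while also contributing to `sum_na` and `per_na`, which matches the three summary columns added in the diff.
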

R/validation_report.R

Lines changed: 5 additions & 2 deletions
@@ -55,6 +55,8 @@
 #' to the `threshold` argument within `hrvar_count_all()`.
 #' @param timestamp Logical vector specifying whether to include a timestamp in
 #' the file name. Defaults to `TRUE`.
+#' @param na_values Character vector of values to be treated as missing in addition
+#' to NA values. Defaults to c("NA", "N/A", "#N/A", " ").
 #'
 #' @section Creating a report:
 #'
@@ -81,7 +83,8 @@ validation_report <- function(data,
                               hrvar = "Organization",
                               path = "validation report",
                               hrvar_threshold = 150,
-                              timestamp = TRUE){
+                              timestamp = TRUE,
+                              na_values = c("NA", "N/A", "#N/A", " ")){
 
   ## Create timestamped path (if applicable)
   if(timestamp == TRUE){
@@ -195,7 +198,7 @@ validation_report <- function(data,
 
     read_preamble("organizational_data_quality.md"), #13, Header - 2. Organizational Data Quality
     read_preamble("attributes_available.md"),#14
-    data %>% hrvar_count_all(return = "table", threshold = hrvar_threshold),
+    data %>% hrvar_count_all(return = "table", threshold = hrvar_threshold, na_values = na_values),
 
     read_preamble("groups_under_privacy_threshold_1.md"), #16, Header - 2.2 Groups under Privacy Threshold
     paste(">", data %>% identify_privacythreshold(return="text")),

README.md

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@
 
 ## Analyze and Visualize Viva Leader Insights data
 
-This is an R package for analyzing and visualizing data from [Microsoft Workplace Analytics](https://docs.microsoft.com/en-us/workplace-analytics/). For analyzing data from [Microsoft Viva Insights](https://analysis.insights.viva.office.com/), please see our other package [**vivainsights**](https://microsoft.github.io/vivainsights/).
+This is an R package for analyzing and visualizing data from Microsoft Workplace Analytics. For analyzing data from [Microsoft Viva Insights](https://analysis.insights.viva.office.com/), please see our other package [**vivainsights**](https://microsoft.github.io/vivainsights/).
 
 ## With the **wpa** package, you can...
 
@@ -34,7 +34,7 @@ To get started with the package, please see the following links:
 - [Full function list](https://microsoft.github.io/wpa/reference/index.html)
 - [Analyst Guide](https://microsoft.github.io/wpa/analyst_guide.html)
 - [FAQ](https://microsoft.github.io/wpa/faq.html)
-- [Microsoft Learn module](https://docs.microsoft.com/en-us/learn/modules/workplace-analytics-r-package/)
+- [Microsoft Learn module](https://learn.microsoft.com/en-us/training/modules/workplace-analytics-r-package/)
 
 Also check out our package cheat sheet for a quick glimpse of what **wpa** offers:
 

cran-comments.md

Lines changed: 0 additions & 5 deletions
@@ -6,8 +6,3 @@
 ## R CMD check results
 
 0 errors | 0 warnings | 0 note
-
-## Submission 1.9.1
-
-Minor bug fixes
-

man/hrvar_count_all.Rd

Lines changed: 10 additions & 1 deletion
Some generated files are not rendered by default.

man/theme_wpa.Rd

Lines changed: 1 addition & 2 deletions
Some generated files are not rendered by default.
