@@ -126,7 +126,8 @@ lod_root2 <- matrix(
126126 nrow = nrow(D_tilde),
127127 ncol = ncol(D_tilde), byrow = TRUE
128128)
129- D_tilde[which(lod_info$tilde_mask == 1)] <- lod_root2[which(lod_info$tilde_mask == 1)]
129+ lod_idxs <- which(lod_info$tilde_mask == 1)
130+ D_tilde[lod_idxs] <- lod_root2[lod_idxs]
130131plot_matrix(D_tilde)
131132```
132133
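The refactored lines in the hunk above compute the masked indices once and reuse them on both sides of the assignment. A minimal sketch of the same pattern on toy data follows. The matrix `D`, the `lod` vector, and the `lod / sqrt(2)` fill value are assumptions made for illustration: the name `lod_root2` suggests that fill, but the diff does not show its definition.

```r
# Toy data: 2 observations x 3 chemicals (hypothetical values).
D <- matrix(c(0.5, 2.0, 3.0,
              1.5, 0.1, 4.0), nrow = 2, byrow = TRUE)
lod <- c(1.0, 0.4, 2.0)  # assumed: one limit of detection per column

# Repeat the LOD vector across rows so it lines up with D.
lod_mat <- matrix(lod, nrow = nrow(D), ncol = ncol(D), byrow = TRUE)

# 1 marks an entry below its column's LOD, 0 otherwise.
mask <- (D < lod_mat) * 1

# Compute the linear indices once, then reuse them on both sides.
idxs <- which(mask == 1)
D_imputed <- D
D_imputed[idxs] <- (lod_mat / sqrt(2))[idxs]
```

Storing the indices in a named variable avoids evaluating `which()` twice and keeps the two sides of the assignment visibly in sync.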
@@ -154,9 +155,11 @@ indicative of complex underlying patterns and a relatively large degree of noise
154155Most EH data can be described this way. ` root_pcp() ` is best for data characterized
155156by rapidly decaying singular values, indicative of very well-defined latent patterns.
156157
157- For a simple example like the above, both PCP models are perfectly suitable.
158+ The singular values plotted above decay quickly from the first to the second, but very gradually
159+ from the second onward. For this simple simulated dataset, both PCP models are perfectly suitable.
158160We will use ` rrmc() ` , as this is the model environmental health researchers will
159- likely employ most frequently.
161+ likely employ most frequently. ` vignette("pcp-applied") ` contains an example
162+ singular value plot for a mixtures matrix with slowly decaying singular values.
160163
161164## Grid search for parameter tuning
162165
@@ -188,7 +191,7 @@ passed `etas` as the `grid` argument to search and sent $r = 5$ as a constant
188191parameter common to all models in the search. Since ` length(etas) = 6 ` and $r = 5$, we
189192searched through 30 different PCP models. The ` num_runs ` argument determines how many (random)
190193tests should be performed for each unique model setting. By default, ` num_runs = 100 ` ,
191- so our grid search tuned ` r ` and ` eta ` by measuring the performance of 300 different PCP models.
194+ so our grid search tuned ` r ` and ` eta ` by measuring the performance of 3000 different PCP models.
192195We passed the simulated ` lod ` vector as another constant to the grid search,
193196equipping each ` rrmc() ` run with the same LOD information.
194197
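The model count in the hunk above can be checked with a few lines of arithmetic. This sketch assumes that `r = 5` leads to ranks 1 through 5 each being evaluated, which is what the count of 30 settings in the text implies. The `eta` values are hypothetical placeholders, since only their number matters here.

```r
etas <- seq(0.01, 0.06, length.out = 6)  # hypothetical values; only length(etas) = 6 matters
ranks <- 1:5                             # assumed: r = 5 yields candidate ranks 1 through 5
num_runs <- 100                          # default stated in the text

# Every (eta, r) pair is one distinct parameter setting.
settings <- expand.grid(eta = etas, r = ranks)
n_settings <- nrow(settings)             # 6 * 5 = 30

# Each setting is evaluated num_runs times.
n_models <- n_settings * num_runs        # 30 * 100 = 3000
```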
@@ -205,7 +208,13 @@ gs$summary_stats
205208Inspecting the ` summary_stats ` table from the output grid search provides the mean-aggregated
206209statistics for each of the 30 distinct parameter settings we tested.
207210The grid search correctly identified the rank ` r r_star ` solution as the best
208- (lowest relative error rate). The corresponding ` eta ` = ` r eta_star ` .
211+ (lowest relative error rate, ` rel_err ` ). The corresponding ` eta ` = ` r eta_star ` . The top three parameter
212+ settings also have reasonable ` S_sparsity ` levels (all are above ` 0.95 ` ). The next three
213+ parameter settings seem to under-regularize the sparse ` S ` matrix by quite a bit, as 80% of entries are non-zero.
214+ We will take the top parameters identified by the grid search in this instance. Had the very top parameters
215+ yielded a sparsity of e.g. ` 0.7 ` , we would likely have preferred the second set of parameters, with sparsities in the
216+ ` 0.9 ` s. This decision would have been grounded in prior assumptions about the number of outliers to expect in the mixture.
217+ For more on the interpretation of grid search results, consult the documentation for the ` grid_search_cv() ` function.
209218
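The selection reasoning added in the hunk above (prefer the lowest `rel_err`, but only among settings whose `S_sparsity` is plausible) can be sketched as a simple filter. The data frame below is a hypothetical stand-in for a summary table: the column names `rel_err` and `S_sparsity` come from the text, while the values and the `min_sparsity` threshold are invented for illustration.

```r
# Hypothetical summary table; the numbers are made up.
stats <- data.frame(
  eta        = c(0.10, 0.15, 0.20, 0.25),
  r          = c(3, 3, 4, 2),
  rel_err    = c(0.10, 0.11, 0.12, 0.20),
  S_sparsity = c(0.70, 0.93, 0.95, 0.20)
)

# Assumed prior belief: the sparse matrix S should be at least 90% zeros.
min_sparsity <- 0.9

# Rank settings by relative error, then drop those that under-regularize S.
ranked <- stats[order(stats$rel_err), ]
plausible <- ranked[ranked$S_sparsity >= min_sparsity, ]

# The preferred setting is the lowest-error one that survives the filter.
best <- plausible[1, ]
```

In this toy table the lowest-error setting has a sparsity of only `0.7`, so the filter passes over it and selects the second row instead, mirroring the hypothetical case described in the text.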
210219## Running PCP
211220
@@ -286,6 +295,8 @@ PCP's sparse matrix estimate was only off from the ground truth `S_0` by
286295
287296We can now pair our estimated ` L ` matrix with any matrix factorization method of our
288297choice (e.g. PCA, factor analysis, or non-negative matrix factorization) to extract
289- the latent chemical exposure patterns. These patterns, along with the isolated outlying
298+ the latent chemical exposure patterns (an example of what this looks like is
299+ in ` vignette("pcp-applied") ` , where non-negative matrix factorization is used to extract
300+ patterns from PCP's ` L ` matrix). These patterns, along with the isolated outlying
290301exposure events in ` S ` , can then be analyzed with any outcomes of interest in
291302downstream epidemiological analyses.
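As one possible illustration of the pairing described in the hunk above, the sketch below applies PCA (via base R's `prcomp()`) to a simulated low-rank matrix standing in for PCP's `L` estimate. The simulated matrix, its dimensions, and the choice of two patterns are assumptions. The applied vignette referenced in the text uses non-negative matrix factorization instead.

```r
set.seed(1)

# Hypothetical stand-in for PCP's L matrix: 50 observations x 6 chemicals, rank 2.
scores_true   <- matrix(rnorm(50 * 2), nrow = 50, ncol = 2)
patterns_true <- matrix(rnorm(2 * 6), nrow = 2, ncol = 6)
L <- scores_true %*% patterns_true

# PCA on the low-rank matrix; columns are centered but not rescaled.
pca <- prcomp(L, center = TRUE, scale. = FALSE)

# Only two components carry variance, because L has rank 2.
n_patterns <- sum(pca$sdev > 1e-8)

# Loadings (chemicals x patterns) and scores (observations x patterns).
loadings <- pca$rotation[, seq_len(n_patterns), drop = FALSE]
scores   <- pca$x[, seq_len(n_patterns), drop = FALSE]
```

The `scores` matrix is the kind of per-observation summary that could then enter a downstream outcome model alongside information derived from `S`.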