Update md.pattern.R - Extended plot customisation #734

Open
MattxFive wants to merge 4 commits into amices:master from MattxFive:patch-1

@MattxFive

This PR adds several parameters to md.pattern to allow further plot customisation. Calling the function without specifying any of them displays the same plot as before.

@rotate.names : now also accepts a numeric value (rotation angle of the variable names)
@pattern.ord : sorts the patterns in the plot by their number of missing variables
@min.ind : drops the patterns shared by too few individuals (below the specified minimum)
@nb.pat : set to FALSE to hide the number of missing variables in each pattern (right-hand numbers)
@nb.var : set to FALSE to hide the number of individuals with a missing value for each variable (bottom numbers). This also automatically hides the total number of missing values in the bottom-right corner.
@nb.ind : set to FALSE to hide the number of individuals in each pattern (left-hand numbers)
@nb.tot : set to FALSE to hide the total number of missing values (bottom-right corner). Automatically hidden when nb.var = FALSE.
@names : custom vector of names to be displayed at the top of the plot
@colors : colors to be displayed for missing and non-missing values

Line 56 : function signature updated with the new parameters
Lines 63-65 : check that the length of @names matches ncol(x) when a custom vector is specified
Lines 66-68 : set @nb.tot = FALSE if nb.var = FALSE
Line 70 : assign @names
Lines 95-97 : drop the patterns with too few individuals if @min.ind is specified
Lines 99-101 : sort the matrix by the number of variables in each pattern if @pattern.ord is specified
Lines 115-121 : adj and the plot parameters are adjusted to handle numeric values of @rotate.names
Line 123 : use the @colors parameter instead of hard-coded colors
Lines 127-129 : display the nb.var counts if TRUE
Lines 132-134 : display the nb.pat counts if TRUE
Lines 135-137 : display the nb.ind counts if TRUE
Lines 139-141 : display the nb.tot count if TRUE
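
The two consistency checks listed above (length of the custom names against ncol(x), and nb.tot following nb.var) can be sketched as a standalone helper. This is only an illustration under stated assumptions: `check_plot_args` is a hypothetical name, and the PR performs these checks inline inside md.pattern rather than in a separate function.

```r
# Hypothetical helper illustrating the two checks; not the PR's actual code.
check_plot_args <- function(x, names = NULL, nb.var = TRUE, nb.tot = TRUE) {
  # Fall back to the column names of the data when no custom labels are given.
  if (is.null(names)) {
    names <- colnames(x)
  }
  # A custom label vector must supply exactly one label per column.
  if (length(names) != ncol(x)) {
    stop("`names` must have one element per column of `x`")
  }
  # Hiding the bottom counts also hides the bottom-right total.
  if (!nb.var) {
    nb.tot <- FALSE
  }
  list(names = names, nb.tot = nb.tot)
}

x <- data.frame(a = c(1, NA), b = c(NA, 2))
args <- check_plot_args(x, names = c("Age", "BMI"), nb.var = FALSE)
```

With these inputs, `args$nb.tot` ends up FALSE even though its default is TRUE, which mirrors the coupling between nb.var and nb.tot described in the list.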

This is my first pull request. I hope it fits your useful work, and I'm open to feedback.

@MattxFive
Author

If min.ind drops some patterns, the bottom numbers no longer match the totals. I'll look into how to fix it. In the meantime, the function can raise a warning with the following additions:

Between lines 68-69 : add a new variable, drop.ind = FALSE

Replace lines 95-97 with:

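# Drop the patterns observed in min.ind individuals or fewer, and flag it
if (!is.null(min.ind)) {
  n.ind <- as.numeric(rownames(mpat))
  if (min(n.ind) <= min.ind) {
    # drop = FALSE keeps mpat a matrix even if a single pattern remains
    mpat <- mpat[n.ind > min.ind, , drop = FALSE]
    drop.ind <- TRUE
  }
}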

Between lines 141-142 and 143-144, add a warning message

if(drop.ind){
  warning("Individuals were dropped using min.ind, totals on the bottom line will not match")
}
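
To illustrate why the bottom line stops matching once patterns are dropped, here is a small sketch on a toy pattern matrix. It assumes, as in the snippet above, that the row names of mpat hold the number of individuals per pattern and that 1 marks a missing value. The variable names `full.counts` and `kept.counts` are illustrative only.

```r
# Toy pattern matrix in the orientation of mpat: 1 marks a missing value,
# and the row names hold the number of individuals showing each pattern.
mpat <- matrix(c(0, 0,
                 1, 0,
                 1, 1),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("10", "3", "1"), c("a", "b")))
n.ind <- as.numeric(rownames(mpat))

# Missing values per variable over all patterns (each row weighted by its
# number of individuals).
full.counts <- colSums(mpat * n.ind)

# The same counts restricted to patterns with more than min.ind individuals.
min.ind <- 1
keep <- n.ind > min.ind
kept.counts <- colSums(mpat[keep, , drop = FALSE] * n.ind[keep])
```

Here `full.counts` and `kept.counts` differ because the pattern seen in a single individual is excluded, which is the mismatch the warning is meant to flag.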

Major modifications, described below.

Added a rotate.var parameter to allow the bottom counts to be rotated for better readability. Only angles of 0 or 90 degrees are supported.

Removed pattern.ord and replaced it with a parameter that allows more sorting options: sorting by number of missing values per pattern (right counts), increasing or decreasing, or sorting by number of individuals per pattern (left counts), increasing or decreasing.

min.ind no longer causes problems when calculating the number of individuals per variable at the bottom of the graph.

Added drop.zero.vars, which drops all the empty columns to produce a more condensed graph. Useful when the function is used on large tables.

Removed nb.tot to be consistent with another new parameter. It is replaced with tot.mis, which behaves like nb.tot: it controls whether the total number of missing values is displayed at the bottom-right.

Added tot.ind, which displays the total number of individuals at the bottom-left corner. This is especially useful if min.ind removes some patterns with few individuals. It corresponds to the sum of individuals per pattern.

Updated the example on nhanes with the addition of tot.ind at the bottom-left corner.

Code modifications :
Lines 72-74 : check the value of the order parameter. If an invalid value is specified, the parameter is ignored and the graph is displayed without any ordering, as in the original function.

Line 82 : store the column names, which are needed later if parameters are used to filter the data.

Lines 86-99 : apply the min.ind parameter. First, count the number of individuals per pattern, then keep the patterns with more individuals than specified. If at least one pattern remains, filter the data to remove the excluded patterns, and update the other variables in place to avoid creating too many copies of the data. If no pattern remains, the function stops with a warning.
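
The min.ind filtering step described above could look roughly like the following sketch. It is written against a toy data frame rather than the PR's internals. The pattern encoding via `paste()` follows the usual md.pattern approach, the object names `n.per.pat`, `kept.pat` and `x.kept` are assumptions, and `stop()` is used here for the "no pattern left" case.

```r
x <- data.frame(a = c(1, 2, NA, 4, NA), b = c(1, 2, 3, 4, NA))
min.ind <- 1

R <- is.na(x)
# One string per row encoding its missingness pattern, e.g. "10".
pat <- apply(R, 1, function(row) paste(as.numeric(row), collapse = ""))
# Number of individuals per pattern.
n.per.pat <- table(pat)
# Keep the patterns with more individuals than the threshold.
kept.pat <- names(n.per.pat)[as.vector(n.per.pat) > min.ind]
if (length(kept.pat) == 0) {
  stop("No pattern has more than `min.ind` individuals")
}
# Filter the data once; drop = FALSE keeps a data frame in all cases.
x.kept <- x[pat %in% kept.pat, , drop = FALSE]
```

In this toy case only the fully observed pattern is shared by more than one individual, so `x.kept` contains just the complete rows.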

Lines 101-112 : apply the drop.zero.vars parameter. Identify the empty columns and update the related variables: current.names and the data.
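
A minimal sketch of the drop.zero.vars idea, assuming that "empty columns" means variables without any missing value (that reading is an assumption, as is the name `has.missing`):

```r
x <- data.frame(a = c(1, NA, 3), b = c(1, 2, 3), c = c(NA, NA, 3))
current.names <- colnames(x)

R <- is.na(x)
# Columns without any missing value add nothing to the pattern plot.
has.missing <- colSums(R) > 0
# Keep the indicator matrix and the displayed names in sync.
R <- R[, has.missing, drop = FALSE]
current.names <- current.names[has.missing]
```

Updating `current.names` together with `R` is what keeps the labels at the top of the plot aligned with the remaining columns.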

Lines 123-126 : prevent mpat from being collapsed to a vector if min.ind keeps only a single pattern of missing values.

Line 128 : the check is now done on R, which may have been modified by the other parameters.

Lines 138-157 : Apply the order specified, if any.
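
The sorting options (by missing values per pattern or by individuals per pattern, increasing or decreasing) can be sketched as follows. `sort_patterns` and its `by` argument are hypothetical names used only for this illustration, and the toy matrix assumes 1 = observed, 0 = missing, with the number of individuals stored in the row names.

```r
# Toy pattern matrix: 1 = observed, 0 = missing; row names = individuals.
r <- matrix(c(1, 1,
              0, 1,
              0, 0),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("10", "3", "5"), c("a", "b")))

# Hypothetical helper: order the pattern rows by one of the two margins.
sort_patterns <- function(r, by = c("mis", "ind"), decreasing = FALSE) {
  by <- match.arg(by)
  key <- if (by == "mis") {
    rowSums(r == 0)               # missing values per pattern (right counts)
  } else {
    as.numeric(rownames(r))       # individuals per pattern (left counts)
  }
  r[order(key, decreasing = decreasing), , drop = FALSE]
}

by.ind <- sort_patterns(r, by = "ind")
by.mis.desc <- sort_patterns(r, by = "mis", decreasing = TRUE)
```

Using `match.arg()` makes an unsupported `by` value an error. The PR instead ignores an invalid order value and falls back to the default display, so this part is a deliberate simplification.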

Lines 165-169 : to simplify the following lines, the values needed to build the graph are stored in a matrix. If the object would otherwise be a vector, typically because min.ind kept a single pattern, it is forced to be a matrix so that the following lines still work and display a single-line plot.
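
The vector-versus-matrix issue comes from R's default behaviour when subsetting leaves a single row. A short sketch with illustrative names shows the difference:

```r
m <- matrix(1:6, nrow = 3, dimnames = list(c("10", "3", "1"), c("a", "b")))
keep <- c(TRUE, FALSE, FALSE)

collapsed <- m[keep, ]                # dimensions are dropped: a named vector
kept <- m[keep, , drop = FALSE]       # still a matrix with one row

dim(collapsed)   # NULL
dim(kept)        # 1 2
```

Passing `drop = FALSE` at subsetting time is one way to keep the matrix shape. Re-wrapping the vector afterwards, as the PR describes, is another.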

Lines 171-173 : similar to the original, but the variable names can now be rotated by a numeric angle.

Lines 176-178 : one.line is used to simplify the testing of single-lined plots, and to ensure the next lines will work. R.ncol and R.nrow to simplify the next lines of code.

Line 181 : current.names used because it is up to date compared to names.

Lines 184-191 : Ensure the graph is correctly displayed if min.ind only kept one single pattern of missing values.

Lines 198-202 : rotate the variable counts at the bottom of the graph, if specified.

Lines 206-221 : Adds some conditions to display or not the counts, based on the parameters.
Based on the previous modifications, the end of the function has been changed to improve transparency about observations or variables that have been dropped due to min.ind and/or drop.zero.vars

The final lines of the function are changed: instead of return(r) (in both the if(plot) and else branches), some cat calls now report the size of the original data (observations and variables) and the size of the data actually displayed.
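
A rough sketch of such a size report is shown below. `report_dims` is a hypothetical helper and the message wording is an assumption. The PR places its cat calls directly at the end of md.pattern.

```r
# Hypothetical helper: report the size of the data before and after filtering.
report_dims <- function(original, managed) {
  msg <- sprintf(
    paste0("Original data: %d observations, %d variables\n",
           "Displayed data: %d observations, %d variables\n"),
    nrow(original), ncol(original), nrow(managed), ncol(managed)
  )
  cat(msg)
  invisible(msg)
}

original <- data.frame(a = c(1, NA, 3, NA, 5), b = c(1, 2, 3, 4, NA))
managed <- original[1:3, "a", drop = FALSE]
report_dims(original, managed)
```

One design consideration: replacing return(r) with printed output means callers can no longer capture the pattern matrix, so returning it invisibly alongside the message may be worth discussing.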
@MattxFive
Author

In the last commit, the function is still named "md.pattern2", which is the name I use locally to avoid overwriting the original function. Of course, it should be renamed "md.pattern".
Sorry about that.

Text adjustment now based on a simpler condition: whether rotate.names is different from 0