11# Rforce
22
3- * We here to introduce [ Rforce: Random Forests for Composite Endpoints] ( https://onlinelibrary.wiley.com/doi/10.1002/sim.70413 ) , consisting of non-fatal composite events and terminal events. It utilizes generalized estimating equations(GEE) to build trees and handles the dependent censoring due to the terminal events with the concept of pseudo-at-risk duration.
3+ **Rforce** implements the methodology described in [Rforce: Random Forests for Composite Endpoints](https://onlinelibrary.wiley.com/doi/10.1002/sim.70413), which models composite endpoints consisting of **non-fatal events and terminal events**.
44
5- * This work has been awared as one of 4 receipents in American statistical Association [ 2024 Student Paper Competition ] ( https://community.amstat.org/jointscsg-section/awards/student-paper-competition ) (Section on Statistical Computing and Section on Statistical Graphics). It is now publised in Statistics in Medicine, PMID: [ 41640374 ] ( https://pubmed.ncbi.nlm.nih.gov/41640374/ ) DOI: [ 10.1002/sim.70413 ] ( https://onlinelibrary.wiley.com/doi/10.1002/sim.70413 ) .
5+ The method builds random forests using **generalized estimating equations (GEE)** and handles dependent censoring caused by terminal events using the concept of **pseudo-at-risk duration**.
66
7- * It not only gives the methodological soundness, but also offers both ` R API ` and ` C API ` , provides both computing and memory efficiency, enables parallel mechanism through [ OpenMP ] ( https://www.openmp .org/ ) while ensure the [ reproducibility ] ( https://yuw444.github.io/Rforce/articles/get-started.html#in-1-equivalent-command ) .
7+ This work received the **2024 Student Paper Competition Award** from the American Statistical Association (ASA), jointly from the [Section on Statistical Computing and Section on Statistical Graphics](https://community.amstat.org/jointscsg-section/awards/student-paper-competition).
88
9+ The paper is published in *Statistics in Medicine*:
10+
11+ - PMID: [41640374](https://pubmed.ncbi.nlm.nih.gov/41640374/)
12+ - DOI: [10.1002/sim.70413](https://onlinelibrary.wiley.com/doi/10.1002/sim.70413)
13+
14+ The software provides both:
15+
16+ - **R API**
17+ - **C API**
18+
19+ Key features include:
20+
21+ - High computational and memory efficiency
22+ - Parallel computation using [OpenMP](https://www.openmp.org/)
23+ - Reproducible results (see the reproducibility example [here](https://yuw444.github.io/Rforce/articles/get-started.html#in-1-equivalent-command))
924
1025---
11- ## Installation
1226
13- ### Dependecy
27+ # Installation
28+
29+ ## Dependencies
1430
15- * ` cmake>= 3.16.0 ` : compile tool for ` C API `
16- * ` OpenMP ` : enable the parallel mechanism
17- * ` R>= 4.3.3 ` : enable interaction with ` R `
31+ - `cmake >= 3.16.0` – build system for the C API
32+ - `OpenMP` – parallel computing
33+ - `R >= 4.3.3` – R interface
1834
19- ### R API
35+ ---
36+
37+ ## Install R API
2038
2139``` r
2240# install.packages("devtools")
@@ -25,7 +43,7 @@ devtools::install_github("yuw444/Rforce")
2543
2644### C API
2745
28- ```
46+ ``` bash
2947git clone https://github.com/yuw444/Rforce.git
3048cd Rforce
3149mkdir build
@@ -34,16 +52,18 @@ cmake ..
3452make
3553```
3654
37- A ` CMakeLists.txt ` file is provided in the repository
55+ A `CMakeLists.txt` file is provided in the repository.
3856
3957---
58+
4059## Usage
4160
4261### R Examples
4362
44- * Examples: [ Get Started] ( https://yuw444.github.io/Rforce/articles/get-started.html ) .
63+ * Examples: [Get Started](https://yuw444.github.io/Rforce/articles/get-started.html).
4564
4665---
66+
4767### Shell Scripts
4868
4969``` bash
@@ -70,39 +90,39 @@ Rforce train <options>
7090
7191** Options:**
7292
73- | Option | Description | Required/Optional | Default |
74- | : -----------------------------| : ------------| : ------------------| : --------|
75- | ` -d, --designMatrixY=<str> ` | Path to design matrix | ** Required** | |
76- | ` -a, --auxiliary=<str> ` | Path to auxiliary features | ** Required** | |
77- | ` -u, --unitsOfCPIU=<str> ` | Path to unitsOfCPIU file | ** Required** | |
78- | ` -o, --out=<str> ` | Path to output directory | Optional | Current working directory |
79- | ` -v, --verbose=<int> ` | Verbosity level (0–3) | Optional | 0 |
80- | ` -m, --maxDepth=<int> ` | Maximum tree depth | Optional | 10 |
81- | ` -n, --minNodeSize=<int> ` | Minimum node size | Optional | 2 × len(unitsOfCPIU) - 1 |
82- | ` -g, --gain=<float> ` | Minimum gain for split | Optional | 0.0 (likelihood-based) or 1.3 (GEE-based) |
83- | ` -t, --mtry=<int> ` | Number of variables to try during splitting | Optional | √(number of variables) |
84- | ` -s, --nsplits=<int> ` | Number of splits to try per variable | Optional | 10 |
85- | ` -r, --nTrees=<int> ` | Number of trees | Optional | 200 |
86- | ` -e, --seed=<int> ` | Random seed | Optional | 926 |
87- | ` -p, --nPerms=<int> ` | Number of permutations for variable importance | Optional | 10 |
88- | ` -u, --nVars=<int> ` | Number of variables in the design matrix | Optional | Number of columns |
89- | ` -i, --pathVarIds=<str> ` | Variable IDs (categorical variables supported via repeated IDs) | Optional | |
90- | ` -x, --iDot ` | Output tree DOT files | Optional | False |
91- | ` -k, --k=<int> ` | Bayesian estimator parameter for leaf output | Optional | 4 |
92- | ` -L, --long ` | Use multiple rows per patient (RF-SLAM style) | Optional | |
93- | ` -N, --nopseudo ` | Do not estimate pseudo risk time | Optional | |
94- | ` -P, --pseudorisk1 ` | Use original pseudo-risk time (population level) | Optional | |
95- | ` -B, --pseudorisk2 ` | Recalculate pseudo-risk time at each tree (default) | Optional | |
96- | ` -D, --dynamicrisk ` | Dynamically estimate pseudo-risk time at each split | Optional | |
97- | ` -F, --nophi ` | Fix φ = 1, do not estimate φ | Optional | |
98- | ` -P, --phi1 ` | Estimate φ at population level | Optional | |
99- | ` -H, --phi2 ` | Estimate φ at tree level (default) | Optional | |
100- | ` -Y, --dynamicphi ` | Dynamically estimate φ at each split | Optional | |
101- | ` -G, --gee ` | Use GEE approach | Optional | |
102- | ` -A, --padjust=<str> ` | p-value adjustment method (` bonferroni ` , ` holm ` , ` hochberg ` , ` hommel ` , ` BH ` , ` BY ` , ` none ` ) | Optional | ` BH ` |
103- | ` -I, --interaction ` | Add interaction terms for GEE | Optional | NULL |
104- | ` -S, --asym ` | Use asymptotic approach | Optional | |
105- | ` -T, --threads=<int> ` | Number of parallel computing threads | Optional | 8 |
93+ | Option | Description | Required/Optional | Default |
94+ | ------ | ----------- | ----------------- | ------- |
95+ | `-d, --designMatrixY=<str>` | Path to design matrix | **Required** | |
96+ | `-a, --auxiliary=<str>` | Path to auxiliary features | **Required** | |
97+ | `-u, --unitsOfCPIU=<str>` | Path to unitsOfCPIU file | **Required** | |
98+ | `-o, --out=<str>` | Path to output directory | Optional | Current working directory |
99+ | `-v, --verbose=<int>` | Verbosity level (0–3) | Optional | 0 |
100+ | `-m, --maxDepth=<int>` | Maximum tree depth | Optional | 10 |
101+ | `-n, --minNodeSize=<int>` | Minimum node size | Optional | 2 × len(unitsOfCPIU) - 1 |
102+ | `-g, --gain=<float>` | Minimum gain for split | Optional | 0.0 (likelihood-based) or 1.3 (GEE-based) |
103+ | `-t, --mtry=<int>` | Number of variables to try during splitting | Optional | √(number of variables) |
104+ | `-s, --nsplits=<int>` | Number of splits to try per variable | Optional | 10 |
105+ | `-r, --nTrees=<int>` | Number of trees | Optional | 200 |
106+ | `-e, --seed=<int>` | Random seed | Optional | 926 |
107+ | `-p, --nPerms=<int>` | Number of permutations for variable importance | Optional | 10 |
108+ | `-u, --nVars=<int>` | Number of variables in the design matrix | Optional | Number of columns |
109+ | `-i, --pathVarIds=<str>` | Variable IDs (categorical variables supported via repeated IDs) | Optional | |
110+ | `-x, --iDot` | Output tree DOT files | Optional | False |
111+ | `-k, --k=<int>` | Bayesian estimator parameter for leaf output | Optional | 4 |
112+ | `-L, --long` | Use multiple rows per patient (RF-SLAM style) | Optional | |
113+ | `-N, --nopseudo` | Do not estimate pseudo risk time | Optional | |
114+ | `-P, --pseudorisk1` | Use original pseudo-risk time (population level) | Optional | |
115+ | `-B, --pseudorisk2` | Recalculate pseudo-risk time at each tree (default) | Optional | |
116+ | `-D, --dynamicrisk` | Dynamically estimate pseudo-risk time at each split | Optional | |
117+ | `-F, --nophi` | Fix φ = 1, do not estimate φ | Optional | |
118+ | `-P, --phi1` | Estimate φ at population level | Optional | |
119+ | `-H, --phi2` | Estimate φ at tree level (default) | Optional | |
120+ | `-Y, --dynamicphi` | Dynamically estimate φ at each split | Optional | |
121+ | `-G, --gee` | Use GEE approach | Optional | |
122+ | `-A, --padjust=<str>` | p-value adjustment method (`bonferroni`, `holm`, `hochberg`, `hommel`, `BH`, `BY`, `none`) | Optional | `BH` |
123+ | `-I, --interaction` | Add interaction terms for GEE | Optional | NULL |
124+ | `-S, --asym` | Use asymptotic approach | Optional | |
125+ | `-T, --threads=<int>` | Number of parallel computing threads | Optional | 8 |
106126
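As a rough sketch of how the options above combine, the snippet below assembles an `Rforce train` call using only flags listed in the table. The three input file names are hypothetical placeholders, not files shipped with the repository, and the values for `--nTrees`, `--maxDepth`, `--seed`, `--threads`, and `--padjust` simply restate the documented defaults. The command is printed first and is executed only if an `Rforce` binary is found on `PATH`.

```bash
#!/usr/bin/env bash
# Sketch only: the three input file names are hypothetical placeholders.

# Every flag below is taken from the options table; the numeric values
# restate the documented defaults so they are visible in the script.
train_args=(
  train
  --designMatrixY=design_matrix_Y.csv
  --auxiliary=auxiliary_features.csv
  --unitsOfCPIU=units_of_cpiu.csv
  --out=output_folder
  --nTrees=200
  --maxDepth=10
  --seed=926
  --threads=8
  --gee
  --padjust=BH
)

# Print the assembled command for inspection.
printf 'Rforce'
printf ' %s' "${train_args[@]}"
printf '\n'

# Run only when the binary is actually installed.
if command -v Rforce >/dev/null 2>&1; then
  Rforce "${train_args[@]}"
fi
```

Keeping the arguments in a bash array avoids quoting problems if a path contains spaces, and makes it easy to add or drop a flag in one place.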
107127---
108128
@@ -116,11 +136,11 @@ Rforce predict <options>
116136
117137** Options:**
118138
119- | Option | Description | Required/Optional | Default |
120- | : -------------------| : ------------| : ------------------| : --------|
121- | ` -m, --model=<str> ` | Path to trained model | ** Required** | |
122- | ` -t, --test=<str> ` | Path to test data | ** Required** | |
123- | ` -o, --out=<str> ` | Path to output directory | Optional | Current working directory |
139+ | Option | Description | Required/Optional | Default |
140+ | ------ | ----------- | ----------------- | ------- |
141+ | `-m, --model=<str>` | Path to trained model | **Required** | |
142+ | `-t, --test=<str>` | Path to test data | **Required** | |
143+ | `-o, --out=<str>` | Path to output directory | Optional | Current working directory |
124144
125145---
126146
@@ -146,8 +166,8 @@ Rforce predict -m output_folder/model.rforce -t test_data.csv -o prediction_resu
146166- Dynamic options (` --dynamicrisk ` , ` --dynamicphi ` ) allow estimates at each split for more flexibility.
147167- Parallel computation is supported via the ` --threads ` option.
148168- GEE-based splitting with p-value adjustment is available.
149- - An R API is currently actively developing which includes
150- - Classical survivial data generation
169+ - An R API is under active development and includes:
170+ - Classical survival data generation
151171 - Composite endpoint data generation
152172 - [ ` Wcompo ` ] ( https://cran.r-project.org/web/packages/Wcompo/index.html ) methodology realization
153173 - An R interface to ** Rforce**
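
The notes above mention that pseudo-risk time can be estimated at different levels, and the options table lists one flag per level. Assuming these flags are meant as alternatives to one another, a small helper can make wrapper scripts easier to read by mapping a level name to the documented flag. The function name `risk_flag` and the level names (`none`, `population`, `tree`, `split`) are invented for this illustration and are not part of Rforce.

```bash
# Hypothetical helper: maps an estimation level to a documented Rforce flag.
risk_flag() {
  case "$1" in
    none)       echo "--nopseudo" ;;     # do not estimate pseudo risk time
    population) echo "--pseudorisk1" ;;  # population-level estimate
    tree)       echo "--pseudorisk2" ;;  # recalculated at each tree (default)
    split)      echo "--dynamicrisk" ;;  # re-estimated at each split
    *)          echo "unknown level: $1" >&2; return 1 ;;
  esac
}

# Show the flag that would be passed for split-level estimation.
echo "Rforce train $(risk_flag split) <other options>"
```

The same pattern would apply to the φ options (`--nophi`, `--phi1`, `--phi2`, `--dynamicphi`).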