Commit 3bd482f

typo
1 parent 0a1b1e4 commit 3bd482f

6 files changed

+299
-224
lines changed

README.md

Lines changed: 72 additions & 52 deletions
@@ -1,22 +1,40 @@
 # Rforce

-* We here to introduce [Rforce:Random Forests for Composite Endpoints](https://onlinelibrary.wiley.com/doi/10.1002/sim.70413), consisting of non-fatal composite events and terminal events. It utilizes generalized estimating equations(GEE) to build trees and handles the dependent censoring due to the terminal events with the concept of pseudo-at-risk duration.
+**Rforce** implements the methodology described in [Rforce: Random Forests for Composite Endpoints](https://onlinelibrary.wiley.com/doi/10.1002/sim.70413), which models composite endpoints consisting of **non-fatal events and terminal events**.

-* This work has been awared as one of 4 receipents in American statistical Association [2024 Student Paper Competition](https://community.amstat.org/jointscsg-section/awards/student-paper-competition) (Section on Statistical Computing and Section on Statistical Graphics). It is now publised in Statistics in Medicine, PMID: [41640374](https://pubmed.ncbi.nlm.nih.gov/41640374/) DOI: [10.1002/sim.70413](https://onlinelibrary.wiley.com/doi/10.1002/sim.70413).
+The method builds random forests using **generalized estimating equations (GEE)** and handles dependent censoring caused by terminal events using the concept of **pseudo-at-risk duration**.

-* It not only gives the methodological soundness, but also offers both `R API` and `C API`, provides both computing and memory efficiency, enables parallel mechanism through [OpenMP](https://www.openmp.org/) while ensure the [reproducibility](https://yuw444.github.io/Rforce/articles/get-started.html#in-1-equivalent-command).
+This work received the **2024 Student Paper Competition Award** from the American Statistical Association (ASA), jointly from the [Section on Statistical Computing and Section on Statistical Graphics](https://community.amstat.org/jointscsg-section/awards/student-paper-competition).

+The paper is published in *Statistics in Medicine*:
+
+- PMID: [41640374](https://pubmed.ncbi.nlm.nih.gov/41640374/)
+- DOI: [10.1002/sim.70413](https://onlinelibrary.wiley.com/doi/10.1002/sim.70413)
+
+The software provides both:
+
+- **R API**
+- **C API**
+
+Key features include:
+
+- High computational and memory efficiency
+- Parallel computation using [OpenMP](https://www.openmp.org/)
+- Reproducible results (see the reproducibility example [here](https://yuw444.github.io/Rforce/articles/get-started.html#in-1-equivalent-command))

 ---
-## Installation

-### Dependecy
+# Installation
+
+## Dependencies

-* `cmake>=3.16.0`: compile tool for `C API`
-* `OpenMP`: enable the parallel mechanism
-* `R>=4.3.3`: enable interaction with `R`
+- `cmake >= 3.16.0` – build system for the C API
+- `OpenMP` – parallel computing
+- `R >= 4.3.3` – R interface

-### R API
+---
+
+## Install R API

 ```r
 # install.packages("devtools")
@@ -25,7 +43,7 @@ devtools::install_github("yuw444/Rforce")

 ### C API

-```
+```bash
 git clone https://github.com/yuw444/Rforce.git
 cd Rforce
 mkdir build
@@ -34,16 +52,18 @@ cmake ..
 make
 ```

-A `CMakeLists.txt` file is provided in the repository
+A `CMakeLists.txt` file is provided in the repository.

 ---
+
 ## Usage

 ### R Examples

-* Examples: [Get Started](https://yuw444.github.io/Rforce/articles/get-started.html).
+* Examples: [Get Started](https://yuw444.github.io/Rforce/articles/get-started.html).

 ---
+
 ### Shell Scripts

 ```bash
@@ -70,39 +90,39 @@ Rforce train <options>

 **Options:**

-| Option | Description | Required/Optional | Default |
-|:-----------------------------|:------------|:------------------|:--------|
-| `-d, --designMatrixY=<str>` | Path to design matrix | **Required** | |
-| `-a, --auxiliary=<str>` | Path to auxiliary features | **Required** | |
-| `-u, --unitsOfCPIU=<str>` | Path to unitsOfCPIU file | **Required** | |
-| `-o, --out=<str>` | Path to output directory | Optional | Current working directory |
-| `-v, --verbose=<int>` | Verbosity level (0–3) | Optional | 0 |
-| `-m, --maxDepth=<int>` | Maximum tree depth | Optional | 10 |
-| `-n, --minNodeSize=<int>` | Minimum node size | Optional | 2 × len(unitsOfCPIU) - 1 |
-| `-g, --gain=<float>` | Minimum gain for split | Optional | 0.0 (likelihood-based) or 1.3 (GEE-based) |
-| `-t, --mtry=<int>` | Number of variables to try during splitting | Optional | √(number of variables) |
-| `-s, --nsplits=<int>` | Number of splits to try per variable | Optional | 10 |
-| `-r, --nTrees=<int>` | Number of trees | Optional | 200 |
-| `-e, --seed=<int>` | Random seed | Optional | 926 |
-| `-p, --nPerms=<int>` | Number of permutations for variable importance | Optional | 10 |
-| `-u, --nVars=<int>` | Number of variables in the design matrix | Optional | Number of columns |
-| `-i, --pathVarIds=<str>` | Variable IDs (categorical variables supported via repeated IDs) | Optional | |
-| `-x, --iDot` | Output tree DOT files | Optional | False |
-| `-k, --k=<int>` | Bayesian estimator parameter for leaf output | Optional | 4 |
-| `-L, --long` | Use multiple rows per patient (RF-SLAM style) | Optional | |
-| `-N, --nopseudo` | Do not estimate pseudo risk time | Optional | |
-| `-P, --pseudorisk1` | Use original pseudo-risk time (population level) | Optional | |
-| `-B, --pseudorisk2` | Recalculate pseudo-risk time at each tree (default) | Optional | |
-| `-D, --dynamicrisk` | Dynamically estimate pseudo-risk time at each split | Optional | |
-| `-F, --nophi` | Fix φ = 1, do not estimate φ | Optional | |
-| `-P, --phi1` | Estimate φ at population level | Optional | |
-| `-H, --phi2` | Estimate φ at tree level (default) | Optional | |
-| `-Y, --dynamicphi` | Dynamically estimate φ at each split | Optional | |
-| `-G, --gee` | Use GEE approach | Optional | |
-| `-A, --padjust=<str>` | p-value adjustment method (`bonferroni`, `holm`, `hochberg`, `hommel`, `BH`, `BY`, `none`) | Optional | `BH` |
-| `-I, --interaction` | Add interaction terms for GEE | Optional | NULL |
-| `-S, --asym` | Use asymptotic approach | Optional | |
-| `-T, --threads=<int>` | Number of parallel computing threads | Optional | 8 |
+| Option | Description | Required/Optional | Default |
+|-------------------------------|------------------------------------------|-------------------|----------------------------------|
+| `-d, --designMatrixY=<str>` | Path to design matrix | **Required** | |
+| `-a, --auxiliary=<str>` | Path to auxiliary features | **Required** | |
+| `-u, --unitsOfCPIU=<str>` | Path to unitsOfCPIU file | **Required** | |
+| `-o, --out=<str>` | Path to output directory | Optional | Current working directory |
+| `-v, --verbose=<int>` | Verbosity level (0–3) | Optional | 0 |
+| `-m, --maxDepth=<int>` | Maximum tree depth | Optional | 10 |
+| `-n, --minNodeSize=<int>` | Minimum node size | Optional | 2 × len(unitsOfCPIU) - 1 |
+| `-g, --gain=<float>` | Minimum gain for split | Optional | 0.0 (likelihood-based) or 1.3 (GEE-based) |
+| `-t, --mtry=<int>` | Number of variables to try during splitting | Optional | √(number of variables) |
+| `-s, --nsplits=<int>` | Number of splits to try per variable | Optional | 10 |
+| `-r, --nTrees=<int>` | Number of trees | Optional | 200 |
+| `-e, --seed=<int>` | Random seed | Optional | 926 |
+| `-p, --nPerms=<int>` | Number of permutations for variable importance | Optional | 10 |
+| `-u, --nVars=<int>` | Number of variables in the design matrix | Optional | Number of columns |
+| `-i, --pathVarIds=<str>` | Variable IDs (categorical variables supported via repeated IDs) | Optional | |
+| `-x, --iDot` | Output tree DOT files | Optional | False |
+| `-k, --k=<int>` | Bayesian estimator parameter for leaf output | Optional | 4 |
+| `-L, --long` | Use multiple rows per patient (RF-SLAM style) | Optional | |
+| `-N, --nopseudo` | Do not estimate pseudo risk time | Optional | |
+| `-P, --pseudorisk1` | Use original pseudo-risk time (population level) | Optional | |
+| `-B, --pseudorisk2` | Recalculate pseudo-risk time at each tree (default) | Optional | |
+| `-D, --dynamicrisk` | Dynamically estimate pseudo-risk time at each split | Optional | |
+| `-F, --nophi` | Fix φ = 1, do not estimate φ | Optional | |
+| `-P, --phi1` | Estimate φ at population level | Optional | |
+| `-H, --phi2` | Estimate φ at tree level (default) | Optional | |
+| `-Y, --dynamicphi` | Dynamically estimate φ at each split | Optional | |
+| `-G, --gee` | Use GEE approach | Optional | |
+| `-A, --padjust=<str>` | p-value adjustment method (`bonferroni`, `holm`, `hochberg`, `hommel`, `BH`, `BY`, `none`) | Optional | `BH` |
+| `-I, --interaction` | Add interaction terms for GEE | Optional | NULL |
+| `-S, --asym` | Use asymptotic approach | Optional | |
+| `-T, --threads=<int>` | Number of parallel computing threads | Optional | 8 |

 ---

@@ -116,11 +136,11 @@ Rforce predict <options>

 **Options:**

-| Option | Description | Required/Optional | Default |
-|:-------------------|:------------|:------------------|:--------|
-| `-m, --model=<str>` | Path to trained model | **Required** | |
-| `-t, --test=<str>` | Path to test data | **Required** | |
-| `-o, --out=<str>` | Path to output directory | Optional | Current working directory |
+| Option | Description | Required/Optional | Default |
+|-----------------------|---------------------------|-------------------|---------------------------|
+| `-m, --model=<str>` | Path to trained model | **Required** | |
+| `-t, --test=<str>` | Path to test data | **Required** | |
+| `-o, --out=<str>` | Path to output directory | Optional | Current working directory |

 ---

@@ -146,8 +166,8 @@ Rforce predict -m output_folder/model.rforce -t test_data.csv -o prediction_resu
 - Dynamic options (`--dynamicrisk`, `--dynamicphi`) allow estimates at each split for more flexibility.
 - Parallel computation is supported via the `--threads` option.
 - GEE-based splitting with p-value adjustment is available.
-- An R API is currently actively developing which includes
-- Classical survivial data generation
+- An R API is currently actively developing which includes:
+- Classical survival data generation
 - Composite endpoint data generation
 - [`Wcompo`](https://cran.r-project.org/web/packages/Wcompo/index.html) methodology realization
 - An R interface to **Rforce**
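
As a rough illustration of how the `train` and `predict` options documented in this README fit together, the sketch below only assembles and prints the two command lines; it does not run the binary. It uses the long option names from the tables, which avoids the short flags `-u` and `-P` that each appear twice there. The input file names and the `prediction_output` directory are hypothetical placeholders, the `model.rforce` path is taken from the README's own predict example, and the numeric values simply repeat the documented defaults. Whether the CLI accepts these options exactly in this `--name=value` form is an assumption based on the tables.

```bash
#!/usr/bin/env bash
# Dry-run sketch: build (but do not execute) Rforce command lines
# from the long options listed in the README tables.
set -euo pipefail

# Assemble a training command. Arguments: design matrix, auxiliary
# features, unitsOfCPIU file, output directory (all hypothetical paths).
build_train_cmd() {
  local design="$1" aux="$2" units="$3" out="$4"
  printf 'Rforce train --designMatrixY=%s --auxiliary=%s --unitsOfCPIU=%s --out=%s --nTrees=200 --seed=926 --threads=8\n' \
    "$design" "$aux" "$units" "$out"
}

# Assemble a prediction command. Arguments: trained model, test data,
# output directory.
build_predict_cmd() {
  local model="$1" test_data="$2" out="$3"
  printf 'Rforce predict --model=%s --test=%s --out=%s\n' \
    "$model" "$test_data" "$out"
}

# Print the commands so they can be inspected before running them by hand.
build_train_cmd design_matrix.csv auxiliary.csv units_of_cpiu.csv output_folder
build_predict_cmd output_folder/model.rforce test_data.csv prediction_output
```

Printing the commands first makes it easy to check paths and flags before running them against a built `Rforce` binary.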

docs/authors.html

Lines changed: 2 additions & 2 deletions
