Commit 211dc1d

Merge pull request #143 from danieeeld2/performance-comparison

2 parents 1b2a976 + bbeb302

17 files changed: 646 additions, 1 deletion

README.md (73 additions, 1 deletion)
```diff
@@ -8,10 +8,14 @@
 - [3. 📁 Repository Structure](#3--repository-structure)
 - [3.1. Tests Directory Structure](#31-tests-directory-structure)
 - [4. 👷🏻 GitHub Workflows](#4--github-workflows)
+- [5. 🐳 Docker Image](#5--docker-image)
 - [6. 📝 Functions Description](#6--functions-description)
 - [7. Example Usage](#7-example-usage)
 - [7.1 MATLAB/Octave](#71-matlaboctave)
 - [7.2. R](#72-r)
+- [8. 📈 Performance Comparison](#8--performance-comparison)
+- [8.1. `parglmVS` Performance](#81-parglmvs-performance)
+- [8.2. `vasca` Performance](#82-vasca-performance)
 
 ---
```

```diff
@@ -63,6 +67,8 @@ The `matlab/` folder contains the implementation of all functions involved in th
 ├── loadings_runners/
 ├── loadings_test.go
 ├── loadings_test_results/
+├── parglmVS_benchmark/
+├── parglmVS_benchmark.sh
 ├── parglmVS_runners/
 ├── parglmVS_test.go
 ├── pcaEig_runners/
```
````diff
@@ -78,6 +84,8 @@ The `matlab/` folder contains the implementation of all functions involved in th
 ├── scores_runners/
 ├── scores_test.go
 ├── scores_test_results/
+├── vasca_benchmark/
+├── vasca_benchmark.sh
 ├── vasca_runners/
 └── vasca_test.go
 ```
````
```diff
@@ -88,6 +96,8 @@ In general, all functions return structures with numerical data that can be comp
 
 ⚠️⚠️ *`parglmVS` is the function responsible for calculating all the data structures in the permutation test. These permutations are random, and the test requires a large number of them because the random number generators in the two languages differ. As a result, the test takes a considerable amount of time, as discussed in issues [#51](https://github.com/danieeeld2/vASCA-R/issues/51) and [#52](https://github.com/danieeeld2/vASCA-R/issues/52), as well as in PR [#30](https://github.com/danieeeld2/vASCA-R/pull/30). At the end of that PR, you can find a screenshot showing the execution time of this test and confirming that the function passes. To reactivate it, open `parglmVS_test.go` and comment out the two lines that contain `t.Skip`.*
 
+In addition to the tests, we provide the scripts `vasca_benchmark.sh` and `parglmVS_benchmark.sh`, which generate comparative performance plots of the R and MATLAB/Octave implementations of these functions. The results are stored in the `<function>_benchmark/` folder. You can find more information about this benchmark in [Section 8](#8--performance-comparison).
+
 ## 4. 👷🏻 GitHub Workflows
 
 The repository includes two GitHub workflows: `docker.yml` and `test.yml`.
```
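Both benchmark scripts write their timings with the same CSV convention: a `Language,Model,Permutations,Time(s)` header followed by one row per configuration, which the plotting code later reads back with `read.csv`. A minimal sketch of that convention (the `0.5` timing is a placeholder, not a real measurement):

```shell
#!/bin/bash
# Sketch of the results-file convention shared by the *_benchmark.sh scripts.
out="$(mktemp -d)/benchmark_results.csv"
echo "Language,Model,Permutations,Time(s)" > "$out"
for lang in R Octave; do
    for perm in 100 500 1000; do
        # Placeholder time; the real scripts measure it with `time`
        echo "$lang,linear,$perm,0.5" >> "$out"
    done
done
# One header line plus 2 languages x 3 permutation counts = 7 lines
wc -l < "$out"
```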
````diff
@@ -141,7 +151,7 @@ With this, you have a ready-to-use environment with all the code, languages, and
 
 ## 7. Example Usage
 
-We will provide an example of an execution pipeline in each language, using the Docker image `danieeeld2/r-vasca-testing:latest` (or `danieeeld2/r-vasca-testing:r-dependencies-installed` if you want to have already installed R dependencies) and running everything from the root directory of the project. Start by running the image with a volume that includes the project, as instructed in [Section 5](#5-🐳-docker-image).
+We will provide an example of an execution pipeline in each language, using the Docker image `danieeeld2/r-vasca-testing:latest` (or `danieeeld2/r-vasca-testing:r-dependencies-installed` if you want the R dependencies preinstalled) and running everything from the root directory of the project. Start by running the image with a volume that includes the project, as instructed in [Section 5](#5--docker-image).
 
 ```bash
 docker run -it --rm -v "$(pwd):/app" -w /app danieeeld2/r-vasca-testing:latest /bin/bash
````
````diff
@@ -272,3 +282,65 @@ for (i in seq_len(vascao$nFactors)) {
 }
 }
 ```
+
+## 8. 📈 Performance Comparison
+
+In this repository, we have implemented several functions to carry out the VASCA pipeline in R. However, to evaluate the performance of the R implementation compared to its MATLAB/Octave counterpart, we focus on the two main scripts that form the core of the pipeline: `parglmVS` and `vasca`.
+
+### 8.1. `parglmVS` Performance
+
+The script `tests/parglmVS_benchmark.sh` automates the benchmarking of the `parglmVS` function in both R and MATLAB/Octave. It runs the function across multiple models (`linear`, `interaction`, and `full`) and a range of permutation counts, measuring the execution time of each configuration. The results are saved to a CSV file and visualized in several comparative plots, generated automatically with R and stored in the `parglmVS_benchmark/` folder. *The datasets used are `X_DATA="../datasets/tests_datasets/X_test.csv"` and `F_DATA="../datasets/tests_datasets/F_test.csv"`.*
+
+<p align="center">
+<img src="./tests/parglmVS_benchmark/benchmark_comparison_all.png" alt="Comparison plot" width="48%"/>
+<img src="./tests/parglmVS_benchmark/benchmark_comparison_logscale.png" alt="Log-scale comparison plot" width="48%"/>
+</p>
+
+<p align="center"><i>
+Comparison of execution times between R and Octave for the <code>parglmVS</code> function. The left plot shows the raw execution times across models and permutation counts, while the right plot shows the same results on a log-log scale, which makes performance differences at large scales easier to see.
+</i></p>
+
+<p align="center">
+<img src="./tests/parglmVS_benchmark/benchmark_linear.png" alt="Linear model benchmark" width="32%"/>
+<img src="./tests/parglmVS_benchmark/benchmark_interaction.png" alt="Interaction model benchmark" width="32%"/>
+<img src="./tests/parglmVS_benchmark/benchmark_full.png" alt="Full model benchmark" width="32%"/>
+</p>
+
+<p align="center"><i>
+Execution time comparisons for the <code>parglmVS</code> function between R and Octave, separated by model type. Each plot shows how performance varies with the number of permutations for the linear, interaction, and full models, respectively.
+</i></p>
+
+<p align="center">
+<img src="./tests/parglmVS_benchmark/benchmark_models_by_permutations.png" alt="Bar chart by model and permutations" width="75%"/>
+</p>
+
+<p align="center"><i>
+Bar chart summarizing the execution times of the <code>parglmVS</code> function across models and permutation counts, grouped by language (R vs. Octave). This view highlights relative performance differences depending on model complexity and computational load.
+</i></p>
+
+As the benchmark plots show, the R implementation significantly outperforms the Octave version when computing the permutation-test structures for all three model types, and the gap widens as the number of permutations grows.
+
+### 8.2. `vasca` Performance
+
+The script `tests/vasca_benchmark.sh` automates the benchmarking of the `vasca` function in both R and MATLAB/Octave. It evaluates performance across multiple datasets and two significance levels (`0.01` and `0.05`), measuring the execution time of each configuration. The results are compiled into a CSV file and visualized in comparative plots, generated automatically with R and saved in the `vasca_benchmark/` directory. *The datasets used for this benchmark are four `.json` files located under `../datasets/tests_datasets/`, named `parglmVS_1.json` to `parglmVS_4.json`.*
+
+<p align="center">
+<img src="./tests/vasca_benchmark/vasca_language_comparison.png" width="49%">
+<img src="./tests/vasca_benchmark/vasca_heatmap.png" width="49%">
+</p>
+
+<p align="center">
+<em>The first image compares the execution time of the <code>vasca</code> function between R and Octave across datasets and significance levels; each point is an individual execution time, with lines connecting the results for each language. The second image is a heatmap of <code>vasca</code> execution times by language and dataset at each significance level.</em>
+</p>
+
+<p align="center">
+<img src="./tests/vasca_benchmark/vasca_comparison_all.png" width="75%">
+</p>
+
+<p align="center">
+<em>Comparison of the execution time of the <code>vasca</code> function in R and Octave across four datasets, with bars grouped by significance level (0.01 and 0.05).</em>
+</p>
+
+Octave/MATLAB is faster for `vasca` itself, but `vasca` needs far less computation than `parglmVS`, so the difference is small in absolute terms. When the entire pipeline is considered, R offers better overall performance, because `parglmVS`, the most computationally intensive step, is much better optimized in R.
````
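To put a number on the `parglmVS` claim: dividing the Octave times by the R times at the largest permutation count (100,000), taken from the benchmark results CSV added in this commit, gives roughly a 2.6x to 3x speedup for R:

```shell
# R-over-Octave speedup at 100,000 permutations, using the times
# recorded in the benchmark results CSV added in this commit.
awk 'BEGIN {
    printf "linear      %.2f\n",  93.257 / 35.774
    printf "interaction %.2f\n", 122.224 / 41.743
    printf "full        %.2f\n", 123.488 / 41.118
}'
```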

tests/parglmVS_benchmark.sh (167 additions, 0 deletions)
```bash
#!/bin/bash
# Parameter configuration
MODELS=("linear" "interaction" "full")
PERMS=(100 500 1000 2000 5000 10000 20000 50000 100000)
BENCHMARK_DIR="parglmVS_benchmark"
OUTPUT_FILE="$BENCHMARK_DIR/benchmark_results.csv"
R_SCRIPT="./parglmVS_runners/parglmVS_run.R"
OCTAVE_SCRIPT="./parglmVS_runners/parglmVS_run.m"
X_DATA="../datasets/tests_datasets/X_test.csv"
F_DATA="../datasets/tests_datasets/F_test.csv"

# Create benchmark directory if it doesn't exist
mkdir -p "$BENCHMARK_DIR"

# Verify that the required files exist
if [ ! -f "$R_SCRIPT" ]; then
    echo "Error: $R_SCRIPT not found"
    exit 1
fi

if [ ! -f "$OCTAVE_SCRIPT" ]; then
    echo "Error: $OCTAVE_SCRIPT not found"
    exit 1
fi

if [ ! -f "$X_DATA" ]; then
    echo "Error: $X_DATA not found"
    exit 1
fi

if [ ! -f "$F_DATA" ]; then
    echo "Error: $F_DATA not found"
    exit 1
fi

# Create results file with header
echo "Language,Model,Permutations,Time(s)" > "$OUTPUT_FILE"

# Benchmark for R
for model in "${MODELS[@]}"; do
    for perm in "${PERMS[@]}"; do
        echo "Running R with Model=$model, Permutations=$perm..."
        # Take the `real` line of bash's `time` output (e.g. "0m1.234s")
        # and convert it to seconds with sed + bc
        TIME=$( { time Rscript "$R_SCRIPT" "$X_DATA" "$F_DATA" Model "$model" Permutations "$perm" >/dev/null 2>&1; } 2>&1 | grep real | awk '{print $2}' | sed 's/m/*60+/g' | sed 's/s//g' | bc)
        echo "R,$model,$perm,$TIME" >> "$OUTPUT_FILE"
        # Remove any CSV files generated by the R script
        find . -name "parglmVS_*.csv" -type f -delete
    done
done

# Benchmark for Octave (MATLAB compatible)
for model in "${MODELS[@]}"; do
    for perm in "${PERMS[@]}"; do
        echo "Running Octave with Model=$model, Permutations=$perm..."
        TIME=$( { time octave --no-gui -q "$OCTAVE_SCRIPT" "$X_DATA" "$F_DATA" Model "$model" Permutations "$perm" >/dev/null 2>&1; } 2>&1 | grep real | awk '{print $2}' | sed 's/m/*60+/g' | sed 's/s//g' | bc)
        echo "Octave,$model,$perm,$TIME" >> "$OUTPUT_FILE"
        # Remove any CSV files generated by the Octave script
        find . -name "parglmVS_*.csv" -type f -delete
    done
done

echo "Benchmark completed. Results in $OUTPUT_FILE"

# Generate comparative plots with R. The heredoc delimiter is deliberately
# unquoted: $OUTPUT_FILE and $BENCHMARK_DIR expand now, while \$ keeps R's
# own $ operator intact in the generated script.
cat > "$BENCHMARK_DIR/plot_benchmarks.R" << EOF
library(ggplot2)

# Read the data
data <- read.csv("$OUTPUT_FILE")

# Convert time to numeric (the "Time(s)" header is read by R as Time.s.)
data\$Time <- as.numeric(data\$Time.s.)

# Define a white background theme with grid lines
white_theme <- theme_bw() +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    legend.background = element_rect(fill = "white"),
    legend.key = element_rect(fill = "white"),
    panel.grid.major = element_line(color = "grey90"),
    panel.grid.minor = element_line(color = "grey95"),
    axis.line = element_line(color = "black"),
    text = element_text(color = "black"),
    axis.text = element_text(color = "black"),
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "right"
  )

# Create a combined plot for all models
p1 <- ggplot(data, aes(x=Permutations, y=Time, color=Language, shape=Model)) +
  geom_point(size=3) +
  geom_line(aes(linetype=Model)) +
  labs(title="Execution Time Comparison between R and Octave",
       x="Number of permutations",
       y="Execution time (seconds)") +
  scale_color_brewer(palette="Set1") +
  scale_x_log10(breaks = unique(data\$Permutations),
                labels = scales::comma(unique(data\$Permutations))) +
  white_theme

# Save the combined plot
ggsave("$BENCHMARK_DIR/benchmark_comparison_all.png", p1, width=12, height=8, bg="white")

# Create separate plots for each model
for (model_name in unique(data\$Model)) {
  subset_data <- data[data\$Model == model_name,]
  p2 <- ggplot(subset_data, aes(x=Permutations, y=Time, color=Language)) +
    geom_point(size=3) +
    geom_line(linewidth=1) +
    labs(title=paste("Model:", model_name),
         x="Number of permutations",
         y="Execution time (seconds)") +
    scale_color_brewer(palette="Set1") +
    scale_x_log10(breaks = unique(subset_data\$Permutations),
                  labels = scales::comma(unique(subset_data\$Permutations))) +
    white_theme

  ggsave(paste0("$BENCHMARK_DIR/benchmark_", model_name, ".png"), p2, width=10, height=6, bg="white")
}

# Create bar chart to compare models grouped by language
p3 <- ggplot(data, aes(x=Model, y=Time, fill=Language)) +
  geom_bar(stat="identity", position="dodge") +
  facet_wrap(~Permutations, scales="free_y") +
  labs(title="Model comparison by number of permutations",
       x="Model",
       y="Execution time (seconds)") +
  scale_fill_brewer(palette="Set1") +
  theme_bw() +
  theme(
    panel.background = element_rect(fill = "white"),
    plot.background = element_rect(fill = "white"),
    legend.background = element_rect(fill = "white"),
    strip.background = element_rect(fill = "lightgrey"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(face = "bold", size = 14)
  )

ggsave("$BENCHMARK_DIR/benchmark_models_by_permutations.png", p3, width=14, height=10, bg="white")

# Create a log-log plot to better visualize performance across all permutation ranges
p4 <- ggplot(data, aes(x=Permutations, y=Time, color=Language, shape=Model)) +
  geom_point(size=3) +
  geom_line(aes(linetype=Model)) +
  labs(title="Execution Time Comparison (Log-Log Scale)",
       x="Number of permutations (log scale)",
       y="Execution time (seconds, log scale)") +
  scale_color_brewer(palette="Set1") +
  scale_x_log10(breaks = unique(data\$Permutations),
                labels = scales::comma(unique(data\$Permutations))) +
  scale_y_log10() +
  white_theme

ggsave("$BENCHMARK_DIR/benchmark_comparison_logscale.png", p4, width=12, height=8, bg="white")
EOF

# Run R script to generate plots
echo "Generating comparative plots..."
Rscript "$BENCHMARK_DIR/plot_benchmarks.R"

echo "Analysis complete. The following plots have been generated in $BENCHMARK_DIR:"
echo "- benchmark_comparison_all.png (General comparison)"
echo "- benchmark_comparison_logscale.png (Log-scale comparison)"
echo "- benchmark_linear.png (Linear model)"
echo "- benchmark_interaction.png (Interaction model)"
echo "- benchmark_full.png (Full model)"
echo "- benchmark_models_by_permutations.png (Comparison by permutations)"
```
*Six binary image files (the benchmark plots, 129 KB to 291 KB each) are also added in this commit; previews omitted.*
Benchmark results CSV (55 additions, 0 deletions)
```csv
Language,Model,Permutations,Time(s)
R,linear,100,.653
R,linear,500,.846
R,linear,1000,.990
R,linear,2000,1.319
R,linear,5000,2.398
R,linear,10000,4.096
R,linear,20000,7.695
R,linear,50000,18.094
R,linear,100000,35.774
R,interaction,100,.622
R,interaction,500,.826
R,interaction,1000,1.060
R,interaction,2000,1.449
R,interaction,5000,2.737
R,interaction,10000,4.784
R,interaction,20000,9.156
R,interaction,50000,21.228
R,interaction,100000,41.743
R,full,100,.615
R,full,500,.821
R,full,1000,1.025
R,full,2000,1.470
R,full,5000,2.731
R,full,10000,4.787
R,full,20000,8.833
R,full,50000,21.174
R,full,100000,41.118
Octave,linear,100,.335
Octave,linear,500,.682
Octave,linear,1000,1.131
Octave,linear,2000,2.025
Octave,linear,5000,4.687
Octave,linear,10000,9.057
Octave,linear,20000,18.545
Octave,linear,50000,47.469
Octave,linear,100000,93.257
Octave,interaction,100,.360
Octave,interaction,500,.842
Octave,interaction,1000,1.458
Octave,interaction,2000,2.668
Octave,interaction,5000,6.297
Octave,interaction,10000,12.339
Octave,interaction,20000,24.857
Octave,interaction,50000,61.538
Octave,interaction,100000,122.224
Octave,full,100,.357
Octave,full,500,.849
Octave,full,1000,1.453
Octave,full,2000,2.726
Octave,full,5000,6.406
Octave,full,10000,12.743
Octave,full,20000,23.733
Octave,full,50000,60.246
Octave,full,100000,123.488
```
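Rows in this format can be aggregated directly with `awk`. As a quick sanity check, summing the linear-model rows above per language shows the cumulative cost of the full permutation sweep in each implementation:

```shell
# Total linear-model execution time per language, from the CSV rows above.
out=$(awk -F, '$2 == "linear" { sum[$1] += $4 }
               END { printf "R %.3f\nOctave %.3f\n", sum["R"], sum["Octave"] }' <<'EOF'
R,linear,100,.653
R,linear,500,.846
R,linear,1000,.990
R,linear,2000,1.319
R,linear,5000,2.398
R,linear,10000,4.096
R,linear,20000,7.695
R,linear,50000,18.094
R,linear,100000,35.774
Octave,linear,100,.335
Octave,linear,500,.682
Octave,linear,1000,1.131
Octave,linear,2000,2.025
Octave,linear,5000,4.687
Octave,linear,10000,9.057
Octave,linear,20000,18.545
Octave,linear,50000,47.469
Octave,linear,100000,93.257
EOF
)
echo "$out"  # R total ~72 s, Octave total ~177 s
```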
