From cc5464e7b611e04df056083c709b88491c841273 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jacob=20Kr=C3=BCger?=
<32985809+jacobkrueger@users.noreply.github.com>
Date: Mon, 3 Mar 2025 11:35:37 +0100
Subject: [PATCH] Update Experiments.md to cover incentives as essential
attributes
Incentives impact behavior, so it is important to report their use or non-use, independently of their concrete form. Added two references, a guideline and an example study, that showcase and explain this in detail. Updated the footnotes accordingly.
---
docs/standards/Experiments.md | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/docs/standards/Experiments.md b/docs/standards/Experiments.md
index d3ccbad..a862dbc 100644
--- a/docs/standards/Experiments.md
+++ b/docs/standards/Experiments.md
@@ -42,7 +42,8 @@ Science Standard** or the **Engineering Research Standard**.
OR: provides compelling justification for not using random assignment and explains how the unequal-groups threat to validity is mitigated (e.g. using a pre-test/post-test and matched-subjects design)
- [ ] describes experimental objects (e.g. real or toy system) and their characteristics (e.g. size, type)
- [ ] justifies selection of experimental objects; acknowledges object-treatment confounds, if any
-- [ ] design and protocol appropriate (not optimal) for stated research questions and hypotheses
+- [ ] design and protocol appropriate (not optimal) for stated research questions and hypotheses
+- [ ] reports whether financial or non-financial incentives were used and, if so, in what form
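Random assignment, one of the attributes above, is straightforward to make reproducible and reportable; a minimal sketch in Python, assuming a hypothetical list of participant IDs:

```python
import random

# Hypothetical participant IDs; replace with the real subject pool.
participants = [f"P{i:02d}" for i in range(1, 21)]

# Fix the seed so the assignment can be reported and reproduced.
rng = random.Random(42)
shuffled = participants[:]
rng.shuffle(shuffled)

# Split into two equally sized groups (simple 2-group design).
half = len(shuffled) // 2
control, treatment = shuffled[:half], shuffled[half:]
print("control:  ", sorted(control))
print("treatment:", sorted(treatment))
```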
@@ -67,12 +68,12 @@ Science Standard** or the **Engineering Research Standard**.
- [ ] discusses alternative experimental designs and why they were not used (e.g. validity trade-offs)
- [ ] includes visualizations of data distributions
- [ ] cites statistics research to support any nuanced issues or unusual approaches
-- [ ] explains deviations between design and execution, and their implications
+- [ ] explains deviations between design and execution, and their implications
- [ ] named experiment design (e.g. simple 2-group, 2x2 factorial, randomized block)
- [ ] analyzes construct validity of dependent variable
- [ ] reports manipulation checks
- [ ] pre-registration of hypotheses and design (where venue allows)
-- [ ] clearly distinguishes evidence-based results from interpretations and speculation
+- [ ] clearly distinguishes evidence-based results from interpretations and speculation
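A named design such as the 2x2 factorial mentioned above can be analyzed with standard tooling; a minimal sketch using pandas and statsmodels, with hypothetical data and column names (`time`, `treatment`, `system`):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data: task completion time under a 2x2 factorial design
# (treatment: TDD vs. TLD; experimental object: system O1 vs. O2).
df = pd.DataFrame({
    "time":      [12.1, 14.3, 11.8, 15.0, 13.2, 16.1, 12.7, 15.5],
    "treatment": ["TDD", "TDD", "TDD", "TDD", "TLD", "TLD", "TLD", "TLD"],
    "system":    ["O1", "O1", "O2", "O2", "O1", "O1", "O2", "O2"],
})

# Two-way ANOVA with interaction; a significant interaction term
# flags a possible object-treatment confound.
model = smf.ols("time ~ C(treatment) * C(system)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```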
### Extraordinary Attributes
@@ -96,7 +97,7 @@ objectivity, reproducibility
- using bad proxies for dependent variables (e.g. task completion time
as a proxy for task complexity)
-- quasi-experiments without a good reason
+- quasi-experiments without a good reason
- treatments or response variables are poorly described
- inappropriate design for the conditions under which the experiment
took place
@@ -121,6 +122,8 @@ objectivity, reproducibility
## Exemplars
+Dmitri Bershadskyy, Jacob Krüger, Gül Çalıklı, Siegmar Otto, Sarah Zabel, Jannik Greif, and Robert Heyer. 2025. A Laboratory Experiment on Using Different Financial-Incentivization Schemes in Software-Engineering Experimentation. _PeerJ Computer Science_.
+
Eduard P. Enoiu, Adnan Čaušević, Daniel Sundmark, and Paul Pettersson. 2016. A controlled experiment in testing of safety-critical embedded software. In *2016 IEEE International Conference on Software Testing, Verification and Validation (ICST),* 11-15 April, Chicago, IL, USA. IEEE. 1-11.
Evrim Itir Karac, Burak Turhan, and Natalia Juristo. 2019. A Controlled
@@ -169,6 +172,8 @@ Vigdis By Kampenes, Tore Dybå, Jo E. Hannay, and Dag IK Sjøberg. 2009. A syste
Barbara Kitchenham, Lech Madeyski, David Budgen, Jacky Keung, Pearl Brereton, Stuart Charters, Shirley Gibbs, and Amnart Pohthong. 2017. Robust statistical methods for empirical software engineering. _Empirical Software Engineering_. 22, 2 (2017), 579-630.
+Jacob Krüger, Gül Çalıklı, Dmitri Bershadskyy, Siegmar Otto, Sarah Zabel, and Robert Heyer. 2024. Guidelines for Using Financial Incentives in Software-Engineering Experimentation. _Empirical Software Engineering_. 29, 135 (2024), 1-53. DOI: 10.1007/s10664-024-10517-w
+
Martín Solari, Sira Vegas, and Natalia Juristo. 2018. Content and structure of laboratory packages for software engineering experiments. _Information and Software Technology_. 97, 64-79.
Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. 2012. Experimentation in Software Engineering. Springer Science & Business Media.
@@ -180,7 +185,8 @@ Andreas Zeller, Thomas Zimmermann, and Christian Bird. 2011. Failure is a four-l
---
For example, in an experiment where the control group applies Test-Driven Development (TDD) with Object 1 while the treatment group applies Test-Last Development (TLD) with Object 2, the experimental object is confounded with the treatment.
-e.g. dropouts affecting balance between treatment and control group.
-Simply separating results and discussion into different sections is typically sufficient. No speculation in the results section.
-Quasi-experiments are appropriate for pilot studies or when assignment is beyond the researcher’s control (e.g. assigning students to two different sections of a course). Simply claiming that a study is “exploratory” is not sufficient justification.
+For guidelines and examples on deciding whether to employ financial incentives, their benefits (over non-financial incentives), how to design them, and how to report them, see Krüger et al. 2024 and Bershadskyy et al. 2025.
+e.g. dropouts affecting balance between treatment and control group.
+Simply separating results and discussion into different sections is typically sufficient. No speculation in the results section.
+Quasi-experiments are appropriate for pilot studies or when assignment is beyond the researcher’s control (e.g. assigning students to two different sections of a course). Simply claiming that a study is “exploratory” is not sufficient justification.