From cc5464e7b611e04df056083c709b88491c841273 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jacob=20Kr=C3=BCger?=
<32985809+jacobkrueger@users.noreply.github.com>
Date: Mon, 3 Mar 2025 11:35:37 +0100
Subject: [PATCH] Update Experiments.md to cover incentives as essential
attributes
Incentives impact behavior, so it is important to report their use or non-use, independently of their concrete form. Added two references, a guideline and an example study, that showcase and explain this in detail. Updated the footnotes accordingly.
---
docs/standards/Experiments.md | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/docs/standards/Experiments.md b/docs/standards/Experiments.md
index d3ccbad..a862dbc 100644
--- a/docs/standards/Experiments.md
+++ b/docs/standards/Experiments.md
@@ -42,7 +42,8 @@ Science Standard** or the **Engineering Research Standard**.
OR: provides compelling justification for not using random assignment and explains how the unequal-groups threat to validity is mitigated (e.g. using a pre-test/post-test and matched-subjects design)
- [ ] describes experimental objects (e.g. real or toy system) and their characteristics (e.g. size, type)
- [ ] justifies selection of experimental objects; acknowledges object-treatment confounds, if any
-- [ ] design and protocol appropriate (not optimal) for stated research questions and hypotheses
+- [ ] design and protocol appropriate (not optimal) for stated research questions and hypotheses
+- [ ] reports whether financial or non-financial incentives were used and, if so, in what form
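Random assignment, one of the attributes above, is straightforward to make reproducible and reportable; a minimal sketch in Python, assuming a hypothetical list of participant IDs:

```python
import random

# Hypothetical participant IDs; replace with the real subject pool.
participants = [f"P{i:02d}" for i in range(1, 21)]

# Fix the seed so the assignment can be reported and reproduced.
rng = random.Random(42)
shuffled = participants[:]
rng.shuffle(shuffled)

# Split into two equally sized groups (simple 2-group design).
half = len(shuffled) // 2
control, treatment = shuffled[:half], shuffled[half:]
print("control:  ", sorted(control))
print("treatment:", sorted(treatment))
```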
@@ -67,12 +68,12 @@ Science Standard** or the **Engineering Research Standard**.
- [ ] discusses alternative experimental designs and why they were not used (e.g. validity trade-offs)
- [ ] includes visualizations of data distributions
- [ ] cites statistics research to support any nuanced issues or unusual approaches
-- [ ] explains deviations between design and execution, and their implications
+- [ ] explains deviations between design and execution, and their implications
- [ ] named experiment design (e.g. simple 2-group, 2x2 factorial, randomized block)
- [ ] analyzes construct validity of dependent variable
- [ ] reports manipulation checks
- [ ] pre-registration of hypotheses and design (where venue allows)
-- [ ] clearly distinguishes evidence-based results from interpretations and speculation
+- [ ] clearly distinguishes evidence-based results from interpretations and speculation
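A named design such as the 2x2 factorial mentioned above can be analyzed with standard tooling; a minimal sketch using pandas and statsmodels, with hypothetical data and column names (`time`, `treatment`, `system`):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical data: task completion time under a 2x2 factorial design
# (treatment: TDD vs. TLD; experimental object: system O1 vs. O2).
df = pd.DataFrame({
    "time":      [12.1, 14.3, 11.8, 15.0, 13.2, 16.1, 12.7, 15.5],
    "treatment": ["TDD", "TDD", "TDD", "TDD", "TLD", "TLD", "TLD", "TLD"],
    "system":    ["O1", "O1", "O2", "O2", "O1", "O1", "O2", "O2"],
})

# Two-way ANOVA with interaction; a significant interaction term
# flags a possible object-treatment confound.
model = smf.ols("time ~ C(treatment) * C(system)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```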
### Extraordinary Attributes
@@ -96,7 +97,7 @@ objectivity, reproducibility
- using bad proxies for dependent variables (e.g. task completion time
as a proxy for task complexity)
-- quasi-experiments without a good reason
+- quasi-experiments without a good reason
- treatments or response variables are poorly described
- inappropriate design for the conditions under which the experiment
took place
@@ -121,6 +122,8 @@ objectivity, reproducibility
## Exemplars
+Dmitri Bershadskyy, Jacob Krüger, Gül Çalıklı, Siegmar Otto, Sarah Zabel, Jannik Greif, and Robert Heyer. 2025. A Laboratory Experiment on Using Different Financial-Incentivization Schemes in Software-Engineering Experimentation. _PeerJ Computer Science_.
+
Eduard P. Enoiu, Adnan Čaušević, Daniel Sundmark, and Paul Pettersson. 2016. A controlled experiment in testing of safety-critical embedded software. In *2016 IEEE International Conference on Software Testing, Verification and Validation (ICST),* 11-15 April, Chicago, IL, USA. IEEE. 1-11.
Evrim Itir Karac, Burak Turhan, and Natalia Juristo. 2019. A Controlled
@@ -169,6 +172,8 @@ Vigdis By Kampenes, Tore Dybå, Jo E. Hannay, and Dag IK Sjøberg. 2009. A syste
Barbara Kitchenham, Lech Madeyski, David Budgen, Jacky Keung, Pearl Brereton, Stuart Charters, Shirley Gibbs, and Amnart Pohthong. 2017. Robust statistical methods for empirical software engineering. _Empirical Software Engineering_. 22, 2 (2017), 579-630.
+Jacob Krüger, Gül Çalıklı, Dmitri Bershadskyy, Siegmar Otto, Sarah Zabel, and Robert Heyer. 2024. Guidelines for Using Financial Incentives in Software-Engineering Experimentation. _Empirical Software Engineering_. 29, 135 (2024), 1-53. DOI: 10.1007/s10664-024-10517-w
+
Martín Solari, Sira Vegas, and Natalia Juristo. 2018. Content and structure of laboratory packages for software engineering experiments. _Information and Software Technology_. 97, 64-79.
Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. 2012. Experimentation in Software Engineering. Springer Science & Business Media.
@@ -180,7 +185,8 @@ Andreas Zeller, Thomas Zimmermann, and Christian Bird. 2011. Failure is a four-l
---
For example, in an experiment where the control group applies Test-Driven Development (TDD) with Object 1 while the treatment group applies Test-Last Development (TLD) with Object 2, the experimental object is confounded with the treatment.
-e.g. dropouts affecting balance between treatment and control group.
-Simply separating results and discussion into different sections is typically sufficient. No speculation in the results section.
-Quasi-experiments are appropriate for pilot studies or when assignment is beyond the researcher’s control (e.g. assigning students to two different sections of a course). Simply claiming that a study is “exploratory” is not sufficient justification.
+For guidelines and examples on deciding whether to employ financial incentives, their benefits (over non-financial incentives), how to design them, and how to report them, see Krüger et al. 2024 and Bershadskyy et al. 2025.
+e.g. dropouts affecting balance between treatment and control group.
+Simply separating results and discussion into different sections is typically sufficient. No speculation in the results section.
+Quasi-experiments are appropriate for pilot studies or when assignment is beyond the researcher’s control (e.g. assigning students to two different sections of a course). Simply claiming that a study is “exploratory” is not sufficient justification.