Update Experiments.md to cover incentives as essential attributes #153

20 changes: 13 additions & 7 deletions docs/standards/Experiments.md
@@ -42,7 +42,8 @@ Science Standard** or the **Engineering Research Standard**.
OR: provides compelling justification for not using random assignment and explains how unequal groups threat to validity is mitigated (e.g. using pre-test/post-test and matched subjects design)
- [ ] describes experimental objects (e.g. real or toy system) and their characteristics (e.g. size, type)
- [ ] justifies selection of experimental objects; acknowledges object-treatment confounds, if any<sup><a class="footnote footnote_ref">1</a></sup>
- [ ] design and protocol appropriate (not optimal) for stated research questions and hypotheses
- [ ] reports whether, and if so in what form, financial or non-financial incentives were used<sup><a class="footnote footnote_ref">2</a></sup>
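
Random assignment (the essential attribute a few items above) is easy to describe and easy to botch in tooling. The following is a minimal sketch of balanced random assignment in plain Python, not a prescribed procedure from this standard; the function and subject names are hypothetical, and the fixed seed stands in for whatever reproducibility mechanism a lab package would actually use:

```python
import random

def randomly_assign(subjects, groups=("control", "treatment"), seed=42):
    """Shuffle subjects, then deal them round-robin into groups,
    so group sizes differ by at most one."""
    rng = random.Random(seed)   # fixed seed -> reproducible assignment
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    return {g: shuffled[i::len(groups)] for i, g in enumerate(groups)}

assignment = randomly_assign([f"S{i:02d}" for i in range(20)])
print({g: len(members) for g, members in assignment.items()})
# {'control': 10, 'treatment': 10}
```

Reporting the seed (or an equivalent audit trail) alongside the protocol is what lets reviewers distinguish true random assignment from ad hoc allocation.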

<results>

@@ -67,12 +68,12 @@ Science Standard** or the **Engineering Research Standard**.
- [ ] discusses alternative experimental designs and why they were not used (e.g. validity trade-offs)
- [ ] includes visualizations of data distributions
- [ ] cites statistics research to support any nuanced issues or unusual approaches
- [ ] explains deviations between design and execution, and their implications<sup><a class="footnote footnote_ref">3</a></sup>
- [ ] named experiment design (e.g. simple 2-group, 2x2 factorial, randomized block)
- [ ] analyzes construct validity of dependent variable
- [ ] reports manipulation checks
- [ ] pre-registration of hypotheses and design (where venue allows)
- [ ] clearly distinguishes evidence-based results from interpretations and speculation<sup><a class="footnote footnote_ref">4</a></sup>
</checklist>
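
The desirable attribute on visualizing data distributions is worth making concrete: summary statistics alone can hide skew and outliers that a distribution plot exposes. A small illustrative sketch using matplotlib (the data here are synthetic and purely hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical task-completion times (minutes) for two groups.
rng = np.random.default_rng(7)
control = rng.normal(30, 5, 40)
treatment = rng.normal(26, 6, 40)

# A violin plot shows the shape of each distribution; overlaying the
# jittered raw points keeps individual observations visible.
fig, ax = plt.subplots(figsize=(5, 3))
ax.violinplot([control, treatment], showmedians=True)
for i, data in enumerate([control, treatment], start=1):
    ax.scatter(np.full_like(data, i) + rng.uniform(-0.08, 0.08, data.size),
               data, s=8, alpha=0.5)
ax.set_xticks([1, 2])
ax.set_xticklabels(["control", "treatment"])
ax.set_ylabel("task completion time (min)")
plt.tight_layout()
plt.savefig("distributions.png")
```

Box plots, histograms, or density plots would serve equally well; the point is to show the shape of the data, not only its central tendency.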

### Extraordinary Attributes
@@ -96,7 +97,7 @@ objectivity, reproducibility

- using bad proxies for dependent variables (e.g. task completion time
as a proxy for task complexity)
- quasi-experiments without a good reason<sup><a class="footnote footnote_ref">5</a></sup>
- treatments or response variables are poorly described
- inappropriate design for the conditions under which the experiment
took place
@@ -121,6 +122,8 @@ objectivity, reproducibility

## Exemplars

Dmitri Bershadskyy, Jacob Krüger, Gül Çalıklı, Siegmar Otto, Sarah Zabel, Jannik Greif, and Robert Heyer. 2025. A Laboratory Experiment on Using Different Financial-Incentivization Schemes in Software-Engineering Experimentation. _PeerJ Computer Science_.

Eduard P. Enoiu, Adnan Cauevic, Daniel Sundmark, and Paul Pettersson. 2016. A controlled experiment in testing of safety-critical embedded software. In *2016 IEEE International Conference on Software Testing, Verification and Validation (ICST),* 11-15 April, Chicago, IL, USA. IEEE. 1-11.

Evrim Itir Karac, Burak Turhan, and Natalia Juristo. 2019. A Controlled
@@ -169,6 +172,8 @@ Vigdis By Kampenes, Tore Dybå, Jo E. Hannay, and Dag IK Sjøberg. 2009. A syste

Barbara Kitchenham, Lech Madeyski, David Budgen, Jacky Keung, Pearl Brereton, Stuart Charters, Shirley Gibbs, and Amnart Pohthong. 2017. Robust statistical methods for empirical software engineering. _Empirical Software Engineering_. 22, 2 (2017), 579-630.

Jacob Krüger, Gül Çalıklı, Dmitri Bershadskyy, Siegmar Otto, Sarah Zabel, and Robert Heyer. 2024. Guidelines for Using Financial Incentives in Software-Engineering Experimentation. _Empirical Software Engineering_. 29, 135 (2024), 1-53. DOI: 10.1007/s10664-024-10517-w

Martín Solari, Sira Vegas, and Natalia Juristo. 2018. Content and structure of laboratory packages for software engineering experiments. _Information and Software Technology_. 97, 64-79.

Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. 2012. Experimentation in Software Engineering. Springer Science & Business Media.
@@ -180,7 +185,8 @@ Andreas Zeller, Thomas Zimmermann, and Christian Bird. 2011. Failure is a four-l

---
<footnote><sup><a class="footnote footnote_text">1</a></sup>For example, in an experiment where the control group applies Test-Driven Development (TDD) with Object 1 while the treatment group applies Test-Last-Development (TDD) with Object 2, the experimental object is confounded with the treatment.</footnote><br>
<footnote><sup><a class="footnote footnote_text">2</a></sup>e.g. dropouts affecting balance between treatment and control group.</footnote><br>
<footnote><sup><a class="footnote footnote_text">3</a></sup>Simply separating results and discussion into different sections is typically sufficient. No speculation in the results section.</footnote><br>
<footnote><sup><a class="footnote footnote_text">4</a></sup>Quasi-experiments are appropriate for pilot studies or when assignment is beyond the researcher’s control (e.g. assigning students to two different sections of a course). Simply claiming that a study is “exploratory” is not sufficient justification.</footnote><br>
<footnote><sup><a class="footnote footnote-text">2</a></sup>For guidelines/examples on deciding whether to employ financial incentives, their benefits (over non-financial incentives), how to design them, and their reporting see Krüger et al. 2024 and Bershadskyy et al. 2025.</footnote><br>
<footnote><sup><a class="footnote footnote_text">3</a></sup>e.g. dropouts affecting balance between treatment and control group.</footnote><br>
<footnote><sup><a class="footnote footnote_text">4</a></sup>Simply separating results and discussion into different sections is typically sufficient. No speculation in the results section.</footnote><br>
<footnote><sup><a class="footnote footnote_text">5</a></sup>Quasi-experiments are appropriate for pilot studies or when assignment is beyond the researcher’s control (e.g. assigning students to two different sections of a course). Simply claiming that a study is “exploratory” is not sufficient justification.</footnote><br>
</standard>