---
title : "ML4 Results Section in rMarkdown"
shorttitle : "ML4 Results"
author:
- name : "Richard A. Klein"
affiliation : "1"
corresponding : yes # Define only one corresponding author
address : ""
email : "[email protected]"
affiliation:
- id : "1"
institution : "Université Grenoble Alpes"
authornote: |
This script generates the participants + results sections for the main ML4 manuscript. To knit this document you must install the papaja package from GitHub.
abstract: |
keywords : "Terror Management Theory, mortality salience, replication, many labs"
wordcount : "X"
bibliography : ["r-references.bib"]
floatsintext : no
figurelist : no
tablelist : no
footnotelist : yes
linenumbers : no
mask : no
draft : no
documentclass : "apa6"
classoption : "man"
output : papaja::apa6_word
---
```{r setup, include = FALSE}
# load packages
library("papaja")
library("metafor")
library("metaSEM")
library("haven")
library("psych")
library("dplyr")
library("effsize")
library("GPArotation")
library("tidyverse")
# source functions locally
source("sources/numbers2words.r")
# Capitalize the first character of a string (used to start sentences with spelled-out numbers)
firstup <- function(x) {
substr(x, 1, 1) <- toupper(substr(x, 1, 1))
x
}
```
```{r analysis-preferences}
# Seed for random number generation
set.seed(1)
knitr::opts_chunk$set(cache.extra = knitr::rand_seed)
```
```{r analysis-loaddata, include = FALSE}
# Reading in all necessary data
# Note: Some analyses require confidential data files (withheld due to participant-identification concerns). You'll have to comment out those lines for this file to knit. Contact Rick [email protected] for information about obtaining the raw data, which usually simply requires the researcher to obtain IRB/ethics approval from their university stating they will maintain the confidentiality of any sensitive data.
# Primary data file with replication data aggregated across labs (the deidentified version is missing just a couple of variables -- age and gender)
# Note: This file is the merged data provided by sites, produced by 001_data_cleaning.R
merged <- readRDS("./data/processed_data/merged.rds")
# Alternatively, you can run it with the public data and get most results. The problem with RMarkdown is that the script
# needs to run completely with no errors or it won't render. So, if you do this, you'll need to comment out any
# sections that require sensitive data.
#merged <- readRDS("./data/public/merged_deidentified.rds")
# Read in data from experimenter survey (Google Form - private due to sensitive info)
exp_surv <- readRDS("./data/raw_site_data/experimenter survey/exp_surv.rds")
```
```{r analysis-participants, include = FALSE}
# The 'merged' df includes all data provided by sites
# I'm going to retain it in case we need to refer to it later, and then
# apply the study-wide exclusion criteria noted below.
merged_original <- merged
# Aggregate participants characteristics
# Converting to numeric
merged$age <- as.numeric(as.character(merged$age))
merged$gender <- as.numeric(as.character(merged$gender))
merged$race <- as.numeric(as.character(merged$race))
# Applying Exclusion Set 1:
# 1. Wrote something for both writing prompts
merged <- subset(merged, (merged$msincomplete == 0 | is.na(merged$msincomplete)))
# 2. Completed all six items evaluating the essay authors
merged <- subset(merged, (!is.na(merged$prous3) & !is.na(merged$prous4) & !is.na(merged$prous5) & !is.na(merged$antius3) & !is.na(merged$antius4) & !is.na(merged$antius5)))
n_woman <- length(which(merged$gender == 1)) #number of women
n_man <- length(which(merged$gender == 2)) #number of men
n_gender_other <- length(which(merged$gender == 3)) #other gender responses (distinct name so the race "Other" counts below don't overwrite it)
n_woman_pct <- length(which(merged$gender == 1))/nrow(merged)*100 #pct women
n_man_pct <- length(which(merged$gender == 2))/nrow(merged)*100 #pct men
n_gender_other_pct <- length(which(merged$gender == 3))/nrow(merged)*100 #pct other gender responses
n_white <- length(which(merged$race == 1)) #num White
n_white_pct <- length(which(merged$race == 1))/nrow(merged)*100 #percent White, using nrow(merged) for the total N
n_black <- length(which(merged$race == 2)) #num Black or African American
n_black_pct <- length(which(merged$race == 2))/nrow(merged)*100 #percent Black
n_aian <- length(which(merged$race == 3)) #num American Indian or Alaska Native
n_aian_pct <- length(which(merged$race == 3))/nrow(merged)*100 #percent American Indian/Alaska Native
n_asian <- length(which(merged$race == 4)) #num Asian
n_asian_pct <- length(which(merged$race == 4))/nrow(merged)*100 #percent Asian
n_haw <- length(which(merged$race == 5)) #num Native Hawaiian or Pacific Islander
n_haw_pct <- length(which(merged$race == 5))/nrow(merged)*100 #percent Native Hawaiian or Pacific Islander
n_other <- length(which(merged$race == 6)) #num Other
n_other_pct <- length(which(merged$race == 6))/nrow(merged)*100 #percent other
```
## Participants
`r length(unique(merged_original$source))` labs participated and provided a total sample of `r formatC(nrow(merged_original), big.mark = ",")` participants. In accordance with the pre-registration (https://osf.io/4xx6w), we immediately excluded from all analyses participants who either failed to complete all six ratings of the essay authors or failed to complete both writing prompts within the mortality salience or control conditions (i.e., the between-subjects manipulation).[^1] Thus, the usable sample included `r formatC(nrow(merged), big.mark = ",")` participants (see Table 1 for a summary of sites). `r formatC(n_woman, big.mark = ",")` participants (`r round(n_woman_pct, 1)`%) reported being female and `r formatC(n_man, big.mark = ",")` participants (`r round(n_man_pct, 1)`%) reported being male; the remaining participants did not respond to the item, were asked about gender in a non-standard way, or chose a different response. The mean age was `r round(mean(merged$age, na.rm = TRUE), 1)` years (*SD* = `r round(sd(merged$age, na.rm = TRUE), 1)`). Participant-reported race was `r n_white` (`r round(n_white_pct, 1)`%) White, `r n_asian` (`r round(n_asian_pct, 1)`%) Asian, `r n_black` (`r round(n_black_pct, 1)`%) Black or African American, `r n_aian` (`r round(n_aian_pct, 1)`%) American Indian or Alaska Native, `r n_haw` (`r round(n_haw_pct, 1)`%) Native Hawaiian or Pacific Islander, and `r n_other` (`r round(n_other_pct, 1)`%) Other. The remaining participants did not report their race, or their responses were not easily recoded to match these categories.
[^1]: The latter exclusion criterion applied only to participants from Author Advised sites, because the necessary data were not always available for In House sites.
```{r analysis-alpha-exclusions, include = FALSE}
alpha_anti <- psych::alpha(select(merged, antius3, antius4, antius5))
alpha_pro <- psych::alpha(select(merged, prous3, prous4, prous5))
# Tracking exclusions:
# 'merged_original' is all data, no exclusions
# 'merged' is basic exclusions (exclusion set 1 below). Implemented in the participants code chunk at the beginning.
# 'merged_excl_2' further excludes participants as per exclusion set 2 (below)
merged_excl_2 <- subset(merged, (merged$race == 1 & merged$countryofbirth == 1) | merged$expert == 0)
# 'merged_excl_3' further excludes participants as per exclusion set 3 (below)
merged_excl_3 <- subset(merged_excl_2, merged_excl_2$americanid >= 7 | merged_excl_2$expert == 0)
```
## Analysis Plan
The primary finding of interest from Greenberg et al. (1994) was that participants who underwent the mortality salience treatment showed greater preference for the pro-US essay author over the anti-US essay author compared to the control condition. To assess whether the replication results support the original, we followed an analysis plan similar to that of the original article. Scores from the three items evaluating the authors of the anti-American essays were averaged ($\alpha$ = `r round(alpha_anti$total$std.alpha, 2)`) and then subtracted from the average of the three items evaluating authors of the pro-American essays ($\alpha$ = `r round(alpha_pro$total$std.alpha, 2)`).[^2] An independent-samples *t*-test was then conducted comparing scores from the “subtle own death salient” (MS) condition with scores from the “TV salient” (control) condition. Some labs administered both Author Advised and In House protocols. To account for this nesting of effect sizes within labs, a three-level random-effects meta-analysis was conducted using the metaSEM package (Cheung, 2014) in R (R Core Team, 2019).
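In code, the composite score and condition comparison described above amount to the following (a minimal, non-evaluated sketch; it assumes the `prous3`–`prous5`, `antius3`–`antius5`, and `ms_condition` variables in the merged data, while the actual per-site analyses live in 002_ml4analysis.R):
```{r dv-sketch, eval = FALSE}
# Composite DV: mean rating of pro-US authors minus mean rating of anti-US authors
merged$pro_minus_anti <- rowMeans(merged[, c("prous3", "prous4", "prous5")]) -
  rowMeans(merged[, c("antius3", "antius4", "antius5")])
# Condition comparison (Welch's t-test by default in R)
t.test(pro_minus_anti ~ ms_condition, data = merged)
```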
Because the original authors were not entirely in agreement about which exclusions should be implemented, we repeated our analyses under each set of exclusion criteria they suggested:
[^2]: Supplemental analyses treating these as two separate dependent variables are available in the online supplement (https://osf.io/xtg4u/), and those outcomes do not qualify the conclusions offered here.
*Exclusion Set 1:* Include all participants who completed the materials (i.e., wrote something for both writing prompts and completed all six items evaluating the essay authors). Reduces the usable *N* from `r formatC(nrow(merged_original), big.mark = ",")` to `r formatC(nrow(merged), big.mark = ",")` participants. This sample size gives us 95% power to detect a condition effect of *d* = .15 in an independent-samples *t*-test.
*Exclusion Set 2:* All prior exclusions, and further exclude participants who did not identify as White or who indicated they were born outside the United States. Reduces *N* to `r formatC(nrow(merged_excl_2), big.mark = ",")`. This sample size gives us 95% power to detect a condition effect of *d* = .16.
*Exclusion Set 3:* All prior exclusions, and further exclude participants who responded lower than 7 on the American Identity item ("How important to you is your identity as an American?" 1 - not at all important; 9 - extremely important). Further reduces the usable *N* to `r formatC(nrow(merged_excl_3), big.mark = ",")` participants. This sample size gives us 95% power to detect a condition effect of *d* = .18.
Exclusion Sets 2 and 3 were specifically recommended by original authors and these criteria were used to analyze the data from Author Advised labs. However, the data required to make these exclusions were often not collected at In House replication sites because they made independent decisions about design and demographic measures for potential exclusion, and these measures were not in the original article. Thus, for all analyses only Exclusion Set 1 was used for In House participants. All analysis plans and procedures were pre-registered on the OSF prior to data collection (https://osf.io/4xx6w).
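The power figures above can be approximately reproduced with a standard two-sample calculation (a non-evaluated sketch using the `pwr` package, which is not loaded in the setup chunk; it assumes a two-tailed $\alpha$ = .05 and equal group sizes):
```{r power-sketch, eval = FALSE}
library(pwr)
# Smallest effect detectable with 95% power under Exclusion Set 1;
# n is the per-group sample size, assuming equal allocation to conditions.
pwr.t.test(n = nrow(merged) / 2, power = 0.95, sig.level = 0.05,
           type = "two.sample")$d
```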
# Results
```{r analysis-researchers, include = FALSE}
# Experimenter knowledge code is messy, so I'm computing it here and retrieving
# objects in paragraph below.
# Shorten the very long Google Form column name before tabulating
tmt_knowl <- exp_surv$How.much.knowledge.did.you.have.about.Terror.Management.Theory..prior.to.joining.this.project.
exp_knowl_expert <- sum(tmt_knowl == "Expert (highly knowledgeable, know the published literature and more)")
exp_knowl_alot <- sum(tmt_knowl == "A lot (know theory in-depth, read many papers)")
exp_knowl_some <- sum(tmt_knowl == "Some (familiar with, read a few papers)")
exp_knowl_alittle <- sum(tmt_knowl == "A little (know about TMT, but not in-depth)")
exp_knowl_none <- sum(tmt_knowl == "None (never heard of TMT until this project)")
exp_knowl_na <- sum(tmt_knowl == "")
# Percentages of all survey respondents
exp_knowl_expert_pct <- exp_knowl_expert/nrow(exp_surv)*100
exp_knowl_alot_pct <- exp_knowl_alot/nrow(exp_surv)*100
exp_knowl_some_pct <- exp_knowl_some/nrow(exp_surv)*100
exp_knowl_alittle_pct <- exp_knowl_alittle/nrow(exp_surv)*100
exp_knowl_none_pct <- exp_knowl_none/nrow(exp_surv)*100
exp_knowl_na_pct <- exp_knowl_na/nrow(exp_surv)*100
# Experimenter "rooting for/against" data are messy, so I'll code them here then call the result below
# Manually recoding some responses, providing coding here for verifiability
# Copy over all responses
exp_surv$rooting_coded <- exp_surv$Overall..are.you.rooting.for.the.cumulative.results.of.this.project.to.provide.evidence.for.or.against.TMT.
exp_surv$rooting_coded[exp_surv$Overall..are.you.rooting.for.the.cumulative.results.of.this.project.to.provide.evidence.for.or.against.TMT. == "Happy either way. Though for simple fear of our field imploding I root for replication!"] <- "Neither"
exp_surv$rooting_coded[exp_surv$Overall..are.you.rooting.for.the.cumulative.results.of.this.project.to.provide.evidence.for.or.against.TMT. == "I'm rooting finding the true state of the world"] <- "Neither"
exp_surv$rooting_coded[exp_surv$Overall..are.you.rooting.for.the.cumulative.results.of.this.project.to.provide.evidence.for.or.against.TMT. == "I am ambivalent. "] <- "Neither"
exp_surv$rooting_coded[exp_surv$Overall..are.you.rooting.for.the.cumulative.results.of.this.project.to.provide.evidence.for.or.against.TMT. == "No dog in that race (ie no preference at all)"] <- "Neither"
exp_surv$rooting_coded[exp_surv$Overall..are.you.rooting.for.the.cumulative.results.of.this.project.to.provide.evidence.for.or.against.TMT. == "No opinion"] <- "Neither"
exp_surv$rooting_coded[exp_surv$Overall..are.you.rooting.for.the.cumulative.results.of.this.project.to.provide.evidence.for.or.against.TMT. == "No preference! "] <- "Neither"
exp_surv$rooting_coded[exp_surv$Overall..are.you.rooting.for.the.cumulative.results.of.this.project.to.provide.evidence.for.or.against.TMT. == "no real view, though very interested to see if my site finds it less than others as we are not very \"pro-America\""] <- "Neither"
exp_surv$rooting_coded[exp_surv$Overall..are.you.rooting.for.the.cumulative.results.of.this.project.to.provide.evidence.for.or.against.TMT. == "I don't care"] <- "Neither"
# Counts rooting for, against, and neither
exp_rooting_for <- sum(exp_surv$rooting_coded == "For TMT")
exp_rooting_against <- sum(exp_surv$rooting_coded == "Against TMT")
exp_rooting_neither <- sum(exp_surv$rooting_coded == "Neither")
exp_rooting_na <-sum(exp_surv$rooting_coded == "")
# percentages
exp_rooting_for_pct <- (sum(exp_surv$rooting_coded == "For TMT")/nrow(exp_surv))*100
exp_rooting_against_pct <- (sum(exp_surv$rooting_coded == "Against TMT")/nrow(exp_surv))*100
exp_rooting_neither_pct <- (sum(exp_surv$rooting_coded == "Neither")/nrow(exp_surv))*100
exp_rooting_na_pct <- (sum(exp_surv$rooting_coded == "")/nrow(exp_surv))*100
mean_success_estimate_excl <- filter(exp_surv, Did.you.analyze.data..or.otherwise.learn.results..from.your.site.before.filling.out.this.survey. == "No") %>%
summarize(mean = mean(In.your.opinion..how.likely.is.it.that.overall.this.project..Many.Labs.4..will.successfully.replicate.Terror.Management.Theory...please.enter.a...between.0.and.100., na.rm = TRUE))
```
## Researcher Expectations and Characteristics
A total of `r nrow(exp_surv)` researchers from `r length(unique(exp_surv$Site))` participating sites completed an experimenter survey about their motivations and expertise. This survey was administered during data collection, and although no researcher had access to overall project-wide results, approximately one third of the researchers reported looking at or analyzing their own site’s data prior to completing the survey. Psychology research experience ranged from `r min(exp_surv$How.many.years.of.experience.do.you.have.in.psychological.research.)` to `r max(exp_surv$How.many.years.of.experience.do.you.have.in.psychological.research.)` years (*M* = `r round(mean(exp_surv$How.many.years.of.experience.do.you.have.in.psychological.research.), 1)`, *SD* = `r round(sd(exp_surv$How.many.years.of.experience.do.you.have.in.psychological.research.), 1)`). `r firstup(numbers2words(exp_knowl_expert))` (`r round(exp_knowl_expert_pct)`%) researcher indicated they were an expert in TMT, `r numbers2words(exp_knowl_alot)` (`r round(exp_knowl_alot_pct)`%) indicated they had “a lot” of TMT knowledge, `r numbers2words(exp_knowl_some)` (`r round(exp_knowl_some_pct)`%) indicated “some” knowledge, `r numbers2words(exp_knowl_alittle)` (`r round(exp_knowl_alittle_pct)`%) indicated “a little” knowledge, `r numbers2words(exp_knowl_none)` (`r round(exp_knowl_none_pct)`%) indicated no prior knowledge, and `r numbers2words(exp_knowl_na)` (`r round(exp_knowl_na_pct)`%) did not respond to the question.
When asked what outcome they wanted, `r exp_rooting_for` (`r round(exp_rooting_for_pct)`%) indicated that they hoped the project would successfully replicate the TMT effect, `r numbers2words(exp_rooting_neither)` (`r round(exp_rooting_neither_pct)`%) indicated no preference, and `r numbers2words(exp_rooting_against)` (`r round(exp_rooting_against_pct)`%) hoped the project would result in a failure to replicate, with `r numbers2words(exp_rooting_na)` (`r round(exp_rooting_na_pct)`%) researchers leaving the question blank. On average, the teams estimated a `r round(mean(exp_surv$In.your.opinion..how.likely.is.it.that.overall.this.project..Many.Labs.4..will.successfully.replicate.Terror.Management.Theory...please.enter.a...between.0.and.100., na.rm = TRUE))`% chance of successful replication, with estimates ranging widely from `r min(exp_surv$In.your.opinion..how.likely.is.it.that.overall.this.project..Many.Labs.4..will.successfully.replicate.Terror.Management.Theory...please.enter.a...between.0.and.100., na.rm = TRUE)`% to `r max(exp_surv$In.your.opinion..how.likely.is.it.that.overall.this.project..Many.Labs.4..will.successfully.replicate.Terror.Management.Theory...please.enter.a...between.0.and.100., na.rm = TRUE)`% (*SD* = `r round(sd(exp_surv$In.your.opinion..how.likely.is.it.that.overall.this.project..Many.Labs.4..will.successfully.replicate.Terror.Management.Theory...please.enter.a...between.0.and.100., na.rm = TRUE), 1)`).[^3]
[^3]: Including only sites that had not looked at any data, researchers estimated a `r round(mean_success_estimate_excl[[1]])`% chance of successful replication.
```{r analysis-replication-meta, include = FALSE}
# In 002_ml4analysis.R I generate results per site, using the same analysis
# under each of the exclusion rules. Those are output to:
# ./data/public/combinedresults0.csv (no exclusions; read in below but never reported)
# ./data/public/combinedresults1.csv
# ./data/public/combinedresults2.csv
# ./data/public/combinedresults3.csv
# Here I'll read in those files, but if you're error checking you'll also
# want to review the code in 002_ml4analysis.R that generates them
combinedresults0 <- read.csv("./data/public/combinedresults0.csv")
combinedresults1 <- read.csv("./data/public/combinedresults1.csv")
combinedresults2 <- read.csv("./data/public/combinedresults2.csv")
combinedresults3 <- read.csv("./data/public/combinedresults3.csv")
# analyses repeated for each set of exclusion criteria
# three-level random-effects meta-analysis in MetaSEM
# summary( meta3(y=yi, v=vi, cluster=location, data=combinedresults0)) #line not necessary; results for a subset we never use (i.e., zero exclusions)
random_effects_1 <- summary(meta3(y=yi, v=vi, cluster=location, data=combinedresults1))
random_effects_2 <- summary(meta3(y=yi, v=vi, cluster=location, data=combinedresults2))
random_effects_3 <- summary(meta3(y=yi, v=vi, cluster=location, data=combinedresults3))
#Notes: The Q statistic is a significance test for heterogeneity among all effect sizes. I2 for level 2 indicates the percent of total variance explained by effects within sites, and I2 for level 3 indicates the percent of total variance accounted for by differences between sites. The intercept is the average population effect.
# a covariate of study version (In House or Author Advised) is added to create a three-level mixed-effects meta-analysis
# note the OpenMx status, which sometimes indicates a potential problem
# summary( mixed0 <- meta3(y=yi, v=vi, cluster=location, x=expert, data=combinedresults0)) #line not necessary; results for a subset we never use (i.e., zero exclusions)
mixed_effects_1 <- summary(mixed1 <- meta3(y=yi, v=vi, cluster=location, x=expert, data=combinedresults1))
mixed_effects_2 <- summary(mixed2 <- meta3(y=yi, v=vi, cluster=location, x=expert, data=combinedresults2))
mixed_effects_3 <- summary(mixed3 <- meta3(y=yi, v=vi, cluster=location, x=expert, data=combinedresults3))
# Notes: The R^2 for the version predictor will be reported for both level 2 and level 3, although in this case version is a level 2 predictor so the level 3 R^2 will always be zero.
# constraining the variance to test if it significantly worsens the model
# summary( fixed0 <- meta3(y=yi, v=vi, cluster=location, x=expert, data=combinedresults0, RE2.constraints=0, RE3.constraints=0)) #line not necessary; results for a subset we never use (i.e., zero exclusions)
constrained_1 <- summary(fixed1 <- meta3(y=yi, v=vi, cluster=location, x=expert, data=combinedresults1, RE2.constraints=0, RE3.constraints=0))
constrained_2 <- summary(fixed2 <- meta3(y=yi, v=vi, cluster=location, x=expert, data=combinedresults2, RE2.constraints=0, RE3.constraints=0))
constrained_3 <- summary(fixed3 <- meta3(y=yi, v=vi, cluster=location, x=expert, data=combinedresults3, RE2.constraints=0, RE3.constraints=0))
# compare if there is a significant difference in model fit, chi square difference test
# anova(mixed0, fixed0)
fit_comparison_1 <- anova(mixed1, fixed1)
fit_comparison_2 <- anova(mixed2, fixed2)
fit_comparison_3 <- anova(mixed3, fixed3)
```
## Research Question 1: Meta-analytic results across all labs (random-effects meta-analysis)
The most basic question is whether we observed the predicted effect of mortality salience on preference for pro- vs. anti-American essay authors. To assess this we conducted a three-level random-effects meta-analysis.[^4] This analysis produces the grand mean effect size across all sites and versions. Regardless of which exclusion criteria were used, we did not observe the predicted effect, and the confidence intervals were quite narrow: Exclusion Set 1: *Hedges’ g* = `r random_effects_1$coefficients$Estimate[1]`, 95% CI = [`r random_effects_1$coefficients$lbound[1]`, `r random_effects_1$coefficients$ubound[1]`], *SE* = `r random_effects_1$coefficients$Std.Error[1]`, *Z* = `r random_effects_1$coefficients$"z value"[1]`, *p* = `r random_effects_1$coefficients$"Pr(>|z|)"[1]`. Exclusion Set 2: *Hedges’ g* = `r random_effects_2$coefficients$Estimate[1]`, 95% CI = [`r random_effects_2$coefficients$lbound[1]`, `r random_effects_2$coefficients$ubound[1]`], *SE* = `r random_effects_2$coefficients$Std.Error[1]`, *Z* = `r random_effects_2$coefficients$"z value"[1]`, *p* = `r random_effects_2$coefficients$"Pr(>|z|)"[1]`. Exclusion Set 3: *Hedges’ g* = `r random_effects_3$coefficients$Estimate[1]`, 95% CI = [`r random_effects_3$coefficients$lbound[1]`, `r random_effects_3$coefficients$ubound[1]`], *SE* = `r random_effects_3$coefficients$Std.Error[1]`, *Z* = `r random_effects_3$coefficients$"z value"[1]`, *p* = `r random_effects_3$coefficients$"Pr(>|z|)"[1]`. Forest plots showing the effects for individual sites and the aggregate are available in Figure 1 for Exclusion Set 1 (see https://osf.io/8ccnw/ for the other two Exclusion Sets).
[^4]: Sample code to run this analysis is: meta3(y=es, v=var, cluster=Location, data=dataset). In this sample code, “y=es” directs the program to the column of effect sizes, “v=var” indicates the variable to be used as the sampling variance for each effect size, and the “cluster=Location” argument groups the effect sizes by a location variable in the dataset (in this case, a unique identifier assigned to each replication site).
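Figure 1 itself is produced by separate plotting code not included in this script. As a rough visualization only (not the pre-registered three-level model), a comparable forest plot could be drawn with the already-loaded metafor package, assuming the `yi`, `vi`, and `location` columns of `combinedresults1`:
```{r forest-sketch, eval = FALSE}
# Per-site effects plus a random-effects aggregate for Exclusion Set 1.
# Note: a two-level rma() model ignores the within-lab clustering that
# meta3() accounts for, so this sketch is for visualization, not inference.
forest(rma(yi = yi, vi = vi, slab = location, data = combinedresults1))
```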
There may have been a mortality salience effect at some sites and not others, so we next examined how much variation was observed among effect sizes (i.e., heterogeneity). For Exclusion Sets 1 and 3, this variation did not exceed what would be expected by chance (i.e., sampling variance): Exclusion Set 1: *Q*(`r random_effects_1$Q.stat$Q.df`) = `r random_effects_1$Q.stat$Q`, *p* = `r random_effects_1$Q.stat$pval`; Exclusion Set 3: *Q*(`r random_effects_3$Q.stat$Q.df`) = `r random_effects_3$Q.stat$Q`, *p* = `r random_effects_3$Q.stat$pval`. The amount of variation between sites did exceed chance for Exclusion Set 2, *Q*(`r random_effects_2$Q.stat$Q.df`) = `r random_effects_2$Q.stat$Q`, *p* = `r random_effects_2$Q.stat$pval`; however, it was small in magnitude, Tau^2^~within\ labs~ = `r random_effects_2$coefficients["Tau2_2", "Estimate"]`, Tau^2^~between\ labs~ = `r random_effects_2$coefficients["Tau2_3", "Estimate"]`.
In sum, we observed little evidence for an overall effect of mortality salience in these replications, and the results suggest minimal or no heterogeneity in effect sizes across sites. This lack of variation makes it unlikely that we will observe an effect of Author Advised versus In House protocols, or of other moderators such as differences in samples or TMT knowledge. Even so, the plausible moderation by Author Advised/In House protocol is examined in the following section.
## Research Question 2: Moderation by Author Advised/In House protocol
A covariate of protocol type was added to the random effects model to create a three-level mixed-effects meta-analysis. This was pre-registered as our primary analysis.[^5]
[^5]: The addition of the argument "x = version" to the prior metaSEM R code can be seen here: meta3(y=es, v=var, cluster=Location, x=version, data=dataset). In the analysis chunk above, this protocol-type indicator is the `expert` variable (1 = Author Advised, 0 = In House).
This analysis again produces an overall grand mean effect size; the estimates were again near zero and relatively precise across all three Exclusion Sets: Exclusion Set 1: *Hedges’ g* = `r mixed_effects_1$coefficients$Estimate[1]`, 95% CI = [`r mixed_effects_1$coefficients$lbound[1]`, `r mixed_effects_1$coefficients$ubound[1]`], *SE* = `r mixed_effects_1$coefficients$Std.Error[1]`, *Z* = `r mixed_effects_1$coefficients$"z value"[1]`, *p* = `r mixed_effects_1$coefficients$"Pr(>|z|)"[1]`. Exclusion Set 2: *Hedges’ g* = `r mixed_effects_2$coefficients$Estimate[1]`, 95% CI = [`r mixed_effects_2$coefficients$lbound[1]`, `r mixed_effects_2$coefficients$ubound[1]`], *SE* = `r mixed_effects_2$coefficients$Std.Error[1]`, *Z* = `r mixed_effects_2$coefficients$"z value"[1]`, *p* = `r mixed_effects_2$coefficients$"Pr(>|z|)"[1]`. Exclusion Set 3: *Hedges’ g* = `r mixed_effects_3$coefficients$Estimate[1]`, 95% CI = [`r mixed_effects_3$coefficients$lbound[1]`, `r mixed_effects_3$coefficients$ubound[1]`], *SE* = `r mixed_effects_3$coefficients$Std.Error[1]`, *Z* = `r mixed_effects_3$coefficients$"z value"[1]`, *p* = `r mixed_effects_3$coefficients$"Pr(>|z|)"[1]`.
Variation among effect sizes also followed the previously observed pattern: there was weak heterogeneity for Exclusion Set 2, *Q*(`r mixed_effects_2$Q.stat$Q.df`) = `r mixed_effects_2$Q.stat$Q`, *p* = `r mixed_effects_2$Q.stat$pval`, Tau^2^~within\ labs~ = `r mixed_effects_2$coefficients["Tau2_2", "Estimate"]`, Tau^2^~between\ labs~ = `r mixed_effects_2$coefficients["Tau2_3", "Estimate"]`, while variation did not meet the statistical significance threshold for Exclusion Set 1, *Q*(`r mixed_effects_1$Q.stat$Q.df`) = `r mixed_effects_1$Q.stat$Q`, *p* = `r mixed_effects_1$Q.stat$pval`, or Exclusion Set 3, *Q*(`r mixed_effects_3$Q.stat$Q.df`) = `r mixed_effects_3$Q.stat$Q`, *p* = `r mixed_effects_3$Q.stat$pval`.
Critically, protocol version did not significantly predict replication effect size regardless of which exclusion criteria were used. Exclusion Set 1: *b* = `r mixed_effects_1$coefficients["Slope_1", "Estimate"]`, *Z* = `r mixed_effects_1$coefficients["Slope_1", "z value"]`, *p* = `r mixed_effects_1$coefficients["Slope_1", "Pr(>|z|)"]`; Exclusion Set 2: *b* = `r mixed_effects_2$coefficients["Slope_1", "Estimate"]`, *Z* = `r mixed_effects_2$coefficients["Slope_1", "z value"]`, *p* = `r mixed_effects_2$coefficients["Slope_1", "Pr(>|z|)"]`; Exclusion Set 3: *b* = `r mixed_effects_3$coefficients["Slope_1", "Estimate"]`, *Z* = `r mixed_effects_3$coefficients["Slope_1", "z value"]`, *p* = `r mixed_effects_3$coefficients["Slope_1", "Pr(>|z|)"]`. The Author Advised version did not produce larger effect sizes when compared with the In House versions.
## Research Question 3: Effect of Standardization
Finally, we tested whether In House protocols displayed greater variability in effect size than Author Advised protocols. To test this hypothesis, we ran the mixed-effects models but constrained the variances at both Level 2 and Level 3 to 0, effectively creating fixed-effects models. These models were then compared with a chi-square difference test to assess whether fit significantly changed. In this case, the constraint did not significantly worsen fit for any of the three models: Exclusion Set 1: $\chi^2$(`r fit_comparison_1$diffdf[2]`) = `r fit_comparison_1$diffLL[2]`, *p* = `r fit_comparison_1$p[2]`; Exclusion Set 2: $\chi^2$(`r fit_comparison_2$diffdf[2]`) = `r fit_comparison_2$diffLL[2]`, *p* = `r fit_comparison_2$p[2]`; Exclusion Set 3: $\chi^2$(`r fit_comparison_3$diffdf[2]`) = `r fit_comparison_3$diffLL[2]`, *p* = `r fit_comparison_3$p[2]`. Overall, there was no evidence that In House protocols elicited greater variability than Author Advised protocols, despite the fact that they were unambiguously more variable in their procedural implementation.
```{r analysis-exploratory-knowl, include = FALSE}
# Focused analysis of sites with "expert" or "a lot of knowledge about TMT" leads
# Still using exclusion set 1 (the 'merged' datafile), implemented in participants section
# Selecting only the below sites:
#University of Wisconsin, Madison, WI (in-house)
#The College of New Jersey
#University of Kansas (Expert)
#University of Kansas (in-house)
#Pace University (expert)
#Virginia Commonwealth University, Richmond, VA
data_knowledgeable <- subset(merged, merged$source=="uwmadison_inhouse" | merged$source=="cnj" | merged$source=="kansas_expert" | merged$source=="kansas_inhouse" | merged$source=="pace_expert" | merged$source=="vcu")
#uwmadison_inhouse used a 7-point scale with anchors similar to those of the 9-point scale used by the other sites. To make this direct comparison I'm going to rescale the DV to the 9-point scale.
data_knowledgeable[data_knowledgeable$source == "uwmadison_inhouse", 'pro_minus_anti'] <- data_knowledgeable[data_knowledgeable$source == "uwmadison_inhouse", 'pro_minus_anti'] * (9/7)
# Applying the same levels fix as earlier, only because it caused problems in
# cohen.d() below. May not be necessary anymore.
data_knowledgeable$ms_condition <- factor(data_knowledgeable$ms_condition, levels = c("ms", "tv"))
# Analyses using that subset
knowl_ttest <- t.test(data_knowledgeable$pro_minus_anti~data_knowledgeable$ms_condition)
knowl_desc <- describeBy(data_knowledgeable$pro_minus_anti, group = data_knowledgeable$ms_condition)
knowl_effsize <- effsize::cohen.d(data_knowledgeable$pro_minus_anti~data_knowledgeable$ms_condition,pooled=TRUE,paired=FALSE,na.rm=TRUE, hedges.correction=TRUE,conf.level=0.95)
```
## Follow-Up Exploratory Analyses
**Results for TMT-knowledgeable sites.** One principal investigator reported being an expert in TMT, while five others indicated having “a lot” of knowledge about TMT. One might expect these sites to have greater success at replicating the mortality salience effect. However, aggregating across these sites and using only the first exclusion rule, they did not elicit a larger difference between the mortality salience group (*M* = `r as.numeric(knowl_desc$ms["mean"])`, *SD* = `r as.numeric(knowl_desc$ms["sd"])`) and the control group (*M* = `r as.numeric(knowl_desc$tv["mean"])`, *SD* = `r as.numeric(knowl_desc$tv["sd"])`), *t*(`r knowl_ttest$parameter`) = `r knowl_ttest$statistic`, *p* = `r knowl_ttest$p.value`, *Hedges’ g* = `r knowl_effsize$estimate`, 95% CI = [`r knowl_effsize$conf.int["lower"]`, `r knowl_effsize$conf.int["upper"]`].[^6]
[^6]: One site, UW Madison In House, used a 7-point scale. This has been rescaled to a 9-point scale for this analysis to approximately compare it with the others.
```{r analysis-exploratory-preferpro, include = FALSE}
# Investigating only participants who reported preference for the pro-US author.
# generate dfs for Author Advised and In House sites.
merged_aa <- filter(merged, expert == 1)
merged_ih <- filter(merged, expert == 0)
### Percent rating pro-author more highly than anti-author, basic exclusions
# In House
n_profav_ih <- sum(merged_ih$pro_minus_anti > 0)
n_antifav_ih <- sum(merged_ih$pro_minus_anti < 0)
n_nofav_ih <- sum(merged_ih$pro_minus_anti == 0)
pct_profav_ih <- (n_profav_ih/(n_profav_ih+n_antifav_ih+n_nofav_ih))*100
pct_antifav_ih <- (n_antifav_ih/(n_profav_ih+n_antifav_ih+n_nofav_ih))*100
pct_nofav_ih <- (n_nofav_ih/(n_profav_ih+n_antifav_ih+n_nofav_ih))*100
# Author Advised
n_profav_aa <- sum(merged_aa$pro_minus_anti > 0)
n_antifav_aa <- sum(merged_aa$pro_minus_anti < 0)
n_nofav_aa <- sum(merged_aa$pro_minus_anti == 0)
pct_profav_aa <- (n_profav_aa/(n_profav_aa+n_antifav_aa+n_nofav_aa))*100
pct_antifav_aa <- (n_antifav_aa/(n_profav_aa+n_antifav_aa+n_nofav_aa))*100
pct_nofav_aa <- (n_nofav_aa/(n_profav_aa+n_antifav_aa+n_nofav_aa))*100
# Subset to Author Advised participants who preferred the pro-US author
# and examine if that finds the effect. Repeat for 3 exclusion sets.
# Datasets:
# merged_aa is basic exclusions
# merged_excl_2_aa is exclusion set 2
merged_excl_2_aa <- filter(merged_excl_2, expert == 1)
# merged_excl_3_aa is exclusion set 3
merged_excl_3_aa <- filter(merged_excl_3, expert == 1)
# Analyses for each
merged_aa_ttest <- t.test(merged_aa$pro_minus_anti~merged_aa$ms_condition)
merged_aa_desc <- describeBy(merged_aa$pro_minus_anti, group = merged_aa$ms_condition)
merged_aa_effsize <- effsize::cohen.d(merged_aa$pro_minus_anti~merged_aa$ms_condition,pooled=TRUE,paired=FALSE,na.rm=TRUE, hedges.correction=TRUE,conf.level=0.95)
# Analyses using the Exclusion Set 2 subset
merged_excl_2_aa_ttest <- t.test(merged_excl_2_aa$pro_minus_anti~merged_excl_2_aa$ms_condition)
merged_excl_2_aa_desc <- describeBy(merged_excl_2_aa$pro_minus_anti, group = merged_excl_2_aa$ms_condition)
merged_excl_2_aa_effsize <- effsize::cohen.d(merged_excl_2_aa$pro_minus_anti~merged_excl_2_aa$ms_condition,pooled=TRUE,paired=FALSE,na.rm=TRUE, hedges.correction=TRUE,conf.level=0.95)
# Analyses using the Exclusion Set 3 subset
merged_excl_3_aa_ttest <- t.test(merged_excl_3_aa$pro_minus_anti~merged_excl_3_aa$ms_condition)
merged_excl_3_aa_desc <- describeBy(merged_excl_3_aa$pro_minus_anti, group = merged_excl_3_aa$ms_condition)
merged_excl_3_aa_effsize <- effsize::cohen.d(merged_excl_3_aa$pro_minus_anti~merged_excl_3_aa$ms_condition,pooled=TRUE,paired=FALSE,na.rm=TRUE, hedges.correction=TRUE,conf.level=0.95)
```
**Results for participants who preferred the pro-US author.** The present hypothesis that mortality salience would make a participant more favorable to the pro-US author as compared to the anti-US author relies on the participant perceiving the pro-US stance as more similar to their own worldview (and/or the anti-US stance as threatening to their worldview). The original authors anticipated that the essays from the original study might not serve this function in the replications, which were run in 2016. For this reason, the anti-US essay from the original study was made more extreme in the Author Advised version of the replication. There was a particular concern that in the months leading up to and following the 2016 US Presidential Election of Donald Trump, the generally more liberal-leaning student bodies on college campuses might feel less patriotic and not identify with the pro-US worldview. Indeed, the data suggest the original authors anticipated this issue and addressed it more successfully. Among In House replications, `r round(pct_profav_ih, digits = 0)`% of participants preferred the pro-US essay author, `r round(pct_antifav_ih, digits = 0)`% preferred the anti-US essay author, and `r round(pct_nofav_ih, digits = 0)`% had no preference. Among Author Advised replications, `r round(pct_profav_aa, digits = 0)`% of participants preferred the pro-US essay author, `r round(pct_antifav_aa, digits = 0)`% preferred the anti-US essay author, and `r round(pct_nofav_aa, digits = 0)`% had no preference.
However, the predicted mortality salience effect was neither larger nor statistically detectable when subsetting to only those participants at Author Advised sites who preferred the pro-US author. In all exclusion sets, the mortality salience and control groups showed similar levels of preference for the pro-US author over the anti-US author: Exclusion Set 1: mortality salience group (*M* = `r as.numeric(merged_aa_desc$ms["mean"])`, *SD* = `r as.numeric(merged_aa_desc$ms["sd"])`), control group (*M* = `r as.numeric(merged_aa_desc$tv["mean"])`, *SD* = `r as.numeric(merged_aa_desc$tv["sd"])`), *t*(`r merged_aa_ttest$parameter`) = `r merged_aa_ttest$statistic`, *p* = `r merged_aa_ttest$p.value`, *Hedges’ g* = `r merged_aa_effsize$estimate`, 95% CI = [`r merged_aa_effsize$conf.int["lower"]`, `r merged_aa_effsize$conf.int["upper"]`]; Exclusion Set 2: mortality salience group (*M* = `r as.numeric(merged_excl_2_aa_desc$ms["mean"])`, *SD* = `r as.numeric(merged_excl_2_aa_desc$ms["sd"])`), control group (*M* = `r as.numeric(merged_excl_2_aa_desc$tv["mean"])`, *SD* = `r as.numeric(merged_excl_2_aa_desc$tv["sd"])`), *t*(`r merged_excl_2_aa_ttest$parameter`) = `r merged_excl_2_aa_ttest$statistic`, *p* = `r merged_excl_2_aa_ttest$p.value`, *Hedges’ g* = `r merged_excl_2_aa_effsize$estimate`, 95% CI = [`r merged_excl_2_aa_effsize$conf.int["lower"]`, `r merged_excl_2_aa_effsize$conf.int["upper"]`]; Exclusion Set 3: mortality salience group (*M* = `r as.numeric(merged_excl_3_aa_desc$ms["mean"])`, *SD* = `r as.numeric(merged_excl_3_aa_desc$ms["sd"])`), control group (*M* = `r as.numeric(merged_excl_3_aa_desc$tv["mean"])`, *SD* = `r as.numeric(merged_excl_3_aa_desc$tv["sd"])`), *t*(`r merged_excl_3_aa_ttest$parameter`) = `r merged_excl_3_aa_ttest$statistic`, *p* = `r merged_excl_3_aa_ttest$p.value`, *Hedges’ g* = `r merged_excl_3_aa_effsize$estimate`, 95% CI = [`r merged_excl_3_aa_effsize$conf.int["lower"]`, `r merged_excl_3_aa_effsize$conf.int["upper"]`]. The confidence intervals were wider because of the smaller total sample size, but this evidence is not consistent with the hypothesis that preference for the pro-US author would elicit an effect of mortality salience in this context.
\begingroup
\setlength{\parindent}{-0.5in}
\setlength{\leftskip}{0.5in}
<div id = "refs"></div>
\endgroup