This repository contains all of the code to reproduce the analysis done for the State of the AnVIL 2024 Poll.
Raw data for this project is in a password-protected, controlled-access shared Google Drive because it contains some identifying information. The processed, de-identified data is available within the wrangled_data subdirectory.
The codebook files, created by the analysts, explain the columns in the raw data and their possible values, and provide dictionaries for categorizing certain columns (e.g., institution).
- codebook.txt: codebook relating to raw data
- controlledAccessData_codebook.txt: Controlled access data mentioned in the poll as well as whether AnVIL hosts it.
- institution_codebook.txt: institutions and simplified categorization
- resultsTidy.rds: wrangled data saved from 1_TidyData.Rmd (with identifying information of email and raw institutional affiliation removed)
- resultsTidy_personas.rds: wrangled data saved from 2_PersonaStats.Rmd
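The wrangled files are serialized R objects, so they can be loaded with base R's `readRDS()`. The following is a minimal sketch, not code from this repository: `read_wrangled` is a hypothetical helper, and the path `wrangled_data/resultsTidy.rds` relative to the working directory is an assumption that may need adjusting for your clone.

```r
# Hypothetical helper (not part of this repository): read one wrangled .rds file,
# failing with a clear message if the path is wrong.
read_wrangled <- function(path) {
  if (!file.exists(path)) {
    stop("No file found at: ", path)
  }
  readRDS(path)
}

# Assumed location; adjust to where wrangled_data/ sits in your clone.
rds_path <- file.path("wrangled_data", "resultsTidy.rds")

if (file.exists(rds_path)) {
  results_tidy <- read_wrangled(rds_path)
  str(results_tidy, max.level = 1)  # quick overview of the loaded object
} else {
  message("Skipping: ", rds_path, " not found from ", getwd())
}
```

The same helper would work for resultsTidy_personas.rds by changing the file name.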
- 1_TidyData.Rmd: Fetching of raw data and wrangling steps for later analysis to create a de-identified tidy data file.
- 2_PersonaStats.Rmd: Identification of personas and joining of persona categorization with tidy data.
- 3_MainAnalysis.Rmd: Main analysis and plotting driver
- 4_Stats.Rmd: Code to support all stated stats/general observations in the report that aren't directly observable from plots/figures. The file is organized as follows:
  - Statements and sections appear in chronological order, aligning with the layout of the preprint
  - For each section, if a table is used to support multiple statements, the table is constructed within an expandable details section prior to any direct statements from the preprint
  - For each statement, there's a section separator and the specific statement, followed by an expandable details section with code to show the support for the statement.
- 5_PCA.Rmd: Performs PCA for all respondents after subsetting and wrangling the data
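For readers unfamiliar with the technique, the following is a minimal sketch of a PCA in base R using `prcomp()`. It runs on simulated data with hypothetical column names (`q1` to `q3`) and is not the code from 5_PCA.Rmd, whose subsetting and wrangling choices are not reproduced here.

```r
# Simulated stand-in for numeric poll responses; column names are hypothetical.
set.seed(1)
toy <- data.frame(
  q1 = rnorm(20),
  q2 = rnorm(20),
  q3 = rnorm(20)
)

# Center and scale each column so no single question dominates the components.
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of total variance captured by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Per-respondent coordinates on the principal components (one row per respondent).
head(pca$x)
```

Scaling is a design choice in this sketch; whether it is appropriate depends on how the poll responses are encoded.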
This directory contains corresponding knit HTML files for each of the R Markdown files in the analyses directory and the figure creation R Markdown in the figures directory.
- scripts/shared_functions.R: some functions used repeatedly in analysis or for plotting
- plots/: plots from the main analysis saved as png files
- supplemental_material/: Includes the complete poll, supplementary Table 1 (relation of study aims and poll questions), and supplementary Table 2 (raw responses translated to awareness and use)
- figureCreation.Rmd: Uses patchwork to combine plots from 3_MainAnalysis.Rmd to make figure panels and adjusts aesthetics as necessary.
  - The figure panels themselves are saved as png files within this directory as well
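As a rough illustration of how patchwork combines plots into a panel, the sketch below builds two toy ggplot2 plots from the built-in `mtcars` dataset, places them side by side, and saves the result as a png. The plots, tag labels, output file name, and dimensions are all assumptions chosen for illustration; they do not reflect the contents of figureCreation.Rmd. It requires the ggplot2 and patchwork packages to be installed.

```r
library(ggplot2)
library(patchwork)

# Two toy plots standing in for plots produced by the main analysis.
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
p2 <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar()

# `|` places plots side by side; plot_annotation() adds panel tags (A, B).
panel <- (p1 | p2) + plot_annotation(tag_levels = "A")

# Save to a temporary directory so the example does not write into the repository.
out_file <- file.path(tempdir(), "figure_panel_example.png")
ggsave(out_file, plot = panel, width = 8, height = 4, dpi = 150)
```

Using `/` instead of `|` would stack the plots vertically.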
- Preprint information
- A poster presented at the AnVIL Community Conference 2025
- Companion website information
- AnVIL Collection and other outreach information