This set of files includes all of the code necessary to reproduce the results in the paper. It contains no data. The paper uses Canadian census Dissemination Area (DA) data combined with Census Subdivision (CSD) data in nearly every step of the analysis. These data could be used to identify survey respondents, so we do not include them here. If a researcher would like access to these data, we encourage them to contact Cara Wong [email protected]. We are in the process of submitting these files to the ICPSR, which has a well-established procedure for the use of restricted-access data, and in the future we will direct people to the ICPSR for this purpose.
We recommend that you follow the steps below in order to reproduce the results in the paper. If you run into trouble, feel free to open an Issue on this GitHub repository with a description of your problem and one of us will work on helping you out.
Note: We have only used this workflow on a Mac OS computer and a Linux computer. We have not tested it on a Windows computer. We would be happy to publish modifications to enable broader interoperability. Feel free to contact us.
We use Canadian Census data throughout this paper and access it using the cancensus R package. That package requires researchers to obtain an API key that identifies them when they access the CensusMapper servers. Put your API key into the Data/get_and_setup_2006_census_data.R and Data/get_and_setup_2016_census_data.R files in order to run that part of the workflow. These data are public, so no permission is needed to access them.
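As an alternative to typing the key into the two scripts, a key can be supplied through the environment. The sketch below rests on two assumptions that are not stated in this archive: that cancensus picks up the CM_API_KEY environment variable when no key has been set inside R, and that the two scripts do not overwrite it. The helper name `require_env` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of this archive): succeed only if the
# environment variable named in $1 is set and non-empty.
require_env() {
    eval "value=\${$1:-}"
    if [ -z "$value" ]; then
        echo "error: environment variable $1 is not set" >&2
        return 1
    fi
}

# Example use from the project root. The key is a placeholder, and CM_API_KEY
# is assumed to be the variable that cancensus reads:
#   export CM_API_KEY="your-censusmapper-key"
#   require_env CM_API_KEY &&
#       R --no-save --no-restore -f Data/get_and_setup_2016_census_data.R
```

Checking for the key first means a missing key shows up as a one-line message rather than as a failed download partway through the script.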
The public Canadian census data files that we distribute here can be re-created using the rules in the following lines of Makefile.datasetup:
CENSUS_DATA_FILES = Data/CensusData/2006_Data/census_data_csd_06.rda \
	Data/CensusData/2006_Data/census_data_06.rda
$(CENSUS_DATA_FILES): Data/get_and_setup_2006_census_data.R
	$(RCMD) Data/get_and_setup_2006_census_data.R
Data/CensusData/2016_Data/census_data_16.rda: Data/get_and_setup_2016_census_data.R
	$(RCMD) Data/get_and_setup_2016_census_data.R
As you can see in the R scripts, you will need to install the following packages: here, sf, cancensus, and dplyr.
The designmatch package relies on the Gurobi constrained optimization software. To run the code in this replication archive, you'll need to install Gurobi by hand from the Gurobi Quickstart Page. This will involve getting a free academic license, downloading the Gurobi software for your system, installing it on your system, and then installing the gurobi R package into the R installation within the wong_bjps_2025.Rproj R project directory.
For example, on a Mac we first had to register an account with Gurobi using our academic email address and activate a single-user academic license. Then we downloaded a file called gurobi12.0.3_macos_universal2.pkg and double-clicked that file in the Mac Finder to install the Gurobi software on our system.
After that installation, we started R within the wong_bjps_2025 directory (for example, by opening the wong_bjps_2025.Rproj file in RStudio) and then typed the following at the R console:
install.packages("/Library/gurobi1203/macos_universal2/R/gurobi_12.0-3_R_4.5.0.tgz",repos=NULL)
Before running any of our files that rely on the Gurobi optimizer, we had to activate the Gurobi license using a command like grbgetkey 1234... at the unix command line on our Mac computers (using the Terminal application).
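Before starting a long run, it can help to confirm that a license file sits where Gurobi is likely to look for it. The sketch below assumes the usual search order: the file named by the GRB_LICENSE_FILE environment variable, then gurobi.lic in your home directory, then /Library/gurobi (Mac) or /opt/gurobi (Linux). The helper name `find_gurobi_license` is hypothetical, and your installation may keep the file elsewhere.

```shell
#!/bin/sh
# Hypothetical helper (not part of this archive): print the first Gurobi
# license file found in the locations assumed above, or fail if none exists.
find_gurobi_license() {
    for candidate in "${GRB_LICENSE_FILE:-}" \
                     "$HOME/gurobi.lic" \
                     /Library/gurobi/gurobi.lic \
                     /opt/gurobi/gurobi.lic; do
        if [ -n "$candidate" ] && [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no gurobi.lic found in the assumed locations" >&2
    return 1
}

# Example use:
#   find_gurobi_license || echo "run grbgetkey with your own key first"
```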
As an alternative to Gurobi, you could also try the open-source highs package, installing it from within R using install.packages("highs") and changing the arguments in the solver lists that we provide to the nmatch() functions. We have not tried that solver in this paper, and we expect that some of the results would differ.
This paper uses many R packages, and we kept our collaboration running smoothly by using the renv system to keep track of packages and their dependencies. You should be able to install all of the packages (except for the gurobi package, which has to be installed by hand first) by running renv::restore() at the R command line, in an R session started within the root of this project directory (perhaps after starting RStudio using the wong_bjps_2025.Rproj file, or in some other way), and following the instructions.
## If you haven't already installed renv do so using install.packages('renv')
renv::restore()
You may have to work a bit with the renv system to avoid errors. For example, the first time you type renv::restore() it may complain that your version of gurobi is not the same as the one we used originally (12.0-2). So you may need to reinstall the one you want. We had to issue this command before we used renv on this replication archive and then again after seeing some warnings from renv:
install.packages("/Library/gurobi1203/macos_universal2/R/gurobi_12.0-3_R_4.5.0.tgz",repos=NULL)
And then you may need to see if there are other packages that need reinstalling by typing renv::status() and renv::snapshot().
For example you might see:
> renv::status()
The following package(s) were installed from an unknown source:
- gurobi [12.0-3]
renv may be unable to restore these packages in the future.
Consider reinstalling these packages from a known source (e.g. CRAN).
But, at the end, when you quit R and restart it, you should see the R console print something like the following, which indicates that all relevant packages are installed and you are ready to try to build the paper.
R version 4.5.0 (2025-04-11) -- "How About a Twenty-Six"
Copyright (C) 2025 The R Foundation for Statistical Computing
Platform: aarch64-apple-darwin20
...
- Project '~/Documents/PROJECTS/wong_bjps_2025' loaded. [renv 1.1.5]
>
We use the make system to keep track of the dependencies among the files in this project. This means that, if you are using a Mac or Linux machine, you should be able to open a Terminal window, cd to the directory containing the replication files, and then type make Manuscript/manuscript.pdf.
If you are using RStudio, we recommend that you open the wong_bjps_2025.Rproj file. This is an R Project file that will make sure that you are in the correct working directory. It also allows you to use the make system by going to the "Build" tab and clicking on "Build All".
You can see whether your system is ready to use make by typing which make in the Terminal. If you do not see a path to the make command (like /usr/bin/make), then you will need to install GNU Make.
To install GNU Make on a Mac you need to install the "command line tools" via the following command at the unix command line within the Mac terminal: xcode-select --install
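The same check can be extended to the other command-line tools that the build steps in this README call. This is a minimal sketch: the list of tools (make, R, latexmk) is taken from the commands shown in this README, and `have_tool` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of this archive): succeed if the command
# named in $1 can be found on the PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report each tool that the build commands in this README rely on.
for tool in make R latexmk; do
    if have_tool "$tool"; then
        echo "found:   $tool -> $(command -v "$tool")"
    else
        echo "missing: $tool"
    fi
done
```

Any tool reported as missing needs to be installed before make Manuscript/manuscript.pdf can run to completion.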
Whether you type make Manuscript/manuscript.pdf at the command line, click "Build All" from the RStudio "Build" menu, or run each of the files below in order, this will take some time, since (1) the nonbipartite matching problems take time to solve and (2) we present Bayesian multilevel model results, which require MCMC sampling.
You can see all of the steps required to produce the manuscript.pdf file by typing make -n Manuscript/manuscript.pdf, which should produce output like this (maybe with some files repeated) showing which files should be run and in which order:
## If you need the canadian census data, you might need to download it first
R --no-save --no-restore -f Data/make_working_files.R
R --no-save --no-restore -f Design/dist_mats_data_anyDA_new.R
R --no-save --no-restore -f Design/match_anyDA_new.R
R --no-save --no-restore -f Analysis/analysis_anyDA_new.R
R --no-save --no-restore -f Analysis/supp_desc_new.R > Analysis/supp_desc_new.Rout
R --no-save --no-restore -f Figures_Tables/figures_anyDA_new.R
R --no-save --no-restore -f Analysis/alt_explanations_analysis.R
R --no-save --no-restore -f Figures_Tables/alt_explanations_plot.R
R --no-save --no-restore -f Design/dist_mats_data_DA_new.R
R --no-save --no-restore -f Design/match_DA_new.R
R --no-save --no-restore -f Analysis/analysis_DA_new.R
R --no-save --no-restore -f Figures_Tables/figures_DA_new.R
R --no-save --no-restore -f Data/sameDAdat.R
R --no-save --no-restore -f Analysis/analysis_sameDA.R
R --no-save --no-restore -f Figures_Tables/figures_sameDA.R
R --no-save --no-restore -f Analysis/sameDAviewDA.R
R --no-save --no-restore -f Figures_Tables/figures_sameDAviewDA.R
R --no-save --no-restore -f Design/dist_mats_data_anyDA_Diversity_new.R
R --no-save --no-restore -f Design/match_anyDA_Diversity_new.R
R --no-save --no-restore -f Analysis/analysis_anyDA_Diversity_new.R
R --no-save --no-restore -f Figures_Tables/figures_anyDA_Diversity_new.R
R --no-save --no-restore -f Figures_Tables/plot_pairwise_social_cohesion_anyDA_new.R
R --no-save --no-restore -f Design/match_assess_anyDA_new.R > Design/match_assess_anyDA_new.Rout
R --no-save --no-restore -f Figures_Tables/coefplot_table_anyDA.R
cd Manuscript && latexmk -pdf manuscript.tex
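Because the dry-run listing can repeat files, a small filter makes it easier to see each R script once, in the order make first schedules it. This is a minimal sketch that assumes the listing has the R --no-save --no-restore -f <script> shape shown above; `list_r_scripts` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper (not part of this archive): read a `make -n` listing on
# standard input and print each R script once, in first-seen order.
# Assumes lines of the form: R --no-save --no-restore -f path/to/script.R
list_r_scripts() {
    awk '$1 == "R" && $4 == "-f" && !seen[$5]++ { print $5 }'
}

# Example use from the project root:
#   make -n Manuscript/manuscript.pdf | list_r_scripts
```

Comment lines and the final latexmk step are skipped because they do not start with R ... -f.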
You can see the relationships between the files as laid out in the different Makefiles in this graphic:
We have not tested our workflow on a Windows computer and we would love advice/pull requests about how to use our Makefile in that context.