maxbiostat · Jun 22, 2023
diff --git a/Alumni/README.md b/Alumni/README.md
@@ -8,6 +8,8 @@ Students I have supervised over the years.
 |----------|-----------|---------------------------|
 | [Fernanda Gomes](https://github.com/fernandalsgomes) | 2020-2021 | Did COVID-19 models fail? | 
 | [Victor Bombarda](https://github.com/victorbombarda) | 2020-2021 | Did COVID-19 models fail? |
+| [Rodrigo Kalil](https://www.linkedin.com/in/rodrigo-cavalcante-kalil/?locale=en_US) | 2024-2025 | Estendendo a formulação de modelos conjuntos em Stan |
+
 
 
 
@@ -18,7 +20,20 @@ Students I have supervised over the years.
 | [Cristiana Nogueira](https://github.com/Cristiananc) | 2021-2021 | [Penalised complexity priors for the reconstruction of past population size from phylogenies](https://bibliotecadigital.fgv.br/dspace/bitstream/handle/10438/31847/TCC%20-%20Cristiana%20Couto.pdf?sequence=1)| 
 | [Lucas Moschen](https://github.com/lucasmoschen/) | 2021-2021 | [Prevalence estimation and binary regression methods for RDS with outcome uncertainty](https://github.com/lucasmoschen/rds-bayesian-analysis-tcc) |
 | [Tarla Lemos](https://github.com/TLAndrade) | 2021-2021 | Métodos estatísticos para inferir o coeficiente de endocruzamento usando dados genéticos | 
-| [João Pedro Marciano](https://github.com/JPMarciano) | 2021-2022 | The space of time-calibrated phylogenies | 
-| [Isaque Pim](https://github.com/isaquepim)| 2022-2022 | Seleção de variáveis por busca estocástica para epidemiologia espacial | 
-| [Marcos Antônio Alves](https://br.linkedin.com/in/marcos-antonio-alves-?original_referer=https%3A%2F%2Fwww.google.com%2F)| 2022-2022 | Exploração e modelagem informacional de dados públicos de saúde | 
-| [Pedro Dall'Antonia](https://github.com/pedrodall)| 2022-2022 | Um Estudo sobre Causalidade| 
+| [João Pedro Marciano](https://github.com/JPMarciano) | 2021-2022 | [The space of time-calibrated phylogenies](https://bibliotecadigital.fgv.br/dspace/handle/10438/33849) | 
+| [Isaque Pim](https://github.com/isaquepim)| 2022-2022 | [Seleção de variáveis por busca estocástica para epidemiologia espacial](https://bibliotecadigital.fgv.br/dspace/bitstream/handle/10438/33851/TCC%20-%20Isaque%20Vieira%20Pim.pdf?sequence=1&isAllowed=y) | 
+| [Marcos Antônio Alves](https://br.linkedin.com/in/marcos-antonio-alves-?original_referer=https%3A%2F%2Fwww.google.com%2F)| 2022-2022 | [Exploração e modelagem informacional de dados públicos de saúde](https://bibliotecadigital.fgv.br/dspace/handle/10438/33848) | 
+| [Pedro Dall'Antonia](https://github.com/pedrodall)| 2022-2022 | [Um Estudo sobre Causalidade](https://bibliotecadigital.fgv.br/dspace/handle/10438/33840)| 
+| [Wellington Silva](https://github.com/wellington36)| 2022-2023| Métodos de extrapolação de séries aplicados à distribuição de Tweedie|
+|[Ademir Tomaz Filho](https://www.linkedin.com/search/results/all/?fetchDeterministicClustersOnly=true&heroEntityKey=urn%3Ali%3Afsd_profile%3AACoAACrYKlcB-pIamS5gBcZxFKsgkM-9UEuj1Rg&keywords=ademir%20tomaz%20filho&origin=RICH_QUERY_SUGGESTION&position=0&searchId=2b99c2b7-cf73-43b5-9bf2-9be11091ad06&sid=86r&spellCorrectionEnabled=false)|2023-2023|Aprendizado de redes adversariais generativas: um estudo de caso|
+
+### MSc Dissertation
+| Student  | Period    | Project                   | 
+|----------|-----------|---------------------------| 
+| [Isaque Pim](https://github.com/isaquepim) | 2023-2025 | Spatial Confounding: From Classical Models to Modern Applications|
+
+
+### PhD Thesis
+| Student  | Period    | Project                   | 
+|----------|-----------|---------------------------| 
+| [Yueqi (Angie) Shen](https://scholar.google.com/citations?user=DPSj_L8AAAAJ&hl=en) | 2021-2024 | [Incorporating Historical Information in Bayesian Clinical Trial Design Using the Normalized Power Prior](https://cdr.lib.unc.edu/downloads/k930c792k) |
diff --git a/ProgrammingProjects/COMP_brms.md b/ProgrammingProjects/COMP_brms.md
@@ -0,0 +1,5 @@
+## Principled and efficient truncation of the Conway-Maxwell Poisson distribution in brms
+
+In this project you will take the Stan implementations of the techniques in [this paper](https://arxiv.org/abs/1308.2045), available [here](https://github.com/GuidoAMoreira/stan_summer), and port them to [brms](https://github.com/paul-buerkner/brms).
+In particular, you will improve the implementation of the [Conway-Maxwell Poisson](https://en.wikipedia.org/wiki/Conway%E2%80%93Maxwell%E2%80%93Poisson_distribution) pmf [here](https://github.com/paul-buerkner/brms/blob/master/inst/chunks/fun_com_poisson.stan) by adding adaptive truncation.
+You will need to run correctness tests and ensure that the implementation is stable enough to be used in general regression problems.
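To make the idea of truncating the Conway-Maxwell Poisson normalising constant concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method from the linked paper or the `stan_summer` code: the function name `com_poisson_log_norm`, its parameters, and the stopping rule are all hypothetical. The stopping rule is a simple geometric bound on the remainder, which is valid because the ratio of consecutive terms, `lam / (n + 1)**nu`, decreases in `n`.

```python
import math

def com_poisson_log_norm(lam, nu, eps=1e-12, max_terms=100_000):
    """Return (log Z, n) for Z = sum_{k >= 0} lam**k / (k!)**nu, truncated at k = n.

    Hypothetical helper: stops once a geometric bound on the remainder
    falls below eps times the largest term seen (hence below eps * Z).
    """
    if lam <= 0.0 or nu <= 0.0:
        raise ValueError("lam and nu must be positive")
    log_lam = math.log(lam)
    log_terms = [0.0]      # log of the k = 0 term, which equals 1
    log_max = 0.0          # running maximum of the log terms
    n = 0
    while n < max_terms:
        # log of a_{n+1} / a_n; decreasing in n because nu > 0
        log_ratio = log_lam - nu * math.log(n + 1)
        if log_ratio < 0.0:
            # remainder <= a_n * r / (1 - r), with r = exp(log_ratio) < 1
            log_tail = log_terms[-1] + log_ratio - math.log(-math.expm1(log_ratio))
            if log_tail < math.log(eps) + log_max:
                break
        n += 1
        log_terms.append(log_terms[-1] + log_ratio)
        log_max = max(log_max, log_terms[-1])
    else:
        raise RuntimeError("series did not reach the requested tolerance")
    # log-sum-exp of the retained terms, for numerical stability
    log_z = log_max + math.log(sum(math.exp(t - log_max) for t in log_terms))
    return log_z, n

# With nu = 1 the distribution is Poisson, so log Z should be close to lam.
log_z, n_terms = com_poisson_log_norm(3.0, 1.0)
```

Working on the log scale avoids overflow when `lam` is large; a Stan port would presumably follow the same pattern with `log_sum_exp`, but that is a design guess rather than a description of the existing brms code.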
diff --git a/ProgrammingProjects/CovarianceBinary.md b/ProgrammingProjects/CovarianceBinary.md
@@ -0,0 +1,17 @@
+## Efficiently computing covariance matrices for binary data  
+
+Let **X** be a P-dimensional binary vector. Now suppose you have a sample **D** = (**X1**, **X2**, ..., **Xn**). The task is to compute the P x P covariance matrix of **X** from the sample.
+
+In R you could presumably do
+```R
+cov(D) 
+```
+The problem is that in many cases this will give you a matrix that is not positive-definite. One way to fix the problem is to realise that for binary variables we can compute the covariance between Xi and Xj from p_i, the proportion of samples with Xi = 1, and p_ij, the proportion with Xi = Xj = 1:
+```
+cov_ij = p_ij - p_i*p_j.
+```
+The problem then becomes doing this efficiently. The task is embarrassingly parallelisable, but a pure R implementation is still slow.
+
+Your job is to take the implementation of `binary_cov_matrix()` [here](https://github.com/maxbiostat/BinaryMarkovChains/blob/main/R/binary_multiESS.R) and make it go vrum vrum. I reckon a simple re-coding in Rcpp should do the trick.
+
+**Applications**: this can be used in estimating the efficiency of Markov chain Monte Carlo algorithms in binary spaces. 
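The formula `cov_ij = p_ij - p_i*p_j` can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic: the function name `binary_cov` and the toy sample `D` are hypothetical, and this is not the code from the linked repository nor a substitute for an Rcpp implementation. It assumes proportions are computed with divisor n, so entries differ from R's `cov()` (divisor n - 1) by a factor of (n - 1)/n.

```python
from itertools import combinations

def binary_cov(samples):
    """Covariance matrix of binary vectors via cov_ij = p_ij - p_i * p_j.

    `samples` is a list of n equal-length sequences of 0/1 values.
    """
    n = len(samples)
    p = len(samples[0])
    # p_i: proportion of samples with X_i = 1
    marg = [sum(row[i] for row in samples) / n for i in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        # on the diagonal p_ii = p_i, so the variance is p_i * (1 - p_i)
        cov[i][i] = marg[i] * (1.0 - marg[i])
    for i, j in combinations(range(p), 2):
        # p_ij: proportion of samples with X_i = 1 and X_j = 1
        joint = sum(1 for row in samples if row[i] and row[j]) / n
        cov[i][j] = cov[j][i] = joint - marg[i] * marg[j]
    return cov

# Toy sample with n = 4 observations of a 3-dimensional binary vector.
D = [(1, 0, 1), (1, 1, 0), (0, 0, 1), (1, 1, 1)]
S = binary_cov(D)
```

Each off-diagonal entry depends only on two columns of the sample, which is why the pairwise loop is the natural place to parallelise or move to compiled code.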
diff --git a/ProgrammingProjects/MultiBD_Stan.md b/ProgrammingProjects/MultiBD_Stan.md
@@ -0,0 +1,10 @@
+## A Stan interface to MultiBD 
+
+[**MultiBD**](https://github.com/msuchard/MultiBD) is an R package to compute birth-death transition probabilities.
+It can be used to fit epidemic models very efficiently. 
+A fully Bayesian approach, however, necessitates Markov chain Monte Carlo methods.
+In this project you will port the **MultiBD** code for both the likelihood and its gradients, which is written in C++, to the templated C++ required by the [Stan math library](https://github.com/stan-dev/math).
+This will allow us to fit complex hierarchical models to epidemic data using HMC. 
+The goal is to have the machinery in place to extend the analysis in Section 6 of [10.1214/18-AOAS1141](https://projecteuclid.org/journals/annals-of-applied-statistics/volume-12/issue-3/Direct-likelihood-based-inference-for-discretely-observed-stochastic-compartmental-models/10.1214/18-AOAS1141.full).
+
+
diff --git a/ProgrammingProjects/Phylo2VectR.md b/ProgrammingProjects/Phylo2VectR.md
@@ -0,0 +1,7 @@
+## Implementing Phylo2vec in R
+
+You'll port [this](https://github.com/neclow/phylo2vec) Python implementation to R. Specifically, you'll design things to work fast and nicely with **ape**'s `phylo` objects.
+
+[Here](https://github.com/bacpop/trees_rs/blob/main/src/phylo2vec.cpp) is an implementation in C++ which could be leveraged with Rcpp, probably. 
+
+The paper is [here](https://arxiv.org/abs/2304.12693) if you need a technical resource. 
diff --git a/ProgrammingProjects/PhyloGradients.md b/ProgrammingProjects/PhyloGradients.md
@@ -0,0 +1,12 @@
+## Efficient implementation of the phylogenetic likelihood and its gradients
+
+Phylogenetics is awesome. But it is also a rather difficult problem, both statistically and computationally.
+Somewhat recent advances have made efficient algorithms such as [Hamiltonian Monte Carlo (HMC)](https://mc-stan.org/docs/reference-manual/hamiltonian-monte-carlo.html) go mainstream.
+While we can't do proper HMC for trees just yet, we can fix the tree and do HMC on branch lengths and other parameters.
+But for that to work, we need the phylogenetic likelihood to be programmed efficiently. 
+
+Your job is to take the implementation of the phylogenetic likelihood in [**phylostan**](https://github.com/4ment/phylostan) and code it directly in the templated C++ required by the [Stan math library](https://github.com/stan-dev/math).
+You'll also need to code up the gradients of the likelihood.
+For this project you might want to check out [my notes](https://github.com/maxbiostat/Statistical_Phylogenetics_resources) on learning phylogenetics. 
+
+This is not a project for the faint of heart: there will be A LOT of work to get this working. Hopefully the speedups will be worth it. Nae guarantees. 
diff --git a/ProgrammingProjects/README.md b/ProgrammingProjects/README.md
@@ -0,0 +1,10 @@
+- [Efficient and numerically robust computation of covariance matrices for binary data](https://github.com/maxbiostat/Student_projects/blob/main/ProgrammingProjects/CovarianceBinary.md). Languages: R, C++.
+- [Efficient implementation of the phylogenetic likelihood and its gradients for use in Stan](https://github.com/maxbiostat/Student_projects/blob/main/ProgrammingProjects/PhyloGradients.md). Languages: C++, Python, Stan.
+- [Principled truncation of infinite series in Python](https://github.com/maxbiostat/Student_projects/blob/main/ProgrammingProjects/SumPy.md). Languages: C, Python.
+- [Adaptive truncation for the Conway-Maxwell Poisson in **brms**](https://github.com/maxbiostat/Student_projects/blob/main/ProgrammingProjects/COMP_brms.md). Language: Stan.
+- [Porting birth-death exact probabilities into Stan](https://github.com/maxbiostat/Student_projects/blob/main/ProgrammingProjects/MultiBD_Stan.md). Languages: C++, Stan.
+- [Phylo2Vec in R](https://github.com/maxbiostat/Student_projects/blob/main/ProgrammingProjects/Phylo2VectR.md). Languages: Python, R.
+- [Implementing a complicated likelihood involving an infinite sum](https://github.com/maxbiostat/Student_projects/blob/main/ProgrammingProjects/inbreeding.md). Languages: C, R.
+
+
+
diff --git a/ProgrammingProjects/SumPy.md b/ProgrammingProjects/SumPy.md
@@ -0,0 +1,8 @@
+## sumPy: truncating infinite sums in Python
+
+In this project we want to port the R package [**sumR**](https://github.com/GuidoAMoreira/sumR) to Python and expand its functionality.
+The idea is to reproduce the structure of the R package, creating wrappers for the low-level functions (already implemented in C).
+We also want to add features such as the ability to handle series whose terms may be negative.
+
+This will be joint work with [Guido Moreira](https://github.com/GuidoAMoreira). You might want to check out [this](https://github.com/wellington36/acceleration_algorithms) project too. 
+
diff --git a/ProgrammingProjects/inbreeding.md b/ProgrammingProjects/inbreeding.md
@@ -0,0 +1,4 @@
+## Efficient estimation of the inbreeding coefficient
+
+In this project you'll implement the method in [McClure & Whitlock (2012)](https://www.nature.com/articles/hdy201227), in particular equation (8) therein, using [**sumR**](https://github.com/GuidoAMoreira/sumR).
+We expect this to be both fast and efficient. 
diff --git a/README.md b/README.md
@@ -7,11 +7,16 @@ Contact: `lmax DOT fgv AT gmail`
 
 ### Undergraduate (Scientific Initiation [IC] and Honours thesis [TCC])
 
-If you're an undergraduate student who (a) likes Statistics and Biology and wants to do Scientific Initiation and/or (b) is looking to complete an undergraduate thesis under my supervision but do not have a clear project in mind, there are a few projects listed [**here**](https://github.com/maxbiostat/Student_projects/blob/main/Undegraduate/README.md) that might pique your interest.
+If you're an undergraduate student who (a) likes Statistics and Biology and wants to do Scientific Initiation and/or (b) is looking to complete an undergraduate thesis under my supervision but does not have a clear project in mind, there are a few projects listed [**here**](https://github.com/maxbiostat/Student_projects/blob/main/Undegraduate/README.md) that might pique your interest.
 These range from programming for Public Health data analysis to theoretical statistics. So pick your poison and shoot me an email.
 
 ---
 
+### Coding projects
+
+If you like programming and would like to flex your coding muscles on Scientific Computing problems, take a look at the projects in [**Programming Projects**](https://github.com/maxbiostat/Student_projects/tree/main/ProgrammingProjects) and see if anything whets your appetite. 
+
+---
 ### Master's degree
 
 A few projects suitable for an MSc dissertation (in Applied Mathematics) are listed [**here**](https://github.com/maxbiostat/Student_projects/blob/main/MSc/README.md). Feel free to contact me about them, but know that it is probably best to complete coursework before you start the dissertation.
@@ -29,7 +34,13 @@ For PhD-level work I expect a solid theoretical basis, as well as a commitment t
 
 ### Current students
 
-- 2022- [Wellington Silva]() (IC), "Aceleração e truncamento de séries infinitas".
-- 2022- [Eduardo Adame Salles](https://github.com/adamesalles) (IC), "Convex Gaussian Processes with derivative information".
 
-A [list of past students](https://github.com/maxbiostat/Student_projects/tree/main/Alumni#readme) is also available. 
+- 2025- [Eduardo Adame Salles](https://github.com/adamesalles) (MSc), "Exact MCMC for the normalised power prior".
+- 2025- [Ezequiel Braga](https://github.com/EzequielEBS) (MSc), "Principled Bayesian analysis under the normalised power prior".
+- 2025- [Iara Castro](https://github.com/iaracastro) (MSc), "Survival methods for cancer treatment equity in Brazil".
+- 2024- [Wellington Silva](https://github.com/wellington36) (MSc, CAPES), "Efficient Bayesian computation for intractable count models".
+- 2023- [Igor Michels](https://github.com/IgorMichels) (MSc, CAPES), "Bayesian calibration of player-level football models". 
+- 2023- [Felipe Schardong](https://www.linkedin.com/in/felipe-schardong-9911a1217/) (PhD, CAPES), "Mathematical modelling of antimicrobial resistance in Brazil".
+- 2024- [Atílio Leitão Pellegrino](https://www.linkedin.com/in/at%C3%ADlio-leit%C3%A3o-pellegrino-59016a192/?originalSubdomain=br) (PhD, CAPES), "Combining forecasts from epidemiological models: theory and methods".
+
+A [list of former students](https://github.com/maxbiostat/Student_projects/tree/main/Alumni#readme) is also available. 
diff --git a/Undegraduate/README.md b/Undegraduate/README.md
@@ -34,17 +34,8 @@ Em [trabalho recente](https://github.com/maxbiostat/presentations/blob/master/PD
 Skills to be developed: MCMC, R, numerical methods, discrete-time Markov chains. 
 
 ---
-A2) **sumPy: numerically stable truncation of infinite series in Python**
 
-In this project we want to port the R package [**sumR**](https://github.com/GuidoAMoreira/sumR) to Python and expand its functionality. The idea is to reproduce the structure of the R package, creating _wrappers_ for the low-level functions (already implemented in C). We also want to add _features_ such as the ability to handle series that may be negative.
-
-Joint work with [Guido Moreira](https://github.com/GuidoAMoreira).
-
-Skills to be developed: scientific programming in Python and C.
-
----
-
-A3) **Joint analysis of the sensitivity and specificity of diagnostic tests**
+A2) **Joint analysis of the sensitivity and specificity of diagnostic tests**
 
 Diagnostic tests are in general imperfect, that is, they detect the condition of interest with certain operating characteristics (sensitivity and specificity). In this project we will collect and analyse meta-analysis data under different models for the joint distribution of the sensitivity and specificity of diagnostic tests for diseases. We will test bivariate beta models and models based on Gaussian latent variables.
 The ultimate goal is to understand which models best fit the various types of data and how to use the resulting distributions as prior distributions in Bayesian analyses of prevalence.