Comparing changes

base repository: maxbiostat/Student_projects
base: 3rd
head repository: maxbiostat/Student_projects
compare: main

Commits on Jun 22, 2023

  1. 2b52d69
  2. Update README.md

    maxbiostat authored Jun 22, 2023
    8288ada
  3. Update README.md

    maxbiostat authored Jun 22, 2023
    56dee9e
  4. Create README.md

    maxbiostat authored Jun 22, 2023
    8554ad2
  5. Update README.md

    maxbiostat authored Jun 22, 2023
    060c8d0
  6. Create PhyloGradients.md

    maxbiostat authored Jun 22, 2023
    6c66bbd
  7. Update README.md

    maxbiostat authored Jun 22, 2023
    a998783
  8. Update README.md

    maxbiostat authored Jun 22, 2023
    70734ad
  9. Create SumPy.md

    maxbiostat authored Jun 22, 2023
    1323265
  10. Update README.md

    maxbiostat authored Jun 22, 2023
    3c2d109

Commits on Jun 23, 2023

  1. Create COMP_brms.md

    maxbiostat authored Jun 23, 2023
    c97b74c
  2. Update README.md

    maxbiostat authored Jun 23, 2023
    735f978
  3. Create MultiBD_Stan.md

    maxbiostat authored Jun 23, 2023
    8b3c8c0
  4. Update README.md

    maxbiostat authored Jun 23, 2023
    5eecbbc

Commits on Jun 24, 2023

  1. Update README.md

    maxbiostat authored Jun 24, 2023
    bf50488

Commits on Jun 26, 2023

  1. Update README.md

    maxbiostat authored Jun 26, 2023
    f8f36a2

Commits on Jun 29, 2023

  1. Create Phylo2VectR.md

    maxbiostat authored Jun 29, 2023
    6d90b30
  2. Update README.md

    maxbiostat authored Jun 29, 2023
    fee583b

Commits on Jul 17, 2023

  1. Update README.md

    maxbiostat authored Jul 17, 2023
    2bd3039
  2. Update README.md

    maxbiostat authored Jul 17, 2023
    6d80849

Commits on Aug 6, 2023

  1. Create inbreeding.md

    maxbiostat authored Aug 6, 2023
    368ae41
  2. Update README.md

    maxbiostat authored Aug 6, 2023
    557a7f6

Commits on Aug 22, 2023

  1. Update README.md

    maxbiostat authored Aug 22, 2023
    66d6a13
  2. Update README.md

    maxbiostat authored Aug 22, 2023
    4cd6052

Commits on Feb 19, 2024

  1. Update README.md

    maxbiostat authored Feb 19, 2024
    b551896
  2. Update README.md

    maxbiostat authored Feb 19, 2024
    064b660
  3. Update README.md

    maxbiostat authored Feb 19, 2024
    44d383f

Commits on Feb 28, 2024

  1. Update README.md

    maxbiostat authored Feb 28, 2024
    6d3d520

Commits on Apr 9, 2024

  1. Update Phylo2VectR.md

    maxbiostat authored Apr 9, 2024
    887d5a5

Commits on Jun 13, 2024

  1. Update README.md

    maxbiostat authored Jun 13, 2024
    c574d90

Commits on Aug 31, 2024

  1. Update README.md

    maxbiostat authored Aug 31, 2024
    e3faa87

Commits on Sep 1, 2024

  1. Update README.md

    maxbiostat authored Sep 1, 2024
    86a3cfe

Commits on Mar 11, 2025

  1. Update README.md

    maxbiostat authored Mar 11, 2025
    934ef43
  2. Update README.md

    maxbiostat authored Mar 11, 2025
    56a02a7
  3. Update README.md

    maxbiostat authored Mar 11, 2025
    7db1fc5
23 changes: 19 additions & 4 deletions Alumni/README.md
@@ -8,6 +8,8 @@ Students I have supervised over the years.
|----------|-----------|---------------------------|
| [Fernanda Gomes](https://github.com/fernandalsgomes) | 2020-2021 | Did COVID-19 models fail? |
| [Victor Bombarda](https://github.com/victorbombarda) | 2020-2021 | Did COVID-19 models fail? |
| [Rodrigo Kalil](https://www.linkedin.com/in/rodrigo-cavalcante-kalil/?locale=en_US) | 2024-2025 | Estendendo a formulação de modelos conjuntos em Stan |




@@ -18,7 +20,20 @@ Students I have supervised over the years.
| [Cristiana Nogueira](https://github.com/Cristiananc) | 2021-2021 | [Penalised complexity priors for the reconstruction of past population size from phylogenies](https://bibliotecadigital.fgv.br/dspace/bitstream/handle/10438/31847/TCC%20-%20Cristiana%20Couto.pdf?sequence=1)|
| [Lucas Moschen](https://github.com/lucasmoschen/) | 2021-2021 | [Prevalence estimation and binary regression methods for RDS with outcome uncertainty](https://github.com/lucasmoschen/rds-bayesian-analysis-tcc) |
| [Tarla Lemos](https://github.com/TLAndrade) | 2021-2021 | Métodos estatísticos para inferir o coeficiente de endocruzamento usando dados genéticos |
| [João Pedro Marciano](https://github.com/JPMarciano) | 2021-2022 | The space of time-calibrated phylogenies |
| [Isaque Pim](https://github.com/isaquepim)| 2022-2022 | Seleção de variáveis por busca estocástica para epidemiologia espacial |
| [Marcos Antônio Alves](https://br.linkedin.com/in/marcos-antonio-alves-?original_referer=https%3A%2F%2Fwww.google.com%2F)| 2022-2022 | Exploração e modelagem informacional de dados públicos de saúde |
| [Pedro Dall'Antonia](https://github.com/pedrodall)| 2022-2022 | Um Estudo sobre Causalidade|
| [João Pedro Marciano](https://github.com/JPMarciano) | 2021-2022 | [The space of time-calibrated phylogenies](https://bibliotecadigital.fgv.br/dspace/handle/10438/33849) |
| [Isaque Pim](https://github.com/isaquepim)| 2022-2022 | [Seleção de variáveis por busca estocástica para epidemiologia espacial](https://bibliotecadigital.fgv.br/dspace/bitstream/handle/10438/33851/TCC%20-%20Isaque%20Vieira%20Pim.pdf?sequence=1&isAllowed=y) |
| [Marcos Antônio Alves](https://br.linkedin.com/in/marcos-antonio-alves-?original_referer=https%3A%2F%2Fwww.google.com%2F)| 2022-2022 | [Exploração e modelagem informacional de dados públicos de saúde](https://bibliotecadigital.fgv.br/dspace/handle/10438/33848) |
| [Pedro Dall'Antonia](https://github.com/pedrodall)| 2022-2022 | [Um Estudo sobre Causalidade](https://bibliotecadigital.fgv.br/dspace/handle/10438/33840)|
| [Wellington Silva](https://github.com/wellington36)| 2022-2023| Métodos de extrapolação de séries aplicados à distribuição de Tweedie|
|[Ademir Tomaz Filho](https://www.linkedin.com/search/results/all/?fetchDeterministicClustersOnly=true&heroEntityKey=urn%3Ali%3Afsd_profile%3AACoAACrYKlcB-pIamS5gBcZxFKsgkM-9UEuj1Rg&keywords=ademir%20tomaz%20filho&origin=RICH_QUERY_SUGGESTION&position=0&searchId=2b99c2b7-cf73-43b5-9bf2-9be11091ad06&sid=86r&spellCorrectionEnabled=false)|2023-2023|Aprendizado de redes adversariais generativas: um estudo de caso|

### MSc Dissertation
| Student | Period | Project |
|----------|-----------|---------------------------|
| [Isaque Pim](https://github.com/isaquepim) | 2023-2025 | Spatial Confounding: From Classical Models to Modern Applications|


### PhD Thesis
| Student | Period | Project |
|----------|-----------|---------------------------|
| [Yueqi (Angie) Shen](https://scholar.google.com/citations?user=DPSj_L8AAAAJ&hl=en) | 2021-2024 | [Incorporating Historical Information in Bayesian Clinical Trial Design Using the Normalized Power Prior](https://cdr.lib.unc.edu/downloads/k930c792k) |
5 changes: 5 additions & 0 deletions ProgrammingProjects/COMP_brms.md
@@ -0,0 +1,5 @@
## Principled and efficient truncation of the Conway-Maxwell Poisson distribution in brms

In this project you will take the (Stan) implementations of the techniques in [this](https://arxiv.org/abs/1308.2045) paper, which are [here](https://github.com/GuidoAMoreira/stan_summer), and port them to [brms](https://github.com/paul-buerkner/brms).
In particular, you will improve the implementation of the [Conway-Maxwell Poisson](https://en.wikipedia.org/wiki/Conway%E2%80%93Maxwell%E2%80%93Poisson_distribution) pmf [here](https://github.com/paul-buerkner/brms/blob/master/inst/chunks/fun_com_poisson.stan) by adding adaptive truncation.
You will have to run correctness tests as well as ensure that the implementation is stable enough to be used in general regression problems.
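
To make "adaptive truncation" concrete, here is a minimal Python sketch of one way to truncate the Conway-Maxwell Poisson normalising constant, Z = sum over k >= 0 of lambda^k / (k!)^nu. It is not the code in brms or stan_summer, and the stopping rule in the linked paper may well differ. The function name `log_cmp_normaliser`, the tolerance `eps` and the geometric tail bound are assumptions made for illustration.

```python
import math


def log_cmp_normaliser(lam, nu, eps=1e-12):
    """Return (log Z, K), where Z = sum_{k>=0} lam^k / (k!)^nu is truncated at K.

    Hypothetical helper for illustration only. The term ratio
    a_{k+1} / a_k = lam / (k + 1)^nu decreases in k, so once it is below 1
    the remaining tail is bounded by a geometric series.
    """
    if lam <= 0.0 or nu <= 0.0:
        raise ValueError("lam and nu must be positive")
    log_lam = math.log(lam)
    log_terms = [0.0]  # the k = 0 term is 1, so its log is 0
    k = 0
    while True:
        ratio = lam / (k + 1) ** nu  # a_{k+1} / a_k
        if ratio < 1.0:
            # Tail after term k is at most a_k * ratio / (1 - ratio).
            log_tail = log_terms[-1] + math.log(ratio) - math.log1p(-ratio)
            if log_tail < math.log(eps):
                break
        k += 1
        log_terms.append(k * log_lam - nu * math.lgamma(k + 1))
    # Log-sum-exp keeps the final sum numerically stable.
    top = max(log_terms)
    log_z = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return log_z, k
```

Because the first term equals 1, Z is at least 1, so an absolute tail bound of `eps` also bounds the relative error. With `nu = 1` the series is the Poisson one and `log_cmp_normaliser(lam, 1.0)[0]` should be close to `lam`, which makes a convenient correctness check.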
17 changes: 17 additions & 0 deletions ProgrammingProjects/CovarianceBinary.md
@@ -0,0 +1,17 @@
## Efficiently computing covariance matrices for binary data

Let **X** be a P-dimensional binary vector. Now suppose you have a sample **D** = (**X1**, **X2**, ..., **Xn**). The task is to compute the P x P covariance matrix of **X** from the sample.

In R you could presumably do
```R
cov(D)
```
The problem is that in many cases this will give you a matrix that is not positive-definite. One way to fix the problem is to realise that for binary variables we can compute the covariance between Xi and Xj from the marginal proportions of ones, p_i and p_j, and the proportion of observations in which both are one, p_ij:
```
cov_ij = p_ij - p_i*p_j.
```
The problem then becomes doing this efficiently. The task is embarrassingly parallelisable, but coding in R is still slow.

Your job is to take the implementation of `binary_cov_matrix()` in [here](https://github.com/maxbiostat/BinaryMarkovChains/blob/main/R/binary_multiESS.R) and make it go vrum vrum. I reckon a simple re-coding in Rcpp should do the trick.
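
As a reference point for correctness tests, the formula above can be sketched in a few lines of plain Python. This is not a translation of the R function linked above. The name `binary_cov` and the list-of-tuples input format are assumptions made for illustration, and the code is written for clarity rather than speed.

```python
def binary_cov(sample):
    """Covariance matrix of binary vectors via cov_ij = p_ij - p_i * p_j.

    `sample` is a list of equal-length tuples of 0/1 values (hypothetical
    input format). Proportions use the denominator n, not n - 1.
    """
    n = len(sample)
    p = len(sample[0])
    # Marginal proportions p_i = mean(X_i).
    marg = [sum(row[i] for row in sample) / n for i in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            # Joint proportion p_ij = mean(X_i * X_j); for 0/1 data this is
            # the share of observations where both coordinates are 1.
            joint = sum(row[i] & row[j] for row in sample) / n
            cov[i][j] = cov[j][i] = joint - marg[i] * marg[j]
    return cov


D = [(1, 0, 1), (0, 0, 1), (1, 1, 0), (1, 0, 0)]
cov = binary_cov(D)
```

Note that the proportions here divide by n, so the result differs from R's `cov(D)`, which divides by n - 1, by a factor of n / (n - 1). Each (i, j) entry is computed independently of the others, which is what makes the task easy to parallelise.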

**Applications**: this can be used in estimating the efficiency of Markov chain Monte Carlo algorithms in binary spaces.
10 changes: 10 additions & 0 deletions ProgrammingProjects/MultiBD_Stan.md
@@ -0,0 +1,10 @@
## A Stan interface to MultiBD

[**MultiBD**](https://github.com/msuchard/MultiBD) is an R package to compute birth-death transition probabilities.
It can be used to fit epidemic models very efficiently.
A fully Bayesian approach, however, necessitates Markov chain Monte Carlo methods.
In this project you will port the code in **MultiBD** for both the likelihood and its gradients, which is written in C++, to the templated C++ required by the [Stan math library](https://github.com/stan-dev/math).
This will allow us to fit complex hierarchical models to epidemic data using HMC.
The goal is to have the machinery in place to extend the analysis in Section 6 of [10.1214/18-AOAS1141](https://projecteuclid.org/journals/annals-of-applied-statistics/volume-12/issue-3/Direct-likelihood-based-inference-for-discretely-observed-stochastic-compartmental-models/10.1214/18-AOAS1141.full).


7 changes: 7 additions & 0 deletions ProgrammingProjects/Phylo2VectR.md
@@ -0,0 +1,7 @@
## Implementing Phylo2vec in R

You'll port [this](https://github.com/neclow/phylo2vec) Python implementation into R. Specifically, you'll design things to work fast and nice with `phylo` type (**ape**) objects.

[Here](https://github.com/bacpop/trees_rs/blob/main/src/phylo2vec.cpp) is an implementation in C++ which could be leveraged with Rcpp, probably.

The paper is [here](https://arxiv.org/abs/2304.12693) if you need a technical resource.
12 changes: 12 additions & 0 deletions ProgrammingProjects/PhyloGradients.md
@@ -0,0 +1,12 @@
## Efficient implementation of the phylogenetic likelihood and its gradients

Phylogenetics is awesome. But it is also a rather difficult problem, both statistically and computationally.
Somewhat recent advances have made efficient algorithms such as [Hamiltonian Monte Carlo (HMC)](https://mc-stan.org/docs/reference-manual/hamiltonian-monte-carlo.html) go mainstream.
While we can't do proper HMC for trees just yet, we can fix the tree and do HMC on branch lengths and other parameters.
But for that to work, we need the phylogenetic likelihood to be programmed efficiently.

Your job is to take the implementation of the phylogenetic likelihood in [**phylostan**](https://github.com/4ment/phylostan) and code it directly in the templated C++ required by the [Stan math library](https://github.com/stan-dev/math).
You'll also need to code up the gradients of the likelihood.
For this project you might want to check out [my notes](https://github.com/maxbiostat/Statistical_Phylogenetics_resources) on learning phylogenetics.

This is not a project for the faint of heart: there will be A LOT of work to get this working. Hopefully the speedups will be worth it. Nae guarantees.
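
For orientation, the quantity in question can be sketched with Felsenstein's pruning algorithm on a fixed tree. The Python below is a toy illustration, not the phylostan implementation. The Jukes-Cantor (JC69) substitution model, the nested-list tree format and the names `jc69`, `partial` and `site_likelihood` are all assumptions made to keep the example short.

```python
import math

NUC = "ACGT"


def jc69(t):
    """JC69 transition matrix for a branch of length t (substitutions/site)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def partial(node, site):
    """Partial likelihoods at `node` for one alignment column.

    A leaf is its name (str); an internal node is a list of
    (child, branch_length) pairs. `site` maps leaf names to nucleotides.
    """
    if isinstance(node, str):
        return [1.0 if NUC[s] == site[node] else 0.0 for s in range(4)]
    out = [1.0] * 4
    for child, t in node:
        P = jc69(t)
        below = partial(child, site)
        for s in range(4):
            # Sum over the child's state, weighted by the transition probability.
            out[s] *= sum(P[s][x] * below[x] for x in range(4))
    return out


def site_likelihood(tree, site):
    """Likelihood of one column, with uniform root frequencies (JC69)."""
    return sum(0.25 * value for value in partial(tree, site))


tree = [([("a", 0.1), ("b", 0.2)], 0.05), ("c", 0.3)]
lik = site_likelihood(tree, {"a": "A", "b": "A", "c": "G"})
```

The full likelihood is the product of `site_likelihood` over alignment columns, usually accumulated on the log scale. A Stan math version would additionally need the derivatives of this quantity with respect to the branch lengths, which is where most of the work in the project lies.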
10 changes: 10 additions & 0 deletions ProgrammingProjects/README.md
@@ -0,0 +1,10 @@
- [Efficient and numerically robust computation of covariance matrices for binary data](https://github.com/maxbiostat/Student_projects/blob/main/ProgrammingProjects/CovarianceBinary.md). Languages: R, C++.
- [Efficient implementation of the phylogenetic likelihood and its gradients for use in Stan](https://github.com/maxbiostat/Student_projects/blob/main/ProgrammingProjects/PhyloGradients.md). Languages: C++, Python, Stan.
- [Principled truncation of infinite series in Python](https://github.com/maxbiostat/Student_projects/blob/main/ProgrammingProjects/SumPy.md). Languages: C, Python.
- [Adaptive truncation for the Conway-Maxwell Poisson in **brms**](https://github.com/maxbiostat/Student_projects/blob/main/ProgrammingProjects/COMP_brms.md). Language: Stan.
- [Porting birth-death exact probabilities into Stan](https://github.com/maxbiostat/Student_projects/blob/main/ProgrammingProjects/MultiBD_Stan.md). Languages: C++, Stan.
- [Phylo2Vec in R](https://github.com/maxbiostat/Student_projects/blob/main/ProgrammingProjects/Phylo2VectR.md). Languages: Python, R.
- [Implementing a complicated likelihood involving an infinite sum](https://github.com/maxbiostat/Student_projects/blob/main/ProgrammingProjects/inbreeding.md). Languages: C, R.



8 changes: 8 additions & 0 deletions ProgrammingProjects/SumPy.md
@@ -0,0 +1,8 @@
## sumPy: truncating infinite sums in Python

In this project we want to port the R package [**sumR**](https://github.com/GuidoAMoreira/sumR) to Python and expand its functionality.
The idea is to reproduce the structure of the R package, creating wrappers for low-level functions (already implemented in C).
We also want to add features like the ability to handle series that may be negative.

This will be joint work with [Guido Moreira](https://github.com/GuidoAMoreira). You might want to check out [this](https://github.com/wellington36/acceleration_algorithms) project too.

4 changes: 4 additions & 0 deletions ProgrammingProjects/inbreeding.md
@@ -0,0 +1,4 @@
## Efficient estimation of the inbreeding coefficient.

In this project you'll implement the method in [McClure & Whitlock (2012)](https://www.nature.com/articles/hdy201227), in particular equation (8) therein, using [**sumR**](https://github.com/GuidoAMoreira/sumR).
We expect this to be both fast and efficient.
19 changes: 15 additions & 4 deletions README.md
@@ -7,11 +7,16 @@ Contact: `lmax DOT fgv AT gmail`

### Undergraduate (Scientific Initiation [IC] and Honours thesis [TCC])

If you're an undergraduate student who (a) likes Statistics and Biology and wants to do Scientific Initiation and/or (b) is looking to complete an undergraduate thesis under my supervision but do not have a clear project in mind, there are a few projects listed [**here**](https://github.com/maxbiostat/Student_projects/blob/main/Undegraduate/README.md) that might pique your interest.
If you're an undergraduate student who (a) likes Statistics and Biology and wants to do Scientific Initiation and/or (b) is looking to complete an undergraduate thesis under my supervision but does not have a clear project in mind, there are a few projects listed [**here**](https://github.com/maxbiostat/Student_projects/blob/main/Undegraduate/README.md) that might pique your interest.
These range from programming for Public Health data analysis to theoretical statistics. So pick your poison and shoot me an email.

---

### Coding projects

If you like programming and would like to flex your coding muscles in Scientific Computing problems, take a look at the projects in [**Programming Projects**](https://github.com/maxbiostat/Student_projects/tree/main/ProgrammingProjects) and see if anything whets your appetite.

---
### Master's degree

A few projects suitable for an MSc Dissertation (in Applied Mathematics) are listed [**here**](https://github.com/maxbiostat/Student_projects/blob/main/MSc/README.md). Feel free to contact me about them, but know that it is probably best to complete coursework before you start the dissertation.
@@ -29,7 +34,13 @@ For PhD-level work I expect a solid theoretical basis, as well as a commitment t

### Current students

- 2022- [Wellington Silva]() (IC), "Aceleração e truncamento de séries infinitas".
- 2022- [Eduardo Adame Salles](https://github.com/adamesalles) (IC), "Convex Gaussian Processes with derivative information".

A [list of past students](https://github.com/maxbiostat/Student_projects/tree/main/Alumni#readme) is also available.
- 2025- [Eduardo Adame Salles](https://github.com/adamesalles) (MSc), "Exact MCMC for the normalised power prior".
- 2025- [Ezequiel Braga](https://github.com/EzequielEBS) (MSc), "Principled Bayesian analysis under the normalised power prior".
- 2025- [Iara Castro](https://github.com/iaracastro) (MSc), "Survival methods for cancer treatment equity in Brazil".
- 2024- [Wellington Silva](https://github.com/wellington36) (MSc, CAPES), "Efficient Bayesian computation for intractable count models".
- 2023- [Igor Michels](https://github.com/IgorMichels) (MSc, CAPES), "Bayesian calibration of player-level football models".
- 2023- [Felipe Schardong](https://www.linkedin.com/in/felipe-schardong-9911a1217/) (PhD, CAPES), "Mathematical modelling of antimicrobial resistance in Brazil".
- 2024- [Atílio Leitão Pellegrino](https://www.linkedin.com/in/at%C3%ADlio-leit%C3%A3o-pellegrino-59016a192/?originalSubdomain=br) (PhD, CAPES), "Combining forecasts from epidemiological models: theory and methods".

A [list of former students](https://github.com/maxbiostat/Student_projects/tree/main/Alumni#readme) is also available.
11 changes: 1 addition & 10 deletions Undegraduate/README.md
@@ -34,17 +34,8 @@ Em [trabalho recente](https://github.com/maxbiostat/presentations/blob/master/PD
Skills to be developed: MCMC, R, numerical methods, discrete-time Markov chains.

---
A2) **sumPy: numerically stable truncation of infinite series in Python**

In this project we want to port the R package [**sumR**](https://github.com/GuidoAMoreira/sumR) to Python and expand its functionality. The idea is to reproduce the structure of the R package, creating _wrappers_ for the low-level functions (already implemented in C). We also want to add _features_ such as the ability to handle series that may be negative.

Joint work with [Guido Moreira](https://github.com/GuidoAMoreira).

Skills to be developed: scientific programming in Python and C.

---

A3) **Joint analysis of sensitivity and specificity of diagnostic tests**
A2) **Joint analysis of sensitivity and specificity of diagnostic tests**

Diagnostic tests are generally imperfect, that is, they detect the condition of interest with certain operating characteristics (sensitivity and specificity). In this project we will collect and analyse meta-analysis data under different models for the joint distribution of the sensitivity and specificity of diagnostic tests for diseases. We will test bivariate beta models and models based on Gaussian latent variables.
The ultimate goal is to understand which models best fit the various types of data and how to use the resulting distributions as prior distributions in Bayesian analyses of prevalence.