Prof. Marek Gagolewski
My current research interests are related to data science, with a focus on modelling complex phenomena, developing usable, general-purpose algorithms, studying their analytical properties, and finding out how people (laymen, decision makers, students, and researchers from different fields) use, misuse, understand, and misunderstand data analysis methods in scientific, business, political, social, and other settings. In my spare time, I write books for my students and develop open-source data analysis software.
- Deep R Programming (HTML) (PDF) (paper copy) (GitHub)
- Minimalist Data Wrangling in Python (HTML) (PDF) (paper copy) (GitHub)
- lumbermark – Resistant clustering via chopping up mutual reachability minimum spanning trees (GitHub) (PyPI) (CRAN)
- deadwood – Outlier detection via pruning mutual reachability minimum spanning trees (GitHub) (PyPI) (CRAN)
- genieclust – Fast and robust hierarchical clustering (GitHub) (PyPI) (CRAN) (paper)
- quitefastmst – Euclidean and mutual reachability minimum spanning tree algorithms (GitHub) (PyPI) (CRAN)
- clustering-benchmarks – A framework for benchmarking clustering algorithms (GitHub) (PyPI) (paper)
- stringi – Fast and portable character string processing in R (one of the most frequently downloaded R packages) (GitHub) (CRAN) (paper)
- stringx – Drop-in replacements for base R string functions powered by stringi (GitHub) (CRAN)
- realtest – Where expectations meet reality: Realistic unit testing in R (GitHub) (CRAN)
- TurtleGraphics – Learn computer programming in R while having a jolly time! (GitHub) (CRAN)
- Clustering benchmarks (framework, datasets, results)
- Datasets for teaching