(Pronounced like Mark Gaggle-Eve-Ski)
Senior Lecturer in Applied Artificial Intelligence
Researcher in Data Science
Open Source Machine Learning, Data Analysis and Scientific Software Developer (Python, C, C++, R, etc.)
Data Science, Machine Learning, and Statistical Computing Tutor & Trainer
Python and R package
A reimplementation of my robust hierarchical clustering algorithm Genie is now available on PyPI and CRAN. It is now even faster and equipped with many more features, including noise point detection. See https://genieclust.gagolewski.com for more details, documentation, benchmarks, and tutorials.
Paper on SimilaR in the R Journal
SimilaR: R Code Clone and Plagiarism Detection, by Maciej Bartoszuk and me, has been accepted for publication in the R Journal. Read more…
Paper in PNAS: Three Dimensions of Scientific Impact
In a paper recently published in the Proceedings of the National Academy of Sciences of the United States of America (PNAS) (doi:10.1073/pnas.2001064117; joint work with Grzesiek Siudem, Basia Żogała-Siudem and Ania Cena), we consider the mechanisms behind one’s research success as measured by one’s papers’ citability. By acknowledging that the perceived esteem might be a consequence not only of how valuable one’s works are but also of pure luck, we arrived at a model that can accurately recreate a citation record based on just three parameters: the number of publications, the total number of citations, and the degree of randomness in the citation patterns. As a by-product, we show that a single index will never be able to capture the complex reality of scientific impact. However, three of them can already provide us with a reliable summary. Read more…
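To make the three-parameter idea more tangible, here is a toy simulation in Python. It is only an illustrative sketch and not the model from the paper: the function name `simulate_citation_record`, the parameter `rho`, and the allocation rule (each citation goes to a paper picked either in proportion to its current citation count or uniformly at random) are all assumptions made for this example.

```python
import random


def simulate_citation_record(n_papers, n_citations, rho, seed=None):
    """Toy generator of a citation record (hypothetical, for illustration).

    n_papers    -- number of publications
    n_citations -- total number of citations to distribute
    rho         -- assumed share of "rich-get-richer" citations in [0, 1];
                   the remaining 1 - rho are assigned purely at random
    """
    if n_papers <= 0:
        raise ValueError("n_papers must be positive")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")

    rng = random.Random(seed)
    counts = [0] * n_papers
    received = []  # one entry (a paper index) per citation handed out so far

    for _ in range(n_citations):
        if received and rng.random() < rho:
            # Preferential step: sampling from `received` picks a paper
            # with probability proportional to its current citation count.
            paper = rng.choice(received)
        else:
            # Accidental step: every paper is equally likely.
            paper = rng.randrange(n_papers)
        counts[paper] += 1
        received.append(paper)

    # Report the record the usual way: most cited paper first.
    return sorted(counts, reverse=True)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9):
        record = simulate_citation_record(20, 500, rho, seed=42)
        print(rho, record[:5])
```

In this sketch, increasing `rho` while keeping the other two numbers fixed tends to concentrate citations on a few papers, which hints at why the same publication and citation totals can correspond to very differently shaped records.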
Benchmark Suite for Clustering Algorithms - Version 1
Let's aggregate, polish and standardise the existing clustering benchmark suites referred to across the machine learning and data mining literature! See our new Benchmark Suite for Clustering Algorithms.