Marek Gagolewski

Dr habil. Marek Gagolewski

(Pronounced like Mark Gaggle-Eve-Ski)

Senior Lecturer in Applied Artificial Intelligence
School of Information Technology, Deakin University
Melbourne-Burwood Campus, Room T2.20, 221 Burwood Hwy, Burwood, VIC 3125, Australia
Associate Professor in Data Science (on long-term leave)
Faculty of Mathematics and Information Science, Warsaw University of Technology
ul. Koszykowa 75, 00-662 Warsaw, Poland
Emails (pick one – and only one):
marekgagolewski·com (main)
m.gagolewskideakin·edu·au (academic)
See also: CV, ORCID (0000-0003-0637-6028)


Researcher in Data Science

  • Research interests: Machine learning, data clustering, data fusion, learning to aggregate data, computational statistics, usable statistical software, statistical modelling for informetrics, sports analytics, science of science.
  • Author or co-author of 74 publications (see featured papers), including:
    • 32 journal papers (in outlets such as Proceedings of the National Academy of Sciences (PNAS), Information Fusion, Information Sciences, IEEE Transactions on Fuzzy Systems, R Journal, Journal of Informetrics, and Statistical Modelling),
    • 34 papers in proceedings of international conferences,
    • 8 research monographs, textbooks and edited volumes.
  • Current h-index = 14 (Google Scholar) / 10 (Scopus) / 10 (Web of Science).

Open Source Machine Learning, Data Analysis and Scientific Software Developer (Python, C, C++, R, etc.)

Data Science, Machine Learning, and Statistical Computing Tutor & Trainer

Recent News

2020-09-09 software

R package stringi 1.5.3 released

A new, major release of my R package stringi brings quite a few new features and bug fixes. Read more…

2020-09-07 software

Tutorial on stringi

A comprehensive tutorial on the stringi package is now available.

2020-08-17 software

stringi Has a New Website

I have created a new home(page) for my stringi package.

2020-07-31 software

Python and R package genieclust 0.9.4

A reimplementation of my robust hierarchical clustering algorithm Genie is now available on PyPI and CRAN. It is now even faster and equipped with many more features, including noise point detection. More details, documentation, benchmarks, and tutorials are available online.

2020-07-08 new paper

Paper on SimilaR in R Journal

SimilaR: R Code Clone and Plagiarism Detection by Maciej Bartoszuk and me has been accepted for publication in the R Journal. Read more…

2020-06-08 new paper

Paper in PNAS: Three Dimensions of Scientific Impact

In a paper recently published in the Proceedings of the National Academy of Sciences of the United States of America (PNAS) (doi:10.1073/pnas.2001064117; joint work with Grzesiek Siudem, Basia Żogała-Siudem and Ania Cena), we consider the mechanisms behind one’s research success as measured by the citability of one’s papers. By acknowledging that perceived esteem might be a consequence not only of how valuable one’s works are but also of pure luck, we arrived at a model that can accurately recreate a citation record based on just three parameters: the number of publications, the total number of citations, and the degree of randomness in the citation patterns. As a by-product, we show that a single index will never be able to capture the complex reality of scientific impact. However, three of them can already provide a reliable summary. Read more…


Benchmark Suite for Clustering Algorithms - Version 1

Let's aggregate, polish and standardise the existing clustering benchmark suites referred to across the machine learning and data mining literature! See our new Benchmark Suite for Clustering Algorithms.

2020-02-23 book draft

Lightweight Machine Learning Classics with R

A first draft of my new textbook Lightweight Machine Learning Classics with R is now available. Read more…
