News
2024-07-25 - Equivalence of inequality indices in the three-dimensional model of informetric impact
Our new contribution (by Lucio Bertoli-Barsotti, Marek Gagolewski, Grzegorz Siudem, and Barbara Żogała-Siudem), Equivalence of inequality indices in the three-dimensional model of informetric impact, was accepted for publication in the Journal of Informetrics (DOI:10.1016/j.joi.2024.101566).
2024-06-20 - Clustering with minimum spanning trees: How good can it be?
Clustering with minimum spanning trees: How good can it be? (a joint work with Anna Cena, Maciej Bartoszuk, and Łukasz Brzozowski) will appear in the Journal of Classification (DOI:10.1007/s00357-024-09483-1).
2024-06-11 - Normalised clustering accuracy: An asymmetric external cluster validity measure
My paper on a new well-behaved external cluster validity measure will appear in the Journal of Classification (DOI:10.1007/s00357-024-09482-2).
2024-01-08 - Gini-stable Lorenz curves and their relation to the generalised Pareto distribution
Journal of Informetrics will publish a new paper by Lucio Bertoli-Barsotti, Grzegorz Siudem, Barbara Żogała-Siudem, and yours truly (DOI:10.1016/j.joi.2024.101499).
2024-01-04 - Random generation of linearly constrained fuzzy measures and domain coverage performance evaluation
Jian-Zhang Wu, Gleb Beliakov, Simon James, and I published a new paper in Information Sciences (DOI:10.1016/j.ins.2023.120080).
2023-11-09 - stringi 1.8.1 on CRAN
2023-10-03 - Hierarchical clustering with OWA-based linkages, the Lance-Williams formula, and dendrogram inversions
A paper by Anna Cena, Simon James, Gleb Beliakov, and me entitled Hierarchical Clustering with OWA-based Linkages, the Lance-Williams Formula, and Dendrogram Inversions has been accepted for publication in Fuzzy Sets and Systems (DOI:10.1016/j.fss.2023.108740). A preprint is available on arXiv.
2023-06-28 - Deep R Programming v1.0.0
The final version of Deep R Programming is now available.
2023-03-23 - Submitted: Community Detection in Complex Networks
An early version of my most recent paper Community detection in complex networks via node similarity, graph representation learning, and hierarchical clustering is now available on arXiv.
2023-01-27 - A Benchmark-type Generalisation of the Sugeno Integral with Applications in Bibliometrics
New paper by Michał Boczek, Marek Kaluszka, Andrzej Okolewski, and yours truly to appear in Fuzzy Sets and Systems (DOI: 10.1016/j.fss.2023.01.014).
2022-12-28 - Deep R Programming (First Draft)
I’ve released an early draft of my new textbook Deep R Programming – the first 12 chapters. It is a comprehensive course on one of the most popular languages in data science (statistical computing, graphics, machine learning, data wrangling and analytics). It introduces the base language in-depth and is aimed at ambitious students, practitioners, and researchers who would like to become independent users of this powerful environment.
2022-11-16 - A Framework for Benchmarking Clustering Algorithms
A paper related to my framework for benchmarking clustering algorithms will appear in SoftwareX (DOI: 10.1016/j.softx.2022.101270). Its preprint is available on arXiv. The project also has a dedicated website: https://clustering-benchmarks.gagolewski.com.
2022-11-15 - Interpretable Reparameterisations of Citation Models
To be published in the Journal of Informetrics: a new paper by Barbara Żogała-Siudem, Anna Cena, Greg Siudem, and me (DOI: 10.1016/j.joi.2022.101355).
2022-10-14 - Accidentality in Journal Citation Patterns
Maciej J. Mrowiński, Grzesiek Siudem, and I will have another contribution in the Journal of Informetrics (DOI: 10.1016/j.joi.2022.101341).
2022-09-05 - genieclust 1.1.0 on PyPI and CRAN
A new release of the genieclust package is available on PyPI and CRAN.
2022-08-24 - Minimalist Data Wrangling with Python – Paperback Available
A printed version of my open-access textbook Minimalist Data Wrangling with Python can now be ordered from Amazon. It is exactly the same as the freely available PDF version.
2022-08-10 - Power Laws, the Price Model, and the Pareto type-2 Distribution
A new contribution of ours (with Grzesiek Siudem and Przemysław Nowak) will appear in Physica A: Statistical Mechanics and its Applications (preprint; DOI: 10.1016/j.physa.2022.128059).
2022-08-08 - genieclust 1.0.1 on PyPI and CRAN
A new release of genieclust has been published on PyPI and CRAN.
2022-07-16 - Minimalist Data Wrangling with Python
I’ve completed a textbook on data wrangling with Python. This work is, and will remain, available for everyone’s enjoyment, because I believe that education should be free for all. Just like open-source software, more open-access textbooks are urgently needed. Free == independent == higher quality.
2022-07-11 - stringi 1.7.8 on CRAN
2022-07-02 - Reduction of Variables and Constraints in Fitting Antibuoyant Fuzzy Measures to Data Using Linear Programming
Gleb Beliakov, Simon James, and I will have another paper published in Fuzzy Sets and Systems (DOI: 10.1016/j.fss.2022.06.025).
2022-05-04 - Time to Vote: Temporal Clustering of User Activity on Stack Overflow
A new paper of mine (coauthors: Agnieszka Geras, Grzesiek Siudem) will appear in the Journal of the Association for Information Science and Technology (DOI: 10.1002/asi.24658).
2022-03-15 - Ockham’s Index of Scientific Impact
A new paper of mine (co-authored by Basia Żogała-Siudem, Grzesiek Siudem, and Ania Cena) will appear in Scientometrics (DOI: 10.1007/s11192-022-04345-2).
2022-02-26 - Validating Citation Models by Proxy Indices
Accepted for publication in Journal of Informetrics: a new paper by Ania Cena, Basia Żogała-Siudem, Grzesiek Siudem, and yours truly (DOI: 10.1016/j.joi.2022.101267).
2022-02-19 - Ministry of Education and Science Award
Together with a number of excellent colleagues, I have received the Ministry of Education and Science, Poland, award for significant achievements in teaching, for the design and implementation of a new innovative course of study – Master of Data Science – at the Faculty of Mathematics and Information Science, Warsaw University of Technology.
2022-02-04 - Invited Lecture at FSTA 2022
I’m giving (online…) an invited lecture at FSTA 2022 today entitled Clustering and aggregation, where we will examine a few scenarios where aggregation methods can aid in the construction, analysis, and evaluation of tools related to data clustering, including linkage criteria, partition similarity measures, and cluster validity indices. We’ll also indicate some noteworthy challenges for both theoretical and practical future research endeavours.
2021-10-08 - Homepage Rewrite
My website is now generated with Sphinx.
2021-10-06 - Are Cluster Validity Measures (In)valid?
To appear in Information Sciences — a paper coauthored by Maciek Bartoszuk and Ania Cena (doi:10.1016/j.ins.2021.10.004).
2021-09-27 - Paper on stringi
A paper on my stringi package has been accepted for publication in Journal of Statistical Software (doi:10.18637/jss.v103.i02).
2021-08-26 - T-norms or t-conorms? How to aggregate similarity degrees for plagiarism detection
A new paper by Maciek Bartoszuk and me is to appear in Knowledge-Based Systems (doi:10.1016/j.knosys.2021.107427).
2021-07-29 - stringx: Drop-in replacements for base R string functions powered by stringi
English is the native language for only 5% of the World population. Also, only 17% of us can understand this text. Moreover, the Latin alphabet is the main one for merely 36% of the total. The early computer era, now a very long time ago, was dominated by the US. Due to the proliferation of the internet, smartphones, social media, and other technologies and communication platforms, this is no longer the case. The stringx package replaces base R string functions (such as grep(), tolower(), and sprintf()) with ones that fully support the Unicode standards related to natural language processing, fixes some long-standing inconsistencies, and introduces some new, useful features. Thanks to ICU (International Components for Unicode) and stringi, they are fast, reliable, and portable across different platforms. Now available from CRAN.
2021-07-14 - stringi 1.7.2
Another major update of stringi brings a rewritten version of stri_sprintf, support for custom rule-based transliteration, extraction of named regex capture groups, and many other enhancements.
2021-06-17 - realtest 0.2.1 on CRAN
An update to realtest is now available.
2021-06-04 - realtest: When Expectations Meet Reality: Realistic Unit Testing in R
realtest is a framework for unit testing for realistic minimalists, where we distinguish between expected, acceptable, current, fallback, ideal, or regressive behaviour. It can also be used for monitoring other software projects for changes. Now available on CRAN.
2021-05-27 - Paper on the genieclust Python+R package
genieclust: Fast and robust hierarchical clustering was accepted for publication in SoftwareX (doi:10.1016/j.softx.2021.100722).