# News

• 2022-07-16 - Minimalist Data Wrangling with Python

I’ve completed the textbook on data wrangling with Python. This work is, and will remain, available for everyone’s enjoyment, because I believe that education should be free for all. Just like open-source software, more open-access textbooks are urgently needed. Free == independent == high quality.

• Gleb Beliakov, Simon James, and I will have another paper published in Fuzzy Sets and Systems (doi:10.1016/j.fss.2022.06.025).

• A new paper of mine (coauthors: Agnieszka Geras, Grzesiek Siudem) will appear in the Journal of the Association for Information Science and Technology (doi:10.1002/asi.24658).

• 2022-03-15 - Ockham’s Index of Scientific Impact

A new paper of mine (co-authored by Basia Żogała-Siudem, Grzesiek Siudem, and Ania Cena) will appear in Scientometrics (doi:10.1007/s11192-022-04345-2).

• 2022-02-26 - Validating Citation Models by Proxy Indices

Accepted for publication in the Journal of Informetrics: a new paper by Ania Cena, Basia Żogała-Siudem, Grzesiek Siudem, and yours truly (doi:10.1016/j.joi.2022.101267).

• 2022-02-19 - Ministry of Education and Science Award

Together with a number of excellent colleagues, I have received the Ministry of Education and Science, Poland, award for significant achievements in teaching, for the design and implementation of a new innovative course of study – Master of Data Science – at the Faculty of Mathematics and Information Science, Warsaw University of Technology.

• 2022-02-04 - Invited Lecture at FSTA 2022

Today I’m giving an invited lecture (online…) at FSTA 2022, entitled Clustering and aggregation. We will examine a few scenarios where aggregation methods can aid in the construction, analysis, and evaluation of data clustering tools, including linkage criteria, partition similarity measures, and cluster validity indices. We’ll also indicate some noteworthy challenges for future theoretical and practical research.

• 2021-10-08 - Homepage Rewrite

My website is now generated with Sphinx.

• 2021-10-06 - Are Cluster Validity Measures (In)valid?

To appear in Information Sciences — a paper coauthored by Maciek Bartoszuk and Ania Cena (doi:10.1016/j.ins.2021.10.004).

• 2021-09-27 - Paper on stringi

A paper on my stringi package has been accepted for publication in the Journal of Statistical Software (doi:10.18637/jss.v103.i02).

• A new paper by Maciek Bartoszuk and me is to appear in Knowledge-Based Systems (doi:10.1016/j.knosys.2021.107427).

• English is the native language of only 5% of the world’s population. Also, only 17% of us can understand this text. Moreover, the Latin alphabet is the main one for merely 36% of the total. The early computer era, now a very long time ago, was dominated by the US. Due to the proliferation of the internet, smartphones, social media, and other technologies and communication platforms, this is no longer the case. The stringx package replaces base R string functions (such as grep(), tolower(), and sprintf()) with ones that fully support the Unicode standards related to natural language processing, fixes some long-standing inconsistencies, and introduces some new, useful features. Thanks to ICU (International Components for Unicode) and stringi, the replacement functions are fast, reliable, and portable across different platforms. Now available from CRAN.

• 2021-07-14 - stringi 1.7.2

Another major update of stringi brings a rewritten version of stri_sprintf, support for custom rule-based transliteration, extraction of named regex capture groups, and many other enhancements.

• 2021-06-17 - realtest 0.2.1 on CRAN

An update to realtest is now available.

• realtest is a unit testing framework for realistic minimalists, where we distinguish between expected, acceptable, current, fallback, ideal, and regressive behaviour. It can also be used for monitoring other software projects for changes. Now available on CRAN.

• 2021-05-27 - Paper on the genieclust Python+R package

genieclust: Fast and robust hierarchical clustering was accepted for publication in SoftwareX (doi:10.1016/j.softx.2021.100722).
