Software
Shortcuts: • genieclust (Python/R) • stringi (R/1) • stringx (R) • realtest (R)
My developer (“social media”) profiles: • GitHub • StackOverflow

I’ve been programming since the age of 7; my first computer was the C64.
I believe most software should be free (and that good software is).
genieclust Package for Python and R
Fast and robust hierarchical clustering with noise point detection: Genie finds meaningful clusters and is fast even on large data sets.
Paper on the genieclust package in SoftwareX (doi:10.1016/j.softx.2021.100722)
Paper on the Genie algorithm in Information Sciences (doi:10.1016/j.ins.2016.05.003)
stringi Package for R
stringi (pronounced “stringy”, IPA [strinɡi]) is THE R package for very fast, portable, correct, consistent, and convenient string/text processing in any locale or character encoding. It is one of the most often downloaded packages on CRAN.
Paper on stringi in Journal of Statistical Software (in press; see the online version)
Other R Packages
genie
A New, Fast, and Outlier Resistant Hierarchical Clustering Algorithm (note that this package has been superseded by genieclust)
SimilaR
R Source Code Similarity Evaluation
(maintained by Maciej Bartoszuk)
Paper on the SimilaR package in the R Journal (doi:10.32614/RJ-2020-017)