genie (R) and genieclust (Python) packages
The Fast & Robust Genie Hierarchical Clustering Algorithm
The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. For larger data sets, this constraint puts all the classical linkage criteria except single linkage at a disadvantage. However, the single linkage clustering algorithm is known to be very sensitive to outliers and to produce highly skewed dendrograms; it therefore usually does not reflect the true underlying data structure unless the clusters are well-separated.
To overcome its limitations, we proposed a new hierarchical clustering linkage criterion called Genie. Namely, our algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini or Bonferroni index) of the cluster sizes does not increase drastically above a given threshold.
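To make the inequity criterion concrete, here is a minimal sketch of a normalized Gini index computed over cluster sizes, together with a threshold check. It is an illustration only, not the code used by the packages. The function names `gini_index` and `restrict_to_smallest` and the threshold value 0.3 are assumptions chosen for this example.

```python
from itertools import combinations


def gini_index(sizes):
    """Normalized Gini index of a list of cluster sizes.

    Returns 0.0 when all clusters have equal size and approaches 1.0
    when a single cluster holds almost all of the points.
    """
    n = len(sizes)
    total = sum(sizes)
    if n < 2 or total == 0:
        return 0.0
    # Sum of absolute differences over all unordered pairs of clusters.
    pairwise = sum(abs(a - b) for a, b in combinations(sizes, 2))
    return pairwise / ((n - 1) * total)


def restrict_to_smallest(sizes, threshold=0.3):
    """Hypothetical helper: True when the inequity exceeds the threshold,
    i.e. when the next merge should be steered towards a smallest cluster
    instead of simply joining the two closest clusters."""
    return gini_index(sizes) > threshold


balanced = [25, 25, 25, 25]   # evenly sized clusters
skewed = [1, 1, 1, 97]        # typical single-linkage outcome with outliers

print(gini_index(balanced))   # 0.0
print(gini_index(skewed))     # 0.96
print(restrict_to_smallest(balanced), restrict_to_smallest(skewed))
```

The skewed partition, in which three singletons sit next to one large cluster, is the kind of result that single linkage tends to produce in the presence of outliers. Its index is far above the example threshold, so a Genie-style rule would stop it from growing more unbalanced.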
Benchmarks indicate a high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of clustering quality while retaining the speed of single linkage. The algorithm is easily parallelizable and thus may be run on multiple threads to speed up its execution further. Its memory overhead is small: there is no need to precompute the complete distance matrix in order to obtain a desired clustering.
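The remark about memory can be illustrated with a single-linkage-style computation. The sketch below builds a minimum spanning tree with Prim's algorithm and evaluates each distance on demand, so it stores only O(n) values and never holds the full n-by-n distance matrix. It is a simplified illustration under our own assumptions (Euclidean distance, a hypothetical function name `mst_prim`) and not the implementation found in the packages.

```python
import math


def mst_prim(points):
    """Return the minimum spanning tree of `points` as (weight, i, j) edges.

    Distances are computed when needed; only three length-n arrays are kept.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # distance from each point to the tree so far
    best_from = [-1] * n         # tree vertex that realizes that distance
    edges = []
    current = 0
    in_tree[current] = True
    for _ in range(n - 1):
        nxt = -1
        for j in range(n):
            if in_tree[j]:
                continue
            # Compute the distance to the most recently added vertex only.
            d = math.dist(points[current], points[j])
            if d < best_dist[j]:
                best_dist[j] = d
                best_from[j] = current
            if nxt == -1 or best_dist[j] < best_dist[nxt]:
                nxt = j
        edges.append((best_dist[nxt], best_from[nxt], nxt))
        in_tree[nxt] = True
        current = nxt
    return edges


edges = mst_prim([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
print(edges)
```

Each of the n - 1 passes computes at most n - 1 distances, so the run time stays quadratic while the memory use is linear in the number of points.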
The algorithm is described in detail in the following paper: Gagolewski M., Bartoszuk M., Cena A., Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm, Information Sciences 363, 2016, pp. 8-23, doi:10.1016/j.ins.2016.05.003.
See also: Gagolewski M., Cena A., Bartoszuk M., Hierarchical clustering via penalty-based aggregation and the Genie approach, In: Torra V. et al. (Eds.), Modeling Decisions for Artificial Intelligence (Lecture Notes in Artificial Intelligence 9880), Springer, 2016, pp. 191-202, doi:10.1007/978-3-319-45656-0_16.
genie
Metadata
Type:  R package
Authors:  Marek Gagolewski, Maciej Bartoszuk, Anna Cena 
Maintainer:  Marek Gągolewski 
CRAN entry:  http://cran.r-project.org/web/packages/genie/
Github:  https://github.com/gagolews/genie 
License:  GPL (≥ 3) 
Installation in R :  install.packages("genie") 
Changelog:  See the NEWS file 
System requirements:  R (≥ 3.1.0), OpenMP, C++11 
Documentation:  the online manual; the paper on the Genie algorithm; another paper with some extensions and discussion
genieclust
Metadata
Type:  Python package
Authors:  Marek Gagolewski 
Maintainer:  Marek Gągolewski 
PyPI entry:  https://pypi.org/project/genieclust/ 
Github:  https://github.com/gagolews/genieclust 
License:  BSD 3-Clause "New" or "Revised" License
System requirements:  Python 3.6+ together with sklearn, numpy, scipy, matplotlib, and cython. 