genieclust 1.3.0 on CRAN and PyPI

A new version of genieclust is now available on CRAN and PyPI.

Changelog:

  • The package was heavily refactored; common MST-related functions and classes as well as functions from the tools and plots modules were moved to the new deadwood package, which is now required.

  • [BACKWARD INCOMPATIBILITY] Outlier detection based solely on whether a node is a leaf of a minimum spanning tree w.r.t. some mutual reachability distance turned out to be subpar in more detailed experiments, especially for smaller smoothing factors. Note that in previous versions of the package, this feature was deemed merely experimental. Hence, detect_noise in genie.default and skip_leaves, preprocess, and postprocess elsewhere are no longer available. Use the more universal deadwood package instead.

  • [BACKWARD INCOMPATIBILITY] quitefastmst version >= 0.9.1 is now required; the backward-incompatible changes it introduced have been addressed. In particular, the definition of mutual reachability distances has changed. Unlike in Campello et al.’s 2013 paper, the core distance is now the distance to the M-th nearest neighbour, not the (M-1)-th one (in both cases not counting the point itself).

  • [Python] [BACKWARD INCOMPATIBILITY] The internal module was renamed core.

  • [BACKWARD INCOMPATIBILITY] Deprecated functions such as mst_from_nn have been removed.

  • [Python] [BACKWARD INCOMPATIBILITY] compute_full_tree is now always True.

  • [BUGFIX] #92: Passing a non-square confusion matrix to normalized_pivoted_accuracy and normalized_clustering_accuracy now yields an error, as such objects are not yet supported.

  • [R] gclust and genie now return the computed MST via the mst object attribute. genie returns an object of the class mstclust. This makes it interoperable with deadwood.

  • [Python] [BUGFIX] Modifying quitefastmst_params via set_state now invalidates the cached MST.

  • [Python] [NEW FEATURE] plots.plot_scatter has new arguments: asp, markers, and colours. The module globals mrk and col were renamed accordingly. However, as mentioned above, plots was moved to deadwood.

  • [Python] [BACKWARD INCOMPATIBILITY] compute_all_cuts in Genie was renamed coarser. If True, labels_ is still a vector representing the requested n_clusters-partition. The coarser-grained labels are now stored in labels_matrix_, whose i-th row represents an (i+1)-partition.
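
To illustrate the change in the core distance convention mentioned in the quitefastmst item above, here is a minimal NumPy sketch. It is not the quitefastmst or genieclust implementation: the function names core_distances and mutual_reachability and the include_self flag are hypothetical, and it assumes the usual mutual reachability definition max(d(i,j), c(i), c(j)) computed from a full pairwise distance matrix.

```python
import numpy as np


def core_distances(D, M, include_self=False):
    """Core distances from a full pairwise distance matrix D (zero diagonal).

    include_self=False: distance to the M-th nearest neighbour, not counting
    the point itself (the convention described in this changelog).
    include_self=True: the point counts as its own first neighbour, which
    gives the (M-1)-th neighbour not counting self (the earlier convention).
    """
    # After sorting each row, column 0 holds the zero distance to self.
    D_sorted = np.sort(D, axis=1)
    k = M - 1 if include_self else M
    return D_sorted[:, k]


def mutual_reachability(D, M, include_self=False):
    """Assumed definition: max(d(i, j), c(i), c(j)) for i != j, 0 on the diagonal."""
    c = core_distances(D, M, include_self)
    R = np.maximum(D, np.maximum(c[:, None], c[None, :]))
    np.fill_diagonal(R, 0.0)
    return R


# Four points on a line; D holds their pairwise absolute differences.
X = np.array([[0.0], [1.0], [3.0], [7.0]])
D = np.abs(X - X.T)

new_core = core_distances(D, M=2)                     # M-th neighbour
old_core = core_distances(D, M=2, include_self=True)  # (M-1)-th neighbour
R_new = mutual_reachability(D, M=2)
```

For the same M, the new convention yields core distances that are at least as large as before, so results obtained with a given M in earlier versions correspond roughly to M-1 under the new definition.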