Benchmark Suite for Clustering Algorithms - Version 1

Let's aggregate, polish and standardise the existing clustering benchmark suites referred to across the machine learning and data mining literature! See our new Benchmark Suite for Clustering Algorithms.

2020-02-23 book draft

Lightweight Machine Learning Classics with R

A first draft of my new textbook Lightweight Machine Learning Classics with R is now available.

About. Explore some of the most fundamental algorithms which have stood the test of time and provide the basis for innovative solutions in data-driven AI. Learn how to use the R language for implementing various stages of data processing and modelling activities. Appreciate mathematics as the universal language for formalising data-intense problems and communicating their solutions. The book is for you if you're yet to be fluent with university-level linear algebra, calculus and probability theory or you've forgotten all the maths you've ever learned, and are seeking a gentle, yet thorough, introduction to the topic.

2020-02-10 new paper

Genie+OWA: Robustifying Hierarchical Clustering with OWA-based Linkages

Check out the most recent paper by Anna Cena and me on the best hierarchical clustering algorithm in the world, Genie. It is going to appear in Information Sciences; doi:10.1016/j.ins.2020.02.025.

Abstract. We investigate the application of the Ordered Weighted Averaging (OWA) data fusion operator in agglomerative hierarchical clustering. The examined setting generalises the well-known single, complete and average linkage schemes. It allows expert knowledge to be embodied in the cluster merge process and provides a much wider range of possible linkages. We analyse various families of weighting functions on numerous benchmark data sets in order to assess their influence on the resulting cluster structure. Moreover, we inspect the correction for the inequality of cluster size distribution, similar to the one in the Genie algorithm. Our results demonstrate that by robustifying the procedure with the Genie correction, we can obtain a significant performance boost in terms of clustering quality. This is particularly beneficial in the case of the linkages based on the closest distances between clusters, including the single linkage and its "smoothed" counterparts. To explain this behaviour, we propose a new linkage process called three-stage OWA which yields further improvements. This way we confirm the intuition that hierarchical cluster analysis should rather take into account a few nearest neighbours of each point, instead of trying to adapt to their non-local neighbourhood.
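
To make the setting concrete, here is a minimal Python sketch of an OWA-based linkage. It is not the paper's implementation and leaves out the Genie correction entirely. It assumes Euclidean distances and the common convention of applying the weights to the values sorted from largest to smallest. All function names are hypothetical, and `k_smallest_weights` is just one simple illustration of a "smoothed" single linkage; the weighting families studied in the paper may differ.

```python
import math


def owa(values, weights):
    """Ordered weighted average: weights are applied to the values
    sorted from largest to smallest (assumed convention)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def owa_linkage(cluster_a, cluster_b, make_weights):
    """Aggregate all pairwise Euclidean distances between two clusters
    with an OWA operator whose weights come from make_weights(n)."""
    distances = [math.dist(p, q) for p in cluster_a for q in cluster_b]
    return owa(distances, make_weights(len(distances)))


# Hypothetical weight generators; each returns n weights summing to 1.
def single_weights(n):
    return [0.0] * (n - 1) + [1.0]   # all weight on the smallest distance


def complete_weights(n):
    return [1.0] + [0.0] * (n - 1)   # all weight on the largest distance


def average_weights(n):
    return [1.0 / n] * n             # equal weight on every distance


def k_smallest_weights(k):
    """Illustrative 'smoothed' single linkage: mean of the k smallest distances."""
    def make(n):
        m = min(k, n)
        return [0.0] * (n - m) + [1.0 / m] * m
    return make


cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

for name, make_weights in [
    ("single", single_weights),
    ("complete", complete_weights),
    ("average", average_weights),
    ("2 smallest", k_smallest_weights(2)),
]:
    print(name, owa_linkage(cluster_a, cluster_b, make_weights))
```

With these weight vectors, the first three calls reproduce the minimum, maximum and arithmetic mean of the pairwise distances, which is the sense in which the OWA setting generalises the single, complete and average linkages.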

2019-12-11 new paper

DC optimization for constructing discrete Sugeno integrals and learning nonadditive measures

We (Gleb Beliakov, Simon James and I) have had another paper accepted for publication, this time in the journal Optimization; doi:10.1080/02331934.2019.1705300.

Abstract. Defined solely by means of order-theoretic operations meet (min) and join (max), weighted lattice polynomial functions are particularly useful for modeling data on an ordinal scale. A special case, the discrete Sugeno integral, defined with respect to a nonadditive measure (a capacity), enables accounting for the interdependencies between input variables.

However, until recently, the problem of identifying the fuzzy measure values with respect to various objectives and requirements has not received a great deal of attention. By expressing the learning problem as the difference of convex functions, we are able to apply DC (difference of convex) optimization methods. Here we formulate one of the global optimization steps as a local linear programming problem and investigate the improvement under different conditions.
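
For readers who have not met the discrete Sugeno integral before, the following Python sketch evaluates it using only min and max. It is an illustration, not code from the paper, and it does not touch the DC optimization part. It assumes inputs and measure values in [0, 1], and a capacity stored as a dictionary from `frozenset` of indices to a value; the function names and the example capacity are made up for this purpose.

```python
from itertools import combinations


def is_capacity(capacity, n):
    """Check the boundary conditions and monotonicity of a capacity on {0, ..., n-1}."""
    subsets = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]
    if capacity[frozenset()] != 0.0 or capacity[frozenset(range(n))] != 1.0:
        return False
    return all(capacity[a] <= capacity[b] for a in subsets for b in subsets if a <= b)


def sugeno_integral(x, capacity):
    """Discrete Sugeno integral: max over i of min(x_(i), mu(indices with inputs >= x_(i)))."""
    order = sorted(range(len(x)), key=lambda i: x[i])  # indices by increasing input
    result = 0.0  # valid starting point because inputs are assumed to lie in [0, 1]
    for k, i in enumerate(order):
        upper_set = frozenset(order[k:])  # indices whose inputs are at least x[i]
        result = max(result, min(x[i], capacity[upper_set]))
    return result


# Made-up capacity on three inputs (nonadditive: mu({0, 1}) != mu({0}) + mu({1})).
mu = {
    frozenset(): 0.0,
    frozenset({0}): 0.3,
    frozenset({1}): 0.4,
    frozenset({2}): 0.2,
    frozenset({0, 1}): 0.8,
    frozenset({0, 2}): 0.5,
    frozenset({1, 2}): 0.6,
    frozenset({0, 1, 2}): 1.0,
}
x = [0.9, 0.5, 0.7]

value = sugeno_integral(x, mu)
print(is_capacity(mu, 3), value)

# Because only min and max are involved, an increasing rescaling applied to both
# the inputs and the measure simply rescales the output in the same way.
squared = sugeno_integral([v ** 2 for v in x], {s: m ** 2 for s, m in mu.items()})
print(squared, value ** 2)
```

The last two lines illustrate why the integral suits ordinal data: squaring the inputs and the measure values changes the numbers but not their order, and the result is the square of the original output.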


IEEE WCCI 2020 Special Session - Aggregation Structures: New Trends and Applications

Call for contributions: IEEE World Congress on Computational Intelligence (WCCI) 2020, Glasgow, Scotland. FUZZ-IEEE-6 Special Session on Aggregation Structures: New Trends and Applications.

2019-11-14 new paper

Robust fitting for the Sugeno integral with respect to general fuzzy measures

The editor of Information Sciences has just let us know that a paper by Gleb Beliakov, Simon James and me will be published in this journal.

Abstract. The Sugeno integral is an expressive aggregation function with potential applications across a range of decision contexts. Its calculation requires only the lattice minimum and maximum operations, making it particularly suited to ordinal data and robust to scale transformations. However, for practical use in data analysis and prediction, we require efficient methods for learning the associated fuzzy measure. While such methods are well developed for the Choquet integral, the fitting problem is more difficult for the Sugeno integral because it is not amenable to being expressed as a linear combination of weights, and more generally due to plateaus and non-differentiability in the objective function. Previous research has hence focused on heuristic approaches or simplified fuzzy measures. Here we show that the problem of fitting the Sugeno integral to data such that the maximum absolute error is minimized can be solved using an efficient bilevel program. This method can be incorporated into algorithms that learn fuzzy measures with the aim of minimizing the median residual. This equips us with tools that make the Sugeno integral a feasible option in robust data regression and analysis. We provide experimental comparison with a genetic algorithms approach and an example in data analysis.


Deakin University

On 23 September 2019, I commence as a Senior Lecturer in Applied Artificial Intelligence at Deakin University in Melbourne-Burwood, Australia (an Australian senior lecturer is supposed to be equivalent to an associate professor in the US).

2019-09-10 new paper

Constrained ordered weighted averaging aggregation with multiple comonotone constraints

Lucian Coroianu, Robert Fullér, Simon James and I have had a paper accepted in Fuzzy Sets and Systems. Abstract below.

Abstract. The constrained ordered weighted averaging (OWA) aggregation problem arises when we aim to maximize or minimize a convex combination of order statistics under linear inequality constraints that act on the variables with respect to their original sources. The standalone approach to optimizing the OWA under constraints is to consider all permutations of the inputs, which quickly becomes infeasible when there are more than a few variables. However, in certain cases we can take advantage of the relationships amongst the constraints and the corresponding solution structures. For example, we can consider a land-use allocation satisfaction problem with an auxiliary aim of balancing land-types, whereby the response curves for each species are non-decreasing with respect to the land-types. This results in comonotone constraints, which allow us to drastically reduce the complexity of the problem.
In this paper, we show that if we have an arbitrary number of constraints that are comonotone (i.e., they share the same ordering permutation of the coefficients), then the optimal solution occurs for decreasing components of the solution. After investigating the form of the solution in some special cases and providing theoretical results that shed light on the form of the solution, we detail practical approaches to solving and give real-world examples.
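
The role of comonotonicity can be checked by brute force on a toy problem. The Python sketch below is not the paper's method: it assumes a maximization problem, non-negative integer candidate values on a small grid, and two made-up constraints whose coefficients are both non-decreasing (hence comonotone). It compares a search over all grid points with a search restricted to vectors with non-increasing components. All names and numbers are hypothetical.

```python
from itertools import product


def owa(values, weights):
    """Ordered weighted average: weights applied to values sorted from largest to smallest."""
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def feasible(x, constraints):
    """Check every linear constraint sum(a_i * x_i) <= bound."""
    return all(
        sum(a * xi for a, xi in zip(coeffs, x)) <= bound
        for coeffs, bound in constraints
    )


def best_on_grid(weights, constraints, grid, decreasing_only=False):
    """Exhaustively maximise the OWA over grid points that satisfy the constraints.
    With decreasing_only=True, only vectors with x_1 >= x_2 >= ... are examined."""
    best_value, best_point = None, None
    for x in product(grid, repeat=len(weights)):
        if decreasing_only and any(x[i] < x[i + 1] for i in range(len(x) - 1)):
            continue
        if not feasible(x, constraints):
            continue
        value = owa(x, weights)
        if best_value is None or value > best_value:
            best_value, best_point = value, x
    return best_value, best_point


weights = (0.5, 0.3, 0.2)
# Both coefficient vectors are non-decreasing, so they share an ordering permutation.
constraints = [((1, 2, 3), 12), ((2, 3, 4), 20)]
grid = range(7)

full_value, full_point = best_on_grid(weights, constraints, grid)
restricted_value, restricted_point = best_on_grid(
    weights, constraints, grid, decreasing_only=True
)
print(full_value, full_point)
print(restricted_value, restricted_point)
```

The two searches reach the same optimal value. The reason is elementary here: sorting a feasible point so that its largest components meet the smallest coefficients leaves the OWA unchanged and, by the rearrangement inequality, cannot increase the left-hand side of any of the constraints. Restricting attention to one ordering is what removes the need to enumerate permutations.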