Abstract. The use of supervised learning techniques for fitting weights and/or generator functions of weighted quasi-arithmetic means – a special class of idempotent and nondecreasing aggregation functions – to empirical data has already been considered in a number of papers. Nevertheless, there are still some important issues that have not been discussed in the literature yet. In the first part of this two-part contribution we deal with the concept of regularization, a quite standard technique from machine learning applied so as to increase the fit quality on test and validation data samples. Due to the constraints on the weighting vector, it turns out that quite different methods can be used in the current framework, as compared to regression models. Moreover, it is worth noting that so far fitting weighted quasi-arithmetic means to empirical data has only been performed approximately, via the so-called linearization technique. In this paper we consider exact solutions to such special optimization tasks and indicate cases where linearization leads to much worse solutions.
Keywords. Aggregation functions, weighted quasi-arithmetic means, least squares fitting, regularization, linearization
Abstract. The use of supervised learning techniques for fitting weights and/or generator functions of weighted quasi-arithmetic means – a special class of idempotent and nondecreasing aggregation functions – to empirical data has already been considered in a number of papers. Nevertheless, there are still some important issues that have not been discussed in the literature yet. In the second part of this two-part contribution we deal with a quite common situation in which we have inputs coming from different sources, describing a similar phenomenon, but which have not been properly normalized. In such a case, idempotent and nondecreasing functions cannot be used to aggregate them unless proper pre-processing is performed. The proposed idempotization method, based on the notion of B-splines, allows for an automatic calibration of independent variables. The introduced technique is applied in an R source code plagiarism detection system.
Keywords. Aggregation functions, weighted quasi-arithmetic means, least squares fitting, idempotence
Abstract. We discuss a generalization of the fuzzy (weighted) k-means clustering procedure and point out its relationships with data aggregation in spaces equipped with arbitrary dissimilarity measures. In the proposed setting, a data set partitioning is performed based on the notion of points' proximity to generic distance-based penalty minimizers. Moreover, a new data classification algorithm, resembling the k-nearest neighbors scheme but less computationally and memory demanding, is introduced. Rich examples in complex data domains indicate the usability of the methods and aggregation theory in general.
Keywords. fuzzy k-means algorithm, clustering, classification, fusion functions, penalty minimizers
geniepackage for R (co-authors: Maciej Bartoszuk and Anna Cena). A detailed description of the algorithm will be available in a forthcoming paper of ours.
Abstract: The famous Hirsch index has been introduced just ca. 10 years ago. Despite that, it is already widely used in many decision making tasks, like in evaluation of individual scientists, research grant allocation, or even production planning. It is known that the h-index is related to the discrete Sugeno integral and the Ky Fan metric introduced in 1940s. The aim of this paper is to propose a few modifications of this index as well as other fuzzy integrals -- also on bounded chains -- that lead to better discrimination of some types of data that are to be aggregated. All of the suggested compensation methods try to retain the simplicity of the original measure.
Abstract: The Hirsch's h-index is perhaps the most popular citation-based measure of the scientific excellence. In 2013 G. Ionescu and B. Chopard proposed an agent-based model for this index to describe a publications and citations generation process in an abstract scientific community. With such an approach one can simulate a single scientist's activity, and by extension investigate the whole community of researchers. Even though this approach predicts quite well the h-index from bibliometric data, only a solution based on simulations was given. In this paper, we complete their results with exact, analytic formulas. What is more, due to our exact solution we are able to simplify the Ionescu-Chopard model which allows us to obtain a compact formula for h-index. Moreover, a simulation study designed to compare both, approximated and exact, solutions is included. The last part of this paper presents evaluation of the obtained results on a real-word data set.
The proceedings of IPMU 2016 will be published in Communications in Computer and Information Science (CCIS) with Springer. Papers must be prepared in the LNCS/CCIS one-column page format. The length of papers is 12 pages in this special LaTeX2e format. For the details of submission click here.Please feel free to disseminate this information to other researchers that may potentially be interested in the session. For the details on the Session click here.
Notable changes since v0.5-2:
* [GENERAL] #88: C++ API is now available for use in, e.g., Rcpp packages, see https://github.com/Rexamine/ExampleRcppStringi for an example. * [BUGFIX] #183: Floating point exception raised in `stri_sub()` and `stri_sub<-()` when `to` or `length` was a zero-length numeric vector. * [BUGFIX] #180: `stri_c()` warned incorrectly (recycling rule) when using more than 2 elements.
Abstract: In this paper, we study the efficacy of the official ranking for international football teams compiled by FIFA, the body governing football competition around the globe. We present strategies for improving a team's position in the ranking. By combining several statistical techniques we derive an objective function in a decision problem of optimal scheduling of future matches. The presented results display how a team's position can be improved. Along the way, we compare the official procedure to the famous Elo rating system. Although it originates from chess, it has been successfully tailored to ranking football teams as well.