Publication List

Ordered by Type

This list is also available in BibTeX format.

Journal Articles 30

Cena A., Gagolewski M., Genie+OWA: Robustifying Hierarchical Clustering with OWA-based Linkages, Information Sciences 520, 2020, pp. 324-336. doi:10.1016/j.ins.2020.02.025

Abstract. We investigate the application of the Ordered Weighted Averaging (OWA) data fusion operator in agglomerative hierarchical clustering. The examined setting generalises the well-known single, complete and average linkage schemes. It allows expert knowledge to be embodied in the cluster merge process and provides a much wider range of possible linkages. We analyse various families of weighting functions on numerous benchmark data sets in order to assess their influence on the resulting cluster structure. Moreover, we inspect the correction for the inequality of cluster size distribution – similar to the one in the Genie algorithm. Our results demonstrate that by robustifying the procedure with the Genie correction, we can obtain a significant performance boost in terms of clustering quality. This is particularly beneficial in the case of the linkages based on the closest distances between clusters, including the single linkage and its "smoothed" counterparts. To explain this behaviour, we propose a new linkage process called three-stage OWA which yields further improvements. This way we confirm the intuition that hierarchical cluster analysis should rather take into account a few nearest neighbours of each point, instead of trying to adapt to their non-local neighbourhood.

Keywords. hierarchical clustering, OWA, data fusion, aggregation, Genie
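
Illustration. The claim that OWA-based linkages generalise the single, complete and average schemes can be checked with a minimal sketch. It is not the paper's implementation: the function names (owa, owa_linkage, w_single, w_complete, w_average), the Euclidean distance and the convention of applying weights to distances sorted in nonincreasing order are all assumptions made here for illustration.

```python
import numpy as np

def owa(values, weights):
    """Ordered weighted average: weights act on values sorted nonincreasingly."""
    v = np.sort(np.asarray(values, dtype=float))[::-1]
    w = np.asarray(weights, dtype=float)
    if v.shape != w.shape:
        raise ValueError("values and weights must have the same length")
    return float(np.dot(v, w))

def owa_linkage(A, B, weight_fn):
    """Hypothetical helper: OWA of all pairwise Euclidean distances between A and B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2).ravel()
    return owa(d, weight_fn(d.size))

def w_single(m):
    # All weight on the smallest distance (last position after sorting).
    w = np.zeros(m)
    w[-1] = 1.0
    return w

def w_complete(m):
    # All weight on the largest distance (first position after sorting).
    w = np.zeros(m)
    w[0] = 1.0
    return w

def w_average(m):
    # Uniform weights give the arithmetic mean of all pairwise distances.
    return np.full(m, 1.0 / m)
```

With A = [[0, 0], [0, 1]] and B = [[3, 0], [4, 0]], the three weighting vectors return the minimum, the maximum and the mean of the four pairwise distances, respectively. Other weighting vectors interpolate between these cases.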

Beliakov G., Gagolewski M., James S., DC optimization for constructing discrete Sugeno integrals and learning nonadditive measures, Optimization, 2019, in press. doi:10.1080/02331934.2019.1705300

Abstract. Defined solely by means of order-theoretic operations meet (min) and join (max), weighted lattice polynomial functions are particularly useful for modeling data on an ordinal scale. A special case, the discrete Sugeno integral, defined with respect to a nonadditive measure (a capacity), enables accounting for the interdependencies between input variables.
However, until recently, the problem of identifying the fuzzy measure values with respect to various objectives and requirements has not received a great deal of attention. By expressing the learning problem as the difference of convex functions, we are able to apply DC (difference of convex) optimization methods. Here we formulate one of the global optimization steps as a local linear programming problem and investigate the improvement under different conditions.

Keywords. Aggregation functions, nonadditive measures, Sugeno integral, capacities, DC optimization

Coroianu L., Fullér R., Gagolewski M., James S., Constrained Ordered Weighted averaging aggregation with multiple comonotone constraints, Fuzzy Sets and Systems, 2019, in press. doi:10.1016/j.fss.2019.09.006

Abstract. The constrained ordered weighted averaging (OWA) aggregation problem arises when we aim to maximize or minimize a convex combination of order statistics under linear inequality constraints that act on the variables with respect to their original sources. The standalone approach to optimizing the OWA under constraints is to consider all permutations of the inputs, which quickly becomes infeasible when there are more than a few variables; however, in certain cases we can take advantage of the relationships amongst the constraints and the corresponding solution structures. For example, we can consider a land-use allocation satisfaction problem with an auxiliary aim of balancing land-types, whereby the response curves for each species are non-decreasing with respect to the land-types. This results in comonotone constraints, which allow us to drastically reduce the complexity of the problem.
In this paper, we show that if we have an arbitrary number of constraints that are comonotone (i.e., they share the same ordering permutation of the coefficients), then the optimal solution occurs for decreasing components of the solution. After investigating the form of the solution in some special cases and providing theoretical results that shed light on the form of the solution, we detail practical approaches to solving and give real-world examples.

Keywords. Multiple criteria evaluation; Ordered weighted averaging; Constrained OWA aggregation; Ecology; Work allocation

Gagolewski M., Pérez-Fernández R., De Baets B., An inherent difficulty in the aggregation of multidimensional data, IEEE Transactions on Fuzzy Systems 28(3), 2020, pp. 602-606. doi:10.1109/TFUZZ.2019.2908135

Abstract. In the field of information fusion, the problem of data aggregation has been formalized as an order-preserving process that builds upon the property of monotonicity. However, fields such as computational statistics, data analysis and geometry, usually emphasize the role of equivariances to various geometrical transformations in aggregation processes. Admittedly, if we consider a unidimensional data fusion task, both requirements are often compatible with each other. Nevertheless, in this paper we show that, in the multidimensional setting, the only idempotent functions that are monotone and orthogonal equivariant are the over-simplistic weighted centroids. Even more, this result still holds after replacing monotonicity and orthogonal equivariance by the weaker property of orthomonotonicity. This implies that the aforementioned approaches to the aggregation of multidimensional data are irreconcilable, and that, if a weighted centroid is to be avoided, we must choose between monotonicity and a desirable behaviour with regard to orthogonal transformations.

Keywords. multidimensional data aggregation, monotonicity, orthogonal equivariance, centroid

Geras A., Siudem G., Gagolewski M., Should we introduce a dislike button for academic papers?, Journal of the Association for Information Science and Technology 71(2), 2020, pp. 221-229. doi:10.1002/ASI.24231

Abstract. On the grounds of the revealed, mutual resemblance between the behaviour of users of Stack Exchange and the dynamics of the citations accumulation process in the scientific community, we tackled an outwardly intractable problem of assessing the impact of introducing "negative" citations.
Although the most frequent reason to cite a paper is to highlight the connection between the two publications, researchers sometimes mention an earlier work to cast a negative light. While computing citation-based scores, for instance the h-index, information about the reason why a paper was mentioned is neglected. Therefore it can be questioned whether these indices describe scientific achievements accurately.
In this contribution we shed light on the problem of "negative" citations by analysing data from Stack Exchange and, to draw more universal conclusions, we derive an approximation of citation scores. Here we show that the quantified influence of introducing negative citations is of lesser importance and that they could be used as an indicator of where the attention of the scientific community is allocated.

Keywords. citation analysis, the Hirsch index, negative citations, research evaluation, science of science

Beliakov G., Gagolewski M., James S., Robust fitting for the Sugeno integral with respect to general fuzzy measures, Information Sciences 514, 2020, pp. 449-461. doi:10.1016/j.ins.2019.11.024

Abstract. The Sugeno integral is an expressive aggregation function with potential applications across a range of decision contexts. Its calculation requires only the lattice minimum and maximum operations, making it particularly suited to ordinal data and robust to scale transformations. However, for practical use in data analysis and prediction, we require efficient methods for learning the associated fuzzy measure. While such methods are well developed for the Choquet integral, the fitting problem is more difficult for the Sugeno integral because it is not amenable to being expressed as a linear combination of weights, and more generally due to plateaus and non-differentiability in the objective function. Previous research has hence focused on heuristic approaches or simplified fuzzy measures. Here we show that the problem of fitting the Sugeno integral to data such that the maximum absolute error is minimized can be solved using an efficient bilevel program. This method can be incorporated into algorithms that learn fuzzy measures with the aim of minimizing the median residual. This equips us with tools that make the Sugeno integral a feasible option in robust data regression and analysis. We provide experimental comparison with a genetic algorithms approach and an example in data analysis.

Keywords. Sugeno integral, fuzzy measure, parameter learning, aggregation functions

Gagolewski M., James S., Beliakov G., Supervised learning to aggregate data with the Sugeno integral, IEEE Transactions on Fuzzy Systems 27(4), 2019, pp. 810-815. doi:10.1109/TFUZZ.2019.2895565

Abstract. The problem of learning symmetric capacities (or fuzzy measures) from data is investigated toward applications in data analysis and prediction as well as decision making. Theoretical results regarding the solution minimizing the mean absolute error are exploited to develop an exact branch-refine-and-bound-type algorithm for fitting Sugeno integrals (weighted lattice polynomial functions, max-min operators) with respect to symmetric capacities. The proposed method turns out to be particularly suitable for acting on ordinal data. In addition to providing a model that can be used for the general data regression task, the results can be used, among others, to calibrate generalized h-indices to bibliometric data.

Keywords. weight learning, ordinal data fitting, fuzzy measures, Sugeno integral, lattice polynomials, h-index

Pérez-Fernández R., De Baets B., Gagolewski M., A taxonomy of monotonicity properties for the aggregation of multidimensional data, Information Fusion 52, 2019, pp. 322-334. doi:10.1016/j.inffus.2019.05.006

Abstract. The property of monotonicity, which requires a function to preserve a given order, has been considered the standard in the aggregation of real numbers for decades. In this paper, we argue that, for the case of multidimensional data, an order-based definition of monotonicity is far too restrictive. We propose several meaningful alternatives to this property not involving the preservation of a given order by returning to its early origins stemming from the field of calculus. Numerous aggregation methods for multidimensional data commonly used by practitioners are studied within our new framework.

Keywords. monotonicity, aggregation, multidimensional data, centroid, spatial median

Beliakov G., Gagolewski M., James S., Aggregation on ordinal scales with the Sugeno integral for biomedical applications, Information Sciences 501, 2019, pp. 377-387. doi:10.1016/j.ins.2019.06.023

Abstract. The Sugeno integral is a function particularly suited to the aggregation of ordinal inputs. Defined with respect to a fuzzy measure, its ability to account for complementary and redundant relationships between variables brings much potential to the field of biomedicine, where it is common for measurements and patient information to be expressed qualitatively. However, practical applications require well-developed methods for identifying the Sugeno integral's parameters, and this task is not easily expressed using the standard optimisation approaches. Here we formulate the objective function as the difference of two convex functions, which enables the use of specialised numerical methods. Such techniques are compared with other global optimisation frameworks through a number of numerical experiments.

Keywords. aggregation functions, fuzzy measures, Sugeno integral, capacities

Coroianu L., Gagolewski M., Grzegorzewski P., Piecewise linear approximation of fuzzy numbers: algorithms, arithmetic operations and stability of characteristics, Soft Computing 23(19), 2019, pp. 9491-9505. doi:10.1007/s00500-019-03800-2

Abstract. The problem of the piecewise linear approximation of fuzzy numbers giving outputs nearest to the inputs with respect to the Euclidean metric is discussed. The results given in Coroianu et al. (Fuzzy Sets Syst 233:26–51, 2013) for the 1-knot fuzzy numbers are generalized for arbitrary n-knot (n ≥ 2) piecewise linear fuzzy numbers. Some results on the existence and properties of the approximation operator are proved. Then, the stability of some fuzzy number characteristics under approximation as the number of knots tends to infinity is considered. Finally, a simulation study concerning the computer implementations of arithmetic operations on fuzzy numbers is provided. Suggested concepts are illustrated by examples and algorithms ready for practical use. This way, we build a bridge between theory and applications, as the latter are much desired in real-world problems.

Keywords. Approximation of fuzzy numbers, Calculations on fuzzy numbers, Characteristics of fuzzy numbers, Fuzzy number, Piecewise linear approximation

Lasek J., Gagolewski M., The efficacy of league formats in ranking teams, Statistical Modelling 18(5-6), 2018, pp. 411-435. doi:10.1177/1471082X18798426

Abstract. The efficacy of different league formats in ranking teams according to their true latent strength is analysed. To this end, a new approach for estimating attacking and defensive strengths based on the Poisson regression for modelling match outcomes is proposed. Various performance metrics are estimated reflecting the agreement between latent teams' strength parameters and their final rank in the league table. The tournament designs studied here are used in the majority of European top-tier association football competitions. Based on numerical experiments, it turns out that a two-stage league format comprising the three round-robin tournament together with an extra single round-robin is the most efficacious setting. In particular, it is the most accurate in selecting the best team as the winner of the league. Its efficacy can be enhanced by setting the number of points allocated for a win to two (instead of the three currently in effect in association football).

Keywords. association football, league formats, rankings, rating systems, simulation, tournament design

Beliakov G., Gagolewski M., James S., Pace S., Pastorello N., Thilliez E., Vasa R., Measuring traffic congestion: An approach based on learning weighted inequality, spread and aggregation indices from comparison data, Applied Soft Computing 67, 2018, pp. 910-919. doi:10.1016/j.asoc.2017.07.014

Abstract. As cities increase in size, governments and councils face the problem of designing infrastructure and approaches to traffic management that alleviate congestion. The problem of objectively measuring congestion involves taking into account not only the volume of traffic moving throughout a network, but also the inequality or spread of this traffic over major and minor intersections. For modelling such data, we investigate the use of weighted congestion indices based on various aggregation and spread functions. We formulate the weight learning problem for comparison data and use real traffic data obtained from a medium-sized Australian city to evaluate their usefulness.

Keywords. aggregation functions, inequality indices, spread measures, learning weights, traffic analysis

Gagolewski M., Penalty-based aggregation of multidimensional data, Fuzzy Sets and Systems 325, 2017, pp. 4-20. doi:10.1016/j.fss.2016.12.009

Abstract. Research in aggregation theory is nowadays still mostly focused on algorithms to summarize tuples consisting of observations in some real interval or of diverse general ordered structures. Of course, in the practice of information processing many other data types between these two extreme cases are worth inspecting. This contribution deals with the aggregation of lists of data points in R^d for arbitrary d ≥ 1. Even though particular functions aiming to summarize multidimensional data have been discussed by researchers in data analysis, computational statistics and geometry, there is clearly a need to provide a comprehensive and unified model in which their properties like equivariances to geometric transformations, internality, and monotonicity may be studied at an appropriate level of generality. The proposed penalty-based approach serves as a common framework for all idempotent information aggregation methods, including componentwise functions, pairwise distance minimizers, and data depth-based medians. It also allows for deriving many new practically useful tools.

Keywords. multidimensional data aggregation, penalty functions, data depth, centroid, median

Beliakov G., Gagolewski M., James S., Penalty-based and other representations of economic inequality, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 24(Suppl. 1), 2016, pp. 1-23. doi:10.1142/S0218488516400018

Abstract. Economic inequality measures are employed as a key component in various socio-demographic indices to capture the disparity between the wealthy and poor. Since their inception, they have also been used as a basis for modelling spread and disparity in other contexts. While recent research has identified that a number of classical inequality and welfare functions can be considered in the framework of OWA operators, here we propose a framework of penalty-based aggregation functions and their associated penalties as measures of inequality.

Keywords. penalty functions, aggregation functions, inequality indices, spread measures

Gagolewski M., Bartoszuk M., Cena A., Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm, Information Sciences 363, 2016, pp. 8-23. doi:10.1016/j.ins.2016.05.003

Abstract. The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. Such a constraint, for larger data sets, puts at a disadvantage the use of all the classical linkage criteria but the single linkage one. However, it is known that the single linkage clustering algorithm is very sensitive to outliers, produces highly skewed dendrograms, and therefore usually does not reflect the true underlying data structure – unless the clusters are well-separated. To overcome its limitations, we propose a new hierarchical clustering linkage criterion called Genie. Namely, our algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini- or Bonferroni-index) of the cluster sizes does not increase drastically above a given threshold. The presented benchmarks indicate a high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of the clustering quality while retaining the single linkage speed. The Genie algorithm is easily parallelizable and thus may be run on multiple threads to speed up its execution further on. Its memory overhead is small: there is no need to precompute the complete distance matrix to perform the computations in order to obtain a desired clustering. It can be applied on arbitrary spaces equipped with a dissimilarity measure, e.g., on real vectors, DNA or protein sequences, images, rankings, informetric data, etc. A reference implementation of the algorithm has been included in the open source genie package for R.

Keywords. hierarchical clustering, single linkage, inequity measures, Gini-index
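
Illustration. The abstract above ties the merge rule to an inequity measure of the cluster sizes. Below is a minimal sketch of one such measure, the Gini index. The normalisation by (n - 1) times the total, which makes the index range over [0, 1], and the function name gini_index are assumptions made here; the sketch does not reproduce the merge rule or the reference implementation in the genie package.

```python
def gini_index(sizes):
    """Normalised Gini index of nonnegative cluster sizes (assumed normalisation).

    Returns 0.0 when all sizes are equal and approaches 1.0 when a single
    cluster holds almost all of the points.
    """
    x = sorted(sizes)
    n = len(x)
    total = sum(x)
    if n < 2 or total == 0:
        return 0.0
    # For ascending x (0-based), sum_{i<j} |x_i - x_j| = sum_i (2i - n + 1) * x_i.
    pairwise = sum((2 * i - n + 1) * xi for i, xi in enumerate(x))
    return pairwise / ((n - 1) * total)
```

Balanced sizes such as [25, 25, 25, 25] give 0, whereas a highly skewed partition such as [1, 1, 1, 97], typical of single linkage on data with outliers, gives 0.96. A threshold on this value is the kind of quantity the abstract refers to.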

Mesiar R., Gagolewski M., H-index and other Sugeno integrals: Some defects and their compensation, IEEE Transactions on Fuzzy Systems 24(6), 2016, pp. 1668-1672. doi:10.1109/TFUZZ.2016.2516579

Abstract. The famous Hirsch index was introduced only ca. 10 years ago. Despite that, it is already widely used in many decision making tasks, such as the evaluation of individual scientists, research grant allocation, or even production planning. It is known that the h-index is related to the discrete Sugeno integral and the Ky Fan metric introduced in the 1940s. The aim of this paper is to propose a few modifications of this index as well as other fuzzy integrals – also on bounded chains – that lead to better discrimination of some types of data that are to be aggregated. All of the suggested compensation methods try to retain the simplicity of the original measure.

Keywords. h-index, Sugeno integral, Ky Fan metric, Shilkret integral, decomposition integrals
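
Illustration. The relation between the h-index and the Sugeno integral mentioned above is easiest to see in the standard max-min form of the index: with citation counts sorted nonincreasingly, h is the maximum over i of min(i, x_(i)). The sketch below computes the index both this way and by the usual counting definition. The function names are chosen here for illustration, and none of the paper's proposed modifications is reproduced.

```python
def h_index_maxmin(citations):
    """h-index in max-min form: max over i of min(i, i-th largest citation count)."""
    x = sorted(citations, reverse=True)
    return max((min(i, xi) for i, xi in enumerate(x, start=1)), default=0)

def h_index_count(citations):
    """h-index by counting: number of ranks i whose i-th largest count is at least i."""
    x = sorted(citations, reverse=True)
    return sum(1 for i, xi in enumerate(x, start=1) if xi >= i)
```

For the citation record [10, 8, 5, 4, 3] both functions return 4. The record [25, 8, 5, 3, 3] yields 3 and so does [3, 3, 3], which hints at the limited discrimination that the paper sets out to compensate for.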

Lasek J., Szlavik Z., Gagolewski M., Bhulai S., How to improve a team's position in the FIFA ranking – A simulation study, Journal of Applied Statistics 43(7), 2016, pp. 1349-1368. doi:10.1080/02664763.2015.1100593

Abstract. In this paper, we study the efficacy of the official ranking for international football teams compiled by FIFA, the body governing football competition around the globe. We present strategies for improving a team's position in the ranking. By combining several statistical techniques we derive an objective function in a decision problem of optimal scheduling of future matches. The presented results display how a team's position can be improved. Along the way, we compare the official procedure to the famous Elo rating system. Although it originates from chess, it has been successfully tailored to ranking football teams as well.

Keywords. association football, FIFA ranking, prediction models, Monte Carlo simulations, optimal schedule, team rankings

Żogała-Siudem B., Siudem G., Cena A., Gagolewski M., Agent-based model for the h-index – Exact solution, European Physical Journal B 89:21, 2016. doi:10.1140/epjb/e2015-60757-1

Abstract. Hirsch’s h-index is perhaps the most popular citation-based measure of scientific excellence. In 2013, Ionescu and Chopard proposed an agent-based model describing a process for generating publications and citations in an abstract scientific community [G. Ionescu, B. Chopard, Eur. Phys. J. B 86, 426 (2013)]. Within such a framework, one may simulate a scientist’s activity, and – by extension – investigate the whole community of researchers. Even though the Ionescu and Chopard model predicts the h-index quite well, the authors provided a solution based solely on simulations. In this paper, we complete their results with exact, analytic formulas. What is more, by considering a simplified version of the Ionescu-Chopard model, we obtained a compact, easy to compute formula for the h-index. The derived approximate and exact solutions are investigated on simulated and real-world data sets.

Keywords. Statistical and nonlinear physics, preferential attachment rule, h-index

Cena A., Gagolewski M., Mesiar R., Problems and challenges of information resources producers' clustering, Journal of Informetrics 9(2), 2015, pp. 273–284. doi:10.1016/j.joi.2015.02.005

Abstract. Classically, unsupervised machine learning techniques are applied to data sets with a fixed number of attributes (variables). However, many problems encountered in the field of informetrics confront us with the need to extend these kinds of methods so that they may be computed over a set of nonincreasingly ordered vectors of unequal lengths. Thus, in this paper, some new dissimilarity measures (metrics) are introduced and studied. Owing to that, we may use, among others, hierarchical clustering algorithms in order to determine an input data set's partition consisting of sets of producers that are homogeneous not only with respect to the quality of information resources, but also their quantity.

Keywords. aggregation, hierarchical clustering, distance, metric

Gagolewski M., Spread measures and their relation to aggregation functions, European Journal of Operational Research 241(2), 2015, pp. 469-477. doi:10.1016/j.ejor.2014.08.034

Abstract. The theory of aggregation most often deals with measures of central tendency. However, sometimes a very different kind of a numeric vector's synthesis into a single number is required. In this paper we introduce a class of mathematical functions which aim to measure spread or scatter of one-dimensional quantitative data. The proposed definition serves as a common, abstract framework for measures of absolute spread known from statistics, exploratory data analysis and data mining, e.g. the sample variance, standard deviation, range, interquartile range (IQR), median absolute deviation (MAD), etc. Additionally, we develop new measures of experts' opinions diversity or consensus in group decision making problems. We investigate some properties of spread measures, show how they are related to aggregation functions, and indicate their new potentially fruitful application areas.

Keywords. Group decisions and negotiations, aggregation, spread, deviation, variance

Cena A., Gagolewski M., OM3: Ordered maxitive, minitive, and modular aggregation operators – axiomatic and probabilistic properties in an arity-monotonic setting, Fuzzy Sets and Systems 264, 2015, pp. 138-159. doi:10.1016/j.fss.2014.04.001

Abstract. The recently-introduced OM3 aggregation operators fulfill three appealing properties: they are simultaneously minitive, maxitive, and modular. Among the instances of OM3 operators we find e.g. OWMax and OWMin operators, the famous Hirsch's h-index and all its natural generalizations.
In this paper the basic axiomatic and probabilistic properties of extended, i.e. in an arity-dependent setting, OM3 aggregation operators are studied. We illustrate the difficulties one is inevitably faced with when trying to combine the quality and quantity of numeric items into a single number. The discussion on such aggregation methods is particularly important in the information resources producers assessment problem, which aims to reduce the negative effects of information overload. It turns out that the Hirsch-like indices of impact do not fulfill a set of very important properties, which puts the sensibility of their practical usage into question. Moreover, thanks to the probabilistic analysis of the operators in an i.i.d. model, we may better understand the relationship between the aggregated items' quality and their producers' productivity.

Keywords. Aggregation; ordered modularity, maxitivity and minitivity; arity-monotonicity; impact assessment; Hirsch's h-index; informetrics

Gagolewski M., Mesiar R., Monotone measures and universal integrals in a uniform framework for the scientific impact assessment problem, Information Sciences 263, 2014, pp. 166-174. doi:10.1016/j.ins.2013.12.004

Abstract. The Choquet, Sugeno, and Shilkret integrals with respect to monotone measures, as well as their generalization – the universal integral, stand for a useful tool in decision support systems. In this paper we propose a general construction method for aggregation operators that may be used in assessing output of scientists. We show that the most often currently used indices of bibliometric impact, like Hirsch's h, Woeginger's w, Egghe's g, Kosmulski's MAXPROD, and similar constructions, may be obtained by means of our framework. Moreover, the model easily leads to some new, very interesting functions.

Keywords. Choquet, Sugeno, Shilkret, universal integral; monotone measures; aggregation; indices of scientific impact, bibliometrics; h-index, w-index, g-index, MAXPROD-index
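
Illustration. Two of the indices named in the abstract can be computed directly from a citation record, as in the sketch below. It follows the commonly cited definitions: Egghe's g is the largest g such that the g most cited papers have at least g^2 citations in total, and Kosmulski's MAXPROD is the maximum of i times the i-th largest citation count. The sketch does not implement the universal-integral construction proposed in the paper. The function names and the restriction of g to at most the number of papers are assumptions made here.

```python
from itertools import accumulate

def g_index(citations):
    """Egghe's g-index, with g limited to the number of papers (assumed variant)."""
    x = sorted(citations, reverse=True)
    g = 0
    for i, running_total in enumerate(accumulate(x), start=1):
        if running_total >= i * i:
            g = i
    return g

def maxprod_index(citations):
    """Kosmulski's MAXPROD: max over i of i times the i-th largest citation count."""
    x = sorted(citations, reverse=True)
    return max((i * xi for i, xi in enumerate(x, start=1)), default=0)
```

For the record [10, 8, 5, 4, 3] the g-index is 5 (the total of 30 citations is at least 25) and MAXPROD is 16 (attained at ranks 2 and 4).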

Coroianu L., Gagolewski M., Grzegorzewski P., Nearest piecewise linear approximation of fuzzy numbers, Fuzzy Sets and Systems 233, 2013, pp. 26-51. doi:10.1016/j.fss.2013.02.005

Abstract. The problem of the nearest approximation of fuzzy numbers by piecewise linear 1-knot fuzzy numbers is discussed. By using 1-knot fuzzy numbers one may obtain approximations which are simple enough and yet flexible enough to reconstruct the input fuzzy concepts under study. They might also be perceived as a generalization of the trapezoidal approximations. Moreover, these approximations possess some desirable properties. Apart from theoretical considerations, approximation algorithms that can be applied in practice are also given.

Keywords. Approximation of fuzzy numbers; Fuzzy number; Piecewise linear approximation

Gagolewski M., Scientific impact assessment cannot be fair, Journal of Informetrics 7(4), 2013, pp. 792-802. doi:10.1016/j.joi.2013.07.001

Abstract. In this paper we deal with the problem of aggregating numeric sequences of arbitrary length that represent e.g. citation records of scientists. Impact functions are the aggregation operators that express as a single number not only the quality of individual publications, but also their author's productivity.
We examine some fundamental properties of these aggregation tools. It turns out that each impact function which always gives indisputable valuations must necessarily be trivial. Moreover, it is shown that for any set of citation records in which none is dominated by the other, we may construct an impact function that gives any a priori-established authors' ordering. Theoretically then, there is considerable room for manipulation in the hands of decision makers.
We also discuss the differences between the impact function-based and the multicriteria decision making-based approach to scientific quality management, and study how the introduction of new properties of impact functions affects the assessment process. We argue that simple mathematical tools like the h- or g-index (as well as other bibliometric impact indices) may not necessarily be a good choice when it comes to assessing scientific achievements.

Keywords. Impact functions; aggregation; decision making; reference modeling; Hirsch's h-index; scientometrics; bibliometrics

Gagolewski M., On the relationship between symmetric maxitive, minitive, and modular aggregation operators, Information Sciences 221, 2013, pp. 170-180. doi:10.1016/j.ins.2012.09.005

Abstract. In this paper the relationship between symmetric minitive, maxitive, and modular aggregation operators is considered. It is shown that the intersection between any two of the three discussed classes is the same. Moreover, the intersection is explicitly characterized.
It turns out that the intersection contains families of aggregation operators such as OWMax, OWMin, and many generalizations of the widely-known Hirsch’s h-index, often applied in scientific quality control.

Keywords. Aggregation operators; OWMax; OMA; OWA; Hirsch’s h-index; Scientometrics

Comments. Later we proposed that the symmetric minitive, maxitive, and modular aggregation operators may be called the OM3 agops, see (Cena A., Gagolewski M., OM3: ordered maxitive, minitive, and modular aggregation operators – Part I: Axiomatic analysis under arity-dependence, 2013).

Gagolewski M., Mesiar R., Aggregating different paper quality measures with a generalized h-index, Journal of Informetrics 6(4), 2012, pp. 566-579. doi:10.1016/j.joi.2012.05.001

Abstract. The process of assessing individual authors should rely upon a proper aggregation of reliable and valid papers’ quality metrics. Citations are merely one possible way to measure appreciation of publications. In this study we propose some new, SJR- and SNIP-based indicators, which not only take into account the broadly conceived popularity of a paper (manifested by the number of citations), but also other factors like its potential, or the quality of papers that cite a given publication. We explore the relation and correlation between different metrics and study how they affect the values of a real-valued generalized h-index calculated for 11 prominent scientometricians. We note that the h-index is a very unstable impact function, highly sensitive to the scaling of input elements. Our analysis is not only of theoretical significance: data scaling is often performed to normalize citations across disciplines. Uncontrolled application of this operation may lead to unfair and biased (toward some groups) decisions. This puts the validity of authors assessment and ranking using the h-index into question. Obviously, a good impact function to be used in practice should not be as sensitive to changes in the input data as the analyzed one.

Keywords. Aggregation operators; Impact functions; Hirsch's h-index; Quality control; Scientometrics; Bibliometrics; SJR; SNIP; Scopus; CITAN; R

Comments. An empirical paper. The ideas presented here were later explored more thoroughly in (Cena A., Gagolewski M., OM3: ordered maxitive, minitive, and modular aggregation operators – Part II: A simulation study, 2013).

Gagolewski M., Bibliometric impact assessment with R and the CITAN package, Journal of Informetrics 5(4), 2011, pp. 678-692. doi:10.1016/j.joi.2011.06.006

Abstract. In this paper CITAN, the CITation ANalysis package for the R statistical computing environment, is introduced. The main aim of the software is to support bibliometricians with a tool for preprocessing and cleaning bibliographic data retrieved from SciVerse Scopus and for calculating the most popular indices of scientific impact.
To show the practical usability of the package, an exemplary assessment of authors publishing in the fields of scientometrics and webometrics is performed.

Keywords. Data analysis software; Quality control in science; Citation analysis; Bibliometrics; Hirsch's h index; Egghe's g index; SciVerse Scopus

Gagolewski M., Grzegorzewski P., Possibilistic analysis of arity-monotonic aggregation operators and its relation to bibliometric impact assessment of individuals, International Journal of Approximate Reasoning 52(9), 2011, pp. 1312-1324. doi:10.1016/j.ijar.2011.01.010

Abstract. A class of arity-monotonic aggregation operators, called impact functions, is proposed. This family of operators forms a theoretical framework for the so-called Producer Assessment Problem, which includes the scientometric task of fair and objective assessment of scientists using the number of citations received by their publications.
The impact function output values are analyzed under right-censored and dynamically changing input data. The qualitative possibilistic approach is used to describe this kind of uncertainty. It leads to intuitive graphical interpretations and may be easily applied for practical purposes.
The discourse is illustrated by a family of aggregation operators generalizing the well-known Ordered Weighted Maximum (OWMax) and the Hirsch h-index.

Keywords. Aggregation operators; Possibility theory; S-statistics; h-index; OWMax

Comments. In this paper the class of effort-dominating impact functions has also been introduced. I have shown later (see Gagolewski M., On the Relation Between Effort-Dominating and Symmetric Minitive Aggregation Operators, 2012) that all such aggregation operators are symmetric minitive.

Gagolewski M., Grzegorzewski P., A geometric approach to the construction of scientific impact indices, Scientometrics 81(3), 2009, pp. 617-634. doi:10.1007/s11192-008-2253-y

Abstract. Two broad classes of scientific impact indices are proposed and their properties – both theoretical and practical – are discussed. These new classes were obtained as a geometric generalization of the well-known tools applied in scientometrics, like Hirsch’s h-index, Woeginger’s w-index and Kosmulski’s Maxprod. It is shown how to apply the suggested indices for estimation of the shape of the citation function or the total number of citations of an individual. Additionally, a new efficient and simple O(log n) algorithm for computing the h-index is given.

Keywords. Hirsch's h-index, citation analysis, scientific impact indices

Comments. I have shown later (see Gagolewski M., On the Relation Between Effort-Dominating and Symmetric Minitive Aggregation Operators, 2012) that the rp-indices are symmetric minitive. Moreover, we have found that there exists an O(n log n) algorithm for determining lp (see Gagolewski M., Dębski M., Nowakiewicz M., Efficient Algorithm for Computing Certain Graph-Based Monotone Integrals: the lp-Indices, 2013).
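
Illustration. The following is a minimal sketch of how a logarithmic-time h-index computation can work, assuming the citation counts are already sorted in non-increasing order. It uses a plain binary search and is not taken from the paper; the function name h_index_sorted is hypothetical.

```python
def h_index_sorted(citations):
    """Return the h-index of citation counts sorted in non-increasing order.

    Assumption (not from the paper): the input is already sorted, so the
    condition citations[i] >= i + 1 is true on a prefix and false afterwards,
    which allows a binary search with O(log n) comparisons.
    """
    lo, hi = 0, len(citations)
    # Invariant: the condition holds for every index below lo
    # and fails for every index from hi onwards.
    while lo < hi:
        mid = (lo + hi) // 2
        if citations[mid] >= mid + 1:
            lo = mid + 1   # paper mid + 1 still has enough citations
        else:
            hi = mid       # too few citations; the h-index is at most mid
    return lo


if __name__ == "__main__":
    print(h_index_sorted([10, 8, 5, 4, 3]))
```

For the sample input above, four papers have at least four citations each, while the fifth has only three, so the h-index is 4.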

Rowiński T., Gagolewski M., Preferencje i postawy wobec pomocy online, Studia Psychologica UKSW 7, 2007, pp. 195-210.

Research Monographs, Textbooks, Edited Volumes 8

Gagolewski M., Lightweight Machine Learning Classics with R, book draft, 2020. doi:10.5281/zenodo.3820167
Gagolewski M., Data Fusion: Theory, Methods, and Applications, Institute of Computer Science, Polish Academy of Sciences, 2015, 290 pp. isbn:978-83-63159-20-7
Gagolewski M., Bartoszuk M., Cena A., Przetwarzanie i analiza danych w języku Python (Data Processing and Analysis in Python), Wydawnictwo Naukowe PWN, 2016, 369 pp. isbn:978-83-01-18940-2
Gagolewski M., Programowanie w języku R. Analiza danych, obliczenia, symulacje (R Programming. Data Analysis. Computing. Simulations), Wydawnictwo Naukowe PWN; 1st ed. – 2014, 509 pp.; 2nd ed. – 2016, 550 pp. isbn:978-83-01-18939-6
Grzegorzewski P., Gagolewski M., Bobecka-Wesołowska K., Wnioskowanie statystyczne z wykorzystaniem środowiska R (Statistical Inference in R), Politechnika Warszawska, 2014, 183 pp. isbn:978-83-93-72601-1
Halaš R., Gagolewski M., Mesiar R. (Eds.), New Trends in Aggregation Theory (Advances in Intelligent Systems and Computing 981), Springer, 2019, 348 pp. doi:10.1007/978-3-030-19494-9 isbn:978-3-030-19493-2
Ferraro M.B., Giordani P., Vantaggi B., Gagolewski M., Gil M.Á., Grzegorzewski P., Hryniewicz O. (Eds.), Soft Methods for Data Science (Advances in Intelligent Systems and Computing 456), Springer, 2017, 535 pp. doi:10.1007/978-3-319-42972-4 isbn:978-3-319-42971-7
Grzegorzewski P., Gagolewski M., Hryniewicz O., Gil M.Á. (Eds.), Strengthening Links Between Data Analysis and Soft Computing (Advances in Intelligent Systems and Computing 315), Springer, 2015, 294 pp. doi:10.1007/978-3-319-10765-3 isbn:978-3-319-10764-6

Papers in Edited Volumes and Proceedings 34

Coroianu L., Gagolewski M., Penalty-based data aggregation in real normed vector spaces, In: Halaš R. et al. (Eds.), New Trends in Aggregation Theory (Advances in Intelligent Systems and Computing 981), Springer, 2019, pp. 160-171. doi:10.1007/978-3-030-19494-9_15

Abstract. The problem of penalty-based data aggregation in generic real normed vector spaces is studied. Some existence and uniqueness results are indicated. Moreover, various properties of the aggregation functions are considered.

Keywords. penalty-based aggregation, prototype learning, means, averages, and medians, vector spaces, Fermat-Weber problem

Beliakov G., Gagolewski M., James S., Least median of squares (LMS) and least trimmed squares (LTS) fitting for the weighted arithmetic mean, In: Medina J. et al. (Eds.), Information Processing and Management of Uncertainty in Knowledge-Based Systems. Theory and Foundations (Communications in Computer and Information Science 854), Springer, 2018, pp. 367-378. doi:10.1007/978-3-319-91476-3_31

Abstract. We look at different approaches to learning the weights of the weighted arithmetic mean such that the median residual or sum of the smallest half of squared residuals is minimized. The more general problem of multivariate regression has been well studied in the statistical literature; however, in the case of aggregation functions we have the restriction on the weights and the domain is usually restricted so that ‘outliers’ may not be arbitrarily large. A number of algorithms are compared in terms of accuracy and speed. Our results can be extended to other aggregation functions.

Keywords. aggregation, LMS fitting, LTS fitting, approximation

Gagolewski M., James S., Fitting symmetric fuzzy measures for discrete Sugeno integration, In: Kacprzyk J. et al. (Eds.), Advances in Fuzzy Logic and Technology 2017 (Advances in Intelligent Systems and Computing 642), Springer, 2018, pp. 104-116. doi:10.1007/978-3-319-66824-6_10

Abstract. The Sugeno integral has numerous successful applications, including but not limited to the areas of decision making, preference modeling, and bibliometrics. Despite this, the current state of the development of usable algorithms for numerically fitting the underlying discrete fuzzy measure based on a sample of prototypical values – even in the simplest possible case, i.e., assuming the symmetry of the capacity – is yet to reach a satisfactory level. Thus, the aim of this paper is to present some results and observations concerning this class of data approximation problems.

Keywords. Sugeno integral, aggregation functions, machine learning, regression, approximation

Bartoszuk M., Gagolewski M., Binary aggregation functions in software plagiarism detection, In: Proc. FUZZ-IEEE'17, IEEE, 2017, no. 8015582. doi:10.1109/FUZZ-IEEE.2017.8015582

Abstract. Supervised learning is of key interest in data science. Even though there exist many approaches to solving, among others, classification as well as ordinal and standard regression tasks, most of them output models that do not possess useful formal properties, like nondecreasingness in each independent variable, idempotence, symmetry, etc. This makes them difficult to interpret and analyze. For instance, it might be impossible to determine the importances of individual features or to assess the effects of increasing the values of predictors on the behavior of a chosen response variable. Such properties are especially important in software plagiarism detection, where we are faced with combining the degree to which a code chunk A is similar to (or contained in) B with the degree to which B is similar to A. Therefore, in this paper we consider a new method for fitting B-spline tensor product-based aggregation functions to empirical data. An empirical study indicates a highly competitive performance of the resulting models. Additionally, they possess an intuitive interpretation which is highly desirable for end-users.

Cena A., Gagolewski M., OWA-based linkage and the Genie correction for hierarchical clustering, In: Proc. FUZZ-IEEE'17, IEEE, 2017, no. 8015652. doi:10.1109/FUZZ-IEEE.2017.8015652

Abstract. In this paper we thoroughly investigate various OWA-based linkages in hierarchical clustering on numerous benchmark data sets. The inspected setting generalizes the well-known single, complete, and average linkage schemes, among others. The incorporation of weights into the cluster merge procedure creates an opportunity to make use of experts' knowledge about a particular data domain so as to generate partitions of a given data set that better reflect the true underlying cluster structure. Moreover, we introduce a correction for the inequality of cluster size distribution — similar to the one proposed in our recently introduced Genie algorithm — which results in a significant performance boost in terms of clustering quality.

Gagolewski M., Cena A., Bartoszuk M., Hierarchical clustering via penalty-based aggregation and the Genie approach, In: Torra V. et al. (Eds.), Modeling Decisions for Artificial Intelligence (Lecture Notes in Artificial Intelligence 9880), Springer, 2016, pp. 191-202. doi:10.1007/978-3-319-45656-0_16

Abstract. The paper discusses a generalization of the nearest centroid hierarchical clustering algorithm. A first extension deals with the incorporation of generic distance-based penalty minimizers instead of the classical aggregation by means of centroids. Due to that, the presented algorithm can be applied in spaces equipped with an arbitrary dissimilarity measure (images, DNA sequences, etc.). Secondly, a correction preventing the formation of clusters of overly unbalanced sizes is applied: just like in the recently introduced Genie approach, which extends the single linkage scheme, the new method prevents a chosen inequity measure (e.g., the Gini-, de Vergottini-, or Bonferroni-index) of cluster sizes from rising above a predefined threshold. Numerous benchmarks indicate that the introduction of such a correction increases the quality of the resulting clusterings.

Keywords. hierarchical clustering, aggregation, centroid, Gini-index, Genie algorithm
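
Illustration. The sketch below shows one way to measure the inequity of cluster sizes with a normalized Gini index, which is the kind of quantity the abstract says is kept below a threshold. The normalization by (n - 1) times the total size is an assumption made here so that the value lies in [0, 1]; the function name and the sample sizes are hypothetical, and the code is not the authors' implementation.

```python
def gini_index(sizes):
    """Normalized Gini index of a list of cluster sizes.

    Assumed formula: the sum of absolute pairwise differences divided by
    (n - 1) * sum(sizes). It equals 0 for perfectly balanced sizes and
    grows towards 1 as one cluster dominates.
    """
    n = len(sizes)
    total = sum(sizes)
    if n < 2 or total == 0:
        return 0.0  # a single cluster (or no data) is trivially balanced
    pairwise = sum(
        abs(sizes[i] - sizes[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return pairwise / ((n - 1) * total)


if __name__ == "__main__":
    print(gini_index([5, 5, 5]))    # balanced cluster sizes
    print(gini_index([10, 1, 1]))   # one dominant cluster
```

A clustering procedure using such a correction would compare this value with a user-chosen threshold before deciding which clusters may be merged next.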

Bartoszuk M., Beliakov G., Gagolewski M., James S., Fitting aggregation functions to data: Part I – Linearization and regularization, In: Carvalho J.P. et al. (Eds.), Information Processing and Management of Uncertainty in Knowledge-Based Systems, Part II (Communications in Computer and Information Science 611), Springer, 2016, pp. 767-779. doi:10.1007/978-3-319-40581-0_62

Abstract. The use of supervised learning techniques for fitting weights and/or generator functions of weighted quasi-arithmetic means – a special class of idempotent and nondecreasing aggregation functions – to empirical data has already been considered in a number of papers. Nevertheless, there are still some important issues that have not been discussed in the literature yet. In the first part of this two-part contribution we deal with the concept of regularization, a quite standard technique from machine learning applied so as to increase the fit quality on test and validation data samples. Due to the constraints on the weighting vector, it turns out that quite different methods can be used in the current framework, as compared to regression models. Moreover, it is worth noting that so far fitting weighted quasi-arithmetic means to empirical data has only been performed approximately, via the so-called linearization technique. In this paper we consider exact solutions to such special optimization tasks and indicate cases where linearization leads to much worse solutions.

Keywords. Aggregation functions, weighted quasi-arithmetic means, least squares fitting, regularization, linearization

Bartoszuk M., Beliakov G., Gagolewski M., James S., Fitting aggregation functions to data: Part II – Idempotization, In: Carvalho J.P. et al. (Eds.), Information Processing and Management of Uncertainty in Knowledge-Based Systems, Part II (Communications in Computer and Information Science 611), Springer, 2016, pp. 780-789. doi:10.1007/978-3-319-40581-0_63

Abstract. The use of supervised learning techniques for fitting weights and/or generator functions of weighted quasi-arithmetic means – a special class of idempotent and nondecreasing aggregation functions – to empirical data has already been considered in a number of papers. Nevertheless, there are still some important issues that have not been discussed in the literature yet. In the second part of this two-part contribution we deal with a quite common situation in which we have inputs coming from different sources, describing a similar phenomenon, but which have not been properly normalized. In such a case, idempotent and nondecreasing functions cannot be used to aggregate them unless proper pre-processing is performed. The proposed idempotization method, based on the notion of B-splines, allows for an automatic calibration of independent variables. The introduced technique is applied in an R source code plagiarism detection system.

Keywords. Aggregation functions, weighted quasi-arithmetic means, least squares fitting, idempotence

Cena A., Gagolewski M., Fuzzy k-minpen clustering and k-nearest-minpen classification procedures incorporating generic distance-based penalty minimizers, In: Carvalho J.P. et al. (Eds.), Information Processing and Management of Uncertainty in Knowledge-Based Systems, Part II (Communications in Computer and Information Science 611), Springer, 2016, pp. 445-456. doi:10.1007/978-3-319-40581-0_36

Abstract. We discuss a generalization of the fuzzy (weighted) k-means clustering procedure and point out its relationships with data aggregation in spaces equipped with arbitrary dissimilarity measures. In the proposed setting, a data set partitioning is performed based on the notion of points' proximity to generic distance-based penalty minimizers. Moreover, a new data classification algorithm, resembling the k-nearest neighbors scheme but less computationally and memory demanding, is introduced. Rich examples in complex data domains indicate the usability of the methods and aggregation theory in general.

Keywords. fuzzy k-means algorithm, clustering, classification, fusion functions, penalty minimizers

Lasek J., Gagolewski M., The winning solution to the AAIA'15 Data Mining Competition: Tagging firefighter activities at a fire scene, In: Ganzha M., Maciaszek L., Paprzycki M. (Eds.), Proc. FedCSIS'15, IEEE, 2015, pp. 375-380. doi:10.15439/2015F418

Abstract. Multi-sensor based classification of professionals' activities plays a key role in ensuring that they achieve their goals. In this paper we present the winning solution to the AAIA'15 Tagging Firefighter Activities at a Fire Scene data mining competition. The approach is based on a Random Forest classifier trained on an input data set with almost 5000 features describing the underlying time series of sensory data.

Keywords. Activity tagging, movement tagging, data mining competition, Random Forest model, FFT

Cena A., Gagolewski M., A K-means-like algorithm for informetric data clustering, In: Alonso J.M., Bustince H., Reformat M. (Eds.), Proc. IFSA/EUSFLAT 2015, Atlantis Press, 2015, pp. 536-543. doi:10.2991/ifsa-eusflat-15.2015.77

Abstract. The K-means algorithm is one of the most often used clustering techniques. However, when it comes to discovering clusters in informetric data sets that consist of non-increasingly ordered vectors of not necessarily conforming lengths, such a method cannot be applied directly. Hence, in this paper, we propose a K-means-like algorithm to determine groups of producers that are similar not only with respect to the quality of information resources they output, but also their quantity.

Keywords. k-means clustering, informetrics, aggregation, impact functions

Bartoszuk M., Gagolewski M., Detecting similarity of R functions via a fusion of multiple heuristic methods, In: Alonso J.M., Bustince H., Reformat M. (Eds.), Proc. IFSA/EUSFLAT 2015, Atlantis Press, 2015, pp. 419-426. doi:10.2991/ifsa-eusflat-15.2015.61

Abstract. In this paper we describe recent advances in our R code similarity detection algorithm. We propose a modification of the Program Dependence Graph (PDG) procedure used in the GPLAG system that better fits the nature of functional programming languages like R. The major strength of our approach lies in a proper aggregation of outputs of multiple plagiarism detection methods, as it is well known that no single technique gives perfect results. It turns out that the incorporation of the PDG algorithm significantly improves the recall ratio, i.e. it is better in indicating true positive cases of plagiarism or code cloning patterns. The implemented system is available as web application at

Keywords. R, plagiarism and code cloning detection, fuzzy proximity relations, aggregation, program dependence graph, t-norms

Gagolewski M., Lasek J., Learning experts' preferences from informetric data, In: Alonso J.M., Bustince H., Reformat M. (Eds.), Proc. IFSA/EUSFLAT 2015, Atlantis Press, 2015, pp. 484-491. doi:10.2991/ifsa-eusflat-15.2015.70

Abstract. In the field of informetrics, agents are often represented by numeric sequences of not necessarily conforming lengths. There are numerous aggregation techniques of such sequences, e.g., the g-index, the h-index, that may be used to compare the output of pairs of agents. In this paper we address the question of whether such impact indices may be used to model experts' preferences accurately.

Keywords. preference learning, fuzzy relations, informetrics, aggregation, h-index

Gagolewski M., Normalized WDpWAM and WDpOWA spread measures, In: Alonso J.M., Bustince H., Reformat M. (Eds.), Proc. IFSA/EUSFLAT 2015, Atlantis Press, 2015, pp. 210-216. doi:10.2991/ifsa-eusflat-15.2015.32

Abstract. Aggregation theory often deals with measures of central tendency of quantitative data. As sometimes a different kind of information fusion is needed, an axiomatization of spread measures was introduced recently. In this contribution we explore the properties of WDpWAM and WDpOWA operators, which are defined as weighted Lp-distances to weighted arithmetic mean and OWA operators, respectively. In particular, we give forms of vectors that maximize such fusion functions and thus provide a way to normalize the output value so that the vector of maximal spread always leads to a fixed outcome, e.g., 1 if all the input elements are in [0,1]. This might be desirable when constructing measures of experts' opinions consistency or diversity in group decision making problems.

Keywords. data fusion, aggregation, spread, deviation, variance, OWA operators

Cena A., Gagolewski M., Aggregation and soft clustering of informetric data, In: Baczyński M., De Baets B., Mesiar R. (Eds.), Proc. 8th International Summer School on Aggregation Operators (AGOP 2015), University of Silesia, 2015, pp. 79-84. isbn:978-83-8012-519-3

Abstract. The aim of this contribution is to inspect possible applications of clustering techniques computed over a set consisting of nonincreasingly ordered vectors of possibly nonconforming lengths. Such data sets appear in the field of informetrics, where one may need to evaluate the quality of information items, e.g., research papers, and their producers. In this paper we investigate the notion of cluster centers as an aggregated representation of all vectors from a given cluster and analyze them by means of aggregation operators.

Keywords. clustering, fuzzy clustering, c-means algorithm, distance, producers assessment problem

Gagolewski M., Some issues in aggregation of multidimensional data, In: Baczyński M., De Baets B., Mesiar R. (Eds.), Proc. 8th International Summer School on Aggregation Operators (AGOP 2015), University of Silesia, 2015, pp. 127-132. isbn:978-83-8012-519-3

Abstract. The aggregation theory usually takes an interest in summarizing a predefined number of points in the real line. In many applications, like in statistics, data analysis, and mining, the notion of a mean – a nondecreasing, internal, and symmetric fusion function – plays a key role. Nevertheless, when it comes to aggregating a set of points in higher dimensional spaces, the componentwise extension of monotonicity and internality might not be the best choice. Instead, the invariance to certain classes of geometric transformations seems to be crucial in such a case.

Keywords. aggregation, centroid, Tukey median, 1-center, 1-median, convex hull, affine invariance, orthogonalization

Lasek J., Gagolewski M., Estimation of tournament metrics for association football league formats, In: Selected problems in information technologies (Proc. ITRIA'15 vol. 2), Institute of Computer Science, Polish Academy of Sciences, 2015, pp. 67-78.
Cena A., Gagolewski M., Clustering and aggregation of informetric data sets, In: Computational methods in data analysis (Proc. ITRIA'15 vol. 1), Institute of Computer Science, Polish Academy of Sciences, 2015, pp. 5-26. isbn:978-83-63159-22-1
Gagolewski M., Lasek J., The use of fuzzy relations in the assessment of information resources producers' performance, In: Filev D. et al. (Eds.), Proc. 7th IEEE International Conference Intelligent Systems IS'2014, Vol. 2: Tools, Architectures, Systems, Applications (Advances in Intelligent Systems and Computing 323), Springer, 2015, pp. 289-300. doi:10.1007/978-3-319-11310-4_25

Abstract. The producers assessment problem has many important practical instances: it is an abstract model for intelligent systems evaluating e.g. the quality of computer software repositories, web resources, social networking services, and digital libraries. Each producer's performance is determined according not only to the overall quality of the items he/she outputted, but also to the number of such items (which may be different for each agent).
Recent theoretical results indicate that the use of aggregation operators in the process of ranking and evaluating producers may not necessarily lead to fair and plausible outcomes. Therefore, to overcome some weaknesses of the most often applied approach, in this preliminary study we encourage the use of a fuzzy preference relation-based setting and indicate why it may provide better control over the assessment process.

Keywords. fuzzy relations, preference modeling, producers assessment problem, StackOverflow, bibliometrics, h-index

Gagolewski M., Sugeno integral-based confidence intervals for the theoretical h-index, In: Grzegorzewski P. et al. (Eds.), Strengthening Links Between Data Analysis and Soft Computing (Advances in Intelligent Systems and Computing 315), Springer, 2015, pp. 233-240. doi:10.1007/978-3-319-10765-3_28

Abstract. Sugeno integral-based confidence intervals for the theoretical h-index of a fixed-length sequence of i.i.d. random variables are derived. They are compared with other estimators of such a distribution characteristic in a Pareto i.i.d. model. It turns out that in the first case we obtain much wider intervals. It seems to be due to the fact that a Sugeno integral, which may be applied on any ordinal scale, is known to ignore too much information from cardinal-scale data being aggregated.

Keywords. h-index, Sugeno integral, confidence interval, Pareto distribution

Bartoszuk M., Gagolewski M., A fuzzy R code similarity detection algorithm, In: Laurent A. et al. (Eds.), Information Processing and Management of Uncertainty in Knowledge-Based Systems, Part III (Communications in Computer and Information Science 444), Springer, 2014, pp. 21-30. doi:10.1007/978-3-319-08852-5_3

Abstract. R is a programming language and software environment for performing statistical computations and applying data analysis that increasingly gains popularity among practitioners and scientists. In this paper we present a preliminary version of a system to detect pairs of similar R code blocks among a given set of routines, which is based on a proper aggregation of the output of three different [0,1]-valued (fuzzy) proximity degree estimation algorithms. Its analysis on empirical data indicates that the system may in the future be successfully applied in practice in order e.g. to detect plagiarism among students' homework submissions or to perform an analysis of code recycling or code cloning in R's open-source package repositories.

Keywords. R, plagiarism detection, code cloning, fuzzy similarity measures

Coroianu L., Gagolewski M., Grzegorzewski P., Adabitabar Firozja M., Houlari T., Piecewise linear approximation of fuzzy numbers preserving the support and core, In: Laurent A. et al. (Eds.), Information Processing and Management of Uncertainty in Knowledge-Based Systems, Part II (Communications in Computer and Information Science 443), Springer, 2014, pp. 244-254. doi:10.1007/978-3-319-08855-6_25

Abstract. A reasonable approximation of a fuzzy number should have a simple membership function, be close to the input fuzzy number, and should preserve some of its important characteristics. In this paper we suggest approximating a fuzzy number by a piecewise linear 1-knot fuzzy number which is the closest one to the input fuzzy number among all piecewise linear 1-knot fuzzy numbers having the same core and the same support as the input. We discuss the existence of the approximation operator, show algorithms ready for practical use and illustrate the considered concepts by examples. It turns out that such an approximation task may be problematic.

Keywords. Approximation of fuzzy numbers, core, fuzzy number, piecewise linear approximation, support

Cena A., Gagolewski M., OM3: Ordered maxitive, minitive, and modular aggregation operators – Part I: Axiomatic analysis under arity-dependence, In: Bustince H. et al. (Eds.), Aggregation Functions in Theory and in Practise (Advances in Intelligent Systems and Computing 228), Springer, 2013, pp. 93-103. doi:10.1007/978-3-642-39165-1_13

Abstract. Recently, a very interesting relation between symmetric minitive, maxitive, and modular aggregation operators has been shown. It turns out that the intersection between any pair of the mentioned classes is the same. This result introduces what we here propose to call the OM3 operators. In the first part of our contribution on the analysis of the OM3 operators we study some properties that may be useful when aggregating input vectors of varying lengths. In Part II we will perform a thorough simulation study of the impact of input vectors’ calibration on the aggregation results.

Cena A., Gagolewski M., OM3: Ordered maxitive, minitive, and modular aggregation operators – Part II: A simulation study, In: Bustince H. et al. (Eds.), Aggregation Functions in Theory and in Practise (Advances in Intelligent Systems and Computing 228), Springer, 2013, pp. 105-115. doi:10.1007/978-3-642-39165-1_14

Abstract. This article is the second part of the contribution on the analysis of the recently-proposed class of symmetric maxitive, minitive and modular aggregation operators. Recent results (Gagolewski, Mesiar, 2012) indicated some unstable behavior of the generalized h-index, which is a particular instance of OM3, in case of input data transformation. The study was performed on a small, carefully selected real-world data set. Here we conduct some experiments to examine this phenomenon more extensively.

Gagolewski M., Statistical hypothesis test for the difference between Hirsch indices of two Pareto-distributed random samples, In: Kruse R. et al. (Eds.), Synergies of Soft Computing and Statistics for Intelligent Data Analysis (Advances in Intelligent Systems and Computing 190), Springer, 2013, pp. 359-367. doi:10.1007/978-3-642-33042-1_39

Abstract. In this paper we discuss the construction of a new parametric statistical hypothesis test for the equality of probability distributions. The test is based on the difference between Hirsch’s h-indices of two equal-length i.i.d. random samples. For the sake of illustration, we analyze its power in the case of Pareto-distributed input data. It turns out that the test is very conservative and has wide acceptance regions, which calls into question the appropriateness of the h-index usage in scientific quality control and decision making.

Gagolewski M., Dębski M., Nowakiewicz M., Efficient algorithm for computing certain graph-based monotone integrals: the lp-indices, In: Mesiar R., Bacigal T. (Eds.), Proc. Uncertainty Modelling, STU Bratislava, 2013, pp. 17-23.

Abstract. The Choquet, Sugeno and Shilkret integrals with respect to monotone measures are useful tools in decision support systems. In this paper we propose a new class of graph-based integrals that generalize these three operations. Then, an efficient linear-time algorithm for computing their special case, that is lp-indices, 1≤p<∞, is presented. The algorithm is based on R.L. Graham's routine for determining the convex hull of a finite planar set.

Keywords. Monotone measures, Choquet, Sugeno and Shilkret integral, lp-index, convex hull, Graham's scan, scientific impact indices

Gagolewski M., On the relation between effort-dominating and symmetric minitive aggregation operators, In: Greco S. et al. (Eds.), Advances in Computational Intelligence, Part III (Communications in Computer and Information Science 299), Springer, 2012, pp. 276-285. doi:10.1007/978-3-642-31718-7_29

Abstract. In this paper the recently introduced class of effort-dominating impact functions is examined. It turns out that each effort-dominating aggregation operator not only has a very intuitive interpretation, but also is symmetric minitive, and therefore may be expressed as a so-called quasi-I-statistic, which generalizes the well-known OWMin operator.
These aggregation operators may be used e.g. in the Producer Assessment Problem whose most important instance is the scientometric/bibliometric issue of fair scientists’ ranking by means of the number of citations received by their papers.

Gagolewski M., Grzegorzewski P., Axiomatic characterizations of (quasi-) L-statistics and S-statistics and the Producer Assessment Problem, In: Galichet S., Montero J., Mauris G. (Eds.), Proc. EUSFLAT/LFA 2011, Atlantis Press, 2011, pp. 53-58. doi:10.2991/eusflat.2011.112

Abstract. Two classes of aggregation functions: L-statistics and S-statistics and their generalizations called quasi-L-statistics and quasi-S-statistics are considered. Some interesting characterizations of these families of operators are given. The aforementioned functions are useful for various applications. In particular, they are very helpful for modeling the so-called Producer Assessment Problem.

Rowiński T., Gagolewski M., Internet a kryzys, In: Jankowska M., Starzomska M. (Eds.), Kryzys: Pułapka czy szansa?, WN Akapit, 2011, pp. 211-224. isbn:978-83-609-5885-8

Gagolewski M., Grzegorzewski P., Metody i problemy naukometrii, In: Rowiński T., Tadeusiewicz R. (Eds.), Psychologia i informatyka. Synergia i kontradykcje, Wyd. UKSW, Warszawa, 2010, pp. 103-125. isbn:978-83-707-2679-9

Gagolewski M., Grzegorzewski P., S-Statistics and their basic properties, In: Borgelt C. et al. (Eds.), Combining Soft Computing and Statistical Methods in Data Analysis (Advances in Intelligent and Soft Computing 77), Springer, 2010, pp. 281-288. doi:10.1007/978-3-642-14746-3_35

Abstract. Some statistical properties of the so-called S-statistics, which generalize the ordered weighted maximum aggregation operators, are considered. In particular, the asymptotic normality of S-statistics is proved and some possible applications in estimation problems are suggested.

Gagolewski M., Grzegorzewski P., Arity-monotonic extended aggregation operators, In: Hüllermeier E., Kruse R., Hoffmann F. (Eds.), Information Processing and Management of Uncertainty in Knowledge-Based Systems (Communications in Computer and Information Science 80), Springer, 2010, pp. 693-702. doi:10.1007/978-3-642-14055-6_73

Abstract. A class of extended aggregation operators, called impact functions, is proposed and their basic properties are examined. Some important classes of functions, such as generalized ordered weighted averaging (OWA) and ordered weighted maximum (OWMax) operators, are considered. The general idea is illustrated by the Producer Assessment Problem, which includes the scientometric problem of rating scientists based on the number of citations received by their publications. An interesting characterization of the well-known h-index is given.

Gagolewski M., Grzegorzewski P., O pewnym uogólnieniu indeksu Hirscha, In: Kawalec P., Lipski P. (Eds.), Kadry i infrastruktura nowoczesnej nauki: teoria i praktyka, Proc. 1st Intl. Conf. Zarządzanie Nauką, 2009, Vol. 2, pp. 15-29. isbn:978-83-61671-12-1

Gagolewski M., Grzegorzewski P., Possible and necessary h-indices, In: Carvalho J.P. et al. (Eds.), Proc. IFSA/EUSFLAT 2009, 2009, pp. 1691-1695. isbn:978-989-95079-6-8

Abstract. The problem of measuring scientific impact is considered. A class of so-called p-sphere (rp) indices, which generalize the well-known Hirsch index, is used to construct a possibility measure of scientific impact. This measure might be treated as a starting point for prediction of future index values or for dealing with right-censored bibliometric data.