Research

Research Interests

My current research interests include, but are not limited to:

  • Complex Data Aggregation and Fusion, Prototype Learning
  • Machine Learning, Data Analysis and Mining Algorithms
  • Computational Statistics, Statistical Software
  • Decision Support, Recommender, and Expert Systems

See also: My Academic Vita (In English or In Polish) and my Publication List.

Research Projects

  1. National Science Center (NCN), Poland, research project 2014/13/D/HS4/01700 Construction and analysis of methods of information resources producers' quality management, Systems Research Institute, Polish Academy of Sciences, principal investigator, years 2015-2017,
  2. Research task A4.1.2, Systems Research Institute, Polish Academy of Sciences, principal investigator:
    1. Data aggregation algorithms – theory and applications (2014),
    2. New algorithms for data aggregation and fusion – theory and applications (2015),
    3. Algorithms for data aggregation and fusion – theory and applications in decision making (2016),
    4. Construction and investigation of new methods for data aggregation and analysis (2017).

Awards and Scholarships

  1. Ministry of Science and Higher Education, Poland, scholarship for outstanding young scientists (36 months), 2015,
  2. Foundation for Polish Science (FNP), scholarship for young, talented researchers – START Program, 2013,
  3. Warsaw University of Technology Rector's Award of the first degree for scientific achievements in 2010-2011, 2012,
  4. Warsaw University of Technology Rector's Award of the first degree for scientific achievements in 2008-2009, 2010,
  5. Ministry of Science and Higher Education, Poland, students' scholarship for outstanding scientific achievements, academic year 2007/2008.

Academic Degrees

Dec. 21, 2011 Ph.D. in Computer Science (Data aggregation);
Systems Research Institute, Polish Academy of Sciences;
June 30, 2008 M.Sc. in Computer Science (AI & computer graphics);
Faculty of Mathematics and Information Science, Warsaw University of Technology;

Academic Positions

Feb. 2012 – Assistant Professor (permanent);
Department of Stochastic Methods,
Systems Research Institute, Polish Academy of Sciences
Apr. 2012 – Assistant Professor (½);
Department of Stochastic Processes and Financial Mathematics,
Faculty of Mathematics and Information Science, Warsaw University of Technology
Oct. 2008 – Feb. 2012 Teaching and Research Assistant (½);
Department of Computer Science and Numerical Methods,
Faculty of Mathematics and Information Science, Warsaw University of Technology
July 2008 – Jan. 2012 Research Assistant;
Department of Stochastic Methods,
Systems Research Institute, Polish Academy of Sciences

Internships

2015 Postdoctoral Research Fellow at the Institute for Research and Applications of Fuzzy Modeling (IRAFM), Ostrava, Czech Republic (supervisor: Prof. Martin Štěpnička; length: 2 months; supported by ESF EU, agreement UDA-POKL.04.01.01-00-051/10-00)
2013 Postdoctoral Research Fellow at the Department of Mathematics, Slovak University of Technology (SvF STUBA), Bratislava, Slovakia (supervisor: Prof. Radko Mesiar; length: 4 months; supported by ESF EU, agreement UDA-POKL.04.01.01-00-051/10-00)

Ph.D. Students

I am currently the scientific adviser of the following Ph.D. students:

  1. Maciej Bartoszuk, M.Sc., Eng.
    (Faculty of Mathematics and Information Science, Warsaw University of Technology),
  2. Anna Cena, M.Sc.
    (Systems Research Institute, Polish Academy of Sciences),
  3. Jan Lasek, M.Sc.
    (Interdisciplinary Ph.D. Studies Program, ICS PAS).
Anna Cena, Maciej Bartoszuk, Marek Gagolewski, and Jan Lasek

Scientific Program Committees, Conference Organization, etc.

Program Committee Member for:

  1. IFSA/SCIS 2017 (17th World Congress of International Fuzzy Systems Association and 9th International Conference on Soft Computing and Intelligent Systems, Otsu, Japan),
  2. ISAS 2016 (International Symposium on Aggregation and Structures, Luxembourg),
  3. IFSA/EUSFLAT 2015 (16th World Congress of the International Fuzzy Systems Association and 9th Conference of the European Society for Fuzzy Logic and Technology, Gijon, Spain).

Guest Editor for:

  1. Data Mining and Knowledge Discovery – Special Issue Sport Analytics, 2016.

Special Session Organizer at:

  1. Algorithms for Data Aggregation and Fusion, EUSFLAT 2017 (10th Conference of the European Society for Fuzzy Logic and Technology, Warsaw, Poland),
  2. Computational Aspects of Data Aggregation and Complex Data Fusion, IPMU 2016 (16th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Eindhoven, The Netherlands).

Organizing Committee Member for:

  1. EUSFLAT 2017 (10th Conference of the European Society for Fuzzy Logic and Technology, Warsaw, Poland) – Stream on Data Analysis Coordinator,
  2. SMPS 2016 (8th International Conference on Soft Methods in Probability and Statistics, Rome, Italy),
  3. AGOP 2015 (8th International Summer School on Aggregation Operators, Katowice, Poland),
  4. SMPS 2014 (7th International Conference on Soft Methods in Probability and Statistics, Warsaw, Poland),
  5. 37th Statystyka Matematyczna – Wisła 2011 Conference.

Memberships

Invited Plenary Lectures

  1. t.b.a., 14th International Conference on Fuzzy Set Theory and Applications – FSTA 2018, Liptovský Ján, Slovakia, Jan. 28-Feb. 2, 2018 (planned).
  2. Aggregation of multidimensional data: A review, 9th International Summer School on Aggregation Operators – AGOP 2017, University of Skövde, Sweden, June 19-22, 2017 (planned).

Abstract. Aggregation theory classically deals with functions that summarize a sequence of numeric values, e.g., in the unit interval. Since componentwise monotonicity plays a key role in many situations, there is growing interest in methods that act on diverse ordered structures.
However, as far as the definition of a mean or an averaging function is concerned, the internality (or at least idempotence) property seems to be more important than the monotonicity condition. In particular, the Bajraktarević means and the mode are well-known non-monotone means.
The concept of a penalty-based function was first investigated by Yager in 1993. In such a framework, we are interested in minimizing the amount of "disagreement" between the inputs and the output being computed; the corresponding aggregation functions are at least idempotent and express many existing means in an intuitive and attractive way.
In this talk I focus on penalty-based aggregation of sequences of points in R^d, this time for some d ≥ 1. I review three noteworthy subclasses of penalty functions: componentwise extensions of unidimensional ones, those constructed upon pairwise distances between observations, and those defined by measuring the so-called data depth. Then, I discuss their formal properties, which are particularly useful from the perspective of data analysis, e.g., different possible generalizations of internality or equivariance to various geometric transforms. I also point out the difficulties with extending some notions that are key in classical aggregation theory, such as the monotonicity property.
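
As a small illustration of the first subclass (componentwise extensions), the Python sketch below compares the centroid with the componentwise median on three points in R^3. The function names and the sample data are assumptions made for this example only; they are not taken from the talk.

```python
from statistics import median


def centroid(points):
    """Componentwise arithmetic mean; it minimizes the sum of squared
    Euclidean distances to the inputs."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def componentwise_median(points):
    """Apply the unidimensional median to each coordinate separately."""
    return tuple(median(coords) for coords in zip(*points))


def squared_penalty(y, points):
    """Total 'disagreement' between a candidate output y and the inputs,
    measured here by the sum of squared Euclidean distances."""
    return sum(sum((a - b) ** 2 for a, b in zip(x, y)) for x in points)


# Illustrative data: the three unit vectors in R^3.
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

c = centroid(points)
m = componentwise_median(points)

print("centroid:", c, "penalty:", squared_penalty(c, points))
print("componentwise median:", m, "penalty:", squared_penalty(m, points))
```

All inputs lie in the plane x + y + z = 1, and so does the centroid. The componentwise median is the origin, which lies outside the triangle spanned by the inputs, so one natural generalization of internality (staying within the convex hull) fails for it.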

  3. Penalty-based fusion of complex data, computational aspects, and applications, International Symposium on Aggregation and Structures – ISAS 2016, University of Luxembourg, July 6, 2016.

Abstract. Since the 1980s, studies of aggregation functions have most often focused on the construction and formal analysis of diverse ways to summarize numerical lists with elements in some real interval. More recently, there has also been increasing interest in aggregation of, and aggregation on, generic partially ordered sets.
However, in many practical applications, the given data items have no natural ordering. Thus, in this talk we review various aggregation methods in spaces equipped merely with a semimetric (distance). These include penalty minimizers such as the centroid, 1-median, 1-center, medoid, and their generalizations, all of which lead to idempotent fusion functions. Special emphasis is placed on procedures to summarize vectors in R^d for d ≥ 2 (e.g., rows in numeric data frames) as well as character strings (e.g., DNA sequences), but the list of other interesting domains is long (rankings, graphs, images, time series, and so on).
We discuss some of their formal properties, exact or approximate (if the underlying optimization task is hard) algorithms to compute them, and their applications in clustering and classification tasks.
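
To make the distance-only setting concrete, here is a minimal Python sketch of one of the penalty minimizers named above, the medoid, applied to short DNA-like strings. It assumes the Levenshtein (edit) distance as the dissimilarity; the helper names and the sample strings are hypothetical and serve only as an example.

```python
def levenshtein(a, b):
    """Edit distance between two strings (unit-cost insertions,
    deletions, and substitutions), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def medoid(items, dist):
    """The input element with the smallest total distance to all inputs.
    Only a dissimilarity function is needed; no ordering, no vector space."""
    return min(items, key=lambda y: sum(dist(x, y) for x in items))


# Illustrative data only.
sequences = ["ACGT", "ACGA", "AGGT", "TTTT"]
print(medoid(sequences, levenshtein))
```

Because the medoid is always one of the inputs, it is idempotent by construction: aggregating n copies of the same string returns that string.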

Other Invited Talks

  4. stringi package for R, Text Analysis R Developers' Workshop, London School of Economics, London, England, Apr. 21-22, 2017 (planned).
  5. Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm and its R interface, European R Users Meeting, Poznań, Poland, Oct. 12-14, 2016.

Abstract. The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. For larger data sets, this constraint puts all the classical linkage criteria except single linkage at a disadvantage. However, the single linkage clustering algorithm is known to be very sensitive to outliers and to produce highly skewed dendrograms; therefore, it usually does not reflect the true underlying data structure unless the clusters are well-separated.
To overcome its limitations, we proposed a new hierarchical clustering linkage criterion called *Genie* (Gagolewski, Bartoszuk, Cena, 2016). Namely, our algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini or Bonferroni index) of the cluster sizes does not increase drastically above a given threshold.
Benchmarks indicate a high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of the clustering quality while retaining the single linkage speed. The algorithm is easily parallelizable and thus may be run on multiple threads to speed up its execution further. Its memory overhead is small: there is no need to precompute the complete distance matrix in order to obtain a desired clustering.
In this talk we will discuss its reference implementation, included in the *genie* package for R.

Keywords. hierarchical clustering, single linkage, inequity measures, Gini index
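
The linkage idea described in the abstract can be sketched in a few lines of Python. This is a naive, unoptimized illustration and not the reference implementation from the genie package. It assumes one particular reading of the criterion: merge as in single linkage, but whenever the normalized Gini index of the cluster sizes exceeds the threshold, only allow merges that involve a cluster of the smallest size. The function names, the default threshold of 0.3, and the data are assumptions made for the example.

```python
from itertools import combinations
from math import dist


def gini(sizes):
    """Normalized Gini index of cluster sizes: 0 when all sizes are equal,
    approaching 1 when one cluster dominates."""
    k = len(sizes)
    if k < 2:
        return 0.0
    spread = sum(abs(a - b) for a, b in combinations(sizes, 2))
    return spread / ((k - 1) * sum(sizes))


def genie_sketch(points, n_clusters, threshold=0.3):
    """Agglomerative clustering; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def single_link(u, v):
        # Smallest pairwise distance between the two clusters.
        return min(dist(points[i], points[j])
                   for i in clusters[u] for j in clusters[v])

    while len(clusters) > n_clusters:
        sizes = [len(c) for c in clusters]
        pairs = list(combinations(range(len(clusters)), 2))
        if gini(sizes) > threshold:
            # Sizes are too unequal: force a smallest cluster to take part.
            smallest = min(sizes)
            pairs = [(u, v) for u, v in pairs
                     if smallest in (sizes[u], sizes[v])]
        u, v = min(pairs, key=lambda p: single_link(*p))
        clusters[u] = clusters[u] + clusters[v]  # u < v, so deleting v is safe
        del clusters[v]
    return clusters


# Two tight groups and one distant outlier (illustrative data).
points = [(0, 0), (1, 0), (2, 0), (3, 0),
          (10, 0), (11, 0), (12, 0), (13, 0),
          (30, 0)]

print(genie_sketch(points, 2, threshold=0.3))
print(genie_sketch(points, 2, threshold=1.0))  # never exceeded: plain single linkage
```

With the threshold set to 1.0 the rule never triggers, and the outlier ends up as a cluster of its own, as is typical for single linkage. With the lower threshold the outlier is absorbed into the nearer group and the two groups stay separate.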

  6. Can the scientific assessment process be fair?, Workshop on Research Evaluation, Free University of Bozen-Bolzano, Italy, May 10, 2013.

Abstract. We will examine the fundamental properties of impact functions, that is, the aggregation operators that may be used, e.g., in the assessment of scientists by means of the citations received by their papers. It turns out that each impact function which gives noncontroversial valuations in disputable cases must necessarily be trivial. Moreover, we will show that for any set of authors with ambiguous citation records, we may construct an impact function that gives ANY desired ordering of the authors. Theoretically, then, there is considerable room for manipulation.
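
A toy Python example of the kind of ambiguity mentioned above: two common impact functions, the h-index and the total number of citations, rank the same two authors in opposite order. It only illustrates that the choice of function can decide the ranking; it does not reproduce the construction from the talk, and the citation records are made up.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def total_citations(citations):
    """Sum of the citations over all papers."""
    return sum(citations)


# Hypothetical citation records (one number per paper).
author_a = [10, 1, 1]   # one highly cited paper
author_b = [3, 3, 3]    # several moderately cited papers

for name, impact in [("h-index", h_index), ("total citations", total_citations)]:
    a, b = impact(author_a), impact(author_b)
    leader = "A" if a > b else "B"
    print(f"{name}: A = {a}, B = {b}, author {leader} ranks higher")
```

Neither record dominates the other paper by paper, which is what makes the comparison disputable in the first place.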

Talks at Seminars

  7. Genie: Nowy, szybki i odporny algorytm analizy skupień (Genie: A new, fast, and robust cluster analysis algorithm), IBS PAN Seminar, Warsaw, Poland, May 23, 2017 (planned).
  8. Agregacja danych: Teoria, metody i zastosowania (Data aggregation: Theory, methods, and applications), lecture for students of the IBS PAN Doctoral Studies, Warsaw, Poland, Mar. 5, 2016.
  9. Data aggregation from an algorithmic perspective, IRAFM Seminar, University of Ostrava, Czech Republic, June 4, 2015.
  10. ^(R|ICU|i18n|regex)$, Seminar on Mathematical Methods of Computer Science, Institute of Mathematics, University of Silesia, Katowice, Poland, Apr. 20, 2015.
  11. Indeks Hirscha i okolice (Hirsch's index & co), CeON, ICM UW, Warsaw, Poland, Mar. 12, 2014.
  12. Scientific impact assessment: State of the art (from aggregation perspective) – Agregačné funkcie: teória a aplikácie (Aggregation functions: Theory and applications), Seminar on Uncertainty Modelling, Department of Mathematics and Descriptive Geometry, SvF STU, Bratislava, Slovakia, Apr. 17, 2013.

Talks at Conferences

  13. Binary aggregation functions in software plagiarism detection, 2017 IEEE Intl. Conf. on Fuzzy Systems (FUZZ-IEEE 2017), Naples, Italy, July 9-12, 2017 (planned).
  14. Binary aggregation functions in software plagiarism detection, 3rd Intl. Symp. on Fuzzy Sets and Uncertainty Modeling (ISFS 2017), Rzeszów, Poland, May 19-20, 2017 (planned).
  15. Hierarchical clustering via penalty-based aggregation and the Genie approach, 13th Intl. Conf. Modeling Decisions for Artificial Intelligence (MDAI 2016), Sant Julià de Lòria, Andorra, Sep. 19-21, 2016.
  16. Fitting aggregation functions to data: Part I – Linearization and regularization, 16th Intl. Conf. Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2016), Eindhoven, The Netherlands, June 20-24, 2016.
  17. Some issues in aggregation of multidimensional data, AGOP 2015, Katowice, Poland, July 7, 2015.
  18. Normalized WDpWAM and WDpOWA spread measures, IFSA/EUSFLAT 2015, Gijon, Spain, July 2, 2015.
  19. Sugeno integral-based confidence intervals for the theoretical h-index, 7th Intl. Conf. Soft Methods in Probability and Statistics (SMPS), Warsaw, Poland, Sep. 24, 2014.
  20. OM3: ordered maxitive, minitive, and modular aggregation operators – Part I: Axiomatic analysis under arity-dependence, AGOP 2013, Pamplona, Spain, July 16-19, 2013.
  21. Statistical hypothesis test for the difference between Hirsch indices of two Pareto-distributed random samples, 6th Intl. Conf. Soft Methods in Probability and Statistics (SMPS), Konstanz, Germany, Oct. 4-6, 2012.
  22. On the relation between effort-dominating and symmetric minitive aggregation operators, 14th Intl. Conf. Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU), Catania, Italy, July 9-13, 2012.
  23. Porównanie wybranych estymatorów teoretycznego indeksu Hirscha (A comparison of selected estimators of the theoretical Hirsch index), 37th Conf. Statystyka Matematyczna, Wisła, Poland, Dec. 5-9, 2011.
  24. Axiomatic characterizations of (quasi-) L-statistics and S-statistics and the Producer Assessment Problem, 7th Intl. Conf. EUSFLAT/LFA, Aix-Les-Bains, France, July 18-22, 2011.
  25. Podstawowe właściwości S-statystyk (Basic properties of S-statistics), 36th Conf. Statystyka Matematyczna, Wisła, Poland, Dec. 6-10, 2010.
  26. S-Statistics and their basic properties, 5th Intl. Conf. Soft Methods in Probability and Statistics (SMPS), Oviedo, Spain, Sep. 28-Oct. 1, 2010.
  27. Arity-monotonic extended aggregation operators, 13th Intl. Conf. Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU), Dortmund, Germany, June 28-July 2, 2010.
  28. Uogólniony indeks Hirscha a dwupróbkowe testy dla rodziny rozkładów Pareto II rodzaju (The generalized Hirsch index and two-sample tests for the family of Pareto type II distributions), 35th Conf. Statystyka Matematyczna, Wisła, Poland, Dec. 7-11, 2009.
  29. O pewnym uogólnieniu indeksu Hirscha (On a generalization of the Hirsch index), 1st Intl. Conf. Zarządzanie Nauką, Lublin, Poland, Nov. 20-22, 2009.
  30. Possible and necessary h-indices, 6th Intl. Conf. IFSA/EUSFLAT, Lisbon, Portugal, July 20-24, 2009.