Research

Research Interests

My current research interests include, but are not limited to:

  • Complex Data Aggregation and Fusion, Prototype Learning
  • Machine Learning, Data Analysis and Mining Algorithms
  • Computational Statistics, Statistical Software
  • Interdisciplinary Modeling

See also: my Academic Vita (in English or in Polish) and my Publication List.

MADAM Seminar: Methods for Analysis of Data – Algorithms and Modeling

Research Projects

  1. The Czech Science Foundation (GAČR); Research project 18-06915S New approaches to aggregation operators in analysis and processing of data; Host institution: University of Olomouc, Czechia; Duration: 36 months; Role: co-investigator; Principal investigator: Prof. Radomír Halaš; 2018.
  2. National Science Center (NCN), Poland; Research project 2014/13/D/HS4/01700 Construction and analysis of methods of information resources producers' quality management; Host institution: Systems Research Institute, Polish Academy of Sciences; Duration: 30 months; Role: principal investigator; Co-investigators: Maciej Bartoszuk, Anna Cena; 2015.
  3. Systems Research Institute, Polish Academy of Sciences; Research task A4.1.2:
    1. Construction and investigation of new methods for data aggregation and analysis; Role: principal investigator; Co-investigators: Anna Cena, Barbara Żogała-Siudem; 2017,
    2. Algorithms for data aggregation and fusion – theory and applications in decision making; Role: principal investigator; Co-investigator: Anna Cena; 2016,
    3. New algorithms for data aggregation and fusion – theory and applications; Role: principal investigator; Co-investigator: Anna Cena; 2015,
    4. Data aggregation algorithms – theory and applications; Role: principal investigator; Co-investigator: Anna Cena; 2014.

Awards and Scholarships

  1. Ministry of Science and Higher Education, Poland, scholarship for outstanding young scientists; Duration: 36 months; 2015,
  2. Foundation for Polish Science (FNP), scholarship for young, talented researchers – START Program; Duration: 12 months; 2013,
  3. Warsaw University of Technology Rector's Award of the first degree for scientific achievements in 2010-2011; 2012,
  4. Warsaw University of Technology Rector's Award of the first degree for scientific achievements in 2008-2009; 2010,
  5. Ministry of Science and Higher Education, Poland, students' scholarship for outstanding scientific achievements; 2007.

Academic Degrees

Oct. 20, 2017 Habilitation in Computer Science (Data aggregation and analysis);
Systems Research Institute, Polish Academy of Sciences
Dec. 21, 2011 Ph.D. in Computer Science (Data aggregation);
Systems Research Institute, Polish Academy of Sciences
June 30, 2008 M.Sc. in Computer Science (with honors; AI & computer graphics);
Faculty of Mathematics and Information Science, Warsaw University of Technology

Academic Positions

Apr. 2018 – Associate Professor;
Department of Stochastic Methods,
Systems Research Institute, Polish Academy of Sciences
Jan. 2017 – Associate Professor;
Faculty of Mathematics and Information Science, Warsaw University of Technology
Feb. 2012 – Mar. 2018 Assistant Professor;
Department of Stochastic Methods,
Systems Research Institute, Polish Academy of Sciences
Apr. 2012 – Dec. 2017 Assistant Professor;
Faculty of Mathematics and Information Science, Warsaw University of Technology
Oct. 2008 – Feb. 2012 Teaching and Research Assistant;
Faculty of Mathematics and Information Science, Warsaw University of Technology
July 2008 – Jan. 2012 Research Assistant;
Department of Stochastic Methods,
Systems Research Institute, Polish Academy of Sciences

Short-Term Research Visits

July 17 – Aug. 4, 2017 School of Information Technology, Deakin University, Burwood, Victoria, Australia
Supported by the SEBE Researcher in Residence Program 2017, Deakin University
Apr. 13 – June 14, 2015 Institute for Research and Applications of Fuzzy Modeling, University of Ostrava, Czech Republic
Supported by the European Union European Social Fund, Project UDA-POKL.04.01.01-00-051/10-00 Information technologies: Research and their interdisciplinary applications
Mar. 1 – June 30, 2013 Department of Mathematics, Slovak University of Technology, Bratislava, Slovakia
Supported by the European Union European Social Fund, Project UDA-POKL.04.01.01-00-051/10-00 Information technologies: Research and their interdisciplinary applications

Ph.D. Students

I am currently the supervisor of the following Ph.D. students (work ongoing):

  1. Maciej Bartoszuk, M.Sc., Eng.
    (Research and Teaching Assistant, Faculty of Mathematics and Information Science, Warsaw University of Technology)
  2. Anna Cena, M.Sc.
    (Research Assistant, Systems Research Institute, Polish Academy of Sciences)

I am currently the scientific adviser of the following Ph.D. students:

  1. Agnieszka Geras, M.Sc.
    (Ph.D. Studies Program, Faculty of Mathematics and Information Science, Warsaw University of Technology)
  2. Jan Lasek, M.Sc.
    (Interdisciplinary Ph.D. Studies Program, Institute of Computer Science, Polish Academy of Sciences & deepsense.io)
Anna Cena, Maciej Bartoszuk, Marek Gagolewski, and Jan Lasek in Gijón, Spain, 2015

Reviewing Activity

I was a reviewer of research project proposals for:

  1. Fondo Nacional de Desarrollo Científico y Tecnológico (FONDECYT; The National Fund for Scientific and Technological Development), Chile; 2017 (1).

I served as a reviewer of Ph.D. theses of:

  1. Jana Borzová; Faculty of Science, Pavel Jozef Šafárik University in Košice, Slovakia; 2018,
  2. Hossein Yazdani; Faculty of Electronics, Wrocław University of Science and Technology, Poland; 2018.

I wrote 176 publication reviews, including 139 peer reviews for the following international journals:

  1. ACM Transactions on Mathematical Software (3),
  2. Afrika Matematika (1),
  3. Computational and Applied Mathematics (1),
  4. Data Mining and Knowledge Discovery (3),
  5. Demonstratio Mathematica (1),
  6. European Journal of Operational Research (8),
  7. Foundations of Computing and Decision Sciences (1),
  8. Fuzzy Optimization and Decision Making (1),
  9. Fuzzy Sets and Systems (18),
  10. Group Decision and Negotiation (1),
  11. IEEE Access (1),
  12. IEEE Transactions on Fuzzy Systems (29),
  13. Information Fusion (2),
  14. Information Sciences (30),
  15. International Journal of Approximate Reasoning (2),
  16. International Journal of Computational Intelligence Systems (1),
  17. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems (3),
  18. Journal of Applied Analysis (1),
  19. Journal of Engineering Education (1),
  20. Journal of Informetrics (2),
  21. Journal of Intelligent and Fuzzy Systems (3),
  22. Journal of the American Society for Information Science and Technology (7),
  23. Knowledge-Based Systems (1),
  24. Mathematical Problems in Engineering (1),
  25. Pervasive and Mobile Computing (1),
  26. RUDN Journal of Mathematics, Information Sciences and Physics (1),
  27. Scientometrics (14),
  28. Soft Computing (1),

and 37 for international conferences (IFSA/EUSFLAT 2009, IPMU 2010, IPMU 2012, SMPS 2014, EUSFLAT 2015, IPMU 2016, ISAS 2016, SMPS 2016, EUSFLAT 2017, IFSA/SCIS 2017).

Scientific Program Committees, Conference Organization, etc.

Program Committee Member for:

  1. EUSFLAT 2019 (11th Conference of the European Society for Fuzzy Logic and Technology, Prague, Czechia),
  2. ISAS 2018 (2nd International Symposium on Aggregation and Structures, Valladolid, Spain),
  3. ITSRCP'18 (3rd Conference on Information Technology, Systems Research and Computational Physics, Cracow, Poland),
  4. IFSA/SCIS 2017 (17th World Congress of the International Fuzzy Systems Association and 9th International Conference on Soft Computing and Intelligent Systems, Otsu, Japan),
  5. ISAS 2016 (1st International Symposium on Aggregation and Structures, Luxembourg),
  6. IFSA/EUSFLAT 2015 (16th World Congress of the International Fuzzy Systems Association and 9th Conference of the European Society for Fuzzy Logic and Technology, Gijón, Spain).

Special Session Organizer at:

  1. Algorithms for Data Aggregation and Fusion, EUSFLAT 2017 (10th Conference of the European Society for Fuzzy Logic and Technology, Warsaw, Poland),
  2. Computational Aspects of Data Aggregation and Complex Data Fusion, IPMU 2016 (16th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Eindhoven, The Netherlands).

Organizing Committee Member for:

  1. EUSFLAT 2017 (10th Conference of the European Society for Fuzzy Logic and Technology, Warsaw, Poland) – Stream on Data Analysis Coordinator,
  2. SMPS 2016 (8th International Conference on Soft Methods in Probability and Statistics, Rome, Italy),
  3. AGOP 2015 (8th International Summer School on Aggregation Operators, Katowice, Poland),
  4. SMPS 2014 (7th International Conference on Soft Methods in Probability and Statistics, Warsaw, Poland),
  5. 37th Statystyka Matematyczna – Wisła 2011 Conference.

Invited Plenary Lectures and Tutorials

1. Clustering on MSTs, International Student Conference on Applied Mathematics and Informatics ISCAMI'18, Malenovice, Czechia, May 10-13, 2018.

Abstract. Cluster analysis is one of the most commonly applied unsupervised machine learning techniques. Its aim is to automatically discover the underlying structure of a data set, represented by a partition of its elements: mutually disjoint, nonempty subsets are determined in such a way that observations within each group are "similar" and entities in distinct clusters "differ" from each other as much as possible.
It turns out that two state-of-the-art clustering algorithms, namely Genie and HDBSCAN*, can be computed based on the minimum spanning tree (MST) of the pairwise dissimilarity graph. Both are resistant to outliers, produce high-quality partitions, and are relatively fast to compute.
The aim of this tutorial is to discuss some key issues of hierarchical clustering and to explore their relations with graph theory and data aggregation theory.
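To illustrate the MST connection, here is a minimal Python sketch of the simplest MST-based method, single linkage; it is not the Genie or HDBSCAN* implementation, and the function name is hypothetical. It runs Kruskal's algorithm on the complete Euclidean graph and stops once k components remain, which (up to ties) is equivalent to building the full MST and cutting its k-1 longest edges.

```python
import math
from itertools import combinations

def single_linkage(points, k):
    """Kruskal's algorithm on the complete Euclidean graph, stopped at k components.

    The accepted edges form a minimum spanning forest; stopping after n-k merges
    corresponds to cutting the k-1 longest edges of the full MST (up to ties)."""
    n = len(points)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # all pairwise distances, shortest first: O(n^2 log n), fine for a sketch
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    components = n
    for _, i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two different clusters, so it is an MST edge
            parent[ri] = rj
            components -= 1
    labels = {}  # relabel roots as 0, 1, 2, ... in order of first appearance
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

For two well-separated unit triangles and k = 2, each triangle forms its own cluster. Genie and HDBSCAN* also work on a minimum spanning tree (HDBSCAN* on one built with respect to a modified, mutual reachability distance), but use more refined rules to decide which edges to cut.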

2. Stochastic properties of and agent-based models for the Hirsch index and other discrete Sugeno integrals, 14th International Conference on Fuzzy Set Theory and Applications – FSTA 2018, Liptovský Ján, Slovakia, Feb. 2, 2018.

Abstract. Hirsch's h-index is perhaps the most popular citation-based measure of scientific excellence. Many of its natural generalizations can be expressed as simple functions of some discrete Sugeno integrals.
In this talk we shall review some lesser-known results concerning various stochastic properties of the discrete Sugeno integral with respect to a symmetric normalized capacity, i.e., weighted lattice polynomial functions of real-valued random variables, in both the i.i.d. (independent and identically distributed) and non-i.i.d. (with some dependence structure) cases. For instance, we will investigate their exact and asymptotic distributions. Based on these, we can, among other things, show that the h-index is a consistent estimator of a natural location characteristic of the underlying probability distribution. Moreover, we can derive a statistical test to verify whether the difference between two h-indices (say, h'=7 vs. h''=10 in cases where both authors published 40 papers) is actually significant.
What is more, we shall discuss some agent-based models that describe the processes generating citation networks based on, e.g., the preferential attachment ("rich gets richer") rule. Thanks to such an approach, we are able to simulate a scientist's activity and then estimate the expected values of the h-index and similar functions based on very simple sample statistics, such as the total number of citations and the total number of publications. Such results can help explain what the h-index really measures.
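The link between the h-index and discrete Sugeno integrals can be checked numerically. The Python sketch below (all function names are hypothetical) computes the h-index from its usual definition and as max_i min(i, x_(i)), where x_(1) ≥ x_(2) ≥ ... are the sorted citation counts, i.e., a Sugeno integral with respect to the unnormalized counting measure; for integer citation counts the two agree. It also includes a toy preferential-attachment simulator, an assumption made purely for illustration and not the agent-based model from the talk, in which each new citation goes to a paper with probability proportional to its current citation count plus one.

```python
import random

def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    x = sorted(citations, reverse=True)
    # x is nonincreasing while i increases, so the condition holds on a prefix only
    return sum(1 for i, c in enumerate(x, start=1) if c >= i)

def h_index_sugeno(citations):
    """Same value for integer data, written as max_i min(i, x_(i)):
    a discrete Sugeno integral w.r.t. the (unnormalized) counting measure."""
    x = sorted(citations, reverse=True)
    return max((min(i, c) for i, c in enumerate(x, start=1)), default=0)

def simulate_citations(n_papers, n_citations, seed=0):
    """Toy 'rich gets richer' model (hypothetical): each new citation goes to
    paper j with probability proportional to citations[j] + 1."""
    rng = random.Random(seed)
    cites = [0] * n_papers
    for _ in range(n_citations):
        j = rng.choices(range(n_papers), weights=[c + 1 for c in cites])[0]
        cites[j] += 1
    return cites
```

For instance, averaging h_index(simulate_citations(40, 400, seed)) over many seeds gives a Monte Carlo estimate of the expected h-index of an author with 40 papers and 400 citations in total, under this toy model only.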

3. Aggregation of multidimensional data: A review, 9th International Summer School on Aggregation Operators – AGOP 2017, University of Skövde, Sweden, June 19-22, 2017.

Abstract. Aggregation theory classically deals with functions that summarize a sequence of numeric values, e.g., from the unit interval. Since the notion of componentwise monotonicity plays a key role in many situations, there is growing interest in methods that act on diverse ordered structures.
However, as far as the definition of a mean or an averaging function is concerned, the internality (or at least idempotence) property seems to be more important than the monotonicity condition. In particular, the Bajraktarević means and the mode are well-known examples of non-monotone means.
The concept of a penalty-based function was first investigated by Yager in 1993. In such a framework, we are interested in minimizing the amount of "disagreement" between the inputs and the output being computed; the corresponding aggregation functions are at least idempotent and express many existing means in an intuitive and attractive way.
In this talk I focus on penalty-based aggregation of sequences of points in R^d, this time for some d ≥ 1. I review three noteworthy subclasses of penalty functions: componentwise extensions of unidimensional ones, those constructed upon pairwise distances between observations, and those defined by measuring the so-called data depth. Then, I discuss their formal properties that are particularly useful from the perspective of data analysis, e.g., different possible generalizations of internality, or equivariance with respect to various geometric transformations. I also point out the difficulties with extending some notions that are key in classical aggregation theory, like the monotonicity property.
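The first subclass, componentwise extensions, and the role of geometric equivariance can be illustrated with a short Python sketch (function names are hypothetical). The componentwise median minimizes the sum of L1 distances to the inputs, while the centroid minimizes the sum of squared Euclidean distances. Both are idempotent, but a single 45° rotation of three points is enough to show that the componentwise median does not commute with rotations, whereas the centroid does.

```python
import math
from statistics import median, fmean

def componentwise_median(points):
    """Minimizer of sum_i ||x_i - y||_1: the 1-D median applied to each coordinate."""
    return tuple(median(coord) for coord in zip(*points))

def centroid(points):
    """Minimizer of sum_i ||x_i - y||_2^2: the coordinate-wise arithmetic mean."""
    return tuple(fmean(coord) for coord in zip(*points))

def rotate(p, theta):
    """Rotate a 2-D point counterclockwise by theta radians about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def is_rotation_equivariant(agg, points, theta, tol=1e-9):
    """Check agg(R x_1, ..., R x_n) == R agg(x_1, ..., x_n) for one rotation R."""
    lhs = agg([rotate(p, theta) for p in points])
    rhs = rotate(agg(points), theta)
    return all(abs(a - b) <= tol for a, b in zip(lhs, rhs))

pts = [(0, 0), (1, 0), (0, 1)]
print(is_rotation_equivariant(centroid, pts, math.pi / 4))              # True
print(is_rotation_equivariant(componentwise_median, pts, math.pi / 4))  # False
```

Penalties built on pairwise Euclidean distances (e.g., the sum of distances, whose minimizer is the spatial median) do not depend on the choice of coordinate axes and hence yield rotation-equivariant aggregates, which is one reason to consider several subclasses side by side.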

4. Penalty-based fusion of complex data, computational aspects, and applications, International Symposium on Aggregation and Structures – ISAS 2016, University of Luxembourg, July 6, 2016.

Abstract. Since the 1980s, studies of aggregation functions have most often focused on the construction and formal analysis of diverse ways to summarize numerical lists with elements in some real interval. Quite recently, we have also observed an increasing interest in aggregation of, and aggregation on, generic partially ordered sets.
However, in many practical applications, there is no natural ordering of the given data items. Thus, in this talk we review various aggregation methods in spaces equipped merely with a semimetric (distance). These include such penalty minimizers as the centroid, 1-median, 1-center, and medoid, as well as their generalizations, all of which lead to idempotent fusion functions. Special emphasis is placed on procedures to summarize vectors in R^d for d ≥ 2 (e.g., rows in numeric data frames) as well as character strings (e.g., DNA sequences), but of course the list of other interesting domains could go on (rankings, graphs, images, time series, and so on).
We discuss some of their formal properties, exact or approximate (if the underlying optimization task is hard) algorithms to compute them, and their applications in clustering and classification tasks.
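For data equipped only with a distance, the simplest penalty minimizer to compute is the medoid: the input item with the smallest sum of distances to all inputs. The Python sketch below (function names are hypothetical) uses the Levenshtein edit distance on short DNA-like strings and also computes a 1-center restricted to the input items, i.e., the item with the smallest maximal distance. Restricting the search to the inputs keeps the procedure cheap and idempotent; finding a true median string over all possible strings is known to be hard in general, a setting where approximate algorithms may be needed.

```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

def medoid(items, dist):
    """Input item minimizing the sum of distances to all inputs."""
    return min(items, key=lambda y: sum(dist(x, y) for x in items))

def center(items, dist):
    """Input item minimizing the maximal distance to all inputs (restricted 1-center)."""
    return min(items, key=lambda y: max(dist(x, y) for x in items))

seqs = ["ACGT", "ACGA", "ACGG", "TTTT"]
print(medoid(seqs, levenshtein), center(seqs, levenshtein))  # ACGT ACGT
```

Because both functions accept any distance, the same code summarizes rankings, graphs, or time series once a suitable (semi)metric is plugged in.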

Other Invited Talks

5. R package stringi, Text Analysis Developers' Workshop, New York University, New York City, NY, US, Apr. 20-21, 2018.
6. Algorytmy analizy skupień oparte na MST (MST-based clustering algorithms), Studencka konferencja zastosowań matematyki DwuMIan'18 (student conference on applications of mathematics), Warsaw, Poland, Mar. 24, 2018.
7. R package stringi, Text Analysis R Developers' Workshop, London School of Economics, London, England, Apr. 21-22, 2017.
8. Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm and its R interface, European R Users Meeting, Poznań, Poland, Oct. 12-14, 2016.

Abstract. The time needed to apply a hierarchical clustering algorithm is most often dominated by the number of computations of a pairwise dissimilarity measure. For larger data sets, this constraint puts all the classical linkage criteria except single linkage at a disadvantage. However, the single linkage algorithm is known to be very sensitive to outliers and to produce highly skewed dendrograms; therefore, it usually does not reflect the true underlying data structure, unless the clusters are well separated.
To overcome its limitations, we proposed a new hierarchical clustering linkage criterion called *Genie* (Gagolewski, Bartoszuk, Cena, 2016). Namely, our algorithm links two clusters in such a way that a chosen economic inequity measure (e.g., the Gini or Bonferroni index) of the cluster sizes does not drastically exceed a given threshold.
Benchmarks indicate the high practical usefulness of the introduced method: it most often outperforms the Ward or average linkage in terms of clustering quality while retaining the speed of single linkage. The algorithm is easily parallelizable and thus may be run on multiple threads to speed up its execution further. Its memory overhead is small: the complete distance matrix need not be precomputed to obtain the desired clustering.
In this talk we will discuss its reference implementation, included in the *genie* package for R.

Keywords. hierarchical clustering, single linkage, inequity measures, Gini index
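The Genie linkage rule can be sketched on top of a minimum spanning tree. This is a simplified Python illustration under stated assumptions, not the reference implementation from the genie package: it processes MST edges in increasing order of length, but whenever the normalized Gini index of the current cluster sizes exceeds the threshold g, it only accepts the shortest remaining edge that touches one of the smallest clusters. Function names and the value g = 0.3 are illustrative choices.

```python
import math

def mst_edges(points):
    """Euclidean MST edges (length, i, j) via Prim's algorithm, O(n^2) time."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known link from each vertex to the tree
    link = [-1] * n        # tree vertex realizing that link
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((best[u], link[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges

def gini(sizes):
    """Normalized Gini index: sum of |c_i - c_j| over pairs / ((k-1) * total)."""
    k, total = len(sizes), sum(sizes)
    if k < 2:
        return 0.0
    diff = sum(abs(a - b) for i, a in enumerate(sizes) for b in sizes[i + 1:])
    return diff / ((k - 1) * total)

def genie(points, k, g=0.3):
    """Merge clusters along MST edges until k remain, applying the Gini correction."""
    n = len(points)
    edges = sorted(mst_edges(points))  # shortest edges first
    used = [False] * len(edges)
    parent, size = list(range(n)), [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _ in range(n - k):  # each merge reduces the number of clusters by one
        sizes = [size[r] for r in {find(i) for i in range(n)}]
        if gini(sizes) > g:
            smallest = min(sizes)
            # shortest unused MST edge incident to a cluster of minimal size
            e = next(t for t, (_, i, j) in enumerate(edges) if not used[t]
                     and min(size[find(i)], size[find(j)]) == smallest)
        else:
            e = used.index(False)  # ordinary single linkage step
        used[e] = True
        _, i, j = edges[e]
        ri, rj = find(i), find(j)
        parent[ri] = rj
        size[rj] += size[ri]

    labels = {}  # relabel roots as 0, 1, 2, ... in order of first appearance
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]
```

For example, take two 2×2 grids of points, with corners (0,0), (1,1) and (10,0), (11,1), plus an outlier at (5,20), and ask for k = 2. With g = 0.3 the sketch keeps the two grids apart and attaches the outlier to one of them. With g = 1 the correction never triggers, so the procedure reduces to single linkage, which merges both grids and isolates the outlier.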

9. Can the scientific assessment process be fair?, Workshop on Research Evaluation, Free University of Bozen-Bolzano, Italy, May 10, 2013.

Abstract. We will examine the very fundamental properties of impact functions, that is, aggregation operators that may be used, e.g., in the assessment of scientists by means of the citations received by their papers. It turns out that each impact function that gives noncontroversial valuations in disputable cases must necessarily be trivial. Moreover, we will show that for any set of authors with ambiguous citation records, we may construct an impact function that gives ANY desired ordering of the authors. Theoretically, then, there is considerable room for manipulation.

Talks at Seminars

10. Aggregation of multidimensional data: A review, School of Information Technology, Deakin University, Burwood, Victoria, Australia, July 21, 2017.
11. Genie: Nowy, szybki i odporny algorytm analizy skupień (Genie: A new, fast, and robust clustering algorithm), Seminarium IBS PAN (IBS PAN Seminar), Warsaw, Poland, May 23, 2017.
12. Agregacja danych: Teoria, metody i zastosowania (Data aggregation: Theory, methods, and applications), Wykład dla słuchaczy Studiów Doktoranckich IBS PAN (lecture for IBS PAN Doctoral Studies students), Warsaw, Poland, Mar. 5, 2016.
13. Data aggregation from an algorithmic perspective, IRAFM Seminar, University of Ostrava, Czech Republic, June 4, 2015.
14. ^(R|ICU|i18n|regex)$, Seminarium Matematyczne Metody Informatyki (Mathematical Methods of Computer Science Seminar), Instytut Matematyki, Uniwersytet Śląski (Institute of Mathematics, University of Silesia), Katowice, Poland, Apr. 20, 2015.
15. Indeks Hirscha i okolice (Hirsch's index & co), CeON, ICM UW, Warsaw, Poland, Mar. 12, 2014.
16. Scientific impact assessment: State of the art (from aggregation perspective) – Agregačné funkcie: teória a aplikácie (Aggregation functions: Theory and applications), Seminár z modelovania neurčitosti (Seminar on Uncertainty Modeling), Katedra matematiky a deskriptívnej geometrie, SvF STU (Department of Mathematics and Descriptive Geometry, Faculty of Civil Engineering, Slovak University of Technology), Bratislava, Slovakia, Apr. 17, 2013.

Talks at Conferences

17. Fitting symmetric fuzzy measures for discrete Sugeno integration, 10th Intl. Conf. EUSFLAT'17, Warsaw, Poland, Sept. 11-15, 2017.
18. Binary aggregation functions in software plagiarism detection, 2017 IEEE Conference on Fuzzy Systems (IEEE FUZZ 2017), Naples, Italy, July 9-12, 2017.
19. Binary aggregation functions in software plagiarism detection, 3rd Intl. Symp. Fuzzy Sets and Uncertainty Modeling (ISFS 2017), Rzeszów, Poland, May 19-20, 2017.
20. Hierarchical clustering via penalty-based aggregation and the Genie approach, 13th Intl. Conf. Modeling Decisions for Artificial Intelligence (MDAI 2016), Sant Julià de Lòria, Andorra, Sept. 19-21, 2016.
21. Fitting aggregation functions to data: Part I – Linearization and regularization, 16th Intl. Conf. Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2016), Eindhoven, The Netherlands, June 20-24, 2016.
22. Some issues in aggregation of multidimensional data, AGOP 2015, Katowice, Poland, July 7, 2015.
23. Normalized WDpWAM and WDpOWA spread measures, IFSA/EUSFLAT 2015, Gijón, Spain, July 2, 2015.
24. Sugeno integral-based confidence intervals for the theoretical h-index, 7th Intl. Conf. Soft Methods in Probability and Statistics (SMPS), Warsaw, Poland, Sept. 24, 2014.
25. OM3: ordered maxitive, minitive, and modular aggregation operators – Part I: Axiomatic analysis under arity-dependence, AGOP 2013, Pamplona, Spain, July 16-19, 2013.
26. Statistical hypothesis test for the difference between Hirsch indices of two Pareto-distributed random samples, 6th Intl. Conf. Soft Methods in Probability and Statistics (SMPS), Konstanz, Germany, Oct. 4-6, 2012.
27. On the relation between effort-dominating and symmetric minitive aggregation operators, 14th Intl. Conf. Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU), Catania, Italy, July 9-13, 2012.
28. Porównanie wybranych estymatorów teoretycznego indeksu Hirscha (A comparison of selected estimators of the theoretical Hirsch index), 37th Conf. Statystyka Matematyczna, Wisła, Poland, Dec. 5-9, 2011.
29. Axiomatic characterizations of (quasi-) L-statistics and S-statistics and the Producer Assessment Problem, 7th Intl. Conf. EUSFLAT/LFA, Aix-les-Bains, France, July 18-22, 2011.
30. Podstawowe właściwości S-statystyk (Basic properties of S-statistics), 36th Conf. Statystyka Matematyczna, Wisła, Poland, Dec. 6-10, 2010.
31. S-Statistics and their basic properties, 5th Intl. Conf. Soft Methods in Probability and Statistics (SMPS), Oviedo, Spain, Sept. 28-Oct. 1, 2010.
32. Arity-monotonic extended aggregation operators, 13th Intl. Conf. Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU), Dortmund, Germany, June 28-July 2, 2010.
33. Uogólniony indeks Hirscha a dwupróbkowe testy dla rodziny rozkładów Pareto II rodzaju (The generalized Hirsch index and two-sample tests for the family of Pareto type II distributions), 35th Conf. Statystyka Matematyczna, Wisła, Poland, Dec. 7-11, 2009.
34. O pewnym uogólnieniu indeksu Hirscha (On a certain generalization of the Hirsch index), 1st Intl. Conf. Zarządzanie Nauką (Science Management), Lublin, Poland, Nov. 20-22, 2009.
35. Possible and necessary h-indices, 6th Intl. Conf. IFSA/EUSFLAT, Lisbon, Portugal, July 20-24, 2009.