Publisher: Institute of Computer Science, Polish Academy of Sciences
Reviewers: Gleb Beliakov, Radko Mesiar
This publication is issued as part of the project
Information technologies: Research and their interdisciplinary applications,
objective 4.1 of the Human Capital Operational Program, agreement no.
UDA-POKL.04.01.01-00-051/10-00. It is co-financed by the European Union from the resources of the European Social Fund.
For this reason, the publication is distributed free of charge.
Appropriate fusion of large, complex data sets is a necessity in the information era. Having to deal with just a few records already forces the human brain to look for patterns in the data and to form an overall picture of them, instead of conceiving reality as a set of individual entities, which are much more difficult to process and analyze. Quite similarly, using appropriate methods to reduce the information overload on a computer may not only increase the quality of the results, but also significantly decrease algorithms' run-times.
It is known that information systems relying on a single information source (e.g., measurements gathered from one sensor, opinions of just a single authoritative decision maker, outputs of one and only one machine learning algorithm, answers of an individual social survey taker) are most often neither accurate nor reliable.
The theory of aggregation is a relatively new research field, even though various particular methods for data fusion were already known to and used by ancient mathematicians. Since the 1980s, studies of aggregation functions have most often focused on the construction and formal, mathematical analysis of diverse ways to summarize numerical lists with elements in some real interval [a,b]. This covers different kinds of broadly conceived means, fuzzy logic connectives (t-norms, fuzzy implications), as well as copulas. Quite recently, there has been increasing interest in aggregation on partially ordered sets – in particular, on ordinal (linguistic) scales.
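For orientation, the classical object of study can be stated formally; the following is the standard textbook definition of an aggregation function on an interval [a,b] (a common formulation from the literature, not a quotation from this monograph).

```latex
% A classical aggregation function on [a, b]: it maps n inputs
% back into [a, b], is nondecreasing in each argument, and
% preserves the endpoints of the interval.
\[
  \mathrm{A}\colon [a,b]^n \to [a,b], \qquad
  \mathrm{A}(a,\dots,a) = a, \qquad
  \mathrm{A}(b,\dots,b) = b,
\]
\[
  \mathbf{x} \le \mathbf{y} \;\Longrightarrow\;
  \mathrm{A}(\mathbf{x}) \le \mathrm{A}(\mathbf{y})
  \qquad \text{(componentwise order)}.
\]
```

The arithmetic mean, the minimum, the maximum, and t-norms on [0,1] are all immediate instances of this definition.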
During AGOP 2013 – the International Summer School on Aggregation Operators held in Pamplona, Spain – Prof. Bernard De Baets in his plenary lecture pointed out the need to conduct research on the so-called Aggregation 2.0. Of course, Aggregation 2.0 does not aim to replace or in any way depreciate the very successful and important classical aggregation field, but rather to attract investigators' attention to new, more complex domains, most of which cannot be properly handled without computational methods. From this perspective, data fusion tools may be embedded in larger, more complicated information processing systems and thus studied as their key components.
Proper fusion of complex data has been of interest to many researchers in diverse fields, including computational statistics, computational geometry, bioinformatics, machine learning, pattern recognition, quality management, engineering, finance, and economics. Let us note that it plays a crucial role in numerous practical tasks, a few of which we discuss below.
We observe that many useful machine learning methods are based on a proper aggregation of information entities. In particular, the class of ensemble methods for classification is very successful in practice because no single "weak" classifier tends to perform as well as the whole group combined. Interestingly, many of the winning solutions to data mining competitions on Kaggle and similar platforms rely on random forests and related algorithms. What is more, neural networks – universal approximators – and other deep learning tools can be understood as hierarchies of individual fusion functions; thus, they can be conceived as kinds of aggregation techniques as well. We should also mention that appropriate data fusion is crucial to business enterprises. For numerous reasons, companies are rarely eager to sell large parts of the data sets they possess to their clients. Instead, only carefully preprocessed and aggregated data models are delivered to the customers.
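As a concrete illustration of the ensemble idea, below is a minimal, self-contained Python sketch (a toy example of ours, not code from the monograph): three hypothetical decision stumps each cast a vote on a one-dimensional input, and the fused prediction is the majority of the votes, which is one of the simplest possible aggregation functions on labels.

```python
# Toy illustration of ensemble aggregation by majority voting.
# The "weak" classifiers below are hypothetical decision stumps;
# the fusion step is a simple mode over their individual votes.

from collections import Counter

def majority_vote(labels):
    """Fuse a list of class labels into a single decision (the mode)."""
    return Counter(labels).most_common(1)[0][0]

# Three hypothetical weak classifiers: threshold rules on one feature.
stumps = [
    lambda x: int(x > 0.3),
    lambda x: int(x > 0.5),
    lambda x: int(x > 0.7),
]

def ensemble_predict(x):
    # Aggregate the stumps' individual predictions into one output.
    return majority_vote([stump(x) for stump in stumps])

print(ensemble_predict(0.6))  # -> 1 (two of the three stumps vote 1)
print(ensemble_predict(0.2))  # -> 0 (all three stumps vote 0)
```

Replacing the plain majority vote with a weighted or otherwise more sophisticated fusion function is precisely the point at which the aggregation-theoretic perspective enters.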
This monograph aims to integrate the results scattered across different domains using the methodology of the well-established classical aggregation framework, to introduce researchers and practitioners to Aggregation 2.0, as well as to point out the challenges and interesting directions for further research.