Abstract. Aggregation theory classically deals with functions to summarize a sequence of numeric values, e.g., in the unit interval. Since the notion of componentwise monotonicity plays a key role in many situations, there is an increasingly growing interest in methods that act on diverse ordered structures.
However, as far as the definition of a mean or an averaging function is concerned, the internality (or at least idempotence) property seems to be of a relatively higher importance than the monotonicity condition. In particular, the Bajraktarević means or the mode are among some well-known non-monotone means.
The concept of a penalty-based function was first investigated by Yager in 1993. In such a framework, we are interested in minimizing the amount of "disagreement" between the inputs and the output being computed; the corresponding aggregation functions are at least idempotent and express many existing means in an intuitive and attractive way.
In this talk I focus on the notion of penalty-based aggregation of sequences of points in Rd, this time for some d≥1. I review three noteworthy subclasses of penalty functions: componentwise extensions of unidimensional ones, those constructed upon pairwise distances between observations, and those defined by measuring the so-called data depth. Then, I discuss their formal properties, which are particularly useful from the perspective of data analysis, e.g., different possible generalizations of internality or equivariances to various geometric transforms. I also point out the difficulties with extending some notions that are key in classical aggregation theory, like the monotonicity property.
Ris on its way to CRAN. The package provides powerful string processing facilities to R users and developers and is ranked as one of the most often downloaded
* [GENERAL] `stringi` now requires ICU4C >= 52. * [GENERAL] `stringi` now requires R >= 2.14. * [BUGFIX] Fixed errors pointed out by `clang-UBSAN` in `stri_brkiter.h`. * [BUILD TIME] #238, #220: Try "standard" ICU4C build flags if a call to `pkg-config` fails. * [BUILD TIME] #258: Use `CXX11` instead of `CXX1X` on R >= 3.4. * [BUILD TIME, BUGFIX] #254: `dir.exists()` is R >= 3.2.
stringipackage to CRAN.
* [REMOVE DEPRECATED] `stri_install_check()` and `stri_install_icudt()` marked as deprecated in `stringi` 0.5-5 are no longer being exported. * [BUGFIX] #227: Incorrect behavior of `stri_sub()` and `stri_sub<-()` if the empty string was the result. * [BUILD TIME] #231: The `./configure` (*NIX only) script now reads the following environment varialbes: `STRINGI_CFLAGS`, `STRINGI_CPPFLAGS`, `STRINGI_CXXFLAGS`, `STRINGI_LDFLAGS`, `STRINGI_LIBS`, `STRINGI_DISABLE_CXX11`, `STRINGI_DISABLE_ICU_BUNDLE`, `STRINGI_DISABLE_PKG_CONFIG`, `PKG_CONFIG`, see `INSTALL` for more information. * [BUILD TIME] #253: call to `R_useDynamicSymbols` added. * [BUILD TIME] #230: icudt is now being downloaded by `./configure` (*NIX only) *before* building. * [BUILD TIME] #242: `_COUNT/_LIMIT` enum constants have been deprecated as of ICU 58.2, stringi code has been upgraded accordingly.
Abstract. Research in aggregation theory is nowadays still mostly focused on algorithms to summarize tuples consisting of observations in some real interval or of diverse general ordered structures. Of course, in practice of information processing many other data types between these two extreme cases are worth inspecting. This contribution deals with the aggregation of lists of data points in Rd for arbitrary d≥1. Even though particular functions aiming to summarize multidimensional data have been discussed by researchers in data analysis, computational statistics and geometry, there is clearly a need to provide a comprehensive and unified model in which their properties like equivariances to geometric transformations, internality, and monotonicity may be studied at an appropriate level of generality. The proposed penalty-based approach serves as a common framework for all idempotent information aggregation methods, including componentwise functions, pairwise distance minimizers, and data depth-based medians. It also allows for deriving many new practically useful tools.