Abstract. The efficacy of different league formats in ranking teams according to their true latent strength is analysed. To this end, a new approach for estimating attacking and defensive strengths based on the Poisson regression for modelling match outcomes is proposed. Various performance metrics are estimated reflecting the agreement between latent teams' strength parameters and their final rank in the league table. The tournament designs studied here are used in the majority of European top-tier association football competitions. Based on numerical experiments, it turns out that a two-stage league format comprising of the three round-robin tournament together with an extra single round-robin is the most efficacious setting. In particular, it is the most accurate in selecting the best team as the winner of the league. Its efficacy can be enhanced by setting the number of points allocated for a win to two (instead of three that is currently in effect in association football).
Abstract. Cluster analysis is one of the most commonly applied unsupervised machine learning techniques. Its aim is to automatically discover an underlying structure of a data set represented by a partition of its elements: mutually disjoint and nonempty subsets are determined in such a way that observations within each group are ``similar'' and entities in distinct clusters ``differ'' as much as possible from each other.
It turns out that two state-of-the-art clustering algorithms -- namely the Genie and HDBSCAN* methods -- can be computed based on the minimum spanning tree (MST) of the pairwise dissimilarity graph. Both of them are not only resistant to outliers and produce high-quality partitions, but also are relatively fast to compute.
The aim of this tutorial is to discuss some key issues of hierarchical clustering and explore their relations with graph and data aggregation theory.
stringiis out. Check out the changelog for more information.
* [GENERAL] #193: `stringi` is now bundled with ICU4C 61.1, which is used on most Windows and OS X builds as well as on *nix systems not equipped with ICU. However, if the C++11 support is disabled, stringi will be built against ICU4C 55.1. The update to ICU brings Unicode 10.0 support, including new emoji characters. * [BUGFIX] #288: stri_match did not return the correct number of columns when input was empty. * [NEW FEATURE] #188: `stri_enc_detect` now returns a list of data frames. * [NEW FEATURE] #289: `stri_flatten` gained `na_empty` `omit_empty` arguments. * [NEW FEATURE] New functions: `stri_remove_empty`, `stri_na2empty` * [NEW FEATURE] #285: Coercion from a non-trivial list (one that consists of atomic vectors, each of length 1) to an atomic vector now issues a warning. * [WARN] Removed `-Wparentheses` warnings in `icu55/common/cstring.h:38:63` and `icu55/i18n/windtfmt.cpp` in the ICU4C 55.1 bundle.
Abstract. The aggregation of several objects into a single one is a common study subject in mathematics. Unfortunately, whereas practitioners often need to deal with the aggregation of many different types of objects (rankings, graphs, strings, etc.), the current theory of aggregation is mostly developed for dealing with the aggregation of values in a poset. In this presentation, we will reflect on the limitations of this poset-based theory of aggregation and “jump through the poset glass”. On the other side, we will not find Wonderland, but, instead, we will find more questions than answers. Indeed, a new theory of aggregation is being born, and we will need to work together on this reboot for years to come.
Abstract. Citations scores and the h-index are basic tools used for measuring the quality of scientific work. Nonetheless, while evaluating academic achievements one rarely takes into consideration for what reason the paper was mentioned by another author - whether in order to highlight the connection between their work or to bring to the reader’s attention any mistakes or flaws. In my talk I will shed some insight into the problem of “negative” citations analyzing data from the Stack Exchange and using the proposed agent-based model. Joint work with Marek Gągolewski and Grzegorz Siudem.