2018-05-03 software

**R package stringi 1.2.2 released**

A new major release of the R package `stringi` is out.
Check out the changelog for more information.
Changelog:

* [GENERAL] #193: `stringi` is now bundled with ICU4C 61.1, which is used on most Windows and OS X builds as well as on *nix systems not equipped with ICU. However, if C++11 support is disabled, `stringi` will be built against ICU4C 55.1. The update to ICU brings Unicode 10.0 support, including new emoji characters.
* [BUGFIX] #288: `stri_match` did not return the correct number of columns when input was empty.
* [NEW FEATURE] #188: `stri_enc_detect` now returns a list of data frames.
* [NEW FEATURE] #289: `stri_flatten` gained `na_empty` and `omit_empty` arguments.
* [NEW FEATURE] New functions: `stri_remove_empty`, `stri_na2empty`.
* [NEW FEATURE] #285: Coercion from a non-trivial list (one that consists of atomic vectors, each of length 1) to an atomic vector now issues a warning.
* [WARN] Removed `-Wparentheses` warnings in `icu55/common/cstring.h:38:63` and `icu55/i18n/windtfmt.cpp` in the ICU4C 55.1 bundle.

2018-04-20 invited workshop

**Text Analysis Developers' Workshop 2018 @ NYC**

Greetings from the Text Analysis
Developers' Workshop 2018 @ New York University!
This is a follow-up to the great event
held a year ago at the London School of Economics, but with a stronger
out-of-R focus (Python included).

2018-04-01

2018-03-15

**MADAM Seminar: Aggregation through the poset glass (Raúl Pérez-Fernández)**

On March 28, 2018 at the MADAM
(Methods for Analysis of Data: Algorithms and Modeling) seminar,
Dr Raúl Pérez-Fernández (Ghent University) will give a talk
on the need for Aggregation 2.0.

Abstract. The aggregation of several objects into a single one is a common study subject in mathematics. Unfortunately, whereas practitioners often need to deal with the aggregation of many different types of objects (rankings, graphs, strings, etc.), the current theory of aggregation is mostly developed for dealing with the aggregation of values in a poset. In this presentation, we will reflect on the limitations of this poset-based theory of aggregation and “jump through the poset glass”. On the other side, we will not find Wonderland, but, instead, we will find more questions than answers. Indeed, a new theory of aggregation is being born, and we will need to work together on this reboot for years to come.

2018-03-15

**MADAM Seminar: Should we introduce a ‘dislike’ button for papers? (Agnieszka Geras)**

On March 21, 2018 at the MADAM
(Methods for Analysis of Data: Algorithms and Modeling) seminar,
Ms. Agnieszka Geras (Ph.D. student @ FMIS WUT) will present her recent results
concerning analysis and modeling of Stack Exchange sites.

Abstract. Citation scores and the h-index are basic tools used for measuring the quality of scientific work. Nonetheless, when evaluating academic achievements, one rarely takes into consideration why a paper was mentioned by another author: whether to highlight the connection between their works or to bring to the reader’s attention any mistakes or flaws. In my talk I will shed some light on the problem of “negative” citations by analyzing data from Stack Exchange and using the proposed agent-based model. Joint work with Marek Gągolewski and Grzegorz Siudem.

2018-02-24 new paper

**Least median of squares (LMS) and least trimmed squares (LTS) fitting for the weighted arithmetic mean**

A paper entitled *Least median of squares (LMS) and least trimmed squares (LTS)
fitting for the weighted arithmetic mean* (joint work with Gleb Beliakov and Simon James)
has been accepted for publication in the Proceedings of the IPMU 2018 conference.

Abstract. We look at different approaches to learning the weights of the weighted arithmetic mean such that the median residual or sum of the smallest half of squared residuals is minimized. The more general problem of multivariate regression has been well studied in the statistical literature; however, in the case of aggregation functions, there is a restriction on the weights, and the domain is usually restricted so that ‘outliers’ may not be arbitrarily large. A number of algorithms are compared in terms of accuracy and speed. Our results can be extended to other aggregation functions.
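To make the two criteria from the abstract concrete, here is a minimal Python sketch. It is not the code from the paper: the function names, the toy data, the convention that "half" means the number of observations divided by two and rounded up, and the random-search baseline are all assumptions made for illustration. The only constraint taken from the abstract is that the weights are restricted (here: non-negative and summing to 1).

```python
import numpy as np


def residuals(X, y, w):
    """Residuals y - X @ w for weights w on the probability simplex."""
    w = np.asarray(w, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return np.asarray(y, dtype=float) - np.asarray(X, dtype=float) @ w


def lms_objective(X, y, w):
    """Least median of squares criterion: the median squared residual."""
    return float(np.median(residuals(X, y, w) ** 2))


def lts_objective(X, y, w, h=None):
    """Least trimmed squares criterion: sum of the h smallest squared residuals."""
    r2 = np.sort(residuals(X, y, w) ** 2)
    if h is None:
        h = (len(r2) + 1) // 2  # assumption: "half" is rounded up
    return float(r2[:h].sum())


def fit_random_search(X, y, objective, n_draws=2000, seed=0):
    """Crude baseline (hypothetical helper): best of n_draws random simplex points."""
    rng = np.random.default_rng(seed)
    d = np.asarray(X).shape[1]
    best_w = np.full(d, 1.0 / d)  # start from the plain arithmetic mean
    best_val = objective(X, y, best_w)
    for w in rng.dirichlet(np.ones(d), size=n_draws):
        val = objective(X, y, w)
        if val < best_val:
            best_w, best_val = w, val
    return best_w, best_val


# Toy data: targets generated with weights (0.7, 0.3), then two of them corrupted.
X = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.8],
              [0.9, 0.1], [0.4, 0.6], [0.3, 0.3]])
w_true = np.array([0.7, 0.3])
y = X @ w_true
y[0], y[1] = 1.0, 0.0  # two outlying targets

print(lms_objective(X, y, w_true))  # 0.0: most residuals vanish at w_true
print(lts_objective(X, y, w_true))  # 0.0: the four smallest residuals vanish
w_hat, val = fit_random_search(X, y, lts_objective)
print(w_hat, val)
```

Both criteria ignore the largest residuals, so the two corrupted targets do not pull the fitted weights away from (0.7, 0.3) the way an ordinary least-squares fit would. The random search is only a placeholder; the abstract says a number of dedicated algorithms are compared in the paper.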

2018-02-02 invited talk

**Invited Plenary Lecture @ FSTA 2018**

Today I gave a lecture at the *14th
International Conference on Fuzzy Set Theory and Applications – FSTA 2018*
held in Liptovský Ján, Slovak Republic.

Abstract. Hirsch's h-index is perhaps the most popular citation-based measure of scientific excellence. Many of its natural generalizations can be expressed as simple functions of some discrete Sugeno integrals. In this talk we shall review some less-known results concerning various stochastic properties of the discrete Sugeno integral with respect to a symmetric normalized capacity, i.e., weighted lattice polynomial functions of real-valued random variables -- both in i.i.d. (independent and identically distributed) and non-i.i.d. (with some dependence structure) cases. For instance, we will be interested in investigating their exact and asymptotic distributions. Based on these, we can, among other things, show that the h-index is a consistent estimator of some natural probability distribution's location characteristic. Moreover, we can derive a statistical test to verify whether the difference between two h-indices (say, h'=7 vs. h''=10 in cases where both authors published 40 papers) is actually significant.

What is more, we shall discuss some agent-based models that describe the processes generating citation networks based on, e.g., the preferential attachment (“rich get richer”) rule. Thanks to such an approach, we are able to simulate a scientist's activity and then estimate the expected values for the h-index and similar functions based on very simple sample statistics, such as the total number of citations and the total number of publications. Such results can help explain what the h-index really measures.
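For readers unfamiliar with the h-index mentioned in the abstract, the following Python sketch computes it in two ways: directly from the usual definition, and as the max-min expression over the sorted citation counts, which is the shape of the Sugeno-integral formulation for integer counts. The function names are illustrative and not taken from the talk.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_index_max_min(citations):
    """Same value as max over i of min(x_(i), i), with x sorted non-increasingly."""
    ranked = sorted(citations, reverse=True)
    return max((min(count, i) for i, count in enumerate(ranked, start=1)), default=0)


papers = [10, 8, 5, 4, 3]
print(h_index(papers), h_index_max_min(papers))  # 4 4
```

Four papers have at least four citations each, but there are not five papers with at least five, so both functions return 4.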

2018-01-05

**MADAM Seminar: Measuring the efficacy of league formats in ranking football teams (Jan Lasek)**

On January 5, 2018 at the MADAM (Methods for Analysis
of Data: Algorithms and Modeling) seminar,
Mr Jan Lasek
(Deepsense.io & Ph.D. student @ ICS PAS) will discuss various issues concerning
the efficacy of league formats in ranking football (soccer) teams.

Abstract. Choosing between different tournament designs based on their accuracy in ranking teams is an important topic in football, since many domestic championships have undergone changes in recent years. In particular, the transformation of Ekstraklasa -- the top-tier football competition in Poland -- is a topic receiving much attention from the organizing body of the competition, participating football clubs as well as supporters. In this presentation we will discuss the problem of measuring the accuracy of different league formats in ranking teams. We will present various models for rating teams that will then be used to simulate a number of tournaments to evaluate their efficacy, for example, by measuring the probability that the best team wins. Finally, we will discuss several other aspects of league formats, including the influence of the number of points allocated for a win on the final league standings.
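The simulation idea in the abstract can be sketched in Python as follows. This is a toy model and not the speaker's: the Bradley-Terry style win probability, the fixed draw probability, the double round-robin format, the random tie-breaking, and all function and parameter names are assumptions chosen to keep the example short.

```python
import random
from itertools import permutations


def play_league(strengths, points_for_win=3, draw_prob=0.25, rng=random):
    """Simulate one double round-robin season and return the points per team."""
    points = [0] * len(strengths)
    for home, away in permutations(range(len(strengths)), 2):
        if rng.random() < draw_prob:
            points[home] += 1
            points[away] += 1
            continue
        # Bradley-Terry style win probability (assumption: no home advantage).
        p_home = strengths[home] / (strengths[home] + strengths[away])
        winner = home if rng.random() < p_home else away
        points[winner] += points_for_win
    return points


def best_team_win_rate(strengths, n_seasons=1000, seed=0, **league_kwargs):
    """Fraction of simulated seasons in which the strongest team finishes first."""
    rng = random.Random(seed)
    best = max(range(len(strengths)), key=lambda i: strengths[i])
    wins = 0
    for _ in range(n_seasons):
        points = play_league(strengths, rng=rng, **league_kwargs)
        # Ties on points are broken at random.
        champion = max(range(len(points)), key=lambda i: (points[i], rng.random()))
        wins += champion == best
    return wins / n_seasons


strengths = [2.0, 1.5, 1.0, 1.0]  # hypothetical team ratings
for pts in (2, 3):
    print(pts, best_team_win_rate(strengths, points_for_win=pts))
```

Repeating the estimate for other formats (a single round-robin, a split season, different `points_for_win` values) gives one simple way to compare how reliably each format crowns the strongest team under the assumed model.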