Advanced Data Analysis Software Development with R (e-learning @ ICS PAS)

My Advanced Data Analysis Software Development with R e-learning course has just started. It is run on the educational platform of the Interdisciplinary PhD studies program hosted by the Institute of Computer Science, Polish Academy of Sciences. The project is co-financed by the Human Capital Operational Programme, European Social Fund. Batch 02 of the course will start in February/March 2014.
2014-10-01 software

FuzzyNumbers_0.3-5 Now Available

A maintenance release of the FuzzyNumbers package for R is now available on CRAN. CHANGELOG:
* added proper import directives in NAMESPACE
* piecewiseLinearApproximation: method="ApproximateNearestEuclidean"
no longer accepted; use "NearestEuclidean" instead.
* package vignette now in the vignettes/ directory.
2014-08-22 new paper

Spread Measures and Their Relation to Aggregation Functions – Accepted Paper

The paper Gagolewski M., Spread measures and their relation to aggregation functions has just been accepted for publication in European Journal of Operational Research.
Abstract: The theory of aggregation most often deals with measures of central tendency. However, sometimes a very different kind of synthesis of a numeric vector into a single number is required. In this paper we introduce a class of mathematical functions which aim to measure spread or scatter of one-dimensional quantitative data. The proposed definition serves as a common, abstract framework for measures of absolute spread known from statistics, exploratory data analysis and data mining, e.g. the sample variance, standard deviation, range, interquartile range (IQR), median absolute deviation (MAD), etc. Additionally, we develop new measures of experts' opinions diversity or consensus in group decision making problems. We investigate some properties of spread measures, show how they are related to aggregation functions, and indicate their new potentially fruitful application areas.
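To make the measures named in the abstract concrete, here is a minimal Python sketch that computes each of them for a small sample. It is only an illustration: the helper name spread_measures and the choice of the "inclusive" quartile method are assumptions of this example, and it does not reproduce the paper's formal definition. The MAD is left unscaled.

```python
import statistics


def spread_measures(x):
    """Classical measures of absolute spread for a numeric sample (n >= 2)."""
    xs = sorted(x)
    # Quartiles; "inclusive" is one of several common conventions.
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    med = statistics.median(xs)
    return {
        "variance": statistics.variance(xs),   # sample variance (n - 1 divisor)
        "sd": statistics.stdev(xs),            # sample standard deviation
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,
        # Median absolute deviation from the median, without a scaling constant.
        "mad": statistics.median([abs(v - med) for v in xs]),
    }


sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(spread_measures(sample))
# A constant vector has no spread at all under every one of these measures.
print(spread_measures([3, 3, 3, 3]))
```

All five functions return zero for a constant vector and are unchanged when the same constant is added to every observation, which is the intuitive behaviour one expects from a measure of absolute spread.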
2014-06-17 new paper

IEEE IS'14 Proceedings Paper Accepted

Accepted for publication in Proc. IEEE IS 2014: Gagolewski M., Lasek J., The use of fuzzy relations in the assessment of information resources producers' performance.
2014-05-26 new paper

SMPS'14 Proceedings Paper Accepted

Accepted for publication in Proc. SMPS 2014: Gagolewski M., Sugeno integral-based confidence intervals for the theoretical h-index.
2014-05-14 software

stringi_0.2-3 Released

The second official release of the stringi package for R is on CRAN now.

Notable changes since v0.1-25:

* [IMPORTANT CHANGE] stri_cmp* no longer accept opts_collator=NA.
From now on, stri_cmp_eq, stri_cmp_neq, and the new operators
%===%, %!==%, %stri===%, and %stri!==% are locale-independent operations
based on code point comparisons. The new functions stri_cmp_equiv
and stri_cmp_nequiv (and from now on also %==%, %!=%, %stri==%,
and %stri!=%) test for canonical equivalence.

* [IMPORTANT CHANGE] stri_*_fixed search functions now perform
a locale-independent exact pattern search (bytewise, after conversion
to UTF-8). All the Collator-based, locale-dependent search routines
are now available via stri_*_coll. The reason is that ICU USearch
currently performs very poorly, and in many search tasks
exact pattern matching is sufficient.

* [IMPORTANT CHANGE] stri_enc_nf* and stri_enc_isnf* function families
have been renamed to stri_trans_nf* and stri_trans_isnf*, respectively.
This is because they deal with text transformation, not with character
encoding. Moreover, all such operations may be performed by
ICU's Transliterator (see below).

* [IMPORTANT CHANGE] stri_*_charclass search functions now
rely solely on ICU's UnicodeSet patterns. All previously accepted
charclass identifiers became invalid. However, new patterns
should now be more familiar to the users (they are regex-like).
Moreover, we observe a very nice performance gain.

* [IMPORTANT CHANGE] stri_sort now does not include NAs
in output vectors by default, for compatibility with sort().
Moreover, currently none of the input vector's attributes are preserved.

* [NEW FUNCTIONS] stri_trans_general and stri_trans_list give access
to ICU's Transliterator, which may be used to perform very general
text transforms.

* [NEW FUNCTION] stri_split_boundaries utilizes ICU's BreakIterator
to split strings at specific text boundaries. Moreover,
stri_locate_boundaries indicates positions of these boundaries.

* [NEW FUNCTION] stri_extract_words uses ICU's BreakIterator to
extract all words from a text. Additionally, stri_locate_words
locates start and end positions of words in a text.

* [NEW FUNCTIONS] stri_pad, stri_pad_left, stri_pad_right, stri_pad_both
pad a string with a specific code point.

* [NEW FUNCTION] stri_wrap breaks paragraphs of text into lines.
Two algorithms (greedy and minimal-raggedness) are available.

* [NEW FUNCTION] stri_unique extracts unique elements from
a character vector.

* [NEW FUNCTIONS] stri_duplicated and stri_duplicated_any
determine duplicate elements in a character vector.

* [NEW FUNCTION] stri_replace_na replaces NAs in a character vector
with a given string, useful for emulating e.g. R's paste() behavior.

* [NEW FUNCTION] stri_rand_shuffle generates a random permutation
of code points in a string.

* [NEW FUNCTION] stri_rand_strings generates random strings.

* [NEW FUNCTIONS] New functions and binary operators for string comparison:
stri_cmp_eq, stri_cmp_neq, stri_cmp_lt, stri_cmp_le, stri_cmp_gt,
stri_cmp_ge, %==%, %!=%, %<%, %<=%, %>%, %>=%.

* [NEW FUNCTION] stri_enc_mark reads declared encodings of character strings
as seen by stringi.

* [NEW FUNCTION] stri_enc_tonative(str) is an alias to
stri_encode(str, NULL, NULL).

* [NEW FEATURE] stri_order and stri_sort now have an additional argument
`na_last` (defaults to TRUE and NA, respectively).

* [NEW FEATURE] stri_replace_all_charclass now has `merge` arg
(defaults to FALSE for backward-compatibility). It may be used
to e.g. replace sequences of white spaces with a single space.

* [NEW FEATURE] stri_enc_toutf8 now has a new `validate` arg (defaults
to FALSE for backward-compatibility). It may be used in a (rare) case
in which a user wants to fix an invalid UTF-8 byte sequence.
stri_length (among others) now detects invalid UTF-8 byte sequences.

* [NEW FEATURE] All binary operators %???% now also have aliases %stri???%.

* stri_*_fixed now use a tweaked Knuth-Morris-Pratt search algorithm,
which improves the search performance drastically.

* Significant performance improvements in stri_join, stri_flatten,
stri_cmp, stri_trans_to*, and others.
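
The changelog above says that stri_*_fixed searches bytewise on UTF-8 data using a tweaked Knuth-Morris-Pratt algorithm. The following Python sketch shows the textbook form of that algorithm on UTF-8 bytes. It is not stringi's implementation and does not include whatever tweaks stringi applies; the function names failure_table and kmp_find_all are made up for this illustration.

```python
def failure_table(pattern: bytes) -> list:
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(text: str, pattern: str) -> list:
    """Return the byte offsets of all (possibly overlapping) occurrences
    of pattern in text, comparing UTF-8 bytes exactly."""
    t = text.encode("utf-8")
    p = pattern.encode("utf-8")
    if not p:
        return []
    fail = failure_table(p)
    hits = []
    k = 0  # number of pattern bytes matched so far
    for i, byte in enumerate(t):
        while k > 0 and byte != p[k]:
            k = fail[k - 1]      # fall back without re-reading the text
        if byte == p[k]:
            k += 1
        if k == len(p):
            hits.append(i - k + 1)
            k = fail[k - 1]      # keep going to find overlapping matches
    return hits


print(kmp_find_all("abababa", "aba"))
print(kmp_find_all("zażółć żółw", "żół"))
```

The text is scanned once, left to right, and the failure table tells the search how far to fall back after a mismatch, so the running time is linear in the combined length of text and pattern. Note that the offsets are in bytes, not in code points.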

Refer to NEWS for a complete list of changes, new features and bug fixes.
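
The difference between code point comparison (stri_cmp_eq) and canonical equivalence (stri_cmp_equiv), mentioned in the first item above, can be illustrated with a short Python sketch. This is not stringi code: it uses Python's standard unicodedata module and approximates canonical equivalence by comparing NFD-normalized forms, which is an assumption of this example rather than a description of how stringi or ICU implement the test. Both helper names are hypothetical.

```python
import unicodedata


def codepoint_equal(a: str, b: str) -> bool:
    """Exact comparison: the same sequence of code points."""
    return a == b


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare after canonical decomposition (NFD), so that precomposed
    and decomposed spellings of the same character compare equal."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


composed = "\u0105"      # LATIN SMALL LETTER A WITH OGONEK, one code point
decomposed = "a\u0328"   # "a" followed by COMBINING OGONEK, two code points

print(codepoint_equal(composed, decomposed))         # False
print(canonically_equivalent(composed, decomposed))  # True
```

The two strings render identically but differ as code point sequences, which is why an exact comparison and an equivalence test can disagree.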

2014-04-02 new paper

Paper on OM3 Operators Accepted in FSS

A paper by A. Cena and me has been accepted for publication in Fuzzy Sets and Systems (doi:10.1016/j.fss.2014.04.001). It is a significantly extended version of our AGOP'2013 contributions entitled "OM3: Ordered maxitive, minitive, and modular aggregation operators – axiomatic and probabilistic properties in an arity-monotonic setting."
2014-03-12 software

stringi_0.1-25 Now on CRAN

The initial release of the stringi package for R is now available on CRAN. stringi is THE R package for very fast, correct, consistent and convenient string/text processing in each locale and any native character encoding. The use of the ICU library gives R users a platform-independent set of functions known to Java, Perl, Python, PHP, and Ruby programmers. The package’s API was inspired by Hadley Wickham’s stringr package. See the on-line manual for more information.