2014-05-14 software

stringi_0.2-3 Released

The second official release of the stringi package for R is on CRAN now.

Notable changes since v0.1-25:

* [IMPORTANT CHANGE] stri_cmp* no longer allow passing opts_collator=NA.
From now on, stri_cmp_eq, stri_cmp_neq, and the new operators
%===%, %!==%, %stri===%, and %stri!==% are locale-independent operations
based on code point comparisons. The new functions stri_cmp_equiv
and stri_cmp_nequiv (and, from now on, also %==%, %!=%, %stri==%,
and %stri!=%) test for canonical equivalence.

* [IMPORTANT CHANGE] stri_*_fixed search functions now perform
a locale-independent exact pattern search (bytewise, after conversion
to UTF-8). All the Collator-based, locale-dependent search routines
are now available via stri_*_coll. The reason for this change is that
ICU's USearch currently has very poor performance, and in many search
tasks exact pattern matching is in fact sufficient.

* [IMPORTANT CHANGE] The stri_enc_nf* and stri_enc_isnf* function families
have been renamed to stri_trans_nf* and stri_trans_isnf*, respectively,
because they deal with text transformation, not with character
encoding. Moreover, all such operations may also be performed by
ICU's Transliterator (see below).

* [IMPORTANT CHANGE] stri_*_charclass search functions now
rely solely on ICU's UnicodeSet patterns. All previously accepted
charclass identifiers are now invalid. However, the new patterns
should be more familiar to users (they are regex-like).
Moreover, we observe a very nice performance gain.

* [IMPORTANT CHANGE] stri_sort no longer includes NAs
in output vectors by default, for compatibility with sort().
Moreover, currently none of the input vector's attributes are preserved.

* [NEW FUNCTIONS] stri_trans_general and stri_trans_list give access
to ICU's Transliterator, which may be used to perform very general
text transforms.

* [NEW FUNCTION] stri_split_boundaries uses ICU's BreakIterator
to split strings at specific text boundaries. Moreover,
stri_locate_boundaries indicates the positions of these boundaries.

* [NEW FUNCTION] stri_extract_words uses ICU's BreakIterator to
extract all words from a text. Additionally, stri_locate_words
locates start and end positions of words in a text.

* [NEW FUNCTIONS] stri_pad, stri_pad_left, stri_pad_right, and stri_pad_both
pad a string with a specific code point.

* [NEW FUNCTION] stri_wrap breaks paragraphs of text into lines.
Two algorithms (greedy and minimal-raggedness) are available.

* [NEW FUNCTION] stri_unique extracts unique elements from
a character vector.

* [NEW FUNCTIONS] stri_duplicated and stri_duplicated_any
determine duplicate elements in a character vector.

* [NEW FUNCTION] stri_replace_na replaces NAs in a character vector
with a given string, useful for emulating e.g. R's paste() behavior.

* [NEW FUNCTION] stri_rand_shuffle generates a random permutation
of code points in a string.

* [NEW FUNCTION] stri_rand_strings generates random strings.

* [NEW FUNCTIONS] New functions and binary operators for string comparison:
stri_cmp_eq, stri_cmp_neq, stri_cmp_lt, stri_cmp_le, stri_cmp_gt,
stri_cmp_ge, %==%, %!=%, %<%, %<=%, %>%, %>=%.

* [NEW FUNCTION] stri_enc_mark reads declared encodings of character strings
as seen by stringi.

* [NEW FUNCTION] stri_enc_tonative(str) is an alias for
stri_encode(str, NULL, NULL).

* [NEW FEATURE] stri_order and stri_sort now have an additional argument
`na_last` (defaults to TRUE and NA, respectively).

* [NEW FEATURE] stri_replace_all_charclass now has a `merge` argument
(defaults to FALSE for backward compatibility). It may be used,
e.g., to replace sequences of whitespace characters with a single space.

* [NEW FEATURE] stri_enc_toutf8 now has a new `validate` argument (defaults
to FALSE for backward compatibility). It may be used in the (rare) case
in which a user wants to fix an invalid UTF-8 byte sequence.
stri_length (among others) now detects invalid UTF-8 byte sequences.

* [NEW FEATURE] All binary operators %???% now also have aliases %stri???%.

* stri_*_fixed now use a tweaked Knuth-Morris-Pratt search algorithm,
which improves the search performance drastically.

* Significant performance improvements in stri_join, stri_flatten,
stri_cmp, stri_trans_to*, and others.
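To make the difference between code point comparison and canonical equivalence concrete, here is a small Python sketch. It is not stringi code: the function names cmp_eq_codepoints and cmp_equiv_canonical are hypothetical, and it assumes that "code point comparison" means comparing the raw code point sequences, while canonical equivalence is tested by comparing the NFD-normalized forms of both strings, as defined by the Unicode Standard.

```python
import unicodedata


def cmp_eq_codepoints(a: str, b: str) -> bool:
    """Locale-independent equality: compare the raw code point sequences."""
    return a == b


def cmp_equiv_canonical(a: str, b: str) -> bool:
    """Canonical equivalence: compare the strings after NFD normalization."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u00e9"  # 'é' as a single code point, U+00E9
decomposed = "e\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

print(cmp_eq_codepoints(precomposed, decomposed))    # False
print(cmp_equiv_canonical(precomposed, decomposed))  # True
```

Both strings render as the same "é", yet only the canonical-equivalence test treats them as equal; the code point test is cheaper because it skips normalization entirely.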

Refer to NEWS for a complete list of changes, new features and bug fixes.
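The exact, bytewise search performed by stri_*_fixed can be illustrated with a plain Knuth-Morris-Pratt matcher running over UTF-8 bytes. This Python sketch is only an illustration under that assumption: it is not stringi's tweaked implementation, the function names kmp_failure and kmp_find_all are hypothetical, and it reports byte offsets rather than character positions.

```python
def kmp_failure(pattern: bytes) -> list[int]:
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_find_all(text: str, pattern: str) -> list[int]:
    """Byte offsets of all (possibly overlapping) occurrences of pattern
    in text, both encoded as UTF-8; no normalization is applied."""
    t = text.encode("utf-8")
    p = pattern.encode("utf-8")
    if not p:
        return []
    fail = kmp_failure(p)
    hits = []
    k = 0  # number of pattern bytes currently matched
    for i, byte in enumerate(t):
        while k > 0 and byte != p[k]:
            k = fail[k - 1]  # fall back without re-reading text bytes
        if byte == p[k]:
            k += 1
        if k == len(p):
            hits.append(i - len(p) + 1)
            k = fail[k - 1]  # continue, allowing overlapping matches
    return hits


print(kmp_find_all("abababa", "aba"))  # [0, 2, 4]
```

Because UTF-8 is self-synchronizing, a match of a valid UTF-8 pattern can never start in the middle of a multi-byte character. Since no normalization takes place, "é" (U+00E9) does not match "e" followed by U+0301, which is exactly what an exact, locale-independent search implies.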

2014-04-02 new paper

Paper on OM3 Operators Accepted in FSS

A paper by A. Cena and me has been accepted for publication in Fuzzy Sets and Systems (doi:10.1016/j.fss.2014.04.001). The paper, entitled "OM3: Ordered maxitive, minitive, and modular aggregation operators – axiomatic and probabilistic properties in an arity-monotonic setting", is a significantly extended version of our AGOP'2013 contributions.
2014-03-12 software

stringi_0.1-25 Now on CRAN

The initial release of the stringi package for R is now available on CRAN. stringi is THE R package for very fast, correct, consistent and convenient string/text processing in any locale and any native character encoding. The use of the ICU library gives R users a platform-independent set of functions known to Java, Perl, Python, PHP, and Ruby programmers. The package’s API was inspired by Hadley Wickham’s stringr package. See the on-line manual for more information.
2014-03-11 new paper

IPMU 2014: Two Papers Accepted

The following papers have been accepted for publication in Proc. IPMU 2014:
  • Bartoszuk M., Gagolewski M., A fuzzy R code similarity detection algorithm.
  • Coroianu L., Gagolewski M., Grzegorzewski P., Adabitabar Firozja M., Houlari T., Piecewise linear approximation of fuzzy numbers preserving the support and core.
The conference proceedings will be published in Springer-Verlag's Communications in Computer and Information Science series.
2014-03-04 new book

Programowanie w Języku R [Programming in R]

My R programming book (in Polish), Programowanie w języku R, is now available in bookstores. I recommend it, and not only as bedtime reading!
2014-01-03 software

FuzzyNumbers_0.3-3 Released

A new version of the FuzzyNumbers package for R is now available on CRAN.
** FuzzyNumbers Package CHANGELOG **


0.3-3 /2014-01-03/

* piecewiseLinearApproximation() now supports new method="SupportCorePreserving",
see Coroianu L., Gagolewski M., Grzegorzewski P., Adabitabar Firozja M.,
Houlari T., Piecewise Linear Approximation of Fuzzy Numbers Preserving
the Support and Core, 2014 (submitted for publication).

* piecewiseLinearApproximation() no longer fails on exceptions thrown
by integrate(); it falls back to a Newton-Cotes formula instead.

* Removed the `Suggests` dependency: testthat tests are now available to developers
via the FuzzyNumbers GitHub repository.

* Package manual has been corrected and extended.

* Package vignette is now only available
online at
2013-12-07 new paper

Accepted Paper on Applications of Monotone Measures and Universal Integrals

The paper Gagolewski M., Mesiar R., Monotone measures and universal integrals in a uniform framework for the scientific impact assessment problem has just been accepted for publication in Information Sciences (doi:10.1016/j.ins.2013.12.004).
Abstract: The Choquet, Sugeno, and Shilkret integrals with respect to monotone measures, as well as their generalization – the universal integral, constitute a useful tool in decision support systems. In this paper we propose a general construction method for aggregation operators that may be used in assessing the output of scientists. We show that the most often currently used indices of bibliometric impact, like Hirsch's h, Woeginger's w, Egghe's g, Kosmulski's MAXPROD, and similar constructions, may be obtained by means of our framework. Moreover, the model easily leads to some new, very interesting functions.
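The indices named in the abstract have simple direct definitions on a citation vector sorted in non-increasing order, x_(1) >= x_(2) >= ... The Python sketch below computes them from those standard definitions; it does not implement the paper's monotone-measure framework, the function names are hypothetical, and the g-index variant here caps g at the number of papers (some definitions pad the vector with zeros instead).

```python
def _sorted_desc(citations):
    """Citation counts in non-increasing order: x_(1) >= x_(2) >= ..."""
    return sorted(citations, reverse=True)


def h_index(citations):
    """Hirsch's h: largest h such that h papers have >= h citations each."""
    x = _sorted_desc(citations)
    # x is non-increasing, so the condition x_(i) >= i holds on a prefix.
    return sum(1 for i, c in enumerate(x, start=1) if c >= i)


def g_index(citations):
    """Egghe's g (capped at n): largest g such that the g most cited
    papers have >= g**2 citations in total."""
    x = _sorted_desc(citations)
    total, g = 0, 0
    for i, c in enumerate(x, start=1):
        total += c
        if total >= i * i:
            g = i
    return g


def w_index(citations):
    """Woeginger's w: largest w such that x_(i) >= w - i + 1 for all i <= w."""
    x = _sorted_desc(citations)
    w = 0
    for cand in range(1, len(x) + 1):
        if all(x[i - 1] >= cand - i + 1 for i in range(1, cand + 1)):
            w = cand
    return w


def maxprod(citations):
    """Kosmulski's MAXPROD: max over i of i * x_(i)."""
    x = _sorted_desc(citations)
    return max((i * c for i, c in enumerate(x, start=1)), default=0)


citations = [7, 4, 3, 1, 0]
print(h_index(citations), g_index(citations), w_index(citations), maxprod(citations))
# prints: 3 3 4 9
```

All four functions depend only on the sorted citation vector, which is what makes it possible to view them as aggregation operators in a common framework.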

AGOP 2015 Website Launched

The website of the 8th International Summer School on Aggregation Operators (AGOP 2015) has been launched.