2015-04-17 new paper

AGOP 2015: Two Papers Accepted

Two papers that I authored or coauthored have been accepted for the AGOP 2015 workshop in Katowice, Poland.
  • Cena A., Gagolewski M., Aggregation and soft clustering of informetric data, In: Proc. AGOP 2015, 2015, (in press)
  • Gagolewski M., Some issues in aggregation of multidimensional data, In: Proc. AGOP 2015, 2015, (in press)
2015-03-25 new paper

IFSA-EUSFLAT 2015: 4 Papers Accepted

Four papers that I authored or coauthored have been accepted for the IFSA-EUSFLAT 2015 conference in Gijon, Spain.

  • Cena A., Gagolewski M., A K-means-like algorithm for informetric data clustering, In: Proc. IFSA-EUSFLAT 2015, 2015, (in press)
  • Bartoszuk M., Gagolewski M., Detecting similarity of R functions via a fusion of multiple heuristic methods, In: Proc. IFSA-EUSFLAT 2015, 2015, (in press)
  • Gagolewski M., Lasek J., Learning experts' preferences from informetric data, In: Proc. IFSA-EUSFLAT 2015, 2015, (in press)
  • Gagolewski M., Normalized WDpWAM and WDpOWA spread measures, In: Proc. IFSA-EUSFLAT 2015, 2015, (in press)
2015-02-06 new paper

Accepted Paper in Journal of Informetrics

The following paper has been accepted for publication: Cena A., Gagolewski M., Mesiar R., Problems and challenges of information resources producers' clustering, Journal of Informetrics, 2015, doi:10.1016/j.joi.2015.02.005.

Abstract: Classically, unsupervised machine learning techniques are applied to data sets with a fixed number of attributes (variables). However, many problems encountered in the field of informetrics require extending these kinds of methods so that they can be computed over a set of nonincreasingly ordered vectors of unequal lengths. Thus, in this paper, some new dissimilarity measures (metrics) are introduced and studied. Owing to them, we may use, among others, hierarchical clustering algorithms to determine a partition of an input data set into sets of producers that are homogeneous not only with respect to the quality of their information resources, but also their quantity.
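To make the setting concrete, here is a minimal Python sketch of one simple way to compare producers described by nonincreasingly ordered vectors of unequal lengths (e.g., the citation counts of their papers). It pads the shorter vector with zeros and takes the Euclidean distance, so that both the values (quality) and the lengths (quantity) affect the result. The zero-padding rule and the names `padded_euclidean` and `distance_matrix` are illustrative assumptions; they are not the dissimilarity measures introduced in the paper.

```python
import math
from itertools import zip_longest


def padded_euclidean(x, y):
    """Euclidean distance between two nonincreasingly sorted vectors of
    possibly unequal lengths; missing trailing entries are treated as zeros
    (an illustrative assumption, not the paper's metric)."""
    for v in (x, y):
        if any(a < b for a, b in zip(v, v[1:])):
            raise ValueError("vectors must be sorted nonincreasingly")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip_longest(x, y, fillvalue=0)))


def distance_matrix(vectors):
    """Symmetric matrix of pairwise dissimilarities between producers."""
    n = len(vectors)
    return [[padded_euclidean(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


# Three hypothetical producers: their papers' citation counts, sorted decreasingly.
producers = [[10, 5, 1], [10, 5], [3, 3, 3, 3]]
D = distance_matrix(producers)
```

For instance, `[10, 5, 1]` and `[10, 5]` are at distance 1, whereas `[10, 5, 1]` and `[3, 3, 3, 3]` are at distance √66. A matrix like `D` could then be passed to any hierarchical clustering routine that accepts precomputed dissimilarities.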
2014-12-14 software

stringi_0.4-1 Released

Another official release of the stringi package for R is now on CRAN. This time we focused particularly on better compatibility with stringr.

Notable changes since v0.3-1:

* [IMPORTANT CHANGE] `n_max` argument in `stri_split_*()` has been renamed `n`.

* [IMPORTANT CHANGE] `simplify=TRUE` in `stri_extract_all_*()` and
`stri_split_*()` now calls `stri_list2matrix()` with `fill=""`.
`fill=NA_character_` may be obtained by using `simplify=NA`.

* [IMPORTANT CHANGE, NEW FUNCTIONS] #120: `stri_extract_words` has been
renamed `stri_extract_all_words`, `stri_locate_boundaries` has been renamed
`stri_locate_all_boundaries`, and `stri_locate_words` has been renamed
`stri_locate_all_words`. The following new functions are now available:
`stri_locate_first_boundaries`, `stri_locate_last_boundaries`,
`stri_locate_first_words`, `stri_locate_last_words`,
`stri_extract_first_words`, `stri_extract_last_words`.

* [IMPORTANT CHANGE] #111: `opts_regex`, `opts_collator`, `opts_fixed`, and
`opts_brkiter` can now be supplied individually via `...`.
In other words, you may now simply call e.g.
`stri_detect_regex(str, pattern, case_insensitive=TRUE)` instead of
`stri_detect_regex(str, pattern, opts_regex=stri_opts_regex(case_insensitive=TRUE))`.

* [NEW FEATURE] #110: The fixed pattern search engine's settings can
now be supplied via the `opts_fixed` argument in `stri_*_fixed()`,
see `stri_opts_fixed()`. Simple (not suitable for natural language
processing) yet very fast `case_insensitive` pattern matching can now be
performed. `stri_extract_*_fixed` is available again.

* [NEW FEATURE] #23: `stri_extract_all_fixed`, `stri_count`, and
`stri_locate_all_fixed` may now also look for overlapping pattern
matches, see `?stri_opts_fixed`.

* [NEW FEATURE] #129: `stri_match_*_regex` gained a `cg_missing` argument.

* [NEW FEATURE] #117: `stri_extract_all_*()`, `stri_locate_all_*()`,
`stri_match_all_*()` gained a new argument: `omit_no_match`.
Setting it to `TRUE` makes these functions compatible with their
`stringr` equivalents.

* [NEW FEATURE] #118: `stri_wrap()` gained `indent`, `exdent`, `initial`,
and `prefix` arguments. Moreover, Knuth's dynamic word-wrapping algorithm
now assumes that the cost of printing the last line is zero, see #128.

* [NEW FEATURE] #122: `stri_subset()` gained an `omit_na` argument.

* [NEW FEATURE] `stri_list2matrix()` gained an `n_min` argument.

* [NEW FEATURE] #126: `stri_split()` can now also act
just like `stringr::str_split_fixed()`.

* [NEW FEATURE] #119: `stri_split_boundaries()` now has
`n`, `tokens_only`, and `simplify` arguments. Additionally,
`stri_extract_all_words()` is now equipped with a `simplify` argument.

* [NEW FEATURE] #116: `stri_paste()` gained a new argument:
`ignore_null`. Setting it to `TRUE` makes this function more compatible
with `paste()`.

* [NEW FEATURE] #114: `stri_paste()`: `ignore_null` arg has been added.

* [OTHER] #123: `useDynLib` is used to speed up symbol look-up in
the compiled dynamic library.

* [BUGFIX] #94: Run-time errors on Solaris caused by setting
`-DU_DISABLE_RENAMING=1`, namely memory allocation errors in, among others,
ICU's UnicodeString. This setting also caused some ABSan sanity check
failures within ICU code.
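The idea behind dynamic word wrapping with a free last line (#128) can be illustrated with a minimal Python sketch of the classic minimum-raggedness dynamic program. It minimises the sum of squared trailing slack over all lines except the last, whose cost is taken to be zero; a single word longer than the width is allowed to occupy a line on its own. The function name `wrap_min_raggedness` and the squared-slack cost are assumptions made for illustration; this is not stringi's actual implementation.

```python
def wrap_min_raggedness(words, width):
    """Split `words` into lines (single spaces between words) of at most
    `width` characters, minimising the sum of squared trailing slack over
    all lines except the last one, whose cost is assumed to be zero."""
    n = len(words)
    # best[i]: minimal cost of wrapping words[i:]; nxt[i]: index after its first line
    best = [0.0] * (n + 1)
    nxt = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        best[i] = float("inf")
        line_len = -1
        for j in range(i, n):
            line_len += len(words[j]) + 1  # words[i..j] joined by single spaces
            if line_len > width and j > i:
                break  # adding more words can only make the line longer
            if j == n - 1:
                cost = 0.0  # the last line is free
            else:
                slack = max(width - line_len, 0)  # an overlong single word has no slack
                cost = slack ** 2 + best[j + 1]
            if cost < best[i]:
                best[i], nxt[i] = cost, j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines
```

For example, with `width=6`, a greedy wrapper turns `"aaa bb cc ddddd"` into `["aaa bb", "cc", "ddddd"]` (cost 16), whereas `wrap_min_raggedness("aaa bb cc ddddd".split(), 6)` returns `["aaa", "bb cc", "ddddd"]` (cost 9 + 1 = 10). Making the last line free avoids penalising a naturally short final line.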
2014-11-22 grant

Research Project 2014/13/D/HS4/01700 (NCN)

My research project "Construction and analysis of methods of information resources producers' quality management" will receive funding from the National Science Centre (NCN), Poland, via the SONATA funding scheme (host institution=Systems Research Institute, Polish Academy of Sciences, role=principal investigator, years=2015-2017).
2014-11-06 software

stringi_0.3-1 Released

The third official release of the stringi package for R is on CRAN now.

Notable changes since v0.2-5:

* [IMPORTANT CHANGE] #87: `%>%` clashed with the pipe operator from
the `magrittr` package; therefore, all operators of this kind have been renamed, e.g., `%>%` is now `%s>%`.

* [IMPORTANT CHANGE] #108: The BreakIterator (for text boundary analysis)
may now be better controlled via `stri_opts_brkiter()` (see its `type`
and `locale` options, which replace the now-removed `boundary` and `locale` parameters
of `stri_locate_boundaries`, `stri_split_boundaries`, `stri_trans_totitle`,
`stri_extract_words`, and `stri_locate_words`).

* [NEW FUNCTIONS] #109: `stri_count_boundaries` and `stri_count_words`
count the number of text boundaries in a string.

* [NEW FUNCTIONS] #41: `stri_startswith_*` and `stri_endswith_*`
determine whether a string starts or ends with a given pattern.

* [NEW FEATURE] #102: `stri_replace_all_*` gained a `vectorize_all` parameter,
which defaults to `TRUE` for backward compatibility.

* [NEW FUNCTION] #91: `stri_subset_*`, a convenient and more efficient
substitute for `str[stri_detect_*(str, ...)]`, has been added.

* [NEW FEATURE] #100: `stri_split_fixed`, `stri_split_charclass`,
`stri_split_regex`, `stri_split_coll` gained a `tokens_only` parameter,
which defaults to `FALSE` for backward compatibility.

* [NEW FUNCTION] #105: `stri_list2matrix` converts lists of atomic vectors
to character matrices, useful in connection with `stri_split`
and `stri_extract`.

* [NEW FEATURE] #107: `stri_split_*` now allow setting an `omit_empty=NA` argument.

* [NEW FEATURE] #106: `stri_split` and `stri_extract_all` gained a `simplify`
argument (if `TRUE`, then `stri_list2matrix(..., byrow=TRUE)`
is called on the resulting list).

* [NEW FUNCTION] #77: `stri_rand_lipsum` generates
(pseudo)random dummy *lorem ipsum* text.

* [NEW FEATURE] #98: `stri_trans_totitle` gained an `opts_brkiter`
parameter; it indicates which ICU BreakIterator should be used when
performing case mapping.

* [NEW FEATURE] `stri_wrap` gained a new parameter: `normalize`.

* [BUGFIX] #86: `stri_*_fixed`, `stri_*_coll`, and `stri_*_regex` could
give incorrect results if one of the search strings was of length 0.

* [BUGFIX] #99: `stri_replace_all` did not use the `replacement` arg.

* [BUGFIX] #94: `R CMD check` should no longer fail if `icudt` download failed.

* [BUGFIX] #112: Some of the objects were not PROTECTed from
being garbage collected, which might have caused spontaneous SEGFAULTS.

* [BUGFIX] Some collator options were not passed correctly to ICU services.

* [BUGFIX] Memory leaks detected by
`valgrind --tool=memcheck --leak-check=full` have been fixed.

* [DOCUMENTATION] Significant extensions/clean ups in the stringi manual.

Refer to NEWS for a complete list of changes, new features and bug fixes.
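The difference between the two settings of `vectorize_all` (#102) can be modelled with a short Python sketch. With `vectorize_all=TRUE`, the i-th result applies the i-th pattern/replacement pair to the i-th string (shorter inputs are recycled); with `vectorize_all=FALSE`, every pattern/replacement pair is applied, in turn, to each string. The function `replace_all_fixed` below is a hypothetical toy that uses Python's `str.replace` for fixed-pattern matching; it is not stringi's implementation and ignores stringi's handling of empty patterns and missing values.

```python
def replace_all_fixed(strings, patterns, replacements, vectorize_all=True):
    """Toy model of the two modes of the `vectorize_all` parameter.

    vectorize_all=True:  the i-th result applies the i-th pattern/replacement
                         pair to the i-th string, recycling shorter inputs.
    vectorize_all=False: every pattern/replacement pair is applied, in order,
                         to each string; replacements are recycled over patterns.
    """
    if any(p == "" for p in patterns):
        raise ValueError("empty patterns are not supported in this sketch")
    if not strings or not patterns or not replacements:
        return []
    if vectorize_all:
        n = max(len(strings), len(patterns), len(replacements))
        return [
            strings[i % len(strings)].replace(
                patterns[i % len(patterns)], replacements[i % len(replacements)]
            )
            for i in range(n)
        ]
    result = []
    for s in strings:
        for i, p in enumerate(patterns):
            # Later pairs see the output of earlier ones (sequential application).
            s = s.replace(p, replacements[i % len(replacements)])
        result.append(s)
    return result
```

For example, `replace_all_fixed(["The quick brown fox"], ["quick", "brown"], ["slow", "black"])` yields `["The slow brown fox", "The quick black fox"]`, while passing `vectorize_all=False` yields `["The slow black fox"]`. Because replacements are applied sequentially in the latter mode, the output of an earlier replacement may itself be matched by a later pattern.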


Advanced Data Analysis Software Development with R (e-learning @ ICS PAS)

My "Advanced Data Analysis Software Development with R" e-learning course has just started. It runs on the educational platform of the Interdisciplinary PhD studies program hosted by the Institute of Computer Science, Polish Academy of Sciences. The project is co-financed by the European Social Fund under the Human Capital Operational Programme. Batch 02 of the course will start in February/March 2014.
2014-10-01 software

FuzzyNumbers_0.3-5 Now Available

A maintenance release of the FuzzyNumbers package for R is now available on CRAN. CHANGELOG:
* added proper import directives in NAMESPACE
* piecewiseLinearApproximation: method="ApproximateNearestEuclidean"
no longer accepted; use "NearestEuclidean" instead.
* package vignette now in the vignettes/ directory.