stringi 1.7.2

Another major update of stringi brings a rewritten version of stri_sprintf, support for custom rule-based transliteration, extraction of named regex capture groups, and many other enhancements.

Changes since v1.6.2:

  • [BACKWARD INCOMPATIBILITY] %s$% and %stri$% now use the new stri_sprintf function (see below) instead of base::sprintf.

  • [BACKWARD INCOMPATIBILITY, NEW FEATURE] In stri_sub<- and stri_sub_all<-, a negative length now leaves the corresponding input string unaltered.

  • [BACKWARD INCOMPATIBILITY, NEW FEATURE] In stri_sub and stri_sub_all, a negative length results in the corresponding output being NA or not extracted at all, depending on the setting of the new argument ignore_negative_length.

  • [BACKWARD INCOMPATIBILITY, BUGFIX, NEW FEATURE] In stri_subset* and their replacement versions, pattern and value cannot be longer than str (but now they are recycled if necessary).

  • [BACKWARD INCOMPATIBILITY, NEW FEATURE] stri_sub* now accept the from argument being a matrix like cbind(from, length=length). Unnamed columns or any other names are still interpreted as cbind(from, to). Also, the new argument use_matrix can be used to disable the special treatment of such matrices.

  • [DOCUMENTATION] It has been clarified that the syntax of *_charclass (e.g., used in stri_trim*) differs slightly from regex character classes.

  • [NEW FEATURE] #420: stri_sprintf (alias: stri_string_format) is a Unicode-aware replacement for and enhancement of the base sprintf: it adds customised handling of NAs (on demand), field sizes computed from code point widths, output of substrings of at most a given width, variable width and precision (both at the same time), etc. Moreover, stri_printf can be used to display formatted strings conveniently.

  • [NEW FEATURE] #153: stri_match_*_regex now extract capture group names.

  • [NEW FEATURE] #25: stri_locate_*_regex now have a new argument, capture_groups, which allows for extracting positions of matches to parenthesised subexpressions.

  • [NEW FEATURE] stri_locate_* now have a new argument, get_length, whose setting may result in generating from-length matrices (instead of from-to ones).

  • [NEW FEATURE] #438: stri_trans_general now supports rule-based as well as reverse-direction transliteration.

  • [NEW FEATURE] #434: stri_datetime_format and stri_datetime_parse are now vectorised also with respect to the format argument.

  • [NEW FEATURE] stri_datetime_fstr has a new argument, ignore_special, which defaults to TRUE for backward compatibility.

  • [NEW FEATURE] stri_datetime_format, stri_datetime_add, and stri_datetime_fields now call as.POSIXct more eagerly.

  • [NEW FEATURE] stri_trim* now have a new argument, negate.

  • [NEW FEATURE] stri_replace_rstr converts gsub-style replacement strings to stri_replace-style.

  • [INTERNAL] stri_prepare_arg* have been refactored; buffer overruns in the exception handling subsystem are now avoided.

  • [BUGFIX] A few functions (stri_length, stri_enc_toutf32, etc.) did not throw an exception on an invalid UTF-8 byte sequence (and merely issued a warning instead).

  • [BUGFIX] stri_datetime_fstr did not honour NA_character_ and did not parse format strings such as "%Y%m%d" correctly. It has now been completely rewritten (in C).

  • [BUGFIX] stri_wrap did not recognise the width of certain Unicode sequences correctly.
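
A few of the new features listed above can be tried out as follows. This is only a minimal sketch, assuming stringi 1.7.2 or later is installed; the input strings and variable names are made up for illustration and do not come from the package documentation.

```r
library(stringi)

# stri_sprintf: field width and on-demand NA handling via na_string
padded  <- stri_sprintf("[%5s]", "abc")                            # "[  abc]"
with_na <- stri_sprintf("%s", NA_character_, na_string = "<NA>")

# stri_match_*_regex: result columns are named after the capture groups
m   <- stri_match_first_regex("lunch=pizza", "(?<key>\\w+)=(?<value>\\w+)")
key <- m[, "key"]

# stri_locate_* with get_length=TRUE yields a from-length matrix,
# which stri_sub accepts directly as its `from` argument
loc   <- stri_locate_first_regex("abcdef", "cd", get_length = TRUE)
piece <- stri_sub("abcdef", loc)

# stri_replace_rstr: convert a gsub-style replacement string for stri_replace_*
swapped <- stri_replace_first_regex("2021-07", "(\\d+)-(\\d+)",
                                    stri_replace_rstr("\\2/\\1"))  # "07/2021"

# stri_trim* with negate=TRUE: the pattern names the characters to trim
trimmed <- stri_trim_both("xxabcxx", "[x]", negate = TRUE)         # "abc"
```

Passing use_matrix=FALSE to stri_sub should disable the special treatment of the from-length matrix, should the older interpretation be needed.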