stri_split_lines {stringi}	R Documentation

Split a String Into Text Lines


These functions split each character string into text lines.


stri_split_lines(str, omit_empty = FALSE)

stri_split_lines1(str)




str: character vector (stri_split_lines) or a single string (stri_split_lines1)


omit_empty: logical vector; determines whether empty strings should be removed from the result [stri_split_lines only]


Vectorized over str and omit_empty.

omit_empty is applied during splitting. If it is set to TRUE, then empty strings will never appear in the resulting vector.
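The following is a minimal sketch of `omit_empty` and of the vectorization over `str` and `omit_empty`. It assumes the stringi package is installed. The input strings and the variable names `x`, `kept` and `mixed` are illustrative only.

```r
library(stringi)

# Two strings, each containing an empty line between two non-empty ones
x <- c("a\n\nb", "c\n\nd")

# Default (omit_empty = FALSE): the empty line is kept as ""
kept <- stri_split_lines(x)

# omit_empty is recycled along str: keep empties in the first string,
# drop them in the second
mixed <- stri_split_lines(x, omit_empty = c(FALSE, TRUE))

print(kept)
print(mixed)
```

Here `kept` holds `c("a", "", "b")` and `c("c", "", "d")`. In `mixed`, only the second element loses its empty string.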

Newlines are represented differently on different platforms, e.g., by carriage return (CR, 0x0D), line feed (LF, 0x0A), CRLF, or next line (NEL, 0x85). Moreover, the Unicode Standard defines two unambiguous separator characters: Paragraph Separator (PS, 0x2029) and Line Separator (LS, 0x2028). Vertical tab (VT, 0x0B) and form feed (FF, 0x0C) are also sometimes used. These functions follow the UTS #18 rules, under which a newline sequence corresponds to the following regular expression: (?:\u{D A}|(?!\u{D A})[\u{A}-\u{D}\u{85}\u{2028}\u{2029}]). Each match separates two consecutive text lines. Because of the negative lookahead, CRLF counts as one separator, not two. For efficiency reasons, however, the search is not performed by the regex engine.
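The sketch below illustrates the newline rule. It assumes stringi is installed and splits a single string that mixes CRLF, LF, LS, NEL, FF, VT and a lone CR. As a cross-check, it also splits the same string with `stri_split_regex()` using `nl_pattern`. That pattern is a hypothetical transcription of the newline regex into ICU syntax. It is not how `stri_split_lines()` works internally, since that function does not use the regex engine. In `nl_pattern`, CRLF is listed as the first alternative, so it is consumed as one separator without needing the lookahead.

```r
library(stringi)

# One string using eight different line separators:
# CRLF, LF, LS (U+2028), NEL (U+0085), FF, VT, and a lone CR
x <- "a\r\nb\nc\u{2028}d\u{0085}e\ff\vg\rh"

by_lines <- stri_split_lines(x)

# Hypothetical ICU pattern mirroring the newline sequence definition:
# CRLF first, otherwise any single character from LF..CR, NEL, LS or PS
nl_pattern <- "\\x{0D}\\x{0A}|[\\x{0A}-\\x{0D}\\x{85}\\x{2028}\\x{2029}]"
by_regex <- stri_split_regex(x, nl_pattern)

print(by_lines)
print(identical(by_lines, by_regex))
```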


stri_split_lines returns a list of character vectors. If any input string is NA, then the corresponding list element is a single NA string.
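A short check of the return value and the NA rule, assuming stringi is installed. The input vector `v` is illustrative.

```r
library(stringi)

# A normal string and a missing value
v <- c("x\ny", NA)

res <- stri_split_lines(v)

# res is a list with one element per input string; the element for NA
# is a single NA string
str(res)
```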

stri_split_lines1(str) is equivalent to stri_split_lines(str[1])[[1]] (with default parameters), so it returns a character vector. Moreover, if the input string ends with a newline sequence, the trailing empty string is omitted from the result. This makes the function handy for splitting a loaded text file into text lines.
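The following is a minimal sketch of the file use case, assuming stringi is installed. It writes a temporary two-line file, reads the whole file back as one string with base R's `readChar()`, and compares the two functions. The file contents and variable names are illustrative. Because `writeLines()` terminates every line, the loaded text ends with a newline sequence. On Windows this sequence may be CRLF, which is still treated as a single separator.

```r
library(stringi)

# Write a small text file; writeLines() ends every line with a newline
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), path)

# Load the whole file as a single string
txt <- readChar(path, file.size(path))

lines_all  <- stri_split_lines(txt)[[1]]  # keeps the trailing empty string
lines_file <- stri_split_lines1(txt)      # omits it

print(lines_all)
print(lines_file)
unlink(path)
```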


Unicode Newline Guidelines, Unicode Technical Report #13

Unicode Regular Expressions, Unicode Technical Standard #18

