stri_match_all {stringi} | R Documentation
These functions extract substrings in str that match a given regex pattern. Additionally, they extract matches to every capture group, i.e., to all the sub-patterns given in round parentheses.
stri_match_all(str, ..., regex)
stri_match_first(str, ..., regex)
stri_match_last(str, ..., regex)
stri_match(str, ..., regex, mode = c("first", "all", "last"))
stri_match_all_regex(str, pattern, omit_no_match = FALSE, cg_missing = NA_character_, ..., opts_regex = NULL)
stri_match_first_regex(str, pattern, cg_missing = NA_character_, ..., opts_regex = NULL)
stri_match_last_regex(str, pattern, cg_missing = NA_character_, ..., opts_regex = NULL)
str | character vector; strings to search in
... | supplementary arguments passed to the underlying functions, including additional settings for opts_regex
mode | single string; one of: "first" (the default), "all", "last"
pattern, regex | character vector; search patterns; for more details refer to stringi-search
omit_no_match | single logical value; if FALSE, a string with no match is reported as a row of NAs
cg_missing | single string to be used if a capture group match is unavailable
opts_regex | a named list with ICU Regex settings, see stri_opts_regex; NULL for the default settings
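As an illustration of the cg_missing and opts_regex arguments described above, here is a minimal sketch. The input strings are made up for the example, and case_insensitive is assumed to be the relevant stri_opts_regex setting for case-insensitive matching.

```r
library(stringi)

# By default, matching is case-sensitive, so nothing is found here
# and every cell of the single result row is NA.
strict <- stri_match_first_regex("Lunch=PIZZA", "lunch=(\\w+)")

# Supplying ICU regex settings via opts_regex makes the same pattern match.
relaxed <- stri_match_first_regex("Lunch=PIZZA", "lunch=(\\w+)",
    opts_regex = stri_opts_regex(case_insensitive = TRUE))

# An optional capture group that did not take part in the match is
# reported as cg_missing (NA by default, here an empty string).
optional <- stri_match_first_regex("abcd", "^(:)?([^:]*)", cg_missing = "")

print(strict)
print(relaxed)
print(optional)
```

Passing a settings list explicitly keeps the call self-documenting, which may be preferable when several options are combined.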
Vectorized over str and pattern.
If no pattern match is detected and omit_no_match=FALSE, then NAs are included in the resulting matrix (matrices), see Examples.
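The effect of omit_no_match and of vectorization can be sketched as follows. The inputs are invented for the illustration, and the variable names (with_na, without_na, recycled) are hypothetical.

```r
library(stringi)

x <- c("lunch=pizza", "no food here")
pattern <- "(\\w+)=(\\w+)"

# Default (omit_no_match = FALSE): the string without a match
# yields a one-row matrix filled with NAs.
with_na <- stri_match_all_regex(x, pattern)

# omit_no_match = TRUE: the same string yields a matrix with no rows.
without_na <- stri_match_all_regex(x, pattern, omit_no_match = TRUE)

print(with_na[[2]])
print(nrow(without_na[[2]]))

# Vectorization: a single string is recycled against two patterns,
# giving one result row per pattern.
recycled <- stri_match_first_regex("a1", c("(a)", "(\\d)"))
print(recycled)
```

Dropping the NA rows can be convenient when the per-string matrices are later combined, since empty matrices then contribute nothing.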
Please note: the ICU regex engine currently does not fully support named capture groups.
stri_match, stri_match_all, stri_match_first, and stri_match_last are convenience functions. They just call stri_match_*_regex and were provided for consistency with other string searching functions' wrappers, see, among others, stri_extract.
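A short sketch of how the convenience wrappers relate to the *_regex functions, using an invented input string; the comparison via identical() is only an illustration of the equivalence stated above.

```r
library(stringi)

x <- "a=1;b=2"
pattern <- "(\\w)=(\\d)"

# The wrapper takes the pattern through the named regex argument ...
via_wrapper <- stri_match_first(x, regex = pattern)
# ... and forwards to the corresponding *_regex function.
direct <- stri_match_first_regex(x, pattern)
print(identical(via_wrapper, direct))

# stri_match() selects the variant through its mode argument.
via_mode <- stri_match(x, regex = pattern, mode = "last")
print(identical(via_mode, stri_match_last_regex(x, pattern)))
```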
For stri_match_all*, a list of character matrices is returned. Each list element represents the results of a different search scenario.
For stri_match_first* and stri_match_last*, a character matrix is returned. Each row corresponds to a different search result.
The first matrix column gives the whole match. The second one corresponds to the first capture group, the third to the second capture group, and so on.
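The shape of the return values can be inspected directly. The following sketch uses made-up input strings and a pattern with two capture groups, so each matrix is expected to have three columns.

```r
library(stringi)

x <- c("a=1;b=2", "c=3")
pattern <- "(\\w)=(\\d)"

# stri_match_all*: a list with one matrix per input string,
# and one matrix row per match found in that string.
all_matches <- stri_match_all_regex(x, pattern)
print(length(all_matches))
print(dim(all_matches[[1]]))

# Column 1 holds the whole match, column 2 the first capture group,
# column 3 the second capture group.
print(all_matches[[1]][, 1])
print(all_matches[[1]][, 2])

# stri_match_first* and stri_match_last*: a single matrix
# with one row per input string.
first_matches <- stri_match_first_regex(x, pattern)
last_matches <- stri_match_last_regex(x, pattern)
print(dim(first_matches))
print(last_matches[, 1])
```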
Other search_extract: stri_extract_all_boundaries, stri_extract_all, stringi-search
stri_match_all_regex("breakfast=eggs, lunch=pizza, dessert=icecream", "(\\w+)=(\\w+)")
stri_match_all_regex(c("breakfast=eggs", "lunch=pizza", "no food here"), "(\\w+)=(\\w+)")
stri_match_all_regex(c("breakfast=eggs;lunch=pizza",
   "breakfast=bacon;lunch=spaghetti", "no food here"), "(\\w+)=(\\w+)")
stri_match_first_regex(c("breakfast=eggs;lunch=pizza",
   "breakfast=bacon;lunch=spaghetti", "no food here"), "(\\w+)=(\\w+)")
stri_match_last_regex(c("breakfast=eggs;lunch=pizza",
   "breakfast=bacon;lunch=spaghetti", "no food here"), "(\\w+)=(\\w+)")

stri_match_first_regex(c("abcd", ":abcd", ":abcd:"), "^(:)?([^:]*)(:)?$")
stri_match_first_regex(c("abcd", ":abcd", ":abcd:"), "^(:)?([^:]*)(:)?$", cg_missing="")

# Match all patterns of the form XYX, including overlapping matches:
stri_match_all_regex("ACAGAGACTTTAGATAGAGAAGA", "(?=(([ACGT])[ACGT]\\2))")[[1]][,2]
# Compare the above to:
stri_extract_all_regex("ACAGAGACTTTAGATAGAGAAGA", "([ACGT])[ACGT]\\1")