Important Miscellany

The importance of this miscellany

The features of strex that were deemed the most interesting have been given their own vignettes. However, the package was intended as a miscellany of useful functions, so the functions demonstrated here encapsulate the spirit of this package, i.e. functions that save R string manipulators time.

library(strex)
#> Loading required package: stringr

Could this be numeric?

Sometimes you don’t want to know whether something is numeric, just whether or not it could be. Now you can find out with str_can_be_numeric().

str_can_be_numeric(c("1a", "abc", "5", "2e7", "seven"))
#> [1] FALSE FALSE  TRUE  TRUE FALSE
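
For intuition, the same check can be approximated in base R by attempting the conversion and seeing whether it fails. This is a minimal sketch under that assumption, not necessarily how strex implements it; can_be_numeric_sketch() is a hypothetical name.

```r
# Hypothetical helper, not part of strex: TRUE where as.numeric() succeeds.
can_be_numeric_sketch <- function(x) {
  # as.numeric() warns and returns NA for unparseable strings; silence the warning.
  !is.na(suppressWarnings(as.numeric(x)))
}

can_be_numeric_sketch(c("1a", "abc", "5", "2e7", "seven"))
#> [1] FALSE FALSE  TRUE  TRUE FALSE
```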

Currency

To get the currencies and amounts mentioned in strings, use str_extract_currencies(), str_nth_currency(), str_first_currency() and str_last_currency(). str_first_currency() returns only the first currency amount in each string and str_last_currency() returns only the last. str_nth_currency() lets you pick the second, third and so on. str_extract_currencies() returns all currency amounts mentioned in a string.

string <- c("Alan paid £5", "Joe paid $7")
str_first_currency(string)
#>   string_num       string curr_sym amount
#> 1          1 Alan paid £5        £      5
#> 2          2  Joe paid $7        $      7
string <- c("€1 is $1.17", "£1 is $1.29")
str_nth_currency(string, n = c(1, 2))
#>   string_num      string curr_sym amount
#> 1          1 €1 is $1.17        €   1.00
#> 2          2 £1 is $1.29        $   1.29
str_last_currency(string) # only gets the last mentioned
#>   string_num      string curr_sym amount
#> 1          1 €1 is $1.17        $   1.17
#> 2          2 £1 is $1.29        $   1.29
str_extract_currencies(string)
#>   string_num      string curr_sym amount
#> 1          1 €1 is $1.17        €   1.00
#> 2          1 €1 is $1.17        $   1.17
#> 3          2 £1 is $1.29        £   1.00
#> 4          2 £1 is $1.29        $   1.29

Extract a single element of a string

This is a simple wrapper around stringr::str_sub().

string <- "abcdefg"
str_sub(string, 3, 3)
#> [1] "c"
str_elem(string, 3) # simpler and more expressive
#> [1] "c"
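
For comparison, base R's substr() can express the same idea. The helper below is a hypothetical sketch, not the strex source, and unlike stringr::str_sub() it does not understand negative indices.

```r
# Hypothetical helper, not part of strex: the index-th character of each string.
elem_sketch <- function(string, index) {
  substr(string, index, index)
}

elem_sketch("abcdefg", 3)
#> [1] "c"
```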

Extract numbers and non-numeric elements

string <- c("aa1bbb2ccc3", "xyz7ayc8jzk99elephant")
str_extract_numbers(string)
#> [[1]]
#> [1] 1 2 3
#> 
#> [[2]]
#> [1]  7  8 99
str_extract_non_numerics(string)
#> [[1]]
#> [1] "aa"  "bbb" "ccc"
#> 
#> [[2]]
#> [1] "xyz"      "ayc"      "jzk"      "elephant"
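
To see roughly what is going on, here is a base R sketch built on regular expressions. It is an illustration only: it assumes the numbers are plain runs of digits (no decimals or signs) and makes no claim to match the strex implementation.

```r
string <- c("aa1bbb2ccc3", "xyz7ayc8jzk99elephant")

# Pull out every run of digits, then convert each run to a number.
digit_runs <- regmatches(string, gregexpr("[0-9]+", string))
nums <- lapply(digit_runs, as.numeric)
nums[[2]]
#> [1]  7  8 99

# Split on the same runs of digits to keep what lies between them.
non_nums <- strsplit(string, "[0-9]+")
non_nums[[1]]
#> [1] "aa"  "bbb" "ccc"
```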

Split a string by its numbers

string <- c("aa1bbb2ccc3", "xyz7ayc8jzk99elephant")
str_split_by_numbers(string)
#> [[1]]
#> [1] "aa"  "1"   "bbb" "2"   "ccc" "3"  
#> 
#> [[2]]
#> [1] "xyz"      "7"        "ayc"      "8"        "jzk"      "99"       "elephant"

Force a file name to have an extension

We can make sure file names have a given extension, leaving them alone if they already have it.

string <- c("spreadsheet1.csv", "spreadsheet2")
str_give_ext(string, "csv")
#> [1] "spreadsheet1.csv" "spreadsheet2.csv"

If the file already has an extension, we can append one or replace it.

str_give_ext(string, "xls") # append
#> [1] "spreadsheet1.csv.xls" "spreadsheet2.xls"
str_give_ext(string, "csv", replace = TRUE) # replace
#> [1] "spreadsheet1.csv" "spreadsheet2.csv"

Strip away a file extension

string <- c("spreadsheet1.csv", "spreadsheet2")
str_before_last_dot(string)
#> [1] "spreadsheet1" "spreadsheet2"
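
Base R's tools package offers something similar for file names, shown here for comparison. It only strips extensions made of alphanumeric characters, so treat it as a related tool and not a drop-in replacement for str_before_last_dot().

```r
string <- c("spreadsheet1.csv", "spreadsheet2")

# Drop a trailing ".ext" where one exists; names without one are left alone.
stripped <- tools::file_path_sans_ext(string)
stripped
#> [1] "spreadsheet1" "spreadsheet2"
```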

Remove quoted bits from a string

string <- "I hate having these \"quotes\" in the middle of my strings."
cat(string)
#> I hate having these "quotes" in the middle of my strings.
str_remove_quoted(string)
#> [1] "I hate having these  in the middle of my strings."

Split camel case

I’m not mad on CamelCase; I often want to deconstruct it.

string <- c("CamelVar1", "CamelVar2")
str_split_camel_case(string)
#> [[1]]
#> [1] "Camel" "Var1" 
#> 
#> [[2]]
#> [1] "Camel" "Var2"
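
A regex-based approximation in base R might look like the following. It is a sketch that assumes the input contains no spaces and that words start at an uppercase letter following a lowercase letter or digit; strex may treat edge cases differently.

```r
string <- c("CamelVar1", "CamelVar2")

# Insert a space wherever a lowercase letter or digit is followed by an
# uppercase letter, then split on that space.
spaced <- gsub("([a-z0-9])([A-Z])", "\\1 \\2", string)
pieces <- strsplit(spaced, " ")
pieces[[1]]
#> [1] "Camel" "Var1"
```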

Convert a string to a vector

This is something I did a lot to avoid using regular expressions. Don’t do it for that purpose. Learn regex. https://regexone.com/ is a very good start.

string <- "R is good."
str_to_vec(string)
#>  [1] "R" " " "i" "s" " " "g" "o" "o" "d" "."
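
The base R counterpart is to split on the empty string, sketched here for comparison. strsplit() returns a list with one element per input string, hence the [[1]].

```r
string <- "R is good."

# Splitting on "" breaks the string into individual characters.
chars <- strsplit(string, "")[[1]]
chars
#>  [1] "R" " " "i" "s" " " "g" "o" "o" "d" "."
```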

Trim anything, not just whitespace

What if something is needlessly surrounded by parentheses and we want to get rid of them?

string <- "(((Why all the parentheses?)))"
string %>%
  str_trim_anything(coll("("), side = "left") %>%
  str_trim_anything(coll(")"), side = "right")
#> [1] "Why all the parentheses?"
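
In the spirit of learning regex, the same trim can be written as a single base R substitution. This is an alternative sketch, not what str_trim_anything() does internally.

```r
string <- "(((Why all the parentheses?)))"

# ^\(+ matches opening parentheses at the start of the string;
# \)+$ matches closing parentheses at the end. Both are replaced with "".
gsub("^\\(+|\\)+$", "", string)
#> [1] "Why all the parentheses?"
```

Parentheses in the middle of the string are untouched because both alternatives are anchored.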

Remove duplicated bits of strings

string <- c("I often write the word *my* twice in a row in my my sentences.")
str_singleize(string, " my")
#> [1] "I often write the word *my* twice in a row in my sentences."
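
A back-reference gives a rough regex equivalent in base R. This sketch hard-codes the pattern " my" and is offered only as an illustration of the idea.

```r
string <- "I often write the word *my* twice in a row in my my sentences."

# ( my)\1+ matches " my" followed by one or more immediate repeats of itself;
# replacing the match with \1 keeps a single copy.
gsub("( my)\\1+", "\\1", string)
#> [1] "I often write the word *my* twice in a row in my sentences."
```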

2024-10-02