Extract the numbers from a string, where decimals, scientific notation and thousand separators are optionally allowed.
str_extract_numbers(
  string,
  decimals = FALSE,
  leading_decimals = decimals,
  negs = FALSE,
  sci = FALSE,
  big_mark = "",
  leave_as_string = FALSE,
  commas = FALSE
)
string: A string.
decimals: Do you want to include the possibility of decimal numbers (TRUE) or not (FALSE, the default).
leading_decimals: Do you want to allow a leading decimal point to be the start of a number?
negs: Do you want to allow negative numbers? Note that double negatives are not handled here (see the examples).
sci: Make the search aware of scientific notation, e.g. 2e3 is the same as 2000.
big_mark: A character. Allow this character to be used as a thousands separator. This character will be removed from between digits before they are converted to numeric. You may specify several at once by pasting them together, e.g. big_mark = ",_" will allow both commas and underscores. Internally, this will be used inside a [] regex block, so e.g. "a-z" will behave differently to "az-". Most common separators (commas, spaces, underscores) should work fine.
leave_as_string: Do you want to return the number as a string (TRUE) or as numeric (FALSE, the default)?
commas: Deprecated. Use big_mark instead.
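The remark about big_mark being placed inside a [] regex block can be illustrated with base R alone. The sketch below is not the package's internal code; strip_big_mark is a hypothetical helper, and it strips the separator everywhere rather than only between digits.

```r
# Base R only: this illustrates regex character classes, not the package internals.
x <- "b-1"

# Inside [], "a-z" is a range: every lowercase letter matches, the hyphen does not.
range_result <- gsub("[a-z]", "", x)

# Inside [], "az-" is three literals: "a", "z" and "-" (a trailing hyphen is literal).
literal_result <- gsub("[az-]", "", x)

print(range_result)    # "-1"
print(literal_result)  # "b1"

# Hypothetical helper (simplified): drop every separator character, then convert.
strip_big_mark <- function(s, big_mark) {
  as.numeric(gsub(paste0("[", big_mark, "]"), "", s))
}

comma_value <- strip_big_mark("1,230.5", ",")     # 1230.5
mixed_value <- strip_big_mark("1_000,000", ",_")  # one million
```

Because the separator string becomes a character class, the order of characters matters only where the hyphen (or another class metacharacter) is involved; plain separators such as "," or "_" behave the same in any order.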
For str_extract_numbers and str_extract_non_numerics, a list of numeric or character vectors, one list element for each element of string. For str_nth_number and str_nth_non_numeric, a numeric or character vector the same length as the vector string.
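As a rough illustration of the two return shapes described above, the following sketch calls str_extract_numbers() and str_nth_number() on the same input. It assumes the strex package is installed and that str_nth_number() takes the position as an argument named n; the input vector x is made up for the example.

```r
library(strex)  # assumption: the strex package is installed

x <- c("abc123def456", "abc7")  # illustrative input

# One list element per input string, each holding all numbers found in it.
all_numbers <- str_extract_numbers(x)

# One value per input string: here, the first number in each.
first_number <- str_nth_number(x, n = 1)

str(all_numbers)
str(first_number)
```

The list form keeps every match, so its elements can differ in length; the vector form is convenient when exactly one number per string is wanted.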
If any part of a string contains an ambiguous number (e.g. 1.2.3 is ambiguous if decimals = TRUE, but not otherwise), the value returned for that string will be NA and a warning will be issued.
With scientific notation, it is assumed that the exponent is not a decimal number, e.g. 2e2.4 is unacceptable. Thousand separators, however, are acceptable in the exponent.
Numbers outside the double precision floating point range (i.e. with absolute value greater than approximately 1.797693e+308) are read as Inf (or -Inf if they begin with a minus sign). This matches the behaviour of base::as.numeric().
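The conversion behaviour referred to here can be checked with base R alone. This is a small sketch of what as.numeric() does with scientific notation and out-of-range values, not a trace of the package's own code path; the variable names are illustrative.

```r
# Largest finite double on this platform (about 1.797693e+308 on IEEE 754 systems).
max_double <- .Machine$double.xmax

# Scientific notation with an integer exponent converts as expected.
two_thousand <- as.numeric("2e3")

# Values beyond the double range become infinite, keeping their sign.
too_big   <- as.numeric("1e400")
too_small <- as.numeric("-1e400")

print(two_thousand == 2000)  # TRUE
print(too_big)               # Inf
print(too_small)             # -Inf
```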
Other numeric extractors: str_nth_number(), str_nth_number_after_mth(), str_nth_number_before_mth()
strings <- c(
  "abc123def456", "abc-0.12def.345", "abc.12e4def34.5e9",
  "abc1,100def1,230.5", "abc1,100e3,215def4e1,000"
)
str_extract_numbers(strings)
#> [[1]]
#> [1] 123 456
#>
#> [[2]]
#> [1] 0 12 345
#>
#> [[3]]
#> [1] 12 4 34 5 9
#>
#> [[4]]
#> [1] 1 100 1 230 5
#>
#> [[5]]
#> [1] 1 100 3 215 4 1 0
#>
str_extract_numbers(strings, decimals = TRUE)
#> [[1]]
#> [1] 123 456
#>
#> [[2]]
#> [1] 0.120 0.345
#>
#> [[3]]
#> [1] 0.12 4.00 34.50 9.00
#>
#> [[4]]
#> [1] 1.0 100.0 1.0 230.5
#>
#> [[5]]
#> [1] 1 100 3 215 4 1 0
#>
str_extract_numbers(strings, decimals = TRUE, leading_decimals = TRUE)
#> [[1]]
#> [1] 123 456
#>
#> [[2]]
#> [1] 0.120 0.345
#>
#> [[3]]
#> [1] 0.12 4.00 34.50 9.00
#>
#> [[4]]
#> [1] 1.0 100.0 1.0 230.5
#>
#> [[5]]
#> [1] 1 100 3 215 4 1 0
#>
str_extract_numbers(strings, big_mark = ",")
#> [[1]]
#> [1] 123 456
#>
#> [[2]]
#> [1] 0 12 345
#>
#> [[3]]
#> [1] 12 4 34 5 9
#>
#> [[4]]
#> [1] 1100 1230 5
#>
#> [[5]]
#> [1] 1100 3215 4 1000
#>
str_extract_numbers(strings,
  decimals = TRUE, leading_decimals = TRUE,
  sci = TRUE
)
#> [[1]]
#> [1] 123 456
#>
#> [[2]]
#> [1] 0.120 0.345
#>
#> [[3]]
#> [1] 1.20e+03 3.45e+10
#>
#> [[4]]
#> [1] 1.0 100.0 1.0 230.5
#>
#> [[5]]
#> [1] 1 100000 215 40 0
#>
str_extract_numbers(strings,
  decimals = TRUE, leading_decimals = TRUE,
  sci = TRUE, big_mark = ",", negs = TRUE
)
#> [[1]]
#> [1] 123 456
#>
#> [[2]]
#> [1] -0.120 0.345
#>
#> [[3]]
#> [1] 1.20e+03 3.45e+10
#>
#> [[4]]
#> [1] 1100.0 1230.5
#>
#> [[5]]
#> [1] Inf Inf
#>
str_extract_numbers(strings,
  decimals = TRUE, leading_decimals = FALSE,
  sci = FALSE, big_mark = ",", leave_as_string = TRUE
)
#> [[1]]
#> [1] "123" "456"
#>
#> [[2]]
#> [1] "0.12" "345"
#>
#> [[3]]
#> [1] "12" "4" "34.5" "9"
#>
#> [[4]]
#> [1] "1,100" "1,230.5"
#>
#> [[5]]
#> [1] "1,100" "3,215" "4" "1,000"
#>
str_extract_numbers(c("22", "1.2.3"), decimals = TRUE)
#> Warning: `NA`s introduced by ambiguity.
#> ℹ The first such ambiguity is in string number 2 which is '1.2.3'.
#> ✖ The offending part of that string is '.2.3'.
#> [[1]]
#> [1] 22
#>
#> [[2]]
#> [1] NA
#>