R Data Types and "Gotchas"

Written by Scott McCoy

Some programming languages have strong data typing. Some have weak data typing. And some languages are R.

R data types can be confusing, even for pirates.

An introduction to data types in R

Some of the basic data types in R:

  • Boolean (logical)
  • Numeric
  • Double
  • Integer
  • Character (string)

Some common R data structures, also called complex types:

  • Factor
  • Vector
  • Data frame
  • Dates

Conveniently, everything in R is an object, so we can use functions to get info about variables/data and learn more about their types.

  • typeof()
  • class()
  • length()
  • attributes()

For example:

> numeric_var <- 1.5
> typeof(numeric_var)
[1] "double"
> class(numeric_var)
[1] "numeric"
> length(numeric_var)
[1] 1
> attributes(numeric_var)
NULL
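
The list of basic types above names numeric, double, and integer separately. A minimal sketch of how they relate, using the same inspection functions (the variable names are only for illustration):

```r
# A number typed without a suffix is stored as a double
dbl_var <- 2
# The L suffix asks for an integer instead
int_var <- 2L

dbl_type <- typeof(dbl_var)   # "double"
int_type <- typeof(int_var)   # "integer"

# class() reports "numeric" for doubles but "integer" for integers
dbl_class <- class(dbl_var)   # "numeric"
int_class <- class(int_var)   # "integer"

# is.numeric() is TRUE for both storage types
both_numeric <- is.numeric(dbl_var) && is.numeric(int_var)   # TRUE
```

In other words, "numeric" is best read as an umbrella that covers both doubles and integers, not as a third storage type.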

Another example, this time using a factor:

> library(magrittr)  # provides the %>% pipe
> factor(levels = c("a","b","c")) %>% attributes()
$levels
[1] "a" "b" "c"

$class
[1] "factor"
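
For the complex types, typeof() and class() can give different answers, which matters for the coercion behavior discussed later. A small sketch with a made-up factor (size_factor is a hypothetical example, not part of the original data):

```r
# A factor with two levels, in a deliberately chosen order
size_factor <- factor(c("small", "large", "small"), levels = c("small", "large"))

factor_type  <- typeof(size_factor)   # "integer": factors are stored as integer codes
factor_class <- class(size_factor)    # "factor"

# Dropping down to the basic types exposes the codes or the labels
factor_codes  <- as.integer(size_factor)     # 1 2 1
factor_labels <- as.character(size_factor)   # "small" "large" "small"
```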

The R type system has a number of interesting properties:

  • R is interpreted
  • R is dynamically typed
  • R uses lazy evaluation

This means that R checks types at runtime rather than at compile time, since R isn't compiled ahead of time the way many other languages are. You're not going to generate an exe file with R; you just hand someone a script. It also means that a type error only becomes a problem if the offending code actually runs.

For example, this doesn't throw an error because the else clause is never run:

# at the top level, else has to follow the closing brace on the same line
if (TRUE) { 1 + 1 } else { "a" + 1 }
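
Lazy evaluation works in a similar spirit for function arguments: an argument that is never used is never evaluated. The sketch below uses a hypothetical helper, double_first(), invented here purely for illustration:

```r
# double_first() is a hypothetical helper; it never touches its second argument
double_first <- function(x, unused) {
  x * 2
}

# stop() is passed as an argument but never evaluated, so no error is raised
lazy_result <- double_first(2, stop("this argument is never evaluated"))   # 4

# When the bad expression does run, the error appears at runtime;
# tryCatch() captures its message as a string so the script keeps going
runtime_error <- tryCatch("a" + 1, error = function(e) conditionMessage(e))
```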

Unlike languages that require a conversion function or a cast (the new type name in parentheses) to change a value's type, R uses implicit coercion. That means the conversion is done automatically at runtime as long as it's possible (it usually is). R also allows explicit coercion when you want to tell it specifically what to do. This usually uses the as.<class_name> functions, like as.integer() or as.list().
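
A short sketch of both kinds of coercion, using only base R (the variable names are illustrative):

```r
# Implicit coercion: R picks the most general type involved
logical_plus_int <- TRUE + 1L              # TRUE becomes 1L, so the result is 2L
mixed_type  <- typeof(c(TRUE, 1L, 2.5))    # "double"
with_string <- typeof(c(1L, 2.5, "a"))     # "character"

# Explicit coercion with the as.<class_name> functions
from_string <- as.integer("42")            # 42L
truncated   <- as.integer(3.9)             # 3L: truncated toward zero, not rounded
as_a_list   <- as.list(1:3)                # a list holding three integers

# When a conversion is impossible, R returns NA and emits a warning
not_a_number <- suppressWarnings(as.integer("abc"))   # NA
```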

A type coercion "gotcha". First we create a data frame explicitly with dates...

library(dplyr)  # provides as_tibble(), select(), and %>%

example_data <- as_tibble(data.frame(
  StartDate = as.Date(c("2022-01-01", "2022-01-31", "2022-03-01")),
  EndDate   = as.Date(c("2022-02-01", "2022-02-28", "2022-03-31")),
  Month     = c("January", "February", "March")
))

Then we create a function that adds 1 to the start date...

add_dates <- function(row) {
  row[1] + 1 # this should increment the date by 1, right?
  typeof(row[1])
}

We apply the function to the tibble and...

> example_data %>% apply(MARGIN = 1, FUN = add_dates)
 Error in row[1] + 1 : non-numeric argument to binary operator

 # gotcha

This happens because apply() converts the tibble to a matrix before passing each row to our add_dates() function as a vector, and matrices and vectors are homogeneous: they can only contain one type. In our tibble, the last column contains strings. That means the data we explicitly created as dates gets converted to strings (without any kind of notification). Then, attempting "2022-01-01" + 1 causes a type error.

Okay, so this will work better, right...? We're explicitly passing only the date columns from the tibble.

> example_data %>% select(StartDate, EndDate) %>%
+   apply(MARGIN = 1, FUN = add_dates)
Error in row[1] + 1 : non-numeric argument to binary operator

# gotcha again

R also converts any complex data types (like dates, factors, etc.) to basic data types (character, numeric, etc.) when it applies the function. So once again, we've tried to add a number to a character string.
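
One way around this, sketched here in base R as an illustration rather than the only fix, is to skip apply() and work on the columns directly, since arithmetic on a Date column is already vectorized. The data frame below mirrors the example data but is rebuilt under a different name (example_df) so the snippet stands alone:

```r
example_df <- data.frame(
  StartDate = as.Date(c("2022-01-01", "2022-01-31", "2022-03-01")),
  EndDate   = as.Date(c("2022-02-01", "2022-02-28", "2022-03-31")),
  Month     = c("January", "February", "March")
)

# What apply() works with: the date columns converted to a character matrix
as_matrix_type <- typeof(as.matrix(example_df[, c("StartDate", "EndDate")]))   # "character"

# Working on the column directly keeps the Date class
next_day       <- example_df$StartDate + 1
next_day_class <- class(next_day)     # "Date"
next_day_text  <- format(next_day)    # "2022-01-02" "2022-02-01" "2022-03-02"

# Row-by-row logic across two columns can also stay vectorized
days_between <- as.numeric(example_df$EndDate - example_df$StartDate)   # 31 28 30
```

The design choice is simply to keep each column in its own vector, so no column ever has to share a type with its neighbors.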

Another example, this time with vectors:

example_vec <- c(1,2,3)
sum(example_vec) # nothing unusual here

[1] 6

...But adding a string to the vector coerces the whole thing to strings and ruins our summing operation.

> sum(c(example_vec, "*"))
Error in sum(c(example_vec, "*")) :
  invalid 'type' (character) of argument
> typeof(c(example_vec, "*"))
[1] "character"
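
If a stray string has already turned a vector into characters, explicit coercion can recover the numbers. A minimal sketch, assuming the non-numeric entries should simply be identified and left out (mixed_vec and the other names are illustrative):

```r
mixed_vec <- c(1, 2, 3, "*")   # coerced to character, as shown above

# Convert back explicitly; "*" cannot be parsed, so it becomes NA (with a warning)
recovered <- suppressWarnings(as.numeric(mixed_vec))

# Find the entries that failed to convert, then sum the rest
bad_entries   <- mixed_vec[is.na(recovered)]       # "*"
recovered_sum <- sum(recovered[!is.na(recovered)]) # 6
```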

Something similar happens with NA values (Not Available, a special value in R that means “no data here”). NA doesn't change the vector's type, but it obliterates the result when you use aggregate functions.

> sum(c(example_vec, NA))
[1] NA

> typeof(c(example_vec, NA))
[1] "double"

# still numeric though
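
Most base R aggregate functions accept an na.rm argument for exactly this situation. A brief sketch (with_na and the other names are illustrative):

```r
example_vec <- c(1, 2, 3)
with_na <- c(example_vec, NA)

# A bare NA is logical, which is why it coerces quietly to double here
na_type <- typeof(NA)   # "logical"

# na.rm = TRUE drops the missing values before aggregating
sum_ignoring_na  <- sum(with_na, na.rm = TRUE)    # 6
mean_ignoring_na <- mean(with_na, na.rm = TRUE)   # 2

# Counting the missing values first shows how much data is being dropped
na_count <- sum(is.na(with_na))   # 1
```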

As you can see, mixing types in a vector, or applying a function across the rows of a data frame, can silently coerce your data. Worse, if you're not aware of what's going on, the resulting errors can take forever to debug. Fortunately, as long as you keep in mind R's particular way of handling data types, a solution usually isn't too far away.
