Missing data

Standards exist for representing missing data, so we’ll start there.

1 The functions testing for NA and NaN

The two most important functions you need to understand are is.na(x) (“is x missing a value?”) and is.nan(x) (“is x an undefined number?”).

2 Examples

Consider the following examples. We’ll start with what we hope are obvious cases:

is.na(3)
[1] FALSE
is.na(NA)
[1] TRUE

We would certainly hope that R recognizes that 3 has a value and that NA does not. Whew.

Okay, now a trickier pair of examples:

is.na(TRUE)
[1] FALSE
is.na(FALSE)
[1] FALSE

Okay, even though FALSE is the absence of truth, it is not the absence of value! So is.na(FALSE) must be FALSE.

What does is.na() think about undefined numbers?

is.na(0/0)
[1] TRUE
is.na(NaN)
[1] TRUE

Well, R’s position is that something that has an undefined numeric value does not have a value, so is.na() returns TRUE for NaN as well as for NA. That seems right…you just have to think carefully about it.

Now let’s consider a few cases with is.nan() (not a number) that should be straightforward to interpret:

is.nan("a")
[1] FALSE
is.nan(3.4)
[1] FALSE
is.nan(0/0)
[1] TRUE
is.nan(NaN)
[1] TRUE

Nothing too surprising here. When testing whether the argument is an undefined number, both "a" and 3.4 return FALSE (because they are not undefined numbers), while both 0/0 and NaN return TRUE (because they are undefined numbers). That is as it should be.

Let’s look at what R’s position is on the question “is NA an undefined number?”:

is.nan(NA)
[1] FALSE

So, R’s position is that NA (a missing value) is not an undefined number. The relationship runs one way only: is.na() treats NaN as missing, but is.nan() does not treat NA as an undefined number.
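
The one-way relationship between the two tests is easier to see on a whole vector. The following is a minimal sketch; the vector x and the variable names are made up for illustration and do not come from any particular data set. Because is.na() flags both NA and NaN, combining it with !is.nan() is one way to pick out the "plain" NA values only.

```r
# Hypothetical example vector mixing ordinary numbers, NA, and NaN
x <- c(1, NA, 3, NaN, 5)

# is.na() is TRUE for both NA and NaN
missing_any <- is.na(x)                  # FALSE TRUE FALSE TRUE FALSE

# is.nan() is TRUE only for NaN
undefined <- is.nan(x)                   # FALSE FALSE FALSE TRUE FALSE

# TRUE only where the value is NA but not NaN
missing_only <- is.na(x) & !is.nan(x)    # FALSE TRUE FALSE FALSE FALSE

print(missing_any)
print(undefined)
print(missing_only)
```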

Now, sometimes you want to test for the opposite of these two functions. You might want to know the answers to the questions “is data available?” and “is the data anything other than an undefined number?” R does not have to define new operators to accomplish this; you can simply use the not operator (!):

  • !is.na(x): “does x have a value?”
  • !is.nan(x): “is x anything other than an undefined number?”
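
A common use of the negated tests is to keep only the usable elements of a vector. The sketch below assumes a small made-up vector x; the variable names are illustrative only. Note that filtering with !is.nan() still leaves NA in place, since NA is not an undefined number.

```r
# Hypothetical example vector
x <- c(2, NA, 4, NaN, 6)

# Keep only elements that have a value (drops both NA and NaN)
has_value <- x[!is.na(x)]        # 2 4 6

# Keep everything except undefined numbers (NA is kept)
not_nan <- x[!is.nan(x)]         # 2 NA 4 6

# Count how many values are available
n_available <- sum(!is.na(x))    # 3

print(has_value)
print(not_nan)
print(n_available)
```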