Handling dates

Handling dates and times is probably one of the most complicated parts of data analysis. They are simply messy. Fortunately, R has many useful date and time functions that make the job easier, but it is still not straightforward.

The honda_long data has a Month column and a Year column. The original data was evidently set up for analysis on a calendar-year basis, with less emphasis on, for example, a rolling 5-year analysis.

If calendar-year analysis were all that we had to do, then this data structure would be perfectly reasonable, and we could stop right now.

However, the conversion to R-based analysis is a good opportunity to keep calendar-year analysis available while also opening up analysis based on arbitrary time blocks.

This means that we are going to have to change from Month and Year columns to a single date column (that we’re going to call TheDate). Fortunately, R has tools to support this transformation.

Clearly, knowing the month and the year is not enough to fully specify a single date. We are going to get around this by assuming that we are dealing with the 1st of every month. We are going to make this conversion in two steps (shown below):

  1. Convert Month and Year to a string with the format Month-01-Year, where Month is the three-letter abbreviation for the month and Year is the full four-digit year (e.g., Mar-01-2015).
  2. Convert that date string (e.g., Mar-01-2015) to an R date (class Date). In this second step we have to supply a format string (%b-%d-%Y) that spells out the layout of the date string: %b is the abbreviated month name, %d is the day of the month, and %Y is the four-digit year.
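As a quick check of the format string before applying it to a whole column, here is a minimal sketch that parses a single value. The variable name one_date is ours, purely for illustration. One caveat: %b is matched against the month abbreviations of the current locale, so this sketch assumes an English (or C) locale. In another locale the result may be NA.

```r
# Parse one string laid out as Month-01-Year.
# %b = abbreviated month name, %d = day of month, %Y = four-digit year.
# Assumption: the session locale uses English month abbreviations.
one_date <- as.Date("Mar-01-2015", format = "%b-%d-%Y")

# The result is a Date object rather than a character string.
class(one_date)

# Dates print in ISO 8601 order (year-month-day) by default.
format(one_date, "%Y-%m-%d")
```

If the parsed value comes back as NA, the format string (or the locale) does not match the input, which is worth catching before converting an entire column.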

Here are the two commands:

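# Step 1: build a string such as "Mar-01-2015" from Month and Year
honda_long$TheDate <- 
  paste(honda_long$Month, 
        "-01-", 
        honda_long$Year,
        sep = "")
# Step 2: parse that string into a Date using the matching format
honda_long$TheDate <-
  as.Date(honda_long$TheDate, 
          format = "%b-%d-%Y")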
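Because %b depends on the session locale, it can be useful to know an alternative that does not. The sketch below is not the approach used in this text. It is a variant that looks up each month in month.abb, base R's built-in vector of English month abbreviations, and builds an ISO-style string instead. The toy data frame and the name month_number are assumptions made for illustration, with made-up values.

```r
# Toy stand-in for honda_long; the values are made up for illustration.
toy <- data.frame(
  Year  = c(2013L, 2013L, 2014L),
  Month = factor(c("Jan", "Feb", "Mar"), levels = month.abb)
)

# month.abb holds "Jan", "Feb", ..., "Dec" regardless of locale,
# so match() returns the month number (1 to 12) for each label.
month_number <- match(as.character(toy$Month), month.abb)

# Build a YYYY-MM-DD string, which as.Date() parses without a format argument.
toy$TheDate <- as.Date(sprintf("%d-%02d-01", toy$Year, month_number))

# An NA here would signal a month label that is not in month.abb.
stopifnot(!anyNA(toy$TheDate))
toy
```

The trade-off is an extra lookup step in exchange for results that do not change with the machine's language settings.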

Let’s take a look at the data. Note that we have kept the Month and Year columns only so that you can verify that TheDate holds the appropriate value:

head(honda_long)
# A tibble: 6 × 5
   Year Area  Month  Sold TheDate   
  <int> <fct> <fct> <int> <date>    
1  2013 Japan Jan   58772 2013-01-01
2  2013 Japan Feb   60392 2013-02-01
3  2013 Japan Mar   61666 2013-03-01
4  2013 Japan Apr   57058 2013-04-01
5  2013 Japan May   53400 2013-05-01
6  2013 Japan Jun   59427 2013-06-01

The honda_long data frame is now ready to support a wide variety of analyses, many more than were possible when we started with honda_data.
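To illustrate what a single date column makes possible, here is a minimal sketch that totals sales over an arbitrary 12-month block that straddles two calendar years, and then recovers the calendar-year totals from the same column. It uses a toy data frame with made-up Sold values rather than the real honda_long data, and names such as block_start and block_end are ours.

```r
# Toy stand-in for honda_long; the Sold values are made up for illustration.
toy <- data.frame(
  TheDate = seq(as.Date("2013-01-01"), as.Date("2014-12-01"), by = "month"),
  Sold    = seq(100L, by = 10L, length.out = 24)
)

# An arbitrary 12-month block: July 2013 through June 2014.
block_start <- as.Date("2013-07-01")
block_end   <- as.Date("2014-06-01")

# Dates compare like numbers, so a block is just a logical filter.
in_block    <- toy$TheDate >= block_start & toy$TheDate <= block_end
block_total <- sum(toy$Sold[in_block])

# The calendar-year view is still available: extract the year and sum by it.
yearly_total <- tapply(toy$Sold, format(toy$TheDate, "%Y"), sum)

block_total
yearly_total
```

With separate Month and Year columns, the same block would need a compound condition on both columns. With TheDate it is a single range comparison.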