Handling dates
Handling dates and times is probably the most complicated part of data analysis. They are simply messy. Fortunately, R has many useful date and time functions that make the job easier, but it is still not straightforward.
The honda_long data has a Month column and a Year column. It’s obvious that the original data was set up so that analysis could be done on a calendar-year basis, with less emphasis on, for example, a rolling 5-year analysis.
If that yearly type of analysis is all that we would have to do, then this data structure is perfectly reasonable, and we could absolutely stop right now.
However, it’s entirely possible that the conversion to R-based analysis would be the perfect opportunity to still enable a calendar-year analysis while simplifying and opening up the possibilities for analysis based on arbitrary time blocks.
This means that we are going to have to change from Month and Year columns to a single date column (that we’re going to call TheDate). Fortunately, R has tools to support this transformation.
Clearly, knowing the month and the year is not enough to fully specify a single date. We are going to get around this by assuming that we are dealing with the 1st of every month. We are going to make this conversion in two steps (shown below):
- Convert Month and Year to a string with the format Month-01-Year, where Month takes the three-character abbreviation for the month and Year takes the full four-digit representation of the year (e.g., Mar-01-2015).
- Convert the above date format (e.g., Mar-01-2015) to an R-based date representation. In this second statement we have to use a format string (%b-%d-%Y) to spell out the representation of that date string.
Here are the two commands:
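A minimal sketch of the two steps follows. It assumes that dplyr is loaded, that Month is a factor of three-letter English abbreviations, and that Year is an integer, as the printed tibble suggests. The three-row stand-in for honda_long is built here only so the snippet runs on its own; the real data frame would be used in practice.

```r
library(dplyr)

# Stand-in for honda_long (assumption: same column names and types as the
# printed tibble; values copied from its first three rows).
honda_long <- tibble(
  Year  = c(2013L, 2013L, 2013L),
  Area  = factor(c("Japan", "Japan", "Japan")),
  Month = factor(c("Jan", "Feb", "Mar"), levels = month.abb),
  Sold  = c(58772L, 60392L, 61666L)
)

# Step 1: build a string such as "Mar-01-2013" (always the 1st of the month).
honda_long <- honda_long %>%
  mutate(TheDate = paste(Month, "01", Year, sep = "-"))

# Step 2: parse that string into a Date using the matching format string.
# Note: %b is locale-dependent; this assumes English month abbreviations.
honda_long <- honda_long %>%
  mutate(TheDate = as.Date(TheDate, format = "%b-%d-%Y"))
```

Overwriting TheDate in the second step keeps the result at five columns, with no intermediate string column left behind.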
Let’s take a look at the data. Note that we have kept the Month and Year columns only so that you can verify that TheDate holds the appropriate value:
# A tibble: 6 × 5
Year Area Month Sold TheDate
<int> <fct> <fct> <int> <date>
1 2013 Japan Jan 58772 2013-01-01
2 2013 Japan Feb 60392 2013-02-01
3 2013 Japan Mar 61666 2013-03-01
4 2013 Japan Apr 57058 2013-04-01
5 2013 Japan May 53400 2013-05-01
6 2013 Japan Jun 59427 2013-06-01
The honda_long data frame is now ready to support a wide variety of analyses, many more than were possible when we started with honda_data.
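As one illustration of an arbitrary time block, the sketch below totals sales between two dates chosen only for this example. The honda_sample name is hypothetical; its values are the six rows printed above, and dplyr is assumed to be loaded.

```r
library(dplyr)

# Hypothetical stand-in holding the six printed rows (Japan, Jan-Jun 2013).
honda_sample <- tibble(
  Area    = "Japan",
  Sold    = c(58772L, 60392L, 61666L, 57058L, 53400L, 59427L),
  TheDate = seq(as.Date("2013-01-01"), by = "month", length.out = 6)
)

# Total units sold in an arbitrary window: February through April 2013.
block_total <- honda_sample %>%
  filter(TheDate >= as.Date("2013-02-01"),
         TheDate <= as.Date("2013-04-01")) %>%
  summarise(Sold = sum(Sold))
```

Because TheDate is a true Date column, the window boundaries can be any pair of dates and do not need to line up with calendar years.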