2-06: Datetime objects


1 Purpose

  • Convert a string into a Date or Datetime object

  • Reformat Date and Datetime objects

2 Material

The script for the lesson is here

The data for the lesson: dateTimeData.csv

Warning: do not save this data file using Excel. Excel will edit the Date columns and break the lesson script.

3 Two new variable types: Date and POSIXct

We have mostly worked with two variable types in R: numeric and string (also called character, or a string of characters).  Numeric values always appear outside of quotes and string values always appear in quotes. Note: if a numeric value is in quotes, it is treated as a string.

 

In this lesson we introduce two more variable types: Date and POSIXct.  Date, as the name implies, holds information about the date whereas POSIXct holds information about both the date and the time.

 

POSIX is a family of standards for UNIX-like operating systems that was first published in the 1980s.  ct means Calendar Time.  So POSIXct is the UNIX standard for calendar time, and we continue to use this standard to this day (and maintain the unintuitive name).

3.1 Reading datetime objects from a dataframe

Let’s open the data file and save the contents to a data frame called dateTimeData.

  dateTimeData = read.csv("data/dateTimeData.csv");

The first six columns in dateTimeData (Figure 1) have values that look like dates and times but R still sees these columns as strings (chr).

 

In other words, R does not automatically assign values that look like dates and times to variable types Date or POSIXct.  R only sees quotes and assigns the values to a string variable.  We need to manually convert the string columns into Date or POSIXct variables.

Figure 1: R sees the columns with date and time values as strings (chr)
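
The same behavior can be seen without the lesson file. The following is a minimal sketch that assumes R 4.0 or later (where read.csv() keeps text as chr) and uses a tiny made-up data set; smallData, csvText, and the temp column are hypothetical names used only for illustration.

```r
# A tiny stand-in for dateTimeData (hypothetical values, not the lesson file)
csvText = 'date1,temp
"Apr 15, 2022",54
"Apr 17, 2022",61';

# read.csv() can read from a string when you use the text argument
smallData = read.csv(text=csvText);

class(smallData$date1);   # "character": the date column is read as a string
class(smallData$temp);    # "integer": unquoted numbers are read as numbers
```

Nothing in the date1 values tells read.csv() that they are dates, so the column stays a string until we convert it.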

3.2 Using as.Date()

We are going to start by converting strings that hold only date values to Date variables, and later in the lesson we will convert strings that have time values to POSIXct variables.  The process is very similar for both Date and POSIXct.

 

as.Date() is the function that reads in a string and converts it to a Date variable.  However, if you pass a string column (in this case, date3 from dateTimeData) into as.Date(), you will get the following error:

> as.Date(dateTimeData$date3)
Error in charToDate(x) : 
  character string is not in a standard unambiguous format

This error occurs because dates can come in many different formats (e.g., April 15, 2022; 2022-04-15; 4/15/22).  When you do not give a format, as.Date() only tries two formats, year-month-day separated by dashes (2022-04-15) or by slashes (2022/04/15), and it gives you an error if the first value matches neither.

 

Even when as.Date() figures out a format, it is often the wrong format. In the following example, as.Date() assumes the first digits in the string represent the year even though, visually, the year is obviously the last four digits.

> as.Date("12-10-2022")
[1] "0012-10-20"
Figure 2: as.Date() thinks 12-10-2022 is October 20 in the year 12
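
If you want to see the error without stopping your script, one option is to wrap the call in tryCatch(). This is only a sketch: the variable names and the message text are made up for illustration.

```r
# A string in year-month-day order works without a format argument
okDate = as.Date("2022-04-15");

# tryCatch() runs the first argument and, if there is an error,
# returns the value from the error function instead of stopping
result = tryCatch(as.Date("Apr 15, 2022"),
                  error = function(e) { "could not guess the format" });

okDate;   # "2022-04-15"
result;   # "could not guess the format"
```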

3.3 The format argument in as.Date()

The solution is to explicitly tell as.Date() the format of the date values using the format argument.

Figure 3: The as.Date() function with the format argument

First we need to break down the date values into their component parts. 

Let’s start with the first column (Figure 1) where the dates look like this: Apr 15, 2022

 

We need to explicitly tell as.Date() how the date is formatted and this means breaking down every component of the string.

 

Broken down, each cell in the column date1 has:

  • the abbreviation for the month  (e.g., Apr)

  • a space

  • the day of the month (e.g., 15)

  • a comma and a space

  • the 4-digit year (e.g., 2022)

4 Formatting the date

Next, we need to translate the above information into a language that as.Date() understands using the format argument.

 

The official term for the representation of the different parts of the datetime string is conversion specifications, which you can find under the details section here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/strptime.html

Figure 4: The beginning of the list of conversion specifications

The conversion specifications allow you to generalize the different components of the datetime – and they all start with a %.  When you declare the format of a datetime string, you replace the date and time components of the string with conversion specifications.  For instance, Apr 15, 2022 contains the abbreviated month so we know %b will be in the specification.

 

Some other specifications:

  • %m: Month – given as a number between 01 and 12

  • %M: Minutes – given as a number between 00 and 59

  • %y: Last two digits of the year

  • %Y: Full year
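
A quick way to see what each of these specifications stands for is to apply them one at a time to a known date-time. The sketch below uses format() (covered in section 5) on a single made-up value; exampleTime is a hypothetical name, and tz="UTC" is set only so the result does not depend on your computer's time zone.

```r
# One fixed date-time to experiment with
exampleTime = as.POSIXct("2022-04-15 21:42:00", tz="UTC");

format(exampleTime, format="%m");   # "04"   (month as a number)
format(exampleTime, format="%M");   # "42"   (minutes)
format(exampleTime, format="%y");   # "22"   (2-digit year)
format(exampleTime, format="%Y");   # "2022" (4-digit year)
```

Note that upper and lower case matter: %m and %M mean completely different things.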

4.1 Using conversion specifications

Let’s take a string that represents a date, “Apr 15, 2022”, and replace each date component with a conversion specification.

 

Broken down, each cell in the column date1 has:

  • the abbreviation for the month  (%b)

  • a space

  • the day of the month (%d)

  • a comma and a space

  • the 4-digit year (%Y); note: %y would be the 2-digit year

     

The non-datetime characters in the specification (e.g., the spaces and the comma) need to be maintained.

 

Now we have the general format for the dates in the date1 column using conversion specifications and it is:

"%b %d, %Y"

Note: This is still a string value, so we keep the quotes.

 

And we can attach that conversion specification string using the format argument in as.Date():

  stnDate = as.Date(dateTimeData$date1,    # date1 is a chr (string) column
                    format="%b %d, %Y");   # give the format of date1

In the Environment tab, we see that stnDate is a Date variable with 300 values in it.

stnDate Date[1:300], format: "2022-04-15"...

stnDate is shown in the default method R uses to display Date variables, which is 4-digit year, 2-digit month, and 2-digit day.
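
The same conversion can be tried on a few values typed in by hand. This sketch assumes your computer uses English month names (the %b specification depends on the language settings); dateStrings and exampleDates are hypothetical names.

```r
# Made-up values in the same layout as the date1 column
dateStrings = c("Apr 15, 2022", "May 10, 2022", "Aug 06, 2022");

exampleDates = as.Date(dateStrings, format="%b %d, %Y");
class(exampleDates);    # "Date"

# A format that does not match the strings gives NA values, not an error
wrongFormat = as.Date(dateStrings, format="%m/%d/%Y");
is.na(wrongFormat);     # TRUE TRUE TRUE
```

Checking for NA values after a conversion is a simple way to catch a format that does not match the data.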

4.2 Sorting a Date object

Formatting dates properly is vital when you are calculating or plotting date and time data.  When properly formatted, R understands the order of the values and the intervals between them.

 

If we take the original date1 column and sort it, the sorting will be in alphabetical order with April dates followed by August, then July, June, and May (because R sees date1 as a string).

 

Note: to save space, I am only outputting every 10th value.

> sort(dateTimeData$date1[seq(from=1,to=300,by=10)])
 [1] "Apr 15, 2022" "Apr 15, 2022" "Apr 15, 2022"
 [4] "Apr 28, 2022" "Apr 28, 2022" "Apr 28, 2022"
 [7] "Aug 06, 2022" "Aug 06, 2022" "Aug 06, 2022"
[10] "Jul 12, 2022" "Jul 12, 2022" "Jul 12, 2022"
[13] "Jul 24, 2022" "Jul 24, 2022" "Jul 24, 2022"
[16] "Jun 04, 2022" "Jun 04, 2022" "Jun 04, 2022"
[19] "Jun 17, 2022" "Jun 17, 2022" "Jun 17, 2022"
[22] "Jun 29, 2022" "Jun 29, 2022" "Jun 29, 2022"
[25] "May 10, 2022" "May 10, 2022" "May 10, 2022"
[28] "May 23, 2022" "May 23, 2022" "May 23, 2022"

But, if we sort stnDate, a Date object, we get the values in order of the date:

> sort(stnDate[seq(from=1,to=300,by=10)])
 [1] "2022-04-15" "2022-04-15" "2022-04-15"
 [4] "2022-04-28" "2022-04-28" "2022-04-28"
 [7] "2022-05-10" "2022-05-10" "2022-05-10"
[10] "2022-05-23" "2022-05-23" "2022-05-23"
[13] "2022-06-04" "2022-06-04" "2022-06-04"
[16] "2022-06-17" "2022-06-17" "2022-06-17"
[19] "2022-06-29" "2022-06-29" "2022-06-29"
[22] "2022-07-12" "2022-07-12" "2022-07-12"
[25] "2022-07-24" "2022-07-24" "2022-07-24"
[28] "2022-08-06" "2022-08-06" "2022-08-06"

You can also perform addition and subtraction on Date objects to add or subtract days:

> stnDate[1:5]
[1] "2022-04-15" "2022-04-17" "2022-04-18" "2022-04-19" "2022-04-20"
> stnDate[1:5]-4
[1] "2022-04-11" "2022-04-13" "2022-04-14" "2022-04-15" "2022-04-16"
> stnDate[1:5]+3
[1] "2022-04-18" "2022-04-20" "2022-04-21" "2022-04-22" "2022-04-23"
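
Because R understands the intervals between Date values, you can also compare two dates or subtract one from the other. A small sketch with made-up variable names:

```r
startDate = as.Date("2022-04-15", format="%Y-%m-%d");
endDate   = as.Date("2022-08-06", format="%Y-%m-%d");

startDate < endDate;                 # TRUE: April 15 comes before August 6

daysBetween = endDate - startDate;   # the result is a difftime object
as.numeric(daysBetween);             # 113 (the number of days between them)
```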

Extension: How Date and POSIXct objects tell time

5 Reformatting Dates

Once you have date values properly formatted and saved as a Date object, you can then reformat the date to customize the output using format().

 

For instance, you can convert stnDate to the month/day/2-digit year format that is more familiar to Americans:

date_formatted = format(stnDate, format="%m/%d/%y");  

Or, you can get information from the dates, like the day of the week (%A):

date_weekOfDay = format(stnDate, format="%A");      
date_formatted: chr[1:300] "04/15/22" "04/17/22"...
date_weekOfDay: chr[1:300] "Friday" "Sunday"...

When you reformat, the object is no longer a Date object.  The reformatted object is a string object and behaves as a string object.  This is awkward behavior in R: there is no way to create a Date variable with a customized format.  Instead, you use the Date variable in plots or calculations and then reformat afterwards when you want to display the values.
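
The following sketch shows why this matters, using two made-up dates (twoDates and asText are hypothetical names). Once the dates are reformatted they sort as text, not as dates.

```r
twoDates = as.Date(c("2022-04-15", "2021-12-01"));
asText   = format(twoDates, format="%m/%d/%y");

class(twoDates);   # "Date"
class(asText);     # "character"

sort(twoDates);    # "2021-12-01" "2022-04-15"  (true date order)
sort(asText);      # "04/15/22" "12/01/21"      (text order, 2022 before 2021)
```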

6 POSIXct values

Date variables and the corresponding as.Date() function work for values with only dates in them.  Values that have dates and times (or just times) need to be saved as POSIXct variables and the corresponding function is as.POSIXct().

 

When converting strings, the main difference between Date and POSIXct variables is that there are a lot more conversion specifications that can be used for POSIXct (all of the specifications used for dates plus all of the specifications used for time). Otherwise, as.Date() and as.POSIXct() are used the same way.

6.1 POSIXct example

Let’s break down the dateTime1 column in dateTimeData, which has values that look like this:  2022-04-15 09:42PM

 

There is:

  • the 4-digit year (e.g., 2022)

  • a dash

  • Month as a number (e.g., 04)

  • a dash

  • the day of the month (e.g., 15)

  • a space

  • Hour in 12-hour time (e.g., 09)

  • a colon

  • minutes (e.g., 42)

  • AM/PM indicator (e.g., PM)

     

Using the conversion specifications, this becomes:

  • the 4-digit year (%Y)

  • a dash

  • Month as a number (%m)

  • a dash

  • the day of the month (%d)

  • a space

  • Hour in 12-hour time (%I)

  • a colon

  • minutes (%M)

  • AM/PM indicator (%p)

     

So, the general format of all the values in dateTime1 is:

"%Y-%m-%d %I:%M%p"

Once again, it is really important to maintain every non-datetime component (e.g., spaces, dashes, commas).  The conversion specification is an instruction to a datetime function (as.Date or as.POSIXct) that gives the exact format of the string.  If the format is off even by a little, the function will likely not produce correct datetimes.
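
Here is a sketch of what can go wrong, using one made-up value. It assumes an English-language setup (the %p specification depends on the language settings) and sets tz="UTC" only to keep the time zone out of the picture; the variable names are hypothetical.

```r
oneValue = "2022-04-15 09:42PM";

# The format matches the string exactly
goodTime = as.POSIXct(oneValue, format="%Y-%m-%d %I:%M%p", tz="UTC");
format(goodTime, format="%H:%M");    # "21:42"

# Slashes instead of dashes: the format does not match, so we get NA
badSeparator = as.POSIXct(oneValue, format="%Y/%m/%d %I:%M%p", tz="UTC");
is.na(badSeparator);                 # TRUE

# %p left off: the trailing PM is ignored, so the hour is read as 9 in the
# morning and there is no error or warning to tell you something is wrong
missingPM = as.POSIXct(oneValue, format="%Y-%m-%d %I:%M", tz="UTC");
```

The last case is the dangerous one because you get a value that looks reasonable but is off by 12 hours.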

6.2 Creating a POSIXct variable

We can create a POSIXct variable using as.POSIXct and the datetime conversion specification above:

  stnDateTime = as.POSIXct(dateTimeData$dateTime1,
                           format="%Y-%m-%d %I:%M%p");

stnDateTime is a POSIXct object and R’s default method for displaying POSIXct values is the same as for Date, followed by a 24-hour time using colons.

stnDateTime: POSIXct[1:300], format "2022-04-15 21:42:00"

Like a Date object, we can reformat a POSIXct object using format():

dateTime_formatted = format(stnDateTime, format="%m-%d-%y %H%M");

Or, pull some information out of it (in this case, the abbreviated day of the week %a):

dateTime_weekOfDay = format(stnDateTime, format="%a");
dateTime_formatted: chr[1:300] "04-15-22 2142" "04-17-22 0342"...
dateTime_weekOfDay: chr[1:300] "Fri" "Sun" "Mon"...

6.3 Addition and Subtraction on POSIXct

When we add to or subtract from a Date object, each unit is 1 day.  So, adding 5 to a Date object adds 5 days.

 

When we add to or subtract from a POSIXct object, each unit is 1 second.  So, adding 5 to a POSIXct object adds 5 seconds.

> stnDateTime[1:5]
[1] "2022-04-15 21:42:00 EDT"
[2] "2022-04-17 03:42:00 EDT"
[3] "2022-04-18 09:42:00 EDT"
[4] "2022-04-19 15:42:00 EDT"
[5] "2022-04-20 21:42:00 EDT"
> stnDateTime[1:5] + 34
[1] "2022-04-15 21:42:34 EDT"
[2] "2022-04-17 03:42:34 EDT"
[3] "2022-04-18 09:42:34 EDT"
[4] "2022-04-19 15:42:34 EDT"
[5] "2022-04-20 21:42:34 EDT"
> stnDateTime[1:5] - 12
[1] "2022-04-15 21:41:48 EDT"
[2] "2022-04-17 03:41:48 EDT"
[3] "2022-04-18 09:41:48 EDT"
[4] "2022-04-19 15:41:48 EDT"
[5] "2022-04-20 21:41:48 EDT"
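
Since each unit is 1 second, larger steps are just multiples of seconds. A minimal sketch with made-up variable names (tz="UTC" is used so the results do not depend on your time zone):

```r
startTime = as.POSIXct("2022-04-15 21:42:00", tz="UTC");

startTime + 60;          # 1 minute later: "2022-04-15 21:43:00 UTC"
startTime + 60*60;       # 1 hour later:   "2022-04-15 22:42:00 UTC"
startTime + 60*60*24;    # 1 day later:    "2022-04-16 21:42:00 UTC"

# difftime() gives the interval between two POSIXct values in the units you ask for
endTime = as.POSIXct("2022-04-17 03:42:00", tz="UTC");
hoursBetween = difftime(endTime, startTime, units="hours");
as.numeric(hoursBetween);   # 30
```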

Extension: How Date and POSIXct objects tell time

7 Adding vectors to the data frame

We can add the reformatted vectors we created to the data frame:

  dateTimeData$date_ref = date_formatted;
  dateTimeData$weekOfDay = dateTime_weekOfDay;
Figure 5: Adding two of the reformatted datetime vectors to the data frame
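
You are not limited to adding the reformatted (string) vectors. The sketch below, which uses a small made-up data frame and assumes English month names for %b, adds both a reformatted column and the Date object itself; smallData, smallDates, and dateObj are hypothetical names.

```r
smallData = data.frame(date1 = c("Apr 15, 2022", "Apr 17, 2022"));

smallDates = as.Date(smallData$date1, format="%b %d, %Y");

smallData$date_ref = format(smallDates, format="%m/%d/%y");  # a chr column
smallData$dateObj  = smallDates;                             # a Date column

str(smallData);   # shows the type of each column
```

Keeping the Date column in the data frame means you can still sort, plot, and do calculations with it later.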

8 Application

  1. Create a properly formatted Date object from the dateTimeData columns date2 and date3

  2. Create a properly formatted POSIXct object from the dateTimeData columns dateTime2 and dateTime3

  3. Create a vector that has the dates in this format: 15-April, 2022

    • add this vector to a column named date_formatted in dateTimeData
  4. Create a vector that has the date-times in this format: 09:36pm on Fri 04/15/22 

    • add this vector to a column named dateTime_formatted in dateTimeData
  5. Create a vector that has date-times given by number of seconds since the epoch 

    • the epoch is Jan 1, 1970 at midnight GMT – basically where UNIX starts counting time (you do not need to know the exact time…)

    • you only need to use one conversion specification (i.e., one %_)

    • add this vector to a column named epoch in dateTimeData

 

Save the script as app2-06.r in your scripts folder and email your Project Folder to Charlie Belinsky at belinsky@msu.edu.

 

Instructions for zipping the Project Folder are here.

 

If you have any questions regarding this application, feel free to email them to Charlie Belinsky at belinsky@msu.edu.

8.1 Questions to answer

Answer the following in comments inside your application script:

  1. What was your level of comfort with the lesson/application?

  2. What areas of the lesson/application confused you or still confuse you?

  3. What are some things you would like to know more about that are related to, but not covered in, this lesson?

9 Extension: How Date and POSIXct objects tell time

Behind the scenes, Date and POSIXct objects store datetimes as a single number.

 

The zero-point for Date is Jan 1, 1970 (stored as 0) and R adds 1 for every day after:

  • Jan 2, 1970 = 1

  • Jan 3, 1970  = 2

  • Jan 30, 1970 = 29

  • Jan 1, 1971 = 365

 

If you want to go earlier than Jan 1, 1970, you subtract 1 for every day before, so:

  • Dec 31, 1969 = -1

  • Dec 30, 1969 = -2

     

The zero-point for POSIXct is Jan 1, 1970 at midnight Greenwich Mean Time.  To get any other time, you add 1 for every second after or subtract 1 for every second before.  This is called epoch time and the current epoch time (as of this writing) is 1749475799, which means there have been 1749475799 seconds since Jan 1, 1970 at midnight GMT.

 

as.Date() and as.POSIXct() take string values with conversion specifications and create numeric datetime values.  format() does the reverse and takes numeric datetime values and converts them into strings that are readable to the user.  This all works because the epoch time is fixed to a point in time that everyone has agreed upon.  Everything else is a (very complicated) conversion.
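
You can look at the stored numbers yourself with as.numeric(). This is a small sketch with made-up variable names; tz="UTC" is used because UTC is, for our purposes, the same as GMT.

```r
# Date objects are stored as days since Jan 1, 1970
as.numeric(as.Date("1970-01-01"));   # 0
as.numeric(as.Date("1970-01-02"));   # 1
as.numeric(as.Date("1969-12-31"));   # -1
as.numeric(as.Date("2022-04-15"));   # 19097

# POSIXct objects are stored as seconds since Jan 1, 1970 at midnight GMT
as.numeric(as.POSIXct("1970-01-01 00:01:00", tz="UTC"));   # 60

# Going the other way: turn a number back into a Date or POSIXct
backToDate = as.Date(19097, origin="1970-01-01");
backToTime = as.POSIXct(1749475799, origin="1970-01-01", tz="UTC");

format(backToDate);                                   # "2022-04-15"
format(backToTime, format="%Y-%m-%d %H:%M:%S");       # "2025-06-09 13:29:59"
```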

 

If you want to know more about how this system works, here is a video I really like that discusses all the problems with dealing with time zones:

https://www.youtube.com/watch?v=-5wpm-gesOY