I have a data frame with a date column that I need to convert into a format R recognizes as a date.
> dataframe
Date Sum
1 06/09/15 2.51
2 06/09/15 3.75
3 06/09/15 3.50
...
I first converted it using sapply:
> dataframe$Date2<-sapply(dataframe$Date,as.Date,format="%m/%d/%y")
This returned the date as the number of days from Jan 1, 1970:
> dataframe
Date Sum Date2
1 06/09/15 2.51 16595
2 06/09/15 3.75 16595
3 06/09/15 3.50 16595
...
Later on I tried converting it without sapply:
> dataframe$Date3<-as.Date(dataframe$Date,format="%m/%d/%y")
This, in turn, returned
> dataframe
Date Sum Date2 Date3
1 06/09/15 2.51 16595 2015-06-09
2 06/09/15 3.75 16595 2015-06-09
3 06/09/15 3.50 16595 2015-06-09
...
These are two very different, apparently incompatible formats. Why does sapply return one format (days since the origin), while doing without it returns another (%Y-%m-%d)?
Now, obviously I could just ignore one method and go forth never using sapply with as.Date, but I'd like to know why it reads differently. I am also struggling to convert the Date3 vector into the Date2 format.
Thus, I have two questions:
Why does sapply provide a different date format?
How do I convert a date-recognizable sequence (such as mm/dd/yyyy) into the number of days since 1 Jan 1970?
Answers
Here is an answer to the second part of your original question. To obtain the number of days since the epoch (1 Jan 1970) for a date in the format mm/dd/yyyy, you can parse it with as.Date() and then call unclass() on the result:
some.date <- as.Date("06/17/2015", "%m/%d/%Y")
days.since.epoch <- unclass(some.date)
> days.since.epoch
[1] 16603
Internally, R stores the date object some.date in terms of the number of days since the epoch (1 Jan 1970), and calling unclass() reveals this internal representation.
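Applied to the data frame from the question, this might look like the minimal sketch below. The data frame is a hypothetical reconstruction of the three rows shown, and the sapply() line is included only to illustrate what is probably behind the first question: sapply() simplifies its result to a plain vector, which drops the "Date" class and leaves the stored day counts.

```r
# Hypothetical data frame mirroring the three rows shown in the question.
dataframe <- data.frame(
  Date = c("06/09/15", "06/09/15", "06/09/15"),
  Sum  = c(2.51, 3.75, 3.50),
  stringsAsFactors = FALSE
)

# as.Date() is vectorized, so sapply() is not needed; the column keeps class "Date".
dataframe$Date3 <- as.Date(dataframe$Date, format = "%m/%d/%y")

# as.numeric() (like unclass()) exposes the stored day counts.
dataframe$Date2 <- as.numeric(dataframe$Date3)

# sapply() simplifies to a plain vector, so the "Date" class attribute is lost
# and only the underlying numbers remain.
via_sapply <- sapply(dataframe$Date, as.Date, format = "%m/%d/%y",
                     USE.NAMES = FALSE)

# Going back from day counts to dates requires stating the origin.
restored <- as.Date(dataframe$Date2, origin = "1970-01-01")

print(dataframe)
print(class(via_sapply))  # a plain numeric vector, not "Date"
```

So Date2 and Date3 hold the same information; Date3 simply carries the class that tells R to print it as a date.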
Answers
When working with dates I love to use lubridate, as it is in my eyes much easier to use and much more intuitive than the base functions.
Your second question could be done with the following code:
require(lubridate)
dataframe$Date2<-difftime(dataframe$Date3,dmy("01-01-1970"),units="days")
Depending on whether you want 1 January 1970 to count as day 1 or not, you may have to add + 1 to the end of this line.
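To check the day-0 versus day-1 point, here is a small sketch of the same calculation in base R. It assumes as.Date("1970-01-01") as a stand-in for dmy("01-01-1970") so that it runs without lubridate, and the two sample dates are only illustrative.

```r
# Base R stand-in for the lubridate call: as.Date() replaces dmy() here.
epoch <- as.Date("1970-01-01")
dates <- as.Date(c("06/09/15", "06/17/15"), format = "%m/%d/%y")

# difftime() returns a "difftime" object; as.numeric() reduces it to plain numbers.
days_since_epoch <- as.numeric(difftime(dates, epoch, units = "days"))

# The epoch itself comes out as 0, so add 1 only if 1 Jan 1970 should be day 1.
epoch_offset <- as.numeric(difftime(epoch, epoch, units = "days"))

print(days_since_epoch)
print(epoch_offset)
```

Without the + 1, the result agrees with what unclass() or as.numeric() returns for a Date.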
I don't really work with sapply and tapply directly (I prefer to use plyr for this) so I can't help with your first question.