Time information is one the most misunderstood types of data. This misunderstanding usually culminates in the question “Should time be stored as local time or as UTC?”, the answer to which is “Neither, it should be stored as time.”
Points in time and their human representation
A point in time is a concept, a written date is the human readable representation of this concept, the name of the point in time. The point in time is the same everywhere but it may be called by different names.The Atlantis space shuttle launched for the last time at “11:29 am, July 8 2011 EDT” which just means the time in Florida when the last space shuttle was launched was 11:29 in the morning. At that time it was only 8:29 in Los Angeles, and someone in Sydney would have to get up at 1:29 in the morning of the 9th to catch the liftoff live.
“11:29 am, July 8 2011 EDT” is just a representation for the time the of the lauch, “01:29 am, July 9 2011 Sydney time” and “3:29 pm, July 9 2011 UTC” are other representations for the same point in time.
The problem is that as humans we are used to dealing with non complete time representations. When someone asks us to call them “at four”, we infer, from context, to mean “today at four in the afternoon local time.” Most of our interactions with time leave some information to context. The most common omitted information is the time-zone, and unfortunately this has been brought over to computer languages.
Time in Computer Programming Languages
Most computer programming languages’ standard libraries offer a type for storing date-time information.The common internal representation used to express time in these types is usually a number of some unit of time since a specific well defined date. Since this date is date zero it is called the epoch for that language/platform.
For example the C/C++ time_t type is defined as the number of seconds since midnight January 1, 1970 (the Unix epoch;) Java’s Date uses the same epoch but the units are milliseconds and .Net’s DateTime stores the number of 100 ns units (called ticks) since midnight January 1 of year 1 CE (the common era epoch.)
Using units since the epoch makes date and time calculations and comparisons simple, but it is decidedly not human readable, which is why standard libraries also provide ways for date types to be converted to and from human readable formats and here is where problems usually start.
In C/C++ the epoch is not well defined (i.e. it does not contain timezone information) and timezone information is also absent from the tm structure which is used to parse time_t variables into the components of date and time representations (month, yesr, hour, etc.mponents. It is almost imposible in C/C++ to do time translations without ending up with time_t variables using epochs in different time zones.
Java’s Date class used to look like a wrapper for C/C++ time functions, this was fixed in JDK 1.1, which obsoleted those functions and added the GregorianCalendar class to handle conversions. Java also specifies the epoch’s timezone as UTC, but since Date does not store timezone information this is merely a convention.
.Net did almost the same blunder in version 1 and 1.1, but where a bit more frank about it and added the DateTimeKind enumeration which marks a DateTime as UTC i.e. number of ticks since UTC epoch, local i.e number ticks since the epoch in the timezone configured into the computer; or undefined i.e. timezone is unknown.
It is not possible to reliably compare between times with different timezone epochs so a variable specifying a utc time can be compared with any other utc time variable, local time variables can only be compared with local time variables whose value is known to have originated in the same timezone. In all other cases only variables with times more than 24 hours apart can be reliably compared.
How should time be stored?
Twenty years ago data did not travel much, so there was not much concern about implications of using local dates and time in a global setting. In today’s interconnected world this is no longer the case.Now, pun intended, it’s time we revisit the question at the start of this post:
“Should time be stored as local time or as UTC?”
The answer remains the same “Neither, it should be stored as time.”, but to qualify this answer:
There is no UTC time or local time, a point in time is a point in time, UTC and local are just representations. It is possible to use a date/time variable that stores time using a UTC epoch, a local epoch or even an unspecified epoch, but only the first one is a well defined point in time; the second relies on context, namely the timezone configured in the computer; and the third is not properly defined at all.
When using date/time types that do not offer timezone information use UTC time.
The question now is “Is this enough?”
For applications that just need to record when something happens, it is indeed enough.
Timezone only matters when the time needs to be formatted in a human readable representation; at other times, date/time information can live happily as an opaque date/time variable.
There are applications where the local time of the event is important.
When I look at a photo taken on a holiday, I’d like to know that it was taken at two in the afternoon local time.
If the photo in this example contained location information theoretically that would be enough to figure out the local time, but in practice the offset from UTC in time zones changes occasionally and the period when daylight time savings are in effect for a specific timezone changes much more often. This is information that is not readily available, so the best approach is to store the time using a well defined epoch, and the offset from UTC.
For run-time .Net provides the DateTimeOffset data type which stores a local DateTime and its offset from UTC. Java does not provide a class for storing time and timezone information, but provides the SimpleTimeZone class which can be used to convert Date objects to local string representation (remember in Java all Date objects should be UTC.)
For persistent storage, data types which support timezone offset in databases should be used if available.
If everything else fails time can be persisted as a standard string representation that includes timezone offset information such as the full date and time in ISO 8601, but this approach requires parsing the strings before any computations.
No comments:
Post a Comment