12 September 2012

Part 6: T-SQL Implementation of NORMDIST / NORM.S.DIST

This is Part 6 of a set of posts. The other posts are:
Part 1: Introduction
Part 2: Probability Density Function Implementation
Part 3: Cumulative Distribution Function Implementation 1
Part 4: Cumulative Distribution Function Implementation 2
Part 5: Cumulative Distribution Function Implementation 3
Part 6: References and Notes

References and Notes

As stated in Part 1 the implementations presented in this series of posts are my own, however I conducted no original research in their development.
In this part I list the sources I used while preparing this series of posts as well as notes and additional sources of interest.

Main Sources

Wikipedia article on the Normal Distribution - the Wikipedia article was my first stop when looking for information for these series of posts. It contains the basic information and formulas for the PDF and CDF functions, but more important the section about numerical approximations contains a reference to algorithm 26.2.17 from Abramowitz & Stegun (1964) used in Part 4
Milton Abramowitz and Irene Stegun (1964) - Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables - referred in these posts as "Abramowitz & Stegun (1964)"; this book contains a wide array of formulas and approximation in several fields. The Wikipedia article has more information about this work.
The approximations used in Part 3 and Part 4 are taken from this source.
W. J. Cody (1969) - Rational Chebyshev Approximations for the Error Function, Mathematics of Computation, Vol. 23, No. 107 (Jul., 1969), pp. 631-637 - Referred in these posts as "W. J. Cody (1969)", I came accross this paper from one of the pioneers in numerical analysis while looking for Hart (1968) (see Addtional References below) and decided to use his approximation for the "ultimate" implementation in Part 5.

Improvents in Excel 2007 and 2010 are discussed in the following sources:

The white paper in the last reference in particular is very interesting as it contains a primer on common approximation methods.

Additional References

Cecil Hastings, Jr. (1955) - Approximations for Digital Computers - this is the source for the algoritms presented in Abramowitz & Stegun (1964), thought the actual approximations in this book are for the erf function.
Part 1 of this book describes the methods used to develop the approximations.

John F. Hart, et al (1968) - Computer Approximations - this oft sited book contains a wide array of approximations for several functions including highly accurate ones for the erf function.

Part 5: T-SQL Implementation of NORMDIST / NORM.S.DIST

This is Part 5 of a set of posts. The other posts are:
Part 1: Introduction
Part 2: Probability Density Function Implementation
Part 3: Cumulative Distribution Function Implementation 1
Part 4: Cumulative Distribution Function Implementation 2
Part 5: Cumulative Distribution Function Implementation 3
Part 6: References and Notes

Cumulative Distribution Function Implementation 3

The cumulative distribution function has no closed form expression and can only be approximated (see Part 1.)
The following implementation uses a set of rational approximations from W. J. Cody (1969) (see Part 6 for full references.)

-- =============================================
-- Author:    Eli Algranti
-- Description:  Standard Normal Distribution
--        Cumulative Distribution Function
--        using a rational polynomial
--        approximation to erf() from
--        W. J. Cody 1969
-- Copyright:  Eli Algranti (c) 2012
--
-- This code is licensed under the Microsoft Public
-- License (Ms-Pl) (http://tsqlnormdist.codeplex.com/license)
-- =============================================
CREATE FUNCTION [dbo].[StdNormalDistributionCDF_3] ( @x FLOAT)
RETURNS FLOAT
AS
    BEGIN
    DECLARE @Z FLOAT = ABS(@x)/SQRT(2.0);
    DECLARE @Z2 FLOAT = @Z*@Z; -- optimization

    IF (@Z >=11.0) -- value is too large no need to compute
    BEGIN
      IF @x > 0.0
        RETURN 1.0;
      RETURN 0.0;
    END

    -- Compute ERF using W. J. Cody 1969

    DECLARE @ERF FLOAT;

    IF (@Z <= 0.46786)
    BEGIN
      DECLARE @pA0 FLOAT = 3.209377589138469472562E03;
      DECLARE @pA1 FLOAT = 3.774852376853020208137E02;
      DECLARE @pA2 FLOAT = 1.138641541510501556495E02;
      DECLARE @pA3 FLOAT = 3.161123743870565596947E00;
      DECLARE @pA4 FLOAT = 1.857777061846031526730E-01;

      DECLARE @qA0 FLOAT = 2.844236833439170622273E03;
      DECLARE @qA1 FLOAT = 1.282616526077372275645E03;
      DECLARE @qA2 FLOAT = 2.440246379344441733056E02;
      DECLARE @qA3 FLOAT = 2.360129095234412093499E01;
      DECLARE @qA4 FLOAT = 1.000000000000000000000E00;

      -- For efficiency compute sequence of powers of @Z
      -- (instead of calling POWER(@Z,2), POWER(@Z,4), etc.)
      DECLARE @ZA4 FLOAT = @Z2*@Z2;
      DECLARE @ZA6 FLOAT = @ZA4*@Z2;
      DECLARE @ZA8 FLOAT = @ZA6*@Z2;

      SELECT @ERF = @Z *
        (@pA0 + @pA1*@Z2 + @pA2*@ZA4 + @pA3*@ZA6 + @pA4*@ZA8) /
        (@qA0 + @qA1*@Z2 + @qA2*@ZA4 + @qA3*@ZA6 + @qA4*@ZA8);
    END
    ELSE IF (@Z <= 4.0)
    BEGIN
      DECLARE @pB0 FLOAT = 1.23033935479799725272E03;
      DECLARE @pB1 FLOAT = 2.05107837782607146532E03;
      DECLARE @pB2 FLOAT = 1.71204761263407058314E03;
      DECLARE @pB3 FLOAT = 8.81952221241769090411E02;
      DECLARE @pB4 FLOAT = 2.98635138197400131132E02;
      DECLARE @pB5 FLOAT = 6.61191906371416294775E01;
      DECLARE @pB6 FLOAT = 8.88314979438837594118E00;
      DECLARE @pB7 FLOAT = 5.64188496988670089180E-01;
      DECLARE @pB8 FLOAT = 2.15311535474403846343E-08;

      DECLARE @qB0 FLOAT = 1.23033935480374942043E03;
      DECLARE @qB1 FLOAT = 3.43936767414372163696E03;
      DECLARE @qB2 FLOAT = 4.36261909014324715820E03;
      DECLARE @qB3 FLOAT = 3.29079923573345962678E03;
      DECLARE @qB4 FLOAT = 1.62138957456669018874E03;
      DECLARE @qB5 FLOAT = 5.37181101862009857509E02;
      DECLARE @qB6 FLOAT = 1.17693950891312499305E02;
      DECLARE @qB7 FLOAT = 1.57449261107098347253E01;
      DECLARE @qB8 FLOAT = 1.00000000000000000000E00;

      -- For efficiency compute sequence of powers of @Z
      -- (instead of calling POWER(@Z,2), POWER(@Z,3), etc.)
      DECLARE @ZB3 FLOAT = @Z2*@Z;
      DECLARE @ZB4 FLOAT = @ZB3*@Z;
      DECLARE @ZB5 FLOAT = @ZB4*@Z;
      DECLARE @ZB6 FLOAT = @ZB5*@Z;
      DECLARE @ZB7 FLOAT = @ZB6*@Z;
      DECLARE @ZB8 FLOAT = @ZB7*@Z;

      SELECT @ERF = 1.0 - EXP(-@Z2) *
              (@pB0 + @pB1*@Z + @pB2*@Z2 + @pB3*@ZB3 + @pB4*@ZB4
               + @pB5*@ZB5 + @pB6*@ZB6 + @pB7*@ZB7 + @pB8*@ZB8) /
              (@qB0 + @qB1*@Z + @qB2*@Z2 + @qB3*@ZB3 + @qB4*@ZB4
               + @qB5*@ZB5 + @qB6*@ZB6 + @qB7*@ZB7 + @qB8*@ZB8);
    END
    ELSE
    BEGIN
      DECLARE @pC0 FLOAT = -6.58749161529837803157E-04;
      DECLARE @pC1 FLOAT = -1.60837851487422766278E-02;
      DECLARE @pC2 FLOAT = -1.25781726111229246204E-01;
      DECLARE @pC3 FLOAT = -3.60344899949804439429E-01;
      DECLARE @pC4 FLOAT = -3.05326634961232344035E-01;
      DECLARE @pC5 FLOAT = -1.63153871373020978498E-02;

      DECLARE @qC0 FLOAT = 2.33520497626869185443E-03;
      DECLARE @qC1 FLOAT = 6.05183413124413191178E-02;
      DECLARE @qC2 FLOAT = 5.27905102951428412248E-01;
      DECLARE @qC3 FLOAT = 1.87295284992346047209E00;
      DECLARE @qC4 FLOAT = 2.56852019228982242072E00;
      DECLARE @qC5 FLOAT = 1.00000000000000000000E00;

      DECLARE @pi FLOAT = 3.141592653589793238462643383;

      -- For efficiency compute sequence of powers of @Z
      -- (instead of calling POWER(@Z,-2), POWER(@Z,-3), etc.)
      DECLARE @ZC2 FLOAT = (1/@Z)/@Z;
      DECLARE @ZC4 FLOAT = @ZC2*@ZC2;
      DECLARE @ZC6 FLOAT = @ZC4*@ZC2;
      DECLARE @ZC8 FLOAT = @ZC6*@ZC2;
      DECLARE @ZC10 FLOAT = @ZC8*@ZC2;

      SELECT @ERF = 1 - EXP(-@Z2)/@Z * (1/SQRT(@pi) + 1/(@Z2)*
             ((@pC0 + @pC1*@ZC2 + @pC2*@ZC4 + @pC3*@ZC6 + @pC4*@ZC8 + @pC5*@ZC10) /
              (@qC0 + @qC1*@ZC2 + @qC2*@ZC4 + @qC3*@ZC6 + @qC4*@ZC8 + @qC5*@ZC10)));
    END

    DECLARE @cd FLOAT = 0.5*(1+@ERF);

    IF @x > 0
      RETURN @cd;

    RETURN 1.0-@cd;
    END

Verification Using Excel 2010 as Reference

Maximum difference 5.55111512312578E-16 at value 1.00999999999983

Verification Using Excel 2007 as Reference

Maximum difference 5.09148279093097E-13 at value 4.97999999999994

Discussion

This more complex and slower implementation uses three different rational approximations to provide maximum accuracy over a wide range of values.
Here at last we see the differences between Excel 2010 and Excel 2007.
Using Excel 2010 as reference we note the approximation provides the maximum precision possible with a double precision float (15.5 decimal places.)
Using Excel 2007 the precision drops down to 12 decimal places. The graph shows this happens for only two small bands at around 5 and -5 which is where the Excel 2007 implementation is lacking and where Excel 2010 shows improvements.

Part 4: T-SQL Implementation of NORMDIST / NORM.S.DIST

This is Part 4 of a set of posts. The other posts are:
Part 1: Introduction
Part 2: Probability Density Function Implementation
Part 3: Cumulative Distribution Function Implementation 1
Part 4: Cumulative Distribution Function Implementation 2
Part 5: Cumulative Distribution Function Implementation 3
Part 6: References and Notes

Cumulative Distribution Function Implementation 2

The cumulative distribution function has no closed form expression and can only be approximated (see Part 1.)
The following implementation uses approximation 26.2.17 from Abramowitz & Stegun (1964) (see Part 6) for full references.

-- =============================================
-- Author:    Eli Algranti
-- Description:  Standard Normal Distribution
--        Cummulative Distribution Function
--        using polinomial approximation 26.2.17
--        from Abramowitz & Stegun (1964)
-- Remarks:    This function depends on
--        [dbo].[StdNormalDistributionPDF]
--        to calculate the Standard Normal
--        Distribution PDF for the value
-- Copyright:  Eli Algranti (c) 2012
--
-- This code is licensed under the Microsoft Public
-- License (Ms-Pl) (https://tsqlnormdist.codeplex.com/license)
-- =============================================
CREATE FUNCTION [dbo].[Stdnormaldistributioncdf_2] (@x FLOAT)
returns FLOAT
AS
  BEGIN
      DECLARE @Z FLOAT = Abs(@x)

      IF ( @Z >= 15 ) -- value is too large no need to compute
        BEGIN
            IF @x > 0
              RETURN 1;

            RETURN 0;
        END

      -- Compute the Standard Normal Cummulative Distribution using
      -- polinomial approximation 26.2.17 from Abramowitz & Stegun (1964)
      DECLARE @p FLOAT = 0.2316419;
      DECLARE @b1 FLOAT = 0.319381530;
      DECLARE @b2 FLOAT = -0.356563782;
      DECLARE @b3 FLOAT = 1.781477937;
      DECLARE @b4 FLOAT = -1.821255978;
      DECLARE @b5 FLOAT = 1.330274429;
      DECLARE @t FLOAT = 1.0 / ( 1.0 + @p * @Z );
      -- For efficiency compute sequence of powers of @t (instead of calling POWER(@t,2), POWER(@t,3), etc.)
      DECLARE @t2 FLOAT = @t * @t;
      DECLARE @t3 FLOAT = @t2 * @t;
      DECLARE @t4 FLOAT = @t3 * @t;
      DECLARE @cd FLOAT = 1.0 - [dbo].[Stdnormaldistributionpdf](@Z) * (
                                  @b1 * @t + @b2 * @t2 + @b3 * @t3 + @b4 * @t4 +
                                  @b5 *
                                  @t4 * @t )

      IF @x > 0
        RETURN @cd;

      RETURN 1.0 - @cd;
  END

Verification Using Excel 2010 as Reference

Maximum difference 7.4506132929919E-08 at value 0.719999999999832

Verification Using Excel 2007 as Reference

Maximum difference 7.4506132929919E-08 at value 0.719999999999832

Discussion

This approximation provides a precision of eight decimal places and surprisingly the same maximum difference for both versions of Excel. I had to check twice.
Unlike the previous implementation this one is not a pure polynomial approximation and requires the PDF to be computed.
The performance of this implementation can be vastly improved if the PDF needs to be computed anyway by passing the pre-computed PDF value as a parameter instead of computing it in the function.

Part 3: T-SQL Implementation of NORMDIST / NORM.S.DIST

This is Part 3 of a set of posts. The other posts are:
Part 1: Introduction
Part 2: Probability Density Function Implementation
Part 3: Cumulative Distribution Function Implementation 1
Part 4: Cumulative Distribution Function Implementation 2
Part 5: Cumulative Distribution Function Implementation 3
Part 6: References and Notes

Cumulative Distribution Function Implementation 1

The cumulative distribution function has no closed form expression and can only be approximated (see Part 1.)
The following implementation uses polynomial approximation 26.2.18 from Abramowitz & Stegun (1964) (see Part 6) for full references.

-- =============================================
-- Author:    Eli Algranti
-- Description:  Standard Normal Distribution
--        Cummulative Distribution Function
--        using polinomial approximation 26.2.18
--        from Abramowitz & Stegun (1964)
-- Copyright:  Eli Algranti (c) 2012
--
-- This code is licensed under the Microsoft Public
-- License (Ms-Pl) (https://tsqlnormdist.codeplex.com/license)
-- =============================================
CREATE FUNCTION [dbo].[Stdnormaldistributioncdf_1] (@x FLOAT)
returns FLOAT
AS
  BEGIN
      DECLARE @Z FLOAT = Abs(@x)

      IF ( @Z >= 15 ) -- value is too large no need to compute
        BEGIN
            IF @x > 0
              RETURN 1;

            RETURN 0;
        END

      -- Compute the Standard Normal Cummulative Distribution using
      -- polinomial approximation 26.2.18 from Abramowitz & Stegun (1964)
      DECLARE @c1 FLOAT = 0.196854;
      DECLARE @c2 FLOAT = 0.115194;
      DECLARE @c3 FLOAT = 0.000344;
      DECLARE @c4 FLOAT = 0.019527;
      -- For efficiency compute sequence of powers of @Z (instead of calling POWER(@Z,2), POWER(@Z,3), etc.)
      DECLARE @Z2 FLOAT = @Z * @Z;
      DECLARE @Z3 FLOAT = @Z2 * @Z;
      DECLARE @cd FLOAT = 1.0 - 0.5 * Power(1 + @c1 * @Z + @c2 * @Z2 + @c3 * @Z3
                                            +
                                            @c4 * @Z3 * @Z, -4)

      IF @x > 0
        RETURN @cd;

      RETURN 1.0 - @cd;
  END

Verification Using Excel 2010 as Reference

Maximum difference 0.000232983371206426 at value -1.82000000000017

Verification Using Excel 2007 as Reference

Maximum difference 0.000232983371206363 at value -1.82000000000017

Discussion

This approximation provides a precision of three decimal places, which is enough for many applications.
The main advantage of this implementation is its simplicity, requiring just a handful of arithmetic operations.

Part 2: T-SQL Implementation of NORMDIST / NORM.S.DIST

This is Part 2 of a set of posts. The other posts are:
Part 1: Introduction
Part 2: Probability Density Function Implementation
Part 3: Cumulative Distribution Function Implementation 1
Part 4: Cumulative Distribution Function Implementation 2
Part 5: Cumulative Distribution Function Implementation 3
Part 6: References and Notes

Probability Density Function Implementation

The probability density function can be computed directly from the definition (see Part 1.)

-- =============================================
-- Author:    Eli Algranti
-- Description:  Standard Normal Distribution
--        Probability Density Function
-- Copyright:  Eli Algranti (c) 2012
--
-- This code is licensed under the Microsoft Public
-- License (Ms-Pl) (https://tsqlnormdist.codeplex.com/license)
-- =============================================
CREATE FUNCTION [dbo].[StdNormalDistributionPDF]
(
  @x FLOAT
)
RETURNS FLOAT
AS
BEGIN
  DECLARE @pi FLOAT = 3.141592653589793238462643383;

  -- The standard normal probability corresponding to @x
  RETURN EXP(-0.5*@x*@x)/SQRT(2.0*@pi);
END

Verification Using Excel 2010 as Reference

Maximum difference 4.71844785465692E-16 at value 1.02999999999983

Verification Using Excel 2007 as Reference

Maximum difference 4.9960036108132E-16 at value 1.02999999999983

Note: The graphs do not display a value for the Y axis because the values are too small.

Discussion

No surprises here. The T-SQL implementation closely matches both Excel's implementations and is within the error margin for double computations. The function may be optimized by precomputing SQRT(2*@pi).

Part 1: T-SQL Implementation of NORMDIST / NORM.S.DIST

This is Part 1 of a set of posts. The other posts are:
Part 1: Introduction
Part 2: Probability Density Function Implementation
Part 3: Cumulative Distribution Function Implementation 1
Part 4: Cumulative Distribution Function Implementation 2
Part 5: Cumulative Distribution Function Implementation 3
Part 6: References and Notes

Introduction and Methods

I’m not the first to use Excel as a tool for developing an algorithm; nor the first, when implementing said algorithm, to be surprised that SQL Server does not ship with a library of statistical functions.
In this series of posts I’ll develop a set of T-SQL functions which implement the very useful Normal Distribution’s ‘Probability Density Function’ (PDF) and ‘Cumulative Distribution Function’ (CDF.)

The Very Basics

The Standard Normal Distribution is the special case of a Normal Distribution with mean zero and standard deviation one. The function for the Standard Normal Distribution for variable x is:

The Normal Distribution PDF (for a variable x with mean µ and standard deviation s) can be expressed as a function of the Standard Normal Distribution PDF:

The function for the Standard Normal Distribution CDF is:

Like the Normal Distribution PDF the Normal Distribution CDF can be expressed as a function of the Standard Normal Distribution CDF:

Another way to express it is by means of the special function erf (the error function):

The Implementation

The Standard Normal Distribution PDF (and by extension the Normal Distribution PDF) can be implemented as a user defined function directly from the definition. Such an implementation is presented in Part 2 of this series. The Standard Normal Distribution CDF has no closed form equivalent and can only be approximated:
Part 3 presents an implementation of a simple (and fast) polynomial approximation with a precision slightly better than three decimal points.
Part 4 presents a more complex polynomial approximation implementation with a precision of seven decimal points.
Finally Part 5 presents an implementation using a set of rational approximation. This more complex and slower) approximation yields a precision of fifteen decimal digits, which is also the precision of a double precision floating point value.
The approximations in Part 3 & Part 4 are courtesy of Abramowitz & Stegun (1964) while the implementation in Part 5 uses an approximation of the erf function from W. J. Cody (1969.)
See Part 6 for a list of references and notes.

Ensuring Accuracy

One of the problems when implementing mathematical functions is how to ensure the implementation is correct and the values returned accurate.
In order to test the accuracy of my implementations I've used to compute the values of doubles from -10 to 10 with 0.01 intervals and compared them to the results of the Excel 2010 implementation of the same function. Statistical functions in Excel versions prior to Excel 2003 where criticized for their accuracy. Microsoft greatly increased the accuracy of Excel 2003 from previous versions and made more improvements in Excel 2010.
Since the trigger for the work in these series of posts was my desire to implement an algorithm prototyped in Excel, and given that the latest Excel version addresses accuracy concerns; Excel 2010 is a good candidate to test the implementations.
I've also included the results using Excel 2007 as the reference implementation. In order to perform the comparison I wrote a small console utility called CheckError. CheckError gets as parameter the range, interval and functions to test and then:

Writes on the console the maximum difference between Excel and Sql Server implementations and at which input value it occurrs.
Writes files each containing the double precision floating point binary representation of:
- The input values.
- The Excel results.
- The Sql Server results.
- The differences between the Excel and Sql Server results.
Writes a tab delimited file containing decimal string representations of the value above.

The binary files can be used to test the precision of the solution using other reference implementations (say MatLab.) The tab delimited files should not be used for this purpose as there is a loss of precision involved in converting from the binary representation to the decimal string representation.
I used the tab delimited file to create the graphs presented along side the implementations since the loss of precision is not relevant in this case.
The source code for CheckError the T-SQL implementations, output files and Excel file with graphs can be found at codeplex and is released under the Microsoft Public License (Ms-PL.)

Caveat Emptor

The license for the code discussed here contains the full “as is” disclaimer, however since statistical functions may be used in all sorts of critical applications consider the following warnings if you decide to use these implementations:
While the tests described above are good enough for my purposes they are far from a complete analysis of the implementations.
In particular:

I have not performed a review of available literature to discover whether the sources I’ve based my implementations on have been disputed or corrected.
My implementation (or Excel 2010 which I use as the reference implementation or both) may have bugs which my testing does not reveal and may result in an error greater than the one stated.

Note Regarding References

The implementations and testing tools presented here are my own, but no original research was undertaken in their development.
Part 6 contains a full list of references and notes on where I found what information.

04 July 2012

Well Known Text (WTK) and Coordinates in Geography Objects

I’ve been working with MS-SQL Spatial objects lately and specially for Geography objects documentation seems to be somewhat erratic; specifically which number in a Well Known Text (WKT) string definition is the latitude and which is the longitude.

Points in WKT for Geography coordinates are defined “longitude latitude”.
This makes some sense as it corresponds to the “X Y” representation of coordinates of Geometry object, longitude being marked on the X axis and latitude on the Y axis of a North on top map, except, it doesn’t.
That’s because everywhere else coordinates are usually represented “latitude longitude” even in the Geography classes themselves.
The signaure of the factory method for Geography Points is:

public static SqlGeography Point(
double latitude,
double longitude,
int srid
)

For example:

SqlGeography.Point(-30, 10, 4326);

is the same as:

SqlGeometry.STPointFromText("POINT (10 -30)");

Bonus fact:

When specifying geographic boundaries as Geography Polygons the points should be specified counter-clockwise.

01 March 2012

Date and Time

Disclaimer: this post is about date and time on earth in human time frames, relativity concerns do not apply.

Time information is one the most misunderstood types of data. This misunderstanding usually culminates in the question “Should time be stored as local time or as UTC?”, the answer to which is “Neither, it should be stored as time.”

Points in time and their human representation

A point in time is a concept, a written date is the human readable representation of this concept, the name of the point in time. The point in time is the same everywhere but it may be called by different names.
The Atlantis space shuttle launched for the last time at “11:29 am, July 8 2011 EDT” which just means the time in Florida when the last space shuttle was launched was 11:29 in the morning. At that time it was only 8:29 in Los Angeles, and someone in Sydney would have to get up at 1:29 in the morning of the 9th to catch the liftoff live.
“11:29 am, July 8 2011 EDT” is just a representation for the time the of the lauch, “01:29 am, July 9 2011 Sydney time” and “3:29 pm, July 9 2011 UTC” are other representations for the same point in time.
The problem is that as humans we are used to dealing with non complete time representations. When someone asks us to call them “at four”, we infer, from context, to mean “today at four in the afternoon local time.” Most of our interactions with time leave some information to context. The most common omitted information is the time-zone, and unfortunately this has been brought over to computer languages.

Time in Computer Programming Languages

Most computer programming languages’ standard libraries offer a type for storing date-time information.
The common internal representation used to express time in these types is usually a number of some unit of time since a specific well defined date. Since this date is date zero it is called the epoch for that language/platform.
For example the C/C++ time_t type is defined as the number of seconds since midnight January 1, 1970 (the Unix epoch;) Java’s Date uses the same epoch but the units are milliseconds and .Net’s DateTime stores the number of 100 ns units (called ticks) since midnight January 1 of year 1 CE (the common era epoch.)
Using units since the epoch makes date and time calculations and comparisons simple, but it is decidedly not human readable, which is why standard libraries also provide ways for date types to be converted to and from human readable formats and here is where problems usually start.
In C/C++ the epoch is not well defined (i.e. it does not contain timezone information) and timezone information is also absent from the tm structure which is used to parse time_t variables into the components of date and time representations (month, yesr, hour, etc.mponents. It is almost imposible in C/C++ to do time translations without ending up with time_t variables using epochs in different time zones.
Java’s Date class used to look like a wrapper for C/C++ time functions, this was fixed in JDK 1.1, which obsoleted those functions and added the GregorianCalendar class to handle conversions. Java also specifies the epoch’s timezone as UTC, but since Date does not store timezone information this is merely a convention.
.Net did almost the same blunder in version 1 and 1.1, but where a bit more frank about it and added the DateTimeKind enumeration which marks a DateTime as UTC i.e. number of ticks since UTC epoch, local i.e number ticks since the epoch in the timezone configured into the computer; or undefined i.e. timezone is unknown.
It is not possible to reliably compare between times with different timezone epochs so a variable specifying a utc time can be compared with any other utc time variable, local time variables can only be compared with local time variables whose value is known to have originated in the same timezone. In all other cases only variables with times more than 24 hours apart can be reliably compared.
How should time be stored?
Twenty years ago data did not travel much, so there was not much concern about implications of using local dates and time in a global setting. In today’s interconnected world this is no longer the case.
Now, pun intended, it’s time we revisit the question at the start of this post:

“Should time be stored as local time or as UTC?”

The answer remains the same “Neither, it should be stored as time.”, but to qualify this answer:

There is no UTC time or local time, a point in time is a point in time, UTC and local are just representations. It is possible to use a date/time variable that stores time using a UTC epoch, a local epoch or even an unspecified epoch, but only the first one is a well defined point in time; the second relies on context, namely the timezone configured in the computer; and the third is not properly defined at all.
When using date/time types that do not offer timezone information use UTC time.
The question now is “Is this enough?”
For applications that just need to record when something happens, it is indeed enough.
Timezone only matters when the time needs to be formatted in a human readable representation; at other times, date/time information can live happily as an opaque date/time variable.
There are applications where the local time of the event is important.
When I look at a photo taken on a holiday, I’d like to know that it was taken at two in the afternoon local time.
If the photo in this example contained location information theoretically that would be enough to figure out the local time, but in practice the offset from UTC in time zones changes occasionally and the period when daylight time savings are in effect for a specific timezone changes much more often. This is information that is not readily available, so the best approach is to store the time using a well defined epoch, and the offset from UTC.
For run-time .Net provides the DateTimeOffset data type which stores a local DateTime and its offset from UTC. Java does not provide a class for storing time and timezone information, but provides the SimpleTimeZone class which can be used to convert Date objects to local string representation (remember in Java all Date objects should be UTC.)
For persistent storage, data types which support timezone offset in databases should be used if available.
If everything else fails time can be persisted as a standard string representation that includes timezone offset information such as the full date and time in ISO 8601, but this approach requires parsing the strings before any computations.