FOUNDATION SERVICES - Using long double

Using long double

This example illustrates the use of a long double accumulator to evaluate a sum accurately. It also highlights a key portability issue of the long double type. Like the previous example, SDot, this is inspired by the LINPACK and LAPACK BLAS in FORTRAN.

The first function, SampleNorm2, applies to all platforms on which long double has greater exponent range than the type double; that is, SampleNorm2 works on PA-RISC and X86 platforms. Its implementation is comparatively trivial because of the extra range and precision:

  double SampleNorm2(int n, const double *dx) {
      long double sum = 0.0;
  
      for (int i = 0; i < n; i++, dx++)
          sum += (long double) *dx * *dx;
      return SquareRoot(sum);
  }

Computationally, SampleNorm2 is simpler than SDot in the previous example because it is the sum of squares and cancellation does not arise. However, when all of the vector's components are very small, their squares can underflow in the double format, while if some of the elements are very large, their squares can overflow double. In either case, the square root of the sum may be within range, if only it can be computed.

This implementation is much simpler than the associated function DNRM2 in the BLAS. The extra exponent range of long double obviates the scaling of large and small values in DNRM2. The largest long double value on PA-RISC and X86 platforms is nearly , compared to the largest double value, . Undeserved overflow does not occur. The extra precision--11 bits on X86 and 60 on PA-RISC--ensures a low error bound.

NOTE SampleNorm2 is not robust on PowerPC platforms for vectors all of whose arguments are tiny or some of whose arguments are huge.

Here is a version of the code designed to work on the PowerPC. It reflects the complexity of its BLAS forebear, DNRM2.

  
struct phaseData {
      int scalePower;
      double scaleThreshold;
  } phaseTable[3] = 
      {{ (int) (-0.75 * Logb(DBL_MIN)), SquareRoot(DBL_MIN)},
       {0, SquareRoot(DBL_MAX)},
       { (int) (-0.75 * Logb(DBL_MAX)), kInfinity}};
  
  double PowerNorm2(int n, const double *dx) {
      int phase = 0;
      long double sum = 0.0;
      double_t localThreshold = phaseTable[0].scaleThreshold;
      for (int i = 0; i < n; i++, dx++) {
          
while (Abs(*dx) > localThreshold) {
              phase++;
              localThreshold = phaseTable[phase].scaleThreshold/n;
                  // dividing threshold by n applies only for phase == 1
              sum = Scalb(sum, -2.0 * phaseTable[phase].scalePower));
          }
          sum += Square(Scalb(*dx, phaseTable[phase].scalePower));
      }
      return Scalb(SquareRoot(sum), -phaseTable[phase].scalePower);
  }

Like DRM2, PowerNorm2 has several internal phases. While all components are tiny PowerNorm2 scales them up to compute (scaled) sum. When it encounters its first modest-sized component, PowerNorm2 rescales sum and accumulates with no scaling. Finally, on the occurrence of the first large element, PowerNorm2 begins scaling elements downward to avoid overflow. In the return statement, sum is restored to its proper scaling.

DBL_MIN and DBL_MAX are the smallest and largest postive, normal double values, respectively. They are defined in the Standard C header float.h.

This code is considerably more complicated than SampleNorm2, and it has not been tuned for best performance. There is a clear advantage to using a long double type with extra exponent range.

NOTE There is no guarantee about the speed of long double computations. They are fast on X86 platforms but are carried out in software on PowerPC and PA-RISC platforms. The long double type is not portable and is not even supported in CommonPoint stream I/O. It is intended for use in calculations such as SampleNorm2.

[Contents] [Previous] [Next]

Click the icon to mail questions or corrections about this material to Taligent personnel.

Generated with WebMaker