FOUNDATION SERVICES - Inner products using wide accumulators

This example illustrates the use of a double_t accumulator to evaluate a sum quickly, accurately, and portably. The function SDot is adapted from the widely used LINPACK and LAPACK Basic Linear Algebra Subprograms (BLAS) in FORTRAN.

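  #include <math.h>   // declares double_t under C99; other environments may differ

  float SDot(int n, const float *sx, const float *sy) {
      double_t sum = 0.0;                      // accumulate in double or wider
      if (n > 0) { // skip empty vectors
          // cleanup loop: handle the n % 5 leftover elements first
          for (int i = 0; i < n % 5; i++)
              sum += (double_t) *sx++ * *sy++;
          // main loop, unrolled: five products per iteration
          for (int i = 0; i < n / 5; i++) {
              sum += (double_t) *sx++ * *sy++;
              sum += (double_t) *sx++ * *sy++;
              sum += (double_t) *sx++ * *sy++;
              sum += (double_t) *sx++ * *sy++;
              sum += (double_t) *sx++ * *sy++;
          }
      }
      return (float) sum;                      // round to float once, at the end
  }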

This function computes the dot product of two float vectors, that is, the sum of the products of their corresponding elements. To lessen the relative cost of the loop control when the vectors are long, the authors of the BLAS unrolled the main loop, computing five products per iteration. The float-only LINPACK form was written to avoid any performance penalty for mixed-format arithmetic. To improve on its accuracy, this routine accumulates the float products in double (or wider) format and rounds to float only when it returns.
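The accuracy gain from the wide accumulator can be seen in a minimal sketch. The two helpers below, dot_float_acc and dot_wide_acc, are hypothetical names introduced only for this illustration; they are not part of the BLAS or of SDot, and they omit the unrolling to keep the comparison short. The sketch assumes a C99 or later compiler whose <math.h> declares double_t.

```c
#include <math.h>   /* double_t (C99) */

/* Hypothetical helper: accumulates in float, like a float-only dot product. */
float dot_float_acc(int n, const float *x, const float *y) {
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];            /* each partial sum is rounded to float */
    return sum;
}

/* Hypothetical helper: the same loop with a double_t accumulator. */
float dot_wide_acc(int n, const float *x, const float *y) {
    double_t sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += (double_t) x[i] * y[i]; /* product and sum kept in double_t */
    return (float) sum;                /* one rounding to float at the end */
}
```

With x = {1.0e8f, 1.0f, -1.0e8f} and y = {1.0f, 1.0f, 1.0f}, the exact dot product is 1. On a platform that rounds the float accumulator at each step, dot_float_acc loses the middle term, because 1.0e8f + 1.0f rounds back to 1.0e8f, and returns 0. dot_wide_acc holds 100000001 exactly in the wider accumulator and returns 1.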

A key feature of SDot is the type cast in its inner loop:

  sum += (double_t) *sx++ * *sy++;

On CommonPoint platforms with processors from the PowerPC and X86 families, which evaluate all expressions to double_t, the cast has no effect. However, on platforms such as PA-RISC, with orthogonal sets of instructions for each floating-point type, the product *sx++ * *sy++ is evaluated in the float format without the cast, sacrificing some accuracy and protection against overflow and underflow.
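Whether a given compiler evaluates float expressions in a wider format can be checked at build time. The following is a sketch only, assuming a C99 or later compiler that provides FLT_EVAL_METHOD in <float.h> and double_t in <math.h>; the function names are hypothetical and are not CommonPoint interfaces.

```c
#include <float.h>  /* FLT_EVAL_METHOD (C99) */
#include <math.h>   /* double_t (C99) */

/* Hypothetical helper: describes how this compiler evaluates
   floating-point expressions, according to FLT_EVAL_METHOD. */
const char *eval_method_name(void) {
    switch (FLT_EVAL_METHOD) {
    case 0:  return "each operation evaluated in the format of its type";
    case 1:  return "float and double operations evaluated in double";
    case 2:  return "all operations evaluated in long double";
    default: return "indeterminable or implementation-defined";
    }
}

/* Hypothetical helper: nonzero when double_t is wider than double. */
int double_t_is_wider_than_double(void) {
    return sizeof(double_t) > sizeof(double);
}
```

When the method is 0, an uncast float product is computed in float, which is the case where the cast in SDot changes the result.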

NOTE Opinions differ regarding the "right way" for compilers to treat mixed-format expressions; the various standards leave the issue open. Casting is a portable technique to ensure that a narrow innermost expression is evaluated to a wider type.
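The protection against overflow and underflow that the cast provides can be illustrated with a small sketch. Both helpers are hypothetical and exist only for this comparison. product_in_float stores the product in a volatile float to force the float format, mimicking a platform that evaluates float expressions in float; product_widened applies the same cast that SDot uses. The sketch assumes a C99 or later <math.h>.

```c
#include <math.h>   /* double_t, isinf, isfinite (C99) */

/* Hypothetical helper: forces the product into float format. */
float product_in_float(float a, float b) {
    volatile float p = a * b;   /* the store rounds the product to float */
    return p;
}

/* Hypothetical helper: the cast widens one operand, so the
   multiplication is carried out in double_t. */
double_t product_widened(float a, float b) {
    return (double_t) a * b;
}
```

For a = b = 1.0e30f the true product is about 1e60, which exceeds the float range, so product_in_float overflows to infinity while product_widened returns a finite value. For a = b = 1.0e-30f the float product underflows to zero, while the widened product remains nonzero.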
