The computational model

Floating-point computations arise in several forms. Arithmetic operations such as addition and multiplication are built into C++ as infix operators; mathematical functions such as sine and logarithm are available as global library functions. Some floating-point computations are implicit--for example, the conversion required to display floating-point numbers as decimal values. All these computations apply to members of the floating-point number system, as described in Chapter 8.
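The implicit decimal conversion mentioned above can be made visible with a short sketch. It uses only the standard C library's `snprintf`; the helper name `to_decimal` is hypothetical and is not a CommonPoint interface.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not a CommonPoint API): convert a binary
// floating-point value to a decimal string with the requested number of
// significant digits. The conversion is itself a floating-point
// computation that rounds to the requested decimal precision.
std::string to_decimal(double x, int significant_digits) {
    char buffer[64];
    int n = std::snprintf(buffer, sizeof buffer, "%.*g", significant_digits, x);
    if (n <= 0 || n >= static_cast<int>(sizeof buffer)) {
        return std::string();  // formatting failed or did not fit
    }
    return std::string(buffer, static_cast<std::size_t>(n));
}
```

With IEEE 754 doubles and a correctly rounding C library, `to_decimal(0.1, 6)` gives `0.1`, while `to_decimal(0.1, 17)` gives `0.10000000000000001`, because the stored binary value is not exactly one tenth.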

A single model underlies the arithmetic operations and mathematical functions in CommonPoint systems. With an understanding of the computational model and with a little mathematics, you can deduce the outcome of any computation. The model is this:

Compute each result as if it had unbounded precision and exponent range; then coerce that mathematical result to the format of the destination.

Mathematically, the model is simple; however, computing a result as if it had unbounded precision and exponent range is impossible in some cases.
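In some cases the unbounded-precision result can be held exactly in a wider format, which makes the model easy to observe. The following sketch assumes IEEE 754 single and double formats and a product that stays within the range of `float`; the function names are hypothetical. The product of two 24-bit significands fits in the 53-bit significand of a `double`, so the wide product is the exact mathematical result, and converting it to `float` is the coercion step.

```cpp
// Sketch of "compute exactly, then coerce to the destination format".
// Assumes IEEE 754 float (24-bit significand) and double (53-bit significand).

// Hypothetical helper: form the exact product in double, then round once
// to float. The product of two floats needs at most 48 significand bits,
// so the double multiplication introduces no rounding error.
float multiply_via_exact(float a, float b) {
    double exact = static_cast<double>(a) * static_cast<double>(b);
    return static_cast<float>(exact);  // single rounding to the destination
}

// Ordinary float multiplication, which the model says must deliver the
// same correctly rounded value.
float multiply_direct(float a, float b) {
    float product = a * b;
    return product;
}
```

For operands such as `1.1f` and `3.3f`, both functions are expected to return the same `float`, since each performs exactly one rounding of the same mathematical product.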

Coercing a mathematical result to a floating-point format means finding the most suitable representable value in that format. In the simplest case, the mathematical result (4, for example) is exactly representable. However, a quotient may not be exactly representable in binary floating-point number systems. In this case, the model requires, in effect, that enough quotient bits be computed to determine the representable number nearest to the exact quotient. This is the process of rounding.

The results of some computations, such as Exp(1.0E6), are so far beyond the range of CommonPoint number systems that the nearest representable value is very distant indeed. Such huge mathematical results are typically replaced by an infinite value, whose own behavior in floating-point operations is defined by rules inspired by, but strictly speaking outside, the real number system. Such huge values are said to have overflowed.

Finally, some operations have no result in the real number system. Those with a well-defined limit are given the corresponding floating-point value. Other expressions lack any suitable limit value. The IEEE standards specify that they take the value NaN, a not-a-number symbol signifying complete loss of information.
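The cases above can be sketched with the standard C++ library. The operands are assumed stand-ins chosen for illustration (2.0 + 2.0, 1.0 / 3.0, 1.0 / 0.0, and 0.0 / 0.0), and `std::exp` stands in for the Exp function named in the text. The struct and function names are hypothetical, and the sketch assumes IEEE 754 doubles, under which division by zero yields an infinity or a NaN.

```cpp
#include <cmath>
#include <limits>

// Assumption: the platform's double follows IEEE 754 (IEC 559).
static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");

// Hypothetical container for one result of each kind.
struct CoercionExamples {
    double exact;           // mathematical result is representable
    double rounded;         // mathematical result must be rounded
    double residual;        // 3 * rounded - 1, computed with one rounding
    double overflowed;      // result far beyond the exponent range
    double infinite_limit;  // no real result, but a well-defined limit
    double no_limit;        // no real result and no suitable limit
};

CoercionExamples coerce_examples() {
    volatile double runtime_zero = 0.0;  // keeps the divisions at run time
    double zero = runtime_zero;

    CoercionExamples e;
    e.exact = 2.0 + 2.0;                 // exactly 4
    e.rounded = 1.0 / 3.0;               // nearest double to one third
    // A nonzero residual shows that e.rounded is not exactly one third.
    e.residual = std::fma(e.rounded, 3.0, -1.0);
    e.overflowed = std::exp(1.0e6);      // overflows to +infinity
    e.infinite_limit = 1.0 / zero;       // +infinity
    e.no_limit = zero / zero;            // NaN
    return e;
}
```

`std::isinf` and `std::isnan` can be used to classify the last three fields; the residual is negative because the nearest double lies slightly below one third.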
