It doesn't fit exactly into 24 significant bits, so Step 4 applies. The exponent is below the minimum exponent of the float type, so subnormalize to the form:
The braces indicate bits beyond the 24 significant bits of the float type. Proceed to Step 5. When rounding to nearest, round up to:
Raise underflow and inexact because the value was subnormalized and rounded. It can be represented as the 32-bit float value encoded as
.
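The subnormalize-and-round steps can be checked in software. The following Python sketch is an illustration only and is not part of CommonPoint. The helper name round_to_float32 is hypothetical. It relies on the standard struct module to round a double to the nearest 32-bit float, and it assumes that underflow is signaled when the unrounded value is tiny and the delivered result is inexact, which is only one of the detection methods the IEEE standards permit. The sample value in the usage note is chosen for illustration and is not the value worked through above.

```python
import struct

FLOAT_MIN_NORMAL = 2.0 ** -126  # smallest normal float32 magnitude


def round_to_float32(x):
    """Round a Python float (a double) to float32, rounding to nearest.

    Returns (rounded_value, bit_pattern, underflow, inexact).
    Assumption: underflow means "tiny before rounding and inexact".
    """
    packed = struct.pack(">f", x)             # double -> nearest float32
    (rounded,) = struct.unpack(">f", packed)  # float32 -> double (exact)
    bits = int.from_bytes(packed, "big")      # raw 32-bit encoding
    inexact = rounded != x                    # rounding changed the value
    tiny = x != 0.0 and abs(x) < FLOAT_MIN_NORMAL
    underflow = tiny and inexact
    return rounded, bits, underflow, inexact
```

For example, round_to_float32(2.0 ** -127 + 3 * 2.0 ** -151) rounds up to 2.0 ** -127 + 2.0 ** -149, whose encoding is 0x00400001, and reports both underflow and inexact. An exactly representable subnormal such as 2.0 ** -149 reports neither.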
Closure
With this set of rules the system is closed in the sense that any operation on any floating-point operands produces a well-defined result within the system. On every CommonPoint platform you're guaranteed reasonable results under all circumstances, with a suitable exception raised when certain boundaries are transgressed. Exceptions are covered in more detail later.
Alternatives
The prescription for computation given here, while thorough, is not the only way to compute results meeting the requirements of the various applicable standards. The IEEE standards allow implementations three ways to detect underflow:
The differences between these definitions rarely matter. Although the IEEE standards define underflow in terms of the process of detecting it, it's helpful to think of the definitions in terms of tiny results:
NOTE
The computed value does not differ between implementations.
Sign of zero
While the real value zero is exactly representable in the floating-point number systems, its sign is an artifact outside the mathematics of real numbers. The sign of a zero result is determined as follows. First, the IEEE standards specify the sign of a zero product or quotient according to the usual sign conventions for nonzero results. Similarly, the zero result of a format conversion takes the sign of the source value (unless the destination is an integer format, which cannot represent -0). Finally, the sum of two positive zeros is +0; the sum of two negative zeros is -0. The standards specify (arbitrarily) these ambiguous cases:
More generally, if f is a function of a single variable and f(z) = 0 for a floating-point value z, the sign of zero is determined by this model, which captures the sense of the IEEE specifications:
The situation is similar for a function of two arguments. The idea is to use the obvious sign when it's unambiguous, to use one-sided limits when the function assumes a zero value at a zero argument, and to choose arbitrarily otherwise.
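The sign rules in this section can be observed directly. The Python sketch below is a hedged illustration, not CommonPoint code. The helper names sign_of, to_float32, and zero_sign_examples are hypothetical, and the results assume an IEEE-conforming platform running in the default round-to-nearest mode. The last four entries show cases that IEEE 754 settles by convention or by a one-sided limit.

```python
import math
import struct


def sign_of(x):
    """Return '+' or '-' for a float, distinguishing +0.0 from -0.0."""
    return "-" if math.copysign(1.0, x) < 0.0 else "+"


def to_float32(x):
    """Convert to the 32-bit float format and back (a format conversion)."""
    (y,) = struct.unpack(">f", struct.pack(">f", x))
    return y


def zero_sign_examples():
    """Collect the sign of several zero results under round-to-nearest."""
    pos, neg = 0.0, -0.0
    return {
        "(+0) * (-5)": sign_of(pos * -5.0),       # usual sign rule for products
        "(-0) / (-5)": sign_of(neg / -5.0),       # usual sign rule for quotients
        "float32(-0)": sign_of(to_float32(neg)),  # conversion keeps the sign
        "(+0) + (+0)": sign_of(pos + pos),
        "(-0) + (-0)": sign_of(neg + neg),
        "(+0) + (-0)": sign_of(pos + neg),        # settled by convention
        "1 - 1": sign_of(1.0 - 1.0),              # settled by convention
        "sqrt(-0)": sign_of(math.sqrt(neg)),      # settled by convention
        "sin(-0)": sign_of(math.sin(neg)),        # one-sided limit at zero
    }
```

Calling zero_sign_examples() returns a dictionary mapping each expression to "+" or "-". Note that converting -0.0 to a Python int yields 0, because the integer format has no negative zero.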