Rounding

For the purposes of coercing a mathematical result to a floating-point format, it's useful to think of the positive real number line as divided into half-open binades of the form $[2^n, 2^{n+1})$, and to think of each binade as divided into $2^{p-1}$ uniform subintervals of width $2^{n-p+1}$, where $p$ is the precision of the format. The endpoints of these subintervals and their negatives are exactly the values that can be represented with precision $p$. They are often drawn as ticks on a number line, as they were in Figure 55 on page 206.

Because the exponent of a floating-point format has a fixed range, there is a largest normal binade of representable values and a smallest one. The largest binade has the form $[2^{e_{\max}}, 2^{e_{\max}+1})$, where $e_{\max}$ is the largest exponent of the format. It is open on the right side because its upper bound, $2^{e_{\max}+1}$, is not representable.

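The smallest normal binade has the form $[2^{e_{\min}}, 2^{e_{\min}+1})$, where $e_{\min}$ is the smallest exponent of the format. To cope with results in the interval $(0, 2^{e_{\min}})$, that is, to cope with underflow, the normal numbers are augmented by the endpoints of the $2^{p-1}$ uniform subintervals of $[0, 2^{e_{\min}})$. As shown in Figure 55 on page 206, these subnormal numbers have the same spacing, $2^{e_{\min}-p+1}$, as the numbers in the smallest normal binade.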

Any finite result can then be rounded according to where it falls on the number line among the ticks of representable numbers. If a result falls on a tick, it requires no rounding. Otherwise, it lies strictly between two neighboring ticks and is rounded to one of them according to the rounding mode in effect.


The default rounding mode, round to nearest (with ties going to the neighbor whose least significant bit is zero), is suitable for most computations. However, you can control the direction of rounding dynamically, as shown in Chapter 10.


Copyright © 1995 Taligent, Inc. All rights reserved.