FOUNDATION SERVICES - Considering floating-point numbers as a subset of the real numbers

Most floating-point values (the normal and subnormal numbers) are simply real numbers. The two floating-point signed zeros both correspond to the real number zero, with some extra rules for operations that fall outside real arithmetic, such as division by zero (1/+0 is +infinity, while 1/-0 is -infinity). The NaNs and signed infinities have no counterparts in the real number system. Comparing the properties of the two systems shows where they subtly differ:
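The mapping from floating-point values to real numbers can be checked directly. The following is a minimal Python sketch. It assumes that Python's `float` is an IEEE 754 binary64 (double) value, as it is on essentially all current platforms. `real_value` is a hypothetical helper written for this illustration, not part of any Taligent or Python API. Python raises `ZeroDivisionError` for `1 / 0.0` instead of returning an infinity, so the sketch uses `math.atan2` to show the sign of zero at work.

```python
import math

def real_value(x: float):
    """Hypothetical helper: the exact real value of x as a (numerator,
    denominator) pair, or None for NaNs and infinities, which are not reals."""
    if math.isnan(x) or math.isinf(x):
        return None
    return x.as_integer_ratio()

# Normal and subnormal numbers are exact rationals, hence real numbers.
print(real_value(0.75))                       # (3, 4)
print(real_value(5e-324) == (1, 2**1074))     # True: the smallest subnormal

# The two zeros denote the same real number and compare equal...
print(real_value(0.0) == real_value(-0.0), 0.0 == -0.0)   # True True
# ...but the sign is kept, and operations outside real arithmetic can see it
# (atan2(0, 0) is undefined for the reals).
print(math.copysign(1.0, -0.0))                          # -1.0
print(math.atan2(0.0, 0.0), math.atan2(0.0, -0.0))       # 0.0 3.141592653589793

# NaNs and infinities have no real counterpart.
print(real_value(math.inf), real_value(math.nan))        # None None
print(math.nan == math.nan, math.inf - math.inf)         # False nan
```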

The real numbers are dense; between any two distinct real numbers lies another. A floating-point number system is discrete; the gap between neighboring elements is determined by the precision and by the magnitude of the elements.
The real numbers are unbounded. Finite floating-point numbers are bounded: none exceeds the largest normal number in magnitude, and that bound is set by the maximum exponent.
Real numbers may be found arbitrarily close to zero. The nonzero floating-point values closest to zero are the smallest subnormal numbers, positive and negative.
The real numbers are fully precise. Floating-point numbers have limited precision, so the results of calculations are often only approximate. In the real number system, 1/3 has a well-defined, exact meaning. In binary floating-point arithmetic, 1/3 cannot be represented exactly, so it is rounded to a nearby representable number.
Most real numbers require an infinite binary (or decimal) expansion. For example, $1/3 = 0.\overline{3}$, where the bar indicates a repeating decimal digit, and $\pi = 3.14159\ldots$, where the ellipsis indicates an infinite, nonrepeating fraction. By definition, floating-point values can be expressed with a finite number of bits (or digits). Each such binary fraction also has a finite decimal expansion, and it takes as many decimal digits as binary bits: a fraction whose last nonzero bit lies n places after the binary point needs exactly n digits after the decimal point, because $2^{-n} = 5^n / 10^n$.
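The density, bound, and near-zero properties listed above can be observed with a few standard-library calls. This is a sketch, assuming IEEE 754 binary64 floats and Python 3.9 or later (needed for `math.nextafter` and `math.ulp`); the specific constants are those of binary64, not of any other format.

```python
import math
import sys

# Discreteness: 1.0 has an immediate successor, with no float in between.
one = 1.0
succ = math.nextafter(one, math.inf)
print(succ - one == sys.float_info.epsilon == 2.0**-52)   # True
mid = (one + succ) / 2      # the real midpoint 1 + 2**-53 is not representable...
print(mid in (one, succ))   # True: ...so it rounds to one of the two neighbors

# The gap scales with magnitude: it doubles from one power of two to the next.
print(math.ulp(1024.0) == 1024 * math.ulp(1.0))           # True

# Boundedness: the largest finite double is (2 - 2**-52) * 2**1023.
largest = sys.float_info.max
print(largest == math.ldexp(2 - 2**-52, 1023))            # True
print(largest * 2)                                        # inf: overflow

# Closest to zero: the smallest positive subnormal is 2**-1074.
tiny = math.ulp(0.0)
print(tiny == 5e-324 == math.ldexp(1.0, -1074))           # True
print(tiny < sys.float_info.min)                          # True: below the smallest normal
print(tiny / 2)                                           # 0.0: rounds to zero
```

Halving the smallest subnormal lands exactly halfway between it and zero; the default round-to-nearest-even rule then picks zero.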
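The rounding of 1/3 and the finite decimal expansion of every float can be checked with Python's standard `fractions` and `decimal` modules, which convert a float to its exact value. This is again a sketch, assuming binary64 floats and Python 3.9 or later (for `math.ulp`). `fraction_bits` and `decimal_places` are illustrative helpers named here, not library functions.

```python
import math
from decimal import Decimal
from fractions import Fraction

# 1/3 is a well-defined real number, but the float result of 1 / 3 is only
# the representable binary fraction nearest to it.
third = 1 / 3
print(Fraction(third) == Fraction(1, 3))        # False
error = abs(Fraction(third) - Fraction(1, 3))
print(error <= Fraction(math.ulp(third)) / 2)   # True: within half a unit in the last place

def fraction_bits(x: float) -> int:
    """Bits after the binary point in the finite float x (its denominator is 2**k)."""
    return x.as_integer_ratio()[1].bit_length() - 1

def decimal_places(x: float) -> int:
    """Digits after the decimal point in the exact decimal expansion of x."""
    return max(0, -Decimal(x).as_tuple().exponent)

# Every finite float has a finite binary, and therefore finite decimal, expansion.
print(Decimal(0.1))   # 0.1000000000000000055511151231257827021181583404541015625
for x in (0.5, 0.125, 0.1, third):
    print(x, fraction_bits(x), decimal_places(x))
```

In each printed row, the count of binary fraction bits matches the count of decimal places, as the rule $2^{-n} = 5^n / 10^n$ predicts.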

Copyright © 1995 Taligent, Inc. All rights reserved.