IEEE(3) | MidnightBSD Library Functions Manual | IEEE(3) |
ieee - IEEE standard 754 for floating-point arithmetic
The IEEE Standard 754 for Binary Floating-Point Arithmetic defines representations of floating-point numbers and abstract properties of arithmetic operations relating to precision, rounding, and exceptional cases, as described below.
Radix: Binary.
Overflow and underflow:
Zero is represented ambiguously as +0 or -0.
copysign(x, ±0). In particular, comparison (x > y, x ≥ y, etc.) cannot be affected by the sign of zero; but if finite x = y then infinity = 1/(x-y) ≠ -1/(y-x) = -infinity.
Infinity is signed.
Reserved operands (NaNs):
Rounding:
Exceptions:
Exception | Default Result |
Invalid Operation | NaN, or FALSE |
Overflow | ±infinity |
Divide by Zero | ±infinity |
Underflow | Gradual Underflow |
Inexact | Rounded value |
NOTE: An Exception is not an Error unless handled badly. What makes a class of exceptions exceptional is that no single default response can be satisfactory in every instance. On the other hand, if a default response will serve most instances satisfactorily, the unsatisfactory instances cannot justify aborting computation every time the exception occurs.
Single-precision:
Wordsize: 32 bits.
Precision: 24 significant bits, roughly like 7 significant decimals.
If x and x' are consecutive positive single-precision numbers (they differ by 1 ulp), then
5.9e-08 < 0.5**24 < (x'-x)/x ≤ 0.5**23 < 1.2e-07.
Range:
Overflow threshold = 2.0**128 = 3.4e38
Underflow threshold = 0.5**126 = 1.2e-38
Underflowed results round to the nearest integer multiple of 0.5**149 = 1.4e-45.
Double-precision:
Wordsize: 64 bits.
Precision: 53 significant bits, roughly like 16 significant decimals.
If x and x' are consecutive positive double-precision numbers (they differ by 1 ulp), then
1.1e-16 < 0.5**53 < (x'-x)/x ≤ 0.5**52 < 2.3e-16.
Range:
Overflow threshold = 2.0**1024 = 1.8e308
Underflow threshold = 0.5**1022 = 2.2e-308
Underflowed results round to the nearest integer multiple of 0.5**1074 = 4.9e-324.
Extended-precision:
Wordsize: 96 bits.
Precision: 64 significant bits, roughly like 19 significant decimals.
If x and x' are consecutive positive extended-precision numbers (they differ by 1 ulp), then
5.4e-20 < 0.5**64 < (x'-x)/x ≤ 0.5**63 < 1.1e-19.
Range:
Overflow threshold = 2.0**16384 = 1.2e4932
Underflow threshold = 0.5**16382 = 3.4e-4932
Underflowed results round to the nearest integer multiple of 0.5**16445 = 3.6e-4951.
Quad-extended-precision:
Wordsize: 128 bits.
Precision: 113 significant bits, roughly like 34 significant decimals.
If x and x' are consecutive positive quad-extended-precision numbers (they differ by 1 ulp), then
9.6e-35 < 0.5**113 < (x'-x)/x ≤ 0.5**112 < 2.0e-34.
Range:
Overflow threshold = 2.0**16384 = 1.2e4932
Underflow threshold = 0.5**16382 = 3.4e-4932
Underflowed results round to the nearest integer multiple of 0.5**16494 = 6.5e-4966.
For each kind of floating-point exception, IEEE 754 provides a Flag that is raised each time its exception is signaled, and stays raised until the program resets it. Programs may also test, save and restore a flag. Thus, IEEE 754 provides three ways by which programs may cope with exceptions for which the default result might be unsatisfactory:
CAUTION: The only reliable ways to discover whether Underflow has occurred are to test whether products or quotients lie closer to zero than the underflow threshold, or to test the Underflow flag. (Sums and differences cannot underflow in IEEE 754; if x ≠ y then x-y is correct to full precision and certainly nonzero regardless of how tiny it may be.) Products and quotients that underflow gradually can lose accuracy gradually without vanishing, so comparing them with zero (as one might on a VAX) will not reveal the loss. Fortunately, if a gradually underflowed value is destined to be added to something bigger than the underflow threshold, as is almost always the case, digits lost to gradual underflow will not be missed because they would have been rounded off anyway. So gradual underflows are usually provably ignorable. The same cannot be said of underflows flushed to 0.
At the option of an implementor conforming to IEEE 754, other ways to cope with exceptions may be provided:
Ideally, each elementary function should act as if it were indivisible, or atomic, in the sense that ...
The functions in libm are only approximately atomic. They signal no inappropriate exception except possibly ...
fenv(3), ieee_test(3), math(3)
An explanation of IEEE 754 and its proposed extension p854 was published in the IEEE magazine MICRO in August 1984 under the title "A Proposed Radix- and Word-length-independent Standard for Floating-point Arithmetic" by W. J. Cody et al. The manuals for Pascal, C and BASIC on the Apple Macintosh document the features of IEEE 754 pretty well. Articles in the IEEE magazine COMPUTER vol. 14 no. 3 (Mar. 1981), and in the ACM SIGNUM Newsletter Special Issue of Oct. 1979, may be helpful although they pertain to superseded drafts of the standard.
IEEE Std 754-1985
January 26, 2005 | midnightbsd-3.1 |