Floating point in modern computers

The computer representation of binary floating-point numbers has been standardized by the IEEE in IEEE 754. The first version, IEEE 754-1985, was officially adopted in 1985 and was superseded in 2008 by IEEE 754-2008.

The standard defines:

  • arithmetic formats: sets of binary and decimal floating-point data, which consist of finite numbers (including signed zeros and subnormal numbers), infinities, and special “not a number” values (NaNs)
  • interchange formats: encodings (bit strings) that may be used to exchange floating-point data in an efficient and compact form
  • rounding rules: properties to be satisfied when rounding numbers during arithmetic and conversions
  • operations: arithmetic and other operations on arithmetic formats
  • exception handling: indications of exceptional conditions (such as division by zero, overflow, etc.)
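The kinds of data listed above can be observed from an ordinary program. The following is a minimal sketch, assuming Python's float is an IEEE 754 binary64 value (true on virtually all platforms, but not guaranteed by the language). The helper names classify and sign_bit are made up for this illustration and are not part of the standard.

```python
import math
import sys


def classify(x: float) -> str:
    """Name the IEEE 754 category that a Python float falls into."""
    if math.isnan(x):
        return "nan"
    if math.isinf(x):
        return "infinite"
    if x == 0.0:
        return "zero"  # matches both +0.0 and -0.0, which compare equal
    if abs(x) < sys.float_info.min:
        return "subnormal"  # smaller in magnitude than the smallest normal number
    return "normal"


def sign_bit(x: float) -> int:
    """Return 1 if the sign bit is set, else 0 (works for -0.0 as well)."""
    return 1 if math.copysign(1.0, x) < 0 else 0


if __name__ == "__main__":
    for value in (1.0, -0.0, 5e-324, float("inf"), float("nan")):
        print(repr(value), classify(value), sign_bit(value))
```

Note that -0.0 == 0.0 is true, so the sign of a zero has to be read with math.copysign rather than with a comparison.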

An IEEE 754 format comprises:

  • Finite numbers, which may be either base 2 (binary) or base 10 (decimal). Each finite number is described by three integers: s = a sign (zero or one), c = a significand (or ‘coefficient’), q = an exponent. The numerical value of a finite number is:
    (−1)^s × c × b^q
    where b is the base (2 or 10). For example, if the sign is 1 (indicating negative), the significand is 12345, the exponent is −3, and the base is 10, then the value of the number is −12.345.
  • Two infinities: +∞ and −∞.
  • Two kinds of NaN: a quiet NaN (qNaN) and a signaling NaN (sNaN). A NaN may carry a payload that is intended for diagnostic information indicating the source of the NaN. The sign of a NaN has no meaning, but it may be predictable in some circumstances.
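The value formula for finite numbers, (−1)^s × c × b^q, can be checked with exact rational arithmetic. This is a small sketch only: the function name finite_value is hypothetical, and it works on the abstract triple (s, c, q) rather than on any particular bit encoding.

```python
from fractions import Fraction


def finite_value(s: int, c: int, q: int, b: int) -> Fraction:
    """Exact value of the finite number (-1)^s * c * b^q."""
    if s not in (0, 1):
        raise ValueError("sign must be 0 or 1")
    if b not in (2, 10):
        raise ValueError("base must be 2 or 10")
    if c < 0:
        raise ValueError("significand must be non-negative")
    # Fraction keeps negative powers exact, e.g. 10**-3 becomes 1/1000.
    return (-1) ** s * c * Fraction(b) ** q


if __name__ == "__main__":
    # The example from the text: s = 1, c = 12345, q = -3, b = 10.
    print(finite_value(1, 12345, -3, 10))  # -2469/200, i.e. -12.345
```

Using Fraction instead of float avoids introducing a second rounding step while checking the formula itself.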

0.15625_10 = 1.5625 × 10^−1 = 0.00101_2 = 1/8 + 1/32 = 1.01_2 × 2^−3
IEEE 754 Single Floating Point Format

sign = 0 (0: positive, 1: negative)
biased exponent = 124 = −3 + bias (bias = 01111111_2 = 127)
fraction = .01000…_2
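The three fields above can be read back from the actual bit pattern. The sketch below uses Python's struct module to pack 0.15625 as a 32-bit single-precision value and split it into sign, biased exponent, and fraction. The function names decode_binary32 and normal_value are chosen for this illustration, and normal_value handles normal numbers only (not zeros, subnormals, infinities, or NaNs).

```python
import struct
from typing import Tuple

BIAS = 127  # exponent bias of the single-precision (binary32) format


def decode_binary32(x: float) -> Tuple[int, int, int]:
    """Split x, rounded to binary32, into (sign, biased exponent, fraction)."""
    # Pack as a big-endian 32-bit float, then reread the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                      # 1 bit
    biased_exponent = (bits >> 23) & 0xFF  # 8 bits
    fraction = bits & 0x7FFFFF             # 23 bits
    return sign, biased_exponent, fraction


def normal_value(sign: int, biased_exponent: int, fraction: int) -> float:
    """Rebuild the value of a normal number from its three fields."""
    significand = 1 + fraction / 2**23  # implicit leading 1 before the binary point
    return (-1) ** sign * significand * 2.0 ** (biased_exponent - BIAS)


if __name__ == "__main__":
    s, e, f = decode_binary32(0.15625)
    print(s, e, format(f, "023b"))  # 0 124 01000000000000000000000
    print(normal_value(s, e, f))    # 0.15625
```

The leading 1 of 1.01_2 is not stored: only the bits after the binary point appear in the fraction field, which is why the field reads .01000…_2.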

