Manostaxx

The text that follows is owned by the site above referred.

Here is only a small part of the article, for more please follow the link

SOURCE: https://www3.ntu.edu.sg/home/ehchua/programming/java/datarepresentation.html

In computers, floating-point numbers are represented in scientific notation of *fraction* (`F`

) and *exponent* (`E`

) with a *radix* of 2, in the form of `F×2^E`

. Both `E`

and `F`

can be positive as well as negative. Modern computers adopt IEEE 754 standard for representing floating-point numbers. There are two representation schemes: 32-bit single-precision and 64-bit double-precision.

In 32-bit single-precision floating-point representation:

- The most significant bit is the
*sign bit*(`S`

), with 0 for positive numbers and 1 for negative numbers. - The following 8 bits represent
*exponent*(`E`

). - The remaining 23 bits represents
*fraction*(`F`

).

##### Normalized Form

Let’s illustrate with an example, suppose that the 32-bit pattern is `1 1000 0001 011 0000 0000 0000 0000 0000`

, with:

`S = 1`

`E = 1000 0001`

`F = 011 0000 0000 0000 0000 0000`

In the *normalized form*, the actual fraction is normalized with an implicit leading 1 in the form of `1.F`

. In this example, the actual fraction is `1.011 0000 0000 0000 0000 0000 = 1 + 1×2^-2 + 1×2^-3 = 1.375D`

.

The sign bit represents the sign of the number, with `S=0`

for positive and `S=1`

for negative number. In this example with `S=1`

, this is a negative number, i.e., `-1.375D`

.

In normalized form, the actual exponent is `E-127`

(so-called excess-127 or bias-127). This is because we need to represent both positive and negative exponent. With an 8-bit E, ranging from 0 to 255, the excess-127 scheme could provide actual exponent of -127 to 128. In this example, `E-127=129-127=2D`

.

Hence, the number represented is `-1.375×2^2=-5.5D`

.