Lab 8: Floating Point examples

Lab 8 asks you to implement addition and multiplication for single-precision, 32-bit floating point numbers. A few worked examples are given below.

Addition

My implementation for xadd looks something like this.

Check if either x or y is zero and return the other one if so;
Denormalize x or y to give them the same exponent (the exponent of the larger one);
Add the significands, taking sign into account;
If the result is 0, return 0;
Normalize the result; and
Encode the result (bias the exponent and mask off the hidden bit), then construct the new Xfloat from the result’s sign, exponent, and significand.
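The six steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the lab's reference solution: it works on raw 32-bit patterns held in plain integers rather than on an Xfloat, the helper names (float_to_bits, BIAS, HIDDEN, FRAC_MASK) are made up for this sketch, bits shifted out are simply dropped (no rounding), and NaN, infinity, subnormals, and exponent overflow are ignored.

```python
import struct

BIAS = 127                 # exponent bias for single precision
HIDDEN = 1 << 23           # position of the hidden bit
FRAC_MASK = HIDDEN - 1     # low 23 bits: the stored fraction

def float_to_bits(f):
    """Hypothetical helper: the 32-bit pattern of f as an integer."""
    return struct.unpack(">I", struct.pack(">f", f))[0]

def xadd(x, y):
    """Add two single-precision bit patterns (sketch; truncates, no special values)."""
    # Step 1: if either operand is zero (ignoring its sign bit), return the other.
    if x & 0x7FFFFFFF == 0:
        return y
    if y & 0x7FFFFFFF == 0:
        return x

    # Step 2: extract the fields, unbias the exponents, restore the hidden bits.
    x_sign, y_sign = x >> 31, y >> 31
    x_exp = ((x >> 23) & 0xFF) - BIAS
    y_exp = ((y >> 23) & 0xFF) - BIAS
    x_sig = (x & FRAC_MASK) | HIDDEN
    y_sig = (y & FRAC_MASK) | HIDDEN
    # Denormalize the operand with the smaller exponent.
    if x_exp < y_exp:
        x_sig >>= y_exp - x_exp
        x_exp = y_exp
    else:
        y_sig >>= x_exp - y_exp
        y_exp = x_exp

    # Step 3: add the significands as signed integers.
    z_sig = (-x_sig if x_sign else x_sig) + (-y_sig if y_sign else y_sig)
    z_sign = 0
    if z_sig < 0:
        z_sign, z_sig = 1, -z_sig
    z_exp = x_exp

    # Step 4: an exact cancellation gives zero.
    if z_sig == 0:
        return 0

    # Step 5: normalize so that the top set bit is the hidden bit (bit 23).
    while z_sig >= (HIDDEN << 1):
        z_sig >>= 1
        z_exp += 1
    while z_sig < HIDDEN:
        z_sig <<= 1
        z_exp -= 1

    # Step 6: bias the exponent, mask off the hidden bit, and pack the fields.
    return (z_sign << 31) | ((z_exp + BIAS) << 23) | (z_sig & FRAC_MASK)

print(hex(xadd(float_to_bits(2.5), float_to_bits(-7.25))))  # 0xc0980000
```

The printed pattern 0xc0980000 is the encoding of -4.75, the sum of the two values used in the worked example.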

Let’s apply this algorithm to adding $2.5$ and $-7.25$ to see how this works.

Let’s start by encoding these as 32-bit single-precision floating point numbers, performing the 6 steps, and looking at the result.

$2.5_{10} = 10.1_2 = 1.01_2\cdot2^1 = (-1)^0\cdot1.01000000000000000000000_2\cdot2^1$ . Therefore, the three fields of the float are sign = 0, exponent = 1, and significand = $1.01000000000000000000000_2$ . These are encoded as 0 10000000 01000000000000000000000 (where I’ve added 127 to the exponent and dropped the hidden bit from the significand).

$-7.25_{10} = -111.01_2 = (-1)^1\cdot1.11010000000000000000000_2\cdot2^2$. Therefore, the three fields of the float are sign = 1, exponent = 2, and significand = $1.11010000000000000000000_2$. Similarly, these are encoded as 1 10000001 11010000000000000000000.
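If you want to double-check encodings like these, Python's struct module can produce them. The helper name fields below is an assumption made for this sketch; it is not part of the lab.

```python
import struct

def fields(f):
    """Return (sign, biased exponent, fraction) of f as single-precision bit strings."""
    bits = struct.unpack(">I", struct.pack(">f", f))[0]
    s = format(bits, "032b")
    return s[0], s[1:9], s[9:]

print(*fields(2.5))    # 0 10000000 01000000000000000000000
print(*fields(-7.25))  # 1 10000001 11010000000000000000000
```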

Step 1. Neither x nor y is zero, so we continue to step 2.

Step 2. Extract and unbias the exponents for x and y; extract and add in the hidden bit to the significands for x and y. This gives the following.

x_exp = 1
y_exp = 2
x_sig = 00000000 1 01000000000000000000000
y_sig = 00000000 1 11010000000000000000000

In x_sig and y_sig, the first 8 bits are 0s, the next bit is the hidden 1 bit and the remaining 23 are the significand without the hidden bit.

Since y_exp > x_exp, we denormalize x by shifting x_sig right by y_exp - x_exp and setting x_exp to y_exp. This gives us the following variables.

x_exp = 2
y_exp = 2
x_sig = 00000000010100000000000000000000
y_sig = 00000000111010000000000000000000

Step 3. We need to add the significands together paying attention to the sign. Because x is positive and y is negative, we need to compute z_sig = x_sig - y_sig where z is going to be the result. In this case, z_sig is negative and we can’t have that, so we’ll set z_sign = 1 and z_sig = -z_sig. In any case, z_exp = x_exp = y_exp, although that can change in step 5 when we normalize the result.

z_sign = 1
z_exp = 2
z_sig = 00000000100110000000000000000000
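Steps 2 and 3 on these particular values can be reproduced with a few lines of Python. The variable names mirror the ones used in this walkthrough; writing the significands as hex literals is just a convenience for the sketch.

```python
# Fields after extraction: 2.5 and -7.25, hidden bits included.
x_sign, x_exp, x_sig = 0, 1, 0x00A00000
y_sign, y_exp, y_sig = 1, 2, 0x00E80000

# Step 2: shift the significand with the smaller exponent to the right.
if x_exp < y_exp:
    x_sig >>= y_exp - x_exp
    x_exp = y_exp
else:
    y_sig >>= x_exp - y_exp
    y_exp = x_exp

# Step 3: signed addition, then split into a sign bit and a magnitude.
total = (-x_sig if x_sign else x_sig) + (-y_sig if y_sign else y_sig)
z_sign = 1 if total < 0 else 0
z_sig = abs(total)
z_exp = x_exp

print(z_sign, z_exp, format(z_sig, "032b"))
# 1 2 00000000100110000000000000000000
```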

Step 4. The result is not zero, so move to the next step.

Step 5. Now, we need to normalize the result, but in this particular case, there’s actually nothing to be done! I can tell there’s nothing to be done because the significand is 00000000 1 00110000000000000000000. In other words, it starts with eight 0s and then a 1 (the hidden bit) followed by 23 bits. If it started with fewer than eight 0s, we’d need to shift the significand to the right and increment the exponent until it started with eight 0s. If it started with more than eight 0s, we’d need to shift the significand to the left and decrement the exponent until it started with eight 0s.

This gives our final value of

z_sign = 1
z_exp = 2
z_sig = 00000000 1 00110000000000000000000

We can check we did this correctly with some arithmetic: $(-1)^1\cdot1.0011_2\cdot2^2 = -100.11_2=-4.75$ . Success!
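The normalization rule from step 5 can also be written without a loop. The sketch below uses Python's int.bit_length to find the top set bit; the function name normalize is hypothetical, it assumes sig is positive, and it drops any bits shifted out on the right instead of rounding.

```python
def normalize(sig, exp):
    """Move the top set bit of sig to bit 23 (the hidden bit) and adjust exp to match."""
    shift = sig.bit_length() - 24   # > 0: too few leading zeros; < 0: too many
    if shift >= 0:
        return sig >> shift, exp + shift
    return sig << -shift, exp + shift

print(normalize(0x00980000, 2))   # already normalized: unchanged
print(normalize(0x01380000, 2))   # seven leading zeros: shift right by 1, exponent 3
print(normalize(0x00100000, 0))   # eleven leading zeros: shift left by 3, exponent -3
```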

Step 6. The last thing we need to do is encode the result. So we need to add 127 to the exponent and mask off the hidden bit in the significand giving 1 10000001 00110000000000000000000.

The previous example shows off all of the steps except for step 5, normalize the result. So let’s do another quick example, but this time using 2.5 and 7.25.

Steps 1 and 2 are identical. After them, we have

x_exp = 2
y_exp = 2
x_sig = 00000000010100000000000000000000
y_sig = 00000000111010000000000000000000

In step 3, we simply add the significands together since neither x nor y is negative. This gives

z_sign = 0
z_exp = 2
z_sig = 00000001001110000000000000000000

Step 4 is the same: z_sig is not 0, so we continue to step 5.

Step 5. Here, z_sig starts with seven 0s so we need to shift z_sig right by 1 and increment the exponent by one giving

z_sign = 0
z_exp = 3
z_sig = 00000000 1 00111000000000000000000

Checking our arithmetic, we have $(-1)^0\cdot1.00111_2\cdot2^3 = 1001.11_2 = 9.75$ . Success!

Step 6. We encode by adding 127 to the exponent and masking off the hidden bit as before giving 0 10000010 00111000000000000000000.

Multiplication

My implementation of xmult looks something like this.

If either x or y is 0, return 0;
Compute the sign of the result;
Add the exponents;
Multiply the significands and shift;
Normalize the result; and
Encode the exponent and significand.
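As with addition, here is a rough Python sketch of these six steps. It is an illustration under assumptions rather than the lab's solution: it operates on raw 32-bit patterns in plain integers instead of an Xfloat, the helper names are invented for the sketch, Python's unbounded integers stand in for a 64-bit long, the product is truncated rather than rounded, and special values and exponent overflow are not handled.

```python
import struct

BIAS = 127
HIDDEN = 1 << 23
FRAC_MASK = HIDDEN - 1

def float_to_bits(f):
    """Hypothetical helper: the 32-bit pattern of f as an integer."""
    return struct.unpack(">I", struct.pack(">f", f))[0]

def xmult(x, y):
    """Multiply two single-precision bit patterns (sketch; truncates, no special values)."""
    # Step 1: anything times zero is zero (the sign of zero is ignored here).
    if x & 0x7FFFFFFF == 0 or y & 0x7FFFFFFF == 0:
        return 0

    # Step 2: the result is negative exactly when the signs differ.
    z_sign = (x >> 31) ^ (y >> 31)

    # Step 3: add the unbiased exponents.
    z_exp = (((x >> 23) & 0xFF) - BIAS) + (((y >> 23) & 0xFF) - BIAS)

    # Step 4: multiply the 24-bit significands, then shift back to a 2**-23 scale.
    x_sig = (x & FRAC_MASK) | HIDDEN
    y_sig = (y & FRAC_MASK) | HIDDEN
    z_sig = (x_sig * y_sig) >> 23

    # Step 5: both significands are in [1, 2), so the product is in [1, 4)
    # and needs at most one right shift.
    if z_sig >= (HIDDEN << 1):
        z_sig >>= 1
        z_exp += 1

    # Step 6: bias the exponent, mask off the hidden bit, and pack the fields.
    return (z_sign << 31) | ((z_exp + BIAS) << 23) | (z_sig & FRAC_MASK)

print(hex(xmult(float_to_bits(-1.0), float_to_bits(33.25))))  # 0xc2050000
```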

As a simple example, let’s multiply $-1$ and $33.25$ . These have encodings

x: 1 01111111 00000000000000000000000
y: 0 10000100 00001010000000000000000

Step 1. Neither x nor y is 0, so continue to step 2.

Step 2. Since x is negative and y is positive, we know the result is negative.

z_sign = 1

Step 3. Add the exponents. Since the exponent of x is 0 and the exponent of y is 5, the exponent of the result is 5.

z_exp = 5

Step 4. Multiply the significands as 64-bit longs. Don’t forget the hidden bits!

x_sig = 00000000 00000000 00000000 00000000 00000000 10000000 00000000 00000000
y_sig = 00000000 00000000 00000000 00000000 00000000 10000101 00000000 00000000
z_sig = 00000000 00000000 01000010 10000000 00000000 00000000 00000000 00000000

The significands of x and y aren’t really $100000000000000000000000_2$ and $100001010000000000000000_2$ , they’re actually $1.00000000000000000000000_2$ and $1.00001010000000000000000_2$ (note the binary point). That is, the real significands are the 24-bit values (23 bits from the float plus the hidden bit) multiplied by $2^{-23}$ . And when we multiply them together, our result is really $00000000\, 00000000\, 01000010\, 10000000\, 00000000\, 00000000\, 00000000\, 00000000_2\times2^{-46}.$

To bring the result back to our normal representation as a number times $2^{-23}$ , we need to shift z_sig right by 23 bits.

z_sig = 00000000 10000101 00000000 00000000
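The product and the shift are easy to confirm numerically. In this small Python check the hex literals are the two 24-bit significands with their hidden bits, and Python's integers stand in for the 64-bit long.

```python
x_sig = 0x00800000              # 1.0, hidden bit included
y_sig = 0x00850000              # 1.0000101 in binary, hidden bit included

product = x_sig * y_sig         # scaled by 2**-46; at most 48 bits wide
z_sig = product >> 23           # back to a 2**-23 scale

print(format(product, "064b"))
print(format(z_sig, "032b"))    # 00000000100001010000000000000000
```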

Steps 5 and 6, normalizing and encoding the result, work the same as for addition.

In this case, there’s nothing to be done: z_sig is already normalized (as a 32-bit integer, it has 8 leading zeros followed by the 1 corresponding to the hidden bit). Thus, our result is $(-1)^1\cdot1.0000101_2\times2^5=-100001.01_2=-33.25$ and its encoding is 1 10000100 00001010000000000000000.