Lab 8: Floating Point examples
Lab 8 asks you to implement addition and multiplication for single-precision, 32-bit floating point numbers. A few worked examples are given below.
Addition
My implementation of xadd looks something like this.
- Check if either x or y is zero and return the other one if so;
- Denormalize whichever of x or y has the smaller exponent, so that both have the same exponent (the larger of the two);
- Add the significands, taking sign into account;
- If the result is 0, return 0;
- Normalize the result; and
- Encode the result (bias the exponent and mask off the hidden bit) and finally construct the new Xfloat with the result’s sign, exponent, and significand.
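To make the six steps concrete, here’s a minimal Python sketch. It is not the lab’s Xfloat code: it assumes the operands are raw 32-bit patterns stored in Python ints, handles only normal (finite, non-denormal) inputs besides the zero check, and simply drops bits shifted off the right end instead of rounding. The names xadd_bits, float_to_bits, and bits_to_float are hypothetical helpers invented for this example.

```python
import struct

def float_to_bits(f: float) -> int:
    """The 32-bit IEEE single-precision pattern of f."""
    return struct.unpack(">I", struct.pack(">f", f))[0]

def bits_to_float(b: int) -> float:
    """Interpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", b))[0]

def xadd_bits(x: int, y: int) -> int:
    """Add two singles given as 32-bit patterns (sketch: normal inputs only,
    truncation instead of rounding, no overflow/underflow handling)."""
    # Step 1: if either operand is zero (either sign), return the other.
    if (x & 0x7FFFFFFF) == 0:
        return y
    if (y & 0x7FFFFFFF) == 0:
        return x
    # Step 2: unpack fields, unbias exponents, restore the hidden bits...
    x_sign, y_sign = x >> 31, y >> 31
    x_exp = ((x >> 23) & 0xFF) - 127
    y_exp = ((y >> 23) & 0xFF) - 127
    x_sig = (x & 0x7FFFFF) | 0x800000
    y_sig = (y & 0x7FFFFF) | 0x800000
    # ...then shift the operand with the smaller exponent to the right.
    if x_exp < y_exp:
        x_sig >>= y_exp - x_exp
        x_exp = y_exp
    else:
        y_sig >>= x_exp - y_exp
    # Step 3: signed add; keep a separate sign bit and a nonnegative magnitude.
    z = (-x_sig if x_sign else x_sig) + (-y_sig if y_sign else y_sig)
    z_sign = 1 if z < 0 else 0
    z_sig = abs(z)
    z_exp = x_exp
    # Step 4: exact cancellation gives (positive) zero.
    if z_sig == 0:
        return 0
    # Step 5: normalize so the leading 1 sits at bit 23 (eight leading 0s).
    while z_sig >= 1 << 24:
        z_sig >>= 1
        z_exp += 1
    while z_sig < 1 << 23:
        z_sig <<= 1
        z_exp -= 1
    # Step 6: bias the exponent, drop the hidden bit, and pack the fields.
    return (z_sign << 31) | ((z_exp + 127) << 23) | (z_sig & 0x7FFFFF)
```

For instance, xadd_bits(float_to_bits(2.5), float_to_bits(-7.25)) produces the pattern for −4.75, matching the worked example that follows. Because shifted-out bits are truncated rather than rounded, this sketch can disagree with hardware addition in the last bit whenever the alignment shift discards 1 bits.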
Let’s apply this algorithm to adding 2.5 and −7.25 to see how this works.
Let’s start by encoding these as 32-bit single-precision floating-point numbers, performing the six steps, and looking at the result.
$2.5_{10} = 10.1_2 = 1.01_2 \cdot 2^1 = (-1)^0 \cdot 1.01000000000000000000000_2 \cdot 2^1$. Therefore, the three fields of the float are sign = 0, exponent = 1, and significand = $1.01000000000000000000000_2$. These are encoded as 0 10000000 01000000000000000000000 (where I’ve added 127 to the exponent and dropped the hidden bit from the significand).
$-7.25_{10} = -111.01_2 = (-1)^1 \cdot 1.11010000000000000000000_2 \cdot 2^2$. Therefore, the three fields of the float are sign = 1, exponent = 2, and significand = $1.11010000000000000000000_2$. Similarly, these are encoded as 1 10000001 11010000000000000000000.
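If you want to double-check encodings like these, Python’s standard struct module can reinterpret a float as its single-precision bit pattern. This sketch (the function names float_bits, fields, and show are hypothetical) pulls out the three fields and prints the bits grouped as sign, exponent, and significand.

```python
import struct

def float_bits(f: float) -> int:
    """The 32-bit IEEE single-precision pattern of f (f is rounded to single)."""
    return struct.unpack(">I", struct.pack(">f", f))[0]

def fields(f: float) -> tuple[int, int, int]:
    """(sign, unbiased exponent, 23 stored significand bits) of a normal float."""
    bits = float_bits(f)
    sign = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127  # undo the bias of 127
    fraction = bits & 0x7FFFFF              # the hidden bit is not stored
    return sign, exponent, fraction

def show(f: float) -> str:
    """Format the encoding as 'sign exponent significand' bit groups."""
    s = format(float_bits(f), "032b")
    return f"{s[0]} {s[1:9]} {s[9:]}"

print(show(2.5))    # 0 10000000 01000000000000000000000
print(show(-7.25))  # 1 10000001 11010000000000000000000
```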
Step 1. Neither x nor y is zero, so we continue to step 2.
Step 2. Extract and unbias the exponents for x and y; extract the significands for x and y and add in the hidden bit. This gives the following.
x_exp = 1
y_exp = 2
x_sig = 00000000 1 01000000000000000000000
y_sig = 00000000 1 11010000000000000000000
In x_sig and y_sig, the first 8 bits are 0s, the next bit is the hidden 1 bit, and the remaining 23 bits are the significand without the hidden bit.
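In code, building these working values takes a shift and a couple of masks. A small Python sketch, assuming the float arrives as a raw 32-bit pattern in a Python int (unpack is a hypothetical helper name):

```python
def unpack(bits: int) -> tuple[int, int, int]:
    """Split a normal single-precision pattern into (sign, exp, sig).

    exp is unbiased; sig is laid out as 8 zero bits, the hidden 1 bit
    (bit 23), then the 23 stored significand bits.
    """
    sign = bits >> 31
    exp = ((bits >> 23) & 0xFF) - 127    # remove the bias
    sig = (bits & 0x7FFFFF) | (1 << 23)  # restore the hidden bit
    return sign, exp, sig

x_sign, x_exp, x_sig = unpack(0x40200000)  # 2.5
y_sign, y_exp, y_sig = unpack(0xC0E80000)  # -7.25
print(x_exp, format(x_sig, "032b"))  # 1 00000000101000000000000000000000
print(y_exp, format(y_sig, "032b"))  # 2 00000000111010000000000000000000
```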
Since y_exp > x_exp, we denormalize x by shifting x_sig right by y_exp - x_exp (here, 1) and setting x_exp to y_exp. This gives us the following variables.
x_exp = 2
y_exp = 2
x_sig = 00000000010100000000000000000000
y_sig = 00000000111010000000000000000000
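The alignment step could be sketched in Python like this, assuming significands are plain ints laid out as in the lines above. The helper name align is hypothetical, and bits shifted off the right end are simply discarded (truncation), just as an ordinary right shift does.

```python
def align(x_exp: int, x_sig: int, y_exp: int, y_sig: int) -> tuple[int, int, int]:
    """Shift the significand with the smaller exponent right so that both
    operands share the larger exponent; return (exp, x_sig, y_sig).
    Bits shifted off the right end are dropped (truncation)."""
    if x_exp < y_exp:
        return y_exp, x_sig >> (y_exp - x_exp), y_sig
    return x_exp, x_sig, y_sig >> (x_exp - y_exp)

exp, x_sig, y_sig = align(1, 0x00A00000, 2, 0x00E80000)  # 2.5 and -7.25
print(exp)                    # 2
print(format(x_sig, "032b"))  # 00000000010100000000000000000000
```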
Step 3. We need to add the significands together, paying attention to the sign. Because x is positive and y is negative, we need to compute z_sig = x_sig - y_sig, where z is going to be the result. In this case, z_sig is negative, and we can’t have that because the sign is stored separately from the significand, so we’ll set z_sign = 1 and z_sig = -z_sig. In any case, z_exp = x_exp = y_exp, although that can change in step 5 when we normalize the result.
z_sign = 1
z_exp = 2
z_sig = 00000000100110000000000000000000
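Here is step 3 as a Python sketch: convert each sign-magnitude significand to a signed int, add, and split the sum back into a sign bit and a nonnegative magnitude. The helper name signed_add is hypothetical.

```python
def signed_add(x_sign: int, x_sig: int, y_sign: int, y_sig: int) -> tuple[int, int]:
    """Add two sign-magnitude significands; return (z_sign, z_sig), z_sig >= 0."""
    z = (-x_sig if x_sign else x_sig) + (-y_sig if y_sign else y_sig)
    if z < 0:
        return 1, -z  # negative sum: record the sign, keep the magnitude
    return 0, z

z_sign, z_sig = signed_add(0, 0x00500000, 1, 0x00E80000)  # 2.5 + (-7.25)
print(z_sign, format(z_sig, "032b"))  # 1 00000000100110000000000000000000
```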
Step 4. The result is not zero, so move to the next step.
Step 5. Now we need to normalize the result, but in this particular case, there’s actually nothing to be done! I can tell there’s nothing to be done because the significand is 00000000 1 00110000000000000000000. In other words, it starts with eight 0s and then a 1 (the hidden bit) followed by 23 bits. If it started with fewer than eight 0s, we’d need to shift the significand to the right and increment the exponent until it started with eight 0s. If it started with more than eight 0s, we’d need to shift the significand to the left and decrement the exponent until it started with eight 0s.
This gives our final value of
z_sign = 1
z_exp = 2
z_sig = 00000000 1 00110000000000000000000
We can check that we did this correctly with some arithmetic: $(-1)^1 \cdot 1.0011_2 \cdot 2^2 = -100.11_2 = -4.75$. Success!
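That check can also be done exactly in Python with the standard fractions module. This sketch (value is a hypothetical helper) assumes sig carries the hidden bit at bit 23, so its real value is $\mathit{sig} \cdot 2^{-23}$.

```python
from fractions import Fraction

def value(sign: int, exp: int, sig: int) -> Fraction:
    """Exact value of (-1)^sign * (sig * 2^-23) * 2^exp."""
    v = Fraction(sig, 1 << 23) * Fraction(2) ** exp
    return -v if sign else v

print(value(1, 2, 0x00980000))  # -19/4, i.e. -4.75
```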
Step 6. The last thing we need to do is encode the result. So we need to add 127 to the exponent and mask off the hidden bit in the significand, giving 1 10000001 00110000000000000000000.
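Encoding is the reverse of unpacking. Here’s a Python sketch, assuming a normalized sig (hidden bit at bit 23) and an exponent in the normal range −126 to 127; encode is a hypothetical name, and struct is used only to confirm the result.

```python
import struct

def encode(sign: int, exp: int, sig: int) -> int:
    """Bias the exponent, mask off the hidden bit, and pack the three fields."""
    return (sign << 31) | ((exp + 127) << 23) | (sig & 0x7FFFFF)

bits = encode(1, 2, 0x00980000)
s = format(bits, "032b")
print(s[0], s[1:9], s[9:])  # 1 10000001 00110000000000000000000
print(struct.unpack(">f", struct.pack(">I", bits))[0])  # -4.75
```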
The previous example shows off all of the steps except step 5 (normalizing the result), which had nothing to do. So let’s do another quick example, this time with 2.5 and 7.25.
Steps 1 and 2 are identical. After them, we have
x_exp = 2
y_exp = 2
x_sig = 00000000010100000000000000000000
y_sig = 00000000111010000000000000000000
In step 3, we simply add the significands together since neither x nor y is negative. This gives the following.
z_sign = 0
z_exp = 2
z_sig = 00000001001110000000000000000000
Step 4 is the same: z_sig is not 0, so we continue to step 5.
Step 5. Here, z_sig starts with seven 0s, so we need to shift z_sig right by 1 and increment the exponent by one, giving
z_sign = 0
z_exp = 3
z_sig = 00000000 1 00111000000000000000000
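Both normalization cases can be written as two loops. This Python sketch (normalize is a hypothetical helper) treats z_sig as a 32-bit value whose leading 1 should end up at bit 23, i.e. with exactly eight leading 0s. It assumes z_sig is nonzero, which step 4 guarantees, and it drops any bits shifted off the right.

```python
def normalize(z_exp: int, z_sig: int) -> tuple[int, int]:
    """Shift z_sig until its leading 1 is bit 23, adjusting z_exp to match."""
    while z_sig >= 1 << 24:  # fewer than eight leading 0s: shift right
        z_sig >>= 1
        z_exp += 1
    while z_sig < 1 << 23:   # more than eight leading 0s: shift left
        z_sig <<= 1
        z_exp -= 1
    return z_exp, z_sig

z_exp, z_sig = normalize(2, 0x01380000)  # the 2.5 + 7.25 sum from step 3
print(z_exp, format(z_sig, "032b"))  # 3 00000000100111000000000000000000
```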
Checking our arithmetic, we have $(-1)^0 \cdot 1.00111_2 \cdot 2^3 = 1001.11_2 = 9.75$. Success!
Step 6. We encode by adding 127 to the exponent and masking off the hidden bit as before, giving 0 10000010 00111000000000000000000.
Multiplication
My implementation of xmult looks something like this.
- If either x or y is 0, return 0;
- Compute the sign of the result;
- Add the (unbiased) exponents;
- Multiply the significands and shift;
- Normalize the result; and
- Encode the exponent and significand.
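As with addition, here’s a minimal Python sketch of these six steps. It is not the lab’s Xfloat code: it assumes raw 32-bit patterns in Python ints and normal, finite inputs, truncates instead of rounding, and ignores exponent overflow and underflow. The names xmult_bits, float_to_bits, and bits_to_float are hypothetical.

```python
import struct

def float_to_bits(f: float) -> int:
    """The 32-bit IEEE single-precision pattern of f."""
    return struct.unpack(">I", struct.pack(">f", f))[0]

def bits_to_float(b: int) -> float:
    """Interpret a 32-bit pattern as a single-precision float."""
    return struct.unpack(">f", struct.pack(">I", b))[0]

def xmult_bits(x: int, y: int) -> int:
    """Multiply two singles given as 32-bit patterns (sketch only)."""
    # Step 1: zero times anything is zero.
    if (x & 0x7FFFFFFF) == 0 or (y & 0x7FFFFFFF) == 0:
        return 0
    # Step 2: the result is negative iff exactly one operand is negative.
    z_sign = (x >> 31) ^ (y >> 31)
    # Step 3: add the unbiased exponents.
    z_exp = (((x >> 23) & 0xFF) - 127) + (((y >> 23) & 0xFF) - 127)
    # Step 4: multiply the 24-bit significands (hidden bits restored); the
    # product is scaled by 2^-46, so shift right 23 to get back to 2^-23.
    x_sig = (x & 0x7FFFFF) | 0x800000
    y_sig = (y & 0x7FFFFF) | 0x800000
    z_sig = (x_sig * y_sig) >> 23
    # Step 5: a product of two values in [1, 2) lies in [1, 4), so at most
    # one right shift is needed.
    if z_sig >= 1 << 24:
        z_sig >>= 1
        z_exp += 1
    # Step 6: bias the exponent, drop the hidden bit, and pack.
    return (z_sign << 31) | ((z_exp + 127) << 23) | (z_sig & 0x7FFFFF)
```

The single conditional shift in step 5 relies on both significands being normalized; the general left/right normalization loops would also work here, they just never run more than once.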
As a simple example, let’s multiply −1 and 33.25. These have encodings
x: 1 01111111 00000000000000000000000
y: 0 10000100 00001010000000000000000
Step 1. Neither x nor y is 0, so continue to step 2.
Step 2. Since x is negative and y is positive, we know the result is negative.
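Since each sign is a single bit, the sign of a product is just the XOR of the two sign bits. A one-line Python sketch, assuming raw 32-bit patterns in Python ints (product_sign is a hypothetical name):

```python
def product_sign(x_bits: int, y_bits: int) -> int:
    """Sign bit of x*y: 1 iff exactly one of the sign bits is 1."""
    return (x_bits >> 31) ^ (y_bits >> 31)

print(product_sign(0xBF800000, 0x42050000))  # 1, since (-1) * 33.25 < 0
```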
Step 3. Add the exponents. Since the (unbiased) exponent of x is 0 and the exponent of y is 5, the exponent of the result is 0 + 5 = 5.
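In code, this step works on the unbiased exponents. If you add the stored (biased) fields instead, each carries a bias of 127, so the sum has to be corrected by subtracting 127 once. A Python sketch with the hypothetical helper product_exponent:

```python
def product_exponent(x_bits: int, y_bits: int) -> int:
    """Unbiased exponent of x*y before normalization."""
    x_exp = ((x_bits >> 23) & 0xFF) - 127
    y_exp = ((y_bits >> 23) & 0xFF) - 127
    return x_exp + y_exp

x, y = 0xBF800000, 0x42050000  # -1.0 and 33.25
print(product_exponent(x, y))  # 5
# Same result from the biased fields: 127 + 132 - 127 = 132 = 5 + 127.
print(((x >> 23) & 0xFF) + ((y >> 23) & 0xFF) - 127)  # 132
```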
Step 4. Multiply the significands as 64-bit longs, and don’t forget the hidden bits!
x_sig = 00000000 00000000 00000000 00000000 00000000 10000000 00000000 00000000
y_sig = 00000000 00000000 00000000 00000000 00000000 10000101 00000000 00000000
z_sig = 00000000 00000000 01000010 10000000 00000000 00000000 00000000 00000000
The significands of x and y aren’t really $100000000000000000000000_2$ and $100001010000000000000000_2$; they’re actually $1.00000000000000000000000_2$ and $1.00001010000000000000000_2$ (note the binary point). That is, the real significands are the 24-bit values (23 bits from the float plus the hidden bit) multiplied by $2^{-23}$. And when we multiply them together, our result is really $00000000\,00000000\,01000010\,10000000\,00000000\,00000000\,00000000\,00000000_2 \times 2^{-46}$.
To bring the result back to our normal representation as a number times $2^{-23}$, we need to shift z_sig right by 23 bits.
z_sig = 00000000 10000101 00000000 00000000
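Python’s integers are unbounded, so a 64-bit long isn’t needed there, but this sketch mirrors the 64-bit computation above to show the product and the 23-bit shift. The 24-bit significands are written out as hex constants, and groups is a hypothetical formatting helper.

```python
x_sig = 0x00800000  # 1.00000000000000000000000 in binary, hidden bit at bit 23
y_sig = 0x00850000  # 1.00001010000000000000000 in binary

# Two 24-bit values multiply to at most 48 bits, so this fits in a 64-bit long.
product = x_sig * y_sig  # scaled by 2^-46
z_sig = product >> 23    # rescale to 2^-23

def groups(n: int, width: int) -> str:
    """Binary digits of n, zero-padded to width, in space-separated bytes."""
    s = format(n, f"0{width}b")
    return " ".join(s[i:i + 8] for i in range(0, width, 8))

print(groups(product, 64))  # 00000000 00000000 01000010 10000000 00000000 00000000 00000000 00000000
print(groups(z_sig, 32))    # 00000000 10000101 00000000 00000000
```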
Steps 5 and 6, normalizing and encoding the result, work the same as for addition.
In this case, there’s nothing to be done: z_sig is already normalized (as a 32-bit integer, it has 8 leading zeros followed by the 1 corresponding to the hidden bit). Thus, our result is $(-1)^1 \cdot 1.0000101_2 \times 2^5 = -100001.01_2 = -33.25$ and its encoding is 1 10000100 00001010000000000000000.