# Lab 8: Floating Point examples

Lab 8 asks you to implement addition and multiplication for single-precision, 32-bit floating point numbers. A few worked examples are given below.

## Addition

My implementation of `xadd` looks something like this.

- Check if either `x` or `y` is zero and return the other one if so;
- Denormalize `x` or `y` to give them the same exponent (the exponent of the larger one);
- Add the significands, taking sign into account;
- If the result is 0, return 0;
- Normalize the result; and
- Encode the result (bias the exponent and mask off the hidden bit) and finally construct the new `Xfloat` with the result’s sign, exponent, and significand.
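The six steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the lab's reference solution: it works on raw 32-bit patterns held in plain integers instead of an `Xfloat` object, the helper constants are hypothetical names, bits shifted out are truncated rather than rounded, and infinities, NaNs, subnormals, and exponent overflow are ignored.

```python
BIAS = 127
HIDDEN = 1 << 23          # hypothetical name: the hidden bit sits above the 23 stored bits
FRAC_MASK = HIDDEN - 1    # hypothetical name: mask for the 23 stored significand bits

def xadd(x: int, y: int) -> int:
    """Add two floats given as 32-bit patterns (zero and normal numbers only)."""
    # Step 1: if either operand is zero, return the other one.
    if x & 0x7FFFFFFF == 0:
        return y
    if y & 0x7FFFFFFF == 0:
        return x

    # Decode: sign, unbiased exponent, significand with the hidden bit added in.
    x_sign, y_sign = x >> 31, y >> 31
    x_exp = ((x >> 23) & 0xFF) - BIAS
    y_exp = ((y >> 23) & 0xFF) - BIAS
    x_sig = (x & FRAC_MASK) | HIDDEN
    y_sig = (y & FRAC_MASK) | HIDDEN

    # Step 2: denormalize the operand with the smaller exponent.
    if x_exp < y_exp:
        x_sig >>= y_exp - x_exp
        x_exp = y_exp
    else:
        y_sig >>= x_exp - y_exp
        y_exp = x_exp

    # Step 3: add the significands, taking sign into account.
    total = (-x_sig if x_sign else x_sig) + (-y_sig if y_sign else y_sig)
    z_sign = 1 if total < 0 else 0
    z_sig = abs(total)
    z_exp = x_exp

    # Step 4: a zero result is returned directly.
    if z_sig == 0:
        return 0

    # Step 5: normalize so that the leading 1 ends up at bit 23.
    while z_sig >= (HIDDEN << 1):
        z_sig >>= 1
        z_exp += 1
    while z_sig < HIDDEN:
        z_sig <<= 1
        z_exp -= 1

    # Step 6: bias the exponent, mask off the hidden bit, and pack the fields.
    return (z_sign << 31) | ((z_exp + BIAS) << 23) | (z_sig & FRAC_MASK)

print(hex(xadd(0x40200000, 0xC0E80000)))  # 2.5 + -7.25 -> 0xc0980000, i.e. -4.75
```

The two worked examples that follow trace exactly these steps by hand.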

Let’s apply this algorithm to adding $2.5$ and $-7.25$ to see how this works.

Let’s start by encoding each of these as a 32-bit single-precision floating point number; then we’ll perform the 6 steps and look at the result.

$2.5_{10} = 10.1_2 = 1.01_2\cdot2^1 = (-1)^0\cdot1.01000000000000000000000_2\cdot2^1$. Therefore, the three fields of the float are sign = 0, exponent = 1, and significand = $1.01000000000000000000000_2$. These are encoded as `0 10000000 01000000000000000000000` (where I’ve added 127 to the exponent and dropped the hidden bit from the significand).

$-7.25_{10} = -111.01_2 = (-1)^1\cdot1.11010000000000000000000_2\cdot2^2$. Therefore, the three fields of the float are sign = 1, exponent = 2, and significand = $1.11010000000000000000000_2$. Similarly, these are encoded as `1 10000001 11010000000000000000000`.
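One way to double-check hand encodings like these is to ask Python for the actual bit pattern. The sketch below uses the standard `struct` module; the helper name `fields` is made up for this illustration.

```python
import struct

def fields(value: float) -> str:
    """Return the sign, exponent, and fraction fields of a 32-bit float as bit strings."""
    # Pack as a big-endian single-precision float, then reinterpret as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    b = format(bits, "032b")
    return f"{b[0]} {b[1:9]} {b[9:]}"

print(fields(2.5))    # 0 10000000 01000000000000000000000
print(fields(-7.25))  # 1 10000001 11010000000000000000000
```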

Step 1. Neither `x` nor `y` is zero, so we continue to step 2.

Step 2. Extract and unbias the exponents for `x` and `y`; extract and add in the hidden bit to the significands for `x` and `y`. This gives the following.

```
x_exp = 1
y_exp = 2
x_sig = 00000000 1 01000000000000000000000
y_sig = 00000000 1 11010000000000000000000
```

In `x_sig` and `y_sig`, the first 8 bits are 0s, the next bit is the hidden 1 bit, and the remaining 23 are the significand without the hidden bit.
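The extraction described here comes down to a few shifts and masks. A minimal sketch, assuming the float is available as a 32-bit integer pattern; `unpack_fields` and `HIDDEN_BIT` are hypothetical names, not part of the lab's starter code.

```python
BIAS = 127
HIDDEN_BIT = 1 << 23  # hypothetical name for the implicit leading 1

def unpack_fields(bits: int) -> tuple:
    """Split a 32-bit pattern into (sign, unbiased exponent, 24-bit significand)."""
    sign = (bits >> 31) & 1
    exp = ((bits >> 23) & 0xFF) - BIAS             # remove the bias of 127
    sig = (bits & (HIDDEN_BIT - 1)) | HIDDEN_BIT   # keep 23 stored bits, add hidden bit
    return sign, exp, sig

x_sign, x_exp, x_sig = unpack_fields(0x40200000)   # pattern for 2.5
y_sign, y_exp, y_sig = unpack_fields(0xC0E80000)   # pattern for -7.25
print(x_exp, format(x_sig, "032b"))  # 1 00000000101000000000000000000000
print(y_exp, format(y_sig, "032b"))  # 2 00000000111010000000000000000000
```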

Since `y_exp > x_exp`, we denormalize `x` by shifting `x_sig` *right* by `y_exp - x_exp` and setting `x_exp` to `y_exp`. This gives us the following variables.

```
x_exp = 2
y_exp = 2
x_sig = 00000000010100000000000000000000
y_sig = 00000000111010000000000000000000
```
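The denormalization can be written as a small helper that handles either operand being the smaller one. This is a sketch with a made-up function name, `align`; it simply drops the bits that are shifted out, so it does no rounding.

```python
def align(x_exp, x_sig, y_exp, y_sig):
    """Shift the significand with the smaller exponent right until the exponents match."""
    if x_exp < y_exp:
        x_sig >>= y_exp - x_exp   # bits shifted out are dropped (no rounding)
        x_exp = y_exp
    elif y_exp < x_exp:
        y_sig >>= x_exp - y_exp
        y_exp = x_exp
    return x_exp, x_sig, y_exp, y_sig

# Values from the example: x = 2.5 (exponent 1), y = -7.25 (exponent 2).
x_exp, x_sig, y_exp, y_sig = align(1, 0xA00000, 2, 0xE80000)
print(x_exp, format(x_sig, "032b"))  # 2 00000000010100000000000000000000
print(y_exp, format(y_sig, "032b"))  # 2 00000000111010000000000000000000
```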

Step 3. We need to add the significands together, paying attention to the sign. Because `x` is positive and `y` is negative, we need to compute `z_sig = x_sig - y_sig`, where `z` is going to be the result. In this case, `z_sig` is negative and we can’t have that, so we’ll set `z_sign = 1` and `z_sig = -z_sig`. In any case, `z_exp = x_exp = y_exp`, although that can change in step 5 when we normalize the result.

```
z_sign = 1
z_exp = 2
z_sig = 00000000100110000000000000000000
```
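The sign handling in step 3 can be sketched by converting each sign-magnitude significand to an ordinary signed integer, adding, and converting back. The function name `add_signed` is hypothetical.

```python
def add_signed(x_sign, x_sig, y_sign, y_sig):
    """Add two sign-magnitude significands; return (z_sign, z_sig) with z_sig >= 0."""
    total = (-x_sig if x_sign else x_sig) + (-y_sig if y_sign else y_sig)
    if total < 0:
        return 1, -total   # negative result: record the sign and keep the magnitude
    return 0, total

# Aligned significands from the example: x positive, y negative.
z_sign, z_sig = add_signed(0, 0x500000, 1, 0xE80000)
print(z_sign, format(z_sig, "032b"))  # 1 00000000100110000000000000000000
```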

Step 4. The result is not zero, so move to the next step.

Step 5. Now we need to normalize the result, but in this particular case, there’s actually nothing to be done! I can tell there’s nothing to be done because the significand is `00000000 1 00110000000000000000000`. In other words, it starts with eight 0s and then a 1 (the hidden bit) followed by 23 bits. If it started with fewer than eight 0s, we’d need to shift the significand to the right and increment the exponent until it started with eight 0s. If it started with more than eight 0s, we’d need to shift the significand to the left and decrement the exponent until it started with eight 0s.
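The normalization rule can be sketched without a loop by measuring where the leading 1 currently sits. This version uses Python's `int.bit_length()`; the name `normalize` is made up for this example, and it assumes `z_sig` is nonzero (step 4 has already handled zero).

```python
def normalize(z_exp, z_sig):
    """Shift a nonzero z_sig until its leading 1 is at bit 23, adjusting z_exp to match."""
    shift = z_sig.bit_length() - 24   # > 0: fewer than eight leading 0s; < 0: more
    if shift > 0:
        z_sig >>= shift               # shift right, exponent goes up
    elif shift < 0:
        z_sig <<= -shift              # shift left, exponent goes down
    return z_exp + shift, z_sig

print(normalize(2, 0x980000) == (2, 0x980000))    # True: already normalized
print(normalize(2, 0x1380000) == (3, 0x9C0000))   # True: one right shift
```

In a language with fixed-width integers, counting leading zeros (or looping one bit at a time, as described above) plays the role of `bit_length()`.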

This gives our final value of

```
z_sign = 1
z_exp = 2
z_sig = 00000000 1 00110000000000000000000
```

We can check we did this correctly with some arithmetic: $(-1)^1\cdot1.0011_2\cdot2^2 = -100.11_2=-4.75$. Success!

Step 6. The last thing we need to do is encode the result. So we need to add 127 to the exponent and mask off the hidden bit in the significand, giving `1 10000001 00110000000000000000000`.
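Encoding is the reverse of the extraction in step 2. A minimal sketch, assuming the result is packed into a plain 32-bit integer instead of an `Xfloat`; `encode` is a hypothetical name, and `struct` is used only to confirm the value.

```python
import struct

def encode(z_sign, z_exp, z_sig):
    """Pack sign, unbiased exponent, and normalized 24-bit significand into 32 bits."""
    biased = z_exp + 127              # bias the exponent
    frac = z_sig & 0x7FFFFF           # mask off the hidden bit
    return (z_sign << 31) | (biased << 23) | frac

bits = encode(1, 2, 0x980000)
print(format(bits, "032b"))  # 11000000100110000000000000000000
# Reinterpret the pattern as a float to confirm the value.
print(struct.unpack(">f", struct.pack(">I", bits))[0])  # -4.75
```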

The previous example exercised every step except step 5, normalizing the result, where there was nothing to do. So let’s do another quick example, this time adding $2.5$ and $7.25$.

Steps 1 and 2 are identical. After them, we have

```
x_exp = 2
y_exp = 2
x_sig = 00000000010100000000000000000000
y_sig = 00000000111010000000000000000000
```

In step 3, we simply add the significands together since neither `x` nor `y` is negative. This gives

```
z_sign = 0
z_exp = 2
z_sig = 00000001001110000000000000000000
```

Step 4 is the same: `z_sig` is not 0, so we continue to step 5.

Step 5. Here, `z_sig` starts with seven 0s, so we need to shift `z_sig` right by 1 and increment the exponent by one, giving

```
z_sign = 0
z_exp = 3
z_sig = 00000000 1 00111000000000000000000
```

Checking our arithmetic, we have $(-1)^0\cdot1.00111_2\cdot2^3 = 1001.11_2 = 9.75$. Success!

Step 6. We encode by adding 127 to the exponent and masking off the hidden bit as before, giving `0 10000010 00111000000000000000000`.

## Multiplication

My implementation of `xmult` looks something like this.

- If either `x` or `y` is 0, return 0;
- Compute the sign of the result;
- Add the exponents;
- Multiply the significands and shift;
- Normalize the result; and
- Encode the exponent and significand.
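As with addition, the steps above can be sketched in Python on raw 32-bit patterns. The same caveats apply: this is an illustration rather than the lab's reference solution, the constant names are hypothetical, low-order product bits are truncated rather than rounded, and special values and exponent overflow are not handled.

```python
BIAS = 127
HIDDEN = 1 << 23          # hypothetical name for the hidden bit
FRAC_MASK = HIDDEN - 1    # hypothetical name for the 23-bit fraction mask

def xmult(x: int, y: int) -> int:
    """Multiply two floats given as 32-bit patterns (zero and normal numbers only)."""
    # Step 1: if either operand is zero, the product is zero.
    if x & 0x7FFFFFFF == 0 or y & 0x7FFFFFFF == 0:
        return 0

    # Step 2: the result is negative exactly when the signs differ.
    z_sign = (x >> 31) ^ (y >> 31)

    # Step 3: add the unbiased exponents.
    z_exp = (((x >> 23) & 0xFF) - BIAS) + (((y >> 23) & 0xFF) - BIAS)

    # Step 4: multiply the 24-bit significands, then shift right by 23 so the
    # result is again scaled by 2**-23.
    x_sig = (x & FRAC_MASK) | HIDDEN
    y_sig = (y & FRAC_MASK) | HIDDEN
    z_sig = (x_sig * y_sig) >> 23

    # Step 5: both significands are in [1, 2), so the product is in [1, 4)
    # and needs at most one right shift.
    if z_sig >= (HIDDEN << 1):
        z_sig >>= 1
        z_exp += 1

    # Step 6: bias the exponent, mask off the hidden bit, and pack the fields.
    return (z_sign << 31) | ((z_exp + BIAS) << 23) | (z_sig & FRAC_MASK)

print(hex(xmult(0xBF800000, 0x42050000)))  # -1 * 33.25 -> 0xc2050000, i.e. -33.25
```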

As a simple example, let’s multiply $-1$ and $33.25$. These have encodings

```
x: 1 01111111 00000000000000000000000
y: 0 10000100 00001010000000000000000
```

Step 1. Neither `x` nor `y` is 0, so continue to step 2.

Step 2. Since `x` is negative and `y` is positive, we know the result is negative.

```
z_sign = 1
```

Step 3. Add the exponents. Since the exponent of `x` is 0 and the exponent of `y` is 5, the exponent of the result is 5.

```
z_exp = 5
```

Step 4. Multiply the significands as 64-bit `long`s, and don’t forget the hidden bits!

```
x_sig = 00000000 00000000 00000000 00000000 00000000 10000000 00000000 00000000
y_sig = 00000000 00000000 00000000 00000000 00000000 10000101 00000000 00000000
z_sig = 00000000 00000000 01000010 10000000 00000000 00000000 00000000 00000000
```

The significands of `x` and `y` aren’t really $100000000000000000000000_2$ and $100001010000000000000000_2$; they’re actually $1.00000000000000000000000_2$ and $1.00001010000000000000000_2$ (note the binary point). That is, the real significands are the 24-bit values (23 bits from the float plus the hidden bit) multiplied by $2^{-23}$. And when we multiply them together, our result is really $00000000\, 00000000\, 01000010\, 10000000\, 00000000\, 00000000\, 00000000\, 00000000_2\times2^{-46}.$

To bring the result back to our normal representation as a number times $2^{-23}$, we need to shift `z_sig` right by 23 bits.

```
z_sig = 00000000 10000101 00000000 00000000
```
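The scaling argument can be checked directly. In the sketch below the significands are the 24-bit integers from the example. Python integers never overflow, so the 64-bit requirement does not show up here; in a fixed-width language the product of two 24-bit values needs up to 48 bits, which is why the multiplication is done in a 64-bit type.

```python
x_sig = 0x800000   # 1.0 scaled by 2**23 (hidden bit included)
y_sig = 0x850000   # 1.0000101 (binary) scaled by 2**23

product = x_sig * y_sig   # scaled by 2**46
z_sig = product >> 23     # back to a value scaled by 2**23

print(product.bit_length())     # 47, so it fits comfortably in 64 bits
print(format(z_sig, "032b"))    # 00000000100001010000000000000000
print(z_sig / 2**23 * 2**5)     # 33.25, the magnitude of the result
```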

Steps 5 and 6, normalizing and encoding the result, work the same as for addition.

In this case, there’s nothing to be done: `z_sig` is already normalized (as a 32-bit integer, it has 8 leading zeros followed by the 1 corresponding to the hidden bit). Thus, our result is $(-1)^1\cdot1.0000101_2\times2^5=-100001.01_2=-33.25$ and its encoding is `1 10000100 00001010000000000000000`.