Lab 8: Floating Point

Due: Sunday, January 2 at 23:59

In this lab, you will implement part of the IEEE 754 Standard for Floating-Point Arithmetic. In particular, you will implement single precision (32-bit) floating point addition and multiplication.

Preliminaries

You may discuss your implementation with a partner; however, each of you needs to write and submit your own code. So no copying and pasting, but you should feel free to work together on a solution and to debug.

Click on the assignment link.

Once you have accepted the assignment, you can clone the repository on your computer by following the instructions and begin working.

Be sure to ask any questions on Piazza.

Program Specification

In the assignment repo, you’ll find a single Java file, Xfloat.java, which defines an Xfloat class. This class represents a single-precision floating point number with the following members.

private byte sign;
private byte exponent;
private int significand;

Of course, these represent the three fields of a floating point number.

The file also contains a main method that takes two floats as command-line arguments and prints them out along with the product and sum of the numbers. For example, running the program with floats 2.5 and 7.25 gives this output.

$ java Xfloat 2.5 7.25
x:   40200000 (0 10000000 01000000000000000000000) sign: 0 exp: 80 sig: 200000 2.5
y:   40E80000 (0 10000001 11010000000000000000000) sign: 0 exp: 81 sig: 680000 7.25
x*y: 00000000 (0 00000000 00000000000000000000000) sign: 0 exp: 00 sig: 000000 0.0
x+y: 00000000 (0 00000000 00000000000000000000000) sign: 0 exp: 00 sig: 000000 0.0

The first number on each line (8 hex digits) is the bit pattern of the actual IEEE floating point number. The binary representation follows, separated into sign, exponent, and significand fields. Next are the corresponding Xfloat fields in hex. Note that the significand field requires only 6 hex digits since only 23 of the 32 bits are actually used.
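
To make the field layout concrete, here is a minimal sketch that pulls the three fields out of a Java float with masks and shifts. The class FieldDemo and its constants are illustrative names only; they are not part of Xfloat.java, whose own XXX_MASK constants may be named or defined differently.

```java
// Illustrative only: FieldDemo and its constants are assumptions, not part of Xfloat.java.
class FieldDemo {
    static final int EXP_MASK = 0x7F800000; // bits 30..23 hold the biased exponent
    static final int SIG_MASK = 0x007FFFFF; // bits 22..0 hold the stored significand

    // Sign bit (bit 31); >>> shifts in zeros, so the result is 0 or 1.
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // Biased exponent as an int in the range [0, 255].
    static int exponent(float f) {
        return (Float.floatToIntBits(f) & EXP_MASK) >>> 23;
    }

    // The 23 stored significand bits, without the hidden bit.
    static int significand(float f) {
        return Float.floatToIntBits(f) & SIG_MASK;
    }

    public static void main(String[] args) {
        float f = 7.25f;
        // Prints: 40E80000 sign: 0 exp: 81 sig: 680000
        System.out.printf("%08X sign: %d exp: %02X sig: %06X%n",
                Float.floatToIntBits(f), sign(f), exponent(f), significand(f));
    }
}
```

The printed values match the y line of the sample output above.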

You only need to implement two methods.

public static Xfloat xadd(Xfloat x, Xfloat y): Add Xfloats x and y together and return the result.
public static Xfloat xmult(Xfloat x, Xfloat y): Multiply Xfloats x and y together and return the result.

You must implement the methods by operating directly on the three fields of each Xfloat.

You should not modify any other methods.

You may ignore subnormal numbers, infinities, and NaN, but you must properly handle 0.0.

You must use bit masking and shifting to manipulate bits. In particular, you are not allowed to convert your integers to strings.

Hints

When implementing xadd, you’ll need to shift the significand of the number with the smaller magnitude so that the two numbers have the same exponent. However, if the number of bits you need to shift by is greater than 31, you should just set the significand to 0 rather than trying to shift by more than 31. This is because, for an int val, val >> 32 in Java is the same thing as val, and val >> 33 is the same as val >> 1. In general, if shamt >= 32, then val >> shamt is the same as val >> (shamt % 32), which can lead to surprising results.
For example, when adding 0.000000237 and 234624.0, the exponent for the first is $-23$ and the exponent for the second is $17$. To make the first have the same exponent as the second, we’d need to shift the first’s significand right by $17 - (-23) = 40$ bits. In this case, the final result should be 234624.0.
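The wrap-around behavior of int shifts is easy to check directly. The sketch below assumes a non-negative shift amount. The helper safeShiftRight is a hypothetical name used only for illustration, and the significand value is arbitrary.

```java
// Illustrative only: ShiftDemo and safeShiftRight are hypothetical names.
class ShiftDemo {
    // Logical right shift that yields 0 once every bit would be shifted out.
    // Assumes shamt >= 0.
    static int safeShiftRight(int val, int shamt) {
        return (shamt > 31) ? 0 : (val >>> shamt);
    }

    public static void main(String[] args) {
        int sig = 0x00FE7D5C; // arbitrary 24-bit significand with the hidden bit set

        // For ints, Java uses only the low 5 bits of the shift amount.
        System.out.println((sig >> 32) == sig);        // true
        System.out.println((sig >> 33) == (sig >> 1)); // true
        System.out.println((sig >> 40) == (sig >> 8)); // true

        // Guarding the shift gives the intended result for a 40-bit alignment.
        System.out.println(safeShiftRight(sig, 40));   // 0
    }
}
```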
When implementing xmult, you’ll need to multiply the significands. You should perform the following steps.
1. Convert the significands to 64-bit longs;
2. Use Xfloat.HIDDEN_BIT to add the hidden bit to the significands;
3. Multiply the two 64-bit significands;
4. Shift the result appropriately (making necessary adjustments to the exponent); and
5. Cast the significand back to 32 bits and remove the implicit 1.
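The steps above can be sketched for the significand and exponent alone. This is a rough illustration under several assumptions, not the required implementation: the constants are defined locally (the value of Xfloat.HIDDEN_BIT is assumed to be 0x00800000), both inputs are normal and nonzero, extra low-order bits are simply truncated rather than rounded, and the sign, zero, and exponent overflow are not handled.

```java
// Illustrative only: MultSketch and its constants are assumptions, not Xfloat code.
class MultSketch {
    static final int HIDDEN_BIT = 0x00800000; // assumed position of the implicit leading 1
    static final int SIG_MASK = 0x007FFFFF;   // low 23 bits
    static final int BIAS = 127;

    // Returns {biased exponent, stored significand} for the product of two
    // normal, nonzero values given as biased exponents and stored significands.
    static int[] multiply(int expX, int sigX, int expY, int sigY) {
        long a = (long) (sigX | HIDDEN_BIT); // steps 1 and 2: widen, add hidden bit
        long b = (long) (sigY | HIDDEN_BIT);
        long product = a * b;                // step 3: value in [1, 4), 47 or 48 bits
        int exp = expX + expY - BIAS;        // adding biased exponents counts the bias twice

        // Step 4: normalize so the leading 1 ends up at bit 23.
        if ((product & (1L << 47)) != 0) {   // product >= 2.0
            product >>>= 24;
            exp++;
        } else {
            product >>>= 23;
        }

        int sig = (int) product & SIG_MASK;  // step 5: narrow and drop the hidden bit
        return new int[] { exp, sig };
    }

    public static void main(String[] args) {
        // 2.5 * 7.25 = 18.125, using the field values from the sample output.
        int[] r = multiply(0x80, 0x200000, 0x81, 0x680000);
        System.out.printf("exp: %02X sig: %06X%n", r[0], r[1]); // exp: 83 sig: 110000
    }
}
```

Packing sign 0, exponent 0x83, and significand 0x110000 gives 0x41910000, which is the bit pattern of 18.125f.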
The various XXX_MASK constants defined in Xfloat may be helpful.
In Java, the unsigned right shift operator is >>>.
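As a small illustration of the difference between the two right shifts, the sketch below extracts the sign bit of a negative float. The class and method names are hypothetical.

```java
// Illustrative only: UnsignedShiftDemo and signBit are hypothetical names.
class UnsignedShiftDemo {
    // >>> fills the vacated high bits with zeros, so the result is 0 or 1.
    static int signBit(int bits) {
        return bits >>> 31;
    }

    public static void main(String[] args) {
        int bits = Float.floatToIntBits(-2.5f); // 0xC0200000, sign bit set
        System.out.println(bits >> 31);         // -1: >> copies the sign bit into every position
        System.out.println(signBit(bits));      // 1
    }
}
```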
Because Java only uses signed numbers, you may want to use types that are larger than the ones you need. For example, you will probably want to work with the exponents by assigning them to ints before working with them. Something like
```
int e = x.exponent & 0xFF;
```
will give you x’s exponent as an integer in the range [0, 255]. If you omit the & 0xFF, then any x.exponent that’s larger than 127 will give you a negative number.
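The effect of the mask can be seen with the exponent 0x81 from the sample output. In this sketch, ByteDemo and unsignedExponent are hypothetical names, and a local byte stands in for x.exponent.

```java
// Illustrative only: ByteDemo and unsignedExponent are hypothetical names.
class ByteDemo {
    // Widen a byte to an int in the range [0, 255].
    static int unsignedExponent(byte exponent) {
        return exponent & 0xFF;
    }

    public static void main(String[] args) {
        byte exponent = (byte) 0x81;                    // stands in for x.exponent
        int withoutMask = exponent;                     // sign-extended on widening
        System.out.println(withoutMask);                // -127
        System.out.println(unsignedExponent(exponent)); // 129
    }
}
```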
Check out the page of worked examples.

Submission

Submit the lab by committing your code and pushing it to your GitHub repository.