# Floating Point Rounding

When a floating point computation is performed, the floating point result will often not be equal to the 'true' result. For example, the result of multiplying the two binary numbers .1001 and .1101 together is .01110101, but if we are using floating point arithmetic with only 4 bit precision (four significant bits, counted from the leading 1) then the result would be .01110 or .01111. The choice of which of these two results will actually be produced is called "rounding".

At first sight, this might look pretty easy, and it is - except for a few special cases. The obvious thing to do is to choose the result which gives the least error, which we can call round to nearest. In our example, we would choose .01111 because the size of the error between this and the 'true' result is .00000011 whereas the error for the other result is .00000101.
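The arithmetic in this example can be checked with exact rational numbers. The following is only a sketch: the helper names `parse_binary_fraction`, `exponent_of` and `neighbours` are made up for this illustration, and "precision" is taken to mean the number of significant bits counted from the leading 1.

```python
from fractions import Fraction
import math


def parse_binary_fraction(text):
    """Convert a string such as '.1001' into an exact Fraction."""
    bits = text.split(".")[1]
    return Fraction(int(bits, 2), 2 ** len(bits))


def exponent_of(x):
    """Return e such that 2**e <= |x| < 2**(e + 1), for x != 0."""
    e = abs(x.numerator).bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > abs(x):
        e -= 1
    return e


def neighbours(x, precision):
    """Return the representable values just below and just above x."""
    spacing = Fraction(2) ** (exponent_of(x) - precision + 1)
    scaled = x / spacing
    return math.floor(scaled) * spacing, math.ceil(scaled) * spacing


product = parse_binary_fraction(".1001") * parse_binary_fraction(".1101")
low, high = neighbours(product, precision=4)

print(product)         # 117/256, i.e. .01110101
print(low, high)       # 7/16 15/32, i.e. .01110 and .01111
print(product - low)   # 5/256, i.e. .00000101
print(high - product)  # 3/256, i.e. .00000011
```

The smaller of the two errors belongs to 15/32 (.01111), which is the round to nearest choice described above.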

What do we do if the error is the same for both choices? There are lots of possibilities here, including:

• Choose the result with the larger magnitude.
• Choose the result with the smaller magnitude.
• Choose the more positive result.
• Choose the more negative result.
• Choose the even result, i.e. the result which has a '0' in the least significant bit.
• Choose the odd result, i.e. the result which has a '1' in the least significant bit.
• Make the choice randomly, or alternate odd and even results, or choose even results on Thursdays, or ...

Most modern FPUs (including those in Intel 80x86 processors) will choose the even result because this is the default rounding rule specified by the IEEE 754 standard.
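The choice of the even result can be observed directly on a typical machine. The sketch below assumes that Python's `float` is an IEEE 754 double with 53 bit precision and that the FPU is in its default rounding mode; `significand_is_even` is a helper invented for this example.

```python
import math


def significand_is_even(x):
    """True if the last bit of the 53-bit significand of x is 0."""
    mantissa, _ = math.frexp(x)  # x == mantissa * 2**exp, 0.5 <= |mantissa| < 1
    return int(abs(mantissa) * 2**53) % 2 == 0


ulp = 2.0**-52       # spacing between adjacent doubles in [1, 2)
half_ulp = 2.0**-53

# 1 + half_ulp lies exactly halfway between 1 and 1 + ulp.
tie_from_even = 1.0 + half_ulp
# (1 + ulp) + half_ulp lies exactly halfway between 1 + ulp and 1 + 2*ulp.
tie_from_odd = (1.0 + ulp) + half_ulp

print(tie_from_even == 1.0)            # True: the tie went down, to the even neighbour
print(tie_from_odd == 1.0 + 2 * ulp)   # True: the tie went up, to the even neighbour
print(significand_is_even(tie_from_even), significand_is_even(tie_from_odd))
```

In both cases the exact sum is a tie, and the result is whichever neighbour has a '0' in the least significant bit, regardless of whether that neighbour is the larger or the smaller one.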

## Round to nearest or even

Round to nearest, combined with choosing the even result when neither choice is nearer, is called round to nearest or even. Features of this method of rounding include:

• the result always has the minimum error (it is round to nearest),
• it tends to distribute the results with equal probability between larger and smaller choices when neither choice has smaller error,
• the result is predictable and repeatable.
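The second feature can be illustrated by rounding a run of exact ties and comparing the totals. This sketch uses decimal ties (0.5, 1.5, ...) rather than binary ones purely for readability, and it compares round to even against the "larger magnitude" tie rule from the earlier list; `rounded_sum` is a name chosen for this example.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# Ten values that are all exact ties: 0.5, 1.5, ..., 9.5
ties = [Decimal(n) + Decimal("0.5") for n in range(10)]


def rounded_sum(values, mode):
    """Round every value to an integer with the given tie rule, then add."""
    return sum(v.quantize(Decimal("1"), rounding=mode) for v in values)


exact_total = sum(ties)
half_up_total = rounded_sum(ties, ROUND_HALF_UP)      # ties go to larger magnitude
half_even_total = rounded_sum(ties, ROUND_HALF_EVEN)  # ties go to the even result

print(exact_total)      # 50.0
print(half_up_total)    # 55
print(half_even_total)  # 50
```

With ties always rounded to the larger magnitude every error has the same sign and they accumulate, whereas with round to even half of the ties go up and half go down, so the errors cancel in this example.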

A feature of round to nearest or even (which is not shared by some other rounding methods) is that rounding performed in two or more stages may give a different result from rounding performed in a single step. Consider our example again. Imagine that we have our FPU running in a mode where it produces 5 bit precision results; in this case the correctly rounded result is .011101. Now consider what happens if we store this result as a 4 bit precision number. This will require another rounding, and because .011101 lies exactly halfway between .01110 and .01111, the correct result (round to even) is .01110, which is different from the result we obtained (.01111) by performing the rounding to 4 bit precision in one step. In general, the result of rounding in several stages can be different from the result of rounding in one step unless either:

• the precision of two successive stages is the same, or
• the precision of the current stage is less than or equal to half the number of bits of the preceding stage.

The proof is left as an exercise for the reader ;-).

This property has implications for architectures such as the Intel 80x86 processors. In the 'C' language on such machines, a 'double' has 53 bit precision and a 'long double' has 64 bit precision. Unlike some other architectures, the precision of FPU results is not encoded into the actual arithmetic instructions, but is set by separate instructions which load the FPU control word. For efficiency reasons it is therefore normal to run the FPU at the highest required precision, which defaults to that of a 'long double'.

As a consequence, the result of multiplying (for example) two 'double' operands (53 bit precision) is first rounded to the precision of a 'long double' (64 bit precision). This result will then be rounded again to 'double' precision when it is subsequently stored in RAM. This two stage rounding means that the results of computation on Intel machines can be different from the results produced on other architectures. Except for rare cases, the differences will only be significant for poorly designed programs.
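The two stage rounding effect can be reproduced with exact rational arithmetic, without relying on any particular FPU. The sketch below is a simulation, not a description of how the hardware is implemented: `exponent_of` and `round_nearest_even` are names invented here, and the value `y` is simply chosen to sit just above a 53 bit tie; it is not claimed to be the product of two particular 'double' operands.

```python
from fractions import Fraction
import math


def exponent_of(x):
    """Return e such that 2**e <= |x| < 2**(e + 1), for x != 0."""
    e = abs(x.numerator).bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > abs(x):
        e -= 1
    return e


def round_nearest_even(x, precision):
    """Round x to `precision` significant bits, ties to the even neighbour."""
    if x == 0:
        return x
    spacing = Fraction(2) ** (exponent_of(x) - precision + 1)
    scaled = x / spacing
    lower = math.floor(scaled)
    remainder = scaled - lower
    half = Fraction(1, 2)
    if remainder > half or (remainder == half and lower % 2 == 1):
        lower += 1
    return lower * spacing


# The 4 bit example from the text: .01110101 is 117/256.
x = Fraction(117, 256)
one_step = round_nearest_even(x, 4)
two_steps = round_nearest_even(round_nearest_even(x, 5), 4)
print(one_step, two_steps)  # 15/32 7/16, i.e. .01111 and .01110

# The same effect at 64 bit then 53 bit precision.
y = 1 + Fraction(1, 2**53) + Fraction(1, 2**78)
direct = round_nearest_even(y, 53)
staged = round_nearest_even(round_nearest_even(y, 64), 53)
print(direct == 1 + Fraction(1, 2**52))  # True
print(staged == 1)                       # True
```

In the second case the first rounding discards the small term that made `y` lie above the halfway point, so the second rounding sees an exact tie and goes to the even neighbour instead of the nearer one.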

## Other rounding methods

In addition to rounding to the nearest result, there is a need for other rounding modes such as:

• Round down; the more negative choice is always made.
• Round up; the more positive choice is always made.
• Chop; the choice with the smaller magnitude is always made.

These three rounding modes are provided on the Intel 80x86 architecture in addition to round to nearest or even.
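The difference between these three modes only shows up clearly when negative values are included. The following sketch applies each mode to plus and minus .01110101 at 4 bit precision, using exact rationals; `exponent_of` and `round_directed` are hypothetical helpers written for this illustration and do not correspond to any FPU instruction.

```python
from fractions import Fraction
import math


def exponent_of(x):
    """Return e such that 2**e <= |x| < 2**(e + 1), for x != 0."""
    e = abs(x.numerator).bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > abs(x):
        e -= 1
    return e


def round_directed(x, precision, mode):
    """Round x to `precision` significant bits using 'down', 'up' or 'chop'."""
    if x == 0:
        return x
    spacing = Fraction(2) ** (exponent_of(x) - precision + 1)
    scaled = x / spacing
    # floor -> more negative, ceil -> more positive, trunc -> smaller magnitude
    pick = {"down": math.floor, "up": math.ceil, "chop": math.trunc}[mode]
    return pick(scaled) * spacing


x = Fraction(117, 256)  # .01110101
results = {
    (sign, mode): round_directed(sign * x, 4, mode)
    for sign in (1, -1)
    for mode in ("down", "up", "chop")
}

for (sign, mode), value in results.items():
    print(sign * x, mode, value)
# For +117/256: down 7/16, up 15/32, chop 7/16
# For -117/256: down -15/32, up -7/16, chop -7/16
```

For positive values chop agrees with round down, and for negative values it agrees with round up.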