Fixed-Point Numbers

Fixed-point numbers are a way of representing real numbers by fixing the position of the decimal point in advance and using a positional numeral system. Given $n$ places for digits, the idea is to write a real number $x$ in the form

$$x = \left( \sum_{i=0}^{n-1} d_i b^i \right) \cdot b^r,$$

where $d_i$ is the digit at the $i$-th place going right to left, $b$ is the chosen base and $r$ is the radix, which determines the position of the decimal point.

NOTE

Despite the name, the radix $r$ has nothing to do with the base $b$.

The radix determines the position of the decimal point in the following way:

  • When $r = 0$, the number is multiplied by $b^0 = 1$, i.e. it remains just $\sum_{i=0}^{n-1} d_i b^i$, meaning that $x$ is an integer.
  • When $r > 0$, the number is multiplied by $b^r$. Its digits are shifted $r$ places towards the most significant digit and the gaps are filled with zeros. The decimal point disappears, which means that we sacrifice precision but can represent larger numbers.
  • When $r < 0$, the number is multiplied by $b^r = 1 / b^{|r|}$. The decimal point moves $|r|$ places towards the most significant digit. We increase the precision, but the largest representable number becomes smaller.

The distance between consecutive representable numbers using this format is constant and is equal to $b^r$.
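The effect of the radix can be checked numerically. The following is a minimal sketch, assuming the convention that the value is the digit sum scaled by $b^r$; the function name `fixed_point_value` and its parameters are illustrative and not part of the original text.

```python
from fractions import Fraction

def fixed_point_value(digits, base, radix):
    """Value of a fixed-point number (hypothetical helper).

    digits: digits listed from least significant to most significant
    base:   the base b
    radix:  the exponent r; assumed convention: value = (sum d_i * b^i) * b^r
    """
    integer = sum(d * base**i for i, d in enumerate(digits))
    # Fraction keeps the result exact, also for negative radices.
    return integer * Fraction(base) ** radix

# The bit pattern 1011, listed right to left as [1, 1, 0, 1].
bits = [1, 1, 0, 1]
print(fixed_point_value(bits, 2, 0))    # 11
print(fixed_point_value(bits, 2, 2))    # 44
print(fixed_point_value(bits, 2, -2))   # 11/4
```

The same four bits give an integer, a larger integer with coarser spacing, or a fraction with finer spacing, depending only on the radix. With radix $-2$, the next pattern up (1100) evaluates to 3, which is exactly $2^{-2}$ above $11/4$.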

Algebraic Sign

So far, the fixed-point format can only represent non-negative (unsigned) numbers. We thus need a way to extend it to negative (signed) numbers. There are three main ways to do this when $b = 2$:

  • sign and absolute value;
  • ones’ complement;
  • two’s complement.

Sign and Absolute Value

In the sign and absolute value system, we treat the first (most significant) bit as the sign: $1$ indicates a negative sign, while $0$ indicates a positive sign. The rest of the bits are treated as the absolute value of the number.

For example, with four bits and $r = 0$, $0101_2$ represents $+5$, while $1101_2$ represents $-5$.

The main advantage of this format is its simplicity. However, it comes with the problem that there are always two ways to represent zero, namely $00\ldots0$ ($+0$) and $10\ldots0$ ($-0$). This complicates arithmetic and comparisons for computers.
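A minimal sketch of this encoding for integers, assuming bit patterns are written as strings with the sign bit first; the helper names `sm_encode` and `sm_decode` are illustrative only.

```python
def sm_encode(value, n_bits):
    """Encode an integer as an n_bits-wide sign-and-absolute-value bit string."""
    magnitude = abs(value)
    if magnitude >= 1 << (n_bits - 1):
        raise ValueError("magnitude does not fit in the remaining bits")
    sign = "1" if value < 0 else "0"
    return sign + format(magnitude, f"0{n_bits - 1}b")

def sm_decode(bits):
    """Decode a sign-and-absolute-value bit string back to an integer."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude

print(sm_encode(5, 4))                       # 0101
print(sm_encode(-5, 4))                      # 1101
print(sm_decode("0000"), sm_decode("1000"))  # 0 0
```

The last line shows the double zero: two different bit patterns decode to the same value, so an equality test on raw bit patterns is not enough.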

Ones’ Complement

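In the ones’ complement system, the negative of a number is constructed by inverting all its bits. For example, with four bits, $0101_2$ represents $5$, while $1010_2$ represents $-5$. This has the advantage that addition and subtraction work in (mostly) the same way as usual:

$$0011_2 + 1010_2 = 1101_2, \quad \text{i.e.} \quad 3 + (-5) = -2.$$

However, this format still has the problem that there are two ways to represent zero, namely $00\ldots0$ and $11\ldots1$. Moreover, plain binary addition can be off by one: adding $2$ ($0010_2$) to $-1$ ($1110_2$) and discarding the carry results in $0$ ($0000_2$) instead of $1$ ($0001_2$). The result is only correct if the carry out of the most significant bit is added back in.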
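The off-by-one behaviour and its usual remedy, adding the carry back in (the end-around carry), can be sketched as follows. This is an illustration for $n$-bit integer patterns stored in Python integers; the names `oc_negate`, `oc_add` and `oc_decode` are hypothetical helpers.

```python
def oc_negate(x, n_bits):
    """Invert all n_bits bits of the pattern x."""
    return x ^ ((1 << n_bits) - 1)

def oc_add(x, y, n_bits):
    """Add two ones' complement patterns, folding the carry back in."""
    mask = (1 << n_bits) - 1
    total = x + y
    if total > mask:                 # carry out of the most significant bit
        total = (total & mask) + 1   # end-around carry
    return total & mask

def oc_decode(x, n_bits):
    """Interpret an n_bits pattern as a ones' complement integer."""
    sign_bit = 1 << (n_bits - 1)
    return -oc_negate(x, n_bits) if x & sign_bit else x

print(format(oc_negate(0b0101, 4), "04b"))       # 1010
print((0b0010 + 0b1110) & 0b1111)                # 0  (carry discarded: wrong)
print(oc_add(0b0010, 0b1110, 4))                 # 1  (carry added back: right)
print(oc_decode(oc_add(0b0011, 0b1010, 4), 4))   # -2
```

Both `0b0000` and `0b1111` decode to zero here, which is the double representation described above.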

Two’s Complement

In the two’s complement system, the negative of a number is built by first constructing its ones’ complement and then adding $1$ to it. For example, with four bits, $0101_2$ represents $5$, while $1011_2$ represents $-5$.

This removes the double representation of zero, so addition and subtraction always work as expected. The only drawback is that it makes the positive and negative numbers asymmetric: for any fixed number of bits, the number of representable negative numbers is always greater by one than the number of representable positive numbers. With $n$ bits and $r = 0$, the representable range is $-2^{n-1}$ to $2^{n-1} - 1$.
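A short sketch of these properties for $n$-bit integer patterns, assuming overflow is simply truncated to $n$ bits; the names `tc_negate`, `tc_decode` and `tc_add` are illustrative helpers.

```python
def tc_negate(x, n_bits):
    """Invert all bits, add 1, and keep only the lowest n_bits bits."""
    mask = (1 << n_bits) - 1
    return ((x ^ mask) + 1) & mask

def tc_decode(x, n_bits):
    """Interpret an n_bits pattern as a two's complement integer."""
    return x - (1 << n_bits) if x & (1 << (n_bits - 1)) else x

def tc_add(x, y, n_bits):
    """Plain binary addition; any carry out of the top bit is discarded."""
    return (x + y) & ((1 << n_bits) - 1)

print(format(tc_negate(0b0101, 4), "04b"))   # 1011
print(tc_negate(0b0000, 4))                  # 0  (zero has a single pattern)

# 2 + (-1) now works without any carry correction.
print(tc_decode(tc_add(0b0010, tc_negate(0b0001, 4), 4), 4))   # 1

# All 16 four-bit patterns: one more negative value than positive values.
values = [tc_decode(p, 4) for p in range(16)]
print(min(values), max(values))              # -8 7
```

Note that negating the most negative pattern (`0b1000` for four bits) yields the same pattern again, which is the practical consequence of the asymmetry.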

IMPORTANT

Virtually all modern computers use two’s complement for signed integers.