
Floating point arithmetic involves the representation and manipulation of numbers with fractional parts. In computing, floating point numbers are typically represented using a standardized format such as IEEE 754.

Here’s a brief overview of the representation of floating point numbers:

  1. Sign bit: This bit determines the sign of the number, where 0 represents a positive number and 1 represents a negative number.
  2. Exponent: This part represents the exponent of the number in binary scientific notation and so determines its magnitude. IEEE 754 stores it in biased form: a fixed bias (127 for single precision, 1023 for double precision) is added to the actual exponent, so that both positive and negative exponents can be stored as an unsigned value.
  3. Fraction (Mantissa): This part holds the significant digits of the number in binary. It is also known as the mantissa. For normalized numbers the leading bit is always 1, so it is left implicit rather than stored, which gains one extra bit of precision.
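The three fields can be inspected directly. The following is a minimal Python sketch, assuming the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits); the helper name `float32_fields` is made up for this illustration.

```python
import struct

def float32_fields(x):
    """Split x (rounded to single precision) into (sign, biased exponent, fraction)."""
    # Pack as a big-endian 32-bit float, then reinterpret the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # top bit
    exponent = (bits >> 23) & 0xFF   # next 8 bits, still in biased form
    fraction = bits & 0x7FFFFF       # low 23 bits, without the implicit leading 1
    return sign, exponent, fraction

sign, exponent, fraction = float32_fields(-6.25)
print(sign, exponent, f"{fraction:023b}")
```

Since 6.25 is 1.1001 (binary) times 2 to the power 2, the sign is 1, the stored exponent is 2 + 127 = 129, and the fraction bits begin with 1001.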

The overall format of a floating point number is often represented as:

$$(-1)^{s} \times 1.f \times 2^{\,e - \text{bias}}$$

Where:

  • $s$ is the sign bit.
  • $f$ is the fractional part (mantissa).
  • $e$ is the exponent part.
  • $\text{bias}$ is a bias value used to represent the exponent in biased form.
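To make the roles of the sign, fraction, exponent, and bias concrete, here is a small Python sketch that rebuilds a value from its fields. It assumes the usual reading for normalized single-precision numbers, value = (-1)^sign × (1 + fraction / 2^23) × 2^(exponent − bias), with a bias of 127; the function name `decode_float32` is hypothetical, and zeros, subnormals, infinities, and NaNs are deliberately not handled.

```python
def decode_float32(sign, exponent, fraction, bias=127):
    """Rebuild a normalized single-precision value from its three fields."""
    if not 0 < exponent < 255:
        # Biased exponents 0 and 255 are reserved for special values.
        raise ValueError("only normalized values (biased exponent 1..254) are handled")
    significand = 1 + fraction / 2**23   # restore the implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - bias)

print(decode_float32(1, 129, 0x480000))  # -6.25
```

Here the fraction 0x480000 contributes 0.5625, so the significand is 1.5625, and 1.5625 × 2^(129 − 127) with a negative sign gives -6.25.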

Floating point arithmetic covers operations such as addition, subtraction, multiplication, and division on floating point numbers, subject to the rules and limits of the chosen representation format. Because each number is stored in a finite number of bits, the result of an operation often has to be rounded, which introduces rounding error or loss of precision.
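A short Python example shows this kind of rounding error in practice. It assumes Python's built-in `float`, which is an IEEE 754 double on common platforms, and uses `math.isclose` as one reasonable way to compare results within a tolerance.

```python
import math

total = 0.1 + 0.2

# Neither 0.1 nor 0.2 has an exact binary representation, so the sum is slightly off.
print(total == 0.3)              # False
print(f"{total:.17f}")           # 0.30000000000000004

# Comparing within a tolerance avoids depending on the exact rounded bits.
print(math.isclose(total, 0.3))  # True
```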