Why 0.1 + 0.2 != 0.3

"Read a book a hundred times and its meaning will reveal itself." — Chen Shou (Jin dynasty), Records of the Three Kingdoms (《三國志・魏志・王肅傳》)

Problem

When you enter 0.1 + 0.2 in a Python or JavaScript console, the result is not 0.3. How could this happen?

# Python
>>> 0.1+0.2
0.30000000000000004
>>> 0.1+0.2==0.3
False

# JavaScript
> 0.1+0.2
0.30000000000000004
> 0.1+0.2===0.3
false

It's quite an old question and there are already many answers on Stack Overflow. But I still want to explain it a bit, because I want to (a) get a better understanding of this question, and (b) explain it in my own words.

Solution

Computers do math in binary, so let's convert 0.1 from decimal to binary manually with the method below. (Usually I just go to Wolfram Alpha and type "0.1 to binary".)

0.1 x 2 = 0.2 ... 0
0.2 x 2 = 0.4 ... 0
0.4 x 2 = 0.8 ... 0
0.8 x 2 = 1.6 ... 1
0.6 x 2 = 1.2 ... 1
0.2 x 2 = 0.4 ... 0
...
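The same multiply-by-2 procedure can be written in a few lines of Python. This is just a rough sketch (the function name `fraction_to_binary` is mine, not from any library): repeatedly double the fraction and peel off the integer part as the next binary digit.

```python
def fraction_to_binary(x, digits=20):
    """Expand the fractional part of x into binary, one digit per doubling (illustrative sketch)."""
    bits = []
    for _ in range(digits):
        x *= 2
        bit = int(x)        # the digit that moves in front of the point
        bits.append(str(bit))
        x -= bit            # keep only the remaining fraction
    return "0." + "".join(bits)

print(fraction_to_binary(0.1))  # 0.00011001100110011001
```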

So 0.1 in binary can be represented as $(0.00011001100110011...)_2$; in scientific notation that's $(1.1001100110011...)_2 \times 2^{-4}$. Similarly, $(0.2)_{10} = (1.1001100110011...)_2 \times 2^{-3}$.

After adding the two numbers in binary, depending on how many digits you keep, you will get a result like the one below.

$$
\begin{align*}
&(1.0011001100110011001100110011001100110011001100110100)_2 \times 2^{-2} \\
&= (0.30000000000000004441)_{10}
\end{align*}
$$

That's exactly the same output (except for the rounding) as the one from the console!
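We can also check the exact value stored, straight from Python: the standard `decimal.Decimal` constructor, when given a float, converts it losslessly to its exact decimal equivalent, and string formatting reproduces the rounded digits above.

```python
from decimal import Decimal

print(Decimal(0.1 + 0.2))   # 0.3000000000000000444089209850062616169452667236328125
print(f"{0.1 + 0.2:.20f}")  # 0.30000000000000004441
print(repr(0.1 + 0.2))      # 0.30000000000000004, the shortest string that round-trips
```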

To conclude, humans use decimal while computers use binary for arithmetic. Decimal numbers are converted to binary for calculation and then converted back to decimal afterwards. Precision is lost during the conversions, so we see the odd output.

Going Further

Representing Floating Point Numbers

But that's not yet how the number is represented in the computer. For example, the IEEE 754 double-precision binary floating-point format, AKA float64, requires numbers to be formatted as below.

IEEE 754 Double Floating Point Format

Described as a formula, it's $(-1)^{sign}(1.b_{51}b_{50}...b_0)_2 \times 2^{e-1023}$.

For 0.1, sign equals 0 and e equals 1019. So it's $(-1)^0(1.1001100110011001100110011001100110011001100110011010)_2 \times 2^{1019-1023}$.

Written in the float64 format, this is how 0.1 is represented in memory:

0.1 in float64 format, rounded
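We can verify this bit layout from Python with the standard `struct` module (a small sketch: reinterpret the double's eight bytes as a 64-bit integer and split out the three fields):

```python
import struct

# Reinterpret 0.1's bytes as one big-endian 64-bit unsigned integer.
bits = struct.unpack(">Q", struct.pack(">d", 0.1))[0]

sign     = bits >> 63                  # 1 sign bit
exponent = (bits >> 52) & 0x7FF        # 11-bit biased exponent
mantissa = bits & ((1 << 52) - 1)      # 52-bit fraction

print(hex(bits))            # 0x3fb999999999999a
print(sign, exponent)       # 0 1019
print(f"{mantissa:052b}")   # 1001100110011001100110011001100110011001100110011010
```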

Note that this depends on which number representation standard each language uses. E.g. Python and JavaScript represent floating-point numbers using double precision as defined by the IEEE 754 standard, but C offers a wide variety of arithmetic types, and double precision is not required by the standard. The platform also matters, e.g. float64 can still work on an 8-bit machine.

Largest and Smallest Numbers

Referenced from JavaScript: The Definitive Guide, 7th Edition

JavaScript represents numbers using the 64-bit floating-point format defined by the IEEE 754 standard, which means it can represent numbers as large as $\pm1.7976931348623157 \times 10^{308}$ and as small as $\pm5 \times 10^{-324}$.

When I first read this, I asked myself: how are these largest and smallest numbers calculated? Luckily, with handy online conversion tools, I got the answer.

Largest representable number: $+1.7976931348623157\times10^{308}$ -> 0x7FEFFFFFFFFFFFFF

Smallest number without losing precision: $+2.2250738585072014\times10^{-308}$ -> 0x0010000000000000

Smallest representable number: $+5\times10^{-324}$ -> 0x0000000000000001
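These can also be checked without an online tool, again from Python (a quick sketch: reinterpret the bit patterns above as doubles, and compare with `sys.float_info`):

```python
import struct
import sys

# Reinterpret each 64-bit pattern as an IEEE 754 double.
for pattern in (0x7FEFFFFFFFFFFFFF, 0x0010000000000000, 0x0000000000000001):
    value = struct.unpack(">d", struct.pack(">Q", pattern))[0]
    print(f"{pattern:#018x} -> {value}")

print(sys.float_info.max)   # 1.7976931348623157e+308
print(sys.float_info.min)   # 2.2250738585072014e-308 (smallest normal number)
print(5e-324)               # 5e-324 (smallest subnormal number)
```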

Representing Integers

This part is largely based on an article from Ruan YiFeng's blog.

A negative number is represented by its two's complement, while a non-negative number is represented by its ordinary binary representation.

What is Two's Complement

1 hour clockwise equals 11 hours anticlockwise.

The two's complement of an $N$-bit number is defined as its complement with respect to $2^N$; the sum of a number and its two's complement is $2^N$.

It's called "two's" because it's based on 2, or more accurately, it's the $2^N$'s complement. Similarly, there are the 10's complement (decimal) and the 12's complement (clock).

Below is an example, assuming numbers are 8-bit.

$$
\begin{align*}
+8 &= 0000\,1000 \\
-8 &= 1\,0000\,0000 - 0000\,1000 \\
&= 1111\,1000
\end{align*}
$$

$1\,0000\,0000$ can be written as $1111\,1111 + 0000\,0001$; subtracting from $1111\,1111$ simply flips every bit (a NOT), and then we add 1:

 11111111
-00001000
--------- NOT operation
 11110111
+00000001
--------- plus 1
 11111000
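The same 8-bit computation can be reproduced in Python (a small sketch; the mask keeps only 8 bits, since Python integers are unbounded):

```python
N = 8
MASK = (1 << N) - 1                       # 0xFF, keep only 8 bits

x = 8
flipped = ~x & MASK                       # NOT operation: 1111 0111
twos_complement = (flipped + 1) & MASK    # plus 1:        1111 1000

print(f"{twos_complement:08b}")           # 11111000
print(f"{-x & MASK:08b}")                 # 11111000, Python yields the same 8-bit pattern for -8
print(x + twos_complement == 2**N)        # True: the sum of a number and its two's complement is 2^N
```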

Why Use Two's Complement

Two's-complement integer representations eliminate the need for separate addition and subtraction units. This simplifies the design of the CPU, because one type of circuit can be used for both addition and subtraction.
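A rough sketch of that idea in Python, assuming an 8-bit word (the helper names `add8` and `sub8` are made up for illustration): subtraction is reduced to the same masked addition the adder already performs.

```python
MASK = 0xFF  # 8-bit word

def add8(a, b):
    """8-bit addition that discards the carry out, like a simple hardware adder."""
    return (a + b) & MASK

def sub8(a, b):
    """a - b implemented as a + (two's complement of b), reusing only the adder."""
    return add8(a, add8(~b & MASK, 1))

print(sub8(13, 8))          # 5
print(f"{sub8(3, 8):08b}")  # 11111011, the 8-bit two's complement pattern for -5
```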

It also conveys the idea of unification in software engineering.

See Also

Written in Düsseldorf, the day after Guyu (Grain Rain) of the Renyin year.
