Tuesday, March 5, 2013

The problem with decimal fractions

Most people work with decimal fractions without too many problems. It's not exactly hard to calculate that 2.50 + 2.50 adds up to 5. It's only natural to carry this kind of thinking over into computer programming. After all, computers are much better and faster at calculating things than the human brain. There is, however, a slight problem. Computers deal with decimal fractions a bit differently from us humans. To be precise, computers don't count in base ten.

While most examples in this article are in PHP, the basics apply to other programming languages as well, since the article is mostly about how computers deal with decimal fractions, which is not PHP-specific behavior.

The Problem

If you've done calculations in PHP with decimal fractions, you might have noticed some rather unexpected results. Sometimes a calculation produces a number that is slightly off, or two numbers that should clearly be equal turn out not to be. These are just a few examples of the things that seem to go wrong.

Let's look at a little example. First, let's just dump a simple number with decimal fractions.

> var_dump(0.1); float(0.1)

That's all very simple. Now, let's add another small number to that.

> var_dump(0.1 + 0.2); float(0.3)

That wasn't too hard. I bet you could have even calculated that in your head. After all, it's just a simple addition that results in "0.3", right?

> var_dump((0.1 + 0.2) == 0.3); bool(false)

Not quite what one would expect. The previous calculation clearly showed that the result should be "0.3". How is it possible that it does not equal 0.3, then? One would expect it to be no different from the following situation:

> var_dump(0.3 == 0.3); bool(true)

As we can clearly see, that comparison is true. At the very least, if we subtract 0.3 from the earlier result, one would expect to get 0.

> var_dump((0.1 + 0.2) - 0.3); float(5.5511151231258E-17)

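Close, but no cigar. (The exact digits printed depend on the PHP version and its precision settings; newer versions may show more of them, which can make the discrepancy visible even in the plain dump of 0.1 + 0.2.)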

The Explanation

In PHP, decimal fractions are represented internally as floating point numbers. You may have heard that computers deal with everything in 1s and 0s. The same goes for calculating with numbers, which is called binary arithmetic.

When you write, for example, 5 + 5 in PHP, the computer actually performs the calculation using binary arithmetic. This means that, for the computer, the calculation is done in base two. The number 5 is 101 in base two, so the calculation looks like 101 + 101 to the computer.
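As a quick check of the base two representation, PHP's built-in decbin() and bindec() functions convert between integers and binary strings. This is only an illustration of the notation, not of how the processor stores the values:

```php
<?php
var_dump(decbin(5));                      // string(3) "101"
var_dump(decbin(5 + 5));                  // string(4) "1010"
var_dump(bindec('101') + bindec('101'));  // int(10)
```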

The very same principle also applies to decimal fractions. However, floating point arithmetic, which is used to calculate with fractions, operates a bit differently from integer arithmetic. While calculating with decimal integers in binary does not cause any problems, the same cannot be said of decimal fractions.

Fractions, as the name implies, are parts of a whole number. In base ten, the digits after the decimal separator represent fractions whose denominators are powers of ten. For example, the number 0.1 in base ten simply means 1/10. Using the same notation in base two, the number 0.1 means 1/2, which is easily converted to 5/10, i.e. 0.5 in base ten. However, if we were to use base three, the number 0.1 would mean 1/3. In base ten, that would be represented as the repeating fraction 0.33333...
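To make the base conversions above concrete, here is a small sketch that writes out a fraction digit by digit in any base, using only integer arithmetic. The function name expandFraction() and its parameters are made up for this illustration and are not part of PHP:

```php
<?php
// Hypothetical helper: expands $numerator/$denominator
// (with 0 <= $numerator < $denominator) in the given base and stops
// after $maxDigits digits. A trailing "..." means digits were cut off.
function expandFraction(int $numerator, int $denominator, int $base, int $maxDigits): string
{
    $digits = '';
    $remainder = $numerator;

    for ($i = 0; $i < $maxDigits && $remainder !== 0; $i++) {
        // Long division: shift one place in the target base,
        // take the next digit, keep what is left over.
        $remainder *= $base;
        $digits .= intdiv($remainder, $denominator);
        $remainder %= $denominator;
    }

    return '0.' . ($digits === '' ? '0' : $digits) . ($remainder !== 0 ? '...' : '');
}

echo expandFraction(1, 2, 2, 20), "\n";   // 0.1
echo expandFraction(1, 3, 3, 20), "\n";   // 0.1
echo expandFraction(1, 3, 10, 10), "\n";  // 0.3333333333...
echo expandFraction(1, 10, 2, 20), "\n";  // 0.00011001100110011001...
```

The last line is the interesting one: 1/10, which is a tidy 0.1 in base ten, never terminates in base two.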

In other words, problems may arise with fractions because some fractions cannot be represented in another number system using a finite number of digits. The number 0.1 is one of them: in base two, it turns into an endlessly repeating fraction. Because numbers are written in the code in base ten and displayed to the user in base ten, the computer has to convert them to and from base two in order to perform the calculation using binary arithmetic. Unfortunately, the computer can only deal with a finite number of bits. This can result in small inaccuracies, as demonstrated by the examples in the problem section.

The simple fact is that when you calculate 0.1 + 0.2 using the binary arithmetic of the computer, you end up with a slightly different number than what you get by converting 0.3 to base two. Because of this, the two do not appear equal to the computer, and subtracting one from the other gives a difference other than 0.
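One way to see this for yourself is to ask for more digits than PHP normally prints. The following sketch assumes the usual 64-bit IEEE 754 doubles, which is what PHP uses on common platforms:

```php
<?php
// Print the stored values with 20 digits after the decimal point.
printf("%.20f\n", 0.1);        // 0.10000000000000000555
printf("%.20f\n", 0.2);        // 0.20000000000000001110
printf("%.20f\n", 0.1 + 0.2);  // 0.30000000000000004441
printf("%.20f\n", 0.3);        // 0.29999999999999998890
```

None of the stored values is exactly the decimal number written in the code, and the sum of the first two lands slightly above the stored value of 0.3.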

This can be confusing, because the numbers in the code are typically written as decimal fractions and the results are output as decimal fractions. In some cases, the conversion back to base ten causes enough rounding for the number to appear to be what you expect. Because of this, string conversion can unexpectedly make things work.

> var_dump((0.1 + 0.2) == 0.3); bool(false)
> var_dump((string) (0.1 + 0.2) == 0.3); bool(true)
> var_dump(((string) (0.1 + 0.2)) - 0.3); float(0)

Do keep in mind, though, that this is not good practice and you cannot rely on this behavior in most cases.

The Solution

Unfortunately, there is no simple solution to the problems caused by binary floating point arithmetic. There are, however, several ways to work around the issue. The appropriate approach depends on what you are trying to achieve.

The two most important issues to keep in mind when dealing with fractions are equality and accuracy.

First, if you ever need to compare floating point numbers, remember that you cannot reliably test whether two of them are equal using the equality operator "==", because they may not be exactly equal even when they would be in decimal arithmetic. The solution is to use a function that checks whether the two numbers are "almost" equal. For example:
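A minimal sketch of such a function could look like the following. It assumes a fixed, absolute tolerance; the default value of the epsilon is only a placeholder chosen for this illustration, not a recommendation:

```php
<?php
// Returns true when $a and $b differ by less than $epsilon.
// The default epsilon is an assumption made for this example.
function almostEqual(float $a, float $b, float $epsilon = 1e-10): bool
{
    return abs($a - $b) < $epsilon;
}
```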

Now, if we wanted to compare if two floating point numbers are equal, we could use:

> var_dump(almostEqual(0.1 + 0.2, 0.3)); bool(true)

In that function, the purpose of the epsilon is to signify the smallest meaningful difference. When the difference between the two numbers is smaller than the epsilon, the numbers are close enough to be treated as equal. It is important to acknowledge, however, that choosing the epsilon is not a trivial matter. An epsilon that suits numbers close to 1 can be far too strict for very large numbers and far too loose for very small ones, so take care to define it according to your input values.
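To illustrate why the size of the inputs matters, the sketch below compares a fixed tolerance with one that scales with the numbers being compared. The function name almostEqualRelative() and its default tolerance are assumptions made for this example, and the outputs assume 64-bit IEEE 754 doubles:

```php
<?php
// Hypothetical variant: the allowed difference grows with the inputs.
// The default relative epsilon is an assumption, not a recommendation.
function almostEqualRelative(float $a, float $b, float $relativeEpsilon = 1e-9): bool
{
    $scale = max(abs($a), abs($b));
    return abs($a - $b) <= $relativeEpsilon * $scale;
}

// The same 0.1 + 0.2 versus 0.3 problem, scaled up to large numbers.
$a = (0.1 + 0.2) * 1e12;
$b = 0.3 * 1e12;

var_dump(abs($a - $b) < 1e-10);         // bool(false): fixed tolerance is too strict here
var_dump(almostEqualRelative($a, $b));  // bool(true)
```

A relative tolerance has its own weak spot: when both numbers are very close to zero, the allowed difference shrinks toward nothing, so neither approach fits every situation.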

You must also remember that the other comparison operators rely on the concept of equality as well. Whenever you use any of the "<", "<=", "==", ">=", ">" operators on floating point numbers, you should take the almostEqual function into account. For example, the "<" operator should be defined as "$a < $b && !almostEqual($a, $b)". Additionally, it's generally a good idea to round any floating point numbers you intend to output to the user.
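The rule above can be wrapped into small helpers. This is only a sketch: the names definitelyLessThan() and lessThanOrAlmostEqual() are made up for this example, and almostEqual() is repeated with an assumed default epsilon so that the snippet runs on its own:

```php
<?php
// Tolerant equality check; the default epsilon is an assumption.
function almostEqual(float $a, float $b, float $epsilon = 1e-10): bool
{
    return abs($a - $b) < $epsilon;
}

// Hypothetical helper: "<" that ignores differences below the epsilon.
function definitelyLessThan(float $a, float $b, float $epsilon = 1e-10): bool
{
    return $a < $b && !almostEqual($a, $b, $epsilon);
}

// Hypothetical helper: "<=" that also accepts almost equal values.
function lessThanOrAlmostEqual(float $a, float $b, float $epsilon = 1e-10): bool
{
    return $a < $b || almostEqual($a, $b, $epsilon);
}

var_dump(0.3 < 0.1 + 0.2);                        // bool(true)
var_dump(definitelyLessThan(0.3, 0.1 + 0.2));     // bool(false)
var_dump(lessThanOrAlmostEqual(0.1 + 0.2, 0.3));  // bool(true)

// Round before showing a value to the user.
echo round(0.1 + 0.2, 2), "\n";                   // 0.3
```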

However, sometimes accuracy is really important and having mostly accurate numbers with binary floating point arithmetic is simply not enough. This is usually the case when dealing with money or other important real world numbers such as measurements. In these cases, you simply need to use decimal arithmetic instead of binary arithmetic.

In some cases this may be easier said than done. However, in PHP, as in many other languages, functions for this already exist for your convenience. The BCMath library can handle some simple calculations on decimal numbers of arbitrary size, which are passed to it as strings.
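As a brief sketch of what that looks like, the earlier 0.1 + 0.2 example can be repeated with BCMath. This assumes the BCMath extension is installed; the scale of 2 digits is just a choice made for this example:

```php
<?php
// Operands are strings, so they never pass through binary floating point.
// The third argument is the number of digits kept after the decimal point.
$sum = bcadd('0.1', '0.2', 2);

var_dump($sum);                           // string(4) "0.30"
var_dump(bccomp($sum, '0.3', 2) === 0);   // bool(true): the values compare as equal
var_dump(bcsub($sum, '0.3', 2));          // string(4) "0.00"
```

Note that the results are strings as well. Converting them back to floats for further calculations would reintroduce the original problem.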

The most important thing to remember when dealing with floating point numbers is to be really careful about what you are doing, because they can easily cause unexpected problems.

More information

If you found this article confusing, don't worry: the subject is actually far more complex than this article makes it seem. For more information about floating point numbers, see these resources available on the Internet.

Disclaimer

I am neither a mathematician nor a scientist. I'm simply a programmer, and I've tried to explain the issue from the PHP point of view without getting too technical or mathematical. The article makes some simplifications, but please let me know if you notice any factual inaccuracies.