Monday, August 6, 2012

Why Computer Science is not an Exact Science?

I had a simple task today. Write a comparator for integers. My attempt was:

int compare(int a, int b){
    return a - b;

As you know this function must return a negative number if a<b, a positive number if a>b, and zero if both numbers are equal. As in any Exact Science, to show that something is correct you must conduct a proof.

There are three claims to prove:

  • If a < b, the algorithm returns a negative number.
In fact if a<b, then b=a+c, where c is a positive number. Thus a<a+c and  a-b = a-(a+c) = -c, a negative number
  • if a>b, the algorithm returns a positive number.  
Using the same reasoning, a>b => a=b+c => a-b = b+c-b = c, a positive number
  • if a=b, the algorithm returns zero.
Trivially : a=b => a-b = a-a = 0.

But yet, the algorithm is incorrect.
"How come? Didn't you just prove it correct with mathematical rigor?"
Yes, I did. Your professor from the Algorithm Analysis course would like this proof. But as I said, it is completely wrong.  Guess why? Just an small tip : 231-1

Yes! Integer overflow! My biggest mistake here was to assume a,b ∈ ℤ. Actually, they are restricted to (considering VM Java and 32bits for C/C++)  the interval [-231,231-1] . And the math is pretty weird for outsiders: the  sum of two very small negative integers may resulting in a positive number, and the sum of two very big positive numbers may yield a negative number.

"So let us account for the overflows. Let's put a check on your code. If ..."
Hey! Wait! Before complicating the code I want to show you how the Java Library implements the comparison:

 public static int compare(int x, int y) {
     return (x < y) ? -1 : ((x == y) ? 0 : 1);

Imagine my duh! face when I read that.

Two lessons I learned here:
  • Theory is good, but practice is always better.  In theory integers are infinite. In practice they are limited by the word size, and they came in signed and unsigned flavors.  That also applies to more complicated data structures. Your perfect hash function may not be perfect when you consider  the hardware  and software constraints.
  • Do not pre-optimize code,  focus first in making it the most readable as possible.  By trying to making the comparison with a single operation I made it not readable by others at first sight. And worst: instead of saving "microseconds", I introduced a bug.

No comments:

Post a Comment