When people try out Julia for the first time, many of them are worried by the following example:
1 2 3 4 5 6 7 | julia> factorial(n) = n == 0 ? 1 : n * factorial(n - 1) julia> factorial(20) 2432902008176640000 julia> factorial(21) -4249290049419214848 |
If you’re not familiar with computer architecture, this result is very troubling. Why would Julia claim that the factorial of 21 is a negative number?
The answer is simple, but depends upon a set of concepts that are largely unfamiliar to programmers who, like me, grew up using modern languages like Python and Ruby. Julia thinks that the factorial of 21 is a negative number because computers are machines.
Because they are machines, computers represent numbers using many small groups of bits. Most modern machines work with groups of 64 bits at a time. If an operation has to work with more than 64 bits at a time, that operation will be slower than a similar operation than only works with 64 bits at a time.
As a result, if you want to write fast computer code, it helps to only execute operations that are easily expressible using groups of 64 bits.
Arithmetic involving small integers fits into the category of operations that only require 64 bits at a time. Every integer between -9223372036854775808 and 9223372036854775807 can be expressed using just 64 bits. You can see this for yourself by using the typemin and typemax functions in Julia:
1 2 3 4 5 | julia> typemin(Int64) -9223372036854775808 julia> typemax(Int64) 9223372036854775807 |
If you do things like the following, the computer will quickly produce correct results:
1 2 3 4 5 | julia> typemin(Int64) + 1 -9223372036854775807 julia> typemax(Int64) - 1 9223372036854775806 |
But things go badly if you try to break out of the range of numbers that can be represented using only 64 bits:
1 2 3 4 5 | julia> typemin(Int64) - 1 9223372036854775807 julia> typemax(Int64) + 1 -9223372036854775808 |
The reasons for this are not obvious at first, but make more sense if you examine the actual bits being operated upon:
1 2 3 4 5 6 7 8 | julia> bits(typemax(Int64)) "0111111111111111111111111111111111111111111111111111111111111111" julia> bits(typemax(Int64) + 1) "1000000000000000000000000000000000000000000000000000000000000000" julia> bits(typemin(Int64)) "1000000000000000000000000000000000000000000000000000000000000000" |
When it adds 1 to a number, the computer blindly uses a simple arithmetic rule for individual bits that works just like the carry system you learned as a child. This carrying rule is very efficient, but works poorly if you end up flipping the very first bit in a group of 64 bits. The reason is that this first bit represents the sign of an integer. When this special first bit gets flipped by an operation that overflows the space provided by 64 bits, everything else breaks down.
The special interpretation given to certain bits in a group of 64 is the reason that factorial of 21 is a negative number when Julia computes it. You can confirm this by looking at the exact bits involved:
1 2 3 4 5 | julia> bits(factorial(20)) "0010000111000011011001110111110010000010101101000000000000000000" julia> bits(factorial(21)) "1100010100000111011111010011011010111000110001000000000000000000" |
Here, as before, the computer has just executed the operations necessary to perform multiplication by 21. But the result has flipped the sign bit, which causes the result to appear to be a negative number.
There is a way around this: you can tell Julia to work with groups of more than 64 bits at a time when expressing integers using the BigInt type:
1 2 3 4 5 6 7 8 9 10 | julia> require("BigInt") julia> BigInt(typemax(Int)) 9223372036854775807 julia> BigInt(typemax(Int)) + 1 9223372036854775808 julia> BigInt(factorial(20)) * 21 51090942171709440000 |
Now everything works smoothly. By working with BigInt‘s automatically, languages like Python avoid these concerns:
1 2 3 4 | >>> factorial(20) 2432902008176640000 >>> factorial(21) 51090942171709440000L |
The L at the end of the numbers here indicates that Python has automatically converted a normal integer into something like Julia’s BigInt. But this automatic conversion comes at a substantial cost: every operation that stays within the bounds of 64-bit arithmetic is slower in Python than Julia because of the time required to check whether an operation might go beyond the 64-bit bound.
Python’s automatic conversion approach is safer, but slower. Julia’s approach is faster, but requires that the programmer understand more about the computer’s architecture. Julia achieves its performance by confronting the fact that computers are machines head on. This is confusing at first and frustrating at times, but it’s a price that you have to pay for high performance computing. Everyone who grew up with C is used to these issues, but they’re largely unfamiliar to programmers who grew up with modern languages like Python. In many ways, Julia sets itself apart from other new languages by its attempt to recover some of the power that was lost in the transition from C to languages like Python. But the transition comes with a substantial learning curve.
And that’s why I wrote this post.
“Because they are machines, computers represent numbers using many small groups of bits” better to say that using a small group of bits implies a trade off between speed and accuracy. Being a machine is unrelated to how numbers are introduced or coded in a computer.
Using a small number of bits means that there will be overflow and other errors, so you need to be aware of what is the cost of using this trade off.
Machines are not liable of our decisions.
I think that it is a reasonable decision to use BigInt by default if the language dosn’t have a type system (POLS).
In scala you wouldn’t wonder because you were forced to declare your function to return ints or bigints. Julia allows you to specify a type, but if you don’t it would be nice if it would be automatically converted, even at the expense of some processing time.
I believe that the author over estimates the benefits (code execution speed) and under estimates the cost (additional programming complexity and the potential for data-dependent bugs). How many potential Julia users are as capable of handling overflows as the Python development team? After accounting for overflow guards, how much faster is Julia? Benchmark?
There is a place and an argument for moving the calculation down the machine level. However, doing that in the design of a general purpose computing language, IMHO, does not make for safe and reliable statistical computing environment.
The trade off isn’t quite as simple as you make it out to be. Common Lisp uses a few bits of type information so has a smaller range of small integer operations, but seamlessly moves to big ints as needed. If you stay within the bounds of small integer operations, you keep the performance, but you safely overflow into big int operations when needed. If you or the compiler can guarantee this isn’t going to happen, c like performance is achievable locally. This is probably the right thing to do in a language like Julia too, but has some pretty deep implications for the type system and compiler….
@Andrew: the Julia website has benchmarks comparing Julia’s performance at recursion, loops, etc. to Python, among other high-level dynamic languages: http://julialang.org/. Python is generally 30-70x slower than C; with that much overhead, integer auto-promotion is a negligible cost. Julia, on the other hand is within 1-2x slower than C – and we’re aiming to close that gap all the way. When you’re that close to C, any additional overhead on something as basic as integer arithmetic is very costly.
@Watson Aname: you’re right – the tradeoff is a bit more complicated. There’s a third factor, which is transparency and simplicity of program execution and data representation. Some Lisp implementations do use fancy tricks like the ones you describe to get bigint behavior without much performance cost. However, there is another cost: it ceases to be easy to understand what’s happening and why a program has the performance characteristics that it does.
In Julia, we want simplicity and transparency as well as performance. In this case, if you understand that Julia’s integers are machine Int64s, then you understand exactly how they behave how they are represented. Ultimately, for the vast majority of programs, you just want your integers to be 64-bit machine ints. Moreover, if you’re storing integers in arrays – as happens now and then in numerical computing – having a simple, standard, transparent representation becomes crucial. If your default integers are just 64-bit machine integers, then you know exactly what an array of them looks like in memory. This has lots of benefits: the array can be passed as-is to C and Fortran libraries; you can write the array to disk and know exactly what you’ll get; you can mmap a data file of Int64s into a Julia int array and it just works. You can’t do any of those things if your system plays tricks with how integers are represented.
In my experience, having a simple, predictable representation for your data is far more important in practice than trying to ensure that factorial(200) gives a mathematically correct answer. Computing things like factorial(200) is a really a gimmick that doesn’t actually come up much in real life. But, of course, if it does come up, you can just use BigInts and get your answer.
One of the things I like about Julia is that it gives you such a high level syntax for expressing such low-level constructs. This abstraction is so good that we are surprised when we realize how close to the machine we really are.
Excellent post – really like how Julia is used to explain Julia.