Computers are Machines

When people try out Julia for the first time, many of them are worried by the following example:

1
2
3
4
5
6
7
julia> factorial(n) = n == 0 ? 1 : n * factorial(n - 1)
 
julia> factorial(20)
2432902008176640000
 
julia> factorial(21)
-4249290049419214848

If you’re not familiar with computer architecture, this result is very troubling. Why would Julia claim that the factorial of 21 is a negative number?

The answer is simple, but depends upon a set of concepts that are largely unfamiliar to programmers who, like me, grew up using modern languages like Python and Ruby. Julia thinks that the factorial of 21 is a negative number because computers are machines.

Because they are machines, computers represent numbers using many small groups of bits. Most modern machines work with groups of 64 bits at a time. If an operation has to work with more than 64 bits at a time, that operation will be slower than a similar operation than only works with 64 bits at a time.

As a result, if you want to write fast computer code, it helps to only execute operations that are easily expressible using groups of 64 bits.

Arithmetic involving small integers fits into the category of operations that only require 64 bits at a time. Every integer between -9223372036854775808 and 9223372036854775807 can be expressed using just 64 bits. You can see this for yourself by using the typemin and typemax functions in Julia:

1
2
3
4
5
julia> typemin(Int64)
-9223372036854775808
 
julia> typemax(Int64)
9223372036854775807

If you do things like the following, the computer will quickly produce correct results:

1
2
3
4
5
julia> typemin(Int64) + 1
-9223372036854775807
 
julia> typemax(Int64) - 1
9223372036854775806

But things go badly if you try to break out of the range of numbers that can be represented using only 64 bits:

1
2
3
4
5
julia> typemin(Int64) - 1
9223372036854775807
 
julia> typemax(Int64) + 1
-9223372036854775808

The reasons for this are not obvious at first, but make more sense if you examine the actual bits being operated upon:

1
2
3
4
5
6
7
8
julia> bits(typemax(Int64))
"0111111111111111111111111111111111111111111111111111111111111111"
 
julia> bits(typemax(Int64) + 1)
"1000000000000000000000000000000000000000000000000000000000000000"
 
julia> bits(typemin(Int64))
"1000000000000000000000000000000000000000000000000000000000000000"

When it adds 1 to a number, the computer blindly uses a simple arithmetic rule for individual bits that works just like the carry system you learned as a child. This carrying rule is very efficient, but works poorly if you end up flipping the very first bit in a group of 64 bits. The reason is that this first bit represents the sign of an integer. When this special first bit gets flipped by an operation that overflows the space provided by 64 bits, everything else breaks down.

The special interpretation given to certain bits in a group of 64 is the reason that factorial of 21 is a negative number when Julia computes it. You can confirm this by looking at the exact bits involved:

1
2
3
4
5
julia> bits(factorial(20))
"0010000111000011011001110111110010000010101101000000000000000000"
 
julia> bits(factorial(21))
"1100010100000111011111010011011010111000110001000000000000000000"

Here, as before, the computer has just executed the operations necessary to perform multiplication by 21. But the result has flipped the sign bit, which causes the result to appear to be a negative number.

There is a way around this: you can tell Julia to work with groups of more than 64 bits at a time when expressing integers using the BigInt type:

1
2
3
4
5
6
7
8
9
10
julia> require("BigInt")
 
julia> BigInt(typemax(Int))
9223372036854775807
 
julia> BigInt(typemax(Int)) + 1
9223372036854775808
 
julia> BigInt(factorial(20)) * 21
51090942171709440000

Now everything works smoothly. By working with BigInt‘s automatically, languages like Python avoid these concerns:

1
2
3
4
>>> factorial(20)
2432902008176640000
>>> factorial(21)
51090942171709440000L

The L at the end of the numbers here indicates that Python has automatically converted a normal integer into something like Julia’s BigInt. But this automatic conversion comes at a substantial cost: every operation that stays within the bounds of 64-bit arithmetic is slower in Python than Julia because of the time required to check whether an operation might go beyond the 64-bit bound.

Python’s automatic conversion approach is safer, but slower. Julia’s approach is faster, but requires that the programmer understand more about the computer’s architecture. Julia achieves its performance by confronting the fact that computers are machines head on. This is confusing at first and frustrating at times, but it’s a price that you have to pay for high performance computing. Everyone who grew up with C is used to these issues, but they’re largely unfamiliar to programmers who grew up with modern languages like Python. In many ways, Julia sets itself apart from other new languages by its attempt to recover some of the power that was lost in the transition from C to languages like Python. But the transition comes with a substantial learning curve.

And that’s why I wrote this post.

22 responses to “Computers are Machines”

  1. Daniel Rupis

    “Because they are machines, computers represent numbers using many small groups of bits” better to say that using a small group of bits implies a trade off between speed and accuracy. Being a machine is unrelated to how numbers are introduced or coded in a computer.

    Using a small number of bits means that there will be overflow and other errors, so you need to be aware of what is the cost of using this trade off.

    Machines are not liable of our decisions.

  2. johnny

    I think that it is a reasonable decision to use BigInt by default if the language dosn’t have a type system (POLS).

    In scala you wouldn’t wonder because you were forced to declare your function to return ints or bigints. Julia allows you to specify a type, but if you don’t it would be nice if it would be automatically converted, even at the expense of some processing time.

  3. Andrew

    I believe that the author over estimates the benefits (code execution speed) and under estimates the cost (additional programming complexity and the potential for data-dependent bugs). How many potential Julia users are as capable of handling overflows as the Python development team? After accounting for overflow guards, how much faster is Julia? Benchmark?

    There is a place and an argument for moving the calculation down the machine level. However, doing that in the design of a general purpose computing language, IMHO, does not make for safe and reliable statistical computing environment.

  4. Watson Aname

    The trade off isn’t quite as simple as you make it out to be. Common Lisp uses a few bits of type information so has a smaller range of small integer operations, but seamlessly moves to big ints as needed. If you stay within the bounds of small integer operations, you keep the performance, but you safely overflow into big int operations when needed. If you or the compiler can guarantee this isn’t going to happen, c like performance is achievable locally. This is probably the right thing to do in a language like Julia too, but has some pretty deep implications for the type system and compiler….

  5. Stefan Karpinski

    @Andrew: the Julia website has benchmarks comparing Julia’s performance at recursion, loops, etc. to Python, among other high-level dynamic languages: http://julialang.org/. Python is generally 30-70x slower than C; with that much overhead, integer auto-promotion is a negligible cost. Julia, on the other hand is within 1-2x slower than C – and we’re aiming to close that gap all the way. When you’re that close to C, any additional overhead on something as basic as integer arithmetic is very costly.

  6. Stefan Karpinski

    @Watson Aname: you’re right – the tradeoff is a bit more complicated. There’s a third factor, which is transparency and simplicity of program execution and data representation. Some Lisp implementations do use fancy tricks like the ones you describe to get bigint behavior without much performance cost. However, there is another cost: it ceases to be easy to understand what’s happening and why a program has the performance characteristics that it does.

    In Julia, we want simplicity and transparency as well as performance. In this case, if you understand that Julia’s integers are machine Int64s, then you understand exactly how they behave how they are represented. Ultimately, for the vast majority of programs, you just want your integers to be 64-bit machine ints. Moreover, if you’re storing integers in arrays – as happens now and then in numerical computing – having a simple, standard, transparent representation becomes crucial. If your default integers are just 64-bit machine integers, then you know exactly what an array of them looks like in memory. This has lots of benefits: the array can be passed as-is to C and Fortran libraries; you can write the array to disk and know exactly what you’ll get; you can mmap a data file of Int64s into a Julia int array and it just works. You can’t do any of those things if your system plays tricks with how integers are represented.

    In my experience, having a simple, predictable representation for your data is far more important in practice than trying to ensure that factorial(200) gives a mathematically correct answer. Computing things like factorial(200) is a really a gimmick that doesn’t actually come up much in real life. But, of course, if it does come up, you can just use BigInts and get your answer.

  7. Adam Savitzky

    One of the things I like about Julia is that it gives you such a high level syntax for expressing such low-level constructs. This abstraction is so good that we are surprised when we realize how close to the machine we really are.

  8. Edward Garson

    Excellent post – really like how Julia is used to explain Julia.

  9. stla

    Hello. I’m mainly a R user. I have one comment and one question.

    Comment: R has not very strong numerical performance but generally it is sufficient. So I agree it’s better to increase the performance only when needed. Automatic conversion hides the fact that you need something more than usual performance. Moreover, switching from Int to BigInt seems to be easy with Julia.

    Question: Is the Julia BigInt type similar to the GMP library for “arithmetic without limitations” ? (as I used here http://stla.overblog.com/the-binary-splitting-with-the-r-gmp-package-application-to-gauss-hypergeometric-function )

    Cheers.

    1. John Myles White

      Julia’s BigInt type is a wrapper for GMP

  10. stla

    Thank you John. Excellent :)

    Is there something like the Rmpfr package in Julia for “arbitrarily precise numbers” ? (http://cran.r-project.org/web/packages/Rmpfr/vignettes/Rmpfr-pkg.pdf)

    I still have an open question here : http://stackoverflow.com/questions/14302742/conversion-bigq-to-mpfr-with-rmpfr-package

  11. Stefan Karpinski

    Julia’s BigFloats are a implemented using MPFR, so yes. Both BigInt (GMP) and BigFloat (MPFR) types are built into Julia.

  12. Stefan Karpinski

    Also, regarding ease-of-use, this seems to be much more pleasant in Julia than in R:

    julia> big(1)
    1

    julia> float(big(1))
    1e+00 with 256 bits of precision

  13. stla

    Thank you, nice. GMP rationals are not available yet ? (in R, I use GMP rationals with the RCDD package, which is actually an interface to the cddlib library).

  14. Stefan Karpinski

    You can create Rationals of any integer type:

    julia> big(2)^1000//big(3)^1000
    10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376//1322070819480806636890455259752144365965422032752148167664920368226828597346704899540778313850608061963909777696872582355950954582100618911865342725257953674027620225198320803878014774228964841274390400117588618041128947815623094438061566173054086674490506178125480344405547054397038895817465368254916136220830268563778582290228416398307887896918556404084898937609373242171846359938695516765018940588109060426089671438864102814350385648747165832010614366132173102768902855220001

    julia> typeof(ans)
    Rational{BigInt}

    We use this instead of GMP’s rationals because it integrates much better with other numerics in Julia.

  15. stla

    With the latest Julia Studio on Windows :

    julia> big(3)
    big not defined

    It works with BigInt() :

    julia> BigInt(3)
    3

    But the quotient is undefined :

    julia> BigInt(3)/BigInt(6)
    no method convert(Type{Float64},BigInt)

  16. Stefan Karpinski

    The big function was added fairly recently. Try BigInt(1) and BigFloat(1.0) and such. Or install the latest Julia from source (take a while the first time, but it’s worth it).

  17. stla

    Hello Stefan. There’s no urgency for me. And actually I’d really need the cddlib functions/macros returing results in GMP rationals. This would allow to write a program helping me for my research in mathematics. This program written in R is extremely slow.

  18. stla

    Why I need the cddlib library: http://pagist.github.io/?5861876
    (this is a preliminary version of the next article of my blog)