Today I started reading the Ruby Snips website, which has a pretty good sample of interesting snippets of Ruby code on it. I was particularly intrigued by the following snippet from a post on Prime Numbers dating back to March 23rd, 2007:
1 2 3 4 5 | class Fixnum def prime? ('1' * self) !~ /^1?$|^(11+?)\1+$/ end end |
At first, I was convinced this code was broken. Before I had given much thought to the algorithm, I was ready to assume that the “prime?” name was a misnomer: I assumed the code was actually testing for members of a specific class of palindromes. After a minute or two, I pieced together what was really happening when testing a number:
- The number, n, is expanded into a string of 1′s of length n.
- The string of 1′s is testing using a regex with back-references that finds the presence of a repeated divisor, a technique that works because a string of length a * b = n is composed of b copies of a string of length a.
At that point, I began to marvel at the simplicity of the algorithm relative to the obscurity of the code. And while I was so amazed by this, it occurred to me: this is an absolutely terrible way to test for primality. Because you have no control over the loop bounds for the regex, you effectively test every number up to n as a possible divisor if n is a prime. But you really only need to test up to the square root of n to determine if n is prime. So your code runs for the square of the time it should run. And the expansion of n into a string, at least theoretically, requires exponentially more space than an integer expressed in binary would. This seemed like one of the worst possible ways to test for primality I had ever seen.
Still, I wanted to verify this empirically as well as theoretically, so I typed in
1 | 399839483.prime? |
during an IRB session. The CPU and memory usage were absurd for a primality test on such a small number. I didn’t even bother letting the code complete after a few seconds of the watching interpreter hanging there, silently wasting CPU cycles. In contrast, the simpler, clearer code,
1 2 3 4 5 6 7 8 9 10 | class Fixnum def prime? (2..Math.sqrt(self).to_i).each do |i| if i * (self / i) == self return false end end return true end end |
runs fairly quickly.
The lesson I think everyone should take from this is a simple one: don’t be clever when writing code. At the very least, don’t be clever in the way the author of this snippet was. Your code is likely to end up being unclear, slow and wasteful with memory.
Perhaps I should be clearer about the problems of cleverness. Clever code is a mistake; clever algorithms are wonderful. Quick sort is a clever algorithm that always confuses me until I think carefully about it — and then I marvel at its superiority over other less efficient sorting algorithms; this snippet is a mediocre algorithm written in a style of code that’s needlessly confusing.
I really like this article, and it is a good example of someone trying to save line space at the price of efficiency.
However, clever code that uses clever algorithms and techniques is really wonderful when you see it. I found this quite common when working with small problems in prolog and haskell.
Also, I image the overhead of using a regex in that algorithm was partially responsible for the poor performance.
I’m glad you liked this post, Jason. I agree that clever code using clever algorithms is great, so long as it’s still readable. An amazing example is the regular expression parser featured in the first chapter of O’Reilly’s Beautiful Code. I also think that more sophisticated languages like Prolog, Haskell or even Ruby allow people to write clever, but understandable, code more easily.
And, yes, I’m sure that the overhead of using a regex in the algorithm is a major source of the poor performance, though even an implementation just using direct comparisons of substrings would take longer. If you’ve got Ruby on one of your machines, just try the example calculation with both algorithms above. I can never bring myself to let the first bit of code continue for more than thirty seconds, while the second bit finishes in less than a second.