I’ve changed the theme on this blog to the Hybrid theme. I was tired of how busy the old theme was, and I wanted more space for graphics and code snippets. If you have any suggestions for cleaning things up even more, please let me know.
The Price of Calculation
In a world in which the price of calculation continues to decrease rapidly, but the price of theorem proving continues to hold steady or increase, elementary economics indicates that we ought to spend a larger and larger fraction of our time on calculation.1
Over the next ten years, I hope that more and more mathematically minded hackers, empowered by open source tools like the R programming language and emboldened by the popularization of statistical analyses by people like Steve Levitt, will follow Tukey’s suggestion.
- J. W. Tukey : The American Statistician : Sunset Salvo↩
An Alternative to Occam’s Razor
In light of human foibles, I would suggest that this decision rule be used in lieu of Occam’s Razor: of several possible explanations for an observation, the most boring one is probably the most accurate.
iBad: The FSF Kool-Aid and Other Dystopian Hallucinations
The people who worry that the iPad will bring about a dystopian future for home computing keep forgetting something: for the rest of humanity, their ideal world of perfectly hackable machines is already a dystopian nightmare. It’s a world in which nothing works without spending hours setting it up, in which basic features are missing while the manual lists thousands of irrelevant options, in which a million hardware extensions are available for their machine, but none of them help to solve a single one of their day-to-day problems. While being something of a hacker myself, I feel that the hacker’s vision of totally open computing probably should become a niche market, in much the same way that chemistry sets represent a niche market. The fact that not every person has a set of tools in his house that, by default, allows him to conduct arbitrary chemistry experiments has not substantially slowed down the progress of chemistry from what I can tell. The arrival of a world in which the most popular computers are closed to arbitrary hardware extensions and all applications are required to run within a sandbox probably won’t slow down the progress of personal computing much either.
Hackers of the world, your priorities are not simply different from the average user’s: they often represent a direct attack on the average user’s preferences. You keep asserting that you have the normal person’s interests in mind, but I think you’re often simply concealing your own self-interest underneath politicized rhetoric about freedom and openness.
Gay Marriage: Another Data Point
Relevant to my earlier post about the relationship between direct democracy and laws prohibiting gay marriage, Pew Research just published poll data showing that a majority of Americans disapprove of same-sex marriage.
Academics’ Slang: Orthogonal
H. G. Wells famously said that, “statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.” I think we’re getting closer to that day: even the Supreme Court of the United States plans to start using the word ‘orthogonal’ colloquially.
Outlawing Gay Marriage
Given the recent votes on same-sex marriage in New Jersey and Portugal, I wanted to test a seemingly innocuous claim that touches upon very broad issues in political theory: does the degree of directness of a “democratic” vote predict whether the vote will promote or prohibit same-sex marriage? Naively, it seemed clear to me that this was true: every single direct vote in my memory has outlawed same-sex marriage, while the only decisions that have allowed same-sex marriage have originated among elected representatives or unelected judges.
I decided to compile some rough data to test this idea using Wikipedia’s articles on the topic. I took what seemed to be the non-redundant decisions since the 1990′s from the following three articles:
- Same-Sex Marriage in the Netherlands
- Same-Sex Marriage in Spain
- Same-Sex Marriage in the United States
And I arrived at this table:
| Region | Date | Type of Decision | Promoted |
|---|---|---|---|
| Hawaii | 1993-05-05 | Court Ruling | 1 |
| U.S.A. | 1996-09-21 | Congressional Decision | 0 |
| Hawaii | 1998-11-03 | Direct Vote | 0 |
| Vermont | 1999-12-20 | Court Ruling | 1 |
| Holland | 2000-12-19 | Congressional Decision | 1 |
| Massachusetts | 2003-11-18 | Court Ruling | 1 |
| Spain | 2005-06-30 | Congressional Decision | 1 |
| California | 2008-05-15 | Court Ruling | 1 |
| Connecticut | 2008-10-10 | Court Ruling | 1 |
| California | 2008-11-04 | Direct Vote | 0 |
| New Hampshire | 2009-03-26 | Congressional Decision | 1 |
| Iowa | 2009-04-03 | Court Ruling | 1 |
| Vermont | 2009-04-07 | Congressional Decision | 1 |
| Connecticut | 2009-04-23 | Congressional Decision | 1 |
| Maine | 2009-05-06 | Congressional Decision | 1 |
| New York | 2009-05-12 | Congressional Decision | 1 |
| Texas | 2009-10-02 | Court Ruling | 1 |
| Maine | 2009-11-03 | Direct Vote | 0 |
| Washington, D.C. | 2009-12-01 | Court Ruling | 1 |
| New York | 2009-12-02 | Congressional Decision | 1 |
| New Jersey | 2010-01-07 | Congressional Decision | 0 |
You can find a CSV file with this data at the GitHub repository where I’ve stored all of the analyses I’ve completed so far. I’d love for people to add more data or contribute some visualizations of the patterns I’m pointing out. And, as always, I’d love to hear criticisms or suggestions about my approach.
The summary statistics provide pretty clear support for my intuition, though they’re rarely statistically significant, given the small sample sizes:
- 100% of court rulings promoted same-sex marriage.
- 80% of congressional decisions promoted same-sex marriage.
- 0% of direct votes promoted same-sex marriage.
For me the conclusion to be drawn from all of this is not that our respect for the will of the majority should compel us to prohibit same-sex marriage, but rather that we should be more hostile to direct democracy, because it is a powerful force for promoting intolerance in American society. I imagine this will sound un-American to many readers, but I think it’s quite clear that this sentiment was foundational for the American experiment in government. You see this distrust of direct democracy repeatedly reiterated in the Federalist Papers, especially in Federalist Number 10. Indeed, one can argue that the major purpose of our Constitution is to limit the ability of direct democracy to harm minorities within our society.
Killing Yourself: An Addendum
In further support of the claim that a lot of deaths are partly self-induced, here’s a fascinating piece by Wired on the extraordinary rise in the percent of deaths among the young caused by their own poor decisions.
It’s remarkable that, for the young, modern science has already made the world so safe that humanity, rather than nature, is now responsible for a majority of its own suffering.
Announcing r-ORM: A Pure R Object-Relational Mapper
My apologies for the long break between posts. Before the end of this week I’ll return to my series of posts on image processing in R. In the intervening time, I’ve finished a piece of code that I’d like to officially release to the public.
The code in question is a very minimal object-relational mapper written entirely in R. As of today, you can find the code on GitHub here. If you’re not familiar with using an ORM, I’d suggest reading a bit about how to use ActiveRecord, the ORM I’ve tried to emulate.
As it stands, the code I have is able to connect to a MySQL database and extract information about a specified table using MySQL’s SHOW COLUMNS. The code then automatically builds up R code that will map the rows of the table onto R objects. This code can subsequently be eval’d or cat’d to a file.
Let me note right off the bat that the code I’ve written is fairly heinous stylistically: because my understanding of R metaprogramming is rather limited, I’ve produced what amounts to the ugliest sort of code generation tool. If you check out the source in orm.R, you’ll quickly see what I mean: the output R code is generated as a string using a long series of paste() operations, many of which embed paste() operations inside of themselves. In the future, I would like to rewrite the code in a clearer fashion, possibly constructing expressions and parse trees directly, rather than using an intermediate layer of code as string.
That said, I want to release the code so that I can get feedback on my approach. To help you give me feedback, here’s a quick walkthrough of how’d you use the current code:
(1) Download the code from its GitHub home. Place the files you get into a directory of your choice. Let’s assume you use ~/r-ORM.
(2) Install the R YAML and MySQL packages if you don’t already have them on your system:
1 2 | install.packages('yaml') install.packages('RMySQL') |
(3) Create a test database in MySQL called `sample_database`:
1 | CREATE DATABASE `sample_database`; |
(4) Give permissions on this test database to `sample_user`:
1 2 3 | GRANT ALL ON `sample_database`.* TO 'sample_user'@'localhost' IDENTIFIED BY 'sample_password'; FLUSH PRIVILEGES; |
(5) Create a test table called `users` in `sample_database`:
1 2 3 4 5 6 7 8 | USE `sample_database`; CREATE TABLE `users` ( `user_id` INT(11) NOT NULL AUTO_INCREMENT, `name` VARCHAR(255) NOT NULL, `password` VARCHAR(255) NOT NULL, PRIMARY KEY (`user_id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1; |
(6) Edit the database.yml file if you didn’t use the database name, user name or password I just suggested. You’ll also need to edit it if you’re not working on localhost.
(7) Open up an R interpreter. Set your working directory and source orm.R:
1 2 | setwd('~/r-ORM') source('orm.R') |
(8) Build code for our user objects using orm.build.model('user'):
1 2 | code <- orm.build.model('user') cat(code) |
At this point, you can review the code that’s been generated to see whether it should work on your system. If it will, you can eval it now:
1 | eval(parse(text = code)) |
With that done, you should have a working set of functions that handle creating, finding, manipulating and deleting R objects that are serialized to the database. To test out the resulting model, let’s start by creating a user object. In general, an object of class foo will be created using an auto-generated function called create.foo:
1 | user <- create.user() |
Calling this functions builds a user object in memory. Nothing is in the database so far. To see the object, type user at the command line:
1 | user |
Now you can edit this user to make it a real piece of data. The columns of the database table are mapped onto object attributes with appropriate getter and setter methods, like so:
1 2 | name(user) <- 'test_user' password(user) <- 'test_password' |
Once the user object is worth keeping, we store it in the database using store:
1 | user <- store(user) |
This stores the user object in the database. To get the ID’s edited correctly, you have to perform the assignment as indicated above. In the future I may change this.
After storing something, you might want to retrieve it later. Since we know that we just created the first user object, we can get it again by using a find.user call:
1 | user <- find.user(1) |
In general, you can find objects of class foo by calling find.foo(). If you provide an integer as an input, you’ll get the object with that ID. If you provide the string 'all', you’ll get a list containing all of the objects in your database. Other inputs produce an error.
You can see that we got the correct object using the getter methods:
1 2 3 | user.id(user) == 1 name(user) == 'test_user' password(user) == 'test_password' |
We can edit it again to see that updating rows of the database works as expected:
1 2 | name(user) <- 'new_user' store(user) |
Finally, now that we’re done with it, we can delete it from the database:
1 | delete(user) |
That, in a nutshell, is the use of this ORM solution. If you have any questions, please let me know: I’ll be happy to answer them.
I should note that there are many weaknesses of functionality with the current code. Notably:
- If you want a ‘foo’ object, it must be in a ‘foos’ table. This is inspired by ActiveRecord’s default pluralization rule.
- There’s no type checking. You can try to set string values on numeric columns; you’ll just get strange errors as a result.
- There’s no checking whether you’re inserting a NULL into a column that won’t allow NULL’s.
- The system might clobber an existing generic method or function if there’s a naming conflict. I’d recommend writing everything to disk before using this code until you’re familiar with the results.
- The default objects are completely blank: the database default values are not used to initialize any attibutes of a new object.
- The SQL generated is not sanitized. You should not use this system to process end-user input.
That said, I find the system usable for my current needs. I hope it will help you, and I really hope that you’ll consider making suggestions for hoping to improve the code. Patches would be especially welcome.
I would also love feedback about the interface that’s shown to the end user of this code. Here are some outstanding questions I’d like to hear from people about:
- Are function names like
orm.build.model()memorable? - Does the syntax of calls on the resulting objects seem sufficiently R-like?
- Would it be useful to break out my minimal database abstraction layer into a separate package? Is there already a package to do this that I should be using instead?
- How can I avoid the “code as string” approach I’m taking?
- Do you have ideas about the ideal user syntax for implementing relations across tables, i.e. has_one or has_many relationships?
- Should a caching mechanism be designed?
- Should the database queries be minimized using some sort of pooling?
- Should transactions be used?
Using Complex Numbers in R
This post is a continuation of my series dealing with matrix operations for image processing. My next goal is to demonstrate the construction of simple low-pass and high-pass spatial frequency filters in R. It’s easy enough to construct simple versions of these filters in R using the Fast Fourier Transform (also known as the FFT), but, because the FFT is a slightly complicated tool, I’m going to build up to using it progressively over a few posts.
For starters, I want to review the use of complex numbers in R. As always, if you’re interested in reviewing the mathematics of complex numbers, I’d start by browsing references online. The tutorial here seemed good to me at first glance, though I can’t claim to have read it through.
Because complex numbers are implemented in the “base” package, it’s very easy to start working with them. To construct the complex number x + iy, you use complex:
1 2 3 4 5 6 7 | x <- 1 y <- 1 z <- complex(real = x, imaginary = y) z # [1] 1+1i |
It’s conventional in mathematics to use z to refer to a complex number, so I’ll continue on with that tradition.
As always occurs with mathematical data types in R, you can convert other objects to class “complex” using as.complex:
1 2 | as.complex(-1) # [1] -1+0i |
And you can test that an object is complex using is.complex:
1 2 | is.complex(as.complex(-1)) # [1] TRUE |
Beyond those standard operations, there are five essential mathematical operations you’d want to use on complex numbers.
First off, you want to be able to extract the real and imaginary components of a complex number. You can do this using Re and Im respectively:
1 2 3 4 5 6 7 | z <- complex(real = 2, imaginary = 1) Re(z) # [1] 2 Im(z) # [1] 1 |
The (x, y) representation of numbers is easier to understand at first, but a polar coordinates representation is often more practical. You can get the relevant components of this representation by finding the modulus and complex argument of a complex number. In R, you would use Mod and Arg:
1 2 3 4 5 6 7 8 9 10 | z <- complex(real = 0, imaginary = 1) Mod(z) # [1] 1 Arg(z) # [1] 1.570796 pi / 2 # [1] 1.570796 |
This corresponds to the intuition that i should be at a distance 1 from the origin and an angle of pi / 2.
Finally, you’ll want to be able to take the complex conjugate of a complex number; to do that in R, you can use Conj:
1 2 3 4 5 | Conj(z) # [1] 0-1i Mod(z) == z * Conj(z) # [1] TRUE |
As you can see, the modulus of z equals z times the conjugate of z, which is exactly what you expect.
Now, historically, complex numbers were invented so that you could find the square root of negative numbers. By default, sqrt does not return a complex number when you ask for the square root of a negative number. Instead, it produces a NaN error, as you can see below:
1 2 3 4 | sqrt(-1) # [1] NaN # Warning message: # In sqrt(-1) : NaNs produced |
To get the complex square root, you need to cast your negative number as a complex number using as.complex before applying sqrt:
1 2 3 | sqrt(as.complex(-1)) # [1] 0+1i |
In practice, complex numbers can be used to solve a huge range of problems, not least of which is the efficient representation of the FFT of an input vector. For fun, I thought I’d show a simple application: building a fractal using the Mandelbrot set.
To quote Wikipedia,
Mathematically the Mandelbrot set can be defined as the set of complex values of c for which the orbit of 0 under iteration of the complex quadratic polynomial zn+1 = zn2 + c remains bounded. That is, a complex number, c, is in the Mandelbrot set if, when starting with z0 = 0 and applying the iteration repeatedly, the absolute value of zn never exceeds a certain number (that number depends on c) however large n gets.
We can translate this definition into R pretty easily by making certain assumptions about the exactness we want in our results. For my purposes, I’ll say that a number is in the Mandelbrot set if, after 100 iterations of the Mandelbrot algorithm, it’s never once had a modulus greater than 1,000,000. As you’ll see from the image I produced, this assumption works out pretty well, though it takes some real number crunching to get results out of it for reasonable sized data sets.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | in.mandelbrot.set <- function(c, iterations = 100, bound = 1000000) { z <- 0 for (i in 1:iterations) { z <- z ** 2 + c if (Mod(z) > bound) { return(FALSE) } } return(TRUE) } |
Once we have this algorithm, we can easily generate an image of the Mandelbrot set by producing a matrix of complex numbers and depicting their inclusion in the set using image:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | resolution <- 0.001 sequence <- seq(-1, 1, by = resolution) m <- matrix(nrow = length(sequence), ncol = length(sequence)) for (x in sequence) { for (y in sequence) { mandelbrot <- in.mandelbrot.set(complex(real = x, imaginary = y)) m[round((x + resolution + 1) / resolution), round((y + resolution + 1) / resolution)] <- mandelbrot } } png('mandelbrot.png') image(m) dev.off() |
And here’s the resulting image:

UPDATE: Thanks to Giuseppe for pointing out the rounding error in the original code, producing an image with lines in it.