Jan 29 2010

iBad: The FSF Kool-Aid and Other Dystopian Hallucinations

The people who worry that the iPad will bring about a dystopian future for home computing keep forgetting something: for the rest of humanity, their ideal world of perfectly hackable machines is already a dystopian nightmare. It’s a world in which nothing works without spending hours setting it up, in which basic features are missing while the manual lists thousands of irrelevant options, in which a million hardware extensions are available for their machine, but none of them help to solve a single one of their day-to-day problems. While being something of a hacker myself, I feel that the hacker’s vision of totally open computing probably should become a niche market, in much the same way that chemistry sets represent a niche market. The fact that not every person has a set of tools in his house that, by default, allows him to conduct arbitrary chemistry experiments has not substantially slowed down the progress of chemistry from what I can tell. The arrival of a world in which the most popular computers are closed to arbitrary hardware extensions and all applications are required to run within a sandbox probably won’t slow down the progress of personal computing much either.

Hackers of the world, your priorities are not simply different from the average user’s: they often represent a direct attack on the average user’s preferences. You keep asserting that you have the normal person’s interests in mind, but I think you’re often simply concealing your own self-interest underneath politicized rhetoric about freedom and openness.


Nov 20 2009

Cleaning Up an iTunes Library with MacRuby

For a little more than a year now, I’ve been meaning to write a script to rename all of the files in my iTunes library so that they’re in proper English title case. In large part, this project was inspired by reading John Gruber’s post about a Perl script that he’d written to convert text strings into title case programmatically. After reading Gruber’s post, I grabbed a copy of Sam Souder’s translation of Gruber’s Title Case script into Ruby and set about trying to use it as part of a home-grown Ruby script to clean up my iTunes library. During my first pass, I tried to combine Sam’s convenient utility method with RubyCocoa to produce a script that could access the iTunes methods directly and rename my files automatically without editing the MP3 files directly. Unfortunately, the RubyCocoa documentation was too sparse at the time for me to figure out the relevant method calls to be making.

Thankfully, I spent some time this week reading the MacRuby documentation and realized in the process how to write my desired script. The results are now on GitHub. The only original bit of code is below:

#!/usr/bin/macruby
 
# We iterate over every track, stripping bounding whitespace and putting things into proper title case.
 
require 'titlecase'
 
framework 'cocoa'
 
load_bridge_support_file 'ITunes.bridgesupport'
 
framework("ScriptingBridge")
 
itunes = SBApplication.applicationWithBundleIdentifier("com.apple.itunes")
 
music_playlist_tracks = itunes.sources.objectWithName("Library").userPlaylists.objectWithName("Music").fileTracks
 
music_playlist_tracks.each do |track|
  old_artist = track.artist
  new_artist = track.artist.strip.titlecase
  if old_artist != new_artist
    track.artist = new_artist
    puts "Artist: #{old_artist} => #{new_artist}"
  end
 
  old_album = track.album
  new_album = track.album.strip.titlecase
  if old_album != new_album
    track.album = new_album
    puts "Album: #{old_album} => #{new_album}"
  end
 
  old_name = track.name
  new_name = track.name.strip.titlecase
  if old_name != new_name
    track.name = new_name
    puts "Name: #{old_name} => #{new_name}"
  end
end

Outside of this little snippet of MacRuby code that I had to write, I made two small changes to Sam Souder’s original code:

  1. I defined a titlecase method for the NSString class rather than the String class.
  2. I pulled the list of English prepositions out of the main code and put it into a separate YAML file to make it easier for a non-programmer to edit the list.

I know from experience with checking the results in a half gleeful and half paranoid frenzy that this code works properly on my own two machines, both of which are running MacRuby 0.5. That said, I won’t vouch for the reliability of this program in any way. If you notice any bugs, please do let me know, so that I can fix my own iTunes library with a revised version of the script.


Nov 18 2009

Suggestions for TextMate’s Search and Replace

Like so many other programmers, I adore TextMate. For that reason, here are two simple features that I’d enjoy seeing in the next version:

  1. A case-preserving search and replace tool. If I search for a string like my_class and want to replace it with my_new_class, I’d like my_class to transform into my_new_class at the same time that MyClass transforms into MyNewClass.
  2. A version of search and replace that also edits filenames. If I replace MyClass with MyNewClass everywhere, I’d like to have the file named MyClass.java renamed to MyNewClass.java at the same time.

Both of these would save me from a ton of stupid code drift problems while performing a global search and replace. Do other people think these are interesting enough to merit suggesting them directly to Allan Odgaard?
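To make the first suggestion concrete, here is a minimal Python sketch of a case-preserving replace. It is only an illustration of the idea, not anything TextMate provides: the helper names to_camel and case_preserving_replace are made up for this example, and it assumes that only the snake_case and CamelCase spellings of an identifier need to be handled.

```python
import re


def to_camel(snake: str) -> str:
    """Convert a snake_case identifier such as my_class to CamelCase (MyClass)."""
    return "".join(word.capitalize() for word in snake.split("_"))


def case_preserving_replace(text: str, old: str, new: str) -> str:
    """Replace the snake_case and CamelCase spellings of old with those of new."""
    # Hypothetical helper: maps each spelling of the old name to the same
    # spelling of the new name.
    variants = {old: new, to_camel(old): to_camel(new)}
    # Try longer alternatives first so a shorter one never shadows a longer one.
    alternatives = sorted(variants, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in alternatives))
    return pattern.sub(lambda match: variants[match.group(0)], text)


source = "my_class = MyClass()"
print(case_preserving_replace(source, "my_class", "my_new_class"))
```

Run on the sample line, this rewrites both spellings in one pass, giving my_new_class = MyNewClass(). Renaming files, as in the second suggestion, would need a separate step.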


Nov 11 2009

Two of Unison’s Quirks

As many people know, I adore the program Unison. That said, Unison has its fair share of quirks. Today I ran into one that had already cost me a whole day of confusion about a month ago.

Unison stores a cache of information about the file system for every directory that it synchronizes. On Macs, this cache lives in ~/Library/Application Support/Unison/. What you need to know is that this cache must either be absent on both the client and the server or it must be present on both. If it gets deleted on one but not the other, Unison hangs indefinitely without producing any error messages when you try to synchronize the directories again. Suffice it to say, this is confusing.

Possibly this quirk is already fixed in newer versions, but I’ve had trouble compiling Unison recently, so I’ve left that task for the future. In the meantime, the other quirk that irks me is the inconsistency in the icons used by the Mac GUI. Sometimes a top level element in the GUI represents a folder and sometimes it represents a single file inside of a folder. I think that a cleaner GUI would simply render a folder with nested files every single time you had a discrepancy between the client and server — even if the discrepancy is a single file. Changing the meaning of items at the same spatial positioning seems to produce a Stroop effect for me — and I suspect for others as well.


Aug 25 2009

Updating R Packages Automatically

Here’s a very naive program I just wrote to update all of the R packages I have on my system after I update the core R binary. Please let me know if there’s anything obviously wrong with this, such as failing to update items with chains of dependencies.

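# Reinstall every package that was built under a different version of R.
all.packages <- installed.packages()
r.version <- paste(version[['major']], '.', version[['minor']], sep = '')
 
for (i in seq_len(nrow(all.packages)))
{
    package.name <- all.packages[i, 'Package']
    # The 'Built' column records the R version each package was built under;
    # the 'Version' column is the package's own version number.
    built.version <- all.packages[i, 'Built']
    if (built.version != r.version)
    {
        print(paste('Installing', package.name))
        install.packages(package.name)
    }
}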

Mar 20 2009

iPhone 3.0 and the End of Jott

For me, one of the best features of the upcoming iPhone 3.0 software is the ability to record voice messages for myself. This will totally replace Jott, which I decided a month or two ago to abandon after their recent — and asinine — decision to only offer voice recording to paying customers.

When it first debuted, Jott seemed to be useful. Unfortunately, it became clear over time that the speech-to-text conversion quality would remain low, particularly for the sorts of scientific ideas I would most want to take note of. Because of the terrible conversion quality, I would have been just as well served by a tool that merely recorded my voice and forewent the futile attempt at producing text. Since June of last year, I think that I have used Jott exactly one time.

And then Jott decided to begin to charge for their mediocre product, rather than improve it enough that their subscriber base would grow. This seemed to be a clear sign of their obsolescence. I suspect that this is a general principle in the Internet age: when you convert a free service to a paid service, you are effectively admitting that your advertising income is so low that you need to do anything you can just to stay afloat.

But these last, desperate attempts to stave off bankruptcy merely delay the inevitable — and in the process they alienate those few customers these sites have. I would advise any start-ups to avoid this tendency. When the time has come to fold, you should fold gracefully.


Feb 25 2009

Text Processing in R

On a regular basis, I have to process text in R. I invariably find that I need a function whose name or usage I can’t bring to mind. To help my future self, I’m writing this review of R’s built-in text processing functions. Hopefully, this review will also be of use to others.

Character Vectors == Arrays of Strings
The first source of confusion for me is the R type system. In R, a string is considered to be a character vector, but an R character vector would be an array of strings in any other programming language. Consider the following example:

str = 'string'
str[1] # This evaluates to 'string'.

To get access to the individual characters in an R string, you need to use the substr function:

str = 'string'
substr(str, 1, 1) # This evaluates to 's'.

For the same reason, you can’t use length to find the number of characters in a string. You have to use nchar instead.

But let’s go back to substr. The first argument to substr is a character vector, the second is the index of the first character you want, and the third is the index of the last character you want. So you can also use substr as follows:

str = 'string'
substr(str, 1, 2) == 'st'
substr(str, 5, 6) == 'ng'

As you can see, substr lets you access the individual characters of a string using an indexing/slicing strategy.
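For readers coming from one of those other languages, a short Python comparison may make the indexing difference concrete. This is only a side-by-side sketch, not R code: R's substr counts from 1 and includes both endpoints, while Python slices count from 0 and exclude the end index.

```python
word = "string"

# R: substr(str, 1, 2) == 'st'. Python starts at 0 and stops before index 2.
first_two = word[0:2]

# R: substr(str, 5, 6) == 'ng'. Python covers indices 4 and 5.
last_two = word[4:6]

# R: nchar(str). In Python, len counts the characters of a single string.
length = len(word)

print(first_two, last_two, length)
```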

To break strings apart into vectors of characters, you can use the strsplit function, which works a lot like the split function in Perl. Here’s an example:

strsplit('0-0-1', '-') # Evaluates to list(c('0', '0', '1'))

Putting Things Back Together Again
Now that you can pull strings apart, you need to be able to put the characters back together again into strings. You can do this using paste. paste is an idiosyncratic function: it is the only function for concatenation of strings in R, but it also handles the work of more sophisticated functions like Perl’s join. Try the following:

str1 = 'first'
str2 = 'second'
print(paste(str1, str2))

As you’ll see, there’s an odd space added to the output. That’s because paste has an optional argument that provides a separator used when combining strings that defaults to a single space. So,

paste('first', 'second') == paste('first', 'second', sep = ' ')

You can get rid of the space by specifying an empty string as the separator instead.

print(paste('first', 'second', sep = ''))

Changing Case
To change the case of strings or individual characters, you need to use the tolower and toupper functions. You can use these with substr to make a function that turns most common words into their title case form:

pseudo.titlecase = function(str)
{
	substr(str, 1, 1) = toupper(substr(str, 1, 1))
	return(str)
}

With a little more sophistication, you can make a full title case function à la John Gruber. The result of my attempt to do this is fairly long, so I’ve posted it to my GitHub account. I’ll probably see about adding it to the R repository at some point if I can incorporate enough features to make it worth using. If you’re interested in helping or using what I’ve written, you can check out the code here.

Finally, there is a chartr function that translates characters in the input into the corresponding characters you select. For instance, you might try this:

chartr('abc', 'XYZ', 'abcabc') # Evaluates to "XYZXYZ".

This will remind Perl users of tr, which I personally never use. Nevertheless, it’s nice knowing that it’s there in R, albeit with a slightly different name.

Substring Containment
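Finally, you might want to know whether a string matches the start of another string or of one of a set of strings. You can do this using the charmatch function, which performs partial matching on prefixes and returns the index of the unique match, or 0 when the match is ambiguous: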

charmatch("m",   c("mean", "mode")) # returns 0
charmatch("med", c("mean", "median")) # returns 2

I tend to use regular expressions by the time I would need substring matching, so I’m not sure if I would ever use charmatch in practice.

Future Ideas
For more sophisticated text processing, you would want to use regular expressions and the grep family of functions. I’ll have to read about them and write up something about their use in the future. R also implements an approximate regular expression matching system using Levenshtein edit distances, but I haven’t tried using that yet.


Feb 20 2009

Efficiency versus Readability

Every programming language — also every programmer — must trade off between writing code that is readable and producing code that executes the absolute minimum number of instructions to perform a task. This is a constant source of potential decisions, because it is almost always possible to make code less understandable while making it more efficient. For example, iterating over an entire list can be inefficient if you are only looking for three elements that may well be at the front of the list, but code that implements a generic search of the entire list — think grep — is usually much simpler to understand.
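A small Python sketch of that example, with made-up function names and data: the first version filters the whole list and is easy to read, while the second stops as soon as it has enough matches at the cost of more bookkeeping. Both return the same answer here.

```python
def first_matches_simple(items, predicate, limit):
    """Readable version: filter the entire list, then keep the first few hits."""
    return [item for item in items if predicate(item)][:limit]


def first_matches_early_exit(items, predicate, limit):
    """Efficient version: stop scanning once enough hits have been collected."""
    matches = []
    for item in items:
        if len(matches) >= limit:
            break
        if predicate(item):
            matches.append(item)
    return matches


def is_even(n):
    return n % 2 == 0


numbers = list(range(1, 100001))
print(first_matches_simple(numbers, is_even, 3))
print(first_matches_early_exit(numbers, is_even, 3))
```

The early-exit version inspects only the first six numbers, whereas the simple version tests all hundred thousand before discarding most of the results.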


Feb 16 2009

Yale Courses Online

Yale has updated its impressive set of videotaped lectures. For those interested in automating downloading the videos for any course, the script below should be useful. You’ll need to install the Perl module WWW::Mechanize before you can run the script. You’ll also want to update the list of course URLs to reflect the courses that you want to download.

#!/usr/local/bin/perl
 
use strict;
use warnings;
 
use WWW::Mechanize;
use File::Spec;
 
my $mech = WWW::Mechanize->new();
 
my @courses = (
    'http://oyc.yale.edu/astronomy/frontiers-and-controversies-in-astrophysics/content/downloads',
    'http://oyc.yale.edu/economics/financial-markets/content/downloads'
);
 
for my $course (@courses)
{
    $mech->get($course);
 
    for my $link ($mech->find_all_links)
    {
        if ($link->url =~ m/mov$/ and $link->text =~ m/high/i)
        {
            print $link->url;
            print $link->text;
            print "\n\n";
 
            my (undef, undef, $filename) = File::Spec->splitpath($link->url);
 
            print "$filename\n";
 
            $mech->get( $link->url, ':content_file' => $filename );
        }
    }
}

Feb 15 2009

Higher Order Programming

Updated: 2.17.09

Advocates of higher order programming languages, ranging from Lisp to Ruby, usually claim that programming in a higher order language is more efficient than programming in a language that is designed for easy translation into machine code, such as C. The case for higher order programming languages can be made on many points: (1) mental power, rather than computing power, is usually the limiting factor in modern program design; (2) fully generic operations are nearly or entirely impossible to express in languages that are too strongly typed; (3) etc., etc.

I think that we should emphasize cognitive factors instead: higher order programmers are experts in the application of a set of primitive operations that are more universally applicable to the problems we are asked to solve as programmers than the operations provided by lower level languages.

There is a considerable literature in cognitive psychology on problem solving and expertise. Experts solve problems more efficiently than novices in part because they immediately recognize the relevant set of abstractions to apply to a problem. All of the modern higher order programming languages share a set of abstractions that I would claim are better suited for problem solving than the abstractions offered by C or Java.

First and foremost, to describe algorithms cleanly, you want to be able to write functions that use other functions as input. If you cannot do this, you end up with design patterns — the accumulated knowledge of how to write scaffold code that is not unique to your problem at hand. This wastes your time and it also makes the resulting code ugly. In code, beauty comes very near to being truth, because bugs tend to hide in the bland sections of your programs that you find tedious to read.

To convince yourself that treating functions as a primitive data type is invaluable, ask yourself how you would write a function that returns the derivative of a given function as a new function. This is a trivial task in higher order programming languages, but it is much subtler in lower level languages. Sometimes it’s simply impossible.
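As a rough illustration of that exercise, here is one way it might look in Python. This sketch approximates the derivative numerically with a central difference; the step size h and the names derivative and square are choices made for the example, not part of any particular library.

```python
def derivative(f, h=1e-6):
    """Return a new function that approximates f' using a central difference."""
    def f_prime(x):
        return (f(x + h) - f(x - h)) / (2 * h)
    return f_prime


def square(x):
    return x * x


# d_square is itself an ordinary function that can be called or passed around.
d_square = derivative(square)
print(d_square(3.0))
```

The printed value is close to 6, the exact derivative of the squaring function at 3, up to floating point error.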

Once you have the ability to recognize that there are general patterns in how you apply functions to problems, you can encode these patterns as new functions in higher order programming languages. These functions of functions are the primitive operations that functional programmers use constantly, but they’re entirely lacking in lower level languages.

To illustrate how these patterns can be encoded as generic functions to produce clearer code, I’ll discuss the use of the map and reduce functions provided by four languages: Perl, Python, Ruby and Clojure. I’ll give a brief explanation of these functions and then show how they can be used to make code shorter. In particular, I’ll solve the following easy problems: (1) finding the sum of the first five squares and (2) creating a string of upper case characters separated by dashes from a list of lower case characters.

Map
map is a function that applies another function to each entry of an array in order and returns the sequence of results. It therefore substitutes for explicit looping. In Ruby, assuming a factorial method is already defined, you can use map to easily find the factorial of each of the first five numbers:

[1, 2, 3, 4, 5].map { |n| factorial(n) }

Reduce
reduce is a function that transforms an array into a single value. It works through the array from the front, combining the result so far with the next item using a function you specify. This lets you express ideas like summing a sequence of numbers or joining a set of strings together without any explicit loops. Here’s how you can use reduce to find the sum of the first five numbers in Ruby:

[1, 2, 3, 4, 5].reduce { |a,b| a + b }

When you can combine these operations, you can write more complex operations quickly that would require multiple nested loops in lower level languages. To see how this helps as the functions get larger, let’s go over the use of compositions of map and reduce in four languages.

Perl
We’re going to start with Perl, but there’s a problem at the start: Perl 5 has no built-in reduce (the core List::Util module does provide one, but I’ll stick to built-ins here). reduce does exist in Perl 6, but, as always, no one knows when Perl 6 will be available. Without a built-in reduce, you end up writing an explicit for loop to do the work reduce would do in the background. This adds dummy variables equivalent to useless pronouns in written English, which makes the code less clean:

my @squares = map {$_**2} (1, 2, 3, 4, 5);
 
my $sum = 0;
 
for my $square (@squares)
{
	$sum = $sum + $square;
}

That solves the numeric calculation. For strings, it turns out that Perl implements a function that is really a special case of using reduce: join. join takes a separator and an array of strings and returns the string you get by concatenating all of the strings while placing the separator in between. This is particularly helpful, because an explicit for loop would be complicated by a test for your position in the array to ensure that you didn’t add the separator at the start or the end of the output.

join '-', map {uc $_} ('a', 'b', 'c');

Python
Python does implement both map and reduce, so there’s no need for much fuss.

from functools import reduce  # reduce is no longer a built-in in Python 3

reduce(lambda a, b: a + b,
       map(lambda a: a**2,
           [1, 2, 3, 4, 5]))

reduce(lambda a, b: a + '-' + b,
       map(lambda a: a.upper(),
           ['a', 'b', 'c']))
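
The earlier point about join being a special case of reduce carries over to Python. The sketch below, with variable names chosen only for this example, builds the same dashed string three ways: with an explicit loop that has to test its position, with reduce, and with the built-in join.

```python
from functools import reduce

letters = ["a", "b", "c"]
upper = [letter.upper() for letter in letters]

# Explicit loop: needs a position test to avoid a leading separator.
looped = ""
for index, letter in enumerate(upper):
    if index > 0:
        looped += "-"
    looped += letter

# reduce places the separator between each adjacent pair.
reduced = reduce(lambda a, b: a + "-" + b, upper)

# join is the built-in special case of the same fold.
joined = "-".join(upper)

print(looped, reduced, joined)
```

All three produce A-B-C; the loop is the only one that has to track where it is in the list.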

Ruby
Ruby also implements both map and reduce:

[1, 2, 3, 4, 5].map { |a| a**2 }.reduce { |a,b| a + b }

['a', 'b', 'c'].map { |a| a.upcase }.reduce { |a,b| a + '-' + b }

Clojure
And Clojure, being a Lisp, of course implements both map and reduce:

(reduce +
        (map (fn [n] (* n n))
             '(1 2 3 4 5)))

(reduce (fn [a b] (str a "-" b))
        (map (fn [a] (.toUpperCase a))
             '("a" "b" "c")))