The Psychology of Music and the ‘tuneR’ Package

Introduction

This semester I’m TA’ing a course on the Psychology of Music taught by Phil Johnson-Laird. It’s been a great course to teach because (i) so much of the material is new to me and (ii) because the study of the psychology of music brings together so many of the intellectual tools I enjoy, including music theory, psychophysics and Fourier analysis.

One topic this semester that was completely new to me was the theory of tuning: I had known about the invention of the well-tempered system of tuning, but had never heard of Pythagorean tuning or just tuning — and certainly was not aware that the well-tempered system Bach celebrated was not identical to our current equal-tempered system of tuning.

As a way of consolidating some of the knowledge I’ve gained, I decided I’d write a blog entry after several months of neglecting this blog. (For that neglect, I’ll blame a combination of grant writing, book writing, ongoing research projects and personal life developments.) In what follows, I’ll give a brief overview of the theory of tuning at a theoretical level that should be accessible to anyone who’s familiar with the names of intervals and feels comfortable thinking quantitatively.

After surveying the field, I’ll turn to a discussion of some code I’ve written in R that implements these ideas using the ‘tuneR’ package, which is one of my favorite hidden gems from CRAN. Along the way, I’ll introduce some of the simplest tools from the ‘tuneR’ package that can be used for generating computer music.

Tuning Systems: Pythagorean, Just and 12-Tet

It’s worth noting right at the start that tuning is a misleading name for the topic we’ll be discussing: we’re not talking about how one tunes a fixed instrument so that it sounds in tune, but rather we’re interested in how one defines the very notes that the instrument should be able to produce when it’s perfectly in tune.

To make that clear, let’s assume that we’ve accepted as a given that a frequency of 440 Hz will be called A. Our problem then becomes one of deciding which of the infinitely many frequencies we could produce actually deserves the label of A#, B, C, C#, and so on.

Pythagorean Tuning

The simplest solution to this problem I know of is the Pythagorean tuning system. It’s based on constructing all of the possible notes using a series of perfect fifths. If you remember the Circle of Fifths, you’ll remember that you can reach every chromatic note by ascending fifths: if you start at A, you’ll proceed through E, B, F# and so on.

The Pythagorean system implements the Circle of Fifths directly using repeated multiplication of a base frequency. To do this, you first declare that a perfect fifth is at a frequency 3/2 times your base frequency. For example, this definition implies that the perfect fifth above the A at 440 Hz has to be at a frequency of 3/2 * 440 = 660 Hz. Once you do this, you’ve defined the frequency we’ll call E.

Following the same logic, you produce a B at 3/2 * 660 = 990 Hz. Of course, this B lies above the octave that starts at the base A of 440 Hz and ends at 880 Hz, so you transpose it down an octave to produce the B you’ll actually use. To do this, you need to assume that an octave is at a frequency 2 times the base frequency. Since we’ve accepted that 990 Hz is a B, we divide 990 by 2 and conclude that 495 Hz should be B.

With these three notes defined, we have the following table of frequency/note pairs:

Note    Frequency (Hz)    Ratio to 440 Hz
A       440               1
E       660               3/2
B       495               9/8

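If we continue on with this logic, stacking fifths both above and below A (that is, multiplying and dividing by 3/2) and then transposing each result by octaves (factors of 2) until it lands between 440 Hz and 880 Hz, we will eventually produce a complete table for all of the notes in the chromatic scale that looks like the following: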

Note    Frequency (Hz)    Ratio
A       440               1
A#      463.5391          256/243
B       495               9/8
C       521.4815          32/27
C#      556.875           81/64
D       586.6667          4/3
D#      626.4844          729/512
E       660               3/2
F       695.3086          128/81
F#      742.5             27/16
G       782.2222          16/9
G#      835.3125          243/128
A       880               2
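The table above can be reproduced in a few lines of R. This is only a minimal sketch, not the code from my repository: it assumes the twelve notes come from stacking fifths from five below A up to six above A (an assumption inferred from the ratios in the table), and all of the variable names are made up for illustration.

```r
# Sketch: build the Pythagorean chromatic scale on A = 440 Hz.
# Assumption: the notes are generated by stacking perfect fifths
# from 5 fifths below A to 6 fifths above A.
base <- 440
steps <- -5:6                 # number of fifths away from A
raw <- (3 / 2) ^ steps        # ratios before any octave transposition

# Fold every ratio into the octave [1, 2) by multiplying or dividing by 2.
ratios <- raw / 2 ^ floor(log2(raw))

# Note names in Circle of Fifths order, matching `steps`.
notes <- c('A#', 'F', 'C', 'G', 'D', 'A', 'E', 'B', 'F#', 'C#', 'G#', 'D#')

pythagorean <- data.frame(Note = notes,
                          Frequency = base * ratios,
                          Ratio = ratios,
                          stringsAsFactors = FALSE)

# Sort from the lowest note to the highest note within the octave.
pythagorean <- pythagorean[order(pythagorean$Ratio), ]
print(pythagorean, row.names = FALSE)
```

The closing A at 880 Hz is not generated by this loop of fifths; as in the table, it has to be added by fiat as 2 * 440.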

One thing about this table might strike you as odd if you’re mathematically savvy: the octave, which we’ve defined by fiat as a ratio of 2:1, could never have been produced by successive multiplication by 3/2, since no power of 3 will be evenly divisible by a power of 2. This is the one flub in the Pythagorean system: you can’t really produce the entire chromatic scale using only powers of 3/2. Here we’ve solved that problem by replacing the note we would have called A with a true octave generated using multiplication by 2. The A you would reach by stacking fifths alone is slightly out of tune with our preferred definition of an octave, and you may hear people refer to this discrepancy as the Pythagorean comma.
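To put a number on that discrepancy, you can compare twelve stacked fifths against the seven octaves they are supposed to span. The snippet below is a small worked calculation; expressing the gap in cents (hundredths of an equal-tempered semitone) is my own addition for convenience, and the variable names are arbitrary.

```r
# Twelve perfect fifths should land on the same note as seven octaves.
twelve_fifths <- (3 / 2) ^ 12
seven_octaves <- 2 ^ 7

# The ratio between the two is the Pythagorean comma: 531441 / 524288.
comma <- twelve_fifths / seven_octaves

# The same gap measured in cents, where 100 cents is one 12-tet semitone.
comma_cents <- 1200 * log2(comma)

print(comma)        # roughly 1.0136
print(comma_cents)  # roughly 23.46 cents, about a quarter of a semitone
```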

Just Tuning

Given that we had to cheat a bit to create a proper octave using the Pythagorean tuning system based on multiples of 3/2, it makes sense to ask why we shouldn’t just allow ourselves to use other multipliers than 3/2. Looking at the Pythagorean tuning table, we see some pretty ugly fractions like 729/512. What if we forced these fractions to be simpler by employing ratios like 4/3 and 5/4 to build up the whole system?

The result of allowing ourselves several fractions beyond just those derived from 3/2 is called the just tuning system. Here we assume that perfect fifths occur at a frequency ratio of 3/2 and that perfect fourths occur at a frequency ratio of 4/3. Continuing on with this process, we eventually end up with the following tuning table:

Note    Frequency (Hz)    Ratio
A       440               1
A#      469.3333          16/15
B       495               9/8
C       528               6/5
C#      550               5/4
D       586.6667          4/3
D#      625.7778          64/45
E       660               3/2
F       704               8/5
F#      733.3333          5/3
G       782.2222          16/9
G#      825               15/8
A       880               2

This is the tuning that early Classical music was written in. Looking at the table you can immediately appreciate the theoretical assertion that the relative dissonance of an interval is determined by the simplicity of the ratio of frequencies between the two notes: perfect fifths are 3/2 and major thirds are 5/4, while minor seconds are 16/15 and major sevenths are 15/8. This is one of the things I most enjoy about the theory of harmony: there’s a match between the aesthetics of fractions and the aesthetics of sounds that, for me, helps to justify my sense that certain fractions are more beautiful than others.

12 Tet / Equal-Temperament

Now, if you know the history of Bach’s Well-Tempered Clavier, you know that there is a problem with the just tuning system: it sounds great in the key you used as the base (here A), but it sounds a bit out of tune in other keys. The modern 12-tet system is the most recent approach to solving this problem: you assume the gap between any two notes a semitone apart (e.g. A to A# or A# to B) is always the exact same multiple. Since you’ll repeat this multiplication 12 times before reaching an octave, you can conclude that two notes that are a semitone apart must be separated by a factor of the 12th root of 2. Building a tuning system using that ratio alone gives us our modern system of tuning, which is shown in the table below using the decimal expansion of the ratios instead of their representation as powers of the 12th root of 2:

Note    Frequency (Hz)    Ratio
A       440               1.000000
A#      466.1638          1.059463
B       493.8833          1.122462
C       523.2511          1.189207
C#      554.3653          1.259921
D       587.3295          1.334840
D#      622.2540          1.414214
E       659.2551          1.498307
F       698.4565          1.587401
F#      739.9888          1.681793
G       783.9909          1.781797
G#      830.6094          1.887749
A       880               2.000000
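The 12-tet table is the easiest of the three to generate, since every ratio is a power of the 12th root of 2. The sketch below rebuilds it and, as an extra that goes slightly beyond the tables, measures how far each just ratio sits from its equal-tempered counterpart in cents (hundredths of a 12-tet semitone). The variable names are illustrative only.

```r
base <- 440
semitones <- 0:12

# Each semitone multiplies the frequency by the 12th root of 2.
tet_ratios <- 2 ^ (semitones / 12)
tet_freqs <- base * tet_ratios

# Just ratios copied from the just tuning table, A up to the A an octave above.
just_ratios <- c(1, 16/15, 9/8, 6/5, 5/4, 4/3, 64/45,
                 3/2, 8/5, 5/3, 16/9, 15/8, 2)

# Positive values mean the just note is sharper than the 12-tet note.
cents_off <- 1200 * log2(just_ratios / tet_ratios)

notes <- c('A', 'A#', 'B', 'C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A')
comparison <- data.frame(Note = notes,
                         Frequency = round(tet_freqs, 4),
                         Ratio = round(tet_ratios, 6),
                         JustMinusTetCents = round(cents_off, 2))
print(comparison, row.names = FALSE)
```

The just fifth (E) lands within about 2 cents of the 12-tet fifth, while the just major third (C#) is almost 14 cents flat of the 12-tet major third, which fits my experience that the thirds are where the systems are easiest to tell apart.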

Listening to the Results

We’ve just described three ways to define the notes used in Western music. But how different do they sound? To answer that, I decided to produce a series of simple sine wave audio samples that were tuned using each of the three tuning systems. To produce those audio samples, I used the ‘tuneR’ package, which I’ll describe now. Before you read on, you should install it from CRAN using the standard install.packages('tuneR') invocation.

A tuneR Tutorial

The tuneR package is an extremely convenient tool for generating audio files from R based on a numeric description of the audio stream. For the purposes of this discussion of tuning systems, we simply need to produce basic sine waves. Thankfully, that’s very easy to do with tuneR. Here’s an example:

library('tuneR')
 
sound <- sine(440, bit = 16)
 
writeWave(sound, '440.wav')

Here we’ve loaded the tuneR package, created a 1s snippet of sine wave audio at 16 bits resolution using the sine function, and then written out the audio to a WAV file using writeWave. If you look at your current directory and listen to this file, you’ll hear a sine wave at 440 Hz.

If you want to explore the use of sine, you can easily play with the duration of the sound by changing the duration parameter. If you want to, you can also change the sample rate and the bit depth, but I don’t see any reason to do that while exploring ideas about tuning.

More important is knowing that you can superimpose two sine waves using the `+` operator and that you can concatenate them using the bind function. To show off producing octaves, for example, you might use the following code to hear an A at 440 Hz, then an A an octave above it, and finally the harmony they produce together:

library('tuneR')
 
sound <- bind(sine(440, bit = 16),
              sine(880, bit = 16),
              sine(440, bit = 16) + sine(880, bit = 16))
 
writeWave(sound, 'octaves.wav')

Unfortunately, this sample code produces an error because of the naive addition we’ve implemented using the `+` operator. Adding two full-amplitude sine waves directly together produces sample values that no longer fit in the 16-bit range we’re using. To safely perform addition of two sine waves, we need to rescale the results of our summation using the normalize function. This gives us just one more line of code:

library('tuneR')
 
sound <- bind(sine(440, bit = 16),
              sine(880, bit = 16),
              sine(440, bit = 16) + sine(880, bit = 16))
 
sound <- normalize(sound, unit = '16')
 
writeWave(sound, 'octaves.wav')

For reasons that are not clear to me, you have to specify the bit depth to normalize using the unit parameter rather than the bit parameter.
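To see why the naive sum overflows, it helps to do the arithmetic by hand outside of tuneR. The sketch below uses plain R vectors rather than Wave objects, assumes a 44100 Hz sample rate, and rescales by the peak value; I’m not claiming this is what normalize does internally, only that it illustrates the problem that normalization solves.

```r
# Assumed settings: one second of audio at 44100 samples per second.
samp_rate <- 44100
t <- (0:(samp_rate - 1)) / samp_rate
max_16bit <- 32767            # largest value a signed 16-bit sample can hold

# Two full-amplitude sine waves, an octave apart.
a440 <- max_16bit * sin(2 * pi * 440 * t)
a880 <- max_16bit * sin(2 * pi * 880 * t)

# Their sum peaks well above the 16-bit limit.
mixed <- a440 + a880
peak <- max(abs(mixed))
print(peak > max_16bit)

# Rescale so that the loudest sample sits exactly at the 16-bit limit.
rescaled <- round(mixed / peak * max_16bit)
print(range(rescaled))
```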

Demoing Tuning Systems

Our little octave demo is cute, but we really want to know what more interesting harmonies like major thirds and minor seconds sound like in the various tuning systems we described. To do that, I first wrote a function called interval that spits out the multiplier you need to use to produce a given interval for any of the three tuning systems. That function is in a GitHub repository I’ve set up with code for making these demos. If you download that repository, you could load my interval function using a simple call to source like the one seen below. And using this interval function, we can generate demos of various intervals as follows:

library('tuneR')
source('interval.R')
 
base <- 440
 
sound <- sine(base) + sine(interval('minor-second',
                                    tuning = 'pythagorean') * base)
 
sound <- normalize(sound, unit = '16')
 
writeWave(sound, 'minor_second_pythagorean.wav')
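If you don’t have the repository handy, the idea behind interval is just a lookup from an interval name and a tuning system to a ratio. What follows is a hypothetical stand-in, not the function from GitHub: the name interval_sketch, the '12-tet' and 'just' tuning labels and every interval name other than 'minor-second' are assumptions of mine, and the ratios are simply copied from the three tables earlier in this post.

```r
# Hypothetical stand-in for the interval() function in the repository.
interval_sketch <- function(name, tuning = c('pythagorean', 'just', '12-tet')) {
  tuning <- match.arg(tuning)

  # Number of semitones spanned by each (assumed) interval name.
  semitones <- c('unison' = 0, 'minor-second' = 1, 'major-second' = 2,
                 'minor-third' = 3, 'major-third' = 4, 'perfect-fourth' = 5,
                 'tritone' = 6, 'perfect-fifth' = 7, 'minor-sixth' = 8,
                 'major-sixth' = 9, 'minor-seventh' = 10,
                 'major-seventh' = 11, 'octave' = 12)

  # Ratios for 0 to 12 semitones, taken from the tables in this post.
  ratios <- list(
    'pythagorean' = c(1, 256/243, 9/8, 32/27, 81/64, 4/3, 729/512,
                      3/2, 128/81, 27/16, 16/9, 243/128, 2),
    'just' = c(1, 16/15, 9/8, 6/5, 5/4, 4/3, 64/45,
               3/2, 8/5, 5/3, 16/9, 15/8, 2),
    '12-tet' = 2 ^ ((0:12) / 12)
  )

  if (!name %in% names(semitones)) {
    stop('Unknown interval: ', name)
  }

  ratios[[tuning]][semitones[[name]] + 1]
}

# Frequency of the note a Pythagorean minor second above A at 440 Hz.
print(interval_sketch('minor-second', tuning = 'pythagorean') * 440)
```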

On GitHub there’s a file called test_intervals.R that will go through and generate all of the intervals in all three tuning systems. If you run that file, you’ll generate a lot of audio files you can listen to as demos of the three tuning systems we’ve described. For me, these tuning systems all produce intervals that sound surprisingly similar, though at high volumes I find it moderately easy to hear slight differences between the tuning systems. That said, I very much doubt I would pick up on them in a normal musical context.

That’s the end of my little introduction to tuning systems and the use of the tuneR package to explore them. If you’re interested in thinking computationally about music, I highly recommend playing around with tuneR until you feel like you can produce interesting results. I’m already working on trying to build up some interesting timbres to work with.

Twitter Math Puzzle and Solution

Yesterday I posted a very simple math puzzle to Twitter that I found in Jonathan Baron’s book, Thinking and Deciding. The puzzle is the following:

Show that every number of the form ABC,ABC is divisible by 13.

The puzzle comes up in Baron’s book as an example of an “insight problem” in which one goes from not knowing the answer at all to knowing the complete answer in a sudden moment of insight.

Several people replied to my tweet with solutions: I especially like Will Townes’s solution. In particular, if you’re familiar with modular arithmetic, I like the logic of Will’s answer because it gives a simple generalization. First, represent ABC,ABC as ABC * 1000 + ABC * 1 rather than as ABC * 1001. Then notice that

  1. 1 = 1 mod 13
  2. 1000 = -1 mod 13

Thus ABC,ABC = ABC * -1 + ABC * 1 = 0 mod 13. This logic can be easily extended to show that (ABC,ABC,)*ABC,ABC = 0 mod 13 no matter how many times you repeat the ABC,ABC pattern.
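If you’d rather see the claim checked by brute force than by insight, R can do that in a few lines. This is just a numerical sanity check of the argument above, with variable names of my own choosing; it treats ABC as any value from 000 to 999.

```r
# Every three-digit block ABC, including those with leading zeros.
abc <- 0:999

# ABC,ABC written as ABC * 1000 + ABC * 1.
numbers <- abc * 1000 + abc
print(all(numbers %% 13 == 0))

# The two facts the modular argument relies on.
print(1 %% 13)       # 1
print(1000 %% 13)    # 12, which is the same as -1 mod 13

# The generalization: repeating the ABC,ABC pattern twice gives
# ABC * (1000^3 + 1000^2 + 1000 + 1), which is again divisible by 13.
repeated <- abc * (1000 ^ 3 + 1000 ^ 2 + 1000 + 1)
print(all(repeated %% 13 == 0))
```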

Visualizing Periodic Data

Yesterday the Princeton machine learning reading group went through a paper by Tukey on “Some graphic and semigraphic displays”. One issue we talked about at length was Tukey’s idiosyncratic approach to visualizing periodic data in a circular format to emphasize the connections between the “start” and the “end” of the data set.

Allison Chaney pointed out that many fields (for instance, environmental engineering) might want to consider using these circular displays to make periodic trends clear to the viewer. That inspired me to try plotting periodic weather data using both a standard x-y plane display and a polar coordinates display. The results are shown below in two videos that I’ve uploaded to Vimeo:

There’s a clear tradeoff that’s being made when choosing between these two approaches: the polar coordinates plot, as promised, correctly connects the two “ends” of the data set. But it also makes it much harder to see the height of the graph at each point in time, so that the sinusoidal shape that can easily be seen in the x-y plane display is basically hidden in the polar coordinates display.

Since making these videos, it occurred to me that another potential visualization technique would be to project the data onto a cylinder, rather than a plane, and then progressively rotate the cylinder to reveal the time trend. This would allow heights to be seen properly, while emphasizing the periodicity. The problem with this cylindrical projection is that the entire data set is never fully visible at one time, but can only be seen by completing a full rotation of the data.

In his paper, Tukey describes one other approach: draw the periodic data twice so that the period is clearly visible. It wasn’t clear to me how to do this without some numeric hacks in ggplot2, so I’ll leave it to the reader to search for Tukey’s example in the original paper.
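For what it’s worth, the kind of numeric hack I have in mind is to append a copy of the data shifted by one full period and then plot the result as usual. The sketch below does that in base graphics with a synthetic temperature curve standing in for the real weather data; the data, the column names and the 365-day period are all assumptions made for illustration, and I can’t promise the result matches Tukey’s display.

```r
# Synthetic stand-in for periodic weather data: one value per day of the year.
day <- 1:365
temperature <- 55 + 25 * sin(2 * pi * (day - 110) / 365)
year <- data.frame(Day = day, Temperature = temperature)

# Draw the period twice by appending a copy shifted one period to the right.
second <- transform(year, Day = Day + 365)
doubled <- rbind(year, second)

plot(doubled$Day, doubled$Temperature, type = 'l',
     xlab = 'Day (two consecutive periods)', ylab = 'Temperature')

# Mark the boundary between the original data and its copy.
abline(v = 365.5, lty = 2)
```

Because the end of the first copy runs straight into the start of the second, the join between the “end” and the “start” of the data is visible without giving up the ordinary x-y view of the heights.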

ProjectTemplate News

The news below was recently reported on the ProjectTemplate mailing list. For completeness, I’m also reporting it here.

  • The first piece of ProjectTemplate news is that I won’t be the exclusive maintainer for ProjectTemplate anymore. Allen Goodman, who works at BankSimple, is now my co-maintainer and he has full commit privileges. In the next few months, the emerging group with commit privileges is likely to grow beyond the two of us, but hopefully just having one more person in charge of ProjectTemplate’s development will help to keep things moving forward.
  • There’s a new draft of ProjectTemplate available on GitHub. v0.3-1 fixes problems with the YAML configuration system not working on Windows 64 machines by switching over to the DCF format that R naturally supports. Editing your configuration scripts should be trivial, but be prepared for ProjectTemplate to break on your existing v0.2-1 projects until you’ve updated them to use DCF instead of YAML.
  • In addition to switching the configuration system over to DCF, ProjectTemplate v0.3-1 now uses namespaces and separate functions to implement all of the automatic data loading functions that were previously nested inside of load.project(). Hopefully this will make it easier for end users to override ProjectTemplate’s defaults, while allowing ProjectTemplate releases to automatically roll out bug fixes to less advanced users. On that note, the list of supported file formats for automatic data loading is growing and new patches on that front are always welcome.
  • A minimal project format: Some people have asked for the option to create projects without some of the clutter that the standard project format creates, such as the diagnostics and profiling directories. There’s now a minimal project format that you can use by calling create.project(minimal = TRUE).
  • Starting in two weeks, the version of ProjectTemplate available on CRAN will keep pace with the version on GitHub. If you’re still using v0.1-3, please consider upgrading or forking.
  • There is now an official ProjectTemplate website at http://projecttemplate.net/ that will hopefully be the start of a new era of better documentation for ProjectTemplate. While the material on the site is still in noticeably draft form, I expect the documentation to improve considerably in the near future. If anyone out there is a graphic designer and would like to make the new site look better, please let me know by e-mailing me at jmw@johnmyleswhite.com.

For now that’s all, but there’s more ProjectTemplate news coming soon. Stay tuned!

Speeding Up MLE Code in R

Recently, I’ve been fitting some models from the behavioral economics literature to choice data. Most of these models amount to non-linear variants of logistic regression in which I want to infer the parameters of a utility function. Because several of these models aren’t widely used, I’ve had to write my own maximum likelihood code to estimate the parameters of these models.

In the process, I’ve started to learn something about how to write code that runs quickly in R. In this post, I’ll try to share some of that knowledge by describing three ways of performing maximum likelihood estimation in R whose runtimes differ by two orders of magnitude. The differences seem to depend upon two factors: (1) how I access the entries of a data frame and (2) whether I use loops or vectorized operations to perform basic arithmetic.

To simplify things, I’ll present a model that should be familiar to people with a background in economics: the exponentially discounted utility model. To implement it in R, we define the discounted value of x dollars at time t as:

discounted.value <- function(x, t, delta)
{
  return(x * delta ^ t)
}

In addition to the discounted utility model, we assume that choices originate from a stochastic choice model with logistic noise. To invert this noise during inference, we’ll use the inverse logit transform:

invlogit <- function(z)
{
  return(1 / (1 + exp(-z)))
}

To test my inference routine, I need to generate “stochastic” data of the sort you would expect to see from an exponentially discounting agent that’s indifferent between having $1 at time t = 0 and $3 at time t = 1. I’ll refer to the first good as (X1, T1) and the second good as (X2, T2). If the agent chooses (X2, T2), I’ll write that as C == 1; if they choose (X1, T1), I’ll write that as C == 0. With those conventions, the sample data is generated as:

n <- 100
 
# Alternate 0 and 1 in column C so that each option is chosen half the time.
choices <- data.frame(X1 = rep(1, each = n),
                      T1 = rep(0, each = n),
                      X2 = rep(3, each = n),
                      T2 = rep(1, each = n),
                      C = rep(c(0, 1), times = n / 2))
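As a quick sanity check on this setup, an agent who is indifferent between $1 now and $3 at t = 1 must have a discount factor of 1/3, and the choice model should then assign probability 0.5 to either option. The snippet below works that out with the two functions defined earlier (repeated here so that it runs on its own); the value a = 1 is an arbitrary choice of mine, since any value of a gives the same answer at indifference.

```r
discounted.value <- function(x, t, delta)
{
  return(x * delta ^ t)
}

invlogit <- function(z)
{
  return(1 / (1 + exp(-z)))
}

# Indifference between $1 at t = 0 and $3 at t = 1 means 1 = 3 * delta.
delta <- 1 / 3

u1 <- discounted.value(1, 0, delta)   # value of (X1, T1)
u2 <- discounted.value(3, 1, delta)   # value of (X2, T2)

# Probability of choosing (X2, T2); a = 1 is arbitrary here.
a <- 1
p <- invlogit(a * (u2 - u1))
print(c(u1 = u1, u2 = u2, p = p))
```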

To fit the exponential model to this data set, we’ll use the optim function to minimize the negative log likelihood of the data by setting two parameters: a, which scales the utility difference and so controls how noisy the choices are (larger values of a mean less noise); and delta, the discount factor in the discounted utility model. The three implementations of this model that I’ll show only differ in the definition of the log likelihood function, so the final call to optim to perform maximum likelihood estimation is constant across all examples:

logit.estimator <- function(choices)
{ 
  wrapper <- function(x) {-log.likelihood(choices, x[1], x[2])}
  optimization.results <- optim(c(1, 1), wrapper, method = 'L-BFGS-B', lower = c(0, 0), upper = c(Inf, 1))
  return(optimization.results$par)
}

Here, I had to specify bounds for the parameters, a and delta, because it’s assumed that a must be positive and that delta must lie in the interval [0, 1]. To deal with these bounds, one has to use the L-BFGS-B method in optim.

The first implementation I’ll show is the one I find most natural to write, even though it turns out to be the least efficient by far:

log.likelihood <- function(choices, a, delta)
{
  ll <- 0
 
  for (i in 1:nrow(choices))
  {
    u2 <- discounted.value(choices[i, 'X2'], choices[i, 'T2'], delta)
    u1 <- discounted.value(choices[i, 'X1'], choices[i, 'T1'], delta)
 
    p <- invlogit(a * (u2 - u1))
 
    if (choices[i, 'C'] == 1)
    {
      ll <- ll + log(p)
    }
    else
    {
      ll <- ll + log(1 - p)
    }
  }
 
  return(ll)
}

In the second implementation, I define a row level likelihood function, so that the summing and logarithmic transform are vectorized.

rowwise.likelihood <- function(row, a, delta)
{
  u2 <- discounted.value(row['X2'], row['T2'], delta)
  u1 <- discounted.value(row['X1'], row['T1'], delta)
  p <- invlogit(a * (u2 - u1))
  return(ifelse(row['C'] == 1, p, 1 - p))
}
 
log.likelihood <- function(choices, a, delta)
{
  likelihoods <- apply(choices, 1, function (row) {rowwise.likelihood(row, a, delta)})
  return(sum(log(likelihoods)))
}

In the third implementation, I define a fully vectorized log likelihood function that avoids any explicit iteration and therefore removes most of the data frame indexing operations:

log.likelihood <- function(choices, a, delta)
{
  u2 <- discounted.value(choices$X2, choices$T2, delta)
  u1 <- discounted.value(choices$X1, choices$T1, delta)
  p <- invlogit(a * (u2 - u1))
  likelihoods <- ifelse(choices$C == 1, p, 1 - p)
  return(sum(log(likelihoods)))
}

The code I used to call all of these implementations and compare them is up on GitHub for those interested. The results, which strike me as remarkable, are below:

  1. On my laptop, implementation 1 takes ~1.0 second to run.
  2. On my laptop, implementation 2 takes ~0.25 seconds to run.
  3. On my laptop, implementation 3 takes ~0.01 seconds to run.

In short, the third implementation is 100x faster than the first implementation with only minor changes to the code I originally wrote. Hopefully this example will help inspire others who have R code they’d like to speed up, but aren’t sure where to start.
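If you want to try a comparison like this yourself without pulling the code from GitHub, here is a self-contained sketch that checks that the loop and the fully vectorized log likelihoods agree and then times repeated calls to each. It is not the benchmark script from the repository: the names loop.ll and vectorized.ll, the parameter values and the number of repetitions are arbitrary choices of mine, and the timings will differ from machine to machine.

```r
discounted.value <- function(x, t, delta) x * delta ^ t
invlogit <- function(z) 1 / (1 + exp(-z))

n <- 100
choices <- data.frame(X1 = rep(1, n), T1 = rep(0, n),
                      X2 = rep(3, n), T2 = rep(1, n),
                      C = rep(c(0, 1), times = n / 2))

# Implementation 1: explicit loop with row-by-row data frame indexing.
loop.ll <- function(choices, a, delta) {
  ll <- 0
  for (i in seq_len(nrow(choices))) {
    u2 <- discounted.value(choices[i, 'X2'], choices[i, 'T2'], delta)
    u1 <- discounted.value(choices[i, 'X1'], choices[i, 'T1'], delta)
    p <- invlogit(a * (u2 - u1))
    ll <- ll + if (choices[i, 'C'] == 1) log(p) else log(1 - p)
  }
  ll
}

# Implementation 3: whole columns at once, no explicit iteration.
vectorized.ll <- function(choices, a, delta) {
  u2 <- discounted.value(choices$X2, choices$T2, delta)
  u1 <- discounted.value(choices$X1, choices$T1, delta)
  p <- invlogit(a * (u2 - u1))
  sum(log(ifelse(choices$C == 1, p, 1 - p)))
}

# Arbitrary parameter values, used only to compare the two functions.
a <- 1
delta <- 0.5
agree <- isTRUE(all.equal(loop.ll(choices, a, delta),
                          vectorized.ll(choices, a, delta)))
print(agree)

# Time 20 evaluations of each version.
loop.time <- system.time(for (k in 1:20) loop.ll(choices, a, delta))['elapsed']
vectorized.time <- system.time(for (k in 1:20) vectorized.ll(choices, a, delta))['elapsed']
print(c(loop = loop.time, vectorized = vectorized.time))
```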

The Post-Lehman Era

The existence of recessions no more invalidates economic theory than the existence of AIDS invalidates molecular biology.

Norvig and the Nature of Modern Science

In this, Chomsky is in complete agreement with O’Reilly. (I recognize that the previous sentence would have an extremely low probability in a probabilistic model trained on a newspaper or TV corpus.)1

Anyone who considers themself an intellectual should be required to read this new essay by Peter Norvig. It’s the best summary I’ve ever seen of the many types of science that now exist in our world — almost all of which are moving away from the simple algebraic, deterministic models of the world that fill high school science textbooks.

  1. On Chomsky and the Two Cultures of Statistical Learning

Problems with ggplot2 0.8.9 and R 2.13.0 on Mac OS X via plyr 1.5

This morning I tried to completely update my R installation. I first dumped a list of all the packages I have on my system using the installed.packages() function. Then I installed R 2.13.0 using the OS X disk image. And finally I reinstalled all of my packages from scratch.
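For anyone who wants to follow the same procedure, the dump-and-reinstall step can be sketched roughly as follows. The file name installed_packages.txt and the variable names are placeholders of mine, and the install.packages call is left commented out so that running the sketch doesn’t trigger a large download by accident.

```r
# Before upgrading: record the names of all currently installed packages.
package.names <- rownames(installed.packages())
writeLines(package.names, 'installed_packages.txt')

# After installing the new version of R: read the list back in and work out
# which packages are not yet present in the fresh library.
wanted <- readLines('installed_packages.txt')
to.install <- setdiff(wanted, rownames(installed.packages()))

# Uncomment to actually reinstall the missing packages from CRAN.
# install.packages(to.install)
print(length(to.install))
```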

Unfortunately, I ran into some serious problems along the way. After installing everything from scratch, ‘ggplot2’ 0.8.9 was broken. Specifically, I couldn’t get error bars to work with stat_summary(). For example, this code wouldn’t work on my system:

# Problem with ggplot2 Version "0.8.9"
 
library('ggplot2')
 
set.seed(1)
 
example.data <- data.frame(Measurement = rnorm(5, 0, 1), Class = rep('A', 5))
 
ggplot(example.data, aes(x = Class, y = Measurement)) +
  stat_summary(fun.data = 'mean_cl_boot', geom = 'bar') +
  stat_summary(fun.data = 'mean_cl_boot', geom = 'errorbar')

Thankfully, I managed to enlist Dirk Eddelbuettel’s help through Twitter and he ran the code on his own recently updated system. Things worked fine for him, which suggested that the problem was in my system configuration. We compared package versions and discovered that he had ‘plyr’ 1.5.1 on his Ubuntu machine, while I had ‘plyr’ 1.5 on my OS X machine. After looking at CRAN, it was clear that the Mac OS X build of ‘plyr’ 1.5.1 wasn’t available there yet.

To fix this, I grabbed the source for ‘plyr’ 1.5.1 and tried to install it myself. That led to the following error:

** preparing package for lazy loading
Error: package 'plyr' is required by 'reshape' so will not be detached

The problem was that ‘reshape’ was being loaded automatically when R was starting up. Since ‘reshape’ depends on ‘plyr’, R wasn’t willing to overwrite my old ‘plyr’ 1.5 with the new ‘plyr’ 1.5.1. The solution was to edit my .Rprofile file to prevent ‘reshape’ from being autoloaded. Once I did this, I was able to run the standard R CMD INSTALL and get the new version of ‘plyr’ on my system. And after that ‘ggplot2’ 0.8.9 started working properly.

Hopefully no one else will come up against the same issue after the binary for ‘plyr’ 1.5.1 gets pushed through all of the CRAN mirrors. But if you get errors while using ‘ggplot2’ 0.8.9, look into installing ‘plyr’ 1.5.1 from source on your system.

Many thanks to Dirk for giving me so much help today.

A Request for Foursquare Data

[UPDATE 3/28/2011: Fixed an enormous bug in the R code.]

I’m trying to collect data sets that showcase how the classical statistical distributions appear in modern contexts. I’ve already got some data that shows how the gamma distribution appears in video game scores, and now I’m hoping to find an example where the exponential distribution shows up. I think that checkins for Foursquare might be a good place to start.

To test this intuition, I’m hoping to collect some pilot data. Below you’ll find some code that you can use to help me gather data.

First, there’s a shell script to gather your own checkin data from Foursquare. To use this script, you need to substitute your e-mail address where EMAIL appears and your password where PASSWORD appears in the code below:

curl -u 'EMAIL:PASSWORD' 'https://api.foursquare.com/v1/history?l=250' > checkin_history.xml

And second there’s an R script you can use to preprocess the data from the last step into a nice format before sending it to me. If you’re not an R user, you can easily skip this step and send the data you have in its raw XML format.

library('plyr')
library('XML')
filename <- 'checkin_history.xml'
tree <- xmlTreeParse(filename, asTree = TRUE)
checkins <- tree$doc$children$checkins
venue.names <- c()
latitudes <- c()
longitudes <- c()
for (i in 1:length(checkins))
{
  venue.names <- c(venue.names, as.character(checkins[i]$checkin[['venue']][['name']][['text']])[6])
  latitudes <- c(latitudes, as.numeric(unclass(checkins[i]$checkin[['venue']][['geolat']][['text']])$value))
  longitudes <- c(longitudes, as.numeric(unclass(checkins[i]$checkin[['venue']][['geolong']][['text']])$value))
}
checkin.data <- data.frame(Venue = factor(venue.names), Latitude = as.numeric(latitudes), Longitude = as.numeric(longitudes))
count.data <- ddply(checkin.data, 'Venue', nrow)
names(count.data) <- c('Venue', 'TotalCheckins')
write.csv(count.data, file = 'count_data.csv', row.names = FALSE)

After running these two pieces of code, the output file, count_data.csv, should look like this:

Venue TotalCheckins
“Brooklyn Boulders” 13

Once you’ve got data, you can send it to me by e-mail at jmw@johnmyleswhite.com.

Spam Comments

I may have deleted a genuine comment by accident today while cleaning out the spam queue. My apologies if it was yours. I generally delete all the comments in my spam queue without checking their contents carefully because of their quantity. Only after clicking the delete button a few moments ago did I notice that one of the comments contained the phrase “random variable” and might actually have been a genuine comment. Sadly, it’s irretrievably gone.