Neil Kodner wrote a great post this morning about yesterday’s Twifficiency scores outbreak. He grabbed all the auto-tweeted scores he could find and plotted their distribution. I was struck by the asymmetry of the resulting distribution, which you can see below:

Thankfully, Neil handed me the raw data for his plot, so I was able to run a K-S test for normality. The test rejects normality quite easily, though it also reports ties, which surprises me for what should be continuous scores:

```r
scores <- read.csv('twifficiencyscores.txt', header = FALSE)
scores <- scores[,1]

m <- mean(scores)
s <- sd(scores)

ks.test(scores, 'pnorm', m, s)
#
#         One-sample Kolmogorov-Smirnov test
#
# data:  scores
# D = 0.0616, p-value < 2.2e-16
# alternative hypothesis: two-sided
#
# Warning message:
# In ks.test(scores, "pnorm", m, s) :
#   cannot compute correct p-values with ties
```

I suppose that I’m a bit worried that the p-value is simply a reflection of sample size here, since there are 7089 measurements. Would it be more compelling to bootstrap the D statistic from the K-S test on samples of 500 scores at a time, to confirm that the non-normality is present even in small groups of scores?
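That check would only take a few lines. Here's a sketch, with a simulated, skewed stand-in for the real scores (the actual analysis would use the vector loaded above): repeatedly draw 500 scores, rerun the K-S test, and look at the spread of the D statistic.

```r
# Sketch: bootstrap the K-S D statistic on subsamples of 500.
# The rbeta() draw is a made-up, skewed stand-in for the real scores.
set.seed(1)
scores <- rbeta(7089, 5, 3) * 100

boot.d <- replicate(200, {
  subsample <- sample(scores, 500)
  suppressWarnings(
    ks.test(subsample, 'pnorm', mean(subsample), sd(subsample))$statistic
  )
})

quantile(boot.d, c(0.05, 0.5, 0.95))
```

If the subsample D values sit well above what you'd see for genuinely normal data of the same size, the non-normality isn't just a large-n artifact.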

Assuming that the data really has a skewed distribution, does anyone understand the scoring system well enough to say what produces the asymmetry?

I know nothing of the underlying algo. But ignorance never constrains me from telling a good story. So how about this:

There are probably two drivers of the asymmetry. I suspect the first is the scoring algorithm itself, which may be log based. If I were writing it, I’d make it log based; otherwise the most popular folks on Twitter would score orders of magnitude higher than the rest of us mere mortals.

The other factor could be the combined impact of network effects (power laws) and self-selection into the twifficiency testing. The lump from 25–55 in the graph could be that people in that range are more likely to take a ‘vanity test’ like twifficiency. The right tail could be that there just aren’t many folks up that high, given the algo. The thin left tail could be that folks with scores that low didn’t hear about the test (network effect) and/or weren’t interested in taking this type of test. Folks in the middle both heard about it and were inclined to take the test.

So it’s a bit of a conditional-probability issue. To appear in the 24 hours (is that right?) that Neil captured, you had to have heard of the test and self-selected into taking it — two conditions, both of which are probably a function of the very thing the test measures.
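This two-factor story can be mocked up in a few lines. Every number below is invented; the point is only to show how selection that favors mid-range users reshapes the distribution we actually observe:

```r
# Toy simulation: power-law-ish activity, a hypothetical log-based score,
# and take-up probability peaked for mid-range users. All parameters invented.
set.seed(1)
n      <- 1e5
latent <- (1 / runif(n))^2                    # heavy-tailed activity level
score  <- pmin(100, 20 * log10(latent) + 20)  # made-up log-based 0-100 score
p.take <- dnorm(score, mean = 40, sd = 15)    # mid-range users most likely to
took   <- runif(n) < p.take / max(p.take)     # hear of and take the vanity test

# Self-selection reshapes what we observe:
quantile(score, c(.05, .5, .95))        # everyone
quantile(score[took], c(.05, .5, .95))  # only the test-takers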

As an economist I feel like I should have an “on the other hand” comment…. I’ll work on that.

-JD

Looks like it could be log-normally distributed, whatever it is. Or perhaps it’s the logarithm of some power-law-distributed latent score. The distribution has a consistent form, but it’s ugly.

The log-normal distribution sounds plausible, Ryan: I should test for that. It’s still not clear what the mechanism would be, but then I don’t really know anything about the score calculations.
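For what it's worth, the check is nearly a one-liner: if the scores are log-normal, their logs should pass the same K-S test against a normal. Here `scores` is simulated as a stand-in for the real data from the post.

```r
# Sketch of the log-normality check, on simulated stand-in data.
set.seed(1)
scores <- rlnorm(7089, meanlog = 3.5, sdlog = 0.4)  # placeholder for real scores

log.scores <- log(scores)
ks.test(log.scores, 'pnorm', mean(log.scores), sd(log.scores))
```

Strictly speaking, plugging in the estimated mean and sd makes the p-value optimistic (this is the Lilliefors issue, the same caveat that applies to the original test above), so treat it as a rough guide rather than a formal test.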