# Statistics

## The Convexity of Improbability: How Rare are K-Sigma Effects?

In my experience, people seldom appreciate just how much more compelling a 5-sigma effect is than a 2-sigma effect. I suspect part of the problem is that p-values don’t invoke the visceral sense of magnitude that statements of the form, “this would happen 1 in K times”, would invoke. To that end, I wrote a […]

## Why I’m Not a Fan of R-Squared

The Big Message People sometimes use $$R^2$$ as their preferred measure of model fit. Unlike quantities such as MSE or MAD, $$R^2$$ is not a function only of model’s errors, its definition contains an implicit model comparison between the model being analyzed and the constant model that uses only the observed mean to make predictions. […]

## A Variant on “Statistically Controlling for Confounding Constructs is Harder than you Think”

Yesterday, a coworker pointed me to a new paper by Jacob Westfall and Tal Yarkoni called “Statistically controlling for confounding constructs is harder than you think”. I quite like the paper, which describes some problems that arise when drawing conclusions about the relationships between theoretical constructs using only measurements of observables that are, at best, […]

## Understanding the Pseudo-Truth as an Optimal Approximation

Introduction One of the things that set statistics apart from the rest of applied mathematics is an interest in the problems introduced by sampling: how can we learn about a model if we’re given only a finite and potentially noisy sample of data? Although frequently important, the issues introduced by sampling can be a distraction […]

## Some Observations on Winsorization and Trimming

Over the last few months, I’ve had a lot of conversations with people about the use of winsorization to deal with heavy-tailed data that is positively skewed because of large outliers. After a conversation with my friend Chris Said this past week, it became clear to me that I needed to do some simulation studies […]

## What’s Wrong with Statistics in Julia?

Introduction Several months ago, I promised to write an updated version of my old post, “The State of Statistics in Julia”, that would describe how Julia’s support for statistical computing has evolved since December 2012. I’ve kept putting off writing that post for several reasons, but the most important reason is that all of my […]

## That Way Madness Lies: Arithmetic on data.frames

tl;dr Please do not use arithmetic on data.frame objects when programming in R. It’s a hack that only works if you know everything about your datasets. If anything happens to change the order of the rows in your data set, previously safe data.frame arithmetic operations will produce incorrect answers. If you learn to always explicitly […]

## My Experience at JuliaCon

Introduction I just got home from JuliaCon, the first conference dedicated entirely to Julia. It was a great pleasure to spend two full days listening to talks about a language that I started advocating for just a little more than two years ago. What follows is a very brief review of the talks that excited […]

## Falsifiability versus Rationalization

Here are two hypothetical conversations about psychological research. I’ll leave it to others to decide whether these conversation could ever take place. Theories are just directional assertions about effects Person A: And, just as I predicted, I found in my early studies that the correlation between X and Y is 0.4. Person B: What do […]

## A Note on the Johnson-Lindenstrauss Lemma

Introduction A recent thread on Theoretical CS StackExchange comparing the Johnson-Lindenstrauss Lemma with the Singular Value Decomposition piqued my interest enough that I decided to spend some time last night reading the standard JL papers. Until this week, I only had a vague understanding of what the JL Lemma implied. I previously mistook the JL […]