Archives by date

Type Safety and Statistical Computing

I broadly believe that the statistics community would benefit from greater exposure to computer science concepts. Consistent with that belief, I argue in this post that the concept of type-safety could be used to develop a normative theory for how statistical computing systems ought to behave. I also argue that such a normative theory would […]

Once Again: Prefer Confidence Intervals to Point Estimates

Today I saw a claim being made on Twitter that 17% of Jill Stein supporters in Louisiana are also David Duke supporters. For anyone familiar with US politics, this claim is a priori implausible, although certainly not impossible. Given how non-credible this claim struck me as being, I decided to look into the origin of […]

Claims and Evidence: A Joke

The other day a friend posted the following old joke about the level of rigor that mathematicians usually require. (Disclaimer: if you take the joke as a serious claim about the standards of quality in the other fields referenced in the joke, it is an obviously unfair characterization of both astronomy and physics.) A Mathematician, […]

No juice for you, CSV format. It just makes you more awful.

I just found a minimal example of how easy it is to confuse R’s CSV parser when providing it with ill-formatted data. To make it easy to understand, I put material for reproducing the problem up on GitHub. I’m sure one could construct many similar examples for Python and Julia. The problem here is two-fold: […]

Turning Distances into Distributions

Deriving Distributions from Distances: Several of the continuous univariate distributions that frequently come up in statistical theory can be derived by transforming distances into probabilities. Essentially, these distributions only differ in terms of how frequently values are drawn that lie at a distance \(d\) from the mode. To see how these transformations work (and unify […]

The Convexity of Improbability: How Rare are K-Sigma Effects?

In my experience, people seldom appreciate just how much more compelling a 5-sigma effect is than a 2-sigma effect. I suspect part of the problem is that p-values don’t invoke the visceral sense of magnitude that statements of the form, “this would happen 1 in K times”, would invoke. To that end, I wrote a […]
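
To make the "1 in K times" framing concrete, here is a minimal sketch. It assumes a normally distributed test statistic and a two-sided tail; those assumptions, and the helper names `two_sided_tail` and `one_in`, are introduced here for illustration and are not taken from the post itself.

```python
import math


def two_sided_tail(k: float) -> float:
    """P(|Z| >= k) for a standard normal Z (assumed model, two-sided)."""
    return math.erfc(k / math.sqrt(2))


def one_in(k: float) -> float:
    """Express the tail probability as '1 in K': returns K."""
    return 1.0 / two_sided_tail(k)


# Print how quickly K grows as the effect size moves from 1 to 5 sigma.
for k in range(1, 6):
    print(f"{k}-sigma: about 1 in {one_in(k):,.0f}")
```

Under these assumptions a 2-sigma effect is roughly a 1-in-22 event, while a 5-sigma effect is roughly a 1-in-1.7-million event.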

Why I'm Not a Fan of R-Squared

The Big Message: People sometimes use \(R^2\) as their preferred measure of model fit. Unlike quantities such as MSE or MAD, \(R^2\) is not a function only of the model’s errors; its definition contains an implicit model comparison between the model being analyzed and the constant model that uses only the observed mean to make predictions. […]
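
The implicit comparison can be sketched in a few lines, using the standard definition of \(R^2\) as one minus the ratio of the model's MSE to the MSE of the mean-only model. The function names and the toy data below are hypothetical and chosen only for illustration.

```python
def mse(y, predictions):
    """Mean squared error of predictions against observed values y."""
    return sum((a - b) ** 2 for a, b in zip(y, predictions)) / len(y)


def r_squared(y, predictions):
    """R^2 as 1 - MSE(model) / MSE(constant model predicting the mean of y)."""
    mean_y = sum(y) / len(y)
    baseline_mse = mse(y, [mean_y] * len(y))  # the implicit comparison model
    return 1.0 - mse(y, predictions) / baseline_mse


# Toy data (made up for illustration).
y = [1.0, 2.0, 3.0, 4.0]
model_predictions = [1.1, 1.9, 3.2, 3.8]

print(mse(y, model_predictions))        # depends only on the model's errors
print(r_squared(y, model_predictions))  # also depends on how well the mean does
```

Written this way, it is visible that the same model errors can produce very different \(R^2\) values depending on how well the constant model happens to do on the data.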

A Variant on "Statistically Controlling for Confounding Constructs is Harder than you Think"

Yesterday, a coworker pointed me to a new paper by Jacob Westfall and Tal Yarkoni called “Statistically controlling for confounding constructs is harder than you think”. I quite like the paper, which describes some problems that arise when drawing conclusions about the relationships between theoretical constructs using only measurements of observables that are, at best, […]

Understanding the Pseudo-Truth as an Optimal Approximation

Introduction: One of the things that set statistics apart from the rest of applied mathematics is an interest in the problems introduced by sampling: how can we learn about a model if we’re given only a finite and potentially noisy sample of data? Although frequently important, the issues introduced by sampling can be a distraction […]