Some Observations on Winsorization and Trimming

Over the last few months, I’ve had a lot of conversations with people about the use of winsorization to deal with heavy-tailed data that is positively skewed because of large outliers. After a conversation with my friend Chris Said this past week, it became clear to me that I needed to do some simulation studies […]

Why Julia’s DataFrames are Still Slow

Although I’ve recently decided to take a break from working on OSS for a little while, I’m still as excited as ever about Julia as a language. That said, I’m still unhappy with the performance of Julia’s core data analysis infrastructure. The performance of code that deals with missing values has been substantially improved […]

Rereading Meehl

Lately, I’ve been rereading a lot of Meehl’s papers on the epistemological problems with research in psychology. This passage from “The Problem Is Epistemology, Not Statistics: Replace Significance Tests by Confidence Intervals and Quantify Accuracy of Risky Numerical Predictions” strikes me as an almost perfect summary of his concerns, although it’s quite abstract and assumes […]