Archives by date

Promising R Packages

As a quick note, here are two R packages that were mentioned to me recently and that look promising: reldist and mixtools.

EM and Regression Mixture Modeling

[UPDATE: As Will points out in the comments, this isn’t really the EM algorithm. There isn’t a proper E step, because there’s no distribution being estimated: there’s only a maximization step that alternates between maximizing the class labels and the slopes. You can think of this algorithm as a degenerate version of EM in the […]
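The alternation described in that update can be illustrated with a minimal sketch. It is not the code from the original post: the function name `fit_hard_mixture`, the choice of lines through the origin, and the squared-error criterion are all assumptions made for illustration. With the slopes held fixed, each point takes the label of the line that fits it best; with the labels held fixed, each slope is refit by least squares.

```python
def fit_hard_mixture(xs, ys, slopes, n_iter=20):
    """Hard-assignment alternation for a mixture of lines through the origin.

    Hypothetical helper for illustration only. `slopes` holds the initial
    guesses, one per class.
    """
    slopes = list(slopes)
    labels = [0] * len(xs)
    for _ in range(n_iter):
        # Step 1: slopes fixed. Label each point with the class whose line
        # gives the smallest squared residual.
        labels = [
            min(range(len(slopes)), key=lambda k: (y - slopes[k] * x) ** 2)
            for x, y in zip(xs, ys)
        ]
        # Step 2: labels fixed. Refit each slope by least squares through
        # the origin, using only the points currently assigned to that class.
        for k in range(len(slopes)):
            sxy = sum(x * y for x, y, lab in zip(xs, ys, labels) if lab == k)
            sxx = sum(x * x for x, lab in zip(xs, labels) if lab == k)
            if sxx > 0:  # leave the slope unchanged if the class is empty
                slopes[k] = sxy / sxx
    return slopes, labels
```

Because every point is committed to exactly one class at each pass, no distribution over labels is ever computed, which is why the update calls this a degenerate version of EM rather than EM proper.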

Apologies for Polluting Twitter

I’d like to publicly apologize to anyone who follows me on Twitter and saw the argument I started with two people yesterday morning. While I still believe that the people on the other side of the argument had behaved inappropriately enough that someone needed to confront them, my actual reaction was completely counter-productive and represented […]

R Recommendation Contest Launches on Kaggle

The R Recommendation Engine contest is now live on Kaggle. Please head over there and start submitting your predictions for the test data set. Once you do, you can check the leaderboard to see how your algorithm compares with other people’s work. We know that there’s still plenty of progress that can be made, because […]

Build a Recommendation System for R Packages

On Dataists, a new collaborative blog for data hackers that I’m contributing to, we’ve just announced a data contest that’s custom-made for R users. To win the contest, you need to build a recommendation system for R packages. To find out more, check out the official announcement on Dataists. Then go to GitHub to […]

ProjectTemplate Version 0.1-3 Released

I’ve just released the newest version of ProjectTemplate. The primary change is a completely redesigned mechanism for automatically loading data. ProjectTemplate can now read compressed CSV files, access CSV data files over HTTP, read Stata, SPSS and RData binary files and even load MySQL database tables automatically. For my own projects, this is a big […]

Three-Quarter Truths: Correlation Is Not Causation

Other than our culture’s implicit association between lies, damned lies and statistics, I think no idea has stifled the growth of statistical literacy as much as the endless repetition of the words correlation is not causation. This phrase seems to be primarily used to suppress intellectual inquiry by encouraging the unspoken assumption that correlational knowledge […]