Statistics

Review of R Graphs Cookbook

The kind people at Packt Publishing recently asked me to review one of their newest R books: the R Graphs Cookbook. In general, I think pretty highly of the book: it provides a nice overview of the basic tools for visualizing data in R. If you’re just getting started with creating graphs in R, this [...]

Modern Science and the Bayesian-Frequentist Controversy

The Bayesian-Frequentist debate reflects two different attitudes to the process of doing science, both quite legitimate. Bayesian statistics is well-suited to individual researchers, or a research group, trying to use all the information at its disposal to make the quickest possible progress. In pursuing progress, Bayesians tend to be aggressive and optimistic with their modeling [...]

Inconsistencies in Bayesian Models of Decision-Making

But modeling devices that make sense for an unbiased decisionmaker may not make sense for a biased one. For example, why would individuals have priors and posteriors if they are destined to apply Bayes’ law incorrectly?[1] A question I often ask myself. [1] Wolfgang Pesendorfer, Behavioral Economics Comes of Age: A Review Essay on Advances [...]

Academic Jargon: Field-Specific Insults

Every academic field seems to develop a set of generic insults based on its intellectual toolkit. Here are two examples I hear often: Probabilists and Statisticians: “I think that’s an interesting case, but it’s in a set with measure zero.” Economists: “X group’s behavior is clearly rent-seeking.” Do any readers have good examples from other [...]

A Draft of ProjectTemplate v0.2-1

I’ve just uploaded a new binary of ProjectTemplate to GitHub. This is a draft version of the next release, v0.2-1, which includes some fairly substantial changes and is backwards incompatible in several ways with previous versions of ProjectTemplate. Foremost of the changes is that most of the logic for load.project() is now built into the [...]

The NYC Marathon

New York’s annual marathon took place yesterday. Watching a bit of it on television with my friends, I was struck by the much earlier starting time for women than for men. Specifically, professional women started running yesterday at 9:10 AM, while professional men started running at 9:40 AM. (This information comes from the runner’s handbook.) I [...]

The Answer Depends on the Question

To quote from the preface to the first edition in Jeffreys (1961): ‘It is sometimes considered a paradox that the answer depends not only on the observations but on the question; it should be a platitude.’ (P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapter 2)

Promising R Packages

As a quick note, here are two R packages that were mentioned to me recently and that look promising: reldist and mixtools.

EM and Regression Mixture Modeling

[UPDATE: As Will points out in the comments, this isn't really the EM algorithm. There isn't a proper E step, because there's no distribution being estimated: there's only a maximization step that alternates between maximizing the class labels and the slopes. You can think of this algorithm as a degenerate version of EM in the [...]
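
The alternation described in the update can be sketched roughly as follows. This is a minimal illustration, not the code from the post: it assumes two regression lines, squared residuals as the assignment criterion, and made-up noiseless-ish data. The function names `fit_line` and `hard_regression_mixture` are hypothetical, and Python is used here only for illustration.

```python
def fit_line(points):
    """Ordinary least squares for y = a + b*x on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y - slope * mean_x, slope  # (intercept, slope)


def hard_regression_mixture(points, lines, max_iter=100):
    """Alternate between picking the best label for each point and
    refitting each line to its points, until the labels stop changing.
    There is no E step here: labels are hard assignments, not probabilities."""
    lines = list(lines)
    labels = None
    for _ in range(max_iter):
        # Step 1: give each point the label of the line with the smallest squared residual.
        new_labels = [
            min(range(len(lines)),
                key=lambda k: (y - lines[k][0] - lines[k][1] * x) ** 2)
            for x, y in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Step 2: refit each line by least squares on the points assigned to it.
        for k in range(len(lines)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if len(members) >= 2:
                lines[k] = fit_line(members)
    return lines, labels


# Illustrative data (an assumption): 20 points near y = 2x, then 20 near y = -x,
# with a small deterministic wiggle standing in for noise.
xs = list(range(1, 21))
data = [(x, 2.0 * x + 0.05 * ((i % 3) - 1)) for i, x in enumerate(xs)]
data += [(x, -1.0 * x + 0.05 * ((i % 3) - 1)) for i, x in enumerate(xs)]

# Rough starting guesses for the two (intercept, slope) pairs.
fitted, labels = hard_regression_mixture(data, [(0.0, 1.0), (0.0, -0.5)])
```

Because each point is assigned to exactly one line, the procedure behaves more like k-means applied to regression lines than like EM with soft responsibilities, which is the distinction the update draws.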

R Recommendation Contest Launches on Kaggle

The R Recommendation Engine contest is now live on Kaggle. Please head over there and start submitting your predictions for the test data set. Once you do, you can check the leaderboard to see how your algorithm compares with other people’s work. We know that there’s still plenty of progress that can be made, because [...]