Statistics

That Way Madness Lies: Arithmetic on data.frames

tl;dr: Please do not use arithmetic on data.frame objects when programming in R. It’s a hack that only works if you know everything about your datasets. If anything happens to change the order of the rows in your dataset, previously safe data.frame arithmetic operations will produce incorrect answers. If you learn to always explicitly […]

My Experience at JuliaCon

Introduction I just got home from JuliaCon, the first conference dedicated entirely to Julia. It was a great pleasure to spend two full days listening to talks about a language that I started advocating for just a little more than two years ago. What follows is a very brief review of the talks that excited […]

Falsifiability versus Rationalization

Here are two hypothetical conversations about psychological research. I’ll leave it to others to decide whether these conversations could ever take place. Theories are just directional assertions about effects Person A: And, just as I predicted, I found in my early studies that the correlation between X and Y is 0.4. Person B: What do […]

A Note on the Johnson-Lindenstrauss Lemma

Introduction A recent thread on Theoretical CS StackExchange comparing the Johnson-Lindenstrauss Lemma with the Singular Value Decomposition piqued my interest enough that I decided to spend some time last night reading the standard JL papers. Until this week, I only had a vague understanding of what the JL Lemma implied. I previously mistook the JL […]

Data corruption in R 3.0.2 when using read.csv

Introduction It may be old news to some, but I just recently discovered that the automatic type inference system that R uses when parsing CSV files assumes that data sets will never contain 64-bit integer values. Specifically, if an integer value read from a CSV file is too large to fit in a 32-bit integer […]

The Relationship between Vectorized and Devectorized Code

Introduction Some people have come to believe that Julia’s vectorized code is unusably slow. To correct this misconception, I outline a naive benchmark below that suggests that Julia’s vectorized code is, in fact, noticeably faster than R’s vectorized code. When experienced Julia programmers suggest that newcomers should consider devectorizing code, we’re not trying to beat […]

Writing Type-Stable Code in Julia

For many of the people I talk to, Julia’s main appeal is speed. But achieving peak performance in Julia requires that programmers absorb a few subtle concepts that are generally unfamiliar to users of weakly typed languages. One particularly subtle performance pitfall is the need to write type-stable code. Code is said to be type-stable […]

September Talks

To celebrate my last full month on the East Coast, I’m doing a bunch of talks. If you’re interested in hearing more about Julia or statistics in general, you might want to come out to one of the events I’ll be at: Julia Tutorial at DataGotham: On 9/12, Stefan and I will be giving a […]

Hopfield Networks in Julia

As a fun side project last night, I decided to implement a basic package for working with Hopfield networks in Julia. Since I suspect many of the readers of this blog have never seen a Hopfield net before, let me explain what they are and what they can be used for. The short-and-skinny is that […]

What’s Next

The last two weeks have been full of changes for me. For those who’ve been asking about what’s next, I thought I’d write up a quick summary of all the news. (1) I successfully defended my thesis this past Monday. Completing a Ph.D. has been a massive undertaking for the past five years, and it’s […]