No juice for you, CSV format. It just makes you more awful.

I just found a minimal example of how easy it is to confuse R’s CSV parser when providing it with ill-formatted data. To make it easy to understand, I put material for reproducing the problem up on GitHub. I’m sure one could construct many similar examples for Python and Julia. The problem here is two-fold: […]

Why Julia’s DataFrames are Still Slow

Introduction: Although I’ve recently decided to take a break from working on OSS for a little while, I’m still as excited as ever about Julia as a language. That said, I’m still unhappy with the performance of Julia’s core data analysis infrastructure. The performance of code that deals with missing values has been substantially improved […]

What’s Wrong with Statistics in Julia?

Introduction: Several months ago, I promised to write an updated version of my old post, “The State of Statistics in Julia”, that would describe how Julia’s support for statistical computing has evolved since December 2012. I’ve kept putting off writing that post for several reasons, but the most important reason is that all of my […]

The Lesser Known Normal Forms of Database Design

-1st Normal Form: The database contains at least one table that is an exact copy of another table, except with additional columns.
-2nd Normal Form: The database contains at least one table that is a corrupt, out-of-date copy of another table, except with additional columns. It is impossible to determine if these additional columns can […]

Values vs. Bindings: The Map is Not the Territory

Many newcomers to Julia are confused by the seemingly dissimilar behaviors of the following two functions: julia> a = [1, 2, 3] […]

That Way Madness Lies: Arithmetic on data.frames

tl;dr: Please do not use arithmetic on data.frame objects when programming in R. It’s a hack that only works if you know everything about your datasets. If anything happens to change the order of the rows in your dataset, previously safe data.frame arithmetic operations will produce incorrect answers. If you learn to always explicitly […]

My Experience at JuliaCon

Introduction: I just got home from JuliaCon, the first conference dedicated entirely to Julia. It was a great pleasure to spend two full days listening to talks about a language that I started advocating for just a little more than two years ago. What follows is a very brief review of the talks that excited […]

The Relationship between Vectorized and Devectorized Code

Introduction: Some people have come to believe that Julia’s vectorized code is unusably slow. To correct this misconception, I outline a naive benchmark below that suggests that Julia’s vectorized code is, in fact, noticeably faster than R’s vectorized code. When experienced Julia programmers suggest that newcomers should consider devectorizing code, we’re not trying to beat […]

Writing Type-Stable Code in Julia

For many of the people I talk to, Julia’s main appeal is speed. But achieving peak performance in Julia requires that programmers absorb a few subtle concepts that are generally unfamiliar to users of weakly typed languages. One particularly subtle performance pitfall is the need to write type-stable code. Code is said to be type-stable […]