# Why Use ProjectTemplate or Any Other Framework?

We use frameworks like Ruby on Rails or ProjectTemplate to minimize the time we spend on irrelevant details. By definition, an irrelevant detail isn’t of interest to us. But how can we tell which details are irrelevant? This isn’t a trivial task and it seems to be, on the surface, a profoundly subjective matter.

Thankfully, it’s much simpler to point to examples of irrelevant details than to provide a general theory, though I do have a general theory that I’ll describe in a follow-up post. For now, let’s look at a source code file for a hypothetical data analysis project written in R. The file is called load_data.R:

    choices <- read.csv('data/choices.csv', header = TRUE, sep = ',')
    conditions <- read.csv('data/conditions.csv', header = TRUE, sep = ',')
    metadata <- read.csv('data/metadata.csv', header = TRUE, sep = ',')

My claim is that most of the code in this file is concerned with irrelevant details. How can you tell that these details are irrelevant? In this example, there are three flaws that are obvious to me:

1. I repeat the code that calls read.csv() for each of the data files I’m loading.
2. I’ve hardcoded the data set’s filenames, so I can’t use this script in another project without editing it first.
3. Because the filenames and variables are all hardcoded, I have to edit this file manually even when I add new data files to this project.

Solving these three problems was the first step that led me to design ProjectTemplate. Rather than rewrite load_data.R each time I start a new analysis project, it’s much easier to create a single generic script that can be used in all of my projects, even if I only use it as a base for something customized to the project at hand. By using a script rather than a function, I make it easy to change anything that needs to be customized for a specific project, such as skipping one enormous file that I don’t need to load each time or loading a data set from a SQL database. This default script approach is how I get around having to exploit R’s inheritance system, which seems sufficiently complex that I’d rather not emulate Rails’ approach directly.

To make the script generic, I rely on three assumptions:

1. CSV files always have a header line and use comma separators.
2. Every data file for a project is contained in the data directory.
3. The file X.csv will always get loaded into a data frame called X.
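
To make the assumptions above concrete, here is a minimal sketch that applies them to a single file. The temporary directory and the sample rows are invented purely for the demonstration, and using tools::file_path_sans_ext() to derive the variable name is my choice for this sketch, not necessarily what ProjectTemplate does internally.

```r
# Hypothetical setup: create a throwaway data directory with one CSV file.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(subject = 1:2, choice = c("a", "b")),
          file.path(data_dir, "choices.csv"), row.names = FALSE)

# Assumption 2: every data file lives in the data directory.
csv_path <- file.path(data_dir, "choices.csv")

# Assumption 3: the file X.csv is loaded into a data frame called X,
# so the variable name is the filename with its extension removed.
variable_name <- tools::file_path_sans_ext(basename(csv_path))

# Assumption 1: the file has a header line and uses comma separators.
# assign() binds the data frame to a name that is only known at run time.
assign(variable_name, read.csv(csv_path, header = TRUE, sep = ","))

print(variable_name)
print(choices)
```

Nothing in the last three statements mentions choices by name until the final print(), which is what lets the same code work for any file in the directory.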

With these assumptions, coding a generic script is simple enough, if we use some of the more abstract functions in R like assign: