A Variant on "Statistically Controlling for Confounding Constructs is Harder than you Think"

Yesterday, a coworker pointed me to a new paper by Jacob Westfall and Tal Yarkoni called “Statistically controlling for confounding constructs is harder than you think”. I quite like the paper, which describes some problems that arise when drawing conclusions about the relationships between theoretical constructs using only measurements of observables that are, at best, approximations to those theoretical constructs.

Given their intended audience, I think Westfall and Yarkoni’s approach is close to optimal: they motivate their concerns using examples that depend only upon the reader having a basic understanding of linear regression and measurement error. That said, the paper made me wish that more psychologists were familiar with Judea Pearl’s work on graphical techniques for describing causal relationships.

To see how a graphical modeling approach could be used to articulate Westfall and Yarkoni’s point, let’s assume that the true causal structure of the world looks like the graph shown in Figure 1 below:

Figure 1

For those who’ve never seen such a graph before, each directed edge points from a cause to its effect. This graph says that the objective temperature on any given day is the sole cause of (a) the subjective temperature experienced by any person, (b) the amount of ice cream consumed on that day, and (c) the number of deaths in swimming pools on that day.
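To make this concrete, here is a small simulation, my own toy example rather than anything from the paper, of a world with exactly this structure; every coefficient and noise scale below is invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Objective temperature is the sole common cause in this hypothetical world.
objective_temp = rng.normal(25, 8, size=n)

# Subjective temperature is a noisy perception of the objective temperature.
subjective_temp = objective_temp + rng.normal(0, 4, size=n)

# Ice cream consumption and swimming pool deaths each depend only on
# objective temperature, not on each other.
ice_cream = 2.0 * objective_temp + rng.normal(0, 10, size=n)
pool_deaths = 0.5 * objective_temp + rng.normal(0, 5, size=n)

# Despite having no direct causal link, the two are strongly correlated
# because they share objective temperature as a common cause.
print(np.corrcoef(ice_cream, pool_deaths)[0, 1])
```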

If this graph reflects the true causal structure of the world, then the only safe way to analyze data about the relationship between ice cream consumption and swimming pool deaths is to condition on objective temperature. If we use highlighting to represent conditioning on a variable, then a correct analysis must look like the following colored-in graph:

Figure 2

Westfall and Yarkoni’s point is that this correct analysis is frequently not performed because the objective temperature variable is not observed and therefore cannot be conditioned on. As such, one conditions on subjective temperature instead of objective temperature, leading to this colored-in graph:

Figure 3

The problem with conditioning on subjective temperature instead of objective temperature is that doing so fails to block the relevant path in the graph: the path that links ice cream consumption and swimming pool deaths through objective temperature. Because that path is not properly blocked, a spurious association between ice cream consumption and swimming pool deaths leaks through. This leakage is what produces the high false positive rates that are the focus of Westfall and Yarkoni’s paper.
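Using the same toy world (re-simulated here so the snippet stands alone), a sketch of both analyses with ordinary least squares shows the difference: including objective temperature as a regressor drives the ice cream coefficient to roughly zero, while substituting the noisy subjective measurement leaves a clearly positive coefficient behind:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Same hypothetical world as above: objective temperature is the sole common cause.
objective_temp = rng.normal(25, 8, size=n)
subjective_temp = objective_temp + rng.normal(0, 4, size=n)   # noisy measurement
ice_cream = 2.0 * objective_temp + rng.normal(0, 10, size=n)
pool_deaths = 0.5 * objective_temp + rng.normal(0, 5, size=n)

def ols_coefs(y, *predictors):
    """OLS coefficients of y on an intercept plus the given predictors (intercept dropped)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

# Correct analysis: conditioning on objective temperature blocks the path,
# so the coefficient on ice cream comes out approximately zero.
print(ols_coefs(pool_deaths, ice_cream, objective_temp))

# Incorrect analysis: conditioning on the noisy subjective measurement fails to
# block the path, so a clearly positive ice cream coefficient leaks through.
print(ols_coefs(pool_deaths, ice_cream, subjective_temp))
```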

One reason I like this graphical approach is that it makes it clear that the opposite problem can occur as well: the world might also have a causal structure such that one must condition on subjective temperature, and conditioning on objective temperature is never sufficient.

To see how that could occur, consider this alternative hypothetical causal structure for the world:

Figure 4

If this alternative structure were correct, then a correct analysis would need to color in subjective temperature:

Figure 5

An incorrect analysis would be to color in objective temperature rather than subjective temperature:

Figure 6

But, somewhat surprisingly, coloring in both would be fine:

Figure 7
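To see this in simulation, I’ll assume the alternative structure has objective temperature influencing subjective temperature, with subjective temperature (rather than objective temperature) as the sole common cause of ice cream consumption and swimming pool deaths. The sketch below is again a toy example with made-up numbers, and it walks through all three analyses:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Assumed alternative structure: objective temperature drives subjective
# temperature, and subjective temperature is the sole common cause of
# ice cream consumption and swimming pool deaths.
objective_temp = rng.normal(25, 8, size=n)
subjective_temp = objective_temp + rng.normal(0, 4, size=n)
ice_cream = 2.0 * subjective_temp + rng.normal(0, 10, size=n)
pool_deaths = 0.5 * subjective_temp + rng.normal(0, 5, size=n)

def ols_coefs(y, *predictors):
    """OLS coefficients of y on an intercept plus the given predictors (intercept dropped)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

# Conditioning on subjective temperature blocks the path: ice cream coefficient ~ 0.
print(ols_coefs(pool_deaths, ice_cream, subjective_temp))

# Conditioning only on objective temperature is insufficient: the ice cream
# coefficient stays clearly positive.
print(ols_coefs(pool_deaths, ice_cream, objective_temp))

# Conditioning on both is also fine: the ice cream coefficient is again ~ 0.
print(ols_coefs(pool_deaths, ice_cream, subjective_temp, objective_temp))
```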

It’s taken me some time to master this formalism, but I now find it quite easy to reason about these kinds of issues thanks to the brevity of graphical models as a notational technique. I’d love to see this approach become more popular in psychology, given that it has already become quite widespread in other fields. Of course, Westfall and Yarkoni are already advocating for something very similar when they recommend the use of SEMs, but the graphical approach is strictly more general than SEMs and, in my personal opinion, strictly simpler to reason about.