Using JAGS in R with the rjags Package

Get Everything Set Up

I’m going to assume that you have access to a machine that will run JAGS. If you don’t, then you should be able to use WinBUGS, which is very easy to get set up. Unfortunately, the details of what follows may not help you as much if you’re using WinBUGS.

To set up your system for using JAGS, there are two very easy steps:

  1. Go download the current version of JAGS (2.1.0 as of 8/20/2010).
  2. Install the current rjags package from CRAN (2.1.0-6 as of 8/20/2010).

Once you’ve done that, a simple call to library('rjags') will be enough to run JAGS from inside of R. You’ll want to do everything except model specification in R. You’ll specify the model in a separate file using BUGS/JAGS syntax.

Example 1: Inference on Normally Distributed Data

Let’s assume that you’ve got a bunch of data points from a normal distribution with unknown mean and variance. This is arguably the simplest data set you can analyze with JAGS. So, how do you perform the analysis?

First, let’s create some simulation data that we’ll use to test our JAGS model specification:

N <- 1000
x <- rnorm(N, 0, 5)
 
write.table(x,
            file = 'example1.data',
            row.names = FALSE,
            col.names = FALSE)

We don’t actually need to write out the data since ‘rjags’ automatically does this for us (and in another format at that), but it’s nice to be able to check that JAGS has done something reasonable by analyzing the raw inputs post hoc.
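Since we'll analyze the raw inputs post hoc, it's worth confirming that the file round-trips cleanly. A minimal sketch in plain R (it repeats the simulation so the snippet stands alone):

```r
# Simulate, write, and read back to verify the file is intact
set.seed(1)
N <- 1000
x <- rnorm(N, 0, 5)

write.table(x,
            file = 'example1.data',
            row.names = FALSE,
            col.names = FALSE)

# scan() reads the single unnamed column back as a numeric vector
x.check <- scan('example1.data')
stopifnot(length(x.check) == N)
stopifnot(isTRUE(all.equal(x.check, x, tolerance = 1e-4)))
```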

With your simulated data in hand, we’ll write up a model specification in JAGS syntax. Put the model specification in a file called example1.bug. The complete model looks like this:

model {
	for (i in 1:N) {
		x[i] ~ dnorm(mu, tau)
	}
	mu ~ dnorm(0, .0001)
	tau <- pow(sigma, -2)
	sigma ~ dunif(0, 100)
}

Every model specification file starts by telling JAGS that you're specifying a model. Inside the model block, we describe each data point using a for loop: x[i] is distributed normally (hence the dnorm() call) with mean mu and precision tau, where the precision is the reciprocal of the variance. Then we specify priors for mu and tau, which are constant across the loop. We give mu a normal prior with mean 0 and precision 0.0001, i.e. a standard deviation of 100. This is meant to serve as a non-informative prior, since our data set was designed to have all measurements substantially below 100. We specify tau in a slightly roundabout way: tau is a deterministic function (hence the deterministic <- instead of the distributional ~) of sigma, namely sigma raised to the -2 power, and sigma in turn has a uniform prior over the interval [0, 100].
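The precision parameterization trips up a lot of BUGS/JAGS newcomers, so here is a quick numerical check of the tau/sigma relationship in plain R, with no JAGS involved:

```r
# tau is the precision: the reciprocal of the variance, i.e. sigma^-2
sigma <- 5
tau <- sigma^-2
stopifnot(tau == 1 / sigma^2)  # precision = 1 / variance
stopifnot(tau == 0.04)         # 1 / 25 = 0.04

# The dnorm(0, .0001) prior on mu has precision 0.0001,
# which corresponds to a standard deviation of 1 / sqrt(0.0001) = 100
stopifnot(isTRUE(all.equal(1 / sqrt(0.0001), 100)))
```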

With this model specified in example1.bug, we can write more R code to invoke it and perform inference properly:

library('rjags')
 
jags <- jags.model('example1.bug',
                   data = list('x' = x,
                               'N' = N),
                   n.chains = 4,
                   n.adapt = 100)
 
update(jags, 1000)
 
jags.samples(jags,
             c('mu', 'tau'),
             1000)

Obviously, we have to import the 'rjags' package. Then we set up our model object in R using the jags.model() function. We pass it the JAGS model specification file and the data set, which is a named list whose names must match those used in the model specification file. We also tell the system how many parallel chains to run. (If you don't understand what the chains represent, I'd suggest just playing around and then reading up on the issue of mixing in MCMC.) Finally, we say how many samples should be thrown away as part of the adaptive sampling period for each chain. For this example, I suspect that we could safely set this parameter to 0, but adaptation costs so little that I've used 100 as a placeholder. The call to jags.model() returns a JAGS model object, which we store in the jags variable.

After all of that setup, I've chosen to have the system run another 1,000 iterations of the sampler just to show how to use the update() function, even though it's completely unnecessary in this simple problem. Finally, we use jags.samples() to draw 1,000 samples of the variables mu and tau.

When you call jags.samples(), you'll see that the output provides estimated values for mu and tau. These should be close to 0 and 0.04 if JAGS is working properly, since those were the mean and precision we used to create our simulation data. (At the risk of being pedantic: we used a standard deviation of 5, which gives a variance of 25 and a precision of 1 / 25 = 0.04.) Of course, they'll be even closer to the sample mean mean(x) and the sample precision 1 / var(x), so don't forget to compare the inferred values against those as well. The sample size, 1,000, isn't large enough to guarantee that the sample mean will be all that close to 0.
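A minimal sketch of that comparison in plain R (repeating the simulation so the snippet stands alone; the exact numbers will differ with a different seed):

```r
# The sample statistics are the values the posterior should track most closely
set.seed(1)
x <- rnorm(1000, 0, 5)

mean(x)     # sample mean: close to, but not exactly, 0
1 / var(x)  # sample precision: close to 1 / 25 = 0.04

stopifnot(abs(mean(x)) < 0.5)
stopifnot(abs(1 / var(x) - 0.04) < 0.01)
```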

Example 2: Basic Linear Regression

Moving on to a slightly more interesting example, we can perform a simple linear regression in JAGS very easily. As before, we set up simulation data from a theoretical linear model:

N <- 1000
x <- 1:N
epsilon <- rnorm(N, 0, 1)
y <- x + epsilon
 
write.table(data.frame(X = x, Y = y, Epsilon = epsilon),
            file = 'example2.data',
            row.names = FALSE,
            col.names = TRUE)

We then set up the Bayesian model for our regression in example2.bug:

model {
	for (i in 1:N){
		y[i] ~ dnorm(y.hat[i], tau)
		y.hat[i] <- a + b * x[i]
	}
	a ~ dnorm(0, .0001)
	b ~ dnorm(0, .0001)
	tau <- pow(sigma, -2)
	sigma ~ dunif(0, 100)
}

Here, we've said that every data point is drawn from a normal distribution with mean a + b * x[i] and precision tau. We assign non-informative normal priors to a and b and a non-informative uniform prior to the standard deviation sigma, which is deterministically transformed into tau.

Then, we run this model using exactly the same approach we used earlier:

library('rjags')
 
jags <- jags.model('example2.bug',
                   data = list('x' = x,
                               'y' = y,
                               'N' = N),
                   n.chains = 4,
                   n.adapt = 100)
 
update(jags, 1000)
 
jags.samples(jags,
             c('a', 'b'),
             1000)

After running the chain for a good number of samples, we draw inferences for a and b, which should be close to their true values of 0 and 1. I've ignored tau here, though there's no reason not to check that it was properly inferred as well.
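One easy way to sanity-check the posterior means is against ordinary least squares on the same data, since lm() should recover nearly identical coefficients. A self-contained sketch in plain R:

```r
# Classical check: OLS on the same simulated data should give
# coefficients very close to the JAGS posterior means.
set.seed(1)
N <- 1000
x <- 1:N
y <- x + rnorm(N, 0, 1)

fit <- lm(y ~ x)
coef(fit)  # intercept near 0, slope near 1

stopifnot(abs(coef(fit)[['(Intercept)']]) < 1)
stopifnot(abs(coef(fit)[['x']] - 1) < 0.01)
```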

Example 3: One Dimensional Logistic Regression

Finally, it's good to see a model that's hard to implement without either a good deal of knowledge of optimization tools or a sampling technique like the one JAGS automates. For that purpose, I'll show how to implement logistic regression. Here we set up a simple one-dimensional predictor for our binary outcome variable and assume the standard logistic model:

N <- 1000
x <- 1:N
z <- 0.01 * x - 5
y <- sapply(1 / (1 + exp(-z)), function(p) {rbinom(1, 1, p)})
 
write.table(data.frame(X = x, Z = z, Y = y),
            file = 'example3.data',
            row.names = FALSE,
            col.names = TRUE)

Then we set up our Bayesian model in example3.bug, where y[i] is Bernoulli distributed (or binomial distributed with 1 draw, if you prefer that sort of thing) and the linear model coefficients a and b are given non-informative normal priors:

model {
	for (i in 1:N){
		y[i] ~ dbern(p[i])
		p[i] <- 1 / (1 + exp(-z[i]))
		z[i] <- a + b * x[i]
	}
	a ~ dnorm(0, .0001)
	b ~ dnorm(0, .0001)
}

Finally, we perform our standard inference calls in R to run the model through JAGS and extract posterior samples for a and b.

library('rjags')
 
jags <- jags.model('example3.bug',
                   data = list('x' = x,
                               'y' = y,
                               'N' = N),
                   n.chains = 4,
                   n.adapt = 100)
 
update(jags, 1000)
 
jags.samples(jags,
             c('a', 'b'),
             1000)

As always, you should check that the outputs you get make sense: here, you expect a to be approximately -5 and b to be around 0.01. This inference problem should take a good bit longer to solve: there are other tools for handling logistic regressions in JAGS that are faster, but I find this approach conceptually simplest and best for highlighting the similarity to a standard linear regression.
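As with the linear regression, the classical analogue makes a good sanity check: glm() with a binomial family should recover coefficients near a = -5 and b = 0.01. A self-contained sketch in plain R (note that plogis(z) is just 1 / (1 + exp(-z)), and rbinom() vectorizes over the probability argument, replacing the sapply() above):

```r
# Classical check: maximum-likelihood logistic regression via glm()
set.seed(1)
N <- 1000
x <- 1:N
z <- 0.01 * x - 5
y <- rbinom(N, 1, plogis(z))  # plogis(z) == 1 / (1 + exp(-z))

fit <- glm(y ~ x, family = binomial)
coef(fit)  # intercept near -5, slope near 0.01

stopifnot(abs(coef(fit)[['(Intercept)']] + 5) < 1.5)
stopifnot(abs(coef(fit)[['x']] - 0.01) < 0.005)
```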

Some Errors I Made at the Start

Here are a few errors that stumped me for a bit as I got started using JAGS today:

  1. Error 1, 'Invalid parent error': I got this error when I erroneously assigned a normal prior to one of my precision variables tau. This is nonsensical, since precisions are always positive. Hence, the parent node involving tau was deemed to be 'invalid', causing a fatal run-time error.
  2. Error 2, 'Attempt to redefine node z[1]': This is an error that Gelman and Hill warn users about in their ARM book: you must be sure that you don't treat the values in loops as local variables, because they cannot be reset on each iteration -- they must have unique values across all iterations. Thus, you must build an array for all of the variables that you might think of as local within loops, such as intermediate latent variables. Not doing so will produce fatal run-time errors.
  3. Error 3, 'Invalid vector argument to exp': This is related to the error above: when I corrected part of my attempt to reset z on each pass through the logistic loop in Example 3, I forgot to reset it in the definition of p[i]. This led to an invalid vector with no definition being passed to the exp() function, giving a fatal run-time error.

One Question I Have

My first attempt at the linear regression didn't work. I still don't entirely understand why, but here is the alternative code that failed, in case you ever make the same kind of mistake and find yourself puzzled:

model {
	for (i in 1:N) {
		y[i] <- a + b * x[i] + epsilon[i]
		epsilon[i] ~ dnorm(0, tau)
	}
	a ~ dnorm(0, .0001)
	b ~ dnorm(0, .0001)
	tau <- pow(sigma, -2)
	sigma ~ dunif(0, 100)
}

My assumption is that it's more difficult to infer values for all the epsilon's at the same time as tau, which makes this harder than the earlier call without any explicit epsilon values. If that's wrong, please do correct me. Another hypothesis I entertained is that it's a problem to ever set y[i] to be a deterministic node, though this doesn't seem really plausible to me.

30 responses to “Using JAGS in R with the rjags Package”

  1. Bernd

    This a really nice blog post! I like the way you introduce different models. May I suggest that you add a short description how to do some (simple) convergence diagnostics via coda[1] (or similar packages).

    [1] I have written a similar (and simpler) post about JAGS + rjags (unfortunately, it’s in German …) which also includes some convergence diagnostics.

  2. gg

    I think I know why your first attempt of linear regression (code in “One Question I Have”) didn’t work. It does not use the data ‘y’ when the model is coded as such. You can try removing ‘y’ = y from the ‘data’ argument in the jags.model() function and it will still work! That means that y[i] is a deterministic node.

  3. gg

    Hi John Myles White,
    I think that is so, from the examples I have seen. But I am not very sure about that, as I have only started to learn jags this week. Have you tested my claim that the data ‘y’ is not used by jags? I cannot be very sure of that too!

  4. Bill Jefferys

    This was very useful in learning the basics of rjags.

    But I discovered something that people who use Macs (maybe others, I do not know) need to know.

    I created the file ‘example1.bug’ in the R console text editor (Create new document…) Saved it with that name.
    But R could not find the document. Was it in the right directory? Yes. Could any R function find it? No.

    Then I found out (did an “info” on the file) that the actual file name wasn’t ‘example1.bug’, but was ‘example1.bug.R’

    The extension was there, but MacOS did not display it.

    Usually, MacOS doesn’t care about extensions. But in this case it does. This may be a bug in R.

    Bill

  5. Pedro Regueiro

    hey!!, Useful post.

    I am also new to rjags and did same mistake Bill posted up there. However, i still cant find the correct way to create a .bug file on a mac. Any ideas?

    (Because this is probably the most stupid question you have ever been asked, i do feel the need to tell you that i am being serious)

    cheers!

    Pedro

  6. Pedro Regueiro

    Awesome!!

    Thanks

  7. j

    Thanks for this great tutorial. To confirm your hypothesis regarding deterministic data nodes:

    “It is an error to supply a data value for a deterministic node. (See, however, section 8.0.3 on observable functions).”

    This is from page 8 of the JAGS user manual.

  8. Filipe Ferminiano

    Nice tutorial, but I’m having trouble with rjags in Windows 7. R crashes when I try to run the line:
    jags <- jags.model('example1.bug', data = list('x' = x,'N' = N),n.chains = 4,n.adapt = 100)
    I'm using a 64-bit Windows 7 version. I have already updated to the latest R version, but it failed in the same way.
    I tried running this tutorial on a Windows XP machine and it worked well.
    Can you help me?

  9. Jake

    Hey John,

    Nice post. Quick question. I am just now getting into some bayesian modeling of simple linear regressions but I have data that i want to use to inform my model. My question is this: I understand how to inform the alpha and beta, that makes clear sense to me, but when do you/how do you know to place informed priors on tau? I guess I don’t really understand what tau is doing.

    Any comments would be great.

    thanks,

    Jake

  10. Sandrine

    Thks a lot for your tutorial. I would like to use rjags to estimate an ordered probit but I find difficulties to implement the model. Do you know available examples? thks a lot in advance

  11. Jake

    Hey John,

    Thanks for the post. I am trying to run some bayesian regressions using informed priors in a big loop. I have all the code written and the individual regressions work smoothly, but I can’t figure out how to insert the prior information into the “model” that is saved as a text (I am generating the informed priors within the loop and then naming them and trying to insert them into the model code).
    For example in the model below I have named my prior values “Aprior” and “Bprior”. Every time I use non-numeric terms (see below), it doesn’t work even if these vectors are named before and/or inside the code itself.

    It would save me days of work (though I am running them all by hand now) if I could figure this out, so any ideas would be great.

    thanks a lot,

    Jake

    Aprior <- .3
    Bprior <- .5 (and I have done this using things like as.numeric(value) as well)

    model {
    for (i in 1:N){
    y[i] <- a + b * x[i] + epsilon[i]
    epsilon[i] ~ dnorm(0, tau)
    }
    a ~ dnorm( Aprior, .0001)
    b ~ dnorm( Bprior, .0001)
    tau <- pow(sigma, -2)
    sigma ~ dunif(0, 100)
    }

  12. CHCH

    I’m wondering whether there’s a straightforward extension of these examples to performing these inference hierarchically, so that estimates for individual subjects are constrained by the mu and sigma as estimated across subjects.

    This was an excellent tutorial, btw. Thanks a lot.

  13. John Myles White

    There are fairly easy extensions of these models to the hierarchical case. You can find code for them on GitHub at https://github.com/johnmyleswhite/JAGSExamples

    There are a lot of other models there to wade through, but the code in bugs/hierarchical should get you started.

  14. Matthew

    Excellent. Between Gelman and Hill, Kruschke, and now you… the application of Bayesian inference is starting to make sense!!

  15. EvanZ

    There aren’t too many sites that have help for the logistic regression in jags, so I thought I would post my problem here and see if anyone can help. I’m trying to do a very simple regression based on oring failure (as a function of temperature) data from Robert and Casella (MCSM) chapter 1. I’m using rjags and RStudio as the front-end, which crashes every time the following is run (I’ve tried multiple variants of the model part and all seem to cause an exit of the program):

    # THE MODEL.

    modelstring = "
    model {
    for (i in 1:N) {
    y[i] ~ dbern(mu)
    logit(mu) <- alpha+beta*x[i]
    }
    alpha ~ dnorm(0, 0.0001)
    beta ~ dnorm(0, 0.0001)
    }
    " # close quote for modelstring
    writeLines(modelstring, con="oring.bug")

    #------------------------------------------------------------------------------
    # THE DATA.
    x = c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
    y = c(1,1,1,1,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0)
    N = length(x)

    #------------------------------------------------------------------------------
    # RUN

    library('rjags')

    jags <- jags.model('oring.bug',
    data = list('x' = x,
    'y' = y,
    'N' = N),
    n.chains = 4,
    n.adapt = 500)

    update(jags, n.iter=1000,progress.bar="text")

    oring.out=coda.samples(jags,
    c("alpha","beta"),
    n.iter=1000)

    summary(oring.out)

  16. John Myles White

    What is the actual model printed to the data file? Generating that file in R seems like overkill and a likely source of corruption.

  17. EvanZ

    I’m not sure which file you’re referring to (‘oring.bug’?). But I have run a similar linear regression using the same setup, and it runs fine. Does the model itself look ok?

  18. John Myles White

    One thing is clearly wrong: mu needs to be mu[i].

  19. EvanZ

    Ok, that makes it work. Now I have to remind myself why I need that. Thanks!

  20. WillT

    EvanZ – as a general tip, try using R instead of RStudio as your front-end when you are debugging your model – it won’t crash if there’s an error in the model, and will frequently give you an error message that will help you determine where the error in your model code is.

  21. Vili

    Hi John,

    I am working on multidimensional model in RJags and I am wondering if you have any insight. I am getting error 2 that you discussed here when I try to run it.

    Data file looks like this:

    phat <-c(
    0.52, 0.40,
    0.50, 0.45,
    0.48, 0.46,
    0.49, 0.46,
    0.44, 0.47,
    0.43, 0.46,
    0.49, 0.44,
    0.41, 0.48,
    0.41, 0.47,
    0.4, 0.51,
    0.38, 0.52
    )
    phat <-matrix (phat,nrow=11,ncol=2)

    n <- c(500,100,2000,500,
    440,550,500,50,
    1000,1200,2000)
    The actual model is:
    model
    {
    for (i in 1:11)
    {
    Y[i,1] <- logit(phat[i,1])
    Y[i,2] <- logit(phat[i,2])
    }

    for (i in 2:11)
    {
    V[1,1]<-1/(phat[i,1]*(1-phat[i,1]))
    V[1,2]<- -1/(1-phat[i,1])*(1-phat[i,2])
    V[2,1]<- -1/(1-phat[i,1])*(1-phat[i,2])
    V[2,2]<-1/(phat[i,2]*(1-phat[i,2]))
    Y[i,1:2] ~ dmnorm(theta[i,1:2],1/n[i]*V[,])
    theta[i,1:2]~dmnorm (theta[i-1,1:2],W[,])
    }

    theta[1,1:2] ~ dmnorm(theta0[],prec[,])
    W[1:2,1:2] ~ dwish(W0[,],3)

    theta0[1]<-0.55
    theta0[2]<-0.40

    W0[1,1] <- 75
    W0[2,2] <- 75
    W0[1,2] <- 60
    W0[2,1] <- 60

    }

    I am having issues with the specification of V[,]. I am sure I need to specify something in the data file pertaining to V[,] but I tried multiple things and nothing works :-(.

    Thanks in advance any help will be greatly appreciated.

  22. Yanjie Fu

    Hi John,

    Sorry for my bothering.

    Since you are familiar with RJags, I am seeking your help on Modeling User-Item Rating Using RJags and BUGS. This problem is challenging because the model is complicate. Please see my question on stackoverflow.

    http://stackoverflow.com/questions/13907674/modeling-user-item-rating-using-rjags-and-bugs

    Thank you.