### Get Everything Set Up

I’m going to assume that you have access to a machine that will run JAGS. If you don’t, then you should be able to use WinBUGS, which is very easy to get set up. Unfortunately, the details of what follows may not help you as much if you’re using WinBUGS.

To set up your system for using JAGS, there are two very easy steps:

- Go download the current version of JAGS (2.1.0 as of 8/20/2010).
- Install the current rjags package from CRAN (2.1.0-6 as of 8/20/2010).

Once you’ve done that, a simple call to `library('rjags')` will be enough to run JAGS from inside of R. You’ll want to do everything except model specification in R. You’ll specify the model in a separate file using BUGS/JAGS syntax.

### Example 1: Inference on Normally Distributed Data

Let’s assume that you’ve got a bunch of data points from a normal distribution with unknown mean and variance. This is arguably the simplest data set you can analyze with JAGS. So, how do you perform the analysis?

First, let’s create some simulation data that we’ll use to test our JAGS model specification:

```r
N <- 1000
x <- rnorm(N, 0, 5)

write.table(x,
            file = 'example1.data',
            row.names = FALSE,
            col.names = FALSE)
```

We don’t actually need to write out the data since ‘rjags’ automatically does this for us (and in another format at that), but it’s nice to be able to check that JAGS has done something reasonable by analyzing the raw inputs post hoc.

With your simulated data in hand, we’ll write up a model specification in JAGS syntax. Put the model specification in a file called `example1.bug`. The complete model looks like this:

```
model {
  for (i in 1:N) {
    x[i] ~ dnorm(mu, tau)
  }
  mu ~ dnorm(0, .0001)
  tau <- pow(sigma, -2)
  sigma ~ dunif(0, 100)
}
```

In every model specification file, you have to start out by telling JAGS that you’re specifying a model. Then you set up the model for every single data point using a `for` loop. Here, we say that `x[i]` is distributed normally (hence the `dnorm()` call) with mean `mu` and precision `tau`, where the precision is simply the reciprocal of the variance. Then we specify our priors for `mu` and `tau`, which are meant to be constant across the loop. We tell JAGS that `mu` is distributed normally with mean 0 and standard deviation 100. This is meant to serve as a non-informative prior, since our data set was designed to have all measurements substantially below 100. Then we specify `tau` in a slightly round-about way. We say that `tau` is a deterministic function (hence the deterministic `<-` instead of the distributional `~`) of `sigma`, after raising `sigma` to the -2 power. Then we say that `sigma` has a uniform prior over the interval [0, 100].
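As a side note, the precision/variance relationship is easy to verify numerically. Here's a quick sanity check of the arithmetic (written in Python purely for illustration; it's not part of the JAGS workflow):

```python
# tau = pow(sigma, -2): precision is the reciprocal of the variance.
sigma = 5.0
tau = sigma ** -2  # 1 / 25 = 0.04

# Conversely, a prior precision of .0001 corresponds to a standard
# deviation of 1 / sqrt(.0001) = 100, which is why dnorm(0, .0001)
# acts as a diffuse prior here.
prior_tau = 0.0001
prior_sd = prior_tau ** -0.5

print(round(tau, 6), round(prior_sd, 6))  # 0.04 100.0
```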

With this model specified in `example1.bug`, we can write more R code to invoke it and perform inference properly:

```r
library('rjags')

jags <- jags.model('example1.bug',
                   data = list('x' = x,
                               'N' = N),
                   n.chains = 4,
                   n.adapt = 100)

update(jags, 1000)

jags.samples(jags,
             c('mu', 'tau'),
             1000)
```

Obviously, we have to import the 'rjags' package. Then we need to set up our model object in R, which we do using the `jags.model()` function. We specify the JAGS model specification file and the data set, which is a named list where the names must be those used in the JAGS model specification file. Next, we tell the system how many parallel chains to run. (If you don't understand what the chains represent, I'd suggest just playing around and then reading up about the issue of mixing in MCMC.) Finally, we tell the system how many samples should be thrown away as part of the adaptive sampling period for each chain. For this example, I suspect that we could safely set this parameter to 0, but it costs so little that I've used 100 just as a placeholder. After calling `jags.model()`, we receive a JAGS model object, which we store in the `jags` variable.

After all of that set up, I've chosen to have the system run another 1000 iterations of the sampler just to show how to use the `update()` function, even though it's completely unnecessary in this simple problem. Finally, we use `jags.samples()` to draw 1000 samples from the sampler for the values of the named variables `mu` and `tau`.

When you call `jags.samples()`, you'll see that the output provides estimated values for `mu` and `tau`. These should be close to 0 and 0.04 if JAGS is working properly, since those were the mean and precision values we used to create our simulation data. (At the risk of being pedantic: we used a standard deviation of 5, which gives a variance of 25 and a precision of 1 / 25 = 0.04.) Of course, they'll be even closer to the sample mean `mean(x)` and the sample precision `1 / var(x)`, so don't forget to compare the inferred values to those as well. The sample size, 1,000, isn't large enough to guarantee that the sample mean will be all that close to 0.

### Example 2: Basic Linear Regression

Moving on to a slightly more interesting example, we can perform a simple linear regression in JAGS very easily. As before, we set up simulation data from a theoretical linear model:

```r
N <- 1000
x <- 1:N
epsilon <- rnorm(N, 0, 1)
y <- x + epsilon

write.table(data.frame(X = x, Y = y, Epsilon = epsilon),
            file = 'example2.data',
            row.names = FALSE,
            col.names = TRUE)
```

We then set up the Bayesian model for our regression in `example2.bug`:

```
model {
  for (i in 1:N){
    y[i] ~ dnorm(y.hat[i], tau)
    y.hat[i] <- a + b * x[i]
  }
  a ~ dnorm(0, .0001)
  b ~ dnorm(0, .0001)
  tau <- pow(sigma, -2)
  sigma ~ dunif(0, 100)
}
```

Here, we've said that every data point is drawn from a normal distribution with mean `a + b * x[i]` and precision `tau`. We assign non-informative normal priors to `a` and `b` and a non-informative uniform prior to the standard deviation `sigma`, which is deterministically transformed into `tau`.
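To see what values the sampler should recover, it can help to compare against the ordinary least-squares estimates, which this model essentially reproduces under its diffuse priors. A quick standalone check of the closed-form estimates (in Python, just as an illustration):

```python
import random

random.seed(42)

# Simulate y = x + noise, mirroring the R code above.
N = 1000
x = list(range(1, N + 1))
y = [xi + random.gauss(0, 1) for xi in x]

# Closed-form simple linear regression estimates:
#   b = Cov(x, y) / Var(x), a = mean(y) - b * mean(x).
mean_x = sum(x) / N
mean_y = sum(y) / N
b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
    / sum((xi - mean_x) ** 2 for xi in x)
a = mean_y - b * mean_x

# The intercept should come out near 0 and the slope near 1.
print(a, b)
```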

Then, we run this model using exactly the same approach as we used earlier:

```r
library('rjags')

jags <- jags.model('example2.bug',
                   data = list('x' = x,
                               'y' = y,
                               'N' = N),
                   n.chains = 4,
                   n.adapt = 100)

update(jags, 1000)

jags.samples(jags,
             c('a', 'b'),
             1000)
```

After running the chain for a good number of samples, we draw inferences for `a` and `b`, which should be close to the proper values of 0 and 1. I've ignored `tau` here, though there's no reason not to check that it was properly inferred.

### Example 3: One Dimensional Logistic Regression

Finally, it's good to see a model that would be hard to implement without a good deal of knowledge of optimization tools unless you use a sampling technique like the one JAGS automates. For that purpose, I'll show how to implement logistic regression. Here we set up a simple one-dimensional predictor for our binary outcome variable and assume the standard logistic model:

```r
N <- 1000
x <- 1:N
z <- 0.01 * x - 5
y <- sapply(1 / (1 + exp(-z)), function(p) {rbinom(1, 1, p)})

write.table(data.frame(X = x, Z = z, Y = y),
            file = 'example3.data',
            row.names = FALSE,
            col.names = TRUE)
```
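To get a feel for the success probabilities this simulation generates: with z = 0.01 * x - 5, the inverse logit crosses 0.5 at x = 500 and spans nearly the whole unit interval over x = 1 to 1000. A quick arithmetic check (in Python, purely for illustration):

```python
import math

def inv_logit(z):
    """Map a linear predictor to a probability via the logistic function."""
    return 1 / (1 + math.exp(-z))

# z = 0.01 * x - 5 is zero at x = 500, so the success probability is
# exactly 0.5 there, rising from about 0.007 at x = 1 to 0.993 at x = 1000.
print(inv_logit(0.01 * 500 - 5))             # 0.5
print(round(inv_logit(0.01 * 1 - 5), 3))     # 0.007
print(round(inv_logit(0.01 * 1000 - 5), 3))  # 0.993
```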

Then we set up our Bayesian model in `example3.bug`, where `y[i]` is Bernoulli distributed (or binomially distributed with 1 draw, if you prefer that sort of thing) and the linear model coefficients `a` and `b` are given non-informative normal priors:

```
model {
  for (i in 1:N){
    y[i] ~ dbern(p[i])
    p[i] <- 1 / (1 + exp(-z[i]))
    z[i] <- a + b * x[i]
  }
  a ~ dnorm(0, .0001)
  b ~ dnorm(0, .0001)
}
```

Finally, we perform our standard inference calls in R to run the model through JAGS and extract predicted values for `a` and `b`:

```r
library('rjags')

jags <- jags.model('example3.bug',
                   data = list('x' = x,
                               'y' = y,
                               'N' = N),
                   n.chains = 4,
                   n.adapt = 100)

update(jags, 1000)

jags.samples(jags,
             c('a', 'b'),
             1000)
```

As always, you should check that the outputs you get make sense: here, you expect `a` to be approximately -5 and `b` to be around 0.01. This inference problem should take a good bit longer to solve: there are other tools for handling logistic regressions in JAGS that are faster, but I find this approach conceptually simplest and best for highlighting the similarity to a standard linear regression.

### Some Errors I Made at the Start

Here are a few errors that stumped me for a bit as I got started using JAGS today:

**Error 1, 'Invalid parent error'**: I got this error when I erroneously assigned a normal prior to one of my precision variables, `tau`. This is nonsensical, since precisions are always positive. Hence, the parent node involving `tau` was deemed to be 'invalid', causing a fatal run-time error.

**Error 2, 'Attempt to redefine node z[1]'**: This is an error that Gelman and Hill warn users about in their ARM book: you must be sure that you don't treat the values in loops as local variables, because they cannot be reset on each iteration -- they must have unique values across all iterations. Thus, you must build an array for all of the variables that you might think of as local within loops, such as intermediate latent variables. Not doing so will produce fatal run-time errors.

**Error 3, 'Invalid vector argument to exp'**: This is related to the error above: when I corrected part of my attempt to reset `z` on each pass through the logistic loop in Example 3, I forgot to reset it in the definition of `p[i]`. This led to an invalid vector with no definition being passed to the `exp()` function, giving a fatal run-time error.

### One Question I Have

My first attempt to run a linear regression didn't work. I still don't entirely understand why, but here is the alternative code that failed, in case you ever make the same type of mistake and find yourself puzzled:

```
model {
  for (i in 1:N){
    y[i] <- a + b * x[i] + epsilon[i]
    epsilon[i] ~ dnorm(0, tau)
  }
  a ~ dnorm(0, .0001)
  b ~ dnorm(0, .0001)
  tau <- pow(sigma, -2)
  sigma ~ dunif(0, 100)
}
```

My assumption is that it's more difficult to infer values for all the `epsilon`'s at the same time as `tau`, which makes this harder than the earlier model without any explicit `epsilon` values. If that's wrong, please do correct me. Another hypothesis I entertained is that it's a problem to ever set `y[i]` to be a deterministic node, though this doesn't seem really plausible to me.

This is a really nice blog post! I like the way you introduce different models. May I suggest that you add a short description of how to do some (simple) convergence diagnostics via coda[1] (or a similar package).

[1] I have written a similar (and simpler) post about JAGS + rjags (unfortunately, it’s in German …) which also includes some convergence diagnostics.

Thanks, Bernd. I was planning to write a post today about convergence diagnostics using coda. I didn’t include it in yesterday’s post since I hadn’t started using coda yet when I started writing. Your example code for coda is really helpful.

I think I know why your first attempt at linear regression (the code in “One Question I Have”) didn’t work. It does not use the data ‘y’ when the model is coded that way. You can try removing ‘y’ = y from the ‘data’ argument of the jags.model() function and it will still work! That means that y[i] is a deterministic node.

That’s really helpful, gg. Is it a generic feature of JAGS that a data node must not appear in an assignment statement, but only as part of a distributional definition? That would help prevent a lot of bugs if I knew that for sure.

Hi John Myles White,

I think that is so, from the examples I have seen. But I am not very sure about that, as I have only started to learn jags this week. Have you tested my claim that the data ‘y’ is not used by jags? I cannot be very sure of that either!

I’ve checked your claim, gg, and it’s definitely correct. You can drop the y variable and the code will run, simply with no constraints to cause the sampler to find proper values.

I’ve also played around with simple variations of the standard OLS problem, including having y and z vectors as outputs and using a non-normal error distribution. For all of those simple examples, it’s clear that any data input to JAGS must be given a distribution rather than a deterministic assignment, even if the assignment incorporates random variables into its definition.

This is really useful to know. Thanks so much for pointing this out.

This was very useful in learning the basics of rjags.

But I discovered something that people who use Macs (maybe others, I do not know) need to know.

I created the file ‘example1.bug’ in the R console text editor (Create new document…) Saved it with that name.

But R could not find the document. Was it in the right directory? Yes. Could any R function find it? No.

Then I found out (did an “info” on the file) that the actual file name wasn’t ‘example1.bug’, but was ‘example1.bug.R’

The extension was there, but MacOS did not display it.

Usually, MacOS doesn’t care about extensions. But in this case it does. This may be a bug in R.

Bill

hey!!, Useful post.

I am also new to rjags and did same mistake Bill posted up there. However, i still cant find the correct way to create a .bug file on a mac. Any ideas?

(Because this is probably the most stupid question you have ever been asked, i do feel the need to tell you that i am being serious)

cheers!

Pedro

Hi Pedro,

There are a couple of options. The simplest, I think, is to use Terminal and change the name of the file that way. Terminal never hides extensions for files, so it should be really easy to type something like

`mv example.bug.R example.bug`

Another option is to configure Finder to show file extensions. In Finder, click on Finder -> Preferences, then select the Advanced tab and enable the “Show all filename extensions” option.

Finally, the last option is to just change your code to look for example.bug.R instead of example.bug.

Another option that simplifies things is to put the BUGS model in a string, in your R code & then just read it w/ textConnection(), which treats the string like a file, e.g.,

```r
point.model <- "
model
{
  phi ~ dunif(0,1);
  y ~ dbin(phi, N);
}
"

m <- jags.model(textConnection(point.model), result)
```

Awesome!!

Thanks

Thanks for this great tutorial. To confirm your hypothesis regarding deterministic data nodes:

“It is an error to supply a data value for a deterministic node. (See, however, section 8.0.3 on observable functions).”

This is from page 8 of the JAGS user manual.

Nice tutorial, but I’m having trouble with rjags in Windows 7. R crashes when I try to run the line:

```r
jags <- jags.model('example1.bug',
                   data = list('x' = x, 'N' = N),
                   n.chains = 4,
                   n.adapt = 100)
```

I'm using a 64 bit Windows 7 version. I have already updated to the last R version, but it failed in the same way.

I tried run this tutorial in a Windows XP machine and it worked well.

Can you help me?

Hey John,

Nice post. Quick question. I am just now getting into some bayesian modeling of simple linear regressions but I have data that i want to use to inform my model. My question is this: I understand how to inform the alpha and beta, that makes clear sense to me, but when do you/how do you know to place informed priors on tau? I guess I don’t really understand what tau is doing.

Any comments would be great.

thanks,

Jake

@Filipe, I unfortunately don’t have any experience getting JAGS working on a Windows machine. Perhaps move over to WinBUGS?

@Jake, tau (really sigma) is controlling the variance of the linear model. I think of this as a quantity you very much want to use a wide prior for so that the data is in full control, but I don’t have a strong intuition about what will happen if you set it to a lower value through the prior. Try giving sigma a uniform(0, 0.01) prior and see what happens. I’d be interested to know.

Thks a lot for your tutorial. I would like to use rjags to estimate an ordered probit but I find difficulties to implement the model. Do you know available examples? thks a lot in advance

Sandrine, I think there are examples of an ordered probit model in Gelman and Hill’s book. If not probit, there is definitely a logit model example.

Hey John,

Thanks for the post. I am trying to run some bayesian regressions using informed priors in a big loop. I have all the code written and the individual regressions work smoothly, but I can’t figure out how to insert the prior information into the “model” that is saved as a text (I am generating the informed priors within the loop and then naming them and trying to insert them into the model code).

For example in the model below I have named my prior values “Aprior” and “Bprior”. Every time I use non-numeric terms (see below), it doesn’t work even if these vectors are named before and/or inside the code itself.

It would save me days of work (though I am running them all by hand now) if I could figure this out, so any ideas would be great.

thanks a lot,

Jake

```r
Aprior <- .3
Bprior <- .5
```

(and I have done this using things like `as.numeric(value)` as well)

```
model {
  for (i in 1:N){
    y[i] <- a + b * x[i] + epsilon[i]
    epsilon[i] ~ dnorm(0, tau)
  }
  a ~ dnorm(Aprior, .0001)
  b ~ dnorm(Bprior, .0001)
  tau <- pow(sigma, -2)
  sigma ~ dunif(0, 100)
}
```

I’m wondering whether there’s a straightforward extension of these examples to performing this inference hierarchically, so that estimates for individual subjects are constrained by the mu and sigma as estimated across subjects.

This was an excellent tutorial, btw. Thanks a lot.

There are fairly easy extensions of these models to the hierarchical case. You can find code for them on GitHub at https://github.com/johnmyleswhite/JAGSExamples

There are a lot of other models there to wade through, but the code in bugs/hierarchical should get you started.

Excellent. Between Gelman and Hill, Kruschke, and now you… the application of Bayesian inference is starting to make sense!!

Glad to hear it, Matthew!

There aren’t too many sites that have help for the logistic regression in jags, so I thought I would post my problem here and see if anyone can help. I’m trying to do a very simple regression based on oring failure (as a function of temperature) data from Robert and Casella (MCSM) chapter 1. I’m using rjags and RStudio as the front-end, which crashes every time the following is run (I’ve tried multiple variants of the model part and all seem to cause an exit of the program):

```r
# THE MODEL.
modelstring = "
model {
  for (i in 1:N) {
    y[i] ~ dbern(mu)
    logit(mu) <- alpha + beta * x[i]
  }
  alpha ~ dnorm(0, 0.0001)
  beta ~ dnorm(0, 0.0001)
}
" # close quote for modelstring
writeLines(modelstring, con = "oring.bug")

#------------------------------------------------------------------------------
# THE DATA.
x = c(53,57,58,63,66,67,67,67,68,69,70,70,70,70,72,73,75,75,76,76,78,79,81)
y = c(1,1,1,1,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,0,0)
N = length(x)

#------------------------------------------------------------------------------
# RUN
library('rjags')
jags <- jags.model('oring.bug',
                   data = list('x' = x,
                               'y' = y,
                               'N' = N),
                   n.chains = 4,
                   n.adapt = 500)
update(jags, n.iter = 1000, progress.bar = "text")
oring.out = coda.samples(jags,
                         c("alpha", "beta"),
                         n.iter = 1000)
summary(oring.out)
```

What is the actual model printed to the data file? Generating that file in R seems like overkill and a likely source of corruption.

I’m not sure which file you’re referring to (‘oring.bug’?). But I have run a similar linear regression using the same setup, and it runs fine. Does the model itself look ok?

One thing is clearly wrong: mu needs to be mu[i].

Ok, that makes it work. Now I have to remind myself why I need that. Thanks!

EvanZ – as a general tip, try using R instead of RStudio as your front-end when you are debugging your model – it won’t crash if there’s an error in the model, and will frequently give you an error message that will help you determine where the error in your model code is.

Hi John,

I am working on multidimensional model in RJags and I am wondering if you have any insight. I am getting error 2 that you discussed here when I try to run it.

Data file looks like this:

```r
phat <- c(0.52, 0.40,
          0.50, 0.45,
          0.48, 0.46,
          0.49, 0.46,
          0.44, 0.47,
          0.43, 0.46,
          0.49, 0.44,
          0.41, 0.48,
          0.41, 0.47,
          0.4, 0.51,
          0.38, 0.52)
phat <- matrix(phat, nrow = 11, ncol = 2)

n <- c(500, 100, 2000, 500,
       440, 550, 500, 50,
       1000, 1200, 2000)
```

The actual model is:

```
model
{
  for (i in 1:11)
  {
    Y[i,1] <- logit(phat[i,1])
    Y[i,2] <- logit(phat[i,2])
  }
  for (i in 2:11)
  {
    V[1,1] <- 1 / (phat[i,1] * (1 - phat[i,1]))
    V[1,2] <- -1 / (1 - phat[i,1]) * (1 - phat[i,2])
    V[2,1] <- -1 / (1 - phat[i,1]) * (1 - phat[i,2])
    V[2,2] <- 1 / (phat[i,2] * (1 - phat[i,2]))
    Y[i,1:2] ~ dmnorm(theta[i,1:2], 1 / n[i] * V[,])
    theta[i,1:2] ~ dmnorm(theta[i-1,1:2], W[,])
  }
  theta[1,1:2] ~ dmnorm(theta0[], prec[,])
  W[1:2,1:2] ~ dwish(W0[,], 3)
  theta0[1] <- 0.55
  theta0[2] <- 0.40
  W0[1,1] <- 75
  W0[2,2] <- 75
  W0[1,2] <- 60
  W0[2,1] <- 60
}
```

I am having issues with the specification of V[,]. I am sure I need to specify something in the data file pertaining to V[,] but I tried multiple things and nothing works :-(.

Thanks in advance any help will be greatly appreciated.

Hi John,

Sorry to bother you.

Since you are familiar with RJags, I am seeking your help on modeling user-item ratings using RJags and BUGS. This problem is challenging because the model is complicated. Please see my question on Stack Overflow.

http://stackoverflow.com/questions/13907674/modeling-user-item-rating-using-rjags-and-bugs

Thank you.