<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>John Myles White: Die Sudelbücher</title>
	<atom:link href="http://www.johnmyleswhite.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.johnmyleswhite.com</link>
	<description>&#34;He who refuses to do arithmetic is doomed to talk nonsense.&#34;</description>
	<lastBuildDate>Wed, 01 Sep 2010 22:15:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>The Goddess Logical Rigor</title>
		<link>http://www.johnmyleswhite.com/notebook/2010/09/01/the-goddess-logical-rigor/</link>
		<comments>http://www.johnmyleswhite.com/notebook/2010/09/01/the-goddess-logical-rigor/#comments</comments>
		<pubDate>Wed, 01 Sep 2010 22:15:16 +0000</pubDate>
		<dc:creator>John Myles White</dc:creator>
				<category><![CDATA[Citations]]></category>

		<guid isPermaLink="false">http://www.johnmyleswhite.com/?p=4030</guid>
		<description><![CDATA[Initiates into the mysteries of the goddess Logical Rigor use a strange speech among themselves, and find it all but impossible to communicate their visions to the mass of ordinary, unilluminated mankind. This accounts in part for the fact that, of the three disciplines most devoted to that goddess, analytical philosophy and neo-classical economics have [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>
Initiates into the mysteries of the goddess Logical Rigor use a strange speech among themselves, and find it all but impossible to communicate their visions to the mass of ordinary, unilluminated mankind. This accounts in part for the fact that, of the three disciplines most devoted to that goddess, analytical philosophy and neo-classical economics have done next to nothing to shape thought and the culture at large, or even within the academy, while mathematics gave up all such pretensions long ago.<sup><a href="http://www.johnmyleswhite.com/notebook/2010/09/01/the-goddess-logical-rigor/#footnote_0_4030" id="identifier_0_4030" class="footnote-link footnote-identifier-link" title="From Cosma Shalizi&amp;#8217;s Review of Roemer&amp;#8217;s &amp;#8220;A Future for Socialism&amp;#8221;">1</a></sup>
</p></blockquote>
<p>I am not sure when it happened, but, at some point over the past ten years, I started to avoid arguing certain issues with people not familiar with rigorous mathematics, because I found it too upsetting to be reminded that many settled questions were still considered open to serious disagreement of opinion. Hopefully at some point I&#8217;ll find a better way to resolve these issues, because there is so much known only by the devotees of the goddess Logical Rigor that could benefit people, if only it were explained in a way they could understand. Sadly, I find it largely impossible to give someone more than a superficial intuition of how economic theory works without resorting to mathematics.</p>
<ol class="footnotes"><li id="footnote_0_4030" class="footnote">From <a href="http://cscs.umich.edu/~crshalizi/reviews/future-for-socialism/">Cosma Shalizi&#8217;s Review of Roemer&#8217;s &#8220;A Future for Socialism&#8221;</a></li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.johnmyleswhite.com/notebook/2010/09/01/the-goddess-logical-rigor/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MCMC Diagnostics in R with the coda Package</title>
		<link>http://www.johnmyleswhite.com/notebook/2010/08/29/mcmc-diagnostics-in-r-with-the-coda-package/</link>
		<comments>http://www.johnmyleswhite.com/notebook/2010/08/29/mcmc-diagnostics-in-r-with-the-coda-package/#comments</comments>
		<pubDate>Mon, 30 Aug 2010 01:27:00 +0000</pubDate>
		<dc:creator>John Myles White</dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.johnmyleswhite.com/?p=3998</guid>
		<description><![CDATA[This is a follow-up to my recent post introducing the use of JAGS in R through the rjags package. In the comments on that post, Bernd Weiss encouraged me to write a short addendum that describes diagnostic functions that you should use to assess the output from an MCMC sampler. I&#8217;ve only been using [...]]]></description>
			<content:encoded><![CDATA[<p>This is a follow-up to <a href="http://www.johnmyleswhite.com/notebook/2010/08/20/using-jags-in-r-with-the-rjags-package/">my recent post introducing the use of JAGS in R</a> through the <a href="http://cran.r-project.org/web/packages/rjags/index.html">rjags package</a>. In the comments on that post, <a href="http://blog.berndweiss.net/">Bernd Weiss</a> encouraged me to write a short addendum that describes diagnostic functions that you should use to assess the output from an MCMC sampler.</p>
<p>I&#8217;ve only been using these diagnostics for a week now for an academic project of my own, so I&#8217;ll summarize my understanding of their use as it stands today. Please correct me if I&#8217;m spreading misinformation.</p>
<p>As I see it, all diagnostics used to analyze the output of an MCMC sampler try to answer a simple question: <i>has the sampler been given a sufficient adaptation (&#8220;burn-in&#8221;) period to justify your claim that the samples you draw from the chains approximate the posterior distribution of interest to you?</i> Strictly speaking, JAGS uses the adaptive phase to tune its samplers, which is not quite the same thing as burn-in, but the samples drawn during it are discarded in the same way, so I treat the two together here. This question in turn leads one to analyze the samples drawn after the burn-in period for obvious warning signs. To test for these warning signs, it&#8217;s easiest to use the diagnostic functions that are included in the coda package.</p>
<p>To make the use of coda clear, I&#8217;m going to follow up on the linear regression example that I used in my last post. First, we generate some data from a linear model:</p>

<div class="wp_codebox"><table><tr id="p399810"><td class="code" id="p3998code10"><pre class="c" style="font-family:monospace;">library<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'rjags'</span><span style="color: #009900;">&#41;</span>
&nbsp;
N <span style="color: #339933;">&lt;-</span> <span style="color: #0000dd;">1000</span>
x <span style="color: #339933;">&lt;-</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">:</span>N
epsilon <span style="color: #339933;">&lt;-</span> rnorm<span style="color: #009900;">&#40;</span>N<span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span>
y <span style="color: #339933;">&lt;-</span> x <span style="color: #339933;">+</span> epsilon</pre></td></tr></table></div>
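<p>As a quick sanity check before handing these data to JAGS, here is a minimal sketch (not part of the JAGS workflow itself) that fits the same line by ordinary least squares. The <code>set.seed()</code> call and the name <code>fit</code> are assumptions made only for reproducibility and illustration. Since the data are generated with an intercept of 0, a slope of 1 and a noise standard deviation of 1, the estimates should land close to those values, which gives a rough idea of where the posteriors for <code>a</code> and <code>b</code> ought to concentrate.</p>

```r
# Assumed seed, used only so that the sketch is reproducible.
set.seed(1)

# Same data-generating process as above: y = 0 + 1 * x + noise.
N <- 1000
x <- 1:N
epsilon <- rnorm(N, 0, 1)
y <- x + epsilon

# Ordinary least-squares fit as a reference point for the Bayesian model.
fit <- lm(y ~ x)

coef(fit)            # intercept should be near 0, slope near 1
summary(fit)$sigma   # residual standard deviation should be near 1
```

<p>If the MCMC output later disagrees badly with these reference values, that is a hint to look harder at the sampler rather than at the data.</p>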

<p>Then we set up our BUGS/JAGS model in <code>example.bug</code>:</p>

<div class="wp_codebox"><table><tr id="p399811"><td class="code" id="p3998code11"><pre class="c" style="font-family:monospace;">model
<span style="color: #009900;">&#123;</span>
	<span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span>i in <span style="color: #0000dd;">1</span><span style="color: #339933;">:</span>N<span style="color: #009900;">&#41;</span>
	<span style="color: #009900;">&#123;</span>
		y<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> ~ dnorm<span style="color: #009900;">&#40;</span>y.<span style="color: #202020;">hat</span><span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span> tau<span style="color: #009900;">&#41;</span>
		y.<span style="color: #202020;">hat</span><span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">&lt;-</span> a <span style="color: #339933;">+</span> b <span style="color: #339933;">*</span> x<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span>
	<span style="color: #009900;">&#125;</span>
	a ~ dnorm<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color:#800080;">.0001</span><span style="color: #009900;">&#41;</span>
	b ~ dnorm<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color:#800080;">.0001</span><span style="color: #009900;">&#41;</span>
	tau <span style="color: #339933;">&lt;-</span> pow<span style="color: #009900;">&#40;</span>sigma<span style="color: #339933;">,</span> <span style="color: #339933;">-</span><span style="color: #0000dd;">2</span><span style="color: #009900;">&#41;</span>
	sigma ~ dunif<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">100</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>All of this is copied exactly from my earlier post. Now we change things by replacing our call to <code>jags.samples()</code> with a call to <code>coda.samples()</code>. This will provide output from JAGS in the format necessary for using the two diagnostic functions I understand best: <code>plot()</code> and <code>gelman.plot()</code>.</p>

<div class="wp_codebox"><table><tr id="p399812"><td class="code" id="p3998code12"><pre class="c" style="font-family:monospace;">jags <span style="color: #339933;">&lt;-</span> jags.<span style="color: #202020;">model</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'example.bug'</span><span style="color: #339933;">,</span>
                   data <span style="color: #339933;">=</span> list<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'x'</span> <span style="color: #339933;">=</span> x<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'y'</span> <span style="color: #339933;">=</span> y<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'N'</span> <span style="color: #339933;">=</span> N<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">chains</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">adapt</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">10</span><span style="color: #009900;">&#41;</span>
samples <span style="color: #339933;">&lt;-</span> coda.<span style="color: #202020;">samples</span><span style="color: #009900;">&#40;</span>jags<span style="color: #339933;">,</span>
                        c<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'a'</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">'b'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                        <span style="color: #0000dd;">1000</span><span style="color: #009900;">&#41;</span>
png<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'plot_1.png'</span><span style="color: #009900;">&#41;</span>
plot<span style="color: #009900;">&#40;</span>samples<span style="color: #009900;">&#41;</span>
dev.<span style="color: #202020;">off</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<p>Because we have used such a small number of adaptive samples (only 10), our call to <code>jags.model()</code> will produce this warning message:</p>

<div class="wp_codebox"><table><tr id="p399813"><td class="code" id="p3998code13"><pre class="c" style="font-family:monospace;"><span style="color: #339933;">#Warning message:</span>
<span style="color: #339933;">#In adapt(model, n.adapt) :</span>
<span style="color: #339933;">#  Adaptation incomplete. Recreate the model with a longer adaptive phase.</span></pre></td></tr></table></div>

<p>Thankfully, this message is not the only evidence that we&#8217;ve used too few adaptive samples: you can also tell from the output of our call to <code>plot()</code>:</p>
<div style="text-align:center;"><img src="http://www.johnmyleswhite.com/notebook/wp-content/uploads/2010/08/plot_1.png" alt="plot_1.png" border="0" width="480" height="480" /></div>
<p>The density plots for <code>a</code> and <code>b</code> in the right column are very suspicious: they are extremely pointed and seem to include a large number of outlier values that don&#8217;t belong if the values for <code>a</code> and <code>b</code> are normally distributed. You can see these extreme values in the trace plots in the left column as well: they sit at the start of the trace for each chain.</p>
<p>Simply using 1,000 adaptive samples instead of 10 makes a world of difference:</p>

<div class="wp_codebox"><table><tr id="p399814"><td class="code" id="p3998code14"><pre class="c" style="font-family:monospace;">jags <span style="color: #339933;">&lt;-</span> jags.<span style="color: #202020;">model</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'example.bug'</span><span style="color: #339933;">,</span>
                   data <span style="color: #339933;">=</span> list<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'x'</span> <span style="color: #339933;">=</span> x<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'y'</span> <span style="color: #339933;">=</span> y<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'N'</span> <span style="color: #339933;">=</span> N<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">chains</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">adapt</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">1000</span><span style="color: #009900;">&#41;</span>
samples <span style="color: #339933;">&lt;-</span> coda.<span style="color: #202020;">samples</span><span style="color: #009900;">&#40;</span>jags<span style="color: #339933;">,</span>
                        c<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'a'</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">'b'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                        <span style="color: #0000dd;">1000</span><span style="color: #009900;">&#41;</span>
png<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'plot_2.png'</span><span style="color: #009900;">&#41;</span>
plot<span style="color: #009900;">&#40;</span>samples<span style="color: #009900;">&#41;</span>
dev.<span style="color: #202020;">off</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<div style="text-align:center;"><img src="http://www.johnmyleswhite.com/notebook/wp-content/uploads/2010/08/plot_2.png" alt="plot_2.png" border="0" width="480" height="480" /></div>
<p>In this image, you can see that the trajectories of the chains are consistent over time and that their distributions look appropriately normal. So the first takeaway message is simple: check the traces and distributions of your variables using <code>plot()</code> to make sure that they are reasonable and don&#8217;t indicate clear deficiencies in the length of your adaptation period. When you know the distributional form of your posteriors, this is particularly effective, as in our example here, where we know to expect normal distributions.</p>
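<p>The same check can be made numerically. Below is a rough sketch using <code>window()</code> and <code>summary()</code> from coda. To keep it runnable without JAGS, the chains are simulated stand-ins rather than real sampler output: the helper <code>make_chain()</code>, the seed, the 20 wild draws at the start of each chain and the cutoff at iteration 101 are all assumptions chosen for illustration.</p>

```r
library('coda')

# Assumed seed, used only so that the sketch is reproducible.
set.seed(1)
n_iter <- 1000

# Hypothetical stand-in for JAGS output: the first 20 draws of each chain are
# wildly dispersed, the remaining draws come from a tight, common target.
make_chain <- function() {
  a <- c(rnorm(20, 0, 50), rnorm(n_iter - 20, 0, 0.1))
  b <- c(rnorm(20, 1, 50), rnorm(n_iter - 20, 1, 0.1))
  mcmc(cbind(a = a, b = b))
}
samples <- do.call(mcmc.list, replicate(4, make_chain(), simplify = FALSE))

# Posterior standard deviations using every draw, including the early ones.
sd_all <- summary(samples)$statistics[, 'SD']

# Drop the first 100 iterations of every chain, then summarize again.
kept <- window(samples, start = 101)
sd_kept <- summary(kept)$statistics[, 'SD']

sd_all
sd_kept
```

<p>When a handful of early draws inflates the spread this much, the pointed densities with long tails in the first plot are exactly what you would expect to see.</p>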
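<p>Another diagnostic tool is the Gelman plot, which has a simple logic to it: you compare the variation between the chains you&#8217;re using with the variation within each chain as a way to detect whether they&#8217;ve all reached the target distribution. The plot tracks this comparison as a &#8220;shrink factor&#8221; over the course of sampling; values close to 1 mean the chains have become hard to tell apart, which is what you expect once they are mixing well. To start, it&#8217;s easier to simply repeat our examples above using the Gelman plot:</p>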

<div class="wp_codebox"><table><tr id="p399815"><td class="code" id="p3998code15"><pre class="c" style="font-family:monospace;">jags <span style="color: #339933;">&lt;-</span> jags.<span style="color: #202020;">model</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'example.bug'</span><span style="color: #339933;">,</span>
                   data <span style="color: #339933;">=</span> list<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'x'</span> <span style="color: #339933;">=</span> x<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'y'</span> <span style="color: #339933;">=</span> y<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'N'</span> <span style="color: #339933;">=</span> N<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">chains</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">adapt</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">10</span><span style="color: #009900;">&#41;</span>
samples <span style="color: #339933;">&lt;-</span> coda.<span style="color: #202020;">samples</span><span style="color: #009900;">&#40;</span>jags<span style="color: #339933;">,</span>
                        c<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'a'</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">'b'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                        <span style="color: #0000dd;">1000</span><span style="color: #009900;">&#41;</span>
png<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'plot_3.png'</span><span style="color: #009900;">&#41;</span>
gelman.<span style="color: #202020;">plot</span><span style="color: #009900;">&#40;</span>samples<span style="color: #009900;">&#41;</span>
dev.<span style="color: #202020;">off</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<div style="text-align:center;"><img src="http://www.johnmyleswhite.com/notebook/wp-content/uploads/2010/08/plot_3.png" alt="plot_3.png" border="0" width="480" height="480" /></div>
<p>Then you can see how the plot changes when you switch to a longer adaptive period:</p>

<div class="wp_codebox"><table><tr id="p399816"><td class="code" id="p3998code16"><pre class="c" style="font-family:monospace;">jags <span style="color: #339933;">&lt;-</span> jags.<span style="color: #202020;">model</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'example.bug'</span><span style="color: #339933;">,</span>
                   data <span style="color: #339933;">=</span> list<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'x'</span> <span style="color: #339933;">=</span> x<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'y'</span> <span style="color: #339933;">=</span> y<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'N'</span> <span style="color: #339933;">=</span> N<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">chains</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">adapt</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">1000</span><span style="color: #009900;">&#41;</span>
samples <span style="color: #339933;">&lt;-</span> coda.<span style="color: #202020;">samples</span><span style="color: #009900;">&#40;</span>jags<span style="color: #339933;">,</span>
                        c<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'a'</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">'b'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                        <span style="color: #0000dd;">1000</span><span style="color: #009900;">&#41;</span>
png<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'plot_4.png'</span><span style="color: #009900;">&#41;</span>
gelman.<span style="color: #202020;">plot</span><span style="color: #009900;">&#40;</span>samples<span style="color: #009900;">&#41;</span>
dev.<span style="color: #202020;">off</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<div style="text-align:center;"><img src="http://www.johnmyleswhite.com/notebook/wp-content/uploads/2010/08/plot_4.png" alt="plot_4.png" border="0" width="480" height="480" /></div>
<p>Unfortunately, given our current call to <code>jags.model()</code>, it&#8217;s quite hard to identify convergence visually using Gelman plots, since the scales of these plots are not identical across our two examples, and the most prominent visual patterns are likely to be the results of random noise. There is a reason for this difficulty: we&#8217;re not properly initializing our sampler&#8217;s starting values separately for each chain. All four chains start from identical positions, which means that we don&#8217;t have enough power to really see the size of the space a theoretical chain might pass through before settling down. To fix that, we change our call to <code>jags.model()</code> to include an <code>inits</code> value, for which we provide an anonymous function that returns random values consistent with the priors we specified in <code>example.bug</code>: JAGS parameterizes <code>dnorm()</code> by precision, so a precision of .0001 corresponds to a standard deviation of 100 in R&#8217;s <code>rnorm()</code>. First, let&#8217;s repeat our previous approach again:</p>

<div class="wp_codebox"><table><tr id="p399817"><td class="code" id="p3998code17"><pre class="c" style="font-family:monospace;">jags <span style="color: #339933;">&lt;-</span> jags.<span style="color: #202020;">model</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'example.bug'</span><span style="color: #339933;">,</span>
                   data <span style="color: #339933;">=</span> list<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'x'</span> <span style="color: #339933;">=</span> x<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'y'</span> <span style="color: #339933;">=</span> y<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'N'</span> <span style="color: #339933;">=</span> N<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">chains</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">adapt</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">10</span><span style="color: #009900;">&#41;</span>
samples <span style="color: #339933;">&lt;-</span> coda.<span style="color: #202020;">samples</span><span style="color: #009900;">&#40;</span>jags<span style="color: #339933;">,</span>
                        c<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'a'</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">'b'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                        <span style="color: #0000dd;">1000</span><span style="color: #009900;">&#41;</span>
png<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'plot_5.png'</span><span style="color: #009900;">&#41;</span>
gelman.<span style="color: #202020;">plot</span><span style="color: #009900;">&#40;</span>samples<span style="color: #009900;">&#41;</span>
dev.<span style="color: #202020;">off</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<div style="text-align:center;"><img src="http://www.johnmyleswhite.com/notebook/wp-content/uploads/2010/08/plot_51.png" alt="plot_5.png" border="0" width="480" height="480" /></div>
<p>Now let&#8217;s add proper random initialization for the starting position of each chain:</p>

<div class="wp_codebox"><table><tr id="p399818"><td class="code" id="p3998code18"><pre class="c" style="font-family:monospace;">jags <span style="color: #339933;">&lt;-</span> jags.<span style="color: #202020;">model</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'example.bug'</span><span style="color: #339933;">,</span>
                   data <span style="color: #339933;">=</span> list<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'x'</span> <span style="color: #339933;">=</span> x<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'y'</span> <span style="color: #339933;">=</span> y<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'N'</span> <span style="color: #339933;">=</span> N<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                   inits <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">function</span> <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
                   <span style="color: #009900;">&#123;</span>
                     list<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'a'</span> <span style="color: #339933;">=</span> rnorm<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">100</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                          <span style="color: #ff0000;">'b'</span> <span style="color: #339933;">=</span> rnorm<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">100</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
                   <span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">chains</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">adapt</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">10</span><span style="color: #009900;">&#41;</span>
samples <span style="color: #339933;">&lt;-</span> coda.<span style="color: #202020;">samples</span><span style="color: #009900;">&#40;</span>jags<span style="color: #339933;">,</span>
                        c<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'a'</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">'b'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                        <span style="color: #0000dd;">1000</span><span style="color: #009900;">&#41;</span>
png<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'plot_6.png'</span><span style="color: #009900;">&#41;</span>
gelman.<span style="color: #202020;">plot</span><span style="color: #009900;">&#40;</span>samples<span style="color: #009900;">&#41;</span>
dev.<span style="color: #202020;">off</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<div style="text-align:center;"><img src="http://www.johnmyleswhite.com/notebook/wp-content/uploads/2010/08/plot_61.png" alt="plot_6.png" border="0" width="480" height="480" /></div>
<p>Now you should be able to easily see that the chains are converging near the end of the sampling period, and that we would do well to give ourselves a greater adaptation period before using any of the samples we&#8217;ve generated, since the early samples in the chain are far too dispersed.</p>
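<p>To make the logic behind the Gelman plot concrete, here is a minimal base R sketch of the basic shrink factor: the square root of a pooled variance estimate divided by the average within-chain variance. Everything in it is an assumption for illustration: <code>shrink_factor()</code> is a hypothetical helper, the chains are simulated with dispersed starting values rather than drawn by JAGS, and the calculation leaves out the degrees-of-freedom correction that, as far as I understand, coda applies, so don&#8217;t expect its numbers to match <code>gelman.diag()</code> exactly.</p>

```r
# Hypothetical helper: basic potential scale reduction ("shrink") factor for a
# single parameter. `chains` is a matrix with one row per iteration and one
# column per chain.
shrink_factor <- function(chains) {
  n <- nrow(chains)
  W <- mean(apply(chains, 2, var))        # average within-chain variance
  B <- n * var(colMeans(chains))          # between-chain variance
  var_hat <- (n - 1) / n * W + B / n      # pooled estimate of the variance
  sqrt(var_hat / W)
}

# Assumed seed, used only so that the sketch is reproducible.
set.seed(1)
n_iter <- 1000
starts <- c(-150, -50, 50, 150)           # dispersed starting values, one per chain

# Simulated chains: each starts far from 0 and decays towards a common
# N(0, 1) target, mimicking a sampler that needs time to settle down.
chains <- sapply(starts, function(s) {
  s * 0.95^(seq_len(n_iter) - 1) + rnorm(n_iter)
})

early <- shrink_factor(chains[1:100, ])     # while the chains still disagree
late <- shrink_factor(chains[501:1000, ])   # after they have settled down

early
late
```

<p>In this simulation the value for the first 100 draws comes out well above 1, while the value for the last 500 sits very close to 1, which is the pattern the final Gelman plot shows as the chains converge.</p>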
<p>With these two techniques you should be able to diagnose obvious deficiencies in the length of your adaptive period.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johnmyleswhite.com/notebook/2010/08/29/mcmc-diagnostics-in-r-with-the-coda-package/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Blegging for Data</title>
		<link>http://www.johnmyleswhite.com/notebook/2010/08/28/blegging-for-data/</link>
		<comments>http://www.johnmyleswhite.com/notebook/2010/08/28/blegging-for-data/#comments</comments>
		<pubDate>Sun, 29 Aug 2010 00:35:33 +0000</pubDate>
		<dc:creator>John Myles White</dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.johnmyleswhite.com/?p=3990</guid>
		<description><![CDATA[I&#8217;m in the middle of a new project that involves analyzing the packages that are currently on CRAN. As part of my work, I could really benefit from information about which packages are installed on people&#8217;s computers. If you&#8217;re willing to part with a bit of your time and privacy, I&#8217;d very much appreciate you [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m in the middle of a new project that involves analyzing the packages that are currently on CRAN. As part of my work, I could really benefit from information about which packages are installed on people&#8217;s computers. If you&#8217;re willing to part with a bit of your time and privacy, I&#8217;d very much appreciate you running the following script in R,</p>

<div class="wp_codebox"><table><tr id="p399020"><td class="code" id="p3990code20"><pre class="c" style="font-family:monospace;">package.<span style="color: #202020;">info</span> <span style="color: #339933;">&lt;-</span> installed.<span style="color: #202020;">packages</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#91;</span><span style="color: #339933;">,</span>c<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span><span style="color: #0000dd;">3</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#93;</span>
write.<span style="color: #202020;">csv</span><span style="color: #009900;">&#40;</span>package.<span style="color: #202020;">info</span><span style="color: #339933;">,</span>
          file <span style="color: #339933;">=</span> <span style="color: #ff0000;">'my_installed_packages.csv'</span><span style="color: #339933;">,</span>
          row.<span style="color: #202020;">names</span> <span style="color: #339933;">=</span> FALSE<span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<p>and sending me the output file <code>my_installed_packages.csv</code> by e-mail to jmw@johnmyleswhite.com.</p>
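<p>For anyone curious about what the script collects: columns 1 and 3 of the matrix returned by <code>installed.packages()</code> are the package name and the installed version. A minimal sketch of the same script that selects those columns by name instead of by position (the <code>drop = FALSE</code> argument and the <code>head()</code> preview are additions for illustration) might look like this:</p>

<div class="wp_codebox"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"># Select the same two columns by name rather than by position.
package.info &lt;- installed.packages()[, c('Package', 'Version'), drop = FALSE]

# Preview the first few rows before writing the file.
print(head(package.info, 3))

write.csv(package.info,
          file = 'my_installed_packages.csv',
          row.names = FALSE)</pre></td></tr></table></div>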
]]></content:encoded>
			<wfw:commentRss>http://www.johnmyleswhite.com/notebook/2010/08/28/blegging-for-data/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Providence at Work</title>
		<link>http://www.johnmyleswhite.com/notebook/2010/08/27/providence-at-work/</link>
		<comments>http://www.johnmyleswhite.com/notebook/2010/08/27/providence-at-work/#comments</comments>
		<pubDate>Fri, 27 Aug 2010 22:19:24 +0000</pubDate>
		<dc:creator>John Myles White</dc:creator>
				<category><![CDATA[Business]]></category>

		<guid isPermaLink="false">http://www.johnmyleswhite.com/?p=3984</guid>
		<description><![CDATA[In August 2008, I commented on the remarkable incompetence of Blockbuster&#8217;s CEO. Now, in August 2010, it seems that we&#8217;re about to watch Blockbuster announce bankruptcy. It&#8217;s extraordinary how just the world can be at times.]]></description>
			<content:encoded><![CDATA[<p>In August 2008, I commented on <a href="http://www.johnmyleswhite.com/notebook/2008/08/20/no-one-will-ever-want-a-personal-computer/">the remarkable incompetence of Blockbuster&#8217;s CEO.</a> Now, in August 2010, it seems that we&#8217;re about to <a href="http://www.engadget.com/2010/08/27/blockbuster-filing-for-bankruptcy-next-month-probably/">watch Blockbuster announce bankruptcy.</a></p>
<p>It&#8217;s extraordinary how just the world can be at times.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johnmyleswhite.com/notebook/2010/08/27/providence-at-work/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ProjectTemplate</title>
		<link>http://www.johnmyleswhite.com/notebook/2010/08/26/projecttemplate/</link>
		<comments>http://www.johnmyleswhite.com/notebook/2010/08/26/projecttemplate/#comments</comments>
		<pubDate>Thu, 26 Aug 2010 20:31:29 +0000</pubDate>
		<dc:creator>John Myles White</dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.johnmyleswhite.com/?p=3973</guid>
		<description><![CDATA[Introduction As many people already know, I&#8217;ve recently uploaded a new R package called ProjectTemplate to GitHub and CRAN. The ProjectTemplate package provides a function, create.project(), that automatically builds a directory for a new R project with a clean sub-directory structure and automatic data and library loading tools. My hope is that standardized data loading, [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>As many people already know, I&#8217;ve recently uploaded a new R package called ProjectTemplate to <a href="http://github.com/johnmyleswhite/ProjectTemplate">GitHub</a> and <a href="http://cran.r-project.org/web/packages/ProjectTemplate/index.html">CRAN</a>. The ProjectTemplate package provides a function, <code>create.project()</code>, that automatically builds a directory for a new R project with a clean sub-directory structure and automatic data and library loading tools. My hope is that standardized data loading, automatic importing of best practice packages, integrated unit testing and useful nudges towards keeping a cleanly organized codebase will improve the quality of R coding.</p>
<p>My inspiration for this approach comes from the <code>rails</code> command from Ruby on Rails, which initializes a new Rails project with the proper skeletal structure automatically. Also taken from Rails is ProjectTemplate&#8217;s approach of preferring convention over configuration: the automatic data and library loading as well as the automatic testing work out of the box because assumptions are made about the directory structure and naming conventions that will be used in your code. You can customize your codebase however you&#8217;d like, but you will have to edit the automation scripts to use your conventions instead of the defaults before you&#8217;ll get their benefits again.</p>
<p>In what follows, I try to highlight the state of the package as of today.</p>
<h3>Installing</h3>
<p>ProjectTemplate is available on CRAN and can be installed using a simple call to <code>install.packages()</code>:</p>

<div class="wp_codebox"><table><tr id="p397326"><td class="line_numbers"><pre>1
</pre></td><td class="code" id="p3973code26"><pre class="c" style="font-family:monospace;">install.<span style="color: #202020;">packages</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'ProjectTemplate'</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<p>If you would like access to changes that are not available in the <a href="http://cran.r-project.org/web/packages/ProjectTemplate/index.html">current version on CRAN</a>, please download the contents of the <a href="http://github.com/johnmyleswhite/ProjectTemplate">GitHub repository</a> and then run,</p>

<div class="wp_codebox"><table><tr id="p397327"><td class="line_numbers"><pre>1
2
</pre></td><td class="code" id="p3973code27"><pre class="sh" style="font-family:monospace;">R CMD build .
R CMD INSTALL ProjectTemplate_*.tar.gz</pre></td></tr></table></div>

<h3>Example Code</h3>
<p>To create a new project called <code>my-project</code>, open R and type:</p>

<div class="wp_codebox"><table><tr id="p397328"><td class="line_numbers"><pre>1
2
</pre></td><td class="code" id="p3973code28"><pre class="c" style="font-family:monospace;">library<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'ProjectTemplate'</span><span style="color: #009900;">&#41;</span>
create.<span style="color: #202020;">project</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'my-project'</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<p>To enter that project&#8217;s home directory and start working, type:</p>

<div class="wp_codebox"><table><tr id="p397329"><td class="line_numbers"><pre>1
2
</pre></td><td class="code" id="p3973code29"><pre class="c" style="font-family:monospace;">setwd<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'my-project'</span><span style="color: #009900;">&#41;</span>
load.<span style="color: #202020;">project</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<p>Once you have code worth testing, you can also type,</p>

<div class="wp_codebox"><table><tr id="p397330"><td class="line_numbers"><pre>1
</pre></td><td class="code" id="p3973code30"><pre class="c" style="font-family:monospace;">run.<span style="color: #202020;">tests</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<p>to automatically run all of the unit tests in your <code>tests</code> directory.</p>
<p>If you&#8217;re interested in these last two functions, you should know that <code>load.project()</code> is essentially shorthand for calling <code>source('lib/boot.R')</code>, which automatically loads all of your libraries and data sets. Similarly, <code>run.tests()</code> is essentially shorthand for calling <code>source('lib/run_tests.R')</code>, which automatically runs all of the &#8216;testthat&#8217; style unit tests contained in your <code>tests</code> directory.</p>
<h3>Overview</h3>
<p>As far as ProjectTemplate is concerned, a good project should look like the following:</p>
<ul>
<li>project/</li>
<ul>
<li>data/</li>
<li>diagnostics/</li>
<li>doc/</li>
<li>graphs/</li>
<li>lib/</li>
<ul>
<li>boot.R</li>
<li>load_data.R</li>
<li>load_libraries.R</li>
<li>preprocess_data.R</li>
<li>run_tests.R</li>
<li>utilities.R</li>
</ul>
<li>profiling/</li>
<li>reports/</li>
<li>tests/</li>
<li>README</li>
<li>TODO</li>
</ul>
</ul>
<p>To do work on such a project, enter the main directory, open R and type <code>source('lib/boot.R')</code>. This will then automatically perform the following actions:</p>
<ul>
<li><code>source('lib/load_libraries.R')</code>, which automatically loads the CRAN packages currently deemed best practices. At present, this list includes:
<ul>
<li>reshape</li>
<li>plyr</li>
<li>stringr</li>
<li>ggplot2</li>
<li>testthat</li>
</ul>
</li>
<li><code>source('lib/load_data.R')</code>, which automatically imports any CSV or TSV data files inside of the <code>data/</code> directory.</li>
<li><code>source('lib/preprocess_data.R')</code>, which allows you to make any run-time modifications to your data sets automatically. This is blank by default.</li>
</ul>
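<p>The sequence above can be sketched in a few lines of R. This is only an illustration of the order of operations, not the actual contents of <code>lib/boot.R</code>; the function name <code>boot.sketch()</code> and its <code>project.dir</code> argument are hypothetical.</p>

<div class="wp_codebox"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"># Hypothetical stand-in for lib/boot.R: source the three steps in order.
boot.sketch &lt;- function(project.dir = '.') {
  steps &lt;- c('load_libraries.R', 'load_data.R', 'preprocess_data.R')
  for (step in steps) {
    path &lt;- file.path(project.dir, 'lib', step)
    if (file.exists(path)) {
      source(path)  # evaluated in the global environment by default
    } else {
      warning('Skipping missing file: ', path)
    }
  }
  invisible(steps)
}</pre></td></tr></table></div>

<p>Calling <code>boot.sketch()</code> from the project&#8217;s main directory would then run whichever of the three scripts are present.</p>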
<h3>Default Project Layout</h3>
<p>Within your project directory, ProjectTemplate creates the following directories and files whose purpose is explained below:</p>
<ul>
<li><code>data/</code>: Store your raw data files here. If they are CSV or TSV files, they will automatically be loaded when you call <code>load.project()</code> or, equivalently, <code>source('lib/boot.R')</code>.</li>
<li><code>diagnostics/</code>: Store any scripts you use to diagnose your data sets for corruption or problematic data points. You should also put code that globally censors any data points here.</li>
<li><code>doc/</code>: Store documentation for your analysis here.</li>
<li><code>graphs/</code>: Store any graphs that you produce here.</li>
<li><code>lib/</code>: Store here any files that provide useful functionality for your work but do not constitute a statistical analysis per se.</li>
<li><code>lib/boot.R</code>: This script handles loading the other files in <code>lib/</code> automatically. Calling <code>load.project()</code> sources this file.</li>
<li><code>lib/load_data.R</code>: This script handles the automatic loading of any CSV and TSV files contained in <code>data/</code>.</li>
<li><code>lib/load_libraries.R</code>: This script handles the automatic loading of the best practice packages, which are reshape, plyr, stringr, ggplot2 and testthat.</li>
<li><code>lib/preprocess_data.R</code>: This script handles the preprocessing of your data, if you need to add columns at run-time or merge normalized data sets.</li>
<li><code>lib/run_tests.R</code>: This script automatically runs any test files contained in the <code>tests/</code> directory using the 'testthat' package. Calling <code>run.tests()</code> automatically runs this script.</li>
<li><code>lib/utilities.R</code>: This script should contain quick general purpose code that belongs in a package, but hasn't been packaged up yet.</li>
<li><code>profiling/</code>: Store any scripts you use to benchmark and time your code here.</li>
<li><code>reports/</code>: Store any output reports, such as HTML or LaTeX versions of tables, here. Sweave documents should also go here.</li>
<li><code>tests/</code>: Store any test cases in this directory. Your test files should use 'testthat' style tests.</li>
<li><code>README</code>: Write notes to help orient newcomers to your project.</li>
<li><code>TODO</code>: Write a list of future improvements and bug fixes you have planned.</li>
</ul>
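<p>To make the automatic data loading concrete, here is a rough sketch of what a loader for CSV and TSV files could look like. It is not the code shipped in <code>lib/load_data.R</code>; the function name <code>load.data.sketch()</code>, the choice to name each data frame after its file, and the assumption that every file has a header row are all hypothetical.</p>

<div class="wp_codebox"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"># Hypothetical loader: read every CSV or TSV file in data.dir into the
# global environment, naming each data frame after its file.
load.data.sketch &lt;- function(data.dir = 'data') {
  files &lt;- list.files(data.dir, pattern = '\\.(csv|tsv)$', ignore.case = TRUE)
  for (f in files) {
    # Comma-separated for .csv, tab-separated for .tsv.
    sep &lt;- if (grepl('\\.csv$', f, ignore.case = TRUE)) ',' else '\t'
    name &lt;- sub('\\.(csv|tsv)$', '', f, ignore.case = TRUE)
    assign(name,
           read.table(file.path(data.dir, f), header = TRUE, sep = sep),
           envir = globalenv())
  }
  invisible(files)
}</pre></td></tr></table></div>

<p>With a file <code>data/survey.csv</code> in place, calling <code>load.data.sketch()</code> would create a data frame called <code>survey</code>.</p>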
<h3>Request for Comments</h3>
<p>I would love to hear feedback about things that ProjectTemplate is missing or should do differently. Please leave any and all comments you have.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johnmyleswhite.com/notebook/2010/08/26/projecttemplate/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Using JAGS in R with the rjags Package</title>
		<link>http://www.johnmyleswhite.com/notebook/2010/08/20/using-jags-in-r-with-the-rjags-package/</link>
		<comments>http://www.johnmyleswhite.com/notebook/2010/08/20/using-jags-in-r-with-the-rjags-package/#comments</comments>
		<pubDate>Fri, 20 Aug 2010 21:13:19 +0000</pubDate>
		<dc:creator>John Myles White</dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.johnmyleswhite.com/?p=3926</guid>
		<description><![CDATA[Get Everything Set Up I&#8217;m going to assume that you have access to a machine that will run JAGS. If you don&#8217;t, then you should be able to use WinBUGS, which is very easy to get set up. Unfortunately, the details of what follows may not help you as much if you&#8217;re using WinBUGS. To [...]]]></description>
			<content:encoded><![CDATA[<h3>Get Everything Set Up</h3>
<p>I&#8217;m going to assume that you have access to a machine that will run JAGS. If you don&#8217;t, then you should be able to use <a href="http://www.mrc-bsu.cam.ac.uk/bugs/">WinBUGS</a>, which is very easy to get set up. Unfortunately, the details of what follows may not help you as much if you&#8217;re using WinBUGS.</p>
<p>To set up your system for using JAGS, there are two very easy steps:</p>
<ol>
<li>Go <a href="http://www-fis.iarc.fr/~martyn/software/jags/">download the current version</a> of JAGS (2.1.0 as of 8/20/2010).</li>
<li>Install the current <a href="http://cran.r-project.org/web/packages/rjags/index.html">rjags</a> package from CRAN (2.1.0-6 as of 8/20/2010).</li>
</ol>
<p>Once you&#8217;ve done that, a simple call to <code>library('rjags')</code> will be enough to run JAGS from inside of R. You&#8217;ll want to do everything except model specification in R. You&#8217;ll specify the model in a separate file using BUGS/JAGS syntax.</p>
<h3>Example 1: Inference on Normally Distributed Data</h3>
<p>Let&#8217;s assume that you&#8217;ve got a bunch of data points from a normal distribution with unknown mean and variance. This is arguably the simplest data set you can analyze with JAGS. So, how do you perform the analysis?</p>
<p>First, let&#8217;s create some simulation data that we&#8217;ll use to test our JAGS model specification:</p>

<div class="wp_codebox"><table><tr id="p392641"><td class="line_numbers"><pre>1
2
3
4
5
6
7
</pre></td><td class="code" id="p3926code41"><pre class="c" style="font-family:monospace;">N <span style="color: #339933;">&lt;-</span> <span style="color: #0000dd;">1000</span>
x <span style="color: #339933;">&lt;-</span> rnorm<span style="color: #009900;">&#40;</span>N<span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">5</span><span style="color: #009900;">&#41;</span>
&nbsp;
write.<span style="color: #202020;">table</span><span style="color: #009900;">&#40;</span>x<span style="color: #339933;">,</span>
            file <span style="color: #339933;">=</span> <span style="color: #ff0000;">'example1.data'</span><span style="color: #339933;">,</span>
            row.<span style="color: #202020;">names</span> <span style="color: #339933;">=</span> FALSE<span style="color: #339933;">,</span>
            col.<span style="color: #202020;">names</span> <span style="color: #339933;">=</span> FALSE<span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<p>We don&#8217;t actually need to write out the data since &#8216;rjags&#8217; automatically does this for us (and in another format at that), but it&#8217;s nice to be able to check that JAGS has done something reasonable by analyzing the raw inputs post hoc.</p>
<p>With your simulated data in hand, we&#8217;ll write up a model specification in JAGS syntax. Put the model specification in a file called <code>example1.bug</code>. The complete model looks like this:</p>

<div class="wp_codebox"><table><tr id="p392642"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
</pre></td><td class="code" id="p3926code42"><pre class="c" style="font-family:monospace;">model <span style="color: #009900;">&#123;</span>
	<span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span>i in <span style="color: #0000dd;">1</span><span style="color: #339933;">:</span>N<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
		x<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> ~ dnorm<span style="color: #009900;">&#40;</span>mu<span style="color: #339933;">,</span> tau<span style="color: #009900;">&#41;</span>
	<span style="color: #009900;">&#125;</span>
	mu ~ dnorm<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color:#800080;">.0001</span><span style="color: #009900;">&#41;</span>
	tau <span style="color: #339933;">&lt;-</span> pow<span style="color: #009900;">&#40;</span>sigma<span style="color: #339933;">,</span> <span style="color: #339933;">-</span><span style="color: #0000dd;">2</span><span style="color: #009900;">&#41;</span>
	sigma ~ dunif<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">100</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>In every model specification file, you have to start out by telling JAGS that you&#8217;re specifying a model. Then you set up the model for every single data point using a <code>for</code> loop. Here, we say that <code>x[i]</code> is distributed normally (hence the <code>dnorm()</code> call) with mean <code>mu</code> and precision <code>tau</code>, where the precision is simply the reciprocal of the variance. Then we specify our priors for <code>mu</code> and <code>tau</code>, which are meant to be constant across the loop. We tell JAGS that <code>mu</code> is distributed normally with mean 0 and standard deviation 100 (that is, a precision of 1 / 100^2 = .0001). This is meant to serve as a non-informative prior, since our data set was designed to have all measurements substantially below 100. Then we specify <code>tau</code> in a slightly round-about way. We say that <code>tau</code> is a deterministic function (hence the deterministic <code>&lt;-</code> instead of the distributional <code>~</code>) of <code>sigma</code>, after raising <code>sigma</code> to the -2 power. Then we say that <code>sigma</code> has a uniform prior over the interval [0,100].</p>
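<p>Because JAGS parameterizes the normal distribution by precision rather than standard deviation, it can help to check the conversions in plain R. The short sketch below only reproduces the arithmetic described above; the variable names are chosen for illustration.</p>

<div class="wp_codebox"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"># Prior on mu: dnorm(0, .0001) in JAGS means a precision of .0001.
prior.precision &lt;- 0.0001
prior.sd &lt;- 1 / sqrt(prior.precision)  # standard deviation implied by that precision

# Data model: tau is sigma raised to the -2 power.
sigma &lt;- 5
tau &lt;- sigma^(-2)                      # precision implied by a standard deviation of 5

print(c(prior.sd = prior.sd, tau = tau))</pre></td></tr></table></div>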
<p>With this model specified in <code>example1.bug</code>, we can write more R code to invoke it and perform inference properly:</p>

<div class="wp_codebox"><table><tr id="p392643"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
</pre></td><td class="code" id="p3926code43"><pre class="c" style="font-family:monospace;">library<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'rjags'</span><span style="color: #009900;">&#41;</span>
&nbsp;
jags <span style="color: #339933;">&lt;-</span> jags.<span style="color: #202020;">model</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'example1.bug'</span><span style="color: #339933;">,</span>
                   data <span style="color: #339933;">=</span> list<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'x'</span> <span style="color: #339933;">=</span> x<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'N'</span> <span style="color: #339933;">=</span> N<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">chains</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">adapt</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">100</span><span style="color: #009900;">&#41;</span>
&nbsp;
update<span style="color: #009900;">&#40;</span>jags<span style="color: #339933;">,</span> <span style="color: #0000dd;">1000</span><span style="color: #009900;">&#41;</span>
&nbsp;
jags.<span style="color: #202020;">samples</span><span style="color: #009900;">&#40;</span>jags<span style="color: #339933;">,</span>
             c<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'mu'</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">'tau'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
             <span style="color: #0000dd;">1000</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<p>First, we have to load the 'rjags' package. Then we need to set up our model object in R, which we do using the <code>jags.model()</code> function. We specify the JAGS model specification file and the data set, which is a named list where the names must be those used in the JAGS model specification file. Next, we tell the system how many parallel chains to run. (If you don't understand what the chains represent, I'd suggest just playing around and then reading up about the issue of mixing in MCMC.) Finally, we tell the system how many samples should be thrown away as part of the adaptive sampling period for each chain. For this example, I suspect that we could safely set this parameter to 0, but it costs so little that I've used 100 just as a placeholder. After calling <code>jags.model()</code>, we receive a JAGS model object, which we store in the <code>jags</code> variable.</p>
<p>After all of that set up, I've chosen to have the system run another 1000 iterations of the sampler just to show how to use the <code>update()</code> function, even though it's completely unnecessary in this simple problem. Finally, we use <code>jags.samples()</code> to draw 1000 samples from the sampler for the values of the named variables <code>mu</code> and <code>tau</code>.</p>
<p>When you call <code>jags.samples()</code>, you'll see the output provides proposed values for <code>mu</code> and <code>tau</code>. These should be close to 0 and 0.04 if JAGS is working properly, since those were the mean and precision values we used to create our simulation data. (At the risk of being pedantic: we used a standard deviation of 5, which gives a variance of 25 and a precision of 1 / 25 = 0.04.) Of course, they'll be even closer to the sample mean <code>mean(x)</code> and the sample precision <code>1 / var(x)</code>, so you should not forget to compare the inferred values to these values. The sample size, 1,000, isn't large enough to guarantee that the mean will be all that close to 0.</p>
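<p>The comparison values mentioned above are easy to compute without JAGS. The sketch below regenerates data in the same way as the simulation code, with a fixed seed added here purely so that the run is reproducible, and prints the sample mean and sample precision that the JAGS estimates of <code>mu</code> and <code>tau</code> should land near.</p>

<div class="wp_codebox"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">set.seed(1)  # assumption: seed added for reproducibility

N &lt;- 1000
x &lt;- rnorm(N, 0, 5)

sample.mean &lt;- mean(x)           # compare with the inferred value of mu
sample.precision &lt;- 1 / var(x)   # compare with the inferred value of tau

print(c(mean = sample.mean, precision = sample.precision))</pre></td></tr></table></div>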
<h3>Example 2: Basic Linear Regression</h3>
<p>Moving on to a slightly more interesting example, we can perform a simple linear regression in JAGS very easily. As before, we set up simulation data from a theoretical linear model:</p>

<div class="wp_codebox"><table><tr id="p392644"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
</pre></td><td class="code" id="p3926code44"><pre class="c" style="font-family:monospace;">N <span style="color: #339933;">&lt;-</span> <span style="color: #0000dd;">1000</span>
x <span style="color: #339933;">&lt;-</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">:</span>N
epsilon <span style="color: #339933;">&lt;-</span> rnorm<span style="color: #009900;">&#40;</span>N<span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span>
y <span style="color: #339933;">&lt;-</span> x <span style="color: #339933;">+</span> epsilon
&nbsp;
write.<span style="color: #202020;">table</span><span style="color: #009900;">&#40;</span>data.<span style="color: #202020;">frame</span><span style="color: #009900;">&#40;</span>X <span style="color: #339933;">=</span> x<span style="color: #339933;">,</span> Y <span style="color: #339933;">=</span> y<span style="color: #339933;">,</span> Epsilon <span style="color: #339933;">=</span> epsilon<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
            file <span style="color: #339933;">=</span> <span style="color: #ff0000;">'example2.data'</span><span style="color: #339933;">,</span>
            row.<span style="color: #202020;">names</span> <span style="color: #339933;">=</span> FALSE<span style="color: #339933;">,</span>
            col.<span style="color: #202020;">names</span> <span style="color: #339933;">=</span> TRUE<span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<p>We then set up the Bayesian model for our regression in <code>example2.bug</code>:</p>

<div class="wp_codebox"><table><tr id="p392645"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
</pre></td><td class="code" id="p3926code45"><pre class="c" style="font-family:monospace;">model <span style="color: #009900;">&#123;</span>
	<span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span>i in <span style="color: #0000dd;">1</span><span style="color: #339933;">:</span>N<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
		y<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> ~ dnorm<span style="color: #009900;">&#40;</span>y.<span style="color: #202020;">hat</span><span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span> tau<span style="color: #009900;">&#41;</span>
		y.<span style="color: #202020;">hat</span><span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">&lt;-</span> a <span style="color: #339933;">+</span> b <span style="color: #339933;">*</span> x<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span>
	<span style="color: #009900;">&#125;</span>
	a ~ dnorm<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color:#800080;">.0001</span><span style="color: #009900;">&#41;</span>
	b ~ dnorm<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color:#800080;">.0001</span><span style="color: #009900;">&#41;</span>
	tau <span style="color: #339933;">&lt;-</span> pow<span style="color: #009900;">&#40;</span>sigma<span style="color: #339933;">,</span> <span style="color: #339933;">-</span><span style="color: #0000dd;">2</span><span style="color: #009900;">&#41;</span>
	sigma ~ dunif<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">100</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Here, we've said that every data point is drawn from a normal distribution with mean <code>a + b * x[i]</code> and precision <code>tau</code>. We assign non-informative normal priors to <code>a</code> and <code>b</code> and a non-informative uniform prior to the standard deviation <code>sigma</code>, which is deterministically transformed into <code>tau</code>.</p>
<p>Then, we run this model using the same exact approach as we used earlier:</p>

<div class="wp_codebox"><table><tr id="p392646"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
</pre></td><td class="code" id="p3926code46"><pre class="c" style="font-family:monospace;">library<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'rjags'</span><span style="color: #009900;">&#41;</span>
&nbsp;
jags <span style="color: #339933;">&lt;-</span> jags.<span style="color: #202020;">model</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'example2.bug'</span><span style="color: #339933;">,</span>
                   data <span style="color: #339933;">=</span> list<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'x'</span> <span style="color: #339933;">=</span> x<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'y'</span> <span style="color: #339933;">=</span> y<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'N'</span> <span style="color: #339933;">=</span> N<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">chains</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">adapt</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">100</span><span style="color: #009900;">&#41;</span>
&nbsp;
update<span style="color: #009900;">&#40;</span>jags<span style="color: #339933;">,</span> <span style="color: #0000dd;">1000</span><span style="color: #009900;">&#41;</span>
&nbsp;
jags.<span style="color: #202020;">samples</span><span style="color: #009900;">&#40;</span>jags<span style="color: #339933;">,</span>
             c<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'a'</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">'b'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
             <span style="color: #0000dd;">1000</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<p>After running the chain for a good number of samples, we draw inferences for <code>a</code> and <code>b</code>, which should be close to the proper values of 0 and 1. I've ignored <code>tau</code> here, though there's no reason not to check that it was properly inferred.</p>
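<p>One way to check the inferred values is against an ordinary least squares fit of the same data. The sketch below uses base R's <code>lm()</code>, which is a different technique from the sampling JAGS performs, and a fixed seed that is an addition for reproducibility. With priors this diffuse, the two sets of estimates should be very similar.</p>

<div class="wp_codebox"><table><tr><td class="code"><pre class="c" style="font-family:monospace;">set.seed(1)  # assumption: seed added for reproducibility

N &lt;- 1000
x &lt;- 1:N
epsilon &lt;- rnorm(N, 0, 1)
y &lt;- x + epsilon

# Least squares estimates of the intercept (a) and slope (b).
fit &lt;- lm(y ~ x)
print(coef(fit))

# Residual precision, for comparison with the inferred value of tau.
residual.precision &lt;- 1 / summary(fit)$sigma^2
print(residual.precision)</pre></td></tr></table></div>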
<h3>Example 3: One Dimensional Logistic Regression</h3>
<p>Finally, it's good to see a model that is harder to fit by hand: without a sampling technique like the one JAGS automates, you would need a good deal of knowledge of optimization tools. For that purpose, I'll show how to implement logistic regression. Here we set up a simple one-dimensional predictor for our binary outcome variable and assume the standard logistic model:</p>

<div class="wp_codebox"><table><tr id="p392647"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
</pre></td><td class="code" id="p3926code47"><pre class="c" style="font-family:monospace;">N <span style="color: #339933;">&lt;-</span> <span style="color: #0000dd;">1000</span>
x <span style="color: #339933;">&lt;-</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">:</span>N
z <span style="color: #339933;">&lt;-</span> <span style="color:#800080;">0.01</span> <span style="color: #339933;">*</span> x <span style="color: #339933;">-</span> <span style="color: #0000dd;">5</span>
y <span style="color: #339933;">&lt;-</span> sapply<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span> <span style="color: #339933;">/</span> <span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span> <span style="color: #339933;">+</span> exp<span style="color: #009900;">&#40;</span><span style="color: #339933;">-</span>z<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span>p<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>rbinom<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> p<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span>
&nbsp;
write.<span style="color: #202020;">table</span><span style="color: #009900;">&#40;</span>data.<span style="color: #202020;">frame</span><span style="color: #009900;">&#40;</span>X <span style="color: #339933;">=</span> x<span style="color: #339933;">,</span> Z <span style="color: #339933;">=</span> z<span style="color: #339933;">,</span> Y <span style="color: #339933;">=</span> y<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
            file <span style="color: #339933;">=</span> <span style="color: #ff0000;">'example3.data'</span><span style="color: #339933;">,</span>
            row.<span style="color: #202020;">names</span> <span style="color: #339933;">=</span> FALSE<span style="color: #339933;">,</span>
            col.<span style="color: #202020;">names</span> <span style="color: #339933;">=</span> TRUE<span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<p>Then we set up our Bayesian model in <code>example3.bug</code>, where <code>y[i]</code> is Bernoulli distributed (or binomial distributed with 1 draw, if you prefer that sort of thing) and the linear model coefficients <code>a</code> and <code>b</code> are given non-informative normal priors:</p>

<div class="wp_codebox"><table><tr id="p392648"><td class="code" id="p3926code48"><pre class="c" style="font-family:monospace;">model <span style="color: #009900;">&#123;</span>
	<span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span>i in <span style="color: #0000dd;">1</span><span style="color: #339933;">:</span>N<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
		y<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> ~ dbern<span style="color: #009900;">&#40;</span>p<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span>
		p<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">&lt;-</span> <span style="color: #0000dd;">1</span> <span style="color: #339933;">/</span> <span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span> <span style="color: #339933;">+</span> exp<span style="color: #009900;">&#40;</span><span style="color: #339933;">-</span>z<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
		z<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">&lt;-</span> a <span style="color: #339933;">+</span> b <span style="color: #339933;">*</span> x<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span>
	<span style="color: #009900;">&#125;</span>
	a ~ dnorm<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color:#800080;">.0001</span><span style="color: #009900;">&#41;</span>
	b ~ dnorm<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color:#800080;">.0001</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Finally, we perform our standard inference calls in R to run the model through JAGS and draw posterior samples for <code>a</code> and <code>b</code>.</p>

<div class="wp_codebox"><table><tr id="p392649"><td class="code" id="p3926code49"><pre class="c" style="font-family:monospace;">library<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'rjags'</span><span style="color: #009900;">&#41;</span>
&nbsp;
jags <span style="color: #339933;">&lt;-</span> jags.<span style="color: #202020;">model</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'example3.bug'</span><span style="color: #339933;">,</span>
                   data <span style="color: #339933;">=</span> list<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'x'</span> <span style="color: #339933;">=</span> x<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'y'</span> <span style="color: #339933;">=</span> y<span style="color: #339933;">,</span>
                               <span style="color: #ff0000;">'N'</span> <span style="color: #339933;">=</span> N<span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">chains</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span>
                   n.<span style="color: #202020;">adapt</span> <span style="color: #339933;">=</span> <span style="color: #0000dd;">100</span><span style="color: #009900;">&#41;</span>
&nbsp;
update<span style="color: #009900;">&#40;</span>jags<span style="color: #339933;">,</span> <span style="color: #0000dd;">1000</span><span style="color: #009900;">&#41;</span>
&nbsp;
jags.<span style="color: #202020;">samples</span><span style="color: #009900;">&#40;</span>jags<span style="color: #339933;">,</span>
             c<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'a'</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">'b'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
             <span style="color: #0000dd;">1000</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<p>As always, you should check that the outputs you get make sense: here, you expect <code>a</code> to be approximately -5 and <code>b</code> to be around 0.01. This inference problem should take a good bit longer to solve: there are other tools for handling logistic regressions in JAGS that are faster, but I find this approach conceptually simplest and best for highlighting the similarity to a standard linear regression.</p>
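<p>One quick way to do that check is to fit the same model by maximum likelihood with base R's <code>glm()</code> and compare the coefficients with the posterior means from JAGS. The sketch below is not part of the JAGS workflow above: it regenerates the data with a fixed seed (my assumption, so that the numbers are reproducible) and uses <code>rbinom()</code> directly rather than <code>sapply()</code>.</p>

```r
# Sanity check with base R only: fit the logistic model by maximum
# likelihood and compare the estimates with the values used to simulate.
set.seed(1)  # assumption: fixed seed so the check is reproducible

N <- 1000
x <- 1:N
z <- 0.01 * x - 5
y <- rbinom(N, 1, 1 / (1 + exp(-z)))  # one Bernoulli draw per observation

fit <- glm(y ~ x, family = binomial)

print(coef(fit))             # intercept should be near -5, slope near 0.01
print(confint.default(fit))  # Wald intervals for a rough sense of uncertainty
```

<p>If the JAGS posterior means for <code>a</code> and <code>b</code> land far outside these intervals, the first suspects are the model file and the number of burn-in iterations.</p>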
<h3>Some Errors I Made at the Start</h3>
<p>Here are a few errors that stumped me for a bit as I got started using JAGS today:</p>
<ol>
<li><b>Error 1, 'Invalid parent error'</b>: I got this error when I erroneously assigned a normal prior to one of my precision variables <code>tau</code>. This is nonsensical, since precisions are always positive. Hence, the parent node involving <code>tau</code> was deemed to be 'invalid', causing a fatal run-time error.</li>
<li><b>Error 2, 'Attempt to redefine node z[1]'</b>: This is an error that Gelman and Hill warn users about in their <a href="http://www.stat.columbia.edu/~gelman/arm/">ARM book</a>: you must be sure that you don't treat the values in loops as local variables, because they cannot be reset on each iteration -- they must have unique values across all iterations. Thus, you must build an array for all of the variables that you might think of as local within loops, such as intermediate latent variables. Not doing so will produce fatal run-time errors.</li>
<li><b>Error 3, 'Invalid vector argument to exp'</b>: This is related to the error above: when I corrected part of my attempt to reset <code>z</code> on each pass through the logistic loop in Example 3, I forgot to reset it in the definition of <code>p[i]</code>. This led to an invalid vector with no definition being passed to the <code>exp()</code> function, giving a fatal run-time error.</li>
</ol>
<h3>One Question I Have</h3>
<p>My first attempt to run a linear regression didn't work. I still don't entirely understand why, but here is the alternative code that failed, in case you ever make the same type of mistake and find yourself puzzled:</p>

<div class="wp_codebox"><table><tr id="p392650"><td class="code" id="p3926code50"><pre class="c" style="font-family:monospace;">model <span style="color: #009900;">&#123;</span>
	<span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span>i in <span style="color: #0000dd;">1</span><span style="color: #339933;">:</span>N<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#123;</span>
		y<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">&lt;-</span> a <span style="color: #339933;">+</span> b <span style="color: #339933;">*</span> x<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">+</span> epsilon<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span>
                epsilon<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> ~ dnorm<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> tau<span style="color: #009900;">&#41;</span>
	<span style="color: #009900;">&#125;</span>
	a ~ dnorm<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color:#800080;">.0001</span><span style="color: #009900;">&#41;</span>
	b ~ dnorm<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color:#800080;">.0001</span><span style="color: #009900;">&#41;</span>
	tau <span style="color: #339933;">&lt;-</span> pow<span style="color: #009900;">&#40;</span>sigma<span style="color: #339933;">,</span> <span style="color: #339933;">-</span><span style="color: #0000dd;">2</span><span style="color: #009900;">&#41;</span>
	sigma ~ dunif<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">100</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>My assumption is that it's more difficult to infer values for all the <code>epsilon</code>'s at the same time as <code>tau</code>, which makes this harder than the earlier call without any explicit <code>epsilon</code> values. If that's wrong, please do correct me. Another hypothesis I entertained is that it's a problem to ever set <code>y[i]</code> to be a deterministic node, though this doesn't seem really plausible to me.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johnmyleswhite.com/notebook/2010/08/20/using-jags-in-r-with-the-rjags-package/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Twifficiency Scores</title>
		<link>http://www.johnmyleswhite.com/notebook/2010/08/18/twifficiency-scores/</link>
		<comments>http://www.johnmyleswhite.com/notebook/2010/08/18/twifficiency-scores/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 12:52:27 +0000</pubDate>
		<dc:creator>John Myles White</dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.johnmyleswhite.com/?p=3924</guid>
		<description><![CDATA[Neil Kodner wrote a great post this morning about yesterday&#8217;s Twifficiency scores outbreak. He grabbed all the auto-tweeted scores he could find and plotted their distribution. I was struck by the asymmetry of the resulting distribution, which you can see below: Thankfully, Neil handed me the raw data for his plot, so I was able [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.neilkodner.com">Neil Kodner</a> wrote a great post this morning about <a href="http://www.neilkodner.com/2010/08/twifficiency-scores-analyzed-and-visualized">yesterday&#8217;s Twifficiency scores outbreak</a>. He grabbed all the auto-tweeted scores he could find and plotted their distribution. I was struck by the asymmetry of the resulting distribution, which you can see below:</p>
<div style="text-align:center;"><img src="http://www.johnmyleswhite.com/notebook/wp-content/uploads/2010/08/twifficiencyfreq.png" alt="twifficiencyfreq.png" border="0" width="900" height="734" /></div>
<p>Thankfully, Neil handed me the raw data for his plot, so I was able to run a K-S test for normality, which rejected normality pretty easily, though I&#8217;m getting a warning about ties that I&#8217;m surprised by:</p>

<div class="wp_codebox"><table><tr id="p392452"><td class="code" id="p3924code52"><pre class="c" style="font-family:monospace;">scores <span style="color: #339933;">&lt;-</span> read.<span style="color: #202020;">csv</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'twifficiencyscores.txt'</span><span style="color: #339933;">,</span> header <span style="color: #339933;">=</span> FALSE<span style="color: #009900;">&#41;</span>
scores <span style="color: #339933;">&lt;-</span> scores<span style="color: #009900;">&#91;</span><span style="color: #339933;">,</span><span style="color: #0000dd;">1</span><span style="color: #009900;">&#93;</span>
&nbsp;
m <span style="color: #339933;">&lt;-</span> mean<span style="color: #009900;">&#40;</span>scores<span style="color: #009900;">&#41;</span>
s <span style="color: #339933;">&lt;-</span> sd<span style="color: #009900;">&#40;</span>scores<span style="color: #009900;">&#41;</span>
&nbsp;
ks.<span style="color: #202020;">test</span><span style="color: #009900;">&#40;</span>scores<span style="color: #339933;">,</span> <span style="color: #ff0000;">'pnorm'</span><span style="color: #339933;">,</span> m<span style="color: #339933;">,</span> s<span style="color: #009900;">&#41;</span>
<span style="color: #339933;">#</span>
<span style="color: #339933;">#	One-sample Kolmogorov-Smirnov test</span>
<span style="color: #339933;">#</span>
<span style="color: #339933;">#data:  scores </span>
<span style="color: #339933;">#D = 0.0616, p-value &lt; 2.2e-16</span>
<span style="color: #339933;">#alternative hypothesis: two-sided </span>
<span style="color: #339933;">#</span>
<span style="color: #339933;">#Warning message:</span>
<span style="color: #339933;">#In ks.test(scores, &quot;pnorm&quot;, m, s) :</span>
<span style="color: #339933;">#  cannot compute correct p-values with ties</span></pre></td></tr></table></div>

<p>I suppose that I&#8217;m a bit worried that the p-value is simply a reflection of sample size here, since there are 7089 measurements. Would it be more compelling to bootstrap the D statistic from the K-S test on samples of 500 scores at a time to confirm that the non-normality is present even in small groups of scores?</p>
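<p>A rough sketch of what that resampling could look like is below. The raw scores are not reproduced here, so the <code>scores</code> vector is a simulated stand-in (rounded, skewed beta draws), and the helper name <code>ks_sub</code>, the seed, and the 1000 replicates are all arbitrary choices of mine rather than anything from Neil&#8217;s data.</p>

```r
set.seed(42)  # assumption: fixed seed for reproducibility

# Stand-in data (an assumption): 7089 skewed integer scores with many ties.
scores <- round(100 * rbeta(7089, 2, 5))

# K-S test of one subsample against a normal distribution that uses the
# subsample's own mean and sd; returns the D statistic and the p-value.
ks_sub <- function(v) {
  res <- suppressWarnings(ks.test(v, 'pnorm', mean(v), sd(v)))
  c(D = unname(res$statistic), p = res$p.value)
}

# Repeat the test on 1000 resamples of 500 scores each.
results <- replicate(1000, ks_sub(sample(scores, 500, replace = TRUE)))

print(summary(results['D', ]))      # spread of D across subsamples
print(mean(results['p', ] < 0.05))  # share of subsamples rejecting normality
```

<p>If most subsamples of 500 still reject normality, the result is probably not just an artifact of having 7089 observations. Note that the p-values are only approximate, because the mean and standard deviation are estimated from the same data being tested.</p>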
<p>Assuming that the data really has a skewed distribution, does anyone understand the scoring system well enough to say what produces the asymmetry?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johnmyleswhite.com/notebook/2010/08/18/twifficiency-scores/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Unit Testing in R: The Bare Minimum</title>
		<link>http://www.johnmyleswhite.com/notebook/2010/08/17/unit-testing-in-r-the-bare-minimum/</link>
		<comments>http://www.johnmyleswhite.com/notebook/2010/08/17/unit-testing-in-r-the-bare-minimum/#comments</comments>
		<pubDate>Tue, 17 Aug 2010 19:18:14 +0000</pubDate>
		<dc:creator>John Myles White</dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.johnmyleswhite.com/?p=3891</guid>
		<description><![CDATA[Introduction This week I decided to start unit testing my R code, so I taught myself the bare minimum about the RUnit and testthat packages to be able to use them. Here&#8217;s what I found necessary to get started writing tests with both packages. RUnit Basic Example I&#8217;m going to assume that you&#8217;ve got a [...]]]></description>
			<content:encoded><![CDATA[<h3>Introduction</h3>
<p>This week I decided to start unit testing my R code, so I taught myself the bare minimum about the RUnit and testthat packages to be able to use them. Here&#8217;s what I found necessary to get started writing tests with both packages.</p>
<h3>RUnit Basic Example</h3>
<p>I&#8217;m going to assume that you&#8217;ve got a bunch of functions in <code>sample.R</code> that you want to test. For example, <code>sample.R</code> might contain a definition of the naïve factorial function:</p>

<div class="wp_codebox"><table><tr id="p389158"><td class="code" id="p3891code58"><pre class="c" style="font-family:monospace;">factorial <span style="color: #339933;">&lt;-</span> <span style="color: #000000; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span>n<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
  <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>n <span style="color: #339933;">==</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span>
  <span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">return</span><span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span>
  <span style="color: #009900;">&#125;</span>
  <span style="color: #b1b100;">else</span>
  <span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">return</span><span style="color: #009900;">&#40;</span>n <span style="color: #339933;">*</span> factorial<span style="color: #009900;">&#40;</span>n <span style="color: #339933;">-</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
  <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>To test your functions, create a directory called <code>tests</code> that will store all of your test cases. In <code>tests</code>, create a file called <code>1.R</code> that will contain your first set of tests. Each set of tests will go in a function inside of <code>1.R</code>, named according to the convention <code>test.*</code>. For example, you might have this:</p>

<div class="wp_codebox"><table><tr id="p389159"><td class="code" id="p3891code59"><pre class="c" style="font-family:monospace;">test.<span style="color: #202020;">examples</span> <span style="color: #339933;">&lt;-</span> <span style="color: #000000; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
  checkEquals<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">6</span><span style="color: #339933;">,</span> factorial<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">3</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
  checkEqualsNumeric<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">6</span><span style="color: #339933;">,</span> factorial<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">3</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
  checkIdentical<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">6</span><span style="color: #339933;">,</span> factorial<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">3</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
  checkTrue<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">2</span> <span style="color: #339933;">+</span> <span style="color: #0000dd;">2</span> <span style="color: #339933;">==</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">'Arithmetic works'</span><span style="color: #009900;">&#41;</span>
  checkException<span style="color: #009900;">&#40;</span>log<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'a'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">'Unable to take the log() of a string'</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
test.<span style="color: #202020;">deactivation</span> <span style="color: #339933;">&lt;-</span> <span style="color: #000000; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
  DEACTIVATED<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'Deactivating this test function'</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>To run this set of tests, we need to create a file called <code>run_tests.R</code> that will act as a test suite and invoke all of the tests in your <code>tests</code> directory:</p>

<div class="wp_codebox"><table><tr id="p389160"><td class="code" id="p3891code60"><pre class="c" style="font-family:monospace;">library<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'RUnit'</span><span style="color: #009900;">&#41;</span>
&nbsp;
source<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'sample.R'</span><span style="color: #009900;">&#41;</span>
&nbsp;
test.<span style="color: #202020;">suite</span> <span style="color: #339933;">&lt;-</span> defineTestSuite<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;example&quot;</span><span style="color: #339933;">,</span>
                              dirs <span style="color: #339933;">=</span> file.<span style="color: #202020;">path</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">&quot;tests&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                              testFileRegexp <span style="color: #339933;">=</span> <span style="color: #ff0000;">'^<span style="color: #000099; font-weight: bold;">\\</span>d+<span style="color: #000099; font-weight: bold;">\\</span>.R'</span><span style="color: #009900;">&#41;</span>
&nbsp;
test.<span style="color: #202020;">result</span> <span style="color: #339933;">&lt;-</span> runTestSuite<span style="color: #009900;">&#40;</span>test.<span style="color: #202020;">suite</span><span style="color: #009900;">&#41;</span>
&nbsp;
printTextProtocol<span style="color: #009900;">&#40;</span>test.<span style="color: #202020;">result</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<p>Here you tell the <code>defineTestSuite()</code> function that you&#8217;re creating a test suite called &#8220;example&#8221;, that the test files live in a directory called &#8220;tests&#8221;, and that only files whose names match the regular expression &#8216;^\\d+\\.R&#8217; (names that start with one or more digits followed by <code>.R</code>, such as <code>1.R</code>) count as test files. Then you run the suite explicitly and print out the results in a text format.</p>
<p>That&#8217;s it. With those ideas, you can write your own test suite for your R code.</p>
<h3>Using the check*() Functions </h3>
<p>In general, with RUnit, you use a function named something like <code>check*</code> to test the following conditions:</p>
<ul>
<li>checkEquals: Are two objects equal, including named attributes?</li>
<li>checkEqualsNumeric: Are two numeric values equal?</li>
<li>checkIdentical: Are two objects exactly the same?</li>
<li>checkTrue: Does an expression evaluate to <code>TRUE</code>?</li>
<li>checkException: Does an expression raise an error?</li>
</ul>
<p>In addition to these functions, there&#8217;s also a <code>DEACTIVATED()</code> function that lets you turn off a test function during its execution if you need to do that.</p>
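<p>The difference between &#8220;equal&#8221; and &#8220;exactly the same&#8221; is easiest to see in base R. My understanding, which is worth checking against the RUnit documentation, is that <code>checkEquals()</code> builds on <code>all.equal()</code> and <code>checkIdentical()</code> on <code>identical()</code>, so the sketch below uses those two base functions directly rather than RUnit itself.</p>

```r
# Tolerant vs. exact comparison in base R.
x <- 0.1 + 0.2
stopifnot(isTRUE(all.equal(x, 0.3)))  # equal within numerical tolerance
stopifnot(!identical(x, 0.3))         # but not bit-for-bit identical

# Storage type matters to identical(), not to all.equal().
stopifnot(isTRUE(all.equal(6L, 6)))
stopifnot(!identical(6L, 6))

# Names are attributes: all.equal() reports them unless told to ignore them.
named <- c(first = 6)
stopifnot(!isTRUE(all.equal(6, named)))
stopifnot(isTRUE(all.equal(6, named, check.attributes = FALSE)))
```

<p>In practice this suggests reaching for the tolerant check when comparing computed floating-point results and keeping the exact check for cases where type and attributes genuinely matter.</p>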
<h3>testthat Basic Example</h3>
<p>As above, I&#8217;m going to assume that you&#8217;ve got a bunch of functions in <code>sample.R</code> that you want to test. And, as before, to test your functions, you should create a directory called <code>tests</code> that will store all of your test cases. In <code>tests</code>, create a file called <code>1.R</code> that will contain your first set of tests. We&#8217;ll use <code>expect_that()</code> for all of our tests, as in the example below:</p>

<div class="wp_codebox"><table><tr id="p389161"><td class="code" id="p3891code61"><pre class="c" style="font-family:monospace;">expect_that<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span> <span style="color: #339933;">^</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> equals<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
expect_that<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">2</span> <span style="color: #339933;">^</span> <span style="color: #0000dd;">2</span><span style="color: #339933;">,</span> equals<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">4</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
&nbsp;
expect_that<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">2</span> <span style="color: #339933;">+</span> <span style="color: #0000dd;">2</span> <span style="color: #339933;">==</span> <span style="color: #0000dd;">4</span><span style="color: #339933;">,</span> is_true<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
expect_that<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">2</span> <span style="color: #339933;">==</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> is_false<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
&nbsp;
expect_that<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> is_a<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'numeric'</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
&nbsp;
expect_that<span style="color: #009900;">&#40;</span>print<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'Hello World!'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> prints_text<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'Hello World!'</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
&nbsp;
expect_that<span style="color: #009900;">&#40;</span>log<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'a'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> throws_error<span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
&nbsp;
expect_that<span style="color: #009900;">&#40;</span>factorial<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">16</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> takes_less_than<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<p>To run this set of tests, we need to create a file called <code>run_tests.R</code> that will act as a test suite and invoke all of the tests in your <code>tests</code> directory:</p>

<div class="wp_codebox"><table><tr id="p389162"><td class="code" id="p3891code62"><pre class="c" style="font-family:monospace;">library<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'testthat'</span><span style="color: #009900;">&#41;</span>
&nbsp;
source<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'sample.R'</span><span style="color: #009900;">&#41;</span>
&nbsp;
test_dir<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'tests'</span><span style="color: #339933;">,</span> reporter <span style="color: #339933;">=</span> <span style="color: #ff0000;">'Summary'</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<p>To get output, you have to tell <code>test_dir()</code> to use the SummaryReporter, which provides more than enough information for my purposes. See the testthat docs for other reporters you could use.</p>
<h3>Using expect_that() </h3>
<p>In general, you can ask <code>expect_that()</code> to test the following conditions:</p>
<ul>
<li>is_true: Does the expression evaluate to <code>TRUE</code>?</li>
<li>is_false: Does the expression evaluate to <code>FALSE</code>?</li>
<li>is_a: Did the object inherit from a specified class?</li>
<li>equals: Is the expression equal within numerical tolerance to your expected value?</li>
<li>is_equivalent_to: Is the object equal up to attributes to your expected value?</li>
<li>is_identical_to: Is the object exactly equal to your expected value?</li>
<li>matches: Does a string match the specified regular expression?</li>
<li>prints_text: Does the text that&#8217;s printed match the specified regular expression?</li>
<li>throws_error: Does the expression raise an error?</li>
<li>takes_less_than: Does the expression take less than a specified number of seconds to run?</li>
</ul>
<h3>More testthat Tricks</h3>
<p>There are some other tricks that testthat can do as well. It can automatically rerun tests on a directory of code whenever the code is edited using a function called <code>auto_test()</code> and it can set up contexts to separate tests using a function <code>context()</code>. I haven&#8217;t really explored either, so I can&#8217;t comment more on them.</p>
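<p>One related piece that is not covered above is <code>test_that()</code>, which groups several expectations under a description so that a failure is reported with a readable label. The sketch below is only an illustration under a few assumptions: it repeats the na&#239;ve factorial so that it is self-contained, and it uses the <code>expect_equal()</code> and <code>expect_error()</code> shorthands instead of <code>expect_that()</code>. It leaves out <code>context()</code> because I have not explored how it behaves.</p>

```r
library('testthat')

# Assumption: the same naive factorial as in sample.R, repeated here so
# that this file runs on its own.
factorial <- function(n) {
  if (n == 0) 1 else n * factorial(n - 1)
}

# Each test_that() block bundles related expectations under one label.
test_that('factorial handles small inputs', {
  expect_equal(factorial(0), 1)
  expect_equal(factorial(3), 6)
})

test_that('taking the log of a string raises an error', {
  expect_error(log('a'))
})
```

<p>A file like this can sit in the <code>tests</code> directory alongside the bare expectations shown earlier.</p>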
<h3>Which to Use?</h3>
<p>While I don&#8217;t think the arguments on behalf of either RUnit or testthat are unquestionable, I&#8217;m inclined to think that testthat has a brighter future, especially since it&#8217;s written by ggplot2&#8217;s author, <a href="http://had.co.nz/">Hadley Wickham</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johnmyleswhite.com/notebook/2010/08/17/unit-testing-in-r-the-bare-minimum/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Announcing Junebug</title>
		<link>http://www.johnmyleswhite.com/notebook/2010/07/04/announcing-junebug/</link>
		<comments>http://www.johnmyleswhite.com/notebook/2010/07/04/announcing-junebug/#comments</comments>
		<pubDate>Sun, 04 Jul 2010 13:41:56 +0000</pubDate>
		<dc:creator>John Myles White</dc:creator>
				<category><![CDATA[Junebug]]></category>

		<guid isPermaLink="false">http://www.johnmyleswhite.com/?p=3855</guid>
		<description><![CDATA[Today, my friend Jim Keller and I are officially opening our new dating site, Junebug, to the public. For the moment, we&#8217;re focusing on building a user base in the areas around Philadelphia and New York City, but the site is open to everyone in the US. If you&#8217;d like to check it out, please [...]]]></description>
			<content:encoded><![CDATA[<p>Today, my friend Jim Keller and I are officially opening our new dating site, <a href="http://junebugdating.com">Junebug</a>, to the public. For the moment, we&#8217;re focusing on building a user base in the areas around Philadelphia and New York City, but the site is open to everyone in the US. If you&#8217;d like to check it out, please go to <a href="http://junebugdating.com">junebugdating.com</a>.</p>
<p>I decided to get involved with this project because I feel that existing dating sites are not taking advantage of the tools that machine learning provides for building better recommendation systems, which a dating site effectively is at its core. The contrast between the quality of suggestions that you get from Netflix and the quality of matches you get on existing dating sites is so great that I felt compelled to get involved, if only to see if we could do something to improve the state of the art in the field.</p>
<p>While there are many reasons why matching people with other people that they&#8217;d be interested in dating is harder than matching people with movies they&#8217;d be interested in watching or webpages they&#8217;d be interested in reading, I&#8217;ve become increasingly convinced that the big players in the online dating market are either unaware of or simply indifferent to the tools that computer scientists can offer them. I&#8217;d really like to help remedy that. With the so-called Age of Big Data upon us, I think dating sites need to exploit the amount of information they collect by turning things over to statistical algorithms that can do a more effective job of inferring people&#8217;s preferences.</p>
<p>We&#8217;ve built Junebug to do just that. We&#8217;ve got a lot of great algorithms ready to churn through the user data we&#8217;re hoping to collect. We need your help, though, because we don&#8217;t have much real data yet. If you&#8217;re reading this and are single, please sign up, try out the site, and tell us what we can do to make it more enjoyable to use. And it would be even better if you told your friends to try out the site as well, since that would start to provide us with the amount of data we need to really show how much more effective a systematic statistical approach can be for building a better dating site.</p>
<p>If you think that online dating can be improved by the introduction of more sophisticated mathematical analysis, please help us prove that&#8217;s true by spreading the word and getting new people to participate on our site. We can promise to do everything we know how to do to turn the data we get into a valuable service for our users.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johnmyleswhite.com/notebook/2010/07/04/announcing-junebug/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Doing Maximum Likelihood Estimation by Hand in R</title>
		<link>http://www.johnmyleswhite.com/notebook/2010/04/21/doing-maximum-likelihood-estimation-by-hand-in-r/</link>
		<comments>http://www.johnmyleswhite.com/notebook/2010/04/21/doing-maximum-likelihood-estimation-by-hand-in-r/#comments</comments>
		<pubDate>Wed, 21 Apr 2010 14:14:47 +0000</pubDate>
		<dc:creator>John Myles White</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.johnmyleswhite.com/?p=3836</guid>
		<description><![CDATA[Lately I&#8217;ve been writing maximum likelihood estimation code by hand for some economic models that I&#8217;m working with. It&#8217;s actually a fairly simple task, so I thought that I would write up the basic approach in case there are readers who haven&#8217;t built a generic estimation system before. First, let&#8217;s start with a toy example [...]]]></description>
			<content:encoded><![CDATA[<p>Lately I&#8217;ve been writing maximum likelihood estimation code by hand for some economic models that I&#8217;m working with. It&#8217;s actually a fairly simple task, so I thought that I would write up the basic approach in case there are readers who haven&#8217;t built a generic estimation system before.</p>
<p>First, let&#8217;s start with a toy example for which there is a closed-form analytic solution. We&#8217;ll ignore that solution and use optimization functions to do the estimation. Starting with this toy example makes it easy to see how well an approximation system can be expected to perform under the best circumstances &#8212; and also where it goes wrong if you make poor programming decisions.</p>
<p>Suppose that you&#8217;ve got a sequence of values from an unknown Bernoulli variable like so:</p>

<div class="wp_codebox"><table><tr id="p383673"><td class="line_numbers"><pre>1
2
3
</pre></td><td class="code" id="p3836code73"><pre class="c" style="font-family:monospace;">p.<span style="color: #202020;">parameter</span> <span style="color: #339933;">&lt;-</span> <span style="color:#800080;">0.8</span>
sequence <span style="color: #339933;">&lt;-</span> rbinom<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">10</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> p.<span style="color: #202020;">parameter</span><span style="color: #009900;">&#41;</span>
<span style="color: #339933;"># [1] 0 1 1 1 1 1 1 0 1 0</span></pre></td></tr></table></div>

<p>Given the sequence, we want to estimate the value of the parameter, <i><b>p</b></i>, which is not known to us. The maximum likelihood approach says that we should select the parameter that makes the data most probable. For a Bernoulli variable, this is simply a search through the space of values for <i><b>p</b></i> (i.e., [0, 1]) for the value that makes the observed data most probable.</p>
<p>It&#8217;s worth pointing out that the analytic solution to the maximum likelihood estimation problem is to use the sample mean. We&#8217;ll therefore use <code>mean(sequence)</code> as a measure of the accuracy of our approximation algorithm.</p>
<p>How do we find the parameter numerically? First, we want to define a function that specifies the probability of our entire data set. We assume that each observation in the data is independently and identically distributed, so that the probability of the sequence is the product of the probabilities of each value. For the Bernoulli variables, this becomes the following function:</p>

<div class="wp_codebox"><table><tr id="p383674"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
</pre></td><td class="code" id="p3836code74"><pre class="c" style="font-family:monospace;">likelihood <span style="color: #339933;">&lt;-</span> <span style="color: #000000; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span>sequence<span style="color: #339933;">,</span> p.<span style="color: #202020;">parameter</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
  likelihood <span style="color: #339933;">&lt;-</span> <span style="color: #0000dd;">1</span>
&nbsp;
  <span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span>i in <span style="color: #0000dd;">1</span><span style="color: #339933;">:</span>length<span style="color: #009900;">&#40;</span>sequence<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
  <span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>sequence<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">==</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
      likelihood <span style="color: #339933;">&lt;-</span> likelihood <span style="color: #339933;">*</span> p.<span style="color: #202020;">parameter</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #b1b100;">else</span>
    <span style="color: #009900;">&#123;</span>
      likelihood <span style="color: #339933;">&lt;-</span> likelihood <span style="color: #339933;">*</span> <span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span> <span style="color: #339933;">-</span> p.<span style="color: #202020;">parameter</span><span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #b1b100;">return</span><span style="color: #009900;">&#40;</span>likelihood<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>To do maximum likelihood estimation, we therefore only need to use an optimization function to maximize this function. A quick examination of the likelihood function as a function of <i><b>p</b></i> makes it clear that any decent optimization algorithm should be able to find the maximum:</p>

<div class="wp_codebox"><table><tr id="p383675"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
</pre></td><td class="code" id="p3836code75"><pre class="c" style="font-family:monospace;">possible.<span style="color: #202020;">p</span> <span style="color: #339933;">&lt;-</span> seq<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> by <span style="color: #339933;">=</span> <span style="color:#800080;">0.001</span><span style="color: #009900;">&#41;</span>
jpeg<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'Likelihood_Concavity.jpg'</span><span style="color: #009900;">&#41;</span>
library<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'ggplot2'</span><span style="color: #009900;">&#41;</span>
qplot<span style="color: #009900;">&#40;</span>possible.<span style="color: #202020;">p</span><span style="color: #339933;">,</span>
      sapply<span style="color: #009900;">&#40;</span>possible.<span style="color: #202020;">p</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">function</span> <span style="color: #009900;">&#40;</span>p<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>likelihood<span style="color: #009900;">&#40;</span>sequence<span style="color: #339933;">,</span> p<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
      geom <span style="color: #339933;">=</span> <span style="color: #ff0000;">'line'</span><span style="color: #339933;">,</span>
      main <span style="color: #339933;">=</span> <span style="color: #ff0000;">'Likelihood as a Function of P'</span><span style="color: #339933;">,</span>
      xlab <span style="color: #339933;">=</span> <span style="color: #ff0000;">'P'</span><span style="color: #339933;">,</span>
      ylab <span style="color: #339933;">=</span> <span style="color: #ff0000;">'Likelihood'</span><span style="color: #009900;">&#41;</span>
dev.<span style="color: #202020;">off</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<div style="text-align:center;"><img src="http://www.johnmyleswhite.com/notebook/wp-content/uploads/2010/04/Likelihood_Concavity.jpg" alt="Likelihood_Concavity.jpg" border="0" width="480" height="480" /></div>
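<p>The same grid of candidate values that produced the plot also gives a crude estimate of the maximum, which is a useful sanity check before handing the problem to an optimizer. This is a minimal sketch, assuming the ten observations shown earlier; the name <code>grid.likelihood</code> is made up for this example, and the likelihood is computed inline so that the snippet runs on its own.</p>

<div class="wp_codebox"><table><tr><td class="code"><pre class="c" style="font-family:monospace;"># Evaluate the likelihood at every grid point and pick the largest value
sequence &lt;- c(0, 1, 1, 1, 1, 1, 1, 0, 1, 0)
possible.p &lt;- seq(0, 1, by = 0.001)
grid.likelihood &lt;- sapply(possible.p,
                          function (p) {prod(ifelse(sequence == 1, p, 1 - p))})
possible.p[which.max(grid.likelihood)]
# [1] 0.7</pre></td></tr></table></div>

<p>A grid search like this is only as precise as its step size, which is one reason to prefer a proper optimization routine.</p>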
<p>For single-variable cases, I find that it&#8217;s easiest to use R&#8217;s built-in function <code>optimize</code> (from the stats package) to solve the optimization problem:</p>

<div class="wp_codebox"><table><tr id="p383676"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
</pre></td><td class="code" id="p3836code76"><pre class="c" style="font-family:monospace;">mle.<span style="color: #202020;">results</span> <span style="color: #339933;">&lt;-</span> optimize<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span>p<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>likelihood<span style="color: #009900;">&#40;</span>sequence<span style="color: #339933;">,</span> p<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
                        interval <span style="color: #339933;">=</span> c<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                        maximum <span style="color: #339933;">=</span> TRUE<span style="color: #009900;">&#41;</span>
&nbsp;
mle.<span style="color: #202020;">results</span>
<span style="color: #339933;"># $maximum</span>
<span style="color: #339933;"># [1] 0.6999843</span>
<span style="color: #339933;">#</span>
<span style="color: #339933;"># $objective</span>
<span style="color: #339933;"># [1] 0.002223566</span></pre></td></tr></table></div>

<p>Here I&#8217;ve used an anonymous function that returns the likelihood of our current data given a value of <i><b>p</b></i>; I&#8217;ve also specified that the values of <i><b>p</b></i> must lie in the interval [0, 1] and asked <code>optimize</code> to maximize the result, rather than minimize, which is the default behavior. Examining the output of <code>optimize</code>, we can see that the likelihood of the data set was maximized very near 0.7, the sample mean. This suggests that the optimization approximation can work. It&#8217;s worth noting that the objective value is the likelihood of the data set for the specified value of <i><b>p</b></i>. For large data sets, the smallness of this objective can become a major problem. To understand why, it&#8217;s worth seeing what happens as the size of the sample grows from 10 to 2500 observations:</p>

<div class="wp_codebox"><table><tr id="p383677"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
</pre></td><td class="code" id="p3836code77"><pre class="c" style="font-family:monospace;">error.<span style="color: #202020;">behavior</span> <span style="color: #339933;">&lt;-</span> data.<span style="color: #202020;">frame</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
&nbsp;
<span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span>n in <span style="color: #0000dd;">10</span><span style="color: #339933;">:</span><span style="color: #0000dd;">2500</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
  sequence <span style="color: #339933;">&lt;-</span> rbinom<span style="color: #009900;">&#40;</span>n<span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> p.<span style="color: #202020;">parameter</span><span style="color: #009900;">&#41;</span>
&nbsp;
  likelihood.<span style="color: #202020;">results</span> <span style="color: #339933;">&lt;-</span> optimize<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span>p<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>likelihood<span style="color: #009900;">&#40;</span>sequence<span style="color: #339933;">,</span> p<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
                                 interval <span style="color: #339933;">=</span> c<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                                 maximum <span style="color: #339933;">=</span> TRUE<span style="color: #009900;">&#41;</span>
&nbsp;
  <span style="color: #000000; font-weight: bold;">true</span>.<span style="color: #202020;">mle</span> <span style="color: #339933;">&lt;-</span> mean<span style="color: #009900;">&#40;</span>sequence<span style="color: #009900;">&#41;</span>
&nbsp;
  likelihood.<span style="color: #202020;">error</span> <span style="color: #339933;">&lt;-</span> <span style="color: #000000; font-weight: bold;">true</span>.<span style="color: #202020;">mle</span> <span style="color: #339933;">-</span> likelihood.<span style="color: #202020;">results</span>$maximum
&nbsp;
  error.<span style="color: #202020;">behavior</span> <span style="color: #339933;">&lt;-</span> rbind<span style="color: #009900;">&#40;</span>error.<span style="color: #202020;">behavior</span><span style="color: #339933;">,</span>
                          data.<span style="color: #202020;">frame</span><span style="color: #009900;">&#40;</span>N <span style="color: #339933;">=</span> n<span style="color: #339933;">,</span>
                                     Error <span style="color: #339933;">=</span> likelihood.<span style="color: #202020;">error</span><span style="color: #339933;">,</span>
                                     Algorithm <span style="color: #339933;">=</span> <span style="color: #ff0000;">'Likelihood'</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<div style="text-align:center;"><img src="http://www.johnmyleswhite.com/notebook/wp-content/uploads/2010/04/Likelihood_Problems.jpg" alt="Likelihood_Problems.jpg" border="0" width="480" height="480" /></div>
<p>As you can see, our approximation approach works great until our data set grows, and then it falls apart. This is exactly the opposite of what asymptotic statistical theory tells us should be happening, so it&#8217;s clear that something is going very wrong. A quick examination of the results from the last pass through our loop makes clear what&#8217;s wrong:</p>

<div class="wp_codebox"><table><tr id="p383678"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
</pre></td><td class="code" id="p3836code78"><pre class="c" style="font-family:monospace;">sequence <span style="color: #339933;">&lt;-</span> rbinom<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">2500</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> p.<span style="color: #202020;">parameter</span><span style="color: #009900;">&#41;</span>
&nbsp;
likelihood.<span style="color: #202020;">results</span> <span style="color: #339933;">&lt;-</span> optimize<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span>p<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>likelihood<span style="color: #009900;">&#40;</span>sequence<span style="color: #339933;">,</span> p<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
                               interval <span style="color: #339933;">=</span> c<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                               maximum <span style="color: #339933;">=</span> TRUE<span style="color: #009900;">&#41;</span>
&nbsp;
likelihood.<span style="color: #202020;">results</span>
<span style="color: #339933;"># $maximum</span>
<span style="color: #339933;"># [1] 0.9999339</span>
<span style="color: #339933;">#</span>
<span style="color: #339933;"># $objective</span>
<span style="color: #339933;"># [1] 0</span></pre></td></tr></table></div>

<p>The likelihood of our data is numerically indistinguishable from 0 given the precision of my machine&#8217;s floating point values. Multiplying thousands of probabilities together is simply not a viable approach without infinite precision. Thankfully, there&#8217;s a very simple solution: replace all of the probabilities with their logarithms. Instead of maximizing the likelihood, we maximize the log likelihood, which involves summing rather than multiplying, and therefore stays numerically stable:</p>

<div class="wp_codebox"><table><tr id="p383679"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
</pre></td><td class="code" id="p3836code79"><pre class="c" style="font-family:monospace;">log.<span style="color: #202020;">likelihood</span> <span style="color: #339933;">&lt;-</span> <span style="color: #000000; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span>sequence<span style="color: #339933;">,</span> p<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
  log.<span style="color: #202020;">likelihood</span> <span style="color: #339933;">&lt;-</span> <span style="color: #0000dd;">0</span>
&nbsp;
  <span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span>i in <span style="color: #0000dd;">1</span><span style="color: #339933;">:</span>length<span style="color: #009900;">&#40;</span>sequence<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
  <span style="color: #009900;">&#123;</span>
    <span style="color: #b1b100;">if</span> <span style="color: #009900;">&#40;</span>sequence<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">==</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#123;</span>
      log.<span style="color: #202020;">likelihood</span> <span style="color: #339933;">&lt;-</span> log.<span style="color: #202020;">likelihood</span> <span style="color: #339933;">+</span> log<span style="color: #009900;">&#40;</span>p<span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #b1b100;">else</span>
    <span style="color: #009900;">&#123;</span>
      log.<span style="color: #202020;">likelihood</span> <span style="color: #339933;">&lt;-</span> log.<span style="color: #202020;">likelihood</span> <span style="color: #339933;">+</span> log<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">1</span> <span style="color: #339933;">-</span> p<span style="color: #009900;">&#41;</span>
    <span style="color: #009900;">&#125;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #b1b100;">return</span><span style="color: #009900;">&#40;</span>log.<span style="color: #202020;">likelihood</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>You can check that this problem is as easily solved numerically as the original problem by graphing the log likelihood:</p>

<div class="wp_codebox"><table><tr id="p383680"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
</pre></td><td class="code" id="p3836code80"><pre class="c" style="font-family:monospace;">sequence <span style="color: #339933;">&lt;-</span> c<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">0</span><span style="color: #009900;">&#41;</span>
possible.<span style="color: #202020;">p</span> <span style="color: #339933;">&lt;-</span> seq<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> by <span style="color: #339933;">=</span> <span style="color:#800080;">0.001</span><span style="color: #009900;">&#41;</span>
jpeg<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'Log_Likelihood_Concavity.jpg'</span><span style="color: #009900;">&#41;</span>
qplot<span style="color: #009900;">&#40;</span>possible.<span style="color: #202020;">p</span><span style="color: #339933;">,</span>
      sapply<span style="color: #009900;">&#40;</span>possible.<span style="color: #202020;">p</span><span style="color: #339933;">,</span> <span style="color: #000000; font-weight: bold;">function</span> <span style="color: #009900;">&#40;</span>p<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>log.<span style="color: #202020;">likelihood</span><span style="color: #009900;">&#40;</span>sequence<span style="color: #339933;">,</span> p<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
      geom <span style="color: #339933;">=</span> <span style="color: #ff0000;">'line'</span><span style="color: #339933;">,</span>
      main <span style="color: #339933;">=</span> <span style="color: #ff0000;">'Log Likelihood as a Function of P'</span><span style="color: #339933;">,</span>
      xlab <span style="color: #339933;">=</span> <span style="color: #ff0000;">'P'</span><span style="color: #339933;">,</span>
      ylab <span style="color: #339933;">=</span> <span style="color: #ff0000;">'Log Likelihood'</span><span style="color: #009900;">&#41;</span>
dev.<span style="color: #202020;">off</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<div style="text-align:center;"><img src="http://www.johnmyleswhite.com/notebook/wp-content/uploads/2010/04/Log_Likelihood_Concavity.jpg" alt="Log_Likelihood_Concavity.jpg" border="0" width="480" height="480" /></div>
<p>And then you can rerun our error diagnostics using both approaches to confirm that the log likelihood approach does not suffer from the same numerical problems:</p>

<div class="wp_codebox"><table><tr id="p383681"><td class="line_numbers"><pre>1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
</pre></td><td class="code" id="p3836code81"><pre class="c" style="font-family:monospace;">error.<span style="color: #202020;">behavior</span> <span style="color: #339933;">&lt;-</span> data.<span style="color: #202020;">frame</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span>
&nbsp;
<span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span>n in <span style="color: #0000dd;">10</span><span style="color: #339933;">:</span><span style="color: #0000dd;">2500</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
  sequence <span style="color: #339933;">&lt;-</span> rbinom<span style="color: #009900;">&#40;</span>n<span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #339933;">,</span> p.<span style="color: #202020;">parameter</span><span style="color: #009900;">&#41;</span>
&nbsp;
  likelihood.<span style="color: #202020;">results</span> <span style="color: #339933;">&lt;-</span> optimize<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span>p<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>likelihood<span style="color: #009900;">&#40;</span>sequence<span style="color: #339933;">,</span> p<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
                                 interval <span style="color: #339933;">=</span> c<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                                 maximum <span style="color: #339933;">=</span> TRUE<span style="color: #009900;">&#41;</span>
&nbsp;
  log.<span style="color: #202020;">likelihood</span>.<span style="color: #202020;">results</span> <span style="color: #339933;">&lt;-</span> optimize<span style="color: #009900;">&#40;</span><span style="color: #000000; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span>p<span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>log.<span style="color: #202020;">likelihood</span><span style="color: #009900;">&#40;</span>sequence<span style="color: #339933;">,</span> p<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#125;</span><span style="color: #339933;">,</span>
                                     interval <span style="color: #339933;">=</span> c<span style="color: #009900;">&#40;</span><span style="color: #0000dd;">0</span><span style="color: #339933;">,</span> <span style="color: #0000dd;">1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                                     maximum <span style="color: #339933;">=</span> TRUE<span style="color: #009900;">&#41;</span>
&nbsp;
  <span style="color: #000000; font-weight: bold;">true</span>.<span style="color: #202020;">mle</span> <span style="color: #339933;">&lt;-</span> mean<span style="color: #009900;">&#40;</span>sequence<span style="color: #009900;">&#41;</span>
&nbsp;
  likelihood.<span style="color: #202020;">error</span> <span style="color: #339933;">&lt;-</span> <span style="color: #000000; font-weight: bold;">true</span>.<span style="color: #202020;">mle</span> <span style="color: #339933;">-</span> likelihood.<span style="color: #202020;">results</span>$maximum
  log.<span style="color: #202020;">likelihood</span>.<span style="color: #202020;">error</span> <span style="color: #339933;">&lt;-</span> <span style="color: #000000; font-weight: bold;">true</span>.<span style="color: #202020;">mle</span> <span style="color: #339933;">-</span> log.<span style="color: #202020;">likelihood</span>.<span style="color: #202020;">results</span>$maximum
&nbsp;
  error.<span style="color: #202020;">behavior</span> <span style="color: #339933;">&lt;-</span> rbind<span style="color: #009900;">&#40;</span>error.<span style="color: #202020;">behavior</span><span style="color: #339933;">,</span>
                          data.<span style="color: #202020;">frame</span><span style="color: #009900;">&#40;</span>N <span style="color: #339933;">=</span> n<span style="color: #339933;">,</span>
                                     Error <span style="color: #339933;">=</span> likelihood.<span style="color: #202020;">error</span><span style="color: #339933;">,</span>
                                     Algorithm <span style="color: #339933;">=</span> <span style="color: #ff0000;">'Likelihood'</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span>
                          data.<span style="color: #202020;">frame</span><span style="color: #009900;">&#40;</span>N <span style="color: #339933;">=</span> n<span style="color: #339933;">,</span>
                                     Error <span style="color: #339933;">=</span> log.<span style="color: #202020;">likelihood</span>.<span style="color: #202020;">error</span><span style="color: #339933;">,</span>
                                     Algorithm <span style="color: #339933;">=</span> <span style="color: #ff0000;">'Log Likelihood'</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#125;</span>
&nbsp;
jpeg<span style="color: #009900;">&#40;</span><span style="color: #ff0000;">'Long-Term_Error_Behavior.jpg'</span><span style="color: #009900;">&#41;</span>
ggplot<span style="color: #009900;">&#40;</span>error.<span style="color: #202020;">behavior</span><span style="color: #339933;">,</span> aes<span style="color: #009900;">&#40;</span>x <span style="color: #339933;">=</span> N<span style="color: #339933;">,</span> y <span style="color: #339933;">=</span> Error<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> geom_line<span style="color: #009900;">&#40;</span>aes<span style="color: #009900;">&#40;</span>group <span style="color: #339933;">=</span> Algorithm<span style="color: #339933;">,</span> color <span style="color: #339933;">=</span> Algorithm<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">+</span> labs<span style="color: #009900;">&#40;</span>title <span style="color: #339933;">=</span> <span style="color: #ff0000;">'Long-Term Error Behavior of Two Numerical Approaches'</span><span style="color: #339933;">,</span> x <span style="color: #339933;">=</span> <span style="color: #ff0000;">'Sample Size'</span><span style="color: #339933;">,</span> y <span style="color: #339933;">=</span> <span style="color: #ff0000;">'Deviation from True MLE'</span><span style="color: #009900;">&#41;</span>
dev.<span style="color: #202020;">off</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span></pre></td></tr></table></div>

<div style="text-align:center;"><img src="http://www.johnmyleswhite.com/notebook/wp-content/uploads/2010/04/Long-Term_Error_Behavior.jpg" alt="Long-Term_Error_Behavior.jpg" border="0" width="480" height="480" /></div>
<p>More generally, given any data set and any model, you can, at least in principle, solve the maximum likelihood estimation problem using numerical optimization algorithms. To do that, you need a log likelihood function that works for arbitrary data and an arbitrary likelihood function, analogous to the R-like pseudocode below:</p>

<div class="wp_codebox"><table><tr id="p383682"><td class="code" id="p3836code82"><pre class="c" style="font-family:monospace;">log.<span style="color: #202020;">likelihood</span> <span style="color: #339933;">&lt;-</span> <span style="color: #000000; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span>sequence.<span style="color: #202020;">as</span>.<span style="color: #202020;">data</span>.<span style="color: #202020;">frame</span><span style="color: #339933;">,</span> likelihood.<span style="color: #000000; font-weight: bold;">function</span><span style="color: #339933;">,</span> parameters<span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
  log.<span style="color: #202020;">likelihood</span> <span style="color: #339933;">&lt;-</span> <span style="color: #0000dd;">0</span>
&nbsp;
  <span style="color: #b1b100;">for</span> <span style="color: #009900;">&#40;</span>i in <span style="color: #0000dd;">1</span><span style="color: #339933;">:</span>nrow<span style="color: #009900;">&#40;</span>sequence.<span style="color: #202020;">as</span>.<span style="color: #202020;">data</span>.<span style="color: #202020;">frame</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
  <span style="color: #009900;">&#123;</span>
    log.<span style="color: #202020;">likelihood</span> <span style="color: #339933;">&lt;-</span> log.<span style="color: #202020;">likelihood</span> <span style="color: #339933;">+</span> log<span style="color: #009900;">&#40;</span>likelihood.<span style="color: #000000; font-weight: bold;">function</span><span style="color: #009900;">&#40;</span>sequence.<span style="color: #202020;">as</span>.<span style="color: #202020;">data</span>.<span style="color: #202020;">frame</span><span style="color: #009900;">&#91;</span>i<span style="color: #339933;">,</span><span style="color: #009900;">&#93;</span><span style="color: #339933;">,</span> parameters<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span>
  <span style="color: #009900;">&#125;</span>
&nbsp;
  <span style="color: #b1b100;">return</span><span style="color: #009900;">&#40;</span>log.<span style="color: #202020;">likelihood</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#125;</span></pre></td></tr></table></div>

<p>Then you need to apply multivariable, constrained optimization tools to find your maximum likelihood estimates. This actually turns out to be a hard problem in general, so I&#8217;m going to bail out on the topic here.</p>
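<p>For a sense of what that last step can look like in an easy case, here is a rough two-parameter sketch that fits the mean and standard deviation of normally distributed data with R&#8217;s built-in <code>optim</code> function. The simulated data, the starting values and the names <code>observations</code> and <code>negative.log.likelihood</code> are assumptions made up for illustration, and the vectorized <code>dnorm</code> call stands in for the explicit loop in the pseudocode above. Harder models will not behave this nicely.</p>

<pre class="r" style="font-family:monospace;"># Hypothetical data: 1,000 draws from a normal distribution.
set.seed(1)
observations &lt;- rnorm(1000, mean = 5, sd = 2)

# optim() minimizes by default, so return the negative log likelihood.
# parameters[1] is the mean and parameters[2] is the standard deviation.
negative.log.likelihood &lt;- function(parameters, x)
{
  -sum(dnorm(x, mean = parameters[1], sd = parameters[2], log = TRUE))
}

# L-BFGS-B accepts box constraints; the lower bound keeps the
# standard deviation strictly positive.
fit &lt;- optim(par = c(1, 1),
             fn = negative.log.likelihood,
             x = observations,
             method = 'L-BFGS-B',
             lower = c(-Inf, 1e-6),
             upper = c(Inf, Inf))

# Numerical estimates of the mean and standard deviation.
fit$par</pre>

<p>The lower bound on the second parameter is what makes this a constrained problem: without it, the optimizer could wander into negative standard deviations, where <code>dnorm</code> returns <code>NaN</code>. In this particular case the estimates can be checked against the closed-form answers, the sample mean and the square root of the average squared deviation from it.</p>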
]]></content:encoded>
			<wfw:commentRss>http://www.johnmyleswhite.com/notebook/2010/04/21/doing-maximum-likelihood-estimation-by-hand-in-r/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Theme Change</title>
		<link>http://www.johnmyleswhite.com/notebook/2010/04/20/theme-change/</link>
		<comments>http://www.johnmyleswhite.com/notebook/2010/04/20/theme-change/#comments</comments>
		<pubDate>Tue, 20 Apr 2010 12:32:28 +0000</pubDate>
		<dc:creator>John Myles White</dc:creator>
				<category><![CDATA[Site News]]></category>

		<guid isPermaLink="false">http://www.johnmyleswhite.com/?p=3827</guid>
		<description><![CDATA[I&#8217;ve changed the theme on this blog to the Hybrid theme. I was tired of how busy the old theme was, and I wanted more space for graphics and code snippets. If you have any suggestions for cleaning things up even more, please let me know.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve changed the theme on this blog to the <a href="http://wordpress.org/extend/themes/hybrid">Hybrid theme</a>. I was tired of how busy the old theme was, and I wanted more space for graphics and code snippets. If you have any suggestions for cleaning things up even more, please let me know.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johnmyleswhite.com/notebook/2010/04/20/theme-change/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Price of Calculation</title>
		<link>http://www.johnmyleswhite.com/notebook/2010/03/15/the-price-of-calculation/</link>
		<comments>http://www.johnmyleswhite.com/notebook/2010/03/15/the-price-of-calculation/#comments</comments>
		<pubDate>Mon, 15 Mar 2010 15:13:20 +0000</pubDate>
		<dc:creator>John Myles White</dc:creator>
				<category><![CDATA[Economics]]></category>
		<category><![CDATA[Mathematics]]></category>
		<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.johnmyleswhite.com/?p=3820</guid>
		<description><![CDATA[In a world in which the price of calculation continues to decrease rapidly, but the price of theorem proving continues to hold steady or increase, elementary economics indicates that we ought to spend a larger and larger fraction of our time on calculation.1 Over the next ten years, I hope that more and more mathematically [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>
In a world in which the price of calculation continues to decrease rapidly, but the price of theorem proving continues to hold steady or increase, elementary economics indicates that we ought to spend a larger and larger fraction of our time on calculation.<sup><a href="http://www.johnmyleswhite.com/notebook/2010/03/15/the-price-of-calculation/#footnote_0_3820" id="identifier_0_3820" class="footnote-link footnote-identifier-link" title="J. W. Tukey : The American Statistician : Sunset Salvo">1</a></sup>
</p></blockquote>
<p>Over the next ten years, I hope that more and more mathematically minded hackers, empowered by open source tools like the R programming language and emboldened by the popularization of statistical analyses by people like Steve Levitt, will follow Tukey&#8217;s suggestion.</p>
<ol class="footnotes"><li id="footnote_0_3820" class="footnote">J. W. Tukey : The American Statistician : Sunset Salvo</li></ol>]]></content:encoded>
			<wfw:commentRss>http://www.johnmyleswhite.com/notebook/2010/03/15/the-price-of-calculation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An Alternative to Occam&#8217;s Razor</title>
		<link>http://www.johnmyleswhite.com/notebook/2010/02/09/an-alternative-to-occams-razor/</link>
		<comments>http://www.johnmyleswhite.com/notebook/2010/02/09/an-alternative-to-occams-razor/#comments</comments>
		<pubDate>Tue, 09 Feb 2010 15:19:24 +0000</pubDate>
		<dc:creator>John Myles White</dc:creator>
				<category><![CDATA[Aphorisms]]></category>

		<guid isPermaLink="false">http://www.johnmyleswhite.com/?p=3818</guid>
		<description><![CDATA[In light of human foibles, I would suggest that this decision rule be used in lieu of Occam&#8217;s Razor: of several possible explanations for an observation, the most boring one is probably the most accurate.]]></description>
			<content:encoded><![CDATA[<p>In light of human foibles, I would suggest that this decision rule be used in lieu of Occam&#8217;s Razor: of several possible explanations for an observation, the most boring one is probably the most accurate.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johnmyleswhite.com/notebook/2010/02/09/an-alternative-to-occams-razor/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>iBad: The FSF Kool-Aid and Other Dystopian Hallucinations</title>
		<link>http://www.johnmyleswhite.com/notebook/2010/01/29/ibad-the-fsf-kool-aid-and-other-dystopian-hallucinations/</link>
		<comments>http://www.johnmyleswhite.com/notebook/2010/01/29/ibad-the-fsf-kool-aid-and-other-dystopian-hallucinations/#comments</comments>
		<pubDate>Fri, 29 Jan 2010 15:51:23 +0000</pubDate>
		<dc:creator>John Myles White</dc:creator>
				<category><![CDATA[Mac OS X]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://www.johnmyleswhite.com/?p=3816</guid>
		<description><![CDATA[The people who worry that the iPad will bring about a dystopian future for home computing keep forgetting something: for the rest of humanity, their ideal world of perfectly hackable machines is already a dystopian nightmare. It&#8217;s a world in which nothing works without spending hours setting it up, in which basic features are missing [...]]]></description>
			<content:encoded><![CDATA[<p>The people who worry that the iPad will bring about a dystopian future for home computing keep forgetting something: for the rest of humanity, <i>their ideal world of perfectly hackable machines is already a dystopian nightmare</i>. It&#8217;s a world in which nothing works without spending hours setting it up, in which basic features are missing while the manual lists thousands of irrelevant options, in which a million hardware extensions are available for their machine, but none of them help to solve a single one of their day-to-day problems. While being something of a hacker myself, I feel that the hacker&#8217;s vision of totally open computing probably should become a niche market, in much the same way that chemistry sets represent a niche market. The fact that not every person has a set of tools in his house that, by default, allows him to conduct arbitrary chemistry experiments has not substantially slowed down the progress of chemistry from what I can tell. The arrival of a world in which the most popular computers are closed to arbitrary hardware extensions and all applications are required to run within a sandbox probably won&#8217;t slow down the progress of personal computing much either.</p>
<p>Hackers of the world, your priorities are not simply different from the average user&#8217;s: they often represent a direct attack on the average user&#8217;s preferences. You keep asserting that you have the normal person&#8217;s interests in mind, but I think you&#8217;re often simply concealing your own self-interest underneath politicized rhetoric about freedom and openness.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.johnmyleswhite.com/notebook/2010/01/29/ibad-the-fsf-kool-aid-and-other-dystopian-hallucinations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Gay Marriage: Another Data Point</title>
		<link>http://www.johnmyleswhite.com/notebook/2010/01/16/gay-marriage-another-data-point/</link>
		<comments>http://www.johnmyleswhite.com/notebook/2010/01/16/gay-marriage-another-data-point/#comments</comments>
		<pubDate>Sat, 16 Jan 2010 16:09:36 +0000</pubDate>
		<dc:creator>John Myles White</dc:creator>
				<category><![CDATA[Statistics]]></category>

		<guid isPermaLink="false">http://www.johnmyleswhite.com/?p=3813</guid>
		<description><![CDATA[Relevant to my earlier post about the relationship between direct democracy and laws prohibiting gay marriage, Pew Research just published poll data showing that a majority of Americans disapprove of same-sex marriage.]]></description>
			<content:encoded><![CDATA[<p>Relevant to my earlier post about the relationship between direct democracy and laws prohibiting gay marriage, Pew Research just published poll data showing that <a href="http://pewresearch.org/databank/dailynumber/?NumberID=881">a majority of Americans disapprove of same-sex marriage.</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.johnmyleswhite.com/notebook/2010/01/16/gay-marriage-another-data-point/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

