<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ufo &#8211; Weird Data Science</title>
	<atom:link href="https://www.weirddatascience.net/category/ufo/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.weirddatascience.net</link>
	<description>Paranormal Distributions. Cyclopean Data. Esoteric Regression.</description>
	<lastBuildDate>Sat, 15 Nov 2025 14:01:45 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
<site xmlns="com-wordpress:feed-additions:1">143387998</site>	<item>
		<title>Bayes vs. the Invaders (Redivivus)</title>
		<link>https://www.weirddatascience.net/2025/11/15/bayes-vs-the-invaders-redivivus/</link>
					<comments>https://www.weirddatascience.net/2025/11/15/bayes-vs-the-invaders-redivivus/#respond</comments>
		
		<dc:creator><![CDATA[moth]]></dc:creator>
		<pubDate>Sat, 15 Nov 2025 10:50:27 +0000</pubDate>
				<category><![CDATA[event]]></category>
		<category><![CDATA[scraping]]></category>
		<category><![CDATA[stan]]></category>
		<category><![CDATA[ufo]]></category>
		<guid isPermaLink="false">https://www.weirddatascience.net/?p=10231</guid>

					<description><![CDATA[<div class="mh-excerpt">Unidentifiable aerial and marine phenomena. Impossible lights in the sky. Patterns of visitation and terror. Insidious influences from the hadal voids between the stars. Who--what--swoop and glide through the ink-black nights of our world, probing and testing our structures, our societies, our minds? From barely remembered history, to early reports of impossible objects, to blurrily evidenced documentation, data concerning flying arcane observations has grown and twisted, along with our capacity to lay them bare, to subject them to analysis, and to interrogate their secrets.

In this year's OII Halloween Lecture, we will tremblingly revisit a Bayesian analysis of seventy years of UFO sightings, drawn from a dataset collected by the National UFO Reporting Center (NUFORC). Scepticism, fear, doubt, and most accepted standards of statistical rigor, will be cast aside in our unyielding and disquieting pursuit of an uncompromised truth.</div> <a class="mh-excerpt-more" href="https://www.weirddatascience.net/2025/11/15/bayes-vs-the-invaders-redivivus/" title="Bayes vs. the Invaders (Redivivus)">[...]</a>]]></description>
										<content:encoded><![CDATA[<p>Straying still further from whatever dubious graces it once presumed, academia&#8217;s luciferous descent into murky realms of huddled speculation continues unabated. Relocated, reconstituted, in ever-fading cycles, the Oxford Internet Institute at the University of Oxford once again saw fit to deprive its students of the peace and comfort of banal rationality through this fourth annual Halloween Lecture.</p>
<p>In the absence of fresh insight, this year&#8217;s lecture revisits the chilling implications of humanity&#8217;s contact with inexplicable aerial and marine phenomena, drawing from the faintest whispers of the ante-historical record through to the shimmering echoes of statistical reasoning. Through what means do visitors from beyond the void encroach on our night-time skies? What subtle deceptions underpin their visible geometries? What values lie behind their peculiar interests in certain uncomfortably-favoured regions?</p>
<p>Despite all safeguards, and in the face of numerous barely-perceptible currents opposing such efforts, this event was captured, stored, and released into a world still cruelly unprepared to face its findings.</p>
<p>More details, and the underlying code, for these findings can be found&#8211;for those unwary enough to look&#8211;in the series of entries beginning <a href="https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/">here</a>.</p>
<p><a href="https://www.youtube.com/watch?v=WobPKY4UZc0"><img decoding="async" src="https://img.youtube.com/vi/WobPKY4UZc0/2.jpg" alt="Bayes vs. the Invaders (Redivivus)"></a></p>
<p><a href="https://www.youtube.com/watch?v=WobPKY4UZc0">Click here to view the video on YouTube</a>.</p>

<blockquote><p>
<strong>Oxford Internet Institute Halloween Lecture</strong><br />
Bayes vs. the Invaders (Redivivus): A Bayesian Analysis of 70 Years of UFO Sightings<br />
<em>Prof. Joss Wright</em><br />
<em>Oxford. October 2025</em></p>
<p>The searing heat of summer retreats, cools, fades, surrendering its vitality to the flickering uncertainties of autumn. Nights draw close, like dimly-remembered friends clustering in our dreams. The spring leaves abandon their verdant dance, as they age, wither, and drift into a russet swirl of skeletal, wind-stirred fragments. </p>
<p>The dying seasons return, dragging with them time-hallowed fears and uneasy rumours, pooling around these darkly dreaming spires in a mire of primal superstition. The agonizingly brittle certainties of the modern enlightenment, our desperate faith in the gossamer fabrics of scientific progress, falter in the face of primal terrors that lurk implacably in the gloom.</p>
<p>In these darkening days, as faith in our treasured understanding dims, it is yet again time to turn our faces fully to the darkness. Halloween, slouching inexorably towards our minds, impels us as scholars to gather our methods, our theories, our data, our knowledge; and glean what light we can from the primordial glimmers of the unknown.</p>
<p>Unidentifiable aerial and marine phenomena. Impossible lights in the sky. Patterns of visitation and terror. Insidious influences from the hadal voids between the stars. Who&#8211;what&#8211;swoop and glide through the ink-black nights of our world, probing and testing our structures, our societies, our minds? From barely remembered history, to early reports of impossible objects, to blurrily evidenced documentation, data concerning flying arcane observations has grown and twisted, along with our capacity to lay them bare, to subject them to analysis, and to interrogate their secrets.</p>
<p>In this year&#8217;s OII Halloween Lecture, we will tremblingly revisit a Bayesian analysis of seventy years of UFO sightings, drawn from a dataset collected by the National UFO Reporting Center (NUFORC). Scepticism, fear, doubt, and most accepted standards of statistical rigour, will be cast aside in our unyielding and disquieting pursuit of an uncompromised truth.
</p></blockquote>
<a href="https://www.weirddatascience.net/wp-content/uploads/2025/11/2025-bayes_vs_invaders-redivivus.pdf" class="pdfemb-viewer" style="" data-width="max" data-height="max" data-mobile-width="500"  data-scrollbar="none" data-download="on" data-tracking="on" data-newwindow="on" data-pagetextbox="off" data-scrolltotop="off" data-startzoom="100" data-startfpzoom="100" data-toolbar="bottom" data-toolbar-fixed="off">2025-bayes_vs_invaders-redivivus<br/></a>
]]></content:encoded>
					
					<wfw:commentRss>https://www.weirddatascience.net/2025/11/15/bayes-vs-the-invaders-redivivus/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">10231</post-id>	</item>
		<item>
		<title>Whisperings in the Academy</title>
		<link>https://www.weirddatascience.net/2022/11/20/whisperings-in-the-academy/</link>
					<comments>https://www.weirddatascience.net/2022/11/20/whisperings-in-the-academy/#respond</comments>
		
		<dc:creator><![CDATA[moth]]></dc:creator>
		<pubDate>Sun, 20 Nov 2022 13:12:43 +0000</pubDate>
				<category><![CDATA[event]]></category>
		<category><![CDATA[maps]]></category>
		<category><![CDATA[spatial analysis]]></category>
		<category><![CDATA[stan]]></category>
		<category><![CDATA[ufo]]></category>
		<guid isPermaLink="false">https://www.weirddatascience.net/?p=4385</guid>

					<description><![CDATA[<div class="mh-excerpt">The noblest of human endeavours is to enlighten the uninitiated consciousness; to bare its awareness before the endless and terrifying vistas that lie beyond darkness and ignorance.</div> <a class="mh-excerpt-more" href="https://www.weirddatascience.net/2022/11/20/whisperings-in-the-academy/" title="Whisperings in the Academy">[...]</a>]]></description>
										<content:encoded><![CDATA[<p>The noblest of human endeavours is to enlighten the uninitiated consciousness; to bare its awareness before the endless and terrifying vistas that lie beyond darkness and ignorance.</p>
<p>In pursuit of such necessarily painful revelations the Oxford Internet Insitute at the University of Oxford &#8212; the unwitting host on which the investigations here parasitise &#8212; recently hosted an inaugural Halloween lecture. This oration drew on several years of dark explorations chronicled in this blog, to inculcuate into a new generation of unprepared and curious minds the horror and necessity of subjecting our reality to the insidious power of statistical science. Through what seems a dangerously careless oversight, this brief glimpse of truth was recorded and made available for posterity.</p>
<p><a href="https://www.youtube.com/watch?v=qaBYjnXbnWE"><img decoding="async" src="https://img.youtube.com/vi/qaBYjnXbnWE/2.jpg" alt="Whisperings in the Academy"></a></p>
<p><a href="https://www.youtube.com/watch?v=qaBYjnXbnWE">Click here to view the video on YouTube</a>.</p>

<p>&nbsp;</p>
<p>For the terminally inquisitive, the archival materials on which this work was drawn are presented here.</p>
<a href="https://www.weirddatascience.net/wp-content/uploads/2022/11/bayes_vs_invaders.pdf" class="pdfemb-viewer" style="" data-width="max" data-height="max" data-mobile-width="500"  data-scrollbar="none" data-download="on" data-tracking="on" data-newwindow="on" data-pagetextbox="off" data-scrolltotop="off" data-startzoom="100" data-startfpzoom="100" data-toolbar="bottom" data-toolbar-fixed="off">bayes_vs_invaders<br/></a>
<p>&nbsp;</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.weirddatascience.net/2022/11/20/whisperings-in-the-academy/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">4385</post-id>	</item>
		<item>
		<title>Bayes vs. the Invaders! Part Four: Convergence</title>
		<link>https://www.weirddatascience.net/2019/04/28/bayes-vs-the-invaders-part-four-convergence/</link>
					<comments>https://www.weirddatascience.net/2019/04/28/bayes-vs-the-invaders-part-four-convergence/#respond</comments>
		
		<dc:creator><![CDATA[moth]]></dc:creator>
		<pubDate>Sun, 28 Apr 2019 08:38:43 +0000</pubDate>
				<category><![CDATA[beyond the veil]]></category>
		<category><![CDATA[stan]]></category>
		<category><![CDATA[ufo]]></category>
		<guid isPermaLink="false">http://www.weirddatascience.net/?p=728</guid>

					<description><![CDATA[<div class="mh-excerpt">In the previous three posts in our series delving into the cosmic horror of UFO sightings in the United States, we have descended from the deceptively warm and sunlit waters of basic linear regression, through the increasingly frigid, stygian depths of Bayesian inference, generalised linear models, and the probabilistic programming language Stan. In this final post we will explore the implications of the murky realms in which we find ourselves, and consider the awful choices that have led us to this point. We will therefore look, with merciful brevity, at the foul truth revealed by our models, but also consider the arcane philosophies that lie sleeping beneath.</div> <a class="mh-excerpt-more" href="https://www.weirddatascience.net/2019/04/28/bayes-vs-the-invaders-part-four-convergence/" title="Bayes vs. the Invaders! Part Four: Convergence">[...]</a>]]></description>
										<content:encoded><![CDATA[<h1>Sealed and Buried</h1>
<p>In the previous three posts<span id='easy-footnote-1-728' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='https://www.weirddatascience.net/2019/04/28/bayes-vs-the-invaders-part-four-convergence/#easy-footnote-bottom-1-728' title='To immerse yourself in the full horror of our journey, the previous posts in this series are here: &lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.weirddatascience.net/index.php/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/&quot;&gt;Bayes vs. the Invaders! Part One: The 37th Parallel&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.weirddatascience.net/index.php/2019/04/08/bayes-vs-the-invaders-part-two-abnormal-distributions/&quot;&gt;Bayes vs. the Invaders! Part Two: Abnormal Distributions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.weirddatascience.net/index.php/2019/04/17/bayes-vs-the-invaders-part-three-the-parallax-view/&quot;&gt;Bayes vs. the Invaders! Part Three: The Parallax View&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
'><sup>1</sup></a></span> in our series delving into the cosmic horror of UFO sightings in the US, we have descended from the deceptively warm and sunlit waters of basic linear regression, through the increasingly frigid, stygian depths of Bayesian inference, generalised linear models, and the probabilistic programming language Stan.</p>
<p>In this final post we will explore the implications of the murky realms in which we find ourselves, and consider the awful choices that have led us to this point. We will therefore look, with merciful brevity, at the foul truth revealed by our models, but also consider the arcane philosophies that lie sleeping beneath.</p>
<h1>Deviant Interpretations</h1>
<p>Our crazed wanderings through dark statistical realms have led us eventually to a <a href="https://www.weirddatascience.net/index.php/2019/04/17/bayes-vs-the-invaders-part-three-the-parallax-view/">varying slope, varying intercept negative binomial generalised linear model</a>, whose selection was justified over its simpler cousins via leave-one-out cross-validation (LOO-CV). By interrogating the range of hyperparameters of this model, we could reproduce an alluringly satisfying visual display of the posterior predictive distribution across the United States:</p>
<figure id="attachment_703" aria-describedby="caption-attachment-703" style="width: 1920px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot.png"><img fetchpriority="high" decoding="async" data-attachment-id="703" data-permalink="https://www.weirddatascience.net/2019/04/17/bayes-vs-the-invaders-part-three-the-parallax-view/predictive_plot-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="bvi03-predictive_plot" data-image-description="" data-image-caption="&lt;p&gt;Varying intercept and slope negative binomial GLM of UFO sightings against population.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot.png" alt="" width="1920" height="1080" class="size-full wp-image-703" srcset="https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot.png 1920w, https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot-64x36.png 64w" sizes="(max-width: 1920px) 100vw, 1920px" /></a><figcaption id="caption-attachment-703" class="wp-caption-text">Varying intercept and slope negative binomial GLM of UFO sightings against population. (<a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot.pdf">PDF Version</a>)</figcaption></figure>
<p>Further, our model provides us with insight into the individual per-state intercept &#92;(\alpha&#92;) and slope &#92;(\beta&#92;) parameters of the underlying linear model, demonstrating that there is variation between the rate of sightings in US states that cannot be accounted for by their ostensibly human population.</p>
<figure id="attachment_705" aria-describedby="caption-attachment-705" style="width: 1920px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes.png"><img decoding="async" data-attachment-id="705" data-permalink="https://www.weirddatascience.net/2019/04/17/bayes-vs-the-invaders-part-three-the-parallax-view/ufo_per-state_intercepts-slopes/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="ufo_per-state_intercepts-slopes" data-image-description="" data-image-caption="&lt;p&gt;Varying slope and intercept negative binomial GLM parameter plot.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes.png" alt="" width="1920" height="1080" class="size-full wp-image-705" srcset="https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes.png 1920w, https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes-64x36.png 64w" sizes="(max-width: 1920px) 100vw, 1920px" /></a><figcaption id="caption-attachment-705" class="wp-caption-text">Varying slope and intercept negative binomial GLM parameter plot for UFO sightings model. (<a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes.pdf">PDF Version</a>)</figcaption></figure>
<p>Interpreting these parameters, however, is not as quite as simple as in a basic linear model<span id='easy-footnote-2-728' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='https://www.weirddatascience.net/2019/04/28/bayes-vs-the-invaders-part-four-convergence/#easy-footnote-bottom-2-728' title='One advantage of generalised linear models is that they allow flexibility whilst remaining relatively interpretable. As models become more complex, extending generalised linear models (GLMs) through generalised &lt;em&gt;additive&lt;/em&gt; models (GAMs), and further into the vast chaotic mass of machine learning approaches such as random forests and convolutional neural networks, there is a general tradeoff between the terrifying power of the modelling technique and the ability of the human mind to &lt;a href=&quot;TODO:call_of_cthulhu&quot;&gt;correlate its contents&lt;/a&gt;.'><sup>2</sup></a></span>. Most importantly our negative binomial GLM employs a <em>log link</em> function to relate the linear model to the data:</p>

$$\begin{eqnarray}<br />
y &amp;\sim&amp; \mathbf{NegBinomial}(\mu, \phi)&#92;&#92;<br />
\log(\mu) &amp;=&amp; \alpha + \beta x&#92;&#92;<br />
\alpha &amp;\sim&amp; \mathcal{N}(0, 1)&#92;&#92;<br />
\beta &amp;\sim&amp; \mathcal{N}(0, 1)&#92;&#92;<br />
\phi &amp;\sim&amp; \mathbf{HalfCauchy}(2)<br />
\end{eqnarray}$$</p>
<p>In a basic linear regression, &#92;(y=\alpha+\beta x&#92;), the &#92;(\alpha&#92;) parameter can be interpreted as the value of &#92;(y&#92;) when &#92;(x&#92;) is 0. Increasing the value of &#92;(x&#92;) by 1 results in a change in the &#92;(y&#92;) value of &#92;(\beta&#92;). We have, however, been drawn far beyond such naive certainties.</p>
<p>The &#92;(\alpha&#92;&#41; and &#92;(\beta&#92;) coefficients in our negative binomial GLM produce the &#92;(\log&#92;) of the &#92;(y&#92;) value: the <em>mean</em> of the negative binomial in our parameterisation.</p>
<p>With a simple rearrangement, we can being to understand the grim effects of this transformation:</p>
<p>$$\begin{array}<br />
_ &amp; \log(\mu) &amp;=&amp; \alpha + \beta x&#92;&#92;<br />
\Rightarrow &amp;\mu &amp;=&amp; \operatorname{e}^{\alpha + \beta x}&#92;&#92;<br />
\end{array}$$</p>
<p>If we set &#92;(x=0&#92;):</p>
<p>$$\begin{eqnarray}<br />
\mu_0 &amp;=&amp; \operatorname{e}^{\alpha}<br />
\end{eqnarray}$$</p>
<p>The mean of the negative binomial when &#92;(x&#92;) is 0 is therefore &#92;(\operatorname{e}^{\alpha&#125;&#92;). If we increase the value of &#92;(x&#92;) by 1:</p>
<p>$$\begin{eqnarray}<br />
\mu_1 &amp;=&amp; \operatorname{e}^{\alpha + \beta}&#92;&#92;<br />
&amp;=&amp; \operatorname{e}^{\alpha} \operatorname{e}^{\beta}<br />
\end{eqnarray}$$</p>
<p>Which, if we recall the definition of the underlying mean of our model&#8217;s negative binomial, &#92;(\mu_0&#92;), above, is:<br />
$$\mu_0 \operatorname{e}^{\beta}$$</p>
<p>The effect of an increase in &#92;(x&#92;) is therefore <em>multiplicative</em> with a log link: each increase of &#92;(x&#92;) by 1 causes the mean of the negative binomial to be further multiplied by &#92;(\operatorname{e}^{\beta}&#92;).</p>
<p>Despite this insidious complexity, in many senses our naive interpretation of these values still holds true. A higher value for the &#92;(\beta&#92;) coefficient does mean that the rate of sightings increases more swiftly with population.</p>
<p>With the full, unsettling panoply of US States laid out before us, any attempt to elucidate their many and varied deviations would be overwhelming. Broadly, we can see that both slope and intercepts are generally restricted to a fairly close range, with the 50% and 95% credible intervals notably overlapping in many cases. Despite this, there are certain unavoidable abnormalities from which we cannot, must not, shrink:</p>
<ul>
<li> Only Pennsylvania presents a slope (\(\beta\)) parameter that could be considered as potentially zero, if we consider its 95% credible interval. The correlation between population and number of sightings is otherwise unambiguously positive.</li>
<li>Delaware, whilst presenting a wide credible interval for its slope (\(\beta\)) parameter, stands out as suffering from the greatest rate of change in sightings as its population increases.</li>
<li>Both California and Utah, present suspiciously narrow credible intervals on their slope (\(\beta\)) parameters. The growth in sightings as the population increases therefore demonstrates a worrying consistency although, in both cases, this rate is amongst the lowest of all the states.</li>
</ul>
<p>We can conclude, then, that while the <em>total</em> number of sightings in Delaware are currently low, any increase in numbers of residents there appears to possess a strange fascination for visitors from beyond the inky blackness of space. By contrast, whilst our alien observers have devoted significant resources to monitoring Utah and California, their apparent willingness to devote further effort to tracking those states&#8217; burgeoning populations is low.</p>
<h1>Trembling Uncertainty</h1>
<p>One of the fundamental elements of the Bayesian approach is its willing embrace of uncertainty. The output of our eldritch inferential processes are not <em>point estimates</em> of the outcome, as in certain other approaches, but instead <em>posterior predictive distributions</em> for those outcomes. As such, if when we turn our minds to predicting new outcomes based on previously unseen data, our outcome is a <em>distribution</em> over possible values rather than a single estimate. Thus, at the dark heart of Bayesian inference is a belief in the truth that all uncertainty be quantified as probability distributions.</p>
<p>The Bayesian approach as inculcated here has a <em>predictive</em> bent to it. These intricate methods lend themselves to forecasting a distribution of possibilities before the future unveils itself. Here, we gain a horrifying glimpse into the emerging occurrence of alien visitations to the US as its people <a href="https://www.goodreads.com/work/quotes/3194841-the-war-of-the-worlds">busy themselves about their various concerns, scrutinised and studied, perhaps almost as narrowly as a man with a microscope might scrutinise the transient creatures that swarm and multiply in a drop of water</a>.</p>
<h1>Unavoidable Choices</h1>
<p>The twisted reasoning underlying this series of posts has been not only in indoctrinating others into the hideous formalities of Bayesian inference, probabilistic programming, and the arcane subtleties of the <a href="https://www.mc-stan.org">Stan</a> programming language; but also as an exercise in exposing our own minds to their horrors. As such, there is a tentative method to the madness of some of the choices made in this series that we will now elucidate.</p>
<p>Perhaps the most jarring choice has been to code these models in Stan directly, rather than using one of the excellent helper libraries that allow for more concise generation of the underlying Stan code. Both <a href="https://cran.r-project.org/package=brms"><code>brms</code></a> and <a href="https://cran.r-project.org/package=rstanarm"><code>rstanarm</code></a> possess the capacity to spawn models such as ours with greater simplicity of specification and efficiency of output, due to a number of arcane tricks. As an exercise in internalising such forbidden knowledge, however, it is useful to address reality unshielded by such swaddling conveniences.</p>
<p>In fabricating models for more practical reasons, however, we would naturally turn to these tools unless our unspeakable demands go beyond their natural scope. As a personal choice, <code>brms</code> is appealing due to its more natural construction of readable per-model Stan code to be compiled. This allows for the grotesque internals of generated models to be inspected and, if required, twisted to whatever form we desire. <code>rstanarm</code>, by contrast, avoids per-model compilation by pre-compiling more generically applicable models, but its underlying Stan code is correspondingly more arcane for an unskilled neophyte.</p>
<p>The Stan models presented in previous posts have also been constructed as simply as possible and have avoided all but the most universally accepted tricks for improving speed and stability<span id='easy-footnote-3-728' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='https://www.weirddatascience.net/2019/04/28/bayes-vs-the-invaders-part-four-convergence/#easy-footnote-bottom-3-728' title='The most important of these are to center and scale our predictors, and to operate on the log scale through use of the &lt;code&gt;log&lt;/code&gt; form of the probability distribution sampling statements. Both of these contribute significantly to speed and numerical stability in sampling, and are sufficiently universal that their inclusion seemed justified.'><sup>3</sup></a></span>. Most notably, Stan presents specific functions for GLMs based on the Poisson and negative binomial distributions that apply standard link functions directly. As mentioned, we consider it more useful for personal and public indoctrination to use the basic, albeit <code>log</code>-form parameterisations.</p>
<h1>Last Rites</h1>
<p>In concluding the dark descent of this series of posts on Bayesian inference, generalised linear models, and the unearthly effects of extraterrestrial visitions on humanity, we have applied numerous esoteric techniques to identify, describe, and quantify the relationship between human population and UFO sightings. The enigmatic model constructed throughout this and the previous three entries darkly implies that, while the rate of inexplicable aerial phenomena is inextricably and positively linked to humanity&#8217;s unchecked growth, there are nonetheless unseen factors that draw our non-terrestrial visitors to certain populations more than others, and that their focus and attention is ever more acute.</p>
<p>This series has inevitably fallen short of a full and meaningful elucidation of the techniques of Bayesian inference and Stan. From this first step on such a path, then, interested students of the bizarre and arcane would be well advised to draw on the following esoteric resources:</p>
<ul>
<li>McElreath&#8217;s <a href="https://xcelab.net/rm/statistical-rethinking/">Statistical Rethinking</a></li>
<li>Gelman et al.&#8217;s <a href="http://www.stat.columbia.edu/~gelman/book/">Bayesian Data Analysis</a></li>
<li><a href="https://mc-stan.org/">Stan Manual and Tutorials</a></li>
</ul>
<p>Until then, watch the skies and archive your data.</p>
<h2>Footnotes</h2>
]]></content:encoded>
					
					<wfw:commentRss>https://www.weirddatascience.net/2019/04/28/bayes-vs-the-invaders-part-four-convergence/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">728</post-id>	</item>
		<item>
		<title>Bayes vs. the Invaders! Part Three: The Parallax View</title>
		<link>https://www.weirddatascience.net/2019/04/17/bayes-vs-the-invaders-part-three-the-parallax-view/</link>
					<comments>https://www.weirddatascience.net/2019/04/17/bayes-vs-the-invaders-part-three-the-parallax-view/#comments</comments>
		
		<dc:creator><![CDATA[moth]]></dc:creator>
		<pubDate>Wed, 17 Apr 2019 13:35:56 +0000</pubDate>
				<category><![CDATA[beyond the veil]]></category>
		<category><![CDATA[stan]]></category>
		<category><![CDATA[ufo]]></category>
		<guid isPermaLink="false">http://www.weirddatascience.net/?p=654</guid>

					<description><![CDATA[<div class="mh-excerpt">In the <a href="http://www.weirddatascience.net/index.php/2019/04/08/bayes-vs-the-invaders-part-two-abnormal-distributions/">previous post</a> of this series unveiling the relationship between UFO sightings and population, we crossed the threshold of normality underpinning linear models to construct a <em>generalised linear model</em> based on the more theoretically satisfying Poisson distribution. On inspection, however, this model revealed itself to be less well suited to the data than we had, in our tragic ignorance, hoped.</div> <a class="mh-excerpt-more" href="https://www.weirddatascience.net/2019/04/17/bayes-vs-the-invaders-part-three-the-parallax-view/" title="Bayes vs. the Invaders! Part Three: The Parallax View">[...]</a>]]></description>
										<content:encoded><![CDATA[<h1>The Parallax View</h1>
<p>In the <a href="http://www.weirddatascience.net/index.php/2019/04/08/bayes-vs-the-invaders-part-two-abnormal-distributions/">previous post</a> of this series unveiling the relationship between UFO sightings and population, we crossed the threshold of normality underpinning linear models to construct a <em>generalised linear model</em> based on the more theoretically satisfying Poisson distribution.</p>
<p>On inspection, however, this model revealed itself to be less well suited to the data than we had, in our tragic ignorance, hoped. While it appeared, on visual inspection, to capture some features of the data, the predictive posterior density plot demonstrated that it still fell short of addressing the subtleties of the original.</p>
<p>In this post, we will seek to overcome this sad lack in two ways: firstly, we will subject our models to pitiless mathematical scrutiny to assess their ability to describe the data. With our eyes irrevocably opened to these techniques, we will construct an ever more complex <a href="https://www.youtube.com/watch?v=yRb63jt4uzw">armillary</a> with which to approach the unknowable truth.</p>
<h1>Critical Omissions of Information</h1>
<p>Our previous post showed the different fit of the Poisson model to the data from the simple Gaussian linear model. When presented with a grim array of potential simulacra, however, it is crucial to have reliable and quantitative mechanisms to select amongst them.</p>
<p>The eldritch procedure most suited to this purpose, <em>model selection</em>, in our framework, draws on <em>information criteria</em> that express the <em>relative</em> effectiveness of models at creating sad mockeries of the original data. The original and most well-known such criterion is the <a href="https://en.wikipedia.org/wiki/Akaike_information_criterion"><em>Akaike Information Criterion</em></a>, which has, in turn, spawned a multitude of successors applicable in different situations and with different properties. Here, we will make use of <a href="https://mc-stan.org/loo/"><em>Leave-One-Out Cross Validation</em></a> (LOO-CV)<span id='easy-footnote-4-654' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='https://www.weirddatascience.net/2019/04/17/bayes-vs-the-invaders-part-three-the-parallax-view/#easy-footnote-bottom-4-654' title='Specifically, leave-one-out cross validation calculated via &lt;a href=&quot;https://arxiv.org/abs/1507.04544&quot;&gt;&lt;em&gt;Pareto-smoothed importance sampling&lt;/em&gt;&lt;/a&gt;.'><sup>4</sup></a></span> as the most applicable to the style of model and set of techniques applied here.</p>
<p>It is important to reiterate that these approaches do not speak to an absolute underlying truth; information criteria allow us to choose between models, assessing which has most closely assimilated the madness and chaos of the data. For LOO-CV, this results in an <em>expected log predictive density</em> (<code>elpd</code>) for each model. The model with the lowest <code>elpd</code> is the least-warped mirror of reality amongst those we subject to scrutiny.</p>
<p>There are many fragile subtleties to model selection, of which we will mention only two here. Firstly, in general, the greater the number of predictors or variables incorporated into a model, the more closely it will be able to mimic the original data. This is problematic, in that a model can become <em>overfit</em> to the original data and thus be unable to represent previously unseen data accurately &#8212; it learns to mimic the form of the observed data at the expense of uncovering its underlying reality. The LOO-CV technique avoids this trap by, in effect, withholding data from the model to assess its ability to make accurate inferences on previously unseen data.</p>
<p>The second consideration in model selection is that the information criteria scores of models, such as (<code>elpd</code>) in LOO-CV, are subject to <em>standard error</em> in their assessment; the score itself is not a perfect metric of model performance, but a cunning approximation. As such we will only consider one model to have outperformed its competitors if the difference in their relative <code>elpd</code> is several times greater than this standard error.</p>
<p>With this understanding in hand, we can now ruthlessly quantify the effectiveness of the Gaussian linear model against the Poisson generalised linear model.</p>
<h1>Gaussian vs. the Poisson</h1>
<p>The original model presented before our subsequent descent into horror was a simple linear Gaussian, produced through use of <code>ggplot2</code>&#8216;s <code>geom_smooth</code> function. To compare this meaningfully against the Poisson model of the previous post, we must now recreate this model using the, now hideously familar, tools of Bayesian modelling with Stan.</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show Gaussian model specification code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p><code>population_model_normal.stan</code><br />
[code language=&#8221;c&#8221;]
<p>data {</p>
<p>	// Number of rows (observations)<br />
	int&lt;lower=1&gt; observations;</p>
<p>	// Predictor (population of state)<br />
	vector[ observations ] population;</p>
<p>	// Response (counts)<br />
	real&lt;lower=0&gt; counts[observations];</p>
<p>}</p>
<p>parameters {</p>
<p>	// Intercept<br />
	real&lt; lower=0 &gt; a;</p>
<p>	// Slope<br />
	real&lt; lower=0 &gt; b;</p>
<p>	// Standard deviation<br />
	real&lt; lower=0 &gt; sigma;<br />
}</p>
<p>model {</p>
<p>	// Priors<br />
	a ~ normal( 0, 5 );<br />
	b ~ normal( 0, 5 );<br />
	sigma ~ cauchy( 0, 2.5 );</p>
<p>	// Model<br />
	counts ~ normal( a + population * b, sigma );</p>
<p>}</p>
<p>generated quantities {</p>
<p>	// Posterior predictions<br />
	vector[observations] counts_pred;</p>
<p>	// Log likelihood (for LOO)<br />
	vector[observations] log_lik;</p>
<p>	for (n in 1:observations) {</p>
<p>		log_lik[n] = normal_lpdf( counts[n] | a + population[n]*b, sigma );<br />
		counts_pred[n] = normal_rng( a + population[n]*b, sigma );</p>
<p>	}</p>
<p>}</p>
[/code]
</div></div>
</div>
<p>With both models straining in their different directions towards the light, we apply LOO-CV cross validation to assess their effectiveness at predicting the data.</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show LOO-CV comparison code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p>LOO-CV R-code snippet. Full code is at the end of this post.<br />
[code language=&#8221;R&#8221;]
<p>&#8230;<br />
# Compare models with LOO<br />
log_lik_normal &lt;- extract_log_lik(fit_ufo_pop_normal, merge_chains = FALSE)<br />
r_eff_normal &lt;- relative_eff(exp(log_lik_normal))<br />
loo_normal &lt;- loo(log_lik_normal, r_eff = r_eff_normal, cores = 2)</p>
<p>log_lik_poisson &lt;- extract_log_lik(fit_ufo_pop_poisson, merge_chains = FALSE)<br />
r_eff_poisson &lt;- relative_eff(exp(log_lik_poisson))<br />
loo_poisson &lt;- loo(log_lik_poisson, r_eff = r_eff_poisson, cores = 2)<br />
&#8230;</p>
[/code]
</div></div>
</div>
<pre class="brush: plain; title: ; notranslate">
&gt; compare( loo_normal, loo_poisson )
elpd_diff        se 
  -8576.1     712.5 
</pre>
<p>The information criterion shows that the complexity of the Poisson model does not, in fact, produce a more effective model than the false serenity of the Gaussian<span id='easy-footnote-5-654' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='https://www.weirddatascience.net/2019/04/17/bayes-vs-the-invaders-part-three-the-parallax-view/#easy-footnote-bottom-5-654' title='This is perhaps not entirely surprising, based on our belief from the first post of this series that we should consider sightings at the state level rather than globally. With over fifty individual count processes fused into an amorphous mass, it is not entirely surprising that the &lt;a href=&quot;https://en.wikipedia.org/wiki/Central_limit_theorem&quot;&gt;Gaussian is a better approximation to the data&lt;/a&gt;.'><sup>5</sup></a></span>. The negative <code>elpd_diff</code> of the <code>compare</code> function supports the first of the two models, and the magnitude being over twelve times greater than the standard error leaves little doubt that the difference is significant. We must, it seems, look further.</p>
<p>With these techniques for selecting between models in hand, then, we can move on to constructing ever more complex attempts to dispel the darkness.</p>
<h1>Trials without End</h1>
<p>The Poisson distribution, whilst appropriate for many forms of count data, suffers from fundamental limits to its understanding. The single parameter of the Poisson, &#92;(\lambda&#92;), enforces that the mean and variance of the data are equal. When such comforting falsehoods wither in the pale light of reality, we must move beyond the gentle chains in which the Poisson binds us.</p>
<p>The next horrific evolution, then, is the <a href="https://en.wikipedia.org/wiki/Negative_binomial_distribution"><em>negative binomial</em></a> distribution, which similarly speaks to count data, but presents a <em>dispersion</em> parameter (&#92;(\phi&#92;)) that allows the variance to exceed the mean<span id='easy-footnote-6-654' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='https://www.weirddatascience.net/2019/04/17/bayes-vs-the-invaders-part-three-the-parallax-view/#easy-footnote-bottom-6-654' title='There are several parameterisations of the negative binomial used for different applications. The distribution itself is often characterised as representing the number of binary trials, such as a coin flip, required before a given result, such as a number of heads, is achieved. The parameterisation we use here is a common way to represent &lt;em&gt;overdispersed&lt;/em&gt; Poisson data.'><sup>6</sup></a></span>.</p>
<p>With our arcane theoretical library suitably expanded, we can now transplant the still-beating Poisson heart of our earlier generalised linear model with the more complex machinery of the negative binomial:</p>

$$\begin{eqnarray}<br />
y &amp;\sim&amp; \mathbf{NegBinomial}(\mu, \phi)&#92;&#92;<br />
\log(\mu) &amp;=&amp; \alpha + \beta x&#92;&#92;<br />
\alpha &amp;\sim&amp; \mathcal{N}(0, 1)&#92;&#92;<br />
\beta &amp;\sim&amp; \mathcal{N}(0, 1)&#92;&#92;<br />
\phi &amp;\sim&amp; \mathbf{HalfCauchy}(2)<br />
\end{eqnarray}$$</p>
<p>As with the Poisson, our negative binomial generalised linear model employs a log link function to transform the linear predictor. The Stan code for this model is given below.</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show negative binomial model specification code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p><code>population_model_negbinomial.stan</code><br />
[code language=&#8221;c&#8221;]
<p>data {</p>
<p>	// Number of rows (observations)<br />
	int&lt;lower=1&gt; observations;</p>
<p>	// Predictor (population of state)<br />
	vector[ observations ] population_raw;</p>
<p>	// Response (counts)<br />
	int&lt;lower=0&gt; counts[observations];</p>
<p>}</p>
<p>transformed data {</p>
<p>	// Center and scale the predictor<br />
	vector[ observations ] population;<br />
	population = ( population_raw &#8211; mean( population_raw ) ) / sd( population_raw );</p>
<p>}</p>
<p>parameters {</p>
<p>	// Negative binomial dispersion parameter<br />
	real phi;</p>
<p>	// Intercept<br />
	real a;</p>
<p>	// Slope<br />
	real b;</p>
<p>}</p>
<p>transformed parameters {</p>
<p>	vector&lt;lower=0&gt;[observations] mu;<br />
	mu = a + b*population;</p>
<p>}</p>
<p>model {</p>
<p>	// Priors<br />
	a ~ normal( 0, 1 );<br />
	b ~ normal( 0, 1 );<br />
	phi ~ cauchy( 0, 5 );</p>
<p>	// Model<br />
	// Uses the log version of the neg_binomial_2 to avoid<br />
	// manual exponentiation of the linear predictor.<br />
	// (This avoids numerical problems in the calculations.)<br />
	counts ~ neg_binomial_2_log( mu, phi );</p>
<p>}</p>
<p>generated quantities {</p>
<p>	vector[observations] counts_pred;<br />
	vector[observations] log_lik;</p>
<p>	for (n in 1:observations) {</p>
<p>		log_lik[n] = neg_binomial_2_log_lpmf( counts[n] | mu[n], phi );<br />
		counts_pred[n] = neg_binomial_2_log_rng( mu[n], phi );</p>
<p>	}</p>
<p>}</p>
[/code]
</div></div>
</div>
<p>With this model fit, we can compare its whispered falsehoods against both the original linear Gaussian model and the Poisson GLM:</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show LOO-CV comparison code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p>Code snippet for calculating the LOO-CV <code>elpd</code> for three models. The full R code for building and comparing all models is listed at the end of this post.</p>
[code]
[code language=&quot;R&quot;]
&#8230;<br />
# Compare models with LOO<br />
log_lik_normal &lt;- extract_log_lik(fit_ufo_pop_normal, merge_chains = FALSE)<br />
r_eff_normal &lt;- relative_eff(exp(log_lik_normal))<br />
loo_normal &lt;- loo(log_lik_normal, r_eff = r_eff_normal, cores = 2)</p>
<p>log_lik_poisson &lt;- extract_log_lik(fit_ufo_pop_poisson, merge_chains = FALSE)<br />
r_eff_poisson &lt;- relative_eff(exp(log_lik_poisson))<br />
loo_poisson &lt;- loo(log_lik_poisson, r_eff = r_eff_poisson, cores = 2)</p>
<p>log_lik_negbinom &lt;- extract_log_lik(fit_ufo_pop_negbinom, merge_chains = FALSE)<br />
r_eff_negbinom &lt;- relative_eff(exp(log_lik_negbinom))<br />
loo_negbinom &lt;- loo(log_lik_negbinom, r_eff = r_eff_negbinom, cores = 2)<br />
&#8230;<br />
[/code]
</div></div>
</div>
<pre class="brush: plain; title: ; notranslate">
&gt; compare( loo_poisson, loo_negbinom )
elpd_diff        se 
   8880.8     721.9 
</pre>
<p>With the first comparison, it is clear that the sinuous flexibility offered by the dispersion parameter, &#92;(\phi&#92;), of the negative binomial allows that model to mould itself much more effectively to the data than the Poisson. The <code>elpd_diff</code> score is positive, indicating that the second of the two compared models is favoured; the difference is over twelve times the standard error, giving us confidence that the negative binomial model is meaningfully more effective than the Poisson.</p>
<p>Whilst superior to the Poisson, does this adaptive capacity allow the negative binomial model to render the na&iuml;ve Gaussian linear model obsolete?</p>
<pre class="brush: plain; title: ; notranslate">
&gt; compare( loo_normal, loo_negbinom )
elpd_diff        se 
    304.7      30.9 
</pre>
<p>The negative binomial model subsumes the Gaussian with little effort. The <code>elpd_diff</code> is almost ten times the standard error in favour of the negative binomial GLM, giving us confidence in choosing it. From here on, we will rely on the negative binomial as the core of our schemes.</p>
<h1>Overlapping Realities</h1>
<p>The improvements we have seen with the negative binomial model allow us to discard the Gaussian and Poisson models with confidence. It is not, however, sufficient to fill the gaping void induced by our belief that the sightings of abnormal aerial phenomena in differing US states vary differently with their human population.</p>
<p>To address this question we must ascertain whether allowing our models to unpick the individual influence of states will improve their predictive ability. This, in turn, will lead us into the gnostic insanity of <em>hierarchical models</em>, in which we group predictors in our models to account for their shadowy underlying structures.</p>
<h1>Limpid Pools</h1>
<p>The first step on this path is to allow part of the linear function underpinning our model, specifically the intercept value, &#92;(\alpha&#92;), to vary between different US states. In a simple linear model, this causes the line of best fit for each state to meet the y-axis at a different point, whilst maintaining a constant slope for all states. In such a model, the result is a set of parallel lines of fit, rather than a single global truth.</p>
<p>This varying intercept can describe a range of possible phenomena for which the rate of change remains constant, but the baseline value varies. In such <em>hierarchical models</em> we employ a concept known as <em>partial pooling</em> to extract as much forbidden knowledge from the reluctant data as possible.</p>
<p>A set of entirely separate models, such as the per-state set of linear regressions presented in the first post of this series, employs a <em>no pooling</em> approach: the data of each state is treated separately, with an entirely different model fit to each. This certainly considers the uniqueness of each state, but cannot benefit from insights drawn from the broader range of data we have available, which we may reasonably assume to have some relevance.</p>
<p>By contrast, the global Gaussian, Poisson, and negative binomial models presented so far represent <em>complete pooling</em>, in which the entire set of data is considered a formless, protean amalgam without meaningful structure. This mindless, groping approach causes the unique features of each state to be lost amongst the anarchy and chaos.</p>
<p>A partial pooling approach instead builds a <em>global</em> mean intercept value across the dataset, but allows the intercept value for each individual state to deviate according to a governing probability distribution. This both accounts for the individuality of each group of observations, in our case the state, but also draws on the accumulated wisdom of the whole.</p>
<p>We now construct a partially-pooled varying intercept model, in which the parameters and observations for each US state in our dataset is individually indexed:</p>
<p>$$\begin{eqnarray}<br />
y &amp;\sim&amp; \mathbf{NegBinomial}(\mu, \phi)&#92;&#92;<br />
\log(\mu) &amp;=&amp; \alpha_i + \beta x&#92;&#92;<br />
\alpha_i &amp;\sim&amp; \mathcal{N}(\mu_\alpha, \sigma_\alpha)&#92;&#92;<br />
\beta &amp;\sim&amp; \mathcal{N}(0, 1)&#92;&#92;<br />
\phi &amp;\sim&amp; \mathbf{HalfCauchy}(2)<br />
\end{eqnarray}$$</p>
<p>Note that the intercept parameter, &#92;(\alpha&#92;), in the second line is now indexed by the state, represented here by the subscript &#92;(i&#92;). The slope parameter, &#92;(\beta&#92;), remains constant across all states.</p>
<p>This model can be rendered in Stan code as follows:</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show Gaussian model specification code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p><code>population_model_negbinomial_var_intercept.stan</code><br />
[code language=&#8221;c&#8221;]
<p>data {</p>
<p>	// Number of rows (observations)<br />
	int&lt;lower=1&gt; observations;</p>
<p>	// Number of states<br />
	int&lt; lower=0 &gt; states;</p>
<p>	// Vector detailing the US state in which each observation (count of<br />
	// counts in a year) occurred<br />
	int&lt; lower=1, upper=states &gt; state[ observations ];</p>
<p>	// Predictor (population of state)<br />
	vector[ observations ] population_raw;</p>
<p>	// Response (counts)<br />
	int&lt;lower=0&gt; counts[ observations ];</p>
<p>}</p>
<p>transformed data {</p>
<p>	// Center and scale the predictor<br />
	vector[ observations ] population;<br />
	population = ( population_raw &#8211; mean( population_raw ) ) / sd( population_raw );</p>
<p>}</p>
<p>parameters {</p>
<p>	// Per-state intercepts<br />
	vector[ states ] a;</p>
<p>	// Mean and SD of distribution from which per-state intercepts are drawn<br />
	real&lt; lower=0 &gt; mu_a;<br />
	real&lt; lower=0 &gt; sigma_a;</p>
<p>	// Negative binomial dispersion parameter<br />
	real&lt; lower=0 &gt; phi;</p>
<p>	// Slope<br />
	real b;</p>
<p>}</p>
<p>transformed parameters {</p>
<p>	// Calculate location parameter for negative binomial incorporating<br />
	// per-state indicator.<br />
	vector[ observations ] eta;</p>
<p>	for( i in 1:observations ) {<br />
		eta[i] = a[ state[i] ] + population[i] * b;<br />
	}<br />
}</p>
<p>model {</p>
<p>	mu_a ~ normal(0, 1);<br />
	sigma_a ~ cauchy(0, 2);</p>
<p>	// Priors<br />
	a ~ normal ( mu_a, sigma_a );<br />
	b ~ normal( 0, 1 );<br />
	phi ~ cauchy( 0, 2 );</p>
<p>	// Model<br />
	counts ~ neg_binomial_2_log( eta, phi );</p>
<p>}</p>
<p>generated quantities {</p>
<p>	vector[observations] counts_pred;<br />
	vector[observations] log_lik;</p>
<p>	vector[observations] mu;<br />
	mu = exp( eta );</p>
<p>	for (n in 1:observations) {</p>
<p>		log_lik[n] = neg_binomial_2_log_lpmf( counts[n] | eta[n], phi );<br />
		counts_pred[n] = neg_binomial_2_log_rng( eta[n], phi );</p>
<p>	}</p>
<p>}</p>
[/code]
</div></div>
</div>
<p>Once the model has twisted itself into the most appropriate form for our data, we can now compare it against our previous completely-pooled model:</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show LOO-CV comparison code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p>Code snippet for comparing models via LOO-CV. Full code at the end of this post.</p>
[code]
&#8230;<br />
# Compare models with LOO<br />
log_lik_negbinom_var_intercept &lt;- extract_log_lik(fit_ufo_pop_negbinom_var_intercept, merge_chains = FALSE)<br />
r_eff_negbinom_var_intercept &lt;- relative_eff(exp(log_lik_negbinom_var_intercept))<br />
loo_negbinom_var_intercept &lt;- loo(log_lik_negbinom_var_intercept, r_eff = r_eff_negbinom_var_intercept, cores = 2)<br />
&#8230;<br />
[/code]
</div></div>
</div>
<pre class="brush: plain; title: ; notranslate">
&gt; compare( loo_negbinom, loo_negbinom_var_intercept )
elpd_diff        se 
    363.2      28.8 
</pre>
<p>Our transcendent journey from the statistical primordial ooze continues: the varying intercept model is favoured over the completely-pooled model by a significant margin.</p>
<h1>Sacred Geometry</h1>
<p>Now that our minds have apprehended a startling glimpse of the implications of the varying intercept model, it is natural to consider taking a further terrible step and allowing both the slope and the intercept to vary<span id='easy-footnote-7-654' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='https://www.weirddatascience.net/2019/04/17/bayes-vs-the-invaders-part-three-the-parallax-view/#easy-footnote-bottom-7-654' title='One can build varying slope models with a fixed intercept, but we will not approach that horror here.'><sup>7</sup></a></span>.</p>
<p>With both the intercept and slope of the underlying linear predictor varying, an additional complexity raises its head: can we safely assume that these parameters, the intercept and slope, vary independently of each other, or may there be arcane correlations between them? Do states with a higher intercept also experience a higher slope in general, or is the opposite the case? Without prior knowledge to the contrary, we must allow our model to determine these possible correlations, or we are needlessly throwing away potential information in our model.</p>
<p>For a varying slope and intercept model, therefore, we must now include a <em>correlation matrix</em>, &#92;(\Omega&#92;), between the parameters of the linear predictor for each state in our model. This correlation matrix, as with all parameters in a Bayesian framework, must be expressed with a prior distribution from which the model can begin its evaluation of the data.</p>
<p>With deference to the authoritative <a href="https://mc-stan.org/docs/2_18/stan-users-guide/">quaint and curious volume of forgotten lore</a> we will use an <a href="https://mc-stan.org/docs/2_18/stan-users-guide/multivariate-hierarchical-priors-section.html">LKJ prior</a> for the correlation matrix without further discussion of the reasoning behind it.</p>
<p>$$\begin{eqnarray}<br />
y &amp;\sim&amp; \mathbf{NegBinomial}(\mu, \phi)&#92;&#92;<br />
\log(\mu) &amp;=&amp; \alpha_i + \beta x_i&#92;&#92;<br />
\begin{bmatrix}<br />
\alpha_i&#92;&#92;<br />
\beta_i<br />
\end{bmatrix} &amp;\sim&amp; \mathcal{N}(<br />
\begin{bmatrix}<br />
\mu_\alpha&#92;&#92;<br />
\mu_\beta<br />
\end{bmatrix}, \Omega )&#92;&#92;<br />
\Omega &amp;\sim&amp; \mathbf{LKJCorr}(2)&#92;&#92;<br />
\phi &amp;\sim&amp; \mathbf{HalfCauchy}(2)<br />
\end{eqnarray}$$</p>
<p>This model has grown and gained a somewhat twisted complexity compared with the serene austerity of our earliest linear model. Despite this, each further step in the descent has followed its own perverse logic, and the progression should clear. The corresponding Stan code follows:</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show negative binomial varying intercept and slope model specification code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p><code>population_model_negbinomial_var_intercept_slope.stan</code><br />
[code]
<p>data {</p>
<p>	// Number of rows (observations)<br />
	int&lt;lower=1&gt; observations;</p>
<p>	// Number of states<br />
	int&lt; lower=0 &gt; states;</p>
<p>	// Vector detailing the US state in which each observation (count of<br />
	// counts in a year) occurred<br />
	int&lt; lower=1, upper=states &gt; state[ observations ];</p>
<p>	// Predictor (population of state)<br />
	vector[ observations ] population_raw;</p>
<p>	// Response (counts)<br />
	int&lt;lower=0&gt; counts[ observations ];</p>
<p>}</p>
<p>transformed data {</p>
<p>	// Center and scale the predictor<br />
	vector[ observations ] population;<br />
	population = ( population_raw &#8211; mean( population_raw ) ) / sd( population_raw );</p>
<p>}</p>
<p>parameters {</p>
<p>	// Per-state intercepts and slopes<br />
	vector[ states ] state_intercept;<br />
	vector[ states ] state_slope;</p>
<p>	// Baseline intercept and slope from which each group deviates.<br />
	real pop_intercept;<br />
	real pop_slope;</p>
<p>	// Per-state standard deviations for intercept and slope<br />
	vector&lt; lower=0 &gt;[2] state_sigma;</p>
<p>	// Negative binomial dispersion parameter<br />
	real&lt; lower=0 &gt; phi;</p>
<p>	// Parameter correlation matrix<br />
	corr_matrix[2] omega;</p>
<p>}</p>
<p>transformed parameters {</p>
<p>	vector[2] vec_intercept_slope[ states ];<br />
	vector[2] mu_intercept_slope;</p>
<p>	// Location parameter<br />
	vector[observations] eta;</p>
<p>	// Per-state intercepts and slopes<br />
	for( i in 1:states ) {</p>
<p>		vec_intercept_slope[ i, 1] = state_intercept[i];<br />
		vec_intercept_slope[ i, 2] = state_slope[i];</p>
<p>	}</p>
<p>	// Population slope and intercept<br />
	mu_intercept_slope[1] = pop_intercept;<br />
	mu_intercept_slope[2] = pop_slope;</p>
<p>	// Calculation negbinomial location parameter<br />
	for( i in 1:observations ) {<br />
		eta[i] = state_intercept[ state[i] ] + state_slope[ state[i]] * population[ i ];<br />
	}</p>
<p>}</p>
<p>model {</p>
<p>	// Priors<br />
	omega ~ lkj_corr(2);<br />
	phi ~ cauchy(0, 3 );<br />
	state_sigma ~ cauchy( 0, 3 );</p>
<p>	pop_intercept ~ normal( 0, 1 );<br />
	pop_slope ~ normal( 0, 1 );</p>
<p>	vec_intercept_slope ~ multi_normal( mu_intercept_slope, quad_form_diag( omega, state_sigma ) );</p>
<p>	// Model<br />
	counts ~ neg_binomial_2_log( eta, phi );</p>
<p>}</p>
<p>generated quantities {</p>
<p>	vector[observations] counts_pred;<br />
	vector[observations] log_lik;</p>
<p>	for (n in 1:observations) {</p>
<p>		log_lik[n] = neg_binomial_2_log_lpmf( counts[n] | eta[n], phi );<br />
		counts_pred[n] = neg_binomial_2_log_rng( eta[n], phi );</p>
<p>	}</p>
<p>}</p>
[/code]
</div></div>
</div>
<p>The ultimate test of our faith, then, is whether the added complexity of the partially-pooled varying slope, varying intercept model is justified. Once again, we turn to the ruthless judgement of the LOO-CV:</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show LOO-CV comparison code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
Code snippet for calculating the LOO-CV <code>elpd</code>. The full R code for building and comparing all models in this post is listed at the end.</p>
[code language=&#8221;R&#8221;]
&#8230;<br />
log_lik_negbinom_var_intercept_slope &lt;- extract_log_lik(fit_ufo_pop_negbinom_var_intercept_slope, merge_chains = FALSE)<br />
r_eff_negbinom_var_intercept_slope &lt;- relative_eff(exp(log_lik_negbinom_var_intercept_slope))<br />
loo_negbinom_var_intercept_slope &lt;- loo(log_lik_negbinom_var_intercept_slope, r_eff = r_eff_negbinom_var_intercept_slope, cores = 2)<br />
&#8230;<br />
[/code]
</div></div>
</div>
<pre class="brush: plain; title: ; notranslate">
&gt; compare( loo_negbinom_var_intercept, loo_negbinom_var_intercept_slope )
elpd_diff        se 
     13.3       2.4 
</pre>
<p>In this final step we can see that our labours in the arcane have been rewarded. The final model is once again a significant improvement over its simpler relatives. Whilst the potential for deeper and more perfect models never ends, we will settle for now on this.</p>
<h1>Mortal Consequences</h1>
<p>With our final model built, we can now begin to examine its mortifying implications. We will leave the majority of the subjective analysis for the next, and final, post in this series. For now, however, we can reinforce our quantitative analysis with visual assessment of the posterior predictive distribution output of our final model.</p>
<figure id="attachment_701" aria-describedby="caption-attachment-701" style="width: 1920px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/posterior_predictive.png"><img decoding="async" data-attachment-id="701" data-permalink="https://www.weirddatascience.net/2019/04/17/bayes-vs-the-invaders-part-three-the-parallax-view/posterior_predictive-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/posterior_predictive.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="bvi03-posterior_predictive" data-image-description="" data-image-caption="&lt;p&gt;Posterior predictive density plot of varying intercept, varying slope negative binomial model of UFO sightings.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/posterior_predictive-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2019/04/posterior_predictive.png" alt="" width="1920" height="1080" class="size-full wp-image-701" srcset="https://www.weirddatascience.net/wp-content/uploads/2019/04/posterior_predictive.png 1920w, https://www.weirddatascience.net/wp-content/uploads/2019/04/posterior_predictive-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2019/04/posterior_predictive-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2019/04/posterior_predictive-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2019/04/posterior_predictive-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2019/04/posterior_predictive-64x36.png 64w" sizes="(max-width: 1920px) 100vw, 1920px" /></a><figcaption id="caption-attachment-701" class="wp-caption-text">Posterior predictive density plot of varying intercept, varying slope negative binomial GLM of UFO sightings. (<a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/posterior_predictive.pdf">PDF Version</a>)</figcaption></figure>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show posterior predictive plotting code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p><code>bayes_plots.r</code><br />
(Includes code to generate traceplot and posterior predictive distribution plot.)<br />
[code language=&#8221;R&#8221;]
<p>library( tidyverse )<br />
library( magrittr )<br />
library( lubridate )</p>
<p>library( ggplot2 )<br />
library( showtext )<br />
library( cowplot )</p>
<p>library( rstan )<br />
library( bayesplot )<br />
library( tidybayes )</p>
<p># Load UFO data<br />
ufo_population_sightings &lt;-<br />
	readRDS(&quot;work/ufo_population_sightings.rds&quot;)</p>
<p># UFO reporting font<br />
font_add( &quot;main_font&quot;, &quot;/usr/share/fonts/TTF/weird/Tox Typewriter.ttf&quot;)<br />
font_add( &quot;bold_font&quot;, &quot;/usr/share/fonts/TTF/weird/Tox Typewriter.ttf&quot;)<br />
showtext_auto()</p>
<p># Plots, posterior predictive checking, LOO. </p>
<p># (Visualisations only produced for varying slope/intercept model, as a result<br />
# of LOO checking.</p>
<p># Bayesplot needs to be told which theme to use as a default.<br />
theme_set( theme_weird() )</p>
<p># Read the fitted model<br />
fit_ufo_pop_negbinom_var_intercept_slope &lt;-<br />
	readRDS( &quot;work/fit_ufo_pop_negbinom_var_intercept_slope.rds&quot; )</p>
<p>## Model checking visualisations</p>
<p># Extract posterior estimates from the fit (from the generated quantities of the stan model)<br />
counts_pred_negbinom_var_intercept_slope &lt;- as.matrix( fit_ufo_pop_negbinom_var_intercept_slope, pars = &quot;counts_pred&quot; )</p>
<p># First, as always, a traceplot<br />
tp &lt;-<br />
	traceplot(<br />
				 fit_ufo_pop_negbinom_var_intercept_slope,<br />
				 pars = c(&quot;pop_intercept&quot;, &quot;pop_slope&quot;, &quot;phi&quot; ),<br />
				 ncol=1 ) +<br />
	scale_colour_viridis_d( name=&quot;Chain&quot;, direction=-1 ) +<br />
	theme_weird()</p>
<p>title &lt;-<br />
	ggdraw() +<br />
	draw_label(&quot;Traceplot of Key Parameters&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=20, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.40) </p>
<p>titled_tp &lt;-<br />
	plot_grid(title, tp, ncol=1, rel_heights=c(0.1, 1)) +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
	) </p>
<p>save_plot(&quot;output/traceplot.pdf&quot;,<br />
			 titled_tp,<br />
			 base_width = 16,<br />
			 base_height = 9,<br />
			 base_aspect_ratio = 1.78 )</p>
<p># Posterior predictive density. (Visual representation of goodness of fit.)<br />
gp_ppc &lt;-<br />
	ppc_dens_overlay(<br />
						  y = extract2( ufo_population_sightings, &quot;count&quot; ),<br />
						  yrep = counts_pred_negbinom_var_intercept_slope  ) +<br />
	theme_weird()</p>
<p>title &lt;-<br />
	ggdraw() +<br />
	draw_label(&quot;Posterior Predictive Density Plot&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=20, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.40) </p>
<p>titled_gp_ppc &lt;-<br />
	plot_grid(title, gp_ppc, ncol=1, rel_heights=c(0.1, 1)) +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
	) </p>
<p>save_plot(&quot;output/posterior_predictive.pdf&quot;,<br />
			 titled_gp_ppc,<br />
			 base_width = 16,<br />
			 base_height = 9,<br />
			 base_aspect_ratio = 1.78 )</p>
[/code]
</div></div>
</div>
<p>In comparison with earlier attempts, the varying intercept and slope model visibly captures the overall shape of the distribution with terrifying ease. As our wary confidence mounts in the mindless automaton we have fashioned, we can now examine its predictive ability on our original data.</p>
<figure id="attachment_703" aria-describedby="caption-attachment-703" style="width: 1920px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot.png"><img loading="lazy" decoding="async" data-attachment-id="703" data-permalink="https://www.weirddatascience.net/2019/04/17/bayes-vs-the-invaders-part-three-the-parallax-view/predictive_plot-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="bvi03-predictive_plot" data-image-description="" data-image-caption="&lt;p&gt;Varying intercept and slope negative binomial GLM of UFO sightings against population.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot.png" alt="" width="1920" height="1080" class="size-full wp-image-703" srcset="https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot.png 1920w, https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot-64x36.png 64w" sizes="auto, (max-width: 1920px) 100vw, 1920px" /></a><figcaption id="caption-attachment-703" class="wp-caption-text">Varying intercept and slope negative binomial GLM of UFO sightings against population. (<a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/predictive_plot.pdf">PDF Version</a>)</figcaption></figure>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show negative binomial varying intercept and slope plot code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p><code>population_plot.r</code><br />
[code]
<p>library( tidyverse )<br />
library( magrittr )<br />
library( lubridate )</p>
<p>library( ggplot2 )<br />
library( showtext )<br />
library( cowplot )</p>
<p>library( rstan )<br />
library( bayesplot )<br />
library( tidybayes )<br />
library( modelr )</p>
<p># Load UFO data and model<br />
ufo_population_sightings &lt;-<br />
	readRDS(&quot;work/ufo_population_sightings.rds&quot;)</p>
<p>fit_ufo_pop_negbinom_var_intercept_slope &lt;-<br />
	readRDS(&quot;work/fit_ufo_pop_negbinom_var_intercept_slope.rds&quot;)<br />
	#readRDS(&quot;work/fit_ufo_pop_normal_var_intercept_slope.rds&quot;)</p>
<p># UFO reporting font<br />
font_add( &quot;main_font&quot;, &quot;/usr/share/fonts/TTF/weird/Tox Typewriter.ttf&quot;)<br />
font_add( &quot;bold_font&quot;, &quot;/usr/share/fonts/TTF/weird/Tox Typewriter.ttf&quot;)<br />
showtext_auto()</p>
<p># Plots, posterior predictive checking, LOO<br />
theme_set( theme_weird() )</p>
<p>## Model checking visualisations</p>
<p># Extract posterior estimates from the fit (from the generated quantities of the stan model)<br />
counts_pred_negbinom_var_intercept_slope &lt;- as.matrix( fit_ufo_pop_negbinom_var_intercept_slope, pars = &quot;counts_pred&quot; )</p>
<p>## Create per-state predictive fit plots</p>
<p># Convert fitted model (stanfit) object to a tibble<br />
fit_tbl &lt;-<br />
	summary(fit_ufo_pop_negbinom_var_intercept_slope)$summary %&gt;%<br />
	as.data.frame() %&gt;%<br />
	mutate(variable = rownames(.)) %&gt;%<br />
	select(variable, everything()) %&gt;%<br />
	as_tibble()</p>
<p>counts_predicted &lt;-<br />
	fit_tbl %&gt;%<br />
	filter( str_detect(variable,&#8217;counts_pred&#8217;) ) </p>
<p>ufo_population_sightings_pred &lt;-<br />
	ufo_population_sightings %&gt;%<br />
	ungroup() %&gt;%<br />
	mutate( count_mean = counts_predicted$mean,<br />
			 lower = counts_predicted$`25%`,<br />
			 upper = counts_predicted$`75%`) </p>
<p># (Using mean and SD of fit summary)<br />
predictive_plot &lt;-<br />
	ggplot( ufo_population_sightings_pred ) +<br />
	geom_point( aes( x=population, y=count, colour=state ), size=0.6, alpha=0.8 ) +<br />
	geom_line(aes( x=population, y=count_mean, colour=state )) +<br />
	geom_ribbon(aes(x=population, ymin = lower, ymax = upper, fill=state), alpha = 0.25) +<br />
	labs( x=&quot;Population (Thousands)&quot;, y=&quot;Annual Sightings&quot; ) +<br />
	scale_fill_viridis_d( name=&quot;State&quot; ) +<br />
	scale_colour_viridis_d( name=&quot;State&quot; ) +<br />
	theme(<br />
			axis.title.y = element_text( angle=90 ),<br />
			legend.position = &quot;none&quot; )</p>
<p># Construct full plot, with title and backdrop.<br />
title &lt;-<br />
	ggdraw() +<br />
	draw_label(&quot;UFO Sightings against State Population (1990-2014)&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=20, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;Negative Binomial Hierarchical GLM. Varying slope and intercept. 50% credible intervals.&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.48) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.16) </p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data: http://www.nuforc.org | Tool: http://www.mc-stan.org&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=1, x=0.98 ) </p>
<p>predictive_plot_titled &lt;-<br />
	plot_grid(title, predictive_plot, data_label, ncol=1, rel_heights=c(0.1, 1, 0.1)) +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
	) </p>
<p>save_plot(&quot;output/predictive_plot.pdf&quot;,<br />
			 predictive_plot_titled,<br />
			 base_width = 16,<br />
			 base_height = 9,<br />
			 base_aspect_ratio = 1.78 )</p>
[/code]
</div></div>
</div>
<p>The purpose of our endeavours is to show whether or not the frequency of extraterrestrial visitations is merely a sad reflection of the number of unsuspecting humans living in each state. After seemingly endless cryptic calculations, our statistical machinery implies that there are deeper mysteries here: allowing the relationship between sightings and the underlying linear predictors to vary by state more perfectly predicts the data. There are clearly other, hidden, factors in play.</p>
<p>More than that, however, our final model allows us to quantify these differences. We can now retrieve from the very bowels of our inferential process the per-state distribution of paremeters for both the slope and intercept of the linear predictor.</p>
<figure id="attachment_705" aria-describedby="caption-attachment-705" style="width: 1920px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes.png"><img loading="lazy" decoding="async" data-attachment-id="705" data-permalink="https://www.weirddatascience.net/2019/04/17/bayes-vs-the-invaders-part-three-the-parallax-view/ufo_per-state_intercepts-slopes/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="ufo_per-state_intercepts-slopes" data-image-description="" data-image-caption="&lt;p&gt;Varying slope and intercept negative binomial GLM parameter plot.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes.png" alt="" width="1920" height="1080" class="size-full wp-image-705" srcset="https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes.png 1920w, https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes-64x36.png 64w" sizes="auto, (max-width: 1920px) 100vw, 1920px" /></a><figcaption id="caption-attachment-705" class="wp-caption-text">Varying slope and intercept negative binomial GLM parameter plot for UFO sightings model. (<a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/ufo_per-state_intercepts-slopes.pdf">PDF Version</a>)</figcaption></figure>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show per-state intercept and slope plotting code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p><code>slope_intercept_plot.r</code><br />
[code language=&#8221;R&#8221;]
<p>library( tidyverse )<br />
library( magrittr )<br />
library( lubridate )</p>
<p>library( ggplot2 )<br />
library( showtext )<br />
library( cowplot )</p>
<p>library( rstan )<br />
library( bayesplot )<br />
library( tidybayes )<br />
library( modelr )</p>
<p># Load UFO data and model<br />
ufo_population_sightings &lt;-<br />
	readRDS(&quot;work/ufo_population_sightings.rds&quot;)</p>
<p>fit_ufo_pop_negbinom_var_intercept_slope &lt;-<br />
	readRDS(&quot;work/fit_ufo_pop_negbinom_var_intercept_slope.rds&quot;)<br />
	#readRDS(&quot;work/fit_ufo_pop_normal_var_intercept_slope.rds&quot;)</p>
<p># UFO reporting font<br />
font_add( &quot;main_font&quot;, &quot;/usr/share/fonts/TTF/weird/Tox Typewriter.ttf&quot;)<br />
font_add( &quot;bold_font&quot;, &quot;/usr/share/fonts/TTF/weird/Tox Typewriter.ttf&quot;)<br />
showtext_auto()</p>
<p># Plots, posterior predictive checking, LOO<br />
theme_set( theme_weird() )</p>
<p># Use teal colour scheme<br />
color_scheme_set( &quot;teal&quot;)</p>
<p>## Model checking visualisations</p>
<p># Extract posterior estimates from the fit (from the generated quantities of the stan model)<br />
counts_pred_negbinom_var_intercept_slope &lt;- as.matrix( fit_ufo_pop_negbinom_var_intercept_slope, pars = &quot;counts_pred&quot; )</p>
<p># US state data<br />
us_state_factors &lt;-<br />
	levels( factor( ufo_population_sightings$state ) )</p>
<p># US state names for nice plotting<br />
# Data: &lt;https://www.50states.com/abbreviations.htm&gt;<br />
state_code_data &lt;-<br />
	read_csv( file=&quot;data/us_states.csv&quot; ) %&gt;%<br />
	filter( code %in% us_state_factors ) </p>
<p># Rename variables back to state names<br />
posterior_intercepts &lt;-<br />
	as.data.frame( fit_ufo_pop_negbinom_var_intercept_slope ) %&gt;%<br />
	as_tibble %&gt;%<br />
	select(starts_with(&#8216;state_intercept&#8217;) ) %&gt;%<br />
	rename_all( ~us_state_factors ) %&gt;%<br />
	rename_all( ~extract2( state_code_data, &quot;us_state&quot; ) )</p>
<p># Rename variables back to state names<br />
posterior_slopes &lt;-<br />
	as.data.frame( fit_ufo_pop_negbinom_var_intercept_slope ) %&gt;%<br />
	as_tibble %&gt;%<br />
	select(starts_with(&#8216;state_slope&#8217;) ) %&gt;%<br />
	rename_all( ~us_state_factors ) %&gt;%<br />
	rename_all( ~extract2( state_code_data, &quot;us_state&quot; ) )</p>
<p># Posterior draws combined<br />
posterior_slopes_long &lt;-<br />
	posterior_slopes %&gt;%<br />
	gather( value = &quot;slope&quot; )</p>
<p>posterior_intercepts_long &lt;-<br />
	posterior_intercepts %&gt;%<br />
	gather( value = &quot;intercept&quot; )</p>
<p>posterior_draws_long &lt;-<br />
	bind_cols( posterior_intercepts_long, posterior_slopes_long ) %&gt;%<br />
	select( -key1 ) %&gt;%<br />
	transmute( state = key, intercept, slope )</p>
<p># Interval plots (slope and intervals)<br />
# Plot intercept parameters for varying intercept and slope model<br />
gp_intercept &lt;-<br />
	mcmc_intervals( posterior_intercepts ) +<br />
	ggtitle( &quot;Intercepts&quot; ) +<br />
	theme_weird() </p>
<p># Plot slope parameters for varying intercept and slope model<br />
# (Remove y-axis labels as this will be aligned with the intercept plot.)<br />
gp_slope &lt;-<br />
	mcmc_intervals( posterior_slopes ) +<br />
	ggtitle( &quot;Slopes&quot; ) +<br />
	theme_weird() +<br />
	theme(<br />
			axis.text.y = element_blank()<br />
			)</p>
<p>gp_slope_intercept &lt;-<br />
	plot_grid( gp_intercept, gp_slope, ncol=2 )</p>
<p># Construct full plot, with title and backdrop.<br />
title &lt;-<br />
	ggdraw() +<br />
	draw_label(&quot;Per-State UFO Intercepts and Slopes&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=20, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;Mean value, 50% credible interval, and 95% credible interval shown.&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.48) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.16) </p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data: http://www.nuforc.org | Tool: http://www.mc-stan.org&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=1, x=0.98 ) </p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data: http://www.nuforc.org | Tool: http://www.mc-stan.org&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=1, x=0.98 ) </p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data: http://www.nuforc.org | Tool: http://www.mc-stan.org&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=1, x=0.98 ) </p>
<p>gp_slope_intercept_titled &lt;-<br />
	plot_grid(title, gp_slope_intercept, data_label, ncol=1, rel_heights=c(0.1, 1, 0.1)) +<br />
	theme( panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;)) </p>
<p>save_plot(&quot;output/ufo_per-state_intercepts-slopes.pdf&quot;,<br />
			 gp_slope_intercept_titled,<br />
			 base_width = 16,<br />
			 base_height = 9,<br />
			 base_aspect_ratio = 1.78 )</p>
[/code]
</div></div>
</div>
<p>It is important to note that, while we are still referring to the &#92;(\alpha&#92;) and &#92;(\beta&#92;) parameters as the slope and intercept, their interpretation is more complex in a generalised linear model with a &#92;(\log&#92;) link function than in the simple linear model. For now, however, this diagram is sufficient to show that the horror visited on innocent lives by our interstellar visitors is not purely arbitrary, but depends at least in part on geographical location.</p>
<p>With this malign inferential process finally complete we will turn, in the next post, to a trembling interpretation of the model and its dark implications for our collective future.</p>
<h2>Model Fitting and Comparison Code Listing</h2>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show full model fitting and LOO-CV comparison code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p>This code fits the range of models developed in this series, relying on the individual Stan source code files, and runs the LOO-CV comparisons discussed in this post.</p>
<p><code>population_model.r</code><br />
[code]
library( tidyverse )<br />
library( magrittr )</p>
<p>library( ggplot2 )<br />
library( showtext )</p>
<p>library( rstan )<br />
library( tidybayes )<br />
library( loo )</p>
<p># Load UFO data<br />
ufo_population_sightings &lt;-<br />
	readRDS(&quot;work/ufo_population_sightings.rds&quot;)</p>
<p>## Simple Models<br />
## Complete Pooling &#8212; all states considered identical. </p>
<p># Fit model of UFO sightings (Normal)<br />
if( not( file.exists( &quot;work/fit_ufo_pop_normal.rds&quot; ) ) ) {</p>
<p>	message(&quot;Fitting basic Normal model.&quot;)</p>
<p>	fit_ufo_pop_normal &lt;-<br />
		stan( file=&quot;model/population_model_normal.stan&quot;,<br />
			  data=list(<br />
							observations = nrow( ufo_population_sightings ),<br />
							population = extract2( ufo_population_sightings, &quot;population&quot; ),<br />
							counts = extract2( ufo_population_sightings, &quot;count&quot; )<br />
			  )<br />
		)</p>
<p>	saveRDS( fit_ufo_pop_normal, &quot;work/fit_ufo_pop_normal.rds&quot; )</p>
<p>	message(&quot;Basic Normal model fitted.&quot;)</p>
<p>} else {</p>
<p>	fit_ufo_pop_normal &lt;- readRDS( &quot;work/fit_ufo_pop_normal.rds&quot; )</p>
<p>}</p>
<p># Fit model of UFO sightings (Poisson)<br />
if( not( file.exists( &quot;work/fit_ufo_pop_poisson.rds&quot; ) ) ) {</p>
<p>	message(&quot;Fitting basic Poisson model.&quot;)</p>
<p>	fit_ufo_pop_poisson &lt;-<br />
		stan( file=&quot;model/population_model_poisson.stan&quot;,<br />
			  data=list(<br />
							observations = nrow( ufo_population_sightings ),<br />
							population_raw = extract2( ufo_population_sightings, &quot;population&quot; ),<br />
							counts = extract2( ufo_population_sightings, &quot;count&quot; )<br />
			  )<br />
		)</p>
<p>	saveRDS( fit_ufo_pop_poisson, &quot;work/fit_ufo_pop_poisson.rds&quot; )</p>
<p>	message(&quot;Basic Poisson model fitted.&quot;)</p>
<p>} else {</p>
<p>	fit_ufo_pop_poisson &lt;- readRDS( &quot;work/fit_ufo_pop_poisson.rds&quot; )</p>
<p>}</p>
<p># Fit model of UFO sightings (Negative Binomial)<br />
if( not( file.exists( &quot;work/fit_ufo_pop_negbinom.rds&quot; ) ) ) {</p>
<p>	message(&quot;Fitting basic negative binomial model.&quot;)</p>
<p>	fit_ufo_pop_negbinom &lt;-<br />
		stan( file=&quot;model/population_model_negbinomial.stan&quot;,<br />
			  data=list(<br />
							observations = nrow( ufo_population_sightings ),<br />
							population_raw = extract2( ufo_population_sightings, &quot;population&quot; ),<br />
							counts = extract2( ufo_population_sightings, &quot;count&quot; ) )<br />
		)</p>
<p>	saveRDS( fit_ufo_pop_negbinom, &quot;work/fit_ufo_pop_negbinom.rds&quot; )</p>
<p>	message(&quot;Basic negative binomial model fitted.&quot;)</p>
<p>} else {</p>
<p>	fit_ufo_pop_negbinom &lt;- readRDS( &quot;work/fit_ufo_pop_negbinom.rds&quot; )</p>
<p>}</p>
<p>## Multilevel Models<br />
## Partial Pooling (Varying Intercept)</p>
<p>if( not( file.exists( &quot;work/fit_ufo_pop_negbinom_var_intercept.rds&quot; ) ) ) {</p>
<p>	message(&quot;Fitting varying intercept negative binomial model.&quot;)</p>
<p>	fit_ufo_pop_negbinom_var_intercept &lt;-<br />
		stan( file=&quot;model/population_model_negbinomial_var_intercept.stan&quot;,<br />
			  data=list(<br />
							observations = nrow( ufo_population_sightings ),<br />
							population_raw = extract2( ufo_population_sightings, &quot;population&quot; ),<br />
							counts = extract2( ufo_population_sightings, &quot;count&quot; ),<br />
							states = length( unique( ufo_population_sightings$state ) ),<br />
							state = as.numeric( factor( ufo_population_sightings$state ) )<br />
							),<br />
			  chains=4, iter=2000,<br />
			  control = list(max_treedepth = 15, adapt_delta=0.9)<br />
		)</p>
<p>	saveRDS( fit_ufo_pop_negbinom_var_intercept, &quot;work/fit_ufo_pop_negbinom_var_intercept.rds&quot; )</p>
<p>	message(&quot;Varying intercept negative binomial model fitted.&quot;)</p>
<p>} else {</p>
<p>	fit_ufo_pop_negbinom_var_intercept &lt;- readRDS( &quot;work/fit_ufo_pop_negbinom_var_intercept.rds&quot; )</p>
<p>}</p>
<p>### Partial Pooling (Varying intercept and slope.)</p>
<p>if( not( file.exists( &quot;work/fit_ufo_pop_negbinom_var_intercept_slope.rds&quot; ) ) ) {</p>
<p>	message(&quot;Fitting varying intercept and slope negative binomial model&quot;)</p>
<p>	fit_ufo_pop_negbinom_var_intercept_slope &lt;-<br />
		stan( file=&quot;model/population_model_negbinomial_var_intercept_slope.stan&quot;,<br />
			  data=list(<br />
							observations = nrow( ufo_population_sightings ),<br />
							population_raw = extract2( ufo_population_sightings, &quot;population&quot; ),<br />
							counts = extract2( ufo_population_sightings, &quot;count&quot; ),<br />
							states = length( unique( ufo_population_sightings$state ) ),<br />
							state = as.numeric( factor( ufo_population_sightings$state ) )<br />
							),<br />
			  chains=4, iter=2000,<br />
			  control = list(max_treedepth = 12, adapt_delta=0.8)<br />
		)</p>
<p>	saveRDS( fit_ufo_pop_negbinom_var_intercept_slope, &quot;work/fit_ufo_pop_negbinom_var_intercept_slope.rds&quot; )</p>
<p>	message(&quot;Varying intercept and slope negative binomial model fitted.&quot;)</p>
<p>} else {</p>
<p>	fit_ufo_pop_negbinom_var_intercept_slope &lt;- readRDS( &quot;work/fit_ufo_pop_negbinom_var_intercept_slope.rds&quot; )</p>
<p>}</p>
<p># Hierarchical normal. (Linear regression)<br />
if( not( file.exists( &quot;work/fit_ufo_pop_normal_var_intercept_slope.rds&quot; ) ) ) {</p>
<p>	message(&quot;Fitting varying intercept and slope normal model&quot;)</p>
<p>	fit_ufo_pop_normal_var_intercept_slope &lt;-<br />
		stan( file=&quot;model/population_model_normal_var_intercept_slope.stan&quot;,<br />
			  data=list(<br />
							observations = nrow( ufo_population_sightings ),<br />
							population_raw = extract2( ufo_population_sightings, &quot;population&quot; ),<br />
							counts = extract2( ufo_population_sightings, &quot;count&quot; ),<br />
							states = length( unique( ufo_population_sightings$state ) ),<br />
							state = as.numeric( factor( ufo_population_sightings$state ) )<br />
							),<br />
			  chains=4, iter=2000,<br />
			  control = list(max_treedepth = 15, adapt_delta=0.9)<br />
		)</p>
<p>	saveRDS( fit_ufo_pop_normal_var_intercept_slope, &quot;work/fit_ufo_pop_normal_var_intercept_slope.rds&quot; )</p>
<p>	message(&quot;Varying intercept and slope normal model fitted.&quot;)</p>
<p>} else {</p>
<p>	fit_ufo_pop_normal_var_intercept_slope &lt;- readRDS( &quot;work/fit_ufo_pop_normal_var_intercept_slope.rds&quot; )</p>
<p>}</p>
<p>## Notify by text<br />
message(&quot;All models fit.&quot;)</p>
<p># Compare models with LOO<br />
log_lik_normal &lt;- extract_log_lik(fit_ufo_pop_normal, merge_chains = FALSE)<br />
r_eff_normal &lt;- relative_eff(exp(log_lik_normal))<br />
loo_normal &lt;- loo(log_lik_normal, r_eff = r_eff_normal, cores = 2)</p>
<p>log_lik_poisson &lt;- extract_log_lik(fit_ufo_pop_poisson, merge_chains = FALSE)<br />
r_eff_poisson &lt;- relative_eff(exp(log_lik_poisson))<br />
loo_poisson &lt;- loo(log_lik_poisson, r_eff = r_eff_poisson, cores = 2)</p>
<p>log_lik_negbinom &lt;- extract_log_lik(fit_ufo_pop_negbinom, merge_chains = FALSE)<br />
r_eff_negbinom &lt;- relative_eff(exp(log_lik_negbinom))<br />
loo_negbinom &lt;- loo(log_lik_negbinom, r_eff = r_eff_negbinom, cores = 2)# Compare models with LOO</p>
<p>log_lik_negbinom_var_intercept &lt;- extract_log_lik(fit_ufo_pop_negbinom_var_intercept, merge_chains = FALSE)<br />
r_eff_negbinom_var_intercept &lt;- relative_eff(exp(log_lik_negbinom_var_intercept))<br />
loo_negbinom_var_intercept &lt;- loo(log_lik_negbinom_var_intercept, r_eff = r_eff_negbinom_var_intercept, save_psis = TRUE)</p>
<p>log_lik_negbinom_var_intercept_slope &lt;- extract_log_lik(fit_ufo_pop_negbinom_var_intercept_slope, merge_chains = FALSE)<br />
r_eff_negbinom_var_intercept_slope &lt;- relative_eff(exp(log_lik_negbinom_var_intercept_slope))<br />
loo_negbinom_var_intercept_slope &lt;- loo(log_lik_negbinom_var_intercept_slope, r_eff = r_eff_negbinom_var_intercept_slope, save_psis = TRUE)</p>
<p>normal_poisson_comparison &lt;- compare( loo_normal, loo_poisson )<br />
poiss_negbinom_comparison &lt;- compare( loo_poisson, loo_negbinom )<br />
negbinom_negbinom_var_intercept_comparison &lt;- compare( loo_negbinom, loo_negbinom_var_intercept )<br />
negbinom_var_intercept_negbinom_var_intercept_slope_comparison &lt;- compare( loo_negbinom_var_intercept, loo_negbinom_var_intercept_slope )</p>
<p>saveRDS( normal_poisson_comparison, &quot;work/normal_poisson_comparison.rds&quot; )<br />
saveRDS( poiss_negbinom_comparison, &quot;work/poiss_negbinom_comparison.rds&quot; )<br />
saveRDS( negbinom_negbinom_var_intercept_comparison, &quot;work/negbinom_negbinom_var_intercept_comparison.rds&quot; )<br />
saveRDS( negbinom_var_intercept_negbinom_var_intercept_slope_comparison, &quot;work/negbinom_var_intercept_negbinom_var_intercept_slope_comparison.rds&quot; )</p>
[/code]
</div></div>
</div>
<h2>Footnotes</h2>
]]></content:encoded>
					
					<wfw:commentRss>https://www.weirddatascience.net/2019/04/17/bayes-vs-the-invaders-part-three-the-parallax-view/feed/</wfw:commentRss>
			<slash:comments>4</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">654</post-id>	</item>
		<item>
		<title>Bayes vs. the Invaders! Part Two: Abnormal Distributions</title>
		<link>https://www.weirddatascience.net/2019/04/08/bayes-vs-the-invaders-part-two-abnormal-distributions/</link>
					<comments>https://www.weirddatascience.net/2019/04/08/bayes-vs-the-invaders-part-two-abnormal-distributions/#comments</comments>
		
		<dc:creator><![CDATA[moth]]></dc:creator>
		<pubDate>Mon, 08 Apr 2019 14:48:58 +0000</pubDate>
				<category><![CDATA[beyond the veil]]></category>
		<category><![CDATA[stan]]></category>
		<category><![CDATA[ufo]]></category>
		<guid isPermaLink="false">http://www.weirddatascience.net/?p=559</guid>

					<description><![CDATA[<div class="mh-excerpt">This post continues our series on developing statistical models to explore the arcane relationship between UFO sightings and population. The simple linear model developed in the previous post is far from satisfying. It makes many unsupportable assumptions about the data and the form of the residual errors from the model. Most obviously, it relies on an underlying Gaussian (or _normal_) distribution for its understanding of the data. For our count data, some basic features of the Guassian are inappropriate. </div> <a class="mh-excerpt-more" href="https://www.weirddatascience.net/2019/04/08/bayes-vs-the-invaders-part-two-abnormal-distributions/" title="Bayes vs. the Invaders! Part Two: Abnormal Distributions">[...]</a>]]></description>
										<content:encoded><![CDATA[<h1>Crossing the Line</h1>
<p>This post continues our series on developing statistical models to explore the arcane relationship between UFO sightings and population. The previous post is available here: <a href="http://www.weirddatascience.net/index.php/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/">Bayes vs. the Invaders! Part One: The 37th Parallel</a>.</p>
<p>The simple linear model developed in the previous post is far from satisfying. It makes many unsupportable assumptions about the data and the form of the residual errors from the model. Most obviously, it relies on an underlying Gaussian (or <em>normal</em>) distribution for its understanding of the data. For our count data, some basic features of the Guassian are inappropriate.</p>
<p>Most notably:</p>
<ul>
<li> a Gaussian distribution is <a href="https://en.wikipedia.org/wiki/Probability_distribution#Continuous_probability_distribution">continuous</a> whilst counts are <a href="https://en.wikipedia.org/wiki/Probability_distribution#Discrete_probability_distribution">discrete</a> &#8212; you can&#8217;t have 2.3 UFO sightings in a given day;</li>
<li> the Gaussian can produce negative values, which are impossible when dealing with counts &#8212; you can&#8217;t have a negative number of UFO sightings;</li>
<li> the Gaussian is symmetrical around its mean value whereas count data is typically <em>skewed</em>.</li>
</ul>
<p>Moving from the safety and comfort of basic <em>linear regression</em>, then, we will delve into the madness and chaos of <em>generalized linear models</em> that allow us to choose from a range of distributions to describe the relationship between state population and counts of UFO sightings.</p>
<h1>Basic Models</h1>
<p>We will be working in a <a href="https://en.wikipedia.org/wiki/Bayesian_inference">Bayesian</a> framework, in which we assign a <em>prior distribution</em> to each parameter that allows, and requires, us to express some <em>prior knowledge</em> about the parameters of interest. These priors are the initial starting points for parameters Afrom which the model moves towards the underlying values as it learns from the data. Choice of priors can have significant effects not only on the outputs of the model, but also its ability to function effectively; as such, it is both an important, but also arcane and subtle, aspect of the Bayesian approach<span id='easy-footnote-8-559' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='https://www.weirddatascience.net/2019/04/08/bayes-vs-the-invaders-part-two-abnormal-distributions/#easy-footnote-bottom-8-559' title='This is not a full, or even partially adequate, description of the Bayesian approach to statistical inference, and as such we will skip over a number of very significant details. This includes full discussion of the Bayesian approach, the difference between informative and uninformative priors, and the reasoning behind many of the choices made with respect to priors. An excellent, hugely engaging, and practical textbook for indoctrination into the cult of Bayesian inference is &lt;a href=&quot;https://xcelab.net/rm/statistical-rethinking/&quot;&gt;Statistical Rethinking&lt;/a&gt; by Richard McElreath.'><sup>8</sup></a></span>.</p>
<p>Practically speaking, a simple linear regression can be expressed in the following form:<br />

$$y \sim \mathcal{N}(\mu, \sigma)$$</p>
<p>(Read as &#8220;&#92;(y&#92;) <em>is drawn from</em> a normal distribution with mean &#92;(\mu&#92;) and standard deviation &#92;(\sigma&#92;)&#8221;).</p>
<p>In the the above expression the model relies on a Gaussian, or <em>normal</em> <em>likelihood</em> (&#92;(\mathcal{N}&#92;)) to describe the data &#8212; making assertions regarding how we believe the underlying data was generated. The Gaussian distribution is parameterised by a <em>location parameter</em> (&#92;(\mu&#92;)) and a standard deviation (&#92;(\sigma&#92;)).</p>
<p>If we were uninterested in prediction, we could describe the <em>shape</em> of the distribution of counts (&#92;(y&#92;)) without a predictor variable. In this approach, we could specify our model by providing <em>priors</em> for &#92;(\mu&#92;) and &#92;(\sigma&#92;) that express a level of belief in their likely values:</p>
<p>$$\begin{eqnarray}<br />
y &amp;\sim&amp; \mathcal{N}(\mu, \sigma) &#92;&#92;<br />
\mu &amp;\sim&amp; \mathcal{N}(0, 1) &#92;&#92;<br />
\sigma &amp;\sim&amp; \mathbf{HalfCauchy}(2)<br />
\end{eqnarray}$$</p>
<p>This provides an initial belief as to the likely shape of the data that informs, via arcane computational procedures, the model of how the observed data approaches the underlying truth<span id='easy-footnote-9-559' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='https://www.weirddatascience.net/2019/04/08/bayes-vs-the-invaders-part-two-abnormal-distributions/#easy-footnote-bottom-9-559' title='For practical reasons related to efficient computation and numerical stability, it is common practice to &lt;em&gt;standardize&lt;/em&gt; the input data by centering and scaling it so that the mean value of the data is 0 and the variance is 1. Rather than being simply a convenience, this practice can have significant effects on both the speed and stability of models, and is highly recommended unless there is a good reason to avoid it.'><sup>9</sup></a></span>.</p>
<p>This model is less than interesting, however. It simply defines a range of possible Gaussian distributions without unveiling the horror of the underlying relationships between unsuspecting terrestrial inhabitants and anomalous events.</p>
<p>To construct such a model, relating a <em>predictor</em> to a <em>response</em>, we express those relationships as follows:</p>
<p>$$\begin{eqnarray}<br />
y &amp;\sim&amp; \mathcal{N}(\mu, \sigma) &#92;&#92;<br />
\mu &amp;=&amp; \alpha + \beta x &#92;&#92;<br />
\alpha &amp;\sim&amp; \mathcal{N}(0, 1) &#92;&#92;<br />
\beta &amp;\sim&amp; \mathcal{N}(0, 1) &#92;&#92;<br />
\sigma &amp;\sim&amp; \mathbf{HalfCauchy}(1)<br />
\end{eqnarray}$$</p>
<p>In this model, the parameters of the likelihood are now probability distributions themselves. From a traditional linear model, we now have an <em>intercept</em> (&#92;(\alpha&#92;)), and a <em>slope</em> (&#92;(\beta&#92;)) that relates the change in the predictor variable (&#92;(x&#92;)) to the change in the response. Each of these <a href="https://en.wikipedia.org/wiki/Hyperparameter"><em>hyperparameters</em></a> is fitted according to the observed dataset.</p>
<h1>A New Model</h1>
<p>We can now break free from the bonds of pure linear regression and consider other distributions that more naturally describe data of the form that we are considering. The awful power of GLMs is that they can use an underlying linear model, such &#92;(\alpha + \beta x&#92;), as parameters to a range of likelihoods beyond the Gaussian. This allows the natural description of a vast and esoteric menagerie of possible data.</p>
<p>The second key element of a generalised linear model is the <em>link function</em> that transforms the relationship between the parameters and the data into a form suitable for our twisted calculations. We can consider the link function as acting on the linear predictor &#8212; such as &#92;(\alpha + \beta x&#92;) in our example model &#8212; to represent a different relationship via a range of possible functions, many of which are inextricably bound to certain likelihood functions.</p>
<p>For count data the most commonly-chosen likelihood is the <a href="https://en.wikipedia.org/wiki/Poisson_distribution">Poisson</a> distribution, whose sole parameter is the <em>arrival rate</em> (&#92;(\lambda&#92;)). While somewhat restricted, as we will see, we can begin our descent into madness by fitting a Poisson-based model to our observed data. For Poisson-based generalised linear models, the canonical link function is the <em>log</em> &#8212;  our linear predictor, rather than directly being the parameter &#92;(\lambda&#92;) is instead the <em>logarithm</em> of &#92;(\lambda&#92;). The insidious effects of this on the output of the model will become all too obvious as we persist.</p>
<h1>Stan</h1>
<p>To fit a model, we will use the <a href="http://mc-stan.org">Stan</a> probabilistic programming language. Stan allows us to write a program defining a stastical model which can then be fit to the data using <a href="https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo">Markov-Chain Monte Carlo</a> (MCMC) methods. In effect, at a very abstract level, this approach uses a random sampling to discover the values of the parameters that best fit the observed data<span id='easy-footnote-10-559' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='https://www.weirddatascience.net/2019/04/08/bayes-vs-the-invaders-part-two-abnormal-distributions/#easy-footnote-bottom-10-559' title='For a fuller explanation of the concepts behind this approach, the &lt;a href=&quot;http://mc-stan.org&quot;&gt;Stan website&lt;/a&gt; and the book &lt;a href=&quot;https://xcelab.net/rm/statistical-rethinking/&quot;&gt;Statistical Rethinking&lt;/a&gt; are enormously valuable references. For the deepest descent into the occult mysteries, Gelman et al.&amp;#8217;s &lt;a href=&quot;http://www.stat.columbia.edu/~gelman/book/&quot;&gt;Bayesian Data Analysis&lt;/a&gt; is &lt;a href=&quot;http://www.hplovecraft.com/writings/texts/fiction/dh.aspx&quot;&gt;the key and the guardian of the gate&lt;/a&gt;.'><sup>10</sup></a></span>.</p>
<p>Stan lets us specify models in the form given above, along with ways to pass in and define the nature and form of the data. This code can then be called from R using the <code>rstan</code> package.</p>
<p>In this, and subsequent posts, we will be using Stan code directly as both a learning and explanatory exercise. In typical usage, however it is often more convenient to use one of two excellent R packages <a href="https://github.com/paul-buerkner/brms"><code>brms</code></a> or <a href="https://github.com/stan-dev/rstanarm"><code>rstanarm</code></a> that allow for more compact and convenient specification of models, with well-specified raw Stan code generated automatically.</p>
<h1>De Profundis</h1>
<p>In seeking to take our first steps beyond the <a href="http://www.hplovecraft.com/writings/texts/fiction/cc.aspx">placid island of ignorance</a> of the Gaussian, the Poisson distribution is a first step for assessing count data. Adapting the Gaussian model above, we can propose a predictive model for the entire population of states as follows:</p>
<p>$$\begin{eqnarray}<br />
y &amp;\sim&amp; \mathbf{Poisson}(\lambda) &#92;&#92;<br />
\log( \lambda ) &amp;=&amp; \alpha + \beta x &#92;&#92;<br />
\alpha &amp;\sim&amp; \mathcal{N}(0, 1) &#92;&#92;<br />
\beta &amp;\sim&amp; \mathcal{N}(0, 1)<br />
\end{eqnarray}$$</p>
<p>The sole parameter of the Poisson is the <em>arrival rate</em> (&#92;(\lambda&#92;)) that we construct here from a population-wide intercept (&#92;(\alpha&#92;)) and slope (&#92;(\beta&#92;)). Note that, in contrast to earlier models, the linear predictor is subject to the &#92;(\log&#92;) <em>link function</em>.</p>
<p>The Stan code for the above model, and associated R code to run it, is below:</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show model specification and execution code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p>population_model_poisson.stan<br />
[code language=&#8221;c&#8221;]
<p>data {</p>
<p>	// Number of rows (observations)<br />
	int&lt;lower=1&gt; observations;</p>
<p>	// Predictor (population of state)<br />
	vector[ observations ] population_raw;</p>
<p>	// Response (counts)<br />
	int&lt;lower=0&gt; counts[observations];</p>
<p>}</p>
<p>transformed data {</p>
<p>	// Center and scale the predictor<br />
	vector[ observations ] population;<br />
	population = ( population_raw &#8211; mean( population_raw ) ) / sd( population_raw );</p>
<p>}</p>
<p>parameters {</p>
<p>	// Intercept<br />
	real&lt; lower=0 &gt; a;</p>
<p>	// Slope<br />
	real&lt; lower=0 &gt; b;</p>
<p>}</p>
<p>transformed parameters {</p>
<p>	vector&lt;lower=0&gt;[ observations ] mu;<br />
	mu = a + b * population;</p>
<p>}</p>
<p>model {</p>
<p>	// Priors<br />
	a ~ normal( 0, 1 );<br />
	b ~ normal( 0, 1 );</p>
<p>	// Model using the log-parameterised poisson<br />
	counts ~ poisson_log( mu );</p>
<p>}</p>
<p>generated quantities {</p>
<p>	// Posterior predictions<br />
	vector[observations] counts_pred;</p>
<p>	// Log likelihood (for LOO)<br />
	vector[observations] log_lik;</p>
<p>	for (n in 1:observations) {</p>
<p>		log_lik[n] = poisson_log_lpmf( counts[n] | mu[n] );<br />
		counts_pred[n] = poisson_log_rng( mu[n] );</p>
<p>	}</p>
<p>}</p>
[/code]
<p>population_model_poisson.r<br />
[code language=&#8221;r&#8221;]
library( tidyverse )<br />
library( magrittr )</p>
<p>library( ggplot2 )<br />
library( showtext )</p>
<p>library( rstan )<br />
library( tidybayes )<br />
library( loo )</p>
<p># Load UFO data<br />
ufo_population_sightings &lt;-<br />
	readRDS(&quot;work/ufo_population_sightings.rds&quot;)</p>
<p># Fit model of UFO sightings (Poisson)<br />
# As this is computationally expensive, the fitted model will be<br />
# saved to disk, and the process only run if the saved model file<br />
# does not already exist.<br />
if( not( file.exists( &quot;work/fit_ufo_pop_poisson.rds&quot; ) ) ) {</p>
<p>	message(&quot;Fitting basic Poisson model.&quot;)<br />
	sms_notify(&quot;Fitting basic Poisson model.&quot;)</p>
<p>	fit_ufo_pop_poisson &lt;-<br />
		stan( file=&quot;model/population_model_poisson.stan&quot;,<br />
			  data=list(<br />
							observations = nrow( ufo_population_sightings ),<br />
							population = extract2( ufo_population_sightings, &quot;population&quot; ),<br />
							counts = extract2( ufo_population_sightings, &quot;count&quot; )<br />
			  )<br />
		)</p>
<p>	saveRDS( fit_ufo_pop_poisson, &quot;work/fit_ufo_pop_poisson.rds&quot; )</p>
<p>	message(&quot;Basic Poisson model fitted.&quot;)</p>
<p>} else {</p>
<p>	fit_ufo_pop_poisson &lt;- readRDS( &quot;work/fit_ufo_pop_poisson.rds&quot; )</p>
<p>}<br />
[/code]
</div></div>
</div>
<p>With this model encoded and fit, we can now peel back the layers of the procedure to see the extent to which it has endured the horror of our data.</p>
<p>The MCMC algorithm that underpins Stan &#8212; specifically <a href="https://en.wikipedia.org/wiki/Hamiltonian_Monte_Carlo">Hamiltonian Monte Carlo</a> (HMC) using the <a href="http://www.stat.columbia.edu/~gelman/research/unpublished/nuts.pdf">No U-Turn Sampler</a> (NUTS) &#8212; attempts to find an island of stability in the space of possibilities that corresponds to the best fit to the observed data. To do so, the algorithm spawns a set of <a href="https://en.wikipedia.org/wiki/Markov_chain">Markov chains</a> that explore the parameter space. If the model is appropriate, and the data coherent, the set of Markov chains end up <em>converging</em> to exploring a similar, small set of possible states.</p>
<h1>Validation</h1>
<p>When modelling via this approach, a first check of the model&#8217;s chances of having fit correctly is to examine the so-called &#8216;traceplot&#8217; that shows how well the separate Markov chains &#8216;mix&#8217; &#8212; that is, converge to exploring the same area of the parameter space<span id='easy-footnote-11-559' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='https://www.weirddatascience.net/2019/04/08/bayes-vs-the-invaders-part-two-abnormal-distributions/#easy-footnote-bottom-11-559' title='&amp;#8220;&lt;a href=&quot;https://arxiv.org/abs/1709.01449&quot;&gt;Visualization in Bayesian workflow&lt;/a&gt;&amp;#8221; by Gabry et al. is an excellent reference for the visual aspects of this approach to model checking.'><sup>11</sup></a></span>. For the Poisson model above, the traceplot can be created using the <code>bayesplot</code> library:</p>
<figure id="attachment_612" aria-describedby="caption-attachment-612" style="width: 1920px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_traceplot.png"><img loading="lazy" decoding="async" data-attachment-id="612" data-permalink="https://www.weirddatascience.net/2019/04/08/bayes-vs-the-invaders-part-two-abnormal-distributions/poisson_traceplot-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_traceplot.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="bvi02-poisson_traceplot" data-image-description="" data-image-caption="&lt;p&gt;Traceplot of Markov chains from Poisson model fitting.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_traceplot-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_traceplot.png" alt="Traceplot of Markov chains from Poisson model fitting." width="1920" height="1080" class="size-full wp-image-612" srcset="https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_traceplot.png 1920w, https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_traceplot-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_traceplot-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_traceplot-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_traceplot-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_traceplot-64x36.png 64w" sizes="auto, (max-width: 1920px) 100vw, 1920px" /></a><figcaption id="caption-attachment-612" class="wp-caption-text">Traceplot of Markov chains from Poisson model fitting. (<a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_traceplot.pdf">PDF Version</a>)</figcaption></figure>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show traceplot code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
[code language=&#8221;r&#8221;]
library( tidyverse )<br />
library( magrittr )<br />
library( lubridate )</p>
<p>library( ggplot2 )<br />
library( showtext )<br />
library( cowplot )</p>
<p>library( rstan )<br />
library( bayesplot )<br />
library( tidybayes )</p>
<p># Load UFO data<br />
ufo_population_sightings &lt;-<br />
	readRDS(&quot;work/ufo_population_sightings.rds&quot;)</p>
<p># UFO reporting font<br />
font_add( &quot;main_font&quot;, &quot;/usr/share/fonts/TTF/weird/Tox Typewriter.ttf&quot;)<br />
font_add( &quot;bold_font&quot;, &quot;/usr/share/fonts/TTF/weird/Tox Typewriter.ttf&quot;)<br />
showtext_auto()</p>
<p># Bayesplot needs to be told which theme to use as a default.<br />
theme_set( theme_weird() )</p>
<p># Read the fitted model<br />
fit_ufo_pop_poisson &lt;- readRDS( &quot;work/fit_ufo_pop_poisson.rds&quot; )</p>
<p># First, as always, a traceplot<br />
tp &lt;-<br />
	traceplot(<br />
				 fit_ufo_pop_poisson,<br />
				 pars = c(&quot;a&quot;, &quot;b&quot;),<br />
				 ncol=1 ) +<br />
	scale_colour_viridis_d( name=&quot;Chain&quot;, direction=-1 ) +<br />
	theme_weird()</p>
<p>title &lt;-<br />
	ggdraw() +<br />
	draw_label(&quot;Traceplot of Key Model Parameters&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=20, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.40) </p>
<p>titled_tp &lt;-<br />
	plot_grid(title, tp, ncol=1, rel_heights=c(0.1, 1)) +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
	) </p>
<p>save_plot(&quot;output/poisson_traceplot.pdf&quot;,<br />
			 titled_tp,<br />
			 base_width = 16,<br />
			 base_height = 9,<br />
			 base_aspect_ratio = 1.78 )<br />
[/code]
</div></div>
</div>
<p>These traceplots exhibit the characteristic insane scribbling of well-mixed chains often referred to, in hushed whispers, as weirdly reminiscent of a <a href="https://druedin.com/2016/12/26/that-hairy-caterpillar/">hairy caterpillar</a>; the separate lines representing each chain are clearly overlapping and exploring the same forbidding regions. If, by contrast, the lines were largely separated or did not show the same space, there would be reason to believe that our model had become lost and unable to find a coherent voice amongst the myriad babbling murmurs of the data.</p>
<p>A second check on the sanity of the modelling process is to examine the output of the model itself to show the value of the fitted parameters of interest, and some diagnostic information:</p>
<pre class="brush: r; title: ; notranslate">
fit_ufo_pop_poisson %&gt;%
summary(pars=c(&quot;a&quot;, &quot;b&quot; )) %&gt;%
extract2( &quot;summary&quot; )
       mean      se_mean          sd      2.5%       25%       50%      75%     97.5%    n_eff      Rhat
a 4.0236045 1.026568e-04 0.004851688 4.0139485 4.0203329 4.0236485 4.026829 4.0330836 2233.626 0.9995597
b 0.5070227 6.206903e-05 0.002263160 0.5027733 0.5054245 0.5069979 0.508547 0.5115027 1329.477 1.0021745
</pre>
<p>For assessment of successful model fit, the <a href="https://github.com/stan-dev/stan/wiki/Stan-Best-Practices">Rhat</a> (&#92;(\hat{R}&#92;)) value represents the extent to which the various Markov chains exploring the parameter space, of which there are four by default in Stan, are consistent with each other. As a rule of thumb, a value of &#92;(\hat{R} \gt 1.1&#92;) indicates that the model has not converged appropriately and may require a longer set of random sampling iterations, or an improved model. Here, the values of &#92;(\hat{R}&#92;) are close to the ideal value of 1.</p>
<p>As a final step, we should examine how well our model can reproduce the shape of the original data. Models aim to be eerily lifelike parodies of the truth; in a Bayesian framework, and in the Stan language, we can build into the model the ability to draw random samples from the <em>posterior predictive distribution</em> &#8212; the set of parameters that the model has learnt from the data &#8212; to create new possible values of the outcomes based on the observed inputs. This process can be repeated many times to produce a multiplicity of possible outcomes drawn from model, which we can then visualize to see graphically how well our model fits the observed data.</p>
<p>In the Stan code above, this is created in the <code>generated_quantities</code> block. When using more convenient libraries such as <code>brms</code> or <code>rstanarm</code>, draws from the posterior predictive distribution can be obtained more simply after the model has been fit through a range of helper functions. Here, we undertake the process manually.</p>
<p>We can see, then, how well the Poisson distribution, informed by our selection of priors, has shaped itself to the underlying data.</p>
<figure id="attachment_610" aria-describedby="caption-attachment-610" style="width: 1920px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_posterior_predictive.png"><img loading="lazy" decoding="async" data-attachment-id="610" data-permalink="https://www.weirddatascience.net/2019/04/08/bayes-vs-the-invaders-part-two-abnormal-distributions/poisson_posterior_predictive-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_posterior_predictive.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="bvi02-poisson_posterior_predictive" data-image-description="" data-image-caption="&lt;p&gt;Posterior predictive density plot of fitted Poisson model.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_posterior_predictive-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_posterior_predictive.png" alt="Posterior predictive density plot of fitted Poisson model." width="1920" height="1080" class="size-full wp-image-610" srcset="https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_posterior_predictive.png 1920w, https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_posterior_predictive-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_posterior_predictive-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_posterior_predictive-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_posterior_predictive-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_posterior_predictive-64x36.png 64w" sizes="auto, (max-width: 1920px) 100vw, 1920px" /></a><figcaption id="caption-attachment-610" class="wp-caption-text">Posterior predictive density plot of fitted Poisson model. (<a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_posterior_predictive.pdf">PDF Version</a>)</figcaption></figure>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show posterior predictive plot code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
[code language=&#8221;r&#8221;]
library( tidyverse )<br />
library( magrittr )<br />
library( lubridate )</p>
<p>library( ggplot2 )<br />
library( showtext )<br />
library( cowplot )</p>
<p>library( rstan )<br />
library( bayesplot )<br />
library( tidybayes )</p>
<p># Load UFO data<br />
ufo_population_sightings &lt;-<br />
	readRDS(&quot;work/ufo_population_sightings.rds&quot;)</p>
<p># UFO reporting font<br />
font_add( &quot;main_font&quot;, &quot;/usr/share/fonts/TTF/weird/Tox Typewriter.ttf&quot;)<br />
font_add( &quot;bold_font&quot;, &quot;/usr/share/fonts/TTF/weird/Tox Typewriter.ttf&quot;)<br />
showtext_auto()</p>
<p># Plots, posterior predictive checking, LOO. </p>
<p># Bayesplot needs to be told which theme to use as a default.<br />
theme_set( theme_weird() )</p>
<p># Read the fitted model<br />
fit_ufo_pop_poisson &lt;- readRDS( &quot;work/fit_ufo_pop_poisson.rds&quot; )</p>
<p>## Model checking visualisations</p>
<p># Extract posterior estimates from the fit (from the generated quantities of the stan model)<br />
counts_pred_poisson &lt;- as.matrix( fit_ufo_pop_poisson, pars = &quot;counts_pred&quot; )</p>
<p># Posterior predictive density. (Visual representation of goodness of fit.)<br />
# Sample 50 rows for overlay<br />
counts_pred_sample &lt;-<br />
	counts_pred_poisson[ sample( nrow( counts_pred_poisson ), 50 ), ]
gp_ppc &lt;-<br />
	ppc_dens_overlay(<br />
						  y = extract2( ufo_population_sightings, &quot;count&quot; ),<br />
						  yrep = counts_pred_sample,<br />
						  alpha=0.4) +<br />
	theme_weird()</p>
<p>title &lt;-<br />
	ggdraw() +<br />
	draw_label(&quot;Posterior Predictive Density Plot&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=20, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.40) </p>
<p>titled_gp_ppc &lt;-<br />
	plot_grid(title, gp_ppc, ncol=1, rel_heights=c(0.1, 1)) +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
	) </p>
<p>save_plot(&quot;output/poisson_posterior_predictive.pdf&quot;,<br />
			 titled_gp_ppc,<br />
			 base_width = 16,<br />
			 base_height = 9,<br />
			 base_aspect_ratio = 1.78 )<br />
[/code]
</div></div>
</div>
<p>In the diagram above, the yellow line shows the densities of count values; the cyan lines show a sample of twisted mockeries spawned by our <a href="https://www.collinsdictionary.com/dictionary/french-english/poisson">piscine</a> approximations. The model has roughly captured the shape of the distribution of the original data, but demonstrates certain hideous dissimilarities &#8212; the peak of the posterior predictive distribution is significantly skewed away from the observed value.</p>
<p>To appreciate the full horror of what we have wrought we can plot the predictions of the model against the real data.</p>
<figure id="attachment_618" aria-describedby="caption-attachment-618" style="width: 1920px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_predictive_plot.png"><img loading="lazy" decoding="async" data-attachment-id="618" data-permalink="https://www.weirddatascience.net/2019/04/08/bayes-vs-the-invaders-part-two-abnormal-distributions/poisson_predictive_plot-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_predictive_plot.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="bvi02-poisson_predictive_plot" data-image-description="" data-image-caption="&lt;p&gt;Global poisson GLM of UFO sightings against population.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_predictive_plot-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_predictive_plot.png" alt="Global poisson GLM of UFO sightings against population." width="1920" height="1080" class="size-full wp-image-618" srcset="https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_predictive_plot.png 1920w, https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_predictive_plot-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_predictive_plot-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_predictive_plot-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_predictive_plot-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_predictive_plot-64x36.png 64w" sizes="auto, (max-width: 1920px) 100vw, 1920px" /></a><figcaption id="caption-attachment-618" class="wp-caption-text">Global poisson GLM of UFO sightings against population. (<a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/poisson_predictive_plot.pdf">PDF Version</a>)</figcaption></figure>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show posterior predictive plot code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
[code language=&#8221;r&#8221;]
<p>library( tidyverse )<br />
library( magrittr )<br />
library( lubridate )</p>
<p>library( ggplot2 )<br />
library( showtext )<br />
library( cowplot )</p>
<p>library( rstan )<br />
library( bayesplot )<br />
library( tidybayes )<br />
library( modelr )</p>
<p># Load UFO data and model<br />
ufo_population_sightings &lt;-<br />
	readRDS(&quot;work/ufo_population_sightings.rds&quot;)</p>
<p>fit_ufo_pop_poisson &lt;-<br />
	readRDS(&quot;work/fit_ufo_pop_poisson.rds&quot;)</p>
<p># UFO reporting font<br />
font_add( &quot;main_font&quot;, &quot;/usr/share/fonts/TTF/weird/Tox Typewriter.ttf&quot;)<br />
font_add( &quot;bold_font&quot;, &quot;/usr/share/fonts/TTF/weird/Tox Typewriter.ttf&quot;)<br />
showtext_auto()</p>
<p># Plots, posterior predictive checking, LOO<br />
theme_set( theme_weird() )</p>
<p># Use teal colour scheme<br />
color_scheme_set( &quot;teal&quot;)</p>
<p>## Model checking visualisations</p>
<p># Extract posterior estimates from the fit (from the generated quantities of the stan model)<br />
counts_pred_poisson &lt;- as.matrix( fit_ufo_pop_poisson, pars = &quot;counts_pred&quot; )</p>
<p># US state data<br />
us_state_factors &lt;-<br />
	levels( factor( ufo_population_sightings$state ) )</p>
<p>## Create per-state predictive fit plots</p>
<p># Convert fitted model (stanfit) object to a tibble<br />
fit_tbl &lt;-<br />
	summary(fit_ufo_pop_poisson)$summary %&gt;%<br />
	as.data.frame() %&gt;%<br />
	mutate(variable = rownames(.)) %&gt;%<br />
	select(variable, everything()) %&gt;%<br />
	as_tibble()</p>
<p>counts_predicted &lt;-<br />
	fit_tbl %&gt;%<br />
	filter( str_detect(variable,&#8217;counts_pred&#8217;) ) </p>
<p>ufo_population_sightings_pred &lt;-<br />
	ufo_population_sightings %&gt;%<br />
	ungroup() %&gt;%<br />
	mutate( count_mean = counts_predicted$mean,<br />
			 lower = counts_predicted$`2.5%`,<br />
			 upper = counts_predicted$`97.5%`) </p>
<p># (Using mean and SD of fit summary)<br />
predictive_plot &lt;-<br />
	ggplot( ufo_population_sightings_pred ) +<br />
	geom_point( aes( x=population, y=count ), colour=&quot;#0b6788&quot;, size=0.6, alpha=0.8 ) +<br />
	geom_line(aes( x=population, y=count_mean ), colour=&quot;#3cd070&quot; ) +<br />
	geom_ribbon(aes(x=population, ymin = lower, ymax = upper ), alpha = 0.2, fill=&quot;#3cd070&quot;) +<br />
	labs( x=&quot;Population (Thousands)&quot;, y=&quot;Annual Sightings&quot; ) +<br />
	scale_fill_viridis_d( name=&quot;State&quot; ) +<br />
	scale_colour_viridis_d( name=&quot;State&quot; ) +<br />
	theme(<br />
			axis.title.y = element_text( angle=90 ),<br />
			legend.position = &quot;none&quot; )</p>
<p># Construct full plot, with title and backdrop.<br />
title &lt;-<br />
	ggdraw() +<br />
	draw_label(&quot;UFO Sightings against State Population (1990-2014)&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=20, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;Poisson GLM. 50% credible intervals.&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.48) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.16) </p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data: http://www.nuforc.org | Tool: http://www.mc-stan.org&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=1, x=0.98 ) </p>
<p>predictive_plot_titled &lt;-<br />
	plot_grid(title, predictive_plot, data_label, ncol=1, rel_heights=c(0.1, 1, 0.1)) +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
	) </p>
<p>save_plot(&quot;output/poisson_predictive_plot.pdf&quot;,<br />
			 predictive_plot_titled,<br />
			 base_width = 16,<br />
			 base_height = 9,<br />
			 base_aspect_ratio = 1.78 )</p>
[/code]
</div></div>
</div>
<p>This shows a notably different line of best fit to that produced from the basic Gaussian model in the <a href="http://www.weirddatascience.net/index.php/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/">previous post</a>. The most visible difference is the curved predictor resulting from the &#92;(\log&#92;) link function, which appears to account for the changes in the data very differently to the constrained absolute linearity of the previous Gaussian model<span id='easy-footnote-12-559' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='https://www.weirddatascience.net/2019/04/08/bayes-vs-the-invaders-part-two-abnormal-distributions/#easy-footnote-bottom-12-559' title='A direct comparison is slightly more complex, as the output of &lt;code&gt;geom_smooth&lt;/code&gt; is a frequentist 95% &lt;a href=&quot;https://en.wikipedia.org/wiki/Confidence_interval&quot;&gt;&lt;em&gt;confidence interval&lt;/em&gt;&lt;/a&gt;, whereas this plot shows the Bayesian 95% &lt;a href=&quot;https://en.wikipedia.org/wiki/Credible_interval&quot;&gt;&lt;em&gt;credible interval&lt;/em&gt;&lt;/a&gt;. The difference between the two is beyond the scope of this post, but we will resolve this in our next steps.'><sup>12</sup></a></span>. Whether this is more or less effective remains to be seen.</p>
<h1>Unsettling Distributions</h1>
<p>In this post we have opened our eyes to the weirdly non-linear possibilities of generalised linear models; sealed and bound this concept within the wild philosophy of Bayesian inference; and unleashed the horrifying capacities of Markov Chain Monte Carlo methods and their manifestation in the Stan language.</p>
<p>Applying the Poisson distribution to our records of extraterrestrial sightings, we have seen that we can, to some extent, create a mindless <a href="https://www.amazon.co.uk/Golem-Second-Should-Science-Classics/dp/1107604656">Golem</a> that imperfectly mimics the original data. In the next post, we will delve more deeply into the esoteric possibilities of other distributions for count data, explore ways in which to account for arcane relationships across and between per-state observations, and show how we can compare the effectiveness of different models to select the final glimpse of dread truth that we inadvisably seek.</p>
<h2>Footnotes</h2>
]]></content:encoded>
					
					<wfw:commentRss>https://www.weirddatascience.net/2019/04/08/bayes-vs-the-invaders-part-two-abnormal-distributions/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">559</post-id>	</item>
		<item>
		<title>Bayes vs. the Invaders! Part One: The 37th Parallel</title>
		<link>https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/</link>
					<comments>https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/#comments</comments>
		
		<dc:creator><![CDATA[moth]]></dc:creator>
		<pubDate>Wed, 03 Apr 2019 13:03:10 +0000</pubDate>
				<category><![CDATA[beyond the veil]]></category>
		<category><![CDATA[scraping]]></category>
		<category><![CDATA[stan]]></category>
		<category><![CDATA[ufo]]></category>
		<guid isPermaLink="false">http://www.weirddatascience.net/?p=503</guid>

					<description><![CDATA[<div class="mh-excerpt">From our earlier <a href="http://www.weirddatascience.net/index.php/2018/02/27/are-ufos-more-commonly-seen-near-us-military-bases/">studies of UFO sightings</a>, a recurring question has been the extent to which the frequency of sightings of inexplicable otherworldly phenomena depends on the population of an area. Intuitively: where there are more people to catch a glimpse of the unknown, there will be more reports of alien visitors. Is this hypothesis, however, true? Do UFO sightings closely follow population or are there other, less comforting, factors at work?</div> <a class="mh-excerpt-more" href="https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/" title="Bayes vs. the Invaders! Part One: The 37th Parallel">[...]</a>]]></description>
										<content:encoded><![CDATA[<h1>Introduction</h1>
<p>From our earlier <a href="http://www.weirddatascience.net/index.php/2018/02/27/are-ufos-more-commonly-seen-near-us-military-bases/">studies of UFO sightings</a>, a recurring question has been the extent to which the frequency of sightings of inexplicable otherworldly phenomena depends on the population of an area. Intuitively: where there are more people to catch a glimpse of the unknown, there will be more reports of alien visitors.</p>
<p>Is this hypothesis, however, true? Do UFO sightings closely follow population or are there other, less comforting, factors at work?</p>
<p>In this short series of posts, we will build a statistical model of UFO sightings in the United States, based on data <a href="http://www.weirddatascience.net/blog/index.php/">previously scraped</a> from the <a href="http://www.nuforc.org">National UFO Reporting Centre</a> and see how well we can predict the rate of UFO sightings based on state population.</p>
<p>This series of posts is part tutorial and part exploration of a set of modelling tools and techniques. Specifically, we will use Generalized Linear Models (GLMs), Bayesian inference, and the <a href="http://www.mc-stan.org">Stan</a> probabilistic programming language to unveil the relationship between unsuspecting populations of US states and the dread sightings of extraterrestrial truth that they experience.</p>
<h1>Data</h1>
<p>As mentioned, we will rely on data from <a href="http://www.nuforc.org">NUFORC</a> for extraterrestrial sightings.</p>
<p>For population data, we can rely on the the <a href="https://fred.stlouisfed.org/release?rid=118">FRED</a> database for historical US state-level census data. The combination of these datasets provides us with a count of UFO sightings per year for each state, and the population of that state in that year.</p>
<p>The downloading and scraping code is included here:</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show scraping code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p>ZSH script to download via `curl`<br />
[code language=&#8221;bash&#8221;]
#!/bin/zsh<br />
# Download US state-level population datasets from FRED<br />
# State series names are stored in the file &#8216;series_names&#8217; (downloaded from fred.stlouisfed.org)<br />
# &lt;https: fred.stlouisfed.org=&quot;&quot; release?rid=&quot;118&quot;&gt;<br />
#<br />
# The per-series requests is included below.&lt;/https:&gt;</p>
<p>export IFS=$&#8217;\n&#8217;</p>
<p># Download<br />
for state_series in $(cat series_names); do</p>
<p>curl -o &quot;output/$state_series.csv&quot; &quot;https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&amp;amp;chart_type=line&amp;amp;drp=0&amp;amp;fo=open%20sans&amp;amp;graph_bgcolor=%23ffffff&amp;amp;height=450&amp;amp;mode=fred&amp;amp;recession_bars=on&amp;amp;txtcolor=%23444444&amp;amp;ts=12&amp;amp;tts=12&amp;amp;width=1168&amp;amp;nt=0&amp;amp;thu=0&amp;amp;trc=0&amp;amp;show_legend=yes&amp;amp;show_axis_titles=yes&amp;amp;show_tooltip=yes&amp;amp;id=$state_series&amp;amp;scale=left&amp;amp;cosd=1900-01-01&amp;amp;coed=2018-01-01&amp;amp;line_color=%234572a7&amp;amp;link_values=false&amp;amp;line_style=solid&amp;amp;mark_type=none&amp;amp;mw=3&amp;amp;lw=2&amp;amp;ost=-99999&amp;amp;oet=99999&amp;amp;mma=0&amp;amp;fml=a&amp;amp;fq=Annual&amp;amp;fam=avg&amp;amp;fgst=lin&amp;amp;fgsnd=2009-06-01&amp;amp;line_index=1&amp;amp;transformation=lin&amp;amp;vintage_date=2019-03-04&amp;amp;revision_date=2019-03-04&amp;amp;nd=1900-01-01&quot;</p>
<p>done<br />
[/code]
<p>Necessary &#8216;series_names&#8217; file:<br />
[code language=&#8221;text&#8221;]
WAPOP<br />
GAPOP<br />
CAPOP<br />
MOPOP<br />
DSPOP<br />
ILPOP<br />
TXPOP<br />
NYPOP<br />
FLPOP<br />
ALPOP<br />
COPOP<br />
WIPOP<br />
AZPOP<br />
MIPOP<br />
NCPOP<br />
MAPOP<br />
CTPOP<br />
LAPOP<br />
OHPOP<br />
AKPOP<br />
TNPOP<br />
MNPOP<br />
NJPOP<br />
NMPOP<br />
ARPOP<br />
MDPOP<br />
PAPOP<br />
NVPOP<br />
IAPOP<br />
ORPOP<br />
T5POP<br />
DCPOP<br />
HIPOP<br />
NDPOP<br />
KYPOP<br />
VAPOP<br />
IDPOP<br />
KSPOP<br />
INPOP<br />
WVPOP<br />
RIPOP<br />
SCPOP<br />
MSPOP<br />
DEPOP<br />
MTPOP<br />
MEPOP<br />
NEPOP<br />
OKPOP<br />
WYPOP<br />
UTPOP<br />
NHPOP<br />
VTPOP<br />
SDPOP<br />
[/code]
<p>R code to combine data into tidy format<br />
[code language=&#8221;r&#8221;]
library( tidyverse )</p>
<p># Read all CSV files<br />
census_files &lt;- list.files( &quot;output&quot;, full.names=TRUE )</p>
<p># Join all data into a single table<br />
census_data &lt;-<br />
census_files %&gt;%<br />
map( read_csv ) %&gt;%																				# Read each file, forming a list with an element for each<br />
reduce( full_join, by=&quot;DATE&quot; ) %&gt;%															# Reduce (left to right) running a full join<br />
dplyr::arrange( DATE ) %&gt;%																		# Sort by date<br />
gather( key=&quot;state&quot;, value=&quot;population&quot;, -DATE ) %&gt;%									# Gather to long format<br />
transmute( date=DATE, state=str_replace( state, &quot;POP&quot;, &quot;&quot; ), population )		# Rename and tidy variables and names</p>
<p># Output to an .rds<br />
saveRDS( census_data, &quot;data/annual_population.rds&quot; )</p>
[/code]
</div></div>
</div>
<p>For ease, we will treat each year&#8217;s count of sightings as <em>independent</em> from the previous year&#8217;s &#8212; we do not make an assumption that the number of sightings in each year is based on the number of sightings in the previous year, but is rather due to the unknowable schemes of alien minds. (If extraterrestrials visitors were colonising areas in secrecy rather than making sporadic visits, and thus being seen repeatedly, we might not want to make such a bold assumption.) Each annual count will be treated as an individual, independent data point relating population to count, with each observation tagged by state.</p>
<p>For simplicity, particularly in building later models, we will restrict ourselves to sightings post 1990, roughly reflecting a period in which the NUFORC data sees a significant increase in reporting and thus relies less on historical reports. (NUFORC&#8217;s phone hotline has existed since 1974, and its web form since 1998.)</p>
<h1>An Awful Simplicity</h1>
<p>To begin, we start with the most basic form of model: a simple linear relationship between the count of sightings and the population of the state at that time. If sightings were purely dependent on population, it might be reasonable to assume that such a model would fit the data fairly well.</p>
<p>This relationship can be plotted with relative ease using the <code>geom_smooth()</code> function of <code>ggplot2</code> in R. For opening our eyes to the awful truth contained in the data, this is a useful first step.</p>
<figure id="attachment_539" aria-describedby="caption-attachment-539" style="width: 1920px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined.png"><img loading="lazy" decoding="async" data-attachment-id="539" data-permalink="https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/lm_ufo_population_sightings-combined-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="lm_ufo_population_sightings-combined" data-image-description="" data-image-caption="&lt;p&gt;Global linear regression of UFO sightings against population.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined.png" alt="Regression of UFO sightings against population." width="1920" height="1080" class="size-full wp-image-539" srcset="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined.png 1920w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined-64x36.png 64w" sizes="auto, (max-width: 1920px) 100vw, 1920px" /></a><figcaption id="caption-attachment-539" class="wp-caption-text">Global linear regression of UFO sightings against population. (<a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined.pdf">PDF Version</a>)</figcaption></figure>
<p>While this graph does seem to support the argument that sightings increase with population <em>in general</em>, a closer inspection shows that the individual data points are clearly clustered. If we highlight the location of each data point, colouring points by US state, this becomes clearer:</p>
<figure id="attachment_541" aria-describedby="caption-attachment-541" style="width: 1920px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state.png"><img loading="lazy" decoding="async" data-attachment-id="541" data-permalink="https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/lm_ufo_population_sightings-state-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="lm_ufo_population_sightings-state" data-image-description="" data-image-caption="&lt;p&gt;Global linear regression of UFO sightings against population with per-state colours.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state.png" alt="Global linear regression of UFO sightings against population with per-state colours." width="1920" height="1080" class="size-full wp-image-541" srcset="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state.png 1920w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state-64x36.png 64w" sizes="auto, (max-width: 1920px) 100vw, 1920px" /></a><figcaption id="caption-attachment-541" class="wp-caption-text">Global linear regression of UFO sightings against population with per-state colours. (<a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state.pdf">PDF Version</a>)</figcaption></figure>
<p>This strongly suggests that, in preference to the simple linear model across all sightings, we might instead fit a linear model individually to each state:</p>
<figure id="attachment_543" aria-describedby="caption-attachment-543" style="width: 1920px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends.png"><img loading="lazy" decoding="async" data-attachment-id="543" data-permalink="https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/lm_ufo_population_sightings-trends-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="lm_ufo_population_sightings-trends" data-image-description="" data-image-caption="&lt;p&gt;Per-state linear regression of UFO sightings against population,&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends.png" alt="Per-state linear regression of UFO sightings against population," width="1920" height="1080" class="size-full wp-image-543" srcset="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends.png 1920w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends-64x36.png 64w" sizes="auto, (max-width: 1920px) 100vw, 1920px" /></a><figcaption id="caption-attachment-543" class="wp-caption-text">Per-state linear regression of UFO sightings against population. (<a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends.pdf">PDF Version</a>)</figcaption></figure>
<p>The code to produce the above graphs from the NUFORC and FRED data is given below:</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show data preparation and visualization code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p>Prepare and combine datasets:<br />
[code language=&#8221;r&#8221;]
library( tidyverse )<br />
library( magrittr )<br />
library( lubridate )</p>
<p># Prepare data for model fitting (and plotting)</p>
<p># Load US population and UFO datasets<br />
ufo &lt;- read_csv( &quot;data/ufo_spatial.csv&quot; )<br />
census &lt;- readRDS( &quot;data/annual_population.rds&quot; )</p>
<p># Process UFO data to per-state counts per year.<br />
# Drop Puerto Rico as we don&#8217;t have census data. (Also, very few sightings &#8212; 33 in dataset.)<br />
ufo_state_annual &lt;-<br />
	ufo %&gt;%<br />
	# US only<br />
	filter( country == &quot;us&quot; ) %&gt;%<br />
	# Apologies to Puerto Rico.<br />
	filter( state != &quot;pr&quot; ) %&gt;%<br />
	# Convert date to year, drop all other variables except state.<br />
	transmute( date = year( as.POSIXct( datetime, format=&quot;%m/%d/%Y %H:%M&quot; ) ), state=str_to_upper( state ) ) %&gt;%<br />
	# Group by year<br />
	group_by( date, state ) %&gt;%<br />
	# Sum sightings<br />
	summarize( count = n() )</p>
<p># Process census suitable for joining with UFO sightings.<br />
# Drop &quot;DS&quot; state entry &#8212; (&quot;Department of State&quot;?)<br />
census &lt;-<br />
	census %&gt;%<br />
	filter( state != &quot;DS&quot; ) %&gt;%<br />
	mutate( date=year( date ) ) </p>
<p># Join datasets<br />
ufo_population_sightings &lt;-<br />
	full_join( ufo_state_annual, census )</p>
<p># Missing data implies zero sightings.<br />
# Restrict to post-1990 to avoid a high proportion of very small numbers of<br />
# sightings.<br />
ufo_population_sightings &lt;-<br />
	ufo_population_sightings %&gt;%<br />
	mutate( count = replace_na( count, 0 ) ) %&gt;%<br />
	filter( !is.na( population ) ) %&gt;%<br />
	filter( date &gt;= 1990 ) %&gt;%<br />
	filter( date &lt;= 2014 )</p>
<p>saveRDS( ufo_population_sightings, &quot;work/ufo_population_sightings.rds&quot; )<br />
[/code]
<p>Fit linear trend in data via <code>geom_smooth()</code> using a linear model.<br />
[code language=&#8221;r&#8221;]
library( tidyverse )<br />
library( magrittr )<br />
library( lubridate )</p>
<p>library( ggplot2 )<br />
library( showtext )<br />
library( RColorBrewer )</p>
<p>library( cowplot )</p>
<p># Load UFO data<br />
ufo_population_sightings &lt;-<br />
	readRDS(&quot;work/ufo_population_sightings.rds&quot;)</p>
<p># UFO reporting font<br />
font_add( &quot;main_font&quot;, &quot;/usr/share/fonts/TTF/weird/Tox Typewriter.ttf&quot;)<br />
showtext_auto()</p>
<p># Combined plot ignoring states.<br />
ufo_pop_plot &lt;-<br />
	ggplot( ufo_population_sightings, aes( x=population, y=count )  ) +<br />
	geom_point( colour=&quot;#0b6788&quot;, size=0.6, alpha=0.8 ) +<br />
	geom_smooth( method=&quot;lm&quot;, colour=&quot;#3cd070&quot; ) +  # UFO green<br />
	xlab( &quot;Population&quot; ) +<br />
	ylab( &quot;Sightings per annum&quot; ) +<br />
	theme_weird() +<br />
	theme(<br />
			axis.title.y = element_text( angle=90 )<br />
			)</p>
<p># Construct full plot, with title and backdrop.<br />
title &lt;-<br />
	ggdraw() +<br />
	draw_label(&quot;UFO Sightings against State Population (1990-2014)&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=20, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.40) </p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data: http://www.nuforc.org&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=1, x=0.98 ) </p>
<p>ufo_pop_titled &lt;-<br />
	plot_grid(title, ufo_pop_plot, data_label, ncol=1, rel_heights=c(0.1, 1, 0.1)) +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
	) </p>
<p>save_plot(&quot;output/lm_ufo_population_sightings-combined.pdf&quot;,<br />
			 ufo_pop_titled,<br />
			 base_width = 16,<br />
			 base_height = 9,<br />
			 base_aspect_ratio = 1.78 )</p>
<p># Combined plot colouring states.<br />
ufo_pop_plot_states &lt;-<br />
	ggplot( ufo_population_sightings, aes( x=population, y=count )  ) +<br />
	geom_point( aes( colour=state ), size=0.6, alpha=0.8 ) +<br />
	geom_smooth( method=&quot;lm&quot;, colour=&quot;#3cd070&quot; ) +  # UFO green<br />
	xlab( &quot;Population&quot; ) +<br />
	ylab( &quot;Sightings per annum&quot; ) +<br />
	scale_colour_manual( values=rep( brewer.pal( name=&quot;Set3&quot;, n=12 ), times=5 ) ) +<br />
	theme_weird() +<br />
	theme(<br />
			axis.title.y = element_text( angle=90 ),<br />
			legend.position=&quot;none&quot;<br />
			)</p>
<p># Construct full plot, with title and backdrop.<br />
title &lt;-<br />
	ggdraw() +<br />
	draw_label(&quot;UFO Sightings against State Population (1990-2014)&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=20, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;(Per-state sightings)&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=16, hjust=0, vjust=1, x=0.02, y=0.48) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.16) </p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data: http://www.nuforc.org&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=1, x=0.98 ) </p>
<p>ufo_pop_states_titled &lt;-<br />
	plot_grid(title, ufo_pop_plot_states, data_label, ncol=1, rel_heights=c(0.1, 1, 0.1)) +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
	) </p>
<p>save_plot(&quot;output/lm_ufo_population_sightings-state.pdf&quot;,<br />
			 ufo_pop_states_titled,<br />
			 base_width = 16,<br />
			 base_height = 9,<br />
			 base_aspect_ratio = 1.78 )</p>
<p># Combined plot colouring states with per-state trend lines.<br />
ufo_pop_plot_states_trends &lt;-<br />
	ggplot( ufo_population_sightings, aes( x=population, y=count )  ) +<br />
	geom_point( aes( colour=state ), size=0.6, alpha=0.8 ) +<br />
	geom_smooth( method=&quot;lm&quot;, aes( colour=state ) ) +<br />
	xlab( &quot;Population&quot; ) +<br />
	ylab( &quot;Sightings Per Annum&quot; ) +<br />
	scale_colour_manual( values=rep( brewer.pal( name=&quot;Set3&quot;, n=12 ), times=5 ) ) +<br />
	theme_weird() +<br />
	theme(<br />
			axis.title.y = element_text( angle=90 ),<br />
			legend.position=&quot;none&quot;<br />
			)</p>
<p># Construct full plot, with title and backdrop.<br />
title &lt;-<br />
	ggdraw() +<br />
	draw_label(&quot;UFO Sightings against State Population (1990-2014)&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=20, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;(Per-state trends)&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=16, hjust=0, vjust=1, x=0.02, y=0.48) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.16) </p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data: http://www.nuforc.org&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=1, x=0.98 ) </p>
<p>ufo_pop_states_trends_titled &lt;-<br />
	plot_grid(title, ufo_pop_plot_states_trends, data_label, ncol=1, rel_heights=c(0.1, 1, 0.1)) +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
	) </p>
<p>save_plot(&quot;output/lm_ufo_population_sightings-trends.pdf&quot;,<br />
			 ufo_pop_states_trends_titled,<br />
			 base_width = 16,<br />
			 base_height = 9,<br />
			 base_aspect_ratio = 1.78 )</p>
[/code]
</div></div>
</div>
<h1>Result</h1>
<p>The plots shown here strongly indicate that the rate of dread interplanetary visitations per capita varies differently per state. It seems, therefore, that while the number of sightings is generally proportional to population, the specific relationship is state-dependent.</p>
<p>This simple linear model is, however, entirely unsatisfactory in describing the data, despite its support for the argument that different states have different underlying rates of sightings.</p>
<p>In the next post, therefore, we will delve deeper into the unsettling relationships between UFO sightings and the innocent humans to which they are drawn. To do so, we will have to consider a class of techniques that go beyond the normal distribution that underpins key assumptions of the simple linear models used here, and so move into the eldritch world of <em>generalized linear models</em>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/feed/</wfw:commentRss>
			<slash:comments>6</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">503</post-id>	</item>
		<item>
		<title>The Shape of the Other: The Evolution of UFO Sightings by Shape</title>
		<link>https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/</link>
					<comments>https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/#comments</comments>
		
		<dc:creator><![CDATA[moth]]></dc:creator>
		<pubDate>Wed, 20 Jun 2018 16:09:02 +0000</pubDate>
				<category><![CDATA[scraping]]></category>
		<category><![CDATA[ufo]]></category>
		<guid isPermaLink="false">http://www.weirddatascience.net/blog/?p=331</guid>

					<description><![CDATA[<div class="mh-excerpt">In earlier analyses of the UFO phenomenon, based on the NUFORC dataset, we have examined the global density of sightings, and the relative distribution of sightings against the location of military bases in the United States. All of these analyses have, however, considered individual sightings to be more <a class="mh-excerpt-more" href="https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/" title="The Shape of the Other: The Evolution of UFO Sightings by Shape">[...]</a></div>]]></description>
										<content:encoded><![CDATA[<p>In earlier analyses of the UFO phenomenon, based on the NUFORC dataset, we have examined the global density of sightings, and the relative distribution of sightings against the location of military bases in the United States. All of these analyses have, however, considered individual sightings to be more or less equivalent.</p>
<p>The NUFORC dataset, however, provides much more detailed information on individual sightings. The most significant immediate feature of each report, beyond its time and location, is the recorded shape of each object. Was the reported UFO saucer-shaped? Triangular? A flash of light? Or did the individual see more more than one object moving in formation? By considering this aspect of the data we can interrogate more closely the nature of UFO sightings over the years.</p>
<p>The NUFORC dataset classifies each sighting as one of 46 possible shapes, with approximately three percent of entries not being classified directly. Of those 46, several categories overlap each other; &#8220;Triangle&#8221;, &#8220;triangle&#8221;, and &#8220;Triangular&#8221; are all, for example, possibilities. Additionally, the dataset contains both &#8220;other&#8221; and &#8220;unknown&#8221; categories.</p>
<p>With a minimal level of cleaning we are left with 26 categories, including the familiar circular objects, but also &#8220;crescent&#8221; (2 entries), &#8220;hexagon&#8221; (1 entry), and &#8220;cross&#8221; (356 entries). For easier representation and analysis, we have collapsed several infrequent and similar categories together, resulting in eight top-level categories distributed in the following way:</p>
<figure id="attachment_341" aria-describedby="caption-attachment-341" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer.png"><img loading="lazy" decoding="async" data-attachment-id="341" data-permalink="https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/all_frequency-fewer-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="all_frequency-fewer" data-image-description="" data-image-caption="&lt;p&gt;Frequency of UFO sightings by shape.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer-1024x576.png" alt="Frequency of UFO sightings by shape." width="1024" height="576" class="size-large wp-image-341" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-341" class="wp-caption-text">Frequency of UFO sightings by shape. (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer.pdf" rel="noopener noreferrer" target="_blank">PDF Version</a>.) | <a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency.png">Frequency of UFO sightings by shape (all categories).</a> (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency.pdf" rel="noopener noreferrer" target="_blank">PDF Version</a>.)</figcaption></figure>
<p>We can clearly see from this that lights are the most commonly-reported extraterrestrial manifestation, closely followed by the category of &#8220;round&#8221; objects that most closely matches, perhaps, the traditional concept of a UFO sighting. This category does, however, extend to spheres, disks, ovals, domes, eggs, and cones.</p>
<p>This breakdown of frequency is somewhat deceptive: the sightings reported in the NUFORC database span from a reported 1400CE (a roughly-dated cave painting in Texas depicting a saucer-shaped object) to the present day. For reliability, we have discounted reports prior to 1900CE from our analysis. In our data, then, are these sightings consistent over time? Has the form and nature of our extraterrestrial visitors shifted in recent history? Are we naively assuming that all objects are from the same source, and with similar intentions?</p>
<p>At the most mundane level, the total volume of sightings has sharply increased since the early reports in the dataset. The total number of reported sightings in the 1940s was 144 in total, compared with 4934 sightings in 2017 alone, and a peak of 8651 sightings in 2014.</p>
<p>Broken down by category, the total number of sightings since 1945 is shown below. We have removed sightings prior to 1945 from this diagram, as they were sufficiently low in volume that they were not visible. The most marked rise in sightings begins in the mid-1990s, with 502 sightings in 1994 rising to 1467 in 1995, with the overall rising trend following until its peak in 2014.</p>
<figure id="attachment_343" aria-describedby="caption-attachment-343" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time.png"><img loading="lazy" decoding="async" data-attachment-id="343" data-permalink="https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/count_over_time-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="count_over_time" data-image-description="" data-image-caption="&lt;p&gt;Count of sightings by shape over time.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time-1024x576.png" alt="Count of sightings by shape over time." width="1024" height="576" class="size-large wp-image-343" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-343" class="wp-caption-text">Count of sightings by shape over time. (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time.pdf">PDF Version</a>.)</figcaption></figure>
<p>To understand the specific nature of visitations, however, it is useful to view sightings as a proportion of the total, rather than their absolute numbers.</p>
<figure id="attachment_349" aria-describedby="caption-attachment-349" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time.png"><img loading="lazy" decoding="async" data-attachment-id="349" data-permalink="https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/proportion_over_time-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="proportion_over_time" data-image-description="" data-image-caption="&lt;p&gt;Proportion of UFO sightings by shape over time.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time-1024x576.png" alt="Proportion of UFO sightings by shape over time." width="1024" height="576" class="size-large wp-image-349" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-349" class="wp-caption-text">Proportion of UFO sightings by shape over time. (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time.pdf">PDF Version.</a>)</figcaption></figure>
<p>It is clear that, allowing for the overall increase in numbers, the proportion of generically round UFOs has reduced since the 1950s, when they clearly dominated. The most significant increase has been the rise in triangular sightings, including &#8220;delta&#8221; and &#8220;chevron&#8221; shaped craft. This conceivably tracks the development of terrestrial military aircraft towards <a href="https://en.wikipedia.org/wiki/Delta_wing" rel="noopener noreferrer" target="_blank">&#8220;delta wing&#8221;</a> and similar profiles.</p>
<p>Since the mid-90s there have been a marked increase in sightings reported simply as &#8220;lights&#8221; &#8212; flashes, fireballs, flares, and similar. From 2000 onwards, the relative proportions appear to be mainly steady.</p>
<p>For specific cases, 1995 shows an oddly large proportion of unclassified &#8220;other&#8221; sightings, although these do not seem to be the result of any particular event. The highest proportion of these are in Seattle, with 38 sightings, but are spread fairly evenly throughout the year.</p>
<p>Breaking down sightings according to specific times, rather than year-by-year reveals some other points of interest. Firstly, sightings by month:</p>
<figure id="attachment_347" aria-describedby="caption-attachment-347" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month.png"><img loading="lazy" decoding="async" data-attachment-id="347" data-permalink="https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/count_per_month-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="count_per_month" data-image-description="" data-image-caption="&lt;p&gt;Per-month UFO sightings by shape.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month-1024x576.png" alt="Per-month UFO sightings by shape." width="1024" height="576" class="size-large wp-image-347" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-347" class="wp-caption-text">Per-month UFO sightings by shape. (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month.pdf">PDF Version</a>.)</figcaption></figure>
<p>Sightings are much more common in the Northern hemisphere&#8217;s summer months, presumably due to higher numbers of people spending time outside and being in a position to spot anomalous phenomena.</p>
<p>Breaking down sightings by hour, we can see that sightings are far more common at night than during the day, with the lowest volumes of sightings around 08:00, and the highest at 21:00. For both monthly and hourly sightings, the relative proportions of sightings by shape remain relatively constant. We can conclude that UFOs&#8217; activity is unrelated to their shape. This consistency of behaviour suggests that, regardless of their shape, the various forms of UFO, however they disguise themselves, may be drawn from a single source.</p>
<figure id="attachment_345" aria-describedby="caption-attachment-345" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour.png"><img loading="lazy" decoding="async" data-attachment-id="345" data-permalink="https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/count_per_hour-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="count_per_hour" data-image-description="" data-image-caption="&lt;p&gt;Hourly UFO sightings by shape.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour-1024x576.png" alt="Hourly UFO sightings by shape." width="1024" height="576" class="size-large wp-image-345" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-345" class="wp-caption-text">Hourly UFO sightings by shape. (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour.png">PDF Version</a>.)</figcaption></figure>
<p>This is far from a definitive breakdown of UFO behaviour by their shape. In future posts we will explore whether differing shapes of UFO cluster geographically, and the extent to which cotemporaneous sightings can be correlated by their shape and description.</p>
<p>You can keep up to date with our latest statistical esoterica on Twitter at <a href="https://twitter.com/weirddatasci" rel="noopener noreferrer" target="_blank">@WeirdDataSci</a>.</p>
<p>As always, keep delving.</p>
<p><strong>Code Note:</strong><br />
In developing this entry we have moved from using the excellent work of <a href="https://data.world/timothyrenner">Tim Renner</a> in gathering and cleaning the <a href="http://www.nuforc.org">NUFORC</a> UFO dataset, and developed our own scraping code. Most posts here have included source code at the bottom of each entry. As this post relied on more than the usual code, however, and included multiple outputs, we are including only representative code. The full scraping and analysis code will be the focus of a future post.</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show analysis code</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p><strong>Data:</strong></p>
<ul>
<li>National UFO Reporting Centre: <a href="http://www.nuforc.org">http://www.nuforc.org</a>.</li>
</ul>
<p><strong>Other:</strong></p>
<ul>
<li>Tox Typewriter font: <a href="https://www.dafont.com/tox-typewriter.font">https://www.dafont.com/tox-typewriter.font</a>
</ul>
<p><strong>Proportional Time Series Plot Code:</strong></p>
[code language=&#8221;r&#8221;]
<p># Structure of ufo_tbl object output from NUFORC scraping code<br />
Classes ‘tbl_df’, ‘tbl’ and &#8216;data.frame&#8217;:       114949 obs. of  8 variables:<br />
 $ occurred: POSIXct, format: &quot;1995-02-02 23:00:00&quot; &quot;1995-02-02 19:15:00&quot; &quot;1995-02-02 20:10:00&quot; &quot;1994-12-13 18:55:00&quot; &#8230;<br />
 $ reported: POSIXct, format: &quot;1995-02-02 10:47:00&quot; &quot;1995-02-03 06:06:00&quot; &quot;1995-02-03 10:32:00&quot; &quot;1995-02-03 17:45:00&quot; &#8230;<br />
 $ posted  : POSIXct, format: &quot;2003-02-05&quot; &quot;2003-03-04&quot; &quot;2003-03-21&quot; &quot;2003-03-21&quot; &#8230;<br />
 $ location: chr  &quot;Shady Grove, OR&quot; &quot;Denmark, WI&quot; &quot;Traverse City, MI&quot; &quot;Murphy, NC&quot; &#8230;<br />
 $ shape   : chr  &quot;Other&quot; &quot;Round&quot; &quot;Other&quot; &quot;Other&quot; &#8230;<br />
 $ duration: chr  &quot;15 min&quot; &quot;75 min&quot; &quot;2 min (?)&quot; &quot;&quot; &#8230;<br />
 $ details : chr  &quot;Man and wife witness very bright, moving light over ridge to southwest.  Flashing green &amp; red lights. Good rept.&quot; &quot;Caller, and apparently several other people, witnessed multiple strange craft streaking through the night sky i&quot;| __truncated__ &quot;Four children left home to go sledding on a hill located approximately 500 yards away.  At approximately 2010 h&quot;| __truncated__ &quot;Woman reports seeing strange, lighted obj. with  \&quot;arms.\&quot;  Many witnesses and written reports.&quot; &#8230;<br />
 $ date    : Date, format: &quot;1995-02-02&quot; &quot;1995-02-02&quot; &quot;1995-02-02&quot; &quot;1994-12-13&quot; &#8230;<br />
[/code]
[code language=&#8221;r&#8221;]
<p>library(tidyverse)<br />
library(magrittr)<br />
library(lubridate)<br />
library(forcats)</p>
<p>library(ggplot2)<br />
library(ggridges)<br />
library(ggthemes)<br />
library(showtext)</p>
<p>library(viridis)</p>
<p># Create a summary barplot for UFO activity by shape over time</p>
<p># Load the data from scraping http://www.nuforc.org<br />
ufo_tbl &lt;- readRDS( &quot;data/ufo_processed.rds&quot; )</p>
<p># Load font<br />
font_add( &quot;mapfont&quot;, &quot;font/Tox Typewriter.ttf&quot;)<br />
showtext_auto()</p>
<p># Shape entries are inconsistent. Manual fixing required.<br />
# Most inconsistency is lowercase and uppercase, so use str_to_title to fix<br />
# that.<br />
ufo_tbl$shape &lt;- ufo_tbl$shape %&gt;%<br />
	str_to_title %&gt;%<br />
	str_replace( &quot;(^$|Unknown)&quot;, &quot;Other&quot; ) %&gt;%<br />
	str_replace( &quot;(Changed|Changing)&quot;, &quot;Changing&quot; ) %&gt;%<br />
	str_replace( &quot;(Delta|Triangle|Triangular)&quot;, &quot;Triangle&quot; ) %&gt;%<br />
	str_replace( &quot;(Circle|Round|Dome)&quot;, &quot;Circle&quot; ) %&gt;%<br />
	# Categories too small to be represented individually<br />
	str_replace( &quot;(Crescent|Pyramid|Hexagon|Dome)&quot;, &quot;Other&quot; ) %&gt;%<br />
	str_replace( &quot;Flare&quot;, &quot;Light&quot; )</p>
<p># Further category combination for frequency plot<br />
ufo_tbl$shape &lt;- ufo_tbl$shape %&gt;%<br />
	str_replace( &quot;(Fireball|Flash)&quot;, &quot;Light&quot; ) %&gt;%<br />
	str_replace( &quot;(Sphere|Disk|Oval|Egg|Circle|Cone)&quot;, &quot;Round&quot; ) %&gt;%<br />
	str_replace( &quot;(Cigar|Cylinder)&quot;, &quot;Cylinder&quot; ) %&gt;%<br />
	str_replace( &quot;Chevron&quot;, &quot;Triangle&quot; ) %&gt;%<br />
	str_replace( &quot;(Cross|Diamond|Teardrop)&quot;, &quot;Other&quot; ) </p>
<p># Cut off at 1900, as earlier sightings are infrequent and unreliable.<br />
# (Unlike those post 1900&#8230;)<br />
ufo_tbl &lt;- ufo_tbl %&gt;%<br />
	filter( occurred &gt;= &quot;1900-01-01&quot; ) %&gt;%<br />
	filter( occurred &lt;= &quot;2019-01-01&quot; )</p>
<p># Proportional frequency of each sighting<br />
ufo_tbl$date &lt;- lubridate::date( ufo_tbl$occurred )<br />
frequency_tbl &lt;- ufo_tbl %&gt;%<br />
	count( aggr_date = year(occurred), shape ) %&gt;%<br />
	group_by( aggr_date ) %&gt;%<br />
	mutate(freq = n / sum(n))</p>
<p>colnames( frequency_tbl ) &lt;- c( &quot;aggr_date&quot;, &quot;shape&quot;, &quot;n&quot; , &quot;freq&quot;)</p>
<p>gp &lt;- ggplot( frequency_tbl, aes( x=aggr_date, fill=shape, y=freq ) ) +<br />
	labs( x=&quot;Date&quot;, y=&quot;Sightings\n(Proportion)&quot; ) +<br />
	geom_col( alpha=0.4 ) +<br />
	scale_fill_viridis( name = &quot;Shape&quot;, option=&quot;D&quot;, discrete=TRUE ) +<br />
	theme_dark() +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			legend.key = element_rect(fill = &quot;#222222&quot;),<br />
			legend.background = element_rect(fill = &quot;#222222&quot;),<br />
			legend.title = element_text( size=18, color=&quot;#eeeeee&quot;, family=&quot;mapfont&quot;, margin = margin( t = 20 ) ),<br />
			legend.text = element_text( size=14, color=&quot;#eeeeee&quot;, family=&quot;mapfont&quot;, margin = margin( t = 20 ) ),<br />
			text = element_text( color=&quot;#eeeeee&quot;, family=&quot;mapfont&quot; ),<br />
			axis.title.x = element_text( size=18, color=&quot;#eeeeee&quot;, family=&quot;mapfont&quot;, margin = margin( t = 20 ) ),<br />
			axis.title.y = element_text( size=18, color=&quot;#eeeeee&quot;, family=&quot;mapfont&quot;, margin = margin( r = 20 ) ),<br />
			axis.text = element_text( size=14, color=&quot;#eeeeee&quot;, family=&quot;mapfont&quot; ),<br />
			panel.grid.major = element_line(colour = &quot;#444444&quot;),<br />
			panel.grid.minor = element_line(colour = &quot;#444444&quot;),<br />
			)</p>
<p># Cowplot trick for ggtitle<br />
title &lt;- ggdraw() +<br />
	draw_label(&quot;Propotion of UFO Sightings by Shape, 1900-2017&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#eeeeee&quot;, size=18, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#eeeeee&quot;, size=14, hjust=0, vjust=1, x=0.02, y=0.40)</p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data: http://www.nuforc.org&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#eeeeee&quot;, size=14, hjust=1, x=0.98 ) </p>
<p># Remove legend from internal plot<br />
theme_set(theme_cowplot(font_size=4, font_family = &quot;mapfont&quot; ) ) # </p>
<p>tgp &lt;- plot_grid(title, gp, data_label, ncol=1, rel_heights=c(0.1, 1, 0.1)) </p>
<p>tgp &lt;- tgp +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			) </p>
<p>save_plot(&quot;output/proportion_over_time.pdf&quot;,<br />
							tgp,<br />
							base_width = 16,<br />
							base_height = 9,<br />
			           	base_aspect_ratio = 1.78 )</p>
[/code]
</div></div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">331</post-id>	</item>
		<item>
		<title>Mapping Paranormal Manifestations in the British Isles</title>
		<link>https://www.weirddatascience.net/2018/04/10/mapping-paranormal-manifestations-in-the-british-isles/</link>
					<comments>https://www.weirddatascience.net/2018/04/10/mapping-paranormal-manifestations-in-the-british-isles/#comments</comments>
		
		<dc:creator><![CDATA[moth]]></dc:creator>
		<pubDate>Tue, 10 Apr 2018 20:08:23 +0000</pubDate>
				<category><![CDATA[cryptozoology]]></category>
		<category><![CDATA[ghosts]]></category>
		<category><![CDATA[maps]]></category>
		<category><![CDATA[paranormal]]></category>
		<category><![CDATA[spatial analysis]]></category>
		<category><![CDATA[ufo]]></category>
		<guid isPermaLink="false">http://www.weirddatascience.net/blog/?p=284</guid>

					<description><![CDATA[<div class="mh-excerpt">In our last entry we analysed the relative frequency of paranormal manifestations in the British Isles according to the Paranormal Database. The results of that analysis showed that hauntings of various forms are by far the most commonly-reported paranormal encounter in the British Isles, followed by cryptozoological sightings. <a class="mh-excerpt-more" href="https://www.weirddatascience.net/2018/04/10/mapping-paranormal-manifestations-in-the-british-isles/" title="Mapping Paranormal Manifestations in the British Isles">[...]</a></div>]]></description>
										<content:encoded><![CDATA[<p>In our <a href="http://www.weirddatascience.net/index.php/2018/03/17/paranormal-manifestations-in-the-british-isles/" rel="noopener noreferrer" target="_blank">last entry</a> we analysed the relative frequency of paranormal manifestations in the British Isles according to the <a href="http://www.paranormaldatabase.com" rel="noopener noreferrer" target="_blank">Paranormal Database</a>. The results of that analysis showed that hauntings of various forms are by far the most commonly-reported paranormal encounter in the British Isles, followed by cryptozoological sightings.</p>
<p>This is, however, relatively unsatisfactory. It is much more interesting to know where such sightings and events occur. Are there particular haunts of restless spirits? Do mysterious beasts roam in particular regions more than others? To answer these questions, we need to delve into the specific locations of different reports.</p>
<p>The Paranormal Database does contain location information, but it is given very informally. To map this we can make use of <a href="https://developers.google.com/maps/documentation/geolocation/intro">Google&#8217;s Geolocation API</a> to convert free text strings, such as &#8220;<a href="https://www.nationaltrust.org.uk/features/a-haunted-library-and-a-bookish-ghost" rel="noopener noreferrer" target="_blank">Felbrigg Hall, Norfolk</a>&#8221; into usable latitude and longitude coordinates. (In this case: <a href="https://goo.gl/maps/xr3vZHB4Kwq">52.907479, 1.259443</a>.)</p>
<p>The geolocation is not perfect, but with sufficient manipulation of the service it was possible to produce geolocated coordinates for most of the entries in the database. In order to represent these meaningfully, we have also subdivided the entries into a different types. The original Paranormal Database data is subdivided into twenty categories, which we have reduced to six for easier presentation. This includes collapsing the various kinds of haunting, from poltergeists to &#8216;post-mortem manifestions&#8217;, simply to hauntings. Similarly, we combine alien big cats and <a href="https://en.wikipedia.org/wiki/Black_Shuck" rel="noopener noreferrer" target="_blank">shuck</a> into the broader family of cryptozoology.</p>
<p>With this in place, we can see the overall distribution of paranormal events in the British Isles.</p>
<figure id="attachment_289" aria-describedby="caption-attachment-289" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal.png"><img loading="lazy" decoding="async" data-attachment-id="289" data-permalink="https://www.weirddatascience.net/2018/04/10/mapping-paranormal-manifestations-in-the-british-isles/paranormal-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="paranormal" data-image-description="" data-image-caption="&lt;p&gt;Paranormal manifestations in the British Isles.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-1024x576.png" class="wp-image-289 size-large" width="1024" height="576" alt="" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-289" class="wp-caption-text">Paranormal manifestations in the British Isles. (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal.pdf" rel="noopener noreferrer" target="_blank">PDF Version</a>.)</figcaption></figure>
<p>As might be expected, London is a dark and sinister nexus of paranormal activity. Hauntings, as might be expected from their overall frequency, dominate the majority of the British Isles. Moving north, particularly as we reach the Scottish Highlands, cryptozoological sightings begin to challenge hauntings as the most common supernatural event. We can also see significant cryptozoology in the Hebrides, Orkney, and Shetland &#8212; the archipelagos that surround the Scottish mainland.</p>
<p>Both Wales, Ireland, and Cornwall are significantly less densely haunted in the Paranormal Database, with the majority of sightings falling in England.</p>
<p>This overall view, however, combines a number of very different phenomena. Where, for example, are we most likely to receive a visitation from a restless spectre as opposed to being pursued by a savage and unnatural beast?</p>
<p>By breaking down the sightings into different types, and plotting a heatmap of event density over each, we can identify the regions in which different manifestations cluster.</p>
<figure id="attachment_292" aria-describedby="caption-attachment-292" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-density.png"><img loading="lazy" decoding="async" data-attachment-id="292" data-permalink="https://www.weirddatascience.net/2018/04/10/mapping-paranormal-manifestations-in-the-british-isles/paranormal-density/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-density.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="paranormal-density" data-image-description="" data-image-caption="&lt;p&gt;Density plot of paranormal manifestations in the British Isles&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-density-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-density-1024x576.png" class="size-large wp-image-292" width="1024" height="576" alt="" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-density-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-density-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-density-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-density-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-density-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-density.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-292" class="wp-caption-text">Density plot of paranormal manifestations in the British Isles. (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/04/paranormal-density.pdf" rel="noopener noreferrer" target="_blank">PDF Version</a>.)</figcaption></figure>
<p>This view highlights several points of interest.</p>
<p>Firstly, London&#8217;s preeminent position is not for all forms of paranormal activity. Hauntings are extremely dense in London, however the rest of England is also well-populated. As might be expected from the first diagram, the less population-dense regions further north have produced fewer sad echoes of mortality. Despite its general reputation Edinburgh, while noticeably haunted, cannot compare with many regions of England.</p>
<p>Cryptozoologically, however, London is far from dominant. Whilst unknown beings may lurk in the foetid sewers of the capital, they clearly prefer the wide open spaces &#8212; both the Norfolk and Suffolk Broads are rife with cryptids, as are the Hebrides, Orkney, and Shetland that were noticeable earlier. Finally, visitors to Cornwall will pass through areas of increasing <a href="https://en.wikipedia.org/wiki/Beast_of_Bodmin_Moor" rel="noopener noreferrer" target="_blank">monstrous activity</a>.</p>
<p>UFO&#8217;s also appear to be attracted to East Anglia most strongly, and are otherwise most common in the large population centres of England. Less obviously, there is a noticeable density of UFO activity on the Pembrokeshire Coast, in the south-west of Wales.</p>
<p>Monsters, which in this classification includes werewolves, vampires, and dragons, produce a surprising cluster in North Wales, around Snowdonia. The most significant monstrous sightings, however, appear to be in Exmoor; again on the south-western tip of the British Isles.</p>
<p>The final categories of manifestation include legends, fairies, and a catch-all category of &#8216;other manifestations&#8217; that include mysterious orbs, talking trees, bleeding stones, and the supernatural impressions left by the work of <a href="https://en.wikipedia.org/wiki/John_Dee" rel="noopener noreferrer" target="_blank">John Dee</a>. As might be expected, this last category is more uniformly distributed across the country, matching high-density population areas. There is, however, another notable cluster in Cornwall for this category.</p>
<p>In conclusion, then, the British Isles are teeming with paranormal activity. Entities from beyond the grave lie close at all times, with twisted monstrosities roaming the wild spaces. UFOs descend from the night sky to terrorise the coastal regions.</p>
<p>From this analysis, ghost hunters should concentrate on London for the best chance of a sighting, although almost any of the large centres of population provide a reasonable chance of spectral apparitions. Cryptid researchers should concentrate in East Anglia or head north to the islands beyond Scotland. Those seeking contact with extraterrestrials should focus particularly on the east coast of Suffolk, or travel to the south-west of Wales. Paranormal investigators whose interests lie in legends or monsters, or less specific strange entities, would be well-advised to visit Cornwall.</p>
<p>Code for the plotting elements of this analyis are given below, following on from the scraping and parsing code in our <a href="http://www.weirddatascience.net/index.php/2018/03/17/paranormal-manifestations-in-the-british-isles/" rel="noopener noreferrer" target="_blank">previous post</a>. The geolocation step required a more significant effort, and will be the focus of a future code-based post.</p>
<p>You can keep up to date with our latest paranormal data mining on Twitter at <a href="https://twitter.com/weirddatasci" rel="noopener noreferrer" target="_blank">@WeirdDataSci</a>.</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show analysis code</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p><strong>Data:</strong></p>
<ul>
<li>The Paranormal Database: <a href="http://www.paranormaldatabase.com">http://www.paranormaldatabase.com</a></li>
</ul>
<p><strong>Other:</strong></p>
<ul>
<li>JSL Ancient font: <a href="http://www.1001fonts.com/jsl-ancient-font.html">http://www.1001fonts.com/jsl-ancient-font.html</a></li>
<li>Cowplot: <a href="https://cran.r-project.org/web/packages/cowplot/vignettes/introduction.html">https://cran.r-project.org/web/packages/cowplot/vignettes/introduction.html</a></li>
</ul>
<p><strong>Combined Plot Code:</strong></p>
[code language=&#8221;r&#8221;]
library(spatstat)</p>
<p>library(rgdal)<br />
library(maptools)</p>
<p>library(tidyverse)<br />
library(magrittr)<br />
library(ggplot2)<br />
library(ggthemes)<br />
library(raster)<br />
library(viridis)<br />
library(scales)</p>
<p>library(sf) </p>
<p>library(showtext)</p>
<p>library(grid)</p>
<p>library(cowplot)<br />
library(magick)</p>
<p># Function to combine certain paranormal report types into broader categories<br />
combine_paranormal_types &lt;- function( type ) {</p>
<p>	# levels( paranormal_tbl$type )<br />
	# [1] &quot;Alien Big Cat&quot;                &quot;Crisis Manifestation&quot;<br />
	# [3] &quot;Cryptozoology&quot;                &quot;Curse&quot;<br />
	# [5] &quot;Dragon&quot;                       &quot;Environmental Manifestation&quot;<br />
	# [7] &quot;Experimental Manifestation&quot;   &quot;Fairy&quot;<br />
	# [9] &quot;Haunting Manifestation&quot;       &quot;Legend&quot;<br />
	# [11] &quot;Manifestation of the Living&quot;  &quot;Other&quot;<br />
	# [13] &quot;Poltergeist&quot;                  &quot;Post-Mortem Manifestation&quot;<br />
	# [15] &quot;Shuck&quot;                        &quot;Spontaneous Human Combustion&quot;<br />
	# [17] &quot;UFO&quot;                          &quot;Unknown Ghost Type&quot;<br />
	# [19] &quot;Vampire&quot;                      &quot;Werewolf&quot;                   </p>
<p>	# Simple lookup<br />
	case_when(<br />
				 type %in% c(&quot;Alien Big Cat&quot;, &quot;Cryptozoology&quot;, &quot;Shuck&quot;) ~ &quot;Cryptozoology&quot;,<br />
				 type %in% c(&quot;Dragon&quot;, &quot;Vampire&quot;, &quot;Werewolf&quot; ) ~ &quot;Monster&quot;,<br />
				 type %in% c(&quot;Crisis Manifestation&quot;, &quot;Environmental Manifestation&quot;, &quot;Experimental Manifestation&quot;, &quot;Haunting Manifestation&quot;, &quot;Manifestation of the Living&quot;, &quot;Poltergeist&quot;, &quot;Post-Mortem Manifestation&quot;, &quot;Unknown Ghost Type&quot;) ~ &quot;Haunting&quot;,<br />
				 type %in% c(&quot;Legend&quot;, &quot;Fairy&quot;, &quot;Curse&quot;) ~ &quot;Legend&quot;,<br />
				 type %in% c(&quot;Spontaneous Human Combustion&quot;, &quot;Other&quot;) ~ &quot;Other&quot;,<br />
				 type %in% c(&quot;UFO&quot;) ~ &quot;UFO&quot;<br />
				 )</p>
<p>}</p>
<p># Load font<br />
font_add( &quot;mapfont&quot;, &quot;/usr/share/fonts/TTF/weird/JANCIENT.TTF&quot; )<br />
showtext_auto()</p>
<p># Read world shapefile data and tranform to an appropriate projection.<br />
# Limit to the UK, Ireland, and the Isle of Man<br />
world &lt;- readOGR( dsn=&#8217;data/ne/10m_cultural&#8217;, layer=&#8217;ne_10m_admin_0_countries&#8217; )<br />
world_subset &lt;- world[ world$iso_a2 %in% c(&quot;GB&quot;,&quot;IE&quot;,&quot;IM&quot;), ]
world_subset &lt;- spTransform(world_subset,CRS(&quot;+init=epsg:4326&quot;))<br />
world_df &lt;- fortify( world_subset )</p>
<p># Read paranormal database<br />
paranormal_tbl &lt;- as.tibble( read.csv( file=&quot;data/paranormal_database.csv&quot; ) )</p>
<p># Convert the paranormal dataframe to a spatial dataframe that contains<br />
# explicit longitude and latitude projected appropriately for plotting.<br />
coordinates( paranormal_tbl ) &lt;- ~lng+lat<br />
proj4string( paranormal_tbl )&lt;-CRS(&quot;+init=epsg:4326&quot;)<br />
paranormal_tbl &lt;- spTransform(paranormal_tbl,CRS(proj4string(paranormal_tbl)))</p>
<p># Restrict paranormal_tbl to those points in the polygons defined by world_subset<br />
paranormal_tbl_rows &lt;- paranormal_tbl %&gt;%<br />
	over( world_subset ) %&gt;%<br />
	is.na() %&gt;%<br />
	not() %&gt;%<br />
	rowSums() %&gt;%<br />
	`!=`(0) %&gt;%<br />
	which</p>
<p>paranormal_tbl &lt;- as.tibble( paranormal_tbl[ paranormal_tbl_rows, ] )</p>
<p>paranormal_tbl$combined_type &lt;- paranormal_tbl$type %&gt;%<br />
	map( combine_paranormal_types ) %&gt;%<br />
	unlist</p>
<p># Show the map<br />
gp &lt;- ggplot() + </p>
<p>	geom_map( data = world_df, aes( map_id=id ), colour = &quot;#3c3f4a&quot;, fill = &quot;transparent&quot;, size = 0.5, map = world_df )</p>
<p># Display each sighting as geom_point. Use a level of transparency to highlight<br />
# more common areas. (On the 20180401 dataset, this reports that 16035 out of<br />
# the original 19387 points lie in the appropriate area. Several are geolocated<br />
# outside of the UK &#8212; the geolocation should be run again with better bounds<br />
# checking and region preference.)<br />
gp &lt;- gp + geom_point(data=paranormal_tbl, aes(x=lng, y=lat, colour=combined_type ), size=0.5, shape=17, alpha=0.9) +<br />
	expand_limits(x = world_df$long, y = world_df$lat)  </p>
<p>gp &lt;- gp +	</p>
<p>	# Theming<br />
	theme_map() +<br />
	theme(<br />
			plot.background = element_rect(fill = &quot;transparent&quot;, colour = &quot;transparent&quot;),<br />
			panel.border = element_blank(),<br />
			plot.title = element_text( size=24, colour=&quot;#3c3f4a&quot;, family=&quot;mapfont&quot; ),<br />
			text = element_text( size=14, color=&quot;#3c3f4a&quot;, family=&quot;mapfont&quot; ),<br />
			) +</p>
<p>	theme(<br />
			legend.background = element_rect( colour =&quot;transparent&quot;, fill = &quot;transparent&quot; ),<br />
			legend.key = element_rect(fill = &quot;transparent&quot;, colour=&quot;transparent&quot;),<br />
			legend.position = &quot;left&quot;,<br />
			legend.justification = c(0,0)<br />
			) +</p>
<p>	guides( fill = guide_colourbar( title.position=&quot;top&quot;, direction=&quot;vertical&quot;, barwidth=32, nrow=1 ) ) +<br />
	guides(colour = guide_legend(override.aes = list(size=2)) ) +<br />
	scale_colour_manual(<br />
		values = c(&quot;#00EA38&quot;,&quot;#417CCC&quot;,&quot;#B79F00&quot;,&quot;#F564E3&quot;,&quot;#00BFC4&quot;,&quot;#F8766D&quot;),<br />
		breaks = c(&quot;Haunting&quot;, &quot;Cryptozoology&quot;, &quot;Monster&quot;, &quot;Legend&quot;, &quot;UFO&quot;, &quot;Other&quot; ),<br />
		name = &quot;Manifestation&quot; ) +<br />
	coord_fixed( ratio=1.2 )</p>
<p># Cowplot trick for ggtitle<br />
title &lt;- ggdraw() +<br />
	draw_label(&quot;Paranormal Manifestations in the British Isles&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#3c3f4a&quot;, size=24, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#3c3f4a&quot;, size=14, hjust=0, vjust=1, x=0.02, y=0.40)</p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data from: http://www.paranormaldatabase.com&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#3c3f4a&quot;, size=12, hjust=1, x=0.98 ) </p>
<p>paranormal_legend &lt;- get_legend(gp)</p>
<p># Remove legend from internal plot<br />
gp &lt;- gp + theme(legend.position=&#8217;none&#8217;)</p>
<p>tgp &lt;- plot_grid(title, gp, data_label, ncol=1, rel_heights=c(0.1, 1, 0.1)) </p>
<p>vellum_plot &lt;- ggdraw() +<br />
	draw_image(&quot;img/vellum.jpg&quot;, scale=1.4 ) +<br />
	draw_plot(tgp) +<br />
	draw_plot( paranormal_legend, 0.02, 0.03 )</p>
<p>save_plot(&quot;output/paranormal.pdf&quot;,<br />
			vellum_plot,<br />
			base_width = 16,<br />
			base_height = 9,<br />
			base_aspect_ratio = 1.78 )<br />
[/code]
<p><strong>Per-Manifestation Density Plot Code:</strong></p>
[code language=&#8221;r&#8221;]
library(spatstat)</p>
<p>library(rgdal)<br />
library(maptools)</p>
<p>library(tidyverse)<br />
library(magrittr)<br />
library(ggplot2)<br />
library(ggthemes)<br />
library(raster)<br />
library(viridis)<br />
library(scales)</p>
<p>library(sf) </p>
<p>library(showtext)</p>
<p>library(grid)</p>
<p>library(cowplot)<br />
library(magick)</p>
<p># Function to combine certain paranormal report types into broader categories<br />
combine_paranormal_types &lt;- function( type ) {</p>
<p>	# levels( paranormal_tbl$type )<br />
	# [1] &quot;Alien Big Cat&quot;                &quot;Crisis Manifestation&quot;<br />
	# [3] &quot;Cryptozoology&quot;                &quot;Curse&quot;<br />
	# [5] &quot;Dragon&quot;                       &quot;Environmental Manifestation&quot;<br />
	# [7] &quot;Experimental Manifestation&quot;   &quot;Fairy&quot;<br />
	# [9] &quot;Haunting Manifestation&quot;       &quot;Legend&quot;<br />
	# [11] &quot;Manifestation of the Living&quot;  &quot;Other&quot;<br />
	# [13] &quot;Poltergeist&quot;                  &quot;Post-Mortem Manifestation&quot;<br />
	# [15] &quot;Shuck&quot;                        &quot;Spontaneous Human Combustion&quot;<br />
	# [17] &quot;UFO&quot;                          &quot;Unknown Ghost Type&quot;<br />
	# [19] &quot;Vampire&quot;                      &quot;Werewolf&quot;                   </p>
<p>	# Simple lookup<br />
	case_when(<br />
				 type %in% c(&quot;Alien Big Cat&quot;, &quot;Cryptozoology&quot;, &quot;Shuck&quot;) ~ &quot;Cryptozoology&quot;,<br />
				 type %in% c(&quot;Dragon&quot;, &quot;Vampire&quot;, &quot;Werewolf&quot; ) ~ &quot;Monster&quot;,<br />
				 type %in% c(&quot;Crisis Manifestation&quot;, &quot;Environmental Manifestation&quot;, &quot;Experimental Manifestation&quot;, &quot;Haunting Manifestation&quot;, &quot;Manifestation of the Living&quot;, &quot;Poltergeist&quot;, &quot;Post-Mortem Manifestation&quot;, &quot;Unknown Ghost Type&quot;) ~ &quot;Haunting&quot;,<br />
				 type %in% c(&quot;Legend&quot;, &quot;Fairy&quot;, &quot;Curse&quot;) ~ &quot;Legend&quot;,<br />
				 type %in% c(&quot;Spontaneous Human Combustion&quot;, &quot;Other&quot;) ~ &quot;Other Manifestation&quot;,<br />
				 type %in% c(&quot;UFO&quot;) ~ &quot;UFO&quot;<br />
				 )</p>
<p>}</p>
<p># Load font<br />
font_add( &quot;mapfont&quot;, &quot;/usr/share/fonts/TTF/weird/JANCIENT.TTF&quot; )<br />
showtext_auto()</p>
<p># Read world shapefile data and tranform to an appropriate projection.<br />
# Limit to the UK, Ireland, and the Isle of Man<br />
world &lt;- readOGR( dsn=&#8217;data/ne/10m_cultural&#8217;, layer=&#8217;ne_10m_admin_0_countries&#8217; )<br />
world_subset &lt;- world[ world$iso_a2 %in% c(&quot;GB&quot;,&quot;IE&quot;, &quot;IM&quot;), ]
world_subset &lt;- spTransform(world_subset,CRS(&quot;+init=epsg:4326&quot;))<br />
world_df &lt;- fortify( world_subset )</p>
<p># Read UK Major Cities datafile<br />
#cities &lt;- readOGR( dsn=&#8217;data/ons_uk_cities&#8217;, layer=&#8217;Major_Towns_and_Cities_December_2015_Boundaries&#8217; )<br />
#cities &lt;- spTransform(cities,CRS(&quot;+init=epsg:4326&quot;))<br />
#cities_fortified &lt;- fortify( cities )<br />
#cities_centroids_tbl &lt;- as.tibble( coordinates( cities ) )<br />
#cities_tbl &lt;- as.tibble( cbind( cities@data$tcity15nm, cities_centroids_tbl ))<br />
#colnames(cities_tbl) &lt;- c( &quot;city&quot;, &quot;lng&quot;, &quot;lat&quot; )</p>
<p># As the polygons are more confusing than useful, instead label centroids</p>
<p># Read paranormal database<br />
paranormal_tbl &lt;- as.tibble( read.csv( file=&quot;data/paranormal_database.csv&quot; ) )</p>
<p>paranormal_tbl$combined_type &lt;- paranormal_tbl$type %&gt;%<br />
	map( combine_paranormal_types ) %&gt;%<br />
	unlist</p>
<p># Convert the paranormal dataframe to a spatial dataframe that contains<br />
# explicit longitude and latitude projected appropriately for plotting.<br />
coordinates( paranormal_tbl ) &lt;- ~lng+lat<br />
proj4string( paranormal_tbl )&lt;-CRS(&quot;+init=epsg:4326&quot;)<br />
paranormal_tbl_spatial &lt;- spTransform(paranormal_tbl,CRS(proj4string(paranormal_tbl)))</p>
<p># Restrict paranormal_tbl to those points in the polygons defined by world_subset<br />
paranormal_tbl_rows &lt;- paranormal_tbl_spatial %&gt;%<br />
	over( world_subset ) %&gt;%<br />
	is.na() %&gt;%<br />
	not() %&gt;%<br />
	rowSums() %&gt;%<br />
	`!=`(0) %&gt;%<br />
	which</p>
<p>#paranormal_tbl &lt;- as.tibble( paranormal_tbl_spatial[ paranormal_tbl_rows, ] )</p>
<p># Create window for spatial analysis<br />
paranormal_owin &lt;- as.owin.SpatialPolygons(world_subset)</p>
<p># Function to plot density of a specific manifestation type.<br />
# plot_resolution is for the density raster, and is mainly used for quick prototyping of the output.<br />
density_plot &lt;- function( paranormal_type, plot_resolution = 1024 ) {</p>
<p>	cat( paste0( &quot;Plotting density: &quot;, paranormal_type, &quot;&#8230; &quot; ) )</p>
<p>	paranormal_tbl_spatial &lt;- paranormal_tbl_spatial[ which( paranormal_tbl_spatial$combined_type == paranormal_type ), ]
<p>	paranormal_ppp &lt;-<br />
		ppp( 	x=coordinates(paranormal_tbl_spatial)[,1],<br />
			 y=coordinates(paranormal_tbl_spatial)[,2],<br />
			 window = paranormal_owin )</p>
<p>	# This discards &#8216;illegal&#8217; points outside of the window<br />
	paranormal_ppp &lt;- as.ppp(paranormal_ppp)</p>
<p>	paranormal_density &lt;- density( paranormal_ppp, diggle=T, sigma=0.4, dimyx=c(plot_resolution,plot_resolution) )</p>
<p>	# Make density image object usable by ggplot as a raster<br />
	paranormal_density_raster &lt;- raster( paranormal_density )<br />
	raster_tbl &lt;- as.tibble( rasterToPoints( paranormal_density_raster ) )</p>
<p>	# Show the map<br />
	gp &lt;- ggplot() + </p>
<p>		geom_map( data = world_df, aes( map_id=id ), colour = &quot;#3c3f4a&quot;, fill = &quot;transparent&quot;, size = 0.8, map = world_df ) +</p>
<p>		# Add density of sightings as raster.<br />
		geom_raster( data = raster_tbl, alpha=0.8, aes( x=x, y=y, fill=layer), show.legend=TRUE ) +<br />
		scale_fill_viridis( option=&quot;magma&quot;, direction = -1, name=&quot;Density&quot; ) </p>
<p>	gp &lt;- gp +	</p>
<p>		# Theming<br />
		theme_map() + </p>
<p>		theme(<br />
				plot.background = element_rect(fill = &quot;transparent&quot;, colour = &quot;transparent&quot;),<br />
				panel.border = element_blank(),<br />
				plot.title = element_text( size=12, colour=&quot;#3c3f4a&quot;, family=&quot;mapfont&quot; ),<br />
				text = element_text( size=12, color=&quot;#3c3f4a&quot;, family=&quot;mapfont&quot; ),<br />
				) +</p>
<p>		theme(<br />
				legend.background = element_rect( colour =&quot;transparent&quot;, fill = &quot;transparent&quot; ),<br />
				legend.key = element_rect(fill = &quot;transparent&quot;, colour=&quot;transparent&quot;),<br />
				legend.position = c(0,0),<br />
				legend.justification = c(0,0)<br />
		) +</p>
<p>		guides( fill = guide_colourbar( title.position=&quot;top&quot;, direction=&quot;horizontal&quot;, barwidth=6, barheight=0.4 ) ) +</p>
<p>                # Fix the ratio of the plot to avoid distorting the UK<br />
		coord_fixed( ratio=1.2 )</p>
<p>	cat(&quot;done.\n&quot; )<br />
	return(gp)<br />
}</p>
<p># Calculate densities for each phenomenon<br />
phenomena &lt;- unique( paranormal_tbl$combined_type )</p>
<p>gp_list &lt;- phenomena %&gt;%<br />
	map( density_plot, 1024 )</p>
<p># Plot as a grid with cowplot<br />
theme_set(theme_cowplot(font_size=4, font_family = &quot;mapfont&quot; ) )<br />
gp &lt;- plot_grid( plotlist=gp_list,<br />
			labels = phenomena,<br />
			label_colour = &quot;#3c3f4a&quot; )</p>
<p># Cowplot trick for ggtitle<br />
title &lt;- ggdraw() +<br />
	draw_label(&quot;Density of Paranormal Manifestations in the British Isles&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#3c3f4a&quot;, size=24, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#3c3f4a&quot;, size=14, hjust=0, vjust=1, x=0.02, y=0.40)</p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data from: http://www.paranormaldatabase.com&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#3c3f4a&quot;, size=12, hjust=1, x=0.98 )</p>
<p>tgp &lt;- plot_grid(title, gp, data_label, ncol=1, rel_heights=c(0.1, 1, 0.1)) </p>
<p>vellum_plot &lt;- ggdraw() +<br />
	draw_image(&quot;img/vellum.jpg&quot;, scale=1.4 ) +<br />
	draw_plot(tgp)</p>
<p>save_plot(&quot;output/paranormal-density.pdf&quot;,<br />
				vellum_plot,<br />
				base_width = 16,<br />
				base_height = 9,<br />
			        base_aspect_ratio = 1.78 )<br />
[/code]
</div></div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.weirddatascience.net/2018/04/10/mapping-paranormal-manifestations-in-the-british-isles/feed/</wfw:commentRss>
			<slash:comments>14</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">284</post-id>	</item>
		<item>
		<title>Are UFOs more commonly seen near US military bases?</title>
		<link>https://www.weirddatascience.net/2018/02/27/are-ufos-more-commonly-seen-near-us-military-bases/</link>
					<comments>https://www.weirddatascience.net/2018/02/27/are-ufos-more-commonly-seen-near-us-military-bases/#respond</comments>
		
		<dc:creator><![CDATA[moth]]></dc:creator>
		<pubDate>Tue, 27 Feb 2018 21:30:38 +0000</pubDate>
				<category><![CDATA[maps]]></category>
		<category><![CDATA[spatial analysis]]></category>
		<category><![CDATA[ufo]]></category>
		<guid isPermaLink="false">http://www.weirddatascience.net/blog/?p=174</guid>

					<description><![CDATA[<div class="mh-excerpt">What do they know? Since the Roswell Incident in 1947, UFO&#8217;s have been associated with secretive military installations where mysterious craft dart across the night sky. Skeptics might hold that many UFO sightings, far from being extraterrestrial visitors, are better explained as experimental or conventional military craft. Does <a class="mh-excerpt-more" href="https://www.weirddatascience.net/2018/02/27/are-ufos-more-commonly-seen-near-us-military-bases/" title="Are UFOs more commonly seen near US military bases?">[...]</a></div>]]></description>
										<content:encoded><![CDATA[<p>What do they know?</p>
<p>Since the Roswell Incident in 1947, UFO&#8217;s have been associated with secretive military installations where mysterious craft dart across the night sky. Skeptics might hold that many UFO sightings, far from being extraterrestrial visitors, are better explained as experimental or conventional military craft. Does this association hold, though, in light of the wealth of UFO sightings collated by <a href="http://www.nuforc.org/" target="_blank" rel="noopener">NUFORC</a>? Are UFO&#8217;s more likely to be seen when in close proximity to a US airforce base?</p>
<figure id="attachment_176" aria-describedby="caption-attachment-176" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/02/base-density.png"><img loading="lazy" decoding="async" data-attachment-id="176" data-permalink="https://www.weirddatascience.net/2018/02/27/are-ufos-more-commonly-seen-near-us-military-bases/base-density/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/02/base-density.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="ufo-density-us-airforce" data-image-description="&lt;p&gt;Density plot of US-based UFO sightings plotted against locations of US airforce installations.&lt;/p&gt;
" data-image-caption="&lt;p&gt;UFO Sightings Density against US Airforce Installations&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/02/base-density-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/02/base-density-1024x576.png" class="size-large wp-image-176" width="1024" height="576" alt="" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/02/base-density-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/02/base-density-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/02/base-density-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/02/base-density-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/02/base-density-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/02/base-density.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-176" class="wp-caption-text">UFO Sightings Density against US Airforce Installations. (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/02/base-density.pdf">PDF</a> | <a href="http://www.weirddatascience.net/wp-content/uploads/2018/02/base-density-print.pdf">Print-Friendly PDF</a>)</figcaption></figure>
<p>As a first step to approaching this question, we can rely on the reported NUFORC data of UFO sightings, and the US government&#8217;s <a href="https://catalog.data.gov/dataset/military-bases-national" target="_blank" rel="noopener">conveniently thorough dataset of military installations</a>.</p>
<p>Before performing a more robust statistical analysis, we can quickly combine these two datasets to see if any obvious visual patterns emerge. As always, visual analysis comes with the strong caveat that apparent patterns must be backed up with real statistics. Beware eyeballs.</p>
<p>We will focus only on sightings in the United States. Whilst the NUFORC dataset is creditably global the reports are overwhelmingly from the US, reflecting the fact that NUFORC is largely a US-based endeavour and is much less likely to receive reports from elsewhere in the world.</p>
<p>For this quick visual exploration, we will produce a density plot, or heatmap, of UFO sightings going back to 1906, using the excellent <a href="http://spatstat.org/" target="_blank" rel="noopener"><code>spatstat</code></a> R package, as described in <a href="https://www.crcpress.com/Spatial-Point-Patterns-Methodology-and-Applications-with-R/Baddeley-Rubak-Turner/p/book/9781482210200" target="_blank" rel="noopener"> Baddeley, Rubak, and Turner&#8217;s <i>Spatial Point Patterns: Methodology and Applications with R</i></a>.</p>
<p>The first task is to transform the data appropriately from the weirdly <a href="http://www.hplovecraft.com/writings/texts/fiction/cc.aspx">non-Euclidean geometry</a> of longitude and latitude to a geometry that allows for consistent measurement of distance between points. Following this we can run a standard <a href="https://en.wikipedia.org/wiki/Kernel_density_estimation" target="_blank" rel="noopener">kernel density estimate</a> of sightings that, briefly, produces a probability distribution over the analysed space that estimates the likelihood of a UFO sighting at each point.</p>
<p>The output of spatstat&#8217;s density function is a pixel image, appropriate for plotting in R&#8217;s base graphics. As fanatical devotees of the cult of <a href="https://ggplot2.tidyverse.org/" target="_blank" rel="noopener"><code>ggplot2</code></a>, however, we instead convert this image to a raster suitable for plotting with ggplot&#8217;s <code>geom_raster</code>.</p>
<p>With the density plot calculated we can load the US military base data and perform a similar transform from longitude and latitude to Euclidean space. We then restrict the data to each active US Airforce installation, and plot the resulting set of points over the underlying density plot of UFO sightings. Being worthy of special attention, we highlight the location of Area 51.</p>
<p>At a first glance, there do seem to be correlations with particular clusters of airforce installations and UFO sightings. It is immediately obvious that sightings are much more common on the coasts of the US than in the centre of the continent, although the relatively sparse population density might go some way towards explaining this phenomenon.</p>
<p>The next step in this analysis, which we will carry out in a future post, will be to conduct a formal analysis of the correlation between the sightings density and the distance from airforce bases. From our initial observations, however, dark suspicions have already been raised.</p>
<p>You can keep up to date with our latest tearings of the mathematical veil on Twitter at <a href="https://twitter.com/weirddatasci" rel="noopener" target="_blank">@WeirdDataSci</a>.</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show analysis code</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p><strong>Data:</strong></p>
<ul>
<li>NUFORC UFO Sightings Data compiled by Timothy Renner: <a href="https://data.world/timothyrenner/ufo-sightings">https://data.world/timothyrenner/ufo-sightings</a></li>
<li>Natural Earth Small-Scale (1:100m) Physical Geographic Data: <a href="http://www.naturalearthdata.com/">http://www.naturalearthdata.com</a></li>
<li>US Data.Gov Military Bases (National): <a href="https://catalog.data.gov/dataset/military-bases-national">https://catalog.data.gov/dataset/military-bases-national</a></li>
</ul>
<p><strong>Other:</strong></p>
<ul>
<li>Tox Typewriter font: <a href="https://www.dafont.com/tox-typewriter.font">https://www.dafont.com/tox-typewriter.font</a>
</ul>
<p><strong>Code:</strong><br />
[code language=&#8221;r&#8221;]
library(spatstat)</p>
<p>library(rgdal)<br />
library(maptools)</p>
<p>library(tidyverse)<br />
library(ggplot2)<br />
library(ggthemes)<br />
library(raster)<br />
library(viridis)<br />
library(scales)</p>
<p>library(showtext)</p>
<p># Load font<br />
font.add( &amp;quot;mapfont&amp;quot;, &amp;quot;font/Tox Typewriter.ttf&amp;quot;)<br />
showtext_auto()</p>
<p># Read world shapefile data and tranform to an appropriate projection<br />
world &amp;lt;- readOGR( dsn=&#8217;data/ne/110m_cultural&#8217;, layer=&#8217;ne_110m_admin_0_countries&#8217; )<br />
world.subset &amp;lt;- world[ world$continent %in% c(&amp;quot;North America&amp;quot;), ]
world.subset &amp;lt;- spTransform(world.subset,CRS(&amp;quot;+init=epsg:4326&amp;quot;))</p>
<p># Fortify world data for plotting<br />
world.df &amp;lt;- fortify( world.subset )</p>
<p># Get list of unique countries for processing.<br />
countries.all &amp;lt;- data.frame( unique(world.df$id)[-1])<br />
colnames( countries.all ) &amp;lt;- c(&amp;quot;country&amp;quot;)</p>
<p># Get US military base data<br />
us.mil &amp;lt;- readOGR( dsn=&#8217;data/usmil&#8217;, layer=&#8217;MIRTA_Points&#8217; )<br />
us.mil &amp;lt;- spTransform(us.mil,CRS(&amp;quot;+init=epsg:4326&amp;quot;))</p>
<p># UFO Sightings Data<br />
ufo &amp;lt;- read.csv(&amp;quot;data/scrubbed.csv&amp;quot;, stringsAsFactors=FALSE)</p>
<p># Convert latitude and longitude to numeric, and omit result NA values.<br />
ufo$latitude &amp;lt;- as.numeric(ufo$latitude)<br />
ufo$longitude &amp;lt;- as.numeric(ufo$longitude)<br />
ufo &amp;lt;- na.omit( ufo )</p>
<p># Convert the ufo dataframe to a spatial dataframe that contains explicit longitude and latitude projected appropriately for plotting.<br />
coordinates( ufo ) &amp;lt;- ~longitude+latitude<br />
proj4string( ufo )&amp;lt;-CRS(&amp;quot;+init=epsg:4326&amp;quot;)<br />
ufo &amp;lt;- spTransform(ufo,CRS(proj4string(ufo)))</p>
<p># Create window for spatial analysis<br />
ufo.owin &amp;lt;- as.owin.SpatialPolygons(world.subset)<br />
ufo.ppp &amp;lt;- ppp( x=coordinates(ufo)[,1], y=coordinates(ufo)[,2], window = ufo.owin, marks = ufo$shape )</p>
<p># This discards &#8216;illegal&#8217; points outside of the window<br />
ufo.ppp &amp;lt;- as.ppp(ufo.ppp)</p>
<p># Highlight Area 51<br />
roswell.coords = cbind(-115.806999, 37.237 )<br />
roswell.sp = SpatialPoints(roswell.coords)</p>
<p>proj4string( roswell.sp )&amp;lt;-CRS(&amp;quot;+proj=longlat&amp;quot;)<br />
roswell.sp &amp;lt;- spTransform(roswell.sp,CRS(&amp;quot;+init=epsg:4326&amp;quot;))<br />
roswell.sp.df &amp;lt;- as.data.frame( coordinates( roswell.sp ) )<br />
roswell.sp.df$text.label &amp;lt;- &#8216;Area 51&#8217;</p>
<p># Get only active US airforce bases.<br />
us.mil.airforce &amp;lt;- us.mil[ us.mil$COMPONENT == &amp;quot;AF Active&amp;quot;, ]
us.mil.ppp &amp;lt;- ppp( x=coordinates(us.mil.airforce)[,1], y=coordinates(us.mil.airforce)[,2], window = ufo.owin )<br />
us.mil.ppp &amp;lt;- as.ppp( us.mil.ppp )<br />
us.mil.sp &amp;lt;- as( us.mil.ppp, &amp;quot;SpatialPoints&amp;quot; )</p>
<p># Calculate density of UFO sightings<br />
ufo.density &amp;lt;- density( ufo.ppp, diggle=T, sigma=1.4, dimyx=c(2048,2048) )</p>
<p># Make density image object usable by ggplot as a raster<br />
ufo.density.raster &amp;lt;- raster(ufo.density)<br />
raster.df &amp;lt;- as.data.frame( rasterToPoints( ufo.density.raster))</p>
<p># Plot<br />
gp &amp;lt;- ggplot() +</p>
<p># Underlying map<br />
geom_map( data = world.df, aes( map_id=id ), colour = &amp;quot;grey20&amp;quot;, size = 0.5, map = world.df, show.legend = FALSE ) +</p>
<p># Theming<br />
theme_map() +<br />
theme(<br />
plot.background = element_rect(fill = &amp;quot;#444444&amp;quot;),<br />
panel.border = element_blank(),<br />
plot.title = element_text( size=24, colour=&amp;quot;#cccccc&amp;quot;, family=&amp;quot;mapfont&amp;quot; ),<br />
text = element_text( size=14, color=&amp;quot;#cccccc&amp;quot;, family=&amp;quot;mapfont&amp;quot; ),<br />
) +</p>
<p>theme(<br />
legend.background = element_rect( colour =&amp;quot;#444444&amp;quot;, fill = &amp;quot;#444444&amp;quot; ),<br />
legend.key = element_rect(fill = &amp;quot;#ffffff&amp;quot;, colour=&amp;quot;#444444&amp;quot;),<br />
legend.position = c(1,0),<br />
legend.justification = c(1,0)<br />
) +</p>
<p>guides( fill = guide_colourbar( title.position=&amp;quot;top&amp;quot;, direction=&amp;quot;horizontal&amp;quot;, barwidth=16 ) ) +</p>
<p>   # Add density of sightings as raster.<br />
   geom_raster( data = raster.df, alpha=1, aes( x=x, y=y, fill=layer), show.legend=TRUE ) +</p>
<p># Add density of sightings as raster.<br />
geom_raster( data = raster.df, alpha=1, aes( x=x, y=y, fill=layer), show.legend=FALSE) +</p>
<p># Colour using Viridis<br />
scale_fill_viridis( option=&amp;quot;magma&amp;quot; )</p>
<p># Overlay US Airforce locations<br />
dot.size &amp;lt;- 1.2<br />
stroke.size &amp;lt;- 0.4</p>
<p>gp &amp;lt;- gp +<br />
geom_point( data = as.data.frame( coordinates(us.mil.sp) ), aes( x=mx, y=my ), size=dot.size, stroke=stroke.size, shape=21, fill=&amp;quot;#eeeeee&amp;quot;, color=&amp;quot;#222222&amp;quot; ) +</p>
<p># Special case for Area 51<br />
# Due to ggplot only supporting a single colour scale for the entire plot, there&#8217;s some awkwardness here in using &#8216;colour&#8217; rather than &#8216;fill&#8217; for the Area 51 dot and manually adding an outline. This lets us add an easy legend to the map, though, highlighting that location.<br />
geom_point( data = roswell.sp.df, aes( x=coords.x1, y=coords.x2, colour=text.label ), size=1, stroke=0, show.legend=TRUE ) +<br />
geom_point( data = roswell.sp.df, aes( x=coords.x1, y=coords.x2 ), size=1, stroke=stroke.size, shape=21, color=&amp;quot;#222222&amp;quot; ) +<br />
scale_colour_manual(values = c(&amp;quot;red&amp;quot;)) +<br />
theme( legend.title = element_blank(), legend.background = element_rect( fill = &amp;quot;#444444&amp;quot; ), legend.key = element_rect(fill = &amp;quot;#444444&amp;quot;) ) +</p>
<p># Acknowledge data sources<br />
labs( caption = &amp;quot;Data: http://www.nuforc.org | https://catalog.data.gov/dataset/military-bases-national&amp;quot; ) +</p>
<p># Add our title<br />
ggtitle(&amp;quot;Density of UFO Sightings against Active US Airforce Installations&amp;quot;, subtitle=&amp;quot;http://www.weirddatascience.net | @WeirdDataSci&amp;quot;)</p>
<p># Output<br />
ggsave( &amp;quot;output/base-density.pdf&amp;quot;, width=16, height=9 )<br />
[/code]
</div></div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.weirddatascience.net/2018/02/27/are-ufos-more-commonly-seen-near-us-military-bases/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">174</post-id>	</item>
		<item>
		<title>Unveiling the Global UFO Inquirer</title>
		<link>https://www.weirddatascience.net/2018/02/22/unveiling-the-global-ufo-inquirer/</link>
					<comments>https://www.weirddatascience.net/2018/02/22/unveiling-the-global-ufo-inquirer/#respond</comments>
		
		<dc:creator><![CDATA[moth]]></dc:creator>
		<pubDate>Thu, 22 Feb 2018 07:52:54 +0000</pubDate>
				<category><![CDATA[interactive]]></category>
		<category><![CDATA[maps]]></category>
		<category><![CDATA[ufo]]></category>
		<guid isPermaLink="false">http://www.weirddatascience.net/blog/?p=152</guid>

					<description><![CDATA[<div class="mh-excerpt">Our previous post showed an interactive map of UFO sightings drawn from the NUFORC dataset. Whilst this did allow us to see UFO sightings over time, it remained a relatively crude tool that created a static visualisation for each year. In delving deeper into the NUFORC data, it <a class="mh-excerpt-more" href="https://www.weirddatascience.net/2018/02/22/unveiling-the-global-ufo-inquirer/" title="Unveiling the Global UFO Inquirer">[...]</a></div>]]></description>
										<content:encoded><![CDATA[<p>Our <a href="http://www.weirddatascience.net/index.php/2018/02/18/interactive-map-of-ufo-sightings-by-year/" rel="noopener noreferrer" target="_blank">previous post</a> showed an interactive map of UFO sightings drawn from the <a href="http://www.nuforc.org/" rel="noopener noreferrer" target="_blank">NUFORC</a> dataset. Whilst this did allow us to see UFO sightings over time, it remained a relatively crude tool that created a static visualisation for each year.</p>
<p>In delving deeper into the NUFORC data, it has become useful to create a more interactive tool. As we believe in providing potentially dangerous knowledge to the woefully unprepared mind, the Weird Data Science laboratory trepidatiously unveils the <a href="https://trapezohedron.weirddatascience.net/weirddatascience.net/global-ufo-inquirer" rel="noopener noreferrer" target="_blank">Global UFO Inquirer</a>.</p>
<p><iframe src="https://trapezohedron.weirddatascience.net/weirddatascience.net/global-ufo-inquirer/" 
        style="width: 100%; height: 800px" allowfullscreen="true"><br />
Apologies, it seems that your browser doesn&#8217;t support iframes.<br />
</iframe></p>
<p>You can visit the Global UFO Inquirer directly <a href="https://trapezohedron.weirddatascience.net/weirddatascience.net/global-ufo-inquirer/">here</a>.</p>
<p>This is a <a href="http://leafletjs.com/" rel="noopener noreferrer" target="_blank">Leaflet</a>-based map, built on <a href="https://www.r-project.org/" rel="noopener noreferrer" target="_blank">R</a> and <a href="https://shiny.rstudio.com" rel="noopener noreferrer" target="_blank">Shiny</a>. The map can be explored at will, restricted to any range of dates, with each data point from the NUFORC dataset labelled. Clicking on the icon for a report will unveil the location, date, and details of that report.</p>
<p>This has highlighted that our initial dataset is sadly incomplete. Firstly, NUFORC reports are frequently updated and the dataset we have ends in 2014. Of greater importance, however, is that the dataset to which we have access contains only the summary details of each report, which is restricted to the first line of text. The NUFORC website contains the full details, but these are not currently in our dataset.</p>
<p>We will therefore be developing a scraper to extract the full data from the NUFORC website as it is updated and feed it into the Inquirer. There is existing code floating around the net to carry out part of this task, but more work is needed. As an initial step we will link each report in the Global UFO Inquirer to the full online NUFORC report, but full data in the tool is the ultimate goal.</p>
<p>You can keep up to date with our latest collating of statistical horror on Twitter at <a href="https://twitter.com/weirddatasci" rel="noopener noreferrer" target="_blank">@WeirdDataSci</a>.</p>
<p>Every day, the horror deepens.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.weirddatascience.net/2018/02/22/unveiling-the-global-ufo-inquirer/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">152</post-id>	</item>
	</channel>
</rss>
