<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>scraping &#8211; Weird Data Science</title>
	<atom:link href="https://www.weirddatascience.net/category/scraping/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.weirddatascience.net</link>
	<description>Paranormal Distributions. Cyclopean Data. Esoteric Regression.</description>
	<lastBuildDate>Sat, 15 Nov 2025 14:01:45 +0000</lastBuildDate>
	<language>en-GB</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
<site xmlns="com-wordpress:feed-additions:1">143387998</site>	<item>
		<title>Bayes vs. the Invaders (Redivivus)</title>
		<link>https://www.weirddatascience.net/2025/11/15/bayes-vs-the-invaders-redivivus/</link>
					<comments>https://www.weirddatascience.net/2025/11/15/bayes-vs-the-invaders-redivivus/#respond</comments>
		
		<dc:creator><![CDATA[moth]]></dc:creator>
		<pubDate>Sat, 15 Nov 2025 10:50:27 +0000</pubDate>
				<category><![CDATA[event]]></category>
		<category><![CDATA[scraping]]></category>
		<category><![CDATA[stan]]></category>
		<category><![CDATA[ufo]]></category>
		<guid isPermaLink="false">https://www.weirddatascience.net/?p=10231</guid>

					<description><![CDATA[<div class="mh-excerpt">Unidentifiable aerial and marine phenomena. Impossible lights in the sky. Patterns of visitation and terror. Insidious influences from the hadal voids between the stars. Who--what--swoop and glide through the ink-black nights of our world, probing and testing our structures, our societies, our minds? From barely remembered history, to early reports of impossible objects, to blurrily evidenced documentation, data concerning flying arcane observations has grown and twisted, along with our capacity to lay them bare, to subject them to analysis, and to interrogate their secrets.

In this year's OII Halloween Lecture, we will tremblingly revisit a Bayesian analysis of seventy years of UFO sightings, drawn from a dataset collected by the National UFO Reporting Center (NUFORC). Scepticism, fear, doubt, and most accepted standards of statistical rigour will be cast aside in our unyielding and disquieting pursuit of an uncompromised truth.</div> <a class="mh-excerpt-more" href="https://www.weirddatascience.net/2025/11/15/bayes-vs-the-invaders-redivivus/" title="Bayes vs. the Invaders (Redivivus)">[...]</a>]]></description>
										<content:encoded><![CDATA[<p>Straying still further from whatever dubious graces it once presumed, academia&#8217;s luciferous descent into murky realms of huddled speculation continues unabated. Relocated, reconstituted, in ever-fading cycles, the Oxford Internet Institute at the University of Oxford once again saw fit to deprive its students of the peace and comfort of banal rationality through this fourth annual Halloween Lecture.</p>
<p>In the absence of fresh insight, this year&#8217;s lecture revisits the chilling implications of humanity&#8217;s contact with inexplicable aerial and marine phenomena, drawing from the faintest whispers of the ante-historical record through to the shimmering echoes of statistical reasoning. Through what means do visitors from beyond the void encroach on our night-time skies? What subtle deceptions underpin their visible geometries? What values lie behind their peculiar interests in certain uncomfortably-favoured regions?</p>
<p>Despite all safeguards, and in the face of numerous barely-perceptible currents opposing such efforts, this event was captured, stored, and released into a world still cruelly unprepared to face its findings.</p>
<p>More details, and the underlying code, for these findings can be found&#8211;for those unwary enough to look&#8211;in the series of entries beginning <a href="https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/">here</a>.</p>
<p><a href="https://www.youtube.com/watch?v=WobPKY4UZc0"><img decoding="async" src="https://img.youtube.com/vi/WobPKY4UZc0/2.jpg" alt="Bayes vs. the Invaders (Redivivus)"></a></p>
<p><a href="https://www.youtube.com/watch?v=WobPKY4UZc0">Click here to view the video on YouTube</a>.</p>

<blockquote><p>
<strong>Oxford Internet Institute Halloween Lecture</strong><br />
Bayes vs. the Invaders (Redivivus): A Bayesian Analysis of 70 Years of UFO Sightings<br />
<em>Prof. Joss Wright</em><br />
<em>Oxford. October 2025</em></p>
<p>The searing heat of summer retreats, cools, fades, surrendering its vitality to the flickering uncertainties of autumn. Nights draw close, like dimly-remembered friends clustering in our dreams. The spring leaves abandon their verdant dance, as they age, wither, and drift into a russet swirl of skeletal, wind-stirred fragments. </p>
<p>The dying seasons return, dragging with them time-hallowed fears and uneasy rumours, pooling around these darkly dreaming spires in a mire of primal superstition. The agonizingly brittle certainties of the modern enlightenment, our desperate faith in the gossamer fabrics of scientific progress, falter in the face of primal terrors that lurk implacably in the gloom.</p>
<p>In these darkening days, as faith in our treasured understanding dims, it is yet again time to turn our faces fully to the darkness. Halloween, slouching inexorably towards our minds, impels us as scholars to gather our methods, our theories, our data, our knowledge; and glean what light we can from the primordial glimmers of the unknown.</p>
<p>Unidentifiable aerial and marine phenomena. Impossible lights in the sky. Patterns of visitation and terror. Insidious influences from the hadal voids between the stars. Who&#8211;what&#8211;swoop and glide through the ink-black nights of our world, probing and testing our structures, our societies, our minds? From barely remembered history, to early reports of impossible objects, to blurrily evidenced documentation, data concerning flying arcane observations has grown and twisted, along with our capacity to lay them bare, to subject them to analysis, and to interrogate their secrets.</p>
<p>In this year&#8217;s OII Halloween Lecture, we will tremblingly revisit a Bayesian analysis of seventy years of UFO sightings, drawn from a dataset collected by the National UFO Reporting Center (NUFORC). Scepticism, fear, doubt, and most accepted standards of statistical rigour will be cast aside in our unyielding and disquieting pursuit of an uncompromised truth.
</p></blockquote>
<a href="https://www.weirddatascience.net/wp-content/uploads/2025/11/2025-bayes_vs_invaders-redivivus.pdf" class="pdfemb-viewer" style="" data-width="max" data-height="max" data-mobile-width="500"  data-scrollbar="none" data-download="on" data-tracking="on" data-newwindow="on" data-pagetextbox="off" data-scrolltotop="off" data-startzoom="100" data-startfpzoom="100" data-toolbar="bottom" data-toolbar-fixed="off">2025-bayes_vs_invaders-redivivus<br/></a>
]]></content:encoded>
					
					<wfw:commentRss>https://www.weirddatascience.net/2025/11/15/bayes-vs-the-invaders-redivivus/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">10231</post-id>	</item>
		<item>
		<title>Bayes vs. the Invaders! Part One: The 37th Parallel</title>
		<link>https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/</link>
					<comments>https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/#comments</comments>
		
		<dc:creator><![CDATA[moth]]></dc:creator>
		<pubDate>Wed, 03 Apr 2019 13:03:10 +0000</pubDate>
				<category><![CDATA[beyond the veil]]></category>
		<category><![CDATA[scraping]]></category>
		<category><![CDATA[stan]]></category>
		<category><![CDATA[ufo]]></category>
		<guid isPermaLink="false">http://www.weirddatascience.net/?p=503</guid>

					<description><![CDATA[<div class="mh-excerpt">From our earlier <a href="http://www.weirddatascience.net/index.php/2018/02/27/are-ufos-more-commonly-seen-near-us-military-bases/">studies of UFO sightings</a>, a recurring question has been the extent to which the frequency of sightings of inexplicable otherworldly phenomena depends on the population of an area. Intuitively: where there are more people to catch a glimpse of the unknown, there will be more reports of alien visitors. Is this hypothesis, however, true? Do UFO sightings closely follow population or are there other, less comforting, factors at work?</div> <a class="mh-excerpt-more" href="https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/" title="Bayes vs. the Invaders! Part One: The 37th Parallel">[...]</a>]]></description>
										<content:encoded><![CDATA[<h1>Introduction</h1>
<p>From our earlier <a href="http://www.weirddatascience.net/index.php/2018/02/27/are-ufos-more-commonly-seen-near-us-military-bases/">studies of UFO sightings</a>, a recurring question has been the extent to which the frequency of sightings of inexplicable otherworldly phenomena depends on the population of an area. Intuitively: where there are more people to catch a glimpse of the unknown, there will be more reports of alien visitors.</p>
<p>Is this hypothesis, however, true? Do UFO sightings closely follow population, or are there other, less comforting, factors at work?</p>
<p>In this short series of posts, we will build a statistical model of UFO sightings in the United States, based on data <a href="http://www.weirddatascience.net/blog/index.php/">previously scraped</a> from the <a href="http://www.nuforc.org">National UFO Reporting Center</a>, and see how well we can predict the rate of UFO sightings from state population.</p>
<p>This series of posts is part tutorial and part exploration of a set of modelling tools and techniques. Specifically, we will use Generalized Linear Models (GLMs), Bayesian inference, and the <a href="http://www.mc-stan.org">Stan</a> probabilistic programming language to unveil the relationship between unsuspecting populations of US states and the dread sightings of extraterrestrial truth that they experience.</p>
<h1>Data</h1>
<p>As mentioned, we will rely on data from <a href="http://www.nuforc.org">NUFORC</a> for extraterrestrial sightings.</p>
<p>For population data, we can rely on the <a href="https://fred.stlouisfed.org/release?rid=118">FRED</a> database for historical US state-level census data. The combination of these datasets provides us with a count of UFO sightings per year for each state, and the population of that state in that year.</p>
<p>The downloading and scraping code is included here:</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show scraping code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p>Zsh script to download each series via <code>curl</code>:<br />
[code language="bash"]
#!/bin/zsh<br />
# Download US state-level population datasets from FRED<br />
# State series names are stored in the file &#8216;series_names&#8217; (downloaded from fred.stlouisfed.org)<br />
# &lt;https://fred.stlouisfed.org/release?rid=118&gt;<br />
#<br />
# The per-series request is included below.</p>
<p>export IFS=$'\n'</p>
<p># Download<br />
for state_series in $(cat series_names); do</p>
<p>curl -o &quot;output/$state_series.csv&quot; &quot;https://fred.stlouisfed.org/graph/fredgraph.csv?bgcolor=%23e1e9f0&amp;chart_type=line&amp;drp=0&amp;fo=open%20sans&amp;graph_bgcolor=%23ffffff&amp;height=450&amp;mode=fred&amp;recession_bars=on&amp;txtcolor=%23444444&amp;ts=12&amp;tts=12&amp;width=1168&amp;nt=0&amp;thu=0&amp;trc=0&amp;show_legend=yes&amp;show_axis_titles=yes&amp;show_tooltip=yes&amp;id=$state_series&amp;scale=left&amp;cosd=1900-01-01&amp;coed=2018-01-01&amp;line_color=%234572a7&amp;link_values=false&amp;line_style=solid&amp;mark_type=none&amp;mw=3&amp;lw=2&amp;ost=-99999&amp;oet=99999&amp;mma=0&amp;fml=a&amp;fq=Annual&amp;fam=avg&amp;fgst=lin&amp;fgsnd=2009-06-01&amp;line_index=1&amp;transformation=lin&amp;vintage_date=2019-03-04&amp;revision_date=2019-03-04&amp;nd=1900-01-01&quot;</p>
<p>done<br />
[/code]</p>
<p>Required <code>series_names</code> file:<br />
[code language="text"]
WAPOP<br />
GAPOP<br />
CAPOP<br />
MOPOP<br />
DSPOP<br />
ILPOP<br />
TXPOP<br />
NYPOP<br />
FLPOP<br />
ALPOP<br />
COPOP<br />
WIPOP<br />
AZPOP<br />
MIPOP<br />
NCPOP<br />
MAPOP<br />
CTPOP<br />
LAPOP<br />
OHPOP<br />
AKPOP<br />
TNPOP<br />
MNPOP<br />
NJPOP<br />
NMPOP<br />
ARPOP<br />
MDPOP<br />
PAPOP<br />
NVPOP<br />
IAPOP<br />
ORPOP<br />
T5POP<br />
DCPOP<br />
HIPOP<br />
NDPOP<br />
KYPOP<br />
VAPOP<br />
IDPOP<br />
KSPOP<br />
INPOP<br />
WVPOP<br />
RIPOP<br />
SCPOP<br />
MSPOP<br />
DEPOP<br />
MTPOP<br />
MEPOP<br />
NEPOP<br />
OKPOP<br />
WYPOP<br />
UTPOP<br />
NHPOP<br />
VTPOP<br />
SDPOP<br />
[/code]</p>
<p>R code to combine the downloaded files into a single tidy table:<br />
[code language="r"]
library( tidyverse )</p>
<p># List all downloaded CSV files<br />
census_files &lt;- list.files( &quot;output&quot;, pattern=&quot;\\.csv$&quot;, full.names=TRUE )</p>
<p># Join all data into a single table<br />
census_data &lt;-<br />
	census_files %&gt;%<br />
	map( read_csv ) %&gt;%  # Read each file, forming a list with an element for each<br />
	reduce( full_join, by=&quot;DATE&quot; ) %&gt;%  # Reduce (left to right) running a full join<br />
	dplyr::arrange( DATE ) %&gt;%  # Sort by date<br />
	gather( key=&quot;state&quot;, value=&quot;population&quot;, -DATE ) %&gt;%  # Gather to long format<br />
	transmute( date=DATE, state=str_replace( state, &quot;POP&quot;, &quot;&quot; ), population )  # Rename and tidy variables and names</p>
<p># Output to an .rds<br />
saveRDS( census_data, &quot;data/annual_population.rds&quot; )</p>
[/code]
</div></div>
</div>
<p>For ease, we will treat each year&#8217;s count of sightings as <em>independent</em> of the previous year&#8217;s. We do not assume that the number of sightings in one year depends on the number in the year before; it is instead driven by the unknowable schemes of alien minds. (If extraterrestrial visitors were secretly colonising areas, and thus being seen repeatedly, rather than making sporadic visits, we might not want to make such a bold assumption.) Each annual count will be treated as an individual, independent data point relating population to count, with each observation tagged by state.</p>
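<p>The independence assumption can at least be probed informally. The sketch below is written in base R, so it runs without the tidyverse. For each state, it computes the correlation between consecutive annual counts; values near zero are consistent with the assumption. The helper <code>lag1_cor_by_state()</code> and the simulated data frame are hypothetical stand-ins for the <code>ufo_population_sightings</code> table prepared later in this post. They are not part of the original analysis.</p>

```r
# Rough check of the year-to-year independence assumption (base R only).
# lag1_cor_by_state() is a hypothetical helper: it expects a data frame with
# columns date, state and count, like the ufo_population_sightings table.
lag1_cor_by_state <- function(df) {
  by_state <- split(df, df$state)
  sapply(by_state, function(d) {
    d <- d[order(d$date), ]          # chronological order within the state
    n <- nrow(d)
    if (n < 3) return(NA_real_)      # too few years to say anything useful
    cor(d$count[-n], d$count[-1])    # count in year t against year t + 1
  })
}

# Simulated data only (not NUFORC figures): two invented states, 25 years
# each, with every count drawn independently of the previous year's.
set.seed(1)
sim <- data.frame(
  date  = rep(1990:2014, times = 2),
  state = rep(c("AA", "BB"), each = 25),
  count = rpois(50, lambda = 40)
)

lag1_cor_by_state(sim)
# With the real data (path as saved in this post):
#   lag1_cor_by_state(readRDS("work/ufo_population_sightings.rds"))
```

<p>A shared upward trend, such as population growth, will also push this correlation above zero. A stricter version of the check would therefore apply the same calculation to the residuals of a fitted model rather than to the raw counts.</p>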
<p>For simplicity, particularly in building later models, we will restrict ourselves to sightings from 1990 to 2014. The lower bound roughly marks the period in which the NUFORC data see a significant increase in reporting, and so rely less on historical reports. (NUFORC&#8217;s phone hotline has existed since 1974, and its web form since 1998.)</p>
<h1>An Awful Simplicity</h1>
<p>We begin with the most basic form of model: a simple linear relationship between the count of sightings and the population of the state at that time. If sightings depended purely on population, it might be reasonable to expect such a model to fit the data fairly well.</p>
<p>This relationship can be plotted with relative ease using the <code>geom_smooth()</code> function of <code>ggplot2</code> in R. For opening our eyes to the awful truth contained in the data, this is a useful first step.</p>
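<p><code>geom_smooth(method = "lm")</code> draws an ordinary least-squares line with a confidence band, but it does not report the fitted coefficients. The minimal base-R sketch below fits the same global model directly with <code>lm()</code>. The simulated populations and counts are assumptions that stand in for the real <code>ufo_population_sightings</code> table.</p>

```r
# Global linear model: annual sightings as a straight-line function of state
# population, ignoring which state each observation comes from.
# Simulated data only; with the real data, use
#   ufo_population_sightings <- readRDS("work/ufo_population_sightings.rds")
set.seed(42)
ufo_population_sightings <- data.frame(
  population = runif(100, min = 5e5, max = 4e7)
)
ufo_population_sightings$count <-
  rpois(100, lambda = 1e-5 * ufo_population_sightings$population)

global_fit <- lm(count ~ population, data = ufo_population_sightings)

coef(global_fit)      # intercept, and extra sightings per year per resident
confint(global_fit)   # 95% confidence intervals for both coefficients

# Slope rescaled to sightings per year per million residents
coef(global_fit)[["population"]] * 1e6
```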
<figure id="attachment_539" aria-describedby="caption-attachment-539" style="width: 1920px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined.png"><img fetchpriority="high" decoding="async" data-attachment-id="539" data-permalink="https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/lm_ufo_population_sightings-combined-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="lm_ufo_population_sightings-combined" data-image-description="" data-image-caption="&lt;p&gt;Global linear regression of UFO sightings against population.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined.png" alt="Regression of UFO sightings against population." width="1920" height="1080" class="size-full wp-image-539" srcset="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined.png 1920w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined-64x36.png 64w" sizes="(max-width: 1920px) 100vw, 1920px" /></a><figcaption id="caption-attachment-539" class="wp-caption-text">Global linear regression of UFO sightings against population. (<a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-combined.pdf">PDF Version</a>)</figcaption></figure>
<p>While this graph does seem to support the argument that sightings increase with population <em>in general</em>, a closer inspection shows that the individual data points are clearly clustered. Colouring each point by its US state makes this clearer:</p>
<figure id="attachment_541" aria-describedby="caption-attachment-541" style="width: 1920px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state.png"><img decoding="async" data-attachment-id="541" data-permalink="https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/lm_ufo_population_sightings-state-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="lm_ufo_population_sightings-state" data-image-description="" data-image-caption="&lt;p&gt;Global linear regression of UFO sightings against population with per-state colours.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state.png" alt="Global linear regression of UFO sightings against population with per-state colours." width="1920" height="1080" class="size-full wp-image-541" srcset="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state.png 1920w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state-64x36.png 64w" sizes="(max-width: 1920px) 100vw, 1920px" /></a><figcaption id="caption-attachment-541" class="wp-caption-text">Global linear regression of UFO sightings against population with per-state colours. (<a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-state.pdf">PDF Version</a>)</figcaption></figure>
<p>This strongly suggests that, in preference to the simple linear model across all sightings, we might instead fit a linear model individually to each state:</p>
<figure id="attachment_543" aria-describedby="caption-attachment-543" style="width: 1920px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends.png"><img decoding="async" data-attachment-id="543" data-permalink="https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/lm_ufo_population_sightings-trends-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="lm_ufo_population_sightings-trends" data-image-description="" data-image-caption="&lt;p&gt;Per-state linear regression of UFO sightings against population.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends.png" alt="Per-state linear regression of UFO sightings against population." width="1920" height="1080" class="size-full wp-image-543" srcset="https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends.png 1920w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends-64x36.png 64w" sizes="(max-width: 1920px) 100vw, 1920px" /></a><figcaption id="caption-attachment-543" class="wp-caption-text">Per-state linear regression of UFO sightings against population. (<a href="http://www.weirddatascience.net/wp-content/uploads/2019/04/lm_ufo_population_sightings-trends.pdf">PDF Version</a>)</figcaption></figure>
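<p>Mapping <code>colour = state</code> makes <code>geom_smooth()</code> fit one least-squares line per state. The base-R sketch below does the same thing explicitly. It then checks that a single model with a population-by-state interaction, <code>count ~ population * state</code>, gives identical point estimates. Its standard errors differ, because that model pools residual variance across states. The three state codes and their rates are invented for illustration.</p>

```r
# Per-state linear models: one intercept and slope for each state.
# Simulated data only (not NUFORC/FRED): three invented states whose
# underlying per-capita sighting rates differ.
set.seed(7)
rates <- c(AA = 0.5e-5, BB = 1e-5, CC = 2e-5)  # sightings per person per year
sim <- do.call(rbind, lapply(names(rates), function(s) {
  pop <- seq(1e6, 5e6, length.out = 25)        # a steadily growing population
  data.frame(state = s, population = pop,
             count = rpois(25, lambda = rates[[s]] * pop))
}))

# One lm() per state; keep the population slope from each fit.
state_fits   <- lapply(split(sim, sim$state),
                       function(d) lm(count ~ population, data = d))
state_slopes <- sapply(state_fits, function(f) coef(f)[["population"]])
state_slopes * 1e6   # sightings per year per million residents, by state

# The same point estimates from one model with a state-by-population
# interaction: state CC's slope is the baseline slope plus its interaction.
pooled <- lm(count ~ population * state, data = sim)
coef(pooled)[["population"]] + coef(pooled)[["population:stateCC"]]
```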
<p>The code to produce the above graphs from the NUFORC and FRED data is given below:</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show data preparation and visualization code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p>Prepare and combine datasets:<br />
[code language="r"]
library( tidyverse )<br />
library( magrittr )<br />
library( lubridate )</p>
<p># Prepare data for model fitting (and plotting)</p>
<p># Load US population and UFO datasets<br />
ufo &lt;- read_csv( &quot;data/ufo_spatial.csv&quot; )<br />
census &lt;- readRDS( &quot;data/annual_population.rds&quot; )</p>
<p># Process UFO data to per-state counts per year.<br />
# Drop Puerto Rico as we don&#8217;t have census data. (Also, very few sightings &#8212; 33 in dataset.)<br />
ufo_state_annual &lt;-<br />
	ufo %&gt;%<br />
	# US only<br />
	filter( country == &quot;us&quot; ) %&gt;%<br />
	# Apologies to Puerto Rico.<br />
	filter( state != &quot;pr&quot; ) %&gt;%<br />
	# Convert date to year, drop all other variables except state.<br />
	transmute( date = year( as.POSIXct( datetime, format=&quot;%m/%d/%Y %H:%M&quot; ) ), state=str_to_upper( state ) ) %&gt;%<br />
	# Group by year and state<br />
	group_by( date, state ) %&gt;%<br />
	# Count sightings per state and year<br />
	summarize( count = n() )</p>
<p># Prepare census data for joining with UFO sightings.<br />
# Drop &quot;DS&quot; state entry &#8212; (&quot;Department of State&quot;?)<br />
census &lt;-<br />
	census %&gt;%<br />
	filter( state != &quot;DS&quot; ) %&gt;%<br />
	mutate( date=year( date ) ) </p>
<p># Join datasets<br />
ufo_population_sightings &lt;-<br />
	full_join( ufo_state_annual, census, by=c( &quot;date&quot;, &quot;state&quot; ) )</p>
<p># Missing data implies zero sightings.<br />
# Restrict to 1990-2014, avoiding earlier years with a high proportion of<br />
# very small numbers of sightings.<br />
ufo_population_sightings &lt;-<br />
	ufo_population_sightings %&gt;%<br />
	mutate( count = replace_na( count, 0 ) ) %&gt;%<br />
	filter( !is.na( population ) ) %&gt;%<br />
	filter( date &gt;= 1990 ) %&gt;%<br />
	filter( date &lt;= 2014 )</p>
<p>saveRDS( ufo_population_sightings, &quot;work/ufo_population_sightings.rds&quot; )<br />
[/code]</p>
<p>Plot the data and fit linear trends via <code>geom_smooth()</code> using a linear model:<br />
[code language="r"]
library( tidyverse )<br />
library( magrittr )<br />
library( lubridate )</p>
<p>library( ggplot2 )<br />
library( showtext )<br />
library( RColorBrewer )</p>
<p>library( cowplot )</p>
<p># Load UFO data<br />
ufo_population_sightings &lt;-<br />
	readRDS(&quot;work/ufo_population_sightings.rds&quot;)</p>
<p># UFO reporting font<br />
font_add( &quot;main_font&quot;, &quot;/usr/share/fonts/TTF/weird/Tox Typewriter.ttf&quot;)<br />
showtext_auto()</p>
<p># Combined plot ignoring states.<br />
ufo_pop_plot &lt;-<br />
	ggplot( ufo_population_sightings, aes( x=population, y=count )  ) +<br />
	geom_point( colour=&quot;#0b6788&quot;, size=0.6, alpha=0.8 ) +<br />
	geom_smooth( method=&quot;lm&quot;, colour=&quot;#3cd070&quot; ) +  # UFO green<br />
	xlab( &quot;Population&quot; ) +<br />
	ylab( &quot;Sightings per annum&quot; ) +<br />
	theme_weird() +  # Custom theme; not part of ggplot2 and not defined in this code<br />
	theme(<br />
			axis.title.y = element_text( angle=90 )<br />
			)</p>
<p># Construct full plot, with title and backdrop.<br />
title &lt;-<br />
	ggdraw() +<br />
	draw_label(&quot;UFO Sightings against State Population (1990-2014)&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=20, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.40) </p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data: http://www.nuforc.org&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=1, x=0.98 ) </p>
<p>ufo_pop_titled &lt;-<br />
	plot_grid(title, ufo_pop_plot, data_label, ncol=1, rel_heights=c(0.1, 1, 0.1)) +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
	) </p>
<p>save_plot(&quot;output/lm_ufo_population_sightings-combined.pdf&quot;,<br />
			 ufo_pop_titled,<br />
			 base_width = 16,<br />
			 base_height = 9,<br />
			 base_aspect_ratio = 1.78 )</p>
<p># Combined plot colouring states.<br />
ufo_pop_plot_states &lt;-<br />
	ggplot( ufo_population_sightings, aes( x=population, y=count )  ) +<br />
	geom_point( aes( colour=state ), size=0.6, alpha=0.8 ) +<br />
	geom_smooth( method=&quot;lm&quot;, colour=&quot;#3cd070&quot; ) +  # UFO green<br />
	xlab( &quot;Population&quot; ) +<br />
	ylab( &quot;Sightings per annum&quot; ) +<br />
	scale_colour_manual( values=rep( brewer.pal( name=&quot;Set3&quot;, n=12 ), times=5 ) ) +<br />
	theme_weird() +<br />
	theme(<br />
			axis.title.y = element_text( angle=90 ),<br />
			legend.position=&quot;none&quot;<br />
			)</p>
<p># Construct full plot, with title and backdrop.<br />
title &lt;-<br />
	ggdraw() +<br />
	draw_label(&quot;UFO Sightings against State Population (1990-2014)&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=20, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;(Per-state sightings)&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=16, hjust=0, vjust=1, x=0.02, y=0.48) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.16) </p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data: http://www.nuforc.org&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=1, x=0.98 ) </p>
<p>ufo_pop_states_titled &lt;-<br />
	plot_grid(title, ufo_pop_plot_states, data_label, ncol=1, rel_heights=c(0.1, 1, 0.1)) +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
	) </p>
<p>save_plot(&quot;output/lm_ufo_population_sightings-state.pdf&quot;,<br />
			 ufo_pop_states_titled,<br />
			 base_width = 16,<br />
			 base_height = 9,<br />
			 base_aspect_ratio = 1.78 )</p>
<p># Combined plot colouring states with per-state trend lines.<br />
ufo_pop_plot_states_trends &lt;-<br />
	ggplot( ufo_population_sightings, aes( x=population, y=count )  ) +<br />
	geom_point( aes( colour=state ), size=0.6, alpha=0.8 ) +<br />
	geom_smooth( method=&quot;lm&quot;, aes( colour=state ) ) +<br />
	xlab( &quot;Population&quot; ) +<br />
	ylab( &quot;Sightings Per Annum&quot; ) +<br />
	scale_colour_manual( values=rep( brewer.pal( name=&quot;Set3&quot;, n=12 ), times=5 ) ) +<br />
	theme_weird() +<br />
	theme(<br />
			axis.title.y = element_text( angle=90 ),<br />
			legend.position=&quot;none&quot;<br />
			)</p>
<p># Construct full plot, with title and backdrop.<br />
title &lt;-<br />
	ggdraw() +<br />
	draw_label(&quot;UFO Sightings against State Population (1990-2014)&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=20, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;(Per-state trends)&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=16, hjust=0, vjust=1, x=0.02, y=0.48) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.16) </p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data: http://www.nuforc.org&quot;, fontfamily=&quot;main_font&quot;, colour = &quot;#cccccc&quot;, size=12, hjust=1, x=0.98 ) </p>
<p>ufo_pop_states_trends_titled &lt;-<br />
	plot_grid(title, ufo_pop_plot_states_trends, data_label, ncol=1, rel_heights=c(0.1, 1, 0.1)) +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
	) </p>
<p>save_plot(&quot;output/lm_ufo_population_sightings-trends.pdf&quot;,<br />
			 ufo_pop_states_trends_titled,<br />
			 base_width = 16,<br />
			 base_height = 9,<br />
			 base_aspect_ratio = 1.78 )</p>
[/code]
</div></div>
</div>
<h1>Result</h1>
<p>The plots shown here strongly indicate that the per-capita rate of dread interplanetary visitations differs from state to state. While the number of sightings is generally proportional to population, the specific relationship appears to be state-dependent.</p>
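<p>The comparison behind this conclusion can be sketched as a nested-model test: fit one shared slope for all states, then a model with a separate slope per state, and ask whether the extra terms significantly reduce the residual error. The sketch below uses simulated data for three hypothetical states with assumed per-capita rates (the names <code>sim</code> and <code>rate</code> are illustrative, not taken from the NUFORC data) and compares the two fits with <code>anova()</code>.</p>

```r
# Simulated stand-in data: three hypothetical states with different assumed
# per-capita sighting rates. This is not the NUFORC data.
set.seed(37)
sim <- data.frame(
  state      = rep(c("A", "B", "C"), each = 25),
  population = round(runif(75, 1e5, 5e6)),
  stringsAsFactors = FALSE
)
rate <- c(A = 2e-5, B = 5e-5, C = 1e-4)   # assumed sightings per person per year

# Gaussian noise, as the linear model assumes (so simulated counts can even dip below zero)
sim$count <- rnorm(nrow(sim), mean = rate[sim$state] * sim$population, sd = 20)

pooled    <- lm(count ~ population, data = sim)           # one slope shared by all states
per_state <- lm(count ~ population * state, data = sim)   # separate intercept and slope per state

# F-test of the nested models: do per-state slopes significantly improve the fit?
comparison <- anova(pooled, per_state)
print(comparison)
```

<p>A small p-value in the second row of the table favours the per-state model. Note that a Gaussian model will happily produce negative counts, one symptom of why it describes count data poorly.</p>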
<p>This simple linear model is, however, entirely unsatisfactory in describing the data, despite its support for the argument that different states have different underlying rates of sightings.</p>
<p>In the next post, therefore, we will delve deeper into the unsettling relationships between UFO sightings and the innocent humans to whom they are drawn. To do so, we will have to consider a class of techniques that go beyond the normal distribution underpinning key assumptions of the simple linear models used here, and so move into the eldritch world of <em>generalized linear models</em>.</p>
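<p>As a rough preview only, and not the model developed in the next post, a Poisson generalized linear model treats sightings directly as counts, and an offset of <code>log(population)</code> turns each coefficient into a log per-capita rate. The sketch below uses simulated data for three hypothetical states with assumed rates, and compares a single shared rate against one rate per state with a likelihood-ratio test.</p>

```r
# Simulated Poisson counts for three hypothetical states (assumed rates; not real data)
set.seed(37)
sim <- data.frame(
  state      = rep(c("A", "B", "C"), each = 25),
  population = round(runif(75, 1e5, 5e6)),
  stringsAsFactors = FALSE
)
rate <- c(A = 2e-5, B = 5e-5, C = 1e-4)   # assumed sightings per person per year
sim$count <- rpois(nrow(sim), lambda = rate[sim$state] * sim$population)

# offset(log(population)) fixes the population coefficient at 1 on the log scale,
# so the remaining terms model log(sightings per person)
shared_rate <- glm(count ~ 1 + offset(log(population)), family = poisson, data = sim)
state_rates <- glm(count ~ 0 + state + offset(log(population)), family = poisson, data = sim)

# Likelihood-ratio test between the nested models
lrt <- anova(shared_rate, state_rates, test = "Chisq")
print(lrt)

# Estimated per-capita rate for each state
estimated_rates <- exp(coef(state_rates))
print(estimated_rates)
```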
]]></content:encoded>
					
					<wfw:commentRss>https://www.weirddatascience.net/2019/04/03/bayes-vs-the-invaders-part-one-the-37th-parallel/feed/</wfw:commentRss>
			<slash:comments>6</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">503</post-id>	</item>
		<item>
		<title>Missing Links: Density of Bigfoot Sightings in North America</title>
		<link>https://www.weirddatascience.net/2018/10/14/missing-links-density-of-bigfoot-sightings-in-north-america/</link>
					<comments>https://www.weirddatascience.net/2018/10/14/missing-links-density-of-bigfoot-sightings-in-north-america/#respond</comments>
		
		<dc:creator><![CDATA[moth]]></dc:creator>
		<pubDate>Sun, 14 Oct 2018 22:15:40 +0000</pubDate>
				<category><![CDATA[cryptozoology]]></category>
		<category><![CDATA[maps]]></category>
		<category><![CDATA[scraping]]></category>
		<category><![CDATA[spatial analysis]]></category>
		<guid isPermaLink="false">http://www.weirddatascience.net/blog/?p=406</guid>

					<description><![CDATA[<div class="mh-excerpt">Sightings of Homo sapiens cognatus, or Homo sylvestris, are well-recorded, with a particular prevalence in North America. Bigfoot, otherwise known as the Sasquatch, is one of the most widely-reported cryptids in the world; it is the subject of documentaries, film, music, and countless books. The Bigfoot Field Research <a class="mh-excerpt-more" href="https://www.weirddatascience.net/2018/10/14/missing-links-density-of-bigfoot-sightings-in-north-america/" title="Missing Links: Density of Bigfoot Sightings in North America">[...]</a></div>]]></description>
										<content:encoded><![CDATA[<p>Sightings of <a href="http://sasquatchgenomeproject.org/sasquatch_genome_project_011.htm" rel="noopener noreferrer" target="_blank"><em>Homo sapiens cognatus</em></a>, or <a href="https://books.google.co.uk/books?id=Q1YF8WSor-AC&#038;lpg=PA123&#038;ots=AJejG0W58F&#038;dq=homo%20sylvestris%20linnaeus&#038;pg=PA123#v=onepage&#038;q=homo%20sylvestris%20linnaeus&#038;f=false" rel="noopener noreferrer" target="_blank"><em>Homo sylvestris</em></a>, are well-recorded, with a particular prevalence in North America. Bigfoot, otherwise known as the Sasquatch, is one of the most widely-reported cryptids in the world; it is the subject of <a href="http://bfro.net/nimoy2.asp" rel="noopener noreferrer" target="_blank">documentaries</a>, <a href="https://en.wikipedia.org/wiki/The_Abominable_Snowman_(film)" rel="noopener noreferrer" target="_blank">film</a>, <a href="https://www.youtube.com/watch?v=X9b78kM4x0w" rel="noopener noreferrer" target="_blank">music</a>, and countless <a href="https://www.amazon.com/Sasquatch-Legend-Science-Jeff-Meldrum/dp/0765312174" rel="noopener noreferrer" target="_blank">books</a>.</p>
<p>The <a href="http://www.bfro.net" rel="noopener noreferrer" target="_blank">Bigfoot Field Researchers Organization</a> has compiled a detailed database of Bigfoot sightings going back to the 1920s. Each report is dated, located to the nearest town or road, and includes a full description of the sighting. In many cases, sightings are accompanied by a follow-up report from the organisation itself.</p>
<p>As with our earlier analyses of <a href="http://www.weirddatascience.net/index.php/2018/02/27/are-ufos-more-commonly-seen-near-us-military-bases/" rel="noopener noreferrer" target="_blank">UFO sightings</a> and <a href="http://www.weirddatascience.net/index.php/2018/04/10/mapping-paranormal-manifestations-in-the-british-isles/" rel="noopener noreferrer" target="_blank">paranormal manifestations</a>, our first step is to retrieve the data and parse it for analysis. Thankfully, the <code>bfro.net</code> dataset is relatively well-formatted; reports are broken down by region, and each follows a largely standard format.</p>
<p>As before, we rely on the <a href="https://github.com/hadley/rvest" rel="noopener noreferrer" target="_blank"><code>rvest</code></a> package in R to explore and scrape the website. The key steps are to retrieve each state&#8217;s set of reports from the top-level page, and then the link to each individual report. Conveniently, these links follow a standard format, and the website also offers a printer-friendly mode that greatly simplifies scraping.</p>
<p>The scraping code is given here:</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show scraping code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
[code language="r"]
<p>library(tidyverse)<br />
library(progress)<br />
library(rvest)</p>
<p># Base URLs for scraping<br />
index_url &lt;- &quot;https://www.bfro.net/GDB/&quot;<br />
base_url &lt;- &quot;https://www.bfro.net&quot;<br />
report_base_url_pattern &lt;- &quot;https:\\/\\/www.bfro.net\\/GDB\\/show_report.asp\\?id=&quot;</p>
<p># Retrieve list of state-level county report pages<br />
get_state_listing_urls &lt;- function( url ) {</p>
<p>	read_html( url ) %&gt;%<br />
		html_nodes( 'a' ) %&gt;%<br />
		html_attr( 'href' ) %&gt;%<br />
		str_match( '.*state_listing.asp\\?state=.*' ) %&gt;%<br />
		na.omit %&gt;%<br />
		unique %&gt;%<br />
		{ paste0( base_url, . ) }</p>
<p>}</p>
<p># Return all URLs pointing to a county-level list of reports<br />
get_county_report_urls &lt;- function( url ) {</p>
<p>	read_html( url ) %&gt;%<br />
		html_nodes( 'a' ) %&gt;%<br />
		html_attr( 'href' ) %&gt;%<br />
		str_match( '.*show_county_reports.asp\\?state=.*' ) %&gt;%<br />
		na.omit %&gt;%<br />
		unique %&gt;%<br />
		{ paste0( index_url, . ) }</p>
<p>}</p>
<p># Return all URLs pointing to a report in this page.<br />
get_report_urls &lt;- function( url ) {</p>
<p>	read_html( url ) %&gt;%<br />
		html_nodes( 'a' ) %&gt;%<br />
		html_attr( 'href' ) %&gt;%<br />
		str_match( '.*show_report.asp\\?id=[0-9]+' ) %&gt;%<br />
		na.omit %&gt;%<br />
		unique %&gt;%<br />
		{ paste0( index_url, . ) }</p>
<p>}</p>
<p># Wrappers that tick the progress bar and log each URL before fetching it.<br />
progressive_get_county_report_urls &lt;- function( url, progress_bar ) {</p>
<p>	progress_bar$tick()$print()<br />
	cat( paste(url, &quot;\n&quot;) , file=&quot;work/progress.log&quot;, append=TRUE )<br />
	get_county_report_urls( url )</p>
<p>}</p>
<p>progressive_get_report_urls &lt;- function( url, progress_bar ) {</p>
<p>	progress_bar$tick()$print()<br />
	cat( paste(url, &quot;\n&quot;) , file=&quot;work/progress.log&quot;, append=TRUE )<br />
	get_report_urls( url )</p>
<p>}</p>
<p># Fetch a report's printer-friendly page and cache the raw HTML to disk.<br />
store_report &lt;- function( url ) {</p>
<p>	report_id &lt;-<br />
		url %&gt;%<br />
		str_replace( report_base_url_pattern, &quot;&quot; ) %&gt;%<br />
		str_replace( &quot;/&quot;, &quot;-&quot; ) %&gt;%<br />
		str_replace( &quot;.html&quot;, &quot;&quot; )</p>
<p>	#message( paste0(&quot;Processing report &quot;, report_id, &quot;... &quot; ), appendLF=FALSE )</p>
<p>	# Check if this report has already been stored.<br />
	if( file.exists( paste0( &quot;data/&quot;, report_id, &quot;.rds&quot; ) ) ) {<br />
		message( paste0( &quot;Report already retrieved: &quot;, report_id ) )<br />
		return()<br />
	}</p>
<p>	url_pf &lt;- paste0( url, &quot;&amp;PrinterFriendly=True&quot; )</p>
<p>	report &lt;-<br />
		tryCatch(<br />
					{<br />
						# Fetch and parse HTML of current page.<br />
						read_html( url_pf ) %&gt;%<br />
							as.character<br />
					},<br />
					error=function( cond ) {<br />
						message( paste( &quot;URL caused an error:&quot;, url ))<br />
						message( &quot;Error message:&quot;)<br />
						message( cond )<br />
						return( NULL )<br />
					},<br />
					warning=function( cond ) {<br />
						message( paste( &quot;URL caused a warning:&quot;, url ))<br />
						message( &quot;Warning message:&quot;)<br />
						message( cond )<br />
						return( NULL )<br />
					},<br />
					finally={<br />
					}<br />
					)</p>
<p>	# Write report result to disk<br />
	saveRDS( report, paste0(&quot;data/&quot;, report_id, &quot;.rds&quot;) )</p>
<p>}</p>
<p># Progressive version of store_report<br />
progressive_store_report &lt;- function( url, progress_bar ) {</p>
<p>	progress_bar$tick()$print()<br />
	cat( paste(url, &quot;\n&quot;) , file=&quot;work/progress.log&quot;, append=TRUE )<br />
	store_report( url )</p>
<p>}</p>
<p># The key to this is that links of the form<br />
# &lt;https://www.bfro.net/GDB/show_county_reports.asp?state=...&gt;<br />
# link to each state's data. At the top level, then, we can just get all links matching that form.</p>
<p># At each sub-page, the links are:<br />
# &lt;https://www.bfro.net/GDB/show_report.asp?id=...&gt;<br />
# We can just pull all of those for each state.<br />
# NOTE: There are also non-US reports linked like this directly from the main GDB page.</p>
<p># Finally, it seems that adding &quot;&amp;PrinterFriendly=True&quot; will strip a lot of<br />
# unnecessary formatting.</p>
<p># From the top-level page, get every URL that matches 'show_county_reports.asp?state=', make<br />
# the list unique, then download the pages.</p>
<p># Get all US state indices<br />
message(&quot;Getting state report lists...&quot;, appendLF=FALSE )<br />
bfro_state_indices &lt;- get_state_listing_urls( index_url )<br />
message(&quot;done.&quot; )<br />
saveRDS( bfro_state_indices, &quot;work/bfro_state_indices.rds&quot; )</p>
<p># Get all US county-level indices<br />
# (progress_estimated() comes from dplyr, which has deprecated it; the progress package's progress_bar is an alternative.)<br />
message(&quot;Getting county-level report urls... &quot; )<br />
pb &lt;- progress_estimated( length( bfro_state_indices ) )<br />
bfro_county_urls &lt;-<br />
	bfro_state_indices %&gt;%<br />
	map( progressive_get_county_report_urls, pb ) %&gt;%<br />
	unlist %&gt;%<br />
	unique<br />
saveRDS( bfro_county_urls, &quot;work/bfro_county_urls.rds&quot; )</p>
<p># Get URLs of US reports from county pages.<br />
# (Final subset line handles cases where pages are empty of reports, but<br />
# contain other links such as media articles.)<br />
pb &lt;- progress_estimated( length( bfro_county_urls ) )<br />
bfro_report_urls &lt;-<br />
	bfro_county_urls %&gt;%<br />
	map( progressive_get_report_urls, pb ) %&gt;%<br />
	unlist %&gt;%<br />
	unique %&gt;%<br />
	str_subset( &quot;GDB\\/.&quot; )<br />
saveRDS( bfro_report_urls, &quot;work/bfro_report_urls.rds&quot; )</p>
<p># Store all US reports<br />
pb &lt;- progress_estimated( length( bfro_report_urls ) )<br />
bfro_report_urls %&gt;%<br />
	map( progressive_store_report, pb )</p>
<p># Get all non-US indices<br />
message(&quot;Getting non-US top-level report lists...&quot;, appendLF=FALSE )<br />
bfro_nonus_indices &lt;-<br />
	get_county_report_urls( index_url ) %&gt;%<br />
	str_replace( &quot;GDB\\/\\/&quot;, &quot;\\/&quot; )<br />
message(&quot;done.&quot; )<br />
saveRDS( bfro_nonus_indices, &quot;work/bfro_nonus_indices.rds&quot; )</p>
<p># Get URLs of non-US reports from their index pages<br />
message(&quot;Getting non-US report lists...&quot;, appendLF=FALSE )<br />
pb &lt;- progress_estimated( length( bfro_nonus_indices ) )<br />
bfro_nonus_report_urls &lt;-<br />
	bfro_nonus_indices %&gt;%<br />
	map( progressive_get_report_urls, pb ) %&gt;%<br />
	unlist %&gt;%<br />
	unique  %&gt;%<br />
	str_subset( &quot;GDB\\/.&quot; )<br />
saveRDS( bfro_nonus_report_urls, &quot;work/bfro_nonus_report_urls.rds&quot; )</p>
<p># Store all non-US reports<br />
pb &lt;- progress_estimated( length( bfro_nonus_report_urls ) )<br />
bfro_nonus_report_urls %&gt;%<br />
	map( progressive_store_report, pb )</p>
[/code]
</div></div>
</div>
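<p>The link-filtering step at the heart of the scraper can be tried offline. The sketch below starts from a hypothetical vector of <code>href</code> values, as <code>html_attr( 'href' )</code> might return them, keeps only report links, and resolves them with <code>xml2::url_absolute()</code> instead of <code>paste0()</code>. Resolving links this way handles both relative and root-relative links, avoiding doubled path segments such as <code>GDB//</code>. The example links are invented for illustration.</p>

```r
library(stringr)
library(xml2)

# Hypothetical href values from an index page (invented for illustration)
hrefs <- c("/GDB/state_listing.asp?state=wa",
           "show_county_reports.asp?state=wa",
           "show_report.asp?id=12345",
           "show_report.asp?id=12345",
           "/GDB/show_report.asp?id=678",
           "/index.asp",
           NA)

# Drop missing hrefs, keep report links only, and remove duplicates
hrefs <- hrefs[!is.na(hrefs)]
report_hrefs <- unique(str_subset(hrefs, "show_report\\.asp\\?id=[0-9]+"))

# Resolve relative and root-relative links against the page they were found on
report_urls <- url_absolute(report_hrefs, "https://www.bfro.net/GDB/")

# Request the printer-friendly version of each report
report_urls_pf <- paste0(report_urls, "&PrinterFriendly=True")
print(report_urls_pf)
```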
<p>With each page retrieved, we step through and parse each report. Again, each page is fairly well-formatted, and uses a standard set of tags for date, location, and similar. The report parsing code is given here:</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show parsing code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
[code language="r"]
<p># NOTES</p>
<p># Process entire list to check for standardised headers. These are in capitals, and located in &lt;span class=&quot;field&quot;&gt; tags.</p>
<p># Report starts with:<br />
# &lt;span class=&quot;reportheader&quot;&gt;<br />
# &lt;span class=\&quot;reportclassification\&quot;&gt; (Look up what these mean.)<br />
# Some following &lt;span class=&quot;field&quot;&gt; types contain general summary information, eg:<br />
# &quot; &lt;span class=\&quot;field\&quot;&gt;Submitted  by  witness   on Thursday, November 1, 2007.&lt;/span&gt;&quot;<br />
# Following fields are in span tags separated by paragraph tags, eg:<br />
# &lt;p&gt;&lt;span class=\&quot;field\&quot;&gt;YEAR:&lt;/span&gt; 2007&lt;/p&gt;<br />
# STATE and COUNTY fields are typically links; we should pull their text. (I can't see a good reason to parse links to anything other than the link text for our purposes.)</p>
<p>library(tidyverse)<br />
library(magrittr)<br />
library(progress)<br />
library(rvest)<br />
library(lubridate)</p>
<p># Exploratory functions</p>
<p># List all capitalised fields seen in any file<br />
list_all_fields &lt;- function() {</p>
<p>	# Regular expression matching files that end in .rds<br />
	filenames &lt;- list.files( &quot;data&quot;, pattern=&quot;\\.rds$&quot;, full.names=TRUE)</p>
<p>	# In each file, select the &lt;span&gt; classes, match the uppercase field names, and extract text.<br />
	all_fields &lt;-<br />
		filenames %&gt;%<br />
		map( list_report_fields ) %&gt;%<br />
		unlist %&gt;%<br />
		unique</p>
<p>	saveRDS( all_fields, &quot;work/fields.rds&quot; )</p>
<p>	return( all_fields )</p>
<p>}</p>
<p>fields_to_colnames &lt;- function( fields ) {</p>
<p>	# In total there are 18 fields reported in the data, with the final one<br />
	# being an apparently miscoded date and place from a report.</p>
<p>	# Format the text appropriately for dataframe column names<br />
	fields %&gt;%<br />
		head( 17 ) %&gt;%<br />
		str_remove( &quot;:&quot; ) %&gt;%<br />
		str_replace_all( &quot; &quot;, &quot;_&quot; ) %&gt;%<br />
		str_to_lower()</p>
<p>}</p>
<p>list_report_fields &lt;- function( report ) {</p>
<p>	to_process &lt;- readRDS( report )</p>
<p>	bfro_fields &lt;- NULL</p>
<p>	# Select &lt;span class=&quot;field&quot;&gt; elements directly. (An XPath beginning with // searches<br />
	# the whole document, so applying one to each span returned every field repeatedly.)<br />
	if( !is.null( to_process ) ) {<br />
			bfro_fields &lt;-<br />
				to_process %&gt;%<br />
				read_html %&gt;%<br />
				html_nodes( &quot;span.field&quot; ) %&gt;%<br />
				html_text %&gt;%<br />
				str_subset( &quot;[A-Z]+:&quot; )<br />
	}</p>
<p>	return( bfro_fields )<br />
}</p>
<p># Parse a report's paragraph nodes into a one-row tibble<br />
parse_report_data &lt;- function( report_data ) {</p>
<p>	# report_data should be an xml_nodeset</p>
<p>	# Metadata: &quot;FIELD NAME: value&quot; paragraphs become one column per field.<br />
	# Naming the matrix columns and using as_tibble() (as.tibble() is deprecated) avoids tibble<br />
	# deprecation warnings, which the warning handler in progressive_process_file() would turn into NULL.<br />
	metadata_list &lt;-<br />
		report_data %&gt;%<br />
		html_text %&gt;%<br />
		str_subset( &quot;^[A-Z ]+: &quot; ) %&gt;%<br />
		str_split_fixed( &quot;: &quot;, 2 ) %&gt;%<br />
		set_colnames( c( &quot;key&quot;, &quot;value&quot; ) ) %&gt;%<br />
		as_tibble %&gt;%<br />
		spread( key=key, value=value ) %&gt;%<br />
		set_colnames( fields_to_colnames( colnames(.) ) )</p>
<p>	# Extra details: all remaining (non-field) paragraph text<br />
	extra_text &lt;-<br />
		report_data %&gt;%<br />
		html_text %&gt;%<br />
		str_remove( &quot;^[A-Z ]+:.*&quot; ) %&gt;%<br />
		str_flatten( &quot; &quot; ) %&gt;%<br />
		str_trim</p>
<p>	# Add extra details as a column<br />
	metadata_list$extra &lt;- extra_text</p>
<p>	# Note whether date is rough or accurately reported<br />
	metadata_list$rough_date &lt;- FALSE</p>
<p>	# &quot;Fix&quot; missing date or month columns<br />
	if( !( &quot;date&quot; %in% colnames( metadata_list ) ) ) {<br />
		metadata_list$date &lt;- &quot;1&quot;<br />
		metadata_list$rough_date &lt;- TRUE<br />
	}</p>
<p>	if( !( &quot;month&quot; %in% colnames( metadata_list ) ) ) {<br />
		metadata_list$month &lt;- &quot;January&quot;<br />
		metadata_list$rough_date &lt;- TRUE<br />
	}</p>
<p>	# Combine date columns (year, month, date) to a true date.<br />
	# (%B matches full month names in the current locale, so run this under an English locale.)<br />
	metadata_list &lt;-<br />
		metadata_list %&gt;%<br />
		mutate( year = str_replace( year, &quot;.*([0-9]{4}).*&quot;, &quot;\\1&quot; ) ) %&gt;%<br />
		mutate( date = paste0( year, &quot;-&quot;, month, &quot;-&quot;, date ), year=NULL, month=NULL ) %&gt;%<br />
		mutate( date = as.POSIXct( date, format=&quot;%Y-%B-%d&quot; ) )</p>
<p>	return( metadata_list )</p>
<p>}</p>
<p># Read a stored file and pass its paragraph nodes to `parse_report_data`<br />
process_file &lt;- function( filename ) {</p>
<p>	# Read stored file<br />
	to_process &lt;- readRDS( filename )</p>
<p>	# Don't process null files<br />
	if( is.null( to_process ) )<br />
		return( NULL )</p>
<p>	if( length( to_process ) == 0 )<br />
		return( NULL )</p>
<p>	# Produce an xml_nodeset for parsing<br />
	xml_thread &lt;-<br />
		to_process %&gt;%<br />
		read_html %&gt;%<br />
		html_nodes( &quot;p&quot; )</p>
<p>	parse_report_data( xml_thread )</p>
<p>}</p>
<p># Progressive version of process_file<br />
progressive_process_file &lt;- function( filename, progress_bar ) {</p>
<p>	progress_bar$tick()$print()<br />
	cat( paste(filename, &quot;\n&quot;) , file=&quot;progress.log&quot;, append=TRUE )</p>
<p>	report &lt;-<br />
		tryCatch(<br />
					{<br />
						process_file( filename )<br />
					},<br />
					error=function( cond ) {<br />
						message( paste( &quot;File caused an error:&quot;, filename ))<br />
						message( &quot;Error message:&quot;)<br />
						message( cond )<br />
						return( NULL )<br />
					},<br />
					warning=function( cond ) {<br />
						# (Report the filename; `url` is not defined in this function.)<br />
						message( paste( &quot;File caused a warning:&quot;, filename ))<br />
						message( &quot;Warning message:&quot;)<br />
						message( cond )<br />
						return( NULL )<br />
					},<br />
					finally={<br />
					}<br />
					)</p>
<p>	return( report )</p>
<p>}</p>
<p>## Processing starts here</p>
<p># Read all .rds data files and parse each into a one-row tibble<br />
filenames &lt;- list.files(&quot;data&quot;, pattern=&quot;\\.rds$&quot;, full.names=TRUE)<br />
pb &lt;- progress_estimated( length( filenames ) )</p>
<p># Begin logging<br />
cat( &quot;Processing... &quot;, file=&quot;progress.log&quot;, append=FALSE )</p>
<p>bfro_tbl &lt;-<br />
	filenames %&gt;%<br />
	map( progressive_process_file, pb ) %&gt;%<br />
	bind_rows</p>
<p>saveRDS( bfro_tbl, &quot;data/bfro_processed.rds&quot; )</p>
[/code]
</div></div>
</div>
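<p>The key-value pattern that the parser relies on can be illustrated without any network access. The sketch below assumes a hypothetical set of paragraph texts, as <code>html_text()</code> might return them for a single report. It splits the upper-case <code>FIELD: value</code> paragraphs into columns, gathers the remaining text, and builds the sighting date. Unlike the <code>%B</code> format in the parsing code, matching against R's built-in English <code>month.name</code> constant does not depend on the system locale.</p>

```r
library(stringr)
library(tibble)
library(tidyr)

# Hypothetical paragraph texts for one report (invented for illustration)
paras <- c("YEAR: 2007", "MONTH: November", "DATE: 1",
           "STATE: Washington", "COUNTY: King County",
           "Heard loud knocks near the creek.")

# Metadata paragraphs start with an upper-case label followed by ": "
is_field <- str_detect(paras, "^[A-Z ]+: ")
kv <- str_split_fixed(paras[is_field], ": ", 2)

# One row per report, one column per (lower-cased) field name
report <- pivot_wider(
  tibble(key = str_to_lower(str_replace_all(kv[, 1], " ", "_")), value = kv[, 2]),
  names_from = key, values_from = value
)

# Everything else is free-text description
report$extra <- str_flatten(paras[!is_field], " ")

# Locale-independent date: look the month name up in the English month.name constant
month_num <- match(report$month, month.name)
report$date <- as.Date(sprintf("%s-%02d-%02d", report$year, month_num, as.integer(report$date)))
print(report)
```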
<p>With each report parsed into a form suitable for analysis, the final step in scraping the site is to geolocate the reports. As in previous posts, we rely on Google&#8217;s Geocoding API: for each report, we construct an appropriate address string and convert it into latitude and longitude coordinates. For the purposes of this initial scrape we restrict ourselves to North America, which comprises a large majority of reports on <code>bfro.net</code>. Geolocation code is included below. (Note that a Google API key with the Geocoding API enabled is required for this code to run.)</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show geolocation code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
[code language="r"]
<p>library(googleway)<br />
library(progress)<br />
library(tidyverse)<br />
library(magrittr)</p>
<p># weird.data.science Google API key (Geocoding API enabled)<br />
key &lt;- &quot;&lt;INSERT API KEY HERE&gt;&quot;</p>
<p># Load bfro tibble<br />
bfro_data &lt;- readRDS( &quot;data/bfro_processed.rds&quot; )</p>
<p># Geocode entries</p>
<p># BFRO entries contain some, all, or none of:<br />
# country, province, state, county, nearest_town</p>
<p># Country is only used for Canada, in which case a province is given.<br />
# Country and province are NA for the US.</p>
<p># Best plan, then, is to create a string of (nearest_town, province, country) for Canadian results, and (nearest_town, county, state, &quot;US&quot;) for US results.</p>
<p># Bias the request towards North America. (Google treats bounds as a preference, not a hard limit.)<br />
# (Used: &lt;https://boundingbox.klokantech.com&gt;)<br />
# (South-west and north-east corners, each given as c(latitude, longitude) as googleway expects.)<br />
bounding_box &lt;- list( c( 24.4, -170.3 ),<br />
							 c( 83.3,  -52.3 ) )</p>
<p>form_location_string &lt;- function( country, province, state, county, nearest_town ) {</p>
<p>	# US case (no country given), then Canadian. if_else() is vectorised, so this works<br />
	# inside mutate(); a plain if() cannot handle vectors of length &gt; 1.<br />
	location_string &lt;-<br />
		if_else( is.na( country ),<br />
				 paste0( nearest_town, &quot;, &quot;, county, &quot;, &quot;, state, &quot;, US&quot; ),<br />
				 paste0( nearest_town, &quot;, &quot;, province, &quot;, Canada&quot; ) )</p>
<p>	# Remove bracketed clauses, tidy stray commas, and strip &quot;NA&quot; components left by paste0().<br />
	location_string %&gt;%<br />
		str_remove_all( &quot;\\([^\\)]*\\)&quot; ) %&gt;%<br />
		str_replace_all( &quot;, , &quot;, &quot;, &quot; ) %&gt;%<br />
		str_replace_all( &quot; ,&quot;, &quot;,&quot; ) %&gt;%<br />
		str_remove_all( &quot;NA, &quot; )</p>
<p>}</p>
<p># Create a safe version of google_geocode<br />
safe_geocode &lt;- safely( google_geocode )</p>
<p># Wrap safe_geocode with a progress bar tick and logging.<br />
progressive_geocode &lt;- function( location_string, progress_bar ) {</p>
<p>	# Advance the progress bar and write status to log file.<br />
	progress_bar$tick()$print()</p>
<p>	cat( paste0( location_string, &quot;\n&quot; ), file=&quot;progress.log&quot;, append=TRUE )</p>
<p>	result &lt;-<br />
		safe_geocode(<br />
						 address = location_string,<br />
						 key = key,<br />
						 bounds = bounding_box,<br />
						 simplify = TRUE ) </p>
<p>	# Note that this can be NULL<br />
	return( result$result )</p>
<p>}</p>
<p># Logging and output<br />
## Clear the screen first<br />
cat(c(&quot;\033[2J&quot;,&quot;\033[0;0H&quot;))<br />
cat(&quot;Geolocating entries...\n&quot;)<br />
cat( &quot;Geolocating entries...\n&quot;, file=&quot;progress.log&quot;, append=FALSE )</p>
<p># Create the location string<br />
bfro_data %&lt;&gt;%<br />
	mutate( location = form_location_string( country, province, state, county, nearest_town ) )</p>
<p># Geolocate each location.<br />
pb &lt;- progress_estimated(nrow(bfro_data))<br />
bfro_data$geolocation &lt;-<br />
	map( bfro_data$location, progressive_geocode, progress_bar = pb )</p>
<p>cat(&quot;\nComplete.\n&quot;)</p>
<p>## With all values scraped, save geolocated data.<br />
saveRDS( bfro_data, file=&quot;work/bfro_data_geolocated.rds&quot;)</p>
[/code]
</div></div>
</div>
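<p>Google&#8217;s Geocoding API treats <code>bounds</code> as a bias rather than a hard restriction, so a request can still return a match outside North America. A simple post-hoc check, sketched below with invented coordinates, keeps only results whose coordinates fall inside the same south-west and north-east corners, written here as (latitude, longitude) pairs. The helper <code>in_box()</code> and the example data are hypothetical.</p>

```r
# Hypothetical geocoded results (invented for illustration)
geocoded <- data.frame(
  location = c("Seattle, WA, US", "Orlando, FL, US", "Nowhere, Canada", "Paris"),
  lat      = c(47.61, 28.54, 56.13, 48.86),
  lng      = c(-122.33, -81.38, -106.35, 2.35),
  stringsAsFactors = FALSE
)

# South-west and north-east corners as c(latitude, longitude)
sw <- c(24.4, -170.3)
ne <- c(83.3, -52.3)

# TRUE where a point lies inside the box (edges included)
in_box <- function(lat, lng, sw, ne) {
  lat >= sw[1] & lat <= ne[1] & lng >= sw[2] & lng <= ne[2]
}

# which() also drops rows whose coordinates are missing (NA)
kept <- geocoded[which(in_box(geocoded$lat, geocoded$lng, sw, ne)), ]
print(kept)
```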
<p>With geolocated data in hand, we can now venture into the wilds. In which areas of North America are Sasquatch most commonly seen to roam? The plot below shows the overall density of Bigfoot sightings, with individual reports marked.</p>
<figure id="attachment_414" aria-describedby="caption-attachment-414" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density.png"><img loading="lazy" decoding="async" data-attachment-id="414" data-permalink="https://www.weirddatascience.net/2018/10/14/missing-links-density-of-bigfoot-sightings-in-north-america/bigfoot-density-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Bigfoot Density Plot" data-image-description="" data-image-caption="&lt;p&gt;Density of Bigfoot sightings in North America.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-1024x576.png" alt="Density plot of Bigfoot sightings in North America" width="1024" height="576" class="size-large wp-image-414" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-414" class="wp-caption-text">Density of Bigfoot sightings in North America. (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density.pdf">PDF Version</a>)</figcaption></figure>
<p>There are notable clusters around the Great Lakes, particularly in Southern Ontario, and in the Pacific Northwest. Smaller clusters exist in Florida, centred on Orlando. As with most report-based datasets, sightings are skewed towards areas of high population density.</p>
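<p>A density surface of this kind can be approximated in a few lines. The sketch below uses <code>MASS::kde2d()</code>, a simple bivariate kernel density estimator, rather than the <code>spatstat</code> package loaded in the plotting code, applied to invented coordinates: a large cluster near the Pacific Northwest, a smaller one near Orlando, and scattered background points. It then reports the grid cell with the highest estimated density. Because such an estimate simply reflects where reports were filed, peaks partly track population rather than habitat.</p>

```r
library(MASS)

set.seed(406)
# Invented sighting coordinates: a large and a small cluster plus uniform background
lng <- c(rnorm(400, -122, 1.5), rnorm(100, -81.4, 1.5), runif(100, -130, -70))
lat <- c(rnorm(400,   47, 1.5), rnorm(100,  28.5, 1.5), runif(100,   25,  50))

# Bivariate kernel density estimate on a 200 x 200 grid over the study area;
# h gives the bandwidth (in degrees) for the x and y directions
dens <- kde2d(lng, lat, h = c(3, 3), n = 200, lims = c(-130, -70, 25, 50))

# Grid cell with the highest estimated density (rows index x, columns index y)
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]
peak_lng <- dens$x[peak[1]]
peak_lat <- dens$y[peak[2]]
cat("Peak density near lat", round(peak_lat, 1), "lng", round(peak_lng, 1), "\n")
```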
<p>The obvious first question to ask of such data is which, if any, environmental features correlate with these sightings. Other analyses of Bigfoot sightings, such as the seminal work of Lozier et al.<span id='easy-footnote-1-406' class='easy-footnote-margin-adjust'></span><span class='easy-footnote'><a href='https://www.weirddatascience.net/2018/10/14/missing-links-density-of-bigfoot-sightings-in-north-america/#easy-footnote-bottom-1-406' title='Lozier, J. D., Aniello, P. and Hickerson, M. J. (2009), Predicting the distribution of Sasquatch in western North America: anything goes with ecological niche modelling. Journal of Biogeography, 36: 1623-1627. doi:10.1111/j.1365-2699.2009.02152.x (&lt;a href=&quot;http://onlinelibrary.wiley.com/doi/abs/10.1111/j.1365-2699.2009.02152.x&quot;&gt;http://onlinelibrary.wiley.com/doi/abs/10.1111/j.1365-2699.2009.02152.x&lt;/a&gt;)'><sup>1</sup></a></span>, have suggested that forested regions are natural habitats for Sasquatch.</p>
<p>To answer this, we combine the underlying mapping data and Bigfoot sightings with bioclimatic data taken from the <a href="http://www.landcover.org">Global Land Cover Facility</a>. Amongst other datasets, this provides an accurate, high-resolution <a href="http://www.landcover.org/data/lc/">land cover</a> raster map detailing the vegetation of each 5-arcminute cell. Each cell spans roughly 9km from north to south, and less from east to west at higher latitudes.</p>
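<p>The ground size of a latitude-longitude grid cell can be estimated from its resolution and latitude. The sketch below assumes a spherical Earth with an equatorial circumference of 40,075km; the function name <code>cell_size_km()</code> is illustrative. North-south extent is fixed by the resolution, while east-west extent shrinks with the cosine of latitude.</p>

```r
# Approximate ground size of a grid cell of `res_arcmin` arc-minutes at latitude `lat_deg`.
# Assumes a spherical Earth with an equatorial circumference of 40,075 km.
cell_size_km <- function(res_arcmin, lat_deg) {
  km_per_degree <- 40075 / 360                   # about 111.3 km per degree
  ns_km <- res_arcmin / 60 * km_per_degree       # north-south extent
  ew_km <- ns_km * cos(lat_deg * pi / 180)       # east-west extent narrows with latitude
  c(ns_km = ns_km, ew_km = ew_km, area_km2 = ns_km * ew_km)
}

round(cell_size_km(5, 0), 2)    # ns_km 9.28, ew_km 9.28, area_km2 86.06
round(cell_size_km(5, 45), 2)   # ns_km 9.28, ew_km 6.56, area_km2 60.85
```

<p>At North American latitudes, then, each 5-arcminute cell covers several tens of square kilometres.</p>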
<p>There is a range of bioclimatic variables in this dataset. The diagram below overlays all areas classified as some form of forest onto the previous density plot.</p>
<figure id="attachment_416" aria-describedby="caption-attachment-416" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-forest.png"><img loading="lazy" decoding="async" data-attachment-id="416" data-permalink="https://www.weirddatascience.net/2018/10/14/missing-links-density-of-bigfoot-sightings-in-north-america/bigfoot-density-forest-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-forest.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="Bigfoot Density Forest Cover" data-image-description="" data-image-caption="&lt;p&gt;Density of Bigfoot sightings in North America (Yellow) with forested areas overlaid (green).&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-forest-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-forest-1024x576.png" alt="Density plot of Bigfoot sightings showing forest cover" width="1024" height="576" class="size-large wp-image-416" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-forest-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-forest-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-forest-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-forest-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-forest-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-forest.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-416" class="wp-caption-text">Density of Bigfoot sightings in North America (yellow) with forested areas overlaid (green). (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/10/bigfoot-density-forest.pdf">PDF Version</a>)</figcaption></figure>
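<p>One simple way to quantify this overlap is to compare the share of sightings falling in forest cells with the share of the map that is forest. The sketch below substitutes a plain matrix for the land cover raster and assumes a regular longitude-latitude grid. Treating class codes 1 to 5 as forest follows the University of Maryland land cover scheme as we understand it, and should be checked against the legend of the actual product. All data here are simulated; with a real raster, <code>raster::extract()</code> performs the same point-to-cell lookup.</p>

```r
set.seed(416)

# Simulated 20 x 20 land cover grid of class codes 0-13 at 1-degree resolution.
# Assumed layout: row i starts at latitude ymin + (i - 1) * res (south to north),
# column j starts at longitude xmin + (j - 1) * res (west to east).
xmin <- -130; ymin <- 30; res <- 1
landcover <- matrix(sample(0:13, 400, replace = TRUE), nrow = 20, ncol = 20)

# Assumed forest classes (UMD-style codes 1-5); check against the real legend
forest_classes <- 1:5
forest <- matrix(landcover %in% forest_classes, nrow = nrow(landcover))

# Simulated sighting locations inside the grid
lng <- runif(200, xmin, xmin + ncol(landcover) * res)
lat <- runif(200, ymin, ymin + nrow(landcover) * res)

# Map each point to its grid cell and look up whether that cell is forest
cell_row <- floor((lat - ymin) / res) + 1
cell_col <- floor((lng - xmin) / res) + 1
in_forest <- forest[cbind(cell_row, cell_col)]

# If sightings favoured forest, the first share would clearly exceed the second
shares <- c(sightings_in_forest = mean(in_forest), forest_share_of_area = mean(forest))
print(round(shares, 3))
```

<p>With uniformly scattered simulated points the two shares should be similar; a marked excess in real data would hint at a forest association, although population and access would also need to be accounted for.</p>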
<p>The code for producing both of the above plots is given here:</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show density plotting code.</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
[code language="r"]
<p>library(spatstat)</p>
<p>library(rgdal)<br />
library(maptools)</p>
<p>library(tidyverse)<br />
library(magrittr)<br />
library(ggplot2)<br />
library(ggthemes)<br />
library(raster)<br />
library(viridis)<br />
library(scales)</p>
<p>library(sf) </p>
<p>library(showtext)</p>
<p>library(grid)</p>
<p>library(cowplot)<br />
library(magick)</p>
<p># Load font<br />
font_add( &quot;mapfont&quot;, &quot;/usr/share/fonts/TTF/weird/alchemy/1689 GLC Garamond Pro/1689GLCGaramondProNormal.otf&quot; )<br />
showtext_auto()</p>
<p># Read world shapefile data and transform to an appropriate projection.<br />
world &lt;- readOGR( dsn='data/ne/10m_cultural', layer='ne_10m_admin_0_countries' )<br />
world_subset &lt;- world[ world$iso_a2 %in% c(&quot;US&quot;,&quot;CA&quot;), ]<br />
world_subset &lt;- spTransform(world_subset,CRS(&quot;+init=epsg:4326 +lon_wrap=170&quot;))<br />
world_df &lt;- fortify( world_subset )</p>
<p># Read bfro database, filtering out locations with NA latitude or longitude.<br />
# Also filter out results geolocated to &quot;56.13037, -106.3468&quot;, the centroid of Canada, which is assigned to<br />
# locations with no more specific result. The code identifies it as the modal (most common) latitude and longitude.<br />
bfro_tbl &lt;-<br />
	readRDS( &quot;data/bfro_locations.rds&quot; ) %&gt;%<br />
	filter( !is.na( lat ) &amp; !is.na( lng ) ) %&gt;%<br />
	filter( not( lat == modal( lat ) &amp; lng == modal( lng ) ) )</p>
<p># Convert the bfro dataframe to a spatial dataframe that contains<br />
# explicit longitude and latitude projected appropriately for plotting.<br />
coordinates( bfro_tbl ) &lt;- ~lng+lat<br />
proj4string( bfro_tbl ) &lt;- CRS(&quot;+init=epsg:4326 +lon_wrap=170&quot;)<br />
bfro_tbl_spatial &lt;- spTransform(bfro_tbl,CRS(proj4string(bfro_tbl)))</p>
<p># Restrict bfro_tbl to those points in the polygons defined by world_subset.<br />
# over() returns the attributes of the polygon containing each point, or a row of NAs<br />
# for points outside every polygon, so keep the points with any non-NA attribute.<br />
bfro_tbl_rows &lt;- bfro_tbl_spatial %&gt;%<br />
	over( world_subset ) %&gt;%<br />
	is.na() %&gt;%<br />
	not() %&gt;%<br />
	rowSums() %&gt;%<br />
	`!=`(0) %&gt;%<br />
	which</p>
<p>bfro_tbl &lt;- as_tibble( bfro_tbl_spatial[ bfro_tbl_rows, ] )</p>
<p># Create window for spatial analysis<br />
bfro_owin &lt;- as.owin.SpatialPolygons(world_subset)</p>
<p># Function to plot the density of all sightings.<br />
# density_res sets the resolution of the density raster; low values are useful for quick prototyping of the output.<br />
density_plot &lt;- function( density_res ) {</p>
<p>	cat( &quot;Plotting density... &quot; )</p>
<p>	bfro_ppp &lt;-<br />
		ppp(<br />
			 x=coordinates(bfro_tbl_spatial)[,1],<br />
			 y=coordinates(bfro_tbl_spatial)[,2],<br />
			 window = bfro_owin<br />
			 )</p>
<p>	# This discards 'illegal' points outside of the window<br />
	bfro_ppp &lt;- as.ppp(bfro_ppp)</p>
<p>	# Gaussian kernel density (sigma in map units) with Diggle's edge correction<br />
	bfro_density &lt;- density( bfro_ppp, diggle=TRUE, sigma=2, dimyx = c( density_res, density_res ) )</p>
<p>	# Make density image object usable by ggplot as a raster<br />
	bfro_density_raster &lt;- raster( bfro_density )<br />
	bfro_raster_tbl &lt;- as_tibble( rasterToPoints( bfro_density_raster ) )</p>
<p>	# Show the map<br />
	gp &lt;- ggplot()  </p>
<p>	# Add density of sightings as raster.<br />
	gp &lt;- gp +<br />
		geom_raster( data = bfro_raster_tbl, aes( x=x, y=y, fill = layer ), alpha=0.8, show.legend=FALSE ) +<br />
		scale_fill_viridis( option = &quot;cividis&quot;, direction=1 )</p>
<p>	# Add sightings as points.<br />
	gp &lt;- gp +<br />
		geom_point( data = as_tibble( bfro_ppp ), aes( x=x, y=y ), colour=&quot;yellow&quot;, size=0.1, alpha=0.5, show.legend=FALSE ) </p>
<p>	# Add country outlines.<br />
	gp &lt;- gp +<br />
		geom_map( data = world_df, aes( map_id=id ), colour=&quot;#3c3f4a&quot;, fill = &quot;transparent&quot;, size = 0.4, map = world_df ) </p>
<p>	# Theming<br />
	gp &lt;- gp +<br />
		theme_map() </p>
<p>	gp &lt;- gp +<br />
		theme(<br />
				plot.background = element_rect(fill = &quot;transparent&quot;, colour = &quot;transparent&quot;),<br />
				panel.border = element_blank()<br />
				) </p>
<p>	gp &lt;- gp +<br />
		guides( fill = guide_colourbar( title.position=&quot;top&quot;, direction=&quot;horizontal&quot;, barwidth=6, barheight=0.4 ) )</p>
<p>	cat(&quot;done.\n&quot; )<br />
	return(gp)</p>
<p>}</p>
<p># Add land cover classes from the GLCF land cover raster to an existing plot.<br />
# The bioclim argument selects which land cover classes to draw; see the class legend below.<br />
add_bioclimatic_variables &lt;- function( gp, bioclim = seq( 1, 16 ) ) {</p>
<p>	# (Unused) This gets WorldClim bioclimatic variables for the global range.<br />
	# (res=0.5 requires latitude and longitude to define a tile to download,<br />
	# but 2.5 minutes of a degree is ~5 km, or ~3 miles, at the equator, so probably good enough.)<br />
	#wc &lt;- getData( 'worldclim', res=2.5, var='bio' )</p>
<p>	# Land cover class legend:<br />
	# Value	Label<br />
	#	0		Water<br />
	#	1		Evergreen Needleleaf forest<br />
	#	2		Evergreen Broadleaf forest<br />
	#	3		Deciduous Needleleaf forest<br />
	#	4		Deciduous Broadleaf forest<br />
	#	5		Mixed forest<br />
	#	6		Closed shrublands<br />
	#	7		Open shrublands<br />
	#	8		Woody savannas<br />
	#	9		Savannas<br />
	#	10		Grasslands<br />
	#	11		Permanent wetlands<br />
	#	12		Croplands<br />
	#	13		Urban and built-up<br />
	#	14		Cropland/Natural vegetation mosaic<br />
	#	15		Snow and ice<br />
	#	16		Barren or sparsely vegetated<br />
	#	254	Unclassified<br />
	#	255	Fill Value</p>
<p>	glcf &lt;- raster( &quot;~/opt/datasets/gis/landcover/glcf/LC_5min_global_2012.tif&quot; )</p>
<p>	# We can't just reproject a raster to wrap differently.<br />
	# This shifts the western half (-180 to 0 degrees) to 180 to 360 degrees and merges it with the eastern half.<br />
	# (This is fragile, unfortunately, but works here.)<br />
	# (Reprojecting is best done as the last step, apparently.)<br />
	glcf_west &lt;- crop(glcf, extent(-180, 0, 18, 84))<br />
	glcf_east &lt;- crop(glcf, extent(0, 180, 18, 84))<br />
	extent(glcf_west) &lt;- c(180, 360, 18, 84)<br />
	glcf  &lt;- merge(glcf_west, glcf_east )</p>
<p>	# Nearest-neighbour resampling keeps the categorical class codes intact.<br />
	glcf &lt;- crop( glcf, extent( world_subset ) )<br />
	glcf &lt;- projectRaster( glcf, crs = CRS(&quot;+init=epsg:4326&quot;), method = &quot;ngb&quot; )</p>
<p>	# Make the land cover raster usable by ggplot<br />
	glcf_raster_tbl &lt;- as_tibble( rasterToPoints( glcf ) )<br />
	colnames( glcf_raster_tbl ) &lt;- c (&quot;x&quot;, &quot;y&quot;, &quot;layer&quot; )</p>
<p>	# Draw only the cells whose class is listed in bioclim<br />
	gp &lt;- gp +<br />
		geom_raster( data = glcf_raster_tbl[ which( glcf_raster_tbl$layer %in% bioclim ), ], alpha=0.8, aes( x=x, y=y ), fill=&quot;#7cfc00&quot;, show.legend=FALSE ) </p>
<p>	gp</p>
<p>}</p>
<p># Cowplot theme for full-panel plotting (the parchment background is added below).<br />
theme_set(theme_cowplot(font_size=4, font_family = &quot;mapfont&quot; ) )</p>
<p># Only calculate the density plot if it isn't already stored on disk.<br />
if( not( file.exists( &quot;work/density_plot.rds&quot; ) ) ) {<br />
	gp &lt;- density_plot( 1024 )<br />
	saveRDS( gp, &quot;work/density_plot.rds&quot; )<br />
} else {<br />
	gp &lt;- readRDS( &quot;work/density_plot.rds&quot; )<br />
}</p>
<p># Create a plot including forest cover (land cover classes 1 to 5).<br />
bioclim_gp &lt;-<br />
	add_bioclimatic_variables( gp, seq( 1, 5 ) )</p>
<p># Construct full plot, with title and backdrop.<br />
title &lt;- ggdraw() +<br />
	draw_label(&quot;Bigfoot Sightings in North America&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#3c3f4a&quot;, size=20, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#3c3f4a&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.40)</p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data: http://www.bfro.net&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#3c3f4a&quot;, size=12, hjust=1, x=0.98 )</p>
<p>tgp &lt;- plot_grid(title, gp, data_label, ncol=1, rel_heights=c(0.1, 1, 0.1)) </p>
<p>parchment_plot &lt;- ggdraw() +<br />
	draw_image(&quot;img/parchment.jpg&quot;, scale=1.4 ) +<br />
	draw_plot(tgp)</p>
<p>save_plot(&quot;output/bigfoot-density.pdf&quot;,<br />
	parchment_plot,<br />
	base_width = 16,<br />
	base_height = 9,<br />
	base_aspect_ratio = 1.78 )</p>
<p># Repeat for the plot with forest cover added.<br />
title &lt;- ggdraw() +<br />
	draw_label(&quot;Bigfoot Sightings in North America Showing Forest Cover&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#3c3f4a&quot;, size=20, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#3c3f4a&quot;, size=12, hjust=0, vjust=1, x=0.02, y=0.40)</p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data: http://www.bfro.net&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#3c3f4a&quot;, size=12, hjust=1, x=0.98 )</p>
<p>tgp &lt;- plot_grid(title, bioclim_gp, data_label, ncol=1, rel_heights=c(0.1, 1, 0.1)) </p>
<p>parchment_plot &lt;- ggdraw() +<br />
	draw_image(&quot;img/parchment.jpg&quot;, scale=1.4 ) +<br />
	draw_plot(tgp)</p>
<p>save_plot(&quot;output/bigfoot-density-forest.pdf&quot;,<br />
	parchment_plot,<br />
	base_width = 16,<br />
	base_height = 9,<br />
	base_aspect_ratio = 1.78 )</p>
[/code]
</div></div>
</div>
<p>From this initial plot we can see that, whilst tree cover is certainly not a bad predictor of Bigfoot sightings, the correlation is far from definitive. The largest cluster, around the US-Canada border near Toronto, is principally lakes, whilst the secondary cluster in Florida is neither significantly forested nor, as might have been expected, close to the Everglades. Conversely, there are significant forested areas in which sightings are relatively rare.</p>
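<p>One way to put a number on &#8220;not a bad predictor&#8221; is to compare the share of sightings that fall in forest with the share of land area that is forest. The following is a minimal base-R sketch, not the method used for the plots above. It assumes each sighting and each background grid cell has already been assigned a land cover code (for example with raster::extract() on the glcf layer); the function name forest_enrichment() and the toy vectors are hypothetical.</p>

```r
# Hypothetical helper: ratio of the forest share among sightings to the
# forest share of the land area. Values above 1 mean sightings are
# over-represented in forest relative to how much land it covers.
forest_enrichment <- function(sighting_class, background_class,
                              forest = 1:5,            # forest classes 1 to 5
                              drop = c(0, 254, 255)) { # water, unclassified, fill
  # Remove water and non-data codes from both samples
  sighting_class   <- sighting_class[!sighting_class %in% drop]
  background_class <- background_class[!background_class %in% drop]
  p_sightings <- mean(sighting_class %in% forest)    # forest share of sightings
  p_area      <- mean(background_class %in% forest)  # forest share of land cells
  p_sightings / p_area
}

# Toy land cover codes, invented for illustration only
sightings  <- c(1, 1, 5, 10, 12, 5, 0, 4, 13, 1)
background <- c(rep(1, 20), rep(10, 40), rep(12, 30), rep(0, 10))

forest_enrichment(sightings, background)  # approximately 3
```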
<p>The mystery of Bigfoot&#8217;s natural habitat and preferences is, therefore, very much unanswered from our initial analysis. With a broad range of variables still to explore &#8212; climate, altitude, food sources &#8212; future posts will attempt to determine what conditions lend themselves to strange survivals of pre-human primate activity. Perhaps changing conditions have pushed our far-distant cousins to previously unsuspected regions.</p>
<p>Until then, we keep following these trails into data unknown.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.weirddatascience.net/2018/10/14/missing-links-density-of-bigfoot-sightings-in-north-america/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">406</post-id>	</item>
		<item>
		<title>The Shape of the Other: The Evolution of UFO Sightings by Shape</title>
		<link>https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/</link>
					<comments>https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/#comments</comments>
		
		<dc:creator><![CDATA[moth]]></dc:creator>
		<pubDate>Wed, 20 Jun 2018 16:09:02 +0000</pubDate>
				<category><![CDATA[scraping]]></category>
		<category><![CDATA[ufo]]></category>
		<guid isPermaLink="false">http://www.weirddatascience.net/blog/?p=331</guid>

					<description><![CDATA[<div class="mh-excerpt">In earlier analyses of the UFO phenomenon, based on the NUFORC dataset, we have examined the global density of sightings, and the relative distribution of sightings against the location of military bases in the United States. All of these analyses have, however, considered individual sightings to be more <a class="mh-excerpt-more" href="https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/" title="The Shape of the Other: The Evolution of UFO Sightings by Shape">[...]</a></div>]]></description>
										<content:encoded><![CDATA[<p>In earlier analyses of the UFO phenomenon, based on the NUFORC dataset, we have examined the global density of sightings, and the relative distribution of sightings against the location of military bases in the United States. All of these analyses have, however, considered individual sightings to be more or less equivalent.</p>
<p>The NUFORC dataset also provides much more detailed information on individual sightings. Beyond its time and location, the most significant feature of each report is the recorded shape of the object. Was the reported UFO saucer-shaped? Triangular? A flash of light? Or did the witness see more than one object moving in formation? By considering this aspect of the data we can interrogate more closely the nature of UFO sightings over the years.</p>
<p>The NUFORC dataset classifies each sighting as one of 46 possible shapes, with approximately three percent of entries not directly classified. Several of these 46 categories overlap: &#8220;Triangle&#8221;, &#8220;triangle&#8221;, and &#8220;Triangular&#8221;, for example, all appear. The dataset also contains both &#8220;other&#8221; and &#8220;unknown&#8221; categories.</p>
<p>With a minimal level of cleaning we are left with 26 categories, including the familiar circular objects, but also &#8220;crescent&#8221; (2 entries), &#8220;hexagon&#8221; (1 entry), and &#8220;cross&#8221; (356 entries). For easier representation and analysis, we have collapsed several infrequent and similar categories together, resulting in eight top-level categories distributed in the following way:</p>
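<p>The case-folding and collapsing described above can be sketched in base R with a named lookup vector. The groupings in shape_map are illustrative assumptions based only on the groupings mentioned in this post (round objects including spheres, disks, ovals, domes, eggs and cones; triangles including delta and chevron shapes; lights including flashes, fireballs and flares), not the exact mapping behind the figures.</p>

```r
# Hypothetical raw shape labels, as they might appear in a NUFORC scrape
raw_shape <- c("Triangle", "triangle", "Triangular", "Disk", "sphere",
               "Fireball", "light", "Chevron", "Cross", "", NA)

# Illustrative mapping from lower-cased labels to top-level categories
shape_map <- c(
  triangle = "Triangular", triangular = "Triangular",
  delta = "Triangular", chevron = "Triangular",
  circle = "Round", sphere = "Round", disk = "Round", oval = "Round",
  dome = "Round", egg = "Round", cone = "Round",
  light = "Lights", fireball = "Lights", flash = "Lights", flare = "Lights",
  other = "Other", unknown = "Unknown"
)

collapse_shape <- function(x) {
  key <- tolower(trimws(x))      # "Triangle" and "triangle" become one key
  out <- unname(shape_map[key])  # look up the top-level category
  # Labels that exist but are not in the map fall into "Other"
  out[is.na(out) & !is.na(key) & key != ""] <- "Other"
  # Missing or blank labels become "Unknown"
  out[is.na(key) | key == ""] <- "Unknown"
  out
}

collapse_shape(raw_shape)
```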
<figure id="attachment_341" aria-describedby="caption-attachment-341" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer.png"><img loading="lazy" decoding="async" data-attachment-id="341" data-permalink="https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/all_frequency-fewer-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="all_frequency-fewer" data-image-description="" data-image-caption="&lt;p&gt;Frequency of UFO sightings by shape.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer-1024x576.png" alt="Frequency of UFO sightings by shape." width="1024" height="576" class="size-large wp-image-341" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-341" class="wp-caption-text">Frequency of UFO sightings by shape. (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency-fewer.pdf" rel="noopener noreferrer" target="_blank">PDF Version</a>.) | <a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency.png">Frequency of UFO sightings by shape (all categories).</a> (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/all_frequency.pdf" rel="noopener noreferrer" target="_blank">PDF Version</a>.)</figcaption></figure>
<p>We can clearly see from this that lights are the most commonly reported extraterrestrial manifestation, closely followed by &#8220;round&#8221; objects, the category that perhaps best matches the traditional concept of a UFO. This category does, however, extend to spheres, disks, ovals, domes, eggs, and cones.</p>
<p>This breakdown of frequency is somewhat deceptive: the sightings reported in the NUFORC database span from a reported 1400CE (a roughly dated cave painting in Texas depicting a saucer-shaped object) to the present day. For reliability, we have discounted reports prior to 1900CE from our analysis. Within our data, then, are these sightings consistent over time? Have the form and nature of our extraterrestrial visitors shifted in recent history? Are we naively assuming that all objects are from the same source, and with similar intentions?</p>
<p>At the most mundane level, the volume of sightings has increased sharply since the early reports in the dataset. In the 1940s there were 144 reported sightings in total, compared with 4934 in 2017 alone and a peak of 8651 in 2014.</p>
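<p>The yearly and decadal tallies, including the pre-1900 cut-off described above, reduce to extracting the year from each date and tabulating. A minimal sketch, assuming a vector of Date values like the date column of ufo_tbl; the dates here are invented for illustration.</p>

```r
# Invented sighting dates (ufo_tbl stores these in a Date column called `date`)
dates <- as.Date(c("1400-06-01", "1947-06-24", "1947-07-08", "1949-03-01",
                   "2014-07-04", "2014-07-04", "2017-01-01"))

# Discount reports dated before 1900
dates <- dates[as.integer(format(dates, "%Y")) >= 1900]

year   <- as.integer(format(dates, "%Y"))
decade <- 10 * (year %/% 10)   # e.g. 1947 becomes 1940

table(decade)  # sightings per decade
table(year)    # sightings per year
```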
<p>Broken down by category, the total number of sightings since 1945 is shown below. Sightings before 1945 are omitted from this diagram, as their volume was too low to be visible. The most marked rise begins in the mid-1990s, with 502 sightings in 1994 rising to 1467 in 1995, and the overall upward trend continuing until its peak in 2014.</p>
<figure id="attachment_343" aria-describedby="caption-attachment-343" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time.png"><img loading="lazy" decoding="async" data-attachment-id="343" data-permalink="https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/count_over_time-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="count_over_time" data-image-description="" data-image-caption="&lt;p&gt;Count of sightings by shape over time.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time-1024x576.png" alt="Count of sightings by shape over time." width="1024" height="576" class="size-large wp-image-343" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-343" class="wp-caption-text">Count of sightings by shape over time. (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_over_time.pdf">PDF Version</a>.)</figcaption></figure>
<p>To understand the specific nature of visitations, however, it is useful to view sightings as a proportion of the total, rather than their absolute numbers.</p>
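<p>Converting absolute counts into within-year proportions is a single call on a year-by-shape contingency table. A minimal base-R sketch with invented data; the variable names are hypothetical, and the figure below was not necessarily produced this way.</p>

```r
# Invented per-sighting year and shape category
year  <- c(1955, 1955, 1955, 1955, 2005, 2005, 2005, 2005, 2005)
shape <- c("Round", "Round", "Round", "Triangular",
           "Round", "Triangular", "Triangular", "Lights", "Lights")

counts <- table(year, shape)              # absolute counts per year and shape
props  <- prop.table(counts, margin = 1)  # divide each row by its total

round(props, 2)  # each row (year) now sums to 1
```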
<figure id="attachment_349" aria-describedby="caption-attachment-349" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time.png"><img loading="lazy" decoding="async" data-attachment-id="349" data-permalink="https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/proportion_over_time-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="proportion_over_time" data-image-description="" data-image-caption="&lt;p&gt;Proportion of UFO sightings by shape over time.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time-1024x576.png" alt="Proportion of UFO sightings by shape over time." width="1024" height="576" class="size-large wp-image-349" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-349" class="wp-caption-text">Proportion of UFO sightings by shape over time. (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/proportion_over_time.pdf">PDF Version.</a>)</figcaption></figure>
<p>Allowing for the overall increase in numbers, the proportion of generically round UFOs has clearly fallen since the 1950s, when they dominated. The most significant increase has been the rise in triangular sightings, including &#8220;delta&#8221;- and &#8220;chevron&#8221;-shaped craft. This conceivably tracks the development of terrestrial military aircraft towards <a href="https://en.wikipedia.org/wiki/Delta_wing" rel="noopener noreferrer" target="_blank">&#8220;delta wing&#8221;</a> and similar profiles.</p>
<p>Since the mid-90s there has been a marked increase in sightings reported simply as &#8220;lights&#8221;: flashes, fireballs, flares, and similar. From 2000 onwards, the relative proportions appear largely steady.</p>
<p>Among individual years, 1995 shows an oddly large proportion of unclassified &#8220;other&#8221; sightings, although these do not seem to be the result of any particular event. The largest share of these came from Seattle, with 38 sightings, but they are spread fairly evenly throughout the year.</p>
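<p>Checking where a single year&#8217;s &#8220;other&#8221; sightings come from is a filter followed by a ranked tally of locations. A minimal sketch, assuming a data frame with year, shape and location columns; the rows are invented for illustration.</p>

```r
# Invented rows standing in for a subset of ufo_tbl's columns
sightings <- data.frame(
  year     = c(1995, 1995, 1995, 1995, 1996),
  shape    = c("Other", "Other", "Round", "Other", "Other"),
  location = c("Seattle, WA", "Seattle, WA", "Seattle, WA",
               "Portland, OR", "Seattle, WA")
)

# Keep only 1995 sightings classified as "Other"
other_1995 <- subset(sightings, year == 1995 & shape == "Other")

# Locations ranked by number of matching sightings
sort(table(other_1995$location), decreasing = TRUE)
```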
<p>Breaking down sightings by time of year and time of day, rather than year by year, reveals some other points of interest. First, sightings by month:</p>
<figure id="attachment_347" aria-describedby="caption-attachment-347" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month.png"><img loading="lazy" decoding="async" data-attachment-id="347" data-permalink="https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/count_per_month-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="count_per_month" data-image-description="" data-image-caption="&lt;p&gt;Per-month UFO sightings by shape.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month-1024x576.png" alt="Per-month UFO sightings by shape." width="1024" height="576" class="size-large wp-image-347" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-347" class="wp-caption-text">Per-month UFO sightings by shape. (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_month.pdf">PDF Version</a>.)</figcaption></figure>
<p>Sightings are much more common in the Northern hemisphere&#8217;s summer months, presumably due to higher numbers of people spending time outside and being in a position to spot anomalous phenomena.</p>
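<p>Tallying sightings by month needs only the month extracted from each timestamp. A minimal sketch, assuming POSIXct timestamps like the occurred column of ufo_tbl; the four timestamps are invented.</p>

```r
# Invented sighting timestamps (ufo_tbl stores these as POSIXct in `occurred`)
occurred <- as.POSIXct(c("2014-07-04 21:30:00", "2014-07-05 22:10:00",
                         "2015-01-12 19:45:00", "2016-08-20 21:05:00"),
                       tz = "UTC")

# Month as a factor with all twelve levels, so empty months show a zero count
month <- factor(as.integer(format(occurred, "%m")), levels = 1:12,
                labels = month.abb)

table(month)
```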
<p>Broken down by hour, sightings are far more common at night than during the day, with the lowest volume around 08:00 and the highest at 21:00. For both monthly and hourly sightings, the relative proportions of sightings by shape remain broadly constant, from which we conclude that the timing of UFO activity is unrelated to shape. This consistency of behaviour suggests that the various forms of UFO, however they disguise themselves, may be drawn from a single source.</p>
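<p>One way to test whether shape proportions really stay constant across the day is a chi-squared test of independence between hour and shape. This is a suggested check rather than the analysis behind this post, and the counts below are invented so that every hour has exactly the same shape mix.</p>

```r
# Invented counts: rows are hours of day, columns are shape categories.
# Every row has the same mix (50% lights, 30% round, 20% triangular).
counts <- rbind(
  "08" = c(Lights = 20,  Round = 12, Triangular = 8),
  "15" = c(Lights = 40,  Round = 24, Triangular = 16),
  "21" = c(Lights = 100, Round = 60, Triangular = 40)
)

# Row-wise shares: identical across hours despite very different totals
prop.table(counts, margin = 1)

# Chi-squared test of independence between hour and shape
hour_shape_test <- chisq.test(counts)
hour_shape_test$statistic  # 0 here, because observed equals expected
hour_shape_test$p.value    # 1 here
```

<p>With more than 100,000 sightings, even trivial departures from independence will give tiny p-values, so an effect size such as Cramér&#8217;s V, sqrt(X^2 / (N * (k - 1))) with k the smaller table dimension, is likely to be more informative than the p-value alone.</p>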
<figure id="attachment_345" aria-describedby="caption-attachment-345" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour.png"><img loading="lazy" decoding="async" data-attachment-id="345" data-permalink="https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/count_per_hour-2/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="count_per_hour" data-image-description="" data-image-caption="&lt;p&gt;Hourly UFO sightings by shape.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour-1024x576.png" alt="Hourly UFO sightings by shape." width="1024" height="576" class="size-large wp-image-345" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-345" class="wp-caption-text">Hourly UFO sightings by shape. (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/06/count_per_hour.png">PDF Version</a>.)</figcaption></figure>
<p>This is far from a definitive breakdown of UFO behaviour by shape. In future posts we will explore whether differing shapes of UFO cluster geographically, and the extent to which cotemporaneous sightings can be correlated by their shape and description.</p>
<p>You can keep up to date with our latest statistical esoterica on Twitter at <a href="https://twitter.com/weirddatasci" rel="noopener noreferrer" target="_blank">@WeirdDataSci</a>.</p>
<p>As always, keep delving.</p>
<p><strong>Code Note:</strong><br />
In developing this entry we have moved from relying on the excellent work of <a href="https://data.world/timothyrenner">Tim Renner</a> in gathering and cleaning the <a href="http://www.nuforc.org">NUFORC</a> UFO dataset to developing our own scraping code. Most posts here include source code at the bottom of each entry. As this post relied on more code than usual and produced multiple outputs, however, we are including only representative code. The full scraping and analysis code will be the focus of a future post.</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show analysis code</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p><strong>Data:</strong></p>
<ul>
<li>National UFO Reporting Centre: <a href="http://www.nuforc.org">http://www.nuforc.org</a>.</li>
</ul>
<p><strong>Other:</strong></p>
<ul>
<li>Tox Typewriter font: <a href="https://www.dafont.com/tox-typewriter.font">https://www.dafont.com/tox-typewriter.font</a></li>
</ul>
<p><strong>Proportional Time Series Plot Code:</strong></p>
[code language=&#8221;r&#8221;]
<p># Structure of ufo_tbl object output from NUFORC scraping code<br />
Classes ‘tbl_df’, ‘tbl’ and &#8216;data.frame&#8217;:       114949 obs. of  8 variables:<br />
 $ occurred: POSIXct, format: &quot;1995-02-02 23:00:00&quot; &quot;1995-02-02 19:15:00&quot; &quot;1995-02-02 20:10:00&quot; &quot;1994-12-13 18:55:00&quot; &#8230;<br />
 $ reported: POSIXct, format: &quot;1995-02-02 10:47:00&quot; &quot;1995-02-03 06:06:00&quot; &quot;1995-02-03 10:32:00&quot; &quot;1995-02-03 17:45:00&quot; &#8230;<br />
 $ posted  : POSIXct, format: &quot;2003-02-05&quot; &quot;2003-03-04&quot; &quot;2003-03-21&quot; &quot;2003-03-21&quot; &#8230;<br />
 $ location: chr  &quot;Shady Grove, OR&quot; &quot;Denmark, WI&quot; &quot;Traverse City, MI&quot; &quot;Murphy, NC&quot; &#8230;<br />
 $ shape   : chr  &quot;Other&quot; &quot;Round&quot; &quot;Other&quot; &quot;Other&quot; &#8230;<br />
 $ duration: chr  &quot;15 min&quot; &quot;75 min&quot; &quot;2 min (?)&quot; &quot;&quot; &#8230;<br />
 $ details : chr  &quot;Man and wife witness very bright, moving light over ridge to southwest.  Flashing green &amp; red lights. Good rept.&quot; &quot;Caller, and apparently several other people, witnessed multiple strange craft streaking through the night sky i&quot;| __truncated__ &quot;Four children left home to go sledding on a hill located approximately 500 yards away.  At approximately 2010 h&quot;| __truncated__ &quot;Woman reports seeing strange, lighted obj. with  \&quot;arms.\&quot;  Many witnesses and written reports.&quot; &#8230;<br />
 $ date    : Date, format: &quot;1995-02-02&quot; &quot;1995-02-02&quot; &quot;1995-02-02&quot; &quot;1994-12-13&quot; &#8230;</p>
[/code]
[code language="r"]
<p>library(tidyverse)<br />
library(magrittr)<br />
library(lubridate)<br />
library(forcats)</p>
<p>library(ggplot2)<br />
library(ggridges)<br />
library(ggthemes)<br />
library(cowplot) # ggdraw(), draw_label(), plot_grid(), theme_cowplot(), save_plot()<br />
library(showtext)</p>
<p>library(viridis)</p>
<p># Create a summary barplot for UFO activity by shape over time</p>
<p># Load the data from scraping http://www.nuforc.org<br />
ufo_tbl &lt;- readRDS( &quot;data/ufo_processed.rds&quot; )</p>
<p># Load font<br />
font_add( &quot;mapfont&quot;, &quot;font/Tox Typewriter.ttf&quot;)<br />
showtext_auto()</p>
<p># Shape entries are inconsistent. Manual fixing required.<br />
# Most inconsistency is lowercase and uppercase, so use str_to_title to fix<br />
# that.<br />
ufo_tbl$shape &lt;- ufo_tbl$shape %&gt;%<br />
	str_to_title %&gt;%<br />
	str_replace( &quot;(^$|Unknown)&quot;, &quot;Other&quot; ) %&gt;%<br />
	str_replace( &quot;(Changed|Changing)&quot;, &quot;Changing&quot; ) %&gt;%<br />
	str_replace( &quot;(Delta|Triangle|Triangular)&quot;, &quot;Triangle&quot; ) %&gt;%<br />
	str_replace( &quot;(Circle|Round|Dome)&quot;, &quot;Circle&quot; ) %&gt;%<br />
	# Categories too small to be represented individually. (Dome has already<br />
	# been mapped to Circle above, so it is not repeated here.)<br />
	str_replace( &quot;(Crescent|Pyramid|Hexagon)&quot;, &quot;Other&quot; ) %&gt;%<br />
	str_replace( &quot;Flare&quot;, &quot;Light&quot; )</p>
<p># Further category combination for frequency plot<br />
ufo_tbl$shape &lt;- ufo_tbl$shape %&gt;%<br />
	str_replace( &quot;(Fireball|Flash)&quot;, &quot;Light&quot; ) %&gt;%<br />
	str_replace( &quot;(Sphere|Disk|Oval|Egg|Circle|Cone)&quot;, &quot;Round&quot; ) %&gt;%<br />
	str_replace( &quot;(Cigar|Cylinder)&quot;, &quot;Cylinder&quot; ) %&gt;%<br />
	str_replace( &quot;Chevron&quot;, &quot;Triangle&quot; ) %&gt;%<br />
	str_replace( &quot;(Cross|Diamond|Teardrop)&quot;, &quot;Other&quot; ) </p>
<p># Cut off at 1900, as earlier sightings are infrequent and unreliable.<br />
# (Unlike those post 1900&#8230;)<br />
ufo_tbl &lt;- ufo_tbl %&gt;%<br />
	filter( occurred &gt;= &quot;1900-01-01&quot; ) %&gt;%<br />
	filter( occurred &lt;= &quot;2019-01-01&quot; )</p>
<p># Proportional frequency of each sighting<br />
ufo_tbl$date &lt;- lubridate::date( ufo_tbl$occurred )<br />
frequency_tbl &lt;- ufo_tbl %&gt;%<br />
	count( aggr_date = year(occurred), shape ) %&gt;%<br />
	group_by( aggr_date ) %&gt;%<br />
	mutate(freq = n / sum(n))</p>
<p>colnames( frequency_tbl ) &lt;- c( &quot;aggr_date&quot;, &quot;shape&quot;, &quot;n&quot; , &quot;freq&quot;)</p>
<p>gp &lt;- ggplot( frequency_tbl, aes( x=aggr_date, fill=shape, y=freq ) ) +<br />
	labs( x=&quot;Date&quot;, y=&quot;Sightings\n(Proportion)&quot; ) +<br />
	geom_col( alpha=0.4 ) +<br />
	scale_fill_viridis( name = &quot;Shape&quot;, option=&quot;D&quot;, discrete=TRUE ) +<br />
	theme_dark() +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			legend.key = element_rect(fill = &quot;#222222&quot;),<br />
			legend.background = element_rect(fill = &quot;#222222&quot;),<br />
			legend.title = element_text( size=18, color=&quot;#eeeeee&quot;, family=&quot;mapfont&quot;, margin = margin( t = 20 ) ),<br />
			legend.text = element_text( size=14, color=&quot;#eeeeee&quot;, family=&quot;mapfont&quot;, margin = margin( t = 20 ) ),<br />
			text = element_text( color=&quot;#eeeeee&quot;, family=&quot;mapfont&quot; ),<br />
			axis.title.x = element_text( size=18, color=&quot;#eeeeee&quot;, family=&quot;mapfont&quot;, margin = margin( t = 20 ) ),<br />
			axis.title.y = element_text( size=18, color=&quot;#eeeeee&quot;, family=&quot;mapfont&quot;, margin = margin( r = 20 ) ),<br />
			axis.text = element_text( size=14, color=&quot;#eeeeee&quot;, family=&quot;mapfont&quot; ),<br />
			panel.grid.major = element_line(colour = &quot;#444444&quot;),<br />
			panel.grid.minor = element_line(colour = &quot;#444444&quot;)<br />
			)</p>
<p># Cowplot trick for ggtitle<br />
title &lt;- ggdraw() +<br />
	draw_label(&quot;Proportion of UFO Sightings by Shape, 1900-2017&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#eeeeee&quot;, size=18, hjust=0, vjust=1, x=0.02, y=0.88) +<br />
	draw_label(&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#eeeeee&quot;, size=14, hjust=0, vjust=1, x=0.02, y=0.40)</p>
<p>data_label &lt;- ggdraw() +<br />
	draw_label(&quot;Data: http://www.nuforc.org&quot;, fontfamily=&quot;mapfont&quot;, colour = &quot;#eeeeee&quot;, size=14, hjust=1, x=0.98 ) </p>
<p># Set a cowplot base theme for the composite figure<br />
theme_set(theme_cowplot(font_size=4, font_family = &quot;mapfont&quot; ) )</p>
<p>tgp &lt;- plot_grid(title, gp, data_label, ncol=1, rel_heights=c(0.1, 1, 0.1)) </p>
<p>tgp &lt;- tgp +<br />
	theme(<br />
			panel.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;),<br />
			plot.background = element_rect(fill = &quot;#222222&quot;, colour = &quot;#222222&quot;)<br />
			) </p>
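<p># base_width and base_height together fix the 16 x 9 inch output size, so<br />
# no separate aspect ratio argument is needed.<br />
save_plot(&quot;output/proportion_over_time.pdf&quot;,<br />
	tgp,<br />
	base_width = 16,<br />
	base_height = 9 )</p>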
[/code]
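<p>The chained <code>str_replace()</code> calls above are order-dependent: &quot;Dome&quot; is rewritten to &quot;Circle&quot; before the later rule that would send it to &quot;Other&quot; is ever reached, and &quot;Round&quot; becomes &quot;Circle&quot; in the first pass only to become &quot;Round&quot; again in the second. As a sketch of an alternative, not the code used for the plot, the net mapping can be written out once as a lookup table, with the per-year proportions then computed in base R via <code>table()</code> and <code>prop.table()</code>. The mapping is reconstructed from the replacements above, and the sightings are invented purely to exercise the functions.</p>

```r
# A sketch, not the code used for the plot above.
# Net effect of the chained str_replace() calls, written as a lookup table.
# Labels not listed here are kept unchanged.
shape_map <- c(
  Sphere = "Round", Disk = "Round", Oval = "Round", Egg = "Round",
  Circle = "Round", Round = "Round", Dome = "Round", Cone = "Round",
  Delta = "Triangle", Triangle = "Triangle", Triangular = "Triangle",
  Chevron = "Triangle",
  Fireball = "Light", Flash = "Light", Flare = "Light",
  Cigar = "Cylinder",
  Changed = "Changing",
  Unknown = "Other", Crescent = "Other", Pyramid = "Other",
  Hexagon = "Other", Cross = "Other", Diamond = "Other", Teardrop = "Other"
)

normalise_shape <- function(shape) {
  # Title-case single-word labels, e.g. "fireball" -> "Fireball"
  shape <- paste0(toupper(substr(shape, 1, 1)), tolower(substring(shape, 2)))
  # Exact-match lookup; NA means the label is not in the map
  mapped <- unname(shape_map[shape])
  out <- ifelse(is.na(mapped), shape, mapped)
  # Blank entries become "Other", as in the original code
  out[out == ""] <- "Other"
  out
}

# Proportion of each shape within each year: cross-tabulate, then divide
# every row (year) by its own total.
shape_proportions <- function(year, shape) {
  prop.table(table(year = year, shape = shape), margin = 1)
}

# Invented sightings, purely for illustration
sightings <- data.frame(
  year  = c(1995, 1995, 1995, 1995, 2005, 2005),
  shape = c("round", "Disk", "", "Chevron", "fireball", "Light"),
  stringsAsFactors = FALSE
)
sightings$shape <- normalise_shape(sightings$shape)
props <- shape_proportions(sightings$year, sightings$shape)
print(round(props, 2))
```

<p>Because each raw label maps directly to its final category, the order of the rules no longer matters. Note that the lookup matches whole labels only, whereas <code>str_replace()</code> also rewrites matching substrings; for single-word shape labels the two should usually agree, but that is an assumption worth checking against the scraped data.</p>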
</div></div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.weirddatascience.net/2018/06/20/the-shape-of-the-other-the-evolution-of-ufo-sightings-by-shape/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">331</post-id>	</item>
		<item>
		<title>Paranormal Manifestations in the British Isles</title>
		<link>https://www.weirddatascience.net/2018/03/17/paranormal-manifestations-in-the-british-isles/</link>
					<comments>https://www.weirddatascience.net/2018/03/17/paranormal-manifestations-in-the-british-isles/#comments</comments>
		
		<dc:creator><![CDATA[moth]]></dc:creator>
		<pubDate>Sat, 17 Mar 2018 09:20:09 +0000</pubDate>
				<category><![CDATA[cryptozoology]]></category>
		<category><![CDATA[ghosts]]></category>
		<category><![CDATA[paranormal]]></category>
		<category><![CDATA[scraping]]></category>
		<guid isPermaLink="false">http://www.weirddatascience.net/blog/?p=212</guid>

					<description><![CDATA[<div class="mh-excerpt">The British Isles are ancient, haunted places. Pre-Roman legends and folk songs are filled with dragons, magic, and strange, wild creatures. Spirits, fairies, and all kinds of imps and goblins roam the countryside with intent ranging from the mischievous to the malevolent. Will-o&#8217;-the-wisps lead unwary travellers deep into <a class="mh-excerpt-more" href="https://www.weirddatascience.net/2018/03/17/paranormal-manifestations-in-the-british-isles/" title="Paranormal Manifestations in the British Isles">[...]</a></div>]]></description>
										<content:encoded><![CDATA[<p>The British Isles are ancient, haunted places. Pre-Roman legends and folk songs are filled with dragons, magic, and strange, wild creatures. Spirits, fairies, and all kinds of imps and goblins roam the countryside with intent ranging from the mischievous to the malevolent. Will-o&#8217;-the-wisps lead unwary travellers deep into marshes in the night, before vanishing. The sad ghosts of past tragedies reside in ancient castles and stately homes.</p>
<p>In more recent history, strange beasts have been rumoured to live wild in the open spaces, whether large predators escaped from zoos, the last survivals of prehistory, or spirits. Every village, town, and county has its own stories and traditions.</p>
<p>The <a href="http://www.paranormaldatabase.com">Paranormal Database</a> is a collection of both traditional and recent paranormal events in Britain and Ireland. It contains details of almost 20,000 hauntings, cryptozoological sightings, legends, monsters, UFOs, and other strange phenomena, with the date and time of sightings, the location, and brief descriptions.</p>
<p>The data is not easily accessible other than by reading pages directly, and took some time and effort to scrape into a usable form. Paranormal Database entries store names, dates, locations, and comments as largely unstructured text, so a more thorough analysis will require further processing. The R code used to scrape the website is included at the end of this post.</p>
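<p>Although the text is unstructured, each entry introduces its fields with fixed labels (&quot;Location:&quot;, &quot;Type:&quot;, &quot;Date / Time:&quot;, &quot;Further Comments:&quot;), which is what the scraping code at the end of this post relies on. As a minimal illustration, this sketch uses base R's <code>regexec()</code> in place of the <code>tidyr::extract()</code> call in the full code, with the same regular expression. The entry text and the helper names <code>parse_entry()</code> and <code>split_main_sub()</code> are hypothetical.</p>

```r
# Sketch: split one Paranormal Database-style entry into labelled fields.
# The entry text is invented; the pattern is the one used in the scraper.
entry_pattern <- paste0(
  "(.*[[:graph:]])\\s*Location: (.*[[:graph:]])\\s*Type: (.*[[:graph:]])",
  "\\s*Date / Time: (.*[[:graph:]])\\s*Further Comments: (.*[[:graph:]])\\s*"
)

parse_entry <- function(entry) {
  m <- regmatches(entry, regexec(entry_pattern, entry))[[1]]
  if (length(m) == 0) return(NULL)  # entry does not follow the expected layout
  # m[1] is the whole match; m[2:6] are the five captured fields
  setNames(as.list(m[2:6]), c("title", "location", "type", "date", "comments"))
}

# Split "main - sub" values on a space-surrounded hyphen, keeping the whole
# value as the main part when there is no hyphen (as coalesce.join does).
split_main_sub <- function(x) {
  parts <- regmatches(x, regexec("(.*) - (.*)", x))[[1]]
  if (length(parts) == 0) c(main = x, sub = NA_character_)
  else c(main = parts[2], sub = parts[3])
}

example_entry <- paste(
  "Grey Lady",
  "Location: Somewhere Hall - Long Gallery",
  "Type: Haunting Manifestation",
  "Date / Time: Unknown",
  "Further Comments: A woman in grey is said to walk the gallery."
)
fields <- parse_entry(example_entry)
str(fields)
print(split_main_sub(fields$location))
```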
<p>To understand the range and breadth of the paranormal life of the British Isles, we will focus on the data stored in the Paranormal Database. For this initial entry, we will take a first look at the data and get an overview of what mind-numbing horrors are most commonly encountered by the unsuspecting traveller in the United Kingdom and beyond.</p>
<figure id="attachment_216" aria-describedby="caption-attachment-216" style="width: 1024px" class="wp-caption aligncenter"><a href="http://www.weirddatascience.net/wp-content/uploads/2018/03/haunting.png"><img loading="lazy" decoding="async" data-attachment-id="216" data-permalink="https://www.weirddatascience.net/2018/03/17/paranormal-manifestations-in-the-british-isles/haunting/" data-orig-file="https://www.weirddatascience.net/wp-content/uploads/2018/03/haunting.png" data-orig-size="1920,1080" data-comments-opened="1" data-image-meta="{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}" data-image-title="paranormal_occurrences" data-image-description="" data-image-caption="&lt;p&gt;Frequency of paranormal sightings in the British Isles.&lt;/p&gt;
" data-large-file="https://www.weirddatascience.net/wp-content/uploads/2018/03/haunting-1024x576.png" src="http://www.weirddatascience.net/wp-content/uploads/2018/03/haunting-1024x576.png" class="size-large wp-image-216" width="1024" height="576" alt="" srcset="https://www.weirddatascience.net/wp-content/uploads/2018/03/haunting-1024x576.png 1024w, https://www.weirddatascience.net/wp-content/uploads/2018/03/haunting-640x360.png 640w, https://www.weirddatascience.net/wp-content/uploads/2018/03/haunting-300x169.png 300w, https://www.weirddatascience.net/wp-content/uploads/2018/03/haunting-768x432.png 768w, https://www.weirddatascience.net/wp-content/uploads/2018/03/haunting-64x36.png 64w, https://www.weirddatascience.net/wp-content/uploads/2018/03/haunting.png 1920w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></a><figcaption id="caption-attachment-216" class="wp-caption-text">Frequency of paranormal sightings in the British Isles. (<a href="http://www.weirddatascience.net/wp-content/uploads/2018/03/haunting.pdf">PDF version</a>)</figcaption></figure>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show frequency table of manifestations</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<h2 id="tablepress-1-name" class="tablepress-table-name tablepress-table-name-id-1">Paranormal Manifestations in the British Isles</h2>

<table id="tablepress-1" class="tablepress tablepress-id-1" aria-labelledby="tablepress-1-name">
<thead>
<tr class="row-1">
	<th class="column-1">Manifestation Type</th><th class="column-2">Occurrences</th>
</tr>
</thead>
<tbody class="row-striping row-hover">
<tr class="row-2">
	<td class="column-1">Haunting Manifestation</td><td class="column-2">12376</td>
</tr>
<tr class="row-3">
	<td class="column-1">Legend</td><td class="column-2">1662</td>
</tr>
<tr class="row-4">
	<td class="column-1">Cryptozoology</td><td class="column-2">835</td>
</tr>
<tr class="row-5">
	<td class="column-1">Shuck</td><td class="column-2">688</td>
</tr>
<tr class="row-6">
	<td class="column-1">Poltergeist</td><td class="column-2">625</td>
</tr>
<tr class="row-7">
	<td class="column-1">Unknown Ghost Type</td><td class="column-2">614</td>
</tr>
<tr class="row-8">
	<td class="column-1">Fairy</td><td class="column-2">550</td>
</tr>
<tr class="row-9">
	<td class="column-1">Other</td><td class="column-2">427</td>
</tr>
<tr class="row-10">
	<td class="column-1">Alien Big Cat</td><td class="column-2">382</td>
</tr>
<tr class="row-11">
	<td class="column-1">UFO</td><td class="column-2">336</td>
</tr>
<tr class="row-12">
	<td class="column-1">Crisis Manifestation</td><td class="column-2">277</td>
</tr>
<tr class="row-13">
	<td class="column-1">Dragon</td><td class="column-2">190</td>
</tr>
<tr class="row-14">
	<td class="column-1">Curse</td><td class="column-2">183</td>
</tr>
<tr class="row-15">
	<td class="column-1">Post-Mortem Manifestation</td><td class="column-2">179</td>
</tr>
<tr class="row-16">
	<td class="column-1">Environmental Manifestation</td><td class="column-2">152</td>
</tr>
<tr class="row-17">
	<td class="column-1">Manifestation of the Living</td><td class="column-2">44</td>
</tr>
<tr class="row-18">
	<td class="column-1">Werewolf</td><td class="column-2">36</td>
</tr>
<tr class="row-19">
	<td class="column-1">Vampire</td><td class="column-2">34</td>
</tr>
<tr class="row-20">
	<td class="column-1">Spontaneous Human Combustion</td><td class="column-2">28</td>
</tr>
<tr class="row-21">
	<td class="column-1">Experimental Manifestation</td><td class="column-2">2</td>
</tr>
</tbody>
</table>
<!-- #tablepress-1 from cache -->
</div></div>
</div>
<p>As we can see from the diagram and the frequency table, hauntings are by far the most common manifestation in paranormal Britain, outnumbering legends, the next most common category, by more than seven to one. Beyond &#8220;Haunting Manifestation&#8221;, several of the most common types are variants on the same theme: poltergeist activity and unknown types of ghost each account for more than 600 recorded events.</p>
<p>Cryptozoology, in its various forms, is also well represented. The <a href="https://en.wikipedia.org/wiki/Black_Shuck">Black Shuck</a>, a ghostly black dog, is the most frequently recorded specific creature, second only to the general cryptozoology category itself, and alien big cats are also common. Dragons, werewolves, and vampires perhaps deserve to be classed as monstrous entities rather than cryptozoological oddities and are, in any case, far less common.</p>
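<p>These comparisons can be checked directly against the frequency table above. A short base R sketch, with the counts typed in by hand from the table; the groupings of &quot;ghostly&quot; and cryptozoological types are our own choice for illustration.</p>

```r
# Counts copied by hand from the frequency table above
counts <- c(
  "Haunting Manifestation" = 12376, "Legend" = 1662, "Cryptozoology" = 835,
  "Shuck" = 688, "Poltergeist" = 625, "Unknown Ghost Type" = 614,
  "Fairy" = 550, "Other" = 427, "Alien Big Cat" = 382, "UFO" = 336,
  "Crisis Manifestation" = 277, "Dragon" = 190, "Curse" = 183,
  "Post-Mortem Manifestation" = 179, "Environmental Manifestation" = 152,
  "Manifestation of the Living" = 44, "Werewolf" = 36, "Vampire" = 34,
  "Spontaneous Human Combustion" = 28, "Experimental Manifestation" = 2
)

total <- sum(counts)
haunting_to_legend <- counts[["Haunting Manifestation"]] / counts[["Legend"]]
haunting_share <- counts[["Haunting Manifestation"]] / total

# Hauntings plus the two ghostly variants discussed above
ghostly <- c("Haunting Manifestation", "Poltergeist", "Unknown Ghost Type")
ghost_share <- sum(counts[ghostly]) / total

# Cryptozoology in its various forms
crypto_total <- sum(counts[c("Cryptozoology", "Shuck", "Alien Big Cat")])

cat("Total entries:", total, "\n")
cat("Hauntings per legend:", round(haunting_to_legend, 2), "\n")
cat("Share of hauntings:", round(haunting_share, 3), "\n")
cat("Share of all ghostly types:", round(ghost_share, 3), "\n")
cat("Cryptozoological entries:", crypto_total, "\n")
```

<p>This gives 19,620 entries in total. Hauntings alone account for about 63% of them and outnumber legends by roughly 7.45 to one, rising to about 69% once poltergeists and unknown ghost types are included; the three cryptozoological categories together contribute 1,905 entries.</p>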
<p>In brief, then, the unquiet dead are by far the most numerous beings to trouble the unhappy folk of the British Isles, though twisted mockeries of natural fauna are far from rare.</p>
<p>Do particular phenomena cluster in regions and, if so, where? Are werewolves truly more commonly seen when the moon is full? Have certain manifestations become more common as time passes? Are certain sightings clustered temporally as well as geographically? Are the most haunted areas also the most cryptozoologically active? With access to the full horror of the data, we can begin to answer these questions about the darkest corners of the British Isles.</p>
<p>Full code for scraping the data and producing the plot is given below.</p>
<p>You can keep up to date with our latest visions of the statistical unknown on Twitter at <a href="https://twitter.com/weirddatasci" rel="noopener" target="_blank">@WeirdDataSci</a>.</p>
<div class="su-accordion su-u-trim">
<div class="su-spoiler su-spoiler-style-fancy su-spoiler-icon-chevron su-spoiler-closed" data-scroll-offset="0" data-anchor-in-url="no"><div class="su-spoiler-title" tabindex="0" role="button"><span class="su-spoiler-icon"></span>Show analysis code</div><div class="su-spoiler-content su-u-clearfix su-u-trim">
<p><strong>Data:</strong></p>
<ul>
<li>The Paranormal Database: <a href="http://www.paranormaldatabase.com">http://www.paranormaldatabase.com</a></li>
</ul>
<p><strong>Other:</strong></p>
<ul>
<li>JSL Ancient font: <a href="http://www.1001fonts.com/jsl-ancient-font.html">http://www.1001fonts.com/jsl-ancient-font.html</a></li>
<li>Rvest web scraping library: <a href="http://blog.rstudio.com/2014/11/24/rvest-easy-web-scraping-with-r/">http://blog.rstudio.com/2014/11/24/rvest-easy-web-scraping-with-r/</a> | <a href="https://cran.r-project.org/web/packages/rvest/index.html">https://cran.r-project.org/web/packages/rvest/index.html</a></li>
<li>xkcd styling for ggplot2: <a href="http://xkcd.r-forge.r-project.org/">http://xkcd.r-forge.r-project.org/</a></li>
</ul>
<p><strong>Paranormal Database Scraping Code:</strong></p>
[code language="r"]
<p>library(rvest)<br />
library(magrittr)<br />
library(tidyr)<br />
library(stringr)<br />
library(dplyr)</p>
<p># Base URL for scraping<br />
paranormal.base &lt;- &quot;http://www.paranormaldatabase.com&quot;</p>
<p># Extract links from a given page. Return a list with full URLs.<br />
extract.links &lt;- function( in.html, base.url ) {</p>
<p>	# Read the page and extract any links from the &#8216;hero-unit&#8217; div.<br />
	# Extract all links, making relatives absolute.<br />
	in.html %&gt;%<br />
		html_nodes( '.hero-unit' ) %&gt;%<br />
		html_nodes( 'a' ) %&gt;%<br />
		html_attr( 'href' ) %&gt;%<br />
		url_absolute( base.url )</p>
<p>}</p>
<p># Function to extract a paranormal database entries page into a dataframe.<br />
# Returns a dataframe of the entries from the given HTML.<br />
extract.entries &lt;- function( in.html ) {</p>
<p>	# Select all table cells with a width of 100% &#8212; ugly, but identifies the<br />
	# entries themselves.<br />
	page.entries &lt;- in.html %&gt;%<br />
		html_nodes( xpath='//td[ contains( @width, &quot;100&quot; ) ]' ) %&gt;%<br />
		html_text() %&gt;%<br />
		str_replace_all( &quot;[\r\n]&quot; , &quot;&quot; )</p>
<p>	# If any entries were found, process them. any() reduces the vector returned<br />
	# by str_detect() to the single TRUE/FALSE that &amp;&amp; requires.<br />
	if( ( length( page.entries ) &gt; 0 ) &amp;&amp;<br />
		any( str_detect(page.entries, &quot;(.*[[:graph:]])\\s*Location: (.*[[:graph:]])\\s*Type: (.*[[:graph:]])\\s*Date / Time: (.*[[:graph:]])\\s*Further Comments: (.*[[:graph:]])\\s*&quot;) ) ) {<br />
		# Each entry is a string containing:<br />
		# - Location:<br />
		# - Type:<br />
		# - Date / Time:<br />
		# - Further Comments:<br />
		# So split the string on these and put the entries into a dataframe.<br />
		page.entries.df &lt;- data.frame( entry=page.entries) %&gt;%<br />
			tidyr::extract(entry, into=c('title', 'location', 'type', 'date', 'comments'), &quot;(.*[[:graph:]])\\s*Location: (.*[[:graph:]])\\s*Type: (.*[[:graph:]])\\s*Date / Time: (.*[[:graph:]])\\s*Further Comments: (.*[[:graph:]])\\s*&quot; )</p>
<p>		# Further split these into subtypes where appropriate. Both &#8216;location&#8217; and &#8216;type&#8217; are often split into a main and a subtype by a space-surrounded hyphen.<br />
		# As we&#8217;ll do this twice, make it a function. This takes a base column name, a new name for the sub-column, and the separator string.<br />
		coalesce.join &lt;- function( base.data, base.column, sub.column, separator ) {</p>
<p>			base.data &lt;- base.data %&gt;%<br />
				tidyr::extract( base.column, into=c(&quot;tmp.main&quot;, sub.column), separator, remove=FALSE)</p>
<p>			# Combine the old base column with the new tmp.main column, using base.column to fill in the NAs in tmp.main<br />
			# where the separator was not matched.<br />
			base.data[ base.column ] &lt;- coalesce( base.data[[ &quot;tmp.main&quot; ]], base.data[[ base.column ]] )<br />
			# Drop the temporary column and return the result.<br />
			base.data %&gt;%<br />
				select(-one_of(&quot;tmp.main&quot;))</p>
<p>		}</p>
<p>		# Split 'type' and 'location' on a space-surrounded hyphen<br />
		page.entries.df &lt;- coalesce.join( page.entries.df, &quot;type&quot;, &quot;subtype&quot;, &quot;(.*) - (.*)&quot; )<br />
		page.entries.df &lt;- coalesce.join( page.entries.df, &quot;location&quot;, &quot;sublocation&quot;, &quot;(.*) - (.*)&quot; )</p>
<p>		page.entries.df</p>
<p>	}</p>
<p>}</p>
<p># Read each page&#8217;s links to populate a list to process.<br />
# Each page contains a &quot;Return to Regional Listing&quot; link, and a link to enter the listings.<br />
# That listing can be either a straight entries page, or a link to subpages.</p>
<p># Starting with the base page, pull every link in the main &#8216;hero-unit&#8217; div and<br />
# add it to a &#8216;to.process&#8217; vector, checking that that URL is not already<br />
# listed. Also pass along a &#8216;processed&#8217; vector in the recursive calls so that<br />
# we don&#8217;t overly duplicate results.</p>
<p># WARNING: As written, this will cause a lot of duplication due to the lack of<br />
# global list of processed links. As this was a one-off script, I haven&#8217;t fixed this.</p>
<p># Base URL to begin processing<br />
paranormal.regions &lt;- &quot;http://www.paranormaldatabase.com/regions/regions.htm&quot;</p>
<p># Recursively process the top-level URL<br />
scrape.recurse &lt;- function( in.url, processed ) {</p>
<p>	message( paste(&quot;Processing&quot;, in.url ) )</p>
<p>	# Try to fetch this URL and any subpages.<br />
	# Warn on failure<br />
	page &lt;- tryCatch(<br />
						  {<br />
							  # Fetch and parse HTML of current page.<br />
							  read_html( in.url )<br />
						  },<br />
						  error=function( cond ) {<br />
							  message( paste( &quot;URL caused an error:&quot;, in.url ))<br />
							  message( &quot;Error message:&quot;)<br />
							  message( cond )<br />
							  return( NULL )<br />
						  },<br />
						  warning=function( cond ) {<br />
							  message( paste( &quot;URL caused a warning:&quot;, in.url ))<br />
							  message( &quot;Warning message:&quot;)<br />
							  message( cond )<br />
							  return( NULL )<br />
						  },<br />
						  finally={<br />
						  }<br />
						  )</p>
<p>	if( is.null( page ) ) {<br />
		return( NULL )<br />
	}<br />
	else {<br />
		# Get the current page&#8217;s entries<br />
		links &lt;- extract.links( page, in.url )<br />
		entries &lt;- extract.entries( page )</p>
<p>		lapply( entries$title, function(x) cat( x, &quot;... &quot;, sep=&quot;&quot; ) )<br />
		cat( &quot;\n&quot;)</p>
<p>		# Recurse, but don&#8217;t follow any known links<br />
		new.entries &lt;-<br />
			lapply( setdiff( links, processed ), function(x) scrape.recurse( x, union( links, processed ) ) ) %&gt;%<br />
			bind_rows()</p>
<p>		# Bind entries from sub-pages to this page&#8217;s entries.<br />
		all.entries &lt;- bind_rows( entries, new.entries )</p>
<p>		# Report success and return results<br />
		message( paste( &quot;Successfully processed:&quot;, in.url ) )<br />
		return( all.entries )<br />
	}</p>
<p>}</p>
<p># Run the scraping and parsing, then remove duplicates.<br />
all.entries &lt;- scrape.recurse( paranormal.regions, paranormal.regions )<br />
paranormal.df &lt;- unique( all.entries )</p>
<p># Manually fix some type entries.<br />
paranormal.df$type[ which( paranormal.df$type == &quot;legend&quot; ) ] &lt;- &quot;Legend&quot;<br />
paranormal.df$type[ which( paranormal.df$type == &quot;Haunting Manifestation?&quot; ) ] &lt;- &quot;Haunting Manifestation&quot;<br />
paranormal.df$type[ which( paranormal.df$type == &quot;SHC&quot; ) ] &lt;- &quot;Spontaneous Human Combustion&quot;<br />
paranormal.df$type[ which( paranormal.df$type == &quot;ABC&quot; ) ] &lt;- &quot;Alien Big Cat&quot;</p>
<p># With all values scraped, save data.<br />
save( paranormal.df, file=&quot;paranormal.Rdata&quot;)</p>
[/code]
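<p>The WARNING in the scraping code notes that the recursive approach duplicates work, because each branch only knows about the links passed down to it. A minimal sketch of the alternative is an iterative breadth-first crawl that keeps a single, global set of visited URLs, so each page is fetched at most once. To keep it runnable offline, <code>fetch_links()</code> and the small link graph <code>toy_site</code> are hypothetical stand-ins for <code>read_html()</code> plus <code>extract.links()</code>; they are not part of the original code.</p>

```r
# Hypothetical in-memory site: each page maps to the links found on it
toy_site <- list(
  "regions.htm" = c("north.htm", "south.htm"),
  "north.htm"   = c("regions.htm", "castle.htm"),
  "south.htm"   = c("regions.htm", "castle.htm", "moor.htm"),
  "castle.htm"  = c("regions.htm"),
  "moor.htm"    = character(0)
)
# Stand-in for read_html() + extract.links(); returns NULL for unknown pages
fetch_links <- function(url) toy_site[[url]]

# Breadth-first crawl with one global visited set
crawl <- function(start, fetch_links) {
  queue <- start
  visited <- character(0)
  while (length(queue) > 0) {
    url <- queue[1]
    queue <- queue[-1]
    if (url %in% visited) next
    visited <- c(visited, url)
    links <- fetch_links(url)
    # Enqueue only links that are neither visited nor already waiting
    queue <- c(queue, setdiff(links, c(visited, queue)))
  }
  visited
}

pages <- crawl("regions.htm", fetch_links)
print(pages)
```

<p>In a real scraper, <code>fetch_links()</code> would wrap the <code>tryCatch()</code>-guarded <code>read_html()</code> call, and <code>extract.entries()</code> would be run on the same parsed page before moving on, so that entries are collected once per page rather than once per path to it.</p>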
<p><strong>Paranormal Manifestations in the British Isles Plotting Code:</strong><br />
[code language="r"]
library(ggplot2)<br />
library(jpeg)<br />
library(xkcd)<br />
library(showtext)<br />
library(grid)<br />
library(gridExtra)</p>
<p># Create a summary barplot for paranormal activity in the UK.</p>
<p># Load the data from scraping http://www.paranormaldatabase.com/<br />
load( &quot;../data/paranormal.Rdata&quot; )</p>
<p># Process the data to frequency counts of each type of event.<br />
counts &lt;- table( paranormal.df$type )<br />
df &lt;- as.data.frame.table( sort(counts, decreasing=FALSE) )<br />
colnames( df ) &lt;- c(&quot;type&quot;, &quot;frequency&quot; )</p>
<p># Create the plot.<br />
# Use the xkcd package to create irregular bars for a hand-drawn look.<br />
df$xmin &lt;- 1<br />
df$xmax &lt;- df$frequency<br />
df$ymin &lt;- as.numeric(df$type) - 0.1<br />
df$ymax &lt;- as.numeric(df$type) + 0.1</p>
<p>xrange &lt;- range( min( df$xmin ), max( df$xmax ) )<br />
yrange &lt;- range( min( df$ymin ), max( df$ymax ) + 1 )<br />
mapping &lt;- aes( xmin=xmin, ymin=ymin, xmax=xmax, ymax=ymax )</p>
<p># Load font<br />
font_add( &quot;handwriting&quot;, &quot;/usr/share/fonts/TTF/weird/JANCIENT.TTF&quot; )<br />
showtext_auto()</p>
<p># Load background image.<br />
bg.image &lt;- jpeg::readJPEG(&quot;img/vellum.jpg&quot;)<br />
bg.grob &lt;- rasterGrob( bg.image, interpolate=TRUE,<br />
							 width = 1.8,<br />
							 height = 1.4 )</p>
<p>gp &lt;- ggplot( data=paranormal.df, aes( x=frequency, y=type ) ) +<br />
			annotation_custom( bg.grob, xmin=-Inf, xmax=Inf, ymin=-Inf, ymax=Inf ) +<br />
			xkcdrect( mapping, df, alpha=0.6, fill=&quot;#8a0707&quot;, size=1 ) +<br />
			xkcdaxis( xrange, yrange ) +<br />
			theme(<br />
				text = element_text( size=12, color=&quot;#232732&quot;, family=&quot;handwriting&quot; ),<br />
				plot.background = element_rect(fill = &quot;transparent&quot;, color=&quot;transparent&quot;),<br />
				panel.background = element_rect(fill = &quot;transparent&quot;, color=&quot;transparent&quot;),<br />
				plot.title = element_text( size=24, colour=&quot;#232732&quot;, family=&quot;handwriting&quot; ),<br />
				axis.title = element_text( size=18, colour=&quot;#232732&quot;, family=&quot;handwriting&quot; ),<br />
				axis.text.x=element_text( size=12, color=&quot;#232732&quot;, family=&quot;handwriting&quot; ),<br />
				axis.text.y=element_text( size=12, color=&quot;#232732&quot;, family=&quot;handwriting&quot;, hjust=1) ) +<br />
			scale_y_discrete( labels=c( as.character( df$type ) ) ) +<br />
			scale_x_continuous( expand = c( 0, 0 ) ) +<br />
			ylab( &quot;Phenomenon&quot; ) +<br />
			xlab( &quot;Occurrence&quot; ) +<br />
			labs( caption = &quot;Data from: http://www.paranormaldatabase.com&quot; ) +<br />
			ggtitle(&quot;Paranormal Manifestations in the British Isles&quot;, subtitle=&quot;http://www.weirddatascience.net | @WeirdDataSci&quot;)</p>
<p># To allow the background image to extend to the edge of the panel, we have to generate<br />
# a gtable directly from gp, then turn off clipping.<br />
p2 &lt;- ggplot_gtable(ggplot_build(gp))<br />
p2$layout$clip[p2$layout$name==&quot;panel&quot;] &lt;- &quot;off&quot;</p>
<p># To create the output, we need to call grid.draw on the gtable.<br />
# Use the pdf device to draw directly to preserve clipping and fonts.<br />
# showtext has to be called explicitly in here, for some reason.<br />
pdf( file=&quot;output/haunting.pdf&quot;, width=16, height=9 )<br />
showtext_begin()<br />
grid.draw(p2)<br />
showtext_end()<br />
dev.off()</p>
[/code]
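<p>The bar positions in the plotting code depend on a detail of base R: when a table is sorted before <code>as.data.frame.table()</code>, the resulting <code>type</code> factor has its levels in the sorted order, so <code>as.numeric(df$type)</code> places the least frequent type at position 1 and the bars run from least to most frequent. A small check with invented labels:</p>

```r
# Invented labels: three types with distinct counts
types <- c("Legend", "Haunting", "Haunting", "Fairy", "Haunting", "Legend")
counts <- table(types)  # alphabetical order: Fairy 1, Haunting 3, Legend 2

# As in the plotting code: sort ascending, then convert to a data frame
df <- as.data.frame.table(sort(counts, decreasing = FALSE))
colnames(df) <- c("type", "frequency")

# Factor levels follow the sorted order, so each bar's y position
# reflects its rank by frequency
df$ymin <- as.numeric(df$type) - 0.1
df$ymax <- as.numeric(df$type) + 0.1
print(levels(df$type))
print(df)
```

<p>Without the <code>sort()</code>, the levels would stay alphabetical and the bars would no longer be ordered by frequency.</p>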
</div></div>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://www.weirddatascience.net/2018/03/17/paranormal-manifestations-in-the-british-isles/feed/</wfw:commentRss>
			<slash:comments>1</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">212</post-id>	</item>
	</channel>
</rss>
