What do they know?

Since the Roswell Incident in 1947, UFO’s have been associated with secretive military installations where mysterious craft dart across the night sky. Skeptics might hold that many UFO sightings, far from being extraterrestrial visitors, are better explained as experimental or conventional military craft. Does this association hold, though, in light of the wealth of UFO sightings collated by NUFORC? Are UFO’s more likely to be seen when in close proximity to a US airforce base?

UFO Sightings Density against US Airforce Installations. (PDF | Print-Friendly PDF)

As a first step to approaching this question, we can rely on the reported NUFORC data of UFO sightings, and the US government’s conveniently thorough dataset of military installations.

Before performing a more robust statistical analysis, we can quickly combine these two datasets to see if any obvious visual patterns emerge. As always, visual analysis comes with the strong caveat that apparent patterns must be backed up with real statistics. Beware eyeballs.

We will focus only on sightings in the United States. Whilst the NUFORC dataset is creditably global the reports are overwhelmingly from the US, reflecting the fact that NUFORC is largely a US-based endeavour and is much less likely to receive reports from elsewhere in the world.

For this quick visual exploration, we will produce a density plot, or heatmap, of UFO sightings going back to 1906, using the excellent spatstat R package, as described in Baddeley, Rubak, and Turner’s Spatial Point Patterns: Methodology and Applications with R.

The first task is to transform the data appropriately from the weirdly non-Euclidean geometry of longitude and latitude to a geometry that allows for consistent measurement of distance between points. Following this we can run a standard kernel density estimate of sightings that, briefly, produces a probability distribution over the analysed space that estimates the likelihood of a UFO sighting at each point.

The output of spatstat’s density function is a pixel image, appropriate for plotting in R’s base graphics. As fanatical devotees of the cult of ggplot2, however, we instead convert this image to a raster suitable for plotting with ggplot’s geom_raster.

With the density plot calculated we can load the US military base data and perform a similar transform from longitude and latitude to Euclidean space. We then restrict the data to each active US Airforce installation, and plot the resulting set of points over the underlying density plot of UFO sightings. Being worthy of special attention, we highlight the location of Area 51.

At a first glance, there do seem to be correlations with particular clusters of airforce installations and UFO sightings. It is immediately obvious that sightings are much more common on the coasts of the US than in the centre of the continent, although the relatively sparse population density might go some way towards explaining this phenomenon.

The next step in this analysis, which we will carry out in a future post, will be to conduct a formal analysis of the correlation between the sightings density and the distance from airforce bases. From our initial observations, however, dark suspicions have already been raised.

You can keep up to date with our latest tearings of the mathematical veil on Twitter at @WeirdDataSci.

Show analysis code








# Load font
font.add( "mapfont", "font/Tox Typewriter.ttf")

# Read world shapefile data and tranform to an appropriate projection
world <- readOGR( dsn='data/ne/110m_cultural', layer='ne_110m_admin_0_countries' )
world.subset <- world[ world$continent %in% c("North America"), ]
world.subset <- spTransform(world.subset,CRS("+init=epsg:4326"))

# Fortify world data for plotting
world.df <- fortify( world.subset )

# Get list of unique countries for processing.
countries.all <- data.frame( unique(world.df$id)[-1])
colnames( countries.all ) <- c("country")

# Get US military base data
us.mil <- readOGR( dsn='data/usmil', layer='MIRTA_Points' )
us.mil <- spTransform(us.mil,CRS("+init=epsg:4326"))

# UFO Sightings Data
ufo <- read.csv("data/scrubbed.csv", stringsAsFactors=FALSE)

# Convert latitude and longitude to numeric, and omit result NA values.
ufo$latitude <- as.numeric(ufo$latitude)
ufo$longitude <- as.numeric(ufo$longitude)
ufo <- na.omit( ufo )

# Convert the ufo dataframe to a spatial dataframe that contains explicit longitude and latitude projected appropriately for plotting.
coordinates( ufo ) <- ~longitude+latitude
proj4string( ufo )<-CRS("+init=epsg:4326")
ufo <- spTransform(ufo,CRS(proj4string(ufo)))

# Create window for spatial analysis
ufo.owin <- as.owin.SpatialPolygons(world.subset)
ufo.ppp <- ppp( x=coordinates(ufo)[,1], y=coordinates(ufo)[,2], window = ufo.owin, marks = ufo$shape )

# This discards 'illegal' points outside of the window
ufo.ppp <- as.ppp(ufo.ppp)

# Highlight Area 51
roswell.coords = cbind(-115.806999, 37.237 )
roswell.sp = SpatialPoints(roswell.coords)

proj4string( roswell.sp )<-CRS("+proj=longlat")
roswell.sp <- spTransform(roswell.sp,CRS("+init=epsg:4326"))
roswell.sp.df <- as.data.frame( coordinates( roswell.sp ) )
roswell.sp.df$text.label <- 'Area 51'

# Get only active US airforce bases.
us.mil.airforce <- us.mil[ us.mil$COMPONENT == "AF Active", ]
us.mil.ppp <- ppp( x=coordinates(us.mil.airforce)[,1], y=coordinates(us.mil.airforce)[,2], window = ufo.owin )
us.mil.ppp <- as.ppp( us.mil.ppp )
us.mil.sp <- as( us.mil.ppp, "SpatialPoints" )

# Calculate density of UFO sightings
ufo.density <- density( ufo.ppp, diggle=T, sigma=1.4, dimyx=c(2048,2048) )

# Make density image object usable by ggplot as a raster
ufo.density.raster <- raster(ufo.density)
raster.df <- as.data.frame( rasterToPoints( ufo.density.raster))

# Plot
gp <- ggplot() +

# Underlying map
geom_map( data = world.df, aes( map_id=id ), colour = "grey20", size = 0.5, map = world.df, show.legend = FALSE ) +

# Theming
theme_map() +
plot.background = element_rect(fill = "#444444"),
panel.border = element_blank(),
plot.title = element_text( size=24, colour="#cccccc", family="mapfont" ),
text = element_text( size=14, color="#cccccc", family="mapfont" ),
) +

legend.background = element_rect( colour ="#444444", fill = "#444444" ), 
legend.key = element_rect(fill = "#ffffff", colour="#444444"),
legend.position = c(1,0),
legend.justification = c(1,0)
) +

guides( fill = guide_colourbar( title.position="top", direction="horizontal", barwidth=16 ) ) +

   # Add density of sightings as raster.
   geom_raster( data = raster.df, alpha=1, aes( x=x, y=y, fill=layer), show.legend=TRUE ) +

# Add density of sightings as raster.
geom_raster( data = raster.df, alpha=1, aes( x=x, y=y, fill=layer), show.legend=FALSE) +

# Colour using Viridis
scale_fill_viridis( option="magma" )

# Overlay US Airforce locations
dot.size <- 1.2
stroke.size <- 0.4

gp <- gp +
geom_point( data = as.data.frame( coordinates(us.mil.sp) ), aes( x=mx, y=my ), size=dot.size, stroke=stroke.size, shape=21, fill="#eeeeee", color="#222222" ) +

# Special case for Area 51
# Due to ggplot only supporting a single colour scale for the entire plot, there's some awkwardness here in using 'colour' rather than 'fill' for the Area 51 dot and manually adding an outline. This lets us add an easy legend to the map, though, highlighting that location.
geom_point( data = roswell.sp.df, aes( x=coords.x1, y=coords.x2, colour=text.label ), size=1, stroke=0, show.legend=TRUE ) +
geom_point( data = roswell.sp.df, aes( x=coords.x1, y=coords.x2 ), size=1, stroke=stroke.size, shape=21, color="#222222" ) +
scale_colour_manual(values = c("red")) +
theme( legend.title = element_blank(), legend.background = element_rect( fill = "#444444" ), legend.key = element_rect(fill = "#444444") ) +

# Acknowledge data sources
labs( caption = "Data: http://www.nuforc.org | https://catalog.data.gov/dataset/military-bases-national" ) +

# Add our title
ggtitle("Density of UFO Sightings against Active US Airforce Installations", subtitle="http://www.weirddatascience.net | @WeirdDataSci")

# Output
ggsave( "output/base-density.pdf", width=16, height=9 )