Did you know that most reported UFOs are shaped like lights, disks, or triangles? Or that July 4th is when we get the most UFO reports? Or that most people who report UFOs consider the experience to be positive?
These are just a few things I learned as I did my initial data analysis on the dataset that contains 100,000+ reports of UFO sightings.
Last week, I did a refresh of my ongoing project where I look at UFO reports statistically to see if I can get any insight into the phenomenon. This time I added some fields to support text mining and I included more derived fields to make any future dashboards or analysis easier.
Most sightings last around 30 seconds to 5 minutes, and people usually report strange lights in the sky or disk-like objects. Sightings happen more often on weekends and late at night. Most people feel positive about their experience, and the top three emotions associated with reports are anticipation, trust, and fear.
Here is an example of one of those typical cases:
I was looking up at the sky and saw what appeared to be a very very fast moving star like object moving very fast from west to east. I was standing alone on the corner of East 204th St and Villa Ave the Bronx I looked up in the sky and saw a very bright star-like object going at a very very fast speed from West to East heading towards Queens from my stand point. This is another object that I could not figure out It was not Military, Commercial, Helicopter, Blimp, Bird, Weather Balloon, or any known Earth made object. Just a disk/Star like flying object.
Many of these reports look like this: people see strange lights in the sky that move in ways they don't anticipate. It's tempting to dismiss reports like this, since many satellites in low Earth orbit cross the sky from west to east and look like a bright, fast-moving star, just like this sighting in the Bronx.
Many cases are just like this: unusual events witnessed by careful observers. Most people note the experience but don't have enough information to say that the event was truly extraordinary.
It's nice to have a baseline so we can start to look for cases that may be a little more interesting. We can use the dataset to filter out common cases. We can also code as much of each report as we can and then use clustering to look for patterns in sightings.
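To make the filtering idea concrete, here is a minimal sketch of a baseline filter in Python. It assumes each report is a dictionary with `shape` and `duration_seconds` fields; those field names and the sample rows are made up for illustration and are not the real dataset's columns. The thresholds come from the baseline described earlier: common shapes and a duration of 30 seconds to 5 minutes.

```python
# Hypothetical field names; the real dataset's columns may differ.
COMMON_SHAPES = {"light", "disk", "triangle"}
TYPICAL_DURATION_SECONDS = (30, 300)  # 30 seconds to 5 minutes


def is_typical(report):
    """Return True when a report matches the baseline profile."""
    low, high = TYPICAL_DURATION_SECONDS
    shape = str(report.get("shape", "")).strip().lower()
    duration = report.get("duration_seconds")
    return (
        shape in COMMON_SHAPES
        and duration is not None
        and low <= duration <= high
    )


def unusual_reports(reports):
    """Keep only the reports that fall outside the baseline profile."""
    return [r for r in reports if not is_typical(r)]


# Made-up rows, only to show how the filter behaves.
sample = [
    {"id": 1, "shape": "Light", "duration_seconds": 120},
    {"id": 2, "shape": "cigar", "duration_seconds": 45},
    {"id": 3, "shape": "disk", "duration_seconds": 3600},
]

remaining = unusual_reports(sample)
print([r["id"] for r in remaining])
```

A report missing its duration is treated as not typical here, so it stays in the pile for review. That is a design choice, and the opposite choice would be just as easy to defend.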
Another thing I plan on doing is more text mining. I've already used this to get the stats around emotional tone and how people feel overall about their sightings. But I think there are more ways to use the data in the description field. One option is a word cloud or some other tool that shows word frequency. I might also want to try some topic analysis and even correlations to other texts. All of this is too involved for this preliminary data exploration.
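As a rough illustration of the word-frequency idea, the sketch below counts the most common words in a single description using only the Python standard library. The stopword list is a small hand-picked assumption, not a standard one, and the input is the first sentence of the Bronx report quoted above. A word cloud would just be a visual layer on top of counts like these.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {
    "a", "an", "and", "at", "be", "from", "i", "in", "of",
    "the", "to", "up", "very", "was", "what",
}


def word_frequencies(text, top_n=5):
    """Return the top_n most frequent non-stopword words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(
        w for w in words if w not in STOPWORDS and len(w) > 2
    )
    return counts.most_common(top_n)


description = (
    "I was looking up at the sky and saw what appeared to be a very very "
    "fast moving star like object moving very fast from west to east."
)

for word, count in word_frequencies(description, top_n=5):
    print(word, count)
```

On the full dataset the same function could be applied to every description and the counters summed, though a real stopword list and some stemming would probably be needed before the results mean much.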
Something I would like to do is create a very local version of this data and then go through each case carefully. In parallel to this project, I've created a version just for Bucks County, PA over the past ten years. It's about 10 cases. I'm going to build a simple dashboard to help read these cases.