A Century of Earthquakes in an Interactive Tool
How Data Science and Strategic Analytics Make Risk Management Smarter
I Feel the Earth Move Under My Feet
At Easter 2014, I felt the earth rumbling beneath me for the first time in my life as I sipped my morning tea in Oakham, Rutland, UK. Admittedly, the earthquake was a mere 1.7 on the Richter scale, with tremors so minor that my dog didn’t even bark.
However, only two weeks earlier, earthquakes of 3.2 and 3.5 on the Richter scale in Oakham had been felt 50 miles away. It didn’t matter that they caused no damage; the local media was in a frenzy. So were residents, tweeting things like “Another earthquake in Oakham? What on earth is going on?” A local newspaper rattled the public further by squawking about the UK earthquake average: 200 a year! Who knew? Of course, nearly all of them are imperceptible, with only a few reaching a magnitude of 3.0.
As a card-carrying data scientist working in insurance, I compared this minor episode to the devastation wreaked by earthquakes around the world, and began to wonder how global earthquake magnitudes and frequencies compare historically. As I was trapped inside on a bank holiday, with rain coming down in sheets, I decided to find out.
I Go to My Sources
Surprise: I went straight to the web! Online sources are growing exponentially each year. In fact, 2010 marked the beginning of the Zettabyte era. What’s a Zettabyte? A trillion gigabytes, enough to fill 1,000 data centers. Estimates vary, but the current data cloud is said to hold nearly 8 zettabytes of rapidly changing, complex data from over 1 billion websites.
Government and academic sites are obviously the first stop for reliable, accessible data. According to the UN’s 2014 E-government survey, public, archived data makes up the majority of government web content. Each of the 193 UN member states is now online, and 171 publish archives.
There is a caveat. It’s quite a challenge to find what you need, extract it, funnel it into an indexed database, analyze it, and apply it. Unless you can do that, this wealth of “big data” on the internet actually becomes overwhelming and virtually useless.
My data-scientist card means I’m up for the challenge.
I visited an old standby, the U.S. Geological Survey site. Jackpot: global earthquake data going back to 1900. As expected, however, separate phenomena were in separate reports, and most of the data was in a form that only seismologists and disaster experts would understand at first glance.
Time to Put Analytics to Work
Step 1, Data Acquisition: For my own rainy-day project, I had all the numbers I needed on a single internet site. In professional, complex projects, information has to be gathered from a slew of sources, and the process can be laborious.
Where do insurers and risk engineers normally acquire risk data? To start, obviously risk engineers and insurers already possess vast databases of information going back decades—in some cases over a century—on every kind of risk and disaster. But we can never have enough! In addition to our own historical files and online information, we incorporate expert, scientific modeling data. That provides a meteorological, geological, and hydrological framework for locations around the world.
The final element is current client data from submissions. This location-specific information is obviously fundamental to accurate risk analysis. Sometimes clients are reluctant to share, or can’t see the point of supplying various details, like how thick the beams are in the roof, or what year certain features were added. But the fact is, cumulative risk is higher or lower depending on a multitude of real building and location features. Also, apparently minor design and construction variations can dramatically alter the risk in rare natural disasters.
In most cases, commercial insurers are faced with a much bigger challenge than personal insurers when tracing statistical patterns. Personal insurance statistics truly qualify as “big data”, and include millions of comparable attribute sets from individuals around the world, with lots of obvious patterns. Commercial insurance statistics are based on small, highly varied data samples, with myriad exceptions. Our best hope is to get “broad data”, from as many sources as possible.
Step 2, Data Wrangling: So, the details matter in submissions, even if they seem, at first glance, not to influence risk directly. Diligent data scientists are combing through submissions as I write, wrangling numbers out of PDFs, cleaning them up, merging data from multiple files, and making the format consistent so that our analytics can interpret it.
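To make the wrangling step a little more concrete, here is a minimal Python sketch of cleaning and merging records from two files. Everything in it is an assumption made for illustration: the function names, the field names (site_id, year_built, roof_beam_mm), and the idea that records can be joined on a shared site identifier. It is not a description of any production pipeline.

```python
from collections import defaultdict


def normalize_keys(record):
    """Give every field a consistent name: trimmed, lower case, underscores."""
    return {key.strip().lower().replace(" ", "_"): value
            for key, value in record.items()}


def to_number(value):
    """Turn strings such as ' 1,250 ' into a float; return None if not numeric."""
    if isinstance(value, (int, float)):
        return float(value)
    try:
        return float(str(value).replace(",", "").strip())
    except ValueError:
        return None


def merge_sources(*sources, key="site_id", numeric_fields=()):
    """Merge records from several sources into one dict per site.

    `key` and `numeric_fields` are hypothetical names chosen for this sketch.
    Later sources overwrite earlier ones when the same field appears twice.
    """
    merged = defaultdict(dict)
    for source in sources:
        for raw in source:
            record = normalize_keys(raw)
            for field in numeric_fields:
                if field in record:
                    record[field] = to_number(record[field])
            merged[record[key]].update(record)
    return dict(merged)
```

Calling merge_sources(building_file, roof_file, numeric_fields=("year_built", "roof_beam_mm")) would return one combined record per site, with the listed fields converted to numbers. Real submissions are far messier, so treat this as the shape of the task rather than a recipe.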
The more precise and comprehensive the data, the better risk engineers can quantify and predict risk and losses, and advise companies on loss-prevention measures. Better data also enables underwriters to write policies that provide the right amount of coverage at an appropriate price. This is central to ensuring adequate money is available to pay claims swiftly and fully, so that businesses can rebound quickly from disaster.
Wrangling the earthquake data out of the U.S. Geological Survey site was comparatively simple. I created several algorithms that extracted specific data from the various reports and funneled it into a spreadsheet.
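For readers who like to see the plumbing, the sketch below shows roughly what such an extraction could look like in Python. It is not the code I used. It assumes the catalog arrives as CSV text with columns named time, latitude, longitude, depth, mag, and place, and the sample rows are invented purely so that the example runs.

```python
import csv
import io

# Invented rows for illustration only; these are not real catalog entries.
SAMPLE_CATALOG = """time,latitude,longitude,depth,mag,place
1950-01-01T00:00:00.000Z,10.0,20.0,15.0,7.1,Invented location A
1975-06-15T12:30:00.000Z,-5.5,120.0,33.0,4.2,Invented location B
1990-03-10T08:45:00.000Z,35.0,-118.0,10.0,,Invented location C
2005-11-20T23:59:00.000Z,40.0,140.0,600.0,5.0,Invented location D
"""

FIELDS = ["date", "latitude", "longitude", "depth_km", "magnitude", "place"]


def extract_events(csv_text, min_magnitude=5.0):
    """Keep only complete rows at or above `min_magnitude`."""
    events = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            event = {
                "date": row["time"][:10],  # keep the YYYY-MM-DD part
                "latitude": float(row["latitude"]),
                "longitude": float(row["longitude"]),
                "depth_km": float(row["depth"]),
                "magnitude": float(row["mag"]),
                "place": row["place"],
            }
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or malformed values
        if event["magnitude"] >= min_magnitude:
            events.append(event)
    return events


def write_spreadsheet(events, handle):
    """Write the cleaned events as CSV, which any spreadsheet can open."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(events)
```

With the sample above, extract_events(SAMPLE_CATALOG) keeps the two rows of magnitude 5.0 or greater and drops both the smaller quake and the row with no magnitude.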
The Power of Interactive Visuals
Once the data was organized, I used the global map as a visual scaffold, creating an interactive tool that any 12-year-old could use, to see where earthquakes of 5.0 and greater magnitudes have struck around the world in the past century.
This is obviously very basic, but it illustrates a powerful device which we are just beginning to explore: visual analytics. The user manipulates data hands-on, and sees exactly how changing an isolated factor affects the overall outcome. In the case of this tool, the user can change one of three data attributes (the magnitude, the depth at which the quake originated, or the date) and then see how that affects earthquake patterns on the display.
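Behind the sliders, the logic is little more than a filter. The Python sketch below is a guess at the general shape of that filter, not the tool's actual code. The Quake class, the select_quakes function, and the parameter names are all hypothetical, and depth_km is assumed to mean how far below the surface the quake originated.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Quake:
    """One earthquake record; the field names are assumptions for this sketch."""
    day: date
    magnitude: float
    depth_km: float
    latitude: float
    longitude: float


def select_quakes(quakes, min_magnitude=5.0, max_depth_km=None,
                  start=None, end=None):
    """Return the quakes that pass every filter the user has set.

    A filter left as None is ignored, which mirrors a slider left untouched.
    """
    selected = []
    for quake in quakes:
        if quake.magnitude < min_magnitude:
            continue
        if max_depth_km is not None and quake.depth_km > max_depth_km:
            continue
        if start is not None and quake.day < start:
            continue
        if end is not None and quake.day > end:
            continue
        selected.append(quake)
    return selected
```

Each change to a slider would simply call select_quakes again with the new values and redraw the surviving points on the map.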
Human memory is short, and it is difficult, or impossible, for anyone to intuitively estimate the potential for rare events, like earthquakes. The earthquake tool is specifically designed to answer a single question: how rare are earthquakes of varying magnitudes in different locations around the world?
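One simple way to put a number on "how rare" is to count events per magnitude band and divide by the years observed. The Python sketch below does exactly that. The function names, the one-unit band width, and the 5.0 floor are my own assumptions for illustration. The result is a crude historical average and not a seismological hazard model.

```python
import math
from collections import Counter


def annual_rates(magnitudes, years_observed, floor=5.0, band_width=1.0):
    """Average number of events per year in each magnitude band.

    Bands start at `floor`, so 5.0 to 5.9 falls in band 5.0, 6.0 to 6.9
    in band 6.0, and so on. Events below `floor` are ignored.
    """
    counts = Counter()
    for magnitude in magnitudes:
        if magnitude < floor:
            continue
        steps = math.floor((magnitude - floor) / band_width)
        counts[floor + steps * band_width] += 1
    return {band: count / years_observed
            for band, count in sorted(counts.items())}


def return_period_years(annual_rate):
    """Average gap in years between events, given the events per year."""
    return math.inf if annual_rate == 0 else 1.0 / annual_rate
```

To compare locations, the same count could be run separately on the events from each region. A band with no events never appears in the result, which is itself a reminder of how little a single century of data says about the rarest quakes.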
Humans are also extremely visual. In fact, XL GAPS (Global Asset Protection Services LLC) already offers a tool called MyAnalysis, which allows clients to adjust their property location data, and generate tables and charts to illustrate risk trends by hazard, location, or severity. Clients can examine individual site risk or overall portfolio risk, and create strategies to minimize business exposure to physical disasters like fires and floods.
Increasingly, we look to “reversible” computer simulations, or “generative models”, to play out risk scenarios. This enables us to ask even better questions about what drives client risks. Imagine yet another dimension of interactivity, depicting location risk for a factory. There could be literally hundreds of risk factors which a risk engineer, underwriter, and, eventually, even a client could adjust. Each adjustment would be reflected in a simulation playing over a prescribed time period, showing how those minor variations could impact the overall location risk. Visual, hands-on tools make it much easier for anyone to understand the real relationships between complex risk factors.
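As a toy version of that idea, the Python sketch below plays out many simulated time periods for one location and reports the average number of loss events. It is a deliberately simplified Monte Carlo illustration, not a description of any real model. The base probability, the idea of treating each risk factor as a multiplier, and all of the names are assumptions.

```python
import random


def simulate_loss_events(base_annual_probability, factor_multipliers,
                         years=50, trials=2000, seed=0):
    """Average number of loss events per simulated period.

    Each risk factor is modelled, very crudely, as a multiplier on the
    yearly chance of a loss: below 1.0 reduces risk, above 1.0 raises it.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    probability = base_annual_probability
    for multiplier in factor_multipliers.values():
        probability *= multiplier
    probability = min(max(probability, 0.0), 1.0)  # keep within [0, 1]

    total_events = 0
    for _ in range(trials):
        # One trial is one simulated period of `years` independent years.
        total_events += sum(1 for _ in range(years)
                            if rng.random() < probability)
    return total_events / trials
```

Adjusting one multiplier and rerunning, for example comparing an empty set of factors with a hypothetical {"sprinklers": 0.5}, shows how a single change shifts the expected number of events over the period. A real generative model would capture interactions between factors, which this sketch ignores.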
Actionable Business Decisions
The question is: How do we interpret those relationships for smarter risk engineering and risk transfer, in other words, for actionable business decisions? We return to the “better data, better decisions” principle. Understanding data relationships is even more important than simply having a handle on isolated statistics. Those relationships between data sets are decisive in predicting the probability, scope, and fiscal impact of risk.
This raises the risk and market knowledge of risk engineers, risk managers, and underwriters to another level. It’s called strategic analytics, because it enables all of these experts to offer more holistic risk strategies, with enhanced, long-term risk mitigation and loss prevention, and even more accurate and effective insurance policies. It is early days yet, but in some sectors and regions, we have observed up to a 20% reduction in loss incidents, when strategic analytics are used. Disasters which might have caused clients severe business interruption were reduced to mere blips on the radar screen.
Our interactive future will include not only visual tools, but also deeper collaboration between all business stakeholders. We will be asking for more and more data, but our clients will quickly realize the benefits of working together more closely. Strategic analytics are crucial to keeping a business operational in an increasingly complex, migrating global economy, amid accelerating climate change and a population explosion.
My first earthquake was a reminder of the changes facing all of us. As a data scientist, I have a duty to create the tools that help clients confront these changes head on. This will empower them to seize opportunities and innovate with greater confidence, ultimately propelling the economy and global society forward.