The Role of Big Data Analytics in Predicting Maritime Security Threats and Identifying Risk Hotspots
1. Introduction
Big data analytics is widely acknowledged as an essential tool for identifying the trends and patterns that support high-quality decision-making in the corporate sector. It does so by drawing on a wide range of data sources and organizing and managing them through statistical and mathematical modeling. For many years, corporate enterprises have used big data to better understand their customers and make more informed business decisions, often drawing on external sources such as social media, photos, videos, sensors and RFID. Various research organizations have also identified big data as an important tool for the federal intelligence community in maintaining national security and recognizing global trends. Through modeling, big data can forecast scenarios of interest and identify areas of instability; paired with visualization tools, it can give decision makers a clear understanding of what the data represents. An example would be using Google Earth to visually represent areas of civil unrest around the world. This understanding ultimately supports better-informed decisions on how to allocate resources to risk areas. Big data can also contribute to national security through comparative analysis. An example would be comparing a data set of global terrorist incidents with a data set of global GDP and other economic indicators. By locating where the two data sets intersect, decision makers can better understand where and why terrorism occurs and gain insight into how best to prevent it. The same process is useful to the corporate sector in understanding the behavior of customers and markets.
Given that the global economy is built on an interconnected infrastructure of ships, ports, offshore facilities and the trade that flows between them, data analysts have realized that maritime security, together with the economic implications of global trends for it, is an area of high consequence and relatively low understanding. In real terms, the cost of global trade is high and the cost of failure is severe. Big data has already proven valuable in understanding global trends in maritime piracy and the circumstances in which piracy occurs. In research from 2008, data experts from the United States Department of Defense’s Office of Naval Research, the US Naval Postgraduate School and Homeland Security’s Pacific Disaster Center used a data mining and computational modeling approach to analyze piracy in the Western Indian Ocean. They began by identifying what it meant for there to be less piracy, which led to a better understanding of the differences between conditions in which piracy occurs and those in which it does not. The aim was to provide insight for NATO and coalition forces, which at the time were primarily focused on security operations in the Middle East.
1.1. Background and Significance
Over the past five years, the global maritime environment has witnessed a significant increase in acts of piracy and sea robbery. The geographic focus of these attacks has shifted from more ‘traditional’ piracy locations such as the Straits of Malacca, the Gulf of Aden and the South China Sea to the waters off the coast of Western Africa. With the implementation of enhanced security measures, including the establishment of an Internationally Recommended Transit Corridor (IRTC) through the Gulf of Aden and the employment of vessel protection detachments (VPDs) by various shipping companies, the number of successful pirate attacks in these high-risk areas has fallen. However, as demonstrated by the 2011 kidnapping and murder of a German citizen off the coast of the Philippines, these measures have in turn led to a geographical shift of incidents toward areas previously considered lower risk.
As a result of this ‘displacement’ effect, a recent assessment of global maritime piracy concluded that acts of piracy and sea robbery have become more widespread rather than concentrated in specific high-risk areas. Given the strategic implications of a threat that undermines the rule of law and freedom of the seas, it is imperative that the global maritime community cooperates to identify these threat trends and act preventatively. This will require a comprehensive and timely information-sharing process that builds an understanding of where, when, and why these incidents are occurring. An accurate and concise predictive picture of maritime security threats will enable states, international organizations, and industry bodies to allocate resources to prevent and deter these incidents, and in turn shift the security environment back to a state of order.
1.2. Research Objectives
The core of this research lies in the development of predictive analytics models which will provide advanced warning of when and where security incidents are likely to occur. By achieving this, our intention is to facilitate the allocation of patrol and inspection assets and thus deter and prevent the incident from occurring. Our overarching goal is to develop a decision support framework which will allow decision makers to identify and implement the most cost-effective security policies to reduce risk to acceptable levels. We will measure the feasibility of achieving this through a series of case studies with our project partners within the Asia Pacific region. The models and tools developed for this research will have application beyond anti-terrorism security into other maritime security threats such as piracy, stowaways, and theft of or damage to ships and port infrastructure.
1.3. Scope and Limitations
Maritime security threats are on the rise. They are becoming more complex and multifaceted, and thus increasingly difficult to identify and manage. The UN International Maritime Organization’s (IMO) International Ship and Port Facility Security (ISPS) Code currently provides the overarching framework for the management of maritime security threats. In accordance with the ISPS Code, maritime security threats are defined as “any action or event that could disrupt the smooth operation of international trade and shipping transportation through acts detrimental to the safety, security, and environment, thus requiring the timely response of law enforcement agencies”. Although the ISPS Code provides a basic guideline on what constitutes a security threat, there is still no clear definition of what constitutes an act of maritime terrorism. This lack of clear definition makes it difficult for individual state governments to devise and enforce policies aimed at the management of maritime security threats.
The lack of clear definitions results in the blurred judgment of events at sea. Terrorist acts on land can be clearly defined as they are conducted in a public environment, thus making it clear whether an act is a terrorist act or not. However, acts at sea are conducted in a far more discreet environment and may take form in the many different facets of maritime security threats, making it unclear as to whether a specific act constitutes a terrorist act or is just a common criminal act. With terrorism being the calculated use of violence or threat of violence to inculcate fear, it is often the case that criminals have acted with the intent of terrorism without actually being classified as terrorists. This further adds to the complex nature of maritime security threats and terrorism.
2. Methodology
Reported crimes at sea were used as data for this paper. This is consistent with previous efforts to model the spatial distribution of crime. Criminal behaviour at sea has been investigated by Nance (1988), Levine (1992), and more recently by Dowlatabadi and Blight (2002), but these studies were not specifically concerned with piracy and Dowlatabadi and Blight (2002) relied on secondary data. Collecting data on crime poses a challenge as many incidents will not be reported, people’s behaviour is inherently difficult to observe, and data are often kept by private organisations or state agencies and may be sensitive to release. Data collection in the maritime environment is made still more difficult by the sheer size of the area and the small number of potential observers in comparison to events occurring. Despite these challenges, data specifically regarding crime at sea was obtained and entered into a digital database. Various sources existed including reports from international organisations, reports by local mariners, insurance statistics, and information in newspapers. The database is continually updated as new information comes to light. An example of the data is given in Table 1. With improvements in data availability it is hoped that future research in this area can more confidently address the spatial distribution of crime at sea, and perhaps even assess temporal trends.
2.1. Data Collection
The primary source of data for this research is attack data. Incidents of maritime attack were selected from the Global Terrorism Database (GTD). Use of the GTD has several advantages over other data sources such as the shipping incident data from the International Maritime Organization and the International Maritime Bureau. The GTD has a wider focus on terrorism and thus provides more detailed information on terrorism events. The coding of terrorism events makes it easier to eliminate incidents that are not relevant to maritime terrorism. One disadvantage is that the GTD does not specifically code an event as a maritime attack. To overcome this problem, we identified cases that involved an attack against a specific type of vessel (oil tanker, container ship, chemical tanker, etc.) and cases that involved an attack against a port. A spreadsheet of all the cases was then compiled, including the longitude and latitude of each event. The spreadsheet was submitted to the National Geospatial-Intelligence Agency (NGA), which provided a service to locate all these points and identify attacks within the data. This information was then used to create terrorism attack points with overlays of the global shipping network and hotspot maps.
2.2. Data Processing and Analysis
Data processing and analysis of the collected data relies primarily on data from the Automatic Identification System (AIS), a vessel-tracking system whose position reports are typically collected and displayed through a graphical interface. However, most AIS data is encoded and less than optimal for direct analysis. Being highly complex and in raw form, data of this nature is not suitable for analysis using standard statistical techniques.
Therefore, in order to test the viability of using big data analytics, all data on shipping and illegal activities will be integrated into the Palantir data fusion platform. This software fuses and analyses data in a user-friendly interface by mapping the relationships between types of data. Data fusion is the process of integrating multiple data and knowledge sources that represent the same real-world object into a unified representation. This is important for this assessment because many different types of behavior and activity need to be assessed for their overall risk to maritime security.
Once in the Palantir software, all data will be categorized and its relationships to other types of data mapped. This is a simple process for the GIS (Geographical Information System) data, which will be translated so that new risk algorithms can be mapped. All other data will be translated using various techniques to create uniformity across data types. This process allows an event-based risk model to be built that immediately measures new shipping activity, and its relative risk, against historical incidents.
2.3. Statistical Models and Algorithms
Modern statistical methods and theories of inference and prediction have gone beyond assuming that the data is a realization of an underlying stationary process, and have attempted to develop models that can identify and predict patterns of events as a non-stationary process. This is done through the development of spatio-temporal models and Bayesian models. Spatio-temporal models are statistical models in which the outcome is explained by variation in time and/or space, often represented as a lattice or grid; they are used to examine complex interactions between location and changes over time in the risk of a certain event occurring. Simulation studies have indicated that these types of models can provide a more accurate representation of the real-world situation and can detect changes in the environment. A study by Vigezzi examines maritime piracy as a non-stationary process and implements a Bayesian hierarchical model that considers changes in the risk of piracy over space and time, providing useful information for risk assessment and decision making from historical data.
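The lattice representation mentioned above can be illustrated with a minimal sketch that bins incidents into space-time cells. It is only an illustration under stated assumptions: the one-degree cell size, the yearly time step, the function names and the sample incidents are all hypothetical and do not come from the studies cited.

```python
from collections import Counter
from math import floor


def cell_of(lat, lon, year, cell_deg=1.0):
    """Map an incident to a (row, col, year) cell of a regular lat/lon lattice."""
    row = floor((lat + 90.0) / cell_deg)   # rows counted northward from 90 degrees south
    col = floor((lon + 180.0) / cell_deg)  # columns counted eastward from 180 degrees west
    return (row, col, year)


def bin_incidents(incidents, cell_deg=1.0):
    """Count incidents per space-time cell."""
    return Counter(cell_of(lat, lon, year, cell_deg) for lat, lon, year in incidents)


# Hypothetical incidents as (latitude, longitude, year); not real data.
incidents = [
    (12.4, 45.1, 2009),
    (12.9, 45.7, 2009),
    (12.2, 45.3, 2010),
    (4.3, 6.8, 2010),
]
counts = bin_incidents(incidents, cell_deg=1.0)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

Counts of this kind are the sort of input a spatio-temporal or Bayesian hierarchical model would take; the modelling step itself is not shown here.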
Statistical models are used to predict the likelihood of a specific type of incident occurring in a given area, calculated by dividing the number of incidents of that type by the total number of incidents in the area. Previous incidents or attacks are used to assess the probability of another incident occurring, on the premise that history repeats itself. The final results are typically displayed as a predicted number of incidents for an area, or as a probability surface (ESRI) that identifies areas of higher risk. This type of analysis is useful in providing a broad overview of areas that may be at risk; however, these methods do not take into account dynamic changes in the environment and often provide only a general idea of the level of risk in the area being investigated.
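The relative-frequency calculation described above (incidents of one type divided by all incidents in the area) can be sketched as follows. The function name and the sample incident labels are hypothetical and serve only to show the arithmetic.

```python
from collections import Counter


def incident_type_shares(incident_types):
    """Return each incident type's share of all incidents recorded in one area."""
    counts = Counter(incident_types)
    total = sum(counts.values())
    if total == 0:
        return {}  # no history in this area, so no estimate is possible
    return {kind: n / total for kind, n in counts.items()}


# Hypothetical incident log for a single area; not real data.
area_incidents = ["robbery", "robbery", "hijacking", "robbery", "kidnapping"]
shares = incident_type_shares(area_incidents)
print(shares)
```

As the paragraph notes, an estimate of this kind is static: it reflects the historical mix of incidents and says nothing about how conditions in the area are changing.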
3. Predicting Maritime Security Threats
The first step in predicting maritime security threats is to identify what are the key factors that may influence the particular type of security threat or incident. This step involves sorting through potential variables and identifying which ones are likely to have the greatest impact on the security event that is of interest. It may be likely that some of these variables are spatio-temporal, so in this case it is necessary to make use of geographical information systems (GIS) and spatio-temporal data. An example of this process would be work done by the United States Coast Guard after the 9/11 attacks. The Coast Guard identified 93 different factors that could contribute to a terrorism incident occurring. These factors were used to develop a strategy for port security planning and risk management.
At a very basic level, prediction involves finding the relationship between an outcome we are interested in and other factors that might influence this outcome. Once we have an understanding of these relationships, we are then able to forecast when given certain conditions the outcome is likely to occur. Probability is another aspect of prediction, where there is an understanding that some kind of incident is a likely outcome, so it is necessary to be able to predict when and where this incident is likely to occur.
Predicting the future is always a tricky business. In the field of maritime security, though there are always new and emerging threats, it is essential to be able to forecast where and when these types of incidents or attacks are likely to occur. Being able to predict when and where a maritime security threat is likely to occur enables resources to be focused and allocated efficiently. Big data analytics has great potential in providing predictive tools that are able to forecast emerging maritime security threats.
3.1. Identification of Key Variables
In recent years, the growing importance and understanding of predictive analytics has seen its application migrate from large corporations and government agencies to smaller businesses. Predictive analytics can be used in a wide variety of settings, from an in-house project to increase knowledge about a business and its capabilities to a systematic analysis that aids decision making in areas like healthcare. The techniques and methodologies used vary, but can generally be classified as those that use patterns in the data to predict the future and those that identify relationships between variables. Big data approaches are an example of the former, using patterns in data to make a prediction, whereas Bayesian techniques use the relationships between variables to give a probability for a future outcome. Data mining can be used in a similar way to identify predictors of an outcome. Predictive analytics is very similar to the scientific method, assimilating evidence to see whether a theory can be supported. Its use is now being recognized by the maritime security community as an effective method of preventing future incidents through risk identification and an understanding of what causes them. For example, piracy has been defined as an act of robbery or criminal violence at sea; it can be likened to a bank robbery, and the variables underlying the crime identify the reasons and the risk. By knowing where an act of piracy is probable and understanding the variables that lead to the crime, measures can be implemented to prevent the act from happening. For example, by considering distance from shore and the likelihood of an attack in different locations, it has been concluded that an act of piracy by Somali pirates is currently less likely in the Gulf of Guinea. Predicting and understanding acts of piracy off Nigeria can likewise help to prevent possible attacks.
3.2. Application of Machine Learning Techniques
Machine learning (ML) can play a varied role in the implementation of predictive models. It is a continually evolving field, and the maritime security community has only just begun to scratch the surface of the applications relevant to it. A simple application of ML is the automation of a manual process. For example, in a study focusing on the behaviour of small high-speed boats, decision trees were used to automate the process of differentiating between the behaviour exhibited by a boat and its actual intent. Clustering techniques can be used to identify hidden patterns within maritime data. An example is a study of maritime pollution in which a self-organizing map (SOM) was used to cluster regions of the sea according to the similarity of their environmental characteristics. Perhaps the most widespread use of ML is in predictive models. Whether or not they are accurate, the abundance of supervised learning techniques means there is a model for almost any type of data.
An example of this can be seen in a recent NATO study that attempted to predict the occurrence of acts of piracy and armed robbery off the coast of Nigeria. Compiled incident data was used as the dependent variable, and several independent variables relating to the socio-economic and political situation in the region were included. The model was intended as a tool for tactical planning and as an early warning system, and results were communicated to stakeholders within the region so that they could take steps to prepare. Although the project was not a resounding success, it did identify shortcomings in the prepared data, and it is hoped that a more refined version of the model will produce better results. A similar model from a 2004 study, used to predict the likelihood of conflict between two states, was quite successful: with an area under the ROC curve of 0.85, it provided evidence that theory and rigorous methodology can be applied successfully to the analysis and modelling of actual geopolitical phenomena. Another simple model might predict the locations at which a vessel is most at risk of a collision. Such a model would be invaluable to the maritime industry and, if effective, its results could be used to influence vessel routing.
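For readers unfamiliar with the area under the ROC curve quoted above, the sketch below computes it in its rank-based form: the share of (event, non-event) pairs in which the model scored the event higher, with ties counted as half. The labels and scores are hypothetical and are not taken from either study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # event ranked above non-event
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = incident occurred) and model scores; not real data.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a figure such as 0.85 is read as good discrimination.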
3.3. Evaluation and Validation of Predictive Models
As decision-making tools, predictive models need to be both conceptually and operationally validated to ensure they accurately represent the reality the stakeholder is seeking to model. Conceptual validation assesses how well a model represents the reality of interest. Often, this involves consulting area or subject matter experts to ensure the model includes all the key variables and causal relationships. A dynamic simulation model predicting the likelihood of piracy attacks in the Malacca Strait, for example, would need to be extensively validated with regional security experts from governments and the shipping industry. Operational validation focuses on building the model right. It is a continuous process that compares the output of the model with reality. For predictive models using historical data (which is the case for the work presented in this paper), an effective method is to split the data into two subsets. One subset is used to build the model (estimation data), and the other is used to test it (validation data). With the validation data, it is possible to compare the predictions of the model with what actually happened to the variables of interest. This helps to detect changes that need to be made to the model so that it better represents reality. Finally, when predictions are consistent across a number of different scenarios and the model has correctly represented probability and/or causal relationships, a useful model can be said to have been built.
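The split into estimation and validation data described above can be sketched in a few lines. The 30% validation share, the fixed random seed and the function name are assumptions made for illustration; for incident data with a strong time trend, a chronological split may be the more appropriate choice.

```python
import random


def split_estimation_validation(records, validation_fraction=0.3, seed=0):
    """Shuffle records reproducibly and split them into (estimation, validation)."""
    rng = random.Random(seed)          # fixed seed so the split can be repeated
    shuffled = list(records)           # copy, leaving the caller's data untouched
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical incident record identifiers; not real data.
records = list(range(10))
estimation, validation = split_estimation_validation(records)
print(len(estimation), len(validation))
```

The model would be fitted on `estimation` only, and its predictions then compared with the outcomes recorded in `validation`.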
Decision-making tools in big data analytics often involve the construction of a predictive model that allows stakeholders to play out “what if?” scenarios. Predictive modelling is a mathematical process in which a future state of reality is forecast. A probability is assigned to each potential consequence, and risk is a function of probability and consequence. Predictive models can be constructed using various statistical and machine learning methods, ranging from traditional multivariate regression models to data mining and artificial intelligence methods. The model may be quite simple, for example a risk score used to rank maritime locations from the most risky to the least risky. It may also be more complex, such as a dynamic simulation model that predicts the most likely sequence of events leading to a security threat and the potential effectiveness of various risk mitigation strategies. Dynamic models are particularly powerful in that they allow stakeholders to test how changing conditions might affect the level and distribution of maritime security threats. Effectively, the models are an extension of the data. They can be used to assess the effectiveness of a policy and to identify and act on emerging problems before they become critical.
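The simple risk score mentioned above can be illustrated by ranking locations on probability multiplied by consequence. Taking risk as the plain product of the two is a common simplification assumed here, not a formula given in the text, and the area names and figures are hypothetical.

```python
def rank_by_risk(locations):
    """Rank (name, probability, consequence) tuples from most to least risky."""
    scored = [(name, probability * consequence)
              for name, probability, consequence in locations]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical areas: (name, annual incident probability, consequence in cost units).
locations = [
    ("Area A", 0.10, 50.0),
    ("Area B", 0.40, 20.0),
    ("Area C", 0.05, 30.0),
]
ranking = rank_by_risk(locations)
for name, score in ranking:
    print(name, round(score, 2))
```

A ranking of this kind gives decision makers a first ordering of where to direct patrol and inspection assets; a dynamic simulation model would go further by showing how that ordering shifts as conditions change.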
3.4. Case Studies and Examples
In addition to the identification of key variables and the development of predictive models, another key step in identifying marine areas at potential risk from maritime security threats is the development of case studies examining likely threats at the regional or local level. This involves what Duane and his colleagues term “red teaming,” an active process of alternative analysis intended to improve decision makers’ perspectives. This is perhaps the most novel and underdeveloped area of the current research, as there are very few examples of the identification of risk “hotspots” in a predictive sense, though the work by Ebert and his colleagues on the Comprehensive Approach to Civil Military Cooperation provides an interesting example of how this might be achieved.
The case studies provided by Ebert et al. were conducted with the support of NATO and ISAF in Afghanistan, with the aim of providing a model and a practical tool for civilian and military actors in the target regions. Another possible case study is an analysis of the maritime security environment in Indonesia, Malaysia, and Singapore, aimed at recommending policies and cooperative activities to strengthen safety and security in the Straits of Malacca and Indonesia. With its existing supporting data and quantitative indicators, this would be an ideal location for a future project on predictive risk analysis and could well be the example this research is seeking.
4. Identifying Risk Hotspots
There are two general approaches to modeling spatial data. The first approach is to take some set of parameters, draw some conclusions, and then create a spatial representation of those conclusions. The second approach is to incorporate the spatial component directly into the model.
The first method involves assigning a value to each parameter being modeled, measuring the relationships between those parameters, and drawing conclusions based on those measures. For example, analysts might examine maritime traffic patterns, assign values to potential threats, and draw conclusions on the risk of a terrorist attack. With those conclusions, they would create a map of the country or region and place symbols on it to represent their results.
The first approach is likely the simpler of the two, and is likely the method used by most policy analysts who are not specifically trained to work with spatial data. For those trained in mapping or geography, or specifically attempting to identify spatial patterns, the preferred method is the direct incorporation of the spatial data into the model. In other words, the spatial data should be collected and analyzed in its own right, and the relationships between parameters should be evaluated through spatial interaction models.
An example of this might be an analysis of the relationships between different types of cargo traffic and the locations of piracy incidents. The most efficient way to infer the effect of cargo traffic on the likelihood of a piracy incident is to establish the directional effect of different types of cargo traffic on the incidence of piracy, leading to the calculation of an expected value. This expected value can then be tested against the actual rates of piracy through regression analysis.
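The cargo-traffic comparison described in this section, in which an expected piracy rate is tested against observed rates through regression, could be sketched as a simple least-squares fit. This is an illustration only: the single-predictor form, the function name and the figures are assumptions, and a real analysis would likely use several traffic types and a spatial model.

```python
def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns slope, intercept, R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot   # share of variance explained
    return slope, intercept, r_squared


# Hypothetical expected and observed piracy rates for four sea areas; not real data.
expected = [1.0, 2.0, 3.0, 4.0]
observed = [1.2, 1.9, 3.2, 3.9]
slope, intercept, r_squared = ols_fit(expected, observed)
print(round(slope, 3), round(intercept, 3), round(r_squared, 3))
```

A slope near one with a high R-squared would suggest that the expected values track the observed rates closely; large residuals in particular areas would point to factors the traffic variables do not capture.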
4.1. Spatial Analysis and Visualization
In a comprehensive study on violent and organized crime in Newark, New Jersey, analysis was conducted to compare the differences in results when applying traditional regression analysis to analysis using spatial regression and GIS. Through traditional regression analysis, the compiled model explained a modest 20% of the variance. In comparison, the spatial regression model using GIS explained a far higher 60% of the variance, with results directly indicating specific streets and addresses where crime was most prevalent.
Typically, big data analysis identifies higher-level trends and patterns in the form of statistical probabilities. However, translating these results into information that decision makers can use to target resources has at times proven difficult. Through the visualization of analyzed data using GIS, policymakers and security practitioners can see evidence-based results of high-risk areas, which they can easily translate into action. The use of GIS in predictive modeling also allows for continuous monitoring and updating of analysis, as new data can be continuously integrated to improve results.
With ever-increasing amounts of data available, of varied types and from multiple sources, techniques such as Geographic Information Systems (GIS) have become increasingly important in the field of predictive big data analysis. Originally used in the visualization of physical and human geography, GIS tools and techniques such as spatial autocorrelation have been applied to the visualization and analysis of big data in order to uncover meaningful patterns and relationships.
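Spatial autocorrelation, mentioned above, is often summarized with the Moran's I statistic. The sketch below computes it for incident counts on a small grid, treating cells that share an edge as neighbours. The choice of statistic, the neighbour rule and the sample grids are assumptions for illustration and are not drawn from the studies discussed.

```python
def morans_i(grid):
    """Moran's I for a rectangular grid using edge-sharing (rook) neighbours."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("all cells are equal, so Moran's I is undefined")
    num = 0.0
    weight_sum = 0.0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    # Each neighbouring pair gets weight 1 (counted in both directions).
                    num += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    weight_sum += 1.0
    return (n / weight_sum) * (num / denom)


# Hypothetical incident counts per grid cell; not real data.
clustered = [[1, 1, 0, 0],
             [1, 1, 0, 0]]
checkerboard = [[1, 0],
                [0, 1]]
print(morans_i(clustered), morans_i(checkerboard))
```

Positive values indicate that similar counts sit next to each other (clustering, as in a hotspot), while negative values indicate a dispersed, alternating pattern.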
4.2. Identification of High-Risk Areas
In the maritime security domain, predicting where a threat is likely to occur is just as important as predicting whether a threat will occur. Even more, policy makers have an imperative to be seen as doing the right thing to protect citizens from harm. For example, in the aftermath of the 9/11 terrorist attacks in the United States, the US Government was heavily criticized for its decision to wage war in Iraq. At that time and since, there has been much debate about whether the decision to attack Iraq was the best course of action to prevent future harm to US interests. With respect to maritime security, military action is generally considered a costly last resort in response to an identified and imminent threat. It is preferable to use some form of law enforcement action to intercept potential wrongdoers en route to their intended target. Therefore, a proactive strategy that focuses on crime prevention and disruption is most desirable. High risk areas are places where an act of wrongdoing is likely to be committed, or the nature of the act is very damaging. Identifying these areas and the factors that make them high risk is essential for predictive modelling. Finally, decision makers must consider the opportunity cost. If resources devoted to prevention in one area could be more effectively applied to prevent a greater harm elsewhere, then the original area is a lower priority.
4.3. Integration with Existing Security Measures
An example provided by NATO Centre for Maritime Research and Experimentation illustrates how decision support tools can aid in designing counter-piracy operations. Analysts use past event data to build a model of the pirates’ attack process as a basis to how they can prevent attacks. The decision support tool uses a space-time planner to compare the projected movement of the pirates’ armed small boats to mini speedboat. The tool then plans various operations targeting the pirates that occur at different points in time. The predicted impact of each operation is evaluated by how it changes the pirates’ planned movement to a less aggressive one. At the end of the simulation, the best operations to prevent attacks are selected. This data then acts as a guide for naval forces to intervene only when values of likelihood and consequences for pirate attacks are raised above a certain threshold. Simulation of the potential interventions plays back and builds an understanding of what are the best methods for force projection to prevent incidents.
The concept of real-time risk assessment and identification using data analytics has the capacity to revolutionize risk management in the maritime security domain. Developing the capacity to foresee the potential for security risks and threats in more specific terms than has been done to date provides a new tool with which governments, agencies, and industry can assess their security posture and allocate resources commensurate with the threat. The ability to mark an area of increased risk in the near to mid-term future enables security measures to be both flexible and targeted, which is an improvement from the current situation where security measures in the maritime domain are often static and spread too thinly across too many assets and vessel movements. Simulation modelling was noted as a method to validate the analytic products. A simulation model would enable an agency or organisation to test various security postures and compare the cost effectiveness of each in reducing risk to an acceptable level. In this way, decision makers can determine the best method to employ security measures in a cost-effective manner.
4.4. Policy Implications and Recommendations
The research findings detailed in the risk analysis case studies also bear directly upon the policy and practice of risk reduction in specific areas. The ability to identify specific categories of risk and the drivers underlying them provides a framework for targeting resources at the most acute risks or at chronic problems that require lasting solutions. This is highly pertinent to the case of the Malacca Straits where differing perceptions of threat have led to uncoordinated unilateral actions which may have unintended consequences. The ability to understand and simulate the effects of risk mitigation measures may help to coordinate a collective response. This research will be of value to national and regional security agencies, as well as non-state actors with interests in specific maritime areas.
The policy implications of this predictive risk analysis are broad and far-reaching. The concept of risk assessment is already utilized by government intelligence agencies, but with varying degrees of sophistication. Implementation of predictive analysis based on comprehensive heterogeneous data may provide new insights and priorities, and lead to a reallocation of resources. Although this hypothesis has not been tested, the potential would be to avoid conflicts before they emerge through the identification of critical pre-conditions in high-risk areas, or through the identification of risk migration towards previously low-risk areas. Predictive analysis might also be used to monitor the effectiveness of policies in preventing maritime security threats.
