
onion OP wrote

The Threat of the Status Quo

Two practical issues loom over the future of targeting and effective, focused U.S. national security actions: data overload and targeter enablement.

The New Stovepipes

With the 9/11 Commission Report, intelligence “stovepipes” entered the American lexicon as shorthand for bureaucratic turf wars and politics: information that could have increased the probability of detecting and preventing the attack was not shared between agencies. Today, volumes of information are shared between agencies; exponentially more is collected and shared each month than in the months before 9/11. Even ten years ago, a targeter pursuing a high-value target (HVT) – say, the leader of a terrorist group – could not find, let alone analyze, all of the information of potential value to the manhunt. Too much poorly organized data means the targeter cannot possibly conduct a thorough analysis at the speed the mission demands. Details are missed, opportunities lost, patterns misidentified, mistakes made. The disorganization and walling off of data for security purposes mean new stovepipes have appeared, not between agencies but between datasets – often within the same agency. As data volumes grow, so do these challenges.

Authors have been writing about data overload in the national security space for years now. Unfortunately, progress toward managing the issue or offering workable solutions has been modest at best. Data of every type and format, structured and unstructured, flows into USG repositories around the clock, 24/7/365, and the volume grows exponentially every year. There should be little doubt that in the very near future the USG will collect against foreign 5G, IoT, advanced satellite internet, and adversary databases at terabyte, petabyte, exabyte, or larger scale. The ingestion, processing, parsing, and sensemaking challenges of these data loads will be like nothing anyone has faced before.

Let’s illustrate the issue with a notional comparison.

In 2008, the U.S. military raided an al-Qa’ida safehouse in Iraq and recovered a laptop with a 1GB hard drive. The data on the hard drive was passed to a targeter for analysis. It contained a variety of documents, photos, and video. It took several hours and the help of a linguist, but the targeter was able to identify several leads and items of interest that would advance the fight against al-Qa’ida.

In 2017, the Afghan Government raided an al-Qa’ida media house and recovered over 40TB of data. The data on the hard drives was passed to a targeter for analysis. It contained a variety of documents, photos, and video. Let’s be nice to our targeter and say only a quarter of the 40TB is video – that’s still as much as 5,000 hours, or 208 days of around-the-clock video review, and she still hasn’t touched the documents, audio, or photos. This workload is obviously impossible given the pace of her mission, so she isn’t going to attempt it. She and her team will look for a handful of specific documents and largely discard the rest.
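
For the arithmetic-minded, here is a quick back-of-the-envelope check of those numbers. This is only a sketch; the roughly 2GB-per-hour video bitrate is an assumption, not a figure from any actual raid.

```
# Back-of-the-envelope math for the notional 2017 media-house haul.
# ASSUMPTION: compressed video at roughly 2 GB per hour; the real mix
# of resolutions and codecs on the drives is unknown.

TOTAL_TB = 40
VIDEO_SHARE = 0.25        # "only a quarter of the 40TB is video"
GB_PER_HOUR = 2.0         # assumed average bitrate

video_gb = TOTAL_TB * 1000 * VIDEO_SHARE   # 10,000 GB of video
hours = video_gb / GB_PER_HOUR             # ~5,000 hours
days_nonstop = hours / 24                  # ~208 days

print(f"{video_gb:,.0f} GB of video = {hours:,.0f} hours = {days_nonstop:.0f} days of nonstop review")
```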

Now let’s say the National Security Agency in 2025 collected 1.4 petabytes of leaked Chinese Government emails and attachments. Our targeter and all of her teammates could easily spend the rest of their careers reviewing that data using current methods and tools.
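
To see why “the rest of their careers” is not hyperbole, here is a rough, purely illustrative calculation; the average item size and daily review rate are assumptions for the sake of argument, not real figures.

```
# Why 1.4 PB of email could consume entire careers. The average item size
# and the daily review rate below are illustrative guesses, not real figures.

PETABYTES = 1.4
AVG_ITEM_KB = 100                 # assumed average email plus attachment
DOCS_PER_ANALYST_DAY = 300        # assumed (generous) manual review rate
WORKDAYS_PER_YEAR = 240

items = PETABYTES * 1e12 / AVG_ITEM_KB            # PB -> KB -> item count
analyst_years = items / DOCS_PER_ANALYST_DAY / WORKDAYS_PER_YEAR

print(f"~{items/1e9:.0f} billion items -> ~{analyst_years:,.0f} analyst-years of manual review")
```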

In real life, the 2011 raid on Usama Bin Ladin’s compound produced over 250GB of material, and it took an interagency task force many months to manually comb through the data and identify material of interest. These examples shed light on only a subset of data overload. Keep in mind, this DOCEX is just one source our targeter has to review to build a full picture of her target and network. She is also looking through all of the potentially relevant HUMINT, SIGINT, IMINT, OSINT, and other collection that could relate to her target – many more datasets, often stovepipes within stovepipes, worked with the same outmoded tools and methods.

This leads us to our second problem: targeter enablement.


The Collapsing Emergent System

Much of our targeter’s workday is spent on information extraction and organization, the vast majority of which is, well, robot work. She’ll be repeating manual tasks for most of the day. She knows what she needs to investigate today to continue building her target or network profile; today it’s a name and a phone number. She has a time-consuming, tedious, and potentially error-prone effort ahead of her – a “swivel chair process” – tracking down that name and phone number across multiple stovepiped databases using a variety of outmoded software tools. She’ll map what she finds in a network analysis tool, in an electronic document, or <wince> in a pen-and-paper notebook. Now…finally…she will begin to use her brain. She’ll look for patterns, analyze the data temporally, find new associations and correlations, challenge her assumptions, and come to new conclusions. Too bad she spent 80% of her time doing robot work.

This is the problem as it stands today. The targeter is overwhelmed with too much unstructured and stovepiped information and does not have access to the tools required to clean, sift, sort and process massive amounts of data. And remember, the system she operates is about to receive exponentially more data. Absent change, a handful of things are almost certain to happen:

- More raw data will be collected than is actually relevant, increasing the stress on the infrastructure required to store it all for future analysis.
- Infrastructure (technical and process-related) will continue to fail to make raw data available to technologists and targeters for processing at a mission-relevant pace.
- Targeters and analysts will continue to spend the majority of their time on manual tasks, leaving little time for actual analysis and the delivery of insights.
- The timeline from data to information, to insights, to decision making will stretch further and further as data volumes grow exponentially.
- Insights that depend on correlations among millions of raw data points will be missed entirely, leading to incorrect targets being identified, targets or patterns being missed, or targets being prioritized with inaccurate importance.

This may seem banal or weedy, but it should be very concerning. This system – how the United States processes the information it collects to identify and prevent threats – will not work in the very near future. The data stovepipes of the 2020s can produce a surprise or catastrophe just as the institutional stovepipes of the 1990s did; it won’t be a black swan. As the U.S. competes with Beijing, its national defense will require more speed, not less, against more data than ever before. It will require evaluating data and making connections and correlations faster than a human can. It will require processing this mass of data effectively to identify precision solutions that narrow the scope of intervention needed to achieve our goals while minimizing harm. Our current and future national defense needs our targeter to be motivated, enabled, and effective.


Innovating the System

To overcome the exponential growth in data and the resulting stovepiping, the IC doesn’t need to hire armies of 20-somethings to do around-the-clock analysis in warehouses all over Northern Virginia. It needs to modernize its security approach to connect these datasets and apply a vast suite of machine learning models and other analytics to help targeters start innovating. Now. Technological innovation is also likely to produce more engaged, productive, and energized targeters who spend more of their time applying their creativity and problem-solving skills and less time doing robot work. We can’t afford to lose any more trained and experienced targeters to this rapidly fatiguing system.

The current system, as discussed, is one of unvalidated data collection and mass storage, manual loading, mostly manual review, and robotic swivel chair processes for analysis.

The system of the future breaks down data stovepipes and eliminates the manual, swivel-chair robot processes of the past. It automates data triage, so users can readily identify datasets of interest for deep manual research. It automates data processing, cleaning, correlation, and target profiling – clustering information around a potential identity. And it helps targeters identify patterns and suggests areas for future research.
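
As a thought experiment, the correlation-and-profiling step might look something like the toy sketch below: extract selectors (phone numbers, email addresses) from already-parsed text, then cluster records that share a selector into candidate profiles. The Record type, regexes, and sample data are illustrative only; a real system would add name matching, biometrics, locations, and proper entity-resolution models.

```
# Toy sketch of automated correlation and profile clustering over parsed text.
# Everything here (selectors, sample records) is illustrative placeholder data.
import re
from dataclasses import dataclass

PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

@dataclass
class Record:
    source: str   # e.g. "DOCEX", "SIGINT", "OSINT"
    text: str

def selectors(record):
    """Pull phone numbers and email addresses out of one record."""
    phones = {re.sub(r"\D", "", m) for m in PHONE.findall(record.text)}
    emails = {m.lower() for m in EMAIL.findall(record.text)}
    return phones | emails

def cluster(records):
    """Group records that share any selector into candidate profiles."""
    profiles = []   # each profile: {"selectors": set, "records": list}
    for rec in records:
        sels = selectors(rec)
        hits = [p for p in profiles if p["selectors"] & sels]
        if hits:   # merge every existing profile this record bridges
            merged = {"selectors": set(sels), "records": [rec]}
            for p in hits:
                merged["selectors"] |= p["selectors"]
                merged["records"] += p["records"]
                profiles.remove(p)
            profiles.append(merged)
        else:
            profiles.append({"selectors": sels, "records": [rec]})
    return profiles

if __name__ == "__main__":
    docs = [   # fabricated toy data
        Record("DOCEX", "Contact Abu X at +964 770 123 4567"),
        Record("SIGINT", "9647701234567 called a number in Mosul"),
        Record("OSINT", "abu.x@example.org posted the same phone: +964 770 123 4567"),
    ]
    for p in cluster(docs):
        print(sorted(p["selectors"]), [r.source for r in p["records"]])
```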

How do current and emerging analytic and ML techniques bring us to the system of the future and better enable our targeter? Here are five ideas to start with:

- Automated Data Triage: As data is fed into the system, a variety of analytics and ML pipelines are applied. A typical exploratory data analysis (EDA) report is produced (data size, file types, temporal analysis, etc.). Additional analytics ingest, clean, and standardize the data. ML and other approaches identify languages, set aside likely irrelevant information, summarize topics and themes, and identify named entities, phone numbers, email addresses, and so on. This first step helps validate the need for the data, enables an improved search capability, and lays a new foundation for additional analytics and ML approaches. There are seemingly countless applications across the U.S. national security space. (A minimal sketch of this step appears after this list.)

- Automated Correlation: Output from numerous data streams is brought into an abstraction layer and prepped for next-generation analytics. Automated correlation is applied across a variety of variables: potential name matches, facial recognition and biometric clustering, phone number and email matches, temporal associations, and locations.

- Target Profiling (Network, Spatial, and Temporal Analytics): As the information is clustered, our targeter now sees associations pulled together by the computer. The robot, leveraging its computational speed along with machine learning for rapid comparison and correlation, has replaced the swivel chair process. Our targeter is now investigating associations, validating the profile, and refining the target’s pattern of life. She is coming to conclusions about the target faster and more effectively and is bringing more value to the mission. She’s also providing feedback to the system, helping to refine its results.

- AI-Driven Trend and Pattern Analysis: Unsupervised ML approaches can help identify new patterns and trends that may not fit into the current framing of the problem. These insights can challenge groupthink, identify new threats early, and surface findings our targeters may not even know to look for.

- Learning User Behavior: Our new system shouldn’t just enable our targeter, it should learn from her. ML running behind the scenes, observing how she works, can drive incremental improvements to the system. What does she click on? Did she validate or refute a machine correlation? Why didn’t she explore a dataset that may have had value to her investigation and analysis? The system should learn and adapt to her behavior to better support her, highlight data that could have value to her work, and help train new hires.

Let’s be clear: we’re far from the Laplace’s demon of HBO’s “Westworld” or FX’s “Devs”. There is no super machine that will replace the talented and dedicated folks who make up the targeting cadre. Targeters will remain critical to evaluating and validating these results, doing deep research, and applying their human creativity and problem solving. The national security space hires brilliant and highly educated personnel to tackle these problems; let’s challenge and inspire them, not relegate them to the swivel chair processes of the past.
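
For a sense of how modest the first “Automated Data Triage” step can be, here is a minimal, standard-library-only sketch of an EDA-style first pass over a directory of recovered media. The path is a placeholder; a real pipeline would add language identification, transcription, and entity extraction on top of this.

```
# Minimal sketch of a first-pass EDA/triage report over recovered media:
# total size, file counts by type, and the date range of file timestamps.
import os
from collections import Counter
from datetime import datetime, timezone

def triage_report(root):
    sizes = Counter()        # bytes per file extension
    counts = Counter()       # files per extension
    earliest, latest = None, None
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                stat = os.stat(path)
            except OSError:
                continue                      # unreadable file: skip for now
            ext = os.path.splitext(name)[1].lower() or "<none>"
            counts[ext] += 1
            sizes[ext] += stat.st_size
            mtime = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            earliest = mtime if earliest is None or mtime < earliest else earliest
            latest = mtime if latest is None or mtime > latest else latest
    return {"files": sum(counts.values()),
            "total_gb": sum(sizes.values()) / 1e9,
            "by_type": counts.most_common(10),
            "date_range": (earliest, latest)}

if __name__ == "__main__":
    # Hypothetical mount point for a captured-media image.
    print(triage_report("/path/to/recovered_media"))
```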

We need a new system to handle the data avalanche and support the next generation. Advanced computing, analytics, and applied machine learning will be critical to efficient data collection, successful data exploitation, and automated triage, correlation, and pattern identification. It’s time for a new chapter in how we ingest, process, and evaluate intelligence information. Let’s move forward.
