
onion OP wrote

The Collapsing Emergent System

Much of our targeter’s workday is spent on information extraction and organization, the vast majority of which is, well, robot work. She’ll be repeating manual tasks for most of the day. She knows what she needs to investigate today to continue building her target or network profile. Today it’s a name and a phone number. She has a time-consuming, tedious, and potentially error-prone effort ahead of her, a “swivel chair process”: tracking down the name and phone number in multiple stovepiped databases using a variety of outmoded software tools. She’ll map what she’s found in a network analysis tool, in an electronic document, or <wince> a pen-and-paper notebook. Now…finally…she will begin to use her brain. She’ll look for patterns, she’ll analyze the data temporally, she’ll find new associations and correlations, and she’ll challenge her assumptions and come to new conclusions. Too bad she spent 80% of her time doing robot work.

This is the problem as it stands today. The targeter is overwhelmed with too much unstructured and stovepiped information and does not have access to the tools required to clean, sift, sort and process massive amounts of data. And remember, the system she operates is about to receive exponentially more data. Absent change, a handful of things are almost certain to happen:

More raw data will be collected than is actually relevant, which will increase the stress on the infrastructure needed to store all of that data for future analysis.

Infrastructure (technical and process related) will continue to fail to make raw data available to technologists and targeters to begin processing at a mission-relevant pace.

Targeters and analysts will continue to perform manual tasks that take the majority of their time, leaving little time for actual analysis and delivery of insights.

The timeline from data to information, to insights, to decision making will be extended exponentially as data increases exponentially.

Insights that depend on correlations between millions of raw data points will be missed entirely, leading to incorrect targets being identified, missed targets or patterns, or targets being prioritized on an inaccurate assessment of their importance.

This may seem banal or weedy, but it should be very concerning. This system (how the United States processes the information it collects to identify and prevent threats) will not work in the very near future. The data stovepipes of the 2020s can result in a surprise or catastrophe like the institutional stovepipes of the 1990s; it won’t be a black swan.

As the U.S. competes with Beijing, its national defense will require more speed, not less, against more data than ever before. It will require evaluating data and making connections and correlations faster than a human can. It will require the effective processing of this mass of data to identify precision solutions that reduce the scope of intervention to achieve our goals, while minimizing harm. Our current and future national defense needs our targeter to be motivated, enabled, and effective.


onion OP wrote

Innovating the System

To overcome the exponential growth in data and subsequent stovepiping, the IC doesn’t need to hire armies of 20-somethings to do around-the-clock analysis in warehouses all over northern Virginia. It needs to modernize its security approach to connect these datasets, and apply a vast suite of machine learning models and other analytics to help targeters start innovating. Now. Technological innovations are also likely to lead to more engaged, productive, and energized targeters who spend their time applying their creativity and problem-solving skills, and spend less time doing robot work. We can’t afford to lose any more trained and experienced targeters to this rapidly fatiguing system.

The current system, as discussed, is one of unvalidated data collection and mass storage, manual loading, mostly manual review, and robotic swivel chair processes for analysis.

The system of the future breaks down data stovepipes and eliminates the manual and swivel chair robot processes of the past. The system of the future automates data triage, so users can readily identify datasets of interest for deep manual research. It automates data processing, cleaning, correlations and target profiling – clustering information around a potential identity. It helps targeters identify patterns and suggests areas for future research.

How do current and emerging analytic and ML techniques bring us to the system of the future and better enable our targeter? Here are five ideas to start with:

Automated Data Triage: As data is fed into the system, a variety of analytics and ML pipelines are applied. A typical exploratory data analysis (EDA) report is produced (data size, file types, temporal analysis, etc.). Additionally, analytics ingest, clean and standardize the data. ML and other approaches identify languages, set aside likely irrelevant information, summarize topics and themes, and identify named entities, phone numbers, email addresses, etc. This first step aids in validating data need, enables an improved search capability, and sets a new foundation for additional analytics and ML approaches. There are seemingly countless examples across the U.S. national security space.

Automated Correlation: Output from numerous data streams is brought into an abstraction layer and prepped for next-generation analytics. Automated correlation is applied across a variety of variables: potential name matches, facial recognition and biometric clustering, phone number and email matches, temporal associations, and locations.

Target Profiling: Network, Spatial, and Temporal Analytics: As the information is clustered, our targeter now sees associations pulled together by the computer. The robot, leveraging its computational speed along with machine learning for rapid comparison and correlation, has replaced the swivel chair process. Our targeter is now investigating associations, validating the profile, and refining the target’s pattern-of-life. She is coming to conclusions about the target faster and more effectively and is bringing more value to the mission. She’s also providing feedback to the system, helping to refine its results.

AI-Driven Trend and Pattern Analysis: Unsupervised ML approaches can help identify new patterns and trends that may not fit into the current framing of the problem. These insights can challenge groupthink, identify new threats early, and find insights that our targeters may not even know to look for.

Learning User Behavior: Our new system shouldn’t just enable our targeter, it should learn from her. Applying ML behind the scenes that monitors our targeter can help drive incremental improvements of the system. What does she click on? Did she validate or refute a machine correlation? Why didn’t she explore a dataset that may have had value to her investigation and analysis? The system should learn and adapt to her behavior to better support her. Her tools should highlight where data that could have value to her work may be. The system should also help train new hires.

Let’s be clear: we’re far from the Laplace’s demon of HBO’s “Westworld” or FX’s “Devs.” There is no super machine that will replace the talented and dedicated folks that make up the targeting cadre. Targeters will remain critical to evaluating and validating these results, doing deep research, and applying their human creativity and problem solving. The national security space hires brilliant and highly educated personnel to tackle these problems. Let’s challenge and inspire them, not relegate them to the swivel chair processes of the past.
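To make the triage and correlation ideas above a little more concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than anything the article specifies: the regular expressions are deliberately crude, the names extract_identifiers and correlate are hypothetical, and the sample records are invented. It pulls email addresses and phone numbers out of free text, then clusters records from different "databases" that share any identifier.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately simple patterns. Real triage would need
# locale-aware phone parsing and much more careful normalization.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_identifiers(text):
    """Triage step: return normalized emails and phone numbers found in text."""
    emails = {m.lower() for m in EMAIL_RE.findall(text)}
    # Normalize phones by keeping digits only (assumes consistent country codes).
    phones = {re.sub(r"\D", "", m) for m in PHONE_RE.findall(text)}
    return emails | phones


def correlate(records):
    """Correlation step: cluster record ids that share any identifier."""
    parent = {rid: rid for rid in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # identifier -> first record id that contained it
    for rid, text in records.items():
        for ident in extract_identifiers(text):
            if ident in first_seen:
                parent[find(rid)] = find(first_seen[ident])
            else:
                first_seen[ident] = rid

    clusters = defaultdict(set)
    for rid in records:
        clusters[find(rid)].add(rid)
    return sorted(sorted(c) for c in clusters.values())


# Invented sample records standing in for rows from separate stovepiped sources.
records = {
    "db_a/1": "J. Doe, tel +1 (555) 010-0199",
    "db_b/7": "contact jdoe@example.org or 1-555-010-0199",
    "db_c/3": "Email: JDoe@Example.org",
    "db_c/4": "unrelated note, call 555-010-0142 ext",
}
clusters = correlate(records)
print(clusters)
```

In this toy run the first three records end up in one cluster because they are linked transitively (a shared phone number, then a shared email), while the fourth stays on its own. That is the swivel chair lookup done by the machine; the targeter's job becomes validating or refuting the suggested link.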

We need a new system to handle the data avalanche and support the next generation. Advanced computing, analytics, and applied machine learning will be critical to efficient data collection, successful data exploitation, and automated triage, correlation, and pattern identification. It’s time for a new chapter in how we ingest, process, and evaluate intelligence information. Let’s move forward.
