Monday, January 11, 2010

Too much information... drowning in a sea of data... swamped by "false positives"

When too much information is collected, and when there is no rationale or organizing purpose for collecting it, bad things can happen. This is the problem of "false positives" that you encounter in elementary statistics, and it is fundamentally misunderstood by the broader public.

Examples include mammography in 40-50 year old women with no relevant history. So much counter-terrorism data, including drone data (see article in NYT), is collected that no one can connect the dots. fMRI brain scans collect zillions of bits of data in fishing expeditions with no clear hypotheses.

What do all of these examples have in common? Computerized imaging and other systems "overcollect" data and imperil our ability to sort and process MEANINGFUL information. We are drowning in a flood of data... The result, as Matt Yglesias puts it, is that we will be "swamped with false positives."

Jim Arkedis did an excellent post last week based on his work as an intelligence analyst walking you through how difficult it is to actually “connect the dots” and find a bad guy. The mathematical fact underlying the problem is, as I’ve emphasized, that since only a tiny number of people are al-Qaeda operatives, anything you do is going to be swamped with false positives.
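
The arithmetic behind that "mathematical fact" is just Bayes' rule, and a rough sketch makes it concrete. The numbers below are made up for illustration and do not come from the post or from any real screening program: assume one person in a million is actually a target, and assume a screening method that catches 99% of real targets while wrongly flagging only 1% of everyone else. The function name is likewise just a label chosen for this example.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """Fraction of flagged cases that are real, by Bayes' rule."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs, chosen only to illustrate the base-rate effect.
ppv = positive_predictive_value(
    prevalence=1e-6,           # assumed: 1 in a million people is a real target
    sensitivity=0.99,          # assumed: 99% of real targets get flagged
    false_positive_rate=0.01,  # assumed: 1% of innocent people get flagged
)

print(f"Share of flagged people who are real targets: {ppv:.4%}")
print(f"False alarms per real hit: about {(1 - ppv) / ppv:,.0f}")
```

Under these assumed numbers, a seemingly accurate test still flags roughly ten thousand innocent people for every real target, because the innocent population is so much larger. The same logic applies to screening a low-risk group for a rare disease.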


