Cory Doctorow: Past Performance is Not Indicative of Future Results
In “Full Employment,” my July 2020 column, I wrote, “I am an AI skeptic. I am baffled by anyone who isn’t. I don’t see any path from continuous improvements to the (admittedly impressive) ‘machine learning’ field that leads to a general AI any more than I can see a path from continuous improvements in horse-breeding that leads to an internal combustion engine.”
Today, I’d like to expand on that. Let’s talk about what machine learning is: it’s a statistical inference tool. That means that it analyzes training data to uncover correlations between different phenomena. Your phone observes that every time you type “hey,” you usually follow it with “darling” and it learns to autosuggest this the next time you type “hey.” It’s not sorcery, it’s “magic” – in the sense of being a parlor trick, something that seems baffling until you learn the underlying method, whereupon it becomes banal.
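To make the parlor trick concrete, here is a minimal sketch of autosuggest as plain counting. It is not how any particular phone keyboard works; the function names `train` and `suggest` and the typing history are invented for illustration, and real systems are far more elaborate. The point is only that the "learning" amounts to tallying which word tends to follow which.

```python
from collections import Counter, defaultdict


def train(messages):
    """Count, for every word, which words have followed it."""
    following = defaultdict(Counter)
    for message in messages:
        words = message.lower().split()
        # Pair each word with the word that comes right after it.
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following


def suggest(following, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical typing history, made up for this example.
history = [
    "hey darling how was work",
    "hey darling running late",
    "hey boss the report is done",
]

model = train(history)
print(suggest(model, "hey"))
```

The model suggests whatever followed "hey" most often in the history it was given. It has no notion of who "darling" is or why the greeting to the boss is different; it only has the tallies.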
Automating the detection of statistical correlates is useful! Two eyes, a nose and a mouth are correlated with a face with a very high (albeit imperfect) degree of reliability. Likewise for the features that predict a car or a cat, a helicopter or an AR-15. Certain visible features are a good predictor of skin cancer, and certain waveforms reliably correspond to written words. Machine learning has bequeathed us a wealth of automation tools that operate with high degrees of reliability to classify and act on data acquired from the real world.
But that’s not the whole story. Machine learning is theory-free: it doesn’t know about mouths and eyes and noses – it knows that it had labelled training data that identified certain geometrical forms as representative of a face. That’s why we get those amusing stories about doorbell cameras that hallucinate faces in melting snow and page their owners to warn them about lurking strangers. Anyone who’s ever stared at clouds knows there are plenty of face-like elements of our real world, and no statistical picture of “face-ness” is a perfect substitute for understanding what a face actually is.
The problems of theory-free statistical inference go far beyond hallucinating faces in the snow. Anyone who’s ever taken a basic stats course knows that “correlation isn’t causation.” For example, maybe the reason cops find more crime in Black neighborhoods is because they harass Black people more with pretextual stops and searches that give them the basis to unfairly charge them, a process that leads to many unjust guilty pleas because the system is rigged to railroad people into pleading guilty rather than fighting charges.
Understanding that relationship requires “thick description” – an anthropologist’s term for paying close attention to the qualitative experience of the subjects of a data-set. Clifford Geertz’s classic essay of the same name talks about the time he witnessed one of his subjects wink at the other, and he wasn’t able to determine whether it was flirtation, aggression, a tic, or dust in the eye. The only way to find out was to go and talk to both people and uncover the qualitative, internal, uncomputable parts of the experience.
Quantitative disciplines are notorious for incinerating the qualitative elements on the basis that they can’t be subjected to mathematical analysis. What’s left behind is a quantitative residue of dubious value… but at least you can do math with it. It’s the statistical equivalent to looking for your keys under a streetlight because it’s too dark where you dropped them.
This is not a good way to solve problems. In August, a group of physicists made headlines when they designed a model to predict the spread of the novel coronavirus at Michigan’s Albion College. The physicists made a bunch of unwise remarks about a) how easy epidemiology was compared to physics; and b) how effective their model would be at suppressing the spread of the disease, limiting the total case-count to not more than 100, and that was the worst-case scenario.
Naturally, the number of cases shot up over 700 in a matter of days and the campus had to shut down. The model accounted perfectly for all the quantitative elements and discarded the qualitative ones, like the possibility that students might get drunk and attend eyeball-licking parties.
Machine learning operates on quantitative elements of a system, and quantizes or discards any qualitative elements. And because it is theory-free – that is, because it has no understanding of the causal relationships between the correlates it identifies – it can’t know when it’s making a mistake.
The role this deficit plays in magnifying bias has been well-theorized and well-publicized by this point: feed a hiring algorithm the resumes of previously successful candidates and you will end up hiring people who look exactly like the people you’ve hired all along; do the same thing with a credit-assessment system and you’ll freeze out the same people who have historically faced financial discrimination; try it with risk-assessment for bail and you’ll lock up the same people you’ve always slammed in jail before trial. The only difference is that it happens faster, and with a veneer of empirical facewash that provides plausible deniability for those who benefit from discrimination.
But there’s another important point to make here – the same point I made in “Full Employment” in July 2020: there is no path of continuous, incremental improvement in statistical inference that yields understanding and synthesis of the sort we think of when we say “artificial intelligence.” Being able to calculate that Inputs a, b, c… z add up to Outcome X with a probability of 75% still won’t tell you if arrest data is racist, whether students will get drunk and breathe on each other, or whether a wink is flirtation or grit in someone’s eye.
We don’t have any consensus on what we mean by “intelligence,” but all the leading definitions include “comprehension,” and statistical inference doesn’t lead to comprehension, even if it sometimes approximates it.
Look at it this way: long before the internal combustion engine, people knew about gas expansion and understood pistons. But the tolerances needed for the controlled explosions at the heart of internal combustion are not really available to blacksmiths who practice metal-beating. The very best smith could hammer metal into something close to a piston, and maybe could refine that piston into something functional, but only by throwing away a lot of off-tolerance items. The resulting engine would be halting and unreliable, and would have no path to reliability that did not abandon metal-beating altogether in favor of processes like casting and (more importantly) machining.
Machine learning is metal-beating. Brilliant people have done remarkable things with it. But the idea that if we just get better at statistical inference, consciousness will fall out of it is wishful thinking. It’s a premise for an SF novel, not a plan for the future.
Cory Doctorow is the author of Walkaway, Little Brother, and Information Doesn’t Want to Be Free (among many others); he is the co-owner of Boing Boing, a special consultant to the Electronic Frontier Foundation, a visiting professor of Computer Science at the Open University and an MIT Media Lab Research Affiliate.
All opinions expressed by commentators are solely their own and do not reflect the opinions of Locus.
This article and more like it appear in the November 2020 issue of Locus.
©Locus Magazine. Copyrighted material may not be republished without permission of LSFF.
2 thoughts on “Cory Doctorow: Past Performance is Not Indicative of Future Results”
My take on machine learning is that it is like alchemy. No one really knows how it works. Some geniuses can make it work anyway — sometimes.
And even when it works, you have a huge pile of numbers (the parameters) that encode the process. How do you debug that pile of numbers to ensure the process works all of the time?
To be completely robust, for example, AI driving needs to understand the difference between a driver waving you through and a driver giving you the finger.
So what will happen? It’s clear that machine learning works best in limited domains. AI driving for example may work just fine on a single isolated interstate lane where all of the other cars on that lane are also driven by AI. Limiting the domain works!
AGI, artificial general intelligence, by definition has no limited domain. So, yeah, we need to build the machines that will build the machines that will eventually result in AGI. The first step is indeed machine learning. We’re not breeding horses… we’re breeding machine intelligence.
> In August, a group of physicists made headlines when they designed a model to predict the spread of the novel coronavirus at Michigan’s Albion College. The physicists made a bunch of unwise remarks about a) how easy epidemiology was compared to physics; and b) how effective their model would be at suppressing the spread of the disease, limiting the total case-count to not more than 100, and that was the worst-case scenario.
Could you provide a citation for this section? Trying to find the backstory here but am not having any luck! Very curious what this group of overconfident physicists at a little liberal arts college thought they knew with certainty.