Reports of our death are an exaggeration Part 2
Of the division of labour between mind and body, Nietzsche & the Camden Cat, AI as a cheapest-to-deliver strategy, LibraryThing, and a better use for this technology
On being a machine
“Any sufficiently advanced technology is indistinguishable from magic.”
—Arthur C. Clarke’s third law
We are in a machine age.
We call it that because machines have proven consistently good at doing things humans are too weak, slow, inconstant or easily bored to do well: mechanical things.
But state-of-the-art machines, per Arthur C. Clarke, aren’t magic: it just seems like it, sometimes. They are a two-dimensional, simplified model of human intelligence. A proxy: a modernist simulacrum. They are a shorthand way of mimicking a limited sort of sentience, potentially useful in known environments and constrained circumstances.
Yet we have begun to model ourselves upon machines. The most dystopian part of John Cryan’s opening quote was the first part — “today, we have people doing work like robots” — because it describes a stupid present reality. We have persuaded ourselves that “being machine-like” should be our loftiest aim. But if we are in a footrace where what matters is simply strength, speed, consistency, modularity, fungibility and mundanity — humans will surely lose.
But we aren’t in that footrace. Strength, speed, consistency, fungibility and patience are the loftiest aims only where there is no suitable machine.
If you have got a suitable machine, use it: let your people do something more useful.
If you haven’t, build one.
Body and mind as metaphors
We are used to the “Turing machine” as a metaphor for “mind” but it is a bad metaphor. It is unambitious. It does not do justice to the human mind.
Perhaps we could invert it. We might instead use “body” — in that dishonourably dualist, Cartesian sense — as a metaphor for a Turing machine, and “mind” for natural human intelligence. “Mind” and “body” in this sense, are a practical guiding principle for the division of labour between human and machine: what goes to “body”, give to a machine: motor skills; temperature regulation; the pulmonary system; digestion; aspiration — the conscious mind has no business there. There is little it can add. It only gets in the way. There is compelling evidence that when the conscious mind takes over motor skills, things go to hell.1
But leave interpersonal relationships, communication, perception, construction, decision-making in times of uncertainty, imagination and creation to the mind. Leave the machines out of this. They will only bugger it up. Let them report, by all means. Let them assist: triage the “conscious act” to hive off the mechanical tasks on which it depends.2 Let the machines loose on those mechanical tasks. Let them provide, on request, the information the conscious mind needs to build its models and make its plans, but do not let them intermediate that plan.
The challenge is not to automate indiscriminately, but judiciously: to optimise, so that humans are freed from tasks they are not good at, and not diverted from their valuable work by formal processes better suited to a machine. This can’t really be done by rote.
Here, “machine” carries a wider meaning than “computer”. It encompasses any formalised, preconfigured process. A playbook is a machine. A battery of policies. An approval process.
AI overreach
Nor should we let the “magic” of sufficiently advanced technology, like artificial intelligence, misdirect us into looking too far ahead.
We take one look at the output of an AI art generator and conclude the highest human intellectual achievements are under siege. However good human artists may be, they cannot compete with the massively parallel power of LLMs, which can generate billions of images, some of which, by accident, will be transcendentally great art.
Not only does reducing art to its “Bayesian priors” like this stunningly miss the point about art, but it suggests those who would deploy artificial intelligence have their priorities dead wrong. There is no shortage of sublime human expression: quite the opposite. The internet is awash with “content”: there is already an order of magnitude more content to consume than our collected ears and eyes have the capacity to take in. And, here: have some more.
We don’t need more content. What we do need is dross management and needle-from-haystack extraction. This is stuff machines ought to be really good at. Why don’t we point the machines at that?
There are plenty of easy, dreary, mechanical applications to which machines might profitably be put: remembering where you put the car keys, weeding out fake news, managing browser cookies, or simply curating the great corpus of human creation, rather than ripping it off.
Digression: Nietzsche, Blake and the Camden Cat
The Birth of Tragedy sold 625 copies in six years; the three parts of Thus Spoke Zarathustra sold fewer than a hundred copies each. Not until it was too late did his works finally reach a few decisive ears, including Edvard Munch, August Strindberg, and the Danish-Jewish critic Georg Brandes, whose lectures at the University of Copenhagen first introduced Nietzsche’s philosophy to a wider audience.
—The Sufferings of Nietzsche, Los Angeles Review of Books, 2018
The Bayesian priors are pretty damning. ...When Shakespeare wrote, almost all of Europeans were busy farming, and very few people attended university; few people were even literate — probably as low as about ten million people. By contrast, there are now upwards of a billion literate people in the Western sphere. What are the odds that the greatest writer would have been born in 1564?
—Sam Bankman-Fried’s “sophomore college blog”
Sam Bankman-Fried had a point here, though not the one he thought.
Friedrich Nietzsche died in obscurity, as did William Blake and Emily Dickinson. They were lucky that the improbability engine worked its magic for them, even if not in their lifetimes.
But how many undiscovered Nietzsches, Blakes and Dickinsons are there, now sedimented into unreachably deep strata of the human canon?
How many living artists are currently ploughing an under-appreciated furrow, stampeding towards an obscurity a large language model might save them from, cursing their own immaculate “Bayesian priors”?
(I know of at least one: the Camden Cat, who for thirty years has plied his trade with a beat-up acoustic guitar on the Northern Line, and once wrote and recorded one of the great rockabilly singles of all time. It remains bafflingly unacknowledged. Here it is, on SoundCloud.)
Digression over.
If AI is a cheapest-to-deliver strategy, you’re doing it wrong
Cheapest-to-deliver
/ˈʧiːpɪst tuː dɪˈlɪvə/ (adj.)
Of the range of possible ways of discharging your contractual obligation to the letter, the one that will cost you the least and irritate your customer the most should you choose it.
Imagine having personal large language models at our disposal that could pattern-match against our individual reading and listening histories, our engineered prompts, our instructions and the recommendations of like-minded readers.
Our LLM would search through the billions of existing books, plays, films, recordings and artworks, known and unknown, that comprise the human oeuvre but, instead of making its own mashups, it would retrieve existing works that its patterns said would specifically appeal to us.
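In caricature, the machinery might look like this (a sketch, not a product: every title, history and description below is invented, though sentence-transformers is a real, off-the-shelf embedding library):

```python
# Sketch: "retrieve, don't generate". Embed what a reader has loved, then
# rank the existing human oeuvre by closeness to that taste vector.
# All data here is invented for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# What the reader has loved; in practice, a whole reading and listening history.
history = [
    "aphoristic philosophy, doubt and the revaluation of values",
    "visionary lyric poetry, engraved and self-published",
]

# The existing oeuvre, described; in practice, billions of known and unknown works.
catalogue = {
    "Thus Spoke Zarathustra": "a prophet descends from his mountain to teach",
    "Songs of Innocence and of Experience": "illuminated poems of two contrary states",
    "a forgotten busker's rockabilly single": "three raw chords, recorded once, unheard",
}

taste = model.encode(history).mean(axis=0)              # the reader's taste vector
titles = list(catalogue)
vectors = model.encode([catalogue[t] for t in titles])  # one vector per work

# Cosine similarity between taste and every work; surface the closest.
scores = vectors @ taste / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(taste))
for title, score in sorted(zip(titles, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {title}")
```

Nothing is generated: all the model’s pattern-matching is spent finding what already exists.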
This is not just the Spotify recommendation algorithm, as occasionally delightful as that is. Any commercial algorithm has its own primary goal: to maximise revenue. A certain amount of “customer delight” might be a by-product, but only as far as it intersects with that primary commercial goal. As long as customers are just delighted enough to keep listening, the algorithm doesn’t care how delighted they are.3
Commercial algorithms need only follow a cheapest-to-deliver strategy: they “satisfice”. Being targeted at optimising revenue, they converge upon what is likely to be popular, because that is easier to find. Why scan ocean deeps for human content when you can skim the top and keep the punters happy enough?
This, by the way, has been the tragic commons of the collaborative internet: despite Chris Anderson’s forecast in 2006 that universal interconnectedness would change economics for the better4 — that, suddenly, it would be costless to service the long tail of global demand, prompting some kind of explosion in cultural diversity — the exact opposite has happened.5 The overriding imperatives of scale have obliterated the subtle appeal of diversity, while sudden, unprecedented global interconnectedness has had the system effect of homogenising demand.
This is the counter-intuitive effect of a “cheapest-to-deliver” strategy: while it has become ever easier to target the “fat head”, the long tail has grown thinner.6 As the tail contracts, the commercial imperative to target the lowest common denominators inflates. This is a highly undesirable feedback loop. It will homogenise us. We will become less diverse. We will become more fragile. We will resemble machines. We are not good at being machines.
Shouldn’t we be more ambitious about what artificial intelligence could do for us? Isn’t “giving you the bare minimum you’ll take to keep stringing you along” a bit underwhelming? Isn’t using it to “dumb us down” a bit, well, dumb?
Digression: Darwin’s profligate idea
The theory of evolution by natural selection really is magic: it gives a comprehensive account of the origin of life that reduces to a mindless, repetitive process that we can state in a short sentence:
In a population of organisms with individual traits whose offspring inherit those traits only with random variations, those having traits most suited to the prevailing environment will best survive and reproduce over time.
The economy of design in this process is staggering. The economy of effort in execution is not. Evolution is tremendously wasteful. Not just in how it does adapt, but in how often it does not. For every serendipitous mutation, there are millions and millions of duds.
The chain of adaptations from amino acids to Lennon & McCartney may have billions of links in it, but that is a model of parsimony compared with the number of adaptations that didn’t go anywhere — that arced off into one of design space’s gazillion dead ends and fizzled out.
Evolution isn’t directed — that is its very super-power — so it fumbles blindly around, fizzing and sparking, and only a vanishingly small proportion of the mutations it generates ever do something useful and those that do, do so accidentally. Evolution is a random, stochastic process. It depends on aeons of time and burns unimaginable resources. Evolution solves problems by brute force.
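The shape of the process fits in a few lines of code. Richard Dawkins’ famous “weasel” toy makes the profligacy visible (a caricature, not a model of biology: real evolution has no target string to aim at). Mutate blindly, keep whatever happens to score better, and count the duds:

```python
import random
import string

# Dawkins' "weasel" toy: blind mutation plus selection. A caricature of the
# shape of the process only; real evolution has no target string.
TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "

def fitness(candidate):
    # How many characters happen to suit the prevailing "environment".
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate, rate=0.02):
    # Each character has a small chance of random, undirected variation.
    return "".join(random.choice(ALPHABET) if random.random() < rate else ch
                   for ch in candidate)

parent = "".join(random.choice(ALPHABET) for _ in TARGET)
generations = duds = 0
while fitness(parent) < len(TARGET):
    generations += 1
    brood = [mutate(parent) for _ in range(100)]
    best = max(brood, key=fitness)
    if fitness(best) > fitness(parent):
        parent, duds = best, duds + 99   # one useful mutant; 99 wasted
    else:
        duds += 100                      # the whole brood was wasted
print(f"{generations} generations, {duds} dud offspring: {parent}")
```

Run it and the dud count dwarfs the useful mutations by orders of magnitude; that is the brute force doing the work.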
Even though it came about through undirected evolution, “natural” mammalian intelligence, whose apogee is homo sapiens, is directed. In a way that DNA cannot, humans can hypothesise, remember, learn, and rule out plainly stupid ideas without having to go through the motions of trying them.
All mammals can do this to a degree; even retrievers.7 Humans happen to be particularly good at it. It took three and a half billion years to get from amino acid to the wheel, but only 6,000 to get from the wheel to the Nvidia RTX 4090 GPU.
Now. Large language models are, like evolution, a “brute-force”, undirected method. They can get better by chomping yet more data, faster, in more parallel instances, with batteries of server farms in air-cooled warehouses full of lightning-fast multi-core graphics processors. But this is already starting to get harder. We are bumping up against computational limits, as Moore’s law conks out, and environmental consequences, as the planet does.
For the longest time, “computing power” has been the cheap, efficient option. That is ceasing to be true. More processing is not a zero-cost option. We will start to see the opportunity cost of devoting all these resources to something that, at the moment, creates diverting sophomore mashups we don’t need.8
We have, lying unused around us, petabytes of human ingenuity, voluntarily donated into the indifferent maw of the internet. We are not lacking content. Surely the best way of using these brilliant new machines is to harness what we already have. The one thing homo sapiens doesn’t need is more unremarkable information.
LibraryThing as a model
So, how about using AI to better exploit our existing natural intelligence, rather than imitating or superseding it? Could we, instead, create system effects to extend the long tail?
It isn’t hard to imagine how this might work. A rudimentary version exists in LibraryThing’s recommendation engine. It isn’t new or wildly clever — as far as I know, LibraryThing doesn’t use AI. Each user lists, by ISBN, the books in her personal library. The LibraryThing algorithm will tell her with some degree of confidence, based on combined metadata, whether it thinks she will like any other book. Most powerfully, it will compare all the virtual “libraries” on the site and return the most similar user libraries to hers. The attraction of this is not the books she has in common, but the ones she doesn’t.
Browsing doppelganger libraries is like wandering around a library of books you have never read, but which is designed specifically to appeal to just you.
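A toy version of the mechanic needs nothing fancier than set arithmetic (my sketch, with invented ISBNs; LibraryThing’s actual algorithm is its own affair):

```python
# Toy LibraryThing: libraries are sets of ISBNs, similarity is Jaccard
# overlap, and the payoff is what your nearest "doppelganger" owns that
# you don't. All ISBNs invented for illustration.
def jaccard(a, b):
    return len(a & b) / len(a | b)

libraries = {
    "me":      {"9780140447927", "9780199537150", "9780141439846"},
    "reader1": {"9780140447927", "9780199537150", "9780679783268"},
    "reader2": {"9780061120084", "9780142437230"},
}

mine = libraries["me"]
others = {user: shelf for user, shelf in libraries.items() if user != "me"}
twin, shelf = max(others.items(), key=lambda kv: jaccard(mine, kv[1]))

print("closest library:", twin)
print("books to explore:", shelf - mine)   # the ones I don't already have
```

The interesting output is the set difference, not the intersection: the overlap merely certifies the doppelganger; the remainder is a library of unread books curated to the reader’s taste.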
Note how this role — seeking out delightful new human creativity — satisfies our criteria for the division of labour: it is quite beyond the capability of any group of humans, and it would not devalue, much less usurp, genuine human intellectual capacity. Rather, it would empower it.
Note also the system effect it would have: if we held out hope that algorithms were pushing us down the long tail of human creativity, and not shepherding people towards its monetisable head, this would incentivise us all to create more unique and idiosyncratic things.
It also would have the system effect of distributing wealth and information — that is, strength, not power — down the curve of human diversity, rather than concentrating it at the top.
Bayesian priors and the canon of ChatGPT
The “Bayesian priors” argument which fails for Shakespeare also fails for a large language model.
Just as most of the intellectual energy needed to render a text into the three-dimensional metaphorical universe we know as King Lear comes from the surrounding cultural milieu, so it does with the output of an LLM. The source, after all, is entirely drawn from the human canon. A model trained only on randomly assembled ASCII characters would return only randomly assembled ASCII characters.
But what if the material is not random? What if the model augments its training data with its own output? Might that create an apocalyptic feedback loop, whereby LLMs bootstrap themselves into some kind of hyper-intelligent super-language, beyond mortal cognitive capacity, whence the machines might dominate human discourse?
Are we inadvertently seeding Skynet?
Just look at what happened with AlphaGo (strictly, AlphaGo Zero, the successor that required no human training data at all): it learned by playing millions of games against itself. Programmers just fed it the rules, switched it on and, with indecent brevity, it worked everything out and walloped the version that had beaten the world’s best human players.
Could LLMs do that? This fear is not new:
And to this end they built themselves a stupendous super-computer which was so amazingly intelligent that even before its databanks had been connected up it had started from “I think, therefore I am” and got as far as deducing the existence of rice pudding and income tax before anyone managed to turn it off.
—Douglas Adams, The Hitchhiker’s Guide to the Galaxy
But brute-forcing outcomes in fully bounded, zero-sum environments with simple, fixed rules — in the jargon of complexity theory, a “tame” environment — is what machines are designed to do. We should not be surprised that they are good at this, nor that humans are bad at it. This is exactly where we would expect a Turing machine to excel.
By contrast, LLMs must operate in complex, “wicked” environments. Here conditions are unbounded, ambiguous, inchoate and impermanent. This is where humans excel. Here, the whole environment, and everything in it, continually changes. The components interact with each other in non-linear ways. The landscape dances. Imagination here is an advantage: brute-force mathematical computation won’t do.
Think how hard physics would be if particles could think.
— Murray Gell-Mann
An LLM works by compositing a synthetic output from a massive database of pre-existing text. It must pattern-match against well-formed human language. Degrading its training data with its own output will progressively degrade its output. Such “model collapse” is an observed effect.9 LLMs will only work for humans if they’re fed human-generated content.
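You can watch the effect in miniature without a neural network in sight. This sketch (a toy Gaussian stand-in for the published model-collapse experiments, nothing to do with a real LLM) fits a distribution to some data, samples from the fit, refits on the samples, and repeats:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(0.0, 1.0, size=100)        # generation 0: "human" data

for generation in range(1, 1001):
    mu, sigma = data.mean(), data.std()      # "train" a model on the data
    data = rng.normal(mu, sigma, size=100)   # replace the data with model output
    if generation % 250 == 0:
        print(f"generation {generation}: sigma = {data.std():.4f}")
# sigma drifts towards zero: the tails vanish first, then everything else.
```

Each pass re-estimates the distribution from a finite sample of its own output, so variance leaks away generation by generation until the “model” can say only one thing.

AlphaGo is different.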
I see a difference between large language models and AlphaGo learning to play superhuman Go through self-play.
When AlphaGo adds one of its own self-vs-self games to its training database, it is adding a genuine game. The rules are followed. One side wins. The winning side did something right.
Perhaps the standard of play is low. One side makes some bad moves, the other side makes a fatal blunder, the first side pounces and wins. I was surprised that they got training through self-play to work; in the earlier stages the player who wins is only playing a little better than the player who loses and it is hard to work out what to learn. But the truth of Go is present in the games and not diluted beyond recovery.
But an LLM is playing a post-modern game of intertextuality. It doesn’t know that there is a world beyond language to which language sometimes refers. Is what an LLM writes true or false? It is unaware of either possibility. If its own output is added to the training data, that creates a fascinating dynamic. But where does it go? Without AlphaGo’s crutch of the “truth” of which player won the game according to the hard-coded rules, I think the dynamics have no anchorage in reality and would drift, first into surrealism and then psychosis.
One sees that AlphaGo is copying the moves that it was trained on and an LLM is also copying the moves that it was trained on and that these two things are not the same.10
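The difference is easy to make concrete. In the toy self-play loop below (noughts and crosses standing in for Go; my sketch, not anything DeepMind shipped), every game the system plays against itself arrives with a label the rules supply for free:

```python
import random

# Rule-bounded self-play: the rules, not a model, decide who won, so every
# self-generated game carries a ground-truth label to learn from.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def self_play():
    board, player, positions = [None] * 9, "X", []
    while winner(board) is None and None in board:
        move = random.choice([i for i, v in enumerate(board) if v is None])
        board[move] = player
        positions.append((tuple(board), player))
        player = "O" if player == "X" else "X"
    return positions, winner(board)   # the verdict comes from the rules

games = [self_play() for _ in range(10_000)]
labelled = sum(1 for _, result in games if result)
print(f"{labelled} of {len(games)} games carry a ground-truth winner")
```

An LLM retraining on its own prose gets no such verdict from anywhere in the loop: nothing says whether the new “training data” is true, good or even sane.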
There is another contributor to the cultural milieu surrounding any text: the reader. It is the reader, and her “cultural baggage”, who must make head or tail of the text. She alone determines, for her own case, whether it stands or falls. This is true however rich the cultural milieu that supports the text. We know this because the overture from Tristan und Isolde can reduce different listeners to tears of joy or boredom. One contrarian can see, in the Camden Cat, a true inheritor of the great blues pioneers; others might see an unremarkable busker.
Construing natural language, much less visuals or sound, is no matter of mere symbol-processing. Humans are not Turing machines. A text only sparks meaning, and becomes art, in the reader’s head. This is just as true of magic — the conjurer’s skill is to misdirect the audience into imagining something that isn’t there. The audience supplies the magic.
The same goes for an LLM — it is simply digital magic. We imbue what an LLM generates with meaning. We are doing the heavy lifting.
Coda
“Und wenn du lange in einen Abgrund blickst, blickt der Abgrund auch in dich hinein.”
(“And if you gaze long into an abyss, the abyss also gazes into you.”)
—Nietzsche, Beyond Good and Evil
Man, this got out of control.
So is this just Desperate-Dan, last-stand pattern-matching from an obsolete model, staring forlornly into the abyss? Being told to accept his obsolescence is an occupational hazard for the JC, so no change there.
But if this really is the time that is different, something about it feels underwhelming. If this is the hill we die on, we’ve let ourselves down.
Don’t be suckered by parlour tricks. Don’t redraw our success criteria to suit the machines. To reconfigure how we judge each other to make it easier for technology to do it at scale is not to be obsolete. It is to surrender.
Humans can’t help doing their own sort of pattern-matching. There are common literary tropes where our creations overwhelm us — Frankenstein, 2001: A Space Odyssey, Blade Runner, Terminator, Jurassic Park and The Matrix. They are cautionary tales. They are deep in the cultural weft, and we are inclined to see them everywhere. The actual quotidian progress of technology has a habit of confounding science fiction and being a bit more boring.
LLMs will certainly change things, but we’re not fit for battery juice just yet.
Buck up, friends: there’s work to do.
1. This is the premise of Daniel Kahneman’s Thinking, Fast and Slow, and for that matter, Matthew Syed’s Bounce.
2. Julian Jaynes has a magnificent passage in his book The Origin of Consciousness in the Breakdown of the Bicameral Mind where he steps through all the aspects of consciousness that we assume are conscious, but which are not. “Consciousness is a much smaller part of our mental life than we are conscious of, because we cannot be conscious of what we are not conscious of. How simple that is to say; how difficult to appreciate!”
3. As with the JC’s school exam grades: anything more than 51% is wasted effort. Try as he might, the JC was never able to persuade his dear old Mutti of this.
4. Chris Anderson, The Long Tail: Why the Future of Business Is Selling Less of More (2006).
5. It is called “cultural convergence”.
6. Anita Elberse’s Blockbusters is excellent on this point.
7. Actually, come to think of it, Lucille, bless her beautiful soul, didn’t seem to do that very often. But still.
8. Remember Arthur C. Clarke’s law here. The parallel processing power an LLM requires is already massive. It may be that the cost of expanding it in the way envisioned would be unfeasibly huge — in which case the original “business case” for technological redundancy falls away. See also the simulation hypothesis: it may be that the most efficient way of simulating the universe with sufficient granularity to support the simulation hypothesis is to actually build and run a universe — in which case the hypothesis fails.
9. See https://www.techtarget.com/whatis/feature/Model-collapse-explained-How-synthetic-training-data-breaks-AI
10. This is from an excellent post by user:felis-parenthesis on Reddit.