So, in 2006, he started to write a blog, to explain why they were all wrong. He had to explain why artificial intelligence wouldn’t be like human intelligence. But, he realised, to explain that, he had to explain why human intelligence wasn’t exactly like rational thought — it is full of biases and shortcuts and systematic errors.
But to explain why human thought wasn’t like rational thought, he found, he had to explain what rational thought actually was — in the decision-theory sense, of making optimal decisions with the available information. And then, he found, he had to explain all the ways in which human thought differed from rational thought: he had to list all those biases and shortcuts and systematic errors.
And to explain those, he had to explain — almost everything. At one point, he wanted to write a relatively unimportant post using an analogy of AI utility functions. But in order to do that, he felt he ought to first explain a few other concepts. To explain those, he realised, he needed to explain a few more. But to explain those…
Anyway, long story short, over the following month he ended up writing about two dozen blog posts explaining evolution by natural selection. And then he could write the not-especially-central-to-his-thesis post about the utility functions.
The whole series of blog posts, which became known as the Sequences, sprawled out to a million words. It was edited down into an ebook version, Rationality, which is a more manageable 600,000 or so — though still rather longer than War and Peace, or than all three volumes of The Lord of the Rings put together.
The Sequences are the founding text for what would, later, become known as the “rationality community”, or the “rationalists”. I’ve written about them, and their AI concerns, a few times in the past; my own first book, The Rationalist’s Guide to the Galaxy, is about them. (It’s also just 80,000 words and gets most of the key ideas across, if you are pressed for time. All good bookshops!) They are a strange bunch: clever, thoughtful, sometimes paranoid; they get accused of being a cult, with the Sequences as their holy scripture and Yudkowsky as a sort of messianic figure (although I think that is not true).
But the community is that rare thing on the internet: a place where people can disagree and argue calmly and in good faith. And it sprang up around Yudkowsky’s great sprawling series of blog posts, because he explicitly encouraged that norm.
As well as, or as part of, trying to lay the foundations for AI safety, the Sequences try to establish what it means to be rational, and to make us more so. In other words, the aim is to help us be “less wrong”, as the blog on which the Sequences are published is called. Following decision theorists like Judea Pearl, Yudkowsky defined rationality in two forms.
First is epistemic rationality, establishing true beliefs about the world — maximising the extent to which my mental model of the universe, the “map” in my head, corresponds to the universe itself, the “territory” of reality. I believe that there is a door behind me, and that black holes emit Hawking radiation: those are facts about my brain. Whether there actually is a door behind me, or whether black holes really do emit Hawking radiation, are facts about reality.
Epistemic rationality is about trying to make the beliefs line up with the facts. “This correspondence between belief and reality is commonly called ‘truth,’” says Yudkowsky, “and I’m happy to call it that.”
The second is instrumental rationality, choosing the course of action that is most likely, given what you know now, to bring about the things you want to achieve. If you want to achieve Goal X — wealth, world domination, world peace, preventing climate change, Oxford United winning the Champions League — what steps are most likely to achieve that? Rationality, he says, means winning. Winning at whatever you want to win at, whether it’s saving the world or making money. But winning.
I can’t begin to go into the whorls and sprawls of its reasoning: in its efforts to ground morality and rationality on Bayes’ theorem and utilitarian calculus, Rationality reminds me of one of those huge Enlightenment-era works of philosophy, by Spinoza or Leibniz or someone, that start out with some guy sitting in an armchair and end up trying to establish a Theory of Everything. But I do want to come back to Newcomb’s problem, because I think it is key.
In 1969, Nozick concluded, after much agonising, that you should take both boxes. Imagine the far side of the opaque box is transparent, he says, and that a friend of yours can see into both. Whatever Omega has done, whether the £1 million is there or not, she would be hoping that you take both: if it’s not there, you at least gain £1,000; if it is, you get a bonus £1,000 on top of your million. It’s already happened. The money is in the box, or it isn’t.
Yudkowsky, though, disagrees. The rational thing to do, he says, assuming that you would like a million pounds, is the thing that is most likely to make you a million pounds. Almost everyone who chose to one-box got a million pounds. Almost everyone who chose to two-box didn’t. Only a very clever person could convince themselves that the sensible thing to do is the option that will almost certainly cost you £1 million.
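To make the one-boxer's arithmetic concrete, here is a rough sketch of the expected payoffs. The £1 million and £1,000 figures are the ones in the problem; the predictor's accuracy is an assumption for illustration (the problem only says Omega is almost always right), and the function names are invented for this example. It shows the calculation that favours one-boxing, not Nozick's dominance argument, which deliberately ignores these probabilities.

```python
def expected_payoffs(accuracy, big=1_000_000, small=1_000):
    """Return (one_box, two_box) expected payoffs in pounds.

    `accuracy` is an assumed probability that Omega predicted
    your choice correctly; it is not specified in the problem.
    """
    # One-boxing: the opaque box is full only if Omega foresaw one-boxing.
    one_box = accuracy * big
    # Two-boxing: you always get the small box, plus the big prize
    # only in the cases where Omega wrongly expected you to one-box.
    two_box = small + (1 - accuracy) * big
    return one_box, two_box


def break_even_accuracy(big=1_000_000, small=1_000):
    """Accuracy at which both choices have the same expected payoff."""
    # Solve accuracy * big == small + (1 - accuracy) * big for accuracy.
    return (big + small) / (2 * big)


if __name__ == "__main__":
    one, two = expected_payoffs(0.99)  # 0.99 is a hypothetical accuracy
    print(f"one-box: £{one:,.0f}, two-box: £{two:,.0f}")
    print(f"break-even accuracy: {break_even_accuracy():.4f}")
```

On these assumptions, a predictor that is right 99% of the time leaves one-boxers with about £990,000 on average and two-boxers with about £11,000. The predictor only has to be slightly better than a coin flip (just over 50%) for one-boxing to come out ahead.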
That’s why I like the book. It may address very serious, high-concept ideas — AI, transhumanism, morality, alien gods visiting the planet and playing strange parlour games with boxes — but it gets to them all in a robust, commonsense way. The moral thing to do is, usually, the thing that kills the fewest people, or makes the largest number happy. As he writes, in a wonderful section on how it seems obvious that consciousness has an effect on the world, because philosophers write books about consciousness, “You can argue clever reasons why this is not so, but you have to be clever.”
But following these robust, commonsense views to their logical conclusion often ends up in strange places. If you accept some fairly uncontroversial premises, you can end up with AIs destroying the world, brains uploaded to the cloud, galaxy-spanning civilisations of quasi-immortal demigods.
It’s been especially relevant this year. The rationalist community was well ahead of the public, and of many scientists, in spotting the dangers of Covid-19: they are used to data, to exponential curves, to reasoning under uncertainty. They, and the wider tech community, were using masks and stocking up on essential goods, even as others were saying to worry about the flu instead, or mocking tech workers for avoiding handshakes. And now, when we should be working out the risks to ourselves and our families of going home for Christmas, a bit of extra familiarity with Bayes’ theorem and probability theory would probably not be a bad idea.
In the 12 years or so since the Sequences ended, Yudkowsky himself has withdrawn somewhat, working at MIRI or on other projects, occasionally cropping up on Facebook or Twitter. But the Sequences remain a fascinating piece of writing, pulling together a thousand strands of thought into one place. And, whether or not you agree with their conclusions on AI or rationality, they have created a small, safe harbour for peaceable disagreement on the internet. That alone is a legacy worth having.
*Precocious 17-year-olds can be quite annoying.