Is there a dynamic argument for expected utility maximisation?
Why should you maximize expected utility? A well-known answer – discussed, for example, in McClennen (1990), Cubitt (1996), and Gustafsson (2022) – goes as follows.
- von Neumann and Morgenstern (1944) have shown that if you don't maximize expected utility, then your preferences violate certain conditions or "axioms".
- If your preferences violate these axioms, then you are disposed to make sequences of choices that are guaranteed to leave you worse off than you could otherwise have been (a "money pump"; see the sketch below).
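To see the shape of the second step, here is a minimal sketch (my own illustration, not from the cited sources) of the classic money pump against an agent with cyclic preferences, which violate the transitivity axiom. The items, fee, and trading sequence are all hypothetical:

```python
# Money pump against cyclic preferences: A > B, B > C, C > A.
# The agent pays a small fee for each trade up to a strictly
# preferred item; after three trades she is back where she
# started, minus the fees: a guaranteed loss.

prefers = {("A", "B"), ("B", "C"), ("C", "A")}  # cyclic strict preferences

def accepts_trade(offered, held):
    """She trades `held` for `offered` (plus a fee) iff she strictly prefers `offered`."""
    return (offered, held) in prefers

holding, fees_paid_cents = "C", 0
for offer in ["B", "A", "C"]:       # the exploiter's sequence of offers
    if accepts_trade(offer, holding):
        holding = offer
        fees_paid_cents += 1        # small fee per accepted trade

print(holding, fees_paid_cents)     # "C" 3: same item, three fees poorer
```

Each trade looks like an improvement by her own lights, yet the sequence is guaranteed to leave her with exactly what she started with, minus the fees.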
I'm glossing over lots of interesting subtleties here. What I'm interested in is whether one can construct an argument along these lines for expected utility maximization as a general rule of decision making, without making substantive assumptions about the agent's goals.
Let me explain.
It is plausible, I think, that there is a fairly general model of how beliefs and desires relate to choices. There aren't different rules for how one should act depending on what one desires, with nothing to unify the different rules: if you only desire your future pleasure, use rule X; if you desire money, use rule Y; if you like coffee, use rule Z; and so on.
The expected utility rule can be understood as a (fairly) general rule about the connection between (graded) beliefs, (graded) desires, and choice. On this understanding, an agent's utility function represents her desires, and we want to impose no substantive constraints on its shape or domain. We don't want to assume, for example, that utility is a function of material goods.
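Schematically (a sketch of the rule itself; the names and the toy numbers are placeholders of my own): the agent has a credence for each state and a utility for each complete outcome, and she picks an act with maximal probability-weighted average utility. Nothing in the schema constrains what the utility function is defined over:

```python
# Schematic expected utility rule. `utility` may be any function of
# complete outcomes ("worlds"): nothing here assumes it tracks money,
# pleasure, or any other particular good.

def expected_utility(act, credence, utility, outcome):
    """credence: dict mapping states to probabilities;
    outcome(act, state): the world that results from `act` in `state`."""
    return sum(p * utility(outcome(act, s)) for s, p in credence.items())

def choose(acts, credence, utility, outcome):
    """Pick an act with maximal expected utility."""
    return max(acts, key=lambda a: expected_utility(a, credence, utility, outcome))

# Toy usage: whether to take an umbrella.
credence = {"rain": 0.3, "dry": 0.7}
outcome = lambda act, state: (act, state)
utility = lambda world: {("umbrella", "rain"): 5, ("umbrella", "dry"): 3,
                         ("none", "rain"): 0, ("none", "dry"): 4}[world]
print(choose(["umbrella", "none"], credence, utility, outcome))  # "umbrella"
```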
Now return to the question with which I began. Why should you maximize expected utility – no matter what you ultimately care about?
In this context, both steps in the above argument become problematic.
The first step ("if you don't maximize expected utility then you violate the axioms") presupposes that your utility function is derived in a certain way from your preferences, which are assumed to relate "outcomes" – the ultimate bearers of utility – and arbitrary lotteries involving outcomes. If we don't impose constraints on what you may care about, all of these lotteries may become ill-defined (as explained, for example, in Dreier (1996)). There is no good reason to think that you have non-trivial preferences involving such lotteries.
If you don't have these preferences then you technically violate the axioms. But the violation needn't show up in the problematic choices appealed to in the second step of the argument. ("If you violate the axioms then you are disposed to behave badly.")
More generally, the second step of the argument appeals to sequential decision problems in which the same "outcome" occurs at multiple points in a decision tree. Such decision problems may well be impossible, given that the outcomes must specify everything you care about. What if the things you care about include the choices you have made, so that the same outcome can't occur at different points in the tree?
In sum: as it stands, the two-step argument is not a plausible argument for expected utility maximization as a general decision rule.
But perhaps the argument can be fixed.
The idea is to take for granted that there is a general decision rule – a general model of the connection between (graded) belief, desire, and choice. We can test a candidate rule by looking at what it predicts for agents with such-and-such specific desires. If the rule goes wrong for these agents, we can infer that it is not plausible as a general rule.
This methodology is implicitly accepted by almost everyone in the debate over Newcomb's Problem. The decision-maker in Newcomb's Problem is stipulated to only care about getting as much money as possible. Those of us who are convinced by the arguments for (say) two-boxing don't merely conclude that people who only care about money shouldn't follow EDT. We take for granted that if EDT is wrong for people who only care about money, then it is wrong for everyone. We test a general decision rule by looking at a particular scenario involving an agent with particular desires. Since EDT fails this test, we reject EDT as a general rule.
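For concreteness, here is the familiar arithmetic, with the conventional payoffs and a stipulated 99% reliable predictor (the specific numbers are the textbook ones, not taken from anything above). EDT ranks one-boxing higher; CDT ranks two-boxing higher regardless of the agent's credence that the opaque box is full:

```python
# Newcomb's Problem with the usual payoffs: $1,000,000 in the opaque
# box iff one-boxing was predicted, $1,000 in the transparent box.
# Utility is stipulated to be linear in money.

ACC = 0.99  # the predictor's reliability (a stipulated number)

# EDT: condition the box's contents on your own choice.
edt_one_box = ACC * 1_000_000 + (1 - ACC) * 0
edt_two_box = ACC * 1_000 + (1 - ACC) * 1_001_000

# CDT: the contents are causally fixed; q is your credence that the
# opaque box is already full. Two-boxing gains $1,000 whatever q is.
q = 0.5
cdt_one_box = q * 1_000_000
cdt_two_box = q * 1_001_000 + (1 - q) * 1_000

print(edt_one_box > edt_two_box)  # True: EDT recommends one-boxing
print(cdt_two_box > cdt_one_box)  # True: CDT recommends two-boxing
```

If you think the one-boxing verdict is wrong for this money-loving agent, you conclude that EDT is wrong as a general rule.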
Can we take the same attitude towards the sequential choice argument for expected utility maximization?
The argument would go something like this.
Consider an agent who (like the decision maker in Newcomb's problem) only cares about accumulating as much money as possible. Why should this agent maximize expected utility? Well, suppose the agent follows a different rule. We can then construct a decision problem in which the agent incurs an avoidable monetary loss. Intuitively, a rule that leads to avoidable monetary loss is not a good rule for promoting the sole aim of accumulating as much money as possible. The decision problem therefore serves as a test that allows us to exclude the alternative decision rule. Ideally, we can construct such a test for every alternative to the expected utility rule, and we can't do it for the expected utility rule.
The crucial point is that we are free to stipulate, in this argument, that the agent only cares about money or other local "outcomes" that make the relevant lotteries and sequential decision problems well-defined.
OK, but how exactly is the argument supposed to go? The standard sequential choice arguments don't assume that the agent violates the expected utility rule. Rather, they assume that the agent's preferences violate the von Neumann and Morgenstern axioms. If we stipulate (say) that an agent's utility function is linear in money, and that the agent's ranking of lotteries violates the expected utility rule, can we infer that the agent's preferences violate the vNM axioms? This is far from obvious. My hunch is that we can't.
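Here is a toy example of why the inference looks doubtful (my own illustration). An agent who ranks monetary lotteries by expected square root of money is, by construction, an expected utility maximizer relative to U(x) = sqrt(x), so her lottery preferences satisfy the vNM axioms; yet relative to the stipulated linear utility she violates the expected utility rule:

```python
# Ranking by E[sqrt(money)] satisfies the vNM axioms (any
# expected-utility ranking does), yet violates the EU rule relative
# to a stipulated utility that is linear in money.

from math import sqrt

def ev(lottery, u=lambda x: x):
    """Expected u-value of a lottery given as [(prob, payout), ...]."""
    return sum(p * u(x) for p, x in lottery)

safe  = [(1.0, 100)]
risky = [(0.5, 0), (0.5, 250)]

print(ev(risky) > ev(safe))              # True: linear utility favours risky
print(ev(safe, sqrt) > ev(risky, sqrt))  # True: the agent picks safe anyway
```

So violating "maximize expected money" doesn't entail violating the axioms: the axioms only demand representability by some utility function or other.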
Perhaps we shouldn't fix the agent's utility function by stipulation. Instead, we might stipulate that (a) the agent is indifferent between any two worlds that agree with respect to the agent's wealth, and that (b) whenever the agent's wealth is greater in one world than in another, then the agent prefers the first world over the second. This means that the agent's "intrinsic utility" function can be construed as a function of money.
Now, suppose our agent has preferences over monetary lotteries, and these preferences satisfy the von Neumann and Morgenstern axioms. It follows from von Neumann and Morgenstern's representation theorem that there is a (monotonic) utility function U that assigns numbers to monetary outcomes such that the agent ranks lotteries by their expected U-value. Is this function U a correct representation of the agent's desires?
Arguably, yes. Von Neumann's method of measuring utility doesn't work in general, but for our simplistic agent it is OK. Or so one might hope.
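Here is a sketch of that measurement method under the stipulations (a) and (b) above. U($x) is identified with the probability p at which the agent is indifferent between $x for sure and a reference lottery over the best and worst relevant amounts. The agent's answers are simulated with a hidden valuation function of my own choosing; in reality p would be elicited from her choices:

```python
# vNM utility measurement for the simplistic agent: U(x) is the
# probability p making her indifferent between $x for sure and the
# lottery (p, $best; 1 - p, $worst).

from math import sqrt

def measure_utility(x, best, worst, valuation, tol=1e-9):
    """Binary search for the indifference probability p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        lottery_value = p * valuation(best) + (1 - p) * valuation(worst)
        if lottery_value < valuation(x):
            lo = p  # the lottery is dispreferred: raise p
        else:
            hi = p
    return (lo + hi) / 2

# Hidden valuation: suppose she in fact values money by its square root.
print(round(measure_utility(25, best=100, worst=0, valuation=sqrt), 4))  # 0.5
```

The recovered U is a monotonic function of money, which is all the argument requires.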
Let's assume this is correct. To recapitulate, we then know that if our simplistic agent's preferences satisfy the von Neumann and Morgenstern axioms, then she ranks lotteries by their expected utility. By contraposition: if our agent doesn't rank lotteries by expected utility, then her preferences don't satisfy the axioms.
And now we can run the sequential choice arguments to show that failing to satisfy the axioms is a bad idea, for our simplistic agent: the agent will be disposed to incur an avoidable financial loss.
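For one concrete instance (a construction of my own, softer than the general money pumps in the literature cited above): take a rank-dependent agent with utility linear in money and probability weighting w(p) = p^2/(p^2 + (1-p)^2), which violates the independence axiom. In a two-stage problem she plans to take a sure $10 at the second stage but foreseeably switches to a gamble once there; ex ante, she will therefore pay a fee for commitment, an avoidable financial cost:

```python
# A rank-dependent agent (linear utility, weighting function w)
# violates independence and is dynamically inconsistent in a
# two-stage decision problem, so she will pay for commitment.

def w(p):
    return p**2 / (p**2 + (1 - p)**2)   # w(0.5) = 0.5, w(0.25) = 0.1

def rdu(p, high, low=0.0):
    """Rank-dependent value of the lottery (p, high; 1 - p, low)."""
    return low + w(p) * (high - low)

# Stage 1: 50% chance of $0 and the game ends; otherwise a stage-2
# choice between a sure $10 and the gamble (50%, $30; 50%, $0).
plan_safe  = rdu(0.5, 10)    # overall (0.5, $10; 0.5, $0)   -> 5.0
plan_risky = rdu(0.25, 30)   # overall (0.25, $30; 0.75, $0) -> 3.0
print(plan_safe > plan_risky)        # True: ex ante she plans "safe"

at_node_risky = rdu(0.5, 30)         # 15.0 at the stage-2 node
print(at_node_risky > 10)            # True: once there, she switches

# Ex ante, paying a fee f to commit to "safe" is worth it while the
# rdu value of (0.5, $10 - f; 0.5, -$f) exceeds plan_risky, e.g. f = 1:
print(rdu(0.5, 10 - 1, low=-1) > plan_risky)  # True: she buys commitment
```

This is the soft version of the sequential argument, a willingness to pay for commitment; the full money-pump constructions (e.g. in Gustafsson 2022) saddle such agents with outright dominated outcomes.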
Since failing to satisfy the axioms is a bad idea, and failing to rank lotteries by their expected utility entails failing to satisfy the axioms, we can infer that our agent should rank lotteries by their expected utility.
At this point, we're still stuck with monetary lotteries. The general expected utility rule that we want to defend isn't a rule about lotteries. It isn't a rule for "decision-making under risk", and it only entails such a rule for agents who happen not to care about certain non-local things like fairness or integrity or predictability.
I can only see an abductive argument that takes us beyond lotteries.
Suppose some rule R is the correct general decision rule. In particular, then, R is the correct rule by which our simplistic agent should evaluate monetary lotteries. But we have established (or so we hope) that our simplistic agent should evaluate monetary lotteries by the EU rule. The rule R must therefore coincide with the EU rule when it comes to the evaluation of monetary lotteries by simplistic agents. This is enough to rule out a lot of prominent alternatives to the EU rule. It obviously doesn't show that the R rule is the EU rule. But the EU rule is plausibly the simplest rule that passes this test.
There are a lot of details here that would need to be filled in. Has anyone tried to develop this line of thought?
Apparently (e.g. 10.1111/tops.12506), 'classical rationality poses a computationally intractable problem; that is, there exists no general tractable (polynomial-time) process that, for any given situation, selects an action with maximum expected utility...this setback has been used to argue against CR and to motivate alternative approaches to rationality.'
People have complained that expected utility theory mixes normativity and description. On the descriptive side, it is not clear to me whether a more concave or a more convex utility curve is more or less rational for lotteries in which one has a nonzero probability of dying... cf. Harsanyi's utilitarianism.