Counterexamples to Good's Theorem
Good (1967) famously "proved" that the expected utility of an informed decision is always at least as great as the expected utility of an uninformed decision. The conclusion is clearly false. Let's have a look at the proof and its presuppositions.
Suppose you can either perform one of the acts A1…An now, or learn the answer to some question E and afterwards perform one of A1…An. Good argues that the second option is always at least as good as the first. The supposed proof goes as follows.
We assume a Savage-style formulation of decision theory. For any act Aj, the expected utility of choosing Aj is
\[(1)\quad EU(A_{j}) = \sum_{i} Cr(S_{i}) U(A_{j} \land S_{i}),\]
where Si ranges over a suitable partition of states. The value of choosing Aj after learning Ek (an answer to E) is
\[(2)\quad EU(A_{j}/E_{k}) = \sum_{i} Cr(S_{i}/E_{k}) U(A_{j} \land S_{i}).\]
Here we assume that the truth of Ek does not affect the value of choosing Aj in state Si, so that U(Aj ∧ Si ∧ Ek) = U(Aj ∧ Si).
Since Cr(Si) = ∑k Cr(Si/Ek) Cr(Ek), we can rewrite (1) as
\[\begin{align*} (3)\quad EU(A_{j}) &= \sum_{i}\sum_{k} Cr(S_{i}/E_{k}) Cr(E_{k}) U(A_{j} \land S_{i})\\ &= \sum_{k} Cr(E_{k}) EU(A_{j}/E_{k}). \end{align*} \]
If you choose between A1…An now, you will choose an act Aj that maximises EU. Thus, where 'N' stands for choosing between A1…An now,
\[\begin{align*} (4)\quad EU(N) &= \max_{j} \sum_{i}\sum_{k} Cr(S_{i}/E_{k}) Cr(E_{k}) U(A_{j} \land S_{i})\\ &= \max_{j} \sum_{k} Cr(E_{k}) EU(A_{j}/E_{k}). \end{align*} \]What if instead you choose to first learn ('L') the answer to E? Let's assume that you will afterwards choose an act that maximises your posterior expected utility. You don't know which act that is, but we can compute its expected value by averaging over the possible learning events:
\[(5)\quad EU(L) = \sum_{k} Cr(E_{k}) \max_{j} EU(A_{j}/E_{k}).\]
Now compare (4) and (5). EU(N) is the maximum of a weighted average of the EU(Aj/Ek); EU(L) is the corresponding weighted average of the maxima. The average of the maxima is always at least as great as the maximum of the averages. So EU(L) ≥ EU(N). QED.
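To see the inequality at work, here is a toy illustration with made-up numbers (not from Good): two acts, two equiprobable answers, with EU(A1/E1) = EU(A2/E2) = 2 and EU(A1/E2) = EU(A2/E1) = 0. Then
\[\begin{align*} EU(N) &= \max_{j} \sum_{k} Cr(E_{k}) EU(A_{j}/E_{k}) = \max\{1, 1\} = 1,\\ EU(L) &= \sum_{k} Cr(E_{k}) \max_{j} EU(A_{j}/E_{k}) = \tfrac{1}{2}\cdot 2 + \tfrac{1}{2}\cdot 2 = 2. \end{align*} \]Tailoring the act to the answer can only help, as long as nothing else is going on.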
As noted, the argument assumes that the answer to E is certain to make no difference to the value of performing any act Aj in any state Si, so that U(Aj ∧ Si ∧ Ek) = U(Aj ∧ Si). The argument also assumes that if you choose L then your future self is certain to follow the standard norms of Bayesian rationality: she updates by conditionalisation and maximises expected utility. Let's grant these assumptions.
Even so, the conclusion is not always true. Four counterexamples:
Crime Novel. You have a choice between reading a crime novel and reading a biography. You prefer the crime novel because you like the suspense. Before you make your choice, you have the option of finding out who the villain in the crime novel is (by reading a plot summary), which would spoil the novel for you. After getting the information, you would rather read the biography. You rationally prefer the uninformed choice between the two books over the more informed choice. (Adapted from Bradley and Steele 2016.)
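With some made-up utilities (not part of the original case), the structure is easy to see: say the unspoiled crime novel is worth 10 to you, the biography 6, and the spoiled crime novel 4. Then
\[EU(N) = 10 > EU(L) = 6,\]since after the spoiler you fall back on the biography.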
Rain. You have a choice between taking a black box and taking a white box. Before you make your choice you may look through the window and check whether it is raining. A reliable predictor has made a prediction about whether you would look outside the window. If she predicted that you would look, she has put $1 into the black box and $0 into the white box. If she predicted that you would not look, she has put $2 into the white box and $0 into the black box. You are 50% confident that you were predicted to look outside the window. You rationally prefer the uninformed choice.
Here is why (assuming CDT). If you don't look outside the window, you will take the white box, with an expected payoff of $1, compared to $0.50 for the black box. If you do look outside the window, you will become confident that you were predicted to look outside the window; as a result, you will take the black box, with an expected payoff, from your current standpoint, of $0.50.
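Spelled out, holding fixed your 50% credence that you were predicted to look (as CDT requires, since your choice has no causal influence on the prediction):
\[\begin{align*} EU(N) &= \max\{\,0.5 \cdot \$1 + 0.5 \cdot \$0,\;\; 0.5 \cdot \$0 + 0.5 \cdot \$2\,\} = \$1 \quad \text{(take the white box)},\\ EU(L) &= 0.5 \cdot \$1 + 0.5 \cdot \$0 = \$0.50 \quad \text{(you end up taking the black box)}. \end{align*} \]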
Middle Knowledge. Once again you have a choice between taking a black box and taking a white box. A psychologist has figured out what you are disposed to do in this kind of choice situation, where you have no evidence that favours one of the boxes over the other. She has put $1 into whichever box she thinks you would take, and $0 into the other box. Unbeknownst to the psychologist, I have observed what she put into the boxes. I slip you a piece of paper on which I claim to have written down the colour of the box with the $1. You are 90% confident that what I have written down is true. You rationally prefer not to read my note.
Here is why (assuming CDT). If you don't read my note, you can expect to get $1. If you do read my note, you will take the box whose colour is written on the note. Since there's a 90% chance that this box contains $1, the expected payoff is $0.90.
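In numbers, treating the psychologist as effectively infallible:
\[EU(N) \approx \$1, \qquad EU(L) = 0.9 \cdot \$1 + 0.1 \cdot \$0 = \$0.90.\]Reading the note makes your choice track my 90%-reliable report rather than her (near-perfect) knowledge of what you would do.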
Newcomb Revelation. You are facing the standard Newcomb Problem. Before you make your choice, you have the option of looking inside the opaque box. The predictor knew that you would be given this offer, and has factored your response into her prediction. EDT says that you should reject the offer.
This is only a counterexample to Good's Theorem if we assume EDT. But like CDT, EDT can be formulated in Savage's framework. We only have to stipulate that the states in a properly formulated decision problem are probabilistically independent of the acts. In Newcomb's Problem, a suitable partition of states is { prediction accurate, prediction inaccurate }. Good's "proof" does not seem to rely on a causal construal of the states.
To be fair, one might argue that this case violates the assumption that U(Aj ∧ Si ∧ Ek) equals U(Aj ∧ Si). But this isn't the only problem. Consider equation (5) in the above proof.
In Newcomb Revelation, this says that EU(L) = Cr(full) EU(two-box/full) + Cr(empty) EU(two-box/empty), assuming that conditional on either observation, two-boxing maximises expected utility. But suppose you (as the agent in Newcomb Revelation) are convinced that you will one-box and that you will reject the offer to look inside the opaque box. So Cr(full) is close to 1. And evidently EU(two-box/full) is $1,001,000. By (5), the expected utility of looking inside the opaque box is therefore close to $1,001,000. That's clearly wrong.
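With the usual Newcomb payoffs ($1,000,000 in the opaque box if it is full, $1,000 in the transparent box), (5) yields roughly
\[EU(L) \approx 1 \cdot \$1{,}001{,}000 + 0 \cdot \$1{,}000 = \$1{,}001{,}000,\]whereas, by EDT's lights, choosing to peek is strong evidence that you were predicted to two-box, so the value of peeking should presumably be closer to $1,000.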
More generally, equation (5) in the above proof simply isn't an application of Savage-style decision theory. It is a hand-wavy shortcut.
Skyrms (1990) argues that one can patch up Good's proof if one assumes that the states are causal dependency hypotheses, but his argument still looks hand-wavy to me, and the other counterexamples suggest that it is fallacious.
Let's see how far we can get if we use a suppositional formulation of decision theory.
Let { Oi } be a partition of "value-level propositions" as in (Lewis 1981). Intuitively, the members of this partition settle everything the agent ultimately cares about. In suppositional formulations of decision theory, the expected utility of an act A is given by
\[EU(A) = \sum_{i} Cr^{A}(O_{i}) V(O_{i}),\]
where CrA(Oi) is the probability of Oi on the supposition A. The relevant type of supposition might be "indicative" (yielding EDT) or "subjunctive" (yielding CDT).
Now let's evaluate the two options. First, you might choose directly between A1…An, without first learning the answer to E. This is the option we called N. We assume that your future self will choose an option that maximises expected utility, and that your basic (uncentred) desires don't change. Let's also assume that these assumptions are resilient under suppositions. Thus we can assume that on the supposition that you choose N you will afterwards choose an option from A1…An that maximises (posterior) expected utility. This suggests that the expected utility of N is
\[(1')\quad EU(N) = \max_{j} EU^{N}(A_{j}),\]
where EUN(Aj) is the expected utility of Aj computed relative to CrN. I'll return to this assumption below. Let's stick with it for now. By definition,
\[(2')\quad EU^{N}(A_{j}) = \sum_{i} (Cr^{N})^{A_{j}}(O_{i}) V(O_{i}).\]
Since (CrN)Aj(Oi) = ∑k (CrN)Aj(Oi/Ek)(CrN)Aj(Ek), we can expand (2') into
\[\begin{align*} (3')\quad EU^{N}(A_{j}) &= \sum_{i}\sum_{k} (Cr^{N})^{A_{j}}(O_{i}/E_{k})(Cr^{N})^{A_{j}}(E_{k}) V(O_{i})\\ &= \sum_{k} (Cr^{N})^{A_{j}}(E_{k}) \sum_{i}(Cr^{N})^{A_{j}}(O_{i}/E_{k}) V(O_{i})\\ &= \sum_{k} (Cr^{N})^{A_{j}}(E_{k}) EU^{N}(A_{j}/E_{k}), \end{align*} \]where EUN(Aj/Ek) is defined as ∑i (CrN)Aj(Oi/Ek) V(Oi). Plugging this into (1'), we have
\[\begin{align*} (4')\quad EU(N) &= \max_{j} \sum_{k} (Cr^{N})^{A_{j}}(E_{k}) EU^{N}(A_{j}/E_{k})\\ &= \max_{j} \sum_{k} (Cr^{N})^{A_{j}}(E_{k}) \sum_{i}(Cr^{N})^{A_{j}}(O_{i}/E_{k}) V(O_{i}). \end{align*} \]Alternatively, you might delay your choice between A1…An until after you've learned the answer to E. To begin, we have
\[(5')\quad EU(L) = \sum_{i} Cr^{L}(O_{i}) V(O_{i}).\]
As before, probability theory allows expanding and rearranging:
\[\begin{align*} (6')\quad EU(L) &= \sum_{i}\sum_{k} Cr^{L}(O_{i}/E_{k}) Cr^{L}(E_{k}) V(O_{i})\\ &= \sum_{k }Cr^{L}(E_{k}) \sum_{i} Cr^{L}(O_{i}/E_{k}) V(O_{i}). \end{align*} \]∑i CrL(Oi/Ek) V(Oi) is the "desirability" of Ek from the perspective of CrL, understood as in (Jeffrey 1983). We assume that CrL is concentrated on worlds at which you are going to choose an act from A1…An that maximises expected utility after learning the true answer to E. We also assume that the answer to E is all you learn. The desirability of Ek from the perspective of CrL then equals maxj EUL(Aj / Ek), where EUL(Aj / Ek) is the expected utility of Aj computed relative to the probability function (CrL)Ek that comes from Cr by first supposing L and then conditioning on Ek. Plugging this into (6') yields
\[\begin{align*} (7')\quad EU(L) &= \sum_{k }Cr^{L}(E_{k}) \max_{j} EU^{L}(A_{j} / E_{k})\\ &= \sum_{k }Cr^{L}(E_{k}) \max_{j} \sum_{i} ((Cr^{L})_{E_{k}})^{A_{j}}(O_{i}) V(O_{i}). \end{align*} \](4') and (7') resemble (4) and (5) in Good's proof. (4') is the maximum of an average, (7') is the average of some maxima. But the subscripts and superscripts are different. To infer that L is at least as good as N, we need the following two assumptions:
\[\begin{align*} (i) \quad&(Cr^{N})^{A_{j}}(E_{k}) = Cr^{L}(E_{k}), \text{ for all }A_{j}, E_{k}.\\ (ii) \quad&(Cr^{N})^{A_{j}}(O_{i}/E_{k}) = ((Cr^{L})_{E_{k}})^{A_{j}}(O_{i}), \text{ for all }O_{i}, A_{j}, E_{k}. \end{align*} \]Continuing to use superscripts for supposition and subscripts for conditioning, we can rewrite (ii) as
\[((Cr^{N})^{A_{j}})_{E_{k}}(O_{i}) = ((Cr^{L})_{E_{k}})^{A_{j}}(O_{i}), \text{ for all }O_{i}, A_{j}, E_{k}.\]
In EDT, the relevant kind of supposition is conditioning, so we can simplify:
\[\begin{align*} (i_{E})\quad &Cr(E_{k} / N \land A_{j}) = Cr(E_{k} / L),\text{ for all }A_{j}, E_{k}.\\ (ii_{E})\quad &Cr(O_{i} / N \land A_{j} \land E_{k}) = Cr(O_{i} / L \land A_{j} \land E_{k}),\text{ for all }O_{i}, A_{j}, E_{k}. \end{align*} \]Condition (iE) is violated in Newcomb Revelation. Here the probability of the opaque box being empty is low conditional on one-boxing without peeking, but it is high conditional on peeking.
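In symbols, taking Ek to be the proposition that the opaque box is empty, and assuming a near-perfect predictor who expects peekers to two-box:
\[Cr(\text{empty} / N \land \text{one-box}) \approx 0, \qquad Cr(\text{empty} / L) \approx 1.\]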
CDT does not allow simplifying the two conditions, at least not without further assumptions.
(i) is fairly easy to understand. It says that the probability of the various answers Ek does not "causally" depend on your choice(s). This is violated in the Rain scenario.
(ii) is hard to understand. In normal cases, however, the order of the operations will make little difference. So we can approximately paraphrase (ii) as follows:
You are as likely to get a certain amount of utility by choosing Aj after finding out Ek as by choosing Aj without finding out Ek.
(Here 'without finding out Ek' is meant to imply, as it does in English, that Ek is true.) This condition is obviously violated in the Crime Novel case.
Unfortunately, my "proof" still relies on some further assumptions, besides the assumptions of diachronic rationality.
One assumption was smuggled into (1'):
\[(1')\quad EU(N) = \max_{j} EU^{N}(A_{j}).\]
In effect, this assumes that \( EU(N) = EU(N \land \hat{A}) \), where \( \hat{A} \) is an act that maximises expected utility on the supposition that N. Without this assumption, I don't know how to get the proof off the ground. In EDT, the assumption is harmless, but in CDT it can fail. It fails in Middle Knowledge.
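Here is the rough arithmetic, on my reading of the case (with an effectively infallible psychologist and a 50/50 prior about where the money is). On the supposition that you choose N, you are confident that whichever box you end up taking is the one with the $1, so EU(N) ≈ $1. But for CDT, neither particular act does that well under the supposition, since taking a box has no causal influence on where the money is:
\[EU^{N}(\text{black}) = EU^{N}(\text{white}) = 0.5 \cdot \$1 = \$0.50, \quad\text{so}\quad EU(N \land \hat{A}) = \$0.50 < EU(N).\]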
Another problematic assumption in both my "proof" and in Good's is that the possible propositions you might learn form a partition. To see why this matters, return to the Crime Novel scenario.
Let's construe the relevant states as somewhat coarse-grained "dependency hypotheses". If you plan not to learn about the plot then most of your credence goes to a state S1 in which the act of reading the crime novel would bring about a highly desirable experience while the act of reading the biography would bring about a moderately desirable experience. If Ek is a summary of the crime novel's plot, then most of your credence conditional on Ek still goes to S1. Your enjoyment depends on not knowing the villain, but not on who the villain is. So Cr(S1/Ek) is high, for all relevant Ek. After you've learned Ek, however, Cr(S1) is low. You no longer believe that reading the crime novel would be a great experience.
Since Cr(S1/Ek) is not equal to Cr(S1), finding out about the plot is not adequately modelled as conditioning on Ek. The problem is that if you find out about the plot, you not only learn Ek, but also that you know Ek. It is this knowledge (or belief) that breaks the connection between reading the crime novel and having a great experience. Conditional on knowing or believing Ek, your credence in S1 is low.
Since we want to model your learning event in terms of conditioning, we have to make sure that the propositions { Ek } include everything you might learn if you chose L. In the Crime Novel case, each member of { Ek } should specify (a) a plot and (b) that you believe that this is the plot. But then { Ek } no longer forms a partition. Every element of { Ek } now implies that you won't enjoy the novel because you think you already know the villain's identity.
There is nothing special here about the Crime Novel case. In realistic cases, the answers to E will never form a partition, if we assume that learning the answer goes by conditioning on the answer.
In the Crime Novel case, for example, can't we just describe the uninformed option as a different act? The act in the first instance is reading a crime novel and thereby discovering a new fact in a fun way. But if the experience of reading the novel were the same either way, learning the plot in advance would not be bad.