Mike Titelbaum on Shifting and Sleeping Beauty
In the last entry, I suggested that
(EEP) P_2(A) = P_1(+A|+E)
is a sensible rule for updating self-locating beliefs. Here, E is the total evidence received at time 2 (the time of P_2), and '+' denotes a function that shifts the evaluation index of propositions, much like 'in 5 minutes': '+A' is true at a centered world w iff A is true at the next point from w where new information is received. (EEP) therefore says that upon learning E, your new credence in any proposition A should equal your previous conditional credence that A will obtain at the next point when information comes in, given that this information is E.
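To make this concrete, here is a minimal toy model of '+' and (EEP) in Python. The representation is my own and purely illustrative: a world is a finite sequence of stages, each stage recording the total evidence received there (None if no information arrives) and the set of propositions true there; a centered world pairs a world with a stage index. This is a sketch of one way the semantics could be implemented, not an official formulation.

    from fractions import Fraction

    # Stage = (total_evidence, truths): total_evidence is a string, or None if
    # no new information arrives at that stage; truths is the set of true
    # propositions. A centered world is a pair (world, index).

    def next_info(world, i):
        """The next index after i at which information is received (None if none)."""
        return next((j for j in range(i + 1, len(world))
                     if world[j][0] is not None), None)

    def plus_holds(world, i, A):
        """'+A' is true at (world, i) iff A is true at the next information point."""
        j = next_info(world, i)
        return j is not None and A in world[j][1]

    def plus_evidence(world, i, E):
        """'+E' is true at (world, i) iff the total evidence at that point is E."""
        j = next_info(world, i)
        return j is not None and world[j][0] == E

    def eep(prior, E, A):
        """(EEP): P_2(A) = P_1(+A | +E). prior maps centered worlds to weights."""
        on_E = {cw: p for cw, p in prior.items()
                if plus_evidence(cw[0], cw[1], E)}
        return sum(p for cw, p in on_E.items()
                   if plus_holds(cw[0], cw[1], A)) / sum(on_E.values())

    # Two equally likely worlds; in both, evidence 'dark' arrives at the next
    # stage, but only in the first does 'rain' hold there.
    w1 = ((None, frozenset()), ('dark', frozenset({'rain'})))
    w2 = ((None, frozenset()), ('dark', frozenset()))
    prior = {(w1, 0): Fraction(1, 2), (w2, 0): Fraction(1, 2)}
    print(eep(prior, 'dark', 'rain'))  # 1/2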
(EEP) belongs to a family of rules that one might call 'shifting rules'. The general pattern is
P_2(A) = P_1(f(A) | f(E)),
with f shifting the evaluation index of the embedded proposition.
Mike Titelbaum also advocates a shifting rule, but without a fixed shifting function f. Instead, different functions are used for different cases. For each case, we first have to find a context-insensitive term 'Z' such that the agent is certain at time 2 that 'Z is now' is true. Then we apply the rule
(M) P_2(A) = P_1(A at Z | E at Z).
For example, suppose I fall asleep during a talk at 2:30. When I wake up, I see that it is 2:50. My new credence in any proposition A should then equal my 2:30 credence in 'A at 2:50' conditional on 'E at 2:50', where E is all the information I receive at 2:50. Thus if I notice that the talk hasn't finished yet, my new credence in the talk going over time should equal my old conditional credence in it going over time given that it won't have finished by 2:50.
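To illustrate with made-up numbers (nothing here is from Mike's paper): suppose my 2:30 credence is spread evenly over four possible end times for the talk, which is scheduled to end at 3:00. A quick Python sketch of the (M) update:

    from fractions import Fraction

    # End times in minutes past 2:00; the talk is scheduled to end at 3:00 (= 60).
    prior = {45: Fraction(1, 4), 55: Fraction(1, 4),
             65: Fraction(1, 4), 75: Fraction(1, 4)}

    # Part of 'E at 2:50': the talk hasn't finished by 2:50 (= minute 50).
    given_E = {end: p for end, p in prior.items() if end > 50}

    # (M): P_2(over time) = P_1(over time at 2:50 | E at 2:50).
    p_over = sum(p for end, p in given_E.items() if end > 60) / sum(given_E.values())
    print(p_over)  # 2/3: two of the three unfinished scenarios run past 3:00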
(I should mention that the way I present Mike's proposal differs from the way he himself presents it in his paper. For present purposes, the differences don't matter.)
In this example, Mike's rule (M) and my rule (EEP) yield the same result. Not so in other cases.
Let's look at Sleeping Beauty. In order to easily apply (M), we assume that Beauty is confident that if she is awakened on both Monday and Tuesday, then her awakenings will not be subjectively indistinguishable: perhaps a dog will bark on Monday and not on Tuesday, or perhaps the light will be slightly dimmer on one of the days, etc. (Of course she doesn't know in advance on which day the dog will bark or the light will be dimmer.)
Let E be her total evidence when she awakens on Monday, and let Z = 'the time when my total evidence is E'. Beauty is certain that Z is now, so we can use (M) to get
P_2(Heads) = P_1(Heads at Z | E at Z).
Our assumption ensures that there can't be two times when the total evidence is E. But what if there is none? That is, should we regard 'E at Z' as true or as false at worlds where Z never obtains? The answer is 'as false'. On the alternative, vacuous-truth reading, 'A at Z' and 'E at Z' would both come out true at every world where Z never occurs; P_1(A at Z | E at Z) would then be high for every A whatsoever in cases where P_1(at some point Z) is low, and P_2 wouldn't be a probability function.
So 'E at Z' is false at any world where Z never occurs. Likewise, 'Heads at Z' is true only at worlds where the coin lands heads and Z occurs at some point. So we have
P_2(Heads) = P_1(Heads & at some point Z | at some point Z),
which reduces to
P_2(Heads) = P_1(Heads | at some point Z).
Recall that Z = 'the time when my total evidence is E'. And E is Beauty's total evidence on Monday, including her perception of the lab and the barking dog, her memories of the setup, the absence of any memories from later than Sunday, and so on. On Sunday, Beauty was confident that if Z occurs at all, then it must occur either on Heads-Monday, Tails-Monday or Tails-Tuesday. But she wasn't certain that Z would occur. For all she knew, the coin could have landed heads and she wouldn't hear a dog upon awakening on Monday. Or the coin could have landed tails and she wouldn't hear a dog on either awakening. Since there are two occasions where Z might occur on Tails, but only one on Heads, the probability for Tails given that Z occurs at some point is higher than the probability for Heads given that Z occurs at some point. By our instance of (M), P_2(Heads) is therefore less than P_1(Heads). And given that P_1(Heads) = 1/2, P_2(Heads) will most likely be 1/3.
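To see where the 1/3 comes from, here is a back-of-the-envelope version of the calculation, under a simplifying assumption of my own: each awakening has the same small marginal chance p of presenting exactly the evidence E, and on Tails at most one of the two (distinguishable) awakenings can be the E-awakening. In this perfectly symmetric model the answer is exactly 1/3 for any admissible p; asymmetries in the prior would move it around, which is why the result is only 'most likely' 1/3.

    from fractions import Fraction

    def p2_heads(p):
        prior_heads = Fraction(1, 2)
        z_given_heads = p        # one awakening that might match E
        z_given_tails = 2 * p    # two mutually exclusive chances to match E
        # Our instance of (M): P_2(Heads) = P_1(Heads | at some point Z)
        return (prior_heads * z_given_heads) / (
            prior_heads * z_given_heads + (1 - prior_heads) * z_given_tails)

    print(p2_heads(Fraction(1, 100)))  # 1/3 -- and the same for any p <= 1/2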
Mike's rule (M) leads to thirding, my rule (EEP) to halfing. Whence the difference?
The reason is that my shifting operator '+' never shifts the evaluation index into the distant future. If +A holds at some point, then A must hold at the very next point where any information arrives. By contrast, Mike's evaluation point Z may well occur long after time 1, and after lots of other things have been learned. Sleeping Beauty, for instance, assigns substantial credence to possibilities where Z occurs on Tuesday, two days away.
How can we decide between far-reaching shifting operators such as Mike's 'at Z' and short-reaching operators such as my '+'? We won't find an answer by staring at Sleeping Beauty. But the difference shows up in other cases as well.
Consider the theory T which holds that we will be subject to every humanly possible experience exactly once in our lifetime. (T probably entails that we live forever.) Let E be an experience that you might have tomorrow morning. All T possibilities, but only some ~T possibilities, are such that E will be experienced at some point. So when you experience E tomorrow, your new probability for T according to (M) should equal your previous probability for T conditional on the assumption that E is experienced at some point. So E raises the probability of T. The same obviously applies to any other experience you will ever have. According to (M), your confidence in T should steadily rise. (EEP), on the other hand, discards occurrences of E in the distant future and therefore avoids this conclusion. It seems to me that (EEP) gets it right.
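A toy calculation of how this plays out, with invented numbers: suppose each experience you have would, if ~T were true, only have had a 50% chance of ever occurring. Then each (M)-update doubles the odds for T:

    from fractions import Fraction

    # Assume: P(E_i is experienced at some point | T) = 1, and
    #         P(E_i is experienced at some point | ~T) = q  (a made-up value).
    def m_update(p_T, q):
        return p_T / (p_T + (1 - p_T) * q)   # Bayes on 'E_i at some point'

    p_T = Fraction(1, 1000)                  # start nearly certain that T is false
    for _ in range(20):                      # twenty experiences, twenty updates
        p_T = m_update(p_T, Fraction(1, 2))
    print(float(p_T))                        # ~0.999: T has become near-certain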
Apart from delivering the intuitively correct result in cases like this, short-reaching shifting functions are also supported by theoretical considerations. To see why, note that shifting rules are only plausible if the agent does not receive relevant information at times in between time 1 and time 2 (the times of P_1 and P_2). If you receive strong evidence for A at some intermediate time, then your probability for A at time 2 should be sensitive to this evidence. Mike tries to avoid this problem by aggregating evidence over times: in his model, the evidence at time 2 effectively consists of everything that is certain at time 2, no matter when it was first learned. But this doesn't always help: centered evidence can affect the probability of uncentered propositions without making any of them certain. Since the centered evidence itself quickly becomes false, there is no guarantee that we can afterwards reconstruct it from propositions that are then certain.
For example, consider Beauty's credence in Heads on Wednesday, when she knows that the experiment is over. No matter what happened earlier, she will at this point have memories of the setup and of exactly one post-Sunday awakening. Let E contain everything of which she is now (on Wednesday) certain. Setting time 1 = Sunday and time 2 = Wednesday, (M) says that her new credence in Heads should equal her Sunday credence in Heads conditional on E obtaining on Wednesday. Since E contains no clues about the outcome of the coin toss, this is 1/2. On the other hand, setting time 1 = Monday and time 2 = Wednesday, her new credence in Heads should also equal her Monday credence in Heads conditional on E obtaining on Wednesday, which is 1/3. So if we allow time 1 and time 2 to be separated by arbitrary intervals, we get contradictory results.
Suppose then that shifting rules are restricted to cases where no relevant information is obtained between time 1 and time 2. (This is at any rate a restriction I want for (EEP).) Then it is clear why the shifting function ought to shift no further than to the next point where information comes in: this makes optimal use of the centered information contained in the old probabilities. Far-reaching shifting functions share the problems of inner-world indifference principles that recommend always distributing one's credence evenly among all possibilities within a universe that are compatible with the present evidence. Such principles cause severe and unnecessary information loss: if you know that somewhere in a distant galaxy there are lots of brains in a vat all of which believe that it is Saturday, then you may be certain that you live on Earth today (on Friday); but you must give up this belief tomorrow, even if you have learned nothing that would undermine the belief. Far-reaching shifting rules at least restrict themselves to possibilities in the future of the relevant subject -- ignoring possibilities in the past or possibilities in the lifetime of other people. But the problem remains for possibilities in the distant future.
So far, I have assumed that if the coin lands tails, then Beauty's Tuesday awakening follows her Monday awakening. But, as I mentioned in the last posting, I think the story might be better understood as a case of branching, where both the Monday and the Tuesday awakening directly follow Beauty's Sunday state. In this case, Tails-Tuesday is just as close as Tails-Monday, and my complaint about reaching too far in the future doesn't apply. What does (M) say about this version?
As before, let E be Beauty's total evidence on Monday and Z = 'the time when my total evidence is E'. According to (M),
P_2(Heads) = P_1(Heads at Z | E at Z).
How shall we read 'E at Z' if the future contains one branch with E and another without? Let's say it is true. In general, let's read 'X at Z' as true iff there is at least one branch with X & Z. (I think this is how Mike effectively reads it. And it doesn't matter because any reading leads to trouble.) Then we get
P_2(Heads) = P_1(Heads | at some point on some branch Z).
Since the Tails possibilities have two branches and thereby two opportunities for Z, and the Heads possibilities only one, the probability for 'at some point on some branch Z' is higher given Tails than given Heads. So P_2(Heads) < P_1(Heads), and under plausible assumptions we again get P_2(Heads) = 1/3.
Is this a reasonable treatment of branching? I think not. Suppose you accept the 'many-worlds' interpretation of quantum mechanics on which a branching occurs at every coin toss. Suppose you also know that a certain coin is biased 100:1, but you don't know whether its bias is towards heads or tails. You toss it and it lands heads. The biased-towards-heads hypothesis entails that the coin will land heads on many branches (or on branches with high amplitude); the biased-towards-tails hypothesis entails that it will land heads on few branches (or on branches with low amplitude). Either way, the coin is predicted to land heads on some branches. Hence according to (M), P_2(biased-towards-heads) = P_1(biased-towards-heads | at some point on some branch Heads) = P_1(biased-towards-heads). The coin toss leaves you as undecided about the bias as you were before. This seems wrong.
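For concreteness, here is the contrast in numbers (my own toy rendering of the case):

    from fractions import Fraction

    # The 100:1 bias from the example, with an even prior over the two hypotheses.
    prior = {'heads-bias': Fraction(1, 2), 'tails-bias': Fraction(1, 2)}
    chance_heads = {'heads-bias': Fraction(100, 101),
                    'tails-bias': Fraction(1, 101)}

    # (M): condition on 'heads on some branch', which is true on both
    # hypotheses, so the update changes nothing.
    m_posterior = prior['heads-bias']  # stays 1/2

    # Ordinary conditioning on the observed heads outcome, weighting by chance:
    num = prior['heads-bias'] * chance_heads['heads-bias']
    den = sum(prior[h] * chance_heads[h] for h in prior)
    print(m_posterior, num / den)  # 1/2 vs 100/101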
Branching cases are tricky, and I think a satisfactory treatment requires leaving the framework of simple shifting rules altogether.
I have complained that (M) gives the wrong result in far-reaching cases and in branching cases. I would also complain that it gives no result at all in other cases: it falls silent if there is no context-insensitive term 'Z' for which the agent is certain that 'Z is now' is true. In Sleeping Beauty, this happens if there is a positive probability for the Tails-Monday experience being indistinguishable from the Tails-Tuesday experience. It seems to me that rational agents should always assign positive credence to worlds where some point in the future is indistinguishable from the present, so strictly speaking, (M) is never applicable to rational agents.
Here is one thing I'm puzzled about. (M), like (EEP), is an external, diachronic rule: it says what an agent's later credence should be given their earlier credence and their new evidence. It does not say what the later credence should be given certain beliefs about the earlier credence and the new evidence. For agents who always remember their earlier credence, the external rules can, however, be recovered from the corresponding expert principles
(EP) P_2(A | P_1(+A|+E) = x & E) = x
and
(M*) P_2(A | P_1(A at Z|E at Z) = x & E & Z is now) = x.
My reasons for preferring (EEP) over (M) carry over to (EP) versus (M*). And I do think that (EP) is a plausible constraint on rational credence. Nevertheless, I would like to defend (EEP) even in cases where it cannot be recovered from (EP). In discussion, Mike has called this 'crazy'. But isn't (M) just as external and diachronic as (EP)? Perhaps Mike wants to restrict it to cases where it can be recovered from (M*), in which case he doesn't really want to offer a diachronic update rule, but a synchronic expert rule? Not sure. Anyway, this question is probably orthogonal to the choice of shifting rules.
Very interesting!
I'm glad to see that David Lewis has a good heir.
I'm not glad to see that thirders often neglect halfer writers. Bradley's work and White's paper are known, but Jenkins, Leslie, Franceschi, Meacham, Bostrom...
Sorry for my bad English :)
A French Bostrom advocate (and it's not easy!)