Inadmissible evidence in Bayesian Confirmation Theory
Suppose we are testing statistical models of some physical process -- a certain type of coin toss, say. One of the models in question holds that the probability of heads on each toss is 1/2; another holds that the probability is 1/4. We set up a long run of trials and observe about 50 percent heads. One would hope that this confirms the model according to which the probability of heads is 1/2 over the alternative.
(Subjective) Bayesian confirmation theory says that some evidence E supports some hypothesis H for some agent to the extent that the agent's rational credence C in the hypothesis is increased by the evidence, so that C(H/E) > C(H). We can now verify that observing 500 heads in 1000 tosses strongly confirms that the coin is fair, as follows.
Let H be the hypothesis that the probability of heads on each toss is 1/2, H' the hypothesis that the probability is 1/4, and E the observation of 500 heads in 1000 trials. By Bayes' Theorem, C(H/E) = C(E/H) C(H) / C(E). By some form of the Principal Principle, C(E/H) = (1000 choose 500) (1/2)^1000. By comparison, C(E/H') = (1000 choose 500) (1/4)^500 (3/4)^500, which is much smaller than C(E/H). Hence C(E/H')/C(E) is also much smaller than C(E/H)/C(E), and so H is confirmed much more strongly than H' (which in fact is almost certainly disconfirmed, given how close C(E/H') is to zero).
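To get a feel for the magnitudes, here is a minimal numerical sketch (the function name log_likelihood is just for illustration):

```python
# Likelihood of E = 500 heads in 1000 tosses under H (chance of heads 1/2)
# and under H' (chance of heads 1/4), computed in log space to avoid underflow.
from math import comb, log, exp

n, k = 1000, 500

def log_likelihood(p):
    """log C(E/.) when the hypothesized chance of heads on each toss is p."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

# Bayes factor C(E/H) / C(E/H') = (4/3)^500, roughly 3e62 in favour of H.
print(exp(log_likelihood(1/2) - log_likelihood(1/4)))
```

So although both likelihoods are tiny in absolute terms, C(E/H) exceeds C(E/H') by a factor of about (4/3)^500.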
So far so good. But we know that the above form of the Principal Principle is not generally valid: it is not always the case that one's rational credence in an event, on the supposition that its physical probability is x, should equal x. There are at least three types of counterexample. First, an agent might have "inadmissible evidence". Second, the evidence might "undermine" the hypothesis (in the sense of "undermining futures" in Lewis 1994). Third, self-locating uncertainty might make it impossible to satisfy the Principal Principle. Let's look at the case of inadmissible evidence first.
Suppose that for each trial of our coin toss we happen to know the exact microstate, so that we can predict the outcome. So if the first outcome is heads then we already knew that it would be heads; the subjective likelihood C(heads/H) is not 1/2 but 1, as is the likelihood C(heads/H') for H'. Since the likelihoods are equal, conditioning on the outcome leaves the ratio of our credences in H and H' unchanged: we will never be able to confirm H over H'. Yet that's clearly wrong. All else equal, a statistical model is better the closer its probabilistic predictions fit the actual frequencies. If we have a large sample of frequencies and we trust that they are representative, then we have good evidence to favour some models over others. It shouldn't matter if we also happen to have information about the underlying microstates in the sample.
So there appears to be a problem here.
In response, we might try to look at how our credence in H and H' is affected at the time when we acquire the inadmissible information about microstates. Suppose at time t1 we learn that the initial microstate on toss 1 is M1, which determinately leads to heads. If we're lucky, the hypothesis H directly attributes a physical probability to microstates (as in statistical mechanics). We can then use the Principal Principle to evaluate C(M1/H) and thereby, via Bayes' Theorem, C(H/M1). It's true that there will be no further boost to H once we learn about the outcomes. But that seems exactly right, since the outcomes weren't news to us.
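A minimal sketch with made-up numbers: suppose, purely for illustration, that H assigns physical probability 0.002 to the particular microstate M1 and H' assigns 0.001. Then learning M1 confirms H over H', and the subsequent observation of heads adds nothing.

```python
# Illustrative sketch: hypothetical microstate chances under H and H'.
# Assume (hypothetically) Ch_H(M1) = 0.002 and Ch_H'(M1) = 0.001; by the
# Principal Principle these fix the likelihoods C(M1/H) and C(M1/H').
prior = {"H": 0.5, "H'": 0.5}
likelihood_M1 = {"H": 0.002, "H'": 0.001}

evidence = sum(prior[h] * likelihood_M1[h] for h in prior)
posterior = {h: prior[h] * likelihood_M1[h] / evidence for h in prior}
print(posterior)   # H: 2/3, H': 1/3 -- learning M1 confirms H over H'

# M1 determinately leads to heads, so C(heads/H & M1) = C(heads/H' & M1) = 1:
# a further update on the observed outcome leaves these posteriors unchanged.
```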
OK, but what if (1) the hypothesis H does not explicitly assign probability to microstates? And what if (2) we only have partial information about microstates so that we still learn something new from the outcomes?
These problems point towards a (so far unnoticed, I think) gap in the Principal Principle. Suppose microstates M1..M1000 determinately lead to heads and microstates M1001..M2000 to tails, and suppose we initially give equal credence 1/2000 to each microstate. Consider the hypothesis H' that assigns physical probability 0.25 to heads. By the Principal Principle, our credence in M1..M1000, on the supposition of H', should decrease from 0.5 to 0.25. But how should our probabilities be redistributed among the microstates? The Principal Principle doesn't say. Yet surely not every way of redistributing the probabilities that yields probability 0.25 for M1..M1000 is rational. So there is a further norm here that goes beyond the Principal Principle.
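To make the gap vivid, here is a small sketch (illustrative labels only): both of the following credence distributions over the microstates give total probability 0.25 to M1..M1000, so both are compatible with the Principal Principle as stated, yet the second looks plainly irrational.

```python
# Two redistributions that both satisfy Cr(M1..M1000) = 0.25.
uniform = {s: (0.25 / 1000 if s <= 1000 else 0.75 / 1000) for s in range(1, 2001)}
lopsided = {s: (0.25 if s == 1 else 0.0) if s <= 1000 else 0.75 / 1000
            for s in range(1, 2001)}   # dumps everything on M1

for cr in (uniform, lopsided):
    print(sum(p for s, p in cr.items() if s <= 1000))   # 0.25 in both cases
```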
What is that norm? The obvious idea is that the update should take the form of Jeffrey Conditioning. That is, on the supposition that some event E has physical probability x, we should Jeffrey conditionalize on the partition [E, ~E] with weights [x, 1-x].
So what we need is this stronger (non-gappy) Principle: for any initial credence Cr, admissible A, propositions B and E, and any number x,
(PP*) Cr(B / Ch(E)=x & A) = Cr(B/E)x + Cr(B/~E)(1-x).
For B=E, this reduces to Lewis's Principle, since Cr(E/E) = 1 and Cr(E/~E) = 0, so the right-hand side equals x.
With the help of (PP*), we can compute Cr(M1/H), Cr(M1/H'), Cr(M2/H), and so on, even if H does not explicitly assign probabilities to microstates.
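Here is a minimal sketch of that computation on the microstate example above, reading (PP*) as Jeffrey conditioning on [heads, tails]: the total weight of the heads-states is shifted to x and of the tails-states to 1-x, preserving the relative credences within each cell (the function name jeffrey and the integer labels for microstates are just for illustration).

```python
# Jeffrey conditionalization of the uniform 1/2000 credence over microstates
# on the partition [heads-states, tails-states] with weights [x, 1-x].
def jeffrey(prior, partition, weights):
    """Jeffrey-conditionalize `prior` (dict: state -> credence) on `partition`
    (list of sets of states) with the given new cell weights."""
    posterior = {}
    for cell, w in zip(partition, weights):
        cell_mass = sum(prior[s] for s in cell)
        for s in cell:
            posterior[s] = prior[s] * w / cell_mass
    return posterior

states = range(1, 2001)                  # microstates M1..M2000
prior = {s: 1 / 2000 for s in states}    # initial uniform credence
heads = set(range(1, 1001))              # M1..M1000 lead to heads
tails = set(range(1001, 2001))           # M1001..M2000 lead to tails

cr_given_H  = jeffrey(prior, [heads, tails], [0.5, 0.5])    # Ch(heads) = 1/2
cr_given_Hp = jeffrey(prior, [heads, tails], [0.25, 0.75])  # Ch(heads) = 1/4
print(cr_given_H[1], cr_given_Hp[1])     # Cr(M1/H) = 1/2000, Cr(M1/H') = 1/4000
```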
(PP*) also helps a little with problem (2). Suppose all we know is that the microstate for toss 1 is not among M1..M500. This increases our credence in tails. We can use (PP*) to compute the likelihoods Cr(~(M1..M500)/H) and Cr(~(M1..M500)/H'): the latter will plausibly be greater than the former (0.875 vs. 0.75, if credence is spread evenly within the heads-states), so the partial information about microstates here supports H'. But now what happens when we then observe that the outcome is (say) heads? On the face of it, we can apply neither Lewis's Principal Principle nor the above revision, because we have inadmissible background information. That is, our credence comes from an initial credence by conditioning on some inadmissible A, namely ~(M1..M500).
However, we don't actually need to use another instance of (PP*). The earlier application, before we learned ~(M1..M500), fixes our initial credence Cr(Mi/H) in any microstate Mi conditional on H (or H'). We can then easily compute the posterior credence of Mi conditional on H (or H') after some microstates have been ruled out: simply redistribute the probability among the remaining microstates in proportion to their previous probabilities. It's just conditionalization. That gives us, among other things, Cr(M1..M1000 / H & ~(M1..M500)), without another application of the Principal Principle.
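A minimal sketch of this two-step procedure, under the same illustrative setup as before (microstate credences fixed by (PP*), then ordinary conditionalization on ~(M1..M500)):

```python
# Each heads-state gets credence x/1000 and each tails-state (1-x)/1000, where x
# is the chance of heads according to the hypothesis (this is what (PP*) fixed).
def credences(x):
    """Credence per microstate, conditional on a hypothesis assigning chance x to heads."""
    return {s: (x / 1000 if s <= 1000 else (1 - x) / 1000) for s in range(1, 2001)}

def conditionalize(cr, surviving):
    """Conditionalize credence `cr` on the set of surviving microstates."""
    total = sum(cr[s] for s in surviving)
    return {s: cr[s] / total for s in surviving}, total

for name, x in [("H", 0.5), ("H'", 0.25)]:
    cr = credences(x)
    surviving = set(range(501, 2001))                 # learn ~(M1..M500)
    posterior, likelihood = conditionalize(cr, surviving)
    heads_after = sum(p for s, p in posterior.items() if s <= 1000)
    print(name, likelihood, heads_after)
# H   0.75   0.333...   -> ruling out M1..M500 slightly favours H' (0.875 > 0.75),
# H'  0.875  0.142...      but then observing heads favours H (1/3 > 1/7).
```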
So inadmissible evidence raises some interesting issues for Bayesian confirmation theory, but it looks like it does not pose any insurmountable problems.