Are recalcitrant worlds less probable?

The Best-Systems Account of chance promises to explain why beliefs about chance should affect our beliefs about ordinary events, as formalized by the Principal Principle. In this post, I want to discuss a challenge to any such explanation.

First, some background.

For any candidate chance function f, let [f] be the set of worlds of which f is (part of) the best system. According to the Best-Systems Account (BSA), the hypothesis "Ch=f" that f is the true chance function expresses the proposition [f]. In what follows, I'll assume that a world is simply a history of "outcomes", and that the candidate systems can be compressed into a single (possibly parameterized) chance function.

In essence, the Principal Principle then says that for any rational prior credence Cr and history h, Cr(h | Ch=f) = f(h). The BSA promises to explain this link because it implies that Ch=f contains valuable information about the history: If Ch=f is true, the actual history must lie in [f]. On the best-systems interpretation, the Principle says that Cr(h | [f]) = f(h).

We note an immediate problem: f generally assigns positive probability to histories outside [f]. But Cr(h | [f]) is obviously 0 for any h outside [f]. This is the well-known "undermining problem". In response, Hall (1994) and Lewis (1994) suggested that we should reformulate the Principal Principle to say that Cr(h | [f]) = f(h | [f]). This is the "New Principle". I want to set this issue aside.
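
To see the problem in miniature, here's a toy model. Everything in it is my own stipulation (worlds of five coin flips, three candidate "simple" biases, best system = the bias nearest the history's frequency of heads); it only illustrates the structure of the problem and of Hall and Lewis's fix.

```python
from itertools import product

# Toy model of undermining (my own stipulations throughout): worlds are
# histories of five coin flips, and the best system for a history is the
# "simple" Bernoulli bias closest to its relative frequency of heads.
simple_biases = [0.0, 0.5, 1.0]

def best_system(history):
    freq = history.count("H") / len(history)
    return min(simple_biases, key=lambda p: abs(p - freq))

def chance(history, p):
    """f(h): the probability that the Bernoulli(p) chance function gives h."""
    prob = 1.0
    for flip in history:
        prob *= p if flip == "H" else 1 - p
    return prob

worlds = ["".join(w) for w in product("HT", repeat=5)]
fair_worlds = [w for w in worlds if best_system(w) == 0.5]  # [f] for f = Bernoulli(0.5)

# Undermining: f assigns positive probability to histories outside [f].
print(chance("HHHHH", 0.5))  # 0.03125, yet Cr(HHHHH | [f]) = 0: HHHHH lies in [Bernoulli(1.0)]

# The New Principle matches Cr(h | [f]) to f(h | [f]) instead of f(h):
normalizer = sum(chance(w, 0.5) for w in fair_worlds)
print(chance("HHHTT", 0.5) / normalizer)  # f(h | [f]) = 0.05 for this h in [f]
```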

To explain why Cr(h | [f]) should equal either f(h) or f(h | [f]), we need to have a look at the histories in [f]. What do they look like?

The answer depends on how we spell out the standards for good systems. One important criterion is fit. Lewis suggests that the fit of a candidate chance function f to a history h is measured by the probability f(h) that f assigns to the history.
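
To make this concrete, here's a minimal sketch of the fit measure for a single Bernoulli chance function, assuming (a toy simplification of my own) that the history is just a sequence of independent flips:

```python
# Minimal sketch of Lewisian fit (a toy simplification of my own):
# the fit of a chance function to a history is the probability it
# assigns to the history, computed flip by flip for independent tosses.
def fit(p, history):
    """f(h) for the Bernoulli chance function with bias p towards heads."""
    prob = 1.0
    for flip in history:
        prob *= p if flip == "H" else 1 - p
    return prob

print(fit(0.5, "HTHHTT"))  # 0.015625: the fair function's fit to this history
print(fit(0.9, "HTHHTT"))  # 0.000729: a heavily biased function fits it worse
```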

Another important criterion is simplicity. Without simplicity, the best systematization of a history would simply describe the exact history. If we allow the two criteria to trade off against each other, the best systematization of a sufficiently random-looking history will often be probabilistic.

In easy cases, the relative frequencies in the histories in [f] will exactly match the f-chances: if they didn't, there would be an alternative chance function f' with greater fit, and the history would belong to [f'] rather than [f]. The only way this can fail is if every better-fitting chance function is less simple.
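
A quick check of the first claim, for the Bernoulli case (my own toy computation): fit to a history with k heads in n flips is p^k (1-p)^(n-k), and this is maximized exactly at p = k/n.

```python
# Toy check (my own): among Bernoulli chance functions, the one whose
# chance equals the relative frequency has the greatest fit, so any
# deviating chance function is beaten on fit by a frequency-matching one.
n, k = 100, 63  # 63 heads in 100 flips

def fit(p):
    return p**k * (1 - p)**(n - k)

best_p = max((i / 1000 for i in range(1001)), key=fit)
print(best_p)  # 0.63, the relative frequency of heads
```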

Suppose, for example, that a history is a sequence of coin flips in which the coins are distinguishable by their "weight". We might have a history in which there's a noisy statistical dependency between outcomes and weight: coins with greater weight tend to land heads more often. A good way to systematize such a history uses a parameterized chance function that expresses the chance of heads and tails in terms of weight. In principle, one can always find a function whose chances perfectly match the frequencies. But that function might be horrendously complicated. It's easy to imagine cases where a linear function f would win the trade-off between simplicity and fit, even though the frequencies in the history don't precisely match the f-chances.
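
Here's one way to make the trade-off concrete. The scoring rule below (log-fit minus a fixed penalty per free parameter) and the parameter counts are stipulations of my own, not part of the BSA; the point is just that a simple linear chance function can win even though its chances don't match the frequencies.

```python
import math

# Toy trade-off between simplicity and fit (the scoring rule and the
# parameter counts are my own stipulations). Five coins, distinguished
# by weight w; each is flipped 100 times.
data = {0.1: 0.18, 0.3: 0.33, 0.5: 0.47, 0.7: 0.74, 0.9: 0.86}  # weight -> frequency of heads
flips_per_coin = 100

def log_fit(chance_of_heads):
    """log f(h): log-probability of the observed outcomes under the system."""
    total = 0.0
    for w, freq in data.items():
        heads = round(freq * flips_per_coin)
        tails = flips_per_coin - heads
        p = chance_of_heads(w)
        total += heads * math.log(p) + tails * math.log(1 - p)
    return total

linear = lambda w: w          # chance of heads = weight; count it as 2 parameters
exact = lambda w: data[w]     # chances = frequencies; count it as 5 parameters

penalty_per_parameter = 5.0   # the assumed weight of simplicity
score_linear = log_fit(linear) - penalty_per_parameter * 2
score_exact = log_fit(exact) - penalty_per_parameter * 5

print(score_linear > score_exact)  # True: the linear system wins despite its worse fit
```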

Now let's look again at the histories in [f], where f is this kind of linearly parameterized chance function. Some histories in [f] have frequencies that closely match the f-chances. Call these well-behaved. Other histories in [f] have frequencies that deviate from the f-chances. Call these recalcitrant.

The recalcitrant histories would be more accurately described by a non-linear chance function. They are in [f], even though f has comparatively low fit to them, because there's no simple alternative to f with greater fit.

Let h be some well-behaved history in [f]. Let h' be some recalcitrant history in [f]. Since f has greater fit to h than to h', and fit is measured by f(h), it follows that f(h) > f(h').

Now here's the challenge.

The Principal Principle requires that Cr(h | [f]) = f(h) and Cr(h' | [f]) = f(h'). We've just seen that f(h) > f(h'). So Cr(h | [f]) > Cr(h' | [f]). And since h and h' both lie in [f], conditioning on [f] merely rescales their credences by the same factor, so Cr(h) > Cr(h'). The unconditional prior credence Cr must favour well-behaved members of [f] over recalcitrant members.

In fact, since conditioning preserves these ratios, the ratio of priors Cr(h) / Cr(h') must exactly equal f(h) / f(h')!
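
Spelling out this step, using only the probability calculus and the fact that h and h' each entail [f]:

```latex
% Since h entails [f], Cr(h) = Cr(h \wedge [f]); likewise for h'.
\frac{Cr(h)}{Cr(h')}
  = \frac{Cr(h \wedge [f])}{Cr(h' \wedge [f])}
  = \frac{Cr(h \mid [f])\, Cr([f])}{Cr(h' \mid [f])\, Cr([f])}
  = \frac{Cr(h \mid [f])}{Cr(h' \mid [f])}
  = \frac{f(h)}{f(h')}.
```

(The New Principle yields the same ratio: conditioning f on [f] rescales f(h) and f(h') by the same factor.)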

Is there any independent justification for this constraint on rational priors?

I think there is an independent justification for disfavouring recalcitrant worlds: they are less simple, and a rational prior credence should favour simpler worlds.

One way to think about the simplicity of a history (rather than a theory) is in terms of how its best systematization depends on the weight given to simplicity in the criteria for best systems. If a world is highly irregular, it calls for a complex systematization: only complex systematizations achieve good fit to it. If we gradually relax the weight of simplicity, the best systematization of such a world becomes more and more complicated. By comparison, if a world is more regular, its best system remains best even as we relax the weight of simplicity (up to a point).

For example, if a history's frequencies in the weighted-coins case fit a 5th-order polynomial a little better than a linear relationship, then the linearly parameterized chance function f may be best when simplicity has high weight, but not when the weight is relaxed. By contrast, if a history's frequencies closely fit the linear function f, then f remains the best system over a wider range of simplicity weights.
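
Here's a sketch of this idea, reusing the toy scoring rule from above (score = log-fit minus lambda times the number of parameters, with lambda the weight of simplicity; an exact-frequency system stands in for the better-fitting polynomial). All the numbers are made up for illustration.

```python
import math

# Toy measure of a history's regularity (my own operationalization):
# the crossover value of lambda below which an interpolating system
# overtakes the linear system f. The more regular the history, the
# smaller the crossover, i.e. the wider the range of simplicity
# weights over which f stays best.
def log_fit(chances, freqs, n=100):
    total = 0.0
    for p, freq in zip(chances, freqs):
        heads = round(freq * n)
        total += heads * math.log(p) + (n - heads) * math.log(1 - p)
    return total

weights = [0.1, 0.3, 0.5, 0.7, 0.9]             # chance of heads under f = the coin's weight
well_behaved = [0.11, 0.29, 0.51, 0.69, 0.89]   # frequencies hugging the linear function
recalcitrant = [0.20, 0.24, 0.55, 0.62, 0.93]   # frequencies straying from it

def crossover(freqs, extra_parameters=4):
    """The lambda at which the exact-frequency system catches up with f."""
    fit_gap = log_fit(freqs, freqs) - log_fit(weights, freqs)
    return fit_gap / extra_parameters

print(crossover(well_behaved))   # ~0.04: f stays best for almost any weight of simplicity
print(crossover(recalcitrant))   # ~2.0: f is dethroned much sooner as the weight is relaxed
```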

So recalcitrant worlds are less regular. And a rational prior should arguably favour regular worlds.

It's still surprising that the justification of the Principal Principle turns on this assumption about priors.

Also, while the above reasoning may explain why Cr(h) > Cr(h'), a lot more work would be needed to explain why the ratio Cr(h) / Cr(h') has to exactly match f(h) / f(h'). This creates strong pressure towards a Uniqueness view about rational priors.

There might be another way out. I've assumed, with Lewis, that chance functions assign probabilities to entire histories. I don't think science needs such an ambitious concept of chance.

If we make chance functions more local in scope, we first have to revisit the formulation of the Principal Principle: f(h) is generally undefined for complete histories h. We also have to revisit the undermining problem. We can't move to the New Principle, because a local chance function assigns no probability to [f]. I think we should simply stick with an approximate version of the old Principle: something like Cr(e | [f]) ≈ f(e) for any e in the domain of f.

These are the adjustments and concessions I make in Schwarz (2014). I still think the argument I sketched there, deriving this Principle from the BSA, should work. It doesn't require favouring well-behaved over recalcitrant worlds.

I now think this is a good reason for friends of the BSA to assume that chance is local.

(I'm indebted to Eddy Chen for drawing my attention to the problem of recalcitrant worlds.)

Hall, Ned. 1994. “Correcting the Guide to Objective Chance.” Mind 103: 505–17.
Lewis, David. 1994. “Humean Supervenience Debugged.” Mind 103: 473–90.
Schwarz, Wolfgang. 2014. “Proving the Principal Principle.” In Chance and Temporal Asymmetry, edited by A. Wilson, 81–99. Oxford: Oxford University Press.
