De Finetti's theorem without symmetries?

Bruno de Finetti (de Finetti (1970)) suggested that chance is objectified credence. The suggestion is explained and defended in Jeffrey (1983, ch. 12), Skyrms (1980, ch. I), Skyrms (1984, ch. 3), and Diaconis and Skyrms (2017, ch. 7), but I still find it hard to understand. It seems to assume that rational credence functions are symmetrical in a way in which I think they shouldn't be.

Let's imagine that the world is a countably infinite sequence of coin flips. De Finetti's Theorem states that if a probability measure \(P\) over this space is exchangeable, i.e., invariant under finite permutations of the flips, then it can be represented as a mixture of iid Bernoulli processes: there is a (unique) measure \(\mu\) on \([0,1]\) such that, for any finite sequence of outcomes \(x_1,\dots,x_n\) with \(k\) heads,

\[ P(X_1 = x_1, \ldots, X_n = x_n) \;=\; \int_0^1 \theta^{k}(1-\theta)^{n-k}\,d\mu(\theta). \]
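To see the representation in action, here is a minimal Python check. The uniform (Beta(1,1)) mixing measure and the helper name `seq_prob` are my illustrative choices. Because the mixture probability of a finite sequence depends only on its length and head count, permutations of a sequence all get the same probability:

```python
from math import comb

def seq_prob(seq):
    """Probability of a specific H/T string under a uniform (Beta(1,1))
    mixture of iid Bernoulli(theta) processes.  The Beta integral
    int_0^1 theta^k (1 - theta)^(n-k) d theta has the closed form
    k!(n-k)!/(n+1)! = 1 / ((n + 1) * C(n, k))."""
    n, k = len(seq), seq.count("H")
    return 1 / ((n + 1) * comb(n, k))

# Exchangeability: permutations with the same head count get the same probability.
print(seq_prob("HHT"), seq_prob("HTH"), seq_prob("THH"))  # all 1/12
```

With a different mixing measure the numbers change, but the invariance under permutations does not.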

That is, \(P\) reasons about the sequence as if it were generated by a stable chance process. The suggestion is that when we reason about chance, we are using this representation of our credence function. We're not really reasoning about an objective physical magnitude.

I want to set aside this philosophical point; I'm confused about a more technical point: I don't think a rational credence over the sequences should be exchangeable.

At this point, Skyrms (and Jeffrey) tell us that one can replace full exchangeability by weaker invariance conditions: partial exchangeability only requires invariance under a subgroup of permutations, stationarity only requires invariance under "time shifts", Markov exchangeability only requires invariance under "2-block permutations". But I don't think a rational credence should have any of these symmetries!

Suppose the first 20 outcomes are HTHTHTHTHTHTHTHTHTHT. What do you think: is the next outcome more likely to be H or T? I think it's more likely to be H. From what we know, this sequence doesn't look random. It might be random, of course, but I'd give significant credence to the hypothesis that it keeps alternating between H and T.

By contrast, the sequence THHTHTHHTTTHHTTHTHHT could just as well continue with H or T. Since it's a permutation of the alternating sequence, exchangeability requires giving the same credence to both. But the first is more likely to continue with H than the second. So we don't have exchangeability.
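The asymmetry can be made concrete with a toy two-part prior; the weight `alpha` and the one-pattern "deterministic" class are my illustrative simplifications of the richer class \(D\) discussed below. A small credence that the sequence alternates starting with H, plus a uniform Bernoulli mixture for the rest, pushes the predictive probability of H near 1 after twenty alternating outcomes, while the scrambled permutation leaves it at 1/2:

```python
from math import comb

def bern_mix_prob(seq):
    """Marginal probability of an H/T string under a uniform mixture of
    iid Bernoulli(theta) processes (a Beta(1,1) mixing measure)."""
    n, k = len(seq), seq.count("H")
    return 1 / ((n + 1) * comb(n, k))

def predictive_heads(seq, alpha=0.01):
    """P(next = H | seq) under a toy two-part prior: weight alpha on the
    single deterministic hypothesis "the sequence alternates, starting
    with H", weight 1 - alpha on the uniform Bernoulli mixture."""
    alternating = "".join("HT"[i % 2] for i in range(len(seq)))
    lik_alt = 1.0 if seq == alternating else 0.0
    lik_mix = bern_mix_prob(seq)
    post_alt = alpha * lik_alt / (alpha * lik_alt + (1 - alpha) * lik_mix)
    # The alternating hypothesis predicts H exactly when seq has even length.
    alt_says_h = 1.0 if len(seq) % 2 == 0 else 0.0
    # The Bernoulli mixture predicts by Laplace's rule of succession.
    laplace = (seq.count("H") + 1) / (len(seq) + 2)
    return post_alt * alt_says_h + (1 - post_alt) * laplace

print(predictive_heads("HT" * 10))               # very close to 1
print(predictive_heads("THHTHTHHTTTHHTTHTHHT"))  # exactly 0.5
```

Even with `alpha` as low as 0.01, the alternating data make the deterministic hypothesis dominate, because its likelihood is 1 while the Bernoulli mixture assigns the sequence a probability of about \(2.6 \times 10^{-7}\).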

Intuitively, without any further information, I'm unsure whether the events in the sequence are (a) outcomes of a stable chance process, or (b) deterministically generated, or (c) outcomes of a Markov process in which the next outcome depends on the previous outcome, and so on. My credence is a mixture of these possibilities. I think any rational credence function should be such a mixture.

Such a mixture will have no non-trivial invariance group. It's not exchangeable, or partially exchangeable, or stationary, or Markov exchangeable. So how is de Finetti's proposal supposed to get off the ground?

Here's an idea. Ignore the Markov possibility for a moment, and suppose we restrict the possible deterministic patterns to a fixed finite class \(D\). The class will include the alternating sequence HTHTHTHTHT…, the sequence THTHTHTHTHT…, the sequence HHTHHTHHTHHT…, and other sequences with a clear pattern (including perhaps the prime number pattern mentioned in Jeffrey (1983, 207) – I don't understand what Jeffrey says about it). Let \(P\) be a credence that is unsure whether the sequence is (a) a member of \(D\), or (b) the result of a Bernoulli process. \(P\) won't have any interesting symmetries, but \(P\) conditional on \(\neg D\) will.

One might worry that \(P(\cdot \mid \neg D)\) is not exchangeable, since it rules out the nicely patterned sequences. But ruling them out makes no difference: each member of \(D\) is a single infinite sequence, and any such sequence already gets probability 0 under the Bernoulli mixture. So \(P(\cdot\mid \neg D)\) coincides with the mixture, and de Finetti's theorem applies: \(P(\cdot\mid \neg D)\) is a mixture of iid Bernoulli processes. The restriction to \(\neg D\) only rules out a finite set of possibilities that gets measure 0 anyway.
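The measure-zero point can be spelled out. For any fixed sequence \(d\) that contains infinitely many heads and infinitely many tails (as the patterned sequences in \(D\) do), and any mixing measure \(\mu\),

\[ P_{\mathrm{mix}}(\{d\}) \;=\; \lim_{n\to\infty} \int_0^1 \theta^{k_n}(1-\theta)^{n-k_n}\,d\mu(\theta) \;=\; 0, \]

where \(k_n\) is the number of heads among the first \(n\) outcomes of \(d\). Since \(k_n \to \infty\) and \(n - k_n \to \infty\), the integrand goes to 0 pointwise on \([0,1]\), and dominated convergence (the integrand is bounded by 1) lets the limit pass inside the integral. So removing finitely many such sequences leaves every cylinder probability of the mixture untouched.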

So perhaps friends of de Finetti could say this. A rational prior credence function itself may not have any interesting symmetries. But conditional on \(\neg D\), it probably will. And then we can represent \(P\) as undecided between a range of deterministic hypotheses and a range of chance hypotheses.

Can we add the Markov possibility back into the mix? I suppose we can. Technically, an iid process is a special case of a Markov process. But if we start with a noninformative prior over the parameters of a Markov process, the iid hypothesis will have measure 0. So we should really treat the iid hypothesis as a separate part of \(P\). Still, de Finetti's theorem for Markov exchangeable measures should tell us how to determine the two parts of \(P(\cdot \mid \neg D)\), assuming we have Markov exchangeability.
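For the Markov part, a toy Bayesian first-order Markov model also learns the alternating pattern. The uniform Beta(1,1) prior on each transition probability is an illustrative choice, not the full representation theorem for Markov exchangeable measures:

```python
from collections import Counter

def markov_predictive_heads(seq):
    """P(next = H | seq) for a Bayesian first-order Markov model with an
    independent uniform Beta(1,1) prior on each transition probability,
    i.e. Laplace-smoothed transition counts out of the last state."""
    trans = Counter(zip(seq, seq[1:]))
    last = seq[-1]
    h, t = trans[(last, "H")], trans[(last, "T")]
    return (h + 1) / (h + t + 2)

# After HTHT...HT, the model has seen T followed by H nine times and
# never by T, so it strongly expects another H.
print(markov_predictive_heads("HT" * 10))  # 10/11
```

So the Markov part of the mixture agrees with the deterministic part about the alternating data, which is why the two can be hard to tease apart.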

I've assumed that there's a hard cut-off between the random and the deterministic possibilities. This may seem problematic. But I'm not assuming that \(P\) treats them as on a par. Realistically, the "deterministic part" of \(P\) should give high credence to simple patterns and increasingly low credence to complex patterns. The above reasoning goes through as long as there are only finitely many deterministic patterns. I suspect one can fix the reasoning to not rely on this requirement. But even so, we can allow for extremely complicated deterministic patterns to which the deterministic part of \(P\) assigns negligible probability.

Is this how the proposal is supposed to work?

de Finetti, Bruno. 1970. Theory of Probability. New York: John Wiley & Sons.
Diaconis, Persi, and Brian Skyrms. 2017. Ten Great Ideas about Chance. Princeton University Press.
Jeffrey, Richard. 1983. The Logic of Decision. 2nd ed. Chicago: University of Chicago Press.
Skyrms, Brian. 1980. Causal Necessity: A Pragmatic Investigation of the Necessity of Laws. New Haven: Yale University Press.
Skyrms, Brian. 1984. Pragmatics and Empiricism. New Haven: Yale University Press.

Comments

# on 22 February 2025, 00:47

Solomonoff's universal prior is supposed to be equivalent to the Bayesian mixture distribution over all the possible sequences.

More generally, I guess you're asking when randomness is a useful approximation for causation in large complex systems - it's rational to accept that many phenomena can be treated in this fashion.
