Pettigrew on epistemic risk and the demands of rationality

Pettigrew (2021) defends a type of permissivism about rational credence inspired by James (1897), on which different rational priors reflect different attitudes towards epistemic risk. I'll summarise the main ideas and raise some worries.

(There is, of course, much more in the book than what I will summarise, including many interesting technical results and some insightful responses to anti-permissivist arguments.)

All-or-nothing belief

The central Jamesian idea is that what we should believe depends not just on our evidence but also on our attitude towards epistemic risk.

To understand what this could mean, let's first imagine that there is an all-or-nothing attitude of belief. Let's also assume that there is an evidential probability measure that tells us to what degree a proposition is supported by an agent's evidence. (Richard discusses this setup in chapter 3, drawing on work by Kelly, Easwaran, and Dorst.)

Assume, then, that your evidence supports P to degree 0.8. You want to believe truths and not believe falsehoods. Should you believe P?

If believing a truth has as much positive value for you as believing a falsehood has negative value – if, say, the former has utility 1 and the latter -1 – then the answer is yes: believing P maximises expected utility. But suppose you really care about not believing falsehoods, so that believing a falsehood has utility -10 while believing a truth has utility 1. Then it's better not to believe P.
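
To make the comparison explicit, here is a minimal sketch of the calculation, using the utility numbers from the text and assuming that not believing scores 0 either way:

```python
def expected_utility_of_believing(p, u_true=1.0, u_false=-1.0):
    """Expected epistemic utility of believing P, where p is the
    evidential probability of P."""
    return p * u_true + (1 - p) * u_false

# Evidence supports P to degree 0.8; not believing is assumed to score 0.
p = 0.8
print(expected_utility_of_believing(p, 1, -1))    # 0.8*1 + 0.2*(-1)  =  0.6 > 0: believe
print(expected_utility_of_believing(p, 1, -10))   # 0.8*1 + 0.2*(-10) = -1.2 < 0: don't believe
```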

The utilities here are not meant to represent your practical desires. We're not interested in what you should believe from a practical perspective, but in what you should believe from an epistemic perspective. We're talking about your epistemic utilities.

In general, the more you epistemically care about the risk of believing a falsehood, the more reluctant you will be to form beliefs in the presence of supporting evidence. In the limit, if you're extremely risk-averse, you should believe nothing (or only propositions that are entailed by the evidence). At the other extreme, if you don't care at all about the risk of believing falsehoods, you should believe every proposition whatsoever (with the possible exception of propositions that are incompatible with your evidence).

This is not quite how Richard sets up the Jamesian model. Richard assumes that you have a choice between three attitudes: believing P, disbelieving P, and suspending judgement. Even if you don't care about believing falsehoods, you should then only believe a proposition if its evidential probability is greater than 1/2. I think the picture is nicer if we identify disbelief with believing the negation. But this isn't a serious disagreement.
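
For comparison, here is a rough sketch of the three-attitude setup as I understand it, on the assumption that suspending always scores 0 and that "not caring about falsehoods" means a falsehood penalty of 0. Even then, believing only beats disbelieving once the evidential probability exceeds 1/2.

```python
def best_attitude(p, penalty=0.0):
    """Return the expected-utility-maximising attitude towards P, where p is
    the evidential probability of P. A true belief/disbelief scores 1, a false
    one scores -penalty, suspension scores 0 (my assumptions)."""
    options = {
        "believe":    p * 1 + (1 - p) * -penalty,
        "disbelieve": (1 - p) * 1 + p * -penalty,
        "suspend":    0.0,
    }
    return max(options, key=options.get)

print(best_attitude(0.4, penalty=0))   # 'disbelieve': even with no penalty, 0.4 < 1/2
print(best_attitude(0.8, penalty=0))   # 'believe'
```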

Anyway, three quick critical comments before we move on to credence.

First, what exactly is your "epistemic utility function" supposed to be, and why is it relevant to what you should believe?

The only way I can make sense of this is through a revealed-preference approach. Let's not pretend that we already know what subjective epistemic utility functions are. Instead, we start with the idea that the norms of rationality are permissive. They don't settle whether you should believe a proposition given that it has such-and-such degree of evidential support. Some people believe the proposition on that basis, some don't, and either choice is OK. But it's not OK (let's assume) to believe P and disbelieve Q if Q is better supported than P. In general, there are certain constraints on how one's beliefs should relate to evidential support. If an agent satisfies these constraints then there is an epistemic utility function such that the agent believes a proposition iff believing the proposition maximises expected epistemic utility.
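
Here is a toy version of how such a representation might go, assuming (purely for illustration) that the relevant constraint is a Lockean one, i.e. that the agent believes exactly the propositions whose evidential probability exceeds some threshold t. Setting the disutility of a false belief to t/(1-t) then makes "believe iff it maximises expected epistemic utility" coincide with the agent's actual policy.

```python
def utility_from_threshold(t):
    """Construct an epistemic utility pair (true belief, false belief) that
    rationalises a Lockean believer with threshold t (0 < t < 1)."""
    return 1.0, -t / (1 - t)

def believes_by_eu(p, t):
    u_true, u_false = utility_from_threshold(t)
    return p * u_true + (1 - p) * u_false > 0   # vs. not believing, which scores 0

# With threshold 0.9, expected-utility maximisation reproduces the Lockean rule.
for p in (0.5, 0.85, 0.95):
    assert believes_by_eu(p, 0.9) == (p > 0.9)
```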

This makes sense to me. It would now be good to know what the relevant constraints are. Richard doesn't tell us. He seems to assume that we have a direct grip on the notion of personal epistemic utility.

Second, whatever the relevant constraints are, they probably won't look all that plausible. The model we've ended up with assumes a version of the Lockean Thesis, on which all-or-nothing belief is only a matter of whether the evidential probability exceeds a certain threshold. This has many well-known problematic consequences.

Third, what does any of this have to do with risk? Above I said that if you really care about the risk of believing a falsehood then you should form few beliefs. That's not actually what the Jamesian model says. The model says that you should form few beliefs iff you assign comparatively high disutility not to a risk, but simply to believing a falsehood.

In economics, it is common to model risk attitudes by "utility curves". For example, an agent is said to be risk-averse if they assign decreasing marginal utility to money. But that's an odd conception of risk-aversion. Intuitively, the fact that $1000 means more to you if you're poor than if you're rich doesn't indicate that you have a genuine preference against taking risks. As I argue in chapter 8 of my decision theory notes (and as others have argued before me), genuine risk-aversion should be modelled by assigning low utility to outcomes that were brought about through a risky choice.
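
A schematic contrast between the two ways of modelling risk attitudes, just to fix ideas; the square-root utility, the flat "risk penalty", and the numbers are all my illustration, not anything from the book or the notes.

```python
import math

# Model 1: "risk-aversion" as decreasing marginal utility of money.
def concave_value(prospects):
    # prospects: list of (probability, dollars) pairs
    return sum(p * math.sqrt(d) for p, d in prospects)

# Model 2: genuine risk-aversion as a penalty on outcomes brought about by a risky choice.
def risk_penalised_value(prospects, penalty=5.0):
    risky = len(prospects) > 1   # crude stand-in for "this was a risky choice"
    return sum(p * (d - (penalty if risky else 0.0)) for p, d in prospects)

sure_thing = [(1.0, 100)]
gamble = [(0.5, 0), (0.5, 200)]

print(concave_value(sure_thing), concave_value(gamble))                # 10.0 vs ~7.07
print(risk_penalised_value(sure_thing), risk_penalised_value(gamble))  # 100.0 vs 95.0
```

Both models prefer the sure thing here, but for structurally different reasons: the first because of the shape of the utility curve, the second because the gamble's outcomes are themselves worse for having been risked.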

Throughout his book, Richard ignores this way of modelling attitudes towards risk. Perhaps that's because of his "accuracy-first" approach on which the only epistemically relevant feature of a belief state is its degree of accuracy. But I'm not sure. Perhaps it's really because the phenomenon he's interested in just isn't what I would call genuine sensitivity to epistemic risk.

Decision-making under uncertainty

If we try to adapt the Jamesian model of all-or-nothing belief to partial belief (credence), we run into a problem.

We now need to ask about the evidential expected utility of certain credence functions, relative to the personal utility the agent assigns to different degrees of accuracy. The problem is that no matter what these personal utilities look like (within reason), the evidential probability function will plausibly assign maximal expected utility to itself.

Formally, this assumes that the eligible personal utility functions should be "strictly proper", and so Richard goes through some arguments for strict propriety in chapter 4, suggesting that only strictly proper utility functions satisfy such-and-such intuitive desiderata.
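
To illustrate with the Brier score, a standard strictly proper measure of accuracy (my choice of example): by the lights of an evidential probability of 0.8 for P, the credence 0.8 itself has the greatest expected accuracy.

```python
def brier_accuracy(credence, truth_value):
    # Negative squared distance from the truth value (1 if P is true, 0 if false).
    return -(credence - truth_value) ** 2

def expected_accuracy(credence, evidential_prob):
    return (evidential_prob * brier_accuracy(credence, 1)
            + (1 - evidential_prob) * brier_accuracy(credence, 0))

# Expected accuracy, by the lights of evidential probability 0.8, peaks at credence 0.8.
best = max((c / 100 for c in range(101)), key=lambda c: expected_accuracy(c, 0.8))
print(best)   # 0.8
```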

Personally, I find some of the supposed desiderata rather dubious. But the underlying philosophical point is arguably simpler than Richard makes it appear.

In a nutshell, the problem is this. If there is such a thing as an evidential probability measure, which tells us that some proposition P has probability x based on our total evidence, then it's hard to see how it could be rational – by the lights of that same evidential probability function – for us to assign to P any credence other than x.

So we must drop the assumption that there is an evidential probability measure. (Richard doesn't comment on this move.) We then can't evaluate the choice of a credence function by its evidentially expected epistemic utility. How else might we evaluate the choice?

Richard suggests that we should think of the situation as a decision problem "under uncertainty", where no probabilistic information is available. In chapter 7, he looks at some decision rules for decision-making under uncertainty, and argues that the best of them is the "Generalised Hurwicz Criterion" (GHC).

The GHC is an extension of the original Hurwicz Criterion, which in turn is an extension of the Maximin rule.

The Maximin rule says to choose an option with the greatest worst-case utility. The Hurwicz Criterion looks at both the worst-case and the best-case utility: it recommends maximising a weighted average of these two utilities, relative to some personal weights.

The GHC extends this by looking at all possible outcomes. We assume that you care to a certain degree about the best case. This is your first "Hurwicz weight" λ1. You also care to some degree about the second-best case, giving your second Hurwicz weight λ2. And so on, down to the worst case. An option's choiceworthiness is then the weighted sum of the utilities of its possible outcomes: the best outcome is weighted by λ1, the second-best by λ2, and so on.
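
Here is a compact sketch of the three rules as just described, with options represented as lists of utilities, one per world/state, and Hurwicz weights listed best case first:

```python
def maximin(utilities):
    return min(utilities)

def hurwicz(utilities, alpha):
    # alpha = weight on the best case, 1 - alpha on the worst case
    return alpha * max(utilities) + (1 - alpha) * min(utilities)

def ghc(utilities, weights):
    # weights[0] for the best outcome, weights[1] for the second-best, ...;
    # the weights are nonnegative and sum to 1
    return sum(w * u for w, u in zip(weights, sorted(utilities, reverse=True)))

option = [3, -1, 7]
print(maximin(option))                 # -1
print(hurwicz(option, alpha=0.5))      # 0.5*7 + 0.5*(-1) = 3.0
print(ghc(option, [0.5, 0.3, 0.2]))    # 0.5*7 + 0.3*3 + 0.2*(-1) = 4.2
```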

(I found the discussion of these rules a little confusing. We are told to assume that there is a finite set of worlds that settle everything that matters to the agent. Options are represented as functions from worlds to real numbers, so that f(w) is the utility of f at w. But if w settles everything I care about, then nothing I could choose makes a difference to how good w is: doing f at w must be just as good as doing g at w. We should really understand the "worlds" here as Savage-type "states". These states are compatible with each available act, and they do not settle anything the agent ultimately cares about.)

Why should the GHC be the right way to evaluate options "under uncertainty"? Richard's argument is that the preference relation induced by GHC has some desirable properties. Let's just look at three relevant properties.

"Strong Dominance" says that if an option A is at least as good as B at all worlds (i.e., states), and better at some, then A is preferred to B.

This looks plausible. It rules out both Maximin and the simple Hurwicz Criterion, neither of which satisfies Strong Dominance. The GHC does.
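
A quick illustration, reusing the functions from the sketch above (the numbers are mine): A is at least as good as B everywhere and strictly better at the third state, yet Maximin and the simple Hurwicz Criterion, which only look at best and worst cases, are indifferent between them. GHC with all-positive weights is not.

```python
A = [1, 10, 5]   # at least as good as B everywhere, strictly better at the third state
B = [1, 10, 1]

print(maximin(A) == maximin(B))                           # True: Maximin is indifferent
print(hurwicz(A, 0.5) == hurwicz(B, 0.5))                 # True: so is the Hurwicz Criterion
print(ghc(A, [0.4, 0.3, 0.3]) > ghc(B, [0.4, 0.3, 0.3]))  # True: GHC prefers A
```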

Next we have "Strong Linearity". This says that if A ∼ B, then for any numbers m and k, mA+k ∼ mB+k. For example, if you're indifferent between an option A that scores 1 at w1 and 8 at w2 and an option B that scores 2 at both w1 and w2, then you are also indifferent between these same options with all the outcome utilities multiplied by -1.

Strong Linearity is satisfied by Bayesian accounts on which you first assign a probability to the worlds and then determine choiceworthiness in terms of expected utility.

Richard argues that the condition is implausible if we want agents to care about risk: -8 is a really bad outcome, and you might well prefer the guaranteed -2 over a risk of getting -8. The GHC does not validate Strong Linearity.
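
Using the example from two paragraphs back (and the ghc function from the earlier sketch): indifference between A and B forces the weight on the best case to be 1/7, and multiplying all utilities by -1 then breaks the indifference.

```python
w = [1/7, 6/7]                 # the only weights that make A and B indifferent
A, B = [1, 8], [2, 2]
print(ghc(A, w), ghc(B, w))    # 2.0, 2.0  -> indifferent

negA, negB = [-u for u in A], [-u for u in B]
print(ghc(negA, w), ghc(negB, w))   # -7.0, -2.0 -> B is now strictly preferred
```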

No argument is given in support of the assumption that rational agents may "care about risk" in the described manner, where riskiness is not reflected in the outcomes.

Finally, there's "Permutation Indifference". This says that it doesn't matter to which worlds an option assigns its utilities. That is, if π is a permutation of W and if for all worlds w, A(w)=B(π(w)), then A is not preferred to B nor B to A.

Richard claims that this is "compelling". To me it looks insane. It means that all worlds are treated as equally relevant. You should care about the outcome of your choice in skeptical scenarios just as much as you should care about its outcome in sensible scenarios. I think rational agents should give less weight to skeptical scenarios, even if they don't have evidence that rules out these scenarios. I will return to this point below.

The GHC satisfies Permutation Indifference. Another rule that satisfies the condition is "Risk-weighted Objective Bayesianism", which assigns uniform credence to all worlds and then computes choiceworthiness in terms of risk-weighted expected utility, as in Buchak (2013). More generally, GHC and Risk-weighted Objective Bayesianism give the same verdict on all the conditions Richard looks at. It therefore remains unclear how these conditions are supposed to single out GHC as the best decision rule. Something must exclude Risk-weighted Objective Bayesianism, but we're not told what. (Or perhaps I've missed the explanation.)
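
If I've set things up correctly, one way to see why the two rules march in step: with uniform credence over n states, Buchak-style risk-weighted expected utility with risk function r comes to the same thing as GHC with weight r(i/n) - r((i-1)/n) on the i-th best outcome. A rough numerical check, reusing ghc from above; the risk function r(x) = x^2 is just my example.

```python
def reu_uniform(utilities, r):
    # Buchak-style risk-weighted expected utility with uniform credence 1/n:
    # u_1 + sum over j >= 2 of r(prob of getting at least u_j) * (u_j - u_{j-1}),
    # where u_1 <= ... <= u_n are the utilities sorted from worst to best.
    n = len(utilities)
    u = sorted(utilities)
    return u[0] + sum(r((n - j) / n) * (u[j] - u[j - 1]) for j in range(1, n))

r = lambda x: x ** 2                                      # a risk-averse risk function
utilities = [0, 4, 10]
weights = [r((i + 1) / 3) - r(i / 3) for i in range(3)]   # i-th entry: weight on the (i+1)-th best outcome

print(reu_uniform(utilities, r))    # 2.444..., i.e. 22/9
print(ghc(utilities, weights))      # the same value (up to floating-point noise)
```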

I already mentioned that I'm not convinced by the case against Strong Linearity, and that I find Permutation Indifference highly implausible. So I'm not convinced by the case for GHC. The assumption that skeptical scenarios should be given less weight actually rules out all the rules that Richard considers.

I have other worries about GHC in particular.

One is that it violates a special case of a condition Richard calls "Coarse-Grain Indifference", according to which (roughly) if you don't care about the answer to a certain question then it doesn't matter if we individuate outcomes in such a way that they include the answer to that question or not.

Another aspect of GHC that looks odd to me is that the Hurwicz weights don't take into account how good or bad the relevant cases are. Compare two choices. In the first, one option gives you either $10K or $900K while the other option gives you either $-10K or $1M. You prefer the first, because you don't want to take the risk of losing $10K. In the second choice, one option gives you either $10K or $12K while the other option gives you either $8K or $20K. There's no risk of losing anything, and the difference between $10K and $8K isn't great, so you prefer the second option. There are no generalised Hurwicz weights that rationalise these attitudes, assuming dollars are a measure of utility.
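
A quick check of this claim, with amounts in thousands of dollars and reusing ghc from above: preferring the safe option in the first choice requires putting more than five times as much weight on the worst case as on the best, while preferring the risky option in the second choice requires less than four times as much, so no weights do both.

```python
choice1_safe, choice1_risky = [10, 900], [-10, 1000]   # you prefer the safe option
choice2_safe, choice2_risky = [10, 12],  [8, 20]       # you prefer the risky option

found = False
for i in range(1001):
    w = [i / 1000, 1 - i / 1000]   # weight on the best case, weight on the worst case
    if (ghc(choice1_safe, w) > ghc(choice1_risky, w)
            and ghc(choice2_risky, w) > ghc(choice2_safe, w)):
        found = True
print(found)   # False: no Hurwicz weights rationalise both preferences
```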

The Jamesian model for credence

Let's return to the Jamesian idea that our credences reflect not just our evidence but also our personal attitudes towards epistemic risk.

In the model for all-or-nothing belief, we implemented this idea by rescaling the basic epistemic utility of a true belief (+1) and a false belief (-1) in accordance with two personal scaling factors, measuring the extent to which the agent cares about true vs false beliefs.

We can now do something similar for credence, by adopting the Generalised Hurwicz Criterion. The basic epistemic utility of a belief state is an accuracy score. We don't assume a fixed scaling factor for specific degrees of accuracy. Rather, we assume a fixed scaling factor for the worst possible accuracy the state could have, whatever that might be. Similarly for the second-worst accuracy score, and so on. Since we've dropped the idea of an evidential probability measure, we'll determine the total value for each credence function by the sum of these scaled accuracy scores.

The relevant Hurwicz weights are meant to represent the agent's personal attitude towards epistemic risk.

How does the agent's evidence enter the picture? We could apply the GHC at each point in time, considering only credence functions that take into account the present evidence. But this raises some problems, both technical and philosophical. Richard instead suggests that the GHC should only be used once to determine an agent's prior credence, before they receive any evidence. The later credence should then come from the prior credence simply by conditionalising on the evidence.

As a result, your credences at any point in time only take into account your risk attitudes at the very start of your epistemic journey. It's not clear to me why we should assume this. One could instead adopt a form of "ur-prior conditionalisation", as in Meacham (2016), on which we evaluate your credences at any point in time by applying the GHC to the choice of a prior probability at that time and then require you to adopt a GHC-optimal credence function conditionalised on your total evidence.

In chapter 8, Richard explains how the choice of Hurwicz weights affects the rationally eligible priors.

If you give great weight to bad outcomes (low accuracy) then GHC says you should adopt a uniform prior. If you give great weight to good outcomes (high accuracy) then GHC says you should adopt a prior that matches any permutation of your Hurwicz weights. Without further restrictions on the Hurwicz weights, every probability function is in principle permitted as a rational prior.
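
Here is a rough numerical illustration with two worlds, a credence c in the first world, Brier accuracy as the measure (my choice), and the ghc function from the earlier sketch: weights that favour the worst case make the uniform prior optimal, while weights that favour the best case make the priors that mirror the weights (in either order) optimal.

```python
def brier_accuracy_at_world(credence_in_w1, world):
    # Negative Brier inaccuracy of the credence function (c, 1-c) at the given world.
    c = credence_in_w1
    truth = (1, 0) if world == 1 else (0, 1)
    return -((c - truth[0]) ** 2 + ((1 - c) - truth[1]) ** 2)

def ghc_value_of_prior(c, weights):
    accuracies = [brier_accuracy_at_world(c, 1), brier_accuracy_at_world(c, 2)]
    return ghc(accuracies, weights)

grid = [c / 100 for c in range(101)]

def optimal_priors(weights):
    best = max(ghc_value_of_prior(c, weights) for c in grid)
    return [c for c in grid if abs(ghc_value_of_prior(c, weights) - best) < 1e-9]

print(optimal_priors([0.3, 0.7]))   # [0.5]        risk-averse weights: uniform prior
print(optimal_priors([0.7, 0.3]))   # [0.3, 0.7]   risk-inclined weights: the weights, in either order
```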

We have permissivism due to different attitudes towards epistemic risk, as desired. And we get a second kind of permissivism on top: if you're risk-inclined, there will be many eligible prior credence functions, with no rational grounds for choosing between them.

This might point at a reason against my suggested form of ur-prior conditionalisation. If you can choose new priors at each point in time, and you're epistemically risk-inclined, then Richard's model would allow your credences to fluctuate wildly even without relevant evidence. Richard discusses such fluctuations in chapter 10 and seems to agree that they would be problematic.

As with the scaling factors in the Jamesian model for all-or-nothing belief, I'd say that the Hurwicz weights in the model for credence don't really represent an attitude towards risk. In principle, you might put high value on risky actions but also believe that you're an unlucky person so that whenever you choose a risky action you can expect to get the worst possible outcome.

Also, as before, I'm not really sure what an agent's epistemic utility function is supposed to represent. In the model for all-or-nothing belief, it represented a certain combination of objective alethic status (truth/falsity) and the extent to which the agent cares about true or false beliefs. In the new model, one might think these two components have been separated: epistemic utility is simply a measure of objective accuracy; the extent to which you care about high or low accuracy is represented by your Hurwicz weights.

But that's not actually Richard's picture. In chapter 9, Richard endorses a proposal from Gallow (2019) on which learning involves a change of epistemic utility. You should conditionalise on your evidence, says Richard, because your epistemic utility function no longer cares about accuracy at worlds that are incompatible with the evidence. Epistemic utility therefore doesn't simply measure distance to truth. (We seem to have given up on veritism.) Like the Hurwicz weights, it is a subjective attitude that varies from person to person and from time to time. I have no direct grip on this supposed attitude.

Substantive rationality

I now come to what I think is the biggest flaw in Richard's model.

Imagine you are stranded alone on a remote island. As you walk around the island, you occasionally see a bird. The first 17 birds you see are all green. This should make you somewhat confident that the next bird will also be green. There are, of course, worlds where the first 17 birds are green and the next one is blue. And there are worlds where it is yellow. Or red. Or white. You should give some credence to these possibilities. They are not ruled out by your evidence. But you should give higher credence to worlds in which the next bird is green.

If some worlds compatible with your evidence have higher credence than others, and your credence comes from a prior credence function by conditionalising on your evidence, then you must have given higher prior credence to some worlds than to others.

The example illustrates that we can only learn by induction if we give higher prior credence to "regular" worlds in which patterns that begin with GGGGGGGGGGGGGGGGG continue with G than to "irregular" worlds in which they continue with B, Y, R or W.
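
To make this vivid with a toy calculation (the setup is mine: each bird is either green or blue, and we compare two priors over colour sequences): a prior that is uniform over the 2^18 sequences leaves the 18th bird at 50/50 no matter what you have seen, whereas an exchangeable prior that favours regular sequences, e.g. a Laplace-style prior that is uniform over the unknown proportion of green birds, lets the 17 green birds raise your confidence to 18/19.

```python
from fractions import Fraction
from math import comb

# Prior 1: uniform over all 2^18 colour sequences (green/blue).
# P(18th green | first 17 green) = 1/2, since both continuations get equal weight.
uniform_answer = Fraction(1, 2)

# Prior 2: a Laplace-style exchangeable prior: uniform over the unknown proportion p
# of green birds, draws independent given p.  Then the probability of a particular
# sequence with k greens out of n is the integral of p^k (1-p)^(n-k) dp,
# which equals 1 / ((n+1) * C(n, k)).
def seq_prob(k, n):
    return Fraction(1, (n + 1) * comb(n, k))

laplace_answer = seq_prob(18, 18) / seq_prob(17, 17)

print(uniform_answer)   # 1/2
print(laplace_answer)   # 18/19
```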

Similarly inegalitarian attitudes are required for other aspects of rationality. Rational agents can learn about the world around them through sensory experience. Under normal circumstances, the kind of experience we have when we walk in the rain should make us believe that it is raining and not, say, that Russia has invaded Mongolia. We should give high prior credence to worlds where this kind of experience goes along with rainy weather and not to worlds where it goes along with Russian invasions.

On Richard's model, epistemically risk-averse agents must adopt a uniform prior. They will be radical skeptics, incapable of learning about the world beyond their immediate evidence.

Epistemically risk-seeking agents can adopt sensible priors. But they may equally adopt arbitrary permutations of these priors. They may choose priors on which observing 17 green birds makes it highly likely that the next bird is blue, and on which an ordinary rain experience makes it likely that Russia invaded Mongolia.

Richard discusses a small aspect of this problem in chapter 11. Here he considers a scenario in which you know that a certain urn contains either 1 green ball and 3 purple balls (H1) or 3 green balls and 1 purple ball (H2). Now two balls are drawn with replacement. The possible outcomes are G1-G2, G1-P2, P1-G2, and P1-P2. As Richard points out, if you assign uniform prior credence to the eight combinations of { H1, H2 } with these outcomes, then getting a green ball on the first draw (G1) will not affect your credence in either H1 or G2. That seems wrong.

Richard notes that the problem could be fixed by demanding that your priors should satisfy the Principal Principle. This would imply that Cr(G1/H1) = 1/4 and Cr(G1/H2) = 3/4. More generally, the Principal Principle would settle the rest of your credences once you have assigned credences to H1 and H2.
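
For concreteness, here is the calculation, with H1 the mostly-purple hypothesis and H2 the mostly-green hypothesis as above (the implementation is just my sketch): under the uniform prior over the eight atoms, seeing a green first ball changes nothing; under a Principal-Principle-respecting prior with Cr(H1) = Cr(H2) = 1/2, it shifts credence towards H2 and towards a green second ball.

```python
from fractions import Fraction
from itertools import product

half, quarter = Fraction(1, 2), Fraction(1, 4)
chance_green = {"H1": quarter, "H2": 3 * quarter}   # chance of green on each draw

def condition(prior, event):
    total = sum(p for atom, p in prior.items() if event(atom))
    return {atom: (p / total if event(atom) else Fraction(0)) for atom, p in prior.items()}

def credences(prior):
    posterior = condition(prior, lambda a: a[1] == "G")              # learn: first ball is green
    return (sum(p for a, p in posterior.items() if a[0] == "H2"),    # Cr(H2 | G1)
            sum(p for a, p in posterior.items() if a[2] == "G"))     # Cr(G2 | G1)

atoms = list(product(["H1", "H2"], "GP", "GP"))   # (hypothesis, first draw, second draw)

uniform = {a: Fraction(1, 8) for a in atoms}
principal = {(h, d1, d2): half * (chance_green[h] if d1 == "G" else 1 - chance_green[h])
                               * (chance_green[h] if d2 == "G" else 1 - chance_green[h])
             for (h, d1, d2) in atoms}

print(*credences(uniform))     # 1/2 1/2: seeing green changes nothing
print(*credences(principal))   # 3/4 5/8: green supports H2 and a green second draw
```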

Some (fairly risk-seeking) Hurwicz weights allow you to adopt a prior that satisfies the Principal Principle. Richard considers the possibility of declaring the other Hurwicz weights irrational, but he doesn't commit to the idea. It would hardly help anyway. The relevant Hurwicz weights would allow an observation of G1 to increase your credence in H2, as it should. But the same weights would allow many other credence functions that don't satisfy the Principal Principle, including credence functions for which observing G1 actually decreases the probability of H2.

Another response Richard considers is to replace the GHC by a "generalised chance Hurwicz criterion" GCHC which would ensure that any eligible prior satisfies the Principal Principle. This looks somewhat better. But it still doesn't go far enough.

For one thing, the problem of rational learning from experience doesn't just arise in cases where there are well-defined objective chances.

Moreover, even in the urn case the chance-based response only works if we assume that there are only two candidate chance functions: one according to which there's a 25% chance of getting a green ball on each draw, independent of the other draws, and another one according to which that chance is 75%. But why are these the only a priori possibilities? What about chance functions that don't treat the individual draws as i.i.d.? If such chance functions are on the table then you may satisfy the Principal Principle and still take observation of a green ball to be strong evidence that most of the balls in the urn are purple.

It's often useful to distinguish between structural and substantive norms of rationality. Internal consistency and coherence are structural demands. That rain experiences should be treated as evidence for rain is a substantive demand, as is the norm that 17 green birds should be taken to indicate the presence of further green birds.

Epistemic utility theory has proved useful in clarifying and perhaps justifying structural norms of rationality. And epistemic utility theory is Richard's preferred tool of work, here and elsewhere. It's no surprise, then, that the account we get is blind to the demands of substantive rationality. But that's a problem. Considerations of epistemic utility do not "determine the rationality of doxastic attitudes" (p.9).

Buchak, Lara. 2013. Risk and Rationality. Oxford: Oxford University Press.
Gallow, J Dmitri. 2019. “Learning and Value Change.” Philosopher’s Imprint 19: 1–22.
James, William. 1897. “The Will to Believe.” In The Will to Believe, and Other Essays in Popular Philosophy. New York: Longmans Green.
Meacham, Christopher J. G. 2016. “Ur-Priors, Conditionalization, and Ur-Prior Conditionalization.” Ergo 3 (20170208). doi.org/10.3998/ergo.12405314.0003.017.
Pettigrew, Richard. 2021. Epistemic Risk and the Demands of Rationality. Oxford: Oxford University Press.
