Gallow on causal counterfactuals without miracles and backtracking
Gallow (2023) spells out an interventionist theory of counterfactuals that promises to preserve two apparently incompatible intuitions.
Suppose the laws of nature are deterministic. What would have happened if you had chosen some act that you didn't actually choose? The two apparently incompatible intuitions are:
(A1) Had you chosen differently, no law of nature would have been violated.
(A2) Had you chosen differently, the initial conditions of the universe would not have been changed.
Rejecting one of these intuitions is widely thought to spell trouble for Causal Decision Theory. Gallow argues that they can both be respected. I'll explain how. Then I'll explain why I'm not convinced.
We start with some familiar ideas from causal modelling. To begin, we assume that the world contains relations of causal influence between variables. Here, a variable isn't a syntactic object, but a "contrastive generalisation of an event". For our topic, we can focus on binary variables that take only two values, 0 and 1. For example, we'll talk about a variable B that represents whether or not you take a certain bet. If you do, we say that B has value 1; if you don't, it has value 0.
Relations of causal influence can be represented by structural equations that specify how the value of one variable is determined by the value of other variables. For example, if B is a bet that a certain coin will land heads, H is a variable for the outcome of the coin flip (1 = Heads, 0 = Tails), and W is a variable for getting the relevant amount of money ("winning"), then the equation
(1) W := B * H
represents that whether you get the money is determined by whether you bet and by how the coin lands, in multiplicative fashion (so that W = 1 iff B and H are both 1).
What does it take for an equation like this to be a correct representation of the causal structure in the world? Gallow lists three conditions.
First, all the variables must be mereologically distinct, so that all combinations of value assignments are possible.
Second, for any possible values of the variables on the right-hand side of the equation (B and H in my example), the equality (W = B * H) holds at the closest worlds at which the variables on the right-hand side have these values. Closeness is measured by intuitive overall similarity at the time of B and H.
To see what this means, let's continue the example. Suppose you haven't actually bet (B=0) and the coin has landed heads (H=1). We then need to check, for example, whether you win (W=1) in the closest worlds at which you bet (B=1) and the coin lands heads (H=1). The answer is yes. In general, no matter how we set B and H, the equation W = B * H holds (in the closest worlds where B and H have these values).
A third condition pertains to entire systems of structural equations. For systems that contain only a single equation, it says that there is no non-trivial dependence (of the sort that would meet the second condition) between the variables on the right-hand side.
In the coin example, there is no non-trivial dependence between B and H. So (1) is a correct representation of the coin flip scenario.
We need one more piece of machinery to connect all this to conditionals.
Let a causal model be a system of structural equations together with an assignment of values to those ("exogenous") variables that only occur on the right-hand side of the equations. A model is correct if the system of equations is correct (as per the three criteria above) and the values assigned to the exogenous variables are their true values. Finally, if M is a model and V=v is an assignment of value v to some variable V, then M revised by V=v is a model much like M except that V has value v and all equations in which V occurs on the left-hand side are removed.
Now, according to Gallow, a counterfactual conditional A=a > C=c is true (on its "causal" reading) whenever there is a correct causal model that determines C=c when revised by A=a.
In the coin example, the equation W := B * H together with the assignment { B=0, H=1 } is a correct causal model. To check whether B=1 > W=1 ("If you had bet on heads, you would have won"), we consider the revised model that has the same equation but assignment { B=1, H=1 }. Applying the equation, this determines W=1. So the conditional is true.
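To make this machinery concrete, here is a minimal Python sketch of causal models, revision, and counterfactual evaluation. The representation (equations as functions, a `revise` method, a fixed-point `solve`) is my own illustration, not Gallow's formalism, and it takes the correctness of the equations for granted.

```python
# A minimal sketch of Gallow-style causal models. Equations are
# represented as functions from (partial) value assignments to values;
# whether the equations are *correct* must be checked separately.

class CausalModel:
    def __init__(self, equations, exogenous):
        # equations: maps a variable (the left-hand side) to a function
        # computing its value from the values of other variables
        # exogenous: assigns values to variables that occur only on
        # right-hand sides
        self.equations = dict(equations)
        self.exogenous = dict(exogenous)

    def revise(self, var, val):
        # "M revised by V=v": give V the value v and remove every
        # equation in which V occurs on the left-hand side
        equations = {u: f for u, f in self.equations.items() if u != var}
        exogenous = {**self.exogenous, var: val}
        return CausalModel(equations, exogenous)

    def solve(self):
        # Apply the equations until every determinable value is fixed.
        values = dict(self.exogenous)
        changed = True
        while changed:
            changed = False
            for var, f in self.equations.items():
                if var not in values:
                    try:
                        values[var] = f(values)
                        changed = True
                    except KeyError:  # some input not yet determined
                        pass
        return values

# The coin example: W := B * H with actual values B=0, H=1.
coin = CausalModel({'W': lambda v: v['B'] * v['H']}, {'B': 0, 'H': 1})

# 'B=1 > W=1' is true because the model revised by B=1 determines W=1:
print(coin.revise('B', 1).solve())  # {'B': 1, 'H': 1, 'W': 1}
```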
All this is nice and interesting and useful, and spelled out with more care than at many other places where the same ideas have been discussed. But let's see how it helps with (A1) and (A2).
Imagine a situation in which you have the option of betting that the actual laws of nature are violated. You're not going to choose this option. We wonder what would happen if you did. Would you win the bet? Intuitively, the answer is no. Your choosing otherwise would not have brought about a violation of the laws, as per (A1).
Gallow models the relevant causal relations as follows. We have three variables. B for whether you bet, W for whether you get the payoff, and M ("miracle") for whether the actual laws are violated. These variables are related by the equation
(2) W := B * M.
We can check that this is a correct system of equations by Gallow's three conditions. First, the three variables are distinct, at least insofar as any combination of values is possible. Second, the closest worlds at which B=1 or M=1 (or both) are still worlds at which you get the money iff B=1 and M=1. Finally, there is no dependence between B and M.
Wait. Why isn't there a dependence between B and M? In the actual world, we have B=0 and M=0. According to Lewis (1979), the closest B=1 worlds are M=1 worlds. (The closest worlds where you bet have a "miracle".) This suggests that B and M are related by equation (3), which would make (2) an incorrect system.
(3) M := B.
We could, of course, deny that the closest B=1 worlds are M=1 worlds, but then we'd run into the analogous problem with the past and (A2). So let's assume that the closest B=1 worlds are M=1 worlds, if only for the sake of the argument.
Gallow argues that (3) is incorrect even if the closest B=1 worlds are M=1 worlds, because the closest B=0 worlds aren't all M=0 worlds: they include worlds at which the (actual) laws of nature are violated at around the time of the bet.
This makes no sense if closeness is a matter of intuitive similarity at the relevant time (as we are told on p.14). The actual world is surely more similar to itself than any world in which the actual laws are violated at the time of the bet.
So closeness doesn't measure intuitive similarity. It's a technical concept that measures something else. I'm not sure what it measures. Gallow doesn't really explain. He describes a different example involving Jesus that he thinks motivates the idea that the actual world need not be the unique closest world to itself. I don't really follow his intuitions here. The idea seems to be that if J is a variable for whether Jesus was born, then an equation like D := J means that D=1 is guaranteed no matter how J=1 is realised. Even if J actually has value 1, we therefore need to check that D=1 is the case at nearby worlds at which J=1 is realised in some other way. OK.
Perhaps B=0 can also be realised in different ways. On Gallow's reading, equation (3) then requires that at nearby worlds at which B=0 is realised in some other way, the actual laws obtain without violation. On a Lewisian conception of closeness, these worlds all involve violations of the laws. So (3) is false. And so (2) is correct. Here it is again.
(2) W := B * M.
Now it's easy to see why 'B=1 > W=0' ("if you were to bet that the laws are violated you wouldn't win") is true. The actual values of B and M are B=0 and M=0. To evaluate the conditional, we set B=1, leaving M at 0, and plug these values into equation (2). We get W=0.
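In the sketch above, the same computation looks like this:

```python
# The miracle bet: W := B * M, with actual values B=0, M=0.
miracle = CausalModel({'W': lambda v: v['B'] * v['M']}, {'B': 0, 'M': 0})

revised = miracle.revise('B', 1).solve()
print(revised['M'], revised['W'])  # 0 0: no miracle, and you don't win
```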
The same reasoning applies for the past. If C describes the initial conditions of the universe and B is a bet that the initial conditions are not C, then (4) is a correct system of equations:
(4) W := B * (1-C).
We get the desired result that if you had bet that the initial conditions are different then you would have lost, because the initial conditions would not have been changed, as per (A2).
If we're only interested in (A1) and (A2), we can actually use simpler models.
Let M1 be a model with variables B and M, an empty system of equations, and the assignment { B=0, M=0 }. This model is correct by Gallow's criteria. Revising M1 by B=1 ("you choose otherwise") yields a model with M=0. So we have 'B=1 > M=0' ("the laws would not be violated").
Similarly, let M2 be a model with B and C, an empty system of equations, and assignment { B=0, C=1 }. Revising this correct model by B=1 ("you choose otherwise") determines C=1 ("the initial conditions would be the same").
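In terms of the earlier sketch, these models have empty equation sets, so revising them leaves every other variable untouched:

```python
# M1: no equations; B and M are both exogenous.
m1 = CausalModel({}, {'B': 0, 'M': 0})
print(m1.revise('B', 1).solve()['M'])  # 0, so 'B=1 > M=0'

# M2: likewise for B and the initial conditions C.
m2 = CausalModel({}, {'B': 0, 'C': 1})
print(m2.revise('B', 1).solve()['C'])  # 1, so 'B=1 > C=1'
```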
But how could the deterministic laws and the initial conditions both be the same, if you had chosen a different act?! Would an impossible situation have obtained?
According to Gallow, we're not licensed to conclude that the laws and the initial conditions would both have been the same. The "agglomeration" rule for counterfactuals fails.
What if we try to build a model to check whether 'B=1 > (M=0 ∧ C=1)' is true? We can't. The three variables B, M, and C are not independent. M=0 and C=1 entail B=0. Any system with these three variables violates the first condition for correct systems: that the variables must be "distinct". Gallow's semantics therefore doesn't tell us whether 'B=1 > (M=0 ∧ C=1)' is true.
The distinctness condition has an independent motivation. We don't want to say that your playing poker causally determines that you play cards. But if we could use one variable P for poker and another C for cards, then the equation C := P would be a correct system. We need the distinctness condition to rule it out.
So much for my summary. I have eleven worries/objections.
One. As Gallow points out, the coin scenario looks like the famous Morgenbesser case. It is widely thought that to vindicate the intuition that B=1 > W=1 ("if you had bet on heads you would have won"), one must put explicitly causal notions into the analysis of the conditional. Gallow gets the intuition right without invoking any causal notions. This looks like an advantage. But I think it's a problem. The Morgenbesser intuition depends on the extent to which the betting is isolated from the coin flip. Take an extreme case of non-isolation: if you bet on heads, the coin is flipped gently with the left hand; if you don't, it is flipped vigorously with the right hand. As before, you don't bet, the coin is flipped (vigorously with the right hand) and lands heads. If you had bet on heads, would you have won? This is far from clear. I'd say no. There would have been an entirely different coin flip. Who knows how it would have landed? But the system composed only of equation (1) is still correct by Gallow's standards. And the exogenous variables still have values { B=0, H=1 }. Revising this model by B=1 still yields W=1. So 'B=1 > W=1' comes out true, in my version of the story with extreme non-isolation. That's the wrong result.
Two. It is crucial to Gallow's account that there is more than one closest B=0 world, even if B=0 is actually true. I already mentioned that I don't really see an independent motivation for this assumption. If the idea is that B=0 can be realised in different ways, and that the equations should be robust across these ways, then OK, but that's not enough. What if B=0 is a very precise description of a bet that can't be realised in relevantly different ways? We don't want to say that if you had taken the bet then there would have been no law violations but if you had taken the bet in such-and-such specific way then there would have been law violations.
Three. Interventionist accounts tend to neglect the need for a "ramp". An example from Bennett (2003): At time t, a dam bursts. Several cars on the valley road are swept away. Some drivers and passengers die. What would have happened if there had been no cars on the road at t? Nobody would have died. (Let's say.) Would there have been an inquiry into the mystery of the disappearing cars? Intuitively, no. We can construct a model with variables for E = whether cars enter the road at t-1, R = whether there are cars on the road at t, I = whether there is an inquiry into the mystery of disappearing cars. The equation I := E * (1-R) is plausibly correct. (This is so even if our closeness standards require a ramp, so that the closest R=0 worlds are E=0 worlds.) Intervening with R=0 therefore yields I=1. But 'R=0 > I=1' is intuitively false.
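For concreteness, here is the valley road model in the earlier sketch, assuming the actual values E=1 (cars entered) and R=1 (cars on the road):

```python
# Valley road: I := E * (1-R).
road = CausalModel({'I': lambda v: v['E'] * (1 - v['R'])}, {'E': 1, 'R': 1})
print(road.revise('R', 0).solve()['I'])  # 1: the model validates 'R=0 > I=1'
```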
Four. Continuing the valley road example, if our closeness standards require a ramp, so that cars don't magically appear or disappear at nearby worlds, then the equation E := R comes out correct. But it has the wrong direction. (Lewis (1981a) tried to get around this by hoping that ramp "events" like E would be too disjunctive to count as genuine events. But (a) this depends on the details of the case, and (b) Gallow doesn't have a rule that events/variables can't be disjunctive.) If our closeness standards don't have a ramp, even the equation I := 1-R is correct. Intuitively, neither E := R nor I := 1-R correctly represents the causal structure of the world.
Five. "If you had played poker you would have played cards" is true. It may not be a strictly "causal" counterfactual, but it's the kind of counterfactual that could well be relevant to CDT. (Imagine you assign basic value to playing cards and wonder whether to play poker or go for a walk.) We need a theory of counterfactuals that doesn't just cover conditionals between distinct variables.
Six. Is M actually distinct from B, as required for the correctness of equation (2)? Gallow doesn't formally define distinctness. He suggests that we can model variables as classes of Lewisian events, each of which is a class of possible spacetime regions (see Lewis (1986)). On this understanding, M=0 comprises all of spacetime in all worlds where the actual laws aren't violated, and M=1 comprises all other regions in all worlds. Wherever B=0 and B=1 happen, they're part of the M=0 region or the M=1 region. Without further constraints on events, B=0 and M=0 definitely aren't distinct by the somewhat complicated criteria from Lewis (1986, 258–60).
Seven. The distinctness constraint isn't enough. Let 'Xanthippe's widowing' be the class of possible spacetime regions that are fully occupied by Xanthippe and simultaneous with Socrates's death. This "event" is distinct from Socrates's death. And it "occurs" whenever Socrates dies. If we allow a variable X for Xanthippe's widowing and a variable S for Socrates dying, the equation S := X is correct. (As is X := S.) But there's no causal relationship between S and X. (This problem was raised by Kim (1973) against Lewis (1973).) I worry that whatever constraint rules out equations with S and X (Lewis (1981b) suggested that X is too gerrymandered to count as an event) will also rule out equations with B and M.
Eight. This is really just a feeling of uneasiness. M doesn't look like the kind of variable that should figure in a causal model. Causal models are useful to represent relations of causal influence between concrete and local types of events: whether a brake was released, how fast a ball was thrown, etc. I wouldn't expect that one could have, say, a variable for whether the equation W := B * H is correct in a causal model. But M is just like that.
Nine. Suppose the deterministic laws of nature have a certain parametric form L(x). There are only two possible parameter values, 0 and 1. The true value is 1. Let L state that there are unviolated laws of the form L(x). B is a bet that a certain event in the past (say, the Holocaust) never happened. C is a variable for the initial conditions. You are sure that the event happened, so you don't choose B. This means that L(1) and C together entail ¬B. L(0) and C together entail that the history of the universe takes a different path. Let's say it leads to a history in which the event didn't happen and you now choose B. We can model the structure of this scenario with variables for L, B, C, and W (for winning the bet). All combinations of values are possible, because L doesn't settle whether the laws are L(1) or L(0). The equation W := C * L * B is correct. Since we have C=1 and L=1, setting B=1 yields W=1. On Gallow's account, we have to conclude that if you had bet that the Holocaust didn't happen, then the Holocaust would not have happened. (In general, both the laws and the past seem to depend on our present actions if we look at models with suitably weakened versions of M and C.)
Ten. I would have thought that if there are two correct but partial models of the world's causal structure, then they can be combined into a larger correct model. According to Gallow, M1 and M2 can't be combined in this way. Is the total causal structure of the world unrepresentable?
Eleven. Suppose I'm confident that there are no violations of the actual laws (M=0) and that the initial conditions are C (C=1). Someone offers me a deal that pays $1 if M=0 and $-1 if M=1. Gallow says that I should accept. Then someone offers me a deal that pays $1 if C=1 and $-1 if C=0. Again Gallow says that I should accept. But if someone offers me a deal that pays $2 if M=0 and C=1, $-2 if M=1 and C=0, and $0 otherwise, then Gallow says that it's unclear what I should do because the situation can't be modelled. But isn't the last deal equivalent to the first two combined? How could it be rationally required to accept the first two and not the last?
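A quick check that the third deal's payoff is, in every case, exactly the sum of the first two:

```python
# Payoffs as functions of whether M=0 and whether C=1.
deal1 = lambda m0, c1: 1 if m0 else -1
deal2 = lambda m0, c1: 1 if c1 else -1
deal3 = lambda m0, c1: 2 if m0 and c1 else (-2 if not m0 and not c1 else 0)

for m0 in (True, False):
    for c1 in (True, False):
        assert deal1(m0, c1) + deal2(m0, c1) == deal3(m0, c1)
print("third deal = first two combined, in every case")
```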
Oh my, so many great objections---thanks! Let me try to take them one by one.
1. How the coin lands (heads or tails) is influenced by the upwards and angular velocity with which it is flipped. In this version of the case, the upwards and angular velocities of the coin are influenced by which hand it's flipped with, which is in turn influenced by whether you take the bet. So there's influence leading from whether you take the bet to how the coin lands. This influence won't be deterministic, since the influence of the hand on initial velocities won't be deterministic. So we'll need a more general treatment of indeterministic relations of influence.
However, the existence of this larger model, with the path of influence leading from whether you take the bet to how the coin lands, is enough to show that the equation W := B * H is not correct. That equation says that there's no influence from B to H, which is not true. So the equation violates condition (E3) of "Causal Influence".
2. Insofar as taking the bet in precisely the way you actually do isn't an option, I don't think this worry will affect the decision theory. But the English conditional "if you hadn't taken the bet in precisely that way, there would have been a miracle" still seems false. However, I don't see why there can't be multiple not-too-different worlds at which you choose the bet in precisely the way you actually did. Those worlds will involve slight changes to other states of the world.
You might want to open the door to all kinds of variables, including one---call it "V"---which only takes on the value v at the actual world. This makes it all the more important to use a selection function which isn't strongly centred. If we were to use a strongly centred selection function, we'd say that literally every variable causally determines the value of V: whenever U is actually u, s(U=u, @) = { @ } implies that V=v, and s(U=u', @) implies V ≠ v.
3. I see the need for a temporal 'ramp' in some conditionals as a species of a more general problem. Suppose there's a switch connected to a lever connected to a duct. Consider a variable which describes the position of the switch, S=0 if the switch is down, S=1 if the switch is up. In fact, the position of the switch causally influences whether the duct is open. When the switch is down, that pushes the lever up, which pulls the duct open. When the switch is up, that pushes the lever down, which closes the duct. But now there are two different kinds of worlds we could include in s(S=1, @). We could consider worlds where the switch is moved but the lever is kept in place and is no longer connected to the switch. If we do that, then wiggling the switch won't wiggle the duct. Alternatively, we could consider worlds in which the switch is moved and the lever remains connected to the switch. In that case, wiggling the switch will wiggle the duct.
This strikes me as the same problem, though it doesn't involve time at all. My solution is to say that variables carry presuppositions. They only take on values when those presuppositions are met. There's a variable for the switch's position which takes on a value even when the switch isn't connected to the lever. And there's another which takes on a value only if the switch is connected to the lever. The second one is the one which influences whether the duct is open or not. The presuppositions we are usually inclined to make about a system influence which kinds of variables we're inclined to talk about when describing that system.
4. In the road example: one of the presuppositions of the variable E (at least, the one we're normally inclined to talk about) is that R=1. So the value of R won't wiggle when E wiggles.
I didn't say that variables can't be too disjunctive, but I take it that it's compatible with everything I say in the paper that there is a naturalness constraint on which variables can enter into relations of causal influence. (Related to my discussion below, if we want to use anything like the Lewisian mereology for variable values, we'll need a naturalness constraint.)
5. It's not obvious to me that we need the conditional "If you had played poker, you would have played cards" in order to apply causal decision theory. Let C be a variable whose value depends upon whether I play cards and which card game I play. If I assign basic value to playing cards, then my value function can be determined by the value of the variable C. I don't need an additional, more coarse-grained variable for whether I play cards.
I agree that this is a true English language counterfactual. But I don't think there's causal influence between whether you play poker and whether you play cards. I'd be interested to see a broader theory which allowed us to handle counterfactuals between overlapping variables. Woodward has some suggestions, but I don't have anything constructive to add myself. Developing a theory like that strikes me as a worthwhile research program.
6. Lewis recognises two different kinds of mereology for events: since events are classes, they have the mereology from his *Parts of Classes*. With this mereology, M and B will not be distinct, since there are members of M=1 which are also members of B=1. But this is not the mereology that matters for Lewis's theory of causation when he demands that causes be distinct from their effects. It is instead the rather complicated "spatiotemporal" mereology. We can carry this mereology over to the case of variables straightforwardly.
Following Lewis, say that V=v *in* a region R iff R is a member of V=v. Say that V=v *within* R iff V=v in some subregion of R. Say that V=v *implies* U=u iff necessarily, if V=v in R, then U=u in R, too. Say that V=v is *essentially a part of* U=u iff necessarily, if V=v in R, then U=u within R. Say that an actual variable value V=v is part of an actual variable value U=u iff there is an actual variable value, I=i, such that I=i implies V=v and there is a variable value J=j such that I=i is essentially a part of J=j and J=j implies U=u. Then, V=v and U=u overlap iff they have a part in common. They are distinct iff they have no part in common.
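Since this chain of definitions is intricate, here is a toy formalisation in Python, where a region is a frozenset of spacetime points and a variable value is the set of (world, region) pairs at which it occurs across all worlds, so that the modal quantifiers become quantifiers over those sets. The representation is invented purely for illustration.

```python
# A region is a frozenset of spacetime points; a variable value is the
# set of (world, region) pairs at which it occurs, across all possible
# worlds, so 'necessarily' becomes quantification over those pairs.

def in_region(vv, w, r):
    """V=v in R at world w."""
    return (w, r) in vv

def within(vv, w, r):
    """V=v within R at world w: V=v in some subregion of R."""
    return any(w2 == w and r2 <= r for (w2, r2) in vv)

def implies(vv, uu):
    """Necessarily, if V=v in R, then U=u in R too."""
    return vv <= uu

def essentially_part_of(vv, uu):
    """Necessarily, if V=v in R, then U=u within R."""
    return all(within(uu, w, r) for (w, r) in vv)

def part_of(vv, uu, all_values, actual_world):
    """V=v is part of U=u iff some actual I=i implies V=v and is
    essentially a part of some J=j which implies U=u."""
    actual = [ii for ii in all_values
              if any(w == actual_world for (w, _) in ii)]
    return any(implies(ii, vv) and essentially_part_of(ii, jj)
               and implies(jj, uu)
               for ii in actual for jj in all_values)

def distinct(vv, uu, all_values, actual_world):
    """V=v and U=u are distinct iff they have no part in common."""
    return not any(part_of(xx, vv, all_values, actual_world) and
                   part_of(xx, uu, all_values, actual_world)
                   for xx in all_values)
```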
Neither M=1 nor M=0 implies B=0 or B=1, nor does B=0 or B=1 imply M=0 or M=1, since you could take the bet and not take the bet in a world with a miracle and in a world without a miracle. So neither M=1 nor M=0 is essentially a part of B=0 or B=1. And neither B=0 nor B=1 is essentially a part of M=0 or M=1, for the same reason. As with Lewis's theory of events, much will depend upon which variable values we countenance. With gerrymandered variables, we could get variable values I=i and J=j so that both I=i and J=j contain a single region at the actual world, R and R', respectively, so that B=1 in R and M=0 in R'. Then, I=i will imply B=1, I=i is essentially a part of J=j, and J=j implies M=0. So we'll have that B=1 is a part of M=0. But that's just the same problem Lewis faces with gerrymandered events. If we want to use a mereology like Lewis's, it's important that we restrict the kinds of variable values we countenance.
7. Relatedly, variables should depend upon the intrinsic properties of the regions in which they occur. So there is no variable value corresponding to 'Xanthippe's widowing'. This rules out a variable like X, but does not rule out variables like B and M, since whether there's a violation of the laws in a region of spacetime R doesn't depend upon what happens in regions outside of R, and whether you take the bet in a region R doesn't depend upon what happens outside of R.
8. Relatedly, as I was thinking about things, "M" was a variable describing whether the goings-on in some region of spacetime just before you decide whether to take the bet satisfy the laws of nature or not. So it will be local, at least.
9. I don't see why the equation W := C * L * B will be true in this situation. In order for the equation to be correct, it must be that s(B=1, @) implies that you win the bet, W=1. (Since, actually, C=L=1.) Take a 'miraculous' understanding of the selection function. We've assumed that the Holocaust actually happened. Changing whether I take the bet with a localised miracle won't change whether the Holocaust happened in the past. Since the bet pays out only if the Holocaust didn't happen, s(B=1, @) won't imply W=1. So the equation won't be correct.
10. Mereologically overlapping variables both influence and are influenced by other variables. So if we want to represent the entirety of the world's causal structure in a single model, we will need a model which allows us to include these overlapping variables. Doing this in a way which allows us to model interventions on variables' values will require more bells and whistles than you can find in the models people are currently playing around with. So, no, those models cannot always be combined into a larger correct model. (Though you could always just represent the entirety of the world's causal structure with a set of all correct models.) Developing models like this strikes me as a worthwhile research project.
11. This is a nice argument for agglomeration (the principle that, if X is not under your control and Y is not under your control, then X&Y is not under your control). To be clear: I don't take a stand on agglomeration. And while I like the argument, I'm still on the fence. I don't think that, in general, a bet on X&Y is rational whenever a bet on X would be rational and a bet on Y would be rational. Compare: A is a $1 bet on whether the coin doesn't land heads which costs 50 cents. You take the bet by flipping the coin. B is a $1 bet on whether the coin doesn't land tails. You take the bet by flipping the coin. C is a $1 bet on whether the coin doesn't land heads and doesn't land tails. Again, you take the bet by flipping the coin. Taking bet A is rational and taking bet B is rational, but taking bet C is not.
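For concreteness, a quick expected-value check, assuming a fair coin and (what isn't stated above) that each bet costs 50 cents:

```python
# Probability, once the coin is flipped, that each bet's winning
# condition holds, and the resulting expected value of taking it.
bets = {'A': 0.5,   # coin doesn't land heads
        'B': 0.5,   # coin doesn't land tails
        'C': 0.0}   # coin lands neither heads nor tails
cost = 0.50
for name, p in bets.items():
    print(name, p * 1.00 - cost)  # A: 0.0, B: 0.0, C: -0.5
```

Taking A or B at least breaks even, while taking C loses money no matter what: agglomerating the two winning conditions destroys the bet's value.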