Exploring the Rational Speech Act framework
I've been playing around with the Rational Speech Act framework lately, and I want to write a few blog posts clarifying my thoughts. In this post, I'll introduce the framework and go through a simple application.
1. The framework
The guiding idea behind the Rational Speech Act framework is to model speakers and hearers as rational (Bayesian) agents who think strategically about each other's behaviour. A hearer doesn't just update on the literal content of an utterance, but on the fact that the utterance has been made, by a speaker who anticipated that the hearer would update in some such way.
In its purest form, this kind of reasoning leads to an infinite regress. To interpret your utterance, I need to figure out why you made it. To do that, I need to figure out how you thought I would interpret the utterance, which depends on what you believe about what I believe about why you made it, and so on.
One way to block the regress is to assume that people are not infinitely smart. Think of strategic reasoning in terms of models. A speaker needs a model of the hearer to anticipate how they would interpret an utterance. A hearer needs a model of the speaker to figure out why they made the utterance. In principle, a speaker therefore needs a model of the hearer that includes a model of the speaker that includes a model of the hearer and so on. What we're going to assume is that these models within models within models are increasingly unsophisticated. At some point, we might reach a model of a hearer who simply updates on the literal content, without any strategic reasoning about the speaker. Or we might reach a model of a speaker who does not engage in strategic reasoning. This will stop the regress.
Instead of computing what speakers and hearers at various levels of sophistication would do, we are going to simulate them here in the browser, so that we only need to observe their behaviour.
(I learned about such simulations, and about the RSA framework more generally, from problang.org. Compared to the discussion there, I will focus on philosophical and conceptual issues, with less maths and less code.)
2. A first example
Let's begin with a variant of a scenario from Frank and Goodman (2012), where the RSA framework was first introduced. (This is also the first example on problang.org.)
Imagine the following situation. A (female) speaker has drawn an object from an urn, and wants to communicate to a (male) hearer what she has drawn. It is common knowledge that there were three objects in the urn: a blue square, a blue circle, and a green square.
For some reason, there are only four utterances from which the speaker can choose: 'Blue', 'Green', 'Circle', and 'Square'.
Imagine you're the speaker and you've drawn a green square. What should you say? You could say 'Green' or 'Square'. But 'Green' is more informative, as it rules out the blue square possibility. So you should say 'Green'. By the same reasoning, you should say 'Circle' rather than 'Blue' if you got a blue circle. What should you say if you got a blue square? This is less obvious. If you anticipate that the hearer can replicate the above considerations, both 'Blue' and 'Square' would do the job of communicating that you have drawn a blue square.
Let's write a simulation to verify these predictions.
We first define the relevant states of the world and the available utterances.
var states = ['Blue Square', 'Blue Circle', 'Green Square'];
var utterances = ['Blue', 'Green', 'Circle', 'Square'];
These statements look like JavaScript, but the programming language is actually WebPPL. You won't need to understand the implementations to follow along, but I'll briefly explain what's going on.
We represent the three possible draws as strings. The first line in code block #1 (see the little number in the top right?) sets the variable states to denote a list of these strings. The second line sets the variable utterances to denote the available messages.

Given the way we've defined states and utterances, an utterance u is true in a state s iff u is a substring of s. (For example, 'Blue' is true in states 'Blue Square' and 'Blue Circle'.) Let's define a function is_true that takes an utterance and a state as arguments and returns the truth value of the utterance in the state.
// continues #1
var is_true = function(utterance, state) {
  return state.includes(utterance);
};
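As a quick sanity check, you can paste the following two lines at the end of code block #2 and run it; the comments give the truth values that the substring test should return.

// Truth values under the substring semantics:
display(is_true('Blue', 'Blue Circle'));    // true: 'Blue' is a substring of 'Blue Circle'
display(is_true('Circle', 'Green Square')); // false: 'Circle' is not a substring of 'Green Square'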
Now we're ready to model speakers and hearers.
3. Level 0
Let's begin with an unsophisticated level-0 speaker whose only aim is to say something true, without any consideration of what the hearer will make of their utterance.
// continues #2
var speaker0 = Agent({
  options: utterances,
  credence: Indifferent(['Blue Square']),
  utility: function(u, s) { return is_true(u, s) ? 1 : 0; }
});
viz.table(choice(speaker0));
Here I use the function Agent from the webppl-rsa package in which I've defined a few helper functions for working with RSA models. The agent is initialized with three parameters: a list of options, a credence function, and a utility function. Our speaker0 agent has the four utterances as options. Her credence function is indifferent between all members of the list ['Blue Square']. The only member of that list is 'Blue Square'; so speaker0 gives credence 1 to having drawn a blue square. Since she only cares about saying something true, she assigns utility 1 to uttering a truth and 0 to uttering a falsehood. (In JavaScript, and WebPPL, condition ? A : B evaluates to A if condition is true, otherwise to B.)

The last line in code block #3 calls the function choice on speaker0. The choice function analyzes the agent's decision problem and returns a uniform distribution over all options that maximize expected utility. viz.table displays a probability distribution in a table. If you press the 'run' button (underneath code block #3), you should see that our level-0 speaker who has drawn a blue square is undecided between saying 'Blue' and saying 'Square'.
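To make the decision rule a bit more concrete, here is a minimal by-hand sketch of the expected utilities that choice compares. This is only an illustration, not the actual webppl-rsa implementation: because speaker0 is certain she has drawn a blue square, her expected utility for an utterance reduces to the utterance's truth value in that state.

// Hand-computed expected utilities for speaker0 (illustrative sketch only):
var expectedUtility = function(u) {
  // speaker0 gives credence 1 to 'Blue Square', so no averaging is needed.
  return is_true(u, 'Blue Square') ? 1 : 0;
};
display(map(expectedUtility, utterances)); // => [1, 0, 0, 1]
// 'Blue' and 'Square' tie for maximal expected utility, so choice(speaker0)
// returns a uniform distribution over exactly these two utterances.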
We can also define an unsophisticated level-0 hearer who simply updates on the literal meaning of what they hear, without any consideration of why the speaker might have chosen the utterance.
// continues #2
var hearer0 = Agent({
  credence: Indifferent(states),
  kinematics: function(utterance) {
    return function(state) {
      return is_true(utterance, state);
    };
  }
});
viz.table(learn(hearer0, 'Blue'));
The hearer0 agent starts out with a uniform prior over the three states. The kinematics parameter defines how the agent is disposed to update these priors in response to hearing an utterance. The kinematics function takes an utterance as argument and returns a function from states to truth-values. Think of the returned function as a proposition – the proposition the agent conditionalizes on when he hears the utterance. For hearer0, it is the proposition that maps a state to true iff the utterance is true in that state. In other words, hearer0 is disposed to conditionalize on the assumption that whatever utterance he hears is true.
The last line in code block #4 calls the function learn on hearer0 and the utterance 'Blue'. The learn function updates the agent's credence function with the supplied utterance, and returns the posterior credence. If you click 'run', you should see that when hearer0 hears 'Blue', he becomes indifferent between 'Blue Circle' and 'Blue Square'.

If you replace 'Blue' with 'Green' in the last line of code block #4, you can see what hearer0 would infer from hearing 'Green'. Yes, you can edit the code blocks!
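If you'd like to see what this literal update amounts to in plain WebPPL, here is a hypothetical paraphrase (a sketch under my own assumptions, not necessarily how webppl-rsa implements learn): draw a state from the uniform prior, condition on the utterance being true in that state, and return the resulting posterior.

// Illustrative paraphrase of the level-0 update (not the webppl-rsa internals):
var literalPosterior = function(utterance) {
  return Infer({method: 'enumerate'}, function() {
    var state = uniformDraw(states);       // uniform prior over the three draws
    condition(is_true(utterance, state));  // update on the literal content
    return state;
  });
};
viz.table(literalPosterior('Blue')); // 0.5 'Blue Circle', 0.5 'Blue Square'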
4. Level 1
Now let's model a somewhat more sophisticated hearer who doesn't just update on the literal meaning of an utterance, but on the fact that the speaker has chosen to produce it. The hearer assumes, however, that the speaker does not engage in any further strategic reasoning. In the hearer's internal model of the world, the speaker is a level-0 speaker. This makes the hearer a level-1 hearer.
In code block #3 above, we've defined a level-0 speaker who is certain that she has drawn a blue square. The level-1 hearer obviously doesn't know yet what the speaker has drawn. He needs to simulate what the level-0 speaker would do in response to each possible draw.
We therefore need to generalize our definition of a level-0 speaker, so that we can represent hypothetical speakers who have drawn something other than a blue square.
// continues #4
var speaker0 = function(observation) {
  return Agent({
    options: utterances,
    credence: Indifferent([observation]),
    utility: function(u, s) { return is_true(u, s) ? 1 : 0; }
  });
};
Now speaker0 is a function that returns a hypothetical speaker when given a hypothetical observation as input. speaker0('Blue Square') is the speaker from code block #3.
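For instance, a hypothetical level-0 speaker who has drawn the blue circle still only cares about saying something true, so she should come out undecided between the two utterances that are true of her draw:

// A level-0 speaker with a different (hypothetical) observation:
viz.table(choice(speaker0('Blue Circle'))); // 0.5 'Blue', 0.5 'Circle'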
Here's the code for a level-1 hearer who responds to an utterance by conditionalizing on the assumption that a level-0 speaker would have produced that utterance.
// continues #5
var hearer1 = Agent({
  credence: Indifferent(states),
  kinematics: function(utterance) {
    return function(state) {
      var speaker = speaker0(state);
      return sample(choice(speaker)) == utterance;
    };
  }
});
showKinematics(hearer1, utterances);
The call to sample in the kinematics function takes into account that a state may not determine the speaker's utterance: we know that a level-0 speaker who has drawn a blue square has an equal chance of saying 'Blue' and 'Square'. (If you want to understand how this sample-based updating works, I recommend working through the first few chapters of probmods.org.)
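If it helps, here is a hypothetical plain-WebPPL paraphrase of this sample-based update (again a sketch, not the webppl-rsa internals): for each candidate state, simulate which utterance a level-0 speaker in that state would choose, and condition on that choice matching the utterance actually heard.

// Illustrative paraphrase of the level-1 update (not the webppl-rsa internals):
var pragmaticPosterior = function(utterance) {
  return Infer({method: 'enumerate'}, function() {
    var state = uniformDraw(states);
    // Condition on a simulated level-0 speaker in this state producing
    // the utterance we actually heard (her choice may be random).
    condition(sample(choice(speaker0(state))) == utterance);
    return state;
  });
};
// viz.table(pragmaticPosterior('Blue'));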
showKinematics is a shortcut for displaying the posterior credence in response to each utterance. If you click 'run', you can see that the update works as it should (up to the 16th decimal place at least – this is a limitation of WebPPL).
If you replace hearer1 in the last line with hearer0, you get the same output. So our level-1 hearer behaves just like the level-0 hearer from code block #4 above. You may want to think about whether this is always the case. (Hint: check what happens if you remove the 'Square' option from the available utterances in code block #1.)
Let's also model a level-1 speaker who represents her addressee as a level-0 hearer. This won't make any difference if the speaker is still only interested in choosing an utterance that isn't false. So let's also change the speaker's goals. Our level-1 speaker won't intrinsically care about speaking the truth any more. Instead, she cares about the accuracy of the beliefs that her addressee is expected to form upon hearing her utterance.
// continues #4
var speaker1 = function(observation) {
  return Agent({
    options: utterances,
    credence: Indifferent([observation]),
    utility: function(u, s) { return learn(hearer0, u).score(s); }
  });
};
showChoices(speaker1, states);
learn(hearer0, u).score(s) is the logarithm of the probability assigned to state s by hearer0 after updating on utterance u. This is our accuracy measure.
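To get a feel for this accuracy measure, here are the values it yields in two cases we already know from the level-0 hearer: 'Blue' leaves hearer0 split between two states, while 'Green' pins down the green square.

// Accuracy of hearer0's posterior in the true state:
display(learn(hearer0, 'Blue').score('Blue Square'));   // log(1/2) ≈ -0.69
display(learn(hearer0, 'Green').score('Green Square')); // log(1) = 0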
As you can see, the level-1 speaker always says 'Green' if she has drawn the green square; she says 'Circle' if she got the blue circle; and she is undecided between 'Blue' and 'Square' if she got the blue square. This is what we predicted above.
5. Higher levels
Higher-level agents are now trivial to define. A level-2 hearer (who represents the speaker as a level-1 speaker) can be defined exactly like the level-1 hearer in code block #6, except that the call to speaker0 is replaced by a call to speaker1. A level-2 speaker can likewise be defined just like the level-1 speaker in code block #7, except that we'd call hearer1 instead of hearer0 in the utility function.
Strategic reasoning always bottoms out either in a level-0 hearer or in a level-0 speaker. Here is the complete code for a hearer-terminal model up to level 3.
var states = ['Blue Square', 'Blue Circle', 'Green Square'];
var utterances = ['Blue', 'Green', 'Circle', 'Square'];

var is_true = function(utterance, state) {
  return state.includes(utterance);
};

var hearer0 = Agent({
  credence: Indifferent(states),
  kinematics: function(utterance) {
    return function(state) {
      return is_true(utterance, state);
    };
  }
});

var speaker1 = function(observation) {
  return Agent({
    options: utterances,
    credence: Indifferent([observation]),
    utility: function(u, s) { return learn(hearer0, u).score(s); }
  });
};

var hearer2 = Agent({
  credence: Indifferent(states),
  kinematics: function(utterance) {
    return function(state) {
      return sample(choice(speaker1(state))) == utterance;
    };
  }
});

var speaker3 = function(observation) {
  return Agent({
    options: utterances,
    credence: Indifferent([observation]),
    utility: function(u, s) { return learn(hearer2, u).score(s); }
  });
};

// showKinematics(hearer2, utterances);
// showChoices(speaker3, states);
This code models a number of hearers and speakers of increasing sophistication, beginning with a naive "level-0" hearer (hearer0) who simply conditionalizes on what has been said, and ending with a speaker (speaker3) who chooses an utterance based on what she expects to be the hearer's reaction, whom she models as a level-2 hearer who in turn models the speaker as a level-1 speaker who models the hearer as a level-0 hearer.

In our urn example, speaker3 behaves just like speaker1, so any level of sophistication beyond level 2 is redundant. (You can see what the agents do by uncommenting the calls to showChoices or showKinematics and entering the agent you want to inspect as the first argument.)
Here is the code for a speaker-terminal model up to level 3.
var states = ['Blue Square', 'Blue Circle', 'Green Square'];
var utterances = ['Blue', 'Green', 'Circle', 'Square'];

var is_true = function(utterance, state) {
  return state.includes(utterance);
};

var speaker0 = function(observation) {
  return Agent({
    options: utterances,
    credence: Indifferent([observation]),
    utility: function(u, s) { return is_true(u, s) ? 1 : 0; }
  });
};

var hearer1 = Agent({
  credence: Indifferent(states),
  kinematics: function(utterance) {
    return function(state) {
      return sample(choice(speaker0(state))) == utterance;
    };
  }
});

var speaker2 = function(observation) {
  return Agent({
    options: utterances,
    credence: Indifferent([observation]),
    utility: function(u, s) { return learn(hearer1, u).score(s); }
  });
};

var hearer3 = Agent({
  credence: Indifferent(states),
  kinematics: function(utterance) {
    return function(state) {
      return sample(choice(speaker2(state))) == utterance;
    };
  }
});

// showChoices(speaker2, states);
// showKinematics(hearer3, utterances);
After a few iterations of strategic reasoning, this model makes the same predictions as the hearer-terminal model. For example, hearer3 in the speaker-terminal model responds just like hearer2 in the hearer-terminal model. The speaker-terminal model takes a little longer to reach equilibrium: only from level 3 onwards do further iterations become redundant.
6. What's the point?
Grice had the insight that much of the complexity of language might be explained by combining a simple literal semantics with the hypothesis that speakers are cooperative and that hearers know that they are. (Neo-)Griceans typically propose explicit derivations of enriched meanings. In the above example, one might suggest that a hearer goes through something like the following steps when he hears the speaker utter 'Square'.
1. The speaker uttered 'Square'.
2. The speaker knows the true state and tries to be informative.
3. If the speaker had drawn a green square, it would have been more informative to utter 'Green' than 'Square'.
4. So the speaker didn't draw a green square.
5. So she must have drawn a blue square.
The hearers we've modelled reach the same conclusion. But we didn't need to posit any explicit steps of reasoning. It's all just Bayesian updating.
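You can check this directly against the models above: after hearing 'Square', the pragmatic hearer from the hearer-terminal code block puts all his credence on the blue square.

// Using the definitions from the hearer-terminal code block:
viz.table(learn(hearer2, 'Square')); // all credence on 'Blue Square'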
Sometimes Neo-Griceans put forward derivations that can't be replicated by Bayesian updating. (We'll meet examples later.) The RSA framework can be useful to check such proposals and reveal hidden assumptions.
More generally, the RSA framework allows us to model how semantic content relates to linguistic behaviour.
Metaphysically, linguistic behaviour (widely understood) comes first. The sentence 'es regnet' means what it does in the German-speaking community because of how it is used in that community. What is the relevant pattern of use? To a first approximation, people utter the sentence only when it is raining. But it won't do to say that a sentence S means P iff people (are disposed to) utter S only when P. (This would yield an ill-behaved and unfamiliar notion of meaning.) The connection between meaning and use is more subtle and complicated.
Here's how Lewis (1986, 40) put the problem:
Suppose we want a systematic grammar, covering not only syntax but semantics, for a natural language or some reasonable imitation or fragment thereof. Such a grammar is meant to plug into an account of the social practice of using language. It encapsulates the part of the account that is different for different linguistic communities […]. What makes the grammar correct for a given population is that, when plugged into its socket, what results is a correct description of their linguistic practice – of the way they suit their words to their attitudes, of the way they suit their attitudes to others' words, and of their mutual expectations concerning these matters.
An RSA model is a simplified "account of the social practice of using language" – a simplified model of the socket into which the grammar plugs.
From this perspective, speaker-terminal models can seem more natural than hearer-terminal models. Lewis (1969) suggested that there is a basic convention to utter an expression only if such-and-such conditions obtain: 'it is raining' only if it is raining, 'Square' only if one has drawn a square, and so on. A naive speaker would randomly choose any expression that satisfies this convention. That's our level-0 speaker. A more sophisticated (level-2) speaker also considers the effect of her utterance on the beliefs of the hearer, who she knows to be aware of the basic convention. And so on.
I might return to these big-picture issues later. First, I want to look at some more applications.