Article for November: Probabilistic pragmatics

Hi everyone! November’s article for the Mint Journal club is a recent publication by Michael Franke and Gerhard Jäger called "Probabilistic pragmatics, or why Bayes’ rule is probably important for pragmatics", which appeared in Zeitschrift für Sprachwissenschaft this very year.


As the title suggests, the article outlines a probabilistic / Bayesian approach to pragmatics, emphasizing the role of uncertainty for speakers and listeners. It first describes the probabilistic approach and compares it to alternative methods; it then presents example applications showing how probabilistic modeling can be used to analyze data on pragmatic phenomena such as scalar implicature. The paper ends by arguing that the approach can be applied to indirect speech acts as well, using a mathematical implementation inspired by game theory.

Where do you see good possibilities or problems for a probabilistic take on pragmatics? Feel free to add your thoughts in the comments below.

4 Comments

  • Thomas Müller 22 November 2016 (15:49)

    A very nice and useful introduction to the framework of probabilistic pragmatics in my opinion, and quite readable for someone who doesn't know anything about the ideas in advance (though it might have been quite formula-heavy and mathematical for some of you?). From my point of view, some advantages of going Bayesian are that these models take baseline and context probabilities into account (if specified) and that they are, well, probabilistic and thus allow for uncertainty. This is certainly extremely helpful when dealing with behavioural data, which is what we usually have in experiments on language.
    Of course, other statistical approaches have developed similar strategies to account for probabilistic phenomena, and in fact some of the ideas vaguely reminded me of Item Response Theory. It would be interesting to see where the two methods overlap, and what unique ideas each can claim as its own (considering that probabilistic pragmatics is the conglomerate of different fields that the authors call it).
    I would have wished for a clearer introduction to how the models can be applied to more complex experiments, but I see that this was not the scope (hopefully we'll hear more on that when Michael gives his talk). Instead, the authors wanted to contrast the approach with the methods of formal pragmatics and show its added value. Thus, the most impressive part for me was the last chapter on game-theoretic implementations, because it was concerned with actual behavioural outcomes in addition to simple utterance phenomena of pragmatics.

  • Olivier Morin 24 November 2016 (09:22)

    I found much to enjoy and learn from in this ambitious and wide-ranging paper. What an enthusing research program! For discussion's sake, this comment focuses on one thing I found perplexing. My remark might be an objection to the framework of probabilistic pragmatics, but it could also reflect a misunderstanding on my part. I'm putting it forward in any case, hoping that even a simple clarification could open the way for a fruitful discussion.

    Predicting population-level variation is, according to this paper, among the main selling points of probabilistic pragmatics. Unlike more qualitative models, probabilistic models represent a distribution of alternatives, with each alternative being given a certain weight. Thanks to this, they can yield fine-grained predictions that address inter-individual differences and population-level patterns that other approaches would gloss over. To simplify (perhaps to the point of caricature), standard models will predict that a population of experimental subjects will choose A over B and C; a probabilistic approach will tell you how many subjects should favour A, B, or C (or how many times one subject would choose A, B, or C). As the authors put it, "the main difference that probabilistic pragmatics brings along [...] is that it can, by its very nature, go a step further: it often comes ready-made to predict, not only particular categorical features of the data, but the full quantitative pattern found in a dataset."

    And so it seems to work. The way I understand studies like Frank and Goodman (2012) (described on pp. 11 ff. of the paper), they are about trying to predict the frequency of pragmatic choices in a population of subjects, based on a model that represents subjects' beliefs about communication in probabilistic terms. Consider their Green square / Green circle / Blue circle experiment. Of the 180 subjects who were presented with the adjective "Green", a third thought that it best applied to the green square, while two thirds considered that it was a better fit for the green circle. Frank and Goodman (again, if I understand them right) see this as supporting a model of pragmatic inference where subjects assign the probability 1/3 to the first meaning, and the probability 2/3 to the second. If this is true, then there is good reason indeed to claim that "probabilistic pragmatics... by its very nature... comes ready-made to predict ... the full quantitative pattern found in a dataset."
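    For concreteness, here is a minimal Python sketch of a Frank-and-Goodman-style computation for this context. The word meanings and the uniform prior over referents are simplifying assumptions on my part, so the exact numbers are only illustrative of how graded listener probabilities arise.

    ```python
    # Toy reference game: three objects, a speaker who prefers more specific
    # words, and a listener who inverts the speaker model with Bayes' rule.
    objects = ["green square", "green circle", "blue circle"]
    words = {
        "green":  {"green square", "green circle"},
        "blue":   {"blue circle"},
        "square": {"green square"},
        "circle": {"green circle", "blue circle"},
    }
    prior = {obj: 1 / len(objects) for obj in objects}  # assumed uniform salience

    def speaker(word, referent):
        """P(word | referent): true words weighted by specificity
        (1 / number of objects the word applies to)."""
        if referent not in words[word]:
            return 0.0
        applicable = [w for w, ext in words.items() if referent in ext]
        weights = {w: 1 / len(words[w]) for w in applicable}
        return weights[word] / sum(weights.values())

    def listener(word):
        """P(referent | word) by Bayes' rule: speaker likelihood times prior."""
        scores = {obj: speaker(word, obj) * prior[obj] for obj in objects}
        total = sum(scores.values())
        return {obj: s / total for obj, s in scores.items()}

    print(listener("green"))  # 0.4 green square / 0.6 green circle / 0.0 blue circle
    ```

    Under these assumptions the listener ends up with a graded preference for the green circle over the green square rather than a categorical choice, which is the kind of quantitative prediction discussed above.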

    And yet, it is far from obvious that we can straightforwardly make population-level predictions (the fact that around 60 subjects chose the first meaning while 120 chose the second), on the basis of individual judgements of probability distributions (the notion that Frank and Goodman's idealised agent would assign a 1/3 probability to the first meaning and a 2/3 probability to the second one). The reason is simple: if a rational agent foresees a 2/3 probability of an event occurring, she should bet on it all the time (not two thirds of the time). A population of rational agents should do the same—all of them, not two thirds of them.

    Herbert Simon drew attention to the difference between subjective probabilities and population-level patterns with the case of the probability matching fallacy. Suppose a biased coin lands on Heads 70% of the time and on Tails 30% of the time, and there is a population of 100 rational, fully informed agents; each agent knows this probability distribution and represents the coin's state at each toss accordingly (Heads 70%, Tails 30%). We toss the coin and ask each agent to make a bet. How many of the 100 agents will bet on Heads? Some people (in the classes I taught) are tempted to answer "70." The right answer, of course, is 100: betting on Heads is the only rational choice, and since all agents are equally rational, all should place the same bet. Simon called "probability matching" the false but intuitive notion that the distribution of choices in the population should match the agents' subjective probabilities: that 70 people would call Heads and 30 people would call Tails.
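    To make the arithmetic of the example explicit, here is a small, assumption-laden Python simulation (the 70/30 coin and the 100 agents come from the example above; the number of tosses is arbitrary). It contrasts a population of maximisers, who all bet on the single most probable outcome, with a population of "probability matchers", whose bets mirror their subjective probabilities.

    ```python
    import random

    P_HEADS = 0.7      # the biased coin from the example
    N_AGENTS = 100
    N_TOSSES = 10_000  # arbitrary, just to average out noise

    random.seed(0)

    def maximise():
        return "H"  # every rational agent bets on the most probable outcome

    def match():
        return "H" if random.random() < P_HEADS else "T"  # bets mirror the 70/30 belief

    def accuracy(bet_rule):
        """Average proportion of correct bets per toss across the population."""
        correct = 0.0
        for _ in range(N_TOSSES):
            outcome = "H" if random.random() < P_HEADS else "T"
            bets = [bet_rule() for _ in range(N_AGENTS)]
            correct += sum(b == outcome for b in bets) / N_AGENTS
        return correct / N_TOSSES

    print("maximising population:", round(accuracy(maximise), 3))  # about 0.70
    print("matching population:  ", round(accuracy(match), 3))     # about 0.7*0.7 + 0.3*0.3 = 0.58
    ```

    The maximising population places 100 identical bets and is right about 70% of the time; the matching population splits roughly 70/30 and does worse, which is why probability matching is a fallacy for individually rational agents.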

    Coming back to the Green square / Green circle / Blue circle experiment: suppose that, following Frank and Goodman's decision method, an agent decides that the adjective "Green" could refer to the square (p=1/3), but more probably refers to the circle (p=2/3). We ask her what "Green" refers to; what should she do? Rationally, she should answer "the green circle" in all cases, and not simply most of the time, because that is always (and not most of the time) the most likely answer. Likewise, a population of rational agents following the same model should all give the same answer: the green circle.

    I am not denying that proponents of probabilistic pragmatics can sometimes model population-level phenomena, with great success; but I doubt that probabilistic models come ready-made, by nature, to model population-level phenomena. The transition from probability estimates by individuals to population-level frequencies seems anything but straightforward. What do you think?

  • Dan Sperber 26 November 2016 (20:15)

    Thanks, people of the Mint, for sharing this interesting article with the cognition-and-culture community! (Actually, since the paper is not in open access anywhere, the sharing was less than perfect: may I suggest you make sure that papers you want to blog about are in open access, if necessary by asking the authors to make them so on their own web page or just pirating the paper yourselves?).
    This article is a clear, simple, and useful presentation of formal “probabilistic pragmatics.” As a supporter of relevance theory, which is itself an informal but definitely probabilistic approach to pragmatics, I am very much in favour of improving the field by making use of the kind of formal tools Franke and Jäger are proposing. I agree that probabilistic pragmatics should be “(i) probabilistic, (ii) interactive, (iii) rationalistic or optimality-based, (iv) computational and (v) data-oriented.” This still leaves plenty of options, and in particular theoretical options regarding the kind of rational interaction involved.

    Franke and Jäger (as well as Noah Goodman and his collaborators) assume that, beyond playing different roles, speaker and listener are involved in “reasoning about the beliefs and intentions of the other interlocutor” in a symmetric manner. To what extent is this really the case?

    Consider, to highlight the importance of the issue by way of comparison, a simple strategic situation of a military kind. There are two armies, A and B. Only A is in a position to attack. If A attacks, B can either defend themselves or flee. By attacking, A gives evidence to B that they have reason to believe that they can defeat B, which may cause B to flee, making A win. On the other hand, B can reason that A relied on giving this evidence, and that this is a reason to discount it. If B now doesn't flee but counter-attacks, this gives evidence to A that B has evidence that it is likely to win, and this may give A reason to flee, and so on. Both sides are reasoning strategically in a symmetric manner, except for the fact that the interaction is initiated by A. Now modify the situation: imagine that B has nowhere to flee – maybe they have themselves engineered this situation by burning their bridges – whereas A could flee if they wanted to. This asymmetry changes the strategic reasoning for both sides in major ways.

    In relevance theory, we have argued that there is a basic asymmetry between speaker and listener. The speaker invests resources to be comprehended. The listener may opt out of investing resources to secure comprehension if he finds it too costly given the expected benefit. To take the example of the Green square / Green circle / Blue circle experiment that Olivier discussed in his comment, suppose the speaker says "green", which could refer to the square or to one of the circles, and that the listener tentatively concludes that the ambiguity cannot be resolved with a degree of confidence sufficient to exploit the information (which, judging from the results of the experiment, may be psychologically true). Then the listener will abort the process and either ask for repair, if this is an option, or give up on comprehension altogether. A genuine conversational situation is importantly different from the forced-choice problem-solving task presented to the participants in this experiment. More concretely, relevance theory develops the idea that there is a rational procedure for the listener, which consists in following a path of least effort in constructing an interpretation and stopping when his expectations of relevance have been satisfied. In the process, these expectations can be revised down to zero, in particular because the expected effort cost becomes too high or because the uncertainty of interpretation lowers the expected benefit too much, aborting the process. This relevance-guided procedure, we argue, is modular and part of the evolved equipment humans bring to the task of communicating with one another.
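    Purely to fix ideas (and not as relevance theory's actual formalism, which is informal), here is a toy Python caricature of the stopping procedure described above: interpretations are considered along a path of least effort, the listener stops as soon as expectations of relevance are satisfied, and aborts when the expected effort outweighs the expected benefit. All the numbers and the threshold are hypothetical.

    ```python
    def interpret(candidates, relevance_threshold):
        """candidates: (interpretation, effort, expected_benefit) triples,
        considered along a path of least effort (sorted by effort)."""
        for interpretation, effort, benefit in sorted(candidates, key=lambda c: c[1]):
            if benefit - effort >= relevance_threshold:
                return interpretation   # expectations of relevance satisfied: stop here
            if effort > benefit:
                return None             # effort outweighs expected benefit: abort comprehension
        return None                     # nothing satisfied expectations: give up or ask for repair

    # Hypothetical numbers for the "green" utterance in the three-object context:
    print(interpret([("green circle", 1.0, 1.4), ("green square", 1.5, 1.2)], 0.3))  # -> green circle
    # If interpretive uncertainty lowers the expected benefit, the process aborts:
    print(interpret([("green circle", 1.0, 0.9), ("green square", 1.5, 0.8)], 0.3))  # -> None
    ```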

    I am not trying here to argue for the relevance-theoretic approach. The point I want to make is that if there is some such asymmetry, whether the one described in relevance theory or another, then the strategic inferences of speaker and listener should be quite different from what is currently assumed in probabilistic pragmatics, even though the overall framework might still be the best one. This might, actually, make the formal probabilistic approach even more interesting and helpful for discovering non-trivial properties of comprehension than the present state of the art suggests.

  • James Winters 28 November 2016 (10:06)

    The Minties will already know I’m a big fan of this work, and I don’t want to spend time extolling the virtues of the paper (the paper itself does a good job on that front). One potential issue I wanted to raise is that, as we know from the decision-making literature, there are fundamental differences between small worlds and large worlds. A small world is a situation in which all relevant alternatives, their consequences, and their probabilities are known, allowing an optimal solution to be discovered. A large world situation, by contrast, is characterised by uncertainty: relevant information is unknown, or must be estimated from samples, and this uncertainty violates the conditions for rational decision theory. In small world situations, Bayesian models seem to perform pretty well, but when it comes to large world situations we find that basic heuristics frequently outperform these models (for a good overview, see Gigerenzer & Gaissmaier, 2011). The cases mentioned by Franke & Jäger seem to deal with small world situations, and it will be interesting to see how the assumptions of probabilistic pragmatics scale up when faced with large world situations (which, to some extent, touches on the problems raised by both Olivier and Dan).

    Lastly, I just wanted to offer a brief response to Dan's comment: Franke & Degen (2016) take some tentative steps toward modelling differences in the strategic inferences made by speakers and listeners. What they look at explicitly is the depth of pragmatic reasoning in reference games. This allowed them to model heterogeneous pragmatic reasoning in speakers (literal, Gricean, Hyper-pragmatic) and listeners (literal, exhaustive, Gricean). And, whilst this is not directly applicable to your point about the asymmetry between speakers and listeners, it seems that the assumption of population-level variation in pragmatic reasoning is warranted: “We compare a homogeneous type model that assumes that all speakers and listeners are probabilistic Gricean reasoners to a heterogeneous type model that also considers other theoretically motivated types of responders. We show by Bayesian comparison of nested models that the heterogeneous model is a better predictor of individual-level data, especially for listener comprehension. This suggests that individuals differ in their ability to perform ad hoc Quantity reasoning in reference games.”
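    As a self-contained illustration of the depth-of-reasoning idea (and emphatically not Franke & Degen's actual model), here is a Python sketch contrasting a "literal" listener type with a "Gricean" one in the three-object reference game discussed earlier in the thread; the word meanings and the hard, most-specific-word speaker are assumptions made for the sake of the example.

    ```python
    objects = ["green square", "green circle", "blue circle"]
    words = {
        "green":  {"green square", "green circle"},
        "blue":   {"blue circle"},
        "square": {"green square"},
        "circle": {"green circle", "blue circle"},
    }

    def literal_listener(word):
        """Literal type: spread belief uniformly over the objects the word is true of."""
        ext = words[word]
        return {o: (1 / len(ext) if o in ext else 0.0) for o in objects}

    def gricean_listener(word):
        """Gricean type: weight each object by whether a maximally informative
        speaker (choosing among the most specific true words) would use the word."""
        def speaker_prob(w, o):
            true_words = [v for v, ext in words.items() if o in ext]
            best = min(len(words[v]) for v in true_words)
            best_words = [v for v in true_words if len(words[v]) == best]
            return 1 / len(best_words) if w in best_words else 0.0
        scores = {o: speaker_prob(word, o) for o in objects}
        total = sum(scores.values()) or 1.0
        return {o: s / total for o, s in scores.items()}

    print(literal_listener("green"))  # 0.5 square / 0.5 green circle / 0.0 blue circle
    print(gricean_listener("green"))  # the square is ruled out: that speaker would have said "square"
    ```

    A heterogeneous-type model in Franke & Degen's spirit would then ask, for each participant's responses, which listener (or speaker) type explains them best.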