One explanation to rule them all?
The field of language evolution, it seems to me, is a microcosm of the evolutionary behavioral sciences more generally, in the following sense: you can maintain more or less any position you want, even in the face of data. Is there a Universal Grammar? Some are convinced there is and others are equally positive there isn’t, with subjective probabilities for the two hypotheses hovering in the high nineties and low single digits, respectively, in the opposing camps. Is spoken language evolutionarily old, or relatively recent? Take your pick. Are there language-specific cognitive adaptations, or not? It depends on your postal code.
Amidst this free-for-all, many have tried their hand at finding the holy grail of language evolution: the single unique feature from which all the rest of language’s notorious complexity follows, the one explanatory ring to rule them all. Examples include Michael Tomasello’s candidate, shared intentionality, and Hauser, Chomsky, and Fitch’s recursion. With Speaking Our Minds (SOM), Thom Scott-Phillips introduces and defends his own candidate for what makes human language special: ostensive-inferential communication, which is in turn made possible by recursive mindreading. SOM mounts an impressive theoretical argument, and along the way makes a strong plea for the importance of bringing pragmatics to the fore in thinking about language evolution. I think the praise that the book has received is well deserved, and I can’t imagine a serious scholar in the field of language evolution who won’t feel compelled to read it and to engage with its arguments.
My comments about the book can be arranged into two basic categories, in descending order of positivity. First and most importantly, I could not be more enthusiastic about SOM’s emphasis on the importance of theory of mind and pragmatics in understanding language evolution. I agree that theory of mind is likely to have played a much larger role than anyone has yet recognized in enabling not only linguistic communication but also cultural transmission and cooperation more generally. To me the importance of mindreading in enabling and stabilizing many forms of human sociality has been criminally underexplored, and I’m not sure why. One possible explanation is Tooby and Cosmides’ notion of “instinct blindness”: mindreading underlies so much of everyday social interaction, and we do it so effortlessly, that we scarcely notice its operation or feel it necessary to invoke it in explaining communication and cognition. Whatever the reason, we are still largely in the dark about how important various forms of mindreading might be in solving the social adaptive problems that must be solved to make human communication and cooperation stable. Thom and I are entirely on the same page about this.
My second category of comment takes a more skeptical turn. While I find Thom’s account of the uniqueness of human language plausible, I am by no means convinced that he has identified, in recursive mindreading and ostensive communication, the prime causal mover that gave rise to the rest of human linguistic complexity. Because I take a broad view of what mindreading is—and I believe that basic components like intention-reading are phylogenetically widespread—I assume that mindreading, broadly speaking, far predates language. Moreover, I agree that some fairly sophisticated mindreading abilities such as belief tracking had to be in place before important aspects of human language, such as implicature, evolved. However, I am not yet convinced that multi-level recursive mindreading is the main element that gave rise to the rest of linguistic complexity. It’s certainly possible, but I don’t think the evidence is yet sufficient to call the game for recursive mindreading. Indeed, I think there are several links in the causal story offered in SOM that, while plausible, still remain to be verified. Thus, while I think the book’s focus on the importance of mindreading and pragmatics in language evolution is spot-on, I think it takes a victory lap too soon in declaring the major mysteries of language evolution solved.
This is true for both theoretical and empirical reasons. Empirically, I think we just don’t have enough evidence to say for sure whether many of the book’s key claims are true. And theoretically, I think there remain some mysteries even in some of the more basic mechanisms that Thom calls “well understood.” In the book’s epilogue, titled “The Big Questions Answered,” Thom’s ninth and final question and answer are:
Q: Coherence: Does the proposed account depend only on well-understood evolutionary mechanisms, or is it more speculative?
A: My proposals depend on well-understood evolutionary mechanisms alone.
I suppose one man’s understanding can be another man’s confusion, and I might be the confused one here. But as SOM points out, linguistic communication depends on solutions to deep problems of cooperation, in particular the problem of what makes linguistic communication honest, and I don’t think the underlying mechanisms are well understood at all. Thom opts for reputation as the stabilizing mechanism, and it’s true that there exist game theoretic models in which reputational costs stabilize signal honesty. In that sense, the mechanisms in those models may be “well understood,” but I don’t think that’s equivalent to saying that the mechanisms that actually stabilize cooperative communication in human language are well understood. Indeed, problems of stability in linguistic communication are a subset of problems of large-scale cooperation more generally. While it’s clear that these problems have been at least partly solved in human cooperation (for example, I can engage in successful communication with a stranger I’ll never meet again), it’s not at all clear how they have been solved. The problem I attributed to the field of language evolution, i.e., that you can believe more or less anything you like, seems to apply just as much, if not more, to the field of the evolution of cooperation. Claims of “X” and “not X” coexist quite stably in this literature, as seen in debates over group selection, strong reciprocity, and the like. So, I don’t think we’ll be ready anytime soon to check the “Question Answered” box for what stabilizes large-scale cooperation.
Then there is the claim that what makes human linguistic communication special is ostensive-inferential communication, which in turn depends on recursive mindreading. Here again, while I think there is a plausible causal story, I’d like to see both better theoretical support in the form of evolutionary models, and much better empirical support from work in humans and other animals, including evidence from everyday linguistic communication outside the lab. It’s clear, from work by Thom and others, that humans can do recursive mindreading. SOM also offers some tentative evidence that only humans can do this, and that only humans have ostensive-inferential communication. However, showing that humans have these abilities and that chimps do not is not the same as showing that these abilities causally enabled the evolution of human linguistic complexity. Indeed, that is difficult to show, because recursive mindreading and ostensive communication are offered as the ultimate causes of human linguistic complexity, and causation is notoriously difficult and perhaps impossible to show using the comparative method. This is especially true when there is only one taxon that shows all of the traits in question (us). And these aren’t the only cognitive traits that are uniquely derived in us; there have been changes in brain size, executive control, planning, tool use, social learning, social complexity, and more. Which factor is causal, if any? Moreover, while recursive mindreading of many levels can be shown in the lab, we don’t yet know how much everyday speech depends on many-level recursive embeddings of mental state inference, nor how much recursive mindreading is actually predicted by the theory. What amount, or lack thereof, would falsify the theory? Or is the mere demonstration that humans can do it enough to call the question settled?
Because this is a theory that depends on a complex causal cascade, many of the steps of which have not yet been fully theoretically elaborated or empirically demonstrated, I think we have a long way to go before declaring all of the Big Questions answered.
To me, though, that’s no reason not to take seriously SOM’s argument for the likely importance of mindreading in language evolution, and to do so without considering the Big Questions of language evolution settled. Indeed, in my view, we’re better off proceeding without declaring a winner, because I doubt that there is one true explanatory ring to find. Instead, my hunch is that mindreading has been just one factor among many in enabling the gene-culture coevolutionary pathway leading to the current state of human linguistic complexity. If that’s so, the answers to the Big Questions will come not in the form of simple propositions involving easily stated concepts such as “ostensive communication,” but rather in the form of a messy causal graph including many directed edges that we have yet to discover or name. Still, that’s no reason not to turn our attention right now to the importance of mindreading in language evolution and cultural evolution more generally. I hope that Speaking Our Minds convinces others, including researchers and granting agencies, that this is a quest well worth pursuing.