Inferential communication and information theory

Speaking Our Minds is a timely book that very effectively frames many of the important problems currently facing researchers interested in the nature of language and communication. Too few scholars today are worried simultaneously about evolutionary psychology and pragmatics, and ever since my introduction to Thom’s work through his article “Defining biological communication,” I have found myself often in high agreement with his conclusions. That said, the main point of contention I would like to bring to the group here is a fairly fundamental issue on which my own view has recently shifted away from modern pragmatics theory. The issue concerns the purported distinction between the code model of communication and ostensive-inferential communication, and what that distinction really adds to our current understanding. Is the supposed failure of an information theory-based code model an outdated (and false) argument from a period when pragmatics needed a straw man? I’m inclined to think it is.
One concern at the heart of the code model problem involves the role of inference in communication. Unconscious inference is a central concept in cognitive science: it’s everywhere. Even the most low-level adaptive problems in perception involving feature detection and integration are solved through inferential procedures. Bottom-up information across modalities is often structured statistically but highly impoverished, so computational solutions incorporate rich priors to extract meaningful data that can be further processed, eventually leading to perceptual experience. Top-down processes are central to all of cognition, and importantly, communication. I’m sure Thom would agree. There does not seem to be much objection to the application of information theory to these topics, so it’s not just a problem about inference on unobservable data. But then what is it exactly? If we are to approach human communication generally, and linguistic communication specifically, from a computational point of view, it is hard to imagine how one might model these inferential processes free from information theory.
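To make the idea of perception-as-inference concrete, here is a minimal sketch of the standard conjugate Gaussian update, in which an impoverished (high-variance) bottom-up cue is combined with a rich prior by weighting each by its reliability. All numbers are invented purely for illustration.

```python
# A minimal sketch of "unconscious inference": a noisy bottom-up cue is
# combined with a prior via Bayes' rule. All values are invented for
# illustration.
def gaussian_posterior(prior_mean, prior_var, cue, cue_var):
    """Conjugate Gaussian update: precision-weighted average of prior and cue."""
    w_prior = 1.0 / prior_var  # precision (reliability) of the prior
    w_cue = 1.0 / cue_var      # precision (reliability) of the cue
    mean = (w_prior * prior_mean + w_cue * cue) / (w_prior + w_cue)
    var = 1.0 / (w_prior + w_cue)
    return mean, var

# An impoverished (high-variance) cue barely moves a confident prior...
print(gaussian_posterior(prior_mean=10.0, prior_var=1.0, cue=20.0, cue_var=100.0))
# ...while a reliable cue dominates a weak prior.
print(gaussian_posterior(prior_mean=10.0, prior_var=100.0, cue=20.0, cue_var=1.0))
```

The point of the sketch is only that "impoverished input plus rich priors" has a standard computational form, one that sits comfortably inside an information-theoretic treatment of the signal.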

The primary problem as Thom describes it, following Sperber and Wilson (1986/1995), is that the code model and ostensive-inferential communication require different internal mechanisms to function. A coding scheme involves associations where encoded symbols are sent through a channel with noise and then decoded, ideally recovered with perfect resolution but often only probabilistically retrieved. The content of the message is physically in the signal, as opposed to ostensive-inferential communication where senders provide evidence of a meaning that must be inferred. But here is where an important misinterpretation might exist: the precise way the signal is associated with the content of the message (the coding scheme) is unspecified in information theory. Shannon’s original formulation, of course, was not designed to explain language understanding, or even human communication for that matter, but instead was provided as a highly general framework describing the formal problem of information transmission in noisy systems. The specific computational nature of a given communication problem depends on the given task demands. As Thom well describes throughout his book, linguistic conventions, including grammar and lexical semantics, likely evolved in the context of an existing ostensive-inferential cognitive environment, so it seems completely reasonable to assume that the coding constituted by linguistic input has been shaped by the inferential abilities of decoders. The generality of the information theory framework can accommodate this circumstance. According to the now standard view in pragmatics, once inferential meaning must be generated based on decoded evidence, the code model reportedly fails. A closer look at the formal properties of information theory does not make it obvious why this should be so. 
If we view the derivation of some coded message as a potential state change in a receiver, there is no rule about the particular input to output relationship, such as an isomorphism in surface features, or exactly how the content manifests itself in the code at all—only that the receiver’s state has a possible reduction in uncertainty as a function of decoding some aspect of the signal. In a given communication system, the structured input by a sender can be tailored specifically for the receiver by design, and the acquisition of that message with the associated reduction of entropy (relative to a state without that signal) can manifest itself in any number of ways and with any number of alternatives. As Thom points out, code models can involve inference. The problem is not in a limitation of information theory—the problem is we don’t know exactly how inferential communication actually works.

The beauty of inferential models of communication comes from the amazing insight that “arbitrary” symbol systems are used strategically by people, in conjunction with a number of signaling channels, to help targeted listeners derive relevant meanings in context-specific ways. At some level with language use, the code will likely involve a duality of patterning that affords retrieval of the specific linguistic content, but the decoding story doesn’t have to end there. It’s not clear formally why a distinction between natural codes and conventionalized codes prevents a computational solution that involves the probabilistic recognition of implicit meaning via the relevant processing of meaningful symbols as well as other continuous data streams. Principles of relevance describe how senders will signal, through multiple simultaneous channels, not only communicative intentions but informative intentions. A code model can handle that if there are co-evolved encoding and decoding algorithms: encryption is a common application of information theory. In the present case, the system is designed to maximize relevance, but proximately that can look a lot like encryption, and sometimes actually is (e.g., Clark & Schaefer, 1987).
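The claim that decoding need not be a deterministic lookup can be made concrete. Below is a minimal sketch (messages, signals, and probabilities are all invented for illustration) of a "code" in the general information-theoretic sense: a probabilistic mapping from messages to signals, with decoding performed as Bayesian inference that reduces, without eliminating, the receiver's uncertainty.

```python
import math

# A "code" in the general sense: a probabilistic mapping from messages to
# signals (values are P(signal | message)). Toy numbers, for illustration only.
code = {
    "greeting": {"wave": 0.8, "smile": 0.2},
    "farewell": {"wave": 0.5, "smile": 0.5},
}
prior = {"greeting": 0.5, "farewell": 0.5}  # receiver's prior over messages

def decode(signal):
    """Probabilistic decoding: Bayesian inference over messages, not a lookup."""
    post = {m: prior[m] * code[m].get(signal, 0.0) for m in code}
    z = sum(post.values())
    return {m: p / z for m, p in post.items()}

def entropy(dist):
    """Shannon entropy in bits of a probability distribution (dict of probs)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

posterior = decode("wave")
print(posterior)                            # a graded inference, not a certainty
print(entropy(prior) - entropy(posterior))  # uncertainty reduced, not eliminated
```

Nothing in the formalism requires the signal-message mapping to be one-to-one or deterministic; the only requirement is that observing the signal changes the receiver's distribution over messages.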

Donaldson-Matasci et al. (2013) used information theory to quantify relationships between noisy environmental cues and effective developmental strategies, and a similar analysis could be applied to how listeners infer implicit meaning through linguistic evidence. One important difference is that the sender is providing a designed signal, rather than an organism processing an environmental cue. Imagine a sender providing linguistic input, which has coded structural features, with the processing outcome measurable as entropy. This just involves the words—the sender’s meaning is not directly observable. In relevance theory terms, the exact words provide evidence of speaker meaning. In information theoretic terms, the literal words constitute a linked process that allows for a measure of conditional entropy. The question then becomes: how much is entropy reduced in an unobservable process (speaker meaning) as a function of observing a linked process (sentence meaning)? Another way of putting it would be to say that the cognitive effect of some ostensive communicative act is the receiver’s uncertainty regarding the implied message prior to receiving the signal minus her posterior uncertainty after receiving it. One advantage here is that the formal model affords a quantification of cognitive effects, something relevance theory technically lacks. Of course, a full computational account would need to specify the various sources of information that play into comprehension algorithms (e.g., nonverbal signals, contextual details, common ground, etc.), but in theory, these are potentially explicable and quantifiable processes that critically involve a code in information theoretic terms. The bottom line is this: if interlocutors are able to reliably use given structure in proximal signals to reduce uncertainty about unobservable meanings, they must be reliant on some set of mutual representations. The arbitrariness or novelty doesn’t really matter if the code is relevant and has predictable cognitive effects.
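The quantities invoked here can be made explicit. The sketch below (the joint distribution over speaker meanings and literal sentences is invented purely for illustration) computes the receiver's prior uncertainty H(M), the conditional entropy H(M|W) after observing the words, and their difference, the mutual information I(M;W) = H(M) − H(M|W), as one candidate formalization of "cognitive effect."

```python
import math
from collections import defaultdict

# Hypothetical joint distribution P(meaning, words): unobservable speaker
# meanings paired with observable literal sentences. Numbers are invented
# purely for illustration.
joint = {
    ("request-window-open", "it's warm in here"): 0.30,
    ("mere-observation",    "it's warm in here"): 0.10,
    ("request-window-open", "open the window"):   0.25,
    ("mere-observation",    "nice day today"):    0.35,
}

def entropy(dist):
    """Shannon entropy in bits of a probability distribution (dict of probs)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Marginals: receiver's prior over meanings, and the distribution over sentences
p_m = defaultdict(float)
p_w = defaultdict(float)
for (m, w), p in joint.items():
    p_m[m] += p
    p_w[w] += p

h_m = entropy(p_m)  # prior uncertainty about the speaker's meaning, H(M)

# Conditional entropy H(M | W): expected uncertainty after hearing the words
h_m_given_w = 0.0
for w, pw in p_w.items():
    cond = {m: p / pw for (m, w2), p in joint.items() if w2 == w}
    h_m_given_w += pw * entropy(cond)

# Mutual information I(M; W) = H(M) - H(M | W): the average reduction in
# uncertainty about the unobservable meaning from observing the literal words
info_gain = h_m - h_m_given_w
print(f"H(M) = {h_m:.3f} bits, H(M|W) = {h_m_given_w:.3f} bits, I(M;W) = {info_gain:.3f} bits")
```

A fuller account would condition on the other information sources mentioned above (nonverbal signals, common ground, etc.), but the structure of the calculation would be the same.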

Finally, it is worth noting here that at some level, if one is to accept the proposition that brain activity is instantiated as a complex system of adaptive neural coding schemes, then it is hard to get around a code model for communication that is rooted in information theory. Communication behaviors are implemented in neural systems in the brain and body, and as Thom rightly points out, involve adaptations for both the production and perception of signals. Relevance theory, in combination with an evolutionarily informed information-based approach, and a developed theory of cultural transmission, provides the tools for a comprehensive theory of communication and cognition.

I want to thank Thom for some earlier discussion on this (I don’t believe the point I’m raising is a surprise), Ray Gibbs, and, in particular, Clark Barrett for bringing this issue to my attention longer ago than I care to admit.
References

Clark, H. H., & Schaefer, E. F. (1987). Concealing one’s meaning from overhearers. Journal of Memory and Language, 26(2), 209-225.
Donaldson-Matasci, M. C., Bergstrom, C. T., & Lachmann, M. (2013). When unreliable cues are good enough. The American Naturalist, 182(3), 313-327.

Sperber, D., & Wilson, D. (1986/1995). Relevance: Communication and cognition. Cambridge, MA: Harvard University Press.

Comments

  • Olivier Morin, 1 July 2015 (09:51)

    Let me thank Greg Bryant for this post, and allow me to try and dissociate two issues. One is the scope of information theory, the other the usefulness of the code model. Once we dissociate the two, we might be able to see why information theory can be useful without reintroducing a code model of communication.

    I agree that interpretation, like all other cognitive activities, is an activity that reduces some uncertainty about the world. In other words, it falls under the scope of information theory. What I fail to see is how that vindicates the code model in any way. In fact, I see the association between information theory and coded communication as almost coincidental. Shannon happened to develop his theory while working on information transmission, but (as Greg Bryant himself notes) that theory is both much more than a theory of communication, and much less than that. Not to put too fine a point on it, this quote from Warren Weaver (in his introduction to Shannon’s treatise) spelled things out quite clearly:

    “The word information, in this theory, is used in a special sense that must not be confused with its ordinary usage. In particular, information must not be confused with meaning. In fact, two messages, one of which is heavily loaded with meaning and the other of which is pure nonsense, can be exactly equivalent, from the present viewpoint, as regards information. It is this, undoubtedly, that Shannon means when he says that the semantic aspects of communication are irrelevant to the engineering aspects. (…) The concept of information developed in this theory at first seems disappointing and bizarre—disappointing because it has nothing to do with meaning, and bizarre because it deals not with a single message but rather with the statistical character of a whole ensemble of messages (…).”

    Now, if information theory is nothing more or less than a general theory of the representation of uncertain events, providing a way of quantifying the computational constraints that weigh on any such representation, there is no reason to wed it to a code model of communication. Shannon happened to be working on coded messages, but that was just a historical accident (he could have been, say, building a computer to record casino draws, or doing theoretical work on thermodynamics).

    I am quite persuaded that one could, as Greg Bryant points out, have an information-theoretic take on relevance and interpretation (if only we could quantify the information carried by gestures, words, mutually manifest context, etc.). Does this vindicate the code model? It depends on what one means by it. The trouble is, once we put information theory aside, it is not very easy, for me at least, to grasp what the code model actually consists in, if it is anything more than a foil. In fact, as I think about it more, the differences between code-model and ostensive communication seem to boil down to just one thing: the presence or absence of mindreading.

    I find it hard to deny that code-model communication may involve complex inferences: as Thom points out (p. 7), decoding can be a highly non-trivial task. As Thom also insists, code-model communication may also be used intentionally and strategically. So rule these out as difference-makers. And (Thom again) code-model communication can be non-deterministic, and may involve a lot of uncertainty. Rule that one out too. I am sure Thom would agree that research on animal codes shows coded communication to be context-sensitive as well. Bees don’t follow any danced signal to any place whatsoever: they use their knowledge of the terrain to react appropriately.

    So, if I could be so bold, I would say both the ostensive view and a properly understood code model should agree that neither complex inference, intentional and strategic use, flexibility, nor context-sensitivity sets ostensive communication apart.

    I may be misunderstanding something here, but I find it hard to square Thom’s definition of code-model communication as depending on “mechanisms of association” alone (pp. 5, 7) with his insistence that code-model communication can be inferential, flexible, probabilistic, used strategically, etc. With a loose enough definition of “mechanisms of association”, we can probably fit anything in, but then I don’t see how we can escape the conclusion that ostensive communication is exactly similar, in that regard, to code-model communication. After all, you could always say that it simply “associates” signals with speaker’s meanings, albeit in a flexible, probabilistic, context-sensitive (etc.) way.

    The one thing that, it seems to me, sets ostensive communication apart is, then, the use it makes of mindreading capacities. Not everyone here might share this view, though.

  • Thom Scott-Phillips, 2 July 2015 (03:42)

    Greg’s comments are a challenge to a fundamental part of the thesis of SOM, and indeed to Relevance Theory itself (and some other pragmatic frameworks).

    Following Relevance Theory, SOM makes much of the distinction between code model communication and ostensive-inferential communication. Greg asks whether this difference is as real as it is presented. After all, all cognitive inference is information-theoretic at some level. Olivier is right to point out, in response, that information theory and the code model are not the same, even if information theory is often presented as the canonical illustration of the code model (as it is in SOM). Still, Olivier shares some of Greg’s worries: “I don’t see how we can escape the conclusion that ostensive communication is exactly similar… to code-model communication”.

    Here is what Relevance: Communication & Cognition (Sperber & Wilson, 1986) has to say: “Inferential and decoding processes are quite different. An inferential process starts from a set of premises and results in a set of conclusions which follow logically from, or are at least warranted by, the premises. A decoding process starts from a signal and results in the recovery of a message which is associated to the signal by an underlying code. In general, conclusions are not associated to their premises by a code, and signals do not warrant the messages they convey.” [p.12-13]

    Here is another way to think about it: If ostensive communication can in general be described in terms of a code, then what is the code? What is the “underlying code” that links sticking my tongue out with the sentiment that these people are all idiots (see SOM, p.7)? There is of course no such thing, and hence this is not decoding. Comprehension is instead a matter of inference: your conclusion that these people are all idiots is warranted by the premises that I make manifest when I stick my tongue out. Greg might respond: “Ok, fair enough–but on this notion of inference, aren’t many supposed cases of code model communication actually inferential?” Quite possibly many cases of communication involve inference in some way. Non-human primate communication almost certainly does. But the point – the dividing line between coded communication and ostensive communication – is not whether inferential processes are involved, but whether inferential processes are sufficient to make communication possible in the first place.

    Let me expand. One form of inference of particular importance for communication is metapsychology. It is important because, when developed to a sufficiently rich degree, it allows communication to take place even in the absence of any code. This is ostensive-inferential communication, and this is what I mean when I say that human communication is made possible by mechanisms of metapsychology (I do not think that I expressed this point quite so clearly in SOM). Other systems are made possible by codes, and can be made more expressive by inferential processes. In some cases, in particular non-human primate communication, those processes might even amount to metapsychology. That does not, however, amount to ostensive communication.

  • Dan Sperber, 2 July 2015 (11:41)

    Thanks to Greg for his post and to Olivier and Thom for their comments. This further comment is intended to be relevant to the issues all three of them discuss.

    When Deirdre and I introduced the notion of ostensive-inferential communication, the “ostensive” bit referred to what the communicator does and the “inferential” to what the audience does. This, we have progressively realised, may be misleading. Inference is ubiquitous: in perception, in memory, in motor control, in fact in all aspects of cognition, including coding-decoding and the production of ostensive stimuli. We had in mind, as Thom rightly stresses, just the very special kind of metapsychological inference done by the audience from the communicator’s “productions” (i.e., perceptible behaviour or perceptible traces of behaviour) to the communicator’s communicative intention (in which the informative intention is embedded). We now prefer to talk (a preference not yet reflected in our publications) just of ostensive communication, leaving the inferential part to the gloss.

    This should help make it clear (with the caveat that any possible misconstrual is going to occur) that we are not denying (and are obviously not committed to denying) that the use of a code can, in principle, be cognitively quite complex, involve rich inference, and even, as Olivier points out, be strategic (that is, involve meta-representations of what other agents may think or intend, including about what we may think or intend). None of this, I believe, blurs the distinction with ostensive communication, which is defined not by metarepresentational complexity, nor even by strategic character, but by what these metarepresentations are about (namely, communicative and informative intentions).

    Also, let’s remember that, unlike human languages, which would be grossly defective as tools for coding-decoding communication and are instead adapted to serve a major but nevertheless subordinate role in ostensive communication, where they render the range of what can be communicated limitless, the evolved codes of animal communication are adapted to the communication of very narrow ranges of information of here-and-now relevance. As far as I know, there isn’t even compelling evidence of an intended strategic use of signals in animal communication.

    On the other hand, as Csibra and Gergely have pointed out, ostension permits the transmission of general knowledge. I surmise that only ostension does.

  • Greg Bryant, 6 July 2015 (04:53)

    Thanks to all for interesting comments here. I think one important issue has to do with how we define “code.” I am using the word in a very general sense – a sense I believe is relevant to information theory. A code is any system of rules converting information into physical signals (numbers, letters, sounds, gestures, etc.) which then affords a computational process by which that information can be used to reduce uncertainty in a receiver. It is true, as Olivier points out, that the code model is separate from information theory, but I would argue that they seem to be used interchangeably by Thom, as well as by Sperber and Wilson et al. I maintain that we might want to put the code model straw man to bed, and instead focus on how information theory can help us address empirically the computational problem of ostensive communication.

    Thom quotes Sperber and Wilson: “An inferential process starts from a set of premises and results in a set of conclusions which follow logically from, or are at least warranted by, the premises. A decoding process starts from a signal and results in the recovery of a message which is associated to the signal by an underlying code. In general, conclusions are not associated to their premises by a code, and signals do not warrant the messages they convey.”

    Unless you limit the definition of coding quite severely, I would argue that deriving a conclusion from a set of premises is easily construed as a form of decoding. The question is in the details of how the premises are represented by structured signals and what sort of algorithmic processes constitute the logic of the derivation from those signals. This is essentially what I meant earlier when I mentioned that the limitation in our understanding is rooted in our lack of specific knowledge about how inferential communication actually works. So in describing a decoding process as recovering a message that is “associated” with the signal, I’m not seeing any specific conflict with the description of inference. Both are sufficiently vague to be mutually compatible when understood in an information theory framework. As I understand Olivier, he makes a similar point.

    Thom then writes, “What is the “underlying code” that links sticking my tongue out with the sentiment that these people are all idiots (see SOM, p.7)? There is of course no such thing, and hence this is not decoding.” This reveals what I believe is a limitation in Thom’s definition of coding. I would argue that there is some relationship between the structural features of the expressive action and the likely inferences a target receiver will draw, and this relationship can be reasonably construed as a coding system of some sort. The details of that relationship aside (which, of course, is not currently understood, hence our problem), the physical signal systematically reduces uncertainty in perceivers regarding the contents of the message.

    Finally, Thom states, “But the point – the dividing line between coded communication and ostensive communication – is not whether inferential processes are involved, but whether inferential processes are sufficient to make communication possible in the first place.” I agree, but doesn’t this distinction break down if the coding scheme has evolved to exploit inferential capabilities?

    Dan makes the important point that ostensive communication is about a particular kind of metarepresentation. I think there are good reasons to believe that the mechanisms generating these kinds of representations have a unique and identifiable developmental trajectory, and have some species-specific features, but there are also phylogenetically related systems we can observe in other animals, as discussed by Katja in her commentary. I was curious, however, about a statement made by Dan at the end of his comment: “As far as I know, there isn’t even compelling evidence of an intended strategic use of signals in animal communication.” I am wondering what exactly qualifies in his view, because to my mind there are innumerable examples of this in the nonhuman literature. Perhaps it depends on one’s use of the word “strategic”? Functional deception in many animals (e.g., primate and bird species) fits the bill for me, including even chickens, not an animal with the most revered cognitive abilities. To take the chicken example (e.g., Marler et al., 1986), few, if any, would argue that, when producing false food calls in the presence of strange females, roosters are trying to strategically change the mental states of the hens. Rather, selection has likely shaped calling behavior by altering triggering mechanisms that increase copulation frequency, and thus reproductive fitness. But in other cases, such as the strategic use of food calls by capuchin monkeys (e.g., Di Bitetti, 2005), I believe there are very solid reasons for assuming that the mental states of other monkeys are part of the computational system, whether or not the monkeys are conscious of it.

    To categorize a communicative act as ostensive, must there be an explicit awareness of the specific strategic mental state manipulation? A similar question is being considered in current debates on figurative language understanding (e.g., Gibbs, 2012). I would say no.

    References

    Di Bitetti, M. S. (2005). Food-associated calls and audience effects in tufted capuchin monkeys, Cebus apella nigritus. Animal Behaviour, 69(4), 911-919.

    Gibbs, R. W. (2012). Are ironic acts deliberate? Journal of Pragmatics, 44(1), 104-115.

    Marler, P., Dufty, A. & Pickert, R. (1986). Vocal communication in the domestic chicken II: Is a sender sensitive to the presence and nature of a receiver? Animal Behaviour, 34(1), 194-198.

  • Dan Sperber, 6 July 2015 (15:02)

    Greg writes: “I was curious about a statement made by Dan at the end of his comment: “As far as I know, there isn’t even compelling evidence of an intended strategic use of signals in animal communication.” I am wondering what exactly qualifies in his view because to my mind there are innumerable examples of this in the nonhuman literature.”

    I am well aware of the examples of deceptive behavior, including, less frequently, deceptive use of signals, in the animal literature. I should have not just said but stressed that I had in mind “intended strategic uses,” where the agent intentionally engages in one course of action rather than another as a function of other agents’ possible responses. To not just do this but to intend to do this, you need to be able to represent – consciously or unconsciously, this isn’t the issue – how what you do influences what others might do in response. There is no compelling evidence that I know of showing that non-human animals do this in their signalling behavior.

    Di Bitetti, whose interesting work on delayed food calls among capuchin monkeys Greg cites, does not provide clear evidence of strategic thinking. Di Bitetti himself cautiously concludes: “finders seem to withhold the production of food-associated calls under certain conditions in a functionally deceptive way.” Meaning, I take it, that intentional deception isn’t established.

  • Greg Bryant, 6 July 2015 (19:12)

    Dan is being a better skeptic than I am. I also agree that the literature on deception in nonhumans, unfortunately, often ignores signaling issues. I don’t intend to play the game where Dan smacks down various empirical examples, but one more example comes to mind that directly addresses Dan’s requirement that “the agent intentionally engages in one course of action rather than another as a function of other agents’ possible responses.”
    Santos et al. (2006) show that free-ranging rhesus monkeys, when faced with two containers of food – one that can be opened silently and one that can only be opened by ringing an attached bell – will preferentially choose the silent one when the researcher is averting her gaze, but will choose randomly when being watched. These researchers definitely interpret the monkeys’ behavior as strategically designed to manipulate others’ knowledge states, and they have similar work examining how eye gaze provides a cue to knowledge states. Corvids, of course, have also been well documented changing their caching behavior as a function of what conspecifics can and cannot see.
    I am sympathetic to the skeptical stance here, and when I teach this stuff, this is the challenge I pass on to my students. But my instincts tell me that the problem here is mostly methodological.
    Santos, L. R., Nissen, A. G., & Ferrugia, J. A. (2006). Rhesus monkeys, Macaca mulatta, know what others can and cannot hear. Animal Behaviour, 71(5), 1175-1181.

  • Dan Sperber, 6 July 2015 (19:26)

    Greg gives good examples of deception in monkeys and corvids. They might even involve an intention to deceive. The deception however isn’t done by means of signalling. What I would find much more surprising (and, hence, extremely interesting) would be clear cases where animals intentionally use signals in order to misinform.

  • Greg Bryant, 6 July 2015 (20:08)

    The trick, of course, is showing it “clearly.” As Dan points out, demonstrating the capacity to alter behavior as a function of different possible responses from target audiences shows the underlying psychology needed – I feel like this is the real reasoning hurdle. Rhesus monkeys have been shown to have some volitional control over their vocalizations (Hage et al. 2013), so maybe it’s just a matter of time before they are documented vocalizing in ways that manipulate knowledge states of others. Perhaps another major limitation has to do with vocal and gestural control, in addition to mindreading abilities.