Inferential communication and information theory
Speaking Our Minds is a timely book that very effectively frames many of the current important problems facing researchers interested in the nature of language and communication. Too few scholars today are worried simultaneously about evolutionary psychology and pragmatics, and ever since my introduction to Thom’s work with his article “Defining biological communication” I have found myself often in high agreement with his conclusions. That said, the main point of contention I would like to bring to the group here is actually a fairly fundamental issue on which my own view has recently shifted away from modern pragmatics theory. The issue concerns the purported distinction between the code model of communication and ostensive-inferential communication, and what that distinction really adds to our current understanding. Is the supposed failure of an information theory-based code model an outdated (and false) argument, a holdover from an earlier era of pragmatics that needed a straw man? I’m thinking it is.
One concern at the heart of the code model problem involves the role of inference in communication. Unconscious inference is a central concept in cognitive science: it’s everywhere. Even the most low-level adaptive problems in perception, involving feature detection and integration, are solved through inferential procedures. Bottom-up information across modalities is often statistically structured but highly impoverished, so computational solutions incorporate rich priors to extract meaningful data that can be further processed, eventually leading to perceptual experience. Top-down processes are central to all of cognition and, importantly, to communication. I’m sure Thom would agree. There does not seem to be much objection to the application of information theory to these topics, so the problem is not simply one of inference over unobservable data. But then what is it exactly? If we are to approach human communication generally, and linguistic communication specifically, from a computational point of view, it is hard to imagine how one might model these inferential processes free from information theory.
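The role of rich priors over impoverished input can be sketched with a toy Bayesian update. The hypotheses, prior values, and likelihoods below are purely illustrative, not a model of any actual perceptual system; the point is only that an ambiguous bottom-up cue combined with a strong prior yields a determinate posterior:

```python
# Toy unconscious inference: an ambiguous bottom-up cue is combined with
# a rich prior to produce a posterior over interpretations.
# All numbers are illustrative assumptions, not empirical values.

prior = {"edge": 0.7, "shadow": 0.3}       # top-down expectations
likelihood = {"edge": 0.4, "shadow": 0.6}  # P(cue | interpretation): cue weakly favors "shadow"

# Bayes' rule: posterior proportional to prior times likelihood.
unnorm = {h: prior[h] * likelihood[h] for h in prior}
z = sum(unnorm.values())
posterior = {h: p / z for h, p in unnorm.items()}

# Despite the cue favoring "shadow", the prior dominates and "edge" wins.
print(posterior)
```

Here the impoverished cue alone would point the wrong way; the prior does the inferential work, which is the sense in which even low-level perception is inference.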
The primary problem as Thom describes it, following Sperber and Wilson (1986/1995), is that the code model and ostensive-inferential communication require different internal mechanisms to function. A coding scheme involves associations where encoded symbols are sent through a channel with noise and then decoded, ideally recovered with perfect resolution but often only probabilistically retrieved. The content of the message is physically in the signal, as opposed to ostensive-inferential communication, where senders provide evidence of a meaning that must be inferred. But here is where an important misinterpretation might exist: the precise way the signal is associated with the content of the message (the coding scheme) is unspecified in information theory. Shannon’s original formulation, of course, was not designed to explain language understanding, or even human communication for that matter, but instead was provided as a highly general framework describing the formal problem of information transmission in noisy systems. The specific computational nature of a given communication problem depends on the given task demands. As Thom well describes throughout his book, linguistic conventions, including grammar and lexical semantics, likely evolved in the context of an existing ostensive-inferential cognitive environment, so it seems completely reasonable to assume that the coding constituted by linguistic input has been shaped by the inferential abilities of decoders. The generality of the information theory framework can accommodate this circumstance. According to the now standard view in pragmatics, once inferential meaning must be generated based on decoded evidence, the code model fails. A closer look at the formal properties of information theory does not make it obvious why this should be so.
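The channel picture described above, with encoding, noise, and probabilistic retrieval, can be made concrete in a few lines. This is a hedged sketch of the general framework only; the repetition code and the specific flipped bit positions are illustrative conveniences, not claims about how language works:

```python
# Toy noisy-channel transmission: a repetition code sent through a channel
# that corrupts some bits, recovered probabilistically by majority vote.
# The code and noise pattern are illustrative assumptions.

def encode(bits, n=5):
    """Repetition code: transmit each bit n times."""
    return [b for bit in bits for b in [bit] * n]

def add_noise(signal, flip_positions):
    """Deterministic stand-in for channel noise: flip the given positions."""
    return [b ^ (i in flip_positions) for i, b in enumerate(signal)]

def decode(received, n=5):
    """Probabilistic retrieval: majority vote within each block of n."""
    return [int(sum(received[i:i + n]) > n / 2)
            for i in range(0, len(received), n)]

message = [1, 0, 1, 1, 0]
noisy = add_noise(encode(message), flip_positions={2, 7, 11})
recovered = decode(noisy)
assert recovered == message  # content retrieved despite channel noise
```

Note that nothing here specifies what the bits stand for or how content relates to signal form; that is exactly the sense in which the coding scheme is left open by the formal framework.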
If we view the derivation of some coded message as a potential state change in a receiver, information theory imposes no rule about the particular input-to-output relationship, such as an isomorphism in surface features, or about exactly how the content manifests itself in the code at all; it requires only that the receiver’s state undergo a possible reduction in uncertainty as a function of decoding some aspect of the signal. In a given communication system, the structured input produced by a sender can be tailored specifically for the receiver by design, and the acquisition of that message, with the associated reduction of entropy (relative to a state without that signal), can manifest itself in any number of ways and with any number of alternatives. As Thom points out, code models can involve inference. The problem is not a limitation of information theory; the problem is that we do not know exactly how inferential communication actually works.
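The uncertainty-reduction point can be made concrete with a toy calculation. In this sketch a hypothetical receiver entertains four candidate interpretations; the prior and posterior values are invented for illustration. The signal’s informational value is simply the drop in Shannon entropy, and nothing is said about how the code maps onto content:

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Hypothetical receiver: four candidate interpretations, uniform prior.
prior = [0.25, 0.25, 0.25, 0.25]

# After decoding the signal, belief concentrates on one interpretation.
posterior = [0.85, 0.05, 0.05, 0.05]

# Reduction in uncertainty attributable to the signal, in bits.
info_gained = entropy(prior) - entropy(posterior)
print(round(info_gained, 3))  # roughly 1.15 bits
```

Any coding scheme that produced the same posterior would have the same informational value, which is the sense in which the formalism is indifferent to how content manifests in the signal.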
The beauty of inferential models of communication comes from the insight that “arbitrary” symbol systems are used strategically by people, in conjunction with a number of signaling channels, to help targeted listeners derive relevant meanings in context-specific ways. At some level of language use, the code will likely involve a duality of patterning that affords retrieval of the specific linguistic content, but the decoding story does not have to end there. It is not clear formally why a distinction between natural codes and conventionalized codes prevents a computational solution that involves the probabilistic recognition of implicit meaning via the relevant processing of meaningful symbols as well as other continuous data streams. Principles of relevance describe how senders signal, through multiple simultaneous channels, not only communicative intentions but informative intentions. A code model can handle that if there are co-evolved encoding and decoding algorithms: encryption is a common application of information theory. In the present case, the system is designed to maximize relevance, but proximately that can look a lot like encryption, and sometimes actually is (e.g., Clark & Schaefer, 1987).
Donaldson-Matasci et al. (2013) used information theory to quantify relationships between noisy environmental cues and effective developmental strategies, and a similar analysis could be applied to how listeners infer implicit meaning from linguistic evidence. One important difference is that the sender is providing a designed signal, rather than an organism processing an environmental cue. Imagine a sender providing linguistic input, which has coded structural features, with the processing outcome measurable as entropy. This involves just the words; the sender’s meaning is not directly observable. In relevance theory terms, the exact words provide evidence of speaker meaning. In information theoretic terms, the literal words constitute a linked process that allows for a measure of conditional entropy. The question then becomes: how much is entropy reduced in an unobservable process (speaker meaning) as a function of observing a linked process (sentence meaning)? Another way of putting it would be to say that the cognitive effect of some ostensive communicative act is the receiver’s uncertainty regarding the implied message prior to receiving the signal minus her posterior uncertainty after receiving it. One advantage here is that the formal model affords a quantification of cognitive effects, something relevance theory technically lacks. Of course, a full computational account would need to specify the various sources of information that feed comprehension algorithms (e.g., nonverbal signals, contextual details, common ground, etc.), but in principle these are explicable and quantifiable processes that critically involve a code in information theoretic terms. The bottom line is this: if interlocutors are able to reliably use given structure in proximal signals to reduce uncertainty about unobservable meanings, they must rely on some set of mutual representations.
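The conditional entropy question can be written down directly. This sketch assumes an invented joint distribution over two observable sentences and two unobservable speaker meanings (literal vs. ironic); the numbers are illustrative, not empirical. The cognitive effect comes out as the mutual information between the two linked processes:

```python
import math

def H(probs):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution P(sentence, speaker meaning).
joint = {
    ("s1", "literal"): 0.4, ("s1", "ironic"): 0.1,
    ("s2", "literal"): 0.1, ("s2", "ironic"): 0.4,
}

# Marginals over speaker meanings (unobservable) and sentences (observable).
p_m, p_s = {}, {}
for (s, m), p in joint.items():
    p_m[m] = p_m.get(m, 0.0) + p
    p_s[s] = p_s.get(s, 0.0) + p

# H(M): uncertainty about speaker meaning before the utterance.
h_m = H(p_m.values())

# H(M|S): expected uncertainty after observing the sentence.
h_m_given_s = sum(
    p_s[s] * H([joint[(s, m)] / p_s[s] for m in p_m])
    for s in p_s
)

# Mutual information I(M;S) = H(M) - H(M|S): the quantified cognitive effect.
cognitive_effect = h_m - h_m_given_s
print(round(cognitive_effect, 3))  # 0.278 bits for these illustrative numbers
```

Under these made-up numbers the prior uncertainty about speaker meaning is 1 bit and observing the sentence removes about 0.278 of it; a fuller account would condition on the other information sources mentioned above as well.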
The arbitrariness or novelty doesn’t really matter if the code is relevant and has predictable cognitive effects.
Finally, it is worth noting here that at some level, if one is to accept the proposition that brain activity is instantiated as a complex system of adaptive neural coding schemes, then it is hard to get around a code model for communication that is rooted in information theory. Communication behaviors are implemented in neural systems in the brain and body, and as Thom rightly points out, involve adaptations for both the production and perception of signals. Relevance theory, in combination with an evolutionarily informed information-based approach, and a developed theory of cultural transmission, provides the tools for a comprehensive theory of communication and cognition.
I want to thank Thom for some earlier discussion on this (I don’t believe the point I’m raising is a surprise), Ray Gibbs, and in particular Clark Barrett for bringing this issue to my attention longer ago than I care to admit.
Clark, H. H., & Schaefer, E. F. (1987). Concealing one’s meaning from overhearers. Journal of Memory and Language, 26(2), 209-225.
Donaldson-Matasci, M. C., Bergstrom, C. T., & Lachmann, M. (2013). When unreliable cues are good enough. The American Naturalist, 182(3), 313-327.
Sperber, D., & Wilson, D. (1986/1995). Relevance: Communication and cognition. Cambridge, MA: Harvard University Press.