How not to combine ethnography and experiments in the study of moral judgment

In his latest blog post, Hugo Mercier discusses Clark Barrett et al.’s paper in PNAS: “Small-scale societies exhibit fundamental variation in the role of intentions in moral judgment.” [1] Unlike Hugo, I don’t find this piece of work fascinating. In fact, given that excellent scholars I respect and admire have invested a good amount of effort in this work, I am quite disappointed, disappointed enough that I could write a long post detailing what I see as many serious theoretical and methodological weaknesses in this article, and too disappointed to bother to do so. Still, prodded by Hugo, I will react. To begin with, let me quote the way the authors describe what they see as the significance of their article:

It is widely considered a universal feature of human moral psychology that reasons for actions are taken into account in most moral judgments. However, most evidence for this moral intent hypothesis comes from large-scale industrialized societies. We used a standardized methodology to test the moral intent hypothesis across eight traditional small-scale societies (ranging from hunter-gatherer to pastoralist to horticulturalist) and two Western societies (one urban, one rural). The results show substantial variation in the degree to which an individual’s intentions influence moral judgments of his or her actions, with intentions in some cases playing no role at all. This dimension of cross-cultural variation in moral judgment may have important implications for understanding cultural disagreements over wrongdoing.

Sounds promising, but what does this study really show about moral judgment across cultures, if anything?

The word “moral” appears more than a hundred times in the article, but the concept is not discussed at all: does any notion of “moral” have cross-cultural relevance and, if so, which notion? Not discussed. Which notion(s), if any, correspond(s) to “moral” in the societies compared? Not discussed.
Food taboos and disgusting food (in societies where there are no food taboos) are considered on par, and both as unproblematically moral, two less-than-obvious decisions but no discussion.

So, to begin with, it is not that clear what the article is really about.

Participants were presented with vignettes describing some objectionable actions such as a theft and were asked in particular:

In your opinion, how good or bad was what [the agent] did?
When people discover what happened, what will people think of [the agent] — will they think he is a good person or a bad person?
In your opinion did [the agent do this] on purpose, or by accident?
In your opinion, do you think [Agent] should be rewarded or punished?

I very much doubt that the relevant notions (“bad action,” “bad person,” “on purpose,” “reward” and “punishment”) are identical across cultures or close enough that people’s responses can be numerically compared. This is not discussed, nor is the way people in different cultures might have interpreted their task.

Still, Hugo is impressed by the case of the Yasawans (Fiji islanders), who, it seems, “judge equally harshly a series of moral wrongdoings irrespective of whether they were committed intentionally or not”. Well, as figure 3 shows, the Yasawans, quite unlike the other groups, tended to judge with the exact same mildness all actions presented (theft, food taboo violation, poisoning, bodily harm), whether they were done intentionally or not. Or maybe, more plausibly, they thought that answering in the same manner (by pointing to the middle of a horizontal five-point response scale) to all these questions was the right thing to do. How Yasawans and other participants might have understood the task and the proper way to perform it is not discussed.

Still, the authors rightly note:

Interestingly, Yasawa is a society in the Pacific culture area where mental opacity norms, which proscribe speculating about the reasons for others’ behavior in some contexts, have been reported.

This kind of relevant ethnographic observation should have been provided in much greater detail to help interpret not just the Yasawans’ responses, but all the experimental evidence presented. I have little doubt that the result of putting things in a rich enough ethnographic perspective would have, on the one hand, allowed fine-grained qualitative comparison and made the work much more interesting and, on the other hand, would have radically put into question the numerical comparability of the cases.

Just as an anecdotal illustration, Joe Henrich and his collaborators have published several articles on Yasawa culture and society that, unlike this one, combine ethnography and experiments in a useful and sometimes ground-breaking way. In Rita Anne McNamara, Ara Norenzayan & Joseph Henrich (2016) Supernatural punishment, in-group biases, and material insecurity: experiments and ethnography from Yasawa, Fiji (Religion, Brain & Behavior, 6:1, 34-55) in particular, there is, in passing, the following observation:

Yasawans often see norm violations against increasingly distant outsiders as increasingly permissible (Henrich, 2008); although villagers are generally friendly and hospitable to everyone, they also find it more acceptable to steal from high-end tourist resorts than known members of the village. While the resorts regularly employ locals, many villagers are fired for stealing shortly after starting work (this may also be related to the traditional needs-based distribution and redistribution routinely employed among Yasawans, as documented in Gervais, 2013).

Shouldn’t such observations be brought to bear on the interpretation of Yasawans’ judgment of theft, especially since the theft described in the vignette is said to take place among people who do not know each other, hence, realistically, not among Yasawans who are all or nearly all acquainted with each other?

Instead of a useful combination of relevant ethnographic details and experiments, we get a study full of detailed statistics (without which, I reckon, it would not have been published in PNAS) and very little useful content. The authors, being serious scholars, are very prudent in drawing any theoretical conclusion from their work. The strongest conclusion they draw is:

Our findings do not suggest that intentions and other reasons for action are not important in moral judgment. Instead, what they suggest is that the roles that intentions and reasons for action play in moral judgment are not universal across cultures, but rather, variable. One way of interpreting this is that reasoning about the sources of an agent’s action—using theory of mind and other evolved abilities—is universally available as a resource for moral judgments, but it might not always be used in the same way, or even at all, in particular cases.

I would just qualify this commonsense conclusion by making the trivial point that a factor such as taking intentions into account in judging actions and actors can both be universal and have variable manifestations across cultures. So, what do we learn?

[1] Barrett, H. C., Bolyanatz, A., Crittenden, A. N., Fessler, D. M., Fitzpatrick, S., Gurven, M., … & Laurence, S. (2016). Small-scale societies exhibit fundamental variation in the role of intentions in moral judgment. Proceedings of the National Academy of Sciences, 113(17), 4688-4693.

6 Comments

Clark Barrett 29 May 2016 (03:08)

Thanks to Hugo and Dan for their comments on my recent paper, “Small-scale societies exhibit fundamental variation in the role of intentions in moral judgment,” co-authored with my collaborators on the AHRC Culture and the Mind Project (http://www.philosophy.dept.shef.ac.uk/culture&mind/). I understand where Dan is coming from and I think many of his criticisms are well-taken. I think we both agree that in cross-cultural work, explaining the reasons for patterns of universality and variation is the holy grail we’re after (a metaphor that is appropriate in several ways). However, some of the aspects of the paper that he seems to think are bugs I consider features.
The project was designed to measure universals and variation in judgments about scenarios involving reasons for an agent’s actions. We designed the scenarios carefully with at least two things in mind: first, to look at a range of contexts and reasons for action that might speak to the growing literature about the role of reasons for action in moral judgment, and second, to make the scenarios as comparable as possible across cultures, in terms of participants’ construal of what is going on in the scenarios. We (most of the coauthors anyway) are anthropologists, and are well aware of the fraughtness of the issue of interpretation across cultures and the question of what counts as “the same” situation in different cultural contexts. It is quite possible that some, much, or all of the variance that we see lies in differences in how the scenarios are “interpreted” (indeed, some might argue that differences of interpretation are the only explanation for differences in judgment and decision-making across individuals). It’s also possible that judgment differences did not arise from different construals of the situations we presented, but from genuine differences of opinion over whether the very same action counts as right or wrong. As we noted in the paper and Dan notes again, lots of additional work will be required to nail down exactly why we observed the patterns we did. There is a tradeoff between trying to gather systematic data on a lot of scenarios across a lot of cultures, and trying to understand in great detail the reasons for any given micro-pattern in the data. We were aware of this tradeoff and decided to go for the former.
Given this, I view our reluctance to provide a just-so story (or many culture-specific just-so stories) for our data as a feature, not a bug. Indeed, during the review process we were repeatedly pressed to provide such a story. We presented some speculations (e.g., the mental opacity conjecture that Hugo mentions), but kept it light. I stand by that decision.
One might infer from Dan’s remarks that he thinks our data would have been better left uncollected and / or unpublished, given that we can’t provide the explanation for what we found. I beg to differ. In fact, I think that carefully collected cross-cultural data are extraordinarily important and easily as valuable as data from brain mapping or genomics, fields where everyone understands that the data are valuable in and of themselves and serve as the basis for explanations and theory-testing that come later. I applaud and thank PNAS and our editor Doug Medin for sharing this view of the value of our data as worthy of presentation to PNAS’ audience. I also applaud them for not requiring that we make up a story as a condition for publication (or to go back and get the story, which will take many more years of work on top of the years we’ve already spent to get these data). I share Hugo’s view that these data are fascinating, even if they raise more questions than they answer. Curiously, in the brief time since the paper has been published, some (like Hugo) have said that they found the data surprising, and some (like Dan) have said that they found the study’s findings obvious or trivial. This difference of opinion suggests to me that our study was more than an empty exercise. (On the statistics: the goal of the paper was to measure variation, and the statistical approach we took is appropriate to that. Applying a measurement tool multiple times and then estimating how much variation one would be likely to observe if one did those measurements again, even if some of the variation is due to differences in how subjects respond to the measurement technique, is just what the statistical methods we used are for).
It seems fairly obvious by now, as pointed out by John Ioannidis, Uri Simonsohn and others, that social science is awash in false positives. In my view this is probably not because most social science data are “bad” (though some are). It’s because our interpretations are. It’s a sad fact that savvy consumers of research papers must learn the art of subtracting out the authors’ own interpretations of their results to see the data and the methods for what they really are. In most cases, this is not easy. Indeed, as many have pointed out, the incentive structure of science leads to a perfect storm for making up stories and dressing up one’s data in theoretical glitter (see great recent work by Paul Smaldino and Richard McElreath on this; also, Hugo’s and Dan’s work might have something to say about how seriously we should take scientists’ reasoned explanations for what they’ve found). Anthropology is perhaps uniquely poised for the accumulation of stories because of the rarity and value of rich ethnographic expertise, which means that whatever stories ethnographic experts make up are likely to have much more staying power than stories made up by lab psychologists, whose explanations can be tested more easily via replication (though even there, stories have a way of sticking around). For this paper we could easily have made up a story (which probably would have been wrong) and it would have become attached to the paper as “the” explanation for the data. Instead, we tried to make it as easy as possible for readers to see exactly what we did, without a heavy theoretical agenda. Despite the paper’s flaws, of which I’m sure there are many, I’m pleased that we were able to make a contribution to the primary anthropological literature without contributing to an unfortunate form of cultural accumulation.

Thom Scott-Phillips 31 May 2016 (08:03)

Many thanks to Hugo, Dan, and Clark for the thoughtful discussion. I share Hugo’s positive impression of the quantitative richness of the data – but I also share Dan’s concern about the difficulty of interpretation, especially in the relative absence of corresponding qualitative data. With that in mind, I’d like, if I may, to press Clark on his response to Dan.

The abstract reports that “there is substantial cross-cultural variation among eight traditional small-scale societies… and two Western societies… in the extent to which intent and mitigating circumstances influence moral judgments”. But, as Dan suggests, the data don’t actually say this, not directly. What the data do say is that there is variation in how the questions posed were answered. It is not clear, however, for the reasons Dan mentions, whether these responses accurately reflect moral judgements themselves. In his response, Clark seems to grant the point: “We [the authors]… are anthropologists, and are well aware of the fraughtness of the issue of interpretation across cultures and the question of what counts as ‘the same’ situation in different cultural contexts. It is quite possible that some, much, or all of the variance that we see lies in differences in how the scenarios are ‘interpreted’”. I appreciate the efforts that Clark and his co-authors made to make the stimuli as comparable as possible, but the point still stands: if we should be cautious in assuming that the stimuli are interpreted in similar ways across cultures, then what justifies the confidence of the headline conclusion, namely that there is indeed substantial variation in moral judgements themselves? Is this having things both ways?

Dan Sperber 2 June 2016 (19:16)

Thanks to Clark for his very reasonable defence of his article. Indeed, if the alternative was to use the experimental data to produce just-so stories about cultural differences in the role of intention in moral judgment, the article as it stands avoids such a pitfall and remains intellectually honest. That there are such variations in “moral judgement” (or some such thing) is a sensible conclusion that could have been argued on the basis of general anthropological and psychological knowledge. As Thom suggests in his comment, the evidence in this article, while compatible with this conclusion, doesn’t really give it significant support.

My disappointment with the article comes from the fact that I had another alternative in mind: an idea of the kind of multi-fieldsite anthropological and psychological approach that could have been pursued to address the general issue. A team of anthropologists working in different societies could have been gathered not, or not just, to administer a standardised experiment but to answer in a richer way a small set of closely related questions such as:

– Are there actions in the society that are considered harmful and that are judged differently whether they are performed accidentally/intentionally?
– Are there actions in the society that are considered harmful and that are judged identically whether they are performed accidentally/intentionally?
– Are there types of judgements of harmful actions that are sensitive/insensitive to the intentional character of the action?
– Is there judgement of an intention independently of its being carried out?

The fieldworkers should both draw on standard anthropological methods to answer such questions and use experimental methods to elicit spontaneous intuitive inferences on the issue that may go beyond or contradict the local cultural dogma (in the style, for instance, of Rita Astuti’s experiments on the Vezo’s idea of death). The experiments should have been designed (with room for serious local adjustment) with two goals in mind: 1) provide genuine insight into local ideas, going beyond and complementing what is afforded by standard anthropological methods; 2) allow comparison between societies. There is an obvious tension between these two goals, but it is an illusion to think that you can pursue the second without first achieving the first, at least to some extent. This I believe is what happened with the work of Barrett et al. Quantitative comparability was given paramount importance. As a result, the experimental data give no clear insight into the local situation (and when, with the help of further ethnography, they seem to give some insight, then comparability is lost or at least compromised).

The outcome of the kind of study I have in mind would be in the form of a collection and confrontation of short studies answering the same questions in a basically qualitative way but with the help of experimental and quantitative methods. A qualitative comparison would be possible. To what extent would a quantitative comparison be possible too? I don’t know but I guess it would at best be very rough. Still, we would learn something and genuinely contribute to our understanding of the role of intention in the evaluation of actions across cultures. If more quantitative comparisons were wanted, this would show the way for them (and, possibly, show that the way is very long and hard — or maybe not so hard).

Now, I imagine Clark might say that this would be a much bigger and more expensive undertaking than theirs, and that therefore the cost/benefit ratio might not be better. Let me suggest that this need not be so. When this web site of ours is up and working again, for instance, we might use it to produce exactly such a study. We might open a competition for, say, ten mini-grants for field anthropologists willing to participate in such a study, and… Well this discussion of Barrett et al. is not the place to develop the idea, but what do you say, Clark, you who have thought more about the problem raised by such a cross-cultural study and have rich experience in the matter: would you accept to consider thinking about such a project with us?

Thom Scott-Phillips 3 June 2016 (10:10)

This last comment is a very constructive and interesting contribution Dan! It prompts me to bring attention to this paper, published in Social Anthropology last year, by Denis Regnier, which seems to me to have several of the features that you describe in your post. (I believe that Denis did this work while a PhD student with Rita Astuti and Maurice Bloch.) This work is much more modest in size and ambition than the agenda you describe in your comment, and it is not cross-cultural, but I would be interested to know if you think that it has something of the constructive mix of ethnography and semi-controlled experiments that you are looking for. Title and abstract below (apologies to all for the paywall: I can’t find an open-access version).

Clean people, unclean people: The essentialisation of ‘slaves’ among the southern Betsileo of Madagascar
In this article I argue that among the southern Betsileo slave descendants are essentialised by free descendants. After explaining how this striking case of psychological essentialism manifests in the local context, I provide experimental evidence for it and discuss the results of three cognitive tasks that I ran in the field. I then suggest that slaves were not essentialised in the pre-colonial era and contend that the essentialist construal only became entrenched in the aftermath of the 1896 abolition of slavery, which paradoxically triggered the historical process of essentialisation.

Dan Sperber 5 June 2016 (00:19)

I am confident that Clark, his co-authors, Thom, and I share basic goals, and in particular the goal of better understanding how evolved aspects of human psychology both make possible human cultural variability and put constraints on it. For this, cross-cultural comparisons based on well-controlled experimental evidence must provide important evidence. Clark Barrett is a pioneer in carrying out such studies. For instance, Barrett, H. C. et al. (2013). Early false-belief understanding in traditional non-Western societies. Proceedings of the Royal Society of London, Series B. 280(1755): 20122654, establishes through experiments done across several cultures the universality of early mindreading, which cannot but have major consequences for cultural variability (of course, a universal ability to attribute mental states to others may be deployed quite differently across cultures). One of these possible consequences of mindreading is to make it possible to attribute intentions and to make use of such attributions in moral judgement.

Properly controlled experiments on mindreading in infants can be performed across societies without raising major problems linked to cultural differences (given the limited impact of culture on infants in this respect). On the other hand, properly controlled cross-cultural experiments on culturally informed moral judgement in older children or adults are much harder to devise. Barrett et al.’s study that I discussed in my post used carefully standardized vignettes to elicit such moral judgements, but this, I have argued, does not come near securing the kind of comparability that was aimed at or making the statistical comparisons compelling evidence for any non-trivial theoretical claim.

Once you are testing participants who are fully immersed in their culture on culturally significant material, things become much harder. For instance, eliciting reactions to translations of the same narrative vignette (with minor adaptations to the local culture) doesn’t come near securing that participants across cultures interpret the story and the task in truly similar, let alone identical, ways. In most cases – and I would argue that the role of intention-attribution in moral judgement is such a case – comparative experimental studies cannot be properly devised unless they can rely on rich, precise and reliable ethnography. For this, we need not only standard ethnography but also ethnographic experimental studies investigating the way people in a given society interpret relevant issues and questions in different situations. There are very few such studies and the competence in devising them is at its inception. As Thom suggests, the work of the LSE group with Astuti, Bloch, or Régnier provides all too rare good examples of the kind of experimentally enriched ethnography that should be further developed.

Without such sufficient ethnographic foundations, even the best cross-cultural experimental studies of cultural variability (for instance, J. Henrich et al. (Eds.) (2004), Foundations of Human Sociality: Economic Experiments and Ethnographic Evidence from Fifteen Small-Scale Societies. Oxford: Oxford University Press) leave us with hard-to-interpret precise fragments of answers to ill-defined questions or, metaphorically speaking, with what looks like well-crafted pieces of a jigsaw puzzle without any certainty that there exists a puzzle in which they actually fit. So, for most questions of anthropological relevance, adequate comparative experimental evidence is hard or impossible to develop before good ethnographic experimental evidence, and this is where much of our effort should go at this stage. If one is intent on pursuing a comparative goal, a priority should be to develop the proper ethnographic tools to pursue it.

Clark Barrett 6 June 2016 (22:37)

My hunch is that Dan and Thom and I don’t disagree on most of the fundamentals here. In the continuum between a useless exercise that tells us nothing about morality and a definitive demonstration and explanation of moral differences across ten societies, I think our study falls somewhere in the middle. I’ll briefly explain how and why I think it does, and what remains to be done. I’ll also mention a few things about the project that might have been unclear or buried in the supplementary materials (in particular, we collected free responses of participants explaining their judgments, which are intended to be explored in a follow-up analysis; mentioned in the SI but easily overlooked).

Let me first reiterate that what we have published is meant to establish a baseline of variation in moral judgments as a function of the vignette parameters we systematically manipulated (scenario context, intentionality, mitigating factors) and the judgment variables we measured (badness, punishment, reputation, and a few others that can be found in the SI, including judgments of intentionality). We chose to publish this in PNAS because of the wide readership and attention the study would receive, and thus were limited to 6 pages. Though there are over 70 pages of supplementary materials, we decided not to use them to write a longer paper since supplementary materials are rarely read in detail and are meant for technical details. The main aim of the paper then, as we’ve established, was to carefully document variation rather than to explain it. This does not mean, however, that we can’t begin to explore explanations on the basis of what we found.

Cautiousness being a virtue, the only thing I will definitively commit to about the paper’s findings is that this is what you get when you ask the questions we asked of people in the ten places we worked. That is not meant to sound cagey, because it’s true of all studies (this is what you get when you show a pigeon a flashing light; this is what you get when you give people a warm mug of hot chocolate, etc.). Moreover, as I mentioned in my last post, the project of ethnographers trying to explain what they found via interviews and other things has its own kind of fraughtness. There is a good reason why some would argue that the only real explanations come via experiments in which a variable is manipulated and effects on the outcome are observed. A virtue of our study is that it does allow us to say that this is what happens to judgments, in these societies, when you change factor X in the scenario. It doesn’t yet allow us to say why, of course, other than in the narrow sense that changing this factor influences people’s judgments in a certain way.

That said, though, I think there are reasons to be confident, contra Thom’s suggestions, that we actually were measuring moral judgments. By moral judgment I mean, very plainly, people’s judgments about moral issues. I know the question of what counts as “moral” is itself fraught (and I’d submit that if you want a six-page paper to answer that question, measure a massive number of such judgments across ten societies, and explain it all, you’re asking for a bit much!). However, I think we have good reason to believe that we are tapping into some kind of moral intuitions, in the ordinary, commonsense meaning of that term. Indeed, the ordinary and commonsense were all we were looking for. Let me give you a couple of examples in the Sperberian style.

(1a) Q: Is abortion bad?

(1b) A: Yes.

Does A’s response here reflect a moral judgment? I’d wager it does. You might argue, perhaps rightly, that my wager is based on a lot of background knowledge (is A an American? In which state, in which decade? etc). And that’s true. But contrast (1) with (2):

(2a) Q: Is that tuna fish bad?

(2b) A: Yes.

Most of us would, I suppose, wager that A is not making a moral judgment here. How do we know? Of course, we can’t know for sure (callback to claims about cautiousness above), but I think we’d be on relatively safe ground wagering that A is not making a moral judgment in the ordinary sense of the term. Is such a hunch based entirely on cultural knowledge, as in knowing the cultural history of the abortion debate in the U.S.? Not necessarily. It can also be based on some basic knowledge of psychology, and human nature, and relevance. “Bad” is a polysemous term that can be applied in different ways in different contexts. Some version of it is, as far as I know, universal to all languages (unlike the term morality) and can be applied with both moral and non-moral shadings everywhere. Importantly, we can make some baseline inferences about how likely the term “bad” is to reflect moral badness based on the kind of thing it’s being applied to. Calling a human action or the person who does it “bad” is a good candidate for a moral judgment (though not the only possibility); calling a piece of food “bad” is not a particularly good candidate (though calling a person who chooses to eat that food “bad” could be). Some cultural background knowledge is important, of course, but basic principles of human nature plus relevance can get you a long way. If I tell you the Shuar word for bad is yajauch and you overhear a Shuar person say of another person, “yajaucheiti” (“he’s bad”), I think you can make a pretty good guess about what that person means, even if it’s the only word of Shuar you know. Indeed, as an ethnographer of the Shuar, I can attest that if a person said of another person “yajaucheiti” and did not mean it as a moral judgment, they would almost certainly need to provide additional disambiguating information to prevent the first-pass inference any Shuar speaker would make, on the basis of relevance, that the assertion is a moral one.

Note that this baseline level of taking care that participants understood what we meant is something that we took very seriously. We took care to ensure that each field study was carried out by an experienced ethnographer, each of whom participated in the design of the vignettes, the translation, and back-translation.
But, in fact, we can and did do better than just the baseline of measuring “badness.” Consider the following exchange:

(3a) Q: If X got an abortion, was her action [Very good, good, neutral, bad, very bad]?

(3b) A: Very bad.

(3c) Q: Will people think [poorly, well] of X?

(3d) A: Poorly.

(3e) Q: Should X be [punished, rewarded]?

(3f) A: Punished.

Infamously, Donald Trump recently participated in just such an exchange. He stated that abortion is bad and that women who get abortions should be punished. When asked to explain himself, he replied that if a person does something bad they should be punished. This was of course massively controversial, but some commentators pointed out that Trump’s response was completely logical: if someone does something bad, in his view, they should be punished. Indeed, it could be (and has been) argued that Trump exposed an apparent inconsistency in the mainstream Republican position on abortion, which treats abortion as an immoral act, but not (necessarily) the person choosing it.

Now imagine asking a variant of (3) applied to the “bad” tuna fish from conversation (2).

(4a) Q: Is this tuna fish bad?

(4b) A: Yes.

(4c) Q: Should it be punished?

(4d) A: Of course not!

Of course, punishment is not the be-all and end-all litmus test for what counts as moral, and I have no doubt that judgments of morality, punishment, and reputational consequences can to some degree be decoupled (our data provide estimates). However, we measured all three kinds of judgments precisely so that we could increase our confidence that we were measuring judgments of moral badness (or goodness; see Figure 5). Indeed, as reported in the paper and in the SI, judgments of badness, punishment, and reputation closely tracked each other, as one would expect if we were looking at judgments like those in exchange (3) but not (4). Of course, we did not explore every statistical angle here, but nothing prevents anyone else from doing so. In addition to the 70 pages of analyses and graphs of these variables in the SI, the data are posted and freely available on the Culture and the Mind website for anyone who wants to explore them in greater detail.
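For readers who want to check for themselves how closely the three kinds of judgment track one another, here is a minimal sketch of one way to do it, using pairwise Pearson correlations. Everything specific in it is an assumption made for illustration: the toy ratings are invented and are not data from the study, the numeric coding of the scales (from -2 for "very bad" to +2 for "very good", and likewise for punishment/reward and reputation) is hypothetical, and the function names are mine. It is not the analysis reported in the paper or the SI.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numeric ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a measure never varies")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(measures):
    """Correlate every pair of measures in a {name: ratings} dictionary."""
    return {
        (a, b): pearson(measures[a], measures[b])
        for a, b in combinations(measures, 2)
    }


# HYPOTHETICAL toy ratings, one entry per respondent, invented for this sketch.
# Assumed coding: -2 = very bad / punish / think poorly, +2 = the opposite.
toy_ratings = {
    "badness": [-2, -2, -1, 0, 0, 1, -2, -1],
    "punishment": [-2, -1, -1, 0, 0, 1, -2, 0],
    "reputation": [-2, -2, -1, 0, 1, 1, -1, -1],
}

if __name__ == "__main__":
    for (a, b), r in pairwise_correlations(toy_ratings).items():
        print(f"{a} vs {b}: r = {r:.2f}")
```

If the three measures behave like exchange (3) rather than exchange (4), the pairwise correlations should be strongly positive. With the real data one would also want to run the check separately by society and by vignette, since pooling can hide local decoupling.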

So, regarding Thom’s question about caution versus confidence: I’m fairly confident that we were generally measuring moral judgments of some kind. When people across cultures said that the act someone had done was very bad, they also said that others would think badly of them and that they should be punished for it. When they said the act was neither bad nor good, they didn’t want them punished or rewarded. I am sure there are other possible explanations for these patterns of judgment, but I think there are good reasons to expect that people were making moral judgments, of the kind in exchange (3), not exchange (4).

A final note about this way of measuring moral judgment. Dan remarks: “Food taboos and disgusting food (in societies where there no food taboos) are considered on par, and both as unproblematically moral, two less-than-obvious decisions but no discussion.” I’m not sure what Dan means by “on par” here, since the moral nature of actions was something we measured, not stipulated. Our data clearly show that eating culturally disapproved-of foods is not considered a very morally bad act in most of the societies we looked at, though there was of course variation; all the data are there for you to see. The SI does include a discussion of the types of foods we used and how they varied across societies, and it notes that these might not technically be considered “taboos” in the formal anthropological sense of the term (which is, after all, tied to a specific culture area). What they had in common was that they were culturally proscribed foods. In the U.S., for example, the vignettes involved someone eating dog meat either knowingly and willingly or unknowingly. On page 18 of the supplementary materials you can see that urban Angelenos considered both intentional and accidental dog-eating to be, in essence, neutral (I was a bit surprised by this; dog-eating could easily have been seen as a bad act). Among the Hadza, on the other hand, eating snake meat either on purpose or by mistake was considered bad (the plot there pools badness, punishment, and reputation, but again they correlate). We can agree that we might have chosen other foods within the category of frowned-upon foods in a culture; we asked the ethnographers to pick one. A more in-depth cross-cultural study devoted exclusively to moral judgments about food choices could and should investigate a much broader array of foods, contexts, and judgments. Our study used proscribed foods because they had been nominated in the prior literature as a domain where intentions might matter less for moral judgments, and our data supported this conjecture.

Let me finish by reiterating that our paper was meant as a first step to systematically measure moral judgments across domains and cultures, within the context of a single study and a single methodology. There is no doubt that this method had flaws and requires follow-up (including some of our own, with additional unpublished data). Indeed, there is something in Dan and Thom’s reaction that is just what I was hoping for: “This is preposterous! We’re going to have to go and get more data!”

Here, more is definitely better. Dan suggests further studies involving experimental techniques coupled with more detailed targeted ethnographic interviews. By all means; let’s discuss.