Journal of Artificial Intelligence Research 79 (2024) 971-1000         Submitted 04/2023; published 03/2024  

© 2024 The Authors. Published by AI Access Foundation under Creative Commons Attribution License CC BY 4.0.  
                

 

 Cultural Bias in Explainable AI Research: A Systematic Analysis  
 
Uwe Peters                                                       U.PETERS@UU.NL 
Department of Philosophy, Utrecht University 
3512 BL Utrecht, The Netherlands 
 
Mary Carman                                                          MARY.CARMAN@WITS.AC.ZA  
Department of Philosophy, University of the Witwatersrand 
2050 Johannesburg, South Africa 

 
Abstract 

 
For synergistic interactions between humans and artificial intelligence (AI) systems, AI outputs 
often need to be explainable to people. Explainable AI (XAI) systems are commonly tested in 
human user studies. However, whether XAI researchers consider potential cultural differences in 
human explanatory needs remains unexplored. We highlight psychological research that found 
significant differences in human explanations between many people from Western, commonly 
individualist countries and people from non-Western, often collectivist countries. We argue that 
XAI research currently overlooks these variations and that many popular XAI designs implicitly 
and problematically assume that Western explanatory needs are shared cross-culturally. 
Additionally, we systematically reviewed over 200 XAI user studies and found that most studies 
did not consider relevant cultural variations and sampled only Western populations, yet drew
conclusions about human-XAI interactions more generally. We also analyzed over 30 literature 
reviews of XAI studies. Most reviews did not mention cultural differences in explanatory needs or 
flag overly broad cross-cultural extrapolations of XAI user study results. Combined, our analyses 
provide evidence of a cultural bias toward Western populations in XAI research, highlighting an 
important knowledge gap regarding how culturally diverse users may respond to widely used XAI 
systems that future work can and should address.  
 

1. Introduction 
 
To combine the strengths and mitigate the limitations of human intelligence and AI, a growing
number of hybrid human-AI (HHAI) systems (e.g., AI-assisted human experts) are being developed
and used, e.g., for clinical decision-making (Chen et al., 2020). Successful HHAI systems
involve and depend on close proactive collaborations, trust, and mutual understandability between 
humans and AI systems (Bansal et al., 2019). Yet, many of the AI systems that are now 
frequently used in high-stakes decision-making domains are opaque, i.e., they operate in ways too 
computationally complex even for AI developers to fully understand (Burrell, 2016). This opacity 
raises questions about these systems’ trustworthiness and can undermine successful HHAI 
collaborations: If the humans who are part of human-AI hybrid systems cannot understand why an
AI produces the output it does, they may lack meaningful control over it in their collaboration 
with the model (Akata et al., 2020). AI explainability is thus vital for human-AI interactions.  
 

One main approach to dealing with this challenge is to equip AI systems with XAI models 
developed to make opaque systems’ outputs understandable to humans (Arrieta et al., 2020). 
However, the ability to understand the explanations that XAI systems produce may differ 
between individuals (Wang & Yin, 2021), and it has been noted that XAI designers frequently 




adopt a one-size-fits-all approach that primarily suits AI experts (Ehsan et al., 2021). This
may result in XAI systems that leave many AI users’ explanatory needs in human-AI interactions 
unaddressed and that potentially operate in ways at odds with, for instance, the EU’s General 
Data Protection Regulation (Casey et al., 2019).  
 

Apart from interpersonal differences in expertise, culture, i.e., the set of attitudes, values, 
beliefs, and behaviors shared by a group of people and communicated from one generation to the 
next (Matsumoto, 1996), may also significantly influence what explanations people expect or 
prefer from AI systems, thus affecting human-AI collaborations. The importance of cultural
differences has been noted in several areas of AI research, including AI ethics (e.g., people’s
responses to moral dilemmas faced by autonomous vehicles; Awad et al., 2018), and calls for
greater cultural inclusivity in AI development and applications are increasing (Carman &
Rosman, 2021; Linxen et al., 2021; Okolo et al., 2022). However, it remains unclear to what
extent there are XAI-relevant cultural variations in explanatory needs. Several AI review papers 
have drawn XAI researchers’ attention to psychological findings on human explanations (Abdul 
et al., 2018; Miller, 2019). But they have not considered empirical work on cultural differences in 
human explanatory needs, leaving it unclear whether there are XAI-relevant cultural variations 
and what they would be.  
 

A related concern is that studies in the behavioral sciences, including the fields of human-
computer interaction (HCI) and human-robot interaction (HRI), found that many researchers 
predominantly tested only individuals from Western, educated, industrialized, rich, and 
democratic (WEIRD) countries, even though WEIRD people comprise only 12% of the world 
population (Rad et al., 2018; Linxen et al., 2021; Seaborn et al., 2023). The field of XAI might 
have taken countermeasures and be less affected by WEIRD sampling. However, it has not been 
investigated whether that is so. In a recent systematic review focusing on XAI research in the 
Global South, Okolo et al. (2022) found only three XAI papers that engaged with or involved 
people from communities in the Global South. But the authors did not examine to what extent 
XAI studies outside the Global South may nevertheless be culturally diverse.  
 

More importantly, while several recent studies report that WEIRD sampling may severely limit 
the generalizability of HCI or HRI studies (Linxen et al., 2021; Seaborn et al., 2023), these 
studies do not yet account for the fact that studies sampling only individuals from one kind of
population can be unproblematic even if there are relevant cultural differences. After all, 
researchers may tailor their conclusions to their specific sample or study population, making clear 
that other populations remain to be explored. WEIRD sampling may only become questionable 
when findings are presented as if they apply beyond these populations and researchers produce 
‘hasty generalizations’, i.e., conclusions whose scope is broader than warranted by the evidence 
and justification provided by the researchers (Peters & Lemeire, 2023). Relatedly, we recently 
found that the scope of generalizations in many XAI user studies was only poorly correlated with 
the size of the studies’ samples, suggesting that hasty, overly broad extrapolations may have been
common (Peters & Carman, 2023).  

 
However, no prior corpus analysis has investigated how broadly results are generalized across
cultures in XAI user studies, leaving a significant gap in the previous work that highlights
problems related to WEIRD sampling. Hasty generalizations of study results may obscure cultural 
variations in people’s XAI needs and increase the risk that large parts of the world population are 
overlooked in the development of XAI and HHAI systems. Analyzing XAI user studies for hasty 
generalizations is therefore vital. 

 




Here, we aim to fill the research lacunas just outlined. We offer three main contributions. First, 
by drawing on existing psychological studies, we argue that many popular XAI models are likely 
better aligned with the explanatory needs and preferences that were found in people from 
typically individualist, commonly WEIRD cultures than with those that were found in people 
from typically collectivist, commonly non-WEIRD cultures. We outline a range of cultural 
differences that may affect many people’s perception of XAI outputs, making them relevant for 
research on human-AI collaborations. Second, we analyzed an extensive corpus of over 200 XAI 
user studies to examine whether they indicate awareness of cultural variations in explanations, 
have diverse samples, or avoid overgeneralizing their results (e.g., to non-WEIRD populations 
that were not tested). We found that most of these studies failed on all three counts. Finally, to see 
whether these problems have been noticed in XAI user research, we also systematically analyzed 
more than 30 literature reviews of XAI user studies. Most reviews, too, did not indicate any 
awareness of relevant cultural variations in people’s explanatory needs. Nor did they mention the 
problems of WEIRD sampling and hasty, overly broad generalizations of results to non-WEIRD 
individuals in XAI user studies. Our analyses therefore provide evidence of both a significant 
cultural bias toward WEIRD populations and a knowledge gap on whether popular XAI models’ 
outputs are satisfactory across cultures. We end with a set of recommendations to culturally 
diversify XAI user studies.  
 
2. Explainable AI Focusing on Internal Factors Risks Overlooking Collectivist 
Cultures 

 
Two broad categories of XAI techniques are often distinguished: transparent models, which are 
strictly interpretable because of their relatively simple structure (e.g., linear and logistic 
regression models, short decision trees), and post-hoc systems, which may either directly access 
or infer factors causally contributing to an opaque model’s decisions after its training (Arrieta 
et al., 2020). Post-hoc models currently dominate XAI designs for lay-users (Taylor & Taylor, 
2021). Their outputs may be visual (e.g., saliency maps), numerical (e.g., importance scores), or 
textual (e.g., feature reports), and generally cite factors that are internal to an opaque model and 
determinative of its decision (Arrieta et al., 2020). In that sense, post-hoc XAI outputs are often 
thought to be analogous to the human way of explaining decisions in terms of internal mental 
states (beliefs, desires, etc.) (Adadi & Berrada, 2018; Zerilli, 2022) and frequently contain
mentalistic notions (‘being confident’, ‘think’, ‘know’). Table 1 presents examples.  
 
 
 
 
 
 
 
 
 
 

Table 1: Four examples of internalist XAI outputs from XAI user studies

(1) XAI: “I am C(x) confident that y will be correct based on |S| past cases deemed similar to x.”
(Waa et al., 2020, p. 4)
(2) XAI: “Here is why the classifier thinks so [presentation of (e.g.) a decision tree].”
(Yang et al., 2020, p. 1)
(3) XAI: “Why this exercise? Wiski thinks your current level matches that of this exercise!” (Ooge
et al., 2022, p. 3)
(4) XAI: “ShapeBot knows this is a [AI output] because ShapeBot realizes [decision factors].”
(Zhang et al., 2022, p. 10)


Post-hoc XAI systems producing such internalist explanations have been criticized for being
“algorithm-centered”, as they tend to ignore the social context in which AI systems operate
(Ehsan et al., 2021). Yet, many AI researchers now hold that for lay-users, XAI systems should
provide explanations that cite internal states that are viewed as analogues to human beliefs or
desires because they are shorter, easier to understand, and people expect such explanations (De
Graaf & Malle, 2017; Zerilli et al., 2019). Correspondingly, a “significant body of work in XAI

 




aims to explain ML [machine learning] systems by reducing their operations to a form that is 
amenable to belief-desire representation” and so “intentional stance” interpretations (Zerilli, 
2022, p. 2). Hence, many currently popular XAI designs for lay-users rest on the assumption that 
people in general prefer internalist explanations of behavior, i.e., explanations invoking an 
agent’s intentional, inner states.  

 
However, none of the papers just cited that argue that XAI systems should provide such
(intentional stance) explanations have so far reflected on whether this kind of explanation is
equally used and accepted across all cultures. This is problematic, as the explanations that people 
prefer for a given decision or action are unlikely to be uniform cross-culturally. To illustrate this 
point, we will focus on internalist explanations and variations between individualist cultures, 
where a person’s self is often viewed as a discrete entity independent of others, and collectivist 
cultures, where a person’s self is often viewed as interdependent with others (Hampton & 
Varnum, 2020). While differences between these two cultural orientations are not limited to
particular regions (Fatehi et al., 2020), and people within a country are usually highly
heterogeneous, preventing a clear demarcation of cultures by countries (Oyserman et al., 2002),
several recent studies found
that WEIRD countries (e.g., the USA) were predominantly individualist whereas non-WEIRD 
countries (e.g., China) were predominantly collectivist cultures (Klein et al., 2018; Pelham et al., 
2022). Figure 1 visualizes this evidence on the link between cultures and countries.   
 

 

 
 

Figure 1: World map displaying the geographical distribution of collectivist and individualist   
cultures; horizontal stripes indicate WEIRD countries (map was self-created using MapChart). 

 
The overlap between individualist cultures and WEIRD countries, and between collectivist cultures
and non-WEIRD countries, is important here because psychological studies on human
explanations consistently found that while participants from individualist, typically WEIRD
cultures, such as the USA, tended to explain behavior primarily in terms of an agent’s internal
mental features (e.g., attitudes, character, or beliefs), participants from collectivist, commonly
non-WEIRD cultures, such as India, Korea, Saudi Arabia, and China, instead preferentially
explained behavior in terms of external factors including social norms, task difficulty, or 
economic circumstances (Miller, 1984; Cha & Nam, 1985; Al-Zahrani & Kaplowitz, 1993; 




Lillard, 1998). To illustrate the difference, suppose an observer sees, for instance, a nurse 
assisting a patient in trouble, or a man robbing a bank. If the observer has an externalist focus in 
their behavior explanation, they may hold that the nurse acts that way because she has the social 
role to look after patients, and that the man committed the crime because of economic hardship,
respectively, rather than because of internal factors such as beliefs or desires.

 
Studies exploring such differences in externalist vs. internalist explanations found that many
people from non-Western populations (i.e., Asian-Australian, Chinese-Malaysian, Filipino,
Japanese, Mexican) more strongly endorsed ideas that suggested that internal traits did “not 
describe a person as well as roles or duties do, and that trait-related behavior changes from 
situation to situation” (Henrich et al., 2010, p. 12). Correspondingly, in Pacific societies, many 
people were found to be under the expectation to “refrain from speculating (at least publicly) 
about what others may be thinking” (Robbins & Rumsey, 2008, p. 407), and in some collectivist 
societies, “explanations of behavior seem to require an analysis of social roles, obligations, and 
situational factors” (Fiske et al., 1998, p. 915).  
 

These well-documented cultural differences (Lillard, 1998; Lavelle, 2021) matter for XAI 
development. We do not challenge that AI programmers or other expert AI users need to have 
insights into a system’s internals to debug it and so may prefer internalist XAI outputs (Bhatt et 
al., 2020). However, the findings just outlined cast doubts on the common view in XAI research 
that internalist explanations are analogous to how people in general, including lay users, 
preferentially explain behavior. The findings raise the possibility that potentially many 
individuals from collectivist cultures (which form 70% of the world population; Triandis, 1995) 
may often prefer or even require externalist explanations, i.e., explanations with more reference to 
context, social functions, norms, or others’ behavior than to internal states. If XAI systems 
produce predominantly only internalist explanations and do not sufficiently cite external factors, 
people in collectivist cultures may find them unsatisfactory and less trustworthy. To make the 
difference between internalist and externalist explanations with respect to XAI outputs more 
concrete, Table 2 provides four examples of potential externalist counterparts to the internalist 
XAI outputs from Table 1.  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Table 2: Pairs of internalist and potential externalist XAI outputs

Internalist XAI: “I am C(x) confident that y will be correct based on |S| past cases deemed
similar to x.”
Externalist XAI: “Y will be correct because my task is to find the most likely result based on |S|
past cases deemed similar to x.”
Internalist XAI: “Here is why the classifier thinks so [presentation of (e.g.) a decision tree].”
Externalist XAI: “The classifier produced this output because classification rules specify that
given x, y is the case.”
Internalist XAI: “Why this exercise? Wiski thinks your current level matches that of this
exercise!”
Externalist XAI: “Why this exercise? In most Wiski users with your current level, this level
matched that exercise!”
Internalist XAI: “ShapeBot knows this is a [AI output] because ShapeBot realizes [decision
factors].”
Externalist XAI: “ShapeBot classifies this as a [AI output] because ShapeBot’s task is to do so
when presented with [decision factors].”


We currently lack the data to tell whether people from collectivist or individualist cultures will
indeed respond differently to such XAI outputs because, to the best of our knowledge (and based
on the corpus analysis we report below), this has not yet been investigated. Our point here is that
 




given the evidence that we have from previous psychological studies, there is reason to believe 
that differential reactions to these two kinds of XAI outputs are likely to occur in many 
individuals of the relevant cultures. It would therefore be valuable if XAI researchers 
experimentally tested and compared users’ responses to the outlined internalist and externalist 
outputs. Given the current absence of explicit testing for or reflection on these potential 
differences, the popular use of “algorithm-centered” internalist post-hoc explanations in XAI 
developments (e.g., De Graaf & Malle, 2017; Zerilli et al., 2019; Zerilli, 2022; Ehsan et al., 2021) 
suggests that many XAI designs implicitly and problematically assume that Western explanatory 
needs and preferences are shared cross-culturally, revealing a cultural bias. 
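
To make such a comparison easier to set up, the sketch below illustrates one way matched internalist and externalist stimuli could be generated from the same underlying post-hoc result. It is only an illustration under our own assumptions: the class, the field names, and the template wording are hypothetical and are not drawn from any of the cited XAI systems.

```python
# Minimal sketch (our own illustration, not an existing XAI library): render one and the same
# post-hoc result as an internalist or an externalist explanation, so that user studies can
# compare responses to matched stimuli. All names and templates are hypothetical.

from dataclasses import dataclass

@dataclass
class PostHocResult:
    prediction: str          # e.g., "loan approved"
    confidence: float        # model confidence in [0, 1]
    top_features: list       # e.g., ["stable income", "low existing debt"]

def internalist_explanation(r: PostHocResult) -> str:
    # Mentalistic framing: the system "thinks" and is "confident" (cf. Table 1).
    return (f"I think the outcome is '{r.prediction}' because I am "
            f"{r.confidence:.0%} confident, based mainly on {', '.join(r.top_features)}.")

def externalist_explanation(r: PostHocResult) -> str:
    # External framing: task, rules, and past cases rather than inner states (cf. Table 2).
    return (f"The outcome is '{r.prediction}' because the decision rules used for this task "
            f"assign it to cases with {', '.join(r.top_features)}, as in "
            f"{r.confidence:.0%} of similar past cases.")

if __name__ == "__main__":
    result = PostHocResult("loan approved", 0.87, ["stable income", "low existing debt"])
    print(internalist_explanation(result))   # internalist stimulus
    print(externalist_explanation(result))   # matched externalist stimulus
```

Presenting both variants of the same underlying decision to participants from different cultural backgrounds would allow a direct test of the differential reactions we hypothesize.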
 

There are cultural differences in human explanations and related cognitive processes beyond
the individualist/internalist and collectivist/externalist variation. To draw XAI researchers’
attention to them, in Table A1 in the Appendix, we present a range of psychological studies and
reviews that we have not yet mentioned here and that strike us as especially relevant for XAI
research. For instance, in experimental settings, participants from East Asia showed a stronger
preference than Western participants for detailed explanations (Klein et al., 2014), indirect,
contextualized communication (Wang et al., 2010), and similarity-based object categorization
(Nisbett et al., 2001).
 

All of that said, classifying cultures as individualist and collectivist or as WEIRD and non-
WEIRD may not be the best way to account for cultural variations in explanation because this 
approach risks homogenizing and stereotyping users from the related countries. To mitigate this, 
XAI researchers may refrain from these dichotomies and instead investigate more broadly whether
users from different cultural backgrounds are satisfied with one type of explanation, in which 
cases they may require or prefer internalist versus externalist outputs, or whether their choice is 
application dependent. We do not intend the individualist/collectivist and WEIRD/non-WEIRD 
categories to be definitive of a culture (e.g., WEIRD and non-WEIRD groups are heterogeneous, 
not always clearly distinct, and should not be reified; Ghai, 2021). We only employ these 
categories here because they offer interpretative tools for examining cross-cultural differences 
that have already been used insightfully in other AI-related research (e.g., differences in 
algorithmic aversion; Liu et al., 2023) and do capture reliable (but fluid) cultural differences in 
human explanations between some members of WEIRD and non-WEIRD populations, making 
them relevant for XAI and HHAI research. To what extent are XAI researchers aware of the 
outlined cultural variations? To find out, we systematically reviewed XAI user studies.  
 
3. A Systematic Analysis of XAI User Studies 
 
Adapting key components from the Preferred Reporting Items for Systematic Reviews and Meta-
Analyses (PRISMA) framework (Moher et al., 2009) and following a protocol used in previous 
work (Peters & Carman, 2023; Peters & Lemeire, 2023), we reviewed XAI user studies to answer 
four research questions (RQ): 
 

RQ1. Do researchers that conduct XAI user studies indicate awareness that cultural 
variations may affect the generalizability of their results? 
RQ2. What is the cultural background of the samples that XAI user researchers test? 
RQ3. Do XAI researchers restrict their user study conclusions to their participants or study 
population, or generalize beyond them? 
RQ4. Is the scope of researchers’ conclusions related to the cultural diversity of their 
samples such that studies with broader conclusions are associated with more diverse 
samples? 
 




3.1 Methodology 
 
To identify relevant papers, in July 2022, we searched three major databases covering computer 
science and AI literature, i.e., Scopus, Web of Science, and arXiv, using a query containing 15 
variants of key words related to XAI and end-users (for details, see Table A2, Appendix). The 
search returned 2523 papers. After removing duplicates (n = 535), 1988 papers remained. Their titles
and abstracts were scanned to find papers that met our selection criteria.  
 

Selection criteria. We included any primary study (article, conference paper, chapter) that 
surveyed people on AI-based explanations of AI decisions and was published between January 
2012 and July 2022. We excluded reviews, surveys, theoretical (incl. philosophical) papers, 
unpublished drafts (as opposed to, e.g., arXiv preprints), guidelines, position papers, tutorials, technical, or
applied papers (e.g., only introducing new XAI models), studies or surveys on other AI features 
than specific XAI outputs (e.g., ‘algorithmic aversion’), and small-scale stakeholder or user 
studies with ≤ 5 participants, which is too small a sample to ensure robust generalizations 
(Cazañas et al., 2017). We also excluded non-English papers. Of the 1988 papers, 192 remained 
for further screening, during which forward snowballing produced 14 more papers, resulting in 
206 articles for full-text analysis (Figure 2).  
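
As an aside for readers who want to reproduce the screening pipeline, the snippet below sketches how the deduplication step could be carried out programmatically; the file name and column names (records.csv, title, doi) are hypothetical, and the actual screening for this review was performed manually against the criteria above.

```python
# Illustrative sketch only: programmatic deduplication of merged database exports.
# File and column names are hypothetical; the review's screening was done manually.

import pandas as pd

records = pd.read_csv("records.csv")  # hypothetical merged export (Scopus, Web of Science, arXiv)

# Normalize titles so trivial formatting differences do not hide duplicates.
records["title_norm"] = (records["title"].str.lower()
                         .str.replace(r"[^a-z0-9 ]", "", regex=True)
                         .str.strip())

# Prefer DOI-based deduplication where a DOI exists; fall back to the normalized title.
records["dedup_key"] = records["doi"].fillna(records["title_norm"])
deduplicated = records.drop_duplicates(subset="dedup_key")

print(f"{len(records) - len(deduplicated)} duplicates removed; {len(deduplicated)} records remain.")
```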
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
                           

  
 
 
 
 
 
 

Figure 2: PRISMA flowchart of the systematic review. Identification: papers identified from
Scopus (n = 1554), Web of Science (n = 426), and arXiv (n = 543), total n = 2523; papers removed
before screening, n = 535 (duplicates). Screening: papers screened, n = 1988; papers excluded,
n = 1796 (technical, applied papers: n = 1270; theoretical papers, guidelines, perspectives,
tutorials: n = 446; reviews, overviews: n = 76; small (≤ 5) user studies: n = 4); papers sought for
retrieval, n = 192; additional papers identified via snowballing, n = 14. Included: papers included
in review, n = 206.


Data extraction. During full-text analysis, we (two researchers) independently classified papers
by using pre-specified criteria (and a binary label, 0 = no; 1 = yes) to extract the following
information. Apart from publication year and participant recruitment practice (e.g., conventional 
sampling or Amazon Mechanical Turk (MTurk) crowdsourcing), we extracted XAI output type, 
classifying papers as ‘internalist’ when they tested XAI explanations purporting to capture 
models’ internal decision parameters (e.g., local feature importance), or as ‘externalist’ when they 
tested XAI explanations citing external factors (e.g., context, cultural norms, social situation) or 
involved XAI-user interaction (e.g., follow-up questions). Additionally, we classified papers
according to whether the authors indicated awareness that culture can influence people’s responses
to XAI outputs in ways that affect the generalizability of the study.

 
We also extracted participants’ cultural background, operationalizing it as participants’ country
or region (e.g., Europe) (Sawaya et al., 2017). Nationality or region is not always coextensive
with cultural background (Taras et al., 2016). But it was typically the only indicator of cultural
belonging in the papers, and analyses found that alternative social aggregates (e.g., ethnicity) 
contributed only negligible explained variance to that already captured by nations (Akaliyski et 
al., 2021). Depending on the sample’s country or region, we also labelled a paper as ‘WEIRD’, 
‘non-WEIRD’, or ‘mixed’ based on previous studies’ geographical categorizations (Klein et al., 
2018; Yilmaz & Alper, 2019).  

 
Finally, we identified an article’s scope of conclusion based on the population to which results
were generalized. Scientists commonly distinguish three types of populations: the target
population, i.e., people to whom results are intended to be applied in real-world contexts (e.g., all 
users of a system X); the study population, i.e., users who are available and eligible for the study 
(e.g., US users meeting specific inclusion criteria); and the study sample, i.e., participants drawn 
from the study population (Banerjee & Chaudhury, 2010). We coded articles as ‘restricted’ if, 
throughout their text, authors did not extrapolate their findings beyond their sample or study 
population but instead used qualifiers (e.g., ‘our participants’), quantifiers (e.g., ‘some European 
users’), or past tense to limit their claims or recommendations to these populations, or otherwise 
indicated that they are study, sample, context, or culture specific. Authors may in contrast also 
describe results by using generics, i.e., unquantified generalizations that refer to whole categories
of people rather than specific, explicitly quantified subsets of them (e.g., ‘Users prefer X’ vs. ‘Many
(US, 75%, Western, etc.) users prefer X’). Or they may use other expressions that suggest that the 
results apply, for instance, to all non-experts, users, people, contexts, time, or cultures (for 
examples, see Table 3). If a paper had at least one such broad results claim, it was
classified as ‘unrestricted’. Papers with both restricted and unrestricted claims
were also labelled ‘unrestricted’ because manuscripts are typically revised multiple times. If 
authors do not qualify their broader claims in the revisions, there is reason to believe they 
consider their broader generalizations warranted.  
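
The scope-of-conclusion coding itself was done manually by the two raters against the criteria above. Purely as an illustration of what such criteria amount to in practice, the sketch below shows a rough keyword heuristic that could be used to pre-screen conclusion sentences for candidate generics; the word lists are our own, and any flagged sentence would still need human judgment.

```python
# Rough pre-screening heuristic (our own illustration, not the manual coding procedure used in
# this review): flag conclusion sentences that mention a whole category of people without an
# explicit quantifier or sample qualifier, as candidate "unrestricted" claims for human review.

import re

CATEGORY_NOUNS = r"\b(users|people|humans|non-experts|participants|consumers)\b"
RESTRICTORS = r"\b(our|these|most|many|some|US|European|\d+%?)\b"

def candidate_unrestricted(sentence: str) -> bool:
    # Flag if the sentence names a whole category of people and contains no obvious
    # quantifier or sample-specific qualifier. Human coders make the final call.
    has_category = re.search(CATEGORY_NOUNS, sentence, flags=re.IGNORECASE) is not None
    has_restrictor = re.search(RESTRICTORS, sentence, flags=re.IGNORECASE) is not None
    return has_category and not has_restrictor

print(candidate_unrestricted("Users prefer simple explanations."))      # True -> flag for review
print(candidate_unrestricted("Many US users preferred simple ones."))   # False -> restricted
```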
 

Reliability. For each classification, inter-rater agreement was calculated (Cohen’s κ). It was 
consistently substantial (between κ = .71 and .90). We additionally asked two project-naïve 
researchers to independently classify the scope of conclusion variable for 25% of the data using 
our criteria. Inter-rater agreement between their and our classifications was also substantial (κ = 
.66 and .74, respectively), providing an additional reliability control for this variable. All 
remaining disagreements were resolved by discussion before the data were analyzed. All our data 
are publicly accessible on an OSF platform here. 
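
For concreteness, the snippet below shows the form of the reliability computation (Cohen’s κ for two raters’ binary labels); the label vectors are made-up placeholders rather than our coding data.

```python
# Sketch of the inter-rater reliability check: Cohen's kappa for two raters' binary labels
# (0 = no, 1 = yes). The label vectors below are made-up placeholders, not the study data.

from sklearn.metrics import cohen_kappa_score

rater_1 = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]   # hypothetical labels from rater 1
rater_2 = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]   # hypothetical labels from rater 2

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's kappa = {kappa:.2f}")       # values above roughly 0.6 are often read as substantial
```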
 
3.2 Results 
 
Most of the 206 XAI studies (94.7%, n = 195) in our sample were published between 2019 and 
2022. Several studies used multiple recruitment practices; 45.2% (n = 93) of all papers
reported conventional sampling, followed by crowdsourcing via websites. The two most common




websites were MTurk (29.1%, n = 60) and Prolific (9.2%, n = 19).  
 

While some papers tested multiple kinds of XAI outputs, 88.8% (n = 183) of the papers 
focused on internalist explanations, and only 14.6% (n = 30) mentioned external factors (incl. 
XAI-user interaction) as relevant for users’ perception of XAI outputs. Moreover, just 3.4% (n = 
7) of the papers considered explanations that invoked external factors that may appear in 
collectivist explanations, such as social rules, contexts, or social functions. None of the 206 papers
explored potential differences in people’s responses to internalist versus externalist XAI outputs
of the kind we outlined above.
 

RQ1. Do researchers that conduct XAI user studies indicate awareness that cultural variations 
may affect the generalizability of their results? 93.7% (n = 193) of the papers did not display any 
awareness (e.g., in discussion, limitation, or conclusion sections) that there may be cultural 
differences in how people perceive XAI outputs that can undermine broad extrapolations of 
results. Relatedly, these papers did not provide support (e.g., arguments or evidence) for the 
assumption that human explanatory needs are invariant across cultures.  
 

RQ2. What is the cultural background of the samples that XAI user researchers test? 48.1% (n 
= 99) of the papers did not report cultural information about their samples. Across the remaining 
107 papers, 32 countries or regions were mentioned. The three most frequent ones were the US (n 
= 53), the UK (n = 13) and Germany (n = 12) (for details, see Table A3, Appendix). Moreover, 
from the 107 papers, 81.3% (n = 87) had only WEIRD samples, exceeding the numbers of papers 
with mixed samples (10.3%, n = 11) and with only non-WEIRD samples (8.4%, n = 9).  
 

RQ3. Do XAI researchers restrict their user study conclusions to their participants or study 
population, or generalize beyond them? The 99 papers that did not provide cultural information
could have involved diverse samples, in which case broad generalizations may be
unproblematic. Because we could not determine cultural background in these papers, we analyzed
only the remaining ones with this information (n = 107). 70.1% (n = 75) of them contained
unrestricted conclusions, i.e., claims that suggested that the study results applied to all (e.g.) non-
experts, people, users, consumers, humans, contexts, or time. Table 3 below presents examples (a 
full list of all unrestricted claims that we found in the papers can be found here). 

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Table 3: Examples of unrestricted conclusions 

(1) “Our user study shows that non-experts can analyze our explanations and identify a rich set of 
concepts within images that are relevant (or irrelevant) to the classification process.” (Schneider & 
Vlachos, 2023, p. 4196) 
(2) “Our pilot study revealed that users are more interested in solutions to errors than they are in 
just why the error happened.” (Hald et al., 2021, p. 218) 
(3) “Our findings demonstrate that humans often fail to trust an AI when they should, but also that 
humans follow an AI when they should not.” (Schmidt et al., 2020, p. 272) 
(4) “We also found that users understand explanations referring to categorical features more 
readily than those referring to continuous features.” (Warren et al., 2022, p. 1) 
(5) “Both experiments show that when people are given case-based explanations, from an 
implemented ANN-CBR twin system, they perceive miss-classifications to be more correct.” 
(Ford et al., 2020, p. 1) 
(6) “Results indicate that human users tend to favor explanations about policy rather than about 
single actions.” (Waa et al., 2018, p. 1) 
(7) “Our findings suggest that people do not fully trust algorithms for various reasons, even when 
they have a better idea of how the algorithm works.” (Cheng et al., 2019, p. 10) 
 




RQ4. Is the scope of XAI researchers’ conclusions related to the cultural diversity of their 
samples such that studies with broader conclusions are associated with more diverse samples? 
To address this question, focusing only on the papers with cultural information (n = 107), we first 
analyzed the scope of conclusions in the papers with only WEIRD, only non-WEIRD, and mixed 
samples. If studies with broader conclusions have more diverse samples, then one would predict 
that papers with unrestricted conclusions tend to have mixed samples, i.e., neither only
WEIRD nor only non-WEIRD samples. Table 4 presents the comparisons. Contrary to this
prediction, 90.7% (n = 68) of the 75 papers with unrestricted claims had in fact only WEIRD (84%, n = 63)
or only non-WEIRD (6.7%, n = 5) samples. Moreover, if papers with broader conclusions had 
more diverse samples, then papers with unrestricted claims should include a higher proportion of 
papers with mixed samples compared to papers with restricted claims. However, a χ2 test showed 
that there was no evidence of a statistically significant difference of this kind (p = 0.51). 

 
 
 
 
 
 

 
 
 

Table 4: Distribution of papers with unrestricted and restricted conclusions by sample
composition

Sample background      Restricted papers   Unrestricted papers   Total
Only non-WEIRD                  4                    5              9
Mixed                           4                    7             11
Only WEIRD                     24                   63             87
Total                          32                   75            107

 
Furthermore, when relating the number of countries or regions sampled in each paper (which
ranged from 1 to 19) to the scope of conclusion variable, we found that 82 papers sampled only
one country or region, yet these single-country papers accounted for 74.7% (n = 56) of all papers
with unrestricted conclusions (n = 75) (Table A4, Appendix). Finally, to statistically analyze
whether papers with
unrestricted conclusions had more diverse samples than papers with restricted conclusions, we 
also conducted a Mann-Whitney U test (as our data were not normally distributed) with the 
number of countries/regions as our dependent scale variable and scope of conclusions as the 
categorical independent variable. We found no evidence that unrestricted papers included a
statistically significantly higher number of countries or regions in their samples than the
restricted papers did (p = 0.59).
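
For readers who wish to re-run these two checks, the sketch below reproduces the χ2 test on the Table 4 counts and shows the shape of the Mann-Whitney U comparison. The contingency table comes directly from Table 4; the per-paper country counts are placeholder values, as the underlying data are in the OSF repository rather than in the text.

```python
# Sketch of the two statistical checks reported for RQ4. The contingency table is taken from
# Table 4; the per-paper country counts for the Mann-Whitney U test are hypothetical placeholders.

from scipy.stats import chi2_contingency, mannwhitneyu

# Rows: only non-WEIRD, mixed, only WEIRD; columns: restricted, unrestricted (Table 4).
table_4 = [[4, 5],
           [4, 7],
           [24, 63]]

chi2, p_chi2, dof, expected = chi2_contingency(table_4)
print(f"chi-squared = {chi2:.2f}, df = {dof}, p = {p_chi2:.2f}")   # should give p of about 0.51

# Number of countries/regions sampled per paper, split by scope of conclusion.
# These lists are illustrative placeholders only; the real data are on OSF.
restricted_counts = [1, 1, 2, 1, 3, 1, 1, 2]
unrestricted_counts = [1, 1, 1, 2, 1, 1, 1, 1, 2, 1]

u_stat, p_u = mannwhitneyu(unrestricted_counts, restricted_counts, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat}, p = {p_u:.2f}")
```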

 
Our findings indicate significant shortcomings in many XAI user studies. But before
interpreting the results, it is worth exploring whether researchers who have conducted literature
reviews of XAI user studies have noticed any of the issues that we have just reported, i.e., a lack 
of awareness of relevant cultural variations, pervasive WEIRD sampling, or broad generalizations 
of XAI user study results from WEIRD to non-WEIRD populations. We therefore extended our 
systematic review to recent literature reviews of XAI user studies themselves. 
 
4. A Meta-review of Reviews about XAI User Studies 
 
Following the same procedure as before, we explored four questions: 
 

RQ1. Do literature reviews about XAI user studies indicate that there may be cultural 
variations in explanatory needs that can affect the generalizability of study results? 
RQ2. Do these reviews comment on WEIRD population sampling in XAI user studies? 
RQ3. Do they comment on potential hasty generalizations in these studies? 

 




RQ4. Do the authors of reviews about XAI user studies restrict their own conclusions from 
these studies to particular samples or study populations, or generalize beyond them?
 

4.1 Methodology 
 
To find reviews to analyze, in September 2022, we used the same three databases (Scopus, Web 
of Science, arXiv) and search strings as before but now added the specific restrictor “review” (for 
details, see Table A4, Appendix). The search returned 130 papers. Ten duplicates were
removed. Titles and abstracts of the remaining 120 papers were scanned for articles meeting our 
inclusion criteria. We included any literature review of XAI user studies that was published 
between January 2012 and September 2022. We excluded any theoretical paper about AI 
principles or XAI guidelines, and any review paper about AI or XAI that did not focus on XAI 
user studies. Non-English publications were also excluded. 24 articles remained for further 
screening, during which forward snowballing produced 10 more papers, yielding 34 articles for 
full-text analysis (see Figure 4). 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Figure 4: PRISMA flow diagram for the meta-review. Identification: papers identified from
Scopus (n = 62), Web of Science (n = 19), and arXiv (n = 49), total n = 130; papers removed
before screening, n = 10 (duplicates). Screening: papers screened, n = 120; papers excluded,
n = 96 (theoretical papers, not reviews: n = 62; reviews of AI/XAI in general, not XAI user
studies: n = 34); papers retrieved, n = 24; papers identified via snowballing, n = 10. Included:
papers included in review, n = 34.


During full-text analysis, we independently classified papers according to the following
information (in addition to publication year). We applied a binary label (0 = no or 1 = yes) to a
review if it (1) indicated that there might be cultural, contextual, or social variations in people’s
perceptions of XAI outputs that are relevant for user research, (2) contained comments on
WEIRD sampling in XAI studies, (3) noted the relevance of stating cultural, national, or regional
background in user studies, or (4) commented on potential hasty generalizations of results in these
studies. We also classified reviews according to the scope of conclusions that they drew from 
XAI user studies, employing the same restricted versus unrestricted distinction as previously. To 
ensure reliability for the classifications, we again calculated inter-rater agreement. It was 
consistently substantial (between κ = .78 and κ = .94).
 
4.2 Results 
 
In our sample of 34 literature reviews, most (91.2%, n = 31) were published between
2019 and 2022 (Table A5, Appendix).
 

RQ1. Do literature reviews about XAI user studies indicate that there may be cultural 
variations in explanatory needs that can affect the generalizability of study results? In 82.4% (n = 
28) of the reviews, this did not happen. Moreover, of the 6 reviews that briefly referred to
culture, just 2 mentioned variations relevant for XAI, noting that there are “clear cultural 
differences in preference for simple versus complex explanations” (Mueller et al., 2019, p. 77) 
and “differences in the preference for personalized explanations depending on their cultural 
background” (Sperrle et al., 2020, p. 5). However, neither paper elaborated on or offered an 
overview of XAI-relevant cultural differences.  
 

RQ2. Do reviews about XAI user studies comment on WEIRD population sampling in these 
studies? 94.1% (n = 32) of the reviews did not do so. Only 2 reviews (Sperrle et al., 2020; Laato 
et al., 2022) displayed sensitivity to the relevance of being explicit about XAI users’ cultural, 
national, or regional background in XAI user studies. However, these papers did not specifically 
review XAI user studies to explore the extent of WEIRD sampling, nor did they offer quantitative 
data on it.  
 

RQ3. Do reviews of XAI user studies comment on potential hasty generalizations in these 
studies? Just 1 of 34 reviews mentioned unwarranted extrapolations, writing that “perhaps the 
greatest challenge in the study of HAI [human-AI] teams […] is simply resisting the urge to 
overgeneralize experimental results” (Zerilli et al., 2022, p. 7). However, the authors did not 
provide quantitative evidence of the extent of hasty, overly broad generalizations in XAI papers 
and did not consider XAI-relevant cultural differences, or WEIRD sampling. They also 
themselves overgeneralized some study results when writing, for instance, that “in very simple 
automated tasks involving a single person, people tend to distrust automated aids whose errors 
they witness, unless an explanation is provided” (ibid). This brings us to RQ4. 
 

RQ4. Do the authors of reviews about XAI user studies restrict their own conclusions from 
these studies to particular samples or study populations, or generalize beyond them? From all 34
reviews, 82.4% (n = 28) involved generalizations beyond any particular sample or (e.g., national) 
study population. Table 5 presents examples.  

 
 
 
 
 
 
 
 
 

 




 
 
 
 
 
 
 
 
 
 
 
 

 
Table 5: Seven examples of unrestricted conclusions in XAI reviews

(1) “Users who are expert or self-confident in tasks that have been delegated to automation tend to
ignore machine advice […].” (Zerilli et al., 2022, p. 4)
(2) “It has been recognised in the literature that counterfactuals tend to help humans make causal
judgments.” (Chou et al., 2022, p. 42)
(3) “Users tend to anthropomorphize AI and may benefit from humanlike explanations.” (Laato et
al., 2022, p. 14)
(4) “The textual explanations generated with GRACE were revealed to be more understandable by
humans in synthetic and real experiments.” (Islam et al., 2022, p. 20)
(5) “Humans lose trust in the explanation when soundness is low.” (Gerlings et al., 2021, p. 5)
(6) “People tend to prefer complex over simple explanations if they can see and compare both
forms.” (Mueller et al., 2019, p. 77)
(7) “People rarely expect an explanation that consists of an actual and complete cause of an event.”
(Miller, 2019, p. 3)

 
5. General Discussion and Recommendations 
 
Our analyses reveal significant methodological limitations in much of the currently available XAI 
user research. We briefly revisit the three main findings of our two reviews and introduce 
mitigation strategies for the problems that our results highlight. 
 

(1) Lack of sensitivity to cultural variations in explanatory needs. In the first analysis, we 
found that almost 90% of the XAI studies that we reviewed focused only on internalist 
explanations. As argued in Section 2, these explanations may better align with WEIRD 
individuals’ explanatory needs than with those of non-WEIRD people in collectivist cultures, who 
may prefer externalist explanations (Henrich et al., 2010; Lavelle, 2021). Externalist explanations
involving factors often highlighted in collectivist cultures were explored in less than 4% of
all studies. Moreover, in about 90% of the papers, authors did not display any awareness of
cultural differences such as those discussed in Section 2 and outlined in Table A1. Our meta-
review of literature reviews about XAI user studies additionally revealed that the vast majority of 
these reviews (> 80%) were not sensitive to cultural variations in people’s explanatory needs 
either. These findings suggest that XAI researchers routinely overlooked potentially relevant 
cultural differences that can affect human-AI interactions. 
 

To tackle these problems, we recommend that AI journals increase the cultural diversity of 
their reviewer pool to ensure viewpoint variation during manuscript evaluation (Linxen et al., 
2021). Conference organisers, in turn, can use platforms such as OpenReview, which makes 
reviewer reports public, thereby allowing for an extra level of accountability (Wang et al., 2021).
We also recommend that the increasing emphasis in reviews of XAI work on relating 
psychological findings to XAI developments (e.g., Miller, 2019; Rong et al., 2022) be extended to 
include data on the cultural differences summarized in Section 2 and Table A1.  
 

(2) WEIRD sampling. We found that non-WEIRD populations were rarely sampled in XAI user 
research. This finding matches results from studies that explored sampling in HCI and HRI and 
report that 73-75% of papers tested only WEIRD populations (Linxen et al., 2021; Seaborn et al., 
2023). However, our results suggest that the problem may be worse in XAI research, as more than 
80% of XAI papers with relevant information involved only WEIRD participants. The findings of 
our meta-review add further weight to this problem because almost all (94%) of the reviews in 
our sample overlooked the predominantly WEIRD sampling in XAI user studies.

 




 
There may be explanations for why WEIRD populations are over-represented. WEIRD
countries may be a key market for XAI and hybrid human-AI designs. However, even if that is
so, diverse sampling can still be advantageous because a significant number of people in WEIRD
countries have diverse, non-WEIRD backgrounds (e.g., people from China, India, or South America
living in the USA) (Budiman & Ruiz, 2021). XAI products tested on more diverse users may thus 
ultimately be more profitable even in WEIRD countries, as they can appeal to a wider market.  
 

Another reason for the pervasive WEIRD sampling may be that XAI user studies are 
predominantly conducted in WEIRD countries and geographically diverse sampling can be 
complicated. However, experiments through online platforms (e.g., MTurk or Prolific) are often 
feasible and have wider reach, enabling more diverse sampling. 29.1% (n = 60) of the XAI 
studies we reviewed already used MTurk. Even then, though, caution is warranted, as research 
suggests that most MTurkers (80%) come from the USA (Keith & Harms, 2016). It is worth 
noting that comparative studies found that Prolific has more diverse, less dishonest, more 
attentive, and more reliable participants, providing higher quality data than MTurk (Peer et al., 
2017). Yet, we found that only 9.2% of the reviewed XAI papers used Prolific, suggesting that 
many XAI researchers may be unaware of these differences. Thus, while conventional sampling 
should not be abandoned (e.g., it may be needed to study people without computer access), we 
recommend that XAI researchers recruit via Prolific or LabintheWild, a crowdsourcing platform
specifically developed to tackle the WEIRD sampling problem (Reinecke & Gajos, 2015), rather
than via MTurk, to increase cultural diversity in user studies.

 
We acknowledge, however, that testing culturally diverse groups can also come with
conceptual challenges because culture itself can be and has been defined in multiple ways (Baldwin
et al., 2006), where different ways of operationalizing culture (e.g., ethnicity, values, collective
traits, country of residence, citizenship, heritage, shared language; Taras et al., 2009, 2016) can 
make comparisons between XAI studies and assessments of appropriate levels of generalizations 
difficult. Many existing measures of culture draw on Hofstede’s (1980) methodology and his self-
report questionnaire containing items about individualism/collectivism, power distance, 
uncertainty avoidance, and masculinity/femininity (Taras et al., 2009). Since Hofstede’s original 
questionnaire is perhaps too long for inclusion in XAI user studies (it contained 126 questions),
for XAI studies investigating, for instance, individualist/collectivist differences, we recommend 
that researchers adapt the related items from this questionnaire, as it is validated. That said, 
Hofstede’s theory and methodology have also been criticized for being overgeneralizing 
(McSweeney, 2002), leading some technology researchers to use nationality as a proxy for culture 
instead (Ur & Wang, 2013; Sawaya et al., 2017). To capture that culture is a multidimensional 
construct, XAI researchers may therefore refrain from any single definition of culture and instead 
individually measure (via self-report items) users’ nationality, racial/ethnic background, country
of residence, home language, and the relevant aspect of Hofstede’s construct, and then conduct
regression analyses to identify and report the strongest predictor of responses to XAI outputs. 
This can enable insights into culture-related variations and may allow for comparisons and 
extrapolations across social groups and XAI user studies without invoking simplistic 
characterizations of culture.  
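
To make this recommendation more concrete, the sketch below shows one way such a regression could be set up. The data file, column names, and outcome measure are entirely hypothetical, and the ordinary least squares model is only for illustration; an ordinal or mixed-effects model may be more appropriate for Likert-type ratings.

```python
# Illustrative sketch of the recommended analysis, not a prescribed pipeline. Assumes a
# hypothetical CSV in which each row is one participant, with a satisfaction rating for an
# XAI output plus several culture-related self-report measures.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("xai_user_study.csv")   # hypothetical file and column names

# Regress satisfaction on several operationalizations of culture at once, so the strongest
# predictor can be identified and reported instead of a single catch-all "culture" variable.
model = smf.ols(
    "satisfaction ~ C(nationality) + C(country_of_residence) + C(home_language) "
    "+ individualism_score",             # e.g., score from adapted Hofstede-style items
    data=df,
).fit()

print(model.summary())                   # inspect and compare the coefficients of the predictors
```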

 
Finally, it is important to note that as many as 48.1% (n = 99) of the reviewed studies did not
report cultural (country/region) information about their samples. While these studies may have
involved diverse population samples, the absence of reporting suggests that this information was 
not considered relevant for replication or generalizability. Not reporting on cultural information 
may be justified within a given study. However, it could also reflect implicit assumptions and 
biases about whether findings from particular populations are more generalizable than findings 




from other populations (Cheon et al., 2020). We therefore recommend that researchers either 
report information about participants’ cultural backgrounds in the ways discussed above or provide
“constraints on generality” statements, specifying the study population and the basis for believing 
that the sample is representative of it or broader populations (for guidance on these statements, 
see Simons et al., 2017; Linxen et al., 2021). 
 

(3) Hasty generalizations of XAI study results. Most of the XAI studies we analyzed contained 
conclusions that presented findings as if they held for whole categories of people (e.g., experts, 
users, humans) even when they had only tested WEIRD populations or a single country. 
Generalizations from WEIRD to non-WEIRD populations need not be unwarranted. Researchers 
who produced such extrapolations might have had good grounds to assume that this particular 
dimension of demographic variation was irrelevant for their study. So, it does not follow from the 
evidence that XAI user researchers drew conclusions about populations much wider than their 
study population that these generalizations were unwarranted. However, if all the unrestricted 
conclusions we found had been based on researchers’ reflection on relevant or irrelevant 
demographic differences, there should have been an indication of this reflection in their papers. 
This is because to fully establish a study conclusion and make the study reproducible, all 
underlying assumptions that justify the conclusions (including the potential assumption that 
people’s explanatory needs or preferences are cross-culturally invariant) need to be made explicit. 
Yet, as noted, we found that more than 90% of the reviewed XAI user studies did not contain any 
evidence of reflection on XAI-relevant cultural differences or invariance. Furthermore, we could 
not find any evidence that XAI user studies with broader claims had or were associated with more 
diverse samples. Hence, the unrestricted conclusions in most of the reviewed XAI papers were 
hasty generalizations, i.e., claims whose scope was broader than warranted by the evidence and 
justification provided by the researchers. Since psychological research suggests that explanatory 
needs likely differ between WEIRD and non-WEIRD populations, as discussed in Section 2, the 
pervasive insufficiently supported extrapolations that we found from WEIRD samples to other 
populations may indicate a cultural “generalization bias” (Peters et al., 2022) toward WEIRD 
populations in many currently available XAI user studies.  
 

Our findings fill a significant gap in previous studies in HCI and HRI that reported 
generalizability problems related to WEIRD sampling in these fields (Linxen et al., 2021; 
Seaborn et al., 2023). This is because these studies did not measure the scope of the
generalizations that researchers produced, thus leaving it unclear whether methodological
shortcomings were involved. Indeed, while hasty generalizations across cultures have been
reported in other fields (e.g., Peters & Lemeire, 2023), until now it has remained unknown
whether they also occur in XAI, allowing XAI researchers to ignore or deny their presence in the 
field of XAI. Our results block this potential move, which is important, as encountering 
overgeneralizations in XAI user studies is particularly disconcerting. XAI user study results can 
directly feed into the production of XAI that a wide range of people later interact with (Ehsan et 
al., 2021; Ding et al., 2022; Okolo et al., 2022). These results can affect the way human-AI hybrid 
systems are developed via influencing what XAI models are included in HHAI designs. 
Generalizing results to cultural groups to whom they do not apply can hide that certain XAI and 
human-AI hybrid systems may only meet the explanatory needs of individuals with a particular 
cultural background, raising ethical concerns about both explainability and inclusivity.  
 

We thus recommend that XAI studies be conducted in collaboration with researchers or
participants from different cultures. To further mitigate hasty generalizations, XAI researchers 
should consider restricting their user study conclusions by using quantifiers (‘US users’, ‘our 
participants’, ‘many users’, etc.), qualifiers (e.g., ‘may’, ‘can’), or past tense (Peters et al., 2022). 




Table A7 in the Appendix presents examples of restricted versions of the unrestricted conclusions 
from Table 3.  

 
6. Limitations 
 
There are several constraints on the generalizability of our own analysis results. First, our 
literature search was limited to three major databases of scientific literature covering XAI studies. 
Second, there may also be XAI user studies that do not use our specific search terms and that we 
may have overlooked. Third, we focused only on English publications on XAI user research and 
may have overlooked, for instance, recent research from non-Western institutions that was not 
published in English. Indeed, the current boom of AI research, particularly in China, may 
significantly counteract the cultural bias in XAI user studies that we reported (Min et al., 2023). 
Future research analyzing the cultural diversity and models employed in Chinese-language XAI
user studies is therefore desirable. However, since most scientific studies are now published in
English (Ramírez-Castañeda, 2020), our findings remain important because of the reach and
influence of English within the scientific community. Another limitation of our analyses is that
we used country or region as proxies for culture, as it was typically the only culture-related 
information in the reviewed papers. Using this proxy ignores expatriates, mixed national 
demographics, and shared, technology-facilitated experience. We therefore welcome future XAI 
research reviews that explore other proxies for cultural background in XAI user studies.  
 
7. Conclusion 
 
XAI systems play an increasingly significant role in many human-AI interactions because
they can make opaque AI models more trustworthy to people, facilitating human control over 
these models. XAI developers are thus doing important work that is directly relevant for hybrid 
human-AI (HHAI) systems. Here, we examined whether currently popular XAI systems for lay-
users are equally suitable for people from different cultural backgrounds. We argued that XAI 
systems that produce internalist explanations (referring to mental states, e.g., beliefs) are currently 
popular but may cater primarily to the explanatory needs of people from individualist, typically 
WEIRD cultures. Psychological studies found that while most people from individualist cultures
preferred internalist explanations of human behavior, people from collectivist, commonly non-
WEIRD cultures tended to favor externalist explanations (referring to social roles, context, etc.).
To help
raise XAI and HHAI developers’ awareness of these and other cultural variations relevant for 
XAI design and human-AI interactions, we provided a table offering an empirically informed 
overview of them (see Table A1 in the Appendix).  
 

To support our claim that these variations are currently overlooked in XAI research, we 
analyzed 206 XAI user studies. Most of them contained no evidence that the researchers were 
aware of cultural variations in explanatory needs. Most studies also tested only WEIRD 
populations, but researchers routinely generalized results beyond them. When we additionally 
analyzed 34 reviews of XAI user studies, we found that these problems went largely unnoticed 
even by most reviewers of these studies.  
 

In offering evidence of XAI-relevant cultural variations, of a widespread oversight of them in 
the field of XAI, and of pervasive WEIRD sampling paired with extrapolations to non-WEIRD 
populations, this paper uncovers both a cultural bias toward WEIRD populations and an 
important knowledge gap in the field of XAI regarding how culturally diverse users may respond 
to widely used XAI systems. If human-AI hybrids include XAI systems of the kind tested in most 
of the user studies we reviewed, then these hybrids may inherit this cultural bias and be less 
inclusive than they appear and than they could be. We hope that our analyses help stimulate cross- and 
multi-cultural XAI user studies and improve the vital work that XAI and HHAI developers are 
doing in making AI systems more explainable and useful for all stakeholders no matter their 
cultural background.  
 
Acknowledgements 
 
We would like to thank Apolline Taillandier and Charlotte Gauvry for cross-checking the 
classifications of some of our key data. We are also very grateful for helpful comments from 
Caroline Gevaert, Benjamin Rosman, Alex Krauss, and three reviewers of this journal. We do not 
have funding to declare, and we grant JAIR permission to publish this paper. 
 
UP conceived and designed the study, collected the data, did the data analysis, developed the 
argumentation, wrote the first draft, and did the editing. 
 
MC assisted with the collection of the data, data classification, revising, and editing of the paper. 


Appendix

Table A1: Select overview of psychological and HCI research with relevance for XAI design

Author(s): Howell (1981), Wierzbicka (1992), Lebra (1993)
Country/region: Incl. Peru, Japan, US
Cultural variation: Some non-Western cultures lack concepts comparable to Western psychological concepts ‘think’, ‘belief’, or ‘desire’
Relation to XAI: Use of mentalistic XAI framing (e.g., ‘AI thinks _’)

Author(s): Miller (1984), Al-Zahrani and Kaplowitz (1993), Morris and Peng (1994), Lee et al. (1996), Choi and Nisbett (1998)
Country/region: Incl. India, US, Korea, China, Japan, Saudi Arabia
Cultural variation: Non-Western (vs. Western) study participants referred to situations/social rules to explain behavior and were less susceptible to mistakenly explaining it through agents’ internal states when situational causes were available
Relation to XAI: Preferences for internalist vs. externalist outputs

Author(s): Choi et al. (2003)
Country/region: Korea, US
Cultural variation: When explaining behavior, Korean participants preferred more contextual information than their US counterparts
Relation to XAI: Preferences for XAI output scope

Author(s): Klein et al. (2014)
Country/region: Malaysia, US
Cultural variation: Malaysian participants preferred more detailed explanations for indeterminate situations; US participants favored simpler ones
Relation to XAI: Preferences for XAI output complexity

Author(s): Hall and Hall (1990), Sanchez-Burks et al. (2003), Wurtz (2005), Rau et al. (2009), Wang et al. (2010), Lee and Sabanović (2014)
Country/region: High-context (e.g., Japan, China, South Korea) and low-context (e.g., US, Germany) cultures
Cultural variation: Participants from Western, low-context cultures (communication with low use of non-verbal cues) preferred direct, explicit communication (e.g., “Drink water.”); participants from East Asian, high-context cultures preferred indirect, implicit communication (e.g., “Drinking water may alleviate headaches.”), e.g., in robot recommendations
Relation to XAI: Preferences for XAI communication style

Author(s): Nisbett et al. (2001), Norenzayan et al. (2002), Varnum et al. (2010), Henrich et al. (2010), Klein et al. (2018)
Country/region: Western, East-Asian countries
Cultural variation: Western participants displayed more analytic thinking (i.e., rule-based object categorization, context-independent understanding of objects, formal logic in reasoning); East-Asian participants displayed more holistic thinking (i.e., similarity-based object categorization, focus on context, intuition in reasoning)
Relation to XAI: Preferences for XAI output content (e.g., rule-based vs. example-based)

Author(s): Otterbring et al. (2022)
Country/region: East-Asian, US
Cultural variation: East-Asian participants preferred abstract figures representing conformity; US participants favored objects representing uniqueness
Relation to XAI: Preferences for XAI output content and format

Author(s): Reinecke and Gajos (2014), Alexander et al. (2021)
Country/region: Incl. Russia, Macedonia, Australia, China, Saudi Arabia
Cultural variation: Regarding websites’ visual complexity/design attributes (layout, navigation, etc.), Australian users focused on textual items; Chinese users scanned the whole page
Relation to XAI: Preferences for XAI output complexity and format

Author(s): Baughan et al. (2021)
Country/region: US, Japan
Cultural variation: Visual attention differences affected website search: Japanese participants remembered more and found contextual website information faster than their US counterparts
Relation to XAI: Preferences for XAI output complexity and format

Author(s): Van Brummelen et al. (2022)
Country/region: US, Singapore, Canada, NZ, Indonesia, Iran, Japan, India
Cultural variation: Non-WEIRD participants’ perspectives emphasized virtual agent artificiality; WEIRD perspectives emphasized human-likeness
Relation to XAI: Social embedding can influence perceptions of AI
Table A2: Systematic literature review search strings

Scopus (searched July 2022): 
 
TITLE-ABS-KEY ("XAI" OR "Explainable AI" OR "transparent AI" OR "interpretable AI" OR 
"accountable AI" OR "AI explainability" OR "AI transparency" OR "AI accountability" OR "AI 
interpretability" OR "model explainability" OR "explainable artificial intelligence" OR 
"explainable ML" OR "explainable machine learning" OR "algorithmic explicability" OR 
"algorithmic explainability") AND ("end user" OR "end-user" OR "audience" OR "consumer" 
OR "user" OR "user study" OR "user survey" OR "developer") AND (LIMIT-TO 
(DOCTYPE,"cp") OR LIMIT-TO (DOCTYPE,"ar") OR LIMIT-TO (DOCTYPE,"ch")) AND 
(LIMIT-TO (PUBYEAR,2022) OR LIMIT-TO (PUBYEAR,2021) OR LIMIT-TO 
(PUBYEAR,2020) OR LIMIT-TO (PUBYEAR,2019) OR LIMIT-TO (PUBYEAR,2018) OR 
LIMIT-TO (PUBYEAR,2017) OR LIMIT-TO (PUBYEAR,2016) OR LIMIT-TO 
(PUBYEAR,2012)) AND (LIMIT-TO (LANGUAGE,"English"))  
 
 
Web of Science (searched July 2022): 
 
(ALL=("XAI" OR "Explainable AI" OR "transparent AI" OR "interpretable AI" OR 
"accountable AI" OR "AI explainability" OR "AI transparency" OR "AI accountability" OR "AI 
interpretability" OR "model explainability" OR "explainable artificial intelligence" OR 
"explainable ML" OR "explainable machine learning" OR "algorithmic explicability" OR 
"algorithmic explainability")) AND ALL=("end user" OR "end-user" OR "audience" OR 
"consumer" OR "user" OR "user study" OR "user survey" OR 
"developer") and Article or Proceedings Papers or Early Access or Book Chapters(Document 
Types) and English (Languages) 
 
Refined by all ‘Publication Years’ (2012-01-01 to 2022-12-31) 
 
 
ArXiv (searched July 2022): 
 
Query: order: -announced_date_first; size: 50; date_range: from 2012-01-01 to 2022-12-31; 
classification: Computer Science (cs); include_cross_list: True; terms: AND all="XAI" OR 
"Explainable AI" OR "transparent AI" OR "interpretable AI" OR "accountable AI" OR "AI 
explainability" OR "AI transparency" OR "AI accountability" OR "AI interpretability" OR 
"model explainability" OR "explainable artificial intelligence" OR "explainable ML" OR 
"explainable machine learning" OR "algorithmic explicability" OR "algorithmic explainability"; 
AND all="end user" OR "end-user" OR "audience" OR "consumer" OR "user" OR "user study" 
OR "user survey" OR "developer" 
 
 



Table A3: Frequency of nationalities/regions in the reviewed XAI studies

Country          N    Country                 N    Country                   N
No details       99   South/Latin America     5    Norway                    1
US               53   Sweden                  4    Denmark                   1
UK               13   India                   4    Iceland                   1
Germany          12   Netherlands             4    South Korea               1
Canada           9    Asia                    3    New Zealand               1
Europe           9    Switzerland             3    Portugal                  1
Ireland          7    Belgium                 2    Africa (unspecified)      1
North America    6    Finland                 2    Americas (unspecified)    1
China            5    France                  2    Rest of the world         1
Italy            5    Brazil                  2    Russia                    1
Australia        5    Japan                   2    Costa Rica                1


Table A4: Numbers of countries/regions in papers with restricted and unrestricted conclusions

Countries or regions    Scope of conclusion: Restricted    Scope of conclusion: Unrestricted    Total
1                       26                                 56                                   82
2                       2                                  5                                    7
3                       2                                  5                                    7
4                       0                                  3                                    3
5                       0                                  1                                    1
6                       2                                  3                                    5
8                       0                                  1                                    1
19                      0                                  1                                    1
Total                   32                                 75                                   107


Table A5: Reviews of XAI user study papers per year

Year         Number
2012-2016    0
2017         1
2018         2
2019         3
2020         6
2021         12
Sept 2022    10
Total        34
 



Table A6: Systematic meta-review search strings

 
Scopus (searched September 2022): 
 
TITLE-ABS-KEY ( "XAI"  OR  "Explainable AI"  OR  "transparent AI"  OR  "interpretable 
AI"  OR  "accountable AI"  OR  "AI explainability"  OR  "AI transparency"  OR  "AI 
accountability"  OR  "AI interpretability"  OR  "model explainability"  OR  "explainable artificial 
intelligence"  OR  "explainable ML"  OR  "explainable machine learning"  OR  "algorithmic 
explicability"  OR  "algorithmic explainability" )  AND  ( "end user"  OR  "end-
user"  OR  "audience"  OR  "consumer"  OR  "user"  OR  "user study"  OR  "user 
survey"  OR  "developer" )  AND  ( LIMIT-TO ( PUBYEAR ,  2022 )  OR  LIMIT-
TO ( PUBYEAR ,  2021 )  OR  LIMIT-TO ( PUBYEAR ,  2020 )  OR  LIMIT-
TO ( PUBYEAR ,  2019 )  OR  LIMIT-TO ( PUBYEAR ,  2018 )  OR  LIMIT-
TO ( PUBYEAR ,  2017 )  OR  LIMIT-TO ( PUBYEAR ,  2016 )  OR  LIMIT-
TO ( PUBYEAR ,  2012 ) )  AND  ( LIMIT-TO ( DOCTYPE ,  "re" ) )  AND  ( LIMIT-
TO ( LANGUAGE ,  "English" ) )  
 
 
Web of Science (searched September 2022): 
 
 (ALL=("XAI" OR "Explainable AI" OR "transparent AI" OR "interpretable AI" OR 
"accountable AI" OR "AI explainability" OR "AI transparency" OR "AI accountability" OR "AI 
interpretability" OR "model explainability" OR "explainable artificial intelligence" OR 
"explainable ML" OR "explainable machine learning" OR "algorithmic explicability" OR 
"algorithmic explainability")) AND ALL=("end user" OR "end-user" OR "audience" OR 
"consumer" OR "user" OR "user study" OR "user survey" OR "developer") and Review 
Article (Document Types) and English (Languages) 
 
Refined by all ‘Publication Years’ (2012-01-01 to 2022-12-31) 
 
 
ArXiv (searched September 2022): 
 
Query: order: -announced_date_first; size: 200; date_range: from 2012-01-01 to 2022-12-31; 
classification: Computer Science (cs); include_cross_list: True; terms: AND all="XAI" OR 
"Explainable AI" OR "transparent AI" OR "interpretable AI" OR "accountable AI" OR "AI 
explainability" OR "AI transparency" OR "AI accountability" OR "AI interpretability" OR 
"model explainability" OR "explainable artificial intelligence" OR "explainable ML" OR 
"explainable machine learning" OR "algorithmic explicability" OR "algorithmic explainability"; 
AND all="end user" OR "end-user" OR "audience" OR "consumer" OR "user" OR "user study" 
OR "user survey" OR "developer"; AND all="review" 
 
 



Table A7: Restricted versions of unrestricted conclusions. The parts in bold are the restricting components.

(1) “Our user study shows that non-experts can analyze our explanations and identify a rich set of 
concepts within images that are relevant (or irrelevant) to the classification process.” (Schneider & 
Vlachos, 2023, p. 4196) 
Restricted: “Our user study shows that non-expert participants could analyze our explanations 
and identify a rich set of concepts within images that were relevant (or irrelevant) to the 
classification process.” 
(2) “Our pilot study revealed that users are more interested in solutions to errors than they are in 
just why the error happened.” (Hald et al., 2021, p. 218) 
Restricted: “Our pilot study revealed that users were more interested in solutions to errors than 
they were in just why the error happened.” 
(3) “Our findings demonstrate that humans often fail to trust an AI when they should, but also that 
humans follow an AI when they should not.” (Schmidt et al., 2020, p. 272) 
Restricted: “Our findings demonstrate that participants often failed to trust an AI when they 
should have trusted it, but also that they followed an AI when they should not have done so.” 
(4) “We also found that users understand explanations referring to categorical features more 
readily than those referring to continuous features.” (Warren et al., 2022, p. 1) 
Restricted: “We also found that users understood explanations referring to categorical features 
more readily than those referring to continuous features.” 
(5) “Both experiments show that when people are given case-based explanations, from an 
implemented ANN-CBR twin system, they perceive miss-classifications to be more correct.” 
(Ford et al., 2020, p. 1) 
Restricted: “Both experiments show that when participants were given case-based explanations, 
from an implemented ANN-CBR twin system, they perceived miss-classifications to be more 
correct.” 
(6) “Results indicate that human users tend to favor explanations about policy rather than about 
single actions.” (Waa et al., 2018, p. 1) 
Restricted: “Results indicate that participants tended to favor explanations about policy rather 
than about single actions.” 
(7) “Our findings suggest that people do not fully trust algorithms for various reasons, even when 
they have a better idea of how the algorithm works.” (Cheng et al., 2019, p. 10) 
Restricted: “Our findings suggest that people did not fully trust algorithms for various reasons, 
even when they had a better idea of how the algorithm works.” 
 




 

References 
 
Abdul, A., Vermeulen, J., Wang, D., Lim, B.Y., & Kankanhalli, M. (2018). Trends and trajectories 

for explainable, accountable and intelligible systems: An HCI research agenda. Proceedings of 
the 2018 CHI Conference on Human Factors in Computing Systems, 1–18.  

 
Adadi, A. & Berrada, M. (2018). Peeking inside the black-box: A survey on Explainable 

Artificial Intelligence (XAI). IEEE Access, 6, 52138–52160. 
 

Akaliyski, P., Welzel, C., Bond, M. H., & Minkov, M. (2021). On “nationology”: The 
gravitational field of national culture. Journal of Cross-Cultural Psychology, 52(8-9), 771–
793.  

 
Akata, Z., Balliet, D., de Rijke, M., Dignum, F., Dignum, V., Eiben, G., Fokkens, A., Grossi, D., 
Hindriks, K., Hoos, H., Hung, H., Jonker, C., Monz, C., Neerincx, M., Oliehoek, F., Prakken, H., 
Schlobach, S., van der Gaag, L., van Harmelen, F., & Welling, M. (2020). A research agenda for 
Hybrid Intelligence: Augmenting human intellect with collaborative, adaptive, responsible, and 
explainable Artificial Intelligence. Computer, 53(8), 18–28. 
https://doi.org/10.1109/MC.2020.2996587 

 
Alexander, R., Thompson, N., McGill, T. & Murray, D. (2021). The influence of user culture on 

website usability. International Journal of Human-Computer Studies, 154, 102688, 
https://doi.org/10.1016/j.ijhcs.2021.102688. 

 
Al-Zahrani, S., & Kaplowitz, S. (1993). Attributional biases in individualistic and collectivistic 

cultures. Journal of Personality & Social Psychology, 47, 793–804. 
 
Arrieta, A.B., Díaz-Rodríguez, N., del Ser, J., Bennetot, A., Tabik, S., et al. (2020). Explainable 

Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward 
responsible AI. Information Fusion, 58, 82-115. 

 
Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., Bonnefon, J. F., & Rahwan, I. 

(2018). The Moral Machine experiment. Nature, 563(7729), 59–64. 
https://doi.org/10.1038/s41586-018-0637-6 

 
Baldwin, J. R., Faulkner, S. L., Hecht, M. L., & Lindsley, S. L. (eds). (2006). Redefining Culture: 

Perspectives across the disciplines. New Jersey: Lawrence Erlbaum Associates. 
 
Bansal, G., Nushi, B., Kamar, E., Lasecki, W. S., Weld, D. S., & Horvitz, E. (2019). Beyond 

accuracy: The role of mental models in human-AI team performance. Proceedings of the AAAI 
Conference on Human Computation and Crowdsourcing, 7(1), 2-11. 
https://doi.org/10.1609/hcomp.v7i1.5285 

 
Banerjee, A., & Chaudhury, S. (2010). Statistics without tears: Populations and 

samples. Industrial Psychiatry Journal, 19(1), 60–65. https://doi.org/10.4103/0972-
6748.77642 

 
Baughan, A., Oliveira, N., August, T., Yamashita, N., & Reinecke, K. (2021). Do cross-cultural 

differences in visual attention patterns affect search efficiency on websites? Proceedings of the 
2021 CHI Conference on Human Factors in Computing Systems, 362, 1–12.  




 
Bhatt, U., Xiang, A., Sharma, S., Weller, A., Taly, A., Jia, Y., Ghosh, J., Puri, R., Moura, J.M., & 

Eckersley, P. (2019). Explainable machine learning in deployment. Proceedings of the 2020 
Conference on Fairness, Accountability, and Transparency, 648–657. 
https://doi.org/10.1145/3351095.3375624 

 
Budiman, A., & Ruiz, N.G. (2021). Key facts about Asian origin groups in the U.S. Pew 

Research Centre. https://www.pewresearch.org/fact-tank/2021/04/29/key-facts-about-asian-
origin-groups-in-the-u-s/ 

 
Burrell, J. (2016). How the machine ‘thinks’: Understanding opacity in machine learning 

algorithms. Big Data & Society. https://doi.org/10.1177/2053951715622512 
 
Carman, M. & Rosman, B. (2021). Applying a principle of explicability to AI research in Africa: 

Should we do it? Ethics and Information Technology 23, 2, 107–117. 
 
Casey, B., Farhangi, A. & Vogl, R. (2019). Rethinking explainable machines: The GDPR’s 

‘Right to Explanation’ debate and the rise of algorithmic audits in enterprise. Berkeley 
Technology Law Journal, 34, 1, 143–188. 

 
Cazañas, A., de San Miguel, A., & Parra, E. (2017). Estimating sample size for usability 

testing. Enfoque UTE, 8(1), 172-185.  
 
Cha, J.-H., & Nam, K. D. (1985). A test of Kelley’s cube theory of attribution: A cross-cultural 

replication of McArthur's study. Korean Social Science Journal, 12, 151–180. 
 
Cheon, B. K., Melani, I., & Hong, Y. (2020). How USA-centric is psychology? An archival study 

of implicit assumptions of generalizability of findings to human nature based on origins of 
study samples. Social Psychological and Personality Science, 11(2), 928-937. 

 
Chen, L., Ning, H., Nugent, C., & Yu, Z. (2020). Hybrid Human-Artificial Intelligence, 

IEEE Computer, 53(8), 14-17. 
 
Cheng, H.-F., Wang, R., Zhang, Z., O'Connell, F., Gray, T., Harper, F., & Zhu, H. (2019). 
Explaining decision-making algorithms through UI: Strategies to help non-expert stakeholders. 
Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–12. 
https://doi.org/10.1145/3290605.3300789 

 
Chou, Y., Moreira, C., Bruza, P., Ouyang, C., & Jorge, J. (2022). Counterfactuals and causability 

in explainable Artificial Intelligence: Theory, algorithms, and applications. Information 
Fusion, 81, 59-83. 

 
Choi, I. & Nisbett, R.E. (1998). Situational salience and cultural differences in the 

correspondence bias and actor-observer bias. Personality and Social Psychology Bulletin, 24, 
949–960. 

 
Choi, I., Dalal, R., Kim-Prieto, C. & Park, H. (2003). Culture and judgment of causal 

relevance. Journal of Personality and Social Psychology, 84(1), 46–59. 
 
De Graaf, M. M. & Malle, B. F. (2017). How people explain action (and autonomous intelligent 

systems should too). AAAI Fall Symposium Series, 19-26. 




 

 
Ding, W., Abdel-Basset, M., Hawash, H. & Ali, A. (2022). Explainability of Artificial 

Intelligence methods, applications and challenges: A comprehensive survey. Information 
Sciences, 615, 238-292. 

 
Ehsan, U., Q. Liao, V.Q., Muller, M., Riedl, M.O., & Weisz. J.D., (2021). Expanding 

Explainability: Towards Social Transparency in AI systems. Proceedings of the 2021 CHI 
Conference on Human Factors in Computing Systems, 82, 1–19. 

 
Fatehi, K., Priestley, J. L., & Taasoobshirazi, G. (2020). The expanded view of individualism and 

collectivism: One, two, or four dimensions? International Journal of Cross Cultural 
Management, 20(1), 7–24. https://doi.org/10.1177/1470595820913077 

 
Fiske, A.P., Kitayama, S., Markus, H.R., & Nisbett, R.E. (1998). The cultural matrix of social 

psychology. In D. T. Gilbert, S. T. Fiske, & G. Lindzey (Eds.), The Handbook of Social 
Psychology (pp. 915–981). McGraw-Hill. 

 
Ford, C., Kenny, E.M., & Keane, M.T. (2020). Play MNIST For Me! User studies on the effects 

of post-hoc, example-based explanations & error rates on debugging a deep learning, black-
box classifier. arXiv, abs/2009.06349. 

 
Gerlings, J., Shollo, A., & Constantiou, I.D. (2020). Reviewing the need for Explainable 

Artificial Intelligence (XAI). Proceedings of the 54th Hawaii International Conference on 
System Sciences. URL: https://arxiv.org/pdf/2012.01007.pdf 

 
Ghai S. (2021). It’s time to reimagine sample diversity and retire the WEIRD dichotomy. Nature 

Human Behaviour, 5(8), 971–972. https://doi.org/10.1038/s41562-021-01175-9 
 
Hald, K., Weitz, K., André, E., & Rehm, M. (2021). “An Error Occurred!” - Trust 

repair with virtual robot using levels of mistake explanation. Proceedings of the 9th 
International Conference on Human-Agent Interaction (HAI '21), 218–226. 
https://doi.org/10.1145/3472307.3484170 

 
Hall, E. T., & Hall, M. R. (1990). Understanding Cultural Differences. Yarmouth, ME:  

Intercultural Press Inc. 
 
Hampton, R.S. & Varnum, M.E.W. (2020). Individualism-Collectivism. In: Zeigler-Hill, V., 

Shackelford, T.K. (eds) Encyclopedia of Personality and Individual Differences. Springer, 
Cham.  

 
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral 

and Brain Sciences, 33, 61–135. 
 
Hofstede, G. (1980). Culture and organizations. International Studies of Management & 

Organization, 10, 4, 15-41, DOI: 10.1080/00208825.1980.11656300 
 
Howell, S. (1981). Rules not words. In P. Heelas and A. Lock (eds.), Indigenous Psychologies, 

London: Academic Press.  
 




Islam, M. R., Ahmed, M. U., Barua, S., & Begum, S. (2022). A systematic review of Explainable 
Artificial Intelligence in terms of different application domains and tasks. Applied 
Sciences, 12(3), 1353. URL: http://dx.doi.org/10.3390/app12031353 

 
Keith, M., & Harms, P. (2016). Is Mechanical Turk the answer to our sampling woes? Industrial 

and Organizational Psychology, 9(1), 162-167.  
 
Klein, G., Rasmussen, L., Lin, M. H., Hoffman, R. R., & Case, J. (2014). Influencing preferences 

for different types of causal explanation of complex events. Human Factors, 56(8), 1380–
1400.  

 
Klein R. A., Vianello M., Hasselman F., Adams B. G., Adams R. B.Jr., Alper S., . . . Nosek B. A. 

(2018). Many Labs 2: Investigating variation in replicability across samples and 
settings. Advances in Methods and Practices in Psychological Science, 1, 443–490. 

 
Laato, S., Tiainen, M., Islam, A. K. M. N., & Mäntymäki, M. (2022). How to explain AI systems 

to end users: A systematic literature review and research agenda. Internet Research. 
https://doi.org/10.1108/INTR-08-2021-0600  

 
Lavelle, J. (2021). The impact of culture on mindreading. Synthese, 198, 10.1007/s11229-019-

02466-5. 
 
Lebra, T.S. (1993). Culture, self, and communication in Japan and the United States. In W. 

Gudykunst (Ed.), Communication in Japan and the United States (pp. 51–87). Albany, NY: 
State University of New York Press. 

 
Lee, F., Hallahan, M., & Herzog, T. (1996). Explaining real-life events: How culture and domain 

shape attributions. Personality and Social Psychology Bulletin, 22, 732–741. 
 
Lee, H.R. & Sabanović, S. (2014). Culturally variable preferences for robot design and use in 

South Korea, Turkey, and the United States. Proceedings of the 2014 ACM/IEEE International 
Conference on Human-Robot Interaction (HRI '14), 17–24. 
https://doi.org/10.1145/2559636.2559676 

 
Lillard A. (1998). Ethnopsychologies: Cultural variations in theories of mind. Psychological 

Bulletin, 123(1), 3–32.  
 
Linxen, S., Sturm, C., Brühlmann, F., Cassau, V., Opwis, K., & Reinecke, K. (2021). How WEIRD 

is CHI? Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 
143, 1–14.  

 
Liu, N., Kirshner, S., & Lim, E. (2023). Is algorithm aversion WEIRD? A cross-country 
comparison of individual-differences and algorithm aversion. Journal of Retailing and 
Consumer Services, 72, 103259. https://doi.org/10.1016/j.jretconser.2023.103259 

 
Masuda, T., & Nisbett, R. E. (2001). Attending holistically versus analytically: Comparing the 

context sensitivity of Japanese and Americans. Journal of Personality and Social Psychology, 
81(5), 922–934.  

 
Matsumoto, D. (1996). Culture and Psychology. Pacific Grove, CA: Brooks/Cole. 
 




 

McSweeney, B. (2002). Hofstede’s model of national cultural differences and their consequences: 
A triumph of faith - a failure of analysis. Human Relations, 55(1), 89-
118. https://doi.org/10.1177/0018726702551004 

 
Miller, J. (1984). Culture and the development of everyday social explanation. Journal of 

Personality and Social Psychology, 46(5), 961–978. 
 
Miller, T. (2019). Explanation in artificial intelligence: Insights from the social sciences. 

Artificial Intelligence, 267, 1–38.  
 
Min, C., Zhao, Y., Bu, Y., Ding, Y., & Wagner, C.S. (2023). Has China caught up to the US in 

AI research? An exploration of mimetic isomorphism as a model for late industrializers. arXiv, 
abs/2307.10198. 

 
Mueller, S.T., Hoffman, R.R., Clancey, W. J., Emrey, A., & Klein, G. (2019). Explanation in 

human-AI systems: A literature meta-review, synopsis of key ideas and publications, and 
bibliography for explainable AI. arXiv preprint arXiv:1902.01876 

 
Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G., & PRISMA Group (2009). Preferred 

reporting items for systematic reviews and meta-analyses: The PRISMA statement. PLoS 
Medicine, 6(7), e1000097. https://doi.org/10.1371/journal.pmed.1000097 

 
Morris, M.W., & Peng, K. (1994). Culture and cause: American and Chinese attributions for 

social and physical events. Journal of Personality and Social Psychology, 67, 949–971. 
 
Nisbett, R.E., Peng, K., Choi, I., & Norenzayan, A. (2001). Culture and systems of thought: 

Holistic versus analytic cognition. Psychological Review, 108(2), 291–310. 
 
Norenzayan, A., Smith, E.E., Kim, B.J. & Nisbett, R.E. (2002). Cultural preferences for formal 

versus intuitive reasoning. Cognitive Science, 26, 653-684. 
 
Okolo, C., Dell, N., & Vashistha, A. (2022). Making AI Explainable in the Global South: A 

Systematic Review. Proceedings of the 5th ACM SIGCAS/SIGCHI Conference on Computing 
and Sustainable Societies (COMPASS '22), 439–452. https://doi.org/10.1145/3530190.3534802 

 
Ooge, J., Kato, S. & Verbert, K. (2022). Explaining recommendations in E-Learning: Effects on 

adolescents' trust. 27th International Conference on Intelligent User Interfaces (IUI '22), 93–
105.  

 
Otterbring, T., Bhatnagar, R., & Folwarczny, M. (2022). Selecting the special or choosing the 

common? A high-powered conceptual replication of Kim and Markus' (1999) pen study. The 
Journal of Social Psychology, 1–7. Advance online publication. 
https://doi.org/10.1080/00224545.2022.2036670 

 
Oyserman, D., Coon, H. M. & Kemmelmeier, M. (2002). Rethinking individualism and 

collectivism: Evaluation of theoretical assumptions and meta-analyses. Psychological Bulletin, 
128(1), 3–72. 

 
Peer, E., Brandimarte, L., Samat, S., & Acquisti, A. (2017). Beyond the Turk: Alternative 

platforms for crowdsourcing behavioral research. Journal of Experimental Social Psychology, 
70, 153-163.  




 
Pelham, B., Hardin, C., Murray, D., Shimizu, M., & Vandello, J. (2022). A truly global, non-

WEIRD examination of collectivism: The Global Collectivism Index (GCI). Current Research 
in Ecological and Social Psychology, 3, https://doi.org/10.1016/j.cresp.2021.100030. 

 
Peters, U., Krauss, A. & Braganza, O. (2022). Generalization bias in science. Cognitive Science, 

46: e13188, https://doi.org/10.1111/cogs.13188 
 
Peters, U. & Lemeire, O. (2023). Hasty generalizations are pervasive in experimental philosophy: 

A systematic analysis. Philosophy of Science. 1-29. 10.1017/psa.2023.109. 
 
Peters, U. & Carman, M. (2023). Unjustified sample sizes and generalizations in Explainable AI 

research: Principles for more inclusive user studies. IEEE Intelligent Systems, 38(6), 52–60. 
URL: https://arxiv.org/pdf/2305.09477.pdf 

 
Rad, M.S., Martingano, A.J., & Ginges, J. (2018). Toward a psychology of homo sapiens: 

Making psychological science more representative of the human population. Proceedings of 
the National Academy of Sciences of the United States of America, 115(45), 11401–11405. 

 
Ramírez-Castañeda, V. (2020). Disadvantages in preparing and publishing scientific papers 

caused by the dominance of the English language in science: The case of Colombian 
researchers in biological sciences. PloS One, 15(9), e0238372. 
https://doi.org/10.1371/journal.pone.0238372 

 
Rau, P., Li, Y., & Li, D. (2009). Effects of communication style and culture on ability to accept 

recommendations from robots. Computers in Human Behavior, 25(2), 587–595. 
 
Reinecke, K. & Gajos, K.Z. (2014). Quantifying visual preferences around the world. 

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '14, 
11-20. 

 
Reinecke, K., & Gajos, K. Z. (2015). LabintheWild: Conducting large-scale online experiments 

with uncompensated samples. Proceedings of the 18th ACM Conference on Computer 
Supported Cooperative Work & Social Computing, 1364–1378.  

 
Robbins, J., & Rumsey, A. (2008). Introduction: Cultural and linguistic anthropology and the 

opacity of other minds. Anthropological Quarterly, 81(2), 407–420. 
 
Rong, Y., Leemann, T., Nguyen, T.-T., Fiedler, L., Seidel, T., Kasneci, G., & Kasneci, E. (2022). 
Towards human-centered Explainable AI: User studies for model explanations. arXiv, 
arXiv:2210.11584 

 
Sanchez-Burks, J., Lee, F., Choi, I., Nisbett, R., Zhao, S., & Koo, J. (2003). Conversing across 

cultures: East-West communication styles in work and nonwork contexts. Journal of 
Personality and Social Psychology, 85(2), 363–372.   

 
Sawaya, Y., Sharif, M., Christin, N., Kubota, A., Nakarai, A., & Yamada, A. (2017). Self-

confidence trumps knowledge: A cross-cultural study of security behavior. Proceedings of the 
2017 CHI Conference on Human Factors in Computing Systems, 2202–2214.  

 




 

Schmidt, P., Biessmann, F. & Teubner, T. (2020) Transparency and trust in Artificial Intelligence 
systems. Journal of Decision Systems, 29, 4, 260-278, DOI: 10.1080/12460125.2020.1819094 

 
Schneider, J., & Vlachos, M. (2023). Explaining classifiers by constructing familiar 

concepts. Machine Learning 112, 4167–4200. https://doi.org/10.1007/s10994-022-06157-0 
 
Seaborn, K., Barbareschi, G., & Chandra, S. (2023). Not only WEIRD but “uncanny”? A 

systematic review of diversity in Human–Robot Interaction research. International Journal of 
Social Robotics, 1 - 30. https://doi.org/10.1007/s12369-023-00968-4 

 
Simons, D.J., Shoda, Y., & Lindsay, D.S. (2017). Constraints on generality (COG): A proposed 

addition to all empirical papers. Perspectives on Psychological Science, 12(6), 1123–1128. 
 
Sperrle, F., El-Assady, M., Guo, G., Chau, D., Endert, A., & Keim, D.A. (2020). Should we trust 

(X)AI? Design dimensions for structured experimental evaluations. arXiv, abs/2009.06433. 
 
Taras, V., Rowney, J. & Steel, P. (2009). Half a century of measuring culture: Review of 

approaches, challenges, and limitations based on the analysis of 121 instruments for 
quantifying culture. Journal of International Management, 15, 357–373. 
https://doi.org/10.1016/j.intman.2008.08.005 

 
Taras, V., Steel, P. & Kirkman, B.L. (2016). Does country equate with culture? Beyond 

geography in the search for cultural boundaries. Management International Review, 56, 455–
487.  

 
Taylor, J., & Taylor, G.W. (2021). Artificial cognition: How experimental psychology can help 

generate explainable artificial intelligence. Psychonomic Bulletin & Review, 28(2), 454–475.  
 
Triandis, H.C. (1995). Individualism and Collectivism. Boulder, CO: Westview Press.  
 
Ur, B. & Wang, Y. (2013). A cross-cultural framework for protecting user privacy in online 

social media. Proceedings of the 22nd International Conference on World Wide Web, 755-762. 
10.1145/2487788.2488037. 

 
Van Brummelen, J. Kelleher, M., Tian, M.C., & Nguyen. N.H. (2022). What do WEIRD and non-

WEIRD conversational agent users want and perceive? Towards transparent, trustworthy, 
democratized agents, arXiv, arXiv:2209.07862 

 
Varnum, M. E., Grossmann, I., Kitayama, S., & Nisbett, R.E. (2010). The origin of cultural 

differences in cognition: Evidence for the social orientation hypothesis. Current Directions in 
Psychological Science, 19(1), 9–13.  

 
Waa, J.V., Diggelen, J.V., Bosch, K.V., & Neerincx, M.A. (2018). Contrastive explanations for 

reinforcement learning in terms of expected consequences. arXiv, abs/1807.08706. 
 
Waa, J.v.d, Schoonderwoerd, T., van Diggelen, J. & Neerincx, M. (2020). Interpretable 

confidence measures for decision support systems. International Journal of Human–Computer 
Studies, 144, doi: 10.1016/j.ijhcs.2020.102493. 

 




Wang, L., Rau, P.P., Evers, V., Robinson, B.K. & Hinds, P. (2010). When in Rome: The role of 
culture and context in adherence to robot recommendations. Proceedings of the 5th ACM/IEEE 
International Conference on Human-Robot Interaction (HRI '10), 359–366. 

 
Wang, X., & Yin, M. (2021). Are explanations helpful? A comparative study of the effects 

of explanations in AI-assisted decision-making. 26th International Conference on Intelligent 
User Interfaces (IUI '21), 318–328. https://doi.org/10.1145/3397481.3450650 

 
Wang, G., Qi, P., Yanfeng, Z., & Mingyang, Z. (2021). What have we learned from 

OpenReview? arXiv, https://doi.org/10.48550/arxiv.2103.05885 
 
Warren, G., Keane, M.T., & Byrne, R.M. (2022). Features of explainability: How users 

understand counterfactual and causal explanations for categorical and continuous features in 
XAI. arXiv, abs/2204.10152. 

 
Wierzbicka, A. (1992). Semantics, Culture, and Cognition. Oxford: Oxford University Press. 
 
Wurtz, E. (2005), Intercultural communication on web sites: A cross-cultural analysis of web 

sites from high-context cultures and low-context cultures. Journal of Computer-Mediated 
Communication, 11, 274-299. 

   
Yang, F., Huang, Z., Scholtz, J. & Arendt, D.L. (2020). How do visual explanations foster end 

users' appropriate trust in machine learning? Proceedings of the 25th International Conference 
on Intelligent User Interfaces (IUI '20), 189–201.  

 
Yilmaz, O., & Alper, S. (2019). The link between intuitive thinking and social conservatism is 

stronger in WEIRD societies. Judgment and Decision Making, 14(2), 156–
169. https://doi.org/10.1017/S1930297500003399 

 
Zerilli, J., Knott, A., Maclaurin, J., & Gavaghan, C. (2019). Transparency in algorithmic and 

human decision-making: Is there a double standard? Philosophy & Technology, 661–683, 
https://doi.org/10.1007/s13347-018-0330-6 

 
Zerilli, J. (2022). Explaining Machine Learning Decisions. Philosophy of Science, 89, 1–19. 
 
Zerilli, J., Bhatt, U., & Weller, A. (2022). How transparency modulates trust in artificial 

intelligence. Patterns, 3(4), 100455. https://doi.org/10.1016/j.patter.2022.100455 
 
Zhang, Q., Lee, M.L., & Carter, S. (2022). You complete me: Human-AI teams and 

complementary expertise. Proceedings of the 2022 CHI Conference on Human Factors in 
Computing Systems, 114, 1–28.