Running head: TAP VALIDATION
A Preregistered Validation Study of the Taylor Aggression Paradigm
David S. Chester
Department of Psychology, Virginia Commonwealth University, USA
Abstract Word Count: 149
Introduction/Discussion/Acknowledgments/Footnote Word Count: 1,981
References: 35
Correspondence should be addressed to:
David S. Chester
302 Thurston House
Virginia Commonwealth University
Richmond, VA, 23284, USA
dschester@vcu.edu
1-804-828-7624
Abstract
The Taylor Aggression Paradigm (TAP) is a frequently used laboratory measure
of aggressive behavior. However, the flexibility inherent in its implementation and
analysis can undermine its validity. To test whether the TAP was a valid aggression
measure irrespective of this flexibility, I conducted a preregistered study of a 25-trial
version of the TAP using a single scoring approach with 160 diverse undergraduate
participants. TAP scores showed agreement with other laboratory aggression measures
and were magnified by an experimental provocation manipulation. Mixed evidence was
found for associations with aggressive dispositions and real-world violence. These
results provide preliminary support for this approach to the TAP to measure state-level
aggressive behavior. However, more evidence is needed to assess the TAP’s external
validity and ability to measure dispositional forms of aggression. Using preregistered
designs, researchers should validate specific variants of their behavioral tasks in order
to optimize the veridicality and reproducibility of psychological science.
Keywords: aggression, Taylor Aggression Paradigm, preregistration, validation,
Competitive Reaction-Time Task
Introduction
The study of aggressive behavior has been a challenging endeavor. Observing
violence in “the wild” has its own risks. Yet, when researchers sought to measure
aggressive behavior in the laboratory, they had to grapple with substantial ethical,
logistical, and theoretical hurdles. The Taylor Aggression Paradigm (hereafter TAP;
Taylor, 1967) arose from these early attempts at operationalization. Now in its 50th year,
the task has made great contributions to psychological science but has also received
substantive criticism. In what follows, I briefly review the history of the TAP and some
modern critiques that highlight problems with the task’s flexibility. I then detail a study
that was conducted to validate a single variant of the task, in light of these flexibility
issues.
The Taylor Aggression Paradigm: A Brief History and Overview
As stated by Taylor himself, “In order to investigate aggression in the laboratory it
is necessary to have an effective method for inciting aggression and for objectively
measuring the aggressive responses that follow” (p. 297, Epstein & Taylor, 1967;
Taylor, 1967). To extend laboratory aggression measurement beyond hypotheticals,
self-reports, and projective tests, Taylor sought to simulate a provocative social experience
in the lab and objectively measure ‘real’ aggressive responses. In the original TAP,
participants selected the intensity of an electric shock to administer to an opponent, who
likewise did the same for them. Participants then repeatedly competed against their
opponent to flip a switch faster in response to a cue. If they lost the competition,
participants received a shock at the intensity of their opponent’s choosing. Participants
could see the level of the shock that their opponent selected for them, which allowed the
experimenters to manipulate the level of provocation experienced by participants.
In modern times, the TAP is most frequently implemented as a computer
program that can administer noise blasts through headphones instead of shocks
delivered through electrodes (Bond & Lader, 1986; Bushman, 1995). In this new
approach, participants can set the duration and volume of these noise blasts and these
settings serve as the laboratory operationalization of aggression. The TAP often
includes multiple trials in order to provide a more reliable estimate of aggressive
behavior. As in the original task, the computerized TAP typically models participants’
opponents as provocateurs who begin the task by selecting the loudest and longest
noise blasts possible.
The TAP has been used widely with 382 scholarly papers citing the task in 2016
alone (citation estimates from Google Scholar). This task has even been modified for
the brain-imaging environment, allowing for the investigation of neural correlates of
aggressive behavior (e.g., Krämer, Jansma, Tempelmann, & Münte, 2007). However,
such popularity has not been reached without criticism.
Challenges and Critiques
Laboratory measures of aggression, such as the TAP, have been categorically
criticized for reasons including a lack of ecological validity, an exaggerated focus on
retaliatory rather than unprovoked aggression, sanctioning of aggression by an authority
figure, and a lack of alternative options other than aggressive retaliation (Tedeschi &
Quigley, 1996, 2000). The TAP has been specifically criticized for a lack of construct
validity (Ferguson, Smith, Miller-Stratton, Fritz, & Heinrich, 2008). As a rebuttal, the TAP
exhibits convergent validity with self-report measures of aggression (Giancola & Parrott,
2008; Giancola & Zeichner, 1995) and with real-life acts of physical violence (e.g.,
Chester & DeWall, 2016), shows discriminant validity with traits unassociated with
physical aggression (Giancola & Zeichner, 1995), and shows external validity in that effect
sizes captured by the TAP correspond to those obtained by field studies (Anderson &
Bushman, 1997).
More recently, the TAP has been criticized for a lack of standardization between
and within laboratories (Elson, Mohseni, Breuer, Scharkow, & Quandt, 2014). For
instance, some studies only analyze volume or duration settings (not both) or the first or
second trial (not all trials) of the task. Such flexibility can be a strength, allowing
researchers to tailor the task to a given study.
However, this flexibility can also be problematic if researchers test their
hypothesis using multiple scoring and analytic strategies and then present the result
that best fits their hypothesis as coming from a singular test (Simmons, Nelson, &
Simonsohn, 2011). Such ‘researcher degrees-of-freedom’ capitalize on chance and
increase the likelihood of type 1 errors. Indeed, as the TAP has grown in popularity, so
too have the variants in which the task is implemented and the data are analyzed (for
visualizations of this phenomenon see http://crtt.flexiblemeasures.com/).
Fortunately, there is a simple solution to ensure that a flexible measure such as
the TAP is not abused: preregistration. Through preregistration, researchers specify
their hypotheses, methods, and planned analyses before data analysis begins,
preventing any flexible and exploratory analyses from being reported as singular and
confirmatory (Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012). This
preregistration technique can then be applied to validate the TAP, independent of
flexibility.
Validation Approaches to the TAP
There are many ways to demonstrate that the TAP is a valid and useful
laboratory aggression measure. First, TAP scores should correspond to other measures
of physically aggressive behavior and tendencies (e.g., other laboratory aggression
tasks, self-report aggression questionnaires), suggesting convergent validity of the task
(Giancola & Zeichner, 1995). Second, the TAP should be responsive to factors known to
magnify aggressive behavior (e.g., provocation, being male), suggesting construct
validity of the task (Giancola & Parrott, 2008). Third, variables that are distinct from
physically harming others (e.g., verbally harming others, harming the self) should not
correspond to TAP scores, thus demonstrating the task’s discriminant validity
(Bernstein, Richardson, & Hammock, 1987). Fourth, as an index of the task’s potential
external validity, TAP scores should correspond to real-world acts of physical
aggression (Anderson & Bushman, 1997). Finally, internal consistency is a necessary
pre-requisite for the validity of a measure that is comprised of multiple responses. If a
preregistered version of the TAP is shown to exhibit these five qualities, then this task
can be considered a valid aggression measurement, beyond concerns regarding the
task’s flexibility.
Present Research
In a preliminary attempt to demonstrate the TAP’s validity outside the context of
methodological and analytic flexibility, I conducted a preregistered study. In this study, a
25-trial version of the TAP was administered alongside variables that would provide
support for the forms of validity detailed above. The preregistration plan for this study is
available online (https://osf.io/x7rjb/register/565fb3678c5e4a66b5582f67), as are all
data, analysis code, and materials (https://osf.io/a2wft/files/).
Methods
Participants
A power analysis using G*Power 3.1 for an effect size of r = .25 (an estimate
based on mean TAP score correlations with trait physical aggression from Webster et
al., 2014), an alpha level of .05, and 90% power returned a planned sample size of 160.
This sample size was large enough to provide 90% power for a d = .47 effect size of
gender and the experimental provocation on TAP scores.
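
As a rough illustration of how a sample size for detecting a correlation can be planned, the sketch below uses the Fisher z normal approximation for a two-tailed test. This is a textbook approximation and not the G*Power routine used for this study, so it is not expected to reproduce the planned sample size of 160 exactly; with these inputs it lands slightly above that figure. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate N needed to detect a correlation r (two-tailed test).

    Uses the Fisher z approximation: N = ((z_alpha + z_power) / atanh(r))^2 + 3.
    This is an assumption-laden shortcut, not G*Power's algorithm.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    fisher_z = math.atanh(r)                       # Fisher z-transform of r
    return math.ceil(((z_alpha + z_power) / fisher_z) ** 2 + 3)


# Inputs taken from the text: r = .25, alpha = .05, 90% power.
planned_n = n_for_correlation(0.25)
print(planned_n)
```

Larger expected correlations require fewer participants under this approximation, which is why the choice of the r = .25 estimate drives the planned sample size.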
Based on this power analysis, I sought to enroll 160 participants (80 females, 80
males), following the stop rule that participants had to complete all preregistered
measures of the study to be included in the final sample. As such, I recruited 174
participants, 14 of whom had some form of missing data and were excluded from
subsequent analyses (as specified in my preregistration plan).
Final participants included 160 undergraduates, who were not gender-balanced
(65.6% female, 34.4% male; Age: M = 20.09, SD = 5.03, range: 18-55). Participants
were recruited from the Virginia Commonwealth University’s introductory psychology
participant pool. Participants’ racial composition was 47.1% White, 18.8%
African-American, 17.5% Asian-American, and 15.6% Other. The sample was 13.1% Hispanic
and 86.9% Non-Hispanic. Participants were compensated with credit towards their
introductory psychology course’s research participation assignment.
Preregistered Measures
Measures included in the study’s preregistration are summarized below in the
temporal sequence in which participants completed them.
Taylor Aggression Paradigm. This study’s explicit purpose was to validate the
25-trial version of the TAP (Anderson & Bushman, 1997; Giancola & Chermack, 1998;
Taylor, 1967). For each of the 25 trials of the task, participants began by setting the
volume (60-105 decibels, in 5 decibel increments) and duration (0 – 5 seconds, in 0.5
second increments) of the noise blasts. Noise blast volumes were calibrated with a
decibel reader to assure fidelity to the displayed decibel level. Participants could also
set the volume or duration to 0 to prevent any noise blast from being administered. After
participants entered their noise blast settings, they competed against their opponent
(i.e., a same-sex VCU undergraduate) by clicking a colored box as soon as it went from
green to yellow to red. If participants lost a given trial (which they did approximately
50% of the time), they were blasted with noise at the volume and duration that their
opponent ostensibly determined ahead of time. Wins and losses were randomized
across trials (yet this random order was held constant across participants). All
participants lost the first trial and won the second trial, both of which included volume
and duration settings from the opponent that were at or near maximum. This was done
to ensure that participants were substantially provoked at the outset of the task, as is
commonly done. To ensure the task’s realism, participants lost every trial that they did
not respond to within several seconds. Further, trials would not advance if participants
simply clicked the mouse repeatedly on the colored square.
Hot Sauce Aggression Task. Another often-used laboratory measure of
aggression is the Hot Sauce Aggression Task (Lieberman, Solomon, Greenberg, &
McGregor, 1999). In this task, participants were given several crackers that they could
use to sample a very spicy hot sauce for themselves. After participants administered
their own hot sauce, but before they actually ate any of it, they were asked to assist the
experimenters by measuring out a sample of the same hot sauce for their essay
evaluation partner. The weights of participants’ hot sauce allocations to themselves and
to their partner were measured in grams and the weight of the hot sauce allocation to
their partner served as the dependent measure of aggression.
Voodoo Doll Aggression Task. A relatively novel task to measure aggressive
behavior in the laboratory is the Voodoo Doll Aggression Task (DeWall et al., 2013). This
task takes advantage of the innate human tendency to imbue certain objects with
symbolic properties (Rozin, Millman, & Nemeroff, 1986). In this task, participants were
instructed to view a computer image of a plush human doll as a symbolic representation
of their actual essay evaluator. Participants viewed what the doll would look like with an
array of sharp pins stuck in it. Then, participants typed the number of virtual, sharp pins
that they wished to stick into the doll (from 0 to 51 pins). This task may seem playful, but
there is substantial evidence that this task captures actual aggressive tendencies
(Chester & DeWall, 2017; DeWall et al., 2013).
Brief Aggression Questionnaire. The 12-item BAQ is a subset of items from
the 29-item Buss-Perry Aggression Questionnaire (Webster et al., 2014). The BAQ
possesses a four factor structure with 3-item subscales measuring each construct:
anger (sample item: “sometimes I fly off the handle for no good reason”), hostility
(sample item: “when people are especially nice, I wonder what they want”), physical
aggression (sample item: “given enough provocation, I may hit another person”), and
verbal aggression (sample item: “when people annoy me, I may tell them what I think of
them”). Participants completed the full, 29-item Buss-Perry Aggression Questionnaire,
and therefore the BAQ, by rating their agreement with each statement along a 1
(strongly disagree) to 7 (strongly agree) response scale.
Aggressive Motives Scale. The 6-item AMS is a retrospective, self-report
measure of the motives underlying participants’ aggressive behavior on the TAP
(Anderson & Murphy, 2003). Two items assess instrumental motives (sample item: “I
wanted to impair my opponent’s performance in order to win more”) and four items
assess revengeful aggression motives (sample item: “I wanted to pay back my
opponent for the noise levels he/she set”). Participants rated their agreement with each
statement along a 1 (strongly disagree) to 7 (strongly agree) response scale. This scale
allowed for a test that could disentangle whether participants acted aggressively on the
TAP out of a simply competitive or truly aggressive motivation.
Non-Suicidal Self-Injury Assessment Tool. The 6-item NSSIAT was
constructed to assess state-level desires to engage in self-harm behaviors (Chester,
Whitt, Davis, & DeWall, 2017). Representative items included “Right now, how much do
you want to hurt yourself on purpose?” and “If there was a sharp object lying on the
table how likely would you be to use to hurt yourself?” Participants rated each item
along a 1 (not at all) to 9 (extremely) response scale.
History of Physical Fights Scale. This open-ended measure simply asked
participants how many physical fights they had been in, in the past 5 years and the past
year. This measure allowed for the assessment of real-world acts of aggressive
behavior.
Provocation Manipulation Check. The 7-item PMC (Denson, von Hippel,
Kemp, & Teo, 2010) tested the efficacy of the essay feedback provocation manipulation
used in this study. In this measure, participants reported the extent to which they felt
“provoked”, “insulted”, and other forms of aversive responses to the essay feedback
they received, along a 1 (strongly disagree) to 7 (strongly agree) scale.
Exploratory Measures
Self-report measures that were not included in the study’s preregistration are
summarized below in the temporal sequence in which participants completed them.
These three measures were included in order to assess the TAP’s association with
other forms of antisocial (e.g., rule-breaking) and aggressive (e.g., displaced, proactive)
behavior.
Subtypes of Antisocial Behavior Questionnaire. Aggression in both physical
and verbal forms falls under the larger umbrella of antisocial behaviors, which also
include deviant rule-breaking tendencies (Burt & Donnellan, 2010). The 32-item STAB
assesses these three subtypes of antisocial behavioral tendencies, by asking
participants the frequency (1 - never to 5 - nearly all the time) with which they perform
each behavioral category.
Reactive/Proactive Aggression Questionnaire. The 23-item RPAQ assesses
the dispositional tendency to act aggressively out of impulsive and rash reactance to
provocation (Reactive) or out of an instrumental and calculated motivation (Proactive;
Raine et al., 2006). Participants responded to Reactive items such as “how often have
you reacted angrily when provoked by others?” and Proactive items such as “how often
have you used force to obtain money or things from others?” along a 0 (never) to 2
(often) response scale.
Displaced Aggression Questionnaire. Not all aggression is directed towards
the provocateur. Indeed, displaced aggression takes the form of harmful acts directed at
innocent third-parties. This tendency to take out one’s aggression on bystanders can be
quantified via the DAQ (Denson, Pedersen, & Miller, 2006). In this measure, participants
responded to 10 items such as “I take my anger out on innocent others” along a 1
(extremely uncharacteristic of me) to 7 (extremely characteristic of me) response scale.
Procedure
Informed consent and introduction. Participants arrived individually to my
laboratory’s waiting room at Virginia Commonwealth University. A trained experimenter
greeted them and escorted them back into our testing room, where they were seated at
a computer desk. Participants read and signed an informed consent form. Afterwards,
the experimenter verbally screened each participant for any medical issues that might
lead to sensitive hearing and therefore put them at risk of hearing damage from the TAP.
Participants then listened to a verbal introduction to and overview of the study by the
experimenter.
Provocation manipulation. In order to experimentally induce interpersonal
provocation, this study used a validated essay evaluation paradigm in which participants
received harsh or positive feedback on an essay. This paradigm has been used
effectively in previous research to provoke individuals into experiences of anger and
aggressiveness (Bushman & Baumeister, 1998; Chester & DeWall, 2017; Chester,
Merwin, & DeWall, 2015). To implement this paradigm, participants were all instructed to
write a short essay about “an important moment in your life that had a great impact on
who you are today.” Participants were then informed that they would be exchanging
their essay with another Virginia Commonwealth University student, who was ostensibly
down the adjacent hallway. The purported purpose of the essay task was to provide the
experimenters with an estimate of participants’ writing and critiquing skills.
Participants wrote their essay for five minutes and then placed it in a folder,
which was taken away by the experimenter. The experimenter promptly returned with a
folder containing their partner’s essay, which possessed identical content across
participants, and had actually been pre-written by one of the experimenters. Essays
were written in sex-stereotypic handwriting that matched the perceived sex of the
participant. This sex-matched handwriting procedure was used to subtly indicate to
participants that their essay partner was same-sex, as aggressive responses are
profoundly altered by the sex of both the perpetrator and victim (Smuts, 1992).
Participants then spent 3 minutes reading and evaluating their partner’s essay along
various criteria, providing five numeric scores that ranged from 1 (poor) to 7 (excellent),
summed to a total score of 5-35 points. The essay evaluation also included a space to
write comments. The experimenter then collected the essay and evaluation form from
participants and left the room. The experimenter promptly returned with the participant’s
essay and an essay evaluation form from their fictitious partner that had been
pre-written in the same handwriting as the fictitious essay. The essay evaluation contained
either negative (8/35 points, “One of the WORST essays I’ve EVER read!”) or positive
(33/35 points, “Great essay!”) feedback, as determined by a randomized list of condition
assignments. After participants reviewed the essay feedback for one minute, the experimenter
collected the essay and evaluation form from the participant.
Laboratory aggression tasks. The experimenter then directed participants to
the computer to complete the TAP, describing it as a measure of reaction time. The
experimenter explained the task to the participant, gave them a sample of a moderately
loud noise blast, and then left to check on the fictitious partner’s internet connection to
the participant. The experimenter returned quickly and instructed the participant to begin
the task. While the participant completed the TAP, the experimenter weighed out two
plates with crackers on them.
After the participant completed the TAP, the experimenter returned with a plate
of crackers and a bottle of PAIN 100%© hot sauce. Participants were informed that they
would now complete a measure of pain regulation, in which they would apply some of
the hot sauce to the crackers and then eat them. After the participant added their
desired level of hot sauce, the experimenter returned and informed them that they
needed to leave to help another experimenter. The experimenter placed a second plate
of crackers in front of the participant and asked them if they would help by applying hot
sauce to these crackers, which their essay evaluation partner would have to eat. After a
60-second absence, the experimenter took both plates and weighed them away from
the participant.
The experimenter then returned and opened up a computerized version of the
Voodoo Doll Aggression Task. This task was framed as a mental visualization exercise.
The experimenter left the room while the participant completed the task. After
completing the Voodoo Doll Aggression Task, participants completed a battery of
personality questionnaires including the items detailed in the above ‘Measures’ section.
Upon completion of these questionnaires, the study was concluded.
Suspicion probe and debriefing. The experimenter then sat across from the
participant and conducted a funneling suspicion interview intended to gauge whether
participants guessed any of the deceptive elements of the study. Participants were then
debriefed as to the deception in the study and the study’s true purpose, were provided
with counseling resources in case of lingering psychological distress, and then were
escorted from the laboratory, with thanks. The study’s procedures lasted no longer than
one hour.
Data Analysis Plan
In alignment with my preregistration plan, I scored the TAP by calculating the
mean of all 50 individual measurements of the task (2 settings per trial x 25 trials).
Group comparisons on this measure were conducted using parametric, independent
samples t-tests. Associations between study variables and this TAP score were
estimated with either bivariate correlation analyses, for variables with parametric
distributions, or generalized linear modeling (specifying a log-linear link and Poisson
distribution) for zero-inflated, positively skewed count variables.
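
The scoring rule described above (one mean across all 50 settings) can be sketched as follows. This is a minimal illustration, assuming each volume and duration setting is recorded on a common 0 to 10 level scale, which is consistent with the reported score range of 0.00 to 10.00 but is my assumption about the coding. The function name and the example participant are hypothetical.

```python
from statistics import mean


def tap_score(volumes: list[float], durations: list[float]) -> float:
    """Mean of all volume and duration settings across trials.

    Assumes both settings are coded on the same 0-10 level scale
    (an assumption for illustration, not confirmed by the text).
    """
    if len(volumes) != len(durations):
        raise ValueError("Each trial needs one volume and one duration setting.")
    return mean(list(volumes) + list(durations))


# Hypothetical participant: 25 trials, 2 settings per trial = 50 measurements.
volumes = [8, 6, 5] + [4] * 22
durations = [7, 5, 5] + [3] * 22
score = tap_score(volumes, durations)
print(score)  # 3.8
```

Because every measurement contributes equally, a single preregistered score of this kind removes the choice between volume-only, duration-only, or first-trial-only scoring.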
Bias-corrected and accelerated 95% confidence intervals were calculated around
effect size estimates using nonparametric bootstrapping techniques (1,000 resamples
per bootstrap analysis). Inferences were made about each effect based on whether the
test was statistically significant (i.e., p < .05 and 95% confidence interval did not include
0). Internal consistency estimates of the TAP’s 25 trials were obtained by calculating a
Cronbach’s alpha and by conducting a principal components analysis, specifying a
direct oblimin rotation in order for potential components to be correlated with one
another. All aforementioned analyses were conducted using SPSS 24.
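
To make the bootstrapping step more concrete, the sketch below computes a confidence interval around a Pearson correlation by resampling participants with replacement. It is a simplification under stated assumptions: it uses the plain percentile method, not the bias-corrected and accelerated intervals produced in SPSS for this study, and it runs on simulated data rather than the study's data. All function names are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percentile_bootstrap_ci(x, y, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for r (NOT the BCa method used in the study)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample participants
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # skip degenerate resamples with zero variance
        estimates.append(pearson_r(xs, ys))
    estimates.sort()
    lo_idx = int((1 - level) / 2 * len(estimates))
    hi_idx = int((1 + level) / 2 * len(estimates)) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Simulated data for illustration only (160 cases, positively related variables).
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(160)]
y = [0.4 * a + rng.gauss(0, 1) for a in x]
low, high = percentile_bootstrap_ci(x, y)
print(round(pearson_r(x, y), 2), round(low, 2), round(high, 2))
```

Under the inference rule stated above, an effect would be treated as reliable only if the interval excluded 0 and the accompanying test had p < .05.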
As an exploratory technique, I applied multilevel linear modeling (MLM) to TAP
scores as the data were collected across 25 time points and this technique allowed for
the estimation of slopes, instead of simple mean-level effects across all trials. These
models were fit specifying a random intercept and slope and estimating fixed effects of
both the slope and the various study variables. These multilevel models were
implemented with the PROC MIXED command in SAS 9.4.
Results
Deviations from Preregistration Plan
I was unable to achieve the 50% gender equity that I outlined in my
preregistration plan. Further, I did not enact the outlier exclusion rule that I described in
the preregistration (i.e., exclude all datapoints that were beyond 1.5 times the interquartile range from the median), as this proved to be far too conservative and would
have led to the exclusion of an unjustified and substantial portion of my sample.
Descriptive Statistics
Mean TAP scores exhibited substantial variability across participants, M = 5.06,
SD = 2.03, range: 0.00 – 10.00. TAP scores did not exhibit problematic skewness, -0.41,
SE = 0.19, or kurtosis, 0.31, SE = 0.38.
The voodoo doll pin count was zero-inflated (40.0% zero values), positively
skewed, 2.37, and leptokurtic, 5.63. Likewise, the physical fight count across the past 5
years was zero-inflated (66.3% zero values), positively skewed, 2.31, and leptokurtic,
6.36. Internal consistency was adequate for the Physical Aggression subscale of the
BAQ, α = .72, the Revengeful Harm subscale of the AMS, α = .82, the Non-Suicidal
Self-Injury Assessment Tool, α = .97, and the Provocation Manipulation Check, α = .96.
Internal consistency was inadequate for the Verbal subscale of the BAQ, α = .50 (Footnote 1).
Manipulation Check
The random assignment procedure sorted 78 participants into the negative
feedback condition of the provocation manipulation, and 82 participants into the positive
feedback condition. Participants who were assigned to the negative feedback condition
reported greater levels of provocation due to their essay feedback than participants in
the positive feedback condition, t(158) = 9.42, p < .001, d = 1.50 [1.17, 1.80].
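
As a rough arithmetic check, a standardized mean difference can be approximated from an independent-samples t statistic and the two group sizes as d ≈ t × sqrt(1/n1 + 1/n2). The sketch below applies this conversion to the values reported above. The function name is hypothetical, and the small gap from the reported d = 1.50 presumably reflects rounding of the t statistic.

```python
import math


def d_from_t(t: float, n1: int, n2: int) -> float:
    """Approximate Cohen's d from an independent-samples t and group sizes."""
    return t * math.sqrt(1 / n1 + 1 / n2)


# Reported values: t(158) = 9.42, with 78 and 82 participants per condition.
d_check = d_from_t(9.42, 78, 82)
print(round(d_check, 2))
```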
Confirmatory Tests
Construct validity - provocation manipulation effect. Participants who were
assigned to the negative feedback condition of the provocation manipulation had higher
TAP scores than participants in the positive feedback condition, t(158) = 3.24, p = .001,
d = 0.52 [0.17, 0.83] (Figure 1).
Footnote 1. The 5-item Verbal Aggression subscale of the full Buss-Perry Aggression
Questionnaire had an acceptable internal consistency, α = .70. Yet results did not
meaningfully differ if this measure was used instead of the BAQ subscale.
Figure 1. Violin plots depicting the distributions of mean TAP scores by essay
feedback condition.
Construct validity - gender differences. Contrary to my prediction, males had
lower TAP scores than females, t(158) = -1.99, p = .048, d = -0.32 [-0.60, 0.02].
Construct validity - revengeful harm motivations. Suggesting that noise blast
settings are motivated by the desire to inflict vengeful harm on participants’ targets, TAP
scores were positively associated with revengeful aggression motivations, r(158) = .39
[.26, .50], p < .001.
Convergent validity - laboratory aggression. Supporting the convergent
validity of the TAP, mean noise blasts were positively associated with the amount of hot
sauce participants gave to their partner in the taste test, r(158) = .23 [.07, .36], p = .004.
One participant’s hot sauce allocation was 8.14 SDs above the sample mean. After
removing this outlier, TAP scores remained positively associated with hot sauce
allocations, r(157) = .20 [.04, .37], p = .011 (Figure 2A). TAP scores were also
associated with the number of pins stuck in the voodoo doll that represented their
partner, B = 0.30 [0.26, 0.33], SE = 0.02, χ²(1, 158) = 316.82, p < .001 (Figure 2B).
Figure 2. Scatterplots depicting positive associations between mean TAP scores
and (A) hot sauce weights, in grams, and (B) voodoo doll pin counts, directed
towards participants’ essay evaluators.
Convergent validity - trait aggression. Counter to predictions, TAP scores
were unassociated with trait physical aggression, r(158) = .06 [-.10, .21], p = .445.
External validity. Offering mixed evidence of the TAP’s external validity, TAP
scores were unassociated with the number of physical fights participants had been in
over the past 5 years, B = -0.01 [-0.10, 0.08], SE = 0.05, χ²(1, 158) = 0.06, p = .810. Yet
TAP scores were positively associated with fights that participants had been in over the
past year, B = 0.22 [0.02, 0.42], SE = 0.10, χ²(1, 158) = 4.42, p = .036.
Discriminant validity. Again there was mixed evidence for the TAP’s
discriminant validity, such that TAP scores were unassociated with trait verbal
aggression, r(158) = .09 [-.09, .25], p = .252, yet unexpectedly were positively
associated with current self-harm tendencies, r(158) = .16 [.03, .29], p = .038.
Internal consistency. The 50 individual measurements of the TAP (25 trials x 2
settings per trial) exhibited excellent internal consistency, α = .98. The average of
duration and volume settings from each of the 25 trials exhibited a single-component
structure, as evidenced by an initial component with an eigenvalue of 13.72, which
explained 55.01% of the variance in TAP responses. Each of the 25 trials substantially
loaded onto this component, loadings > .61. Two other components marginally passed the
1.00 eigenvalue cutoff at 1.83 and 1.32. However, each of these smaller components only
had a single trial that loaded above a .40 loading cutoff, rendering them largely
uninterpretable.
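
For readers less familiar with the internal consistency statistic reported here, the sketch below computes Cronbach's alpha from a participants-by-items matrix using the standard formula α = (k / (k − 1)) × (1 − Σ item variances / variance of total scores). The toy data are invented for illustration and are not the study's data; the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha for a matrix with one row per participant
    and one column per item (e.g., one column per TAP measurement)."""
    k = len(rows[0])
    # Sample variance of each item across participants.
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of each participant's summed score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy example: 4 hypothetical participants responding to 3 items.
toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [5, 5, 6]]
alpha = cronbach_alpha(toy)
print(round(alpha, 2))
```

Alpha approaches 1 when participants who set high levels on one measurement also set high levels on the others, which is the pattern the 50 TAP measurements showed.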
Exploratory Analyses
Specificity of hot sauce measure to aggression. Participants’ TAP scores
were unassociated with hot sauce allocations towards themselves, r(156) = .11 [-.09, .27],
p = .157. Two participants did not provide data for the self-administration of hot
sauce as they ate off their plate, against instructions, rendering the plate’s weight
uninformative. The null association supports the assertion that hot sauce allocations to essay
evaluators reflected an aggressive motivation and not a more general tendency to
allocate hot sauce indiscriminately.
Revenge versus instrumental motivations. To assess whether TAP scores
were motivated more by vengeance (i.e., wanting to inflict harm) than by instrumental
motives (i.e., wanting to win the game), I regressed TAP scores onto both Revengeful
Harm and Instrumental motives scores from the AMS using multiple linear regression.
Revengeful harm motives remained positively associated with TAP scores, B = 0.45
[0.17, 0.70], t(157) = 3.27, p = .001, whereas instrumental motives were not associated
with TAP scores, B = 0.16 [-0.09, 0.42], t(157) = 1.34, p = .181. Internal consistency was
adequate for the Instrumental Motivation subscale of the AMS, α = .78.
Associations with different types of aggressive and antisocial behavior.
Using the STAB questionnaire, TAP scores were unassociated with physical
aggressiveness, r(158) = .07 [-.08, .22], p = .367, social aggressiveness, r(158) = .10
[-.05, .25], p = .212, and rule breaking, r(158) = -.03 [-.22, .17], p = .716. TAP scores
were further unassociated with displaced aggression scores from the DAQ, r(125) = .06
[-.13, .24], p = .475, reactive aggression scores from the RPAS, r(140) = .16 [.01, .31], p
= .061, and proactive aggressiveness, r(140) = .07 [-.05, .21], p = .379. Internal
consistency was adequate for the Physical Aggression, α = .86, Social Aggression, α =
.85, and Rule-Breaking, α = .70, subscales of the STAB, as with the Displaced
Aggression subscale of the DAQ, α = .92, and the Reactive, α = .83, and Proactive, α
= .84, Aggression subscales of the RPAS.
Multilevel linear modeling. Using MLM, TAP scores exhibited substantial
within-person, B = 3.63, SE = 0.08, Z = 42.89, p < .001, and between-person, B = 4.63, SE =
0.59, Z = 7.89, p < .001, variability and tended to decrease over the course of the task,
B = -0.02 [-.03, -.001], SE = 0.01, t(159) = -2.14, p = .034.
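The relative size of the two variance estimates can be summarized as an intraclass correlation (ICC). The sketch below assumes that the two B values above are the residual (within-person) and random-intercept (between-person) variance estimates, which the text does not state explicitly; the function name is hypothetical.

```python
def intraclass_correlation(between_var, within_var):
    """Share of total variance that lies between persons."""
    return between_var / (between_var + within_var)


# Values taken from the estimates reported above, under the stated assumption.
icc = intraclass_correlation(between_var=4.63, within_var=3.63)
print(round(icc, 2))  # 0.56
```

Under that assumption, roughly 56% of the variance in trial-level TAP scores would reflect stable differences between participants, with the remainder reflecting trial-to-trial fluctuation within participants.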
MLM was then used to replicate the study’s univariate, confirmatory analyses
(summary of results in Table 1). Replicating the univariate analyses, the experimental
provocation induction increased TAP scores and TAP scores were associated with
greater hot sauce allocations, being female, and non-suicidal self-injury tendencies.
Again, TAP scores were positively associated with revengeful aggression motivations,
even when controlling for instrumental motives and TAP scores were unassociated with
physically and verbally aggressive traits. Zero-inflated measures from the study (i.e.,
voodoo doll pin counts and past fight counts) were excluded, as the predictors were too
zero-inflated to serve as independent variables in MLM.
Table 1. Multilevel associations between study variables and TAP scores across
all 25 trials.

Independent Variable                                    B [95% CI]             SE     t (df)        p
Provocation                                             1.00 [0.39, 1.61]      0.31   3.24 (158)    .001
Hot Sauce Weight                                        0.10 [0.03, 0.16]      0.03   2.96 (158)    .003
Female ( > Male)                                        0.69 [0.04, 1.34]      0.33   2.09 (158)    .036
Revengeful Harm Motives                                 0.57 [0.36, 0.78]      0.11   5.39 (158)    < .001
Revengeful Harm Motives (controlling for Instrumental)  0.45 [0.19, 0.72]      0.14   3.33 (157)    < .001
Self-Injury Tendencies                                  0.35 [0.02, 0.68]      0.17   2.08 (158)    .038
Trait Physical Aggression                               0.07 [-0.13, 0.28]     0.11   0.70 (158)    .484
Trait Verbal Aggression                                 0.16 [-0.11, 0.44]     0.14   1.16 (158)    .247

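The intervals in Table 1 are consistent with a Wald-type interval of the form B ± critical value x SE. The sketch below uses a normal critical value as a simplifying assumption (the original software may have used a t distribution with the listed df, which gives nearly the same result at df = 158); the function name is hypothetical.

```python
from statistics import NormalDist


def wald_ci(b, se, level=0.95):
    """Wald-type confidence interval: estimate +/- critical value * SE."""
    crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return b - crit * se, b + crit * se


# Provocation row of Table 1: B = 1.00, SE = 0.31.
lo, hi = wald_ci(1.00, 0.31)
print(f"[{lo:.2f}, {hi:.2f}]")  # [0.39, 1.61]
```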
Subsequent analyses tested the effect of trait physical and verbal aggression on
the slopes of TAP scores across the 25 trials, not simply on the mean of all 25 trials.
Although physically aggressive traits were unassociated with mean level TAP scores,
they were associated with more positive aggression slopes across the task, B = 0.01
[0.003, 0.02], SE = 0.01, t(158) = 2.60, p = .009 (Figure 3).
Figure 3. TAP scores (mean of duration and volume settings) across all 25 trials
of the task, by high (+1 SD) and low (-1 SD) trait physical aggressiveness (PA).
The simple slopes of this effect of trait physical aggression on TAP slopes were
then probed using an online utility (http://www.quantpsy.org/interact/hlm2.htm; Preacher,
Curran, & Bauer, 2006). At low (-1 SD) levels of trait physical aggression, TAP scores
became progressively lower across the task, B = -0.03, SE = 0.01, t(158) = -3.25, p =
.001. However, at relatively high (+1 SD) levels of trait physical aggression, TAP scores
remained constant, B = 0.00, SE = 0.01, t(158) = 0.27, p = .784. This slope modulation
was not observed for trait verbal aggression, B = 0.00 [-0.01, 0.01], SE = 0.01, t(158) =
-0.05, p = .958.
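The simple-slopes logic used above can be sketched as follows: the slope of TAP scores over trials at a given level of the moderator equals the trial slope plus the interaction coefficient multiplied by that moderator value. All numeric inputs below (coefficients, the moderator's SD, and the sampling variances and covariance) are hypothetical values chosen to mirror the reported pattern; they are not estimates from the study, and the function names are illustrative.

```python
import math


def simple_slope(b_time, b_interaction, moderator):
    """Slope of the outcome over time at a given (centered) moderator value."""
    return b_time + b_interaction * moderator


def simple_slope_se(var_time, var_interaction, cov, moderator):
    """Standard error of the simple slope from the coefficients' sampling
    variances and covariance: sqrt(var1 + 2*w*cov + w^2 * var3)."""
    return math.sqrt(var_time + 2 * moderator * cov + moderator ** 2 * var_interaction)


# Hypothetical inputs: trial slope at the moderator's mean, interaction
# coefficient, and the SD of the centered moderator.
b_time, b_interaction, sd = -0.015, 0.01, 1.5

low = simple_slope(b_time, b_interaction, -sd)   # slope at -1 SD
high = simple_slope(b_time, b_interaction, +sd)  # slope at +1 SD
se_low = simple_slope_se(var_time=4e-5, var_interaction=1.5e-5, cov=0.0, moderator=-sd)
print(round(low, 3), round(high, 3), round(low / se_low, 2))
```

With these illustrative inputs the slope is negative at low levels of the moderator and approximately zero at high levels, the same qualitative pattern described above.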
Discussion
Each year, dozens of scientific findings are published using the Taylor Aggression
Paradigm (TAP; Taylor, 1967). The TAP has facilitated substantial developments in our
understanding of aggression, such as identifying aggression as a function of threats to
vulnerable egos (Bushman & Baumeister, 1998), violent media (Bartholow, Bushman, &
Sestir, 2006), and alcohol consumption (Giancola & Zeichner, 1997). However, the
ability to implement and analyze the TAP in a nearly unlimited array of ways is both a
strength and a weakness. The TAP’s flexibility can enable researcher
degrees-of-freedom, whether conscious or incidental, to undermine the validity of the task and
introduce false positives into the literature (Elson et al., 2014). Using a preregistered
version of the TAP and a single scoring approach, I sought to remove the flexibility of
the task and test its subsequent validity. Doing so would allow aggression researchers
to retain this form of the tool as a valid laboratory measure.
Evidence for the over-arching validity of the mean score approach to the 25-trial
TAP was mixed. As predicted, these mean TAP scores were higher for experimentally
provoked participants, as compared to their non-provoked counterparts. Further, the
TAP demonstrated convergent validity with the two other laboratory aggression
measures: the Hot Sauce Aggression Task (Lieberman et al., 1999) and the Voodoo
Doll Aggression Task (DeWall et al., 2013). TAP scores corresponded to the appropriate
motivation, the desire to inflict retributive harm, and not with competitive or instrumental
motives to simply win the competition. These findings suggest that the TAP responds
appropriately to aggression-increasing situational effects and shows agreement with
similar measures. Moreover, these results support the construct validity of the TAP as
an effective measure of currently felt, state-level, and “in the moment” aggressive
tendencies.
The TAP did not exhibit the predicted associations with dispositional measures of
aggressiveness, such as gender and trait physical aggression. The disproportionate
number of females in the sample may have contributed to the finding that females had
higher TAP scores than males, though this is unclear. Exploratory multilevel analyses
suggested that the univariate approach to the TAP failed to reveal a significant
association with trait physical aggression because the effect is present at later trials and
not during the initial phase of the task. This temporal effect may have arisen from the
fact that early trials of the TAP are characterized by extreme provocation from
participants’ opponents, a situational input that may override their dispositional
tendencies towards aggressiveness. As such, the TAP may be an accurate measure of
aggressive traits when inferences are based on the slope (and not the mean) of TAP
scores. Future research, which takes a confirmatory approach to these multilevel
analyses, is needed to demonstrate whether this is the case as these MLM analyses
were purely exploratory.
TAP scores also corresponded to physical fight frequency over the past year, but
not over the past 5 years, suggesting that the TAP may have external validity, but within
a shorter time frame. The relatively young age of our sample might also have impacted
this finding as there are significant aggression-related developmental landmarks
between adolescence and emerging adulthood (Cleverley, Szatmari, Vaillancourt, Boyle,
& Lipman, 2012). The developmental trajectory of TAP scores is a fruitful area for future
research, as is the determination of whether the TAP can indeed predict ‘real-world’ acts
of violence.
Mixed evidence was also observed for the TAP’s discriminant validity. As
predicted, TAP scores were unassociated with trait verbal aggressiveness, yet were
positively associated with state-level self-harm tendencies, against prediction. In
hindsight, the selection of self-harm tendencies as an index of discriminant validity was
questionable as these two variables have been previously linked (Muehlenkamp &
Gutierrez, 2007). Future research should seek to better test the TAP’s discriminant
validity. More work is also needed to better articulate the nomological network around
TAP scores, which may be done by including more general personality and externalizing
behavior measures (e.g., Miller & Lynam, 2006).
Principal components analyses and internal consistency estimates suggested
that the 50 datapoints of the TAP do, in fact, load onto a central latent construct. As
such, arguments that different trials of the TAP measure different constructs (e.g., first
trial = unprovoked aggression, second trial = retaliatory aggression), are not supported
by these data, which instead suggest that this task is largely homogeneous. More
research is needed to test whether different trials of the TAP do represent quantitatively
or qualitatively different measures.
Conclusions
Flexible psychological measures are a boon to the field. They are adaptable to
various contexts and hypotheses and enable a wider array of research. However, this
flexibility must be tempered with preregistration of the task’s implementation, scoring,
and analysis, lest this flexibility undermine sound science. Using a preregistration
approach, the 25-trial TAP appears to be a valid measure of state-level, currently felt
tendencies towards acts of physical retribution. Yet more work is needed to explore the
TAP’s ability to assess more dispositional and ‘real-world’ forms of aggression.
Assuming these findings are replicated and that the task is used appropriately, the TAP
should have a long and healthy life in the aggression researcher’s toolkit.
Acknowledgments
The author is grateful to the Center for Open Science for incentivizing this work
with their Preregistration Challenge, to Malte Elson for illustrating the severity of the
TAP’s flexibility issues, and to Brad Bushman for creating and disseminating the
computerized version of the TAP used in this project.
References
Anderson, C. A., & Bushman, B. J. (1997). External validity of “trivial” experiments: The
case of laboratory aggression. Review of General Psychology, 1(1), 19–41.
Anderson, C. A., & Murphy, C. R. (2003). Violent video games and aggressive behavior
in young women. Aggressive Behavior, 29(5), 423–429.
Bartholow, B. D., Bushman, B. J., & Sestir, M. A. (2006). Chronic violent video game
exposure and desensitization to violence: Behavioral and event-related brain
potential data. Journal of Experimental Social Psychology, 42(4), 532–539.
Bernstein, S., Richardson, D., & Hammock, G. (1987). Convergent and discriminant
validity of the Taylor and Buss measures of physical aggression. Aggressive
Behavior, 13(1), 15–24.
Bond, A., & Lader, M. (1986). A method to elicit aggressive feelings and behaviour via
provocation. Biological Psychology, 22(1), 69–79.
Burt, S. A., & Donnellan, M. B. (2010). Evidence that the Subtypes of Antisocial
Behavior questionnaire (STAB) predicts momentary reports of acting-out
behaviors. Personality and Individual Differences, 48(8), 917–920.
Bushman, B. J. (1995). Moderating role of trait aggressiveness in the effects of violent
media on aggression. Journal of Personality and Social Psychology, 69(5), 950–
960.
Bushman, B. J., & Baumeister, R. F. (1998). Threatened egotism, narcissism,
self-esteem, and direct and displaced aggression: Does self-love or self-hate lead to
violence? Journal of Personality and Social Psychology, 75(1), 219–229.
Chester, D. S., & DeWall, C. N. (2016). The pleasure of revenge: Retaliatory aggression
arises from a neural imbalance toward reward. Social Cognitive and Affective
Neuroscience, 11(7), 1173–1182.
Chester, D. S., & DeWall, C. N. (2017). Combating the sting of rejection with the
pleasure of revenge: A new look at how emotion shapes aggression. Journal of
Personality and Social Psychology, 112(3), 413–430.
Chester, D. S., Merwin, L. M., & DeWall, C. N. (2015). Maladaptive perfectionism’s link
to aggression and self-harm: Emotion regulation as a mechanism. Aggressive
Behavior, 41(5), 443–454.
Chester, D. S., Whitt, Z. T., Davis, T. S., & DeWall, C. N. (2017). The Voodoo Doll
Self-Injury Task: A new measure of self-harm tendencies. Manuscript under review.
Cleverley, K., Szatmari, P., Vaillancourt, T., Boyle, M., & Lipman, E. (2012).
Developmental trajectories of physical and indirect aggression from late
childhood to adolescence: Sex differences and outcomes in emerging adulthood.
Journal of the American Academy of Child & Adolescent Psychiatry, 51(10),
1037–1051.
Denson, T. F., Pedersen, W. C., & Miller, N. (2006). The displaced aggression
questionnaire. Journal of Personality and Social Psychology, 90(6), 1032–1051.
Denson, T. F., von Hippel, W., Kemp, R. I., & Teo, L. S. (2010). Glucose consumption
decreases impulsive aggression in response to provocation in aggressive
individuals. Journal of Experimental Social Psychology, 46(6), 1023–1028.
Elson, M., Mohseni, M. R., Breuer, J., Scharkow, M., & Quandt, T. (2014). Press CRTT
to measure aggressive behavior: The unstandardized use of the competitive
reaction time task in aggression research. Psychological Assessment, 26(2), 419.
Epstein, S., & Taylor, S. P. (1967). Instigation to aggression as a function of degree of
defeat and perceived aggressive intent of the opponent. Journal of Personality,
35(2), 265–289.
Ferguson, C. J., Smith, S., Miller-Stratton, H., Fritz, S., & Heinrich, E. (2008).
Aggression in the laboratory: Problems with the validity of the modified Taylor
Competitive Reaction Time Test as a measure of aggression in media violence
studies. Journal of Aggression, Maltreatment & Trauma, 17(1), 118–132.
Giancola, P. R., & Parrott, D. J. (2008). Further evidence for the validity of the Taylor
Aggression Paradigm. Aggressive Behavior, 34(2), 214–229.
Giancola, P. R., & Zeichner, A. (1995). Construct validity of a competitive reaction-time
aggression paradigm. Aggressive Behavior, 21(3), 199–204.
Giancola, P. R., & Zeichner, A. (1997). The biphasic effects of alcohol on human
physical aggression. Journal of Abnormal Psychology, 106(4), 598–607.
Krämer, U. M., Jansma, H., Tempelmann, C., & Münte, T. F. (2007). Tit-for-tat: The
neural basis of reactive aggression. NeuroImage, 38(1), 203–211.
Lieberman, J. D., Solomon, S., Greenberg, J., & McGregor, H. A. (1999). A hot new way
to measure aggression: Hot sauce allocation. Aggressive Behavior, 25(5), 331–
348.
Miller, J. D., & Lynam, D. R. (2006). Reactive and proactive aggression: Similarities and
differences. Personality and Individual Differences, 41(8), 1469–1480.
Muehlenkamp, J. J., & Gutierrez, P. M. (2007). Risk for suicide attempts among
adolescents who engage in non-suicidal self-injury. Archives of Suicide
Research, 11(1), 69–82.
Preacher, K. J., Curran, P. J., & Bauer, D. J. (2006). Computational tools for probing
interactions in multiple linear regression, multilevel modeling, and latent curve
analysis. Journal of Educational and Behavioral Statistics, 31(4), 437–448.
Raine, A., Dodge, K., Loeber, R., Gatzke-Kopp, L., Lynam, D., Reynolds, C., … Liu, J.
(2006). The reactive–proactive aggression questionnaire: Differential correlates
of reactive and proactive aggression in adolescent boys. Aggressive Behavior,
32(2), 159–171.
Rozin, P., Millman, L., & Nemeroff, C. (1986). Operation of the laws of sympathetic
magic in disgust and other domains. Journal of Personality and Social
Psychology, 50(4), 703–712.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology:
Undisclosed flexibility in data collection and analysis allows presenting anything
as significant. Psychological Science, 22(11), 1359–1366.
Smuts, B. (1992). Male aggression against women. Human Nature, 3(1), 1–44.
Taylor, S. P. (1967). Aggressive behavior and physiological arousal as a function of
provocation and the tendency to inhibit aggression. Journal of Personality, 35(2),
297–310.
Tedeschi, J. T., & Quigley, B. M. (1996). Limitations of laboratory paradigms for studying
aggression. Aggression and Violent Behavior, 1(2), 163–177.
Tedeschi, J. T., & Quigley, B. M. (2000). A further comment on the construct validity of
laboratory aggression paradigms: A response to Giancola and Chermack.
Aggression and Violent Behavior, 5(2), 127–136.
Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R. A.
(2012). An agenda for purely confirmatory research. Perspectives on
Psychological Science, 7(6), 632–638.
Webster, G. D., DeWall, C. N., Pond, R. S., Deckman, T., Jonason, P. K., Le, B. M., …
Bator, R. J. (2014). The brief aggression questionnaire: Psychometric and
behavioral evidence for an efficient measure of trait aggression. Aggressive
Behavior, 40(2), 120–139.