Running head: TAP VALIDATION

A Preregistered Validation Study of the Taylor Aggression Paradigm

David S. Chester
Department of Psychology, Virginia Commonwealth University, USA

Abstract Word Count: 149
Introduction/Discussion/Acknowledgments/Footnote Word Count: 1,981
References: 35

Correspondence should be addressed to:
David S. Chester
302 Thurston House
Virginia Commonwealth University
Richmond, VA, 23284, USA
dschester@vcu.edu
1-804-828-7624

Abstract

The Taylor Aggression Paradigm (TAP) is a frequently-used laboratory measure of aggressive behavior. However, the flexibility inherent in its implementation and analysis can undermine its validity. To test whether the TAP was a valid aggression measure irrespective of this flexibility, I conducted a preregistered study of a 25-trial version of the TAP using a single scoring approach with 160 diverse undergraduate participants. TAP scores showed agreement with other laboratory aggression measures and were magnified by an experimental provocation manipulation. Mixed evidence was found for associations with aggressive dispositions and real-world violence. These results provide preliminary support for this approach to the TAP to measure state-level aggressive behavior. However, more evidence is needed to assess the TAP’s external validity and ability to measure dispositional forms of aggression. Using preregistered designs, researchers should validate specific variants of their behavioral tasks in order to optimize the veridicality and reproducibility of psychological science.

Keywords: aggression, Taylor Aggression Paradigm, preregistration, validation, Competitive Reaction-Time Task

Introduction

The study of aggressive behavior has been a challenging endeavor. Observing violence in “the wild” has its own risks. Yet, when researchers sought to measure aggressive behavior in the laboratory, they had to grapple with substantial ethical, logistical, and theoretical hurdles.
The Taylor Aggression Paradigm (hereafter TAP; Taylor, 1967) arose from these early attempts at operationalization. Now in its 50th year, the task has made great contributions to psychological science but has also received substantive criticism. In what follows, I briefly review the history of the TAP and some modern critiques that highlight problems with the task’s flexibility. I then detail a study that was conducted to validate a single variant of the task, in light of these flexibility issues.

The Taylor Aggression Paradigm: A Brief History and Overview

As stated by Taylor himself, “In order to investigate aggression in the laboratory it is necessary to have an effective method for inciting aggression and for objectively measuring the aggressive responses that follow” (p. 297, Epstein & Taylor, 1967; Taylor, 1967). To extend laboratory aggression measurement beyond hypotheticals, self-reports, and projective tests, Taylor sought to simulate a provocative social experience in the lab and objectively measure ‘real’ aggressive responses. In the original TAP, participants selected the intensity of an electric shock to administer to an opponent, who did the same for them. Participants then repeatedly competed against their opponent to flip a switch faster in response to a cue. If they lost the competition, participants received a shock at the intensity of their opponent’s choosing. Participants could see the level of the shock that their opponent selected for them, which allowed the experimenters to manipulate the level of provocation experienced by participants.

In modern times, the TAP is most frequently implemented as a computer program that administers noise blasts through headphones instead of shocks delivered through electrodes (Bond & Lader, 1986; Bushman, 1995). In this new approach, participants set the duration and volume of these noise blasts, and these settings serve as the laboratory operationalization of aggression.
The TAP often includes multiple trials in order to provide a more reliable estimate of aggressive behavior. As in the original task, the computerized TAP typically models participants’ opponents as provocateurs who begin the task by selecting the loudest and longest noise blasts possible. The TAP has been used widely, with 382 scholarly papers citing the task in 2016 alone (citation estimates from Google Scholar). This task has even been modified for the brain-imaging environment, allowing for the investigation of neural correlates of aggressive behavior (e.g., Krämer, Jansma, Tempelmann, & Münte, 2007). However, such popularity has not been reached without criticism.

Challenges and Critiques

Laboratory measures of aggression, such as the TAP, have been categorically criticized for reasons including a lack of ecological validity, an exaggerated focus on retaliatory rather than unprovoked aggression, sanctioning by an authority figure, and a lack of options other than aggressive retaliation (Tedeschi & Quigley, 1996, 2000). The TAP has been specifically criticized for a lack of construct validity (Ferguson, Smith, Miller-Stratton, Fritz, & Heinrich, 2008). In rebuttal, the TAP exhibits convergent validity with self-report measures of aggression (Giancola & Parrott, 2008; Giancola & Zeichner, 1995) and with real-life acts of physical violence (e.g., Chester & DeWall, 2016), shows discriminant validity with traits unassociated with physical aggression (Giancola & Zeichner, 1995), and shows external validity in that effect sizes captured by the TAP correspond to those obtained by field studies (Anderson & Bushman, 1997).

More recently, the TAP has been criticized for a lack of standardization between and within laboratories (Elson, Mohseni, Breuer, Scharkow, & Quandt, 2014). For instance, some studies only analyze volume or duration settings (not both) or the first or second trial (not all trials) of the task.
Such flexibility can be a strength, allowing researchers to tailor the task to a given study. However, this flexibility can also be problematic if researchers test their hypothesis using multiple scoring and analytic strategies and then present the result that best fits their hypothesis as coming from a singular test (Simmons, Nelson, & Simonsohn, 2011). Such ‘researcher degrees-of-freedom’ capitalize on chance and increase the likelihood of Type I errors. Indeed, as the TAP has grown in popularity, so too have the variants in which the task is implemented and the data are analyzed (for visualizations of this phenomenon see http://crtt.flexiblemeasures.com/).

Fortunately, there is a simple solution to ensure that a flexible measure such as the TAP is not abused: preregistration. Through preregistration, researchers specify their hypotheses, methods, and planned analyses before data analysis begins, preventing any flexible and exploratory analyses from being reported as singular and confirmatory (Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012). This preregistration technique can then be applied to validate the TAP, independent of flexibility.

Validation Approaches to the TAP

There are many ways to demonstrate that the TAP is a valid and useful laboratory aggression measure. First, TAP scores should correspond to other measures of physically aggressive behavior and tendencies (e.g., other laboratory aggression tasks, self-report aggression questionnaires), suggesting convergent validity of the task (Giancola & Zeichner, 1995). Second, the TAP should be responsive to factors known to magnify aggressive behavior (e.g., provocation, being male), suggesting construct validity of the task (Giancola & Parrott, 2008).
Third, variables that are distinct from physically harming others (e.g., verbally harming others, harming the self) should not correspond to TAP scores, thus demonstrating the task’s discriminant validity (Bernstein, Richardson, & Hammock, 1987). Fourth, as an index of the task’s potential external validity, TAP scores should correspond to real-world acts of physical aggression (Anderson & Bushman, 1997). Finally, internal consistency is a necessary prerequisite for the validity of a measure that is composed of multiple responses. If a preregistered version of the TAP is shown to exhibit these five qualities, then this task can be considered a valid aggression measurement, beyond concerns regarding the task’s flexibility.

Present Research

In a preliminary attempt to demonstrate the TAP’s validity outside the context of methodological and analytic flexibility, I conducted a preregistered study. In this study, a 25-trial version of the TAP was administered alongside variables that would provide support for the forms of validity detailed above. The preregistration plan for this study is available online (https://osf.io/x7rjb/register/565fb3678c5e4a66b5582f67), as are all data, analysis code, and materials (https://osf.io/a2wft/files/).

Methods

Participants

A power analysis using G*Power 3.1 for an effect size of r = .25 (an estimate based on mean TAP score correlations with trait physical aggression from Webster et al., 2014), an alpha level of .05, and 90% power returned a planned sample size of 160. This sample size was large enough to provide 90% power for a d = .47 effect size of gender and the experimental provocation on TAP scores. Based on this power analysis, I sought to enroll 160 participants (80 females, 80 males), following the stop rule that participants had to complete all preregistered measures of the study to be included in the final sample.
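The arithmetic behind a planned sample of roughly 160 can be illustrated with a short Python sketch. It is not G*Power's routine: it assumes a two-tailed test and uses a normal approximation to the t test on a correlation, so it only lands near the reported figure (a few participants below 160). The function name and its defaults are hypothetical and chosen for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def approx_n_for_correlation(r, alpha=0.05, power=0.90, two_tailed=True):
    """Approximate total N needed to detect a correlation of size r.

    Assumption: normal approximation to the t test on r, whose
    noncentrality grows as sqrt(N) * r / sqrt(1 - r**2).
    """
    z = NormalDist()
    tail_area = alpha / 2 if two_tailed else alpha
    z_alpha = z.inv_cdf(1 - tail_area)   # critical value of the test
    z_power = z.inv_cdf(power)           # quantile for the desired power
    effect = r / sqrt(1 - r ** 2)        # standardized effect per sqrt(N)
    return ceil(((z_alpha + z_power) / effect) ** 2)


# Inputs taken from the text: r = .25, alpha = .05, power = .90
planned_n = approx_n_for_correlation(0.25)
print(planned_n)
```

Exact routines that use the noncentral t distribution typically return a slightly larger N than this approximation, which is one plausible reason for the small gap from 160.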
As such, I recruited 174 participants, 14 of whom had some form of missing data and were excluded from subsequent analyses (as specified in my preregistration plan). Final participants included 160 undergraduates, who were not gender-balanced (65.6% female, 34.4% male; Age: M = 20.09, SD = 5.03, range: 18-55). Participants were recruited from the Virginia Commonwealth University’s introductory psychology participant pool. Participants’ racial composition was 47.1% White, 18.8% African-American, 17.5% Asian-American, and 15.6% Other. The sample was 13.1% Hispanic and 86.9% Non-Hispanic. Participants were compensated with credit towards their introductory psychology course’s research participation assignment.

Preregistered Measures

Measures included in the study’s preregistration are summarized below in the temporal sequence in which participants completed them.

Taylor Aggression Paradigm. This study’s explicit purpose was to validate the 25-trial version of the TAP (Anderson & Bushman, 1997; Giancola & Chermack, 1998; Taylor, 1967). For each of the 25 trials of the task, participants began by setting the volume (60-105 decibels, in 5-decibel increments) and duration (0-5 seconds, in 0.5-second increments) of the noise blasts. Noise blast volumes were calibrated with a decibel reader to ensure fidelity to the displayed decibel level. Participants could also set the volume or duration to 0 to prevent any noise blast from being administered. After participants entered their noise blast settings, they competed against their opponent (i.e., a same-sex VCU undergraduate) by clicking a colored box as soon as it went from green to yellow to red. If participants lost a given trial (which they did approximately 50% of the time), they were blasted with noise at the volume and duration that their opponent ostensibly determined ahead of time. Wins and losses were randomized across trials (yet this random order was held constant across participants).
All participants lost the first trial and won the second trial, both of which included volume and duration settings from the opponent that were at or near maximum. This was done to ensure that participants were substantially provoked at the outset of the task, as is commonly done. To ensure the task’s realism, participants lost every trial that they did not respond to within several seconds. Further, trials would not advance if participants simply clicked the mouse repeatedly on the colored square.

Hot Sauce Aggression Task. Another often-used laboratory measure of aggression is the Hot Sauce Aggression Task (Lieberman, Solomon, Greenberg, & McGregor, 1999). In this task, participants were given several crackers that they could use to sample a very spicy hot sauce for themselves. After participants administered their own hot sauce, but before they actually ate any of it, they were asked to assist the experimenters by measuring out a sample of the same hot sauce for their essay evaluation partner. The weights of participants’ hot sauce allocations to themselves and to their partner were measured in grams, and the weight of the hot sauce allocation to their partner served as the dependent measure of aggression.

Voodoo Doll Aggression Task. A relatively novel task to measure aggressive behavior in the laboratory is the Voodoo Doll Aggression Task (DeWall et al., 2013). This task takes advantage of the innate human tendency to imbue certain objects with symbolic properties (Rozin, Millman, & Nemeroff, 1986). In this task, participants were instructed to view a computer image of a plush human doll as a symbolic representation of their actual essay evaluator. Participants viewed what the doll would look like with an array of sharp pins stuck in it. Then, participants typed the number of virtual, sharp pins that they wished to stick into the doll (from 0 to 51 pins).
This task may seem playful, but there is substantial evidence that it captures actual aggressive tendencies (Chester & DeWall, 2017; DeWall et al., 2013).

Brief Aggression Questionnaire. The 12-item BAQ is a subset of items from the 29-item Buss-Perry Aggression Questionnaire (Webster et al., 2014). The BAQ possesses a four-factor structure with 3-item subscales measuring each construct: anger (sample item: “sometimes I fly off the handle for no good reason”), hostility (sample item: “when people are especially nice, I wonder what they want”), physical aggression (sample item: “given enough provocation, I may hit another person”), and verbal aggression (sample item: “when people annoy me, I may tell them what I think of them”). Participants completed the full, 29-item Buss-Perry Aggression Questionnaire, and therefore the BAQ, by rating their agreement with each statement along a 1 (strongly disagree) to 7 (strongly agree) response scale.

Aggressive Motives Scale. The 6-item AMS is a retrospective, self-report measure of the motives underlying participants’ aggressive behavior on the TAP (Anderson & Murphy, 2003). Two items assess instrumental motives (sample item: “I wanted to impair my opponent’s performance in order to win more”) and four items assess revengeful aggression motives (sample item: “I wanted to pay back my opponent for the noise levels he/she set”). Participants rated their agreement with each statement along a 1 (strongly disagree) to 7 (strongly agree) response scale. This scale allowed for a test that could disentangle whether participants acted aggressively on the TAP out of a simply competitive or truly aggressive motivation.

Non-Suicidal Self-Injury Assessment Tool. The 6-item NSSIAT was constructed to assess state-level desires to engage in self-harm behaviors (Chester, Whitt, Davis, & DeWall, 2017).
Representative items included “Right now, how much do you want to hurt yourself on purpose?” and “If there was a sharp object lying on the table how likely would you be to use to hurt yourself?” Participants rated each item along a 1 (not at all) to 9 (extremely) response scale.

History of Physical Fights Scale. This open-ended measure simply asked participants how many physical fights they had been in, in the past 5 years and the past year. This measure allowed for the assessment of real-world acts of aggressive behavior.

Provocation Manipulation Check. The 7-item PMC (Denson, von Hippel, Kemp, & Teo, 2010) tested the efficacy of the essay feedback provocation manipulation used in this study. In this measure, participants reported the extent to which they felt “provoked”, “insulted”, and other forms of aversive responses to the essay feedback they received, along a 1 (strongly disagree) to 7 (strongly agree) scale.

Exploratory Measures

Self-report measures that were not included in the study’s preregistration are summarized below in the temporal sequence in which participants completed them. These three measures were included in order to assess the TAP’s association with other forms of antisocial (e.g., rule-breaking) and aggressive (e.g., displaced, proactive) behavior.

Subtypes of Antisocial Behavior Questionnaire. Aggression in both physical and verbal forms falls under the larger umbrella of antisocial behaviors, which also include deviant rule-breaking tendencies (Burt & Donnellan, 2010). The 32-item STAB assesses these three subtypes of antisocial behavioral tendencies, by asking participants the frequency (1 - never to 5 - nearly all the time) with which they perform each behavioral category.

Reactive/Proactive Aggression Questionnaire.
The 23-item RPAQ assesses the dispositional tendency to act aggressively out of impulsive and rash reactance to provocation (Reactive) or out of an instrumental and calculated motivation (Proactive; Raine et al., 2006). Participants responded to Reactive items such as “how often have you reacted angrily when provoked by others?” and Proactive items such as “how often have you used force to obtain money or things from others?” along a 0 (never) to 2 (often) response scale.

Displaced Aggression Questionnaire. Not all aggression is directed towards the provocateur. Indeed, displaced aggression takes the form of harmful acts directed at innocent third parties. This tendency to take out one’s aggression on bystanders can be quantified via the DAQ (Denson, Pedersen, & Miller, 2006). In this measure, participants responded to 10 items such as “I take my anger out on innocent others” along a 1 (extremely uncharacteristic of me) to 7 (extremely characteristic of me) response scale.

Procedure

Informed consent and introduction. Participants arrived individually at my laboratory’s waiting room at Virginia Commonwealth University. A trained experimenter greeted them and escorted them back into our testing room, where they were seated at a computer desk. Participants read and signed an informed consent form. Afterwards, the experimenter verbally screened each participant for any medical issues that might lead to sensitive hearing and therefore put them at risk of hearing damage from the TAP. Participants then listened to a verbal introduction to and overview of the study by the experimenter.

Provocation manipulation. In order to experimentally induce interpersonal provocation, this study used a validated essay evaluation paradigm in which participants received harsh or positive feedback on an essay.
This paradigm has been used effectively in previous research to provoke individuals into experiences of anger and aggressiveness (Bushman & Baumeister, 1998; Chester & DeWall, 2017; Chester, Merwin, & DeWall, 2015). To implement this paradigm, participants were all instructed to write a short essay about “an important moment in your life that had a great impact on who you are today.” Participants were then informed that they would be exchanging their essay with another Virginia Commonwealth University student, who was ostensibly down the adjacent hallway. The purported purpose of the essay task was to provide the experimenters with an estimate of participants’ writing and critiquing skills. Participants wrote their essay for five minutes and then placed it in a folder, which was taken away by the experimenter. The experimenter promptly returned with a folder containing their partner’s essay, which possessed identical content across participants and had actually been pre-written by one of the experimenters. Essays were written in sex-stereotypic handwriting that matched the perceived sex of the participant. This sex-matched handwriting procedure was used to subtly indicate to participants that their essay partner was same-sex, as aggressive responses are profoundly altered by the sex of both the perpetrator and victim (Smuts, 1992). Participants then spent 3 minutes reading and evaluating their partner’s essay along various criteria, providing five numeric scores that ranged from 1 (poor) to 7 (excellent) and summed to a total score of 5-35 points. The essay evaluation also included a space to write comments. The experimenter then collected the essay and evaluation form from participants and left the room. The experimenter promptly returned with the participant’s essay and an essay evaluation form from their fictitious partner that had been prewritten in the same handwriting as the fictitious essay.
The essay evaluation contained either negative (8/35 points, “One of the WORST essays I’ve EVER read!”) or positive (33/35 points, “Great essay!”) feedback, as determined by a randomized list of condition assignments. After the participant had reviewed the essay feedback for one minute, the experimenter collected the essay and evaluation form from the participant.

Laboratory aggression tasks. The experimenter then directed participants to the computer to complete the TAP, describing it as a measure of reaction time. The experimenter explained the task to the participant, gave them a sample of a moderately loud noise blast, and then left to check on the fictitious partner’s internet connection to the participant. The experimenter returned quickly and instructed the participant to begin the task. While the participant completed the TAP, the experimenter weighed out two plates with crackers on them. After the participant completed the TAP, the experimenter returned with a plate of crackers and a bottle of PAIN 100%© hot sauce. Participants were informed that they would now complete a measure of pain regulation, in which they would apply some of the hot sauce to the crackers and then eat them. After the participant added their desired level of hot sauce, the experimenter returned and informed them that they needed to leave to help another experimenter. The experimenter placed a second plate of crackers in front of the participant and asked them if they would help by applying hot sauce to these crackers, which their essay evaluation partner would have to eat. After a 60-second absence, the experimenter took both plates and weighed them away from the participant. The experimenter then returned and opened up a computerized version of the Voodoo Doll Aggression Task. This task was framed as a mental visualization exercise. The experimenter left the room while the participant completed the task.
After completing the Voodoo Doll Aggression Task, participants completed a battery of personality questionnaires including the items detailed in the above ‘Measures’ section. Upon completion of these questionnaires, the study was concluded.

Suspicion probe and debriefing. The experimenter then sat across from the participant and conducted a funneling suspicion interview intended to gauge whether participants guessed any of the deceptive elements of the study. Participants were then debriefed as to the deception in the study and the study’s true purpose, were provided with counseling resources in case of lingering psychological distress, and then were escorted from the laboratory, with thanks. The study’s procedures lasted no longer than one hour.

Data Analysis Plan

In alignment with my preregistration plan, I scored the TAP by calculating the mean of all 50 individual measurements of the task (2 settings per trial x 25 trials). Group comparisons on this measure were conducted using parametric, independent samples t-tests. Associations between study variables and this TAP score were estimated with either bivariate correlation analyses, for variables with parametric distributions, or generalized linear modeling (specifying a log-linear link and Poisson distribution) for zero-inflated, positively-skewed, count variables. Bias-corrected and accelerated 95% confidence intervals were calculated around effect size estimates using nonparametric bootstrapping techniques (1,000 resamples per bootstrap analysis). Inferences were made about each effect based on whether the test was statistically significant (i.e., p < .05 and 95% confidence interval did not include 0). Internal consistency estimates of the TAP’s 25 trials were obtained by calculating a Cronbach’s alpha and by conducting a principal components analysis, specifying a direct oblimin rotation in order for potential components to be correlated with one another.
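The scoring rule and the Cronbach's alpha computation described above can be sketched in Python. This is an illustrative sketch rather than the study's analysis code. It assumes that each volume and duration setting is coded on a 0-10 scale (0 = no blast, 1-10 = the ten 5-decibel or 0.5-second steps), a coding inferred from the reported 0.00-10.00 score range and not stated explicitly in the text. All function names are hypothetical.

```python
from statistics import mean, variance


def code_volume(db):
    """Map 60-105 dB (5 dB steps) to levels 1-10; 0 means no blast (assumed coding)."""
    return 0 if db == 0 else (db - 60) // 5 + 1


def code_duration(seconds):
    """Map 0.5-5.0 s (0.5 s steps) to levels 1-10; 0 means no blast (assumed coding)."""
    return round(seconds / 0.5)


def tap_score(volumes_db, durations_s):
    """Mean of all volume and duration levels (50 values for a 25-trial task)."""
    levels = [code_volume(v) for v in volumes_db]
    levels += [code_duration(d) for d in durations_s]
    return mean(levels)


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per participant."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data for illustration only (not study data)
example_score = tap_score([60, 105], [0.5, 5.0])
example_alpha = cronbach_alpha([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

Averaging volume and duration on a shared 0-10 scale gives both settings equal weight in the composite, which is what a single mean across all 50 measurements implies.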
All aforementioned analyses were conducted using SPSS 24. As an exploratory technique, I applied multilevel linear modeling (MLM) to TAP scores, as the data were collected across 25 time points and this technique allowed for the estimation of slopes, instead of simple mean-level effects across all trials. These models were fit specifying a random intercept and slope and estimating fixed effects of both the slope and the various study variables. These multilevel models were implemented with the PROC MIXED command in SAS 9.4.

Results

Deviations from Preregistration Plan

I was unable to achieve the 50% gender balance that I outlined in my preregistration plan. Further, I did not enact the outlier exclusion rule that I described in the preregistration (i.e., exclude all datapoints that were beyond 1.5 times the interquartile range from the median), as this proved to be far too conservative and would have led to the exclusion of an unjustified and substantial portion of my sample.

Descriptive Statistics

Mean TAP scores exhibited substantial variability across participants, M = 5.06, SD = 2.03, range: 0.00-10.00. TAP scores did not exhibit problematic skewness, -0.41, SE = 0.19, or kurtosis, 0.31, SE = 0.38. The voodoo doll pin count was zero-inflated (40.0% zero values), positively skewed, 2.37, and leptokurtic, 5.63, as was the physical fight count across the past 5 years, which was zero-inflated (66.3% zero values), positively skewed, 2.31, and leptokurtic, 6.36. Internal consistency was adequate for the Physical Aggression subscale of the BAQ, α = .72, the Revengeful Harm subscale of the AMS, α = .82, the Non-Suicidal Self-Injury Assessment Tool, α = .97, and the Provocation Manipulation Check, α = .96. Internal consistency was inadequate for the Verbal subscale of the BAQ, α = .50 (see Footnote 1).
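For readers who want to see what the preregistered (but not enacted) outlier rule would flag, the following Python sketch marks values lying more than 1.5 interquartile ranges from the median. It is an illustration under assumptions: the quartile method is Python's default ('exclusive') and may differ from the one SPSS uses, and the function name is hypothetical.

```python
from statistics import median, quantiles


def flag_outliers(values, multiplier=1.5):
    """Return True for each value farther than multiplier * IQR from the median."""
    q1, _, q3 = quantiles(values, n=4)   # quartiles, default 'exclusive' method
    iqr = q3 - q1
    center = median(values)
    return [abs(v - center) > multiplier * iqr for v in values]


# Toy data for illustration only (not study data)
flags = flag_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

Because distances are measured from the median rather than from the quartiles, this band is narrower than the more familiar Tukey fences, so it flags more datapoints for the same multiplier.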
Manipulation Check

The random assignment procedure sorted 78 participants into the negative feedback condition of the provocation manipulation, and 82 participants into the positive feedback condition. Participants who were assigned to the negative feedback condition reported greater levels of provocation due to their essay feedback than participants in the positive feedback condition, t(158) = 9.42, p < .001, d = 1.50 [1.17, 1.80].

Confirmatory Tests

Construct validity - provocation manipulation effect. Participants who were assigned to the negative feedback condition of the provocation manipulation had higher TAP scores than participants in the positive feedback condition, t(158) = 3.24, p = .001, d = 0.52 [0.17, 0.83] (Figure 1).

Footnote 1. The 5-item Verbal Aggression subscale of the full Buss-Perry Aggression Questionnaire had an acceptable internal consistency, α = .70. Yet results did not meaningfully differ if this measure was used, instead of the BAQ subscale.

Figure 1. Violin plots depicting the distributions of mean TAP scores by essay feedback condition.

Construct validity - gender differences. Contrary to my prediction, males had lower TAP scores than females, t(158) = -1.99, p = .048, d = -0.32 [-0.60, 0.02].

Construct validity - revengeful harm motivations. Suggesting that noise blast settings are motivated by the desire to inflict vengeful harm on participants’ targets, TAP scores were positively associated with revengeful aggression motivations, r(158) = .39 [.26, .50], p < .001.

Convergent validity - laboratory aggression. Supporting the convergent validity of the TAP, mean noise blasts were positively associated with the amount of hot sauce participants gave to their partner in the taste test, r(158) = .23 [.07, .36], p = .004. One participant’s hot sauce allocation was 8.14 SDs above the sample mean.
After removing this outlier, TAP scores remained positively associated with hot sauce allocations, r(157) = .20 [.04, .37], p = .011 (Figure 2A). TAP scores were also associated with the number of pins stuck in the voodoo doll that represented their partner, B = 0.30 [0.26, 0.33], SE = 0.02, χ²(1, 158) = 316.82, p < .001 (Figure 2B).

Figure 2. Scatterplots depicting positive associations between mean TAP scores and (A) hot sauce weights, in grams, and (B) voodoo doll pin counts, directed towards participants’ essay evaluators.

Convergent validity - trait aggression. Counter to predictions, TAP scores were unassociated with trait physical aggression, r(158) = .06 [-.10, .21], p = .445.

External validity. Offering mixed evidence of the TAP’s external validity, TAP scores were unassociated with the number of physical fights participants had been in over the past 5 years, B = -0.01 [-0.10, 0.08], SE = 0.05, χ²(1, 158) = 0.06, p = .810. Yet TAP scores were positively associated with fights that participants had been in over the past year, B = 0.22 [0.02, 0.42], SE = 0.10, χ²(1, 158) = 4.42, p = .036.

Discriminant validity. Again there was mixed evidence for the TAP’s discriminant validity, such that TAP scores were unassociated with trait verbal aggression, r(158) = .09 [-.09, .25], p = .252, yet unexpectedly were positively associated with current self-harm tendencies, r(158) = .16 [.03, .29], p = .038.

Internal consistency. The 50 individual measurements of the TAP (25 trials x 2 settings per trial) exhibited excellent internal consistency, α = .98. The average of duration and volume settings from each of the 25 trials exhibited a single-component structure, as evidenced by an initial component with an eigenvalue of 13.72, which explained 55.01% of the variance in TAP responses. Each of the 25 trials substantially loaded onto this component, loadings > .61.
Two other factors marginally passed the 1.00 eigenvalue cutoff at 1.83 and 1.32. However, each of these smaller factors only had a single trial that loaded above a .40 factor-loading cutoff, rendering them largely uninterpretable.

Exploratory Analyses

Specificity of hot sauce measure to aggression. Participants’ TAP scores were unassociated with hot sauce allocations towards themselves, r(156) = .11 [-.09, .27], p = .157. Two participants did not provide data for the self-administration of hot sauce as they ate off their plate, against instructions, rendering the plate’s weight uninformative. This finding supports the assertion that hot sauce allocations to essay evaluators reflected an aggressive motivation and not a more general tendency to allocate hot sauce indiscriminately.

Revenge versus instrumental motivations. To assess whether TAP scores were motivated more by vengeance (i.e., wanting to inflict harm) than by instrumental motives (i.e., wanting to win the game), I regressed TAP scores onto both Revengeful Harm and Instrumental motives scores from the AMS using multiple linear regression. Revengeful harm motives remained positively associated with TAP scores, B = 0.45 [0.17, 0.70], t(157) = 3.27, p = .001, whereas instrumental motives were not associated with TAP scores, B = 0.16 [-0.09, 0.42], t(157) = 1.34, p = .181. Internal consistency was adequate for the Instrumental Motivation subscale of the AMS, α = .78.

Associations with different types of aggressive and antisocial behavior. Using the STAB questionnaire, TAP scores were unassociated with physical aggressiveness, r(158) = .07 [-.08, .22], p = .367, social aggressiveness, r(158) = .10 [-.05, .25], p = .212, and rule breaking, r(158) = -.03 [-.22, .17], p = .716.
TAP scores were further unassociated with displaced aggression scores from the DAQ, r(125) = .06 [-.13, .24], p = .475, reactive aggression scores from the RPAS, r(140) = .16 [.01, .31], p = .061, and proactive aggressiveness, r(140) = .07 [-.05, .21], p = .379. Internal consistency was adequate for the Physical Aggression, α = .86, Social Aggression, α = .85, and Rule-Breaking, α = .70, subscales of the STAB, as well as for the Displaced Aggression subscale of the DAQ, α = .92, and the Reactive, α = .83, and Proactive, α = .84, Aggression subscales of the RPAS.

Multilevel linear modeling. In multilevel linear models (MLM), TAP scores exhibited substantial within-person, B = 3.63, SE = 0.08, Z = 42.89, p < .001, and between-person, B = 4.63, SE = 0.59, Z = 7.89, p < .001, variability and tended to decrease over the course of the task, B = -0.02 [-.03, -.001], SE = 0.01, t(159) = -2.14, p = .034. MLM was then used to replicate the study’s univariate, confirmatory analyses (summary of results in Table 1). Replicating the univariate analyses, the experimental provocation induction increased TAP scores, and TAP scores were associated with greater hot sauce allocations, being female, and non-suicidal self-injury tendencies. Again, TAP scores were positively associated with revengeful aggression motivations, even when controlling for instrumental motives, and TAP scores were unassociated with physically and verbally aggressive traits. Zero-inflated measures from the study (i.e., voodoo doll pin counts and past fight counts) were excluded, as the predictors were too zero-inflated to serve as independent variables in MLM.

Table 1. Multilevel associations between study variables and TAP scores across all 25 trials.
| Independent Variable | B [95% CI] | SE | t (df) | p |
|---|---|---|---|---|
| Provocation | 1.00 [0.39, 1.61] | 0.31 | 3.24 (158) | .001 |
| Hot Sauce Weight | 0.10 [0.03, 0.16] | 0.03 | 2.96 (158) | .003 |
| Female ( > Male) | 0.69 [0.04, 1.34] | 0.33 | 2.09 (158) | .036 |
| Revengeful Harm Motives | 0.57 [0.36, 0.78] | 0.11 | 5.39 (158) | < .001 |
| Revengeful Harm Motives (controlling for Instrumental) | 0.45 [0.19, 0.72] | 0.14 | 3.33 (157) | < .001 |
| Self-Injury Tendencies | 0.35 [0.02, 0.68] | 0.17 | 2.08 (158) | .038 |
| Trait Physical Aggression | 0.07 [-0.13, 0.28] | 0.11 | 0.70 (158) | .484 |
| Trait Verbal Aggression | 0.16 [-0.11, 0.44] | 0.14 | 1.16 (158) | .247 |

Subsequent analyses tested the effect of trait physical and verbal aggression on the slopes of TAP scores across the 25 trials, not simply on the mean of all 25 trials. Although physically aggressive traits were unassociated with mean-level TAP scores, they were associated with more positive aggression slopes across the task, B = 0.01 [0.003, 0.02], SE = 0.01, t(158) = 2.60, p = .009 (Figure 3).

Figure 3. TAP scores (mean of duration and volume settings) across all 25 trials of the task, by high (+1 SD) and low (-1 SD) trait physical aggressiveness (PA).

The simple slopes of this effect of trait physical aggression on TAP slopes were then probed using an online utility (http://www.quantpsy.org/interact/hlm2.htm; Preacher, Curran, & Bauer, 2006). At low (-1 SD) levels of trait physical aggression, TAP scores became progressively lower across the task, B = -0.03, SE = 0.01, t(158) = -3.25, p = .001. However, at relatively high (+1 SD) levels of trait physical aggression, TAP scores remained constant, B = 0.00, SE = 0.01, t(158) = 0.27, p = .784. This slope modulation was not observed for trait verbal aggression, B = 0.00 [-0.01, 0.01], SE = 0.01, t(158) = -0.05, p = .958.

Discussion

Each year, dozens of scientific findings are published using the Taylor Aggression Paradigm (TAP; Taylor, 1967).
The TAP has facilitated substantial developments in our understanding of aggression, such as identifying aggression as a function of threats to vulnerable egos (Bushman & Baumeister, 1998), violent media (Bartholow, Bushman, & Sestir, 2006), and alcohol consumption (Giancola & Zeichner, 1997). However, the ability to implement and analyze the TAP in a nearly unlimited array of ways is both a strength and a weakness. The TAP’s flexibility can enable researcher degrees of freedom, whether conscious or incidental, to undermine the validity of the task and introduce false positives into the literature (Elson et al., 2014). Using a preregistered version of the TAP and a single scoring approach, I sought to remove the flexibility of the task and test its subsequent validity. Doing so would allow aggression researchers to retain this form of the tool as a valid laboratory measure.

Evidence for the over-arching validity of the mean score approach to the 25-trial TAP was mixed. As predicted, these mean TAP scores were higher for experimentally provoked participants than for their non-provoked counterparts. Further, the TAP demonstrated convergent validity with the two other laboratory aggression measures: the Hot Sauce Aggression Task (Lieberman et al., 1999) and the Voodoo Doll Aggression Task (DeWall et al., 2013). TAP scores corresponded to the appropriate motivation, the desire to inflict retributive harm, and not to competitive or instrumental motives to simply win the competition. These findings suggest that the TAP responds appropriately to aggression-increasing situational effects and shows agreement with similar measures. Moreover, these results support the construct validity of the TAP as an effective measure of currently felt, state-level, and “in the moment” aggressive tendencies.

The TAP did not exhibit the predicted associations with dispositional measures of aggressiveness, such as gender and trait physical aggression.
The disproportionate number of females in the sample may have contributed to the finding that females had higher TAP scores than males, though this is unclear. Exploratory multilevel analyses suggested that the univariate approach to the TAP failed to reveal a significant association with trait physical aggression because the effect is present at later trials and not during the initial phase of the task. This temporal effect may have arisen from the fact that early trials of the TAP are characterized by extreme provocation from participants’ opponents, a situational input that may override their dispositional tendencies towards aggressiveness. As such, the TAP may be an accurate measure of aggressive traits when inferences are based on the slope (and not the mean) of TAP scores. Future research that takes a confirmatory approach to these multilevel analyses is needed to demonstrate whether this is the case, as these MLM analyses were purely exploratory.

TAP scores also corresponded to physical fight frequency over the past year, but not over the past 5 years, suggesting that the TAP may have external validity, but within a shorter time frame. The relatively young age of the sample might also have affected this finding, as there are significant aggression-related developmental landmarks between adolescence and emerging adulthood (Cleverley, Szatmari, Vaillancourt, Boyle, & Lipman, 2012). The developmental trajectory of TAP scores is a fruitful area for future research, as is the determination of whether the TAP can indeed predict ‘real-world’ acts of violence.

Mixed evidence was also observed for the TAP’s discriminant validity. As predicted, TAP scores were unassociated with trait verbal aggressiveness, yet were positively associated with state-level self-harm tendencies, against prediction.
In hindsight, the selection of self-harm tendencies as an index of discriminant validity was questionable, as these two variables have been previously linked (Muehlenkamp & Gutierrez, 2007). Future research should seek to better test the TAP’s discriminant validity. More work is also needed to better articulate the nomological network around TAP scores, which may be done by including more general personality and externalizing behavior measures (e.g., Miller & Lynam, 2006).

Principal components analyses and internal consistency estimates suggested that the 50 data points of the TAP do, in fact, load onto a central latent construct. As such, arguments that different trials of the TAP measure different constructs (e.g., first trial = unprovoked aggression, second trial = retaliatory aggression) are not supported by these data, which instead suggest that this task is largely homologous. More research is needed to test whether different trials of the TAP do represent quantitatively or qualitatively different measures.

Conclusions

Flexible psychological measures are a boon to the field. They are adaptable to various contexts and hypotheses and enable a wider array of research. However, this flexibility must be tempered with preregistration of the task’s implementation, scoring, and analysis, lest this flexibility undermine sound science. Under a preregistered approach, the 25-trial TAP appears to be a valid measure of state-level, currently felt tendencies towards acts of physical retribution. Yet more work is needed to explore the TAP’s ability to assess more dispositional and ‘real-world’ forms of aggression. Assuming these findings are replicated and that the task is used appropriately, the TAP should have a long and healthy life in the aggression researcher’s toolkit.
Acknowledgments

The author is grateful to the Center for Open Science for incentivizing this work with their Preregistration Challenge, to Malte Elson for illustrating the severity of the TAP’s flexibility issues, and to Brad Bushman for creating and disseminating the computerized version of the TAP used in this project.

References

Anderson, C. A., & Bushman, B. J. (1997). External validity of “trivial” experiments: The case of laboratory aggression. Review of General Psychology, 1(1), 19–41.
Anderson, C. A., & Murphy, C. R. (2003). Violent video games and aggressive behavior in young women. Aggressive Behavior, 29(5), 423–429.
Bartholow, B. D., Bushman, B. J., & Sestir, M. A. (2006). Chronic violent video game exposure and desensitization to violence: Behavioral and event-related brain potential data. Journal of Experimental Social Psychology, 42(4), 532–539.
Bernstein, S., Richardson, D., & Hammock, G. (1987). Convergent and discriminant validity of the Taylor and Buss measures of physical aggression. Aggressive Behavior, 13(1), 15–24.
Bond, A., & Lader, M. (1986). A method to elicit aggressive feelings and behaviour via provocation. Biological Psychology, 22(1), 69–79.
Burt, S. A., & Donnellan, M. B. (2010). Evidence that the Subtypes of Antisocial Behavior questionnaire (STAB) predicts momentary reports of acting-out behaviors. Personality and Individual Differences, 48(8), 917–920.
Bushman, B. J. (1995). Moderating role of trait aggressiveness in the effects of violent media on aggression. Journal of Personality and Social Psychology, 69(5), 950–960.
Bushman, B. J., & Baumeister, R. F. (1998). Threatened egotism, narcissism, self-esteem, and direct and displaced aggression: Does self-love or self-hate lead to violence? Journal of Personality and Social Psychology, 75(1), 219–229.
Chester, D. S., & DeWall, C. N. (2016).
The pleasure of revenge: Retaliatory aggression arises from a neural imbalance toward reward. Social Cognitive and Affective Neuroscience, 11(7), 1173–1182.
Chester, D. S., & DeWall, C. N. (2017). Combating the sting of rejection with the pleasure of revenge: A new look at how emotion shapes aggression. Journal of Personality and Social Psychology, 112(3), 413–430.
Chester, D. S., Merwin, L. M., & DeWall, C. N. (2015). Maladaptive perfectionism’s link to aggression and self-harm: Emotion regulation as a mechanism. Aggressive Behavior, 41(5), 443–454.
Chester, D. S., Whitt, Z. T., Davis, T. S., & DeWall, C. N. (2017). The Voodoo Doll Self-Injury Task: A new measure of self-harm tendencies. Manuscript under review.
Cleverley, K., Szatmari, P., Vaillancourt, T., Boyle, M., & Lipman, E. (2012). Developmental trajectories of physical and indirect aggression from late childhood to adolescence: Sex differences and outcomes in emerging adulthood. Journal of the American Academy of Child & Adolescent Psychiatry, 51(10), 1037–1051.
Denson, T. F., Pedersen, W. C., & Miller, N. (2006). The displaced aggression questionnaire. Journal of Personality and Social Psychology, 90(6), 1032–1051.
Denson, T. F., von Hippel, W., Kemp, R. I., & Teo, L. S. (2010). Glucose consumption decreases impulsive aggression in response to provocation in aggressive individuals. Journal of Experimental Social Psychology, 46(6), 1023–1028.
Elson, M., Mohseni, M. R., Breuer, J., Scharkow, M., & Quandt, T. (2014). Press CRTT to measure aggressive behavior: The unstandardized use of the competitive reaction time task in aggression research. Psychological Assessment, 26(2), 419.
Epstein, S., & Taylor, S. P. (1967). Instigation to aggression as a function of degree of defeat and perceived aggressive intent of the opponent. Journal of Personality, 35(2), 265–289.
Ferguson, C. J., Smith, S., Miller-Stratton, H., Fritz, S., & Heinrich, E. (2008).
Aggression in the laboratory: Problems with the validity of the modified Taylor Competitive Reaction Time Test as a measure of aggression in media violence studies. Journal of Aggression, Maltreatment & Trauma, 17(1), 118–132.
Giancola, P. R., & Parrott, D. J. (2008). Further evidence for the validity of the Taylor Aggression Paradigm. Aggressive Behavior, 34(2), 214–229.
Giancola, P. R., & Zeichner, A. (1995). Construct validity of a competitive reaction-time aggression paradigm. Aggressive Behavior, 21(3), 199–204.
Giancola, P. R., & Zeichner, A. (1997). The biphasic effects of alcohol on human physical aggression. Journal of Abnormal Psychology, 106(4), 598–607.
Krämer, U. M., Jansma, H., Tempelmann, C., & Münte, T. F. (2007). Tit-for-tat: The neural basis of reactive aggression. NeuroImage, 38(1), 203–211.
Lieberman, J. D., Solomon, S., Greenberg, J., & McGregor, H. A. (1999). A hot new way to measure aggression: Hot sauce allocation. Aggressive Behavior, 25(5), 331–348.
Miller, J. D., & Lynam, D. R. (2006). Reactive and proactive aggression: Similarities and differences. Personality and Individual Differences, 41(8), 1469–1480.
Muehlenkamp, J. J., & Gutierrez, P. M. (2007). Risk for suicide attempts among adolescents who engage in non-suicidal self-injury. Archives of Suicide Research, 11(1), 69–82.
Preacher, K. J., Curran, P. J., & Bauer, D. J. (2006). Computational tools for probing interactions in multiple linear regression, multilevel modeling, and latent curve analysis. Journal of Educational and Behavioral Statistics, 31(4), 437–448.
Raine, A., Dodge, K., Loeber, R., Gatzke-Kopp, L., Lynam, D., Reynolds, C., … Liu, J. (2006). The reactive–proactive aggression questionnaire: Differential correlates of reactive and proactive aggression in adolescent boys. Aggressive Behavior, 32(2), 159–171.
Rozin, P., Millman, L., & Nemeroff, C. (1986). Operation of the laws of sympathetic magic in disgust and other domains.
Journal of Personality and Social Psychology, 50(4), 703–712.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.
Smuts, B. (1992). Male aggression against women. Human Nature, 3(1), 1–44.
Taylor, S. P. (1967). Aggressive behavior and physiological arousal as a function of provocation and the tendency to inhibit aggression. Journal of Personality, 35(2), 297–310.
Tedeschi, J. T., & Quigley, B. M. (1996). Limitations of laboratory paradigms for studying aggression. Aggression and Violent Behavior, 1(2), 163–177.
Tedeschi, J. T., & Quigley, B. M. (2000). A further comment on the construct validity of laboratory aggression paradigms: A response to Giancola and Chermack. Aggression and Violent Behavior, 5(2), 127–136.
Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632–638.
Webster, G. D., DeWall, C. N., Pond, R. S., Deckman, T., Jonason, P. K., Le, B. M., … Bator, R. J. (2014). The brief aggression questionnaire: Psychometric and behavioral evidence for an efficient measure of trait aggression. Aggressive Behavior, 40(2), 120–139.