🔬 METHODOLOGY 2009

🎯 A Distribution of Absolute Pitch Ability as Revealed by Computerized Testing

Patrick Bermudez and Robert J. Zatorre

Music Perception (2009) Vol. 27, Issue 2, pp. 89–101

📅 Accepted: June 24, 2009 👥 N=51 musicians 🔬 Computerized behavioral test 🏫 Montreal Neurological Institute, McGill University

🎯 Key Finding

Absolute pitch (AP) is not binary; it lies on a continuum. Using a novel computerized test that measures both accuracy and reaction time, Bermudez & Zatorre found a continuous distribution of AP ability across 51 musicians, including a substantial number of intermediate performers (mean absolute deviation of 1–2 semitones) who fall between clear AP possessors and clear non-possessors. This challenges the occasional claim that AP ability is bimodally distributed and shows that scoring method and recruitment strategy strongly influence whether AP appears binary or continuous.

📊 Study Design

Participants

N=51 musicians (39 females, 12 males)
27 self-reported as AP possessors, 24 as non-possessors (NAP)
Average age: 23.1 years (SE = 0.52)
Mean age of training onset: 6.1 years (SE = 0.32)
Mean total training: 16.4 years (SE = 0.63)
Recruited from music faculties of two Montreal universities
All gave informed consent; the study received ethics approval from the Montreal Neurological Institute
2 participants reclassified based on performance (1 self-reported AP scored 2 SD below AP mean; 1 NAP scored 2 SD above NAP mean)

Stimuli

108 trials (36 notes × 3 intensity levels)
Range: C₃ to B₅ (3 octaves)
Based on A = 440 Hz equal temperament
Each note presented at 3 intensities: −1, −4, and −7 dB (to prevent loudness cues)
Synthetic multiharmonic tones: fundamental + ~9 harmonics (12 dB amplitude decrease between harmonics)
Duration: 1 second (50 ms linear onset and offset ramps)
16-bit sampling depth
Presented at ~75 dB SPL via headphones
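
The stimulus parameters above can be sketched in code. This is a minimal illustration, not the authors' synthesis routine: the 44.1 kHz sampling rate, the sine-phase partials, the peak normalization, and the function names are all assumptions that the summary does not specify. It builds the 36 × 3 trial list and one tone with a fundamental plus 9 harmonics, each 12 dB below the previous partial, with 50 ms linear ramps.

```python
import math

SAMPLE_RATE = 44100          # assumption: sampling rate is not given in the summary
LEVELS_DB = (-1, -4, -7)     # the three presentation intensities
NOTES = range(-21, 15)       # semitones from A4 (440 Hz): C3 = -21 ... B5 = +14

def note_frequency(semitones_from_a4):
    """Equal-temperament frequency relative to A4 = 440 Hz."""
    return 440.0 * 2 ** (semitones_from_a4 / 12)

def synth_tone(freq, level_db=0.0, n_harmonics=9, step_db=12.0,
               duration=1.0, ramp=0.05):
    """Fundamental plus n_harmonics partials, each step_db below the last."""
    n = int(SAMPLE_RATE * duration)
    ramp_n = int(SAMPLE_RATE * ramp)
    gain = 10 ** (level_db / 20)
    # Partial 1 is the fundamental; amplitudes fall by step_db per partial.
    amps = [10 ** (-step_db * (k - 1) / 20) for k in range(1, n_harmonics + 2)]
    norm = sum(amps)  # keeps the summed waveform within [-1, 1]
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        s = sum(a * math.sin(2 * math.pi * k * freq * t)
                for k, a in enumerate(amps, start=1))
        # Linear onset and offset ramps of ramp_n samples each.
        envelope = min(1.0, i / ramp_n, (n - 1 - i) / ramp_n)
        samples.append(gain * envelope * s / norm)
    return samples

# 36 notes x 3 intensity levels = 108 trials
trials = [(note, level) for note in NOTES for level in LEVELS_DB]
tone = synth_tone(note_frequency(0), level_db=-1)
```

The highest partial for B5 sits near 9.9 kHz, well below the Nyquist limit at the assumed sampling rate, so no band-limiting step is needed in this sketch.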

🎮 The Computerized Test Interface

Chroma Response (Step 1)

Circular wheel with 12 positions (all pitch classes equidistant from center)
Cursor resets to center after each trial (no positional bias)
All 12 responses equally accessible (unlike piano keyboard)
No timeout: self-paced (allows measurement of natural response speed)
Critical innovation: avoids keyboard familiarity confounds
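
The geometry of the response wheel can be illustrated with a short sketch. The ordering of pitch classes, the starting angle, the clockwise direction, and both function names are assumptions made for illustration; the summary only states that the 12 positions are equidistant from the center.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]

def wheel_positions(radius=1.0):
    """Place the 12 pitch classes evenly on a circle around (0, 0)."""
    positions = {}
    for i, name in enumerate(PITCH_CLASSES):
        # Assumption: C at the top, proceeding clockwise in 30-degree steps.
        angle = math.pi / 2 - 2 * math.pi * i / 12
        positions[name] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions

def clicked_pitch_class(x, y, positions):
    """Return the pitch class whose wheel position is nearest to a click."""
    return min(positions, key=lambda name: math.dist((x, y), positions[name]))

positions = wheel_positions()
```

Because every target lies at the same distance from the reset point at the center, cursor travel time does not favor any pitch class, which is the property the design relies on.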

Octave Response (Step 2)

After selecting the chroma, participants indicated the octave (each octave spanning C to B)
Octaves shown as bands in distinct shades of grey
Instructions emphasized that the exact grand-staff position was not required
“Simply click anywhere in the color band representing the octave”
Allows analysis of chroma accuracy and octave accuracy separately
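
Scoring chroma and octave separately might look like the following sketch. It assumes MIDI note numbers (C4 = 60) as the internal pitch representation, which is an illustrative choice and not something the summary states; the function names are hypothetical.

```python
def split_pitch(midi_note):
    """Split a MIDI note number into (chroma 0-11, octave), with C4 = 60."""
    return midi_note % 12, midi_note // 12 - 1

def score_trial(target_midi, response_chroma, response_octave):
    """Score the two response steps independently of each other."""
    chroma, octave = split_pitch(target_midi)
    d = (response_chroma - chroma) % 12
    return {
        "chroma_dev": min(d, 12 - d),            # circular distance, 0 to 6
        "octave_correct": response_octave == octave,
    }

# A4 (MIDI 69) answered with the right chroma but one octave too low
result = score_trial(69, response_chroma=9, response_octave=3)
```

Here the chroma deviation is 0 even though the octave is wrong, so an octave error does not contaminate the pitch-class score.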

Design Innovations (5 Key Advances)

Multiharmonic synthetic stimuli: Equally unfamiliar to all participants (unlike piano/violin tones)
Both chroma and octave judgments collected: Separates pitch class from pitch height
Precise reaction times: 10 ms resolution (identifies strategy differences)
Circular response interface: All 12 responses equidistant (no keyboard bias)
Self-paced: Captures natural response speed (no artificial time pressure)

📈 Results

AP vs NAP Performance

AP Group (n=27*): 77% correct · MAD = 0.38 semitones · RT = 3,346 ms
NAP Group (n=24*): 15% correct · MAD = 2.48 semitones · RT = 7,586 ms

All differences highly significant: accuracy F(1,49) = 217.78, p < .001; MAD F(1,49) = 221.24, p < .001; RT F(1,49) = 30.59, p < .001. (*after reclassification of 2 outliers)

The Continuum of Ability

Best performers: Mean deviation ~0 semitones, >95% correct (essentially perfect AP)
Intermediate performers: MAD 1–2 semitones, 40–60% correct — substantial group
Random performers: MAD ~3 semitones (flat response distribution)
Not clearly bimodal: When considering MAD (not just % correct), the gap between groups is filled by intermediates
8 high-performing participants: MAD < 1 semitone, all responded within 6 seconds

Pitch Class Dependence (White-Key Advantage)

Diatonic notes (C major) identified more accurately and quickly than non-diatonic
Accuracy: marginally significant group × note-type interaction, F(1,49) = 3.72, p = .06
Driven by the AP group: white-key notes identified significantly more accurately (Tukey HSD)
RT: significant group × note-type interaction, F(1,49) = 23.12, p < .001 (AP faster on white keys)
Pitch class A identified best overall: highest accuracy + fastest RT in AP group
Replicates Miyazaki 1988, 1989, 1990; Takeuchi & Hulse 1991
NAP participants also showed A advantage (likely using it as relative reference)

Reaction Time as Key Dimension

MAD correlated strongly with log RT: r = .63, p < .0001
Better performers respond faster (they are not trading speed for accuracy)
NAP participants with low MAD show longer reaction times → suggests alternative strategies (relative pitch calculations)
Combined index (MAD × logRT) still shows continuum, not bimodality
RT captures what % correct misses: two participants who both scored 8.3% correct (chance level, 1 in 12) had very different MADs (1.44 vs 2.81) and RTs (4,905 vs 9,124 ms)
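
The combined index can be worked through with the figures reported above. This is a rough sketch: the base-10 logarithm and the use of milliseconds are assumptions, since the summary gives neither the log base nor the unit, and `combined_index` is a hypothetical name. The group entries apply the index to the group means, which is not the same as the mean of per-participant indices.

```python
import math

def combined_index(mad_semitones, rt_ms):
    """MAD x log RT; lower values mean responses that are both close and fast."""
    # Assumption: base-10 log of RT in milliseconds.
    return mad_semitones * math.log10(rt_ms)

# (MAD in semitones, RT in ms), taken from the results reported above
reported = {
    "ap_mean": (0.38, 3346),
    "nap_mean": (2.48, 7586),
    "participant_a": (1.44, 4905),   # 8.3% correct
    "participant_b": (2.81, 9124),   # also 8.3% correct
}

index = {label: combined_index(m, rt) for label, (m, rt) in reported.items()}
for label, value in index.items():
    print(f"{label}: {value:.2f}")
```

The two participants with identical % correct end up far apart on this index, which is the separation that strict scoring hides. One property of a product form is that a MAD of exactly 0 gives an index of 0 whatever the RT.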

Split-Half Reliability

MAD: r(49) = .99, p < .001
Log RT: r(49) = .98, p < .001
Exceptionally high reliability: the test is internally consistent
Suggests even a shortened version (54 trials) would be sufficiently accurate for screening
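
A split-half reliability check of this kind can be sketched as follows. The odd/even split, the simulated participants, and every name in the block are assumptions for illustration; the summary does not say how the halves were formed, and the data here are synthetic, not the study's.

```python
import math
import random

def circular_deviation(target, response):
    """Distance in semitones between two pitch classes on the 12-step circle."""
    d = (response - target) % 12
    return min(d, 12 - d)

def mad(trials):
    """Mean absolute deviation over (target, response) pairs."""
    return sum(circular_deviation(t, r) for t, r in trials) / len(trials)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def split_half_r(all_trials):
    """Correlate per-participant MAD on even-indexed vs odd-indexed trials."""
    first = [mad(trials[0::2]) for trials in all_trials]
    second = [mad(trials[1::2]) for trials in all_trials]
    return pearson(first, second)

# Synthetic data: 51 hypothetical participants, 108 trials each, with
# response error spread rising from 0 (perfect) to 3 semitones.
rng = random.Random(0)
simulated = []
for p in range(51):
    spread = 3.0 * p / 50
    trials = []
    for i in range(108):
        target = i % 12
        response = (target + round(rng.gauss(0, spread))) % 12
        trials.append((target, response))
    simulated.append(trials)

r = split_half_r(simulated)
```

Each half contains 54 trials, matching the shortened version suggested above.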

Age of Training Onset

AP group started training significantly earlier: M = 5.46 years vs NAP M = 6.95 years; t(43) = 2.52, p = .02
MAD significantly correlated with training onset: r(43) = .46, p = .01
% correct correlated with onset: r(43) = .44, p = .002
Log RT correlated with onset: r(42) = .40, p = .007
Consistent with early-learning theory (Takeuchi & Hulse 1993)

💡 Why Scoring Method Matters

A critical methodological contribution of this paper is demonstrating how scoring strategy creates or destroys the appearance of bimodality:

Strict % correct: Only counts exact chroma matches → artificially sharpens the gap between AP and non-AP, creating apparent bimodality
Semitone credit (3/4 point for ±1 semitone): Diminishes distinction between high performers who perform perfectly and those who are consistently close
Mean Absolute Deviation (MAD): Most informative single measure; captures how close responses are on average, from 0 (perfect) to about 3 semitones (the expected value for random responding across 12 pitch classes)
MAD + Reaction Time combined: Best overall descriptor, as it penalizes time-consuming alternative strategies (relative pitch calculations)
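
The three accuracy scores can be compared on toy response patterns. This is a minimal sketch under stated assumptions: pitch classes are coded 0 to 11, deviation is measured around the 12-step circle, and the two response patterns are hypothetical, not data from the study.

```python
def circular_deviation(target, response):
    """Distance in semitones between two pitch classes on the 12-step circle."""
    d = (response - target) % 12
    return min(d, 12 - d)

def percent_correct(trials):
    """Strict scoring: only exact chroma matches count."""
    hits = sum(1 for t, r in trials if circular_deviation(t, r) == 0)
    return 100 * hits / len(trials)

def semitone_credit(trials, near_credit=0.75):
    """Full credit for exact matches, 3/4 point for errors of one semitone."""
    total = 0.0
    for t, r in trials:
        d = circular_deviation(t, r)
        total += 1.0 if d == 0 else near_credit if d == 1 else 0.0
    return 100 * total / len(trials)

def mad(trials):
    """Mean absolute deviation in semitones."""
    return sum(circular_deviation(t, r) for t, r in trials) / len(trials)

# Hypothetical pattern 1: always exactly one semitone sharp
always_near = [(t, (t + 1) % 12) for t in range(12)]
# Hypothetical pattern 2: every pitch class given equally often (chance)
chance = [(0, r) for r in range(12)]

for name, trials in (("always_near", always_near), ("chance", chance)):
    print(name, percent_correct(trials), semitone_credit(trials), mad(trials))
```

Strict scoring rates the consistently close responder below chance (0% versus about 8.3%), while MAD places that responder at 1 semitone against 3 for chance. The chance value follows from averaging the circular deviations 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6.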

Conclusion: The bimodal distribution sometimes reported in AP literature may be an artifact of strict scoring methods, not a genuine feature of the underlying ability distribution.

💬 Critical Analysis

Strengths

Novel computerized interface eliminates keyboard familiarity bias
Captures both accuracy AND speed (multidimensional assessment)
Exceptional split-half reliability (r = .99)
Separates chroma and octave judgments
Controls for loudness cues (3 intensity levels)
Uses synthetic stimuli (no instrument-familiarity confound)
Rigorous statistical analysis with reclassification of outliers
Directly addresses long-standing controversy about bimodality vs continuum

Limitations

N=51 (moderate sample; larger would better characterize intermediate zone)
Self-selected participants (motivated musicians from university programs)
Montreal recruitment only (cultural/linguistic homogeneity)
No test-retest reliability data (only split-half)
No non-musician comparison group
Synthetic stimuli may underestimate AP for instrument-familiar tones
No longitudinal component (single session snapshot)

Impact & Legacy

Foundational methodology paper. The computerized test designed here became the basis for subsequent AP research at the Zatorre Lab and influenced the field’s move toward more rigorous, multidimensional AP assessment. Bermudez & Zatorre’s approach directly inspired Bairnsfather et al.’s (2025) systematic review calling for a standardized AP phenotyping task.

The key insight, that AP exists on a continuum with meaningful intermediate levels, has been supported by subsequent work and is now widely regarded as the consensus view in the field.

🔗 Access & Resources

📄 Full Text

University of California Press

📊 Citation

DOI: 10.1525/mp.2009.27.2.89
Journal: Music Perception, Vol. 27, Issue 2, pp. 89–101
ISSN: 0730-7829 (print), 1533-8312 (electronic)
Affiliation: Montreal Neurological Institute & BRAMS Laboratory, McGill University