Auditory Working Memory Predicts Individual Differences in Absolute Pitch Learning
Adults can improve AP categorization in a single session, and auditory working memory predicts who succeeds
Study Overview
Stephen C. Van Hedger, Shannon L. M. Heald, Rachelle Koch, Howard C. Nusbaum
Cognition, 140, 95–110
2015 (online April; print July)
Exp 1: N=17 · Exp 2: N=29 (UChicago adults, no AP)
Core Finding
Across two experiments, individual differences in auditory working memory (WM) significantly predicted how well adults learned absolute pitch categories in a single laboratory session, even when controlling for age of musical training onset and overall musical experience.
Mediation result: The well-known relationship between early musical training and AP ability is statistically mediated by auditory WM. This reframes the "critical period" interpretation: early training may matter mostly because it shapes general auditory WM, which in turn enables AP category learning later.
Practical implication: High-WM individuals (>1 SD above the mean) reached 43.8% rote accuracy and 30.5% generalization accuracy after a single session, comparable to the post-training scores of the valproate group in Gervain et al. (2013), which had used a full week of pharmacologically-assisted training.
Study Design
Experiment 1: Implicit Note Memory as the WM measure
Participants
- N=17 University of Chicago students (M=20.6 years, SD=2.6, range 18–26)
- No reported AP, variable musical experience (M=7.4 years, SD=4.8, range 0–14)
- Not specifically recruited for musical background
Working memory measure: Implicit Note Memory (INM) task
- Hear a 250-ms sine target tone → masked by 1000 ms of white noise → reproduce by adjusting a starting note (1–7 semitones above/below) using on-screen arrows in 33-cent steps
- 64 trials total (4 target notes × 8 starting notes × 2 repetitions)
- Score = absolute deviation in 33-cent steps from target (lower = better WM precision)
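As a rough illustration of the scoring rule above, the sketch below converts a tuning error to cents and averages absolute deviations in 33-cent steps. Averaging across trials, the function names, and the example trial values are assumptions made for illustration; the summary only states that the score is the absolute deviation in 33-cent steps.

```python
import math

CENTS_PER_STEP = 33  # step size stated in the task description


def cents_between(f_response_hz: float, f_target_hz: float) -> float:
    """Signed distance in cents from target to response (1200 cents = 1 octave)."""
    return 1200.0 * math.log2(f_response_hz / f_target_hz)


def inm_score(final_steps_from_target: list[int]) -> float:
    """Mean absolute deviation in 33-cent steps (lower = more precise).

    Averaging over trials is an assumption, not a detail taken from the paper.
    """
    return sum(abs(s) for s in final_steps_from_target) / len(final_steps_from_target)


# Hypothetical final positions on six trials, in steps relative to the target
trials = [0, 2, -1, 3, -2, 1]
score = inm_score(trials)
print(f"INM score: {score:.2f} steps ({score * CENTS_PER_STEP:.1f} cents)")
```

Because the score is an error measure, a negative regression coefficient for INM means that more precise listeners learned the note categories better.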
Explicit pitch-labeling task
- Pretest: 180 isolated piano notes (one octave, C4–B4), label by note name on keyboard, no feedback
- Training: 180 piano notes (3 blocks × 60), same task with feedback after each trial
- Rote posttest: 60 notes (same pitches), no feedback
- Generalization posttest: 48 notes spanning untrained octaves (C3–B3, C5–B5) and untrained timbres (acoustic guitar in original octave)
- Each trial separated by 1000 ms white noise + 2000 ms scrambled piano tones to prevent relative-pitch strategies
Retest (n=6)
- 6 of the 17 participants returned for delayed retest at M=184 days (~6 months) post-training
- Abridged rote posttest (48 trials) + full generalization posttest (48 trials), no feedback
- No reported rehearsal or retraining between sessions
Experiment 2: Auditory n-back as the WM measure (non-musical)
Participants
- N=30 (1 excluded because pretest performance suggested existing AP); analyzed N=29
- UChicago students, staff, and community members (M=22.0 years, SD=4.2, range 18–32; 19 male)
- M=4.6 years music experience (SD=6.0, range 0–26), significantly less musical training than Exp 1 (p=0.02)
Working memory measure: Auditory n-back (ANB)
- Spoken-letter stream, ISI 3000 ms; press "Target" if current letter matches the one n trials back
- Both 2-back and 3-back versions (in that order), 90 trials each (3 runs ร 30 letters)
- 30-trial practice round before each test version
- Score = d-prime (signal detection theory) per task
- Critical: non-musical, non-pitch task that tests general auditory WM, not pitch-specific memory
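The d-prime score used for the n-back can be sketched as the difference between the z-transformed hit rate and false-alarm rate. The log-linear correction (adding 0.5 to each count) is a common convention that keeps rates away from 0 and 1; it is an assumption here, since the summary does not say how extreme rates were handled, and the response counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Signal-detection sensitivity: z(hit rate) - z(false-alarm rate).

    Uses a log-linear correction (an assumption, not taken from the paper)
    so that perfect or empty cells do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical 90-trial run: 20 targets and 70 non-targets
print(f"d' = {d_prime(hits=16, misses=4, false_alarms=5, correct_rejections=65):.2f}")
```

A listener who responds "Target" equally often to targets and non-targets gets a d-prime of zero, so the measure separates sensitivity from a general tendency to press the button.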
Explicit pitch-labeling task
- Pretest: 48 trials (12 piano notes × 4 reps, randomized)
- Training: 120 trials (12 × 5 × 2 blocks) with audio + visual feedback
- Rote posttest: 60 trials (12 × 5), no feedback
- Generalization posttest: 48 notes beyond trained timbre + octave range (parallel to Exp 1)
Key Results
Training Effects: Experiment 1 (N=17)
| Measure | Mean accuracy | SD | vs Pretest |
|---|---|---|---|
| Pretest (1 octave, piano) | 13.7% | 10.8% | n/a |
| Rote posttest (same notes) | 36.2% | 19.4% | t(16)=5.35, p<0.001 |
| Generalization posttest (untrained octaves + timbres) | 21.7% | 15.3% | t(16)=2.31, p<0.05 |
Both rote (t(16)=5.91, p<0.001) and generalization (t(16)=3.60, p=0.002) posttests were significantly above chance (8.33%, i.e. 1/12).
Training Effects: Experiment 2 (N=29)
| Measure | Mean accuracy | SD | vs Pretest |
|---|---|---|---|
| Pretest (1 octave, piano) | 10.9% | 14.5% | n/a |
| Rote posttest | 25.8% | 22.3% | t(28)=−4.11, p<0.001 |
| Generalization posttest | 15.4% | 12.8% | t(28)=4.73, p<0.001 |
Effects were smaller than in Exp 1 (consistent with the lower musical experience of the Exp 2 sample), but rote and generalization accuracy were both reliably above chance (rote t(28)=4.21, p<0.001; generalization t(28)=2.98, p=0.006).
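The above-chance tests reported for both experiments can be approximately recomputed from the tabulated means, SDs, and sample sizes with a one-sample t statistic, t = (M − chance) / (SD / √N). The sketch below is a sanity check on the summary statistics, not the authors' analysis script; because the inputs are rounded, the results agree with the reported values only to within about 0.01.

```python
import math

CHANCE_PCT = 100 / 12  # 12 note categories, so chance accuracy is 8.33%


def t_vs_chance(mean_pct: float, sd_pct: float, n: int, chance: float = CHANCE_PCT) -> float:
    """One-sample t statistic comparing mean accuracy against chance."""
    standard_error = sd_pct / math.sqrt(n)
    return (mean_pct - chance) / standard_error


# (mean %, SD %, N) copied from the two results tables
posttests = {
    "Exp 1 rote": (36.2, 19.4, 17),
    "Exp 1 generalization": (21.7, 15.3, 17),
    "Exp 2 rote": (25.8, 22.3, 29),
    "Exp 2 generalization": (15.4, 12.8, 29),
}

for label, (mean_pct, sd_pct, n) in posttests.items():
    print(f"{label}: t({n - 1}) = {t_vs_chance(mean_pct, sd_pct, n):.2f}")
```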
Working Memory Predicts AP Learning
Experiment 1 (INM as WM measure):
- INM score significantly predicted explicit AP learning (β=−1.065, SE=0.254, p<0.0001)
- Age of music onset predicted AP in isolation (β=−0.115, p<0.01) but dropped to non-significance when INM was added to the model
- Sobel mediation test: t=−2.16, SE=0.16, p=0.03, indicating that auditory WM mediates the relationship between musical training and AP learning
- Bootstrapped 95% CI of mediation index: [−0.20, −0.08] (does not include zero)
- Adjusted R² = 0.388 (~39% of variance explained by WM + age of music onset)
Experiment 2 (auditory n-back as WM measure):
- Auditory n-back d′ significantly predicted AP learning in isolation (β=0.474, SE=0.183, p<0.01)
- In combined model, ANB retained significance (β=0.413, SE=0.211, p=0.05) while age of music onset did not (β=−0.013, p>0.5)
- Sobel mediation: t=−1.63, SE=0.014, p=0.10 (marginal); bootstrapped 95% CI [−0.074, −0.030] (excludes zero, supporting mediation)
- Adjusted R² = 0.237 (~24% of variance explained)
Convergent finding across both experiments: general auditory WM, whether measured musically (INM) or non-musically (n-back), mediates the link between early musical training and adult AP learning.
Six-month Retention (Exp 1, n=6)
- Average delay: M=184 days (SD=22)
- Rote posttest dropped from ~50% (immediate) to ~38% (delayed); generalization from ~29% to ~24%
- Loss of ~11 percentage points from immediate to delayed
- Rote accuracy remained significantly above chance (t(5)=5.80, p<0.01); generalization was only marginally above chance (t(5)=2.13, p=0.08)
- No participants reported actively rehearsing notes between sessions
- Caveat: small retest sample (n=6) limits strong conclusions about retention
Theoretical Implications
Reframing the "Critical Period"
- Traditional view: AP requires exposure during a critical period (typically before age ~6); adults cannot acquire it
- This study: The age-of-onset effect is statistically mediated by auditory WM. Early training may help mainly because it strengthens domain-general auditory WM, not because of a hard temporal window for AP itself
- Quote (authors, pp. 106–107): "The current set of studies cannot directly address whether post-critical period adults can gain absolute pitch ability that is comparable to 'true' AP ability... However, our finding across two studies that auditory working memory can explain the success of non-AP possessors learning absolute pitch categories supports the notion that intermediate levels of absolute pitch ability... might be best conceptualized as a domain-general perceptual learning task, rather than a specifically musical ability"
Two-Step Model of AP
The authors propose AP can be decomposed into two steps:
- Step 1 (pitch chroma representation): Form a precise perceptual representation of pitch chroma, separable from other attributes (timbre, octave, loudness). This step likely depends on auditory WM ability.
- Step 2 (label association): Assign cultural note labels (e.g., "C", "F#") to those representations. This step depends on explicit training.
The current studies illuminate step 1: high-WM listeners are better at forming a stable pitch representation that can later be labeled, even at adult age.
Domain-General Perceptual Learning
- AP category learning is best framed as perceptual category learning (Goldstone, 1998), not a uniquely musical skill
- Working memory has been shown to predict success in many other category-learning tasks (DeCaro, Thomas, & Beilock, 2008; Lewandowsky et al., 2012)
- AP fits the same mold: implicit/information-integration learning that benefits from high WM for selective attention
Comparison to Gervain et al. (2013)
- The valproate group in Gervain (2013) reached ~28.3% (5.09/18) accuracy after a full week of pharmacologically-assisted training
- In the current study, high-WM individuals (>1 SD above the mean) reached 43.8% rote / 30.5% generalization after a single laboratory session, no drug
- Authors' interpretation: critical-period framing for AP "perhaps... need not be applied"; what looks like a closed window may instead be the consequence of insufficient general auditory WM in average adults
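The "high-WM" subgroup in this comparison is defined as more than 1 SD above the sample mean. A minimal sketch of that cutoff follows, assuming the sample standard deviation is used and that "above the mean" means better performance (so for an error score such as INM the cutoff flips to 1 SD below the mean). The participant IDs and d′ values are hypothetical.

```python
from statistics import mean, stdev


def high_wm_ids(scores: dict[str, float], higher_is_better: bool = True) -> list[str]:
    """Return participants whose WM score is more than 1 SD better than the mean.

    Uses the sample SD (n - 1 denominator); this choice is an assumption.
    """
    values = list(scores.values())
    m, sd = mean(values), stdev(values)
    if higher_is_better:
        return [pid for pid, s in scores.items() if s > m + sd]
    return [pid for pid, s in scores.items() if s < m - sd]


# Hypothetical n-back d' scores for five participants
dprime_scores = {"p1": 1.0, "p2": 1.2, "p3": 0.9, "p4": 2.8, "p5": 1.1}
print(high_wm_ids(dprime_scores))
```

With small samples such as N=17 or N=29, a cutoff like this typically leaves only a handful of participants in the high-WM group, which is worth keeping in mind when reading the subgroup accuracies.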
Connection to Other Research
Theoretical Foundations
- Levitin (1994): Two-component AP theory (memory + labeling), which anticipates the present "two-step" framework
- Ross, Gore, & Marks (2003): Pioneered the implicit note memory paradigm used here as the auditory WM probe
- Deutsch & Dooley (2013): AP possessors have larger auditory digit spans than matched non-AP musicians, which supports the WM/AP link directionally
Direct Successor: Van Hedger 2019 (PLoS ONE)
- The 2019 follow-up uses high auditory WM as a participant-selection criterion, an explicit operational consequence of the 2015 finding
- 2 of 6 high-WM adults achieved genuine AP after 8 weeks of training, validating the predictive role identified here
- Without the 2015 mediation finding, the 2019 selection rule would be unmotivated; the two papers together form a single argumentative chain
Pharmacological Counterpoint: Gervain et al. (2013)
- Argues critical period can be reopened pharmacologically (valproate / HDAC inhibitor)
- The current paper offers an alternative explanation for Gervain's success: high-WM participants in a behavioral protocol can reach comparable performance without drugs
- Suggests valproate's apparent benefit may operate via general WM/attention (auditory n-back has been linked to dopamine release in prefrontal cortex), not by strictly "reopening" a critical period
Limitations & Caveats
Performance below "true" AP
- Best participants reached ~50–70% rote / 30% generalization in a single session, well below the ~85–95% accuracy typical of natural AP possessors
- Authors are careful: this is "intermediate AP" or AP category learning, not full "true" AP
- Cannot directly determine whether longer training would close the gap
Sample size and generalization
- Exp 1 N=17, Exp 2 N=29: small to moderate by perceptual-learning standards
- Both samples drawn from UChicago community (volunteer/convenience)
- Retest sample (n=6) particularly small, so retention claims are tentative
Single-session limit
- One laboratory session (~1 hour) cannot speak to the upper bound of trainable adult AP
- Different from Sakakibara (2014) or Wong (2025) protocols that span weeks/years
WM construct concerns
- Auditory n-back's construct validity as a WM measure has been questioned (Kane et al., 2007); recent work links it more to fluid intelligence than to WM per se
- Authors note INM and ANB share only ~25% variance, suggesting they tap different aspects of WM (quality vs. quantity)
- Could alternative WM measures (RSPAN, OSPAN, reverse digit span) replicate the mediation? Open question
Causal interpretation
- The mediation finding is correlational; it cannot prove that improving WM would improve AP learning
- Possible reverse causality: people born with high auditory ability may have started music earlier because of that ability
Practical Implications
For Adult Learners
- Single-session gains are real: non-AP adults can more than double their pretest accuracy (10–14% → 26–36%) in one session of feedback training
- Individual differences matter: auditory WM is a meaningful predictor; high-WM individuals see substantially larger gains
- Music background is not the only path: when WM is controlled, age of music onset loses predictive power
- Retention exists: even after ~6 months without rehearsal, gains remain above chance (small sample caveat)
For Researchers
- Selection design: screen participants by auditory WM before training, which reduces variance and increases statistical power (this is what Van Hedger 2019 did)
- Mediation testing: always include WM as a covariate when reporting age-of-onset effects on AP learning
- Combined interventions: WM training (e.g., n-back training) plus AP training may compound; this is untested but theoretically motivated
For Music Educators
- Pre-screening for auditory WM may help identify students with the highest chance of acquiring AP through training
- The framing "critical period closed" may be misleading for adult students; recast as "general perceptual capacity" instead
Methodology Details
Stimuli
- INM task (Exp 1): 250-ms sine waves, 4 target notes (F#4, G4, G#4, A4), 8 starting tones (D4, D#4, E4, F4 below; A#4, B4, C5, C#5 above), 33-cent step resolution
- Pitch-labeling task: Real instrumental notes sampled from Reason 4.0 software + Adobe Audition recordings, normalized to 75 dB SPL, 44.1 kHz
- Generalization stimuli (both Exp): 12 piano notes original octave (C4–B4) + 12 piano notes higher octave (C5–B5) + 12 piano notes lower octave (C3–B3) + 12 acoustic guitar notes original octave
- Inter-trial masking: 1000 ms white noise + 2000 ms of 16 randomized scrambled piano notes (suppresses relative pitch use)
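The INM stimulus set listed above can be checked for internal consistency: with the four targets and eight starting tones given, every start lies 1 to 7 semitones from its target, and 4 × 8 × 2 repetitions yields the 64 trials described in the design. The sketch below assumes equal temperament with A4 = 440 Hz, a tuning reference the summary does not state, and the helper names are illustrative.

```python
NOTE_OFFSETS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}


def midi_number(name: str) -> int:
    """Convert a note name such as 'F#4' to its MIDI number (A4 = 69)."""
    pitch_class, octave = name[:-1], int(name[-1])
    return 12 * (octave + 1) + NOTE_OFFSETS[pitch_class]


def frequency_hz(name: str, a4_hz: float = 440.0) -> float:
    """Equal-tempered frequency; the 440 Hz reference is an assumption."""
    return a4_hz * 2 ** ((midi_number(name) - 69) / 12)


targets = ["F#4", "G4", "G#4", "A4"]
starts = ["D4", "D#4", "E4", "F4", "A#4", "B4", "C5", "C#5"]

# Semitone distance between every starting tone and every target
distances = [abs(midi_number(s) - midi_number(t)) for t in targets for s in starts]
print("semitone range:", min(distances), "to", max(distances))
print("trials with 2 repetitions:", len(distances) * 2)
print("target frequencies (Hz):", [round(frequency_hz(t), 1) for t in targets])
```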
Apparatus
- Sennheiser HD280 studio monitor headphones
- 1280×1024 monitor, 75 Hz refresh rate
- INM run in Psychophysics Toolbox (MATLAB); explicit pitch-labeling run in E-Prime
Statistical Analysis
- Repeated-measures ANOVA for pretest vs rote vs generalization comparisons; Fisher's LSD post-hoc
- Generalized mixed-effects models (binomial link) with WM score and age of music onset as fixed effects, participant and stimulus note as random effects
- Mediation: Sobel test + bootstrapped indirect effects (10,000 samples; Preacher & Hayes 2008)
- One-sample t-tests for above-chance comparisons (chance = 1/12 = 8.33%)
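For readers unfamiliar with the Sobel test named above, a minimal sketch follows. It divides the indirect effect a·b (predictor to mediator, times mediator to outcome) by its first-order standard error and refers the result to a standard normal distribution. The path coefficients in the example are hypothetical, not values from the paper; the last lines only convert the reported Sobel statistics into two-sided p values, which come out close to the reported 0.03 and 0.10.

```python
import math
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard-normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def sobel_test(a: float, se_a: float, b: float, se_b: float) -> tuple[float, float]:
    """Sobel test for an indirect effect a*b.

    a: effect of the predictor on the mediator; b: effect of the mediator
    on the outcome, controlling for the predictor.
    """
    indirect = a * b
    se_indirect = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = indirect / se_indirect
    return z, two_sided_p(z)


# Hypothetical path estimates, for illustration only
z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(f"hypothetical example: z = {z:.2f}, p = {p:.4f}")

# Reported Sobel statistics from Exp 1 and Exp 2
for stat in (-2.16, -1.63):
    print(f"statistic {stat}: two-sided p = {two_sided_p(stat):.2f}")
```

The Sobel test assumes the indirect effect is normally distributed, which is often doubtful in small samples; that is the usual motivation for also reporting bootstrapped confidence intervals, as the authors did.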
Future Directions Identified by the Authors
- Extended training: Can high-WM adults reach "true" AP levels with longer protocols? (Answered partially by Van Hedger 2019 and Wong 2020/2025)
- WM training transfer: Would n-back or similar WM training programs improve AP trainability?
- Genuine AP populations: Test whether auditory WM also explains within-AP variability among "true" AP possessors
- Mechanism studies: Investigate whether valproate's apparent boost in Gervain (2013) operates via WM/attention enhancement rather than through critical-period reopening per se
- Construct refinement: Compare INM, n-back, RSPAN, OSPAN, reverse digit span as WM proxies for AP learning
Key Takeaways
Core Result
Across two experiments, auditory working memory significantly predicts how well non-AP adults learn absolute pitch categories, a result replicated with two different WM measures (INM and n-back).
Mediation
The age-of-musical-onset effect is statistically mediated by auditory WM. Early training may matter mainly because it shapes general WM, not because of a strict critical period for AP.
Single-session gains
Adults more than doubled their accuracy from pretest to rote posttest in one ~1-hour session, with rote accuracy still above chance at ~6 months (small sample).
Two-step model
AP = (1) high-resolution pitch chroma representation + (2) cultural label assignment. WM enables step 1.
vs. Valproate
High-WM adults in a single behavioral session matched or exceeded the post-training scores of the valproate group in Gervain et al. (2013), without any drug.
Foundation for 2019
Direct theoretical and methodological foundation for Van Hedger et al. (2019), which used high WM as a participant-selection criterion and produced 2 genuine AP achievers in 8 weeks.
Citation
Van Hedger, S. C., Heald, S. L. M., Koch, R., & Nusbaum, H. C. (2015). Auditory working memory predicts individual differences in absolute pitch learning. Cognition, 140, 95–110. https://doi.org/10.1016/j.cognition.2015.03.012