pitch-shift

Canonical pitch-shifting algorithms in functional JavaScript. Frequency-domain algorithms (vocoder, phaseLock, transient, formant, paulstretch, sms, hpss) shift bins natively; time-domain algorithms (ola, wsola, psola, granular) apply their namesake stretcher from time-stretch, then an anti-aliased sinc resample. One API covers batch, streaming, and multi-channel use. Part of the audiojs ecosystem.

Install

npm install pitch-shift

Usage

import pitchShift, { phaseLock, transient, psola, formant, wsola } from 'pitch-shift'

// Auto-select an algorithm from content hints
let auto = pitchShift(audio, { semitones: 5, content: 'voice' })

// Batch processing
let pitched = phaseLock(audio, { ratio: 1.5 })  // pitch up by factor of 1.5

// Streaming (real-time)
let write = phaseLock({ ratio: 1.5 })
let output = write(inputBlock)
let tail = write()  // flush

// Separate-channel stereo
let stereo = phaseLock([left, right], { ratio: 1.5 })

Algorithms

Each algorithm is a canonical pitch-shift implementation with its own character. Frequency-domain algorithms shift bins natively; time-domain algorithms run their namesake stretcher from time-stretch and then an anti-aliased sinc resample, which is the canonical form for time-domain pitch shifting.

Algorithm	Domain	Form	Best for
`ola`	Time	OLA time-stretch + sinc resample. Plain overlap-add without similarity search — the baseline the others improve on.	Baseline / general
`vocoder`	STFT	Bin-shift phase vocoder (SMB/Bernsee). True instantaneous frequency per bin, loudest-wins scatter, synthesis phase accumulation.	Simple tonal material
`phaseLock`	STFT	Laroche-Dolson peak-locked vocoder. Peaks get independent frequency shift; non-peak bins preserve phase offset relative to the nearest peak.	General music
`transient`	STFT	Peak-locked vocoder with spectral-flux transient detection. On transient frames, synthesis phase resets to analysis phase, preserving attacks.	Music with percussion
`psola`	Time	PSOLA time-stretch (autocorrelation period → pitch-mark grains) + sinc resample. Formants preserved in the stretch stage.	Speech, monophonic voice
`wsola`	Time	WSOLA time-stretch (per-grain similarity search ±`tolerance`) + sinc resample. Clean time-domain shift without FFT.	Speech, low-latency
`granular`	Time	OLA time-stretch with small grains (1024) + sinc resample. Grain rate is audible — the texture is the point.	Creative textures
`formant`	STFT	Cepstral envelope preservation. Flatten spectrum by the real-cepstrum envelope, vocoder-shift the flat residual, re-impose the envelope.	Voice (preserves formants)
`paulstretch`	STFT	Large-frame phase randomization. Magnitudes gathered from `k/ratio`; phases drawn uniformly from `[0, 2π)`.	Ambient, extreme shifts
`sms`	Sinusoidal	Peak-scaled Spectral Modeling Synthesis. Parabolic-interpolated peak picking → sinusoidal lobes shifted to `round(f·ratio)`, stochastic residual preserved.	Harmonic/tonal
`hpss`	STFT	Fitzgerald median-filter harmonic/percussive separation. Time-axis median → harmonic estimate; freq-axis median → percussive estimate; soft mask; vocoder-shift the harmonic part, pass-through the percussive.	Mixed music (drums+tonal)
`sample`	Time	Playback-rate pitch shift. Hann-windowed sinc interpolation at fractional read-head stepped by `ratio` per output sample. No time preservation — higher pitch = shorter clip (zero-padded tail).	Sampler/tracker instrument playback
`hybrid`	Hybrid	Crossfade between `phaseLock` (frequency domain) and `wsola` (time domain), weighted by per-sample spectral-flux transient confidence. Tonal regions resolve via the phase vocoder; attacks resolve via WSOLA similarity search.	Mixed dynamic material

Choosing an algorithm

Each algorithm preserves a different invariant and surrenders the rest. No single one wins everywhere: the reason to reach for one over another is what it keeps intact by construction and what it gives up in exchange. The guide below lists those trade-offs for each canonical form.

ola — OLA time-stretch + sinc resample. Plain overlap-add without similarity search — the baseline the others improve on. Preserves pitch accuracy, amplitude envelope. Destroys formants (shifted by the resample), phase coherence across long spans, transients (grain-rate phase cancellation). Reach for the simplest possible pitch shift, or as a reference to compare against.

vocoder — SMB/Bernsee bin-shift. Recovers the true instantaneous frequency of each bin from the consecutive-frame phase advance, then re-accumulates synthesis phase at the shifted frequency. Preserves dominant-partial pitch and long-horizon phase for each bin independently. Destroys transients (smeared across the frame), vertical phase coherence between adjacent bins ("phasiness"), formants. Reach for simple tonal material and minimal correct spectral pitch shift.
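
The instantaneous-frequency step described above can be sketched in a few lines. This is a minimal illustration, not the package's code; the function names `wrapPhase` and `binTrueFrequency` are hypothetical, and the estimate is only valid while the true frequency stays within a couple of bins of `k`.

```javascript
// Wrap an angle into the principal range around zero
function wrapPhase(phase) {
  return phase - 2 * Math.PI * Math.round(phase / (2 * Math.PI))
}

// Estimate the true frequency (Hz) of STFT bin k from the phase
// measured in two consecutive frames that are hopSize samples apart.
function binTrueFrequency(prevPhase, curPhase, k, frameSize, hopSize, sampleRate) {
  // Phase advance of a sinusoid sitting exactly on the bin centre
  let expected = 2 * Math.PI * k * hopSize / frameSize
  // How far the measured advance deviates from that, wrapped
  let deviation = wrapPhase(curPhase - prevPhase - expected)
  // Deviation as a fractional bin offset, then converted to Hz
  let trueBin = k + deviation * frameSize / (2 * Math.PI * hopSize)
  return trueBin * sampleRate / frameSize
}
```

A 440 Hz sine at 44.1 kHz with a 2048-sample frame lands nearest bin 20, whose centre is about 430.7 Hz; the phase advance recovers the remaining offset.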

phaseLock — Laroche-Dolson peak-locked vocoder. Locks non-peak bins' synthesis phase to the nearest peak's, keeping the vertical phase relationship inside each sinusoidal lobe intact. Preserves phase coherence around each peak, partial structure, pitch accuracy. Destroys transients (still smeared, less than vocoder), formants. Reach for general music — the "try this first" phase vocoder.
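
The locking rule, where each non-peak bin keeps its analysis phase offset relative to the nearest peak, might look like the sketch below. It assumes peaks are plain local maxima and that the caller supplies the synthesis phase of each peak through a hypothetical `peakSynthesisPhase(bin)` callback; real implementations usually add a minimum peak height and a wider neighbourhood.

```javascript
// Return synthesis phases in which every bin is locked to its nearest peak.
function lockToPeaks(mag, analysisPhase, peakSynthesisPhase) {
  // Local maxima of the magnitude spectrum act as peaks
  let peaks = []
  for (let k = 1; k < mag.length - 1; k++) {
    if (mag[k] > mag[k - 1] && mag[k] >= mag[k + 1]) peaks.push(k)
  }
  let out = Float64Array.from(analysisPhase)
  if (!peaks.length) return out // nothing to lock to

  let p = 0
  for (let k = 0; k < mag.length; k++) {
    // Move on to the next peak once it is strictly closer (ties keep the lower peak)
    while (p + 1 < peaks.length &&
           Math.abs(peaks[p + 1] - k) < Math.abs(peaks[p] - k)) p++
    let peak = peaks[p]
    // Peak's synthesis phase plus this bin's original offset from the peak
    out[k] = peakSynthesisPhase(peak) + analysisPhase[k] - analysisPhase[peak]
  }
  return out
}
```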

transient — phaseLock plus spectral-flux transient detection. On flagged frames the synthesis phase snaps back to the analysis phase so the attack shape re-emerges verbatim. Preserves everything phaseLock preserves, plus attack localization on detected transients. Destroys formants; misses quiet transients at a too-high threshold and smears them. Reach for music with percussion where phaseLock alone loses the attack.
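
As a rough illustration of spectral-flux detection with a z-score over a running average of log-flux, consider the sketch below. The smoothing factor `alpha` and the exact statistics are assumptions for the example, and `makeFluxDetector` is a hypothetical name; only the default threshold of 1.5 comes from the options listed in this document.

```javascript
// Returns a function that takes one magnitude frame and reports
// whether it looks like a transient.
function makeFluxDetector({ threshold = 1.5, alpha = 0.1 } = {}) {
  let prev = null, mean = 0, variance = 0, frames = 0
  return function detect(mag) {
    // Spectral flux: sum of positive magnitude increases since the last frame
    let flux = 0
    if (prev) {
      for (let k = 0; k < mag.length; k++) {
        let d = mag[k] - prev[k]
        if (d > 0) flux += d
      }
    }
    prev = Float64Array.from(mag)
    let logFlux = Math.log1p(flux)
    // Score against the running statistics before updating them
    let z = frames > 1 ? (logFlux - mean) / Math.sqrt(variance + 1e-6) : 0
    // Exponential moving average of mean and variance
    let delta = logFlux - mean
    mean += alpha * delta
    variance = (1 - alpha) * (variance + alpha * delta * delta)
    frames++
    return z > threshold
  }
}
```

On a flagged frame, the algorithm described above would reset synthesis phase to analysis phase; the detector itself only produces the flag.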

psola — PSOLA time-stretch (autocorrelation period contour → pitch-synchronous grains) + sinc resample. The stretch stage copies vocal periods verbatim, preserving formant shape; the resample stage changes pitch. Preserves waveform-per-period shape, attack localization, voiced-speech naturalness. Destroys polyphony (assumes a single pitch contour), unvoiced regions (pitch-mark jitter). Reach for monophonic speech, solo voice, or a single melodic instrument with formant structure.

wsola — WSOLA time-stretch (per-grain similarity search ±tolerance) + sinc resample. The similarity search eliminates grain-rate phase cancellation, producing a clean time-domain pitch shift without FFT. Preserves local waveform shape, attack envelopes, pitch accuracy. Destroys formants (shifted by the resample), phase coherence across long spans. Reach for low-latency speech, or anywhere the phase vocoder's frame is unacceptable.
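
The per-grain similarity search can be pictured as follows: around the nominal read position, try every offset within ±tolerance and keep the one whose segment best matches a target (typically the natural continuation of the previous grain). This is a brute-force sketch with a hypothetical `bestOffset` helper and an energy-normalized correlation score, not the time-stretch package's implementation.

```javascript
// Find the offset d in [-tolerance, tolerance] so that
// signal[nominal + d ...] best matches target.
function bestOffset(signal, target, nominal, tolerance) {
  let best = 0, bestScore = -Infinity
  for (let d = -tolerance; d <= tolerance; d++) {
    let start = nominal + d
    if (start < 0 || start + target.length > signal.length) continue
    let dot = 0, energy = 0
    for (let i = 0; i < target.length; i++) {
      dot += signal[start + i] * target[i]
      energy += signal[start + i] * signal[start + i]
    }
    // Normalize by candidate energy so loud segments do not win by default
    let score = dot / Math.sqrt(energy + 1e-12)
    if (score > bestScore) { bestScore = score; best = d }
  }
  return best
}
```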

granular — OLA time-stretch with small grains (1024) + sinc resample. No similarity search, no pitch sync. The grain rate is clearly audible — the texture is the point. Preserves grain-local timbre and a characteristic textural quality. Destroys pitch accuracy on complex tones, smooth envelopes. Reach for creative/textural effects where the grain character is the point.

formant — Cepstral envelope preservation wrapping a vocoder shift. Lifter-flatten the spectrum by its real-cepstrum envelope, shift the flat residual in bin space, re-impose the envelope unchanged. Preserves formant envelope (absolute Hz), vocal-tract character. Destroys what vocoder destroys (transients smear), risks cepstral ringing on very noisy or very sparse spectra. Reach for voice shifting without the chipmunk/giant artifact.
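
To make the lifter step concrete, here is a small sketch that computes a real-cepstrum envelope directly from a half-spectrum of magnitudes using cosine sums (no FFT, so it is slow but easy to verify). The name `cepstralEnvelope` and the direct-sum approach are illustrative assumptions; `width` plays the role of the lifter cutoff.

```javascript
// mag holds bins 0..N/2 of a magnitude spectrum. Returns the smooth
// envelope obtained by keeping only the first `width` cepstral coefficients.
function cepstralEnvelope(mag, width) {
  let half = mag.length - 1, N = 2 * half
  let logMag = Float64Array.from(mag, (m) => Math.log(Math.max(m, 1e-12)))

  // Real cepstrum of the (symmetric) log spectrum, low quefrencies only
  let ceps = new Float64Array(width)
  for (let n = 0; n < width; n++) {
    let sum = logMag[0] + logMag[half] * Math.cos(Math.PI * n)
    for (let k = 1; k < half; k++) {
      sum += 2 * logMag[k] * Math.cos(2 * Math.PI * k * n / N)
    }
    ceps[n] = sum / N
  }

  // Transform the liftered cepstrum back to the frequency axis
  return Float64Array.from(mag, (_, k) => {
    let sum = ceps[0]
    for (let n = 1; n < width; n++) {
      sum += 2 * ceps[n] * Math.cos(2 * Math.PI * k * n / N)
    }
    return Math.exp(sum)
  })
}
```

Dividing `mag` by this envelope gives the flat residual; multiplying the shifted residual by the same envelope puts the formants back where they were.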

paulstretch — Large-frame phase randomization. Magnitudes are gathered from source bins at k/ratio; phases are redrawn uniformly from [0, 2π) every frame. Preserves long-term magnitude-spectrum statistics. Destroys phase, transients, any rhythmic micro-structure — by design. Stream-vs-batch decorrelates inherently, which is why the metric is marked —. Reach for ambient/drone textures and extreme shift ratios where the smear is the aesthetic.
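
A single synthesis frame of this scheme could be sketched as below. Linear interpolation between source bins is an assumption made for the example (the text only says magnitudes are gathered from k/ratio), and `paulstretchFrame` is a hypothetical name.

```javascript
// Build one output spectrum: magnitude read from bin k/ratio,
// phase drawn uniformly from [0, 2π).
function paulstretchFrame(mag, ratio, random = Math.random) {
  let re = new Float64Array(mag.length)
  let im = new Float64Array(mag.length)
  for (let k = 0; k < mag.length; k++) {
    let src = k / ratio
    let i = Math.floor(src), frac = src - i
    // Bins beyond the spectrum contribute nothing
    let a = i < mag.length ? mag[i] : 0
    let b = i + 1 < mag.length ? mag[i + 1] : 0
    let m = a + frac * (b - a)
    let phase = 2 * Math.PI * random()
    re[k] = m * Math.cos(phase)
    im[k] = m * Math.sin(phase)
  }
  return { re, im }
}
```

Passing a seeded generator as `random` makes the output repeatable, which is useful in tests even though the effect itself is random by design.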

sms — Peak-scaled Spectral Modeling Synthesis. Parabolic-interpolated peak picking builds a small track list of (freq, mag, phase) triples; each peak's lobe is copied intact to round(f·ratio); the stochastic residual is left unshifted. Preserves formant envelope (lobes scale freely with their peaks), harmonic structure, tonal clarity. Destroys transients, noise-like textures (absorbed into the residual), polyphonic material beyond maxTracks. Reach for sustained tonal/harmonic instruments and vowels where envelope matters.
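
Parabolic interpolation refines a peak found at integer bin `k` by fitting a parabola through the bin and its two neighbours. The sketch below shows the standard three-point formula; `parabolicPeak` is an illustrative name, and many implementations apply it to log magnitudes rather than linear ones.

```javascript
// Refine the location and height of a spectral peak at integer bin k.
function parabolicPeak(mag, k) {
  let a = mag[k - 1], b = mag[k], c = mag[k + 1]
  let denom = a - 2 * b + c
  // Vertex of the parabola through the three points, as an offset from k
  let offset = denom === 0 ? 0 : 0.5 * (a - c) / denom
  return {
    bin: k + offset,
    value: b - 0.25 * (a - c) * offset,
  }
}
```

The fractional bin converts to Hz as `bin * sampleRate / frameSize`; that frequency times `ratio` is what the shifted lobe is aimed at.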

hpss — Fitzgerald 2010 median-filter harmonic/percussive separation. Time-axis median → harmonic-friendly view; freq-axis median → percussive-friendly view; soft Wiener mask at exponent p splits the spectrogram. The harmonic component is vocoder-shifted; the percussive component passes through at its original phase. Preserves percussive onset locations (unshifted) and harmonic pitch (shifted). Destroys a little signal quality to mask leakage in both directions on ambiguous material. Reach for mixed music where drums and tonal content coexist and the kit should stay stationary while the melody moves.
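
The median-filter separation reduces to two medians per time-frequency cell and a soft mask. The following brute-force sketch assumes `spec` is an array of magnitude frames; `harmonicMask` is a hypothetical name, and the defaults mirror the 17-frame, 17-bin, exponent-2 values listed in the options.

```javascript
function median(values) {
  let s = Array.from(values).sort((a, b) => a - b)
  let m = s.length >> 1
  return s.length % 2 ? s[m] : 0.5 * (s[m - 1] + s[m])
}

// Returns, per frame and bin, the share of energy assigned to the
// harmonic component (1 = fully harmonic, 0 = fully percussive).
function harmonicMask(spec, { timeWidth = 17, freqWidth = 17, power = 2 } = {}) {
  let T = spec.length, K = spec[0].length
  let ht = timeWidth >> 1, hf = freqWidth >> 1
  return spec.map((frame, t) => Float32Array.from(frame, (_, k) => {
    let alongTime = [], alongFreq = []
    for (let i = Math.max(0, t - ht); i <= Math.min(T - 1, t + ht); i++) alongTime.push(spec[i][k])
    for (let j = Math.max(0, k - hf); j <= Math.min(K - 1, k + hf); j++) alongFreq.push(frame[j])
    // Stable-in-time energy survives the time median; broadband clicks
    // survive the frequency median
    let h = median(alongTime) ** power
    let p = median(alongFreq) ** power
    return h / (h + p + 1e-12)
  }))
}
```

The percussive mask is simply one minus this value, so the two components sum back to the original spectrogram.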

sample — Playback-rate pitch shift: Hann-windowed sinc interpolation at a fractional read-head stepped by ratio per output sample. The intuition hardware samplers and tracker modules run on. Preserves waveform identity (literally the same audio, faster or slower) and formants trivially — everything scales together. Destroys time: output duration is input_length / ratio, and the tail is zero-padded to keep the unified API. Reach for instrument one-shots, ROM-sample playback, any context where "higher pitch = shorter clip" is the intended effect.
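
A playback-rate read with a Hann-windowed sinc kernel can be sketched as below. This is an illustration under simplifying assumptions: `samplePlayback` is a hypothetical name, the kernel cutoff is not lowered for ratios above 1 (so the sketch itself is not anti-aliased), and the output keeps the input length with a zero-padded tail as the text describes.

```javascript
function samplePlayback(input, ratio, sincRadius = 8) {
  let out = new Float32Array(input.length) // same length, tail stays zero
  for (let n = 0; n < out.length; n++) {
    let pos = n * ratio                    // fractional read-head
    if (pos > input.length - 1) break      // ran past the source
    let center = Math.floor(pos), sum = 0
    for (let k = center - sincRadius + 1; k <= center + sincRadius; k++) {
      if (k < 0 || k >= input.length) continue
      let d = pos - k
      let sinc = d === 0 ? 1 : Math.sin(Math.PI * d) / (Math.PI * d)
      // Hann window tapering the kernel to zero at |d| = sincRadius
      let hann = 0.5 * (1 + Math.cos(Math.PI * d / sincRadius))
      sum += input[k] * sinc * hann
    }
    out[n] = sum
  }
  return out
}
```

With `ratio = 2` the read-head reaches the end of the source halfway through the output buffer, which is the "higher pitch = shorter clip" behaviour.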

hybrid — Runs phaseLock and wsola in parallel and crossfades sample-by-sample by a transient-confidence signal from spectral flux. Tonal regions resolve via the phase vocoder; attacks resolve via WSOLA similarity search. Preserves phase coherence on tonal regions and attack shape on transients — simultaneously. Destroys CPU budget (≈2×), strict low-latency causality (the detector looks both ways), formants. Reach for mixed dynamic material where a single domain compromises the other.

Measured quality

Each algorithm is measured across ten canonical properties on synthetic fixtures with exact ground truth. The table lists nine of them; onset err is defined under Columns but has no column here. The shift column is a direct log-magnitude distance between the algorithm output and a canonically generated shifted reference (e.g. sine(660) as the ground truth for pitchShift(sine(440), 1.5)), with no heuristic and no proxy metric. Run npm run quality for the live numbers.

Algorithm	f0 err	THD%	alias	stream corr	cent err	attack corr	formant dist	phase coh	shift
`hpss`	0.00	0.0	0.052	1.000	0.007	0.996	1.267	0.922	1.464
`vocoder`	0.00	0.0	0.000	1.000	0.006	0.983	1.343	0.922	1.491
`formant`	0.00	0.0	0.000	1.000	0.061	0.984	1.000	0.981	1.616
`ola`	1.00	0.2	0.005	1.000	0.003	0.995	2.345	0.869	1.650
`wsola`	1.00	0.2	0.005	1.000	0.003	0.995	2.345	0.869	1.650
`sample`	2.50	0.1	0.007	1.000	0.003	0.951	2.245	—	1.655
`sms`	0.00	0.0	0.002	1.000	0.001	0.953	2.028	0.922	1.761
`pitchShift` (auto)	0.00	0.0	0.000	1.000	0.012	0.985	1.600	0.993	1.795
`transient`	0.00	0.0	0.000	1.000	0.012	0.985	1.600	0.993	1.795
`phaseLock`	0.00	0.0	0.000	1.000	0.012	0.986	1.591	0.993	1.796
`granular`	1.00	0.2	0.005	1.000	0.003	0.995	2.903	0.946	1.916
`hybrid`	0.00	0.0	0.000	1.000	0.001	0.986	2.499	0.711	1.945
`psola`	0.66	0.2	0.005	1.000	0.003	0.941	2.340	0.998	1.954
`paulstretch`	1.67	0.3	0.223	—	0.061	0.961	7.449	—	2.241

Columns:

f0 err (Hz) — pitch accuracy shifting a 440 Hz sine to 660 Hz. Zero-crossing estimator over the active signal region.
THD% — total harmonic distortion on the shifted pure sine (up to 8 harmonics).
alias — active-region RMS of output / input when shifting a 14 kHz sine by ×2. Canonical behaviour is near zero (nothing valid above Nyquist); time-domain stride-reads fold energy back.
stream corr — streaming vs batch correlation on the 440 Hz sine. Marked — for algorithms whose phase or grain jitter decorrelates on pure tones even when producing valid output (paulstretch randomizes phases, psola jitters pitch marks).
cent err — spectral centroid ratio error on a 3-partial chord. Lower means the timbre shifts by exactly ratio.
onset err — period error of a 100 Hz Dirac impulse train after shift. Measures how well impulse locations survive.
attack corr — plucked-string attack envelope correlation against the input.
formant dist — cepstral envelope distance on a synthetic vowel. Lower = formants stay put. formant dominates here by construction.
phase coh — AM-envelope coherence on a 5 Hz tremolo. Goertzel-extracted modulation depth, min(out, in) / max(out, in). 1.0 means the slow envelope survives the shift intact. Marked — for paulstretch (random phase is non-deterministic) and sample (time-compresses, so the modulation rate itself shifts).
shift (log-mag) — direct log-magnitude spectral distance between the algorithm output and the canonical shifted reference, averaged over four harmonic ground-truth fixtures: sine(660), sineChord(330, [1,1.25,1.5]), karplusStrong(330), and amSine(660). Gain- and phase-invariant. Lower is better; the table is sorted by this column, so hpss leads. The single best "how close to the ideal pitch shift" number.
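
For readers who want to reproduce the f0 err column by hand, a zero-crossing pitch estimator is short enough to sketch. This version is an assumption about the general technique rather than a copy of scripts/metrics.js: it interpolates each upward crossing linearly and skips the active-region detection mentioned above.

```javascript
// Estimate the fundamental of a clean periodic signal from its
// positive-going zero crossings.
function zeroCrossingF0(signal, sampleRate) {
  let first = -1, last = -1, count = 0
  for (let n = 0; n + 1 < signal.length; n++) {
    if (signal[n] < 0 && signal[n + 1] >= 0) {
      // Linear interpolation of the crossing instant, in samples
      let t = n + signal[n] / (signal[n] - signal[n + 1])
      if (first < 0) first = t
      last = t
      count++
    }
  }
  if (count < 2) return 0 // not enough crossings to measure a period
  // (count - 1) full periods elapsed between the first and last crossing
  return (count - 1) * sampleRate / (last - first)
}
```

The f0 error for a fixture is then the absolute difference between this estimate on the shifted output and the target frequency, for example 660 Hz.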

Notes. formant, hpss, and sms dominate formant preservation by construction. transient dominates transient preservation on drum material even though attack corr on a plucked string is close across algorithms. paulstretch stream-vs-batch is marked — because random phase synthesis decorrelates by design. See scripts/fixtures.js and scripts/metrics.js for the full rig.

Options

All algorithms accept:

Option	Default	Description
`ratio`	`1`	Pitch shift ratio (1.5 ≈ +7 semitones, 2 = +1 octave)
`semitones`	from ratio	Pitch shift in semitones
`content`	`music`	Auto-select hint for the default export: `music`, `voice`, `speech`, `tonal`
`method`	auto	Explicit algorithm for the default export
`formant`	`false`	Use formant-preserving shifting through the default export
`frameSize`	`2048`	Frame size in samples
`hopSize`	`frameSize/4`	Hop between frames
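
The `ratio` and `semitones` options describe the same quantity on different scales, related by the equal-temperament formula ratio = 2^(semitones/12). The two helpers below are illustrative and not part of the package:

```javascript
// Convert between a frequency ratio and a shift in semitones.
function semitonesToRatio(semitones) {
  return 2 ** (semitones / 12)
}

function ratioToSemitones(ratio) {
  return 12 * Math.log2(ratio)
}
```

For reference, +5 semitones is a ratio of about 1.335, and a ratio of 1.5 is about +7.02 semitones (a perfect fifth).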

Algorithm-specific options:

transient: transientThreshold (default: 1.5) — z-score over log-flux EMA
psola: sampleRate, minFreq (default 70), maxFreq (default 600)
wsola: tolerance (default frameSize/4) — similarity search radius
formant: envelopeWidth (default max(8, N/64)) — cepstrum lifter cutoff
sms: maxTracks (default Infinity), minMag (default 1e-4)
hpss: hpssTimeWidth (default 17 frames), hpssFreqWidth (default 17 bins), hpssPower (default 2) — median window sizes and soft-mask exponent
sample: sincRadius (default 8) — windowed-sinc half-width in samples
hybrid: hybridThreshold (default 0.8) — spectral-flux z-score threshold for full WSOLA blend

Default export selection:

voice / speech → psola
tonal → sms
everything else → transient

Variable pitch (pitch curves)

All frequency-domain algorithms (vocoder, phaseLock, transient, formant, sms, paulstretch, hpss) and sample accept a time-varying ratio — either a function (timeSeconds) => ratio or a Float32Array sampled uniformly across the input duration. STFT-based algorithms evaluate the ratio per frame; sample evaluates per output sample.

// Sinusoidal vibrato: ±10% pitch at 5 Hz
let vibrato = phaseLock(audio, {
  ratio: (t) => 1 + 0.1 * Math.sin(2 * Math.PI * 5 * t),
  sampleRate: 44100,
})

// Glissando from unison to +1 octave across a 2-second clip
let glide = sample(audio, {
  ratio: new Float32Array([1, 1.25, 1.5, 1.75, 2]),
  ratioDuration: 2,
  sampleRate: 44100,
})

Time-domain algorithms (ola, wsola, psola, granular) and hybrid reject function/array ratio — their stretch+resample form applies a single global ratio to the whole signal.
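
To picture how an array-form ratio maps onto time, the sketch below reads a uniformly sampled curve at an arbitrary time. Linear interpolation between curve points and clamping outside the range are assumptions for the example, and `ratioAt` is a hypothetical helper, not a package export.

```javascript
// Evaluate a uniformly sampled ratio curve at time t (seconds),
// where the curve spans [0, duration].
function ratioAt(curve, t, duration) {
  if (curve.length === 1) return curve[0]
  // Position along the curve, clamped to its ends
  let x = Math.min(Math.max(t / duration, 0), 1) * (curve.length - 1)
  let i = Math.floor(x), frac = x - i
  let next = Math.min(i + 1, curve.length - 1)
  return curve[i] + frac * (curve[next] - curve[i])
}
```

With the five-point glissando curve above and a 2-second duration, the midpoint of the clip maps to the middle entry of the curve.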

Pitch correction

Variable pitch enables pitch correction when combined with a pitch detector. Detect the sung pitch per frame, compute the ratio to the nearest target note, and pass the correction curve as a ratio function:

import { yin } from 'pitch-detection'
import { formant } from 'pitch-shift'

// 1. Detect pitch per frame
let hop = 512, sr = 44100
let pitchFrames = []
for (let i = 0; i + 2048 <= audio.length; i += hop) {
  let r = yin(audio.subarray(i, i + 2048), { fs: sr })
  pitchFrames.push(r ? { freq: r.freq, clarity: r.clarity } : null)
}

// 2. Snap to nearest scale degree
let scale = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88] // C major
let snap = (f) => scale.reduce((a, b) =>
  Math.abs(Math.log2(b / f)) < Math.abs(Math.log2(a / f)) ? b : a
)

// 3. Build correction curve and apply
let corrected = formant(audio, {
  ratio: (t) => {
    let idx = Math.min(Math.round(t * sr / hop), pitchFrames.length - 1)
    let p = pitchFrames[idx]
    if (!p || p.clarity < 0.5) return 1  // unvoiced → no correction
    return snap(p.freq) / p.freq
  },
  sampleRate: sr,
})

formant is the natural choice for voice correction (preserves vowel character). For the hard-tune "auto-tune effect", use phaseLock with aggressive snapping. For harmonic instruments, sms preserves partial structure while following the correction curve.
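
The one-octave `scale` table in the example snaps any pitch outside C4 to B4 to the nearest table entry, which can be an octave or more away. An octave-aware variant, sketched here under the assumption of 12-tone equal temperament with A4 = 440 Hz, works on MIDI note numbers and pitch classes instead (`snapToScale` is a hypothetical helper):

```javascript
// Snap a frequency to the nearest note whose pitch class is in the scale.
// Pitch classes: 0 = C, 2 = D, ... 11 = B. The default is C major.
function snapToScale(freq, pitchClasses = [0, 2, 4, 5, 7, 9, 11], a4 = 440) {
  let midi = 69 + 12 * Math.log2(freq / a4) // fractional MIDI note number
  let best = null
  // Any non-empty scale has a member within one octave of the input
  for (let note = Math.floor(midi) - 12; note <= Math.ceil(midi) + 12; note++) {
    let pc = ((note % 12) + 12) % 12
    if (!pitchClasses.includes(pc)) continue
    if (best === null || Math.abs(note - midi) < Math.abs(best - midi)) best = note
  }
  return a4 * 2 ** ((best - 69) / 12)
}
```

Used in place of `snap` in the ratio function above, the correction ratio stays within a few semitones regardless of which octave the singer is in.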

Examples

import { phaseLock, transient, psola, formant, granular, wsola, sms, hpss, sample, hybrid } from 'pitch-shift'

// Music with drums
let result = transient(audio, { ratio: 1.5 })

// Mixed music (drums + tonal content) with harmonic/percussive separation
let mixed = hpss(audio, { ratio: 1.5 })

// Hybrid: transient-gated crossfade between phase vocoder and WSOLA
let dynamic = hybrid(audio, { ratio: 1.5 })

// Voice (formant-preserving)
let voice = formant(audio, { semitones: 5 })

// Speech
let speech = psola(audio, { ratio: 0.75, sampleRate: 48000 })

// Tonal/harmonic
let tonal = sms(audio, { ratio: 2 })

// Creative granular
let grainy = granular(audio, { ratio: 1.3 })

// Explicit WSOLA (time-domain, no FFT); named differently from `speech` above to avoid redeclaring it
let speechWsola = wsola(audio, { ratio: 0.85 })

// Sampler-style playback rate (instrument one-shots)
let played = sample(instrumentBuffer, { semitones: 7 })

Streaming

All algorithms support block-by-block streaming:

let write = phaseLock({ ratio: 1.5 })

// Process audio in chunks
let chunk1 = write(inputBlock1)   // → Float32Array
let chunk2 = write(inputBlock2)
let tail = write()                // flush remaining
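
When a whole list of blocks is already in hand, the write-then-flush convention shown above can be wrapped in a small helper. This sketch assumes only what the example shows: `write(block)` returns a Float32Array and `write()` with no argument returns the remaining tail. The name `runStream` is hypothetical.

```javascript
// Feed blocks through a streaming writer and concatenate everything
// it returns, including the flushed tail.
function runStream(write, blocks) {
  let parts = []
  for (let block of blocks) parts.push(write(block))
  parts.push(write()) // flush whatever is still buffered

  let total = parts.reduce((n, p) => n + p.length, 0)
  let out = new Float32Array(total)
  let offset = 0
  for (let p of parts) {
    out.set(p, offset)
    offset += p.length
  }
  return out
}
```

Because a streaming algorithm may buffer input before emitting anything, individual calls can return short or empty arrays; only the concatenated result is expected to cover the whole signal.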

Stereo/Multi-channel

Process channels independently:

let leftOut = phaseLock(leftChannel, { ratio: 1.5 })
let rightOut = phaseLock(rightChannel, { ratio: 1.5 })

// Or pass separate channels together
let [leftShifted, rightShifted] = phaseLock([leftChannel, rightChannel], { ratio: 1.5 })
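
Both forms above take one Float32Array per channel. If the audio arrives interleaved (L R L R ...), it needs to be split first; a helper along these lines would do it (`deinterleave` is illustrative and not a package export):

```javascript
// Split an interleaved buffer into one Float32Array per channel.
function deinterleave(interleaved, channels = 2) {
  let frames = Math.floor(interleaved.length / channels)
  let out = Array.from({ length: channels }, () => new Float32Array(frames))
  for (let i = 0; i < frames; i++) {
    for (let c = 0; c < channels; c++) {
      out[c][i] = interleaved[i * channels + c]
    }
  }
  return out
}
```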

Quality Tools

npm test
npm run quality
npm run bench

npm run quality reports pitch accuracy, stream-vs-batch correlation, stereo handling, and high-frequency attenuation.

Dependencies

time-stretch — Time-domain stretchers (WSOLA, PSOLA) used by time-domain pitch-shift algorithms
fourier-transform — FFT
window-function — Hann windowing

Migration from `pitch-shift` v0.0.0

The package name was previously held by mikolalysenko/pitch-shift (2013, frozen at v0.0.0). That package implements a single time-domain algorithm: per-frame Hann windowing → detect-pitch autocorrelation period → scalePitch linear interpolation → findMatch splice-point similarity search → overlap-add. This is the canonical WSOLA/TD-PSOLA pattern.

The same algorithm is available here as wsola (with per-grain cross-correlation search) or psola (with autocorrelation pitch marks). Both are native implementations without external pitch-detection dependencies and support batch, streaming, and multi-channel.

The old callback API:

// v0.0.0 (old)
var shifter = require('pitch-shift')(onData, t => ratio, { frameSize: 2048 })
shifter.feed(float32Array)

New equivalent:

// v1 (this package)
import { wsola } from 'pitch-shift'
let write = wsola({ ratio })
let out = write(float32Array)
let tail = write()  // flush
