Starting a journey to master Audio DSP & ML Audio. I already have basic knowledge, but I want to attain mastery. Everything I learn, I share here. From Sound Physics to Real-time C++ & Neural Audio.
First post back, and I'm diving straight in. I've been thinking about this one while I was away.
What if you could analyse a signal at multiple resolutions at once?
How much oversampling is enough? 2x is marginal, 4x handles most saturation and overdrive cases, 8x gets into professional quality but costs more CPU, and 16x starts showing diminishing returns. Most of what I am reading points to 4x as the practical sweet spot for nonlinear stages.
Why does distortion processing need oversampling? You write a soft clipping function, run it at 44.1kHz and hear faint aliasing artifacts. Oversample 4x and they all but disappear. Been working through exactly why that happens.
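Oversampling pushes the problem out of the audible range. Upsample 4x to 176.4kHz, apply the nonlinear processing, lowpass filter at 22.05kHz, then downsample back. Nyquist is now 88.2kHz, so harmonics up to there stay where they belong, and anything that still folds back has to start above 154.35kHz to land in the audible band, where it is far weaker. Same character, far less aliasing in the audible band. The order of steps matters.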
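Here is a rough sketch of those steps in plain Python, so I could measure the 14.1kHz alias with and without oversampling. Everything in it is my own assumption for illustration: the helper names, the 127-tap Blackman-windowed sinc, the 0.5 clip level, and a 20kHz cutoff (a little under 22.05kHz to leave room for the filter's transition band). It is slow and unoptimised, not how a plugin would do it.

```python
import cmath
import math

def lowpass_taps(num_taps, cutoff_hz, fs_hz):
    """Blackman-windowed sinc lowpass, normalised to unity gain at DC."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        phase = 2 * math.pi * n / (num_taps - 1)
        taps.append(ideal * (0.42 - 0.5 * math.cos(phase) + 0.08 * math.cos(2 * phase)))
    gain = sum(taps)
    return [h / gain for h in taps]

def fir(signal, taps):
    """Direct-form FIR filter; output has the same length as the input."""
    return [sum(taps[k] * signal[n - k] for k in range(min(n + 1, len(taps))))
            for n in range(len(signal))]

def hard_clip(x, limit=0.5):
    return max(-limit, min(limit, x))

def clip_oversampled(signal, fs_hz, factor=4, cutoff_hz=20000, num_taps=127):
    taps = lowpass_taps(num_taps, cutoff_hz, fs_hz * factor)
    stuffed = [0.0] * (len(signal) * factor)
    stuffed[::factor] = signal                     # 1. zero-stuff to the high rate
    up = fir(stuffed, [h * factor for h in taps])  # 2. interpolation filter (gain restores level)
    clipped = [hard_clip(v) for v in up]           # 3. nonlinearity at the high rate
    filtered = fir(clipped, taps)                  # 4. lowpass before dropping samples
    return filtered[::factor]                      # 5. decimate back to the base rate

def tone_level(signal, freq_hz, fs_hz):
    """Amplitude at one frequency, from a Hann-windowed single-bin DFT."""
    acc, window_sum = 0j, 0.0
    for n, v in enumerate(signal):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * n / len(signal))
        acc += w * v * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz)
        window_sum += w
    return 2 * abs(acc) / window_sum

fs = 44100
sine = [math.sin(2 * math.pi * 10000 * n / fs) for n in range(1024)]
naive = [hard_clip(v) for v in sine]
oversampled = clip_oversampled(sine, fs)
alias_naive = tone_level(naive, 14100, fs)
alias_oversampled = tone_level(oversampled, 14100, fs)
print(alias_naive, alias_oversampled)
```

Swapping steps 4 and 5 brings the alias straight back, which is the point about the order of steps.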
Nonlinear processing like clipping & saturation creates harmonics. A 10kHz sine through a symmetric hard clipper generates odd harmonics at 30kHz, 50kHz, 70kHz & beyond (asymmetric clipping adds the even ones too). At 44.1kHz the Nyquist is 22.05kHz. That 30kHz harmonic aliases back to 14.1kHz, a new inharmonic tone that was never in the original signal.
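The fold-back arithmetic is easy to check. A minimal sketch, where alias_frequency is my own helper name and not from any library:

```python
def alias_frequency(freq_hz, fs_hz):
    """Frequency at which a tone at freq_hz shows up after sampling at fs_hz."""
    folded = freq_hz % fs_hz
    # Anything in the upper half of the band mirrors around Nyquist (fs / 2).
    return fs_hz - folded if folded > fs_hz / 2 else folded

for harmonic in (30000, 50000, 70000):
    print(harmonic, "->", alias_frequency(harmonic, 44100))
```

At 44.1kHz the 30kHz harmonic folds to 14.1kHz; run the same call with 176400 and it stays at 30kHz, above the audible band where a lowpass can remove it.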
I'm exploring resampling libraries. libsamplerate with SRC_SINC_BEST_QUALITY seems to be what most people point to for quality. r8brain is free for commercial use. The Speex resampler comes up a lot for real-time work. JUCE's LagrangeInterpolator for simpler cases.
Converting 44.1kHz to 48kHz correctly is not as simple as it looks. The naive approach of upsampling, filtering and then downsampling is wasteful. Polyphase decomposition is the more interesting path.
Polyphase decomposition splits h[n] into L components, h_k[n] = h[nL + k] for k = 0 to L-1. Each output sample uses only one component, applied directly to the input, so no multiplications by stuffed zeros are computed. The work per output sample drops by a factor of L. I'm still working through the full picture but that part clicked.
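To convince myself, I compared the two routes for the interpolation half only (upsample by L, no decimation yet). A minimal sketch, with function names and toy values that are mine:

```python
def interpolate_naive(x, h, L):
    """Zero-stuff by L, then run the full filter over every sample, zeros included."""
    stuffed = [0.0] * (len(x) * L)
    stuffed[::L] = x
    return [sum(h[i] * stuffed[m - i] for i in range(min(m + 1, len(h))))
            for m in range(len(stuffed))]

def interpolate_polyphase(x, h, L):
    """Same output, but each output sample only touches one phase h_k[n] = h[nL + k]."""
    phases = [h[k::L] for k in range(L)]
    y = []
    for n in range(len(x)):
        for k in range(L):  # output index is n*L + k
            hk = phases[k]
            y.append(sum(hk[j] * x[n - j] for j in range(min(n + 1, len(hk)))))
    return y

x = [1.0, -2.0, 3.0, 0.5, -1.5]
h = [0.1 * i for i in range(1, 11)]  # arbitrary 10-tap filter, just for the comparison
a = interpolate_naive(x, h, 4)
b = interpolate_polyphase(x, h, 4)
print(max(abs(p - q) for p, q in zip(a, b)))
```

For a full rational resampler you would also skip the outputs that decimation throws away, which this sketch does not do.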
44100 to 48000Hz means multiplying by 160/147. For the naive process, upsample by 160 by inserting 159 zeros between each pair of samples, lowpass filter at 22050Hz (running at the 7.056MHz intermediate rate), then downsample by 147. The complexity is O(N·160·filter length). Most of that work goes into multiplying zeros and computing samples that get thrown away.
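The 160/147 falls out of reducing the fraction. A quick check with the standard library (resampling_ratio is just my name for it):

```python
from fractions import Fraction

def resampling_ratio(fs_in, fs_out):
    """Reduced (upsample, downsample) factors for converting fs_in to fs_out."""
    ratio = Fraction(fs_out, fs_in)
    return ratio.numerator, ratio.denominator

up, down = resampling_ratio(44100, 48000)
intermediate_rate = 44100 * up  # rate the naive method filters at
print(up, down, intermediate_rate)
```

Only 1 in every 147 samples at that intermediate rate survives the downsampling step.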
Parks-McClellan is built on Chebyshev approximation. The optimal solution has weighted error that ripples with equal height across all bands. Guess the extremal frequencies, solve for the equiripple coefficients, find the new extremal frequencies, repeat until convergence (the Remez exchange). The result is the minimum tap count for a given spec. This makes it interesting.
What is the most mathematically optimal way to design an FIR filter? The Parks-McClellan algorithm finds the filter that minimises the maximum weighted error across all the specified bands at once. I'm still working through why it beats windowed FIR design. Breaking it down.
The window method takes the ideal impulse response (a sinc), truncates it, then tapers it with a window. The problem is that the error is not evenly distributed: it bunches up near the band edges, and passband and stopband ripple come out roughly equal, so you cannot trade one against the other. You end up using more taps than the spec actually needs. Easy to use but not efficient.
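A small sketch of the window method, so the error in each band can be measured rather than guessed. The Hamming window, 101 taps, 5kHz cutoff and the band limits I measure over are all arbitrary choices of mine:

```python
import cmath
import math

def windowed_sinc_lowpass(num_taps, cutoff_hz, fs_hz):
    """Ideal lowpass impulse response (sinc), truncated and Hamming-windowed."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * hamming)
    return taps

def magnitude(taps, freq_hz, fs_hz):
    """|H(e^jw)| of an FIR filter at one frequency."""
    w = 2 * math.pi * freq_hz / fs_hz
    return abs(sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps)))

fs = 44100
taps = windowed_sinc_lowpass(101, 5000, fs)
# Worst-case deviation from the ideal response, well away from the transition band.
passband_error = max(abs(magnitude(taps, f, fs) - 1) for f in range(0, 3501, 100))
stopband_error = max(magnitude(taps, f, fs) for f in range(6500, 22001, 100))
print(passband_error, stopband_error)
```

The only knobs here are the window type and the tap count, and both errors move together when you turn them. There is no way to ask for a tight stopband and a loose passband.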
In summary, reading a pole-zero plot: zeros on the unit circle are frequency nulls. Poles just inside it are sharp resonances. All poles inside means stable. Poles near z=1 boost bass, near z=-1 boost treble.
The pole-zero plot is a map of a filter's character. Once you can read it, you can design and debug filters just by looking at where poles and zeros sit in the z-plane. Build that intuition and it starts to click.
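Poles are where the filter resonates. A pole near the unit circle at angle θ creates a strong resonance at that frequency. A pole on the unit circle sits right at the edge of instability and rings forever. With a pole at r·e^(jω₀) (plus its conjugate, for a real filter), r = 0.98 & ω₀ = 2π·440/44100, you get a resonance around 440Hz, though at r = 0.98 it is still fairly broad for a frequency that low. The closer r is to 1, the sharper the peak and the longer it rings.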
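To see how much r matters, here is a minimal sketch that evaluates the gain of a two-pole resonator. I am assuming a conjugate pole pair at r·e^(±jω₀) and no zeros, and the function names are my own:

```python
import cmath
import math

def resonator_gain(freq_hz, pole_radius, pole_freq_hz, fs_hz):
    """|H| at freq_hz for poles at r*e^(+jw0) and r*e^(-jw0), no zeros."""
    w0 = 2 * math.pi * pole_freq_hz / fs_hz
    z = cmath.exp(1j * 2 * math.pi * freq_hz / fs_hz)
    pole = pole_radius * cmath.exp(1j * w0)
    return 1 / abs((1 - pole / z) * (1 - pole.conjugate() / z))

def ring_time_samples(pole_radius, decay_db=60):
    """Samples for the r^n envelope to fall by decay_db."""
    return decay_db / (-20 * math.log10(pole_radius))

fs = 44100
for r in (0.98, 0.999):
    peak_over_dc = resonator_gain(440, r, 440, fs) / resonator_gain(0, r, 440, fs)
    print(r, peak_over_dc, ring_time_samples(r))
```

With r = 0.98 the gain at 440Hz is not dramatically above the gain at DC. At r = 0.999 the peak stands out clearly and the ring time is about twenty times longer.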
Zeros are where the filter goes silent. H(z) = 0 at zero locations. A zero on the unit circle at angle θ creates a perfect notch at f = θ·Fs/(2π). Place a zero at z = -1 and you get a notch at Nyquist. That is exactly what y[n] = x[n] + x[n-1] does (y[n] = x[n] - x[n-1] puts the zero at z = 1 instead, a notch at DC). Zeros go where you want silence.
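A quick numerical check of where simple FIR filters put their nulls. The helper names are mine, and notch_coeffs assumes a conjugate zero pair on the unit circle so the coefficients stay real:

```python
import cmath
import math

def fir_gain(coeffs, freq_hz, fs_hz):
    """|H(e^jw)| for y[n] = sum of coeffs[k] * x[n-k]."""
    w = 2 * math.pi * freq_hz / fs_hz
    return abs(sum(b * cmath.exp(-1j * w * n) for n, b in enumerate(coeffs)))

def notch_coeffs(freq_hz, fs_hz):
    """Zeros at e^(+j*theta) and e^(-j*theta), with theta = 2*pi*f/Fs."""
    theta = 2 * math.pi * freq_hz / fs_hz
    return [1.0, -2 * math.cos(theta), 1.0]

fs = 44100
print(fir_gain([1, 1], fs / 2, fs))    # x[n] + x[n-1]: zero at z = -1, null at Nyquist
print(fir_gain([1, -1], 0, fs))        # x[n] - x[n-1]: zero at z = 1, null at DC
print(fir_gain(notch_coeffs(1000, fs), 1000, fs))  # null placed at 1kHz
```

All three come out at zero to within floating-point rounding.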
For capturing impulse responses of acoustic spaces, which method do you reach for? Sine sweep with deconvolution for clean, detailed results, an MLS sequence for speed and good SNR, or a starter pistol or balloon pop for simplicity? Or do you just buy commercial IR packs?
A speaker cabinet is approx. an LTI system. Capture its impulse response & you have captured its sound character, at least the linear part. Convolve that IR with a clean guitar DI signal & it sounds like that cabinet. That is how speaker simulation plugins work.
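The convolution itself is only a few lines. A minimal direct-form sketch with toy values standing in for the DI signal and the IR (both made up here). Real plugins typically use FFT-based convolution because IRs run to thousands of samples:

```python
def convolve(signal, impulse_response):
    """Direct convolution: every input sample launches a scaled copy of the IR."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += x * h
    return out

di = [1.0, 2.0, 3.0]   # stand-in for a clean DI recording
cab_ir = [1.0, 0.5]    # stand-in for a measured cabinet IR
wet = convolve(di, cab_ir)
print(wet)
```

Feeding in a single unit impulse returns the IR itself, which is exactly why measuring the impulse response is enough to characterise the linear part of the cabinet.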