Real-Time Resampling
We do it using a 50-tap windowed sinc filter whose coefficients are calculated in real-time. The following paragraphs explain the issue in more detail.
Introduction
Let us for example take the case where we want to play back an audio file at a different speed than its recording speed (like playing a vynil record at 33 instead of 45 RPM or vice-versa). It can be done using sample rate conversion (or simply resampling), and playing back the file at a constant "standard" rate (usually 44100 Hz). If we are playing the file at its original speed (100% speed), we are moving from the first to the last sample with a step equal to 1, and output the values of the samples 44100 times per second. If we have to play back the file at a lower speed, the step must be lower than 1, or it must be a fractional number between 0 and 1. If the speed is higher than 100%, the step must be higher than 1, and again can be a fractional number. Then we output the new values 44100 times per second again. But if the step is fractional, how can we get the values of these intermediate points? Of course the easiest and the fastest way is to use the nearest neighbour (zeroth order polynomial interpolation), but even without knowledge of digital signal processing theory, it is obvious that this method will produce serious artefacts and audible distortion. The problem is to interpolate between the samples of the original file to find the new values to be played.
The First Method
Time domain interpolation between the samples using polynomial interpolation. If the polynomial is of infinitely high order, the resulting value at the calculated point will have infinitely high precision. With 16-bit accuracy, the polynomial will have to be of order somewhere between several hundreds and several thousands. Lagrange interpolation can be used, but it is very time-consuming.
The simplest resampling routine uses a linear interpolation (first order polynomial) to approximate the exact values between the samples. Linear interpolation has two major problems:
*The magnitude (and the power) of all frequencies is reduced. The high frequencies are reduced more than the others, and it is very audible.
*The lost power produces non-harmonic "parasitic" signals with magnitude comparable to the magnitude of the true signal at high frequencies (over 10 kHz).
As a strength side of the linear interpolation, we can point that it is very fast – our test assembly code could be executed even on a 386 in real time!
A cubic (third order) interpolation can be used instead, but the results are only slightly (about 10 dB) better, and the execution time is longer, although a 486 can still execute it in real time.
The Method We Use
It is called "band limited interpolation", and is a frequency-domain interpretation of the interpolation problem. We will not go deeply into the theory of that kind of interpolation, but will only say that an interpolation filter is used to reconstruct the original analogue signal from its samples, however in our case it is done digitally, and only at specified points. The theory states that the ideal interpolation filter is the sinc function [sin(x) / x, –∞ < x < +∞]. Of course it is unrealisable. For 16-bit accuracy, several hundreds or thousands of points in the filter kernel have to be used. A very good approximation of the ideal sinc filter kernel is the so called "windowed sinc function", where the sinc function is truncated and forced to lessen its values faster toward the ends. This is done by multiplying it with an appropriate weighting function, which also has to be calculated.
Our resampling routine uses a 50-tap FIR (windowed sinc) filter to interpolate between the samples. We believe that the quality of this filter exceeds the capabilities of the human ear. Its non-harmonic signal attenuation is more than 96 dB for frequencies below 16 kHz, which exceeds the quality of the 16-bit representation of the signal! Up to 18-19 kHz, the attenuation of the non-harmonic signals is still very high. And there are no high-frequency losses!
The bad side is that it is relatively time-consuming. Because resampling is not a linear process, the filter coefficients are different at each moment, and have to be either calculated each time, or stored in a very long table. We do not use a table, but calculate them and do the convolution (filtering using the calculated values) in real-time. This requires calculation of sine and cosine, division and many multiplications and additions per a single point.
We have three different highly optimized versions of our resampling routine – FPU, SSE and 3DNow! All of them feature the same algorithm, which consists of real time calculation of 50 filter coefficients for each output sample (quite time-consuming), plus the filtering process itself using the calculated values.
*1*) The FPU resampling routine runs on all x86 processors which have a x87 floating point unit. It takes about 25% of the CPU time on Intel Pentium II 450 MHz, about 6% on Intel Pentium 4 – 2.1 GHz, about 54% on AMD K6-2 450 MHz, about 88% on VIA Ezra-T 600 MHz and about 4% on AMD Athlon XP 2000+ for a stereo file.
*2*) The SSE resampling routine runs on Intel Pentium III, Pentium 4, AMD Athlon XP, Athlon 64, VIA Nehemiah and all current and future processors which support or will support Intel MMX technology and Streaming SIMD Extensions. The routine uses highly optimized MMX and SSE code and takes about 2.5% of the CPU time on Intel Pentium 4 – 2.1 GHz and AMD Athlon XP 2000+ for a stereo file.
*3*) The 3DNow! resampling routine runs on AMD K6-2, K6-III, Athlon, Athlon XP, Athlon 64, VIA Ezra-T, and all current and future processors which support or will support Intel MMX and AMD 3DNow! technology. It uses highly optimized MMX and 3DNow! code and takes less than 15% of the CPU time on AMD K6-2 450 MHz, about 3% on AMD Athlon XP 2000+ and about 36% on VIA Ezra-T 600 MHz for a stereo file.
Note:The percentage numbers are not exact and are given for clarity only. In other words, "your mileage may vary".
Other Known Methods
There is another algorithm which uses the Fast Fourier Transform (FFT) to interpolate a whole block of samples.