Room Response compensation

In the previous post I measured the Room Impulse Response and tried a naive way of its compensation - the Weiners Deconvolution. The next step I took was figuring out how can I correct the frequency and phase characteristics.

Obviously, once you've measured some imperfectness, you want to fix it somehow. An evident approach is to design a FIR filter using this function, but, for no reason, I didn't feel like doing that and wanted to learn something new. I wanted to learn how true scientists solve this problem.

I found some methods, but it seems they're not suitable for this problem. In this post, I describe my researches of these hard ways, and later, I hope, I'll do a simple and straightforward compensation.

An Active Room Compensation

ARC is supposed to adjust a played back sound so that a listener would hear its original version. ARC is supposed to vanish out rooms acoustic influence. It performs as an equalizer which is tuned by a computer.

The same task is solved in other areas such as:

However, there's no single approach that suits all needs. For example, equalizers in wireless communication aimed to correct a channel with a wide bandwidth and smallest power consumption. In the problem I'm digging into, one can do whatever they want with a room and equipment, there's no need in real-time processing, no deficit of processing power or memory. The problem I dealt with was a lack of spare time only.

The math background

For the simplicity's sake, let's assume we just need to solve an inverse task of a convolution. The result of a deconvolution is an estimation of the \(x\), knowing \(y\) and hoping that we've estimated \(h\) correctly:

\[h \ast x = y \]

This problem has no precise solution, but there are several approaches to it. I tried to find any paper relevant to the ARC, but a short search wasn't very successful, so I adopted the algorithm from this paper, which corresponds to some close but not the same problem.

This method is based on a simple trick: just change the convolution with \(h\) in the equation above to a matrix multiplication and you'll get a well-known problem - an overdetermined system of linear equations. The matrix, that represents the Room Impulse Response \(h\), is a Toeplitz matrix with values of coefficients of \(h\), then the previous equation transforms to this one:

\[\begin{bmatrix} h_{0} & 0 & 0 & 0 & \dots & 0 & 0 & 0 \\ h_{1} & h_{0} & 0 & 0 & \dots & 0 & 0 & 0 \\ h_{2} & h_{1} & h_{0} & 0 & \dots & 0 & 0 & 0 \\ h_{3} & h_{2} & h_{1} & h_{0} & \dots & 0 & 0 & 0 \\ \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots \\ h_{n-3} & h_{n-4} & h_{n-5} & h_{n-6} & \dots & h_{0} & 0 & 0 \\ h_{n-2} & h_{n-3} & h_{n-4} & h_{n-5} & \dots & h_{1} & h_{0} & 0 \\ h_{n-1} & h_{n-2} & h_{n-3} & h_{n-4} & \dots & h_{2} & h_{1} & h_{0} \end{bmatrix} \cdot \begin{bmatrix} x_{0} \\ x_{1} \\ x_{2} \\ x_{3} \\ \vdots \\ x_{n-3} \\ x_{n-2} \\ x_{n-1} \end{bmatrix} = \begin{bmatrix} y_{0} \\ y_{1} \\ y_{2} \\ y_{3} \\ \vdots \\ y_{n-3} \\ y_{n-2} \\ y_{n-1} \end{bmatrix} \]

So, we are to simply build a proper \(h\) and \(y\), and ask any solver to find the best \(x\). The only problem here is that the matrix \(h\) can't be big enough, otherwise the system of equations becomes too big. My laptop is able to deal with up to 2048 numbers across.

Generating an inverse IR filter

This method performs so-called piece-wise deconvolution, which isn't very useful for real-time processing. I want to build a FIR filter that I'd be able to apply to a signal in order to fix an acoustic environment. I call this filter the inverse impulse response (IR). The inverted IR \(k\) should satisfy the following equation:

\[ h\ast (k \ast x) = x \]

This equation reveals that the original signal should be prefiltered with \(k\) just before playing it through speakers (that are the filter \(h\)).

We can estimate \(k\) implying that it is an invertion of \(h\) and solving the following equation using the same approach that has been showed recently:

\[ h \ast k = \delta \]

\( \delta \) here is the digital delta impulse, whose all cells except one equal to zero and the remaining single sample equals to one.

The next step is pretty straightforward, we just try to estimate \( k \) in the following equation:

\[\begin{bmatrix} h_{0} & 0 & 0 & 0 & \dots & 0 & 0 & 0 \\ h_{1} & h_{0} & 0 & 0 & \dots & 0 & 0 & 0 \\ h_{2} & h_{1} & h_{0} & 0 & \dots & 0 & 0 & 0 \\ h_{3} & h_{2} & h_{1} & h_{0} & \dots & 0 & 0 & 0 \\ \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots \\ h_{n-3} & h_{n-4} & h_{n-5} & h_{n-6} & \dots & h_{0} & 0 & 0 \\ h_{n-2} & h_{n-3} & h_{n-4} & h_{n-5} & \dots & h_{-1} & h_{0} & 0 \\ h_{n-1} & h_{n-2} & h_{n-3} & h_{n-4} & \dots & h_{-2} & h_{-1} & h_{0} \\ 0 & h_{n-1} & h_{n-2} & h_{n-3} & \dots & h_{-3} & h_{-2} & h_{-1} \\ 0 & 0 & h_{n-1} & h_{n-2} & \dots & h_{-4} & h_{-3} & h_{-2} \\ 0 & 0 & 0 & h_{n-1} & \dots & h_{-5} & h_{-4} & h_{-3} \\ \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots \\ 0 & 0 & 0 & 0 & \dots & 0 & 0 & h_{n-1} \end{bmatrix} \cdot \begin{bmatrix} k_{0} \\ k_{1} \\ k_{2} \\ k_{3} \\ \vdots \\ k_{n-3} \\ k_{n-2} \\ k_{n-1} \\ k_{n} \\ k_{n+1} \\ k_{n+2} \\ \vdots \\ k_{m} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 0 \\ \textbf{1} \\ 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} \]

It's almost the same system of equations except that I added the tail to \(h\) and \(y\) in order to deal with the edge effect on the end of the convolution. This is why \(k\) is bigger than \(h\) in my experiments.

Results

To debug an implementation of this algorithm I made some simulation setup: I used the Room Response that I obtained from the measurements from the previous post, a test signal is a simulated sum of several sines with additive noise. This is what I've got:

Room Response Compensation

There is no Room Response and its inversion on the image as it's not interesting at all, really. On the other hand, the image consists of:

  • the quasi-delta impulse is convolution of the Room Response and its computed inversion - the most exhilarating plot for me;
  • the spectrum of the quasi-delta-impulse;
  • comparison of the virgin signal (before play) - Y_original, blue - and the one that was prefiltered with the inverted room response and then "played with speakers" (which was just simulated by convolution with the Room Response) - Y, green;
  • spectrum of the room response and its inversion;
  • residual is a difference between the virgin signal and prefiltered and played one.

Test signal Y_original is obtained with this code:

# Making the Y signal -- sum of couple of sines and awg-noise:
time = np.array(range(irr_len*4))  
A = [0.5, 0.2]  
freq=[np.pi/16, np.pi/20]  
phi = [0, np.pi*1.3]  
Y_original = sum([a*np.sin(2*np.pi*f*time + p) for a,f,p in zip(A, freq, phi)])  
# Add some noise
Y_original += np.random.normal(0, 0.27, size=Y_original.shape[0])  

Then this signal was prefiltered and then it was played in the next few lines:

# Prefilter it with inverse room response
Y_predistorted = fftconvolve(Y_original, inv_room_response)[:Y_original.shape[0]]

# Filter it like it was played through the speakers (note: we do it with the long version of
# the room response):
Y = fftconvolve(room_response, Y_predistorted)[:Y_predistorted.shape[0]]  

Implmentation

I solved this system with scipy's lstsq with a default solver suitable for this purpose.

Here is the code of building this matrix equation and solving it:

def inverted_ir(IR, invir_len=None):  
    '''
    Calculates reverse impulse response so that we can use it as
    a predistortion filter.
    IR -- the estimated impulse response.
    invir_len -- the length of the target reversed IR.
    '''
    ir_len =  IR.shape[0]

    if invir_len is None:
        invir_len = ir_len

    # The digital delta-impulse:
    Y = np.zeros(invir_len + ir_len-1)
    Y[ir_len - 1] = 1

    ir_zpadded = np.append(IR, np.zeros(invir_len-ir_len))

    H = toeplitz(ir_zpadded, np.zeros(invir_len))
    # Some magick for dealing with an edge-effect.
    H_tail = toeplitz(np.zeros(Y.shape[0]),     \
                    ir_zpadded[-1::-1])[:ir_len-1]
    H = np.append(H, H_tail, axis=0)

    inverse = lstsq(H, Y)[0]
    inverse /= np.max(fftconvolve(IR, inverse))
    return inverse

Note the fact that H matrix is built of two parts (lines 19 and 21), the latter part forms an edge effect of the convolution in order to emulate convolution more precisely.

This code performs on small pieces of a signal as a solving system of linear equations and it consumes much time on pieces larger than 1024 samples.

More elegant approach

It's always useful to talk to someone in order to have all thoughts sorted in the mind. While I was writing this post I thought it should be much efficient to express gradient of error function analytically and feed it a solver.

This is the how I define error function and its partial derivations, that make gradient:

\[E(k) = \Sigma (h \ast k - \delta)^2 \] \[\frac{dE}{dk_i} (k) = H*H^T * k - H^T * \delta\]

Its computation is pretty simple and is checked here:

H_sqr = np.matmul(H.T, H)  
H_delta = np.matmul(H.T, delta)

def grad(k0):  
    # Calculate gradient with single matrix multiplication
    return (np.matmul(H_sqr, k0)- H_delta)

# Find the minimum -- res.x:
res = minimize(E, k0, method='BFGS', jac=grad)  
plot(fftconvolve(res.x,h))  

However this approach worse than just applying lstsq as the results are equal and lstsq is way faster. Though I practiced in math again :-)

Conclusion

These exercises didn't give me a practical solution as both methods are too computationally demanding. If I set my sights a little lower and stop asking for a perfect inversion of an Impulse Response, I can correct frequency response and group delay separately.

Mikhail Baranov

Read more posts.

Moscow, Russia