NLMS and ERL above 0dB


Hi

I'm playing around with a traditional NLMS algorithm for acoustic echo
cancellation.

It seems like the NLMS algorithm has a hard time cancelling the echo
as soon as the ERL level creeps above 0dB.

Is that to be expected?

The speaker signal is pure speech (no noise) and I have linearly
filtered the speaker signal to have a simulated microphone signal. The
adaptive filter order is 512 taps.
I have made a test script which tests the NLMS algo with various step
sizes ranging from very small (0.00001) to 1. For low step sizes
nothing happens (the echo is not cancelled) and
then at some point it just breaks (the error has a huge output ...like
an impulse)...

Is there a way to calculate the optimal step size given no
interference (i.e. no near-end signal) and also given that there is
some interference ?

0
Reply mjames2393 (69) 6/8/2012 12:58:59 AM

On 6/7/12 8:58 PM, Mauritz Jameson wrote:
> Hi
>
> I'm playing around with a traditional NLMS algorithm for acoustic echo
> cancellation.
>
> It seems like the NLMS algorithm have a hard time cancelling the echo
> as soon as the ERL level creeps above 0dB.

what's an ERL?  echo <something> level?


> Is that to be expected?

if you are using an LMS filter to cancel an echo that is bigger than the 
direct sound, i wonder if the adaptation mechanism in the LMS filter 
might go unstable.  dunno if the same happens for normalized LMS.

the adaptation can be sorta modeled like a Markov process based on the 
cross-correlation of the input against uncorrected feedback path.  it's 
been a while since i saw this set up.  but it's kinda like a 
discrete-time state-variable control system with the LMS filter taps as 
the system states.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."


0
Reply rbj (3920) 6/8/2012 2:58:34 AM


ERL is echo return loss...

It's calculated in dB as 20*log10(var(echo signal) / var(speaker
signal)) and it tells you how strong the echo is relative to its
source.

The algorithm I am using is described here:

http://en.wikipedia.org/wiki/Least_mean_squares_filter#Normalised_least_mean_squares_filter_.28NLMS.29


0
Reply mjames2393 (69) 6/8/2012 3:51:11 AM

On 6/7/12 10:58 PM, robert bristow-johnson wrote:
> On 6/7/12 8:58 PM, Mauritz Jameson wrote:
>>
>> I'm playing around with a traditional NLMS algorithm for acoustic echo
>> cancellation.
>>
>> It seems like the NLMS algorithm have a hard time cancelling the echo
>> as soon as the ERL level creeps above 0dB.
....
>> Is that to be expected?
>
> if you are using an LMS filter to cancel an echo that is bigger than the
> direct sound, i wonder if the adaptation mechanism in the LMS filter
> might go unstable. dunno if the same happens for normalized LMS.
>
> the adaptation can be sorta modeled like a Markov process based on the
> cross-correlation of the input against uncorrected feedback path. it's
> been a while since i saw this set up. but it's kinda like a
> discrete-time state-variable control system with the LMS filter taps as
> the system states.

so let's see if this is how it works.  anyone is invited to point out 
any misconceptions here.

L tap FIR

coefficient vector at time n:
                          H[n] = { h[0,n] h[1,n] h[2,n] ... h[L-1,n] }
FIR states at time n:    X[n] = { x[n]   x[n-1] x[n-2] ... x[n-L+1] }

"desired signal" that LMS is s'posed to converge to:  d[n]

LMS:
               L-1
     y[n]  =   SUM{ h[k,n]*x[n-k] }
               k=0


     e[n]  =   y[n] - d[n]

     h[k,n+1]   =  h[k,n] - mu*e[n]*x[n-k]

where NLMS is the same except
                                                  L-1
   h[k,n+1]   =  h[k,n] - (mu*e[n]*x[n-k]) / (1/L SUM{ (x[n-i])^2 })
                                                  i=0

                                                     L-1
          =  h[k,n] - (mu*(y[n]-d[n])*x[n-k]) / (1/L SUM{ (x[n-i])^2 })
                                                     i=0

looking at the adaptation term, if you apply expectation values on the 
whole thing and be real sloppy with the math, you might get something like

    h[k,n+1]  =  h[k,n] - ( mu*(Ryx[k]-Rdx[k]) ) / (Rxx[0])

                     L-1
    Rxx[k] = (1/L) * SUM{ x[n-i]*x[n-i-k] }
                     i=0

                     L-1
    Ryx[k] = (1/L) * SUM{ y[n-i]*x[n-i-k] }
                     i=0

                     L-1
    Rdx[k] = (1/L) * SUM{ d[n-i]*x[n-i-k] }
                     i=0

furthermore, i think Ryx[k] can look something like

                     L-1  L-1
    Ryx[k] = (1/L) * SUM{ SUM{ h[j,n-i]*x[n-i-j]*x[n-i-k] } }
                     i=0  j=0

              L-1
    Ryx[k] =  SUM{ h[j,n]*Rxx[j-k] }
              j=0


   h[k,n+1]  =  h[k,n] -  SUM{ h[j,n]*mu*(Rxx[j-k]/Rxx[0]) }
                           j

minus something to do with Rdx[k].

so the state transition matrix (applied to the time-varying 
*coefficients*, h[k,n]) looks sorta like

   [  1-mu               mu*Rxx[1]/Rxx[0]   mu*Rxx[2]/Rxx[0]  ... ]
   [  mu*Rxx[1]/Rxx[0]   1-mu               mu*Rxx[1]/Rxx[0]  ... ]
   [  mu*Rxx[2]/Rxx[0]   mu*Rxx[1]/Rxx[0]   1-mu              ... ]
   [  ...                ...                ...               ... ]

and then you gotta ask yourself when that represents a stable system. 
and i don't remember how to do that anymore.  if there is an Rxx[k] that 
represents a high correlation (which is what happens when the echo is 
very loud and delayed by k samples), i can see some elements of this 
state transition matrix get large and you might have a problem.  if you 
slow down the adaptation by reducing the adaptation gain, mu, there 
should always be a point where the updating, although slow, should be 
stable.

i know i was sloppy tossing around the expectation values in the 
summations.  i don't mind if anyone gets more anal about this and fixes it.

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."


0
Reply rbj (3920) 6/8/2012 4:57:19 AM

On 6/8/12 12:57 AM, robert bristow-johnson wrote:

> so the state transition matrix (applied to the time-varying
> *coefficients*, h[k,n]) looks sorta like
>
>  [  1-mu               mu*Rxx[1]/Rxx[0]   mu*Rxx[2]/Rxx[0]  ... ]
>  [  mu*Rxx[1]/Rxx[0]   1-mu               mu*Rxx[1]/Rxx[0]  ... ]
>  [  mu*Rxx[2]/Rxx[0]   mu*Rxx[1]/Rxx[0]   1-mu              ... ]
>  [  ...                ...                ...               ... ]


right away dropped a minus sign...

   [  1-mu               -mu*Rxx[1]/Rxx[0]   -mu*Rxx[2]/Rxx[0]  ... ]
   [  -mu*Rxx[1]/Rxx[0]   1-mu               -mu*Rxx[1]/Rxx[0]  ... ]
   [  -mu*Rxx[2]/Rxx[0]   -mu*Rxx[1]/Rxx[0]   1-mu              ... ]
   [  ...                 ...                 ...               ... ]

looks like

    I  -  mu/Rxx[0]*[ Rxx[i-j] ]

where I is the identity matrix and [ Rxx[i-j] ] is the "Toeplitz matrix" 
with all its diagonals identical and with i the row and j the column 
number of each element.
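
A rough sketch of how the stability question for I - mu/Rxx[0]*[ Rxx[i-j] ] could be checked numerically. It assumes the averaged recursion above is taken at face value, so the coefficient error shrinks only if every eigenvalue of the matrix has magnitude below 1, i.e. 0 < mu < 2/lambda_max of the normalized Toeplitz matrix (all eigenvalues assumed positive). The function names and the use of power iteration are illustrative choices, not something from the posts, and this covers only the mean behaviour, not the mean-square behaviour of a real NLMS run.

```python
def normalized_toeplitz(rxx):
    """Build the matrix [ Rxx[|i-j|] / Rxx[0] ] from lags Rxx[0..L-1]."""
    size = len(rxx)
    return [[rxx[abs(i - j)] / rxx[0] for j in range(size)] for i in range(size)]


def largest_eigenvalue(matrix, iterations=500):
    """Estimate the largest eigenvalue of a symmetric PSD matrix (power iteration)."""
    n = len(matrix)
    v = [1.0 + 0.01 * i for i in range(n)]  # arbitrary non-zero start vector
    lam = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(a * b for a, b in zip(v, w)) / sum(a * a for a in v)  # Rayleigh quotient
        norm = sum(a * a for a in w) ** 0.5
        v = [a / norm for a in w]
    return lam


def max_stable_mu(rxx):
    """Upper bound on mu for the averaged recursion: mu < 2 / lambda_max."""
    return 2.0 / largest_eigenvalue(normalized_toeplitz(rxx))


# White input: Rxx[k] = 0 for k != 0, the matrix is (1-mu)*I and the bound is 2.
print(max_stable_mu([1.0, 0.0, 0.0, 0.0]))
# Strongly correlated input (hypothetical lags): the bound on mu gets tighter.
print(max_stable_mu([1.0, 0.9, 0.81, 0.729]))
```

For white input the bound is 2; the more correlated the input, the larger lambda_max becomes and the smaller the usable mu.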


-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."


0
Reply rbj (3920) 6/8/2012 5:16:27 AM

not related to the problem, but 

>> It's calculated in dB as 20*log10(var(echo signal) / var(speaker

shouldn't it be "10 log 10"? "Var"(iance) implies to me that some magnitude
(where I'd use 20 log 10) was already squared. Might be a misunderstanding
on my side, though.


0
Reply markus.nentwig2273 (183) 6/8/2012 8:38:03 AM

> shouldn't it be "10 log 10"? "Var"(iance) implies to me that some magnitude
> (where I'd use 20 log 10) was already squared. Might be a misunderstanding
> on my side, though.

I think you're right about that.
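
A small sketch of the corrected calculation, assuming the level is defined as in the earlier post (echo power relative to speaker power, so a positive value means the echo is louder than its source). The function name is made up for illustration. Note that a conventional echo return *loss* would have the opposite sign.

```python
import math
import statistics


def echo_to_speaker_db(echo, speaker):
    """10*log10 of the variance ratio; variance is already a squared quantity."""
    ratio = statistics.pvariance(echo) / statistics.pvariance(speaker)
    return 10.0 * math.log10(ratio)


speaker = [1.0, -1.0, 1.0, -1.0]
echo = [2.0, -2.0, 2.0, -2.0]          # echo amplitude is twice the source
level_db = echo_to_speaker_db(echo, speaker)

# Same number via 20*log10 of the RMS (standard deviation) ratio.
rms_db = 20.0 * math.log10(statistics.pstdev(echo) / statistics.pstdev(speaker))
```

Both expressions give about 6 dB here, which is what a factor of 2 in amplitude should produce.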
0
Reply mjames2393 (69) 6/8/2012 1:35:53 PM

> looks like
>
>     I  -  mu/Rxx[0]*[ Rxx[i-j] ]
>
> where I is the identity matrix and [ Rxx[i-j] ] is the "Toeplitz matrix"
> with all this diagonals identical and with i the row and j the column
> number of each element.

The equations you posted look very similar to the equations you
stumble into when you derive the levinson-durbin algorithm for
calculating LPC?
0
Reply mjames2393 (69) 6/8/2012 7:39:10 PM

On Thursday, June 7, 2012 7:58:59 PM UTC-5, Mauritz Jameson wrote:
> Hi
>
> I'm playing around with a traditional NLMS algorithm for acoustic echo
> cancellation.
>
> It seems like the NLMS algorithm have a hard time cancelling the echo
> as soon as the ERL level creeps above 0dB.
>
> Is that to be expected?

Mauritz,
I'm out of town at the moment, and won't be back until next Monday. But
the NLMS should cancel an echo with >0dB ERL, IF ALL conditions are
favorable. I hope you're simulating the algorithm before trying to
implement it.

I need the answer to the following questions.

1. Are you using fixed-point or floating-point simulation?
2. What is your simulated unknown impulse response (including length of
   delay)?

To test your basic algorithm you should be using a white-noise source,
not speech. Once you are satisfied with the basic algorithm, then start
looking at specialized signals.

The "ideal" update gain depends on the input signal. With the NLMS, a
gain of 1 is best for white noise. HOWEVER, since the LMS and NLMS
estimate the Wiener filter, which requires the inverse of the input
autocorrelation matrix, if the input is sinusoid type, the inverse matrix
does not exist, and there is no unique solution. This can hurt
fixed-point implementations if you are not careful with the update gain.

I ran a quick floating-point simulation with white noise where the
magnitude of the echo was 1.5 times the input, and the algorithm ran
with no problems.
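
A minimal sketch of that kind of sanity test, not the simulation actually run for this post. It assumes a short made-up echo path whose largest tap is 1.5, white Gaussian input, no near-end signal, an update gain of 1, and the Wikipedia sign convention (e = d - y with a plus sign in the update, the mirror image of the e = y - d form used earlier in the thread). The names and the tiny `eps` guard are illustrative.

```python
import random


def fir(b, x):
    """Filter x with FIR coefficients b (zero initial state)."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]


def nlms_identify(x, d, num_taps, mu=1.0, eps=1e-12):
    """Sample-by-sample NLMS; returns final coefficients and the error signal."""
    h = [0.0] * num_taps
    buf = [0.0] * num_taps              # buf[k] holds x[n-k]
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(hk * xk for hk, xk in zip(h, buf))      # echo estimate
        e = dn - y                                       # residual echo
        power = sum(xk * xk for xk in buf) + eps         # sum-normalized
        gain = mu * e / power
        h = [hk + gain * xk for hk, xk in zip(h, buf)]
        errors.append(e)
    return h, errors


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(3000)]        # white "speaker" signal
echo_path = [0.0, 0.0, 1.5, -0.6, 0.3, 0.1, 0.0, 0.0]    # hypothetical, peak > 1
d = fir(echo_path, x)                                    # simulated microphone
h, errors = nlms_identify(x, d, num_taps=len(echo_path))
tail_error = max(abs(e) for e in errors[-100:])
```

Under these idealized conditions the estimated peak coefficient ends up at about 1.5, so a coefficient above 1 is not a problem in itself.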
0
Reply maury001 (237) 6/8/2012 10:19:02 PM

> I hope you're simulating the algorithm before trying to implement it.

Yes, I'm running simulations in MATLAB because I don't want to spend
too much time chasing errors when I finally implement it.


> 1. Are you using fixed-point or floating-point simulation?

Floating point. Once I have the algorithm working in MATLAB I will
make a floating point implementation in C and then once that has been
verified to work I will translate the math to fixed point.


> 2. What is your simulated unknown impulse response (including length of
> delay)?

The impulse response I am using for the acoustic path has 512 filter
coefficients. The delay is 18 samples (I filtered an impulse and the
largest peak in the output happens after 18 samples).

> To test your basic algorithm you should be using a white-noise source,
> not speech. Once you are satisfied with the basic algorithm, then start
> looking at specialized signals.

Ok. I didn't know that. I thought it was better to use it on actual
speech signals instead of signals which the algorithm won't encounter
in "the real world".


> The "ideal" update gain depends on the input signal. With the NLMS, a
> gain of 1 is best for white noise. HOWEVER, since the LMS and NLMS
> estimate the Wiener filter, which requires the inverse of the input
> autocorrelation matrix, if the input is sinusoid type, the inverse matrix
> does not exist, and there is no unique solution. This can hurt
> fixed-point implementations if you are not careful with the update gain.

How should I be careful ? Should the update gain be lower than
"normal" ? I have noticed that the update gain highly depends on the
power of the speaker signal. The lower the power the higher the update
gain can be. Is that a correct observation?


> I ran a quick floating-point simulation with white noise where the
> magnitude of the echo was 1.5 times the input, and the algorithm ran
> with no problems.

That's nice to know. I will do the same then.

I look forward to your feedback. Thank you!!

0
Reply mjames2393 (69) 6/9/2012 4:18:31 PM

Oh ...and by the way....My simulations show that for some scenarios
the optimal update gain is above 1. I don't know if that is correct,
but that's what my simulations show for my particular NLMS simulation.
0
Reply mjames2393 (69) 6/9/2012 4:21:46 PM

On 6/9/12 12:18 PM, Mauritz Jameson wrote:
>
>> I hope you're simulating the algorithm before trying to implement it.
>
> Yes, I'm running simulations in MATLAB because I don't want to spend
> too much time chasing errors when I finally implement it.
>
>
>> 1. Are you using fixed-point or floating-point simulation?
>
> Floating point. Once I have the algorithm working in MATLAB I will
> make a floating point implementation in C and then once that has been
> verified to work I will translate the math to fixed point.
>

the size of scalers will be important.  try to figure out how big 
they're expected to get in the floating-point simulation.  that'll help 
you translate well to fixed point..

>
>> 2. What is your simulated unknown impulse response (including length of delay)?
>
> The impulse response I am using for the acoustic path has 512 filter
> coefficients. The delay is 18 samples (I filtered an impulse and the
> largest peak in the output happens after 18 samples).

well, that of course depends on what the NLMS is adapting to.  does it 
appear to be the case that the last 400 or so coefficients are far 
closer to zero than the peak at sample 18?  or smaller and really 
randomized in appearance?  or are there additional reflections happening 
at later than 18 samples that you are going after?  as long as the 
acoustic transducers are not physically moved, does the NLMS settle down 
to match a constant impulse response?

>
>> To test your basic algorithm you should be using a white-noise source, not speech. Once you are satisfied with the basic algorithm, then start looking at specialized signals.
>

i dunno if i agree with that.  it might be a less robust test.  i think 
white noise is what you want to use to throw your algorithm a soft ball.

> Ok. I didn't know that. I thought it was better to use it on actual
> speech signals instead of signals which the algorithm won't encounter
> in "the real world".
>

test it, thoroughly, with speech as well as initially with white noise. 
  and maybe find some other sources like car horns or rooms with a lot 
of people talking (which might be pinker in color).

>
>> The "ideal" update gain depends on the input signal. With the NLMS, a gain of 1 is best for white noise. HOWEVER, since the LMS and NLMS estimates the Wiener filter, which requires the inverse of the input autocorrelation matrix,if the input is sinusoid type, the inverse matrix does not exist, and there is no unique solution. This can hurt fixed-point implementations if you are not careful with the update gain.
>
> How should I be careful ? Should the update gain be lower than
> "normal" ? I have noticed that the update gain highly depends on the
> power of the speaker signal. The lower the power the higher the update
> gain can be. Is that a correct observation?
>

one thing that is common with LMS filters that are used for 
speaker-phone cancellation, is that the adaptation gain is reduced when 
there is "double talk", when both the speaker is talking (this would be 
the e[n] that gets added to the "desired signal" d[n]) and there is 
direct and echo feedback, the d[n], which means there is some non-zero 
signal in x[n].  someone else here might have a far better idea than me 
for how that downward adjustment to the adaptation gain is done when 
there is "double talk".

>
>> I ran a quick floating-point simulation with white noise where the magnitude of the echo was 1.5 times the input, and the algorithm ran with no problems.
>
> That's nice to know. I will do the same then.
>
> I look forward to your feedback. Thank you!!
>

On 6/9/12 12:21 PM, Mauritz Jameson wrote:
> Oh ...and by the way....My simulations show that for some scenarios
> the optimal update gain is above 1. I don't know if that is correct,

it might be.  is that mu defined to be roughly the same regardless of 
the filter order (it's "p" in the Wikipedia page on NLMS) or does it 
increase roughly linearly with the FIR length "p"?


> but that's what my simulations show for my particular NLMS simulation.

it sorta depends if that normalizing scaler is

        p
     ( SUM{ (x[n-i])^2 } )^(-1)
       i=0


or

              p
     ( (1/p) SUM{ (x[n-i])^2 } )^(-1)
             i=0
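
A small sketch of why the choice matters, assuming the buffer length stands in for the number of terms in the sum (the helper names are made up). Dividing by the mean power instead of the summed power makes the same numerical mu act like a step that is larger by a factor of the buffer length.

```python
def step_sum_normalized(mu, e, buf):
    """Coefficient step with mu * e * x / SUM(x^2)."""
    energy = sum(v * v for v in buf)
    return [mu * e * v / energy for v in buf]


def step_mean_normalized(mu, e, buf):
    """Coefficient step with mu * e * x / ((1/N) * SUM(x^2))."""
    mean_power = sum(v * v for v in buf) / len(buf)
    return [mu * e * v / mean_power for v in buf]


buf = [0.5, -1.0, 0.25, 2.0]
step_a = step_sum_normalized(1.0, 0.1, buf)
step_b = step_mean_normalized(1.0 / len(buf), 0.1, buf)   # mu scaled by 1/N
```

The two steps are identical, so a mu of 1 in the summed form corresponds to 1/N in the mean form, and an "optimal" mu can only be compared across implementations once the normalization is known.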



-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."


0
Reply rbj (3920) 6/10/2012 4:24:18 AM

Should I remove DC from the speaker and mic signal before running it
through the NLMS algo?

It seems like I get better results if I apply this FIR filter to the
mic and spk signal before they're processed by the NLMS algo:

NUM = [1 -0.98]
DEN = 1;
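
A quick sketch comparing that FIR with a common one-pole DC blocker, as an illustration only. The FIR y[n] = x[n] - 0.98*x[n-1] reduces DC to 0.02 of its value rather than removing it, and it boosts high frequencies (gain 1.98 at Nyquist), so it also behaves like a pre-emphasis filter; that flattening of the speech spectrum may be part of why the results look better. The pole radius 0.995 below is an assumed value, not something from the thread.

```python
def fir_preemphasis(x, a=0.98):
    """y[n] = x[n] - a*x[n-1], i.e. NUM = [1 -a], DEN = 1."""
    prev = 0.0
    out = []
    for v in x:
        out.append(v - a * prev)
        prev = v
    return out


def dc_blocker(x, r=0.995):
    """y[n] = x[n] - x[n-1] + r*y[n-1]: exact zero at DC, pole at z = r."""
    x_prev = 0.0
    y_prev = 0.0
    out = []
    for v in x:
        y = v - x_prev + r * y_prev
        out.append(y)
        x_prev, y_prev = v, y
    return out


constant = [1.0] * 5000                 # pure DC input
fir_out = fir_preemphasis(constant)     # settles at 1 - 0.98
blk_out = dc_blocker(constant)          # decays towards zero
```

Whichever filter is used, applying the same one to both the speaker and the microphone signal keeps the echo path seen by the NLMS unchanged.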


0
Reply mjames2393 (69) 6/10/2012 9:54:19 PM

> well, that of course depends on what the NLMS is adapting to.  does it
> appear to be the case that the last 400 or so coefficients are far
> closer to zero than the peak at sample 18?

Yes. Some are positive and some are negative...Looks like
"mountains" if you zoom in on the tail...

> acoustic transducers are not physically moved, does the NLMS settle down
> to match an constant impulse response?

Yes...but the funny thing is that sometimes I see ripples waving
through on both sides of the peak in the estimated FIR coefficient vector



> one thing that is common with LMS filters that are used for
> speaker-phone cancellation, is that the adaptation gain is reduced when
> there is "double talk", when both the speaker is talking (this would be
> the e[n] that gets added to the "desired signal" d[n]) and there is
> direct and echo feedback, the d[n], which means there is some non-zero
> signal in x[n].  someone else here might have a far better idea than me
> for how that downward adjustment to the adaptation gain is done when
> there is "double talk".

Yes, but how do you distinguish between double-talk and a temporarily
unconverged NLMS filter? In both cases you are going to see a signal
on the output side and it can be either echo coming through because
the NLMS filter hasn't converged or it can be a near-end signal.

I calculate the update gain like this:

powerOfSpeakerSignal = sampleBuffer*(sampleBuffer');

updateGain = mu*sampleError/powerOfSpeakerSignal;



where sampleBuffer contains the time-reversed speaker samples.
where mu is a constant

and where

sampleError = actualMicrophoneSample - estimatedMicrophoneSample;
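
A sketch of the same per-sample update with one extra safeguard, assuming (this is not in the code above) that a small constant `eps` is added to the speaker power. During speech pauses the power can get close to zero, and dividing by it is one possible cause of a sudden impulse-like blow-up in the error. The function name and the default values are illustrative.

```python
def nlms_update(h, sample_buffer, mic_sample, mu=0.5, eps=1e-6):
    """One regularized NLMS step; sample_buffer holds time-reversed speaker samples."""
    estimated = sum(hk * xk for hk, xk in zip(h, sample_buffer))
    sample_error = mic_sample - estimated
    power = sum(xk * xk for xk in sample_buffer)
    update_gain = mu * sample_error / (power + eps)   # eps avoids division by ~0
    new_h = [hk + update_gain * xk for hk, xk in zip(h, sample_buffer)]
    return new_h, sample_error


# A silent speaker buffer leaves the coefficients untouched instead of failing.
h_silent, e_silent = nlms_update([0.2, -0.1], [0.0, 0.0], 0.5)
```

A reasonable `eps` depends on the signal scaling; it should be small compared with the typical buffer power so that it only matters when the speaker is nearly silent.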

0
Reply mjames2393 (69) 6/10/2012 11:51:23 PM

On Thursday, June 7, 2012 7:58:59 PM UTC-5, Mauritz Jameson wrote:
> Hi
>
> I'm playing around with a traditional NLMS algorithm for acoustic echo
> cancellation.
>
> It seems like the NLMS algorithm have a hard time cancelling the echo
> as soon as the ERL level creeps above 0dB.
>=20
> Is that to be expected?
>
> The speaker signal is pure speech (no noise) and I have linearly
> filtered the speaker signal to have a simulated microphone signal. The
> adaptive filter order is 512 taps.
> I have made a test script which tests the NLMS algo with various step
> sizes ranging from very small (0.00001) to 1. For low step sizes
> nothing happens (the echo is not cancelled) and
> then at some point it just breaks (the error has a huge output ...like
> an impulse)...
>
> Is there a way to calculate the optimal step size given no
> interference (i.e. no near-end signal) and also given that there is
> some interference ?

Mauritz,
Hi. Don't worry about double-talk at the moment. That is a different
convergence problem associated with the condition that the near-end and
far-end signals are correlated (review what the NLMS approximates). If
you can't converge using white noise with echoes that are greater than
the input, you have bigger problems.

As far as being able to use coefficients greater than 1, I suggest you
look at my article in IEEE Signal Processing Magazine about enhanced
convergence of the NLMS. For now, use an update gain of 1, use white
noise, and see if you can converge with an echo 1.5 times greater than
the input. Your error signal should be very close to zero.

The reason to use white noise is simple. Unless your impulse response is
an impulse, it will be frequency dependent. Speech does not possess all
frequencies at the same time. Therefore, your filter will converge to
those frequencies present in the speech signal, then change when the
speech signal changes. This means that as the frequency content changes,
you could actually lose the innovation you had for the past frequencies.
For now, use white noise. You can move to speech signals after you get
the algorithm running.

How big is your Matlab program? If not too big, share it.
0
Reply maury001 (237) 6/11/2012 3:48:57 PM

On Thursday, June 7, 2012 7:58:59 PM UTC-5, Mauritz Jameson wrote:
> Hi
> 
> I'm playing around with a traditional NLMS algorithm for acoustic echo
> cancellation.
> 
> It seems like the NLMS algorithm have a hard time cancelling the echo
> as soon as the ERL level creeps above 0dB.
> 
> Is that to be expected?
> 
> The speaker signal is pure speech (no noise) and I have linearly
> filtered the speaker signal to have a simulated microphone signal. The
> adaptive filter order is 512 taps.
> I have made a test script which tests the NLMS algo with various step
> sizes ranging from very small (0.00001) to 1. For low step sizes
> nothing happens (the echo is not cancelled) and
> then at some point it just breaks (the error has a huge output ...like
> an impulse)...
> 
> Is there a way to calculate the optimal step size given no
> interference (i.e. no near-end signal) and also given that there is
> some interference ?

Oh yes, I forgot to add. If you do not dedicate a coefficient in your filter to the DC component, then YES you should filter it out.
0
Reply maury001 (237) 6/11/2012 4:01:45 PM

On 6/11/12 12:01 PM, maury wrote:
>
> If you do not dedicate a coefficient in your filter to the DC component, then YES you should filter it out.

while i do not disagree with the advice to put in a DC blocking filter, 
i don't understand the concept of dedicating a coefficient in the LMS 
filter to DC.  that "DC coefficient" is actually the sum of all of the 
h[n] coefficients in the LMS FIR.  i s'pose you could calculate and 
subtract the mean of the h[n], and that would be like setting that 
hypothetical DC component to zero.  either way, it's not a bad idea to 
filter the DC out of the "desired" signal, d[n].

-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."


0
Reply rbj (3920) 6/11/2012 5:23:29 PM

Just to make sure I understand.

Are you saying that I should remove DC from the error signal?

Right now I am just pre-filtering the microphone and speaker signal.
The DC-suppressed mic and speaker signal are then handed over to the
NLMS algorithm.

I will share the MATLAB code later today...

Thank you for all the feedback so far...
0
Reply mjames2393 (69) 6/11/2012 7:49:26 PM

On 6/11/12 3:49 PM, Mauritz Jameson wrote:
> Just to make sure I understand.
>
> Are you saying that I should remove DC from the error signal?

it wasn't what i meant to say.  i meant x[n] which is the speaker 
signal, if you are doing speakerphone feedback cancellation.  the "error 
signal", e[n], is actually your net microphone signal after your NLMS 
attempts to remove (via subtraction in the digital domain) the acoustic 
coupling into the microphone from the loudspeaker.  i guess it's d[n], 
the "desired signal" is the raw data that comes in from your mic.

>
> Right now I am just pre-filtering the microphone and speaker signal.
> The DC-suppressed mic and speaker signal are then handed over to the
> NLMS algorithm.

that might be okay.  i might do it to the speaker and not the mic, but 
because of the DC offset that might occur with the mic preamp and A/D 
conversion, it might be a good idea to DC block the mic input also.  if 
you DC block both x[n] and e[n] and if your NLMS is behaving well, the 
mean value (or sum) of the FIR coefficients *should* tend to add to zero 
on average.


-- 

r b-j                  rbj@audioimagination.com

"Imagination is more important than knowledge."


0
Reply rbj (3920) 6/11/2012 8:42:45 PM

Maury,

Here is a link to the code:

https://www.dropbox.com/s/ptw44onhtydv8d0/nlmsAlgorithm.zip

0
Reply mjames2393 (69) 6/12/2012 1:37:27 AM

On Monday, June 11, 2012 8:37:27 PM UTC-5, Mauritz Jameson wrote:
> Maury,
>
> Here is a link to the code:
>
> https://www.dropbox.com/s/ptw44onhtydv8d0/nlmsAlgorithm.zip

Mauritz,
I have one wee little question, and a couple of comments. First the
little question: why are you using the block NLMS algorithm? I know why
I have used it in the past, just curious to know why you are using it.

Now the comments. You will be happy to know that your algorithm works. I
changed your testIndex to 1 in main.m (testIndex = 1;), then added

microphoneSignal = 10*microphoneSignal;

after

microphoneSignal = filter(b,1,speakerSignal);

It worked fine, no problems. So your basic algorithm will cancel echoes
that are greater than the input (in this case 5 times bigger). Instead
of using randi to generate noise, I would suggest using
din = randn(col,row), then using din = din/max(abs(din)) to normalize
(randn is a white Gaussian noise generator).

Your spk.wav and mic.wav signals are different speakers, and different
speech. This is NOT a test of the NLMS algorithm, but a test of your
algorithm performance with correlated interference. This is called
"double-talk" (as I think you know). Correlated double-talk WILL cause
the NLMS algorithm to diverge. No (simple) way around it. Your task is
NOT to find out what is wrong with the NLMS algorithm, but to find out
how to detect when double-talk occurs, and what to do when you detect it.
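
One classic starting point, given here only as a hedged sketch and not as the method recommended in this post, is a Geigel-style detector: declare double-talk when the microphone sample is larger than a threshold times the recent peak of the speaker signal. The threshold of 0.5 is the usual telephone-hybrid assumption (echo at least 6 dB below its source); for an acoustic path where the echo is louder than the speaker signal it would have to be raised above the expected echo gain. The function name and defaults are illustrative.

```python
def geigel_double_talk(mic_sample, recent_speaker, threshold=0.5):
    """Return True when the mic sample is too large to be echo alone.

    recent_speaker: the last N speaker samples (N about the echo path length).
    threshold: assumed upper bound on the echo gain.
    """
    peak = max(abs(v) for v in recent_speaker)
    return abs(mic_sample) > threshold * peak


# Hypothetical use with an echo gain of up to about 2:
recent = [0.1, -0.2, 0.3]
freeze_adaptation = geigel_double_talk(0.9, recent, threshold=2.0)
```

A typical response is to freeze or slow the coefficient update while the detector fires, usually with a short hold time afterwards so the filter does not start adapting again in the middle of a word.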

If you can still find it, I would suggest you look at two publications.
The first is an echo canceller tutorial from the T1A1 committee of ANSI,
"ATIS T1.TR.27: Echo Cancelling", published Nov. 1993.
The second is ITU-T, G.168. The appendix (I believe appendix I) has
"Considerations regarding echo canceller performance during double
talk". If you Google "echo canceller ITU", then click "G.168 : New
Appendix VII on guidance on echo canceller ... - ITU", you can get
G.168(04.97) for free. I don't know about the ATIS T1.TR.27 cost.

Good luck,

Maurice
0
Reply maury001 (237) 6/12/2012 2:36:24 PM

On Monday, June 11, 2012 8:37:27 PM UTC-5, Mauritz Jameson wrote:
> Maury,
>
> Here is a link to the code:
>
> https://www.dropbox.com/s/ptw44onhtydv8d0/nlmsAlgorithm.zip

One other thing, a couple of questions for you.

Assume a simple impulse response such as an impulse (all coefficients = 0,
except one, i.e. a pure delay). If the echo is greater than the input,
will the coefficient in the adaptive filter that represents that impulse
have a value less than or greater than 1? Will the error signal
sometimes be greater than 1? (Take another look at your whole algorithm.)

Second question. If the echo canceller always sees double-talk, and
never sees just the far-end signal, will it have a chance to converge?

Sometimes the way you test an algorithm is as critical, if not more so,
as testing the algorithm itself.

Maurice
0
Reply maury001 (237) 6/12/2012 2:46:17 PM

Maurice,

Thank you for your valuable feedback so far. I appreciate it.

I'm using the algorithm in a real-time setup where the audio is delivered
in blocks. So that's why I process the audio in blocks.

I don't understand why you say that the mic and spk files are different?
In the beginning of the mic file I hadn't enabled the loudspeaker. If you
"scroll" forward in the mic file you will hear the echo of the speaker
signal in the microphone signal. There is some background noise in the
mic file, but that's to be expected in a real setup, right?

I noticed that I see ripples through the estimated filter coefficients
when double-talk occurs. The peak in the filter coefficient vector
usually stays in the same location. Any suggestions as to how I detect
these ripples? It looks like a wave propagating through the filter
coefficient vector. I was thinking about measuring the variance of the
leading "zeros" and using the change in variance as an indicator of
double-talk. On the other hand, I also noticed that even though the
filter diverges during double-talk, it also reconverges pretty fast...so
I'm wondering how much I will benefit from not updating the filter during
double-talk? Will the double-talk segment sound much better?

If the echo is stronger than its source, the peak in the estimated filter
coefficients will probably be larger than 1 (this is just a guess) if
there's no near-end speech present.

The error signal should stay within a valid range from -1 to 1. The
estimated microphone sample should ideally stay pretty close to the
actual microphone sample given that there is no near-end speech. If
there is near-end speech, the microphone sample could (for example) be
-1 while the estimated is 1. That means that the error has a range from
-2 to 2. I placed a limiter on it in case the error signal is out of
bounds, but maybe I should revise the boundaries from -1 to 1 to -2 to 2.
I haven't investigated if the limiter introduces some undesired behavior
of the algorithm or if it lowers the performance of the NLMS
algorithm....

If the echo canceller always sees double-talk and never just sees the
far-end signal, I don't think it will converge. Most likely it will adapt
to a false optimum. I'm not sure what that will look like.

I will see if I can find the publications you recommended. Thank you!!!



0
Reply mjames2393 (69) 6/12/2012 3:46:16 PM

On Tuesday, June 12, 2012 10:46:16 AM UTC-5, mjame...@gmail.com wrote:
> Maurice,
>=20
> Thank you for your valuable feedback so far. I appreciate it.
>=20
> I'm using the algorithm in a real-time setup where the audio is delivered=
 in blocks. So that's why I process the audio in blocks.
>=20

Understand.


> I don't understand why you say that the mic and spk files are different? =
In the beginning of the mic file I hadn't enabled the loudspeaker. If you "=
scroll" forward in the mic file you will hear the echo of the speaker signa=
l in the microphone signal. There is some background noise in the mic file,=
 but that's to be expected in a real setup, right?
>=20

When I listen to the two, they are readings of two different passasges from=
 The Hobbit. If one is the near-end and the other is the far-end (speaker a=
nd microphone), make sure you have some time at the beginning with just the=
 far-end so the filter can converge.


> I noticed that I see ripples through the estimated filter coefficients wh=
en double-talk occurs. The peak in the filter coefficient vector usually st=
ays in the same location. Any suggestions as to how I detect these ripples?=
 It looks like a wave propagating through the filter coefficient vector. I =
was thinking about measuring the variance of the leading "zeros" and use th=
e change in variance as an indicator of double-talk. On the other hand, I a=
lso noticed that even though the filter diverges during double-talk, it als=
o reconverges pretty fast...so I'm wondering how much I will benefit from n=
ot updating the filter during double-talk? Will the double-talk segment sou=
nd much better?
>=20

How you detect double-talk and what you do when you detect it is for you to=
 determine.


> If the echo is stronger than its source, the peak in the estimated
> filter coefficients will probably be larger than 1 (this is just a
> guess) if there's no near-end speech present.
>
> The error signal should stay within a valid range from -1 to 1. The
> estimated microphone sample should ideally stay pretty close to the
> actual microphone sample given that there is no near-end speech. If
> there is near-end speech, the microphone sample could (for example) be
> -1 while the estimated sample is 1. That means that the error has a
> range from -2 to 2. I placed a limiter on it in case the error signal
> is out of bounds, but maybe I should revise the boundaries from -1 to 1
> to -2 to 2. I haven't investigated if the limiter introduces some
> undesired behavior of the algorithm or if it lowers the performance of
> the NLMS algorithm....
>

Ask yourself what you are trying to prevent by limiting the error. Then
remove the limiter to see if you were successful. Then ask yourself if
your fix is needed.
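
The range argument in the quoted text can be checked with a few lines of Python. The `limit` helper is a hypothetical stand-in for whatever limiter the poster's code uses; the sample values are the ones from the quoted example.

```python
def limit(value, bound):
    """Clamp value to the interval [-bound, bound]."""
    return max(-bound, min(bound, value))

mic_sample = -1.0        # actual microphone sample (near-end speech present)
estimated_sample = 1.0   # echo estimate from the adaptive filter
error = mic_sample - estimated_sample

clipped_to_1 = limit(error, 1.0)  # limiter with the original -1..1 bounds
clipped_to_2 = limit(error, 2.0)  # limiter with the revised -2..2 bounds
```

With bounds of -1 to 1 the limiter changes the error that is fed back into the coefficient update, while with -2 to 2 it passes this worst case unchanged. That is one reason to test with the limiter removed before deciding it is needed.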


> If the echo canceller always sees double-talk and never just sees the
> far-end signal, I don't think it will converge. Most likely it will
> adapt to a false optimum. I'm not sure what that will look like.
>
> I will see if I can find the publications you recommended. Thank you!!!

You're welcome

Maurice

0
Reply maury001 (237) 6/13/2012 6:37:28 PM

Maurice,

Thank you for your valuable feedback. I listened to the audio files in
the zip file and the mic file only contains echo from the loudspeaker
(if you disregard background noise). During the first 45 seconds of the
mic file I hadn't put the loudspeaker on. After 45 seconds I switched on
the loudspeaker, and you will hear that the audio in the mic file is the
echo from the loudspeaker. That echo is time-aligned with the audio in
the speaker signal.


I tested the algorithm with speech and I fail to understand why the
filter doesn't converge. Any ideas/suggestions?

The test can be downloaded from this link:

https://www.dropbox.com/s/14q5f8x1wbbzsjq/nlmstest.zip

The audio files in the zip file are not identical to the previous audio
files, but the procedure is the same...I enable the loudspeakers after
45 seconds.


0
Reply mjames2393 (69) 6/13/2012 7:57:39 PM

On Wednesday, June 13, 2012 2:57:39 PM UTC-5, mjame...@gmail.com wrote:
> Maurice,
>
> Thank you for your valuable feedback. I listened to the audio files in
> the zip file and the mic file only contains echo from the loudspeaker
> (if you disregard background noise). During the first 45 seconds of the
> mic file I hadn't put the loudspeaker on. After 45 seconds I switched
> on the loudspeaker, and you will hear that the audio in the mic file is
> the echo from the loudspeaker. That echo is time-aligned with the audio
> in the speaker signal.
>
> I tested the algorithm with speech and I fail to understand why the
> filter doesn't converge. Any ideas/suggestions?
>
> The test can be downloaded from this link:
>
> https://www.dropbox.com/s/14q5f8x1wbbzsjq/nlmstest.zip
>
> The audio files in the zip file are not identical to the previous audio
> files, but the procedure is the same...I enable the loudspeakers after
> 45 seconds.

Mauritz,
First of all, your algorithm is getting about 20 dB cancellation. So I
don't know why you say it isn't working.

Secondly, the adaptive filter can only adapt to things that are
consistent with the model. In this case y = x'h. Your algorithm has this
as a linear model. Therefore, if there is anything in the return path
(echo) that is non-linear, or longer than the model (filter) length, or
not consistent with the model, then the adaptive filter cannot adapt to
it. Remember, the NLMS only says that it will get the least mean square
error. It doesn't make promises about filter coefficient error.
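
For reference, the linear model y = x'h and a standard textbook form of the NLMS update can be written out as below. Here d(n) is the microphone sample, x(n) is the vector of the most recent speaker samples, and mu is the step size. The small regularization constant epsilon is an assumption of this sketch and may not match what nlmsAlgo.m does.

```latex
\begin{aligned}
y(n) &= \mathbf{x}^{T}(n)\,\mathbf{h}(n) \\
e(n) &= d(n) - y(n) \\
\mathbf{h}(n+1) &= \mathbf{h}(n)
  + \frac{\mu\, e(n)\,\mathbf{x}(n)}{\mathbf{x}^{T}(n)\,\mathbf{x}(n) + \epsilon}
\end{aligned}
```

The update only drives e(n) toward its least-mean-square value. Nothing in it forces h(n) to match the physical echo path when that path is non-linear or longer than the filter.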

Now try this. In your main.m for testIndex = 2, replace
"microphoneSignal = wavread('mic.wav');" with
"microphoneSignal = filter(b,1,speakerSignal);". Then replace
"outputSignal = nlmsAlgo(speakerSignal, microphoneSignal, 1);" with
"[outputSignal,h] = nlmsAlgo(speakerSignal, microphoneSignal, 1);".
Then in nlmsAlgo.m change the function declaration to
"function [outputSignal,h] = nlmsAlgo(speakerSignal, microphoneSignal, mu)".

This does two things. First, your simulated impulse response, b, is
linear. Your microphone signal is now a linear convolution of the
unknown impulse response and the speaker signal. Second, you will now
have the adaptive filter, h, as a variable.

Now, run main, and your code will show you the cancellation possible if
the impulse response is linear. Now plot b and h together to compare
actual with the learned. Since your variables are row vectors, you will
need to use plot([b' h']).

You now have two experiments: one with "b" as the unknown impulse
response, and the other with your microphone wave file. Compare the
results, think about the assumptions made when using the NLMS, then
formulate a theory that is consistent with your observations.
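
A rough, self-contained Python stand-in for the simulated-path experiment is sketched below; it is not the code from the zip file. Assumptions made for the example: white noise replaces the speech signal, the impulse response `b` is invented (with a peak above 1 so the echo is stronger than its source), and `fir_filter`, `nlms` and `cancellation_db` are hypothetical helpers, not the poster's nlmsAlgo.m. A second run with a filter shorter than `b` illustrates the "longer than the model length" case.

```python
import math
import random

def fir_filter(b, x):
    """FIR equivalent of MATLAB's filter(b, 1, x)."""
    return [sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
            for n in range(len(x))]

def nlms(x, d, num_taps, mu=1.0, eps=1e-8):
    """Sample-by-sample NLMS; returns (error signal, final taps)."""
    h = [0.0] * num_taps
    buf = [0.0] * num_taps                      # newest input sample first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        err = dn - sum(hi * xi for hi, xi in zip(h, buf))
        gain = mu * err / (sum(xi * xi for xi in buf) + eps)
        h = [hi + gain * xi for hi, xi in zip(h, buf)]
        errors.append(err)
    return errors, h

def power(s):
    return sum(v * v for v in s) / len(s)

def cancellation_db(mic, err, tail=1000):
    """Echo reduction over the last `tail` samples, in dB.
    The floor avoids log(0) when the error reaches exactly zero."""
    return 10 * math.log10(power(mic[-tail:]) / max(power(err[-tail:]), 1e-30))

random.seed(0)
speaker = [random.uniform(-1.0, 1.0) for _ in range(4000)]
b = [0.0, 0.0, 1.5, -0.7, 0.3, 0.1]             # invented echo path, peak > 1
mic = fir_filter(b, speaker)                    # purely linear "microphone"

# Case 1: the filter is at least as long as the echo path.
err_long, h_long = nlms(speaker, mic, num_taps=8, mu=1.0)
long_db = cancellation_db(mic, err_long)
b_padded = b + [0.0] * (len(h_long) - len(b))
max_tap_error = max(abs(hi - bi) for hi, bi in zip(h_long, b_padded))

# Case 2: the filter is shorter than the echo path (model mismatch).
err_short, h_short = nlms(speaker, mic, num_taps=3, mu=1.0)
short_db = cancellation_db(mic, err_short)
```

Printing `long_db`, `short_db` and `max_tap_error` gives the comparison suggested above: when the path fits the model the learned taps track `b` and the echo is cancelled almost completely, and when the path is longer than the filter the residual echo stays, no matter what the step size is.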

Maurice
0
Reply maury001 (237) 6/14/2012 3:08:35 PM

Thank you Maurice. I understand the experiment that you suggest.
However, I am trying to understand why I see "bursts" of strong echo
slipping through the NLMS algorithm when I use real-world signals.
From what you're saying the explanation is that these microphone
segments can't be modeled as the output of a linear operation on the
speaker signal? You are saying that mic.wav contains segments which
are the result of a non-linear operation on the speaker signal? My
question is : How can I improve my algorithm so it handles these non-
linear segments better? What do you do if you want to stick with a
linear model? Do you increase the number of filter coefficients to
"cover" for these possible non-linearities?

I will run the experiment you suggested and post my "theory" when I'm
done.

0
Reply mjames2393 (69) 6/14/2012 3:35:16 PM

On Thursday, June 14, 2012 10:35:16 AM UTC-5, Mauritz Jameson wrote:
> Thank you Maurice. I understand the experiment that you suggest.
> However, I am trying to understand why I see "bursts" of strong echo
> slipping through the NLMS algorithm when I use real-world signals.
> From what you're saying the explanation is that these microphone
> segments can't be modeled as the output of a linear operation on the
> speaker signal? You are saying that mic.wav contains segments which
> are the result of a non-linear operation on the speaker signal? My
> question is : How can I improve my algorithm so it handles these non-
> linear segments better? What do you do if you want to stick with a
> linear model? Do you increase the number of filter coefficients to
> "cover" for these possible non-linearities?
> 
> I will run the experiment you suggested and post my "theory" when I'm
> done.

You now have everything you need. Study my replies, and the two references I gave you. You should be able to take it from here.
0
Reply maury001 (237) 6/15/2012 6:54:12 PM
