### IIR filter gain - where to do it?

Hi,
I have the current 3rd order IIR filter:

gain = 1789.111562
b = [1 3 3 1];
a = [1 -1.6009450356 0.9414772490 -0.1971027115];

I'm trying to implement this in fixed-point format, using DF1. I don't
think I need to convert to Second-Order-Section as this is not an
aggressive filter.

The first step was to quantize the coefficients, and I can get by with 12
bits.

Now I don't know what to do with this gain. If I do it at the input, I need
to add ~11 LSB to keep decent SNR. If I do it at the output, well I need to
add ~11 MSB or everything starts clipping within the filter.

Is there a good way to handle such a case or I basically just need the
bits?

Dave
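
For a sense of scale, the "~11 bits" above follows from the base-2 logarithm of the quoted gain. A minimal Python sketch (the function name is made up for illustration; only the gain value comes from the post):

```python
import math

def extra_bits_for_gain(gain):
    """Whole bits needed to absorb a linear gain factor (ceil of log2)."""
    return math.ceil(math.log2(gain))

GAIN = 1789.111562  # value quoted in the post

print(extra_bits_for_gain(GAIN))      # number of extra bits
print(round(20 * math.log10(GAIN)))   # same gain expressed in dB
```

The same number of bits shows up whether they are spent as LSBs (scaling at the input) or as MSBs (scaling at the output); only their position relative to the binary point differs.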

> I'm trying to implement this in fixed-point format, using DF1. I don't
> think I need to convert to Second-Order-Section as this is not an
> aggressive filter.

Here is a biquad for an 8-bit 68HC08 controller:
(the text is in German, but the pictures should be clear enough).

The biquad is based on the version with one big accumulator
(picture 1 / Bild 1).
Several 8x8->16 bit unsigned multiply opcodes of the 68HC908
are used in a basic subroutine for a 16x16->32 bit unsigned
multiply.
The next level is the "MAC" subroutine shown in picture 8 /
Bild 8. Note there is an additional 32-bit ASL shift controlled
by a flag. It is needed to extend the coefficient range from
gain < 1.0 to gain < 2.0.
The flag bits for that are stored in flash in the unused
coefficient a0, which is always assumed to be 1.0.
The other coefficients (b0 b1 b2 a1 a2) in flash are
16-bit signed magnitude because that was slightly faster
than two's complement.
Coefficients and I/O data are scaled so that 1.0 = 32767 signed.
But inside the biquad (picture 6 / Bild 6) the word length goes
up to 32 bits and then 40 bits for the accumulator. At the
output there is saturation logic to limit to 32 bits, and
then a scale back (division) to 16 bits by reordering the
word.

The biquad should be simple to extend to the 3pole structure.

Any 16-bit controller is preferable to an 8-bit controller
in that application. And I doubt an 8-bit controller
without an 8x8->16 bit unsigned multiply opcode is much use.

> Now I don't know what to do with this gain.

Shift the input data up to 16 bits, but avoid
overflow. For a 16-bit word length the filter usually has no
gain. 16 bits is hardly wide enough to keep noise and limit cycles down.

Don't waste too much time on Matlab. A real implementation
tested with real data will show the real problems.

MfG  JRD

On 08/30/2010 07:11 AM, gretzteam wrote:
> Hi,
> I have the current 3rd order IIR filter:
>
> gain = 1789.111562
> b = [1 3 3 1];
> a = [1 -1.6009450356 0.9414772490 -0.1971027115];
>
> I'm trying to implement this in fixed-point format, using DF1. I don't
> think I need to convert to Second-Order-Section as this is not an
> aggressive filter.
>
> The first step was to quantize the coefficients, and I can get by with 12
> bits.
>
> Now I don't know what to do with this gain. If I do it at the input, I need
> to add ~11 LSB to keep decent SNR. If I do it at the output, well I need to
> add ~11 MSB or everything starts clipping within the filter.
>
> Is there a good way to handle such a case or I basically just need the
> bits?

Converting to a first- and a second-order section in cascade should
relieve the need for so much gain all in one place, even if it doesn't
buy you much in the necessary coefficient precision.
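
One way to sketch the suggested split in Python, using only the standard library: find the real pole of the cubic denominator by bisection, divide it out to get the second-order factor, and note that the numerator [1 3 3 1] is (1 + z^-1)^3, so it splits into [1 1] and [1 2 1]. The helper names are hypothetical, and the bisection assumes a single real root in (-1, 1), which is an assumption to check for any other coefficient set.

```python
def polyval(c, x):
    """Evaluate c[0]*x^n + ... + c[n] with Horner's rule."""
    acc = 0.0
    for ck in c:
        acc = acc * x + ck
    return acc

def real_root(c, lo=-1.0, hi=1.0, iters=200):
    """Bisection for a real root of c in [lo, hi]; assumes a sign change."""
    flo = polyval(c, lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = polyval(c, mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def split_cubic(a):
    """Split a monic cubic denominator into first- and second-order factors."""
    p = real_root(a)
    q1 = a[1] + p * a[0]          # synthetic division by (z - p)
    q2 = a[2] + p * q1
    return [1.0, -p], [a[0], q1, q2]

def convolve(u, v):
    """Polynomial product, used here only to check the factorisation."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out

a = [1.0, -1.6009450356, 0.9414772490, -0.1971027115]
a_first, a_second = split_cubic(a)
b_first, b_second = [1, 1], [1, 2, 1]   # (1 + z^-1) and (1 + z^-1)^2
print(a_first, a_second)
```

Each section then gets its own share of the overall scaling, so no single node has to carry the whole gain at once.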

"gretzteam" <gretzteam@n_o_s_p_a_m.yahoo.com> writes:

> Hi,
> I have the current 3rd order IIR filter:
>
> gain = 1789.111562
> b = [1 3 3 1];
> a = [1 -1.6009450356 0.9414772490 -0.1971027115];
>
> I'm trying to implement this in fixed-point format, using DF1. I don't
> think I need to convert to Second-Order-Section as this is not an
> aggressive filter.
>
> The first step was to quantize the coefficients, and I can get by with 12
> bits.
>
> Now I don't know what to do with this gain. If I do it at the input, I need
> to add ~11 LSB to keep decent SNR. If I do it at the output, well I need to
> add ~11 MSB or everything starts clipping within the filter.
>
> Is there a good way to handle such a case or I basically just need the
> bits?
>
> Dave

Dave,

Consider your input signal [-1, +1). If your filter's maximum passband
gain is 1, then the maximum output is one. That means you would be
scaled Q0.15 on the output for a 16-bit output (e.g.).

Now, you can get a gain of 2^M for free! Simply shift the binary point
to the right M places, M <= 15. Note that the actual integer output
values remain unchanged - it's just a change in the interpretation of
the output (namely, in the position of the imaginary binary point). You
will neither lose precision nor chance overflow (or saturation).
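
The "free" gain of 2^M can be illustrated by reading one and the same 16-bit integer with two different binary-point positions. This is only a sketch; `q_to_float` is a hypothetical helper, and M = 4 is an arbitrary choice.

```python
def q_to_float(raw, frac_bits):
    """Interpret an integer as fixed point with `frac_bits` fractional bits."""
    return raw / (1 << frac_bits)

raw = 0x2000                      # the stored word never changes

as_q0_15 = q_to_float(raw, 15)    # binary point left of 15 fractional bits
as_q4_11 = q_to_float(raw, 11)    # binary point moved right by M = 4 places

print(as_q0_15, as_q4_11, as_q4_11 / as_q0_15)
```

The ratio of the two readings is exactly 2^4, and no arithmetic was performed on the stored word, so no precision is lost and nothing can overflow.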

>>Dave,
>
>Consider your input signal [-1, +1). If your filter's maximum passband
>gain is 1, then the maximum output is one. That means you would be
>scaled Q0.15 on the output for a 16-bit output (e.g.).
>
>Now, you can get a gain of 2^M for free! Simply shift the binary point
>to the right M places, M <= 15. Note that the actual integer output
>values remain unchanged - it's just a change in the interpretation of
>the output (namely, in the position of the imaginary binary point). You
>will neither lose precision nor chance overflow (or saturation).

Hi,
I think I expressed myself incorrectly when talking about gain.

This 3rd order filter:
b = [1 3 3 1];
a = [1 -1.6009450356 0.9414772490 -0.1971027115];
has a gain of 1789.111562.

The input to the filter is [-1:1).
Now of course I want to have 0dB gain in the passband, so I need to scale
everything by 1/1789.111562 at some point. Doing it on the input requires
about 11 extra LSBs to keep SNR fine. Doing it on the output requires 11
extra MSBs so that the filter can handle numbers up to 1789...

I'm doing this in an FPGA so I have control of the wordlength at all nodes.
The 'FIR' part is trivial and requires no multiply at all. Then I was
thinking to do the scaling multiply before going in the IIR part - making
everything 11bits wider before we even start. I just feel like this
inherent gain is costing me so much!

Am I missing something?

Dave

"gretzteam" <gretzteam@n_o_s_p_a_m.yahoo.com> writes:

>>>Dave,
>>
>>Consider your input signal [-1, +1). If your filter's maximum passband
>>gain is 1, then the maximum output is one. That means you would be
>>scaled Q0.15 on the output for a 16-bit output (e.g.).
>>
>>Now, you can get a gain of 2^M for free! Simply shift the binary point
>>to the right M places, M <= 15. Note that the actual integer output
>>values remain unchanged - it's just a change in the interpretation of
>>the output (namely, in the position of the imaginary binary point). You
>>will neither lose precision nor chance overflow (or saturation).
>
>
> Hi,
> I think I expressed myself incorrectly when talking about gain.
>
> This 3rd order filter:
> b = [1 3 3 1];
> a = [1 -1.6009450356 0.9414772490 -0.1971027115];
> has a gain of 1789.111562.
>
> The input to the filter is [-1:1).
> Now of course I want to have 0dB gain in the passband, so I need to scale
> everything by 1/1789.111562 at some point. Doing it on the input requires
> about 11 extra LSBs to keep SNR fine. Doing it on the output requires 11
> extra MSBs so that the filter can handle numbers up to 1789...

Wait... This sounds wrong. If your filter gain is 1789, then you ALREADY
have to have the extra MSBs to cover that gain, or you must otherwise
somehow allow for the maximum necessary for the filter, e.g., by keeping
some number of bits (not necessarily MORE than what you have now) with a
binary point in the place that admits the max full-scale required by the
filter gain.

> I'm doing this in an FPGA so I have control of the wordlength at all nodes.
> The 'FIR' part is trivial and requires no multiply at all. Then I was
> thinking to do the scaling multiply before going in the IIR part - making
> everything 11bits wider before we even start. I just feel like this
> inherent gain is costing me so much!
>
> Am I missing something?

Yes. Extra gain !==> extra bits. Bits are for precision. For example,
you could have an 11 bit output with the binary point at 0 (i.e.,
integer) - that would cover your required output range. The problem you
need to analyze is what this output width does to your feedback
paths with respect to limit cycles, quantization errors, and such.

I'm not an expert in IIR design, and the intuitions I'm feeling are
too complex to type quickly, but I hope my intuition is correct and
this limited coverage gives you an idea of where to head.

"Randy Yates" <yates@ieee.org> wrote in message
news:m3hbibk64s.fsf@ieee.org...

> Consider your input signal [-1, +1). If your filter's maximum passband
> gain is 1, then the maximum output is one.

I don't get this.
Surely a full amplitude step into a brick-wall will give you
an output > 1.0.

Pete


> This 3rd order filter:
> b = [1 3 3 1];
> a = [1 -1.6009450356 0.9414772490 -0.1971027115];
> has a gain of 1789.111562.
>
> I'm doing this in an FPGA so I have control of the wordlength at all nodes.
> The 'FIR' part is trivial and requires no multiply at all.

How are the multiplications for the IIR done?
CSD (canonical signed digit), i.e. shift, add and subtract, would be possible,
but one would probably then try to thin out the bits in a1, a2, a3 too.

> Am I missing something?

What is the wordlength at the input ?
That would give an indication how much precision for coefficients is
needed.

MfG JRD
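
To make the CSD remark concrete, here is a small Python sketch of the standard conversion of an integer coefficient to canonical signed digits (digits in {-1, 0, +1}, no two adjacent nonzero digits). The function name and the choice of 9 fractional bits for quantizing a1 are assumptions for illustration only.

```python
def to_csd(n):
    """Return CSD digits of integer n, least significant digit first."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)   # n mod 4 == 1 gives +1, n mod 4 == 3 gives -1
            n -= d            # n is now a multiple of 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits

FRAC_BITS = 9                                   # assumed quantization
a1_q = round(-1.6009450356 * (1 << FRAC_BITS))  # -820
digits = to_csd(a1_q)

# Each nonzero digit is one shifted add or subtract in hardware.
terms = [(d, i) for i, d in enumerate(digits) if d != 0]
print(a1_q, terms)
```

The number of entries in `terms` is the adder/subtractor count for that coefficient, which is what "thinning out the bits" tries to reduce.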


"Pete Fraser" <pfraser@covad.net> writes:

> "Randy Yates" <yates@ieee.org> wrote in message
> news:m3hbibk64s.fsf@ieee.org...
>
>> Consider your input signal [-1, +1). If your filter's maximum passband
>> gain is 1, then the maximum output is one.
>
> I don't get this.
> Surely a full amplitude step into a brick-wall will give you
> an output > 1.0.

You mean because of overshoot? I see that. Good point. I probably
should have just stated that the output was assumed to be [-1, +1).

The actual maximum output value (assuming you don't saturate)
depends on the "impulse response area", i.e. the sum of the
absolute values of the impulse response:

A = \sum_{n = -\infty}^{+\infty} |h[n]|

For FIR this is computable. For IIR, I don't know of an easy way
to determine that. (Or if I did I've forgotten...)
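
For a stable IIR filter the sum can at least be approximated numerically: run the recurrence on a unit impulse and add up the absolute values until the tail has decayed. A minimal sketch with hypothetical helper names; the truncation length of 2000 samples is an arbitrary assumption that is ample for well-damped poles.

```python
def impulse_response(b, a, n):
    """First n samples of h for H(z) = B(z)/A(z); assumes a[0] == 1."""
    h = []
    for k in range(n):
        acc = b[k] if k < len(b) else 0.0
        for j in range(1, len(a)):
            if k - j >= 0:
                acc -= a[j] * h[k - j]
        h.append(acc)
    return h

def l1_norm(b, a, n=2000):
    """Approximate sum of |h[n]|: worst-case output for input in [-1, 1]."""
    return sum(abs(v) for v in impulse_response(b, a, n))

def dc_gain(b, a):
    """Gain at zero frequency, H(z) evaluated at z = 1."""
    return sum(b) / sum(a)

b = [1, 3, 3, 1]
a = [1.0, -1.6009450356, 0.9414772490, -0.1971027115]
print(dc_gain(b, a), l1_norm(b, a))
```

The sum of |h[n]| is never smaller than the DC gain, so it is the safer figure for sizing the MSBs. Evaluated on the coefficients exactly as quoted in this thread, `dc_gain` comes out near 55.8 rather than 1789.11, so the quoted gain presumably belongs to a different coefficient set; the method is the same either way.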

>> This 3rd order filter:
>> b = [1 3 3 1];
>> a = [1 -1.6009450356 0.9414772490 -0.1971027115];
>> has a gain of 1789.111562.
>>
>> I'm doing this in an FPGA so I have control of the wordlength at all nodes.
>> The 'FIR' part is trivial and requires no multiply at all.
>
>How are the multiplications for the IIR done ?
>CSD / canonical signed digit thats shift&add&subtract would be possible,
>but one would probably then try to thin out the bits in a1, a2, a3 too.
>
>> Am I missing something?
>
>What is the wordlength at the input ?
>That would give an indication how much precision for coefficients is
>needed.
>
>MfG JRD
>
>

Coefficients are currently quantized to 12.9 format. I could go to CSD but
this is an implementation detail at this point.
Input is in 1.23 format.
Basically I'm only struggling with this 'gain'. It just seems too bad that
on top of adding a bunch of bits just because it's IIR etc, I need to add
11 bits because the thing has a HUGE gain!

Dave


>>> This 3rd order filter:
>>> b = [1 3 3 1];
>>> a = [1 -1.6009450356 0.9414772490 -0.1971027115];
>>> has a gain of 1789.111562.

> Input is in 1.23 format.

If there really are 24 bits of valid data, then the filter
needs a minimum internal word length of 24 bits,
instead of the more common 16 bits.

> Basically I'm only struggling with this 'gain'.

A multiplication following the filter and extending
the word length from 24 bits by another 12 bits?
36 bits sounds wrong. If 24-bit precision goes
in, (less than) 24-bit precision comes
out.

The other issue is how close to "1789.111562" the gain
has to be. A pure shift such as "1024.0" is probably not good enough.
Would a fraction x/y be close enough, with a multiplier that extends
the word length for x and the division by y done by shifts?

MfG  JRD
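
The x/y idea above can be sketched as a search for an integer multiplier m and a shift s such that m / 2^s approximates 1/gain. This is an illustration only; the function name and the 12-bit multiplier width are assumptions, not something stated in the thread.

```python
def scale_as_mult_shift(gain, mult_bits):
    """Return (m, s) with m / 2**s close to 1/gain and m < 2**mult_bits."""
    target = 1.0 / gain
    s = 0
    # Increase the shift as long as the rounded multiplier still fits.
    while round(target * (1 << (s + 1))) < (1 << mult_bits):
        s += 1
    m = round(target * (1 << s))
    return m, s

GAIN = 1789.111562
m, s = scale_as_mult_shift(GAIN, 12)
residual = m / (1 << s) * GAIN      # 1.0 would be an exact 0 dB passband
print(m, s, residual)
```

A wider multiplier pushes `residual` closer to 1.0; a bare shift is the special case m = 1, which leaves whatever error the nearest power of two gives.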


On 08/31/2010 07:46 AM, gretzteam wrote:
>>> This 3rd order filter:
>>> b = [1 3 3 1];
>>> a = [1 -1.6009450356 0.9414772490 -0.1971027115];
>>> has a gain of 1789.111562.
>>>
>>> I'm doing this in an FPGA so I have control of the wordlength at all nodes.
>>> The 'FIR' part is trivial and requires no multiply at all.
>>
>> How are the multiplications for the IIR done ?
>> CSD / canonical signed digit thats shift&add&subtract would be possible,
>> but one would probably then try to thin out the bits in a1, a2, a3 too.
>>
>>> Am I missing something?
>>
>> What is the wordlength at the input ?
>> That would give an indication how much precision for coefficients is
>> needed.
>>
>> MfG JRD
>>
>>
>
> Coefficients are currently quantized to 12.9 format. I could go to CSD but
> this is an implementation detail at this point.
> Input is in 1.23 format.
> Basically I'm only struggling with this 'gain'. It just seems too bad that
> on top of adding a bunch of bits just because it's IIR etc, I need to add
> 11 bits because the thing has a HUGE gain!

Well, put in attenuation then.

What you're seeing is a natural consequence of using an IIR filter -- so
it's just another thing you need to do just because it's an IIR.

I'm not sure that the bits you have to add 'just because it's an IIR'
are different from the bits you have to add to accommodate the filter
gain -- have you checked?

>I'm not sure that the bits you have to add 'just because it's an IIR'
>are different from the bits you have to add to accommodate the filter
>gain -- have you checked?

Yeah I think this is it...I feel kind of dumb right now. I got confused
because the 'b' coefficients are so trivial that I didn't want to scale
them with the 'attenuation'.

So I now end up doing the FIR part 'all-flat' using shift and add. Then I
right shift by 11 bits before going in the IIR part - essentially giving me
the usual additional LSBs required by the IIR part. This seems to work
pretty well.

Thanks for making me realize this!
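
A rough integer model of the arrangement described above, as one possible reading of it: the [1 3 3 1] part is done with shifts and adds at full precision, the "right shift by 11" is realised by moving the binary point (so the 11 bits become extra LSBs rather than being thrown away), and the feedback part runs on that wider word. The 9 fractional coefficient bits, the rounding of the feedback sum, and all names are assumptions. Python integers are unbounded, so the model does not show the word widths a real FPGA design would have to allocate.

```python
FRAC_IN = 23     # input is Q1.23, as stated in the thread
SHIFT = 11       # power-of-two attenuation between the two parts
COEF_FRAC = 9    # assumed fractional bits of the quantized a coefficients

A = [1.0, -1.6009450356, 0.9414772490, -0.1971027115]
A_Q = [round(c * (1 << COEF_FRAC)) for c in A[1:]]   # integer a1..a3

def df1_fixed(x_q):
    """x_q: Q1.23 integers. Output has FRAC_IN + SHIFT fractional bits."""
    x1 = x2 = x3 = 0
    y1 = y2 = y3 = 0
    half = 1 << (COEF_FRAC - 1)
    out = []
    for x0 in x_q:
        # Numerator with shifts and adds only: 3*v == (v << 1) + v
        w = x0 + ((x1 << 1) + x1) + ((x2 << 1) + x2) + x3
        # No bits are dropped here: w is simply read with SHIFT more
        # fractional bits, which is the division by 2**SHIFT.
        fb = A_Q[0] * y1 + A_Q[1] * y2 + A_Q[2] * y3
        y0 = w - ((fb + half) >> COEF_FRAC)   # rounded feedback sum
        out.append(y0)
        x1, x2, x3 = x0, x1, x2
        y1, y2, y3 = y0, y1, y2
    return out

def df1_float(x):
    """Floating-point reference using the same quantized coefficients."""
    aq = [c / (1 << COEF_FRAC) for c in A_Q]
    x1 = x2 = x3 = y1 = y2 = y3 = 0.0
    out = []
    for x0 in x:
        w = (x0 + 3 * x1 + 3 * x2 + x3) / (1 << SHIFT)
        y0 = w - aq[0] * y1 - aq[1] * y2 - aq[2] * y3
        out.append(y0)
        x1, x2, x3 = x0, x1, x2
        y1, y2, y3 = y0, y1, y2
    return out
```

Note that the shift gives a scaling of 1/2048, not 1/1789.111562, so a small residual passband gain error remains unless it is corrected elsewhere.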


On 08/31/2010 05:04 PM, gretzteam wrote:
>> I'm not sure that the bits you have to add 'just because it's an IIR'
>> are different from the bits you have to add to accommodate the filter
>> gain -- have you checked?
>
>
> Yeah I think this is it...I feel kind of dumb right now. I got confused
> because the 'b' coefficients are so trivial that I didn't want to scale
> them with the 'attenuation'.

When I'm feeling religious I'm quite willing to believe that mathematics
is just God's way of keeping smart people humble.

> So I now end up doing the FIR part 'all-flat' using shift and add. Then I
> right shift by 11 bits before going in the IIR part - essentially giving me
> the usual additional LSBs required by the IIR part. This seems to work
> pretty well.
>
> Thanks for making me realize this!

Sounds like you may be on the right track.

