Large FFT --- NOT radix-2

  • Follow


Any pointers to doing non-radix 2 FFT's.
Specifically I wish to do 44100 pt FFT's on 16 bit data.
Simplicity of coding OUTWEIGHS speed!
[ Scilab is slow but adequate ]

0
Reply Richard 1/2/2004 12:39:31 AM

On Thu, 01 Jan 2004 18:39:31 -0600, Richard Owlett
<rowlett@atlascomm.net> wrote:

>Any pointers to doing non-radix 2 FFT's.
>Specifically I wish to do 44100 pt FFT's on 16 bit data.
>Simplicity of coding OUTWEIGHS speed!
>[ Scilab is slow but adequate ]

Hi Richard,
  I'm ignorant of these schemes,
but ya' might do an Internet search 
on "Prime Factor" and "Split-radix"
FFTs.
Good Luck,

[-Rick-]


0
Reply r 1/2/2004 2:58:32 AM


"Richard Owlett" <rowlett@atlascomm.net> wrote in message
news:vv9fams78p7438@corp.supernews.com...
> Any pointers to doing non-radix 2 FFT's.
> Specifically I wish to do 44100 pt FFT's on 16 bit data.
> Simplicity of coding OUTWEIGHS speed!
> [ Scilab is slow but adequate ]
>

Doesn't SCILAB just do them?  I tried it and it seems to.  It took 0.421
seconds on a dual processor Pentium 200MHz.

Fred


0
Reply Fred 1/2/2004 8:33:56 AM

Fred Marshall wrote:

> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
> news:vv9fams78p7438@corp.supernews.com...
> 
>>Any pointers to doing non-radix 2 FFT's.
>>Specifically I wish to do 44100 pt FFT's on 16 bit data.
>>Simplicity of coding OUTWEIGHS speed!
>>[ Scilab is slow but adequate ]
>>
> 
> 
> Doesn't SCILAB just do them?  I tried it and it seems to.  It took 0.421
> seconds on a dual processor Pentium 200MHz.
> 
> Fred
> 
> 

I'm going to have a second look at the framework I was using. My 
Pentium 4 1.6 GHz seems to do it no faster than your machine. Well, I 
did do ~1800 of them in a batch ;}

BTW, it's already been pointed out to me in another forum that I could 
make my life simpler by padding to 64k ;/


0
Reply Richard 1/2/2004 12:22:44 PM

Richard Owlett wrote:
> Fred Marshall wrote:
> 
>> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
>> news:vv9fams78p7438@corp.supernews.com...
>>
>>> Any pointers to doing non-radix 2 FFT's.
>>> Specifically I wish to do 44100 pt FFT's on 16 bit data.
>>> Simplicity of coding OUTWEIGHS speed!
>>> [ Scilab is slow but adequate ]
>>>
>>
>>
>> Doesn't SCILAB just do them?  I tried it and it seems to.  It took 0.421
>> seconds on a dual processor Pentium 200MHz.
>>
>> Fred
>>
>>
> 
> I'm going to have a second look at the framework I was using. My Pentium 
> 4 1.6 GHz seems to do it no faster than your machine. Well, I did do 
> ~1800 of them in a batch ;}
> 
> BTW, it's already been pointed out to me in another forum that I could 
> make my life simpler by padding to 64k ;/
> 
> 

Just did a timing check using Scilab's timer(). My FFT's usually take 
..14 seconds with a few as fast as .125 seconds. Not the speed 
improvement I would expect. What's your OS? I'm using Winxp Pro SP1.

Doing the timing pointed out another problem with my setup. It takes 
..94 seconds to move the 44k samples from the input data array to the 
array on which the FFT is performed.


0
Reply Richard 1/2/2004 1:08:53 PM

"Richard Owlett" <rowlett@atlascomm.net> wrote in message
news:vvar7epmfqmuee@corp.supernews.com...
> Richard Owlett wrote:
> > Fred Marshall wrote:
> >
> >> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
> >> news:vv9fams78p7438@corp.supernews.com...
> >>
> >>> Any pointers to doing non-radix 2 FFT's.
> >>> Specifically I wish to do 44100 pt FFT's on 16 bit data.
> >>> Simplicity of coding OUTWEIGHS speed!
> >>> [ Scilab is slow but adequate ]
> >>>
> >>
> >>
> >> Doesn't SCILAB just do them?  I tried it and it seems to.  It took
0.421
> >> seconds on a dual processor Pentium 200MHz.
> >>
> >> Fred
> >>
> >>
> >
> > I'm going to have a second look at the framework I was using. My Pentium
> > 4 1.6 GHz seems to do it no faster than your machine. Well, I did do
> > ~1800 of them in a batch ;}
> >
> > BTW, it's already been pointed out to me in another forum that I could
> > make my life simpler by padding to 64k ;/
> >
> >
>
> Just did a timing check using Scilab's timer(). My FFT's usually take
> .14 seconds with a few as fast as .125 seconds. Not the speed
> improvement I would expect. What's your OS? I'm using Winxp Pro SP1.
>
> Doing the timing pointed out another problem with my setup. It takes
> .94 seconds to move the 44k samples from the input data array to the
> array on which the FFT is performed.

Richard,

First, the test I did was structured like this:

time=0
i=1
p(1:44100)=1
while i<=100
   timer();
   pf=fft(p,-1);
   time=time+timer()
   i=i+1
end

So, there's no moving of arrrays in this code unless it's being done in the
fft - it's just a simple timing loop.
The machine I ran it on runs NT4sp6a.  Well it *is* only a 200MHz machine!
But it may be able to use both processors in this case - I don't know and
would frankly be a little surprised if it did.  Oh, but it appears that it
is using both processors! So, either the speed difference would be 4
assuming you have a single processor machine and we're seeing 3.5 which is
close enough to 4 for me.  Maybe XP is using more resources than NT.

I do note that 44100 breaks into a series of twice repeated primes 2,3,5,7 -
so if you want to do something fancier, maybe you can take advantage of
this - but I'm not at all sure it will be worth it.  You *did* say that
speed isn't the first issue.

Just for fun, I tried decreasing the array size to
32768 -> 0.235 secs.
44100 -> 0.437 secs.
65536 -> 0.781 secs.

I don't know, it looks like there must be some swapping going on for the
largest array:

44100/32768 = 1.346    .437/.235 = 1.859  1.859/1.346 = 1.381 factor over
linear increase

65536/44100 = 1.486    .781/.437 = 1.787  1.787/1.486 = 1.2 factor over
linear increase

65536/32768 = 2        .781/.235 = 3.32   3.32/2 = 1.66 factor over linear
increase.

N*log2(N) @ 32768 = 32768*15 = 491,520
          @ 44100 = 44100*15.43 = 680,463 (just for reference even though
it's not a power of 2)
          @ 65536 = 65536*16 = 1,048,576

44/32 > 680,463/491,520   = 1.384  predicted vs. 1.859 measured; or 1.34
higher than predicted increase
65/44 > 1,048,576/680,463 = 1.54   predicted vs. 1.787 measured; or 1.16
higher than predicted increase
65/32 > 1,048,576/491,520 = 2.133  predicted vs. 3.32  measured; or 1.56
higher than predicted increase

If we assume that 44100 would be less efficient, then it makes sense that
44/32 is higher than 65/44 doesn't it?  (That is, if 44100 is less efficient
it should be closer to 65K and further from 32K).  What I don't figure yet
is why 65/32 would be the worst situation.  Perhaps it's because there's
swapping going on with such a large array - I don't know, this system has
192MB of memory.

Fred


0
Reply Fred 1/2/2004 7:47:51 PM

"Richard Owlett" <rowlett@atlascomm.net> wrote in message
news:vvar7epmfqmuee@corp.supernews.com...
> Richard Owlett wrote:
> > Fred Marshall wrote:
> >
> >> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
> >> news:vv9fams78p7438@corp.supernews.com...
> >>
> >>> Any pointers to doing non-radix 2 FFT's.
> >>> Specifically I wish to do 44100 pt FFT's on 16 bit data.
> >>> Simplicity of coding OUTWEIGHS speed!
> >>> [ Scilab is slow but adequate ]
> >>>
> >>
> >>
> >> Doesn't SCILAB just do them?  I tried it and it seems to.  It took
0.421
> >> seconds on a dual processor Pentium 200MHz.
> >>
> >> Fred
> >>
> >>
> >
> > I'm going to have a second look at the framework I was using. My Pentium
> > 4 1.6 GHz seems to do it no faster than your machine. Well, I did do
> > ~1800 of them in a batch ;}
> >
> > BTW, it's already been pointed out to me in another forum that I could
> > make my life simpler by padding to 64k ;/
> >
> >

Richard,

I forgot to address the padding to 64K.  I don't see why you'd want to do
that unless there performance was greatly affected or unless you need to
zero pad for other reasons - like for circular convolution considerations.
44100 is a product of repeat numbers of primes: 2x2x3x3x5x5x7x7 so its
performance might be expected to be OK if the implemenation is good.  It
appears it is in Scilab.  On the other hand, try 4091 or 4093!!

Here's the recent code I used.  The runs times come out a little bit longer:

function [rowtime,coltime]=fftloop(N,loops)
i=1;
rowtime=0;
coltime=0;
while i<=loops
   p=rand(1,N);
   timer()
   pf=fft(p,-1);
   rowtime=rowtime+timer();
   i=i+1;
end
rowtime=rowtime/loops
i=1;
while i<=loops
   q=rand(N,1);
   timer();
   pf=fft(q,-1);
   coltime=coltime+timer();
   i=i+1;
end
coltime=coltime/loops

Fred


0
Reply Fred 1/2/2004 8:17:27 PM

r.lyons@REMOVE.ieee.org (Rick Lyons) wrote...
> Richard Owlett <rowlett@atlascomm.net> wrote:
> >Any pointers to doing non-radix 2 FFT's.
> >Specifically I wish to do 44100 pt FFT's on 16 bit data.
> >Simplicity of coding OUTWEIGHS speed!
> 
>   I'm ignorant of these schemes,
> but ya' might do an Internet search 
> on "Prime Factor" and "Split-radix"
> FFTs.

The split-radix algorithm is essentially a combination of radices 2
and 4, and only works for power-of-two sizes.  The Prime-Factor
Algorithm (PFA) only decomposes a DFT into relatively prime factors
(so e.g. it cannot further decompose the factor of 7^2 in 44100) and
is fairly complicated to implement.

What the original poster wants is a "mixed-radix" FFT, which is simply
the generalized version of the standard Cooley-Tukey algorithm to
arbitrary factorizations.  Many implementations can be found online
(see www.fftw.org/links.html)

Of course, for spectral estimation you should also be doing windowing,
etcetera, in order to reduce edge artifacts.  Ditto if you are doing
zero-padding.  See e.g. Oppenheim and Schafer, "Digital Signal
Processing".

Cordially,
Steven G. Johnson
0
Reply stevenj 1/2/2004 9:13:54 PM

Richard Owlett <rowlett@atlascomm.net> wrote in message news:<vv9fams78p7438@corp.supernews.com>...
> Any pointers to doing non-radix 2 FFT's.
> Specifically I wish to do 44100 pt FFT's on 16 bit data.
> Simplicity of coding OUTWEIGHS speed!
> [ Scilab is slow but adequate ]

I'm not sure what form of solution you seek.  A script? A program? A
manual method?

If you want to make a C program to accomplish the task, you could use
my KISS FFT library ( http://sourceforge.net/projects/kissfft/ ),
specifically designed for simplicity.

If you know Python (great lang imho), you could use the FFT module,
bundled with NumPy/Numeric.

Another scripting alternative is octave, which mostly works like
Matlab.

Good Luck,
Mark Borgerding
0
Reply mark 1/5/2004 12:49:43 AM

Many helpful people wrote:
> Richard Owlett <rowlett@atlascomm.net> wrote in message news:<vv9fams78p7438@corp.supernews.com>...
> 
>>Any pointers to doing non-radix 2 FFT's.
>>Specifically I wish to do 44100 pt FFT's on 16 bit data.
>>Simplicity of coding OUTWEIGHS speed!
>>[ Scilab is slow but adequate ]
> 
> 
>[Snip many educational replies]

My problem was how I was getting the data on which the FFT would be 
done. NOT, how fast Scilab could do an FFT. Mind if I don't admit how 
bad my code was ;?

0
Reply Richard 1/5/2004 5:33:09 AM

Fred Marshall <fmarshallx@remove_the_x.acm.org> wrote:
> 
> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
> news:vv9fams78p7438@corp.supernews.com...
>> Any pointers to doing non-radix 2 FFT's.
>> Specifically I wish to do 44100 pt FFT's on 16 bit data.
>> Simplicity of coding OUTWEIGHS speed!
>> [ Scilab is slow but adequate ]
>>
> 
> Doesn't SCILAB just do them?  I tried it and it seems to.  It took 0.421
> seconds on a dual processor Pentium 200MHz.

P200:
octave:4> x=randn(1,44100);tic,w=fft(x);toc
ans = 0.18392

A1600:
octave:5> x=randn(1,44100);tic,w=fft(x);toc
ans = 0.022966
0
Reply pisz_na 1/5/2004 10:47:07 PM

Alex Gibson wrote:
> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
> news:vvar7epmfqmuee@corp.supernews.com...
> 
>>Richard Owlett wrote:
>>
>>>Fred Marshall wrote:
>>>
>>>
>>>>"Richard Owlett" <rowlett@atlascomm.net> wrote in message
>>>>news:vv9fams78p7438@corp.supernews.com...
>>>>
>>>>
>>>>>Any pointers to doing non-radix 2 FFT's.
>>>>>Specifically I wish to do 44100 pt FFT's on 16 bit data.
>>>>>Simplicity of coding OUTWEIGHS speed!
>>>>>[ Scilab is slow but adequate ]
>>>>>
>>>>
>>>>
>>>>Doesn't SCILAB just do them?  I tried it and it seems to.  It took
> 
> 0.421
> 
>>>>seconds on a dual processor Pentium 200MHz.
>>>>
>>>>Fred
>>>>
>>>>
>>>
>>>I'm going to have a second look at the framework I was using. My Pentium
>>>4 1.6 GHz seems to do it no faster than your machine. Well, I did do
>>>~1800 of them in a batch ;}
>>>
>>>BTW, it's already been pointed out to me in another forum that I could
>>>make my life simpler by padding to 64k ;/
>>>
>>>
>>
>>Just did a timing check using Scilab's timer(). My FFT's usually take
>>.14 seconds with a few as fast as .125 seconds. Not the speed
>>improvement I would expect. What's your OS? I'm using Winxp Pro SP1.
>>
>>Doing the timing pointed out another problem with my setup. It takes
>>.94 seconds to move the 44k samples from the input data array to the
>>array on which the FFT is performed.
>>
> 
> 
> Well early p4's (most p4's) aren't very good at floating point
> unless your code is recompiled for sse2.
> 
> Athlon's do a lot better with x87 floating point.
> 
> (athlon64 + 1 or 2 GB ram  are looking like a real nice upgrade
> for a lot of things not just gaming)
> 
> A p3 at 1.3GHz will beat a p4 1.6GHz
> on somethings about equal on others.
> 
> A p4 should beat a dual pentium 200 on most things
> (should even beat dual pentium pro 200)
> 
> How much memory do you have installed ?
> With xp of any sort 256MB is the absolute minimum for any
> serious sort of work.
> Especially as xp itself usually grabs around 128MB.

1.25 Gig
Knew I'd be wanting lots of memory when i bought the machine :}


> 
> Also what else is running in the background ?
> Did you kill off fast.exe which is the background loader for internet
> explorer.
> 

It's not shown on Task Manager, I use Mozilla in any case.
Timing didn't change when I disabled firewall and anti-virus software.


> Been a while since I last used / ran scilab.
> 
> Does scilab use atlas ?

I don't know.

> If so is it configured for your machine ?
> http://math-atlas.sourceforge.net/
> I known matlab uses it for some things.
> 
> Or have your recompiled scilab to tune it for
> the machine your on ?
> Shouldn't be necessary for most machines.

No. I don't have the tools.


> 
> Alex Gibson
> 
> 

0
Reply Richard 1/6/2004 4:45:22 AM

Well, 44100 is 3^2 * 7^2 * 10^10.  You could use 6 FFTs
matching those sizes  (Winograd, Singleton and Rader have
3,7, 5 point FFTs), however the code is likely to be less
simple than you want.  If you really don't care about speed,
you could just do a 44100 point DFT.  The code would be dirt
simple, although it may also be dog slow.

Richard Owlett wrote:

> Any pointers to doing non-radix 2 FFT's.
> Specifically I wish to do 44100 pt FFT's on 16 bit data.
> Simplicity of coding OUTWEIGHS speed!
> [ Scilab is slow but adequate ]

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin
Franklin, 1759


0
Reply Ray 1/8/2004 7:48:45 PM

He could do a pair of prime factor 210 point FFTs cascaded using the mixed
radix algorithm though.  210=2*3*5*7, so for that you could use the PFA.
PFA is nice because it eliminates the phase rotations between the smaller
FFT kernels.  The 2,3,5 and 7 point FFTs could be any of winograd,
singleton or Rader.   A decent reference is the Smith and Smith "Handbook
of real-time fast fourier transforms", IEEE press.

"Steven G. Johnson" wrote:

> r.lyons@REMOVE.ieee.org (Rick Lyons) wrote...
> > Richard Owlett <rowlett@atlascomm.net> wrote:
> > >Any pointers to doing non-radix 2 FFT's.
> > >Specifically I wish to do 44100 pt FFT's on 16 bit data.
> > >Simplicity of coding OUTWEIGHS speed!
> >
> >   I'm ignorant of these schemes,
> > but ya' might do an Internet search
> > on "Prime Factor" and "Split-radix"
> > FFTs.
>
> The split-radix algorithm is essentially a combination of radices 2
> and 4, and only works for power-of-two sizes.  The Prime-Factor
> Algorithm (PFA) only decomposes a DFT into relatively prime factors
> (so e.g. it cannot further decompose the factor of 7^2 in 44100) and
> is fairly complicated to implement.
>
> What the original poster wants is a "mixed-radix" FFT, which is simply
> the generalized version of the standard Cooley-Tukey algorithm to
> arbitrary factorizations.  Many implementations can be found online
> (see www.fftw.org/links.html)
>
> Of course, for spectral estimation you should also be doing windowing,
> etcetera, in order to reduce edge artifacts.  Ditto if you are doing
> zero-padding.  See e.g. Oppenheim and Schafer, "Digital Signal
> Processing".
>
> Cordially,
> Steven G. Johnson

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com

 "They that give up essential liberty to obtain a little
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759


0
Reply Ray 1/8/2004 7:53:22 PM

"Steven G. Johnson" <stevenj@alum.mit.edu> wrote in message
news:27cfb406.0401021313.4196aa02@posting.google.com...

> The split-radix algorithm is essentially a combination of radices 2
> and 4, and only works for power-of-two sizes.

The following question is only loosely related to this comment:

Last year you wrote:

> Actually, the minimal operation count (for powers of two) is achieved by
> something called "split-radix" that is a mixture of radix 2 and 4.  At
> each step, it breaks your size-n DFT into three pieces, one of size n/2
> and two of size n/4.

But it seems to me that split-radix requires 912 additions and 248
multiplications for n = 64, and 1160 isn't the minimal number of
operations.  Is there some paper you are thinking of that discusses
a modification of the standard split-radix algorithm that always
attains the minimal number of arithmetic operations (real adds +
real multiplies?)

-- 
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


0
Reply James 1/21/2004 11:15:23 PM

James Van Buskirk wrote:
> But it seems to me that split-radix requires 912 additions and 248
> multiplications for n = 64, and 1160 isn't the minimal number of
> operations.

What do you think is the lowest known number of operations for an n=64 
complex DFT, using what algorithm?
0
Reply Steven 1/22/2004 4:24:02 AM

"Steven G. Johnson" <stevenj@alum.mit.edu> wrote in message
news:bunj9a$hts$1@news.fas.harvard.edu...

> James Van Buskirk wrote:
> > But it seems to me that split-radix requires 912 additions and 248
> > multiplications for n = 64, and 1160 isn't the minimal number of
> > operations.

> What do you think is the lowest known number of operations for an n=64
> complex DFT, using what algorithm?

The best I know of is 912 additions and 240 multiplications; just
figured out how to achieve this result yesterday.  Algorithm is
the same as used for all my computational kernals (heavy use of
real-half-complex DFTs,) just became aware of another possible trick.

The lowest known by anyone is a different question.  The more you
look at DFTs, the more you become uncertain that what you or anyone
else does is optimal, even if you get to write the criteria of
optimality yourself.

-- 
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


0
Reply James 1/22/2004 5:41:08 AM

James Van Buskirk wrote:
>>James Van Buskirk wrote:
>>>But it seems to me that split-radix requires 912 additions and 248
>>>multiplications for n = 64, and 1160 isn't the minimal number of
>>>operations.
> 
> The best I know of is 912 additions and 240 multiplications; just
> figured out how to achieve this result yesterday.

Please do let me know when you publish your results, as I try to keep 
abreast of the latest FFT developments.

The arithmetic complexity of FFT algorithms was extensively studied in 
the 1980's, when it was still considered to be a good proxy for speed 
(sadly no longer the case).  As of 1991 [1], the lowest-known operation 
count for an n=64 complex DFT was 1160 flops, a record that had been set 
in 1968 by Yavne [2] and was later matched using a simpler algorithm by 
the split-radix technique.  For N=2^m, this record is:

	flops = 4 N log2(N) - 6N + 8

(This is asymptotically 20% fewer flops than for vanilla radix 2.  Note 
that you can trade off some multiplies for additions by using the 3 adds 
+ 3 multiplies complex-multiply trick.)

Actually, it has supposedly been rigorously proved [1] that split-radix 
achieves the minimal number of complex additions.  There is also a 
rigorous lower bound [1] on the number of non-trivial complex (or real) 
multiplications, which split-radix can achieve up to N=16 (and it is 
close for larger N).  For larger N, there are known algorithms that 
achieve the minimal number of multiplies, but they do not have fewer 
total flops than split-radix, and have more flops for N > 64.  So, the 
achievable total flop count has yet not been rigorously proven.

 > Algorithm is
> the same as used for all my computational kernals (heavy use of
> real-half-complex DFTs,) just became aware of another possible trick.

(There is the known technique of taking the real-input FFTs of the real 
and imaginary parts, and then combining them with (2N-4) additions. 
However, if you use the lowest-known operation counts for real-input 
FFTs [3], this gives exactly the same known flops for the complex DFT. 
e.g. 1160flops for n=64 from two 518flops n=64 real-input FFTs.)

Cordially,
Steven G. Johnson

[1] P. Duhamel and M. Vetterli, "Fast Fourier transforms: A tutorial 
review and a state of the art," Signal Processing 19, 259-299 (1990).
[2] R. Yavne, "An economical method for calculating the discrete Fourier 
transform," Proc. AFIPS 33, 115-125 (1968).
[3]  H. V. Sorensen, D. L. Jones, M. T. Heideman, and C. S. Burrus, 
"Real-valued fast Fourier transform algorithms," IEEE Trans. Acoust. 
Speech Sig. Processing ASSP-35, 849�863 (1987).
0
Reply Steven 1/22/2004 9:36:13 PM

"Steven G. Johnson" <stevenj@alum.mit.edu> wrote in message
news:4010424d$0$564$b45e6eb0@senator-bedfellow.mit.edu...

> Please do let me know when you publish your results, as I try to keep
> abreast of the latest FFT developments.

Well, I would just post the code the n = 64, but it's a little long --
35937 bytes.  Is this objectionably long for this group?  The trick
that made the difference for n = 64 would also improve the number of
multiplies for some of my smaller modules.  I'm not sure at this point
whether I want to rewrite them all (many take a day each) or try to
create a code generator that can do what I do in the short time
available to me just now to ponder this problem again.

> Actually, it has supposedly been rigorously proved [1] that split-radix
> achieves the minimal number of complex additions.

My code is structured for minimum-add so I am not challenging the
above assertion.

> [1] P. Duhamel and M. Vetterli, "Fast Fourier transforms: A tutorial
> review and a state of the art," Signal Processing 19, 259-299 (1990).
> [2] R. Yavne, "An economical method for calculating the discrete Fourier
> transform," Proc. AFIPS 33, 115-125 (1968).
> [3]  H. V. Sorensen, D. L. Jones, M. T. Heideman, and C. S. Burrus,
> "Real-valued fast Fourier transform algorithms," IEEE Trans. Acoust.
> Speech Sig. Processing ASSP-35, 849�863 (1987).

Thanks for the comments and the references.  Lets see... I've read [1],
and it is highly recommended.  [2] just happened to be sitting on my
desk as I read this, but I'm not sure if I've seen [3].

-- 
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


0
Reply James 1/22/2004 11:27:59 PM

PS. James graciously sent me a copy of his code privately, and I can 
verify that it is doing what he claims.  This looks like a very 
impressive result, and an important advance for the complexity theory of 
FFT algorithms: breaking the 36-year-old record for the minimal flop 
count in power-of-two sizes, from 1160 to 1152 flops for n=64.  I hope 
he publishes his algorithm in the near future, and encourage him to post 
his code (gzipped it is < 10k).

PPS. I should caution other readers that a small change in the flop 
count alone is unlikely to significantly impact FFT speed, which is 
dominated these days by cache and pipeline utilization.  But the 
question of minimal arithmetic complexity is still important on 
theoretical grounds.
0
Reply Steven 1/23/2004 10:22:33 PM

"Steven G. Johnson" <stevenj@alum.mit.edu> wrote in message
news:bus6sq$8ib$1@news.fas.harvard.edu...

>                                                                 I hope
> he publishes his algorithm in the near future, and encourage him to post
> his code (gzipped it is < 10k).

Perhaps if you have a little corner on your website reserved for
miscellaneous stuff, you could place it there and post a link.  Seems
to me to be the preferred method for communicating largish chunks
of data in newsgroups, if it's not too much trouble from your
viewpoint.

-- 
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


0
Reply James 1/24/2004 1:30:23 AM

James Van Buskirk wrote:

> "Steven G. Johnson" <stevenj@alum.mit.edu> wrote in message
> news:bus6sq$8ib$1@news.fas.harvard.edu...
> 
> 
>>                                                                I hope
>>he publishes his algorithm in the near future, and encourage him to post
>>his code (gzipped it is < 10k).
> 
> 
> Perhaps if you have a little corner on your website reserved for
> miscellaneous stuff, you could place it there and post a link.  Seems
> to me to be the preferred method for communicating largish chunks
> of data in newsgroups, if it's not too much trouble from your
> viewpoint.

I'll be happy to host it. Even with my slow dial-up connection, it won't
be much trouble. Check out http://users.rcn.com/jyavins to see if it
would be appropriate. It could replace Jeff Briden's simulation that is
old hat by now, have a line of its own, or be anonymous like
http://users.rcn.com/jyavins/walala-lena.jpg.

Jerry

P.S. If you look at the site, please tell me what you think of my new
favicon. It's a reproduction of the stamp I put on the jewelry I make.
-- 
Engineering is the art of making what you want from things you can get.
�����������������������������������������������������������������������

0
Reply Jerry 1/24/2004 2:12:19 AM

"Jerry Avins" <jya@ieee.org> wrote in message
news:4011d483$0$12745$61fed72c@news.rcn.com...

> I'll be happy to host it. Even with my slow dial-up connection, it won't
> be much trouble. Check out http://users.rcn.com/jyavins to see if it
> would be appropriate. It could replace Jeff Briden's simulation that is
> old hat by now, have a line of its own, or be anonymous like
> http://users.rcn.com/jyavins/walala-lena.jpg.

Thanks for the offer; cool website.  I finally got around to figuring
out how to create a personal web page through my ISP.

http://home.comcast.net/~kmbtib/index.html

I uploaded the files.  If there are any questions, post them here
or to comp.lang.fortran, or decode my email address...

-- 
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


0
Reply James 1/24/2004 7:04:25 PM

James Van Buskirk wrote:

> ... cool website.  I finally got around to figuring
> out how to create a personal web page through my ISP.
> 
> http://home.comcast.net/~kmbtib/index.html
> 
> I uploaded the files.  If there are any questions, post them here
> or to comp.lang.fortran, or decode my email address...

Nifty code! My site started as a way to show my face to a a stranger I 
had agreed to meet at the airport. The rest grew by accretion. CuteFTP 
and CuteHTML made it easy.

Jerry
-- 
Engineering is the art of making what you want from things you can get.
�����������������������������������������������������������������������

0
Reply Jerry 1/24/2004 7:35:09 PM

"Jerry Avins" <jya@ieee.org> wrote in message
news:4012c8ef$0$7333$61fed72c@news.rcn.com...

> James Van Buskirk wrote:

> > ... cool website.  I finally got around to figuring
> > out how to create a personal web page through my ISP.

> > http://home.comcast.net/~kmbtib/index.html

> > I uploaded the files.  If there are any questions, post them here
> > or to comp.lang.fortran, or decode my email address...

> Nifty code! My site started as a way to show my face to a a stranger I
> had agreed to meet at the airport. The rest grew by accretion. CuteFTP
> and CuteHTML made it easy.

Well, I got back from my walk in the park and concluded that maybe
I should put some descriptions on my website of what the files
are and also, to be topical to the original thread (now that the
subject line has changed, of course :) ) I zipped up my code for all
prime power orders <= 64, so it includes the factors 25 and 49,
although as we have seen, this old code is now known not to be
optimal.

-- 
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


0
Reply James 1/25/2004 5:24:33 AM

24 Replies
201 Views

(page loaded in 0.224 seconds)

Similiar Articles:


















7/19/2012 4:38:22 PM


Reply: