Any pointers to doing non-radix 2 FFT's.
Specifically I wish to do 44100 pt FFT's on 16 bit data.
Simplicity of coding OUTWEIGHS speed!
[ Scilab is slow but adequate ]
|
|
0
|
|
|
|
Reply
|
Richard
|
1/2/2004 12:39:31 AM |
|
On Thu, 01 Jan 2004 18:39:31 -0600, Richard Owlett
<rowlett@atlascomm.net> wrote:
>Any pointers to doing non-radix 2 FFT's.
>Specifically I wish to do 44100 pt FFT's on 16 bit data.
>Simplicity of coding OUTWEIGHS speed!
>[ Scilab is slow but adequate ]
Hi Richard,
I'm ignorant of these schemes,
but ya' might do an Internet search
on "Prime Factor" and "Split-radix"
FFTs.
Good Luck,
[-Rick-]
|
|
0
|
|
|
|
Reply
|
r
|
1/2/2004 2:58:32 AM
|
|
"Richard Owlett" <rowlett@atlascomm.net> wrote in message
news:vv9fams78p7438@corp.supernews.com...
> Any pointers to doing non-radix 2 FFT's.
> Specifically I wish to do 44100 pt FFT's on 16 bit data.
> Simplicity of coding OUTWEIGHS speed!
> [ Scilab is slow but adequate ]
>
Doesn't SCILAB just do them? I tried it and it seems to. It took 0.421
seconds on a dual processor Pentium 200MHz.
Fred
|
|
0
|
|
|
|
Reply
|
Fred
|
1/2/2004 8:33:56 AM
|
|
Fred Marshall wrote:
> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
> news:vv9fams78p7438@corp.supernews.com...
>
>>Any pointers to doing non-radix 2 FFT's.
>>Specifically I wish to do 44100 pt FFT's on 16 bit data.
>>Simplicity of coding OUTWEIGHS speed!
>>[ Scilab is slow but adequate ]
>>
>
>
> Doesn't SCILAB just do them? I tried it and it seems to. It took 0.421
> seconds on a dual processor Pentium 200MHz.
>
> Fred
>
>
I'm going to have a second look at the framework I was using. My
Pentium 4 1.6 GHz seems to do it no faster than your machine. Well, I
did do ~1800 of them in a batch ;}
BTW, it's already been pointed out to me in another forum that I could
make my life simpler by padding to 64k ;/
|
|
0
|
|
|
|
Reply
|
Richard
|
1/2/2004 12:22:44 PM
|
|
Richard Owlett wrote:
> Fred Marshall wrote:
>
>> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
>> news:vv9fams78p7438@corp.supernews.com...
>>
>>> Any pointers to doing non-radix 2 FFT's.
>>> Specifically I wish to do 44100 pt FFT's on 16 bit data.
>>> Simplicity of coding OUTWEIGHS speed!
>>> [ Scilab is slow but adequate ]
>>>
>>
>>
>> Doesn't SCILAB just do them? I tried it and it seems to. It took 0.421
>> seconds on a dual processor Pentium 200MHz.
>>
>> Fred
>>
>>
>
> I'm going to have a second look at the framework I was using. My Pentium
> 4 1.6 GHz seems to do it no faster than your machine. Well, I did do
> ~1800 of them in a batch ;}
>
> BTW, it's already been pointed out to me in another forum that I could
> make my life simpler by padding to 64k ;/
>
>
Just did a timing check using Scilab's timer(). My FFT's usually take
..14 seconds with a few as fast as .125 seconds. Not the speed
improvement I would expect. What's your OS? I'm using Winxp Pro SP1.
Doing the timing pointed out another problem with my setup. It takes
..94 seconds to move the 44k samples from the input data array to the
array on which the FFT is performed.
|
|
0
|
|
|
|
Reply
|
Richard
|
1/2/2004 1:08:53 PM
|
|
"Richard Owlett" <rowlett@atlascomm.net> wrote in message
news:vvar7epmfqmuee@corp.supernews.com...
> Richard Owlett wrote:
> > Fred Marshall wrote:
> >
> >> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
> >> news:vv9fams78p7438@corp.supernews.com...
> >>
> >>> Any pointers to doing non-radix 2 FFT's.
> >>> Specifically I wish to do 44100 pt FFT's on 16 bit data.
> >>> Simplicity of coding OUTWEIGHS speed!
> >>> [ Scilab is slow but adequate ]
> >>>
> >>
> >>
> >> Doesn't SCILAB just do them? I tried it and it seems to. It took
0.421
> >> seconds on a dual processor Pentium 200MHz.
> >>
> >> Fred
> >>
> >>
> >
> > I'm going to have a second look at the framework I was using. My Pentium
> > 4 1.6 GHz seems to do it no faster than your machine. Well, I did do
> > ~1800 of them in a batch ;}
> >
> > BTW, it's already been pointed out to me in another forum that I could
> > make my life simpler by padding to 64k ;/
> >
> >
>
> Just did a timing check using Scilab's timer(). My FFT's usually take
> .14 seconds with a few as fast as .125 seconds. Not the speed
> improvement I would expect. What's your OS? I'm using Winxp Pro SP1.
>
> Doing the timing pointed out another problem with my setup. It takes
> .94 seconds to move the 44k samples from the input data array to the
> array on which the FFT is performed.
Richard,
First, the test I did was structured like this:
time=0
i=1
p(1:44100)=1
while i<=100
timer();
pf=fft(p,-1);
time=time+timer()
i=i+1
end
So, there's no moving of arrrays in this code unless it's being done in the
fft - it's just a simple timing loop.
The machine I ran it on runs NT4sp6a. Well it *is* only a 200MHz machine!
But it may be able to use both processors in this case - I don't know and
would frankly be a little surprised if it did. Oh, but it appears that it
is using both processors! So, either the speed difference would be 4
assuming you have a single processor machine and we're seeing 3.5 which is
close enough to 4 for me. Maybe XP is using more resources than NT.
I do note that 44100 breaks into a series of twice repeated primes 2,3,5,7 -
so if you want to do something fancier, maybe you can take advantage of
this - but I'm not at all sure it will be worth it. You *did* say that
speed isn't the first issue.
Just for fun, I tried decreasing the array size to
32768 -> 0.235 secs.
44100 -> 0.437 secs.
65536 -> 0.781 secs.
I don't know, it looks like there must be some swapping going on for the
largest array:
44100/32768 = 1.346 .437/.235 = 1.859 1.859/1.346 = 1.381 factor over
linear increase
65536/44100 = 1.486 .781/.437 = 1.787 1.787/1.486 = 1.2 factor over
linear increase
65536/32768 = 2 .781/.235 = 3.32 3.32/2 = 1.66 factor over linear
increase.
N*log2(N) @ 32768 = 32768*15 = 491,520
@ 44100 = 44100*15.43 = 680,463 (just for reference even though
it's not a power of 2)
@ 65536 = 65536*16 = 1,048,576
44/32 > 680,463/491,520 = 1.384 predicted vs. 1.859 measured; or 1.34
higher than predicted increase
65/44 > 1,048,576/680,463 = 1.54 predicted vs. 1.787 measured; or 1.16
higher than predicted increase
65/32 > 1,048,576/491,520 = 2.133 predicted vs. 3.32 measured; or 1.56
higher than predicted increase
If we assume that 44100 would be less efficient, then it makes sense that
44/32 is higher than 65/44 doesn't it? (That is, if 44100 is less efficient
it should be closer to 65K and further from 32K). What I don't figure yet
is why 65/32 would be the worst situation. Perhaps it's because there's
swapping going on with such a large array - I don't know, this system has
192MB of memory.
Fred
|
|
0
|
|
|
|
Reply
|
Fred
|
1/2/2004 7:47:51 PM
|
|
"Richard Owlett" <rowlett@atlascomm.net> wrote in message
news:vvar7epmfqmuee@corp.supernews.com...
> Richard Owlett wrote:
> > Fred Marshall wrote:
> >
> >> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
> >> news:vv9fams78p7438@corp.supernews.com...
> >>
> >>> Any pointers to doing non-radix 2 FFT's.
> >>> Specifically I wish to do 44100 pt FFT's on 16 bit data.
> >>> Simplicity of coding OUTWEIGHS speed!
> >>> [ Scilab is slow but adequate ]
> >>>
> >>
> >>
> >> Doesn't SCILAB just do them? I tried it and it seems to. It took
0.421
> >> seconds on a dual processor Pentium 200MHz.
> >>
> >> Fred
> >>
> >>
> >
> > I'm going to have a second look at the framework I was using. My Pentium
> > 4 1.6 GHz seems to do it no faster than your machine. Well, I did do
> > ~1800 of them in a batch ;}
> >
> > BTW, it's already been pointed out to me in another forum that I could
> > make my life simpler by padding to 64k ;/
> >
> >
Richard,
I forgot to address the padding to 64K. I don't see why you'd want to do
that unless there performance was greatly affected or unless you need to
zero pad for other reasons - like for circular convolution considerations.
44100 is a product of repeat numbers of primes: 2x2x3x3x5x5x7x7 so its
performance might be expected to be OK if the implemenation is good. It
appears it is in Scilab. On the other hand, try 4091 or 4093!!
Here's the recent code I used. The runs times come out a little bit longer:
function [rowtime,coltime]=fftloop(N,loops)
i=1;
rowtime=0;
coltime=0;
while i<=loops
p=rand(1,N);
timer()
pf=fft(p,-1);
rowtime=rowtime+timer();
i=i+1;
end
rowtime=rowtime/loops
i=1;
while i<=loops
q=rand(N,1);
timer();
pf=fft(q,-1);
coltime=coltime+timer();
i=i+1;
end
coltime=coltime/loops
Fred
|
|
0
|
|
|
|
Reply
|
Fred
|
1/2/2004 8:17:27 PM
|
|
r.lyons@REMOVE.ieee.org (Rick Lyons) wrote...
> Richard Owlett <rowlett@atlascomm.net> wrote:
> >Any pointers to doing non-radix 2 FFT's.
> >Specifically I wish to do 44100 pt FFT's on 16 bit data.
> >Simplicity of coding OUTWEIGHS speed!
>
> I'm ignorant of these schemes,
> but ya' might do an Internet search
> on "Prime Factor" and "Split-radix"
> FFTs.
The split-radix algorithm is essentially a combination of radices 2
and 4, and only works for power-of-two sizes. The Prime-Factor
Algorithm (PFA) only decomposes a DFT into relatively prime factors
(so e.g. it cannot further decompose the factor of 7^2 in 44100) and
is fairly complicated to implement.
What the original poster wants is a "mixed-radix" FFT, which is simply
the generalized version of the standard Cooley-Tukey algorithm to
arbitrary factorizations. Many implementations can be found online
(see www.fftw.org/links.html)
Of course, for spectral estimation you should also be doing windowing,
etcetera, in order to reduce edge artifacts. Ditto if you are doing
zero-padding. See e.g. Oppenheim and Schafer, "Digital Signal
Processing".
Cordially,
Steven G. Johnson
|
|
0
|
|
|
|
Reply
|
stevenj
|
1/2/2004 9:13:54 PM
|
|
Richard Owlett <rowlett@atlascomm.net> wrote in message news:<vv9fams78p7438@corp.supernews.com>...
> Any pointers to doing non-radix 2 FFT's.
> Specifically I wish to do 44100 pt FFT's on 16 bit data.
> Simplicity of coding OUTWEIGHS speed!
> [ Scilab is slow but adequate ]
I'm not sure what form of solution you seek. A script? A program? A
manual method?
If you want to make a C program to accomplish the task, you could use
my KISS FFT library ( http://sourceforge.net/projects/kissfft/ ),
specifically designed for simplicity.
If you know Python (great lang imho), you could use the FFT module,
bundled with NumPy/Numeric.
Another scripting alternative is octave, which mostly works like
Matlab.
Good Luck,
Mark Borgerding
|
|
0
|
|
|
|
Reply
|
mark
|
1/5/2004 12:49:43 AM
|
|
Many helpful people wrote:
> Richard Owlett <rowlett@atlascomm.net> wrote in message news:<vv9fams78p7438@corp.supernews.com>...
>
>>Any pointers to doing non-radix 2 FFT's.
>>Specifically I wish to do 44100 pt FFT's on 16 bit data.
>>Simplicity of coding OUTWEIGHS speed!
>>[ Scilab is slow but adequate ]
>
>
>[Snip many educational replies]
My problem was how I was getting the data on which the FFT would be
done. NOT, how fast Scilab could do an FFT. Mind if I don't admit how
bad my code was ;?
|
|
0
|
|
|
|
Reply
|
Richard
|
1/5/2004 5:33:09 AM
|
|
Fred Marshall <fmarshallx@remove_the_x.acm.org> wrote:
>
> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
> news:vv9fams78p7438@corp.supernews.com...
>> Any pointers to doing non-radix 2 FFT's.
>> Specifically I wish to do 44100 pt FFT's on 16 bit data.
>> Simplicity of coding OUTWEIGHS speed!
>> [ Scilab is slow but adequate ]
>>
>
> Doesn't SCILAB just do them? I tried it and it seems to. It took 0.421
> seconds on a dual processor Pentium 200MHz.
P200:
octave:4> x=randn(1,44100);tic,w=fft(x);toc
ans = 0.18392
A1600:
octave:5> x=randn(1,44100);tic,w=fft(x);toc
ans = 0.022966
|
|
0
|
|
|
|
Reply
|
pisz_na
|
1/5/2004 10:47:07 PM
|
|
Alex Gibson wrote:
> "Richard Owlett" <rowlett@atlascomm.net> wrote in message
> news:vvar7epmfqmuee@corp.supernews.com...
>
>>Richard Owlett wrote:
>>
>>>Fred Marshall wrote:
>>>
>>>
>>>>"Richard Owlett" <rowlett@atlascomm.net> wrote in message
>>>>news:vv9fams78p7438@corp.supernews.com...
>>>>
>>>>
>>>>>Any pointers to doing non-radix 2 FFT's.
>>>>>Specifically I wish to do 44100 pt FFT's on 16 bit data.
>>>>>Simplicity of coding OUTWEIGHS speed!
>>>>>[ Scilab is slow but adequate ]
>>>>>
>>>>
>>>>
>>>>Doesn't SCILAB just do them? I tried it and it seems to. It took
>
> 0.421
>
>>>>seconds on a dual processor Pentium 200MHz.
>>>>
>>>>Fred
>>>>
>>>>
>>>
>>>I'm going to have a second look at the framework I was using. My Pentium
>>>4 1.6 GHz seems to do it no faster than your machine. Well, I did do
>>>~1800 of them in a batch ;}
>>>
>>>BTW, it's already been pointed out to me in another forum that I could
>>>make my life simpler by padding to 64k ;/
>>>
>>>
>>
>>Just did a timing check using Scilab's timer(). My FFT's usually take
>>.14 seconds with a few as fast as .125 seconds. Not the speed
>>improvement I would expect. What's your OS? I'm using Winxp Pro SP1.
>>
>>Doing the timing pointed out another problem with my setup. It takes
>>.94 seconds to move the 44k samples from the input data array to the
>>array on which the FFT is performed.
>>
>
>
> Well early p4's (most p4's) aren't very good at floating point
> unless your code is recompiled for sse2.
>
> Athlon's do a lot better with x87 floating point.
>
> (athlon64 + 1 or 2 GB ram are looking like a real nice upgrade
> for a lot of things not just gaming)
>
> A p3 at 1.3GHz will beat a p4 1.6GHz
> on somethings about equal on others.
>
> A p4 should beat a dual pentium 200 on most things
> (should even beat dual pentium pro 200)
>
> How much memory do you have installed ?
> With xp of any sort 256MB is the absolute minimum for any
> serious sort of work.
> Especially as xp itself usually grabs around 128MB.
1.25 Gig
Knew I'd be wanting lots of memory when i bought the machine :}
>
> Also what else is running in the background ?
> Did you kill off fast.exe which is the background loader for internet
> explorer.
>
It's not shown on Task Manager, I use Mozilla in any case.
Timing didn't change when I disabled firewall and anti-virus software.
> Been a while since I last used / ran scilab.
>
> Does scilab use atlas ?
I don't know.
> If so is it configured for your machine ?
> http://math-atlas.sourceforge.net/
> I known matlab uses it for some things.
>
> Or have your recompiled scilab to tune it for
> the machine your on ?
> Shouldn't be necessary for most machines.
No. I don't have the tools.
>
> Alex Gibson
>
>
|
|
0
|
|
|
|
Reply
|
Richard
|
1/6/2004 4:45:22 AM
|
|
Well, 44100 is 3^2 * 7^2 * 10^10. You could use 6 FFTs
matching those sizes (Winograd, Singleton and Rader have
3,7, 5 point FFTs), however the code is likely to be less
simple than you want. If you really don't care about speed,
you could just do a 44100 point DFT. The code would be dirt
simple, although it may also be dog slow.
Richard Owlett wrote:
> Any pointers to doing non-radix 2 FFT's.
> Specifically I wish to do 44100 pt FFT's on 16 bit data.
> Simplicity of coding OUTWEIGHS speed!
> [ Scilab is slow but adequate ]
--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com
"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin
Franklin, 1759
|
|
0
|
|
|
|
Reply
|
Ray
|
1/8/2004 7:48:45 PM
|
|
He could do a pair of prime factor 210 point FFTs cascaded using the mixed
radix algorithm though. 210=2*3*5*7, so for that you could use the PFA.
PFA is nice because it eliminates the phase rotations between the smaller
FFT kernels. The 2,3,5 and 7 point FFTs could be any of winograd,
singleton or Rader. A decent reference is the Smith and Smith "Handbook
of real-time fast fourier transforms", IEEE press.
"Steven G. Johnson" wrote:
> r.lyons@REMOVE.ieee.org (Rick Lyons) wrote...
> > Richard Owlett <rowlett@atlascomm.net> wrote:
> > >Any pointers to doing non-radix 2 FFT's.
> > >Specifically I wish to do 44100 pt FFT's on 16 bit data.
> > >Simplicity of coding OUTWEIGHS speed!
> >
> > I'm ignorant of these schemes,
> > but ya' might do an Internet search
> > on "Prime Factor" and "Split-radix"
> > FFTs.
>
> The split-radix algorithm is essentially a combination of radices 2
> and 4, and only works for power-of-two sizes. The Prime-Factor
> Algorithm (PFA) only decomposes a DFT into relatively prime factors
> (so e.g. it cannot further decompose the factor of 7^2 in 44100) and
> is fairly complicated to implement.
>
> What the original poster wants is a "mixed-radix" FFT, which is simply
> the generalized version of the standard Cooley-Tukey algorithm to
> arbitrary factorizations. Many implementations can be found online
> (see www.fftw.org/links.html)
>
> Of course, for spectral estimation you should also be doing windowing,
> etcetera, in order to reduce edge artifacts. Ditto if you are doing
> zero-padding. See e.g. Oppenheim and Schafer, "Digital Signal
> Processing".
>
> Cordially,
> Steven G. Johnson
--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email ray@andraka.com
http://www.andraka.com
"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759
|
|
0
|
|
|
|
Reply
|
Ray
|
1/8/2004 7:53:22 PM
|
|
"Steven G. Johnson" <stevenj@alum.mit.edu> wrote in message
news:27cfb406.0401021313.4196aa02@posting.google.com...
> The split-radix algorithm is essentially a combination of radices 2
> and 4, and only works for power-of-two sizes.
The following question is only loosely related to this comment:
Last year you wrote:
> Actually, the minimal operation count (for powers of two) is achieved by
> something called "split-radix" that is a mixture of radix 2 and 4. At
> each step, it breaks your size-n DFT into three pieces, one of size n/2
> and two of size n/4.
But it seems to me that split-radix requires 912 additions and 248
multiplications for n = 64, and 1160 isn't the minimal number of
operations. Is there some paper you are thinking of that discusses
a modification of the standard split-radix algorithm that always
attains the minimal number of arithmetic operations (real adds +
real multiplies?)
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
|
|
0
|
|
|
|
Reply
|
James
|
1/21/2004 11:15:23 PM
|
|
James Van Buskirk wrote:
> But it seems to me that split-radix requires 912 additions and 248
> multiplications for n = 64, and 1160 isn't the minimal number of
> operations.
What do you think is the lowest known number of operations for an n=64
complex DFT, using what algorithm?
|
|
0
|
|
|
|
Reply
|
Steven
|
1/22/2004 4:24:02 AM
|
|
"Steven G. Johnson" <stevenj@alum.mit.edu> wrote in message
news:bunj9a$hts$1@news.fas.harvard.edu...
> James Van Buskirk wrote:
> > But it seems to me that split-radix requires 912 additions and 248
> > multiplications for n = 64, and 1160 isn't the minimal number of
> > operations.
> What do you think is the lowest known number of operations for an n=64
> complex DFT, using what algorithm?
The best I know of is 912 additions and 240 multiplications; just
figured out how to achieve this result yesterday. Algorithm is
the same as used for all my computational kernals (heavy use of
real-half-complex DFTs,) just became aware of another possible trick.
The lowest known by anyone is a different question. The more you
look at DFTs, the more you become uncertain that what you or anyone
else does is optimal, even if you get to write the criteria of
optimality yourself.
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
|
|
0
|
|
|
|
Reply
|
James
|
1/22/2004 5:41:08 AM
|
|
James Van Buskirk wrote:
>>James Van Buskirk wrote:
>>>But it seems to me that split-radix requires 912 additions and 248
>>>multiplications for n = 64, and 1160 isn't the minimal number of
>>>operations.
>
> The best I know of is 912 additions and 240 multiplications; just
> figured out how to achieve this result yesterday.
Please do let me know when you publish your results, as I try to keep
abreast of the latest FFT developments.
The arithmetic complexity of FFT algorithms was extensively studied in
the 1980's, when it was still considered to be a good proxy for speed
(sadly no longer the case). As of 1991 [1], the lowest-known operation
count for an n=64 complex DFT was 1160 flops, a record that had been set
in 1968 by Yavne [2] and was later matched using a simpler algorithm by
the split-radix technique. For N=2^m, this record is:
flops = 4 N log2(N) - 6N + 8
(This is asymptotically 20% fewer flops than for vanilla radix 2. Note
that you can trade off some multiplies for additions by using the 3 adds
+ 3 multiplies complex-multiply trick.)
Actually, it has supposedly been rigorously proved [1] that split-radix
achieves the minimal number of complex additions. There is also a
rigorous lower bound [1] on the number of non-trivial complex (or real)
multiplications, which split-radix can achieve up to N=16 (and it is
close for larger N). For larger N, there are known algorithms that
achieve the minimal number of multiplies, but they do not have fewer
total flops than split-radix, and have more flops for N > 64. So, the
achievable total flop count has yet not been rigorously proven.
> Algorithm is
> the same as used for all my computational kernals (heavy use of
> real-half-complex DFTs,) just became aware of another possible trick.
(There is the known technique of taking the real-input FFTs of the real
and imaginary parts, and then combining them with (2N-4) additions.
However, if you use the lowest-known operation counts for real-input
FFTs [3], this gives exactly the same known flops for the complex DFT.
e.g. 1160flops for n=64 from two 518flops n=64 real-input FFTs.)
Cordially,
Steven G. Johnson
[1] P. Duhamel and M. Vetterli, "Fast Fourier transforms: A tutorial
review and a state of the art," Signal Processing 19, 259-299 (1990).
[2] R. Yavne, "An economical method for calculating the discrete Fourier
transform," Proc. AFIPS 33, 115-125 (1968).
[3] H. V. Sorensen, D. L. Jones, M. T. Heideman, and C. S. Burrus,
"Real-valued fast Fourier transform algorithms," IEEE Trans. Acoust.
Speech Sig. Processing ASSP-35, 849�863 (1987).
|
|
0
|
|
|
|
Reply
|
Steven
|
1/22/2004 9:36:13 PM
|
|
"Steven G. Johnson" <stevenj@alum.mit.edu> wrote in message
news:4010424d$0$564$b45e6eb0@senator-bedfellow.mit.edu...
> Please do let me know when you publish your results, as I try to keep
> abreast of the latest FFT developments.
Well, I would just post the code the n = 64, but it's a little long --
35937 bytes. Is this objectionably long for this group? The trick
that made the difference for n = 64 would also improve the number of
multiplies for some of my smaller modules. I'm not sure at this point
whether I want to rewrite them all (many take a day each) or try to
create a code generator that can do what I do in the short time
available to me just now to ponder this problem again.
> Actually, it has supposedly been rigorously proved [1] that split-radix
> achieves the minimal number of complex additions.
My code is structured for minimum-add so I am not challenging the
above assertion.
> [1] P. Duhamel and M. Vetterli, "Fast Fourier transforms: A tutorial
> review and a state of the art," Signal Processing 19, 259-299 (1990).
> [2] R. Yavne, "An economical method for calculating the discrete Fourier
> transform," Proc. AFIPS 33, 115-125 (1968).
> [3] H. V. Sorensen, D. L. Jones, M. T. Heideman, and C. S. Burrus,
> "Real-valued fast Fourier transform algorithms," IEEE Trans. Acoust.
> Speech Sig. Processing ASSP-35, 849�863 (1987).
Thanks for the comments and the references. Lets see... I've read [1],
and it is highly recommended. [2] just happened to be sitting on my
desk as I read this, but I'm not sure if I've seen [3].
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
|
|
0
|
|
|
|
Reply
|
James
|
1/22/2004 11:27:59 PM
|
|
PS. James graciously sent me a copy of his code privately, and I can
verify that it is doing what he claims. This looks like a very
impressive result, and an important advance for the complexity theory of
FFT algorithms: breaking the 36-year-old record for the minimal flop
count in power-of-two sizes, from 1160 to 1152 flops for n=64. I hope
he publishes his algorithm in the near future, and encourage him to post
his code (gzipped it is < 10k).
PPS. I should caution other readers that a small change in the flop
count alone is unlikely to significantly impact FFT speed, which is
dominated these days by cache and pipeline utilization. But the
question of minimal arithmetic complexity is still important on
theoretical grounds.
|
|
0
|
|
|
|
Reply
|
Steven
|
1/23/2004 10:22:33 PM
|
|
"Steven G. Johnson" <stevenj@alum.mit.edu> wrote in message
news:bus6sq$8ib$1@news.fas.harvard.edu...
> I hope
> he publishes his algorithm in the near future, and encourage him to post
> his code (gzipped it is < 10k).
Perhaps if you have a little corner on your website reserved for
miscellaneous stuff, you could place it there and post a link. Seems
to me to be the preferred method for communicating largish chunks
of data in newsgroups, if it's not too much trouble from your
viewpoint.
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
|
|
0
|
|
|
|
Reply
|
James
|
1/24/2004 1:30:23 AM
|
|
James Van Buskirk wrote:
> "Steven G. Johnson" <stevenj@alum.mit.edu> wrote in message
> news:bus6sq$8ib$1@news.fas.harvard.edu...
>
>
>> I hope
>>he publishes his algorithm in the near future, and encourage him to post
>>his code (gzipped it is < 10k).
>
>
> Perhaps if you have a little corner on your website reserved for
> miscellaneous stuff, you could place it there and post a link. Seems
> to me to be the preferred method for communicating largish chunks
> of data in newsgroups, if it's not too much trouble from your
> viewpoint.
I'll be happy to host it. Even with my slow dial-up connection, it won't
be much trouble. Check out http://users.rcn.com/jyavins to see if it
would be appropriate. It could replace Jeff Briden's simulation that is
old hat by now, have a line of its own, or be anonymous like
http://users.rcn.com/jyavins/walala-lena.jpg.
Jerry
P.S. If you look at the site, please tell me what you think of my new
favicon. It's a reproduction of the stamp I put on the jewelry I make.
--
Engineering is the art of making what you want from things you can get.
�����������������������������������������������������������������������
|
|
0
|
|
|
|
Reply
|
Jerry
|
1/24/2004 2:12:19 AM
|
|
"Jerry Avins" <jya@ieee.org> wrote in message
news:4011d483$0$12745$61fed72c@news.rcn.com...
> I'll be happy to host it. Even with my slow dial-up connection, it won't
> be much trouble. Check out http://users.rcn.com/jyavins to see if it
> would be appropriate. It could replace Jeff Briden's simulation that is
> old hat by now, have a line of its own, or be anonymous like
> http://users.rcn.com/jyavins/walala-lena.jpg.
Thanks for the offer; cool website. I finally got around to figuring
out how to create a personal web page through my ISP.
http://home.comcast.net/~kmbtib/index.html
I uploaded the files. If there are any questions, post them here
or to comp.lang.fortran, or decode my email address...
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
|
|
0
|
|
|
|
Reply
|
James
|
1/24/2004 7:04:25 PM
|
|
James Van Buskirk wrote:
> ... cool website. I finally got around to figuring
> out how to create a personal web page through my ISP.
>
> http://home.comcast.net/~kmbtib/index.html
>
> I uploaded the files. If there are any questions, post them here
> or to comp.lang.fortran, or decode my email address...
Nifty code! My site started as a way to show my face to a a stranger I
had agreed to meet at the airport. The rest grew by accretion. CuteFTP
and CuteHTML made it easy.
Jerry
--
Engineering is the art of making what you want from things you can get.
�����������������������������������������������������������������������
|
|
0
|
|
|
|
Reply
|
Jerry
|
1/24/2004 7:35:09 PM
|
|
"Jerry Avins" <jya@ieee.org> wrote in message
news:4012c8ef$0$7333$61fed72c@news.rcn.com...
> James Van Buskirk wrote:
> > ... cool website. I finally got around to figuring
> > out how to create a personal web page through my ISP.
> > http://home.comcast.net/~kmbtib/index.html
> > I uploaded the files. If there are any questions, post them here
> > or to comp.lang.fortran, or decode my email address...
> Nifty code! My site started as a way to show my face to a a stranger I
> had agreed to meet at the airport. The rest grew by accretion. CuteFTP
> and CuteHTML made it easy.
Well, I got back from my walk in the park and concluded that maybe
I should put some descriptions on my website of what the files
are and also, to be topical to the original thread (now that the
subject line has changed, of course :) ) I zipped up my code for all
prime power orders <= 64, so it includes the factors 25 and 49,
although as we have seen, this old code is now known not to be
optimal.
--
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end
|
|
0
|
|
|
|
Reply
|
James
|
1/25/2004 5:24:33 AM
|
|
|
24 Replies
201 Views
(page loaded in 0.224 seconds)
Similiar Articles: 1D space FFT - comp.soft-sys.matlab... L : 1/L : n/L]; when i plot (k, ftu1,'.'); i find a very large ... for an image - comp.lang.labview 1D space FFT - comp.soft-sys.matlab Fast Fourier Transformation ... FFT+inverse FFT - comp.lang.idl-pvwaveThere is an algorithm called FFT (Fast Fourier > > Transformation) > > that happens to be very ... Converting FFT to inverse-FFT - comp.dsp... right after ... The Large FFT ... conventional wisdom how to upsample very large arrays, accurately ...My current system takes 10 hours to computer a 1E9 FFT (actually, size is 2^30 ... java.gui ... conventional wisdom how to upsample very large arrays, accurately ... (2 ... DC component in FFT Spectrum - comp.soft-sys.matlab> > > I you don't want sidelobes of a large DC component to > > contaminate the ... The Fast Fourier Transform (FFT) and Power Spectrum ... the ... TAKING THE FFT The following ... Calibrating FFT results, amplitude in to magnitude out - comp.dsp ...if w = 2*pi*k/N for integer k, it will come out real clean and > you will know how large the number in the FFT bin is for a ... The Fast Fourier Transform (FFT ... obtain FFT Return Values - comp.dsp... represented by a pair is sqrt(real^2 + imaginary^2), which could concievably be as large as ... The Fast Fourier Transform (FFT) and the power spectrum ... - Stack ... 2D FFT of images - comp.soft-sys.matlabThis is an fft-based convolution including ... is using FFT based convolution for large ... FFT stands for "Fast" Fourier Transform and is simply a fast algorithm for ... FFTW: Inverse of forward fft not equal to original function - comp ...In any case, the errors are too large to be ascribed ... FFT in blocks and then average it. - comp.dsp 2) You mentioned FFTW, but not which ... The Fast Fourier Transform ... Implementation of FFT and IFFT in Matlab - comp.soft-sys.matlab ...In general, N is large and there can be many peaks with different levels. ... Implementation of FFT and IFFT in Matlab ... ... Fast Fourier Transform Source code in Visual ... How to compute phase of FFT signal in real world? - comp.dsp ...... on a bin center, then the max spectral magnitude component will be large ... of FFT signal in real world? - comp.dsp ..... frequency, phase ... fast fourier transformation ... 2D Convolution using FFTW - comp.dspg,h, 0, 0, 0, 0, 0, i] Step (4) Take FFT of newKernel and the image padded with ... comp.soft-sys.matlab 2D FFT of images - comp.soft-sys.matlab FFTW ... for large kernel ... Phase versus Time from an FFT - comp.soft-sys.matlabSo, if I aquire the signal at some sampling rate, I can perform FFT(F) resulting in ... In your case, if the phase errors are not overwhelmingly large, then you might be ... Reducing noise impact in FFT: averaged segment method - comp.dsp ...If you repeat that N-point FFT on a new block of N time samples and measure the ... on flat-top windows (which are not appropriate to the OP's problem) and the large ... how to know that SRL16 was infered on xilinx? - comp.arch.fpga ...Hello, I am implementing a core from xilinx (FFT) and wanted to know how the ... The xilinx core manual says that the earlier stages that need large shift registers ... filter fft data - comp.soft-sys.matlabconventional wisdom how to upsample very large arrays, accurately ... Hi ... Filters (FFTs) Using Frequency Filters (FFTs) Use FFT Filtering (Fast Fourier Transform ... Prime 95 v.2.56 Small vs Large FFT - AnandTech ForumsHardware and Technology > CPUs and Overclocking ... The Large FFT setting says it's for maximum heat, but I ALWAYS get higher temps with ... This is a very good ... Large FFT vs Many FFTs | Comp.DSP | DSPRelated.comLarge FFT vs Many FFTs - Edison - 2007-04-05 06:44:00 ... are to take 1 seconds worth of data and FFT it, giving a 1.2 million point FFT. 7/19/2012 4:38:22 PM
|