Hi.
I noticed I haven't gotten many more answers on the thread "Fast pi
program?" about the pi program. I'd really be curious to know whether
the multiplication routines specifically can be made faster than
what's already in there, since according to the profiler that's what
seems to be taking up most of the time (not that I'm surprised). The
source code file is still available for download.

mike3
9/4/2007 6:49:52 PM

On Sep 4, 12:49 pm, mike3 <mike4...@yahoo.com> wrote:
> Hi.
>
> I noticed I haven't gotten many more answers on the thread "Fast pi
> program?" about the pi program. I'd really be curious to know if
> specifically the multiplication routines can be made faster than
> what's already in there (since that's what seems to be taking up most
> of the time according to the profiler. Not that I'm surprised.). The
> source code file is still available for download.
Any answer?

mike3
9/7/2007 6:29:56 PM

On Sep 7, 11:29 am, mike3 <mike4...@yahoo.com> wrote:
> On Sep 4, 12:49 pm, mike3 <mike4...@yahoo.com> wrote:
>
> > Hi.
>
> > I noticed I haven't gotten many more answers on the thread "Fast pi
> > program?" about the pi program. I'd really be curious to know if
> > specifically the multiplication routines can be made faster than
> > what's already in there (since that's what seems to be taking up most
> > of the time according to the profiler. Not that I'm surprised.). The
> > source code file is still available for download.
>
> Any answer?
It's a fairly interesting topic. But I found your license confusing
and the first time I tried it, there were too many problems to
continue.
Just profile it, and speed up the hot spots.
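gprof reports which functions the time goes into; once a hot spot is known, a crude cross-check is to time that routine directly before and after a change. This is a minimal sketch that assumes nothing about the pi program itself: `work` is a hypothetical stand-in for a hot routine and `time_per_call` is an illustrative helper, neither of which is part of the posted source. It uses only standard C `clock()`, which measures processor time rather than wall-clock time.

```c
#include <time.h>

/* Hypothetical stand-in for a hot routine (for example an NTT pass);
   substitute the function you actually want to measure. */
static unsigned long work(unsigned long n)
{
    unsigned long acc = 0;
    for (unsigned long i = 0; i < n; i++)
        acc += i * i % 7;
    return acc;
}

/* Call fn(arg) reps times and return the average processor time per call
   in seconds. Results are summed into *sink so the calls have a visible
   effect and are less likely to be optimized away. */
static double time_per_call(unsigned long (*fn)(unsigned long),
                            unsigned long arg, int reps, unsigned long *sink)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        *sink += fn(arg);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Use enough repetitions that the total run is well above the resolution of `clock()`, and compare the figure before and after each change.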

user923005
9/7/2007 11:11:26 PM

On Sep 7, 5:11 pm, user923005 <dcor...@connx.com> wrote:
> It's a fairly interesting topic. But I found your license confusing
> and the first time I tried it, there were too many problems to
> continue.
>
> Just profile it, and speed up the hot spots.
My license agreement was confusing? Could you explain,
please? I was just saying that you shouldn't redistribute
the program or any modified version without my permission,
as I was just releasing it for help with the speed, not a "full"
release. If I do go with a full release, then I will probably
release under a more relaxed license.
What were the problems you had when you tried it? You
were using GNU GCC to compile, weren't you? Also, did
you get the most recent download, which does *not* use
a time zone library called "libtz"? If not, and that is related
to your problem, then you can get the new download here:
http://www.mediafire.com/?9mzltzjyizn
By the way, I already profiled with gprof, and the hot spots
seem to be the multiplication routines, with the
FFTs/NTTs and all that. I'd also like some advice on the disk
math routines as I'm not sure if they could be improved
or not in terms of performance.
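For readers unfamiliar with the technique, here is a minimal sketch of NTT-based multiplication in C. It is not the program's code (nttxfm.c and bigmul.c are not reproduced in this thread); the function names are illustrative, and it assumes a single prime, 998244353 = 119 * 2^23 + 1 with primitive root 3, base-10 digits stored least significant first, and a textbook iterative radix-2 transform. Both inputs are transformed, multiplied pointwise, inverse-transformed, and then carries are released. With one prime the result is exact only while 81 * min(nx, ny) < 998244353 and the padded length stays at or below 2^23; larger operands need several primes plus a CRT step.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative single-prime parameters (not taken from nttxfm.c). */
#define NTT_MOD  998244353u   /* 119 * 2^23 + 1 */
#define NTT_ROOT 3u           /* a primitive root modulo NTT_MOD */

static uint32_t mulmod(uint32_t a, uint32_t b)
{
    return (uint32_t)((uint64_t)a * b % NTT_MOD);
}

static uint32_t powmod(uint32_t b, uint32_t e)
{
    uint32_t r = 1;
    while (e) {
        if (e & 1)
            r = mulmod(r, b);
        b = mulmod(b, b);
        e >>= 1;
    }
    return r;
}

/* In-place iterative radix-2 NTT. n must be a power of two, at most 2^23.
   If invert is nonzero, computes the inverse transform including 1/n. */
static void ntt(uint32_t *a, size_t n, int invert)
{
    /* Reorder elements into bit-reversed index order. */
    for (size_t i = 1, j = 0; i < n; i++) {
        size_t bit = n >> 1;
        for (; j & bit; bit >>= 1)
            j ^= bit;
        j ^= bit;
        if (i < j) {
            uint32_t tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }
    /* Butterfly passes over blocks of length 2, 4, ..., n. */
    for (size_t len = 2; len <= n; len <<= 1) {
        uint32_t w = powmod(NTT_ROOT, (NTT_MOD - 1) / (uint32_t)len);
        if (invert)
            w = powmod(w, NTT_MOD - 2);           /* w^(-1) by Fermat */
        for (size_t i = 0; i < n; i += len) {
            uint32_t wk = 1;
            for (size_t k = 0; k < len / 2; k++) {
                uint32_t u = a[i + k];
                uint32_t v = mulmod(a[i + k + len / 2], wk);
                a[i + k] = (u + v) % NTT_MOD;     /* sum < 2^31, no overflow */
                a[i + k + len / 2] = (u + NTT_MOD - v) % NTT_MOD;
                wk = mulmod(wk, w);
            }
        }
    }
    if (invert) {
        uint32_t n_inv = powmod((uint32_t)n, NTT_MOD - 2);
        for (size_t i = 0; i < n; i++)
            a[i] = mulmod(a[i], n_inv);
    }
}

/* Multiply little-endian base-10 digit arrays x[0..nx) and y[0..ny),
   writing nx + ny digits to out. Exact while 81 * min(nx, ny) < NTT_MOD
   and nx + ny <= 2^23. Returns 0 on success, -1 if allocation fails. */
int ntt_multiply(const uint8_t *x, size_t nx, const uint8_t *y, size_t ny,
                 uint8_t *out)
{
    size_t n = 1;
    while (n < nx + ny)
        n <<= 1;
    uint32_t *fa = calloc(n, sizeof *fa);
    uint32_t *fb = calloc(n, sizeof *fb);
    if (fa == NULL || fb == NULL) {
        free(fa);
        free(fb);
        return -1;
    }
    for (size_t i = 0; i < nx; i++) fa[i] = x[i];
    for (size_t i = 0; i < ny; i++) fb[i] = y[i];
    ntt(fa, n, 0);
    ntt(fb, n, 0);
    for (size_t i = 0; i < n; i++)
        fa[i] = mulmod(fa[i], fb[i]);        /* pointwise product */
    ntt(fa, n, 1);                           /* back to the convolution */
    uint64_t carry = 0;                      /* release carries in base 10 */
    for (size_t i = 0; i < nx + ny; i++) {
        uint64_t t = fa[i] + carry;
        out[i] = (uint8_t)(t % 10);
        carry = t / 10;
    }
    free(fa);
    free(fb);
    return 0;
}
```

For example, 12 * 34 is passed as x = {2, 1} and y = {4, 3}, and out receives {8, 0, 4, 0}, i.e. 408. Real implementations typically use a much larger digit base and precomputed twiddle tables, which is where most of the tuning effort goes.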

mike3
9/8/2007 12:05:42 AM

On Sep 7, 5:05 pm, mike3 <mike4...@yahoo.com> wrote:
> My license agreement was confusing? Could you explain,
> please? I was just saying that you shouldn't redistribute
> the program or any modified version without my permission,
> as I was just releasing it for help with the speed, not a "full"
> release. If I do go with a full release, then I will probably
> release under a more relaxed license.
>
> What were the problems you had when you tried it? You
> were using GNU GCC to compile weren't you? Also, did
> you get the most recent download, which does *not* use
> a time zone library called "libtz"? If not, and that is related
> to your problem, then you can get the new download here:
>
> http://www.mediafire.com/?9mzltzjyizn
>
> By the way I already profiled with gprof, and the hot spots
> seem to be the multiplication routines, by the way, with the
> FFTs/NTTs and all that. I'd also like some advice on the disk
> math routines as I'm not sure if they could be improved
> or not in terms of performance
I can build it with gcc:
dcorbit@DCORBIT64 /c/junk/pisrc
$ make
gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c nttxfm.c
gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c crt.c
primes.h:18: warning: 'NTTroots' defined but not used
gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c bigmul.c
crtmath.h:22: warning: 'crtcopy32' defined but not used
crtmath.h:53: warning: 'crtmulbsm32' defined but not used
crtmath.h:108: warning: 'crtmod32' defined but not used
primes.h:18: warning: 'NTTroots' defined but not used
gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c blockint.c
gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c diskint.c
gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c newton.c
gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c agm.c
gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c pib26.c
gcc -g -pg -o pib26 nttxfm.o crt.o bigmul.o blockint.o diskint.o newton.o agm.o pib26.o -lm
But my best profiling tools are Windows-based (Intel's VTune and
the Microsoft profiler that comes with the Enterprise version of their
tool set).
Since your file is chock full of inline assembly in GAS syntax, there
is little hope of compiling it successfully using the Intel or MSVC++
compilers.
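As an illustration of the portability point (the thread does not show the program's actual inline assembly, so this is a hypothetical example): a 32-bit modular multiply, the kind of operation an NTT inner loop needs, can be written in plain C with a 64-bit intermediate instead of a hand-coded `mull`/`divl` pair. Any compiler that provides C99 `<stdint.h>` can build it, regardless of which assembler syntax it uses. On 32-bit x86 the compiler may turn the 64-bit `%` into a library call, because it cannot prove the quotient fits in 32 bits, so this may be slower than the assembly it replaces; measure before switching.

```c
#include <stdint.h>

/* Hypothetical helper (not from the posted source): returns (a * b) mod m
   for 32-bit operands. The product is formed in 64 bits, so it cannot
   overflow; m must be nonzero. */
static inline uint32_t mulmod32(uint32_t a, uint32_t b, uint32_t m)
{
    return (uint32_t)(((uint64_t)a * b) % m);
}
```

In an NTT over a prime p below 2^32, each butterfly would call something like `mulmod32(x, w, p)`.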
The things I could find with gprof are the same things that you
found, so I doubt that I can be of any help.

user923005
9/8/2007 1:53:49 AM

On Sep 7, 7:53 pm, user923005 <dcor...@connx.com> wrote:
> I can build it with gcc:
> dcorbit@DCORBIT64 /c/junk/pisrc
> $ make
> gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c nttxfm.c
> gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c crt.c
> primes.h:18: warning: 'NTTroots' defined but not used
> gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c bigmul.c
> crtmath.h:22: warning: 'crtcopy32' defined but not used
> crtmath.h:53: warning: 'crtmulbsm32' defined but not used
> crtmath.h:108: warning: 'crtmod32' defined but not used
> primes.h:18: warning: 'NTTroots' defined but not used
> gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c blockint.c
> gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c diskint.c
> gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c newton.c
> gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c agm.c
> gcc -g -pg -O3 -Wall -ffast-math -funroll-loops -c pib26.c
> gcc -g -pg -o pib26 nttxfm.o crt.o bigmul.o blockint.o diskint.o
> newton.o agm.o pib26.o -lm
>
> but my best profiler tools are Windows based (Intel's VTUNE and
> Microsoft's Profiler that comes with the Enterprise version of their
> tool set).
>
> Since your file is chock full of inline assembly in GAS syntax, there
> is little hope of compiling it successfully using the Intel or MSVC++
> compilers.
>
I suppose I could rewrite the AT&T-syntax assembler (it's not called
"GAS" syntax) in Intel syntax, but since I was working with gcc, I
didn't. I use gcc because I don't have the money to buy the
other compilers you mentioned.
> The things I could find with the gprof are the same things that you
> found so I doubt that I can be of any help.
So you don't think gprof is a good enough profiler, then? And I'd
bet the Enterprise edition of the Microsoft tools costs sweet
amounts of money I don't have.
And why couldn't the results from gprof be of any help,
anyway?

mike3
9/8/2007 2:50:33 AM

On Sep 7, 7:50 pm, mike3 <mike4...@yahoo.com> wrote:
> On Sep 7, 7:53 pm, user923005 <dcor...@connx.com> wrote:
>
> I suppose I could rewrite the AT&T syntax (it's not called "GAS"
> syntax)
> assembler in Intel syntax, but since I was working with gcc, I
> did not do it. I use gcc since I do not have the money to buy those
> other compilers you mentioned.
>
> > The things I could find with the gprof are the same things that you
> > found so I doubt that I can be of any help.
>
> So you don't think gprof is a good enough profiler, then?
Actually, it might be good enough. I don't use it unless there is no
other choice, so my lack of experience with that profiler may be
the real limiting factor rather than the capability of the profiler.
The high-end profilers do things like give you suggestions about
better formulations and show you what the bottleneck in the process
is (much more salient than where the time is going).
> And I'd
> bet that Enterprise edition of the Microsoft stuff would probably
> cost sweet amounts of money I don't have.
They do cost a bazillion dollars. The Intel profiler is cheaper than
the MS profiler and just as good (but it is Intel-specific and
disables much of the really excellent functionality if you try to use
it on AMD).
I think you can download the Intel compiler for Linux for free. Maybe
you can get the profiler too. You might check this stuff out:
http://www.intel.com/cd/software/products/asmo-na/eng/download/eval/219690.htm
I'd guess that you will get about 20% more speed just by using the
Intel compiler instead of GCC.
> And why couldn't the results from gprof be of any help,
> anyways?
Well you have them. Did they help?

user923005
9/10/2007 7:41:18 PM

On Sep 10, 1:41 pm, user923005 <dcor...@connx.com> wrote:
> On Sep 7, 7:50 pm, mike3 <mike4...@yahoo.com> wrote:
>
> > So you don't think gprof is a good enough profiler, then?
>
> Actually, it might be good enough. I don't use it unless there is no
> other choice and so my lack of experience with that profiler may be
> the real limiting step and not the capability of the profiler.
> Some of the things that the high end profilers do is give you
> suggestions about better formulations and show you what the bottleneck
> in the process is (much more salient than where the time is going).
>
As far as I know, gprof simply tells you where the time is going,
i.e. which routines it is spent in. I haven't had a huge
amount of experience with it either, though.
> > And I'd
> > bet that Enterprise edition of the Microsoft stuff would probably
> > cost sweet amounts of money I don't have.
>
> They do cost a bazillion dollars. The Intel profiler is cheaper than
> the MS profiler and just as good (but it is Intel specific and
> disables much of the really excellent functionality if you try to use
> it on AMD).
>
How did you get this stuff, then? You must make a lot
of money.
> I think you can download the Intel compiler for Linux for free. Maybe
> you can get the profiler also. You might check this stuff out:http://www.intel.com/cd/software/products/asmo-na/eng/download/eval/2...
>
Looks like you can get a non-commercial version of both
items for free. Since this program is not a commercial
venture in any way, I might give this a try.
> I guess that you will get 20% faster just by using the Intel compiler
> instead of GCC.
>
> > And why couldn't the results from gprof be of any help,
> > anyways?
>
> Well you have them. Did they help?
They told me which routines ate most of the time. It
appears the two NTT routines take up the most,
followed by the routine that emits digits using the
Chinese Remainder Theorem. (All three are used to
multiply the big numbers.)
Perhaps someone else here could offer some more
help?
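The digit-emitting step mentioned above typically works like this in a multi-prime NTT multiply: each convolution coefficient is computed modulo several primes, and the Chinese Remainder Theorem rebuilds the true coefficient before carries are propagated into digits. This sketch is an illustration under stated assumptions, not the program's crt.c: it uses Garner's formulation for just two primes, 998244353 and 167772161 (both NTT-friendly primes with primitive root 3, chosen here for illustration), so that every intermediate fits in a 64-bit integer. With three or more primes the reconstructed value no longer fits in 64 bits and multi-word arithmetic is needed.

```c
#include <stdint.h>

/* Illustrative primes; the program's own moduli are not shown in this thread. */
#define P1 998244353u   /* 119 * 2^23 + 1 */
#define P2 167772161u   /* 5 * 2^25 + 1 */

/* b^e mod m; correct for any m < 2^32 because both factors stay below m. */
static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m)
{
    uint64_t r = 1 % m;
    b %= m;
    while (e) {
        if (e & 1)
            r = r * b % m;
        b = b * b % m;
        e >>= 1;
    }
    return r;
}

/* Given r1 = x mod P1 and r2 = x mod P2, return the unique x with
   0 <= x < P1 * P2 (Garner's method):
       t = (r2 - r1) * P1^(-1)  (mod P2)
       x = r1 + P1 * t                                                */
uint64_t crt2(uint32_t r1, uint32_t r2)
{
    /* P2 is prime, so Fermat's little theorem gives the inverse of P1 mod P2.
       Real code would compute this once rather than on every call. */
    uint64_t inv = powmod(P1 % P2, P2 - 2, P2);
    uint64_t diff = ((uint64_t)r2 + P2 - r1 % P2) % P2;
    uint64_t t = diff * inv % P2;                /* < 2^56, no overflow */
    return (uint64_t)r1 + (uint64_t)P1 * t;      /* < P1 * P2 < 2^58 */
}
```

The recovered coefficient would then be added to the running carry and split into output digits, exactly as in the carry loop of an ordinary multiply.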

mike3
9/10/2007 11:18:23 PM

"mike3" <mike4ty4@yahoo.com> wrote in message
news:1189466303.056643.157520@o80g2000hse.googlegroups.com...
> On Sep 10, 1:41 pm, user923005 <dcor...@connx.com> wrote:
> Perhaps someone else here could offer some more
> help?
I always think that the book _Pi and the AGM_ by the Canadian mathematicians
Borwein and Borwein is relevant to this topic. Their algorithms have been
used in record-setting computations of pi. User10^6 is always
relevant for speed.
--
Wade Ward

Wade
9/11/2007 6:54:32 AM

On Sep 10, 5:18 pm, mike3 <mike4...@yahoo.com> wrote:
> Perhaps someone else here could offer some more
> help?
Any answers? Looks like someone sent a response,
but I can't get the text of the response to show up
here on Google.

mike3
9/15/2007 1:57:43 AM