-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
I wrote an application:
http://doublehp.free.fr/tmp/xvcam.c
it is derived from xvcam ( part of camE project, part of Enlightenment
project )
It originally was just a viewer for webcam.
My version is able to compute image recognition using a modified version
of the Randomized Generalized Hough Transform.
- -1- I would like some people to have look at how I do the HT, and if it
possible to improove it with no more computation.
You can read about my own critics about that in
http://www.demaine.info/projet_self_tracking_webcam/formal_report.pdf
in section 12 starting at page 14 ( the pics in that section are not the
good ones yet)
Just look at the function PGHT(). Other *HT() were testing stuff. In
order to know how I initialise everything, you might want to read the
beginning of run2().
- -2- I would like to know how to optimise the work of conv1(), conv2() (
matrix convolution ), PGHT() ( my modified HT), and the first part of
run2(), in order to make things faster.
As said in tho logbook (
http://www.demaine.info/projet_self_tracking_webcam/log_book.pdf )
I tried to precompute cos/sin tables, but calling the result took more
time than computing them in real time.
What other kind of optimisation can I do ?
I dont think anybody can compile that file. It depends on many stuff.
And none of you will be able to run it since it access a USB device I
home made. Later versions should be more friendly. But for now I d like
to improve computing/calculous.
Thank for any comment.
- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFBZLdbGWSTLbOSw8IRAnxdAJ4vXJDRNxu4hT3jxu39T3Eh/SrEkgCfY6m9
1Zz1lsq0BOaRDZBBiHv1JhM=
=Iq7E
-----END PGP SIGNATURE-----
|
|
0
|
|
|
|
Reply
|
nntp_pipex (70)
|
10/7/2004 3:26:20 AM |
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Note that I think what take most of the time is memory acces which looks
random, and IMHO can not be predict by cashes; the address of the next
cell is a function applied to the content of the current cell. How to
make that more efficient ?
- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFBZT2FGWSTLbOSw8IRAhnlAJwIwhQe6tIPkeWlchnJkxBONsh5cQCcDYlS
jH4a4kbHd+fgvGTPqXUe+ss=
=TwOG
-----END PGP SIGNATURE-----
|
|
0
|
|
|
|
Reply
|
nntp_pipex (70)
|
10/7/2004 12:58:45 PM
|
|
DEMAINE Benoit-Pierre wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> I wrote an application:
> http://doublehp.free.fr/tmp/xvcam.c
> it is derived from xvcam ( part of camE project, part of Enlightenment
> project )
>
> It originally was just a viewer for webcam.
> My version is able to compute image recognition using a modified version
> of the Randomized Generalized Hough Transform.
>
> - -1- I would like some people to have look at how I do the HT, and if it
> possible to improove it with no more computation.
> You can read about my own critics about that in
> http://www.demaine.info/projet_self_tracking_webcam/formal_report.pdf
> in section 12 starting at page 14 ( the pics in that section are not the
> good ones yet)
> Just look at the function PGHT(). Other *HT() were testing stuff. In
> order to know how I initialise everything, you might want to read the
> beginning of run2().
>
> - -2- I would like to know how to optimise the work of conv1(), conv2() (
> matrix convolution ), PGHT() ( my modified HT), and the first part of
> run2(), in order to make things faster.
>
> As said in tho logbook (
> http://www.demaine.info/projet_self_tracking_webcam/log_book.pdf )
> I tried to precompute cos/sin tables, but calling the result took more
> time than computing them in real time.
>
> What other kind of optimisation can I do ?
....
In general, I suggest:
1) Measure where the time is spent in your code when it runs. Remember, there's
no point in optimizing code unless it contributes substantially to your runtime.
You'll also need a precise measure of runtime to know whether each optimization
that you try is making the code faster or slower. If you don't time your
program precisely, you can't know if your optimizations are helping or hindering.
2) There are three ways to speed up a program: I) reduce the number of
instructions, II) make better use of the CPU, and III) minimize the time the
program spends waiting for system resources (like memory). If you're using a
capable compiler and you're applying its complete set of compiler optimization
flags (like -O3 and -funroll-loops), then you've reduced the number of
instructions and improved instruction throughput. Probably the only thing left
to do is look at the way that you access memory. Of course, if you can find an
example of the function that you want to optimize that has already been
optimized (in a library or a book on algorithms, perhaps), that's going to be
the fastest way to improve your code.
(If your compiler can produce optimization reports, explaining what
optimizations it could and could not perform on your code, you should try to
generate these. You can learn a lot about the strengths and weaknesses of your
code this way.)
Unless memory accesses appear predictable to your eye, the compiler will be
unable to predict the access pattern either. If you are chasing pointers (which
is what you seem to be describing), the only way to improve memory reuse
(caching) may be to devise a different data structure that's more regular.
Remember too, premature optimization is the root of all evil...
Randy
--
Randy Crawford http://www.ruf.rice.edu/~rand rand AT rice DOT edu
|
|
0
|
|
|
|
Reply
|
joe304 (205)
|
10/7/2004 4:39:32 PM
|
|
Randy coughed up:
> DEMAINE Benoit-Pierre wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> I wrote an application:
>> http://doublehp.free.fr/tmp/xvcam.c
>> it is derived from xvcam ( part of camE project, part of
>> Enlightenment project )
>>
>> It originally was just a viewer for webcam.
>> My version is able to compute image recognition using a modified
>> version of the Randomized Generalized Hough Transform.
>>
>> - -1- I would like some people to have look at how I do the HT, and
>> if it possible to improove it with no more computation.
>> You can read about my own critics about that in
>> http://www.demaine.info/projet_self_tracking_webcam/formal_report.pdf
>> in section 12 starting at page 14 ( the pics in that section are not
>> the good ones yet)
>> Just look at the function PGHT(). Other *HT() were testing stuff. In
>> order to know how I initialise everything, you might want to read the
>> beginning of run2().
>>
>> - -2- I would like to know how to optimise the work of conv1(),
>> conv2() ( matrix convolution ), PGHT() ( my modified HT), and the
>> first part of run2(), in order to make things faster.
>>
>> As said in tho logbook (
>> http://www.demaine.info/projet_self_tracking_webcam/log_book.pdf )
>> I tried to precompute cos/sin tables, but calling the result took
>> more time than computing them in real time.
>>
>> What other kind of optimisation can I do ?
> ...
>
> In general, I suggest:
>
> 1) Measure where the time is spent in your code when it runs.
Remember too, that in most profiler, there are usually two distinct metrics,
both of which can be critical:
1. the amount of time spent in a method and all sub methods called
2. the amount of time spent in a method devoid of all the sub methods
called
....[rip]...
> Remember too, premature optimization is the root of all evil...
What he said...
....[rip]...
--
Everythinginlifeisrealative.Apingpongballseemssmalluntilsomeoneramsitupyournose.
|
|
0
|
|
|
|
Reply
|
tgm2tothe10thpower (2065)
|
10/7/2004 5:57:16 PM
|
|
DEMAINE Benoit-Pierre wrote:
> - -2- I would like to know how to optimise the work of conv1(),
> conv2() ( matrix convolution ), PGHT() ( my modified HT), and
> the first part of run2(), in order to make things faster.
You can perform convolution by Fourier transforming the image and
filter kernel, multiplying pointwise, and transforming back. Whether
that's faster than direct convolution depends on the involved
matrices' sizes; you'd have to experiment here.
If you don't want so much floating point computation, there are also
Fourier transforms over finite rings, the so-called number-theoretic
transforms. NTT's share the convolution/multiplication duality and
admit FFT-like fast algorithms, but the underlying algebraic system's
nature constrains possible transform lengths.
Google for more information, there should be heaps of code and
explanation for the regular Fourier method in particular.
> As said in tho logbook (
> http://www.demaine.info/projet_self_tracking_webcam/log_book.pdf
> ) I tried to precompute cos/sin tables, but calling the result
> took more time than computing them in real time.
That can happen since large tables are cache-unfriendly. In
extract_R() the angle changes by a constant amount between trig
calls, so you could use a recursive generator there, giving both
sine and cosine quite cheaply. Unfortunately, that won't work for
arbitrary angles like in length(). A fine article on those animals
is available from its author's site:
http://personal.atl.bellsouth.net/p/h/physics/2nd_OSC_paper.pdf
Incidentally, this line from extract_R() seems weird to me:
x=mouse_x+(float)r*cosf((float)o);
Wouldn't that be more like cosf((2*PI*o)/_R_RES_)?
I may be mistaken, though, because I don't understand your
code at all. I hope my comments are helpful despite that ;)
Responses regarding the Hough transform may be more probable
in sci.image.processing or comp.graphics.algorithms.
Martin
--
Sartre promotes the use of submodernist semiotic
theory to attack sexist perceptions of art.
http://www.elsewhere.org/cgi-bin/postmodern/
|
|
0
|
|
|
|
Reply
|
martin.eisenbergNOS (92)
|
10/7/2004 6:41:11 PM
|
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
| 1) Measure where the time is spent in your code when it runs.
Remember, there's
| no point in optimizing code unless it contributes substantially to
your runtime.
| You'll also need a precise measure of runtime to know whether each
optimization
| that you try is making the code faster or slower. If you don't time your
| program precisely, you can't know if your optimizations are helping or
hindering.
My logbook tell I done that about sin/cos precomputed tables.
| 2) There are three ways to speed up a program: I) reduce the number of
| instructions, II) make better use of the CPU, and III) minimize the
time the
| program spends waiting for system resources (like memory).
I think that the problem in my program is that it make too many memory
access to unpredictable locations, so that the caches can not preload
RAM. If you read PGHT(), you will see that it is not that big, but it
involves many for loops whith many random arry acces in them.
Since I dont know how many registers there is in a AMD thunderbird, I
can not know whether I can keep some stuff in variables, hoping that the
values stay in registers ( thus saving time ).
| If you're using a
| capable compiler
gcc version 3.3.4 (Debian 1:3.3.4-13)
| and you're applying its complete set of compiler optimization
| flags (like -O3 and -funroll-loops), then you've reduced the number of
| instructions and improved instruction throughput.
I did not think about that. What options are good for memory access ?
does stripping improove speed ?
| Probably the only thing left
| to do is look at the way that you access memory. Of course, if you
can find an
| example of the function that you want to optimize that has already been
| optimized (in a library or a book on algorithms, perhaps), that's
going to be
| the fastest way to improve your code.
I get you, but I dont know how to aply that to my code any further.
| (If your compiler can produce optimization reports, explaining what
| optimizations it could and could not perform on your code, you should
try to
| generate these. You can learn a lot about the strengths and
weaknesses of your
| code this way.)
How do I get such reports with gcc ?
| Unless memory accesses appear predictable to your eye, the compiler
will be
| unable to predict the access pattern either. If you are chasing
pointers (which
| is what you seem to be describing), the only way to improve memory reuse
| (caching) may be to devise a different data structure that's more regular.
The worse point is that I run along every pixel of a picture, and for
each of them, I run along a 50*3 matrix, and if the content of the
matrix is not null, then I read relative coordinates in the array, and
find out an other point of the picture, and modify it. I can not guess
the coordinates of the point I will modify before I have computed them
from coordinates of current point and variables in the array ...
I dont think I can divide anything: if I spread the 50*3 array, I have
to run several time the loop which runs along the picture.
I am not even sure I could be able to divide the work to make it more
efficient with SMP: all the work is about multiplying matrixes: you need
an eye on the whole matrix at any time if you want to do any thing with it.
- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFBZc1tGWSTLbOSw8IRApwxAKCWh+PgQUHBsi7Y6MX28soXyUADAgCeIXft
P0Vr/KgZq2gX2zPFrx6q+HI=
=qqfW
-----END PGP SIGNATURE-----
|
|
0
|
|
|
|
Reply
|
nntp_pipex (70)
|
10/7/2004 11:12:46 PM
|
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Martin Eisenberg wrote:
| DEMAINE Benoit-Pierre wrote:
|
|
|>- -2- I would like to know how to optimise the work of conv1(),
|>conv2() ( matrix convolution ), PGHT() ( my modified HT), and
|>the first part of run2(), in order to make things faster.
|
| You can perform convolution by Fourier transforming the image and
| filter kernel, multiplying pointwise, and transforming back. Whether
| that's faster than direct convolution depends on the involved
| matrices' sizes; you'd have to experiment here.
Where can I read about your FT ? I only use 0-255 integer range.
My matrixes have variable sizes, but all parametrable with #define
The main optimisation should be about PGHT()
The worse part is :
~ for(i=0;i<_YUV_WIDTH;i+=1)
~ for(j=0;j<_YUV_HEIGHT;j+=1)
~ {
~ int w;
~ if(!i2->S[i][j])
~ continue;
~ o_p=i2->P[i][j];
~ o_w=i2->W[i][j];
~ for(p=0;p<_R_RES_;p++)
~ for(w=0;w<_R_WIDTH_;w++)
~ {
~ if(!i2->R_valid[p])
~ continue;
~ // test if phase in pic is close to any phase in R-Table
~ if(abs(o_p-i2->R_P[p][w])<5)
~ if(abs(o_w-i2->R_W[p][w])<5)
if((i-i2->R_x[p][w]>=0)&&(i-i2->R_x[p][w]<_YUV_WIDTH)&&
(j-i2->R_y[p][w]>=0)&&(j-i2->R_y[p][w]<_YUV_HEIGHT))
~ i2->result[i-i2->R_x[p][w]][j-i2->R_y[p][w]]+=0x1;
~ }
~ }
4 for loops with many arry access in random way.
| That can happen since large tables are cache-unfriendly.
:(
| Incidentally, this line from extract_R() seems weird to me:
| x=mouse_x+(float)r*cosf((float)o);
| Wouldn't that be more like cosf((2*PI*o)/_R_RES_)?
hmmm by the time I wrote that line, _R_RES used to be 360. Further more,
cos(3) uses angle in radians. So, you are right, my code is badly
written. but practically, it still works :) guess why ...
if you think you use radians, and use an angle from 0 to 30 with a
stepper of _1_, since a full angle is 6.2831, the value 7 will make
computation as for 0.7189, 8 as for 1.7189 ... and so on, so that I do a
compleet circle every 6 steps, but do not make computations for
redundant values. Every next circle improove the resolution in angle ...
so that I obtain the same result as cosf((2*PI*o)/_R_RES_) but saving
two mult and a division (at the expense of a modulo done inside the cos)
\o/ This a perfect example of non voluntary optimisation \o/ will you
believe this ?
| I may be mistaken,
You are right, but aplying your modification would make things slower
without improoving the work :)
| though, because I don't understand your
| code at all.
Some time I dont either :) ( see RGHT() I am not even able to understand
what I wrote ^^ )
- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFBZdLdGWSTLbOSw8IRAh8sAJ91F/3FloFFYOfjfUlKNKjcPNpt0wCeLdIe
An3Qs64VwGLUyvCNg3KQz6I=
=quFm
-----END PGP SIGNATURE-----
|
|
0
|
|
|
|
Reply
|
nntp_pipex (70)
|
10/7/2004 11:35:58 PM
|
|
DEMAINE Benoit-Pierre wrote:
>
.... snip ...
>
> The worse part is :
>
> ~ for(i=0;i<_YUV_WIDTH;i+=1)
> ~ for(j=0;j<_YUV_HEIGHT;j+=1)
> ~ {
> ~ int w;
> ~ if(!i2->S[i][j])
> ~ continue;
> ~ o_p=i2->P[i][j];
> ~ o_w=i2->W[i][j];
> ~ for(p=0;p<_R_RES_;p++)
> ~ for(w=0;w<_R_WIDTH_;w++)
> ~ {
> ~ if(!i2->R_valid[p])
> ~ continue;
> ~ // test if phase in pic is close to any phase in R-Table
> ~ if(abs(o_p-i2->R_P[p][w])<5)
> ~ if(abs(o_w-i2->R_W[p][w])<5)
> if((i-i2->R_x[p][w]>=0)&&(i-i2->R_x[p][w]<_YUV_WIDTH)&&
> (j-i2->R_y[p][w]>=0)&&(j-i2->R_y[p][w]<_YUV_HEIGHT))
> ~ i2->result[i-i2->R_x[p][w]][j-i2->R_y[p][w]]+=0x1;
> ~ }
> ~ }
>
> 4 for loops with many arry access in random way.
Whoever wrote that should be shot.
--
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
|
|
0
|
|
|
|
Reply
|
cbfalconer (19183)
|
10/8/2004 3:12:51 AM
|
|
DEMAINE Benoit-Pierre wrote:
> Where can I read about your FT ? I only use 0-255 integer range.
http://www.google.com/search?q=fft+convolution&num=15
http://www.google.com/search?q=fft+convolution+%22two+dimensions%22
http://www.numerical-recipes.com/nronline_switcher.html
(chapters 12.4, 12.5, 13.1)
http://www.jjj.de/fxt/fxtpage.html (has NTT info as well)
http://www.fftw.org/
I include the first search because many texts on FFT convolution
only treat the one-dimensional case, but it's easy to generalize
conceptually since multidimensional DFT operators are separable into
one 1D transform in each direction. However, I think you want to use
a canned 2D transform to actually carry it out.
The main things to understand are (1) the difference between circular
and linear convolution and the use of zero-padding to reconcile the
two; and flowing from that, (2) the overlap-add or overlap-save
methods for blockwise computation. You will probably encounter talk
about data tapering windows and analysis frame overlap (a different
"overlap" than above). These are only used in spectral analysis, not
in pure convolution, and are thus irrelevant to your application.
Pose specific questions on the image processing groups or comp.dsp.
> if you think you use radians, and use an angle from 0 to 30 with
> a stepper of _1_, since a full angle is 6.2831, the value 7 will
> make computation as for 0.7189, 8 as for 1.7189 ... and so on,
> so that I do a compleet circle every 6 steps, but do not make
> computations for redundant values. Every next circle improove
> the resolution in angle ... so that I obtain the same result as
> cosf((2*PI*o)/_R_RES_) but saving two mult and a division (at
> the expense of a modulo done inside the cos) \o/ This a perfect
> example of non voluntary optimisation \o/ will you believe this ?
I get it. But I still wonder about consistency of this "relabeling"
of angles with the rest of the program. Digressing, I guess "\o/" has
smiley semantics, but is it meant to suggest any particular object?
> Some time I dont either :) ( see RGHT() I am not even able to
> understand what I wrote ^^ )
While I don't even know the algorithm in the abstract,
that honestly does not surprise me in the least ;)
Martin
--
Quidquid latine dictum sit, altum viditur.
|
|
0
|
|
|
|
Reply
|
martin.eisenbergNOS (92)
|
10/8/2004 8:39:48 AM
|
|
CBFalconer coughed up:
> DEMAINE Benoit-Pierre wrote:
>>
> ... snip ...
>>
>> The worse part is :
>>
>> ~ for(i=0;i<_YUV_WIDTH;i+=1)
>> ~ for(j=0;j<_YUV_HEIGHT;j+=1)
>> ~ {
>> ~ int w;
>> ~ if(!i2->S[i][j])
>> ~ continue;
>> ~ o_p=i2->P[i][j];
>> ~ o_w=i2->W[i][j];
>> ~ for(p=0;p<_R_RES_;p++)
>> ~ for(w=0;w<_R_WIDTH_;w++)
>> ~ {
>> ~ if(!i2->R_valid[p])
>> ~ continue;
>> ~ // test if phase in pic is close to any phase in
>> R-Table ~ if(abs(o_p-i2->R_P[p][w])<5)
>> ~ if(abs(o_w-i2->R_W[p][w])<5)
>> if((i-i2->R_x[p][w]>=0)&&(i-i2->R_x[p][w]<_YUV_WIDTH)&&
>> (j-i2->R_y[p][w]>=0)&&(j-i2->R_y[p][w]<_YUV_HEIGHT))
>> ~ i2->result[i-i2->R_x[p][w]][j-i2->R_y[p][w]]+=0x1;
>> ~ }
>> ~ }
>>
>> 4 for loops with many arry access in random way.
>
> Whoever wrote that should be shot.
It's a pretty good example of why we need the same caning justice that
Singapore has.
--
It'salwaysbeenmygoalinlifetocreateasignaturethatendedwiththeword"blarphoogy".
|
|
0
|
|
|
|
Reply
|
tgm2tothe10thpower (2065)
|
10/8/2004 4:15:09 PM
|
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
| Whoever wrote that should be shot.
I agree that my code is not as nice as Marilyn Monroe, but could any one
tell me what is deeply wrong in it ?
I think if I ever try to port it in ASM, it could get faster, but
definitely even wores/unreadable
- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFBZzWRGWSTLbOSw8IRAjIGAKCcGNY2rHyz2e7tQsECWhfThgSVeACgi2K4
yLnCdpXJn3LZRW+7LIm44lg=
=DRzZ
-----END PGP SIGNATURE-----
|
|
0
|
|
|
|
Reply
|
nntp_pipex (70)
|
10/9/2004 12:49:21 AM
|
|
DEMAINE Benoit-Pierre wrote:
> | Whoever wrote that should be shot.
>
> I agree that my code is not as nice as Marilyn Monroe, but could any one
> tell me what is deeply wrong in it ?
It could be just as fast even with complete variable names, short methods,
and fewer multiple [][][] indices.
> I think if I ever try to port it in ASM, it could get faster, but
> definitely even wores/unreadable
Premature optimization is the root of not being Marilyn Monroe.
--
Phlip
http://industrialxp.org/community/bin/view/Main/TestFirstUserInterfaces
|
|
0
|
|
|
|
Reply
|
phlip_cpp (3649)
|
10/9/2004 1:27:38 AM
|
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
| It could be just as fast even with complete variable names, short methods,
| and fewer multiple [][][] indices.
I always use very short names. It is clearer in _my_ mind: short names
=> faster to write => faster to read => faster to understand.
| short methods,
Is PGHT() that big ? run2() is long :)
I gotta a good question: considering speed of execution, is it faster to use
tab[x][y] or tab[x+y*X_WIDTH] ?
especially when using 2 or 3D arrays of about 250 unit width each...
is a multiplication faster than memory acces ?
- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFBZ1DuGWSTLbOSw8IRAvUVAJ4g8aKWFfrlM6CN739XS6ivEf+xwgCgwVsP
PReXSWNkLxgQLT81KpwkNBM=
=DO0H
-----END PGP SIGNATURE-----
|
|
0
|
|
|
|
Reply
|
nntp_pipex (70)
|
10/9/2004 2:46:06 AM
|
|
On Sat, 9 Oct 2004, DEMAINE Benoit-Pierre wrote:
[CBFalconer wrote, rightly:]
> | Whoever wrote that should be shot.
>
> I agree that my code is not as nice as Marilyn Monroe, but could any one
> tell me what is deeply wrong in it ?
Well, duh.
Because I'm a nice guy, I rewrote the loop you posted in a more standard
idiom. Of course, it's still not anywhere near decent, since I don't know
what it does and so I couldn't change the data structures to fit the
problem (e.g., you're using about five or six parallel arrays where a C
programmer would use a single array of structs), nor fix the comments (the
one existing comment doesn't explain anything AFAIC), nor fix the variable
names (since I don't know what they're meant to indicate).
'w' and 'p' seem particularly bizarre, since AFAIK even in French those
don't mean anything interesting. Also, you forgot to declare 'p'. I've
declared it for you in my rewrite.
-Arthur
for (i=0; i < YUV_WIDTH; ++i)
for (j=0; j < YUV_HEIGHT; ++j)
{
int o_p = i2->P[i][j];
int o_w = i2->W[i][j];
int p, w;
if (i2->S[i][j] == 0)
continue;
for (p=0; p < R_RES; ++p)
for (w=0; w < R_WIDTH; ++w)
{
int x = i - i2->R_x[p][w];
int y = j - i2->R_y[p][w];
if (i2->R_valid[p] == 0)
continue;
/*
test if phase in pic is close to any phase in R-Table
*/
if ( x >= 0 && x < YUV_WIDTH
&& y >= 0 && y < YUV_HEIGHT
&& abs(o_p - i2->R_P[p][w]) < 5
&& abs(o_w - i2->R_W[p][w]) < 5
)
++i2->result[x][y];
}
}
|
|
0
|
|
|
|
Reply
|
ajo (1601)
|
10/9/2004 3:41:57 AM
|
|
In article <e8H9d.6796$Tx1.2577@newssvr16.news.prodigy.com>,
phlip_cpp@yahoo.com says...
> DEMAINE Benoit-Pierre wrote:
>
> > | Whoever wrote that should be shot.
> >
> > I agree that my code is not as nice as Marilyn Monroe, but could any one
> > tell me what is deeply wrong in it ?
>
> It could be just as fast even with complete variable names, short methods,
> and fewer multiple [][][] indices.
The layout and the variable names are pretty bad, and comments seem
lacking. I dislike some features such as the unnecessary use of
increment operators which could be in separate statements.
But splitting into multiple methods would make it far *less* readable
and comprehensible, and there is nothing wrong with multiple array
indices [if speed is a priority I reckon one should use pointers as much
as possible, but maybe the compiler will give the same code anyway].
- Gerry Quinn
|
|
0
|
|
|
|
Reply
|
gerryq (1321)
|
10/9/2004 10:58:27 AM
|
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Arthur J. O'Dwyer wrote:
|
| On Sat, 9 Oct 2004, DEMAINE Benoit-Pierre wrote:
| [CBFalconer wrote, rightly:]
|
|> | Whoever wrote that should be shot.
|>
|> I agree that my code is not as nice as Marilyn Monroe, but could any one
|> tell me what is deeply wrong in it ?
|
|
| Well, duh.
|
| Because I'm a nice guy, I rewrote the loop you posted in a more
| standard idiom.
Thank you :)
| Of course, it's still not anywhere near decent, since I
| don't know what it does and so I couldn't change the data structures to
| fit the problem (e.g., you're using about five or six parallel arrays
| where a C
| programmer would use a single array of structs), nor fix the comments (the
| one existing comment doesn't explain anything AFAIC), nor fix the
| variable names (since I don't know what they're meant to indicate).
| 'w' and 'p' seem particularly bizarre, since AFAIK even in French
| those don't mean anything interesting. Also, you forgot to declare
| 'p'. I've
| declared it for you in my rewrite.
You could have read all that in http://doublehp.free.fr/tmp/xvcam.c
P stands for phase, and W for width. That is explained for i2->P in the
header of the file where I declare i2.
are you sure an array of stucts would be faster that a sturct of arrays ?
I will study your patch ASAP = within the week. Thank.
- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFBZ+SmGWSTLbOSw8IRAjATAKCx+kt0tCyR0ocXKaede/Co1o2JTgCeK1Zi
lbcZcwrPMOzw/QfwFfxDPM8=
=DXwu
-----END PGP SIGNATURE-----
|
|
0
|
|
|
|
Reply
|
nntp_pipex (70)
|
10/9/2004 1:16:23 PM
|
|
DEMAINE wrote:
) are you sure an array of stucts would be faster that a sturct of arrays ?
Probably.
I haven't read the actual code or anything, but I can list two reasons
why an array of structs is considerablyfaster than a struct of arrays:
- You get locality of reference. That is, the data you access is next to
each other in memory, meaning that it gets cached in one go.
- You only need one pointer, with (small) fixed offsets, instead of a whole
set of pointers with the same offset.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
|
|
0
|
|
|
|
Reply
|
willem (1478)
|
10/9/2004 5:17:25 PM
|
|
Gerry Quinn coughed up:
....[rip]...
> [...] and there is nothing wrong with multiple array
> indices [if speed is a priority I reckon one should use pointers as
> much as possible, but maybe the compiler will give the same code
> anyway].
Sure. You know this already, but reagarding ptr vs. index, the compiler
/usually does/ give the same code anyway, except for cases where a pointer
can simply be incremented over added.
AFAICT, in all cases of (in C)
*(longptr + x)
and
longptr[x]
the asm code is the same. At least for all the compilers I've ever tested
this on.
Now in this case, unless the optimizer is spectacular, the following are not
identical:
longptr = &longarray[0];
for (x = 0; x < 1000; x++) // using x is silly, this is only
an example
*longptr++ = 5;
and
for (x = 0; x < 1000; x++)
longarray[x] = 5;
produces slower code in the latter case, because it is the same as
for (x = 0; x < 1000; x++)
*(longarray + x) = 5;
and most machines have ptr+something usually slower than, say, ptr+4, which
can often be part of the opcode at the machine level for all small values of
4 :)
>
> - Gerry Quinn
--
"Gentlemen, you can't fight in here! This is the War Room!"
|
|
0
|
|
|
|
Reply
|
tgm2tothe10thpower (2065)
|
10/9/2004 7:10:20 PM
|
|
Thomas wrote:
) AFAICT, in all cases of (in C)
)
) *(longptr + x)
)
) and
)
) longptr[x]
)
) the asm code is the same. At least for all the compilers I've ever tested
) this on.
If I'm not mistaken, longptr[x] is just syntactic sugar for *(longptr + x)
That's also why the oldie-obfie-hack x[longptr] works.
) Now in this case, unless the optimizer is spectacular, the following are not
) identical:
)
) longptr = &longarray[0];
) for (x = 0; x < 1000; x++) // using x is silly, this is only
) an example
) *longptr++ = 5;
)
) and
)
) for (x = 0; x < 1000; x++)
) longarray[x] = 5;
)
) produces slower code in the latter case, because it is the same as
)
) for (x = 0; x < 1000; x++)
) *(longarray + x) = 5;
)
) and most machines have ptr+something usually slower than, say, ptr+4, which
) can often be part of the opcode at the machine level for all small values of
) 4 :)
I disagree, assuming 'most machines' are pentia.
On (modern) x86, the latter code is faster, because there is an adressing
mode that's specificly designed for indexing, and the first piece of code
has an extra (separate) increment instruction.
Old machines used to have postincrement opcodes, but things change. ^_^
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
|
|
0
|
|
|
|
Reply
|
willem (1478)
|
10/9/2004 7:58:32 PM
|
|
Willem coughed up:
> Thomas wrote:
> ) AFAICT, in all cases of (in C)
> )
> ) *(longptr + x)
> )
> ) and
> )
> ) longptr[x]
> )
> ) the asm code is the same. At least for all the compilers I've ever
> tested ) this on.
>
> If I'm not mistaken, longptr[x] is just syntactic sugar for *(longptr
> + x) That's also why the oldie-obfie-hack x[longptr] works.
>
> ) Now in this case, unless the optimizer is spectacular, the
> following are not ) identical:
> )
> ) longptr = &longarray[0];
> ) for (x = 0; x < 1000; x++) // using x is silly, this
> is only ) an example
> ) *longptr++ = 5;
> )
> ) and
> )
> ) for (x = 0; x < 1000; x++)
> ) longarray[x] = 5;
> )
> ) produces slower code in the latter case, because it is the same as
> )
> ) for (x = 0; x < 1000; x++)
> ) *(longarray + x) = 5;
> )
> ) and most machines have ptr+something usually slower than, say,
> ptr+4, which ) can often be part of the opcode at the machine level
> for all small values of ) 4 :)
>
> I disagree, assuming 'most machines' are pentia.
Ok, but no, I was actually thinking of the unix department I used to run and
port a postscript app to:
sparc
sillicon graphic's (early) MIPS based architecture
DG Aviion (88000?)
etc.
These were often non-ansii C compilers, if that matters any.
....[rip]...
--
"His name was Robert Paulson. His name was Robert Paulson. His name was
Robert Paulson..."
|
|
0
|
|
|
|
Reply
|
tgm2tothe10thpower (2065)
|
10/9/2004 8:09:22 PM
|
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Willem wrote:
| DEMAINE wrote:
| ) are you sure an array of stucts would be faster that a sturct of
arrays ?
|
| Probably.
| I haven't read the actual code or anything, but I can list two reasons
| why an array of structs is considerablyfaster than a struct of arrays:
|
| - You get locality of reference. That is, the data you access is next to
| each other in memory, meaning that it gets cached in one go.
| - You only need one pointer, with (small) fixed offsets, instead of a
whole
| set of pointers with the same offset.
Your consideration shall not work in my case because the index used in
various are not the same in all arrays: I first acces tab[i], and then
c=tab2[f(i)] , then tab3[f(c,i)] ...
- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFBaI5XGWSTLbOSw8IRAmUYAJ0UiJjCZK214mk3wr3hpsykpkb45wCeJPqA
yAbI6adZVh4slJj2HX9azQM=
=TJoi
-----END PGP SIGNATURE-----
|
|
0
|
|
|
|
Reply
|
nntp_pipex (70)
|
10/10/2004 1:20:24 AM
|
|
|
20 Replies
24 Views
(page loaded in 5.468 seconds)
Similiar Articles: How to estimate exectution time - comp.soft-sys.matlab... to know if there is away to estimate the time needed to execute a matlab code in a ... to effectively rewrite your program, looking for ways that it thinks will optimize ... Optimizing ntpd memory usage? - comp.protocols.time.ntpHow to reduce code? - comp.unix.programmer Optimizing ntpd memory usage? - comp.protocols.time.ntp Reduce the size of libc by eliminating things that are not actually ... Fitting Voigt Profile with NLINFIT - comp.soft-sys.matlab ...The initial guesses for the Doppler and Lorentzian coefficients are good and I want NLINFIT to improve them. However, the code I have written increases these coefficients ... Matlab to C code transfer function of system ? - comp.soft-sys ...FFT of Impulse response - comp.soft-sys.matlab How can I improve upon my code to correctly ... sys.matlab Phase versus Time ... Determining The Transfer Function Of A ... MATLAB Vs C - comp.soft-sys.matlabHave you tried running your MATLAB code in the Profiler to identify the bottlenecks? Have you then attempted to improve those bottlenecks? Do you have any Code ... FFT of Impulse response - comp.soft-sys.matlabIt is a jagged, circular shape, when I'm expecting a nice clean frequency response. How can I improve upon my code to correctly FFT the impulse response? optimization using lagrange multiplier method - comp.soft-sys ...Need Matlab code for Constrained Particle Swarm Optimization ... optimization using ... soft ... optimization algorithm on lagrange multiplier method - comp.soft ... optimizing ... mtimes Inner matrix dimensions must agree. - comp.soft-sys.matlab ...> > Error in ==> multiobjective_three at 4 > d = - expDiv.*Wts'; The code in the ... should be > taken to gamultiobj in order to find the values of Wts that optimize in ... Axis appearance: avoid tick mark labels to overlap - comp.soft-sys ...Hi guys, do you pls know a way to improve the axis appearance in the event tick ... I am looking for a way to adjust the code so as I don't need to "correct" each ... How to strip comments out of code - comp.lang.java.programmer ...Compilers are free to optimize, or -- just like the obfuscators -- to "mangle" the code in the way preventing from reverse engineering (even not fully generated debug ... improve strlen - comp.lang.asm.x86I was carefully hand-optimizing code for the 486 and then the Pentium came out and changed all the rules. Then the PII, then the PIII, then the PIV. Can anyone help me with assembly language? - comp.lang.asm.x86 ...Please post the code that you have written so far -- you will then likely receive constructive comments on how to fix/improve that code. What you will _not_ receive ... How to write testbench file? - comp.lang.vhdlI have the following .vhd code and is having a hard time trying to write the ... 4 to 16 decoder, then optimize away the upper 8 outputs. > That would NOT optimize away ... Convert Simulink mdl to CUDA for GPU computing? - comp.soft-sys ...I am currently trying to optimize parameters of a pretty complex Simulink model (fuel ... That's why I wrote my own GA code in C. I have access to GPU computing (NVidia ... 7 Ways to Optimize C# Code - Visual C# KicksDiscover 7 simple ways to optimize your C# applications. Improve the readability and performance of your C# source code by making simple tweaks. Program optimization - Wikipedia, the free encyclopediaAn automatic optimizer (or optimizing compiler, a program that performs code optimization) may itself have to be optimized, either to further improve the efficiency of ... 7/25/2012 3:12:13 AM
|