How to optimise code ?

  • Follow


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I wrote an application:
http://doublehp.free.fr/tmp/xvcam.c
it is derived from xvcam ( part of camE project, part of Enlightenment
project )

It originally was just a viewer for webcam.
My version is able to compute image recognition using a modified version
of the Randomized Generalized Hough Transform.

- -1- I would like some people to have look at how I do the HT, and if it
possible to improove it with no more computation.
You can read about my own critics about that in
http://www.demaine.info/projet_self_tracking_webcam/formal_report.pdf
in section 12 starting at page 14 ( the pics in that section are not the
good ones yet)
Just look at the function PGHT(). Other *HT() were testing stuff. In
order to know how I initialise everything, you might want to read the
beginning of run2().

- -2- I would like to know how to optimise the work of conv1(), conv2() (
matrix convolution ), PGHT() ( my modified HT), and the first part of
run2(), in order to make things faster.

As said in tho logbook (
http://www.demaine.info/projet_self_tracking_webcam/log_book.pdf )
I tried to precompute cos/sin tables, but calling the result took more
time than computing them in real time.

What other kind of optimisation can I do ?

I dont think anybody can compile that file. It depends on many stuff.
And none of you will be able to run it since it access a USB device I
home made. Later versions should be more friendly. But for now I d like
to improve computing/calculous.

Thank for any comment.

- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBZLdbGWSTLbOSw8IRAnxdAJ4vXJDRNxu4hT3jxu39T3Eh/SrEkgCfY6m9
1Zz1lsq0BOaRDZBBiHv1JhM=
=Iq7E
-----END PGP SIGNATURE-----
0
Reply nntp_pipex (70) 10/7/2004 3:26:20 AM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Note that I think what take most of the time is memory acces which looks
random, and IMHO can not be predict by cashes; the address of the next
cell is a function applied to the content of the current cell. How to
make that more efficient ?

- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBZT2FGWSTLbOSw8IRAhnlAJwIwhQe6tIPkeWlchnJkxBONsh5cQCcDYlS
jH4a4kbHd+fgvGTPqXUe+ss=
=TwOG
-----END PGP SIGNATURE-----
0
Reply nntp_pipex (70) 10/7/2004 12:58:45 PM


DEMAINE Benoit-Pierre wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> I wrote an application:
> http://doublehp.free.fr/tmp/xvcam.c
> it is derived from xvcam ( part of camE project, part of Enlightenment
> project )
> 
> It originally was just a viewer for webcam.
> My version is able to compute image recognition using a modified version
> of the Randomized Generalized Hough Transform.
> 
> - -1- I would like some people to have look at how I do the HT, and if it
> possible to improove it with no more computation.
> You can read about my own critics about that in
> http://www.demaine.info/projet_self_tracking_webcam/formal_report.pdf
> in section 12 starting at page 14 ( the pics in that section are not the
> good ones yet)
> Just look at the function PGHT(). Other *HT() were testing stuff. In
> order to know how I initialise everything, you might want to read the
> beginning of run2().
> 
> - -2- I would like to know how to optimise the work of conv1(), conv2() (
> matrix convolution ), PGHT() ( my modified HT), and the first part of
> run2(), in order to make things faster.
> 
> As said in tho logbook (
> http://www.demaine.info/projet_self_tracking_webcam/log_book.pdf )
> I tried to precompute cos/sin tables, but calling the result took more
> time than computing them in real time.
> 
> What other kind of optimisation can I do ?
....

In general, I suggest:

1) Measure where the time is spent in your code when it runs.  Remember, there's
no point in optimizing code unless it contributes substantially to your runtime.
 You'll also need a precise measure of runtime to know whether each optimization
that you try is making the code faster or slower.  If you don't time your
program precisely, you can't know if your optimizations are helping or hindering.

2) There are three ways to speed up a program: I) reduce the number of
instructions, II) make better use of the CPU, and III) minimize the time the
program spends waiting for system resources (like memory).  If you're using a
capable compiler and you're applying its complete set of compiler optimization
flags (like -O3 and -funroll-loops), then you've reduced the number of
instructions and improved instruction throughput.  Probably the only thing left
to do is look at the way that you access memory.  Of course, if you can find an
example of the function that you want to optimize that has already been
optimized (in a library or a book on algorithms, perhaps), that's going to be
the fastest way to improve your code.

(If your compiler can produce optimization reports, explaining what
optimizations it could and could not perform on your code, you should try to
generate these.  You can learn a lot about the strengths and weaknesses of your
code this way.)

Unless memory accesses appear predictable to your eye, the compiler will be
unable to predict the access pattern either.  If you are chasing pointers (which
is what you seem to be describing), the only way to improve memory reuse
(caching) may be to devise a different data structure that's more regular.

Remember too, premature optimization is the root of all evil...

    Randy

-- 
Randy Crawford   http://www.ruf.rice.edu/~rand   rand AT rice DOT edu
0
Reply joe304 (205) 10/7/2004 4:39:32 PM

Randy coughed up:
> DEMAINE Benoit-Pierre wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> I wrote an application:
>> http://doublehp.free.fr/tmp/xvcam.c
>> it is derived from xvcam ( part of camE project, part of
>> Enlightenment project )
>>
>> It originally was just a viewer for webcam.
>> My version is able to compute image recognition using a modified
>> version of the Randomized Generalized Hough Transform.
>>
>> - -1- I would like some people to have look at how I do the HT, and
>> if it possible to improove it with no more computation.
>> You can read about my own critics about that in
>> http://www.demaine.info/projet_self_tracking_webcam/formal_report.pdf
>> in section 12 starting at page 14 ( the pics in that section are not
>> the good ones yet)
>> Just look at the function PGHT(). Other *HT() were testing stuff. In
>> order to know how I initialise everything, you might want to read the
>> beginning of run2().
>>
>> - -2- I would like to know how to optimise the work of conv1(),
>> conv2() ( matrix convolution ), PGHT() ( my modified HT), and the
>> first part of run2(), in order to make things faster.
>>
>> As said in tho logbook (
>> http://www.demaine.info/projet_self_tracking_webcam/log_book.pdf )
>> I tried to precompute cos/sin tables, but calling the result took
>> more time than computing them in real time.
>>
>> What other kind of optimisation can I do ?
> ...
>
> In general, I suggest:
>
> 1) Measure where the time is spent in your code when it runs.

Remember too, that in most profiler, there are usually two distinct metrics, 
both of which can be critical:

    1. the amount of time spent in a method and all sub methods called
    2. the amount of time spent in a method devoid of all the sub methods 
called

....[rip]...


> Remember too, premature optimization is the root of all evil...

What he said...

....[rip]...


-- 
Everythinginlifeisrealative.Apingpongballseemssmalluntilsomeoneramsitupyournose.


0
Reply tgm2tothe10thpower (2065) 10/7/2004 5:57:16 PM

DEMAINE Benoit-Pierre wrote:

> - -2- I would like to know how to optimise the work of conv1(),
> conv2() ( matrix convolution ), PGHT() ( my modified HT), and
> the first part of run2(), in order to make things faster.

You can perform convolution by Fourier transforming the image and 
filter kernel, multiplying pointwise, and transforming back. Whether 
that's faster than direct convolution depends on the involved 
matrices' sizes; you'd have to experiment here.

If you don't want so much floating point computation, there are also 
Fourier transforms over finite rings, the so-called number-theoretic 
transforms. NTT's share the convolution/multiplication duality and 
admit FFT-like fast algorithms, but the underlying algebraic system's 
nature constrains possible transform lengths.

Google for more information, there should be heaps of code and 
explanation for the regular Fourier method in particular.

> As said in tho logbook (
> http://www.demaine.info/projet_self_tracking_webcam/log_book.pdf
> ) I tried to precompute cos/sin tables, but calling the result
> took more time than computing them in real time.

That can happen since large tables are cache-unfriendly. In 
extract_R() the angle changes by a constant amount between trig 
calls, so you could use a recursive generator there, giving both
sine and cosine quite cheaply. Unfortunately, that won't work for 
arbitrary angles like in length(). A fine article on those animals
is available from its author's site:
http://personal.atl.bellsouth.net/p/h/physics/2nd_OSC_paper.pdf

Incidentally, this line from extract_R() seems weird to me:
    x=mouse_x+(float)r*cosf((float)o);
Wouldn't that be more like cosf((2*PI*o)/_R_RES_)?
I may be mistaken, though, because I don't understand your
code at all. I hope my comments are helpful despite that ;)
Responses regarding the Hough transform may be more probable
in sci.image.processing or comp.graphics.algorithms.


Martin

-- 
Sartre promotes the use of submodernist semiotic
theory to attack sexist perceptions of art.

http://www.elsewhere.org/cgi-bin/postmodern/
0
Reply martin.eisenbergNOS (92) 10/7/2004 6:41:11 PM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

| 1) Measure where the time is spent in your code when it runs.
Remember, there's
| no point in optimizing code unless it contributes substantially to
your runtime.
|  You'll also need a precise measure of runtime to know whether each
optimization
| that you try is making the code faster or slower.  If you don't time your
| program precisely, you can't know if your optimizations are helping or
hindering.

My logbook tell I done that about sin/cos precomputed tables.

| 2) There are three ways to speed up a program: I) reduce the number of
| instructions, II) make better use of the CPU, and III) minimize the
time the
| program spends waiting for system resources (like memory).

I think that the problem in my program is that it make too many memory
access to unpredictable locations, so that the caches can not preload
RAM. If you read PGHT(), you will see that it is not that big, but it
involves many for loops whith many random arry acces in them.

Since I dont know how many registers there is in a AMD thunderbird, I
can not know whether I can keep some stuff in variables, hoping that the
values stay in registers ( thus saving time ).


| If you're using a
| capable compiler

gcc version 3.3.4 (Debian 1:3.3.4-13)

| and you're applying its complete set of compiler optimization
| flags (like -O3 and -funroll-loops), then you've reduced the number of
| instructions and improved instruction throughput.

I did not think about that. What options are good for memory access ?
does stripping improove speed ?

| Probably the only thing left
| to do is look at the way that you access memory.  Of course, if you
can find an
| example of the function that you want to optimize that has already been
| optimized (in a library or a book on algorithms, perhaps), that's
going to be
| the fastest way to improve your code.

I get you, but I dont know how to aply that to my code any further.

| (If your compiler can produce optimization reports, explaining what
| optimizations it could and could not perform on your code, you should
try to
| generate these.  You can learn a lot about the strengths and
weaknesses of your
| code this way.)

How do I get such reports with gcc ?

| Unless memory accesses appear predictable to your eye, the compiler
will be
| unable to predict the access pattern either.  If you are chasing
pointers (which
| is what you seem to be describing), the only way to improve memory reuse
| (caching) may be to devise a different data structure that's more regular.

The worse point is that I run along every pixel of a picture, and for
each of them, I run along a 50*3 matrix, and if the content of the
matrix is not null, then I read relative coordinates in the array, and
find out an other point of the picture, and modify it. I can not guess
the coordinates of the point I will modify before I have computed them
from coordinates of current point and variables in the array ...

I dont think I can divide anything: if I spread the 50*3 array, I have
to run several time the loop which runs along the picture.

I am not even sure I could be able to divide the work to make it more
efficient with SMP: all the work is about multiplying matrixes: you need
an eye on the whole matrix at any time if you want to do any thing with it.

- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBZc1tGWSTLbOSw8IRApwxAKCWh+PgQUHBsi7Y6MX28soXyUADAgCeIXft
P0Vr/KgZq2gX2zPFrx6q+HI=
=qqfW
-----END PGP SIGNATURE-----
0
Reply nntp_pipex (70) 10/7/2004 11:12:46 PM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Eisenberg wrote:
| DEMAINE Benoit-Pierre wrote:
|
|
|>- -2- I would like to know how to optimise the work of conv1(),
|>conv2() ( matrix convolution ), PGHT() ( my modified HT), and
|>the first part of run2(), in order to make things faster.
|
| You can perform convolution by Fourier transforming the image and
| filter kernel, multiplying pointwise, and transforming back. Whether
| that's faster than direct convolution depends on the involved
| matrices' sizes; you'd have to experiment here.

Where can I read about your FT ? I only use 0-255 integer range.

My matrixes have variable sizes, but all parametrable with #define

The main optimisation should be about PGHT()

The worse part is :

~        for(i=0;i<_YUV_WIDTH;i+=1)
~                for(j=0;j<_YUV_HEIGHT;j+=1)
~                {
~                        int w;
~                        if(!i2->S[i][j])
~                                continue;
~                        o_p=i2->P[i][j];
~                        o_w=i2->W[i][j];
~                        for(p=0;p<_R_RES_;p++)
~                        for(w=0;w<_R_WIDTH_;w++)
~                        {
~                                if(!i2->R_valid[p])
~                                        continue;
~                // test if phase in pic is close to any phase in R-Table
~                                if(abs(o_p-i2->R_P[p][w])<5)
~                                if(abs(o_w-i2->R_W[p][w])<5)
if((i-i2->R_x[p][w]>=0)&&(i-i2->R_x[p][w]<_YUV_WIDTH)&&
(j-i2->R_y[p][w]>=0)&&(j-i2->R_y[p][w]<_YUV_HEIGHT))
~        i2->result[i-i2->R_x[p][w]][j-i2->R_y[p][w]]+=0x1;
~                        }
~                }

4 for loops with many arry access in random way.

| That can happen since large tables are cache-unfriendly.

:(

| Incidentally, this line from extract_R() seems weird to me:
|     x=mouse_x+(float)r*cosf((float)o);
| Wouldn't that be more like cosf((2*PI*o)/_R_RES_)?

hmmm by the time I wrote that line, _R_RES used to be 360. Further more,
cos(3) uses angle in radians. So, you are right, my code is badly
written. but practically, it still works :) guess why ...

if you think you use radians, and use an angle from 0 to 30 with a
stepper of _1_, since a full angle is 6.2831, the value 7 will make
computation as for 0.7189, 8 as for 1.7189 ... and so on, so that I do a
compleet circle every 6 steps, but do not make computations for
redundant values. Every next circle improove the resolution in angle ...
so that I obtain the same result as cosf((2*PI*o)/_R_RES_) but saving
two mult and a division (at the expense of a modulo done inside the cos)
\o/ This a perfect example of non voluntary optimisation \o/ will you
believe this ?

| I may be mistaken,

You are right, but aplying your modification would make things slower
without improoving the work :)

| though, because I don't understand your
| code at all.

Some time I dont either :) ( see RGHT() I am not even able to understand
what I wrote ^^ )

- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBZdLdGWSTLbOSw8IRAh8sAJ91F/3FloFFYOfjfUlKNKjcPNpt0wCeLdIe
An3Qs64VwGLUyvCNg3KQz6I=
=quFm
-----END PGP SIGNATURE-----
0
Reply nntp_pipex (70) 10/7/2004 11:35:58 PM

DEMAINE Benoit-Pierre wrote:
> 
.... snip ...
> 
> The worse part is :
> 
> ~        for(i=0;i<_YUV_WIDTH;i+=1)
> ~                for(j=0;j<_YUV_HEIGHT;j+=1)
> ~                {
> ~                        int w;
> ~                        if(!i2->S[i][j])
> ~                                continue;
> ~                        o_p=i2->P[i][j];
> ~                        o_w=i2->W[i][j];
> ~                        for(p=0;p<_R_RES_;p++)
> ~                        for(w=0;w<_R_WIDTH_;w++)
> ~                        {
> ~                                if(!i2->R_valid[p])
> ~                                        continue;
> ~                // test if phase in pic is close to any phase in R-Table
> ~                                if(abs(o_p-i2->R_P[p][w])<5)
> ~                                if(abs(o_w-i2->R_W[p][w])<5)
> if((i-i2->R_x[p][w]>=0)&&(i-i2->R_x[p][w]<_YUV_WIDTH)&&
> (j-i2->R_y[p][w]>=0)&&(j-i2->R_y[p][w]<_YUV_HEIGHT))
> ~        i2->result[i-i2->R_x[p][w]][j-i2->R_y[p][w]]+=0x1;
> ~                        }
> ~                }
> 
> 4 for loops with many arry access in random way.

Whoever wrote that should be shot.


-- 
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
   <http://cbfalconer.home.att.net>  USE worldnet address!


0
Reply cbfalconer (19183) 10/8/2004 3:12:51 AM

DEMAINE Benoit-Pierre wrote:

> Where can I read about your FT ? I only use 0-255 integer range.

http://www.google.com/search?q=fft+convolution&num=15
http://www.google.com/search?q=fft+convolution+%22two+dimensions%22
http://www.numerical-recipes.com/nronline_switcher.html
    (chapters 12.4, 12.5, 13.1)
http://www.jjj.de/fxt/fxtpage.html (has NTT info as well)
http://www.fftw.org/

 I include the first search because many texts on FFT convolution 
only treat the one-dimensional case, but it's easy to generalize 
conceptually since multidimensional DFT operators are separable into 
one 1D transform in each direction. However, I think you want to use 
a canned 2D transform to actually carry it out.

The main things to understand are (1) the difference between circular 
and linear convolution and the use of zero-padding to reconcile the 
two; and flowing from that, (2) the overlap-add or overlap-save 
methods for blockwise computation. You will probably encounter talk 
about data tapering windows and analysis frame overlap (a different 
"overlap" than above). These are only used in spectral analysis, not 
in pure convolution, and are thus irrelevant to your application. 
Pose specific questions on the image processing groups or comp.dsp.

> if you think you use radians, and use an angle from 0 to 30 with
> a stepper of _1_, since a full angle is 6.2831, the value 7 will
> make computation as for 0.7189, 8 as for 1.7189 ... and so on,
> so that I do a compleet circle every 6 steps, but do not make
> computations for redundant values. Every next circle improove
> the resolution in angle ... so that I obtain the same result as
> cosf((2*PI*o)/_R_RES_) but saving two mult and a division (at
> the expense of a modulo done inside the cos) \o/ This a perfect
> example of non voluntary optimisation \o/ will you believe this ?

I get it. But I still wonder about consistency of this "relabeling" 
of angles with the rest of the program. Digressing, I guess "\o/" has 
smiley semantics, but is it meant to suggest any particular object?

> Some time I dont either :) ( see RGHT() I am not even able to
> understand what I wrote ^^ )

While I don't even know the algorithm in the abstract,
that honestly does not surprise me in the least ;)


Martin

-- 
Quidquid latine dictum sit, altum viditur.
0
Reply martin.eisenbergNOS (92) 10/8/2004 8:39:48 AM

CBFalconer coughed up:
> DEMAINE Benoit-Pierre wrote:
>>
> ... snip ...
>>
>> The worse part is :
>>
>> ~        for(i=0;i<_YUV_WIDTH;i+=1)
>> ~                for(j=0;j<_YUV_HEIGHT;j+=1)
>> ~                {
>> ~                        int w;
>> ~                        if(!i2->S[i][j])
>> ~                                continue;
>> ~                        o_p=i2->P[i][j];
>> ~                        o_w=i2->W[i][j];
>> ~                        for(p=0;p<_R_RES_;p++)
>> ~                        for(w=0;w<_R_WIDTH_;w++)
>> ~                        {
>> ~                                if(!i2->R_valid[p])
>> ~                                        continue;
>> ~                // test if phase in pic is close to any phase in
>> R-Table ~                                if(abs(o_p-i2->R_P[p][w])<5)
>> ~                                if(abs(o_w-i2->R_W[p][w])<5)
>> if((i-i2->R_x[p][w]>=0)&&(i-i2->R_x[p][w]<_YUV_WIDTH)&&
>> (j-i2->R_y[p][w]>=0)&&(j-i2->R_y[p][w]<_YUV_HEIGHT))
>> ~        i2->result[i-i2->R_x[p][w]][j-i2->R_y[p][w]]+=0x1;
>> ~                        }
>> ~                }
>>
>> 4 for loops with many arry access in random way.
>
> Whoever wrote that should be shot.


It's a pretty good example of why we need the same caning justice that 
Singapore has.


-- 
It'salwaysbeenmygoalinlifetocreateasignaturethatendedwiththeword"blarphoogy".


0
Reply tgm2tothe10thpower (2065) 10/8/2004 4:15:09 PM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

| Whoever wrote that should be shot.

I agree that my code is not as nice as Marilyn Monroe, but could any one
tell me what is deeply wrong in it ?

I think if I ever try to port it in ASM, it could get faster, but
definitely even wores/unreadable

- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBZzWRGWSTLbOSw8IRAjIGAKCcGNY2rHyz2e7tQsECWhfThgSVeACgi2K4
yLnCdpXJn3LZRW+7LIm44lg=
=DRzZ
-----END PGP SIGNATURE-----
0
Reply nntp_pipex (70) 10/9/2004 12:49:21 AM

DEMAINE Benoit-Pierre wrote:

> | Whoever wrote that should be shot.
>
> I agree that my code is not as nice as Marilyn Monroe, but could any one
> tell me what is deeply wrong in it ?

It could be just as fast even with complete variable names, short methods,
and fewer multiple [][][] indices.

> I think if I ever try to port it in ASM, it could get faster, but
> definitely even wores/unreadable

Premature optimization is the root of not being Marilyn Monroe.

-- 
  Phlip
  http://industrialxp.org/community/bin/view/Main/TestFirstUserInterfaces


0
Reply phlip_cpp (3649) 10/9/2004 1:27:38 AM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

| It could be just as fast even with complete variable names, short methods,
| and fewer multiple [][][] indices.

I always use very short names. It is clearer in _my_ mind: short names
=> faster to write => faster to read => faster to understand.

| short methods,

Is PGHT() that big ? run2() is long :)

I gotta a good question: considering speed of execution, is it faster to use
tab[x][y] or tab[x+y*X_WIDTH] ?
especially when using 2 or 3D arrays of about 250 unit width each...
is a multiplication faster than memory acces ?

- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBZ1DuGWSTLbOSw8IRAvUVAJ4g8aKWFfrlM6CN739XS6ivEf+xwgCgwVsP
PReXSWNkLxgQLT81KpwkNBM=
=DO0H
-----END PGP SIGNATURE-----
0
Reply nntp_pipex (70) 10/9/2004 2:46:06 AM

On Sat, 9 Oct 2004, DEMAINE Benoit-Pierre wrote:
[CBFalconer wrote, rightly:]
> | Whoever wrote that should be shot.
>
> I agree that my code is not as nice as Marilyn Monroe, but could any one
> tell me what is deeply wrong in it ?

   Well, duh.

   Because I'm a nice guy, I rewrote the loop you posted in a more standard 
idiom.  Of course, it's still not anywhere near decent, since I don't know 
what it does and so I couldn't change the data structures to fit the 
problem (e.g., you're using about five or six parallel arrays where a C
programmer would use a single array of structs), nor fix the comments (the
one existing comment doesn't explain anything AFAIC), nor fix the variable 
names (since I don't know what they're meant to indicate).
   'w' and 'p' seem particularly bizarre, since AFAIK even in French those 
don't mean anything interesting.  Also, you forgot to declare 'p'.  I've
declared it for you in my rewrite.

-Arthur


     for (i=0; i < YUV_WIDTH; ++i)
     for (j=0; j < YUV_HEIGHT; ++j)
     {
         int o_p = i2->P[i][j];
         int o_w = i2->W[i][j];
         int p, w;

         if (i2->S[i][j] == 0)
           continue;

         for (p=0; p < R_RES; ++p)
         for (w=0; w < R_WIDTH; ++w)
         {
             int x = i - i2->R_x[p][w];
             int y = j - i2->R_y[p][w];

             if (i2->R_valid[p] == 0)
               continue;

             /*
                test if phase in pic is close to any phase in R-Table
             */
             if (   x >= 0 && x < YUV_WIDTH
                 && y >= 0 && y < YUV_HEIGHT
                 && abs(o_p - i2->R_P[p][w]) < 5
                 && abs(o_w - i2->R_W[p][w]) < 5
                )
               ++i2->result[x][y];
         }
     }
0
Reply ajo (1601) 10/9/2004 3:41:57 AM

In article <e8H9d.6796$Tx1.2577@newssvr16.news.prodigy.com>, 
phlip_cpp@yahoo.com says...
> DEMAINE Benoit-Pierre wrote:
> 
> > | Whoever wrote that should be shot.
> >
> > I agree that my code is not as nice as Marilyn Monroe, but could any one
> > tell me what is deeply wrong in it ?
> 
> It could be just as fast even with complete variable names, short methods,
> and fewer multiple [][][] indices.

The layout and the variable names are pretty bad, and comments seem 
lacking.  I dislike some features such as the unnecessary use of 
increment operators which could be in separate statements.

But splitting into multiple methods would make it far *less* readable 
and comprehensible, and there is nothing wrong with multiple array 
indices [if speed is a priority I reckon one should use pointers as much 
as possible, but maybe the compiler will give the same code anyway].

- Gerry Quinn


0
Reply gerryq (1321) 10/9/2004 10:58:27 AM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Arthur J. O'Dwyer wrote:
|
| On Sat, 9 Oct 2004, DEMAINE Benoit-Pierre wrote:
| [CBFalconer wrote, rightly:]
|
|> | Whoever wrote that should be shot.
|>
|> I agree that my code is not as nice as Marilyn Monroe, but could any one
|> tell me what is deeply wrong in it ?
|
|
|   Well, duh.
|
|   Because I'm a nice guy, I rewrote the loop you posted in a more
| standard idiom.

Thank you :)

|  Of course, it's still not anywhere near decent, since I
| don't know what it does and so I couldn't change the data structures to
| fit the problem (e.g., you're using about five or six parallel arrays
| where a C
| programmer would use a single array of structs), nor fix the comments (the
| one existing comment doesn't explain anything AFAIC), nor fix the
| variable names (since I don't know what they're meant to indicate).
|   'w' and 'p' seem particularly bizarre, since AFAIK even in French
| those don't mean anything interesting.  Also, you forgot to declare
| 'p'.  I've
| declared it for you in my rewrite.

You could have read all that in http://doublehp.free.fr/tmp/xvcam.c
P stands for phase, and W for width. That is explained for i2->P in the
header of the file where I declare i2.

are you sure an array of stucts would be faster that a sturct of arrays ?

I will study your patch ASAP = within the week. Thank.

- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBZ+SmGWSTLbOSw8IRAjATAKCx+kt0tCyR0ocXKaede/Co1o2JTgCeK1Zi
lbcZcwrPMOzw/QfwFfxDPM8=
=DXwu
-----END PGP SIGNATURE-----
0
Reply nntp_pipex (70) 10/9/2004 1:16:23 PM

DEMAINE wrote:
) are you sure an array of stucts would be faster that a sturct of arrays ?

Probably.
I haven't read the actual code or anything, but I can list two reasons
why an array of structs is considerablyfaster than a struct of arrays:

- You get locality of reference.  That is, the data you access is next to
  each other in memory, meaning that it gets cached in one go.
- You only need one pointer, with (small) fixed offsets, instead of a whole
  set of pointers with the same offset.



SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
0
Reply willem (1478) 10/9/2004 5:17:25 PM

Gerry Quinn coughed up:

....[rip]...

> [...] and there is nothing wrong with multiple array
> indices [if speed is a priority I reckon one should use pointers as
> much as possible, but maybe the compiler will give the same code
> anyway].

Sure.  You know this already, but reagarding ptr vs. index, the compiler 
/usually does/ give the same code anyway, except for cases where a pointer 
can simply be incremented over added.

AFAICT, in all cases of (in C)

        *(longptr + x)

and

        longptr[x]

the asm code is the same.  At least for all the compilers I've ever tested 
this on.

Now in this case, unless the optimizer is spectacular, the following are not 
identical:

        longptr = &longarray[0];
        for (x = 0; x < 1000; x++)        // using x is silly, this is only 
an example
            *longptr++ = 5;

and

        for (x = 0; x < 1000; x++)
            longarray[x] = 5;

produces slower code in the latter case, because it is the same as

        for (x = 0; x < 1000; x++)
            *(longarray + x) = 5;

and most machines have ptr+something usually slower than, say, ptr+4, which 
can often be part of the opcode at the machine level for all small values of 
4 :)


>
> - Gerry Quinn



-- 
"Gentlemen, you can't fight in here! This is the War Room!"


0
Reply tgm2tothe10thpower (2065) 10/9/2004 7:10:20 PM

Thomas wrote:
) AFAICT, in all cases of (in C)
)
)         *(longptr + x)
)
) and
)
)         longptr[x]
)
) the asm code is the same.  At least for all the compilers I've ever tested 
) this on.

If I'm not mistaken, longptr[x] is just syntactic sugar for *(longptr + x)
That's also why the oldie-obfie-hack x[longptr] works.

) Now in this case, unless the optimizer is spectacular, the following are not 
) identical:
)
)         longptr = &longarray[0];
)         for (x = 0; x < 1000; x++)        // using x is silly, this is only 
) an example
)             *longptr++ = 5;
)
) and
)
)         for (x = 0; x < 1000; x++)
)             longarray[x] = 5;
)
) produces slower code in the latter case, because it is the same as
)
)         for (x = 0; x < 1000; x++)
)             *(longarray + x) = 5;
)
) and most machines have ptr+something usually slower than, say, ptr+4, which 
) can often be part of the opcode at the machine level for all small values of 
) 4 :)

I disagree, assuming 'most machines' are pentia.

On (modern) x86, the latter code is faster, because there is an adressing
mode that's specificly designed for indexing, and the first piece of code
has an extra (separate) increment instruction.

Old machines used to have postincrement opcodes, but things change.  ^_^


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
0
Reply willem (1478) 10/9/2004 7:58:32 PM

Willem coughed up:
> Thomas wrote:
> ) AFAICT, in all cases of (in C)
> )
> )         *(longptr + x)
> )
> ) and
> )
> )         longptr[x]
> )
> ) the asm code is the same.  At least for all the compilers I've ever
> tested ) this on.
>
> If I'm not mistaken, longptr[x] is just syntactic sugar for *(longptr
> + x) That's also why the oldie-obfie-hack x[longptr] works.
>
> ) Now in this case, unless the optimizer is spectacular, the
> following are not ) identical:
> )
> )         longptr = &longarray[0];
> )         for (x = 0; x < 1000; x++)        // using x is silly, this
> is only ) an example
> )             *longptr++ = 5;
> )
> ) and
> )
> )         for (x = 0; x < 1000; x++)
> )             longarray[x] = 5;
> )
> ) produces slower code in the latter case, because it is the same as
> )
> )         for (x = 0; x < 1000; x++)
> )             *(longarray + x) = 5;
> )
> ) and most machines have ptr+something usually slower than, say,
> ptr+4, which ) can often be part of the opcode at the machine level
> for all small values of ) 4 :)
>
> I disagree, assuming 'most machines' are pentia.

Ok, but no, I was actually thinking of the unix department I used to run and 
port a postscript app to:

    sparc
    sillicon graphic's (early) MIPS based architecture
    DG Aviion (88000?)
    etc.

These were often non-ansii C compilers, if that matters any.

....[rip]...

-- 
"His name was Robert Paulson. His name was Robert Paulson. His name was
Robert Paulson..."


0
Reply tgm2tothe10thpower (2065) 10/9/2004 8:09:22 PM

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Willem wrote:
| DEMAINE wrote:
| ) are you sure an array of stucts would be faster that a sturct of
arrays ?
|
| Probably.
| I haven't read the actual code or anything, but I can list two reasons
| why an array of structs is considerablyfaster than a struct of arrays:
|
| - You get locality of reference.  That is, the data you access is next to
|   each other in memory, meaning that it gets cached in one go.
| - You only need one pointer, with (small) fixed offsets, instead of a
whole
|   set of pointers with the same offset.

Your consideration shall not work in my case because the index used in
various are not the same in all arrays: I first acces tab[i], and then
c=tab2[f(i)] , then tab3[f(c,i)] ...

- --
DEMAINE Beno�t-Pierre http:/www.demaine.info/
\_o< apt-get remove ispell >o_/
There're 10 types of people: those who can count in binary and those who
can't
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBaI5XGWSTLbOSw8IRAmUYAJ0UiJjCZK214mk3wr3hpsykpkb45wCeJPqA
yAbI6adZVh4slJj2HX9azQM=
=TJoi
-----END PGP SIGNATURE-----
0
Reply nntp_pipex (70) 10/10/2004 1:20:24 AM

20 Replies
24 Views

(page loaded in 5.468 seconds)

Similiar Articles:


















7/25/2012 3:12:13 AM


Reply: