COMPGROUPS.NET | Search | Post Question | Groups | Stream | About | Register

### Storage of symmetric boolean matrix

• Email
• Follow

```Hello all,

First of all, this might be somewhat off-topic for some of you, so let
me

I am writing a program that requires a huge 2D symmetric matrix of
booleans
(i.e., either 0 or 1).  My first thought was to store it in a packed
way, so
the memory needed would be about half of the amount required to store
the
whole array in a traditional fashion. My packing is described as
follows.

Given this symmetric matrix:

0   1   2   3   4
+---+---+---+---+---+
0   | 1 | 0 | 0 | 1 | 1 |
+---+---+---+---+---+
1   | 0 | 0 | 1 | 1 | 0 |
+---+---+---+---+---+
2   | 0 | 1 | 1 | 1 | 1 |
+---+---+---+---+---+
3   | 1 | 1 | 1 | 1 | 1 |
+---+---+---+---+---+
4   | 1 | 0 | 1 | 1 | 0 |
+---+---+---+---+---+

I am storing only the upper triangular portion plus the diagonal
continuously
in memory, so I have:

00  01  02  03  04  11  12  13  14  22  23  24  33  34  44
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

The two digits numbers above the schematic correspond to the indexes
in the
original matrix. Assuming each position is one char, instead of N^2
bytes the
packed way is using only N*(N+1)/2 bytes. The trade-off is of course
having a
more costly way to access the matrix. Instead of simply doing M[i][J],
I need
a function like this:

#define N 10000 /* size of matrix */
...
1  unsigned char is_set(unsigned int i, unsigned int j)
2  {
3      unsigned int tmp;
4      unsigned int real_index;
5
6      if (i > j) {
7          tmp = j;
8          j = i;
9          i = tmp;
10      }
11
12      real_index = (i * N) - (i * (i - 1) / 2) + j - i;
13      return packed_array[real_index];
14  }

I came up with the formula to convert the indexes from the traditional
array
to the packed array and it seems to work.

Then I realized it is a waste to use a char to store a single
information, so
I decided to the bits of a char to do this, then instead of using N
(N-1)/2
bytes, I would need only 1/8 of this memory. Of course this leads to
bit
manipulation and all. The final result, with a few tweaks, is the
following
function:

1  unsigned char is_set(int i, int j)
2  {
3      register unsigned int idx;
4
5      if (i > j) {
6          register int temp = i;
7          i = j;
8          j = temp;
9      }
10
11      idx = ((i * ((N << 1) - i - 1)) >> 1) + j;
16      return ((packed_array[idx >> 3] << (idx & 7)) & (unsigned
char) 0x80);
17  }

(I reordered the expression to reduce the number of multiplications
and use
shift operations.)

This function is called a lot of times, and this is having some
noticeable
impact in the overall performance. The memory saving is really huge,
but
the performance is making me wonder if this is really the best choice.

Does anyone know how could I make it faster, or a clever way to solve
this
problem, or any suggestions on the code I posted above? Only
limitation
is that they only work properly if a char is 8 bits long, but I can
live with that.

Thanks,
Gustavo

PS. Sorry if the schematics look messed up, with a monospace font they
looked nice.
```
 0
Reply rondina (9) 6/27/2009 9:07:55 PM

See related articles to this posting

```On Sat, 27 Jun 2009 14:07:55 -0700, Gustavo Rondina wrote:
> Hello all,
>
> First of all, this might be somewhat off-topic for some of you, so let
> me
>
> I am writing a program that requires a huge 2D symmetric matrix of
> booleans
> (i.e., either 0 or 1).  My first thought was to store it in a packed
> way, so
> the memory needed would be about half of the amount required to store
> the
> whole array in a traditional fashion. My packing is described as
> follows.
>
> Given this symmetric matrix:
>
>             0   1   2   3   4
>           +---+---+---+---+---+
>       0   | 1 | 0 | 0 | 1 | 1 |
>           +---+---+---+---+---+
>       1   | 0 | 0 | 1 | 1 | 0 |
>           +---+---+---+---+---+
>       2   | 0 | 1 | 1 | 1 | 1 |
>           +---+---+---+---+---+
>       3   | 1 | 1 | 1 | 1 | 1 |
>           +---+---+---+---+---+
>       4   | 1 | 0 | 1 | 1 | 0 |
>           +---+---+---+---+---+
>
> I am storing only the upper triangular portion plus the diagonal
> continuously
> in memory, so I have:
>
>     00  01  02  03  04  11  12  13  14  22  23  24  33  34  44
>    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
>    | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
>    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
>
> The two digits numbers above the schematic correspond to the indexes
> in the
> original matrix. Assuming each position is one char, instead of N^2
> bytes the
> packed way is using only N*(N+1)/2 bytes. The trade-off is of course
> having a
> more costly way to access the matrix. Instead of simply doing M[i][J],
> I need
> a function like this:
>
>     #define N 10000 /* size of matrix */
>     ...
>  1  unsigned char is_set(unsigned int i, unsigned int j)
>  2  {
>  3      unsigned int tmp;
>  4      unsigned int real_index;
>  5
>  6      if (i > j) {
>  7          tmp = j;
>  8          j = i;
>  9          i = tmp;
> 10      }
> 11
> 12      real_index = (i * N) - (i * (i - 1) / 2) + j - i;
> 13      return packed_array[real_index];
> 14  }
>
> I came up with the formula to convert the indexes from the traditional
> array
> to the packed array and it seems to work.
>
> Then I realized it is a waste to use a char to store a single
> information, so
> I decided to the bits of a char to do this, then instead of using N
> (N-1)/2
> bytes, I would need only 1/8 of this memory. Of course this leads to
> bit
> manipulation and all. The final result, with a few tweaks, is the
> following
> function:
>
>  1  unsigned char is_set(int i, int j)
>  2  {
>  3      register unsigned int idx;
>  4
>  5      if (i > j) {
>  6          register int temp = i;
>  7          i = j;
>  8          j = temp;
>  9      }
> 10
> 11      idx = ((i * ((N << 1) - i - 1)) >> 1) + j;
> 16      return ((packed_array[idx >> 3] << (idx & 7)) & (unsigned
> char) 0x80);
> 17  }
>
> (I reordered the expression to reduce the number of multiplications
> and use
> shift operations.)
>
> This function is called a lot of times, and this is having some
> noticeable
> impact in the overall performance. The memory saving is really huge,
> but
> the performance is making me wonder if this is really the best choice.
>
> Does anyone know how could I make it faster, or a clever way to solve
> this
> problem, or any suggestions on the code I posted above? Only
> limitation
> is that they only work properly if a char is 8 bits long, but I can
> live with that.
>
> Thanks,
> Gustavo
>
> PS. Sorry if the schematics look messed up, with a monospace font they
> looked nice.

Hi Gustavo,

The bitshifts and ANDs will be very fast so probably the problem is the
overhead of a function call. Try declaring your function as inline (or use
a Macro).

Cheers,
Joe

--
......................  o    _______________           _,
` Good Evening!  ,      /\_  _|             |        .-'_|
`................,     _\__`[_______________|       _| (_|
] [ \, ][         ][        (_|

```
 0
Reply spamtrap7 (71) 6/27/2009 9:43:15 PM

```On Sat, 27 Jun 2009 14:07:55 -0700 (PDT), Gustavo Rondina
<rondina@gmail.com> wrote:

>Hello all,
>
>First of all, this might be somewhat off-topic for some of you, so let
>me
>
>I am writing a program that requires a huge 2D symmetric matrix of
>booleans
>(i.e., either 0 or 1).  My first thought was to store it in a packed
>way, so
>the memory needed would be about half of the amount required to store
>the
>whole array in a traditional fashion. My packing is described as
>follows.

snip conventional 2-d array description

>I am storing only the upper triangular portion plus the diagonal
>continuously
>in memory, so I have:

snip description of converted linear array

>Then I realized it is a waste to use a char to store a single
>information, so
>I decided to the bits of a char to do this, then instead of using N
>(N-1)/2
>bytes, I would need only 1/8 of this memory. Of course this leads to
>bit
>manipulation and all. The final result, with a few tweaks, is the
>following
>function:
>
> 1  unsigned char is_set(int i, int j)
> 2  {
> 3      register unsigned int idx;
> 4
> 5      if (i > j) {
> 6          register int temp = i;
> 7          i = j;
> 8          j = temp;
> 9      }
>10
>11      idx = ((i * ((N << 1) - i - 1)) >> 1) + j;
>16      return ((packed_array[idx >> 3] << (idx & 7)) & (unsigned
>char) 0x80);

What happened to lines 12 through 15?

>17  }
>
>(I reordered the expression to reduce the number of multiplications
>and use
>shift operations.)

>
>This function is called a lot of times, and this is having some
>noticeable
>impact in the overall performance. The memory saving is really huge,
>but
>the performance is making me wonder if this is really the best choice.
>

Welcome to the common experience of being able to optimize for time or
space but not both.

>Does anyone know how could I make it faster, or a clever way to solve
>this
>problem, or any suggestions on the code I posted above? Only
>limitation
>is that they only work properly if a char is 8 bits long, but I can
>live with that.

You need to look at the generated assembly code.

N<<1 is a compile-time constant and should not be computed
each time.  If your compiler is not optimizing this, you might want to
compute it once and store it in a static variable.

Assuming packed_array has type unsigned char, the arithmetic
in the return statement is still performed using int and then
converted to unsigned.  Is the generated code spending a lot of time
converting between the types?  If so, you might consider using
bit objects instead of 8 bit ones.

In my experience, multiplication is the long tent pole in your
i * ((N << 1) - i - 1)
This is the same as
i * (N << 1) - i *(i+1)
If you increase N until it is a power of 2, the first
multiplication can be replaced by another shift.
The second multiplication depends only on i.  i will always be
between 0 and N-1.  You could build an N element array of int where
element y is set to y*y+y.
(i << NN) - array[i]
with no multiplication involved.  (The array reference should result
in a shift, not a multiply, but you do need to check the assembly
code.)

--
Remove del for email
```
 0
Reply schwarzb3978 (1424) 6/27/2009 11:36:06 PM

```Gustavo Rondina <rondina@gmail.com> writes:
<snip>
> I am storing only the upper triangular portion plus the diagonal
> continuously
> in memory, so I have:
>
>     00  01  02  03  04  11  12  13  14  22  23  24  33  34  44
>    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
>    | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
>    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
>
> The two digits numbers above the schematic correspond to the indexes
> in the
> original matrix. Assuming each position is one char, instead of N^2
> bytes the
> packed way is using only N*(N+1)/2 bytes. The trade-off is of course
> having a
> more costly way to access the matrix. Instead of simply doing M[i][J],
<snip>
> Then I realized it is a waste to use a char to store a single
> information, so
> I decided to the bits of a char to do this, then instead of using N
> (N-1)/2
> bytes, I would need only 1/8 of this memory. Of course this leads to
> bit
> manipulation and all. The final result, with a few tweaks, is the
> following
> function:
>
>  1  unsigned char is_set(int i, int j)
>  2  {
>  3      register unsigned int idx;
>  4
>  5      if (i > j) {
>  6          register int temp = i;
>  7          i = j;
>  8          j = temp;
>  9      }
> 10
> 11      idx = ((i * ((N << 1) - i - 1)) >> 1) + j;
> 16      return ((packed_array[idx >> 3] << (idx & 7)) & (unsigned
> char) 0x80);
> 17  }
>
> (I reordered the expression to reduce the number of multiplications
> and use
> shift operations.)
>
> This function is called a lot of times, and this is having some
> noticeable
> impact in the overall performance. The memory saving is really huge,
> but
> the performance is making me wonder if this is really the best
> choice.

If you've coded the rest cleanly, it should be very simple to switch in
a plain array and see how much better/worse it is.

My only alternative is to use an array of pointers rather than a
packed array.  You arrange for M[i] to be a pointer to row i.  This
replaces one part of the index calculation with an indirection.  You
do this along with any of your other plans -- storing only one half of
the matrix and/or using bits rather than bytes.

You might also want to try this: allocate the diagonals and not the
rows.  Why?  Because you can make the symmetry explicit that way.  For
an NxN matrix you allocate an array of 2N-1 pointers P.  P[0] points
to an array of 1 element (M[0][N-1]), P[1] to two elements (M[0][N-2]
and M[1][N-1]) and so on up to P[N-1] which points to the N elements
of the main diagonal.  The remaining Ps pointer to the *same* arrays
but in reverse order (P[N] = P[N-2], P[N+1] = P[N-3] and so on).

Now, make a new pointer-to-pointer variable D and set it to &P[N-1].
D[0] is now the main diagonal and both D[1] and D[-1] are the (same)
diagonal just above and below it.  Element M[i][j] is now indexed as

D[i-j][min(i,j)]   i.e:   D[i-j][i < j ? i : j]

As with all optimisations, this may or may not pay off but I'd want to
try it out.

You can use the bit allocation trick here as well, of course.

Note that when using an array of pointers you don't have to allocate
the smaller arrays separately.  You can so it all on one go and simply
point to the appropriate part.

<snip>
--
Ben.
```
 0
Reply ben.usenet (6790) 6/28/2009 1:58:42 AM

```Gustavo Rondina wrote:
> Hello all,
>
> First of all, this might be somewhat off-topic for some of you, so let
> me
>
> I am writing a program that requires a huge 2D symmetric matrix of
> booleans
> (i.e., either 0 or 1).  My first thought was to store it in a packed
> way, so
> the memory needed would be about half of the amount required to store
> the
> whole array in a traditional fashion. My packing is described as
> follows.
>
> Given this symmetric matrix:
>
>             0   1   2   3   4
>           +---+---+---+---+---+
>       0   | 1 | 0 | 0 | 1 | 1 |
>           +---+---+---+---+---+
>       1   | 0 | 0 | 1 | 1 | 0 |
>           +---+---+---+---+---+
>       2   | 0 | 1 | 1 | 1 | 1 |
>           +---+---+---+---+---+
>       3   | 1 | 1 | 1 | 1 | 1 |
>           +---+---+---+---+---+
>       4   | 1 | 0 | 1 | 1 | 0 |
>           +---+---+---+---+---+
>
> I am storing only the upper triangular portion plus the diagonal
> continuously
> in memory, so I have:
>
>     00  01  02  03  04  11  12  13  14  22  23  24  33  34  44
>    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
>    | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
>    +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
>
> The two digits numbers above the schematic correspond to the indexes
> in the
> original matrix. Assuming each position is one char, instead of N^2
> bytes the
> packed way is using only N*(N+1)/2 bytes. The trade-off is of course
> having a
> more costly way to access the matrix. Instead of simply doing M[i][J],
> I need
> a function like this:
>
>     #define N 10000 /* size of matrix */
>     ...
>  1  unsigned char is_set(unsigned int i, unsigned int j)
>  2  {
>  3      unsigned int tmp;
>  4      unsigned int real_index;
>  5
>  6      if (i > j) {
>  7          tmp = j;
>  8          j = i;
>  9          i = tmp;
> 10      }
> 11
> 12      real_index = (i * N) - (i * (i - 1) / 2) + j - i;
> 13      return packed_array[real_index];
> 14  }
>
> I came up with the formula to convert the indexes from the traditional
> array
> to the packed array and it seems to work.
>
> Then I realized it is a waste to use a char to store a single
> information, so
> I decided to the bits of a char to do this, then instead of using N
> (N-1)/2
> bytes, I would need only 1/8 of this memory. Of course this leads to
> bit
> manipulation and all. The final result, with a few tweaks, is the
> following
> function:
>
>  1  unsigned char is_set(int i, int j)
>  2  {
>  3      register unsigned int idx;
>  4
>  5      if (i > j) {
>  6          register int temp = i;
>  7          i = j;
>  8          j = temp;
>  9      }
> 10
> 11      idx = ((i * ((N << 1) - i - 1)) >> 1) + j;
> 16      return ((packed_array[idx >> 3] << (idx & 7)) & (unsigned
> char) 0x80);
> 17  }
>
> (I reordered the expression to reduce the number of multiplications
> and use
> shift operations.)
>
> This function is called a lot of times, and this is having some
> noticeable
> impact in the overall performance. The memory saving is really huge,
> but
> the performance is making me wonder if this is really the best choice.
>
> Does anyone know how could I make it faster, or a clever way to solve
> this
> problem, or any suggestions on the code I posted above? Only
> limitation
> is that they only work properly if a char is 8 bits long, but I can
> live with that.
>
> Thanks,
> Gustavo
>
> PS. Sorry if the schematics look messed up, with a monospace font they
> looked nice.

Gustavo,

I suggest you use an unsigned int as your base (foundation) type rather
than char.  Many processors are now geared up to fetch integers at a
fast rate.  Some processors actually slow down when processing chars.

For example, if a processor has a 32-bit integer, and you are using
8-bit chars, it would fetch 8 bits at a time using char and 32-bits in
one fetch using integer.

Also, you may want to ask yourself if the space saving is necessary
and actually benefits the product.  Many "desktop" applications are not
memory sensitive, so more time should be spent on correctness of the
program.

--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq:   http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.comeaucomputing.com/learn/faq/
Other sites:
http://www.josuttis.com  -- C++ STL Library book
http://www.sgi.com/tech/stl -- Standard Template Library
```
 0

```On Sat, 27 Jun 2009 14:07:55 -0700, Gustavo Rondina wrote:

> (I reordered the expression to reduce the number of multiplications
> and use shift operations.)

The compiler ought to be able to do this for you. Certainly,
multiplication or division by 2 will be translated to a shift for any sane
compiler.

> This function is called a lot of times, and this is having some
> noticeable impact in the overall performance. The memory saving is
> really huge, but the performance is making me wonder if this is really
> the best choice.
>
> Does anyone know how could I make it faster, or a clever way to solve this
> problem, or any suggestions on the code I posted above? Only limitation
> is that they only work properly if a char is 8 bits long, but I can live
> with that.

If the multiplication (i*(N/2-i-1)) is the bottleneck, using a
pre-calculated array of row pointers may be faster.

Also try:

Returning "int" rather than "char".

Using an array of "int"s (or "long"s) rather than "char"s.

Returning 0/1 rather than 0/0x80, e.g.:

return (packed_array[idx>>3] >> (idx&7)) & 1;

Returning zero/non-zero, e.g.:

return (packed_array[idx>>3] & (1 << (idx&7)));

Defining the function as "static inline ..." in a header file.

Finally: storing values as bits rather than bytes will use 1/8th the
memory, but using a triangular matrix will save less than a half, so maybe
the latter isn't worth it.

```
 0
Reply nobody (5153) 6/28/2009 7:39:54 AM

```On Sat, 27 Jun 2009 16:36:06 -0700, Barry Schwarz wrote:

[snip code from previous post]

> What happened to lines 12 through 15?

Sorry, I had copied a previous version of the function with a few extra
unnecessary lines which were removed only after I inserted the line
numbers by hand -- next time 'cat -n' certainly will be used.

> You need to look at the generated assembly code.
>
> 	N<<1 is a compile-time constant and should not be computed
> each time.  If your compiler is not optimizing this, you might want to
> compute it once and store it in a static variable.

I did this and it made the function a bit faster, thanks for the tip.

> 	Assuming packed_array has type unsigned char, the arithmetic
> in the return statement is still performed using int and then converted
> to unsigned.  Is the generated code spending a lot of time converting
> between the types?  If so, you might consider using unsigned int instead
> 8 bit ones.

Just now Thomas Matthews suggested this as well in another post, thank
you both. I am using integers now, and it is indeed a little faster.

> In my experience, multiplication is the long tent pole in your
>    i * ((N << 1) - i - 1)
> This is the same as
>    i * (N << 1) - i *(i+1)
> 	If you increase N until it is a power of 2, the first
> multiplication can be replaced by another shift.
> 	The second multiplication depends only on i.  i will always be
> between 0 and N-1.  You could build an N element array of int where
> element y is set to y*y+y.

This is interesting, I will give it a try. I don't know though if I can
afford increasing N to a power of 2. Only tests will tell.

For now I have the following. The computations involving constants have
been removed from the function and the result is stored somewhere else.

Again, this assumes that a unsigned int is 32 bits long.

1 unsigned int is_set (unsigned int i, unsigned int j)
2 {
3   register unsigned int idx;
4
5   if (i > j)
6     {
7       register int temp = i;
8       i = j;
9       j = temp;
10     }
11
12   /* Original expression:
13
14      idx = i * N - ((i * (i - 1)) / 2) + (j - i)
15
16      In the code below NN == (N << 1) - 1 is a variable
17      initialized somwhere else.
18    */
19
20   idx = NN - i;
21   idx *= i;
22   idx >>= 1;
23   idx += j;
24
25   /* packed_array is a array of unsigned integers */
26   return ((packed_array[idx >> 5] << (idx & 31)) & 0x80000000);
27 }

Cheers,

--
Gustavo
```
 0
Reply rondina (9) 6/28/2009 8:49:37 AM

```On Sun, 28 Jun 2009 00:18:18 -0700, Thomas Matthews wrote:

> I suggest you use an unsigned int as your base (foundation) type rather
> than char.  Many processors are now geared up to fetch integers at a
> fast rate.  Some processors actually slow down when processing chars.

I did this modification and indeed it is a bit faster (something around
10% to 15%, but I haven't run extensive benchmarks yet.)

[snip]

> Also, you may want to ask yourself if the space saving is necessary and
> actually benefits the product.  Many "desktop" applications are not
> memory sensitive, so more time should be spent on correctness of the
> program.

I see. Well, this is a scientific application for simulating the dynamics
of system of molecules. The memory requirement scales with N, which is the
number of molecules/atoms/beads one wants to simulate. In the future the
plan is to port this application to run on GPUs, where memory is not as
abundant as in the regular PC, so I am trying to design the application to
use as less memory as possible so large systems can be simulated.

With the highly parallel design of GPUs the expensive computations of
calculating indexes and such might be hidden by the enormous amount of
threads one can run, but nonetheless I would like to get a fast ``serial''
(i.e., CPU) version as well.

The replies to my original question were very good and gave me some ideas
I might want to try.

Cheers,

--
Gustavo
```
 0
Reply rondina (9) 6/28/2009 9:28:13 AM

```On Sun, 28 Jun 2009 08:49:37 +0000 (UTC), Gustavo Rondina
<rondina@gmail.com> wrote:

>On Sat, 27 Jun 2009 16:36:06 -0700, Barry Schwarz wrote:

snip

>> In my experience, multiplication is the long tent pole in your
>>    i * ((N << 1) - i - 1)
>> This is the same as
>>    i * (N << 1) - i *(i+1)
>> 	If you increase N until it is a power of 2, the first
>> multiplication can be replaced by another shift.
>> 	The second multiplication depends only on i.  i will always be
>> between 0 and N-1.  You could build an N element array of int where
>> element y is set to y*y+y.
>
>This is interesting, I will give it a try. I don't know though if I can
>afford increasing N to a power of 2. Only tests will tell.

Your N was defined as 10,000.  If this was just an approximation as
opposed to a precise requirement, why not use 8,192 as your
approximation.  Otherwise your are up to 16,384.  The extra bits in
packed_array won't matter that much.  Even the extra 6K elements of
the int array should not be that big a deal given how much space you
saved by packing the boolean values.

>
>For now I have the following. The computations involving constants have
>been removed from the function and the result is stored somewhere else.
>
>Again, this assumes that a unsigned int is 32 bits long.
>
>  1 unsigned int is_set (unsigned int i, unsigned int j)
>  2 {
>  3   register unsigned int idx;
>  4
>  5   if (i > j)
>  6     {
>  7       register int temp = i;

This should be unsigned also.

>  8       i = j;
>  9       j = temp;
> 10     }

snip

--
Remove del for email
```
 0
Reply schwarzb3978 (1424) 6/28/2009 7:15:30 PM

```Gustavo Rondina wrote:

<snip>

> I see. Well, this is a scientific application for simulating the dynamics
> of system of molecules. The memory requirement scales with N, which is the
> number of molecules/atoms/beads one wants to simulate. In the future the
> plan is to port this application to run on GPUs, where memory is not as
> abundant as in the regular PC, so I am trying to design the application to
> use as less memory as possible so large systems can be simulated.
>
> With the highly parallel design of GPUs the expensive computations of
> calculating indexes and such might be hidden by the enormous amount of
> threads one can run, but nonetheless I would like to get a fast ``serial''
> (i.e., CPU) version as well.
>
> The replies to my original question were very good and gave me some ideas
> I might want to try.

A few points you might want to consider as you will be switching to a
different processor...

The size of unsigned in could well be different on the final target
processor.

There are differences between compilers on what they can optimise
efficiently.

There are difference between processors on what opperations are efficient.

So, all your careful benchmarking and speed optimisations (and some
space optimisations) could well be undone by switching to a different
processor.

So I would recommend keeping all of the different versions of the code
and getting hold of the real target (or a simulator for it) and re-doing
the work on the final target system!
--
Flash Gordon
```
 0
Reply smap (838) 6/29/2009 6:47:52 AM

```On Sat, 27 Jun 2009 14:07:55 -0700 (PDT), Gustavo Rondina
<rondina@gmail.com> wrote:

>Hello all,
>
>First of all, this might be somewhat off-topic for some of you, so let
>me
>
>I am writing a program that requires a huge 2D symmetric matrix of
>booleans
>(i.e., either 0 or 1).
[snip]

I suspect your time problems are not where you think.  When one is
accessing a multi-dimensional array it is desirable to arrange the
memory accesses sequentially.  Thus your loops look something like
this:

for (i=0;i<n;i++) {
for (j=0;j<n;j++) {
/* code referencing M[i][j] */
}
}

This code will run faster than code with the loop order exchanged,
i.e.,

for (i=0;i<n;i++) {
for (j=0;j<n;j++) {
/* code referencing M[j][i] */
}
}

So, suppose your code is arranged so that you access columns
sequentially.  Instead of referencing the two dimensional array,
reference a slice.  Then the loop looks like

for (i=0;i<n;i++) {
S = get_column(M,i)[
for (j=0;j<n;j++) {
/* code referencing S[j] */
}
}

Now you can pack your array one entry to a bit, exploit symmetry,
and get a reduction in storage size of M by a factor of 64 for 32
bit ints.  In turn you want an array of n ints to hold S.

At this point you might be thinking, well why not just use a
pointer into M rather than have a get_column function.  The reason
is that the code to unpack the entire column can be optimized more
effectively.

Similar remarks apply to the code to set the contents of a column.

For the most part the issues here are language independent.  There
is a C issue though.  It is probably best to use char for M and int
for S since these are the "natural" types.  One use typedefs for
them and see whether different choices of type affect performance.

Richard Harter, cri@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
If I do not see as far as others, it is because
I stand in the footprints of giants.
```
 0
Reply cri (1432) 6/29/2009 3:10:44 PM

10 Replies
44 Views

Similar Articles

12/16/2013 3:26:49 AM
page loaded in 39253 ms. (0)

Similar Artilces:

Hessenberg reduction of a symmetric matrix
Hello, I would like to know what algorithm Matlab uses within HESS for the reduction of a symmetric matrix to tridiagonal form, and if the behaviour of HESS can be changed for symmetric matrices to not allow permutations of the first row or column. The particular application I am working with, requires that the unitary similarity transformations which reduce the matrix A to Hessenberg act only on the trailing (n-1)*(n-1) submatrix of A. For example, for the matrx: n=5; A=[0,randn(1,n-1);zeros(n-1,1),diag(randn(n-1,1))];A=A+triu(A,1)'; [P,H]=hess(A) the reduction, at some point p...

Smallest eigenvector of a real symmetric matrix ???
Consider the following problem: minimize_v || Av ||^2, subject to || v || = 1 where A is a n-by-q matrix, and v is a q-by-1 vector, and || || is the Euclidean (i.e., L-2) norm. In the ideal case I have n>q and rank(A) = q. But I also consider the case where rank(A) < q. Clearly this problem can also be re-written as: minimize_v v'(A' A) v, subject to || v || = 1 which is to find the eigenvector v* of the real symmetric matrix A'A that corresponds to the smallest eigenvalue. I have the following questions (answer to any question is welcomed !...

Why does Intel Matrix Storage Manager start verifiying?
I assume that this is the best group to post the following problem. If not would someone please advise. I have two identical 500GB drives (not XP boot) in RAID 1 ICH10R. I am getting annoyed that the Intel Matrix Storage Manager ver 8 decides to verify and repair which takes two hours to complete. It has never found any error and neither has running chkdisk or Kaspersky AV. My system is stable with all my five drives regularly defragmented. Motherboard is Gigabyte GA EP45-DS3P with 4GB Corsair Dominator, Intel 8400 core2 running XP pro SP3 and all updates What might the cause of...

Function to Count Number of true elements in a boolean matrix
Hi All, I'm wondering if there's a function out there to count the number of true values in a boolean matrix. I know that the sum function does the trick (even though the sum of a number of true values doesn't really make sense), but I assume that this is rather slow and there would be a function out there that is optimized for use on boolean matrices. Thanks, "Shaddy " <s.shokrallaREMOVETHIS@gmail.com> wrote in message <gkqv6p\$70h\$1@fred.mathworks.com>... > Hi All, > > I'm wondering if there's a function out there to count the number of ...

Vectorized real symmetric dense matrix diagonalization subroutine
I need Fortran source of *good vectorized* real symmetric dense matrix diagonalization subroutine(s) (finding of all the eigenvalues/ vectors). Typical sizes are from 1000 to 10000. May be some used for NEC SX, or some Cray's ? Fast methods like Householder 3-daigonalization w/subsequent eigenvalues/vectors finding are preferred. IMHO 1st step gives no wide choice, but there is a lot of methods for 3-diagonal matrix eigenvalues/vectors finding. The most rich Fortran collection I know was old Eispack; but which subroutines are more good vectorizable ? Lapack, if I'm not wrong, propo...

About Minimum Description Length of Eigenvalues of Square symmetric matrix
Hi all, Iam working on matrix dimensionality reduction in information retrieval problems. One method to identify the number of dimensions to be retained in a term by document matrix (A of size mxn) is to identify the Minimum Description Length (MDL) of Eigenvalues of Square symmetric matrix (A' * A of size n x n). This was described by Hongyuan Zha. But to identify the MDL we have to perform the arithmetic and geometric mean of the Eigenvalus of A' * A. The problem Iam facing is: in calculating the geometric mean of Eigenvalues, the product of all the Eigenvalues is becoming "...

Finding cells holding numbers greater than zero in a symmetric matrix
I have a matrix filled with numbers, and I need to find out cells that hold numbers greater than 0. In order to do so, I did the following thing: [I, J] = find(mat > 0); My only problem is that my matrix is symmetric, and I'm not interested in getting symetric cells twice. Is there some quick way to do so without using loops. In article <ef111a8.-1@webx.raydaftYaTP>, Holly <holly345@hotmail.com> wrote: > I have a matrix filled with numbers, and I need to find out cells > that hold numbers greater than 0. > In order to do so, I did the following thing: > [I, J]...

Re: How to construct symmetric matrix from just a one half #2
*Here it is: mat1 + Transpose[mat1] - DiagonalMatrix[Diagonal[mat1]] *--- 2010/6/15 Srikanth K S <sriperso@gmail.com> > *Sorry, that seems like a mistake! > *--- > > > 2010/6/15 Srikanth K S <sriperso@gmail.com> > > *Here is a way, but there could be other elegant ways than this! >> >> *In[8]:== mat1=={{1,0,0,0},{2,3,0,0},{4,9,5,0},{2,2,3,4}}; >> In[12]:== Union@@@MapThread[List,{mat1,MapThread[List,mat1]}] >> Out[12]== {{0,1,2,4},{0,2,3,9},{0,3,4,5,9},{0,2,3,4}}* >> *--- >> >> >> 2010/6/...

Reducing fill-ins when factoring sparse symmetric positive definite matrix
Hi, I try to solve Ax=b using a direct method. A is a sparse symmetric positive definite matrix. The problem is 'out-of-memory' error when I try to factor matrix A along with Cholesky or LU decomposition. I believe that MATLAB internally uses its own fill-in reducing algorithm. However, the resultant low/upper triangular matrix had 10 times more non-zero elements than matrix A. Consequently, I end up with 'out-of-memory' error due to these fill-ins. I am just wondering if there is any better way to reduce fill-ins in lower and upper triangular matrices. What would be accept...

insert a matrix into matrix
hello, new to all this so excuse my ignorance lets say i have x = 1 2 3 4 which i convert to column vector x= 1 2 3 4 now i have a, created from the lengh of x in this case 4, and any value, lets say 4, so a= 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 now i want to insert x into a 4 times, so that a= 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 but i want to do this without a loop i can replace it with numbers using a(,:i)=x where i = 1 to 4 in this case, but i have to use a loop for that, did i explain that ok, can anyone help? cheeseboy On Fri, 19 Nov 2004 20:47:25 -0500, cheeseboy wrote: > hello...

Copy from matrix to matrix...How?
Dear friends, if the matrix U1 equal to: U1=[1,1,1,1; 0,0,0,0; 0,0,0,0 ] and the second matrix U2 equal to: U2=[0,0,0,0; 2,2,2,2; 0,0,0,0 ] how can I creat the third matrix whic is equal to: U3=[1,1,1,1; 2,2,2,2; 0,0,0,0 ] Please could you change on my code to be right? This is the code: for j=1:10 for i=1:12 U1(i,j)=U2(i,j); end end waiting for your solution.......Thanks You can just write: U3=U1+U2 alfann <alfann.net@hotmail.com> wrote in message <2072829264.305125.126762...

A matrix that selects some predetermined columns from another matrix to produce a matrix that cosists of those columns
Hi, I would like to come up with a matrix to takes some columns of another matrix and puts them into the other matrix. For example, for X = [x1 x2 x3 x4] where xi is n x 1 vector, XJ = [x2 x3]. So, J has to be n x 2. Thanks in advance! usuiisu usuiisu <ikuyasu@gmail.com> wrote in message <526a92f2-79e0-450e-bceb-7bd8a838a661@30g2000yql.googlegroups.com>... > Hi, > > I would like to come up with a matrix to takes some columns of another > matrix > and puts them into the other matrix. For example, > for X = [x1 x2 x3 x4] where xi is n x 1 vecto...

Matrix composed by two matrix
Hi, I have a matrix of (0 to 53, 0 to 67) and i create other two matrix one of x(0 to 53, 0 to 53) and another y(0 to 53, 0 to 5) how could i map this two matrix on bigger matrix ? the easy way, correct way, the combinatorial way. To do this outside an process statement or even inside a process. On 4 Jul 2006 09:41:07 -0700, "lvcargnini" <lvcargnini@gmail.com> wrote: >I have a matrix of (0 to 53, 0 to 67) and i create other two matrix >one of x(0 to 53, 0 to 53) and another y(0 to 53, 0 to 5) how could i >map this two matrix on bigger matrix ? >the easy way, c...

Vector of matrixes times matrix
Hallo! I use the transfer matrix approach to calculate the transmission (and reflection) properties of a cascaded filter. The following code computes the transfer matrix: %%%%%%%%%%%%%%%%%%%%%%% init_ones = ones(1,wavelength_steps); init_zeros = zeros(1,wavelength_steps); T_matrix(1,1,:) = init_ones; T_matrix(1,2,:) = init_zeros; T_matrix(2,1,:) = init_zeros; T_matrix(2,2,:) = init_ones; for section=1:grating.sections % this variables will be used a lot, so I compute them gamma = sqrt((abs(grating.kappa(section))).^2-(pi*grating.nu_dash).^2); Q = -pi .* grating.nu_dash ....

matrix of matrices and matrix regression
Hello, I would like to know how I can create a matrix of matrices of the type : X = [X1, 0; 0, X2] where I want to keep X1 and X2 as separate matrices. X1 is a 126x4 and X2 is a 577x4. Hence they have the same number of columns but not the same number of rows. I want X simply to be 2X2 Then I want to create a matrix of two vectors: Y = [Y1, Y2] where Y1 is 126x1 and Y2 is 577x1. And Y is simply 2x1. Then I want to be able to regress X on Y which should yield 8 coefficients since both X1 and X2 have four columns. Can anyone please help me with this? thanks Eduard ...

a matrix whose some elements are matrixes
I want obtain a matric whose some elements are matrixes, how to compile the program? example: Iwant to abtain the following matrix G I I I I I 0 I G I 0 0 0 0 I I G I 0 0 0 I 0 I G I 0 0 I 0 0 I G I 0 I 0 0 0 I G I 0 0 0 0 0 I G in which I is an identity matrix, 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 ...

Matrix
Needed MATRIX for software testing Web based application. I just started to work in the company. Please reply to me if you wish to help me with this issue. Thanks so much. Regards, Michael. Our book "Quality Web Systems" will provide you with a checklist for web testing. see http://www.qualitywebsys.com There are also some web testing checklists available free, including discussions surrounding the topic on http://www.qaforums.com Good luck, Elfriede Dustin http://www.effectivesoftwaretesting.com >Subject: Matrix >From: m15s@hotmail.com (MichaelS) >Date: 7/17/03 8:08 ...

Matrix
I have an nx2 matrix that I need to split in to 2 column vectors. the ->col makes row vectors, and I can't figure out how to make them in to column vectors. Does someone know a good way to split the nx2 matrix? ->col makes column vectors for me. However I have flag -98 set to display vectors horizontally. Double check ->col with your nx2 matrix. You may have 2 nx1 vectors displayed horizontally. Sean It gives 2 1xn row vectors. These are vectors and not matrices, so you can't run TRAN onthem. The first one works perfectly. Thanks! <diltsman@gmail.com> wrot...

matrix
Hi all type matrix is array(integer range <>, integer range <>) of integer; ... m: matrix(1..10, 1..5); ... for i in m'first..m'last loop -- 1 to 10 for j in m(i)'first .. m(i)'last loop -- doesnt work put(m(i, j)'image); end loop; end loop; what is the proper way to iterate through the matrix? thanks for answears -- Daniel * Daniel Sch�le wrote: > what is the proper way to iterate through the matrix? with Ada.Text_IO; use Ada.Text_IO; procedure t is type Matrix is array(Integer range <>, Integer range <...

matrix
I have matrix M and matrix N for example M={{*1*,2},{3,4},{*5*,6},{7,8}} N={{*1*,8},{*5*,0},{9,7}} (dimension is diffrent) I need to make new matrixes which include only this rows from matrix M and matrix N for which firsts elements are the same in both matrix (M and N). so I need M1={{1,2},{5,6}} N1={{1,8},{5,0}} Elements of matrix are real (not always integer) Please for help Peter On Feb 2, 12:26 am, Piotr Munik <piotr.mu...@gmail.com> wrote: > I have matrix M and matrix N > for example > M={{*1*,2},{3,4},{*5*,6},{7,8}} > N={{*1*,8},{*5*,0},{9,7}} &g...