


How much memory does the Linux OS take?

I did some testing and found out on my 2 GB machine that I 
can only load a little less than 500 MB before this process 
slows down considerably due to page faults. I want to know 
about how much memory usage is required by the OS if most 
every inessential process is disabled. 


0
Peter
3/26/2010 2:21:23 AM
comp.os.linux.development.apps


"Peter Olcott" <NoSpam@OCR4Screen.com> writes:

> I did some testing and found out on my 2 GB machine that I 
> can only load a little less than 500 MB before this process 
> slows down considerably due to page faults. I want to know 
> about how much memory usage is required by the OS if most 
> every inessential process is disabled. 

A few MB.

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/26/2010 2:30:05 AM
On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> wrote:

> I did some testing and found out on my 2 GB machine that I 
> can only load a little less than 500 MB before this process 
> slows down considerably due to page faults. I want to know 
> about how much memory usage is required by the OS if most 
> every inessential process is disabled.

You can get by with 2MB if you're very careful.  I'd recommend 4MB if
you want to run an application or two.

-- 
Grant

0
Grant
3/26/2010 3:00:23 AM
"Grant Edwards" <invalid@invalid.invalid> wrote in message 
news:hoh807$3ck$1@reader1.panix.com...
> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>
>> I did some testing and found out on my 2 GB machine that 
>> I
>> can only load a little less than 500 MB before this 
>> process
>> slows down considerably due to page faults. I want to 
>> know
>> about how much memory usage is required by the OS if most
>> every inessential process is disabled.
>
> You can get by with 2MB if you're very careful.  I'd 
> recommend 4MB if
> you want to run an application or two.
>
> -- 
> Grant
>

My problem is that the OS is eating up 1.5 GB. I would be 
thrilled if I could get it down to 500 MB. 


0
Peter
3/26/2010 3:41:55 AM
On Mar 25, 8:41 pm, "Peter Olcott" <NoS...@OCR4Screen.com> wrote:

> My problem is that the OS is eating up 1.5 GB. I would be
> thrilled if I could get it down to 500 MB.

What exactly are you measuring? The OS will, and should, use all of
your memory. That's what it's for. Page faults are normal and don't
indicate a problem (they're part of how the OS tracks memory usage).

I think there's a strong possibility you're misunderstanding what
you're seeing and your actual problem has nothing to do with usage or
exhaustion of physical memory.

DS
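
For what it's worth, a minimal sketch (an illustration only, assuming a
Linux-style /proc filesystem) that shows how much of the "used" memory is
really free or reclaimable page cache rather than memory the kernel itself
needs:

#include <fstream>
#include <iostream>
#include <string>

// Minimal sketch: print the fields of /proc/meminfo that show how much
// of the "used" memory is actually free or reclaimable cache.
int main() {
  std::ifstream meminfo("/proc/meminfo");
  std::string line;
  while (std::getline(meminfo, line)) {
    if (line.compare(0, 8, "MemFree:") == 0 ||
        line.compare(0, 8, "Buffers:") == 0 ||
        line.compare(0, 7, "Cached:") == 0)
      std::cout << line << '\n';
  }
  return 0;
}

Running it with and without the OCR data loaded should make clear whether
the kernel is actually holding on to 1.5 GB or merely caching with it.
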
0
David
3/26/2010 3:46:49 AM
I want there to be no page faults for my process. I want to 
limit the sum total of everything else to half of physical 
memory.

"David Schwartz" <davids@webmaster.com> wrote in message 
news:1a5bf20b-626a-428b-813d-4aeb0fbc5b8c@v34g2000prm.googlegroups.com...
On Mar 25, 8:41 pm, "Peter Olcott" <NoS...@OCR4Screen.com> 
wrote:

> My problem is that the OS is eating up 1.5 GB. I would be
> thrilled if I could get it down to 500 MB.

What exactly are you measuring? The OS will, and should, use 
all of
your memory. That's what it's for. Page faults are normal 
and don't
indicate a problem (they're part of how the OS tracks memory 
usage).

I think there's a strong possibility you're misunderstanding 
what
you're seeing and your actual problem has nothing to do with 
usage or
exhaustion of physical memory.

DS 


0
Peter
3/26/2010 4:07:26 AM
"Peter Olcott" <NoSpam@OCR4Screen.com> writes:

> "Grant Edwards" <invalid@invalid.invalid> wrote in message 
> news:hoh807$3ck$1@reader1.panix.com...
>> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>>
>>> I did some testing and found out on my 2 GB machine that 
>>> I
>>> can only load a little less than 500 MB before this 
>>> process
>>> slows down considerably due to page faults. I want to 
>>> know
>>> about how much memory usage is required by the OS if most
>>> every inessential process is disabled.
>>
>> You can get by with 2MB if you're very careful.  I'd 
>> recommend 4MB if
>> you want to run an application or two.
>>
>> -- 
>> Grant
>>
>
> My problem is that the OS is eating up 1.5 GB. I would be 
> thrilled if I could get it down to 500 MB. 

Unused memory is wasted memory.  Any RAM you've got that isn't in use by
your programs will be used as various OS buffers; if your programs
need more space, then the space being used for buffers will decrease.
Seeing the OS taking up 1.5GB means nothing else wants the memory.

So...  what is the testing that has led you to think a process is
slowing down due to page faults?
-- 
As we enjoy great advantages from the inventions of others, we should
be glad of an opportunity to serve others by any invention of ours;
and this we should do freely and generously. (Benjamin Franklin)
0
Joe
3/26/2010 4:10:19 AM
"Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in message 
news:1bbpeb3eqc.fsf@snowball.wb.pfeifferfamily.net...
> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>
>> "Grant Edwards" <invalid@invalid.invalid> wrote in 
>> message
>> news:hoh807$3ck$1@reader1.panix.com...
>>> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> 
>>> wrote:
>>>
>>>> I did some testing and found out on my 2 GB machine 
>>>> that
>>>> I
>>>> can only load a little less than 500 MB before this
>>>> process
>>>> slows down considerably due to page faults. I want to
>>>> know
>>>> about how much memory usage is required by the OS if 
>>>> most
>>>> every inessential process is disabled.
>>>
>>> You can get by with 2MB if you're very careful.  I'd
>>> recommend 4MB if
>>> you want to run an application or two.
>>>
>>> -- 
>>> Grant
>>>
>>
>> My problem is that the OS is eating up 1.5 GB. I would be
>> thrilled if I could get it down to 500 MB.
>
> Unused memory is wasted memory.  Any RAM you've got that 
> isn't in use by
> your programs will be used as various OS buffers; if your 
> programs
> need more space, then the space being used for buffers 
> will decrease.
> Seeing the OS taking up 1.5GB means nothing else wants the 
> memory.
>
> So...  what is the testing that has led you to think a 
> process is
> slowing down due to page faults?

Memory access speed dropped from 90 MB per second to 17 MB / 
second, whenever I used more than 480 MB.

> -- 
> As we enjoy great advantages from the inventions of 
> others, we should
> be glad of an opportunity to serve others by any invention 
> of ours;
> and this we should do freely and generously. (Benjamin 
> Franklin) 


0
Peter
3/26/2010 4:13:29 AM
"Peter Olcott" <NoSpam@OCR4Screen.com> writes:

> "Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in message
> news:1bbpeb3eqc.fsf@snowball.wb.pfeifferfamily.net...
>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>
>>> "Grant Edwards" <invalid@invalid.invalid> wrote in message
>>> news:hoh807$3ck$1@reader1.panix.com...
>>>> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>>>>
>>>>> I did some testing and found out on my 2 GB machine that I can
>>>>> only load a little less than 500 MB before this process slows
>>>>> down considerably due to page faults. I want to know about how
>>>>> much memory usage is required by the OS if most every inessential
>>>>> process is disabled.
>>>> You can get by with 2MB if you're very careful.  I'd recommend 4MB
>>>> if you want to run an application or two.
>>>>
>>> My problem is that the OS is eating up 1.5 GB. I would be thrilled
>>> if I could get it down to 500 MB.
>> Unused memory is wasted memory.  Any RAM you've got that isn't in
>> use by your programs will be used as various OS buffers; if your
>> programs need more space, then the space being used for buffers will
>> decrease.  Seeing the OS taking up 1.5GB means nothing else wants
>> the memory.
>> So...  what is the testing that has led you to think a process is
>> slowing down due to page faults?
>
> Memory access speed dropped from 90 MB per second to 17 MB / second,
> whenever I used more than 480 MB.

Those numbers sound off by a few orders of magnitude.  What exactly
are you measuring, and how?

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/26/2010 4:31:35 AM
"M�ns Rullg�rd" <mans@mansr.com> wrote in message 
news:yw1xwrwzu2jc.fsf@unicorn.mansr.com...
> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>
>> "Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in message
>> news:1bbpeb3eqc.fsf@snowball.wb.pfeifferfamily.net...
>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>
>>>> "Grant Edwards" <invalid@invalid.invalid> wrote in 
>>>> message
>>>> news:hoh807$3ck$1@reader1.panix.com...
>>>>> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> 
>>>>> wrote:
>>>>>
>>>>>> I did some testing and found out on my 2 GB machine 
>>>>>> that I can
>>>>>> only load a little less than 500 MB before this 
>>>>>> process slows
>>>>>> down considerably due to page faults. I want to know 
>>>>>> about how
>>>>>> much memory usage is required by the OS if most every 
>>>>>> inessential
>>>>>> process is disabled.
>>>>> You can get by with 2MB if you're very careful.  I'd 
>>>>> recommend 4MB
>>>>> if you want to run an application or two.
>>>>>
>>>> My problem is that the OS is eating up 1.5 GB. I would 
>>>> be thrilled
>>>> if I could get it down to 500 MB.
>>> Unused memory is wasted memory.  Any RAM you've got that 
>>> isn't in
>>> use by your programs will be used as various OS buffers; 
>>> if your
>>> programs need more space, then the space being used for 
>>> buffers will
>>> decrease.  Seeing the OS taking up 1.5GB means nothing 
>>> else wants
>>> the memory.
>>> So...  what is the testing that has led you to think a 
>>> process is
>>> slowing down due to page faults?
>>
>> Memory access speed dropped from 90 MB per second to 17 
>> MB / second,
>> whenever I used more than 480 MB.
>
> Those numbers sound off by a few orders of magnitude. 
> What exactly
> are you measuring, and how?
>
> -- 
> Måns Rullgård
> mans@mansr.com

Here is the C++ source

#include <stdio.h>
#include <stdlib.h>
#include <vector>
#include <time.h>

//#define uint32 unsigned int
typedef unsigned int uint32;

const uint32 size   = 100000000;
std::vector<uint32> Data;
uint32 Max = 0x3fffffff;


double Process() {
  clock_t finish;
  clock_t start = clock();
  double duration;
  uint32 num = 0;
  for (uint32 N = 0; N < Max; N++)
    num = Data[num];
  finish = clock();
  duration = (double)(finish - start) / CLOCKS_PER_SEC;
  return duration;
}


void Initialize() {
  Data.resize(size);
  for (uint32 N = 0; N < size; N++) {
    uint32 Random = rand() * rand();
    Random %= size;
    Data[N] = Random;
  }
  char N;
  printf("Hit any key to Continue:");
  scanf("%c", &N);
}


int main() {
  double duration;
  double MBperSec;
  printf("Size in bytes--->%d\n", size * 4);
  Initialize();
  duration = Process();
  MBperSec = (double)(Max * 4) / (duration * 1024 * 1024);
  printf("%4.2f Seconds   %4.2f Megabytes Per Second\n", 
duration, MBperSec);
  return 0;
}
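
Two notes on the listing above, offered as observations rather than changes
to the posted code: because each iteration's load depends on the previous
one (num = Data[num]), the loop measures memory latency rather than
streaming bandwidth, so the "MB per second" figure is just 4 bytes per
dependent access converted to a rate; and rand() * rand() overflows a
signed int, which is undefined behaviour.  A sketch of an overflow-free
index generator (hypothetical, not part of the original program):

#include <stdlib.h>

typedef unsigned int uint32;

// Hypothetical replacement for "rand() * rand()": combine two calls in
// 64-bit arithmetic so the product cannot overflow a signed int.
uint32 RandomIndex(uint32 limit) {
  unsigned long long r =
      (unsigned long long)rand() * ((unsigned long long)RAND_MAX + 1) + rand();
  return (uint32)(r % limit);
}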




0
Peter
3/26/2010 4:44:48 AM
"Peter Olcott" <NoSpam@OCR4Screen.com> writes:

> "M�ns Rullg�rd" <mans@mansr.com> wrote in message 
> news:yw1xwrwzu2jc.fsf@unicorn.mansr.com...
>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>
>>> "Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in message
>>> news:1bbpeb3eqc.fsf@snowball.wb.pfeifferfamily.net...
>>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>>
>>>>> "Grant Edwards" <invalid@invalid.invalid> wrote in 
>>>>> message
>>>>> news:hoh807$3ck$1@reader1.panix.com...
>>>>>> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> 
>>>>>> wrote:
>>>>>>
>>>>>>> I did some testing and found out on my 2 GB machine that I can
>>>>>>> only load a little less than 500 MB before this process slows
>>>>>>> down considerably due to page faults. I want to know about how
>>>>>>> much memory usage is required by the OS if most every
>>>>>>> inessential process is disabled.
>>>>>> You can get by with 2MB if you're very careful.  I'd recommend
>>>>>> 4MB if you want to run an application or two.
>>>>>>
>>>>> My problem is that the OS is eating up 1.5 GB. I would be
>>>>> thrilled if I could get it down to 500 MB.
>>>> Unused memory is wasted memory.  Any RAM you've got that isn't in
>>>> use by your programs will be used as various OS buffers; if your
>>>> programs need more space, then the space being used for buffers
>>>> will decrease.  Seeing the OS taking up 1.5GB means nothing else
>>>> wants the memory.  So...  what is the testing that has led you to
>>>> think a process is slowing down due to page faults?
>>>
>>> Memory access speed dropped from 90 MB per second to 17 MB /
>>> second, whenever I used more than 480 MB.
>>
>> Those numbers sound off by a few orders of magnitude. What exactly
>> are you measuring, and how?
>
> Here is the C++ source

Very funny.

> #include <stdio.h>
> #include <stdlib.h>
> #include <vector>
> #include <time.h>
>
> //#define uint32 unsigned int
> typedef unsigned int uint32;
>
> const uint32 size   = 100000000;
> std::vector<uint32> Data;
> uint32 Max = 0x3fffffff;
>
> double Process() {
>   clock_t finish;
>   clock_t start = clock();
>   double duration;
>   uint32 num = 0;
>   for (uint32 N = 0; N < Max; N++)
>     num = Data[num];
>   finish = clock();
>   duration = (double)(finish - start) / CLOCKS_PER_SEC;
>   return duration;
> }

Even gcc manages to optimise this entire loop into nothing.  Your test
is meaningless.

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/26/2010 5:35:31 AM
Peter Olcott wrote:

> I want there to be no page faults for my process.

That may be hard, but start by not having any swap space.

> I want to limit the sum total of everything else to half of physical
> memory.

I don't know how/if you can do that.

But it really sounds like you should use MaRTE OS or something like
that, and not Linux.

Greetings,

Jacob
-- 
"The same equations have the same solutions."
0
Jacob
3/26/2010 9:39:23 AM
On 26.3.10 6:07 , Peter Olcott wrote:
> I want there to be no page faults for my process.

You cannot. Practically all disk reads will be handled
as page faults at the very bottom.

Please get a good book, e.g. Understanding the Linux
Kernel, and read and understand the chapters of memory
management.

You are very probably barking up the wrong tree - the
slow-down is caused by something else than paging.

One cause may be hardware. On many motherboards, there
is not enough cache to cover all of 2 GiB.

-- 

Tauno Voipio, MSEE
tauno voipio (at) iki fi


  I want to
> limit the sum total of everything else to half of physical
> memory.
>
> "David Schwartz"<davids@webmaster.com>  wrote in message
> news:1a5bf20b-626a-428b-813d-4aeb0fbc5b8c@v34g2000prm.googlegroups.com...
> On Mar 25, 8:41 pm, "Peter Olcott"<NoS...@OCR4Screen.com>
> wrote:
>
>> My problem is that the OS is eating up 1.5 GB. I would be
>> thrilled if I could get it down to 500 MB.
>
> What exactly are you measuring? The OS will, and should, use
> all of
> your memory. That's what it's for. Page faults are normal
> and don't
> indicate a problem (they're part of how the OS tracks memory
> usage).
>
> I think there's a strong possibility you're misunderstanding
> what
> you're seeing and your actual problem has nothing to do with
> usage or
> exhaustion of physical memory.
>
> DS
>
>

0
Tauno
3/26/2010 10:25:10 AM
On Mar 25, 9:07 pm, "Peter Olcott" <NoS...@OCR4Screen.com> wrote:

> I want there to be no page faults for my process. I want to
> limit the sum total of everything else to half of physical
> memory.

It doesn't work that way. I think you have a mistaken notion of what a
page fault actually is.

A page fault occurs when a process goes to access a page of memory,
and the operating system has to intervene in some way. No matter what
you do, your process will take at least three different kinds of page
faults:

1) When the code first runs, as the code faults in, the process will
take page faults. The code pages are not loaded at first, so the OS
will need to intervene when they are accessed.

2) As you allocate memory, pages need to be mapped into your process.
Because the pages will at some point not be mapped, page faults will
be needed to map them.

3) As you use memory, you will occasionally access a page of memory
that has not been accessed since the last tick of the page ager. Since
the OS needs to mark these pages as recently accessed (to know not to
swap them out), page faults will be needed.

Unless you have some strong evidence that page faults are part of some
kind of problem, you are just trying to make the operating system act
against its own logic and sense. The operating system deliberately
uses page faults to handle demand loading, page aging and LRU
tracking, lazy memory allocations, and many more things. You would
have to lose all of these valuable features to get "no page faults".

DS
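
For anyone who wants to see the distinction in practice, a minimal sketch
(assuming POSIX getrusage(), which Linux provides) that prints a process's
own fault counters; minor faults are the cheap bookkeeping kind described
above, major faults are the ones that actually wait on disk:

#include <stdio.h>
#include <sys/resource.h>

// Minimal sketch: report this process's minor (serviced without disk
// I/O) and major (had to read from disk) page fault counts.
int main(void) {
  struct rusage ru;
  if (getrusage(RUSAGE_SELF, &ru) != 0) {
    perror("getrusage");
    return 1;
  }
  printf("minor faults: %ld  major faults: %ld\n",
         ru.ru_minflt, ru.ru_majflt);
  return 0;
}
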
0
David
3/26/2010 10:54:27 AM
On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> wrote:

> I want there to be no page faults for my process.

Then lock those pages down.

> I want to limit the sum total of everything else to half of physical 
> memory.

I'm not aware of any way to do that unless you have your critical app
allocate half of physical ram and then lock those pages.
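
A minimal sketch of the locking approach (an illustration, assuming root
privileges or a large enough RLIMIT_MEMLOCK):

#include <stdio.h>
#include <sys/mman.h>

// Minimal sketch: pin everything this process has mapped, and anything
// it maps later, into physical RAM so it can never be paged out.
int main(void) {
  if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
    perror("mlockall");
    return 1;
  }
  /* ... allocate and touch the working set here ... */
  return 0;
}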

-- 
Grant Edwards               grant.b.edwards        Yow! If Robert Di Niro
                                  at               assassinates Walter Slezak,
                              gmail.com            will Jodie Foster marry
                                                   Bonzo??
0
Grant
3/26/2010 1:23:34 PM
Tauno Voipio <tauno.voipio@notused.fi.invalid> writes:

> On 26.3.10 6:07 , Peter Olcott wrote:
>> I want there to be no page faults for my process.
>
> You cannot. Practically all disk reads will be handled
> as page faults at the very bottom.

On CPUs without hardware page table walking, every TLB miss generates
a page fault too.  Page faults in themselves are really nothing to be
worried about.

> Please get a good book, e.g. Understanding the Linux
> Kernel, and read and understand the chapters of memory
> management.
>
> You are very probably barking up the wrong tree - the
> slow-down is caused by something else than paging.
>
> One cause may be hardware. On many motherboards, there
> is not enough cache to cover all of 2 GiB.

I doubt that is the case unless it is very old.

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/26/2010 1:58:55 PM
Grant Edwards <invalid@invalid.invalid> writes:

> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>
>> I want there to be no page faults for my process.
>
> Then lock those pages down.
>
>> I want to limit the sum total of everything else to half of physical 
>> memory.
>
> I'm not aware of any way to do that unless you have your critical app
> allocate half of physical ram and then lock those pages.

It is possible to restrict the amount of memory used by Linux using
the mem=X boot parameter.  One can then mmap the memory thus reserved
directly from /dev/mem.  Please note that this is in general a bad idea.
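
For completeness, a rough sketch of what that looks like (heavily hedged:
the base address and window size are made-up illustrations, newer kernels
may restrict /dev/mem, and as said above this is in general a bad idea):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

// Rough sketch: with mem=1536M on the kernel command line, physical
// memory above 1.5 GB is invisible to the kernel and can be mapped
// directly from /dev/mem.  Offset and size here are illustrative only.
int main(void) {
  const off_t  phys_base = 1536UL * 1024 * 1024;  /* just past mem= */
  const size_t length    = 256UL * 1024 * 1024;   /* 256 MB window  */

  int fd = open("/dev/mem", O_RDWR | O_SYNC);
  if (fd < 0) { perror("open /dev/mem"); return 1; }

  void *p = mmap(NULL, length, PROT_READ | PROT_WRITE,
                 MAP_SHARED, fd, phys_base);
  if (p == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

  /* ... use p as a buffer the kernel will never page out ... */
  munmap(p, length);
  close(fd);
  return 0;
}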

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/26/2010 2:06:23 PM
"M�ns Rullg�rd" <mans@mansr.com> wrote in message 
news:yw1xsk7ntzks.fsf@unicorn.mansr.com...
> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>
>> "M�ns Rullg�rd" <mans@mansr.com> wrote in message
>> news:yw1xwrwzu2jc.fsf@unicorn.mansr.com...
>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>
>>>> "Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in message
>>>> news:1bbpeb3eqc.fsf@snowball.wb.pfeifferfamily.net...
>>>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>>>
>>>>>> "Grant Edwards" <invalid@invalid.invalid> wrote in
>>>>>> message
>>>>>> news:hoh807$3ck$1@reader1.panix.com...
>>>>>>> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I did some testing and found out on my 2 GB machine 
>>>>>>>> that I can
>>>>>>>> only load a little less than 500 MB before this 
>>>>>>>> process slows
>>>>>>>> down considerably due to page faults. I want to 
>>>>>>>> know about how
>>>>>>>> much memory usage is required by the OS if most 
>>>>>>>> every
>>>>>>>> inessential process is disabled.
>>>>>>> You can get by with 2MB if you're very careful.  I'd 
>>>>>>> recommend
>>>>>>> 4MB if you want to run an application or two.
>>>>>>>
>>>>>> My problem is that the OS is eating up 1.5 GB. I 
>>>>>> would be
>>>>>> thrilled if I could get it down to 500 MB.
>>>>> Unused memory is wasted memory.  Any RAM you've got 
>>>>> that isn't in
>>>>> use by your programs will be used as various OS 
>>>>> buffers; if your
>>>>> programs need more space, then the space being used 
>>>>> for buffers
>>>>> will decrease.  Seeing the OS taking up 1.5GB means 
>>>>> nothing else
>>>>> wants the memory.  So...  what is the testing that has 
>>>>> led you to
>>>>> think a process is slowing down due to page faults?
>>>>
>>>> Memory access speed dropped from 90 MB per second to 17 
>>>> MB /
>>>> second, whenever I used more than 480 MB.
>>>
>>> Those numbers sound off by a few orders of magnitude. 
>>> What exactly
>>> are you measuring, and how?
>>
>> Here is the C++ source
>
> Very funny.
>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <vector>
>> #include <time.h>
>>
>> //#define uint32 unsigned int
>> typedef unsigned int uint32;
>>
>> const uint32 size   = 100000000;
>> std::vector<uint32> Data;
>> uint32 Max = 0x3fffffff;
>>
>> double Process() {
>>   clock_t finish;
>>   clock_t start = clock();
>>   double duration;
>>   uint32 num = 0;
>>   for (uint32 N = 0; N < Max; N++)
>>     num = Data[num];
>>   finish = clock();
>>   duration = (double)(finish - start) / CLOCKS_PER_SEC;
>>   return duration;
>> }
>
> Even gcc manages to optimise this entire loop into 
> nothing.  Your test
> is meaningless.
>
> -- 
> Måns Rullgård
> mans@mansr.com

Actually this little test program accurately models the 
behavior of my deterministic finite automaton OCR 
recognizing engine, and the loop is not optimized into 
nothing.
   www.OCR4Screen.com 


0
Peter
3/26/2010 2:17:01 PM
"Jacob Sparre Andersen" <sparre@nbi.dk> wrote in message 
news:871vf7zak4.fsf@hugsarin.sparre-andersen.dk...
> Peter Olcott wrote:
>
>> I want there to be no page faults for my process.
>
> That may be hard, but start by avoiding to have any swap 
> space.
>
>> I want to limit the sum total of everything else to half 
>> of physical
>> memory.
>
> I don't know how/if you can do that.
>
> But it really sounds like you should use MaRTE OS or 
> something like
> that, and not Linux.
>
> Greetings,
>
> Jacob
> -- 
> "The same equations have the same solutions."

(1) The service provider that I will be using only has a few 
choices of OS.
(2) There is some sort of way to lock pages into memory that 
I read about. 


0
Peter
3/26/2010 2:18:42 PM
I am sure that it is page faults. Using all of those 
features would be fine because this web server is dedicated 
to run my process only.

"David Schwartz" <davids@webmaster.com> wrote in message 
news:c7801090-ae70-4b5b-b805-25de853050a4@a16g2000pre.googlegroups.com...
On Mar 25, 9:07 pm, "Peter Olcott" <NoS...@OCR4Screen.com> 
wrote:

> I want there to be no page faults for my process. I want 
> to
> limit the sum total of everything else to half of physical
> memory.

It doesn't work that way. I think you have a mistaken notion 
of what a
page fault actually is.

A page fault occurs when a process goes to access a page of 
memory,
and the operating system has to intervene in some way. No 
matter what
you do, your process will take at least three different 
kinds of page
faults:

1) When the code first runs, as the code faults in, the 
process will
take page faults. The code pages are not loaded at first, so 
the OS
will need to intervene when they are accessed.

2) As you allocate memory, pages need to be mapped into your 
process.
Because the pages will at some point not be mapped, page 
faults will
be needed to map them.

3) As you use memory, you will occasionally access a page of 
memory
that has not been accessed since the last tick of the page 
ager. Since
the OS needs to mark these pages as recently accessed (to 
know not to
swap them out), page faults will be needed.

Unless you have some strong evidence that page faults are 
part of some
kind of problem, you are just trying to make the operating 
system act
against its own logic and sense. The operating system 
deliberately
uses page faults to handle demand loading, page aging and 
LRU
tracking, lazy memory allocations, and many more things. You 
would
have to lose all of these valuable features to get "no page 
faults".

DS 


0
Peter
3/26/2010 2:22:23 PM
"Grant Edwards" <invalid@invalid.invalid> wrote in message 
news:hoicgm$ltg$1@reader1.panix.com...
> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>
>> I want there to be no page faults for my process.
>
> Then lock those pages down.
>
>> I want to limit the sum total of everything else to half 
>> of physical
>> memory.
>
> I'm not aware of any way to do that unless you have your 
> critical app
> allocate half of physical ram and then lock those pages.
>

Perfect, this is exactly what I was looking for.  Where can I 
find out more about this?

> -- 
> Grant Edwards               grant.b.edwards        Yow! If 
> Robert Di Niro
>                                  at 
> assassinates Walter Slezak,
>                              gmail.com            will 
> Jodie Foster marry
>                                                   Bonzo?? 


0
Peter
3/26/2010 2:23:26 PM
"Tauno Voipio" <tauno.voipio@notused.fi.invalid> wrote in 
message news:hoi227$p3c$1@news.eternal-september.org...
> On 26.3.10 6:07 , Peter Olcott wrote:
>> I want there to be no page faults for my process.
>
> You cannot. Practically all disk reads will be handled
> as page faults at the very bottom.
>
> Please get a good book, e.g. Understanding the Linux
> Kernel, and read and understand the chapters of memory
> management.
>
> You are very probably barking up the wrong tree - the
> slow-down is caused by something else than paging.

I am sure that the slowing down is paging.

>
> One cause may be hardware. On many motherboards, there
> is not enough cache to cover all of 2 GiB.

My most powerful machine has only 8 MB of the slowest (L3) 
cache.

Isn't it something that we have to come up with a different 
abbreviation for gigabyte simply because the hard drive 
manufacturers have been so persistent in lying about the size 
of a gigabyte that the lie has finally become the truth?

>
> -- 
>
> Tauno Voipio, MSEE
> tauno voipio (at) iki fi
>
>
>  I want to
>> limit the sum total of everything else to half of 
>> physical
>> memory.
>>
>> "David Schwartz"<davids@webmaster.com>  wrote in message
>> news:1a5bf20b-626a-428b-813d-4aeb0fbc5b8c@v34g2000prm.googlegroups.com...
>> On Mar 25, 8:41 pm, "Peter Olcott"<NoS...@OCR4Screen.com>
>> wrote:
>>
>>> My problem is that the OS is eating up 1.5 GB. I would 
>>> be
>>> thrilled if I could get it down to 500 MB.
>>
>> What exactly are you measuring? The OS will, and should, 
>> use
>> all of
>> your memory. That's what it's for. Page faults are normal
>> and don't
>> indicate a problem (they're part of how the OS tracks 
>> memory
>> usage).
>>
>> I think there's a strong possibility you're 
>> misunderstanding
>> what
>> you're seeing and your actual problem has nothing to do 
>> with
>> usage or
>> exhaustion of physical memory.
>>
>> DS
>>
>>
> 


0
Peter
3/26/2010 2:27:33 PM
"M�ns Rullg�rd" <mans@mansr.com> wrote in message 
news:yw1xfx3ntbxc.fsf@unicorn.mansr.com...
> Grant Edwards <invalid@invalid.invalid> writes:
>
>> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> 
>> wrote:
>>
>>> I want there to be no page faults for my process.
>>
>> Then lock those pages down.
>>
>>> I want to limit the sum total of everything else to half 
>>> of physical
>>> memory.
>>
>> I'm not aware of any way to do that unless you have your 
>> critical app
>> allocate half of physical ram and then lock those pages.
>
> It is possible to restrict the amount of memory used by 
> Linux using
> the mem=X boot parameter.  One can then mmap the memory 
> thus reserved
> directly from /dev/mem.  Please note that this is in 
> general a bad idea.
>
> -- 
> Måns Rullgård
> mans@mansr.com

Which of these two methods is better in my special case? I 
was thinking that locking the pages might be better because 
it tells every process, OS or not, that these pages cannot be 
swapped out. 


0
Peter
3/26/2010 2:29:50 PM
On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
> "Grant Edwards" <invalid@invalid.invalid> wrote in message news:hoicgm$ltg$1@reader1.panix.com...
>> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>>
>>> I want there to be no page faults for my process.
>>
>> Then lock those pages down.
>>
>>> I want to limit the sum total of everything else to half of physical
>>> memory.
>>
>> I'm not aware of any way to do that unless you have your critical app
>> allocate half of physical ram and then lock those pages.
>
> Perfect, this is exactly what I was looking for.  Where can I 
> find out more about this?

http://www.google.com/search?q=linux+lock+pages

-- 
Grant Edwards               grant.b.edwards        Yow! Mary Tyler Moore's
                                  at               SEVENTH HUSBAND is wearing
                              gmail.com            my DACRON TANK TOP in a
                                                   cheap hotel in HONOLULU!
0
Grant
3/26/2010 2:30:07 PM
On Thu, 25 Mar 2010 23:07:26 -0500, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>I want there to be no page faults for my process. I want to 
>limit the sum total of everything else to half of physical 
>memory.

turn off swapping.  Linux doesn't need to swap code that is available
from a library; it simply unmaps it, knowing it can always go back and
get it.
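
On a running system that normally means "swapoff -a" (and removing the swap
entries from /etc/fstab so it stays off after a reboot).  A minimal
programmatic sketch of the same thing, assuming root and the Linux-specific
swapoff(2) call:

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <sys/swap.h>

// Minimal sketch (needs root): disable every active swap area listed
// in /proc/swaps -- the programmatic equivalent of "swapoff -a".
int main() {
  std::ifstream swaps("/proc/swaps");
  std::string line;
  std::getline(swaps, line);            // skip the header line
  while (std::getline(swaps, line)) {
    std::istringstream fields(line);
    std::string device;
    fields >> device;                   // first column is the device path
    if (!device.empty() && swapoff(device.c_str()) != 0)
      std::cerr << "swapoff " << device << " failed\n";
  }
  return 0;
}
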
0
AZ
3/26/2010 2:59:48 PM
"AZ Nomad" <aznomad.3@PremoveOBthisOX.COM> wrote in message 
news:slrnhqpiv4.sk5.aznomad.3@ip70-176-155-130.ph.ph.cox.net...
> On Thu, 25 Mar 2010 23:07:26 -0500, Peter Olcott 
> <NoSpam@OCR4Screen.com> wrote:
>>I want there to be no page faults for my process. I want 
>>to
>>limit the sum total of everything else to half of physical
>>memory.
>
> turn off swapping.  Linux doesn't need to swap code that 
> is available
> from a library, it simply unmaps it knowing it can always 
> go back and
> get it.

I like the idea of locking pages better. Turning off 
swapping completely might affect the performance and 
reliability of the OS too much. 


0
Peter
3/26/2010 3:02:52 PM
"Peter Olcott" <NoSpam@OCR4Screen.com> writes:

> "M�ns Rullg�rd" <mans@mansr.com> wrote in message 
> news:yw1xsk7ntzks.fsf@unicorn.mansr.com...
>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>
>>> "M�ns Rullg�rd" <mans@mansr.com> wrote in message
>>> news:yw1xwrwzu2jc.fsf@unicorn.mansr.com...
>>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>>
>>>>> "Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in message
>>>>> news:1bbpeb3eqc.fsf@snowball.wb.pfeifferfamily.net...
>>>>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>>>>
>>>>>>> "Grant Edwards" <invalid@invalid.invalid> wrote in
>>>>>>> message
>>>>>>> news:hoh807$3ck$1@reader1.panix.com...
>>>>>>>> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I did some testing and found out on my 2 GB machine that I
>>>>>>>>> can only load a little less than 500 MB before this process
>>>>>>>>> slows down considerably due to page faults. I want to know
>>>>>>>>> about how much memory usage is required by the OS if most
>>>>>>>>> every inessential process is disabled.
>>>>>>>> You can get by with 2MB if you're very careful.  I'd
>>>>>>>> recommend 4MB if you want to run an application or two.
>>>>>>>>
>>>>>>> My problem is that the OS is eating up 1.5 GB. I would be
>>>>>>> thrilled if I could get it down to 500 MB.
>>>>>> Unused memory is wasted memory.  Any RAM you've got that isn't
>>>>>> in use by your programs will be used as various OS buffers; if
>>>>>> your programs need more space, then the space being used for
>>>>>> buffers will decrease.  Seeing the OS taking up 1.5GB means
>>>>>> nothing else wants the memory.  So...  what is the testing that
>>>>>> has led you to think a process is slowing down due to page
>>>>>> faults?
>>>>>
>>>>> Memory access speed dropped from 90 MB per second to 17 MB /
>>>>> second, whenever I used more than 480 MB.
>>>>
>>>> Those numbers sound off by a few orders of magnitude. What
>>>> exactly are you measuring, and how?
>>>
>>> Here is the C++ source
>>
>> Very funny.
>>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <vector>
>>> #include <time.h>
>>>
>>> //#define uint32 unsigned int
>>> typedef unsigned int uint32;
>>>
>>> const uint32 size   = 100000000;
>>> std::vector<uint32> Data;
>>> uint32 Max = 0x3fffffff;
>>>
>>> double Process() {
>>>   clock_t finish;
>>>   clock_t start = clock();
>>>   double duration;
>>>   uint32 num = 0;
>>>   for (uint32 N = 0; N < Max; N++)
>>>     num = Data[num];
>>>   finish = clock();
>>>   duration = (double)(finish - start) / CLOCKS_PER_SEC;
>>>   return duration;
>>> }
>>
>> Even gcc manages to optimise this entire loop into nothing.  Your
>> test is meaningless.
>
> Actually this little test program accurately models the 
> behavior of my deterministic finite automaton OCR 
> recognizing engine, and the loop is not optimized into 
> nothing.

I compiled your test app.  There is no loop.  If your compiler creates
a loop, it is truly an awful compiler.

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/26/2010 4:18:18 PM
"Peter Olcott" <NoSpam@OCR4Screen.com> writes:

> "M�ns Rullg�rd" <mans@mansr.com> wrote in message
> news:yw1xfx3ntbxc.fsf@unicorn.mansr.com...
>> Grant Edwards <invalid@invalid.invalid> writes:
>>
>>> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>>>
>>>> I want there to be no page faults for my process.
>>> Then lock those pages down.
>>>
>>>> I want to limit the sum total of everything else to half of
>>>> physical memory.
>>> I'm not aware of any way to do that unless you have your critical
>>> app allocate half of physical ram and then lock those pages.
>> It is possible to restrict the amount of memory used by Linux using
>> the mem=X boot parameter.  One can then mmap the memory thus
>> reserved directly from /dev/mem.  Please note that this is in
>> general a bad idea.
>
> Which of these two methods is better in my special case? I was
> thinking that locking the pages might be better because it tells every
> process, OS or not, that these pages cannot be swapped out.

From what you've told us so far, there is _nothing_ remotely special
about your case.

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/26/2010 4:23:37 PM
"Peter Olcott" <NoSpam@OCR4Screen.com> writes:

> "AZ Nomad" <aznomad.3@PremoveOBthisOX.COM> wrote in message
> news:slrnhqpiv4.sk5.aznomad.3@ip70-176-155-130.ph.ph.cox.net...
>> On Thu, 25 Mar 2010 23:07:26 -0500, Peter Olcott
>> <NoSpam@OCR4Screen.com> wrote:
>>>I want there to be no page faults for my process. I want to limit
>>>the sum total of everything else to half of physical memory.
>> turn off swapping.  Linux doesn't need to swap code that is
>> available from a library, it simply unmaps it knowing it can always
>> go back and get it.
>
> I like the idea of locking pages better. Turning off swapping
> completely might affect the performance and reliability of the OS too
> much.

Sorry to break it to you, but you clearly have no clue about memory
management in a modern OS.

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/26/2010 4:24:54 PM
"M�ns Rullg�rd" <mans@mansr.com> wrote in message 
news:yw1xbpebt5th.fsf@unicorn.mansr.com...
> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>
>> "M�ns Rullg�rd" <mans@mansr.com> wrote in message
>> news:yw1xsk7ntzks.fsf@unicorn.mansr.com...
>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>
>>>> "M�ns Rullg�rd" <mans@mansr.com> wrote in message
>>>> news:yw1xwrwzu2jc.fsf@unicorn.mansr.com...
>>>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>>>
>>>>>> "Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in 
>>>>>> message
>>>>>> news:1bbpeb3eqc.fsf@snowball.wb.pfeifferfamily.net...
>>>>>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>>>>>
>>>>>>>> "Grant Edwards" <invalid@invalid.invalid> wrote in
>>>>>>>> message
>>>>>>>> news:hoh807$3ck$1@reader1.panix.com...
>>>>>>>>> On 2010-03-26, Peter Olcott 
>>>>>>>>> <NoSpam@OCR4Screen.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I did some testing and found out on my 2 GB 
>>>>>>>>>> machine that I
>>>>>>>>>> can only load a little less than 500 MB before 
>>>>>>>>>> this process
>>>>>>>>>> slows down considerably due to page faults. I 
>>>>>>>>>> want to know
>>>>>>>>>> about how much memory usage is required by the OS 
>>>>>>>>>> if most
>>>>>>>>>> every inessential process is disabled.
>>>>>>>>> You can get by with 2MB if you're very careful. 
>>>>>>>>> I'd
>>>>>>>>> recommend 4MB if you want to run an application or 
>>>>>>>>> two.
>>>>>>>>>
>>>>>>>> My problem is that the OS is eating up 1.5 GB. I 
>>>>>>>> would be
>>>>>>>> thrilled if I could get it down to 500 MB.
>>>>>>> Unused memory is wasted memory.  Any RAM you've got 
>>>>>>> that isn't
>>>>>>> in use by your programs will be used as various OS 
>>>>>>> buffers; if
>>>>>>> your programs need more space, then the space being 
>>>>>>> used for
>>>>>>> buffers will decrease.  Seeing the OS taking up 
>>>>>>> 1.5GB means
>>>>>>> nothing else wants the memory.  So...  what is the 
>>>>>>> testing that
>>>>>>> has led you to think a process is slowing down due 
>>>>>>> to page
>>>>>>> faults?
>>>>>>
>>>>>> Memory access speed dropped from 90 MB per second to 
>>>>>> 17 MB /
>>>>>> second, whenever I used more than 480 MB.
>>>>>
>>>>> Those numbers sound off by a few orders of magnitude. 
>>>>> What
>>>>> exactly are you measuring, and how?
>>>>
>>>> Here is the C++ source
>>>
>>> Very funny.
>>>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>> #include <vector>
>>>> #include <time.h>
>>>>
>>>> //#define uint32 unsigned int
>>>> typedef unsigned int uint32;
>>>>
>>>> const uint32 size   = 100000000;
>>>> std::vector<uint32> Data;
>>>> uint32 Max = 0x3fffffff;
>>>>
>>>> double Process() {
>>>>   clock_t finish;
>>>>   clock_t start = clock();
>>>>   double duration;
>>>>   uint32 num = 0;
>>>>   for (uint32 N = 0; N < Max; N++)
>>>>     num = Data[num];
>>>>   finish = clock();
>>>>   duration = (double)(finish - start) / CLOCKS_PER_SEC;
>>>>   return duration;
>>>> }
>>>
>>> Even gcc manages to optimise this entire loop into 
>>> nothing.  Your
>>> test is meaningless.
>>
>> Actually this little test program accurately models the
>> behavior of my deterministic finite automaton OCR
>> recognizing engine, and the loop is not optimized into
>> nothing.
>
> I compiled your test app.  There is no loop.  If your 
> compiler creates
> a loop, it is truly an awful compiler.
>
> -- 
> Måns Rullgård
> mans@mansr.com

OK then g++ must be a truly awful compiler. 


0
Peter
3/26/2010 4:50:23 PM
"M�ns Rullg�rd" <mans@mansr.com> wrote in message 
news:yw1x39znt5ih.fsf@unicorn.mansr.com...
> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>
>> "AZ Nomad" <aznomad.3@PremoveOBthisOX.COM> wrote in 
>> message
>> news:slrnhqpiv4.sk5.aznomad.3@ip70-176-155-130.ph.ph.cox.net...
>>> On Thu, 25 Mar 2010 23:07:26 -0500, Peter Olcott
>>> <NoSpam@OCR4Screen.com> wrote:
>>>>I want there to be no page faults for my process. I want 
>>>>to limit
>>>>the sum total of everything else to half of physical 
>>>>memory.
>>> turn off swapping.  Linux doesn't need to swap code that 
>>> is
>>> available from a library, it simply unmaps it knowing it 
>>> can always
>>> go back and get it.
>>
>> I like the idea of locking pages better. Turning off 
>> swapping
>> completely might affect the performance and reliability 
>> of the OS too
>> much.
>
> Sorry to break it to you, but you clearly have no clue 
> about memory
> management in a modern OS.
>
> -- 
> Måns Rullgård
> mans@mansr.com

I have a real time process. 


0
Peter
3/26/2010 4:51:11 PM
On 03/26/10 12:18, Måns Rullgård wrote:
> "Peter Olcott"<NoSpam@OCR4Screen.com>  writes:
>
>> "M�ns Rullg�rd"<mans@mansr.com>  wrote in message
>> news:yw1xsk7ntzks.fsf@unicorn.mansr.com...
>>> "Peter Olcott"<NoSpam@OCR4Screen.com>  writes:
>>>
>>>> "M�ns Rullg�rd"<mans@mansr.com>  wrote in message
>>>> news:yw1xwrwzu2jc.fsf@unicorn.mansr.com...
>>>>> "Peter Olcott"<NoSpam@OCR4Screen.com>  writes:
>>>>>
>>>>>> "Joe Pfeiffer"<pfeiffer@cs.nmsu.edu>  wrote in message
>>>>>> news:1bbpeb3eqc.fsf@snowball.wb.pfeifferfamily.net...
>>>>>>> "Peter Olcott"<NoSpam@OCR4Screen.com>  writes:
>>>>>>>
>>>>>>>> "Grant Edwards"<invalid@invalid.invalid>  wrote in
>>>>>>>> message
>>>>>>>> news:hoh807$3ck$1@reader1.panix.com...
>>>>>>>>> On 2010-03-26, Peter Olcott<NoSpam@OCR4Screen.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I did some testing and found out on my 2 GB machine that I
>>>>>>>>>> can only load a little less than 500 MB before this process
>>>>>>>>>> slows down considerably due to page faults. I want to know
>>>>>>>>>> about how much memory usage is required by the OS if most
>>>>>>>>>> every inessential process is disabled.
>>>>>>>>> You can get by with 2MB if you're very careful.  I'd
>>>>>>>>> recommend 4MB if you want to run an application or two.
>>>>>>>>>
>>>>>>>> My problem is that the OS is eating up 1.5 GB. I would be
>>>>>>>> thrilled if I could get it down to 500 MB.
>>>>>>> Unused memory is wasted memory.  Any RAM you've got that isn't
>>>>>>> in use by your programs will be used as various OS buffers; if
>>>>>>> your programs need more space, then the space being used for
>>>>>>> buffers will decrease.  Seeing the OS taking up 1.5GB means
>>>>>>> nothing else wants the memory.  So...  what is the testing that
>>>>>>> has led you to think a process is slowing down due to page
>>>>>>> faults?
>>>>>>
>>>>>> Memory access speed dropped from 90 MB per second to 17 MB /
>>>>>> second, whenever I used more than 480 MB.
>>>>>
>>>>> Those numbers sound off by a few orders of magnitude. What
>>>>> exactly are you measuring, and how?
>>>>
>>>> Here is the C++ source
>>>
>>> Very funny.
>>>
>>>> #include<stdio.h>
>>>> #include<stdlib.h>
>>>> #include<vector>
>>>> #include<time.h>
>>>>
>>>> //#define uint32 unsigned int
>>>> typedef unsigned int uint32;
>>>>
>>>> const uint32 size   = 100000000;
>>>> std::vector<uint32>  Data;
>>>> uint32 Max = 0x3fffffff;
>>>>
>>>> double Process() {
>>>>    clock_t finish;
>>>>    clock_t start = clock();
>>>>    double duration;
>>>>    uint32 num = 0;
>>>>    for (uint32 N = 0; N<  Max; N++)
>>>>      num = Data[num];
>>>>    finish = clock();
>>>>    duration = (double)(finish - start) / CLOCKS_PER_SEC;
>>>>    return duration;
>>>> }
>>>
>>> Even gcc manages to optimise this entire loop into nothing.  Your
>>> test is meaningless.
>>
>> Actually this little test program accurately models the
>> behavior of my deterministic finite automaton OCR
>> recognizing engine, and the loop is not optimized into
>> nothing.
>
> I compiled your test app.  There is no loop.  If your compiler creates
> a loop, it is truly an awful compiler.

Or the optimizer turned off?
0
Joe
3/26/2010 5:01:57 PM
Måns Rullgård <mans@mansr.com> writes:

> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>
>> "M�ns Rullg�rd" <mans@mansr.com> wrote in message 
>> news:yw1xsk7ntzks.fsf@unicorn.mansr.com...
>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>
>>>> "M�ns Rullg�rd" <mans@mansr.com> wrote in message
>>>> news:yw1xwrwzu2jc.fsf@unicorn.mansr.com...
>>>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>>>
>>>>>> "Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in message
>>>>>> news:1bbpeb3eqc.fsf@snowball.wb.pfeifferfamily.net...
>>>>>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>>>>>
>>>>>>>> "Grant Edwards" <invalid@invalid.invalid> wrote in
>>>>>>>> message
>>>>>>>> news:hoh807$3ck$1@reader1.panix.com...
>>>>>>>>> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I did some testing and found out on my 2 GB machine that I
>>>>>>>>>> can only load a little less than 500 MB before this process
>>>>>>>>>> slows down considerably due to page faults. I want to know
>>>>>>>>>> about how much memory usage is required by the OS if most
>>>>>>>>>> every inessential process is disabled.
>>>>>>>>> You can get by with 2MB if you're very careful.  I'd
>>>>>>>>> recommend 4MB if you want to run an application or two.
>>>>>>>>>
>>>>>>>> My problem is that the OS is eating up 1.5 GB. I would be
>>>>>>>> thrilled if I could get it down to 500 MB.
>>>>>>> Unused memory is wasted memory.  Any RAM you've got that isn't
>>>>>>> in use by your programs will be used as various OS buffers; if
>>>>>>> your programs need more space, then the space being used for
>>>>>>> buffers will decrease.  Seeing the OS taking up 1.5GB means
>>>>>>> nothing else wants the memory.  So...  what is the testing that
>>>>>>> has led you to think a process is slowing down due to page
>>>>>>> faults?
>>>>>>
>>>>>> Memory access speed dropped from 90 MB per second to 17 MB /
>>>>>> second, whenever I used more than 480 MB.
>>>>>
>>>>> Those numbers sound off by a few orders of magnitude. What
>>>>> exactly are you measuring, and how?
>>>>
>>>> Here is the C++ source
>>>
>>> Very funny.
>>>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>> #include <vector>
>>>> #include <time.h>
>>>>
>>>> //#define uint32 unsigned int
>>>> typedef unsigned int uint32;
>>>>
>>>> const uint32 size   = 100000000;
>>>> std::vector<uint32> Data;
>>>> uint32 Max = 0x3fffffff;
>>>>
>>>> double Process() {
>>>>   clock_t finish;
>>>>   clock_t start = clock();
>>>>   double duration;
>>>>   uint32 num = 0;
>>>>   for (uint32 N = 0; N < Max; N++)
>>>>     num = Data[num];
>>>>   finish = clock();
>>>>   duration = (double)(finish - start) / CLOCKS_PER_SEC;
>>>>   return duration;
>>>> }
>>>
>>> Even gcc manages to optimise this entire loop into nothing.  Your
>>> test is meaningless.
>>
>> Actually this little test program accurately models the 
>> behavior of my deterministic finite automaton OCR 
>> recognizing engine, and the loop is not optimized into 
>> nothing.
>
> I compiled your test app.  There is no loop.  If your compiler creates
> a loop, it is truly an awful compiler.

The goal of a benchmark like his isn't to run efficiently, it's to
measure how some other application will really run.  So you don't
optimize at a high enough level to optimize the loop away.
-- 
As we enjoy great advantages from the inventions of others, we should
be glad of an opportunity to serve others by any invention of ours;
and this we should do freely and generously. (Benjamin Franklin)
0
Joe
3/26/2010 5:03:29 PM
"Joe Beanfish" <joe@nospam.duh> wrote in message 
news:hoipa5$uvm@news.thunderstone.com...
> On 03/26/10 12:18, Måns Rullgård wrote:
>> "Peter Olcott"<NoSpam@OCR4Screen.com>  writes:
>>
>>>>> Here is the C++ source
>>>>
>>>> Very funny.
>>>>
>>>>> #include<stdio.h>
>>>>> #include<stdlib.h>
>>>>> #include<vector>
>>>>> #include<time.h>
>>>>>
>>>>> //#define uint32 unsigned int
>>>>> typedef unsigned int uint32;
>>>>>
>>>>> const uint32 size   = 100000000;
>>>>> std::vector<uint32>  Data;
>>>>> uint32 Max = 0x3fffffff;
>>>>>
>>>>> double Process() {
>>>>>    clock_t finish;
>>>>>    clock_t start = clock();
>>>>>    double duration;
>>>>>    uint32 num = 0;
>>>>>    for (uint32 N = 0; N<  Max; N++)
>>>>>      num = Data[num];
>>>>>    finish = clock();
>>>>>    duration = (double)(finish - start) / 
>>>>> CLOCKS_PER_SEC;
>>>>>    return duration;
>>>>> }
>>>>
>>>> Even gcc manages to optimise this entire loop into 
>>>> nothing.  Your
>>>> test is meaningless.
>>>
>>> Actually this little test program accurately models the
>>> behavior of my deterministic finite automaton OCR
>>> recognizing engine, and the loop is not optimized into
>>> nothing.
>>
>> I compiled your test app.  There is no loop.  If your 
>> compiler creates
>> a loop, it is truly an awful compiler.
>
> Or the optimizer turned off?

The optimizer should not change the semantics enough to 
optimize away this loop. 


0
Peter
3/26/2010 5:17:51 PM
"Peter Olcott" <NoSpam@OCR4Screen.com> writes:

> "Joe Beanfish" <joe@nospam.duh> wrote in message 
> news:hoipa5$uvm@news.thunderstone.com...
>>
>> Or the optimizer turned off?
>
> The optimizer should not change the semantics enough to 
> optimize away this loop. 

Oh, yes it could.  It can easily notice that num is never read after
it's written, and then notice that all assignments to it are unnecessary,
and then notice the whole loop is unnecessary.

A couple of weeks ago I wrote a toy bubblesort program so my students
could have some more-or-less real code to compare branch predictors on.
When I didn't read the sorted array, -O3 optimized the whole bubblesort
away.  When I did read it, it completely unrolled the entire
doubly-nested loop (it was a tiny array).

I've been looking at your benchmark program; I don't have an explanation
for what you're seeing, except to say I'm seeing it too, but I'm
watching vmstat while it runs.  0 swaps in, 0 out.
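
A minimal way to keep the measurement honest under optimisation (an
illustrative variant, not the code Peter posted) is simply to make the
chased value observable, for example by printing it; because every load
depends on the previous one, the compiler then has to perform all of them:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <vector>

typedef unsigned int uint32;

static const uint32 size = 100000000;   // ~400 MB of uint32
static const uint32 Max  = 0x3fffffff;
static std::vector<uint32> Data;

int main() {
  Data.resize(size);
  for (uint32 N = 0; N < size; N++)     // overflow-free random indices
    Data[N] = (uint32)(((unsigned long long)rand() * RAND_MAX + rand()) % size);

  clock_t start = clock();
  uint32 num = 0;
  for (uint32 N = 0; N < Max; N++)      // dependent loads, as in the original
    num = Data[num];
  clock_t finish = clock();

  double duration = (double)(finish - start) / CLOCKS_PER_SEC;
  printf("%4.2f seconds, last index %u\n", duration, num);  // num is used,
  return 0;                             // so the loop cannot be removed
}
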
-- 
As we enjoy great advantages from the inventions of others, we should
be glad of an opportunity to serve others by any invention of ours;
and this we should do freely and generously. (Benjamin Franklin)
0
Joe
3/26/2010 5:41:28 PM
Peter Olcott writes:
> I have a real time process.

Then select a real-time scheduler, lock your process's memory, and be
happy.  "page faults" are not faults in the sense of being some sort of
error.  They are a normal part of the operation of any virtual memory
OS.

And learn to trim your replies (everyone, not just Peter).
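
A minimal sketch of that combination (an illustration, assuming root or
suitable rtprio/memlock limits): put the process in a real-time scheduling
class and pin its pages in RAM.

#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>

// Minimal sketch: real-time scheduling plus locked memory.  Priority 50
// is an arbitrary illustrative value (SCHED_FIFO allows 1..99).
int main(void) {
  struct sched_param sp;
  sp.sched_priority = 50;
  if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
    perror("sched_setscheduler");

  if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
    perror("mlockall");

  /* ... the latency-sensitive work goes here ... */
  return 0;
}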

-- 
John Hasler 
jhasler@newsguy.com
Dancing Horse Hill
Elmwood, WI USA
0
John
3/26/2010 5:43:29 PM
"John Hasler" <jhasler@newsguy.com> wrote in message 
news:874ok3ugfy.fsf@thumper.dhh.gt.org...
> Peter Olcott writes:
>> I have a real time process.
>
> Then select a real-time scheduler, lock your process's 
> memory, and be
> happy.  "page faults" are not faults in the sense of being 
> some sort of
> error.  They are a normal part of the operation of any 
> virtual memory
> OS.
>
> And learn to trim your replies (everyone, not just Peter).
>
> -- 
> John Hasler
> jhasler@newsguy.com
> Dancing Horse Hill
> Elmwood, WI USA

I tried to lock the memory and it did not work. The code 
compiled OK, but it did not seem to lock all of the memory.
  http://linux.die.net/man/2/mlock

I suspect that this may be the problem.
  RLIMIT_MEMLOCK soft resource limit instead defines a limit 
on how much memory an unprivileged process may lock.
I tried to display this value, but the compiler could not 
find it.

Is this current enough material relating to real-time 
scheduling in Linux?
   http://www.informit.com/articles/article.aspx?p=101760&seqNum=4
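
For reference, RLIMIT_MEMLOCK and the functions that read and change it are
declared in <sys/resource.h>; a missing include is one common reason the
compiler "could not find it".  A minimal sketch (only root can raise the
limit beyond the current hard limit):

#include <stdio.h>
#include <sys/resource.h>

// Minimal sketch: show the current RLIMIT_MEMLOCK, then try to raise it
// to "unlimited" so later mlock()/mlockall() calls can cover a large
// working set.
int main(void) {
  struct rlimit rl;
  if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
    perror("getrlimit");
    return 1;
  }
  printf("RLIMIT_MEMLOCK soft=%llu hard=%llu\n",
         (unsigned long long)rl.rlim_cur,
         (unsigned long long)rl.rlim_max);

  rl.rlim_cur = RLIM_INFINITY;
  rl.rlim_max = RLIM_INFINITY;
  if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0)
    perror("setrlimit (needs root)");

  /* mlock()/mlockall() calls made after this point are checked
     against the new limit */
  return 0;
}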

0
Peter
3/26/2010 6:10:12 PM
Joe Pfeiffer <pfeiffer@cs.nmsu.edu> writes:

> Måns Rullgård <mans@mansr.com> writes:
>
>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>
>>> "M�ns Rullg�rd" <mans@mansr.com> wrote in message 
>>> news:yw1xsk7ntzks.fsf@unicorn.mansr.com...
>>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>>
>>>>> "M�ns Rullg�rd" <mans@mansr.com> wrote in message
>>>>> news:yw1xwrwzu2jc.fsf@unicorn.mansr.com...
>>>>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>>>>
>>>>>>> "Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in message
>>>>>>> news:1bbpeb3eqc.fsf@snowball.wb.pfeifferfamily.net...
>>>>>>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>>>>>>
>>>>>>>>> "Grant Edwards" <invalid@invalid.invalid> wrote in
>>>>>>>>> message
>>>>>>>>> news:hoh807$3ck$1@reader1.panix.com...
>>>>>>>>>> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I did some testing and found out on my 2 GB machine that I
>>>>>>>>>>> can only load a little less than 500 MB before this process
>>>>>>>>>>> slows down considerably due to page faults. I want to know
>>>>>>>>>>> about how much memory usage is required by the OS if most
>>>>>>>>>>> every inessential process is disabled.
>>>>>>>>>> You can get by with 2MB if you're very careful.  I'd
>>>>>>>>>> recommend 4MB if you want to run an application or two.
>>>>>>>>>>
>>>>>>>>> My problem is that the OS is eating up 1.5 GB. I would be
>>>>>>>>> thrilled if I could get it down to 500 MB.
>>>>>>>> Unused memory is wasted memory.  Any RAM you've got that isn't
>>>>>>>> in use by your programs will be used as various OS buffers; if
>>>>>>>> your programs need more space, then the space being used for
>>>>>>>> buffers will decrease.  Seeing the OS taking up 1.5GB means
>>>>>>>> nothing else wants the memory.  So...  what is the testing that
>>>>>>>> has led you to think a process is slowing down due to page
>>>>>>>> faults?
>>>>>>>
>>>>>>> Memory access speed dropped from 90 MB per second to 17 MB /
>>>>>>> second, whenever I used more than 480 MB.
>>>>>>
>>>>>> Those numbers sound off by a few orders of magnitude. What
>>>>>> exactly are you measuring, and how?
>>>>>
>>>>> Here is the C++ source
>>>>
>>>> Very funny.
>>>>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>> #include <vector>
>>>>> #include <time.h>
>>>>>
>>>>> //#define uint32 unsigned int
>>>>> typedef unsigned int uint32;
>>>>>
>>>>> const uint32 size   = 100000000;
>>>>> std::vector<uint32> Data;
>>>>> uint32 Max = 0x3fffffff;
>>>>>
>>>>> double Process() {
>>>>>   clock_t finish;
>>>>>   clock_t start = clock();
>>>>>   double duration;
>>>>>   uint32 num = 0;
>>>>>   for (uint32 N = 0; N < Max; N++)
>>>>>     num = Data[num];
>>>>>   finish = clock();
>>>>>   duration = (double)(finish - start) / CLOCKS_PER_SEC;
>>>>>   return duration;
>>>>> }
>>>>
>>>> Even gcc manages to optimise this entire loop into nothing.  Your
>>>> test is meaningless.
>>>
>>> Actually this little test program accurately models the 
>>> behavior of my deterministic finite automaton OCR 
>>> recognizing engine, and the loop is not optimized into 
>>> nothing.
>>
>> I compiled your test app.  There is no loop.  If your compiler creates
>> a loop, it is truly an awful compiler.
>
> The goal of a benchmark like his isn't to run efficiently, it's to
> measure how some other application will really run.  So you don't
> optimize at a high enough level to optimize the loop away.

If you do that, it is no longer an accurate representation of the real
scenario, if it ever was.  For the test to be meaningful, you must
ensure the code (machine, not source) being tested is exactly the same as
in the real application.

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/26/2010 6:15:37 PM
"Peter Olcott" <NoSpam@OCR4Screen.com> writes:

> "M�ns Rullg�rd" <mans@mansr.com> wrote in message 
> news:yw1xbpebt5th.fsf@unicorn.mansr.com...
>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>
>>> "M�ns Rullg�rd" <mans@mansr.com> wrote in message
>>> news:yw1xsk7ntzks.fsf@unicorn.mansr.com...
>>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>>
>>>>> double Process() {
>>>>>   clock_t finish;
>>>>>   clock_t start = clock();
>>>>>   double duration;
>>>>>   uint32 num = 0;
>>>>>   for (uint32 N = 0; N < Max; N++)
>>>>>     num = Data[num];
>>>>>   finish = clock();
>>>>>   duration = (double)(finish - start) / CLOCKS_PER_SEC;
>>>>>   return duration;
>>>>> }
>>>>
>>>> Even gcc manages to optimise this entire loop into nothing.  Your
>>>> test is meaningless.
>>>
>>> Actually this little test program accurately models the behavior
>>> of my deterministic finite automaton OCR recognizing engine, and
>>> the loop is not optimized into nothing.
>>
>> I compiled your test app.  There is no loop.  If your compiler
>> creates a loop, it is truly an awful compiler.
>
> OK then g++ must be a truly awful compiler. 

That it is, but apparently not awful enough to miss this trivial
optimisation.

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/26/2010 6:18:06 PM
On Fri, 26 Mar 2010 10:02:52 -0500, Peter Olcott <NoSpam@OCR4Screen.com> wrote:

>"AZ Nomad" <aznomad.3@PremoveOBthisOX.COM> wrote in message 
>news:slrnhqpiv4.sk5.aznomad.3@ip70-176-155-130.ph.ph.cox.net...
>> On Thu, 25 Mar 2010 23:07:26 -0500, Peter Olcott 
>> <NoSpam@OCR4Screen.com> wrote:
>>>I want there to be no page faults for my process. I want 
>>>to
>>>limit the sum total of everything else to half of physical
>>>memory.
>>
>> turn off swapping.  Linux doesn't need to swap code that 
>> is available
>> from a library, it simply unmaps it knowing it can always 
>> go back and
>> get it.

>I like the idea of locking pages better. Turning off 
>swapping completely might affect the performance and 
>reliability of the OS too much. 


I've done it on a couple of big embedded systems (no normal users,
just running a big app like mythtv.)  First was the main server, with
web and mail servers as well in 4G of ram.  Ran fine w/out swap even
though it was a gentoo system doing enormous compiles for updates.
The other is a remote diskless frontend for mythtv w/ 2G of ram.
0
AZ
3/26/2010 6:35:07 PM
"AZ Nomad" <aznomad.3@PremoveOBthisOX.COM> wrote in message 
news:slrnhqpvir.unu.aznomad.3@ip70-176-155-130.ph.ph.cox.net...
> On Fri, 26 Mar 2010 10:02:52 -0500, Peter Olcott 
> <NoSpam@OCR4Screen.com> wrote:
>
>>"AZ Nomad" <aznomad.3@PremoveOBthisOX.COM> wrote in 
>>message
>>news:slrnhqpiv4.sk5.aznomad.3@ip70-176-155-130.ph.ph.cox.net...
>>> On Thu, 25 Mar 2010 23:07:26 -0500, Peter Olcott
>>> <NoSpam@OCR4Screen.com> wrote:
>>>>I want there to be no page faults for my process. I want
>>>>to
>>>>limit the sum total of everything else to half of 
>>>>physical
>>>>memory.
>>>
>>> turn off swapping.  Linux doesn't need to swap code that
>>> is available
>>> from a library, it simply unmaps it knowing it can 
>>> always
>>> go back and
>>> get it.
>
>>I like the idea of locking pages better. Turning off
>>swapping completely might effect the performance and
>>reliability of the OS too much.
>
>
> I've done it on a couple of big embedded systems (no 
> normal users,
> just running a big app like mythtv.)  First was the main 
> server, with
> web and mail servers as well in 4G of ram.  Ran fine w/out 
> swap even
> though it was a gentoo system doing enormous compiles for 
> updates.
> Other is a remote diskless frontent for mythtv w/ 2G of 
> ram.

How do you turn off swapping completely? I will try it. 


0
Peter
3/26/2010 6:42:51 PM
Måns Rullgård <mans@mansr.com> writes:

> Joe Pfeiffer <pfeiffer@cs.nmsu.edu> writes:
>
>> Måns Rullgård <mans@mansr.com> writes:
>>>
>>> I compiled your test app.  There is no loop.  If your compiler creates
>>> a loop, it is truly an awful compiler.
>>
>> The goal of a benchmark like his isn't to run efficiently, it's to
>> measure how some other application will really run.  So you don't
>> optimize at a high enough level to optimize the loop away.
>
> If you do that, it is no longer an accurate representation of the real
> scenario, if it ever was.  For the test to be meaningful, you must
> ensure the code (machine, not source) being tested is exactly same as
> in the real application.

Nonsense.  If you're trying to model a process's memory access behavior,
all you have to do is come up with a program that shows the same behavior.
That's what he's trying to do, so his approach is a good one.  I agree
with everybody else responding that either (1) he's misinterpreting his
results, or (2) his benchmark isn't doing what he thinks he is.
-- 
As we enjoy great advantages from the inventions of others, we should
be glad of an opportunity to serve others by any invention of ours;
and this we should do freely and generously. (Benjamin Franklin)
0
Joe
3/26/2010 6:44:14 PM
"Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in message 
news:1bwrwyc48x.fsf@snowball.wb.pfeifferfamily.net...
> Måns Rullgård <mans@mansr.com> writes:
>
>> Joe Pfeiffer <pfeiffer@cs.nmsu.edu> writes:
>>
>>> Måns Rullgård <mans@mansr.com> writes:
>>>>
>>>> I compiled your test app.  There is no loop.  If your 
>>>> compiler creates
>>>> a loop, it is truly an awful compiler.
>>>
>>> The goal of a benchmark like his isn't to run 
>>> efficiently, it's to
>>> measure how some other application will really run.  So 
>>> you don't
>>> optimize at a high enough level to optimize the loop 
>>> away.
>>
>> If you do that, it is no longer an accurate 
>> representation of the real
>> scenario, if it ever was.  For the test to be meaningful, 
>> you must
>> ensure the code (machine, not source) being tested is 
>> exactly same as
>> in the real application.
>
> Nonsense.  If you're trying to model a process's memory 
> access behavior,
> all you have to do come up with a program that shows the 
> same behavior.
> That's what he's trying to do, so his approach is a good 
> one.  I agree
> with everybody else responding that either (1) he's 
> misinterpreting his
> results, or (2) his benchmark isn't doing what he thinks 
> he is.

Of course even a universal consensus does not necessarily 
create truth.

The issue is that on a machine with 2 GB of RAM the 
performance of the benchmark drops from 90 MB / sec to 17 MB 
/sec when memory allocation exceeds 450 MB. All other 
processes are only taking about 300 MB. These results are 
easily replicated by compiling and running the source code 
that I provided. Don't use the -O3 switch.

I want to know exactly why this is occurring and thus the 
options that I have to alter this behavior besides buying 
more RAM.

> -- 
> As we enjoy great advantages from the inventions of 
> others, we should
> be glad of an opportunity to serve others by any invention 
> of ours;
> and this we should do freely and generously. (Benjamin 
> Franklin) 


0
Peter
3/26/2010 7:06:55 PM
"Peter Olcott" <NoSpam@OCR4Screen.com> writes:

> "Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in message
> news:1bwrwyc48x.fsf@snowball.wb.pfeifferfamily.net...
>>> Måns Rullgård <mans@mansr.com> writes:
>>
>>> Joe Pfeiffer <pfeiffer@cs.nmsu.edu> writes:
>>>
>>>>> Måns Rullgård <mans@mansr.com> writes:
>>>>> I compiled your test app.  There is no loop.  If your compiler
>>>>> creates a loop, it is truly an awful compiler.
>>>> The goal of a benchmark like his isn't to run efficiently, it's to
>>>> measure how some other application will really run.  So you don't
>>>> optimize at a high enough level to optimize the loop away.
>>> If you do that, it is no longer an accurate representation of the
>>> real scenario, if it ever was.  For the test to be meaningful, you
>>> must ensure the code (machine, not source) being tested is exactly
>>> same as in the real application.
>> Nonsense.  If you're trying to model a process's memory access
>> behavior, all you have to do come up with a program that shows the
>> same behavior.

That is true, and his test case shows an entirely different behaviour.
When I compiled it, it performed no memory accesses whatsoever.  If
optimisation is lowered sufficiently that the loop remains, many other
optimisations are also disabled, and the overall behaviour of the
program changes radically.

>> That's what he's trying to do, so his approach is a good one.  I
>> agree with everybody else responding that either (1) he's
>> misinterpreting his results, or (2) his benchmark isn't doing what
>> he thinks he is.

I say both of those are true in this case.

> Of course even a universal consensus to not necessarily create truth.
>
> The issue is that on a machine with 2 GB of RAM the performance of the
> benchmark drops from 90 MB / sec to 17 MB /sec when memory allocation
> exceeds 450 MB. All other processes are only taking about 300
> MB. These results are easily replicated by compiling and running the
> source code that I provided. Don't use the -O3 switch.
>
> I want to know exactly why this is occurring and thus the options that
> I have to alter this behavior besides buying more RAM.

Have you examined the machine code generated in each case?

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/26/2010 7:19:05 PM
"M�ns Rullg�rd" <mans@mansr.com> wrote in message 
news:yw1xmxxusxg6.fsf@unicorn.mansr.com...
> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:

>> The issue is that on a machine with 2 GB of RAM the 
>> performance of the
>> benchmark drops from 90 MB / sec to 17 MB /sec when 
>> memory allocation
>> exceeds 450 MB. All other processes are only taking about 
>> 300
>> MB. These results are easily replicated by compiling and 
>> running the
>> source code that I provided. Don't use the -O3 switch.
>>
>> I want to know exactly why this is occurring and thus the 
>> options that
>> I have to alter this behavior besides buying more RAM.
>
> Have you examined the machine code generated in each case?

What would be the point of that?

The only difference between the performance in the first 
case and the performance in the second case is the effect of 
the change of a single constant;
const unsigned int size = 100000000;
const unsigned int size = 200000000;


>
> -- 
> Måns Rullgård
> mans@mansr.com 


0
Peter
3/26/2010 7:26:22 PM
On Mar 26, 12:06 pm, "Peter Olcott" <NoS...@OCR4Screen.com> wrote:

> The issue is that on a machine with 2 GB of RAM the
> performance of the benchmark drops from 90 MB / sec to 17 MB
> /sec when memory allocation exceeds 450 MB. All other
> processes are only taking about 300 MB. These results are
> easily replicated by compiling and running the source code
> that I provided. Don't use the -O3 switch.

> I want to know exactly why this is occurring and thus the
> options that I have to alter this behavior besides buying
> more RAM.

I couldn't replicate it. But since you can, why don't you track down
why it's occurring. You could start by profiling the application that
shows high performance and the one that shows low performance
(ideally, one on each side of the threshold, as close as possible) and
see what differences stand out.

It could be an issue with the implementation of std::vector.

DS
0
David
3/26/2010 7:27:10 PM
"Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>
> How do you turn off swapping completely? I will try it. 

Either find all the places in the startup scripts where it's turned on
and remove them, or execute

swapoff -a
-- 
As we enjoy great advantages from the inventions of others, we should
be glad of an opportunity to serve others by any invention of ours;
and this we should do freely and generously. (Benjamin Franklin)
0
Joe
3/26/2010 7:32:04 PM
"Peter Olcott" <NoSpam@OCR4Screen.com> writes:

> "M�ns Rullg�rd" <mans@mansr.com> wrote in message 
> news:yw1xmxxusxg6.fsf@unicorn.mansr.com...
>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>
>>> The issue is that on a machine with 2 GB of RAM the performance of
>>> the benchmark drops from 90 MB / sec to 17 MB /sec when memory
>>> allocation exceeds 450 MB. All other processes are only taking
>>> about 300 MB. These results are easily replicated by compiling and
>>> running the source code that I provided. Don't use the -O3 switch.
>>>
>>> I want to know exactly why this is occurring and thus the options
>>> that I have to alter this behavior besides buying more RAM.
>>
>> Have you examined the machine code generated in each case?
>
> What would be the point of that?

To make sure the compiler didn't do something unexpected.

> The only difference between the performance in the first 
> case and the performance in the second case is the effect of 
> the change of a single constant;
> const unsigned int size = 100000000;
> const unsigned int size = 200000000;

If the compiler knows the size of the array, it may well choose to do
things entirely differently depending on the size.

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/26/2010 7:32:41 PM
"M�ns Rullg�rd" <mans@mansr.com> wrote in message 
news:yw1xeij6swti.fsf@unicorn.mansr.com...
> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>
>> "M�ns Rullg�rd" <mans@mansr.com> wrote in message
>> news:yw1xmxxusxg6.fsf@unicorn.mansr.com...
>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>
>>>> The issue is that on a machine with 2 GB of RAM the 
>>>> performance of
>>>> the benchmark drops from 90 MB / sec to 17 MB /sec when 
>>>> memory
>>>> allocation exceeds 450 MB. All other processes are only 
>>>> taking
>>>> about 300 MB. These results are easily replicated by 
>>>> compiling and
>>>> running the source code that I provided. Don't use 
>>>> the -O3 switch.
>>>>
>>>> I want to know exactly why this is occurring and thus 
>>>> the options
>>>> that I have to alter this behavior besides buying more 
>>>> RAM.
>>>
>>> Have you examined the machine code generated in each 
>>> case?
>>
>> What would be the point of that?
>
> To make sure the compiler didn't do something unexpected.
>
>> The only difference between the performance in the first
>> case and the performance in the second case is the effect 
>> of
>> the change of a single constant;
>> const unsigned int size = 100000000;
>> const unsigned int size = 200000000;
>
> If the compiler knows the size of the array, it may well 
> choose to do
> things entirely differently depending on the size.

It's not an array, it's a std::vector, thus dynamically rather 
than statically allocated.

>
> -- 
> Måns Rullgård
> mans@mansr.com 


0
Peter
3/26/2010 7:38:24 PM
"Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in message 
news:1bsk7mc217.fsf@snowball.wb.pfeifferfamily.net...
> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>
>> How do you turn off swapping completely? I will try it.
>
> Either find all the places in the startup scripts where 
> it's turned on
> and remove them, or execute
>
> swapoff -a

I tried that and it didn't help. Same behavior right at the 
450 MB limit.

> -- 
> As we enjoy great advantages from the inventions of 
> others, we should
> be glad of an opportunity to serve others by any invention 
> of ours;
> and this we should do freely and generously. (Benjamin 
> Franklin) 


0
Peter
3/26/2010 7:39:30 PM
If your machine has more memory, then I suspect that the 
threshold may be much higher.
I could test it with dynamic allocation of an array; this 
would eliminate std::vector as the cause.

This is my first professional development on Linux, so I 
don't know how to profile an app.
On Windows a profiler is built into Visual Studio.

"David Schwartz" <davids@webmaster.com> wrote in message 
news:4f29cd26-9176-4410-8489-3081a8fe7a1d@q14g2000prf.googlegroups.com...
On Mar 26, 12:06 pm, "Peter Olcott" <NoS...@OCR4Screen.com> 
wrote:

> The issue is that on a machine with 2 GB of RAM the
> performance of the benchmark drops from 90 MB / sec to 17 
> MB
> /sec when memory allocation exceeds 450 MB. All other
> processes are only taking about 300 MB. These results are
> easily replicated by compiling and running the source code
> that I provided. Don't use the -O3 switch.

> I want to know exactly why this is occurring and thus the
> options that I have to alter this behavior besides buying
> more RAM.

I couldn't replicate it. But since you can, why don't you 
track down
why it's occurring. You could start by profiling the 
application that
shows high performance and the one that shows low 
performance
(ideally, one on each side of the threshold, as close as 
possible) and
see what differences stand out.

It could be an issue with the implementation of std::vector.

DS 


0
Peter
3/26/2010 7:44:09 PM
"Peter Olcott" <NoSpam@OCR4Screen.com> writes:

> "M�ns Rullg�rd" <mans@mansr.com> wrote in message 
> news:yw1xeij6swti.fsf@unicorn.mansr.com...
>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>
>>> "M�ns Rullg�rd" <mans@mansr.com> wrote in message
>>> news:yw1xmxxusxg6.fsf@unicorn.mansr.com...
>>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>
>>>>> The issue is that on a machine with 2 GB of RAM the performance
>>>>> of the benchmark drops from 90 MB / sec to 17 MB /sec when
>>>>> memory allocation exceeds 450 MB. All other processes are only
>>>>> taking about 300 MB. These results are easily replicated by
>>>>> compiling and running the source code that I provided. Don't use
>>>>> the -O3 switch.
>>>>>
>>>>> I want to know exactly why this is occurring and thus the
>>>>> options that I have to alter this behavior besides buying more
>>>>> RAM.
>>>>
>>>> Have you examined the machine code generated in each 
>>>> case?
>>>
>>> What would be the point of that?
>>
>> To make sure the compiler didn't do something unexpected.
>>
>>> The only difference between the performance in the first
>>> case and the performance in the second case is the effect 
>>> of
>>> the change of a single constant;
>>> const unsigned int size = 100000000;
>>> const unsigned int size = 200000000;
>>
>> If the compiler knows the size of the array, it may well 
>> choose to do
>> things entirely differently depending on the size.
>
> Its not an array its a std::vector, thus dynamically rather 
> than statically allocated.

Are you certain?  If the size is constant and visible to the compiler,
it may still be able to optimise it statically.

Just check the code already.  It's not that hard.

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/26/2010 7:47:50 PM
"Peter Olcott" <NoSpam@OCR4Screen.com> writes:

> "Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in message 
> news:1bsk7mc217.fsf@snowball.wb.pfeifferfamily.net...
>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>
>>> How do you turn off swapping completely? I will try it.
>>
>> Either find all the places in the startup scripts where 
>> it's turned on
>> and remove them, or execute
>>
>> swapoff -a
>
> I tried that and it didn't help. Same behavior right at the 
> 450 MB limit.

Your program is definitely showing some interesting behavior at that
limit.  You now have at least two reasons (the other being my seeing the
same timing behavior but not seeing any swapping) to conclude that
whatever it is, it isn't swapping.
-- 
As we enjoy great advantages from the inventions of others, we should
be glad of an opportunity to serve others by any invention of ours;
and this we should do freely and generously. (Benjamin Franklin)
0
Joe
3/26/2010 7:51:01 PM
"Peter Olcott" <NoSpam@OCR4Screen.com> writes:

> "M�ns Rullg�rd" <mans@mansr.com> wrote in message 
> news:yw1xeij6swti.fsf@unicorn.mansr.com...
>>
>> If the compiler knows the size of the array, it may well 
>> choose to do
>> things entirely differently depending on the size.
>
> Its not an array its a std::vector, thus dynamically rather 
> than statically allocated.

In order to eliminate the possibility that the problem is in the
<vector> implementation, I replaced it with an array; same behavior
(but, you see, this is the sort of thing you ought to be doing:  testing
hypotheses instead of dogmatically declaring what the problem has to be).
-- 
As we enjoy great advantages from the inventions of others, we should
be glad of an opportunity to serve others by any invention of ours;
and this we should do freely and generously. (Benjamin Franklin)
0
Joe
3/26/2010 7:54:17 PM
On Mar 26, 12:44 pm, "Peter Olcott" <NoS...@OCR4Screen.com> wrote:

> If your machine has more memory, then I suspect that the
> threshold may be much higher.
> I could test it as dynamic allocation of an array, this
> would eliminate std::vector as the cause.

It may be that Linux changes how aggressive its page ager is based on
how much physical memory is in use, but that's a total guess.

> This is my first professional development on Linux, so I
> don't know how to profile an app.
> On Windows a profiler is built into Visual Studio.

Then 'man gprof' is a good place to start.

DS
0
David
3/26/2010 8:05:00 PM
On Fri, 26 Mar 2010 13:42:51 -0500, Peter Olcott <NoSpam@OCR4Screen.com> wrote:

>"AZ Nomad" <aznomad.3@PremoveOBthisOX.COM> wrote in message 
>news:slrnhqpvir.unu.aznomad.3@ip70-176-155-130.ph.ph.cox.net...
>> On Fri, 26 Mar 2010 10:02:52 -0500, Peter Olcott 
>> <NoSpam@OCR4Screen.com> wrote:
>>
>>>"AZ Nomad" <aznomad.3@PremoveOBthisOX.COM> wrote in 
>>>message
>>>news:slrnhqpiv4.sk5.aznomad.3@ip70-176-155-130.ph.ph.cox.net...
>>>> On Thu, 25 Mar 2010 23:07:26 -0500, Peter Olcott
>>>> <NoSpam@OCR4Screen.com> wrote:
>>>>>I want there to be no page faults for my process. I want
>>>>>to
>>>>>limit the sum total of everything else to half of 
>>>>>physical
>>>>>memory.
>>>>
>>>> turn off swapping.  Linux doesn't need to swap code that
>>>> is available
>>>> from a library, it simply unmaps it knowing it can 
>>>> always
>>>> go back and
>>>> get it.
>>
>>>I like the idea of locking pages better. Turning off
>>>swapping completely might effect the performance and
>>>reliability of the OS too much.
>>
>>
>> I've done it on a couple of big embedded systems (no 
>> normal users,
>> just running a big app like mythtv.)  First was the main 
>> server, with
>> web and mail servers as well in 4G of ram.  Ran fine w/out 
>> swap even
>> though it was a gentoo system doing enormous compiles for 
>> updates.
>> Other is a remote diskless frontent for mythtv w/ 2G of 
>> ram.

>How do you turn off swapping completely? I will try it. 


remove swapfile from fstab, reboot
or interactively with the swapoff <partition> command
0
AZ
3/26/2010 8:22:07 PM
On Fri, 26 Mar 2010 14:39:30 -0500, Peter Olcott <NoSpam@OCR4Screen.com> wrote:

>"Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in message 
>news:1bsk7mc217.fsf@snowball.wb.pfeifferfamily.net...
>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>
>>> How do you turn off swapping completely? I will try it.
>>
>> Either find all the places in the startup scripts where 
>> it's turned on
>> and remove them, or execute
>>
>> swapoff -a

>I tried that and it didn't help. Same behavior right at the 
>450 MB limit.
that's a hardware issue
0
AZ
3/26/2010 8:22:33 PM
On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>
> "Jacob Sparre Andersen" <sparre@nbi.dk> wrote in message 
> news:871vf7zak4.fsf@hugsarin.sparre-andersen.dk...
>> Peter Olcott wrote:
>>
>>> I want there to be no page faults for my process.

one way is to run from a ramdisk. 

>>> I want to limit the sum total of everything else to half 
>>> of physical
>>> memory.

IIRC there is a way to limit buffers, but the best way is probably to
just keep adding ram until that happens automatically.

> (1) The service provider that I will be using only has a few 
> choices of OS.
> (2) There is some sort of way to lock pages into memory that 
> I read about. 

possibly "the sticky bit"  (seriously that's what it's called)

how come you're worried about page faults but don't seem concerned
about cache misses?

--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
0
Jasen
3/26/2010 10:15:12 PM
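
On the page-locking idea: the usual Linux/POSIX mechanism for keeping a
process's pages resident is mlock()/mlockall() from <sys/mman.h>, not the
sticky bit. A minimal sketch, assuming a Linux system and a large enough
RLIMIT_MEMLOCK (or root) to lock the working set:

#include <sys/mman.h>
#include <cstdio>
#include <vector>

int main() {
    // Lock all current and future pages of this process into RAM so the
    // kernel cannot page them out (fails with EPERM/ENOMEM if the memlock
    // limit is too small).
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return 1;
    }

    // Allocations made after this point are locked as well.
    std::vector<unsigned int> Data(100000000u);

    // ... run the benchmark here ...

    munlockall();   // release the locks before exiting (optional)
    return 0;
}
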
"Peter Olcott" <NoSpam@OCR4Screen.com> wrote in 
news:eeydnfZhLc_ukDDWnZ2dnUVZ_rKdnZ2d@giganews.com:
> "Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in message 
> news:1bsk7mc217.fsf@snowball.wb.pfeifferfamily.net...
>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>> How do you turn off swapping completely? I will try it.
>>
>> Either find all the places in the startup scripts where 
>> it's turned on
>> and remove them, or execute
>>
>> swapoff -a
> 
> I tried that and it didn't help. Same behavior right at the 
> 450 MB limit.

I haven't been able to reproduce your result exactly, but I have played 
with the program enough to have seen bizarre spikes (both directions) at 
different sizes and I have a strong suspicion why this is happening.  

The process of accessing memory in your test routine is controlled by 
the content of the array -- each element tells the program which element 
to access next.  That content is precalculated using repeated calls to 
the pseudo random number generator rand().  

However, because you do not seed the pseudo-random number generator, 
each time you run the program, it produces the exact same sequence of 
values.  The way those random values are distributed across the table 
varies according to the table size ("Data[N] = (rand() * rand()) % 
size;").  

That distribution of element values may be extremely "fair" in terms of 
cache usage or it may be extremely unfair; most of the time it is 
somewhere between these two extremes.

If it turns out to be very fair, nearly every access to an array member 
has to be loaded into L1 cache from main memory (because it's not 
already in cache), which is relatively slow.  

If it turns out very unfair, then you may end up with many elements of 
the table that point to other elements in such a way that few elements 
are accessed which are not already in cache.  In the extreme case, you 
may end up with a cycle (element A points to element B points to element 
C points to element D points to element A...).  In this case, the 
reported "data access" rates will be unrealistically high; after the 
first visit to each element in the sequence, you will never experience 
another cache miss.

To summarize: your problem is likely "cache misses" not "page faults".  


GH
0
Gil
3/26/2010 10:18:49 PM
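
One minimal way to rule out a lucky (or unlucky) fixed layout is to seed the
generator before filling the table, so each run gets a different permutation.
A sketch using the same Data/size names and fill expression as the test
program:

#include <cstdlib>
#include <ctime>
#include <vector>

typedef unsigned int uint32;

// Fill the table with a different pseudo-random layout on every run.
void FillData(std::vector<uint32>& Data, uint32 size) {
    srand((unsigned)time(NULL));             // seed once per run
    Data.resize(size);
    for (uint32 N = 0; N < size; N++)
        Data[N] = (rand() * rand()) % size;  // same expression as the test
}
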
On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>
> "Joe Beanfish" <joe@nospam.duh> wrote in message 
> news:hoipa5$uvm@news.thunderstone.com...
>> On 03/26/10 12:18, Måns Rullgård wrote:
>>> "Peter Olcott"<NoSpam@OCR4Screen.com>  writes:
>>>
>>>>>> Here is the C++ source
>>>>>
>>>>> Very funny.
>>>>>
>>>>>> #include<stdio.h>
>>>>>> #include<stdlib.h>
>>>>>> #include<vector>
>>>>>> #include<time.h>
>>>>>>
>>>>>> //#define uint32 unsigned int
>>>>>> typedef unsigned int uint32;
>>>>>>
>>>>>> const uint32 size   = 100000000;
>>>>>> std::vector<uint32>  Data;
>>>>>> uint32 Max = 0x3fffffff;
>>>>>>
>>>>>> double Process() {
>>>>>>    clock_t finish;
>>>>>>    clock_t start = clock();
>>>>>>    double duration;
>>>>>>    uint32 num = 0;
>>>>>>    for (uint32 N = 0; N<  Max; N++)
>>>>>>      num = Data[num];
>>>>>>    finish = clock();
>>>>>>    duration = (double)(finish - start) / 
>>>>>> CLOCKS_PER_SEC;
>>>>>>    return duration;
>>>>>> }
>>>>>
>>>>> Even gcc manages to optimise this entire loop into 
>>>>> nothing.  Your
>>>>> test is meaningless.
>>>>
>>>> Actually this little test program accurately models the
>>>> behavior of my deterministic finite automaton OCR
>>>> recognizing engine, and the loop is not optimized into
>>>> nothing.
>>>
>>> I compiled your test app.  There is no loop.  If your 
>>> compiler creates
>>> a loop, it is truly an awful compiler.
>>
>> Or the optimizer turned off?
>
> The optimizer should not change the semantics enough to 
> optimize away this loop. 

why not?  the two versions of the function behave identically.
(with the exception of the possibility of undefined behavior in the
original, but it's legal to optimise that out)

The value of num is not used after the loop:
it is not tested, returned, output, or stored in a static or global 
variable after the loop.

therefore  " num = Data[num];" 
is by induction a pointless operation and can be removed.
therefore the loop is empty
therefore the loop is unneeded.




--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
0
Jasen
3/26/2010 10:35:16 PM
On Fri, 26 Mar 2010 19:06:55 UTC in comp.os.linux.development.system, "Peter 
Olcott" <NoSpam@OCR4Screen.com> wrote:

> The issue is that on a machine with 2 GB of RAM the 
> performance of the benchmark drops from 90 MB / sec to 17 MB 
> /sec when memory allocation exceeds 450 MB. 

Enormous thread, not much information about your hardware.

I took your source and ran it on 2 separate machines: one was a 2005 vintage 
Xeon 3GHz with 2GB RAM and the other a quad core 2.4GHz Q6600 with 8GB RAM. The 
results were odd. On the slow machine I varied the size of the array from 256MB 
to 1GB and got speeds of 56MB/s on 256MB to 16MB/s on 1GB. I played around a bit
and found that there was a cut-off point and that cut off point here was with an
array size of (211*1024*1024) elements.

On the quad core machine I varied the size of the array from 400MB to 4GB and 
the speeds were 87MB/s to 92MB/s. Once I got  to 5GB then I saw a drop off from 
~90MB/s to ~50MB/s.

Since you are walking all over the array at random, I wonder if you are 
seeing some artifact of your hardware's ability to cache RAM. I changed your 
code to walk the array sequentially instead and then I get consistent rates on 
the quad core machine of 138MB/s even if I use an array of 7GB in size.

-- 
Trevor Hemsley, Brighton, UK
Trevor dot Hemsley at ntlworld dot com
0
Trevor
3/26/2010 11:46:53 PM
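
For comparison, a self-contained sketch of a sequential walk in the spirit of
the change described above (not necessarily Trevor's exact code); consecutive
accesses let the cache lines and hardware prefetcher do their job:

#include <cstdio>
#include <vector>

typedef unsigned int uint32;

int main() {
    const uint32 size = 100000000;       // same table size as the original test
    const uint32 Max  = 0x3fffffff;
    std::vector<uint32> Data(size, 1u);  // contents are irrelevant when walking in order

    uint32 sum = 0;
    for (uint32 N = 0; N < Max; N++)
        sum += Data[N % size];           // sequential, cache-friendly access

    printf("%u\n", sum);                 // observable result keeps the loop alive
    return 0;
}
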
Joe Pfeiffer <pfeiffer@cs.nmsu.edu> writes:

> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>
>> "M�ns Rullg�rd" <mans@mansr.com> wrote in message 
>> news:yw1xeij6swti.fsf@unicorn.mansr.com...
>>>
>>> If the compiler knows the size of the array, it may well choose to
>>> do things entirely differently depending on the size.
>>
>> Its not an array its a std::vector, thus dynamically rather than
>> statically allocated.
>
> In order to eliminate the possibility that the problem is in the
> <vector> implementation, I replaced it with an array; same behavior
> (but, you see, this is the sort of thing you ought to be doing:
> testing hypotheses instead of dogmatically declaring what the
> problem has to be).

I never so much as speculated over the real cause of the problems the
OP is allegedly experiencing.  I merely pointed out two facts about
his test:

1) The compiler optimises it away completely, thereby obviously
   rendering the results meaningless.  If compiled without
   optimisations so as to avoid this, it will no longer accurately
   reflect the behaviour of the real application, again providing
   meaningless results.

2) The bandwidth figures the OP quoted are off by several orders of
   magnitude from what would be expected of a modern system.  Even
   main memory can provide several gigabytes per second these days.
   This suggests there is something fishy elsewhere, which is why I
   suggested looking at the actual machine code to see what really is
   being executed.

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/27/2010 12:49:56 AM
Gil Hamilton <gil_hamilton@hotmail.com> writes:
>
> I haven't been able to reproduce your result exactly, but I have played 
> with the program enough to have seen bizarre spikes (both directions) at 
> different sizes and I have a strong suspicion why this is happening.  
>
> The process of accessing memory in your test routine is controlled by 
> the content of the array -- each element tells the program which element 
> to access next.  That content is precalculated using repeated calls to 
> the pseudo random number generator rand().  
>
> However, because you do not seed the pseudo-random number generator, 
> each time you run the program, it produces the exact same sequence of 
> values.  The way those random values are distributed across the table 
> varies according to the table size ("Data[N] = (rand() * rand()) % 
> size;").  

Ah, this seems like an extremely likely guess.
-- 
As we enjoy great advantages from the inventions of others, we should
be glad of an opportunity to serve others by any invention of ours;
and this we should do freely and generously. (Benjamin Franklin)
0
Joe
3/27/2010 12:57:23 AM
Måns Rullgård <mans@mansr.com> writes:

> Joe Pfeiffer <pfeiffer@cs.nmsu.edu> writes:
>
>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>
>>> "M�ns Rullg�rd" <mans@mansr.com> wrote in message 
>>> news:yw1xeij6swti.fsf@unicorn.mansr.com...
>>>>
>>>> If the compiler knows the size of the array, it may well choose to
>>>> do things entirely differently depending on the size.
>>>
>>> Its not an array its a std::vector, thus dynamically rather than
>>> statically allocated.
>>
>> In order to eliminate the possibility that the problem is in the
>> <vector> implementation, I replaced it with an array; same behavior
>> (but, you see, this is the sort of thing you ought to be doing:
>> testing hypotheses instead of dogmatically declaring what the
>> problem has to be).
>
> I never so much as speculated over the real cause of the problems the
> OP is allegedly experiencing.  I merely pointed out two facts about
> his test:

The "you" I was adressing was the OP, not you M�ns.  He's the one who
ought to be conducting experiments rather than insisting on what has to
be happening.

> 1) The compiler optimises it away completely, thereby obviously
>    rendering the results meaningless.  If compiled without
>    optimisations so as to avoid this, it will no longer accurately
>    reflect the behaviour of the real application, again providing
>    meaningless results.

The main thing his test program is doing is making a lot of random
memory accesses through the array.  If his real application is doing the
same thing (but actually using the result so it doesn't get optimized
out of existence), his test program is accurate.

> 2) The bandwidth figures the OP quoted are off by several orders of
>    magnitude from what would be expected of a modern system.  Even
>    main memory can provide several gigabytes per second these days.
>    This suggests there is something fishy elsewhere, which is why I
>    suggested looking at the actual machine code to see what really is
>    being executed.

What's probably fishy there -- and I haven't checked -- is with his
arithmetic.  But there is a huge increase in wall-clock time as well, so
I figure that's what's important, not his actual numbers.
-- 
As we enjoy great advantages from the inventions of others, we should
be glad of an opportunity to serve others by any invention of ours;
and this we should do freely and generously. (Benjamin Franklin)
0
Joe
3/27/2010 1:03:56 AM
Joe Pfeiffer <pfeiffer@cs.nmsu.edu> writes:

> Måns Rullgård <mans@mansr.com> writes:
>
>> Joe Pfeiffer <pfeiffer@cs.nmsu.edu> writes:
>>
>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>
>>>> "M�ns Rullg�rd" <mans@mansr.com> wrote in message 
>>>> news:yw1xeij6swti.fsf@unicorn.mansr.com...
>>>>>
>>>>> If the compiler knows the size of the array, it may well choose to
>>>>> do things entirely differently depending on the size.
>>>>
>>>> Its not an array its a std::vector, thus dynamically rather than
>>>> statically allocated.
>>>
>>> In order to eliminate the possibility that the problem is in the
>>> <vector> implementation, I replaced it with an array; same behavior
>>> (but, you see, this is the sort of thing you ought to be doing:
>>> testing hypotheses instead of dogmatically declaring what the
>>> problem has to be).
>>
>> I never so much as speculated over the real cause of the problems the
>> OP is allegedly experiencing.  I merely pointed out two facts about
>> his test:
>
> The "you" I was adressing was the OP, not you M�ns.  He's the one who
> ought to be conducting experiments rather than insisting on what has to
> be happening.

I must have missed a level of quoting somewhere.  Glad we agree.

>> 1) The compiler optimises it away completely, thereby obviously
>>    rendering the results meaningless.  If compiled without
>>    optimisations so as to avoid this, it will no longer accurately
>>    reflect the behaviour of the real application, again providing
>>    meaningless results.
>
> The main thing his test program is doing is making a lot of random
> memory accesses through the array.  If his real application is doing the
> same thing (but actually using the result so it doesn't get optimized
> out of existence), his test program is accurate.

The test could be trivially modified to also use the result, e.g. by
printing the final value or storing it in a (volatile if necessary)
global variable.

>> 2) The bandwidth figures the OP quoted are off by several orders of
>>    magnitude from what would be expected of a modern system.  Even
>>    main memory can provide several gigabytes per second these days.
>>    This suggests there is something fishy elsewhere, which is why I
>>    suggested looking at the actual machine code to see what really is
>>    being executed.
>
> What's probably fishy there -- and I haven't checked -- is with his
> arithmetic.  But there is a huge increase in wall-clock time as well, so
> I figure that's what's important, not his actual numbers.

Any deviation from the expected should be investigated and understood.
Otherwise you simply do not know what you are measuring.

When benchmarking loops it is often risky to replace a variable number
of iterations with a fixed number in a manner visible to the compiler.
Doing so can cause significant changes to how loop unrolling and
similar optimisations are applied.

To ensure the compiler really cannot derive anything specific to the
test case, it is often useful to compile the function being tested
separately from the calling code containing the parameters.  That way
the interesting function is guaranteed to be the same regardless of
variations in test parameters.

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/27/2010 2:03:22 AM
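
A sketch of the trivial modification mentioned above: store the final value
in a volatile global so the loop has an observable side effect and cannot be
optimised away, without otherwise changing the timed code:

#include <time.h>
#include <vector>

typedef unsigned int uint32;

extern std::vector<uint32> Data;   // defined and filled as in the test program
extern uint32 Max;

volatile uint32 sink;              // the compiler must assume this is read elsewhere

double Process() {
    clock_t start = clock();
    uint32 num = 0;
    for (uint32 N = 0; N < Max; N++)
        num = Data[num];
    sink = num;                    // observable side effect: the loop must stay
    clock_t finish = clock();
    return (double)(finish - start) / CLOCKS_PER_SEC;
}
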
"Gil Hamilton" <gil_hamilton@hotmail.com> wrote in message 
news:Xns9D47BA4BE306Dgilhamiltonhotmailco@188.40.43.245...
> "Peter Olcott" <NoSpam@OCR4Screen.com> wrote in
> news:eeydnfZhLc_ukDDWnZ2dnUVZ_rKdnZ2d@giganews.com:
>> "Joe Pfeiffer" <pfeiffer@cs.nmsu.edu> wrote in message
>> news:1bsk7mc217.fsf@snowball.wb.pfeifferfamily.net...
>>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>>> How do you turn off swapping completely? I will try it.
>>>
>>> Either find all the places in the startup scripts where
>>> it's turned on
>>> and remove them, or execute
>>>
>>> swapoff -a
>>
>> I tried that and it didn't help. Same behavior right at 
>> the
>> 450 MB limit.
>
> I haven't been able to reproduce your result exactly, but 
> I have played
> with the program enough to have seen bizarre spikes (both 
> directions) at
> different sizes and I have a strong suspicion why this is 
> happening.
>
> The process of accessing memory in your test routine is 
> controlled by
> the content of the array -- each element tells the program 
> which element
> to access next.  That content is precalculated using 
> repeated calls to
> the pseudo random number generator rand().
>
> However, because you do not seed the pseudo-random number 
> generator,
> each time you run the program, it produces the exact same 
> sequence of
> values.  The way those random values are distributed 
> across the table
> varies according to the table size ("Data[N] = (rand() * 
> rand()) %
> size;").
>
> That distribution of element values may be extremely 
> "fair" in terms of
> cache usage or it may be extremely unfair; most of the 
> time it is
> somewhere between these two extremes.
>
> If it turns out to be very fair, nearly every access to an 
> array member
> has to be loaded into L1 cache from main memory (because 
> it's not
> already in cache), which is relatively slow.
>
> If it turns out very unfair, then you may end up with many 
> elements of
> the table that point to other elements in such a way that 
> few elements
> are accessed which are not already in cache.  In the 
> extreme case, you
> may end up with a cycle (element A points to element B 
> points to element
> C points to element D points to element A...).  In this 
> case, the
> reported "data access" rates will be unrealistically high; 
> after the
> first visit to each in the sequence element, you will 
> never experience
> another cache miss.
>
> To summarize: your problem is likely "cache misses" not 
> "page faults".
>
>
> GH

I have noted your findings and will re-adjust the algorithm 
and make much more exhaustive tests. 


0
Peter
3/27/2010 3:31:09 AM
"Jasen Betts" <jasen@xnet.co.nz> wrote in message 
news:hojcr4$fbu$2@reversiblemaps.ath.cx...
> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>>
>> "Joe Beanfish" <joe@nospam.duh> wrote in message
>> news:hoipa5$uvm@news.thunderstone.com...
>>> On 03/26/10 12:18, Måns Rullgård wrote:
>>>> "Peter Olcott"<NoSpam@OCR4Screen.com>  writes:
>>>>
>>>>>>> Here is the C++ source
>>>>>>
>>>>>> Very funny.
>>>>>>
>>>>>>> #include<stdio.h>
>>>>>>> #include<stdlib.h>
>>>>>>> #include<vector>
>>>>>>> #include<time.h>
>>>>>>>
>>>>>>> //#define uint32 unsigned int
>>>>>>> typedef unsigned int uint32;
>>>>>>>
>>>>>>> const uint32 size   = 100000000;
>>>>>>> std::vector<uint32>  Data;
>>>>>>> uint32 Max = 0x3fffffff;
>>>>>>>
>>>>>>> double Process() {
>>>>>>>    clock_t finish;
>>>>>>>    clock_t start = clock();
>>>>>>>    double duration;
>>>>>>>    uint32 num = 0;
>>>>>>>    for (uint32 N = 0; N<  Max; N++)
>>>>>>>      num = Data[num];
>>>>>>>    finish = clock();
>>>>>>>    duration = (double)(finish - start) /
>>>>>>> CLOCKS_PER_SEC;
>>>>>>>    return duration;
>>>>>>> }
>>>>>>
>>>>>> Even gcc manages to optimise this entire loop into
>>>>>> nothing.  Your
>>>>>> test is meaningless.
>>>>>
>>>>> Actually this little test program accurately models 
>>>>> the
>>>>> behavior of my deterministic finite automaton OCR
>>>>> recognizing engine, and the loop is not optimized into
>>>>> nothing.
>>>>
>>>> I compiled your test app.  There is no loop.  If your
>>>> compiler creates
>>>> a loop, it is truly an awful compiler.
>>>
>>> Or the optimizer turned off?
>>
>> The optimizer should not change the semantics enough to
>> optimize away this loop.
>
> why not?  the two versions funtions behave identically.
> (with the exception of the possibility of undefined 
> behavior in the
> original, but it's legal to optimise that out)
>
> The value of num not used after the loop :
> it is not tested, returned, output, or stored in a static 
> or global
> variable after the loop.
>
> thefore  " num = Data[num];"
> is by induction a pointless operation and can be removed.
> therfore the loop is empty
> therfore the loop is unneeded.

Therefore it does not do what I need it to do; therefore the 
optimization is erroneous.
>
>
>
>
> --- news://freenews.netfront.net/ - complaints: 
> news@netfront.net --- 


0
Peter
3/27/2010 3:32:26 AM
"M�ns Rullg�rd" <mans@mansr.com> wrote in message 
news:yw1x634isi4r.fsf@unicorn.mansr.com...
> Joe Pfeiffer <pfeiffer@cs.nmsu.edu> writes:
>
>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>
>>> "M�ns Rullg�rd" <mans@mansr.com> wrote in message
>>> news:yw1xeij6swti.fsf@unicorn.mansr.com...
>>>>
>>>> If the compiler knows the size of the array, it may 
>>>> well choose to
>>>> do things entirely differently depending on the size.
>>>
>>> Its not an array its a std::vector, thus dynamically 
>>> rather than
>>> statically allocated.
>>
>> In order to eliminate the possibility that the problem is 
>> in the
>> <vector> implementation, I replaced it with an array; 
>> same behavior
>> (but, you see, this is the sort of thing you ought to be 
>> doing:
>> testing hypotheses instead of dogmatically declaring what 
>> the
>> problem has to be).
>
> I never so much as speculated over the real cause of the 
> problems the
> OP is allegedly experiencing.  I merely pointed out two 
> facts about
> his test:
>
> 1) The compiler optimises it away completely, thereby 
> obviously
>   rendering the results meaningless.  If compiled without
>   optimisations so as to avoid this, it will no longer 
> accurately
>   reflect the behaviour of the real application, again 
> providing
>   meaningless results.
>
> 2) The bandwidth figures the OP quoted are off by several 
> orders of
>   magnitude from what would be expected of a modern 
> system.  Even
>   main memory can provide several gigabytes per second 
> these days.
>   This suggests there is something fishy elsewhere, which 
> is why I
>   suggested looking at the actual machine code to see what 
> really is
>   being executed.

Actually if you go take a look at
  comp.hardware or
  alt.comp.hardware
you will see where Paul explains why this is not so.

There are very few postings there so it is very easy to 
find.

>
> -- 
> Måns Rullgård
> mans@mansr.com 


0
Peter
3/27/2010 3:38:22 AM
>
>
> for (uint32 N = 0; N < Max; N++)
>     num = Data[num];
>
Your program is buggy.  It's not actually calculating the metric that it 
is purporting to print out.

> uint32 Random = rand() * rand();
>
Not random enough the first time, was it?  (-:

0
Jonathan
3/27/2010 5:31:55 AM
>
>>
>> The value of num not used after the loop :
>> it is not tested, returned, output, or stored in a static or global 
>> variable after the loop.
>> thefore " num = Data[num];" is by induction a pointless operation and 
>> can be removed.
>> therfore the loop is empty
>> therfore the loop is unneeded.
>>
> Therefore it does not do what I need it to do therefore 
> the optimization is erroneous.
>
No.  The correct final inference is that your understanding of the C++ 
programming language is erroneous. You need to learn about the C++ 
concept of observable side-effects.  Go and read about them.  If you 
have not expressed "what I need it to do" as an actual observable 
side-effect of the program, then you have not written your program 
correctly for the C++ language.

0
Jonathan
3/27/2010 5:45:41 AM
"Jonathan de Boyne Pollard" 
<J.deBoynePollard-newsgroups@NTLWorld.COM> wrote in message 
news:IU.D20100327.T053209.P2468.Q0@J.de.Boyne.Pollard.localhost...
> >
>>
>> for (uint32 N = 0; N < Max; N++)
>>     num = Data[num];
>>
> Your program is buggy.  It's not actually calculating the 
> metric that it is purporting to print out.
>
>> uint32 Random = rand() * rand();
>>
> Not random enough the first time, was it?  (-:
>
RAND_MAX = 32767
A single invocation of rand() would not by itself provide 
random enough coverage over size when size is much greater 
than 32767. 


0
Peter
3/27/2010 6:44:55 AM
On 2010-03-27, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>
> "Jonathan de Boyne Pollard" 
><J.deBoynePollard-newsgroups@NTLWorld.COM> wrote in message 
> news:IU.D20100327.T053209.P2468.Q0@J.de.Boyne.Pollard.localhost...
>> >
>>>
>>> for (uint32 N = 0; N < Max; N++)
>>>     num = Data[num];
>>>
>> Your program is buggy.  It's not actually calculating the 
>> metric that it is purporting to print out.
>>
>>> uint32 Random = rand() * rand();
>>>
>> Not random enough the first time, was it?  (-:
>>
> RAND_MAX = 32767
> A single invocation of rand() would not by itself provide 
> random enough coverage over size when size is much greater 
> than 32767. 

I assumed you were doing that because you didn't want a uniform distribution.
(no primes above 32767, denser near the middle, etc...)

but it's worse than that  rand()*rand() has (approximately) a 1 in 16384.25
chance of producing the value 0

so your attempt to improve rand() has actually made it worse.

this means your test program is likely to find a loop starting at 0
and going for a few tens of thousands of steps before repeating.

you could do this

 uint32 Random = rand() * (RAND_MAX+1) + rand();

but /dev/urandom is likely to serve you better if you want more randomness.
(or /dev/random if you need truly random numbers and are prepared to wait
for them to arrive).

/* needs <fcntl.h> for open() and <unistd.h> for read()/close() */
long longrand(void)
{
        static int fd = -1;
        long rv;
        if( fd < 0)
          fd=open("/dev/urandom",O_RDONLY);
        /* cast: read() returns ssize_t, so an unsigned comparison with
           sizeof rv would silently swallow a -1 error return */
        if( fd<0 || read(fd,&rv,sizeof rv) != (ssize_t)sizeof rv)
                {
                if( fd>=0)close(fd);
                fd=-1;
                return -1;
                }
        return rv & 0x7fffffff;  /* keep the value non-negative */
}


--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
0
Jasen
3/27/2010 8:46:35 AM
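
A usage sketch for the fill loop, assuming the longrand() above and the
thread's Data/size names:

// Fill the table from /dev/urandom-backed values instead of rand()*rand().
for (uint32 N = 0; N < size; N++) {
    long r = longrand();
    Data[N] = (r < 0) ? 0 : (uint32)(r % size);   // fall back to 0 if the read failed
}
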
On 26/03/10 18:42, Peter Olcott wrote:
[...]
> How do you turn off swapping completely? I will try it. 

This may not help as much as you think --- on Linux, the swapper is used
for all sorts of stuff, including loading code from disk. Removing any
swap partitions will merely stop it from swapping RSS pages to disk. If
the kernel thinks it's running out of memory, it'll still discard code
pages, and swap them back in again when you try to run them.

Swap issues almost certainly not what the problem is here. Of course, I
don't know what *is*, but I'm pretty sure swap isn't it.

-- 
┌─── dg@cowlark.com ───── http://www.cowlark.com ─────
│
│ "In the beginning was the word.
│ And the word was: Content-type: text/plain" --- Unknown sage
0
David
3/27/2010 12:14:09 PM
On 27 Mar, 06:44, "Peter Olcott" <NoS...@OCR4Screen.com> wrote:
> "Jonathan de Boyne Pollard"<J.deBoynePollard-newsgro...@NTLWorld.COM> wro=
te in message
>
> news:IU.D20100327.T053209.P2468.Q0@J.de.Boyne.Pollard.localhost...
>
> >> for (uint32 N =3D 0; N < Max; N++)
> >> =A0 =A0 num =3D Data[num];
>
> > Your program is buggy. =A0It's not actually calculating the
> > metric that it is purporting to print out.
>
> >> uint32 Random =3D rand() * rand();
>
> > Not random enough the first time, was it? =A0(-:
>
> RAND_MAX =3D 32767
> A single invocation of rand() would not by itself provide
> random enough coverage over size when size is much greater
> than 32767.
>
However rand() * rand() will give you a binomial distribution, which
is unlikely to be what you want.
 rand() * (RAND_MAX + 1) + rand();
is the uniform distribution.

0
Dr
3/27/2010 12:21:03 PM
Peter Olcott wrote:
> "Jasen Betts" <jasen@xnet.co.nz> wrote in message 
> news:hojcr4$fbu$2@reversiblemaps.ath.cx...
>> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>>> "Joe Beanfish" <joe@nospam.duh> wrote in message
>>> news:hoipa5$uvm@news.thunderstone.com...
>>>> On 03/26/10 12:18, Måns Rullgård wrote:
>>>>> "Peter Olcott"<NoSpam@OCR4Screen.com>  writes:
>>>>>
>>>>>>>> Here is the C++ source
>>>>>>> Very funny.
>>>>>>>
>>>>>>>> #include<stdio.h>
>>>>>>>> #include<stdlib.h>
>>>>>>>> #include<vector>
>>>>>>>> #include<time.h>
>>>>>>>>
>>>>>>>> //#define uint32 unsigned int
>>>>>>>> typedef unsigned int uint32;
>>>>>>>>
>>>>>>>> const uint32 size   = 100000000;
>>>>>>>> std::vector<uint32>  Data;
>>>>>>>> uint32 Max = 0x3fffffff;
>>>>>>>>
>>>>>>>> double Process() {
>>>>>>>>    clock_t finish;
>>>>>>>>    clock_t start = clock();
>>>>>>>>    double duration;
>>>>>>>>    uint32 num = 0;
>>>>>>>>    for (uint32 N = 0; N<  Max; N++)
>>>>>>>>      num = Data[num];
>>>>>>>>    finish = clock();
>>>>>>>>    duration = (double)(finish - start) /
>>>>>>>> CLOCKS_PER_SEC;
>>>>>>>>    return duration;
>>>>>>>> }
>>>>>>> Even gcc manages to optimise this entire loop into
>>>>>>> nothing.  Your
>>>>>>> test is meaningless.
>>>>>> Actually this little test program accurately models 
>>>>>> the
>>>>>> behavior of my deterministic finite automaton OCR
>>>>>> recognizing engine, and the loop is not optimized into
>>>>>> nothing.
>>>>> I compiled your test app.  There is no loop.  If your
>>>>> compiler creates
>>>>> a loop, it is truly an awful compiler.
>>>> Or the optimizer turned off?
>>> The optimizer should not change the semantics enough to
>>> optimize away this loop.
>> why not?  the two versions funtions behave identically.
>> (with the exception of the possibility of undefined 
>> behavior in the
>> original, but it's legal to optimise that out)
>>
>> The value of num not used after the loop :
>> it is not tested, returned, output, or stored in a static 
>> or global
>> variable after the loop.
>>
>> thefore  " num = Data[num];"
>> is by induction a pointless operation and can be removed.
>> therfore the loop is empty
>> therfore the loop is unneeded.
> 
> Therefore it does not do what I need it to do therefore the 
> optimization is erroneous.
>>

You should be able to disable the optimization. "-O0" maybe?
0
Peter
3/27/2010 12:42:03 PM
Dr Malcolm McLean wrote:
> 
> On 27 Mar, 06:44, "Peter Olcott" <NoS...@OCR4Screen.com> wrote:
> > "Jonathan de Boyne Pollard"<J.deBoynePollard-newsgro...@NTLWorld.COM> wrote in message
> >
> > news:IU.D20100327.T053209.P2468.Q0@J.de.Boyne.Pollard.localhost...
> >
> > >> for (uint32 N = 0; N < Max; N++)
> > >>     num = Data[num];
> >
> > > Your program is buggy.  It's not actually calculating the
> > > metric that it is purporting to print out.
> >
> > >> uint32 Random = rand() * rand();
> >
> > > Not random enough the first time, was it?  (-:
> >
> > RAND_MAX = 32767
> > A single invocation of rand() would not by itself provide
> > random enough coverage over size when size is much greater
> > than 32767.
> >
> However rand() * rand() will give you a binomial distribution, which
> is unlikely to be what you want.
>  rand() * (RAND_MAX + 1) + rand();
> is the uniform distribution.

What makes you think that (RAND_MAX + 1) is defined?

-- 
pete
0
pete
3/27/2010 12:57:53 PM
On 3/27/2010 8:57 AM, pete wrote:
> Dr Malcolm McLean wrote:
>> [...]
>> However rand() * rand() will give you a binomial distribution, which
>> is unlikely to be what you want.
>>   rand() * (RAND_MAX + 1) + rand();
>> is the uniform distribution.
>
> What makes you think that (RAND_MAX + 1) is defined?

     And what makes him think that rand()*rand() has a binomial
distribution, even if it doesn't overflow?

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
0
Eric
3/27/2010 1:35:04 PM
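
Where RAND_MAX is 32767 (15 usable bits per call, as on the system in
question), one sketch that sidesteps both objections (the product's skewed
distribution and whether RAND_MAX + 1 is even representable) is to combine
two calls with a shift:

#include <cstdlib>

typedef unsigned int uint32;

// Assumes RAND_MAX == 32767, i.e. each rand() call yields 15 random bits.
uint32 rand30() {
    return ((uint32)rand() << 15) | (uint32)rand();   // 30 roughly uniform bits
}

// Example table index (the small modulo bias is harmless for a benchmark):
//   Data[N] = rand30() % size;
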
"Jasen Betts" <jasen@xnet.co.nz> wrote in message 
news:hokglb$ked$1@reversiblemaps.ath.cx...
> On 2010-03-27, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>>
>> "Jonathan de Boyne Pollard"
>><J.deBoynePollard-newsgroups@NTLWorld.COM> wrote in 
>>message
>> news:IU.D20100327.T053209.P2468.Q0@J.de.Boyne.Pollard.localhost...
>>> >
>>>>
>>>> for (uint32 N = 0; N < Max; N++)
>>>>     num = Data[num];
>>>>
>>> Your program is buggy.  It's not actually calculating 
>>> the
>>> metric that it is purporting to print out.
>>>
>>>> uint32 Random = rand() * rand();
>>>>
>>> Not random enough the first time, was it?  (-:
>>>
>> RAND_MAX = 32767
>> A single invocation of rand() would not by itself provide
>> random enough coverage over size when size is much 
>> greater
>> than 32767.
>
> I assumed you were doing that because you didn't want a 
> uniform distribution.
> (no primes above 32767, denser near the middle, etc...)

No, I did this so that spatial locality could not as easily 
improve the cache hit ratio.

>
> but it's worse than that  rand()*rand() has 
> (approximately) a 1 in 16384.25
> chance of producing the value 0
>
> so your attempt to improve rand() has actually made it 
> worse.
>
> this means your test program is likely to find a loop 
> starting at 0
> and going for a few tens of thousands of steps before 
> repeating.
>
> you could do this
>
> uint32 Random = rand() * (RAND_MAX+1) + rand();

OK, I may try this. The results don't look random enough 
right now.

>
> but /dev/urandom is likely to serve you better if you want 
> more randomness.
> (or dev random if you need truly random and are prepared 
> to wait for it
> to arrive).

That may slow things down too much. If I want true randomness I 
can get a hardware random number generator dongle. For my 
purposes I only need the numbers to be very widely 
dispersed. I could generate my own pseudo random numbers 
based on a lookup table.

>
> long longrand()
> {
>        static int fd = -1;
>        long rv;
>        if( fd < 0)
>          fd=open("/dev/urandom",O_RDONLY);
>        if( fd<0 || read(fd,&rv,sizeof rv) < sizeof rv)
>                {
>                if( fd>=0)close(fd);
>                fd=-1;
>                return -1;
>                }
>        return rv & 0x7fffffff;
> }
>
>
> --- news://freenews.netfront.net/ - complaints: 
> news@netfront.net --- 


0
Peter
3/27/2010 2:21:37 PM
"David Given" <dg@cowlark.com> wrote in message 
news:hoksqh$184$1@news.eternal-september.org...
> On 26/03/10 18:42, Peter Olcott wrote:
> [...]
>> How do you turn off swapping completely? I will try it.
>
> This may not help as much as you think --- on Linux, the 
> swapper is used
> for all sorts of stuff, including loading code from disk. 
> Removing any
> swap partitions will merely stop it from swapping RSS 
> pages to disk. If
> the kernel thinks it's running out of memory, it'll still 
> discard code
> pages, and swap them back in again when you try to run 
> them.
>
> Swap issues almost certainly not what the problem is here. 
> Of course, I
> don't know what *is*, but I'm pretty sure swap isn't it.

More extensive testing might show what the problem is.

>
> -- 
> ┌─── dg@cowlark.com ───── http://www.cowlark.com ─────
> │
> │ "In the beginning was the word.
> │ And the word was: Content-type: text/plain" --- Unknown 
> sage 


0
Peter
3/27/2010 2:22:50 PM
OK this is changed now.

"Dr Malcolm McLean" <malcolm.mclean5@btinternet.com> wrote 
in message 
news:4f098cf5-f4b8-4fdf-b65f-077d724cb8cf@z3g2000yqz.googlegroups.com...
On 27 Mar, 06:44, "Peter Olcott" <NoS...@OCR4Screen.com> 
wrote:
> "Jonathan de Boyne 
> Pollard"<J.deBoynePollard-newsgro...@NTLWorld.COM> wrote 
> in message
>
> RAND_MAX = 32767
> A single invocation of rand() would not by itself provide
> random enough coverage over size when size is much greater
> than 32767.
>
However rand() * rand() will give you a binomial 
distribution, which
is unlikely to be what you want.
 rand() * (RAND_MAX + 1) + rand();
is the uniform distribution.


0
Peter
3/27/2010 2:37:33 PM
"Peter Flass" <Peter_Flass@Yahoo.com> wrote in message 
news:hokuep$afb$3@news.eternal-september.org...
> Peter Olcott wrote:

>>> thefore  " num = Data[num];"
>>> is by induction a pointless operation and can be 
>>> removed.
>>> therfore the loop is empty
>>> therfore the loop is unneeded.
>>
>> Therefore it does not do what I need it to do therefore 
>> the optimization is erroneous.
>>>
>
> You should be able to disable the optimization. "-O0" 
> maybe?

I simply did not provide any command line switches and it 
worked fine. 


0
Peter
3/27/2010 2:38:43 PM
On 2010-03-27, Peter Olcott <NoSpam@OCR4Screen.com> wrote:

>>> The optimizer should not change the semantics enough to
>>> optimize away this loop.
>>
>> why not?  the two versions of the function behave identically.
>> (with the exception of the possibility of undefined 
>> behavior in the
>> original, but it's legal to optimise that out)
>>
>> The value of num is not used after the loop:
>> it is not tested, returned, output, or stored in a static 
>> or global
>> variable after the loop.
>>
>> therefore  " num = Data[num];"
>> is by induction a pointless operation and can be removed.
>> therefore the loop is empty
>> therefore the loop is unneeded.
>
> Therefore it does not do what I need it to do therefore the 
> optimization is erroneous.

So expect the compiler to "do what you need it to do" rather than
correctly compile the code you write according to the language
standard?

You're headed for a lifetime of frustration and failure...

-- 
Grant
0
Grant
3/27/2010 2:49:14 PM
On 2010-03-27, Peter Flass <Peter_Flass@Yahoo.com> wrote:
> Peter Olcott wrote:
>> "Jasen Betts" <jasen@xnet.co.nz> wrote in message 
>> news:hojcr4$fbu$2@reversiblemaps.ath.cx...
>>> On 2010-03-26, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>>>> "Joe Beanfish" <joe@nospam.duh> wrote in message
>>>> news:hoipa5$uvm@news.thunderstone.com...
>>>>>>> On 03/26/10 12:18, Måns Rullgård wrote:
>>>>>> "Peter Olcott"<NoSpam@OCR4Screen.com>  writes:
>>>>>>
>>>>>>>>> Here is the C++ source
>>>>>>>> Very funny.
>>>>>>>>
>>>>>>>>> #include<stdio.h>
>>>>>>>>> #include<stdlib.h>
>>>>>>>>> #include<vector>
>>>>>>>>> #include<time.h>
>>>>>>>>>
>>>>>>>>> //#define uint32 unsigned int
>>>>>>>>> typedef unsigned int uint32;
>>>>>>>>>
>>>>>>>>> const uint32 size   = 100000000;
>>>>>>>>> std::vector<uint32>  Data;
>>>>>>>>> uint32 Max = 0x3fffffff;
>>>>>>>>>
>>>>>>>>> double Process() {
>>>>>>>>>    clock_t finish;
>>>>>>>>>    clock_t start = clock();
>>>>>>>>>    double duration;
>>>>>>>>>    uint32 num = 0;
>>>>>>>>>    for (uint32 N = 0; N<  Max; N++)
>>>>>>>>>      num = Data[num];
>>>>>>>>>    finish = clock();
>>>>>>>>>    duration = (double)(finish - start) /
>>>>>>>>> CLOCKS_PER_SEC;
>>>>>>>>>    return duration;
>>>>>>>>> }
>>>>>>>> Even gcc manages to optimise this entire loop into
>>>>>>>> nothing.  Your
>>>>>>>> test is meaningless.
>>>>>>> Actually this little test program accurately models 
>>>>>>> the
>>>>>>> behavior of my deterministic finite automaton OCR
>>>>>>> recognizing engine, and the loop is not optimized into
>>>>>>> nothing.
>>>>>> I compiled your test app.  There is no loop.  If your
>>>>>> compiler creates
>>>>>> a loop, it is truly an awful compiler.
>>>>> Or the optimizer turned off?
>>>> The optimizer should not change the semantics enough to
>>>> optimize away this loop.
>>> why not?  the two versions of the function behave identically.
>>> (with the exception of the possibility of undefined 
>>> behavior in the
>>> original, but it's legal to optimise that out)
>>>
>>> The value of num is not used after the loop:
>>> it is not tested, returned, output, or stored in a static 
>>> or global
>>> variable after the loop.
>>>
>>> therefore  " num = Data[num];"
>>> is by induction a pointless operation and can be removed.
>>> therefore the loop is empty
>>> therefore the loop is unneeded.
>> 
>> Therefore it does not do what I need it to do therefore the 
>> optimization is erroneous.
>
> You should be able to disable the optimization.

That's not guaranteed by the language specification.

> "-O0" maybe?

Better than mucking about with compiler flags in an attempt to get
lucky and stumble over a program that does "what you need it to do" --
write a correct program.

Seriously.

-- 
Grant
0
Grant
3/27/2010 2:51:22 PM
In comp.os.linux.development.apps Peter Olcott <NoSpam@ocr4screen.com> wrote:

> "Peter Flass" <Peter_Flass@Yahoo.com> wrote in message 
> news:hokuep$afb$3@news.eternal-september.org...
> > Peter Olcott wrote:

> >>> therefore  " num = Data[num];"
> >>> is by induction a pointless operation and can be 
> >>> removed.
> >>> therefore the loop is empty
> >>> therefore the loop is unneeded.
> >>
> >> Therefore it does not do what I need it to do therefore 
> >> the optimization is erroneous.
> >>>
> >
> > You should be able to disable the optimization. "-O0" 
> > maybe?

> I simply did not provide any command line switches and it 
> worked fine. 

It may depend on the default settings of your compiler - g++ has
optimizations switched on by default and when your program is
compiled in this default mode the time spent in the Process()
function is too short to measure with the method you use. Only
when optimization is explicitly switched off (using '-O0') is some
appreciable time spent there.

On my machine (also with 2 GB of RAM) I didn't see any pronounced
slowdown with increased memory use (at least up to the point where
still enough RAM was available and swapping to the harddisk didn't
start), I got about 23 seconds with 'size' set to 100000000 and
23.9 s for 400000000. So getting more than 3/4 of the available RAM
(1.6 GB) doesn't seem to be a problem. And /proc/meminfo showed
that nearly 1.8 GB were free (even though an Apache web server,
a PostgreSQL server, emacs and a number of other processes were
also running). That may indicate that the effect you're seeing
has some reason not directly related to memory usage, especially
not to the "OS eating up 1.5 GB".

                            Regards, Jens
-- 
  \   Jens Thoms Toerring  ___      jt@toerring.de
   \__________________________      http://toerring.de
0
jt
3/27/2010 3:25:01 PM
"Grant Edwards" <invalid@invalid.invalid> wrote in message 
news:hol5ta$dre$2@reader1.panix.com...
> On 2010-03-27, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>
>>>> The optimizer should not change the semantics enough to
>>>> optimize away this loop.
>>>
>>> why not?  the two versions of the function behave identically.
>>> (with the exception of the possibility of undefined
>>> behavior in the
>>> original, but it's legal to optimise that out)
>>>
>>> The value of num is not used after the loop:
>>> it is not tested, returned, output, or stored in a 
>>> static
>>> or global
>>> variable after the loop.
>>>
>>> therefore  " num = Data[num];"
>>> is by induction a pointless operation and can be 
>>> removed.
>>> therefore the loop is empty
>>> therefore the loop is unneeded.
>>
>> Therefore it does not do what I need it to do therefore 
>> the
>> optimization is erroneous.
>
> So expect the compiler to "do what you need it to do" 
> rather than
> correctly compile the code you write according to the 
> language
> standard?
>
> You're headed for a lifetime of frustration and failure...
>
> -- 
> Grant

I won't bother to try to correct you, others have already 
corrected you on this. 


0
Peter
3/27/2010 5:32:03 PM
Joe Pfeiffer <pfeiffer@cs.nmsu.edu> writes:
> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>
>> "Joe Beanfish" <joe@nospam.duh> wrote in message 
>> news:hoipa5$uvm@news.thunderstone.com...
>>>
>>> Or the optimizer turned off?
>>
>> The optimizer should not change the semantics enough to 
>> optimize away this loop. 
>
> Oh, yes it could.  It can easily notice that num is never read after
> it's written

???

>>>>>    for (uint32 N = 0; N<  Max; N++)
>>>>>      num = Data[num];
                     ^^^^^ 
                     that's num being read.

[SNIP - mention of bubblesort in a teaching environment - you 
should be ashamed of yourself contaminating youngsters' minds.]

Phil
-- 
I find the easiest thing to do is to k/f myself and just troll away
-- David Melville on r.a.s.f1
0
Phil
3/27/2010 5:35:10 PM
"Jens Thoms Toerring" <jt@toerring.de> wrote in message 
news:816madF55jU1@mid.uni-berlin.de...
> In comp.os.linux.development.apps Peter Olcott 
> <NoSpam@ocr4screen.com> wrote:
>
>> "Peter Flass" <Peter_Flass@Yahoo.com> wrote in message
>> news:hokuep$afb$3@news.eternal-september.org...
>> > Peter Olcott wrote:
>
>> >>> therefore  " num = Data[num];"
>> >>> is by induction a pointless operation and can be
>> >>> removed.
>> >>> therefore the loop is empty
>> >>> therefore the loop is unneeded.
>> >>
>> >> Therefore it does not do what I need it to do 
>> >> therefore
>> >> the optimization is erroneous.
>> >>>
>> >
>> > You should be able to disable the optimization. "-O0"
>> > maybe?
>
>> I simply did not provide any command line switches and it
>> worked fine.
>
> It may depend on the default settings of your compiler - 
> g++ has
> optomizations switched on per default and when your 
> program is
> compiled in this default mode the time spent in the 
> Process()
> function is too short to measure with the method you use. 
> Only
> with switching optimization off (using '-O0') expicitely 
> some
> appreciable time is spent there.
>
> On my machine (also with 2 GB of RAM) I didn't see any 
> pronounced
> slow down with increased memory use (at least up to the 
> point were
> still enough RAM was available and swapping to the 
> harddisk didn't
> start), I got about 23 seconds with 'size' set to 
> 100000000 and
> 23.9 s for 400000000. So getting more than 3/4 of the 
> avaiable RAM
> (1.6 GB) doesn't seem to be a problem. And /proc/meminfo 
> showed
> that nearly 1.8 GB were free (even though an Apache web 
> server,
> a PostgreSQL server, emacs and a number of other processes 
> where
> also running). That may indicate that the effect you're 
> seeing
> has some reason not directly related to memory usage, 
> especially
> not to the "OS eating up 1.5 GB".
>
>                            Regards, Jens
> -- 
>  \   Jens Thoms Toerring  ___      jt@toerring.de
>   \__________________________      http://toerring.de

I did much more extensive testing and found at least some 
instances where I got really good speed even when using huge 
amounts of RAM. It seems to be that the sequence of the 
random numbers controls the performance. I rewrote the code 
to start with a fixed seed, and then the results (at least 
under Windows) were duplicated when the program was re-run. 
I will run this under Linux now too.

I will also try ways of doing this with optimization turned 
on.  It is already turned on in Windows. The biggest 
question that I have left is why the memory access 
performance is so much worse than what the worst case cache 
hit ratio would predict.


0
Peter
3/27/2010 5:39:16 PM
"Grant Edwards" <invalid@invalid.invalid> wrote in message 
news:hol5ta$dre$2@reader1.panix.com...
> On 2010-03-27, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
>
>>>> The optimizer should not change the semantics enough to
>>>> optimize away this loop.
>>>
>>> why not?  the two versions of the function behave identically.
>>> (with the exception of the possibility of undefined
>>> behavior in the
>>> original, but it's legal to optimise that out)
>>>
>>> The value of num is not used after the loop:
>>> it is not tested, returned, output, or stored in a 
>>> static
>>> or global
>>> variable after the loop.
>>>
>>> therefore  " num = Data[num];"
>>> is by induction a pointless operation and can be 
>>> removed.
>>> therefore the loop is empty
>>> therefore the loop is unneeded.
>>
>> Therefore it does not do what I need it to do therefore 
>> the
>> optimization is erroneous.
>
> So expect the compiler to "do what you need it to do" 
> rather than
> correctly compile the code you write according to the 
> language
> standard?
>
> You're headed for a lifetime of frustration and failure...
>
> -- 
> Grant

Correct me if I am wrong: Where in the language standard 
does it say to remove the body of my loop? 


0
Peter
3/27/2010 5:40:45 PM
In comp.os.linux.development.apps Peter Olcott <NoSpam@ocr4screen.com> wrote:

> "Grant Edwards" <invalid@invalid.invalid> wrote in message 
> news:hol5ta$dre$2@reader1.panix.com...
> > On 2010-03-27, Peter Olcott <NoSpam@OCR4Screen.com> wrote:
> >
> >>>> The optimizer should not change the semantics enough to
> >>>> optimize away this loop.
> >>>
> >>> why not?  the two versions of the function behave identically.
> >>> (with the exception of the possibility of undefined
> >>> behavior in the
> >>> original, but it's legal to optimise that out)
> >>>
> >>> The value of num is not used after the loop:
> >>> it is not tested, returned, output, or stored in a 
> >>> static
> >>> or global
> >>> variable after the loop.
> >>>
> >>> therefore  " num = Data[num];"
> >>> is by induction a pointless operation and can be 
> >>> removed.
> >>> therefore the loop is empty
> >>> therefore the loop is unneeded.
> >>
> >> Therefore it does not do what I need it to do therefore 
> >> the
> >> optimization is erroneous.
> >
> > So expect the compiler to "do what you need it to do" 
> > rather than
> > correctly compile the code you write according to the 
> > language
> > standard?
> >
> > You're headed for a lifetime of frustration and failure...

> Correct me if I am wrong: Where in the language standard 
> does it say to remove the body of my loop? 

I don't have the C++-standard but in the C99-standard (and
I would be surprised if C and C++ would differ much in that
respect) you have in section 5.1.2.3 "Program execution",
paragraph 3

  In the abstract machine, all expressions are evaluated as
  specified by the semantics. An actual implementation need
  not evaluate part of an expression if it can deduce that
  its value is not used and that no needed side effects are
  produced (including any caused by calling a function or
  accessing a volatile object).

I guess that's the standard's "blessing" of such optimizations
you were looking for. Without that blessing I wouldn't know
why the 'volatile' keyword would be needed at all - it's just
there to allow you to tell the compiler explicitly not to
optimize out memory accesses that have no language-imposed side
effects.
                          Regards, Jens
-- 
  \   Jens Thoms Toerring  ___      jt@toerring.de
   \__________________________      http://toerring.de
0
jt
3/27/2010 6:05:33 PM
In comp.os.linux.development.system Måns Rullgård <mans@mansr.com> wrote:
> 
> 2) The bandwidth figures the OP quoted are off by several orders of
>    magnitude from what would be expected of a modern system.  Even
>    main memory can provide several gigabytes per second these days.
>    This suggests there is something fishy elsewhere, which is why I
>    suggested looking at the actual machine code to see what really is
>    being executed.
> 

Apparently OP wants to test random memory access.  For sequentially
dependent random reads his high result is very good: 90 MB/s in
ints means 22.5*10^6 memory accesses, which means less than 50 ns
per access.  Probably his accesses are not random enough and
he still gets a nontrivial proportion of L2 cache hits.

There are a few reasons for a slowdown when using more memory:
- initializing the array with random values tends to produce
  loops much shorter than the array size.  Due to (pseudo)
  randomness a larger array may produce a much longer loop
  and consequently have a much smaller proportion of L2 cache hits
- TLB thrashing.  With random access each access is likely
  to be on a different page and require a different page table
  entry.  A relatively short loop will run out of TLB entries
  on chip and reload them from cache.  A longer loop will
  need too many page table entries to fit in L2 cache and
  will have to reload them from RAM.

P.S. I would answer to OP directly, but apparently his
posts do not show up (expire too quickly???).

-- 
                              Waldek Hebisch
hebisch@math.uni.wroc.pl 
0
Waldek
3/27/2010 6:28:11 PM
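One way to check the short-loop effect described above is to count how 
many distinct elements the chase num = Data[num] actually visits before 
it repeats itself. A rough sketch, assuming the uint32 typedef and a 
Data array whose entries are all smaller than 'size', as in the test 
program quoted earlier in the thread:

    #include <stdlib.h>

    /* Returns the number of distinct elements visited when chasing
       num = Data[num] from index 0 until some index repeats.  A result
       much smaller than 'size' means the walk is stuck in a short cycle
       and most of the array (and most of its pages) is never touched. */
    uint32 chase_length(const uint32 *Data, uint32 size)
    {
        unsigned char *seen = (unsigned char *)calloc(size, 1);
        uint32 num = 0, steps = 0;
        if (seen == NULL)
            return 0;
        while (!seen[num]) {
            seen[num] = 1;
            num = Data[num];
            steps++;
        }
        free(seen);
        return steps;
    }

Calling chase_length(&Data[0], size) right after the array is 
initialised would show directly whether the timing loop really touches 
the whole array.
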
"Jens Thoms Toerring" <jt@toerring.de> wrote in message 
news:816vndF16jU1@mid.uni-berlin.de...
> In comp.os.linux.development.apps Peter Olcott 
> <NoSpam@ocr4screen.com> wrote:
>> Correct me if I am wrong: Where in the language standard
>> does it say to remove the body of my loop?
>
> I don't have the C++-standard but in the C99-standard (and
> I would be surprised if C and C++ would differ much in 
> that
> respect) you have in section 5.1.2.3 "Program execution",
> paragraph 3
>
>  In the abstract machine, all expressions are evaluated as
>  specified by the semantics. An actual implementation need
>  not evaluate part of an expression if it can deduce that
>  its value is not used and that no needed side effects are
>  produced (including any caused by calling a function or
>  accessing a volatile object).
>
> I guess that's the standard's "blessing" of such 
> optimizations
> you were looking for. Without that blessing I wouldn't 
> know
> why the 'volatile' keyword would be needed at all - it's 
> just
> there to allow uou to tell the compiler explicitely not 
> opti-
> mize out memory accesses that have no language-imposed 
> side
> effects.

Great. I used this to overcome the objections that my code 
could not be compiled with the -O3 switch:
  volatile uint32 num;

>                          Regards, Jens
> -- 
>  \   Jens Thoms Toerring  ___      jt@toerring.de
>   \__________________________      http://toerring.de 


0
Peter
3/27/2010 6:36:28 PM
"Waldek Hebisch" <hebisch@math.uni.wroc.pl> wrote in message 
news:holinr$1to$1@z-news.wcss.wroc.pl...
> In comp.os.linux.development.system Måns Rullgård 
> <mans@mansr.com> wrote:
>>
>> 2) The bandwidth figures the OP quoted are off by several 
>> orders of
>>    magnitude from what would be expected of a modern 
>> system.  Even
>>    main memory can provide several gigabytes per second 
>> these days.
>>    This suggests there is something fishy elsewhere, 
>> which is why I
>>    suggested looking at the actual machine code to see 
>> what really is
>>    being executed.
>>
>
> Apparently OP wants to test random memory access.  For 
> sequentially
> dependent random reads his high result is very good: 90 
> MB/s in
> ints means 22.5*10^6 memory accesses, which means less than 
> 50 ns
> per access.  Probably his accesses are not random enough 
> and
> he still gets a nontrivial proportion of L2 cache hits.
>
> There are a few reasons for a slowdown when using more 
> memory:
> - initializing the array with random values tends to produce
>  loops much shorter than the array size.  Due to (pseudo)
>  randomness a larger array may produce a much longer loop
>  and consequently have a much smaller proportion of L2 cache 
> hits
> - TLB thrashing.  With random access each access is likely
>  to be on a different page and require a different page table
>  entry.  A relatively short loop will run out of TLB 
> entries
>  on chip and reload them from cache.  A longer loop will
>  need too many page table entries to fit in L2 cache and
>  will have to reload them from RAM.

(1) There are no page faults even with very large memory 
allocations.
(2) I specifically want to test worst case cache hit 
performance.
(3) The biggest difference in performance can be attributed 
to differing sequences of random numbers. This can be 
repeated by using the same fixed constant for the seed.
(4) The performance that I am getting is substantially worse 
than worst case cache hit performance.

My primary remaining question is why (4) is true.

>
> P.S. I would answer to OP directly, but apparently his
> posts do not show up (expire too quickly???).
>
> -- 
>                              Waldek Hebisch
> hebisch@math.uni.wroc.pl 


0
Peter
3/27/2010 6:41:58 PM
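A rough worked estimate of how (4) can happen, with illustrative 
numbers rather than measurements from this thread: with size set to 
100000000 the array is about 400 MB, so a random index almost always 
lands on a page whose translation is no longer in the TLB, and with 
that much data the page-table entries themselves no longer fit in the 
L2 cache either. If the page-table walk costs one extra DRAM access on 
top of the data miss, a step costs roughly 2 x 60 ns = 120 ns instead 
of 60 ns, and Max = 0x3fffffff (about 1.07*10^9) steps then take on the 
order of two minutes rather than one. That alone puts the measured 
time well below what a plain "every access is one cache miss" estimate 
would predict.
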
"Peter Olcott" <NoSpam@OCR4Screen.com> writes:

> "Jens Thoms Toerring" <jt@toerring.de> wrote in message 
> news:816vndF16jU1@mid.uni-berlin.de...
>> In comp.os.linux.development.apps Peter Olcott 
>> <NoSpam@ocr4screen.com> wrote:
>>> Correct me if I am wrong: Where in the language standard
>>> does it say to remove the body of my loop?
>>
>> I don't have the C++-standard but in the C99-standard (and
>> I would be surprised if C and C++ would differ much in 
>> that
>> respect) you have in section 5.1.2.3 "Program execution",
>> paragraph 3
>>
>>  In the abstract machine, all expressions are evaluated as
>>  specified by the semantics. An actual implementation need
>>  not evaluate part of an expression if it can deduce that
>>  its value is not used and that no needed side effects are
>>  produced (including any caused by calling a function or
>>  accessing a volatile object).
>>
>> I guess that's the standard's "blessing" of such optimizations you
>> were looking for. Without that blessing I wouldn't know why the
>> 'volatile' keyword would be needed at all - it's just there to
>> allow uou to tell the compiler explicitely not opti- mize out
>> memory accesses that have no language-imposed side effects.
>
> Great. I used this to overcome the objections that my code 
> could not be compiled with tht -O3 switch:
>   volatile uint32 num;

Bad idea.  That will prevent proper optimisation of the loop.  The
correct solution is to do something with the final value of num, such
as storing it in a global variable or printing it.

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/27/2010 6:49:51 PM
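A minimal sketch of what Måns suggests, reusing the uint32 typedef and 
the Data and Max globals from the test program quoted earlier in the 
thread; printing the final value gives the loop an observable side 
effect, so even -O3 has to keep it, while num itself can still live in 
a register:

    #include <stdio.h>
    #include <time.h>

    double Process(void)
    {
        clock_t start = clock();
        uint32 num = 0;
        for (uint32 N = 0; N < Max; N++)
            num = Data[num];
        clock_t finish = clock();
        /* The printed value depends on every iteration, so the loop
           cannot be discarded as dead code. */
        printf("final index: %u\n", num);
        return (double)(finish - start) / CLOCKS_PER_SEC;
    }
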
On 27 Mar, 13:35, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> What makes you think that (RAND_MAX + 1) is defined?
>
>      And what makes him think that rand()*rand() has a
> binomial distribution, even if it doesn't overflow?
>
I nodded.
A web search turns up that there's no neat way of obtaining the
probability density function of the product of several uniformly-
distributed random variables. Basically you have to go to log space
and treat it as a sum (which approximates the normal distribution as N
increases).


0
Dr
3/27/2010 7:08:10 PM
"M�ns Rullg�rd" <mans@mansr.com> wrote in message 
news:yw1xeij54n1s.fsf@unicorn.mansr.com...
> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>
>> "Jens Thoms Toerring" <jt@toerring.de> wrote in message
>> news:816vndF16jU1@mid.uni-berlin.de...
>>> In comp.os.linux.development.apps Peter Olcott
>>> <NoSpam@ocr4screen.com> wrote:
>>>> Correct me if I am wrong: Where in the language 
>>>> standard
>>>> does it say to remove the body of my loop?
>>>
>>> I don't have the C++-standard but in the C99-standard 
>>> (and
>>> I would be surprised if C and C++ would differ much in
>>> that
>>> respect) you have in section 5.1.2.3 "Program 
>>> execution",
>>> paragraph 3
>>>
>>>  In the abstract machine, all expressions are evaluated 
>>> as
>>>  specified by the semantics. An actual implementation 
>>> need
>>>  not evaluate part of an expression if it can deduce 
>>> that
>>>  its value is not used and that no needed side effects 
>>> are
>>>  produced (including any caused by calling a function or
>>>  accessing a volatile object).
>>>
>>> I guess that's the standard's "blessing" of such 
>>> optimizations you
>>> were looking for. Without that blessing I wouldn't know 
>>> why the
>>> 'volatile' keyword would be needed at all - it's just 
>>> there to
>>> allow uou to tell the compiler explicitely not opti- 
>>> mize out
>>> memory accesses that have no language-imposed side 
>>> effects.
>>
>> Great. I used this to overcome the objections that my 
>> code
>> could not be compiled with tht -O3 switch:
>>   volatile uint32 num;
>
> Bad idea.  That will prevent proper optimisation of the 
> loop.  The
> correct solution is to do something with the final value 
> of num, such
> as storing it in a global variable or printing it.

OK that speeded up the tight loop even more.
>
> -- 
> Måns Rullgård
> mans@mansr.com 


0
Peter
3/27/2010 7:42:30 PM
>
>
> On my machine (also with 2 GB of RAM) I didn't see any pronounced slow 
> down with increased memory use (at least up to the point were still 
> enough RAM was available and swapping to the harddisk didn't start), I 
> got about 23 seconds with 'size' set to 100000000 and 23.9 s for 
> 400000000. So getting more than 3/4 of the avaiable RAM (1.6 GB) 
> doesn't seem to be a problem. And /proc/meminfo showed that nearly 1.8 
> GB were free (even though an Apache web server, a PostgreSQL server, 
> emacs and a number of other processes where also running). That may 
> indicate that the effect you're seeing has some reason not directly 
> related to memory usage, especially not to the "OS eating up 1.5 GB".
>
The point doesn't seem to have sunk in, yet.  M. Olcott is still going 
on about memory speeds, too. So I'll repeat it:  Despite the fact that 
the output of the program claims to be measuring a memory access 
throughput, the actual operation of the program itself is doing no such 
thing.

One might think, at first blush, that the program is constructing a 
large array of pages and then referencing them all randomly, to simulate 
a random memory access pattern by some other program.  But that's not 
what the program is doing.  What it is in fact doing is creating a set 
of 1 or more random-length potentially cross-linked singly-linked lists 
within an array of unsigned integers, and traversing the one of them 
that starts at element 0 of the array.  As pointed out, many of the step 
effects here are due to the way that the pseudo-random sequence happens 
to distribute over the array size, which (note) is a non-prime number in 
many of the tests that people are running.  Furthermore, it's well 
within the bounds of possibility for the PRNG to cause a short 
singly-linked list that loops tightly within a single page in one case, 
and a long singly-linked list that spans multiple pages in another, 
depending on seed values and array sizes.  There's certainly no 
guarantee that all of the array (and thus all of the memory pages it 
occupies) is actually referenced at all by the list traversal, so the 
calculation of the value that is printed out, based upon the assumption 
that the whole array is referenced, is completely bogus.

M. Olcott's program is buggy.  It's not actually calculating the metric 
that it is purporting to print out. Speculating as to why your machine's 
"memory speed" is different to M. Olcott's, based upon the output of 
this program, is pointless.  You might as well be speculating over NABOB 
compression factors.

0
Jonathan
3/27/2010 9:21:03 PM
In comp.os.linux.development.apps Peter Olcott <NoSpam@ocr4screen.com> wrote:

> "Måns Rullgård" <mans@mansr.com> wrote in message 
> news:yw1xeij54n1s.fsf@unicorn.mansr.com...
> > "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
> >> "Jens Thoms Toerring" <jt@toerring.de> wrote in message
> >> news:816vndF16jU1@mid.uni-berlin.de...
> >>> In comp.os.linux.development.apps Peter Olcott
> >>> <NoSpam@ocr4screen.com> wrote:
> >>>> Correct me if I am wrong: Where in the language standard does
> >>>> it say to remove the body of my loop?
> >>>
> >>> I don't have the C++-standard but in the C99-standard (and I
> >>> would be surprised if C and C++ would differ much in that
> >>> respect) you have in section 5.1.2.3 "Program execution",
> >>> paragraph 3
> >>>
> >>>  In the abstract machine, all expressions are evaluated as
> >>> specified by the semantics. An actual implementation need not
> >>> evaluate part of an expression if it can deduce that its value
> >>> is not used and that no needed side effects are produced
> >>> (including any caused by calling a function or accessing a
> >>> volatile object).
> >>>
> >>> I guess that's the standard's "blessing" of such optimizations
> >>> you were looking for. Without that blessing I wouldn't know
> >>> why the 'volatile' keyword would be needed at all - it's just
> >>> there to allow uou to tell the compiler explicitely not opti-
> >>> mize out memory accesses that have no language-imposed side
> >>> effects.
> >>
> >> Great. I used this to overcome the objections that my code
> >> could not be compiled with tht -O3 switch: volatile uint32 num;
> >
> > Bad idea. That will prevent proper optimisation of the loop. The
> > correct solution is to do something with the final value of num,
> > such as storing it in a global variable or printing it.

> OK that speeded up the tight loop even more.

Not a big surprise actually - if you use 'volatile' each and
every write (and read) has to go to the RAM (keeping values
just in CPU registers or in the cache isn't allowed).
Normally 'volatile' is used when you have e.g. memory mapped
devices - the compiler doesn't know about them, but writing
or reading from such memory locations can have (non-language
imposed) side effects (e.g. writing to a memory location may
make a LED blink or something more useful like making an ADC
do a conversion etc.), and thus actually writing to (or reading
from) such memory locations makes all the difference. Another
scenario is variables that may get changed by e.g. interrupt
handlers etc. (not very useful just checking a copy of a value
in a CPU register when the interrupt handler changes what's in
RAM behind your back). But otherwise using 'volatile' isn't a
great idea (unless you want to slow things down as far as
possible ;-) On the other hand e.g. printing out a value is a side
effect the compiler is not allowed to "optimize out", thus it
can't discard all of the operations that do the calculation of
the value to be printed out.
                                Regards, Jens
-- 
  \   Jens Thoms Toerring  ___      jt@toerring.de
   \__________________________      http://toerring.de
0
jt
3/27/2010 10:02:37 PM
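For contrast, a small sketch of the kind of situation Jens describes 
where volatile really is the right tool; the register address and bit 
mask here are made up purely for illustration:

    typedef unsigned int uint32;

    #define READY_BIT 0x1u                       /* hypothetical status bit */

    /* Hypothetical memory-mapped status register of some device. */
    static volatile uint32 * const status_reg =
        (volatile uint32 *)0xFFFF0000u;

    void wait_until_ready(void)
    {
        /* Without volatile the compiler could read the register once,
           keep the value in a CPU register, and spin forever. */
        while ((*status_reg & READY_BIT) == 0)
            ;   /* busy-wait until the device sets the ready bit */
    }
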
"Jonathan de Boyne Pollard" 
<J.deBoynePollard-newsgroups@NTLWorld.COM> wrote in message 
news:IU.D20100327.T212118.P2407.Q0@J.de.Boyne.Pollard.localhost...
> >
>>
>> On my machine (also with 2 GB of RAM) I didn't see any 
>> pronounced slow down with increased memory use (at least 
>> up to the point were still enough RAM was available and 
>> swapping to the harddisk didn't start), I got about 23 
>> seconds with 'size' set to 100000000 and 23.9 s for 
>> 400000000. So getting more than 3/4 of the avaiable RAM 
>> (1.6 GB) doesn't seem to be a problem. And /proc/meminfo 
>> showed that nearly 1.8 GB were free (even though an 
>> Apache web server, a PostgreSQL server, emacs and a 
>> number of other processes where also running). That may 
>> indicate that the effect you're seeing has some reason 
>> not directly related to memory usage, especially not to 
>> the "OS eating up 1.5 GB".
>>
> The point doesn't seem to have sunk in, yet.  M. Olcott is 
> still going on about memory speeds, too. So I'll repeat 
> it:  Despite the fact that the output of the program 
> claims to be measuring a memory access throughput, the 
> actual operation of the program itself is doing no such 
> thing.
>
> One might think, at first blush, that the program is 
> constructing a large array of pages and then referencing 
> them all randomly, to simulate a random memory access 
> pattern by some other program.  But that's not what the 
> program is doing.  What it is in fact doing is creating a 
> set of 1 or more random-length potentially cross-linked 
> singly-linked lists within an array of unsigned integers, 
> and traversing the one of them

(1) I implemented a std::vector once so I know that it is 
supposed to be allocated as a single contiguous block of 
memory.
(2) The system monitor indicates that this is what it is 
doing because memory allocation increases exactly as 
specified.
Where do you get the idea that it is creating a set of 
linked lists from?

> that starts at element 0 of the array.  As pointed out, 
> many of the step effects here are due to the way that the 
> pseudo-random sequence happens to distribute over the 
> array size, which (note) is a non-prime number in many of 
> the tests that people are running.  Furthermore, it's well 
> within the bounds of possibility for the PRNG to cause a 
> short singly-linked list that loops tightly within a 
> single page in one case, and a long singly-linked list 
> that spans multiple pages in another, depending from seed 
> values and array sizes.  There's certainly no guarantee 
> that all of the array (and thus all of the memory pages it 
> occupies) is actually referenced at all by the list 
> traversal, so the calculation of the value that is printed 
> out, based upon the assumption that the whole array is 
> referenced, is completely bogus.
>
> M. Olcott's program is buggy.  It's not actually 
> calculating the metric that it is purporting to print out. 
> Speculating as to why your machine's "memory speed" is 
> different to M. Olcott's, based upon the output of this 
> program, is pointless.  You might as well be speculating 
> over NABOB compression factors.
> 


0
Peter
3/27/2010 10:26:36 PM
"Jonathan de Boyne Pollard" 
<J.deBoynePollard-newsgroups@NTLWorld.COM> wrote in message 
news:IU.D20100327.T212118.P2407.Q0@J.de.Boyne.Pollard.localhost...
> >
>>
>> On my machine (also with 2 GB of RAM) I didn't see any 
>> pronounced slow down with increased memory use (at least 
>> up to the point were still enough RAM was available and 
>> swapping to the harddisk didn't start), I got about 23 
>> seconds with 'size' set to 100000000 and 23.9 s for 
>> 400000000. So getting more than 3/4 of the avaiable RAM 
>> (1.6 GB) doesn't seem to be a problem. And /proc/meminfo 
>> showed that nearly 1.8 GB were free (even though an 
>> Apache web server, a PostgreSQL server, emacs and a 
>> number of other processes where also running). That may 
>> indicate that the effect you're seeing has some reason 
>> not directly related to memory usage, especially not to 
>> the "OS eating up 1.5 GB".
>>
> The point doesn't seem to have sunk in, yet.  M. Olcott is 
> still going on about memory speeds, too. So I'll repeat 
> it:  Despite the fact that the output of the program 
> claims to be measuring a memory access throughput, the 
> actual operation of the program itself is doing no such 
> thing.
>
> One might think, at first blush, that the program is 
> constructing a large array of pages and then referencing 
> them all randomly, to simulate a random memory access 
> pattern by some other program.  But that's not what the 
> program is doing.  What it is in fact doing is creating a 
> set of 1 or more random-length potentially cross-linked 
> singly-linked lists within an array of unsigned integers, 
> and traversing the one of them that starts at element 0 of 
> the array.  As pointed out, many of the step effects here 
> are due to the way that the pseudo-random sequence happens 
> to distribute over the array size, which (note) is a 
> non-prime number in many of the tests that people are 
> running.  Furthermore, it's well within the bounds of 
> possibility for the PRNG to cause a short singly-linked 
> list that loops tightly within a single page in one case, 
> and a long singly-linked list that spans multiple pages in 
> another, depending from seed values and array sizes. 
> There's certainly no guarantee that all of the array (and 
> thus all of the memory pages it occupies) is actually 
> referenced at all by the list traversal, so the 
> calculation of the value that is printed out, based upon 
> the assumption that the whole array is referenced, is 
> completely bogus.

It is not purporting to do that. It is purporting to 
generate a sequence of memory accesses that in at least some 
instances (depending on the sequence of generated random 
numbers) at least approximately demonstrates the worst case 
behavior for cache hit ratios.  So in other words you can 
ignore all of those little tight loops that you refer to, 
they are not representative.

This is only a little quick and dirty program whose sole 
purpose is to enable ballpark estimates of the performance 
of a deterministic finite automaton generator that has 
similar memory access patterns.

The only remaining issue is to work out why these worst case 
cache hit ratio instances are substantially slower than what 
the actual worst case cache hit ratio should allow.

>
> M. Olcott's program is buggy.  It's not actually 
> calculating the metric that it is purporting to print out. 
> Speculating as to why your machine's "memory speed" is 
> different to M. Olcott's, based upon the output of this 
> program, is pointless.  You might as well be speculating 
> over NABOB compression factors.
> 


0
Peter
3/27/2010 10:45:46 PM
In comp.os.linux.development.apps Peter Olcott <NoSpam@ocr4screen.com> wrote:

> "Jonathan de Boyne Pollard" 
> <J.deBoynePollard-newsgroups@NTLWorld.COM> wrote in message 
> > One might think, at first blush, that the program is 
> > constructing a large array of pages and then referencing 
> > them all randomly, to simulate a random memory access 
> > pattern by some other program.  But that's not what the 
> > program is doing.  What it is in fact doing is creating a 
> > set of 1 or more random-length potentially cross-linked 
> > singly-linked lists within an array of unsigned integers, 
> > and traversing the one of them

> Where do you get the idea that it is creating a set of 
> linked lists from?

With 

    for ( uint32 N = 0; N < Max; N++ )
        num = Data[ num ];

you use the array like a singly-linked list (with no data
payload), i.e. you use the value stored in one vector element as
a "pointer" to another one (well, actually offsets instead of
pointers, but that doesn't make too much of a difference). As
others have pointed out, the way you set up that "list" you
may accidentally create a short loop in this "list" instead of
going through large parts of its elements which, of course,
would make a lot of difference speed-wise. And the final
behaviour could depend strongly on the exact combination of
the random seed's initial value and the value of 'size'. Thus
some people voiced their concerns about the validity of your
stated results (or the conclusions drawn from them;-)

                            Regards, Jens
-- 
  \   Jens Thoms Toerring  ___      jt@toerring.de
   \__________________________      http://toerring.de
0
jt
3/27/2010 11:07:03 PM
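If the goal is to keep the pointer-chasing style but rule out short 
cycles entirely, one standard trick (not used in the thread's program) 
is to initialise Data as a random permutation consisting of a single 
cycle, e.g. with Sattolo's algorithm. A sketch, assuming the uint32 
typedef from the test program and a rand_index(i) helper that returns 
a value in [0, i):

    /* Fill Data[0..n-1] with a uniformly random single-cycle permutation
       (Sattolo's algorithm).  Chasing num = Data[num] then visits every
       element exactly once before returning to the start, so the walk
       covers the whole array and all of its pages. */
    void init_single_cycle(uint32 *Data, uint32 n)
    {
        for (uint32 i = 0; i < n; i++)
            Data[i] = i;
        for (uint32 i = n - 1; i > 0; i--) {
            uint32 j = rand_index(i);   /* 0 <= j < i, never j == i */
            uint32 tmp = Data[i];
            Data[i] = Data[j];
            Data[j] = tmp;
        }
    }
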
Måns Rullgård <mans@mansr.com> writes:
> Joe Pfeiffer <pfeiffer@cs.nmsu.edu> writes:
>>
>> The "you" I was adressing was the OP, not you M�ns.  He's the one who
>> ought to be conducting experiments rather than insisting on what has to
>> be happening.
>
> I must have missed a level of quoting somewhere.  Glad we agree.

Or else I did :)

>>> 1) The compiler optimises it away completely, thereby obviously
>>>    rendering the results meaningless.  If compiled without
>>>    optimisations so as to avoid this, it will no longer accurately
>>>    reflect the behaviour of the real application, again providing
>>>    meaningless results.
>>
>> The main thing his test program is doing is making a lot of random
>> memory accesses through the array.  If his real application is doing the
>> same thing (but actually using the result so it doesn't get optimized
>> out of existence), his test program is accurate.
>
> The test could be trivially modified to also use the result, e.g. by
> printing the final value or storing it in a (volatile if necessary)
> global variable.

Good point.  That would keep the loop from being optimized away, while
other optimizations could still be performed.
-- 
As we enjoy great advantages from the inventions of others, we should
be glad of an opportunity to serve others by any invention of ours;
and this we should do freely and generously. (Benjamin Franklin)
0
Joe
3/28/2010 12:21:05 AM
Phil Carmody <thefatphil_demunged@yahoo.co.uk> writes:

> Joe Pfeiffer <pfeiffer@cs.nmsu.edu> writes:
>> "Peter Olcott" <NoSpam@OCR4Screen.com> writes:
>>
>>> "Joe Beanfish" <joe@nospam.duh> wrote in message 
>>> news:hoipa5$uvm@news.thunderstone.com...
>>>>
>>>> Or the optimizer turned off?
>>>
>>> The optimizer should not change the semantics enough to 
>>> optimize away this loop. 
>>
>> Oh, yes it could.  It can easily notice that num is never read after
>> it's written
>
> ???
>
>>>>>>    for (uint32 N = 0; N<  Max; N++)
>>>>>>      num = Data[num];
>                      ^^^^^ 
>                      that's num being read.

Sorry, I was too terse.  It is read after being written as you point
out, but the final value is never read after the loop ends.

> [SNIP - mention of bubblesort in a teaching environment - you 
> should be ashamed of yourself contaminating youngsters' minds.]

Lousy for sorting, great for simulating branch prediction by hand :)
-- 
As we enjoy great advantages from the inventions of others, we should
be glad of an opportunity to serve others by any invention of ours;
and this we should do freely and generously. (Benjamin Franklin)
0
Joe
3/28/2010 12:23:04 AM
On 2010-03-27, Waldek Hebisch <hebisch@math.uni.wroc.pl> wrote:

> Apparently OP wants to test random memory access.  For sequentially
> dependent random reads his high result is very good: 90 MB/s in
> ints means 22.5*10^6 memory accesses, which means less than 50 ns
> per access.  Probably his accesses are not random enough and
> he still gets a nontrivial proportion of L2 cache hits.

not very random at all
one in every 16384(ish) of them is a 0

this means the code runs round a fixed circuit of mean length 16384
(poisson distribution iirc)

> P.S. I would answer to OP directly, but apparently his
> posts do not show up (expire too quickly???).

maybe you've killfiled him?

--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
0
Jasen
3/28/2010 2:09:37 AM
On 2010-03-27, pete <pfiland@mindspring.com> wrote:
> Dr Malcolm McLean wrote:
>> 
>> On 27 Mar, 06:44, "Peter Olcott" <NoS...@OCR4Screen.com> wrote:
>> > "Jonathan de Boyne Pollard"<J.deBoynePollard-newsgro...@NTLWorld.COM> wrote in message
>> >
>> > news:IU.D20100327.T053209.P2468.Q0@J.de.Boyne.Pollard.localhost...
>> >
>> > >> for (uint32 N = 0; N < Max; N++)
>> > >>     num = Data[num];
>> >
>> > > Your program is buggy.  It's not actually calculating the
>> > > metric that it is purporting to print out.
>> >
>> > >> uint32 Random = rand() * rand();
>> >
>> > > Not random enough the first time, was it?  (-:
>> >
>> > RAND_MAX = 32767
>> > A single invocation of rand() would not by itself provide
>> > random enough coverage over size when size is much greater
>> > than 32767.
>> >
>> However rand() * rand() will give you a binomial distribution, which
>> is unlikely to be what you want.
>>  rand() * (RAND_MAX + 1) + rand();
>> is the uniform distribution.
>
> What makes you think that (RAND_MAX + 1) is defined?

ints are long (32bit) on linux, and I (and possibly several others) 
didn't notice the crossposting to comp.lang.c (where ints can be short)





--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
0
Jasen
3/28/2010 2:16:07 AM
Dr Malcolm McLean wrote:

[broadening x-post to sci.math]

> On 27 Mar, 13:35, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
>> What makes you think that (RAND_MAX + 1) is defined?
>>
>>      And what makes him think that rand()*rand() has a
>> binomial distribution, even if it doesn't overflow?
>>
> I nodded.
> A web search turns up that there's no neat way of obtaining the
> probability density function of the product of several uniformly-
> distributed random variables. Basically you have to go to log space
> and treat it as a sum (which approximates the normal distribution as N
> increases).
> 
> 

I'm reading this in comp.lang.c, and I don't think we have the firepower 
to determine this distribution, hence the crosspost.

Hello sci.math.  If rand has a flat pdf on the interval [0, 32768], what 
is the distribution of rand squared?

Thanks for your comment, and cheers,
-- 
fred
0
Phred
3/28/2010 5:46:30 AM
On Mar 27, 10:46 pm, Phred Phungus <Ph...@example.invalid> wrote:
> Dr Malcolm McLean wrote:
>
> [broadening x-post to sci.math]
>
> > On 27 Mar, 13:35, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> >> What makes you think that (RAND_MAX + 1) is defined?
>
> >>      And what makes him think that rand()*rand() has a
> >> binomial distribution, even if it doesn't overflow?
>
> > I nodded.
> > A web search turns up that there's no neat way of obtaining the
> > probability density function of a the product of several uniformly-
> > distributed random variables. Basically you have to go to log space
> > and treat is as a sum (which approximates the normal distribution as N
> > increases).
>
> I'm reading this in comp.lang.c, and I don't think we have the firepower
> to determine this distribution, hence the crosspost.
>
> Hello sci.math.  If rand has a flat pdf on the interval [0, 32768], what
> is the distribution of rand squared?
>
> Thanks for your comment, and cheers,
> --
> fred

If U is uniformly distributed on interval [0,M], then U^2 will be
distributed on interval [0,M^2]. Find the distribution as follows: for
any x in (0,M^2) the cumulative distribution of U^2 is F(x) = Pr{U^2
<= x} = Pr{U <= sqrt(x)}; we don't have negative values of U, so we
don't consider the interval from -sqrt(x) to 0. Anyway, for uniform U
on [0,M], Pr{U <= y} = y/M for y in [0,M]. For y = sqrt(x) this gives
Pr{U^2 <= x} = sqrt(x)/M [note: the right-hand-side = 1 when x = M^2,
as it should]. Thus, the density function of U^2 is f(x) = dF(x)/dx =
(1/2)/[M*sqrt(x)].

Note: this assumes continuously-distributed uniform U. For an *actual*
rand, U will be discrete. Furthermore, for a typical congruential
generator, the value 0 is never attained, and the distribution will be
slightly non-uniform over the other values 1, 2, ..., 32768.

R.G. Vickson
0
Ray
3/28/2010 6:05:02 AM
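A quick way to sanity-check the sqrt(x)/M formula above is a small 
Monte Carlo run; a throwaway sketch, using rand() itself as the 
(approximately) uniform variable:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const double M = (double)RAND_MAX;
        const double x = 0.25 * M * M;  /* theory: Pr{U^2 <= x} = sqrt(x)/M = 0.5 */
        const long trials = 1000000;
        long hits = 0;
        for (long i = 0; i < trials; i++) {
            double u = (double)rand();
            if (u * u <= x)
                hits++;
        }
        printf("empirical %.4f   theoretical %.4f\n",
               (double)hits / trials, 0.5);
        return 0;
    }
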
Ray Vickson wrote:
> On Mar 27, 10:46 pm, Phred Phungus <Ph...@example.invalid> wrote:
>> Dr Malcolm McLean wrote:
>>
>> [broadening x-post to sci.math]

>> Hello sci.math.  If rand has a flat pdf on the interval [0, 32768], what
>> is the distribution of rand squared?

> If U is uniformly distributed on interval [0,M], then U^2 will be
> distributed on interval [0,M^2]. Find the distribution as follows: for
> any x in (0,M^2) the cumulative distribution of U^2 is F(x) = Pr{U^2
> <= x} = Pr{U <= sqrt(x)}; we don't have negative values of U, so we
> don't consider the interval from -sqrt(x) to 0. Anyway, for uniform U
> on [0,M], Pr{U <= y} = y/M for y in [0,M]. For y = sqrt(x) this gives
> Pr{U^2 <= x} = sqrt(x)/M [note: the right-hand-side = 1 when x = M^2,
> as it should]. Thus, the density function of U^2 is f(x) = dF(x)/dx =
> (1/2)/[M*sqrt(x)].

Ok.  So, if one is to verify, then you're gonna heat up the 
pseudorandoms and have M^2 bins.  If the product of r = rand() and s = 
rand() equals q, then bin number q, initially set to zero, increments.

q2)  Does anyone have a statistical model for this event being 
pseudorandom as opposed to being clunky in the lower bits?
> 
> Note: this assumes continuously-distributed uniform U. For an *actual*
> rand, U will be discrete. Furthermore, for a typical congruential
> generator, the value 0 is never attained, and the distribution will be
> slightly non-uniform over the other values 1, 2, ..., 32768.

Thanks so much for your comment and your minor bounds error above like 
the one I made.  For clarity's sake, I should stipulate that the 
interval is closed about zero and a value stipulated by the C 
implementation.  I'd like to choose this as [0, 32767] because then I 
know I won't be in any trouble with M^2.

q3)  How different is the analysis when it goes from continuous to discrete?
-- 
fred
0
Phred
3/28/2010 7:00:32 AM
Phred Phungus <Phred@example.invalid> writes:
> Dr Malcolm McLean wrote:
> [broadening x-post to sci.math]
>
>> On 27 Mar, 13:35, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
>>> What makes you think that (RAND_MAX + 1) is defined?
>>>
>>>      And what makes him think that rand()*rand() has a
>>> binomial distribution, even if it doesn't overflow?
>>
>> I nodded.
>> A web search turns up that there's no neat way of obtaining the
>> probability density function of a the product of several uniformly-
>> distributed random variables.

What's not neat about the continuous case?

>> Basically you have to go to log space
>> and treat is as a sum (which approximates the normal distribution as N
>> increases).

And why do you introduce an N? There is no N.

> I'm reading this in comp.lang.c, and I don't think we have the
> firepower to determine this distribution, hence the crosspost.
>
> Hello sci.math.  If rand has a flat pdf on the interval [0, 32768],
> what is the distribution of rand squared?

We're not squaring anything, we're multiplying 2 (purportedly) 
independent values.

Phil
-- 
I find the easiest thing to do is to k/f myself and just troll away
-- David Melville on r.a.s.f1
0
Phil
3/28/2010 10:21:01 AM
On 28 Mar, 11:21, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
wrote:
> Phred Phungus <Ph...@example.invalid> writes:
> > Dr Malcolm McLean wrote:
> > [broadening x-post to sci.math]
>
> >> On 27 Mar, 13:35, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> >>> What makes you think that (RAND_MAX + 1) is defined?
>
> >>>      And what makes him think that rand()*rand() has a
> >>> binomial distribution, even if it doesn't overflow?
>
> >> I nodded.
> >> A web search turns up that there's no neat way of obtaining the
> >> probability density function of a the product of several uniformly-
> >> distributed random variables.
>
> What's not neat about the continuous case?
>
> And why do you introduce an N? There is no N.
>
You've answered your own question.

0
Dr
3/28/2010 3:41:51 PM
On Mar 28, 3:21 am, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
wrote:
> Phred Phungus <Ph...@example.invalid> writes:
> > Dr Malcolm McLean wrote:
> > [broadening x-post to sci.math]
>
> >> On 27 Mar, 13:35, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> >>> What makes you think that (RAND_MAX + 1) is defined?
>
> >>>      And what makes him think that rand()*rand() has a
> >>> binomial distribution, even if it doesn't overflow?
>
> >> I nodded.
> >> A web search turns up that there's no neat way of obtaining the
> >> probability density function of a the product of several uniformly-
> >> distributed random variables.
>
> What's not neat about the continuous case?
>
> >> Basically you have to go to log space
> >> and treat it as a sum (which approximates the normal distribution as N
> >> increases).
>
> And why do you introduce an N? There is no N.
>
> > I'm reading this in comp.lang.c, and I don't think we have the
> > firepower to determine this distribution, hence the crosspost.
>
> > Hello sci.math.  If rand has a flat pdf on the interval [0, 32768],
> > what is the distribution of rand squared?
>
> We're not squaring anything, we're multiplying 2 (purportedly)
> independent values.

This is ambiguous in the original question. Having the line y =
rand()*rand() in a program will (probably) call rand twice and give
the product of (supposedly) independent uniform random variables. On
the other hand, the instruction y = rand()^2 will give the square of a
supposedly uniform random variable. So, both questions are valid.
Maybe the OP did not realize, or just forgot, the difference between
X*X and X^2 when X = rand().
distribution of the square is easy, but not of the product; the latter
can be done but is not easy and is the subject of research papers.

R.G. Vickson


> Phil
> --
> I find the easiest thing to do is to k/f myself and just troll away
> -- David Melville on r.a.s.f1

0
Ray
3/28/2010 10:19:01 PM
"Ray Vickson" <RGVickson@shaw.ca> wrote...
On Mar 27, 10:46 pm, Phred Phungus <Ph...@example.invalid> wrote:
>> Dr Malcolm McLean wrote:
>>
>> [broadening x-post to sci.math]
>>
>> > On 27 Mar, 13:35, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
>> >> What makes you think that (RAND_MAX + 1) is defined?
>>
>> >> And what makes him think that rand()*rand() has a
>> >> binomial distribution, even if it doesn't overflow?
>>
>>> I nodded.
>>> A web search turns up that there's no neat way of obtaining the
>>> probability density function of a the product of several uniformly-
>>> distributed random variables. Basically you have to go to log space
>>> and treat is as a sum (which approximates the normal distribution
>>> as N increases).
>>
>> I'm reading this in comp.lang.c, and I don't think we have the
>> firepower to determine this distribution, hence the crosspost.
>>
>> Hello sci.math. If rand has a flat pdf on the interval [0, 32768],
>> what is the distribution of rand squared?
>
> If U is uniformly distributed on interval [0,M], then U^2 will be
> distributed on interval [0,M^2]. Find the distribution as follows: for
> any x in (0,M^2) the cumulative distribution of U^2 is F(x) = Pr{U^2
> <= x} = Pr{U <= sqrt(x)}; we don't have negative values of U, so we
> don't consider the interval from -sqrt(x) to 0. Anyway, for uniform U
> on [0,M], Pr{U <= y} = y/M for y in [0,M]. For y = sqrt(x) this gives
> Pr{U^2 <= x} = sqrt(x)/M [note: the right-hand-side = 1 when x = M^2,
> as it should]. Thus, the density function of U^2 is f(x) = dF(x)/dx =
> (1/2)/[M*sqrt(x)].
>
> Note: this assumes continuously-distributed uniform U. For an *actual*
> rand, U will be discrete. Furthermore, for a typical congruential
> generator, the value 0 is never attained, and the distribution will be
> slightly non-uniform over the other values 1, 2, ..., 32768.

Following the background, I think the real underlying question was
more like "given independent discrete variables U1, U2
uniformly distributed on [0, M] what is the distribution of U1*U2".
Sort of the discrete counterpart to the continuous case of
http://mathworld.wolfram.com/UniformProductDistribution.html.

Except that the discrete case does not seem nearly as amenable and
smooth. For example, 0 has a probability of 2/M, primes in [2, M]
2/M^2, primes >M 0, the entire range [M(M-1)+1, M^2] also 0.
Don't know that there is a "nice" closed form for this distribution.

Liviu


0
Liviu
3/28/2010 10:39:43 PM
On Mar 27, 7:21 am, "Peter Olcott" <NoS...@OCR4Screen.com> wrote:

> That may slow things down too much; if I want truly random
> numbers I can get a hardware random number generator dongle. For my
> purposes I only need the numbers to be very widely
> dispersed. I could generate my own pseudo-random numbers
> based on a lookup table.

On Linux, you can open /dev/urandom and read an unlimited supply of
random bits. So long as your distribution sets things up properly,
they should pass most statistical tests for randomness and should be
cryptographically strong.

DS
0
David
3/28/2010 11:52:26 PM
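A sketch of reading /dev/urandom in bulk rather than one value per 
call; it amortises the read() system call over many values, which is 
cheaper than per-value reads like the longrand() snippet quoted 
earlier (the function name fill_random is illustrative):

    #include <fcntl.h>
    #include <stddef.h>
    #include <unistd.h>

    /* Fill buf with n random 32-bit values from /dev/urandom.
       Returns 0 on success, -1 on error. */
    int fill_random(unsigned int *buf, size_t n)
    {
        int fd = open("/dev/urandom", O_RDONLY);
        size_t want = n * sizeof *buf, got = 0;

        if (fd < 0)
            return -1;
        while (got < want) {
            ssize_t r = read(fd, (char *)buf + got, want - got);
            if (r <= 0) {               /* error or unexpected EOF */
                close(fd);
                return -1;
            }
            got += (size_t)r;
        }
        close(fd);
        return 0;
    }
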
David Schwartz <davids@webmaster.com> writes:

> On Mar 27, 7:21 am, "Peter Olcott" <NoS...@OCR4Screen.com> wrote:
>
>> That may slow things down too much, if I want true random I
>> an get a hardware random number generator dongle. For my
>> purposes I only need the numbers to be very widely
>> dispersed. I could generated my own pseudo random numbers
>> based on a lookup table.
>
> On Linux, you can open /dev/urandom and read an unlimited supply of
> random bits. So long as your distribution sets things up properly,
> they should pass most statistical tests for randomness and should be
> cryptographically strong.

The problem here isn't so much the true randomness of the sequence as
the distribution and the avoiding of accesses turning into a short
loop.

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/29/2010 12:12:44 AM
On Mar 28, 5:12 pm, Måns Rullgård <m...@mansr.com> wrote:

> The problem here isn't so much the true randomness of the sequence as
> the distribution and the avoiding of accesses turning into a short
> loop.

The usual way to avoid this is simply to either sum or XOR all the
intermediary values into one final value, then do something trivial
based on that value (for example, print it out or pass it to a system
call that ignores it anyway).

DS
0
David
3/29/2010 12:59:20 AM
David Schwartz <davids@webmaster.com> writes:

> On Mar 28, 5:12 pm, Måns Rullgård <m...@mansr.com> wrote:
>
>> The problem here isn't so much the true randomness of the sequence as
>> the distribution and the avoiding of accesses turning into a short
>> loop.
>
> The usual way to avoid this is simply to either sum or XOR all the
> intermediary values into one final value, then do something trivial
> based on that value (for example, print it out or pass it to a system
> call that ignores it anyway).

It seems I wasn't clear.  I was talking about the access pattern
covering the entire array.  The way the OP did it, using each value
read as the next index, there's a chance of a loop forming.
Consider for instance the case where some value is equal to its index.
The loop would get stuck there, repeatedly reading this value, which
is obviously not what was intended.

-- 
Måns Rullgård
mans@mansr.com
0
iso
3/29/2010 1:14:11 AM
"M�ns Rullg�rd" <mans@mansr.com> wrote in message 
news:yw1xoci82dfn.fsf@unicorn.mansr.com...
> David Schwartz <davids@webmaster.com> writes:
>
>> On Mar 27, 7:21 am, "Peter Olcott" 
>> <NoS...@OCR4Screen.com> wrote:
>>
>>> That may slow things down too much, if I want true 
>>> random I
>>> an get a hardware random number generator dongle. For my
>>> purposes I only need the numbers to be very widely
>>> dispersed. I could generated my own pseudo random 
>>> numbers
>>> based on a lookup table.
>>
>> On Linux, you can open /dev/urandom and read an unlimited 
>> supply of
>> random bits. So long as your distribution sets things up 
>> properly,
>> they should pass most statistical tests for randomness 
>> and should be
>> cryptographically strong.
>
> The problem here isn't so much the true randomness of the 
> sequence as
> the distribution and the avoiding of accesses turning into 
> a short
> loop.
>
> -- 
> Måns Rullgård
> mans@mansr.com

That isn't even a problem any more. I made enough runs of 
variations of this test to get a good feel for the worst 
case cache hit performance. The only remaining problem is 
explaining why the measured worst-case performance is 
worse than what worst-case cache misses alone should 
account for. 


0
Peter
3/29/2010 1:38:13 AM
On Mar 28, 6:14 pm, Måns Rullgård <m...@mansr.com> wrote:

> It seems I wasn't clear.  I was talking about the access pattern
> covering the entire array.  The way the OP did it, using each value
> read as the next index, there's a chance of a short loop forming.
> Consider for instance the case where some value is equal to its index.
> The loop would get stuck there, repeatedly reading this value, which
> is obviously not what was intended.

It's quite possible that the behavior the OP is seeing is caused by
the specifics of his access pattern.

DS
0
David
3/29/2010 1:47:13 AM
On 2010-03-29, Måns Rullgård <mans@mansr.com> wrote:
> David Schwartz <davids@webmaster.com> writes:
>
>> On Mar 28, 5:12 pm, Måns Rullgård <m...@mansr.com> wrote:
>>
>>> The problem here isn't so much the true randomness of the sequence as
>>> the distribution and the avoiding of accesses turning into a short
>>> loop.
>>
>> The usual way to avoid this is simply to either sum or XOR all the
>> intermediary values into one final value, then do something trivial
>> based on that value (for example, print it out or pass it to a system
>> call that ignores it anyway).
>
> It seems I wasn't clear.  I was talking about the access pattern
> covering the entire array.  The way the OP did it, using each value
> read as the next index, there's a chance of a short loop forming.
> Consider for instance the case where some value is equal to its index.
> The loop would get stuck there, repeatedly reading this value, which
> is obviously not what was intended.

Indeed, see also the birthday paradox: even with good random numbers
it's likely to quickly fall into a loop of less than sqrt(n)
length.

If he wants random array access he could build a giant Mersenne Twister
and just run that.
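
(A sketch of that approach, using jrand48() for brevity in place of a
Mersenne Twister; the array size and step count are arbitrary. Because
the indices come from the PRNG rather than from the array contents, the
walk cannot collapse into a short cycle:)

#define _XOPEN_SOURCE 500

#include <stdio.h>
#include <stdlib.h>      /* jrand48() */
#include <inttypes.h>    /* uint32_t, PRIu32 */

#define NEL ((size_t)1 << 24)   /* power of two, so "% NEL" is unbiased */
#define NSTEP (10u * (unsigned long)NEL)

static uint32_t buf[NEL];

int main(void)
{
    unsigned short xsubi[3] = { 1u, 2u, 3u };
    uint32_t sink = 0u;
    size_t i;
    unsigned long step;

    for (i = 0; i < NEL; ++i)
        buf[i] = (uint32_t)i;

    /* Each index is drawn from the PRNG, not read from the array. */
    for (step = 0; step < NSTEP; ++step) {
        size_t idx = (uint32_t)jrand48(xsubi) % NEL;
        sink ^= buf[idx];
    }

    /* Use the accumulator so the reads cannot be optimized away. */
    printf("%" PRIu32 "\n", sink);
    return 0;
}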

bye.

--- news://freenews.netfront.net/ - complaints: news@netfront.net ---
0
Jasen
3/29/2010 8:09:56 AM
Ray Vickson wrote:

> This is ambiguous in the original question. Having the line y =
> rand()*rand() in a program will (probably) call rand twice and give
> the product of (supposedly) independent uniform random variables. On
> the other hand, the instruction y = rand()^2 will give the square of a
> supposedly uniform random variable. So, both questions are valid.
> Maybe the OP did not realize, or just forgot, the difference between
> X*X and X^2 when X = rand() instruction. Interestingly, getting the
> distribution of the square is easy, but not of the product; the latter
> can be done but is not easy and is the subject of research papers.

Ok, Ray, I was unclear about this myself.  From my perspective, I'm not 
OP, but from the perspective of sci.math, I introduced the original 
question.

The interesting question I was thinking about is what is the 
distribution of q:

r = rand();
s = rand();
q = r * s;

That I lapsed into a notation of talking about rand squared shows that I 
lack many of the tools of writing about mathematics on the net.  The 
Germans are very good at it in de.sci.mathematik.

If there is an OP still paying attention, maybe he/she can specify what 
he is looking for.
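
(For what it's worth, the two-factor case does work out to a simple
closed form. A short derivation, assuming r and s are independent and
uniform on (0,1), written in LaTeX notation: for 0 < q <= 1,

    P(Q \le q) = \int_0^1 P\left(s \le \frac{q}{r}\right) dr
               = \int_0^q 1 \, dr + \int_q^1 \frac{q}{r} \, dr
               = q - q \ln q ,

so the density is

    f_Q(q) = \frac{d}{dq}\,(q - q \ln q) = -\ln q .

By contrast, the square of a single uniform has P(X^2 \le q) = \sqrt{q},
hence density 1 / (2 \sqrt{q}).)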

Cheers,
-- 
fred
0
Phred
3/30/2010 4:41:29 AM

On Mon, 29 Mar 2010, Måns Rullgård wrote:

> David Schwartz <davids@webmaster.com> writes:
>
>> On Mar 27, 7:21 am, "Peter Olcott" <NoS...@OCR4Screen.com> wrote:
>>
>>> That may slow things down too much; if I want true random I
>>> can get a hardware random number generator dongle. For my
>>> purposes I only need the numbers to be very widely
>>> dispersed. I could generate my own pseudo random numbers
>>> based on a lookup table.
>>
>> On Linux, you can open /dev/urandom and read an unlimited supply of
>> random bits. So long as your distribution sets things up properly,
>> they should pass most statistical tests for randomness and should be
>> cryptographically strong.
>
> The problem here isn't so much the true randomness of the sequence as
> the distribution and the avoiding of accesses turning into a short
> loop.


(The local nntp server I used to use seems to have a problem with 
forwarding my messages outwards. At least I was unable to find my messages 
with google, and they appear absent also from news.eternal-september.org. 
I went kind of crazy seeing this question discussed in multiple newsgroups 
and nobody reflecting on my pertinent(?) message -- reposted below. I'm 
switching to a different news client and the aforementioned news server, 
in the hope that I'll be able to repost Sattolo's name to all relevant 
newsgroups. Excuse the massive cross-posting. I set Followup-To to c.u.p., 
because that's where I saw the topic first.)

lacos


X-News: ludens comp.unix.programmer:167386
From: lacos@ludens.elte.hu (Ersek, Laszlo)
Subject: Re: Can extra processing threads help in this case?
Date: 24 Mar 2010 02:49:32 +0100
Message-ID: <RfITPGAL3QUd@ludens>

In article <pe9k77-gjk.ln1@wilbur.25thandClement.com>, William Ahern <william@wilbur.25thandClement.com> writes:
> Peter Olcott <NoSpam@ocr4screen.com> wrote:
>> "Eric Sosman" <esosman@ieee-dot-org.invalid> wrote in 
>> message news:ho5tof$lon$1@news.eternal-september.org...
> <snip>
>> > But if there's another CPU/core/strand/pipeline, it's possible that one
>> > processor's stall time could be put to productive use by another if
>> > there were multiple execution threads.
> <snip>
>> is there any possible way that this app is not memory bound that you can
>> provide a specific concrete example of?
> 
> Your question was answered.
> 
> You're hung up on your numbers and preconceived ideas. Your application
> could be BOTH memory bound AND able to benefit from multiple CPUs. But it's
> nearly impossible to guess without knowing at least the algorithm; more
> specifically, the code.

This thread has inspired me to write some code. I wrote it last night in
a sleep deprivation induced delirium. (I almost modified the upper limit
of LOG2_NEL from 31 to 32 before posting, but then decided to post the
version with 31 so that it works where "size_t" is a 32-bit unsigned
integer type with conversion rank at least that of "int".) I was fairly
sure about knowing what it measures. Now I'm fairly sure I was stupid.
So please tell me, I ask you kindly, what does the following code
measure? Thank you.

First some test results (inserting whitespace generously):

$ time -p ./jump_around 1

starting threads
14.5209 s/thread

real 16.43
user 16.37
sys 0.02


$ time -p ./jump_around 2

starting threads
15.571 s/thread

real 17.64
user 32.95
sys 0.05


$ time -p ./jump_around 4

starting threads
15.587 s/thread

real 33.25
user 64.16
sys 0.08


CPU (from <http://mysite.verizon.net/pchardwarelinks/current_cpus.htm>):

           Athlon 64 X2 6000+ MMX 3DNow! SSE SSE2 SSE3 (Windsor)
           (128-bit on-Die unbuffered DDR2 PC6400 mem controller, AMD-V)
           (dual core)

           940 pins, 3000MHz (200x15), (64-bit dual-pumped bus), 1.4v

           2x 64KB data (2-way)
           2x 64KB instruction (2-way)
           2x 1MB on-Die unified L2 (16-way exclusive)


Here cometh the code. Please excuse the bugs, both the blatant and the
insidious.

Thanks,
lacos


/*
     $Id: jump_around.c,v 1.4 2010/03/23 00:02:22 lacos Exp $

     This program creates a single-cycle permutation of [0, 2**LOG2_NEL - 1] in a
     static array with Sattolo's algorithm. Then it starts the specified number of
     threads. Each sub-thread will traverse the permutation NROUND times. Each
     thread starts at a different position in the loop (at offsets increasing
     simply one by one in the buffer). Since the shuffle is driven by a uniform
     distribution PRNG, the sub-threads should divide the loop more or less evenly
     among themselves. The program measures and prints the user CPU time consumed
     by a single sub-thread on average. Waiting for memory should be included in
     this value.

     Compile and link with eg.

     $ gcc -ansi -pedantic -Wall -Wextra -o jump_around jump_around.c -l pthread

     uint32_t and uint64_t are used only when their sizes are considered
     significant for some reason.
*/

#define _XOPEN_SOURCE 500


#include <stdlib.h>       /* size_t, strtol(), EXIT_FAILURE, jrand48() */
#include <inttypes.h>     /* uint32_t, uint64_t */
#include <assert.h>       /* assert() */
#include <stdio.h>        /* fprintf(), fflush() */
#include <sys/resource.h> /* getrusage() */
#include <pthread.h>      /* pthread_create(), pthread_join() */


/* Number of rounds a single sub-thread traverses the cycle. */
#define NROUND 10u

/* Highest number of sub-threads allowed on the command line. */
#define MAXTHR 4L

/*
     Base two logarithm of the number of elements in the array (that is, log2 of
     the cycle length). Make sure it is in [log2(MAXTHR), 31].
*/
#define LOG2_NEL 24


/* Number of elements; IOW, cycle length. */
#define NEL ((size_t)1 << LOG2_NEL)


/* 28 == LOG2_NEL will result in a 1GB array. */
static uint32_t buf[NEL];


/*
     "*arg" points to a size_t object holding the start offset in the buffer.
*/
static void *
meander(void *arg)
{
     unsigned round;

     for (round = 0u; round < NROUND; ++round) {
       size_t pos,
           cnt;

       pos = *(const size_t *)arg;
       for (cnt = 0u; cnt < NEL; ++cnt) {
         pos = buf[pos];
       }
       assert(pos == *(const size_t *)arg);
     }

     return 0;
}


int
main(int argc, char **argv)
{
     long nthr;

     /* Get number of sub-threads. */
     {
       char *endptr;

       if (2 != argc || (nthr = strtol(argv[1], &endptr, 10)) < 1L
           || nthr > MAXTHR || '\0' != *endptr) {
         (void)fprintf(stderr, "pass number of threads (1-%ld)\n", MAXTHR);
         return EXIT_FAILURE;
       }
     }

     /* Initialize array with Sattolo's algorithm. */
     {
       size_t cnt;
       short unsigned xsubi[3] = { 1u, 2u, 3u };

       /* Identity permutation. */
       for (cnt = 0u; cnt < NEL; ++cnt) {
         buf[cnt] = cnt;
       }

       /* Swaps. */
       for (cnt = 0u; cnt < NEL - 1u; ++cnt) {
         uint32_t save;
         size_t rnd;

         save = buf[cnt];
         /*
           Choose a pseudo-random number in [0, NEL - 2u - cnt] with uniform
           distribution.

           (uint32_t)jrand48(xsubi) covers [0, 2**32 - 1].

           See also the constraint on LOG2_NEL.
         */
         rnd = (uint64_t)(uint32_t)jrand48(xsubi) * ((NEL - 2u - cnt) + 1u) >> 32;
         buf[cnt] = buf[cnt + 1u + rnd];
         buf[cnt + 1u + rnd] = save;
       }
     }

     (void)fprintf(stderr, "starting threads\n");

     {
       int ret;
       struct rusage start, stop;

       ret = getrusage(RUSAGE_SELF, &start);
       assert(0 == ret);

       {
         long widx;
         size_t startpos[MAXTHR];
         pthread_t worker[MAXTHR];

         for (widx = 0L; widx < nthr; ++widx) {
           startpos[widx] = widx;
           ret = pthread_create(worker + widx, 0, &meander, startpos + widx);
           assert(0 == ret);
         }

         for (widx = 0L; widx < nthr; ++widx) {
           ret = pthread_join(worker[widx], 0);
           assert(0 == ret);
         }
       }

       ret = getrusage(RUSAGE_SELF, &stop);
       assert(0 == ret);

       (void)fprintf(stdout, "%Lg s/thread\n",
           (((long double)stop.ru_utime.tv_usec - start.ru_utime.tv_usec)
           / 1000000.0L + stop.ru_utime.tv_sec - start.ru_utime.tv_sec) / nthr);
       if (EOF == fflush(stdout)) {
         return EXIT_FAILURE;
       }
     }

     return EXIT_SUCCESS;
}
0
Ersek
3/30/2010 9:05:42 AM
Reply: