how to use volatile key word? #2



how to use volatile key word?

Assume I have a global variable shared by two or more tasks.  Do I need to declare it as volatile?

When will a non-volatile global variable be written back to memory from a register copy? Before a function call? Or before the return statement?



Reply fuhu314 (16) 6/26/2012 4:11:28 PM

On 6/26/2012 12:11 PM, 伏虎 wrote:
> how to use volatile key word?
>
> Assume I have a global variable shared by two or more tasks.  Do I
> need to declare it as volatile?

'volatile' usually means that the object's value can be changed at any 
moment by some mechanism outside of your program control.  When you have 
a global variable shared between parts of *your program*, that's all 
under your control, and hence doesn't need to be declared 'volatile'.

> When will a non-volatile global variable be written back to memory
> from a register copy? Before a function call? Or before the return
> statement?

Side effects (like changing memory) happen at "sequence points", within 
the same thread.  Shared [between threads] data need to be protected 
better, and there are mechanisms for that - see "atomic operations" and 
"synchronization".  That's all in theory, of course.  Do you have a 
particular problem?
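For C++11 and later, here is a minimal sketch of both mechanisms. It assumes a hosted implementation with `<thread>`, and the names `g_atomic_count`, `g_locked_count`, `g_mutex`, `worker` and `run_two_workers` are hypothetical. Neither global is declared `volatile`. The atomic read-modify-write and the mutex provide the cross-thread visibility, and `join()` makes the final values visible to the caller.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical globals shared by two or more threads.
std::atomic<long> g_atomic_count{0};  // safe without any extra locking
long g_locked_count = 0;              // plain object, only touched under g_mutex
std::mutex g_mutex;

// Each worker adds `n` to both counters.
void worker(int n) {
    for (int i = 0; i < n; ++i) {
        g_atomic_count.fetch_add(1);                // atomic read-modify-write
        std::lock_guard<std::mutex> lock(g_mutex);  // released at end of iteration
        ++g_locked_count;                           // protected by the mutex
    }
}

// Runs two workers concurrently and returns once both have finished.
void run_two_workers(int n) {
    std::thread a(worker, n);
    std::thread b(worker, n);
    a.join();
    b.join();
}
```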

V
-- 
I do not respond to top-posted replies, please don't ask
Reply v.bazarov (792) 6/26/2012 5:04:09 PM


伏虎 <fuhu314@hotmail.com> wrote in
news:c83af998-1703-4a43-b67c-945e38156488@googlegroups.com: 

> how to use volatile key word?

Volatile is meant to be used for accessing memory-mapped hardware, if you 
are not writing device drivers or such you can just forget about it. It is 
neither needed nor sufficient for portable thread synchronisation.
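Here is a minimal sketch of that intended use. It assumes a device whose status register has a ready flag in bit 0, and `kReadyBit` and `wait_until_ready` are hypothetical names. On real hardware `status` would point at an address taken from the device's datasheet. In this sketch an ordinary variable stands in for the register so the code can run on a desktop.

```cpp
#include <cstdint>

// Hypothetical bit layout of a device status register.
constexpr std::uint32_t kReadyBit = 1u << 0;

// Busy-waits until the ready bit is set in the register at `status`.
// Because the pointee is volatile, the compiler must re-read it on every
// iteration instead of hoisting a single load out of the loop.
// Returns the number of polls performed (at most `max_polls`).
unsigned wait_until_ready(const volatile std::uint32_t* status, unsigned max_polls) {
    unsigned polls = 0;
    while (polls < max_polls) {
        ++polls;
        if (*status & kReadyBit) break;  // fresh load each time
    }
    return polls;
}
```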

See e.g. http://en.wikipedia.org/wiki/Volatile_variable

hth
Paavo
Reply myfirstname1 (587) 6/26/2012 6:53:23 PM

Paavo Helde <myfirstname@osa.pri.ee> wrote:
> Volatile is meant to be used for accessing memory-mapped hardware, if you 
> are not writing device drivers or such you can just forget about it. It is 
> neither needed nor sufficient for portable thread synchronisation.

'volatile' might not be sufficient for thread synchronization nor atomicity,
but it can certainly make a difference in a multithreaded program.

In a program I had a case where I was just reading an integral from one
thread that was changed by another thread to signal a minor, unimportant
effect (namely something related to updating the UI). This did not require
full-fledged locking (which would have made it needlessly expensive) because
if the integral had a wrong value for a split second, that wasn't really
a catastrophical event.

However, 'volatile' had a major impact on that integral. Without it, the
other thread was not seeing the changes immediately, causing a noticeable
delay (it was a situation where the thread that was setting that integral
did it inside a loop, and the compiler was optimizing the updating of the
variable in such a way that the actual writing to the extern variable was
delayed to be after the loop). With 'volatile' the variable was always
updated immediately.
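In C++11 terms, the same situation can be sketched with `std::atomic<int>` and relaxed ordering instead of `volatile`. The names `g_progress`, `do_work` and `read_progress` are hypothetical. Relaxed ordering suits a value that only needs to be roughly current and does not publish any other data. Unlike a plain or `volatile` int, this has no data race. The standard also asks implementations to make atomic stores visible to atomic loads within a reasonable amount of time.

```cpp
#include <atomic>
#include <thread>

// Hypothetical progress indicator: written by a worker, read by the UI thread.
std::atomic<int> g_progress{0};

// Worker side: one relaxed store per step. No other data is published
// through this variable, so no stronger ordering is needed.
void do_work(int steps) {
    for (int i = 1; i <= steps; ++i) {
        // ... real work would go here ...
        g_progress.store(i, std::memory_order_relaxed);
    }
}

// UI side: an occasional snapshot; a slightly stale value is acceptable.
int read_progress() {
    return g_progress.load(std::memory_order_relaxed);
}
```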
Reply nospam270 (2853) 6/27/2012 5:32:02 AM

On Tue, 26 Jun 2012 13:53:23 -0500, Paavo Helde
<myfirstname@osa.pri.ee> wrote:

>伏虎 <fuhu314@hotmail.com> wrote in
>news:c83af998-1703-4a43-b67c-945e38156488@googlegroups.com: 
>
>> how to use volatile key word?
>
>Volatile is meant to be used for accessing memory-mapped hardware, if you 
>are not writing device drivers or such you can just forget about it. It is 
>neither needed nor sufficient for portable thread synchronisation.

I'd say the volatile keyword is not sufficient but it is necessary in
some circumstances.

>See e.g. http://en.wikipedia.org/wiki/Volatile_variable

This page does show cases where use of the volatile keyword is
necessary.
-- 
(\__/)  M.
(='.'=) If a man stands in a forest and no woman is around
(")_(") is he still wrong?

Reply i1658 (113) 6/27/2012 9:41:37 AM

Juha Nieminen <nospam@thanks.invalid> wrote in
news:jse5si$c7s$1@speranza.aioe.org: 

> Paavo Helde <myfirstname@osa.pri.ee> wrote:
>> Volatile is meant to be used for accessing memory-mapped hardware, if
>> you are not writing device drivers or such you can just forget about
>> it. It is neither needed nor sufficient for portable thread
>> synchronisation. 
> 
> 'volatile' might not be sufficient for thread synchronization nor
> atomicity, but it can certainly make a difference in a multithreaded
> program. 
> 
> In a program I had a case where I was just reading an integral from
> one thread that was changed by another thread to signal a minor,
> unimportant effect (namely something related to updating the UI). This
> did not require full-fledged locking (which would have made it
> needlessly expensive) because if the integral had a wrong value for a
> split second, that wasn't really a catastrophical event.

Out of curiosity - did you measure how much the "needlessly expensive" 
locking was slower than using a volatile? I'm asking because I often feel 
the urge to invent all kinds of clever tricks to bypass proper locking.

Cheers
Paavo
Reply myfirstname1 (587) 6/27/2012 9:03:10 PM

Mark <i@dontgetlotsofspamanymore.invalid> wrote in
news:t8llu797ibglliasbsqvte9r0fitpmkndk@4ax.com: 

> 
>>See e.g. http://en.wikipedia.org/wiki/Volatile_variable
> 
> This page does show cases where use of the volatile keyword is
> necessary.

Not sure what you mean, for C and C++ I see only discussion about memory-
mapped hardware. The examples about multi-thread synchronization with 
volatile are talking about Java and C# where this keyword apparently means 
something different.

Cheers
Paavo




Reply myfirstname1 (587) 6/27/2012 9:12:05 PM

On 06/27/12 05:32 PM, Juha Nieminen wrote:
> Paavo Helde<myfirstname@osa.pri.ee>  wrote:
>> Volatile is meant to be used for accessing memory-mapped hardware, if you
>> are not writing device drivers or such you can just forget about it. It is
>> neither needed nor sufficient for portable thread synchronisation.
>
> 'volatile' might not be sufficient for thread synchronization nor atomicity,
> but it can certainly make a difference in a multithreaded program.
>
> In a program I had a case where I was just reading an integral from one
> thread that was changed by another thread to signal a minor, unimportant
> effect (namely something related to updating the UI). This did not require
> full-fledged locking (which would have made it needlessly expensive) because
> if the integral had a wrong value for a split second, that wasn't really
> a catastrophical event.

Did you measure?  In the absence of contention, the impact of locking 
should be close to none.

-- 
Ian Collins
Reply ian-news (9886) 6/27/2012 9:18:50 PM

Paavo Helde <myfirstname@osa.pri.ee> writes:
>Juha Nieminen <nospam@thanks.invalid> wrote in
>news:jse5si$c7s$1@speranza.aioe.org: 
>
>> Paavo Helde <myfirstname@osa.pri.ee> wrote:
>>> Volatile is meant to be used for accessing memory-mapped hardware, if
>>> you are not writing device drivers or such you can just forget about
>>> it. It is neither needed nor sufficient for portable thread
>>> synchronisation. 
>> 
>> 'volatile' might not be sufficient for thread synchronization nor
>> atomicity, but it can certainly make a difference in a multithreaded
>> program. 
>> 
>> In a program I had a case where I was just reading an integral from
>> one thread that was changed by another thread to signal a minor,
>> unimportant effect (namely something related to updating the UI). This
>> did not require full-fledged locking (which would have made it
>> needlessly expensive) because if the integral had a wrong value for a
>> split second, that wasn't really a catastrophical event.
>
>Out of curiosity - did you measure how much the "needlessly expensive" 
>locking was slower than using a volatile? I'm asking because I often feel 
>the urge to invent all kinds of clever tricks to bypass proper locking.

This will be dependent on how the volatile is used.  For example:

   volatile bool Class::terminate_thread = false;

   .
   .
   .


   void
   Class::Run(void *arg)
   {
      while (!terminate_thread) {
         // do something
      }
      pthread_exit(NULL);
   }
   .
   .
   .

If terminate_thread is declared volatile (since its value can be changed by
another thread or a signal handler), the effect on code generation is minimal
(a load each time through the loop instead of using a register cached value), while
most synchronization mechanisms will require at least one function call.

Given that accesses to 'bool' datatypes are atomic on most modern architectures,
it is a common paradigm to use volatile on such variables.
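Since C++11 the same pattern can be written with `std::atomic<bool>`. On common platforms it keeps the single load per pass, and the standard guarantees the behaviour. Here is a minimal sketch that uses `std::thread` in place of the pthreads code above. `Worker`, `run`, `stopped` and the other member names are hypothetical adaptations of the fragment.

```cpp
#include <atomic>
#include <thread>

// C++11 counterpart of `volatile bool terminate_thread` (names hypothetical).
class Worker {
public:
    void start() { thread_ = std::thread(&Worker::run, this); }

    // Called from another thread: set the flag, then wait for run() to exit.
    void stop() {
        terminate_thread_.store(true, std::memory_order_release);
        thread_.join();
    }

    bool stopped() const { return !thread_.joinable(); }

    // Only meaningful after stop(): join() makes the worker's writes visible.
    unsigned long long iterations() const { return iterations_; }

private:
    void run() {
        // One atomic load per pass; on x86 an acquire load is typically a plain MOV.
        while (!terminate_thread_.load(std::memory_order_acquire)) {
            ++iterations_;  // stands in for "do something"
        }
    }

    std::atomic<bool> terminate_thread_{false};
    unsigned long long iterations_ = 0;  // written only by the worker thread
    std::thread thread_;
};
```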

scott
Reply scott1 (356) 6/27/2012 9:19:42 PM

scott@slp53.sl.home (Scott Lurndal) wrote in
news:ONKGr.39377$C06.11961@news.usenetserver.com: 

> Paavo Helde <myfirstname@osa.pri.ee> writes:
>>Juha Nieminen <nospam@thanks.invalid> wrote in
>>news:jse5si$c7s$1@speranza.aioe.org: 
>>
>>> Paavo Helde <myfirstname@osa.pri.ee> wrote:
>>>> Volatile is meant to be used for accessing memory-mapped hardware,
>>>> if you are not writing device drivers or such you can just forget
>>>> about it. It is neither needed nor sufficient for portable thread
>>>> synchronisation. 
>>> 
>>> 'volatile' might not be sufficient for thread synchronization nor
>>> atomicity, but it can certainly make a difference in a multithreaded
>>> program. 
>>> 
>>> In a program I had a case where I was just reading an integral from
>>> one thread that was changed by another thread to signal a minor,
>>> unimportant effect (namely something related to updating the UI).
>>> This did not require full-fledged locking (which would have made it
>>> needlessly expensive) because if the integral had a wrong value for
>>> a split second, that wasn't really a catastrophical event.
>>
>>Out of curiosity - did you measure how much the "needlessly expensive"
>>locking was slower than using a volatile? I'm asking because I often
>>feel the urge to invent all kinds of clever tricks to bypass proper
>>locking. 
> 
> This will be dependent on how the volatile is used.  For example:
> 
>    volatile bool Class::terminate_thread = false;
> 
>    .
>    .
>    .
> 
> 
>    void
>    Class::Run(void *arg)
>    {
>       while (!terminate_thread) {
>          // do something
>       }
>       pthread_exit(NULL);
>    }
>    .
>    .
>    .
> 
> If terminate_thread is declared volatile (since its value can be
> changed by another thread or a signal handler), the effect on code
> generation is minimal (a load each time through the loop instead of
> using a register cached value), while most synchronization mechanisms
> will require at least one function call. 
> 
> Given that accesses to 'bool' datatypes are atomic on most modern
> architectures, it is a common paradigm to use volatile on such
> variables. 

I agree this is probably working fine with current systems, but I think 
it is not guaranteed to work by any standard, is it? If so, it may 
arguably cease working on some not-so-distant future hyper-super multi-
core machine.

Anyway, I believe these saved cpu cycles are worth something only if the 
loop is really very tight and fast, but 'volatile' does not guarantee an 
immediate notification anyway so it does not really make sense to check 
the flag so often. In such a tight loop, why not check the flag e.g. each 
1000-th iteration with proper locking?
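Here is a minimal sketch of that suggestion. It assumes the flag is a plain `bool` guarded by a `std::mutex`, and `StopRequest`, `stop_requested`, `request_stop` and `run_loop` are hypothetical names. The lock is taken only on iterations where `i % check_every == 0`, so its cost is spread over many cheap iterations.

```cpp
#include <mutex>

// Hypothetical shared state: a plain bool that is only accessed under the mutex.
struct StopRequest {
    std::mutex mutex;
    bool stop = false;
};

bool stop_requested(StopRequest& req) {
    std::lock_guard<std::mutex> lock(req.mutex);
    return req.stop;
}

void request_stop(StopRequest& req) {
    std::lock_guard<std::mutex> lock(req.mutex);
    req.stop = true;
}

// Tight loop that pays for the lock only once every `check_every` iterations.
// Returns the number of iterations executed before the stop was observed,
// or `max_iterations` if no stop was requested.
long run_loop(StopRequest& req, long max_iterations, long check_every = 1000) {
    long i = 0;
    for (; i < max_iterations; ++i) {
        if (i % check_every == 0 && stop_requested(req)) break;
        // ... one cheap unit of work ...
    }
    return i;
}
```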

Cheers
Paavo
Reply myfirstname1 (587) 6/27/2012 9:37:36 PM

Paavo Helde <myfirstname@osa.pri.ee> writes:
>scott@slp53.sl.home (Scott Lurndal) wrote in
>news:ONKGr.39377$C06.11961@news.usenetserver.com: 
>
>> Paavo Helde <myfirstname@osa.pri.ee> writes:
>>>Juha Nieminen <nospam@thanks.invalid> wrote in
>>>news:jse5si$c7s$1@speranza.aioe.org: 
>>>
>>>> Paavo Helde <myfirstname@osa.pri.ee> wrote:
>>>>> Volatile is meant to be used for accessing memory-mapped hardware,
>>>>> if you are not writing device drivers or such you can just forget
>>>>> about it. It is neither needed nor sufficient for portable thread
>>>>> synchronisation. 
>>>> 
>>>> 'volatile' might not be sufficient for thread synchronization nor
>>>> atomicity, but it can certainly make a difference in a multithreaded
>>>> program. 
>>>> 
>>>> In a program I had a case where I was just reading an integral from
>>>> one thread that was changed by another thread to signal a minor,
>>>> unimportant effect (namely something related to updating the UI).
>>>> This did not require full-fledged locking (which would have made it
>>>> needlessly expensive) because if the integral had a wrong value for
>>>> a split second, that wasn't really a catastrophical event.
>>>
>>>Out of curiosity - did you measure how much the "needlessly expensive"
>>>locking was slower than using a volatile? I'm asking because I often
>>>feel the urge to invent all kinds of clever tricks to bypass proper
>>>locking. 
>> 
>> This will be dependent on how the volatile is used.  For example:
>> 
>>    volatile bool Class::terminate_thread = false;
>> 
>>    .
>>    .
>>    .
>> 
>> 
>>    void
>>    Class::Run(void *arg)
>>    {
>>       while (!terminate_thread) {
>>          // do something
>>       }
>>       pthread_exit(NULL);
>>    }
>>    .
>>    .
>>    .
>> 
>> If terminate_thread is declared volatile (since its value can be
>> changed by another thread or a signal handler), the effect on code
>> generation is minimal (a load each time through the loop instead of
>> using a register cached value), while most synchronization mechanisms
>> will require at least one function call. 
>> 
>> Given that accesses to 'bool' datatypes are atomic on most modern
>> architectures, it is a common paradigm to use volatile on such
>> variables. 
>
>I agree this is probably working fine with current systems, but I think 
>it is not guaranteed to work by any standard, is it? If so, it may 
>arguably cease working on some not-so-distant future hyper-super multi-
>core machine.

Given the target for these applications, this is not a concern.  It is
quite unlikely that any future computer architecture would fall over
on this anyway, for a number of reasons (including the mass of existing
legacy C and C++ code out there, for which volatile has certain meaning).

And I seem to recall some wording in the C standard about atomicity
guarantees for certain size data types, but all I have to hand is a 1988
draft document which says nothing.

>
>Anyway, I believe these saved cpu cycles are worth something only if the 
>loop is really very tight and fast, but 'volatile' does not guarantee an 
>immediate notification anyway so it does not really make sense to check 
>the flag so often. In such a tight loop, why not check the flag e.g. each 
>1000-th iteration with proper locking?

In the actual occurrences, the loop includes one of:

   - The poll(2) system call, waiting for work to arrive on one or
     more file descriptors (e.g. an inbound network connection or inbound
     packet) or

   - A call to pthread_cond_wait(3), waiting for work to be queued to the
     thread through shared memory.

The loop is anything but tight, and the "terminate_thread" variable is
checked once per "work item" (one of the file descriptors poll(2) is waiting
on is the read-end of a pipe used to break out of the poll(2) call when
the thread is being terminated).
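Here is a minimal POSIX sketch of the wake-up pipe described above, often called the self-pipe trick. It assumes a Unix-like system, and `woken_by_pipe` and its parameters are hypothetical. After `poll()` returns, the `revents` field of each `pollfd` says which descriptor became ready. The thread can therefore tell a termination request apart from new work by looking at the pipe's entry.

```cpp
#include <poll.h>
#include <unistd.h>

// Waits on the read end of a wake-up pipe plus one work descriptor.
// `work_fd` stands in for the socket(s) the real thread would wait on.
// Returns true if the wake-up pipe became readable within `timeout_ms`.
bool woken_by_pipe(int wake_read_fd, int work_fd, int timeout_ms) {
    struct pollfd fds[2];
    fds[0].fd = wake_read_fd;
    fds[0].events = POLLIN;
    fds[0].revents = 0;
    fds[1].fd = work_fd;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    int n = poll(fds, 2, timeout_ms);
    if (n <= 0) return false;               // timeout or error
    return (fds[0].revents & POLLIN) != 0;  // revents says which fd fired
}
```

In use, another thread writes a single byte to the pipe's write end to break the worker out of `poll()`.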

scott
Reply scott1 (356) 6/27/2012 11:17:35 PM

scott@slp53.sl.home (Scott Lurndal) wrote in
news:jwMGr.43370$0I1.18968@news.usenetserver.com: 

> Paavo Helde <myfirstname@osa.pri.ee> writes:
>>scott@slp53.sl.home (Scott Lurndal) wrote in
>>news:ONKGr.39377$C06.11961@news.usenetserver.com: 
>>
>>> Paavo Helde <myfirstname@osa.pri.ee> writes:
>>>>Juha Nieminen <nospam@thanks.invalid> wrote in
>>>>news:jse5si$c7s$1@speranza.aioe.org: 
>>>>
>>>>> Paavo Helde <myfirstname@osa.pri.ee> wrote:
>>>>>> Volatile is meant to be used for accessing memory-mapped
>>>>>> hardware, if you are not writing device drivers or such you can
>>>>>> just forget about it. It is neither needed nor sufficient for
>>>>>> portable thread synchronisation. 
>>>>> 
>>>>> 'volatile' might not be sufficient for thread synchronization nor
>>>>> atomicity, but it can certainly make a difference in a
>>>>> multithreaded program. 
>>>>> 
>>>>> In a program I had a case where I was just reading an integral
>>>>> from one thread that was changed by another thread to signal a
>>>>> minor, unimportant effect (namely something related to updating
>>>>> the UI). This did not require full-fledged locking (which would
>>>>> have made it needlessly expensive) because if the integral had a
>>>>> wrong value for a split second, that wasn't really a
>>>>> catastrophical event. 
>>>>
>>>>Out of curiosity - did you measure how much the "needlessly
>>>>expensive" locking was slower than using a volatile? I'm asking
>>>>because I often feel the urge to invent all kinds of clever
>>>>tricks to bypass proper locking. 
>>> 
>>> This will be dependent on how the volatile is used.  For example:
>>> 
>>>    volatile bool Class::terminate_thread = false;
>>> 
>>>    .
>>>    .
>>>    .
>>> 
>>> 
>>>    void
>>>    Class::Run(void *arg)
>>>    {
>>>       while (!terminate_thread) {
>>>          // do something
>>>       }
>>>       pthread_exit(NULL);
>>>    }
>>>    .
>>>    .
>>>    .
>>> 
>>> If terminate_thread is declared volatile (since its value can be
>>> changed by another thread or a signal handler), the effect on code
>>> generation is minimal (a load each time through the loop instead of
>>> using a register cached value), while most synchronization
>>> mechanisms will require at least one function call. 
>>> 
>>> Given that accesses to 'bool' datatypes are atomic on most modern
>>> architectures, it is a common paradigm to use volatile on such
>>> variables. 
>>
>>I agree this is probably working fine with current systems, but I
>>think it is not guaranteed to work by any standard, is it? If so, it
>>may arguably cease working on some not-so-distant future hyper-super
>>multi-core machine.
> 
> Given the target for these applications, this is not a concern.  It is
> quite unlikely that any future computer architecture would fall over
> on this anyway, for a number of reasons (including the mass of
> existing legacy C and C++ code out there, for which volatile has
> certain meaning).

Agreed.
 
> And I seem to recall some wording in the C standard about atomicity
> guarantees for certain size data types, but all I have to hand is a
> 1988 draft document which says nothing.

Atomicity is not a concern I think, but the cross-thread visibility of 
changes might be.

>>
>>Anyway, I believe these saved cpu cycles are worth something only if
>>the loop is really very tight and fast, but 'volatile' does not
>>guarantee an immediate notification anyway so it does not really make
>>sense to check the flag so often. In such a tight loop, why not check
>>the flag e.g. each 1000-th iteration with proper locking?
> 
> In the actual occurrences, the loop includes one of:
> 
>    - The poll(2) system call, waiting for work to arrive on one or
>      more file descriptors (e.g. an inbound network connection or
>      inbound packet) or
> 
>    - A call to pthread_cond_wait(3), waiting for work to be queued to the
>      thread through shared memory.

pthread_cond_wait() already involves mutex locking, so there is no reason 
why the termination flag could not be protected by the same mutex.
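Here is a minimal C++11 sketch of that suggestion. It uses `std::condition_variable`, the standard counterpart of `pthread_cond_wait`, and `WorkQueue` and its members are hypothetical. `terminate_` is read and written only while `mutex_` is held, so it can be an ordinary `bool` and needs no `volatile`.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical work queue whose termination flag shares the queue's mutex.
class WorkQueue {
public:
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(item);
        }
        cv_.notify_one();
    }

    void terminate() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            terminate_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until an item is available or termination is requested.
    // Returns false on termination, otherwise true with `out` filled in.
    bool pop(int& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return terminate_ || !items_.empty(); });
        if (terminate_) return false;
        out = items_.front();
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<int> items_;
    bool terminate_ = false;  // plain bool: always accessed under mutex_
};
```

In this sketch a termination request takes priority over queued items. Draining the queue first would be an equally valid design choice.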

 
> The loop is anything but tight, and the "terminate_thread" variable is
> checked once per "work item" (one of the file descriptors poll(2) is
> waiting on is the read-end of a pipe used to break out of the poll(2)
> call when the thread is being terminated).

poll(2) gives out information about events related to each file 
descriptor, so in this case the terminate_thread flag is not needed at 
all. If one is so concerned about the performance that one wants to avoid 
a proper locking of the termination flag then one should be worried about 
accessing a volatile variable as well.

Yeah, probably I'm just nitpicking and volatile flag works fine in the 
practice.

Cheers
Paavo
Reply myfirstname1 (587) 6/28/2012 5:01:31 AM

Paavo Helde <myfirstname@osa.pri.ee> wrote:
> Juha Nieminen <nospam@thanks.invalid> wrote in
>> In a program I had a case where I was just reading an integral from
>> one thread that was changed by another thread to signal a minor,
>> unimportant effect (namely something related to updating the UI). This
>> did not require full-fledged locking (which would have made it
>> needlessly expensive) because if the integral had a wrong value for a
>> split second, that wasn't really a catastrophical event.
> 
> Out of curiosity - did you measure how much the "needlessly expensive" 
> locking was slower than using a volatile? I'm asking because I often feel 
> the urge to invent all kinds of clever tricks to bypass proper locking.

No, but it was enough that the options were between writing 'volatile'
and several lines of code using a non-standard library to implement proper
locking, so the choice was rather clear to me.

(In C++11 it might have been enough to declare the integral to be atomic,
and it would probably have been equally fast and fully synchronized. However,
I did not have C++11 at my disposal at that time.)
Reply nospam270 (2853) 6/28/2012 5:53:40 AM

Ian Collins <ian-news@hotmail.com> wrote:
> Did you measure?  In the absence of contention, the impact of locking 
> should be close to none.

Locking is always expensive, even if no thread has to wait at any point.

That's the very reason why so much research is being done on lock-free
containers and atomic operations.

Anyways, if the choice is between writing 'volatile' and writing several
lines of code to implement locking in a situation where locking is not
needed, the choice is pretty clear.
Reply nospam270 (2853) 6/28/2012 5:55:26 AM

On 06/28/12 05:55 PM, Juha Nieminen wrote:
> Ian Collins<ian-news@hotmail.com>  wrote:
>> Did you measure?  In the absence of contention, the impact of locking
>> should be close to none.
>
> Locking is always expensive, even if no thread has to wait at any point.
>
> That's the very reason why so much research is being done on lock-free
> containers and atomic operations.
>
> Anyways, if the choice is between writing 'volatile' and writing several
> lines of code to implement locking in a situation where locking is not
> needed, the choice is pretty clear.

Until you have to run on a multi-core system....

-- 
Ian Collins
Reply ian-news (9886) 6/28/2012 7:59:53 AM

On Wed, 27 Jun 2012 16:12:05 -0500, Paavo Helde
<myfirstname@osa.pri.ee> wrote:

>Mark <i@dontgetlotsofspamanymore.invalid> wrote in
>news:t8llu797ibglliasbsqvte9r0fitpmkndk@4ax.com: 
>
>> 
>>>See e.g. http://en.wikipedia.org/wiki/Volatile_variable
>> 
>> This page does show cases where use of the volatile keyword is
>> necessary.
>
>Not sure what you mean, for C and C++ I see only discussion about memory-
>mapped hardware. The examples about multi-thread synchronization with 
>volatile are talking about Java and C# where this keyword apparently means 
>something different.

As I understand it there is still a risk of the unsafe optimizations
which the volatile keyword would prevent.
-- 
(\__/)  M.
(='.'=) If a man stands in a forest and no woman is around
(")_(") is he still wrong?

Reply i1658 (113) 6/28/2012 8:31:52 AM

Ian Collins <ian-news@hotmail.com> wrote:
>> Anyways, if the choice is between writing 'volatile' and writing several
>> lines of code to implement locking in a situation where locking is not
>> needed, the choice is pretty clear.
> 
> Until you have to run on a multi-core system....

What difference would that make?
Reply nospam270 (2853) 6/28/2012 8:46:31 AM

On 28/06/2012 07:55, Juha Nieminen wrote:
> Locking is always expensive, even if no thread has to wait at any point.
>
> That's the very reason why so much research is being done on lock-free
> containers and atomic operations.

Generally speaking, under the C++11 standard the atomic types may be 
implemented using locks internally. The only exception is 
std::atomic_flag, which is guaranteed to be lock-free. Note also that 
lock-free data structures are not necessarily wait-free data structures.
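You can check whether a given `std::atomic<T>` is lock-free at compile time through the `ATOMIC_..._LOCK_FREE` macros, or at run time with `is_lock_free()`. Here is a minimal sketch for `int`. The function names are hypothetical.

```cpp
#include <atomic>

// Compile-time answer for this compiler and target:
// ATOMIC_INT_LOCK_FREE is 0 (never), 1 (sometimes) or 2 (always lock-free).
int int_lock_free_level() { return ATOMIC_INT_LOCK_FREE; }

// Run-time answer for one particular object.
bool counter_is_lock_free(const std::atomic<int>& counter) {
    return counter.is_lock_free();
}
```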

Writing wait-free data structures is extremely hard and even when you 
get them right, you should make sure that the benefit outweighs the 
cost, since the atomic operations used to implement them can be much 
slower than their non-atomic counterparts. In other words, the main reason 
to write lock-free and wait-free code is to increase the potential for 
concurrency, but it may well decrease overall performance.

For these reasons it's important to identify the relevant performance 
aspect and profile various alternatives on the target platforms before 
deciding.
Reply luca.risolia (165) 6/28/2012 5:54:21 PM

Mark wrote:

> On Wed, 27 Jun 2012 16:12:05 -0500, Paavo Helde
> <myfirstname@osa.pri.ee> wrote:
> 
>>Mark <i@dontgetlotsofspamanymore.invalid> wrote in
>>news:t8llu797ibglliasbsqvte9r0fitpmkndk@4ax.com: 
>>
>>> 
>>>>See e.g. http://en.wikipedia.org/wiki/Volatile_variable
>>> 
>>> This page does show cases where use of the volatile keyword is
>>> necessary.
>> 
>> Not sure what you mean, for C and C++ I see only discussion about
>> memory-mapped hardware. The examples about multi-thread
>> synchronization with volatile are talking about Java and C# where
>> this keyword apparently means something different.
> 
> As I understand it there is still a risk of the unsafe optimizations
> which the volatile keyword would prevent.

While that's true, the memory barriers that come with synchronized
processor instructions, synchronization library functions/objects or the
new C++ synchronization mechanisms prevent these unsafe optimizations
also. That's why volatile is not necessary if you use these.
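Here is a minimal C++11 sketch of that point. The data is an ordinary, non-volatile `std::string`. The release store, paired with the acquire load, is what keeps the compiler and the CPU from reordering the accesses. The names `g_payload`, `g_ready`, `producer` and `consumer` are hypothetical.

```cpp
#include <atomic>
#include <string>
#include <thread>

// Hypothetical message passed from a producer thread to a consumer thread.
std::string g_payload;             // plain, non-volatile data
std::atomic<bool> g_ready{false};  // publication flag

void producer() {
    g_payload = "hello";                             // 1. write the data
    g_ready.store(true, std::memory_order_release);  // 2. publish it
}

// The acquire load pairs with the release store: once the consumer sees
// `true`, the earlier write to g_payload is guaranteed to be visible too.
std::string consumer() {
    while (!g_ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return g_payload;
}
```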

Gerhard
Reply gelists (266) 6/28/2012 6:27:55 PM

On 06/28/12 08:46 PM, Juha Nieminen wrote:
> Ian Collins<ian-news@hotmail.com>  wrote:
>>> Anyways, if the choice is between writing 'volatile' and writing several
>>> lines of code to implement locking in a situation where locking is not
>>> needed, the choice is pretty clear.
>>
>> Until you have to run on a multi-core system....
>
> What difference would that make?

On reflection, possibly not a lot.  volatile should keep the variable 
out of caches and visible to the other cores.

-- 
Ian Collins
Reply ian-news (9886) 6/28/2012 7:16:29 PM

Op 28-Jun-12 7:55, Juha Nieminen schreef:
> Ian Collins<ian-news@hotmail.com>  wrote:
>> Did you measure?  In the absence of contention, the impact of locking
>> should be close to none.
>
> Locking is always expensive, even if no thread has to wait at any point.

It depends; for example on Windows critical sections have minimal 
overhead on single core systems when there is no thread contention. 
Basically it boils down to a (sort-of) atomic test-and-set instruction, 
and only when the test fails it makes the (expensive) jump to kernel 
mode to halt the thread. On multi-core systems the test-and-set 
instruction gets a LOCK prefix to synchronize across cores which makes 
it a bit more expensive. OTOH if you use just 'volatile' variables to 
synchronize across threads in x86 multi-core environments you would
need the LOCK prefix as well for read-modify-write operations.
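Here is a minimal sketch of the test-and-set idea behind that fast path, using C++11 `std::atomic_flag`. `SpinLock` is a hypothetical name. Unlike a Windows critical section, it never falls back to the kernel and only yields. With GCC and Clang on x86, `test_and_set` typically compiles to an `XCHG` instruction, which is implicitly locked.

```cpp
#include <atomic>
#include <thread>

// Minimal test-and-set spinlock (hypothetical; no kernel fall-back).
class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous state: true means someone holds it.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // a real lock would eventually block
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;  // starts in the clear state
};
```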

> That's the very reason why so much research is being done on lock-free
> containers and atomic operations.
>
> Anyways, if the choice is between writing 'volatile' and writing several
> lines of code to implement locking in a situation where locking is not
> needed, the choice is pretty clear.

In that case you might even get away with leaving out 'volatile' as well.
Reply dombo (238) 6/28/2012 7:20:06 PM

Ian Collins <ian-news@hotmail.com> writes:
>On 06/28/12 08:46 PM, Juha Nieminen wrote:
>> Ian Collins<ian-news@hotmail.com>  wrote:
>>>> Anyways, if the choice is between writing 'volatile' and writing several
>>>> lines of code to implement locking in a situation where locking is not
>>>> needed, the choice is pretty clear.
>>>
>>> Until you have to run on a multi-core system....
>>
>> What difference would that make?
>
>On reflection, possibly not a lot.  volatile should keep the variable 
>out of caches and visible to the other cores.

The caching hardware itself ensures that the caches on all cores/sockets are
coherent, such that there will only be one core that holds a modified
value of the memory line (64-bytes for most) and all other cores will invalidate any
copies of the cache line. (or transfer the modified value to the new requesting
core)   This applies to all architectures (ppc, arm, x86, ia64)[*].  volatile
has nothing to do with the hardware architecture - it is simply a directive to
the compiler that ensures that every reference results in a corresponding load
instruction, and every modification results in a corresponding store instruction.
A location marked volatile will be cached in the same manner as any other location
(assuming that the memory location and/or access uses cachable semantics, see MTRR for
intel or the access attribute on the MIPS LD/ST instructions).

Now the processor pipelines can play tricks such that memory accesses for sequential
instructions may be issued out of order, but for strongly-ordered architectures such as i686/x86_64,
this will never be exposed to the program - the program will always see the accesses in
program order.    For weakly-ordered architectures, programs that share memory between
cores (multithreaded or shm/mmap/shm_open) need to be cognizant of this and use the
appropriate memory barrier instructions (e.g. some MIPS and ARM implementations).

A good example of how a weakly-ordered processor can cause problems is shown
here: http://wanderingcoder.net/2011/04/01/arm-memory-ordering/
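Here is a minimal C++11 sketch of placing those barriers portably with `std::atomic_thread_fence`. The names `g_slot`, `g_count`, `publish` and `consume` are hypothetical. On x86 the release and acquire fences usually compile to no instruction at all and only restrain the compiler. On ARM they typically become `DMB` barriers.

```cpp
#include <atomic>
#include <thread>

// Hypothetical pair from the discussion: a data word and a count announcing it.
// On weakly ordered CPUs the two stores could otherwise become visible in
// either order.
int g_slot = 0;
std::atomic<int> g_count{0};

void publish(int value) {
    g_slot = value;
    std::atomic_thread_fence(std::memory_order_release);  // order slot before count
    g_count.store(1, std::memory_order_relaxed);
}

int consume() {
    while (g_count.load(std::memory_order_relaxed) == 0) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // pairs with the release fence
    return g_slot;  // guaranteed to see the published value
}
```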

scott

[*] it doesn't apply to tilera, but then that's a different beast in so many ways.
Reply scott1 (356) 6/28/2012 8:04:12 PM

Dombo <dombo@disposable.invalid> writes:
>Op 28-Jun-12 7:55, Juha Nieminen schreef:
>> Ian Collins<ian-news@hotmail.com>  wrote:
>>> Did you measure?  In the absence of contention, the impact of locking
>>> should be close to none.
>>
>> Locking is always expensive, even if no thread has to wait at any point.
>
>It depends; for example on Windows critical sections have minimal 
>overhead on single core systems when there is no thread contention. 
>Basically it boils down to a (sort-of) atomic test-and-set instruction, 

plus at least two branches (function call & return), right?   Which causes
non-locality evictions from the icache and increases the icache footprint
of the application.  Sure, the BTB helps the performance, and the internal
stack return value caches in most modern processors also help, but you're
still bringing in a new cache line for the locking code.

Using atomics is better, and modern processors implement locked bus
transactions by gaining exclusive access to the cache line for the
duration of the operation.  The LOCK prefix has two effects on
performance:
   - If the access straddles two cache lines (i.e. a 64-bit access where
     32-bits are in one cache line and 32-bits are in the next), then
     a global bus lock will be asserted which will cause all cores on
     all sockets to stall memory accesses until the instruction completes.
     DON'T DO THIS!  Look at your atomics and make sure they are completely
     enclosed in a single cache line.

   - The LOCK prefix will serialize the core.   This means that all outstanding
     stores will be committed (to a modified cache line) and all previous loads
     will be completed before the locked instruction executes.   This will have
     a small performance effect by reducing parallelism in the CPU.
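Here is a minimal sketch of keeping an atomic inside a single cache line. It assumes 64-byte lines, and `kCacheLine`, `PaddedCounter` and `fits_in_one_line` are hypothetical names. Over-aligning the wrapper guarantees that the 8-byte counter starts at a line boundary. It therefore cannot straddle two lines and does not share a line with unrelated data.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as on most current x86 parts.
constexpr std::size_t kCacheLine = 64;

// The atomic gets its own line-aligned slot. (Before C++17, operator new
// may not honour this alignment for heap-allocated objects.)
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(alignof(PaddedCounter) == kCacheLine, "counter must be line-aligned");
static_assert(sizeof(PaddedCounter) % kCacheLine == 0, "counter must fill whole lines");

// True if an object of `size` bytes at `addr` lies within one cache line.
bool fits_in_one_line(const void* addr, std::size_t size) {
    auto offset = reinterpret_cast<std::uintptr_t>(addr) % kCacheLine;
    return offset + size <= kCacheLine;
}
```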

>and only when the test fails it makes the (expensive) jump to kernel 
>mode to halt the thread. On multi-core systems the test-and-set 
>instruction gets a LOCK prefix to synchronize across cores which makes 
>it a bit more expensive.

Fortunately, on x86 anyway, naturally aligned loads of the integral data
types (byte, word, doubleword and quadword: 8, 16, 32 and 64 bits) are
atomic without the LOCK prefix.

However, if the atomic being loaded is related to some other variable
(e.g. the count of entries in a queue and the queue listhead), then the
LOCK prefix should be used on loads of the atomic value to ensure that all
prior stores are completed before the load issues.
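For the count-plus-listhead case, C++11 expresses the required ordering with a release store paired with an acquire load rather than a locked load; on x86 both typically compile to plain MOVs. A hypothetical single-producer sketch, in which a fixed array stands in for the queue and `PublishedLog`, `push`, `sum_published` and `demo` are illustrative names:

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <thread>
#include <utility>

// Single producer fills slots, then publishes how many are valid via `count`.
struct PublishedLog {
    std::array<int, 16> slots{};          // plain (non-atomic) payload
    std::atomic<std::size_t> count{0};    // number of valid slots

    // Producer only: write the slot first, then publish it with a release
    // store so the slot write cannot become visible after the new count.
    bool push(int v) {
        std::size_t n = count.load(std::memory_order_relaxed); // only this thread writes count
        if (n == slots.size()) return false;
        slots[n] = v;
        count.store(n + 1, std::memory_order_release);
        return true;
    }

    // Consumer: the acquire load pairs with the release store, so every
    // slot below the observed count is guaranteed to be fully written.
    long sum_published() const {
        std::size_t n = count.load(std::memory_order_acquire);
        long total = 0;
        for (std::size_t i = 0; i < n; ++i) total += slots[i];
        return total;
    }
};

// Returns {sum seen by a concurrent consumer, final sum after both joins}.
std::pair<long, long> demo() {
    PublishedLog buffer;
    long seen = -1;
    std::thread producer([&] { for (int i = 1; i <= 10; ++i) buffer.push(i); });
    std::thread consumer([&] { seen = buffer.sum_published(); });
    producer.join();
    consumer.join();
    return std::make_pair(seen, buffer.sum_published());
}
```

The concurrent consumer may observe any prefix of the ten entries, but never a count whose slots are not yet written.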

> OTOH if you use just 'volatile' variables to 
>synchronize across threads on x86 multi-core environments you would 
>need the LOCK prefix as well for read-modify-write operations.

Not necessarily, particularly for booleans.  If you are incrementing or
decrementing a variable, then you should use the appropriate atomic
operations (either the C11 or C++11 standard ops, the older GNU extensions,
or inline ASM), but only if the value of the variable must be exact
(e.g. for reference counting an object); if missing an increment
or a decrement doesn't matter (e.g. for an event counter), then
atomics aren't necessary.
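One caveat in C++11 terms: a plain `int` updated from several threads is a data race and therefore undefined behaviour, even when lost updates would be acceptable. A sketch of both cases, assuming C++11 `<atomic>`; the names `exact_increment`, `lossy_increment`, `Counts` and `run` are illustrative:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Exact count (e.g. a reference count): every increment must land, so use
// an atomic read-modify-write; on x86 this is a LOCK-prefixed instruction.
void exact_increment(std::atomic<long>& c) {
    c.fetch_add(1, std::memory_order_relaxed);
}

// Approximate count (e.g. an event counter): a separate relaxed load and
// store needs no LOCK prefix, and concurrent updates may overwrite each
// other and lose increments, but unlike a plain int this is well-defined.
void lossy_increment(std::atomic<long>& c) {
    c.store(c.load(std::memory_order_relaxed) + 1, std::memory_order_relaxed);
}

struct Counts { long exact; long lossy; };

// Runs `threads` workers, each bumping both counters `per_thread` times.
Counts run(int threads, int per_thread) {
    std::atomic<long> exact{0}, lossy{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                exact_increment(exact);
                lossy_increment(lossy);
            }
        });
    for (auto& th : pool) th.join();
    return Counts{exact.load(), lossy.load()};
}
```

Relaxed ordering suffices here because nothing else is published through these counters; a reference count that guards deletion of an object typically needs acquire/release ordering on the decrement.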

>
>> That's the very reason why so much research is being done on lock-free
>> containers and atomic operations.
>>
>> Anyways, if the choice is between writing 'volatile' and writing several
>> lines of code to implement locking in a situation where locking is not
>> needed, the choice is pretty clear.
>
>In that case you might even get away with leaving out 'volatile' as well.

No, volatile controls _compiler optimizations_ only.   You use volatile
to tell the compiler to always load and always store when the variable is
accessed or modified.
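A flag set by a signal handler is one use of `volatile` that the C and C++ standards sanction directly, and it shows the "always load, always store" behaviour without involving threads. A minimal sketch; `g_interrupted`, `on_sigint` and `poll_until_interrupted` are illustrative names, and `std::raise` stands in for an external event:

```cpp
#include <csignal>

// volatile forces a real load or store on every access to the flag;
// it provides no inter-thread synchronization.
volatile std::sig_atomic_t g_interrupted = 0;

extern "C" void on_sigint(int) {
    g_interrupted = 1;   // the handler only stores to the volatile flag
}

// Polls the flag; `budget` bounds the loop so the sketch always terminates.
// Each loop test re-reads g_interrupted from memory instead of reusing a
// value kept in a register.
int poll_until_interrupted(int budget) {
    g_interrupted = 0;
    std::signal(SIGINT, on_sigint);
    int iterations = 0;
    while (!g_interrupted && iterations < budget) {
        ++iterations;
        if (iterations == 3) std::raise(SIGINT);  // simulated external event
    }
    return iterations;
}
```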

scott
Reply scott1 (356) 6/28/2012 8:26:09 PM

On 28-Jun-12 21:16, Ian Collins wrote:
> On 06/28/12 08:46 PM, Juha Nieminen wrote:
>> Ian Collins<ian-news@hotmail.com> wrote:
>>>> Anyways, if the choice is between writing 'volatile' and writing
>>>> several
>>>> lines of code to implement locking in a situation where locking is not
>>>> needed, the choice is pretty clear.
>>>
>>> Until you have to run on a multi-core system....
>>
>> What difference would that make?
>
> On reflection, possibly not a lot. volatile should keep the variable out
> of caches and visible to the other cores.

I don't know what the C++11 standard says about this, but on VS2010 it
does nothing special to keep volatile variables out of the cache.

Reply dombo (238) 6/28/2012 8:26:54 PM

On Thursday, June 28, 2012 1:53:40 PM UTC+8, Juha Nieminen wrote:
> Paavo Helde <myfirstname@osa.pri.ee> wrote:
> > Juha Nieminen <nospam@thanks.invalid> wrote in
> >> In a program I had a case where I was just reading an integral from
> >> one thread that was changed by another thread to signal a minor,

Well, this is the old joke of C with switches and cases not duplicating
the key value in cores to execute in parallel.

> >> unimportant effect (namely something related to updating the UI). This
> >> did not require full-fledged locking (which would have made it
> >> needlessly expensive) because if the integral had a wrong value for a
> >> split second, that wasn't really a catastrophical event.
> >
> > Out of curiosity - did you measure how much the "needlessly expensive"
> > locking was slower than using a volatile? I'm asking because I find myself
> > often in urge to invent all kind of clever tricks to bypass proper locking.
>
> No, but it was enough that the options were between writing 'volatile'
> and several lines of code using a non-standard library to implement proper
> locking, so the choice was rather clear to me.
>
> (In C++11 it might have been enough to declare the integral to be atomic,
> and it would probably have been equally fast and fully synchronized. However,
> I did not have C++11 at my disposal at that time.)

I think MS did sense the need for multi-core programming in C#.

But C++ is getting old enough to have a kid in the C family.
Reply dihedral88888 (786) 6/29/2012 4:53:50 AM

On Tue, 26 Jun 2012 at 17:04 GMT, Victor Bazarov <v.bazarov@comcast.invalid> wrote:
> On 6/26/2012 12:11 PM, 伏虎 wrote:
>> how to use volatile key word?
>>
>> Assume I have a global variable shared by two or more tasks.  Do I
>> need to declare it as volatile?
>
> 'volatile' usually means that the object's value can be changed at any 
> moment by some mechanism outside of your program control.  When you have 
> a global variable shared between parts of *your program*, that's all 
> under your control, and hence doesn't need to be declared 'volatile'.
>

Right...

Also check this doc:
http://www.kernel.org/doc/Documentation/volatile-considered-harmful.txt

Even though it is written for the kernel, it applies to user-space
applications as well.

Thanks.
Reply xiyou.wangcong (255) 6/29/2012 8:12:42 AM

On Thu, 28 Jun 2012 15:27:55 -0300, Gerhard Fiedler
<gelists@gmail.com> wrote:

>Mark wrote:
>
>> On Wed, 27 Jun 2012 16:12:05 -0500, Paavo Helde
>> <myfirstname@osa.pri.ee> wrote:
>> 
>>>Mark <i@dontgetlotsofspamanymore.invalid> wrote in
>>>news:t8llu797ibglliasbsqvte9r0fitpmkndk@4ax.com: 
>>>
>>>> 
>>>>>See e.g. http://en.wikipedia.org/wiki/Volatile_variable
>>>> 
>>>> This page does show cases where use of the volatile keyword is
>>>> necessary.
>>> 
>>> Not sure what you mean, for C and C++ I see only discussion about
>>> memory-mapped hardware. The examples about multi-thread
>>> synchronization with volatile are talking about Java and C# where
>>> this keyword apparently means something different.
>> 
>> As I understand it there is still a risk of the unsafe optimizations
>> which the volatile keyword would prevent.
>
>While that's true, the memory barriers that come with synchronized
>processor instructions, synchronization library functions/objects or the
>new C++ synchronization mechanisms prevent these unsafe optimizations
>also. That's why volatile is not necessary if you use these.

We can't all rely on new features.  I have to maintain code that needs
to work on a large variety of different compilers (including quite old
ones).
-- 
(\__/)  M.
(='.'=) If a man stands in a forest and no woman is around
(")_(") is he still wrong?

Reply i1658 (113) 6/29/2012 8:43:17 AM

Mark wrote:

> On Thu, 28 Jun 2012 15:27:55 -0300, Gerhard Fiedler
> <gelists@gmail.com> wrote:
> 
>>Mark wrote:
>>
>>> On Wed, 27 Jun 2012 16:12:05 -0500, Paavo Helde
>>> <myfirstname@osa.pri.ee> wrote:
>>> 
>>>>Mark <i@dontgetlotsofspamanymore.invalid> wrote in
>>>>news:t8llu797ibglliasbsqvte9r0fitpmkndk@4ax.com: 
>>>>
>>>>> 
>>>>>>See e.g. http://en.wikipedia.org/wiki/Volatile_variable
>>>>> 
>>>>> This page does show cases where use of the volatile keyword is
>>>>> necessary.
>>>> 
>>>> Not sure what you mean, for C and C++ I see only discussion about
>>>> memory-mapped hardware. The examples about multi-thread
>>>> synchronization with volatile are talking about Java and C# where
>>>> this keyword apparently means something different.
>>> 
>>> As I understand it there is still a risk of the unsafe optimizations
>>> which the volatile keyword would prevent.
>>
>>While that's true, the memory barriers that come with synchronized
>>processor instructions, synchronization library functions/objects or the
>>new C++ synchronization mechanisms prevent these unsafe optimizations
>>also. That's why volatile is not necessary if you use these.
> 
> We can't all rely on new features.  I have to maintain code that needs
> to work on a large variety of different compilers (including quite old
> ones).

FWIW, I wasn't talking about new features /only/. Working
implementations of pthread and compiler support for synchronized atomics
have been around since long before C++11.
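For code that cannot assume C++11, a POSIX mutex is one of the older mechanisms in question. A C++98-compatible sketch; the `SharedFlag` class is hypothetical. POSIX specifies that `pthread_mutex_lock` and `pthread_mutex_unlock` synchronize memory, so the guarded value needs no `volatile`:

```cpp
#include <pthread.h>

// Hypothetical shared value guarded by a POSIX mutex; compiles as C++98.
class SharedFlag {
public:
    SharedFlag() : value_(0) { pthread_mutex_init(&mutex_, 0); }
    ~SharedFlag() { pthread_mutex_destroy(&mutex_); }

    void set(int v) {
        pthread_mutex_lock(&mutex_);
        value_ = v;                 // ordinary store, published by unlock
        pthread_mutex_unlock(&mutex_);
    }

    int get() {
        pthread_mutex_lock(&mutex_);
        int v = value_;             // ordinary load, ordered by lock
        pthread_mutex_unlock(&mutex_);
        return v;
    }

private:
    SharedFlag(const SharedFlag&);             // non-copyable, C++98 style
    SharedFlag& operator=(const SharedFlag&);

    pthread_mutex_t mutex_;
    int value_;
};
```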

Gerhard
Reply gelists (266) 6/29/2012 6:14:04 PM

Gerhard Fiedler <gelists@gmail.com> writes:
>Mark wrote:
>
>> On Thu, 28 Jun 2012 15:27:55 -0300, Gerhard Fiedler
>> <gelists@gmail.com> wrote:
>> 
>>>Mark wrote:
>>>
>>>> On Wed, 27 Jun 2012 16:12:05 -0500, Paavo Helde
>>>> <myfirstname@osa.pri.ee> wrote:
>>>> 
>>>>>Mark <i@dontgetlotsofspamanymore.invalid> wrote in
>>>>>news:t8llu797ibglliasbsqvte9r0fitpmkndk@4ax.com: 
>>>>>
>>>>>> 
>>>>>>>See e.g. http://en.wikipedia.org/wiki/Volatile_variable
>>>>>> 
>>>>>> This page does show cases where use of the volatile keyword is
>>>>>> necessary.
>>>>> 
>>>>> Not sure what you mean, for C and C++ I see only discussion about
>>>>> memory-mapped hardware. The examples about multi-thread
>>>>> synchronization with volatile are talking about Java and C# where
>>>>> this keyword apparently means something different.
>>>> 
>>>> As I understand it there is still a risk of the unsafe optimizations
>>>> which the volatile keyword would prevent.
>>>
>>>While that's true, the memory barriers that come with synchronized
>>>processor instructions, synchronization library functions/objects or the
>>>new C++ synchronization mechanisms prevent these unsafe optimizations
>>>also. That's why volatile is not necessary if you use these.
>> 
>> We can't all rely on new features.  I have to maintain code that needs
>> to work on a large variety of different compilers (including quite old
>> ones).
>
>FWIW, I wasn't talking about new features /only/. Working
>implementations of pthread and compiler support for synchronized atomics
>have been around since long before C++11.

And working code using volatile has been around since long before pthreads.

scott
Reply scott1 (356) 6/29/2012 6:40:20 PM

Scott Lurndal wrote:

> Gerhard Fiedler <gelists@gmail.com> writes:
>>Mark wrote:
>>
>>> On Thu, 28 Jun 2012 15:27:55 -0300, Gerhard Fiedler
>>> <gelists@gmail.com> wrote:
>>> 
>>>>Mark wrote:
>>>>
>>>>> On Wed, 27 Jun 2012 16:12:05 -0500, Paavo Helde
>>>>> <myfirstname@osa.pri.ee> wrote:
>>>>> 
>>>>>>Mark <i@dontgetlotsofspamanymore.invalid> wrote in
>>>>>>news:t8llu797ibglliasbsqvte9r0fitpmkndk@4ax.com: 
>>>>>>
>>>>>>> 
>>>>>>>>See e.g. http://en.wikipedia.org/wiki/Volatile_variable
>>>>>>> 
>>>>>>> This page does show cases where use of the volatile keyword is
>>>>>>> necessary.
>>>>>> 
>>>>>> Not sure what you mean, for C and C++ I see only discussion about
>>>>>> memory-mapped hardware. The examples about multi-thread
>>>>>> synchronization with volatile are talking about Java and C# where
>>>>>> this keyword apparently means something different.
>>>>> 
>>>>> As I understand it there is still a risk of the unsafe optimizations
>>>>> which the volatile keyword would prevent.
>>>>
>>>>While that's true, the memory barriers that come with synchronized
>>>>processor instructions, synchronization library functions/objects or the
>>>>new C++ synchronization mechanisms prevent these unsafe optimizations
>>>>also. That's why volatile is not necessary if you use these.
>>> 
>>> We can't all rely on new features.  I have to maintain code that needs
>>> to work on a large variety of different compilers (including quite old
>>> ones).
>>
>>FWIW, I wasn't talking about new features /only/. Working
>>implementations of pthread and compiler support for synchronized atomics
>>have been around since long before C++11.
> 
> And working code using volatile has been around since long before
> pthreads.

So what? It seems to get more and more difficult to make a point that
uses more than two sentences or 12 words... :)

The line of argument here is (I'm trying to stay under 12 words):

(A) "volatile" is needed to prevent unsafe compiler optimizations.
(B) Other methods also prevent unsafe optimizations.
(A) I can't use these other methods because I can't use C++11.
(B) Many of those other methods don't need C++11 features.

I couldn't put the whole chain in 12 words, but maybe this shortened
summary helps illustrate anyway why the fact that in some situations
volatile works has nothing to do with this sub-thread. 

Gerhard
Reply gelists (266) 6/30/2012 3:17:12 PM

Dombo wrote on 2012-06-28 22:26:
> On 28-Jun-12 21:16, Ian Collins wrote:
>> On 06/28/12 08:46 PM, Juha Nieminen wrote:
>>> Ian Collins<ian-news@hotmail.com> wrote:
>>>>> Anyways, if the choice is between writing 'volatile' and writing
>>>>> several
>>>>> lines of code to implement locking in a situation where locking is not
>>>>> needed, the choice is pretty clear.
>>>>
>>>> Until you have to run on a multi-core system....
>>>
>>> What difference would that make?
>>
>> On reflection, possibly not a lot. volatile should keep the variable out
>> of caches and visible to the other cores.
>
> I don't know what the C++ 11 standard says about this but, on VS2010 it
> does nothing special to keep volatile variables out of the cache.
>

That's because VS2010 only targets platforms where the hardware does the 
cache sync. So it doesn't have to do anything.

It also means that the code can easily be non-portable, because "works 
on x86" doesn't mean "works elsewhere".


Bo Persson


Reply bop (1069) 7/1/2012 2:59:44 PM

On Sun, 01 Jul 2012 16:59:44 +0200, Bo Persson wrote:

>>> On reflection, possibly not a lot. volatile should keep the variable out
>>> of caches and visible to the other cores.
>>
>> I don't know what the C++ 11 standard says about this but, on VS2010 it
>> does nothing special to keep volatile variables out of the cache.
> 
> That's because VS2010 only targets platforms where the hardware does the 
> cache sync. So it doesn't have to do anything.

It's more because "volatile" is normally understood as requesting that the
compiler doesn't re-order or coalesce load/store operations, not that it
should provide workarounds for re-ordering performed by the CPU. If you
want the latter, you need to use explicit (and platform-specific) fence
operations.
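C++11 later standardized such fences as `std::atomic_thread_fence`, which emits whatever the target needs: typically no instruction at all for acquire/release on x86, and a barrier instruction on weaker architectures such as ARM or POWER. A minimal message-passing sketch; `payload`, `ready`, `producer`, `consumer` and `run_once` are illustrative names:

```cpp
#include <atomic>
#include <thread>

int payload = 0;                        // plain data, not atomic
std::atomic<bool> ready{false};         // flag accessed with relaxed ops only

void producer() {
    payload = 42;
    std::atomic_thread_fence(std::memory_order_release);  // payload before flag
    ready.store(true, std::memory_order_relaxed);
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed))
        std::this_thread::yield();
    std::atomic_thread_fence(std::memory_order_acquire);  // pairs with release fence
    return payload;                     // guaranteed to observe 42
}

// Runs one producer/consumer pair and returns what the consumer read.
int run_once() {
    payload = 0;
    ready.store(false);
    int result = 0;
    std::thread c([&] { result = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return result;
}
```

The fences carry the ordering, so the flag itself can use relaxed loads and stores; declaring it `volatile` instead would constrain only the compiler, not the CPU.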

> It also means that the code can easily be non-portable, because "works 
> on x86" doesn't mean "works elsewhere".

x86 has one of the strongest memory consistency models. At the other
extreme, Alpha, PPC, IA-64 and ARM have very weak consistency.

Coupled with the lack of any alignment constraints, x86 is about the worst
test platform if you want portable code, as it papers over the most common
issues. On the upside, being little-endian means that omitting
host-order/network-order conversions will show up.

Reply nobody (4806) 7/1/2012 4:05:15 PM

On Monday, July 2, 2012 12:05:15 AM UTC+8, Nobody wrote:
> On Sun, 01 Jul 2012 16:59:44 +0200, Bo Persson wrote:
>
> >>> On reflection, possibly not a lot. volatile should keep the variable out
> >>> of caches and visible to the other cores.
> >>
> >> I don't know what the C++ 11 standard says about this but, on VS2010 it
> >> does nothing special to keep volatile variables out of the cache.
> >
> > That's because VS2010 only targets platforms where the hardware does the
> > cache sync. So it doesn't have to do anything.
>
> It's more because "volatile" is normally understood as requesting that the
> compiler doesn't re-order or coalesce load/store operations, not that it
> should provide workarounds for re-ordering performed by the CPU. If you
> want the latter, you need to use explicit (and platform-specific) fence
> operations.
>
> > It also means that the code can easily be non-portable, because "works
> > on x86" doesn't mean "works elsewhere".
>
> x86 has one of the strongest memory consistency models. At the other
> extreme, Alpha, PPC, IA-64 and ARM have very weak consistency.
>
> Coupled with the lack of any alignment constraints, x86 is about the worst
> test platform if you want portable code, as it papers over the most common
> issues. On the upside, being little-endian means that omitting
> host-order/network-order conversions will show up.

The x86 memory-related instructions were designed in the 1970s, a period
when DRAM chips were expensive in the consumer market.

But Intel did support virtual mode a long time ago.
Reply dihedral88888 (786) 7/2/2012 5:44:28 AM

On 28 Jun., Scott Lurndal wrote:

[snipped discussion about proper use of keyword "volatile"]

> The caching hardware itself ensures that the caches on all cores/sockets are
> coherent, such that there will only be one core that holds a modified
> value of the memory line (64-bytes for most) and all other cores will invalidate any
> copies of the cache line.

[snip]
>
> volatile
> has nothing to do with the hardware architecture - it is simply a directive to
> the compiler that ensures that every reference results in a corresponding load
> instruction, and every modification results in a corresponding store instruction.

[snip]

I think that this is pretty much the answer to Paavo's question
whether the C++ standard guarantees this behaviour. Do you happen to
know chapter and verse of the C++ standard?

Thanks in advance,
Stuart
Reply DerTopper (387) 7/2/2012 8:06:42 AM

Scott Lurndal wrote:
> Fortunately, on x86 anyway, loads from the integral data types (byte,
> word, doubleword and extended doubleword (8, 16, 32 and 64)) are always
> atomic without the LOCK prefix.
> 
> However, if the atomic being loaded is related to some other variable (e.g.
> the count of entries in a queue and the queue listhead), then the LOCK prefix
> should be used on loads of the atomic value to ensure that all prior stores
> are completed before the load issues.

The LOCK prefix can be used only on read-modify-write instructions with 
a memory destination. It cannot be used on a load instruction. (You 
could use a locked xadd or cmpxchg to load a value; is that what you 
meant?)
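The suggestion of a locked read-modify-write used as a load can be sketched in C++11 as `fetch_add(0)`. The exact instruction is up to the compiler: GCC typically emits LOCK XADD, while Clang may turn an idempotent read-modify-write into a fence plus a plain load. The function names below are illustrative:

```cpp
#include <atomic>

// Read-modify-write that adds zero: returns the current value and acts as
// a full barrier, much like a LOCK XADD of 0 on x86.
long locked_load(std::atomic<long>& x) {
    return x.fetch_add(0, std::memory_order_seq_cst);
}

// The idiomatic alternative: a seq_cst load. On x86 this is typically a
// plain MOV; the ordering cost is paid by seq_cst stores instead
// (XCHG, or MOV followed by MFENCE).
long plain_load(const std::atomic<long>& x) {
    return x.load(std::memory_order_seq_cst);
}
```

Both return the same value; the difference lies only in which side of the synchronization pays for the barrier.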
Reply prl1 (102) 8/3/2012 12:35:11 AM
