Assuming I've got an atomic & fenced CAS instruction available via a
library; is it possible to use this to write a lock free algorithm in C
++?
Although the memory fence would prevent hardware re-ordering I'm
worried that compiler could reorder memory access rendering the fence
useless. For example consider the below, grossly simplified example:
bool continue = true;
while(continue) {
while(isLocked) { }
// atomically set isLocked to true if it has the value false
if(AtomicTestAndSet(&isLocked, true, false)) {
continute = false;
}
}
SharedStdVector[3] += 10;
Is there anything that prevents the compiler from reordering the
access to SharedStdVector so that it is executed before mutex is
aquired? Is there anything I can do to prevent such reordering?
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
amsk1982 (1)
|
7/12/2010 6:22:44 AM |
|
On Jul 12, 2:22 pm, Andy <amsk1...@gmail.com> wrote:
> Assuming I've got an atomic & fenced CAS instruction available via a
> library; is it possible to use this to write a lock free algorithm in C
> ++?
>
> Although the memory fence would prevent hardware re-ordering I'm
> worried that compiler could reorder memory access rendering the fence
> useless.
Compilers will not do reordering when faced with inline assembly or
the calling of non-pure functions whose definitions are not available
in the same translation unit.
Note however that since this is not (yet) standardized, there is no
formal guarantee.
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
Mathias
|
7/12/2010 3:25:38 PM
|
|
On 2010-07-12 03:22:44 -0400, Andy said:
> Assuming I've got an atomic & fenced CAS instruction available via a
> library; is it possible to use this to write a lock free algorithm in C
> ++?
>
> Although the memory fence would prevent hardware re-ordering I'm
> worried that compiler could reorder memory access rendering the fence
> useless. For example consider the below, grossly simplified example:
>
> bool continue = true;
> while(continue) {
> while(isLocked) { }
> // atomically set isLocked to true if it has the value false
> if(AtomicTestAndSet(&isLocked, true, false)) {
> continute = false;
> }
> }
> SharedStdVector[3] += 10;
>
> Is there anything that prevents the compiler from reordering the
> access to SharedStdVector so that it is executed before mutex is
> aquired? Is there anything I can do to prevent such reordering?
It depends on the compiler. But C++0x will have atomic variables and
the implementation will take care of the compiler-specific techniques
to prevent reordering.
--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The Standard C++ Library Extensions: a Tutorial and Reference (www.petebecker.com/tr1book)
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
Pete
|
7/12/2010 3:25:53 PM
|
|
On 07/12/2010 02:22 PM, Andy wrote:
> Assuming I've got an atomic & fenced CAS instruction available via a
> library; is it possible to use this to write a lock free algorithm in C
> ++?
>
> Although the memory fence would prevent hardware re-ordering I'm
> worried that compiler could reorder memory access rendering the fence
> useless. For example consider the below, grossly simplified example:
>
> bool continue = true;
> while(continue) {
> while(isLocked) { }
> // atomically set isLocked to true if it has the value false
> if(AtomicTestAndSet(&isLocked, true, false)) {
> continute = false;
> }
> }
> SharedStdVector[3] += 10;
>
> Is there anything that prevents the compiler from reordering the
> access to SharedStdVector so that it is executed before mutex is
> aquired? Is there anything I can do to prevent such reordering?
>
You need to look up the thread library implementation for the C++ you
are using. This will provide you with various mechanisms for creating
thread safe code. In your case you'd need a mutex, (short for mutual
exclusion).
HTH
cpp4ever
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
cpp4ever
|
7/12/2010 3:26:05 PM
|
|
On Jul 12, 6:22 am, Andy <amsk1...@gmail.com> wrote:
> Assuming I've got an atomic & fenced CAS instruction available via a
> library; is it possible to use this to write a lock free algorithm in C
> ++?
>
> Although the memory fence would prevent hardware re-ordering I'm
> worried that compiler could reorder memory access rendering the fence
> useless. For example consider the below, grossly simplified example:
>
> bool continue = true;
> while(continue) {
> while(isLocked) { }
> // atomically set isLocked to true if it has the value false
> if(AtomicTestAndSet(&isLocked, true, false)) {
> continute = false;
> }}
>
> SharedStdVector[3] += 10;
>
> Is there anything that prevents the compiler from reordering the
> access to SharedStdVector so that it is executed before mutex is
> aquired? Is there anything I can do to prevent such reordering?
When implementing Peterson's (or Decker's) algorithms you may, in
addition to hardware memory barriers, also add compiler memory
barriers:
http://en.wikipedia.org/wiki/Memory_barrier#Out-of-order_execution_versus_compiler_reordering_optimizations
http://en.wikipedia.org/wiki/Memory_ordering#Compiler_memory_barrier
It does not always work to only make the variables volatile, and you
may resort to the compiler-specific hacks above.
Or if speed is not an issue, just use some form of existing library
(like boost, etc.)
Marius.
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
Lailoken
|
7/12/2010 3:28:25 PM
|
|
Andy <amsk1982@gmail.com> writes:
> Assuming I've got an atomic & fenced CAS instruction available via a
> library; is it possible to use this to write a lock free algorithm in C
> ++?
Lock-free algorithms and anything else that uses atomics for memory
ordering requires the compiler and CPU respect the ordering constraints.
With a C++0x compiler, the std::atomic<> facilities will ensure this
holds.
With current compilers, it may be possible with a library, if the
library uses the appropriate constructs for the compiler. e.g. the
just::thread implementation of std::atomic<> works with gcc 4.3&4.4,
MSVC2008 and MSVC2010, and uses compiler-specific constructs to ensure
that the ordering constaints are enforced. Earlier versions of MSVC such
as MSVC.Net 2003 do not provide the necessary memory ordering guarantees
to be able to implement full memory ordering semantics in a separate
library.
> Although the memory fence would prevent hardware re-ordering I'm
> worried that compiler could reorder memory access rendering the fence
> useless. For example consider the below, grossly simplified example:
>
> bool continue = true;
> while(continue) {
> while(isLocked) { }
> // atomically set isLocked to true if it has the value false
> if(AtomicTestAndSet(&isLocked, true, false)) {
> continute = false;
> }
> }
> SharedStdVector[3] += 10;
>
> Is there anything that prevents the compiler from reordering the
> access to SharedStdVector so that it is executed before mutex is
> aquired? Is there anything I can do to prevent such reordering?
The C++03 standard does not deal with threads. C++0x supports a full
memory model and atomics. Anything else is up to the compiler
implementation.
Anthony
--
Author of C++ Concurrency in Action http://www.stdthread.co.uk/book/
just::thread C++0x thread library http://www.stdthread.co.uk
Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk
15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
Anthony
|
7/12/2010 3:28:37 PM
|
|
Andy wrote:
> Assuming I've got an atomic & fenced CAS instruction available via a
> library; is it possible to use this to write a lock free algorithm in C
> ++?
The short answer is yes.
Unfortunately, there's more to that than a simple answer.
First, any lock-free algorithm should be thoroughly analyzed prior to
trying to implement it in any language (for example, your algorithm is
not lock-free at all! More on that at the end of the post)
Second, it's very hard to write portable lock-free C++ code.
You are always at the mercy of the library that you're using to
implement atomic primitives. For example, does your "AtomicTestAndSet"
function feature acqure or release semantics? (i.e., does it issue a
relevant memory fence?) Does the library that you're using give you any
guarantees with respect to compiler cooperation (like pthread library does).
With advent of c++0x's atoimc<> and boost::atomic it's becoming easier
however.
>
> Although the memory fence would prevent hardware re-ordering I'm
> worried that compiler could reorder memory access rendering the fence
> useless. For example consider the below, grossly simplified example:
>
> bool continue = true;
> while(continue) {
> while(isLocked) { }
> // atomically set isLocked to true if it has the value false
> if(AtomicTestAndSet(&isLocked, true, false)) {
> continute = false;
> }
> }
> SharedStdVector[3] += 10;
This snippet has several problems.
First, even if we assume that AtomicTestAndSet has both acquire and
release semantics (i.e., it issues a full memory barrier), the access to
isLocked is not synchronized. The line "while (isLocked)" might go into
an infinite loop even if another thread updates isLocked to false. It
can happen due to the compiler optimization where the compiler is
allowed to treat all code as a single-threaded code, and isLocked will
be read from the main memory only once.
One way to fix that is to mark isLocked as volatile (beware though,
there has been a long thread on this topic about 3 months ago which I
had been a part of. Using volatile is not for a faint of heart and
should only be done by a library-writer
http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/fcef57c539a42ef5/be98b8ef87ff64a4?hl=en&tvc=1&q=is+alexandrescu+right#be98b8ef87ff64a4).
Better yet, make all access to isLocked atomic, i.e.
atomic<bool> isLocked; or similar
But the bigger issue in your code is that even if you can get it to
work, it's not lock-free.
Lock-free is not about avoiding mutexes or other synchronization
primitives. Lock-free is a property of an algorithm that guarantees
progress of the overall program at every step. I.e., that at least one
of the threads will make progress.
What you did is you wrote an implementation of a lock without resorting
to system-provided primitives. But it's a lock nevertheless. And you
have all the issues with locking - what if another thread stalls and
never updates isLocked to false? Also, you can dead-lock as easily as
with system-provided locks. Moreover, since usually system-provided
locks don't resort to busy-waiting as you do they will most likely have
a better performance as well. And, at least on Linux, taking an
uncontested lock is never a system call, just a single atomic operation.
I'm pretty sure that taking an uncontested lock on other operating
systems is optimized as well.
HTH,
Andy.
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
Andy
|
7/12/2010 3:29:56 PM
|
|
On Jul 12, 5:22 pm, Andy <amsk1...@gmail.com> wrote:
> Assuming I've got an atomic & fenced CAS instruction available via a
> library; is it possible to use this to write a lock free algorithm in C
> ++?
>
> Although the memory fence would prevent hardware re-ordering I'm
> worried that compiler could reorder memory access rendering the fence
> useless. For example consider the below, grossly simplified example:
>
> bool continue = true;
> while(continue) {
> while(isLocked) { }
> // atomically set isLocked to true if it has the value false
> if(AtomicTestAndSet(&isLocked, true, false)) {
> continute = false;
> }}
>
> SharedStdVector[3] += 10;
>
> Is there anything that prevents the compiler from reordering the
> access to SharedStdVector so that it is executed before mutex is
> aquired? Is there anything I can do to prevent such reordering?
>
I am the first to admit that I am not well-versed in these matters,
but in theory (and in practice as far as I have experienced) you
should be able to use the "volatile" keyword when declaring the
involved variables to signal to the compiler that their values
might change in some other context (e.g. some other thread) and it
should not try to mess around with the order of the statements
involving them.
For example, you could (and you probably should) declare "isLocked"
and SharedStdVector as "volatile", like this:
volatile bool isLocked = false;
volatile std::vector<int> SharedStdVector;
I don't know whether a volatile attribute on a non-POD class has
any effect or what kind. I have never tried that, in fact. But
it should work for simple data types.
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
Yaser
|
7/12/2010 3:30:21 PM
|
|
On Jul 12, 6:22 am, Andy <amsk1...@gmail.com> wrote:
> Assuming I've got an atomic & fenced CAS instruction available via a
> library; is it possible to use this to write a lock free algorithm in C
> ++?
>
> Although the memory fence would prevent hardware re-ordering I'm
> worried that compiler could reorder memory access rendering the fence
> useless. For example consider the below, grossly simplified example:
>
> bool continue = true;
> while(continue) {
> while(isLocked) { }
> // atomically set isLocked to true if it has the value false
> if(AtomicTestAndSet(&isLocked, true, false)) {
> continute = false;
> }}
>
> SharedStdVector[3] += 10;
>
> Is there anything that prevents the compiler from reordering the
> access to SharedStdVector so that it is executed before mutex is
> aquired? Is there anything I can do to prevent such reordering?
The short version is that C++ standard says nothing about threading.
All threading and synchronization issues are covered by other
standards. What tends to happen in practice is that the compiler will
be written with a particular threading library and model in mind, such
as POSIX. When you write POSIX code, you are not just using a library.
A library alone is insufficient to offer the needed guarantees because
the guarantees are not expressible in C++. That is, it's not just a
POSIX library. It's a POSIX compliant library \and compiler\.
Thus, I would suggest you look at the docs for this library which
offers the threading primitives, and look at the docs for your
compiler (if any). Also, you look at the output assembly to make sure
it's doing what it should be doing.
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
Joshua
|
7/12/2010 3:30:41 PM
|
|
On 2010-07-12 12:30:21 -0400, Yaser Zhian said:
>
> I am the first to admit that I am not well-versed in these matters,
> but in theory (and in practice as far as I have experienced) you
> should be able to use the "volatile" keyword when declaring the
> involved variables to signal to the compiler that their values
> might change in some other context (e.g. some other thread) and it
> should not try to mess around with the order of the statements
> involving them.
That's not sufficient. The problem is that, while the compiler isn't
allowed to reorder two volatile accesses, it doesn't necessarily
prevent reording of other instructions around those volatiles. The
result will often not be good.
--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The Standard C++ Library Extensions: a Tutorial and Reference (www.petebecker.com/tr1book)
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
Pete
|
7/13/2010 1:52:46 AM
|
|
> > bool continue = true;
> > while(continue) {
> > while(isLocked) { }
> > // atomically set isLocked to true if it has the value false
> > if(AtomicTestAndSet(&isLocked, true, false)) {
> > continute = false;
> > }
> > }
> > SharedStdVector[3] += 10;
>
> This snippet has several problems.
[snip]
> But the bigger issue in your code is that even if you can get it to
> work, it's not lock-free.
Thanks for your comments; I realize the example was not lock free. It
was really just intended to make the re-ordering issues I was
concerned about explicit, and a simple spin mutex implementation is
the simplest, most obvious was to write a short piece of code to
illustrate this. I probably should have chosen a better example. I
must admit that I missed the "single read" optimization issue
(although if this was real code I hopefully wouldn't have done).
It seems to me that code such as this can best be made to work on non-C
++0X implementations by:
1) using an assembly library call to read isLocked before the CAS
thereby preventing compiler optimization of the read. I definitely
don't want the volatile modifier here as on some platforms I would
also get an unwanted fence (e.g. on MSVC), and the whole point of this
read is that it is non-fenced.
2) Ensuring that the CAS Instruction is defined to include a
"compiler memory barrier" in addition to the processor one. Code like
this be proven to work on non-C++0X compilers where there are no such
compiler barriers.
On C++0X implementations this should work with std::atomic (although I
haven't yet checked the guarantees this new class makes yet).
Is that correct?
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
|
|
0
|
|
|
|
Reply
|
Andy
|
7/13/2010 1:54:02 AM
|
|
|
10 Replies
155 Views
(page loaded in 0.143 seconds)
|