Simple, Experimental Wait-Free RW-Mutex Algorithm on Windows...

  • Follow


This simplistic reader/writer algorithm is wait-free in the sense that there 
are no loops; no CAS or LL/SC:


http://appcore.home.comcast.net/misc/rwmutex_win_hpp.html


So far, the counting algorithm looks okay. I guess at a certain level its 
uses some of the same basic principals that one of Joe Seighs wait-free 
semaphore algorithms used. Everything takes place in single atomic 
statements which do not depend on any prior state holding a consistent value 
within a specific timeframe, unlike CAS or LL/SC. There is no need for any 
sort of "compare logic" in order to commit a mutation to the shared counter. 
You don't have to load; modify; store only if prior load is consistent... 
Therefore, simple increments, decrements and fetch-and-add is all that is 
really required. There is no timeout logic as of yet. However, I believe the 
same basic scheme that Joe used can be make to work within the context of my 
algorithm...

Anyway, it should compile fine. Please tinker around with it, and see if you 
can come up with some real nasty race-conditions!


Any thoughts?

;^) 

0
Reply Chris 10/31/2007 2:06:30 AM

"Chris Thomasson" <cristom@comcast.net> wrote in message 
news:i5idndsOPO1ifLranZ2dnUVZ_rignZ2d@comcast.com...
> This simplistic reader/writer algorithm is wait-free in the sense that 
> there are no loops; no CAS or LL/SC:
>
>
> http://appcore.home.comcast.net/misc/rwmutex_win_hpp.html
>
>
> So far, the counting algorithm looks okay. I guess at a certain level its 
> uses some of the same basic principals that one of Joe Seighs wait-free 
> semaphore algorithms used. Everything takes place in single atomic 
> statements which do not depend on any prior state holding a consistent 
> value within a specific timeframe, unlike CAS or LL/SC. There is no need 
> for any sort of "compare logic" in order to commit a mutation to the 
> shared counter. You don't have to load; modify; store only if prior load 
> is consistent... Therefore, simple increments, decrements and 
> fetch-and-add is all that is really required. There is no timeout logic as 
> of yet. However, I believe the same basic scheme that Joe used can be make 
> to work within the context of my algorithm...
>
> Anyway, it should compile fine. Please tinker around with it, and see if 
> you can come up with some real nasty race-conditions!
>
>
> Any thoughts?
>
> ;^)

Reload the code. There was a race-condition. On your case rdunlock would
detect a m_count < 1 and posted to the event waking the writer. Problem...
Please re-download the link, and tell me what you think:

http://appcore.home.comcast.net/~appcore/misc/rwmutex_win_hpp.html



Okay, here is a little contest... Who can identify the race-condition I
found? Hint: it has to do with rdunlock responding to wrlock... Anyway, the
corrected code is on the website.


Post followups here please:

http://groups.google.com/group/comp.programming.threads/browse_frm/thread/62cbf4851c1f63de



0
Reply Chris 10/31/2007 3:19:46 AM


Here is an updated version that has an experimental read-to-write upgrade 
function:


http://appcore.home.comcast.net/misc/rwmutex_win_hpp.html
_____________________________________________________________________

#if ! defined(RWMUTEX_WIN_HPP)
  #define RWMUTEX_WIN_HPP
  #if ! defined(WIN32_LEAN_AND_MEAN)
    #define WIN32_LEAN_AND_MEAN
  #endif
  #undef _WIN32_WINNT
  #define _WIN32_WINNT 0x0400
  #include <windows.h>
  #include <exception>
  #include <cassert>
  #include <climits>
  #if defined(_MSC_VER)
    #define RWMUTEX_WIN_UNEXPECTED unexpected
  #else
    #define RWMUTEX_WIN_UNEXPECTED std::unexpected
  #endif
/*************************************************************/




class rwmutex_win {
  rwmutex_win(rwmutex_win const&);
  rwmutex_win const& operator =(rwmutex_win const&);


private:
  LONG m_count;
  LONG m_rdwake;
  HANDLE m_wrwset;
  HANDLE m_rdwset;
  CRITICAL_SECTION m_wrlock;


public:
  rwmutex_win()
    : m_count(LONG_MAX),
      m_rdwake(0),
      m_wrwset(CreateEvent(0, FALSE, FALSE, 0)),
      m_rdwset(CreateSemaphore(0, 0, LONG_MAX, 0)) {
    if (! m_rdwset || ! m_wrwset) {
      if (m_rdwset) { CloseHandle(m_rdwset); }
      if (m_wrwset) { CloseHandle(m_wrwset); }
      throw;
    }
    InitializeCriticalSection(&m_wrlock);
  }

  ~rwmutex_win() throw() {
    DeleteCriticalSection(&m_wrlock);
    if (! CloseHandle(m_rdwset)) { assert(false); }
    if (! CloseHandle(m_wrwset)) { assert(false); }
  }


public:
  void wrlock() throw() {
    EnterCriticalSection(&m_wrlock);
    LONG count =
      InterlockedExchangeAdd(&m_count, -LONG_MAX);
    if (count < LONG_MAX) {
      if ( InterlockedExchangeAdd(&m_rdwake,
            LONG_MAX - count) + LONG_MAX - count) {
        if (WaitForSingleObject(m_wrwset,
              INFINITE) != WAIT_OBJECT_0) {
          assert(false); RWMUTEX_WIN_UNEXPECTED();
        }
      }
    }
  }

  void wrunlock() throw() {
    LONG count =
      InterlockedExchangeAdd(&m_count, LONG_MAX);
    if (count < 0) {
      if (! ReleaseSemaphore(m_rdwset, count * -1, 0)) {
        assert(false); RWMUTEX_WIN_UNEXPECTED();
      }
    }
    LeaveCriticalSection(&m_wrlock);
  }


public:
  void rdlock() throw() {
    LONG count = InterlockedDecrement(&m_count);
    if (count < 0) {
      if (WaitForSingleObject(m_rdwset,
            INFINITE) != WAIT_OBJECT_0) {
        assert(false); RWMUTEX_WIN_UNEXPECTED();
      }
    }
  }

  void rdunlock() throw() {
    LONG count = InterlockedIncrement(&m_count);
    if (count < 1) {
      if (! InterlockedDecrement(&m_rdwake)) {
        if (! SetEvent(m_wrwset)) {
          assert(false); RWMUTEX_WIN_UNEXPECTED();
        }
      }
    }
  }


public:
  bool tryrdtowr() throw() {
    if (TryEnterCriticalSection(&m_wrlock)) {
      LONG count =
        InterlockedExchangeAdd(&m_count, (-LONG_MAX) + 1) + 1;
      if (count < LONG_MAX) {
        if (InterlockedExchangeAdd(&m_rdwake,
              LONG_MAX - count) + LONG_MAX - count) {
          if (WaitForSingleObject(m_wrwset,
                INFINITE) != WAIT_OBJECT_0) {
            assert(false); RWMUTEX_WIN_UNEXPECTED();
          }
        }
      }
      return true;
    }
    return false;
  }
};




/*************************************************************/
#endif

_____________________________________________________________________




This was in response to the following query by Alexander Grigoriev:

http://groups.google.com/group/microsoft.public.win32.programmer.kernel/msg/63182c5c425e41f5



Does this look okay to you guys? If not, please help me fix it!

;^)


Thank you all. 

0
Reply Chris 11/13/2007 1:21:51 PM

Chris Thomasson wrote:
> Here is an updated version that has an experimental read-to-write 
> upgrade function:

Whats experimental about it?

Outline the experiment, what you are experimenting with?  Are you 
testing for something?

Well, experiments aside,  there are numerious design issues with this 
code, but the most glaring one in its vain attempt to protect resources, 
it fails to protect its entry points.

You need to protect not only whats in the box, but all interface points 
to the box.

--
HLS


0
Reply Pops 11/13/2007 4:16:10 PM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:eRY1vChJIHA.3636@TK2MSFTNGP03.phx.gbl...
> Chris Thomasson wrote:
>> Here is an updated version that has an experimental read-to-write upgrade 
>> function:
>
> Whats experimental about it?

The algorihtm itself.


>
> Outline the experiment, what you are experimenting with?  Are you testing 
> for something?
>
> Well, experiments aside,  there are numerious design issues with this 
> code, but the most glaring one in its vain attempt to protect resources, 
> it fails to protect its entry points.

If you found a bug point it out please. I can't find any so far.


>
> You need to protect not only whats in the box, but all interface points to 
> the box.


0
Reply Chris 11/13/2007 5:07:46 PM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:eRY1vChJIHA.3636@TK2MSFTNGP03.phx.gbl...
> Chris Thomasson wrote:
>> Here is an updated version that has an experimental read-to-write upgrade 
>> function:
>
> Whats experimental about it?
>
> Outline the experiment, what you are experimenting with?  Are you testing 
> for something?
>
> Well, experiments aside,  there are numerious design issues with this 
> code, but the most glaring one in its vain attempt to protect resources, 
> it fails to protect its entry points.

Your trolling attempts are not going to work.



> You need to protect not only whats in the box, but all interface points to 
> the box.

Are referring to the fact that I did not include a lock helper class for 
scoped locking? Why can't you make you own? Its trivial. 

0
Reply Chris 11/13/2007 5:13:16 PM

Chris Thomasson wrote:
> "Pops" <dude@nospamnospamnospam.com> wrote in message 
>>
>> Well, experiments aside,  there are numerious design issues with this 
>> code, but the most glaring one in its vain attempt to protect 
>> resources, it fails to protect its entry points.
> 
> Your trolling attempts are not going to work.

Seems to me, it already did!

>> You need to protect not only whats in the box, but all interface 
>> points to the box.
> 
> Are referring to the fact that I did not include a lock helper class for 
> scoped locking? Why can't you make you own? Its trivial.

Whoa, you posted code here BEGGING for help. It was pointed out to you 
the glaring problem with it, with a PROVEN example of a Application 
unstable operation of your code, and now you don't wish to heed the 
advice to fix it? pushing the buck to others?   Give me a break!

Who's trolling now?

--
HLS
0
Reply Pops 11/13/2007 5:17:02 PM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:%23FBzwkhJIHA.4476@TK2MSFTNGP06.phx.gbl...
> Chris Thomasson wrote:
>> "Pops" <dude@nospamnospamnospam.com> wrote in message
>>>
>>> Well, experiments aside,  there are numerious design issues with this 
>>> code, but the most glaring one in its vain attempt to protect resources, 
>>> it fails to protect its entry points.
>>
>> Your trolling attempts are not going to work.
>
> Seems to me, it already did!
>
>>> You need to protect not only whats in the box, but all interface points 
>>> to the box.
>>
>> Are referring to the fact that I did not include a lock helper class for 
>> scoped locking? Why can't you make you own? Its trivial.
>
> Whoa, you posted code here BEGGING for help. It was pointed out to you the 
> glaring problem with it, with a PROVEN example of a Application unstable 
> operation of your code, and now you don't wish to heed the advice to fix 
> it? pushing the buck to others?   Give me a break!
>
> Who's trolling now?

Show me exactly how my rw-mutex does not work? 

0
Reply Chris 11/13/2007 7:40:22 PM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:%23FBzwkhJIHA.4476@TK2MSFTNGP06.phx.gbl...
> Chris Thomasson wrote:
>> "Pops" <dude@nospamnospamnospam.com> wrote in message
>>>
>>> Well, experiments aside,  there are numerious design issues with this 
>>> code, but the most glaring one in its vain attempt to protect resources, 
>>> it fails to protect its entry points.
>>
>> Your trolling attempts are not going to work.
>
> Seems to me, it already did!
>
>>> You need to protect not only whats in the box, but all interface points 
>>> to the box.
>>
>> Are referring to the fact that I did not include a lock helper class for 
>> scoped locking? Why can't you make you own? Its trivial.
>
> Whoa, you posted code here BEGGING for help. It was pointed out to you the 
> glaring problem with it, with a PROVEN example of a Application unstable 
> operation of your code, and now you don't wish to heed the advice to fix 
> it? pushing the buck to others?   Give me a break!
>
> Who's trolling now?

I believe the example you gave was something like this:


{
  rwmutex_win rw;
  rw.rdlock();
}



Well, of course that is busted. Let me fix it:

{
  rwmutex_win rw;
  rw.rdlock();
  rw.rdunlock();
}

There, now what's wrong with my code? If you can help me perfect my 
rwmutex_win class, I will appreciate.it . So far, from what I can gather, 
you want me to implement a scoped locking helper class for it. Okay, I will 
add one and post it here shortly.

Thanks.

0
Reply Chris 11/13/2007 7:51:43 PM

Chris Thomasson wrote:

>>>> Well, experiments aside,  there are numerious design issues with 
>>>> this code, but the most glaring one in its vain attempt to protect 
 >>>>
>>>> You need to protect not only whats in the box, but all interface 
>>>> points to the box.
>>>
>>> Are referring to the fact that I did not include a lock helper class 
>>> for scoped locking? Why can't you make you own? Its trivial.
>>
>> Whoa, you posted code here BEGGING for help. It was pointed out to you 
>> the glaring problem with it, with a PROVEN example of a Application 
>> unstable operation of your code, and now you don't wish to heed the 
>> advice to fix it? pushing the buck to others?   Give me a break!

> Show me exactly how my rw-mutex does not work?

ah come on! Don't you know anything about software engineering, never 
mind SE, thats a class of of its own, Programming?  Do you know what 
Entry points?  Interface Points? Black Box concepts?  Can't you even 
test YOUR own programs?   Really, its seems like you are one of those 
wannabe programmers, probably uneducated, with no professional degree, 
self taught, which by no means doesn't mean there are no natural great 
ones, but you are not one of them.

I posted the example fault code already in the other thread where you 
already responded to with your silly *fix it yourself* response.

Need it again?  Fine!

void main(char argc, char *argv[])
{
    rwmutex_win rw;
    rw.rdunlock();
    rw.rdunlock();
    getch();
    rw.rdlock();
}

You need to control C/Break out of this code.

That is oppose to a method that would be used by our R/W classes with 
inherent initialization, and locks and unlocks.

void main(char argc, char *argv[])
{
    TReaderWriter mutex;
    ...
    {
      TReaderGrabber grab(mutex);
      ...
    }
}

Which would NOT lend itself to easy programming or sychronization order 
mistakes quite possible with your code.

Seriously, if you wish to exhibit yourself as even a close facsimile of 
a great multi-threaded developer "expert" atleast, please, don't 
embarass me with this baloney of yours.  And stop trying to pit that 
other usenet GROUP against this Microsoft GROUP.  That's nonsense and 
isn't going to impress anyone. The participants could care less, but you 
didn't insult them yet either.  Nonetheless, I could give a hoot about 
what your friends out there think.  It is meaningless to me.

Are you ready to play nice?

==
HLS
0
Reply Pops 11/13/2007 7:58:48 PM

"Chris Thomasson" <cristom@comcast.net> wrote in message 
news:Vo6dnY3WKuhxY6TanZ2dnUVZ_vumnZ2d@comcast.com...
> "Pops" <dude@nospamnospamnospam.com> wrote in message 
> news:%23FBzwkhJIHA.4476@TK2MSFTNGP06.phx.gbl...
>> Chris Thomasson wrote:
>>> "Pops" <dude@nospamnospamnospam.com> wrote in message
>>>>
>>>> Well, experiments aside,  there are numerious design issues with this 
>>>> code, but the most glaring one in its vain attempt to protect 
>>>> resources, it fails to protect its entry points.
>>>
>>> Your trolling attempts are not going to work.
>>
>> Seems to me, it already did!
>>
>>>> You need to protect not only whats in the box, but all interface points 
>>>> to the box.
>>>
>>> Are referring to the fact that I did not include a lock helper class for 
>>> scoped locking? Why can't you make you own? Its trivial.
>>
>> Whoa, you posted code here BEGGING for help. It was pointed out to you 
>> the glaring problem with it, with a PROVEN example of a Application 
>> unstable operation of your code, and now you don't wish to heed the 
>> advice to fix it? pushing the buck to others?   Give me a break!
>>
[...]

Okay. Here is the full example of the code you say proves by rwmutex_win 
class does not work:


>> void main(char argc, char *argv[])
>> {
>>    rwmutex_win rw;
>>    rw.rdunlock();
>>    rw.rdunlock();
>>    {
>>       rwmutex_win rw;
>>       rw.rdlock();
>>    }
>>
>>    getch();
>>
>>    rw.rdlock();
>> }
>>


That is clearly an incorrect use of a mutual exclusion synchronization 
object in general. Why are you releasing a read-lock when you have not 
acquired one? In other words, you create a rwmutex_win object on the stack 
and the first call you make to it is to unlock read-access. Why do you think 
this is okay?


For some reason I think you wrote that code in error on purpose to see if 
you could get some sort of rise out of me? Na... 

0
Reply Chris 11/13/2007 8:00:54 PM

The following code (using Chris's RW library) will lock up this simple 
application requiring an abortive ^C/break:

void main(char argc, char *argv[])
{
    rwmutex_win rw;
    rw.rdunlock();
    rw.rdunlock();
    getch();
    rw.rdlock();
}

Chris Thomasson wrote:

> For some reason I think you wrote that code in error on purpose to see 
> if you could get some sort of rise out of me? Na...

If that doesn't raise the hair in your back, or you do not recognize the 
possible dualities of improper synchronization, then god help anyone 
would use and trust your code!   Just fix it by checking your entry 
points or provide a local scope class wrapper for implmenentators to use.

The bottom line is you are a hypocrit, and it clearly shows your lack of 
software engineering knowledge. You do not have to gain the right to 
criticize any other else's work and insult other's people knowledge.

--
HLS
0
Reply Pops 11/13/2007 8:09:38 PM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:%23SvMNFjJIHA.1208@TK2MSFTNGP05.phx.gbl...
> The following code (using Chris's RW library) will lock up this simple 
> application requiring an abortive ^C/break:
>
> void main(char argc, char *argv[])
^^^^^^^^^^^^^^^^^^^

void main is classic mistake. Anyway:


> {
>    rwmutex_win rw;
>    rw.rdunlock();
>    rw.rdunlock();
>    getch();
>    rw.rdlock();
> }
>

Your code uses unlocks a read lock when it has not already acquired one. Now 
common! Here is the proper way to do it:

int main(char argc, char *argv[]) {
  rwmutex_win rw;
  rw.rdlock();
  rw.rdlock();
  getch();
  rw.rdunlock();
  rw.rdunlock();
  return 0;
}



0
Reply Chris 11/13/2007 8:47:10 PM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:eZrCK$iJIHA.3356@TK2MSFTNGP02.phx.gbl...
> Chris Thomasson wrote:
>
>>>>> Well, experiments aside,  there are numerious design issues with this 
>>>>> code, but the most glaring one in its vain attempt to protect
>>>>> You need to protect not only whats in the box, but all interface 
>>>>> points to the box.
>>>>
>>>> Are referring to the fact that I did not include a lock helper class 
>>>> for scoped locking? Why can't you make you own? Its trivial.
>>>
>>> Whoa, you posted code here BEGGING for help. It was pointed out to you 
>>> the glaring problem with it, with a PROVEN example of a Application 
>>> unstable operation of your code, and now you don't wish to heed the 
>>> advice to fix it? pushing the buck to others?   Give me a break!
>
>> Show me exactly how my rw-mutex does not work?
>
> ah come on! Don't you know anything about software engineering, never mind 
> SE, thats a class of of its own, Programming?  Do you know what Entry 
> points?  Interface Points? Black Box concepts?  Can't you even test YOUR 
> own programs?   Really, its seems like you are one of those wannabe 
> programmers, probably uneducated, with no professional degree, self 
> taught, which by no means doesn't mean there are no natural great ones, 
> but you are not one of them.
>
> I posted the example fault code already in the other thread where you 
> already responded to with your silly *fix it yourself* response.
>
> Need it again?  Fine!
>
> void main(char argc, char *argv[])
> {
>    rwmutex_win rw;
>    rw.rdunlock();
>    rw.rdunlock();
>    getch();
>    rw.rdlock();
> }

Read here:

http://groups.google.com/group/comp.programming.threads/msg/cede2e550058c2ff


0
Reply Chris 11/13/2007 8:59:18 PM

Chris Thomasson wrote:

>> The following code (using Chris's RW library) will lock up this simple 
>> application requiring an abortive ^C/break:
>>
>> void main(char argc, char *argv[])
>> {
>>    rwmutex_win rw;
>>    rw.rdunlock();
>>    rw.rdunlock();
>>    getch();
>>    rw.rdlock();
>> }
> 
> Your code uses unlocks a read lock when it has not already acquired one. 
> Now common! Here is the proper way to do it:
> 
> int main(char argc, char *argv[]) {
>  rwmutex_win rw;
>  rw.rdlock();
>  rw.rdlock();
>  getch();
>  rw.rdunlock();
>  rw.rdunlock();
>  return 0;
> }

You really don't have a clue?  How can you complain about the lack of 
our constructor not checking return values with a VERY LOW OCCURENCE 
rate of railure, essentially a NIL possibility, yet you no qualm with 
your own code with VERY HIGH OCCURRENCE unprotected functional entry points?

Honestly, you don't have a clue?

--
HLS
0
Reply Pops 11/13/2007 9:06:13 PM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:eZrCK$iJIHA.3356@TK2MSFTNGP02.phx.gbl...
> Chris Thomasson wrote:
>
>>>>> Well, experiments aside,  there are numerious design issues with this 
>>>>> code, but the most glaring one in its vain attempt to protect
>>>>> You need to protect not only whats in the box, but all interface 
>>>>> points to the box.
>>>>
>>>> Are referring to the fact that I did not include a lock helper class 
>>>> for scoped locking? Why can't you make you own? Its trivial.
>>>
>>> Whoa, you posted code here BEGGING for help. It was pointed out to you 
>>> the glaring problem with it, with a PROVEN example of a Application 
>>> unstable operation of your code, and now you don't wish to heed the 
>>> advice to fix it? pushing the buck to others?   Give me a break!
>
>> Show me exactly how my rw-mutex does not work?
>
> ah come on! Don't you know anything about software engineering, never mind 
> SE, thats a class of of its own, Programming?  Do you know what Entry 
> points?  Interface Points? Black Box concepts?  Can't you even test YOUR 
> own programs?   Really, its seems like you are one of those wannabe 
> programmers, probably uneducated, with no professional degree, self 
> taught, which by no means doesn't mean there are no natural great ones, 
> but you are not one of them.
>
> I posted the example fault code already in the other thread where you 
> already responded to with your silly *fix it yourself* response.
>
> Need it again?  Fine!
>
> void main(char argc, char *argv[])
> {
>    rwmutex_win rw;
>    rw.rdunlock();
>    rw.rdunlock();
>    getch();
>    rw.rdlock();
> }
>
> You need to control C/Break out of this code.
>
> That is oppose to a method that would be used by our R/W classes with 
> inherent initialization, and locks and unlocks.

The stack-based programmer friendly helper class is only as good as the 
underlying primitives.  Chris hasn't argued with the value of the helper 
class, nor implied that his implementation replaces it.  He is working on 
the meat of the problem, beneath the surface and not even visible to the end 
user.

>
> void main(char argc, char *argv[])
> {
>    TReaderWriter mutex;
>    ...
>    {
>      TReaderGrabber grab(mutex);
>      ...
>    }
> }
>
> Which would NOT lend itself to easy programming or sychronization order 
> mistakes quite possible with your code.
>
> Seriously, if you wish to exhibit yourself as even a close facsimile of a 
> great multi-threaded developer "expert" atleast, please, don't embarass me 
> with this baloney of yours.  And stop trying to pit that other usenet 
> GROUP against this Microsoft GROUP.  That's nonsense and isn't going to 
> impress anyone. The participants could care less, but you didn't insult 
> them yet either.  Nonetheless, I could give a hoot about what your friends 
> out there think.  It is meaningless to me.

I believe that the other group you are referring to 
(comp.programming.threads) is moderated while the Microsoft ones are open to 
anyone who wants to mouth off in public.

>
> Are you ready to play nice?

Chris always plays nice, as nearly as I can tell.

In fact, I think he was the one I had a very similar discussion with some 
time ago, because he told me a way to make my algorithm faster which in fact 
added overhead in my particular case.  The key differences were that our 
exchange ended relatively quickly when I explained I was discussing a case 
with only one reader (and I never claimed it was an r/w-mutex or r/w queue) 
and that neither of us ever resorted to ad-hominem attacks (nevermind 
attacks thoroughly without basis which seem to be the norm in your threads 
here).

And BTW your code is appallingly inefficient for the single reader case, you 
should be using interlocked singly linked-linked lists which give single 
reader, multiple writer, lockless, low overhead operation.  Microsoft even 
has provided an implementation for you.

http://msdn2.microsoft.com/en-us/library/ms686962.aspx


>
> ==
> HLS 


0
Reply Ben 11/13/2007 9:22:54 PM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:ejM40kjJIHA.5360@TK2MSFTNGP03.phx.gbl...
> Chris Thomasson wrote:
>
>>> The following code (using Chris's RW library) will lock up this simple 
>>> application requiring an abortive ^C/break:
>>>
>>> void main(char argc, char *argv[])
>>> {
>>>    rwmutex_win rw;
>>>    rw.rdunlock();
>>>    rw.rdunlock();
>>>    getch();
>>>    rw.rdlock();
>>> }
>>
>> Your code uses unlocks a read lock when it has not already acquired one. 
>> Now common! Here is the proper way to do it:
>>
>> int main(char argc, char *argv[]) {
>>  rwmutex_win rw;
>>  rw.rdlock();
>>  rw.rdlock();
>>  getch();
>>  rw.rdunlock();
>>  rw.rdunlock();
>>  return 0;
>> }
>
> You really don't have a clue?  How can you complain about the lack of our 
> constructor not checking return values with a VERY LOW OCCURENCE rate of 
> railure, essentially a NIL possibility, yet you no qualm with your own 
> code with VERY HIGH OCCURRENCE unprotected functional entry points?
>
> Honestly, you don't have a clue?

You release a lock that has not been previously acquired. Your making a big 
mistake there. Read here:

http://groups.google.com/group/comp.programming.threads/msg/cede2e550058c2ff

If you insist on misusing the class, well, that's your problem not mine. 

0
Reply Chris 11/13/2007 9:26:29 PM

Ben Voigt [C++ MVP] wrote:

> And BTW your code is appallingly inefficient for the single reader case, you 
> should be using interlocked singly linked-linked lists which give single 
> reader, multiple writer, lockless, low overhead operation.  Microsoft even 
> has provided an implementation for you.
> 
> http://msdn2.microsoft.com/en-us/library/ms686962.aspx

Benny, this EXACT point of EFFICIENCY above was *ALREADY* pointed out by 
yours truly on multiple occasions - from the onset.  So there is nothing 
new being stated here.  The posted code with the MFC class was a WORKING 
starting point for exploration to provide the OP a quick FIFO queue and 
I clearly stated the case where multi-readers might be required.

I appreciate the link, thought, a glance tells me it will serve very useful

PS: I am happy Chris was able to help you and not insult you at the same 
time.  I disagree with your other comments though, but there is no need 
to repeat it.

--
HLS
0
Reply Pops 11/13/2007 9:30:15 PM

Chris Thomasson wrote:

> If you insist on misusing the class, well, that's your problem 
 > not mine.

No, I'm "worry about the NEWBIE" who might end up writing mult-threaded 
applications and is not sychronizing his locks and unlock calls which 
your library, clearly, obviously, uncategorically promotes with a VERY 
HIGH occurence - just like the SIMPLE example illustrated.

Let me show it again:

void main(char argc, char *argv[])
{
    rwmutex_win rw;
    rw.rdunlock();
    rw.rdunlock();
    getch();
    rw.rdlock();
}

The above will LOCK UP the application using Chris's library.

--
HLS

0
Reply Pops 11/13/2007 9:34:21 PM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:e7ndQyjJIHA.5928@TK2MSFTNGP05.phx.gbl...
> Ben Voigt [C++ MVP] wrote:
>
>> And BTW your code is appallingly inefficient for the single reader case, 
>> you should be using interlocked singly linked-linked lists which give 
>> single reader, multiple writer, lockless, low overhead operation. 
>> Microsoft even has provided an implementation for you.
>>
>> http://msdn2.microsoft.com/en-us/library/ms686962.aspx
>
> Benny, this EXACT point of EFFICIENCY above was *ALREADY* pointed out by 
> yours truly on multiple occasions - from the onset.  So there is nothing 
> new being stated here.  The posted code with the MFC class was a WORKING 
> starting point for exploration to provide the OP a quick FIFO queue and I 
> clearly stated the case where multi-readers might be required.

Ok, good enough, but advertising it as an R/W-queue was rather disingenious 
if you were fully aware that it supports only one simultaneous reader.

>
> I appreciate the link, thought, a glance tells me it will serve very 
> useful
>
> PS: I am happy Chris was able to help you and not insult you at the same 
> time.  I disagree with your other comments though, but there is no need to 
> repeat it.
>
> --
> HLS 


0
Reply Ben 11/13/2007 9:39:24 PM

Ben Voigt [C++ MVP] wrote:
> 
> The stack-based programmer friendly helper class is only as good as the 
> underlying primitives.  Chris hasn't argued with the value of the helper 
> class, nor implied that his implementation replaces it.  He is working on 
> the meat of the problem, beneath the surface and not even visible to the end 
> user.

Sorry, I think it is worth noting:

But it is indeed visible to the end user, clearly visible.  Any software 
engineer worth his salt will clearly tell you that not protecting your 
entry points is clearly a fundamental aspect of getting to the "meat of 
the problem" which is akin and all part of "Black Box" A.K.A functional 
programming even with its C/C++ substrates.

--
HLS
0
Reply Pops 11/13/2007 9:40:51 PM

> If you insist on misusing the class, well, that's your problem not mine.

Not entirely.  It would be an improvement to check for such mistakes in an 
appropriate debug version protected by #if DEBUG 


0
Reply Ben 11/13/2007 9:41:08 PM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:uULSj0jJIHA.4592@TK2MSFTNGP02.phx.gbl...
> Chris Thomasson wrote:
>
>> If you insist on misusing the class, well, that's your problem not mine.
>
> No, I'm "worry about the NEWBIE" who might end up writing mult-threaded 
> applications and is not sychronizing his locks and unlock calls which your 
> library, clearly, obviously, uncategorically promotes with a VERY HIGH 
> occurence - just like the SIMPLE example illustrated.
>
> Let me show it again:
>
> void main(char argc, char *argv[])
> {
>    rwmutex_win rw;
>    rw.rdunlock();
>    rw.rdunlock();
>    getch();
>    rw.rdlock();
> }
>
> The above will LOCK UP the application using Chris's library.

Shall Chris also prevent the user from writing:

int main(void)
{
    while (1);
    {
        printf("Press Q to quit.\n");
        int c = getch();
        if (c == 'Q' || c == 'q') return 0;
    }
}

which also hangs, requiring CTRL+C to break? 


0
Reply Ben 11/13/2007 9:44:38 PM

Ben Voigt [C++ MVP] wrote:

> Ok, good enough, but advertising it as an R/W-queue was rather disingenious 
> if you were fully aware that it supports only one simultaneous reader.

Only in the example posted, and it was CLEARLY stated in the original 
response to the OP.

Just to be clear,  it was well pointed out that in multiple reader 
environment, a simple change to a TCriticalSectionGrabber() will do the 
job, and clearly stated that a MORE efficient FIFO other than MFC based 
one was more appropriate.

Anything else?

--
HLS
0
Reply Pops 11/13/2007 9:45:06 PM

Ben Voigt [C++ MVP] wrote:
> "Pops" <dude@nospamnospamnospam.com> wrote in message 
> news:eZrCK$iJIHA.3356@TK2MSFTNGP02.phx.gbl...
>> Chris Thomasson wrote:
>>
>> You need to control C/Break out of this code.
>>
>> That is oppose to a method that would be used by our R/W classes with 
>> inherent initialization, and locks and unlocks.
> 
> The stack-based programmer friendly helper class is only as good as the 
> underlying primitives.  Chris hasn't argued with the value of the helper 
> class, nor implied that his implementation replaces it.  He is working on 
> the meat of the problem, beneath the surface and not even visible to the end 
> user.

Ben, I nearly missed the introduction of some actual discussion into 
this battle. Shame on you. ;-)

This is an excellent point, but I'll add more. A READ lock guard object 
construct can be a great boon -- it adds convenience with no real risk.

A WRITE lock (or normal mutex) guard object introduces substantial risk 
of its own because it'll unlock implicitly when the destructor fires no 
matter what the cause or where in the code the unwind occurred. An 
unlock with broken shared data predicates is a serious bug, and the 
guard object can make the bug really hard to find. Obviously this risk 
can be controlled -- but it's not trivial for beginners.

Whereas "straight C style" code like Chris' example is more likely to 
exit still holding the lock. Also bad, clearly; but generally much 
easier to diagnose. Any decent lock analysis package will be able to 
tell you where the lock was taken after it leads to deadlock, and the 
state of the data will be available for analysis. No analysis package is 
going to be able to tell you where your guard implicitly released a lock 
when it shouldn't have.

Guard object are "cool", and quite often appropriate and convenient. But 
they can also be extremely dangerous if not completely understood. And, 
in particular, it's cheesy for someone to be criticizing code merely 
because it chooses not to use that style of programming.

> I believe that the other group you are referring to 
> (comp.programming.threads) is moderated while the Microsoft ones are open to 
> anyone who wants to mouth off in public.

No, sadly, (sometimes), comp.programming.threads is NOT moderated.
0
Reply Dave 11/13/2007 10:05:41 PM

Ben Voigt [C++ MVP] wrote:

>> The above will LOCK UP the application using Chris's library.
> 
> Shall Chris also prevent the user from writing:
> 
> int main(void)
> {
>     while (1);
>     {
>         printf("Press Q to quit.\n");
>         int c = getch();
>         if (c == 'Q' || c == 'q') return 0;
>     }
> }
> 
> which also hangs, requiring CTRL+C to break? 

Do you see anything there related to Chris's library?

Oh come on, not you too!  What happen to this Microsoft GROUP!!  Do they 
let just anyone become a C++ MVP!?   Or as Chris's friend, you both in 
the same school level?

If you missed the entire point of about his "MEAT BOX" and protecting 
his entry points, which is FAR worst a situation than not checking for 
return values in our constructor, then this has become a BIG joke!

Geesz!

I'm sorry to be rude to you, but why are you wasting my time with that 
baloney?

--
HLS
0
Reply Pops 11/13/2007 10:05:56 PM

"Ben Voigt [C++ MVP]" <rbv@nospam.nospam> wrote in message 
news:uY57H3jJIHA.3940@TK2MSFTNGP05.phx.gbl...
>
>> If you insist on misusing the class, well, that's your problem not mine.
>
> Not entirely.  It would be an improvement to check for such mistakes in an 
> appropriate debug version protected by #if DEBUG

Okay. I will post the error checking version later on today. Thanks for 
that. 

0
Reply Chris 11/13/2007 10:16:20 PM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:OyutMGkJIHA.5328@TK2MSFTNGP05.phx.gbl...
> Ben Voigt [C++ MVP] wrote:
>
>>> The above will LOCK UP the application using Chris's library.
>>
>> Shall Chris also prevent the user from writing:
>>
>> int main(void)
>> {
>>     while (1);
>>     {
>>         printf("Press Q to quit.\n");
>>         int c = getch();
>>         if (c == 'Q' || c == 'q') return 0;
>>     }
>> }
>>
>> which also hangs, requiring CTRL+C to break?
>
> Do you see anything there related to Chris's library?
>
> Oh come on, not you too!  What happen to this Microsoft GROUP!!  Do they 
> let just anyone become a C++ MVP!?   Or as Chris's friend, you both in the 
> same school level?

I would advise against insult the work that Ben, and many others, put into 
obtaining their MVP titles. I say, good for them, and keep up the good work.




> If you missed the entire point of about his "MEAT BOX" and protecting his 
> entry points, which is FAR worst a situation than not checking for return 
> values in our constructor, then this has become a BIG joke!
>
> Geesz!
>
> I'm sorry to be rude to you, but why are you wasting my time with that 
> baloney?

Your trolling attempts are not going to work any more.



I am going to add a locking scoping utility class and some further "sanity" 
checking in my rwmutex_win class and post it here later on today. I hope 
that you could give it another chance after that? Anyway, I was really 
focused on the "rw-mutex algorithm itself" I created. I wanted to know if 
anybody can see something I cannot. You mentioned that it allows programmers 
to make explicit mistakes. Well, I will add some

#if defined(DEBUG)

sanity checks.


You informed me that it would be a good idea to include locking scoping 
objects. Okay, I will add those as well.


Thanks for your help Hector (aka, Pops). 

0
Reply Chris 11/13/2007 10:25:44 PM

Chris Thomasson wrote:
> "Ben Voigt [C++ MVP]" <rbv@nospam.nospam> wrote in message 
> news:uY57H3jJIHA.3940@TK2MSFTNGP05.phx.gbl...
>>
>>> If you insist on misusing the class, well, that's your problem not mine.
>>
>> Not entirely.  It would be an improvement to check for such mistakes 
>> in an appropriate debug version protected by #if DEBUG
> 
> Okay. I will post the error checking version later on today. Thanks for 
> that.

Chris,

Since the implementation your simple library requires the exposure of 
user entry points for usage, these need to be protected in release code. 
  There is a high potential for synchronization errors that will not get 
caught.

Please trust me on that.

And finally, here let me throw another bone out to you:

We will add the error checking for our constructor, we don't believe it 
will offer any value, but will add a logger/tracer to see if anything 
has silently occurred unbeknowst to us in the last 12 years of working 
operations.

What we can't do is just throw exceptions without a new design review 
and prolong QA testing period to see how any new throws may alter any 
EH/SEH handling currently in place.

-- 
Hector Santos, CTO
Santronics Software, Inc
http://www.santronics.com
0
Reply Pops 11/13/2007 10:25:48 PM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:%23V62SRkJIHA.2480@TK2MSFTNGP05.phx.gbl...
> Chris Thomasson wrote:
>> "Ben Voigt [C++ MVP]" <rbv@nospam.nospam> wrote in message 
>> news:uY57H3jJIHA.3940@TK2MSFTNGP05.phx.gbl...
>>>
>>>> If you insist on misusing the class, well, that's your problem not 
>>>> mine.
>>>
>>> Not entirely.  It would be an improvement to check for such mistakes in 
>>> an appropriate debug version protected by #if DEBUG
>>
>> Okay. I will post the error checking version later on today. Thanks for 
>> that.
>
> Chris,
>
> Since the implementation your simple library requires the exposure of user 
> entry points for usage, these need to be protected in release code. There 
> is a high potential for synchronization errors that will not get caught.
>
> Please trust me on that.

Okay. I am making the proper modifications to my code now. I will post soon.




> And finally, here let me throw another bone out to you:
>
> We will add the error checking for our constructor, we don't believe it 
> will offer any value, but will add a logger/tracer to see if anything has 
> silently occurred unbeknowst to us in the last 12 years of working 
> operations.

Perfect.




> What we can't do is just throw exceptions without a new design review and 
> prolong QA testing period to see how any new throws may alter any EH/SEH 
> handling currently in place.

Okay. I see. However, I think the add error checking via logging will be 
worthwhile.


Thanks. 

0
Reply Chris 11/13/2007 10:47:10 PM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:OyutMGkJIHA.5328@TK2MSFTNGP05.phx.gbl...
> Ben Voigt [C++ MVP] wrote:
>
>>> The above will LOCK UP the application using Chris's library.
>>
>> Shall Chris also prevent the user from writing:
>>
>> int main(void)
>> {
>>     while (1);
>>     {
>>         printf("Press Q to quit.\n");
>>         int c = getch();
>>         if (c == 'Q' || c == 'q') return 0;
>>     }
>> }
>>
>> which also hangs, requiring CTRL+C to break?
>
> Do you see anything there related to Chris's library?
>
> Oh come on, not you too!  What happen to this Microsoft GROUP!!  Do they 
> let just anyone become a C++ MVP!?   Or as Chris's friend, you both in the 
> same school level?
>
> If you missed the entire point of about his "MEAT BOX" and protecting his 
> entry points, which is FAR worst a situation than not checking for return 
> values in our constructor, then this has become a BIG joke!

My point was simply that bad code will do bad things.  We would hardly say 
that it is up to the printf or getch writer to prevent (ab)use in the 
fashion of the example I gave.  Why then would we expect Chris to prevent 
clear abuse of his functions?  And even if he prevented abuse in the manner 
of your example, would it actually do any good?  I can still create a 
deadlock by using your guard objects improperly (pseudocode):

rwlock a, b;

thread1()
{
    writerguard w1(a);
    writerguard w2(b);
    ...
}

thread2()
{
    writerguard w1(b);
    writerguard w2(a);
    ...
}

Is this a bug in your version, because you do not protect against this 
abuse? 


0
Reply Ben 11/13/2007 10:47:51 PM

Ben Voigt [C++ MVP] wrote:
> "Pops" <dude@nospamnospamnospam.com> wrote in message 
> 
> My point was simply that bad code will do bad things.  

Ok, the obvious. GIGO! Garbage In, Garbage out.

> Why then would we expect Chris to prevent 
> clear abuse of his functions?  

That was a clear attempt to break it, and that alone should be enough to 
show the entry points need protection.

But we are talking the real world, and more appropriate the complex 
workd of multi-thread and synchronization designs.

It doesn't have to be an abuse. It can be "honest" coding mistake, I 
mean the sky is the limit of the coding possibilities that can easily 
create a situation whether unlock is prematurely called. Maybe the gut 
there if conditiion function exit that skipped a lock, and the exit 
called the unlock, maybe he was using try finally blocks and didn't 
quite understood the flow yet.

The fact that it is allowed to occured, opens the door that it can, and 
more than likely will happen

> And even if he prevented abuse in the manner 
> of your example, would it actually do any good?  

Sure, better coding, less supports issues, confidence in design, there 
are all kinds of benefits which ultimately spells into less cost, less 
dollars across the board.

> I can still create a 
> deadlock by using your guard objects improperly (pseudocode):
 >
 > ...
 >
> Is this a bug in your version, because you do not protect against this 
> abuse? 

Sure, there is clear cut points where proper implementation is expected. 
  No doubt.

No one is immune to it.  My point is when you have explicit entry points 
  for usage where more entry points are required to release it, it is 
far better to remove the potential issues as much as you can, then to 
just let to to the whims of a programmer.

In our case, which means nothing, it is unlikely to happen since we 
don't use globals like you showed and most our resources to be protected 
all modularilize, a class per resource, etc. But sure, in principle, it 
can happen.

So no, I don't expect Chris or anyone else to solve the world problems, 
just the black box they own and expose as much as they *practically* 
can.  There is always a point where the

       "Press Any Key / Where is the Any Key?"

user syndrome comes into play.

Is that obvious? I think so. Or maybe not. :-)

(Oh, time for dinner! See ya later)

--
HLS
0
Reply Pops 11/13/2007 11:29:54 PM

Chris Thomasson wrote:

>>> Okay. I will post the error checking version later on today. Thanks 
>>> for that.

My opinion, there is no right/wrong, just how I was trained. Also keep 
in mind that I have not done this level of coding/thinking in many 
years. Our treasure chest is full of reusable code designed over many 
years that the finer details are well forgotten. Plus age is catching up 
fast. :-)  All I am saying is talk all this with an open mind, there is 
nothing "locked" in stone. I just wish to avoid useless philosophical 
debates. Ok?

1) Another Lockup situation:

    void BadTest2()
    {
       rwmutex_win rw();
       rw.wrlock();
       getch();
       rw.wrlock();
    }

One might say this is a programmer mistake, YES, but it is possible to 
occur through obscure and abstract ways in the programmers world, 
especially as I explain in #4 below "Same Thread Writer Access".

Also, this might be related to another problem with the "depth" which I 
will show more below in #3.  So keep this in mind.

2) xxunlock() / xxlock() entry points.

For explicit usage (users calling directly), this is a perfect situation 
were you might wish to use a throw because it is a fundamental user 
coding error to resolved asap. It is critical to perfecting the 
synchronization and serialization of resources.

So I will interested to see how you do this because there is a issue 
with the depth (#3) that might be used in checking the entry point. It 
depends on how you do this.

Anyway, when you add the local scope class helper (and if you like 
borrow the similar login in critsect.h), this will tremendously clean up 
user implementation. For example, when I used your library for the 
multi-reader/single reader pump example, the Get() looks like this:

     BOOL Get(TWCFWatchEvent &o)
         {
           Mutex.wrlock();
           if (IsEmpty()) {
              Mutex.wrunlock();
              return FALSE;
           }
           o = RemoveTail();
           Mutex.wrunlock();
           return TRUE;
         }

There is nothing wrong with it, but one might be tempted to want to use 
try finally, and now SEH might be required. So getting the local scope 
class helper is going to be a great addition to your rw code, and 
greatly appreciated by end-users for very simple and common 
serialization of resources.

3) Semaphore counts (depth).

I found another mite with the depth usage.

Again, in the spirit of minimizing software engineering coding issues, 
an allowance for the end user to enter a depth value in the contructor 
is available and because of that allowance, it can open the door to 
incorrect usage.

Per MSDN, the minimum value for depth is 1.

If a value of zero, it is caught with a run time exception:

          rwmutex_win Mutex(0);  // <-- exception thrown

This is good so far,

But a value of 1 is not correctly considered in the if conditions and 
this will cause a permanent lockup as well.

Design consideration questions:

1) Why is depth needed?  Can it be removed?

2) Can it stay at LONG_MAX?  Think Pareto Principle. If 80-90% is the 
common usage and most appropriate usage, then you might consider fixing 
it at LONG MAX.

3) Will it hurt keeping it at LONG MAX or some other value?

Again, these are considerations to solidify the code and reduce any 
software engineering issues.  In either case, needed, desired or not,
its internal usage needs to be used corrected.

4) Same thread WRITER access.

Here I admit there is philosophy debate depending on one's school of 
thought.  I definitely open to open-minded discussions here.

But overall, I don't believe the same thread WRITER access should be 
blocked.   I believe this is whats causing the lock in #1 above.

In our critsect.h, this is possible to have global controller:

class  TFoobar
{
public:
   void memberX()
   {
      TWriterGrabber grab(Mutex);
      memberY();  //Call some other class memberY
   }

   void memberY()
   {
      TWriterGrabber grab(Mutex);
   }

private:
   TReaderWriter Mutex;
} foobar;

Again, think black box, the functional modularity may be use 
autonomously or with other functions.  The applicability and reasons is 
plentiful.  This writer mutex should not block the same thread.

Unless I am missing something Chris, I am not see thing with the 
rwmutex_win library.

5) Support Functional Programming.

I would consider using a BOOL return on the xxxunlock/lock functions to 
help avoid having to use exception issues:

       rwmutex_win rw();
       if (!rw.wrlock()) {
          MYLOGERROR(GetLastError(),"Critical Design Error");
       }

You want to call and set your own SetLastError(RWMUTEX_ERROR_BASE+xxxx) 
for any FALSE condition here.

Summary:

Overall, I think correcting the logic and usage the depth variables is 
key to remove most if not all of its issue I see.

- Check for entry points misuse,
- Checking for a depth value of 1 and using it correctly is needed.
- Consider deprecating the rddepth optional parameter unless required.
- Consider same thread WRITE access allowance,
- Add a Local Scope Class Header,
- Consider functional programming support.

--
HLS

0
Reply Pops 11/14/2007 3:43:24 AM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:eO4pxCnJIHA.4272@TK2MSFTNGP05.phx.gbl...
> Chris Thomasson wrote:
>
>>>> Okay. I will post the error checking version later on today. Thanks for 
>>>> that.
>
> My opinion, there is no right/wrong, just how I was trained. Also keep in 
> mind that I have not done this level of coding/thinking in many years. Our 
> treasure chest is full of reusable code designed over many years that the 
> finer details are well forgotten. Plus age is catching up fast. :-)  All I 
> am saying is talk all this with an open mind, there is nothing "locked" in 
> stone. I just wish to avoid useless philosophical debates. Ok?

All right with me. BTW, thanks for your comments on my code. I appreciate 
them.




> 1) Another Lockup situation:
>
>    void BadTest2()
>    {
>       rwmutex_win rw();
>       rw.wrlock();
>       getch();
>       rw.wrlock();
>    }
>
> One might say this is a programmer mistake, YES, but it is possible to 
> occur through obscure and abstract ways in the programmers world, 
> especially as I explain in #4 below "Same Thread Writer Access".

Yes. This is an example of Murphy's Law. If shit can happen, it will happen 
right in the middle of the place where you don't seem to think it will...

;^)




> Also, this might be related to another problem with the "depth" which I 
> will show more below in #3.  So keep this in mind.
>
> 2) xxunlock() / xxlock() entry points.
>
> For explicit usage (users calling directly), this is a perfect situation 
> were you might wish to use a throw because it is a fundamental user coding 
> error to resolved asap. It is critical to perfecting the synchronization 
> and serialization of resources.
>
> So I will interested to see how you do this because there is a issue with 
> the depth (#3) that might be used in checking the entry point. It depends 
> on how you do this.
>
> Anyway, when you add the local scope class helper (and if you like borrow 
> the similar login in critsect.h), this will tremendously clean up user 
> implementation. For example, when I used your library for the 
> multi-reader/single reader pump example, the Get() looks like this:
>
>     BOOL Get(TWCFWatchEvent &o)
>         {
>           Mutex.wrlock();
>           if (IsEmpty()) {
>              Mutex.wrunlock();
>              return FALSE;
>           }
>           o = RemoveTail();
>           Mutex.wrunlock();
>           return TRUE;
>         }
>
> There is nothing wrong with it, but one might be tempted to want to use 
> try finally, and now SEH might be required. So getting the local scope 
> class helper is going to be a great addition to your rw code, and greatly 
> appreciated by end-users for very simple and common serialization of 
> resources.
>
> 3) Semaphore counts (depth).
>
> I found another mite with the depth usage.


I have removed the m_rddepth variable. Check here:


http://appcore.home.comcast.net/misc/rwmutex_win_hpp.html

You are 100% correct in that it added unneeded complexity. I subsituted 
m_rddepth with LONG_MAX. I think that processors with more than LONG_MAX 
cores are far off in the future...

:^)



> Again, in the spirit of minimizing software engineering coding issues, an 
> allowance for the end user to enter a depth value in the contructor is 
> available and because of that allowance, it can open the door to incorrect 
> usage.
>
> Per MSDN, the minimum value for depth is 1.
>
> If a value of zero, it is caught with a run time exception:
>
>          rwmutex_win Mutex(0);  // <-- exception thrown
>
> This is good so far,


>
> But a value of 1 is not correctly considered in the if conditions and this 
> will cause a permanent lockup as well.
>
> Design consideration questions:
>
> 1) Why is depth needed?  Can it be removed?

Yes it can, and its has been removed. Thanks for pointing that out.




> 2) Can it stay at LONG_MAX?  Think Pareto Principle. If 80-90% is the 
> common usage and most appropriate usage, then you might consider fixing it 
> at LONG MAX.

Yes. I fixed it at LONG_MAX. Perfect.



>
> 3) Will it hurt keeping it at LONG MAX or some other value?
>
> Again, these are considerations to solidify the code and reduce any 
> software engineering issues.  In either case, needed, desired or not,
> its internal usage needs to be used corrected.

I have corrected the issue.



>
> 4) Same thread WRITER access.

Okay. You want me to make my wrlock function to be recursive. You correct in 
that recursive locking is not needed to create correct code. However, I will 
go ahead and try to recurse the writer as it adds to the overall 
functionality of the synchronization object overall; I mean POSIX extension 
supports recursive attribute for mutex, so, why should I not provide 
recursive write locking? Thanks.. That should be a trivial task for me to 
do.



>
> Here I admit there is philosophy debate depending on one's school of 
> thought.  I definitely open to open-minded discussions here.

Indeed.

[...]
> Unless I am missing something Chris, I am not see thing with the 
> rwmutex_win library.

You missed nothing at all. The rw-mutex implementation I posted as-is does 
NOT support recursive writer locking in any way, shape or form. I think I 
will add it as to further support enhanced functionality of my rw-mutex 
class.




> 5) Support Functional Programming.
>
> I would consider using a BOOL return on the xxxunlock/lock functions to 
> help avoid having to use exception issues:
>
>       rwmutex_win rw();
>       if (!rw.wrlock()) {
>          MYLOGERROR(GetLastError(),"Critical Design Error");
>       }

I don't think I need bool return... Humm... Well, I am interested in what 
others on this point? I think I would only need to return bool if there was 
the qualifier "try" in the name of the function called. Otherwise, I would 
expect blocking behavior to ensue.




> You want to call and set your own SetLastError(RWMUTEX_ERROR_BASE+xxxx) 
> for any FALSE condition here.
>
> Summary:
>
> Overall, I think correcting the logic and usage the depth variables is key 
> to remove most if not all of its issue I see.
>
> - Check for entry points misuse,
> - Checking for a depth value of 1 and using it correctly is needed.
> - Consider deprecating the rddepth optional parameter unless required.
> - Consider same thread WRITE access allowance,
> - Add a Local Scope Class Header,
> - Consider functional programming support.

Thank so much for that. I really appreciate  the time you took to looking 
over my code an try to help it out.


I am not going to be able to post the code today. Please check back sometime 
tomorrow though. I will take your comment to heart and apply them in my 
refactored code.


Thanks Hector.


Sincerely,

Chris Thomasson

http://appcore.home.comcast.net




:^) 

0
Reply Chris 11/14/2007 4:52:10 AM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:%23WNXj6jJIHA.1184@TK2MSFTNGP04.phx.gbl...
> Ben Voigt [C++ MVP] wrote:
>
>> Ok, good enough, but advertising it as an R/W-queue was rather 
>> disingenious if you were fully aware that it supports only one 
>> simultaneous reader.
>
> Only in the example posted, and it was CLEARLY stated in the original 
> response to the OP.
>
> Just to be clear,  it was well pointed out that in multiple reader 
> environment, a simple change to a TCriticalSectionGrabber() will do the 
> job, and clearly stated that a MORE efficient FIFO other than MFC based 
> one was more appropriate.
>
> Anything else?

Humm... Well, perhaps you could go ahead and think about renaming the class 
to something along the lines of:

SingleReaderMultiWriterQueue


Just to document the actual usage requ9irements and/or functionality in the 
name of the object itself, or something to get the point across more 
directly?

Would that be helpful at all? 

0
Reply Chris 11/14/2007 5:13:52 AM

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:eO4pxCnJIHA.4272@TK2MSFTNGP05.phx.gbl...
> Chris Thomasson wrote:
>
[...]
> 5) Support Functional Programming.
>
> I would consider using a BOOL return on the xxxunlock/lock functions to 
> help avoid having to use exception issues:
>
>       rwmutex_win rw();
>       if (!rw.wrlock()) {
>          MYLOGERROR(GetLastError(),"Critical Design Error");
>       }
>
> You want to call and set your own SetLastError(RWMUTEX_ERROR_BASE+xxxx) 
> for any FALSE condition here.

[...]

I will add a trywrlock function. I will also consider and probably use 
SetLastError like errno. Fine. 

0
Reply Chris 11/14/2007 5:21:55 AM

Chris Thomasson wrote:

>> Just to be clear,  it was well pointed out that in multiple reader 
>> environment, a simple change to a TCriticalSectionGrabber() will do 
>> the job, and clearly stated that a MORE efficient FIFO other than MFC 
>> based one was more appropriate.
>>
>> Anything else?
> 
> Humm... Well, perhaps you could go ahead and think about renaming the 
> class to something along the lines of:
> 
> SingleReaderMultiWriterQueue

Maybe Chris, but I think more appropriately is my propensity to presume 
the obvious will be understood.  It is a bad habit, but I have to admit 
this kernel32 group has changed. There is a new generation here, the 
veterans are not longer here or more mellow like I have become. But the 
vets all know me here. I was a regular during the 90s and early 2000s, 
but now I only pop up once in a blue moon to get an answer and then move 
on.  While I'm here, something I will answer some question too. I love 
and enjoy helping people if I can, like I did with the OP.  But if you 
didn't begin to disrupt the thread, rest assured, I was near ready to 
move on. :-)

Anyway, many times, I will draft a message and decide that it could be 
cut down, or I just won't worry about the finer details because I 
presume in this highly technical forum full of professionals and long 
time veterans, it will be understood.

That of course, can often get you in trouble when a "young turk" who 
doesn't know you begin challenging you. :-) I think you went overboard, 
but lets put that to rest.

Anyway, the point is that it is WELL UNDERSTOOD the MFC Clist for a FIFO 
was not ideal, it is not thread ready and requires serializations.  MFC 
does offer CNoTrackObject and CProcessLocal/CThreadLocal template 
classes to make it thread ready at the process and thread level. Off 
hand, if I recall, I believe it uses TLS. Let me see if I can find an 
quick example usage.... ok, here is one:

Example usage:

struct CDllGlobalData : public CNoTrackObject
{
    ... your data ...
    DWORD dwCount;
    CString sModulename
};

CThreadLocal<CDllGlobalData> dllGlobalData;

int GetGlobalDebugLevel()
{
     return dllGlobalData->dwCount;
}
CString GetGlobalModuleName()
{
     return dllGlobalData->sModuleName;
}

(please don't jump on the example, thats all it is. :-))

But I didn't want to waste time to point out this. Efficiency and thread 
counts came into my mind.

I also knew a more efficient,low overhead double linked link queue would 
be better and we have code for this too but for other things. I wasn' 
about to post that. But I did mention this point in one of the post 
which you skipped over.  :-)

And believe it or not the code I posted was first designed for 
multi-readers, multi-writers, but there were several reasons why I cut 
it down to a multi-writer/single reader example:

1) Cut the size of the posted code, and more importantly,

2) Give the poster the MINIMUM BASELINE for him to
    analyze the loading requirement.

I knew that would be very important in this "pump" design, so I cut it 
down, simplified it to a multi-writer/single reader design.

Honest, this is a key consideration and I know how important it is 
because it is fundamental in our communications product flow control 
synchronization needs. That is why my original statement to the OP 100% 
implied this concept and point:

      "If your testing shows the queue is growing too fast,
       then you add additional data threads."

That was 100% stated with the implications if he added more reader 
threads, then the code as it was posted would have to be changed to 
serialized the multiple readers, as you noted with a writer lock.

I guess my mistake is that I either pulled that clarification from my 
draft message, or I didn't bother to write it or since I just finished 
exploring the same idea with RDCW() missed events and use a similar MFC 
FIFO queue for a proof of concept, I used this more recognizable idea.

So I never disagreed with that point. But for some reason you wanted to 
ignore everything that was said and preferred to militantly focus on it 
being a problem and I was a terrible person because of it. :)

Anyway, its a dead issue.

Peace?

We really do need to stop this at this point bu if you feel to urge to 
reply, go ahead with the final say.

--
HLS
0
Reply Pops 11/14/2007 6:19:38 AM

On Nov 13, 2:47 pm, "Ben Voigt [C++ MVP]" <r...@nospam.nospam> wrote:
> rwlock a, b;
>
> thread1()
> {
>     writerguard w1(a);
>     writerguard w2(b);
>     ...
>
> }
>
> thread2()
> {
>     writerguard w1(b);
>     writerguard w2(a);
>     ...
>
> }
>
> Is this a bug in your version, because you do not protect against this
> abuse?- Hide quoted text -

I'll try to make this discussion a bit more interesting and maybe
restore some peace ;-)

Read write lock is actually tricky to use because the following is
also a bug but much more slippery (hence deadlier):

thread1()
{
    readguard rlock1(a);
    readguard rlock2(a);
}

Thought read lock allow multiple readers it's actually not recursive
(i.e. Illusive recursive).
[hint - another thread try to acquire exclusive lock in the middle].

In our read RW lock (which is similar to one of Chris lock free read
lock) we have several defect detections in debug mode:
1) Recursive read lock detection for up to N concurrent accesses
2) Recursive write lock detection
3) Recursive write lock detection (includes write-read combinations)
   Write lock promotion not allowed (I think that it complicate the
code).
4) 2.5 minutes deadlock detection for other cases (using "hibernate
safe" timer).

Rani

0
Reply rani_sharoni 11/14/2007 5:37:29 PM

Unfortunately, a read/write lock doesn't have capability to track all 
readers. It can validate writers, though, (and those who pretend to own the 
write lock). Since an only writer can be inside a critical section (and 
recursion should not be allowed), that implementation can check against 
incorrect write lock use. But NOT against read lock abuse.

"Ben Voigt [C++ MVP]" <rbv@nospam.nospam> wrote in message 
news:uY57H3jJIHA.3940@TK2MSFTNGP05.phx.gbl...
>
>> If you insist on misusing the class, well, that's your problem not mine.
>
> Not entirely.  It would be an improvement to check for such mistakes in an 
> appropriate debug version protected by #if DEBUG
> 


0
Reply Alexander 11/15/2007 3:43:50 AM

Here is the latest version. It has some addedd assertions for "sanity" 
checking, and I added some scoped locking classes as promised. However, I 
did not make the writer lock recursive just yet. I am thinking about how 
much that is really needed. Well, here it is:


http://appcore.home.comcast.net/misc/rwmutex_win_hpp.html
________________________________________________________________
#if ! defined(RWMUTEX_WIN_HPP)
  #define RWMUTEX_WIN_HPP()0x0001UL
  #undef WIN32_LEAN_AND_MEAN
  #undef _WIN32_WINNT
  #define _WIN32_WINNT 0x0400
  #include <windows.h>
  #include <exception>
  #include <cassert>
  #include <climits>
  #if ! defined(RWMUTEX_WIN_UNEXPECTED)
    #if defined(_MSC_VER)
      #define RWMUTEX_WIN_UNEXPECTED() \
        assert(false), unexpected()
    #else
      #define RWMUTEX_WIN_UNEXPECTED() \
        assert(false), std::unexpected()
    #endif
  #endif
  #if ! defined(RWMUTEX_WIN_CTOR_UNEXPECTED)
    #define RWMUTEX_WIN_CTOR_UNEXPECTED() \
      assert(false); throw std::exception()
  #endif
  #if ! defined(RWMUTEX_WIN_DTOR_UNEXPECTED)
    #define RWMUTEX_WIN_DTOR_UNEXPECTED()assert(false)
  #endif
/*************************************************************/




/* Simple Scoped Lock Helpers
_________________________________________________________*/
namespace lockguard {
  template<typename T>
  class read {
    T* m_state;

  public:
    read(T& state) throw()
      : m_state(&state) {
      m_state->rdlock();
    }

    ~read() throw() {
      if (m_state) {
        m_state->rdunlock();
      }
    }

  public:
    void cancel() throw() {
      m_state = 0;
    }
  };


  template<typename T>
  class write {
    T* m_state;

  public:
    write(T& state) throw()
      : m_state(&state) {
      m_state->wrlock();
    }

    ~write() throw() {
      if (m_state) {
        m_state->wrunlock();
      }
    }

  public:
    void cancel() throw() {
      m_state = 0;
    }
  };
}




/* Wait-Free Fast-Pathed Read/Write Mutex
_________________________________________________________*/
class rwmutex_win {
  rwmutex_win(rwmutex_win const&);
  rwmutex_win const& operator =(rwmutex_win const&);


public:
  typedef lockguard::read<rwmutex_win> lock_read;
  typedef lockguard::write<rwmutex_win> lock_write;


private:
  LONG m_count;
  LONG m_rdwake;
  CRITICAL_SECTION m_wrlock;
  HANDLE m_wrwset;
  HANDLE m_rdwset;


public:
  rwmutex_win()
    : m_count(LONG_MAX),
      m_rdwake(0),
      m_wrwset(CreateEvent(0, FALSE, FALSE, 0)),
      m_rdwset(CreateSemaphore(0, 0, LONG_MAX, 0)) {
    if (! m_rdwset || ! m_wrwset) {
      if (m_rdwset) { CloseHandle(m_rdwset); }
      if (m_wrwset) { CloseHandle(m_wrwset); }
      RWMUTEX_WIN_CTOR_UNEXPECTED();
    }
    InitializeCriticalSection(&m_wrlock);
  }

  ~rwmutex_win() throw() {
    if (m_count != LONG_MAX || m_rdwake) {
      RWMUTEX_WIN_DTOR_UNEXPECTED();
    }
    DeleteCriticalSection(&m_wrlock);
    if (! CloseHandle(m_rdwset) ||
        ! CloseHandle(m_wrwset)) {
      RWMUTEX_WIN_DTOR_UNEXPECTED();
    }
  }


public:
  void wrlock() throw() {
    EnterCriticalSection(&m_wrlock);
    assert(m_count > -1);
    LONG count =
      InterlockedExchangeAdd(&m_count, -LONG_MAX);
    if (count < LONG_MAX) {
      if (InterlockedExchangeAdd(&m_rdwake,
            LONG_MAX - count) + LONG_MAX - count) {
        if (WaitForSingleObject(m_wrwset,
              INFINITE) != WAIT_OBJECT_0) {
          RWMUTEX_WIN_UNEXPECTED();
        }
      }
    }
  }

  void wrunlock() throw() {
    assert(m_count < 1);
    LONG count =
      InterlockedExchangeAdd(&m_count, LONG_MAX);
    if (count < 0) {
      if (! ReleaseSemaphore(m_rdwset, -count, 0)) {
        RWMUTEX_WIN_UNEXPECTED();
      }
    }
    LeaveCriticalSection(&m_wrlock);
  }


public:
  void rdlock() throw() {
    assert(m_count <= LONG_MAX);
    LONG count = InterlockedDecrement(&m_count);
    if (count < 0) {
      if (WaitForSingleObject(m_rdwset,
            INFINITE) != WAIT_OBJECT_0) {
        assert(false); RWMUTEX_WIN_UNEXPECTED();
      }
    }
  }

  void rdunlock() throw() {
    assert(m_count < LONG_MAX);
    LONG count = InterlockedIncrement(&m_count);
    if (count < 1) {
      if (! InterlockedDecrement(&m_rdwake)) {
        if (! SetEvent(m_wrwset)) {
          RWMUTEX_WIN_UNEXPECTED();
        }
      }
    }
  }


public:
  bool tryrdtowr() throw() {
    assert(m_count < LONG_MAX);
    if (TryEnterCriticalSection(&m_wrlock)) {
      assert(m_count > -1);
      LONG count =
        InterlockedExchangeAdd(&m_count, (-LONG_MAX) + 1) + 1;
      if (count < LONG_MAX) {
        if (InterlockedExchangeAdd(&m_rdwake,
              LONG_MAX - count) + LONG_MAX - count) {
          if (WaitForSingleObject(m_wrwset,
                INFINITE) != WAIT_OBJECT_0) {
            RWMUTEX_WIN_UNEXPECTED();
          }
        }
      }
      return true;
    }
    return false;
  }
};




/*************************************************************/
#endif
________________________________________________________________



Enjoy!


BTW, have any of you been tinkering around with this particular design?


Thanks. 

0
Reply Chris 11/15/2007 5:18:30 PM

This is latest version. It includes writer recursion support. It also 
augments the lockguard::read class to support read-to-write upgrades:


http://appcore.home.comcast.net/misc/rwmutex_win_hpp.html
________________________________________________________________

#if ! defined(RWMUTEX_WIN_HPP)
  #define RWMUTEX_WIN_HPP()0x0001UL
  #undef WIN32_LEAN_AND_MEAN
  #undef _WIN32_WINNT
  #define _WIN32_WINNT 0x0400
  #include <windows.h>
  #include <exception>
  #include <cassert>
  #include <climits>
  #if ! defined(RWMUTEX_WIN_UNEXPECTED)
    #if defined(_MSC_VER)
      #define RWMUTEX_WIN_UNEXPECTED() \
        assert(false), unexpected()
    #else
      #define RWMUTEX_WIN_UNEXPECTED() \
        assert(false), std::unexpected()
    #endif
  #endif
  #if ! defined(RWMUTEX_WIN_CTOR_UNEXPECTED)
    #define RWMUTEX_WIN_CTOR_UNEXPECTED() \
      assert(false); throw std::exception()
  #endif
  #if ! defined(RWMUTEX_WIN_DTOR_UNEXPECTED)
    #define RWMUTEX_WIN_DTOR_UNEXPECTED()assert(false)
  #endif
/*************************************************************/




/* Simple Scoped Lock Helpers
_________________________________________________________*/
namespace lockguard {
  template<typename T>
  class read {
    T* m_state;
    bool m_upgraded;
  public:
    read(T& state) throw()
      : m_state(&state),
        m_upgraded(false) {
      m_state->rdlock();
    }
    ~read() throw() {
      if (m_state) {
        if (! m_upgraded) {
          m_state->rdunlock();
        } else {
          m_state->wrunlock();
        }
      }
    }
  public:
    void cancel() throw() { m_state = 0; }

    bool tryrdtowr() throw() {
      return (m_state && ! m_upgraded) ?
        (m_upgraded = m_state->tryrdtowr()) : false;
    }
  };


  template<typename T>
  class write {
    T* m_state;
  public:
    write(T& state) throw() : m_state(&state) {
      m_state->wrlock();
    }
    ~write() throw() {
      if (m_state) { m_state->wrunlock(); }
    }
  public:
    void cancel() throw() { m_state = 0; }
  };
}




/* Wait-Free Fast-Pathed Read/Write Mutex
_________________________________________________________*/
class rwmutex_win {
  rwmutex_win(rwmutex_win const&);
  rwmutex_win const& operator =(rwmutex_win const&);


public:
  typedef lockguard::read<rwmutex_win> lock_read;
  typedef lockguard::write<rwmutex_win> lock_write;


private:
  LONG m_count;
  LONG m_rdwake;
  LONG m_wrrecurse;
  CRITICAL_SECTION m_wrlock;
  HANDLE m_wrwset;
  HANDLE m_rdwset;


public:
  rwmutex_win()
    : m_count(LONG_MAX),
      m_rdwake(0),
      m_wrrecurse(0),
      m_wrwset(CreateEvent(0, FALSE, FALSE, 0)),
      m_rdwset(CreateSemaphore(0, 0, LONG_MAX, 0)) {
    if (! m_rdwset || ! m_wrwset) {
      if (m_rdwset) { CloseHandle(m_rdwset); }
      if (m_wrwset) { CloseHandle(m_wrwset); }
      RWMUTEX_WIN_CTOR_UNEXPECTED();
    }
    InitializeCriticalSection(&m_wrlock);
  }

  ~rwmutex_win() throw() {
    if (m_count != LONG_MAX ||
        m_rdwake || m_wrrecurse) {
      RWMUTEX_WIN_DTOR_UNEXPECTED();
    }
    DeleteCriticalSection(&m_wrlock);
    if (! CloseHandle(m_rdwset) ||
        ! CloseHandle(m_wrwset)) {
      RWMUTEX_WIN_DTOR_UNEXPECTED();
    }
  }


public:
  void wrlock() throw() {
    EnterCriticalSection(&m_wrlock);
    if (! (m_wrrecurse++)) {
      assert(m_count > -1);
      LONG count = InterlockedExchangeAdd(&m_count, -LONG_MAX);
      if (count < LONG_MAX) {
        if (InterlockedExchangeAdd(&m_rdwake,
              LONG_MAX - count) + LONG_MAX - count) {
          if (WaitForSingleObject(m_wrwset,
                INFINITE) != WAIT_OBJECT_0) {
            RWMUTEX_WIN_UNEXPECTED();
          }
        }
      }
    }
  }

  void wrunlock() throw() {
    assert(m_count < 1);
    if (! (--m_wrrecurse)) {
      LONG count = InterlockedExchangeAdd(&m_count, LONG_MAX);
      if (count < 0) {
        if (! ReleaseSemaphore(m_rdwset, -count, 0)) {
          RWMUTEX_WIN_UNEXPECTED();
        }
      }
    }
    LeaveCriticalSection(&m_wrlock);
  }


public:
  void rdlock() throw() {
    assert(m_count <= LONG_MAX);
    LONG count = InterlockedDecrement(&m_count);
    if (count < 0) {
      if (WaitForSingleObject(m_rdwset,
            INFINITE) != WAIT_OBJECT_0) {
        assert(false); RWMUTEX_WIN_UNEXPECTED();
      }
    }
  }

  void rdunlock() throw() {
    assert(m_count < LONG_MAX);
    LONG count = InterlockedIncrement(&m_count);
    if (count < 1) {
      if (! InterlockedDecrement(&m_rdwake)) {
        if (! SetEvent(m_wrwset)) {
          RWMUTEX_WIN_UNEXPECTED();
        }
      }
    }
  }


public:
  bool tryrdtowr() throw() {
    assert(m_count < LONG_MAX);
    if (TryEnterCriticalSection(&m_wrlock)) {
      assert(! m_wrrecurse);
      ++m_wrrecurse;
      if (m_wrrecurse == 1) {
        assert(m_count > -1);
        LONG count =
          InterlockedExchangeAdd(&m_count, (-LONG_MAX) + 1) + 1;
        if (count < LONG_MAX) {
          if (InterlockedExchangeAdd(&m_rdwake,
                LONG_MAX - count) + LONG_MAX - count) {
            if (WaitForSingleObject(m_wrwset,
                  INFINITE) != WAIT_OBJECT_0) {
              RWMUTEX_WIN_UNEXPECTED();
            }
          }
        }
      }
      return true;
    }
    return false;
  }
};




/*************************************************************/
#endif

________________________________________________________________


Enjoy!   ;^)


BTW, any comments?


0
Reply Chris 11/16/2007 3:29:48 AM

Chris,

I have your previous copy, and the one I used for my context switch rate 
profile testing (did you see that message in the ISLL thread?)

I just wanted to ask you a few things.

I believe I did see an slight improvement with I changed

     InitializeCriticalSection(&m_wrlock);

to

     InitializeCriticalSectionAndSpinCount(&m_wrlock,4);


Also, in my evaluation and comments regarding deprecating the 
depth=LONG_MAX,  you stated something that implied there was a 
relationship with the # of cpus.

If so, then why not use get the CPU count and use that?  I did this in 
my copy of your previous posted version:

   int rwMutextCpuCount()
   {
      int iTotalCpus = LONG_MAX;
#ifdef WIN32
       SYSTEM_INFO systemInfo;
       GetSystemInfo(&systemInfo);
       iTotalCpus    = systemInfo.dwNumberOfProcessors;
#endif
      return iTotalCpus;
   }

and in your constructor:

   rwmutex_win()
     : m_cpus(rwMutextCpuCount()),
       m_count(m_cpus),
       m_rdwake(0),
       m_wrwset(CreateEvent(0, FALSE, FALSE, 0)),
       m_rdwset(CreateSemaphore(0, 0, m_cpus, 0)) {
     if (! m_rdwset || ! m_wrwset) {
       if (m_rdwset) { CloseHandle(m_rdwset); }
       if (m_wrwset) { CloseHandle(m_wrwset); }
       throw;
     }
     InitializeCriticalSection(&m_wrlock);
   }

I don't recall seeing any real different on my machine/testing.

But for the profile testing I did and posted, I used the LONG_MAX for 
the comparison testing.

--
HLS

Chris Thomasson wrote:
> Here is the latest version. It has some addedd assertions for "sanity" 
> checking, and I added some scoped locking classes as promised. However, 
> I did not make the writer lock recursive just yet. I am thinking about 
> how much that is really needed. Well, here it is:
> 
> 
> http://appcore.home.comcast.net/misc/rwmutex_win_hpp.html
> ________________________________________________________________
> #if ! defined(RWMUTEX_WIN_HPP)
>  #define RWMUTEX_WIN_HPP()0x0001UL
>  #undef WIN32_LEAN_AND_MEAN
>  #undef _WIN32_WINNT
>  #define _WIN32_WINNT 0x0400
>  #include <windows.h>
>  #include <exception>
>  #include <cassert>
>  #include <climits>
>  #if ! defined(RWMUTEX_WIN_UNEXPECTED)
>    #if defined(_MSC_VER)
>      #define RWMUTEX_WIN_UNEXPECTED() \
>        assert(false), unexpected()
>    #else
>      #define RWMUTEX_WIN_UNEXPECTED() \
>        assert(false), std::unexpected()
>    #endif
>  #endif
>  #if ! defined(RWMUTEX_WIN_CTOR_UNEXPECTED)
>    #define RWMUTEX_WIN_CTOR_UNEXPECTED() \
>      assert(false); throw std::exception()
>  #endif
>  #if ! defined(RWMUTEX_WIN_DTOR_UNEXPECTED)
>    #define RWMUTEX_WIN_DTOR_UNEXPECTED()assert(false)
>  #endif
> /*************************************************************/
> 
> 
> 
> 
> /* Simple Scoped Lock Helpers
> _________________________________________________________*/
> namespace lockguard {
>  template<typename T>
>  class read {
>    T* m_state;
> 
>  public:
>    read(T& state) throw()
>      : m_state(&state) {
>      m_state->rdlock();
>    }
> 
>    ~read() throw() {
>      if (m_state) {
>        m_state->rdunlock();
>      }
>    }
> 
>  public:
>    void cancel() throw() {
>      m_state = 0;
>    }
>  };
> 
> 
>  template<typename T>
>  class write {
>    T* m_state;
> 
>  public:
>    write(T& state) throw()
>      : m_state(&state) {
>      m_state->wrlock();
>    }
> 
>    ~write() throw() {
>      if (m_state) {
>        m_state->wrunlock();
>      }
>    }
> 
>  public:
>    void cancel() throw() {
>      m_state = 0;
>    }
>  };
> }
> 
> 
> 
> 
> /* Wait-Free Fast-Pathed Read/Write Mutex
> _________________________________________________________*/
> class rwmutex_win {
>  rwmutex_win(rwmutex_win const&);
>  rwmutex_win const& operator =(rwmutex_win const&);
> 
> 
> public:
>  typedef lockguard::read<rwmutex_win> lock_read;
>  typedef lockguard::write<rwmutex_win> lock_write;
> 
> 
> private:
>  LONG m_count;
>  LONG m_rdwake;
>  CRITICAL_SECTION m_wrlock;
>  HANDLE m_wrwset;
>  HANDLE m_rdwset;
> 
> 
> public:
>  rwmutex_win()
>    : m_count(LONG_MAX),
>      m_rdwake(0),
>      m_wrwset(CreateEvent(0, FALSE, FALSE, 0)),
>      m_rdwset(CreateSemaphore(0, 0, LONG_MAX, 0)) {
>    if (! m_rdwset || ! m_wrwset) {
>      if (m_rdwset) { CloseHandle(m_rdwset); }
>      if (m_wrwset) { CloseHandle(m_wrwset); }
>      RWMUTEX_WIN_CTOR_UNEXPECTED();
>    }
>    InitializeCriticalSection(&m_wrlock);
>  }
> 
>  ~rwmutex_win() throw() {
>    if (m_count != LONG_MAX || m_rdwake) {
>      RWMUTEX_WIN_DTOR_UNEXPECTED();
>    }
>    DeleteCriticalSection(&m_wrlock);
>    if (! CloseHandle(m_rdwset) ||
>        ! CloseHandle(m_wrwset)) {
>      RWMUTEX_WIN_DTOR_UNEXPECTED();
>    }
>  }
> 
> 
> public:
>  void wrlock() throw() {
>    EnterCriticalSection(&m_wrlock);
>    assert(m_count > -1);
>    LONG count =
>      InterlockedExchangeAdd(&m_count, -LONG_MAX);
>    if (count < LONG_MAX) {
>      if (InterlockedExchangeAdd(&m_rdwake,
>            LONG_MAX - count) + LONG_MAX - count) {
>        if (WaitForSingleObject(m_wrwset,
>              INFINITE) != WAIT_OBJECT_0) {
>          RWMUTEX_WIN_UNEXPECTED();
>        }
>      }
>    }
>  }
> 
>  void wrunlock() throw() {
>    assert(m_count < 1);
>    LONG count =
>      InterlockedExchangeAdd(&m_count, LONG_MAX);
>    if (count < 0) {
>      if (! ReleaseSemaphore(m_rdwset, -count, 0)) {
>        RWMUTEX_WIN_UNEXPECTED();
>      }
>    }
>    LeaveCriticalSection(&m_wrlock);
>  }
> 
> 
> public:
>  void rdlock() throw() {
>    assert(m_count <= LONG_MAX);
>    LONG count = InterlockedDecrement(&m_count);
>    if (count < 0) {
>      if (WaitForSingleObject(m_rdwset,
>            INFINITE) != WAIT_OBJECT_0) {
>        assert(false); RWMUTEX_WIN_UNEXPECTED();
>      }
>    }
>  }
> 
>  void rdunlock() throw() {
>    assert(m_count < LONG_MAX);
>    LONG count = InterlockedIncrement(&m_count);
>    if (count < 1) {
>      if (! InterlockedDecrement(&m_rdwake)) {
>        if (! SetEvent(m_wrwset)) {
>          RWMUTEX_WIN_UNEXPECTED();
>        }
>      }
>    }
>  }
> 
> 
> public:
>  bool tryrdtowr() throw() {
>    assert(m_count < LONG_MAX);
>    if (TryEnterCriticalSection(&m_wrlock)) {
>      assert(m_count > -1);
>      LONG count =
>        InterlockedExchangeAdd(&m_count, (-LONG_MAX) + 1) + 1;
>      if (count < LONG_MAX) {
>        if (InterlockedExchangeAdd(&m_rdwake,
>              LONG_MAX - count) + LONG_MAX - count) {
>          if (WaitForSingleObject(m_wrwset,
>                INFINITE) != WAIT_OBJECT_0) {
>            RWMUTEX_WIN_UNEXPECTED();
>          }
>        }
>      }
>      return true;
>    }
>    return false;
>  }
> };
> 
> 
> 
> 
> /*************************************************************/
> #endif
> ________________________________________________________________
> 
> 
> 
> Enjoy!
> 
> 
> BTW, have any of you been tinkering around with this particular design?
> 
> 
> Thanks.
0
Reply Pops 11/16/2007 10:17:38 PM

I think that in most uses write lock will be held for quite a while.

Using a CS with a spin count only makes sense if the CS is usually held for 
very little time, to justify wait in a tight loop.

"Pops" <dude@nospamnospamnospam.com> wrote in message 
news:%23csdw6JKIHA.3400@TK2MSFTNGP03.phx.gbl...
> Chris,
>
> I have your previous copy, and the one I used for my context switch rate 
> profile testing (did you see that message in the ISLL thread?)
>
> I just wanted to ask you a few things.
>
> I believe I did see an slight improvement with I changed
>
>     InitializeCriticalSection(&m_wrlock);
>
> to
>
>     InitializeCriticalSectionAndSpinCount(&m_wrlock,4);
>
>
> Also, in my evaluation and comments regarding deprecating the 
> depth=LONG_MAX,  you stated something that implied there was a 
> relationship with the # of cpus.
>
> If so, then why not use get the CPU count and use that?  I did this in my 
> copy of your previous posted version:
>
>   int rwMutextCpuCount()
>   {
>      int iTotalCpus = LONG_MAX;
> #ifdef WIN32
>       SYSTEM_INFO systemInfo;
>       GetSystemInfo(&systemInfo);
>       iTotalCpus    = systemInfo.dwNumberOfProcessors;
> #endif
>      return iTotalCpus;
>   }
>
> and in your constructor:
>
>   rwmutex_win()
>     : m_cpus(rwMutextCpuCount()),
>       m_count(m_cpus),
>       m_rdwake(0),
>       m_wrwset(CreateEvent(0, FALSE, FALSE, 0)),
>       m_rdwset(CreateSemaphore(0, 0, m_cpus, 0)) {
>     if (! m_rdwset || ! m_wrwset) {
>       if (m_rdwset) { CloseHandle(m_rdwset); }
>       if (m_wrwset) { CloseHandle(m_wrwset); }
>       throw;
>     }
>     InitializeCriticalSection(&m_wrlock);
>   }
>
> I don't recall seeing any real different on my machine/testing.
>
> But for the profile testing I did and posted, I used the LONG_MAX for the 
> comparison testing.
>
> --
> HLS
>
> Chris Thomasson wrote:
>> Here is the latest version. It has some addedd assertions for "sanity" 
>> checking, and I added some scoped locking classes as promised. However, I 
>> did not make the writer lock recursive just yet. I am thinking about how 
>> much that is really needed. Well, here it is:
>>
>>
>> http://appcore.home.comcast.net/misc/rwmutex_win_hpp.html
>> ________________________________________________________________
>> #if ! defined(RWMUTEX_WIN_HPP)
>>  #define RWMUTEX_WIN_HPP()0x0001UL
>>  #undef WIN32_LEAN_AND_MEAN
>>  #undef _WIN32_WINNT
>>  #define _WIN32_WINNT 0x0400
>>  #include <windows.h>
>>  #include <exception>
>>  #include <cassert>
>>  #include <climits>
>>  #if ! defined(RWMUTEX_WIN_UNEXPECTED)
>>    #if defined(_MSC_VER)
>>      #define RWMUTEX_WIN_UNEXPECTED() \
>>        assert(false), unexpected()
>>    #else
>>      #define RWMUTEX_WIN_UNEXPECTED() \
>>        assert(false), std::unexpected()
>>    #endif
>>  #endif
>>  #if ! defined(RWMUTEX_WIN_CTOR_UNEXPECTED)
>>    #define RWMUTEX_WIN_CTOR_UNEXPECTED() \
>>      assert(false); throw std::exception()
>>  #endif
>>  #if ! defined(RWMUTEX_WIN_DTOR_UNEXPECTED)
>>    #define RWMUTEX_WIN_DTOR_UNEXPECTED()assert(false)
>>  #endif
>> /*************************************************************/
>>
>>
>>
>>
>> /* Simple Scoped Lock Helpers
>> _________________________________________________________*/
>> namespace lockguard {
>>  template<typename T>
>>  class read {
>>    T* m_state;
>>
>>  public:
>>    read(T& state) throw()
>>      : m_state(&state) {
>>      m_state->rdlock();
>>    }
>>
>>    ~read() throw() {
>>      if (m_state) {
>>        m_state->rdunlock();
>>      }
>>    }
>>
>>  public:
>>    void cancel() throw() {
>>      m_state = 0;
>>    }
>>  };
>>
>>
>>  template<typename T>
>>  class write {
>>    T* m_state;
>>
>>  public:
>>    write(T& state) throw()
>>      : m_state(&state) {
>>      m_state->wrlock();
>>    }
>>
>>    ~write() throw() {
>>      if (m_state) {
>>        m_state->wrunlock();
>>      }
>>    }
>>
>>  public:
>>    void cancel() throw() {
>>      m_state = 0;
>>    }
>>  };
>> }
>>
>>
>>
>>
>> /* Wait-Free Fast-Pathed Read/Write Mutex
>> _________________________________________________________*/
>> class rwmutex_win {
>>  rwmutex_win(rwmutex_win const&);
>>  rwmutex_win const& operator =(rwmutex_win const&);
>>
>>
>> public:
>>  typedef lockguard::read<rwmutex_win> lock_read;
>>  typedef lockguard::write<rwmutex_win> lock_write;
>>
>>
>> private:
>>  LONG m_count;
>>  LONG m_rdwake;
>>  CRITICAL_SECTION m_wrlock;
>>  HANDLE m_wrwset;
>>  HANDLE m_rdwset;
>>
>>
>> public:
>>  rwmutex_win()
>>    : m_count(LONG_MAX),
>>      m_rdwake(0),
>>      m_wrwset(CreateEvent(0, FALSE, FALSE, 0)),
>>      m_rdwset(CreateSemaphore(0, 0, LONG_MAX, 0)) {
>>    if (! m_rdwset || ! m_wrwset) {
>>      if (m_rdwset) { CloseHandle(m_rdwset); }
>>      if (m_wrwset) { CloseHandle(m_wrwset); }
>>      RWMUTEX_WIN_CTOR_UNEXPECTED();
>>    }
>>    InitializeCriticalSection(&m_wrlock);
>>  }
>>
>>  ~rwmutex_win() throw() {
>>    if (m_count != LONG_MAX || m_rdwake) {
>>      RWMUTEX_WIN_DTOR_UNEXPECTED();
>>    }
>>    DeleteCriticalSection(&m_wrlock);
>>    if (! CloseHandle(m_rdwset) ||
>>        ! CloseHandle(m_wrwset)) {
>>      RWMUTEX_WIN_DTOR_UNEXPECTED();
>>    }
>>  }
>>
>>
>> public:
>>  void wrlock() throw() {
>>    EnterCriticalSection(&m_wrlock);
>>    assert(m_count > -1);
>>    LONG count =
>>      InterlockedExchangeAdd(&m_count, -LONG_MAX);
>>    if (count < LONG_MAX) {
>>      if (InterlockedExchangeAdd(&m_rdwake,
>>            LONG_MAX - count) + LONG_MAX - count) {
>>        if (WaitForSingleObject(m_wrwset,
>>              INFINITE) != WAIT_OBJECT_0) {
>>          RWMUTEX_WIN_UNEXPECTED();
>>        }
>>      }
>>    }
>>  }
>>
>>  void wrunlock() throw() {
>>    assert(m_count < 1);
>>    LONG count =
>>      InterlockedExchangeAdd(&m_count, LONG_MAX);
>>    if (count < 0) {
>>      if (! ReleaseSemaphore(m_rdwset, -count, 0)) {
>>        RWMUTEX_WIN_UNEXPECTED();
>>      }
>>    }
>>    LeaveCriticalSection(&m_wrlock);
>>  }
>>
>>
>> public:
>>  void rdlock() throw() {
>>    assert(m_count <= LONG_MAX);
>>    LONG count = InterlockedDecrement(&m_count);
>>    if (count < 0) {
>>      if (WaitForSingleObject(m_rdwset,
>>            INFINITE) != WAIT_OBJECT_0) {
>>        assert(false); RWMUTEX_WIN_UNEXPECTED();
>>      }
>>    }
>>  }
>>
>>  void rdunlock() throw() {
>>    assert(m_count < LONG_MAX);
>>    LONG count = InterlockedIncrement(&m_count);
>>    if (count < 1) {
>>      if (! InterlockedDecrement(&m_rdwake)) {
>>        if (! SetEvent(m_wrwset)) {
>>          RWMUTEX_WIN_UNEXPECTED();
>>        }
>>      }
>>    }
>>  }
>>
>>
>> public:
>>  bool tryrdtowr() throw() {
>>    assert(m_count < LONG_MAX);
>>    if (TryEnterCriticalSection(&m_wrlock)) {
>>      assert(m_count > -1);
>>      LONG count =
>>        InterlockedExchangeAdd(&m_count, (-LONG_MAX) + 1) + 1;
>>      if (count < LONG_MAX) {
>>        if (InterlockedExchangeAdd(&m_rdwake,
>>              LONG_MAX - count) + LONG_MAX - count) {
>>          if (WaitForSingleObject(m_wrwset,
>>                INFINITE) != WAIT_OBJECT_0) {
>>            RWMUTEX_WIN_UNEXPECTED();
>>          }
>>        }
>>      }
>>      return true;
>>    }
>>    return false;
>>  }
>> };
>>
>>
>>
>>
>> /*************************************************************/
>> #endif
>> ________________________________________________________________
>>
>>
>>
>> Enjoy!
>>
>>
>> BTW, have any of you been tinkering around with this particular design?
>>
>>
>> Thanks. 


0
Reply Alexander 11/17/2007 3:15:12 AM

"Alexander Grigoriev" <alegr@earthlink.net> wrote in message 
news:OO9kSgMKIHA.4196@TK2MSFTNGP04.phx.gbl...
>I think that in most uses write lock will be held for quite a while.
>
> Using a CS with a spin count only makes sense if the CS is usually held 
> for very little time, to justify wait in a tight loop.

Agreed.



>
> "Pops" <dude@nospamnospamnospam.com> wrote in message 
> news:%23csdw6JKIHA.3400@TK2MSFTNGP03.phx.gbl...
>> Chris,
>>
>> I have your previous copy, and the one I used for my context switch rate 
>> profile testing (did you see that message in the ISLL thread?)
>>
>> I just wanted to ask you a few things.
>>
>> I believe I did see an slight improvement with I changed
>>
>>     InitializeCriticalSection(&m_wrlock);
>>
>> to
>>
>>     InitializeCriticalSectionAndSpinCount(&m_wrlock,4);
>>
>>
>> Also, in my evaluation and comments regarding deprecating the 
>> depth=LONG_MAX,  you stated something that implied there was a 
>> relationship with the # of cpus.
>>
>> If so, then why not use get the CPU count and use that?  I did this in my 
>> copy of your previous posted version:

I use LONG_MAX because I want the algorithm to be able to support a large 
amount of readers. I cannot set it to the number of cpus simply because a 
user can decide to create 10 threads in a 2 CPU system. The depth of the 
counter is directly proportional to the total number of reader threads, not 
cpus, that can access the rw-mutex at any one time. 

0
Reply Chris 11/18/2007 2:24:57 AM

[comp.os.ms-windows.programmer.win32 was added...]

"Chris Thomasson" <cristom@comcast.net> wrote in message 
news:i5idndsOPO1ifLranZ2dnUVZ_rignZ2d@comcast.com...
> This simplistic reader/writer algorithm is wait-free in the sense that 
> there are no loops; no CAS or LL/SC:
>
>
> http://appcore.home.comcast.net/misc/rwmutex_win_hpp.html
>
>
> So far, the counting algorithm looks okay. I guess at a certain level its 
> uses some of the same basic principals that one of Joe Seighs wait-free 
> semaphore algorithms used. Everything takes place in single atomic 
> statements which do not depend on any prior state holding a consistent 
> value within a specific timeframe, unlike CAS or LL/SC. There is no need 
> for any sort of "compare logic" in order to commit a mutation to the 
> shared counter. You don't have to load; modify; store only if prior load 
> is consistent... Therefore, simple increments, decrements and 
> fetch-and-add is all that is really required. There is no timeout logic as 
> of yet. However, I believe the same basic scheme that Joe used can be make 
> to work within the context of my algorithm...
>
> Anyway, it should compile fine. Please tinker around with it, and see if 
> you can come up with some real nasty race-conditions!
[...]

I have updated the code some more. I removed some local variables, and I 
made some of the shared ones mutable in order to decorate the read-access 
functions as const. I tweaked the code for the writer locking. Basically, I 
moved the functionality out into two private functions (e.g., 
rwmutex::sys_wracquire/release).

Also, I created a stupid simple version thing based on namespaces; 
simplistic analog of the following technique:

http://groups.google.com/group/comp.lang.c++.moderated/msg/bb45c17cdd3c04a3



Here is the latest version:


http://appcore.home.comcast.net/misc/winsync_v0001_rwmutex_hpp.html
__________________________________________________________________
/*************************************************************/
/*************************************************************
==============================================================


  Experimental, Fast-Pathed Thread Synchronization for Windows
  ___________________________________________________________
                     //              \\
                     \\  Created By  //
            _________//______________\\__________
           //                                   \\
           *|                                   |*
           *|        Chris M. Thomasson         |*
           *|                                   |*
           *|       cristom@comcast.net         |*
           *|  news://comp.programming.threads  |*
           *|  http://appcore.home.comcast.net  |*
           *|                                   |*
           \\___________________________________//



==============================================================
/*************************************************************/
/*************************************************************/



// validate that we are using a C++ compiler
#if ! defined(__cplusplus)
  #error RWMUTEX_WIN_BUILD - C++ compiler is required!!!


// check include guard
#elif ! defined(RWMUTEX_WIN_V0001_HPP)

  // include guard file version 0001
  #define RWMUTEX_WIN_V0001_HPP()0x1UL

  // setup some windows build flags
  #undef WIN32_LEAN_AND_MEAN
  #undef _WIN32_WINNT
  #define WIN32_LEAN_AND_MEAN
  #define _WIN32_WINNT 0x0400 // for TryEnterCriticalSection

  // include required support
  #include <windows.h> // for a handfull windows API calls
  #include <exception> // for std::exception, std::unexpected
  #include <cassert>   // for RWMUTEX_WIN_ASSERT
  #include <climits>   // for LONG_MAX

  // the assertion macro
  #if ! defined(RWMUTEX_WIN_ASSERT)
    #define RWMUTEX_WIN_ASSERT(mp_this, mp_exp) \
      assert((mp_exp))
  #endif

  // the constructor failure macro
  #if ! defined(RWMUTEX_WIN_CTOR_UNEXPECTED)
    #define RWMUTEX_WIN_CTOR_UNEXPECTED(mp_this) \
      RWMUTEX_WIN_ASSERT(mp_this, false); \
      throw std::exception()
  #endif

  // the macros called when the shi% hits the fan!  ;^0
  #if ! defined(RWMUTEX_WIN_UNEXPECTED)
    #if defined(_MSC_VER)
      #define RWMUTEX_WIN_UNEXPECTED(mp_this) \
        RWMUTEX_WIN_ASSERT(mp_this, false), unexpected()
    #else
      #define RWMUTEX_WIN_UNEXPECTED(mp_this) \
        RWMUTEX_WIN_ASSERT(mp_this, false), std::unexpected()
    #endif
  #endif
  #if ! defined(RWMUTEX_WIN_DTOR_UNEXPECTED)
    #define RWMUTEX_WIN_DTOR_UNEXPECTED(mp_this) \
      RWMUTEX_WIN_ASSERT(mp_this, false)
  #endif
  #if ! defined(RWMUTEX_WIN_DTOR_SANITY_UNEXPECTED)
    #define RWMUTEX_WIN_DTOR_SANITY_UNEXPECTED(mp_this) \
      RWMUTEX_WIN_ASSERT(mp_this, false)
  #endif

  // the version default macro
  #if ! defined(RWMUTEX_WIN_VER_DEFAULT)
    #define RWMUTEX_WIN_VER_DEFAULT()v0001
  #endif

  // the version selection macro
  #if ! defined(RWMUTEX_WIN_VER_SELECT)
    #define RWMUTEX_WIN_VER_SELECT RWMUTEX_WIN_VER_DEFAULT
  #endif
/*************************************************************/
/*************************************************************/





/* Windows Syncronization System Library Version 0001
_________________________________________________________*/


/* Public Forward Declarations
___________________________________________________*/
namespace winsyncsys {
namespace v0001 {
  class rwmutex;

  namespace raii {
  namespace lock {
    template<typename T> class read;
    template<typename T> class write;
  }} // namespace raii::lock

}} // namespace winsyncsys::v0001




/* Public Definitions/Declarations
___________________________________________________*/
namespace winsyncsys {
namespace v0001 {




/* RAII Scoped Helper Objects
___________________________________________________*/
namespace raii {
namespace lock {

  // scoped read-access helper
  template<typename T>
  class read {
    T* m_state;
    mutable bool m_upgraded;

  public:
    inline read(T& state) throw()
      : m_state(&state),
        m_upgraded(false) {
      m_state->rdlock();
    }

    inline ~read() throw() {
      if (m_state) {
        if (! m_upgraded) {
          m_state->rdunlock();
        } else {
          m_state->wrunlock();
        }
      }
    }

  public:
    inline void cancel() throw() {
      m_state = 0;
    }

    inline bool is_active() const throw() {
      return (m_state);
    }

    inline bool is_upgraded() const throw() {
      RWMUTEX_WIN_ASSERT(this, (m_upgraded) ? m_state : false);
      return m_upgraded;
    }

    inline bool tryrdtowr() const throw() {
      RWMUTEX_WIN_ASSERT(this, ! m_upgraded && m_state);
      return (m_state && ! m_upgraded) ?
        (m_upgraded = m_state->tryrdtowr()) : false;
    }
  }; // class read


  // scoped write-access helper
  template<typename T>
  class write {
    T* m_state;

  public:
    inline write(T& state, bool trylock = false) throw()
      : m_state(&state) {
      if (! trylock) {
        m_state->wrlock();
      } else {
        if (! m_state->trywrlock()) {
          m_state = 0;
        }
      }
    }

    inline ~write() throw() {
      if (m_state) { m_state->wrunlock(); }
    }

  public:
    inline void cancel() throw() {
      m_state = 0;
    }

    inline bool is_active() const throw() {
      return (m_state);
    }
  }; // class write

}} // namespace raii::lock




/* Wait-Free Fast-Pathed Read/Write Mutex Object
___________________________________________________*/
class rwmutex {

  // no copy/assign
  rwmutex(rwmutex const&);
  rwmutex const& operator =(rwmutex const&);



// Public Scoped-Locking Utility (RAII)
public:

  // obtain scoped read-access
  typedef raii::lock::read<rwmutex> read;

  // obtain scoped write-access
  typedef raii::lock::write<rwmutex> write;



// Private State
private:

  // read/write-access (atomic count: fast/slow-path)
  mutable LONG m_count;

  // read-epoch (atomic differental-count: fast/slow-path)
  mutable LONG m_rdepoch;

  // write recursion (simple count: fast/slow-path)
  LONG m_wrrecurse;

  // writer-access overall lock (mutex: fast/slow-path)
  mutable CRITICAL_SECTION m_wrlock;

  // waiting writer (waitset: bin-sema slow-path)
  HANDLE m_wrwset;

  // waiting reader(s) (waitset: sema slow-path)
  HANDLE m_rdwset;



// Public Constructor/Destructor
public:

  // initialize the object for LONG_MAX total readers...
  rwmutex()
    : m_count(LONG_MAX),
      m_rdepoch(0),
      m_wrrecurse(0),
      m_wrwset(CreateEvent(0, FALSE, FALSE, 0)),
      m_rdwset(CreateSemaphore(0, 0, LONG_MAX, 0)) {

    // ensure we acquired our sync resources
    if (! m_rdwset || ! m_wrwset) {
      if (m_rdwset) { CloseHandle(m_rdwset); }
      if (m_wrwset) { CloseHandle(m_wrwset); }

      RWMUTEX_WIN_CTOR_UNEXPECTED(this); // Out-Of-Resources!
    }

    InitializeCriticalSection(&m_wrlock);
  }


  // destroy the object...
  ~rwmutex() throw() {

    // a basic sanity check!  ;^)
    if (m_count != LONG_MAX ||
        m_rdepoch ||
        m_wrrecurse) {

      /* the shit hit the fuc%king fan because this is
        being destroyed, while its being used!!! :^0
      */

      RWMUTEX_WIN_DTOR_SANITY_UNEXPECTED(this); // DOH!
    }

    // make sure we can release our sync resources
    DeleteCriticalSection(&m_wrlock);

    if (! CloseHandle(m_rdwset) ||
        ! CloseHandle(m_wrwset)) {

      RWMUTEX_WIN_DTOR_UNEXPECTED(this); // YIKES!
    }
  }



// Private System Writer API
private:

  // adjust state wrt introducing writer-access...
  void sys_wracquire(LONG CONST addend = 0) throw() {

    // inc-and-test writer recursion depth
    if (! (m_wrrecurse++)) {

      // ensure we are the only writer
      RWMUTEX_WIN_ASSERT(this, m_count > -1);

      // introduce writer request
      LONG CONST count =
        InterlockedExchangeAdd(
          &m_count,
          (-(LONG_MAX)) + addend) + addend;

      // check for readers
      if (count < LONG_MAX) {

        // increment the differential read-epoch count
        if (InterlockedExchangeAdd(
              &m_rdepoch,
              LONG_MAX - count) + LONG_MAX - count) {

          // wait for the readers in the current read-epoch
          if (WaitForSingleObject(
                m_wrwset,
                INFINITE) != WAIT_OBJECT_0) {

            RWMUTEX_WIN_UNEXPECTED(this); // SON-OF-A-BIT%CH!
          }
        }
      }
    }
  }


  // adjust state wrt revoking writer-access...
  void sys_wrrelease(LONG CONST addend = LONG_MAX) throw() {

    // validate prior write-access
    RWMUTEX_WIN_ASSERT(this, m_count < 1);

    // dec-and-test writer recursion depth
    if (! (--m_wrrecurse)) {

      // revoke prior writer request
      LONG CONST count =
        InterlockedExchangeAdd(
          &m_count,
          addend);

      // check for waiting readers
      if (count < 0) {

        /* signal all the the reader(s) waiting on the
          next read-epoch
        */
        if (! ReleaseSemaphore(
                m_rdwset,
                -count,
                0)) {

          RWMUTEX_WIN_UNEXPECTED(this); // OMFG!!
        }
      }
    }
  }



// Public Writer API
public:

  // lock for writer-access...
  inline void wrlock() throw() {

    // obtain overall mutual exclusion
    EnterCriticalSection(&m_wrlock);

    // let everybody know a writer is in the pipe
    sys_wracquire();

    // we have write-access!
  }


  // try-lock for writer-access...
  inline bool trywrlock() throw() {

    // try to obtain overall mutual exclusion
    if (TryEnterCriticalSection(&m_wrlock)) {

      // let everybody know a writer is in the pipe
      sys_wracquire();

      // we have been granted write-access
      return true;
    }

    // CRAP! somebody else beat us here ;^(...
    return false;
  }


  // unlock for writer-access...
  inline void wrunlock() throw() {

    // let everybody know the writer is leaving
    sys_wrrelease();

    // release overall mutual exclusion
    LeaveCriticalSection(&m_wrlock);

    // we have released the prior write-access!
  }



// Public Reader API
public:

  // lock for read-access...
  inline void rdlock() const throw() {

    // validate the count
    RWMUTEX_WIN_ASSERT(this, m_count <= LONG_MAX);

    // introduce a read-access request
    if (InterlockedDecrement(&m_count) < 0) {

      // there is writer-access at this time, we need to
      // wait on the current read-epoch
      if (WaitForSingleObject(
            m_rdwset,
            INFINITE) != WAIT_OBJECT_0) {

        RWMUTEX_WIN_UNEXPECTED(this); // NOT GOOD!!
      }
    }

    // we have read-access!
  }


  // unlock for read-access...
  inline void rdunlock() const throw() {

    // validate a prior read-access
    RWMUTEX_WIN_ASSERT(this, m_count < LONG_MAX);

    // revoke the prior read-access
    if (InterlockedIncrement(&m_count) < 1) {

      /* there is a waiting writer, therefore, we need to
        decrement the differential read-epoch count
      */
      if (! InterlockedDecrement(&m_rdepoch)) {

        /* we are last reader in the current read-epoch,
          therefore, we need to singal the waiting writer
        */
        if (! SetEvent(m_wrwset)) {

          RWMUTEX_WIN_UNEXPECTED(this); // CRAP!
        }
      }
    }

    // we have released the prior read-access!
  }


  // try to upgrade read-access to writer-access...
  inline bool tryrdtowr() const throw() {

    // validate prior read-access
    RWMUTEX_WIN_ASSERT(this, m_count < LONG_MAX);

    // try to obtain overall mutual exclusion
    if (TryEnterCriticalSection(&m_wrlock)) {

      /* let everybody know a writer is in the pipe,
        and atomically revoke the prior read-access
      */
      const_cast<rwmutex*>(this)->sys_wracquire(1);

      /*  we have atomically rendered the prior read-access
        into a write-access!  :^)
      */
      return true;
    }

    /* another write-request beat us to the punch;
      sh%t happens! ;^)
    */
    return false;
  }

}; // class rwmutex

}} // namespace winsyncsys::v0001




/* Basic Windows Synchronization User Library Abstraction
_________________________________________________________*/
namespace winsync = winsyncsys::RWMUTEX_WIN_VER_SELECT();





/*************************************************************/
/*************************************************************/
#endif // #if ! defined(RWMUTEX_WIN_V0001_HPP)
__________________________________________________________________




I am creating some simple usage examples. Basically, you can use it like:
__________________________________________________________________
static winsync::rwmutex g_rwmtx;

void readers() {
  for(;;) {
    [...]

    {
      winsync::rwmutex::read rdlock(g_rwmtx);
        [read-access granted]
    }

    [...]
  }
}

void writers() {
  for(;;) {
    [...]

    {
      winsync::rwmutex::write wrlock(g_rwmtx);
        [write-access granted]
    }

    [...]
  }
}
__________________________________________________________________



Its fairly simple to use indeed... BTW, sorry for some of the spelling 
mistakes in the comments. I need to run those through a spell checker. Humm, 
do any of you know of a tool that can spell check comments in C/C++ code? I 
think I could write a macro for ms-word or something... 

0
Reply Chris 11/27/2007 8:03:46 PM

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:08561920-beb9-488b-9bc1-dae34fee1144@e23g2000prf.googlegroups.com...
> On Nov 27, 11:03 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> I have updated the code some more. I removed some local variables, and I
>> made some of the shared ones mutable in order to decorate the read-access
>> functions as const. I tweaked the code for the writer locking. Basically, 
>> I
>> moved the functionality out into two private functions (e.g.,
>> rwmutex::sys_wracquire/release).
>
>
> I would not like to upset you, but Win32 API already includes such
> API. Along with condition variable, lock-free mpmc stack and one-time
> initialization.

Yeah. I can't believe it took them that long to include a condvar! Anyway... 
They are still missing EventCount's! Unless I am missing something here, I 
don't think keyed events are as versatile as eventcounts with regard to 
their use in non-blocking algorithms.




> Good news for you is that API is available only since
> Vista/Longhorn :)

Indeed. :^)

I was trying to do something for all the people who want to keep using NT/XP 
for as long as possible. Vista, IMVHO, does not have what it takes to 
justify a switch from NT/XP...




> For details see in MSDN:
> InitializeSRWLock()/AcquireSRWLockExclusive()/AcquireSRWLockShared()
>
> And maybe it will be interesting for you too:
> InitializeConditionVariable()/SleepConditionVariableCS()/
> SleepConditionVariableSRW()
> InitOnceInitialize()/InitOnceExecuteOnce()
> InitializeSListHead()/InterlockedPopEntrySList()/
> InterlockedPushEntrySList()

Yeah. Here is something you can do with the InterlockedXXXSList API:

http://groups.google.com/group/microsoft.public.win32.programmer.kernel/msg/dc5a8ab20816b56c

Simplistic Lock-Free LIFO/FIFO IPC. Your going to have to use something to 
make it waitable. EventCount makes this trivial. You can also use something 
like Joe Seighs fast semaphore implementation he posted on cpt several years 
ago.




> SRWLock is very slim - only sizeof pointer. Thus is not recursive. And
> not upgradeable. Shared lock/unlock - one interlocked operation as
> expected.

I believe the last rwmutex implmentation of Microsoft's was in some .NET CLI 
code here:

http://groups.google.com/group/comp.programming.threads/msg/b73084939aac60f0

I believe its structure was something like the one Michel outlined in the 
post above:
_____________________________________
uchar - readers
bit - readers signaled
bit - writers signaled
uchar - waiting readers
uchar - waiting writers
_____________________________________

It seems that the links are no longer active, anyway IIRC, the code used 
event/sema to do the waitsets. Humm, I am not sure if Windows Vista uses 
keyed events as waitsets for everything now. I also think they use CAS to 
mutate the state. The algorithm I created for NT/XP does not use CAS at all. 
It also has wait-free properties for uncontended readers. Even though the 
code is in a early stage, IMVHO, I think its more versatile than the one 
provided by Vista. What do you think?




> See this blog for details:
> http://www.bluebytesoftware.com/blog/2006/06/22/NewVistaConcurrencyFeatures.aspx

Yup; I believe I have already read that sometime ago. I guess the main point 
I want to make here is that there is no "real" need to get Vista just to 
have access to a condvar, or rwmutex. As you know, Windows 9x/XP/NT already 
exposes all of the API's that are needed to get fast-pathed synchronization 
primitives up and running; Vista NOT required for this at all.

Its kind of funny how some people seem to get so excited about Microsoft 
sticking a condvar and rwmutex implementation in Windows Vista, when they 
most certainly could of had them all the way back in Windows 9x. They also 
could have easily had a "initialize-once" implementation; why they waited 
all this time, I don't know. IMHO, POSIX still has them beat wrt 
threading/synchronization API... 

0
Reply Chris 11/29/2007 11:15:38 PM

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:7f4b5bed-2c56-46e4-a629-2543f593644f@i12g2000prf.googlegroups.com...
> On Nov 30, 2:15 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> Yeah. Here is something you can do with the InterlockedXXXSList API:
>>
>> http://groups.google.com/group/microsoft.public.win32.programmer.kern...
>>
>> Simplistic Lock-Free LIFO/FIFO IPC. Your going to have to use something 
>> to
>> make it waitable. EventCount makes this trivial. You can also use 
>> something
>> like Joe Seighs fast semaphore implementation he posted on cpt several 
>> years
>> ago.
>
>
> Yes. Their API is not "complete". One must bolt something on top
> anyway...
> I think that they will create more usable API in .NET on top of
> this...
>
>
> BTW SList copes with ABA in interesting way.
> They use ABA counter along with SEH __try/__catch block to prevent
> reading from freed memory.
> Something like:

[...]

Right. We have has discussion on this here in cpt. One of the participants 
in the discussion "Oliver S." or something, posted many messages with the 
"No-Archive-X" flag raised in the headers; his posts are deleted. He 
proposed Microsoft's solution. There can be several issues; I can only hope 
that Microsoft has addressed them all in their SEH implementation. This is 
similar to counting on a Load-Linked reservation to fail when the memory 
that represents it gets freed within the LL/SC scope; e.g, mem gets freed 
after LL, and before SC. That can raise some exceptional conditions. SEH is 
a "bit" of a hack?, IMVVVHO that is... 

0
Reply Chris 11/30/2007 9:52:09 PM

InterlockedCompareExchange is no more or less expensive than 
InterlockedDecrement or InterlockedIncrement. Two latter are actually 
implemented with exchange-add (XADD) instruction. All of them are of equal 
cost.

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:7e695ca8-ab73-4ada-b596-b338eb89f583@s36g2000prg.googlegroups.com...
> Instead of:
> bool lock()
> {
>   long old_flag=0;
>   while(true)
>
> Must be:
> bool lock()
> {
>   long old_flag=flag;
>   while(true)
>
>
> Hmmm.... Now I notice that he use InterlockedCompareExchange on fast-
> path too...
> So your implementation with InterlockedIncrement/InterlockedDecrement
> is somewhat better ;)
> Maybe Anthony Williams will be interested in your algorithm to
> incorporate into boost...
>
>
> Dmitriy V'jukov 


0
Reply Alexander 12/1/2007 3:56:01 AM

InterlockedCompareExchange by itself is LOCK XCHG instruction. Of course, it 
allows to implement arbitrary atomic read-modify-write functions, including 
OR, AND, XOR and conditionals, at a cost of possible retrying the operation.


"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:c87222d0-214b-4203-a254-451f396e17d6@n20g2000hsh.googlegroups.com...
On 1 ���, 06:56, "Alexander Grigoriev" <al...@earthlink.net> wrote:
> InterlockedCompareExchange is no more or less expensive than
> InterlockedDecrement or InterlockedIncrement. Two latter are actually
> implemented with exchange-add (XADD) instruction. All of them are of equal
> cost.


With InterlockedCompareExchange() you have to organize do{}while(!
CAS()) loop. So InterlockedCompareExchange() can be executed several
times instead of one.
I am not sure how significant this advantage in practice...


Dmitriy V'jukov 


0
Reply Alexander 12/1/2007 10:55:27 PM

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:ec556fdd-4316-4989-83b4-9dc39efc7a04@a35g2000prf.googlegroups.com...
> On Dec 1, 12:52 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>> "Dmitriy Vyukov" <dvyu...@gmail.com> wrote in message
>>
>> news:7f4b5bed-2c56-46e4-a629-2543f593644f@i12g2000prf.googlegroups.com...
>>
>>
>>
>> > On Nov 30, 2:15 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>>
>> >> Yeah. Here is something you can do with the InterlockedXXXSList API:
>>
>> >>http://groups.google.com/group/microsoft.public.win32.programmer.kern...
>>
>> >> Simplistic Lock-Free LIFO/FIFO IPC. Your going to have to use 
>> >> something
>> >> to
>> >> make it waitable. EventCount makes this trivial. You can also use
>> >> something
>> >> like Joe Seighs fast semaphore implementation he posted on cpt several
>> >> years
>> >> ago.
>>
>> > Yes. Their API is not "complete". One must bolt something on top
>> > anyway...
>> > I think that they will create more usable API in .NET on top of
>> > this...
>>
>> > BTW SList copes with ABA in interesting way.
>> > They use ABA counter along with SEH __try/__catch block to prevent
>> > reading from freed memory.
>> > Something like:
>>
>> [...]
>>
[...]
>
>
> If memory is freed then CAS in lock-free stack algorithm will fail.

If the CAS even gets executed. The thing can crash on the hazard. I am not 
sure how good it is to base your non-blocking memory management scheme on 
the fact that your threads will be hitting segmentation faults. Microsoft 
doesn't seem to mind. That's why I think its a bit of a hack.




> Freed node can't be head of stack. I'm sure that they have addressed
> all such issues.

Yeah. The basic result is seg-fault, and they use SEH to handle that 
situation, and they make the OS, so I bet it works.

[...]




> And SEH frame installation/deinstallation cost can be eliminated if
> one big frame will be installed for every thread when thread starts.

A big catch all... Not sure if I like that or not... Humm... 

0
Reply Chris 12/4/2007 7:26:24 AM

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:b04ca06d-fac5-4375-ba65-a2becb3122c4@s19g2000prg.googlegroups.com...
> On Dec 4, 10:26 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> > If memory is freed then CAS in lock-free stack algorithm will fail.
>>
>> If the CAS even gets executed. The thing can crash on the hazard. I am 
>> not
>> sure how good it is to base your non-blocking memory management scheme on
>> the fact that your threads will be hitting segmentation faults. Microsoft
>> doesn't seem to mind. That's why I think its a bit of a hack.
>
>
> In Unix world paging fault (they call this segmentation fault :) ) is
> a really fearful thing without appropriate means to cope with.
[...]

seg-fault is a bug in my book...  :^0 

0
Reply Chris 12/4/2007 8:29:56 AM

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:b04ca06d-fac5-4375-ba65-a2becb3122c4@s19g2000prg.googlegroups.com...
> On Dec 4, 10:26 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> > If memory is freed then CAS in lock-free stack algorithm will fail.
>>
>> If the CAS even gets executed. The thing can crash on the hazard. I am 
>> not
>> sure how good it is to base your non-blocking memory management scheme on
>> the fact that your threads will be hitting segmentation faults. Microsoft
>> doesn't seem to mind. That's why I think its a bit of a hack.
>
>
> In Unix world paging fault (they call this segmentation fault :) ) is
> a really fearful thing without appropriate means to cope with.
> But in Windows world paging fault (they call this access violation) is
> a ordinary thing with appropriate means to cope with. You can install
> SEH frames per thread, you can nest SEH frames, you can filter
> exceptions, you can even repair environment and retry and raise any
> exception programmatically with RaiseException(). And raising
> exceptions with particular codes is the official method to communicate
> with debugger.
>
>
>
>> > And SEH frame installation/deinstallation cost can be eliminated if
>> > one big frame will be installed for every thread when thread starts.
>>
>> A big catch all... Not sure if I like that or not... Humm...
>
>
> Yes, big catch, but not all. Only AV exceptions in pop() function.
> Exception handler must restart pop() function from beginning.

Well, if I guess you can use SEH for strong thread-safe atomic ref-counting 
as well:
_____________________________________________________________
rc* rcinc_strong(rc** const pshared) {
  rc* local;
restart:
  local = *pshared;
  if (local) {
    _try {
      int rcval
      do {
        rcval = local->rcval;
        if (rcval < 1) {
          goto restart;
        }
      } while (! CAS(&local->rcval, rcval, rcval + 1));
    } _catch_everything {
      goto restart;
    }
  }
  return local;
}
_____________________________________________________________


0
Reply Chris 12/4/2007 8:42:13 AM

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:b04ca06d-fac5-4375-ba65-a2becb3122c4@s19g2000prg.googlegroups.com...
> On Dec 4, 10:26 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> > If memory is freed then CAS in lock-free stack algorithm will fail.
>>
>> If the CAS even gets executed. The thing can crash on the hazard. I am 
>> not
>> sure how good it is to base your non-blocking memory management scheme on
>> the fact that your threads will be hitting segmentation faults. Microsoft
>> doesn't seem to mind. That's why I think its a bit of a hack.
>
>
> In Unix world paging fault (they call this segmentation fault :) ) is
> a really fearful thing without appropriate means to cope with.
> But in Windows world paging fault (they call this access violation) is
> a ordinary thing with appropriate means to cope with. You can install
> SEH frames per thread, you can nest SEH frames, you can filter
> exceptions, you can even repair environment and retry and raise any
> exception programmatically with RaiseException(). And raising
> exceptions with particular codes is the official method to communicate
> with debugger.

I don't want/need to use exceptions and/or debugging interfaces in order to 
get correct code. Virtually zero-overhead PDR works fine, without 
sef-faulting... Humm. 

0
Reply Chris 12/4/2007 10:15:38 AM

Dmitriy Vyukov <dvyukov@gmail.com> writes:

> BTW did you see this:
> http://www.accu.org/index.php/journals/1324
>
> New boost.thread implementation will have solution similar to your.
> And it will be available on all Windows versions along with Linux etc.

Currently the linux version uses pthreads, but I do intend to rewrite it at
some point.

> And I propose a little improvement for this implementation :) It seems
> that Anthony Williams just overlooked this little moment:
>
> Instead of:
>  bool lock()
>  {
>    long old_flag=0;
>    while(true)
>
> Must be:
>  bool lock()
>  {
>    long old_flag=flag;
>    while(true)

I did spot that after I wrote the article. The current boost trunk
implementation does this.

> Hmmm.... Now I notice that he use InterlockedCompareExchange on fast-
> path too...
>
> So your implementation with InterlockedIncrement/InterlockedDecrement
> is somewhat better ;)
> Maybe Anthony Williams will be interested in your algorithm to
> incorporate into boost...

As explained in the article, the CAS in the lock() is to allow a "waiters"
count as well as a "locked" flag. We could remove the count (and just use
InterlockedExchange in lock()) if unlock() always did an unconditional
SetEvent call, or we used another scheme for the count (such as an
entirely-separate counter).

We could do this, for example:

class mutex
{
    long locked;
    long counter;
    void* event;

public:
    mutex():
        locked(0),counter(0),event(CreateEvent(0,0,0,0))
    {}

    void lock()
    {
        lock const was_locked=InterlockedExchange(&locked,1);
        if(was_locked)
        {
            InterlockedIncrement(&counter);
            while(InterlockedExchange(&locked,1))
            {
                WaitForSingleObject(event,INFINITE);
            }
            InterlockedDecrement(&counter);
        }
    }

    void unlock()
    {
        store_release(locked,0);
        _ReadWriteBarrier(); // don't move the read of counter before the
                             // write of locked
        if(load_acquire(counter))
        {
            SetEvent(event);
        }
    }    
};

Is that really an improvement? Well, I suppose it doesn't use any interlocked
instructions in unlock().

What does Chris's InterlockedIncrement/InterlockedDecrement implementation
look like?

Anthony
-- 
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
0
Reply Anthony 12/4/2007 11:41:30 AM

"Chris Thomasson" <cristom@comcast.net> wrote in message 
news:orKdnbXF394dYsnanZ2dnUVZ_jKdnZ2d@comcast.com...
> "Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message
>>
>> If memory is freed then CAS in lock-free stack algorithm will fail.
>
> If the CAS even gets executed. The thing can crash on the hazard. I am not 
> sure how good it is to base your non-blocking memory management scheme on 
> the fact that your threads will be hitting segmentation faults. Microsoft 
> doesn't seem to mind. That's why I think its a bit of a hack.
>
>
>
>
>> Freed node can't be head of stack. I'm sure that they have addressed
>> all such issues.
>

If the memory gets freed, it's the same as someone standing on a chair, and 
another kicks the chair from under the first's feet. Properly designed 
program just should not do it. 


0
Reply Alexander 12/4/2007 3:04:26 PM

"Anthony Williams" <anthony_w.geo@yahoo.com> wrote in message 
news:8x4a4rh1.fsf@yahoo.com...
[...]
>
> What does Chris's InterlockedIncrement/InterlockedDecrement implementation
> look like?

Here is the code:

http://appcore.home.comcast.net/misc/winsync_v0001_rwmutex_hpp.html 

0
Reply Chris 12/4/2007 10:10:31 PM

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:1a8f28d6-8921-4495-8d43-c5f5a54bce10@e23g2000prf.googlegroups.com...
[...]
>> > On Dec 4, 10:26 am, "Chris Thomasson" <cris...@comcast.net> wrote:
[...]>> Well, if I guess you can use SEH for strong thread-safe atomic 
ref-counting
>> as well:
>> _____________________________________________________________
>> rc* rcinc_strong(rc** const pshared) {
>>   rc* local;
>> restart:
>>   local = *pshared;
>>   if (local) {
>>     _try {
>>       int rcval
>>       do {
>>         rcval = local->rcval;
>>         if (rcval < 1) {
>>           goto restart;
>>         }
>>       } while (! CAS(&local->rcval, rcval, rcval + 1));
>>     } _catch_everything {
>>       goto restart;
>>     }
>>   }
>>   return local;}
>>
>> _____________________________________________________________
>
>
> NO!

YIKES! Your absolutely correct. WTF was I thinking!

;^0




> Here is big difference. SEH only prevents access to unreserved
> memory in process address space. But SEH don't prevent from reading/
> writing new objects in old place (or just trash).
> So here you can not only read some unknown object, but write some
> random bytes into unknown object too!

Check these posts out:

http://groups.google.com/group/comp.programming.threads/msg/93421cc16fcfc072

http://groups.google.com/group/comp.programming.threads/msg/32bf9bc5c5aa85ee

http://groups.google.com/group/comp.programming.threads/msg/24b6ef3944657343

http://groups.google.com/group/comp.programming.threads/msg/447b1397aa7ee7fa


Those posts were about Joe Seighs atomic_ptr for PPC. The SC can operate on 
reclaimed memory, it will fail, but it raises exceptions and its 
implementation defined behavior on the PPC. I wonder how Microsoft gets 
around the hazard in the lock-free stack in for PPC (e.g., XBOX)...

0
Reply Chris 12/4/2007 10:21:54 PM

Chris Thomasson wrote:
> 
> "Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
> news:1a8f28d6-8921-4495-8d43-c5f5a54bce10@e23g2000prf.googlegroups.com...
> [...]
> 
> 
> 
>> Here is big difference. SEH only prevents access to unreserved
>> memory in process address space. But SEH don't prevent from reading/
>> writing new objects in old place (or just trash).
>> So here you can not only read some unknown object, but write some
>> random bytes into unknown object too!
> 
> Check these posts out:
> 
> http://groups.google.com/group/comp.programming.threads/msg/93421cc16fcfc072 
> 
> 
> http://groups.google.com/group/comp.programming.threads/msg/32bf9bc5c5aa85ee 
> 
> 
> http://groups.google.com/group/comp.programming.threads/msg/24b6ef3944657343 
> 
> 
> http://groups.google.com/group/comp.programming.threads/msg/447b1397aa7ee7fa 
> 
> 
> 
> Those posts were about Joe Seighs atomic_ptr for PPC. The SC can operate 
> on reclaimed memory, it will fail, but it raises exceptions and its 
> implementation defined behavior on the PPC. I wonder how Microsoft gets 
> around the hazard in the lock-free stack in for PPC (e.g., XBOX)...
> 

Lock-free LIFO stack algorithm (the one describe in 370/Z-arch POp) has
the same restriction since it fetchs a "next" pointer from what may
no longer be a node.  Not new.



-- 
Joe Seigh

When you get lemons, you make lemonade.
When you get hardware, you make software. 
0
Reply Joe 12/5/2007 1:42:35 AM

<anthony.ajw@gmail.com> wrote in message 
news:337e4488-ddfc-4bea-8524-b76ef9791590@y43g2000hsy.googlegroups.com...
> On 4 Dec, 22:10, "Chris Thomasson" <cris...@comcast.net> wrote:
>> "Anthony Williams" <anthony_w....@yahoo.com> wrote in message
>>
>> news:8x4a4rh1.fsf@yahoo.com...
>> [...]
>>
>>
>>
>> > What does Chris's InterlockedIncrement/InterlockedDecrement 
>> > implementation
>> > look like?
>>
>> Here is the code:
>>
>> http://appcore.home.comcast.net/misc/winsync_v0001_rwmutex_hpp.html
>
> That's a read-write mutex, not a plain mutex (which is what my article
> was about). Also, the write-lock portion (which is the closest to a
> normal mutex) starts with "EnterCriticalSection" --- not exactly the
> optimized code I expected.
>
> I would be interested in your comments on the new scheme I just
> suggested:
>
> class mutex
> {
>    long locked;
>    long counter;
>    void* event;
>
> public:
>    mutex():
>        locked(0),counter(0),event(CreateEvent(0,0,0,0))
>    {}
>
>    void lock()
>    {
>        lock const was_locked=InterlockedExchange(&locked,1);
>        if(was_locked)
>        {
>            InterlockedIncrement(&counter);
>            while(InterlockedExchange(&locked,1))
>            {
>                WaitForSingleObject(event,INFINITE);
>            }
>            InterlockedDecrement(&counter);
>        }
>    }
>
>    void unlock()
>    {
>        store_release(locked,0);
>        _ReadWriteBarrier(); // don't move the read of counter before
> the
>                             // write of locked

I assume that _ReadWriteBarrier is a #StoreLoad | #StoreStore right?


>        if(load_acquire(counter))
>        {
>            SetEvent(event);
>        }
>    }
>
> };


That barrier is too strong for the mutex unlock function because it only 
requires release semantics. Since you do a store, followed by a load to 
another location, you need a fence in between. So, IMO, you have not 
actually optimized the unlock function... You made it more expensive because 
it has to have the full membar semantics instead of just release semantics. 
I like Alex Terekhov's mutex implementation myself. IIRC, it was something 
like:
_________________________________________________________________
class mutex {
  long m_state;
  HANDLE m_wset;

  enum state_e {
    UNLOCKED = 0,
    LOCKED = 1,
    CONTENTION = 2
  };

public:
  mutex() :
    m_state(UNLOCKED),
    m_wset(CreateEvent(0, FALSE, FALSE, 0)) {
    if (! m_wset) { throw std::exception(); }
  }

  ~mutex() throw() {
    if (! CloseHandle(m_wset)) { assert(false); }
  }

public:
  void lock() throw() {
    if (InterlockedExchangeAcquire(&m_state, LOCKED)) {
      Sleep(1);
      while (InterlockedExchangeAcquire(&m_state, CONTENTION)) {
        if (WaitForSingleObject(m_wset) != WAIT_OBJECT_0) {
          assert(false); std::unexpected();
        }
      }
    }
  }

  void unlock() throw() {
    if (InterlockedExchangeRelease(&m_state, UNLOCKED) == CONTENTION) {
      if (! SetEvent(m_wset)) {
        assert(false); std::unexpected();
      }
    }
  }
};
_________________________________________________________________





0
Reply Chris 12/5/2007 10:11:21 PM

<anthony.ajw@gmail.com> wrote in message 
news:337e4488-ddfc-4bea-8524-b76ef9791590@y43g2000hsy.googlegroups.com...
> On 4 Dec, 22:10, "Chris Thomasson" <cris...@comcast.net> wrote:
>> "Anthony Williams" <anthony_w....@yahoo.com> wrote in message
>>
>> news:8x4a4rh1.fsf@yahoo.com...
>> [...]
>>
>>
>>
>> > What does Chris's InterlockedIncrement/InterlockedDecrement 
>> > implementation
>> > look like?
>>
>> Here is the code:
>>
>> http://appcore.home.comcast.net/misc/winsync_v0001_rwmutex_hpp.html
>
> That's a read-write mutex, not a plain mutex (which is what my article
> was about). Also, the write-lock portion (which is the closest to a
> normal mutex) starts with "EnterCriticalSection" --- not exactly the
> optimized code I expected.
[...]

What about the read functions? I did not bother to heavily optimize the 
write functions. The writer fast-paths have to execute two interlocked 
instructions. The read functions only have to execute a single interlocked 
instruction. Also, uncontended readers are wait-free; there are no loops in 
the entire algorithm. 

0
Reply Chris 12/5/2007 10:15:17 PM

<anthony.ajw@gmail.com> wrote in message 
news:337e4488-ddfc-4bea-8524-b76ef9791590@y43g2000hsy.googlegroups.com...
> On 4 Dec, 22:10, "Chris Thomasson" <cris...@comcast.net> wrote:
>> "Anthony Williams" <anthony_w....@yahoo.com> wrote in message
>>
>> news:8x4a4rh1.fsf@yahoo.com...
>> [...]
>>
>>
>>
>> > What does Chris's InterlockedIncrement/InterlockedDecrement 
>> > implementation
>> > look like?
>>
>> Here is the code:
>>
>> http://appcore.home.comcast.net/misc/winsync_v0001_rwmutex_hpp.html
>
> That's a read-write mutex, not a plain mutex (which is what my article
> was about). Also, the write-lock portion (which is the closest to a
> normal mutex) starts with "EnterCriticalSection" --- not exactly the
> optimized code I expected.
>
> I would be interested in your comments on the new scheme I just
> suggested:
[...]

This might work:
___________________________________________________________
enum flags_e {
  LOCKED = 0x1,
  WAITERS = (LOCKED | 0x2),
  UNLOCKED = 0
};


void lock() {
  while(ATOMIC_BTS(&m_state, LOCKED)) {
    if(ATOMIC_SWAP(&m_state, WAITERS) & LOCKED){
      WaitForSingleObject(m_wset);
    }
  }
  membar #StoreLoad | #StoreStore;
}


void unlock() {
  long const state = ATOMIC_LOAD(&m_state);
  membar #LoadStore | #StoreStore;
  ATOMIC_STORE(m_state, UNLOCKED);
  long const cmp = ATOMIC_LOAD(&m_state);
  if (state & WAITERS || cmp & WAITERS) {
    SetEvent(m_wset);
  }
}
___________________________________________________________


You don't need the full membar in the unlock function here because the store 
is followed by a load to the same location. 

0
Reply Chris 12/5/2007 10:36:23 PM

They pack more into 64 bits. (by the way, it's not x86-64, it's x64).

Actually there is 128 bit CAS, but not on IA64 (as if anybody cares).

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:98e81acd-3e5e-4428-82b8-a6454ec873e6@d4g2000prg.googlegroups.com...
>
>
>
> Interesting how M$ implement lock-free stack on x86-64? There is no
> DWORD CAS...
>
>
> Dmitriy V'jukov 


0
Reply Alexander 12/6/2007 2:12:53 PM

anthony.ajw@gmail.com wrote:
[...]
> > I like Alex Terekhov's mutex implementation myself. IIRC, it was something
> > like:
> > _________________________________________________________________
> > class mutex {
> >   long m_state;
> >   HANDLE m_wset;
> >
> >   enum state_e {
> >     UNLOCKED = 0,
> >     LOCKED = 1,
> >     CONTENTION = 2
> >   };
> >
> > public:
> >   mutex() :
> >     m_state(UNLOCKED),
> >     m_wset(CreateEvent(0, FALSE, FALSE, 0)) {
> >     if (! m_wset) { throw std::exception(); }
> >   }
> >
> >   ~mutex() throw() {
> >     if (! CloseHandle(m_wset)) { assert(false); }
> >   }
> >
> > public:
> >   void lock() throw() {
> >     if (InterlockedExchangeAcquire(&m_state, LOCKED)) {
> >       Sleep(1);
> >       while (InterlockedExchangeAcquire(&m_state, CONTENTION)) {
> >         if (WaitForSingleObject(m_wset) != WAIT_OBJECT_0) {
> >           assert(false); std::unexpected();
> >         }
> >       }
> >     }
> >   }
> >
> >   void unlock() throw() {
> >     if (InterlockedExchangeRelease(&m_state, UNLOCKED) == CONTENTION) {
> >       if (! SetEvent(m_wset)) {
> >         assert(false); std::unexpected();
> >       }
> >     }
> >   }};
> 
> That's pretty good, but I'm not sure I like it. If thread A has the
> lock, and thread B is waiting, m_state is CONTENTION. If thread C then
> sets the state to LOCKED and gets preempted in the Sleep, and thread A
> unlocks whilst thread C is sleeping, thread A will see a value of
> LOCKED rather than CONTENTION, so it won't trigger the event. Thread B
> will then not be woken, and can't take the lock, even though thread C
> has just gone to Sleep.

Having Sleep() there makes no sense. Chris' IIRC is not quite C. :-)

However, if the only atomic RMW available is swap then we could indeed 
have a temporary loss of contention indication. Atomic bit test and 
set on x86 (to get rid of that problem) aside for a moment, a bit more 
powerful archs (like the current Xbox for example) can do something 
along the lines of the following:

// doesn't provide "POSIX-safety" with respect to destruction 
class mutex_for_Xbox_360 { 

  atomic<int>      m_lock_status; // 0: free, 1/-1: locked/contention 
  auto_reset_event m_retry_event; // prohibitively slow bin.sema/gate 

  int change_reserved(int old, int new) { 
    while (!m_lock_status.store_conditional(new)) { 
      int fresh = m_lock_status.load_and_reserve(); 
      if (fresh != old) return fresh; 
    } 
    return old; 
  } 

public: 

  void lock() { 
    int old = m_lock_status.load_and_reserve(); 
    if (old || old = change_reserved(0, 1)) {
      do { 
        while (old < 0 || old = change_reserved(1, -1)) { 
          m_retry_event.wait(); 
          if (!(old = m_lock_status.load_and_reserve())) break; 
        } 
      } while (old = change_reserved(0, -1)); 
    } 
    isync();
  } 

  void unlock() { 
    int old = m_lock_status.load_and_reserve(); 
    assert(old);
    lwsync();
    if (old < 0 || !m_lock_status.store_conditional(0)) {
      m_lock_status.store(0); 
      m_retry_event.set(); 
    } 
  } 

}; 

regards,
alexander.
0
Reply Alexander 12/6/2007 4:54:38 PM

<anthony.ajw@gmail.com> wrote in message 
news:5bccd7d7-016f-4b10-a2fe-b1b86c7ae0d4@v4g2000hsf.googlegroups.com...
> On 5 Dec, 22:11, "Chris Thomasson" <cris...@comcast.net> wrote:
>> <anthony....@gmail.com> wrote in message
[...]
>> >    void unlock()
>> >    {
>> >        store_release(locked,0);
>> >        _ReadWriteBarrier(); // don't move the read of counter before
>> > the
>> >                             // write of locked
>>
>> I assume that _ReadWriteBarrier is a #StoreLoad | #StoreStore right?
>
> No, it's a compiler barrier that prevents rearranging the code, rather
> than a membar.
[...]

A compiler barrier is not going to prevent the x86 from hoisting the load 
above the store.
__________________________________________________
void unlock()
{
  1: store_release(locked,0);
  _ReadWriteBarrier();
  2: if(load_acquire(counter))
  {
    SetEvent(event);
  }
}
__________________________________________________


Step 2 can be executed before step 1:

http://groups.google.com/group/comp.programming.threads/msg/731cd06e8c04f744


Unless I am missing something, this will cause problems in the algorithm 
as-is. I think your going to need to do a fence:
__________________________________________________
void unlock()
{
  store_fence(&locked,0);
  if(load_naked(&counter))
  {
    SetEvent(event);
  }
}
__________________________________________________


Humm.

0
Reply Chris 12/6/2007 7:04:53 PM

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:a2fc5e5a-369c-4da1-918f-d2378d499257@e10g2000prf.googlegroups.com...
> On Dec 6, 5:12 pm, "Alexander Grigoriev" <al...@earthlink.net> wrote:
>> They pack more into 64 bits. (by the way, it's not x86-64, it's x64).
>
>
> But they have to pack full-fledged user pointer *and* ABA counter in
> 64 bits...

Yeah. Here is official information:

http://groups.google.com/group/comp.programming.threads/msg/e01455d08325c71c

Neill Clift works on the kernel. According to him, they use a pointer 
compression technique.




>> Actually there is 128 bit CAS, but not on IA64 (as if anybody cares).

This is pain in the neck for some people... Yes, they care about this 
indeed. SPARC has the same problem:

http://groups.google.com/group/comp.arch/browse_frm/thread/71f8e0094e353e5




> Afaik IA64 have cmp8bxchg16b, but first x64 doesn't have. Afaik only
> latest x64 have 128 bit CAS...

Yup. 

0
Reply Chris 12/6/2007 7:08:39 PM

"Alexander Terekhov" <terekhov@web.de> wrote in message 
news:4758294E.678F2770@web.de...
>
> anthony.ajw@gmail.com wrote:
> [...]
>> > I like Alex Terekhov's mutex implementation myself. IIRC, it was 
>> > something
>> > like:
>> > _________________________________________________________________
>> > class mutex {
[...]
>> > };
>>
>> That's pretty good, but I'm not sure I like it. If thread A has the
>> lock, and thread B is waiting, m_state is CONTENTION. If thread C then
>> sets the state to LOCKED and gets preempted in the Sleep, and thread A
>> unlocks whilst thread C is sleeping, thread A will see a value of
>> LOCKED rather than CONTENTION, so it won't trigger the event. Thread B
>> will then not be woken, and can't take the lock, even though thread C
>> has just gone to Sleep.
>
> Having Sleep() there makes no sense.

DOH! Your right of course! Shi%t. 

0
Reply Chris 12/6/2007 7:19:04 PM

"Alexander Grigoriev" <alegr@earthlink.net> wrote in message 
news:OH%23PP2GNIHA.4752@TK2MSFTNGP05.phx.gbl...
> InterlockedCompareExchange by itself is LOCK XCHG instruction. [...]

FWIW, LOCK is already implied by the XCHG instruction; is not needed 
explicitly... 

0
Reply Chris 12/6/2007 7:22:00 PM

"Joe Seigh" <jseigh_01@xemaps.com> wrote in message 
news:fmn5j.18672$t31.1509@trnddc02...
> Chris Thomasson wrote:
>>
>> "Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
>> news:1a8f28d6-8921-4495-8d43-c5f5a54bce10@e23g2000prf.googlegroups.com...
>> [...]
>>
>>
>>
>>> Here is big difference. SEH only prevents access to unreserved
>>> memory in process address space. But SEH don't prevent from reading/
>>> writing new objects in old place (or just trash).
>>> So here you can not only read some unknown object, but write some
>>> random bytes into unknown object too!
>>
>> Check these posts out:
>>
>> http://groups.google.com/group/comp.programming.threads/msg/93421cc16fcfc072 
>> http://groups.google.com/group/comp.programming.threads/msg/32bf9bc5c5aa85ee 
>> http://groups.google.com/group/comp.programming.threads/msg/24b6ef3944657343 
>> http://groups.google.com/group/comp.programming.threads/msg/447b1397aa7ee7fa 
>> Those posts were about Joe Seighs atomic_ptr for PPC. The SC can operate 
>> on reclaimed memory, it will fail, but it raises exceptions and its 
>> implementation defined behavior on the PPC. I wonder how Microsoft gets 
>> around the hazard in the lock-free stack in for PPC (e.g., XBOX)...
>>
>
> Lock-free LIFO stack algorithm (the one describe in 370/Z-arch POp) has
> the same restriction since it fetchs a "next" pointer from what may
> no longer be a node.  Not new.

Indeed it can read from memory that has been reclaimed, and subsequently 
reallocated, within the scope of the CAS-loop or LL/SC. However, I think 
that the SC part, or the CAS w/ ABA counter, should prevent the read of 
garbage from being committed in the data-structure. The reservation is lost 
for sure wrt LL/SC, and the ABA counter would detect that the operation that 
removed and reclaimed the node wrt CAS-loop. 

0
Reply Chris 12/7/2007 7:15:57 AM

"Chris Thomasson" <cristom@comcast.net> wrote in message 
news:cvqdna5L354ebMXanZ2dnUVZ_u-unZ2d@comcast.com...
> "Joe Seigh" <jseigh_01@xemaps.com> wrote in message 
> news:fmn5j.18672$t31.1509@trnddc02...
>> Chris Thomasson wrote:
>>>
>>> "Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
>>> news:1a8f28d6-8921-4495-8d43-c5f5a54bce10@e23g2000prf.googlegroups.com...
>>> [...]
>>>
>>>
>>>
>>>> Here is big difference. SEH only prevents access to unreserved
>>>> memory in process address space. But SEH don't prevent from reading/
>>>> writing new objects in old place (or just trash).
>>>> So here you can not only read some unknown object, but write some
>>>> random bytes into unknown object too!
>>>
>>> Check these posts out:
>>>
>>> http://groups.google.com/group/comp.programming.threads/msg/93421cc16fcfc072 
>>> http://groups.google.com/group/comp.programming.threads/msg/32bf9bc5c5aa85ee 
>>> http://groups.google.com/group/comp.programming.threads/msg/24b6ef3944657343 
>>> http://groups.google.com/group/comp.programming.threads/msg/447b1397aa7ee7fa 
>>> Those posts were about Joe Seighs atomic_ptr for PPC. The SC can operate 
>>> on reclaimed memory, it will fail, but it raises exceptions and its 
>>> implementation defined behavior on the PPC. I wonder how Microsoft gets 
>>> around the hazard in the lock-free stack in for PPC (e.g., XBOX)...
>>>
>>
>> Lock-free LIFO stack algorithm (the one describe in 370/Z-arch POp) has
>> the same restriction since it fetchs a "next" pointer from what may
>> no longer be a node.  Not new.
>
> Indeed it can read from memory that has been reclaimed, and subsequently 
> reallocated, within the scope of the CAS-loop or LL/SC. However, I think 
> that the SC part, or the CAS w/ ABA counter, should prevent the read of 
> garbage from being committed in the data-structure. The reservation is 
> lost for sure wrt LL/SC, and the ABA counter would detect that the 
> operation that removed and reclaimed the node wrt CAS-loop.

Ahh, why not just avoid the seg-fault like this:
_________________________________________________
void*
slist_anchor_pop(
  slist_anchor* const _this
) {
  PROXYGC_ACQUIRE();
  slist_anchor cmp, xchg;
  do {
    cmp = *_this;
    if (! cmp) { return 0; }
  } while (! CAS(_this, cmp, cmp->nx));
  void* state = cmp->state;
  PROXYGC_RELEASE_AND_DEFER(cmp);
  return state;
}
_________________________________________________


Using PDR within the context of a lock-free stack means that you non longer 
need to worry about ABA or any reclamation issue for that matter. This seems 
Kosher to me. I don't like my apps to seg-fault!

;^0 

0
Reply Chris 12/7/2007 7:22:24 AM

<anthony.ajw@gmail.com> wrote in message 
news:24a8669e-683f-40df-81af-788b957d8ffd@w56g2000hsf.googlegroups.com...
> On 5 Dec, 22:36, "Chris Thomasson" <cris...@comcast.net> wrote:
>> <anthony....@gmail.com> wrote in message
>>
>> news:337e4488-ddfc-4bea-8524-b76ef9791590@y43g2000hsy.googlegroups.com...
>>
>> > On 4 Dec, 22:10, "Chris Thomasson" <cris...@comcast.net> wrote:
>> >> "Anthony Williams" <anthony_w....@yahoo.com> wrote in message
>>
>> >>news:8x4a4rh1.fsf@yahoo.com...
>> >> [...]
>>
>> > I would be interested in your comments on the new scheme I just
>> > suggested:
>>
>> [...]
>>
>> This might work:
>> ___________________________________________________________
[...]
>> ___________________________________________________________
>>
>> You don't need the full membar in the unlock function here because the 
>> store
>> is followed by a load to the same location.
>
> How is that better than my original code?
[...]

Well, if your code can avoid the fence in between the store_release and 
subsequent load_acquire in the unlock function, then its not any better at 
all:

http://groups.google.com/group/comp.programming.threads/msg/e15e510849374f97 

0
Reply Chris 12/7/2007 7:29:47 AM

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:6ae21ef9-509d-4f4d-9bec-4027124ded64@t1g2000pra.googlegroups.com...
> On Dec 7, 10:22 am, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> Using PDR within the context of a lock-free stack means that you non 
>> longer
>> need to worry about ABA or any reclamation issue for that matter. This 
>> seems
>> Kosher to me. I don't like my apps to seg-fault!
>
>
> I think it will be not very Kosher if Microsoft will require that
> *ALL* user programs must call something like PdrSynchronize() in *ALL*
> thread *PERIODICALLY*.

http://groups.google.com/group/comp.programming.threads/msg/9134192beca91c5a

http://groups.google.com/group/comp.programming.threads/msg/890bb145ba5606ef

Windows is a platform in which you can do automatic epoch detection. You 
don't have to explicitly call PdrSynchronize at all.


> And all old code on Win32 will be broken,
> including all scripts, java, delphi etc.

This is Microsoft's problem. 

0
Reply Chris 12/7/2007 7:52:29 PM

<anthony.ajw@gmail.com> wrote in message 
news:d2751975-f2df-4edb-894c-0179c88c1f33@s8g2000prg.googlegroups.com...
> On 6 Dec, 19:04, "Chris Thomasson" <cris...@comcast.net> wrote:
>> <anthony....@gmail.com> wrote in message
>>
>> news:5bccd7d7-016f-4b10-a2fe-b1b86c7ae0d4@v4g2000hsf.googlegroups.com...> 
>> On 5 Dec, 22:11, "Chris Thomasson" <cris...@comcast.net> wrote:
>> >> <anthony....@gmail.com> wrote in message
>> [...]
>> >> >    void unlock()
>> >> >    {
>> >> >        store_release(locked,0);
>> >> >        _ReadWriteBarrier(); // don't move the read of counter before
>> >> > the
>> >> >                             // write of locked
>>
>> >> I assume that _ReadWriteBarrier is a #StoreLoad | #StoreStore right?
>>
>> > No, it's a compiler barrier that prevents rearranging the code, rather
>> > than a membar.
>>
>> [...]
>>
>> A compiler barrier is not going to prevent the x86 from hoisting the load
>> above the store.
>
>> Unless I am missing something, this will cause problems in the algorithm
>> as-is. I think your going to need to do a fence:
>
> You're right. Any ideas how a store/fence/load compares to
> InterlockedExchangeAdd?

Since InterlockedExchangeAdd executes a fence, its most likely just as 
expensive. However, when you compare it to InterlockedExchangeAddRelease, 
then the fence will start to cause to unneeded overhead... 

0
Reply Chris 12/7/2007 8:42:29 PM

"Chris Thomasson" <cristom@comcast.net> wrote in message 
news:wqednfydDszfu8ranZ2dnUVZ_o-mnZ2d@comcast.com...
> <anthony.ajw@gmail.com> wrote in message 
> news:337e4488-ddfc-4bea-8524-b76ef9791590@y43g2000hsy.googlegroups.com...
>> On 4 Dec, 22:10, "Chris Thomasson" <cris...@comcast.net> wrote:
>>> "Anthony Williams" <anthony_w....@yahoo.com> wrote in message
>>>
>>> news:8x4a4rh1.fsf@yahoo.com...
>>> [...]
>>>
>>>
>>>
>>> > What does Chris's InterlockedIncrement/InterlockedDecrement 
>>> > implementation
>>> > look like?
>>>
>>> Here is the code:
>>>
>>> http://appcore.home.comcast.net/misc/winsync_v0001_rwmutex_hpp.html
>>
>> That's a read-write mutex, not a plain mutex (which is what my article
>> was about). Also, the write-lock portion (which is the closest to a
>> normal mutex) starts with "EnterCriticalSection" --- not exactly the
>> optimized code I expected.
>>
>> I would be interested in your comments on the new scheme I just
>> suggested:
> [...]
>
> This might work:

Let try any refine this a little bit:

> ___________________________________________________________
[...]
enum state_e {
  UNLOCKED = 0,
  LOCKED = 0x1,
  WAITER = 0x2,
  CONTENTION = (LOCKED | WAITER)
};

void lock() {
  if (ATOMIC_BTS(&m_state, LOCKED)) {
    while (ATOMIC_SWAP(&m_state, CONTENTION) & LOCKED) {
      WaitForSingleObject(m_wset);
    }
  }
  membar #StoreLoad | #StoreStore;
}

void unlock() {
  long const state = ATOMIC_LOAD(&m_state);
  membar #LoadStore | #StoreStore;
  ATOMIC_STORE(m_state, UNLOCKED);
  long const cmp = ATOMIC_LOAD(&m_state);
  if (state & WAITER || cmp & WAITER) {
    SetEvent(m_wset);
  }
}
> ___________________________________________________________
[...]


Humm. Can you see a race-condition in the unlock function? I can't seem to 
find one right now. 

0
Reply Chris 12/7/2007 10:23:53 PM

[...]
>> This might work:
> 
> Let try any refine this a little bit:
^^^^^^^^^^^^^^^^^^^

Let me try an refine this a little bit here:
>> ___________________________________________________________
[...]

It sounded kind of retarded before.
0
Reply Chris 12/7/2007 10:37:51 PM

"Alexander Terekhov" <terekhov@web.de> wrote in message 
news:4758294E.678F2770@web.de...
>
> anthony.ajw@gmail.com wrote:
> [...]
>> > I like Alex Terekhov's mutex implementation myself. IIRC, it was 
>> > something
>> > like:
>> > _________________________________________________________________
>> > class mutex {
>> >   long m_state;
>> >   HANDLE m_wset;
>> >
>> >   enum state_e {
>> >     UNLOCKED = 0,
>> >     LOCKED = 1,
>> >     CONTENTION = 2
>> >   };
>> >
>> > public:
>> >   mutex() :
>> >     m_state(UNLOCKED),
>> >     m_wset(CreateEvent(0, FALSE, FALSE, 0)) {
>> >     if (! m_wset) { throw std::exception(); }
>> >   }
>> >
>> >   ~mutex() throw() {
>> >     if (! CloseHandle(m_wset)) { assert(false); }
>> >   }
>> >
>> > public:
>> >   void lock() throw() {
>> >     if (InterlockedExchangeAcquire(&m_state, LOCKED)) {
>> >       Sleep(1);
>> >       while (InterlockedExchangeAcquire(&m_state, CONTENTION)) {
>> >         if (WaitForSingleObject(m_wset) != WAIT_OBJECT_0) {
>> >           assert(false); std::unexpected();
>> >         }
>> >       }
>> >     }
>> >   }
>> >
>> >   void unlock() throw() {
>> >     if (InterlockedExchangeRelease(&m_state, UNLOCKED) == CONTENTION) {
>> >       if (! SetEvent(m_wset)) {
>> >         assert(false); std::unexpected();
>> >       }
>> >     }
>> >   }};
>>
>> That's pretty good, but I'm not sure I like it. If thread A has the
>> lock, and thread B is waiting, m_state is CONTENTION. If thread C then
>> sets the state to LOCKED and gets preempted in the Sleep, and thread A
>> unlocks whilst thread C is sleeping, thread A will see a value of
>> LOCKED rather than CONTENTION, so it won't trigger the event. Thread B
>> will then not be woken, and can't take the lock, even though thread C
>> has just gone to Sleep.
>
> Having Sleep() there makes no sense. Chris' IIRC is not quite C. :-)
>
> However, if the only atomic RMW available is swap then we could indeed
> have a temporary loss of contention indication. Atomic bit test and
> set on x86 (to get rid of that problem) aside for a moment, a bit more
> powerful archs (like the current Xbox for example) can do something
> along the lines of the following:
>
> // doesn't provide "POSIX-safety" with respect to destruction
> class mutex_for_Xbox_360 {
[...]
> };

Humm. I am not sure I like the LL/SC version compared to the interlocked 
swap/bts one. I think the latter can outperform it because the code to 
perform the swap instruction on a system that only has LL/SC is something 
like:
__________________________________________
extern "C" void*
ATOMIC_SWAPPTR(
  void* volatile* const shared,
  void* const local;
) {
  void* prev;
  do {
    prev = LL(shared);
  } while(! SC(shared, local));
  return prev;
}
__________________________________________


instead of:
__________________________________________
extern "C" void*
ATOMIC_SWAPPTR(
  void* volatile* const shared,
  void* const local;
) {
  return XCHG(shared, local);
}
__________________________________________


The LL/SC version has more overhead, and its not wait-free...

Humm...

0
Reply Chris 12/7/2007 10:54:27 PM

In article <8padna4Eq-ULVMTanZ2dnUVZ_sGvnZ2d@comcast.com>, 
cristom@comcast.net says...
> [...]
> >> This might work:
> > 
> > Let try any refine this a little bit:
> ^^^^^^^^^^^^^^^^^^^
> 
> Let me try an refine this a little bit here:

If you're going to correct it, ITYM:

Let me try and refine this a little bit here:

-- 
    Later,
    Jerry.

The universe is a figment of its own imagination.
0
Reply Jerry 12/8/2007 4:57:29 AM

Chris Thomasson wrote:
[...]
> void unlock() {
>   long const state = ATOMIC_LOAD(&m_state);
>   membar #LoadStore | #StoreStore;
>   ATOMIC_STORE(m_state, UNLOCKED);
>   long const cmp = ATOMIC_LOAD(&m_state);
>   if (state & WAITER || cmp & WAITER) {
>     SetEvent(m_wset);
>   }
> }
> > ___________________________________________________________
> [...]
> 
> Humm. Can you see a race-condition in the unlock function? I can't seem to
> find one right now.

Think of waiters coming in between m_state store and first load.

Take a break. :-)

regards,
alexander.
0
Reply Alexander 12/8/2007 6:21:10 PM

"Alexander Terekhov" <terekhov@web.de> wrote in message 
news:475AE096.663C5EFC@web.de...
>
> Chris Thomasson wrote:
> [...]
>> void unlock() {
>>   long const state = ATOMIC_LOAD(&m_state);
>>   membar #LoadStore | #StoreStore;
>>   ATOMIC_STORE(m_state, UNLOCKED);
>>   long const cmp = ATOMIC_LOAD(&m_state);
>>   if (state & WAITER || cmp & WAITER) {
>>     SetEvent(m_wset);
>>   }
>> }
>> > ___________________________________________________________
>> [...]
>>
>> Humm. Can you see a race-condition in the unlock function? I can't seem 
>> to
>> find one right now.
>
> Think of waiters coming in between m_state store and first load.

Yup.

void lock() {
L1:  if (ATOMIC_BTS(&m_state, LOCKED)) {
L2:    while (ATOMIC_SWAP(&m_state, CONTENTION) & LOCKED) {
L3:      WaitForSingleObject(m_wset);
    }
  }
  membar #StoreLoad | #StoreStore;
}

void unlock() {
U1:  long const state = ATOMIC_LOAD(&m_state);
  membar #LoadStore | #StoreStore;
U2:  ATOMIC_STORE(m_state, UNLOCKED);
U3:  long const cmp = ATOMIC_LOAD(&m_state);
U4:  if (state & WAITER || cmp & WAITER) {
    SetEvent(m_wset);
  }
}


The following execution sequence will miss a waiter:

ThreadA: U1 - loads LOCKED
ThreadB: L1 - sets LOCKED
ThreadB: L2 - stores CONTENTION
ThreadB: L3 - waits
ThreadA: U2 - stores UNLOCKED
ThreadA: U3 - loads UNLOCKED
ThreadA: U4 - misses waiter.


Is that the condition you are referring to?




> Take a break. :-)

;^)

Well, I think your SWAP version is about as good as its going to get. Even 
though it can issue a superfluous wakeup at the end of a contention 
interval:

ThreadA: SWAP(LOCKED)     - Has the lock
ThreadB: SWAP(LOCKED)     - Contention!
ThreadA: SWAP(UNLOCKED)   - Issues nothing
ThreadB: SWAP(CONTENTION) - Has the lock
ThreadB: SWAP(UNLOCKED)   - Issues superfluous wakeup


That's fine with me. 

0
Reply Chris 12/8/2007 10:42:46 PM

"Chris Thomasson" <cristom@comcast.net> wrote in message 
news:qLOdnafjJvLQW8TanZ2dnUVZ_oytnZ2d@comcast.com...
> "Chris Thomasson" <cristom@comcast.net> wrote in message 
> news:wqednfydDszfu8ranZ2dnUVZ_o-mnZ2d@comcast.com...
>> <anthony.ajw@gmail.com> wrote in message 
>> news:337e4488-ddfc-4bea-8524-b76ef9791590@y43g2000hsy.googlegroups.com...
>>> On 4 Dec, 22:10, "Chris Thomasson" <cris...@comcast.net> wrote:
>>>> "Anthony Williams" <anthony_w....@yahoo.com> wrote in message
>>>>
>>>> news:8x4a4rh1.fsf@yahoo.com...
>>>> [...]
>>>>
>>>>
>>>>
>>>> > What does Chris's InterlockedIncrement/InterlockedDecrement 
>>>> > implementation
>>>> > look like?
>>>>
>>>> Here is the code:
>>>>
>>>> http://appcore.home.comcast.net/misc/winsync_v0001_rwmutex_hpp.html
>>>
>>> That's a read-write mutex, not a plain mutex (which is what my article
>>> was about). Also, the write-lock portion (which is the closest to a
>>> normal mutex) starts with "EnterCriticalSection" --- not exactly the
>>> optimized code I expected.
>>>
>>> I would be interested in your comments on the new scheme I just
>>> suggested:
>> [...]
>>
>> This might work:
>
> Let try any refine this a little bit:
>
>> ___________________________________________________________
> [...]
> enum state_e {
>  UNLOCKED = 0,
>  LOCKED = 0x1,
>  WAITER = 0x2,
>  CONTENTION = (LOCKED | WAITER)
> };
>
[...]

To correct the following:

> void unlock() {
>  long const state = ATOMIC_LOAD(&m_state);
>  membar #LoadStore | #StoreStore;
>  ATOMIC_STORE(m_state, UNLOCKED);
>  long const cmp = ATOMIC_LOAD(&m_state);
>  if (state & WAITER || cmp & WAITER) {
>    SetEvent(m_wset);
>  }
> }


void unlock() {
  membar #LoadStore | #StoreStore;
  if (ATOMIC_SWAP(&m_state, UNLOCKED) & WAITER) {
    SetEvent(m_wset);
  }
}



>> ___________________________________________________________
> [...]


0
Reply Chris 12/8/2007 10:44:33 PM

"Chris Thomasson" <cristom@comcast.net> wrote in message 
news:pqadnZhWe_yngcbanZ2dnUVZ_uWlnZ2d@comcast.com...
> "Alexander Terekhov" <terekhov@web.de> wrote in message 
> news:475AE096.663C5EFC@web.de...
[...]
>
> Well, I think your SWAP version is about as good as its going to get. Even 
> though it can issue a superfluous wakeup at the end of a contention 
> interval:
>
> ThreadA: SWAP(LOCKED)     - Has the lock
> ThreadB: SWAP(LOCKED)     - Contention!
> ThreadA: SWAP(UNLOCKED)   - Issues nothing
> ThreadB: SWAP(CONTENTION) - Has the lock
> ThreadB: SWAP(UNLOCKED)   - Issues superfluous wakeup
>
>
> That's fine with me.

I wonder how much better a waiters-bit solution would actually be, other 
than POSIX safety on sync object destruction. 

0
Reply Chris 12/9/2007 12:15:37 AM

Chris Thomasson wrote:
[...]
> The following execution sequence will miss a waiter:
> 
> ThreadA: U1 - loads LOCKED
> ThreadB: L1 - sets LOCKED
> ThreadB: L2 - stores CONTENTION
> ThreadB: L3 - waits
> ThreadA: U2 - stores UNLOCKED
> ThreadA: U3 - loads UNLOCKED
> ThreadA: U4 - misses waiter.
> 
> Is that the condition you are referring to?

Yes.

> 
> > Take a break. :-)
> 
> ;^)
> 
> Well, I think your SWAP version is about as good as its going to get. Even
> though it can issue a superfluous wakeup at the end of a contention
> interval:
> 
> ThreadA: SWAP(LOCKED)     - Has the lock
> ThreadB: SWAP(LOCKED)     - Contention!
> ThreadA: SWAP(UNLOCKED)   - Issues nothing
> ThreadB: SWAP(CONTENTION) - Has the lock
> ThreadB: SWAP(UNLOCKED)   - Issues superfluous wakeup
> 
> That's fine with me.

It is much, much better in this respect (setting event with no 
waiters pending) than Anthony's scheme with waiters counter.

regards,
alexander.
0
Reply Alexander 12/10/2007 9:17:20 AM

Alexander Terekhov <terekhov@web.de> writes:

>> Well, I think your SWAP version is about as good as its going to get. Even
>> though it can issue a superfluous wakeup at the end of a contention
>> interval:
>> 
>> ThreadA: SWAP(LOCKED)     - Has the lock
>> ThreadB: SWAP(LOCKED)     - Contention!
>> ThreadA: SWAP(UNLOCKED)   - Issues nothing
>> ThreadB: SWAP(CONTENTION) - Has the lock
>> ThreadB: SWAP(UNLOCKED)   - Issues superfluous wakeup
>> 
>> That's fine with me.
>
> It is much, much better in this respect (setting event with no 
> waiters pending) than Anthony's scheme with waiters counter.

"Much, much better" is quite an improvement! Care to elaborate?

Anthony
-- 
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
0
Reply Anthony 12/10/2007 3:31:46 PM

Anthony Williams wrote:
> 
> Alexander Terekhov <terekhov@web.de> writes:
> 
> >> Well, I think your SWAP version is about as good as its going to get. Even
> >> though it can issue a superfluous wakeup at the end of a contention
> >> interval:
> >>
> >> ThreadA: SWAP(LOCKED)     - Has the lock
> >> ThreadB: SWAP(LOCKED)     - Contention!
> >> ThreadA: SWAP(UNLOCKED)   - Issues nothing
> >> ThreadB: SWAP(CONTENTION) - Has the lock
> >> ThreadB: SWAP(UNLOCKED)   - Issues superfluous wakeup
> >>
> >> That's fine with me.
> >
> > It is much, much better in this respect (setting event with no
> > waiters pending) than Anthony's scheme with waiters counter.
> 
> "Much, much better" is quite an improvement! Care to elaborate?

Assume one waiter blocked on event. Lock owner does unlock() 
and sets the event but that doesn't mean that waiter will 
proceed immediately and decrement waiters count (suppose both 
threads are bound to the same processor and old owner is higher 
priority thread or equal priority thread but at the beginning 
of its timeslice). Next, old owner locks mutex once again, 
unlocks... and again enters slow path in unlock(). That can 
repeat many, very many times until waiters count decrement.

regards,
alexander.
0
Reply Alexander 12/10/2007 6:47:29 PM

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:1993eca1-77be-4742-8387-449e41830988@s12g2000prg.googlegroups.com...
> On Dec 7, 10:52 pm, "Chris Thomasson" <cris...@comcast.net> wrote:
>
>> Windows is a platform in which you can do automatic epoch detection. You
>> don't have to explicitly call PdrSynchronize at all.
>
>
> Interesting.

The automatic epoch detection is used to determine when a CPU, or group of 
CPUS, have executed membars.


> Don't forgot that SList is intrusive. How you are going
> to detect that user's code doesn't hold any references in arbitrary
> form to arbitrary object without any additional helper calls like
> release_object()?
> With stack analyzing? This won't work in C/C++ :)

Microsoft would have to be able to determine that its freeing a SList node. 
Something like:
_________________________________________________
void free(void* ptr) {
  if (ptr) {
    if (NtIsPtrSList(ptr)) {
      NtPdrDefer(ptr);
    } else {
      NtHeapFree(ptr);
    }
  }
}
_________________________________________________



They would need to add something like this to their pop code:
_________________________________________________
void*
slist_anchor_pop(
  slist_anchor* const _this
) {
  NtPdrAcquire(_this);
  slist_anchor cmp, xchg;
  do {
    cmp = *_this;
    if (! cmp) { return 0; }
  } while (! CAS(_this, cmp, cmp->nx));
  void* state = cmp->state;
  NtPdrRelease(_this);
  return state;
}
_________________________________________________



>> > And all old code on Win32 will be broken,
>> > including all scripts, java, delphi etc.
>>
>> This is Microsoft's problem.
>
>
> And exactly because this they don't relay on PDR

Yup. 

0
Reply Chris 12/10/2007 7:38:27 PM

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:4ca830f0-5e24-4450-9720-59eb8e640512@q60g2000hsb.googlegroups.com...
On 10 ���, 22:38, "Chris Thomasson" <cris...@comcast.net> wrote:
>
> > Don't forgot that SList is intrusive. How you are going
> > to detect that user's code doesn't hold any references in arbitrary
> > form to arbitrary object without any additional helper calls like
> > release_object()?
> > With stack analyzing? This won't work in C/C++ :)
>
> Microsoft would have to be able to determine that its freeing a SList 
> node.
> Something like:
[...]
> Not so fast ;)
[...]

> Do you still think that SEH is very bad? ;)

I think its a more of a hack than anything because seg-fault is a sign of a 
significant bug in the code, IMO of course. However, Microsoft can simply 
create a new proxy collector synchronization primitive and use that for PDR. 
The API will have absolutely no prior dependencies because it would be brand 
new. 

0
Reply Chris 12/11/2007 11:55:21 PM

<anthony.ajw@gmail.com> wrote in message 
news:ddd50164-9f75-460c-b5da-08345a32ba4f@d4g2000prg.googlegroups.com...
> On 10 Dec, 18:47, Alexander Terekhov <terek...@web.de> wrote:
>> Anthony Williams wrote:
>>
>> > Alexander Terekhov <terek...@web.de> writes:
>>
>> > >> Well, I think your SWAP version is about as good as its going to 
>> > >> get. Even
>> > >> though it can issue a superfluous wakeup at the end of a contention
>> > >> interval:
>>
>> > >> ThreadA: SWAP(LOCKED)     - Has the lock
>> > >> ThreadB: SWAP(LOCKED)     - Contention!
>> > >> ThreadA: SWAP(UNLOCKED)   - Issues nothing
>> > >> ThreadB: SWAP(CONTENTION) - Has the lock
>> > >> ThreadB: SWAP(UNLOCKED)   - Issues superfluous wakeup
>>
>> > >> That's fine with me.
>>
[...]
> The swap-based scheme can easily generate spurious signals on
> contention too:
[...]

Right. The algorithm will _always_ execute a superfluous-wake at the end of 
a contention interval:

http://groups.google.com/group/comp.programming.threads/msg/9fe0f8ff3e1c4cd3
(relevant part at end of post...)

If the OS can optimize the superfluous-wake away into fast-path (e.g., no 
waiters == NOP) then the performance hit should be minimal. In this case it 
would be a QOI issue on the implementation of the event synchronization 
object in Windows. You can do a waiters-bit implementation to get some 
better information on the waiters. Something like:

________________________________________________________________
#define WAITER_BIT() 0x80000000
#define LOCKED_BIT() 0x1

static ATOMIC_WORD_t m_state = 0;


void lock() {
  while(ATOMIC_BTS(&m_state, LOCKED_BIT())) {
    // wait if lock-bit is set
    WAITSET_WAIT(&m_state, LOCKED_BIT());
  }
  membar #StoreLoad | #StoreStore;
}

void unlock() {
  membar #LoadStore | #StoreStore;
  if (ATOMIC_SWAP(&m_state) & WAITER_BIT()) {
    WAITSET_WAKE(&m_state, 1); // wake one
  }
}
________________________________________________________________



IIRC, the implementation of WAITSET_WAIT would be something like:
________________________________________________________________
int WAITSET_WAIT(
  ATOMIC_WORD_t *pstate,
  ATOMIC_WORD_t state_cmp
) {
  ATOMIC_WORD_t cmp;
  int waitcount, need_to_wait = 0;

  MEMPAGESYS_PIN(pstate);
  WAITSETSYS_LOCK(pstate);

  waitcount = WAITSETSYS_GETCOUNT();

  do {
    cmp = *pstate;
    if ((cmp & ~WAITERS_BIT()) != state_cmp && ! waitcount) {
      goto bye:
    }
  } while (! ATOMIC_CAS(pstate, cmp, cmp | WAITERS_BIT()));

  if ((cmp & ~WAITERS_BIT()) == state_cmp) {
    WAITSETSYS_THREADPUSH(pstate); // push calling thread
  }

bye:
  WAITSETSYS_UNLOCK(pstate);
  MEMPAGESYS_UNPIN(pstate);

  if ((cmp & ~WAITERS_BIT()) == state_cmp) {
    WAITSETSYS_THREADWAIT(pstate); // wait on thread gate
  }

  return 0;
}
________________________________________________________________



0
Reply Chris 12/16/2007 12:21:59 AM

"Zeljko Vrba" <zvrba.nospam@ieee-sb1.cc.fer.hr> wrote in message 
news:slrnfla6fk.h50.zvrba@ieee-sb1.cc.fer.hr...
> On 2007-12-04, Dmitriy Vyukov <dvyukov@gmail.com> wrote:
>>
>> In Unix world paging fault (they call this segmentation fault :) ) is
>> a really fearful thing without appropriate means to cope with.
>>
> There are appropriate means to deal with it: see sigaction(2) with
> SA_SIGINFO flag, as well as the ucontext structure which contains
> the complete machine context.  Yes, it's platform-dependent, but it
> _can_ be dealt with.  (Also see libsigsegv).

I thought that the C Standard says something about returning from a 
signal-handler with a signal value of SIGFPE, SIGILL, SIGSEGV results in 
undefined behavior. 

0
Reply Chris 12/16/2007 11:04:29 PM

Chris Thomasson wrote:

> I thought that the C Standard says something 

see google
0
Reply alfred 12/17/2007 4:00:14 AM

Chris Thomasson writes:
>"Zeljko Vrba" <zvrba.nospam@ieee-sb1.cc.fer.hr> wrote in message
>> There are appropriate means to deal with it: see sigaction(2) with
>> SA_SIGINFO flag, as well as the ucontext structure which contains
>> the complete machine context.  Yes, it's platform-dependent, but it
>> _can_ be dealt with.  (Also see libsigsegv).
>
> I thought that the C Standard says something about returning from a
> signal-handler with a signal value of SIGFPE, SIGILL, SIGSEGV results in
> undefined behavior.

Yes, under 7.14.1.1p3 (The signal function).  Other standards may define
undefined (and implementation-defined/unspecified) behavoir further -
the standard which defines sigaction().  Undefined behaviour doesn't
mean the compiler is required to output misbehaving code:-)

-- 
Hallvard
0
Reply Hallvard 12/17/2007 6:05:02 AM

I've been toying with adding an extra flag bit to my waiter_count
implementation:

class mutex()
{
    long flag;
    HANDLE event;

    static unsigned const locked_flag_bit=31;
    static unsigned const event_flag_bit=30;

    static long const locked_value=1<<locked_flag_bit;
    static long const event_set_value=1<<event_flag_bit;

public:
    mutex():
        flag(0),event(CreateEvent(0,0,0,0))
    {}

    void lock()
    {
        long old=flag;
        while(true)
        {
            long const current=InterlockedCompareExchange(&flag,(old+1)|locked_value,old);
            if(current==old) break;
            old=current;
        }
        
        if(old&locked_value)
        {
            // slow path: mutex already locked
            ++old; // we're waiting too
            bool lock_acquired=false;
            do
            {
                if(WaitForSingleObject(event,INFINITE))
                {
                    assert(false);
                }
                old-=(locked_value+1); // let's hope it's unlocked
                old|=event_set_value; // event was probably set, because it woke us.
                while(true)
                {
                    long const current=InterlockedCompareExchange(&flag,(old&~event_set_value)|locked_value,old);
                    if(current==old) break;
                    old=current;
                }
                lock_acquired=!(old&locked_value);
            }while(!lock_acquired);
        }
    }

    void unlock()
    {
        long const offset=locked_value+1;
        long const old=InterlockedExchangeAdd(&flag,(~offset)+1);
        if(!(old&event_set_value) && (old>offset))
        {
            // slow path: there are waiters and the event isn't set
            if(!InterlockedBitTestAndSet(&flag,event_flag_bit))
            {
                // it was us that set the flag, set the event
                SetEvent(event);
            }
        }
    }
};

For an uncontended lock(), it does one CAS loop. For a contended lock, it
waits on the event, and one CAS loop between waits.

For an unlock() with no contention it does one XADD. If there are waiters, and
they haven't already been signalled, set the "signalled" flag (LOCK BTS), and
set the event if it was really us that set it.

For a timed lock, we can decrement the count when the wait times out to
avoid spurious SetEvent calls.

This way, if a high priority thread has the lock, and other threads are
waiting, and it unlocks and relocks the mutex before they wake, then the event
will still only be set once, until that wake has been consumed.

Compared to the swap-based method, we have a CAS loop in lock on entry and
every wake, vs two interlocked ops on entry, and one on every wake.

In unlock, we have one interlocked op on the fast path, the same, and an extra
interlocked op on the slow path, vs fewer unnecessary SetEvent calls.

Anthony
-- 
Anthony Williams
Just Software Solutions Ltd - http://www.justsoftwaresolutions.co.uk
Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL
0
Reply Anthony 12/17/2007 10:15:27 AM

"Anthony Williams" <anthony_w.geo@yahoo.com> wrote in message 
news:lk7ty6dc.fsf@yahoo.com...
[...]
> Compared to the swap-based method, we have a CAS loop in lock on entry and
> every wake, vs two interlocked ops on entry, and one on every wake.

The swap-based method has a single interlocked op on fast-path uncontended 
lock. On the slow-path it enters a swap-loop. After its in the loop, it will 
only need a single swap instruction per-loop iteration:
____________________________________________
void lock() {
  if (ATOMIC_SWAP(&m_state, LOCKED)) {
    /* slow-path; enter wait loop */
    while (ATOMIC_SWAP(&m_state, CONTENTION)) {
      WaitForSingleObject(m_wset, INFINITE);
    }
  }
  membar #StoreLoad | #StoreStore;
}
____________________________________________



> In unlock, we have one interlocked op on the fast path, the same, and an 
> extra
> interlocked op on the slow path, vs fewer unnecessary SetEvent calls.
[...]

fewer calls to SetEvent is fine by me. The only quibble is that you need to 
go through a lock-free CAS-Loop for the initial fast-path. The swap-based 
version has a wait-free (e.g., loop-free) fast-path. IMHO, that is a winner 
for me. The caveat is the spurious wait at the end of every single 
contention-interval. Also, you can do Alex's algorithm on an ARM9 that only 
has a SWAP operation. Newer ARMS have LL/SC. 

0
Reply Chris 12/17/2007 9:33:35 PM

> I think its a more of a hack than anything because seg-fault is a sign of 
> a significant bug in the code, IMO of course. However, Microsoft can 
> simply

Only if the design doesn't leverage faults.  Virtual memory is a good 
example where no one would say trapping a seg fault is due to a code bug.

> create a new proxy collector synchronization primitive and use that for 
> PDR. The API will have absolutely no prior dependencies because it would 
> be brand new. 


0
Reply Ben 1/8/2008 8:01:24 PM

"Ben Voigt [C++ MVP]" <rbv@nospam.nospam> wrote in message 
news:ebxCzDjUIHA.748@TK2MSFTNGP04.phx.gbl...
>> I think its a more of a hack than anything because seg-fault is a sign of 
>> a significant bug in the code, IMO of course. However, Microsoft can 
>> simply
>
> Only if the design doesn't leverage faults.  Virtual memory is a good 
> example where no one would say trapping a seg fault is due to a code bug.

Touch�


>
>> create a new proxy collector synchronization primitive and use that for 
>> PDR. The API will have absolutely no prior dependencies because it would 
>> be brand new.
>
> 

0
Reply Chris 1/9/2008 1:23:53 AM

"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:f8c12d8d-3057-45a9-893f-66203c957f34@i12g2000prf.googlegroups.com...
On Jan 9, 4:23 am, "Chris Thomasson" <cris...@comcast.net> wrote:
> "Ben Voigt [C++ MVP]" <r...@nospam.nospam> wrote in 
> messagenews:ebxCzDjUIHA.748@TK2MSFTNGP04.phx.gbl...
>
> >> I think its a more of a hack than anything because seg-fault is a sign 
> >> of
> >> a significant bug in the code, IMO of course. However, Microsoft can
> >> simply
>
> > Only if the design doesn't leverage faults.  Virtual memory is a good
> > example where no one would say trapping a seg fault is due to a code 
> > bug.
>
> Touch�


bit stealing from pointers
index offset
deferred memory reclamation
hazard pointers
elimination back-off
recursive split-ordering
inconsistent back-link hint
obstruction-free
ABA tags

hmmmm... imho "infrequent paging fault" fits in this list :)


If you want clean and straightforward code without tricks so you
better use locks :)))


Dmitriy V'jukov 

0
Reply Chris 1/10/2008 6:45:55 AM

[forget that last post as it was in error.]


"Dmitriy Vyukov" <dvyukov@gmail.com> wrote in message 
news:f8c12d8d-3057-45a9-893f-66203c957f34@i12g2000prf.googlegroups.com...
On Jan 9, 4:23 am, "Chris Thomasson" <cris...@comcast.net> wrote:
[...]

> hmmmm... imho "infrequent paging fault" fits in this list :)


> If you want clean and straightforward code without tricks so you
> better use locks :)))

Even locking algorithms use tricks... However, I would not call handling a 
segmentation-fault a trick... I am still convinced that its a hack around a 
real problem wrt application code... 

0
Reply Chris 1/10/2008 6:47:48 AM

94 Replies
218 Views

(page loaded in 1.189 seconds)

Similiar Articles:




7/24/2012 5:29:38 AM


Reply: