Possible distributed lock issue

One of my customers has two RX2620 Itaniums in a cluster (OpenVMS 8.3,
current patches) connected to a MSA1000 disk system. It is under HP
support and the MSA firmware is up to date and the fiber adapters are
scheduled to be brought up to the latest firmware soon. There is a cat 6
crossover cable between EIB0 on both systems set up as 1000Mbit full
duplex with no errors showing on the ethernet ports.

The first system had one job that just did a "$DIR [x...]/siz/grand" on
each top level directory on each disk to produce a disk space usage
report. That system showed a lock rate of about 6000 locks/sec.

The second system had one batch job that was reading a health claim
input file and matching the claims to the doctor/hospital based on a
variety of criteria. It reads a lot of records to find the one that best
matches. This process was getting about 4% cpu and showed about 3000
locks/sec and processed about 100 claims in two hours.

When I suspended the DIR job on the first system the second batch job on
the other system jumped to between 90 and 100% cpu with a lock rate of
over 120,000 locks/sec and was then processing about 50 claims/minute. I
then resumed the first job and the lock rate on the first system went to
about 3000 locks/sec with no visible effect on the second job.
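
A rough sanity check of the figures above, as a minimal Python sketch. The numbers are the ones quoted in this post; the variable names are illustrative only, and "over 120,000" is treated as exactly 120,000, so the ratios are approximate.

```python
# Figures quoted in the post; names are illustrative, not from any tool output.
claims_slow = 100                 # claims processed while the DIR job ran
minutes_slow = 2 * 60             # ...over two hours
rate_slow = claims_slow / minutes_slow   # claims per minute (slow case)
rate_fast = 50.0                  # claims per minute after DIR was suspended

locks_slow = 3_000                # locks/sec on system 2 (slow case)
locks_fast = 120_000              # locks/sec on system 2 (lower bound, fast case)

# How much faster the job ran, and how much the lock rate rose.
throughput_ratio = rate_fast / rate_slow
lock_ratio = locks_fast / locks_slow

# Locks per claim = (locks/sec * 60 sec/min) / (claims/min).
locks_per_claim_slow = locks_slow * 60 / rate_slow
locks_per_claim_fast = locks_fast * 60 / rate_fast

print(f"throughput ratio: {throughput_ratio:.1f}x")
print(f"lock-rate ratio:  {lock_ratio:.1f}x")
print(f"locks per claim (slow): {locks_per_claim_slow:,.0f}")
print(f"locks per claim (fast): {locks_per_claim_fast:,.0f}")
```

On these figures the claim rate rose about 60-fold while the lock rate rose about 40-fold, so the locks spent per claim stayed within the same order of magnitude (roughly 216,000 before versus 144,000 after). That would be consistent with the job being paced by how fast its lock requests complete, though it does not prove it.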

I have spent quite a bit of time looking into this and have a lot more
information from SCACP and ANALYZE/SYSTEM.

HP to date has only confirmed that there are no hardware problems and
has reviewed the setup on the MSA1000.

I suspect there is some kind of locking issue but am open to any
suggestions.

Jeff Coffield
Reply Jeffrey 3/19/2010 5:06:52 PM

In article <ho0avd$u46$1@news.eternal-september.org>, "Jeffrey H. Coffield" <jeffrey@digitalsynergyinc.com> writes:
> 
> I suspect there is some kind of locking issue but am open to any
> suggestions.

   Cache/buffer sizes and hit ratios?

Reply koehler 3/19/2010 5:09:59 PM



Bob Koehler wrote:
> In article <ho0avd$u46$1@news.eternal-september.org>, "Jeffrey H. Coffield" <jeffrey@digitalsynergyinc.com> writes:
>> I suspect there is some kind of locking issue but am open to any
>> suggestions.
> 
>    Cache/buffer sizes and hit ratios?
> 
System 1:

 In use (GBytes)                 1.93
 Percentage Read I/Os              93%
 Read hit rate                     86%

System 2:

 In use (GBytes)                 1.88
 Percentage Read I/Os              88%
 Read hit rate                     86%

If it was a cache issue, why would suspending a job on system 1 allow a
job on system 2 to run 25 times faster?

Jeff
Reply Jeffrey 3/19/2010 5:31:32 PM

"Jeffrey H. Coffield" <jeffrey@digitalsynergyinc.com> wrote in message 
news:ho0avd$u46$1@news.eternal-september.org...
> One of my customers has two RX2620 Itaniums in a cluster (OpenVMS 8.3,
> current patches) connected to a MSA1000 disk system. It is under HP
> support and the MSA firmware is up to date and the fiber adapters are
> scheduled to be brought up to the latest firmware soon. There is a cat 6
> crossover cable between EIB0 on both systems set up as 1000Mbit full
> duplex with no errors showing on the ethernet ports.
>
> The first system had one job that just did a "$DIR [x...]/siz/grand" on
> each top level directory on each disk to produce a disk space usage
> report. That system showed a lock rate of about 6000 locks/sec.
>
> The second system had one batch job that was reading a health claim
> input file and matching the claims to the doctor/hospital based on a
> variety of criteria. It reads a lot of records to find the one that best
> matches. This process was getting about 4% cpu and showed about 3000
> locks/sec and processed about 100 claims in two hours.
>
> When I suspended the DIR job on the first system the second batch job on
> the other system jumped to between 90 and 100% cpu with a lock rate of
> over 120,000 locks/sec and was then processing about 50 claims/minute. I
> unsuspending the first job and the lock rate on the first system went to
> about 3000 locks/sec with no visible effect on the second job.
>
> I have spent quite a bit of time looking in to this and have a lot more
> information from SCACP and ANALYZE/SYSTEM.
>
> HP to date has only confirmed that there are no hardware problems and
> have reviewed the set up on the MSA1000.
>
> I suspect there is some kind of locking issue but am open to any
> suggestions.
>
> Jeff Coffield

I'd almost bet that you are serializing thru an F11B$ resource.  Have you 
turned on lock tracing on both of these systems?  See SDA> LCK and use LCK 
START COLLECT/PROCESS  and collect stats during each behaviour. 


Reply Jilly 3/19/2010 5:37:46 PM

On Mar 19, 1:31 pm, "Jeffrey H. Coffield"
<jeff...@digitalsynergyinc.com> wrote:
> Bob Koehler wrote:
> > In article <ho0avd$u4...@news.eternal-september.org>, "Jeffrey H. Coffield" <jeff...@digitalsynergyinc.com> writes:
> >> I suspect there is some kind of locking issue but am open to any
> >> suggestions.
>
> >    Cache/buffer sizes and hit ratios?
>
> System 1:
>
>  In use (GBytes)                 1.93
>  Percentage Read I/Os              93%
>  Read hit rate                     86%
>
> System 2:
>
>  In use (GBytes)                 1.88
>  Percentage Read I/Os              88%
>  Read hit rate                     86%
>
> If it was a cache issue, why would suspending a job on system 1 allow a
> job on system 2 to run 25 times faster?
>
> Jeff

Jeff,

I would want to know more about the health claims processing job. What
files were being processed? How is the data structured?

There are many possibilities. Serialization of a Files-11 resource is
one possibility. Serialization through disk arms is another.

On your question of how the cache can be affected, consider what the cache
contains. If two consumers are competing for cache, then the hit rate
of the cache will fall. How much it will fall depends precisely on the
reference pattern of the two jobs. On caches at all levels, it is
common to see a dramatic drop in performance if the "working set" of
elements being used is larger than the available space.

It is effectively the same phenomenon as "thrashing", which refers to
what happens when the virtual-real memory ratio on a virtual storage
system gets too high.
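
The effect described above can be illustrated with a toy simulation. This is a minimal sketch, assuming a simple LRU cache shared by two jobs that each loop over their own set of blocks; the capacity, block counts, and function name are hypothetical and are not taken from the systems in this thread.

```python
from collections import OrderedDict


def lru_hit_rate(references, capacity):
    """Replay a reference stream through an LRU cache; return the hit fraction."""
    cache = OrderedDict()
    hits = 0
    for block in references:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(references)


CAPACITY = 100   # hypothetical cache size, in blocks
PASSES = 50      # how many times each job re-reads its blocks

# Job A alone: its 80-block working set fits in the cache.
job_a = [("A", i) for i in range(80)] * PASSES
alone = lru_hit_rate(job_a, CAPACITY)

# Job A interleaved with job B, which loops over a different 80 blocks.
# The combined working set (160 blocks) no longer fits.
job_b = [("B", i) for i in range(80)] * PASSES
shared = [ref for pair in zip(job_a, job_b) for ref in pair]
together = lru_hit_rate(shared, CAPACITY)

print(f"hit rate, job A alone:      {alone:.2f}")
print(f"hit rate, jobs A+B sharing: {together:.2f}")
```

With these made-up parameters the hit rate goes from nearly all hits to none once the combined working set exceeds the cache, even though each job on its own would fit. Real reference patterns are rarely this cyclic, so the drop in practice is usually less extreme.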

Details are needed to diagnose accurately.

- Bob Gezelter, http://www.rlgsc.com
Reply Bob 3/19/2010 7:38:16 PM

On Mar 19, 1:06 pm, "Jeffrey H. Coffield"
<jeff...@digitalsynergyinc.com> wrote:
> One of my customers has two RX2620 Itaniums in a cluster (OpenVMS 8.3,
> current patches) connected to a MSA1000 disk system. It is under HP
> support and the MSA firmware is up to date and the fiber adapters are
> scheduled to be brought up to the latest firmware soon. There is a cat 6
> crossover cable between EIB0 on both systems set up as 1000Mbit full
> duplex with no errors showing on the ethernet ports.
>
> The first system had one job that just did a "$DIR [x...]/siz/grand" on
> each top level directory on each disk to produce a disk space usage
> report. That system showed a lock rate of about 6000 locks/sec.
>
> The second system had one batch job that was reading a health claim
> input file and matching the claims to the doctor/hospital based on a
> variety of criteria. It reads a lot of records to find the one that best
> matches. This process was getting about 4% cpu and showed about 3000
> locks/sec and processed about 100 claims in two hours.
>
> When I suspended the DIR job on the first system the second batch job on
> the other system jumped to between 90 and 100% cpu with a lock rate of
> over 120,000 locks/sec and was then processing about 50 claims/minute. I
> unsuspending the first job and the lock rate on the first system went to
> about 3000 locks/sec with no visible effect on the second job.
>
> I have spent quite a bit of time looking in to this and have a lot more
> information from SCACP and ANALYZE/SYSTEM.
>
> HP to date has only confirmed that there are no hardware problems and
> have reviewed the set up on the MSA1000.
>
> I suspect there is some kind of locking issue but am open to any
> suggestions.
>
> Jeff Coffield


Perhaps this is just a curiosity question ... would running the DIR
job on the *same* system change the impact?


Verne Britton, WVNET
Reply Verne 3/19/2010 7:55:09 PM

snip

> > Jeff Coffield
>
> I'd almost bet that you are serializing thru an F11B$ resource.  Have you
> turned on lock tracing on both of these systems?  See SDA> LCK and use LCK
> START COLLECT/PROCESS  and collect stats during each behaviour.

Or install OpenVMS Availability Manager, which is almost always my
answer to a question about distributed locking...


Sean
Reply Curlsman 3/19/2010 8:11:07 PM


Bob Gezelter wrote:

> On your question of how can cache me affected, consider what the cache
> contains. If two consumers are competing for cache, then the hit rate
> of the cache will fall. How much it will fall depends precisely on the
> reference pattern of the two jobs. On caches at all levels, it is
> common to see a dramatic drop in performance if the "working set" of
> elements being used is larger than the available space.
> 
> It is effectively the same phenomenon as "thrashing", which refers to
> what happens when the virtual-real memory ratio on a virtual storage
> system gets too high.
> 
> Details are needed to diagnose accurately.
> 
> - Bob Gezelter, http://www.rlgsc.com

The two jobs were each on separate nodes of the cluster so I don't see
how the caching on one system could impact the other.

I will have the systems over the weekend so I plan to test the
suggestions I get here.

Jeff
Reply Jeffrey 3/19/2010 8:55:17 PM

 "Jeffrey H. Coffield" <jeffrey@digitalsynergyinc.com> wrote in message 
 news:ho0avd$u46$1@news.eternal-september.org...

> One of my customers has two RX2620 Itaniums in a cluster (OpenVMS 8.3,
> current patches) connected to a MSA1000 disk system.


  No other systems in the cluster?  (Is it safe to assume that one of
  these two has the lock directory, and one (maybe the same) has the
  lock master?)

Reply koehler 3/19/2010 9:16:00 PM

In article <ho0cdm$4nd$1@news.eternal-september.org>, "Jeffrey H. Coffield" <jeffrey@digitalsynergyinc.com> writes:
>> 
> System 1:
> 
>  In use (GBytes)                 1.93
>  Percentage Read I/Os              93%
>  Read hit rate                     86%
> 
> System 2:
> 
>  In use (GBytes)                 1.88
>  Percentage Read I/Os              88%
>  Read hit rate                     86%
> 
> If it was a cache issue, why would suspending a job on system 1 allow a
> job on system 2 to run 25 times faster?

   If they were both working through the same cache at some level, but
   accessing different parts of the disk, you could have had poor cache 
   hit ratios.

Reply koehler 3/19/2010 9:16:01 PM


Bob Koehler wrote:
> In article <ho0cdm$4nd$1@news.eternal-september.org>, "Jeffrey H. Coffield" <jeffrey@digitalsynergyinc.com> writes:
>> System 1:
>>
>>  In use (GBytes)                 1.93
>>  Percentage Read I/Os              93%
>>  Read hit rate                     86%
>>
>> System 2:
>>
>>  In use (GBytes)                 1.88
>>  Percentage Read I/Os              88%
>>  Read hit rate                     86%
>>
>> If it was a cache issue, why would suspending a job on system 1 allow a
>> job on system 2 to run 25 times faster?
> 
>    If they were both working through the same cache at some level, but
>    accessing diferent parts of the disk, you could have had poor cache 
>    hit ratios.
> 

There are only two systems in the cluster. As far as I understand the
caching is only for that system. The jobs were on two separate systems.
I believe that eliminates cache issues.

Jeff
Reply Jeffrey 3/19/2010 9:17:49 PM

On Mar 19, 4:55 pm, "Jeffrey H. Coffield"
<jeff...@digitalsynergyinc.com> wrote:
> Bob Gezelter wrote:
> > On your question of how can cache me affected, consider what the cache
> > contains. If two consumers are competing for cache, then the hit rate
> > of the cache will fall. How much it will fall depends precisely on the
> > reference pattern of the two jobs. On caches at all levels, it is
> > common to see a dramatic drop in performance if the "working set" of
> > elements being used is larger than the available space.
>
> > It is effectively the same phenomenon as "thrashing", which refers to
> > what happens when the virtual-real memory ratio on a virtual storage
> > system gets too high.
>
> > Details are needed to diagnose accurately.
>
> > - Bob Gezelter,http://www.rlgsc.com
>
> The two jobs were each on separate nodes of the cluster so I don't see
> how the caching on one system could impact the other.
>
> I will have the systems over the weekend so I plan to test the
> suggestions I get here.
>
> Jeff

Jeff,

The OpenVMS disk caches are on different systems. What are the hit
rates on the caches (e.g., SHOW MEMORY)? Also, how many IOs/second are
going to the MSA?

I was obliquely referring to the fact that many if not most storage
arrays improve performance by caching. While powerful, caching is
always a question of a degree of sleight of hand. If there is too much
activity, or the wrong sequence of activity, then the cache
effectively becomes useless, and performance degrades to that possible
with the real underlying disks. As has been often said, the laws of
physics may be stretched, but they cannot be repealed.

That said, there is a good chance that studying the problem will yield
a better understanding of precisely what is happening, and how it can
be resolved beneficially.

- Bob Gezelter, http://www.rlgsc.com
Reply Bob 3/19/2010 11:02:11 PM

In article <ho0plt$qgh$1@news.eternal-september.org>, "Jeffrey H. Coffield" <jeffrey@digitalsynergyinc.com> writes:
>
>
>Bob Koehler wrote:
>> In article <ho0cdm$4nd$1@news.eternal-september.org>, "Jeffrey H. Coffield" <jeffrey@digitalsynergyinc.com> writes:
>>> System 1:
>>>
>>>  In use (GBytes)                 1.93
>>>  Percentage Read I/Os              93%
>>>  Read hit rate                     86%
>>>
>>> System 2:
>>>
>>>  In use (GBytes)                 1.88
>>>  Percentage Read I/Os              88%
>>>  Read hit rate                     86%
>>>
>>> If it was a cache issue, why would suspending a job on system 1 allow a
>>> job on system 2 to run 25 times faster?
>> 
>>    If they were both working through the same cache at some level, but
>>    accessing diferent parts of the disk, you could have had poor cache 
>>    hit ratios.
>> 
>
>There are only two systems in the cluster. As far as I understand the
>caching is only for that system. The jobs were on two separate systems.
>I believe that eliminates cache issues.

File caching?  The cache may be local to each system but if this is a
cluster, that cache data needs to keep in step with what's on a disk.
Locks serve this purpose.

If you can outline a bit more detail of the file accesses, it might be
more apparent what's going on.

-- 
VAXman- A Bored Certified VMS Kernel Mode Hacker    VAXman(at)TMESIS(dot)ORG

  http://www.quirkfactory.com/popart/asskey/eqn2.png
  
Yeah. You know, it occurs to me that the best way you hurt rich people is by
turning them into poor people. -- Billy Ray Valentine
Reply VAXman 3/19/2010 11:16:48 PM

On Mar 19, 1:06 pm, "Jeffrey H. Coffield"
<jeff...@digitalsynergyinc.com> wrote:
> When I suspended the DIR job on the first system the second batch job on
> the other system jumped to between 90 and 100% cpu with a lock rate of
> over 120,000 locks/sec and was then processing about 50 claims/minute. I
> unsuspending the first job and the lock rate on the first system went to
> about 3000 locks/sec with no visible effect on the second job.
>

By any chance, when you got the second batch job up to 90% cpu, was it
all EXEC mode?

I suspect the application that processes the claims file is doing
something really stupid.  Like opening/closing one or more files
multiple times for each record processed.  I've seen this before, and
after studying and rewriting the application the run time went from
six hours to about 11 minutes, with a dramatic drop in CPU usage (as
you can imagine).

And that was on an AlphaServer GS80.

www.noesys.com
Reply FrankS 3/19/2010 11:19:00 PM


VAXman- @SendSpamHere.ORG wrote:

> If you can outline a bit more detail of the file accesses, it might be
> more apparent what's going on.
> 

The first job is only DCL directory commands.

The second job opens all files in one place only at the start. It
basically searches for matches on either a name or phone number by
alternate keys. The program is written in Basic and all these reads are
done using GET REGARDLESS. It does read a lot of records due to variants
in abbreviations. Once a match is either found or not, a few records are
written to an output file for later processing.

The files are converted and the disk defragged on a regular basis. I had
done a monitor and the file open rate and window turn rates were
basically zero on the system while it was running. I looked at the
modes but nothing struck me as unusual there but I don't remember the
exact numbers.

This weekend I plan on setting up some tests and looking into the SDA
LCK information (Thanks Jilly).

Jeff


Reply Jeffrey 3/20/2010 12:09:21 AM

Jeff,

Please take remote locking into account! What does MONI DLOCK
report? Incoming/outgoing lock requests are much slower than lock
requests that can be handled locally.

Volker.
Reply Volker 3/20/2010 7:53:00 AM

Jeff,

Forget about the XFC cache. DIRECTORY does NOT use it. It only uses
the File System Cache when reading the File Headers, or the Directory
Data Cache!
Look at MONI DLOCK and maybe MONI RLOCK (lock trees moving?). The
most likely problem would be remote (incoming/outgoing) lock requests,
which are much slower than local locks.
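
To show why the local/remote distinction can matter so much, here is a back-of-the-envelope model in Python. It is only a sketch: every number in it (locks per claim, CPU time per claim, and both latencies) is a hypothetical placeholder, not a measurement from these systems, and the function name is made up for illustration.

```python
def claims_per_minute(locks_per_claim, lock_latency_s, cpu_s_per_claim):
    """Throughput if every lock request is waited for, one after another."""
    seconds_per_claim = cpu_s_per_claim + locks_per_claim * lock_latency_s
    return 60.0 / seconds_per_claim


# All values below are hypothetical placeholders for illustration.
LOCKS_PER_CLAIM = 100_000     # lock requests needed to process one claim
CPU_S_PER_CLAIM = 0.5         # pure compute time per claim, in seconds
LOCAL_LATENCY_S = 5e-6        # a lock request satisfied on the local node
REMOTE_LATENCY_S = 500e-6     # a lock request that crosses the interconnect

local_rate = claims_per_minute(LOCKS_PER_CLAIM, LOCAL_LATENCY_S, CPU_S_PER_CLAIM)
remote_rate = claims_per_minute(LOCKS_PER_CLAIM, REMOTE_LATENCY_S, CPU_S_PER_CLAIM)

print(f"locally mastered:  {local_rate:.1f} claims/minute")
print(f"remotely mastered: {remote_rate:.1f} claims/minute")
```

In this model the process spends nearly all its time waiting when the locks are remote, so CPU use would look low and throughput would collapse; when the same locks are local, the process becomes CPU-bound. Whether that is what happened here is something the MONI DLOCK and MONI RLOCK figures would have to confirm.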

Volker.


Reply Volker 3/20/2010 12:42:55 PM

On Mar 19, 8:09 pm, "Jeffrey H. Coffield"
<jeff...@digitalsynergyinc.com> wrote:
> VAXman- @SendSpamHere.ORG wrote:
> > If you can outline a bit more detail of the file accesses, it might be
> > more apparent what's going on.
>
> The first job is only DCL directory commands.

Ok, then as Jilly indicates, the only directly shared resource might
be the XQP serialization lock.

Jilly: "I'd almost bet that you are serializing thru an F11B$
resource."

 > The second job opens all files in one place only at the start.

Ok, scratch the XQP direct inference. Once the files are open your
program will not go there except to extend a file. Is it creating output
files a few blocks at a time? Sort work files? Other file open/close
you did not recognize behind a function perhaps?

$ SET WATCH FILE/WATCH=[MAJOR | ALL] would show activity after the
opens.

Or... maybe the lock manager (spinlock) is busy.
Maybe the SPL (Spin Lock Trace) extension to SDA can help?
Easiest use for that is @SYS$EXAMPLES:SPL


> The program is written in Basic and all these reads are
> done using GET REGARDLESS.

GET REGARDLESS is only relevant IF you have disabled RMS query locking
which tells RMS that it is OK to NOT take out any locks when RRL + NLK
is requested.

Note: BASRTL clears RAB$_ROP bits after record operations, so it is
harder than usual to see what it does.
Check out with $ SHOW RMS

Better still: use $ SET FILE/STAT and MONI RMS /ITEM=LOCK or
ANAL/SYS .. SET PROC... SHOW PROC/RMS=FSB, or my RMS_STATS program.

The bucket locks are typically 4 - 20 times more active than the
record locks.
Only way to reduce the bucket locks? $ SET FILE/GLOB=xxx


> Once a match is either found or not, a few records are
> written to an output file for later processing.

Pre-allocated?

> I had done a monitor and the file open rate and window turn rates were
> basically zero on the the system while it was running.

Good check.

But since this runs for a full hour anyway, not just a few seconds, why not
check EVERYTHING, by activating T4 and zooming in on the test hour?

> I looked at the modes but nothing struck me as unusual there but I don't
> remember the exact numbers.

Again, good check, but T4 data would allow you to go back and
double-check the numbers after the event.

Interesting. Keep us posted?!

Regards,

Hein van den Heuvel
HvdH Performance Consulting.
Reply Hein 3/21/2010 1:21:21 AM

Thank you to all who responded so far. I tried duplicating the events
that I saw and was unable to reproduce them. The processing job got 20
to 40% cpu and finished in the expected time and nothing I could do on
the other system seemed to affect it.

I did learn quite a bit more about locking and am going to continue to
dig into this.

I still feel that there is some sort of locking issue and had hoped that
I could use this as a test case. I think the next thing is to explore
what support options we have thru HP.

If I do find something, I will report back here.

Thanks again,
Jeff Coffield
Reply Jeffrey 3/22/2010 4:46:59 AM

In article <ho0plt$qgh$1@news.eternal-september.org>, "Jeffrey H. Coffield" <jeffrey@digitalsynergyinc.com> writes:
> 
> There are only two systems in the cluster. As far as I understand the
> caching is only for that system. The jobs were on two separate systems.
> I believe that eliminates cache issues.

   Some disks have internal caches.  IF multiple nodes have multiple
   active caches, then yes, DLM gets involved.

Reply koehler 3/22/2010 1:05:51 PM
