> "BRANDON, JOHN M" <brandon@dalsemi.com> wrote in message
> news:06091211331797@dscis6-0.dalsemi.com...
> >I have 2 servers DS20 (2x cpu's) and 1 server ES40 (4x cpu's).
> >
> > All three servers run FOCUS (and various other app's)
> >
> > I was recently testing the performance of the job-streams on these servers
> > and found that my MP Synchronization was nominal on the DS20's and
> > extremely high on the ES40.
> >
> > For example,
> > 10% or less on the DS20
> > 100% to 200% on the ES40
> >
> > Any reason why the big difference?
>
> What version of VMS? Are you current on patches for that version? If
> V7.3-2 or later please execute SYS$EXAMPLES:SPL.COM during a period of
> elevated MP Synch time and see what spinlock(s) have the most usage.
> Jilly
Guess that would help... V7.2-1
John "REBOOT" Brandon
VMS Systems Administrator
firstname.lastname.spam.me.not@dalsemi.com

brandon18 (250) | 9/12/2006 7:55:29 PM

BRANDON, JOHN M wrote:
>> "BRANDON, JOHN M" <brandon@dalsemi.com> wrote in message
>> news:06091211331797@dscis6-0.dalsemi.com...
>>> I have 2 servers DS20 (2x cpu's) and 1 server ES40 (4x cpu's).
....
>>> I was recently testing the performance of the job-streams on these servers
>>> and found that my MP Synchronization was nominal on the DS20's and extremely
>>> high on the ES40.
>>>
>>> For example,
>>> 10% or less on the DS20
>>> 100% to 200% on the ES40
>>>
>>> Any reason why the big difference?
Without some idea of where the multiprocessing synchronization time is going,
no -- there are any number of interlock structures that could be contributing
here. There have been on-going efforts to split up these structures and move to
finer granularity around various operations.
>> What version of VMS? Are you current on patches for that version? If
>> V7.3-2 or later please execute SYS$EXAMPLES:SPL.COM during a period of
>> elevated MP Synch time and see what spinlock(s) have the most usage.
> Guess that would help... V7.2-1
Clustered?
What sort of I/O device(s)?
What happens when you temporarily drop from 4 CPUs down to 1 or 2? (STOP/CPU
your way down, for purposes of testing.) Do you see a big drop-off in MPSYNC?
Are there differences in settings for the working set, for instance, or disk
contention on the system, or such, between/among these systems?
Can you see what the jobs are competing for? Memory? Disk? I/O? Network?
Do you have a disk around (and licenses) where you can try OpenVMS V7.3-2
with the configuration? V7.3 saw changes to off-load I/O Lock 8 activity --
this was one of the major locks on earlier releases. A number of folks have
found that getting to V7.3-1 or V7.3-2 -- or more current -- really helps system
and application performance. V7.2-1H1 added spinlock tracing, so you're
unfortunately below that release, too. V7.3 adds SDA LCK/RLOCK tracing, too.
AMDS can sometimes help spot culprits, too.
What I generally end up doing here is looking at what the particular
applications are doing. In detail.
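
To compare the two machine types on an equal footing, it can help to express the MP Synchronization figures per CPU. The sketch below is only arithmetic on the numbers quoted earlier in the thread. It assumes the percentages are summed across all CPUs in the box (a 200% reading suggests this, but the thread does not say so), and the function name is made up for illustration.

```python
def per_cpu_mpsync(total_percent, cpu_count):
    """Average MP Synchronization percentage per CPU.

    Assumes total_percent is the sum over all CPUs, so a 4-CPU
    system could report anywhere from 0 to 400.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return total_percent / cpu_count


# Figures quoted in the thread: DS20 has 2 CPUs, ES40 has 4 CPUs.
ds20 = per_cpu_mpsync(10, 2)
es40_low = per_cpu_mpsync(100, 4)
es40_high = per_cpu_mpsync(200, 4)

print(f"DS20: {ds20:.0f}% per CPU")
print(f"ES40: {es40_low:.0f}% to {es40_high:.0f}% per CPU")
```

Under that assumption the ES40 is still spending roughly five to ten times as much of each CPU on synchronization as the DS20s, so the gap is not just a side effect of having more processors to add up.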

hoff-remove-this (566) | 9/12/2006 8:20:34 PM

"BRANDON, JOHN M" <brandon@dalsemi.com> wrote in message
news:06091214552903@dscis6-0.dalsemi.com...
>> "BRANDON, JOHN M" <brandon@dalsemi.com> wrote in message
>> news:06091211331797@dscis6-0.dalsemi.com...
>> >I have 2 servers DS20 (2x cpu's) and 1 server ES40 (4x cpu's).
>> >
>> > All three servers run FOCUS (and various other app's)
>> >
>> > I was recently testing the performance of the job-streams on these
>> > servers and found that my MP Synchronization was nominal on the DS20's
>> > and extremely high on the ES40.
>> >
>> > For example,
>> > 10% or less on the DS20
>> > 100% to 200% on the ES40
>> >
>> > Any reason why the big difference?
>>
>> What version of VMS? Are you current on patches for that version? If
>> V7.3-2 or later please execute SYS$EXAMPLES:SPL.COM during a period of
>> elevated MP Synch time and see what spinlock(s) have the most usage.
>> Jilly
>
> Guess that would help... V7.2-1
>
If you're running multiple CPUs you will notice a performance boost when
only upgrading from 7.3-1 to 7.3-2. Prior to 7.3-2 many internal processes
erroneously serialized on an internal flag called IOLOCK8 which limited SMP
scalability.
http://www.canacu.org/dantoni-vms732update.ppt
On top of that, I remember big performance improvements due to XFC (extended
file cache)
Neil Rieck
Kitchener/Waterloo/Cambridge,
Ontario, Canada.
http://www3.sympatico.ca/n.rieck/links/cool_openvms.html

n.rieck (1972) | 9/12/2006 10:49:39 PM

Neil Rieck wrote:
> If you're running multiple CPUs you will notice a performance boost when
> only upgrading from 7.3-1 to 7.3-2. Prior to 7.3-2 many internal processes
> erroneously serialized on an internal flag called IOLOCK8 which limited SMP
> scalability.
There wasn't anything particularly "erroneous" about the IOLOCK8 spinlock
synchronization and the associated activity, that's how OpenVMS was designed to
work, and it's the SMP direct descendant of the IPL8 synchronization.
What happened at V7.3-2 was that the on-going performance work found that the
various pieces that were synchronizing on IOLOCK8 could be split into finer
granularity, substantially reducing the contention on that spinlock.
IOLOCK8 is the traditional main system data structure lock, and -- prior to
the V7.3-2 changes -- a whole lot of the OpenVMS internal data structures were
protected by that particular spinlock.
As we've added tools (the ssl system service logging, the spl spinlock
tracing, etc) we've identified areas of contention or of heavy activity, and
have targeted these for work. As is the case with application performance, you
can be surprised what a tracing will tell you -- tools such as DECset PCA, DTM,
and SCA can provide insight into what the application code is actually doing and
where it is spending its time, and the analogous mechanisms added into OpenVMS
provided similar insight.
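
The idea of splitting one heavily used lock into finer-grained locks can be sketched outside of OpenVMS. The Python below is an analogy only, not OpenVMS code: the class names and the "lan", "cluster" and "disk" counters are hypothetical, and Python threads will not show a real speedup because of the interpreter's global lock. It shows the structural change: in the first class every update serializes on one lock, in the second each structure has its own.

```python
import threading


class CoarseStats:
    """All counters share one lock, so every update contends with every other."""

    def __init__(self, names):
        self._lock = threading.Lock()
        self._counts = {name: 0 for name in names}

    def bump(self, name):
        with self._lock:
            self._counts[name] += 1

    def get(self, name):
        with self._lock:
            return self._counts[name]


class FineStats:
    """Each counter has its own lock, so unrelated updates do not contend."""

    def __init__(self, names):
        self._locks = {name: threading.Lock() for name in names}
        self._counts = {name: 0 for name in names}

    def bump(self, name):
        with self._locks[name]:
            self._counts[name] += 1

    def get(self, name):
        with self._locks[name]:
            return self._counts[name]


def hammer(stats, names, per_thread=10_000):
    """Run one thread per counter, each bumping only its own counter."""

    def worker(name):
        for _ in range(per_thread):
            stats.bump(name)

    threads = [threading.Thread(target=worker, args=(name,)) for name in names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return {name: stats.get(name) for name in names}


names = ["lan", "cluster", "disk"]
coarse_result = hammer(CoarseStats(names), names)
fine_result = hammer(FineStats(names), names)
print(coarse_result)
print(fine_result)
```

Both versions produce the same counts. The difference is that in the fine-grained version a thread working on one structure never waits for a thread working on another, which is the kind of contention reduction described above.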

hoff-remove-this (566) | 9/13/2006 1:40:24 PM

"Hoff Hoffman" <hoff-remove-this@hp.com> wrote in message
news:45080a44@usenet01.boi.hp.com...
> Neil Rieck wrote:
>
[...snip...]
>
> There wasn't anything particularly "erroneous" about the IOLOCK8
> spinlock synchronization and the associated activity, that's how OpenVMS
> was designed to work, and it's the SMP direct descendant of the IPL8
> synchronization.
>
> What happened at V7.3-2 was that the on-going performance work found
> that the various pieces that were synchronizing on IOLOCK8 could be split
> into finer granularity, substantially reducing the contention on that
> spinlock.
>
> IOLOCK8 is the traditional main system data structure lock, and -- prior
> to the V7.3-2 changes -- a whole lot of the OpenVMS internal data
> structures were protected by that particular spinlock.
>
> As we've added tools (the ssl system service logging, the spl spinlock
> tracing, etc) we've identified areas of contention or of heavy activity,
> and have targeted these for work. As is the case with application
> performance, you can be surprised what a tracing will tell you -- tools
> such as DECset PCA, DTM, and SCA can provide insight into what the
> application code is actually doing and where it is spending its time, and
> the analogous mechanisms added into OpenVMS provided similar insight.
>
Thanks for the clarification.
Maybe my use of the word "erroneous" was not entirely accurate, but I found
an entry in my notes from the 7.3-2 presentation by Gaitan D'Antoni
reminding me that "sometimes programmers do things by habit rather than
design intention".
Adding to your point, Gaitan discussed some large SMP systems (in Europe)
where a customer's system was bogging down and wasn't I/O bound, but
inserting additional CPUs didn't improve things.
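
That symptom (more CPUs, little or no extra throughput) is what one would expect when a large share of the work is serialized on a single lock. A rough illustration follows, using Amdahl's law as a stand-in model. The model is not from the presentation, the 50% serialized fraction is an invented number, and it ignores the CPU time burned while spinning, so a real system can do worse than this.

```python
def amdahl_speedup(serial_fraction, cpus):
    """Ideal speedup on `cpus` processors when `serial_fraction` of the
    work must run on one CPU at a time (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cpus)


# Hypothetical workload: half of the time is spent serialized on one lock.
for cpus in (1, 2, 4, 8, 16):
    print(f"{cpus:2d} CPUs -> speedup {amdahl_speedup(0.5, cpus):.2f}")
```

With half the work serialized, the speedup can never reach 2 no matter how many CPUs are added, which is why shrinking the serialized portion matters more than adding processors.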
Snips from the 7.3-2 power point:
* LAN Fastpath is designed to reduce the contention for the IOLOCK8 spinlock
  and to allow LAN to perform its I/O processing on a CPU other than the
  primary, improving SMP performance scalability.
* LAN Drivers
* Move off of IOLOCK8 to LAN device specific spinlocks
* Allow device interrupts to CPUs other than the primary
* PEdriver
* Move off of IOLOCK8 to PE specific spinlocks
* Allow a specific CPU to be chosen for PEdriver processing
* Allows PEDRIVER to process cluster communications on a single CPU
* Reduces CPU cost due to streamlined codepath also for served block data
* Fastpath for Smart Array 5300
* Backplane RAID controller
* Offload IOLOCK8 spinlock, allows CPU selection
* Scalable Kernel
* Performance and scalability improvements for SMP systems
* Multiple dynamic spinlocks
* No more IOLOCK8
OpenVMS is very cool, and I suspect the equivalent of crossing this
speed-bump (after your SMP granularity analysis) has not even been
considered in other operating systems like Windows etc.
Neil Rieck
Kitchener/Waterloo/Cambridge,
Ontario, Canada.
http://www3.sympatico.ca/n.rieck/

n.rieck (1972) | 9/14/2006 11:15:34 AM

Neil Rieck wrote:
> "Hoff Hoffman" <hoff-remove-this@hp.com> wrote in message
> news:45080a44@usenet01.boi.hp.com...
>> Neil Rieck wrote:
>>
> [...snip...]
>> There wasn't anything particularly "erroneous" about the IOLOCK8
>> spinlock synchronization and the associated activity, that's how OpenVMS
>> was designed to work, and it's the SMP direct descendant of the IPL8
>> synchronization.
>>
>> What happened at V7.3-2 was that the on-going performance work found
>> that the various pieces that were synchronizing on IOLOCK8 could be split
>> into finer granularity, substantially reducing the contention on that
>> spinlock.
>>
>> IOLOCK8 is the traditional main system data structure lock, and -- prior
>> to the V7.3-2 changes -- a whole lot of the OpenVMS internal data
>> structures were protected by that particular spinlock.
>>
>> As we've added tools (the ssl system service logging, the spl spinlock
>> tracing, etc) we've identified areas of contention or of heavy activity,
>> and have targeted these for work. As is the case with application
>> performance, you can be surprised what a tracing will tell you -- tools
>> such as DECset PCA, DTM, and SCA can provide insight into what the
>> application code is actually doing and where it is spending its time, and
>> the analogous mechanisms added into OpenVMS provided similar insight.
>>
> Thanks for the clarification.
>
> Maybe my use of the word "erroneous" was not entirely accurate, but I found
> an entry in my notes from the 7.3-2 presentation by Gaitan D'Antoni
> reminding me that "sometimes programmers do things by habit rather than
> design intention".
>
> Adding to your point, Gaitan discussed some large SMP systems (in Europe)
> where a customer's system was bogging down and wasn't I/O bound, but
> inserting additional CPUs didn't improve things.
>
> Snips from the 7.3-2 power point:
>
> * LAN Fastpath is designed to reduce the contention for the IOLOCK8 spinlock
>   and to allow LAN to perform its I/O processing on a CPU other than the
>   primary, improving SMP performance scalability.
> * LAN Drivers
> * Move off of IOLOCK8 to LAN device specific spinlocks
> * Allow device interrupts to CPUs other than the primary
> * PEdriver
> * Move off of IOLOCK8 to PE specific spinlocks
> * Allow a specific CPU to be chosen for PEdriver processing
> * Allows PEDRIVER to process cluster communications on a single CPU
> * Reduces CPU cost due to streamlined codepath also for served block data
> * Fastpath for Smart Array 5300
> * Backplane RAID controller
> * Offload IOLOCK8 spinlock, allows CPU selection
> * Scalable Kernel
> * Performance and scalability improvements for SMP systems
> * Multiple dynamic spinlocks
> * No more IOLOCK8
>
> OpenVMS is very cool, and I suspect the equivalent of crossing this
> speed-bump (after your SMP granularity analysis) has not even been
> considered in other operating systems like Windows etc.
It's also possible that some of these others never implemented such a
bottleneck. Unless they were blindly copying VMS. :-)
--
David Froble Tel: 724-529-0450
Dave Froble Enterprises, Inc. E-Mail: davef@tsoft-inc.com
DFE Ultralights, Inc.
170 Grimplin Road
Vanderbilt, PA 15486

davef3 (3419) | 9/14/2006 2:47:08 PM