We have a four-node VMSCluster consisting of two GS1280 "frames",
each partitioned into two nodes. For reasons best known to the
vendor of the primary application, Cerner Millennium, we're using
VIOC rather than XFC file caching. The cluster is not homogeneous
in that one node is dedicated to "non-production" versions of the
application, one node is the production database server, and two
nodes share the production "application server" duties (tier 2 in
a 3-tier architecture).
That said, the vendor is asking us to set VCC_MAXSIZE <= 256000,
whereas we have it set to 512000 on two nodes, and 768000 on a
third.
Looking at SHOW MEMORY/CACHE/FULL on the three production nodes,
I see Read Hit Rates of 58% (database node, 512000 maxsize),
15% on app server node w/512000 maxsize, and 66% on the other app
server node w/768000 maxsize (the non-prod node is at 256000 and
has a hit rate of 71%).
Question: What is the best, or at least a good, indicator of
VIOC effectiveness? Is it the read hit rate, or something else?
Secondly, what would one expect the effect would be of *reducing*
VCC_MAXSIZE by 50%, or even 75% in the one case?
I'm concerned that the vendor is making these recommendations
without taking into consideration the actual potential performance
impacts. Their comment reads like so:
"Sets size for closed file cache. 256,000 is the start value.
This should be tuned down later after full load conditions
are studied."
I.e., this seems like a new-installation setting, not one based on
actual running history of over 2 years. And note that every change
takes a downtime for Autogen then reboot <sigh...>.
Thanks for any and all hints, tips, and/or personal stories. ;-)
-Ken
--
Ken & Ann Fairfield
What: Ken dot And dot Ann
Where: Gmail dot Com
ken158 (98) | 11/15/2007 4:46:08 AM

On Nov 14, 11:46 pm, Ken Fairfield <K...@Napili.Fairfield.Home> wrote:
[...]
Ken,
Caches of all types are excellent candidates for the "your mileage
will vary" caveat. Even in precisely the same application context, the hit
rate is, in essence, a direct manifestation of how the applications
interact with the people who use them. Thus, I have always considered
recommendations as a starting point, with the actual numbers to be
increased or decreased as experience with the operating environment
dictates.
While the original post did not include details of the disk
configuration, it is likely that there are (at least) three levels of
caching actually present: within the database, the VIOC/XFC, and the
caching within the storage controller. The optimum performance
settings are a balancing act between all three and their effects.
CPU caches have multiple levels because of size limits inherent to
geometry and packaging, thus they typically have small L1 caches,
backed by ever increasing L2 and, in some cases, L3 caches. This
relationship does not apply in the case of mass storage.
My general recommendation over the years, always tempered by the
actual situation, is to adjust the internal database caches to
maximize performance, and then, to the extent possible, exclude the
database activity from consideration for caching. If a properly tuned
database is releasing data back to mass storage it is probably making
an intelligent choice, particularly if the database cache is tuned
correctly.
As to the hit rate, the effect of a reduction in cache size depends on
where on the curve one sits at the present value. The general curve for
cache performance approximates an exponential decay of marginal
benefit as size increases. The true metric here is not really hit
rate, but eviction rate: How often data must be evicted from the cache
to make room for other data. A hit rate of 50% with an eviction rate
of 0% is far different than a hit rate of 50% with a 100% eviction
rate (the latter gains by adding size, the former does not).
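To make the hit-rate versus eviction-rate distinction concrete, here is a minimal sketch in Python. It assumes a plain LRU replacement policy, which is an illustration only (the thread does not say how VIOC or XFC choose what to discard), and both workloads are synthetic. "Eviction rate" is counted here as evictions per read.

```python
from collections import OrderedDict
import random


def simulate_lru(trace, capacity):
    """Replay a block-access trace through an LRU cache (assumed policy).

    Returns (hit_rate, eviction_rate), both as fractions of all accesses.
    """
    cache = OrderedDict()
    hits = evictions = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # discard least recently used
                evictions += 1
            cache[block] = True
    n = len(trace)
    return hits / n, evictions / n


# Workload A (synthetic): 200 blocks, each read exactly twice; all fit.
fits = list(range(200)) * 2

# Workload B (synthetic): uniform random reads over 200 blocks.
rng = random.Random(1)
thrash = [rng.randrange(200) for _ in range(20000)]

for name, trace, cap in (
    ("A, 200 slots", fits, 200),
    ("A, 400 slots", fits, 400),
    ("B, 100 slots", thrash, 100),
    ("B, 200 slots", thrash, 200),
):
    hit, evict = simulate_lru(trace, cap)
    print(f"{name}: hit rate {hit:.1%}, eviction rate {evict:.1%}")
```

Workload A sits at a 50% hit rate with no evictions, and doubling the cache changes nothing. Workload B also sits near 50%, but with evictions on roughly half of all reads, and doubling the cache removes nearly all of its misses.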
I would be tempted to arrange things so that the cache is not polluted
by data moving back and forth between the database and the
cache. I would also consider the use of RMS tuning on other active
files as an alternative to the sledgehammer of adjusting the cache size.
Cache size (either XFC or VIOC) is a large, somewhat imprecise tool
for achieving performance. While I will not say that one cannot wield
a sledgehammer with precision, it can be a challenge.
I hope that the above is helpful.
- Bob Gezelter, http://www.rlgsc.com
gezelter (537) | 11/15/2007 11:15:06 AM

On Nov 15, 3:15 am, Bob Gezelter <gezel...@rlgsc.com> wrote:
[...]
>
> While the original post did not include details of the disk
> configuration, it is likely that there are (at least) three levels of
> caching actually present: within the database, the VIOC/XFC, and the
> caching within the storage controller. The optimum performance
> settings are a balancing act between all three and their effects.
Right, sorry about that. As in my reply to Michael Forster, I'm
actually less concerned about the database node than the two
application server nodes.
The two application server nodes each have 14 x 1.3 GHz processors
and 32GB of memory. The storage is on an EVA6000 where we're
using, what's the term, "fully mirrored vdisks"? The VIOC caches
are 250MB on the DB node, 250MB on one app server, and
375MB on the other.
And it is VIOC, *not* XFC, caching.
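For readers relating the VCC_MAXSIZE values to these megabyte figures, a small sketch of the arithmetic follows. It takes VCC_MAXSIZE to be a count of 512-byte blocks, an assumption that is consistent with the sizes quoted in this thread (512000 and 768000 against 250 MB and 375 MB). The function name is made up for the example.

```python
BLOCK_BYTES = 512  # assumed unit: one 512-byte disk block per VCC_MAXSIZE count


def vcc_maxsize_to_mb(blocks):
    """Convert a VCC_MAXSIZE value (in blocks) to megabytes (2**20 bytes)."""
    return blocks * BLOCK_BYTES / (1024 * 1024)


for blocks in (256000, 512000, 768000):
    print(f"VCC_MAXSIZE = {blocks:>6} -> {vcc_maxsize_to_mb(blocks):.2f} MB")
```

On that reading, the vendor's 256000 would correspond to 125 MB: half the current cache on two nodes and a third of it on the other.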
[...]
> As to the hit rate, the effect of a reduction in cache size depends
> where on the curve one is at the present value. The general curve for
> cache performance is a curve approximating exponential decay of
> benefits as size increases. The true metric here is not really hit
> rate, but eviction rate: How often data must be evicted from the cache
> to make room for other data. A hit rate of 50% with an eviction rate
> of 0% is far different than a hit rate of 50% with a 100% eviction
> rate (the latter gains by adding size, the former does not).
I include here the output of Show Memory/Cache/Full from the
three nodes. What I see is that on each node, about 100 files
are cached. There is a lot of I/O bypassing the cache, and on
the two app server nodes, most of the cache is in use (not much
in the "Free" column). What I can't see from this output is the
eviction rate...
------------------------------------------------------------------------------------
SYSMAN> do sho mem/cach/ful
%SYSMAN-I-OUTPUT, command execution on node DBB1
System Memory Resources on 15-NOV-2007 08:46:58.82

Virtual I/O Cache
Total Size (MBytes)            250.00    Read IO Count            1170423898
Free (MBytes)                  151.07    Read Hit Count            670193435
In Use (MBytes)                 98.92    Read Hit Rate                   57%
Write IO Bypassing Cache    *********    Write IO Count           2501817621
Files Retained                     99    Read IO Bypassing Cache   390546546

Write Bitmap (WBM) Memory Summary
Local bitmap count:    251    Local bitmap memory usage (MB)   41.69
Master bitmap count:   117    Master bitmap memory usage (MB)  18.86

%SYSMAN-I-OUTPUT, command execution on node APP1
System Memory Resources on 15-NOV-2007 08:46:58.87

Virtual I/O Cache
Total Size (MBytes)            250.00    Read IO Count            2485755609
Free (MBytes)                   22.03    Read Hit Count            371641860
In Use (MBytes)                227.96    Read Hit Rate                   14%
Write IO Bypassing Cache     29108623    Write IO Count            738910797
Files Retained                    100    Read IO Bypassing Cache  1954885777

Write Bitmap (WBM) Memory Summary
Local bitmap count:    285    Local bitmap memory usage (MB)   47.58
Master bitmap count:    71    Master bitmap memory usage (MB)  12.17

%SYSMAN-I-OUTPUT, command execution on node APP2
System Memory Resources on 15-NOV-2007 08:46:58.96

Virtual I/O Cache
Total Size (MBytes)            375.00    Read IO Count             103022387
Free (MBytes)                    0.03    Read Hit Count             68204994
In Use (MBytes)                374.96    Read Hit Rate                   66%
Write IO Bypassing Cache      2681158    Write IO Count            156859274
Files Retained                    100    Read IO Bypassing Cache      497166

Write Bitmap (WBM) Memory Summary
Local bitmap count:    285    Local bitmap memory usage (MB)   47.58
Master bitmap count:    71    Master bitmap memory usage (MB)  12.17
SYSMAN>
------------------------------------------------------------------------------
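As a rough cross-check of the counters above, the following Python sketch derives three ratios per node: the overall read hit rate, the share of reads that bypass the cache, and the hit rate among reads that did not bypass it. The third figure rests on an assumption not confirmed anywhere in this thread, namely that "Read IO Count" includes the reads counted under "Read IO Bypassing Cache". The dictionary keys and function name are invented for the example.

```python
# Counters copied from the SHOW MEMORY/CACHE/FULL output above.
counters = {
    "DBB1": {"reads": 1170423898, "hits": 670193435, "read_bypass": 390546546},
    "APP1": {"reads": 2485755609, "hits": 371641860, "read_bypass": 1954885777},
    "APP2": {"reads": 103022387, "hits": 68204994, "read_bypass": 497166},
}


def read_stats(c):
    """Return (overall_hit_rate, bypass_share, hit_rate_of_eligible_reads).

    Assumption: 'reads' includes the reads that bypassed the cache, so the
    reads eligible for caching are reads minus read_bypass.
    """
    overall = c["hits"] / c["reads"]
    bypass = c["read_bypass"] / c["reads"]
    eligible = c["hits"] / (c["reads"] - c["read_bypass"])
    return overall, bypass, eligible


for node, c in counters.items():
    overall, bypass, eligible = read_stats(c)
    print(f"{node}: overall {overall:.1%}, bypassing {bypass:.1%}, "
          f"hits among eligible reads {eligible:.1%}")
```

Under that assumption, about 79% of APP1's reads bypass the cache, and roughly 70% of the remainder hit, which would put its low overall rate down to bypassed I/O rather than to an ineffective cache. None of these counters gives an eviction rate.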
Thanks, Ken
--
Ken & Ann Fairfield
What: Ken dot And dot Ann
Where: Gmail dot Com
Ken.Fairfield (491) | 11/15/2007 4:57:10 PM

On Nov 15, 11:57 am, Ken.Fairfi...@gmail.com wrote:
[...]
Ken,
I would hazard a guess that the workload is different on the two
application servers. I hesitate to "prescribe" over the newsgroup, but
I would not generally look at reducing the size of cache that is fully
used. In fact, I would seriously consider resizing in the other
direction.
Depending on my flexibility, and some performance studying, I might
also very well consider re-organizing the file locations, to reduce
useless caching.
The comment in my original post was not a case of mis-speak. The
precise same argument holds for VIOC in this respect as XFC, so, while
the installation described in the original post uses VIOC, I wanted
those following this thread to understand that the phenomenology is
similar in both cases.
- Bob Gezelter, http://www.rlgsc.com
gezelter (537) | 11/15/2007 7:12:27 PM

On Nov 15, 11:12 am, Bob Gezelter <gezel...@rlgsc.com> wrote:
[...]
> I would hazard a guess that the workload is different on the two
> application servers. I hesitate to "prescribe" over the newsgroup, but
> I would not generally look at reducing the size of cache that is fully
> used. In fact, I would seriously consider resizing in the other
> direction.
Thanks, Bob, that's exactly what I was thinking and what was
behind my questions, that the resizing proposed by the vendor
is in the wrong direction. I just wanted some confirmation that
I wasn't completely off-base on this one.
Thanks, Ken
--
Ken & Ann Fairfield
What: Ken dot And dot Ann
Where: Gmail dot Com
Ken.Fairfield (491) | 11/15/2007 7:35:35 PM

On Nov 14, 10:46 pm, Ken Fairfield <K...@Napili.Fairfield.Home> wrote:
> For reasons best known to the vendor of the primary application, Cerner
> Millennium, we're using VIOC rather than XFC file caching.
Do NOT accept that without a modicum of a fight.
DO question. They probably do NOT know why this (still) is.
The softest question is probably
'for which versions of OpenVMS/Millennium should we use VIOC'
You also want to ask whether it is a (dated) recommendation, or a
requirement in order to have a supported configuration.
The VIOC/XFC choice is transparent to the application. If Cerner
believes it is not transparent,
then they had better be able to articulate that. Without a clear and
concise argument I would not accept the suggestion to use VIOC, and
would switch to XFC, for it is far superior.
I suspect there was an early incident with XFC in some OpenVMS
version.
Surely that has been addressed by OpenVMS and folks need to move on.
fwiw,
Hein.
heinvandenheuvel2 (577) | 11/15/2007 7:37:55 PM

On Nov 15, 11:37 am, Hein RMS van den Heuvel
<heinvandenheu...@gmail.com> wrote:
[...]
Thanks for weighing in, Hein. :-) I doubt there was any incident. I
suspect it's more like they got things working, "back in the day", with
VIOC, and didn't want to bother with testing XFC. :-( I will ask if XFC
is "a problem" with them support-wise. And I agree, if I can, and if we
want to bother putting the resources into it (planning, testing, etc.),
I'd like to change to XFC.
Thanks again, Ken
--
Ken & Ann Fairfield
What: Ken dot And dot Ann
Where: Gmail dot Com
Ken.Fairfield (491) | 11/15/2007 8:56:35 PM

On Nov 15, 2:37 pm, Hein RMS van den Heuvel
<heinvandenheu...@gmail.com> wrote:
> On Nov 14, 10:46 pm, Ken Fairfield <K...@Napili.Fairfield.Home> wrote:
>
> > For reasons best known to the vendor of the primary application, Cerner
> > Millennium, we're using VIOC rather than XFC file caching.
>
> Do NOT accept that without a modicum of a fight.
> DO question. They probably do NOT know why this (still) is.
>
> The softest question is probably
> 'for which versions of OpenVMS/Millennium should we use VIOC'
> You also want to ask whether it is a (dated) recommendation, or a
> requirement in order to have a supported configuration
>
> The VIOC/XFC is transparent. If Cerner believes it is not transparent,
> then they had better be able to articulate that. Without clear and
> concise argument I would not accept the suggestion to use VIOC and
> switch to XFC for it is far superior.
So Hein, can you please tell us what makes XFC so much better than
VIOC? I'm sure you're right -- I would just like to know the details
behind it.
Thanks!
> I suspect there was an early incident with XFC in some OpenVMS
> version.
> Surely that has been addressed by OpenVMS and folks need to move on.
Yes, I clearly remember Hoff warning us to get very important ECO kits
when XFC first came out.
>
> fwiw,
> Hein.
AEF
spamsink2001 (3065) | 11/15/2007 11:17:10 PM