How to determine wait time?

  • Follow


We have a stored procedure that seems to have poor performance. In
order to find out what is happening we
have used the monitor function MON_GET_PKG_CACHE_STMT_DETAILS to find
out the details. And what puzzles us is that there seems to be a lot
of wait time, but we are not able to find out where the wait time is
spent.

An example:
total_act_time = 5085
total_act_wait_time: 5051

So it seems to us that the procedure spends 99% of the time waiting.
But when we add up all the other
output elements concerned with wait it is like this:

fcm_tq_recv_wait_time = fcm_message_recv_wait_time =
fcm_tq_send_wait_time = fcm_message_send_wait_time =
lock_wait_time = log_buffer_wait_time = log_disk_wait_time =
audit_file_write_wait_time =
audit_subsystem_wait_time = diaglog_write_wait_time =
fcm_send_wait_time = fcm_recv_wait_time = 0

All off them are 0.
So where do the value for total_act_wait_time come from? And how can
we find out where the time is spent?

We are using DB2 9.7 on z/Linux.

Regards
Odd B Andersen, EDB ErgoGroup AS, Oslo, Norway
0
Reply oddbande 3/21/2011 12:19:47 PM

Comments from backstage:

Not all the wait times are being added below (missing IOs) so not 
necessarily an error. I don't think he's using an appropriate interface 
tho to diagnose stored procedure slowdown. The metrics reported for a 
given section in the package cache only describe the processing of that 
section - they are not the aggregates across all nested sections 
(assuming the procedure he's investigating is issuing SQL statements).

This topic in the info center (in particular Figure 3) describes the 
hierarchy of time-spent metrics available for statements, and also 
indicates that the wait times reported do not include the wait times of 
nested statements (see the final "Other" bullet point in the table):

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.mon.doc/doc/c0055434.html


Something like the following topic in the info center may be more useful.

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.admin.wlm.doc/doc/c0058761.html

This topic describes using an activity event monitor to capture 
information about each stament a procedure executes and then aggregates 
elapsed time across all children. Using this you could first identify 
which statement(s) are taking the most time in the procedure. Then could 
drill down on a given statement by feeding the statement metrics (in the 
details_xml column of the activity table in the activity event monitor) 
into the MON_FORMAT_XML_TIMES_BY_ROW table function to see a breakdown 
of where that statement is spending it's time (could also choose to 
explain the statements that are taking the most time).
---------
Also if you follow teh link in my footer (Oracle2DB2Wiki) you will find 
a profiler which you may find useful.

-- 
Serge Rielau
SQL Architect DB2 for LUW, IBM Toronto Lab
Blog:    tinyurl.com/SQLTips4DB2
Wiki:    tinyurl.com/Oracle2DB2Wiki
Twitter: srielau

0
Reply Serge 3/21/2011 9:40:55 PM


On Mar 21, 10:40=A0pm, Serge Rielau <srie...@ca.ibm.com> wrote:
> Comments from backstage:
>
> Not all the wait times are being added below (missing IOs) so not
> necessarily an error. I don't think he's using an appropriate interface
> tho to diagnose stored procedure slowdown. The metrics reported for a
> given section in the package cache only describe the processing of that
> section - they are not the aggregates across all nested sections
> (assuming the procedure he's investigating is issuing SQL statements).
>
> This topic in the info center (in particular Figure 3) describes the
> hierarchy of time-spent metrics available for statements, and also
> indicates that the wait times reported do not include the wait times of
> nested statements (see the final "Other" bullet point in the table):
>
> http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=3D..=
..
>
> Something like the following topic in the info center may be more useful.
>
> http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=3D..=
..
>
> This topic describes using an activity event monitor to capture
> information about each stament a procedure executes and then aggregates
> elapsed time across all children. Using this you could first identify
> which statement(s) are taking the most time in the procedure. Then could
> drill down on a given statement by feeding the statement metrics (in the
> details_xml column of the activity table in the activity event monitor)
> into the MON_FORMAT_XML_TIMES_BY_ROW table function to see a breakdown
> of where that statement is spending it's time (could also choose to
> explain the statements that are taking the most time).
> ---------
> Also if you follow teh link in my footer (Oracle2DB2Wiki) you will find
> a profiler which you may find useful.
>
> --
> Serge Rielau
> SQL Architect DB2 for LUW, IBM Toronto Lab
> Blog: =A0 =A0tinyurl.com/SQLTips4DB2
> Wiki: =A0 =A0tinyurl.com/Oracle2DB2Wiki
> Twitter: srielau

Thank you!
A very useful response (I hope :-))
I will look into this further and maybe find out more.

I also forgot to mention that this is not particular for one stored
procedure - all of them wriiten in SQL PL. There seems to be a very
high precentage wait time for all of them.

Regards
Odd B Andersen
0
Reply oddbande 3/22/2011 8:25:26 AM

On 22 Mar, 09:25, oddbande <odd.ander...@ergo.no> wrote:
> On Mar 21, 10:40=A0pm, Serge Rielau <srie...@ca.ibm.com> wrote:
>
>
>
>
>
> > Comments from backstage:
>
> > Not all the wait times are being added below (missing IOs) so not
> > necessarily an error. I don't think he's using an appropriate interface
> > tho to diagnose stored procedure slowdown. The metrics reported for a
> > given section in the package cache only describe the processing of that
> > section - they are not the aggregates across all nested sections
> > (assuming the procedure he's investigating is issuing SQL statements).
>
> > This topic in the info center (in particular Figure 3) describes the
> > hierarchy of time-spent metrics available for statements, and also
> > indicates that the wait times reported do not include the wait times of
> > nested statements (see the final "Other" bullet point in the table):
>
> >http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=3D.=
...
>
> > Something like the following topic in the info center may be more usefu=
l.
>
> >http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=3D.=
...
>
> > This topic describes using an activity event monitor to capture
> > information about each stament a procedure executes and then aggregates
> > elapsed time across all children. Using this you could first identify
> > which statement(s) are taking the most time in the procedure. Then coul=
d
> > drill down on a given statement by feeding the statement metrics (in th=
e
> > details_xml column of the activity table in the activity event monitor)
> > into the MON_FORMAT_XML_TIMES_BY_ROW table function to see a breakdown
> > of where that statement is spending it's time (could also choose to
> > explain the statements that are taking the most time).
> > ---------
> > Also if you follow teh link in my footer (Oracle2DB2Wiki) you will find
> > a profiler which you may find useful.
>
> > --
> > Serge Rielau
> > SQL Architect DB2 for LUW, IBM Toronto Lab
> > Blog: =A0 =A0tinyurl.com/SQLTips4DB2
> > Wiki: =A0 =A0tinyurl.com/Oracle2DB2Wiki
> > Twitter: srielau
>
> Thank you!
> A very useful response (I hope :-))
> I will look into this further and maybe find out more.
>
> I also forgot to mention that this is not particular for one stored
> procedure - all of them wriiten in SQL PL. There seems to be a very
> high precentage wait time for all of them.
>
> Regards
> Odd B Andersen

I now have been told that pool_read/write_time was included in
wait_time. Sorry for not remembering that :-)

Most of the wait_time is just pool_read_time (97%). That explains why
wait_time is so high, but then I ask myself - why is pool_read_time so
high? I assume this is syncron I/O. And I have to look into it more in
detail now if there are some settings that need to be changed.

We have turned on AUTOMATIC wherever possible, including all
bufferpools, and I assume that this is alright. But I wonder if one
could have too many bufferpools, and if that could result in too much
syncron I/O? We have 40 bufferpools, and 454 user-defined tablespaces.
And if there is a limited amount of memory (12 GB in total) each
bufferpool is fairly small and there will be almost continuous reading-
in and writing-out of data. The database is 315 GB (test) and 1,3 TB
(production).

Just some thoughts on my part :-) But what I would like is some
suggestions on where to start looking?

Regards
Odd B Andersen
0
Reply oddbande 3/24/2011 6:55:23 PM

On Mar 24, 11:55=A0am, oddbande <odd.ander...@ergo.no> wrote:


> I now have been told that pool_read/write_time was included in
> wait_time. Sorry for not remembering that :-)
>
> Most of the wait_time is just pool_read_time (97%). That explains why
> wait_time is so high, but then I ask myself - why is pool_read_time so
> high? I assume this is syncron I/O. And I have to look into it more in
> detail now if there are some settings that need to be changed.
>
> We have turned on AUTOMATIC wherever possible, including all
> bufferpools, and I assume that this is alright. But I wonder if one
> could have too many bufferpools, and if that could result in too much
> syncron I/O? We have 40 bufferpools, and 454 user-defined tablespaces.
> And if there is a limited amount of memory (12 GB in total) each
> bufferpool is fairly small and there will be almost continuous reading-
> in and writing-out of data. The database is 315 GB (test) and 1,3 TB
> (production).
>
> Just some thoughts on my part :-) But what I would like is some
> suggestions on where to start looking?

You're spending 5.051 seconds waiting for physical I/O out of a total
of 5.085 seconds.

What are your average physical page read times?  What is the
bufferpool hit ratio for the bufferpools affected by this stored
procedure?

I would hope that physical IOs are being serviced in <=3D 2 ms =96 which
would mean that you're doing ~2,500 physical IOs.  If you have higher
IO service times (> 8ms) perhaps there is something wrong with the
disk subsystem?

Why do you have so many bufferpools?  40 sounds a little crazy to me
-- I tend to have 1 bufferpool per database page size, and make
exceptions on that depending on the workload.  It's probably
reasonable to have 5-7 bufferpools, but I have a hard time imagining a
situation where 40 is necessary.  As you say, you end up with a lot of
small bufferpools, which may be more trouble than it's worth.  STMM
won't be your friend in a situation like this, either.




0
Reply Ian 3/25/2011 1:17:03 AM

4 Replies
251 Views

(page loaded in 0.092 seconds)

5/22/2013 3:24:38 AM


Reply: