
### SQLCI LOAD and BLOCKIN parameter


```I need to load a large Enscribe data file into an SQL/MP table.  In
order to make the process efficient, I would like to use the BLOCKIN
parameter to retrieve the maximum number of records in a physical

The Enscribe record length is 265, while the BLOCK value is 4096.
Should I set the value of BLOCKIN to a multiple of 265 or a multiple
of 4096?  If it has to be a multiple of 4096, then the highest value I
can set is 28672 (32767 is the limit).  If I can set it to a multiple
of 265, then I can go as high as 32595.  Or, can I just set it at
32767 to receive maximum benefit?

I haven't been able to find out how this works internally, so I'm
hoping an expert in this group can point me in the right direction.
If there are other approaches I can take, I will appreciate learning

Unfortunately, I don't have access to Measure to test the benefits of
one method versus another.

Thanks.
```
ravijankar, 12/1/2009 10:07:13 PM

```Ravi wrote:
> I need to load a large Enscribe data file into an SQL/MP table.  In
> order to make the process efficient, I would like to use the BLOCKIN
> parameter to retrieve the maximum number of records in a physical
>
> The Enscribe record length is 265, while the BLOCK value is 4096.
> Should I set the value of BLOCKIN to a multiple of 265 or a multiple
> of 4096?  If it has to be a multiple of 4096, then the highest value I
> can set is 28672 (32767 is the limit).  If I can set it to a multiple
> of 265, then I can go as high as 32595.  Or, can I just set it at
> 32767 to receive maximum benefit?
>
> I haven't been able to find out how this works internally, so I'm
> hoping an expert in this group can point me in the right direction.
> If there are other approaches I can take, I will appreciate learning
>
> Unfortunately, I don't have access to Measure to test the benefits of
> one method versus another.
>
> Thanks.

If your input file is an unstructured file, then using BLOCKIN and RECIN would be a good idea.  But from what you write, it sounds like the input file is an Enscribe structured file (entry-sequenced, relative, or key-sequenced).  In that case, as long as you do NOT use the SHARE option, LOAD will read the input file with 56KB blocks and unpack the records itself.  (Well, actually, it uses an internal library for reading files that is not published for customers to use.)

So you probably don't need to use BLOCKIN.  If my interpretation of your post was wrong, and your input file actually is an unstructured file, post again and I (or someone else here) will explain how to use BLOCKIN.

One other tip that sometimes is overlooked:  If the SQL table is key-sequenced, and if you know the input file's records are in order by the table's key, then you should include the SORTED option on the LOAD command.  That option tells LOAD that the input records do not have to be sorted, and that will save a lot of processing time.
```
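Keith's suggestion, expressed in SQLCI terms, might look like the sketch below.  The file and table names are hypothetical, and this is only an illustration of the SORTED option for a structured source, not a tested command:

```
-- Hypothetical names: $DATA.SRC.INFILE is the Enscribe source file and
-- $DATA2.SVOL.TARGET is the SQL/MP table.  No BLOCKIN or RECIN is given,
-- since the source is a structured file; SORTED tells LOAD the input is
-- already in order by the table's key, so the sort step is skipped.
LOAD $DATA.SRC.INFILE, $DATA2.SVOL.TARGET, SORTED;
```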

```In article <787f5fcb-cea4-4fac-81d4-8c743fced948@z41g2000yqz.googlegroups.com>, Ravi <ravijankar@gmail.com> wrote:
>I need to load a large Enscribe data file into an SQL/MP table.  In
>order to make the process efficient, I would like to use the BLOCKIN
>parameter to retrieve the maximum number of records in a physical
>
>The Enscribe record length is 265, while the BLOCK value is 4096.
>Should I set the value of BLOCKIN to a multiple of 265 or a multiple
>of 4096?  If it has to be a multiple of 4096, then the highest value I
>can set it 28672 (32767 is the limit).  If I can set it to a multiple
>of 265, then I can go as high as 32595.  Or, can I just set it at

Assuming this is a structured file, the answer is "None of the above."

For structured files, if you use either BLOCKIN or RECIN, they must
_exactly_match_ the physical block size and physical record size,
respectively, of the input file -- so in this case, if you use BLOCKIN at all,
you *must* set it to 4096. There are no other options.

In any event, if your input file is anything other than an unstructured file,
it's completely unnecessary to specify either RECIN or BLOCKIN. Both FUP LOAD
and SQLCI LOAD use large-transfer mode (56KB) reads, determine the record
and block sizes from the file label, and are blindingly fast. Nothing you
tweak in the RECIN/BLOCKIN settings is going to make it go any faster; the
best you could hope for is to avoid slowing it down.

If the file is an unstructured file, then set RECIN to the actual record size
(265) and BLOCKIN to the largest multiple of RECIN that is <= 32768.
```
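For the unstructured case Doug describes, the arithmetic works out as 32768 / 265 = 123 (integer division), and 123 * 265 = 32595, which matches the figure in the original question.  A sketch with hypothetical names, following Doug's rule:

```
-- Hypothetical names.  For an unstructured source with 265-byte records,
-- RECIN is the record size and BLOCKIN is the largest multiple of 265
-- not exceeding 32768, i.e. 123 * 265 = 32595.
LOAD $DATA.SRC.INFILE, $DATA2.SVOL.TARGET, RECIN 265, BLOCKIN 32595;
```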

```
> If your input file is an unstructured file, then using BLOCKIN and RECIN would be a good idea.  But from what you write, it sounds like the input file is an Enscribe structured file (entry-sequenced, relative, or key-sequenced).  In that case, as long as you do NOT use the SHARE option, LOAD will read the input file with 56KB blocks and unpack the records itself.  (Well, actually, it uses an internal library for reading files that is not published for customers to use.)
>
> So you probably don't need to use BLOCKIN.  If my interpretation of your post was wrong, and your input file actually is an unstructured file, post again and I (or someone else here) will explain how to use BLOCKIN.
>
> One other tip that sometimes is overlooked:  If the SQL table is key-sequenced, and if you know the input file's records are in order by the table's key, then you should include the SORTED option on the LOAD command.  That option tells LOAD that the input records do not have to be sorted, and that will save a lot of processing time.

Thank you, Keith.  Your explanation is much clearer than the manual.
It's not clear that BLOCKIN and RECIN apply only to unstructured
files.  This is indeed a large key-sequenced file.  I found the SORTED
option shortly after I posted this.
```

```On Dec 1, 10:17 pm, spamb...@milmac.com (Doug Miller) wrote:

> Assuming this is a structured file, the answer is "None of the above."
>
> For structured files, if you use either BLOCKIN or RECIN, they must
> _exactly_match_ the physical block size and physical record size,
> respectively, of the input file -- so in this case, if you use BLOCKIN at all,
> you *must* set it to 4096. There are no other options.
>
> In any event, if your input file is anything other than an unstructured file,
> it's completely unnecessary to specify either RECIN or BLOCKIN. Both FUP LOAD
> and SQLCI LOAD use large-transfer mode (56KB) reads, determine the record
> and block sizes from the file label, and are blindingly fast. Nothing you
> tweak in the RECIN/BLOCKIN settings is going to make it go any faster; the
> best you could hope for is to avoid slowing it down.
>
> If the file is an unstructured file, then set RECIN to the actual record size
> (265) and BLOCKIN to the largest multiple of RECIN that is <= 32768.

Thank you, Doug.  As I said to Keith, this isn't at all clear in the
manual.  I've been concentrating on the SQL/MP Reference Manual and
the SQL/MP Installation and Management Guide.  Is there another manual
I should be looking at?

I have one other question.  The source file I'm working with has three
partitioned files.  The manual's guidance is as follows:

"1. Start an SQLCI process for each partition. One way to do this is
to start each SQLCI process in the processor associated with the
partition to be loaded.  Another way is to start SQLCI processes
associated with the data sources for the load command."  (ch. 8)

So, assuming I decide to partition my target table, I interpret this
to mean that I should run SQLCI in the same processor as the disk
process of either the source file or target table.  Is that correct?
A sysadmin at my location told me this is a bad idea, but I don't
understand why this is.  This will have to run with the SHARE option,
as the file is open in production at all times.  (The target table
will not be in production.  It's only temporary.)  Also, would it be a
bad idea to put a table partition on the same disk as the source file
partition?  Could that affect users that need access to the source
file?  The sysadmin suggested that if the LOAD is running in the same
CPU as the disk process, then users would be locked out from the file
because the CPU would be hogged by the LOAD.  I don't understand why
that would happen, if I set the LOAD process priority very low.

FYI, our test application resides on virtual disks, while the
production is on "real" disks.  This means I can't really test this in
test.

```

```Ravi wrote:
> On Dec 1, 10:17 pm, spamb...@milmac.com (Doug Miller) wrote:
>
>
>>Assuming this is a structured file, the answer is "None of the above."
>>
>>For structured files, if you use either BLOCKIN or RECIN, they must
>>_exactly_match_ the physical block size and physical record size,
>>respectively, of the input file -- so in this case, if you use BLOCKIN at all,
>>you *must* set it to 4096. There are no other options.
>>
>>In any event, if your input file is anything other than an unstructured file,
>>it's completely unnecessary to specify either RECIN or BLOCKIN. Both FUP LOAD
>>and SQLCI LOAD use large-transfer mode (56KB) reads, determine the record
>>and block sizes from the file label, and are blindingly fast. Nothing you
>>tweak in the RECIN/BLOCKIN settings is going to make it go any faster; the
>>best you could hope for is to avoid slowing it down.
>>
>>If the file is an unstructured file, then set RECIN to the actual record size
>>(265) and BLOCKIN to the largest multiple of RECIN that is <= 32768.
>
>
> Thank you, Doug.  As I said to Keith, this isn't at all clear in the
> manual.  I've been concentrating on the SQL/MP Reference Manual and
> the SQL/MP Installation and Management Guide.  Is there another manual
> I should be looking at?
>
> I have one other question.  The source file I'm working with has three
> partitioned files.  The manual's guidance is as follows:
>
> "1. Start an SQLCI process for each partition. One way to do this is
> to start each SQLCI process in the processor associated with the
> partition to be loaded.  Another way is to start SQLCI processes
> associated with the data sources for the load command."  (ch. 8)
>
> So, assuming I decide to partition my target table, I interpret this
> to mean that I should run SQLCI in the same processor as the disk
> process of either the source file or target table.  Is that correct?
> A sysadmin at my location told me this is a bad idea, but I don't
> understand why this is.  This will have to run with the SHARE option,
> as the file is open in production at all times.  (The target table
> will not be in production.  It's only temporary.)  Also, would it be a
> bad idea to put a table partition on the same disk as the source file
> partition?  Could that affect users that need access to the source
> file?  The sysadmin suggested that if the LOAD is running in the same
> CPU as the disk process, then users would be locked out from the file
> because the CPU would be hogged by the LOAD.  I don't understand why
> that would happen, if I set the LOAD process priority very low.
>
> FYI, our test application resides on virtual disks, while the
> production is on "real" disks.  This means I can't really test this in
> test.
>

For SQLCI commands that are similar to FUP commands, sometimes it helps to look at the description of the command in the FUP Reference Manual.  Sometimes the explanation in the FUP manual is a little easier to follow because it has only Enscribe files to deal with.  Of course, when there is any difference in the explanations, the one in the SQL/MP Reference Manual is the one to follow.

In the case of the LOAD command, many of its options are the same as those of the COPY command, so it is also good to read the description of the COPY command (in full) to be sure you see everything in the manual that might be relevant.

The point of the suggestion to run SQLCI in the same CPU as either the source or target of the LOAD is to slightly improve performance by avoiding inter-processor messages for some of the disc accesses.  Your system manager is correct to raise some caution about that if there is other important work running in the system at the same time as the LOAD will run.  In some cases, a LOAD process can require a significant amount of CPU processing, so you should consider other work going on in the system so as to place the LOAD processing in a CPU that has extra capacity.

Presumably your system manager knows enough about load balancing to give you reasonably good guidance, unless he does not have a lot of experience at it.

The amount of CPU processing a LOAD needs is not easy to predict, since it depends on how much data manipulation the LOAD must do to convert the input data records into the output data records.  I usually suggest that you get MEASURE data of some tests with files/tables with the same description as the live ones and use the same command options as you would in the real case.  I notice that you said you did not have access to MEASURE.  That is unfortunate.

If you have access to MEASURE on your test system, it would be useful to do tests on the test system.  The fact that it uses virtual discs will not make much difference in the performance.  The virtual disc software generally is only involved in opening a file (to map the virtual name to a physical file).  The actual reads and writes go directly to the physical disc that holds the file.  FUP INFO DETAIL will let you determine which physical disc a file is on, so you can know what to look at with MEASURE, and you can even control which physical disc a virtual file is placed on by the PHYSVOL option in CREATE commands.

If the CPU types in the test system are significantly different than the CPUs in the production system, that can make it harder to apply what you learn from measuring the test system to the production work, but it still can help.  And if you cannot use MEASURE, even on the test system, make use of whatever performance monitoring tools you are able to use when running the tests.

Partitioning of the table you create is completely independent of the partitioning of the source file.  It can have more, fewer, or the same number of partitions, and the partition keys do not have to match the source file.  If you decide to use parallel LOADs of the partitions of the table, it can be a little easier to set up the LOAD commands if the partitions of the table match the partitions of the source file, but it is possible to do parallel loading when the partitions do not match.

A possible advantage of making the table be partitioned is that you could run LOADs of each of the partitions in parallel to reduce the total time it would take to load the table.  However, that would put more demands on the system during the LOAD, which could impact the production work more than if you do a single LOAD of the whole table without parallelism.  You also should consider how you are going to use the table once it is created.  If the work you will do with it might perform better with a partitioned table so some queries can utilize parallel processing, it might be good to partition the table.  Of course, if you are going to use the table while other production work is going on, parallelizing the work on the table could make for more interference with the production work, so take that into account.

Note that partitioning the table does NOT require that you do parallel LOADs of the partitions.  You can LOAD a partitioned table with one LOAD command.  That just doesn't do any parallelism in the LOAD.

Whether to place the table on the same discs as the source file probably is most affected by how busy the production work keeps those discs (and, of course, whether there is enough space available on them).  My guess from what you have said is that it probably would be best to put the table on different discs than the source file, but your system manager probably is in a better position than I am to say where they should go.  Considering only how to make the LOAD run the fastest, it doesn't make a lot of difference whether the source and target are on the same disc or different discs.  Different discs probably would be a little better, but not a lot better.  But in your case, it seems that the effect on the production work is more important than the performance of the LOAD.

If you decide to do parallel LOADs of the partitions, ask for more directions about that.  Do not forget to use the SORTED option when the input data is in the proper order.  You only have to tell LOAD where to start reading when the PARTONLY option is used.  With SORTED and PARTONLY together, LOAD knows to stop reading when it encounters the first input record whose key falls past the end of the target partition.

I assume you know that doing a LOAD while the source is being updated can easily result in the target being an inconsistent copy of the data.  If that is not important for your situation, that's okay -- I just wanted to be sure you realize that.

Using the SHARE option will make the LOAD run much slower than it would run without using the SHARE option, since SHARE causes LOAD to read the source one record at a time rather than reading large chunks of the source and deblocking the records itself.  But if you must do the LOAD while other work is going on, SHARE is probably required, unless the other work has the source open in read-only mode.
```
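As a rough illustration of the parallel approach described above: one LOAD per target partition, each run from its own SQLCI process.  All names are hypothetical, and the exact way a target partition is named with PARTONLY should be checked against the SQL/MP Reference Manual; this is a sketch, not a tested command:

```
-- Hypothetical names.  SHARE allows the production opens on the source
-- (at the cost of record-at-a-time reads); SORTED skips the sort because
-- the input is in key order; PARTONLY confines this LOAD to the one
-- target partition named here.  Repeat for each partition, in parallel.
LOAD $DATA.SRC.INFILE, $DATA3.SVOL.TARGET, SHARE, SORTED, PARTONLY;
```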

```Thanks for all the valuable information.  This helps me a lot.
```

6 Replies
518 Views
