How to increase write speed to local hard drive?

Hi there,
I need to write about 4,000 files (the total size is just 500 MB) to a local
UFS partition.
But it takes about 35 minutes.
Deleting them still takes about 20 minutes.
Is there a way to tune the local I/O to increase the write speed?
BTW, it's Solaris 8 running on a Sun Fire server with a SCSI hard drive in
a standard configuration.
Thanks in advance,
Ross 


Reply Ross 6/23/2006 2:01:23 PM

Ross wrote:
> Hi there,
> I need to write about 4,000 files (the total size is just 500MB) to local
> ufs partition.
> But it takes about 35 minutes.
> To delete them, it still takes about 20 minutes.
> Is there a way to tune the local I/O to increase the write speed?
> BTW, it's a Solaris 8 running on a Sun Fire server with SCSI hard drive in
> standard configuration.
> Thanks in advance,
> Ross

Since it sounds like you are not keeping the files, perhaps a tmpfs
filesystem would be quicker?

Reply greek_philosophizer 6/23/2006 2:09:41 PM


Thanks Greek.
But I need to keep them for a while after reboot.
Are there some other things I can try (like enabling DMA)?
Thanks again,
Ross

<greek_philosophizer@hotmail.com> wrote in message 
news:1151071781.470831.172840@r2g2000cwb.googlegroups.com...
>
> Ross wrote:
>> Hi there,
>> I need to write about 4,000 files (the total size is just 500MB) to local
>> ufs partition.
>> But it takes about 35 minutes.
>> To delete them, it still takes about 20 minutes.
>> Is there a way to tune the local I/O to increase the write speed?
>> BTW, it's a Solaris 8 running on a Sun Fire server with SCSI hard drive 
>> in
>> standard configuration.
>> Thanks in advance,
>> Ross
>
> Since it sounds like you are not keeping the files perhaps a tmpfs
> filesystem
> would be quicker?
>
> .
> 


Reply Ross 6/23/2006 3:22:00 PM

On Fri, 23 Jun 2006, Ross wrote:

Please don't top post.

> Thanks Greek.
> But I need to keep them for a while after reboot.
> Are there some other things I can try (like enabling DMA)?

Have you tried mounting the file system with the "noatime" option,
and/or enabling UFS logging?

You might also see some improvement if you upgraded from Solaris 8.
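
For reference, a minimal sketch of what those two options might look like
on a UFS file system. The device names and the /export/data mount point are
placeholders, not taken from this thread, and it is worth checking
mount_ufs(1M) on your release before relying on it.

```shell
# /etc/vfstab entry (one line; the last field holds the mount options).
# Device names and mount point below are hypothetical:
#   /dev/dsk/c0t0d0s7  /dev/rdsk/c0t0d0s7  /export/data  ufs  2  yes  noatime,logging

# Apply the same options to an already-mounted file system without a reboot
# (run as root; /export/data is a placeholder mount point):
mount -F ufs -o remount,noatime,logging /export/data
```

The vfstab entry makes the options persistent across reboots; the remount
only affects the currently mounted file system.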

-- 
Rich Teer, SCNA, SCSA, OpenSolaris CAB member

President,
Rite Online Inc.

Voice: +1 (250) 979-1638
URL: http://www.rite-group.com/rich
Reply Rich 6/23/2006 4:18:00 PM

Thanks Rite!
With the noatime and logging options, the write time decreased from 35 minutes
to 20 minutes!
But it's still not good enough, because for the same files on a Linux machine
with an IDE/ext3 partition, it takes just 6 minutes!
Any more ideas will be greatly appreciated,
Ross

"Rich Teer" <rich.teer@rite-group.com> wrote in message 
news:Pine.SOL.4.64.0606230915500.4149@marrakesh...
> On Fri, 23 Jun 2006, Ross wrote:
>
> Please don't top post.
>
>> Thanks Greek.
>> But I need to keep them for a while after reboot.
>> Are there some other things I can try (like enabling DMA)?
>
> Have you tried mounting the file system with the "noatime" option,
> and/or enabled UFS logging?
>
> You might also see some imporvment if you upgraded from Solaris 8.
>
> -- 
> Rich Teer, SCNA, SCSA, OpenSolaris CAB member
>
> President,
> Rite Online Inc.
>
> Voice: +1 (250) 979-1638
> URL: http://www.rite-group.com/rich 


Reply Ross 6/23/2006 8:58:28 PM

"Ross" <nospam@ross.com> writes:

>Thanks Rite!
>With noatime and logging options, the write time decreased from 35 minutes 
>to 20 minutes!
>But it's still not good enough because for the same files on Linux machine 
>with IDE/Ext3 partition, it just takes 6 minutes!
>Any more idea will be greatly appreciated,


Set the following kernel variable in /etc/system:

	set ufs:ufs_WRITES = 0


On a *quiescent* system, you can set this at runtime using:

	echo ufs_WRITES/W0 | adb -wk

This disables the UFS write throttling feature.

Casper
Reply Casper 6/24/2006 8:54:49 AM

Ross wrote:
> Thanks Rite!
> With noatime and logging options, the write time decreased from 35 minutes
> to 20 minutes!
> But it's still not good enough because for the same files on Linux machine
> with IDE/Ext3 partition, it just takes 6 minutes!
> Any more idea will be greatly appreciated,
> Ross
>

Ross, the usual reason for a Linux PC to be a lot faster is that the
IDE disk write cache is enabled. That means that the data is not
written to disk until some time later. If there were a power failure,
all of the data in the write cache (up to 8 MB) would be lost.

So you are trading data security for speed.

To get a more comparable baseline, you should turn the PC's IDE disk
write cache off and remeasure the time it takes to do the operations
on Linux.
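
A minimal sketch of how the remeasurement could be scripted so that both
machines run the same workload. The function name write_files, the target
directory and the 128 KB file size are assumptions chosen to approximate
4,000 files totalling about 500 MB; none of them come from the original
poster. On the Linux side, `hdparm -W 0 /dev/hda` is the usual way to switch
off an IDE drive's write cache (the device name is also an assumption).

```shell
#!/bin/sh
# Rough small-file write benchmark: a sketch, not a calibrated tool.
# Usage: write_files DIR COUNT KB
#   Creates COUNT files of KB kilobytes each inside DIR.
write_files() {
    dir=$1
    count=$2
    kb=$3
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # /dev/zero as the source keeps data generation out of the timing
        dd if=/dev/zero of="$dir/file$i" bs=1024 count="$kb" 2>/dev/null || return 1
        i=`expr "$i" + 1`
    done
    # Ask the OS to flush buffered data before the caller stops the clock
    sync
}

# Example (hypothetical paths), saved as bench.sh:
#   time sh -c '. ./bench.sh; write_files /export/data/bench 4000 128'
#   time rm -rf /export/data/bench
```

Running it once with the write cache on and once with it off, on the same
file system, gives two numbers that can be compared directly with the
Solaris timings.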

One of the high points in the feature list from Solaris 8 to Solaris 9
was the UFS speed improvement, although many of these improvements were
backported to the later revisions of Solaris 8; the 02/02 (Feb. 2002)
release had many of them.

//Lars

Reply tunla 6/24/2006 9:02:06 PM

tunla <lars.tunkrans@bredband.net> wrote:
> 
> Ross wrote:
>> Thanks Rite!
>> With noatime and logging options, the write time decreased from 35 minutes
>> to 20 minutes!
>> But it's still not good enough because for the same files on Linux machine
>> with IDE/Ext3 partition, it just takes 6 minutes!
>> Any more idea will be greatly appreciated,
>> Ross
>>
> 
> Ross,  The usual reason for a Linux PC to be a lot faster is that the
> IDE  disk Write Cache   is enabled.  That  means that  the data is not
> written to disk  until some time later.  If there was a power faliure
> all the  8 MB of data  in the write cache  would be lost.

Solaris also *always* enables the write cache for IDE disks - regardless
of the original setting:

http://cvs.opensolaris.org/source/xref/on/usr/src/uts/intel/io/dktp/controller/ata/ata_disk.c#130
http://cvs.opensolaris.org/source/xref/on/usr/src/uts/intel/io/dktp/controller/ata/ata_disk.c#2919

If you want to forcibly turn write cache off you have to put:

set ata:ata_write_cache = -1

in /etc/system


-- 
Daniel
Reply Daniel 6/24/2006 11:58:19 PM

Daniel Rock wrote:
> tunla <lars.tunkrans@bredband.net> wrote:
>> Ross wrote:
>>> Thanks Rite!
>>> With noatime and logging options, the write time decreased from 35 minutes
>>> to 20 minutes!
>>> But it's still not good enough because for the same files on Linux machine
>>> with IDE/Ext3 partition, it just takes 6 minutes!
>>> Any more idea will be greatly appreciated,
>>> Ross
>>>
>> Ross,  The usual reason for a Linux PC to be a lot faster is that the
>> IDE  disk Write Cache   is enabled.  That  means that  the data is not
>> written to disk  until some time later.  If there was a power faliure
>> all the  8 MB of data  in the write cache  would be lost.
> 
> Solaris also *always* enables the write cache for IDE disks - regardless
> of the original setting:
> 
> http://cvs.opensolaris.org/source/xref/on/usr/src/uts/intel/io/dktp/controller/ata/ata_disk.c#130
> http://cvs.opensolaris.org/source/xref/on/usr/src/uts/intel/io/dktp/controller/ata/ata_disk.c#2919
> 
> If you want to forcibly turn write cache off you have to put:
> 
> set ata:ata_write_cache = -1
> 
> in /etc/system
> 
> 
Though it would be a little risky, and I'm not recommending it,
I believe that using Casper's "fastfs" would improve performance.
Google <solaris fastfs>.

Bob
Reply Robert 6/25/2006 12:17:52 AM

"Daniel Rock" <v200625@deadcafe.de> writes:

>Solaris also *always* enables the write cache for IDE disks - regardless
>of the original setting:

Not in Solaris 8, 9 or 10.  Not on Solaris SPARC, either.

In those releases it just keeps the default (which is off for most
Sun branded disks and ON for all other disks).

>http://cvs.opensolaris.org/source/xref/on/usr/src/uts/intel/io/dktp/controller/ata/ata_disk.c#130
>http://cvs.opensolaris.org/source/xref/on/usr/src/uts/intel/io/dktp/controller/ata/ata_disk.c#2919

>If you want to forcibly turn write cache off you have to put:

>set ata:ata_write_cache = -1

>in /etc/system

Which has no effect in Solaris 8 or 9 or 10 (except perhaps some
updates) or on SPARC.

ZFS will now enable write caches if it owns the whole disk and it
will flush the cache at the appropriate moments.

Casper
Reply Casper 6/25/2006 9:56:00 AM

Daniel Rock wrote:
> tunla <lars.tunkrans@bredband.net> wrote:
> >
> > Ross wrote:
> >> Thanks Rite!
> >> With noatime and logging options, the write time decreased from 35 minutes
> >> to 20 minutes!
> >> But it's still not good enough because for the same files on Linux machine
> >> with IDE/Ext3 partition, it just takes 6 minutes!
> >> Any more idea will be greatly appreciated,
> >> Ross
> >>
> >
> > Ross,  The usual reason for a Linux PC to be a lot faster is that the
> > IDE  disk Write Cache   is enabled.  That  means that  the data is not
> > written to disk  until some time later.  If there was a power faliure
> > all the  8 MB of data  in the write cache  would be lost.
>
> Solaris also *always* enables the write cache for IDE disks - regardless
> of the original setting:
>
> http://cvs.opensolaris.org/source/xref/on/usr/src/uts/intel/io/dktp/controller/ata/ata_disk.c#130
> http://cvs.opensolaris.org/source/xref/on/usr/src/uts/intel/io/dktp/controller/ata/ata_disk.c#2919
>
> If you want to forcibly turn write cache off you have to put:
>
> set ata:ata_write_cache = -1
>
> in /etc/system
>
>
> --
> Daniel


The OP is comparing UFS performance on a Solaris 8 SPARC system with
a SCSI disk against ext3 performance on a Linux PC with IDE disks.

Linux wins here because data security is sacrificed to obtain speed on
IDE disks.

The OP will have to decide whether he can live with losing the 8 MB of
data in the write cache, which Linux believes it has written to the disk
but hasn't, in case of a power failure. If he can't live with it, he'll
have to turn off the Linux PC's IDE disk write cache, and then his
performance on Linux will drop drastically.

A system that has high requirements for data integrity/security can't
have the disk write cache active, since you are basically lying to the
operating system and saying that data has been written to disk when it
hasn't, yet.

Then again, if you have a high-reliability disk system like a Hitachi
or EMC SAN device, which has its own internal battery backup, you can
trust the disk system to cache and handle the data as soon as you have
written it away.

The problem for the OP (and most users of PC-based operating systems)
is that the write cache on a small IDE disk is usually not battery
backed, and it will fail with very small power fluctuations.

Reply tunla 6/25/2006 12:45:17 PM

With 8 MB, it seems that you are talking about the hard drive cache. Data
stays in this cache no longer than a few ms, until one rotation is done.
IDE drives normally use it only for some cylinder read-ahead and
write-behind caching.

SCSI disks are more stupid here (this was called clever in the past, when
the OSes were stupid). But normally the time is very low, measured in
milliseconds.

Reply llothar 6/25/2006 2:54:47 PM

Casper H.S. Dik wrote:
> ZFS will now enable write caches if it owns the whole disk and it
> will flush the cache at the appropriate moments.

Hm, let's assume Solaris supports ZFS boot in the near future, i.e. you
can install right to a ZFS /, /usr, /var etc.; what about swap? You need
a partition/slice for swap, so ZFS on such a system will never own the
whole disk, so no whole disk = no write cache = lower performance?

Or are there plans to integrate swap handling into ZFS somehow?
Encrypted swap with zfs-crypto, sounds nice, don't you think? ;)
Reply ISO 6/25/2006 6:02:35 PM

llothar wrote:
> With 8 MB it seems that you are talking about the harddrive cache. This
> cache will not be longer used then a few ms until one rotation is done.
> IDE drives normally only use this for some cylinder read ahead, write
> behind caching.
>
> SCSI are more stupid here (called clever in the past where the OS were
> stupid). But normally the time is very low  - measured in milliseconds.

And milliseconds count, in fact: it means that every time the system
does a write it needs to wait until the write is safely on disk if it
wants to be safe, which perhaps takes several ms, which might easily be
more than the time to actually write the data.

Of course, you can just elect not to be safe, but 8MB is quite a lot to
lose, even if you only lose it occasionally.  And with large numbers of
spindles you don't lose it only occasionally.

SCSI actually is much smarter, because the tagged-queuing stuff lets
the system have several writes in flight at once to the disk, with the
disk notifying when they complete, so it can often hide the latency,
without needing to rely on an unreliable write-cache (it's more-or-less
the same trick that processors do to hide latency).  I think that SATA
disks (and maybe just plain ATA ones) can do this now too.

--tim

Reply Tim 6/25/2006 7:34:08 PM

In article <e7mmns$c0d$1@news01.versatel.de>,
	Stefan Krüger <skrueger@meinberlikomm.de> writes:
> 
> hm let's assume solaris support's ZFS boot in the near future, i.e. you
> can install right to a zfs /, /usr, /var etc.; what about swap? you need
> a partition/slice for swap so zfs on such a system will never own the
> whole disk, so no whole disk = no write cache = lower performance?
> 
> or are there plans to intergrate swap handling into zfs somehow?
> encrypted swap with zfs-crypto, sounds nice, don't you think? ;)

ZFS supports creation of emulated volumes, which you can swap to.

-- 
Andrew Gabriel
Reply andrew 6/25/2006 7:50:31 PM

Casper H.S. Dik <Casper.Dik@sun.com> wrote:
> "Daniel Rock" <v200625@deadcafe.de> writes:
>>If you want to forcibly turn write cache off you have to put:
>> 
>>set ata:ata_write_cache = -1
>> 
>>in /etc/system
> 
> Which has no effect in Solaris 8 or 9 or 10 (except perhaps some
> updates) or on SPARC.

Hmm, the following file:

server:/mnt/platform/i86pc/kernel/drv# what ata
ata:
        SunOS 5.10 Generic_118844-26 November 2005

defines the variable and the function. Maybe they are not connected in the
source tree, but they definitely are there:

server:/mnt/platform/i86pc/kernel/drv# nm ata | grep write_cache
[164]   |     29471|     233|FUNC |LOCL |0    |1      |ata_set_write_cache
[444]   |       624|       4|OBJT |GLOB |0    |4      |ata_write_cache

This is extracted from the x86.miniroot (S10U1).


-- 
Daniel
Reply Daniel 6/25/2006 9:41:44 PM

llothar wrote:
> With 8 MB it seems that you are talking about the harddrive cache. This
> cache will not be longer used then a few ms until one rotation is done.
> IDE drives normally only use this for some cylinder read ahead, write
> behind caching.
>
> SCSI are more stupid here (called clever in the past where the OS were
> stupid). But normally the time is very low  - measured in milliseconds.

 Sigh......

Read-ahead is fine; there have never been any problems with that.
Write-behind caching is the potential black hole.
During the milliseconds when data is in the cache, it is at risk,
and if you have a busy system, SOME DATA IS almost ALWAYS
in the CACHE.
Hence it does not matter if a single record is in the cache for just a
few msec.

THE POINT IS that WHENEVER you have a power failure you WILL lose
the data that's in the cache at that time.

If we are talking about a logging file system like ext3, this is the
difference between a log replay restart and a FULL FSCK restart, not
counting the data that has vanished into the great bit bucket.

This is why I say that data security is sacrificed for speed in cheap
PC solutions.

Unless this cache is battery backed up, of course, as in BIG SAN arrays.
But we are discussing $50 IDE disks, and they don't have battery
backed-up write caches. And they are very seldom on a UPS power supply,
because the UPS costs more than the PC.



   //Lars

Reply tunla 6/25/2006 10:32:27 PM

tunla wrote:
> But we are discussing  $50   IDE disks  and they dont have battery
> backed up write caches. And they are very seldom on UPS powersupply.
> because the UPS costs more than the PC.

???
You should change your hardware shop.
I have a small UPS with a battery that keeps my headless U10 server up
for one hour. This costs 60 euros. It has a serial port interface that
allows a clean shutdown.

On my workstation, with its power-hungry CPU, I have the same UPS
model. It lasts only a few minutes, but I do a suspend to disk
immediately on power failure, so it needs to hold up for just a few seconds.

If you have sensitive data, you should always have at least such a
low-cost solution.

Reply llothar 6/26/2006 5:21:41 AM

> And milliseconds count, in fact: it means that every time the system
> does a write it needs to wait until the write is safely on disk if it
> wants to be safe, which perhaps takes several ms, which might easily be
> more than the time to actually write the data.

If PCs were designed well, it wouldn't have an influence, because the
capacitors in your power supply are able to hold the power for a few
hundred milliseconds. That is enough time for the disk to write the data,
but the OS does not know that it must immediately stop queuing new data.

> SCSI actually is much smarter, because the tagged-queuing stuff lets
> the system have several writes in flight at once to the disk, with the
> disk notifying when they complete, so it can often hide the latency,
> without needing to rely on an unreliable write-cache (it's more-or-less
> the same trick that processors do to hide latency).  I think that SATA
> disks (and maybe just plain ATA ones) can do this now too.

I was thinking about one of the SCSI features that allows the
controller to map bad blocks to different locations on the disk without
notifying the OS about this change (because in the mid-'90s the OS didn't
do anything useful with this information).
So even when the OS thinks that blocks are close together, SCSI disks
might have to travel a long way to reach the block (killing the elevator
algorithm in the OS). I hope that this SCSI feature is now disabled by
default.

In this scenario we have hundreds of milliseconds in the worst case.

Reply llothar 6/26/2006 5:28:27 AM

llothar wrote:

>
> If PC's were designed well it wouldn't have an influence because the
> condensator in your power supply are able to keep the power for a few
> hundert milliseconds. Enough time for the disk to write it, but the OS
> does not know that it must immediately stop queuing new data.

Well, as you say, the issue is that systems aren't well designed, or
cheap ones aren't! So to be safe on cheap systems you have to be
cautious, and Solaris is a bit more interested in being safe than Linux
typically is.  It's kind of the definition of a non-cheap system that
it deals with issues like this properly, so disks can have write caches
which can be used, memory has ECC &c &c...

>
> I was thinking about one of the SCSI features that allows the
> controller to map bad blocks to different locations on the disk without
>  notifying the OS about this change (because in mid 90 the OS didn't do
> anything usefull with this information).
> So even when the OS thinks that blocks are close together SCSI disks
> might need a long way to reach the block (and killing the elevator
> algorithm in the OS). I hope that this SCSI feature is now disabled by
> default.

I think this is actually OK - if a small number of sectors are remapped
then the bad case will happen only very rarely, which is OK.  If a
*large* number are remapped, then chances are very high the disk is
about to die anyway :-)

--tim

Reply Tim 6/26/2006 6:56:48 AM

tunla wrote:

> If we are talking a logging file system like EXT3  this is the
> difference between a
> log replay restart and a FULL FSCK restart. not counting the data that
> has vanished into the great bit bucket.

That should never be the case for a proper logging FS - writes should
always happen in a good order so that, so long as the disk commits the
writes in the order they're issued, the FS is in a good state, however
many might be lost after the last one committed.  Of course disks might
commit out of order I guess (and not tell you).

--tim

Reply Tim 6/26/2006 6:59:37 AM

Stefan Krüger <skrueger@meinberlikomm.de> writes:

>Casper H.S. Dik wrote:
>> ZFS will now enable write caches if it owns the whole disk and it
>> will flush the cache at the appropriate moments.

>hm let's assume solaris support's ZFS boot in the near future, i.e. you
>can install right to a zfs /, /usr, /var etc.; what about swap? you need
>a partition/slice for swap so zfs on such a system will never own the
>whole disk, so no whole disk = no write cache = lower performance?

On single disk systems; or you could enable the write cache yourself;
surely, losing swap space is fairly uninteresting when the system goes
down.

>or are there plans to intergrate swap handling into zfs somehow?
>encrypted swap with zfs-crypto, sounds nice, don't you think? ;)

ZFS already supports swap using zvols.  (zfs create ... -V size volume)

(Swapping to ZFS files is a bad idea because ZFS never overwrites data in place.)

Casper
Reply Casper 6/26/2006 9:09:02 AM

"Daniel Rock" <v200625@deadcafe.de> writes:

>Casper H.S. Dik <Casper.Dik@sun.com> wrote:
>> "Daniel Rock" <v200625@deadcafe.de> writes:
>>>If you want to forcibly turn write cache off you have to put:
>>> 
>>>set ata:ata_write_cache = -1
>>> 
>>>in /etc/system
>> 
>> Which has no effect in Solaris 8 or 9 or 10 (except perhaps some
>> updates) or on SPARC.

>Hmm, the following file:

>server:/mnt/platform/i86pc/kernel/drv# what ata
>ata:
>        SunOS 5.10 Generic_118844-26 November 2005

>defines the variable and the function. Maybe they are not connected in the
>source tree buy they definitely are there:

>server:/mnt/platform/i86pc/kernel/drv# nm ata | grep write_cache
>[164]   |     29471|     233|FUNC |LOCL |0    |1      |ata_set_write_cache
>[444]   |       624|       4|OBJT |GLOB |0    |4      |ata_write_cache

>This is extracted from the x86.miniroot (S10U1).

It's not in S10 FCS; it was added later (not sure when it was backported).

The reason for this change was fairly simple; generally, all IDE
disks come with write caches enabled so for the most part this
change is a no-op.  It is mostly relevant for a series of IDE disks
Sun shipped with the write cache disabled.

Now, why we thought this was a good change to make is beyond me.

Casper
-- 
Expressed in this posting are my opinions.  They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
Reply Casper 6/26/2006 9:12:52 AM

Casper H.S. Dik <Casper.Dik@Sun.COM> writes:

> ZFS already supports swap using zvols.  (zfs create ... -V size volume)

True, but you'll need the fix for CR 6405330, recently integrated into
Nevada.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Faculty of Technology, Bielefeld University
Reply Rainer 6/26/2006 10:08:19 AM

Casper H.S. Dik wrote:
> =?ISO-8859-1?Q?Stefan_Kr=FCger?= <skrueger@meinberlikomm.de> writes:
>> or are there plans to intergrate swap handling into zfs somehow?
>> encrypted swap with zfs-crypto, sounds nice, don't you think? ;)
> 
> ZFS already supports swap using zvols.  (zfs create ... -V size volume)
and you can enable/add this with

swap -a /dev/zvol/dsk/pool/swap ?

is this done automagically after a reboot?

> (Swapping the ZFS files is a bad idea because of the never overwrite)
what does that mean? Swapspace created with (for example)

zfs create -V 1g pool/swap
swap -a /dev/zvol/dsk/pool/swap

is a bad idea?
Reply ISO 6/26/2006 5:22:37 PM
