I'm thinking of buying a used Blade 2000. I've been offered a Dell 147
GB F-CAL (fibre) disk. Are Suns fussy about their disks, or will pretty
much any FCAL disk work in a Blade 2000?
I know on the older machines, it rarely seems to matter who makes the
SCSI disk. One usually has to label non-Sun disks, but that is about it.
But I've no idea if this freedom extends to fibre disks.
[Posted by Dave, 9/27/2007 1:37:45 PM]
On 2007-09-27, Dave <sorry-no-email@nowhere.com> wrote:
> I'm thinking of buying a used Blade 2000. I've been offered a Dell 147
> GB F-CAL (fibre) disk. Are Suns fussy about their disks, or will pretty
> much any FCAL disk work in a Blade 2000?
>
> I know on the older machines, it rarely seems to matter who makes the
> SCSI disk. One usually has to label non-Sun disks, but that is about it.
> But I've not no idea if this freedom extends to fibre disks.
W-e-e-e-e-e-e-e-e-e-e-e-llll. This is long, but bear with me...
I bought a SB2000 a couple of months ago, from eBay, after my main home machine
(an Ultra 60) died in a lightning strike. (Trying to claim on the insurance for
a non-PC, non-Mac computer was amusing, but that's another story). The SB2000
has twin 73GB disks, 4 GB of memory and twin 1.2GHz processors and initially I
was delighted with it.
I reinstalled everything, restored from the U60 backup tapes and was poised on
the brink of getting disk mirroring set up (which I'd put off, because it's fiddly
to do and the sort of thing I only do rarely (the systems I work on at work are
set up by other people)) when it started crashing. No errors, nothing in syslog,
it just died. I dd'd /home to the other disk to save my work. Then I noticed I
was getting UFS log rollover errors from the boot disk - they weren't getting
syslogged, just coming on the console, so I never saw them unless I was actually
there. After a day of crashes of increasing frequency, it would no longer boot
from the main disk. Booting from DVD, the main disk could no longer be seen by
the system at all. probe-scsi also couldn't see it at all.
So I contacted the supplier, who was absolutely brilliant throughout, and he
sent me another 73GB disk. This is where it gets relevant to you. The original
disks were Sun badged ones. The replacement disk was a Fujitsu one. The original
disk was c1t1d0, the new one came up as c1t33d0, and I *could* *not* make its
logical ID correct (and no-one responded when I posted here asking for help -
not to worry, I learned loads about luxadm, OBP, FC-AL disks and so on). I
decided to press ahead regardless and run the machine with c1t33d0 and c1t2d0,
but first I ran a surface analysis on the new disk. Hundreds of errors, where it
said that the error was repairable, but was unable to determine the block ID to
repair it.
I contacted the supplier again, and he sent me another new disk, only this time
another Sun badged one, identical to the original. This came up as c1t1d0
immediately, surface analysis ran fine, so I installed it. By this time, I'd
reinstalled everything (again) on c1t2d0, so I just made c1t1d0 the mirror. It's
been absolutely fine for about 10 days now. (I do need to swap the boot disk
back to c1t1d0, I suppose).
So ... my conclusion? I'd be inclined to think that the SB2000 *is* picky about
disks, and if I ever need to fit another, I shall be choosy about what I buy.
Would I ever buy another machine with FC-AL disks? Probably not.
Oh, and "Dear Mr. Sun", what's the point in fitting hot swappable disks to a
machine with a power supply interlock on its access panel, so you can't open it
up "hot" to swap the disks anyway?
--
"Religion poisons everything."
[email me at huge {at} huge (dot) org <dot> uk]
[Posted by Huge, 9/27/2007 2:33:17 PM]
Huge wrote:
> On 2007-09-27, Dave <sorry-no-email@nowhere.com> wrote:
>
>>I'm thinking of buying a used Blade 2000. I've been offered a Dell 147
>>GB F-CAL (fibre) disk. Are Suns fussy about their disks, or will pretty
>>much any FCAL disk work in a Blade 2000?
>>
>>I know on the older machines, it rarely seems to matter who makes the
>>SCSI disk. One usually has to label non-Sun disks, but that is about it.
>>But I've not no idea if this freedom extends to fibre disks.
>
>
> W-e-e-e-e-e-e-e-e-e-e-e-llll. This is long, but bear with me...
>
> I bought a SB2000 a couple of months ago, from eBay, after my main home machine
> (an Ultra 60) died in a lightning strike. (Trying to claim on the insurance for
> a non-PC, non-Mac computer was amusing, but that's another story). The SB2000
> has twin 73GB disks, 4 GB of memory and twin 1.2GHz processors and initially I
> was delighted with it.
>
> I reinstalled everything, restored from the U60 backup tapes and was poised on
> brink of getting disk mirroring set up (which I'd put off, because it's fiddly
> to do and the sort of thing I only do rarely (the systems I work on at work are
> set up by other people)) when it started crashing. No errors, nothing in syslog,
> it just died. I dd'd /home to the other disk to save my work. Then I noticed I
> was getting UFS log rollover errors from the boot disk - they weren't getting
> syslogged, just coming on the console, so I never saw them unless I was actually
> there. After a day of crashes of increasing frequency, it would no longer boot
> from the main disk. Booting from DVD, the main disk could no longer be seen by
> the system at all. probe-scsi also couldn't see it at all.
>
> So I contacted the supplier, who was absolutely brilliant throughout, and he
> sent me another 73Gb disk. This is where it gets relevant to you. The original
> disks were Sun badged ones. The replacement disk was a Fujitsu one. The original
> disk was c1t1d0, the new one came up as c1t33d0, and I *could* *not* make its
> logical ID correct (and no-one responded when I posted here asking for help -
> not to worry, I learned loads about luxadm, OBP, FC-AL disks and so on). I
> decided to press ahead regardless and run the machine with c1t33d0 and c1t2d0,
> but first I ran a surface analysis on the new disk. Hundreds of errors, where it
> said that the error was repairable, but was unable to determine the block ID to
> repair it.
>
> I contacted the supplier again, and he sent me another new disk, only this time
> another Sun badged one, identical to the original. This came up as c1t1d0
> immediately, surface analysis ran fine, so I installed it. By this time, I'd
> reinstalled everything (again) on c1t2d0, so I just made c1t1d0 the mirror. It's
> been absolutely fine for about 10 days now. (I do need to swap the boot disk
> back to c1t1d0, I suppose).
>
> So ... my conclusion? I'd be inclined to think that the SB2000 *is* picky about
> disks, and if I ever need to fit another, I shall be choosy about what I buy.
> Would I ever buy another machine with FC-AL disks? Probably not.
>
> Oh, and "Dear Mr. Sun", what's the point in fitting hot swappable disks to a
> machine with a power supply interlock on its access panel, so you can't open it
> up "hot" to swap the disks anyway?
>
>
Just guessing but it might have something to do with buying disks in
quantity and/or not needing to stock hundreds of different replacement
disks.
[Posted by Richard, 9/27/2007 3:28:17 PM]
On Sep 27, 6:37 am, Dave <sorry-no-em...@nowhere.com> wrote:
> I'm thinking of buying a used Blade 2000. I've been offered a Dell 147
> GB F-CAL (fibre) disk. Are Suns fussy about their disks, or will pretty
> much any FCAL disk work in a Blade 2000?
>
> I know on the older machines, it rarely seems to matter who makes the
> SCSI disk. One usually has to label non-Sun disks, but that is about it.
> But I've not no idea if this freedom extends to fibre disks.
YMMV, as they say. Huge had problems with one disk, whereas I have 50-odd of
them and nary a squawk from one in the last X number of years. Lest we forget,
SAN used to be (pretty much) exclusively FC-AL. Anyway, that said, all my disks
are Sun branded, and each one has the latest firmware updates. How would you,
if you had to, update a Dell disk? Further, I have used lots of SCSI disks too,
and the non-Sun ones did on occasion act strangely. Some died the death. All
the Sun ones still live :>
Recently I purchased an HBA for my 2000. Too bad the PCI only has one 66 MHz
slot, as I built an external 500 GB SATA-II drive and it works very well and
was under 200 CDN dollars. Fast even at half speed.
The 2000 with dual 1.2's is a fine box. I don't think they have released
anything to replace it YET that's worth buying new or used. Still waiting.
[Posted by gerryt, 9/27/2007 3:41:42 PM]
On 2007-09-27, Richard B. Gilbert <rgilbert88@comcast.net> wrote:
> Huge wrote:
>> So ... my conclusion? I'd be inclined to think that the SB2000 *is* picky about
>> disks, and if I ever need to fit another, I shall be choosy about what I buy.
>> Would I ever buy another machine with FC-AL disks? Probably not.
>>
>> Oh, and "Dear Mr. Sun", what's the point in fitting hot swappable disks to a
>> machine with a power supply interlock on its access panel, so you can't open it
>> up "hot" to swap the disks anyway?
>>
>>
>
> Just guessing but it might have something to do with buying disks in
> quantity and/or not needing to stock hundreds of different replacement
> disks.
It was a rhetorical question... :o)
--
"Religion poisons everything."
[email me at huge {at} huge (dot) org <dot> uk]
[Posted by Huge, 9/27/2007 4:00:36 PM]
In comp.sys.sun.admin Huge <Huge@nowhere.much.invalid> wrote:
> So ... my conclusion? I'd be inclined to think that the SB2000 *is* picky
> about disks, and if I ever need to fit another, I shall be choosy about
> what I buy. Would I ever buy another machine with FC-AL disks? Probably not.
Not my experience. I recently put "defective" disks from an EMC
Symmetrix DMX2 into my SB1000 and had no issues at all.
These are Seagate disks with a custom EMC firmware (they identify themselves
as SX3146707 instead of ST3146707). The two 146GB disks replaced the
original 36GB disks.
--
Daniel
[Posted by Daniel, 9/27/2007 6:14:31 PM]
Daniel Rock wrote:
> In comp.sys.sun.admin Huge <Huge@nowhere.much.invalid> wrote:
>
>>So ... my conclusion? I'd be inclined to think that the SB2000 *is* picky
>>about disks, and if I ever need to fit another, I shall be choosy about
>>what I buy. Would I ever buy another machine with FC-AL disks? Probably not.
>
>
> Not my experience. I recently put "defective" disks from an EMC
> Symmetrix DMX2 into my SB1000 and had no issues at all.
>
> These are Seagate disks with a custom EMC firmware (they identify themselves
> as SX3146707 instead of ST3146707). The two 146GB disks replaced the
> original 36GB disks.
>
If EMC thinks they are defective they MIGHT be OK. EMC tends to be
ultra conservative; I worked with an EMC 3630 for several years without
a single failure! Every once in a while someone from EMC would show up
and replace a cable or a circuit board but this was almost always done
without down time! What they replaced was working but they considered
the replacement to be "better" or "more reliable". If it's not the most
reliable storage on the planet, it comes close! If it's not the most
expensive storage on the planet, it comes close!
It's worth noting that there is almost NO secondary market in EMC
equipment; if you didn't buy it from EMC, they won't support it!
[Posted by Richard, 9/27/2007 7:31:26 PM]
In comp.sys.sun.admin Richard B. Gilbert <rgilbert88@comcast.net> wrote:
> If EMC thinks they are defective they MIGHT be OK.
I know. That's why I put them into the SB1000. We have a policy of not
giving back failed disks. Instead we keep them and physically destroy
them from time to time.
90% of the "failed" EMC disks are indeed Ok. They may have spin-up problems
or a few entries in the grown defect list. But basically they are Ok.
The EMC firmware also doesn't prevent using them in another environment.
> It's worth noting that there is almost NO secondary market in EMC
> equipment; if you didn't buy it from EMC, they won't support it!
Why would EMC support its disks in a SB1000?
--
Daniel
[Posted by Daniel, 9/27/2007 9:56:57 PM]
In comp.sys.sun.hardware Daniel Rock <v200739@deadcafe.de> wrote:
> In comp.sys.sun.admin Richard B. Gilbert <rgilbert88@comcast.net> wrote:
>> If EMC thinks they are defective they MIGHT be OK.
>
> I know. That's why I put them into the SB1000. We have a policy of not
> giving back failed disks. Instead we keep them and physically destroy
> them from time to time.
>
> 90% of the "failed" EMC disks are indeed Ok. They may have spin-up problems
> or a few entries in the grown defect list. But basically they are Ok.
I'd say a disk that doesn't always spin up correctly is not a good place
to store data.
[Posted by Cydrome, 9/28/2007 3:47:45 AM]
According to Huge <huge@huge.org.uk>:
[ ... ]
> So I contacted the supplier, who was absolutely brilliant throughout, and he
> sent me another 73Gb disk. This is where it gets relevant to you. The original
> disks were Sun badged ones. The replacement disk was a Fujitsu one. The original
> disk was c1t1d0, the new one came up as c1t33d0, and I *could* *not* make its
> logical ID correct (and no-one responded when I posted here asking for help -
> not to worry, I learned loads about luxadm, OBP, FC-AL disks and so on). I
> decided to press ahead regardless and run the machine with c1t33d0 and c1t2d0,
> but first I ran a surface analysis on the new disk. Hundreds of errors, where it
> said that the error was repairable, but was unable to determine the block ID to
> repair it.
Hmm ... that may be because the new disk had a different WWN
(World Wide Name), which is unique to each FC-AL disk, and Solaris had
already allocated c1t1d0 to another WWN. The way to fix that is to run
devfsadm with the -C (cleanup) option, so it removes the data about the
old WWN and frees c1t1d0 for the new one.
Perhaps the reason that things worked as desired with the third
disk is that you had done a fresh install of Solaris on c1t2d0 while
there was no disk in the c1t1d0 slot.
An interesting thing, BTW: the Sun Fire 280R (which I have) uses
the same system board, but a slightly different drive cage, and it
assigns c1t0d0 and c1t1d0 instead of the c1t1d0 and c1t2d0 which the Sun
Blade 2000 does. Another difference is that the Sun Fire 280R's drive
cage will only accept 1" high drives, while the Sun Blade 2000's drive
cage will accept 1.6" high drives.
> I contacted the supplier again, and he sent me another new disk, only this time
> another Sun badged one, identical to the original. This came up as c1t1d0
> immediately, surface analysis ran fine, so I installed it. By this time, I'd
> reinstalled everything (again) on c1t2d0,
With nothing in the c1t1d0 slot? So Solaris was able to start
from scratch, with no WWN conflicts.
> so I just made c1t1d0 the mirror. It's
> been absolutely fine for about 10 days now. (I do need to swap the boot disk
> back to c1t1d0, I suppose).
>
> So ... my conclusion? I'd be inclined to think that the SB2000 *is* picky about
> disks, and if I ever need to fit another, I shall be choosy about what I buy.
> Would I ever buy another machine with FC-AL disks? Probably not.
My Sun Fire 280R came with no disks, and I bought a pair of 146
GB drives (non Sun) and both work with no problems. Before I knew that
the Sun Fire 280R would not accept the 1.6" high drives, I had gotten a
pair of them (at about 180 GB) from eBay -- and I was able to test them
in a friend's Sun Blade 2000, and they just would not work at all. The
vendor took them back, since he had not been able to test them either.
> Oh, and "Dear Mr. Sun", what's the point in fitting hot swappable disks to a
> machine with a power supply interlock on its access panel, so you can't open it
> up "hot" to swap the disks anyway?
Because the same system board goes in the Sun Fire 280R, which
has both the (1") drives and the hot-swap power supplies changeable from
the front panel with no interlocks. I guess that they figure that if
you don't need the hot swappable power supplies, you also don't need hot
swappable disks. :-)
When you *do* hot swap them in the Sun Fire 280R, you do have to
umount them anyway, which may require rebooting onto another drive.
I have a card cage for a Sun Blade 2000 which I plan to set up
for making duplicate drives for backups using the external FC-AL
connector.
And I did not answer the original questions probably because I
did not have the Sun Fire 280R yet and thus had no experience with the
FC-AL disks.
Good Luck,
DoN.
--
Email: <dnichols@d-and-d.com> | Voice (all times): (703) 938-4564
(too) near Washington D.C. | http://www.d-and-d.com/dnichols/DoN.html
--- Black Holes are where God is dividing by zero ---
[Posted by dnichols, 9/28/2007 5:48:32 AM]
In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
> I'd say a disk that doesn't always spin up correctly is not a good place
> to store data.
So don't let it spin down.
The disk is in a workstation. If one fails the mirror is still Ok. If both
fail, I can restore the data from the backup.
--
Daniel
[Posted by Daniel, 9/28/2007 8:45:40 AM]
On 2007-09-28, DoN. Nichols <dnichols@d-and-d.com> wrote:
> According to Huge <huge@huge.org.uk>:
>
[snippage]
>> but first I ran a surface analysis on the new disk. Hundreds of errors, where it
>> said that the error was repairable, but was unable to determine the block ID to
>> repair it.
>
> Hmm ... that may be because the new disk had a different WWN
> (World Wide Name) which makes all FC-AL disks unique, and Solaris had
> already allocated c1t1d0 to another WWN. The way to fix that is to run
> devfsadm with the -C (cleanup) option, so it removes the data about the
> old WWN and frees c1t1d0 for the new one.
>
> Perhaps the reason that things worked as desired with the third
> disk is that you had done a fresh install of Solaris on c1t2d0 while
> there was no disk in the c1t1d0 slot.
This was happening even when booting from the install media, which builds its
device tree from scratch each time.
>> Oh, and "Dear Mr. Sun", what's the point in fitting hot swappable disks to a
>> machine with a power supply interlock on its access panel, so you can't open it
>> up "hot" to swap the disks anyway?
>
> Because the same system board goes in the Sun Fire 280R, which
> has both (1") drives, and both hot-swap power supplies changeable from
> the front panel with no interlocks. I guess that they figure that if
> you don't need the hot swappable power supplies, you also don't need hot
> swapable disks. :-)
Ahhh, that makes sense. Although 30 seconds with some Scotch tape sorted out the
interlock.
> When you *do* hot swap them in the Sun Fire 280R, you do have to
> umount them anyway, which may require rebooting onto another drive.
In theory you can metadetach, then luxadm {remove - whatever the command is} the
drive on the SB2K, except you can't open the box! :o)
> And I did not answer the original questions probably because I
> did not have the Sun Fire 280R yet and thus had no experience with the
> FC-AL disks.
--
"Religion poisons everything."
[email me at huge {at} huge (dot) org <dot> uk]
[Posted by Huge, 9/28/2007 9:09:50 AM]
Huge wrote:
>>> Oh, and "Dear Mr. Sun", what's the point in fitting hot swappable disks to a
>>> machine with a power supply interlock on its access panel, so you can't open it
>>> up "hot" to swap the disks anyway?
>> Because the same system board goes in the Sun Fire 280R, which
>> has both (1") drives, and both hot-swap power supplies changeable from
>> the front panel with no interlocks. I guess that they figure that if
>> you don't need the hot swappable power supplies, you also don't need hot
>> swapable disks. :-)
>
> Ahhh, that makes sense. Although 30 seconds with some Scotch tape sorted out the
> interlock.
Personally, given the Blade 2000 is not designed for hot swapping of
disks, I suspect it could be risky to swap them. I would suspect both
the disk and where it plugs must both be designed to allow hot-swap. It
seems unlikely Sun would have designed the Blade 2000 to be
hot-swappable, then put an interlock on it.
Of course, it may be that the disk just connects to a standard SCSI
chip, and that takes care of it all. But unless I knew that to be the
case, I personally would not risk it.
[Posted by Dave, 9/28/2007 9:45:25 AM]
On 2007-09-28, Dave <sorry-no-email@nowhere.com> wrote:
> Huge wrote:
>
>>>> Oh, and "Dear Mr. Sun", what's the point in fitting hot swappable disks to a
>>>> machine with a power supply interlock on its access panel, so you can't open it
>>>> up "hot" to swap the disks anyway?
>>> Because the same system board goes in the Sun Fire 280R, which
>>> has both (1") drives, and both hot-swap power supplies changeable from
>>> the front panel with no interlocks. I guess that they figure that if
>>> you don't need the hot swappable power supplies, you also don't need hot
>>> swapable disks. :-)
>>
>> Ahhh, that makes sense. Although 30 seconds with some Scotch tape sorted out the
>> interlock.
>
> Personally, given the Blade 2000 is not designed for hot swapping of
> disks, I suspect it could be risky to swap them. I would suspect both
> the disk and where it plugs must both be designed to allow hot-swap. It
> seems unlikely Sun would have designed the Blade 2000 to be
> hot-swappable, then put an interlock on it.
Oh, I think you underestimate the stupidity of the Health and Safety fascists.
--
"Religion poisons everything."
[email me at huge {at} huge (dot) org <dot> uk]
[Posted by Huge, 9/28/2007 10:38:41 AM]
In comp.sys.sun.hardware Daniel Rock <v200739@deadcafe.de> wrote:
> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>> I'd say a disk that doesn't always spin up correctly is not a good place
>> to store data.
>
> So don't let it spin down.
>
> The disk is in a workstation. If one fails the mirror is still Ok. If both
> fail, I can restore the data from the backup.
Do you drive around with a flat tire? Three out of four isn't too bad.
[Posted by Cydrome, 10/1/2007 1:42:03 AM]
In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
> Do you drive around with a flat tire? Three out of four isn't too bad.
Bad analogy.
Do you change your LCD screen if it shows a bad pixel?
--
Daniel
[Posted by Daniel, 10/1/2007 12:59:25 PM]
In comp.sys.sun.hardware Daniel Rock <v200740@deadcafe.de> wrote:
> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>> Do you drive around with a flat tire? Three out of four isn't too bad.
>
> Bad analogy.
>
> Do you change your LCD screen if it shows a bad pixel?
If my display has two pixels, and I know one is broken to start with, yes,
I replace it.
[Posted by Cydrome, 10/1/2007 3:15:36 PM]
In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
> If my display has two pixels, and I know one is broken to start with, yes,
> I replace it.
Do you replace your car if you jump-started it once?
--
Daniel
[Posted by Daniel, 10/1/2007 4:37:25 PM]
Daniel Rock wrote:
> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>> If my display has two pixels, and I know one is broken to start with, yes,
>> I replace it.
>
> Do you replace your car if you jump-started it once?
If my car doesn't work reliably, I find out what's wrong and fix or
replace it. If the disk is purely scratch storage and I wouldn't miss
what's on it, then I would reuse the disk. I would not, however, store
anything critical on the disk, even if it is part of a mirror set.
If your car's steering failed once, would you keep on driving it?
[Posted by Douglas, 10/1/2007 4:48:59 PM]
In comp.sys.sun.hardware Daniel Rock <v200740@deadcafe.de> wrote:
> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>> If my display has two pixels, and I know one is broken to start with, yes,
>> I replace it.
>
> Do you replace your car if you jump-started it once?
I would replace what was broken or failing.
I prefer preventative maintenance, not cleaning up larger messes later.
[Posted by Cydrome, 10/1/2007 6:36:24 PM]
In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
> I prefer preventative maintenance, not cleaning up larger messes later.
A few "metattach" or "metareplace" are a larger mess?
--
Daniel
[Posted by Daniel, 10/1/2007 9:39:25 PM]
In comp.sys.sun.hardware Daniel Rock <v200740@deadcafe.de> wrote:
> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>> I prefer preventative maintenance, not cleaning up larger messes later.
>
> A few "metattach" or "metareplace" are a larger mess?
I guess if you're bored and have nothing better to do with a computer than
stuff it full of blatantly broken drives from a junk pile, then rebuild
the data you're just going to lose anyways next week, because you're
probably also using RAM that's "mostly" OK and SCSI card that "sort of
works" on a system board that's "almost always fine" with power supplies
with fans that "usually" spin, then go for it.
Some people have slightly different standards, and know that drives don't
fix themselves, and always just get worse and should be replaced at the
first signs of trouble, at your convenience, not when they finally do
catastrophically fail.
[Posted by Cydrome, 10/2/2007 5:12:23 AM]
In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
> In comp.sys.sun.hardware Daniel Rock <v200740@deadcafe.de> wrote:
>> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>>> I prefer preventative maintenance, not cleaning up larger messes later.
>>
>> A few "metattach" or "metareplace" are a larger mess?
>
> Some people have slightly different standards, and know that drives don't
> fix themselves, and always just get worse and should be replaced at the
> first signs of trouble, at your convenience, not when they finally do
> catastrophically fail.
Do you just replace a flat tire or the entire car?
Let's calculate the probability of a total failure...
Normal SCSI drives have an AFR of ~3%. Let's say the AFR of these drives
is 10 times higher (i.e. 30%). Let's also assume it takes on average 48 hours
to replace a broken drive.
What is the probability that two drives fail within 48 hours?
The probability is ~0.05% p.a. (0.3 * 0.3 * (2/365))
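A quick sketch of that arithmetic in Python, for anyone who wants to plug in other numbers. It assumes, as the estimate itself does, that the two drives fail independently of each other; the function name and its parameters are made up for illustration.

```python
def double_failure_probability(afr_a, afr_b, window_days):
    """Rough yearly chance that both drives of a mirror fail within one
    replacement window.

    Same back-of-envelope form as the estimate above: AFR of drive A times
    AFR of drive B times the fraction of a year covered by the window.
    Assumes the two failures are independent.
    """
    return afr_a * afr_b * (window_days / 365)


# 30% AFR for each drive, 48 hours (2 days) to replace a failed one
p = double_failure_probability(0.30, 0.30, 2)
print(f"{p:.4%} per year")
```

With the figures above this comes out at roughly 0.05% per year.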
BTW this is the SMART output of one of the drives:
Device: SEAGATE SX3146807FC Version: D010
Device type: disk
Transport protocol: Fibre channel (FCP-2)
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK
Elements in grown defect list: 8
Vendor (Seagate) cache information
Blocks sent to initiator = 323870666916662
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 27478.45
number of minutes until next internal SMART test = 10
[Posted by Daniel, 10/2/2007 7:55:51 AM]
Daniel Rock wrote:
> <snip>
>
> Let's calculate the probability of a total failure...
>
> Normal SCSI drives have a AFR of ~3%. Let's say the AFR of these drives
> is 10 times higher (i.e. 30%). Let's also assume it takes on average 48 hours
> to replace a broken drive.
>
> What is the probability that two drives fail within 48 hours?
>
> The probability is ~0.05% p.a. (0.3 * 0.3 * (2/365))
You've assumed that drives fail independently of each other. If the
drives have a higher probability of failure following some event (e.g.,
a power cycle), then your calculation is flawed. Take another example;
the drive has a 30% chance of not spinning up after a power cycle. The
probability of a catastrophic failure is
0.3 * 0.3 * P(power cycle)
Since you know you will power cycle at some point, you have a 9% chance
of losing your data at that point. Not a risk I'd take.
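To put a number on that objection, a small Python sketch. The 30% spin-up failure rate is the hypothetical figure from the example above, the function name is invented for illustration, and each drive's spin-up is treated as an independent event at the moment of the power cycle.

```python
def mirror_loss_per_power_cycle(p_no_spinup, drives=2):
    """Chance that every drive in the mirror fails to spin up after one
    power cycle, treating each drive as an independent trial."""
    return p_no_spinup ** drives


per_cycle = mirror_loss_per_power_cycle(0.30)
print(f"{per_cycle:.0%} chance of losing the mirror at a power cycle")
```

The point of the sketch is that the replacement window drops out entirely: the risk is concentrated at the power cycle rather than spread over the year.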
[Posted by Douglas, 10/2/2007 2:16:55 PM]
In comp.sys.sun.admin Douglas O'Neal <oneal@dbi.udel.edu> wrote:
> Since you know you will power cycle at some point, you have a 9% chance
> of losing your data at that point. Not a risk I'd take.
You are assuming that the drive will never again spin up after a power
cycle.
This assumption is flawed.
--
Daniel
[Posted by Daniel, 10/2/2007 2:39:47 PM]
In comp.sys.sun.hardware Daniel Rock <v200740@deadcafe.de> wrote:
> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>> In comp.sys.sun.hardware Daniel Rock <v200740@deadcafe.de> wrote:
>>> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>>>> I prefer preventative maintenance, not cleaning up larger messes later.
>>>
>>> A few "metattach" or "metareplace" are a larger mess?
>>
>> Some people have slightly different standards, and know that drives don't
>> fix themselves, and always just get worse and should be replaced at the
>> first signs of trouble, at your convenience, not when they finally do
>> catastrophically fail.
>
> Do you just replace a flat tire or the entire car?
The car/tires analogy is best summed up as: the person that uses broken
drives puts leaky tires on their car and hopes they don't go flat, and if
one does, they're fine with three good tires, and they saved a few
dollars because they're witty.
> Let's calculate the probability of a total failure...
>
> Normal SCSI drives have a AFR of ~3%. Let's say the AFR of these drives
> is 10 times higher (i.e. 30%). Let's also assume it takes on average 48 hours
> to replace a broken drive.
>
> What is the probability that two drives fail within 48 hours?
more than you'd expect. I've seen plenty of double disk failures.
> The probability is ~0.05% p.a. (0.3 * 0.3 * (2/365))
I can simplify that equation into:
it's stupid to put broken disks back into a machine, no matter what
nonsense math you try to justify it with.
>
> BTW this is the SMART output of one of the drives:
>
> Device: SEAGATE SX3146807FC Version: D010
> Device type: disk
> Transport protocol: Fibre channel (FCP-2)
> Device supports SMART and is Enabled
> Temperature Warning Disabled or Not Supported
> SMART Health Status: OK
>
> Elements in grown defect list: 8
> Vendor (Seagate) cache information
> Blocks sent to initiator = 323870666916662
> Vendor (Seagate/Hitachi) factory information
> number of hours powered up = 27478.45
> number of minutes until next internal SMART test = 10
You're blinding yourself.
You know the drive doesn't always spin up. No amount of smart data cancels
that out.
just throw the drive out or RMA it.
[Posted by Cydrome, 10/2/2007 4:08:40 PM]
In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
> just throw the drive out or RMA it.
Why should I pay for it?
--
Daniel
[Posted by Daniel, 10/2/2007 4:11:04 PM]
Daniel Rock wrote:
> In comp.sys.sun.admin Douglas O'Neal <oneal@dbi.udel.edu> wrote:
>> Since you know you will power cycle at some point, you have a 9% chance
>> of losing your data at that point. Not a risk I'd take.
>
> You are assuming that the drive will never again spin up after a power
> cycle.
>
> This assumption is flawed.
Agreed, my probability is too high. But the point is that the 0.05%
catastrophic probability you calculated is too low. And if we take
a number somewhere in the middle, say 0.5% chance of catastrophic
failure per power cycle, that would be way too high for me to trust
with critical data.
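For a feel of how a per-power-cycle figure like that adds up, here is a rough Python sketch. The 0.5% is the middle-of-the-road guess above; the cycle counts are arbitrary assumptions, not numbers from this thread, and each power cycle is treated as an independent trial.

```python
def cumulative_loss(p_per_cycle, cycles):
    """Chance of at least one catastrophic failure over a number of power
    cycles, assuming every cycle is an independent trial with the same
    probability."""
    return 1 - (1 - p_per_cycle) ** cycles


for cycles in (1, 10, 50):  # arbitrary example counts
    print(cycles, f"{cumulative_loss(0.005, cycles):.2%}")
```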
[Posted by Douglas, 10/2/2007 5:14:52 PM]
Daniel Rock wrote:
> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>> In comp.sys.sun.hardware Daniel Rock <v200740@deadcafe.de> wrote:
>>> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>>>> I prefer preventative maintenance, not cleaning up larger messes later.
>>> A few "metattach" or "metareplace" are a larger mess?
>> Some people have slightly different standards, and know that drives don't
>> fix themselves, and always just get worse and should be replaced at the
>> first signs of trouble, at your convenience, not when they finally do
>> catastrophically fail.
>
> Do you just replace a flat tire or the entire car?
>
>
> Let's calculate the probability of a total failure...
>
> Normal SCSI drives have a AFR of ~3%. Let's say the AFR of these drives
> is 10 times higher (i.e. 30%). Let's also assume it takes on average 48 hours
> to replace a broken drive.
MTBF is around 800,000 h for SCSI disks, giving an AFR of 1.095% (8760 h / 800,000 h).
> What is the probability that two drives fail within 48 hours?
>
> The probability is ~0.05% p.a. (0.3 * 0.3 * (2/365))
0.00657% is more correct from above.
>
> BTW this is the SMART output of one of the drives:
>
> Device: SEAGATE SX3146807FC Version: D010
But this disk has a 1,200,000 h MTBF, so here we have an AFR of 0.73%
and your probability is therefore 0.0000292%.
Quite a difference...
http://www.seagate.com/support/disc/specs/fc/st3146807fc.html
> Device type: disk
> Transport protocol: Fibre channel (FCP-2)
> Device supports SMART and is Enabled
> Temperature Warning Disabled or Not Supported
> SMART Health Status: OK
>
> Elements in grown defect list: 8
> Vendor (Seagate) cache information
> Blocks sent to initiator = 323870666916662
> Vendor (Seagate/Hitachi) factory information
> number of hours powered up = 27478.45
> number of minutes until next internal SMART test = 10
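The arithmetic in this exchange can be reproduced with a short sketch. It uses the thread's own back-of-envelope formula, AFR squared times (2/365), together with AFR = 8760 h / MTBF. This is not a rigorous reliability model, and the function names are made up for illustration.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annualized failure rate as a fraction, using AFR = 8760 / MTBF."""
    return HOURS_PER_YEAR / mtbf_hours


def double_failure_probability(afr: float, window_days: float = 2.0) -> float:
    """The thread's estimate: afr * afr * (window_days / 365), per year."""
    return afr * afr * (window_days / 365.0)


cases = {
    "AFR 30% (tenfold assumption)": 0.30,
    "800 kh MTBF": afr_from_mtbf(800_000),
    "800 kh MTBF, ten times worse": 10 * afr_from_mtbf(800_000),
    "1200 kh MTBF": afr_from_mtbf(1_200_000),
    "1200 kh MTBF, ten times worse": 10 * afr_from_mtbf(1_200_000),
}
for label, afr in cases.items():
    p = double_failure_probability(afr)
    print(f"{label}: AFR {afr:.3%}, double failure {p:.7%} p.a.")
```

Run this way, the 0.00657% figure corresponds to the 800 kh AFR with the tenfold multiplier kept, while 0.0000292% corresponds to the 1200 kh AFR without it, so the two numbers quoted above are not on quite the same footing.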
Thommy | 10/2/2007 5:42:44 PM
In comp.sys.sun.admin Douglas O'Neal <oneal@dbi.udel.edu> wrote:
> that would be way too high for me to trust with critical data.
Who said there was critical data?
--
Daniel
Daniel | 10/2/2007 7:16:52 PM
In comp.sys.sun.hardware Daniel Rock <v200740@deadcafe.de> wrote:
> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>> just throw the drive out or RMA it.
>
> Why should I pay for it?
>
You're a cheap hobbyist, you shouldn't pay for anything.
Do what suits your needs best.
Cydrome | 10/2/2007 7:58:32 PM
Thommy M. wrote:
>> Let's calculate the probability of a total failure...
>>
>> Normal SCSI drives have a AFR of ~3%. Let's say the AFR of these drives
>> is 10 times higher (i.e. 30%). Let's also assume it takes on average 48 hours
>> to replace a broken drive.
>
> MTBF is around 800kh for SCSI disks giving AFR=1.095%
>
>> What is the probability that two drives fail within 48 hours?
>>
>> The probability is ~0.05% p.a. (0.3 * 0.3 * (2/365))
>
> 0.00657% is more correct from above.
You need to be careful in interpreting MTBF of disks. The MTBF is based
on the assumption that the disk will be replaced (even if working) at
the end of the service life, which is typically 5 years for a SCSI disk
- I found that on the Seagate web site once.
An MTBF of 1,000,000 hours does *not* mean the disks will last an
average time of 1,000,000 hours or 114 years if you switch them on and
never replace them. They will on average last a LOT less than that.
Although I have no data on it, I doubt any single disk would be working
114 years later!
During that 5 years, the disk is likely to be under warranty anyway.
I think the point at which one disposes of disks depends on one's
circumstances. If it's an important server in your company, it might be
wise to replace them every 5 years. If it's on a less important system,
you might not do so until logs indicate a problem. If it's for a home
machine, and not one used to store important information, one might
tolerate a few errors.
I don't know how many Suns are used by hobbyists, but I suspect there
are quite a few. A previous employer had a site licence for a piece of
software, which allowed one to use a copy at home. I asked for a SPARC
licence for use at home, and was initially declined this as "a Sun SPARC
is not considered a home computer". After some discussions they agreed
as a "one-off".
I would not personally use a disk that did not reliably spin up (even at
home as a scratch disk), but I would not criticise someone who felt in
their circumstances that was appropriate. Clearly anyone doing that on an
important server in their company would need their head tested!
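One way to read the MTBF point above is as a statement about a population of drives rather than the lifetime of one drive. The sketch below assumes a constant failure rate during the service life, which is a simplification; the fleet size of 1000 drives and the function names are hypothetical.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def mtbf_in_years(mtbf_hours: float) -> float:
    """Naive conversion of MTBF to years (the misleading '114 years' reading)."""
    return mtbf_hours / HOURS_PER_YEAR


def expected_failures(drive_count: int, mtbf_hours: float, years: float) -> float:
    """Expected failures in a fleet, assuming a constant failure rate
    and that failed drives are replaced as they fail."""
    return drive_count * years * HOURS_PER_YEAR / mtbf_hours


print(f"1,000,000 h is about {mtbf_in_years(1_000_000):.0f} years")
print(f"1000 drives over 5 years: about "
      f"{expected_failures(1000, 1_000_000, 5):.1f} expected failures")
```

Read this way, the rated MTBF predicts how many drives out of a large group fail per year while they are within their service life; it says nothing about any single drive running for a century.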
Dave | 10/3/2007 7:24:40 AM
On 2007-10-03, Dave <someplace@nowhere-nice.com> wrote:
> If it's an important server in your company, it might be
> wise to replace them every 5 years.
You jest. We have over 3000 Unix servers. Wild guesstimate, 12,000 disks.
Replace 2400 disks a year? Nonsense.
--
"Religion poisons everything."
[email me at huge {at} huge (dot) org <dot> uk]
Huge | 10/4/2007 9:02:28 AM
In comp.sys.sun.admin Huge <Huge@nowhere.much.invalid> wrote:
> On 2007-10-03, Dave <someplace@nowhere-nice.com> wrote:
>
>> If it's an important server in your company, it might be
>> wise to replace them every 5 years.
>
> You jest. We have over 3000 Unix servers. Wild guesstimate, 12,000 disks.
> Replace 2400 disks a year? Nonsense.
Just let the machines age and you will be replacing that many at some
point.
Cydrome | 10/5/2007 12:00:02 AM
Daniel Rock wrote:
> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>> just throw the drive out or RMA it.
>
> Why should I pay for it?
>
Rock on Daniel, I think you know more about disks than the others know
about cars :-)
/Jorgen
Jorgen | 10/5/2007 1:13:03 AM
In comp.sys.sun.hardware Jorgen Moquist <jorgen.moquist@n.o.s.p.a.m.mailbox.swipnet.se> wrote:
> Daniel Rock wrote:
>> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>>> just throw the drive out or RMA it.
>>
>> Why should I pay for it?
>>
> Rock on Daniel, I think you know more about disks than the others know
> about cars :-)
> /Jorgen
yup, it's always best to use broken parts, and when things do fail, to do
nothing. Problems with machines only get better with time, they're self
healing.
Cydrome | 10/5/2007 3:03:06 PM
Cydrome Leader wrote:
> In comp.sys.sun.hardware Jorgen Moquist <jorgen.moquist@n.o.s.p.a.m.mailbox.swipnet.se> wrote:
>> Daniel Rock wrote:
>>> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>>>> just throw the drive out or RMA it.
>>> Why should I pay for it?
>>>
>> Rock on Daniel, I think you know more about disks than the others know
>> about cars :-)
>> /Jorgen
>
> yup, it's always best to use broken parts, and when things do fail, to do
> nothing. Problems with machines only get better with time, they're self
> healing.
SCSI and FC-AL disks are "self-healing": lots of spare tracks/cylinders,
replacement and caching tables, one spare sector per cylinder and two spare
cylinders per surface, as I recall.
And several copies of the boot code/OS/RTC.
It's very easy to monitor the grown defect list or I/O errors, or to use SMART.
I can only see one scary situation: if two drives were manufactured the same day.
Well, if you have backups :-)
/jorgen
Jorgen | 10/5/2007 9:51:25 PM
In comp.sys.sun.admin Jorgen Moquist <jorgen.moquist@n.o.s.p.a.m.mailbox.swipnet.se> wrote:
> Cydrome Leader wrote:
>> In comp.sys.sun.hardware Jorgen Moquist <jorgen.moquist@n.o.s.p.a.m.mailbox.swipnet.se> wrote:
>>> Daniel Rock wrote:
>>>> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>>>>> just throw the drive out or RMA it.
>>>> Why should I pay for it?
>>>>
>>> Rock on Daniel, I think you know more about disks than the others know
>>> about cars :-)
>>> /Jorgen
>>
>> yup, it's always best to use broken parts, and when things do fail, to do
>> nothing. Problems with machines only get better with time, they're self
>> healing.
> scsi and fcal disks are "selfhealing", lots of spare tracks/cyls.
> replacement and cacheing tables, one spare sector per cyl and two spare
> cyls per surface as i recall.
None of this keeps errors from happening in the first place. Spare sectors
don't make unrecoverable read errors not happen. It's also pretty well known
that once you start to see errors (and that means the drive is warning
you because there's something wrong), things only go downhill from there.
These media errors tend to grow. I know this is a link to PC Magazine of all
places, but it's a short link:
http://www.pcmag.com/encyclopedia_term/0,2542,t=hard+disk+defect+management&i=55545,00.asp
> and several copies of the bootcode/os/rtc.
> very easy to monitoring grown defect list or ioerrors or use SMART.
> can only see one scary situation, if 2 drives are manufactured the same day.
> well if having backups :-)
> /jorgen
Plenty of drive problems are mechanical. Having 9000% spare data on
platters doesn't help if you crashed a head or your disk won't spin up.
Drives don't heal themselves. They never improve the state they're in.
If they start to throw errors, replace them.
Cydrome | 10/5/2007 11:22:41 PM
Cydrome Leader wrote:
> In comp.sys.sun.admin Jorgen Moquist <jorgen.moquist@n.o.s.p.a.m.mailbox.swipnet.se> wrote:
>
>>Cydrome Leader wrote:
>>
>>>In comp.sys.sun.hardware Jorgen Moquist <jorgen.moquist@n.o.s.p.a.m.mailbox.swipnet.se> wrote:
>>>
>>>>Daniel Rock wrote:
>>>>
>>>>>In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>>>>>
>>>>>>just throw the drive out or RMA it.
>>>>>
>>>>>Why should I pay for it?
>>>>>
>>>>
>>>>Rock on Daniel, I think you know more about disks than the others know
>>>>about cars :-)
>>>>/Jorgen
>>>
>>>yup, it's always best to use broken parts, and when things do fail, to do
>>>nothing. Problems with machines only get better with time, they're self
>>>healing.
>>
>>scsi and fcal disks are "selfhealing", lots of spare tracks/cyls.
>>replacement and cacheing tables, one spare sector per cyl and two spare
>>cyls per surface as i recall.
>
>
> None of this keeps errors from happening in the first place. Spare sectors
> don't make unrecoverable read errors not happen. It's also pretty known
> that once you start to see errors (and that means the drive has warning
> you because there's something wrong), things only go downhill from there.
>
Disk drives can and do survive a block becoming unreadable. SCSI drives
can "revector" a bad block. In some operating systems, the disk driver
works in conjunction with the disk to copy data from a questionable
block to a replacement block. This looks, to the user, like "self
healing". If a bad block is revectored, it's not an indication of a
serious problem.
What IS an indication of a serious problem is A PATTERN of bad blocks
being revectored. When you see that, it's time to replace the disk.
Do it NOW! Tomorrow may be too late.
Richard | 10/6/2007 1:53:59 AM
In comp.sys.sun.admin Richard B. Gilbert <rgilbert88@comcast.net> wrote:
> Cydrome Leader wrote:
>> In comp.sys.sun.admin Jorgen Moquist <jorgen.moquist@n.o.s.p.a.m.mailbox.swipnet.se> wrote:
>>
>>>Cydrome Leader wrote:
>>>
>>>>In comp.sys.sun.hardware Jorgen Moquist <jorgen.moquist@n.o.s.p.a.m.mailbox.swipnet.se> wrote:
>>>>
>>>>>Daniel Rock wrote:
>>>>>
>>>>>>In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>>>>>>
>>>>>>>just throw the drive out or RMA it.
>>>>>>
>>>>>>Why should I pay for it?
>>>>>>
>>>>>
>>>>>Rock on Daniel, I think you know more about disks than the others know
>>>>>about cars :-)
>>>>>/Jorgen
>>>>
>>>>yup, it's always best to use broken parts, and when things do fail, to do
>>>>nothing. Problems with machines only get better with time, they're self
>>>>healing.
>>>
>>>scsi and fcal disks are "selfhealing", lots of spare tracks/cyls.
>>>replacement and cacheing tables, one spare sector per cyl and two spare
>>>cyls per surface as i recall.
>>
>>
>> None of this keeps errors from happening in the first place. Spare sectors
>> don't make unrecoverable read errors not happen. It's also pretty known
>> that once you start to see errors (and that means the drive has warning
>> you because there's something wrong), things only go downhill from there.
>>
>
> Disk drives can and do survive a block becoming unreadable. SCSI drives
> can "revector" a bad block. In some operating systems, the disk driver
> works in conjunction with the disk to copy data from a questionable
> block to a replacement block. This looks, to the user, like "self
> healing". If a bad block is revectored, it's not an indication of a
> serious problem.
>
> What IS an indication of a serious problem is A PATTERN of bad blocks
> being revectored. When you see that, it's time to replace the the disk.
> Do it NOW! Tomorrow may be too late.
>
And it generally always is a pattern of failing blocks, not one random one
and then things are great again for years.
Cydrome | 10/6/2007 9:33:44 PM
Cydrome Leader wrote:
> In comp.sys.sun.admin Richard B. Gilbert <rgilbert88@comcast.net> wrote:
>
>>Cydrome Leader wrote:
>>
>>>In comp.sys.sun.admin Jorgen Moquist <jorgen.moquist@n.o.s.p.a.m.mailbox.swipnet.se> wrote:
>>>
>>>
>>>>Cydrome Leader wrote:
>>>>
>>>>
>>>>>In comp.sys.sun.hardware Jorgen Moquist <jorgen.moquist@n.o.s.p.a.m.mailbox.swipnet.se> wrote:
>>>>>
>>>>>
>>>>>>Daniel Rock wrote:
>>>>>>
>>>>>>
>>>>>>>In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>>just throw the drive out or RMA it.
>>>>>>>
>>>>>>>Why should I pay for it?
>>>>>>>
>>>>>>
>>>>>>Rock on Daniel, I think you know more about disks than the others know
>>>>>>about cars :-)
>>>>>>/Jorgen
>>>>>
>>>>>yup, it's always best to use broken parts, and when things do fail, to do
>>>>>nothing. Problems with machines only get better with time, they're self
>>>>>healing.
>>>>
>>>>scsi and fcal disks are "selfhealing", lots of spare tracks/cyls.
>>>>replacement and cacheing tables, one spare sector per cyl and two spare
>>>>cyls per surface as i recall.
>>>
>>>
>>>None of this keeps errors from happening in the first place. Spare sectors
>>>don't make unrecoverable read errors not happen. It's also pretty known
>>>that once you start to see errors (and that means the drive has warning
>>>you because there's something wrong), things only go downhill from there.
>>>
>>
>>Disk drives can and do survive a block becoming unreadable. SCSI drives
>>can "revector" a bad block. In some operating systems, the disk driver
>>works in conjunction with the disk to copy data from a questionable
>>block to a replacement block. This looks, to the user, like "self
>>healing". If a bad block is revectored, it's not an indication of a
>>serious problem.
>>
>>What IS an indication of a serious problem is A PATTERN of bad blocks
>>being revectored. When you see that, it's time to replace the the disk.
>>Do it NOW! Tomorrow may be too late.
>>
>
>
> and it generally always is a patter of failing blocks, not one random one
> and then things are great again for years.
>
>
I think "always" is an overstatement. I've seen a lot of disks over the
years. Some of them showed the pattern of failure I've described. Some
of them revectored a bad block or two and ran for several more years.
If a disk has critical data and is not a member of a RAID set you may be
justified in replacing it the first time it detects a bad block. In
most cases I would not get excited about a single bad block.
It also makes a difference if you have a service contract or are doing
"self maintenance".
Richard | 10/6/2007 10:54:56 PM
In comp.sys.sun.admin Richard B. Gilbert <rgilbert88@comcast.net> wrote:
> If a disk has critical data and is not a member of a RAID set you may be
> justified in replacing it the first time it detects a bad block. In
> most cases I would not get excited about a single bad block.
If there is a power failure while the disk is in the middle of a write you
will most likely have a bad block. I don't worry if the number of entries
in the grown defect list stays constant (five or less). I get suspicious
if the grown defect list grows silently.
Then I run a read analysis (or write analysis if possible) of the disk.
If the defect list has grown again, it is time to replace the disk.
But Sun doesn't care about the grown defect list, if you want to replace
a disk under service contract. "Hopefully" a read analysis of the disk
finds an unrecoverable read error. When enough SCSI errors have filled up
/var/adm/messages you can finally convince Sun to replace the disk.
--
Daniel
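The rule of thumb in this post can be written down as a small decision function. This is only a sketch of the policy as described: the regular expression matches the "Elements in grown defect list" line from the SMART output quoted earlier in the thread, while the function names, the returned labels, and the handling of a list that is constant but larger than five entries are assumptions.

```python
import re
from typing import Optional

# Matches the line shown in the SMART output quoted earlier in the thread.
GROWN_DEFECTS = re.compile(r"Elements in grown defect list:\s*(\d+)")


def grown_defect_count(smart_output: str) -> Optional[int]:
    """Extract the grown defect count from SMART text, or None if absent."""
    match = GROWN_DEFECTS.search(smart_output)
    return int(match.group(1)) if match else None


def assess(previous: int, current: int,
           after_analysis: Optional[int] = None, small: int = 5) -> str:
    """Apply the rule of thumb to two (or three) successive defect counts."""
    if current > previous:
        # The list grew silently: be suspicious and run a read analysis.
        if after_analysis is None:
            return "run read analysis"
        # Grown again after the analysis: time to replace the disk.
        if after_analysis > current:
            return "replace"
        return "watch"
    if current <= small:
        # Constant and small (five or less): nothing to worry about.
        return "ok"
    # Constant but larger than `small`: the post does not say; assumed here.
    return "watch"


print(grown_defect_count("Elements in grown defect list: 8"))
print(assess(previous=3, current=6, after_analysis=9))
```

Keeping the previous count somewhere persistent, for example in a dated log file, is what makes the "grows silently" case visible at all.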
Daniel | 10/6/2007 11:47:35 PM
Richard B. Gilbert wrote:
> Cydrome Leader wrote:
>> In comp.sys.sun.admin Richard B. Gilbert <rgilbert88@comcast.net> wrote:
>>
>>> Cydrome Leader wrote:
>>>
>>>> In comp.sys.sun.admin Jorgen Moquist
>>>> <jorgen.moquist@n.o.s.p.a.m.mailbox.swipnet.se> wrote:
>>>>
>>>>
>>>>> Cydrome Leader wrote:
>>>>>
>>>>>
>>>>>> In comp.sys.sun.hardware Jorgen Moquist
>>>>>> <jorgen.moquist@n.o.s.p.a.m.mailbox.swipnet.se> wrote:
>>>>>>
>>>>>>
>>>>>>> Daniel Rock wrote:
>>>>>>>
>>>>>>>
>>>>>>>> In comp.sys.sun.admin Cydrome Leader <presence@mungepanix.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> just throw the drive out or RMA it.
>>>>>>>>
>>>>>>>> Why should I pay for it?
>>>>>>>>
>>>>>>>
>>>>>>> Rock on Daniel, I think you know more about disks than the others
>>>>>>> know about cars :-)
>>>>>>> /Jorgen
>>>>>>
>>>>>> yup, it's always best to use broken parts, and when things do
>>>>>> fail, to do nothing. Problems with machines only get better with
>>>>>> time, they're self healing.
>>>>>
>>>>> scsi and fcal disks are "selfhealing", lots of spare tracks/cyls.
>>>>> replacement and cacheing tables, one spare sector per cyl and two
>>>>> spare
>>>>> cyls per surface as i recall.
>>>>
>>>>
>>>> None of this keeps errors from happening in the first place. Spare
>>>> sectors don't make unrecoverable read errors not happen. It's also
>>>> pretty known that once you start to see errors (and that means the
>>>> drive has warning you because there's something wrong), things only
>>>> go downhill from there.
>>>>
>>>
>>> Disk drives can and do survive a block becoming unreadable. SCSI
>>> drives can "revector" a bad block. In some operating systems, the
>>> disk driver works in conjunction with the disk to copy data from a
>>> questionable block to a replacement block. This looks, to the user,
>>> like "self healing". If a bad block is revectored, it's not an
>>> indication of a serious problem.
>>>
>>> What IS an indication of a serious problem is A PATTERN of bad blocks
>>> being revectored. When you see that, it's time to replace the the disk.
>>> Do it NOW! Tomorrow may be too late.
>>>
>>
>>
>> and it generally always is a patter of failing blocks, not one random
>> one and then things are great again for years.
>>
>>
>
> I think "always" is an overstatement. I've seen a lot of disks over the
> years. Some of them showed the pattern of failure I've described. Some
> of them revectored a bad block or two and ran for several more years.
>
> If a disk has critical data and is not a member of a RAID set you may be
> justified in replacing it the first time it detects a bad block. In
> most cases I would not get excited about a single bad block.
>
> It also makes a difference if you have a service contract or are doing
> "self maintenance".
>
It would be useful if you cut out irrelevant stuff when quoting - there
is rarely much point in quoting this amount, most of which is totally
irrelevant.
Dave
Dave | 10/8/2007 1:05:39 PM