E450 error


Hi,

Because of zpool issues I had in August, I had to move all pools from an
AMD server to the old, trusted, fully loaded E450.

I just discovered the syslog messages below, which I have never seen
before. Could they be connected to high ambient temperature? Because of
this AMD -> SPARC move I had to bring up two more SPARC NFS servers,
which made the temperature rise a lot.

System Temperatures (Celsius):
------------------------------
AMBIENT    28
CPU 0      49
CPU 1      40
CPU 2      48
CPU 3      41
=================================

Front Status Panel:
-------------------
Keyswitch position is in Secure mode.

System LED Status:    POWER     GENERAL ERROR      ACTIVITY
                       [ ON]         [OFF]           [OFF]
                     DISK ERROR  THERMAL ERROR  POWER SUPPLY ERROR
                       [OFF]         [OFF]           [OFF]

Disk LED Status:        OK = GREEN      ERROR = YELLOW
                 DISK 18:    [OK]        DISK 19:    [OK]
                 DISK 16:    [OK]        DISK 17:    [OK]
                 DISK 14:    [OK]        DISK 15:    [OK]
                 DISK 12:    [OK]        DISK 13:    [OK]
                 DISK 10:    [OK]        DISK 11:    [OK]
                 DISK  8:    [OK]        DISK  9:    [OK]
                 DISK  6:    [OK]        DISK  7:    [OK]
                 DISK  4:    [OK]        DISK  5:    [OK]
                 DISK  2:    [OK]        DISK  3:    [OK]
                 DISK  0:    [OK]        DISK  1:    [OK]
=================================

Fans:
-----
Fan Bank   Speed    Status
--------   -----    ------
CPU          53       OK
PWR          31       OK


Power Supplies:
---------------
Supply     Rating    Temp    Status
------     ------    ----    ------
   0         550 W     42       OK
   1         550 W     47       OK
   2         550 W     46       OK


from dmesg


Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 131646 kern.info] NOTICE: 
[AFT2] errID 0x000d0606.f6f503c9 CBI event on CPU1
Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 418788 kern.info] [AFT2] 
errID 0x000d0606.f6f503c9 PA=0x00000000.faf43b40
Nov  3 09:21:24 tango     E$tag 0x00000000.08401f5e E$State: Shared 
E$parity 0x04
Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2] 
E$Data (0x00): 0x20000000.00000000 *Bad* PSYND=0x8000
Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] 
E$Data (0x08): 0x00000000.00000000
Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] 
E$Data (0x10): 0x00000000.00000000
Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] 
E$Data (0x18): 0x00000000.00000000
Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] 
E$Data (0x20): 0x00000000.00000000
Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] 
E$Data (0x28): 0x00000000.00000000
Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] 
E$Data (0x30): 0x00000000.00000000
Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2] 
E$Data (0x38): 0x00000000.00000000
Reply michael_laajanen (637) 11/13/2011 6:57:17 PM

> Because of zpool issues I had in August, I had to move all pools from an
> AMD server to the old, trusted, fully loaded E450.

SPARC wins every time :-)

> I just discovered the syslog messages below, which I have never seen
> before. Could they be connected to high ambient temperature? Because of
> this AMD -> SPARC move I had to bring up two more SPARC NFS servers,
> which made the temperature rise a lot.
> 
> System Temperatures (Celsius):
> ------------------------------
> AMBIENT    28
> CPU 0      49
> CPU 1      40
> CPU 2      48
> CPU 3      41

I don't know what's normal for an E450, but my V210 runs as warm as or
warmer than your CPU 0. showenvironment from the sc prompt (or whatever
the E450 equivalent is) should show the high warn, high soft and high hard
temperature thresholds. For the V210 they're 88, 93 and 100, far from
your 49 degrees.

> System LED Status:    POWER     GENERAL ERROR      ACTIVITY
>                        [ ON]         [OFF]           [OFF]
>                      DISK ERROR  THERMAL ERROR  POWER SUPPLY ERROR
>                        [OFF]         [OFF]           [OFF]

Everything looks good.

Looks like maybe your RAM is going bad. The environmental stuff looks
good. Maybe someone else knows. Good luck, keep that E450 going!




Reply nobody15 (1231) 11/13/2011 11:05:04 PM


Michael <michael_laajanen@yahoo.com> wrote:
> Hi,
> 
> Because of zpool issues I had in August, I had to move all pools from an
> AMD server to the old, trusted, fully loaded E450.
> 
> I just discovered the syslog messages below, which I have never seen
> before. Could they be connected to high ambient temperature? Because of
> this AMD -> SPARC move I had to bring up two more SPARC NFS servers,
> which made the temperature rise a lot.
> 
> System Temperatures (Celsius):
> ------------------------------
> AMBIENT    28
> CPU 0      49
> CPU 1      40
> CPU 2      48
> CPU 3      41
> =================================
> 
> Front Status Panel:
> -------------------
> Keyswitch position is in Secure mode.
> 
> System LED Status:    POWER     GENERAL ERROR      ACTIVITY
>                       [ ON]         [OFF]           [OFF]
>                     DISK ERROR  THERMAL ERROR  POWER SUPPLY ERROR
>                       [OFF]         [OFF]           [OFF]
> 
> Disk LED Status:        OK = GREEN      ERROR = YELLOW
>                 DISK 18:    [OK]        DISK 19:    [OK]
>                 DISK 16:    [OK]        DISK 17:    [OK]
>                 DISK 14:    [OK]        DISK 15:    [OK]
>                 DISK 12:    [OK]        DISK 13:    [OK]
>                 DISK 10:    [OK]        DISK 11:    [OK]
>                 DISK  8:    [OK]        DISK  9:    [OK]
>                 DISK  6:    [OK]        DISK  7:    [OK]
>                 DISK  4:    [OK]        DISK  5:    [OK]
>                 DISK  2:    [OK]        DISK  3:    [OK]
>                 DISK  0:    [OK]        DISK  1:    [OK]
> =================================
> 
> Fans:
> -----
> Fan Bank   Speed    Status
> --------   -----    ------
> CPU          53       OK
> PWR          31       OK
> 
> 
> Power Supplies:
> ---------------
> Supply     Rating    Temp    Status
> ------     ------    ----    ------
>   0         550 W     42       OK
>   1         550 W     47       OK
>   2         550 W     46       OK
> 
> 
> from dmesg
> 
> 
> Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 131646 kern.info] NOTICE: 
> [AFT2] errID 0x000d0606.f6f503c9 CBI event on CPU1
> Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 418788 kern.info] [AFT2] 
> errID 0x000d0606.f6f503c9 PA=0x00000000.faf43b40
> Nov  3 09:21:24 tango     E$tag 0x00000000.08401f5e E$State: Shared 
> E$parity 0x04

Looks like a memory error; it could be the cache.


Reply presence (537) 11/14/2011 6:28:03 PM

Michael <michael_laajanen@yahoo.com> wrote:
> 
> 
> Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 131646 kern.info] NOTICE:
> [AFT2] errID 0x000d0606.f6f503c9 CBI event on CPU1
> Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 418788 kern.info] [AFT2]
> errID 0x000d0606.f6f503c9 PA=0x00000000.faf43b40
> Nov  3 09:21:24 tango     E$tag 0x00000000.08401f5e E$State: Shared E$parity 0x04
> Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2]
> E$Data (0x00): 0x20000000.00000000 *Bad* PSYND=0x8000
> Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
> E$Data (0x08): 0x00000000.00000000

Maybe a CPU cache error. Try replacing/removing CPU1.
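
Before pulling a module, it may be worth checking whether the events keep
landing on the same CPU. The following is only a rough sketch: it assumes
the wrapped log lines have been joined back into single lines, as they
appear in /var/adm/messages, and the names SAMPLE, EVENT_RE and
count_events are made up for illustration. It counts [AFT] events per
CPU and event type.

```python
import re
from collections import Counter

# Two lines from the dmesg output quoted above, with the mail-client
# line wrapping undone.
SAMPLE = (
    "Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 131646 kern.info] NOTICE: "
    "[AFT2] errID 0x000d0606.f6f503c9 CBI event on CPU1\n"
    "Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 418788 kern.info] [AFT2] "
    "errID 0x000d0606.f6f503c9 PA=0x00000000.faf43b40\n"
)

# Assumed message shape: "[AFTn] errID <hex>.<hex> <TYPE> event on CPU<n>".
EVENT_RE = re.compile(
    r"\[AFT\d+\] errID (0x[0-9a-f]+\.[0-9a-f]+) (\w+) event on CPU(\d+)"
)


def count_events(text):
    """Count AFT events, keyed by (cpu_number, event_type)."""
    counts = Counter()
    for match in EVENT_RE.finditer(text):
        cpu = int(match.group(3))
        event_type = match.group(2)
        counts[(cpu, event_type)] += 1
    return counts


if __name__ == "__main__":
    for (cpu, event_type), n in sorted(count_events(SAMPLE).items()):
        print(f"CPU{cpu}: {n} x {event_type}")
```

If the same CPU shows up again and again over days or weeks, that would
support swapping that module; a single isolated event is weaker evidence.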

Dennis

-- 
 Nothing greets you more joyfully than a wet dog.
(Werner Koczwara)
Reply dennis.grevenstein (122) 11/14/2011 9:54:46 PM

Hi,

On 11/14/11 10:54 PM, Dennis Grevenstein wrote:
> Michael<michael_laajanen@yahoo.com>  wrote:
>>
>>
>> Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 131646 kern.info] NOTICE:
>> [AFT2] errID 0x000d0606.f6f503c9 CBI event on CPU1
>> Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 418788 kern.info] [AFT2]
>> errID 0x000d0606.f6f503c9 PA=0x00000000.faf43b40
>> Nov  3 09:21:24 tango     E$tag 0x00000000.08401f5e E$State: Shared E$parity 0x04
>> Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 989652 kern.info] [AFT2]
>> E$Data (0x00): 0x20000000.00000000 *Bad* PSYND=0x8000
>> Nov  3 09:21:24 tango SUNW,UltraSPARC-II: [ID 359263 kern.info] [AFT2]
>> E$Data (0x08): 0x00000000.00000000
>
> Maybe a CPU cache error. Try replacing/removing CPU1.
>
Hmm, I can replace the module later when I update to u10, but it should
be okay to just take CPU1 offline for the time being, I guess?!
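
For reference, a minimal sketch of what taking it offline could look like.
On Solaris, psradm -f takes a processor offline and psrinfo lists
processor status. The assumption here is that the processor ID psrinfo
reports for the suspect module is 1, matching "CPU1" in the log; that
should be checked with psrinfo first. The helper names offline_cmd and
offline_cpu are made up, and by default nothing is executed.

```python
import shutil
import subprocess


def offline_cmd(cpu_id):
    """Build the psradm command that takes one processor offline."""
    if cpu_id < 0:
        raise ValueError("processor id must be non-negative")
    return ["psradm", "-f", str(cpu_id)]


def offline_cpu(cpu_id, dry_run=True):
    """Return the command; run it only if dry_run is False and psradm exists."""
    cmd = offline_cmd(cpu_id)
    if dry_run or shutil.which("psradm") is None:
        return cmd  # nothing executed
    subprocess.run(cmd, check=True)          # needs root privileges
    subprocess.run(["psrinfo"], check=True)  # show the resulting status
    return cmd


if __name__ == "__main__":
    print(" ".join(offline_cpu(1)))  # dry run: only prints the command
```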

/michael



Reply michael_laajanen (637) 11/15/2011 11:50:27 AM

On 2011-11-13, Michael <michael_laajanen@yahoo.com> wrote:
> System Temperatures (Celsius):
> ------------------------------
> AMBIENT    28
> CPU 0      49
> CPU 1      40
> CPU 2      48
> CPU 3      41
>=================================

28°C ambient is already quite warm. Such temperature differences between
CPUs 0 and 2 on one hand and CPUs 1 and 3 on the other seem odd.
Temperatures around 40°C are quite normal; 48 or 49 is, in our experience,
somewhat high for an E450. We had configured our alarms at 45°C, but we
wanted to stay on the safe side and be alerted early if the air
conditioning failed.
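
A check like the one described above could be sketched as follows. It
assumes the temperature block has the same layout as the output quoted in
this thread, and uses 45 as the alarm level only because that is the value
we chose; it is not an official Sun limit. The names SAMPLE, ALARM_C,
parse_temps and over_alarm are made up for illustration.

```python
import re

# Temperature block copied from the output quoted in this thread.
SAMPLE = """\
System Temperatures (Celsius):
------------------------------
AMBIENT    28
CPU 0      49
CPU 1      40
CPU 2      48
CPU 3      41
"""

ALARM_C = 45  # site-specific alarm level, an assumption, not a vendor limit


def parse_temps(text):
    """Return {sensor_name: degrees_celsius} for lines like 'CPU 0   49'."""
    temps = {}
    for line in text.splitlines():
        match = re.match(r"^\s*(AMBIENT|CPU \d+)\s+(\d+)\s*$", line)
        if match:
            temps[match.group(1)] = int(match.group(2))
    return temps


def over_alarm(temps, limit=ALARM_C):
    """Return the CPU sensors strictly above the limit, sorted by name."""
    return sorted(
        name for name, value in temps.items()
        if name.startswith("CPU") and value > limit
    )


if __name__ == "__main__":
    print(over_alarm(parse_temps(SAMPLE)))
```

With the values posted here, CPU 0 and CPU 2 would trip a 45-degree alarm
while CPU 1 and CPU 3 would not.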

Andreas.
Reply comp.unix.solaris128 (2) 11/16/2011 4:22:25 PM
