Trouble with SSA 112

Shane Williams broot@gslis.utexas.edu
Fri, 21 Jan 2000 16:19:22 -0600 (CST)


Hopefully somebody can save my sanity and give me a clue as to what's
happening with our SSA 112.  Around mid-November, for no apparent
reason (in other words, no disk failures, added disks, changes to
server configuration, etc.), we started seeing the following messages
repeating in a loop:

Nov 21 16:02:03 fiat unix: ID[SUNWssa.soc.link.5010] soc0: port 0:
Fibre Channel is OFFLINE
Nov 21 16:02:08 fiat unix: ID[SUNWssa.soc.link.6010] soc0: port 0:
Fibre Channel is ONLINE
Nov 21 16:02:08 fiat unix: ID[SUNWssa.soc.login.6010] soc0: Fibre
Channel login succeeded

At the time, the looping was worst in the middle of the
night when server load was low, and would then disappear during the
day.  During the night, the array would go offline for maybe five
seconds of every minute.  While annoying, the situation seemed
bearable until about five days later, when the machine froze around
11pm.  After each reboot, the same sequence of events repeated
roughly every five days.
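
For anyone who wants to put numbers on the flapping, here is a rough
Python sketch that pairs each OFFLINE message with the next ONLINE
message for the same port and reports how long the link was down.  It
assumes each message sits on a single line in the log file (they are
wrapped only in this mail) and that the log lives at
/var/adm/messages.  The function name offline_intervals and the year
argument are made up for illustration; syslog timestamps carry no
year, so one has to be supplied.

```python
import re
from datetime import datetime

# Matches the soc link-state messages quoted above. Assumes one message
# per line in the log; the login messages are deliberately ignored.
LINK_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+unix:.*"
    r"(soc\d+): port (\d+):\s+Fibre Channel is (OFFLINE|ONLINE)"
)


def offline_intervals(lines, year=1999):
    """Return a list of (start, seconds) for each OFFLINE -> ONLINE pair.

    `year` is an assumption supplied by the caller, because syslog
    timestamps do not include one.
    """
    went_offline = {}  # (soc, port) -> datetime of the first OFFLINE seen
    intervals = []
    for line in lines:
        m = LINK_RE.search(line)
        if not m:
            continue
        stamp, soc, port, state = m.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        key = (soc, port)
        if state == "OFFLINE":
            # Keep the earliest OFFLINE if several arrive in a row.
            went_offline.setdefault(key, when)
        elif key in went_offline:
            start = went_offline.pop(key)
            intervals.append((start, (when - start).total_seconds()))
    return intervals
```

Feeding it the lines of the log (for example, open("/var/adm/messages")
passed straight in) and summing the seconds over a given window gives
a rough fraction of time the link spent offline.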

So, I made sure I had all the latest software patches for Solaris
2.5.1, and upgraded the array firmware to 3.12.  Neither step
helped, so I started looking for a hardware culprit.

We started with the lowest-cost items to replace.  We replaced the
Fibre Channel cable itself, to no effect.  We then replaced the FC/OM
module on the SBus card in the server, also to no effect.  I am
assuming that a faulty FC/OM card on the array would show up in
extended diagnostics (which it does not), so I'm treating it as not
the problem.  And now we've reached the point where the next
item to replace is the SBus card itself (which runs around $800), but
before we throw good money after bad, I thought I'd see if any of the
people on the list might have some ideas.

To make matters more critical, it's gotten to the point where the
fibre link is offline about 7 out of every 10 seconds, which means our
server is just about worthless now.

If there's any more info I can provide, let me know.  Any help is
greatly appreciated.

-- 
Public key at www-swiss.ai.mit.edu |                 Shane Williams
/~bal/pks-toplev.html              | Systems Administrator UT-GSLIS
=----------------------------------+-------------------------------
All syllogisms contain three lines |        shanew@gslis.utexas.edu
Therefore this is not a syllogism  |   www.gslis.utexas.edu/~shanew