Pertinent Info

Mark Lloyd mlloyd@cbis.com
Fri, 6 Jun 1997 15:27:45 -0400 (EDT)


Obviously 2.3 should follow the first occurrence of VxVM in my note below.

I apologize for the confusion.

Mark E. Lloyd
Cincinnati Bell Information Systems (CBIS)

voice : (513) 784 - 7455
email : mlloyd@cbis.com




> From owner-ssa-managers@Eng.Auburn.EDU Fri Jun  6 15:10:36 1997
> From: Mark Lloyd <mlloyd@cbis.com>
> Date: Fri, 6 Jun 1997 15:06:46 -0400 (EDT)
> To: ssa-managers@Eng.Auburn.EDU
> Subject: Re: Pertinent Info
> Mime-Version: 1.0
> Content-Transfer-Encoding: 7bit
> Content-Md5: WWFCEsfWaCyGOgv8gYJl9w==
> X-Info: To unsubscribe, send 'unsubscribe ssa-managers' to majordom@Eng.Auburn.EDU in message body
> 
> Am I the only one confused by this Alert? Certain parts seem to imply that
> if you are running VxVM then everything should be OK. However, another
> line states that they are working on a patch for VxVM version 2.3.
> 
> What's the reality?
> 
> Mark E. Lloyd
> Cincinnati Bell Information Systems (CBIS)
> 
> voice : (513) 784 - 7455
> email : mlloyd@cbis.com
> 
> 
> 
> > From owner-ssa-managers@Eng.Auburn.EDU Fri Jun  6 14:34:02 1997
> > From: Doug Hughes <Doug.Hughes@Eng.Auburn.EDU>
> > Date: Fri, 6 Jun 1997 13:30:44 -0500
> > To: ssa-managers@Eng.Auburn.EDU
> > Subject: Pertinent Info
> > X-Info: To unsubscribe, send 'unsubscribe ssa-managers' to majordom@Eng.Auburn.EDU in message body
> > 
> > Just got this:
> > 
> > (We've already fixed this here, by the way, and had gotten bitten on it
> > many many times in the past, and submitted many many bug reports before
> > it finally got fixed. It's nefarious.. You go to move some subdisks.
> > Everything goes fine. Then, at some later time (usually within a week)
> > your computer crashes on "free: freeing free frag", or bad inode, or FS
> > corruption of some other kind. We're REALLY GLAD they finally fixed it...
> > Saves having to get up at 3am to fsck 40GB of data.. thank goodness
> > for remote consoles.. enough rambling..)
> > 
> >  
> > 
> > ============================================================================
> >                                 SunService
> > 
> >                     SUNSOLVE EARLYNOTIFIER(SM) ALERT
> > 
> > 
> > SunSolve EarlyNotifier Alert is published periodically to provide 
> > SunService customers with the latest and most important technical 
> > information regarding Sun hardware and software.
> > 
> > 
> > **************************************************************************
> > **************************************************************************
> > 
> > DATE: May/22/97
> > 
> > SYNOPSIS: "Bad parity" with RAID-5 and SPARCstorage Array Volume Manager 
> > 	  Software version 2.x.
> > 
> > PRODUCT CATEGORY: Storage/Software
> > 
> > PRODUCTS AFFECTED:
> > 
> > 	Any SPARCstorage Array Volume Manager Version 2.X Software releases,
> > 	patched and unpatched, that support RAID-5 configurations.
> > 
> > PART NUMBERS AFFECTED:
> > 
> > 	N/A
> > 
> > REFERENCES:
> > 
> > 	BUGID#s 1223482, 1242923, 4010911, 4043658
> > 	ESC #s 509297, 508222, 506310, 504099, 504031
> > 
> > PROBLEM DESCRIPTION:
> > 
> > In all Veritas releases that supported RAID-5 prior to VxVM 2.3, some
> > maintenance functions on RAID-5 volumes under Veritas control can create
> > bad parity.  This can occur during maintenance of a RAID-5 volume, such as
> > growing the volume or file system, moving subdisks, or running vxevac.
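
As a reminder of why bad parity matters, here is a minimal Python sketch (not part of the alert) of RAID-5 style XOR parity. The function names and the tiny 4-byte stripe units are illustrative assumptions; nothing here reflects how VxVM is actually implemented.

```python
from functools import reduce

def xor_parity(units):
    """Return the byte-wise XOR of equally sized stripe units."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))

def reconstruct(surviving_units, parity):
    """Rebuild the one missing data unit from the survivors plus parity."""
    return xor_parity(list(surviving_units) + [parity])

# Two hypothetical data stripe units and their parity.
d0 = b"\x01\x02\x03\x04"
d1 = b"\x10\x20\x30\x40"
good_parity = xor_parity([d0, d1])

# With correct parity, a lost unit is recovered exactly.
assert reconstruct([d0], good_parity) == d1

# With stale parity (here: computed as if d1 were all zeros),
# reconstruction silently returns the wrong data.
stale_parity = xor_parity([d0, b"\x00\x00\x00\x00"])
assert reconstruct([d0], stale_parity) != d1
```

The point of the sketch is that bad parity is invisible until a reconstruction actually uses it, which matches the delayed failures described earlier in this thread.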
> > 
> > For non-typical RAID-5 configurations that have both of the following
> > characteristics, data corruption can be present in the region of the
> > RAID-5 column where the split subdisks meet:
> > 
> >         - a RAID-5 column is split into multiple subdisks, and the
> >           subdisks do NOT end and begin on a stripe-unit aligned
> >           boundary
> > 
> >         - a RAID-5 reconstruction operation was performed.
> > 
> > The following is an example of a RAID-5 column which is comprised of
> > split subdisks within a RAID-5 stripe-unit, which is not on a
> > stripe-unit boundary:  (Note the split in column 2 composed of subdisk
> > 2 and subdisk 4)
> > 
> > e.g.:
> > 		3 column RAID-5 (using defaults):
> > 
> >     Column 1                Column 2                Column 3
> >     =========               =========               =========  
> >         
> >     subdisk 1               subdisk2                subdisk3   
> > +---------------+       +===============+       +---------------+ 
> > |stripe-unit    |       | (subdisk 2)   |       |               | stripe-width
> > |is 16k         |       |               |       |               | is 48k
> > +---------------+       +===============+       +---------------+
> > |               |       | (subdisk 2)   |       |               |
> > |               |       |               |       |               |
> > +---------------+       +===============+       +---------------+<-stripe-unit
> >                                                                boundaries
> >         ...                     ...                     ...          /  /
> > +---------------+       +===============+       +---------------+<--/  /
> > |               |       | (subdisk 2)   |       |               |     /
> > |               |       |               |       |               |    /
> > +---------------+       +=======+++++++++       +---------------+<--/
> > |               |   +-->| (sd 2)+ (sd 4)+       |               |
> > |               |   |   |       +       +       |               |
> > +---------------+   |   +=======+++++++++       +---------------+
> > |               |   |   + (subdisk 4)   +       |               |   
> > |               |   |   +               +       |               |
> > +---------------+   |   +++++++++++++++++       +---------------+
> > |               |   |   + (subdisk 4)   +       |               |
> > |               |   |   +               +       |               |
> > +---------------+   |   +++++++++++++++++       +---------------+
> >         ...         |           ...                     ...
> >         ...         |           ...                     ...
> >         ...         |           ...                     ...
> >                     |
> >                     |
> >                     |
> >                     Note that this stripe-unit for this RAID-5 volume
> >                     is "split" by 2 subdisks (i.e. the end region of
> >                     subdisk 2 and the beginning region of subdisk
> >                     4).  Note also that subdisk 2 does not end on a
> >                     stripe-unit boundary, and that subdisk 4 does
> >                     not start on a stripe-unit boundary, but rather
> >                     somewhere within the stripe-unit.
> > 
> > 
> > If a RAID-5 column geometry has multiple subdisks where the subdisk
> > boundaries are not stripe-unit aligned, you may see data corruption
> > after a RAID-5 reconstruction operation in the RAID-5 volume at the
> > point of the split subdisks.  If this RAID-5 volume contains a
> > filesystem, this could manifest itself as a failed "FSCK" pass.
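
The alignment condition described above can be checked with simple arithmetic. The following Python sketch is an illustration only: the function name and the subdisk lengths are hypothetical, and the 16k default is taken from the alert's diagram. It does not read a real VxVM configuration.

```python
from itertools import accumulate

def misaligned_boundaries(subdisk_lengths_kb, stripe_unit_kb=16):
    """Return the column offsets (in KB) where one subdisk ends and the
    next begins at a point that is not a multiple of the stripe unit."""
    # Cumulative lengths give each subdisk's end offset within the column;
    # the last one is the end of the column, not an internal boundary.
    internal = list(accumulate(subdisk_lengths_kb))[:-1]
    return [off for off in internal if off % stripe_unit_kb != 0]

# Hypothetical column built from two subdisks that meet on a boundary.
print(misaligned_boundaries([64, 64]))

# Hypothetical column like "column 2" in the diagram: the first subdisk
# ends 8k into a stripe unit, so the split falls inside that unit.
print(misaligned_boundaries([56, 72]))
```

An empty result means every split in the column falls on a stripe-unit boundary; any offset returned marks a stripe unit shared by two subdisks, which is the geometry the alert warns about.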
> > 
> > 
> > CORRECTIVE ACTION:
> > 
> > Upgrade the customer to VxVM 2.3 and execute the following procedure on
> > those RAID-5 volumes that you suspect may be bad and for which no backup
> > exists.  If a current backup exists, then restore any RAID-5 volume
> > suspected of having bad parity.
> > 
> > How to repair the parity if suspected bad:
> >    
> >    A raid parity resync can be forced by doing the following:
> >    
> >    1) Unmount the file system on the RAID-5 volume
> >    2) vxvol -g <diskgroup> stop <volume>
> >    3) vxmend -g <diskgroup> fix empty <volume>
> >    4) vxvol -g <diskgroup> start <volume>
> >    
> > This will take a while and will resync the parity.
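
For anyone scripting the procedure above across several volumes, here is a small Python sketch that only builds the command lines from steps 2 to 4 of the alert and prints them for review. It executes nothing. The helper name and the disk group and volume names ("datadg", "vol01") are hypothetical placeholders, and step 1 (unmounting the file system) still has to be done first.

```python
def parity_resync_commands(diskgroup, volume):
    """Return steps 2-4 of the alert as argument lists.

    Nothing is run here; the caller decides whether and how to execute
    the commands after the file system has been unmounted (step 1).
    """
    return [
        ["vxvol", "-g", diskgroup, "stop", volume],
        ["vxmend", "-g", diskgroup, "fix", "empty", volume],
        ["vxvol", "-g", diskgroup, "start", volume],
    ]

# Print the commands for a hypothetical disk group and volume.
for cmd in parity_resync_commands("datadg", "vol01"):
    print(" ".join(cmd))
```

Printing before running is deliberate: the sequence marks the volume empty and forces a full resync, so it seems worth eyeballing the names before anything is executed.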
> > 
> > 
> > COMMENTS:
> > 
> > Timing tests took 58 minutes to resync the parity on a 10 GB volume with
> > enough I/O load on the system to cause all other disks to be an average
> > of 75% busy.
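
The timing figure above can be turned into a rough estimate for other volume sizes. This Python sketch assumes resync time scales linearly with volume size under a comparable I/O load, which the alert does not claim; treat the result as a planning guess, not a measurement. The function name is hypothetical.

```python
def estimate_resync_minutes(volume_gb, reference_minutes=58, reference_gb=10):
    """Scale the alert's data point (58 minutes for 10 GB) linearly.

    Linear scaling is an assumption; actual time depends on hardware
    and on how busy the disks are.
    """
    return reference_minutes * volume_gb / reference_gb

# Rough estimate for a 40 GB volume (the size mentioned earlier in the thread).
print(estimate_resync_minutes(40))  # 232.0 minutes, a little under 4 hours
```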
> > 
> > A patch (for Vm 2.3, only) is in test at this time and will be released
> > in the near future with a utility that will allow the checking of the
> > RAID-5 volumes to determine if they need to have their parity synced.
> > 
> > 
> > **************************************************************************
> > **************************************************************************
> > 
> > Other SunService Information Resources
> > --------------------------------------
> > SunSolve Online including SunSolve Bulletin Board(SM)
> > 
> > World Wide Web:
> >    North America:  http://sunsolve.sun.com
> >    UK:             http://online.sunsolve.sun.co.uk
> >    France:         http://sunsolve.sun.fr
> >    Germany:        http://sunsolve.sun.de/sunsolve
> >    Switzerland:    http://sunsolve.sun.ch
> >    Japan:          http://sunsolve.sun.co.jp/sunsolve
> >    Australia:      http://sunsolve1.sun.com.au
> > 
> > Telnet access is available on each of the above servers
> > (except in Germany, Telnet to suninfo.sun.de).
> > 
> > SunSolve CD-ROM(TM):
> >    Updated and distributed every six weeks to SunService contract customers.
> > 
> > For a complete list of patches, check the patch reports in the EarlyNotifier
> > data collection on SunSolve.
> > 
> > 
>