Plaiding, again.

Amos Gouaux ssa-managers@utdallas.edu
15 Jan 2000 16:04:07 -0600


Yeah, I know this has been gone over before, but I'm still a little
bit stuck on this plaiding business, at least in this specific case
(Cyrus message store).

We've got an E250 that's hooked up to an A3500 controller, which in
turn has 2 D1000 trays associated with it.  These D1000 trays are
divided so that each half is on a different bus on the A3500
controller.  The trays are fully populated with the 9GB, 10K RPM
drives.  This results in 4 SCSI chains of 6 disk drives each.  We've
also purchased VxVM and VxFS for this box, which is currently
running Solaris 2.6.  (I just happened to get the VxVM and VxFS
releases that will run on Solaris 7, but I don't think I'll be able
to upgrade the box just yet.)

These trays are going to be used as the message store for a Cyrus
IMAP server that will be running on this E250.  This message store
is quite similar to MH or INN in that it stores the messages one per
file.  There are also some cache files per folder (directory) that
store accesses and message headers, among other things.  After
running some crude scripts, we've observed that the predominant file
size is roughly 8KB.

So, what we've considered doing was to create 5 RAID5 LUNs composed
of 4 drives.  That way each drive of a LUN is on a separate bus.
The segment size (this time anyway) is 8KB (16 blocks).  We settled
on 8KB because of the message sizes.  We were then thinking of using
the remaining 4 drives as hot spares.
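
As a sanity check on the drive counts above, here is a quick sketch in
Python (purely illustrative).  The variable names are mine, the 9GB
figure is the nominal drive size, and the 512-byte block size is an
assumption implied by "8KB (16 blocks)".

```python
# Figures taken from the layout described above.
BUSES = 4              # SCSI chains on the A3500 controller
DRIVES_PER_BUS = 6
LUNS = 5               # RAID5 LUNs
DRIVES_PER_LUN = 4     # one drive from each bus
DRIVE_GB = 9           # nominal size; formatted capacity will differ
BLOCK_BYTES = 512      # assumption: standard 512-byte blocks

total_drives = BUSES * DRIVES_PER_BUS
hot_spares = total_drives - LUNS * DRIVES_PER_LUN

# RAID5 gives up one drive's worth of capacity per LUN to parity.
data_drives_per_lun = DRIVES_PER_LUN - 1
usable_gb = LUNS * data_drives_per_lun * DRIVE_GB

segment_bytes = 16 * BLOCK_BYTES

print("total drives:", total_drives)
print("hot spares:", hot_spares)
print("nominal usable GB:", usable_gb)
print("segment size in bytes:", segment_bytes)
```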

How does that sound so far?  Suggestions?

Now comes the part that we've had the most difficulty in resolving:
filesystem layouts.  I suppose we could have 5 separate filesystems,
one on each of these LUNs.  Unfortunately, this would tend to mean
that we'd have to shuffle accounts between these filesystems to keep
things more or less balanced.  Instead, what I'd like to see is one
filesystem composed of these LUNs, if that's at all reasonable.

If the one filesystem approach is taken, the next question is how to
organize these LUNs.  The simplest approach would be to have one big
concat.  The other approach would be this "plaiding", where the LUNs
are striped together.  If a stripe were to be used, exactly how
should it be configured?
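
Just so we're talking about the same thing, here's roughly how I
picture the two layouts mapping a volume offset onto the LUNs.  This
is a generic sketch in Python, not VxVM's actual internals; the
function names, the 64KB stripe unit, and the nominal LUN size are
placeholders of my own, not recommendations.

```python
def concat_lun(offset, lun_size, n_luns):
    """Concat: LUNs are laid end to end, so each one fills in turn."""
    lun = offset // lun_size
    if lun >= n_luns:
        raise ValueError("offset is past the end of the volume")
    return lun, offset % lun_size


def stripe_lun(offset, stripe_unit, n_luns):
    """Stripe: consecutive stripe units rotate across the LUNs."""
    unit = offset // stripe_unit
    lun = unit % n_luns
    lun_offset = (unit // n_luns) * stripe_unit + offset % stripe_unit
    return lun, lun_offset


N_LUNS = 5
LUN_SIZE = 3 * 9 * 10**9    # assumption: 3 data drives of nominal 9GB
STRIPE_UNIT = 64 * 1024     # arbitrary placeholder value

# Walk the first few stripe units and see which LUN each lands on.
for i in range(6):
    off = i * STRIPE_UNIT
    print(off, concat_lun(off, LUN_SIZE, N_LUNS)[0],
          stripe_lun(off, STRIPE_UNIT, N_LUNS)[0])
```

In this model a concat keeps neighbouring offsets on the same LUN
until that LUN is full, while a stripe moves to the next LUN every
stripe unit.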

After reading some of these ssa-managers posts over and over, it
would seem that the "chunk" (one of our problems, I'm sure, is the
different terms being used for segment size, interlace value, etc.)
should be some multiple of the 8KB "chunk" size on the RAID5 LUNs.
Would that
be 3 * 8KB = 24KB, because we're using RAID5 LUNs composed of 4
member disks?  If so, isn't this 24KB kinda big for I/Os that would
be around 8KB?  Is the goal to try to cache the writes so that
several messages are written at once?  But VxVM, since it doesn't use
NVRAM, isn't going to be able to do that, right?
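
To spell out the arithmetic behind that 24KB figure, here's a rough
sketch in Python.  It's only the sums, under the assumption that a
4-drive RAID5 LUN holds 3 data segments plus 1 parity segment per
stripe; the function name is made up, and none of this reflects how
VxVM or the A3500 actually behave internally.

```python
SEGMENT_KB = 8          # segment size chosen on the RAID5 LUNs
DRIVES_PER_LUN = 4

# One segment per stripe goes to parity, the rest hold data.
data_segments = DRIVES_PER_LUN - 1
full_stripe_kb = data_segments * SEGMENT_KB


def segments_touched(offset_kb, size_kb, segment_kb=SEGMENT_KB):
    """Count the data segments an I/O of size_kb at offset_kb spans."""
    first = offset_kb // segment_kb
    last = (offset_kb + size_kb - 1) // segment_kb
    return last - first + 1


print("full RAID5 data stripe in KB:", full_stripe_kb)
print("aligned 8KB I/O spans segments:", segments_touched(0, 8))
print("unaligned 8KB I/O spans segments:", segments_touched(4, 8))
```

So an aligned 8KB message covers one of the three data segments in a
stripe, which is where my question about 24KB being "kinda big" comes
from.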

We're thinking that, because we're talking about so many small files,
a VxVM stripe might not be all that advantageous, and it might
simply be more reasonable in this case to do a concat.  But then
again, this is getting into areas that we haven't dealt with much
before, at least not at this level of complexity.

Any suggestions on this black magic would be immensely appreciated.

Regards, 
Amos