[Veritas-bu] NB5.0MP3 on Sol8 with Win2003 clients.. backups stalling

Paul Keating pkeating at bank-banque-canada.ca
Wed Mar 9 14:17:39 CST 2005

I have several machines that seem to stall during their backups.
In all cases they are Win 2003 clients backing up to a Sun Fire V880 running
Solaris 8 and NB5.0MP3, writing to an STK L700 with FC-connected LTO2 drives.
Backups start out fine, then the throughput gradually drops until it is
in the neighbourhood of 25KB/s.
Basically, at that point, the choice is to wait 3 days for it to finish,
or kill the job.
In every case so far, if I kill the job and run a manual, the manual
will run fine.
This is an issue in several cases since the machines have databases that
are backed up cold. (DBA's preference)
Because of this, when the backup doesn't complete, bpend_notify.bat
doesn't kick off, the DBs don't restart, and the clients get in to find
the service is down. Manuals also can't be run on the DB servers during
the day, since the job will shut down the DB.
All the DB stuff is kind of secondary, however, since there are dozens
of ways of remediating that situation.
I do want to treat the root cause, which is these stalling backup jobs.
There are actually 4 machines on which this stalling is an issue (only 2
of them happen to be SQL servers).
The machines all have 100FD connections. The switch and NIC are both
hard set to 100FD.
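To put the stalled rate in perspective, here is a rough back-of-the-envelope sketch. It uses only the figures above (about 25KB/s, a 3-day wait, 100 Mbit/s links); treating 100FD as a flat 12.5 MB/s with no protocol, firewall or tape overhead is an assumption, so read the result as an order of magnitude, not a measurement.

```python
# Rough arithmetic only. The 25 KB/s rate, the 3-day wait and the 100 Mbit/s
# link come from the description above; ignoring all overhead is an assumption.
STALLED_RATE_BYTES_PER_S = 25 * 1024       # ~25 KB/s once a job has stalled
LINK_RATE_BYTES_PER_S = 100_000_000 / 8    # 100 Mbit/s line rate = 12.5 MB/s
THREE_DAYS_S = 3 * 24 * 60 * 60            # the "wait 3 days" option, in seconds


def seconds_to_transfer(num_bytes, rate_bytes_per_s):
    """Time needed to move num_bytes at a constant rate."""
    return num_bytes / rate_bytes_per_s


# Data that would still have to move for a stalled job to need 3 more days.
remaining_bytes = STALLED_RATE_BYTES_PER_S * THREE_DAYS_S

# The same amount of data at the full (theoretical) link rate.
healthy_seconds = seconds_to_transfer(remaining_bytes, LINK_RATE_BYTES_PER_S)

print(f"Data implied by 3 days at 25 KB/s: {remaining_bytes / 1024**3:.2f} GiB")
print(f"Same data at 100 Mbit/s line rate: {healthy_seconds / 60:.1f} minutes")
```

In other words, what is left when a job stalls is only a few GiB, which the link could in principle move in minutes rather than days.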
NB client config is perfect. These 4 machines are backing up using vnetd
through a firewall (as are 30+ others in the same configuration,
without issue).
It seems that about 3 days a week, one of these machines will stall, at
random and rarely the same machine twice in a row, so it is very
unpredictable which one will stall. At this point, I have a VERY clean
environment: with the exception of these machines, I have a 100%
success rate.
Any ideas?
