[Veritas-bu] Resolution to Jobs Hanging or taking a long time to disengage since upgrade to 5.x
briandiven at northwesternmutual.com
Fri Mar 18 18:32:33 CST 2005
I was reminded about the following email and want to thank people for their help and explain the resolution. Below is a reminder of the issue; the fun stuff comes after that.
> > Date: Sat, 29 Jan 2005 18:20:09 -0600
> > TechNote 274544 provides ideas to reduce the burden on the NBU 5.1 software in large environments. Since our upgrade 9 weeks ago, we have been bouncing NBU almost daily due to hung backups and we are a large 24x7 environment and have seen limited improvements after 9 weeks of an open case with Veritas - and then seeing this TechNote. I find it ironic that the re-branding of 5.1 to Enterprise Server and a technote that says not to stress BPSCHED in Enterprise Server environments can occur, so I'd like to see if I'm alone here.
> > We have had several issues upgrading to NBU 5.1 MP1 and now MP2 where we are unable to submit many backups (queued or active) at once. The recent TechNote 274544 fits our account perfectly and I am wondering if any other large NBU shops are experiencing similar issues. I have a hard time believing this TechNote was generated just because of us. Veritas shows no desire to address this other than to wait until release 6.x and I could use some friends that will either state that they have an issue or help me push a fix through.
OK - New Date: Today
We have a site-specific BPSCHED binary, and it will be made available in 5.1 MP3, coming to a theatre near you soon. This should resolve many of the problems in the accounts that I have been in contact with. Although BPSCHED hasn't changed much from 4.5 to 5.1, it changed enough. Here is my understanding of the changes that were related to our issues.
NBU 4.5 did version checking for the purposes of In-Line Tape Copy (ITC). Because 4.5 was backwards compatible, it checked whether there was a 3.4 client, which didn't support ITC. It also appears that 5.1 handled the directives for what was to be backed up differently. So, when you used a directive of "All Local Drives", NBU worked out what this meant and interrogated each server sequentially to resolve it before the job would even appear as a queued job in the Activity Monitor. In other words, there were a lot of background security checks using resources that you couldn't see. If you have any clients with network connectivity issues, BPSCHED will wait for the timeout value to expire before querying the next server. Meanwhile, you may hit the next backup window and continue to backlog BPSCHED with resolution work that is invisible to you.
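To put rough numbers on the sequential behavior described above, here is a minimal sketch. It is only a model of the arithmetic, not NetBackup code: the client names, the 300-second timeout, and the 1-second response time are all assumptions made up for illustration.

```python
def sequential_stall_seconds(clients, timeout_s, ok_latency_s=1.0):
    """Time spent checking clients strictly one after another.

    clients maps a (hypothetical) client name to True if it answers
    and False if the query has to run into the timeout.
    """
    total = 0.0
    for reachable in clients.values():
        # A reachable client costs one quick round trip; an unreachable
        # one blocks the whole pass for the full timeout.
        total += ok_latency_s if reachable else timeout_s
    return total


# Hypothetical policy: four clients, two with network problems.
clients = {"web01": True, "db01": False, "file01": True, "app01": False}
stall = sequential_stall_seconds(clients, timeout_s=300)
print(f"Jobs stay invisible for about {stall:.0f} seconds")
```

Under these assumed values the two unreachable clients account for almost the entire delay, which is the sense in which a handful of clients with connectivity problems can hold up every job queued behind them.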
The fix was that NBU 5.1 doesn't need to do this version checking from 5.1 back to 4.5 for ITC. Since they took this out of BPSCHED, we have been able to push the system with over 2,000 concurrent backup jobs and nothing is getting delayed. Windows servers seem to be especially susceptible to this problem.
Because of this, we are going to go back to "All Local Drives" this weekend along with ITC. I have run just over a week without issues with ITC turned back on. I am anxious to see how the new directive of "All Local Drives" works. Per my reference to the technote, we have also tried to "not stress" NBU and broke our policies into multiple policies with convenient schedules. If this new directive works, my next step will be to get back to life as normal: submit everything at once and let NBU determine when resources are available and run my backups. It will be really good to wake up and see that my backups are done.
So far, this has been a wonderful fix. Two Veritas back-end engineers have been involved on daily calls for 15 weeks. I must say that Veritas really gave us the resources to resolve this, but we will have a post-mortem on what took so long to get their attention. Coming together on this list and finding friends helped a lot ... this is a useful tool for communication and resolution. So, I send my thanks to many people.
We do have a special binary for BPSCHED and they want us to upgrade to MP3 soon, but MP3 also includes other improvements. If it ain't broke, don't fix it. I am very happy where I am. I am gun-shy about even applying MP3; 15 weeks was a long burden for many of us support people.
I wrote this email in the hope that it will help people who are now 7 weeks without a life, my friends, and people considering upgrading. My heart feels that 5.1 MP3 will be solid.
I hope that everyone finds the release that keeps them solid and working and with family.