[Veritas-bu] RE: Error 24 problems and trouble shooting

Charles Ballowe cballowe at gmail.com
Wed Mar 9 11:54:55 CST 2005


I applied the updated kernel parameters for Solaris 9 from a technote,
but I still encountered the problem afterward. Here are the log entries
from bpbrm on the media server that was affected yesterday. These repeat
until all jobs associated with the process are killed; then everything
returns to normal. Any more thoughts?

-Charlie

12:29:11.837 [27194] <2> sighdl: pipe signal
12:29:11.837 [27194] <2> put_long: (11) network write() error: Broken pipe (32); socket = 5
12:29:11.837 [27194] <16> bpbrm send_keepalive: could not write KEEPALIVE to COMM_SOCK
12:29:11.839 [27194] <2> logconnections: BPJOBD CONNECT FROM xxx.xxx.xxx.xxx.40738 TO yyy.yyy.yyy.yyy.13723
12:29:11.841 [27194] <2> job_authenticate_connection: ignoring VxSS authentication check for now...
12:29:11.843 [27194] <2> job_connect: Connected to the host nbmaster-b contype 10 jobid <51195> socket <7>
12:29:11.843 [27194] <2> job_connect: Connected on port 40738
12:29:11.843 [27194] <2> set_job_details: Done
12:29:11.887 [27194] <2> job_monitoring_exex: ACK disconnect
12:29:11.887 [27194] <2> job_disconnect: Disconnected
12:29:11.888 [27194] <2> logconnections: BPDBM CONNECT FROM xxx.xxx.xxx.xxx.40739 TO yyy.yyy.yyy.yyy.13721
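The bpbrm debug-log lines above share a fixed prefix: a timestamp, the process ID in square brackets, a number in angle brackets, then the message. As a rough sketch for checking how often this happens and which bpbrm process is stuck, the Python below counts "could not write KEEPALIVE" messages per PID and collects the socket numbers reported with "Broken pipe (32)". Errno 32 is EPIPE, which the kernel returns when a process writes to a socket whose peer has already closed its end. The regular expression is inferred only from the excerpt in this thread, not from any documented log format, and the function names are hypothetical.

```python
import re
import sys
from collections import Counter

# Prefix inferred from the excerpt above (an assumption, not a documented format):
# "HH:MM:SS.mmm [PID] <N> message"
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{3}) \[(\d+)\] <(\d+)> (.*)$")
EPIPE_RE = re.compile(r"Broken pipe \(32\); socket = (\d+)")


def keepalive_failures(lines):
    """Hypothetical helper: count 'could not write KEEPALIVE' messages per PID."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "could not write KEEPALIVE" in m.group(4):
            counts[int(m.group(2))] += 1
    return counts


def broken_pipe_sockets(lines):
    """Hypothetical helper: socket numbers reported with EPIPE (errno 32) write errors."""
    sockets = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not carry the expected prefix
        s = EPIPE_RE.search(m.group(4))
        if s:
            sockets.append(int(s.group(1)))
    return sockets


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of a bpbrm debug log as the only argument.
    with open(sys.argv[1], errors="replace") as fh:
        log_lines = fh.readlines()
    for pid, n in keepalive_failures(log_lines).items():
        print(f"pid {pid}: {n} KEEPALIVE write failures")
    print("sockets with broken pipe:", broken_pipe_sockets(log_lines))
```

A single stuck PID accumulating many failures would match the pattern described in this thread, where the errors repeat until the associated jobs are killed.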

On Thu, 17 Feb 2005 13:39:43 -0500, Kevin Zhang
<Kevin.Zhang at rci.rogers.com> wrote:
> I would suggest checking the network-related performance of this media
> server. You may also want to look into fine-tuning some kernel
> parameters.
> 
> Kevin
> 
> Date: Thu, 17 Feb 2005 11:28:28 -0600
> From: Charles Ballowe <cballowe at gmail.com>
> Reply-To: Charles Ballowe <cballowe at gmail.com>
> To: veritas-bu at mailman.eng.auburn.edu
> Subject: [Veritas-bu] Fwd: Error 24 problems and trouble shooting
> 
> It seems that when this starts happening, I find at least one job whose
> detailed status is many lines of "Error bpbrm could not write KEEPALIVE
> to COMM_SOCK". Maybe that gives a clue to what's going on? I'm still
> looking for thoughts on this.
> 
> -Charlie
> 
> ---------- Forwarded message ----------
> From: Charles Ballowe <cballowe at gmail.com>
> Date: Wed, 16 Feb 2005 13:34:35 -0600
> Subject: Error 24 problems and trouble shooting
> To: veritas-bu at mailman.eng.auburn.edu
> 
> It seems that every couple of weeks one of my media servers will
> completely forget how to talk to the world, and every backup that tries
> to use it will fail with status 24. Reboots can clear this, but there
> has to be a better way.
> 
> A network sniff taken when a backup gets kicked off doesn't show any
> traffic to the clients involved, so I believe that the problem is on the
> server. Outside of backup processes, I'm able to send traffic through
> the interface, but backups stop working. In-progress backups seem to
> continue on to completion.
> 
> The environment is NB 5.1 MP2; many of the clients are still running
> some version of 4.5. This problem seemed to exist in 4.5 as well. The
> servers are all Solaris, and the clients are a mix of UNIX and Windows.
> Any idea where to look to start troubleshooting this one?
> 
> -Charlie
> 
> _______________________________________________
> Veritas-bu maillist  -  Veritas-bu at mailman.eng.auburn.edu
> http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>
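Since traffic outside NetBackup still passes while new backups fail with status 24, one way to narrow things down during an episode is a plain TCP connect test from the affected media server to a client. The sketch below assumes Python is available on the media server; the default port 13782 (the port conventionally registered for bpcd) and the command-line usage are assumptions to adjust for your site. If the connect succeeds while backups still fail, that would suggest the trouble lies in the NetBackup processes on the server rather than in the network path.

```python
import socket
import sys

# Assumption: 13782 is the port conventionally registered for bpcd;
# change it if your site uses different service ports.
BPCD_PORT = 13782


def can_connect(host, port=BPCD_PORT, timeout=5.0):
    """Try a plain TCP connect; return (True, None) or (False, the OSError)."""
    try:
        # create_connection resolves the name and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:  # refused, timed out, unreachable, name lookup failure
        return False, exc


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python connect_check.py CLIENT_HOST [PORT]
    host = sys.argv[1]
    port = int(sys.argv[2]) if len(sys.argv) > 2 else BPCD_PORT
    ok, err = can_connect(host, port)
    status = "connected" if ok else f"failed: {err}"
    print(f"{host}:{port} -> {status}")
```

Running it once when the server is healthy and again while jobs are failing gives a simple before/after comparison.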


