Greg A. Woods woods@weird.com
Tue, 4 Jun 2002 19:55:13 -0700 (PDT)
[ On Sunday, May 26, 2002 at 00:10:06 (-0700), Bryan Stansell wrote: ]
> Subject: Re: conserver eventually goes catatonic after SIGPIPE (on NetBSD)
>
> looking at #2, you see it's calling waitpid() from ConsChat().
> ConsChat() is part of your patch. the problem, i'm guessing, is that
> the waitpid() inside the while loop has a little bad logic.
> specifically, what happens when the waitpid() returns an error that
> isn't EINTR? it'll come around for another waitpid() and, i suppose,
> lock up like this. at least, that's my guess - i haven't done any real
> testing - just scanned the code quickly.

Hmmm... but there's never been any errno value other than EINTR -- there
would be a "ConsChat: error waiting for chat process:" message in my log
if there had.....

I've added a whole lot more careful error checking, including blocking
SIGCHLD before calling waitpid(), setting an alarm(), and checking that
the process still exists when the alarm expires and EINTR is returned.
I've also added a break out of the loop if ECHILD is returned. I don't
know what to do if either EFAULT or EINVAL is returned -- something's
drastically wrong in that case and it should probably abort()....

So far the deadlock hasn't occurred again, though perhaps the blocking
of SIGCHLD has prevented it.

The problem without the blocking (or ignoring) of SIGCHLD is that the
delivery (and catch) of the signal caused waitpid() to be interrupted
and to return EINTR. I don't know why the second call didn't work
though -- perhaps there's a race condition in my kernel that loses the
status information if it's waitpid() itself that is interrupted....

-- 
Greg A. Woods

+1 416 218-0098; <gwoods@acm.org>; <g.a.woods@ieee.org>; <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>