Bryan Stansell bryan@conserver.com
Sun, 26 May 2002 00:10:06 -0700 (PDT)
On Thu, May 23, 2002 at 04:45:42PM -0400, Greg A. Woods wrote:
> conserver eventually seems to go catatonic after SIGPIPE (on NetBSD)
>
> I think this is the same problem as what I reported some time ago with
> a somewhat older release, but now with 7.2.1 it's just far less common.
> (note the code I'm running includes the patches I sent to the list,
> though I don't see how any of those changes could affect any signal
> processing....)

ugh...well, i'll address this at the stack trace...

> I suspect the SIGPIPE is triggered by an attempt to write to a socket
> that's been closed (TCP RST) by the client. The server should just
> close the socket and do any per-client cleanup necessary, but I don't
> see a signal handler for SIGPIPE anywhere....

right...no SIGPIPE handler. but, there is a SIGCHLD handler, which gets
called when forked processes die. i don't think this is a problem (yet,
at least).

> I'm recompiling now..... Hmm.... seems it was waiting for a PID that
> didn't exist:
> [chopped gdb header]
> (gdb) where
> #0 0x100b7e5c in wait4 ()
> #1 0x100b4e88 in waitpid ()
> #2 0xa6cc in ConsChat (pCE=0x15400) at group.c:3016
> #3 0x7a9c in Kiddie (pGE=0x47180, sfd=0x6c010) at group.c:1458
> #4 0xa20c in Spawn (pGE=0x47180) at group.c:2907
> #5 0xcad0 in FixKids () at master.c:143
> #6 0xd2f4 in Master () at master.c:313
> #7 0xc874 in main (argc=269482840, argv=0x14400) at main.c:724
> (gdb) up
> #1 0x100b4e88 in waitpid ()
> (gdb) up
> #2 0xa6cc in ConsChat (pCE=0x15400) at group.c:3016
> 3016 while (waitpid(pid, &cstatus, 0) < 0) {
> (gdb) print pid
> $1 = 21006
> (gdb)

looking at #2, you see it's calling waitpid() from ConsChat().
ConsChat() is part of your patch. the problem, i'm guessing, is that
the waitpid() inside the while loop has a little bad logic.
specifically, what happens when the waitpid() returns an error that
isn't EINTR? it'll come around for another waitpid() and, i suppose,
lock up like this.
at least, that's my guess - i haven't done any real testing - just
scanned the code quickly.

so, unfortunately, it looks like the patch you've put together might
need a little work. i don't think folks using the base 7.2.1 will see
this type of problem. i'd love to get the whole chat-based thing
integrated in...hopefully this stuff can be worked out.

anyway, there's my two cents.

Bryan
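
The waitpid() concern above can be sketched roughly as follows. This is only an illustration, not the code from the patch: reap_child() is a hypothetical helper name, and it assumes the intent is to retry only when the call is interrupted (EINTR) and to give up on any other error, such as ECHILD when there is no child left to wait for.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of conserver): wait for `pid`,
 * retrying only when the call was interrupted by a signal.
 * Returns 0 and fills *status on success, -1 with errno set otherwise.
 */
static int
reap_child(pid_t pid, int *status)
{
    for (;;) {
        if (waitpid(pid, status, 0) == pid)
            return 0;
        if (errno != EINTR)
            return -1;  /* e.g. ECHILD: nothing to wait for, so stop */
        /* EINTR: a signal arrived mid-wait, so try again */
    }
}
```

With a loop shaped like this, a caller could report the failure and carry on when -1 comes back, instead of going around for another waitpid() on a PID that is already gone.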