Bryan Stansell bryan@conserver.com
Wed, 14 Apr 2004 16:14:23 -0700 (PDT)
On Wed, Apr 14, 2004 at 10:31:39AM -0400, nathan r. hruby wrote:
> Hi!
>
> We seem to be having this problem concerning binary data shoved out a
> console and through conserver making conserver freeze.

well, that's really not supposed to happen. i've even done some
(minimal) testing doing xmodem transfers to and from consoles (local
shell commands) using the '^Ec|' sequence. found a few binary-data bugs
and fixed them (a few releases ago). i was able to xfer multiple megs of
data without a hiccup. it was definitely shoving 8-bit data without
problems. perhaps i should try that test again...

> Apr 14 10:03:14 xoff kernel: [<c015f9e6>] sys_close [kernel] 0x66 (0xe0f97fb0)

assuming i'm reading this right, one of the child processes was locked
up trying to close the serial port (i assume it was a serial port since
higher up it was doing cyclades stuff). all the files are opened
non-blocking (O_NDELAY for the serial ports, but under linux
O_NONBLOCK==O_NDELAY - i'm going to change all the O_NDELAY to
O_NONBLOCK, btw), so it concerns me that a close would block. at least,
i'm assuming it was blocked on the close.

my big question is, why was it trying to close the connection to the
serial port? were there any conserver messages logged that might
explain it? or errors? or anything?

> So, the conserver process seems "frozen" in that any console commands
> seem to hang. I assume that's cause it's locked waiting for tty
> operations.

yeah, if you try and connect to any console managed by that process,
it'll lock up. if you poll the servers it can lock too (stuff like
'console -u', 'console -w', etc) since the children provide the info.

> So my question is: is this expected or possible buglet fodder? I'm
> thinking that just telling conserver to strip the 7th bit should hopefully
> make this go away, but any guidance folks could share would be much
> appreciated.

i doubt the high-bit stripping will help. if it does, i'd love to know.

can you make this happen at will?
if so, i'd suggest copying the conserver.cf file, commenting out all but
one console, and then tell conserver to ignore the console by connecting
to it and doing a '^Ecd'. then start up another conserver process with
'conserver -C new.cf -DDD -p 7777 > /tmp/conserver.log 2>&1'. connect
to the console with 'console -p 7777 (consolename)' and cause it to lock
up. then just nuke the conserver processes that are using '-p 7777'.
oh, and connect to the console with the real server and do a '^Eco' to
bring it back up.

the /tmp/conserver.log file will be HUGE. and i mean **HUGE**. but,
it'll show every bit of detail, and hopefully will shed some light on
what's going on (unless you've figured out what it is by now).

and, of course, there was the off-mailing list suggestion that perhaps
it's a software flow-control issue. dunno if you've checked into that
yet, but maybe there's some way of that influencing the close(). but
again, why is it trying to close the device?

as a final "aside", i've talked to another person about issues with the
cyclades Y cards and problems with conserver. afaik, he's still having
problems - weird things where some ports work, others don't. maybe
these are tied together somehow. we never got anywhere with his
problems. it makes me think there's some low-level serial stuff that
the cyclades drivers have issues with...or something.

ok...i'm done with my long rambling. maybe, hopefully, something in
here will be of help. if you can trigger the problem, i'd love to see
the conserver log (with the -DDD).

Bryan
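
A small illustration of the O_NDELAY / O_NONBLOCK point above, for
anyone who wants to check what their own platform does. This is only a
sketch and is not taken from the conserver source: it is written in
Python for brevity, the helper names (open_nonblocking, is_nonblocking,
set_nonblocking) are made up for the example, and /dev/null stands in
for a real serial device. It shows how to open a descriptor
non-blocking and read the flag back; it says nothing about why a
close() might still block.

```python
import fcntl
import os


def open_nonblocking(path):
    """Open path read-only in non-blocking mode and return the descriptor.

    O_NOCTTY keeps a tty from becoming the controlling terminal.
    (Hypothetical helper for illustration, not conserver code.)
    """
    return os.open(path, os.O_RDONLY | os.O_NONBLOCK | os.O_NOCTTY)


def is_nonblocking(fd):
    """Return True if O_NONBLOCK is set in the descriptor's status flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    return bool(flags & os.O_NONBLOCK)


def set_nonblocking(fd, enabled):
    """Set or clear O_NONBLOCK on an already-open descriptor."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    if enabled:
        flags |= os.O_NONBLOCK
    else:
        flags &= ~os.O_NONBLOCK
    fcntl.fcntl(fd, fcntl.F_SETFL, flags)


if __name__ == "__main__":
    # /dev/null is an assumption standing in for the serial device.
    fd = open_nonblocking("/dev/null")
    try:
        print("non-blocking:", is_nonblocking(fd))
        # Whether the two constants share a value depends on the platform.
        print("O_NDELAY == O_NONBLOCK:", os.O_NDELAY == os.O_NONBLOCK)
    finally:
        os.close(fd)
```

Pointing open_nonblocking() at the actual device instead of /dev/null
and printing is_nonblocking() is a quick way to confirm the flag really
is set on the descriptor before looking elsewhere.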