Steve Lammert slammert@panasas.com
Mon, 28 Apr 2003 12:21:11 -0700 (PDT)
So, I solved my immediate problem by using "lsof" (Sol8 binary obtained
via freshmeat.net) to obtain the pid of the conserver daemon which was
not responding. Killing the pid, then sending the "reconnect" signal to
Conserver, and I'm back in business.

... but I'd still like to know why this happens ...

Cheers,
S

Steve Lammert wrote:
>
> We are using Conserver 7.2.7 to serve about 900 console lines on 25
> Cyclades TS terminal concentrators, from a Sun Ultra 5 running Solaris
> 8. We had no problems at levels of 500-600 lines, but the recent
> expansion to 900 lines appears to have led to the following interesting
> behavior:
>
> After the server has been up for ten days or so, a few users begin
> experiencing timeouts when connecting to a small number of console
> lines, viz:
>
> --------------------------------------------------------
> myhost$ console beta-15-1
> < --- Three minutes of silence --- >
> console: connect: 61897@conserver: Connection timed out
> --------------------------------------------------------
>
> Logging into the Conserver server, I notice a number of connections to
> port 61897 in CLOSE_WAIT state. These entries tend to hang around for a
> LONG time (e.g. days):
>
> --------------------------------------------------------------------
> conserver# netstat -a|grep 61897
>       *.61897                 *.*                 0  0 32768  0 LISTEN
> lyell.panasas.com.61897 kinsman.2458              1  0 33304  0 ESTABLISHED
> lyell.panasas.com.61897 build-bsd6.1851       57920  0 33304  0 CLOSE_WAIT
> lyell.panasas.com.61897 build-bsd6.1855       57920  0 33304  0 CLOSE_WAIT
> lyell.panasas.com.61897 build-bsd6.1863       57920  0 33304  0 CLOSE_WAIT
> lyell.panasas.com.61897 rack-bsd2.2776        57920  0 33304  0 CLOSE_WAIT
> lyell.panasas.com.61897 rack-bsd2.2778        57920  0 33304  0 CLOSE_WAIT
> lyell.panasas.com.61897 rack-bsd2.2781        57920  0 33304  0 CLOSE_WAIT
> lyell.panasas.com.61897 rack-bsd2.2783        57920  0 33304  0 CLOSE_WAIT
> lyell.panasas.com.61897 kinsman.1984          57920  0 33304  0 CLOSE_WAIT
> --------------------------------------------------------------------
>
> One also sees timeouts when using commands such as "console -x"... the
> list of connections pauses at a certain point, and eventually times out.
> It seems likely that a single Conserver daemon (out of the 55 or so
> that are spawned to handle 900 lines) is being affected.
>
> Restarting Conserver is sometimes (but not always) effective in clearing
> this up. In many cases, though, the only solution is to reboot the server.
>
> I had previously bumped up certain values in /etc/system (e.g.
> "maxusers", "tcp:tcp_conn_hash_size") to better handle the large number
> of connections to Conserver, and I'm also planning to install the latest
> Solaris patch cluster, in case this is a Solaris TCP/IP issue...
>
> ... but I thought I ought to ask the List as well, in case others have
> seen this before.
>
> TIA,
> S
>
> --

--
steve lammert          software engineer   voice: +1-412-323-3500
slammert@panasas.com   panasas, inc        fax:   +1-412-323-3511
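
For anyone who wants to watch for the stuck-CLOSE_WAIT condition described in the quoted message, here is a minimal sketch in Python. It assumes Solaris-style "netstat -a" output like the listing quoted above: seven whitespace-separated columns, with addresses written as host.port. The function name close_wait_by_peer and the SAMPLE text are illustrative only and are not part of Conserver or netstat.

```python
from collections import Counter


def close_wait_by_peer(netstat_text, port):
    """Count CLOSE_WAIT sockets on a local port, grouped by remote host.

    Assumes each TCP line has seven columns:
    local, remote, swind, send-q, rwind, recv-q, state.
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) != 7 or fields[6] != "CLOSE_WAIT":
            continue  # skip headers, blank lines, and other states
        local, remote = fields[0], fields[1]
        # Addresses are printed as host.port, so split on the last dot.
        if local.rsplit(".", 1)[-1] != str(port):
            continue
        peer_host = remote.rsplit(".", 1)[0]
        counts[peer_host] += 1
    return counts


# Sample text copied from the netstat listing in the quoted message.
SAMPLE = """\
*.61897 *.* 0 0 32768 0 LISTEN
lyell.panasas.com.61897 kinsman.2458 1 0 33304 0 ESTABLISHED
lyell.panasas.com.61897 build-bsd6.1851 57920 0 33304 0 CLOSE_WAIT
lyell.panasas.com.61897 build-bsd6.1855 57920 0 33304 0 CLOSE_WAIT
lyell.panasas.com.61897 build-bsd6.1863 57920 0 33304 0 CLOSE_WAIT
lyell.panasas.com.61897 rack-bsd2.2776 57920 0 33304 0 CLOSE_WAIT
lyell.panasas.com.61897 rack-bsd2.2778 57920 0 33304 0 CLOSE_WAIT
lyell.panasas.com.61897 rack-bsd2.2781 57920 0 33304 0 CLOSE_WAIT
lyell.panasas.com.61897 rack-bsd2.2783 57920 0 33304 0 CLOSE_WAIT
lyell.panasas.com.61897 kinsman.1984 57920 0 33304 0 CLOSE_WAIT
"""

if __name__ == "__main__":
    for host, n in close_wait_by_peer(SAMPLE, 61897).most_common():
        print(host, n)
```

A socket sits in CLOSE_WAIT when the remote end has closed but the local process has not yet closed its side, so a count that keeps growing for one port suggests the daemon behind that port has stopped servicing its sockets. Feeding this function the live output of "netstat -a" from a cron job would be one way to spot the problem before users hit the three-minute timeout.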