Steve Lammert slammert@panasas.com
Mon, 28 Apr 2003 12:09:34 -0700 (PDT)
We are using Conserver 7.2.7 to serve about 900 console lines on 25
Cyclades TS terminal concentrators, from a Sun Ultra 5 running Solaris 8.
We had no problems at levels of 500-600 lines, but the recent expansion
to 900 lines appears to have led to the following interesting behavior:

After the server has been up for ten days or so, a few users begin
experiencing timeouts when connecting to a small number of console
lines, viz:

--------------------------------------------------------
myhost$ console beta-15-1
< --- Three minutes of silence --- >
console: connect: 61897@conserver: Connection timed out
--------------------------------------------------------

Logging into the Conserver server, I notice a number of connections to
port 61897 in CLOSE_WAIT state. These entries tend to hang around for a
LONG time (e.g. days):

--------------------------------------------------------------------
conserver# netstat -a|grep 61897
*.61897                  *.*                 0  0 32768  0 LISTEN
lyell.panasas.com.61897  kinsman.2458        1  0 33304  0 ESTABLISHED
lyell.panasas.com.61897  build-bsd6.1851 57920  0 33304  0 CLOSE_WAIT
lyell.panasas.com.61897  build-bsd6.1855 57920  0 33304  0 CLOSE_WAIT
lyell.panasas.com.61897  build-bsd6.1863 57920  0 33304  0 CLOSE_WAIT
lyell.panasas.com.61897  rack-bsd2.2776  57920  0 33304  0 CLOSE_WAIT
lyell.panasas.com.61897  rack-bsd2.2778  57920  0 33304  0 CLOSE_WAIT
lyell.panasas.com.61897  rack-bsd2.2781  57920  0 33304  0 CLOSE_WAIT
lyell.panasas.com.61897  rack-bsd2.2783  57920  0 33304  0 CLOSE_WAIT
lyell.panasas.com.61897  kinsman.1984    57920  0 33304  0 CLOSE_WAIT
--------------------------------------------------------------------

One also sees timeouts when using commands such as "console -x"... the
list of connections pauses at a certain point, and eventually times out.

It seems likely that a single Conserver daemon (out of the 55 or so that
are spawned to handle 900 lines) is being affected. Restarting Conserver
is sometimes (but not always) effective in clearing this up.
In many cases, though, the only solution is to reboot the server.

I had previously bumped up certain values in /etc/system (e.g.
"maxusers", "tcp:tcp_conn_hash_size") to better handle the large number
of connections to Conserver, and I'm also planning to install the latest
Solaris patch cluster, in case this is a Solaris TCP/IP issue...

... but I thought I ought to ask the List as well, in case others have
seen this before.

TIA,
S
--
-- steve lammert          software engineer    voice: +1-412-323-3500
   slammert@panasas.com   panasas, inc           fax: +1-412-323-3511
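
The CLOSE_WAIT check described in the post could be scripted, so that a
buildup on one daemon's port is noticed before users start timing out.
What follows is only a minimal Python sketch. It assumes the seven-column
Solaris "netstat -a" layout shown in the post (local address, remote
address, four numeric columns, state) and addresses written as host.port.
The names count_states and close_wait_peers are hypothetical helpers, not
part of Conserver or Solaris, and SAMPLE is a shortened copy of the
output quoted in the post.

```python
from collections import Counter

# Shortened copy of the netstat output quoted in the post.
SAMPLE = """\
*.61897 *.* 0 0 32768 0 LISTEN
lyell.panasas.com.61897 kinsman.2458 1 0 33304 0 ESTABLISHED
lyell.panasas.com.61897 build-bsd6.1851 57920 0 33304 0 CLOSE_WAIT
lyell.panasas.com.61897 build-bsd6.1855 57920 0 33304 0 CLOSE_WAIT
lyell.panasas.com.61897 rack-bsd2.2776 57920 0 33304 0 CLOSE_WAIT
"""


def _rows_for_port(netstat_text, port):
    """Yield (remote, state) for rows whose local port equals `port`."""
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed layout: local remote swind send-q rwind recv-q state
        if len(fields) != 7:
            continue
        local, remote, state = fields[0], fields[1], fields[6]
        # Addresses are printed as host.port, so the port follows the last dot.
        if local.rsplit(".", 1)[-1] == str(port):
            yield remote, state


def count_states(netstat_text, port):
    """Tally TCP states (LISTEN, CLOSE_WAIT, ...) for one local port."""
    return Counter(state for _, state in _rows_for_port(netstat_text, port))


def close_wait_peers(netstat_text, port):
    """Count CLOSE_WAIT sockets per remote host for one local port."""
    return Counter(
        remote.rsplit(".", 1)[0]
        for remote, state in _rows_for_port(netstat_text, port)
        if state == "CLOSE_WAIT"
    )


if __name__ == "__main__":
    print(count_states(SAMPLE, 61897))
    print(close_wait_peers(SAMPLE, 61897))
```

Feeding it the live output of "netstat -a" from a periodic job, and
alerting once the CLOSE_WAIT count for a port stays above some threshold,
would at least show which client hosts are leaving sockets behind. The
threshold and the alerting mechanism are left open here.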