Timeout connecting to server

Mon, 28 Apr 2003 12:09:34 -0700 (PDT) · Steve Lammert

We are using Conserver 7.2.7 to serve about 900 console lines on 25 
Cyclades TS terminal concentrators, from a Sun Ultra 5 running Solaris 
8.  We had no problems at levels of 500-600 lines, but the recent 
expansion to 900 lines appears to have led to the following interesting 
behavior:

After the server has been up for ten days or so, a few users begin 
experiencing timeouts when connecting to a small number of console 
lines, viz:

--------------------------------------------------------
myhost$ console beta-15-1
< --- Three minutes of silence --- >
console: connect: 61897@conserver: Connection timed out
--------------------------------------------------------
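To check a suspect port without waiting out the full three minutes, a short-timeout probe along the following lines may be handy. This is only a sketch: the function name `probe` and the 10-second default are illustrative choices, not part of Conserver, and it tests nothing beyond whether the TCP handshake completes (it does not speak the Conserver protocol).

```python
import socket


def probe(host, port, timeout=10.0):
    """Try a plain TCP connect and report how it ended.

    Hypothetical helper for illustration only; it checks the TCP
    handshake and nothing else.
    """
    try:
        # create_connection resolves the host and applies the timeout
        # to the connect attempt itself.
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # No answer within `timeout` seconds.
        return "timed out"
    except OSError as exc:
        # Refused, unreachable, name lookup failure, and so on.
        return "failed: %s" % exc
```

Calling `probe("conserver", 61897)` with the host and port from the error message above should then distinguish a port that is not answering from one that is refusing connections outright.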

Logging into the Conserver server, I notice a number of connections to 
port 61897 in CLOSE_WAIT state.  These entries tend to hang around for a 
LONG time (e.g. days):

--------------------------------------------------------------------
conserver# netstat -a|grep 61897
*.61897              *.*                0      0 32768  0 LISTEN
lyell.panasas.com.61897 kinsman.2458    1      0 33304  0 ESTABLISHED
lyell.panasas.com.61897 build-bsd6.1851 57920  0 33304  0 CLOSE_WAIT
lyell.panasas.com.61897 build-bsd6.1855 57920  0 33304  0 CLOSE_WAIT
lyell.panasas.com.61897 build-bsd6.1863 57920  0 33304  0 CLOSE_WAIT
lyell.panasas.com.61897 rack-bsd2.2776 57920   0 33304  0 CLOSE_WAIT
lyell.panasas.com.61897 rack-bsd2.2778 57920   0 33304  0 CLOSE_WAIT
lyell.panasas.com.61897 rack-bsd2.2781 57920   0 33304  0 CLOSE_WAIT
lyell.panasas.com.61897 rack-bsd2.2783 57920   0 33304  0 CLOSE_WAIT
lyell.panasas.com.61897 kinsman.1984 57920     0 33304  0 CLOSE_WAIT
--------------------------------------------------------------------
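(CLOSE_WAIT means the remote end has closed its side of the connection but the local process has not yet closed its own socket.)  To track how these pile up over time, the listing can be tallied by client host and state.  The following is a rough sketch, assuming the column layout shown above (local address first, remote address second, state last); the function name `summarize` is made up for illustration.

```python
from collections import Counter


def summarize(netstat_text, port="61897"):
    """Count connections to `port` by (client host, TCP state).

    Assumes lines shaped like the netstat output above: local address
    in the first column, remote address in the second, state in the last.
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank line, ruler, or header fragment
        local, remote, state = fields[0], fields[1], fields[-1]
        if not local.endswith("." + port):
            continue  # some other port
        # "build-bsd6.1851" -> "build-bsd6"; "*.*" -> "*"
        peer = remote.rsplit(".", 1)[0]
        counts[(peer, state)] += 1
    return counts
```

Feeding it the text of "netstat -a" (for example via `summarize(sys.stdin.read())`) at intervals should show whether the CLOSE_WAIT entries keep accumulating from the same few clients.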

One also sees timeouts when using commands such as "console -x": the 
list of connections pauses at a certain point and eventually times out.  
It seems likely that a single Conserver daemon (out of the 55 or so 
that are spawned to handle 900 lines) is being affected.

Restarting Conserver is sometimes (but not always) effective in clearing 
this up.  In many cases, though, the only solution is to reboot the server.

I had previously bumped up certain values in /etc/system (e.g. 
"maxusers", "tcp:tcp_conn_hash_size") to better handle the large number 
of connections to Conserver, and I'm also planning to install the latest 
Solaris patch cluster, in case this is a Solaris TCP/IP issue...

... but I thought I ought to ask the List as well, in case others have 
seen this before.

TIA,
S

-- 
steve lammert     software engineer   voice: +1-412-323-3500
slammert@panasas.com   panasas, inc     fax: +1-412-323-3511