David Harris zonker@certaintysolutions.com
Wed, 28 Nov 2001 09:19:23 -0800 (PST)
I've seen a similar scenario in one particular lab, using a Cisco 3640 with NM-32A cards. I don't think this is a brand issue; I merely offer it as another clue....

When we see the failure, we typically see 8 ports in a group go down...all 8 in a modulo-8 group (i.e. 1-8, 17-24, etc.). All of the affected lines are run by the same OCTART chip. While I could point to a failure in IOS for this (which would only be circumstantial and unsupported by fact), I actually have another working theory, based on looking at the devices attached...

In these cases, there was usually a network interruption between the conserver and the console server. This could be a switch/router failure in the network, or a forced reboot of the conserver host without a polite shutdown...and the devices showing 'down' were what I call 'quiet hosts'. (A quiet host is a device that only replies when you talk to it...it doesn't usually offer any log traffic, time stamps, etc. to the logs unless someone is typing to it.)

In the case of a network break like this, the TCP sessions to all of the ports (from Conserver to the console server) don't get cleared out when the connectivity failure occurs! Since the host doesn't generate any traffic on the serial port, the console server never tries to send traffic to the conserver host, and it leaves the session open, thinking that the conserver host is just idle. The root cause here is that the TCP FIN sequence never occurred. So, when you restart your Conserver and it tries to connect to these ports on the console server, the console server tells the conserver that the TCP port is busy (since it still thinks the old session is there and idle...)

In these cases, our cure has been to log into the console server and reset each affected line, one by one. This will blow away the (already broken) TCP session, and allow you to either restart your conserver, or just force open each of the lines that were down.
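To make the theory above concrete, here is a toy model of one console-server line, written as a sketch under stated assumptions. The class and method names (`ConsoleServerLine`, `client_vanishes`, `serial_output`, and so on) are hypothetical and do not correspond to IOS or conserver code. It also simplifies one thing loudly: when a chatty device prints output toward a dead peer, real TCP only gives up after an RST from the rebooted host or a retransmission timeout, whereas here that teardown happens immediately. The point it shows is that a session abandoned without a FIN keeps the line "busy" for a quiet host until someone resets the line.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    """A TCP session from a conserver host to one console-server line."""
    client: str
    peer_alive: bool = True  # False once the client vanished without a FIN


class ConsoleServerLine:
    """Toy model of one async line on the console server (hypothetical)."""

    def __init__(self, number: int) -> None:
        self.number = number
        self.session: Optional[Session] = None

    def connect(self, client: str) -> bool:
        """Accept a new session only if the line is free; False means 'busy'."""
        if self.session is not None:
            return False
        self.session = Session(client)
        return True

    def close(self) -> None:
        """Polite shutdown: the client sends a FIN and the line is freed."""
        self.session = None

    def client_vanishes(self) -> None:
        """Network break or forced reboot: no FIN, so the session lingers."""
        if self.session is not None:
            self.session.peer_alive = False

    def serial_output(self) -> None:
        """The attached device prints something and the server forwards it.

        Assumption: writing to a dead peer fails (RST or retransmission
        timeout in real TCP); modeled here as an immediate teardown.
        """
        if self.session is not None and not self.session.peer_alive:
            self.session = None

    def reset(self) -> None:
        """Administrator resets the line, discarding any session."""
        self.session = None


if __name__ == "__main__":
    quiet = ConsoleServerLine(17)
    quiet.connect("conserver")
    quiet.client_vanishes()             # conserver host rebooted, no FIN sent
    print(quiet.connect("conserver"))   # False: line still looks busy
    quiet.reset()                       # the manual cure
    print(quiet.connect("conserver"))   # True

    chatty = ConsoleServerLine(1)
    chatty.connect("conserver")
    chatty.client_vanishes()
    chatty.serial_output()              # device traffic exposes the dead peer
    print(chatty.connect("conserver"))  # True
```

The contrast between the `quiet` and `chatty` lines mirrors the observation that only quiet hosts stay stuck: a device that generates serial traffic forces the console server to write to the broken session, which eventually clears it.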
While this doesn't happen too often in the data centers, I have seen this in some of the remote locations. Maybe that's another good argument for having a distributed Conserver deployment, and putting a logging host 'closer' to the console servers? :-)

Regards,
-Z-

http://www.conserver.com/consoles/breakoff.html