RE: LISA 2005 attendees?

Fri, 18 Nov 2005 14:59:10 -0800 (PST) · Zonker Harris

On Fri, 2005-11-18 at 14:38 -0800, Zonker Harris wrote:
> and never
> learned about crash carts, didn't have to wrestle
> with serial settings, didn't have to walk or drive
> to other sites when they wanted to check...
> 

I worked for VARs that did this, so I got to see many data centers.

Two years ago I went to a data center that has approx. 10,000 Rack Saver
blades.  It is a Linux cluster.  Each rack had a Dell switch that served
the rack.  Sometimes the Dell switch would go stupid.  I was there when
it happened.  Seven engineers were huddled around a crash cart that had
a PC for serial console access.  Each one of them had to leave their
location and spend time solving these issues.  Only one of them could
type, and it was hard for all seven to see the screen because it was so
crowded.  Normally when this happens they telnet to the switches, but
with the clusters the switches are so busy processing packets that they
can no longer accept telnet connections.  At that point the only access
was serial console.  They might have had only a couple hundred of these
switches managing all these blades.

I could start a flame war on this list about how I feel about Dell, but
if they wanted to run those critical switches headless, then they should
have at least picked Cisco.  This cluster processed geological data, so
every second a node was down was money lost.  One of the main goals of
this cluster was to find oil.

Now their competitor only had 2,000 nodes of Dell servers.  Each node
was attached via console management.  Each switch.  Everything.  They
had also written utilities to interact with all the consoles that were
in the system.  Some people see the vision, some don't.  Once you do, you
never go back.