Christopher Fowler cfowler@outpostsentinel.com
Fri, 18 Nov 2005 14:59:10 -0800 (PST)
On Fri, 2005-11-18 at 14:38 -0800, Zonker Harris wrote:
> and never
> learned about crash carts, didn't have to wrestle
> with serial settings, didn't have to walk or drive
> to other sites when they wanted to check...
>

I worked for VARs that did this, so I got to see many data centers.

Two years ago I went to a data center that had approximately 10,000 Rack Saver blades. It was a Linux cluster. Each rack had a Dell switch that served that rack. Sometimes the Dell switch would go stupid, and I was there when it happened. Seven engineers were huddled around a crash cart that had a PC for serial console access. Each one of them had to leave their location and spend time solving these issues. Only one of them could type, and it was so crowded that it was hard for all seven to see the screen.

Normally when this happens they telnet to the switches, but with the clusters the switches are so busy processing packets that they can no longer accept telnet connections. At that point the only access was the serial console. They might have had only a couple hundred of these switches managing all these blades. I could start a flame war on this list about how I feel about Dell, but if they wanted to run those critical switches headless then they should have at least picked Cisco.

This cluster processed geological data, so every second a node was down was money lost. One of the main goals of this cluster was to find oil.

Now, their competitor had only 2,000 nodes of Dell servers. Each node was attached via console management. Each switch. Everything. They had also written utilities to interact with all the consoles in the system.

Some people see the vision, some don't. Once you do, you never go back.