HPCwire

The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing


University of Tennessee Optimizes Cluster Management



CHATTANOOGA, Tenn., Aug. 20 -- The University of Tennessee at Chattanooga (UTC) SimCenter has implemented the Intelligent Platform Management Interface (IPMI), reducing its operational costs by about $60,000 per year. The SimCenter, a computational engineering research and education center, relies on a high-performance scientific computing cluster. Avocent IPMI technology, pre-integrated in most of the SimCenter's servers, lets IT staff access information about system components, control power, and monitor overall hardware health remotely and more quickly, all from a single interface, which increases server availability for the center's users.

UTC Systems Administrator Wally Edmondson had been spending much of his time on routine management tasks: powering servers on and off, maintaining temperature stability, and viewing boot and OS console screens to troubleshoot server errors within the SimCenter. IPMI changed that routine for the better.

"For anyone not using IPMI, they don't know what they are missing," said Edmondson. "There's a huge time saver you have already paid for just sitting under the covers of your cluster. Furthermore, using this agentless management approach responds to our need for expansion in that the amount of time to manage the cluster without IPMI might be so great that it would prevent us from expanding without hiring more people."

Frustrated by the tedious, time-consuming work of managing the cluster by hand, Edmondson set out to learn more about IPMI, which he discovered Avocent was pre-integrating into his Dell servers. The SimCenter's cluster strategy began in the fall of 2004 with a 33-node Microtronix Intel cluster; since then, the center has added 508 Dell PowerEdge 1850 servers and PowerEdge 1855 blade servers running Red Hat Linux 8.0. Server clusters offer high performance, scalability and reliability, but they can be very complex to manage. Keeping the cluster available was critical, because faculty, students, Ph.D. candidates, researchers and off-site customers all depend on its substantial computing power to conduct their research.

"When we first implemented the cluster, I had heard of IPMI but did not know about its features," added Edmondson. "I used to have to physically inspect each server, make a list on a piece of paper of which servers needed attention, and then return to my office and dispatch them one way or another. I spent a significant amount of time doing a lot of power cycling using the power buttons before tapping into IPMI's power. Now, in seconds, I can look at my monitor and identify and resolve any issues from my desk."

With IPMI, Edmondson now has a common interface for reading environmental sensors, controlling chassis power, viewing boot and Linux OS console screens, identifying systems, and analyzing system event logs. By periodically reading temperature, voltage and fan sensors, he can quickly identify fluctuations that might lead to rack hotspots, insights that help determine optimal rack configurations within the UTC SimCenter.
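As a rough illustration of the periodic sensor checks described above, the sketch below flags temperature readings that jump sharply away from a node's recent average. It assumes the readings have already been collected, for example by an IPMI client polling each node's BMC at a fixed interval; the function name, node names, sample values, three-sample window and 5 °C threshold are all hypothetical and are not values used at the SimCenter.

```python
from collections import deque
from statistics import mean


def find_fluctuations(readings, window=3, threshold=5.0):
    """Return (node, index, value, baseline) for each reading that differs
    from the mean of the previous `window` readings by more than `threshold`.

    `readings` maps a node name to a list of temperature samples (deg C)
    taken at regular intervals. Window and threshold are illustrative only.
    """
    alerts = []
    for node, samples in readings.items():
        recent = deque(maxlen=window)  # sliding window of prior samples
        for i, value in enumerate(samples):
            # Only compare once the window holds a full baseline.
            if len(recent) == window:
                baseline = mean(recent)
                if abs(value - baseline) > threshold:
                    alerts.append((node, i, value, baseline))
            recent.append(value)
    return alerts


# Hypothetical inlet-temperature samples for two nodes in one rack.
samples = {
    "node01": [22.0, 22.5, 22.0, 22.5, 23.0],
    "node02": [23.0, 23.5, 23.0, 30.0, 31.0],
}

for node, i, value, baseline in find_fluctuations(samples):
    print(f"{node}: sample {i} = {value:.1f} C (recent mean {baseline:.1f} C)")
```

A moving-average baseline is only one simple choice; a real deployment would more likely compare readings against the thresholds the BMC itself reports for each sensor, or correlate alerts by rack position to locate a hotspot.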

IPMI was created by the IPMI Forum in 1998. It is an industry-wide management initiative that today includes more than 180 vendors, among them AMD, Avocent, Dell, HP, IBM, Intel, Microsoft and Sun. These vendors work together to continually update and implement this open hardware management standard for servers and other systems such as storage, network and telecommunications equipment. IPMI 2.0, the third major release, brings enhancements to authentication and encryption, Serial over LAN (SoL), Virtual LAN (VLAN) and blade support, among other areas. Because IPMI is an open and flexible standard, it can be supported across tower, pedestal, rack and blade servers, regardless of hardware vendor or operating system. And because it is pre-integrated in the device, it requires no separately purchased management agent, an approach often described as agentless.

Because IPMI runs on a stand-alone chip, sometimes called a Baseboard Management Controller (BMC) or service processor, that operates independently of the OS, BIOS and CPU, IPMI access remains available even when the operating system is unresponsive. This complements agent-based management tools, which stop working when the OS crashes; using agent-based and agentless approaches together closes that operational gap. Avocent works with Dell and other leading original equipment manufacturers (OEMs) to pre-integrate IPMI capabilities into their server product lines. Recently, Avocent reached a significant milestone: approximately one server containing Avocent agentless management firmware is now purchased every 15 seconds.
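Because the BMC answers on the network even when the OS has hung, an administrator can check or cycle a node's power from a desk instead of pressing its power button. The sketch below builds such a request for ipmitool, a widely used open-source IPMI client; the article does not say which client the SimCenter uses, so the choice of ipmitool is an assumption, as are the hypothetical function names, BMC host name and account. It uses the IPMI 2.0 `lanplus` interface and reads the password from the `IPMI_PASSWORD` environment variable (`-E`) so it never appears on the command line. The BMC's LAN interface must already be configured for remote access.

```python
import subprocess

# A subset of the actions accepted by ipmitool's "chassis power" command.
POWER_ACTIONS = {"status", "on", "off", "cycle", "reset"}


def ipmi_power_command(bmc_host, user, action="status"):
    """Build an ipmitool command that talks to a node's BMC over the network.

    -I lanplus selects the IPMI 2.0 LAN interface; -E tells ipmitool to read
    the password from the IPMI_PASSWORD environment variable.
    """
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action}")
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user,
            "-E", "chassis", "power", action]


def run_power_command(bmc_host, user, action="status"):
    """Run the command and return ipmitool's output.

    Requires ipmitool to be installed and IPMI_PASSWORD to be set; raises
    subprocess.CalledProcessError if ipmitool exits with an error.
    """
    result = subprocess.run(ipmi_power_command(bmc_host, user, action),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Hypothetical BMC address and account; this only prints the command.
    print(" ".join(ipmi_power_command("node042-bmc.example", "admin", "cycle")))
```

Keeping command construction separate from execution makes it easy to loop over a list of nodes, or to log exactly what would be sent before power-cycling anything.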

"Our embedded IPMI is a valuable component in Avocent's broad set of management solutions," added Dave Perry, executive vice president, Avocent. "By complementing out-of-band management with in-band software for inventory, provisioning and security, customers can expect cost savings managing complex clusters and data centers."

Since Edmondson discovered the benefits of IPMI, his productivity has improved: he no longer spends time walking to the server room to check physically for amber alert lights. Now he can quickly identify which server needs attention and troubleshoot the problem without leaving his desk.
