HPCwire


University of Tennessee Optimizes Cluster Management



CHATTANOOGA, Tenn., Aug. 20 -- The University of Tennessee at Chattanooga (UTC) SimCenter has implemented the Intelligent Platform Management Interface (IPMI), reducing its operational costs by about $60,000 per year. The UTC SimCenter, a computational engineering research and education center, runs on a high-performance scientific supercomputing server cluster. Avocent IPMI technology, pre-integrated within the majority of SimCenter servers, lets IT staff access information about system components more rapidly, manage power control, and monitor overall system hardware health remotely from a single interface, increasing server availability for the center's clients.

UTC Systems Administrator Wally Edmondson was spending much of his time on management tasks within the SimCenter, such as powering servers on and off, maintaining temperature stability, and viewing boot and OS console screens to troubleshoot server errors. Using IPMI changed this routine for the better.

"For anyone not using IPMI, they don't know what they are missing," said Edmondson. "There's a huge time saver you have already paid for just sitting under the covers of your cluster. Furthermore, using this agentless management approach responds to our need for expansion in that the amount of time to manage the cluster without IPMI might be so great that it would prevent us from expanding without hiring more people."

Frustrated by the time-consuming and tedious process of managing the cluster manually, Edmondson set out to learn more about IPMI, a technology that he discovered Avocent was pre-integrating within his Dell servers. Under a strategy that began in the fall of 2004 with a 33-node Microtronix Intel cluster, the UTC SimCenter has since added 508 Dell PowerEdge 1850 servers and PowerEdge 1855 Blade servers running Red Hat Linux 8.0. Although server clusters offer high performance, scalability and reliability, managing them can be very complex. Maintaining cluster availability was critical because faculty, students, Ph.D. candidates, researchers and off-site customers all depended on the cluster's computational power to conduct their research.

"When we first implemented the cluster, I had heard of IPMI but did not know about its features," added Edmondson. "I used to have to physically inspect each server, make a list on a piece of paper to which servers needed attention, and then return to my office and dispatch them one way or another. I spent a significant amount of time doing a lot of power cycling using the power buttons before tapping into IPMI's power. Now, in seconds, I can look at my monitor and identify and resolve any issues from my desk."

Using IPMI, Edmondson now has a common interface for reading environmental sensors, controlling chassis power, viewing boot and Linux OS console screens, identifying systems and analyzing system event logs. By periodically polling temperature, voltage and fan readings, he can quickly identify fluctuations that might lead to rack hotspots -- insights that can help determine optimal rack configurations within the UTC SimCenter.
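The periodic sensor check described above can be sketched in a few lines of Python. This is only an illustration, not the SimCenter's actual tooling: the pipe-delimited row layout is assumed to match what a command such as `ipmitool sdr type Temperature` prints, the node names and readings are hypothetical, and the 5-degree margin over the rack median is an arbitrary threshold chosen for the example.

```python
from statistics import median


def parse_temperature(row):
    """Return (sensor_name, degrees_c) for one sensor row, or None if unreadable."""
    fields = [field.strip() for field in row.split("|")]
    if len(fields) < 5:
        return None
    reading = fields[4].split()
    # Sensors that report "Disabled" or "No Reading" carry no numeric value.
    if len(reading) != 3 or reading[1:] != ["degrees", "C"]:
        return None
    try:
        return fields[0], float(reading[0])
    except ValueError:
        return None


def find_hotspots(latest_by_node, margin_c=5.0):
    """Return nodes whose reading exceeds the median across nodes by more than margin_c."""
    if not latest_by_node:
        return []
    baseline = median(latest_by_node.values())
    return sorted(
        node for node, temp in latest_by_node.items() if temp - baseline > margin_c
    )


# Hypothetical rows for four nodes in one rack (illustrative, not measured data).
sample_rows = {
    "node01": "Ambient Temp     | 0Eh | ok  |  7.1 | 22 degrees C",
    "node02": "Ambient Temp     | 0Eh | ok  |  7.1 | 23 degrees C",
    "node03": "Ambient Temp     | 0Eh | ok  |  7.1 | 31 degrees C",
    "node04": "Ambient Temp     | 0Eh | ns  |  7.1 | Disabled",
}

latest = {}
for node, row in sample_rows.items():
    parsed = parse_temperature(row)
    if parsed is not None:
        latest[node] = parsed[1]

print(find_hotspots(latest))
```

With these sample rows, node04 is skipped because it reports no reading, and node03 is flagged because it sits 8 degrees above the median of the remaining nodes. Comparing against the rack median rather than a fixed limit is one way to spot a node that runs hot relative to its neighbors; a production check would likely track voltage and fan speed as well.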

IPMI was created by the IPMI forum in 1998. It is an industry-wide management initiative that is today backed by more than 180 vendors, including AMD, Avocent, Dell, HP, IBM, Intel, Microsoft and Sun. These vendors work together to continually update and implement this open hardware management standard for servers and other systems such as storage, network and telecommunications equipment. Now in its third major release, IPMI 2.0 includes enhancements to authentication and encryption, Serial over LAN (SoL), Virtual LAN (VLAN) and blade support, among others. An important characteristic of IPMI is that it is an open and flexible standard that can be supported across tower, pedestal, rack and blade servers, irrespective of the hardware vendor or OS used. And because it is pre-integrated within the device, it does not require the purchase of any extra management agents -- an approach frequently described as agentless.

Because IPMI runs on a stand-alone chip (sometimes called a BMC, for Baseboard Management Controller, or a Service Processor) independent of the OS, BIOS and CPU, it remains accessible even when the operating system is unresponsive. This capability complements existing agent-based management approaches, which fail when an OS crashes; using agent-based and agentless approaches together closes those operational gaps. Avocent works with Dell and other leading original equipment manufacturers (OEMs) to pre-integrate IPMI capabilities into server product lines. Recently, Avocent reached a significant milestone, with approximately one server containing Avocent agentless management firmware purchased every 15 seconds.
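To illustrate what out-of-band access looks like in practice, the following Python sketch builds command lines for the open-source `ipmitool` utility, which talks to the BMC over the network rather than to the host operating system. The article does not say which tool the SimCenter uses, so `ipmitool` is an assumption here, and the BMC hostname and account name are hypothetical.

```python
import subprocess


def ipmi_command(bmc_host, user, *subcommand):
    """Build an ipmitool argument list that targets a node's BMC over the LAN."""
    if not subcommand:
        raise ValueError("an ipmitool subcommand is required")
    return [
        "ipmitool",
        "-I", "lanplus",  # IPMI 2.0 LAN interface
        "-H", bmc_host,   # address of the BMC, not of the host OS
        "-U", user,
        "-E",             # read the password from the IPMI_PASSWORD environment variable
        *subcommand,
    ]


def run_ipmi(bmc_host, user, *subcommand, timeout_s=30):
    """Run the command and return its output (needs ipmitool and a reachable BMC)."""
    result = subprocess.run(
        ipmi_command(bmc_host, user, *subcommand),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout_s,
    )
    return result.stdout


# Hypothetical BMC address and account name, for illustration only.
BMC = "node042-bmc.example.edu"
status_cmd = ipmi_command(BMC, "admin", "chassis", "power", "status")
cycle_cmd = ipmi_command(BMC, "admin", "chassis", "power", "cycle")
events_cmd = ipmi_command(BMC, "admin", "sel", "list")

print(" ".join(status_cmd))
```

Because the request is answered by the BMC, a call such as `run_ipmi(BMC, "admin", "chassis", "power", "cycle")` could restart a node whose operating system has hung, and `sel list` could retrieve its hardware event log afterward. Building the argument list separately from running it keeps the command easy to inspect or log before anything is sent to the hardware.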

"Our embedded IPMI is a valuable component in Avocent's broad set of management solutions," added Dave Perry, executive vice president, Avocent. "By complementing out-of-band management with in-band software for inventory, provisioning and security, customers can expect cost savings managing complex clusters and data centers."

Since Edmondson discovered the benefits of IPMI, productivity has improved: he no longer spends his time walking to the server room and physically checking for amber alert lights. He can now rapidly identify which server needs attention and troubleshoot the problem without leaving his desk.

