HPCwire

Cluster Adoption Phases


By Rick Friedman, Director of Marketing, Scali Inc.

If you're reading this article, the idea that cluster adoption is increasing will come as no surprise to you. The idea of linking low-cost commodity computers together and leveraging a low-cost operating system has been around for a number of years now. The ability to cost-effectively leverage high performance computing changes the way people do their jobs and changes the way leaders plan their projects. The use of these types of systems has grown from small projects inside labs, universities, and basements into full-fledged, validated enterprise tools. Scali, a company making cluster management and MPI library software tools, has been involved in Linux-based cluster computing since 1997. Over the past nine years, we've noticed a pattern in how clusters are brought into organizations, particularly commercial organizations, and how their usage and impact change over time. This article discusses those phases and how laying the proper groundwork can significantly increase the likelihood of success when transitioning to the next phase.

One of the interesting aspects of seeing how cluster adoption has spread, particularly within a single organization over time, is how many users claim that if they knew then what they know now, they would have done things differently. In general, many users comment that they focused very heavily on the processor count and performance aspects of their initial systems, but very quickly found that things like management, expansion, networks, operating systems, and applications ended up being the key elements that affected actual user success.

As clusters become more mainstream, we see that the challenges of deploying, managing and maintaining these environments have become increasingly complex and require comprehensive tools, such as Scali Manage, to provide assistance in making these clusters as efficient as possible.

Phase One: Project/Evaluation

Typically, we see that most commercial organizations bring their initial clusters in as single-purpose clusters. These clusters are usually focused on specific tasks or specific projects. They consist of homogeneous hardware and operating environments that are not expected to change much from initial specifications.

These clusters are often brought into the organization by the project teams themselves and are not directly tied to the general IT environment. This implies that support and ongoing maintenance are also the responsibility of the project team, with all the benefits and risks that such control brings. The clusters are typically deployed to provide performance and capabilities to teams that have specific project goals to meet. They usually have a single application or a limited number of applications, and a limited user base. The primary criterion of success for these types of clusters is that they get up and running quickly and contribute to completing the project.

From a management perspective, these users need solutions that can deploy their clusters quickly, leveraging best practices to ensure success. From an ongoing perspective, these environments need an easy-to-use solution for keeping their systems running and for handling any updates to the application environment, while providing basic feedback on the efficiency and effectiveness of the cluster.

Phase Two: Distribution

Once projects experience some success with clusters, that success breeds additional interest in the use of the cluster -- and of clusters in general -- by other projects within the organization.

At this point, the initial cluster environment starts to grow in all dimensions from its original design and purpose. As this occurs, the number of nodes increases, the mix of hardware changes from a purely homogeneous to a heterogeneous environment, the number of applications increases and the number of users expands.

Within this phase, users start to rely more heavily on the cluster for productive work, while at the same time the ongoing support demands and the complexity of the cluster are growing. At this phase, the criteria for success become delivering reliable, ongoing performance to the project teams and users of the system, while maintaining the flexibility to respond quickly to changes and required additions.

The management challenge at this phase is being able to manage the increasing diversity of resources, increasing utilization rates, and the total life cycle of the cluster, including additional hardware deployments, operating system and application updates and upgrades, performance optimization, and planning for additional usage. Being able to rapidly update or modify the environment to meet the increasing expectations and requirements placed on the cluster becomes critical.

It is interesting to note that some organizations enter the "distribution phase" of the cluster adoption lifecycle and struggle to stay successful. They tend to focus purely on the hardware implementation aspects of their cluster deployment and fail to plan and prepare adequately for the challenges of their evolving environment. These organizations don't optimize this phase and continue to struggle or return to phase one, starting over again with a new cluster implementation.

The successful companies grow their environments with increasing, not decreasing, efficiency, finding that with the use of adequate management tools and processes, they can increase their responsiveness and ability to adapt even while the total number of nodes within the environment grows. Success at this stage ensures that the investment made by the organization in clustering technology continues to scale, and scale effectively.

Phase Three: Centralization

For those organizations that continue to expand their use of clusters, the next phase, centralization, is defined by the realization that there are a number of clusters operating throughout the organization. They realize that by optimizing the environment across all clusters, the organization as a whole can leverage these resources more effectively. This represents the transition from being project-driven to corporate-driven where resources are deployed and managed in a coordinated fashion, optimizing investment, time to production, and cross-organization efficiency.

At this point, the challenge for the organization is centralizing the deployment, access, and availability of the cluster resources. The clusters are being used more dynamically, with jobs and applications being moved regularly across a heterogeneous environment. Part of this effort often includes developing clear levels of expected service and performance for the end users and projects. This environment is increasingly dynamic as the clusters become a generalized resource for users, and business requirements drive the expectations of the environment.

This stage further increases the complexity of the environment due to the increasing number of less technical users. These users tend to view the cluster environment as simply a processing facility and expect consistency, performance, and flexibility without regard for the underlying infrastructure they are using. Additionally, they are often from diverse groups with different application requirements, performance expectations and needs -- all of which expect responsive service. At the same time that the users are more demanding, the system administrators are increasingly generalists, covering a range of environments.

At this point, the key management challenges are about assuring rapid deployment and responsiveness to business requirements, while having a single point of management across all the clusters to assure consistency and allow for efficient ongoing management. Given the range of abilities of the teams available to manage the clusters, the ease of use and broad, simple functionality of the management environment become critical. Additionally, as the cluster resources become part of the organization's infrastructure, the management of those resources needs to become part of the overall IT management ecosystem.

Phase Four: Utility/Grid

Ideally, the clustering resources eventually become "hidden" resources to users, who simply submit jobs without regard to infrastructure, location, or workload and are concerned only with the performance they receive. This stage requires policy-based automation tied to specific Service Level Agreements (SLAs) based on applications. We are just starting to see some initial steps in this area of using and managing clusters.

One of the key implications of this phase, which is often overlooked in simple "grid" strategies, is that not all resources within an environment are equal, and understanding the underlying hardware characteristics can have large implications for how SLAs can be met. With the proper management infrastructure in place that understands the relative, real-time capabilities and capacities of the various technologies within an organization, a robust set of SLAs can be created and managed, assuring that user jobs are run in the most efficient manner on the most effective platforms.
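To make the point about unequal resources concrete, here is a minimal Python sketch of SLA-aware job placement. It is an illustration under stated assumptions, not a description of how Scali Manage or any particular scheduler works: the `Node` and `Job` types, the `relative_speed` metric, and the "slowest node that still meets the SLA" policy are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    relative_speed: float  # hypothetical metric: throughput relative to a baseline node
    free_cores: int        # cores currently available on this node


@dataclass
class Job:
    name: str
    cores: int             # cores the job needs
    baseline_hours: float  # estimated runtime on a baseline (speed 1.0) node
    sla_hours: float       # maximum runtime promised to the user


def estimated_hours(job: Job, node: Node) -> float:
    """Scale the baseline runtime by the node's relative speed."""
    return job.baseline_hours / node.relative_speed


def place(job: Job, nodes: List[Node]) -> Optional[Node]:
    """Return a node that can meet the job's SLA, or None if none can.

    Assumed policy: among nodes with enough free cores that meet the SLA,
    pick the slowest one, keeping faster hardware free for stricter jobs.
    """
    candidates = [
        n for n in nodes
        if n.free_cores >= job.cores and estimated_hours(job, n) <= job.sla_hours
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda n: n.relative_speed)


nodes = [Node("older-rack", 1.0, 8), Node("newer-rack", 2.0, 8)]
urgent = Job("simulation", cores=4, baseline_hours=10.0, sla_hours=6.0)
relaxed = Job("report", cores=4, baseline_hours=3.0, sla_hours=6.0)
```

With these sample values, `place(urgent, nodes)` selects the newer rack because the older one would miss the six-hour target, while `place(relaxed, nodes)` selects the older rack. A scheduler that treated every node as equal could not make that distinction.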

Conclusion

Based on our experience of helping over 300 organizations deploy clusters over the past nine years, we have seen a consistent pattern in how these technologies are deployed and expanded within organizations, particularly commercial organizations. We have tried to define these phases to help users better understand what happens beyond their initial deployments and how properly managing each phase increases the success of the next phase. Today, we find the majority of organizations are in the second phase of adopting clusters, with some starting to enter the later stages. The common characteristic we see among organizations successfully adopting clusters is a coherent, planned approach to the deployment, monitoring and ongoing management of their environments.

-----

Rick Friedman is currently the Director of Marketing for Scali, Inc. He has over 20 years of experience as a marketing executive for technology-related products. His areas of expertise include electronic design automation, compute servers, high performance Ethernet networking, software for supporting clinical trials, and high performance computing clusters.

