Visit additional Tabor Communication Publications
December 17, 2012
HPC clusters first emerged in universities and research centers that required extra compute power but had limited budgets. The development of open source Linux-based operating systems and management tools was a natural evolution.
However, because open source tools are created by a variety of different organizations, no one solution offers a complete all-in-one product. Organizations therefore often need to tap into multiple open source projects to obtain everything they need to manage a cluster environment. They then need to get everything working together and educate their users on how to use each component.
According to IDC, software remains the biggest roadblock for HPC users because parallel software is lacking and many applications run into issues such as scalability. The number of cores per processor and per cluster continues to increase and new programming paradigms are needed to increase efficiency. As clusters grow larger and more complex, sophisticated management tools are required. Setting up and monitoring a cluster can become very difficult - this is especially true for heterogeneous environments that are dependent on multiple generations of technology that support a variety of applications and multiple user groups. The complexity further increases when organizations deploy multiple clusters -- whether they are within the same datacenter or across a global organization – or move to a cloud-based model.
When addressing open source software for HPC
Despite the pervasiveness and benefits of open source software, it is not without its pitfalls for those organizations that lack the expertise needed to integrate, maintain and operate a stack of open source software. The associated costs can manifest in many ways – such as increased time spent on system administration, troubleshooting problems due to lack of formal support channels, or the cost of regression testing in-house. Administrators may also experience reduced productivity due to cluster downtime, support issues and issues due to low utilization of resources.
Organizations often experience additional education expenses associated with maintaining an open source environment. They will likely also experience creeping operational costs as they find themselves not only in the research business, but in the software maintenance business.
Comparisons between open source and commercial software often assume the alternatives have equal merit
This can be true for database software or scripting languages in areas where open source software is more widely recognized. However, in more specialized areas, open source software may not exist at all or provides less functionality and requires more effort to install and integrate. Some needed capabilities that emerge as critical elements include Web management consoles and portals, software provisioning tools, reporting and analysis tools, and tools to facilitate application integrations.
Often additional factors contribute to cost and risk when using open source software. This includes a lack of reliable technical roadmaps or the task of important software maintenance such as performing updates or applying security.
So how do you weigh the benefits of open source software?
Total cost of ownership (TCO) varies based on unique requirements for each organization. Measuring your TCO is about finding the right balance. Clearly open source software is here to stay and organizations should consider many factors when deciding on the degree to which to deploy commercial or open source software. Some of these considerations include the cost of software, risk associated with deploying a variety of solutions, the complexity of the environment, and the expertise of the staff. Often these considerations are interrelated – such as in the case of selecting open source software for lower cost reasons, but in doing so creating a higher element of risk and added complexity in the environment.
The decision whether to deploy open-source or commercial software for HPC is sometimes painted as a either/or choice. In practice, however, organizations have a range of different options. The figure below illustrates a range of alternatives between purely open and commercial solutions. The shape of the TCO curve will vary depending on the environment. For most organizations, being at one extreme or the other is likely to be expensive and limits their downstream flexibility.
Productivity is Key
This is particularly true when evaluating the total cost of ownership. If we purely focused on acquisition cost, it’s easy to overlook other factors such as putting measures in place to track throughput and utilization of workloads. Utilization is crucial to getting the most from your infrastructure investment. Improving utilization minimizes additional resource acquisition costs and helps ensure that wait times for resources are minimal so that results are achieved in less time. Another consideration related to productivity is the time it takes to install a complete cluster environment, have it ready for full productive use and educate users on how to use the cluster so they can run their simulations and analyses productively.
Future proofing is very important
As a cluster environments evolve, we need to ensure that extensibility exists in the environment to support future requirements. As user requirements and applications evolve, features such as system monitoring and alerting tools, workload management systems, support for an increasing number of specialized workload types, and ease-of-use features including user-centric web portals become increasingly important.
For more information
TCO estimates will vary based on many factors including the nature of the installation, in-house capability, types of applications and cost of down-time. As you choose between open-source and commercial alternatives, there are many different costs related to administration and productivity.
IBM Platform Computing has developed a whitepaper and recorded Webcast that guide you through evaluating the true cost of deploying and managing an HPC environment. If you are interested in receiving a TCO evaluation of your HPC environment, please contact IBM.
The Xeon Phi coprocessor might be the new kid on the high performance block, but out of all first-rate kickers of the Intel tires, the Texas Advanced Computing Center (TACC) got the first real jab with its new top ten Stampede system.We talk with the center's Karl Schultz about the challenges of programming for Phi--but more specifically, the optimization...
Although Horst Simon was named Deputy Director of Lawrence Berkeley National Laboratory, he maintains his strong ties to the scientific computing community as an editor of the TOP500 list and as an invited speaker at conferences.
Supercomputing veteran, Bo Ewald, has been neck-deep in bleeding edge system development since his twelve-year stint at Cray Research back in the mid-1980s, which was followed by his tenure at large organizations like SGI and startups, including Scale Eight Corporation and Linux Networx. He has put his weight behind quantum company....
May 16, 2013 |
When it comes to cloud, long distances mean unacceptably high latencies. Researchers from the University of Bonn in Germany examined those latency issues of doing CFD modeling in the cloud by utilizing a common CFD and its utilization in HPC instance types including both CPU and GPU cores of Amazon EC2.
May 15, 2013 |
Supercomputers at the Department of Energy’s National Energy Research Scientific Computing Center (NERSC) have worked on important computational problems such as collapse of the atomic state, the optimization of chemical catalysts, and now modeling popping bubbles.
May 10, 2013 |
Program provides cash awards up to $10,000 for the best open-source end-user applications deployed on 100G network.
May 09, 2013 |
The Japanese government has revealed its plans to best its previous K Computer efforts with what they hope will be the first exascale system...
May 08, 2013 |
For engineers looking to leverage high-performance computing, the accessibility of a cloud-based approach is a powerful draw, but there are costs that may not be readily apparent.
05/10/2013 | Cleversafe, Cray, DDN, NetApp, & Panasas | From Wall Street to Hollywood, drug discovery to homeland security, companies and organizations of all sizes and stripes are coming face to face with the challenges – and opportunities – afforded by Big Data. Before anyone can utilize these extraordinary data repositories, however, they must first harness and manage their data stores, and do so utilizing technologies that underscore affordability, security, and scalability.
04/15/2013 | Bull | “50% of HPC users say their largest jobs scale to 120 cores or less.” How about yours? Are your codes ready to take advantage of today’s and tomorrow’s ultra-parallel HPC systems? Download this White Paper by Analysts Intersect360 Research to see what Bull and Intel’s Center for Excellence in Parallel Programming can do for your codes.
In this demonstration of SGI DMF ZeroWatt disk solution, Dr. Eng Lim Goh, SGI CTO, discusses a function of SGI DMF software to reduce costs and power consumption in an exascale (Big Data) storage datacenter.
The Cray CS300-AC cluster supercomputer offers energy efficient, air-cooled design based on modular, industry-standard platforms featuring the latest processor and network technologies and a wide range of datacenter cooling requirements.