With over 5 million licensed users, Microsoft Excel has been a standard for data analysis for a number of years. However, as data sets continue to grow, once-sufficient tools are being strained to the limit. Microsoft claims that Excel can keep pace with the big data drive through its Azure cloud and an offering called Excel DataScope.
Excel DataScope extends the functionality of the original, adding new algorithms and analytics that users can execute on Azure via an add-on function that opens a portal to the cloud. Users can then use this portal to share and collaborate or simply use the storage and compute resources hosted by Microsoft.
Roger Barga, an architect in the Cloud Research Engagement team within Microsoft’s eXtreme Computing Group, introduced the value proposition behind Excel DataScope in a recent article, noting that for technical computing, it represents a comfortable way to contend with large data sets.
Barga told R&D Magazine that “scientists tend to talk about big data as a problem but it’s an ideal opportunity for cloud computing. How large data sets can be addressed in the cloud is one of the important technology shifts that will emerge over the next several years.”
According to Barga, a number of features go beyond the analytics previously offered in Excel, including new machine learning algorithms that can run on Azure and leverage hundreds of cores on the fly. He says that the tool opens up an analytics algorithm in the cloud so users can “visualize the results and never have to move the actual data out of the cloud.” He adds that the team does not want “a data analyst to learn much more than the names of the algorithms and what they do. Users should just think Excel has a new capability which opens up great opportunities for extracting new insights out of massive data sets.”
Although the resources available from the cloud might appear limitless, Barga said that there are performance differences that should be expected. He told R&D that “supercomputers and high performance computing clusters are designed to share data at high frequencies and low latency; in this respect, cloud computing is slower.” He touched on the storage differences as well, stating that “high performance clusters have storage arrays that provide high speed, high bandwidth pathways to storage. This is not the case in clouds, where the storage often resides separately from computing nodes, with multiple routers or hops widening the distance between them.”
He says that despite these general performance differences, the scale of the big data problems researchers face still makes the cloud an attractive option, thanks to its ability to scale with need and provide on-demand access to high-end compute resources.