January 10, 2014

Scientific Computing: the Case for Python

Tiffany Trader

What’s in your scientific computing toolbox? Over at the R Bloggers site, University of Texas at Austin research associate Tal Yarkoni explains why these days, his go-to language is Python, whether it’s for text processing, numerical computing, or even data visualization. The post, subtitled “why Python is steadily eating other languages’ lunch,” explores the advantages of using one programming language across different applications.

Yarkoni observes that over the past two or three years, his scientific computing toolbox has grown increasingly homogeneous. The formerly diverse set of tools, which included Ruby, Python, MATLAB, R, and a few others, is now dominated by Python. Just how dominated? Yarkoni contends that 90-95 percent of his work can now be done “comfortably” in Python.

States Yarkoni: “The increasing homogenization (Pythonification?) of the tools I use on a regular basis primarily reflects the spectacular recent growth of the Python ecosystem. A few years ago, you couldn’t really do statistics in Python unless you wanted to spend most of your time pulling your hair out and wishing Python were more like R (which, is a pretty remarkable confession considering what R is like). Neuroimaging data could be analyzed in SPM (MATLAB-based), FSL, or a variety of other packages, but there was no viable full-featured, free, open-source Python alternative. Packages for machine learning, natural language processing, web application development, were only just starting to emerge.”

“These days, tools for almost every aspect of scientific computing are readily available in Python. And in a growing number of cases, they’re eating the competition’s lunch,” he adds.

Now when Yarkoni tackles a new project, his first question is no longer which tool is best suited to the job but whether the job can be done in Python. There are potential downsides (lack of statistical functionality, for example), but they are outweighed by benefits such as preserving language purity and reducing switching costs. A programmer also has to decide whether using a better-suited but less familiar language for a particular task is worth the loss of efficiency.

Of course, Python cannot replace all other languages. As examples, Yarkoni points to the statistical packages users have contributed to R, MATLAB’s signal processing packages, and the need for highly optimized code in a low-level compiled language when serious performance on the largest datasets is required. But when it comes to the majority of scientific computing, he stands firm in his support for Python.

“I don’t think it’s entirely unfair to suggest that, at this point, Python has become the de facto language of scientific computing in many domains,” he says. “If you’re reading this and haven’t had much prior exposure to Python, now’s a great time to come on board!”

The author’s original blog entry with comments is available here.