Cloud Gives DNA Compiler Wings
Imagine being able to turn bacteria into molecular factories, capable of producing everything from earth-friendly biofuels to personalized medicines. This is the promise of synthetic biology, which has made remarkable advances over the last decade, borrowing tools and concepts from physics, engineering and computer science to design new biological systems.
One of the foremost researchers in the field is Professor Howard Salis. An assistant professor at Penn State University, Salis developed a cloud-based software platform, called DNA Compiler, to support the efforts of synthetic biology researchers around the world.
In an interview with HPCwire, Salis explains that even the simplest bacterium has more moving parts than an automobile, but nature did not provide exact design specifications for how those living organisms function.
One of the primary goals of microbiology and genomics is to reverse-engineer these design specifications. In synthetic biology and in metabolic engineering, the objective is to forward-engineer non-natural organisms in order to solve humanity’s problems.
“It’s a long-term goal of the world to reduce reliance on petrochemical feedstock and to manufacture chemicals at low-cost and renewably,” states Salis.
“In addition, if we know how an organism functions, we can treat diseases more efficiently, not just by trying to find a particular drug that happens to work, but by designing new drugs that harvest specific mechanisms that treat very specific diseases.”
Designing DNA sequences is computationally intensive. With four possible bases at every position, the number of possible variants of even a short DNA sequence is greater than the number of atoms in the universe, so exhaustive search is out of the question. Instead, Salis’ group uses optimization algorithms to identify a DNA sequence that achieves a specific behavior.
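To put a rough number on that comparison, the short calculation below counts sequences as 4 raised to the sequence length and asks how long a sequence must be before the count passes the atom count. The figure of 10^80 atoms is a commonly quoted order-of-magnitude estimate and is an assumption here, not a number from Salis; the function names are illustrative only.

```python
import math

# Assumption: ~10^80 atoms in the observable universe (order-of-magnitude estimate).
LOG10_ATOMS_IN_UNIVERSE = 80.0


def log10_sequence_count(length: int) -> float:
    """log10 of the number of distinct DNA sequences of a given length (4**length)."""
    return length * math.log10(4)


def min_length_exceeding(log10_target: float) -> int:
    """Smallest length whose sequence count is strictly greater than 10**log10_target."""
    return math.floor(log10_target / math.log10(4)) + 1


shortest = min_length_exceeding(LOG10_ATOMS_IN_UNIVERSE)
print(shortest, round(log10_sequence_count(shortest), 2))
```

Under that assumption, a stretch of just 133 nucleotides already has more possible sequences than there are atoms, which is far shorter than a typical gene.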
“Even though DNA is a universal genetic code, a bacterium will interpret its DNA in one way, but a human cell, for example, will interpret its DNA differently,” explains Salis.
“There are very specific physical and chemical interactions in the cell that determine how an organism reads its DNA and expresses its gene.
“By knowing those physical and chemical interactions and developing quantitative models to predict the strength of those interactions, we can then design new DNA sequences that are interpreted by the organisms, thus writing new code, that will then execute some desired program – so it’s a compiler.”
The DNA Compiler is a collection of biophysical models that predict the main steps in gene expression and how they work together to control an organism’s behavior.
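The general pattern of pairing a predictive model with a search procedure can be sketched in a few lines. The example below is a toy, not the DNA Compiler's method: the scoring function is a hypothetical stand-in that simply rewards a target GC content, whereas the real platform relies on biophysical models of gene expression, and the article does not say which optimization algorithm the group uses. Plain stochastic hill climbing is swapped in here for illustration.

```python
import random

BASES = "ACGT"


def toy_score(seq: str, target_gc: float = 0.5) -> float:
    """Hypothetical stand-in for a predictive model: 0.0 is best, lower is worse."""
    gc_fraction = sum(base in "GC" for base in seq) / len(seq)
    return -abs(gc_fraction - target_gc)


def hill_climb(start: str, steps: int = 2000, seed: int = 0) -> str:
    """Mutate one base at a time, keeping any change that does not lower the score."""
    rng = random.Random(seed)
    best, best_score = start, toy_score(start)
    for _ in range(steps):
        pos = rng.randrange(len(best))
        # Pick a replacement base that differs from the current one.
        new_base = rng.choice(BASES.replace(best[pos], ""))
        candidate = best[:pos] + new_base + best[pos + 1:]
        score = toy_score(candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best


start = "A" * 30
designed = hill_climb(start)
print(designed, toy_score(designed))
```

Swapping `toy_score` for a model that predicts, say, how strongly a gene is expressed turns the same loop into a crude sequence designer; the search only ever evaluates a few thousand candidates rather than the whole sequence space.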
About four years ago, Salis and his team created a Web interface to the Compiler, but the site drew so many users that the local campus-based server was soon overloaded. The decision was made to offload the computation to Amazon compute clusters, allowing the underlying model to run its computations on nodes that are dynamically turned on in response to user submissions.
The stack includes a Web interface written in Python, SQLAlchemy for the database connection, and some JavaScript to make it interactive. On the backend, a front-facing server makes RESTful API calls to another server hosted on Amazon’s EC2 computing cluster, with S3 providing distributed storage. EC2 Auto Scaling groups facilitate dynamic scaling.
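The idea of turning nodes on in response to user submissions comes down to a sizing rule: how many compute nodes should be running for the current backlog of jobs? The sketch below shows one plausible rule in plain Python. It is an illustration under stated assumptions, not the team's configuration: the article does not describe the actual scaling policy, and the function name, the jobs-per-node figure and the node limits are all hypothetical.

```python
import math


def desired_nodes(pending_jobs: int,
                  jobs_per_node: int = 4,   # hypothetical capacity of one node
                  min_nodes: int = 0,       # hypothetical floor when idle
                  max_nodes: int = 20) -> int:  # hypothetical cost ceiling
    """Return how many compute nodes to run for the current job backlog."""
    if pending_jobs < 0 or jobs_per_node <= 0:
        raise ValueError("pending_jobs must be >= 0 and jobs_per_node > 0")
    needed = math.ceil(pending_jobs / jobs_per_node)
    # Clamp to the allowed range so an idle queue scales in and a spike cannot run away.
    return max(min_nodes, min(max_nodes, needed))


for backlog in (0, 1, 9, 500):
    print(backlog, desired_nodes(backlog))
```

In a setup like the one described, a number of this kind would be handed to the scaling group as its target capacity, so that an empty queue costs nothing and a burst of submissions is spread across freshly started nodes.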
Salis said they considered other solutions, but when they signed on two years ago, it was pretty clear that Amazon had the best solution with the best documentation. The fact that Netflix was a primary customer helped allay reliability concerns.
The professor cites ease of operation and the ability to offload management responsibilities as some of the key motivations for moving to a cloud-based solution. The scalability has also been a huge boon to users, who no longer have to deal with long wait times and now enjoy faster compute times.
So far, more than 6,000 registered users from MIT, Harvard, Caltech, Stanford, Rice, Imperial College and many other institutions have used the DNA Compiler to engineer more than 50,000 DNA sequences.
Although computational methods are used to design synthetic organisms, Salis points out that designing an organism is very different from designing a semiconductor chip.
“Designing life will be more like designing a space shuttle than designing a chip,” says Salis. “A space shuttle has a lot of moving parts and is operating under extreme conditions. If something goes wrong, it can affect other things.”
At its current pace of progress, Salis predicts that biotech will have almost complete control over the energy metabolism of cells within the next five years, leading to a much better capability to manufacture a wide variety of chemicals.
“Starting right now, if you had the financial resources, you could reengineer an entire organism with the sole purpose of manufacturing a biofuel,” he states. “Every single nucleotide in that organism could be optimized for the sole purpose of manufacturing a biofuel with very high production rates. That is very different compared to what people have done in the past, which is to make a small number of mutations to an existing microbe in order to improve production.”
A paper detailing the team’s research appears in a recent issue of the journal Molecular Systems Biology.