Since 1986 - Covering the Fastest Computers in the World and the People Who Run Them

January 8, 2014

Improved Genome Portal Benefits Users

Tiffany Trader

The DOE Joint Genome Institute (JGI), a national user facility that supports the management and analysis of complex genomic data, has been working for two years to improve its user interface and infrastructure. The Genome Portal (http://genome.jgi.doe.gov), the massive genomic database and data management system operated by the JGI, now boasts significant upgrades to support efficient handling of the rapidly growing and increasingly diverse genomic data stored there.

JGI Tree of Life

The JGI provides high-throughput sequencing and computational analysis in support of DOE missions related to clean energy generation and environmental characterization and cleanup. The Genome Portal allows users to search, download and explore multiple data sets. All DOE JGI sequencing projects are available, as well as the status, assemblies and annotations of sequenced genomes.

The DOE JGI and its partners are no strangers to big data. As a recent paper in Nucleic Acids Research highlights, JGI completed 2,635 projects in 2012, a three-fold increase over 2011. The JGI generated more than 56 trillion nucleotides of genome-sequence data in 2012 and over 70 trillion nucleotides in 2013. Over the past year (2013), JGI has added 650 genomes to the public databases. Because of the increased amount and complexity of data, it became necessary to upgrade the Genome Portal. The main focus of the upgrade was expanding computational resources to enable efficient storage, access, download and analysis of data.

Among the updates are new tools designed to make it easier to locate a specific genome, including a detailed list of all JGI projects, an interactive “Tree of Life” and domain-specific comparative resources. Enhanced search functionality supports searching for genomes and projects by keyword (e.g. plants, algae, single cell, water), name and other categories of data.

The Genome Portal website was built using Apache HTTPD, Tomcat and MySQL, and most of the Genome Portal components have been developed using Java and open source tools. The more robust infrastructure includes four load-balanced Web servers that talk to two back-end database servers. An automated build system uses Jenkins to allow updates to be applied without disrupting users.

Partnerships have also been instrumental to the upgrade effort. A strong alliance with the National Energy Research Scientific Computing Center (NERSC) has led to increased HPC-level capabilities, according to the paper’s authors. NERSC hosts the servers that run the Genome Portal and provides access to ESnet (Energy Sciences Network), which facilitates high-speed data transfers.

According to JGI’s Inna Dubchak, JGI’s alliance with NERSC will enable “faster and smoother access for users tapping into the Genome Portal’s resources.”

