XSEDE ECSS, Blacklight and Stampede Support California Yellowtail Genome Assembly

April 12, 2018

April 12, 2018 — If you eat fish in the U.S., chances are it once swam in another country. That’s because the U.S. imports over 80 percent of its seafood, according to estimates by the United Nations. New genetic research could help make farmed fish more palatable and bring America’s wild fish species to dinner tables. Scientists have used big data and supercomputers to catch a fish genome, a first step in its sustainable aquaculture harvest.

Researchers assembled and annotated for the first time the genome – the total genetic material – of the fish species Seriola dorsalis. Also known as California Yellowtail, it’s a fish of high value to the sashimi, or raw seafood industry. The science team formed from the Southwest Fisheries Science Center of the U.S. National Marine Fisheries Service, Iowa State University, and the Instituto Politécnico Nacional in Mexico. They published their results on January of 2018 in the journal BMC Genomics.

“The major findings in this publication were to characterize the Seriola dorsalis genome and its annotation, along with getting a better understanding of sex determination of this fish species,” said study co-author Andrew Severin, a Scientist and Facility Manager at the Genome Informatics Facility of Iowa State University.

“We can now confidently say,” added Severin, “that Seriola dorsalis has a Z-W sex determination system, and that we know the chromosome that it’s contained on and the region that actually determines the sex of this fish.” Z-W refers to the sex chromosomes and depends on whether the male or female is heterozygous (XX,XY or ZZ,ZW), respectively. Another way to think about this is that in Z-W sex determination, the DNA molecules of the fish ovum determine the sex of the offspring. By contrast, in the X-Y sex determination system, such is found in humans, the sperm determines sex in the offspring.

It’s hard to tell the difference between a male and female yellowtail fish because they don’t have any obvious phenotypical, or outwardly physically distinguishing traits. “Being able to determine sex in fish is really important because we can develop a marker that can be used to determine sex in young fish that you can’t determine phenotypically,” Severin explained. “This can be used to improve aquaculture practices.” Sex identification lets fish farmers stock tanks with the right ratio of males to females and get better yield.

Assembling and annotating a genome is like building an enormous three-dimensional jigsaw puzzle. The Seriola dorsalis genome has 685 million pieces – its base pairs of DNA – to put together. “Gene annotations are locations on the genome that encode transcripts that are translated into proteins,” explained Severin. “Proteins are the molecular machinery that operate all the biochemistry in the body from the digestion of your food, to the activation of your immune system to the growth of your fingernails. Even that is an oversimplification of all the regulation.”

Severin and his team assembled the genome of 685 megabase (MB) pairs from thousands of smaller fragments that each gave information to form the complete picture. “We had to sequence them for quite a bit of depth in order to construct the full 685 MB genome,” said study co-author Arun Seetharam. “This amounted to a lot of data,” added Seetharam, who is an associate scientist at the Genome Informatics Facility of Iowa State University.

The raw DNA sequence data ran 500 gigabytes for the Seriola dorsalis genome, coming from tissue samples of a juvenile fish collected at the Hubbs SeaWorld Research Institute in San Diego. “In order to put them together,” Seetharam said, “we needed a computer with a lot more RAM to put it all into the computer’s memory and then put it together to construct the 685 MB genome. We needed really powerful machines.”

That’s when Seetharam realized that the computational resources at Iowa State University at the time weren’t sufficient get the job done in a timely manner, and he turned to XSEDE, the eXtreme Science and Engineering Discovery Environment funded by the National Science Foundation. XSEDE is a single virtual system that scientists can use to interactively share computing resources, data and expertise.

“When we first started using XSEDE resources,” explained Seetharam, “there was an option for us to select for ECSS, the Extended Collaborative Support Services. We thought it would be a great help if there were someone from the XSEDE side to help us. We opted for ECSS. Our interactions with Phillip Blood of the Pittsburgh Supercomputing Center were extremely important to get us up and running with the assembly quickly on XSEDE resources,” Seetharam said.

The genome assembly work was computed at the Pittsburgh Supercomputing Center (PSC) on the Blacklightsystem, which at one point was the world’s largest coherent shared-memory computing system. Blacklight has since been superseded by the data-centric Bridges system at PSC, which includes similar large-memory nodes of up to 12 terabytes — a thousand times more than a typical personal computer. “We ended up using Blacklight at the time because it had a lot of RAM,” recalled Andrew Severin. That’s because they needed to put all the raw data into the computer’s random access memory (RAM) so that it could use the algorithms of the Maryland Super-Read Celera Assembler genome assembly software. “You have to be able to compare every single piece of sequence data to every other piece to figure out which pieces need to be joined together, like a giant puzzle,” Severin explained.

“We also used Stampede,” continued Severin, “the first Stampede, which is another XSEDE computational resource that has lots and lots of compute nodes. Each compute node you can think of as a separate computer. ” The Stampede1 system at the Texas Advanced Computing Center had over 6,400 Dell PowerEdge server nodes, which later added 508 Intel Knights Landing (KNL) nodes in preparation for its current successor, Stampede2 with 4,200 KNL nodes.

“We used Stampede to do the annotation of these gene models that we identified in the genome to try and figure out what their functions are,” Severin said. “That required us to perform an analysis called the Basic Local Alignment Search Tool (BLAST), and it required us to use many CPUs, over a year’s worth of compute time that we ended up doing within a couple of week’s worth of actual time because of the many nodes that were on Stampede.”

To read the full article, click here.


Source: TACC

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industry updates delivered to you every week!

Empowering High-Performance Computing for Artificial Intelligence

April 19, 2024

Artificial intelligence (AI) presents some of the most challenging demands in information technology, especially concerning computing power and data movement. As a result of these challenges, high-performance computing Read more…

Kathy Yelick on Post-Exascale Challenges

April 18, 2024

With the exascale era underway, the HPC community is already turning its attention to zettascale computing, the next of the 1,000-fold performance leaps that have occurred about once a decade. With this in mind, the ISC Read more…

2024 Winter Classic: Texas Two Step

April 18, 2024

Texas Tech University. Their middle name is ‘tech’, so it’s no surprise that they’ve been fielding not one, but two teams in the last three Winter Classic cluster competitions. Their teams, dubbed Matador and Red Read more…

2024 Winter Classic: The Return of Team Fayetteville

April 18, 2024

Hailing from Fayetteville, NC, Fayetteville State University stayed under the radar in their first Winter Classic competition in 2022. Solid students for sure, but not a lot of HPC experience. All good. They didn’t Read more…

Software Specialist Horizon Quantum to Build First-of-a-Kind Hardware Testbed

April 18, 2024

Horizon Quantum Computing, a Singapore-based quantum software start-up, announced today it would build its own testbed of quantum computers, starting with use of Rigetti’s Novera 9-qubit QPU. The approach by a quantum Read more…

2024 Winter Classic: Meet Team Morehouse

April 17, 2024

Morehouse College? The university is well-known for their long list of illustrious graduates, the rigor of their academics, and the quality of the instruction. They were one of the first schools to sign up for the Winter Read more…

Kathy Yelick on Post-Exascale Challenges

April 18, 2024

With the exascale era underway, the HPC community is already turning its attention to zettascale computing, the next of the 1,000-fold performance leaps that ha Read more…

Software Specialist Horizon Quantum to Build First-of-a-Kind Hardware Testbed

April 18, 2024

Horizon Quantum Computing, a Singapore-based quantum software start-up, announced today it would build its own testbed of quantum computers, starting with use o Read more…

MLCommons Launches New AI Safety Benchmark Initiative

April 16, 2024

MLCommons, organizer of the popular MLPerf benchmarking exercises (training and inference), is starting a new effort to benchmark AI Safety, one of the most pre Read more…

Exciting Updates From Stanford HAI’s Seventh Annual AI Index Report

April 15, 2024

As the AI revolution marches on, it is vital to continually reassess how this technology is reshaping our world. To that end, researchers at Stanford’s Instit Read more…

Intel’s Vision Advantage: Chips Are Available Off-the-Shelf

April 11, 2024

The chip market is facing a crisis: chip development is now concentrated in the hands of the few. A confluence of events this week reminded us how few chips Read more…

The VC View: Quantonation’s Deep Dive into Funding Quantum Start-ups

April 11, 2024

Yesterday Quantonation — which promotes itself as a one-of-a-kind venture capital (VC) company specializing in quantum science and deep physics  — announce Read more…

Nvidia’s GTC Is the New Intel IDF

April 9, 2024

After many years, Nvidia's GPU Technology Conference (GTC) was back in person and has become the conference for those who care about semiconductors and AI. I Read more…

Google Announces Homegrown ARM-based CPUs 

April 9, 2024

Google sprang a surprise at the ongoing Google Next Cloud conference by introducing its own ARM-based CPU called Axion, which will be offered to customers in it Read more…

Nvidia H100: Are 550,000 GPUs Enough for This Year?

August 17, 2023

The GPU Squeeze continues to place a premium on Nvidia H100 GPUs. In a recent Financial Times article, Nvidia reports that it expects to ship 550,000 of its lat Read more…

Synopsys Eats Ansys: Does HPC Get Indigestion?

February 8, 2024

Recently, it was announced that Synopsys is buying HPC tool developer Ansys. Started in Pittsburgh, Pa., in 1970 as Swanson Analysis Systems, Inc. (SASI) by John Swanson (and eventually renamed), Ansys serves the CAE (Computer Aided Engineering)/multiphysics engineering simulation market. Read more…

Intel’s Server and PC Chip Development Will Blur After 2025

January 15, 2024

Intel's dealing with much more than chip rivals breathing down its neck; it is simultaneously integrating a bevy of new technologies such as chiplets, artificia Read more…

Choosing the Right GPU for LLM Inference and Training

December 11, 2023

Accelerating the training and inference processes of deep learning models is crucial for unleashing their true potential and NVIDIA GPUs have emerged as a game- Read more…

Baidu Exits Quantum, Closely Following Alibaba’s Earlier Move

January 5, 2024

Reuters reported this week that Baidu, China’s giant e-commerce and services provider, is exiting the quantum computing development arena. Reuters reported � Read more…

Comparing NVIDIA A100 and NVIDIA L40S: Which GPU is Ideal for AI and Graphics-Intensive Workloads?

October 30, 2023

With long lead times for the NVIDIA H100 and A100 GPUs, many organizations are looking at the new NVIDIA L40S GPU, which it’s a new GPU optimized for AI and g Read more…

Shutterstock 1179408610

Google Addresses the Mysteries of Its Hypercomputer 

December 28, 2023

When Google launched its Hypercomputer earlier this month (December 2023), the first reaction was, "Say what?" It turns out that the Hypercomputer is Google's t Read more…

AMD MI3000A

How AMD May Get Across the CUDA Moat

October 5, 2023

When discussing GenAI, the term "GPU" almost always enters the conversation and the topic often moves toward performance and access. Interestingly, the word "GPU" is assumed to mean "Nvidia" products. (As an aside, the popular Nvidia hardware used in GenAI are not technically... Read more…

Leading Solution Providers

Contributors

Shutterstock 1606064203

Meta’s Zuckerberg Puts Its AI Future in the Hands of 600,000 GPUs

January 25, 2024

In under two minutes, Meta's CEO, Mark Zuckerberg, laid out the company's AI plans, which included a plan to build an artificial intelligence system with the eq Read more…

China Is All In on a RISC-V Future

January 8, 2024

The state of RISC-V in China was discussed in a recent report released by the Jamestown Foundation, a Washington, D.C.-based think tank. The report, entitled "E Read more…

Shutterstock 1285747942

AMD’s Horsepower-packed MI300X GPU Beats Nvidia’s Upcoming H200

December 7, 2023

AMD and Nvidia are locked in an AI performance battle – much like the gaming GPU performance clash the companies have waged for decades. AMD has claimed it Read more…

Nvidia’s New Blackwell GPU Can Train AI Models with Trillions of Parameters

March 18, 2024

Nvidia's latest and fastest GPU, codenamed Blackwell, is here and will underpin the company's AI plans this year. The chip offers performance improvements from Read more…

DoD Takes a Long View of Quantum Computing

December 19, 2023

Given the large sums tied to expensive weapon systems – think $100-million-plus per F-35 fighter – it’s easy to forget the U.S. Department of Defense is a Read more…

Eyes on the Quantum Prize – D-Wave Says its Time is Now

January 30, 2024

Early quantum computing pioneer D-Wave again asserted – that at least for D-Wave – the commercial quantum era has begun. Speaking at its first in-person Ana Read more…

GenAI Having Major Impact on Data Culture, Survey Says

February 21, 2024

While 2023 was the year of GenAI, the adoption rates for GenAI did not match expectations. Most organizations are continuing to invest in GenAI but are yet to Read more…

The GenAI Datacenter Squeeze Is Here

February 1, 2024

The immediate effect of the GenAI GPU Squeeze was to reduce availability, either direct purchase or cloud access, increase cost, and push demand through the roof. A secondary issue has been developing over the last several years. Even though your organization secured several racks... Read more…

  • arrow
  • Click Here for More Headlines
  • arrow
HPCwire