When Time Is of the Egress: Optimizing Your Transfers

By Andrew Kaczorek and Dan Harris

July 31, 2012

Running scientific workloads in AWS gives researchers a diverse toolkit for slinging data across time zones, regions, or even the globe once the data is inside the infrastructure sandbox. Getting data in and out of AWS, however, has historically been more of a challenge. The available resources are still evolving, and those pesky laws of physics tend to get in the way. With more enterprises using the cloud for larger data and compute needs, and with the complexities that come with that shift, we thought it would be helpful to offer tips on optimizing ingress and egress transfers.

Within scientific computing there is a massive disconnect between theoretical conversations and the real world of data movement. We recently performed a data transfer to Amazon’s Elastic Compute Cloud (EC2) using the AWS Import/Export service. The service allows customers to mail in data on physical media, which is then loaded into an S3 bucket or EBS volume of their choice. As an experiment to compare this transfer with network-based mechanisms such as multi-stream upload to S3, we recorded all of the time it took to prepare the drive and ship it to Amazon.

There were several steps to transfer the 317 GB of DNA sequence data into EC2:

  1. Installed AWS Import/Export command line tools.

  2. Created an Import job using AWS command line tools including a manifest and signature.

  3. Realized that the drive used an ext3 file system (and that mounting ext3 on OS X is non-trivial).

  4. Created an Ubuntu virtual machine.

  5. Mounted the drive on the Ubuntu VM and wrote the signature file and manifest to the drive.

  6. Physically labeled the drive with a transfer ID that was provided by the registration process.

  7. Packaged and addressed the drive with a specific address that was to be used for the shipment.

  8. Headed to the local FedEx and shipped the drive overnight.

  9. Waited….

  10. Viewed completed transfer logs.

The next step was to move the data from S3 to an EC2 instance for use in a computation run. Loading directly to an EBS snapshot is an option, but we decided against it because of its higher cost as an image of the drive, the unknowns associated with the newness of the feature, and the constraints it places on the specific content of the file system.

Table of Shipping and Transport Times:

| Event                            | Time                                      |
|----------------------------------|-------------------------------------------|
| Prepare Drive                    | 3 hr (concurrent with other project work) |
| Drive Shipped                    | 4:12 PM EST (FedEx log)                   |
| Drive Arrives IAD                | 3:20 AM EST (FedEx log)                   |
| Drive Arrives at Amazon facility | 9:45 AM EST (FedEx log)                   |
| Drive accepted by Amazon         | 1:13 PM EST (I/E toolkit log)             |
| Data transfer begins             | 5:40 PM EST (I/E toolkit log)             |
| Data transfer completes          | 9:17 PM EST (I/E toolkit log)             |

Here is a summary of the entire activity:

| Metric                                          | Value      |
|-------------------------------------------------|------------|
| Total time to transfer 317 GB                   | 32 hours   |
| Extrapolated total time to transfer 1 TB        | 39.8 hours |
| Throughput of active AWS transfer               | 199 Mbps   |
| Active AWS transfer of 317 GB                   | 3.6 hours  |
| Extrapolated active AWS transfer of 1 TB        | 11.4 hours |
| Overall throughput of 317 GB transfer           | 22.5 Mbps  |
| Extrapolated overall throughput of 1 TB transfer | 57.2 Mbps  |
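For readers who want to check the arithmetic, the summary figures can be approximately rebuilt from the shipping timeline. The sketch below is not the authors' calculation. It assumes the unit conventions that appear to reproduce the published numbers (1 TB = 1000 GB, 1 GB = 1024 MB, 1 MB = 8 Mbit), assumes the drive was loaded the day after it shipped, and uses placeholder calendar dates. The results land within a few tenths of the table, with the small gaps coming from rounding the intermediate hours.

```python
from datetime import datetime

# Assumed unit conventions (not stated in the article):
# 1 TB = 1000 GB, 1 GB = 1024 MB, 1 MB = 8 Mbit.
MBIT_PER_GB = 1024 * 8


def throughput_mbps(gigabytes, hours):
    """Average rate in Mbps for moving `gigabytes` in `hours`."""
    return gigabytes * MBIT_PER_GB / (hours * 3600)


# Times come from the shipping table; the dates are placeholders,
# only the elapsed time matters.
shipped = datetime(2012, 1, 1, 16, 12)
load_start = datetime(2012, 1, 2, 17, 40)
load_end = datetime(2012, 1, 2, 21, 17)
prep_hours = 3.0

active_hours = (load_end - load_start).total_seconds() / 3600
total_hours = prep_hours + (load_end - shipped).total_seconds() / 3600

# Extrapolate to a full 1 TB drive: only the active load scales with size,
# while prep, shipping, and waiting stay fixed.
active_hours_1tb = active_hours * 1000 / 317
total_hours_1tb = total_hours - active_hours + active_hours_1tb

print(f"Active load, 317 GB: {active_hours:.1f} h, "
      f"{throughput_mbps(317, active_hours):.0f} Mbps")
print(f"End to end, 317 GB:  {total_hours:.1f} h, "
      f"{throughput_mbps(317, total_hours):.1f} Mbps")
print(f"Active load, 1 TB:   {active_hours_1tb:.1f} h")
print(f"End to end, 1 TB:    {total_hours_1tb:.1f} h, "
      f"{throughput_mbps(1000, total_hours_1tb):.1f} Mbps")
```

The point of the extrapolation is that the fixed overhead of roughly 28 hours is paid once per drive, so the effective rate improves as the drive gets fuller.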

We compared this import job against the results of some recent multi-stream upload tests, performed over a connection with an envy-inducing 5 Mbps of upload bandwidth rather than the more common 1 Mbps.

| File Size           | Transfer Time | Avg Speed                      |
|---------------------|---------------|--------------------------------|
| 250 MB (one thread) | 413 seconds   | 0.605 MB/sec (4.84 Mbit/sec)   |
| 250 MB (30 threads) | 412 seconds   | 0.606 MB/sec (4.84 Mbit/sec)   |
| 1 GB (one thread)   | 1,695 seconds | 0.604 MB/sec (4.83 Mbit/sec)   |
| 1 GB (30 threads)   | 1,693 seconds | 0.605 MB/sec (4.84 Mbit/sec)   |
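The upload figures can be recomputed from file size and elapsed time. The short sketch below does that and also shows how little the extra threads helped. It assumes 1 GB = 1024 MB and treats the link as a nominal 5 Mbps; the recomputed rates agree with the table to within rounding.

```python
# (label, size in MB, elapsed seconds) from the upload tests.
# 1 GB is taken as 1024 MB, which is an assumption.
runs = [
    ("250 MB, 1 thread", 250, 413),
    ("250 MB, 30 threads", 250, 412),
    ("1 GB, 1 thread", 1024, 1695),
    ("1 GB, 30 threads", 1024, 1693),
]

LINK_MBPS = 5  # nominal uplink used for the tests


def rates(size_mb, seconds):
    """Return (MB/sec, Mbit/sec) for one upload."""
    mb_per_sec = size_mb / seconds
    return mb_per_sec, mb_per_sec * 8


for label, size_mb, seconds in runs:
    mb_s, mbit_s = rates(size_mb, seconds)
    print(f"{label:20s} {mb_s:.3f} MB/sec  {mbit_s:.2f} Mbit/sec  "
          f"{mbit_s / LINK_MBPS:.0%} of link")

# Speedup from 30 threads over one thread on the 1 GB file.
speedup = 1695 / 1693
print(f"30-thread speedup on 1 GB: {speedup:.3f}x")
```

A speedup this close to 1.0 is what one would expect when a single stream already fills the uplink.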

We have also been able to saturate upload bandwidth at customer sites, which have much higher outbound data rates, in the 50 Mbps range. In other words, if there’s a bottleneck in delivering data over the wire, it’s on the source end and not on the EC2 end of the line.

Because 50 Mbps of upload traffic could saturate a company’s network, we assume the transfer is throttled to 70 percent of total bandwidth, for an outbound rate of 35 Mbps. Interestingly, that is faster than the overall rate we measured for the Import/Export service: almost 500 GB could be moved over the wire in the same time it took to transfer our data by shipping the drive. Our drive wasn’t filled to capacity, however, so a fairer theoretical Import/Export throughput comes from extrapolating the time to load a full 1 TB. Loading that extra data would take about 8 more hours and increase the throughput of the Import/Export approach to about 57 Mbps. The rate would increase further if the time it takes to prep the drive were reduced.
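The comparison in this paragraph can be sketched as a small calculation. The block below is illustrative only. It assumes 1 GB = 1024 MB, treats the Import/Export path as a fixed overhead (32 hours minus the 3.6 hours of active loading) plus loading at 199 Mbps, and treats the network path as a steady 35 Mbps. The break-even size it reports is a consequence of that simple linear model, not a figure from the experiment.

```python
MBIT_PER_GB = 1024 * 8  # assumed convention: 1 GB = 1024 MB, 1 MB = 8 Mbit


def gb_moved(rate_mbps, hours):
    """Gigabytes moved at a steady rate over the given number of hours."""
    return rate_mbps * hours * 3600 / MBIT_PER_GB


def hours_for(gigabytes, rate_mbps):
    """Hours needed to move `gigabytes` at a steady rate."""
    return gigabytes * MBIT_PER_GB / (rate_mbps * 3600)


uplink_mbps = 50
throttle = 0.70                          # leave 30 percent for other traffic
outbound_mbps = uplink_mbps * throttle   # 35 Mbps

shipping_hours = 32
print(f"Over the wire in {shipping_hours} h: "
      f"{gb_moved(outbound_mbps, shipping_hours):.0f} GB")

# Linear model of the shipped drive: fixed overhead plus active loading.
fixed_hours = 32 - 3.6   # prep, shipping, and waiting
load_mbps = 199          # measured active load rate

# Break-even size x solves:
#   fixed_hours + hours_for(x, load_mbps) == hours_for(x, outbound_mbps)
per_gb_gap = hours_for(1, outbound_mbps) - hours_for(1, load_mbps)
break_even_gb = fixed_hours / per_gb_gap
print(f"Break-even payload under this model: {break_even_gb:.0f} GB")
```

Under these assumptions, payloads below the break-even size arrive sooner over a 35 Mbps link, and larger ones arrive sooner on a shipped drive.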

What we found from our experiment is that the nature of your workflow should be considered when deciding which transfer method to use. If you are producing a constant flow of data at a rate that matches the allotted upload bandwidth, streaming over the network is the better option. On the other hand, if you have a large, pre-existing data set and no time to wait for it to upload, consider using Amazon’s Import/Export service.

Initiating a transfer entirely in software and having the data eventually make its way into the cloud without getting up from your desk is not always practical. For example, a 317 GB payload would take approximately 30 hours to transfer to AWS using the Import/Export job approach, but about 30 days over the wire if a 1 Mbps uplink were saturated 24/7. Given a typical enterprise uplink of 50 Mbps, the tables would be turned. Let’s not forget the non-technical factors involved in the Import/Export approach, such as the hassle of handling USB drives, cardboard, packing tape, and cranky shipping depot employees.

Lastly, if the over-the-wire transfer is projected to take longer than a business week, use an AWS Import/Export job instead. AWS Import/Export is an extremely viable way of managing the ingress and egress of data until bandwidth becomes more ubiquitous and plentiful.
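This rule of thumb is easy to turn into a quick estimate. The following is a minimal sketch, not a tool from the article: the function names are hypothetical, 1 GB is taken as 1024 MB, the uplink is assumed to be saturated around the clock, and a "business week" is interpreted as five 24-hour days.

```python
MBIT_PER_GB = 1024 * 8      # assumed convention: 1 GB = 1024 MB, 1 MB = 8 Mbit
BUSINESS_WEEK_DAYS = 5      # assumption: five days of a saturated uplink


def wire_days(gigabytes, uplink_mbps):
    """Days needed to push `gigabytes` through a fully saturated uplink."""
    seconds = gigabytes * MBIT_PER_GB / uplink_mbps
    return seconds / 86400


def choose_method(gigabytes, uplink_mbps):
    """Apply the rule of thumb: ship a drive if the wire takes over a week."""
    if wire_days(gigabytes, uplink_mbps) > BUSINESS_WEEK_DAYS:
        return "import/export"
    return "network"


for mbps in (1, 5, 50):
    print(f"317 GB at {mbps:>2} Mbps: {wire_days(317, mbps):5.1f} days, "
          f"use {choose_method(317, mbps)}")
```

For the 317 GB payload this gives roughly 30 days on a 1 Mbps uplink, which favors shipping a drive, and well under a day at 50 Mbps, which favors the network.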

Editor’s Note: The original byline was incorrectly attributed to Cycle Computing CEO Jason Stowe.
