ESnet’s Bill Johnston to Retire From LBNL

By Nicole Hemsoth

April 11, 2008

In a few months, Bill Johnston of Lawrence Berkeley National Laboratory will step down as head of ESnet, the Department of Energy’s international network that provides high bandwidth networking to tens of thousands of researchers around the world. In a career that began in the 1970s and has included seminal work in networking, distributed computing, the Grid, and even crossing paths with Al Gore, Johnston has had a hand in the development of many of the high performance computing and networking resources that are today taken for granted. And as he tells it, it all began with the brain of Berkeley Lab scientist Tom Budinger.

Berkeley Lab is now recruiting for a new head of its ESnet Department [see the posting at: http://jobs.lbl.gov/LBNLCareers/details.asp?jid=21495&p=1]. Although he plans to officially retire from LBNL by June 1, Johnston is already planning how he’ll spend his time — doing pretty much what he does now: working, reading for both professional and personal interest, and traveling, but adjusting the ratio somewhat. Berkeley Lab’s Jon Bashor managed to get an hour with Johnston to talk about his career, his accomplishments and his future plans.

Question: You’ve announced your plan to retire this year after 35 years at Department of Energy labs. How did you get started in your career?

Bill Johnston: When I was a graduate student at San Francisco State University, one of my professors spent her summers working on math libraries for the Star 100, which was CDC’s supercomputer successor to the CDC 7600. Through this connection, I started taking graduate classes at the Department of Applied Sciences at Lawrence Livermore National Laboratory, then went to work full time in the atmospheric sciences division. There I worked on LIARQ, an air quality model that is still used by the San Francisco Bay Area Air Quality Management District (BAAQMD). Although the code was developed at Livermore, the BAAQMD couldn’t run it there. So, I would bring it to LBNL to run on the Lab’s CDC 7600 computer.

I began spending more and more time at the Berkeley Lab, and developed data visualization techniques that added a graphical interpretation interface to the code, so that they had dozens of different ways of looking at the data. I went on to turn this work into a data visualization package and made it available to other users of the 7600, which was the main LBNL machine at the time. Through this work I met Harvard Holmes, then head of the graphics group. I also knew the head of the systems group and was offered jobs in each group. Something Harvard said led me to join the graphics group, which was a good decision; five years later the systems group tanked when there was no new funding to replace the 7600 after it was retired.

Over the years, I took over the graphics group, and was also getting more involved in visualization of science data. As a result, we were often focused on large data sets. These data sets were often stored at remote sites, and accessing them led me into networking. In fact, as a result of some of this work, we set up the first remote, high performance network-based visualization demonstration at the Supercomputing conference in 1991. Working with the Pittsburgh Supercomputing Center (PSC), we combined the Cray Y-MP at PSC with a Thinking Machines CM-2 in order to do the rendering — the conversion of the data into a graphical representation — fast enough for interactive manipulation. We — mostly David Robertson — split the code up to run part on the massively parallel CM-2 and do the vector processing part on the Cray. The idea was to have the graphics workstation at SC91 in Albuquerque getting data from the supercomputers at PSC. Because high performance TCP/IP implementations weren’t available, we partnered with Van Jacobson of LBNL and Dave Borman from Cray to provide a high-speed, wide area version of TCP for a Sun workstation at SC91 and for the Cray at PSC. I remember Van working on the Sun for 48 hours in order to get the two TCP stacks to work together. For the first time, NSF ran a connection from the 45 Mb/s NSFNET backbone (which effectively was the Internet at the time) into the conference.

The demo was a volume visualization of Tom Budinger’s brain, with the data from some of Tom’s high-resolution MRI work. This was real-time visualization — you could take it, grab it, rotate it. It all started with Tom’s brain. [Note: Budinger is a physician and physicist who helped develop MRI and previously headed LBNL’s Center for Functional Imaging.]

For myself and Brian Tierney and David Robertson from LBNL, this was our introduction to high performance wide area networking. We got involved more and more with networking and graphics, and were even involved with ESnet on several projects.

One of our projects was with the DARPA MAGIC gigabit network testbed, which included LBNL, SRI, the University of Kansas, the Minnesota Supercomputer Center, the USGS EROS Data Center, and Sprint. We worked with Sprint to build the country’s first 2.5 gigabit ATM network (a technology that is not used much any more), linking Minneapolis with sites in Sioux Falls, South Dakota, and in Overland Park and Lawrence, Kansas. Brian Tierney and Jason Lee (both students of mine) and I developed the Distributed Parallel Storage System (DPSS) to drive an SRI-developed visualization application over the network with high-speed parallel data streams. This experiment made it clear that in order to get end-to-end high performance you had to address every component in the distributed system — the applications, the operating system and network software, and the network devices — all at the same time. This led directly to my interest in Grids. Interestingly, the ideas behind our work in DPSS also fed into the development of GridFTP, which is one of the most enduring Grid applications, and heavily used by the LHC [Large Hadron Collider] community to move the massive data of the CMS and ATLAS detectors around the world for analysis by the collaborating physics community.
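
To make the parallel-stream idea concrete, here is a minimal sketch of the general technique: splitting one large transfer into byte ranges and moving each range over its own TCP connection. This is not DPSS or GridFTP code; the host name, port, request format, and file size are hypothetical.

```python
import socket
import threading

# Everything below is illustrative: the host, port, request format, and file
# size are invented for this sketch and do not correspond to any real service.
HOST, PORT = "data-server.example.org", 7000
FILE_SIZE = 4 * 1024**3        # pretend we are fetching a 4 GB dataset
STREAMS = 8                    # number of parallel TCP streams

def fetch_range(offset, length, out_path):
    """Fetch one byte range of the remote file over its own TCP connection."""
    with socket.create_connection((HOST, PORT)) as conn:
        # Hypothetical wire protocol: "GET <offset> <length>\n"
        conn.sendall(f"GET {offset} {length}\n".encode())
        received = 0
        with open(out_path, "wb") as out:
            while received < length:
                chunk = conn.recv(min(1 << 20, length - received))
                if not chunk:
                    break
                out.write(chunk)
                received += len(chunk)

if __name__ == "__main__":
    threads = []
    part = FILE_SIZE // STREAMS
    for i in range(STREAMS):
        offset = i * part
        length = part if i < STREAMS - 1 else FILE_SIZE - offset
        t = threading.Thread(target=fetch_range,
                             args=(offset, length, f"part-{i}.bin"))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    # The part-*.bin pieces would then be concatenated in offset order.
```

Production tools such as GridFTP layer authentication, protocol negotiation, and restart handling on top of this basic parallel-stream pattern.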

Question: Can you elaborate more on your work with Grids?

Johnston: In the late 1990s Ian Foster organized a workshop on distributed computing, and the focus was on writing a book about the component-based, end-to-end approach that was emerging in the research and education community. Bill Feiereisen of NASA (head of the NAS center at the time) participated, and he’s the one who suggested the name “Grid” for the book that ended up popularizing the subject; he said it reminded him of the power grid, with the big computers on the Grid akin to power generators.

At this workshop, we sketched out the outline of a book, covering the basic concepts. The time was right — we had a group of people interested in a common development. This led Ian and me to establish the Grid Forum. But it never would have turned into a viable organization had Charlie Catlett not attended the workshop. He’s the consummate organizational guy and he got the Forum organized and ran it for several years.

And then the Europeans got interested, especially with the planned experiments at CERN. Charlie spent a lot of time traveling to Europe and working with the different Grid organizations over there. This led to combining the U.S. and European efforts to produce the Global Grid Forum (now called the Open Grid Forum — the result of much more industry participation).

At this time, I was working on assignment at NASA’s Ames Research Center, helping build NASA’s Grid — the Information Power Grid. Grids were becoming more well-known, and in 2000 Bill McCurdy from Berkeley Lab talked me into coming back to LBNL full time to establish the Distributed Systems Department. After a few years, I was invited to take over leadership of ESnet.

Question: In the early 1990s, you worked on a number of pioneering projects. One of the best known is the virtual frog, which still gets thousands of hits a month [http://froggy.lbl.gov/virtual/]. Can you talk about how those projects were done and their effects today?

Johnston: The frog was really a side activity, but it came out of my belief that with the Web you ought to be able to do interactive graphics. David Robertson and I launched it in 1994 and it’s still being used — tens of thousands of hits a day. If the server goes down, we get email from science teachers around the world.

During the time of the MAGIC testbed, we developed BAGNET, the Bay Area Gigabit Testbed Network. This was when the Gigabit Network Testbeds project of Bob Kahn’s Strategic Computing Program was part of the federal budget, and then-Sen. Al Gore became interested in what he dubbed the Information Superhighway. (See “Creating a Giant Computer Highway” by John Markoff, New York Times, September 2, 1990.) Gore was head of the Senate Committee on Commerce, Science and Transportation, and he called together the heads of Sun, Thinking Machines, Cray and DARPA to talk about high speed networking and supercomputing. LBNL, because of its work in the DARPA MAGIC testbed project, was asked to create a demo to show what bandwidth was — with the possible exception of Gore, the senators on the committee did not know.

We wanted to bring in a live network connection to the Senate building, but Craig Fields, then head of DARPA, said “no way” — it was too risky. So, we used inter-packet arrival times from measurements on the Internet backbone that Van provided to realistically simulate an Internet connection, and produced a movie to show the equivalent of a remote connection at different speeds, from 9600 bits/sec to 45 megabits/sec. The data we used was a fluid flow over a backward-facing step — from research done by James Sethian.

Two funny things happened after the demo. When we were all finished, this old senator piped up and said, “All I want to know is what’s this going to do for North Dakota?” Then John Rollwagen [then CEO of Cray Research] was talking about the next-generation supercomputer and how they were going to reach gigaflops. Well, Gore just started laughing — he said, “That’s what my (1988) presidential campaign was — a gigaflop!” He was very warm and funny, not at all like he seemed as vice president.

Question: About five years ago, you were named head of ESnet, DOE’s network supporting scientists around the world. How does the ESnet of today compare to the 2003 version?

Johnston: When I joined ESnet, the organization was totally focused on ESnet as a production network, with the leadership deciding both the directions of the network and what the users needed. When I came in, I decided to make a fundamental change. There was nothing we could say as a network organization that wouldn’t appear self-serving, such as seeking a budget increase. We needed to make a solid connection between the network and the science needs of the Office of Science (SC), so that if the scientists needed a bigger, higher speed network, they could help make the case for it.

At the time, Mary Ann Scott was our DOE program manager and an enthusiastic backer of ESnet. We organized a workshop for our user community to look at how the SC science community did their science now and how the process of doing science had to change over the next five to 10 years to make significant advances. At the workshop about two-thirds of the people were from the applications side and the rest were computer and network scientists.

In talking about how science would change, we were able to show that network capabilities would have to evolve to support distributed collaborations, many of them sharing massive amounts of data. It was very clear that there would soon be a deluge of science data on the network. This led the DOE Office of Science to see that a larger network was needed and to fund a substantially expanded ESnet with a new architecture known as ESnet4.

The second change was that ESnet was an insular organization focused on the network. We needed to become intimately involved with the community. For example, none of the end-to-end data flows were entirely within DOE. We had to become more outward looking and work with the intervening networks. We created working groups and committees in the U.S. and international R&E communities to determine how to provide better services.

I spent a lot of time on the road talking with the research and education networks that enabled the science collaborations between the DOE Labs and the universities: Internet2 (U.S. R&E backbone network), the Quilt (U.S. regional networks), DANTE (which operates GÉANT, the European R&E network), and two or three of the key European research and education networks. We set up working groups to build close cooperation in developing next-generation services. One example is OSCARS, the On-Demand Secure Circuits and Advance Reservation System developed in partnership with Internet2 and GÉANT. That put us on the path to where we are today — very close to end-to-end network service guarantees such as guaranteed bandwidth.

The first two workshops were so successful that ASCR (the Office of Advanced Scientific Computing Research) within the Office of Science — the DOE program that funds ESnet and NERSC — continued to organize workshops for gathering the networking requirements of the science program offices in the Office of Science. We’re lucky to have Eli Dart organizing these workshops, which will survey each of the SC science programs about once every 2.5 years. Eli came to us from NERSC, where he was used to working with users to learn about their requirements.

Question: One of the more significant changes has been the partnership with Internet2. Can you elaborate on this?

Johnston: The partnership really started on a bus ride between Hong Kong and Shenzhen in China. Shenzhen was China’s first “special economic zone” and is a “manufactured” city about 75 miles from Hong Kong. It went from being a village to a city of 10 million in about 30 years. There are two research and education networks in China — and there is considerable rivalry between them. We could not hold a common meeting with them, so after the meeting in Hong Kong, we took a 1½ hour bus ride from Hong Kong for a second meeting with the other group. Doug Van Houweling, CEO of Internet2, and I got to talking about our vision of what a next-generation R&E network should look like, and it turned out we had very common visions.

At the time, Internet2 was looking at using a dedicated fiber network for its next generation network, but was not completely sure it could swing it financially. We both had commercial contracts for our networks, and both contracts would end within a year. What we really ought to do, we agreed, was leverage our efforts to get a good deal. When we described this idea to Dan Hitchcock, now the head of ASCR’s facilities division, he was also enthusiastic about using DOE funding to strengthen U.S. R&E community networking while at the same time getting a good deal for the bandwidth that ESnet needed.

Question: Last year, ESnet completed the coast-to-coast links for ESnet4, the new network architecture. Can you describe the thinking behind that architecture and talk about plans for the year ahead?

Johnston: This is something that came out of the science requirements workshop. One thing that turned the light on for me was a talk by Cees de Laat of the University of Amsterdam. The Netherlands is one of the most fibered countries in the world, and Cees is involved in NetherLight — the major R&E network exchange point in the Netherlands. He gave a talk on the Internet traffic that they saw through NetherLight. Cees observed that there were three easily identified peaks in the traffic when you plot traffic volume against the amount of data sent per connection.

The first peak shows a lot of data traveling over a lot of connections, which is how most people think of Internet traffic — Web, email, “typical” user data transfers, etc. To handle this traffic you need expensive, high performance IP routers capable of routing millions of packets per second.

The second peak shows more data being moved over a smaller number of connections between fewer end points. These patterns are typical of university researchers and some commercial content suppliers — think high-definition video sent from movie studios to theaters. This traffic is better handled with Ethernet switches, which cost about one-fifth the price of a high-speed router.

The third peak consists of long-lived, data-intensive paths over relatively static optical links. One example of this would be the Large Hadron Collider at CERN, which will produce terabytes of data for months on end and send that data to about 15 national datacenters (two of which are ESnet sites in the U.S. — Fermilab and Brookhaven).
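
To make the three-peak picture concrete, the short sketch below bins connections by the amount of data each one carries, in the spirit of the classification described above. The thresholds are invented for illustration and are not values from the NetherLight measurements.

```python
# Illustrative only: bin connections by the amount of data each one moves,
# roughly mirroring the three traffic peaks described above. The thresholds
# are invented for this sketch and are not values from the NetherLight study.
SMALL = 10 * 1024**2      # below ~10 MB per connection: ordinary Web/email traffic
LARGE = 100 * 1024**3     # above ~100 GB per connection: long-lived science flows

def classify(bytes_per_connection):
    if bytes_per_connection < SMALL:
        return "general IP traffic (handled by routers)"
    if bytes_per_connection < LARGE:
        return "bulk research/content flows (handled by Ethernet switches)"
    return "persistent science data flows (handled by dedicated circuits)"

# A few example per-connection volumes, in bytes
for volume in (2 * 1024**2, 40 * 1024**3, 500 * 1024**3):
    print(volume, "->", classify(volume))
```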

I realized that this is exactly the nature of the traffic that the evolving process of doing science was increasingly going to put onto ESnet, and that we should build separate networks for the different types of traffic. The general IP network can manage the 3–4 billion connections per month of general traffic — email, Web, smaller science data transfers, etc. The second, more specialized network — the Science Data Network (SDN) — would be built as a switched circuit network to handle the very large data flows of SC science, and that is where most of our new bandwidth is. The rollout of the IP network is essentially complete and consists of five interconnected 10 Gb/s rings that cover the country. The SDN has several 10 Gb/s paths across the northern half of the country, and these will be closed and formed into rings by this summer. We will add one complete set of 10 Gb/s rings for SDN each year for the next four to five years, resulting in five 50 Gb/s rings covering the country within five years. This is how ESnet4 is designed and is being built.
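
As a rough, purely illustrative check on the build-out schedule described above (the starting year is an assumption), adding one set of 10 Gb/s rings per year works out like this:

```python
# Illustrative arithmetic only: the plan described above adds one complete
# set of 10 Gb/s SDN rings per year. The starting year is an assumption;
# the end point reproduces the "five 50 Gb/s rings" figure from the text.
START_YEAR = 2008
RING_INCREMENT_GBPS = 10

capacity = 0
for year in range(START_YEAR, START_YEAR + 5):
    capacity += RING_INCREMENT_GBPS
    print(f"{year}: each SDN ring path at {capacity} Gb/s")
# Final line printed: "2012: each SDN ring path at 50 Gb/s"
```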

To support the third case, the petabytes of data being sent from CERN to the Tier 1 national datacenter sites, there is a dedicated optical network with several 10 gigabit channels delivering data to Fermilab near Chicago and to Brookhaven National Laboratory on Long Island essentially continuously — 24 hours a day, seven days a week, for about nine months out of the year.

Question: In 2007, ESnet also completed its third Metropolitan Area Network. Can you discuss the idea behind these?

Johnston: One thing that became clear when we looked at the science requirements was that the big science national labs need to be connected directly to the high-speed ESnet core, but we couldn’t do this with the old commercial tail-circuit architecture because those circuits are prone to mechanical damage and can be out of service for days. We needed to build redundant optical networks to connect the labs to the ESnet core network. One evening at a meeting in Columbus, Ohio, Wes Kaplow, then CTO of Qwest, and I sketched out a plan for the Bay Area Metropolitan Area Network (BAMAN) on the back of a napkin over beer. The BAMAN links LBNL, LLNL, NERSC, the Joint Genome Institute, Sandia California and SLAC with a redundant ring, providing multiple 10 Gb/s channels connected to the ESnet backbone. This approach has proven cost-effective and reliable.

Given this success, we pushed this metro area architecture forward, next to Long Island, then to Chicago. In the Chicago area, Linda Winkler of Argonne had a number of the network elements in place in the Chicago area, but there was no connection from Fermilab to Argonne. To bridge this gap, we got a special grant from the Office of Science and installed new fiber between the labs. Among other things, this involved tunneling under Interstate 55, which turned out to be easier than getting the permits to go through a couple of little villages along the path.

Question: OK. How do you plan to keep busy after you retire?

Johnston: Well, I plan to work part-time for ESnet, helping my successor with the transition. The next person may not have the combination of the science experience, the DOE experience, and the lab experience that I did, so I will be around to mentor and assist.

On the personal side, over the past decade I have developed an intense curiosity about the world’s transition to modernity — the historical and cultural events between about 1870 and 1945 that shaped the world as we know it today. What led to the situation of modern Europe, to the blooming of Fascism in the early-mid 20th century? What role did religion play in the transition? This is an extension of my original interest in German expressionist art, which was an outgrowth of World War I. What caused this sea change in how people create and perceive art? Why did the Weimar Republic — the German experiment in parliamentary democracy — fail? How did Hitler get elected? I think that all of this is part and parcel of the transition to modernity and I plan to spend a lot of time reading about these interrelated parts of 19th and 20th century Western culture.

Also, with my wife Nancy’s interest in birding, we plan to spend a lot of time exploring California in detail and photographing birds. She is getting quite good at it — look at nejohnston.org. We will also be travelling more to some of our favorite areas — Seattle, Hawaii, and New Mexico.

—–

Source: Lawrence Berkeley National Laboratory
