Since 1986 - Covering the Fastest Computers in the World and the People Who Run Them

In a recent article, Massively Parallel Technologies claimed that they had developed the first mathematical derivation of Amdahl's law. You know, marketing is a funny thing.  The claim itself sounds preposterous, but there is nothing false in it. How is that?

To keep things simple: suppose you implemented an algorithm with a program that has two instructions: (I1 + I2). It would take two clock ticks to complete on one CPU that executes one instruction at each clock tick.

To understand the parallelization opportunities you have to ask yourself a simple question:

Can I2 be executed without waiting for I1 to execute to completion?

If the answer is yes, then I could run both instructions on two different processors at the same time, in which case the program can finish in nearly one clock tick. I said "nearly" because I may have to send the second instruction to the second CPU and fetch results back, etc. So, the best speedup would be cutting the time in half; therefore, if I had N instructions executing on N processors, the best theoretical run time would approach 1/N of the serial time, for a speedup approaching N times, assuming network communication latency approaches zero.
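This best case, and how quickly communication overhead erodes it, can be sketched in a few lines. The 0.0625-tick overhead below is an invented illustration, not a measured figure:

```python
# Ideal vs. realistic speedup for N independent one-tick instructions on N CPUs.
# Serial time: N ticks. Parallel time: 1 tick of work plus any communication overhead.

def speedup(n_procs, comm_overhead_ticks=0.0):
    """Speedup of N independent one-tick instructions on n_procs CPUs."""
    serial_time = float(n_procs)              # N instructions, one tick each
    parallel_time = 1.0 + comm_overhead_ticks # one tick of work plus overhead
    return serial_time / parallel_time

print(speedup(2))            # 2.0   -- the "nearly one clock tick" case, zero latency
print(speedup(255))          # 255.0 -- the ideal N-times speedup
print(speedup(255, 0.0625))  # 240.0 -- a sliver of overhead eats the difference
```

Even a fraction of a tick of communication cost per run pulls a 255-processor machine from the ideal 255x down to 240x.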

Clearly, if I2 cannot be executed before I1 completes, we are out of luck! The algorithm is not parallelizable (the exact same algorithm, that is; park this notion for a later HPCwire article). But why can't I2 execute at the same time? Because it needs data from I1. This is the serially dependent pattern; check my recent article for a brief description of what I call stage I and stage II applications: http://www.sys-con.com/story/?storyid=48036&de=1
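The serially dependent pattern can be made concrete with a toy example. The functions here are hypothetical stand-ins for I1 and I2, not anything from MPT's work:

```python
# Two "instructions": I2 consumes I1's result, so it cannot start until
# I1 completes -- the serially dependent pattern.

def i1(x):
    return x + 1   # I1 produces a value

def i2(y):
    return y * 2   # I2 needs its input before it can run

# Dependent case: must run in order, two ticks no matter how many CPUs you have.
result = i2(i1(3))   # i2 waits for i1's output -> 8

# Independent counter-example: neither call needs the other's output,
# so the two could run on two CPUs in the same tick.
a = i1(3)            # -> 4
b = i2(3)            # -> 6
```

In the dependent case the data flow itself forces the ordering; no amount of hardware removes it.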

So there you have it: 1 + 1 = 2 for serially executing applications, and 1 + 1 = 1 for stage I parallel executing applications. So for N processors, I can execute a stage I application in 1/N of the time, that is, for nearly N times speedup.

Let's get back to MPT's claims:

• New math? Not really; just a few new terms.
• 240 times speedup? Certainly! On 255 processors.
• 102 times speedup? Absolutely! On 127 processors.
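Plugging those figures into the standard Amdahl's law formula shows what they imply. The parallel fractions below are back-calculated by me for illustration, not numbers MPT reported:

```python
# Amdahl's law: speedup S on N processors with parallel fraction f is
#   S = 1 / ((1 - f) + f / N)
# Solving for f given an observed speedup S on N processors:
#   f = (1 - 1/S) / (1 - 1/N)

def amdahl_speedup(f, n):
    """Speedup predicted by Amdahl's law for parallel fraction f on n processors."""
    return 1.0 / ((1.0 - f) + f / n)

def parallel_fraction(s, n):
    """Parallel fraction implied by an observed speedup s on n processors."""
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n)

print(parallel_fraction(240, 255))  # ~0.99975: 240x on 255 CPUs needs ~99.98% parallel code
print(parallel_fraction(102, 127))  # ~0.99806: 102x on 127 CPUs needs ~99.8% parallel code
```

Both results sit just under the ideal N, exactly where Amdahl's law says a nearly, but not perfectly, parallel code should land.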

So, the results are impressive, but they are still below the theoretical N times speedup. WOW!

The numbers I just quoted are all in Tim Curns' great follow-up HPCwire interview (http://news.tgc.com/msgget.jsp?mid=334190); just look closely.

But let me raise the next obvious question: can I do better than N times speedup on N processors? You are thinking a big NO, but I will tell you a big YES. Tune in to my next HPCwire article for the answer…

Labro Dimitriou is a High Performance Computing product and marketing specialist. He has been in the field of distributed computing, applied mathematics, and operations research for over 20 years, and has developed commercial software for trading, engineering, and geosciences. Labro has spent the last five years designing and implementing HPC and BPM business solutions. He is currently employed by ASPEED Software.