July 28, 2000

by Peter C. Patton, Ph.D.

San Diego, CA — CHoPP, for Columbia Homogeneous Parallel Processor, is a parallel processor architecture developed by Herbert Sullivan while he was at Columbia University in New York in the 1970s. Sullivan used the acronym CHoPP as the name of a venture capital corporation he founded to raise funds to build a highly parallel supercomputer based on the CHoPP architecture and the patents that resulted from his work at Columbia. He intended this supercomputer to take advantage of concurrency at all levels of granularity in a computer program, even in existing programs written for sequential or vector machines. A prototype was built at the firm’s facility in San Diego, California, and the first model was sold to the San Diego Supercomputer Center. That machine was never delivered because of difficulty raising capital, but the prototype demonstrated that the basic concepts of the CHoPP architecture were effective and, indeed, could be implemented in computer hardware.

The firm was reduced to a small consulting and design staff that became involved in a number of government-sponsored studies and design projects revolving around parallel processing. During this period Sullivan came to realize that there was very little difference in the performance of the various shared memory architectures: average latency increases approximately with the log of the number of processors, and processing power increases almost linearly until the number of processors reaches roughly half the average concurrency of the program. Increasing the concurrency of the programs was therefore much more important than any particular architecture. Much of the uniqueness of the CHoPP architecture, particularly the patented techniques for exploiting latent concurrency in existing programs, could also be implemented in user-level control software. The increasing use of distributed computing models, such as MPP and, more importantly, networks, led CHoPP to concentrate on using increased concurrency to reduce the effect of network latency on performance, in addition to the original goal of scaling nodes to hundreds or thousands of processors. The original patents were sufficiently broad to cover this unforeseen application to software. However, recent developments at CHoPP and the increasing possibility of gaining software patents have led the firm’s management to file specific software patents covering a software development technology called ANTs: Asynchronous Non-preemptive Tasks. CHoPP has now applied for a patent on ANTs as a non-preemptive threading technology.

The people at CHoPP realized that, while the commercial software market was having difficulty scaling beyond a handful of processors, CHoPP had technology that permitted scaling into the thousands of processors. While the scalability and robustness of asynchronous, stateless communication have been made obvious by the success of the World Wide Web, their application to general purpose computing has been hidden by the overhead of existing methods. CHoPP has been using a low overhead mechanism for asynchronous distributed control for some time. The thirty-year history of the development of this technology has been distilled into a simple and elegant set of primitive operations that are object oriented, easy to use, and compatible with existing software systems. CHoPP now offers this environment for sale as the ANTs Software System. It is available now in C++, with a Java version to follow shortly. ANTs will also be available as part of MinimalJ, the most secure and portable client ever developed.

When Herb Sullivan first told me about the software implementation of his CHoPP technology, it reminded me of a termite colony. The mystery of a termite colony arises from the number of bits of information required to describe the architecture of a termite mound, a six-foot high structure commonly seen on the African savanna. The information required to design the architecture of the mound surpasses the number of neurons in any individual termite’s brain; no single termite’s brain is large enough to store the entire design. So where does the colony store it? Herb’s response was that, of course, the architecture is not stored explicitly; each termite knows only its preprogrammed function, and they all work together in a non-preemptive multi-tasking environment, each doing its own thing, which in sum is the thing of the colony. He then added that ants do the same sorts of things as termites but do them in the daylight, and are beautiful, shiny black or red creatures rather than ugly, pasty white or semi-transparent ones, so he would rather call the technology ants. Moreover, he pointed out, the acronym works out much better as well.

ANTs are a programming technology for simplifying large, complex software systems by dividing them into independent components, thereby making them more reliable, more efficient, and easier to maintain. The technology achieves these results because, like the ant or termite colony, it distributes control structures throughout the computation. ANTs computations are a collection of essentially independent component sub-programs whose interactions are described within the component sub-programs themselves and do not depend on a central control structure. ANTs execute concurrently, that is, they appear to be processing data simultaneously, and interact only when they explicitly exchange data. Thus ANTs computations are concurrent computations in which all components appear to execute simultaneously unless they are waiting for data to be provided by another ANT. An inherent property of ANTs computations is that they execute concurrently on parallel multiprocessor or distributed hardware. The benefit of the ANTs programming technique can be seen from two perspectives. On the one hand, ANTs simplify programming by eliminating the need for central control structures; on the other, ANTs enhance efficiency by effectively exploiting the parallel resources of a multi-processor.

The computer architect has faced two challenges: improving computational efficiency, by avoiding the waste of time or space, and reducing complexity, by logical simplification and reduction of system component count. In terms of computational efficiency, progress has been made in two key areas. Improvement in communication between software and hardware has come through operating system advances and threading technology. Efficiency has also been improved through the use of multiple processors, where performance improves because several processors can cooperate to complete a given computation in minimum time. Computer program resource demands are satisfied by calls to an operating system. Traditionally, hardware resources are assigned to competing processes (which may or may not be parts of one or more concurrent programs) by a technique called multi-tasking or multi-threading. This process takes place in the twilight zone between the application program and the operating system that assigns it resources. Today’s state-of-the-art technology in this area is synchronous, preemptive multi-threading; that is, when an application asks the operating system to accomplish a task, the system opens a thread and locks it, assigning the requested task to that thread. The thread remains locked until the task is complete or the operating system decides that some more important task needs that thread’s assigned resource. These multiple preemptive threads are then scheduled by a single thread responsible only for scheduling all the others.

In conventional procedural programming practice, the control program specifies the sequence in which component parts of a computation are executed. Normally, this sequence ensures that program segments do not execute until the data they need is available, and also ensures that appropriate program segments execute in time to provide data before it is needed for the computation to progress. As computations become larger and more complex, control structures become more complex and limit the size and effectiveness of the application. Complexity rises exponentially with the number of function points in an application program. At 1,000 or more function points, complexity usually becomes overwhelming and the application is abandoned. While systems this large do not necessarily require more bits than the number of neurons in a single human brain, they do seem to be well beyond the ability of a single application architect to describe, let alone to predict the real-world performance of the application system.

In software engineering, the usual method for solving complexity issues has been to break complex problems down into a number of simpler, more manageable parts. Unfortunately, the programmer must then also undertake the coordination of the various components of the program; there has been no technique for separating the central control mechanism and encapsulating it into the program’s components. In fact, the control structure becomes either highly centralized in an explicit way or interwoven implicitly into the fabric of the application program. The result is that the parts of the program are no longer interchangeable or amenable to independent upgrades, and the entire program must be analyzed when local changes or maintenance activities are undertaken. The advantages of breaking up the program into quasi-independent segments are undermined by centralized or embedded control structures.

Due to its central control structure, preemptive threading technology may get in the way of the effort to reduce complexity. ANTs solve this problem by eliminating the central control structure and making the interaction between program components explicit, asynchronous, and easily accessible for later changes. These components, or ANTs, acquire input data, process that data, and provide the results as output to be received by other ANTs, which remain suspended until the data they need becomes available. The programmer no longer needs to be concerned with scheduling or explicitly sequencing tasks, because the ANTs control kernel (ACK) automatically schedules the execution of any program component, or ANT, whenever the data it awaits becomes available for processing. There is no longer any need to sequentially order processes, and the computations proceed concurrently from the point of view of the programmer. In fact, they proceed in parallel on a multiprocessor or distributed server system up to Minimum {processors available, concurrency in the program}, which is one of Sullivan’s theorems. Dependencies of one part of the program on another are automatically respected, because no ANT can execute until the ANTs on which it depends have provided the data it needs to execute.
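
The scheduling idea described above can be illustrated with a minimal data-availability scheduler in C++. This is a hypothetical sketch, not the ACK: the Scheduler and Task names, and the add and supply operations, are invented for this example. It shows only the core idea that a component is queued for execution when, and only when, the data it awaits has arrived.

```cpp
// Hypothetical sketch of data-availability scheduling (NOT CHoPP's ACK).
// A task becomes runnable only when every input it waits on has arrived,
// so the programmer never specifies an execution order.
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

struct Task {
    int pending;                 // inputs not yet supplied
    std::function<void()> body;  // work to run once all inputs are present
};

class Scheduler {
    std::vector<Task> tasks;
    std::queue<std::size_t> ready;
public:
    // Register a task that waits for `inputs` pieces of data.
    std::size_t add(int inputs, std::function<void()> body) {
        tasks.push_back({inputs, std::move(body)});
        if (inputs == 0) ready.push(tasks.size() - 1);
        return tasks.size() - 1;
    }
    // Called when one awaited datum for task t becomes available.
    void supply(std::size_t t) {
        if (--tasks[t].pending == 0) ready.push(t);
    }
    // Run every task whose data has arrived, in data-driven order.
    void run() {
        while (!ready.empty()) {
            std::size_t t = ready.front();
            ready.pop();
            tasks[t].body();
        }
    }
};
```

In use, a consumer registered with two pending inputs runs only after its producers supply both, regardless of the textual order in which the tasks were registered.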

The efficiency of ANTs depends on the special properties of the ACK, which is at the heart of CHoPP’s proprietary technology. The ACK sits between the program, which may be written in any language (commonly C++ or JAVA), and any operating system (typically UNIX or NT). ANTs communicate with messages sent to mailboxes, and the ACK acts as postmaster. Some of the functions of the ACK are usually provided by conventional operating system kernels, but these have two shortcomings for ANTs. First, they are not efficient enough to manage a program broken into so many small parts, and second, neither current operating systems nor databases provide the single object oriented interface required for today’s programs. The efficiency issue can be understood by observing that the natural decomposition of a problem often produces numerous fine grain components. For example, a business transaction may spawn as many as 50 internal computing/database transactions or object references. A natural programming style would turn each such activity into an ANT. Conventional operating systems would associate each such component with a synchronous operating system process, or thread; scheduling and task switching overhead would then be incurred every time a thread or process is activated or suspended. The useful work done by these small grain components, say 50 instructions or so, would be very small compared to the thousands of instructions required for thread management.

By contrast, the ACK executes only 10 to 30 instructions of overhead when an ANT is activated. The overhead is so low that the programmer can ignore size considerations in partitioning a computation into its natural components, using segments as large or as small as the application or its algorithm dictates. The second reason the ACK must be substituted for certain operating system functions is that protocols for thread management are inconsistent with the view that ANTs are independent, encapsulated objects. Threads communicate via unrestricted shared memory that provides no tools for protecting private data or for structuring shared access. As a result, threads are notoriously difficult to use; they provide mechanisms for concurrent (and parallel) execution only at the expense of increased programming complexity. ANTs, which make the interaction between components explicit and asynchronous, provide an improvement in conventional encapsulation while permitting concurrent execution as a welcome side benefit.

ANTs are more efficient than threads for mathematical reasons, not simply because of coding tricks. An ANT program is a partially ordered set and therefore has more degrees of freedom in execution order than a deterministic set of threads. Because the thread control mechanism is used to manage hardware and I/O, fairness and prioritization are essential to it, and fairness is inherently blocking while prioritization is inherently centralized. Lock/Unlock and similar primitives are inherently blocking because any locked thread must place itself on a list and the unlocking thread must be able to restart it; one or the other must wait while the list is altered. Once the thread count is sufficiently high, blocking dominates execution time. ANTs are non-blocking. In exchange, ANTs cannot span multiple applications with a single mechanism, cannot operate in real time, and cannot assign priorities; they can only guarantee that all the work in a batch is completed before the computation halts. It is this simplification that allows ANTs to be non-blocking and to have distributed control.

Traditional programming techniques have difficulties when, as is usually the case, the event that determines a context switch occurs before the operations necessary to permit the switch are complete. The resulting flag variables inevitably lead to errors. An ANTs program makes the context switch immediately, and the ANT representing the new context waits for its data to be ready. This simplifies the code and removes the largest barrier to concurrent execution. The two technologies differ in six fundamental respects, as shown in the table below. ANTs operate in user space, whereas threads operate in kernel space; this lowers overhead and improves protection between processes. The operating system is not aware of ANTs; it deals with an ANTs program as if it were a single process.

ANTs impose no priorities and, unlike threads, are non-preemptive. Priorities can be established only by reference to a global view of execution, a view that determines some order relationship between threads, and such a global view is in conflict with the notion of independent execution. Priorities are appropriate in the execution of operating system calls, for example in the management of hardware resources; in these cases ANTs make OS calls and interface with threads in the usual ways. ANTs are encapsulated objects with protected code and data. Threads are not objects; they are kernel managed scheduling entities.

ANTs execute reentrant code and have private data that is not accessible from other ANTs. They communicate only by means of structured mailbox objects. These mailboxes provide the primary ANTs message passing control mechanism, because when ANTs access mailboxes they are suspended or continue execution according to the availability of data in the mailboxes. Threads, by contrast, communicate via shared memory, which provides no protection or structure for thread communication and is a significant source of error prone code.

The programmer sees the synchronization and coordination of ANTs as based entirely on data availability; ANTs execute as soon as the data they need becomes available. This data flow view appears natural to the programmer. Threads are synchronized and coordinated using Wait/Signal or Lock/Unlock primitives. Their role in the application program is unspecified and extrinsic to the application logic or algorithm, and is often manifested in obscure or opaque coding sequences. This contributes to the complexity and error proneness of thread usage. When an ANT is suspended and another starts to execute, certain housekeeping functions, an activity called task switching, must be carried out. The number of task switching instructions required averages between 10 and 30 for ANTs; the analogous operation for threads takes a thousand or more instructions. This distinction is at the heart of ANTs efficiency and also contributes to ANTs programmability, since such low task switching overhead allows the user to be indifferent to task granularity.

ANTs and Threads Compared

                          ANTs                     Threads
Execution Environment     User Space               Kernel Space
Priority Preemption       No Priority Preemption   Priority Based Scheduling
Encapsulation             Private Data and Code    None
Communication             Message Passing          Shared Memory
Synchronization Criteria  Data Availability        Lock/Unlock
Task Switching            10-30 Instructions       1000+ Instructions

ANTs technology originated as a programming technique for exploiting parallel processing in multiprocessor computer systems. Much of the early work on concurrent programming for parallel processors involved explicit mapping of the application onto the computer architecture by the programmer, including the discovery of new algorithms suitable for specific architectures. In many embedded applications, the parallel processor was even specifically designed to execute a given algorithm. From the earliest days of the CHoPP architecture, the design team sought automatic mapping technology to remove this burden from the programmer. During the past decade, while the CHoPP team has been developing ANTs as the software realization of its hardware architecture, multiprocessor operating systems have developed around the threads technology first used at Carnegie Mellon for Mach, which became the kernel of AIX. More recently, the JAVA programming language was developed with lightweight threads in an effort to reduce complexity.

Threads and ANTs both incorporate automatic mapping techniques that enable the programmer to create concurrent programs without reference to the specific hardware platform on which the application will run. ANTs cannot perform many of the real-time operating system functions for which threads were designed, and thus an ANTs program makes system calls to thread implemented functions whenever needed. On the other hand, threads are poorly adapted as a basis for a modern application programming system, because they introduce too much overhead, they defeat encapsulation, and they rely on a central control paradigm.

The original CHoPP goal of eliminating the manual mapping of programs onto multiprocessor hardware was achieved through ANTs by a significant reduction in the overhead required for intertask communication. To effectively exploit the parallel resources of a multiprocessor, it is necessary to express the application as a collection of tasks that can be distributed among the available processors. The natural decomposition of an application into interdependent cooperating tasks requires processors to communicate frequently. Over the past decade interprocessor communication has improved dramatically, and so has networked communication between distributed processors. In fact, processors in IBM SP Series systems connected by T3 lines are indifferent as to whether they communicate with another processor on the same Omega switch or over the network with a processor thousands of miles away. Today, the hardware time required for interprocess communication is about the same as an ordinary processor memory access on a cache miss. The software time required for threads to communicate, when they may potentially suspend, is two to three orders of magnitude higher than the hardware time. Thus threads are unsuited to communication intensive applications: they generate so much overhead that they cancel the benefits of parallel processing.

ANTs use the conventions and tools of object oriented programming technology to provide encapsulation, naming, and reuse. The initial implementation of ANTs was done in C++; although a runtime library and several precompiled files are required to program in ANTs, no special programming tools are necessary. A JAVA version of ANTs is in its final stages of testing and will soon be available in demonstration form from an ANTs Web site. ANTs provide the programmer with a small set of reusable object classes, foremost of which are the ANTclass and the Mailboxclass. The actual ANTs runtime environment is invisible to the user and entirely self-adjusting, executing object code on different platforms without user intervention. To gain the full performance benefit of the technology on a parallel processor, the programmer will need to give some attention to issues arising from the partial ordering of the ANT “colony,” or set of tasks.

Objects of the ANTclass have private data; all public data is restricted to that contained in Mailboxes. This data is accessed by communication instructions, which are method calls to the Mailboxclass. Objects of the ANTclass execute concurrently to the extent that processors are available to host them. Their order of execution is determined by the Mailboxclass methods invoked for interprocess communication and not by any syntactic ordering of the ANTs within the program. Unlike threads, ANTs operate in user space under the ACK in the ANTs runtime package.

The Mailbox object has several private states, two of which are Valid and Invalid. If the Mailbox state is Valid, the Mailbox contains a pointer to public data. The methods of the Mailboxclass are called communication instructions because these are the only way that ANT objects communicate with each other. Sample communication instructions for the C++ version are:

void Cmailbox::Send (void* pointername);   // places void pointer in mailbox and marks mailbox valid

void* Cmailbox::Receive (void);            // waits until mailbox is valid, then returns the pointer placed by Send

void Cmailbox::Reuse (void);               // invalidates mailbox

These functions allow the ANTs programmer to send mail to a particular ANT by storing a pointer in its mailbox, to receive mail by waiting until a particular mailbox is filled, and to clear a mailbox for later reuse.
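
To make the suspend-until-valid behavior concrete, here is a toy stand-in for Cmailbox built on standard C++ threading primitives. It is only an illustration of the programmer-visible semantics (the Valid/Invalid states and a Receive that waits for data); it is emphatically not CHoPP’s implementation, since the ACK achieves these semantics without this kind of OS-level blocking.

```cpp
// Toy illustration of the Cmailbox semantics described above, built on
// std::mutex and std::condition_variable. NOT CHoPP's implementation:
// the ACK provides the same semantics without OS-level blocking.
#include <cassert>
#include <condition_variable>
#include <mutex>

class Cmailbox {
    std::mutex m;
    std::condition_variable cv;
    void* data = nullptr;
    bool valid = false;                    // the Valid/Invalid state
public:
    void Send(void* p) {                   // store pointer, mark Valid
        { std::lock_guard<std::mutex> lk(m); data = p; valid = true; }
        cv.notify_one();                   // wake a suspended receiver
    }
    void* Receive() {                      // suspend until Valid
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this] { return valid; });
        return data;
    }
    void Reuse() {                         // mark Invalid for refilling
        std::lock_guard<std::mutex> lk(m);
        valid = false;
        data = nullptr;
    }
};
```

A receiver calling Receive on an Invalid mailbox simply stops until some sender calls Send, which is the entire synchronization model the programmer sees.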

Naturally this syntax is a bit different for the JAVA version of ANTs since the JAVA language does not support pointers. The JAVA versions are:

public class Mailbox {
    Mailbox (Class type);              // TypeDesc is always a reference
    void Send (type objectReference);  // in JAVA, Mailbox is strongly typed
    type Receive ();
    void Reuse ();
}

The JAVA Mailboxclass gives the programmer the same three functions without the use of pointers. The constructor of a Mailbox checks a list of available Mailbox types and creates the correct one; if the type does not exist, a new class is created and added to the list. Unlike in C++, a JAVA Mailbox may pass only a single type of object. The advantages of this approach are that all ANTs programs are strongly typed and no runtime type checking is necessary. The disadvantages are that it is necessary to subclass Mailbox (or use TypeDesc) to know the actual class of a mailbox, and that certain programs that create dynamic classes without end may see their symbol and vector tables grow, much like a C memory leak. Runtime type checking will still be done on lists of mailboxes because they are underspecified; JAVA casting rules make this necessary, and dynamic testing is too slow to be useful.
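
For comparison, the per-type strong typing that the JAVA version builds dynamically can be sketched in C++ with a template. This is an analogy invented for this discussion, not part of the ANTs API: each instantiation passes exactly one type, so no runtime type check is ever needed.

```cpp
// Hedged analogy only: a C++ template gives the per-type mailboxes the
// JAVA version creates dynamically. Receive here is non-blocking for
// brevity; in the real system the caller would suspend until Valid.
#include <cassert>
#include <optional>

template <typename T>
class TypedMailbox {
    std::optional<T> slot;                 // empty = Invalid, engaged = Valid
public:
    void Send(const T& value) { slot = value; }
    bool Valid() const { return slot.has_value(); }
    T Receive() const { return *slot; }    // precondition: Valid()
    void Reuse() { slot.reset(); }
};
```

Because the element type is fixed at compile time, a TypedMailbox<int> can never deliver anything but an int, which is the property the JAVA design obtains by generating one Mailbox class per type.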

Experiments with ANTs have shown a factor-of-four performance improvement without changing the code, other than adding ANTs wrappers to natural, easily recognized application program components. Another promising component based technology for distributed object computing has recently appeared as part of the greater JAVA phenomenon, and a comparison with ANTs is interesting.

JavaBeans is a portable, platform-independent component model written in the JAVA language and developed in collaboration with software industry leaders. It enables developers to write reusable components once and run them anywhere, taking advantage of the platform independent nature of JAVA technology and the object distribution capability of the Internet. JavaBeans acts as a bridge between proprietary component models and provides a seamless and powerful means for developers to build components that will run in ActiveX container applications, for example. The JavaBeans API provides a framework for defining reusable, embeddable, modular software components. The JavaBeans Specification defines a bean as a reusable software component that can be manipulated visually in a builder tool. Any object that conforms to certain basic rules can be a bean; there is no Bean class that all beans are required to subclass. A bean exports properties, events, and methods. A property is a piece of the bean’s internal state that can be set and queried by the programmer; a bean may generate events and may export methods as public methods defined by the bean. In addition to these common object features, a bean may have indexed properties, bound properties, and constrained properties. An indexed property has an array value for which the bean provides methods to get and set individual elements. A bound property sends out a notification when its value changes, and a constrained property does the same but also allows the change to be vetoed by listeners.

ANTs always adopt the component mechanism of their host language, so if a JAVA ANT is compiled as a JavaBean, it is componentized as a bean. Since all ANTs calls are virtual and JAVA supports only virtual calls, this makes no difference, except that JAVA cannot compile across a bean boundary and performance suffers there. The JAVA communication model is not scalable, but the use of ANTs technology fixes that problem; for example, a JAVA program compiled as ANTs ran four times faster than its non-ANTs version.

The reason ANTs simplify code is clarified by recalling the old 4GL arguments between the “template” school and the “rules” school. The template school permitted similar instances of code to be produced cookie-cutter style by filling out a form. The rules school created code from a list of rules specifying what ought to occur in any circumstance. The difficulty both schools faced was that they could only create 90% of a program, and worse, there was no means of going from the 3GL output of the methodology back to the original 4GL input. Any change to the 4GL wiped out the 3GL supplement that finished the program. The most important contribution of object oriented programming to productivity may be that it provides a reversible template technology. However, the objections of the rules side to templates are still valid: the implementation of formulas, tests for conditions, names of methods, and types of data are spread throughout the code.

Cataloging and altering these a posteriori code augmentations becomes virtually impossible in a large program, and the complex job of determining dependencies and order is still left to the programmer. ANTs add to object oriented programming the most important benefits of rules based programming: the ability to ignore ordering, the separation of rules from the rest of the program, and the removal of specific symbols from the object language source code. While earlier systems used sophisticated algorithms to accomplish these things, ANTs simplify matters by associating the execution of code with the Boolean states of data references; there was little else that the sophisticated systems could do anyway. Virtually all of the objections of the rules school to object technology are eliminated without resorting to preprocessors or arcane syntax, and without giving up a single benefit.

Mailboxes contain references to data, not the data itself. The structure of C++ and JAVA, the only intended target languages for ANTs, makes the cost of that indirection insignificant, since data is rarely passed by value in a real program. Since pointers and references differ only as to whether arithmetic is allowed, and since ANTs discourage address arithmetic, there is little practical difference.

JAVA performance has several significant bottlenecks. Those most often complained about will become moot with the advent of optimizing compilers for servers and frequently used applications. More difficult is the synchronization of threads. When a thread encounters a lock, it must test it and, if it is already locked, place itself on a wait list; to do that, it must lock the list. When a thread releases a lock, it, or its proxy, must check whether a thread is waiting to gain the lock. If the list is locked, only bad choices remain: the thread can spin on the lock, or it can suspend, leaving the next thread wanting the lock waiting. If the locked-out thread finds the list locked, it must suspend and wait. Many implementations use a third thread to maintain the list. Regardless of the strategy, a reliable system is computationally costly. With a reasonably large number of preemptively multi-tasked threads, synchronization comes to dominate execution time. For example, if a list remains locked for a microsecond and the preemptive time slice is ten milliseconds, then one time in ten thousand the thread locking the list will be preempted while holding the list lock. Because a preemption usually lasts much longer than the time slice, a busy list will almost always be found locked by a preempted thread.
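
The arithmetic in that example can be checked directly. With the round numbers used above (a one-microsecond list-lock hold and a ten-millisecond preemptive slice, both hypothetical illustration values), the chance that any given hold is cut short by preemption is simply the ratio of the two durations:

```cpp
// Worked version of the example in the text: the probability that a time
// slice ends while the wait list is locked is hold time / slice length.
#include <cassert>

double preemption_odds(double hold_us, double slice_us) {
    return hold_us / slice_us;   // both durations in microseconds
}
```

With hold_us = 1 and slice_us = 10000 (ten milliseconds) this gives 1/10000, the one-in-ten-thousand figure quoted above.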

When the odds are high that a thread will pause for a lock, the time for a suspend/resume pair must be included in the cost of the lock. When two or more threads are polling for a place on the wait list, and the odds are low of getting in, many times the cost of a suspend/resume pair can be incurred. The non-blocking structure of ANTs fixes this unfortunate situation, which often occurs in large distributed object information systems.

Because JAVA disallows all of the performance enhancing hacks that C programmers use to make elaborate control programs work (like table driven state machines, which are rather slow without pointer arithmetic), JAVA tends to have performance problems with parsers and formatters, for example. ANTs make most of these programs run faster than their C equivalents by replacing the table entries with ANTs waiting on mailboxes. Because JAVA is structured as a call-return, synchronous language, distributed systems must either use some programmer-devised optimization or fail from network delay. ANTs are inherently asynchronous and naturally exploit the concurrency of a program to moderate latency.
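
For readers unfamiliar with the C idiom being referenced, here is a small generic example of a table-driven state machine (an illustration of the idiom, not ANTs code): the control logic lives in a transition table rather than in nested branches, and it is exactly such table entries that ANTs replace with waits on mailboxes.

```cpp
// Generic table-driven state machine, the C-style idiom mentioned in the
// text (not ANTs code): a word counter whose control logic is a table.
#include <cassert>

enum State { START, IN_WORD, NSTATES };
enum Cls { LETTER, SPACE, NCLS };

// transition[current state][input class] -> next state
const State transition[NSTATES][NCLS] = {
    /* START   */ {IN_WORD, START},
    /* IN_WORD */ {IN_WORD, START},
};

// Count words: a word begins on every START -> IN_WORD transition.
int count_words(const char* s) {
    State st = START;
    int words = 0;
    for (; *s != '\0'; ++s) {
        Cls c = (*s == ' ') ? SPACE : LETTER;
        State next = transition[st][c];
        if (st == START && next == IN_WORD) ++words;
        st = next;
    }
    return words;
}
```

Adding a new state or input class means adding a table row or column rather than rewriting control flow, which is why C programmers reach for this structure in parsers and formatters.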

A principal goal of ANTs from the beginning was to reduce the need to map a program onto the architecture of the processor that would execute it. In today’s business computing environment, that translates into avoiding API sets. Programs that must operate on various machines have a difficult time dealing with variations in their threading models, event models, message queuing, and other system functions. Even systems that support only a single operating system face problems scaling to clusters, distributing computation, handling network latency, and properly packaging system calls. ANTs permit a single code base that can run without alteration on a single computer, on clusters, on distributed networks, and on large parallel processors. Examples of large scale enterprise servers particularly well suited to the ANTs programming model are the IBM SP Series and the Unisys Aquanta 32-processor enterprise server, both of which can be clustered into sets of 4-8 processor groups.

Writing for a cluster usually involves mapping functions onto specific processors and contending with a difficult API. There are several reasons for this: the call-return model does not start work on data until it is needed (the function that creates the data is called as a request for the data); synchronization is performed by event notification, which is a system function; event notification across OS instances is accomplished by a different mechanism than within the OS; and migration of computation from one node to another is difficult or impossible. Moreover, Object Oriented Programming (OOP) does not provide enough encapsulation to hide machine boundaries, and asynchronous communication is provided by an external entity (the message queue), which is slow and clumsy.

ANTs employ asynchronous communication as their basic primitive. Which instance of the OS in a cluster an ANT actually runs on is a matter of considerable indifference to ANTs. ANTs are linked to the list of runnable tasks by the arrival of all requested data and, therefore, are neither notified nor polled. Because no distinction is made about where an ANT runs, the cluster API is not a factor. Because ANTs automatically load-balance in a cluster, no special mapping is required. Because ANTs do not rely on the system for synchronization, distributed and dissimilar systems are no problem. (Special harnesses for fault tolerance have been designed, but not implemented.) Hand-mapping synchronously communicating processes does not work very well; ANTs are a significant improvement. Windows NT and, to a lesser degree, UNIX are famous for their system inefficiency. ANTs dramatically reduce the role of the operating system, improving efficiency and scalability. Thread management overhead is lowered by 90% using ANTs and can be distributed. This is enough to make NT scale to very large parallel systems.
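The scheduling rule described here, in which a task becomes runnable when its last requested datum arrives, can be sketched as follows. The class and method names are illustrative, not the ANTs implementation: the task counts down its missing inputs, and the delivery of the final one links it to the run queue, so nothing ever notifies or polls it.

```java
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Illustrative sketch (not the ANTs implementation): a task is linked to
// the run queue by the arrival of its last requested input.
class DataflowTask {
    private final AtomicInteger missing;
    private final Object[] inputs;
    private final Consumer<Object[]> body;
    private final Queue<DataflowTask> runQueue;

    DataflowTask(int arity, Consumer<Object[]> body, Queue<DataflowTask> runQueue) {
        this.missing = new AtomicInteger(arity);
        this.inputs = new Object[arity];
        this.body = body;
        this.runQueue = runQueue;
    }

    // Deliver one input; the arrival of the last one enqueues the task.
    void deliver(int slot, Object value) {
        inputs[slot] = value;
        if (missing.decrementAndGet() == 0) {
            runQueue.add(this);
        }
    }

    void run() { body.accept(inputs); }
}
```

Note that the task never blocks waiting for inputs; until the count reaches zero it simply does not exist on the run queue, which is what makes the scheme indifferent to where each task runs.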

The call-return model of programming embeds the name of the function called in the calling code; ANTs embed the name of the desired value. Additionally, the name is local to a component, and, therefore, components can be interchanged without altering the source of other components. Rules are easy to add and modify, because “whenever Event X happens” is the fundamental primitive of ANTs and is usually the trigger for rules. Call-return programs must embed the notification code in the calling sequences, however many of them there may be, and that makes change difficult.
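A minimal sketch of this rule style, with hypothetical names rather than the ANTs API: code names the event it cares about, never the function to be called, so new rules attach without editing any calling sequence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative sketch (hypothetical names, not the ANTs API): rules are
// keyed by event name, so adding a rule never touches the code that
// announces the event.
class RuleBoard {
    private final Map<String, List<Consumer<Object>>> rules = new HashMap<>();

    // "Whenever Event X happens, run this rule."
    void whenever(String event, Consumer<Object> rule) {
        rules.computeIfAbsent(event, k -> new ArrayList<>()).add(rule);
    }

    // The announcing code names only the event, not any function.
    void happened(String event, Object payload) {
        for (Consumer<Object> rule : rules.getOrDefault(event, List.of())) {
            rule.accept(payload);
        }
    }
}
```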

The reliable superclassing of objects makes JAVA an ideal language for forms routing systems (because stations may need only the functionality of the superclass, and keeping the class description of every form on every station is a management problem), but the absence of useful synchronization mechanisms in JAVA thwarts most efforts in this direction. ANTs provide robust synchronization at this global level, reduce the cost of network latency, and improve component encapsulation.

Large online applications with routing requirements for many forms and, similarly, directed-map simulations are notoriously hard to parallelize. ANTs make it easy because ANTs programs are partially ordered directed maps. Many common instruction lists, such as recipes, are partially ordered directed maps. For instance, it is necessary to break the eggs before separating them, but it is not necessary to break them before sifting the flour, or vice versa. It is necessary to do all three before mixing the flour and the eggs. It is interesting to note that cooking a meal is often used as an everyday example of parallel processing, even though the recipes or instructions for preparing the meal’s components or dishes are procedural rather than descriptive. With complex sets of instructions, it is often difficult to find an order for the instructions that will not halt them before completion. ANTs naturally express their latent partial ordering and the directed map. An ANTs program will not halt before completion if any path completes, regardless of whether the programmer recognizes the proper path or whether there is even a consistent set of rules for recognizing it. This simplifies the job of writing the program and can improve performance by a factor of a hundred by eliminating a complicated central control. In addition, the program will run at maximum concurrency on a multiprocessor without modification.
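The recipe above can be written down as a partially ordered task set. This sketch uses illustrative names, not ANTs code: each step lists only its prerequisites, any execution order that respects those arrows completes, and no central controller picks a particular order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch (not ANTs code): steps declare prerequisites only;
// the partial order, not a central controller, constrains the schedule.
class RecipeGraph {
    private final Map<String, List<String>> prereqs = new HashMap<>();
    private final List<String> done = new ArrayList<>();

    void step(String name, String... after) {
        prereqs.put(name, List.of(after));
    }

    // Run steps in any order consistent with the partial order; for an
    // acyclic graph, every pass completes at least one step.
    List<String> run() {
        while (done.size() < prereqs.size()) {
            for (Map.Entry<String, List<String>> e : prereqs.entrySet()) {
                if (!done.contains(e.getKey()) && done.containsAll(e.getValue())) {
                    done.add(e.getKey());
                }
            }
        }
        return done;
    }
}
```

The "break eggs" and "sift flour" branches are unordered with respect to each other, so on a multiprocessor they could proceed concurrently; "mix" waits for both.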

Lawson Software has the first-release C++ version of ANTs installed on a ten-processor Aquanta XR/6 NT server in its Research & Development benchmark laboratory. The technology is being evaluated as a means of improving performance relative to the strict preemptive multi-threading of the NT operating system. We have an application written in JAVA and look forward to the forthcoming JAVA version of ANTs to evaluate performance improvements over JAVA’s lightweight but still preemptive threading.

ANTs are an enabling technology not only for parallel processing but for distributed object oriented systems as well. Since ANTs are language and operating system independent (implemented in both C++ and JAVA for NT, UNIX and any JVM), the components of a distributed object based application can be assembled from objects written in either language (or any other, with JAVA wrappers), running on any OS that supports them or on any JVM.

One might ask why anyone would want such a complex distributed application platform. The answer is that nobody does. Such systems are usually “received structures” and must be operated as effectively as possible until they can be replaced by more highly integrated successors. It is unusual but not unknown for customers of an ISV providing cross-industry applications on multiple platforms to interoperate software suites on AS/400 servers for manufacturing, UNIX servers for CAD/CAM and perhaps supply chain, and NT servers for European subsidiaries and some corporate functions like HR. As these applications migrate to object technology with increasing functional and geographic distribution and ever more complex interoperability, software tools like ANTs will enable dramatic performance increases over preemptive threading technology.

Our experience in the development and application of parallel processing over the past thirty years, and in distributed object oriented systems over the past ten, tells us that ANTs represent a significant breakthrough technology with a very promising future in the Age of JAVA. We have long known that the real world operates concurrently; physicists tell us that time is all that keeps everything from happening at once. The object paradigm offers an opportunity to map real-world business objects, such as invoices, into computer-stored, manipulated and distributed electronic objects in a natural way. ANTs allow us to escape the constraints of the operating system’s built-in deterministic, sequential ordering and to map onto our computer models the partially ordered real-world activities that make up actual business processes.

The author wishes to acknowledge many helpful discussions with Clifford Hersh, Vice President for Product Development at CHoPP Computer Corporation, and with Dr. Richard Wright, recently of Lawson Software, now with The Pitts Library at Emory University, Atlanta, GA.

Peter Patton was formerly Vice President of International Operations for the Lawson Software group, based in Minnesota. Patton has held numerous prestigious high-technology positions, including chief information officer (CIO) of the University of Pennsylvania and lead IT consultant to the National Technology Transfer Center, NASA’s Jet Propulsion Laboratory, and Dun and Bradstreet’s DataQuest Division. He was founding director of the Minnesota Supercomputer Institute, vice president and director of Parallel Processing Architecture at MCC, and director of computing at the University of Minnesota. He is the author of 16 books and 86 journal articles. Patton holds a Harvard A.B. in Engineering and Applied Physics, an M.A. in Mathematics from the University of Kansas, and a Ph.D. in Aerospace Engineering, summa cum laude, from the University of Stuttgart in Germany.

For more information, see

