ANTS TECHNOLOGY MAY IMPROVE COMPUTER PROGRAMMING

July 28, 2000

by Peter C. Patton, Ph.D.

San Diego, CA — CHoPP, for Columbia Homogeneous Parallel Processor, is a parallel processor architecture developed by Herbert Sullivan at Columbia University in New York in the 1970s. Sullivan used the acronym CHoPP as the name of a venture capital corporation he founded to fund a highly parallel supercomputer based on the CHoPP architecture and the patents that resulted from his work at Columbia. He intended the supercomputer to take advantage of concurrency at all levels of granularity in a computer program, even in existing programs written for sequential or vector machines. A prototype was built at the firm's facility in San Diego, California, and the first model was sold to the San Diego Supercomputer Center. The machine was never delivered, owing to difficulty raising capital, but the prototype demonstrated that the basic concepts of the CHoPP architecture were effective and could indeed be implemented in computer hardware.

The firm was reduced to a small consulting and design staff that became involved in a number of government-sponsored studies and design projects revolving around parallel processing. During this period Sullivan came to realize that there was very little difference in the performance of the various shared memory architectures: average latency grows roughly with the logarithm of the number of processors, and processing power grows almost linearly until the number of processors reaches roughly half the program's average concurrency. Increasing the concurrency of the programs mattered far more than any particular architecture. Much of the uniqueness of the CHoPP architecture, particularly the patented techniques for exploiting latent concurrency in existing programs, could also be implemented in user-level control software. The increasing use of distributed computing models, such as MPP and, more importantly, networks, led CHoPP to concentrate on using increased concurrency to reduce the effect of network latency on performance, in addition to the original goal of scaling nodes to hundreds or thousands of processors. The original patents were broad enough to cover this unforeseen application to software. Recent developments at CHoPP and the growing availability of software patents, however, have led the firm's management to file specific software patents covering a software development technology called ANTs, for Asynchronous Non-preemptive Tasks; CHoPP has now applied for a patent on ANTs as a non-preemptive threading technology.

The people at CHoPP realized that, while the commercial software market was having difficulty scaling beyond a handful of processors, CHoPP had technology that permitted scaling into the thousands of processors. While the scalability and robustness of asynchronous, stateless communication have been made obvious by the success of the World Wide Web, its application to general purpose computing has been hidden by the overhead of existing methods. CHoPP has been using a low overhead mechanism for asynchronous distributed control for some time. The thirty-year history of this technology has been distilled into a simple and elegant set of primitive operations that are object oriented, easy to use, and compatible with existing software systems. CHoPP now offers this environment for sale as the ANTs Software System, available in C++ with a Java version to follow. ANTs will also be available as part of MinimalJ, the most secure and portable client ever developed.

When Herb Sullivan first told me about the software implementation of his CHoPP technology, it reminded me of a termite colony, and I told him so. The mystery of a termite colony arises from the number of bits of information required to describe the architecture of a termite mound, the six-foot-high structure commonly seen on the African savanna. The information required to design the mound exceeds what any individual termite's brain can hold, so where does the colony store it? Herb's response was that, of course, the architecture is not stored explicitly: each termite knows only its preprogrammed function, and they all work together in a non-preemptive multitasking environment, each doing its own thing, which in sum is the work of the colony. He added that ants do the same sorts of things as termites but do them in daylight, and are beautiful, shiny black or red creatures rather than ugly, pasty white or semi-transparent ones, so he would rather name the technology after ants. Moreover, he pointed out, the acronym works out much better.

ANTs are a programming technology for decomposing large, complex software systems into independent components, thereby making them more reliable, more efficient, and easier to maintain. The technology achieves these results because, like the ant or termite colony, it distributes control throughout the computation. An ANTs computation is a collection of essentially independent component sub-programs whose interactions are described within the components themselves and do not depend on a central control structure. ANTs execute concurrently, appearing to process data simultaneously, and interact only when they explicitly exchange data. Thus ANTs computations are concurrent computations in which all components appear to execute simultaneously unless they are waiting for data to be provided by another ANT. An inherent property of ANTs computations is that they execute in parallel on multiprocessor or distributed hardware. The benefit of the ANTs programming technique can be seen from two perspectives: ANTs simplify programming by eliminating the need for central control structures, and they enhance efficiency by effectively exploiting the parallel resources of a multiprocessor.

The computer architect has always faced two challenges: improving computational efficiency, by avoiding waste of time or space, and reducing complexity, by logical simplification and reduction of system component count. In terms of computational efficiency, progress has been made in two key areas. Communication between software and hardware has improved through advances in operating systems and threading technology, and efficiency has been further improved through the use of multiple processors, which can cooperate to complete a given computation in minimum time. A computer program's resource demands are satisfied by calls to an operating system. Traditionally, hardware resources are assigned to competing processes (which may or may not be parts of one or more concurrent programs) by a technique called multitasking or multithreading. This takes place in the twilight zone between the application program and the assignment of resources by the operating system on which it depends. Today's state-of-the-art technology in this area is synchronous, preemptive multithreading: when an application asks the operating system to accomplish a task, the system opens a thread and locks it, assigning the requested task to that thread. The thread remains locked until the task is complete or the operating system decides some more important process needs that thread's assigned resources. These multiple preemptive threads are then scheduled by a single thread responsible only for scheduling all the others.

In conventional procedural programming practice, the control program specifies the sequence in which the component parts of a computation are executed. Normally, this sequence ensures that program segments do not execute until the data they need is available, and that the appropriate segments execute in time to provide data before it is needed for the computation to progress. As computations become larger and more complex, control structures become more complex as well and limit the size and effectiveness of the application. Complexity rises exponentially with the number of function points in an application program; at 1,000 or more function points, complexity usually becomes overwhelming and the application is abandoned. While systems this large do not necessarily require more bits than the number of neurons in a single human brain, they do seem to be well beyond the ability of a single application architect to describe, or to predict the real-world performance of the application system.

In software engineering, the usual method for solving complexity issues has been to break complex problems down into a number of simpler, more manageable parts. Unfortunately, the programmer must then also undertake the coordination of the various components of the program; there has been no technique for separating the central control mechanism and encapsulating it into the program’s components. In fact, the control structure becomes either highly centralized in an explicit way or interwoven implicitly into the fabric of the application program. The result is that the parts of the program are no longer interchangeable or amenable to independent upgrades, and the entire program must be analyzed when local changes or maintenance activities are undertaken. The advantages of breaking up the program into quasi-independent segments are undermined by centralized or embedded control structures.

Because of its central control structure, preemptive threading technology may actually impede the effort to reduce complexity. ANTs solve this problem by eliminating the central control structure and making the interaction between program components explicit, asynchronous, and easily accessible for later changes. These components, or ANTs, acquire input data, process that data, and provide the results as output to be received by other ANTs, which remain suspended until the data they need becomes available. The programmer no longer needs to be concerned with scheduling or explicitly sequencing tasks, because the ANTs control kernel (ACK) automatically schedules the execution of any ANT whenever the data it awaits becomes available. There is no longer any need to order processes sequentially, and from the programmer's point of view the computations proceed concurrently. In fact, on a multiprocessor or distributed server system they proceed in parallel, with usable parallelism equal to min{processors available, concurrency in the program}, which is one of Sullivan's theorems. Dependencies of one part of the program on another are automatically respected, because no ANT can execute until the ANTs on which it depends have provided the data it needs to execute.
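
For illustration of Sullivan's theorem: a program whose average concurrency is 16 can keep at most 16 processors of a 64-processor machine busy, while on an 8-processor machine all 8 stay busy. Adding processors beyond the program's concurrency buys nothing, and concurrency beyond the processor count merely queues work.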

The efficiency of ANTs depends on the special properties of the ACK, which is at the heart of CHoPP's proprietary technology. The ACK sits between the program, which may be written in any language (commonly C++ or JAVA), and any operating system (typically UNIX or NT). ANTs communicate with messages sent to mailboxes, and the ACK acts as postmaster. Some of the functions of the ACK are usually provided by conventional operating system kernels, but these have two shortcomings for ANTs: they are not efficient enough to manage a program broken into so many small parts, and neither operating systems nor databases currently provide the single object-oriented interface required for today's programs. The efficiency issue can be understood by observing that the natural decomposition of a problem often produces numerous fine-grain components. For example, a business transaction may spawn as many as 50 internal computing/database transactions or object references. A natural programming style would turn each such activity into an ANT. A conventional operating system would associate each such component with a synchronous operating system process, or thread, and scheduling and task-switching overhead would then be incurred every time a thread or process is activated or suspended. The useful work done by these small-grain components, say 50 instructions or so, would be very small compared to the thousands of instructions required for thread management.
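
To put rough numbers on this (an illustrative calculation from the figures cited in this article, not a benchmark): if a component performs 50 instructions of useful work and a thread switch costs on the order of 1,000 instructions, useful work is 50/(50 + 1,000), or under 5 percent of the cycles consumed; with the ACK's 10- to 30-instruction activation described below, it is 50/(50 + 30), or over 60 percent.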

By contrast, the ACK executes 10 to 30 instructions of overhead when an ANT is activated. The overhead is so low that the programmer can ignore size considerations in partitioning a computation into its natural components, using segments as large or as small as the application or its algorithm dictates. The second reason the ACK must be substituted for certain operating system functions is that the protocols for thread management are inconsistent with the view of ANTs as independent, encapsulated objects. Threads communicate via unrestricted shared memory, which provides no tools for protecting private data or for structuring shared access. As a result, threads are notoriously difficult to use; they provide mechanisms for concurrent (and parallel) execution only at the expense of increased programming complexity. ANTs, which make the interaction between components explicit and asynchronous, improve on conventional encapsulation while permitting concurrent execution as a welcome side benefit.

ANTs are more efficient than threads for mathematical reasons, not simply because of coding tricks. An ANTs program is a partially ordered set and therefore has more degrees of freedom in execution order than a deterministic set of threads. Because the thread control mechanism is used to manage hardware and I/O, fairness and prioritization are essential to it; fairness is inherently blocking, and prioritization is inherently centralized. Lock/Unlock and similar primitives are inherently blocking because any locked thread must place itself on a list and the unlocking thread must be able to restart it, so one or the other must wait while the list is altered. Once the thread count is sufficiently high, blocking dominates execution time. ANTs are non-blocking. The trade-offs are that ANTs cannot span multiple applications with a single mechanism, cannot operate in real time or assign priorities, and can only guarantee that all the work in a batch is completed before the program halts. This simplification is what allows ANTs to be non-blocking and to have distributed control.

Traditional programming techniques have difficulties when, as is usually the case, the event that determines a context switch occurs before the operations necessary to permit the switch are complete. The flag variables that result inevitably lead to errors. An ANTs program makes the context switch immediately, and the ANT representing the new context simply waits for its data to become ready. This simplifies the code and removes the largest barrier to concurrent execution. The two technologies differ in six fundamental respects, as shown in the table below. ANTs operate in user space, whereas threads operate in kernel space; this lowers overhead and improves protection between processes. The operating system is not aware of ANTs; it deals with an ANTs program as if it were a single process.

ANTs impose no priorities and, unlike threads, are non-preemptive. Priorities can be established only by reference to a global view of execution, a view that imposes some order relationship between threads, and such a global view conflicts with the notion of independent execution. Priorities are appropriate in the execution of operating system calls, for example in the management of hardware resources; in these cases ANTs make OS calls and interface with threads in the usual ways. ANTs are encapsulated objects with protected code and data; threads are not objects but kernel-managed scheduling entities.

ANTs execute reentrant code and have private data that is not accessible from other ANTs. They communicate only by means of structured mailbox objects. These mailboxes provide the primary ANTs message-passing control mechanism: when ANTs access mailboxes, they are suspended or continue execution according to the availability of data in the mailboxes. Threads, by contrast, communicate via shared memory, which provides no protection or structure for thread communication and is a significant source of error-prone code.

The programmer sees the synchronization and coordination of ANTs as based entirely on data availability: an ANT executes as soon as the data it needs becomes available. This data flow view appears natural to the programmer. Threads, in contrast, are synchronized and coordinated using Wait/Signal or Lock/Unlock primitives, whose role in the application program is unspecified and extrinsic to the application logic or algorithm, and often buried in obscure or opaque coding sequences. This contributes to the complexity and error proneness of thread usage. When an ANT is suspended and another starts to execute, certain housekeeping functions, an activity called task switching, must be carried out. Task switching averages between 10 and 30 instructions for ANTs; the analogous operation for threads takes a thousand or more instructions. This distinction is at the heart of ANTs efficiency and also contributes to ANTs programmability, since such low task-switching overhead allows the user to be indifferent to task granularity.

ANTs and Threads Compared

                          ANTs                      Threads
Execution Environment     User Space                Kernel Space
Priority Preemption       No Priority Preemption    Priority-Based Scheduling
Encapsulation             Private Data and Code     None
Communication             Message Passing           Shared Memory
Synchronization Criteria  Data Availability         Lock/Unlock
Task Switching            10-30 Instructions        1000+ Instructions

ANTs technology originated as a programming technique for exploiting parallel processing in multiprocessor computer systems. Much of the early work on concurrent programming for parallel processors involved explicit mapping of the application onto the computer architecture by the programmer, including discovering new algorithms suited to specific architectures; in many embedded applications the parallel processor was even designed specifically to execute a given algorithm. From the earliest days of the CHoPP architecture, the design team sought automatic mapping technology to remove this burden from the programmer. During the past decade, while the CHoPP team has been developing ANTs as the software realization of its hardware architecture, multiprocessor operating systems have developed around the threads technology first used at Carnegie Mellon for Mach, which became the kernel of AIX. More recently, the JAVA programming language was developed with lightweight threads in an effort to reduce complexity.

Threads and ANTs both incorporate automatic mapping techniques that enable the programmer to create concurrent programs without reference to the specific hardware platform on which the application will run. ANTs cannot perform many of the real-time operating system functions for which threads were designed, and thus an ANTs program makes system calls to thread implemented functions whenever needed. On the other hand, threads are poorly adapted as a basis for a modern application programming system, because they introduce too much overhead, they defeat encapsulation, and they rely on a central control paradigm.

The original CHoPP goal of eliminating the manual mapping of programs onto multiprocessor hardware was achieved in ANTs through a significant reduction in the overhead required for intertask communication. To exploit the parallel resources of a multiprocessor effectively, it is necessary to express the application as a collection of tasks that can be distributed among the available processors. The natural decomposition of an application into interdependent, cooperating tasks requires those tasks, and hence the processors hosting them, to communicate frequently. Over the past decade interprocessor communication has improved dramatically, and so has networked communication between distributed processors; in fact, processors in IBM SP Series systems connected by T3 lines are indifferent as to whether they communicate with a neighbor on the same Omega switch or over the network with a processor thousands of miles away. Today the hardware time required for interprocessor communication is about the same as an ordinary memory access on a cache miss. The software time required for threads to communicate, when they may suspend, is two to three orders of magnitude higher than the hardware time. Threads are thus unsuited to communication-intensive applications: they generate so much overhead that they cancel the benefits of parallel processing.

ANTs use the conventions and tools of object-oriented programming to provide encapsulation, naming, and reuse. The initial implementation of ANTs was done in C++; although a runtime library and several precompiled files are required to program with ANTs, no special programming tools are necessary. A JAVA version of ANTs is in its final stages of testing and will soon be available in demonstration form from an ANTs Web site. ANTs provide the programmer with a small set of reusable object classes, foremost of which are the ANTclass and the Mailboxclass. The actual ANTs runtime environment is invisible to the user and entirely self-adjusting, since it executes object code on different platforms without user intervention. To gain the full performance benefit of the technology on a parallel processor, the programmer need only give some attention to issues arising from the partial ordering of the ANT "colony," or set of tasks.

Objects of the ANTclass have private data; all public data is restricted to that contained in Mailboxes. This data is accessed by communication instructions, which are method calls to the Mailboxclass. Objects of the ANTclass execute concurrently to the extent that processors are available to host them. Their order of execution is determined by the Mailboxclass methods invoked for interprocess communication, not by any syntactic ordering of the ANTs within the program. Unlike threads, ANTs operate in user space under the ACK in the ANTs runtime package.

The Mailbox object has several private states, two of which are Valid and Invalid. If the Mailbox state is Valid, the Mailbox contains a pointer to public data. The methods of the Mailboxclass are called communication instructions because they are the only way that ANT objects communicate with each other. The communication instructions of the C++ version are:

void  Cmailbox::Send (void* pointername);  // places void pointer in mailbox and marks mailbox valid
void* Cmailbox::Receive (void);            // waits until mailbox is valid, then returns the pointer placed by Send
void  Cmailbox::Reuse (void);              // invalidates mailbox

These functions allow the ANTs programmer to send mail to a particular ANT by storing a pointer in its mailbox, to receive mail by waiting until a particular mailbox is filled, and to clear a mailbox for later reuse.
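
To make the calling contract concrete, here is a minimal, self-contained sketch. The Cmailbox implementation below is a toy emulation built on standard C++ threads, purely so the example runs; it is an assumption on my part, not CHoPP's ACK, which schedules ANTs non-preemptively in user space rather than with kernel threads and locks. The producer/consumer ANT names are hypothetical.

#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

class Cmailbox {                                  // toy stand-in, not the CHoPP class
public:
    void Send(void* p) {                          // place pointer, mark mailbox valid
        std::lock_guard<std::mutex> lock(m);
        ptr = p;
        valid = true;
        cv.notify_one();
    }
    void* Receive() {                             // suspend caller until mailbox is valid
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return valid; });
        return ptr;
    }
    void Reuse() {                                // invalidate mailbox for the next message
        std::lock_guard<std::mutex> lock(m);
        valid = false;
        ptr = nullptr;
    }
private:
    std::mutex m;
    std::condition_variable cv;
    void* ptr = nullptr;
    bool valid = false;
};

Cmailbox results;                                 // mailbox shared by the two ANTs
double total = 42.0;                              // data the producer will mail

void producerAnt() { results.Send(&total); }      // one ANT mails a pointer to its result

void consumerAnt() {                              // another ANT suspends in Receive
    double* t = static_cast<double*>(results.Receive());
    std::printf("total = %g\n", *t);
    results.Reuse();                              // clear the mailbox for later reuse
}

int main() {
    std::thread consumer(consumerAnt);            // start order is irrelevant:
    std::thread producer(producerAnt);            // Receive waits until Send has run
    producer.join();
    consumer.join();
    return 0;
}

Whichever routine runs first, the consumer cannot proceed past Receive until the producer's Send marks the mailbox valid, which is exactly the data-availability synchronization described above.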

Naturally, the syntax is a bit different in the JAVA version of ANTs, since the JAVA language does not support pointers. The JAVA version is:

public class Mailbox {
    Mailbox (Class type);              // TypeDesc is always a reference
    void Send (type objectReference);  // in JAVA, Mailbox is strongly typed
    type Receive ();
    void Reuse ();
}

The JAVA Mailboxclass gives the programmer the same three functions without the use of pointers. The constructor of a Mailbox checks a list of available Mailbox types and creates the correct one; if it does not exist, a new class is created and added to the list. Unlike in C++, a JAVA Mailbox may pass only a single type of object. The advantages of this approach are that all ANTs programs are strongly typed and no runtime type checking is necessary. The disadvantage is that it is necessary to subclass Mailbox (or use TypeDesc) to know the actual class of a mailbox. Certain programs that create dynamic classes without end may find their symbol and vector tables growing, much like a C memory leak. Runtime type checking will still be done on lists of mailboxes, because such lists are underspecified; JAVA casting rules make this necessary, and dynamic testing is too slow to be useful.
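
For readers following the C++ sketches, a rough analogue of this strong typing (an illustration, not CHoPP's API) is a template wrapper over the toy Cmailbox defined earlier, which moves the type check from run time to compile time:

template <class T>
class TypedMailbox {                   // compile-time typed, like the JAVA Mailbox
public:
    void Send(T* obj)  { raw.Send(obj); }
    T*   Receive()     { return static_cast<T*>(raw.Receive()); }
    void Reuse()       { raw.Reuse(); }
private:
    Cmailbox raw;                      // toy untyped mailbox from the earlier sketch
};

// Usage, with a hypothetical Invoice class:
//   TypedMailbox<Invoice> invoices;   // only Invoice* can pass through this mailbox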

Experiments with ANTs have shown a factor-of-four performance improvement without changing the code, other than adding ANTs wrappers to natural, easily recognized application program components. Another promising component-based technology for distributed object computing, JavaBeans, has recently appeared as part of the greater JAVA phenomenon, and a comparison with ANTs is instructive.

JavaBeans is a portable, platform-independent component model written in the JAVA language and developed in collaboration with software industry leaders. It enables developers to write reusable components once and run them anywhere, taking advantage of the platform-independent nature of JAVA technology and the object distribution capability of the Internet. JavaBeans acts as a bridge between proprietary component models and provides a seamless and powerful means for developers to build components that run in, for example, ActiveX container applications. The JavaBeans API provides a framework for defining reusable, embeddable, modular software components. The JavaBeans specification defines a bean as a reusable software component that can be manipulated visually in a builder tool. Any object that conforms to certain basic rules can be a bean; there is no Bean class that all beans are required to subclass. A bean exports properties, events, and methods. A property is a piece of the bean's internal state that can be set and queried by the programmer; a bean may generate events; and a bean may export methods as public methods it defines. In addition to these common object features, a bean may have indexed properties, bound properties, and constrained properties. An indexed property has an array value for which the bean provides methods to get and set individual elements. A bound property sends out a notification when its value changes; a constrained property does the same but also allows the change to be vetoed by listeners.

ANTs always adopt the component mechanism of their host language, so a JAVA ANT compiled as a JavaBean is componentized as a bean. Since all ANTs calls are virtual and JAVA supports only virtual calls, this makes little difference, except that JAVA cannot compile across a bean boundary, so performance suffers. The JAVA communication model is not scalable, but ANTs technology fixes that problem; for example, a JAVA program compiled as ANTs ran four times faster than its non-ANTs version.

The reason ANTs simplify code is clarified by the old 4GL argument between the "template" school and the "rules" school. The template school permitted similar instances of code to be produced cookie-cutter style by filling out a form; the rules school generated code from a list of rules specifying what ought to occur in any circumstance. The difficulty both schools faced was that they could create only 90 percent of a program and, moreover, offered no way to go from the 3GL output of the methodology back to the original 4GL input: any change to the 4GL wiped out the 3GL supplement that finished the program. The most important contribution of object-oriented programming to productivity may be that it provides a reversible template technology. The objections of the rules side to templates remain valid, however: the implementation of formulas, tests for conditions, names of methods, and types of data are spread throughout the code.

Cataloging and altering these a posteriori code augmentations becomes virtually impossible in a large program, and the complex job of determining dependencies and order is still left to the programmer. ANTs add to object-oriented programming the most important benefits of rules-based programming: the ability to ignore ordering, the separation of rules from the rest of the program, and the removal of specific symbols from the object-language source code. While earlier systems used sophisticated algorithms to accomplish these things, ANTs simplify matters by associating the execution of code with the Boolean states of data references; there was little else the sophisticated systems could do anyway. Virtually all of the rules school's objections to object technology are eliminated without resorting to preprocessors or arcane syntax, and without giving up a single benefit.

Mailboxes contain references to data, not the data itself. The structure of C++ and JAVA, the only intended target languages for ANTs, makes the cost of that indirection insignificant, since data is rarely passed by value in a real program. And since pointers and references differ only in whether arithmetic is allowed, and ANTs discourage address arithmetic, there is little practical difference between the two.

JAVA performance has several significant bottlenecks. The ones most often complained about will become moot with the advent of optimizing compilers for servers and frequently used applications. More difficult is the synchronization of threads. When a thread encounters a lock, it must test it and, if it is already locked, place itself on a waiting list; to do that, it must lock the list. When a thread releases a lock, it, or its proxy, must check whether a thread is waiting to acquire the lock. If the list is locked, only bad choices remain: the thread can spin on the list lock, or it can suspend, leaving the next thread that wants the lock waiting. If the locked-out thread finds the list locked, it must suspend and wait. Many implementations use a third thread to maintain the list. Regardless of the strategy, a reliable system is computationally costly, and with a reasonably large number of preemptively multitasked threads, synchronization comes to dominate execution time. For example, if a list remains locked for a microsecond and the preemptive time slice is ten milliseconds, then one time in ten thousand the thread locking the list will be preempted while holding it. Because a preemption usually lasts much longer than the time slice, a busy list will almost always be found locked by a preempted thread.

When the odds are high that a thread will pause for a lock, the time for a suspend/resume pair must be included in the cost of the lock; when two or more threads are polling for a place on the wait list and the odds of getting in are low, many times the cost of a suspend/resume pair can be incurred. The non-blocking structure of ANTs avoids this unfortunate situation, which often arises in large distributed object information systems.

Because JAVA disallows the performance-enhancing hacks that C programmers use to make elaborate control programs fast (such as table-driven state machines, which are rather slow without pointer arithmetic), JAVA tends to have performance problems with parsers and formatters, for example. ANTs make most of these programs run faster than their C equivalents by replacing the table entries with ANTs waiting on mailboxes. Because JAVA is structured as a call-return, synchronous language, distributed systems must either use some programmer-devised optimization or succumb to network delay. ANTs are inherently asynchronous and naturally exploit a program's concurrency to moderate the effect of latency.

A principal goal of ANTs from the beginning was to reduce the need to map a program onto the architecture of the processor that would execute it. In today's business computing environment, that translates into avoiding API sets. Programs that must operate on various machines have a difficult time dealing with variations in threading models, event models, message queuing, and other system functions. Even systems that support only a single operating system face problems scaling to clusters, distributing computation, handling network latency, and properly packaging system calls. ANTs permit a single code base to run unaltered on a single computer, on clusters, on distributed networks, and on large parallel processors and distributed systems. Examples of large-scale enterprise servers particularly well suited to the ANTs programming model are the IBM SP Series and the 32-processor Unisys Aquanta enterprise server, both of which can be clustered in groups of 4 to 8 processors.

Writing for a cluster usually involves mapping functions onto specific processors and contending with a difficult API, for several reasons: the call-return model does not start work on data until it is needed (the function that creates the data is called as a request for the data); synchronization is performed by event notification, which is a system function; event notification across OS instances uses a different mechanism than notification within one OS instance; and migration of computation from one node to another is difficult or impossible. Moreover, object-oriented programming (OOP) does not provide enough encapsulation to hide machine boundaries, and asynchronous communication is provided by an external entity, the message queue, which is slow and clumsy.

ANTs employ asynchronous communication as their basic primitive. Which OS instance in a cluster an ANT actually runs on is a matter of considerable indifference to ANTs. ANTs are linked to the list of runnable tasks by the arrival of all requested data and are therefore neither notified nor polled. Because no distinction is made about where an ANT runs, the cluster API is not a factor; because ANTs automatically load-balance in a cluster, no special mapping is required; and because ANTs do not rely on the system for synchronization, distributed and dissimilar systems pose no problem. (Special harnesses for fault tolerance have been designed but not implemented.) Hand-mapping synchronously communicating processes does not work very well; ANTs are a significant improvement. Windows NT and, to a lesser degree, UNIX are famous for their system inefficiency. ANTs dramatically reduce the role of the operating system, improving efficiency and scalability. Thread management overhead is lowered 90 percent using ANTs and can be distributed, which is enough to make NT scale to very large parallel systems.

The call-return model of programming embeds the name of the called function in the calling code; ANTs embed the name of the desired value. Moreover, that name is local to a component, so components can be interchanged without altering the source of other components. Rules are easy to add and modify, because "whenever event X happens" is the fundamental primitive of ANTs and is usually the trigger for rules. Call-return programs must embed the notification code in the calling sequences, however many of them there may be, and that makes change difficult.
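
The contrast can be made concrete with a short sketch (hypothetical names, reusing the toy Cmailbox and headers from the earlier example). The call-return caller hard-codes the producer's name; the mailbox caller names only the value, so the producer can be swapped without touching the consuming code:

double computePrice() { return 9.99; }           // call-return: the caller names this function

Cmailbox priceBox;                               // ANTs style: the caller names only "the price"

void pricingAnt() {                              // interchangeable producer of the price
    static double p = 9.99;
    priceBox.Send(&p);
}

void orderAnt() {
    double direct = computePrice();              // producer's name fixed in the caller
    double* mailed = static_cast<double*>(priceBox.Receive());  // producer unknown to the caller
    std::printf("direct %g, mailed %g\n", direct, *mailed);
}

int main() {
    std::thread order(orderAnt);
    std::thread pricing(pricingAnt);
    order.join();
    pricing.join();
    return 0;
}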

The reliable superclassing of objects makes JAVA an ideal language for forms-routing systems (stations may need only the functionality of the superclass, and keeping the class description of every form on every station is a management problem), but the absence of useful synchronization mechanisms in JAVA thwarts most efforts in this direction. ANTs provide robust synchronization at this global level, reduce the cost of network latency, and improve component encapsulation.

Large online applications with routing requirements for many forms and, similarly, directed map simulations are notoriously hard to parallelize. ANTs make the job easy, because ANTs programs are partially ordered directed maps. Many common instruction lists, such as recipes, are partially ordered directed maps: it is necessary to break the eggs before separating them, but not necessary to break them before sifting the flour, or vice versa, and it is necessary to do all three before mixing the flour and the eggs. It is interesting that cooking a meal is often used as an everyday example of parallel processing, even though the recipes for preparing the meal's dishes are procedural rather than descriptive. With complex sets of instructions, it is often difficult to find an order that will not halt before completion. ANTs naturally express the latent partial ordering and the directed map: an ANTs program will not halt before completion if any path completes, regardless of whether the programmer recognizes the proper path or whether there is even a consistent set of rules for recognizing it. This simplifies the job of writing the program and, by eliminating a complicated central control, can improve performance by a factor of a hundred. In addition, the program will run at maximum concurrency on a multiprocessor without modification.
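
The recipe's partial order translates directly into the toy mailbox sketch from earlier (again an illustration with hypothetical names, reusing that Cmailbox and the same headers): breaking eggs and sifting flour may run in either order or in parallel, and mixing suspends until both have mailed their results.

Cmailbox eggsReady, flourReady;                  // one mailbox per precondition

void breakEggsAnt()  { static int eggs  = 3;   eggsReady.Send(&eggs);   }
void siftFlourAnt()  { static int grams = 250; flourReady.Send(&grams); }

void mixBatterAnt() {                            // waits on both preconditions
    int* eggs  = static_cast<int*>(eggsReady.Receive());
    int* flour = static_cast<int*>(flourReady.Receive());
    std::printf("mixing %d eggs with %d g flour\n", *eggs, *flour);
}

int main() {
    std::thread mix(mixBatterAnt);               // may start first; it simply waits
    std::thread eggs(breakEggsAnt);              // these two run in either order
    std::thread flour(siftFlourAnt);
    mix.join();
    eggs.join();
    flour.join();
    return 0;
}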

Lawson Software has the first-release C++ version of ANTs installed on a ten-processor Aquanta XR/6 NT server in its research and development benchmark laboratory. The technology is being evaluated as a means of improving performance relative to the strictly preemptive multithreading of the NT operating system. We have an application written in JAVA and look forward to the forthcoming JAVA version of ANTs so that we can evaluate its performance improvements over JAVA's lightweight but still preemptive threading.

ANTs are an enabling technology not only for parallel processing but for distributed object-oriented systems as well. Since ANTs are language- and operating-system-independent (implemented in both C++ and JAVA for NT, UNIX, and any JVM), the components of a distributed object-based application can be assembled from objects written in either language (or any other, with JAVA wrappers), running on any OS that supports them or on any JVM.

One might ask why anyone would want such a complex distributed application platform. The answer is that nobody does: such systems are usually "received structures" that must be operated as effectively as possible until they can be replaced by more highly integrated successors. It is unusual but not unknown for customers of an ISV providing cross-industry applications on multiple platforms to (inter)operate software suites on AS/400 servers for manufacturing, UNIX servers for CAD/CAM and perhaps supply chain, and NT servers for European subsidiaries and some corporate functions such as HR. As these applications migrate to object technology with increasing functional and geographic distribution and ever more complex interoperability, software tools like ANTs will enable dramatic performance increases over preemptive threading technology.

Our experience in the development and application of parallel processing over the past thirty years, and in distributed object-oriented systems over the past ten, tells us that ANTs represent a significant breakthrough technology with a very promising future in the Age of JAVA. We have long known that the real world operates concurrently; physicists tell us that time is all that keeps everything from happening at once. The object paradigm offers an opportunity to map real-world business objects, such as invoices, into computer-stored, manipulated, and distributed electronic objects in a natural way. ANTs allow us to escape the constraints of the operating system's built-in deterministic, sequential ordering and to map onto our computer models the partially ordered real-world activities that make up actual business processes.

The author wishes to acknowledge many helpful discussions with Clifford Hersh, Vice President for Product Development at CHoPP Computer Corporation, and with Dr. Richard Wright, recently of Lawson Software and now with the Pitts Library at Emory University, Atlanta, GA.

Peter Patton was formerly Vice President of International Operations for the Lawson Software Group, based in Minnesota. Patton has held numerous high-technology positions, including chief information officer (CIO) of the University of Pennsylvania and lead IT consultant to the National Technology Transfer Center, NASA's Jet Propulsion Laboratory, and Dun & Bradstreet's DataQuest division. He was founding director of the Minnesota Supercomputer Institute, vice president and director of Parallel Processing Architecture at MCC, and director of computing at the University of Minnesota. He is the author of 16 books and 86 journal articles. Patton holds an A.B. in Engineering and Applied Physics from Harvard, an M.A. in Mathematics from the University of Kansas, and a Ph.D. in Aerospace Engineering, summa cum laude, from the University of Stuttgart in Germany.

For more information, see http://www.antssoftware.com/
