ANTS TECHNOLOGY MAY IMPROVE COMPUTER PROGRAMMING

July 28, 2000

by Peter C. Patton, Ph.D.

San Diego, CA — CHoPP, short for Columbia Homogeneous Parallel Processor, is a parallel processor architecture developed by Herbert Sullivan at Columbia University in New York in the 1970s. Sullivan used the acronym CHoPP as the name of a venture capital corporation he founded to raise funds to build a highly parallel supercomputer based on the CHoPP architecture and the patents that resulted from his work at Columbia. He intended this supercomputer to exploit concurrency at all levels of granularity in a computer program, even in existing programs written for sequential or vector machines. A prototype was built at the firm’s facility in San Diego, California, and the first model was sold to the San Diego Supercomputer Center. The machine was never delivered because of difficulty raising capital, but the prototype demonstrated that the basic concepts of the CHoPP architecture were effective and could indeed be implemented in computer hardware.

The firm was reduced to a small consulting and design staff that became involved in a number of government-sponsored studies and design projects revolving around parallel processing. During this period Sullivan came to realize that there was very little difference in the performance of the various shared memory architectures: average latency increases roughly with the logarithm of the number of processors, and processing power increases almost linearly until the number of processors reaches about half the average concurrency of the workload. Increasing the concurrency of the programs was therefore far more important than any particular architecture. Much of the uniqueness of the CHoPP architecture, particularly the patented techniques for exploiting latent concurrency in existing programs, could also be implemented in user-level control software. The increasing use of distributed computing models, such as MPP and, more importantly, networks, led CHoPP to concentrate on using increased concurrency to reduce the effect of network latency on performance, in addition to the original goal of scaling nodes to hundreds or thousands of processors. The original patents were sufficiently broad to cover this unforeseen application to software. However, recent developments at CHoPP and the increasing possibility of obtaining software patents have led the firm’s management to file specific software patents covering a software development technology called ANTs: Asynchronous Non-preemptive Tasks. CHoPP has now applied for a patent on ANTs as a non-preemptive threading technology.

The people at CHoPP realized that, while the commercial software market was having difficulty scaling beyond a handful of processors, CHoPP had technology that permitted scaling into the thousands of processors. While the scalability and robustness of asynchronous, stateless communication have been made obvious by the success of the World Wide Web, their application to general purpose computing has been hidden by the overhead of existing methods. CHoPP has been using a low-overhead mechanism for asynchronous distributed control for some time. The thirty-year history of this technology has been distilled into a simple and elegant set of primitive operations that are object oriented, easy to use, and compatible with existing software systems. CHoPP now offers this environment for sale as the ANTs Software System. It is available now in C++, with a Java version coming soon. ANTs will also be available as part of MinimalJ, the most secure and portable client ever developed.

When Herb Sullivan first told me about the software implementation of his CHoPP technology, I was reminded of a termite colony. The mystery of a termite colony arises from the number of bits of information required to describe the architecture of a termite mound, a six-foot-high structure commonly seen on the African savanna. The information required to specify the mound’s architecture exceeds the number of neurons in a termite’s brain; no individual termite’s brain is large enough to store the entire design. So where does the colony store it? Herb’s response was that, of course, the architecture is not stored explicitly: each termite knows only its preprogrammed function, and they all work together in a non-preemptive multi-tasking environment, each doing its own task, which in sum is the work of the colony. He added that ants do the same sorts of things as termites, but do them in daylight, and are beautiful, shiny black or red creatures rather than ugly, pasty white or semi-transparent ones, so he would rather call the technology ants. The acronym, he pointed out, works out much better as well.

ANTs are a programming technology for decomposing large, complex software systems into independent components, thereby making them more reliable, more efficient, and easier to maintain. The technology achieves these results because, like the ant or termite colony, it distributes control structures throughout the computation. ANTs computations are collections of essentially independent component sub-programs whose interactions are described within the component sub-programs themselves and do not depend on a central control structure. ANTs execute concurrently, that is, they appear to be processing data simultaneously, and interact only when they explicitly exchange data. Thus ANTs computations are concurrent computations in which all components appear to execute simultaneously unless they are waiting for data to be provided by another ANT. An inherent property of ANTs computations is that they execute concurrently on parallel multiprocessor or distributed hardware. The benefit of the ANTs programming technique can be seen from two perspectives: on the one hand, ANTs simplify programming by eliminating the need for central control structures; on the other, ANTs enhance efficiency by effectively exploiting the parallel resources of a multiprocessor.

The computer architect has always faced two challenges: improving computational efficiency, by avoiding the waste of time or space, and reducing complexity, by logical simplification and reduction of system component count. In terms of computational efficiency, progress has been made in two key areas. Communication between software and hardware has improved through advances in operating systems and threading technology, and efficiency has also been improved through the use of multiple processors, which can cooperate to complete a given computation in minimum time. Computer program resource demands are satisfied by calls to an operating system. Traditionally, hardware resources are assigned to competing processes (which may or may not be parts of one or more concurrent programs) by a technique called multi-tasking or multi-threading. This process takes place in the twilight zone between the application program and the operating system that assigns its resources. Today’s state-of-the-art technology in this area is synchronous, preemptive multi-threading: when an application asks the operating system to accomplish a task, the system opens a thread and locks it, assigning the requested task to that thread. The thread remains locked until the task is complete or the operating system decides some more important task needs that thread’s assigned resources. These multiple preemptive threads are then scheduled by a single thread responsible only for scheduling all the others.

In conventional procedural programming practice, the control program specifies the sequence in which the component parts of a computation are executed. Normally, this sequence ensures that program segments do not execute until the data they need is available, and also that the appropriate program segments execute in time to provide data before it is needed for the computation to progress. As computations become larger and more complex, control structures become more complex and limit the size and effectiveness of the application. Complexity rises exponentially with the number of function points in an application program; at 1,000 or more function points, complexity usually becomes overwhelming and the application is abandoned. While systems this large do not necessarily require more bits than the number of neurons in a single human brain, they do seem to be well beyond the ability of a single application architect to describe, and to predict the real-world performance of the application system.

In software engineering, the usual method for solving complexity issues has been to break complex problems down into a number of simpler, more manageable parts. Unfortunately, the programmer must then also undertake the coordination of the various components of the program; there has been no technique for separating the central control mechanism and encapsulating it into the program’s components. In fact, the control structure becomes either highly centralized in an explicit way or interwoven implicitly into the fabric of the application program. The result is that the parts of the program are no longer interchangeable or amenable to independent upgrades, and the entire program must be analyzed when local changes or maintenance activities are undertaken. The advantages of breaking up the program into quasi-independent segments are undermined by centralized or embedded control structures.

Due to its central control structure, preemptive threading technology may get in the way of the effort to reduce complexity. ANTs solve this problem by eliminating the central control structure and making the interaction between program components explicit, asynchronous, and easily accessible for later changes. These components, or ANTs, acquire input data, process that data, and provide the results as output to be received by other ANTs, which remain in a suspended state until the data they need becomes available. The programmer no longer needs to be concerned with scheduling or explicitly sequencing tasks, because the ANTs control kernel (ACK) automatically schedules the execution of any program component, or ANT, whenever the data it awaits becomes available for processing. There is no longer any need to order processes sequentially, and the computation proceeds concurrently from the point of view of the programmer. In fact it will proceed in parallel on a multiprocessor or distributed server system up to min(processors available, concurrency in the program), which is one of Sullivan’s theorems. Dependencies of one part of the program on another are automatically respected, because no ANT can execute until the ANTs on which it depends have provided the data it needs to execute.
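The data-driven, non-preemptive scheduling just described can be sketched in a few dozen lines of C++. All names here are hypothetical (the real ACK is proprietary): each task records how many inputs it still awaits, a producer "delivers" data by decrementing that count, and a task becomes runnable only when its count reaches zero, so dependencies are respected without any central sequencing.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Hypothetical sketch of data-driven, non-preemptive scheduling.
struct Task {
    std::string name;
    int pendingInputs;              // inputs not yet delivered by other tasks
    std::function<void()> body;     // runs only once all inputs are ready
};

class ToyScheduler {
public:
    int add(Task t) { tasks.push_back(std::move(t)); return (int)tasks.size() - 1; }

    // A producer calls this to deliver one input to task `id`.
    void deliver(int id) {
        if (--tasks[id].pendingInputs == 0) ready.push(id);
    }

    // Non-preemptive: every ready task runs to completion, never interrupted.
    void run() {
        for (int i = 0; i < (int)tasks.size(); ++i)
            if (tasks[i].pendingInputs == 0) ready.push(i);
        while (!ready.empty()) {
            int id = ready.front();
            ready.pop();
            tasks[id].body();
        }
    }

private:
    std::vector<Task> tasks;
    std::queue<int> ready;
};

// Two producers feed one consumer; execution order is determined by data
// availability alone, as the text describes.
inline std::vector<std::string> demoRun() {
    ToyScheduler s;
    std::vector<std::string> log;
    int consumer = s.add({"consumer", 2, [&log] { log.push_back("consumer"); }});
    s.add({"producerA", 0, [&s, &log, consumer] { log.push_back("producerA"); s.deliver(consumer); }});
    s.add({"producerB", 0, [&s, &log, consumer] { log.push_back("producerB"); s.deliver(consumer); }});
    s.run();
    return log;
}
```

However the producers are ordered in the source, the consumer cannot run until both have delivered their data, which is exactly the automatic dependency respect the paragraph claims.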

The efficiency of ANTs depends on the special properties of the ACK, which is at the heart of CHoPP’s proprietary technology. The ACK sits between the program, which may be written in any language (commonly C++ or JAVA), and any operating system (typically UNIX or NT). ANTs communicate with messages sent to mailboxes, and the ACK acts as postmaster. Some of the functions of the ACK are usually provided by conventional operating system kernels, but these have two shortcomings for ANTs: first, they are not efficient enough to manage a program broken into so many small parts, and second, neither current operating systems nor databases provide the single object oriented interface required for today’s programs. The efficiency issue can be understood by observing that the natural decomposition of a problem often produces numerous fine-grain components. For example, a business transaction may spawn as many as 50 internal computing/database transactions or object references. A natural programming style would turn each such activity into an ANT. Conventional operating systems would associate each such component with a synchronous operating system process, or thread, so scheduling and task switching overhead would be incurred every time a thread or process is activated or suspended. The useful work done by these small-grain components, say 50 instructions or so, would be very small compared to the thousands of instructions required for thread management.
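The arithmetic of that overhead argument can be checked in a few lines. The instruction counts are the article's own figures; the function name is mine:

```cpp
#include <cassert>

// Fraction of total instructions doing useful work when a fine-grain
// component of `workInstr` instructions is dispatched with `overheadInstr`
// instructions of scheduling/task-switching overhead.
double usefulFraction(double workInstr, double overheadInstr) {
    return workInstr / (workInstr + overheadInstr);
}
```

With 50 instructions of useful work, a roughly 1000-instruction thread dispatch leaves under 5% of the processor's time doing real work, while a roughly 20-instruction ANT dispatch (the midpoint of the 10 to 30 quoted below) leaves over 70%.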

By contrast, the ACK executes 10 to 30 instructions of overhead when an ANT is activated. The overhead is so low that the programmer can ignore size considerations in partitioning a computation into its natural components, using segments as large or as small as the application or its algorithm dictates. The second reason the ACK must be substituted for certain operating system functions is that the protocols for thread management are inconsistent with the view of ANTs as independent, encapsulated objects. Threads communicate via unrestricted shared memory, which provides no tools for private protection or for structured shared access. As a result, threads are notoriously difficult to use; they provide mechanisms for concurrent (and parallel) execution only at the expense of increased programming complexity. ANTs, which make the interaction between components explicit and asynchronous, improve on conventional encapsulation while permitting concurrent execution as a welcome side benefit.

ANTs are more efficient than threads for mathematical reasons and not simply because of coding tricks. An ANT program is a partially ordered set and therefore has more degrees of freedom in execution order than does a deterministic set of threads. Because the thread control mechanism is used to manage hardware and I/O, fairness and prioritization are essential. Fairness is inherently blocking and prioritization is inherently centralized. Lock/Unlock, etc. are inherently blocking because any locked thread must place itself on a list and the unlocking thread must be able to restart it. One or the other must wait while the list is altered. Once the thread count is sufficiently high, blocking dominates execution time. ANTs are non-blocking. ANTs cannot span multiple applications with a single mechanism. ANTs cannot operate in real time or assign priorities. ANTs can only guarantee that all the work in a batch of work is completed before it halts. This simplification allows ANTs to be non-blocking and have distributed control.

Traditional programming techniques have difficulties when, as is usually the case, the event that determines a context switch occurs before the operations necessary to permit it are complete. The resulting flag variables inevitably lead to errors. An ANTs program makes the context switch immediately, and the ANT representing the new context waits for the data to be ready for it. This simplifies the code and removes the largest barrier to concurrent execution. The two technologies differ in six fundamental respects, as shown in the table below. ANTs operate in user space whereas threads operate in kernel space, which lowers overhead and improves protection between processes. The operating system is not aware of ANTs; it deals with an ANTs program as if it were a single process.

ANTs impose no priorities and, unlike threads, are non-preemptive. Priorities can be established only by reference to a global view of execution, a view that determines some order relationship between threads; such a global view is in conflict with the notion of independent execution. Priorities are appropriate in the execution of operating system calls, for example in the management of hardware resources, and in these cases ANTs make OS calls and interface with threads in the usual ways. ANTs are encapsulated objects with protected code and data. Threads are not objects; they are kernel-managed scheduling entities.

ANTs execute reentrant code and have private data that is not accessible from other ANTs. They communicate only by means of structured mailbox objects. These mailboxes provide the primary ANTs message passing control mechanism, because when ANTs access mailboxes they are suspended or continue execution according to the availability of data in the mailboxes. Threads, by contrast, communicate via shared memory, which provides no protection or structure for thread communication and is a significant source of error-prone code.

The programmer sees the synchronization and coordination of ANTs as based entirely on data availability: ANTs execute as soon as the data they need becomes available. This data flow view appears natural to the programmer. Threads are synchronized and coordinated using Wait/Signal or Lock/Unlock primitives. Their role in the application program is unspecified and extrinsic to the application logic or algorithm, and often manifested in obscure or opaque coding sequences, which contributes to the complexity and error proneness of threads usage. When an ANT is suspended and another starts to execute, certain housekeeping functions, an activity called task switching, must be carried out. The number of task switching instructions required averages between 10 and 30 for ANTs; the analogous operation for threads takes a thousand or more instructions. This distinction is at the heart of ANTs efficiency and also contributes to ANTs programmability, since such low task switching overhead allows the user to be indifferent to task granularity.

ANTs and Threads Compared

Aspect                      ANTs                        Threads
Execution Environment       User Space                  Kernel Space
Priority Preemption         No Priority Preemption      Priority Based Scheduling
Encapsulation               Private Data and Code       None
Communication               Message Passing             Shared Memory
Synchronization Criteria    Data Availability           Lock/Unlock
Task Switching              10-30 Instructions          1000+ Instructions

ANTs technology originated as a programming technique for exploiting parallel processing in multiprocessor computer systems. Much of the early work on concurrent programming for parallel processors involved explicit mapping of the application onto the computer architecture by the programmer, including discovering new algorithms suitable for specific architectures. In many embedded applications, the parallel processor was even specifically designed to execute a given algorithm. From the earliest days of the CHoPP architecture, the design team sought automatic mapping technology to remove this burden from the programmer. During the past decade, while the CHoPP team has been developing ANTs as the software realization of its hardware architecture, multiprocessor operating systems have developed around the threads technology first used at Carnegie Mellon for Mach, which became the kernel of AIX. More recently the JAVA programming language was developed with lightweight threads in an effort to reduce complexity.

Threads and ANTs both incorporate automatic mapping techniques that enable the programmer to create concurrent programs without reference to the specific hardware platform on which the application will run. ANTs cannot perform many of the real-time operating system functions for which threads were designed, and so an ANTs program makes system calls to thread-implemented functions whenever needed. On the other hand, threads are poorly adapted as a basis for a modern application programming system: they introduce too much overhead, they defeat encapsulation, and they rely on a central control paradigm.

The original CHoPP goal of eliminating the manual mapping of programs onto multiprocessor hardware was achieved through ANTs by a significant reduction in the overhead required for intertask communication. To exploit the parallel resources of a multiprocessor effectively, it is necessary to express the application as a collection of tasks that can be distributed among the available processors. The natural decomposition of an application into interdependent, cooperating tasks requires the processors hosting them to communicate frequently. Over the past decade interprocessor communication has improved dramatically, and so has networked communication between distributed processors. In fact, processors in IBM SP Series systems connected by T3 lines are indifferent as to whether they communicate with a processor on the same Omega switch or over the network with one thousands of miles away. Today, the hardware time required for interprocess communication is about the same as an ordinary processor memory access on a cache miss. The software time required for threads to communicate, when they may potentially suspend, is two to three orders of magnitude higher than the hardware time. Thus threads are unsuited for communication-intensive applications: they generate so much overhead that they cancel the benefits of parallel processing.

ANTs use the conventions and tools of object oriented programming technology to provide encapsulation, naming, and reuse. The initial implementation of ANTs was done using C++; although a runtime library and several precompiled files are required to program in ANTs, no special programming tools are necessary. A JAVA version of ANTs is in its final stages of testing and will soon be available in demonstration form from an ANTs Web site. ANTs provide the programmer with a small set of reusable object classes, foremost of which are the ANTclass and the Mailboxclass. The actual ANTs runtime environment is invisible to the user and is entirely self-adjusting, since it executes object code on different platforms without user intervention. The programmer will need to give some attention to issues arising from partial ordering of their ANT “colony” or set of tasks in order to gain the full performance benefit of the technology when using a parallel processor.

Objects of the ANTclass have private data; all public data is restricted to that contained in Mailboxes, and is accessed by communication instructions, which are method calls to the Mailboxclass. Objects of the ANTclass execute concurrently to the extent that processors are available to host them. Their order of execution is determined by the Mailboxclass methods invoked for interprocess communication, not by any syntactic ordering of the ANTs within the program. Unlike threads, ANTs operate in user space under the ACK in the ANTs runtime package.

The Mailbox object has several private states, two of which are Valid and Invalid. If the Mailbox state is Valid, the Mailbox contains a pointer to public data. The methods of the Mailboxclass are called communication instructions because these are the only way that ANT objects communicate with each other. Sample communication instructions for the C++ version are:

void Cmailbox::Send (void* pointername); // places a void pointer in the mailbox and marks the mailbox valid

void* Cmailbox::Receive (void); // waits until the mailbox is valid, then returns the pointer placed by Send

void Cmailbox::Reuse (void); // invalidates the mailbox so it can be used for the next exchange

These functions allow the ANTs programmer to send mail to a particular ANT by storing a pointer in its mailbox, to receive mail by waiting until a particular mailbox is filled, and to clear a mailbox for later reuse.
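A minimal, single-threaded C++ sketch of such a mailbox might look like the following. This is an illustration of the interface above, not CHoPP's implementation: in the real ANTs runtime, Receive suspends the calling ANT until the mailbox becomes valid, which this sketch replaces with an assertion since there is no scheduler here, and the IsValid accessor is my addition for inspection.

```cpp
#include <cassert>
#include <string>

// Single-threaded sketch of the Cmailbox interface (illustrative only).
class Cmailbox {
public:
    Cmailbox() : pointer(nullptr), valid(false) {}

    void Send(void* pointername) {   // place the pointer, mark the mailbox valid
        pointer = pointername;
        valid = true;
    }

    void* Receive() {                // the real runtime suspends the ANT here
        assert(valid && "a real ANT would suspend until Send is called");
        return pointer;
    }

    void Reuse() {                   // invalidate for the next exchange
        pointer = nullptr;
        valid = false;
    }

    bool IsValid() const { return valid; }

private:
    void* pointer;
    bool valid;
};
```

A sender places a pointer with Send, a receiver retrieves it with Receive, and Reuse clears the slot so the same mailbox can carry the next message, mirroring the three functions listed above.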

Naturally this syntax is a bit different for the JAVA version of ANTs since the JAVA language does not support pointers. The JAVA versions are:

public class Mailbox {

Mailbox (Class type); // the type argument (TypeDesc) is always a reference type

void Send (type objectReference); // in JAVA, Mailbox is strongly typed

type Receive (); // waits until the mailbox is valid, then returns the reference

void Reuse (); // invalidates the mailbox so it can be reused

}

The JAVA Mailboxclass gives the programmer the same three functions without the use of pointers. The constructor of a Mailbox checks a list of available Mailbox types and creates the correct one; if the required type does not exist, a new class is created and added to the list. Unlike in C++, a JAVA Mailbox may only pass a single type of object. The advantages of this approach are that all ANTs programs are strongly typed and no runtime type checking is necessary. The disadvantages are that it is necessary to subclass Mailbox (or use TypeDesc) to learn the actual class of a mailbox, and certain programs that create dynamic classes without end may see their symbol and vector tables grow, much like a C memory leak. Runtime type checking will still be done on lists of mailboxes, because they are underspecified; JAVA casting rules make this necessary, and dynamic testing is too slow to be useful.
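The single-type guarantee that the JAVA version builds at runtime corresponds, in C++, to what a template provides at compile time. The following is a hypothetical sketch (TypedMailbox is my name, not part of the ANTs product): sending an object of the wrong type becomes a compile error rather than a runtime check.

```cpp
#include <cassert>
#include <string>

// Hypothetical compile-time analogue of the strongly typed JAVA Mailbox.
// T must be default-constructible in this simplified sketch.
template <typename T>
class TypedMailbox {
public:
    TypedMailbox() : valid(false) {}

    void Send(const T& value) {   // only objects of type T are accepted
        slot = value;
        valid = true;
    }

    T Receive() {                 // the real runtime would suspend here
        assert(valid && "a real ANT would suspend until the mailbox is valid");
        return slot;
    }

    void Reuse() { valid = false; }  // invalidate for the next exchange

private:
    T slot;
    bool valid;
};
```

With a TypedMailbox<std::string>, for instance, an attempt to Send an integer is rejected by the compiler, giving the same "no runtime type checking" property the JAVA design aims for.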

Experiments with ANTs have shown a factor-of-four performance improvement without changing the code, other than adding ANTs wrappers to natural, easily recognized application program components. Another promising component-based technology for distributed object computing has recently appeared as part of the greater JAVA phenomenon, and a comparison with ANTs is instructive.

JavaBeans is a portable, platform-independent component model written in the JAVA language and developed in collaboration with software industry leaders. It enables developers to write reusable components once and run them anywhere, taking advantage of the platform-independent nature of JAVA technology and the object distribution capability of the Internet. JavaBeans acts as a bridge between proprietary component models and provides a seamless and powerful means for developers to build components that will run in ActiveX container applications, for example. The JavaBeans API provides a framework for defining reusable, embeddable, modular software components. The JavaBeans Specification defines a bean as a reusable software component that can be manipulated visually in a builder tool. Any object that conforms to certain basic rules can be a bean; there is no Bean class that all beans are required to subclass. A bean exports properties, events, and methods. A property is a piece of the bean’s internal state that can be set and queried by the programmer; a bean may generate events and may export methods as public methods defined by the bean. In addition to these common object features a bean may have indexed properties, bound properties, and constrained properties. An indexed property has an array value for which the bean provides methods to get and set individual elements. A bound property sends out a notification when its value changes, and a constrained property does the same but also allows the change to be vetoed by listeners.

ANTs always adopt the component mechanism of its host language. So, if a JAVA ANT is compiled as a JAVA bean, it is componentized as a bean. Since all ANTs calls are virtual and JAVA supports only virtual calls, it doesn’t matter, except that JAVA cannot compile across a bean boundary and thus performance suffers. The JAVA communication model is not scalable but the use of ANTs technology does fix that problem. For example, a JAVA program compiled as ANTs ran four times faster than its non-ANTs version.

The reason ANTs simplify code is clarified by the old 4GL arguments between the “template” school and the “rules” school. The template school permitted similar instances of code to be produced cookie-cutter style by filling out a form. The rules school created code from a list of rules specifying what ought to occur in any circumstance. The difficulty both schools faced was that they could only create 90% of a program, and, worse, there was no means to go from the 3GL output of the methodology back to the original 4GL input: any change to the 4GL wiped out the 3GL supplement that finished the program. The most important contribution of object oriented programming to productivity may be that it provides a reversible template technology. However, the objections of the rules side to templates remain valid: the implementation of formulas, tests for conditions, names of methods, and types of data are spread throughout the code.

Cataloging and altering these a posteriori code augmentations becomes virtually impossible in a large program, and the complex job of determining dependencies and order is still left to the programmer. ANTs add to object oriented programming the most important benefits of rules-based programming: the ability to ignore ordering, the separation of rules from the rest of the program, and the removal of specific symbols from the object language source code. While earlier systems used sophisticated algorithms to accomplish these things, ANTs simplify matters by associating the execution of code with the Boolean states of data references; there was little else that the sophisticated systems could do anyway. Virtually all of the objections of the rules school to object technology are eliminated without resorting to preprocessors or arcane syntax, and without giving up a single benefit.

Mailboxes contain references to data, not the data itself. The structure of C++ and JAVA, the only intended target languages for ANTs, makes the cost of that indirection insignificant, since data is rarely passed by value in a real program. Since pointers and references differ only as to whether arithmetic is allowed, and since ANTs discourage address arithmetic, there is little practical difference.

JAVA performance seems to have several significant bottlenecks. Those most often complained about will become moot with the advent of optimizing compilers for servers and frequently used applications. More difficult is the synchronization of threads. When a thread encounters a lock it must test it and, if it is already locked, place itself on a wait list; to do that, it must lock the list. When a thread releases a lock, it, or its proxy, must check whether a thread is waiting to gain the lock. If the list is locked, only bad choices remain: the thread can spin on the lock, or it can suspend, leaving the next thread wanting the lock waiting. If the locked-out thread finds the list locked, it must suspend and wait. Many implementations use a third thread to maintain the list. Regardless of the strategy, a reliable system is computationally costly, and with a reasonably large number of preemptively multi-tasked threads, synchronization will come to dominate execution time. For example: if the list remains locked for a microsecond and the preemptive slice is ten milliseconds, then one time in ten thousand the thread locking the list will be preempted while holding the lock. Because a preemption usually lasts much longer than the time slice, a busy list will almost always be locked by a preempted thread.
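The one-in-ten-thousand figure in the example above is just the ratio of the two times, which a few lines make explicit (the function name is mine):

```cpp
#include <cassert>
#include <cmath>

// Probability that a preemption lands while the wait list is locked:
// the fraction of each scheduling slice during which the lock is held.
double preemptedWhileLockedOdds(double lockHeldSeconds, double sliceSeconds) {
    return lockHeldSeconds / sliceSeconds;
}
```

With the lock held for one microsecond out of a ten-millisecond slice, the ratio is 1e-6 / 1e-2 = 1e-4, i.e. one chance in ten thousand per slice; with many threads and many slices per second, such collisions become routine.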

When the odds are high that a thread will pause for a lock, the time for a suspend/resume pair must be included in the cost of the lock. When two or more threads are polling for a place on the wait list, and the odds are low of getting in, many times the cost of a suspend/resume pair can be incurred. The non-blocking structure of ANTs fixes this unfortunate situation, which often occurs in large distributed object information systems.

Because JAVA disallows the performance-enhancing hacks that C programmers use to make elaborate control programs work (such as table-driven state machines, which are rather slow without pointer arithmetic), JAVA tends to have performance problems with parsers and formatters, for example. ANTs make most of these programs run faster than their C equivalents by replacing the table entries with ANTs waiting on mailboxes. Because JAVA is structured as a call-return, synchronous language, distributed systems must either use some programmer-devised optimization or fail from network delay. ANTs are inherently asynchronous and naturally exploit the concurrency of a program to moderate latency.

A principal goal of ANTs from the beginning was to reduce the need to map a program onto the architecture of the processor that would execute it. In today’s business computing environment, that translates into avoiding API sets. Programs that must operate on various machines have a difficult time dealing with variations in their threading models, event models, message queuing, and other system functions. Even systems that support only a single operating system face problems scaling to clusters, distributing computation, handling network latency, and properly packaging system calls. ANTs permit a single code base that can run without alteration on a single computer, on clusters, on distributed networks, and on large parallel processors. Examples of large-scale enterprise servers particularly well suited to the ANTs programming model are the IBM SP Series and the Unisys Aquanta 32-processor enterprise server, both of which can be clustered into sets of 4-8 processor groups.

Writing for a cluster usually involves mapping functions onto specific processors and contending with a difficult API. There are several reasons for this: the call-return model does not start work on data until the data is needed (the function that creates the data is called as a request for it); synchronization is performed by event notification, which is a system function; event notification across OS instances uses a different mechanism than notification within an OS instance; and migration of computation from one node to another is difficult or impossible. Moreover, Object Oriented Programming (OOP) does not provide enough encapsulation to hide machine boundaries, and asynchronous communication is provided by an external entity (the message queue), which is slow and clumsy.

ANTs employ asynchronous communication as their basic primitive. Which OS instance in a cluster an ANT actually runs on is a matter of complete indifference to ANTs. ANTs are linked to the list of runnable tasks by the arrival of all requested data and, therefore, are neither notified nor polled. Because no distinction is made about where an ANT runs, the cluster API is not a factor. Because ANTs automatically load-balance in a cluster, no special mapping is required. Because ANTs do not rely on the system for synchronization, distributed and dissimilar systems are no problem. (Special harnesses for fault tolerance have been designed, but not implemented.) Hand-mapping synchronously communicating processes does not work very well; ANTs are a significant improvement. Windows NT and, to a lesser degree, UNIX are famous for their system inefficiency. ANTs dramatically reduce the role of the operating system, improving efficiency and scalability. Thread-management overhead is lowered by 90% using ANTs and can be distributed. This is enough to make NT scale to very large parallel systems.
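The dataflow rule described here, that a task becomes runnable only when all of its requested data has arrived, can be sketched with standard JAVA futures. `CompletableFuture` is only a stand-in for the (unpublished) ANTs mechanism, and the feed names are invented for illustration.

```java
import java.util.concurrent.CompletableFuture;

public class DataflowJoin {
    public static void main(String[] args) {
        CompletableFuture<Integer> priceFeed = new CompletableFuture<>();
        CompletableFuture<Integer> quantityFeed = new CompletableFuture<>();

        // Declared before any data exists; runs exactly once, when both
        // inputs have arrived. It is neither notified nor polled.
        CompletableFuture<Integer> total =
            priceFeed.thenCombine(quantityFeed, (p, q) -> p * q);

        quantityFeed.complete(3);   // arrival order does not matter
        priceFeed.complete(25);
        System.out.println(total.join()); // 75
    }
}
```

Because the join is triggered by data arrival rather than by a system event, nothing in the code cares which node either feed is produced on.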

The call-return model of programming embeds the name of the called function in the calling code; ANTs embed the name of the desired value. Moreover, the name is local to a component, so components can be interchanged without altering the source of other components. Rules are easy to add and modify, because “whenever Event X happens” is the fundamental primitive of ANTs and is usually the trigger for a rule. Call-return programs must embed the notification code in the calling sequences, however many of them there may be, and that makes change difficult.
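The naming difference can be sketched as a registry keyed by value name: the consumer asks for a named value, and which component produces it is bound separately, so producers can be swapped without touching the consumer's source. The class, method and value names here are all hypothetical, not part of any ANTs API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class ValueNaming {
    // Maps a value name to whichever component currently produces it.
    private final Map<String, Supplier<Object>> producers = new HashMap<>();

    void provide(String valueName, Supplier<Object> producer) {
        producers.put(valueName, producer);
    }

    // The consumer names the value it wants, not the function that makes it.
    Object request(String valueName) {
        return producers.get(valueName).get();
    }

    public static void main(String[] args) {
        ValueNaming app = new ValueNaming();
        app.provide("invoice.total", () -> 1250);         // component A
        System.out.println(app.request("invoice.total")); // 1250
        app.provide("invoice.total", () -> 1300);         // swap in component B
        System.out.println(app.request("invoice.total")); // 1300: caller unchanged
    }
}
```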

The reliable superclassing of objects makes JAVA an ideal language for forms routing systems (because stations may need only the functionality of the superclass, and keeping the class description of every form on every station is a management problem), but the absence of useful synchronization mechanisms in JAVA thwarts most efforts in this direction. ANTs provide robust synchronization at this global level, reduce the cost of network latency, and improve component encapsulation.

Large online applications with routing requirements for many forms and, similarly, directed-map simulations are notoriously hard to parallelize. ANTs make it easy, because ANTs programs are partially ordered directed maps. Many common instruction lists, such as recipes, are partially ordered directed maps. For instance, it is necessary to break the eggs before separating them, but it is not necessary to break them before sifting the flour, or vice versa; it is necessary to do all three before mixing the flour and the eggs. It is interesting that cooking a meal is often used as an everyday example of parallel processing, even though the recipes for preparing the meal’s components or dishes are procedural rather than descriptive. With complex sets of instructions, it is often difficult to find an order for the instructions that will not halt them before completion. ANTs naturally express the latent partial ordering and the directed map. An ANTs program will not halt before completion if any path completes, regardless of whether the programmer recognizes the proper path or whether there is even a consistent set of rules for recognizing it. This simplifies the job of writing the program, improving performance by a factor of a hundred by eliminating complicated central control. In addition, the program will run at maximum concurrency on a multiprocessor without modification.
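The recipe example can be sketched as a dependency graph in which each step becomes runnable the moment its last prerequisite finishes, with no central controller choosing an order. This is an illustrative reconstruction in standard JAVA concurrency utilities, not the ANTs scheduler itself.

```java
import java.util.*;
import java.util.concurrent.*;

public class RecipeDag {
    public static void main(String[] args) throws Exception {
        // successors: which steps each step unlocks
        Map<String, List<String>> after = Map.of(
            "break eggs", List.of("separate eggs"),
            "separate eggs", List.of("mix"),
            "sift flour", List.of("mix"),
            "mix", List.of());
        // pending: how many prerequisites each step still waits for
        Map<String, Integer> pending = new ConcurrentHashMap<>(
            Map.of("break eggs", 0, "separate eggs", 1, "sift flour", 0, "mix", 2));

        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch done = new CountDownLatch(pending.size());

        // Snapshot the initially ready steps before submitting anything,
        // so a fast worker cannot cause a step to be submitted twice.
        List<String> ready = new ArrayList<>();
        pending.forEach((step, count) -> { if (count == 0) ready.add(step); });
        for (String step : ready)
            pool.submit(() -> run(step, after, pending, pool, log, done));

        done.await();
        pool.shutdown();
        // "mix" always runs last, whatever order the independent steps took.
        System.out.println(log.indexOf("mix") == log.size() - 1);
    }

    static void run(String step, Map<String, List<String>> after,
                    Map<String, Integer> pending, ExecutorService pool,
                    List<String> log, CountDownLatch done) {
        log.add(step);
        done.countDown();
        // Finishing a step decrements its successors; any that reach zero run.
        for (String next : after.get(step))
            if (pending.merge(next, -1, Integer::sum) == 0)
                pool.submit(() -> run(next, after, pending, pool, log, done));
    }
}
```

Breaking the eggs and sifting the flour may execute in either order, or concurrently, yet the program cannot halt before completion: every step fires as soon as its prerequisites are satisfied.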

Lawson Software has the first-release C++ version of ANTs installed on a ten-processor Aquanta XR/6 NT server in its Research & Development benchmark laboratory. The technology is being evaluated as a means of improving performance over the strict preemptive multithreading of the NT operating system. We have an application written in JAVA and look forward to the forthcoming JAVA version of ANTs to evaluate performance improvements over JAVA’s lightweight but still preemptive threading.

ANTs are an enabling technology not only for parallel processing but for distributed object-oriented systems as well. Since ANTs are language- and operating-system-independent (implemented in both C++ and JAVA for NT, UNIX and any JVM), the components of a distributed object-based application can be assembled from objects written in either language (or any other, with JAVA wrappers), running on any OS that supports them or on any JVM.

One might ask why anyone would want such a complex distributed application platform. The answer is that nobody does. Such systems are usually “received structures” and must be operated as effectively as possible until they can be replaced by more highly integrated successors. It is unusual but not unknown for the customers of an ISV providing cross-industry applications on multiple platforms to (inter)operate software suites on AS/400 servers for manufacturing, UNIX servers for CAD/CAM and perhaps supply chain, and NT servers for European subsidiaries and some corporate functions like HR. As these applications migrate to object technology with increasing functional and geographic distribution and ever more complex interoperability, software tools like ANTs will enable dramatic performance increases over preemptive threading technology.

Our experience in the development and application of parallel processing over the past thirty years, and in distributed object-oriented systems over the past ten, tells us that ANTs represent a significant breakthrough technology with a very promising future in the Age of JAVA. We have long known that the real world operates concurrently; physicists tell us that time is all that keeps everything from happening at once. The object paradigm offers an opportunity to map real-world business objects, such as invoices, into computer-stored, manipulated and distributed electronic objects in a natural way. ANTs allow us to escape the constraints of the operating system’s built-in deterministic, sequential ordering and to map onto our computer models the partially ordered real-world activities that make up actual business processes.

The author wishes to acknowledge many helpful discussions with Clifford Hersh, Vice President for Product Development at CHoPP Computer Corporation, and with Dr. Richard Wright, recently of Lawson Software and now with The Pitts Library at Emory University, Atlanta, GA.

Peter Patton was formerly Vice President of International Operations for the Lawson Software group, based in Minnesota. Patton has held numerous prestigious high-technology positions, including chief information officer (CIO) of the University of Pennsylvania and lead IT consultant to the National Technology Transfer Center, NASA’s Jet Propulsion Laboratory, and Dun and Bradstreet’s DataQuest Division. He was founding director of the Minnesota Supercomputer Institute, vice president and director of Parallel Processing Architecture at MCC, and director of computing at the University of Minnesota. He is the author of 16 books and 86 journal articles. Patton holds an A.B. in Engineering and Applied Physics from Harvard, an M.A. in Mathematics from the University of Kansas, and a Ph.D. in Aerospace Engineering, summa cum laude, from the University of Stuttgart in Germany.

For more information, see http://www.antssoftware.com/
