In general, there are two approaches for addressing the synchronization problem. Conservative approaches ensure that events are processed in timestamp order by waiting until it is ``safe'' to process each event (i.e., no events with smaller timestamps will be received later). Optimistic approaches allow events to be processed out of order, but provide a mechanism to detect and erase incorrect event computations.
The seminal conservative approach is the Null Message algorithm
developed by Chandy and Misra [24] and
Bryant [15]. In this algorithm, each process sends
a ``null message'' to its neighboring processes after executing each
event. The ``null message'' contains a timestamp that serves as a
``promise'' that the sending process will not later send a message
with timestamp smaller than
to the receiving process. If the next
event to be processed has a timestamp that is greater than any of the
``null message'' events, that process must wait until it
receives the next wave of ``null messages'' such that all ``null
message'' timestamps are greater than the timestamp of the next event
to be processed. It has been shown that this algorithm avoids deadlock
provided that there is no cycle in which a message could traverse
without incrementing its timestamp (i.e., timestamp increments must be
non-zero).
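To make the ``safe'' condition concrete, the following is a minimal sketch in C of the per-event test an LP might apply, assuming a hypothetical LP that tracks one channel clock (the latest promised timestamp) per input channel; the names lp_t, channel_clock, and safe_to_process are illustrative and not drawn from any particular system.

\begin{verbatim}
#include <stdio.h>

#define NUM_CHANNELS 3

typedef struct {
    /* latest timestamp promised (via message or null message) per channel */
    double channel_clock[NUM_CHANNELS];
} lp_t;

/* An event with timestamp ts is safe to process only if no input channel
 * can later deliver a smaller timestamp, i.e., ts <= every channel clock. */
static int safe_to_process(const lp_t *lp, double ts)
{
    for (int i = 0; i < NUM_CHANNELS; i++)
        if (ts > lp->channel_clock[i])
            return 0; /* must block until more null messages arrive */
    return 1;
}

int main(void)
{
    lp_t lp = { .channel_clock = { 5.0, 7.5, 6.0 } };
    printf("event @ 4.0 safe? %d\n", safe_to_process(&lp, 4.0)); /* prints 1 */
    printf("event @ 6.5 safe? %d\n", safe_to_process(&lp, 6.5)); /* prints 0 */
    return 0;
}
\end{verbatim}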
If zero timestamp increment messages are allowed, then a dependency cycle among LPs may develop along which simulated time is not incremented and, consequently, the Null Message algorithm is unable to determine which LP should be allowed to process its next event. While refinements to the original Null Message algorithm have been made to mitigate the deadlock problem [42], all conservative mechanisms still require a significant amount of ``LP slackness'' to achieve performance. By this we mean that LPs cannot be too tightly coupled; otherwise, sequential execution will ensue [45].
As a workaround to this problem, developers either write applications in such a way that the timestamp increment, or lookahead, between LPs is maximized, or the underlying system helps the developer maximize lookahead through techniques such as Path Lookahead [72] and Critical Channel Traversing [107]. Conservative systems such as SSFNet and daSSF [4,5,27,28] have used similar techniques to efficiently simulate up to 100,000 TCP/IP flows on a shared-memory multiprocessor system.
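The importance of nonzero lookahead can be seen in a minimal sketch of a hypothetical two-LP cycle exchanging null messages, where each LP promises its current channel clock plus its lookahead: with nonzero lookahead the channel clocks climb every round, while with zero lookahead they never advance. The run_cycle helper below is illustrative only.

\begin{verbatim}
#include <stdio.h>

static void run_cycle(double lookahead, int rounds)
{
    /* channel clocks on the two links of a two-LP cycle */
    double clock_ab = 0.0, clock_ba = 0.0;
    for (int r = 0; r < rounds; r++) {
        clock_ab = clock_ba + lookahead; /* LP A's null message to LP B */
        clock_ba = clock_ab + lookahead; /* LP B's null message to LP A */
    }
    printf("lookahead %.1f: clocks reach %.1f after %d rounds\n",
           lookahead, clock_ba, rounds);
}

int main(void)
{
    run_cycle(1.0, 5); /* clocks advance to 10.0 */
    run_cycle(0.0, 5); /* clocks stay at 0.0: the zero-increment cycle */
    return 0;
}
\end{verbatim}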
So-called ``federated'' approaches, such as the DOD High Level Architecture [55] and RTIKit [37], utilize a conservative synchronization mechanism to support interoperability among a group of sequential or parallel simulators. Because a conservative protocol is used, these techniques fall prey to the same problems as conservative parallel simulators. Moreover, the performance of any federated approach is inherently limited by the performance of the slowest simulator.
The seminal work in optimistic approaches is the Time Warp mechanism developed by Jefferson and Sowizral [59,60]. In contrast to the Null Message algorithm, the Time Warp mechanism uses a detection-and-recovery protocol to synchronize the computation. Any time an LP determines that it has processed events out of timestamp order, it ``rolls back'' those events, and re-executes them. Rolling back an incorrect event computation requires one to (i) undo any changes the incorrect computation made to the state of the LP, and (ii) ``unsend'' any messages that it sent. The first task is accomplished by periodically saving the state of the LP, and restoring it when a rollback occurs. Unsending previously sent messages is accomplished by sending an anti-message for each message that is to be canceled. When received, the anti-message deletes (annihilates) the previously sent message. If that message had already been processed, the LP is first rolled back (possibly generating additional anti-messages) prior to annihilation. Recursively applying this ``roll back and send anti-message'' cycle will eventually erase all effects of the ``unsent'' message.
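The following is a compact sketch in C of this rollback procedure, assuming a hypothetical LP whose state is a single integer, whose snapshots and sent messages are kept in fixed-size arrays, and whose anti-messages are simply printed; all names (lp_t, rollback, and so on) are illustrative rather than taken from any existing Time Warp system.

\begin{verbatim}
#include <stdio.h>

#define MAX 64

typedef struct { double ts; int state; } snapshot_t;
typedef struct { double ts; } msg_t;

typedef struct {
    double now;                        /* local virtual time */
    int    state;
    snapshot_t snaps[MAX]; int nsnaps; /* periodic state saves */
    msg_t      sent[MAX];  int nsent;  /* output queue, used for
                                          generating anti-messages */
} lp_t;

/* Roll back to the straggler's timestamp ts: restore the newest snapshot
 * not newer than ts, then "unsend" every message sent after ts by
 * emitting a matching anti-message (printed here for illustration). */
static void rollback(lp_t *lp, double ts)
{
    while (lp->nsnaps > 1 && lp->snaps[lp->nsnaps - 1].ts > ts)
        lp->nsnaps--;
    lp->state = lp->snaps[lp->nsnaps - 1].state;
    lp->now   = lp->snaps[lp->nsnaps - 1].ts;

    while (lp->nsent > 0 && lp->sent[lp->nsent - 1].ts > ts)
        printf("anti-message for ts=%.1f\n", lp->sent[--lp->nsent].ts);
}

int main(void)
{
    lp_t lp = { .now = 10.0, .state = 42,
                .snaps = { {0.0, 0}, {5.0, 17}, {10.0, 42} }, .nsnaps = 3,
                .sent  = { {6.0}, {9.0} },                    .nsent  = 2 };
    rollback(&lp, 5.5); /* a straggler with timestamp 5.5 arrives */
    printf("restored: now=%.1f state=%d\n", lp.now, lp.state);
    return 0;
}
\end{verbatim}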
In order to reclaim memory (i.e., fossil collection) of processed messages, snapshots of the LP's state, and to allow operations that cannot be rolled back (e.g., I/O), global virtual time (GVT) is defined. GVT is a lower bound on the timestamp of any rollback that might later occur, and is defined as the smallest timestamp of any unprocessed or partially processed message or anti-message. Irrevocable operations occurring at simulated times older than GVT can be performed, and storage carrying a timestamp older than GVT can be reclaimed.
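As a sketch of how GVT is used, the C fragment below computes the minimum over a hypothetical snapshot of pending message timestamps; a real Time Warp system computes this bound with a distributed GVT algorithm rather than a simple loop, so the code only illustrates the definition.

\begin{verbatim}
#include <stdio.h>

/* GVT is the minimum timestamp over all unprocessed or partially
 * processed messages and anti-messages; no rollback can occur before it. */
static double compute_gvt(const double *pending_ts, int n)
{
    double gvt = pending_ts[0];
    for (int i = 1; i < n; i++)
        if (pending_ts[i] < gvt)
            gvt = pending_ts[i];
    return gvt;
}

int main(void)
{
    double pending[] = { 12.0, 7.5, 9.25 };
    double gvt = compute_gvt(pending, 3);
    /* Snapshots and (anti-)messages with timestamp < gvt may now be
     * fossil collected, and I/O older than gvt may be committed. */
    printf("GVT = %.2f\n", gvt);
    return 0;
}
\end{verbatim}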
Of the two, optimistic approaches were viewed as the more
promising for a number of reasons. First, optimistic
approaches are able to ``uncover'' more parallelism in the application
and can theoretically achieve greater performance. Conservative mechanisms
``block'' on the remote possibility, no matter how small, that some
event might affect another. Optimistic methods do not suffer from this
drawback since they continually process events until an error in
causality actually happens. Assuming fixed rollback costs,
Lipton and Mizell [69] have shown that Time Warp can
outperform the Chandy/Misra algorithms by an arbitrary amount: given
$p$ processors, Time Warp can outperform the Null Message algorithm by
a factor of $p$. Lipton and Mizell further show that the opposite is
not true: the Null Message algorithm can only outperform Time
Warp by a constant factor.
Critics of optimistic methods counter that, in practice, Time Warp systems incur excessive rollback costs and state-saving overheads, and they cite these as reasons to use conservative techniques. In particular, it has long been believed in the parallel simulation community that applications such as packet-level network models are not well suited to optimistic methods because they incur a high state-saving overhead relative to the processing time of a single event.
Another source of concern with Time Warp systems is memory utilization. The basic unit of memory can be generalized to a single object called a buffer [30]. A buffer contains all the necessary event and state data for a particular LP at a particular instant in virtual time. Because the optimistic mechanism mandates support of the ``undo'' operation, these buffers cannot be immediately reclaimed. Several techniques have been developed to reduce both the number and the size of the buffers required to execute a Time Warp simulation. These techniques include infrequent state saving [6,68], incremental state saving [51,99], and, most recently, reverse computation [20].
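To illustrate the idea behind reverse computation, the sketch below pairs a forward event handler with a reverse handler that undoes it by inverting the computation rather than restoring a saved copy of the state; the queue-length example and all names are hypothetical.

\begin{verbatim}
#include <stdio.h>

typedef struct { int qlen; } state_t;

/* Forward event handler: an arrival increments the queue length. */
static void arrival_forward(state_t *s) { s->qlen++; }

/* Reverse event handler: undoes the forward handler by inverting the
 * computation, so no copy of the state needs to be saved. */
static void arrival_reverse(state_t *s) { s->qlen--; }

int main(void)
{
    state_t s = { .qlen = 3 };
    arrival_forward(&s);           /* optimistically process the event */
    arrival_reverse(&s);           /* straggler arrived: roll it back  */
    printf("qlen = %d\n", s.qlen); /* back to 3, no snapshot required  */
    return 0;
}
\end{verbatim}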
Rollback-based protocols such as Artificial Rollback [67] and Cancelback [61] have demonstrated that Time Warp systems can execute in no more memory than the corresponding sequential simulation; however, performance suffers. Adaptive techniques [29,30], which adjust the amount of memory dynamically, have been shown to improve performance under ``rollback thrashing'' conditions and to reduce memory consumption to within a constant factor of the sequential requirement. However, for small event granularity models (i.e., models that require only a few microseconds to process an event), these adaptive techniques are viewed as being too heavyweight.
In light of these findings, Time Warp programs typically allocate much more memory than is required by the sequential simulation. In a recent performance study of retrofitting a large sequential Ada simulator for parallel execution, SPEEDES consumed 58 MB of memory where the corresponding sequential simulator consumed only 8 MB [94].
Finally, because of the communication overhead of the regular GVT
calculations needed to support fossil collection, scalable Time Warp
performance on a large network of workstations (NOW) (i.e., 30 or more
nodes) has never been achieved.
Because of this long list of drawbacks, the pendulum has swung back in the direction of conservative approaches to synchronization in the network modeling and simulation domain. State-of-the-art systems such as GloMoSim [72] and SSFNet [27,28,96] support advanced conservative techniques. We believe, however, that our new reverse computation techniques will break these barriers for optimistic parallel simulation, opening the way for optimistic mechanisms to outperform their conservative counterparts, although a complete performance analysis must be done before any conclusions can be drawn. That performance analysis is one of the major research foci of this proposal.