The application of a real time systems simulation tool to protocol engine design

1. Introduction
2. Using next-event simulation for modelling a real-timeexecutive
3. The modelling of processes - the abstract machine
4. Mapping a real machine onto the abstract machine
5. Protocol engine design
6. Results
7. Conclusions
References

Mr G.P.C. Mulley and Dr R.J. Loader
Department of Computer Science University of Reading Whiteknights, PO Box 220, Reading, RG6 2AX

Published in Procedings of the International Conference on Data Communication Technology, The National Institution for Higher Education, September 1988, p240-250

1. Introduction

The design of real time microprocessor based systems is becoming increasingly complex. Such systems are now constructed from a combination of high level VLSI components, which include processors together with their associated system software. It is not easy to extrapolate from the performance measures of individual components to that of the complete system. All too often it is the benchmarking of the production system that finally yields an indication of overall throughput.

The systems of interest are assumed to be interrupt driven and require a multi tasking executive to supervise and coordinate the various processes. Previous work[7][1] has shown that in addition to the simple processor demand, system performance is limited by the necessary context switch and synchronisation required to support asynchronous processes. A real time system requiring a timeout facility introduces a further burden which reduces overall performance.

Performance prediction of real time systems is difficult due to the complex relationship of hardware measures and software factors. The interrelated nature of these factors means that solving any single problem in isolation will not necessarily deliver the desired improvement. Thus there is a need for a performance analysis tool which considers all the key factors that restrict the throughput of the system.

A design tool has been written to analyse the limitations of the proposed hardware and software configuration prior to implementation. The design tool consists of a Modula-2 compiler, assembler and simulator. The modelled simulation components, interconnections and software description are described by input files. The software can be expressed in an abstract machine code or in a subset of Modula-2. The design tool is capable of monitoring multiple processors each running an executive controlling explicit processes, interrupt and direct memory access devices.

2. Using next-event simulation for modelling a real-timeexecutive

For next event simulation[3] a system is described by a set of variables whose values are collectively known as the system state. The system state changes with time. Discrete event simulation models a system in which state changes can be represented by a collection of discrete events. The occurrence of an event implies a change in the system state but the values of the state variables remain constant between events. There is no need to take account of the time periods when the system state is constant. The next event simulation method exploits these inactive periods by repetitively advancing simulated time to that of the next event and effecting any state changes consequent upon that event.

To show the close correspondence between the technique of next event simulation and the operation of a multi tasking executive consider the simulation of a single server queue. In this problem there are two events: arrival of a job for service and end of service. The system state variables are the server state (idle or busy) and the number in this system (nj). The simulation algorithm of jobs can be expressed informally as follows:

nj := 0
REPEAT
   get next event(event)
   CASE event OF

   arrival:         nj := nj+1 ;
                    generate next arrival event

   end of service:  nj := nj-1 ;
                    server := idle


   ESAC
   IF server is idle AND nj>0
   THEN
      schedule next job for service
      generate end of service event
   END
UNTIL end of simulation

The events are recorded in time order on a queue. The calculation of the time interval to the next event can be obtained in a number of ways but frequently the intervals are drawn from a cumulative distribution using a pseudo-random number generator. The choice of distribution will depend on the system being modelled.

Lister,[5] Comer[2] and Loader[6] present a clear description of the constituents of a real time system executive or nucleus. These are an interrupt handler, a low level scheduler (or dispatcher) which selects the next system process to execute and the semaphore operations wait and signal for inter-process communication. Purser[9] discusses the executive in the context of data communication.

There is a direct correspondence between the single server queue simulation algorithm and the activity of the executive. Clearly scheduling the next job for service corresponds to low level scheduling of processes whereas events correspond to interrupts and the calling of wait and signal operations. The construction of the program code to implement the simulated nucleus functions can be made very close to that of the real system. If a language such as Modula-2 is used then the simulated nucleus functions of the dispatcher, wait and signal, would manipulate the same process data structures as a real implementation. An additional benefit is that accurate timings of system overheads such as context switching, process synchronisation and interrupt handling can easily be included in the simulation.

3. The modelling of processes - the abstract machine

The previous section described how next event simulation was applicable to model the internals of a multitasking executive. A major problem that remains is the modelling of processes which place requests upon the executive. The simulation needs a reliable means of describing process activity to generate events. This could be achieved by using a trace driven simulation technique, but this implies the use of long scripts for each process. The most compact form of script is a program. Depending on the level of detail required for the model, the program may be approximated by pseudo code, with groups of instructions coalesced into an aggregate processor demand. This aggregate will termed a profile instruction. A simulation event is associated with the beginning of the execution of a profile instruction. A profile instruction will usually correspond to one or more machine code instructions on a real processor. A separate script is written for every process and the simulated scheduler successively interprets a process one profile instruction at a time. A time of completion (or time of new event) must be calculated for each instruction.

A design prototype evolves through a series of stages or levels. The later the stage the greater the detail included in any model. Ideally the final model should correspond closely with reality. An abstract machine was introduced to provide a systematic means of modelling all the design stages. The instruction set of the abstract machine can be summerised as follows:

	(i)		Executive calls; wait, signal, initprocess, killprocess and initsemaphore.
	(ii)		Request for the processor; pr.
	(iii)		Instructions which implement assignment, iteration and conditional statements.

The request for the processor, pr, represents an average load of a block of "real" machine code instructions whose functionality is not modelled in detail at the current level.

Device characteristics such as speed and connectivity are specified in a system description language. Within this language three types of simulation device exist; clocks, DMA devices and character devices. The devices are accessible to the abstract processor by explicit input/output instructions. The processor communicates to an model device controller by sense status polling or a ready interrupt.

4. Mapping a real machine onto the abstract machine

The previous section defined a means of specifying a process in terms of abstract machine profile instructions. It remains to define the time interval associated with interpreting a single profile instruction.

The profile instructions can be classified into two groups: executive primitives and user instructions. A profile instruction is either an executive primitive such as wait, signal or more basic instructions which together form assignments or loops. The times associated for each executive primitive can be derived by examining real implementation code for wait and signal. The cost is expressed in terms of abstract processor clock cycles. Thus a profile instruction cost can be converted to an equivalent time interval by using a given processor speed. Direct memory access cycle stealing is modelled by reducing the number of cycles the processor receives.

For a particular real machine, manual generation of profile instruction costs are tedious. A pr represents a sequential statement sequence (SSS). An SSS is defined as a block of code which only has one entry and exit point. The equivalent number of machine code cycles for this sequence have to be accumulated from manufacturer’s data. Clearly an automatic method is needed if the simulation is to be a practical design tool. A compiler can be adapted to produce these costs as part of its code generation phase. Thus high level source code can be translated to functionally equivalent profile instructions together with associated costs.

A compiler for a subset of Modula-2 has been written to fulfill these specifications. Typical code generation techniques will produce lists of SSS’s and therefore it is straightforward to automatically produce the necessary execution timings and costs. The current compiler uses costs based on the Motorola MC68000 but it could easily be altered to generate different processor costs.

The simulation abstract machine must be configured with real system overheads. Hardware characteristics such as processor speed, instruction speed and rate of device data transmission, must be mapped onto the abstract machine. These overheads are entered using the system description language and the characteristics are derived from the design specification of real components.

5. Protocol engine design

This section describes the application of the general purpose tool to model a particular multiprocessor system. The network protocol engine poses difficult performance constraints and thus makes a suitable case study for examination. In addition previous work at Reading by Martin[8] involved designing and measurement of a Cambridge Ring network protocol engine. The protocol engine was non intrusively monitored and each component of the protocol software was measured to determine the processor load on the system. These measurements can thus be used to validate the simulation tool.

Following the ISO 7 Layer model,[4] network protocols are defined in terms of the exchange of messages between peer layers which use the services of an adjacent layer. Thus the purpose of each layer is to offer a set of services to the higher layers, shielding superiors from the details of how the offered services are actually implemented. In a practical implementation of the protocol engine the messages from a given layer pass vertically down the protocol tower across the physical connection and up a similar tower to a corresponding peer layer. At the lowest level there is physical communication with another machine and only at this point are data really transferred. Below the transport layer the asynchronous character of the protocols lead to the implementation of each layer as an executive process. Thus pseudo parallelism is achieved through interleaving input, output and processing. The coordination of pseudo parallelism is solved by a multitasking executive.

The main characteristics of the protocol engine were:

	(i)		The software/hardware interface to the Cambridge Ring access logic were driven by DMA.
	(ii)		The protocol software was designed in a high level language and hand compiled into assembly language.
	(iii)		An 8 Mhz Z80 processor was employed to execute the software.

The simulation model consisted of two protocol engines communicating across the Cambridge Ring network, each protocol engine modelled the mini packet layer and basic block layer. The characteristics of the the Cambridge Ring hardware for exchanging messages across a network is specified in the system description language. The Cambridge Ring network was assumed error free and the network model used a DMA interface at either machine. The protocol software was translated by hand into equivalent profile instructions with associated timings derived from the Zilog Z80 processor manual.[10]

Image grohtml-204771.png

Model of Two Cambridge Ring Protocol Engines

The simulation tool allows the system designer to explore speculative hardware and software configurations. One obvious change was to substitute the Z80 for the MC68000 and to rewrite the system software in Modula-2 instead of assembler. The enhanced protocol engine considered had the following specifications:

	(i)		Protocol software written in Modula-2 and compiled into MC68000 machine code.
	(ii)		A 16 Mhz processor with a 16 bit interface to 125ns memory.
	(iii)		Identical ring access logic to the previous design.

6. Results

The results shown in Table 1 are the performance figures for two protocol engines exchanging a range of constant size messages. Martin[8] measured software activity in the protocol engine using a Hewlett Packard HP64000 workstation together with Z80 in-circuit emulation and software performance analyzer. The measurements are extracted from his thesis and are indicated in bold. The table contains a break down of processor load in each functional component of the protocol engine. A range of message sizes is shown together with the throughput and processor cost.

The second figure in each box is the validation result from the simulation model of the Z80 protocol engine. The maximum deviation of simulated results from measured data is 4%. The third figure is the simulation result from the enhanced protocol engine design employing the MC68000 processor to execute high level language protocol software.

It is interesting to compare the performance of the Z80 protocol engine against the predicted MC68000 protocol engine. The Modula-2 compiler produced MC68000 32 bit arithmetic operations throughout, whereas the hand produced Z80 code used 8 bit operations if at all possible. Therefore during queue manipulation the Z80 is forced to use expensive 16 bit and indexed operations for pointer manipulation, but during the protocol operation most instructions manipulate 8 bit data. Thus the MC68000 is more efficient in the queue manipulation routines because of the faster instructions needed for pointer manipulation, but it is less efficient in executing the basic block protocol because it is using 32 bit operands where 8 bit operands would suffice. During the transmission of small packet sizes the protocol engine is processor bound and the overall increase in speed of the 16 Mhz MC68000 against the 8 Mhz Z80 yields a throughput gain of approximately 450%. The transmission of large packets in the protocol engine results in the system becoming ring input/output bound, thus the increase in processor speed makes little difference in throughput.

7. Conclusions

A design tool has been written in Modula-2 to investigate the performance of real time systems. The simulation models an executive with explicit processes, interrupts, direct memory access devices and process synchronization. The executive processes may be distributed between different physical processors. Processes may be expressed in Modula-2 or abstract profile code. Simulation of an existing system has shown the tool to be reliable. The tool has predicted the performance of a enhanced protocol engine design.

Recent work has examined higher network layers on the novel protocol engine. Future work will extend the simulation device model to include disks and a memory management unit. These additions widen the use of the tool as it could be used during research in both networks and operating systems. With a graphical interface the design tool could form the basis of an interactive teaching tool for real time systems.

References

1.		Brereton, O.P., “Performance Figures for Message Passing over a Cambridge Ring,” Software Practice and Experience 12(1), pp. 95-96, John Wiley and Sons, New York (January 1982).
2.		Comer, D., Operating System Design: The XINU Approach, Prentice-Hall International, Englewood Cliffs (1984).
3.		Fishman, G.S., Principles of Discrete Event Simulation, John Wiley and Sons, New York (1978).
4.		ISO, “Reference Model of Open Systems Intercommunication,” International Standards Organisation ISO/TC97/SG16/N227 (1979).
5.		Lister, A.M., Fundamentals of Operating Systems 2nd Edition, The MacMillan Press Ltd (1979).
6.		Loader, R.J., “Introduction to Systems Programming PART 1,” Internal Report Department Of Computer Science, University of Reading (July 1985).
7.		Loader, R.J. and Martin, P.A., “A High Performance Concentrator for the Cambridge Ring” in Ring Technology Local Area Networks, ed. Dallas I.N. & Spratt E.B., North - Holland, Amsterdam (1984).
8.		Martin, P.A., The Exploitation of Cambridge Ring Performance Through the Design and Analysis of Intelligent Access Logic, PhD. Thesis. University of Reading (1985).
9.		Purser, M., Data Communications For Programmers, pp. 72-92, Addison-Wesley Publishing Company Inc, Wokingham England (1985).
10.		Zilog, Z80 - CPU Technical Manual, Zilog Inc (1977).

Protocol engine transmitting a range of constant length packets.

Table 1

This document was produced using groff-1.22.4.