Enhancing a Modula-2 compiler to help students learn interactively within the Ceilidh system

Introduction
Previous work
Compiler implementation and maintenance
Encouraging interactive learning during program development
Exploiting traditional code optimization algorithms to yieldsemantic errors
Conclusions and further work
References

Gaius Mulley, Keith Verheyden
Department of Computer Studies, University of Glamorgan
Treforest, Mid Glamorgan, CF37 1DL, UK

ABSTRACT

A compiler has been modified to provide extra warnings and error messages appropriate for level one students. The compiler warnings are generated post intermediate code optimization in an attempt to maximize the knowledge a compiler has about the program source. This paper summaries those enhancements, discusses their usefulness and concludes with some further suggestions for future work in this area.

Introduction

The University of Glamorgan Department of Computer Studies has been using Modula-2 as a level one teaching language for the last six years. Three years ago the department started to use Linux as an alternative operating system on IBM/PC compatibles as this integrated well with the existing SunOS and MSDOS/Windows 3.1/Novell operating systems. The Modula-2 environment ran under MSDOS together with the computer based learning package, CLEM. This academic year, the University has adopted Windows NT in place of Windows 3.1 and MSDOS. Although the existing Modula-2 MSDOS environment works with Windows NT the level one HND teaching team decided to migrate to the Ceilidh(3) computer based learning package under the Linux operating system(10).

Ceilidh provides interactive guidance for students undertaking assignments or problem solving exercises and Linux provides the security necessary for running the coursework marking elements of Ceilidh. It also allows the teaching team to easily modify the level one programme and improve the learning material. The automated guidance is very helpful with increased student numbers and it operates by performing a static syntax analysis on the student programs.

The Ceilidh system provides tools for static syntax analysis checking built from a combination of standard UNIX utilities and custom programs. The modifications necessary to make Ceilidh operate with Modula-2 are discussed elsewhere(10). It was decided that a Modula-2 compiler that was written locally should be used with the Ceilidh system. The reasons for this were ease of maintenance and that it could be easily modified to complement the Ceilidh system.

Previous work

Although static syntax analysis is helpful when developing programs, the teaching team felt that there was an opportunity to provide more semantic guidance for the students. The semantic analysis would be provided by adapting the compiler and exploiting its knowledge about the source program. Although this approach has been taken before, notably by Johnson(7) with the C tool lint, there are some important differences.

The lint tool examines C source programs, detecting a number of bugs and obscurities. It enforces the type rules of C more strictly than some C compilers. It may also detect a number of wasteful, or error prone constructions which nevertheless are, strictly speaking, legal. It is implemented by a number of programs, one of which is a version of the Portable C Compiler(9)(8) which performs lexical and syntax analysis on the input text, constructs and maintains symbol tables, and builds trees for expressions. This program produces an intermediate file which consists of lines of ASCII text. Another program consumes the intermediate file comparing all of the definitions, declarations, and uses for consistency.

The weakness with this approach is that some of the program transformations associated with code optimization are never applied. The transformations that are commonly used to optimize code actually do so by understanding the behaviour of certain code boundaries. This knowledge can be used, not only to improve code quality, but also to check the user’s use of source constructs.

The approach presented here is to perform some of the semantic checking post intermediate code optimization and to add explicit warnings for the benefit of students undertaking their first programming course. These warnings comment on programming style and are enabled via a command line switch (-students).

Compiler implementation and maintenance

The design and implementation of the compiler commenced in 1987 as part of a research project(1)(11) and was based on the 2nd edition Programming in Modula-2(13). As far as level one students are concerned, the major difference between the 2nd edition and later editions is that identifiers must explicitly be exported from the definition module. It is believed that this is useful when learning about definition and implementation modules, requires discipline from the student and reinforces the reason for the definition module. It also provides the user of a definition module with a brief list of all exported identifiers.

Another attractive feature from the compiler implementor’s viewpoint is the ease at which Modula-2 can be integrated with C and thus the UNIX operating system. Identifiers which are qualified on export have the module name prefixed to them in the assembler and object file. Identifiers which are not qualified on export do not have the module name prefixed, thus definition modules can be easily written for corresponding C files.

The compiler has been implemented using a recursive descent parser and produces quadruples as intermediate code. The 80[345]86 code generator is very simple but mature and reliable, the compiler has been used by staff and final year students since 1994.

The sources to the compiler are maintained under the control of CVS(4) thus logging all changes and conflicts which might occur during development and maintenance. When the compiler was originally installed under Linux a complete set of sources were "checked out", creating a new branch so that independent compiler development, such as code optimization, might continue in parallel. This strategy works very well, shielding the student version from any major changes, but allowing minor bug fixes to be confidently and swiftly applied. Once any change is applied to the compiler, a regression test suite is run before installing the new version.

Encouraging interactive learning during program development

Level one students require unambiguous error messages from a compiler. From the teaching teams experience students often request help in understanding the error messages from compilers. All too often compilers give terse error messages or error messages which could give more information about what is expected. Two examples from a well known commercial compiler are parameter mismatch error or undeclared identifier in export list of module. Both could be more informative. The first message could explain exactly which parameter is at error and what type of identifier is expected. The error message should also show source code fragments of both the procedure declaration and offending procedure call. The second message could at least include the name of the offending identifier!

The compiler records the position of each declared identifier and where they are first used, thus when a problem arises with a symbol it can display both the declaration or first usage together with the current position for any symbol. Extra special care is taken over procedure parameter errors as these require the student to examine at least two fragments of source code. For example when the program module shown in figure 1 is compiled the student is given a GNU emacs compatible error stating: the file and line number of the procedure declaration, the program module call, a synopsis of the expected formal parameter, the type and name of the identifier being passed illegally. When used with Ceilidh, the compiler is run with the -verbose option indicating that all errors should show the corresponding fragments of source files.

Image grohtml-203941.png

Figure 1: source code error in procedure call

Over a number years it has been observed that the same student mistakes occur, typically these might be: using variables before they have been initialized or never using variables and parameters. There are at least two sub classes of the these problems. Sometimes they occur because duplicate variables are declared at different scope levels and at other times the student might declare two variables with the same name but different alphabetical case. At other times, keywords are frequently typed in lower case. Ironically this last error can be easily overlooked by a new programmer, yet can be so easily found by a compiler. While some of these problems are technically legal they are generally considered bad programming practice. All of these cases are detected by this Modula-2 compiler and appropriate warnings are generated. The compiler only terminates compilation when an illegality is found or when it detects a variable being used before being initialized. Table 2 shows some examples of common problems and associated compiler errors and warnings.

Exploiting traditional code optimization algorithms to yieldsemantic errors

Another category of semantic error that can be found automatically is that of the infinite loop. Lint was able to find simple infinite loops by searching for special case loops such as while (1) and for (;;), but it was not able to determine whether a function is never called or whether more complex loops are infinite. It is necessary to have more source program knowledge to discover semantic problems in these situations.

The approach presented here is to perform further semantic analysis after code optimization as extra source program knowledge will be derived by these algorithms. For example the Modula-2 compiler performs a modified common subexpression algorithm presented in Aho(2) during the code optimization phase which combines constant folding, strength reduction and copy propagation. These optimization techniques are exploited together with jump and branch folding to find further infinite loops. The compiler then checks for any backwards referencing quadruple and then searches this sublist for a conditional branch whose destination address is beyond the bottom of the loop. The conditional branch operands are examined for any modification within the loop. Currently this algorithm is limited to operands which are leaves rather than expression trees, nevertheless, it has been fruitful in finding level one programming errors (45 infinite loops found in 953 programs) and also dead code in the lexical analysis component of the compiler!

Image grohtml-203942.png

Figure 2: example of errant code which might result in an infinite loop

The example in figure 2 is missing the statement RETURN( TRUE ) at † which might result in an infinite loop at runtime. Superficial infinite loop analysis at compile time might show that the loop condition variable is being altered and thus this loop is not a candidate for a warning. However, if the loop is broken down into quadruples and the code optimization techniques described above are applied, the loop will be transformed into the following intermediate pseudo code.

start:
   if p#NIL goto endloop
   if p#q goto elselabel
   WriteString(’found item’)
   WriteLn
   goto start
elselabel:
   p := p^.Next
   goto start
endloop:

At this point, an infinite loop is detected easily since the condition variable, p , is never altered in the first loop.

It is of course possible to fool the loop analysis by subverting pointers, but these cases are very rare and also bad programming practice. However even in this case, analysis would yield an unwarranted warning causing the user to reexamine the code.

Conclusions and further work

The addition of compiler warnings were, with hindsight, a more important benefit to the students than originally perceived by the authors. The warnings were added throughout the semester, as required, in an attempt to stop the students from producing sloppy code. These warnings are generally easy to implement and include checking for identifiers which look like keywords, same name variables in different visible scopes, variables and parameters declared but never used and variables used before being initialized. The choice was made only to terminate compilation with this last warning classification as it would result in an errant runtime program. While the warning messages might be too strict for proficient users, they do cut out some of the bad habits which can be all too easily "learned" during level one programming.

Although the regression tests were useful a more formal stress testing mechanism could be adopted(5)(12)(6). Future work could address the issue of detecting more complex infinite loops and dereferencing NIL valued pointers at compile time.

References

1.
2.		A.V. Aho, R. Sethi, and J.D. Ullman, Compilers: Principles Tools and Techniques, Addison Wesley (1986).
3.		S. Benford, E. Burke, E. Foxley, N. Gutteridge A.M. Zin, The Ceilidh System: A General Overview, Learning Technology Research, Computer Science Department, Nottingham University (1994).
4.		B. Berliner, “CVS II: Parallelizing Software Development,” Proceedings of the Winter 1990 USENIX Conference, pp. 141-352, Washington DC, USA (January 22-26, 1990).
5.		C.J. Burgess, “The Automated Generation of Test Cases for Compilers,” Software Testing, Verification and Reliability 4, pp. 81-99 (1994).
6.		C.J. Burgess, M. Saidi, “The automatic generation of test cases for optimizing Fortran compilers,” Informatuin and Software Technology 38, pp. 111-119 (1996).
7.		S.C. Johnson, “Lint, a C Program Checker,” Comp. Sci. Tech. Rep. No. 65 (1978). updated version TM 78-1273-3.
8.		S.C. Johnson, “A Portable Compiler: Theory and Practice,” Proc. 5th ACM Symp. on Principles of Programming Languages, pp. 97-104 (January 1978).
9.		S. C. Johnson and D. M. Ritchie, “UNIX Time-Sharing System: Portability of C Programs and the UNIX System,” Bell Sys. Tech. J. 57(6), pp. 2021-2048 (1978).
10.		S.F. Lewis, “Developing a Modula 2 course for Ceilidh,” CTI Computing 5th Annual Conference on Teaching of Computing, pp. 126-128, Dublin (1997).
11.		R.J. Loader and G.P.C. Mulley, “Performance Prediction of Network Protocol Engines with Simulation,” Tenth Annual Microcomputer Applications Workshop, University Of Strathclyde, Glasgow (8-10th September 1986).
12.		C. Pronk, “Stress Testing of Compilers for Modula-2,” Software Practice and Experience 22(10), pp. 885-897 (October 1992).
13.		N. Wirth, Programming in Modula-2 2nd Edition, Springer Verlag (1982).

Image grohtml-203943.png

Table 1: source code fragments relating to warnings in Table 2

Image grohtml-203944.png

Table 2: errors and warnings relating to the code fragments in Table 1

This document was produced using groff-1.22.4.