Experiences gained from producing a compiler to guide first year programming students

1. Introduction and previous work
2. Improving compiler error descriptions and including useful warning messages
3. Classification of error and warning messages within the compiler
4. Conclusion
References

Stuart Lewis, Gaius Mulley
Department of Computer Studies, University of Glamorgan
Treforest, Mid Glamorgan, CF37 1DL, UK

ABSTRACT

A Modula-2 compiler has been modified to provide additional warnings and error messages appropriate for students undertaking their first programming course. This paper justifies the inclusion of these warning messages and shows their statistical relevance.

1. Introduction and previous work

Published in the 5th Annual Conference on Teaching of Computing, Dublin, pp. 129-131, 1997.

The University of Glamorgan Department of Computer Studies has used Modula-2 as its level one teaching language for the last six years. Three years ago the department started to use Linux as an alternative operating system on IBM PC compatibles, as it integrated well with the existing SunOS and MSDOS/Windows 3.1/Novell systems. The Modula-2 environment ran under MSDOS together with the computer-based course management system, CLEM. This academic year the University has adopted Windows NT in place of Windows 3.1 and MSDOS. Although the existing Modula-2 MSDOS environment works with Windows NT, the level one HND teaching team decided to migrate to the Ceilidh(1) computer-based learning package under the Linux operating system(7). The Ceilidh environment provides interactive guidance for students, Linux provides the security necessary for automated marking, and X Windows and xemacs give the students a GUI environment.

It was noticed that students make the same common mistakes when learning their first programming language. In Modula-2 in particular they tend to: type keywords in lower case; declare more variables than necessary; declare variables with the same name in different scopes and then become confused about which variable is being referenced; put types in place of variables and variables in place of types; attempt to pass variables of the wrong type as procedure parameters; use indices outside a for loop; and forget to alter indices inside while and repeat loops. Many students with previous experience of other languages that run on personal computers (C, Pascal, BASIC) are initially discouraged by the tight type checking and case sensitivity of Modula-2.

Generally, compilers are not very helpful in explaining why an error has occurred. While an error message may make sense to a mature programmer familiar with the language, it can be improved to help a novice who is learning not only programming but also the particular language. Ideally a compiler should not only identify an error correctly but also help the user to fix the problem. This might be done by issuing warnings about bad programming practice, suggesting a likely cause of the mistake, or displaying the relevant source code containing the error.

Previous work has been performed in checking programming style(6) and warnings, notably by Johnson(3) with the C tool lint. Our work differs from Johnson’s in that lint does all its checking prior to the Portable C Compiler(5)(4) code optimization. We believe this to be a weakness, as some of the transformations commonly used to optimize code do so by understanding the behaviour of certain code boundaries. This knowledge can be used not only to improve code quality but also to check the user’s use of source constructs.

The approach presented in (8) is to perform some of the semantic checking after intermediate code optimization. These warnings comment on programming style and are enabled via command line switches (-students and -pedantic). We extend that research with further warning checks and present an analysis of statistical data based on compiler usage.

It is ironic that many optimizing compilers hold information about the behaviour of the source program, yet this is seldom passed on in the form of error or warning messages. Our goal was twofold: firstly, to adapt an existing compiler to detect these problems and, secondly, to make the errors more informative. We chose to modify a compiler built locally, as it could be easily maintained and we had access to its CVS(2) development history repository. This gave the authors confidence that changes could be rapidly applied or removed as necessary.

2. Improving compiler error descriptions and including useful warning messages

All errors and warnings from the compiler are issued in the GNU emacs error format; thus the compiler can be run interactively from within emacs and students can use the emacs next-error facility. Alternatively, the student can click on the Ceilidh compile button, and any error messages are displayed in the Ceilidh window together with a snapshot of the offending source code.

It appears that students find certain categories of compile time errors harder to fix than others. In particular, errors which require the programmer to examine two sections of source code cause the most difficulty; these include incorrect parameter passing and declaring a procedure differently in the definition module and the implementation module. To make the error messages more informative, our compiler tries to show both sections of source code. When an incorrect procedure parameter is passed, the compiler also describes the type expected and the type actually given.
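The idea of reporting both source sections can be sketched as follows. This is only an illustration in Python, not the compiler's own code: the Location class, the parameter_mismatch function, the message wording, the "note:" line and the file names are all assumptions. The only thing taken from the GNU conventions is the "file:line:column: message" prefix, which emacs can parse to jump to each position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """A source position: file name plus 1-based line and column."""
    filename: str
    line: int
    column: int

    def __str__(self) -> str:
        # GNU style prefix, so that emacs can jump to the position.
        return f"{self.filename}:{self.line}:{self.column}"


def parameter_mismatch(call_site: Location, declaration: Location,
                       procedure: str, position: int,
                       expected: str, actual: str) -> list[str]:
    """Build one diagnostic line per source section the student must inspect."""
    return [
        f"{call_site}: error: parameter {position} of procedure "
        f"'{procedure}' has type '{actual}' but type '{expected}' was expected",
        f"{declaration}: note: procedure '{procedure}' is declared here",
    ]


if __name__ == "__main__":
    # Hypothetical file names and positions, chosen only for the example.
    for text in parameter_mismatch(
            call_site=Location("stack.mod", 42, 12),
            declaration=Location("stack.def", 7, 11),
            procedure="Push", position=2,
            expected="CARDINAL", actual="REAL"):
        print(text)
```

Emitting the declaration as a second line in the same format means the next-error facility visits the call and then the declaration, which are the two places the student has to compare.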

Within the compiler the declaration and first usage of each variable, type and constant is recorded. Furthermore, each variable has an associated read and write usage list. Thus whenever a type incompatibility is detected the compiler can issue the offending fragment together with the declarations of the types involved. The read and write usage lists serve two purposes: they are intended as a building block for code optimization, and they allow the compiler to detect variables used before being initialized, elementary infinite loops, manipulation of for loop indices, and use of these indices outside the loop.
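A minimal sketch of how such usage lists could drive the checks is given below. It is written in Python for brevity and does not reproduce the compiler's data structures: UsageLog, Loop, check and the message texts are hypothetical names. It also assumes straight-line code with non-nested loops, so it compares evaluation order and line numbers rather than performing real flow analysis.

```python
from dataclasses import dataclass


class UsageLog:
    """Per-variable read and write lists, recorded in evaluation order."""

    def __init__(self) -> None:
        # name -> list of (evaluation order, source line)
        self.reads: dict[str, list[tuple[int, int]]] = {}
        self.writes: dict[str, list[tuple[int, int]]] = {}
        self._order = 0

    def declare(self, name: str) -> None:
        self.reads[name] = []
        self.writes[name] = []

    def read(self, name: str, line: int) -> None:
        self._order += 1
        self.reads[name].append((self._order, line))

    def write(self, name: str, line: int) -> None:
        self._order += 1
        self.writes[name].append((self._order, line))


@dataclass
class Loop:
    kind: str              # "for" or "while"
    variables: list[str]   # for: the index; while: variables in the condition
    first_line: int        # line of the loop header
    last_line: int         # line of the matching END


def check(log: UsageLog, loops: list[Loop]) -> list[tuple[int, str]]:
    """Return (line, message) warnings derived only from the usage lists."""
    warnings: list[tuple[int, str]] = []

    # A read that precedes every write (or a variable that is never written).
    for name, reads in log.reads.items():
        first_write = min((order for order, _ in log.writes[name]), default=None)
        for order, line in reads:
            if first_write is None or order < first_write:
                warnings.append((line, f"'{name}' may be used before being initialized"))
                break  # one warning per variable is enough

    for loop in loops:
        body = range(loop.first_line + 1, loop.last_line)
        written_in_body = {
            name: [line for _, line in log.writes[name] if line in body]
            for name in loop.variables
        }
        if loop.kind == "for":
            for name in loop.variables:
                for line in written_in_body[name]:
                    warnings.append((line, f"for loop index '{name}' is modified inside the loop"))
                for _, line in log.reads[name]:
                    rewritten = any(loop.last_line < w < line for _, w in log.writes[name])
                    if line > loop.last_line and not rewritten:
                        warnings.append((line, f"for loop index '{name}' is used outside the loop"))
        elif not any(written_in_body.values()):
            # No variable of the while condition is written in the loop body.
            warnings.append((loop.first_line, "loop condition is never changed inside the loop"))
    return sorted(warnings)


if __name__ == "__main__":
    # Usage recorded by hand for this deliberately faulty Modula-2 fragment:
    #  1  VAR i, total, n: CARDINAL ;
    #  2  BEGIN
    #  3     FOR i := 1 TO 10 DO
    #  4        total := total + i ;
    #  5        i := i + 1
    #  6     END ;
    #  7     n := i ;
    #  8     WHILE n > 0 DO
    #  9        total := total + 1
    # 10     END
    log = UsageLog()
    for name in ("i", "total", "n"):
        log.declare(name)
    log.write("i", 3)
    log.read("total", 4)
    log.read("i", 4)
    log.write("total", 4)
    log.read("i", 5)
    log.write("i", 5)
    log.read("i", 7)
    log.write("n", 7)
    log.read("n", 8)
    log.read("total", 9)
    log.write("total", 9)
    loops = [Loop("for", ["i"], 3, 6), Loop("while", ["n"], 8, 10)]
    for line, message in check(log, loops):
        print(f"example.mod:{line}: warning: {message}")
```

The sketch records the right-hand side of an assignment before the left-hand side, so that "total := total + i" counts as a read of total ahead of its first write. A production compiler would need proper control flow information for conditionals, nested loops and procedure calls.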

The development effort required to implement these features was minimal (1 month) compared to the task of building the compiler (24 months).

3. Classification of error and warning messages within the compiler

Figure 1 shows a classification of the warning and error messages together with their statistical occurrence during the final programming assignment. This assignment required students to implement a number of functions and procedures within a skeleton implementation module. The skeleton module as supplied to the students compiles with warnings about declared but unused parameters and an infinite loop; to avoid distortion, these warnings have been removed from the table. Figure 1 was generated from the remaining errors and warnings produced by 2378 compilations. It is observed that some of the semantic warnings, namely the manipulation of a for loop index and uninitialized variable messages, are rarely generated. Nevertheless it would be dangerous to believe that these messages are less important, as these two problems can result in much debugging effort on the part of a programmer. It is interesting to note that 22% of errors are parameter related, which justifies the extra effort put into producing specialist messages.
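The percentage occurrence of each message category can be computed from a log of classified messages with a simple tally. The Python sketch below shows one way to do this; the function name percentage_occurrence and the category labels are assumptions, and the sample log is a made-up placeholder, not the data behind Figure 1.

```python
from collections import Counter


def percentage_occurrence(categories):
    """Map each message category to its share of all logged messages."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders the categories from most to least frequent.
    return {category: 100.0 * n / total for category, n in counts.most_common()}


if __name__ == "__main__":
    # Placeholder log, one entry per message issued by the compiler.
    sample_log = ["parameter", "syntax", "parameter", "undeclared identifier"]
    for category, share in percentage_occurrence(sample_log).items():
        print(f"{category}: {share:.1f}%")
```

Messages known to come from the supplied skeleton module would be filtered out of the log before the tally, as described above.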

Figure 1: Table of errors and their percentage occurrence

4. Conclusion

It is interesting that many of the building blocks (variable life lists, variable reads, variable writes) required to implement these warnings are often necessary to achieve reasonable code optimization. Thus there seems little justification for omitting these checks from modern optimizing compilers.

On a practical note, it cannot be stressed enough how useful CVS was in undertaking this development and research activity. During the first semester the student version of the compiler was branched off from the main development of the compiler, in which a new code generator was being constructed. This allowed the student compiler to receive essential bug fixes and improvements to error and warning messages. Three months later this branch was merged into the main development of the compiler.

In conclusion we believe that, although these warnings are appropriate for first year students, a compiler which proactively finds some runtime errors and guides programming style would also benefit students at other levels, particularly where they are debugging in a more hostile environment. In particular the compiler will be used next semester in a final year operating system internals course. It will be interesting to log the error messages during this course and compare their percentage occurrence against the figures reported in this paper. We also hope to repeat the statistics gathering on all assignments undertaken by first year students for comparison.

References

1. S. Benford, E. Burke, E. Foxley, N. Gutteridge, and A.M. Zin, The Ceilidh System: A General Overview, Learning Technology Research, Computer Science Department, Nottingham University (1994).

2. B. Berliner, “CVS II: Parallelizing Software Development,” Proceedings of the Winter 1990 USENIX Conference, pp. 341-352, Washington DC, USA (January 22-26, 1990).

3. S.C. Johnson, “Lint, a C Program Checker,” Comp. Sci. Tech. Rep. No. 65 (1978). Updated version TM 78-1273-3.

4. S.C. Johnson, “A Portable Compiler: Theory and Practice,” Proc. 5th ACM Symp. on Principles of Programming Languages, pp. 97-104 (January 1978).

5. S.C. Johnson and D.M. Ritchie, “UNIX Time-Sharing System: Portability of C Programs and the UNIX System,” Bell Sys. Tech. J. 57(6), pp. 2021-2048 (1978).

6. B.W. Kernighan and P.J. Plauger, The Elements of Programming Style, 2nd Edition, McGraw-Hill (1978).

7. S.F. Lewis, “Developing a Modula 2 course for Ceilidh,” CTI Computing 5th Annual Conference on Teaching of Computing, pp. 126-128, Dublin (1997).

8. G.P.C. Mulley and K. Verheyden, “Enhancing a Modula-2 compiler to help students learn interactively within the Ceilidh system,” Knowledge Transfer 97 (1997).

