Compilation of a program text proceeds by analyzing the text and thereby decomposing it recursively into its constructs according to the syntax. When a construct is identified, code is generated according to the semantic rule associated with the construct. The components of the identified construct supply parameters for the generated code. It follows that we distinguish between two kinds of actions: analyzing steps and code generating steps. In a rough approximation we may say that the former are source language dependent and target computer independent, whereas the latter are source language independent and target computer dependent.

The main module of the compiler is ORP (for Oberon to RISC Parser). It is primarily dedicated to syntactic analysis, parsing. Upon recognition of a syntactic construct, an appropriate procedure of the code generator module ORG (for Oberon to RISC Generator) is called. Apart from parsing, ORP checks for type consistency of operands, and it computes the attributes of objects identified in declarations. Whereas ORP mirrors the source language and is independent of a target computer, ORG reflects the target computer, but is independent of the source language.

Each time the syntax analyzer (parser) proceeds to read the next symbol, it does this by calling procedure Get, which constitutes the so-called scanner residing in module ORS (for Oberon to RISC Scanner). It reads from the source text as many characters as are needed to recognize the next symbol.

The recognition of symbols within a character sequence is called lexical analysis.
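
The scanner's structure follows directly from this task: skip blanks, then branch on the first character of the symbol. The following sketch illustrates the idea (a simplification; ORS.Get also handles comments, strings, and numbers of several forms, and the helper names Read, Identifier, Number as well as the symbol constants are illustrative only):

    PROCEDURE Get(VAR sym: INTEGER);
    BEGIN
      WHILE ~eot & (ch <= " ") DO Read(ch) END;  (* skip blanks and control characters *)
      IF eot THEN sym := eof
      ELSIF ("A" <= ch) & (ch <= "Z") OR ("a" <= ch) & (ch <= "z") THEN Identifier(sym)  (* name or keyword *)
      ELSIF ("0" <= ch) & (ch <= "9") THEN Number(sym)  (* numeric literal *)
      ELSIF ch = ":" THEN Read(ch);
        IF ch = "=" THEN Read(ch); sym := becomes ELSE sym := colon END
      ELSE Read(ch); sym := null  (* remaining special symbols are treated analogously *)
      END
    END Get;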

Ideally the recognition of any syntactic construct, say A, consisting of subconstructs, say B1, B2, … , Bn, leads to the generation of code that depends only on (1) the semantic rules associated with A, and (2) on (attributes of) B1, B2, … , Bn. If this condition is satisfied, the construct is said to be context-free, and if all constructs of a language are context-free, then also the language is context-free.

An exception is embodied by the notion of declarations. The declaration of an identifier, say x, attaches permanent properties to x, such as the fact that x denotes a variable and that its type is T. These properties are “invisible” when parsing a statement containing x, because the declaration of x is not itself part of the statement. The “meaning” of identifiers is thus inherently context-dependent. Context-dependence due to declarations is the immediate reason for the use of a global data structure which represents the declared identifiers and their properties (attributes). Since this concept stems from early assemblers, where identifiers (then called symbols) were registered in a linear table, the term symbol table tends to persist for this structure, although in this compiler it is considerably more complex than an array.

A complication arises from the notion of exports and imports in Oberon. Its consequence is that the declaration of an identifier x may be in a module, say M, different from where x is referenced. If x is exported, the compiler includes x together with its attributes in the symbol file of the compiled module M. When compiling another module which imports M, that symbol file is read and its data are incorporated into the symbol table. Procedures for reading and writing symbol files are contained in module ORB, and no other module relies on information about the structure of symbol files.
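
For illustration, consider a module M exporting a type and a procedure, and a client importing it (the identifiers are invented for the example; export is indicated by the mark * in the declaration):

    MODULE M;  (* the exported (marked) identifiers are written to M's symbol file *)
      TYPE Vector* = RECORD x*, y*: REAL END;

      PROCEDURE Add*(VAR a: Vector; b: Vector);
      BEGIN a.x := a.x + b.x; a.y := a.y + b.y
      END Add;
    END M.

    MODULE Client;  (* compiling Client causes M's symbol file to be read *)
      IMPORT M;
      VAR u, v: M.Vector;
    BEGIN M.Add(u, v)
    END Client.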

The compiler uses the straightforward scheme of sequential allocation of consecutively declared variables. An address is a pair consisting of a base address (held in a register) and an offset. Global variables are allocated in the module’s data section, and the respective base address register is SB (Static Base, see Chapter 6). Local variables are allocated in a procedure activation record on the stack; the respective base register is SP (Stack Pointer). Offsets are positive integers.
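
A small example indicates the resulting addresses (the offsets shown are assumed, based on 4-byte INTEGER and REAL and 1-byte BOOLEAN and CHAR; the actual values depend on the compiler's alignment rules):

    MODULE Alloc;
      VAR x, y: INTEGER;  (* global: x at SB+0, y at SB+4 *)
        b: BOOLEAN;       (* b at SB+8 *)
        r: REAL;          (* r at SB+12, the next word boundary *)

      PROCEDURE P;
        VAR i: INTEGER; c: CHAR;  (* local: offsets relative to SP, within P's activation record *)
      BEGIN i := 0; c := 0X
      END P;

    END Alloc.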

The parser is designed according to the proven method of top-down, recursive descent parsing with a look-ahead of a single symbol. The last symbol read is represented by the global variable sym. Syntactic entities are mirrored by procedures of the same name. Their goal is to recognize the specified construct in the source text. The start symbol and corresponding procedure is Module.
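
As an illustration of this scheme, the following sketch shows how a parsing procedure for a simple IF statement (without ELSIF and ELSE parts) might look. It is a simplification: in ORP this handling is part of StatSequence, and expression, StatSequence and Check stand for the corresponding parser procedures.

    PROCEDURE IfStatement;  (* sym = if on entry *)
      VAR x: ORG.Item;
    BEGIN ORS.Get(sym);
      expression(x);               (* parse the condition, described by item x *)
      ORG.CFJump(x);               (* emit a conditional forward jump with as yet unknown destination *)
      Check(ORS.then, "no THEN");
      StatSequence;                (* parse the statements of the THEN part *)
      ORG.Fixup(x);                (* fix the jump to point past the IF statement *)
      Check(ORS.end, "no END")
    END IfStatement;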

The rule of parsing strictly based on a single-symbol look-ahead and without reference to context is violated in three places. The prominent violation occurs in statements. If the first symbol of a statement is an identifier, the decision of whether an assignment or a procedure call is to be recognized is based on contextual information, namely the class of the identified object. The second violation occurs in qualident; if the identifier x preceding a period denotes a module, it is recognized together with the subsequent identifier as a qualified identifier. Otherwise x supposedly denotes a record variable. The third violation is made in procedure selector; if an identifier is followed by a left parenthesis, the decision of whether a procedure call or a type guard is to be recognized is again made on the basis of contextual information, namely the mode of the identified object.
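
The following statement fragment illustrates these cases (assuming suitable declarations: i, c, x, y variables, P a procedure, M an imported module, r a record variable, p a variable of a pointer type, and Circle an extension of its base type):

    x := y + 1;            (* x denotes a variable: an assignment is recognized *)
    P(y);                  (* P denotes a procedure: a procedure call is recognized *)
    i := M.count;          (* M denotes a module: M.count is a qualified identifier *)
    i := r.count;          (* r denotes a record variable: r.count is a field selection *)
    c := p(Circle).radius  (* p is a variable followed by "(": a type guard, not a call *)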

Besides parsing the text, the parser also checks the type consistency of objects. This is based on type information held in the global table, gained during the processing of declarations, which is likewise handled by the parsing routines. Thereby an unjustifiably large number of very short procedures is avoided. However, the strict target-computer independence of the parser is violated slightly: information about the variable allocation strategy, including alignment, and about the sizes of basic types is used in the parser module. Whereas the former violation is harmless, because the allocation strategy is hardly controversial, the latter constitutes a genuine target-dependence, embodied in a number of explicitly declared constants. Mostly these constants are contained in the respective type definitions, represented by records of type Type initialized by ORB.
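
The target-dependent information in question amounts essentially to a few size values attached to the basic types, as the following fragment indicates (the sizes shown assume a 32-bit target; ORB's actual initialization differs in form):

    intType.size := 4; realType.size := 4; setType.size := 4;   (* word-sized basic types *)
    charType.size := 1; boolType.size := 1; byteType.size := 1  (* byte-sized basic types *)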

An inherently nasty subject is the treatment of forward references in a single-pass compiler. In Oberon, there are two such cases.

The symbol table constitutes the context in which statements and expressions are parsed. Each procedure establishes a scope of visibility of local identifiers. The records registering identifiers belonging to a scope are linked as a linear list. They are of type Object. Each object has a type.
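
The following declarations sketch these data types (a simplification; the actual records in ORB carry further attributes, such as the read-only flag, nesting level, and value):

    TYPE Object = POINTER TO ObjDesc;
      Type = POINTER TO TypeDesc;
      ObjDesc = RECORD
        class: INTEGER;    (* Head, Const, Var, Par, Fld, Typ, ... *)
        name: ORS.Ident;   (* the declared identifier *)
        type: Type;        (* every object has a type *)
        expo: BOOLEAN;     (* marked for export *)
        next: Object;      (* next object of the same scope, in declaration order *)
        dsc: Object        (* for scope headers: the next (enclosing) scope *)
      END;
      TypeDesc = RECORD
        form, size: INTEGER;  (* kind of type and its size in bytes *)
        BaseTyp: Type;        (* base type of a record extension *)
        dsc: Object           (* for record types: the list of fields *)
      END;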

If a new identifier is to be added, procedure NewObj first searches the list, and if the identifier is already present, a double definition is diagnosed. Otherwise the new element is appended, thereby preserving the order given by the source text.
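
A sketch consistent with this description (simplified; the actual NewObj in ORB initializes further attributes):

    PROCEDURE NewObj(VAR obj: Object; id: ORS.Ident; class: INTEGER);
      VAR new, x: Object;
    BEGIN x := topScope;
      WHILE (x.next # NIL) & (x.next.name # id) DO x := x.next END;
      IF x.next = NIL THEN  (* id not yet present in this scope: append *)
        NEW(new); new.name := id; new.class := class; new.next := NIL;
        x.next := new; obj := new
      ELSE obj := x.next; ORS.Mark("multiple definition")
      END
    END NewObj;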

Procedures, and therefore also scopes, may be nested. Each scope is represented by the list of its declared identifiers, and the headers of the currently visible scopes are in turn linked as a list. Procedure OpenScope appends an element and procedure CloseScope removes it. The list of scopes is anchored in the global variable topScope and linked by the field dsc. It is treated like a stack. It consists of elements of type Object, each one being the header (class = Head) of the list of declared entities.
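
In a sketch consistent with the description above (Head denotes the class value of a scope header):

    PROCEDURE OpenScope;  (* push a new, empty scope *)
      VAR s: Object;
    BEGIN NEW(s); s.class := Head; s.dsc := topScope; s.next := NIL; topScope := s
    END OpenScope;

    PROCEDURE CloseScope;  (* pop the topmost scope *)
    BEGIN topScope := topScope.dsc
    END CloseScope;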

A search for an identifier proceeds first through the scope list, and for each header its list of object records is scanned. This mirrors the scope rule of the language and guarantees that if several entities carry the same identifier, the most local one is selected. The linear list of objects represents the simplest implementation by far. A tree structure would in many cases be more efficient for searching, and would therefore seem more recommendable. Experiments have shown, however, that the gain in speed is marginal. The reason is that the lists are typically quite short. The superiority of a tree structure becomes manifest only when a large number of global objects is declared. We emphasize that when a tree structure is used for each scope, the linear lists must still be present, because the order of declarations is sometimes relevant in interpretation, e.g. in parameter lists.
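
A sketch of this search (simplified; ORS.id holds the identifier most recently read by the scanner):

    PROCEDURE thisObj(): Object;
      VAR s, x: Object;
    BEGIN s := topScope;
      REPEAT x := s.next;  (* scan the object list of scope s *)
        WHILE (x # NIL) & (x.name # ORS.id) DO x := x.next END;
        s := s.dsc         (* continue in the enclosing scope *)
      UNTIL (x # NIL) OR (s = NIL);
      RETURN x
    END thisObj;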

Not only procedures, but also record types establish their own local scope. The list of record fields is anchored in the type record’s field dsc, and it is searched by procedure thisField. If a record type R1 is an extension of R0, then R1’s field list contains only the fields of the extension proper. The base type R0 is referenced by the BaseTyp field of R1. Hence, a search for a field may have to proceed through the field lists of an entire sequence of record base types.
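
A sketch of such a search (FindField is an invented name; the field names follow the description above):

    PROCEDURE FindField(rec: Type): Object;
      VAR fld: Object;
    BEGIN fld := NIL;
      WHILE (rec # NIL) & (fld = NIL) DO
        fld := rec.dsc;     (* the field list of the record type *)
        WHILE (fld # NIL) & (fld.name # ORS.id) DO fld := fld.next END;
        rec := rec.BaseTyp  (* if not found, continue in the base type *)
      END;
      RETURN fld
    END FindField;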

A symbol file is a linearized form of an excerpt of the symbol table containing descriptions of all exported (marked) objects.
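
Writing such an excerpt amounts to a traversal of the module's top scope, emitting every marked object together with its type. A sketch of this traversal (WriteObj stands for a hypothetical writer; the actual procedure Export in ORB writes considerably more information, including the type descriptions):

    obj := topScope.next;
    WHILE obj # NIL DO
      IF obj.expo THEN WriteObj(R, obj) END;  (* write only the exported (marked) objects *)
      obj := obj.next
    END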

Sample scope of semantic analysis

This list demonstrates the most typical cases; the actual scope of the semantic checks is considerably larger. Examples of such errors are sketched after the list.

  • Detect a type conflict in an expression, i.e., in X OP Y the type of X or of Y is incompatible with OP
  • Detect an illegal assignment, i.e., A := Expr, where A is not a variable
  • Detect a type conflict in an assignment
  • Detect an illegal procedure call
    • different number of arguments
    • an argument's type is neither assignable nor equivalent to the parameter's type.
    • an argument is not addressable, i.e., for a VAR parameter the argument must be a variable, not a constant, literal, expression, or procedure call.
  • Detect an illegal constant declaration, where the value of Expr is not known at compile time.
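
A small module exhibiting such errors may serve as illustration (the names are invented; each marked statement or declaration would be rejected by one of the checks listed above):

    MODULE Errors;
      CONST N = 10;
      VAR i: INTEGER; r: REAL; b: BOOLEAN;

      PROCEDURE P(VAR x: INTEGER; y: REAL);
        CONST limit = x + 1;  (* illegal constant declaration: value not known at compile time *)
      BEGIN x := FLOOR(y)
      END P;

    BEGIN
      b := i + b;   (* type conflict in the expression: "+" is not applicable to a BOOLEAN operand *)
      N := 5;       (* illegal assignment: N is a constant, not a variable *)
      i := r;       (* type conflict in the assignment: REAL is not assignable to INTEGER *)
      P(i);         (* illegal call: wrong number of arguments *)
      P(i, b);      (* illegal call: argument type BOOLEAN is incompatible with parameter type REAL *)
      P(i + 1, r)   (* illegal call: the expression i + 1 is not addressable, but x is a VAR parameter *)
    END Errors.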
