The Procedural Fascicle

Chapter 2

Basic concepts

Variables, keywords, and regions

Scheme is a statically scoped language with block structure. To each place in a top-level program or library body where an identifier is bound there corresponds a region of code within which the binding is visible. The region is determined by the particular binding construct that establishes the binding; if the binding is established by a lambda expression, for example, then its region consists of the formals and body of that lambda expression. Every mention of an identifier refers to the binding of the identifier that establishes the innermost of the regions containing the use. If a use of an identifier appears in a place where none of the surrounding expressions contains a binding for the identifier, the use may refer to a binding established by a definition or import at the top of the enclosing library or top-level program. If there is no binding for the identifier, it is said to be unbound.

Within the body of a library or top-level program, an identifier may name a kind of syntax, or it may name a location where a value can be stored. An identifier that names a kind of syntax is called a keyword, or syntax keyword, and is said to be bound to that kind of syntax (or, in the case of a syntactic abstraction, a transformer that translates the syntax into more primitive forms; see [Editorial note: the macrological fascicle ]. An identifier that names a location is called a variable and is said to be bound to that location. At each point within a top-level program or a library, a specific, fixed set of identifiers is bound. The set of these identifiers, the set of visible bindings, is known as the environment in effect at that point. The value stored in the location to which a variable is bound is called the variable’s value. By abuse of terminology, the variable is sometimes said to name the value or to be bound to the value. This is not quite accurate, but confusion rarely results from this practice.

A variable’s value can be accessed, which occurs when an expression which consists only of the variable’s name is evaluated (see 3.1); and a variable’s value can be assigned, which occurs when a set! expression affecting the variable is evaluated (see 4.3). A variable can also be referred to, which occurs when an expression which would, if evaluated, access or assign the value of that variable appears in code in that variable’s region, without necessarily evaluating that code. In some contexts, this report restricts the ability to refer to, access the value of, or assign the value of some variables. For example, it is not permitted for code within a library [Editorial note: cross-reference to bibliothecarial fascicle ] to attempt to assign the value of a variable which it has imported from another library, but that variable can be freely referred to and its value accessed in any context in that library. Likewise, in the context of a letrec or letrec* expression (see 4.2), there is no restriction on referring to variables’ names, but an attempt to evaluate an expression which accesses or assigns the value of a variable bound by those expressions can cause an error to be signalled, if its value has not yet been determined according to the evaluation rules of those expressions. Finally, in the context of a body (see 2.6), it can cause an error to be signalled if a variable is only referred to, if that variable’s definition occurs in a group of definitions which appears to the right of the expression which refers to it.

Certain expression types are used to create syntactic abstractions and to bind keywords to transformers for those new syntactic abstractions, while other forms create new locations and bind variables to those locations. Collectively, these forms are called binding constructs. Some binding constructs take the form of definitions, while others are expressions. With the exception of exported library bindings, a binding created by a definition is visible only within the body in which the definition appears, e.g., the body of a library, top-level program, or lambda expression. Exported library bindings are also visible within the bodies of the libraries and top-level programs that import them.

Procedures

A procedure is an abstraction of an operation over objects. The operation thus abstracted over may be performed by a call to the procedure. When a procedure is called, it receives a number of inputs, referred to as its arguments. The procedure call processes those arguments either according to a specification defined in this report or by an implementation, or by evaluating some Scheme code which operates on those arguments, to perform the operation and produce its outputs. The procedure then returns its outputs to the continuation of the expression which called it. A procedure whose operation is defined by Scheme code is created by the lambda or case-lambda expressions: the arguments to the procedure are made available to that code as variables bound within the region of that code, which is a body (see 2.6). Additional inputs to the procedure may come from variables whose region encompasses the corresponding lambda or case-lambda expression.

Scheme procedures are also objects in their own right. Procedures can be created dynamically, stored in data structures, returned as results of procedures, and so on. Many predefined operations of Scheme are provided not by syntax, but by variables whose values are procedures. The + operation, for example, which receives special syntactic treatment in many other languages, is just a regular identifier in Scheme, bound to a procedure that adds number objects.

All procedure objects in Scheme are conceptually associated with location tags, in order to make the eq? and eqv? procedures (section 4.6) work on procedures. The nature of this location tag is unspecified: the location tag itself cannot be directly treated as an object in Scheme; it is required only to have a defined equivalence relation for the purposes of the internal workings of eq? and eqv?.

Rationale: Besides their obvious uses, procedures have traditionally been used in some Scheme programs as a form of data abstraction for object-oriented programming and similar techniques (see for example (Abelson and Sussman 1996, section 2.1.3). This makes it convenient to have a form of equivalence predicate defined over them, as for other Scheme data objects. However, treating lambda as a data constructor subject to the usual requirement to return a freshly-allocated object upon each evaluation (in the same way as, for example, the cons procedure) disallows some useful compiler optimizations for the more common case of using lambda to abstract a series of operations, rather than a collection of data.

A future fascicle will define a data constructor for procedures which guarantees that every evaluation creates a procedure object which is distinct from every other object in the sense of eqv? and eq?. A future Scheme report may then make the equivalence of all other procedures under eq? and eqv? unspecified. (The R6RS already made this change, but neglected to provide a suitable alternative for creating procedures which are actually used as data, so the change was reverted by the small language report.)

Every named procedure defined by this report has a single location tag distinct from the location tag of procedures with distinct names. The exception is if this report describes one procedure as an alias of another, in which case the two procedures may or may not have distinct location tags.

Expressions and definitions

All main forms in Scheme program and library code can be categorized as either definitions or expressions.

Definitions may appear directly within a program, a library [Editorial note: cross-reference to bibliothecarial fascicle ], or a body (see section 2.6). They may also appear within a begin form (see section 4.4) or other splicing form that appears within one of the contexts where a definition may appear. A definition has the effect of binding one or more identifiers to variables or syntax keywords within the region of that program, library, or body in which it is found, and within any environments created by extending the environment in place in that region. A definition which appears within a library or splicing form within a library has the effect of allowing the identifiers it names to be exported from that library [Editorial note: cross-reference to bibliothecarial fascicle ]. A definition conceptually does not evaluate to any value or values, although a definition form which is passed to the eval procedure may cause that procedure call to return a value [Editorial note: cross-reference to bibliothecarial fascicle ].

Expressions evaluate to a value or values (see section 2.5). They may contain a region for an identifier or identifiers created by the expression itself, or by definitions which form part of the expression, but expressions do not create bindings in the region surrounding themselves. Expressions may occur within other expressions, as well as within definitions to specify the values of variables or the transformers for syntax keywords that are bound by the definition. All syntactic forms defined in this report are expressions unless explicitly stated to be definitions.

When an expression appears within a definition, or within a syntactic form or procedure call that changes the value of a variable or the value stored in some other location, the expression which denotes the new value to be stored in that location is said to be on the right-hand side of the definition, assignment form, or procedure call. The remaining subforms or arguments, which denote the location which is to have a new value stored in it, are said to be on the left-hand side.

When syntactic forms are specified in this report, the rule expression means any form which is an expression, and definition means any form which is a definition. Definition or expression means either a definition or an expression, although the final sequence of forms within any context where either is allowed must conform to the restrictions on definitions and expressions which apply within that context. For example, one such restriction which applies to all contexts where definitions are allowed is that no definition form may attempt to bind an identifier which another definition form in the same context also binds. It is a syntax violation to use a definition in a context where only expressions are allowed.

Some main forms in Scheme also contain auxiliary forms which are neither definitions nor expressions. Such auxiliary forms, unlike main forms, have no meaning or validity outside of by the main form to which they belong. Examples of such auxiliary forms include the formals clause of a lambda expression (see section 4.1), the left-hand side of define forms and the bindings clauses of let expressions (see section 4.2), and each cond clause of a cond expression (see section 4.5).

Continuations and dynamic extents

Note: This section will be extended by a future fascicle.

Whenever a Scheme expression is evaluated there is a continuation waiting for the result of the expression. The continuation represents an entire (default) future for the computation. For example, informally the continuation of 3 in the expression (+ 1 3) adds 1 to it.

Normally these ubiquitous continuations are hidden behind the scenes and programmers do not think much about them. However, one distinguishing feature of Scheme is that continuations, which in most other languages only operate behind the scenes, also have ‘first-class’ status. Continuations may be captured in first-class procedure objects and returned to later by subsequent calls to these procedures. Using this mechanism, continuations in Scheme may never be returned to, or they may be returned to multiple times. Thus, the evaluation of any expression or sequence of expressions in Scheme can be interrupted and restarted multiple times by capture and return to a continuation.

For a procedure call, the time between when it is initiated and when it returns is called its dynamic extent. Capture of continuations allows re-entering a dynamic extent after its procedure call has returned. Thus, the dynamic extent of a call may not be a single, connected time period.

Multiple return values

A Scheme expression can evaluate to an arbitrary finite number of values. These values are passed to the expression’s continuation.

Not all continuations accept any number of values. For example, a continuation that accepts the argument to a procedure call is guaranteed to accept exactly one value (see section 3).

A number of forms in Scheme have sequences of expressions as subforms that are evaluated sequentially, with the return values of all but the last expression being discarded. The continuations discarding these values accept any number of values.

Bodies

Many syntactic forms in Scheme are specified as containing a body. (In order to distinguish bodies from the code inside of libraries and top-level programs, a body is sometimes referred to as a procedure body, because lambda is the fundamental form for the implementation of forms which use bodies.) A body is a region for keywords and variables created by definitions directly within the body, or inside of splicing constructs (such as begin, splicing-let-syntax, and splicing-letrec-syntax) within the body. A body is concluded with an expression and may also contain additional expressions before, between, or after the definitions.

Bodies are first processed by the expander according to the process defined in [Editorial note: cross-reference to the macrological fascicle ], a process which also removes any splicing constructs by replacing them with their expanded subforms. The expander also takes note of the set of identifier names bound by syntax or variable definitions within the entire region of the body. It is a syntax violation if any definition attempts to bind the same identifier more than once, or attempts to bind the same identifier as another definition within the same body (including in any splicing constructs).

After this process, a body consists only of variable definitions and expressions in their fully expanded core forms. These are further processed for evaluation in groups as follows. An expanded body has the form

variable definition–expression group ... expression

The expression is referred to as the final expression of the body. A variable definition–expression group has the form

variable definition ... expression ...

A variable definition is any implementation-dependent core form which is a definition form for variables. The expressions of a variable definition–expression group are referred to as the non-final expressions of the body. It is a syntax violation if the expanded content of a body does not conform to these rules.

Each variable definition–expression group is converted into the equivalent of a letrec*-values (see section 4.2) for the variable definitions in that group, although this letrec*-values does not contain a body itself but only a sequence of expressions, which are evaluated from left to right. The sequence of expressions consists of the expressions within the respective variable definition–expression group, followed by the corresponding conversion of the next variable definition–expression group, or by the final expression if there are no further variable definition–expression groups. Implementations should signal a syntax violation if the expansion of the right-hand side of any variable definition, or the expansion of any expression, refers to the name of any identifier which is a variable defined by a variable definition–expression group which is to the right of the variable definition–expression group in which it appears.

Rationale: Scheme has traditionally only allowed a single group of definitions at the head of a body, followed by at least one expression. Experience has shown that it is often useful to be able to mix the two types of forms in bodies, especially for error checking and logging of intermediate results. However, typical optimized implementation techniques for the semantics of the letrec or letrec* underlying a traditional Scheme body work only when the right-hand sides of definitions, and by extension the entire block of definitions, are free of side-effects or uses of first-class continuations. Since non-final expressions in a body are only useful for their side-effects or continuation invocations, simply incorporating expressions into the sequence of letrec* bindings invariably inhibits these optimizations. The compilation scheme described here provides the convenience of mixing definitions and expressions in any order, while ultimately grouping the definitions so that optimization can still happen as in implementations of previous revisions of this report. Since the expander still treats syntax keyword bindings throughout the whole body as belonging to a single region, and implementations should treat attempts at forward reference to a variable defined in a future group of definitions and expressions as a syntax violation, the property that a body is a single region for bindings is maintained: the only practical restriction is that it is not possible to create mutually-recursive definitions between distinct variable definition–expression groups, which is necessary to ensure each letrec*-values equivalent form in the transformed body can still be optimized.

Issue: The syntax and semantics of bodies described above represent an initial attempt to find a satisfactory means of allowing Scheme programmers to mix definitions and expressions in any order within bodies, as requested by voters on the Stygian Blue ballot. They correspond to a variant of semantics originally proposed by Sergei Egorov in SRFI 251, adapted for the expansion order originally specified by R6RS and described in the macrological fascicle. There is also a permissive provision allowing implementations to adopt semantics more like those of SRFI 245 or R6RS program bodies, which do not restrict forward reference between different variable definition–expression groups. This permissive provision is the ‘should’ in the recommendation to signal a syntax violation in the case of a reference to a variable in a subsequent variable definition–expression group.

The working group is unsatisfied with the creation of implicit lexical contours by variable definition–expression groups, especially considering the behaviour of identifier syntax defined in the body (which may inadvertently flout good Scheme style by not behaving exactly like a variable) and the behaviour of macros which may expand to a combination of a definition and an expression. The R6RS program body semantics are also problematic for the optimization-related reasons described above in the rationale. Further, with semantics derived from those of R6RS program bodies, it would not be safely possible to use continuations to return twice from any expression in a body before that body’s final definition. (The R6RS encourages implementations to raise a violation exception in this case.) Note that the permissive provision above implies that implementations which take advantage of it cannot impose such a restriction on multiple return from expressions.

The working group is especially keen to receive feedback on this matter from the Scheme community.

The continuations of all of the non-final expressions within a body (i.e. of the expanded expressions within the created letrec*-values equivalent) accept any number of values and discard them. The continuation of the final expression of a body accepts the same number of values as the continuation of the body itself, and passes the values to that continuation. (A body may be in tail context, in which case the continuation of the final expression is the same as the continuation of the body itself.)

Storage model

Note: This section will be extended by a future fascicle.

Variables and objects implicitly refer to locations or sequences of locations. A vector, for example, contains as many locations as there are elements in the vector. A new value may be stored into one of these locations using the vector-set! procedure, but the vector contains the same locations as before.

An object fetched from a location, by a variable access or by a procedure such as car, vector-ref, or string-ref, is equivalent in the sense of eqv? (see 4.6) to the object last stored in the location before the fetch.

Every location is marked to show whether it is in use. No variable or object ever refers to a location that is not in use. Whenever this report speaks of storage being allocated for a variable or object, what is meant is that an appropriate number of locations are chosen from the set of locations that are not in use, and the chosen locations are marked to indicate that they are now in use before the variable or object is made to refer to them.

Active procedure calls

Implementations of Scheme must be properly tail-recursive. Procedure calls that occur in certain syntactic contexts called tail contexts are tail calls. A Scheme implementation is properly tail-recursive if it supports an unbounded number of active tail calls. A call is active if the called procedure may still return. Note that this includes regular returns as well as returns through continuations captured earlier by call-with-current-continuation that are later invoked. [Editorial note: Flow-control will include delimited continuations probably ] In the absence of captured continuations, calls could return at most once and the active calls would be those that had not yet returned. This allows the execution of an iterative computation in constant space, even if the iterative computation is described by a syntactically recursive procedure. Thus with a properly tail-recursive implementation, iteration can be expressed using the ordinary procedure-call mechanics, so that special iteration constructs are useful only as syntactic sugar.

A formal definition of proper tail recursion can be found in (Clinger 1998).

Rationale: Intuitively, no space is needed for an active tail call because the continuation that is used in the tail call has the same semantics as the continuation passed to the procedure containing the call. Although an improper implementation might use a new continuation in the call, a return to this new continuation would be followed immediately by a return to the continuation passed to the procedure. A properly tail-recursive implementation returns to that continuation directly.

Proper tail recursion was one of the central ideas in Steele and Sussman’s original version of Scheme. Their first Scheme interpreter implemented both functions and actors. Control flow was expressed using actors, which differed from functions in that they passed their results on to another actor instead of returning to a caller. In the terminology of this section, each actor finished with a tail call to another actor.

Steele and Sussman later observed that in their interpreter the code for dealing with actors was identical to that for functions and thus there was no need to include both in the language.

Implementations of Scheme should also support a practically unbounded number of active calls which are not tail calls. The number of active non-tail calls should be bounded only by the amount of storage space available to the program. Any other behaviour is implementation-specified; however, implementations which do not support unbounded numbers of active non-tail calls should raise an exception with condition type &implementation-restriction upon an attempt to activate a non-tail call which breaks the implementation’s limit.

Rationale: The need to support unbounded numbers of non-tail calls is related to the requirement for safety [Editorial note: cross-reference to erroneous fascicle ]. Consider the following implementation of a simplified version of the map procedure [Editorial note: cross-reference to valued fascicle or perhaps to Batteries ]:

(define (map proc ls)
  (if (pair? ls)
      (cons (proc (car ls))
            (map proc (cdr ls)))
      '()))

Without a guarantee that practically unbounded numbers of non-tail calls may be active at any one time, it is unsafe to call this procedure on a list ls of any particular length, because a list longer than the number of allowed active non-tail calls would not behave in the expected way.

The tail contexts in which tail calls occur are defined inductively. In general, any body or sequence of expressions has its final expression in a tail context if the body or sequence of expressions itself is part of the syntax of a syntactic form which itself is used in a tail context. This report usually notes when final expressions are in tail contexts, and always notes explicitly when final expressions are not required to be in a tail context. Implementations which provide libraries with additional syntactic forms in addition to those defined in this report must also document any case in which a body or sequence of expressions which forms part of a syntactic form is never in tail context, even if the syntactic form as a whole is in a tail context.

Certain built-in procedures must also perform tail calls. The first argument passed to apply and to call-with-current-continuation, and the second argument passed to call-with-values, must be called via a tail call. Implementations which provide libraries with additional procedures whose final result is the unmodified result of calling some other procedure given as an argument should document whether the arguments to those procedures are tail-called or not.