The Procedural Fascicle
Chapter 2
Basic concepts
Variables, keywords, and regions
Scheme is a statically scoped language with block structure. To each
place in a top-level program or library body where an identifier is
bound there corresponds a region of code within which the binding is
visible. The region is determined by the particular binding construct
that establishes the binding; if the binding is established by a
lambda expression, for example, then its region consists of
the formals and body of that lambda expression. Every
mention of an identifier refers to the binding of the identifier that
establishes the innermost of the regions containing the use. If a use of
an identifier appears in a place where none of the surrounding
expressions contains a binding for the identifier, the use may refer to
a binding established by a definition or import at the top of the
enclosing library or top-level program. If there is no binding for the
identifier, it is said to be unbound.
Within the body of a library or top-level program, an identifier may name a kind of syntax, or it may name a location where a value can be stored. An identifier that names a kind of syntax is called a keyword, or syntax keyword, and is said to be bound to that kind of syntax (or, in the case of a syntactic abstraction, a transformer that translates the syntax into more primitive forms; see [Editorial note: the macrological fascicle ]. An identifier that names a location is called a variable and is said to be bound to that location. At each point within a top-level program or a library, a specific, fixed set of identifiers is bound. The set of these identifiers, the set of visible bindings, is known as the environment in effect at that point. The value stored in the location to which a variable is bound is called the variable’s value. By abuse of terminology, the variable is sometimes said to name the value or to be bound to the value. This is not quite accurate, but confusion rarely results from this practice.
A variable’s value can be accessed, which occurs when an expression
which consists only of the variable’s name is evaluated
(see 3.1); and a variable’s value can be assigned, which
occurs when a set! expression affecting the variable is evaluated
(see 4.3). A variable can also be referred to, which
occurs when an expression which would, if evaluated, access or assign
the value of that variable appears in code in that variable’s region,
without necessarily evaluating that code. In some contexts, this
report restricts the ability to refer to, access the value of, or
assign the value of some variables. For example, it is not permitted
for code within a library [Editorial note: cross-reference to bibliothecarial fascicle ]
to attempt to
assign the value of a variable which it has imported from another
library, but that variable can be freely referred to and its value
accessed in any context in that library. Likewise, in the context of a
letrec or letrec* expression
(see 4.2), there is no
restriction on referring to variables’ names, but an attempt to
evaluate an expression which accesses or assigns the value of a
variable bound by those expressions can cause an error to be
signalled, if its value has not yet been determined according to the
evaluation rules of those expressions. Finally, in the context of a
body (see 2.6), it can cause an error to be signalled if a
variable is only referred to, if that variable’s definition occurs in
a group of definitions which appears to the right of the expression
which refers to it.
Certain expression types are used to create syntactic abstractions
and to bind keywords to transformers for those new syntactic
abstractions, while other forms create new locations and bind variables
to those locations. Collectively, these forms are called binding
constructs. Some binding constructs take the form of definitions, while
others are expressions. With the exception of exported library bindings,
a binding created by a definition is visible only within the body in
which the definition appears, e.g., the body of a library, top-level
program, or lambda expression. Exported library bindings
are also visible within the bodies of the libraries and top-level
programs that import them.
Procedures
A procedure is an abstraction of an operation over objects. The
operation thus abstracted over may be performed by a call to the
procedure. When a procedure is called, it receives a number of inputs,
referred to as its arguments. The procedure call processes those
arguments either according to a specification defined in this report
or by an implementation, or by evaluating some Scheme code which
operates on those arguments, to perform the operation and produce its
outputs. The procedure then returns its outputs to the continuation of
the expression which called it. A procedure whose operation is defined
by Scheme code is created by the lambda or case-lambda
expressions: the arguments to the procedure are made available to that
code as variables bound within the region of that code, which is a
body (see 2.6). Additional inputs to the procedure may come
from variables whose region encompasses the corresponding lambda or
case-lambda expression.
Scheme procedures are also objects in their own right. Procedures can
be created dynamically, stored in data structures, returned as results
of procedures, and so on. Many predefined operations of Scheme are
provided not by syntax, but by variables whose values are procedures.
The + operation, for example, which receives special syntactic
treatment in many other languages, is just a regular identifier in
Scheme, bound to a procedure that adds number objects.
All procedure objects in Scheme are conceptually associated with
location tags, in order to make the eq? and eqv? procedures
(section 4.6) work on procedures. The nature of
this location tag is unspecified: the location tag itself cannot be
directly treated as an object in Scheme; it is required only to have a
defined equivalence relation for the purposes of the internal workings
of eq? and eqv?.
Rationale: Besides their obvious uses, procedures have traditionally
been used in some Scheme programs as a form of data abstraction for
object-oriented programming and similar techniques (see for example
(Abelson and Sussman 1996, section 2.1.3). This makes it convenient
to have a form of equivalence predicate defined over them, as for
other Scheme data objects. However, treating lambda as a data
constructor subject to the usual requirement to return a
freshly-allocated object upon each evaluation (in the same way as, for
example, the cons procedure) disallows some useful compiler
optimizations for the more common case of using lambda to abstract a
series of operations, rather than a collection of data.
A future fascicle will define a data constructor for procedures which
guarantees that every evaluation creates a procedure object which is
distinct from every other object in the sense of eqv? and eq?. A
future Scheme report may then make the equivalence of all other
procedures under eq? and eqv? unspecified. (The R6RS already made
this change, but neglected to provide a suitable alternative for
creating procedures which are actually used as data, so the change was
reverted by the small language report.)
Every named procedure defined by this report has a single location tag distinct from the location tag of procedures with distinct names. The exception is if this report describes one procedure as an alias of another, in which case the two procedures may or may not have distinct location tags.
Expressions and definitions
All main forms in Scheme program and library code can be categorized as either definitions or expressions.
Definitions may appear directly within a program, a library [Editorial note:
cross-reference to bibliothecarial fascicle ], or a body (see
section 2.6). They may also appear within a begin form (see
section 4.4) or other splicing form that appears within one
of the contexts where a definition may appear. A definition has the
effect of binding one or more identifiers to variables or syntax
keywords within the region of that program, library, or body in which
it is found, and within any environments created by extending the
environment in place in that region. A definition which appears within
a library or splicing form within a library has the effect of allowing
the identifiers it names to be exported from that library [Editorial note:
cross-reference to bibliothecarial fascicle ]. A definition
conceptually does not evaluate to any value or values, although a
definition form which is passed to the eval procedure may cause that
procedure call to return a value [Editorial note: cross-reference to
bibliothecarial fascicle ].
Expressions evaluate to a value or values (see section 2.5). They may contain a region for an identifier or identifiers created by the expression itself, or by definitions which form part of the expression, but expressions do not create bindings in the region surrounding themselves. Expressions may occur within other expressions, as well as within definitions to specify the values of variables or the transformers for syntax keywords that are bound by the definition. All syntactic forms defined in this report are expressions unless explicitly stated to be definitions.
When an expression appears within a definition, or within a syntactic form or procedure call that changes the value of a variable or the value stored in some other location, the expression which denotes the new value to be stored in that location is said to be on the right-hand side of the definition, assignment form, or procedure call. The remaining subforms or arguments, which denote the location which is to have a new value stored in it, are said to be on the left-hand side.
When syntactic forms are specified in this report, the rule expression means any form which is an expression, and definition means any form which is a definition. Definition or expression means either a definition or an expression, although the final sequence of forms within any context where either is allowed must conform to the restrictions on definitions and expressions which apply within that context. For example, one such restriction which applies to all contexts where definitions are allowed is that no definition form may attempt to bind an identifier which another definition form in the same context also binds. It is a syntax violation to use a definition in a context where only expressions are allowed.
Some main forms in Scheme also contain auxiliary forms which are
neither definitions nor expressions. Such auxiliary forms, unlike main
forms, have no meaning or validity outside of by the main form to
which they belong. Examples of such auxiliary forms include the
formals clause of a lambda expression (see
section 4.1), the left-hand side of define forms and the
bindings clauses of let expressions (see
section 4.2), and each cond
clause of a cond expression (see
section 4.5).
Continuations and dynamic extents
Note: This section will be extended by a future fascicle.
Whenever a Scheme expression is evaluated there is a continuation
waiting for the result of the expression. The continuation represents
an entire (default) future for the computation. For example,
informally the continuation of 3 in the expression (+ 1 3) adds 1 to
it.
Normally these ubiquitous continuations are hidden behind the scenes and programmers do not think much about them. However, one distinguishing feature of Scheme is that continuations, which in most other languages only operate behind the scenes, also have ‘first-class’ status. Continuations may be captured in first-class procedure objects and returned to later by subsequent calls to these procedures. Using this mechanism, continuations in Scheme may never be returned to, or they may be returned to multiple times. Thus, the evaluation of any expression or sequence of expressions in Scheme can be interrupted and restarted multiple times by capture and return to a continuation.
For a procedure call, the time between when it is initiated and when it returns is called its dynamic extent. Capture of continuations allows re-entering a dynamic extent after its procedure call has returned. Thus, the dynamic extent of a call may not be a single, connected time period.
Multiple return values
A Scheme expression can evaluate to an arbitrary finite number of values. These values are passed to the expression’s continuation.
Not all continuations accept any number of values. For example, a continuation that accepts the argument to a procedure call is guaranteed to accept exactly one value (see section 3).
A number of forms in Scheme have sequences of expressions as subforms that are evaluated sequentially, with the return values of all but the last expression being discarded. The continuations discarding these values accept any number of values.
Bodies
Many syntactic forms in Scheme are specified as containing a body. (In
order to distinguish bodies from the code inside of libraries and
top-level programs, a body is sometimes referred to as a procedure
body, because lambda is the fundamental form for the implementation
of forms which use bodies.) A body is a region for keywords and
variables created by definitions directly within the body, or inside
of splicing constructs (such as begin, splicing-let-syntax, and
splicing-letrec-syntax) within the body. A body is concluded with an
expression and may also contain additional expressions before,
between, or after the definitions.
Bodies are first processed by the expander according to the process defined in [Editorial note: cross-reference to the macrological fascicle ], a process which also removes any splicing constructs by replacing them with their expanded subforms. The expander also takes note of the set of identifier names bound by syntax or variable definitions within the entire region of the body. It is a syntax violation if any definition attempts to bind the same identifier more than once, or attempts to bind the same identifier as another definition within the same body (including in any splicing constructs).
After this process, a body consists only of variable definitions and expressions in their fully expanded core forms. These are further processed for evaluation in groups as follows. An expanded body has the form
variable definition–expression group ... expression
The expression is referred to as the final expression of the body. A variable definition–expression group has the form
variable definition ... expression ...
A variable definition is any implementation-dependent core form which is a definition form for variables. The expressions of a variable definition–expression group are referred to as the non-final expressions of the body. It is a syntax violation if the expanded content of a body does not conform to these rules.
Each variable definition–expression group is converted into the
equivalent of a letrec*-values (see
section 4.2) for the variable
definitions in that group, although this letrec*-values does not
contain a body itself but only a sequence of expressions, which are
evaluated from left to right. The sequence of expressions consists of
the expressions within the respective variable
definition–expression group, followed by the corresponding
conversion of the next variable definition–expression group, or by
the final expression if there are no further variable
definition–expression groups. Implementations should signal a syntax
violation if the expansion of the right-hand side of any variable
definition, or the expansion of any expression, refers to the
name of any identifier which is a variable defined by a variable
definition–expression group which is to the right of the variable
definition–expression group in which it appears.
Rationale: Scheme has traditionally only allowed a single group of
definitions at the head of a body, followed by at least one
expression. Experience has shown that it is often useful to be able to
mix the two types of forms in bodies, especially for error checking
and logging of intermediate results. However, typical optimized
implementation techniques for the semantics of the letrec or
letrec* underlying a traditional Scheme body work only when the
right-hand sides of definitions, and by extension the entire block of
definitions, are free of side-effects or uses of first-class
continuations. Since non-final expressions in a body are only useful
for their side-effects or continuation invocations, simply
incorporating expressions into the sequence of letrec* bindings
invariably inhibits these optimizations. The compilation scheme
described here provides the convenience of mixing definitions and
expressions in any order, while ultimately grouping the definitions so
that optimization can still happen as in implementations of previous
revisions of this report. Since the expander still treats syntax
keyword bindings throughout the whole body as belonging to a single
region, and implementations should treat attempts at forward reference
to a variable defined in a future group of definitions and expressions
as a syntax violation, the property that a body is a single region for
bindings is maintained: the only practical restriction is that it is
not possible to create mutually-recursive definitions between distinct
variable definition–expression groups, which is necessary to
ensure each letrec*-values equivalent form in the transformed body
can still be optimized.
Issue: The syntax and semantics of bodies described above represent an initial attempt to find a satisfactory means of allowing Scheme programmers to mix definitions and expressions in any order within bodies, as requested by voters on the Stygian Blue ballot. They correspond to a variant of semantics originally proposed by Sergei Egorov in SRFI 251, adapted for the expansion order originally specified by R6RS and described in the macrological fascicle. There is also a permissive provision allowing implementations to adopt semantics more like those of SRFI 245 or R6RS program bodies, which do not restrict forward reference between different variable definition–expression groups. This permissive provision is the ‘should’ in the recommendation to signal a syntax violation in the case of a reference to a variable in a subsequent variable definition–expression group.
The working group is unsatisfied with the creation of implicit lexical contours by variable definition–expression groups, especially considering the behaviour of identifier syntax defined in the body (which may inadvertently flout good Scheme style by not behaving exactly like a variable) and the behaviour of macros which may expand to a combination of a definition and an expression. The R6RS program body semantics are also problematic for the optimization-related reasons described above in the rationale. Further, with semantics derived from those of R6RS program bodies, it would not be safely possible to use continuations to return twice from any expression in a body before that body’s final definition. (The R6RS encourages implementations to raise a violation exception in this case.) Note that the permissive provision above implies that implementations which take advantage of it cannot impose such a restriction on multiple return from expressions.
The working group is especially keen to receive feedback on this matter from the Scheme community.
The continuations of all of the non-final expressions within a body
(i.e. of the expanded expressions within the created letrec*-values
equivalent) accept any number of values and discard them. The
continuation of the final expression of a body accepts the same number
of values as the continuation of the body itself, and passes the
values to that continuation. (A body may be in tail context, in which
case the continuation of the final expression is the same as the
continuation of the body itself.)
Storage model
Note: This section will be extended by a future fascicle.
Variables and objects implicitly refer to locations or sequences of
locations. A vector, for example, contains as many locations as there
are elements in the vector. A new value may be stored into one of
these locations using the vector-set! procedure, but the vector
contains the same locations as before.
An object fetched from a location, by a variable access or by a
procedure such as car, vector-ref, or string-ref, is equivalent
in the sense of eqv? (see 4.6) to the object last stored in the
location before the fetch.
Every location is marked to show whether it is in use. No variable or object ever refers to a location that is not in use. Whenever this report speaks of storage being allocated for a variable or object, what is meant is that an appropriate number of locations are chosen from the set of locations that are not in use, and the chosen locations are marked to indicate that they are now in use before the variable or object is made to refer to them.
Active procedure calls
Implementations of Scheme must be properly tail-recursive. Procedure
calls that occur in certain syntactic contexts called tail contexts
are tail calls. A Scheme implementation is properly tail-recursive if
it supports an unbounded number of active tail calls. A call is active
if the called procedure may still return. Note that this includes
regular returns as well as returns through continuations captured
earlier by call-with-current-continuation that are later invoked.
[Editorial note: Flow-control will include delimited continuations probably ] In
the absence of captured continuations, calls could return at most once
and the active calls would be those that had not yet returned. This
allows the execution of an iterative computation in constant space,
even if the iterative computation is described by a syntactically
recursive procedure. Thus with a properly tail-recursive
implementation, iteration can be expressed using the ordinary
procedure-call mechanics, so that special iteration constructs are
useful only as syntactic sugar.
A formal definition of proper tail recursion can be found in (Clinger 1998).
Rationale: Intuitively, no space is needed for an active tail call because the continuation that is used in the tail call has the same semantics as the continuation passed to the procedure containing the call. Although an improper implementation might use a new continuation in the call, a return to this new continuation would be followed immediately by a return to the continuation passed to the procedure. A properly tail-recursive implementation returns to that continuation directly.
Proper tail recursion was one of the central ideas in Steele and Sussman’s original version of Scheme. Their first Scheme interpreter implemented both functions and actors. Control flow was expressed using actors, which differed from functions in that they passed their results on to another actor instead of returning to a caller. In the terminology of this section, each actor finished with a tail call to another actor.
Steele and Sussman later observed that in their interpreter the code for dealing with actors was identical to that for functions and thus there was no need to include both in the language.
Implementations of Scheme should also support a practically unbounded
number of active calls which are not tail calls. The number of active
non-tail calls should be bounded only by the amount of storage space
available to the program. Any other behaviour is
implementation-specified; however, implementations which do not support
unbounded numbers of active non-tail calls should raise an exception
with condition type &implementation-restriction upon an
attempt to activate a non-tail call which breaks the implementation’s
limit.
Rationale: The need to support unbounded numbers of non-tail calls is
related to the requirement for safety [Editorial note: cross-reference to erroneous fascicle
]. Consider the following implementation of a simplified version of
the map procedure [Editorial note: cross-reference to valued fascicle or perhaps to
Batteries ]:
(define (map proc ls)
(if (pair? ls)
(cons (proc (car ls))
(map proc (cdr ls)))
'()))Without a guarantee that practically unbounded numbers of non-tail
calls may be active at any one time, it is unsafe to call this
procedure on a list ls of any particular length, because a list
longer than the number of allowed active non-tail calls would not
behave in the expected way.
The tail contexts in which tail calls occur are defined inductively. In general, any body or sequence of expressions has its final expression in a tail context if the body or sequence of expressions itself is part of the syntax of a syntactic form which itself is used in a tail context. This report usually notes when final expressions are in tail contexts, and always notes explicitly when final expressions are not required to be in a tail context. Implementations which provide libraries with additional syntactic forms in addition to those defined in this report must also document any case in which a body or sequence of expressions which forms part of a syntactic form is never in tail context, even if the syntactic form as a whole is in a tail context.
Certain built-in procedures must also perform tail calls. The first
argument passed to apply and to call-with-current-continuation,
and the second argument passed to call-with-values, must be called
via a tail call. Implementations which provide libraries with
additional procedures whose final result is the unmodified result of
calling some other procedure given as an argument should document
whether the arguments to those procedures are tail-called or not.