The Revised7 Report on the Algorithmic Language Scheme
The Valued Fascicle

draft

Editor's introduction to this fascicle

This section will not appear in the final report.

The Revised7 Report on the Algorithmic Language Scheme, Large Edition, is released in fascicles, which are incremental releases of the full standard.

This is the Valued Fascicle (fascicle 3 of 8) containing the description of the values that make up Scheme data, such as pairs, vectors, numbers, and strings.

Changes to the R7RS

The major features of this fascicle:

What Implementations Need To Do To Support These Changes

Raw string literals will require changes to the reader. Some standard library procedures need redefinition.

1: Numbers

This chapter describes Scheme’s model for numbers. It is important to distinguish between the mathematical numbers, the Scheme objects that attempt to model them, the machine representations used to implement the numbers, and notations used to write numbers. In this report, the term number refers to a mathematical number, and the term number object refers to a Scheme object representing a number. This report uses the types complex, real, rational, andinteger to refer to both mathematical numbers and number objects.[phm]: The term "number object" is confusing and pedantic. It should be removed and the terminology should be reverted back to the R5RS terminology. The fixnum and flonum refer to special subsets of the number objects, as determined by common machine representations, as explained below.

1.1: Numerical tower

Numbers may be arranged into a tower of subsets in which each level is a subset of the level above it:

For example, 5 is an integer. Therefore 5 is also a rational, a real, and a complex. The same is true of the number objects that model 5.

Number objects are organized as a corresponding tower of subtypes defined by the predicates number?, complex?, real?, rational?, and integer?; see section 1.6. Integer number objects are also called integer objects.

There is no simple relationship between a number’s type and its representation inside a computer. Although most implementations of Scheme will offer at least two different representations of 3, these different representations denote the same integer.

Scheme’s numerical operations treat numbers as abstract data, as independent of their representation as possible. Although an implementation of Scheme may use multiple internal representations of numbers, this ought not to be apparent to a casual programmer writing simple programs.

1.2: Exactness

It is useful to distinguish between number objects that are known to correspond to a number exactly, and those number objects whose computation involved rounding or other errors. For example, index operations into data structures may need to know the index exactly, as may some operations on polynomial coefficients in a symbolic algebra system. On the other hand, the results of measurements are inherently inexact, and irrational numbers may be approximated by rational and therefore inexact approximations. In order to catch uses of numbers known only inexactly where exact numbers are required, Scheme explicitly distinguishes exact from inexact number objects. This distinction is orthogonal to the dimension of type.

A number object is exact if it is the value of an exact numerical literal or was derived from exact number objects using only exact operations. Exact number objects correspond to mathematical numbers in the obvious way.

Conversely, a number object is inexact if it is the value of an inexact numerical literal, or was derived from inexact number objects, or was derived using inexact operations. Thus inexactness is contagious.

Exact arithmetic is reliable in the following sense: If exact number objects are passed to any of the arithmetic procedures described in section TODO, and an exact number object is returned, then the result is mathematically correct. This is generally not true of computations involving inexact number objects because approximate methods such as floating-point arithmetic may be used, but it is the duty of each implementation to make the result as close as practical to the mathematically ideal result.

1.3: Fixnums and flonums

TODO

1.4: Implementation requirements

Implementations of Scheme must support number objects for the entire tower of subtypes given in section 1.1. Moreover, implementations must support exact integer objects and exact rational number objects of practically unlimited size and precision, and to implement certain procedures (listed in section TODO) so they always return exact results when given exact arguments. ( Practically unlimited means that the size and precision of these numbers should only be limited by the size of the available memory.)

Implementations may support only a limited range of inexact number objects of any type, subject to the requirements of this section. For example, an implementation may limit the range of the inexact real number objects (and therefore the range of inexact integer and rational number objects) to the dynamic range of the flonum format. Furthermore the gaps between the inexact integer objects and rationals are likely to be very large in such an implementation as the limits of this range are approached.Where does this leave mixed-exactness complex numbers? I read this as making them optional. Also how does "range" apply to complex numbers?

An implementation may use floating point and other approximate representation strategies for inexact numbers. This report recommends, but does not require, that the IEEE floating-point standards be followed by implementations that use floating-point representations, and that im- plementations using other representations should match or exceed the precision achievable using these floating-point standards TODO: cite IEEE 754.

In particular, implementations that use floating-point representations must follow these rules: A floating-point result must be represented with at least as much precision as is used to express any of the inexact arguments to that operation. Potentially inexact operations such as sqrt, when applied to exact arguments, should produce exact answers whenever possible (for example the square root of an exact 4 ought to be an exact 2). However, this is not required.Should the following two sentences be removed?If, on the other hand, an exact number object is operated upon so as to produce an inexact result (as by sqrt), nd if the result is represented in floating point, then the most precise floating-point format available must be used; but if the result is represented in some other way then the representation must have at least as much precision as the most precise floating-point format available.

It is the programmer’s responsibility to avoid using inexact number objects with magnitude or significand too large to be represented in the implementation.

Compatbility Note The R6RS required exact integers and rationals of practically unlimited size and precision, and the whole numerical tower. The R7RS-Small did not. The R7RS-Large adopts the R6RS behavior.

1.5: IEEE arithmetic

TODO: Should we require IEEE 754?

1.6: Number predicates

These numerical type predicates can be applied to any kind of argument. They return #t if the object is a number object of the named type, and #f otherwise. In general, if a type predicate is true of a number object then all higher type predicates are also true of that number object. Consequently, if a type predicate is false of a number object, then all lower type predicates are also false of that number object.

1.6.1: number?

  • (number? obj) procedure
Semantics

Returns #t if the object is a number object, and #f otherwise.[phm]: Depending on the allowed extensions, this should have description similar to that in the R7RS-Small.

Examples

[phm]: These examples are not in previous reports.

(number? 0) ⇒ #t
(number? 0.0) ⇒ #t
(number? 1/2) ⇒ #t
(number? 1+2i) ⇒ #t
(number? "1") ⇒ #f
(number? '(1)) ⇒ #f

1.6.2: complex?

  • (complex? obj) procedure
Semantics

Returns #t if the object is a complex number object, and #f otherwise.[phm]: Depending on allowed extensions, this should have discussion similar to the R7RS-Small.

Examples
(complex? 3+4i) ⇒ #t
(complex? 3) ⇒ #t

1.6.3: real?

  • (real? obj) procedure
Semantics

Returns #t if the object is a real number object, and #f otherwise. Equivalent to: [phm]: In a break from previous Reports, and more in-line with how SRFIs are written, this is written as a implementation.

(define (real? x)
  (and (complex? x)
       (zero? (imag-part x))))

Rationale The R6RS defined a complex number object to be real if and only if it had an exact zero imaginary part. This differed from the R5RS definition. One of the reasons given for this change in the rationale of the R6RS is that real? could be used as a representational predicate separate from the usual representation of complex numbers. The R5RS definition was given the name real-valued?.

The R7RS-Small attemped to change the definition back to the R5RS definition, but the examples made it unclear if the R5RS or R6RS definition was required. Working Group 2 voted to restore the R5RS definition and give the R6RS definition the name strictly-real?.

Examples
(real? 3+4i) ⇒ #f
(real? 3) ⇒ #t
(real? -2.5+0i) ⇒ #t
(real? -2.5+0.0i) ⇒ #t
(real? -2.5-0.0i) ⇒ #t
(real? #e1e10) ⇒ #t
(real? +inf.0) ⇒ #t
(real? +nan.0) ⇒ #t

1.6.4: strictly-real?

  • (strictly-real? obj) procedure
Semantics

Returns #t iff the object is a real number object with an imaginary part that is exactly zero. Equivalent to

(define (strictly-real? obj)
  (and (real? obj)
       (exact? (imag-part obj))))

Rationale The sign of the zero in the imaginary part of a complex number can change the behavior of the number when applied to trancendental functions, as the real number line is a common branch cut cite Kahan's paper. This predicate returns true when the number is exactly on the real line, and not possibly infinitesimally off of it.

Examples
(strictly-real? 3+4i) ⇒ #f
(strictly-real? 3) ⇒ #t
(strictly-real? -2.5+0i) ⇒ #t
(strictly-real? -2.5+0.0i) ⇒ #f
(strictly-real? -2.5-0.0i) ⇒ #f
(strictly-real? #e1e10) ⇒ #t
(strictly-real? +inf.0) ⇒ #t
(strictly-real? +nan.0) ⇒ #t

1.6.5: integer?

  • (integer? obj) procedure
Semantics

Returns #t if the object is an integer object, and #f otherwise. Equivalent to:

(define (integer? obj)
  (and (real? obj)
       (= obj (round obj))))

[phm]: This definition is taken from the R7RS-Small. Is the R6RS definition better?

Examples
(integer? 3+0i) ⇒ #t
(integer? 3.0) ⇒ #t
(integer? 8/4) ⇒ #t
(integer? 3.1) ⇒ #f
(integer? 3+1i) ⇒ #f
(integer? 3+0.0i) ⇒ #t
(integer? 3-0.0i) ⇒ #t

1.6.6: strictly-integer?

  • (strictly-integer? obj) procedure
Semantics

Returns #t iff the object is an integer object with an imaginary part that is exactly zero. Equivalent to

(define (strictly-integer? obj)
  (and (integer? obj)
       (exact? (imag-part obj))))
Examples
(strictly-integer? 3+0i) ⇒ #t
(strictly-integer? 3.0) ⇒ #t
(strictly-integer? 8/4) ⇒ #t
(strictly-integer? 3.1) ⇒ #f
(strictly-integer? 3+1i) ⇒ #f
(strictly-integer? 3+0.0i) ⇒ #f
(strictly-integer? 3-0.0i) ⇒ #f

2: Characters

Characters are objects that represent Unicode scalar values.

Note Unicode defines a standard mapping between sequences of Unicode scalar values (integers in the range 0 to #x10FFFF, excluding the range #xD800 to #xDFFF) in the latest version of the standard and human-readable characters.[phm]: That previous sentence should be reworded. More precisely, Unicode distinguishes between glyphs, which are printed for humans to read, and characters, which are abstract entities that map to glyphs (sometimes in a way that’s sensitive to surrounding characters). Furthermore, different sequences of scalar values sometimes correspond to the same character. The relationships among scalar, characters, and glyphs are subtle and complex.

Despite this complexity, most things that a literate human would call acharacter can be represented by a single Unicode scalar value (although several sequences of Unicode scalar values may represent that same character). For example, Roman letters, Cyrillic letters, Hebrew consonants, and most Chinese characters fall into this category.

Unicode scalar values exclude the range #xD800 to #xDFFF, which are part of the range of Unicode code points. However, the Unicode code points in this range, the so-called surrogates, are an artifact of the UTF-16 encoding, and can only appear in specific Unicode encodings, and even then only in pairs that encode scalar values. Consequently, all characters represent code points, but the surrogate code points do not have representations as characters.

Compatibility Note The R6RS required Scheme programs to be written in sequences of Unicode scalar values. The R7RS-Small allowed for non-Unicode characters as long as #x0 and #x7F were the ASCII characters, that char->integer and integer->char treated non-Unicode characters as having integer values greater than Unicode characters, and that certain procedures were defined on Unicode scalar values according to certain properties and annexes of the Unicode standard. The R7RS-Large adopts the R6RS behavior.

2.1: Syntax

The grammar of character constants is:

⟨character⟩
::= #\any character
::= #\character name
::= #\xhex scalar value
⟨character name⟩
::= alarm
::= backspace
::= delete
::= escape
::= newline
::= null
::= return
::= space
::= tab
⟨R6RS character name⟩
::= alarm
::= backspace
::= delete
::= esc
::= linefeed
::= newline
::= nul
::= page
::= return
::= space
::= tab
::= vtab

The R6RS character names are used instead of the character names when the parser is set to R6RS mode.

Case is significant in #\character, and in #\character name, but not in #\xhex scalar value. A character must be followed by a delimiter or by the end of the input. This rule resolves various ambiguous cases involving named characters, requiring, for example, the sequence of characters #\space to be interpreted as the space character rather than as the character #s followed by the identifier pace.

Characters written in the #\ notation are self-evaluating. That is, they do not have to be quoted in programs.[phm]: Extension paragraph?

Compatability Note The R6RS introduced more control characters for the ASCII control characters that commonly had escape sequences in other programming languages. The R7RS-Small adopted a different set of names and escape sequences, intended to be more mnemonic, and to remove rarely used codes such as vertical tab and form feed. The R7RS-Large adopts the R7RS-Small convention.

The meaning of each escape sequence (R6RS-specific escape sequence in parentheses) is:

#\alarm
U+0007
#\backspace
U+0008
#\delete
U+007F
#\escape (#\esc)
U+001B
#\newline (#\linefeed)
U+000A (the linefeed character)
#\null (#\nul)
U+0000 (the NUL character)
#\return
U+000D (the return character)
#\space
U+0020 (the space character)
#\tab
U+0009 (the horiziontal tab character)
(#\vtab)
U+000B (the vertical tab character)

A character written as #\hex scalar value is the Unicode scalar value with that number.

2.1.1: Examples

[phm]: TODO: how to mark this up?

2.2: Character procedures

2.2.1: char?

  • (char? obj) procedure
Semantics

Returns #t if obj is a character, or #f otherwise.

2.2.2: char->integer

  • (char->integer char) procedure
Semantics

Returns the unicode scalar value of char as an exact integer.

Examples
(char->integer #\x10FFFF) ⇒ #x10FFFF

2.2.3: integer->char

  • (integer->char sv) procedure
Restrictions

The sv must be an exact integer Unicode scalar value, which is equivalent to the following evaluating to true:

(or (<= 0 sv #xD7FF) (<= #xE000 sv #x10FFFF))
Semantics

Returns the character associated with the scalar value.

Examples
(integer->char 32) ⇒ #\space
(char->integer (integer->char 5000)) ⇒ 5000
(integer->char #xD800)&assertion exception

2.2.4: Character comparisions

  • (char=? char1 char2 charn) procedure
  • (char<? char1 char2 charn) procedure
  • (char>? char1 char2 charn) procedure
  • (char<=? char1 char2 charn) procedure
  • (char>=? char1 char2 charn) procedure
Semantics

These procedures return #t if the results of passing their arguments to char->integer are respectively equal, monotonically increasing, monotonically decreasing, monotonically non-decreasing, or monotonically non-increasing.

These predicates are required to be transitive.

Examples
(char<? #\z #\ß) ⇒ #t
(char<? #\z #\Z) ⇒ #f

3: Strings

Strings are sequences of characters.

The length of a string is the number of characters that it contains. This number is fixed when the string is created. The valid indicies of a string are the non-negative integers less than the length of the string. The first character of a string has index 0, the second has index 1, and so on.

Compatability Note The R7RS-Small allowed implementations to forbid certain characters in strings, as long as every ASCII character that was not U+0000 (ASCII NUL) was allowed. The R6RS allows all characters in a string. The R7RS-Large adopts the R6RS behavior.

3.1: String Syntax

String literals are written as sequences of characters enclosed within quotation marks (", U+0022). Within a string literal, various escape sequences represent characters other than themselves. Escape sequences always start with a backslash (\, U+005C):

These escape sequences are case-sensitive, except that the alphabetic digits of a hex scalar value can be uppercase or lowercase.

Compatability Note The R6RS made it a &lexical violation to use any other character than the ones specified as an escape character. The R7RS-Small made it unspecified what happens when a character not listed above was used in an escape sequence.[phm]: TODO: specify that extension is OK, except for code using #!r7rs or similar. Also specify behavior when using #!r6rs.

Examples

TODO: how should we mark this up?

3.2: Raw String Syntax

Raw strings are an alternative syntax for string literals that do not interpret escape sequences inside of them.

The string S is written as a raw string using the notation #"delimiter"string data"delimiter", where string data is the characters of S; with the restriction that S does not contain "delimiter" as a substring, and does not contain "delimiter as a suffix. Since the delimiter can be freely chosen, any string can be represented as a raw string with a suitable choice of delimiter. All characters in S are repesented literally in the string.

Examples

TODO: how should we mark this up?

3.3: String procedures

[phm]: String mutation deprecated?

3.3.1: string?

  • (string? obj) procedure
Semantics

Returns #t if obj is a string, and #f otherwise.

3.3.2: make-string

  • (make-string k [char]) procedure
Semantics

[phm]: Require fixnums here and everywhere else.

Returns a newly allocated string of length k. If char is given, then all elements of the string are initialized to char, otherwise the contents of the string are unspecified.

Examples
(make-string 5 #\a) ⇒ "aaaaa"
(equal? "a" (make-string 1)) ⇒ unspecified

3.3.4: string

  • (string char) procedure
Semantics

Returns a newly allocated string composed of the arguments.

Examples
(string) ⇒ ""
(string #\a #\b #\c) ⇒ "abc"

3.3.4: string

  • (string-length string) procedure
Semantics

Returns the number of characters in the given string as an exact integer.

Examples
(string-length "") ⇒ 0
(string-length "abc") ⇒ 3

3.3.5: string-ref

  • (string-ref string k) procedure
Restrictions

The k must be a valid index of the string.

Semantics

Returns character k of string using zero-origin indexing.

Compatability Note Some implementations may use variable-length string encodings, like UTF-8, for all of their strings. Hence, this procedure may run in O(n) time, where n is the length of the string. Implementers should make this procedure run in constant time.

Examples
(string-ref "ab" 0) ⇒ #\a
(string-ref "ab" 1) ⇒ #\b
(string-ref "ab" 2)&assertion exception

3.3.6: string=?

  • (string=? string1 string2 stringn) procedure
Semantics

Returns #t if the strings are the same length and contain the same characters in the same positions. Otherwise, the string=? procedure returns #f.

Examples
(string=? "Straße" "Strasse") ⇒ #f
(string=? "abc" "aBc") ⇒ #f

3.3.7: String ordering predicates

  • (string<? string1 string2 stringn) procedure
  • (string>? string1 string2 stringn) procedure
  • (string<=? string1 string2 stringn) procedure
  • (string>=? string1 string2 stringn) procedure
Semantics

These procedures are the lexicographic extensions to strings of the corresponding orderings on characters. For example, string<? is the lexicographic ordering on strings induced by the ordering char<? on characters. If two strings differ in length but are the same up to the length of the shorter string, the shorter string is considered to be lexicographically less than the longer string.

Compatability Note The R7RS-Large uses the behavior of the R5RS and the R6RS for these procedures. The R7RS-Small left it implementation defined what specific ordering was used for strings.

Examples
(string<? "z" "ß") ⇒ #t
(string<? "z" "zz") ⇒ #t
(string<? "z" "Z") ⇒ #f

3.3.8: string-copy and substring

  • (string-copy string [start [end]]) procedure
  • (substring string start end) procedure
Semantics

Returns a newly allocated copy of part of string from start inclusive to end exclusive.

The start defaults to 0, and end defaults to (string-length string).

The procedure substring is an alias for string-copy except that all arguments are requied.

Compatability Note The procedure substring was introduced in the R6RS. The R7RS-Small extended the string-copy procedure to take optional arguments.

Examples
(string-copy "Beta substitution") ⇒ "Beta substitution"
(let* ((s1 (string-copy "abc"))
       (s2 (string-copy s1)))
  (string-set! s1 0 #\A)
  (values s1 s2)) ⇒ "Abc" "abc"
(string-copy "Beta substitution" 1 10) ⇒ "eta subst"
(string-copy "Beta substitution" 5) ⇒ "substitution"

3.3.9: string-append

  • (string-append string) procedure
Semantics

Returns a newly allocated string whose characters are the concatenation of the characters in the given strings.

Examples
(string-append "abc" "def") ⇒ "abcdef"
(string-append) ⇒ ""