The Revised7 Report on the Algorithmic Language Scheme, Large Edition, is released in fascicles, which are incremental releases of the full standard.
This is the Valued Fascicle (fascicle 3 of 8) containing the description of the values that make up Scheme data, such as pairs, vectors, numbers, and strings.
The major features of this fascicle:
Raw string literals.
The addition of strictly-real? and other procedures, which has the semantics of real?, etc. in the R6RS.
The semantics of real? are now aligned with the intentions of the R7RS's Working Group 1.
Raw string literals will require changes to the reader. Some standard library procedures need redefinition.
This chapter describes Scheme’s model for numbers. It is important to distinguish between the mathematical numbers, the Scheme objects that attempt to model them, the machine representations used to implement the numbers, and notations used to write numbers. In this report, the term number refers to a mathematical number, and the term number object refers to a Scheme object representing a number. This report uses the types complex, real, rational, andinteger to refer to both mathematical numbers and number objects.[phm]: The term "number object" is confusing and pedantic. It should be removed and the terminology should be reverted back to the R5RS terminology. The fixnum and flonum refer to special subsets of the number objects, as determined by common machine representations, as explained below.
Numbers may be arranged into a tower of subsets in which each level is a subset of the level above it:
number
complex number
real number
rational number
integer
For example, 5 is an integer. Therefore 5 is also a rational, a real, and a complex. The same is true of the number objects that model 5.
Number objects are organized as a corresponding tower of subtypes defined by the predicates number?, complex?, real?, rational?, and integer?; see section 1.6. Integer number objects are also called integer objects.
There is no simple relationship between a number’s type and its representation inside a computer. Although most implementations of Scheme will offer at least two different representations of 3, these different representations denote the same integer.
Scheme’s numerical operations treat numbers as abstract data, as independent of their representation as possible. Although an implementation of Scheme may use multiple internal representations of numbers, this ought not to be apparent to a casual programmer writing simple programs.
It is useful to distinguish between number objects that are known to correspond to a number exactly, and those number objects whose computation involved rounding or other errors. For example, index operations into data structures may need to know the index exactly, as may some operations on polynomial coefficients in a symbolic algebra system. On the other hand, the results of measurements are inherently inexact, and irrational numbers may be approximated by rational and therefore inexact approximations. In order to catch uses of numbers known only inexactly where exact numbers are required, Scheme explicitly distinguishes exact from inexact number objects. This distinction is orthogonal to the dimension of type.
A number object is exact if it is the value of an exact numerical literal or was derived from exact number objects using only exact operations. Exact number objects correspond to mathematical numbers in the obvious way.
Conversely, a number object is inexact if it is the value of an inexact numerical literal, or was derived from inexact number objects, or was derived using inexact operations. Thus inexactness is contagious.
Exact arithmetic is reliable in the following sense: If exact number objects are passed to any of the arithmetic procedures described in section TODO, and an exact number object is returned, then the result is mathematically correct. This is generally not true of computations involving inexact number objects because approximate methods such as floating-point arithmetic may be used, but it is the duty of each implementation to make the result as close as practical to the mathematically ideal result.
Implementations of Scheme must support number objects for the entire tower of subtypes given in section 1.1. Moreover, implementations must support exact integer objects and exact rational number objects of practically unlimited size and precision, and to implement certain procedures (listed in section TODO) so they always return exact results when given exact arguments. ( “Practically unlimited” means that the size and precision of these numbers should only be limited by the size of the available memory.)
Implementations may support only a limited range of inexact number objects of any type, subject to the requirements of this section. For example, an implementation may limit the range of the inexact real number objects (and therefore the range of inexact integer and rational number objects) to the dynamic range of the flonum format. Furthermore the gaps between the inexact integer objects and rationals are likely to be very large in such an implementation as the limits of this range are approached.Where does this leave mixed-exactness complex numbers? I read this as making them optional. Also how does "range" apply to complex numbers?
An implementation may use floating point and other approximate representation strategies for inexact numbers. This report recommends, but does not require, that the IEEE floating-point standards be followed by implementations that use floating-point representations, and that im- plementations using other representations should match or exceed the precision achievable using these floating-point standards TODO: cite IEEE 754.
In particular, implementations that use floating-point representations must follow these rules: A floating-point result must be represented with at least as much precision as is used to express any of the inexact arguments to that operation. Potentially inexact operations such as sqrt, when applied to exact arguments, should produce exact answers whenever possible (for example the square root of an exact 4 ought to be an exact 2). However, this is not required.Should the following two sentences be removed?If, on the other hand, an exact number object is operated upon so as to produce an inexact result (as by sqrt), nd if the result is represented in floating point, then the most precise floating-point format available must be used; but if the result is represented in some other way then the representation must have at least as much precision as the most precise floating-point format available.
It is the programmer’s responsibility to avoid using inexact number objects with magnitude or significand too large to be represented in the implementation.
Compatbility Note The R6RS required exact integers and rationals of practically unlimited size and precision, and the whole numerical tower. The R7RS-Small did not. The R7RS-Large adopts the R6RS behavior.
TODO: Should we require IEEE 754?
These numerical type predicates can be applied to any kind
of argument. They return #t if the object is a number object
of the named type, and #f otherwise. In general, if a
type predicate is true of a number object then all higher
type predicates are also true of that number object. Consequently,
if a type predicate is false of a number object, then all lower type
predicates are also false of that number object.
Returns #t if the object is a number
object, and #f otherwise.[phm]: Depending
on the allowed extensions, this should have description similar to that
in the R7RS-Small.
[phm]: These examples are not in previous reports.
(number? 0) ⇒ #t(number? 0.0) ⇒ #t(number? 1/2) ⇒ #t(number? 1+2i) ⇒ #t(number? "1") ⇒ #f(number? '(1)) ⇒ #fReturns #t if the object is a complex
number object, and #f otherwise.[phm]: Depending on allowed extensions, this should have
discussion similar to the R7RS-Small.
(complex? 3+4i) ⇒ #t(complex? 3) ⇒ #tReturns #t if the object is a real number
object, and #f otherwise.
Equivalent to: [phm]: In a break from previous Reports, and more in-line with
how SRFIs are written, this is written as a implementation.
(define (real? x)
(and (complex? x)
(zero? (imag-part x))))Rationale The R6RS defined a complex number object to be real if and only if it had an exact zero imaginary part. This differed from the R5RS definition. One of the reasons given for this change in the rationale of the R6RS is that real? could be used as a representational predicate separate from the usual representation of complex numbers. The R5RS definition was given the name real-valued?.
The R7RS-Small attemped to change the definition back to the R5RS definition, but the examples made it unclear if the R5RS or R6RS definition was required. Working Group 2 voted to restore the R5RS definition and give the R6RS definition the name strictly-real?.
(real? 3+4i) ⇒ #f(real? 3) ⇒ #t(real? -2.5+0i) ⇒ #t(real? -2.5+0.0i) ⇒ #t(real? -2.5-0.0i) ⇒ #t(real? #e1e10) ⇒ #t(real? +inf.0) ⇒ #t(real? +nan.0) ⇒ #tReturns #t iff the object is a real
number object with an imaginary part that is exactly zero. Equivalent to
(define (strictly-real? obj)
(and (real? obj)
(exact? (imag-part obj))))Rationale The sign of the zero in the imaginary part of a complex number can change the behavior of the number when applied to trancendental functions, as the real number line is a common branch cut cite Kahan's paper. This predicate returns true when the number is exactly on the real line, and not possibly infinitesimally off of it.
(strictly-real? 3+4i) ⇒ #f(strictly-real? 3) ⇒ #t(strictly-real? -2.5+0i) ⇒ #t(strictly-real? -2.5+0.0i) ⇒ #f(strictly-real? -2.5-0.0i) ⇒ #f(strictly-real? #e1e10) ⇒ #t(strictly-real? +inf.0) ⇒ #t(strictly-real? +nan.0) ⇒ #tReturns #t if the object is an integer
object, and #f otherwise. Equivalent to:
(define (integer? obj)
(and (real? obj)
(= obj (round obj))))[phm]: This definition is taken from the R7RS-Small. Is the R6RS definition better?
(integer? 3+0i) ⇒ #t(integer? 3.0) ⇒ #t(integer? 8/4) ⇒ #t(integer? 3.1) ⇒ #f(integer? 3+1i) ⇒ #f(integer? 3+0.0i) ⇒ #t(integer? 3-0.0i) ⇒ #tReturns #t iff the object is an integer
object with an imaginary part that is exactly zero. Equivalent to
(define (strictly-integer? obj)
(and (integer? obj)
(exact? (imag-part obj))))(strictly-integer? 3+0i) ⇒ #t(strictly-integer? 3.0) ⇒ #t(strictly-integer? 8/4) ⇒ #t(strictly-integer? 3.1) ⇒ #f(strictly-integer? 3+1i) ⇒ #f(strictly-integer? 3+0.0i) ⇒ #f(strictly-integer? 3-0.0i) ⇒ #fCharacters are objects that represent Unicode scalar values.
Note Unicode defines a standard mapping between sequences
of Unicode scalar values (integers in the range 0 to #x10FFFF,
excluding the range #xD800 to #xDFFF) in the latest version
of the standard and human-readable “characters”.[phm]: That previous sentence should be reworded.
More precisely, Unicode distinguishes between glyphs, which are
printed for humans to read, and characters, which are abstract
entities that map to glyphs (sometimes in a way that’s sensitive
to surrounding characters). Furthermore, different sequences of
scalar values sometimes correspond to the same character. The
relationships among scalar, characters, and glyphs are subtle
and complex.
Despite this complexity, most things that a literate human would call a“character” can be represented by a single Unicode scalar value (although several sequences of Unicode scalar values may represent that same character). For example, Roman letters, Cyrillic letters, Hebrew consonants, and most Chinese characters fall into this category.
Unicode scalar values exclude the range #xD800 to #xDFFF,
which are part of the range of Unicode code points. However,
the Unicode code points in this range, the so-called surrogates,
are an artifact of the UTF-16 encoding, and can only appear in
specific Unicode encodings, and even then only in pairs that encode
scalar values. Consequently, all characters represent code
points, but the surrogate code points do not have representations
as characters.
Compatibility Note The R6RS required Scheme programs
to be written in sequences of Unicode scalar values. The R7RS-Small allowed for non-Unicode characters as long as #x0 and #x7F were the ASCII characters, that char->integer and integer->char treated
non-Unicode characters as having integer values greater than Unicode
characters, and that certain procedures were defined on Unicode scalar
values according to certain properties and annexes of the Unicode standard.
The R7RS-Large adopts the R6RS behavior.
The grammar of character constants is:
The ⟨R6RS character name⟩s are used instead of the ⟨character name⟩s when the parser is set to R6RS mode.
Case is significant in #\⟨character⟩, and
in #\⟨character name⟩, but not in
#\x⟨hex scalar value⟩. A ⟨character⟩ must be followed by a ⟨delimiter⟩ or by the end of the
input. This rule resolves various ambiguous cases involving named
characters, requiring, for example, the sequence of characters
#\space to be interpreted as the space character rather than
as the character #s followed by the identifier pace.
Characters written in the #\ notation are self-evaluating.
That is, they do not have to be quoted in programs.[phm]: Extension paragraph?
Compatability Note The R6RS introduced more control characters for the ASCII control characters that commonly had escape sequences in other programming languages. The R7RS-Small adopted a different set of names and escape sequences, intended to be more mnemonic, and to remove rarely used codes such as vertical tab and form feed. The R7RS-Large adopts the R7RS-Small convention.
The meaning of each escape sequence (R6RS-specific escape sequence in parentheses) is:
#\alarm#\backspace#\delete#\escape (#\esc)#\newline (#\linefeed)#\null (#\nul)#\return#\space#\tab#\vtab)A character written as #\⟨hex scalar value⟩ is the Unicode scalar
value with that number.
[phm]: TODO: how to mark this up?
Returns the unicode scalar value of char as an exact integer.
(char->integer #\x10FFFF) ⇒ #x10FFFFThe sv must be an exact integer Unicode
scalar value, which is equivalent to the following evaluating to true:
(or (<= 0 sv #xD7FF) (<= #xE000 sv #x10FFFF))Returns the character associated with the scalar value.
(integer->char 32) ⇒ #\space(char->integer (integer->char 5000)) ⇒ 5000(integer->char #xD800) ⇒ &assertion exceptionThese procedures return #t if the results
of passing their arguments to char->integer are respectively
equal, monotonically increasing, monotonically decreasing, monotonically
non-decreasing, or monotonically non-increasing.
These predicates are required to be transitive.
(char<? #\z #\ß) ⇒ #t(char<? #\z #\Z) ⇒ #fStrings are sequences of characters.
The length of a string is the number of characters that it contains. This number is fixed when the string is created. The valid indicies of a string are the non-negative integers less than the length of the string. The first character of a string has index 0, the second has index 1, and so on.
Compatability Note The R7RS-Small allowed implementations to forbid certain characters in strings, as long as every ASCII character that was not U+0000 (ASCII NUL) was allowed. The R6RS allows all characters in a string. The R7RS-Large adopts the R6RS behavior.
String literals are written as
sequences of characters enclosed within quotation marks (", U+0022). Within a string literal, various escape sequences
represent characters other than themselves. Escape sequences always
start with a backslash (\, U+005C):
\a: alarm, U+0007
\b: backspace, U+0008
\t: character tabulation, U+0009
\n: line feed, U+000A
\r: return, U+000D
\": double quote, U+0022
\\: backslash, U+005C
\|: vertical line, U+007C
\⟨intraline whitespace⟩⟨line ending⟩⟨intraline whitespace⟩: nothing
\x⟨hex scalar value⟩;: specified character (note the
terminating semicolon)
These escape sequences are case-sensitive, except that the alphabetic digits of a ⟨hex scalar value⟩ can be uppercase or lowercase.
Compatability Note The R6RS made it a &lexical violation to use any other character than the ones specified as an
escape character. The R7RS-Small made it unspecified what
happens when a character not listed above was used in an escape sequence.[phm]: TODO: specify that extension is OK, except
for code using #!r7rs or similar. Also specify behavior when
using #!r6rs.
TODO: how should we mark this up?
Raw strings are an alternative syntax for string literals that do not interpret escape sequences inside of them.
The string S is written as a raw string
using the notation #"⟨delimiter⟩"⟨string data⟩"⟨delimiter⟩", where ⟨string data⟩ is the characters of S; with the restriction that S does not contain "⟨delimiter⟩" as a substring, and does not contain "⟨delimiter⟩ as a suffix. Since the ⟨delimiter⟩ can be freely chosen, any string can be
represented as a raw string with a suitable choice of delimiter. All
characters in S are repesented literally in the string.
TODO: how should we mark this up?
[phm]: String mutation deprecated?
[phm]: Require fixnums here and everywhere else.
Returns a newly allocated string of length k. If char is given, then all elements of the string are initialized to char, otherwise the contents of the string are unspecified.
(make-string 5 #\a) ⇒ "aaaaa"(equal? "a" (make-string 1)) ⇒ unspecifiedReturns a newly allocated string composed of the arguments.
(string) ⇒ ""(string #\a #\b #\c) ⇒ "abc"Returns the number of characters in the given string as an exact integer.
(string-length "") ⇒ 0(string-length "abc") ⇒ 3The k must be a valid index of the string.
Returns character k of string using zero-origin indexing.
Compatability Note Some implementations may use variable-length string encodings, like UTF-8, for all of their strings. Hence, this procedure may run in O(n) time, where n is the length of the string. Implementers should make this procedure run in constant time.
(string-ref "ab" 0) ⇒ #\a(string-ref "ab" 1) ⇒ #\b(string-ref "ab" 2) ⇒ &assertion exceptionReturns #t if the strings are the same
length and contain the same characters in the same positions. Otherwise,
the string=? procedure returns #f.
(string=? "Straße" "Strasse") ⇒ #f(string=? "abc" "aBc") ⇒ #fThese procedures are the lexicographic extensions to strings of the corresponding orderings on characters. For example, string<? is the lexicographic ordering on strings induced by the ordering char<? on characters. If two strings differ in length but are the same up to the length of the shorter string, the shorter string is considered to be lexicographically less than the longer string.
Compatability Note The R7RS-Large uses the behavior of the R5RS and the R6RS for these procedures. The R7RS-Small left it implementation defined what specific ordering was used for strings.
(string<? "z" "ß") ⇒ #t(string<? "z" "zz") ⇒ #t(string<? "z" "Z") ⇒ #fReturns a newly allocated copy of part of string from start inclusive to end exclusive.
The start defaults to 0, and end defaults to (string-length string).
The procedure substring is an alias for string-copy except that all arguments are requied.
Compatability Note The procedure substring was introduced in the R6RS. The R7RS-Small extended the string-copy procedure to take optional arguments.
(string-copy "Beta substitution") ⇒ "Beta substitution"(let* ((s1 (string-copy "abc"))
(s2 (string-copy s1)))
(string-set! s1 0 #\A)
(values s1 s2)) ⇒ "Abc" "abc"(string-copy "Beta substitution" 1 10) ⇒ "eta subst"(string-copy "Beta substitution" 5) ⇒ "substitution"Returns a newly allocated string whose characters are the concatenation of the characters in the given strings.
(string-append "abc" "def") ⇒ "abcdef"(string-append) ⇒ ""