###: Type-specific Flonum Libraries

by Peter McGoron

Status

¶ Abstract

This SRFI organizes flonum operations into libraries according to the representation of flonum they operate on. Each library also provides procedures to inspect properties of the flonum operations, such as the rounding mode and deviations from IEEE 754 arithmetic. Additional procedures are defined that take advantage of the fact that a flonum's representation is known from the name of its library.

¶ Issues

Should there be an interface that allows one to inspect the floating point interface of a compile target at expand time, if the expand time and run time floating point semantics differ?
Should this include conversion to/from f32 and f64 vectors, and should it specify that f32vectors are vectors of binary32 numbers, etc?
Should conversion procedures raise an exception if a NaN value is unacceptable, or should they coerce it?
The proposed representation-specific flonum syntax is ugly, although in this case accuracy is more important than aesthetics. (They are based off of C’s macros for integer constants.) They also clash with the semantics of Scheme’s numbers: what is the meaning of #e#fl(binary64 1/2)?
To what extent can we specify correct rounding for multi-argument versions of procedures like (:+ x y z w …)?

¶ Rationale

This section is non-normative.

Standard Scheme doesn’t give specifics about the precision and range of inexact numbers. From the R⁴RS onward, implementations could use s, f, d, and l to denote inexact constants of different precisions. The R⁶RS and SRFI 144 included “flonum” operations. However, these specifications do not specify what the format of the flonum is. The flonum might not be an IEEE format number and operations may differ from implementation to implementation.

This SRFI proposes a variant of SRFI 144 that is organized into representation-specific libraries. The functions exported from a specific library operate on a precisely defined number format. For example, if one wanted to operate on binary64 floating point numbers, one can import (srfi ### binary64).

¶ Speed and Portability vs. Ease of Use

One reason to use type-specific procedures is speed: the function sqrt from (srfi ### binary32) can be compiled to a single FSQRT instruction on a RISC-V processor. One could also compile multiple square roots to a single vectorized SQRTPS instruction on an x86_64 processor with SSE2.

Another reason to use type-specific procedures is portability. Given the same rounding mode, format, and IEEE 754 conformance flag below, operations like +, -, and √ will always return the same value given the same inputs.

Most programmers do not have the speed and portability of floating point operations as their top priorities. They care more that their floating point calculations work well than about blazing speed or bit-for-bit reproducibility across architectures. Basically, floating-point should do “what they want.” A type-flexible system is more likely to do what the non-numerically-inclined programmer wants: see Kahan 1997 p. 29 and Kahan and Darcy 1998 pp. 60ff.

Scheme’s module system, lack of special arithmetic syntax, and latent typing allow us to separate strict correctness and “do what I want.” Programmers who wish for their programs to do “what they want” should use Scheme’s generic arithmetic. An implementation is free to do things like widen operands or optimize expressions (for example, using the SSE instruction RSQRTPS for (/ 1 (sqrt x))) without worrying about strict reproducibility or the absolute fastest speed.

¶ Specification

¶ Terminology

All references to IEEE 754 refer to its 2019 revision.

A representation is a type of inexact number that has fixed properties, like exponent range and mantissa width. Examples include binary32, binary64, and posit32 (Gustafson 2022).

An operation is correctly rounded if the returned value is the same as if the operation were calculated to infinite precision and then rounded to fit in the resulting representation according to the current rounding mode.

Operations are non-stop if all functions return a flonum in the same format as the input, even if the result is a subnormal, infinity, or NaN.
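
As a small illustration of non-stop behaviour, the following sketch uses only R⁷RS generic arithmetic rather than the libraries of this SRFI. It assumes that the implementation's inexact reals are IEEE 754 binary64 values; the variable names are chosen for illustration only.

```scheme
(import (scheme base) (scheme inexact))

;; Overflow produces an infinity instead of stopping the program.
(define overflowed (* 1e308 10.0))

;; An invalid operation produces a NaN instead of stopping the program.
(define invalid (- overflowed overflowed))

;; Underflow below the normal range produces a subnormal,
;; which is still a flonum of the same format.
(define tiny (/ 2.2250738585072014e-308 4.0))
```

Under those assumptions, (infinite? overflowed) and (nan? invalid) are both true, and tiny is positive but smaller than the least normal value.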

Square brackets [] are used to denote a group of arguments that are optional, but all arguments in the group must be present or absent together. If one pair of square brackets is nested in another pair, then the nested pair is optional even when the other arguments are supplied.

In procedure arguments, it is an error if endianness is not the symbol little, big, or an endianness supplied by the macro in (rnrs bytevectors). When endianness is not supplied, it is the native endianness.

Requirements on implementations using RFC 2119 terminology are marked up in strong text.

¶ Library names

The following library names, if available, must implement the library described in the section “IEEE binary floating point library”:

(srfi ### binary16): Operates on IEEE 754 binary16 (aka half precision floating point) values.
(srfi ### binary32): Operates on IEEE 754 binary32 (aka single precision floating point) values.
(srfi ### binary64): Operates on IEEE 754 binary64 (aka double precision floating point) values.
(srfi ### binary128): Operates on IEEE 754 binary128 (aka quadruple precision floating point) values.
(srfi ### binary256): Operates on IEEE 754 binary256 (aka octuple precision floating point) values.

The implementation must export (srfi ###), which implements the same library. It should be the same as one of the above libraries.

The following library names are reserved (where ⟨n⟩ is a base-10 numeral). They are reserved because some of the functions in the flonum library may not be appropriate for numbers in these formats. A future SRFI or Report will define operations on these representations.

(srfi ### decimal⟨n⟩): Operates on IEEE 754 decimal formats.
(srfi ### complex-⟨format⟩⟨n⟩) where ⟨format⟩ is either binary or decimal: Operates on complex numbers represented as two values in that IEEE format.
(srfi ### binary⟨n⟩) for ⟨n⟩ not previously defined: Reserved for future IEEE 754 revisions.

An implementation may provide libraries with different names than the ones above. Such a library should implement all of the procedures described below. For example, an implementation could provide (srfi ### posit32) for operations on posits, with a similar API to the one below. However, posits do not have infinite values, so infinite? would not be exported.

¶ IEEE binary floating point library

This library exports the identifiers of SRFI 144, with the following modifications:

The identifiers are not prefixed with fl. Instead, they are prefixed with :. (Exceptions: procedures that start with flonum now start with :flonum. The procedure make-fllog-base becomes :make-log-base.)
The fl-fast-fl+* identifier is not exported.
The procedure flnormalized? is now normal?, and fldenormalized? is now subnormal?.
The procedure flonum must return a NaN value if the argument is a non-real number.
The implementation must supply a mode such that the two-argument versions of :+, :-, :*, :/, :+* (aka fused multiply-add), :sqrt, :abs, :absdiff, :posdiff, :floor, :ceiling, :round, :truncate and :remainder return correctly rounded values.
Implementations should have the other procedures return correctly rounded values.
The implementation must supply a mode such that arithmetic is non-stop. (Most implementations already satisfy this requirement.)

Rationale: Multiple floating-point libraries may be pulled into the same library. To disambiguate them, one would prefix them differently. This would mean that procedures would look like f32:fl+, which is redundant. Hence this SRFI uses the shorter : prefix. Then the above can be imported as f32:+ using f32 as a prefix.

The operations that must return correctly rounded values are the ones that IEEE 754 mandates to be correctly rounded.

¶ Library summary

The following identifiers are exported:

:e :1/e :e-2 :e-pi/4 :log2-e :log10-e :log-2 :1/log-2 :log-3 :log-pi :log-10 :1/log-10 :pi :1/pi :2pi :pi/2 :pi/4 :2/sqrt-pi :pi-squared :degree :2/pi :sqrt-2 :sqrt-3 :sqrt-5 :sqrt-10 :1/sqrt-2 :cbrt-2 :cbrt-3 :4thrt-2 :phi :log-phi :1/log-phi :euler :e-euler :sin-1 :cos-1 :gamma-1/2 :gamma-1/3 :gamma-2/3 :greatest :least :epsilon :integer-exponent-zero :integer-exponent-nan :flonum :adjacent :copysign :make-flonum :integer-fraction :exponent :integer-exponent :normalized-fraction-exponent :sign-bit :flonum? :=? :<? :>? :<=? :>=? :unordered? :max :min :integer? :zero? :positive? :negative? :odd? :even? :finite? :infinite? :nan? :normal? :subnormal? :+ :* :+* :- :/ :abs :absdiff :posdiff :sgn :numerator :denominator :floor :ceiling :round :truncate :exp :exp2 :exp-1 :square :sqrt :cbrt :hypot :expt :log :log1+ :log2 :log10 :make-log-base :sin :cos :tan :asin :acos :atan :sinh :cosh :tanh :asinh :acosh :atanh :quotient :remainder :remquo :gamma :loggamma :first-bessel :second-bessel :erf :erfc :rounding-mode :features :read-random-flonum :round/ties-to-away :byte-width :bytevector-flonum-ref :bytevector-flonum-set! :string->flonum :flonum->string

¶ New procedures

The examples use the optional reader syntax suggestions to denote values of different representations.

(srfi ### ⟨library⟩)
procedure (:rounding-mode)

Returns the current rounding mode for this flonum type. This SRFI defines the following symbols which can be returned from this procedure. The SRFI defers to the IEEE 754 standard for the complete definition of these rounding modes. An implementation may add other rounding modes, which should be symbols. For example, an implementation with support for GNU MPFR may add MPFR's additional rounding modes.

round-to-nearest/ties-to-even: Operations are rounded to the nearest representable value, with ties broken by returning the value with an even least significant digit. For representations where that is ambiguous, the returned value is the larger of the tie in magnitude. (IEEE 754 roundTiesToEven)
round-to-nearest/ties-to-away: Operations are rounded to the nearest representable value, with ties broken by returning the tie value with the largest magnitude. (IEEE 754 roundTiesToAway)
round-towards-positive: Operations are rounded to the closest representable value not less than the infinitely precise value. (IEEE 754 roundTowardPositive)
round-towards-negative: Operations are rounded to the closest representable value not greater than the infinitely precise value. (IEEE 754 roundTowardNegative)
round-towards-zero: Operations are rounded to the closest representable value not greater in magnitude than the infinitely precise value. (IEEE 754 roundTowardZero)
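
The difference between these modes shows up only on inexact results. The following sketch uses R⁷RS generic arithmetic, not this SRFI's libraries, and assumes that inexact reals are binary64 values; the variable names are illustrative. It constructs two additions whose exact results lie exactly halfway between adjacent flonums.

```scheme
(import (scheme base))

;; 2^53 = 9007199254740992. From 2^53 upward, adjacent binary64 values
;; are 2 apart, so adding 1.0 lands exactly halfway between two flonums.
(define a 9007199254740992.0) ; 2^53, even significand
(define b 9007199254740994.0) ; 2^53 + 2, odd significand

(define a+1 (+ a 1.0)) ; tie between 2^53 and 2^53 + 2
(define b+1 (+ b 1.0)) ; tie between 2^53 + 2 and 2^53 + 4
```

Under round-to-nearest/ties-to-even, a+1 equals a and b+1 equals 9007199254740996.0. Under round-towards-zero, both sums would instead equal their first operand, and under round-to-nearest/ties-to-away both would move up to the next flonum.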

Note: This SRFI provides no portable way to change the rounding mode because it is a major implementation burden with little benefit. In a vacuum, the rounding mode is best represented as a dynamic variable that can be parameterized. However, the rounding mode is generally a global variable, and can sometimes be attached to individual instructions (RISC-V is an example). Modifying the rounding mode is a pretty rare operation: an analysis of RISC-V code saw no use of any mode besides roundTiesToEven outside of conversion procedures [Zurstraßen 2023].

In a similar vein, this SRFI provides no way of inspecting and raising IEEE 754 exceptions. An example of an implementation that has both IEEE 754 exception handling and rounding mode control is MIT Scheme.

The rounding mode is independent of the behavior of the round function.

(srfi ### ⟨library⟩)
procedure (:features)

Returns a list containing information about the floating point operations in this library. The following symbols have defined meanings. An implementation may add other features, which should be symbols.

subnormals-are-zero: Subnormal numbers are treated as zero. (This is sometimes called “DAZ,” or “denormals are zero” mode, for historical reasons.)
flush-to-zero: An operation that would underflow and create a subnormal number instead creates a zero. (This is sometimes called “FTZ.”)
ieee-754-2019: Arithmetic complies with IEEE 754. In particular, the operations that IEEE 754 requires to be correctly rounded are correctly rounded. Must not appear when subnormals-are-zero or flush-to-zero appear.
non-stop: Arithmetic is non-stop (see above).
fast-fma: The function (:+* x y z) is at least as fast as (:+ (:* x y) z). (Fused multiply add must be rounded correctly when IEEE 754 compliance mode is on, regardless of fast-fma being available or not.)
⟨name⟩-correctly-rounded where ⟨name⟩ is a procedure from the library without : prefixed: The function ⟨name⟩ is always correctly rounded. (When ieee-754-2019 appears, then features corresponding to functions that IEEE 754 requires to be correctly rounded must not appear.)

Note: DAZ/FTZ modes are usually enabled by the compiler, or are baked-in features of the architecture. As such, this SRFI does not provide a portable way to manipulate this mode.
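
A rough run-time probe for flushed subnormals can be written with generic arithmetic. This is only a heuristic sketch, not a substitute for (:features): it assumes binary64 inexact reals, the name flushes-subnormals? is hypothetical, it cannot tell FTZ apart from DAZ, and a compiler that folds constants at expand time may make it report the compile-time behaviour instead.

```scheme
(import (scheme base))

;; Smallest positive normal binary64 value, 2^-1022.
(define least-normal 2.2250738585072014e-308)

;; Halving it gives 2^-1023, which is exactly representable as a
;; subnormal. A zero result means subnormal results are being flushed.
(define (flushes-subnormals?)
  (zero? (* least-normal 0.5)))
```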

This should not be confused with the features procedure in the R⁷RS. This is a run-time procedure that reports on the run-time environment, and the flags may change over the runtime of the program. These flags are not accessible through cond-expand.

(srfi ### ⟨library⟩)
procedure (:read-random-flonum binary-input-port [start [end]])

Returns a random flonum between start (default 0) exclusive and end (default 1) exclusive calculated from the bytes from binary-input-port. If the bytes from the port are uniformly distributed, then the resulting flonum is drawn from a uniform distribution of flonums between the two supplied numbers, to the best extent possible.

Rationale: Floating point random number generators may take a variable number of bytes to return an answer: see for example Campbell 2014. Because of this, this procedure cannot take a bytevector. This procedure could take an SRFI 158 generator, but those have issues as described in SRFI 271.

This procedure requires that any flonum between the two ends may be returned with roughly equal probability. This precludes some methods such as filling in the lower 52 bits of a binary64 number, because that does not sample all possible exponents. If one wants this faster (and less accurate) sampling method, one can directly manipulate the structure of the flonum using a bytevector and :bytevector-flonum-ref.

It is not possible to pick a random flonum between two arbitrary finite flonums uniformly. It is possible in special cases, including the important case of (0,1): see Goualard 2022 for discussion and an algorithm that attempts to sample from arbitrary intervals as uniformly as possible.
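
The faster, less uniform method mentioned in the rationale above can be sketched in portable R⁷RS. The procedure name fast-random-unit-flonum is hypothetical and not part of this SRFI. The sketch assumes binary64 inexact reals and exact rational arithmetic, and it does not meet the requirements of :read-random-flonum: it can return 0.0, and it only ever returns multiples of 2^-52.

```scheme
(import (scheme base))

;; Hypothetical helper, not part of this SRFI.
;; Reads 7 bytes from a binary input port, keeps the low 52 bits, and
;; scales them into [0, 1). Every result is a multiple of 2^-52, so
;; flonums with small exponents are never sampled.
(define (fast-random-unit-flonum port)
  (let loop ((i 0) (n 0))
    (if (< i 7)
        (let ((byte (read-u8 port)))
          (if (eof-object? byte)
              (error "not enough random bytes")
              (loop (+ i 1) (+ (* n 256) byte))))
        ;; n has 56 bits; the low 52 become the significand.
        (inexact (/ (modulo n (expt 2 52)) (expt 2 52))))))
```

For example, (fast-random-unit-flonum (open-input-bytevector (bytevector 8 0 0 0 0 0 0))) returns 0.5, because only bit 51 of the 52 retained bits is set.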

(srfi ### ⟨library⟩)
procedure (:round/ties-to-away fl)

Round fl to an integer flonum, with ties broken as in roundTiesToAway. (C99 round).

(:round/ties-to-away 2.5) ⇒ 3.0
(:round 2.5) ⇒ 2.0
(:round/ties-to-away 3.5) ⇒ 4.0
(:round 3.5) ⇒ 4.0

Note: The flround procedure in the R⁶RS and SRFI 144 implements Scheme’s round ties-to-even behavior, which is the behavior of roundeven in C23.
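
For comparison, ties-to-away rounding can be sketched on top of R⁷RS generic arithmetic. The name round-ties-to-away is hypothetical and stands in for the library procedure; the sketch assumes IEEE binary floating point for inexact reals. It avoids the common (floor (+ x 0.5)) formulation, which rounds twice and gives the wrong answer for inputs just below 0.5.

```scheme
(import (scheme base) (scheme inexact))

;; Portable sketch of roundTiesToAway for inexact reals.
;; (- x (truncate x)) is exact for finite flonums, so the comparison
;; with 0.5 does not suffer from a second rounding.
(define (round-ties-to-away x)
  (if (or (nan? x) (infinite? x))
      x
      (let* ((t (truncate x))
             (frac (abs (- x t))))
        (if (>= frac 0.5)
            (if (negative? x) (- t 1.0) (+ t 1.0))
            t))))
```

With this definition, (round-ties-to-away 2.5) is 3.0 and (round-ties-to-away -2.5) is -3.0, while R⁷RS (round 2.5) is 2.0.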

(srfi ### ⟨library⟩)
value :byte-width

Size of the flonum in bytes.

(srfi ### ⟨library⟩)
procedure (:bytevector-flonum-ref bv k [endianness])

It is an error if k through k + :byte-width - 1 are not all valid indices of bv. If endianness is not supplied, it is an error if k is not a multiple of :byte-width.

Read the bytes in bv starting at k as a flonum of this type, using endianness.

If the value is a NaN, then the NaN may be coerced into another NaN.

(import (scheme base) (prefix (srfi ### binary64) f64))

(define bv (make-bytevector f64:byte-width))
(bytevector-u8-set! bv 0 #b01000000)
(bytevector-u8-set! bv 1 #b00001001)
(bytevector-u8-set! bv 2 #b00100001)
(bytevector-u8-set! bv 3 #b11111011)
(bytevector-u8-set! bv 4 #b01010100)
(bytevector-u8-set! bv 5 #b01000100)
(bytevector-u8-set! bv 6 #b00101101)
(bytevector-u8-set! bv 7 #b00011000)
(f64:bytevector-flonum-ref bv 0 'big) ⇒ 3.141592653589793116
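
To make the byte layout in the example above concrete, here is a sketch of a decoder written in plain R⁷RS. The procedure decode-binary64-big is hypothetical and not part of this SRFI. It assumes that the implementation supports exact rationals and that its inexact reals are binary64 values, and it collapses every NaN payload to +nan.0.

```scheme
(import (scheme base))

;; Hypothetical helper, not part of this SRFI: decode the 8 bytes of bv
;; starting at index k as a big-endian IEEE 754 binary64 value.
(define (decode-binary64-big bv k)
  (let* ((bits (let loop ((i 0) (n 0))
                 (if (= i 8)
                     n
                     (loop (+ i 1)
                           (+ (* n 256) (bytevector-u8-ref bv (+ k i)))))))
         (negative (>= bits (expt 2 63)))                 ; sign bit
         (biased (modulo (quotient bits (expt 2 52)) 2048)) ; 11 exponent bits
         (fraction (modulo bits (expt 2 52)))             ; 52 fraction bits
         (magnitude
          (cond ((and (= biased 2047) (zero? fraction)) +inf.0)
                ((= biased 2047) +nan.0)
                ;; Zero or subnormal: no implicit leading bit.
                ((zero? biased)
                 (inexact (* fraction (expt 2 -1074))))
                ;; Normal: implicit leading bit, exponent bias 1023.
                (else
                 (inexact (* (+ (expt 2 52) fraction)
                             (expt 2 (- biased 1075))))))))
    (if negative (- magnitude) magnitude)))
```

Applied to the eight bytes #x40 #x09 #x21 #xfb #x54 #x44 #x2d #x18 from the example above, the biased exponent is 1024 and the result is the binary64 value nearest to π.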

Rationale: Some implementations, in particular those that use NaN boxing, may only be able to represent a limited set of NaNs. There were few requirements on quiet versus signalling NaN formats until 2019. Different systems may have different canonical NaNs. For this reason portable code should not expect that different NaNs are distinguishable.

(srfi ### ⟨library⟩)
procedure (:bytevector-flonum-set! bv k fl [endianness])

It is an error if k through k + :byte-width - 1 are not all valid indices of bv. If endianness is not supplied, it is an error if k is not a multiple of :byte-width.

Write fl to bv at k with endianness.

With the exception of NaNs, this procedure and :bytevector-flonum-ref must round-trip. That is, given a non-NaN flonum fl,


  (let ((bv (make-bytevector :byte-width)))
    (:bytevector-flonum-set! bv 0 fl)
    (eqv? fl (:bytevector-flonum-ref bv 0)))

always evaluates to #t.

(import (scheme base) (prefix (srfi ### binary32) f32))

(define bv (make-bytevector f32:byte-width))
(f32:bytevector-flonum-set! bv 0 1.41421353816986083984f0 'little)
bv ⇒ #u8(#xf3 #x04 #xb5 #x3f)

(srfi ### ⟨library⟩)
procedure (:string->flonum string [radix])

It is an error if radix is not 2, 8, 10, or 16. The value of radix defaults to 10.

Read string as a number in that representation.

(import (scheme base)
        (prefix (srfi ### binary64) f64)
        (prefix (srfi ### binary128) f128))
(f128:string->flonum "1e400") ⇒ 1l400
(f64:string->flonum "1e400") ⇒ #fl(binary64 +inf.0)

(srfi ### ⟨library⟩)
procedure (:flonum->string fl [radix])

It is an error if radix is not 2, 8, 10, or 16. The value of radix defaults to 10.

Return a string that represents fl in radix. This procedure must round-trip fl with :string->flonum in the way that the R⁷RS specifies for number->string.
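
The R⁷RS round-trip property referred to here can be stated as a predicate over generic numbers. The following sketch uses only number->string and string->number; the name round-trips? is hypothetical. The analogous property for this SRFI would substitute the library's own conversion procedures.

```scheme
(import (scheme base))

;; R7RS specifies that reading back the output of number->string in
;; the same radix yields an eqv? number; for inexact values this is
;; guaranteed for radix 10.
(define (round-trips? x)
  (eqv? x (string->number (number->string x 10) 10)))
```

On a conforming R⁷RS implementation, (round-trips? 0.1) and (round-trips? 3.141592653589793) both evaluate to #t.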

¶ Implications of IEEE arithmetic for optimizers

When an implementation advertises that it implements an operation (sqrt, for example) with a single rounding, it must not reorder or optimize the program if doing so would return a different result. For example, (/ 1 (sqrt x)) may return a different result when implemented literally as two operations than when implemented as one inverse square root operation. Implementations should offer modes that do not optimize mathematical operations at the expense of reproducibility.

Given the same rounding mode and input values, and with ieee-754-2019 and non-stop among the features, any sequence of correctly rounded operations produces the same results on every conforming implementation that has the same rounding mode and the same features implying correct rounding.
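
Reassociation is a simple case of an optimization that changes results. The following sketch uses R⁷RS generic arithmetic and assumes binary64 inexact reals with round-to-nearest/ties-to-even; the variable names are illustrative.

```scheme
(import (scheme base))

(define a 1e16)
(define b -1e16)
(define c 1.0)

;; Evaluated as written: a and b cancel exactly, then c is added.
(define left-to-right (+ (+ a b) c))

;; Reassociated: the exact value of (+ b c) lies halfway between two
;; flonums and rounds back to b under ties-to-even, so c is lost
;; before a is added.
(define reassociated (+ a (+ b c)))
```

Under those assumptions left-to-right is 1.0 and reassociated is 0.0, so an optimizer that treats + as associative would not preserve the result.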

¶ Examples

This section is non-normative.

(import (scheme base) (prefix (srfi ### binary32) f32))
(unless (member 'ieee-754-2019 (f32:features))
  (error "requires IEEE 754 arithmetic"))

(define (f32:kahan-sum lst)
  (do ((sum (f32:flonum 0.0))
       (c (f32:flonum 0.0))
       (lst lst (cdr lst)))
      ((null? lst) sum)
    (let* ((y (f32:- (car lst) c))
           (t (f32:+ sum y)))
      (set! c (f32:- (f32:- t sum) y))
      (set! sum t))))

This code will always calculate the correct results with the desired algorithmic properties on any conforming implementation. In particular, a conforming implementation will not reorder operations in a way that makes the output values differ.
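
To see what the compensation term buys, the same algorithm can be compared against naive summation using R⁷RS generic arithmetic. This sketch assumes binary64 inexact reals under round-to-nearest and an implementation that evaluates the operations as written; the names naive-sum, kahan-sum and data are illustrative and not part of this SRFI.

```scheme
(import (scheme base))

;; Naive left-to-right summation.
(define (naive-sum lst)
  (let loop ((lst lst) (sum 0.0))
    (if (null? lst)
        sum
        (loop (cdr lst) (+ sum (car lst))))))

;; Compensated (Kahan) summation with generic arithmetic.
;; c carries the low-order bits lost by the previous addition.
(define (kahan-sum lst)
  (let loop ((lst lst) (sum 0.0) (c 0.0))
    (if (null? lst)
        sum
        (let* ((y (- (car lst) c))
               (t (+ sum y)))
          (loop (cdr lst) t (- (- t sum) y))))))

;; 1.0 followed by ten terms that are each below half an ulp of 1.0.
(define data (cons 1.0 (make-list 10 1e-16)))
```

Under those assumptions (naive-sum data) stays at 1.0, because each 1e-16 term is individually rounded away, whereas (kahan-sum data) accumulates the small terms and ends close to 1.000000000000001.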

¶ Considerations for inexact number vectors

SRFI 4 specifies f32vectors and f64vectors, and SRFI 160 specifies c64vectors and c128vectors. Implementors should make the elements of each vector the corresponding representation in the table:

Vector	Representation
f32vector	binary32
f64vector	binary64
c64vector	each part is	binary32
c128vector	each part is	binary64

The R⁷RS-Large is likely to promote that “should” to “must.”

¶ Reader syntax suggestions

On implementations with native binary floating point of multiple precisions, the exponent specifiers in the table should map to the corresponding representation:

Exponent	Representation
`s`	binary16
`f`	binary32
`d`	binary64
`l`	binary128

On implementations with wildly varying representations, such as decimal floats or posit numbers, one wants to specify the number format in a precise and portable manner. For that one may modify the grammar of the R⁷RS to be the following:

  ⟨real R⟩ → ⟨real numeral R⟩
           | ⟨represented flonum R⟩
  ⟨real numeral R⟩ → ⟨sign⟩ ⟨ureal R⟩ | ⟨infnan⟩
  ⟨represented flonum R⟩ → #fl( ⟨representation name⟩ ⟨real numeral R⟩ )
  ⟨representation name⟩ → binary16 | binary32 | …

For example, #fl(binary256 1e400) reads as a finite number, while #fl(binary64 1e400) reads the same as #fl(binary64 +inf.0). The syntax allows for complex numbers to be written with mixed precision: for example, #fl(binary32 1.0)+#fl(binary64 2.0)i.

¶ Implementation

A portable implementation is impossible. In general, a complete implementation of this SRFI would require knowledge of what optimizations occur on floating point operations, and the target architecture.

Most implementations only have one floating-point type. Such an implementation can copy most of its SRFI 144 implementation to (srfi ### binary64) with minor renamings without issue.

The simplest way to implement this SRFI is an FFI to C’s fenv.h. Checking the FTZ/DAZ mode (for example, on Intel CPUs) requires intrinsics to check the MXCSR register.

A sample implementation specific to an implementation + architecture will be written.

¶ Bibliography

Taylor Campbell. 2014. Uniform random floats: How to generate a double-precision floating-point number in [0, 1] uniformly at random given a uniform random source of bits. Retrieved from https://mumble.net/~campbell/2014/04/28/uniform-random-float on 2026-06-20.
Laurent Fousse et al. 2007. MPFR: A multiple-precision binary floating-point library with correct rounding. ACM Trans. Math. Softw., 33, 2. doi:10.1145/1236463.1236468. The version referenced in this SRFI is 4.2.2.
Frédéric Goualard. 2022. Drawing random floating-point numbers from an interval. ACM Transactions on Modeling and Computer Simulation, 32 (3). hal-03282794v2
John Gustafson et al. 2022. Standard for Posit Arithmetic. Retrieved from https://posithub.org/docs/posit_standard-2.pdf on 2026-06-20.
IEEE Computer Society. 2019. IEEE Standard for Floating-Point Arithmetic (IEEE STD 754-2019). doi:10.1109/IEEESTD.2019.8766229. ISBN 978-1-5044-5924-2.
William Kahan. 1997. Lecture Notes on the Status of IEEE Standard 754 for Binary Floating-Point Arithmetic. Retrieved from https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF on 2026-06-20.
William Kahan and Joseph Darcy. 1998. How Java’s Floating-Point Hurts Everyone. Retrieved from https://people.eecs.berkeley.edu/~wkahan/JAVAhurt.pdf on 2026-06-20.
Massachusetts Institute of Technology. 2022. Fixnum and Flonum Operations in MIT/GNU Scheme. Retrieved from https://www.gnu.org/software/mit-scheme/documentation/stable/mit-scheme-ref/Fixnum-and-Flonum-Operations.html on 2026-06-20.
Niko Zurstraßen. 2023. Evaluation of the RISC-V Floating Point Extensions F/D. Retrieved from https://www.chciken.com/risc-v/2023/08/06/evaluation-riscv-fd.html on 2026-06-20.

¶ Acknowledgements

Thanks to those in Working Group 2 for discussing the semantics of this SRFI. In particular, I would like to thank Zhu Zihao for lots of information gathering.

I thank Bradley Lucier for his input.

I thank the authors of SRFI 144, as this work builds on theirs.

I also thank William Kahan, whose work on IEEE 754 and his many complaints about how programming language designers fail to understand it influenced the design of this SRFI (even if I could not incorporate all of his suggestions).

¶ Copyright

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Editor: Arthur A. Gleckler