mirror of
https://github.com/apple/swift.git
synced 2025-12-14 20:36:38 +01:00
:orphan:

Text Formatting in Swift
========================

:Author: Dave Abrahams
:Author: Chris Lattner
:Author: Dave Zarzycki
:Date: 2013-08-12

.. contents:: Index

**Abstract:** We propose a system for creating textual representations
of Swift objects. Our system unifies conversion to ``String``, string
interpolation, printing, and representation in the REPL and debugger.

Scope
-----

Goals
.....

* The REPL and LLDB ("debuggers") share formatting logic
* All types are "debug-printable" automatically
* Making a type "printable for humans" is super-easy
* ``toString()``-ability is a consequence of printability
* Customizing a type's printed representations is super-easy
* Format variations such as numeric radix are explicit and readable
* Large textual representations do not (necessarily) ever need to be
  stored in memory, e.g. if they're being streamed into a file or over
  a remote-debugging channel

Non-Goals
.........

.. sidebar:: Rationale

   Localization (including single-locale linguistic processing such as
   what's found in Clang's diagnostics subsystem) is the only major
   application we can think of for dynamically-constructed format
   strings, [#dynamic]_ and is certainly the most important consumer of
   that feature. Therefore, localization and dynamic format strings
   should be designed together, and *under this proposal* the only
   format strings are string literals containing interpolations
   ("``\(...)``"). Cocoa programmers can still use Cocoa localization
   APIs for localization jobs.

   In Swift, only the most common cases need to be very terse.
   Anything "fancy" can afford to be a bit more verbose. If and when
   we address localization and design a full-featured dynamic string
   formatter, it may make sense to incorporate features of ``printf``
   into the design.

* **Localization** issues such as pluralizing and argument
  presentation order are beyond the scope of this proposal.

* **Dynamic format strings** are beyond the scope of this proposal.

* **Matching the terseness of C**\ 's ``printf`` is a non-goal.

CustomStringConvertible Types
-----------------------------

``CustomStringConvertible`` types can be used in string literal interpolations,
printed with ``print(x)``, and converted to ``String`` with
``x.toString()``.

The simple extension story for beginners is as follows:

To make your type ``CustomStringConvertible``, simply declare conformance to
``CustomStringConvertible``::

  extension Person : CustomStringConvertible {}

and it will have the same printed representation you see in the
interpreter (REPL). To customize the representation, give your type
a ``func format()`` that returns a ``String``::

  extension Person : CustomStringConvertible {
    func format() -> String {
      return "\(lastName), \(firstName)"
    }
  }

The formatting protocols described below allow more efficient and
flexible formatting as a natural extension of this simple story.

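For comparison, here is a minimal sketch of the same beginner story written against the standard library as it eventually shipped. It is not part of this proposal: shipped Swift spells the customization point as a ``description`` property rather than ``format()``, and ``String(describing:)`` plays the role of ``toString()``. The ``Person`` fields are assumed from the example above.

```swift
struct Person {
    var firstName: String
    var lastName: String
}

// Shipped Swift uses the `description` property where this proposal
// uses a `format()` method.
extension Person: CustomStringConvertible {
    var description: String {
        return "\(lastName), \(firstName)"
    }
}

let person = Person(firstName: "Dave", lastName: "Abrahams")
let interpolated = "Author: \(person)"      // interpolation uses `description`
let converted = String(describing: person)  // shipped spelling of `toString()`
print(person)                               // printing uses `description` too
```
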
Formatting Variants
-------------------

``CustomStringConvertible`` types with parameterized textual representations
(e.g. number types) *additionally* support a ``format(...)`` method
parameterized according to that type's axes of variability::

  print(offset)
  print(offset.format()) // equivalent to previous line
  print(offset.format(radix: 16, width: 5, precision: 3))

Although ``format(...)`` is intended to provide the most general
interface, specialized formatting interfaces are also possible::

  print(offset.hex())

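As a rough illustration of what a parameterized formatter could look like in current Swift, the sketch below builds on the standard ``String(_:radix:uppercase:)`` initializer. The ``padded(radix:width:fill:)`` and ``hex()`` helpers are hypothetical names invented for this example; they are neither the proposed ``format(...)`` nor a standard library API.

```swift
extension BinaryInteger {
    /// Hypothetical helper: renders the value in `radix` (2...36) and
    /// left-pads the result with `fill` up to `width` characters.
    func padded(radix: Int = 10, width: Int = 0, fill: Character = " ") -> String {
        let digits = String(self, radix: radix)
        let padding = String(repeating: fill, count: max(0, width - digits.count))
        return padding + digits
    }

    /// Hypothetical specialized interface layered on the general one.
    func hex() -> String {
        return padded(radix: 16)
    }
}

let offset = 255
let plain = offset.padded()
let wide = offset.padded(radix: 16, width: 5)
let short = offset.hex()
```

The defaults make the zero-argument call equivalent to plain conversion, which mirrors the relationship between ``print(offset)`` and ``print(offset.format())`` above.
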
Design Details
--------------

Output Streams
..............

The most fundamental part of this design is ``TextOutputStream``, a thing
into which we can stream text: [#character1]_

::

  protocol TextOutputStream {
    func append(_ text: String)
  }

Every ``String`` can be used as a ``TextOutputStream`` directly::

  extension String : TextOutputStream {
    func append(_ text: String)
  }

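The sketch below shows the same idea using the protocol as it shipped in the standard library, where the single requirement is spelled ``mutating func write(_:)`` rather than ``append(_:)``. ``CountingStream`` is a hypothetical type made up for this example; it shows that a stream need not store the text it receives.

```swift
/// Hypothetical stream that discards text and only counts characters.
struct CountingStream: TextOutputStream {
    var characterCount = 0

    mutating func write(_ string: String) {
        characterCount += string.count
    }
}

var counter = CountingStream()
print("hello", terminator: "", to: &counter)

// String itself is a stream, so text can be accumulated directly.
var text = ""
text.write("abc")
print(42, terminator: "", to: &text)
```
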
Debug Printing
..............

Via compiler magic, *everything* conforms to the ``CustomDebugStringConvertible``
protocol. To change the debug representation for a type, you don't
need to declare conformance: simply give the type a ``debugFormat()``::

  /// A thing that can be printed in the REPL and the Debugger
  protocol CustomDebugStringConvertible {
    typealias DebugRepresentation : TextOutputStreamable = String

    /// Produce a textual representation for the REPL and
    /// Debugger.
    func debugFormat() -> DebugRepresentation
  }

Because ``String`` is a ``TextOutputStreamable``, your implementation of
``debugFormat`` can just return a ``String``. If you want to write
directly to the ``TextOutputStream`` for efficiency reasons
(e.g. if your representation is huge), you can return a custom
``DebugRepresentation`` type.

.. Admonition:: Guideline

   Producing a representation that can be consumed by the REPL
   and LLDB to produce an equivalent object is strongly encouraged
   where possible! For example, ``String.debugFormat()`` produces
   a representation starting and ending with "``"``", where special
   characters are escaped, etc. A ``struct Point { var x, y: Int }``
   might be represented as "``Point(x: 3, y: 5)``".

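To make the guideline concrete, here is a small sketch in shipped Swift, where the debug customization point is the ``debugDescription`` property and ``String(reflecting:)`` requests the debug form. Unlike the proposal, shipped Swift requires the conformance to be declared explicitly. The ``Point`` type follows the guideline's own example.

```swift
struct Point: CustomDebugStringConvertible {
    var x, y: Int

    // Mirrors the initializer call that would recreate the value.
    var debugDescription: String {
        return "Point(x: \(x), y: \(y))"
    }
}

let pointText = String(reflecting: Point(x: 3, y: 5))

// The debug form of a String is quoted; the plain form is not.
let quoted = String(reflecting: "hi")
let unquoted = String(describing: "hi")
```
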
(Non-Debug) Printing
....................

The ``CustomStringConvertible`` protocol provides a "pretty" textual representation
that can be distinct from the debug format. For example, when ``s``
is a ``String``, ``s.format()`` returns the string itself,
without quoting.

Conformance to ``CustomStringConvertible`` is explicit, but if you want to use the
``debugFormat()`` results for your type's ``format()``, all you
need to do is declare conformance to ``CustomStringConvertible``; there's nothing to
implement::

  /// A thing that can be print()ed and toString()ed.
  protocol CustomStringConvertible : CustomDebugStringConvertible {
    typealias PrintRepresentation : TextOutputStreamable = DebugRepresentation

    /// Produce a "pretty" textual representation.
    ///
    /// In general you can return a String here, but if you need more
    /// control, return a custom TextOutputStreamable type
    func format() -> PrintRepresentation {
      return debugFormat()
    }

    /// Simply convert to String
    ///
    /// You'll never want to reimplement this
    func toString() -> String {
      var result = ""
      self.format().writeTo(&result)
      return result
    }
  }

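Shipped Swift has a similar fallback, though it is implemented inside ``String(describing:)`` rather than through protocol inheritance: a type that provides only a debug representation is printed with it. The sketch below assumes a hypothetical ``Celsius`` type to show that behavior.

```swift
/// Hypothetical type that customizes only its debug representation.
struct Celsius: CustomDebugStringConvertible {
    var degrees: Double

    var debugDescription: String {
        return "Celsius(degrees: \(degrees))"
    }
}

let reading = Celsius(degrees: 21.5)

// With no CustomStringConvertible conformance, String(describing:)
// falls back to debugDescription.
let pretty = String(describing: reading)
let debug = String(reflecting: reading)
```
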
``TextOutputStreamable``
........................

Because it's not always efficient to construct a ``String``
representation before writing an object to a stream, we provide a
``TextOutputStreamable`` protocol, for types that can write themselves into a
``TextOutputStream``. Every ``TextOutputStreamable`` is also a
``CustomStringConvertible``, naturally::

  protocol TextOutputStreamable : CustomStringConvertible {
    func writeTo<T: TextOutputStream>(_ target: [inout] T)

    // You'll never want to reimplement this
    func format() -> PrintRepresentation {
      return self
    }
  }

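A minimal sketch of a streamable type in shipped Swift follows, where the requirement is spelled ``write(to:)`` and the target is an ordinary ``inout`` parameter. ``Repeated`` is a hypothetical type for this example; it emits its output piece by piece and never builds the full string itself.

```swift
/// Hypothetical streamable value: `unit` repeated `count` times.
struct Repeated: TextOutputStreamable {
    var unit: String
    var count: Int

    func write<Target: TextOutputStream>(to target: inout Target) {
        // Each piece goes straight to the stream; no intermediate
        // String holding the whole representation is created here.
        for _ in 0..<count {
            target.write(unit)
        }
    }
}

var repeatedText = ""
Repeated(unit: "ab", count: 3).write(to: &repeatedText)

// String(describing:) prefers write(to:) when a type provides it.
let described = String(describing: Repeated(unit: "-", count: 4))
```
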
How ``String`` Fits In
......................

``String``\ 's ``debugFormat()`` yields a ``TextOutputStreamable`` that
adds surrounding quotes and escapes special characters::

  extension String : CustomDebugStringConvertible {
    func debugFormat() -> EscapedStringRepresentation {
      return EscapedStringRepresentation(self)
    }
  }

  struct EscapedStringRepresentation : TextOutputStreamable {
    var _value: String

    func writeTo<T: TextOutputStream>(_ target: [inout] T) {
      target.append("\"")
      for c in _value {
        target.append(c.escape())
      }
      target.append("\"")
    }
  }

Besides modeling ``TextOutputStream``, ``String`` also conforms to
``TextOutputStreamable``::

  extension String : TextOutputStreamable {
    func writeTo<T: TextOutputStream>(_ target: [inout] T) {
      target.append(self) // Append yourself to the stream
    }

    func format() -> String {
      return self
    }
  }

This conformance allows *most* formatting code to be written entirely
in terms of ``String``, simplifying usage. Types with other needs can
expose lazy representations like ``EscapedStringRepresentation``
above.

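The lazy escaped representation can be sketched in shipped Swift as follows. ``EscapedString`` is a hypothetical stand-in for ``EscapedStringRepresentation``, and its escaping rules cover only a handful of characters chosen for illustration; they are an assumption, not the rules ``String`` actually uses for its debug output.

```swift
/// Hypothetical lazy representation: quotes and escapes on the fly.
struct EscapedString: TextOutputStreamable {
    var value: String

    func write<Target: TextOutputStream>(to target: inout Target) {
        target.write("\"")
        for character in value {
            // Illustrative subset of escapes only.
            switch character {
            case "\"": target.write("\\\"")
            case "\\": target.write("\\\\")
            case "\n": target.write("\\n")
            case "\t": target.write("\\t")
            default:   target.write(String(character))
            }
        }
        target.write("\"")
    }
}

var escaped = ""
EscapedString(value: "say \"hi\"\n").write(to: &escaped)
```
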
Extended Formatting Example
---------------------------

The following code is a scaled-down version of the formatting code
used for ``Int``. It represents an example of how a relatively
complicated ``format(...)`` might be written::

  protocol CustomStringConvertibleInteger
    : ExpressibleByIntegerLiteral, Comparable, SignedNumber, CustomStringConvertible {
    func %(lhs: Self, rhs: Self) -> Self
    func /(lhs: Self, rhs: Self) -> Self
    constructor(x: Int)
    func toInt() -> Int

    func format(_ radix: Int = 10, fill: String = " ", width: Int = 0)
      -> RadixFormat<Self> {

      return RadixFormat(self, radix: radix, fill: fill, width: width)
    }
  }

  struct RadixFormat<T: CustomStringConvertibleInteger> : TextOutputStreamable {
    var value: T, radix = 10, fill = " ", width = 0

    func writeTo<S: TextOutputStream>(_ target: [inout] S) {
      _writeSigned(value, &target)
    }

    // Write the given positive value to stream and return the number
    // of digits written
    func _writePositive<T:CustomStringConvertibleInteger, S: TextOutputStream>(
      _ value: T, stream: [inout] S
    ) -> Int {
      if value == 0 { return 0 }
      var radix: T = T.fromInt(self.radix)
      var rest: T = value / radix
      var nDigits = _writePositive(rest, &stream)
      var digit = UInt32((value % radix).toInt())
      var baseCharOrd : UInt32 = digit <= 9 ? '0'.value : 'A'.value - 10
      stream.append(String(UnicodeScalar(baseCharOrd + digit)))
      return nDigits + 1
    }

    func _writeSigned<T:CustomStringConvertibleInteger, S: TextOutputStream>(
      _ value: T, target: [inout] S
    ) {
      var width = 0    // characters produced so far
      var result = ""

      if value == 0 {
        result = "0"
        ++width
      }
      else {
        var absVal = abs(value)
        if (value < 0) {
          target.append("-")
          ++width
        }
        width += _writePositive(absVal, &result)
      }

      // Pad up to the requested field width
      while width < self.width {
        ++width
        target.append(fill)
      }
      target.append(result)
    }
  }

  extension Int : CustomStringConvertibleInteger {
    func toInt() -> Int { return self }
  }

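For readers who want something that compiles today, the following is a rough sketch of the same idea against the shipped ``BinaryInteger`` and ``TextOutputStreamable`` protocols. It is an adaptation under stated assumptions, not the code used for ``Int``: the digit loop is iterative rather than recursive, ``fill`` is a ``Character``, and the padding is written before the sign (so ``-42`` at width 5 comes out as two spaces followed by ``-42``), which is a design choice made for this example.

```swift
struct RadixFormat<Value: BinaryInteger>: TextOutputStreamable {
    var value: Value
    var radix = 10            // assumed to be in 2...36
    var fill: Character = " "
    var width = 0

    func write<Target: TextOutputStream>(to target: inout Target) {
        // Collect digits least-significant first. Working on the
        // magnitude avoids overflow for the minimum signed value.
        var digits: [String] = []
        var rest = value.magnitude
        let base = Value.Magnitude(radix)
        repeat {
            digits.append(String(rest % base, radix: radix, uppercase: true))
            rest /= base
        } while rest != 0

        let isNegative = value < (0 as Value)
        let used = digits.count + (isNegative ? 1 : 0)

        // Pad, then sign, then digits in most-significant-first order.
        for _ in 0..<max(0, width - used) {
            target.write(String(fill))
        }
        if isNegative { target.write("-") }
        for digit in digits.reversed() {
            target.write(digit)
        }
    }
}

var hexText = ""
RadixFormat(value: 255, radix: 16, width: 5).write(to: &hexText)

var negativeText = ""
RadixFormat(value: -42, width: 5).write(to: &negativeText)

var zeroText = ""
RadixFormat(value: 0).write(to: &zeroText)
```
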
Possible Extensions (a.k.a. Complications)
------------------------------------------

We are not proposing these extensions. Since we have given them
considerable thought, they are included here for completeness and to
ensure our proposed design doesn't rule out important directions of
evolution.

``TextOutputStream`` Adapters
.............................

Most text transformations can be expressed as adapters over generic
``TextOutputStream``\ s. For example, it's easy to imagine an upcasing
adapter that transforms its input to upper case before writing it to
an underlying stream::

  struct UpperStream<UnderlyingStream:TextOutputStream> : TextOutputStream {
    func append(_ x: String) { base.append(x.toUpper()) }
    var base: UnderlyingStream
  }

However, upcasing is a trivial example: many such transformations--such
as ``trim()`` or regex replacement--are stateful, which implies some
way of indicating "end of input" so that buffered state can be
processed and written to the underlying stream:

.. parsed-literal::

  struct TrimStream<UnderlyingStream:TextOutputStream> : TextOutputStream {
    func append(_ x: String) { ... }
    **func close() { ... }**
    var base: UnderlyingStream
    var bufferedWhitespace: String
  }

This makes general ``TextOutputStream`` adapters more complicated to write
and use than ordinary ``TextOutputStream``\ s.

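Both kinds of adapter can be sketched in shipped Swift, using ``write(_:)`` in place of ``append(_:)``. ``TrailingTrimStream`` is a hypothetical, simplified take on ``TrimStream`` that trims only trailing whitespace; its ``close()`` discards whatever is still buffered and hands back the underlying stream, which is one possible answer to the "end of input" problem described above.

```swift
/// Stateless adapter: upcases everything on its way to `base`.
struct UpperStream<Base: TextOutputStream>: TextOutputStream {
    var base: Base

    mutating func write(_ string: String) {
        base.write(string.uppercased())
    }
}

/// Stateful adapter (hypothetical): holds whitespace back until a
/// non-whitespace character proves that it is not trailing.
struct TrailingTrimStream<Base: TextOutputStream>: TextOutputStream {
    var base: Base
    var bufferedWhitespace = ""

    mutating func write(_ string: String) {
        for character in string {
            if character.isWhitespace {
                bufferedWhitespace.append(character)
            } else {
                base.write(bufferedWhitespace)
                bufferedWhitespace = ""
                base.write(String(character))
            }
        }
    }

    /// Signals end of input: drops buffered whitespace, returns `base`.
    mutating func close() -> Base {
        bufferedWhitespace = ""
        return base
    }
}

var upper = UpperStream(base: "")
print("radix ff", terminator: "", to: &upper)

var trimmer = TrailingTrimStream(base: "")
trimmer.write("a b  ")
trimmer.write("c  ")
let trimmed = trimmer.close()
```
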
``TextOutputStreamable`` Adapters
.................................

For every conceivable ``TextOutputStream`` adapter there's a corresponding
``TextOutputStreamable`` adapter. For example::

  struct UpperStreamable<UnderlyingStreamable : TextOutputStreamable> {
    var base: UnderlyingStreamable

    func writeTo<T: TextOutputStream>(_ target: [inout] T) {
      var adaptedStream = UpperStream(target)
      self.base.writeTo(&adaptedStream)
      target = adaptedStream.base
    }
  }

Then, we could extend ``TextOutputStreamable`` as follows::

  extension TextOutputStreamable {
    typealias Upcased : TextOutputStreamable = UpperStreamable<Self>
    func toUpper() -> UpperStreamable<Self> {
      return Upcased(self)
    }
  }

and, finally, we'd be able to write:

.. parsed-literal::

  print(n.format(radix:16)\ **.toUpper()**)

The complexity of this back-and-forth adapter dance is daunting, and
might well be better handled in the language once we have some formal
model--such as coroutines--of inversion-of-control. We think it makes
more sense to build the important transformations directly into
``format()`` methods, allowing, e.g.:

.. parsed-literal::

  print(n.format(radix:16, **case:.upper**))

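The adapter dance, including the explicit write-back, looks roughly like this in shipped Swift. The ``uppercasedStream()`` method name is hypothetical. The last line shows the alternative the authors prefer, building the transformation into the formatter itself, using the standard ``String(_:radix:uppercase:)`` initializer as a stand-in for ``format(radix:case:)``.

```swift
struct UpperStream<Base: TextOutputStream>: TextOutputStream {
    var base: Base

    mutating func write(_ string: String) {
        base.write(string.uppercased())
    }
}

struct UpperStreamable<Base: TextOutputStreamable>: TextOutputStreamable {
    var base: Base

    func write<Target: TextOutputStream>(to target: inout Target) {
        var adapted = UpperStream(base: target)  // wrap a copy of the target
        base.write(to: &adapted)                 // stream through the adapter
        target = adapted.base                    // explicit write-back
    }
}

extension TextOutputStreamable {
    /// Hypothetical convenience mirroring `toUpper()` above.
    func uppercasedStream() -> UpperStreamable<Self> {
        return UpperStreamable(base: self)
    }
}

var adaptedText = ""
"deadbeef".uppercasedStream().write(to: &adaptedText)

// Transformation built into the formatter instead of an adapter.
let direct = String(255, radix: 16, uppercase: true)
```
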
Possible Simplifications
------------------------

One obvious simplification might be to fearlessly use ``String`` as
the universal textual representation type, rather than having a
separate ``TextOutputStreamable`` protocol that doesn't necessarily
create a fully-stored representation. This approach would trade some
efficiency for considerable design simplicity. It is reasonable to
ask whether the efficiency cost would be significant in real cases,
and the truth is that we don't have enough information to know. At
least until we do, we opt not to trade away any CPU, memory, and
power.

If we were willing to say that only ``class``\ es can conform to
``TextOutputStream``, we could eliminate the explicit ``[inout]`` where
``TextOutputStream``\ s are passed around. Then, we'd simply need a
``class StringStream`` for creating ``String`` representations. It
would also make ``TextOutputStream`` adapters a *bit* simpler to use
because you'd never need to "write back" explicitly onto the target
stream. However, stateful ``TextOutputStream`` adapters would still need a
``close()`` method, which makes a perfect place to return a copy of
the underlying stream, which can then be "written back":

.. parsed-literal::

  struct AdaptedStreamable<T : TextOutputStreamable> {
    ...
    func writeTo<Target: TextOutputStream>(_ target: [inout] Target) {
      // create the stream that transforms the representation
      var adaptedTarget = adapt(target, adapter)
      // write the Base object to the target stream
      base.writeTo(&adaptedTarget)
      // Flush the adapted stream and, in case Target is a value type,
      // write its new value
      **target = adaptedTarget.close()**
    }
    ...
  }

We think anyone writing such adapters can handle the need for explicit
write-back, and the ability to use ``String`` as a ``TextOutputStream``
without additionally allocating a ``StringStream`` on the heap seems
to tip the balance in favor of the current design.

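To illustrate the class-only alternative being weighed here, the sketch below defines a hypothetical ``StringStream`` class in shipped Swift. Because the stream has reference semantics, helpers can take it as an ordinary parameter with no ``inout`` and no write-back, at the cost of a heap allocation per stream. The ``describe(_:into:)`` helper is invented for the example.

```swift
/// Hypothetical reference-type stream, as discussed above.
final class StringStream: TextOutputStream {
    private(set) var text = ""

    // A class can satisfy the protocol's mutating requirement with
    // an ordinary method.
    func write(_ string: String) {
        text += string
    }
}

/// Hypothetical helper: no `inout` is needed for a class stream.
func describe(_ value: Int, into stream: StringStream) {
    stream.write("value = ")
    stream.write(String(value))
}

let stream = StringStream()
describe(42, into: stream)
```
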
--------

.. [#format] Whether ``format(...)`` is to be a real protocol or merely
   an ad-hoc convention is TBD. So far, there's no obvious use for a
   generic ``format`` with arguments that depend on the type being
   formatted, so an ad-hoc convention would be just fine.

.. [#character1] We don't support streaming individual code points
   directly because it's possible to create invalid sequences of code
   points. For any code point that, on its own, represents a valid
   ``Character`` (a.k.a. Unicode `extended grapheme cluster`__), it is
   trivial and inexpensive to create a ``String``. For more
   information on the relationship between ``String`` and
   ``Character`` see the (forthcoming, as of this writing) document
   *Swift Strings State of the Union*.

   __ http://www.unicode.org/glossary/#extended_grapheme_cluster

.. [#dynamic] In fact it's possible to imagine a workable system for
   localization that does away with dynamic format strings altogether,
   so that all format strings are fully statically-checked and some of
   the same formatting primitives can be used by localizers as by
   fully-privileged Swift programmers. This approach would involve
   compiling/JIT-ing localizations into dynamically-loaded modules.
   In any case, that will wait until we have native Swift dylibs.