:orphan:
.. @raise litre.TestsAreMissing
==============================================
Strings, Mutability, and In-Place Operations
==============================================
:Author: Dave Abrahams
:Author: Joe Groff
:Abstract: The design of Strings has revealed some misconceptions
           we held in the past, and leads us to a general design for handling
           in-place operations analogous to ``+=``. This paper discusses the
           thinking behind the current design and proposes a language extension
           for in-place operation support.
String Mutation
===============
Should Swift ``String``\ s be immutable? Even if the backing store is
immutable, the values themselves could still be reassigned and
swapped. Therefore, there's really no choice: ``String``\ s **are
mutable**.
We can also ask if it makes sense to *limit* mutations to those that
can be expressed as wholesale assignments, but that question turns out
to be meaningless, because *any* mutation of a ``String`` can be
expressed in terms of a wholesale assignment. Even if we tried to
impose an “assignment-only” limitation, I'd still be free to write::

  extension String {
    func inplace_upper() {
      self = self.upper()
    }
  }

The ``inplace_upper`` implementation above is semantically
indistinguishable from one that's written in terms of by-part
mutations. We never pass out *logical* references to the underlying
string buffer: even though the buffer may be shared by many strings,
each ``String`` instance presents a logically independent value.
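
As a rough illustration in present-day Swift syntax (which differs from the
notation used in this paper), the sketch below writes the same mutation twice:
once as a wholesale assignment and once part by part. The method names
``inplaceUpperByAssignment`` and ``inplaceUpperByParts`` are hypothetical, and
the standard library's ``uppercased()`` is assumed to stand in for ``upper()``.
The by-part version is a naïve, ASCII-only loop.

```swift
extension String {
    // Mutation expressed as one wholesale assignment to self.
    mutating func inplaceUpperByAssignment() {
        self = self.uppercased()
    }

    // The same mutation expressed one character at a time.
    // Naive and ASCII-only: it assumes uppercasing never changes the length.
    mutating func inplaceUpperByParts() {
        for offset in 0..<count {
            let i = index(startIndex, offsetBy: offset)
            let replacement = String(self[i]).uppercased()
            replaceSubrange(i...i, with: replacement)
        }
    }
}

var a = "hello"
let b = a                      // b is an independent value
a.inplaceUpperByAssignment()

var c = "hello"
c.inplaceUpperByParts()
```

A caller cannot tell the two implementations apart, and ``b`` keeps its
original contents in either case.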
In-Place Mutations
==================
Once we allow assignment and concatenation via ``a.s1 + b.s2``\ —which
creates a new ``String``\ —it clearly makes sense to also allow ``a.s1
+= b.s2``\ —which modifies a ``String`` in place. However, there are
many operations for which “create a new string” and “modify in place”
variants both make sense, but don't have distinct, concise, accepted
spellings. For example, does ``s.upper()`` modify ``s`` in-place, or
does it create a new string value that can only be used to overwrite
``s`` via ``s = s.upper()``?

.. Note:: We could also present both interfaces, using a canonical
   naming relationship for creating and mutating variants
   like the one we have for the (inplace) operators. We'll
   explore that approach, which has the obvious downside of
   complicating the API, after working through this one.

.. _creating-or-mutating:
Creating or Mutating?
=====================
From a usability point-of-view, this question answers itself fairly
easily. With a creating ``upper()``, we get::

  var y = x.upper()            // y is an upcased copy of x
  x = x.upper()                // upcase x "in-place"
  var z = f().upper().split()  // compose operations

With a mutating ``upper()``, we get::

  var y = x.copy()  // y is going to be an upcased copy of x...
  y.upper()         // ...eventually
  x.upper()         // upcase x in place
  var z = f()       // operations don't compose
  z.upper()
  z.split()

The creating interfaces are a clear usability win. The minor
inconvenience of assigning ``x.upper()`` into ``x`` is more than
outweighed by the disadvantages of the mutating interface:

1. Verbosity
2. The need to introduce a named temporary
3. Spurious mutations of ``y`` and ``z``, which are conceptually
   costly. If we eventually get immutability in the type system,
   we still won't be able to label ``y`` immutable.

One could attempt to address the first two issues by making mutating
operations chainable, but we believe that only replaces one set of
problems with new ones. The third issue, we believe, is an inevitable
symptom of using a mutating operation.
The Argument for Mutating Operations
====================================
Although, if we had to choose, we would choose creating operations,
there *are* good arguments for their mutating variants. For example,
if you want to do an in-place modification on something that's verbose
to access, ::

  some.thing().that_is.verbose().to_access.inplace_upper()

is a lot cleaner than either of these approaches::

  some.thing().that_is.verbose().to_access
    = some.thing().that_is.verbose().to_access.upper()

  var tmp = some.thing().that_is.verbose()
  tmp.to_access = tmp.to_access.upper()

Furthermore, ``x = x.upper()`` causes an allocation/deallocation pair
and data copying that a mutating interface can avoid, and that even a
clever compiler is unlikely to optimize away.

.. Admonition:: It's not just about ``String``\ s

   We stipulate that it's possible in the compiler to implement
   special-case optimizations for ``String``, but all of these
   arguments apply to other types as well. We recommend getting the
   general feature we're proposing into the core language and leaving
   these optimizations to the library wherever possible.

Copy On Write
=============
Once we agree that mutating operations are viable, we can also agree
that copy-on-write is a viable optimization for mutating operations in
those cases where the string's buffer is uniquely referenced::

  struct String {
    ...
    func inplace_upper() {
      self.unique() // copy buffer iff refcount > 1
      for i in 0..<buffer.length {
        buffer[i].inplace_upper() // naïve ASCII-only implementation
      }
    }
    ...
  }

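
For readers who want to try the idea, here is a minimal copy-on-write sketch in
present-day Swift syntax. The types ``Buffer`` and ``ASCIIString`` and the
method ``inplaceUpper`` are hypothetical names invented for this illustration;
``isKnownUniquelyReferenced`` is the standard-library function assumed here to
play the role of the refcount check in ``unique()``. It is not how ``String``
itself is implemented.

```swift
// Hypothetical reference-counted storage shared between values.
final class Buffer {
    var bytes: [UInt8]
    init(_ bytes: [UInt8]) { self.bytes = bytes }
}

// Hypothetical string-like value type with copy-on-write storage.
struct ASCIIString {
    private var buffer: Buffer

    init(_ text: String) {
        buffer = Buffer(Array(text.utf8))
    }

    var text: String {
        return String(decoding: buffer.bytes, as: UTF8.self)
    }

    // Copy the buffer only if some other value also refers to it.
    private mutating func unique() {
        if !isKnownUniquelyReferenced(&buffer) {
            buffer = Buffer(buffer.bytes)
        }
    }

    // Naive ASCII-only uppercase, performed directly on the buffer.
    mutating func inplaceUpper() {
        unique()
        for i in buffer.bytes.indices {
            let byte = buffer.bytes[i]
            if byte >= 97 && byte <= 122 {   // ASCII 'a'...'z'
                buffer.bytes[i] = byte - 32
            }
        }
    }
}

var a = ASCIIString("abc")
let b = a            // a and b now share one buffer
a.inplaceUpper()     // buffer is shared, so a copies before mutating
```

When ``a`` is the only owner of its buffer, ``unique()`` does nothing and the
loop mutates the existing storage without any allocation.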
Ponies for Everyone!
====================
When considering ways to present both mutating and creating
interfaces, we considered several possibilities. The leading
candidates fell into two basic schemes: either use methods for one
semantics and “free functions” for the other, or simply choose two
different names.
Using “Method-ness” to Distinguish Semantics
--------------------------------------------
There are two choices.

1. “Methods Mutate”::

     var y = upper(x)  // creating
     x.upper()         // mutating

   This approach fits with the OOP-ish expectation that methods have
   special privileges to mutate an instance. However, it sacrifices
   the ability to chain creating methods, an important syntactic
   advantage. Instead we must use nested calls::

     var z = split( trim( upper(x) ) )  // composition

2. “Methods Create”::

     var y = x.upper()                 // creating
     upper(&x)                         // mutating
     var z = x.upper().trim().split()  // composition

   Here, composition is nicer: it reads left to right, without
   conceptual nesting. That said, the prevalent mental association of
   methods with access control may make it harder for our audience to
   swallow, and it has the disadvantage that when you type “up” in an
   IDE, code completion will have to show you all the functions whose
   names begin with “up,” rather than just those that apply to
   ``String``.

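
The “Methods Create” scheme can be approximated in present-day Swift syntax as
follows. This is only a sketch: the method ``upper()`` and the free function
``upper(_:)`` are hypothetical names added for illustration, built on the
standard library's ``uppercased()``.

```swift
extension String {
    // Creating form, spelled as a method (hypothetical name).
    func upper() -> String {
        return uppercased()
    }
}

// Mutating form, spelled as a free function that takes its argument inout.
func upper(_ s: inout String) {
    s = s.upper()
}

var x = "swift"
let y = x.upper()   // creating: x is unchanged here
upper(&x)           // mutating: x is now uppercased
```

The ``&`` at the call site marks the mutation, while the method form remains
available for left-to-right chaining.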
Tying Semantics to a Naming Convention
--------------------------------------
The precedent for this approach has already been set by the binary
operators. The only question is, what should the convention be? The
two categories here are:

1. Mutating operations get the short name::

     var y = x.uppered()                        // creating
     x.upper()                                  // mutating
     var z = x.uppered().trimmed().splitted()   // composed

2. Creating operations get the short name::

     var y = x.upper()                 // creating
     x.inplace_upper()                 // mutating
     var z = x.upper().trim().split()  // composed

Because the creating interface is the right choice `in so many
cases`__ and because it appears repeatedly in single-statement
compositions, we favor design #2.
__ creating-or-mutating_
Optimization and Convenience
============================
We've discussed providing a means to automatically derive in-place assignment
versions of operators from the creating operators, and vice
versa. This provides a consistent interface to operators for free without
boilerplate::

  operator infix ☃ {}
  func ☃ (x:Int, y:Int) -> Int { ... }

  // Users want this to work...
  var x = 0
  x ☃= 12

  // ...without typing all this
  operator infix ☃= { assignment }
  func ☃=(x:[inout] Int, y:Int) {
    x = x ☃ y
  }

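
For comparison, the boilerplate in question looks roughly like this in
present-day Swift syntax. The operator ``<+>`` is a hypothetical stand-in for
the snowman operator, and its body is arbitrary; only the shape of the
assignment form matters.

```swift
// Hypothetical operator and its hand-written assignment form.
infix operator <+>: AdditionPrecedence
infix operator <+>=: AssignmentPrecedence

// Value-creating form (arbitrary body chosen for illustration).
func <+> (x: Int, y: Int) -> Int {
    return x * 10 + y
}

// In-place form: pure boilerplate that forwards to the creating form.
func <+>= (x: inout Int, y: Int) {
    x = x <+> y
}

var x = 0
x <+>= 12
```

Every new operator needs the second declaration and the second function, which
is the repetition that automatic derivation would remove.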
We've also discussed teaching the compiler the relationship between
value-creating and in-place forms of operators, so that it can optimize
operations that take rvalues or kill lvalues into in-place operations on the
user's behalf::

  struct BigInt { ... }

  // Users want to write this:
  func foo(x:BigInt, y:BigInt, z:BigInt) -> BigInt {
    return x + y + z
  }

  // but want the performance of this:
  func fooʹ(x:BigInt, y:BigInt, z:BigInt) -> BigInt {
    var r = x
    r += y
    r += z
    return r
  }

These same motivations extend to methods with in-place and value-creating
variants. Methods such as ``str.upper()`` that return the same type as their
``self`` parameter can be derived from and optimized into
``str.inplace_upper()``, just as ``+`` can be derived from and optimized
into ``+=``.
Enabling the in-place relationship
----------------------------------
For operators, we have the ``assignment`` attribute for in-place
operators. We can extend this attribute to also specify the value-creating form
of the operator::

  operator infix += {
    // Assignment form of +
    assignment +
  }

For methods, we propose tying the relationship to the ``inplace_*`` naming
convention proposed for the standard library. That has the advantage of
encouraging consistent coding standards and eliminating boilerplate entirely.
Alternatively, if baking a naming convention into the compiler is unpalatable,
we can use declaration attributes::

  struct String {
    func [inplace_of=upper] inplace_upper() { ... }
    func [inplace=inplace_upper] upper() { ... }
  }

Default implementations
-----------------------
When an in-place relationship is created, a definition matching either the
in-place or value-creating form introduces an implicit definition of the other
form::

  func += (x:[inout] String, y:String) { ... }
  // Implicitly defines func + (x:String, y:String) -> String

  func + (x:Int, y:Int) -> Int { ... }
  // Implicitly defines func += (x:[inout] Int, y:Int) -> ()

  struct String {
    func upper() -> String { ... }
    // Implicitly defines inplace_upper() -> ()
  }

  struct Stringʹ {
    func inplace_upper() { ... }
    // Implicitly defines upper() -> Stringʹ
  }

Both forms can also be explicitly defined if desired.
The implicit value-creating definition copies its left argument and applies the
in-place form, as if written::

  func + (x:String, y:String) -> String {
    var r = x
    r += y
    return r
  }

  extension Stringʹ {
    func upper() -> Stringʹ {
      var r = self
      r.inplace_upper()
      return r
    }
  }

The implicit in-place form applies the value-creating form to its arguments and
assigns the result to its left argument, as if written::

  func += (x:[inout] Int, y:Int) {
    x = x + y
  }

  extension String {
    func inplace_upper() {
      self = self.upper()
    }
  }

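
Present-day Swift has no such implicit definitions, but both derivations can be
approximated with protocol extensions. The sketch below is only an analogy
under that assumption: the protocols ``InPlaceUppercasable`` and
``Uppercasable`` and the types ``Shout`` and ``Label`` are hypothetical names
invented for illustration.

```swift
// Hypothetical protocol: a type supplies only the in-place form.
protocol InPlaceUppercasable {
    mutating func inplaceUpper()
}

extension InPlaceUppercasable {
    // Derived value-creating form: copy, mutate the copy, return it.
    func upper() -> Self {
        var r = self
        r.inplaceUpper()
        return r
    }
}

// Hypothetical protocol: a type supplies only the value-creating form.
protocol Uppercasable {
    func upper() -> Self
}

extension Uppercasable {
    // Derived in-place form: assign the created value back to self.
    mutating func inplaceUpper() {
        self = upper()
    }
}

struct Shout: InPlaceUppercasable {
    var text: String
    mutating func inplaceUpper() { text = text.uppercased() }
}

struct Label: Uppercasable {
    var text: String
    func upper() -> Label { return Label(text: text.uppercased()) }
}

let shouted = Shout(text: "hi").upper()   // uses the derived creating form
var label = Label(text: "ok")
label.inplaceUpper()                      // uses the derived in-place form
```

Each type writes one form and gets the other for free, which is the effect the
proposed implicit definitions aim for without requiring a protocol per
operation.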
Optimizations
-------------
The compiler should be allowed to exploit the in-place relationship to optimize
code. Some obvious optimization opportunities include:

* Code that performs in-place assignment using value-creating forms, such as
  ``x = x + y`` or ``s = s.upper()``, can be transformed to use the in-place
  form.
* Compound expressions can be written in terms of value-creating forms, with
  the compiler transforming operations on rvalues into in-place operations.
* If the last use of an lvalue is as an argument to an operation with an
  in-place form, that operation can be turned into the in-place form.