:orphan:

.. @raise litre.TestsAreMissing

==============================================
 Strings, Mutability, and In-Place Operations
==============================================

:Author: Dave Abrahams
:Author: Joe Groff

:Abstract: The design of Strings has revealed some misconceptions we
  held in the past, and leads us to a general design for handling
  in-place operations analogous to ``+=``.  This paper discusses the
  thinking behind the current design and proposes a language extension
  for in-place operation support.

String Mutation
===============

Should Swift ``String``\ s be immutable?  Even if the backing store is
immutable, the values themselves could still be reassigned and
swapped.  Therefore, there's really no choice: ``String``\ s **are
mutable**.

We can also ask if it makes sense to *limit* mutations to those that
can be expressed as wholesale assignments, but that question turns out
to be meaningless, because *any* mutation of a ``String`` can be
expressed in terms of a wholesale assignment.  Even if we tried to
impose an “assignment-only” limitation, I'd still be free to write::

  extension String {
    func inplace_upper() {
      self = self.upper()
    }
  }

The ``inplace_upper`` implementation above is semantically
indistinguishable from one that's written in terms of by-part
mutations.  We never pass out *logical* references to the underlying
string buffer: even though the buffer may be shared by many strings,
each ``String`` instance presents a logically independent value.

In-Place Mutations
==================

Once we allow assignment and concatenation via ``a.s1 + b.s2`` (which
creates a new ``String``), it clearly makes sense to also allow
``a.s1 += b.s2`` (which modifies a ``String`` in place).  However,
there are many operations for which “create a new string” and “modify
in place” variants both make sense, but don't have distinct, concise,
accepted spellings.
For example, does ``s.upper()`` modify ``s`` in place, or does it
create a new string value that can only be used to overwrite ``s`` via
``s = s.upper()``?

.. Note:: We could also present both interfaces, using a canonical
   naming relationship for creating and mutating variants like the one
   we have for the (in-place) operators.  We'll explore that approach,
   which has the obvious downside of complicating the API, after
   working through this one.

.. _creating-or-mutating:

Creating or Mutating?
=====================

From a usability point of view, this question answers itself fairly
easily.  With a creating ``upper()``, we get::

  var y = x.upper()               // y is an upcased copy of x
  x = x.upper()                   // upcase x "in-place"
  var z = f().upper().split()     // compose operations

With a mutating ``upper()``, we get::

  var y = x.copy()                // y is going to be an upcased copy of x...
  y.upper()                       // ...eventually
  x.upper()                       // upcase x in place
  var z = f()                     // operations don't compose
  z.upper()
  z.split()

The creating interfaces are a clear usability win.  The minor
inconvenience of assigning ``x.upper()`` into ``x`` is more than
outweighed by the disadvantages of the mutating interface:

1. Verbosity
2. The need to introduce a named temporary
3. Spurious mutations of ``y`` and ``z``, which are conceptually
   costly.  If we eventually get immutability in the type system, we
   still won't be able to label ``y`` immutable.

One could attempt to address the first two issues by making mutating
operations chainable, but we believe that only replaces one set of
problems with new ones.  The third issue, we believe, is an inevitable
symptom of using a mutating operation.

The Argument for Mutating Operations
====================================

Although, if we had to choose, we would choose creating operations,
there *are* good arguments for their mutating variants.
For example, if you want to do an in-place modification on something
that's verbose to access, ::

  some.thing().that_is.verbose().to_access.inplace_upper()

is a lot cleaner than either of these approaches::

  some.thing().that_is.verbose().to_access
    = some.thing().that_is.verbose().to_access.upper()

  var tmp = some.thing().that_is.verbose()
  tmp.to_access = tmp.to_access.upper()

Furthermore, ``x = x.upper()`` causes an allocation/deallocation pair
and data copying that can be avoided with a mutating interface and are
unlikely to be optimized away by even a clever compiler.

.. Admonition:: It's not just about ``String``\ s

   We stipulate that it's possible in the compiler to implement
   special-case optimizations for ``String``, but all of these
   arguments apply to other types as well.  We recommend getting the
   general feature we're proposing into the core language and leaving
   these optimizations to the library wherever possible.

Copy On Write
=============

Once we agree that mutating operations are viable, we can also agree
that copy-on-write is a viable optimization for them: a mutating
operation needs to copy the string's buffer only when that buffer is
not uniquely referenced::

  struct String {
    ...
    func inplace_upper() {
      self.unique()  // copy buffer iff refcount > 1
      for i in 0..<buffer.length {
        buffer[i].inplace_upper() // naïve ASCII-only implementation
      }
    }
    ...
  }

Ponies for Everyone!
====================

When considering ways to present both mutating and creating
interfaces, we considered several possibilities.  The leading
candidates fell into two basic schemes: either use methods for one
semantics and “free functions” for the other, or simply choose two
different names.

Using “Method-ness” to Distinguish Semantics
--------------------------------------------

There are two choices.

1. “Methods Mutate”::

     var y = upper(x)                // creating
     x.upper()                       // mutating

   This approach fits with the OOP-ish expectation that methods have
   special privileges to mutate an instance.
   However, it sacrifices the ability to chain creating operations, an
   important syntactic advantage.  Instead we must use nested calls::

     var z = split( trim( upper(x) ) )  // composition

2. “Methods Create”::

     var y = x.upper()                  // creating
     upper(&x)                          // mutating
     var z = x.upper().trim().split()   // composition

   Here, composition is nicer: it reads left-to-right and without
   conceptual nesting.  That said, the prevalent mental association of
   methods with access control may make it harder for our audience to
   swallow, and it has the disadvantage that when you type “up” in an
   IDE, code completion will have to show you all the functions whose
   names begin with “up,” rather than just those that apply to
   ``String``.

Tying Semantics to a Naming Convention
--------------------------------------

The precedent for this approach has already been set by the binary
operators.  The only question is, what should the convention be?  The
two options here are:

1. Mutating operations get the short name::

     var y = x.uppered()                        // creating
     x.upper()                                  // mutating
     var z = x.uppered().trimmed().splitted()   // composed

2. Creating operations get the short name::

     var y = x.upper()                          // creating
     x.inplace_upper()                          // mutating
     var z = x.upper().trim().split()           // composed

Because the creating interface is the right choice `in so many
cases`__ and because it will appear repeatedly in single-statement
compositions, we favor design #2.

__ creating-or-mutating_

Optimization and Convenience
============================

We've discussed providing a means to automatically derive in-place
assignment versions of operators from the creating operators, and vice
versa.  This provides a consistent interface to operators for free,
without boilerplate::

  operator infix ☃ {}
  func ☃ (x:Int, y:Int) -> Int { ... }

  // Users want this to work...
  var x = 0
  x ☃= 12

  // ...without typing all this
  operator infix ☃= { assignment }
  func ☃=(x:[inout] Int, y:Int) { x = x ☃ y }

We've also discussed teaching the compiler the relationship between
value-creating and in-place forms of operators, so that it can
optimize operations that take rvalues or kill lvalues into in-place
operations on the user's behalf::

  struct BigInt { ... }

  // Users want to write this:
  func foo(x:BigInt, y:BigInt, z:BigInt) -> BigInt {
    return x + y + z
  }

  // but want the performance of this:
  func fooʹ(x:BigInt, y:BigInt, z:BigInt) -> BigInt {
    var r = x
    r += y
    r += z
    return r
  }

These same motivations extend to methods with in-place and
value-creating variants.  Methods such as ``str.upper()`` that return
the same type as their ``self`` parameter can be derived from and
optimized into ``str.inplace_upper()``, in the same way that ``+`` can
be derived from and optimized into ``+=``.

Enabling the in-place relationship
----------------------------------

For operators, we have the ``assignment`` attribute for in-place
operators.  We can extend this attribute to also specify the
value-creating form of the operator::

  operator infix += {
    // Assignment form of +
    assignment +
  }

For methods, we propose tying the relationship to the ``inplace_*``
naming convention proposed for the standard library.  That has the
advantage of encouraging consistent coding standards and eliminating
boilerplate entirely.  Alternatively, if baking a naming convention
into the compiler is unpalatable, we can use declaration attributes::

  struct String {
    func [inplace_of=upper] inplace_upper() { ... }
    func [inplace=inplace_upper] upper() { ... }
  }

Default implementations
-----------------------

When an in-place relationship is created, a definition matching either
the in-place or value-creating form introduces an implicit definition
of the other form::

  func += (x:[inout] String, y:String) { ... }
  // Implicitly defines func + (x:String, y:String) -> String

  func + (x:Int, y:Int) -> Int { ...
  }
  // Implicitly defines func += (x:[inout] Int, y:Int) -> ()

  struct String {
    func upper() -> String { ... }
    // Implicitly defines inplace_upper() -> ()
  }

  struct Stringʹ {
    func inplace_upper() { ... }
    // Implicitly defines upper() -> Stringʹ
  }

Both forms can also be explicitly defined if desired.

The implicit value-creating definition copies its left argument and
applies the in-place form to the copy, as if written::

  func + (x:String, y:String) -> String {
    var r = x
    r += y
    return r
  }

  extension Stringʹ {
    func upper() -> Stringʹ {
      var r = self
      r.inplace_upper()
      return r
    }
  }

The implicit in-place form applies the value-creating form to its
arguments and assigns the result to its left argument, as if written::

  func += (x:[inout] Int, y:Int) {
    x = x + y
  }

  extension String {
    func inplace_upper() {
      self = self.upper()
    }
  }

Optimizations
-------------

The compiler should be allowed to exploit the in-place relationship to
optimize code.  Some obvious optimization opportunities include:

* Code that performs in-place assignment using value-creating forms,
  such as ``x = x + y`` or ``s = s.upper()``, can be transformed to
  use the in-place form.
* Compound expressions can be written in terms of value-creating
  forms, with the compiler transforming operations on rvalues into
  in-place operations.
* If the last use of an lvalue is as an argument to an operation with
  an in-place form, that operation can be turned into the in-place
  form.
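
As a rough illustration of the two derivation directions described
under "Default implementations", the sketch below writes both out by
hand in present-day Swift syntax, which differs from the draft syntax
used in this paper (for example ``inout`` and ``mutating`` instead of
``[inout]``).  The ``Word`` type and its ``text`` member are
hypothetical names invented for this example, and nothing here is
derived automatically by the compiler; it only shows what the implicit
definitions would be equivalent to.

```swift
// Hypothetical wrapper type, used only for illustration.
struct Word {
    var text: String

    // In-place form, written by hand.
    mutating func inplace_upper() {
        text = text.uppercased()
    }

    // Value-creating form, derived as the paper describes:
    // copy the receiver, apply the in-place form, return the copy.
    func upper() -> Word {
        var r = self
        r.inplace_upper()
        return r
    }

    // Value-creating operator, written by hand.
    static func + (x: Word, y: Word) -> Word {
        return Word(text: x.text + y.text)
    }

    // In-place operator, derived from the value-creating one:
    // apply the creating form and assign to the left argument.
    static func += (x: inout Word, y: Word) {
        x = x + y
    }
}

var a = Word(text: "snow")
let b = a.upper()          // b.text is "SNOW"; a.text is still "snow"
a += Word(text: "man")     // a.text is now "snowman"
a.inplace_upper()          // a.text is now "SNOWMAN"
```

Under the proposal, only one member of each pair would need to be
written; the other would be supplied implicitly, and the compiler
would be free to substitute the in-place form where the optimizations
listed above apply.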