:orphan:

.. @raise litre.TestsAreMissing

==============================================
 Strings, Mutability, and In-Place Operations
==============================================

:Author: Dave Abrahams
:Author: Joe Groff

:Abstract: The design of Strings has revealed some misconceptions
  we held in the past, and leads us to a general design for handling
  in-place operations analogous to ``+=``.  This paper discusses the
  thinking behind the current design and proposes a language extension
  for in-place operation support.

String Mutation
===============

Should Swift ``String``\ s be immutable? Even if the backing store is
immutable, the values themselves could still be reassigned and
swapped.  Therefore, there's really no choice: ``String``\ s **are
mutable**.

We can also ask if it makes sense to *limit* mutations to those that
can be expressed as wholesale assignments, but that question turns out
to be meaningless, because *any* mutation of a ``String`` can be
expressed in terms of a wholesale assignment.  Even if we tried to
impose an “assignment-only” limitation, I'd still be free to write::

  extension String {
    func inplace_upper() {
      self = self.upper()
    }
  }

The ``inplace_upper`` implementation above is semantically
indistinguishable from one that's written in terms of by-part
mutations.  We never pass out *logical* references to the underlying
string buffer—even though the buffer may be shared by many strings,
each ``String`` instance presents a logically independent value.
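
As a minimal sketch of this equivalence, written in present-day Swift
syntax (which postdates this paper and requires the ``mutating``
keyword), consider the hypothetical ASCII-only type ``AsciiText``
below.  Its name and its three methods are assumptions made for
illustration, not part of any library.  One mutating method assigns
wholesale and the other mutates byte by byte; callers cannot tell them
apart, and a copy taken beforehand is unaffected by either::

  // Hypothetical ASCII-only value type, for illustration only.
  struct AsciiText: Equatable {
    var bytes: [UInt8]

    init(_ s: String) { bytes = Array(s.utf8) }

    // Creating form: returns an upcased copy and leaves self alone.
    func upper() -> AsciiText {
      var copy = self
      copy.bytes = bytes.map { (b: UInt8) -> UInt8 in
        (b >= 97 && b <= 122) ? b - 32 : b
      }
      return copy
    }

    // Mutation expressed as a wholesale assignment.
    mutating func inplaceUpperByAssignment() {
      self = self.upper()
    }

    // Mutation expressed part by part (naive, ASCII-only).
    mutating func inplaceUpperByParts() {
      for i in bytes.indices where bytes[i] >= 97 && bytes[i] <= 122 {
        bytes[i] -= 32   // 'a'...'z' to 'A'...'Z'
      }
    }
  }

  var a = AsciiText("swift")
  let saved = a   // an independent value
  var b = a
  a.inplaceUpperByAssignment()
  b.inplaceUpperByParts()
  // a == b, and saved still holds the bytes of "swift"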

In-Place Mutations
==================

Once we allow assignment and concatenation via ``a.s1 + b.s2``\ —which
creates a new ``String``\ —it clearly makes sense to also allow ``a.s1
+= b.s2``\ —which modifies a ``String`` in place.  However, there are
many operations for which “create a new string” and “modify in place”
variants both make sense, but don't have distinct, concise, accepted
spellings.  For example, does ``s.upper()`` modify ``s`` in-place, or
does it create a new string value that can only be used to overwrite
``s`` via ``s = s.upper()``?

.. Note:: We could also present both interfaces, using a canonical
          naming relationship for creating and mutating variants
          like the one we have for the (inplace) operators.  We'll
          explore that approach—which has the obvious downside of
          complicating the API—after working through this one.

.. _creating-or-mutating:

Creating or Mutating?
=====================

From a usability point-of-view, this question answers itself fairly
easily.  With a creating ``upper()``, we get::

  var y = x.upper()           // y is an upcased copy of x

  x = x.upper()               // upcase x "in-place"

  var z = f().upper().split() // compose operations

With a mutating ``upper()``, we get::

  var y = x.copy()   // y is going to be an upcased copy of x...
  y.upper()          // ...eventually

  x.upper()          // upcase x in place

  var z = f()      // operations don't compose
  z.upper()
  z.split()

The creating interfaces are a clear usability win.  The minor
inconvenience of assigning ``x.upper()`` into ``x`` is more than
outweighed by the disadvantages of the mutating interface:

1. Verbosity

2. The need to introduce a named temporary

3. Spurious mutations of ``y`` and ``z``, which are conceptually
   costly.  If we eventually get immutability in the type system,
   we still won't be able to label ``y`` immutable.

One could attempt to address the first two issues by making mutating
operations chainable, but we believe that only replaces one set of
problems with new ones.  The third issue, we believe, is an inevitable
symptom of using a mutating operation.

The Argument for Mutating Operations
====================================

Although, if we had to choose, we would choose creating operations,
there *are* good arguments for their mutating variants.  For example,
if you want to do an in-place modification on something that's verbose
to access, ::

   some.thing().that_is.verbose().to_access.inplace_upper()

is a lot cleaner than either of these approaches::

   some.thing().that_is.verbose().to_access
   = some.thing().that_is.verbose().to_access.upper()

   var tmp = some.thing().that_is.verbose()
   tmp.to_access = tmp.to_access.upper()

Furthermore, ``x = x.upper()`` causes an allocation/deallocation pair
and data copying that can be avoided with a mutating interface
and that are unlikely to be optimized away by even a clever compiler.

.. Admonition:: It's not just about ``String``\ s

   We stipulate that it's possible in the compiler to implement
   special-case optimizations for ``String``, but all of these
   arguments apply to other types as well.  We recommend getting the
   general feature we're proposing into the core language and leaving
   these optimizations to the library wherever possible.

Copy On Write
=============

Once we agree that mutating operations are viable, we can also agree
that copy-on-write is a viable implementation strategy for them: a
mutating operation copies the buffer only when it is shared, and
modifies it in place when it is uniquely referenced::

  struct String {
    ...

    func inplace_upper() {
      self.unique()                  // copy buffer iff refcount > 1
      for i in 0..<buffer.length {
        buffer[i].inplace_upper()    // naïve ASCII-only implementation
      }
    }
    ...

  }
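
For comparison, here is a minimal sketch of the same idea in
present-day Swift, where the standard library function
``isKnownUniquelyReferenced`` plays the role of the refcount check.
The ``Buffer`` class, the ``CowText`` type, and its ``text`` property
are hypothetical names chosen for illustration; this is not how the
standard library's ``String`` is implemented::

  // Hypothetical reference-counted storage for the sketch.
  final class Buffer {
    var bytes: [UInt8]
    init(_ bytes: [UInt8]) { self.bytes = bytes }
  }

  struct CowText {
    private var buffer: Buffer

    init(_ s: String) { buffer = Buffer(Array(s.utf8)) }

    var text: String { String(decoding: buffer.bytes, as: UTF8.self) }

    // Copy the buffer only if another value also refers to it.
    private mutating func unique() {
      if !isKnownUniquelyReferenced(&buffer) {
        buffer = Buffer(buffer.bytes)
      }
    }

    mutating func inplaceUpper() {
      unique()
      for i in buffer.bytes.indices
      where buffer.bytes[i] >= 97 && buffer.bytes[i] <= 122 {
        buffer.bytes[i] -= 32   // naive ASCII-only implementation
      }
    }
  }

  var s = CowText("abc")
  let t = s          // s and t now share one buffer
  s.inplaceUpper()   // the buffer is shared, so s copies before mutating
  // s.text is "ABC"; t.text is still "abc"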

Ponies for Everyone!
====================

When considering ways to present both mutating and creating
interfaces, we considered several possibilities.  The leading
candidates fell into two basic schemes: either use methods for one
semantics and “free functions” for the other, or simply choose two
different names.

Using “Method-ness” to Distinguish Semantics
--------------------------------------------

There are two choices.

1. “Methods Mutate”::

     var y = upper(x)     // creating
     x.upper()            // mutating

   This approach fits with the OOP-ish expectation that methods have
   special privileges to mutate an instance.  However, it sacrifices
   the ability to chain creating methods, an important syntactic
   advantage.  Instead we must use nested calls::

     var z = split( trim( upper(x) ) ) // composition

2. “Methods Create”::

     var y = x.upper()                // creating
     upper(&x)                        // mutating
     var z = x.upper().trim().split() // composition

   Here, composition is nicer: it reads left-to-right and without
   conceptual nesting.  That said, the prevalent mental association of
   methods with access control may make it harder for our audience to
   swallow, and it has the disadvantage that when you type “up” in an
   IDE, code completion will have to show you all the functions whose
   names begin with “up,” rather than just those that apply to
   ``String``.

Tying Semantics to a Naming Convention
--------------------------------------

The precedent for this approach has already been set by the binary
operators.  The only question is, what should the convention be?  The
two options are:

1. Mutating operations get the short name::

     var y = x.uppered()                      // creating
     x.upper()                                // mutating
     var z = x.uppered().trimmed().splitted() // composed
     
2. Creating operations get the short name::

     var y = x.upper()                // creating
     x.inplace_upper()                // mutating
     var z = x.upper().trim().split() // composed

Because the creating interface is the right choice `in so many
cases`__ and because it will appear repeatedly in single-statement
compositions, we favor design #2.

__ creating-or-mutating_

Optimization and Convenience
============================

We've discussed providing a means to automatically derive in-place assignment
versions of operators from the creating operators, and vice
versa. This provides a consistent interface to operators for free without
boilerplate::

      operator infix ☃ {}
      func ☃ (x:Int, y:Int) -> Int { ... }

      // Users want this to work...
      var x = 0
      x ☃= 12

      // ...without typing all this
      operator infix ☃= { assignment }
      func ☃=(x:[inout] Int, y:Int) {
        x = x ☃ y
      }

We've also discussed teaching the compiler the relationship between
value-creating and in-place forms of operators, so that it can optimize
operations that take rvalues or kill lvalues into in-place operations on the
user's behalf::

      struct BigInt { ... }
  
      // Users want to write this:
      func foo(x:BigInt, y:BigInt, z:BigInt) -> BigInt {
        return x + y + z
      }
  
      // but want the performance of this:
      func fooʹ(x:BigInt, y:BigInt, z:BigInt) -> BigInt {
        var r = x
        r += y
        r += z
        return r
      }

These same motivations extend to methods with in-place and value-creating
variants. Methods such as ``str.upper()`` that return the same type as their
``self`` parameter can be derived from and optimized into
``str.inplace_upper()``, in the same way that ``+`` can be derived from and
optimized into ``+=``.

Enabling the in-place relationship
----------------------------------

For operators, we have the ``assignment`` attribute for in-place
operators. We can extend this attribute to also specify the value-creating form
of the operator::

      operator infix += {
        // Assignment form of +
        assignment +
      }

For methods, we propose tying the relationship to the ``inplace_*`` naming
convention proposed for the standard library. That has the advantage of
encouraging consistent coding standards and eliminating boilerplate entirely.

Alternatively, if baking a naming convention into the compiler is unpalatable,
we can use declaration attributes::

      struct String {
        func [inplace_of=upper] inplace_upper() { ... }
        func [inplace=inplace_upper] upper() { ... }
      }

Default implementations
-----------------------

When an in-place relationship is created, a definition matching either the
in-place or value-creating form introduces an implicit definition of the other
form::

      func += (x:[inout] String, y:String) { ... }
      // Implicitly defines func + (x:String, y:String) -> String

      func + (x:Int, y:Int) -> Int { ... }
      // Implicitly defines func += (x:[inout] Int, y:Int) -> ()

      struct String {
        func upper() -> String { ... }
        // Implicitly defines inplace_upper() -> ()
      }

      struct Stringʹ {
        func inplace_upper() { ... }
        // Implicitly defines upper() -> Stringʹ
      }

Both forms can also be explicitly defined if desired.

The implicit value-creating definition copies its left argument and applies the
in-place form, as if written::

      func + (x:String, y:String) -> String {
        var r = x
        r += y
        return r
      }

      extension Stringʹ {
        func upper() -> Stringʹ {
          var r = self
          r.inplace_upper()
          return r
        }
      }

The implicit in-place form applies the value-creating form to its arguments and
assigns the result to its left argument, as if written::

      func += (x:[inout] Int, y:Int) {
        x = x + y
      }

      extension String {
        func inplace_upper() {
          self = self.upper()
        }
      }
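
The derivation proposed here is a compiler feature.  As a rough
approximation only, the implicit value-creating definition described
above can be imitated in present-day Swift with a protocol extension.
In the sketch below, the ``InPlaceAppendable`` protocol and the
``Path`` type are hypothetical names introduced for illustration; a
conforming type writes only ``+=`` and receives ``+`` from the
extension::

  // Hypothetical protocol: conformers supply only the in-place form.
  protocol InPlaceAppendable {
    static func += (lhs: inout Self, rhs: Self)
  }

  extension InPlaceAppendable {
    // Derived value-creating form: copy the left operand, then
    // apply the in-place form to the copy.
    static func + (lhs: Self, rhs: Self) -> Self {
      var r = lhs
      r += rhs
      return r
    }
  }

  struct Path: InPlaceAppendable, Equatable {
    var parts: [String]

    static func += (lhs: inout Path, rhs: Path) {
      lhs.parts.append(contentsOf: rhs.parts)
    }
  }

  let usr = Path(parts: ["usr"])
  let lib = Path(parts: ["lib"])
  let joined = usr + lib   // uses the derived +; usr is left unchanged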

Optimizations
-------------

The compiler should be allowed to exploit the in-place relationship to optimize
code. Some obvious optimization opportunities include:

* Code that performs in-place assignment using value-creating forms, such as
  ``x = x + y`` or ``s = s.upper()``, can be transformed to use the in-place
  form.
* Compound expressions can be written in terms of value-creating forms, with
  the compiler transforming operations on rvalues into in-place operations.
* If the last use of an lvalue is as an argument to an operation with an
  in-place form, that operation can be turned into the in-place form.