mirror of
https://github.com/apple/swift.git
synced 2025-12-14 20:36:38 +01:00
380 lines
13 KiB
ReStructuredText
380 lines
13 KiB
ReStructuredText
:orphan:
|
|
|
|
.. _ValueSemantics:
|
|
|
|
==========================
|
|
Value Semantics in Swift
|
|
==========================
|
|
|
|
:Abstract: Swift is the first language to take Generic Programming
|
|
seriously that also has both value and reference types. The
|
|
(sometimes subtle) differences between the behaviors of value and
|
|
reference types create unique challenges for generic programs that we
|
|
have not yet addressed. This paper surveys related problems
|
|
and explores some possible solutions.
|
|
|
|
|
|
Definitions
|
|
===========
|
|
|
|
I propose the following definitions of "value semantics" and
|
|
"reference semantics."
|
|
|
|
Value Semantics
|
|
---------------
|
|
|
|
For a type with value semantics, variable initialization, assignment,
|
|
and argument-passing (hereafter, "the big three operations") each
|
|
create an *independently modifiable copy* of the source value that is
|
|
*interchangeable with the source*. [#interchange]_
|
|
|
|
If ``T`` has value semantics, the ``f``\ s below are all equivalent::
|
|
|
|
func f1() -> T {
|
|
var x : T
|
|
return x
|
|
}
|
|
|
|
func f2() -> T {
|
|
var x : T
|
|
var y = x
|
|
return y // a copy of x is equivalent to x
|
|
}
|
|
|
|
func f2a() -> T {
|
|
var x : T
|
|
var y : T
|
|
y = x
|
|
return y // a copy of x is equivalent to x
|
|
}
|
|
|
|
func f3() -> T {
|
|
var x : T
|
|
var y = x
|
|
y.mutate() // a copy of x is modifiable
|
|
return x // without affecting x
|
|
}
|
|
|
|
func f3a() -> T {
|
|
var x : T
|
|
var y : T
|
|
y = x;
|
|
y.mutate() // a copy of x is modifiable
|
|
return x // without affecting x
|
|
}
|
|
|
|
func g(_ x : T) { x.mutate() }
|
|
|
|
func f4() -> T {
|
|
var x : T
|
|
g(x) // when x is passed by-value the copy
|
|
return x // is modifiable without affecting x
|
|
}
|
|
|
|
|
|
Reference Semantics
|
|
-------------------
|
|
|
|
Values of a type with reference semantics are only accessible
|
|
indirectly, via a reference. Although swift tries to hide this fact
|
|
for simplicity, for the purpose of this discussion it is important to
|
|
note that there are always *two* values in play: the value of the
|
|
reference itself and that of the object being referred to (a.k.a. the
|
|
target).
|
|
|
|
The programmer thinks of the target's value as primary, but it is
|
|
never used as a variable initializer, assigned, or passed as a
|
|
function argument. Conversely, the reference itself has full value
|
|
semantics, but the user never sees or names its type. The programmer
|
|
expects that copies of a reference share a target object, so
|
|
modifications made via one copy are visible as side-effects through
|
|
the others.
|
|
|
|
If ``T`` has reference semantics, the ``f``\ s below are all
|
|
equivalent::
|
|
|
|
func f1(_ x: T) {
|
|
x.mutate()
|
|
return x
|
|
}
|
|
|
|
func f2(_ x: T) -> T {
|
|
var y = x
|
|
y.mutate() // mutation through a copy of x
|
|
return x // is visible through x
|
|
}
|
|
|
|
func f2a(_ x: T) -> T {
|
|
var y : T
|
|
y = x
|
|
y.mutate() // mutation through a copy of x
|
|
return x // is visible through x
|
|
}
|
|
|
|
func g(_ x : T) { x.mutate() }
|
|
|
|
func f3(_ x: T) -> T {
|
|
g(x) // when x is passed to a function, mutation
|
|
return x // through the parameter is visible through x
|
|
}
|
|
|
|
The Role of Mutation
|
|
--------------------
|
|
|
|
It's worth noting that in the absence of mutation, value semantics and
|
|
reference semantics are indistinguishable. You can easily prove that
|
|
to yourself by striking the calls to ``mutate()`` in each of the
|
|
previous examples, and seeing that the equivalences hold for any type.
|
|
In fact, the fundamental difference between reference and value
|
|
semantics is that **value semantics never creates multiple paths to
|
|
the same mutable state**. [#cow]_
|
|
|
|
.. Admonition:: ``struct`` vs ``class``
|
|
|
|
Although ``struct``\ s were designed to support value semantics and
|
|
``class``\ es were designed to support reference semantics, it would
|
|
be wrong to assume that they are always used that way. As noted
|
|
earlier, in the absence of mutation, value semantics and reference
|
|
semantics are indistinguishable. Therefore, any immutable ``class``
|
|
trivially has value semantics (*and* reference semantics).
|
|
|
|
Second, it's easy to implement a ``struct`` with reference semantics:
|
|
simply keep the primary value in a ``class`` and refer to it through
|
|
an instance variable. So, one cannot assume that a ``struct`` type
|
|
has value semantics. ``Array`` could be seen (depending on how you
|
|
view its value) as an example of a reference-semantics ``struct``
|
|
from the standard library.
|
|
|
|
The Problem With Generics
|
|
=========================
|
|
|
|
The classic Liskov principle says the semantics of operations on
|
|
``Duck``\ 's subtypes need to be consistent with those on ``Duck`` itself,
|
|
so that functions operating on ``Duck``\ s still "work" when passed a
|
|
``Mallard``. More generally, for a function to make meaningful
|
|
guarantees, the semantics of its sub-operations need to be consistent
|
|
regardless of the actual argument types passed.
|
|
|
|
The type of an argument passed by-value to an ordinary function is
|
|
fully constrained, so the "big three" have knowable semantics. The
|
|
type of an ordinary argument passed by-reference is constrained by
|
|
subtype polymorphism, where a (usually implicit) contract between
|
|
base- and sub-types can dictate consistency.
|
|
|
|
However, the situation is different for functions with arguments of
|
|
protocol or parameterized type. In the absence of specific
|
|
constraints to the contrary, the semantics of the big three can vary.
|
|
|
|
Example
|
|
-------
|
|
|
|
For example, there's an algorithm called ``cycle_length`` that
|
|
measures the length of a cycle of states (e.g. the states of a
|
|
pseudo-random number generator). It needs to make one copy and do
|
|
in-place mutation of the state, rather than wholesale value
|
|
replacement via assignment, which might be expensive.
|
|
|
|
Here's a version of cycle_length that works when state is a mutable
|
|
value type::
|
|
|
|
func cycle_length<State>(
|
|
_ s : State, mutate : ([inout] State) -> ()
|
|
) -> Int
|
|
requires State : EqualityComparable
|
|
{
|
|
State x = s // one copy // 1
|
|
mutate(&x) // in-place mutation
|
|
Int n = 1
|
|
while x != s { // 2
|
|
mutate(&x) // in-place mutation
|
|
++n
|
|
}
|
|
return n
|
|
}
|
|
|
|
The reason the above breaks when the state is in a class instance is
|
|
that the intended copy in line 1 instead creates a new reference to
|
|
the same state, and the comparison in line 2 (regardless of whether we
|
|
decide ``!=`` does "identity" or "value" comparison) always succeeds.
|
|
|
|
You can write a different implementation that only works on clonable
|
|
classes:
|
|
|
|
.. parsed-literal::
|
|
|
|
// Various random number generators will implement this interface
|
|
abstract class RandomNumberGenerator
|
|
: Clonable, Equalable
|
|
{
|
|
func nextValue() -> Int
|
|
}
|
|
|
|
func cycle_length<State>(
|
|
_ s : State, mutate : ([inout] State) -> ()
|
|
) -> Int
|
|
requires State : EqualityComparable, **Clonable**
|
|
{
|
|
State x = s\ **.clone()**
|
|
Int n = 1
|
|
while **! x.equal(s)** {
|
|
*etc.*
|
|
}
|
|
|
|
RandomNumberGenerator x = new MersenneTwister()
|
|
print(
|
|
cycle_length(x, (x : [inout] RandomNumberGenerator) { x.nextValue() })
|
|
)
|
|
|
|
You could also redefine the interface so that it works on both values and
|
|
clonable classes:
|
|
|
|
.. parsed-literal::
|
|
|
|
func cycle_length<State>(
|
|
_ s : State,
|
|
**next : (x : State) -> State,**
|
|
**equal : (x : [inout] State, y : [inout] State) -> Bool**
|
|
) -> Int
|
|
requires State : EqualityComparable
|
|
{
|
|
State **x = next(s)**
|
|
Int n = 1
|
|
while **!equal(x, s)** {
|
|
**x = next(x)**
|
|
++n
|
|
}
|
|
return n
|
|
}
|
|
|
|
However, this implementation makes O(N) separate copies of the state.
|
|
I don't believe there's a reasonable way write this so it works on
|
|
clonable classes, non-classes, and avoids the O(N)
|
|
copies. [#extension]_
|
|
|
|
Class Identities are Values
|
|
---------------------------
|
|
|
|
It's important to note that the first implementation of
|
|
``cycle_length`` works when the state is the *identity*, rather than
|
|
the *contents* of a class instance. For example, imagine a circular
|
|
linked list::
|
|
|
|
class Node {
|
|
constructor(Int) { next = this; prev = this }
|
|
|
|
// link two circular lists into one big cycle.
|
|
func join(_ otherNode : Node) -> () { ... }
|
|
|
|
var next : WeakRef<Node> // identity of next node
|
|
var prev : WeakRef<Node> // identity of previous node
|
|
}
|
|
|
|
We can measure the length of a cycle in these nodes as follows::
|
|
|
|
cycle_length(someNode, (x: [inout] Node){ x = x.next })
|
|
|
|
This is why so many generic algorithms seem to work on both
|
|
``class``\ es and non-``class``\ es: ``class`` *identities*
|
|
work just fine as values.
|
|
|
|
The Role of Moves
|
|
=================
|
|
|
|
Further complicating matters is the fact that the big three operations
|
|
can be--and often are--combined in ways that mask the value/reference
|
|
distinction. In fact both of the following must be present in order
|
|
to observe a difference in behavior:
|
|
|
|
1. Use of (one of) the big three operations on an object ``x``,
|
|
creating shared mutable state iff ``x`` is a reference
|
|
|
|
2. In-place mutation of ``x`` *while a (reference) copy is extant* and
|
|
thus can be observed through the copy iff ``x`` is a reference.
|
|
|
|
Take, for example, ``swap``, which uses variable initialization and
|
|
assignment to exchange two values::
|
|
|
|
func swap<T>(_ lhs : [inout] T, rhs : [inout] T)
|
|
{
|
|
var tmp = lhs // big 3: initialization - ref copy in tmp
|
|
lhs = rhs // big 3: assignment - ref copy in lhs
|
|
rhs = tmp // big 3: assignment - no ref copies remain
|
|
}
|
|
|
|
Whether ``T`` is a reference type makes no observable difference in
|
|
the behavior of ``swap``. Why? Because although ``swap`` makes
|
|
reference copies to mutable state, the existence of those copies is
|
|
encapsulated within the algorithm, and it makes no in-place mutations.
|
|
|
|
Any such algorithm can be implemented such that copy operations are
|
|
replaced by destructive *moves*, where the source value is not
|
|
(necessarily) preserved. Because movability is a weaker requirement
|
|
than copyability, it's reasonable to say that ``swap`` is built on
|
|
*moves*, rather than copies, in the same way that C++'s ``std::find``
|
|
is built on input iterators rather than on forward iterators.
|
|
|
|
We could imagine a hypothetical syntax for moving in swift, where
|
|
(unlike assignment) the value of the right-hand-side of the ``<-`` is
|
|
not necessarily preserved::
|
|
|
|
var tmp <- lhs
|
|
lhs <- rhs
|
|
rhs <- tmp
|
|
|
|
Such operations are safe to use in generic code without regard to the
|
|
differences between value- and reference- semantics. If this syntax
|
|
were extended to handle function arguments, it would cover the "big
|
|
three" operations::
|
|
|
|
f(<-x)
|
|
|
|
How to Build an Interesting Type with Value Semantics
|
|
=====================================================
|
|
|
|
Suppose we want to build a variable-sized data structure ``X`` with
|
|
(mutable) value semantics? How do we do it?
|
|
|
|
If we make ``X` a ``class``, we automatically get reference semantics, so
|
|
its value must be copied before each mutation, which is tedious and
|
|
error-prone. Its public mutating interface must be in terms of free
|
|
functions (not methods), so that the original reference value can be
|
|
passed ``[inout]`` and overwritten. Since there's no user access to the
|
|
reference count, we can't determine that we hold the only reference to
|
|
the value, so we can't optimize copy-on-write, even in single-threaded
|
|
programs. In multi-threaded programs, where each mutation implies
|
|
synchronization on the reference count, the costs are even higher.
|
|
|
|
If we make the type a ``struct``, you have only two ways to create
|
|
variable-sized data:
|
|
|
|
1. Hold a type with reference semantics as an instance variable.
|
|
Unfortunately, this is really nothing new; we must still implement
|
|
copy-on-write. We can, however, use methods for mutation in lieu
|
|
of free functions.
|
|
|
|
2. Use discriminated unions (``union``). Interestingly, a datatype
|
|
built with ``union`` automatically has value semantics. However,
|
|
there vocabulary of efficient data structures that can be built
|
|
this way is extremely limited. For example, while a singly-linked
|
|
list is trivial to implement, an efficient doubly-linked list is
|
|
effectively impossible.
|
|
|
|
----
|
|
|
|
.. [#interchange] Technically, copies of objects with value semantics
|
|
are interchangeable until they're mutated.
|
|
Thereafter, the copies are interchangeable except
|
|
insofar as it matters what value type they are
|
|
*aggregated into*.
|
|
|
|
.. [#cow] Note that this definition *does* allow for value semantics
|
|
using copy-on-write
|
|
|
|
.. [#extension] I can think of a language extension that would allow
|
|
this, but it requires creating a protocol for generic
|
|
copying, adding compiler magic to get both classes and
|
|
structs to conform to it, and telling generic
|
|
algorithm and container authors to use that protocol
|
|
instead of ``=``, which IMO is really ugly and
|
|
probably not worth the cost.
|