Fix undefined behavior in SmallString.withUTF8

withUTF8 currently vends a typed UInt8 pointer to the underlying
SmallString. That pointer type differs from SmallString's
representation. It should simply vend a raw pointer, which would be
both type safe and convenient for UTF8 data. However, since this
method is already @inlinable, I added calls to bindMemory to prevent
the optimizer from reasoning about access to the typed pointer that we
vend.

rdar://67983613 (Undefinied behavior in SmallString.withUTF8 is miscompiled)

Additional commentary:

SmallString creates a situation where there are two types, the
in-memory type, (UInt64, UInt64), vs. the element type,
UInt8. `UnsafePointer<T>` specifies the in-memory type of the pointee,
because that's how C works. If you want to specify an element type,
not the in-memory type, then you need to use something other than
UnsafePointer to view the memory. A trivial `BufferView<UInt8>` would
be fine, although, frankly, I think UnsafeRawPointer is a perfectly
good type on its own for UTF8 bytes.

Unfortunately, a lot of the UTF8 helper code is ABI-exposed, so to
work around this, we need to insert calls to bindMemory at strategic
points to avoid undefined behavior. This is high-risk and can
negatively affect performance. So far, I was able to resolve the
regressions in our microbenchmarks just by tweaking the inliner.
This commit is contained in:
Andrew Trick
2020-08-29 01:31:24 -07:00
parent 2544806bba
commit 5eafc20cdd
3 changed files with 31 additions and 15 deletions

View File

@@ -250,7 +250,9 @@ extension _StringGuts {
) -> Int? {
#if _runtime(_ObjC)
// Currently, foreign means NSString
if let res = _cocoaStringCopyUTF8(_object.cocoaObject, into: mbp) {
if let res = _cocoaStringCopyUTF8(_object.cocoaObject,
into: UnsafeMutableRawBufferPointer(start: mbp.baseAddress,
count: mbp.count)) {
return res
}