This brings the David Owens benchmark from http://owensd.io/2015/06/27/performance-xcode7-beta-2.html from parity with simd.h-based C to 3x faster.
Before:
RenderGradient ([UInt32].withUnsafeMutablePointer (SIMD)) │ 7.035851 │ 6.304739 │ 9.815832 │ 1.212 │
After:
RenderGradient ([UInt32].withUnsafeMutablePointer (SIMD)) │ 2.318357 │ 2.223325 │ 2.697981 │ 0.1490 │
This also addresses rdar://problem/21574425, since Builtin.add_VecNxIntM isn't overflow-checked, and overflow checks really aren't wanted when working with vector types directly.
Reapplying now that Nadav's fixed the ARM64 SelectionDAG issue this exposed before.
Swift SVN r29922