Example of Sharing Memory between GPU and CPU with Swift and Metal for iOS8

1. Background

In two prior postings I presented i) a data-parallel Swift/Metal example and ii) a benchmark (compared to Accelerate) of Swift combined with Metal for general-purpose GPU processing on iOS 8 devices (e.g. iPhone and iPad). This posting extends those by showing how to use shared memory between GPU and CPU with Swift and Metal.

The primary benefits of sharing memory between GPU and CPU are

  1. much less time copying data back and forth, and
  2. less overall memory resource use (no duplication needed).

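Note that sharing memory between CPU and GPU only works when the data (e.g. an array) is at least 16 kilobytes, i.e. 0x4000 = 2^14 = 16384 bytes, which corresponds to 2^12 = 4096 4-byte Float elements in an array. The memory must also be aligned to a 16 KB boundary, which is why the allocation in 2.1 uses posix_memalign() with an alignment of 0x4000.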
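
The size arithmetic can be sketched as follows. This is a minimal illustration in current Swift syntax (not the iOS 8 era syntax used in the rest of this post), and it assumes that the 16 KB figure is a page size and that padding an array up to whole pages is acceptable. The names pageSize, roundedUpToPage and paddedByteSize are hypothetical helpers, not part of the original code.

```swift
/// Page size assumed in this post: 16 KB = 0x4000 = 2^14 bytes.
let pageSize = 0x4000

/// Rounds a byte count up to the next multiple of `pageSize`.
func roundedUpToPage(_ byteCount: Int) -> Int {
    return ((byteCount + pageSize - 1) / pageSize) * pageSize
}

/// Byte size needed for `count` Float elements, padded to whole pages.
func paddedByteSize(forFloatCount count: Int) -> Int {
    return roundedUpToPage(count * MemoryLayout<Float>.size)
}

// 4096 Floats * 4 bytes = 16384 bytes, which is already exactly one page.
let exact = paddedByteSize(forFloatCount: 4096)
// 5000 Floats * 4 bytes = 20000 bytes, which is padded up to two pages.
let padded = paddedByteSize(forFloatCount: 5000)
print(exact, padded)
```

With this kind of padding, an array that is slightly too small or not a whole number of pages can still be allocated in a shareable size, at the cost of some unused bytes at the end.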

All Swift/Metal and Accelerate code and benchmarks can be found at the GitHub repo:


2. Shared Memory between GPU and CPU with Swift and Metal

2.1 Allocate Memory for use by both GPU and CPU

// xvector starts out as a null pointer; posix_memalign fills it in
var xvector:UnsafeMutablePointer<Void> = nil
var alignment:UInt = 0x4000  // 16 KB alignment
var xvectorByteSize:UInt = UInt(maxcount)*UInt(sizeof(Float))
// actual allocation with alignment
posix_memalign(&xvector, alignment, xvectorByteSize)
// yvector is allocated the same way as xvector
var yvector:UnsafeMutablePointer<Void> = nil
posix_memalign(&yvector, alignment, xvectorByteSize)
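
The calls above ignore the return value of posix_memalign(). A more defensive variant could look like the sketch below, written in current Swift syntax rather than the iOS 8 era syntax of this post. The helper name allocateAligned is hypothetical and the element count of 4096 is only an example value.

```swift
import Foundation

/// Allocates `byteCount` bytes aligned to `alignment`; returns nil on failure.
/// `allocateAligned` is a hypothetical helper, not part of the original code.
func allocateAligned(byteCount: Int, alignment: Int = 0x4000) -> UnsafeMutableRawPointer? {
    var pointer: UnsafeMutableRawPointer? = nil
    // posix_memalign returns 0 on success and an error code otherwise
    // (for example when the alignment is not a power of two).
    let status = posix_memalign(&pointer, alignment, byteCount)
    return status == 0 ? pointer : nil
}

let maxcount = 4096  // example value
let byteSize = maxcount * MemoryLayout<Float>.size

if let xvector = allocateAligned(byteCount: byteSize) {
    // The address should be a multiple of the requested alignment.
    let isAligned = Int(bitPattern: xvector) % 0x4000 == 0
    print("allocated \(byteSize) bytes, aligned: \(isAligned)")
    free(xvector)  // memory from posix_memalign is released with free()
} else {
    print("allocation failed")
}
```

Returning an optional makes the failure case explicit at the call site, which seems preferable to passing a null pointer on to Metal.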

2.2 Fill xvector with data

// pointer handling and casting in Swift..
var xvectorVoidPtr = COpaquePointer(xvector)
var xvectorFloatPtr = UnsafeMutablePointer<Float>(xvectorVoidPtr)
var xvectorFloatBufferPtr = UnsafeMutableBufferPointer<Float>(start: xvectorFloatPtr, count: maxcount)
// fill xvector with data
for index in xvectorFloatBufferPtr.startIndex..<xvectorFloatBufferPtr.endIndex {
  xvectorFloatBufferPtr[index] = Float(index)
}
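
For readers on a newer toolchain, the same fill step can be sketched in current Swift, where an untyped pointer is bound to Float with bindMemory(to:capacity:) instead of going through COpaquePointer. The function name fillWithIndices is hypothetical, and the small plain allocation here is only for illustration (the shared case in this post uses posix_memalign).

```swift
/// Views `raw` as `count` Floats and fills them with 0, 1, 2, ...
/// `fillWithIndices` is a hypothetical helper, not part of the original code.
func fillWithIndices(_ raw: UnsafeMutableRawPointer, count: Int) -> UnsafeMutableBufferPointer<Float> {
    // Bind the untyped memory to Float so it can be accessed as Float elements.
    let typed = raw.bindMemory(to: Float.self, capacity: count)
    let buffer = UnsafeMutableBufferPointer(start: typed, count: count)
    for index in buffer.indices {
        buffer[index] = Float(index)
    }
    return buffer
}

let count = 8  // small example size
let raw = UnsafeMutableRawPointer.allocate(byteCount: count * MemoryLayout<Float>.stride,
                                           alignment: MemoryLayout<Float>.alignment)
let filled = fillWithIndices(raw, count: count)
print(Array(filled))
raw.deallocate()
```

The buffer pointer does not own the memory; it is only a typed view, so the memory still has to be released separately.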

2.3 Make xvector and yvector available to the GPU

newBufferWithBytesNoCopy() creates a Metal buffer that wraps memory that has already been allocated, without copying it.

var xvectorBufferNoCopy = device.newBufferWithBytesNoCopy(xvector, length: Int(xvectorByteSize), 
            options: nil, deallocator: nil)
computeCommandEncoder.setBuffer(xvectorBufferNoCopy, offset: 0, atIndex: 0)

var yvectorBufferNoCopy = device.newBufferWithBytesNoCopy(yvector, length: Int(xvectorByteSize), 
            options: nil, deallocator: nil)
computeCommandEncoder.setBuffer(yvectorBufferNoCopy, offset: 0, atIndex: 1)
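
In current Swift the same call is spelled makeBuffer(bytesNoCopy:length:options:deallocator:) and returns an optional. The sketch below checks the preconditions before wrapping the memory, assuming they are a page-aligned address and a length that is a whole number of pages. The helpers canWrapWithoutCopy and makeNoCopyBuffer are hypothetical names, and the Metal part only compiles on platforms where Metal is available.

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

/// Checks the assumed preconditions for a no-copy buffer:
/// a page-aligned address and a length that is a whole number of pages.
/// `canWrapWithoutCopy` is a hypothetical helper, not a Metal API.
func canWrapWithoutCopy(_ pointer: UnsafeMutableRawPointer, length: Int, pageSize: Int) -> Bool {
    let addressIsAligned = Int(bitPattern: pointer) % pageSize == 0
    return length > 0 && length % pageSize == 0 && addressIsAligned
}

#if canImport(Metal)
/// Wraps already-allocated memory in a Metal buffer without copying it.
/// Returns nil if the preconditions fail or Metal cannot create the buffer.
func makeNoCopyBuffer(device: MTLDevice, pointer: UnsafeMutableRawPointer, length: Int) -> MTLBuffer? {
    guard canWrapWithoutCopy(pointer, length: length, pageSize: Int(getpagesize())) else {
        return nil
    }
    // deallocator is nil, as in the post: the caller keeps ownership
    // of the memory and releases it with free() when done.
    return device.makeBuffer(bytesNoCopy: pointer,
                             length: length,
                             options: .storageModeShared,
                             deallocator: nil)
}
#endif
```

Checking up front gives a clear nil result instead of a failure inside Metal when a buffer is too small or misaligned.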

2.4 Do the GPU processing

var threadsPerGroup = MTLSize(width:32,height:1,depth:1)
var numThreadgroups = MTLSize(width:(Int(maxcount)+31)/32, height:1, depth:1)
computeCommandEncoder.dispatchThreadgroups(numThreadgroups, threadsPerThreadgroup: threadsPerGroup)
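
The expression (maxcount+31)/32 is a rounded-up integer division: it gives the smallest number of 32-thread groups that covers all elements. A small sketch in current Swift, with the hypothetical helper name threadgroupCount and 1000 as an example element count:

```swift
/// Smallest number of groups such that groups * threadsPerGroup >= elementCount.
/// `threadgroupCount` is a hypothetical helper, not part of the original code.
func threadgroupCount(elementCount: Int, threadsPerGroup: Int) -> Int {
    return (elementCount + threadsPerGroup - 1) / threadsPerGroup
}

let threadsPerGroup = 32
let elementCount = 1000  // example value that is not a multiple of 32
let groups = threadgroupCount(elementCount: elementCount, threadsPerGroup: threadsPerGroup)
// Threads launched beyond the last element; these have no data to work on.
let extraThreads = groups * threadsPerGroup - elementCount
print(groups, extraThreads)
```

When the element count is not a multiple of the group width, the extra threads in the last group have no element to process, so the kernel should check its index against the element count before reading or writing.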

2.5 Access results in yvector

// read the results only after the GPU work has completed
var yvectorVoidPtr = COpaquePointer(yvector)
var yvectorFloatPtr = UnsafeMutablePointer<Float>(yvectorVoidPtr)
var yvectorFloatBufferPtr = UnsafeMutableBufferPointer<Float>(start: yvectorFloatPtr, count: maxcount)

// print the first result to console!
println("yvector[0] = \(yvectorFloatBufferPtr[0])")

// iterate through results .. similar to 2.2..
// ..

3. Conclusion

This post has given a simple example of how to share memory between GPU and CPU with Swift and Metal. Remember to manually release the memory with free(xvector) and free(yvector) afterwards, since it was allocated with a C function, posix_memalign(). All code is available at GitHub:


Best regards,
Amund Tveit
Memkite Team

