Example of Sharing Memory between GPU and CPU with Swift and Metal for iOS8

1. Background

In two prior postings I presented (i) a data-parallel Swift/Metal example and (ii) a benchmark (compared to Accelerate) of Swift combined with Metal for general-purpose GPU processing on iOS 8 devices (e.g. iPhone and iPad). This posting extends those by showing how to use shared memory between GPU and CPU with Swift and Metal.

The primary benefits of sharing memory between GPU and CPU are

  1. much less time copying data back and forth, and
  2. less overall memory resource use (no duplication needed).

Note that sharing memory between CPU and GPU only works when the amount of data (e.g. an array) is at least 16 kilobytes (0x4000 = 2^14 = 16384 bytes, i.e. 4096 4-byte Float elements in an array).
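
The size arithmetic can be checked with a few lines of current Swift. This is a minimal sketch, not code from the repository: the helper name paddedByteCount is hypothetical, and the 0x4000 alignment is the value assumed in this post.

```swift
// Hypothetical helper: bytes needed for `floatCount` Floats, rounded up to a
// multiple of `alignment` (0x4000 = 16384 bytes, the value assumed in this post).
func paddedByteCount(floatCount: Int, alignment: Int = 0x4000) -> Int {
    let rawBytes = floatCount * MemoryLayout<Float>.size   // 4 bytes per Float
    // Round up to the next multiple of `alignment`.
    return (rawBytes + alignment - 1) / alignment * alignment
}

print(paddedByteCount(floatCount: 4096))  // 16384: exactly one 16 KB block
print(paddedByteCount(floatCount: 4097))  // 32768: spills into a second block
print(paddedByteCount(floatCount: 100))   // 16384: padded up to one block
```

Padding smaller arrays up to a full block is one possible way to stay above the 16 KB threshold; the kernel then has to ignore the unused tail.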

All Swift/Metal and Accelerate code and benchmarks can be found in the GitHub repo:

https://github.com/atveit/SwiftMetalGPUParallelProcessing

2. Shared Memory between GPU and CPU with Swift and Metal

2.1 Allocate Memory for use by both GPU and CPU

var xvector:UnsafeMutablePointer<Void> = nil
var alignment:UInt = 0x4000 
var xvectorByteSize:UInt = UInt(maxcount)*UInt(sizeof(Float))
// actual allocation with alignment
posix_memalign(&xvector, alignment, xvectorByteSize)
            
// similar as for xvector
var yvector:UnsafeMutablePointer<Void> = nil
posix_memalign(&yvector, alignment, xvectorByteSize)

2.2 Fill xvector with data

// pointer handling and casting in Swift..
var xvectorVoidPtr = COpaquePointer(xvector)
var xvectorFloatPtr = UnsafeMutablePointer<Float>(xvectorVoidPtr)
var xvectorFloatBufferPtr = UnsafeMutableBufferPointer<Float>(start: xvectorFloatPtr, count: maxcount)
            
// fill xvector with data
for index in xvectorFloatBufferPtr.startIndex..<xvectorFloatBufferPtr.endIndex {
  xvectorFloatBufferPtr[index] = Float(index)
}
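
The snippets in 2.1 and 2.2 use the Swift syntax of the iOS 8 era. As a rough sketch of the same allocate-and-fill steps in current Swift (an assumption about how one might port it, not the repository's code; maxcount = 4096 is an illustrative value):

```swift
import Foundation

let maxcount = 4096                       // assumed element count for this sketch
let alignment = 0x4000                    // 16 KB, as in section 2.1
let byteSize = maxcount * MemoryLayout<Float>.size

// Allocate aligned memory; posix_memalign returns 0 on success.
var raw: UnsafeMutableRawPointer? = nil
let status = posix_memalign(&raw, alignment, byteSize)
guard status == 0, let base = raw else {
    fatalError("posix_memalign failed with code \(status)")
}

// View the raw bytes as Floats and fill them with 0, 1, 2, ...
let floats = UnsafeMutableBufferPointer(
    start: base.bindMemory(to: Float.self, capacity: maxcount),
    count: maxcount)
for index in floats.indices {
    floats[index] = Float(index)
}

// Check the address alignment and keep two values for inspection.
let isAligned = Int(bitPattern: base) % alignment == 0
let firstValue = floats[0]
let lastValue = floats[maxcount - 1]
print(isAligned, firstValue, lastValue)

free(base)  // memory from posix_memalign must be released with free()
```

Checking the return code matters here because a failed allocation would otherwise only show up later as a crash when the pointer is used.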

2.3 Make xvector and yvector available to the GPU

Use newBufferWithBytesNoCopy() to create a buffer that wraps the memory already allocated.

var xvectorBufferNoCopy = device.newBufferWithBytesNoCopy(xvector, length: Int(xvectorByteSize), 
            options: nil, deallocator: nil)
computeCommandEncoder.setBuffer(xvectorBufferNoCopy, offset: 0, atIndex: 0)

var yvectorBufferNoCopy = device.newBufferWithBytesNoCopy(yvector, length: Int(xvectorByteSize), 
            options: nil, deallocator: nil)
computeCommandEncoder.setBuffer(yvectorBufferNoCopy, offset: 0, atIndex: 1)

2.4 Do the GPU processing

var threadsPerGroup = MTLSize(width:32,height:1,depth:1)
var numThreadgroups = MTLSize(width:(Int(maxcount)+31)/32, height:1, depth:1)
computeCommandEncoder.dispatchThreadgroups(numThreadgroups, threadsPerThreadgroup: threadsPerGroup)
computeCommandEncoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
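
The expression (Int(maxcount)+31)/32 above is a ceiling division: it rounds the number of threadgroups up so that every element gets a thread. A small sketch in current Swift (the helper name threadgroupCount and the element count 4100 are illustrative assumptions) shows the effect when the count is not a multiple of 32:

```swift
// Hypothetical helper: ceiling division, as in (maxcount + 31) / 32.
func threadgroupCount(elements: Int, threadsPerGroup: Int) -> Int {
    return (elements + threadsPerGroup - 1) / threadsPerGroup
}

let elements = 4100                        // deliberately not a multiple of 32
let groups = threadgroupCount(elements: elements, threadsPerGroup: 32)
let launched = groups * 32                 // threads actually started
let surplus = launched - elements          // threads with no element to process
print(groups, launched, surplus)
```

When surplus is greater than zero, the compute kernel would typically need a bounds check on its thread index; with a count such as 4096 the division is exact and no thread is left over.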

2.5 Access results in yvector

var yvectorVoidPtr = COpaquePointer(yvector)
var yvectorFloatPtr = UnsafeMutablePointer<Float>(yvectorVoidPtr)
var yvectorFloatBufferPtr = UnsafeMutableBufferPointer<Float>(start: yvectorFloatPtr, count: maxcount)

// print the first result to console!
println(yvectorFloatPtr.memory)

// iterate through results .. similar to 2.2..
// ..

3. Conclusion

This post has given a simple example of how to share memory between GPU and CPU with Swift and Metal. Remember to manually release the memory with free(xvector) and free(yvector) afterwards, since it was allocated with a C function, posix_memalign(). All code is available on GitHub:

https://github.com/atveit/SwiftMetalGPUParallelProcessing

Best regards,
Amund Tveit
Memkite Team


About Amund Tveit (@atveit - amund@memkite.com)

Amund Tveit works in Memkite on developing large-scale Deep Learning and Search (Convolutional Neural Network) with Swift and Metal for iOS (see deeplearning.education for a Memkite app video demo). He also maintains the deeplearning.university bibliography (github.com/memkite/DeepLearningBibliography)

Amund previously co-founded Atbrox, a cloud computing/big data service company (partner with Amazon Web Services), also doing some “sweat equity” startup investments in US and Nordic startups. His presentations about Hadoop/MapReduce Algorithms and Search were among the top 3% of all SlideShare presentations in 2013, and his blog posts have been frequently quoted by Big Data Industry Leaders and featured on front pages of YCombinator News and Reddit Programming.

He previously worked for Google, where he was tech lead for Google News for iPhone (mentioned as “Google News Now Looks Beautiful On Your iPhone” on Mashable.com), led a team measuring and improving Google Services in the Scandinavian Countries (Maps and Search) and worked as a software engineer on infrastructure projects. Other work experience includes telecom (IBM Canada) and insurance/finance (Storebrand).

Amund has a PhD in Computer Science. His publications have been cited more than 500 times. He also holds 4 US patents in the areas of search and advertisement technology, and a pending US patent in the area of brain-controlled search with consumer-level EEG devices.

Amund enjoys coding, in particular Python, C++ and Swift (iOS)
