
Data-Parallel Programming with Metal and Swift for iPhone/iPad GPU

Apple describes Metal as: “Metal provides the lowest-overhead access to the GPU, enabling you to maximize the graphics and compute potential of your iOS 8 app. With a streamlined API, precompiled shaders, and support for efficient multi-threading, Metal can take your game or graphics app to the next level of performance and capability.” (source: https://developer.apple.com/metal/)

1. Example of Metal and Swift for calculating the Sigmoid Function

For graphics processing, Metal is an alternative to OpenGL; for general data-parallel programming on the GPU, it is an alternative to OpenCL and CUDA. This (simple) example shows how to use Metal with Swift to calculate the sigmoid function, which occurs frequently in machine learning settings, e.g. in deep learning and kernel methods/support vector machines.
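For reference, the sigmoid function is defined as σ(x) = 1 / (1 + e^(-x)). A minimal CPU-side Swift sketch of it (just something to compare the GPU results against later; the name sigmoidCPU is mine, not part of any API) could look like this:

import Darwin

// CPU reference implementation of the sigmoid function,
// handy for sanity-checking the GPU results later on
func sigmoidCPU(x: Float) -> Float {
    return 1.0 / (1.0 + expf(-x))
}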

If you want to read up on Metal, I recommend having a look at https://developer.apple.com/metal/ (Metal Programming Guide, Metal Shading Language and Metal Framework Reference).

2. Metal and Swift Code for calculating Sigmoid

I have divided the code description into 8 parts below:

  1. Function for setting up Metal
  2. Sigmoid function in Shaders.metal
  3. Prepare original input data – a Swift array
  4. Prepare Sigmoid function to run on GPU
  5. Create GPU input and output data
  6. Configure GPU threads
  7. Wrap up encoding setup and start processing, and wait until finished!
  8. Get result data out from GPU and into Swift

2.1 Function for setting up Metal

func initMetal() -> (MTLDevice, MTLCommandQueue, MTLLibrary, MTLCommandBuffer, 
                             MTLComputeCommandEncoder){
  // Get access to iPhone or iPad GPU
  var device = MTLCreateSystemDefaultDevice() 

  // Queue to handle an ordered list of command buffers
  var commandQueue = device.newCommandQueue() 

  // Access to Metal functions that are stored in Shaders.metal file, e.g. sigmoid()
  var defaultLibrary = device.newDefaultLibrary()

  // Buffer for storing encoded commands that are sent to GPU
  var commandBuffer = commandQueue.commandBuffer()

  // Encoder for GPU commands
  var computeCommandEncoder = commandBuffer.computeCommandEncoder()

  return (device, commandQueue, defaultLibrary!, commandBuffer, computeCommandEncoder)
}

2.2 Sigmoid function in Shaders.metal

kernel void sigmoid(const device float *inVector [[ buffer(0) ]],
                    device float *outVector [[ buffer(1) ]],
                    uint id [[ thread_position_in_grid ]]) {
    // This calculates sigmoid for _one_ position (=id) in a vector per call on the GPU
    outVector[id] = 1.0 / (1.0 + exp(-inVector[id]));
}

2.3 Prepare original input data – a Swift array

  var myvector = [Float](count: 123456, repeatedValue: 0)
  for (index, _) in enumerate(myvector) {
      myvector[index] = Float(index)
  }

2.4 Prepare Sigmoid function to run on GPU

// a. initialize Metal
var (device, commandQueue, defaultLibrary, commandBuffer, computeCommandEncoder) = initMetal()

// b. set up a compute pipeline with Sigmoid function and add it to encoder
let sigmoidProgram = defaultLibrary.newFunctionWithName("sigmoid")
var pipelineErrors = NSErrorPointer()
var computePipelineFilter = device.newComputePipelineStateWithFunction(sigmoidProgram!, error: pipelineErrors)
computeCommandEncoder.setComputePipelineState(computePipelineFilter!)
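A possible refinement (just a sketch, not part of the original code): replace the NSErrorPointer() lines above with an explicit NSError? variable and a nil check, so that e.g. a misspelled function name is reported instead of crashing on the forced unwrap:

var pipelineError: NSError?
let pipelineState = device.newComputePipelineStateWithFunction(sigmoidProgram!, error: &pipelineError)
if pipelineState == nil {
    println("Failed to create compute pipeline state: \(pipelineError)")
} else {
    computeCommandEncoder.setComputePipelineState(pipelineState!)
}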

2.5 Create GPU input and output data

// a. calculate byte length of input data - myvector
var myvectorByteLength = myvector.count*sizeofValue(myvector[0])

// b. create a MTLBuffer - input data that the GPU and Metal will use
var inVectorBuffer = device.newBufferWithBytes(&myvector, length: myvectorByteLength, options: nil)

// c. set the input vector for the sigmoid() function, i.e. inVector
//    atIndex: 0 here corresponds to buffer(0) in the Sigmoid function
computeCommandEncoder.setBuffer(inVectorBuffer, offset: 0, atIndex: 0)

// d. create the output vector for the sigmoid() function, i.e. outVector
//    atIndex: 1 here corresponds to buffer(1) in the Sigmoid function
var resultdata = [Float](count:myvector.count, repeatedValue: 0)
var outVectorBuffer = device.newBufferWithBytes(&resultdata, length: myvectorByteLength, options: nil)
computeCommandEncoder.setBuffer(outVectorBuffer, offset: 0, atIndex: 1)
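Since the sigmoid kernel overwrites every element of outVector, the output buffer does not actually need to be initialized from a Swift array. An alternative for step d. (a sketch, not from the original code) is to allocate an uninitialized buffer of the right byte length:

// alternative to d. above: allocate an uninitialized output buffer,
// since the sigmoid kernel writes every element anyway
var outVectorBuffer = device.newBufferWithLength(myvectorByteLength, options: nil)
computeCommandEncoder.setBuffer(outVectorBuffer, offset: 0, atIndex: 1)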

2.6 Configure GPU threads

// hardcoded to 32 for now (recommendation: read about threadExecutionWidth; see the sketch below)
var threadsPerGroup = MTLSize(width:32,height:1,depth:1)
var numThreadgroups = MTLSize(width:(myvector.count+31)/32, height:1, depth:1)
computeCommandEncoder.dispatchThreadgroups(numThreadgroups, threadsPerThreadgroup: threadsPerGroup)
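As hinted at in the comment above, the threadgroup width can also be derived from the compute pipeline state instead of being hardcoded. A sketch of an alternative to the three lines above (not from the original code), using threadExecutionWidth:

// use the pipeline's preferred thread execution width instead of a hardcoded 32
let width = computePipelineFilter!.threadExecutionWidth
var threadsPerGroup = MTLSize(width: width, height: 1, depth: 1)
var numThreadgroups = MTLSize(width: (myvector.count + width - 1) / width, height: 1, depth: 1)
computeCommandEncoder.dispatchThreadgroups(numThreadgroups, threadsPerThreadgroup: threadsPerGroup)

Note that since the number of threadgroups is rounded up, thread ids past the end of the vector can occur whenever myvector.count is not a multiple of the threadgroup width (here 123456 happens to be divisible by 32); in that case the kernel should also be given the vector length and guard against out-of-bounds writes.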

2.7 Wrap up encoding setup and start processing, and wait until finished!

computeCommandEncoder.endEncoding()
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
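waitUntilCompleted() blocks the calling thread until the GPU is done. A non-blocking alternative (a sketch, not from the original code) is to register a completion handler before committing the command buffer:

// non-blocking alternative: get notified when the GPU has finished
commandBuffer.addCompletedHandler { buffer in
    // runs on a background thread once the command buffer has executed
    println("sigmoid computation on GPU finished")
}
commandBuffer.commit()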

2.8 Get result data out from GPU and into Swift

// a. Get GPU data 
// outVectorBuffer.contents() returns an UnsafeMutablePointer, roughly equivalent to void* in C
var data = NSData(bytesNoCopy: outVectorBuffer.contents(), 
                  length: myvector.count*sizeof(Float), freeWhenDone: false)
// b. prepare Swift array large enough to receive data from GPU
var finalResultArray = [Float](count: myvector.count, repeatedValue: 0)

// c. get data from GPU into Swift array
data.getBytes(&finalResultArray, length:myvector.count * sizeof(Float))

// d. YOU'RE ALL SET!
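As an optional sanity check (a sketch, not from the original code, reusing the sigmoidCPU function sketched in section 1), the first few GPU results can be compared against a CPU calculation:

// compare a handful of GPU results with the CPU reference
for i in 0..<5 {
    println("index \(i): GPU = \(finalResultArray[i]), CPU = \(sigmoidCPU(myvector[i]))")
}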

Hope you find it useful.

Best regards,
Amund Tveit (amund@memkite.com – @atveit)
Memkite Team


About Amund Tveit (@atveit - amund@memkite.com)

Amund Tveit works at Memkite on developing large-scale Deep Learning and Search (Convolutional Neural Networks) with Swift and Metal for iOS (see deeplearning.education for a Memkite app video demo). He also maintains the deeplearning.university bibliography (github.com/memkite/DeepLearningBibliography).

Amund previously co-founded Atbrox, a cloud computing/big data service company (partner with Amazon Web Services), and has made some “sweat equity” investments in US and Nordic startups. His presentations about Hadoop/MapReduce Algorithms and Search were among the top 3% of all SlideShare presentations in 2013, and his blog posts have been frequently quoted by Big Data industry leaders and featured on the front pages of YCombinator News and Reddit Programming.

He previously worked for Google, where he was tech lead for Google News for iPhone (mentioned as “Google News Now Looks Beautiful On Your iPhone” on Mashable.com), led a team measuring and improving Google services in the Scandinavian countries (Maps and Search), and worked as a software engineer on infrastructure projects. Other work experience includes telecom (IBM Canada) and insurance/finance (Storebrand).

Amund has a PhD in Computer Science. His publications have been cited more than 500 times. He also holds 4 US patents in the areas of search and advertisement technology, and a pending US patent in the area of brain-controlled search with consumer-level EEG devices.

Amund enjoys coding, in particular in Python, C++ and Swift (iOS).
