GPGPU Performance of Swift/Metal vs Accelerate on iPhone 6 & 5S, iPad Air and iPad Mini

1. Background

In a prior posting I presented how to use Swift combined with Metal for general-purpose GPU (GPGPU) processing. This posting presents related benchmarks, compared against the Accelerate framework, on iPhone 5S, iPhone 6, iPad Air and iPad Mini. All Swift/Metal and Accelerate code and benchmarks can be found in this GitHub repo:

https://github.com/atveit/SwiftMetalGPUParallelProcessing

2. GPGPU Performance on Swift/Metal vs Accelerate

Metal is an alternative to OpenGL for graphics processing, and for general data-parallel GPU programming it is an alternative to OpenCL and CUDA. The benchmarks compare 1) Metal+Swift with 2) the Accelerate framework (highly optimized numerical routines such as BLAS), where both approaches calculate the sigmoid function. The sigmoid function occurs frequently in machine learning, e.g. in Deep Learning and in Kernel Methods/Support Vector Machines.

Performance data for curves below can be found in: https://github.com/atveit/SwiftMetalGPUParallelProcessing/tree/master/simplebenchmark/

2.1 Swift+Metal Throughput – number of sigmoid vector element calculations per second

Observation: iPhone 6 (with the A8 CPU/GPU) has almost twice the throughput of A7-based devices for larger arrays (of length 2^21 to 2^22).
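Throughput here means sigmoid element calculations per second. The repo's own benchmark code may measure this differently; the following is only a minimal timing sketch, and the helper names `sigmoidCPU` and `measureThroughput` are hypothetical. It times the pure Swift loop from section 3.3 and reports the best of several runs per array length.

```swift
import Foundation

// Workload: the pure-Swift (CPU) sigmoid, one scalar evaluation per element.
func sigmoidCPU(_ x: [Float]) -> [Float] {
    var out = [Float](repeating: 0, count: x.count)
    for i in 0..<x.count {
        out[i] = 1.0 / (1.0 + exp(-x[i]))
    }
    return out
}

// Hypothetical helper: returns sigmoid element calculations per second,
// using the fastest of `repeats` timed runs over an array of `length` floats.
func measureThroughput(length: Int, repeats: Int = 5,
                       _ body: ([Float]) -> [Float]) -> Double {
    // Deterministic inputs in [-5, 5); the exact values barely matter for timing.
    let input = (0..<length).map { Float($0 % 1000) / 100 - 5 }
    var best = Double.infinity
    for _ in 0..<repeats {
        let start = Date()
        let result = body(input)
        let elapsed = Date().timeIntervalSince(start)
        precondition(result.count == length)
        best = min(best, elapsed)
    }
    return Double(length) / max(best, 1e-9)
}

// Power-of-two array lengths, like the sizes used in the benchmarks.
for power in [16, 18, 20] {
    let perSecond = measureThroughput(length: 1 << power, sigmoidCPU)
    print("2^\(power) elements: \(Int(perSecond)) sigmoid evaluations per second")
}
```

Dividing two such throughput figures (e.g. Metal vs Accelerate, measured the same way) gives a speedup factor of the kind reported in section 2.2.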

2.2 Swift+Metal compared to Accelerate-based approach

Observation: On all devices, Swift+Metal is 2 to 20 times faster than Accelerate, and the speedup increases with the amount of data.

3. Swift Code for Sigmoid with Metal and Accelerate

See https://github.com/atveit/SwiftMetalGPUParallelProcessing/blob/master/SwiftMetalProcessing/ViewController.swift for the primary code.

3.1 – Sigmoid function in Metal Shading Language

kernel void sigmoid(const device float *inVector [[ buffer(0) ]],
                    device float *outVector [[ buffer(1) ]],
                    uint id [[ thread_position_in_grid ]]) {
    outVector[id] = 1.0 / (1.0 + exp(-inVector[id]));
}
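The kernel above writes outVector[id] for every thread, so it relies on the grid size matching the array length exactly. The following is a hedged sketch of the host-side Swift code that compiles and dispatches such a kernel. It is not the code from ViewController.swift: the `count` guard in the kernel source, and the names `sigmoidGPU`, `kernelSource` and `SigmoidError`, are assumptions of this sketch. It falls back to a plain Swift loop when no Metal device is available.

```swift
import Foundation
#if canImport(Metal)
import Metal
#endif

enum SigmoidError: Error { case setupFailed(String) }

// Same computation as the kernel above plus a `count` guard (an assumption of
// this sketch) so the grid can be rounded up to whole threadgroups.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;
kernel void sigmoid(const device float *inVector [[ buffer(0) ]],
                    device float *outVector [[ buffer(1) ]],
                    constant uint &count [[ buffer(2) ]],
                    uint id [[ thread_position_in_grid ]]) {
    if (id >= count) return;
    outVector[id] = 1.0f / (1.0f + exp(-inVector[id]));
}
"""

// Hypothetical helper: runs the kernel on the default GPU, or falls back to a
// CPU loop when no Metal device exists (e.g. Linux or a headless machine).
func sigmoidGPU(_ input: [Float]) throws -> [Float] {
    #if canImport(Metal)
    if !input.isEmpty, let device = MTLCreateSystemDefaultDevice(),
       let queue = device.makeCommandQueue() {
        let library = try device.makeLibrary(source: kernelSource, options: nil)
        guard let function = library.makeFunction(name: "sigmoid") else {
            throw SigmoidError.setupFailed("kernel function not found")
        }
        let pipeline = try device.makeComputePipelineState(function: function)
        let byteCount = input.count * MemoryLayout<Float>.stride
        guard let inBuffer = device.makeBuffer(bytes: input, length: byteCount, options: []),
              let outBuffer = device.makeBuffer(length: byteCount, options: []),
              let commandBuffer = queue.makeCommandBuffer(),
              let encoder = commandBuffer.makeComputeCommandEncoder() else {
            throw SigmoidError.setupFailed("could not allocate buffers or encoder")
        }
        var count = UInt32(input.count)
        encoder.setComputePipelineState(pipeline)
        encoder.setBuffer(inBuffer, offset: 0, index: 0)   // inVector
        encoder.setBuffer(outBuffer, offset: 0, index: 1)  // outVector
        encoder.setBytes(&count, length: MemoryLayout<UInt32>.stride, index: 2)
        // One thread per element, rounded up to whole threadgroups.
        let width = pipeline.threadExecutionWidth
        let groups = (input.count + width - 1) / width
        encoder.dispatchThreadgroups(MTLSize(width: groups, height: 1, depth: 1),
                                     threadsPerThreadgroup: MTLSize(width: width, height: 1, depth: 1))
        encoder.endEncoding()
        commandBuffer.commit()
        commandBuffer.waitUntilCompleted()
        let out = outBuffer.contents().bindMemory(to: Float.self, capacity: input.count)
        return Array(UnsafeBufferPointer(start: out, count: input.count))
    }
    #endif
    return input.map { 1.0 / (1.0 + exp(-$0)) }
}

let probabilities = try sigmoidGPU([-2, 0, 2])
print(probabilities)
```

Compiling the library, pipeline and buffers on every call is done here only for brevity. A benchmark would create them once and reuse them, so that only the dispatch is timed.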

 

3.2 – Sigmoid function using Accelerate and Swift

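// Assumes mynegativeVector holds -x and oneVec is refilled with 1.0 before each call
vvexpf(&expMinusX, &mynegativeVector, &localcount)                 // expMinusX = exp(-x)
cblas_saxpy(Int32(oneVec.count), 1.0, &expMinusX, 1, &oneVec, 1)   // oneVec = 1 + exp(-x)
vvpowf(&finalResultVector, &negOneVec, &oneVec, &localcount)       // result = oneVec^(-1), negOneVec is all -1.0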
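The fragment above depends on arrays prepared elsewhere. A self-contained wrapper around the same three calls might look like the following sketch. The function name `sigmoidAccelerate` and the explicit negation step are assumptions, and a plain Swift fallback keeps it runnable where Accelerate is not available. Because `cblas_saxpy` computes y = alpha * x + y in place, `oneVec` must hold 1.0 again before every call.

```swift
import Foundation
#if canImport(Accelerate)
import Accelerate
#endif

// Hypothetical wrapper around the three Accelerate calls shown above.
func sigmoidAccelerate(_ x: [Float]) -> [Float] {
    #if canImport(Accelerate)
    var localcount = Int32(x.count)
    var mynegativeVector = x.map { -$0 }                   // -x (assumed precomputed in the post)
    var expMinusX = [Float](repeating: 0, count: x.count)
    var oneVec = [Float](repeating: 1, count: x.count)     // saxpy accumulates into this
    var negOneVec = [Float](repeating: -1, count: x.count)
    var finalResultVector = [Float](repeating: 0, count: x.count)
    vvexpf(&expMinusX, &mynegativeVector, &localcount)             // exp(-x)
    cblas_saxpy(localcount, 1.0, &expMinusX, 1, &oneVec, 1)        // oneVec = 1 + exp(-x)
    vvpowf(&finalResultVector, &negOneVec, &oneVec, &localcount)   // (1 + exp(-x))^(-1)
    return finalResultVector
    #else
    // Plain Swift fallback where Accelerate is unavailable (e.g. Linux).
    return x.map { 1.0 / (1.0 + exp(-$0)) }
    #endif
}

let probabilities = sigmoidAccelerate([-2, 0, 2])
print(probabilities)
```

Each of the three calls makes a full pass over the arrays. The reader comment below discusses why that can hurt cache behavior on large inputs.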

 

3.3 – Sigmoid function implemented in (pure) Swift

// CPU baseline: one scalar sigmoid per element (myvector is the input, fra the output)
for j in 0..<ccount {
    fra[j] = 1.0 / (1.0 + exp(-myvector[j]))
}

4. Conclusion

Metal combined with Swift can give a lot of bang for the buck in performance, also for non-graphical processing. Once it is set up, it is easier to use than Accelerate and can give a significant speedup (up to 20 times in this example). Compared with CPU-based processing (i.e. pure Swift code), the performance gains can be even higher (up to 75 times faster on iPhone 6).

Best regards,
Amund Tveit (amund@memkite.com)
Memkite Team

About Amund Tveit (@atveit - amund@memkite.com)

Amund Tveit works at Memkite on developing large-scale Deep Learning and Search (Convolutional Neural Networks) with Swift and Metal for iOS (see deeplearning.education for a Memkite app video demo). He also maintains the deeplearning.university bibliography (github.com/memkite/DeepLearningBibliography).

Amund previously co-founded Atbrox, a cloud computing/big data service company (partner with Amazon Web Services), and has made some "sweat equity" investments in US and Nordic startups. His presentations about Hadoop/MapReduce Algorithms and Search were among the top 3% of all SlideShare presentations in 2013, and his blog posts have been frequently quoted by big data industry leaders and featured on the front pages of YCombinator News and Reddit Programming.

He previously worked for Google, where he was tech lead for Google News for iPhone (mentioned as “Google News Now Looks Beautiful On Your iPhone” on Mashable.com), led a team measuring and improving Google services in the Scandinavian countries (Maps and Search) and worked as a software engineer on infrastructure projects. Other work experience includes telecom (IBM Canada) and insurance/finance (Storebrand).

Amund has a PhD in Computer Science. His publications have been cited more than 500 times. He also holds 4 US patents in the areas of search and advertisement technology, and a pending US patent in the area of brain-controlled search with consumer-level EEG devices.

Amund enjoys coding, in particular Python, C++ and Swift (iOS).

2 Responses to "GPGPU Performance of Swift/Metal vs Accelerate on iPhone 6 & 5S, iPad Air and iPad Mini"

  • Nevin Brackett-Rozinsky
    July 23, 2015 - 7:28 pm

    It looks like your Swift+Accelerate implementation will blow through the cache twice on large arrays. Have you tried processing batches of, say, 1024 elements at a time? I’m not sure how easy that is to do in Swift, but with C-style pointer arithmetic it is trivial.

    Also, you could use vvrecf to take the reciprocal, rather than vvpowf.

    And for that matter, did you check whether vDSP_vsadd is faster than cblas_saxpy?

    So for example, you might have something like this, only Swiftier (or in a wrapper):

    const int batchSize = 1024;
    const int r = n % batchSize;
    const float *endingPoint = startingPoint + n - r;
    for (float *p = startingPoint; p < endingPoint; p += batchSize) {
        vvexpf(…); // batchSize elements of p
        vDSP_vsadd(…); // batchSize elements of p
        vvrecf(…); // batchSize elements of p
    }
    // process final r < 1024 elements of p
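A Swift version of the batched approach suggested in this comment might look like the following sketch. It uses Accelerate's `vDSP_vneg`, `vvexpf`, `vDSP_vsadd` and `vvrecf`. The function name `sigmoidBatched` and the batch-sized scratch buffer are illustrative choices, not code from the post or the repo. The scratch buffer ensures no call reads and writes the same memory, and the final shorter batch is handled inside the same loop. Whether batching actually beats the unbatched version is something to measure, not assume.

```swift
import Foundation
#if canImport(Accelerate)
import Accelerate
#endif

// Hypothetical helper: sigmoid in chunks of `batchSize` elements so each chunk
// stays in cache across the four passes. 1024 is the comment's example value.
func sigmoidBatched(_ x: [Float], batchSize: Int = 1024) -> [Float] {
    precondition(batchSize > 0, "batchSize must be positive")
    var out = [Float](repeating: 0, count: x.count)
    #if canImport(Accelerate)
    var scratch = [Float](repeating: 0, count: min(batchSize, x.count))
    var one: Float = 1
    x.withUnsafeBufferPointer { src in
        out.withUnsafeMutableBufferPointer { dst in
            scratch.withUnsafeMutableBufferPointer { tmp in
                guard let s = src.baseAddress, let d = dst.baseAddress,
                      let t = tmp.baseAddress else { return }
                var start = 0
                while start < x.count {
                    // The final batch may be shorter than batchSize.
                    var len = Int32(min(batchSize, x.count - start))
                    let n = vDSP_Length(len)
                    vDSP_vneg(s + start, 1, t, 1, n)         // t = -x
                    vvexpf(d + start, t, &len)                // d = exp(-x)
                    vDSP_vsadd(d + start, 1, &one, t, 1, n)   // t = 1 + exp(-x)
                    vvrecf(d + start, t, &len)                // d = 1 / (1 + exp(-x))
                    start += batchSize
                }
            }
        }
    }
    #else
    // Plain Swift fallback where Accelerate is unavailable (e.g. Linux).
    for i in 0..<x.count { out[i] = 1.0 / (1.0 + exp(-x[i])) }
    #endif
    return out
}
```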

    And finally, it may be worth mentioning that 1/(1+exp(-x)) == exp(x)/(1+exp(x)), so the negation can be eliminated at the cost of replacing a reciprocal with a division (and needing to store a second array). Testing should reveal which is faster.
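A quick numerical check of this identity in Swift is sketched below; the function names `sigmoidNeg` and `sigmoidPos` are hypothetical. Besides speed, the two forms differ in range when using Float. exp(x) overflows to infinity once x exceeds about 88.72 (the natural log of Float's largest finite value), so exp(x)/(1+exp(x)) produces NaN there, while 1/(1+exp(-x)) produces 1.

```swift
import Foundation

// The two algebraically equivalent forms from the comment above
// (function names are hypothetical).
func sigmoidNeg(_ x: Float) -> Float { 1 / (1 + exp(-x)) }   // 1 / (1 + exp(-x))
func sigmoidPos(_ x: Float) -> Float {                        // exp(x) / (1 + exp(x))
    let e = exp(x)
    return e / (1 + e)
}

// The forms agree closely over moderate inputs.
for x: Float in [-20, -1, 0, 1, 20] {
    print(x, sigmoidNeg(x), sigmoidPos(x))
}
// For large x, exp(x) overflows to +inf and the second form becomes inf / inf = NaN.
print(sigmoidPos(100).isNaN, sigmoidNeg(100))
```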
