Memkite – Deep Learning for iOS (tested on iPhone 6S), tvOS and OS X developed in Metal and Swift

Last week we purchased the new iPhone 6S and had high expectations of its GPU performance. One reason for our expectations was a blog post by Simon Gladman, where he wrote that the iPhone 6S had 3 times the GPU performance of the iPhone 6; this was also reported by TheNextWeb.

In our GPU programming case (developing Deep Learning algorithms with Metal), going from iPhone 5S to iPhone 6S gave an order of magnitude improvement in performance! Calculation time to run through a 20 layer deep convolutional neural network model for image recognition went from approximately 2 seconds to less than 100 milliseconds. Note that 100 milliseconds, or in other words 0.1 seconds, is what Jakob Nielsen stated is one of 3 important response times: the limit at which a user feels that a system reacts instantaneously.

This blog post gives a brief overview of Memkite (where we are and where we are going), a Deep Learning Kit for iOS, OS X and tvOS. It is developed in Metal in order to make efficient use of the GPU, and in Swift for setting up Metal as well as loading data and integrating with apps.

1. Memkite – GPU Accelerated Deep Learning for Apple’s iOS, tvOS and OS X with Metal and Swift

Memkite currently implements Convolutional Neural Networks in Metal (parallelized for the GPU). The deep learning layer operators implemented so far include convolution, pooling and ReLU layers.
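To make the three layer operators concrete, here is a minimal CPU reference sketch in plain Swift. It is not Memkite's Metal code: the `Image` type and the function names are hypothetical, and it assumes a single-channel image, stride 1, no padding and non-overlapping pooling windows. A GPU version would compute each output element in its own thread instead of in nested loops.

```swift
// Hypothetical single-channel image stored row-major in a flat array.
struct Image {
    let width: Int
    let height: Int
    var pixels: [Float]   // count == width * height

    subscript(x: Int, y: Int) -> Float { pixels[y * width + x] }
}

/// "Valid" 2D convolution (cross-correlation, as in most CNN frameworks):
/// stride 1, no padding, so the output shrinks by kernel size minus one.
func convolve(_ input: Image, kernel: Image) -> Image {
    let outW = input.width - kernel.width + 1
    let outH = input.height - kernel.height + 1
    var out = [Float](repeating: 0, count: outW * outH)
    for y in 0..<outH {
        for x in 0..<outW {
            var sum: Float = 0
            for ky in 0..<kernel.height {
                for kx in 0..<kernel.width {
                    sum += input[x + kx, y + ky] * kernel[kx, ky]
                }
            }
            out[y * outW + x] = sum
        }
    }
    return Image(width: outW, height: outH, pixels: out)
}

/// ReLU: replaces every negative activation with zero.
func relu(_ input: Image) -> Image {
    Image(width: input.width, height: input.height,
          pixels: input.pixels.map { max(0, $0) })
}

/// Max pooling over non-overlapping size x size windows.
func maxPool(_ input: Image, size: Int) -> Image {
    let outW = input.width / size
    let outH = input.height / size
    var out = [Float](repeating: 0, count: outW * outH)
    for y in 0..<outH {
        for x in 0..<outW {
            var best = -Float.greatestFiniteMagnitude
            for dy in 0..<size {
                for dx in 0..<size {
                    best = max(best, input[x * size + dx, y * size + dy])
                }
            }
            out[y * outW + x] = best
        }
    }
    return Image(width: outW, height: outH, pixels: out)
}
```

A typical CNN stage chains these as `maxPool(relu(convolve(image, kernel: k)), size: 2)`.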

On OS X, Memkite can easily be adapted to utilize several GPUs if present, e.g. to run the same deep learning model on several GPUs to increase throughput, or to run different models in order to increase the number of classes to predict over.

import Metal
let GPUs = MTLCopyAllDevices()  // every Metal-capable GPU in the machine (OS X only)

gave the following on a 2012 Retina MacBook Pro:

[Screenshot: the list of GPU devices returned by MTLCopyAllDevices()]

An interesting feature on iOS (and most likely on tvOS, but not yet tested in our case) is that one can share memory between GPU and CPU (less copying of data).
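As a rough illustration of what sharing memory looks like in practice, the sketch below allocates a Metal buffer in shared storage mode, writes to it from the CPU and reads it back through the same pointer. It uses the current Swift spellings of the Metal API rather than the 2015 ones, the function name is hypothetical, and it is not taken from Memkite. On platforms without Metal or without a GPU it simply returns nil.

```swift
#if canImport(Metal)
import Metal
#endif

/// Writes `values` into a buffer that both CPU and GPU can address, then reads
/// them back through the same memory. Returns nil if Metal or a GPU is unavailable.
func roundTripThroughSharedBuffer(_ values: [Float]) -> [Float]? {
    #if canImport(Metal)
    guard !values.isEmpty,
          let device = MTLCreateSystemDefaultDevice(),
          let buffer = device.makeBuffer(length: values.count * MemoryLayout<Float>.stride,
                                         options: .storageModeShared) else { return nil }
    // contents() exposes the buffer's memory directly to the CPU.
    let pointer = buffer.contents().bindMemory(to: Float.self, capacity: values.count)
    for (index, value) in values.enumerated() {
        pointer[index] = value
    }
    // A compute kernel bound to `buffer` would read these values without an
    // explicit upload step; here we only read them back on the CPU.
    return Array(UnsafeBufferPointer(start: pointer, count: values.count))
    #else
    return nil
    #endif
}
```

The point of the design is that input images and model weights can be written once into such a buffer and then handed to a compute kernel, rather than being copied into separate GPU memory first.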

2. App Store for Deep Learning Models

Given the immense asymmetry in time taken to train a Deep Learning Model versus time needed to use it (e.g. to do image recognition), it makes perfect sense to build a large repository of pre-trained models that can be (re)used many times. Since there are several popular tools used to train Deep Learning models (e.g. Caffe, Torch, Theano, DeepLearning4J, PyLearn and Nervana), we’re working on supporting the import of models pre-trained in those tools into an “app store” for deep learning models (so far we’ve primarily been working with Caffe CNN models).

[Screenshot: tweet about the energy required to train a Deep Network]

The tweet above illustrates how much energy is required to train a Deep Network (per night). Some Deep Learning Models can take weeks of training on GPUs like the Nvidia Titan X, or in other words, piles of wood's worth of energy. Using a model is quite different, since it requires less energy than lighting a match.


Deep Learning Models also typically have a (low) limit on the number of classes they can predict per model (e.g. the ImageNet competition has 1000 classes, CIFAR-100 has 100 classes and CIFAR-10 has 10 classes). This means that in order to create real-life applications one needs to switch intelligently between several Deep Learning Models (loading them very rapidly from SSD into GPU-accessible RAM), or, if there is enough capacity, run several models in parallel on the same GPU. Selecting an appropriate Deep Learning model (i.e. the one most likely to work well in a given context) is to our knowledge not a well-studied field of research. In some ways it resembles the meta or universal search problem found in web search (e.g. cross-model ranking), but latency plays an even bigger part in the mobile on-device case, since there is no time to run many models.
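One simple way to picture the latency constraint is a greedy selection under a time budget. The sketch below is purely illustrative and not how Memkite selects models: the `ModelCandidate` type, its relevance score and its latency figures are all hypothetical, and a real system would need to estimate both from context and measurement.

```swift
// Hypothetical description of a pre-trained model that could be run on-device.
struct ModelCandidate {
    let name: String
    let relevance: Double   // assumed prior that this model suits the current context
    let latencyMs: Double   // assumed time for one forward pass on this device
}

/// Greedily picks the most relevant models whose total latency fits the budget.
func selectModels(from candidates: [ModelCandidate], budgetMs: Double) -> [ModelCandidate] {
    var remaining = budgetMs
    var chosen: [ModelCandidate] = []
    // Consider the most relevant models first.
    for candidate in candidates.sorted(by: { $0.relevance > $1.relevance }) {
        if candidate.latencyMs <= remaining {
            chosen.append(candidate)
            remaining -= candidate.latencyMs
        }
    }
    return chosen
}
```

With a 100 millisecond budget, a model that takes 60 ms leaves room for at most one or two smaller models, which is why ranking the candidates well matters more on-device than on a server.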

With state-of-the-art compression techniques for Convolutional Neural Networks, the (groundbreaking) AlexNet model from 2012 can be compressed from 240MB to 6.9MB. This means that one could theoretically fit more than eighteen thousand AlexNet models on a 128 GB mobile device like the iPhone 6!
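The arithmetic behind that estimate can be checked in a few lines. This assumes decimal units (1 GB = 1000 MB) and that the entire 128 GB is available for models, which ignores the space taken by the operating system and apps.

```swift
let originalModelMB = 240.0
let compressedModelMB = 6.9
let deviceStorageMB = 128.0 * 1000.0   // 128 GB in decimal megabytes

// How much smaller the compressed model is (roughly 35 times).
let compressionRatio = originalModelMB / compressedModelMB

// Whole models that fit on the device, before and after compression.
let uncompressedModelsThatFit = Int(deviceStorageMB / originalModelMB)    // 533
let compressedModelsThatFit = Int(deviceStorageMB / compressedModelMB)    // 18550

print(compressionRatio, uncompressedModelsThatFit, compressedModelsThatFit)
```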


Deep Learning on iOS, tvOS and OS X devices is still in its infancy, and we hope to play a part in it. We look forward to testing Memkite Deep Learning on the forthcoming iPad Pro (it has a very powerful GPU!).

Best regards,

Torbjørn Morland and Amund Tveit (Memkite Team)


About Torbjørn Morland

Torbjørn works at Memkite on developing Deep Learning (Convolutional Neural Networks) with Swift and Metal for iOS. See this video for how he integrated on-device Deep Learning and Search.

Torbjørn came to Memkite from Facebook, where he worked as a software engineer on mobile infrastructure, specifically for Facebook Home, Photo Sync for Android, and Android Product Infrastructure, as well as Storage backend for Messaging.

He has a background in algorithms, information retrieval and big data from his studies at NTNU, where he attended an MSc program in computer science, specializing in complex computer systems. He elected to join Facebook before finishing the degree, thus filling Memkite's college dropout slot.

Torbjørn enjoys coding, in particular Python, Java (Android), C++, Objective-C and Swift (iOS).

See also Torbjørn's LinkedIn Profile.
