B.R.,

Amund

Best,

Amund

**computePipelineState = try metalDevice.newComputePipelineStateWithDescriptor(computePipeLineDescriptor)**

of:

*cannot invoke ‘newComputePipelineStateWithDescriptor’ with an argument list of type ‘(MTLComputePipelineDescriptor)’*

Does this mean it wants more arguments (and if so, what would they be)? Or has the Metal API and/or Swift syntax changed since this was written?

Thanks,

Jim

Also, you could use vvrecf to take the reciprocal, rather than vvpowf.

And for that matter, did you check whether vDSP_vsadd is faster than cblas_saxpy?

So for example, you might have something like this, only Swiftier (or in a wrapper):

`const int batchSize = 1024;`

`const int r = n % batchSize;`

`const float *endingPoint = startingPoint + n - r;`

`for (float *p = startingPoint; p < endingPoint; p += batchSize) {`

`vvexpf(…); // batchSize elements of p`

`vDSP_vsadd(…); // batchSize elements of p`

`vvrecf(…); // batchSize elements of p`

`}`

`// process final r < 1024 elements of p`

And finally, it may be worth mentioning that 1/(1+exp(-x)) == exp(x)/(1+exp(x)), so the negation can be eliminated at the cost of replacing a reciprocal with a division (and needing to store a second array). Testing should reveal which is faster.
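For anyone who wants to check that identity, it follows from multiplying the numerator and denominator by exp(x):

$$
\frac{1}{1+e^{-x}} \;=\; \frac{1}{1+e^{-x}}\cdot\frac{e^{x}}{e^{x}} \;=\; \frac{e^{x}}{e^{x}+1}
$$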

Thanks for sharing a wonderful example. It really helps in understanding the basic building blocks of data-parallel programming with Metal. However, when I tried to run the example, it threw a GPU runtime exception saying that the kernel was accessing an invalid memory address. After I modified the kernel so that it only copies the input array into the output array, the program started working. My observation is that any arithmetic computation in the kernel triggers the exception; even a hard-coded assignment such as outputVector[id] = 5.0 does not work.

It would be a great help if you could suggest a solution to this problem.
