Is it possible to use an OpenCL data-parallel kernel to sum a vector of size n, without doing the partial-sum trick?

Say I have access to 16 work items and a vector of size 16. Wouldn't it be possible to have a kernel doing the following:

```c
__kernel void summation(__global float* input, __global float* sum)
{
    int idx = get_global_id(0);
    sum[0] += input[idx];
}
```

When I've tried this, the sum variable doesn't get updated, only overwritten. I've read something about using barriers, and I tried inserting a barrier before the summation above. It does update the variable somehow, but it doesn't reproduce the correct sum.
Let me try to explain why sum[0] is overwritten rather than updated.
In the case of 16 work items, there are 16 threads running simultaneously. sum[0] is a single memory location shared by all of the threads, and the line sum[0] += input[idx] is run by each of the 16 threads, simultaneously.

Now the instruction sum[0] += input[idx] (I think) expands into three steps: it performs a read of sum[0], adds input[idx] to it, and then writes the result back to sum[0].
There will therefore be a data race, with multiple threads reading and writing the same shared memory location. What might happen is:

- All of the threads may read the value of sum[0] before any other thread writes its updated result to sum[0], in which case the final value of sum[0] is the value of input[idx] of whichever thread executed slowest. Since this will be different each time, if you run the example multiple times you should see different results.
- Or, one thread may execute slightly more slowly than the others, in which case another thread may already have written its updated result to sum[0] before the slow thread reads sum[0]. In that case the final result includes the values of more than one thread, but not of all the threads.
So how can you avoid this?
Option 1 - Atomics (the worse option):

You can use atomics to force each thread to block if another thread is performing an operation on the shared memory location, but this results in a loss of performance, since you are making a parallel process serial (while still incurring the costs of parallelisation, such as moving memory between the host and device and creating the threads).
Option 2 - Reduction (the better option):

The best solution would be to reduce the array, since this uses the parallelism effectively and gives O(log(n)) performance. Here is an overview of reduction using OpenCL: reduction example.