GPU or CPU

vindos

Newcomer
Hello guys...

I have an image processing algorithm where the output of each pixel is the sum of the products of every pixel in that row with a kernel.
Would it be possible to optimise this kind of problem using a GPU?

If the row has 1024 elements, I would have to read 1024 elements for each output pixel, multiply them with the kernel, and then take their sum.

Thank you
 
Unless there is an equivalent of an "FFT" transform for your kernel weights**, running it on a CPU vs. a GPU will require the same number of MAD (multiply-add) operations. The only difference is that a GPU can do many more of those operations in parallel than a CPU.



**And even then you may be able to code it appropriately for a GPU.
 
I'm not totally clear on what you're trying to do, but it sounds suspiciously like a "scan" or similar algorithm, which can be implemented extremely efficiently on GPUs.

If you're talking about just summing full rows (i.e. you don't need partial sums like in summed-area tables), it's probably even simpler: a reduction over the rows will get you that. What's unclear to me is what you mean by "multiply it with kernel and then take their sum". If you'd like to lay out the algorithm in more detail here, I'm sure people would be happy to give you parallel/GPU implementation advice.
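
To make the reduction idea concrete, here is a rough sketch of a pairwise (tree) reduction over one row after multiplying by the weights. It runs sequentially in plain Python; on a GPU, each pass of the loop would be shared out across many threads. The function name and the sample values are made up for illustration and are not from the original post.

```python
def tree_reduce_weighted(row, weights):
    """Sum row[k] * weights[k] using a pairwise (tree) reduction.

    Each pass roughly halves the number of partial sums, which is the
    access pattern a GPU reduction uses; here the passes run sequentially.
    """
    partial = [p * w for p, w in zip(row, weights)]  # elementwise products
    if not partial:
        return 0.0
    while len(partial) > 1:
        half = (len(partial) + 1) // 2
        # Element t adds its partner at t + half, if that partner exists.
        partial = [
            partial[t] + (partial[t + half] if t + half < len(partial) else 0.0)
            for t in range(half)
        ]
    return partial[0]


# Made-up sample data: the products are 1, 1, 6, 1, 5.
total = tree_reduce_weighted([1.0, 2.0, 3.0, 4.0, 5.0],
                             [1.0, 0.5, 2.0, 0.25, 1.0])
```

For a 1024-element row this takes 10 passes instead of 1023 sequential additions, although the total number of multiply-adds stays the same.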
 
Thanks for the replies... I would like to make my requirements clearer.

"sum of the product of each pixel in that row multiplied by a kernel"

If I have to calculate the value of a pixel Cij of an output image of size n x n, it would be

Cij = C0j * W1 + C1j * W2 + C2j * W3 + ... + C(n-1)j * Wn

where W1, W2, ..., Wn are weights which I select from an array depending on the values of i and j.
I perform this operation for every i, j < n.
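
For reference, the formula above could be written as a brute-force triple loop along these lines. This is only a sketch, not the poster's actual code: `weight(i, j, k)` is a hypothetical stand-in for the lookup into the weight array, and the image is assumed to be a plain list of lists indexed as `image[k][j]` to match the formula. Each output pixel depends only on the input image, so every (i, j) iteration is independent and could in principle be given to its own GPU thread.

```python
def brute_force_weighted_sum(image, weight):
    """Evaluate out[i][j] = sum over k of image[k][j] * weight(i, j, k).

    `weight` is a hypothetical callable standing in for the poster's
    weight-array lookup; it is not defined anywhere in the thread.
    """
    n = len(image)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):          # output row
        for j in range(n):      # output column
            acc = 0.0
            for k in range(n):  # n multiply-adds per output pixel
                acc += image[k][j] * weight(i, j, k)
            out[i][j] = acc
    return out


# Tiny usage example with made-up weights that select only k == i,
# so the output simply reproduces the input.
image = [[1.0, 2.0],
         [3.0, 4.0]]
result = brute_force_weighted_sum(image, lambda i, j, k: 1.0 if k == i else 0.0)
```

The cost is n multiply-adds for each of the n x n output pixels, i.e. about n^3 in total.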
 
Well, if you don't give us any more information about the weights and how you choose them, then there's no way in general to make that faster than brute force (the weight Wijk could be unique for every i, j, k tuple).

There may be a more clever way to approach this depending on the weight values though. If you could give us more information on that step it might be helpful.
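
As one illustration of how structure in the weights could help: suppose, purely as an assumption that is not stated anywhere in the thread, that the weight factors as Wijk = a[i] * b[k]. Then the sum over k no longer depends on i, so it can be computed once per column and reused, cutting the work from about n^3 to about n^2 multiply-adds. The names `a`, `b`, and both functions are hypothetical.

```python
def separable_weighted_sum(image, a, b):
    """out[i][j] = sum over k of image[k][j] * a[i] * b[k], in O(n^2).

    Assumes the weights factor as a[i] * b[k]; this is an illustrative
    assumption, not something the original poster stated.
    """
    n = len(image)
    # One weighted sum per column j, shared by every output row i.
    col_sums = [sum(image[k][j] * b[k] for k in range(n)) for j in range(n)]
    return [[a[i] * col_sums[j] for j in range(n)] for i in range(n)]


def brute_force(image, a, b):
    """Same result via the direct O(n^3) triple loop, for comparison."""
    n = len(image)
    return [
        [sum(image[k][j] * a[i] * b[k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Made-up sample data.
image = [[1.0, 2.0], [3.0, 4.0]]
a = [1.0, 2.0]
b = [0.5, 1.0]
fast = separable_weighted_sum(image, a, b)
slow = brute_force(image, a, b)
```

Whether anything like this applies depends entirely on how the real weights are chosen, which is the information being asked for above.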
 