FFT on GPU problem (56k alert)
asmatic
30-Jul-2004, 12:51
Hello
I have been working on a framework that lets you chain shaders together to perform image processing filters.
I have been working with the FFT implementation shown in ShaderX2 and in ATI's VideoShader demo. However, I'm unable to make it work. I have been at it for days but can't find what I'm doing wrong, and I'm becoming very frustrated.
I have tried to check each small step. The FFT consists of two basic steps: scrambles and butterflies. I have no problem with the scrambles, but the butterfly does not work.
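In a radix-2 FFT the scramble step is normally a bit-reversal permutation: output position i reads input position bitrev(i). A minimal C++ sketch of that index math, assuming power-of-two sizes; the function names are hypothetical and not taken from the ShaderX2 or VideoShader code, and whether ATI's scramble texture stores exactly this mapping is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reverse the lowest `bits` bits of `index`. A radix-2 "scramble" reorders
// the input by this bit-reversed index before the butterfly passes.
std::uint32_t bit_reverse(std::uint32_t index, unsigned bits) {
    std::uint32_t result = 0;
    for (unsigned b = 0; b < bits; ++b) {
        result = (result << 1) | (index & 1u);  // append the lowest remaining bit
        index >>= 1;
    }
    return result;
}

// Scramble lookup for an n-point transform (n a power of two):
// entry i is the source index that output position i reads from.
std::vector<std::uint32_t> scramble_table(std::uint32_t n) {
    assert(n > 0 && (n & (n - 1)) == 0);  // power-of-two sizes only
    unsigned bits = 0;
    while ((1u << bits) < n) ++bits;
    std::vector<std::uint32_t> table(n);
    for (std::uint32_t i = 0; i < n; ++i) table[i] = bit_reverse(i, bits);
    return table;
}
```

For 8 points the table is {0, 4, 2, 6, 1, 5, 3, 7}; since bit reversal is its own inverse, the same table works whether you think of it as "read from" or "write to".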
This is the result of applying only one horizontal butterfly to a 512x512 Lena image: http://campus.uab.es/~2066821/fftcomp.jpg
Right is ATI's VideoShader demo, left is my framework.
Even with the image resized and stretched, it is clear that ATI's butterfly creates some dark patterns (they are not caused by the resizing and stretching). My butterfly shows some minor moiré effect due to resizing.
The problem is that the same shader is being used for both images.
So if it's not the shader, maybe it's the lookup texture that the butterfly shader uses:
a 512x32, 4-channel float texture in raw format.
I have checked the texture and I load it correctly. With the raw file loaded into a texture, I lock that texture, write its contents to another file, unlock it, and compare the original file with the new one using the fc file-compare command. They are identical, so it's not a problem with the texture.
Also, the textures' D3DFMT formats are the same in both applications.
Maybe it's the constant float value that you have to pass to each butterfly pass, but I have checked that and the shader receives the correct value.
I also tried checking pixel/texel alignment. In ATI's demo this is done in a vertex shader; in my app it is done in the vertex declaration. However, I have disabled the texel-alignment code in ATI's vertex shader and their app still runs OK, still showing the dark bands after the butterfly.
Of course, all textures are sampled with POINT for both the mag and min filters.
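With POINT filtering the fetched texel is just floor(u * size), so reading the 32-row butterfly texture with a 512-row image's coordinates repeats each texture row 16 times, and the pass constant only hits the intended row if it lands inside that row (its center, (i + 0.5) / size, is the safest choice). A small C++ sketch of that arithmetic; the helper names and the clamp-to-edge behavior are assumptions for illustration. (In Direct3D 9, pixel centers and texel centers are also offset by half a texel, which is what texel-alignment code compensates for.)

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Index of the texel POINT filtering picks for normalized coordinate u on an
// axis with `size` texels (clamped to the edge, as with CLAMP addressing).
int point_sample_index(float u, int size) {
    assert(size > 0);
    int i = static_cast<int>(std::floor(u * size));
    return std::max(0, std::min(i, size - 1));
}

// Normalized coordinate of the center of texel i.
float texel_center(int i, int size) {
    return (i + 0.5f) / size;
}
```

For example, image rows 0 to 15 of a 512-row target all map to butterfly row 0, rows 16 to 31 to row 1, and so on.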
So, back to the lookup texture: I modified the pixel shaders of both applications to display that lookup texture
(remember that it is 512x32 with 4 floats per texel, but forcing the pixel shader to sample it with the image texture's 512x512 coordinates) and got these results:
ATI's app (that app resizes the output window):
http://campus.uab.es/~2066821/AtiHButter.jpg
My app:
http://campus.uab.es/~2066821/MyHButter.jpg
As you can see, the former has a different pattern.
Mine has no pattern :cry:
It seems that something is wrong around that lookup texture, but I have checked that the texture loads OK.
I know it is unlikely that someone can help me, but maybe someone has worked with this FFT, has faced the same problem, and can help.
Thank you
I might be able to help. I don't see anything wrong with the images at the bottom of your post; maybe you can explain further what you meant there.
I need to look back at my notes, but I think there are two possible issues you are running into.
First, IIRC the butterfly shader is somewhat dependent on the scramble shader to convert the image to complex numbers. The algorithm puts the real component into the color channels and the imaginary component into the alpha channel. With a different alpha channel setup you could get bad results, but I don't expect this is it.
Second, the shader relies on either a constant or a texcoord manipulation to fetch from the right part of the butterfly texture. The bands in the texture represent the coefficients needed for the different passes. I believe the D3D version uses a constant, while the OpenGL version originally used a texcoord. It looks to me like you might not be setting this correctly, and are just fetching from the wrong band in the texture.
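The packing described here can be read directly off the VideoShader HLSL: each source fetch takes `.ra` (real from red, imaginary from alpha) and the result is written as `a.xxxy` (real in r, g, b; imaginary in alpha). A C++ sketch of that convention, with hypothetical helper names; the point it illustrates is that a real input image must enter the first butterfly pass with alpha equal to zero, since any leftover alpha (for example 1.0 from an opaque image) is read as imaginary data.

```cpp
#include <array>
#include <cassert>
#include <complex>

using Texel = std::array<float, 4>;  // r, g, b, a

// Write a complex value the way Hbutterfly.hlsl does (return a.xxxy):
// real part in r, g and b, imaginary part in alpha.
Texel pack_complex(std::complex<float> z) {
    return {z.real(), z.real(), z.real(), z.imag()};
}

// Read it back the way the shader does (tex2D(...).ra).
std::complex<float> unpack_complex(const Texel& t) {
    return {t[0], t[3]};
}

// A real input pixel must enter the first pass with a zero imaginary part;
// an opaque alpha of 1.0 left in place would be read as imaginary data.
Texel real_pixel(float luminance) {
    return pack_complex({luminance, 0.0f});
}
```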
-Evan
[Edit for spelling]
asmatic
30-Jul-2004, 14:39
Thanks Evan.
If you look carefully at the VideoShader texture, especially at the first and second "rows" (or bands) of the image, you will see some areas where the pattern changes and becomes different (a bit darker).
My texture does not behave like this. The first two rows of my butterfly texture are homogeneous.
ATI's app uses D3D, as my framework does. I don't know of any OGL version.
You are right: you must pass each butterfly pass a float value that tells the shader where to read from. But I debugged the shader in Visual Studio and the float was passed OK. In the ShaderX2 article that value is calculated as i / log2(image_width) (i = pass number), but the actual application uses other values (e.g. 0.015625 for the first pass).
Here is the code for the butterfly shader that comes with VideoShader:
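The two constants quoted in this thread, 0.015625 for the first pass and 0.203125 for what is described as the 7th pass, are both exactly the center of row i (counting from zero) of a 32-row texture, (i + 0.5) / 32. That is an inference from those two numbers, not something the VideoShader source is confirmed to do; a 512-wide transform needs log2(512) = 9 passes, which fit in the 32 rows. A C++ sketch under that assumption, with hypothetical names:

```cpp
#include <cassert>

// Assumed pass constant: normalized v coordinate of the center of row
// `pass` (zero-based) in a butterfly texture with `rows` rows.
float pass_constant(int pass, int rows) {
    assert(rows > 0 && pass >= 0 && pass < rows);
    return (pass + 0.5f) / rows;
}

// Number of radix-2 butterfly passes for a power-of-two width: log2(width).
int pass_count(int width) {
    assert(width > 0 && (width & (width - 1)) == 0);
    int n = 0;
    while ((1 << n) < width) ++n;
    return n;
}
```

Under this reading, pass_constant(0, 32) is 0.015625 and pass_constant(6, 32) is 0.203125, so the per-pass loop would upload pass_constant(i, 32) for i = 0 .. pass_count(512) - 1.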
//------------------------------------------------------------------------------
// Hbutterfly.hlsl
//
// Performs horizontal butterfly pass of FFT.
//
// $Header: //depot/3darg/Demos/Internal/D3DXEffectDemos_DX9/VideoDemo/Shaders/FFT/Hbutterfly.hlsl#3 $
//
// Evan Hart - ATI Research, Inc. - 2003
//------------------------------------------------------------------------------
//all textures sampled nearest
sampler butterfly : register(s12);
sampler sourceImage : register(s0);
struct PS_INPUT
{
float2 srcLocation:TEXCOORD0;
};
//constant to tell which pass is being used
float4 passNum : register(c30); // passNum = passNumber / log2(width)
float4 main( PS_INPUT In ) : COLOR
{
float2 sampleCoord;
float4 butterflyVal;
float2 a;
float2 b;
float2 w;
float temp;
sampleCoord.x = In.srcLocation.x;
sampleCoord.y = passNum.x;
butterflyVal = tex2D( butterfly, sampleCoord);
w = butterflyVal.ba;
//sample location A
sampleCoord.x = butterflyVal.y;
sampleCoord.y = In.srcLocation.y;
a = tex2D( sourceImage, sampleCoord).ra;
//sample location B
sampleCoord.x = abs(butterflyVal.x);
sampleCoord.y = In.srcLocation.y;
b = tex2D( sourceImage, sampleCoord).ra;
//multiply w*b (complex numbers)
temp = w.x*b.x - w.y*b.y;
b.y = w.y*b.x + w.x*b.y;
b.x = temp;
//perform a + w*b or a - w*b
a = a + ((butterflyVal.x < 0.0) ? -b : b);
//make it a 4 component output for good measure
return a.xxxy;
//return butterflyVal;
}
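To check the shader's arithmetic independently of the GPU, the per-pixel math of Hbutterfly.hlsl can be transcribed to C++. Here `fetch` is a hypothetical stand-in for tex2D(sourceImage, float2(u, srcLocation.y)).ra, and the texel layout (r = signed B coordinate, g = A coordinate, b/a = twiddle) is read off the shader above. Encoding the sign in the B coordinate works only because a texel-center coordinate is never exactly zero.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <functional>

using Texel = std::array<float, 4>;  // r, g, b, a as fetched from the butterfly texture
using Complex = std::complex<float>;

// CPU transcription of one output pixel of Hbutterfly.hlsl.
Complex butterfly_pixel(const Texel& bf, const std::function<Complex(float)>& fetch) {
    Complex w(bf[2], bf[3]);              // w = butterflyVal.ba
    Complex a = fetch(bf[1]);             // sample location A at butterflyVal.y
    Complex b = fetch(std::fabs(bf[0]));  // sample location B at abs(butterflyVal.x)
    b = w * b;                            // complex multiply (the temp / b.y / b.x lines)
    return bf[0] < 0.0f ? a - b : a + b;  // sign of butterflyVal.x picks a - w*b or a + w*b
}
```

For a 2-point row holding 3 and 5 (texel centers 0.25 and 0.75) with w = 1, the texel {0.75, 0.25, 1, 0} yields 8 and {-0.75, 0.25, 1, 0} yields -2, the two outputs of a single butterfly.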
This shader returns the Lena image with dark patterns.
My framework uses this shader, modified to read the butterfly texture from sampler s1 and the float constant from c1. No other modifications.
To generate the other two images (the views of the butterfly texture), I just change
sampleCoord.x = In.srcLocation.x;
//sampleCoord.y = passNum.x; // comment this line
sampleCoord.y = In.srcLocation.y; // use that instead
and
//make it a 4 component output for good measure
//return a.xxxy; // comment this
return butterflyVal; // return the value read from the butterfly texture
The scramble pass only swaps rows/columns. However, for testing purposes, the first image in my post above was produced with just a black-and-white conversion and a single horizontal butterfly pass with an input of 0.015625, because I had ruled out the scramble phase as a source of error.
regards
Quote (asmatic): "...The first two rows of my butterfly texture are homogeneous rows."
That's the effect of resizing with point sampling. I can make your texture look exactly the same by loading it in Paint Shop Pro (PSP) and resizing it with "point resize".
asmatic
31-Jul-2004, 01:29
You are absolutely right, xmast; I have checked that too: the pattern is caused by POINT resizing.
So I'm lost again :( :?:
This is the output of my app's FFT (FFT -> magnitude, with the quadrant swap included):
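The display step mentioned here (magnitude plus quadrant swap) can be sketched on the CPU as follows: the quadrant swap moves the DC term from the corner (0,0) to the center of the image, and a log scale is one common way to make the magnitude visible. Both helpers are hypothetical, and the log1p scaling is an assumption about the display, not necessarily what this app does.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Quadrant swap ("fftshift") for an n x n row-major image with even n:
// moves the DC term from (0,0) to the center (n/2, n/2).
template <typename T>
std::vector<T> quadrant_swap(const std::vector<T>& img, int n) {
    assert(n % 2 == 0 && img.size() == static_cast<std::size_t>(n) * n);
    std::vector<T> out(img.size());
    int h = n / 2;
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x)
            out[((y + h) % n) * n + (x + h) % n] = img[y * n + x];
    return out;
}

// Log-scaled magnitude, a common way to make an FFT spectrum visible.
std::vector<float> log_magnitude(const std::vector<std::complex<float>>& spec) {
    std::vector<float> out(spec.size());
    for (std::size_t i = 0; i < spec.size(); ++i) out[i] = std::log1p(std::abs(spec[i]));
    return out;
}
```

For even n the swap is its own inverse, so applying it twice returns the original layout.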
http://campus.uab.es/~2066821/fft.JPG :(
This other GIF compares a single HButterfly pass with a float value of 0.203125 (as if it were the 7th butterfly pass).
My 512x512 image has been point-resized to 638x476 to match the size of ATI's output.
http://campus.uab.es/~2066821/HB7.gif
ATI's is the one with the R9700 logo.
regards
Looking at this some more, I suspect that the two examples start with different values stuffed into the alpha channel. My guess is that if you switched the shaders to read both the real and imaginary components out of the red channel, you would get the same results.
I suspect this because the value modulated with the alpha channel seems to vary in the same way the images differ.
-Evan
asmatic
01-Aug-2004, 16:08
Thanks everyone, finally I have made it work.
It was not working because I'm a little stupid :oops:
First, the float constants weren't being passed correctly.
They worked with only one pass, but with multiple passes they were not reaching the shader correctly. And because of those dark bands, I was expecting to find errors in the individual butterfly passes.
Then the dozens of changes I made to my code while hunting the bug left behind a couple of hidden copy-paste errors...
Well, at last I can say it works ^^
http://campus.uab.es/~2066821/fft2.jpg
pretty FFT... :D
Thanks to everyone!
asmatic
03-Aug-2004, 22:31
Just one quick question:
Can I use a 512 butterfly texture to perform, for example, a 128x128 FFT?
I would need a custom 128 scramble texture, and probably a custom 128 butterfly texture. Am I right?
Thanks
Go to the link below. In the zip file there is a file, Buffer.cpp, that has the code to generate these textures for arbitrary sizes.
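Buffer.cpp itself is not reproduced here. As a hedged illustration of what generating such a table for an arbitrary power-of-two size involves, this C++ sketch builds one row per pass in the layout Hbutterfly.hlsl reads (r = signed B coordinate, g = A coordinate, b/a = twiddle) for a radix-2 decimation-in-time FFT on bit-reversed input, then runs the passes on the CPU. The ordering, layout details and names are assumptions inferred from the shader, not ATI's generator.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using Texel = std::array<float, 4>;  // r = +/-B coord, g = A coord, b/a = twiddle (re, im)
using Complex = std::complex<double>;

// Hypothetical butterfly table for an n-point forward FFT (n a power of two).
// Row p drives pass p; coordinates are normalized texel centers.
std::vector<std::vector<Texel>> make_butterfly_table(int n) {
    assert(n >= 2 && (n & (n - 1)) == 0);
    const double kPi = 3.14159265358979323846;
    std::vector<std::vector<Texel>> rows;
    for (int span = 1; span < n; span *= 2) {
        std::vector<Texel> row(n);
        for (int i = 0; i < n; ++i) {
            int j = i % (2 * span);      // position inside the current butterfly group
            bool top = j < span;         // top half: a + w*b, bottom half: a - w*b
            int k = top ? j : j - span;  // twiddle index within the group
            int a = top ? i : i - span;
            int b = a + span;
            double angle = -2.0 * kPi * k / (2 * span);
            float bu = (b + 0.5f) / n;   // texel center, always > 0 so the sign is usable
            row[i] = {top ? bu : -bu, (a + 0.5f) / n,
                      static_cast<float>(std::cos(angle)), static_cast<float>(std::sin(angle))};
        }
        rows.push_back(row);
    }
    return rows;
}

// Run every pass on the CPU over bit-reversed input, mimicking the shader.
std::vector<Complex> run_passes(std::vector<Complex> data,
                                const std::vector<std::vector<Texel>>& rows) {
    int n = static_cast<int>(data.size());
    for (const auto& row : rows) {
        std::vector<Complex> next(n);
        for (int i = 0; i < n; ++i) {
            const Texel& t = row[i];
            int ia = static_cast<int>(t[1] * n);  // point sampling: floor(u * n)
            int ib = static_cast<int>(std::fabs(t[0]) * n);
            Complex wb = Complex(t[2], t[3]) * data[ib];
            next[i] = t[0] < 0.0f ? data[ia] - wb : data[ia] + wb;
        }
        data = next;
    }
    return data;
}
```

In this layout both the sample coordinates (normalized by n) and the number of rows (log2 n) depend on the transform size, so a 128-point FFT would need its own 7-row table rather than a 512-wide one.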
http://www.ati.com/developer/sdk/RadeonSDK/Html/Samples/OpenGL/HW_Image_Processing.html
-Evan
asmatic
05-Aug-2004, 00:52
Thanks Evan.
So that's the OGL FFT version you were talking about before ;)
Wow! I can only say "thanks" to the ATI dev team for their research in image processing on GPUs.
m000gle
02-Jan-2006, 15:08
I have just implemented the OpenGL solution and it seems to work, although it seems strange that my FFT and IFFT operations are identical except for the rescaling. In ATI's example the IFFT butterfly texture is created with a negative angle, but if I do that, an IMG->FFT->IFFT->IMG round trip results in the image being rotated... so I use the same butterfly texture for both FFT and IFFT. Is this incorrect?
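On the sign question: with the usual convention the inverse transform uses the opposite twiddle angle (the conjugate kernel). Applying the same-sign kernel twice does not return the input; it returns n * x[(-m) mod n], the signal with its index reversed, which for a 2D image is a 180-degree rotation. This C++ sketch (a naive 1D DFT with a hypothetical `dft` helper, not the GPU code) checks both facts. If a pipeline only comes out upright when the same texture is used for both directions, that suggests something else in it also reverses indices or flips a sign, which may be worth checking.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using Complex = std::complex<double>;

// Naive 1D DFT. sign = -1 gives the forward kernel, +1 the inverse
// ("negative angle") kernel. No 1/n scaling is applied.
std::vector<Complex> dft(const std::vector<Complex>& x, int sign) {
    const double kPi = 3.14159265358979323846;
    std::size_t n = x.size();
    std::vector<Complex> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        Complex sum(0.0, 0.0);
        for (std::size_t m = 0; m < n; ++m)
            sum += x[m] * std::polar(1.0, sign * 2.0 * kPi * double(k * m % n) / double(n));
        out[k] = sum;
    }
    return out;
}
```

Forward then inverse, divided by n, reproduces x; forward twice, divided by n, gives x with its index reversed.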
Does anyone know how this can be adapted to work with all 4 colour channels (without having to run the process 4 times)?
Is there a better method that does all channels at once, or is this currently the best solution?
Can't you just use multiple render targets? I doubt it would matter speed-wise, though.
m000gle
02-Jan-2006, 16:05
I could, but my hardware doesn't support them.
Am I right in thinking that you need 2 channels for every colour channel in a Fourier-transformed image (1 for real, 1 for imaginary)?
Not necessarily. In principle you can do the FFT such that you get only 1 value per pixel in the output (for N pixels in total: N/2 - 2 complex values and 4 real ones). Complex FFTs are a bit wasteful with storage and computation when used on real input.
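The count follows from the conjugate (Hermitian) symmetry of the DFT of real data: for an n x n real image (n even), F(kx, ky) = conj(F(-kx mod n, -ky mod n)). Only the four bins whose indices are each 0 or n/2 are their own partners, so they must be real; the other n^2 - 4 bins pair up, leaving (n^2 - 4)/2 independent complex values plus 4 real ones, exactly n^2 numbers for n^2 pixels. A naive-DFT C++ check of that symmetry, with hypothetical helper names:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using Complex = std::complex<double>;

// Naive 2D forward DFT of a real n x n image stored row-major.
std::vector<Complex> dft2_real(const std::vector<double>& img, int n) {
    const double kPi = 3.14159265358979323846;
    std::vector<Complex> out(static_cast<std::size_t>(n) * n);
    for (int ky = 0; ky < n; ++ky)
        for (int kx = 0; kx < n; ++kx) {
            Complex sum(0.0, 0.0);
            for (int y = 0; y < n; ++y)
                for (int x = 0; x < n; ++x)
                    sum += img[y * n + x] *
                           std::polar(1.0, -2.0 * kPi * ((kx * x + ky * y) % n) / n);
            out[ky * n + kx] = sum;
        }
    return out;
}

// Bins that are their own conjugate partner (and therefore purely real
// for real input): both indices equal to 0 or n/2.
int self_conjugate_bins(int n) {
    int count = 0;
    for (int ky = 0; ky < n; ++ky)
        for (int kx = 0; kx < n; ++kx)
            if ((n - kx) % n == kx && (n - ky) % n == ky) ++count;
    return count;
}
```

Packing only the independent half of the spectrum is what lets a real transform fit in a texture the same size as the input.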
m000gle
03-Jan-2006, 01:20
What do you mean by "A real FFT is a little harder to implement than a complex one"? I am new to the world of FFTs, so forgive my ignorance, but I assumed that there is only one type of FFT (as provided by the implementation discussed in this thread) and that it contains both real and complex components. Could you explain what you mean by real FFTs and complex FFTs?
I have looked at the solution provided by "Moreland, K and Angel, E", http://www.cs.unm.edu/~kmorel/documents/fftgpu/ and that appears to be able to transform a 4 channel image and store the entire fourier transform in the same sized texture.
Is their solution better than the solution discussed in this thread (in terms of speed/storage)?
AFAICS the ATI example is a complex FFT. What I was talking about is described in the "frequency compression" part of the paper you linked... and yes, that is better.
m000gle
03-Jan-2006, 03:05
I had half implemented the kmorel method (I was having problems converting one of their shaders to GLSL: a weird Cg parameter-setting ambiguity that looked like a bug to me) when I saw the ATI method, which was so much simpler, and switched to that. I should have guessed there would be a catch...
Thanks for clearing it up.
If you are interested in the maths, then this (http://web.archive.org/web/20050217091453/www.eptools.com/tn/T0001/INDEX.HTM) has a pretty good description (the site is down, but archive.org is a pretty good backup... they even have the zip (http://web.archive.org/web/20050217184152/http://www.eptools.com/tn/T0001.ZIP) for offline reading).
m000gle
08-Jan-2006, 02:50
I have almost got this working now, but after an FFT -> IFFT round trip my image has minor defects (clearly the result of a problem with the transform rather than precision; I am using a 32-bit buffer).
I have basically implemented a GLSL non texRECT version from the source provided at http://www.cs.unm.edu/~kmorel/documents/fftgpu/
I suspect the problem lies with part of their Cg code that looks ambiguous to me, and I am not sure how to interpret it.
Basically, in fft.cg there are a couple of uniforms:
uniform float PartitionSize;
uniform float NumPartitions;
Now, these are single floats and are used as such, but in the C code they are uploaded like this for one part of the process:
(from oca/ocaProgramFFT1c.cxx)
cgGLSetParameter1f(this->PartitionSize,(float)partitionSize);
cgGLSetParameter1f(this->NumPartitions,(float)numPartitions);
Which is fine... but in a later part of the process they are uploaded like this:
(from oca/ocaProgramFFT2c.cxx)
cgGLSetParameter2f(this->PartitionSize, (float)partitionSize, (float)(partitionSize/2));
cgGLSetParameter2f(this->NumPartitions, (float)numPartitions, (float)(numPartitions*2));
Does anyone know how Cg handles this? I would assume that it ignores the second component of the vector, so that it is effectively the same as the upload in the earlier process.
I have tried the following for the later process:
cgGLSetParameter1f(this->PartitionSize,(float)partitionSize);
cgGLSetParameter1f(this->NumPartitions,(float)numPartitions);
I also tried this:
cgGLSetParameter1f(this->PartitionSize,(float)partitionSize/2);
cgGLSetParameter1f(this->NumPartitions,(float)numPartitions*2);
and got better results, but still not quite right.
It's really frustrating, because I cannot easily work with their version: it requires nVidia hardware (which I don't have), so I would have to reimplement much of their buffer code to investigate the issue with their code.
Would it be possible for someone with an nVidia card to compile their code, change the second upload to 1f rather than 2f, and see how that affects the results and which values are the correct ones to upload?
I am trying to get my head around the math behind it all so that I am in a better position to fix the problem myself, but I wondered if anyone could shed some light on this issue...
Thanks
m000gle
15-Jan-2006, 01:24
Yay finally got this working. As usual it was a really stupid mistake that I had overlooked.
The Cg parameter thing looks like just a benign bug in their code, as the unused components would be ignored.
Just got to make it faster now...