Volume rendering large data problem

hesicong

Newcomer
I'm working on medical software that renders MRI volume data. I use the ray casting method, but on my G80 the performance is unsatisfactory. The card spends most of its time fetching textures.
I tried a 256*256*256 3D texture with the internal format set to GL_LUMINANCE_ALPHA and a render window of 800*600, and I get below 12 fps.
I want to know:
1. Does setting GL_LUMINANCE_ALPHA really decrease memory usage and increase performance?
2. Would splitting the 3D texture into 8 smaller 3D textures speed up fetching?
3. If my texture's internal format is set to GL_RGBA, the texture will be about 130 MB. Any tips to speed up texture fetching and filtering?
Thanks to everyone who answers my question!
 
Not a trivial problem to solve! I have an axis-aligned slice volume renderer but am considering moving to volume textures in the future. Performance is very important for interactivity so I've thought quite a bit about techniques.

The best optimization is to do nothing! That means that you don't want to call the pixel shader when its output will have an alpha=0. There are several ways to do this but a simple one is to create a 3d array in CPU memory that contains the minimum and maximum data values in a block of the volume data texture. For example, create a 16x16x16 3d array containing the volume data mins/maxs for each 16x16x16 block of the volume texture. At render time, compare the data mins/maxs in this array to your output transform color texture to see if the block has a chance of producing any visible output. You only need to start your pixel shader on the blocks that will produce output. Render the ray starting distance into an offscreen target. A similar thing can be done on the backside of the volume texture. Render the ray termination distance into another offscreen target. That'll reduce the total number of steps in your raycasting pixel shader.
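A minimal CPU-side sketch of that min/max block array, assuming a 256^3 volume of 8-bit samples (the names and memory layout here are illustrative, not from the post above):

#include <cstdint>
#include <algorithm>
#include <vector>

// Per-block min/max for a dim^3 volume split into (dim/block)^3 blocks,
// e.g. dim = 256, block = 16 gives a 16x16x16 array of min/max pairs.
struct MinMax { uint8_t min, max; };

std::vector<MinMax> buildMinMaxBlocks(const uint8_t* volume, int dim, int block)
{
    const int n = dim / block;                        // blocks per axis
    std::vector<MinMax> blocks(n * n * n, MinMax{255, 0});

    for (int z = 0; z < dim; ++z)
        for (int y = 0; y < dim; ++y)
            for (int x = 0; x < dim; ++x)
            {
                uint8_t v = volume[(z * dim + y) * dim + x];
                MinMax& b = blocks[((z / block) * n + y / block) * n + x / block];
                b.min = std::min(b.min, v);
                b.max = std::max(b.max, v);
            }
    return blocks;
}

At render time a block can then be skipped whenever the transfer function maps every value in its [min, max] range to zero alpha.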

How much those steps improve performance depends on the nature of your data and the particular color transformation used to convert the data values to output colors. If there's a lot of "empty space", you'll get a huge win.

There are several, less significant optimizations you can do in the raycasting pixel shader. For example, if your accumulated output alpha approaches 1.0, you can terminate the loop since any further steps would not be visible.
 
The card spends most of its time fetching textures.
I tried a 256*256*256 3D texture with the internal format set to GL_LUMINANCE_ALPHA and a render window of 800*600, and I get below 12 fps.
Something is really wrong with those numbers... I'm getting 12.1G fetches/sec on a G92 (780 MHz). This is with a single-channel R16f 256*256*256 3D texture, linear filtering, no mipmapping.
That should be more than enough: 416 fetches per pixel at 800*600*60 fps.
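(For reference, the arithmetic behind that figure: 800 * 600 * 60 fps = 28.8M pixels per second, so a 12G fetches/sec budget works out to 12e9 / 28.8e6 ≈ 416 fetches per pixel per frame.)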
 
Something is really wrong with those numbers... I'm getting 12.1G fetches/sec on a G92 (780 MHz). This is with a single-channel R16f 256*256*256 3D texture, linear filtering, no mipmapping.
That should be more than enough: 416 fetches per pixel at 800*600*60 fps.
Could be that he has a complex pixel shader with gradient calculations, etc. I had a pow() call or two and that slowed things down quite a bit until I turned it into a LUT.
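For reference, baking pow() into a lookup texture can be as simple as filling a small 1D texture once on the CPU. This is a sketch, not the poster's actual code; buildPowLUT and the 256-entry size are illustrative choices:

#include <cmath>
#include <GL/gl.h>

// Bake pow(x, gamma) for x in [0, 1] into a 256-entry 1D texture.
// In the shader, pow(v, gamma) then becomes texture1D(powLUT, v).r.
void buildPowLUT(GLuint tex, float gamma)
{
    float lut[256];
    for (int i = 0; i < 256; ++i)
        lut[i] = std::pow(i / 255.0f, gamma);

    glBindTexture(GL_TEXTURE_1D, tex);
    glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_1D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
    glTexImage1D(GL_TEXTURE_1D, 0, GL_LUMINANCE16, 256, 0,
                 GL_LUMINANCE, GL_FLOAT, lut);
}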
 
Not a trivial problem to solve! I have an axis-aligned slice volume renderer but am considering moving to volume textures in the future. Performance is very important for interactivity so I've thought quite a bit about techniques.

The best optimization is to do nothing! That means that you don't want to call the pixel shader when its output will have an alpha=0. There are several ways to do this but a simple one is to create a 3d array in CPU memory that contains the minimum and maximum data values in a block of the volume data texture. For example, create a 16x16x16 3d array containing the volume data mins/maxs for each 16x16x16 block of the volume texture. At render time, compare the data mins/maxs in this array to your output transform color texture to see if the block has a chance of producing any visible output. You only need to start your pixel shader on the blocks that will produce output. Render the ray starting distance into an offscreen target. A similar thing can be done on the backside of the volume texture. Render the ray termination distance into another offscreen target. That'll reduce the total number of steps in your raycasting pixel shader.

How much those steps improve performance depends on the nature of your data and the particular color transformation used to convert the data values to output colors. If there's a lot of "empty space", you'll get a huge win.

There are several, less significant optimizations you can do in the raycasting pixel shader. For example, if your accumulated output alpha approaches 1.0, you can terminate the loop since any further steps would not be visible.

Your method is empty space skipping. I read the SIGGRAPH 2004 volume rendering course notes and some Eurographics 2006 papers, and they mention this. The idea is to split the volume box into n*n*n boxes and filter out the boxes that are empty, which reduces the texture fetching and sampling work. But when I tried the technique with an 8*8*8 split, performance dropped very significantly. Especially when I rotate my scene, the frame rate drops to 1 fps or below.
I wonder if empty space skipping is making the texture cache work less efficiently.
 
Something is really wrong with those numbers... I'm getting 12.1G fetches/sec on a G92 (780 MHz). This is with a single-channel R16f 256*256*256 3D texture, linear filtering, no mipmapping.
That should be more than enough: 416 fetches per pixel at 800*600*60 fps.

Can you tell me how to measure the performance?
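(One plausible way to measure raw fetch throughput, not necessarily how the poster above did it, is to time a known number of fetches directly; drawFullScreenQuad() and getTimeSeconds() below are hypothetical placeholders for your own quad-drawing and timing code:

#include <GL/gl.h>

// Hypothetical helpers you would supply yourself.
extern void drawFullScreenQuad();   // fragment shader does fetchesPerPixel 3D fetches
extern double getTimeSeconds();     // any high-resolution CPU timer

double measureFetchesPerSecond(int quads, int fetchesPerPixel, int width, int height)
{
    glFinish();                     // drain pending GPU work before timing
    double t0 = getTimeSeconds();
    for (int i = 0; i < quads; ++i)
        drawFullScreenQuad();
    glFinish();                     // wait until the GPU is actually done
    double t1 = getTimeSeconds();

    double totalFetches = double(quads) * fetchesPerPixel * width * height;
    return totalFetches / (t1 - t0);
}

Bracketing the batch with glFinish() matters; without it you would only be timing command submission, not the GPU work itself.)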
 
Also, I'm posting my shader code here. Are there any optimizations I could make to this shader?

uniform sampler1D transferFunction;

uniform sampler2D texFrontPos;
uniform sampler2D texFrontTex;
uniform sampler2D texBackPos;
uniform sampler2D texBackTex;
uniform sampler2D texScenePos;

uniform sampler3D volumeData;
uniform sampler3D maskData;

uniform float transparency;
uniform float alphaCutOff;
uniform float sampleDensity;

uniform vec4 volScale;
uniform vec2 renderSize;

void main()
{
    vec4 dst = vec4(0.0, 0.0, 0.0, 0.0);
    vec2 screenTex = gl_FragCoord.xy / renderSize.xy;

    // Calculate ray length
    float rayLength;
    vec4 sceneFrontPos = texture2D(texScenePos, screenTex);
    vec4 frontPos = texture2D(texFrontPos, screenTex);
    vec4 backPos = texture2D(texBackPos, screenTex);

    // If the scene is empty, march to the back face;
    // otherwise stop at the scene geometry
    if (sceneFrontPos.a == 0.0)
    {
        rayLength = distance(frontPos, backPos);
    }
    else
    {
        rayLength = distance(frontPos, sceneFrontPos);
    }

    // Calculate ray direction in texture space (marching front to back)
    vec3 rayDirection;
    vec4 frontTex = texture2D(texFrontTex, screenTex);
    vec4 backTex = texture2D(texBackTex, screenTex);

    rayDirection = normalize(frontTex.xyz - backTex.xyz);

    rayLength = rayLength / length(rayDirection * volScale.xyz);

    // sqrt(3) is the diagonal of the unit texture cube, so anything longer
    // is a degenerate ray; discard instead of a bare return so gl_FragColor
    // is never left undefined
    if (rayLength > 1.732)
    {
        discard;
    }

    vec3 rayPosition = frontTex.xyz;

    vec3 deltaRayPosition = rayDirection / sampleDensity;

    // Optional per-pixel jitter of the start position to hide banding
    //rayPosition -= deltaRayPosition * fract(sin(dot(screenTex.xy, vec2(12.9898, 78.233))) * 43758.5453);

    // GLSL has no implicit float-to-int conversion, so convert explicitly
    int stepCount = int(sampleDensity * rayLength);

    for (int i = 0; i < stepCount; i++)
    {
        vec4 vol = texture3D(volumeData, rayPosition);

        // Test the cutoff before the transfer function lookup so the
        // 1D fetch is skipped entirely for empty samples
        if (vol.a < alphaCutOff)
        {
            rayPosition -= deltaRayPosition;
            continue;
        }

        vec4 src = texture1D(transferFunction, vol.a);

        //vec4 mask = texture3D(maskData, rayPosition);

        //if (mask.r < 0.5)
        //{
        src.a = src.a * transparency;

        // Front-to-back compositing
        dst.rgb = mix(src.rgb, dst.rgb, dst.a);
        dst.a = mix(src.a, 1.0, dst.a);

        // Early ray termination: further samples are invisible
        if (dst.a >= 1.0) break;
        //}

        rayPosition -= deltaRayPosition;
    }

    gl_FragColor = dst;
}
 
Your method is empty space skipping. I read the SIGGRAPH 2004 volume rendering course notes and some Eurographics 2006 papers, and they mention this. The idea is to split the volume box into n*n*n boxes and filter out the boxes that are empty, which reduces the texture fetching and sampling work. But when I tried the technique with an 8*8*8 split, performance dropped very significantly. Especially when I rotate my scene, the frame rate drops to 1 fps or below.
I wonder if empty space skipping is making the texture cache work less efficiently.
Don't do it in the pixel shader, do it on the CPU before calling the pixel shader. Fill in your texFrontPos and texBackPos with the faces of the blocks that contain visible data instead of simply filling them in with the front and back faces of the entire volume texture.
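A rough sketch of what that could look like, assuming the 16^3 min/max block grid discussed earlier in the thread; bindRenderTarget(), drawBlockBox(), and blockVisible() are hypothetical helpers, not code from this thread:

#include <GL/gl.h>

extern void bindRenderTarget(int target);       // bind the FBO behind texFrontPos/texBackPos
extern void drawBlockBox(int x, int y, int z);  // draw one block's box, texcoords as color
extern bool blockVisible(int x, int y, int z);  // block min/max vs. transfer function test

enum { FRONT_TARGET, BACK_TARGET, N_BLOCKS = 16 };

void renderRayBounds()
{
    glEnable(GL_DEPTH_TEST);
    glEnable(GL_CULL_FACE);

    // Pass 1: nearest front faces of visible blocks -> texFrontPos
    bindRenderTarget(FRONT_TARGET);
    glClearDepth(1.0);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glDepthFunc(GL_LESS);
    glCullFace(GL_BACK);
    for (int z = 0; z < N_BLOCKS; ++z)
        for (int y = 0; y < N_BLOCKS; ++y)
            for (int x = 0; x < N_BLOCKS; ++x)
                if (blockVisible(x, y, z))
                    drawBlockBox(x, y, z);

    // Pass 2: farthest back faces of visible blocks -> texBackPos
    bindRenderTarget(BACK_TARGET);
    glClearDepth(0.0);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glDepthFunc(GL_GREATER);
    glCullFace(GL_FRONT);
    for (int z = 0; z < N_BLOCKS; ++z)
        for (int y = 0; y < N_BLOCKS; ++y)
            for (int x = 0; x < N_BLOCKS; ++x)
                if (blockVisible(x, y, z))
                    drawBlockBox(x, y, z);
}

The depth tests keep the nearest front face and the farthest back face per pixel, so the raycaster only marches across space that can actually produce output.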

As I noted earlier, my VR uses axis-aligned slices. I do empty space clipping by finding the min/max data values in a 6x6 grid for each slice. I see huge perf wins, especially in isosurface mode, because my data usually contains plenty of "empty space". However, my volumes are much smaller than the OP's, typically 96x96x64.
 