Radeon 9700 floating point support

OpenGL guy said:
You want full programmability? Use a CPU then. There's a reason why CPUs don't compete with GPUs (or whatever your favorite term is). If you go to full programmability, how does a GPU keep its advantage?

Because a GPU is a stream-based SIMD vector processor and a CPU isn't, that's how. Full programmability (i.e. Turing completeness) can be achieved without losing the GPU's advantage of stream processing and highly parallel vectorized code.

CPUs are optimized mainly for scalar computations, and the vast majority of the code that executes on them (OS, applications, databases, etc.) is scalar in nature. This code is also "non-pure" in that it mutates the environment and manipulates external state, which creates all kinds of roadblocks to optimization. In addition, GPUs have a well-defined memory access pattern due to their streaming nature, making life much easier for the memory controller. CPUs have to deal with heavy indirection and random access.


For the most part, GPUs execute "pure functional" code (see functional programming languages like Haskell). They cannot store state in between executions (except via multipass), and such programs take well-defined input and return well-defined output. There is also no aliasing. This allows a compiler to do aggressive liveness analysis, redundant code elimination, and equational reasoning. It also allows the hardware to heavily parallelize execution, since interdependency can be exactly determined.
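To make that concrete, here's a toy Cg fragment program (purely an illustrative sketch, nothing from a real engine): everything it reads arrives as interpolants, constants, or texture fetches, and everything it produces leaves as its return value. There is no state it could carry over from one fragment to the next, which is exactly what lets the compiler and the hardware schedule it so aggressively.

// A pure function of its inputs: no writes to memory, no state carried
// between fragments.
float4 main(float2 uv : TEXCOORD0,
            float3 normal : TEXCOORD1,
            uniform float3 lightDir,          // assumed already normalized
            uniform sampler2D baseTex) : COLOR
{
    float4 base = f4tex2D(baseTex, uv);                        // well-defined input
    float diff = max(dot(normalize(normal), lightDir), 0.0);   // simple diffuse term
    return base * diff;                                        // well-defined output
}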


It is certainly possible to come up with a model of computation that is based on streams and which is nearly universal. Once you factor in multipass, universality is assured.
 
3D accelerators were created for a reason. The further you move from specific use toward general programmability, the further you move from the advantage of having one in the first place.

Sooner or later the chip is going to lose its speed advantage... or cost $1000.

People need to back up and look at the big picture.
 
gking said:
OpenGL guy wrote:
NV30 can't handle floating point cubemaps natively. Is it a big deal? I don't know, as I am not a developer.

Of course it can. The only issue is whether there are dedicated transistors for it, or whether the developer (or the chip) co-opts some other execution unit to provide that functionality.
I wouldn't call that "native" support. I would call that emulation.
Third, NV30 can't filter floating point textures

Sure it can -- this highly unoptimized routine (which I posted on OpenGL.org) does exactly that:
float4 FilterFloatBuffer(samplerRECT tex, float2 uv) {
    // Split the coordinate into its fractional part (the blend weights)
    // and its integer part (the address of the lower-left texel).
    float4 deltabase;
    deltabase.xy = frac(uv);
    deltabase.zw = floor(uv);
    // Blend the bottom two texels along x.
    float4 smp = f4texRECT(tex, deltabase.zw);
    float4 accum = (1.0.xxxx-deltabase.xxxx)*smp;
    smp = f4texRECT(tex, deltabase.zw+float2(1,0));
    accum = accum + smp*deltabase.xxxx;
    // Weight the bottom row by (1 - fractional y).
    accum = accum * (1.0.xxxx-deltabase.yyyy);
    // Blend the top two texels along x, then add them in weighted by fractional y.
    smp = f4texRECT(tex, deltabase.zw+float2(0, 1));
    float4 tmp = smp*(1.0.xxxx-deltabase.xxxx);
    smp = f4texRECT(tex, deltabase.zw+float2(1,1));
    tmp = tmp+smp*deltabase.xxxx;
    accum = accum + tmp*deltabase.yyyy;
    return accum;
}

And with a logarithm, a couple of derivative operations, a couple of dot products, and a mad or two, this should be extendable to perform trilinear filtering as well.
And, again, this is all emulation. You can't flip a bit (i.e. enable bilinear/trilinear filtering) and expect the hardware to give you the desired result.
 
OpenGL guy said:
And, again, this is all emulation. You can't flip a bit (i.e. enable bilinear/trilinear filtering) and expect the hardware to give you the desired result.

I don't see why that couldn't be done with an HLSL. After all, shouldn't constant-based branching be available no matter what?
 
You can't flip a bit (i.e. enable bilinear/trilinear filtering) and expect the hardware to give you the desired result.

What's the "desired result" when the texture is being used to store depth? What about vertex positions? Specular exponents? BRDF samples? Projected areas? The developer knows (or at least should know) far more about the right way to filter the data in the texture (if it should be filtered at all) than the driver -- and certainly more than the average user.

Unless you expect that the average use for floating-point textures will be as a replacement for current 8-bit-per-channel textures, linear filtering will not be acceptable in most cases.

I wouldn't call that "native" support. I would call that emulation.

Then it's a semantic issue - I'd say that any execution path which runs efficiently on a processor without external involvement is completely native. Since fetch bandwidth will dominate performance for floating point textures, given the option of a programmable floating point multiplier or a multiplier hardwired to the texture filtering engine, I'd rather have the multiplier available to do with as I please. My texture fetch might not be quite as efficient as if it were hardwired, but that performance hit should be more than offset by the additional performance an extra multiplier would offer during non-texturing operations.

I was obviously smoking something pretty strong when I wrote my bilinear filter code... LERP is much more efficient than doing the MADs myself.
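For reference, a LERP-based version might look something like this (FilterFloatBufferLerp is just a name made up for the sketch; it uses the same four f4texRECT fetches, but the weighting collapses into three lerps):

float4 FilterFloatBufferLerp(samplerRECT tex, float2 uv) {
    float2 f    = frac(uv);                        // blend weights
    float2 base = floor(uv);                       // lower-left texel address
    float4 s00 = f4texRECT(tex, base);
    float4 s10 = f4texRECT(tex, base + float2(1, 0));
    float4 s01 = f4texRECT(tex, base + float2(0, 1));
    float4 s11 = f4texRECT(tex, base + float2(1, 1));
    // Blend along x for the bottom and top rows, then along y.
    return lerp(lerp(s00, s10, f.x), lerp(s01, s11, f.x), f.y);
}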
 
I think this whole too programmable/not programmable enough discussion is a bit silly when we all have very different motives. One might be a gamer (make my games run fast and nice), one might be a developer (I'll publish this game in 2004, but it would be nice to have the engine ready now). There might also be some researchers working on new algorithms using the new (G/V)PUs. And then of course there is the bashing of competitors by fanboys.

Given something like 5 years, there *will* be enough programmability and speed. In light of this, it's not so productive to argue about vendor xxx's or yyy's approach being better when they take different routes. But currently I don't see the vendors taking very different routes, with the possible exception of 3Dlabs. I'm quite curious about their refresh of the P10.

I think what currently matters for gamers is good enough performance for most of the stuff used NOW and maybe in one year's time (Doom 3?). If the new features, or the "emulations" of old features, aren't that fast, it doesn't matter to the developer as long as the features can be used to create future games. And as for the gamers, why would they care that the hardware runs some feature slowly when their games don't even use it?

What matters is that the hardware available when games using extreme programmability come out is fast enough for those games. Even then there might be some new features that are slow and not used anywhere outside technology demos. And the gamer is pissed because he always has to buy new hardware, while the developer might be satisfied with slower hardware as long as it has the features, solid drivers, etc.

What matters, at least to me, is that this programmability arrives sooner rather than later. I wouldn't mind coding for slower, more programmable hardware, at least as long as it is faster than the CPU in my computer. And I quite like the idea of writing everything myself, but for commercial solutions I don't see it as a very viable option, at least not for small companies. Hence there is a niche market for some cool close-to-the-hardware guys who would make nice, fast and robust support libraries that work on all platforms... Most likely we'll see the chipmakers building most of these support libraries themselves.

And I think this is my first post and it's all off topic :) so shoot me.
 
I think gking and I have mentioned this several times in various threads already, but in how many cases would you actually desire bilinear or trilinear filtering on a floating point map?

As for the ability to output multiple (non-packed) 128-bit FP values per pixel, again, in which cases would this be desirable compared to outputting multiple (packed) 32-bit values?

The way I see it, either you are outputting a single 128-bit FP pixel so that you can use it as input to another pass, OR you want to output multiple 32-bit pixels to do 2D image manipulation/multisampling/post-filtering.

The question is, what is the performance drop when outputting four 128-bit pixels on the R300? You're talking about 16 times the bandwidth of a 32-bit pixel write. Moreover, depending on the layout of these pixels, you might drive memory efficiency down even further. The packed solution seems to cover the common case, eats up no more bandwidth than a single 128-bit pixel, and requires no extra memory swizzling tricks.

This seems like an R300 feature that will rarely be used because of its performance implications. Moreover, if you wanted to do it on other platforms, you could just go the multipass route.


I doubt many of these exotic features of either the NV30 or R300 are going to be used unless they can be hidden in a high level library or language. Devs are gonna stick with DX9. I'm sure ATI and NVidia will ship lots of demos showing just how you can use these features for a special effect, but I doubt they will make any difference in the long run until DX10.
 
DemoCoder said:
I think gking and I have mentioned this several times in various threads already, but in how many cases would you actually desire bilinear or trilinear filtering on a floating point map?

One case that comes to mind immediately is high dynamic range lightmaps.
 
One case that comes to mind immediately is high dynamic range lightmaps.

If RGBE or rescaling the lightmap won't suffice (they do in a surprising number of cases), then you either have to pack portions of the floating point numbers into 32-bit textures to be reconstituted in the shader, or filter a floating point texture in the shader.

However, it's still my opinion that anyone who cares enough about quality to store lightmaps in float textures should probably be doing bicubic, gaussian, summed-area, or some other fancy filtering on them. Linear artifacts are just ugly.
 
Anyway... HLSLs will solve many of the problems. But we will need predefined functions for fetching bi-/trilinear and anisotropic etc. texels. Then the driver may or may not support them natively, with performance being the biggest difference. As long as we can hide hardware details behind function names that the IHVs can implement whatever way they want, we should be pretty fine. We also need to get away from the limitations on the number of instructions/constants/textures etc. As long as we have limits on these kinds of things, an IHV's implementation of certain functions may break a certain shader.
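For instance, a bilinear float fetch could be exposed through something like this (texRECTbilinear is just a made-up name here; the point is that the IHV is free to map it straight to native filtering hardware, or to expand it into arithmetic like gking's routine earlier in the thread):

// Hypothetical predefined function in an HLSL's runtime library.
// On hardware with native float filtering it compiles to a single
// filtered fetch; elsewhere it expands into shader arithmetic.
float4 texRECTbilinear(samplerRECT tex, float2 uv)
{
    return FilterFloatBuffer(tex, uv);   // fallback: filter in the shader
}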

Anyway, cubemaps/3D textures should still be supported natively IMO. Whether the hardware really lays them out on a 2D surface or not doesn't matter, but the developer should be able to upload it as a cubemap and use it as such.
 
gking said:
One case that comes to mind immediately is high dynamic range lightmaps.

If RGBE or rescaling the lightmap won't suffice (they do in a surprising number of cases), then you either have to pack portions of the floating point numbers into 32-bit textures to be reconstituted in the shader, or filter a floating point texture in the shader.

However, it's still my opinion that anyone who cares enough about quality to store lightmaps in float textures should probably be doing bicubic, gaussian, summed-area, or some other fancy filtering on them. Linear artifacts are just ugly.

Using floats doesn't necessarily mean that you care about quality; you might just be interested in being able to light a surface with an intense light and make it brighter than the base texture, to the point where it turns white. Can be used for loads of cool effects.
 
nice thread forming up here. as long as the kind participants in it try to refrain from statements like 'part A sux as viewed from my hilltop' we may actually enjoy it a bit longer.

OpenGL guy said:
gking said:
OpenGL guy wrote:
NV30 can't handle floating point cubemaps natively. Is it a big deal? I don't know, as I am not a developer.

Of course it can. The only issue is whether there are dedicated transistors for it, or whether the developer (or the chip) co-opts some other execution unit to provide that functionality.

I wouldn't call that "native" support. I would call that emulation.

since when is 'emulation' a dirty word? consider this - ever since the p2, IA32 has effectively been emulated - does anybody consider a pre-pentium part overall better than a p2 and on? i think we could all agree that if the emulated feature compares to the hardwired feature both performance- and convenience-wise, then we can all call it a day for the emulation.

ogl guy said:
Third, NV30 can't filter floating point textures

gking said:
Sure it can -- this highly unoptimized routine (which I posted on OpenGL.org) does exactly that...

ogl guy said:
And, again, this is all emulation. You can't flip a bit (i.e. enable bilinear/trilinear filtering) and expect the hardware to give you the desired result.

here's a totally lame question from me: purely feature-wise, what would you prefer - a processing unit that has routine A as a built-in instruction, or a processing unit that allows you to implement routine A, along with a bunch of others? i, for one, would prefer the latter. of course, the topic of this discussion is GPUs in particular, not CPUs, XPUs or YPUs. so here we have at hand this trade-off - fixed functionality vs programmable functionality at the cost of performance. now, apparently nvidia, along with gking, democoder and others, believe the time has arrived when we have the means to shift the slider of the above trade-off more towards the higher-programmability end, as this is what we've all been wishing for.
 
Humus said:
Anyway... HLSLs will solve many of the problems. But we will need predefined functions for fetching bi-/trilinear and anisotropic etc. texels. Then the driver may or may not support them natively, with performance being the biggest difference. As long as we can hide hardware details behind function names that the IHVs can implement whatever way they want, we should be pretty fine. We also need to get away from the limitations on the number of instructions/constants/textures etc. As long as we have limits on these kinds of things, an IHV's implementation of certain functions may break a certain shader.
100% agreed.

Humus said:
Using floats doesn't necessarily mean that you care about quality; you might just be interested in being able to light a surface with an intense light and make it brighter than the base texture, to the point where it turns white. Can be used for loads of cool effects.
Why would you need float textures for this?


Oh, and why is this discussion about filtering coming up anyway? Neither of the next-generation chips supports floating point texture filtering.
 
Humus wrote:
you might just be interested in being able to light a surface with an intense light and make it brighter than the base texture, to the point where it turns white. Can be used for loads of cool effects.

Yes, it does look really cool -- but you don't need float textures for it, especially if perfect image fidelity or physically accurate rendering isn't your final goal. You just need to be able to represent a value greater than 1 in the shader.

Whether that means using RGBE (which decodes as R*2^m, G*2^m, B*2^m, where m = 255*E - 128), a scaled RGBA texture (with an optional per-pixel scale factor in A), passing a light radiance constant to the shader in the range [0..inf) rather than just [0..1], or using float textures is irrelevant.
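Decoding RGBE in the shader is only a couple of instructions on top of an ordinary 8-bit RGBA fetch -- something like this sketch (DecodeRGBE is just a made-up name, and it assumes the texture unit returns the channels in [0..1]):

// Fetch an 8-bit/channel RGBE texel and expand it to a high dynamic
// range colour: rgb * 2^(255*E - 128), with E stored in the alpha channel.
float4 DecodeRGBE(sampler2D tex, float2 uv)
{
    float4 rgbe  = f4tex2D(tex, uv);                  // all channels in [0..1]
    float  scale = exp2(rgbe.a * 255.0 - 128.0);
    return float4(rgbe.rgb * scale, 1.0);
}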

Just because float textures are available doesn't mean old texture types have been deprecated. Arbitrary swizzling, generalized dependent lookups, floating-point intermediate registers, and per-pixel exponentiation operations provide a great deal of added functionality to old texture formats -- it might require some clever bit-twiddling to figure out how to stuff your data into 8-bit/channel textures, but in many cases it can be done -- especially if what you are trying to stuff was originally image data.
 
DemoCoder said:
In PS. inf

Sure it can. The only thing I don't see being supported is rrgg; you could do rrgb, just not 4-to-2 and replicated. I wouldn't say it can't support swizzling.
 
hax said:
DemoCoder said:
In PS. inf

Sure it can. The only thing I don't see being supported is rrgg; you could do rrgb, just not 4-to-2 and replicated. I wouldn't say it can't support swizzling.
That's not exactly what I'd call arbitrary swizzle, but hey, you can "emulate" it ;)
 
Basic said:
hax:
Are you saying that it can do any swizzle, except those that replicate two source components?

Yes, kind of. You just can't end up with rrgg; you can get rggg (which is 2 replicated). On the rgb instruction you can pull in any of the argb components (like aaa, bbb, rrr, ggg, abg, bgr, etc.). On the alpha side you can pull in any channel, completing the 4 channels.
 
alexsok said:
That's not exactly what I'd call arbitrary swizzle, but hey, you can "emulate" it ;)

I guess that's the reason "Swizzling" is marked as not supported by R300 in nVidia's papers...

I'd call that marketing BS against a competitor.
 