QuadroFx1000 ( NV30GL ) news and pic

DaveBaumann said:
What you mean to say is that potentially an alternative architecture may gain performance by specifying a lower bit rate, since R300's pixel shader processor rate is constant - it will always operate at 96 bits of precision per clock. However, you also have to be sure of what rate the alternative architecture actually executes 64/128-bit instructions at.

Of course. That's a good point, and you're probably referring in part to that 3 instructions/pipe/clock thing on the R300. Yes, it'll be interesting to see how that turns out.

In a similar architecture, 64-bit would probably not be two times faster than 96-bit. But IIRC (and I could be wrong on that), the R300 VS always works at 128-bit (unlike its PS, which works at 96-bit), so the GFFX's 64-bit mode could be very useful against the R300 in geometry-limited situations. Which are so rare it's not really so useful, ah well...


Uttar
 
Uttar said:
In a similar architecture, 64-bit would probably not be two times faster than 96-bit. But IIRC (and I could be wrong on that), the R300 VS always works at 128-bit (unlike its PS, which works at 96-bit), so the GFFX's 64-bit mode could be very useful against the R300 in geometry-limited situations. Which are so rare it's not really so useful, ah well...
There's no 'half float' mode in the VS; it's always 128-bit.
 
But the NV30 does have a "clustered" FP unit architecture, so using less than a full 128-bit 4-tuple results in a performance boost.
 
Xmas said:
There's no 'half float' mode in the VS; it's always 128-bit.

The CineFX documents all seem to indicate the NV30 has FP16 (64-bit) support throughout *all* of the pipeline.
However, after looking at the DX9 SDK again, it seems DX9 only supports "half" (or rather, Partial Precision, as the SDK calls it) in the Pixel Shader, and not in the Vertex Shader.

Could it be that only OpenGL is able to use FP16 in the VS? Or am I just interpreting the CineFX documents wrong?


Uttar
 
Chalnoth said:
I don't think half-floats in the VS are a possibility. The z-errors would be horrendous.

I agree half-floats might give some fairly bad rounding errors in the VS. However, my point is simply that I think the CineFX architecture allows it.
And remember, half-floats can be used for only some parts of a VS program, so you could use floats for Z and half-floats for some other things.


Uttar
 
Chalnoth said:
I don't think half-floats in the VS are a possibility. The z-errors would be horrendous.
That's nothing. Would you bound your world in a 1024-unit-wide cube? I wouldn't :)

ciao,
Marco
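Marco's objection can be made concrete: half floats carry a 10-bit mantissa, so the gap between representable values grows with magnitude and reaches a whole unit by 1024. A quick sketch in Python (using the standard library's half-precision pack format to model the rounding, not any actual GPU path):

```python
import struct

def to_half(x):
    """Round-trip a float through IEEE 754 half precision (10-bit
    mantissa), returning the nearest representable value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Near 1024 (= 2^10) the spacing between adjacent half floats is a
# full unit, so sub-unit position detail is simply lost:
print(to_half(1024.4))   # -> 1024.0
print(to_half(1024.6))   # -> 1025.0

# Near 1.0 the spacing is about 0.001, fine for model-local coords:
print(abs(to_half(1.2345) - 1.2345) < 0.0005)   # -> True
```

So at world coordinates around 1024, vertex positions snap to whole units, which is exactly the "small world" problem.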
 
16fp could be excellent for inputs. It can be high enough precision for a model-local coordinate system. But the calculations are better done in 32fp.

But 16i would be an even better input format in those cases. Or maybe even 4x8i.

Calculations in 16fp could perhaps be useful for vertex lighting, and maybe for normals (as long as the normal isn't used for reflective surfaces or specular lighting on shiny surfaces).
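To illustrate why 16i can beat 16fp as an input format: integer quantization spends its bits uniformly across the model's bounding box instead of clustering precision near zero. A hypothetical sketch (the helper name and the half_extent parameter are my own inventions for illustration, not any real API):

```python
def quantize16(x, half_extent):
    """Map a model-local coordinate in [-half_extent, half_extent]
    to a signed 16-bit integer (the 16i vertex format idea) and back.
    half_extent is the model's bounding-box half-size."""
    q = max(-32767, min(32767, round(x / half_extent * 32767)))
    return q, q * half_extent / 32767

q, restored = quantize16(0.7371, 1.0)
# Worst-case error is half_extent / 32767 regardless of where in the
# range x falls -- uniform precision, unlike a float format.
print(abs(restored - 0.7371) <= 1.0 / 32767)   # -> True
```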
 
Well, 4*8u is fairly commonly used as packed colour data, so supporting it would pretty much be required at least.
 
Basic said:
Calculations in 16fp could perhaps be useful for vertex lighting, and maybe for normals (as long as the normal isn't used for reflective surfaces or specular lighting on shiny surfaces).

Well, I would definitely hesitate to use it for normals, but I could definitely see applications for vertex lighting (thanks, didn't think of that!). In fact, I personally see no reason why the vertex lighting calculations shouldn't be done at 16-bit precision all the time. I wonder if we'll see significantly faster multiple-light polycounts on the FX due to this? I hope so!

And I kind of doubt that half-floats would be useful for input, except in limited demo situations (very small world).
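The vertex-lighting suggestion is easy to sanity-check numerically. Below is a rough model of my own construction (not the NV30 datapath) that rounds every operand and intermediate of an N·L diffuse term to half precision and compares it against full precision:

```python
import struct

def half(x):
    # Round x to the nearest IEEE 754 half-precision value.
    return struct.unpack('<e', struct.pack('<e', x))[0]

def lambert_half(n, l):
    """N.L diffuse term with every operand and intermediate rounded
    to half precision -- a crude stand-in for 16-bit vertex lighting."""
    s = 0.0
    for a, b in zip(n, l):
        s = half(s + half(half(a) * half(b)))
    return max(0.0, s)

n = (0.267, 0.535, 0.802)   # roughly unit-length normal
l = (0.0, 0.0, 1.0)         # light direction
exact = max(0.0, sum(a * b for a, b in zip(n, l)))
print(abs(lambert_half(n, l) - exact) < 0.005)   # -> True
```

An error well under 1/255 of the output range is invisible in an 8-bit framebuffer, which is why diffuse lighting tolerates half precision so well.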
 
Xmas said:
Basic said:
16fp could be excellent for inputs.
Besides the problem that the system that generates those inputs doesn't have native support for 16fp :)
Wouldn't need to. The conversion could be done in the driver to lower bandwidth over the AGP bus. But I still think this wouldn't be all that great for today's games. Maybe for small demos.
 
Chalnoth:
Notice: model-local coordinate system.
If the full range of the coordinate system is just large enough to fit a small model (say, some pick-up item), then even 8 bits per component can be plenty for vertices. The local coordinate system doesn't have to span the whole world.

Xmas:
If the halfs aren't generated in real time, it doesn't matter whether there's any native support. The important part is that the format should be standardized across gfx cards, and I believe it is. A modelling program could easily implement 16fp modes by "truncating" its floats while running, and then converting them when finished.

And you may actually generate the halfs in a system that has native support. (Render to vertex buffer.)
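The pick-up-item example can be put in numbers: with a model-local bounding half-extent of 0.5 units, even signed 8-bit components land within about 0.002 units of the original position. A minimal sketch (helper name and extents are assumptions for illustration):

```python
def quantize8(x, half_extent):
    """Map a model-local coordinate in [-half_extent, half_extent]
    onto a signed 8-bit component and back (a 4x8i-style vertex
    format; half_extent is the item's bounding half-size)."""
    q = max(-127, min(127, round(x / half_extent * 127)))
    return q / 127 * half_extent

# A 0.5-unit pick-up item: the worst-case snap is ~0.002 units,
# i.e. about 2 mm if one unit is one metre.
err = abs(quantize8(0.3123, 0.5) - 0.3123)
print(err <= 0.5 / 127)   # -> True
```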
 
Basic said:
Chalnoth:
Notice: model-local coordinate system.
If the full range of the coordinate system is just large enough to fit a small model (say, some pick-up item), then even 8 bits per component can be plenty for vertices. The local coordinate system doesn't have to span the whole world.

Well, that sort of thing would work very well, but I have doubts about how widely it could be used. You would need to do at least some of the processing at 32-bit precision, if not all of it, so it seems the main gain would be in AGP bus bandwidth, but can that realistically be obtained?

It would also only be useful for dynamic objects in a 3D scene...it's better to have all static objects pre-transformed into world space (according to Vogel, in the UT2k3 engine anyway).
 
Basic said:
But the calculations are better done in 32fp.
So we agree about that. The smaller vertex formats were just for input, to save memory space and/or bandwidth.

And yes, it's best for dynamic models, where you have to change the transform matrix anyway. And even more so for morphing or key-frame interpolated models, since many keyframes can take up a lot of memory. Or how about mesh displacements.
 
[Image: GPUTemps.jpg]

:?:
 
Great...does this mean that, to be fair, we're going to have to run benchmarks under several different environmental conditions and for several lengths of time to get the whole picture?
 