A Few Notes on Future NV Hardware

Dave Baumann

I've just got back from NVIDIA's "Dusk-till-Dawn" developer event. I'll have an article with an overview of the event up sometime, but here are a couple of little hardware details picked up from the event:

  • W-Buffer support dropped from future NV hardware (use FP calculations if you want higher precision).
  • Sounds as though NVIDIA has interpreted the precision spec in DX to allow them to always calculate the PS stages in FP16. The _PP precision hint seems to only apply to texture co-ordinates.
  • Z and Stencil "are one buffer" so they should be cleared at the same time. Z-Cull is disabled with stencils.
  • Numerous references to "Second texture runs at full speed". Mmmmm...
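
To illustrate the "one buffer" point, here is a minimal Python sketch. It assumes a packed 32-bit word per pixel with 24 bits of depth and 8 bits of stencil (a D24S8-style layout); the actual NV3x layout was not given at the event, and all helper names are hypothetical. Clearing both fields is a plain overwrite, while clearing only one of them needs a read-modify-write of every word.

```python
# Hypothetical model of a packed depth/stencil buffer: 24-bit depth in the
# high bits, 8-bit stencil in the low bits (layout assumed, D24S8-style).
DEPTH_BITS = 24
STENCIL_BITS = 8
STENCIL_MASK = (1 << STENCIL_BITS) - 1
DEPTH_MAX = (1 << DEPTH_BITS) - 1


def pack(depth, stencil):
    """Pack an integer depth and stencil value into one 32-bit word."""
    return ((depth & DEPTH_MAX) << STENCIL_BITS) | (stencil & STENCIL_MASK)


def clear_both(buffer, depth=DEPTH_MAX, stencil=0):
    """Clear depth and stencil together: every word is simply overwritten."""
    word = pack(depth, stencil)
    for i in range(len(buffer)):
        buffer[i] = word


def clear_stencil_only(buffer, stencil=0):
    """Clear stencil alone: each word is read, masked and written back."""
    for i in range(len(buffer)):
        buffer[i] = (buffer[i] & ~STENCIL_MASK) | (stencil & STENCIL_MASK)


pixels = [pack(1000, 3), pack(2000, 7)]
clear_stencil_only(pixels)   # depth values survive, stencil becomes 0
print([hex(p) for p in pixels])
clear_both(pixels)           # whole word replaced in one write per pixel
print([hex(p) for p in pixels])
```

This is only a software model; it says nothing about how much a partial clear actually costs on real hardware.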
 
Is this not the same as for the R300? Or something similar?

Yes. NV3x sounds as though it will operate in exactly the same way as R300 hardware does in this respect.
 
DaveBaumann said:
Sounds as though NVIDIA has interpreted the precision spec in DX to allow them to always calculate the PS stages in FP16. The _PP precision hint seems to only apply to texture co-ordinates.

LOL! That's exactly what I thought they'd do! It's kinda too bad. . . I thought I was just being a pessimistic bastard. . .
 
Did they mention anything or did you pick up any "hints" regarding their next-gen, unified shading model?

MuFu.
 
DaveBaumann said:
Sounds as though NVIDIA has interpreted the precision spec in DX to allow them to always calculate the PS stages in FP16. The _PP precision hint seems to only apply to texture co-ordinates.

How inventive of them...

Now where did my large texture maps go - or at least, where did all my bilinearly interpolated texels go... :?
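
A rough way to see the worry about large texture maps, as a Python sketch. It assumes texture co-ordinates are held as IEEE-style half floats (s10e5) in the normalized [0, 1) range; the 2048-texel texture and the helper names are hypothetical, chosen only for illustration.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half (s10e5) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def distinct_coords(texture_size, subdivisions=8):
    """Sample `subdivisions` positions per texel across the upper half of
    the [0, 1) co-ordinate range and count how many distinct half-precision
    values survive. Returns (distinct_values, positions_sampled)."""
    samples = (texture_size // 2) * subdivisions
    step = 1.0 / (texture_size * subdivisions)
    coords = {to_half(0.5 + i * step) for i in range(samples)}
    return len(coords), samples


distinct, sampled = distinct_coords(2048)
print(f"{distinct} distinct FP16 co-ordinates for {sampled} sample positions")
```

Between 0.5 and 1.0 a half float steps in units of 2^-11, which is exactly one texel of a 2048-wide texture, so under these assumptions there is no fractional part left over for bilinear weights in that half of the map.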
 
Ostsol said:
DaveBaumann said:
Sounds as though NVIDIA has interpreted the precision spec in DX to allow them to always calculate the PS stages in FP16. The _PP precision hint seems to only apply to texture co-ordinates.

LOL! That's exactly what I thought they'd do! It's kinda too bad. . . I thought I was just being a pessimistic bastard. . .

The good news is that nVidia is wrong! Amar has just confirmed to me (on DirectX mailing list) that temp registers have to support s16e7 (i.e. 24 bit floats). It appears that there was a typo in the spec.

If this hadn't been corrected, it would have set D3D back several years. This is going to cause havoc with benchmarks: when nVidia 'fix' the drivers and use 32 bit mode, how much LOWER will they score with pixel shader 2?

Now I've got to convince them to fix their drivers... but at least I can publish my ShaderX2 stuff without worry.

Pissed off at 3pm, happy at 10pm (with getting stuck in a bomb scare for 3 hours, that's not bad going).
 
DaveBaumann said:
W-Buffer support dropped from future NV hardware (use FP calculations if you want higher precision)

Uh-oh, DS, where are you?
It's not just ATI now - it must be a global conspiracy to destroy your games.
But at least we won't have to hear one-sided anti-ATI rants about this anymore...
 
Thanks Dean

The good news is that nVidia is wrong! Amar has just confirmed to me (on DirectX mailing list) that temp registers have to support s16e7 (i.e. 24 bit floats). It appears that there was a typo in the spec.

I assume that is "at least 24 bit floats"? i.e. I assume NV3x will actually operate at FP32 precision, rather than FP24?

This is going to cause havoc with benchmarks: when nVidia 'fix' the drivers and use 32 bit mode, how much LOWER will they score with pixel shader 2?

You think the drivers are currently using FP16?
 
Yep, of course you're correct: for PS_2_0 the minimum precision for temp registers has to be s16e7. Same for texture registers, but constant registers can be a minimum of s10e5 and colour iterators can be low precision with a range of 0-1.

Woah. Must have been on your train home?
The bomb scare was at King's Cross, you must have got your train just before.
I got in the house and made the post straight away and Amar replied very quickly. As you know I was not a happy bunny when nVidia told us, so it was the first thing I wanted to do.

Now the hard bit is going to be convincing them to redo the drivers! Let's hope we can get this cleared up before the GeforceFX boards ever go really live.

I wonder if I've just made an enemy of a large corporation... hope not, they do throw such good parties :)
 
For Dave (and others) ref, here's a quick key quote from the list that DeanoC is talking about.

- For ps_2_0 compliance, the minimum level of internal precision for temporary registers (r#) is s16e7** (this was incorrectly s10e5 in spec)
- The minimum internal precision level for constants (c#) is s10e5.
- The minimum internal precision level for input texture coordinates (t#) is s16e7.
- Diffuse and specular (v#) are only required to support [0-1] range, and high-precision is not required.
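
For reference, the sXeY names translate into bit counts and step sizes as in the short Python sketch below. It assumes the usual reading of the notation (one sign bit, X mantissa bits, Y exponent bits); s23e8, i.e. IEEE single precision, is not in the quote and is added only for comparison.

```python
# Mantissa/exponent bit counts read off the sXeY names in the quote above.
# s23e8 (IEEE single precision) is an addition for comparison only.
FORMATS = {"s10e5": (10, 5), "s16e7": (16, 7), "s23e8": (23, 8)}


def total_bits(mantissa_bits, exponent_bits):
    """Storage width: one sign bit plus mantissa plus exponent."""
    return 1 + mantissa_bits + exponent_bits


def relative_step(mantissa_bits):
    """Spacing between adjacent representable values in [1, 2)."""
    return 2.0 ** -mantissa_bits


for name, (m, e) in FORMATS.items():
    print(f"{name}: {total_bits(m, e)} bits, step near 1.0 = {relative_step(m):.3e}")
```

On that reading, s16e7 resolves values 64 times more finely than s10e5, since it has six more mantissa bits.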

It seems to me (I'm no dev) that an FP16 pipeline might get in a bit of trouble here, although most of us are still in the dark about the NV30 implementation.
 
jjayb said:
So the nv30 will have Chalnoth's so-called "ati stencil problems" in doom 3 also? :LOL:
Possibly. Will need some tests. The disabled technology is definitely called by a different name between the two cards. How different is it in reality?
 
Chalnoth said:
Possibly. Will need some tests. The disabled technology is definitely called by a different name between the two cards. How different is it in reality?

The upshot, Chalnoth, is that regardless of how they will perform in any single title they both have exactly the same characteristics in regards to Z removal and stencil ops.
 
DeanoC said:
The bomb scare was at King's Cross, you must have got your train just before.

Thameslink is not joined to the main King's Cross station, prolly why we avoided it. Heh - sounds like fun getting home for everyone, what with bomb scares at train stations and the army around Heathrow!

Now the hard bit is going to be convincing them to redo the drivers! Let's hope we can get this cleared up before the GeforceFX boards ever go really live.

What makes you think they are already using FP16 in the drivers?

I wonder if I've just made an enemy of a large corporation... hope not, they do throw such good parties :)

Yes, shall have to dig out my camera! ;)
 
Sometimes you need to clear the stencil buffer only. Shadow volumes with multiple lights come to mind.

When can we expect Derek Smart here? :D
 
DaveBaumann said:
DeanoC said:
Now the hard bit is going to be convincing them to redo the drivers! Let's hope we can get this cleared up before the GeforceFX boards ever go really live.

What makes you think they are already using FP16 in the drivers?

Various chats with some of the Dev Rels (I spent a fair few hours after the party/at breakfast chatting with them) seemed to indicate that Dx9 PS_2_0 was running. I was told that I could run my code as soon as I got the board; also, the cgFX demo was run on both OpenGL and Dx9, and some of the demos used long shaders.

As they were warning us about using FP16, they must have written the driver to use it. The backend conversion from PS_2_0 to internal ops will be quite FP16 specific to get good performance (packing temp registers etc).
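
As a rough illustration of what "packing temp registers" could mean, here is a Python sketch that stores two half-precision values in one 32-bit word. How NV3x really lays out its registers is not stated anywhere in this thread, so the layout and the function names here are purely hypothetical.

```python
import struct


def pack_two_halves(a, b):
    """Store two half-precision values in a single 32-bit word (assumed
    layout: `a` in the low 16 bits, `b` in the high 16 bits)."""
    return struct.unpack("<I", struct.pack("<2e", a, b))[0]


def unpack_two_halves(word):
    """Recover the two half-precision values from a 32-bit word."""
    return struct.unpack("<2e", struct.pack("<I", word))


word = pack_two_halves(1.5, -0.25)
print(hex(word), unpack_two_halves(word))
```

If registers were shared like this, a shader compiled for FP16 could keep twice as many temporaries live as the same shader at FP32, which is one reason a backend tuned for FP16 would not switch to full precision for free.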
 
Hey Dave,

Were you the one dancing on the bar with the girls?

:)


The whole FP16/32 issue is very confusing - the presenter struggled with a LOT of the information he was trying to get across.
 