NVIDIA CineFX Architecture (NV30)

Yeah, I get the feeling that John Carmack is just going to go bananas...not to say that anything else sucks in comparison, but man...this thing _seems_ to be a developer's dream, as far as flexibility is concerned.

Anybody want to place a bet that more and more information is going to be leaked out over...oh, let's just say...the next 30 days or so?
 
Hmm. Somehow, I can't help thinking that this paper was written by the marketing department . . . PS2.0+ ? ;)

But still, some interesting information in it. Hmm. Is dynamic branching only supported for Vertex Shaders?

ta,
-Sascha.rb
 
Typedef Enum said:
Anybody want to place a bet that more and more information is going to be leaked out over...oh, let's just say...the next 30 days or so?

:)

NVidia's silence since the R300 paper release has been absolutely deafening.

I was thinking that they were either surprised by the R300 or not impressed at all. After seeing this, I'm starting to think it's the latter.
 
Typedef Enum said:
Yeah, I get the feeling that John Carmack is just going to go bananas...

Aforementioned John :worship: must certainly have a prototype board by now ...

( this emoticon suits the name so well .. kudos to whoever posted it first :D )
 
Hmm, this definitely seems to be a marketing paper.
The vertex-shader specs look very similar to the R300's (sure, that's because both are designed around DX9). It seems to have nearly the same functionality and limitations (1024 static instructions; both NV30 and R300 have loops and subroutines).
The pixel shader seems to be missing flow control. The support of up to 1024 texel reads and 1024 instructions sounds really impressive (compared to 16/32 texel reads and 160 instructions), but I don't think that even a nearly fully used R300 pixel shader will be useful for real-time graphics (maybe a few generations later).
 
Geeforcer said:
Shatters existing limitations: the number of instructions supported is increased
from 128 to 65,536 through the use of data-dependent branching and more
instructions, registers, and constants.

What do they mean by "increased from 128 to 65,536 through the use of data-dependent branching and more instructions, registers, and constants."? Are they using marketing speak to count how many times you can loop back to execute instructions over again? If they don't mean that, how could they slip and use "through" instead of "with" in a public presentation?

Educate me.
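One plausible reading of the 65,536 figure is that it counts *executed* instructions: a fixed program body replayed through data-dependent loops. A toy sketch of that arithmetic (the 256-instruction body and 256 iterations below are illustrative assumptions, not NVIDIA's published numbers):

```python
def executed_instructions(static_instructions: int, loop_iterations: int) -> int:
    """Total instructions executed when the whole program body is looped."""
    return static_instructions * loop_iterations

# A 256-instruction body looped 256 times reaches the quoted ceiling:
assert executed_instructions(256, 256) == 65_536
```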

Provides greater flow control: dynamic loops and branches provide for forward and
backward changes in flow; call and return functions have been introduced, and
vertex processing can also invoke an early exit on program termination.

This does indeed sound powerful, but how is this different than the DX 9 spec? Is it the keyword "dynamic" used with "forward and backward changes in flow"? And/or "call and return"? I guess I'm asking for where the DX 9 spec is listed.

Introduces new capabilities: per-component condition codes and write masks.
Evolves to an advanced instruction set: new instructions and capabilities including
branching (BRA), high-precision trigonometric functions (COS, SIN), and high-precision
exponentiation and logarithm functions (EX2, LG2, and others).

Again, all of this sounds very powerful (and I have to think specifically targeted at NV30, not a later part as someone suggested), but where is the info that says this is not a standard part of DX 9?

The key benefit I saw for the NV30 (compared to the R300) was the 1024-instruction limit on pixel shaders and perhaps the term "conditional write masks" (I don't know what it means...but perhaps the earlier comment about calculating two outputs and conditionally applying one or the other relates to this).

What does "Enhances fragment program storage: stored in video memory, unlike vertex programs, bringing costs down for managing lots of fragment programs." imply? There seems to be a wealth of info that could be deduced from that, but I don't have the knowledge. Again, educate me. ;)
 
The inherent 16- and 32-bit floating point formats (FP16 and FP32) of the NVIDIA "CineFX" architecture give developers the flexibility to create the highest-quality graphics. FP32 offers the ultimate image quality, delivering true 128-bit color. FP16 provides an optimal balance of image quality and performance. In fact, the FP16 format offers exactly the same format and precision that Industrial Light and Magic and Pixar use for production of their feature films and special effects.

I wonder if they are just talking about increased frame buffer/memory performance or something at the hardware level when in FP16 mode.

[edit] Reading on I'm not so sure they are just talking about the frame buffer:

Developers are free to move back and forth between these formats in their application, using the format that is best suited to a particular computation. For instance, some actions such as indexing into a texture can only be optimally accomplished using a 32-bit floating-point format. If the texture is larger than 1024 x 1024 (2^10 x 2^10, requiring at least 20 bits), the developer needs FP32 to access all of the data. Other computations can be accurately accomplished using FP16, and can benefit from the maximized execution speed afforded by this level of precision.
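The texture-indexing point can be checked numerically. The sketch below round-trips values through IEEE half precision using Python's `struct` module; it's an illustration of the precision limit, not code from the paper:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE half precision (format 'e')."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def to_fp32(x: float) -> float:
    """Round-trip through IEEE single precision."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

# FP16 has 10 mantissa bits, so it represents every integer only up to
# 2048; beyond that the spacing between representable values is 2.
assert to_fp16(1024.0) == 1024.0   # a 1024-wide texture is still addressable
assert to_fp16(2048.0) == 2048.0
assert to_fp16(2049.0) == 2048.0   # texel index 2049 is unreachable in FP16

# FP32's 23 mantissa bits easily cover the ~20 bits a large texture needs.
assert to_fp32(2049.0) == 2049.0
```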
 
According to the developer site, they'll be releasing a whole bunch of white papers and presentations from Siggraph over the course of the next week. Maybe it would be a good time to keep tuned!? ;)
 
Ahem, what is GEOMETRY displacement mapping?
Could this be what JC was referring to when he rather petulantly snapped at Matrox's displacement mapping? He originally thought that they were performing DM via quads but then revised his comment when he realised this wasn't true. He then went on to say something along the lines of "I was getting confused with another company that is using quads".
 
Maybe Geometry Displacement Mapping merely means that the displacement value (sampled from the displacement map) is available in a register in the vertex shader so that the vertex itself can be displaced?

Let there exist a new register called D0 available in the vertex shader that contains a value sampled from the displacement map

You can then write code like this

Position = Position + D0 * normal

Which would move the vertex in the direction of the normal by D0 amount.

Or, you could displace the geometry in other kooky ways

Let C0 = the position of the center of our primitive object

R = Position - C0
Position = Position + D0 * Normalize(R)

This could be used with an animated displacement map (or via other parametric equation) to "explode" the object with a map.


Another possibility is the ol' fashioned heightmap

Position = Position + <0,1,0> * D0


Anyway, that's the theory. This "make displacement value available in vertex shader register" was discussed during a GDC presentation.
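For the curious, the two displacement schemes above can be sketched in plain Python (ordinary vector math standing in for vertex-shader code; all names here are illustrative):

```python
def displace_along_normal(position, normal, d0):
    """Position = Position + D0 * normal."""
    return tuple(p + d0 * n for p, n in zip(position, normal))

def displace_radially(position, c0, d0):
    """R = Position - C0; Position = Position + D0 * Normalize(R)."""
    r = [p - c for p, c in zip(position, c0)]
    length = sum(x * x for x in r) ** 0.5 or 1.0  # guard the degenerate case
    return tuple(p + d0 * (x / length) for p, x in zip(position, r))

# Height-map case: displace straight up the y axis by the sampled value
assert displace_along_normal((1.0, 0.0, 3.0), (0.0, 1.0, 0.0), 2.5) == (1.0, 2.5, 3.0)
# "Explode": a vertex 3 units from the centre moves 1 unit further out
assert displace_radially((3.0, 0.0, 0.0), (0.0, 0.0, 0.0), 1.0) == (4.0, 0.0, 0.0)
```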
 
Maybe Geometry Displacement Mapping merely means that the displacement value (sampled from the displacement map) is available in a register in the vertex shader so that the vertex itself can be displaced?

I think it is.
 
Well, I'm not that impressed. Seems a lot like "damage control" to me. Basic nVidia propaganda, with few juicy bits. Annoying ones are: "Industry-standard Cg" and "World's most powerful GPUs". Both claims are false.

About the NV30 specs: yes, the pixel shaders look good, BUT there isn't flow control, and the vertex stuff seems about the same as the R300's. Texture count seems to be about the same too (per pixel). Also, maybe someone can clear this up for me: why is it such a big deal that there's a max of 1024 instructions for the pixel shader to execute?

That is, what are you going to do with programs that long in real-time use?! Probably excellent for converting RenderMan shaders and speeding up offline rendering, but in real-time use?
 
eSa said:
About the NV30 specs: yes, the pixel shaders look good, BUT there isn't flow control, and the vertex stuff seems about the same as the R300's. Texture count seems to be about the same too (per pixel). Also, maybe someone can clear this up for me: why is it such a big deal that there's a max of 1024 instructions for the pixel shader to execute?

That is, what are you going to do with programs that long in real-time use?! Probably excellent for converting RenderMan shaders and speeding up offline rendering, but in real-time use?

If the textures can be reused by different texture coordinates, 16 textures should be enough for many effects. Furthermore, many operations will not need table look-up at all if you have so many instructions to do it.

And I think limitless is better than limited. 1,024 instructions is close to limitless for current applications (and implementations). That's the point. Perhaps some people will want to use them for some nice off-line rendering.
 
eSa said:
About the NV30 specs: yes, the pixel shaders look good, BUT there isn't flow control, and the vertex stuff seems about the same as the R300's. Texture count seems to be about the same too (per pixel). Also, maybe someone can clear this up for me: why is it such a big deal that there's a max of 1024 instructions for the pixel shader to execute?

Well, if you don't have loops/branching, then you have to unroll the pixel shader and use conditional assignment. Therefore, a bigger max limit (1024 vs 96 in DX9) makes a lot of difference.
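The unrolling-with-conditional-assignment idea boils down to a branch-free select: evaluate both paths for every pixel, then blend by a 0/1 mask. A minimal sketch (names and values are made up for illustration):

```python
def select(cond, a, b):
    """Branch-free select: cond is 0.0 or 1.0; returns a when 1, b when 0."""
    return cond * a + (1.0 - cond) * b

# "if height > 0.5: rock else grass", unrolled for one pixel:
height = 0.75
mask = 1.0 if height > 0.5 else 0.0   # in a shader: a compare instruction
rock, grass = 0.4, 0.9                # both texture reads always happen
assert select(mask, rock, grass) == 0.4
```

Since both sides are always computed, every eliminated branch costs instructions on *every* pixel, which is why the higher ceiling matters.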

eSa said:
That is, what are you going to do with programs that long in real-time use?! Probably excellent for converting RenderMan shaders and speeding up offline rendering, but in real-time use?

Yes, no one is going to write realtime pixel shaders much longer than 16-32 instructions. On GF3 IIRC, each pixel shader op takes approximately 1/2 a cycle, and you can do 2 shader (RGB) ops per cycle. Even assuming that the NV30 and R300 are beefed up and can execute say, 4 color ops per cycle, a 32 instruction shader could chew up 8 clocks. Now consider 128 instruction shader. It will chew up 32 clocks, stalling the vertex pipeline and everything else, and putting a whammy on performance.

The long shaders are nice for previews of offline renders, but I doubt the shader pipelines are fast enough to make longer programs worthwhile. Maybe when the NV40 and R300 on 0.10 micron hit 800MHz and they can dispatch up to 8-16 pixel shader ops per cycle, you will see the longer programs. For now, they will kill fillrate and make you crank down to 1024x768 or 800x600 if you want the eye candy.
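The clock arithmetic above can be written down explicitly (the 4-ops-per-clock figure is the post's assumption, not a published NV30/R300 spec):

```python
import math

# Back-of-the-envelope cost model: with N pixel-shader ops per pixel and
# K color ops issued per clock, a pixel occupies the shader pipeline for
# ceil(N / K) clocks.
def shader_clocks(num_instructions: int, ops_per_clock: int) -> int:
    return math.ceil(num_instructions / ops_per_clock)

assert shader_clocks(32, 4) == 8     # a 32-instruction shader: 8 clocks/pixel
assert shader_clocks(128, 4) == 32   # a 128-instruction shader: 32 clocks/pixel
```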
 
DemoCoder said:
Even assuming that the NV30 and R300 are beefed up and can execute say, 4 color ops per cycle, a 32 instruction shader could chew up 8 clocks. Now consider 128 instruction shader. It will chew up 32 clocks, stalling the vertex pipeline and everything else, and putting a whammy on performance.

Once we have DX9/DX10 in place, I hope we're going to move from fillrate marketing to ops-per-cycle marketing with regard to vertex and pixel shader performance. It'll be quite interesting to see a DX test that can expose just how powerful a given shader engine is. In a not-too-distant future, every pixel will probably be rendered with a shader (simple or complex), so ops per cycle is what we want much more of! (And please inquire about this figure when doing a tech preview on future hardware, guys...)
 
It says 256 temp registers. Do you think they mean 256 32-bit words or 256 128-bit words? I think if they were going to introduce the NV31 at 64-bit and the NV30 at 128-bit, it would be easier for NVIDIA if it were 256 32-bit words. Which may be funny if true, because then the NV31 would have more registers to use (when operating at full precision).
 
Yes, once you have 1600x1200 FSAA running at monitor refresh rates and acceptable levels of overdraw, ops per cycle becomes more important. Since DX9+ lessens the need for multipass, and more and more will be done in one pass going forward, it's not fillrate that matters (since the screen can be filled many times over at refresh rates at 1600x1200 with 4X FSAA) but how much stuff you can do per pixel.


Stencil fillrate will probably still be important in the near term though, but long term I would like to see 3D pipelines that can issue 16 or even 32 ops per cycle per pipeline.
 
What I would want to see is:
a/ What games will use this in the near future?
b/ which GPU is fastest in CURRENT games?
 
K.I.L.E.R said:
What I would want to see is:
a/ What games will use this in the near future?
b/ which GPU is fastest in CURRENT games?
These, and also how these new features will affect current games and games in the near future (e.g. the increased color precision per component).

Maybe we won't have to worry about banding any more in any games, past or present. :)
 