NVIDIA CineFX Architecture (NV30)

"My current work on Doom is designed around what was made possible on the original GeForce, and reaches an optimal implementation on the NV30. My next generation of work will be designed around what is made possible on the NV30."

LOL! It's bad enough that we'll have people quibbling over which new card will better run a game that isn't due for another year.... (at which time even faster cards will be on the market competing for that prize)....

....NOW we'll have arguments about which of these cards is going to run some game better 5 years from now. ;)
 
Well, according to Tom's Hardware, that is not correct:

"Vertex programs can now consist of up to 1024 instructions (previously 128 instr.), but this is only a theoretical number, as loops and jumps naturally allow even larger numbers of successive instructions."

Hopefully he's right...I couldn't reach ATI's website for the real skinny...
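
To put the quoted limit in concrete terms: 1024 is a static program-length cap, and loops multiply the instructions actually executed. A rough sketch with made-up numbers (not NV30 specs):

```python
# Illustrative only: static instruction slots vs. instructions actually
# executed once loops are allowed. All figures below are hypothetical.
static_slots = 1024            # program-length limit quoted above
loop_body = 16                 # instructions inside one loop (made up)
iterations = 64                # loop trip count (made up)
straight_line = static_slots - loop_body

executed = straight_line + loop_body * iterations
print(f"static program size: {static_slots} instructions")
print(f"executed per vertex: {executed} instructions")
```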
 
Ah, but is there any point in 1024 instructions for this generation of hardware?

Even if there is 'backwards compatibility' when talking nv100, which can execute 1024 instructions per pixel per clock, do you think a game programmer will run the same shader on nv30 if the performance is more than two orders of magnitude slower?

The most you can make up for by changing resolution is about 1 order of magnitude (even 1600x1200 vs. 640x480 is only a factor of 6.25)... and AA and anisotropic filtering seem to be less than a factor of 2 now with R9700 (and, I presume, nv30).
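
A quick back-of-envelope check of that argument, using the same figures:

```python
# Resolution alone can't buy back a two-orders-of-magnitude shader cost.
pixels_hi = 1600 * 1200
pixels_lo = 640 * 480
print(pixels_hi / pixels_lo)                        # 6.25x fewer pixels

shader_cost_ratio = 100                             # "two orders of magnitude"
print(shader_cost_ratio / (pixels_hi / pixels_lo))  # still ~16x slower
```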
 
DemoCoder said:
No point in doing multichip boards. Current software renderers are already designed to split up the scene and ship it to multiple computers for rendering. 10 cheap Linux boxes with their 10 separate CPUs, AGP buses, and NV30s will be a lot cheaper, simpler to engineer/maintain, and better performing than a single Linux box with 10 NV30s on a single board.

I just seem to remember J. Carmack mentioning upcoming multichip boards that could do what the Pixar-farms did in realtime.

This could be next-generation stuff though (NV35, R400).
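
For what it's worth, here is a minimal sketch of the "split the scene across boxes" idea quoted above: a sort-first split of one frame into horizontal bands, one band per render node. The node count and resolution are made-up examples, not anything a particular farm uses.

```python
WIDTH, HEIGHT, NODES = 1600, 1200, 10

def band_for_node(node: int) -> tuple[int, int]:
    """Scanline range (y_start, y_end) assigned to one render node."""
    rows = HEIGHT // NODES
    y0 = node * rows
    y1 = HEIGHT if node == NODES - 1 else y0 + rows
    return y0, y1

for n in range(NODES):
    print(n, band_for_node(n))
# Each node renders only its band on its own card, and the bands are
# composited into the final frame over the network.
```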
 
Well, this all sounds nice. What remains to be seen is whether the AA and aniso implementations on NV30 are up to par with the R300.

It would be really nice to see smart, high quality FSAA (for instance, 4-sample 64x Z3 with sample buffer compression). I doubt this will happen though :/

Serge
 
Dio said:
Ah, but is there any point in 1024 instructions for this generation of hardware?

Even if there is 'backwards compatibility' when talking nv100, which can execute 1024 instructions per pixel per clock, do you think a game programmer will run the same shader on nv30 if the performance is more than two orders of magnitude slower?

The most you can make up for by changing resolution is about 1 order of magnitude (even 1600x1200 vs. 640x480 is only a factor of 6.25)... and AA and anisotropic filtering seem to be less than a factor of 2 now with R9700 (and, I presume, nv30).

There is no point for games right now (except maybe cinematic cut scenes), but NVidia is a for-profit company, and they aren't concerned with just the games market, but *all of 3D*. The NV30 will be sold to gamers, but the same chip will be sold into the workstation market.


Is Beyond3D just about games, or is this forum about 3D technology in general? I don't see games in the forum title. In terms of the workstation market, it seems like NVidia may have 3dlabs and ATI cornered, especially since they purchased ExLuna. 3dlabs and ATI will have to produce Cg-like tools to compile RenderMan shaders for renderers like ExLuna's and PRMan, but NVidia can build NV30 support into ExLuna's renderer itself.
 
It would be really nice to see smart, high quality FSAA (for instance, 4-sample 64x Z3 with sample buffer compression). I doubt this will happen though :/

We've already had talk of framebuffer compression giving nearly no-penalty 4x FSAA.
 
Definitely! That's the real unknown with the NV30. I do truly hope that nVidia has significantly improved the quality and performance of both FSAA and anisotropic filtering with the NV30 (over the GeForce4, of course...).

And the comparison on image quality between the two will be very, very interesting.
 
MfA said:
4X for offline rendering? That's a bit of a joke.

Well, if you want to do offline rendering, it's really not a problem to have as many samples for FSAA as you care to take the time to do. "Free" 4x FSAA would sort of be a first step, and then the final image would use some massive supersampling (along with motion blur for truly high-end rendering...).

However, I do truly hope that the NV30 offers more than simple 4-sample MSAA. After all, ATI has 6-sample MSAA. If the NV30 is to be offered in 256MB versions, 8-sample MSAA should be available.

As a quick note, unless nVidia jumps to 9-sample MSAA, it will no longer be possible to use ordered-grid or Quincunx-style blur filters.
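
For a rough sense of why the 256MB point matters, here is a back-of-envelope framebuffer estimate. The buffer layout is an assumption (32-bit color and 32-bit Z/stencil per sample, plus a resolved front buffer); real hardware layouts and compression will differ:

```python
def msaa_megabytes(width, height, samples):
    ms_color = width * height * samples * 4    # multisampled back buffer
    ms_depth = width * height * samples * 4    # multisampled Z/stencil
    front    = width * height * 4              # resolved front buffer
    return (ms_color + ms_depth + front) / (1024 * 1024)

for s in (4, 6, 8):
    print(f"{s}x MSAA at 1600x1200: ~{msaa_megabytes(1600, 1200, s):.0f} MB")
# ~66 MB at 4x, ~95 MB at 6x, ~125 MB at 8x -- before a single texture is
# stored, which is why 8 samples only look practical on a 256MB board.
```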
 
I think the 4x comment was an attempt to steer the conversation back from the render-farm usefulness of the NV30 to "what can it do for my desktop".
 
Even then, if nVidia only does 4x MSAA on the NV30, they will be behind ATI in the FSAA image quality department.
 
Is a Matrox Parhelia approach out of the question? With a better implementation, I can't see what would be wrong with that.

I guess 64x FAA would be effortless on the NV30; that would certainly be a reason to buy a card.
 
The only problem is, I'm not sure it's possible to produce a fully-hardware perfect edge detection algorithm for use with FAA.

Additionally, there may be problems with buffer overruns in certain scenarios (i.e. if there are too many edges detected, it could cause more edges to just not be AA'd, or significantly hurt performance).

If nVidia can do it 100% properly, then great! But, I'm just not sure it's feasible.

What I think would be best would be a sort of "framebuffer compression" algorithm that, when multisampling, stores only one sample per pixel whenever that pixel is completely covered by a single triangle.
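
A minimal sketch of that idea (a hypothetical illustration, not any vendor's actual scheme; Z testing omitted): interior pixels keep one stored color, and only pixels that straddle an edge expand to per-sample storage.

```python
SAMPLES = 4
FULL_MASK = (1 << SAMPLES) - 1

class CompressedPixel:
    def __init__(self, clear_color=0.0):
        self.colors = [clear_color]          # compressed: a single entry

    def write(self, color, coverage_mask):
        if coverage_mask == FULL_MASK:
            self.colors = [color]            # fully covered: stays compressed
            return
        if len(self.colors) == 1:            # edge hit: expand to per-sample
            self.colors = self.colors * SAMPLES
        for s in range(SAMPLES):
            if coverage_mask & (1 << s):
                self.colors[s] = color

    def resolve(self):
        return sum(self.colors) / len(self.colors)

p = CompressedPixel()
p.write(1.0, FULL_MASK)          # interior pixel: one color stored
p.write(0.5, 0b0011)             # edge pixel: expands to 4 samples
print(len(p.colors), p.resolve())    # 4 0.75
```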
 
AFAICS, if you do it right, worst-case performance can never fall below multisampling with the same number of samples per pixel (real samples, not the number of sampling positions for the masks).
 
MfA said:
AFAICS, if you do it right, worst-case performance can never fall below multisampling with the same number of samples per pixel (real samples, not the number of sampling positions for the masks).

Only if you also do multisampling for all fragment pixels. After all, there won't always be an individual triangle for each pixel sample within the fragment pixels.
 
ben6 said:
Oh how I wish I could talk more about this. But how about this interesting quote from Carmack:

"My current work on Doom is designed around what was made possible on the original GeForce, and reaches an optimal implementation on the NV30. My next generation of work will be designed around what is made possible on the NV30."

This seems consistent with everything said thus far in this thread. His "current work" is Doom 3. His "next generation" will be a new game engine yet to be developed. The time interval between the Q3A engine and Doom 3 will be what, 3 years? JC doesn't just crank out new game engines every year. Nor would it be appropriate to release an engine with NV30 as its lowest common denominator any time in the remotely foreseeable future.

The rule of thumb for even the most cutting-edge developers is that the technology must be widely available in OEM platforms, or it isn't feasible to program with that feature set in mind. This is why GF1 is JC's minimum standard for Doom 3: the GF2mx generation is now becoming more common in OEM systems, whereas say a year ago most "casual" gamers still had TNT2s. At most, JC is talking about making *some* use of this feature set within, say, 2 years. That would make the feature more interesting, but not much of a factor in current buying decisions.
 
Well, for DOOM3, and perhaps some other games in the near future, the added precision and significantly increased performance will help tons for those who buy an NV30 or R300.
 