NVIDIA GF100 & Friends speculation

Knowing Nvidia they will try to push any perceived advantage so it'll be interesting to see how they try to influence developers if they in fact have a big advantage in geometry processing.
I really hope developers get a big kick to get us out of the pre-tessellation dark ages, so NVidia being great here is a big plus. I'm sick of seeing polys in games.

Jawed
 
So assuming the Fermi can do 2 triangles/clock... what does this mean for real games like Crysis? Would it be possible to double polygon counts without any hit in performance? Crysis would look a LOT better if things could be smoothed out a bit.
Hopefully NVidia's done it right and made setup a kernel, like VS or DS. That would mean it's arbitrarily fast, only limited by internal bandwidths/ALUs.

Better, if the setup algorithm queries the early-Z system and early-outs wodges of triangles (e.g. in batches of 32).

Makes me wonder if L1 cache is used to communicate Z from ROPs to ALUs.
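
Something along these lines, purely as a sketch of the idea (the buffer layout, the coarse-Z query and the warp-vote early-out are all invented here for illustration, not anything GF100 is known to do):

```cuda
// Hypothetical sketch only: triangle setup expressed as an ordinary compute
// kernel, one thread per post-VS/DS triangle, one warp per batch of 32.
#include <cstdint>

struct ScreenVertex { float x, y, z, w; };   // post-viewport-transform position
struct SetupOutput  { float edge[3][3]; float zmin; uint32_t alive; };

__global__ void setupKernel(const ScreenVertex* verts,
                            const uint32_t*     indices,  // 3 per triangle
                            const float*        coarseZ,  // farthest depth per tile (hier-Z style)
                            int tilesX, int tilesY,
                            int tileSize,                 // pixels per coarse-Z tile
                            SetupOutput* out, int numTris)
{
    int  tri   = blockIdx.x * blockDim.x + threadIdx.x;
    bool alive = tri < numTris;

    ScreenVertex v0 = {}, v1 = {}, v2 = {};
    float zmin = 0.0f;

    if (alive) {
        v0 = verts[indices[3 * tri + 0]];
        v1 = verts[indices[3 * tri + 1]];
        v2 = verts[indices[3 * tri + 2]];

        // Signed area: reject back-facing / degenerate triangles.
        float area = (v1.x - v0.x) * (v2.y - v0.y) - (v2.x - v0.x) * (v1.y - v0.y);
        alive = area > 0.0f;

        if (alive) {
            // Screen-space bounding box and nearest depth of the triangle.
            float minx = fminf(v0.x, fminf(v1.x, v2.x)), maxx = fmaxf(v0.x, fmaxf(v1.x, v2.x));
            float miny = fminf(v0.y, fminf(v1.y, v2.y)), maxy = fmaxf(v0.y, fmaxf(v1.y, v2.y));
            zmin = fminf(v0.z, fminf(v1.z, v2.z));

            // Query the coarse Z of every tile the box touches; if the whole
            // triangle is behind all of them it can never contribute a pixel.
            int tx0 = max((int)minx / tileSize, 0), tx1 = min((int)maxx / tileSize, tilesX - 1);
            int ty0 = max((int)miny / tileSize, 0), ty1 = min((int)maxy / tileSize, tilesY - 1);
            float farthest = 0.0f;
            for (int ty = ty0; ty <= ty1; ++ty)
                for (int tx = tx0; tx <= tx1; ++tx)
                    farthest = fmaxf(farthest, coarseZ[ty * tilesX + tx]);
            alive = zmin <= farthest;
        }
    }

    if (tri < numTris) out[tri].alive = alive ? 1u : 0u;

    // "Early-out wodges of triangles": if nothing in this batch of 32
    // survived, the whole warp stops before doing any edge-equation work.
    if (__all_sync(0xffffffffu, !alive)) return;

    if (alive) {
        // Edge equations (a*x + b*y + c per edge) for the rasteriser to use.
        const ScreenVertex* v[3] = { &v0, &v1, &v2 };
        for (int e = 0; e < 3; ++e) {
            const ScreenVertex& a = *v[e];
            const ScreenVertex& b = *v[(e + 1) % 3];
            out[tri].edge[e][0] = a.y - b.y;
            out[tri].edge[e][1] = b.x - a.x;
            out[tri].edge[e][2] = a.x * b.y - b.x * a.y;
        }
        out[tri].zmin = zmin;
    }
}
```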

Jawed
 
There is no fixed function portion to tessellation when it comes to coding or hardware. The code just runs better on a given piece of hardware because of the hardware advantage.
Tessellation (the TS stage), per se, is fixed function, the same as rasterisation - the result is mandated by D3D. That doesn't mean it requires fixed function hardware.

Just want to clear up any confusion in the usage of "fixed function".

Jawed
 
Will be amusing if the GF100 does end up much faster than the 5800 series at tessellation, considering how much AMD have gone on about it in the past.
 
Hopefully NVidia's done it right and made setup a kernel, like VS or DS. That would mean it's arbitrarily fast, only limited by internal bandwidths/ALUs.

Better, if the setup algorithm queries the early-Z system and early-outs wodges of triangles (e.g. in batches of 32).

Jawed

That wouldn't work with tessellation, would it? Because with tessellation, the app will likely pump out LOTS of pixel/sub-pixel sized triangles.
 
Do you think it'd be better off being serially processed by some un-scalable piece of hardware?
 
So, why choose P score if X would give it the lead anyway?

Something doesn't add up.

Maybe NVidia doesn't want to show too much of its true performance hand before launch to shorten the time AMD has to respond. That is, they'll show just enough to prove it's the top card, but won't show exact clocks, memory, or top-end benchmarks until March. If they show extreme setting benchmarks now, it gives AMD a target to shoot for when tweaking their refresh clocks.

In the original Fermi architecture announcement, they withheld information (like any radical changes they made to graphics, leading everyone to assume they had made none!). This time around, they may repeat that approach and brief everyone on the graphics pipeline, but not fully give everything away.

In any case, behaving like Apple, controlling information flow, tends to increase hype and interest, confuse competitors, and make any final product reveals look more amazing than they might otherwise be.
 
I really hope developers get a big kick to get us out of the pre-tessellation dark ages, so NVidia being great here is a big plus. I'm sick of seeing polys in games.

Jawed

Totally agree. With all the other progress that's been made, it's bad that we still see hexagonal barrels and wheels! Roll on good tessellation...
 
Totally agree. With all the other progress that's been made, it's bad that we still see hexagonal barrels and wheels! Roll on good tessellation...

It's amazing to think that no one's broken the 1-tri/clk barrier since the Voodoo 1, and whereas texturing, rasterizing, and ALU performance has gone through the roof, geometry has been tied inherently to clocks and followed more of a linear or quadratic growth. That's what, 13 years ago IIRC?
 
Maybe NVidia doesn't want to show too much of its true performance hand before launch to shorten the time AMD has to respond. That is, they'll show just enough to prove it's the top card, but won't show exact clocks, memory, or top-end benchmarks until March. If they show extreme setting benchmarks now, it gives AMD a target to shoot for when tweaking their refresh clocks.

In the original Fermi architecture announcement, they withheld information (like any radical changes they made to graphics, leading everyone to assume they had made none!). This time around, they may repeat that approach and brief everyone on the graphics pipeline, but not fully give everything away.

In any case, behaving like Apple, controlling information flow, tends to increase hype and interest, confuse competitors, and make any final product reveals look more amazing than they might otherwise be.

AMD does not need till March to know the clocks. They'll know it the moment it hits Taiwan. ;) Otherwise, they are indeed throttling info flow carefully. Though I am not at all sure why bother having a media event now. Why not just have one at launch? That way they'll give even less time to AMD to respond. To stem the tide of people buying 58xx?
 
It's amazing to think that no one's broken the 1-tri/clk barrier since the Voodoo 1, and whereas texturing, rasterizing, and ALU performance has gone through the roof, geometry has been tied inherently to clocks and followed more of a linear or quadratic growth. That's what, 13 years ago IIRC?

AFAIK, rasterizing has also been stuck at 1-tri/clk for a long time now. Not sure what it was in Voodoo 1 days.
 
To stem the tide of people buying 58xx?

Exactly. If they announce that a world-beating card is imminent, some people might hold off. I'm not sure chips getting into the hands of Taiwanese now will allow clocks to be predicted, since that could be a function of cooling, drivers, and binning, which you don't know just from a few sample chips.
 
AMD does not need till March to know the clocks. They'll know it the moment it hits Taiwan. ;) Otherwise, they are indeed throttling info flow carefully. Though I am not at all sure why bother having a media event now. Why not just have one at launch? That way they'll give even less time to AMD to respond. To stem the tide of people buying 58xx?
Putting key "journalists" under NDA now stops them bad-mouthing it for the next X months ;)

Jawed
 
That wouldn't work with tessellation, would it? Because with tessellation, the app will likely pump out LOTS of pixel/sub-pixel sized triangles.
I can't work out what you're saying here.

A triangle that is pixel sized (or AA sample-sized if MSAA is on) still needs to update Z.

I'm thinking of, say, a square inch of screen being filled with 1000 triangles. Let's say it's some monster's head. But it's hidden behind a corner. Might as well cull those triangles before they're set up - or, as part of the setup process, they're culled rather than being generated only to be culled later by rasterisation/early-Z.

NVidia might even be able to propagate the Z query back into DS so that the vertices that make up doomed triangles don't waste time computing attributes (i.e. early-out from DS) and mark the vertices in some way that allows them to be deleted (e.g. they get culled instead of being put in the post-DS cache).

Or at the very least generate an always-on GS (in addition to anything the developer codes) that culls triangles by querying the early-Z system.
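
To make the shape of that DS early-out concrete (again, just a guess at how it could be wired up; the pass structure and the triAlive/vertNeeded buffers are invented here):

```cuda
// Hypothetical sketch: once triangles have been flagged as doomed (e.g. by a
// coarse-Z test like the one sketched earlier), mark only the vertices that
// some surviving triangle still references, so full domain-shader attribute
// evaluation can be skipped for everything else instead of filling the
// post-DS cache.
#include <cstdint>

__global__ void markNeededVertices(const uint32_t* indices,    // 3 per tessellated triangle
                                   const uint32_t* triAlive,   // 1 = survived the early-Z query
                                   uint32_t*       vertNeeded, // one flag per tessellated vertex
                                   int             numTris)
{
    int tri = blockIdx.x * blockDim.x + threadIdx.x;
    if (tri >= numTris || !triAlive[tri]) return;
    // Every vertex of a surviving triangle must still get its attributes.
    atomicExch(&vertNeeded[indices[3 * tri + 0]], 1u);
    atomicExch(&vertNeeded[indices[3 * tri + 1]], 1u);
    atomicExch(&vertNeeded[indices[3 * tri + 2]], 1u);
}
// A later pass playing the role of the DS would evaluate attributes only
// where vertNeeded[v] != 0; vertices left at 0 are effectively culled.
```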

Jawed
 
It's amazing to think that no one's broken the 1-tri/clk barrier since the Voodoo 1, and whereas texturing, rasterizing, and ALU performance has gone through the roof, geometry has been tied inherently to clocks and followed more of a linear or quadratic growth. That's what, 13 years ago IIRC?
Since we lose much of the benefit rasterization has over other methods with too many triangles, the increase is already quite significant. From about 100 Mtris/s we're now at 850 Mtris/s, while resolution has only increased by a factor of about 3, considering the mainstream resolution was 1024x768 and is now 1920x1200.

Look at RV670, with its half setup rate... it worked perfectly fine and performance was almost equal to R600 in the real world, so setup rate is by no means a limiting factor without tessellation, and the way adaptive tessellation works won't increase the triangle count to more than 6 triangles per pixel.
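
For what it's worth, the arithmetic behind that comparison (taking the 100 Mtris/s and 850 Mtris/s figures above at face value):

```cuda
// Host-side arithmetic only: setup rate growth vs. pixel count growth.
#include <cstdio>

int main()
{
    const double oldTris = 100e6, newTris = 850e6;   // peak setup rate, triangles/s
    const double oldPix  = 1024.0 * 768.0;           // 1024x768
    const double newPix  = 1920.0 * 1200.0;          // 1920x1200

    std::printf("setup rate grew:       %.1fx\n", newTris / oldTris);   // ~8.5x
    std::printf("pixel count grew:      %.1fx\n", newPix / oldPix);     // ~2.9x
    std::printf("triangles per pixel/s: %.1fx\n",
                (newTris / oldTris) / (newPix / oldPix));               // ~2.9x
    return 0;
}
```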
 
Question about benchmarking tessellation: part of the tessellation pipeline is vendor specific - if I understood correctly, it is the fixed function part. Couldn't this part vary too much in quality from one vendor to the other, making the comparison of fps pointless?

Nope, the output of the tessellator is defined by Microsoft. The IHV only has control over the implementation and performance, not the output.

So... why choose 3DM Vantage Performance score?

Why not? It's the most popular setting. Looking for ghosts there methinks.

Could higher res/shader complexity lead to lower performance than the competition?

Why would it? There's no indication that Fermi is a slouch in that respect. But presumably any geometry advantage will diminish with higher resolution / pixel complexity.
 
I really hope developers get a big kick to get us out of the pre-tessellation dark ages, so NVidia being great here is a big plus. I'm sick of seeing polys in games.

Jawed

But I hope they do it decently. The Unigine demos at first looked like crap with crazy popping everywhere. The tessellation needs to be managed so you don't see a wave heading out ahead of you in the game. Other than that I have always been super excited about the prospect, ever since ATI brought it up years ago (what was it, the 8500?).
 
I'm thinking of, say, a square inch of screen being filled with 1000 triangles. Let's say it's some monster's head. But it's hidden behind a corner. Might as well cull those triangles before they're setup - or, as part of the setup process they're culled rather than being generated, only to be culled later by rasterisation/early-Z.
You have to rasterize up to the hierarchical Z resolution to cull in the first place ... so you have a part of the pipeline which is behind setup proper which does rasterization ... let's just call it the rasterizer, okay? :) (Hierarchical fragment rejection is nice, even better with a fast path for small triangles, but it doesn't make sense to count it as part of setup.)

Culling patches is possible, but you have to do some program analysis on the shaders to determine bounds before you can do a conservative rasterization to hierarchical-Z tile level (and developers can almost certainly find ways to break that analysis, but as I said in the bit about tiling patches, program-specific code isn't uncommon anyway). This doesn't even really need any hardware on top of what is already there ... just render all patches twice: the first time, use the shaders to create a conservative tessellation, render it without Z-update but with early-Z checking, and use GDS (or something equivalent, it just takes a single bit per patch) in the pixel shader to couple the results back to the hull shader so it can cull the patches on the second pass.
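
Reduced to its data flow, that feedback loop looks roughly like this (a compute-style stand-in with invented buffer names, not the actual render passes described above, where the bit would be set from the pixel shader through GDS and the cull done by the hull shader):

```cuda
// Hypothetical sketch of the two-pass patch-culling feedback: one visibility
// bit per patch, written in pass 1, consumed in pass 2.
#include <cstdint>

// Pass 1 stand-in: every fragment of the conservative tessellation that
// survives early-Z sets the bit belonging to its source patch.
__global__ void markVisiblePatches(const uint32_t* fragPatchId,  // patch that produced each surviving fragment
                                   uint32_t*       patchVisible, // one bit per patch
                                   int             numFrags)
{
    int f = blockIdx.x * blockDim.x + threadIdx.x;
    if (f < numFrags)
        atomicOr(&patchVisible[fragPatchId[f] >> 5], 1u << (fragPatchId[f] & 31));
}

// Pass 2 stand-in for the hull shader: a patch whose bit never got set is
// culled by giving it a tessellation factor of 0 (one factor per patch here,
// for simplicity).
__global__ void chooseTessFactor(const uint32_t* patchVisible,
                                 float*          tessFactor,
                                 float           desiredFactor,
                                 int             numPatches)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p < numPatches)
        tessFactor[p] = (patchVisible[p >> 5] & (1u << (p & 31))) ? desiredFactor : 0.0f;
}
```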
 