NVIDIA GT200 Rumours & Speculation Thread

How so? 512-bit MC matches that figure.
There are 4 ROPs in G92 and 6 ROPs in G80 =)
I also like this part:
330-350mm2 die size
G92 is 324mm2 8)
And this is very good also:
9900GTX SLI runs Crysis 2560x1600 VH 4XAA smoothly
-)

However, if the source is PHK, then there's a chance it may be very close to the truth. That's a fact.
Well, everything written there seems to be true (although I still think omitting 10.1 support from G1xx would be a mistake for NVIDIA; that doesn't mean NVIDIA thinks the same way), but I don't see anything there about 24x8, 192 SPs, 330-350mm2 and so on =)

Oh, yeah, could "GT" stand for "GeForce Tegra"? 8)
 
G92 is 324mm2 8)
And this is very good also:
9900GTX SLI runs Crysis 2560x1600 VH 4XAA smoothly
-)

I don't know where people come up with this BS. The fastest performance of any setup WITHOUT AA in Crysis at 2560x1600 right now is 21 FPS!

So if we assume 2.5x the performance of an 8800 Ultra, we're still looking at under 60 fps WITHOUT AA. AA will likely result in a 20-30% performance degradation (hard to tell because hardly anyone bothers to test Crysis with AA), which would put it at about 38-42 FPS.

In other words, it will take a miracle for any of the coming generation of cards to run Crysis at 2560x1600 VH 4xAA (and, I assume, 16x AF). Sad really....

Aaron Spink
speaking for myself inc.
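As a rough sanity check of the arithmetic in the post above (a sketch only, using the 21 FPS baseline and the 2.5x / 20-30% figures quoted there, nothing measured):

baseline_fps = 21.0        # fastest known setup at 2560x1600 without AA, per the post
speedup = 2.5              # assumed speedup over the 8800 Ultra setup
no_aa_fps = baseline_fps * speedup          # ~52.5 FPS, still under 60 without AA
for aa_hit in (0.20, 0.30):                 # assumed 20-30% cost of enabling AA
    print(f"{aa_hit:.0%} AA hit -> {no_aa_fps * (1 - aa_hit):.1f} FPS")
# prints ~42.0 and ~36.8 FPS, close to the ~38-42 FPS range quoted above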
 
I think the definition of "smoothly" used there is 25-30 fps. I don't know if it's due to the motion blur or the DOF, but that seems to be the general consensus for an acceptable framerate in Crysis. Even so, 30 fps is still a stretch.
 

Didn't they say 4xAA was playable with a 9900 GTX in SLI? If a single card is 2.5x more powerful than an 8800 Ultra, wouldn't it be playable when using two of them (assuming performance scaled accordingly)? Also, it's certainly possible that this new card is a bit more efficient at handling AA than the last generation.
 

The numbers I gave were already assuming a 9900 GTX/Ultra at 2.5x the performance and two of them in SLI (I used the 8800 Ultra SLI scaling for the 9900 GTX SLI scaling).

With the specs as we know them, I think they'll be doing well to hit 2x 8800 Ultra performance in Crysis, let alone 2.5x. There may, however, be some memory capacity effects holding the Ultra back somewhat at max res, and a possible additional 256 MB of memory might help out there, but the scaling from 1920 -> 2560 is actually super-linear with pixel count, so I doubt it. From looking at the scaling, it appears pretty much shader bound on the Ultra at all resolutions (i.e. the 9800 GTX and Ultra are roughly equivalent until 2560x1600).

Aaron Spink
speaking for myself inc.
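For what "super-linear with pixel count" means here, a small sketch (the FPS numbers below are placeholders, not benchmark data, and 1920x1200 / 2560x1600 are assumed 16:10 resolutions):

def pixel_ratio(res_a, res_b):
    # how many times more pixels the second resolution has
    return (res_b[0] * res_b[1]) / (res_a[0] * res_a[1])

ratio = pixel_ratio((1920, 1200), (2560, 1600))   # ~1.78x more pixels
fps_low, fps_high = 30.0, 15.0                    # hypothetical FPS at the two resolutions
slowdown = fps_low / fps_high                     # 2.0x in this made-up case
print("super-linear" if slowdown > ratio else "linear or better",
      f"(pixel ratio {ratio:.2f}x vs slowdown {slowdown:.2f}x)")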
 
Well, it's not like adding a few blocks to the core would make the basic design different in any significant way, right?
Just look at R300 vs R420, for instance.
R420 is not a dual-core R300, it's a new chip.
Besides, there's no firm indication yet that G92b (55nm) has the exact same number of functional blocks as G92 (65nm).
It's much easier to do an optical die-shrink of an existing GPU than to make changes to it (which basically requires designing a new chip).
 
Wake me up when some of you are done learning the alphabet from scratch *sheesh*.....
 
Yes, I think 32 ROPs is pretty much a given with a 512-bit MC - not that that makes the specs real.

And these ROPs are needed... where? nVidia already went back down to 16 because it's (more than) enough at today's clocks and displays.
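For reference, the arithmetic behind "32 ROPs is pretty much a given with a 512-bit MC", assuming (as described further down the thread) one quad of ROPs per 64-bit memory partition:

MC_WIDTH_BITS = 512
BITS_PER_PARTITION = 64      # one ROP partition per 64-bit memory channel (thread's assumption)
ROPS_PER_PARTITION = 4       # a quad of ROPs per partition
partitions = MC_WIDTH_BITS // BITS_PER_PARTITION
print(partitions, partitions * ROPS_PER_PARTITION)   # 8 partitions, 32 ROPs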
 

Oversimplified: think of the number of ROPs as a consequence of the memory controller configuration needed to reach X, Y or Z amount of bandwidth.

Assume GF-Next is estimated internally to need >=140 GB/s of memory bandwidth. Whether you get to that value with 2.2GHz GDDR5 @ 256-bit (4*64-bit) or with 1.1GHz GDDR3 @ 512-bit (8*64-bit) doesn't make much of a difference. The defining factor here is that it would be damn hard to find GDDR5 that fast, in adequate quantities, within the projected release timeframe. These things are supposed to come with 1GB of RAM on board from the looks of it anyway.
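A quick sketch of that bandwidth arithmetic, treating the quoted clocks as half the per-pin data rate for both memory types (the only reading under which the two configurations come out equal; real GDDR5 signalling details are glossed over):

def bandwidth_gb_s(clock_ghz, bus_width_bits, transfers_per_clock=2):
    # GB/s = (GHz * transfers per clock) Gbit per pin * pins / 8 bits per byte
    return clock_ghz * transfers_per_clock * bus_width_bits / 8

print(bandwidth_gb_s(2.2, 256))   # ~140.8 GB/s for the GDDR5 @ 256-bit option
print(bandwidth_gb_s(1.1, 512))   # ~140.8 GB/s for the GDDR3 @ 512-bit option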
 
Meh, things have worked with quad pipelines since the original GeForce. And you have no guarantee the internal redundancy mechanisms even work like that for the entire pipeline. So sure, it might still be slightly more technically accurate, but I really don't see the problem with counting ROPs as one unit per pixel... Neither scheme is so superior that it's worth having a religious argument about it! :)
 
ROP is a unit in the chip in the same way as 1 SP is a unit and 1 cluster of SPs is a unit.
1 G8x/9x ROP can write some number of pixels in memory per clock.
When you're talking about 16 or 24, you're talking about the number of 32-bit color pixels written, not about ROP units. Each ROP unit can write 4 of these pixels per clock, so basically you have 4/6 ROP units in G92/G80 which can write up to 16/24 32-bit opaque color values per clock.
The real number of ROPs is one thing and the number of pixels they write per clock is another, and I don't think it's wise to mix them, because the same ROPs can write different numbers of different pixel types -- 32 Z values, or 8 FP16 values, or 12 FX8 values with alpha blending, and so on. Why should we care about the 32-bit rate of the ROPs if most of our games are using FP16 frame buffers with blending, for example? -) And what if the overall FP16 blending throughput of the ROPs is the same between GPUs with different numbers of ROP units?
I think it's a matter of accuracy. When you're talking about ROPs -- talk about ROPs. When you're talking about pixels written by ROPs -- talk about pixels, not ROPs.
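To make the distinction concrete, a tiny sketch using only the figures from the post above (4/6 ROP units in G92/G80, four 32-bit colour pixels per unit per clock):

PIXELS_PER_UNIT = 4                      # 32-bit colour pixels per ROP unit per clock
ROP_UNITS = {"G92": 4, "G80": 6}         # ROP units as counted in the post
for gpu, units in ROP_UNITS.items():
    print(f"{gpu}: {units} ROP units -> {units * PIXELS_PER_UNIT} 32-bit px/clk")
# G92: 16 px/clk, G80: 24 px/clk -- the 16/24 figures usually quoted as "ROPs"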
 
Oh come on, ROPs arguably have little quad-based sharing, unlike TMUs/Interpolators which benefit a lot from working in quads. And there's another consequence to this: RV670 could really have eight 2-wide ROP partitions in terms of redundancy and you wouldn't know about it. In the end, in the case of ROPs, it's clearly an implementation detail...

In fact, I would suspect that from a design POV, they 'copy-paste' one unit 32 times, rather than one 4-wide unit 8 times. They obviously have some logic on top of that which is quad-centric, but that's probably not the majority of the transistors. This is unlike, say, the shared interpolator/SFU unit, which is 100% quad-centric and must be designed as one block. And yet I don't often see people talking about those as quad units...

As I said I don't care either way, I just don't think it makes sense to insist that one is massively more accurate than the other.
 
There are 4 ROPs in G92 and 6 ROPs in G80 =)

When I see the term "ROP" I think of an individual raster operation "processor", not a SIMD-like unit a la NV's ROP "partitions" or ATI's Render Back-Ends, which are each a quad unit of ROPs. I believe we've been on this quad-ROP-unit style of architecture since the R300.

I.e. G80 has 24 ROPs, not 6; it has 6 ROP partitions, or quads.
 