NVIDIA GT200 Rumours & Speculation Thread

32 ROPs in such a case would be a consequence of the above necessity, and not IMO a sign that that many ROPs are needed in their own right.
Any possibility of redesigned ROPs? A few things come to mind, such as 32-bit granularity or reduced functionality (which could partially be taken over by the shader ALUs).
 
R600's specs were impressive too :) Even if the most optimistic rumours are true - 160 SPs, 32 TMUs, improved AA/Z, GDDR5 - it may be the case that its advantage over G92 won't be as large as expected.

Another thing is that Nvidia usually goes high-end first with its new architecture so it wouldn't have anything to go head to head with RV770. AMD is probably targeting a lower price point with RV770 so in order to stay competitive (with what is most likely a slower chip) Nvidia is moving to a smaller process to help keep costs in check.
 
That G92b thingy will most likely be on 55nm; as for that rumoured "GT200/G100" or whatever it's called, the rumours so far conflict between 55 and 65nm. I wouldn't exclude 65nm as a possibility for it at all though; after all, they stayed on 90nm for G80 too.
 
I talked about this with farhan, and indeed you need a 27-bit MUL mantissa instead of a 24-bit mantissa for FP64. You also need a 53-bit mantissa for the ADD. Still, going from 27-bit to 32-bit for the MUL is hardly free, but maybe they figured single-cycle INT32 MULs were required after talking to game developers. And given that the overall ALU is more expensive now, obviously the relative cost is now lower. I can't really think of any fixed-function workload that'd benefit from single-cycle INT32 MULs though, hmm...
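For the numbers behind that (my own back-of-envelope reasoning): an IEEE-754 double has a 53-bit significand (52 stored bits plus the implicit leading 1). Split each operand into a 27-bit high half and a 26-bit low half and four partial products on a 27x27 multiplier cover the full 53x53 multiply, since 2 x 27 = 54 >= 53; a 24-bit FP32 multiplier would need a three-way split and nine partial products, hence the 27-bit figure.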

Some other interesting notes from the CUDA/PTX 1.1 docs which might hint at GT200 or later hardware,

4.3.4.3 Texturing from Linear Memory versus CUDA Arrays ... Textures allocated in linear memory: ... Can only be addressed using a non-normalized integer texture coordinate; : So this hints that a single-cycle INT32 MUL would be somewhat important when accessing a large linear texture. Obvious importance for CUDA, not so obvious for DX/GL SM4.0 stuff, but my guess is that this would be useful when using a texture buffer object (which can be very useful when rendering, i.e. skinning/blending/etc., in the future once one can accept an SM4 minimum API level).
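To make that concrete with a quick sketch of my own (purely illustrative, not anything from the docs): addressing a buffer in linear memory means computing an offset like y * pitch + x per access, and with realistic pitches that product no longer fits in 24 bits:

Code:
// Purely illustrative CUDA kernel: 2D access into a buffer in linear (pitched) memory.
// With pitchInElements and y both in the thousands, y * pitchInElements exceeds the
// 24-bit fast-multiply range, so a full INT32 MUL is wanted on every lookup.
__global__ void scaleLinear2D(const float *src, float *dst, int pitchInElements,
                              int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;

    int offset = y * pitchInElements + x;   // non-normalized integer addressing
    dst[offset] = src[offset] * 2.0f;
}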

mul24.hi may be less efficient on machines without hardware support for 24-bit multiply : This is from the PTX guide; I think the point here is that only mul24.hi/mad24.hi would be slower (software emulation) when the ALUs have an INT32 (or better) MUL/MAD.
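For reference, the contrast in CUDA C looks something like this (a hedged example of my own, just to show which path would flip from fast to emulated):

Code:
// __mul24 multiplies the low 24 bits of its operands and maps to the native
// multiplier on G8x/G9x, where it's the recommended fast path. On hardware with a
// full INT32 multiplier the plain '*' should be at least as fast, while the .hi
// variants are the ones that might drop to software emulation.
__device__ int index24(int y, int pitch, int x) { return __mul24(y, pitch) + x; }
__device__ int index32(int y, int pitch, int x) { return y * pitch + x; }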

5.1.2.1 Global Memory ... We recommend fulfilling the coalescing requirements for the entire warp as opposed to only each of its halves separately because future devices will necessitate it for proper coalescing. : Intent for future hardware to double the internal bus size from 512-bit to 1024-bit (16 thread half warp * 32bit -> 32 thread warp * 32bit)? Could this be matched with 1024-bit GDDR5 (I don't know anything about GDDR so sorry if that was a dumb question)?
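Running the numbers on that guess: a half-warp is 16 threads, so one coalesced 32-bit access per thread is 16 x 4 bytes = 64 bytes = 512 bits per transaction; widening the coalescing granularity to the full 32-thread warp gives 32 x 4 bytes = 128 bytes = 1024 bits. Whether that corresponds to a physically wider internal bus or just stricter coalescing rules is pure speculation on my part.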

Support for .f64 type in SIN, COS, LG2, and EX2 has been removed from the ISA. These were unimplemented in version 1.0 : Fairly obvious - the interpolation/special-function unit is going to remain low-precision only. It does imply, however, that the FP64 hardware will still have RCP, RSQRT, and SQRT ability.

SAD, and DIV no longer support a rounding modifier. For double-precision, DIV implements round-to-nearest-even by default. : Couldn't think of anything specific, but it might suggest something as to how the FP64 hardware works.

7.6.1. Machine-specific Semantics of 16-bit Code : I was under the impression that all CUDA-capable hardware ran only with a 32-bit data path. My only thought here is that perhaps a programmable 16-bit data path could be used for a programmable ROP. Doubt we will see this soon. Or perhaps there was some kind of initial idea to run limited-functionality CUDA on older hardware (6/7 series).
 
Would be barely over 1 TFLOPS, if we assume a conservative ~1.5GHz shader domain, which would match the CeBIT rumours of doubled FP performance over G80.
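The arithmetic behind that, assuming the G8x-style MADD+MUL co-issue (3 FLOPs per SP per clock): 240 x 3 x 1.5 GHz = 1080 GFLOPS, against G80's 128 x 3 x 1.35 GHz ≈ 518 GFLOPS - almost exactly a doubling.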
 
So 240 cores means 240 SPs, and they are probably talking about GT200 specs, right? Hmm, I'm not impressed with these specs at all. G80's specs were pretty impressive in their time, but if it really has 240 SPs @ 1.5GHz then it will give about 1.08 TFLOPS. It's pretty weak for a high-end GPU when we compare it to the RV770 spec rumours - 800 SPs give 1.2 TFLOPS, and we should remember that 1.2 TFLOPS is for a NON-high-end GPU.
So isn't it strange that NVIDIA's high-end GPU would have less FP performance than a midrange ATI GPU? :(
 
So isn't it strange that NVIDIA's high-end GPU would have less FP performance than a midrange ATI GPU? :(
You're honestly getting tiresome; what part of "unreliable rumour" do you not understand? 800 SPs is likely someone's wishful thinking and little else. Heck, that rumour was actually part of a photoshopped die shot iirc - how much less reliable can you get? Everything does point at NVIDIA having a FP-per-$ disadvantage again if we're looking at 240 SPs vs 2x480 SPs, but repeating that 800 SP rumour all the time is really pointless to say the least.
 
It's pretty weak for a high-end GPU when we compare it to the RV770 spec rumours - 800 SPs give 1.2 TFLOPS, and we should remember that 1.2 TFLOPS is for a NON-high-end GPU.
So isn't it strange that NVIDIA's high-end GPU would have less FP performance than a midrange ATI GPU? :(
:LOL:
RV770 is supposed to have a 250mm² die, so 800 SPs seems very unlikely, or the TMUs will be reduced to 8 or 12 :LOL:.
ATI has to significantly increase the TMU count and repair (hw resolve and 4x SC) and strengthen the ROPs (more Z). So everything over 480 SPs doesn't really seem possible.
400 SPs (5 SIMDs with 80 SPs each) and 800 SPs for R700XT (dual RV770) would fit the rumour picture better.;)
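Back-of-envelope on those counts (the ~750MHz clock is just my assumption to make the rumoured figure work out, counting a MADD as 2 FLOPs per lane): 800 x 2 x 750 MHz = 1.2 TFLOPS, which is exactly where that rumoured number comes from; 480 SPs would give 720 GFLOPS and 400 SPs 600 GFLOPS at the same clock.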

And coming back to GT200: 240 SPs and >=1 TFLOPS would not be so bad if setup efficiency were increased.
 
400 SPs (5 SIMDs with 80 SPs each) and 800 SPs for R700XT (dual RV770) would fit the rumour picture better.;)
I agree =) 400 SPs sounds about right for RV770. Plus 32 TMUs and double Z for AA -- and it will be alright as a middle-end chip. (But beyond that there is some buzz about a unified memory pool for the X2 card, and if that turns out to be true then we'll have to see what kind of an impact this unified memory thing will make for a single-chip config.)

And coming back to GT200: 240 SPs and >=1 TFLOPS would not be so bad if setup efficiency were increased.
I'd say it's a bit low. GT200 should be faster than G92GX2 and should be on par with RV770X2. Otherwise they'll miss the sweet spot and will have to rely on G92BGX2 or something like that to compete with RV770X2 (which will be quite hard for G92B if these unified memory rumours are true).
Basically, I'm thinking that G100 should have more than 240 SPs or they'll have a problem.
(240 SPs is a strange number BTW. 256 with 16 left for redundancy sounds more plausible.)
 
I agree 240 SPs has some fundamental problems unless power consumption is also a fair bit below where we think it will be. After all, if a G92GX2 can have a 200W TDP, why would a GT200 with fewer SPs and substantially fewer TMUs have a higher TDP, even if they can't bin it as well? It doesn't really make sense.

I'm not really convinced by 240 SPs myself, but given that the article kinda implies it might have come straight out of Jen-Hsun's mouth, I thought it'd be a good idea to mention it at the very least. We'll see what happens.
 
I'd say it's a bit low. GT200 should be faster than G92GX2 and should be on par with RV770X2.
G71GX2: 384GFLOPs MADD
G80: 345GFLOPs MADD

Theoretical numbers do/did not say much...
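For anyone checking those figures (rough, counting pixel-shader MADDs only): G80 is 128 SPs x 2 FLOPs x 1.35 GHz ≈ 345 GFLOPS, while the 7950GX2 gets its ~384 GFLOPS from 2 GPUs x 24 pixel pipes x 2 vec4 MADD ALUs x 8 FLOPs x 0.5 GHz - and we know how that comparison turned out in practice.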

(240 SPs is a strange number BTW. 256 with 16 left for redundancy sounds more plausible.)
... and 128 TMUs?:???: I think in the GT generation we will see an increase in the ALU:TEX ratio, and 240:80 would be a good compromise between texture performance and the higher shader-power demands of upcoming games.
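Put as a ratio, and assuming those 80 TMUs: 240:80 is 3:1, versus 2:1 for G92 (128 SPs : 64 TMUs) - a real shift upwards, but not a dramatic one.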

I agree 240 SPs has some fundamental problems unless power consumption is also a fair bit below where we think it will be. After all, if a G92GX2 can have a 200W TDP, why would a GT200 with fewer SPs and substantially fewer TMUs have a higher TDP, even if they can't bin it as well? It doesn't really make sense.
High clockrates and voltage @65nm?
 
I agree 240 SPs has some fundamental problems unless power consumption is also a fair bit below where we think it will be. After all, if a G92GX2 can have a 200W TDP, why would a GT200 with fewer SPs and substantially fewer TMUs have a higher TDP, even if they can't bin it as well? It doesn't really make sense.

I'm not really convinced by 240 SPs myself, but given that the article kinda implies it might have come straight out of Jen-Hsun's mouth, I thought it'd be a good idea to mention it at the very least. We'll see what happens.


From what I've gathered recently, GT200 seems to be a single-chip design, with a transition similar to that from NV40 to G70: overall performance improves, but the efficiency gain is smaller than what G80 delivered, if Nvidia is determined to keep a single-die design.
 
G71GX2: 384GFLOPs MADD
G80: 345GFLOPs MADD
Theoretical numbers do/did not say much...
Well, we're assuming that G100 is comparable in its architecture to G8x and G9x, aren't we? G7x on the other hand is a completely different architecture, and while it's true that G71GX2 vs G80 ALU comparisons are quite irrelevant, I think that G80/92 vs G100 comparisons are much closer to reality.

... and 128 TMUs?:???:
Bilinear -- why not? A 512-bit bus + high levels of AF in the top-end segment will keep this number of TMUs more or less utilised all the time.
192 TMUs on the other hand is a bit too much even for bilinear only (if we return to the 384 SPs rumour).
 
I'm not really convinced by 240 SPs myself, but given that the article kinda implies it might have come straight out of Jen-Hsun's mouth, I thought it'd be a good idea to mention it at the very least. We'll see what happens.

I disagree on Jen-Hsun having said that.
"In fact, he thinks Nvidia eventually can eat into one of Intel's key markets -- high-end servers -- as software developers take advantage of the "many-core" architecture. (Intel takes a multi-core approach, with two or four core processors; Nvidia graphics processors have as many as 240 cores.)"

At first it's indirect speech, and later (to me) it's* an addendum from the editor. And I'm not convinced that the editor there is as aware of technical details as the top tier of B3D's forum users.


*it being the number 240 wrt processor cores. Heck - it could even refer to Tesla machines or X2 cards, and he could've gotten the number slightly wrong.

Having said that, 240 does fit in with some of the more recent rumours I have heard.
 
Support for .f64 type in SIN, COS, LG2, and EX2 has been removed from the ISA. These were unimplemented in version 1.0 : Fairly obvious - the interpolation/special-function unit is going to remain low-precision only. It does imply, however, that the FP64 hardware will still have RCP, RSQRT, and SQRT ability.
This, in my view, points to a clear re-design step performed by NVidia, scrapping the original GPU. The re-design might be why we're looking at the strange GT200 codename :p

In the lead-up to the original November launch (6-8 months prior) it seems they had all this stuff in their design, which is why the older CUDA documentation mentioned this functionality.

For a variety of reasons that design failed - maybe it was running too hot/slow or perhaps stuff didn't work at all (e.g. these double-precision transcendentals). So, stuff has been sacrificed/simplified. Perhaps this is why we're looking at "only" 240SPs.

Anyway compared with G92, at the same clocks, we're apparently looking at:
  • ~190% GFLOPs
  • ~125% texel rate
  • ~200% bandwidth (presuming 512-bit bus)
  • ~200% fillrate
Dunno why the long faces, we already know that G92 has too much texel rate for its bandwidth... GT200 should be 2x the performance of 9800GTX.
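(Those percentages follow straight from the assumed unit counts at equal clocks: 240/128 SPs ≈ 190% ALU rate, 80/64 TMUs = 125% texel rate, a 512-bit bus versus G92's 256-bit = 200% bandwidth, and 32 vs 16 ROPs = 200% fillrate.)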

Obviously, if they cut the bandwidth as part of the re-design, say to 384 bits, that's disappointing. But GDDR5 should solve that problem :D

Jawed
 
I'd say it's a bit low. GT200 should be faster than G92GX2 and should be on par with RV770X2. Otherwise they'll miss the sweet spot and will have to rely on G92BGX2 or something like that to compete with RV770X2 (which will be quite hard for G92B if these unified memory rumours are true).
NVidia could put two GT200s on a card to make a GX2, to compete with RV770X2.

Basically, I'm thinking that G100 should have more than 240 SPs or they'll have a problem.
(240 SPs is a strange number BTW. 256 with 16 left for redundancy sounds more plausible.)
So what you're proposing is a 16 cluster GPU, with 2 multiprocessors and 2 TMUs per cluster, with 1 cluster turned off for redundancy?

I'm dubious NVidia will retain the current ALU:TEX ratio and it's also clear that significantly more TMUs are not needed to attain 2x G92's performance.

Jawed
 
NVidia could put two GT200s on a card to make a GX2, to compete with RV770X2.
I thought more than 300W isn't specified for PCIe cards? ;)

But there was an interview where NV mentioned that we will also see GX2 cards in the future - GT200 @ 45nm?
 
This, in my view, points to a clear re-design step performed by NVidia, scrapping the original GPU. The re-design might be why we're looking at the strange GT200 code :p
...
Anyway compared with G92, at the same clocks, we're apparently looking at:
  • ~190% GFLOPs
  • ~125% texel rate
  • ~200% bandwidth (presuming 512-bit bus)
  • ~200% fillrate
Dunno why the long faces, we already know that G92 has too much texel rate for its bandwidth... GT200 should be 2x the performance of 9800GTX.
So it's your opinion that Nvidia did indeed break up their cluster and decoupled the TMUs from the ALU blocks? No more "one Quad-TMU per 16xSIMD"?


I thought more than 300W isn't specified for PCIe cards? ;)
But there was an interview where NV mentioned that we will also see GX2 cards in the future - GT200 @ 45nm?
I doubt very much that the chip alone will consume above 100-120 Watts - which would be even more than a (reasonably) overclocked R600 uses.
 
I thought more than 300W isn't specified for PCIe cards? ;)
I have to admit I don't remember whether the specs put a ceiling on "per board" power consumption or whether it's per connector :???: ...

But there was an interview where NV mentioned that we will also see GX2 cards in the future - GT200 @ 45nm?
I presume 45nm GPUs will be on the market within 12 months, but NVidia usually lags ATI by ~6 months when it comes to the latest processes for GPUs.

Jawed
 