NVIDIA GT200 Rumours & Speculation Thread

It just struck me as odd to go with 2 clusters disabled instead of just 1 for the second-best card.
 
A difference of 48 processors sounds like a bit much between the 280 and 260 models. It almost implies the 280 is organized as 5 groups of 48, with the 260 as 4 groups of 48.

The GTS was only 75% of the GTX. This config makes the 260 80% of the 280.

The only configuration that makes the numbers work is 24 SP and 8 TMU per cluster. 10 clusters active on 280, only 8 on the 260.

Only one cluster disabled would make the 260 90% of the 280 - way too close for comfort.
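
To put numbers on the 24-SP-per-cluster theory, here's a quick sketch (it just assumes the rumoured 24 SPs per cluster):

```python
# Quick sanity check on the rumoured configs (24 SPs per cluster is the assumption here).
SP_PER_CLUSTER = 24

for clusters in (10, 9, 8):
    sps = clusters * SP_PER_CLUSTER
    print(f"{clusters} clusters: {sps} SPs, {sps / (10 * SP_PER_CLUSTER):.0%} of the full part")
# 10 clusters: 240 SPs, 100% of the full part
#  9 clusters: 216 SPs,  90% of the full part
#  8 clusters: 192 SPs,  80% of the full part
```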
 
Anyone buy into the one rumor of the GTX-280 already having 1 cluster disabled for fault tolerance, which would make the GTX-260 have 3 clusters disabled?

The other thing that seems to make a larger impact is the clock differences in core/shader/memory, so even with only a 1-cluster difference between the GTX-280 and the GTX-260 you could have enough differentiation.
 
Anyone buy into the one rumor of the GTX-280 already having 1 cluster disabled for fault tolerance, which would make the GTX-260 have 3 clusters disabled?
Clearly if that was the case, then as you point out simply disabling one cluster might be enough differentiation. So, if it's indeed 8/10 for GT260, that would imply there is no such form of coarse redundancy on GT200... ;)
 
Nurien Software's "3D social networking service" Nurien running on GTX 280? NVIDIA PhysX in action, by the way.

http://www.youtube.com/watch?v=MoQh4R1-Vjs&fmt=18
http://www.youtube.com/watch?v=X9BGiMvrUrk&fmt=18
http://www.youtube.com/watch?v=x3eenx3E0X4&fmt=18

[Attached image: nurien_nvidia_physx.jpg]
 
The other thing that seems to make a larger impact is the clock differences in core/shader/memory, so even with only a 1-cluster difference between the GTX-280 and the GTX-260 you could have enough differentiation.

1 cluster disabled: only 80 versions of semi-broken dies (10*8, since one ROP partition is also disabled) can be used for the GTX 260

2 clusters disabled: (10*9/2)*8 = 360 versions

;)
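
Counting it out (a small sketch; it just assumes any cluster and any one ROP partition can be fused off independently, as the post above implies):

```python
# Counting the salvage combinations: how many distinct defect/fuse patterns each
# cut-down configuration can absorb.
from math import comb

CLUSTERS, ROP_PARTITIONS = 10, 8

one_bad_cluster  = comb(CLUSTERS, 1) * ROP_PARTITIONS   # 10 * 8 = 80 fuse patterns
two_bad_clusters = comb(CLUSTERS, 2) * ROP_PARTITIONS   # 45 * 8 = 360 fuse patterns

print(one_bad_cluster, two_bad_clusters)
```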
 
Why is it that Nvidia's clocks, especially core, are taking a step backward?

If AMD can keep ramping core clocks, it can become an advantage. Right now it'd be 850 vs. 600 on the flagships. That is about 42%. So, for example, 32 TMUs effectively become 45-equivalent for AMD. Of course the AMD shaders will operate at a deficit, but that has been the case, and the deficit has narrowed (let's say 850:1300 now vs. 742 (HD 2900) : 1350 (8800 GTX) at the last launch).

Looking at it, the 8800 GTX core stepped back from the 7900 GTX as well (575 vs. 650). But it seems this split-shader-clock situation somehow prevents Nvidia's core clocks from steadily marching upward. GT200 is coming in at 600, and the shaders also took a slight step back to 1300. Of course we know parallelism is by far king in GPUs. I'm curious whether this growing AMD lead in core clock will result in any performance narrowing come bench time.

EDIT: The one thing I later realized I missed from the above analysis is the effect of the "missing mul" being added back for Nvidia. I have absolutely zero clue how much performance difference that makes, but someone earlier said 20-30% of instructions use the mul, so let's take a wild-ass guess and say Nvidia gets a 25% improvement out of it. That is going to overwhelm any comparative clock gains for AMD this time around. GT200 gained 25 MHz on the core and dropped 50 MHz on the shader versus the 8800 GTX; call that a net clock wash. AMD is gaining 75 MHz over the 3870, or about 10%. So the 25% mul factor should more than overwhelm that, all else relatively equal this time around. Of course Nvidia pulling the mul rabbit out of its hat was a unique, one-time situation; had it not been for that, the repercussions of a ~10% increase in the relative clock deficit could indeed have been felt by Nvidia this generation.
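
For what it's worth, here's the clock arithmetic written out (the inputs are the rumoured/known clocks from this post; the 25% mul benefit is the same wild guess as above):

```python
# Writing out the clock comparison above. The 25% "missing mul" benefit is a wild guess, as stated.
rv770_core, gt200_core = 850, 600      # rumoured flagship core clocks
g80_core, g80_shader   = 575, 1350     # 8800 GTX
gt200_shader           = 1300
rv670_core             = 775           # HD 3870

print(f"AMD core clock lead: {rv770_core / gt200_core - 1:.0%}")                   # ~42%
print(f"32 TMUs at 850 MHz ~ {32 * rv770_core / gt200_core:.0f} GT200-clock TMUs") # ~45
print(f"GT200 vs G80: core {gt200_core - g80_core:+} MHz, shader {gt200_shader - g80_shader:+} MHz")
print(f"RV770 vs RV670 core gain: {rv770_core / rv670_core - 1:.0%}")              # ~10%

mul_guess = 1.25   # guessed benefit of the mul coming back
print(f"Guessed mul benefit: +{mul_guess - 1:.0%}")
```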


Overall I'm getting a pretty good feel for how things should turn out. We should see about the same performance situation we currently see, with AMD's top card being perhaps 60-70% of Nvidia's. A lot of people here are hyping super-linear gains of 3x for Nvidia, but to me the specs don't back up more than ~2x (unless the mul is a bigger deal than I know).

You are getting stable clocks and an 87% increase in SPs over the 8800 GTX for the GTX 280. You are getting the mul back (which probably kicks that 87% to ~100%+), and you also have a 512-bit bus and 32 ROPs, but to me those don't increase performance on their own; they are simply needed to feed the shaders (otherwise the 87% SP increase gets bottlenecked). They won't create more than an 87% increase either, except in bandwidth/ROP-limited situations, which I think are few judging by the 9800 GTX vs. 8800 GTX comparison, although at uber resolutions they will help more. So I expect a 100-110% increase for GT200 over the 8800 GTX.

The bottleneck on AMD was the TMUs, and those are doubled. You get 50% more shaders and 10% more clock (at least, assuming the rumoured split shader clock domains turn out to be false), which kicks the increases up to 120% for TMUs and 65% for shaders, relatively. My earlier game "analysis" based on 1.25x the 9800 GTX suggested an 82% performance increase. That is a rough theoretical doubling as well. Perhaps there are also some other minor improvements/speedup tweaks in RV770 (fixed AA ROPs? other minor tweaks?) we don't know about.

Also, if RV770 = 1.25x the 9800 GTX and GT200 = 2x the 9800 GTX, that works out to the 4870 being 62.5% of GT200, within my 60-70% range.
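
The back-of-envelope version of the above, spelled out (every input here is a guess or a rumoured spec from this post, not a benchmark):

```python
# Back-of-envelope scaling; every input is a guess or a rumoured spec, not a measurement.
g80_sp, gt200_sp = 128, 240
print(f"Raw SP increase over the 8800 GTX: {gt200_sp / g80_sp - 1:.1%}")               # ~87.5%

gt200_vs_9800gtx = 2.0     # guessed: GT200 ~2x a 9800 GTX
rv770_vs_9800gtx = 1.25    # guessed: RV770 ~1.25x a 9800 GTX
print(f"HD 4870 as a fraction of GTX 280: {rv770_vs_9800gtx / gt200_vs_9800gtx:.1%}")  # 62.5%

# RV770 unit scaling vs RV670, including the ~10% clock gain
print(f"Texturing: +{2.0 * 1.10 - 1:.0%}, shading: +{1.5 * 1.10 - 1:.0%}")             # +120%, +65%
```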


This does suggest the 4870 X2 could, in some rare cases of exceptional scaling, outperform GT200, but multi-GPU is such a hideous beast in my opinion that it's still an ugly solution, and a single card is far, far better.
 
"New forms of self expression" - most definitely :oops:
"Users can add interactive emotions like kissing and hugging" :LOL:
 
Because clocks across different architectures can't be compared directly.

Also, the bigger die and yield issues are probably why clocks are down... it just gets way too hot with higher clocks
 
They already support Main Profile today and 1080p. Apparently not High Profile yet (which is a pretty big problem, I'll agree), but that's just a lack-of-time thing; it's only slightly more complicated than Main Profile. Note that I got this info from an interview on PCInpact, a French website...
 
They already support Main Profile today and 1080p. Apparently not High Profile yet (which is a pretty big problem, I'll agree), but that's just a lack-of-time thing; it's only slightly more complicated than Main Profile. Note that I got this info from an interview on PCInpact, a French website...
None of these PR numbers are impressive.
More impressive would be FPS figures for Main Profile encoding with CABAC.
(About the mentioned interview: http://forum.doom9.org/showthread.php?p=1128724#post1128724
# jesse Says:
April 23rd, 2008 at 8:16 am

Sorry for the late reply, but I’m still recovering from a very successful NAB show.

At the show we demonstrated main profile H.264 encoding at realtime (1440×1080) using an 8800GT or a Quadro 3700. High profile is not as big a delta from main as main is from baseline. In other words, high profile contains only a few more encoding tools compared to main profile and getting these implemented will not take long. The 8×8 transforms in high profile will get us the best "bang for the buck" as far as quality goes. We plan on releasing high profile sometime late this fall.

If you are curious about the profiles, the wikipedia (as usual) has a great description and comparison of Main vs Baseline, vs High, vs High422, etc.
http://en.wikipedia.org/wiki/H.264

Thanks for reading, and keep the questions coming!
)


http://forum.doom9.org/showthread.php?t=137459

Dark Shikari said:
Avail Media is already working on CUDA for x264 ;)



Dark Shikari said:
Actually, basically everything can be reasonably done on the GPU except CABAC (which could be done, it just couldn't be parallelized).

x264 CUDA will implement a fullpel and subpel ME algorithm initially; later on we could do something like RDO with a bit-cost approximation instead of CABAC.
...
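
For anyone wondering what "fullpel ME" means: it's basically block matching over integer pixel offsets. A toy sketch of the idea (plain Python, nothing to do with the real x264 CUDA code):

```python
import numpy as np

def fullpel_search(cur_block, ref_frame, bx, by, search_range=16):
    """Toy exhaustive fullpel motion search using SAD.
    Purely illustrative; this is not the algorithm x264 (or its CUDA port) actually uses."""
    h, w = cur_block.shape
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue
            cand = ref_frame[y:y + h, x:x + w].astype(np.int32)
            sad = int(np.abs(cur_block.astype(np.int32) - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad
```

Every candidate offset's SAD is independent of every other one, which is why ME maps so nicely onto a GPU, unlike CABAC where each symbol depends on the previous one.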
 
That's a very naive analysis based on clocks only. You need to factor in relative increases in unit counts and relative performance of those units as well.

Did you read it? I did that.

On the GTX 280 analysis, I think I probably undercounted the impact of the missing mul just then, because I didn't apply it across the entire 187% of shaders. If you do that, you end up with more like 233% of shaders. This is all flawed because I don't have a real clue what the mul means for performance. Oh, and did the mul really come back or not? Did the 933 GFLOPS figure confirm or deny the mul?
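
The correction works out roughly like this (the 25% mul figure is still just a guess):

```python
sp_ratio  = 240 / 128        # ~1.875, i.e. the "187%" of 8800 GTX shader count
mul_guess = 1.25             # the guessed mul benefit from earlier
print(f"{sp_ratio * mul_guess:.2f}x")   # ~2.34x, roughly the "233%" above
```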
 
H.264 for the iPod is Level 3 without CABAC and other fancy things. What really matters is Level 4 or 5 with CAVLC/CABAC, and at much higher resolutions than for the iPod.
Meh, faster-than-realtime transcoding for uploading video to your iPhone could be marginally useful, and relevant to more customers than those doing offline encoding.

They both matter (a little).

PS: just knowing FPS in ideal situations is not that relevant; rushing through ME and mode decision with the simplest algorithms is good for FPS but not good for quality.
 
Well, yeah, CABAC is nearly impossible to parallelize. However, I wouldn't be surprised if you could accelerate it a bit if you were smart; either way Elemental claims CPU utilization is low, presumably also with CABAC given they already had it working when they said that, so let's wait and see.
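
For context on why CABAC resists parallelization: each binary symbol updates the coder state (and the adaptive context models) that the very next symbol depends on. A stripped-down illustration of that dependency chain (not a spec-accurate CABAC engine):

```python
def encode_bins(bins, probs):
    """Toy binary arithmetic coder: every step depends on state left by the previous step."""
    low, rng = 0.0, 1.0
    for bit, p_zero in zip(bins, probs):
        split = rng * p_zero
        if bit == 0:
            rng = split          # keep the lower sub-interval
        else:
            low += split         # keep the upper sub-interval
            rng -= split
        # 'low' and 'rng' (plus context model updates in real CABAC) carry into the next
        # iteration, so the bins form a strict serial dependency chain.
    return low, rng

print(encode_bins([1, 0, 1, 1], [0.6, 0.6, 0.4, 0.4]))
```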
 