Nvidia GT200b rumours and speculation thread

I really hope we'll get some official information on this chip after the R700 performance previews (are they still scheduled for tomorrow?) are published.

Maybe Nvidia wants to spoil the R700 launch like they did with the HD 4850's launch (9800 GTX+).
 
Wouldn't the card have to be 512bit if they enabled the remaining cluster?

Trini already answered, but if you want proof of that, look no further than any run-of-the-mill 8800 GT.
It has a disabled cluster, yet retains the 256bit memory interface and 512MB GDDR3 capacity of its 8800 GTS 512MB, 9800 GTX/GTX+ and 9800 GX2 cousins.
 
Well, they can independently disable shader clusters or ROP clusters (the latter tied to memory: 4 ROPs = 64bit). A shader cluster is two multiprocessors (8 SPs + SFU + registers) on G8x/G9x and three on GT200, so 16 SPs and 24 SPs respectively.

So with G92 you have 128SP/256bit (the full GPU), 112SP/256bit (8800 GT, 9800 GT) and 96SP/192bit (8800 GS).
GTX 280 is 240SP/512bit and GTX 260 is 192SP/448bit, so they have 10 and 8 shader clusters, and 8 and 7 ROP partitions, respectively.

Just recalling those boring facts because there seems to be some confusion around those dreadful "clusters". :)
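To make the cluster math concrete, here's a trivial sketch that just multiplies out the figures above (the helper name is made up, not anything official):

```python
# Sanity-checking the cluster arithmetic above. Each ROP partition
# (4 ROPs) carries a 64bit slice of the memory bus.
def config(sp_per_cluster, shader_clusters, rop_partitions):
    return shader_clusters * sp_per_cluster, rop_partitions * 64

# G8x/G9x parts: 16 SPs per shader cluster
print(config(16, 8, 4))   # full G92: (128, 256)
print(config(16, 7, 4))   # 8800 GT / 9800 GT: (112, 256)
print(config(16, 6, 3))   # 8800 GS: (96, 192)

# GT200 parts: 24 SPs per shader cluster
print(config(24, 10, 8))  # GTX 280: (240, 512)
print(config(24, 8, 7))   # GTX 260: (192, 448)
```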
 
I'd rather believe Nvidia vastly underestimated RV770. :)
Can't say I blame them. ATI's engineering didn't have much to brag about with their previous DX10 efforts.

NVidia may be in a tough spot now due to their commitment to CUDA. I think Jawed pointed this out, but it seems like they're stuck with 8-wide SIMDs. ATI basically has 16x5 SIMDs right now and there's no pressing need to go for better granularity. Even after 55nm scaling, the former is more than half the size of the latter, and despite increased utilization and clock speed, that's not even close to being small enough.

I think computational speed is starting to matter less, though. Games are probably using a bit more math, but it's not increasing as fast as GPU ability. We'll see if GT300 has some innovations there.
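For a rough sense of the math gap being discussed, here's a back-of-envelope peak-FLOPS comparison. The clocks and per-clock rates are the commonly quoted figures, my assumptions rather than anything from this thread:

```python
# Back-of-envelope peak programmable-shader FLOPS. Clocks and per-clock
# issue rates are the commonly quoted figures, not measurements.
def peak_gflops(alus, clock_mhz, flops_per_alu_per_clock):
    return alus * clock_mhz * flops_per_alu_per_clock / 1000.0

# GTX 280: 240 SPs @ 1296 MHz, MAD + MUL co-issue = 3 flops/clock
print(peak_gflops(240, 1296, 3))  # ~933 GFLOPS
# HD 4870 (RV770): 10 SIMDs x 16 x 5 = 800 ALUs @ 750 MHz, MAD = 2 flops/clock
print(peak_gflops(800, 750, 2))   # 1200 GFLOPS
```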
 
NVidia may be in a tough spot now due to their commitment to CUDA. I think Jawed pointed this out, but it seems like they're stuck with 8-wide SIMDs. ATI basically has 16x5 SIMDs right now and there's no pressing need to go for better granularity. Even after 55nm scaling, the former is more than half the size of the latter, and despite increased utilization and clock speed, that's not even close to being small enough.

In terms of math capability I think Nvidia can keep up even with the more expensive 8-way SIMD approach.

What I don't get is why GT200 seems to have a lot more supporting logic than RV770. For example, ALU+TEX on RV770 seems to be a larger percentage of the die than ALU+TEX on GT200, even with NVIO parceled out to a separate chip. Since a lot of the arbitration logic is part of the clusters, what is all the extra stuff on the GT200 die?
 
NVidia may be in a tough spot now due to their commitment to CUDA. I think Jawed pointed this out, but it seems like they're stuck with 8-wide SIMDs. ATI basically has 16x5 SIMDs right now and there's no pressing need to go for better granularity. Even after 55nm scaling, the former is more than half the size of the latter, and despite increased utilization and clock speed, that's not even close to being small enough.

I think computational speed is starting to matter less, though. Games are probably using a bit more math, but it's not increasing as fast as GPU ability. We'll see if GT300 has some innovations there.

Can you elaborate as to why you think NV is in a tough spot due to their commitment to CUDA?
 
Since a lot of the arbitration logic is part of the clusters, what is all the extra stuff on the GT200 die?
NVidia's connecting 10 clusters to 8 ROP partitions, whereas ATI's connecting 10 clusters to 4 MCs. The interconnection logic scales faster than either side being connected - it's a combinatorial explosion.

The sheer quantity of memory bus pins is prolly also a factor.
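A toy way to see that scaling, nothing more than counting crossbar endpoints:

```python
# A full crossbar between N shader clusters and M memory-side units has
# on the order of N*M routes, so the interconnect grows multiplicatively
# while each side only grows linearly.
def crossbar_routes(clusters, memory_units):
    return clusters * memory_units

print(crossbar_routes(10, 8))  # GT200: 10 clusters x 8 ROP partitions = 80
print(crossbar_routes(10, 4))  # RV770: 10 clusters x 4 MCs = 40
```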

Jawed
 
Can you elaborate as to why you think NV is in a tough spot due to their commitment to CUDA?
IMO they really don't want to move away from 8-wide SIMDs, as that is something they want to keep consistent in their GPU computing framework. ATI hasn't made any such commitment.

In terms of math capability I think Nvidia can keep up even with the more expensive 8-way SIMD approach.
Sure, but not at the same cost as ATI. NVidia has a long history of designing as optimally as possible for a given set of design constraints (NV3x aside). I'm pretty sure their current design can't get any smaller.

What I don't get is why GT200 seems to have a lot more supporting logic than RV770. For example, ALU+TEX on RV770 seems to be a larger percentage of the die than ALU+TEX on GT200, even with NVIO parceled out to a separate chip. Since a lot of the arbitration logic is part of the clusters, what is all the extra stuff on the GT200 die?
I don't think you're right about that. The ALU space is ~25% of the die on both. TEX space is about the same as ALU space on NV's DX10 chips, whereas for ATI the TEX area is a lot smaller than the ALUs. It works out to ~40% ALU+TEX on RV770 and ~50% ALU+TEX on GT200 and G92/G80.

It's not just the ALUs that are awesome in RV770; what ATI can do with 40 seemingly small TMUs is quite impressive compared to the 64 TMUs in G92. Xbit Labs tests some fairly texture-intensive shaders (see R580 vs. R520), and RV770 is still beating G92 in them (here).

Anyway, in the "extra stuff" there's still a lot of arbitration logic to decide which workloads go to which cluster. There's still all the rasterization w/ Z-cull, which needs to feed the shaders twice as fast to take advantage of twice the ROPs in GT200. IMO, 50% non-shader space on GT200 doesn't seem out of place compared to 60% in RV770, all things considered.
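If you turn those eyeballed percentages into absolute area, the comparison is starker. The die sizes below are the commonly cited approximations (~256mm² for RV770, ~576mm² for GT200), my numbers rather than anything measured in this thread:

```python
# Rough ALU+TEX area in mm^2 from the eyeballed fractions above.
# Die sizes are the commonly cited approximations, not measurements.
for name, die_mm2, alu_tex_frac in (("RV770", 256, 0.40),
                                    ("GT200", 576, 0.50)):
    print(name, die_mm2 * alu_tex_frac, "mm^2 ALU+TEX,",
          die_mm2 * (1 - alu_tex_frac), "mm^2 everything else")
```

So even at 50% ALU+TEX, GT200's "everything else" is nearly twice RV770's in absolute terms, which is what makes the supporting logic question interesting.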
 
IMO they really don't want to move away from 8-wide SIMDs, as that is something they want to keep consistent in their GPU computing framework. ATI hasn't made any such commitment.
If only CUDA were not so close to the hardware and a little bit more abstract...
They would probably have lost a bit of performance here and there, but they would not have found themselves in this situation.
I guess a future CUDA revision is going to address this issue.
 
Yup, that's why I think they're in a tough spot. They have to choose between changing some CUDA fundamentals and letting ATI keep the ALU per-mm2 efficiency crown (which may not be so bad). NVidia would rather not have to do either.

I'm sure they felt that they made the right choice when R600 was out, and still felt fine with RV670. Only with RV770 does this years-old decision look a bit restricting.
 
Well, remember that it does give them better branching granularity and dependent instruction throughput (though I think ATI can achieve the latter with minimal cost as well, as I've argued before), and I think they have the option to get even better granularity if they want.

I don't think it's too useful right now, but it could be in the future, especially for non-graphics loads. I think if NVidia improves its texturing, memory controller, and AA performance, the areal math inefficiency may not matter for games.

However, gaudy math numbers must look tempting for HPC customers too, and if AMD pushes FireStream hard then NVidia may have no choice but to do what you suggested.
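On the branching-granularity point, here's a simple expected-value sketch: if any thread in a SIMD batch takes a branch, the whole batch executes it, so smaller batches waste less work on rarely taken branches. The batch sizes below (32-thread warps vs. 64-thread wavefronts) are the usual figures, assumed here for illustration:

```python
# Expected fraction of wasted branch-body work as a function of batch size.
# A batch executes the branch body if at least one of its threads takes it.
def wasted_fraction(batch_size, p_taken):
    p_batch_runs = 1 - (1 - p_taken) ** batch_size
    useful = p_taken / p_batch_runs  # useful share of executed slots
    return 1 - useful

for batch in (32, 64):  # assumed warp/wavefront sizes
    print(batch, round(wasted_fraction(batch, 0.05), 3))
# Smaller batches waste a smaller fraction of the branch-body work.
```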
 
Slight tangent: what do you guys think of the bulk synchronous parallel processing model? There's a paper from Microsoft Research at SIGGRAPH 08 on it: BSGP: Bulk-Synchronous GPU Programming; scroll down a bit for the paper.
Thanks for the link, I just finished reading it. It looks like a very interesting model and implementation, and the fact that they developed a non-trivial application on it (the X3D parser) makes it a lot more credible.

However, I think it's more than just a slight tangent from the topic of this thread. Perhaps this part should be split off to a separate thread in the GPGPU forum?
 
Fudo says GT200b will be here in September, maybe even August:

http://www.fudzilla.com/index.php?option=com_content&task=view&id=8515&Itemid=1

We said a few months ago that Nvidia is driving two projects in parallel. The 65nm GT200 that got launched and branded as GTX 280 / 260 is out, and there is a 55nm GT200 chip that should be launched shortly.

Our sources are telling us that the 55nm version of the chip should be ready either in late August or in September, which means that the Radeon HD 4870 X2 will get some competition.

We believe that shader and core clocks of the 55nm GT200 are definitely going to be higher than the 65nm version's, and the chip itself should run a bit cooler.

This means R700 will get some better competition, and the fact that this dual card from ATI is going to end up faster than the GTX 280 doesn't mean ATI has already won the war.
 