NVIDIA GT200 Rumours & Speculation Thread

Sure, but if the author of the screenshot just wrote "I have learned from a reliable source..." and wrote down the specs, would that be a rumour or "unmitigated BS"? Then, if the specs look believable...

If you can't provide a REASONABLE picture of the actual card and die, then yes, it's unmitigated BS unless you have a damn damn damn good track record.

So if Arun or Anand or Kyle want to post the info? Sure, it might have some credibility, but random Joe Shmoe? Don't think so.

I have no problem with people making guesses and suppositions on what the specs will be, but don't try to pass it off as real info, which is what everyone posting these BS GPU-Z shots is trying to do.

Aaron Spink
speaking for myself inc.
 
I agree that posts like that one of HAL's with the GPU-Z screenshot and no word on its origin are quite useless to us, but... nobody ever had PCB/die shots until just a few weeks before launch. And just because you don't know someone doesn't necessarily mean he can't have the info.

Ah well, back to the topic. Here are my thoughts on the GT200, pure speculation:
- It will be released later than RV770. I've been keeping track of events since before G70 was released, and I think I can tell whether a chip is coming soon or is still some time away. G94 came as a bit of a surprise, yes, but high-end GPUs usually generate more buzz and are more predictable. Point is, there seems to be more info circulating about RV770 than about GT200, which could mean RV770 is closer.
- It will not support Direct3D 10.1. As spineless as nVidia is, I think this time they'll stick to their earlier statements and skip this version. It actually makes more sense not to implement it. GPUs based on the current GeForce 9 architecture already have the performance crown. nVidia sells more cards than ATi and S3 combined, and while the latter two companies support DX10.1, game devs are reluctant to use it if nVidia can't do it. Hence nVidia can mock the uselessness of those features of ATi chips, while its own chips, not burdened by the added transistor costs of those features, will win in games that don't support them. A perfect infinite cycle which only nVidia itself can exit, but it likes things the way they are now.
- It will beat the GX2. A 65nm chip with ~1 billion transistors should not be bigger than G80 and it should be possible to pack 40-50% more power (compared to G92) into such a transistor budget. That alone would probably not be enough to beat the current dual-chip card in all tests, but nVidia could go for the 55nm "performance" (TSMC 55GC) process, which would allow for higher clocks, roughly 30% higher, so that should do just fine.
(And then again, maybe not. Some unreliable sources speculate that GT200 could in fact be nothing more than a dual G92b, but I think there's too much talk about a big monolithic GPU for it all to be fake.)
 
Ah well, back to the topic. Here are my thoughts on the GT200, pure speculation:
- It will be released later than RV770.
If only to adjust its market position after RV770-based cards are released.
G100/GT200 has been ready for production for quite a while AFAIK.

- It will not support Direct3D 10.1. As spineless as nVidia is, I think this time they'll stick to their earlier statements and skip this version. It actually makes more sense not to implement it. GPUs based on the current GeForce 9 architecture already have the performance crown. nVidia sells more cards than ATi and S3 combined, and while the latter two companies support DX10.1, game devs are reluctant to use it if nVidia can't do it. Hence nVidia can mock the uselessness of those features of ATi chips, while its own chips, not burdened by the added transistor costs of those features, will win in games that don't support them. A perfect infinite cycle which only nVidia itself can exit, but it likes things the way they are now.
They'll need to support 10.1 anyway in the end, so why not do it sooner rather than later? After all it's such a small addition to the base 10 specs that they'll be able to support it with several minor tweaks to their G8x architecture.
Another point to consider is that some key 10.1 features can be implemented in the same 10.0 codepath, with a fallback to 10.0 if the GPU doesn't support 10.1. What this means is that more and more developers will use 10.1 for stuff like Z access and custom MSAA resolve with HDR -- it's not like it's some other renderer or something, it's just a minor addition to your (already long and complex) shaders.
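Roughly what that fallback looks like on the application side -- a minimal sketch of my own (not from any shipping engine) using the D3D10.1 runtime's D3D10CreateDevice1 entry point, asking for feature level 10.1 first and dropping to 10.0 if the GPU or driver can't do it:

```cpp
// Minimal sketch: create a D3D10.1 device at feature level 10.1 where the
// hardware supports it, otherwise fall back to plain 10.0. The application
// uses the same ID3D10Device1 interface either way; only the feature level
// (and thus which extra shader/API bits are legal) differs.
#include <d3d10_1.h>
#pragma comment(lib, "d3d10_1.lib")

ID3D10Device1* CreateBestDevice()
{
    ID3D10Device1* device = nullptr;
    const D3D10_FEATURE_LEVEL1 levels[] = {
        D3D10_FEATURE_LEVEL_10_1,   // try the full 10.1 path first
        D3D10_FEATURE_LEVEL_10_0    // fallback for 10.0-only GPUs
    };

    for (D3D10_FEATURE_LEVEL1 level : levels)
    {
        HRESULT hr = D3D10CreateDevice1(
            nullptr,                     // default adapter
            D3D10_DRIVER_TYPE_HARDWARE,
            nullptr,                     // no software rasteriser
            0,                           // no creation flags
            level,
            D3D10_1_SDK_VERSION,
            &device);
        if (SUCCEEDED(hr))
            return device;               // caller branches on GetFeatureLevel()
    }
    return nullptr;                      // no D3D10-class hardware at all
}
```

The renderer then just checks device->GetFeatureLevel(): the 10.1 path gets the extra bits (Gather4, custom MSAA resolve and so on), the 10.0 path keeps the existing shaders.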
So I think that NV _should_ support 10.1 in G1xx+. It's better for them and for the industry and for everybody and it's relatively simple to implement.
(As for the usefulness of 10.1 in the RV670, I'd agree with you if I hadn't seen the performance of AMD's own 10.1 ping-pong demo on the 3870X2 -- after that I tend to think that most of these features are useless on the RV670 because it simply can't handle them with decent performance.)
 
G100/GT200 has been ready for production for quite a while AFAIK.
Just like RV770?

They'll need to support 10.1 anyway in the end, so why not do it sooner rather than later? After all it's such a small addition to the base 10 specs that they'll be able to support it with several minor tweaks to their G8x architecture.
If it were a set of minor tweaks, these features would have made it into D3D10 :rolleyes:

Jawed
 
Do we know which specific DX10.1 items are lacking from G80 and R600 that prevent them from being branded as such? It looks like quite a few of those DX10.1 features could already be available in hardware, but because not all of them are, the API access isn't there.
 
DegustatoR said:
It's better for them and for the industry and for everybody and it's relatively simple to implement.
It was simple to implement for ATi, but that doesn't mean it would be simple for nVidia as well. R6xx was built for it, G8x was not. To nVidia, it doesn't matter for whom it's better if they implement DX10.1, with the sole exception of their own profits. Someone would have to design a DX10.1 chip/architecture, and that costs money. The new chip would require more transistors to pack the same power as the older ones, since implementing new features is not free. Bottom line is, they don't have to and they won't. While GT200 is another derivative of the phenomenal G80, nVidia's secretly working on a new generation of GPUs that will support DX11. DX10.1 is unnecessary, counter-productive even.
 
=>triniwboy: I think G80 & co. support plain DX10 and not one feature more, while R600 does have some DX10.1 features (but not all, so it can't be treated as a DX10.1 chip). That's because, when DirectX 10 was about to be released, the requirements had to be shaped in such a way that nVidia could put a DX10 sticker on G80. The rest (that wasn't supported by the G80) was added later in DX10.1.
 
Just like RV770?
No.

If it were a set of minor tweaks, these features would have made it into D3D10 :rolleyes:
It was simple for AMD to implement it in RV670 (need I remind you that R600 isn't DX10.1 compatible?) and I don't see any reason why it would be harder to implement them in an extension of the G8x architecture for NV. Do you?
They'll need to implement more changes to the ROPs than AMD, of course, but it's doable and nothing really stops them from doing it for the G1xx line.
 
I don't have that much insight into the architecture, but let's suppose it's doable. The question, though, is not "if" but "why".
 
It was simple to implement for ATi, but that doesn't mean it would be simple for nVidia as well. R6xx was built for it, G8x was not.
The only thing that was "built for it" in R600 is shader AA resolve, which is doable even on the current G8x architecture. Gather4 is a very minor tweak of the TMUs for NV, and independent blending modes for RTs weren't in R600 either, so adding them is more or less the same amount of work for R600 and G80. What else?
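For anyone wondering what Gather4/Fetch4 actually does: instead of a bilinearly filtered value, it returns the four texels that bilinear filtering would have blended, unfiltered, so the shader can filter them itself (shadow-map PCF being the classic use). A rough C++ illustration of the data it returns -- my own sketch, not any vendor's TMU path, and the component ordering is just illustrative:

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <vector>

// Conceptual illustration of Gather4 (DX10.1) / Fetch4 (ATI): return the four
// single-channel texels in the bilinear footprint of (u, v), unfiltered, so
// the shader can do its own filtering (e.g. percentage-closer filtering on a
// shadow map). Real hardware does this in the TMU in a single fetch.
std::array<float, 4> Gather4(const std::vector<float>& texels,
                             int width, int height, float u, float v)
{
    // Same 2x2 footprint that bilinear filtering would touch.
    float x = u * width  - 0.5f;
    float y = v * height - 0.5f;
    int x0 = std::max(static_cast<int>(std::floor(x)), 0);
    int y0 = std::max(static_cast<int>(std::floor(y)), 0);
    int x1 = std::min(x0 + 1, width  - 1);
    int y1 = std::min(y0 + 1, height - 1);

    // Ordering here is illustrative; the API defines its own component order.
    return { texels[y0 * width + x0], texels[y0 * width + x1],
             texels[y1 * width + x0], texels[y1 * width + x1] };
}
```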

To nVidia, it doesn't matter for whom it's better if they implement DX10.1, with the sole exception of their own profits.
Exactly. And having the fastest and fully functional DX10.1 card on the market means MONEY for them.

Someone would have to design a DX10.1 chip/architecture, and that costs money.
You're overestimating the complexity of this stage.

The new chip would require more transistors to pack the same power as the older ones, since implementing new features is not free.
Probably not. You'll save some transistors in the ROPs and then you'll use them for the new MRT logic. The other features of DX10.1 are simple. I don't think that a DX10.1 architecture needs more transistors than DX10 -- at least not in numbers worth mentioning.
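The "new MRT logic" here is D3D10.1's independent blend function per render target (in 10.0 all RTs share one blend function and you can only toggle it per RT). On the API side it's just the extended blend desc -- a minimal sketch, assuming an already-created ID3D10Device1:

```cpp
#include <d3d10_1.h>

// Sketch: independent per-render-target blending, new in D3D10.1.
// Assumes `device` is an ID3D10Device1 created at feature level 10.1.
ID3D10BlendState1* CreateMrtBlendState(ID3D10Device1* device)
{
    D3D10_BLEND_DESC1 desc = {};
    desc.AlphaToCoverageEnable  = FALSE;
    desc.IndependentBlendEnable = TRUE;   // the 10.1 addition

    // RT0: standard alpha blending.
    desc.RenderTarget[0].BlendEnable           = TRUE;
    desc.RenderTarget[0].SrcBlend              = D3D10_BLEND_SRC_ALPHA;
    desc.RenderTarget[0].DestBlend             = D3D10_BLEND_INV_SRC_ALPHA;
    desc.RenderTarget[0].BlendOp               = D3D10_BLEND_OP_ADD;
    desc.RenderTarget[0].SrcBlendAlpha         = D3D10_BLEND_ONE;
    desc.RenderTarget[0].DestBlendAlpha        = D3D10_BLEND_ZERO;
    desc.RenderTarget[0].BlendOpAlpha          = D3D10_BLEND_OP_ADD;
    desc.RenderTarget[0].RenderTargetWriteMask = D3D10_COLOR_WRITE_ENABLE_ALL;

    // RT1: additive blend in the same MRT pass (e.g. a glow buffer).
    desc.RenderTarget[1] = desc.RenderTarget[0];
    desc.RenderTarget[1].SrcBlend  = D3D10_BLEND_ONE;
    desc.RenderTarget[1].DestBlend = D3D10_BLEND_ONE;

    ID3D10BlendState1* state = nullptr;
    return SUCCEEDED(device->CreateBlendState1(&desc, &state)) ? state : nullptr;
}
```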

Bottom line is, they don't have to and they won't.
Bottom line is -- they have to and they will (DX11, yeah).
The question is -- will they add 10.1 support in G1xx, or will we have to wait for the DX11 architecture?
Considering that G100 is a somewhat significant G8x overhaul (AFAIK anyway), I don't see anything stopping them from supporting DX10.1 with it.
The only possible reason I can think of is that the G1xx architecture was designed before the DX10.1 specs were made available to them. It is possible, yes.

While GT200 is another derivative of the phenomenal G80, nVidia's secretly working on a new generation of GPUs that will support DX11. DX10.1 is unnecessary, counter-productive even.
DX11 is a long way off as far as I understand.
It wouldn't be wise for them to give a feature edge to AMD for such a long time.
As for the "counter-productive" bit -- see above, and try to remember that any DX version is a superset of all previous versions. They'll NEED to support DX10.1 in the end. And from what I know about the DX11 specs, I'd say that any 10.1 features implemented now will still be useful for a DX11 architecture.
 
G94 came as a bit of a surprise, yes

It did? Then you haven't been paying much attention if it caught you by surprise. ;)

After all it's such a small addition to the base 10 specs that they'll be able to support it with several minor tweaks to their G8x architecture.

NV will need to make a change to their TMUs in order to support DX10.1. AMD didn't have to make big changes to their TMUs since they have supported Fetch4 (which is Gather4 under DX10.1) ever since RV515/RV530/R580.

I'm not sure if you can consider a change to the TMUs "a minor tweak".
 
Do we know which specific DX10.1 items are lacking from G80 and R600 that prevent them from being branded as such?
Not that I'm aware of.

One thing that's pretty clear is that features were removed from D3D10 quite late in the day - and these features made up D3D10.1.

There may be features in D3D10.1 that were never planned for D3D10. As far as I can tell, D3D10.1 was always in the schedule. The question is why? Just natural evolution, a 1-year update (tick-tock: 10.0, 10.1, 11.0, 11.1 appears to have been the plan)?

S3's D3D10.1 support (their first D3D10.x GPU :?: ) seems to indicate that D3D10 was cut back quite heavily. Put simply, it appears S3 was aiming for the feature set of D3D10.1 all along, but back then it was going to be D3D10.0.

It looks like quite a few of those DX10.1 features could already be available in hardware, but because not all of them are, the API access isn't there.
It's very likely. After all, the transistor count differences for RV610->RV620 and RV630->RV635 indicate a great deal of similarity ;)

In summary, it's quite easy to point the finger at NVidia and Intel for the cut-backs that resulted in D3D10 :cry:

Jawed
 
It was simple for AMD to implement it in RV670 (need i remind you that R600 isn't DX10.1 compatible?)
Some of D3D10.1 might have been opportunistically added - we can't tell (e.g. increased precision? - leading into double-precision support?). But as I've already said, the stark similarity of RV610/620 and RV630/635 (the earlier GPUs were planned to release <6 months after R600) indicates that the fundamentals of D3D10.1 were in place.

and i don't see any reason why it would be harder to implement them in an extension of G8x architecture for NV. Do you?
Something like the count of vertex attributes strikes me as a fundamental design issue. D3D10.1 requires a doubling in the count of attributes per vertex. This is a basic "bus width" and buffering constraint within a GPU. It's sort of similar to specifying that the GPU should set up two triangles per clock instead of 1 (I'm not suggesting you'd want 2 per clock). Every one of the 16 SIMDs in G80 needs to be able to input and output twice as much vertex data per clock to meet the D3D10.1 spec. Because VS output will probably be routed to a different SIMD, this seems like a non-trivial change, involving buses that link the SIMDs.
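To put rough numbers on that doubling (my own arithmetic, using nothing beyond the register counts: 16 four-component 32-bit attributes per vertex in D3D10.0, 32 in D3D10.1):

```cpp
#include <cstdio>

int main()
{
    // Each attribute is one 4-component register of 32-bit values.
    const int bytesPerAttribute = 4 * 4;   // 16 bytes

    const int d3d10_0_attribs = 16;        // D3D10.0 VS input/output limit
    const int d3d10_1_attribs = 32;        // doubled in D3D10.1

    // Worst-case data moved per vertex, per direction (input or output).
    std::printf("D3D10.0: %d bytes/vertex\n", d3d10_0_attribs * bytesPerAttribute); // 256
    std::printf("D3D10.1: %d bytes/vertex\n", d3d10_1_attribs * bytesPerAttribute); // 512
    return 0;
}
```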

Then there's the output limitation of 1024 32-bit values per GS invocation. ATI hardware is clearly designed to support far more (more than one ATI person has indicated as much), and the fact that people complain about the low limit indicates that some IHVs were struggling, necessitating the reduction. Something that wasn't corrected in D3D10.1.
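That 1024-scalar ceiling is what bounds the [maxvertexcount] a GS can declare: max vertices times output vertex size in 32-bit values has to fit under it. A quick illustration of how fast it runs out (my numbers, just the arithmetic):

```cpp
#include <cstdio>

int main()
{
    const int maxScalars = 1024;   // D3D10/10.1 GS output budget per invocation

    // Output vertex sizes in 32-bit scalars: position only, position plus a
    // few attributes, and a fat vertex using many attribute slots.
    const int vertexSizes[] = { 4, 16, 32, 128 };

    for (int scalarsPerVertex : vertexSizes)
        std::printf("%3d scalars/vertex -> at most %d output vertices\n",
                    scalarsPerVertex, maxScalars / scalarsPerVertex);
    // Prints: 4 -> 256, 16 -> 64, 32 -> 32, 128 -> 8
    return 0;
}
```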

They'll need to implement more changes to the ROPs then AMD of course but it's doable and nothing really stops them from doing it for the G1xx line.
Nothing stops them, I agree. But the range of cut-backs (and the resulting "asymmetries") affecting D3D10 indicates that some IHVs were significantly behind on things that couldn't be easily fixed.

Since NVidia was planning a more radical GPU for last November, in time for D3D10.1, it seems reasonable to suppose GT200 will be D3D10.1. At the same time, I can't help being pessimistic...

Jawed
 
This is a basic "bus width" and buffering constraint within a GPU. It's sort of similar to specifying that the GPU should set up two triangles per clock instead of 1 (I'm not suggesting you'd want 2 per clock). Every one of the 16 SIMDs in G80 needs to be able to input and output twice as much vertex data per clock to meet the D3D10.1 spec. Because VS output will probably be routed to a different SIMD, this seems like a non-trivial change, involving buses that link the SIMDs.
It's not related to (internal) bus width at all: you could even support 1k input attributes per shader, as the hardware just needs to be able to fetch a single attribute per instruction.
What you need to support this feature is probably a bigger pre-transformed vertex cache.
 
It's not related to (internal) bus width at all: you could even support 1k input attributes per shader, as the hardware just needs to be able to fetch a single attribute per instruction.
What you need to support this feature is probably a bigger pre-transformed vertex cache.
I agree about the cache.

The peculiarity here is that the GS can generate up to 32 attributes per vertex when it feeds setup/rasteriser. The 16-attribute limitation affects just the input/output of the VS. We've heard in the past that when the GS is amplifying data (vertex count or data per vertex, presumably), G80 performance falls off - the hypothesis being that the GS in this mode is only running on a single multiprocessor (or single cluster?). This appears to jibe with attribute bandwidth, per se, being a problem in NVidia's existing architecture.

So I think these limitations are connected and are about more than simply on-die buffer space.

Jawed
 
NV will need to make a change to their TMUs in order to support DX10.1. AMD didn't have to make big changes to their TMUs since they have supported Fetch4 (which is Gather4 under DX10.1) ever since RV515/RV530/R580.
Hasn't NV been doing something very similar to Gather4 since NV2x?
 