NVIDIA GT200 Rumours & Speculation Thread

I see your point, but I still find it interesting that NVIDIA went from the 8800 GTX to the 9800 GTX while offering very little (at best) performance improvement and no major technological improvement between the 8 series and 9 series over a span of 1.5 years. This is in stark contrast to the pattern of the 4-->5-->6-->7-->8 series, where each new series had a radically different design from the prior one. Only in the GeForce 2-->4 days did we see relatively small technological leaps, and the main reason for that was a lack of strong competition. Certainly the lack of strong competition today is playing a part in the 8-->9 series transition too.

Surely by now, NVIDIA has learned never to underestimate the competition after what happened with the R300/NV30 generation. I can't see them being happy with just "getting by" on being slightly ahead of the competition. The only reasonable conclusion I can come to is that they are holding their best cards close to their chest in the face of relatively light competition (in the high-end market) today. I would be very surprised if GT200 is not a big leap in some areas. As far as I'm concerned, the 9800 is just a placeholder to get by until something much better comes along.


I agree.


The difference between the GeForce 8 series and GeForce 9 series is more like the tweaks we saw in the GeForce 2, 4, 6 and 7 series, or basically any of the very minor refinements to any given member of the GeForce line. Obviously the GF9 series is not worthy of the name; it's little more than the GF8.

Even though GT200 will most likely be a refresh/overhaul of G80 (like NV47/G70 was over NV40) instead of a clean-sheet architecture like NV40 was in 2004 and G80 was in 2006, GT200 will be the first GPU that's really worthy of being called GeForce 9, even though Nvidia will no doubt give it the GeForce 10 name.

We're still basically looking forward to the long-awaited "NV55", as I've said before.

I'm sure Nvidia is saving its biggest gun, NV60 (what will presumably be called GeForce 11), the next clean-sheet architecture, for 2009-2010 to take on Larrabee.

Nvidia changes the way it assigns codenames too often now, so I use the most easily understood names, NV55 and NV60, even though those aren't official Nvidia internal codenames.
 
Jeff Brown: less than 1 billion transistors

Barrons: 240 ALUs

The transistor count and number of ALUs seem to indicate that the GT200 is just an enhanced G80 with a 512-bit bus. In addition, GDDR5 can give ATI and Nvidia leverage in bandwidth-intensive applications. Personally, I bet the GT200 will be launched by July if the RV770 X2 cannot match the projected GT200 performance.
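For a sense of scale, here's a quick back-of-the-envelope sketch of the theoretical peak bandwidth these configurations would imply. The clocks are the rumoured figures from this thread plus a purely hypothetical GDDR5 setup, not confirmed specs:

```c
#include <stdio.h>

/* Peak bandwidth = bus width in bytes * effective transfer rate. */
static double peak_gb_s(int bus_bits, double effective_mhz)
{
    return (bus_bits / 8.0) * (effective_mhz * 1e6) / 1e9;
}

int main(void)
{
    /* Rumoured GT200: 512-bit bus at ~2200MHz effective. */
    printf("512-bit @ 2200MHz: %.1f GB/s\n", peak_gb_s(512, 2200.0));
    /* Hypothetical GDDR5 config: ~4000MHz effective on a 256-bit bus. */
    printf("256-bit @ 4000MHz: %.1f GB/s\n", peak_gb_s(256, 4000.0));
    /* G80 (8800 GTX) for reference: 384-bit at 1800MHz effective. */
    printf("384-bit @ 1800MHz: %.1f GB/s\n", peak_gb_s(384, 1800.0));
    return 0;
}
```

So a 512-bit GDDR3 setup (~140 GB/s) and a narrower GDDR5 one land in the same ballpark, which is presumably the leverage being talked about.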
 
Jeff Brown: less than 1 billion transistors

Barrons: 240 ALUs

The transistor count and number of ALUs seem to indicate that the GT200 is just an enhanced G80 with a 512-bit bus. In addition, GDDR5 can give ATI and Nvidia leverage in bandwidth-intensive applications. Personally, I bet the GT200 will be launched by July if the RV770 X2 cannot match the projected GT200 performance.

Supposed specs:
65nm
240 SPs in ten 24-SP clusters
120 TMU/TFUs
32 ROPs
512-bit bus
1GB of ~2200MHz effective DDR3
600-700MHz core
~1500MHz shader domain

Source
He's the one who kept talking about a G90 being released in November with a ~2.4GHz shader clock.

Am I the only one who thinks that "about 1 billion transistors" sounds quite a bit lower than we'd expect?
Eh, 1.3-1.4b is more likely IMO.
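Taking those rumoured numbers at face value, peak shader throughput would land around 1 TFLOP. A quick sketch, assuming the G80-style counting of 3 flops per SP per clock (MADD + MUL), which may not carry over to GT200:

```c
#include <stdio.h>

int main(void)
{
    const int    sps          = 240;  /* rumoured: ten clusters of 24 SPs */
    const double shader_ghz   = 1.5;  /* rumoured shader-domain clock */
    const double flops_per_sp = 3.0;  /* MADD + MUL, G80-style counting */

    /* SPs * GHz * flops/clock = GFLOPS; ~1080 for the rumoured GT200. */
    printf("GT200 (rumoured): %.0f GFLOPS\n", sps * shader_ghz * flops_per_sp);
    /* G80 (8800 GTX) for scale: 128 SPs at 1.35GHz -> ~518 GFLOPS. */
    printf("G80:              %.0f GFLOPS\n", 128 * 1.35 * 3.0);
    return 0;
}
```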
 
Why 120 TMUs anyway? I've no idea what it'll look like, but is there some kind of architectural restriction that dictates 4 TMUs per 8 ALUs, or what?

By the way, I'm not entirely sure, but the G80's 32 TA/64 TF arrangement sounded better than the 64 TA/64 TF thingy on G92.

Strange; I expected the ALU:TEX ratio to change from 2:1 to something more like 3:1 this time around. Do we really need that much fillrate these days, or is this some leftover of the ancient "fillrate is king" philosophy?
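Interestingly, once the separate clock domains are factored in, the rumoured specs barely move the effective ALU:TEX ratio versus G80. Rough arithmetic (650MHz is just the midpoint of the rumoured 600-700MHz core range):

```c
#include <stdio.h>

/* Effective ALU:TEX = (SPs * shader clock) / (TMUs * core clock). */
static double alu_tex(int sps, double shader_mhz, int tmus, double core_mhz)
{
    return (sps * shader_mhz) / (tmus * core_mhz);
}

int main(void)
{
    /* G80: 128 SPs @ 1350MHz, 64 texture filtering units @ 575MHz. */
    printf("G80:            %.1f : 1\n", alu_tex(128, 1350.0, 64, 575.0));
    /* Rumoured GT200: 240 SPs @ ~1500MHz, 120 TMU/TFUs @ ~650MHz. */
    printf("GT200 (rumour): %.1f : 1\n", alu_tex(240, 1500.0, 120, 650.0));
    return 0;
}
```

In other words, the rumoured 2:1 per-clock ratio works out to roughly the same effective ~4.6:1 as G80 once clocks are considered, not the 3:1 shift expected above.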
 
Supposed specs:
65nm
240 SPs in ten 24-SP clusters
120 TMU/TFUs
32 ROPs
512-bit bus
1GB of ~2200MHz effective DDR3
600-700MHz core
~1500MHz shader domain

Source
He's the one who kept talking about a G90 being released in November with a ~2.4GHz shader clock.


Eh, 1.3-1.4b is more likely IMO.

These specs are rather unbelievable. Do you think they could be achieved at 65nm? And is there any possibility of hitting these specs with only 250M more transistors than G92? On the other hand, he is saying something about 1.3 billion transistors in GT200, in which case these specs could be true.
:)
 
Why 120 TMUs anyway? I've no idea what it'll look like, but is there some kind of architectural restriction that dictates 4 TMUs per 8 ALUs, or what?

By the way, I'm not entirely sure, but the G80's 32 TA/64 TF arrangement sounded better than the 64 TA/64 TF thingy on G92.

Strange; I expected the ALU:TEX ratio to change from 2:1 to something more like 3:1 this time around. Do we really need that much fillrate these days, or is this some leftover of the ancient "fillrate is king" philosophy?

Well, he had originally posted 60 TMUs and 120 TFUs but then changed it to what you see.
 
Supposed specs:
65nm
240 SPs in ten 24-SP clusters
120 TMU/TFUs
32 ROPs
512-bit bus
1GB of ~2200MHz effective DDR3
600-700MHz core
~1500MHz shader domain

Source
He's the one who kept talking about a G90 being released in November with a ~2.4GHz shader clock.


Eh, 1.3-1.4b is more likely IMO.

A can of whoopass!
 
Did anyone take notice of this slide during the Analysts Day presentation?
Could it be a simple illustration of CUDA and thread management, or does the actual number give away something else (other than 1U setups)? ;)
 
I rather think that relates to 9800 GX2 Quad-SLI, where Nvidia counts every ALU as a core, but only in its GPUs, disregarding the SSE2 units in the quad-core CPUs.
 
I rather think that relates to 9800 GX2 Quad-SLI, where Nvidia counts every ALU as a core, but only in its GPUs, disregarding the SSE2 units in the quad-core CPUs.
Well, you don't have an instruction pointer per element in an SSE vector, but yeah - it's a massive load of bullshit if I say so myself. The 'stream processor' nomenclature is already a bit stupid, but it's far from the 'errr, that's FUD' line from a legal POV. Calling them cores, on the other hand, is really pushing it IMO...

Going back to GT200, could it be that both 240 SPs and 384 SPs are correct? I.e., GT200 would have 80 TMUs, 240 SPs and 32 ROPs with GDDR3, and a refresh coming out later would have 96 TMUs, 384 SPs and 32 ROPs with GDDR5? That would be quite aggressive on the same process node (or just one half-node ahead, i.e. 65->55nm), but not entirely impossible, especially given that GT200 apparently did get delayed.
 
Well, you don't have an instruction pointer per element in an SSE vector, but yeah - it's a massive load of bullshit if I say so myself. The 'stream processor' nomenclature is already a bit stupid, but it's far from the 'errr, that's FUD' line from a legal POV. Calling them cores, on the other hand, is really pushing it IMO...

Well, let's just say I'm fairly certain that they don't have an instruction pointer for each SIMD element/lane in G80 either! Each lane of an SSE SIMD is just as much a "core" or "processor" as each lane in a G80 SIMD, which is to say it's the correct terminology about as much as calling lead gold! It's also a load of marketing BS so blatant as to have most of Nvidia legal up all night sweating the eventual class-action lawsuit!

Look at it this way: if AMD/Intel wanted to use the same BS for real processors, they could sell something like a Core 2 Quad as a 56 "imaginary BS core" design! This marketing BS just needs to stop IMNSHO; it's gotten entirely out of hand.

Aaron Spink
speaking for myself inc.
 
Well, let's just say I'm fairly certain that they don't have an instruction pointer for each SIMD element/lane in G80 either!
Well, it doesn't matter how you think it's implemented; the fact remains that you can run branches and have the 16 or 32 elements inside a batch each diverge in a different direction. Good luck doing that with SSE in the same way... But yes, that's so ridiculously far from being able to call it a 'core' it's not even funny. Although I would argue it is such an incredibly imprecise term that I'm curious what arguments would be brought forward in a class-action suit...

I mean, just look at the first definition you stumble upon via Google: "The processing part of a CPU chip minus the cache. It is made up of the control unit and the arithmetic logic unit (ALU)." - I think it'd be ridiculously easy to prove that there is a per-ALU 'control unit' even if it's ridiculously small and doesn't handle much of anything. And don't even get me started on what you'd get from traditional dictionaries. I agree the way they're using it is massively ridiculous either way though (and may or may not prove illegal in a class action suit), and it does need to stop ASAP.
 
Well, let's just say I'm fairly certain that they don't have an instruction pointer for each SIMD element/lane in G80 either! Each lane of an SSE SIMD is just as much a "core" or "processor" as each lane in a G80 SIMD, which is to say it's the correct terminology about as much as calling lead gold!

Side question: how is SSE set up? Is it serial scalar like G80, vector-based with no co-issue capability, or VLIW like R600?
 
Well, it doesn't matter how you think it's implemented; the fact remains that you can run branches and have the 16 or 32 elements inside a batch each diverge in a different direction. Good luck doing that with SSE in the same way... But yes, that's so ridiculously far from being able to call it a 'core' it's not even funny. Although I would argue it is such an incredibly imprecise term that I'm curious what arguments would be brought forward in a class-action suit...

I can do it inside SSE rather easily. I really don't consider mask registers and running serially to count as instruction pointers. Mask registers have been around since the '60s and have been used with vectors since the '60s. It's inefficient as hell, just like on G80.
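For illustration, a minimal sketch of that mask-register approach with SSE intrinsics; both sides of the per-lane "branch" execute for all four lanes, and a compare mask blends the results, much as a G80 batch does on divergence:

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Per-lane branch: r[i] = (a[i] > b[i]) ? a[i] - b[i] : a[i] + b[i] */
static __m128 diverge(__m128 a, __m128 b)
{
    __m128 mask      = _mm_cmpgt_ps(a, b);  /* all-ones where a > b */
    __m128 taken     = _mm_sub_ps(a, b);    /* 'then' path, all lanes */
    __m128 not_taken = _mm_add_ps(a, b);    /* 'else' path, all lanes */
    /* Select per lane: (mask & taken) | (~mask & not_taken). */
    return _mm_or_ps(_mm_and_ps(mask, taken),
                     _mm_andnot_ps(mask, not_taken));
}
```

Both paths run for every lane regardless of the mask, which is exactly the inefficiency being described.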

I mean, just look at the first definition you stumble upon via Google: "The processing part of a CPU chip minus the cache. It is made up of the control unit and the arithmetic logic unit (ALU)." - I think it'd be ridiculously easy to prove that there is a per-ALU 'control unit' even if it's ridiculously small and doesn't handle much of anything. And don't even get me started on what you'd get from traditional dictionaries. I agree the way they're using it is massively ridiculous either way though (and may or may not prove illegal in a class action suit), and it does need to stop ASAP.

As far as a class-action suit goes, just take H&P.

Aaron Spink
speaking for myself inc.
 
Side question: how is SSE set up? Is it serial scalar like G80, vector-based with no co-issue capability, or VLIW like R600?

It's whatever you want the software to do, with varying degrees of efficiency. As far as the actual hardware is concerned, like all the other devices out there (G80/R600), it's just a SIMD pipeline.
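A small sketch of that point: the same SSE hardware runs "scalar" or "vector" code depending purely on which instruction forms the software issues:

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Scalar form: operates on lane 0 only; upper lanes pass through from a. */
static __m128 add_scalar(__m128 a, __m128 b)
{
    return _mm_add_ss(a, b);
}

/* Packed form: the same adder used as a 4-wide vector operation. */
static __m128 add_packed(__m128 a, __m128 b)
{
    return _mm_add_ps(a, b);
}
```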

Aaron Spink
speaking for myself inc.
 
Well, it doesn't matter how you think it's implemented; the fact remains that you can run branches and have the 16 or 32 elements inside a batch each diverge in a different direction. Good luck doing that with SSE in the same way... But yes, that's so ridiculously far from being able to call it a 'core' it's not even funny. Although I would argue it is such an incredibly imprecise term that I'm curious what arguments would be brought forward in a class-action suit...
I'm not sure what GPUs would do differently in that case, at least at the hardware level.
At the level of individual units, it's still some kind of serialized execution and then masking off of invalid elements.
The recombination is implicitly handled by the GPU hardware, while x86 is more explicit, but I don't see cores defined too much by the software side of things.

I mean, just look at the first definition you stumble upon via Google: "The processing part of a CPU chip minus the cache. It is made up of the control unit and the arithmetic logic unit (ALU)." - I think it'd be ridiculously easy to prove that there is a per-ALU 'control unit' even if it's ridiculously small and doesn't handle much of anything.
The "control unit" for a G80 channel is a far cry from the front-end of a CPU.

I've been trying to puzzle through the bare minimum for what could be considered a "core".

There are tons of different variations on the theme, but I think it might be useful to try to boil things down to some minimum.

My favorite thus far for 1 core: 1 physically independent instruction issue network paired with at least one decoder, an amount of physical hardware capable of managing one or more instruction pointers, access to at least one ALU, and the equivalent of at least 1 read/write port or the ability to somehow get reads/writes to memory.

It's really the independent issue network + physical instruction pointer that seems the hardest to make fuzzy with clustering and SIMD execution.
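Boiled down to a sketch (purely illustrative; the field names are mine, not any vendor's spec), that proposed minimum looks something like:

```c
/* Illustrative model of the proposed minimum for "1 core". */
struct minimal_core {
    struct issue_network *issue;    /* physically independent instruction issue */
    struct decoder       *decode;   /* at least one decoder */
    unsigned long         ip;       /* hardware managing >= 1 instruction pointer */
    struct alu           *alu;      /* access to at least one ALU */
    struct mem_port      *rw_port;  /* >= 1 read/write path to memory */
};
```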
 
Here is further information regarding the release date. It is definitely set for July, with more details on the GT200 coming from Nvidia in May:

http://xbitlabs.com/news/video/disp...ribes_Next_Generation_Graphics_Processor.html

Looks like Computex is going to be quite interesting... I'm only $250 from there, so maybe I should fly over and check it out for everyone. I would really like to know if this is DX10 or DX10.1... not that it's a deal breaker, to be sure.
 