NVIDIA: Beyond G80...

I dunno, what's your guess for DP performance of the first GPUs to support it?

Err, are you actually asking Bob to guess what DP performance will be?

I'd be looking at SSE vs. SSE2 or some such and figure between quarter and half speed. Wouldn't expect better than half, considering memory overhead increases. Then it's a matter of figuring out whether NV's next part is aiming at R600's 500 GFLOPS, or whether they're going to throw a Hail Mary to the teraflop endzone. So, somewhere between a quarter of 500 GFLOPS and half of a teraflop would be my guess. Sadly, that's a somewhat less informed opinion than Bob's. :)
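To put numbers on that guess, here is a minimal sketch of the arithmetic. The inputs are only the round figures from the post (a 500 GFLOPS or 1 TFLOPS single-precision target, and a quarter-speed to half-speed DP ratio); the function and constant names are made up for illustration.

```python
# Rough bounds for the double-precision (DP) guess in the post above.
# All inputs are the poster's round numbers, not measured figures.
SP_LOW_GFLOPS = 500.0    # R600-class single-precision target from the post
SP_HIGH_GFLOPS = 1000.0  # the "teraflop" case
DP_RATIO_LOW = 0.25      # DP at quarter speed
DP_RATIO_HIGH = 0.5      # DP at half speed


def dp_range(sp_low, sp_high, ratio_low, ratio_high):
    """Return the (lowest, highest) DP estimate in GFLOPS."""
    return sp_low * ratio_low, sp_high * ratio_high


low, high = dp_range(SP_LOW_GFLOPS, SP_HIGH_GFLOPS, DP_RATIO_LOW, DP_RATIO_HIGH)
print(f"DP guess: {low:.0f} to {high:.0f} GFLOPS")
```

So the guess spans 125 to 500 GFLOPS, which brackets the 102 GFLOPS quoted for Cell further down the thread.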
 
As a quick note, 65nm HPC Cell supposedly has a peak DP rate of 102GFLOPs. So that's just one of the DP-capable processors GPUs will have to compete with in the HPC space in the future.
 
Although not entirely related to the G8x family, I wonder if this is the first 65nm GPU ever :D

http://www.nvidia.com/object/IO_42066.html


Still 90nm...

NVIDIA_G72.DEV_01D3.1 = "NVIDIA GeForce 7300 SE/7200 GS"

But G78 is also in the inf of FW 165.01:
NVIDIA_G78.DEV_04C0.1 = "NVIDIA G78"
NVIDIA_G78.DEV_04C1.1 = "NVIDIA G78 "
NVIDIA_G78.DEV_04C2.1 = "NVIDIA G78 "
NVIDIA_G78.DEV_04C3.1 = "NVIDIA G78 "
NVIDIA_G78.DEV_04C4.1 = "NVIDIA G78 "
NVIDIA_G78.DEV_04C5.1 = "NVIDIA G78 "
NVIDIA_G78.DEV_04C6.1 = "NVIDIA G78 "

;)
 
Err yes, what's so strange about that?
Well, you'd expect his NDAs to be a slight problem there... :) Anyway, FP32-FPUs-That-Can-Also-Do-FP64 (rather than the other way around) tend to be quarter-speed for FP64, I think. I'd imagine it's not *strictly* impossible you'd be slightly faster or slower, though.
 
Arun - That sounds right, since multipliers scale as the square of the input bit-width. So maybe 2x perf hit is a bit optimistic. I was thinking that would keep the required register fetch bandwidth per cycle the same and that the ALU size increase might not be so significant.
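A quick back-of-the-envelope check of the "square of the input bit-width" point, as a sketch only. It assumes a plain array multiplier whose area grows with the square of the operand width, and it uses the IEEE 754 significand widths (24 bits for single, 53 bits for double, counting the implicit leading bit). Real FPU designs can deviate a lot from this; the function name is hypothetical.

```python
# Estimate how much bigger a double-precision multiplier is than a
# single-precision one, assuming area ~ (operand width)^2.
SIGNIFICAND_BITS = {"fp32": 24, "fp64": 53}  # IEEE 754, incl. implicit bit


def multiplier_area_ratio(wide_bits, narrow_bits):
    """Area ratio of two multipliers under the quadratic-scaling assumption."""
    return (wide_bits / narrow_bits) ** 2


# Naive view: compare the full storage widths (64 vs. 32 bits).
by_width = multiplier_area_ratio(64, 32)

# Closer view: only the significands are actually multiplied.
by_significand = multiplier_area_ratio(
    SIGNIFICAND_BITS["fp64"], SIGNIFICAND_BITS["fp32"]
)

print(f"by storage width: {by_width:.2f}x")
print(f"by significand:   {by_significand:.2f}x")
```

Either way the multiplier comes out at roughly 4x to 5x the size, which is why quarter speed looks more natural than half speed for an FP32 unit that also does FP64.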
 
Yeah -- it all depends on what they decide to do. I don't know what the ALU size increase would be, but I know I'd prefer area to be spent on single rather than super-fast double at this point.

http://www.hpc2n.umu.se/para06/papers/paper_231.pdf
Even the Intel x87 processor with the use of the Streaming SIMD Extensions (SSE) unit on the Pentium III does 4 flops/cycle for single precision, and SSE2 does 2 flops/cycle for double. Therefore, for any processor with SSE and SSE2 (e.g. Pentium IV), the theoretical peak of single is twice that of double, and on a chip with SSE and without SSE2 (e.g. some Pentium III), the theoretical peak of single is four times that of double. AMD processors share the same relation between SSE and SSE2, the only difference being that their x87 units can do 2 flops/cycle for any precision. Appendix 1 contains additional information on the extensions to the IA-32 instruction set.
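The ratios in that excerpt can be sketched as follows. The flops/cycle values for SSE and SSE2 are the ones quoted; the Intel x87 figure of 1 flop/cycle is inferred from the "four times" statement rather than stated outright, and the 3.0 GHz clock is a purely hypothetical value that cancels out of the ratios anyway.

```python
# Theoretical peak = flops per cycle * clock. The ratios do not depend on clock.
def peak_gflops(flops_per_cycle, clock_ghz):
    """Theoretical peak in GFLOPS for a given per-cycle throughput."""
    return flops_per_cycle * clock_ghz


CLOCK_GHZ = 3.0  # hypothetical clock, for illustration only

sp_sse = peak_gflops(4, CLOCK_GHZ)   # single precision via SSE
dp_sse2 = peak_gflops(2, CLOCK_GHZ)  # double precision via SSE2
dp_x87 = peak_gflops(1, CLOCK_GHZ)   # double precision via Intel x87 (inferred)

ratio_with_sse2 = sp_sse / dp_sse2   # chip with SSE and SSE2
ratio_without_sse2 = sp_sse / dp_x87 # chip with SSE only

print(f"single/double with SSE2:    {ratio_with_sse2:.0f}x")
print(f"single/double without SSE2: {ratio_without_sse2:.0f}x")
```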
 
dnavas - actually I completely agree. The GPGPU stuff doesn't seem to be super well aligned with the core focus of GPUs (from my layman's viewpoint). Maybe the GPGPU and GPU markets will get addressed by chips with philosophically very similar architectures, but different ALU mixes in the future?
 
Who's to say they won't use the same designation (like the 7600 GT/GS 90nm G73 -> 7600 GT/GS 80nm G73-B1) ? ;)

G78 is the name of G72 on 65nm, because going from 90nm to 65nm is not just an optical shrink. ;)
Of course it is possible that NV later shifts to the G78 core.
 
The following is speculative, and should not be considered based on any non-public information, as it quite simply is not so!
The codenames used are most certainly incorrect, and some might not even have their own codenames. As such, they are used exclusively to permit further discussion, and nothing else!

February->March 2007
G81: Optical shrink of G80 on 80GT (G80 currently is on 90GT - R600 is either on 80GT or 80HS). Introduces the 7/8 clusters SKU. Exact specifications (identical to G80 on 90GT or not; GDDR4 or not; etc.) depend on the R600's specifications. 1.1GHz GDDR4 and 600-625MHz is possible, if considered necessary. If such a step is taken, the parts with redundancy would likely also use GDDR4, which would define the 8900 line-up. There will NOT be a notebook version.
G82: G8x on 80GT with 6 native clusters and 5 native ROP partitions. Can act as an 8800GTS (which will then cost $399) or an 8700-series GPU, which would likely have 5 active clusters, 4 active ROP partitions and a 256-bit memory bus. This is needed for the 8800(/8900?)GTS, as G81 will mostly be 8/7 cluster parts due to improved yields with the smaller 80nm die. There will be a notebook version. Maybe pin-compatible with G81(?)
G83: G8x on 80GT(?) with 4 native clusters and 2 native ROP partitions. 192-bit memory bus and roughly $199 target introduction price. 8600/8500-Series, possible version with 3 active clusters.

[...]

Uttar

Now that we know more about Ultra, G84 etc, could we have some updated speculation regarding G90 etc... please! :p
 
Heh, why not! (note: those are the wrong codenames)

G91: 4Q07, 192SPs, 2.0-2.6GHz shader domain, 24 TMUs with free trilinear, 16 beefed-up ROPs (stencil enhancements of G84; better blending rates), 750-800MHz+ core clock. 1.4GHz+ GDDR4 on a 256-bit memory bus. 200-240mm² on 65nm. Requires NVIO.
G93: 1H08, 96SPs, 1.8-2.4GHz shader domain, 16 TMUs with free trilinear, 8 ROPs similar to G92's, 700-750MHz+ core clock. 1.2GHz+ GDDR4 on a 128-bit memory bus. 130-150mm² on 65nm, smaller if on 55nm.
G97: 1H08, 48SPs, 1.4-1.8GHz shader domain, 8 TMUs with free trilinear, 4 weaker ROPs, 600-650MHz+ core clock. DDR2/DDR3 on a 64-bit memory bus. 70-85mm² on 55nm.
MCP78: 1H08, 24SPs, 1.2-1.4GHz shader domain, 4 TMUs with free trilinear, 2 weaker ROPs, 550-600MHz+ core clock. 50-60mm²(+NB+SB) on 55nm. Will use 8MiB+ eDRAM. Total chip around 120mm².

I would tend to believe all G9x derivatives will be based on the low-power process variant due to different custom design rules. 55nm-LP at TSMC will become available in Q3 and will be a pure optical shrink (analogue will scale too, apparently!). This also puts UMC and Chartered out of the picture, most likely...

Please note that this is, once again, mostly speculation. The goal is to come up with realistic estimates for the target die sizes, really. As for the roadmap beyond these parts, it'll be interesting to see if they optically shrink the 192SPs part to 55nm or not. I would expect them to, really, as a 6-months-refresh before the 45nm part in 4Q08 or so.
 
Sorry to ask, I know very little about 3D technology but that 256-bit memory bus on G91 caught my attention, is my lack of experience tricking me or does it make sense to ask why do you think they are going to go to a smaller bus size? It would be nice if you keep your explanation very simple.. if you can give one :p
 
In the simplest terms: smaller die size and less complexity (256-bit vs. 384-bit) while achieving the same memory bandwidth.
 
He means using a 256-bit bus for a smaller die size, with faster GDDR4 memory making up for the narrower bus.
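The trade-off is easy to see with the usual peak-bandwidth formula (bus width in bytes, times memory clock, times two transfers per clock for double-data-rate memory). This is a sketch under stated assumptions: the 256-bit, 1.4GHz GDDR4 figures are the speculated ones from the post above, while the 384-bit bus at 900MHz is an outside figure assumed here for a G80-class board and does not come from this thread. The function name is made up.

```python
def bandwidth_gb_s(bus_width_bits, memory_clock_ghz, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s for a double-data-rate memory bus."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * memory_clock_ghz * transfers_per_clock


# Assumed G80-class setup: wide bus, slower memory (assumption, not from the thread).
wide_gddr3 = bandwidth_gb_s(384, 0.9)

# Speculated "G91" setup from the post: narrower bus, faster GDDR4.
narrow_gddr4 = bandwidth_gb_s(256, 1.4)

print(f"384-bit @ 0.9 GHz: {wide_gddr3:.1f} GB/s")
print(f"256-bit @ 1.4 GHz: {narrow_gddr4:.1f} GB/s")
```

Under those assumptions the narrower bus ends up with about the same bandwidth (roughly 90 GB/s against roughly 86 GB/s), so nothing is lost by dropping to 256 bits as long as the faster memory is available.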
 
I guess I understood that they would be achieving the bandwidth with those memory modules and higher clocks, and that they are saving die size so they can use it for "other things"? Would that mean they feel the next gen isn't going to be very bandwidth hungry? Does anyone find it curious that in this case AMD would have an even bigger bandwidth advantage, since they have the wider bus while having access (I would guess) to the same type of memory NVIDIA might use? Maybe because of the bigger die size AMD isn't going to use that high-clocked memory, and will save money that way? Am I making any sense? I guess it can't be explained as simply as I'm trying.
 
They save on costs. The saved die space is not used for anything; the chip is simply physically smaller.

Larger dies have higher production costs, which hurts profitability.
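As a rough illustration of why die size drives cost, here is a sketch using a common gross-dies-per-wafer approximation (wafer area divided by die area, minus an edge-loss term). Everything numeric is an assumption: a 300mm wafer, roughly 480mm² for a G80-sized die (an outside figure not given in this thread), and 220mm² as the midpoint of the 200-240mm² speculated above. Defect yield is ignored, and it would favor the smaller die even further.

```python
import math


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Approximate candidate dies per wafer, before any yield loss."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    # Partial dies lost around the wafer edge.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


big = gross_dies_per_wafer(480.0)    # assumed G80-sized die
small = gross_dies_per_wafer(220.0)  # midpoint of the speculated 200-240mm²

print(f"~480mm² die: {big} candidates per wafer")
print(f"~220mm² die: {small} candidates per wafer")
print(f"ratio: {small / big:.2f}x")
```

With a roughly fixed cost per wafer, more than twice as many candidate dies per wafer means well under half the silicon cost per chip.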
 