Hmm, could it be that some of the quads (clusters) are disabled for yield reasons?
Since G80 is, quoting B3D, "..an 8-way MIMD setup of 16-way SIMD SP clusters. Inwardly, each 16 SP cluster is further organised in two pairs of 8.."
Couldn't G84 be a 4-way MIMD setup of 16-way SIMD SP clusters, with each 16 SP cluster likewise organised in two pairs of 8? Given the 80nm process and a die size of around 160mm^2 (not sure about this though), this seems plausible, just like how the G73 was really 4 quads.
So, just pure speculation:
G84
80nm
64 scalar shaders
divided into 4 clusters
16 SP per cluster (8x2)
1 TMU quad per cluster (i.e. per pair of 8-SP arrays), for a total of 16 TMUs
12 ROPs (where G84 is capable of using a 192-bit memory interface - more on this later)
So basically, this is 1/2 of G80. However, from what we can gather from the current rumours:
8600GTS/8600GT
G84
80nm
32 scalar shaders (sp)
divided into 2 clusters (2 out of 4 clusters disabled)
8 TMUs
8 ROPs (two partitions of 4, each partition dedicated to a 64-bit memory channel, i.e. a 128-bit memory interface)
8500GT
G84
80nm
16 scalar shaders (sp)
divided into 1 cluster (3 out of 4 clusters disabled)
4 TMUs
8 ROPs (two partitions of 4, each partition dedicated to a 64-bit memory channel)
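In other words, the unit counts in all three lists fall straight out of the cluster math. Here's a minimal sketch of that arithmetic; the names and the 4-ROPs-per-64-bit-partition figure are my own assumptions, following how G80 is organised:

```python
# Speculative G84 building blocks (my assumptions, following G80's layout):
# one cluster = 16 SPs (two 8-SP arrays) + one 4-TMU quad,
# one ROP partition = 4 ROPs tied to one 64-bit memory channel.
SP_PER_CLUSTER, TMU_PER_CLUSTER, ROP_PER_PARTITION = 16, 4, 4

def g84_config(clusters, partitions):
    return (clusters * SP_PER_CLUSTER,       # scalar shaders
            clusters * TMU_PER_CLUSTER,      # TMUs
            partitions * ROP_PER_PARTITION,  # ROPs
            partitions * 64)                 # memory bus width in bits

print(g84_config(4, 3))  # full G84:   (64, 16, 12, 192)
print(g84_config(2, 2))  # 8600GTS/GT: (32,  8,  8, 128)
print(g84_config(1, 2))  # 8500GT:     (16,  4,  8, 128)
```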
i.e. the current 8600/8500 series are in fact G84s with 2 or 3 clusters disabled. However, it gets more interesting when the latest drivers list these:
// 0400 - NVIDIA GeForce 8600 GTS
// 0401 - NVIDIA G84-350
// 0402 - NVIDIA GeForce 8600 GT
// 0403 - NVIDIA G84-200
// 0404 - NVIDIA G84-100
// 0405 - NVIDIA G84-50
Going by the initial performance benchmarks against last-gen high-end cards, the 8600GTS is clearly struggling to keep up with even the slowest of them, e.g. the 7900GT and X1950pro. Most notably, the gap to the 8800GTS 320MB is quite big, which makes the idea of the 8600GTS being the bridge product somewhat illogical. This is somewhat backed up by the fact that the initial GTS versions will only have 256MB. (Could also be for marketing reasons..)
So my theory right now is that nVIDIA actually has faster versions of the G84 core that aren't scheduled for launch just yet. The reason could be yields, or they see no point in releasing a product that performs similarly to the 8800GTS 320MB, or they are planning to release it when the RV630 hits the market, etc.
These faster G84s could be any of the unnamed G84-xxx entries in the list above, and one or two of them could be the products slotting in between the 8600GTS and the 8800GTS 320MB. There are also rumours of a possible "ultra" version of the 8600 series, and there may well be a GTX version as well (similar to the monikers used back in the FX days, where the high/mid/low ranges each had their own flagship card in the form of the 5900/5700/5200 Ultra).
So, on pure speculation:
8600ultra
G84
80nm
core clock 800MHz? (800MHz was possible on stock volts/stock cooling, but this version could potentially employ a dual-slot cooler, e.g. just like the RV630XT)
64 scalar shaders
16 TMUs
12 ROPs
192-bit memory interface (if they managed 384-bit for the high end, why not for the mid range? A move to 256-bit seems unrealistic for cost reasons, and 128-bit sounds unrealistic for a next-gen DX10 mid-range card that might be bottlenecked by bandwidth and is expected to perform much faster than last-gen high-end cards) - i.e. this means a new PCB, with 6 memory chips in total
GDDR4, 384 or 768MB? (64 or 128MB per memory chip?)
memory clock 2400MHz (effective)?
bandwidth of ~57.6GB/s
shader clock of roughly 1500MHz or more (not quite sure)
note - this could potentially replace the 8800GTS 320MB, given its performance and the coming replacement of the G80 core (we should hopefully be seeing the G80 refresh sometime soon)
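For the bandwidth figure above, a quick sanity check (assuming 2400MHz is the effective data rate of the GDDR4):

```python
# 192-bit bus at 2400MHz effective: GB/s = (bits / 8) * MT/s / 1000
bus_bits, effective_mhz = 192, 2400
print(bus_bits / 8 * effective_mhz / 1000)  # 57.6 (GB/s)
```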
So, using Jawed's method of comparison (the FLOPs figure is shaky since I'm not sure what the shader clock would be; a quick sketch of the arithmetic follows the list), the 8600 ultra has:
- bandwidth of 57.6GB/s
- fillrate of 9600MP/s
- AA fillrate of 38400MP/s
- and a zixel rate of 76800MP/s
- bilinear rate of 12800MT/s
- trilinear rate of 25600MT/s
- 64 SPs @ 1750MHz = 224 GFLOPs (for comparison's sake; I arrived at 1750MHz because the shader clock on every G8x so far seems to follow the equation: shader clock = (core clock × 2) + 150MHz)
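To show where those numbers come from, here's the per-clock arithmetic; the 4 AA samples and 8 zixels per ROP per clock, the doubled trilinear figure, and counting the MADD as 2 flops are my assumptions based on how G80's units are usually counted:

```python
core_mhz = 800
shader_mhz = core_mhz * 2 + 150          # the (core x 2) + 150 rule -> 1750MHz
sps, tmus, rops = 64, 16, 12

fillrate   = rops * core_mhz             # 9600 MP/s
aa_rate    = fillrate * 4                # 38400 MP/s (assuming 4 AA samples/ROP/clock)
zixel_rate = fillrate * 8                # 76800 MP/s (assuming 8 zixels/ROP/clock)
bilinear   = tmus * core_mhz             # 12800 MT/s
trilinear  = bilinear * 2                # 25600 MT/s (assuming 2 filter units per TMU)
gflops     = sps * shader_mhz * 2 / 1000 # 224 GFLOPs (MADD counted as 2 flops)
```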
So in percentages, 8600 ultra vs 8800GTS:
- bandwidth = 90%
- fillrate = 96%
- AA fillrate = 96%
- zixel rate = 96%
- bilinear rate = 107%
- trilinear rate = 107%
- GFLOPs = 97%
Finally, stock 8600GTS (675/2000) vs 8600 ultra (see the sketch after this list):
- bandwidth = 56%
- fillrate = 56%
- AA fillrate = 56%
- zixel rate = 56%
- bilinear rate = 42%
- trilinear rate = 42%
- GFLOPS = 43%
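And a sketch reproducing both percentage tables. The 8800GTS figures (500MHz core, 1200MHz shader, 96 SPs, 24 TMUs, 20 ROPs, 320-bit @ 1600MHz effective) are the well-known specs; for the stock 8600GTS I've assumed a 1500MHz shader clock via the (core × 2) + 150 rule:

```python
def metrics(core, shader, sps, tmus, rops, bus_bits, mem_mhz):
    fill = rops * core
    return [bus_bits / 8 * mem_mhz / 1000,  # bandwidth, GB/s
            fill, fill * 4, fill * 8,       # fill / AA / zixel rates
            tmus * core,                    # bilinear rate
            sps * shader * 2 / 1000]        # GFLOPs

ultra    = metrics(800, 1750, 64, 16, 12, 192, 2400)
gts_8800 = metrics(500, 1200, 96, 24, 20, 320, 1600)
gts_8600 = metrics(675, 1500, 32,  8,  8, 128, 2000)

for u, hi, lo in zip(ultra, gts_8800, gts_8600):
    print(f"ultra/8800GTS: {u/hi:.0%}   8600GTS/ultra: {lo/u:.0%}")
# prints 90/96/96/96/107/97 in the first column and 56/56/56/56/42/43 in the second
```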
And in between the 8600 ultra and the 8600GTS there could simply be more 8600 series parts to fill in that gap, e.g. a 3-cluster G84, and so on.
Ok, I think I went a little too far with the speculation, so half of it probably doesn't make much sense.
I think this has to do with the G84 being so underwhelming for us, and that nVIDIA might have underestimated the mid range this gen. Even the 7600GT was quite an impressive mid-range card (beating the 6800 Ultra across the board, even at high res/AA/AF).
All this is moot if the bridge product is in fact a 256-bit variant of G8x (maybe a refresh, or another cut-down G80), and if the RV630XT does in fact perform similarly to the 8600GTS, trading blows with the competition.
Off to bed.