The G92 Architecture Rumours & Speculation Thread

so the shader clock rate in G80 is faster than in G92? don't they run at the same speed as the core? or are you talking about stream processors? but those run faster on the G92, so that can't be it... :runaway:
 
Looks like Cevit ate too many gyros, or he means the new GTS.

R.I.P. IHV-independent game development.

I might be wrong... but the way they referred to it was NV cards in mid-November. The only product I can think of is the three-way SLI launched in that time frame, which coincides with that guy's comment.

Or, if we're staying in the mists of NV, NV may have something more to come :???:.
 
oh, so the G92 does not have the same number of TF units as TMUs, and that is why it's not reaching its theoretical fillrate? so am I even close when I say it must have around 42 or 48 TFs? does TMU = TA?

this is a lot of stuff to take in. on every other forum I'm like the master, but here I'm like poop.
 

oh, I just read AnarchX's post. I guess I still don't get it.

is there a way to edit one's posts?
 
oh, I guess B3D will explain the G92 architecture in detail in a few days anyway; guess I'll just wait till then.

edit: edit is working.
 
oh, so the G92 does not have the same number of TF units as TMUs, and that is why it's not reaching its theoretical fillrate? so am I even close when I say it must have around 42 or 48 TFs? does TMU = TA?
G92 appears to have equal #s of TA and TF units, like G84 (e.g., 8600). AFAIK, the reason cards don't reach their theoretical fillrate is typically a bandwidth limitation (fellix mentions shader clocks, but I'm not sure how shaders themselves are involved with texture fillrate tests beyond sharing the same clock speed). The way to reduce or eliminate this limitation was to test fillrate with 16bit rather than 32bit--in other words, smaller, lower-bandwidth--textures. I'm sure someone will suss out the hard limit with some judicious testing and memory overclocking.

If those 3DM #s are right, the obvious reason G80 would be getting closer to its theoretical limits than G92--assuming the latter has 56 TFs--is that it's got only ~1/7 or ~15% more TFs but ~1/2 or ~50% more bandwidth with which to feed said TFs (as AnarchX said). Of course, there's always texture caches and other things that are over my head to consider. This page is proof of how architectural improvements can yield fillrate improvements with ostensibly identical bandwidth and fillrate. (Again, Dave mentions shader clocks possibly affecting texturing performance, and again I'm unclear on the connection beyond how the shader clock relates to the base and so texture unit clocks.)
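
Here's that bandwidth-per-TF comparison in quick Python, for kicks. A sketch under stated assumptions, not gospel: the 8800 GTX figures (64 TFs at 575MHz, 86.4GB/s) are the commonly cited ones rather than anything from this thread, and G92's ~57.6GB/s assumes a 256-bit bus at 900MHz GDDR3.

[code]
# Bandwidth available per unit of theoretical filtering throughput.
# 8800 GTX: 64 TFs @ 575MHz, 86.4GB/s (commonly cited figures).
# G92:      56 TFs @ 600MHz, ~57.6GB/s (assumed 256-bit bus @ 900MHz).
cards = {
    "G80 (8800 GTX)": dict(tf=64, mhz=575, bw=86.4),
    "G92 (8800 GT?)": dict(tf=56, mhz=600, bw=57.6),
}

for name, c in cards.items():
    gts = c["tf"] * c["mhz"] / 1000  # theoretical GTexels/s of filtering
    print(f"{name}: {gts:.1f} GT/s, {c['bw'] / gts:.2f} B of bandwidth per texel")
[/code]

That works out to ~2.35 B of bandwidth per texel for G80 versus ~1.71 for G92--i.e., G80 has roughly 37% more bandwidth per unit of filtering capacity.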

You can check out the G80 and G84 reviews to learn more about TAs and TFs.

As for me, I'd just like to say Jeezy Chreezy, GPUs are getting complicated.

Edit: A table, for kicks, assuming stock clocks (update: the GTS seems to be stock). Different drivers for each card, BTW.
[attached table: 8800gt2.png]


fellix, your wink cuts to the core, but thanks. :) A quick Google got me right back here.
 

Holy crap! 25 GTexels per second? :oops:

I guess NVidia figured that for a very low additional cost they might as well increase texturing speed for the situations where only one TF unit is needed per fetch.

Strange how none of the tests, even the theoretical ones, come anywhere close to the increase in math throughput. R600 outperforms G80 in the Perlin noise test, so it can't be texture limited. Any theories?
 
Yup, vertex texture fetching should see a great improvement, but looking at the Shader Particles test, it doesn't seem to be the case.

To bring texture rate efficiency back to the table: I've done some quick math suggesting that at and beyond a ~2200MHz shader domain clock (keeping the 600MHz base clock), the 56 TMUs should sustain their maximum real fill rate of ~90% of the theoretical one, neglecting potential internal and external bandwidth limitations.
 
G92 appears to have equal #s of TA and TF units, like G84 (e.g., 8600). AFAIK, the reason cards don't reach their theoretical fillrate is typically a bandwidth limitation...
*snip*

MEMORY bandwidth has something to do with fillrate? I did not know that. so the 8800 GT has 56 TMUs, 56 TFs AND 56 TAs?
 
Well, the GPU isn't plucking textures out of thin air. :smile: It's got to find them (say, in VRAM) before it can do anything.

I'll say what I know, but no promises it's right. With the GF8 architecture, the "TMU" has been split into TA (texture address/fetch) and TF (texture filter) units. So, it has no "TMUs," per se. I linked to the page in Rys' G80 review that details those bits. The act of fetching texture data from VRAM requires bandwidth.

As for how many TAs/TFs G92 has, I'm using 56 b/c ppl think it makes sense. A 25GT/s fillrate on a 112SP GPU definitely suggests 56 TAs and TFs, given what we know of NV's current GPUs:

16 SPs per TCP. 4 TAs per TCP for G80. 8 TAs per TCP for G84.

G92 has 112SPs, so it has 112 / 16 = 7 TCPs

Assuming G80 TA:TF ratio:
7 TCP * 4 TA+F/TCP = 28 filtered texels per clock
28 * 600MHz = 16.8 GT/s, so not valid b/c it hits 25 GT/s in 3DM06

Assuming G84 TA:TF ratio:
7 * 8 = 56 texels/ck
56 * 600 = 33.6 GT/s
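
For kicks, the same arithmetic in a few lines of Python (every figure is the one quoted above; treat it as a sketch):

[code]
# Derive the TCP count from the SP count, then test both known TA-per-TCP
# ratios against the ~25GT/s measured in 3DMark06.
SPS_PER_TCP = 16
CORE_MHZ = 600      # assumed stock 8800 GT core clock
MEASURED_GTS = 25   # 3DMark06 fillrate result

tcps = 112 // SPS_PER_TCP  # 112 SPs -> 7 TCPs

for name, ta_per_tcp in (("G80 ratio (4 TA/TCP)", 4), ("G84 ratio (8 TA/TCP)", 8)):
    theoretical = tcps * ta_per_tcp * CORE_MHZ / 1000  # GT/s
    verdict = "possible" if theoretical >= MEASURED_GTS else "ruled out"
    print(f"{name}: {tcps * ta_per_tcp} texels/clk -> {theoretical:.1f} GT/s ({verdict})")
[/code]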

Getting "only" 25GT/s in 3DM06 makes sense given the ridiculous theoretical texturing fillrate (more than G80 or anything) and pedestrian bandwidth (way less than G80). Using 16bit textures should use half the bandwidth and so should hopefully remove bandwidth as the bottleneck, letting us get closer to its theoretical texel fillrate limit.
 
Actually, looking at the "spiritual" predecessor of G92--the G84--it seems bandwidth is the key here, but not the only factor:

G92 (8800 GT) -- 57GB/s, reaching 75% of its 33.6 GTex/s theoretical fillrate;
G84 (8600 GTS) -- 32GB/s, reaching 70% of its 10.8 GTex/s theoretical fillrate;
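
A quick sanity check in Python, using only the figures above: G84 actually has more bandwidth per theoretical texel yet reaches a lower fraction of its peak, so bandwidth alone can't be the whole story.

[code]
# Bandwidth per theoretical texel for the two parts. G84 has MORE bandwidth
# per texel yet LOWER efficiency -- so bandwidth isn't the only factor.
parts = {
    "G92 (8800 GT)":  dict(bw=57.0, theo=33.6, eff=0.75),
    "G84 (8600 GTS)": dict(bw=32.0, theo=10.8, eff=0.70),
}

for name, p in parts.items():
    print(f"{name}: {p['bw'] / p['theo']:.2f} B/texel, {p['eff']:.0%} of theoretical")
[/code]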
 
yes, thank you both for the numbers breakdown. that really helps. so it seems overclocking the memory should be very beneficial to performance. so, looking at the next high end, would something like this be realistic:

G92 core with 56 TMUs, 24 or 32 ROPs
384-bit memory bus
800MHz core (dual-slot cooling)
2.2-2.4GHz GDDR4
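
Plugging that hypothetical spec into the same formulas used earlier in the thread gives a rough idea of what it would offer (pure speculation, obviously, since none of those specs are confirmed):

[code]
# Implied throughput of the speculative spec above, using the thread's own
# formulas. None of these numbers describe a confirmed product.
TMUS, CORE_MHZ, BUS_BITS = 56, 800, 384

print(f"Texture fillrate: {TMUS * CORE_MHZ / 1000:.1f} GT/s theoretical")
for eff_ghz in (2.2, 2.4):          # GDDR4 effective data rate
    bw = BUS_BITS / 8 * eff_ghz     # bytes per transfer * GT/s
    print(f"{eff_ghz}GHz GDDR4 on a {BUS_BITS}-bit bus: {bw:.1f} GB/s")
[/code]

That would be ~44.8 GT/s of texturing against 105-115 GB/s, i.e., over 2.3 B of bandwidth per texel--at least as generous as G80 and well above the rumoured 8800 GT.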
 