The G92 Architecture Rumours & Speculation Thread

so the shader clock rate in G80 is faster than in G92? don't they run at the same speed as the core? or are you talking about stream processors? but those run faster on the G92, so that can't be it... :runaway:
 
Looks like Cevit ate too many gyros, or he means the new GTS.

R.I.P. IHV-independent game development.

I might be wrong... but the way they referred to it was NV cards in mid-November. The only product I can think of is the three-way SLI launched in that time frame, which coincides with that guy's comment.

Or, if we're staying in the mists of NV, NV may have something more to come :???:.
 
oh, so the G92 does not have the same number of TF units as TMUs, and that is why it's not reaching its theoretical fillrate? so am I even close when I say it must have around 42 or 48 TFs? does TMU = TA?

this is a lot of stuff to take in. on every other forum I'm like the master, but here I'm like poop.
 

oh, I just read AnarchX's post. I guess I still don't get it.

is there a way to edit one's posts?
 
oh, I guess B3D will explain the G92 architecture in detail in a few days anyway; guess I'll just wait till then.

edit: edit is working.
 
oh, so the G92 does not have the same number of TF units as TMUs, and that is why it's not reaching its theoretical fillrate? so am I even close when I say it must have around 42 or 48 TFs? does TMU = TA?
G92 appears to have equal #s of TA and TF units, like G84 (e.g., 8600). AFAIK, the reason cards don't reach their theoretical fillrate is typically a bandwidth limitation (fellix mentions shader clocks, but I'm not sure how shaders themselves are involved with texture fillrate tests beyond sharing the same clock speed). The way to reduce or eliminate this limitation was to test fillrate with 16bit rather than 32bit--in other words, smaller, lower-bandwidth--textures. I'm sure someone will suss out the hard limit with some judicious testing and memory overclocking.

If those 3DM #s are right, the obvious reason G80 would be getting closer to its theoretical limits than G92--assuming the latter has 56 TFs--is that it's got only ~1/7 or ~15% more TFs but ~1/2 or ~50% more bandwidth with which to feed said TFs (as AnarchX said). Of course, there's always texture caches and other things that are over my head to consider. This page is proof of how architectural improvements can yield fillrate improvements with ostensibly identical bandwidth and fillrate. (Again, Dave mentions shader clocks possibly affecting texturing performance, and again I'm unclear on the connection beyond how the shader clock relates to the base and so texture unit clocks.)
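
Here's that bandwidth-per-TF comparison in quick Python, for kicks. A sketch under stated assumptions, not gospel: the 8800 GTX figures (64 TFs at 575MHz, 86.4GB/s) are the commonly cited ones rather than anything from this thread, and G92's ~57.6GB/s assumes a 256-bit bus at 900MHz GDDR3.

[code]
# Bandwidth available per unit of theoretical filtering throughput.
# 8800 GTX: 64 TFs @ 575MHz, 86.4GB/s (commonly cited figures).
# G92:      56 TFs @ 600MHz, ~57.6GB/s (assumed 256-bit bus @ 900MHz).
cards = {
    "G80 (8800 GTX)": dict(tf=64, mhz=575, bw=86.4),
    "G92 (8800 GT?)": dict(tf=56, mhz=600, bw=57.6),
}

for name, c in cards.items():
    gts = c["tf"] * c["mhz"] / 1000  # theoretical GTexels/s of filtering
    print(f"{name}: {gts:.1f} GT/s, {c['bw'] / gts:.2f} B of bandwidth per texel")
[/code]

That works out to ~2.35 B of bandwidth per texel for G80 versus ~1.71 for G92--i.e., G80 has roughly 37% more bandwidth per unit of filtering capacity.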

You can check out the G80 and G84 reviews to learn more about TAs and TFs.

As for me, I'd just like to say Jeezy Chreezy, GPUs are getting complicated.

Edit: A table, for kicks, assuming stock clocks (update: the GTS seems to be stock). Different drivers for each card, BTW.
[attached table: 8800gt2.png]


fellix, your wink cuts to the core, but thanks. :) A quick Google got me right back here.
 

Holy crap! 25 GTexels per second? :oops:

I guess NVidia figured that for a very low additional cost they might as well increase texturing speed for the situations where only one TF unit is needed per fetch.

Strange how none of the tests, even the theoretical ones, come anywhere close to the increase in math throughput. R600 outperforms G80 in the Perlin noise test, so it can't be texture limited. Any theories?
 
Yup, vertex texture fetching should see a great improvement, but looking at the Shader Particles test, it doesn't seem to be the case.

To bring texture rate efficiency back to the table: I've done some quick math suggesting that at and beyond a ~2200MHz shader domain clock (keeping the 600MHz base clock), the 56 TMUs should sustain their maximum real fill rate of ~90% of the theoretical one, neglecting potential internal and external bandwidth limitations.
 
G92 appears to have equal #s of TA and TF units, like G84 (e.g., 8600). AFAIK, the reason cards don't reach their theoretical fillrate is typically a bandwidth limitation...
*snip*

MEMORY bandwidth has something to do with fillrate? I did not know that. so the 8800 GT has 56 TMUs, 56 TFs AND 56 TAs?
 
Well, the GPU isn't plucking textures out of thin air. :smile: It's got to find them (say, in VRAM) before it can do anything.

I'll say what I know, but no promises it's right. With the GF8 architecture, the "TMU" has been split into TA (texture address/fetch) and TF (texture filter) units. So, it has no "TMUs," per se. I linked to the page in Rys' G80 review that details those bits. The act of fetching texture data from VRAM requires bandwidth.

As for how many TAs/TFs G92 has, I'm using 56 b/c ppl think it makes sense. A 25GT/s fillrate on a 112SP GPU definitely suggests 56 TAs and TFs, given what we know of NV's current GPUs:

16 SPs per TCP. 4 TAs per TCP for G80. 8 TAs per TCP for G84.

G92 has 112SPs, so it has 112 / 16 = 7 TCPs

Assuming G80 TA:TF ratio:
7 TCP * 4 TA+F/TCP = 28 filtered texels per clock
28 * 600MHz = 16.8 GT/s, so not valid b/c it hits 25 GT/s in 3DM06

Assuming G84 TA:TF ratio:
7 * 8 = 56 texels/ck
56 * 600 = 33.6 GT/s
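
For kicks, the same arithmetic in a few lines of Python (every figure is the one quoted above; treat it as a sketch):

[code]
# Derive the TCP count from the SP count, then test both known TA-per-TCP
# ratios against the ~25GT/s measured in 3DMark06.
SPS_PER_TCP = 16
CORE_MHZ = 600      # assumed stock 8800 GT core clock
MEASURED_GTS = 25   # 3DMark06 fillrate result

tcps = 112 // SPS_PER_TCP  # 112 SPs -> 7 TCPs

for name, ta_per_tcp in (("G80 ratio (4 TA/TCP)", 4), ("G84 ratio (8 TA/TCP)", 8)):
    theoretical = tcps * ta_per_tcp * CORE_MHZ / 1000  # GT/s
    verdict = "possible" if theoretical >= MEASURED_GTS else "ruled out"
    print(f"{name}: {tcps * ta_per_tcp} texels/clk -> {theoretical:.1f} GT/s ({verdict})")
[/code]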

Getting "only" 25GT/s in 3DM06 makes sense given the ridiculous theoretical texturing fillrate (more than G80 or anything) and pedestrian bandwidth (way less than G80). Using 16bit textures should use half the bandwidth and so should hopefully remove bandwidth as the bottleneck, letting us get closer to its theoretical texel fillrate limit.
 
Actually, looking at the "spiritual" predecessor of G92--the G84--it seems bandwidth is the key here, but not the only factor:

G92 (8800 GT) -- 57GB/s, reaching 75% of its 33.6 GTex/s theoretical fillrate;
G84 (8600 GTS) -- 32GB/s, reaching 70% of its 10.8 GTex/s theoretical fillrate;
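
A quick sanity check in Python, using only the figures above: G84 actually has more bandwidth per theoretical texel yet reaches a lower fraction of its peak, so bandwidth alone can't be the whole story.

[code]
# Bandwidth per theoretical texel for the two parts. G84 has MORE bandwidth
# per texel yet LOWER efficiency -- so bandwidth isn't the only factor.
parts = {
    "G92 (8800 GT)":  dict(bw=57.0, theo=33.6, eff=0.75),
    "G84 (8600 GTS)": dict(bw=32.0, theo=10.8, eff=0.70),
}

for name, p in parts.items():
    print(f"{name}: {p['bw'] / p['theo']:.2f} B/texel, {p['eff']:.0%} of theoretical")
[/code]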
 
yes, thank you both for the numbers breakdown. that really helps. so it seems overclocking the memory should be very beneficial to performance. so, looking at the next high end, would something like this be realistic:

G92 core with 56 TMUs, 24 or 32 ROPs
384-bit memory bus
800MHz core (dual-slot cooling)
2.2-2.4GHz GDDR4
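
Plugging that hypothetical spec into the same formulas used earlier in the thread gives a rough idea of what it would offer (pure speculation, obviously, since none of those specs are confirmed):

[code]
# Implied throughput of the speculative spec above, using the thread's own
# formulas. None of these numbers describe a confirmed product.
TMUS, CORE_MHZ, BUS_BITS = 56, 800, 384

print(f"Texture fillrate: {TMUS * CORE_MHZ / 1000:.1f} GT/s theoretical")
for eff_ghz in (2.2, 2.4):          # GDDR4 effective data rate
    bw = BUS_BITS / 8 * eff_ghz     # bytes per transfer * GT/s
    print(f"{eff_ghz}GHz GDDR4 on a {BUS_BITS}-bit bus: {bw:.1f} GB/s")
[/code]

That would be ~44.8 GT/s of texturing against 105-115 GB/s, i.e., over 2.3 B of bandwidth per texel--at least as generous as G80 and well above the rumoured 8800 GT.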
 