The New and Improved "G80 Rumours Thread" *DailyTech specs at #802*

Discussion in 'Pre-release GPU Speculation' started by Geo, Sep 11, 2006.

Thread Status:
Not open for further replies.
  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    I dunno how you're calculating this stuff, and whether you're remembering the clock rate of G80's ALUs (1350MHz).

    On top of that, I don't think the ADD-only (mini-ALU also doing SAT, SCALE, BIAS) ALU in R580 is worth counting particularly. Admittedly I haven't seen an analysis of the number of ADDs in fragment shaders that can be dual-issued with any regular FP op on the main ALU. It's bound to be used some of the time, but again it looks like an underutilised ALU.

    I think it's a bit of fat in R580 that'll get trimmed off in R600 - seemingly Xenos trimmed that fat.

    Jawed
     
  2. Fornowagain

    Newcomer

    Joined:
    Jan 15, 2006
    Messages:
    35
    Likes Received:
    2
    No, that's more like 9". An ATX board is 9.6" wide; an 11" card's going to overhang big time.
     
  3. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY
    Just an interjection here about power requirements


    The GTS variety will only have one power connector, so that gives us a good idea it won't take more than 150 watts no matter what. So what's left: 128 MB of VRAM, 200 MHz on the RAM, and 75 MHz on the GPU, which doesn't add up to much more wattage on the GTX.
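
The 150 watt ceiling above can be sketched as simple arithmetic. This is a rough illustration, assuming the usual PCI Express limits of 75 W from the x16 slot and 75 W per 6-pin connector; the function name is made up for this example, and the two-connector case is only a hypothetical comparison, not a confirmed GTX configuration.

```python
# Assumed PCI Express power limits (not stated in the thread itself).
PCIE_SLOT_W = 75   # maximum draw through the x16 slot
SIX_PIN_W = 75     # maximum draw per 6-pin auxiliary connector


def board_power_ceiling_w(six_pin_connectors):
    """Upper bound on board power for a given number of 6-pin connectors."""
    return PCIE_SLOT_W + six_pin_connectors * SIX_PIN_W


print(board_power_ceiling_w(1))  # 150, the single-connector case discussed above
print(board_power_ceiling_w(2))  # 225, hypothetical two-connector card
```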
     
  4. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    The 7900 GX2 was 11 inches.
     
  5. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    Here's a thought. If the 128 ALUs are scalar, and multifunctional (scalar ops plus special functions plus interpolators) why would the G80 have so many more transistors than a G70? First, by combining the ALU functions, NVidia has claimed that the multipurpose ALU saves transistors, so 4 of the new ALUs are probably not significantly larger than the 2 per-pipe G70 ALUs.

    Secondly, the G70 has 48 of such SIMD ALUs, for a total of 192 FP-op components. So, we're supposed to believe that 128 scalar processors + control logic are 700M transistors, but a G71 with 192 FP adders/multipliers is only 278M transistors? I'm supposed to think that 400+M more transistors are spent on TCP, AA, memory controller?

    So, either those 128 ALUs are not scalar units, or the G80 has some other major features we are not hearing about, OR, the claims of 700M transistors are wrong.
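
The counting in this post can be checked with a few lines. This is only a sketch of the arithmetic: every figure is a rumoured or quoted number from the thread rather than a confirmed specification, and the variable names are illustrative.

```python
# Figures as quoted in the thread; none are confirmed specifications.
G7X_SIMD_ALUS = 48            # 24 pixel pipes x 2 ALUs per pipe
COMPONENTS_PER_SIMD_ALU = 4   # each ALU treated as vec4
G80_SCALAR_ALUS = 128         # rumoured, assumed scalar
G71_TRANSISTORS_M = 278       # millions
G80_TRANSISTORS_M = 700       # millions, rumoured

g7x_fp_components = G7X_SIMD_ALUS * COMPONENTS_PER_SIMD_ALU
extra_transistors_m = G80_TRANSISTORS_M - G71_TRANSISTORS_M

print(g7x_fp_components)                    # 192
print(g7x_fp_components - G80_SCALAR_ALUS)  # 64 fewer FP components on a scalar G80
print(extra_transistors_m)                  # 422, the "400+M" in question
```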
     
  6. Pete

    Pete Moderate Nuisance
    Moderator Legend Veteran

    Joined:
    Feb 7, 2002
    Messages:
    5,030
    Likes Received:
    463
    Good point about area efficiency, Jawed.

    Would this benefit, say, the GS or physics (heh, quantum sounds pretty appropriate to 1D ALUs, if that's what they are) calcs?

    Demo, I think the 700M may have been wrong, and we're looking at ~500M for both G80 and R600.

    I--dammit, two in a row! My posts aren't good enough to go on the top of the page! :lol:
     
    #1026 Pete, Oct 6, 2006
    Last edited by a moderator: Oct 6, 2006
  7. Arty

    Arty KEPLER
    Veteran

    Joined:
    Jun 16, 2005
    Messages:
    1,906
    Likes Received:
    55
    Those were very rare (not in retail at all); I doubt Nvidia wants G80 to have such a limited adoption rate.
     
    Razor1 likes this.
  8. kyetech

    Regular

    Joined:
    Sep 10, 2004
    Messages:
    532
    Likes Received:
    0
    It's been very interesting following this thread,

    But the last few pages have gotten over my head. Any chance you guys could summarise the latest speculation and make direct speed / performance comparisons with the current gen chips (in a kind of bullet point approach)?

    Cheers guys.

    PS: is it time to nip this thread in the bud now, and create yet another G80 thread (round three)?


    -- The way you guys pull this stuff apart really is fascinating.
     
  9. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,508
    Likes Received:
    1,314
    Is it possible we're talking about 128 full (vec3/4 whatever) ALUs?

    Knowing Nvidia's ability to cram lots of computing into small areas, I'm not sure I'd doubt it.

    The thing that makes it irrational is the "double pumping", which would change it from an effective 64 pipe machine (in traditional terms, which is reasonable) into 128 pipes effective.

    If they pulled that off it'd be fairly absurd.

    It seems Nvidia has really gone crazy with so many changes here; my only wonder is whether it's NV30 part 2, or the opposite.
     
  10. ERK

    ERK
    Regular

    Joined:
    Mar 31, 2004
    Messages:
    287
    Likes Received:
    10
    Location:
    SoCal
    I am not an expert on ALU design, but wouldn't putting all functionality into all 128 streams be a waste of transistors? I'm picturing an ALU which is doing a multiply; at that time, are its sin/cos or isqrt transistors idle? Wouldn't it be more efficient to put more transistors into the scheduler and not cram every function into every ALU?

    Seems that for best transistor efficiency, one should populate the math functions into the ALUs in the ratio that they are expected to occur in shader programs.

    Am I not seeing something?

    ERK
     
  11. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    It would be very hard for them to do another NV30; they would actually have to try to make something fail like that :wink:

    On the 128 double-pumped ALUs, Rangers might be on the right track; it might be a play on words. Otherwise, as mentioned earlier, I don't see how they would get the MHz wrong on both the GTX and GTS, with 128 and 96 streaming processors at their respective 675 and 600 MHz, double pumped.
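
For reference, the double-pumped clocks work out as below. This is a rough sketch using only the rumoured figures quoted in the thread; the assumption of one scalar op per ALU per ALU clock, and the function names, are illustrative rather than anything confirmed.

```python
# Rumoured figures from the thread: (stream processors, base clock in MHz).
RUMOURED = {"GTX": (128, 675), "GTS": (96, 600)}


def alu_clock_mhz(base_mhz, pump=2):
    """ALU clock if the shader ALUs run at `pump` times the base clock."""
    return base_mhz * pump


def peak_scalar_gops(processors, base_mhz, pump=2):
    """Peak scalar ops per second in billions, assuming 1 op per ALU per ALU clock."""
    return processors * alu_clock_mhz(base_mhz, pump) / 1000.0


for name, (processors, base_mhz) in RUMOURED.items():
    print(name, alu_clock_mhz(base_mhz), peak_scalar_gops(processors, base_mhz))
```

The GTX figure of 675 MHz doubled gives the 1350 MHz ALU clock mentioned earlier in the thread.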
     
    #1031 Razor1, Oct 6, 2006
    Last edited by a moderator: Oct 6, 2006
  12. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    I don't see why. You could just have a swizzle unit before and after the ALU to copy/combine the components.
     
  13. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    You've also got register pressure to deal with: both bandwidth per clock (4xFP32s) and pipeline-bubbles caused by the register file running out of space for the minimum number of fragments in flight. Sure, G80's ALU organisation doesn't directly solve register pressure.

    But you're right, I am guessing about that 40%. It'll be interesting if NVidia markets a performance gain, should the architecture come out as 128x scalar.

    Jawed
     
  14. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Isn't that what the barycentrics are for? :???: The plane equation, expressed as barycentrics, converts vertex coordinates in screen space (post-rasterisation) into texture coordinates in triangle space? The interpolation can't be done in screen space because of perspective.
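
The point about perspective can be illustrated with the standard textbook form of perspective-correct interpolation: weight each vertex attribute by 1/w, interpolate linearly in screen space, then divide by the interpolated 1/w. This is a generic sketch, not a description of how G80's interpolators work; the function names and sample values are made up for the example.

```python
def screen_space_lerp(bary, attrs):
    """Naive interpolation with screen-space barycentric weights (wrong under perspective)."""
    return sum(b * a for b, a in zip(bary, attrs))


def perspective_correct(bary, attrs, w):
    """Interpolate attr/w and 1/w linearly in screen space, then divide."""
    num = sum(b * a / wi for b, a, wi in zip(bary, attrs, w))
    den = sum(b / wi for b, wi in zip(bary, w))
    return num / den


# Hypothetical triangle: texture coordinate u per vertex, clip-space w per vertex.
u = (0.0, 1.0, 0.0)
w = (1.0, 3.0, 1.0)
bary = (0.5, 0.5, 0.0)   # screen-space midpoint of the edge between vertex 0 and 1

print(screen_space_lerp(bary, u))        # 0.5
print(perspective_correct(bary, u, w))   # approximately 0.25
```

The screen-space midpoint lands a quarter of the way along the edge in texture space, because the far vertex (larger w) is compressed on screen.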

    :???:

    It is late...

    Jawed
     
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    I'm going to have to come back to this in the morning/afternoon, brain ache.

    No, I just haven't posted my qualms about this - that's a few hundred words I haven't gotten round to yet - I've mentioned the register file and instruction fetch/decode twice now and I did say it's "hairy" earlier. It would be nice to know the width of the SIMDs (if they are SIMD!). I need to go dig out NVidia's patents to see what clues about SIMDness there are.

    I was sorta hoping, while I was away this evening, someone would come up with a solid critique/sizing. If Intel can do this on a small scale in GMA X3000, then why not NVidia?...

    R580 sacrifices dynamic branching granularity against R520, so clearly there is a trade-off in its "thicker" pipelines. (Though in absolute terms it's never slower than R520 on dynamic branching.)

    Jawed
     
  16. Pete

    Pete Moderate Nuisance
    Moderator Legend Veteran

    Joined:
    Feb 7, 2002
    Messages:
    5,030
    Likes Received:
    463
    So where does the concept of a quad fit into all of this?

    Well, the fillrate and bandwidth should make for straightforward comparisons to previous cards. Just look up a previous review, or the 3D Tables, and compare. The tricky part is not so much that the shaders are unified (so, no more separate vertex and pixel shader ALUs; now, vertex, geometry, and pixel shader programs all run on the same ALUs) but that they're much finer-grained than before. On the 7900, a single pixel shader ALU can process up to five components per clock; on G80/8800, apparently a single (unified) shader can process a single component per clock. Then you take into account that the shader ALUs are apparently clocked twice as high as the rest of the chip!

    So, very basically, G70 has (24 pixel pipes with two ALUs per pipe) + (8 vertex shader ALUs)= 48 + 8 = 56 ALUs optim(istic)ally working on 4 components per clock. Very basically, G80 has 128 ALUs which can process one component per clock, so essentially 1/4 a G70 ALU, so 32 G70 ALUs, but they're clocked twice as high, so 64 G70 ALUs. The key difference is that apparently G80's ALUs will never be idle, whereas G70's ALUs may be partially "idle," so the actual performance difference should be greater than the theoretical (in G80's favor). But, I say this knowing nothing of G80's texture units or ROPs (or really of its ALUs even), so consider this the broadest of generalizations.
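
The back-of-envelope conversion above can be written out as follows. It is a minimal sketch of the same arithmetic under the same assumptions (scalar G80 ALUs, 4-component G70 ALUs, a 2x shader clock), all of which are rumour at this point; the variable names are illustrative.

```python
# G70, as described above.
g70_pixel_alus = 24 * 2          # 24 pixel pipes, two ALUs per pipe
g70_vertex_alus = 8
g70_total_alus = g70_pixel_alus + g70_vertex_alus

# G80, as rumoured.
g80_scalar_alus = 128
components_per_g70_alu = 4       # optimistic: 4 components per clock
shader_clock_multiplier = 2      # ALUs rumoured to run at twice the core clock

g80_in_g70_alus = g80_scalar_alus // components_per_g70_alu * shader_clock_multiplier

print(g70_total_alus)    # 56
print(g80_in_g70_alus)   # 64
```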

    Edit: And, actually, the only source for G80's shader processors being scalar is trumphsiao, so I don't know if that's gospel yet. Chalnoth may have a good point about the marketing department perhaps describing 64 double-pumped (more traditional 4(+1)D) ALUs as 128.
     
    #1036 Pete, Oct 6, 2006
    Last edited by a moderator: Oct 6, 2006
  17. V3

    V3
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    3,304
    Likes Received:
    5
    What reason is there to think those ALUs are scalar anyway? Xenos with its 48 ALUs is around 200 mil transistors; if the 700 mil transistor figure is correct, that's a lot of transistor budget for more capable ALUs.
     
  18. dnavas

    Regular

    Joined:
    Apr 12, 2004
    Messages:
    375
    Likes Received:
    7
    It would make slightly more sense from a transistor budget standpoint (I think?) if each stream processor could issue four (a quad) of Vec4s, which each take 4 cycles to complete. Each processor would then have its own instruction stream. Each processor would have a little more than half the transistors of a G7x pipe (more decode logic, only one "ALU", but arranged differently). A G7 with 24 * 2 + 8 would be a bit less than half the transistor size of a G8 (broadly assuming that pipelines are the only thing there, and that anything else would scale accordingly), which works out roughly according to rumor.

    I doubt that's right, because it makes the G8 a pretty serious monster. Still, hope springs eternal :)


    Reading above, VC is vertex coordinate??

    -Dave
     
  19. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,508
    Likes Received:
    1,314
    Xenos is actually 232, and that's with no ROPs (they are on the EDRAM). Furthermore, ATI and Nvidia count so differently that I'm not sure what that's worth at all (as an example, R580 has 80% more die size than G70, but the official respective transistor counts are 384 vs 278, nowhere near the die spread).
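
The mismatch between die size and transistor count can be put in numbers. This is a small sketch using only the figures quoted in this post (80% larger die, 384M vs 278M transistors); the variable names are illustrative.

```python
# Figures as quoted in the post above (transistor counts in millions).
R580_TRANSISTORS_M = 384
G70_TRANSISTORS_M = 278
DIE_AREA_RATIO = 1.80   # R580 die quoted as 80% larger than G70

transistor_ratio = R580_TRANSISTORS_M / G70_TRANSISTORS_M

print(f"transistor ratio: {transistor_ratio:.2f}x")   # about 1.38x
print(f"die area ratio:   {DIE_AREA_RATIO:.2f}x")     # 1.80x
```

On these numbers R580 has roughly 38% more transistors but 80% more area, which is the gap the post is pointing at.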

    But yeah, niggles aside, the Xenos shader core is a pretty small die I think, from photos and what estimates I have seen (I have not seen an official measurement). I think the XCpu is officially (from IBM) 168 mm^2, and in the pics I have seen of 360 mobos the Xenos core is not significantly bigger than the CPU. G70 is 198 mm^2; the Xenos core alone is probably right in that area or slightly smaller, 180-something.

    Also ATI added 36 shader pipes to R520 with just ~60m more transistors in R580.

    That said, I think ~500m is much more realistic on these G80/R600 chips; that's why I don't expect huge wonders either. The thing is, you don't know how large the new control logic/overhead to support the new execution units is.
     
  20. SugarCoat

    Veteran

    Joined:
    Jul 17, 2005
    Messages:
    2,091
    Likes Received:
    52
    Location:
    State of Illusionism
    Few tidbits from the boys
    http://www.theinq.com/default.aspx?article=34885

    Seems to imply a 512-bit internal, 384-bit external (non-split) bus. Or perhaps they are split but work in unison unless the smaller 128-bit is called upon to do something like physics, new HDR or 16AA? That would imply that the 128-bit bus is a jack of all trades, fulfilling a need all the time to always provide some kind of additional performance. I don't see Nvidia letting it go to waste if the user wasn't doing some amazing next generation effect though. Personally, I think the card is going to be very bandwidth starved, which scares me about buying one because they could pull a refresh with something like GDDR4 1200 (before AIB overclocking of course; it will be interesting to see what they do with it) which the chip may gobble up for a substantial boost. Wonder if any AIBs could get away with doing cards with GDDR4 :D.

    Doesn't Vista already support system memory being used by the card if it needs it? I don't see the importance of TC in consumer models. I thought 64MB per each 512MB of system memory can be given to graphics if needed. I fail to see the importance of that comment other than Nvidia not having 2 cards essentially identical with a huge price delta to set them apart. Sort of a no-brainer.
     
    #1040 SugarCoat, Oct 6, 2006
    Last edited by a moderator: Oct 6, 2006
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.