The New and Improved "G80 Rumours Thread" *DailyTech specs at #802*

Discussion in 'Pre-release GPU Speculation' started by Geo, Sep 11, 2006.

Thread Status:
Not open for further replies.
  1. pocketmoon66

    Newcomer

    Joined:
    Mar 31, 2004
    Messages:
    163
    Likes Received:
    9
    T-buffer... oh sorry, it's a form of Tourette's :twisted:
     
  2. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,516
    Likes Received:
    24,424
    HDCP provisioning?
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Only because Intel's GMA X3000 unified GPU apparently does the same. And if you can get MIMD vertex execution you're better off, because the coherency of vertices doesn't fit into the 2x2 pattern of pixel quads (i.e. batch size is a multiple of 4) so well. Any dynamic branching benefits from being MIMD, too.
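
    To put a rough number on the dynamic-branching point, here is a toy cost model. It is an illustration under made-up assumptions, not a description of the GMA X3000 or any other real hardware. It counts ALU lane-cycles spent on a branch whose "if" path costs if_cost clocks and whose "else" path costs else_cost. In a SIMD batch, every lane sits through every path that any lane takes. In MIMD, each element only pays for its own path.

```python
# Toy cost model (illustrative assumption, not real hardware):
# count ALU lane-cycles spent on one if/else branch.

def simd_lane_cycles(flags, if_cost, else_cost):
    """SIMD batch: every lane sits through each path that ANY lane takes."""
    per_lane = 0
    if any(flags):
        per_lane += if_cost
    if not all(flags):
        per_lane += else_cost
    return len(flags) * per_lane

def mimd_lane_cycles(flags, if_cost, else_cost):
    """MIMD: each element executes only the path it actually takes."""
    return sum(if_cost if taken else else_cost for taken in flags)

batch = [True, False, False, False]  # one of four vertices takes the expensive path
print(simd_lane_cycles(batch, 10, 2), mimd_lane_cycles(batch, 10, 2))  # prints: 48 16
```

    When all four elements agree, the two models cost the same. The gap only opens up when a batch diverges, which is exactly the case that a quad-shaped batch of vertices is more likely to hit.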

    That said, once you start doing a lot of vertex fetching, I guess bandwidth plays such an important part (because of all the vertex attribute data being fetched per vertex) that coherence of any kind could be a win simply by making better use of bandwidth. I don't understand how that pans out, though...

    Overall I'm still guessing!

    Jawed
     
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Dave, you'll be disappointed in me...
    I sorta get bored after a while trying to decode these patents. I actually scrapped a posting I was going to make on precisely the subject of unravelling the unit hierarchy indicated by this patent.

    All I would say is, it's prolly safe to assume that per fragment (or vertex or primitive) there is prolly more than one type of ALU available. The set of PCUs may or may not be symmetric. As I say, I get bored - it seems there are dozens of ways of skinning this cat.

    Jawed
     
  5. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    955
    Likes Received:
    52
    Location:
    LA, California
    So I haven't finished reading the p10 patents yet, but I have to say giant kudos to the authors - so far they are just about the most comprehensible patents I've ever read.
     
  6. INKster

    Veteran

    Joined:
    Apr 30, 2006
    Messages:
    2,110
    Likes Received:
    30
    Location:
    Io, lava pit number 12
    Multiple display SLI support ?
     
  7. g__day

    Regular

    Joined:
    Jun 22, 2002
    Messages:
    580
    Likes Received:
    2
    Location:
    Sydney Australia
  8. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,516
    Likes Received:
    24,424
    #2048 BRiT, Oct 28, 2006
    Last edited by a moderator: Oct 28, 2006
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    OK, I'm going to assume that all the hints we're getting are that G80 uses a windowed thread scheduler over scalar sequential pipelines.

    In the example below, I'm going to assume that "batch size" is 16, and that for every 16 MMAD ALUs there are 4 MI ALUs. Really it's just an excuse to have some fun, and no, I'm sober.

    Here's my tortuous, silly code:

    [Image: Jawed's example code]

    and here's how it executes ( :lol: at how long this is, scroll down!)

    [Image: execution diagram for the example code]

    It's four batches executed one after the other.

    The white bits are, effectively, bubbles. In the junk code I've dreamt up, the MMADs depend heavily on the results of the MIs, hence the bubbles.
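
    A rough way to see where bubbles like these come from is a small model using the ratio assumed in the post: a batch of 16, 16 MMAD ALUs and 4 MI ALUs. It also assumes, for simplicity, that a single batch runs a strictly serial dependency chain with no other batch available to fill the gaps. The instruction mix below is made up and is not the code in the image.

```python
import math

# Assumptions from the post: batch size 16, 16 MMAD ALUs and 4 MI ALUs.
BATCH = 16
LANES = {"MMAD": 16, "MI": 4}

def clocks(op):
    """Clocks for one batch to issue op: 16/16 = 1 for MMAD, 16/4 = 4 for MI."""
    return math.ceil(BATCH / LANES[op])

def serial_chain(program):
    """Run one batch whose every instruction depends on the previous one.

    Returns (total clocks, clocks in which the MMAD ALUs were busy).
    """
    total = 0
    mmad_busy = 0
    for op in program:
        c = clocks(op)
        total += c
        if op == "MMAD":
            mmad_busy += c
    return total, mmad_busy

total, busy = serial_chain(["MMAD", "MI", "MMAD", "MI", "MMAD"])
print(f"{total} clocks, MMAD ALUs idle for {total - busy} of them")
# prints: 11 clocks, MMAD ALUs idle for 8 of them
```

    A windowed thread scheduler would try to fill those idle MMAD clocks with independent instructions from other batches. A dependency-heavy mix like this one leaves it very little to work with.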

    Jawed
     
  10. lopri

    Regular

    Joined:
    Aug 4, 2004
    Messages:
    259
    Likes Received:
    1
    Brilliant.
     
  11. INKster

    Veteran

    Joined:
    Apr 30, 2006
    Messages:
    2,110
    Likes Received:
    30
    Location:
    Io, lava pit number 12
    Well, currently SLI only works on a single display, and it's a true limitation.

    There is another one.
    Cards like the 7950 GX2, standard SLI or CrossFire have shown us that it's the lack of a large installed base of ultra-high-resolution displays that limits both the visibility of the performance gains and the sales of multidisplay-capable hardware (not just graphics cards, but also motherboards with compliant chipsets, often from the same brand as the GPU, obviously).
    Nvidia has never shied away from marketing the 7900/7950 GX2 products exclusively to the ultra-high-definition (UHD) crowd.
    What's the next step after that, given that the 30-inch form factor seems to have been the limit in the PC arena for some time now? Go multi. What else?


    Of course, only a very narrow section of consumers can afford a 24-, 27- or even 30-inch display in addition to the rest of the high-end hardware, and even they are still limited by the single-display issue.
    So it's reasonable to assume that, say, two side-by-side widescreen 20-inch LCDs with a native resolution of 1680 x 1050 each (3360 x 1050 combined) are much cheaper for the average high-end gamer than a single Apple/Dell 30-inch 2560 x 1600 LCD, and perhaps even more flexible (you can always game on one and use the other for something else, especially now in the multi-core CPU age).
    It would instantly remove a bottleneck for current multi-GPU setups and broaden the appeal of the platform beyond just "the usual suspects".
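
    For scale, here are the pixel counts of the two setups, using plain arithmetic on the resolutions given above. The dual 20-inch setup drives about 86% of the pixels of a single 30-inch panel.

```python
def pixels(width, height):
    return width * height

dual_20 = 2 * pixels(1680, 1050)  # two side-by-side 20-inch panels
span = pixels(3360, 1050)          # the same setup treated as one wide surface
single_30 = pixels(2560, 1600)     # one 30-inch panel

assert dual_20 == span
print(dual_20, single_30, round(dual_20 / single_30, 2))  # prints: 3528000 4096000 0.86
```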


    Finally, the two SLI connectors on the 8800 GTX seem to suggest some kind of daisy-chaining of graphics cards, which would only make sense if more than two cards were present in a single system.
    Since that would be costly, and the GTS doesn't have the two connectors, 3+ cards is almost a requirement when you need the power to drive one, two or more UHD displays while gaming with DX10-quality games, and for that you need the best there is (unless gaming isn't important and you do what Apple does, with 7300 GTs driving 30-inchers... :D).


    What do you guys think?
     
    #2051 INKster, Oct 28, 2006
    Last edited by a moderator: Oct 28, 2006
  12. Rangers

    Legend

    Joined:
    Aug 4, 2006
    Messages:
    12,791
    Likes Received:
    1,596
    It makes sense, because GX2-style quad SLI is obviously dead with these cards. They are too hot and power-hungry to fit two on one PCB (even the G71, a remarkably mild chip for its power, had to be downclocked for the GX2).

    So the only way to get Nvidia's precious quad SLI is 4 physical cards.

    Ugh.
     
  13. Demirug

    Veteran

    Joined:
    Dec 8, 2002
    Messages:
    1,326
    Likes Received:
    69
    It looks like the German PC Games Hardware magazine (print, not online) had a different NDA from everyone else. The current issue (delivered today) includes an official G80 preview. Only tech stuff, no benchmarks.
     
  14. LeStoffer

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,262
    Likes Received:
    22
    Location:
    Land of the 25% VAT
    Does anybody mind posting a short rundown for the "tech stuff" from PC Games Hardware then? :wink:
     
  15. christoph

    Newcomer

    Joined:
    Feb 6, 2002
    Messages:
    148
    Likes Received:
    4
    from 3dcenter:

     
    #2055 christoph, Oct 28, 2006
    Last edited by a moderator: Oct 28, 2006
  16. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Expect the power and heat to go down significantly once nVidia puts out its next set of high-end parts based on this architecture, as they should be based on a die shrink of the soon-to-be-released GPUs. A mid-to-high-end part may well work in a GX2-style configuration (certainly not the highest-end part).
     
  17. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    Cool.

    128 scalar MADD+MUL - 518 GFLOPS
    Groups of 16
    4 Texture Address calculators per group - 32 total
    8 TMUs per group - 64 total
    6 ROP clusters (how many ROPs in a cluster? One?) / 4 samples/clock (minimum 4xAA ?) Double Z/stencil?
    New Coverage Sampling AA not universally compatible
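
    The totals above follow from the per-group figures. The sketch below reproduces them and works backwards to the shader clock implied by the 518 GFLOPS figure. It assumes, as the list suggests, that each scalar unit issues one MADD (2 flops) plus one MUL (1 flop) per clock. The resulting clock is an inference from the quoted number, not a published spec.

```python
SPS = 128              # scalar stream processors
FLOPS_PER_CLOCK = 3    # MADD counts as 2 flops, the co-issued MUL as 1
GROUP = 16             # SPs per group
TA_PER_GROUP = 4       # texture address units per group
TF_PER_GROUP = 8       # texture filtering units ("TMUs") per group

groups = SPS // GROUP
print("groups:", groups)                          # 8
print("address units:", groups * TA_PER_GROUP)    # 32
print("filter units:", groups * TF_PER_GROUP)     # 64

# Shader clock implied by the quoted 518 GFLOPS (an inference, not a spec figure).
implied_mhz = 518_000 / (SPS * FLOPS_PER_CLOCK)
print(f"implied shader clock: ~{implied_mhz:.0f} MHz")  # ~1349 MHz
```

    Roughly 1.35 GHz is well above any plausible core clock, which fits the stream processors running in a clock domain of their own.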
     
    #2057 trinibwoy, Oct 28, 2006
    Last edited by a moderator: Oct 28, 2006
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    LOL, back to NV40?!!! style "asymmetric" ALUs "per pipe".

    My head hurts. I can't think what the best way of scheduling that is! Utilisation isn't necessarily very good there, at all. What the scalar gives, the dual-issue takes away. ARGH.

    So, minimum of 16 granularity for fragment batches? I wonder about vertices, 16 too?

    It'll be interesting to see the difference in performance that branching granularity of, say, 16 in G80 versus 64 in R600 makes...
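
    Here is a toy experiment on why batch granularity matters for branching. It assumes a branch whose outcome is coherent in screen space (taken inside a disc) and square batches: 4x4 for 16 pixels and 8x8 for 64. Both assumptions are hypothetical, and real hardware may shape its batches differently. Any batch that straddles the edge has to execute both sides of the branch.

```python
def divergent_fraction(tile, size=64, radius=20):
    """Fraction of pixels sitting in tiles where the branch goes both ways.

    The branch is taken inside a disc centred on a size x size screen region.
    Tile shapes are assumptions: 4x4 for a 16-pixel batch, 8x8 for 64.
    """
    centre = size / 2

    def taken(x, y):
        return (x + 0.5 - centre) ** 2 + (y + 0.5 - centre) ** 2 < radius ** 2

    divergent = 0
    for ty in range(0, size, tile):
        for tx in range(0, size, tile):
            outcomes = {taken(x, y)
                        for y in range(ty, ty + tile)
                        for x in range(tx, tx + tile)}
            if len(outcomes) == 2:  # both paths needed: whole batch pays for both
                divergent += tile * tile
    return divergent / (size * size)

for tile in (4, 8):
    print(f"batch of {tile * tile}: {divergent_fraction(tile):.1%} of pixels diverge")
```

    Every aligned 4x4 tile sits inside an aligned 8x8 tile, so in this setup the 64-pixel batches can never waste less than the 16-pixel ones. How large the gap is in real shaders depends on how coherent the branch condition is.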

    So, is that the Multifunction Interpolators?

    8 TMUs running at ~half the speed of 4 streaming processors (the bit that does texture address calculation) - I think this was the consensus already, based on the specs.

    I dare say it would make sense that this is a single decoupled functional block. I would guess each "cluster" is 4 ROPs, so a 24 ROP design?

    Oh dear, how can that be? ARGH

    64 TMUs seems like an awful lot, considering the bandwidth available is only 86GB/s. R580's 16 TMUs happily use 60GB/s. The only way to explain this, I guess, is if the 64 TMUs are designed for full-speed FP16 texture filtering, say, and half-speed FP32 texture filtering.
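
    Normalising those bandwidth figures per texel makes the concern concrete. This is a crude sketch. It assumes every TMU returns one bilinear-filtered texel per core clock and that texturing has the whole memory bus to itself, and neither holds exactly. The bandwidth and TMU counts come from the post. The core clocks (575 MHz for an 8800 GTX, 650 MHz for an X1900 XTX) are my assumptions.

```python
def bytes_per_texel(bandwidth_gb_s, tmus, core_mhz):
    """Bandwidth available per filtered texel if every TMU delivers one per clock."""
    texels_per_second = tmus * core_mhz * 1e6
    return bandwidth_gb_s * 1e9 / texels_per_second

# Bandwidth and TMU counts are from the post; core clocks are assumptions.
g80 = bytes_per_texel(86, 64, 575)
r580 = bytes_per_texel(60, 16, 650)
print(f"G80: {g80:.2f} B/texel, R580: {r580:.2f} B/texel")
```

    That works out to roughly 2.3 bytes per texel for G80 against nearly 5.8 for R580. In this model, full-rate G80 texturing of 4-byte texels only makes sense if caches absorb most of the traffic, or if the extra TMUs are mostly spent on slower filtering such as FP16.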

    Jawed
     
  19. Demirug

    Veteran

    Joined:
    Dec 8, 2002
    Messages:
    1,326
    Likes Received:
    69
    Well, you shouldn't ignore the high clock rate of the stream processors, or the fact that they are no longer tied to texture processing.

    No. That's the part that calculates the positions and weights of the samples from the texture coordinates.
     
  20. dnavas

    Regular

    Joined:
    Apr 12, 2004
    Messages:
    375
    Likes Received:
    7
    The ALUs don't seem that impressive, indeed.

    4 interpolators per group? Not sure. Are we sure that neither the MADD nor the MUL are more capable units?

    Modulo 64 vs 48 TMUs, which is a bit of a shock. This sounds like a texturing monster. One thing Trini doesn't share is that, despite the clock differences, apparently you don't get full use of the TMUs until the results of the addressing can be reused, as in 2xAF (if I'm reading that translation correctly).
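
    If that reading of the translation is right, the split between 32 address units and 64 filter units in the spec list would behave roughly as sketched below. This is a hypothetical model. It assumes each address unit emits one address per clock and that under 2xAF one address can feed two bilinear filter operations. It ignores clock-domain differences entirely.

```python
def filter_utilisation(bilinear_per_address, ta_units=32, tf_units=64):
    """Share of filtering units kept busy per clock.

    Assumes each address unit emits one texture address per clock and that
    each address can feed `bilinear_per_address` bilinear filter operations
    (1 for plain bilinear, 2 for 2xAF under the reading above).
    """
    demanded = ta_units * bilinear_per_address
    return min(demanded, tf_units) / tf_units

for mode, reuse in (("bilinear", 1), ("2xAF", 2)):
    print(f"{mode}: {filter_utilisation(reuse):.0%} of filter units busy")
# prints: bilinear: 50% ..., then 2xAF: 100% ...
```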

    Doesn't sound programmable. <sigh>

    Yeah -- this looks like where the transistor budget went. As my interest was mainly in a beefy ALU section, I'm a bit disappointed, but, I daresay most gamers should be quite happy with this.
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.