G80 programmable power

Discussion in 'Architecture and Products' started by Pigman BABY!!!, Jan 25, 2007.

  1. PeterAce

    Regular

    Joined:
    Sep 15, 2003
    Messages:
    489
    Likes Received:
    6
    Location:
    UK, Bedfordshire
    Eh? Isn't this a repeat of an old discussion here?

    http://www.beyond3d.com/forum/showpost.php?p=895509&postcount=351

    Personally I didn't realise that the MADD ALUs were decoupled and separately threaded from the SF ALUs.

    But after checking... here, in the G80 Arch article:

     
  2. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,414
    Likes Received:
    411
    Location:
    New York
    Yep I saw that line too. But to be fair "performed outside" doesn't really paint a clear picture of independent scheduling. But given the different execution times it does seem more obvious now that the main ALU and the SF units are fed independently. I'm actually surprised this wasn't fleshed out in a bit more detail - it seems like a pretty big deal.
     
  3. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,414
    Likes Received:
    411
    Location:
    New York
    Heh, I just noticed that Jawed's approach here would always give R600 the per-flop efficiency edge due to its assumed 2-cycle MAD. How about we invoke a new standard - per channel utilization! :)
     
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    But that text is wrong. SF is pipelined to produce one result every clock. Each of the four MI/SF units in a shader cluster can produce 1 SF or 4 MI results per clock.
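    As a rough illustration of what "pipelined to produce one result every clock" means for peak rates, the sketch below multiplies out the per-cluster figures exactly as stated in this post: four MI/SF units per cluster, each giving 1 SF or 4 MI results per clock. These numbers are taken from the post, not checked against NVIDIA documentation, and the `clocks_per_result` parameter is a hypothetical knob added only to show how a looping (non-pipelined) unit would divide throughput.

```python
# Peak MI/SF throughput for one G80 shader cluster, using the figures stated
# in the forum post (not verified against vendor documentation).
MI_SF_UNITS_PER_CLUSTER = 4
SF_RESULTS_PER_UNIT_PER_CLOCK = 1   # special function (RCP, RSQ, ...)
MI_RESULTS_PER_UNIT_PER_CLOCK = 4   # attribute interpolation


def cluster_results_per_clock(mode: str, clocks_per_result: int = 1) -> float:
    """Peak results per clock for a whole cluster in 'sf' or 'mi' mode.

    clocks_per_result is hypothetical: 1 models a fully pipelined unit (a new
    result every clock); a larger value models a unit that loops for several
    clocks per result, which divides throughput accordingly.
    """
    if mode == "sf":
        per_unit = SF_RESULTS_PER_UNIT_PER_CLOCK
    elif mode == "mi":
        per_unit = MI_RESULTS_PER_UNIT_PER_CLOCK
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return MI_SF_UNITS_PER_CLUSTER * per_unit / clocks_per_result


if __name__ == "__main__":
    print(cluster_results_per_clock("sf"))                       # 4.0
    print(cluster_results_per_clock("mi"))                       # 16.0
    print(cluster_results_per_clock("sf", clocks_per_result=4))  # 1.0
```

    With `clocks_per_result=1` the cluster retires 4 SF or 16 MI results every clock, which is the behaviour described above; any looping would show up as a proportional drop.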

    There is no looping in the MI/SF to produce any results.

    As for "outside", all that means is that SF doesn't share an ALU component with MAD functionality, there's no multi-threading implied there. In prior NVidia GPUs SF was shared functionality on the fourth component, Alpha.

    In NV4x/G7x each of the superscalar ALUs shares the SF workload. I can't remember the exact division, but for instance RCP and RSQ are on the top ALU while the rest are on the bottom ALU.

    Jawed
     
  5. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    It might be 'wrong' (I have better things to argue with right now), but it's also a lot less wrong than the majority of the bullshit you write on these very forums. Not only that, but unlike many of your posts, we also marked it very clearly to imply that we were unsure when we wrote that. You could argue the same quality standards do not apply to posts as to articles, but I wouldn't really consider that as much more than an easy and improper way out of the argument.

    If you think the way this website should be handled is to have a single article that we update every 5 minutes, rather than new ones over time, then please be my guest and get the hell out of here (and yes, this is an exaggeration, so don't bother saying 'I don't think it should be every 5 minutes, maybe every 10 or so...'). If the place doesn't fit your tastes, nobody's forcing you to stay, especially not with that attitude and viewpoint.


    Uttar
     
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    That's why I included pixel rate and pixels per clock. Also remember this is just messing about with "difficult" code, rather than code that easily utilises each ALU. It's not meant to be typical, but to illustrate how ALUs can lose utilisation.

    Additionally it's worth noting that whenever there's a SF unit, you have a problem counting FLOPs (if you want to take FLOP counting seriously). An SF is actually more than just 1 FLOP, yet in R580/Xenos/G71 I've treated it as being 1 or 2 (because of the shared use for ADD or MAD). Their respective efficiencies would look worse if SF counted for more.

    In G80 each SF is counted as 4, not 1, FLOPs. That's because (8x2 + 4x2) FLOPs per half-cluster x 16 half-clusters x 1350MHz = 518GFLOPs.
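    The arithmetic above can be reproduced in a few lines. This is a sketch under stated assumptions: the 8x2 term is read as 8 MAD lanes at 2 FLOPs each, and the 4x2 term as 2 SF units per half-cluster at 4 FLOPs each (assuming the four MI/SF units per cluster mentioned earlier in the thread are split across its two halves). The `sf_flops` parameter is a hypothetical knob showing how much the headline figure depends on how an SF is counted.

```python
# Peak-FLOP arithmetic for G80 as given in the post. The split of the "4x2"
# term into 2 SF units x 4 FLOPs is an assumption, not a confirmed spec.
MAD_LANES_PER_HALF_CLUSTER = 8
FLOPS_PER_MAD = 2               # one multiply + one add
SF_UNITS_PER_HALF_CLUSTER = 2   # assumption: 4 MI/SF units per cluster, 2 per half
HALF_CLUSTERS = 16
SHADER_CLOCK_MHZ = 1350


def peak_gflops(sf_flops: int = 4) -> float:
    """Peak GFLOPs when each SF result is counted as `sf_flops` FLOPs."""
    per_half_cluster = (MAD_LANES_PER_HALF_CLUSTER * FLOPS_PER_MAD
                        + SF_UNITS_PER_HALF_CLUSTER * sf_flops)
    per_clock = per_half_cluster * HALF_CLUSTERS
    # FLOPs per clock x MHz gives MFLOPs; divide by 1000 for GFLOPs.
    return per_clock * SHADER_CLOCK_MHZ / 1000


if __name__ == "__main__":
    print(f"{peak_gflops():.1f} GFLOPs")            # 518.4 GFLOPs (SF = 4 FLOPs)
    print(f"{peak_gflops(sf_flops=1):.1f} GFLOPs")  # 388.8 GFLOPs (SF = 1 FLOP)
```

    Counting each SF as a single FLOP would pull the same hardware down to roughly 389 GFLOPs, which is why the choice of SF weighting matters so much when comparing efficiencies across architectures.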

    In R600, my hypothesis is that an 8-clock macro (2 ADDs and 3 MADs) is used to calculate SF, so in this case there's no argument one way or another about the relative FLOP cost of SF. I sized R600 at 512GFLOPs.

    Jawed
     
  7. Acert93

    Acert93 Artist formerly known as Acert93
    Legend

    Joined:
    Dec 9, 2004
    Messages:
    7,782
    Likes Received:
    162
    Location:
    Seattle
    Why doesn't Bob, and others, just come out and say exactly what they mean? :-| That would help clear up the confusion.
     
  8. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,414
    Likes Received:
    411
    Location:
    New York
    What's with all the staff hostility towards Jawed all of a sudden? I'm guessing it has to do with more than just the stuff posted in this thread. You guys can't be THAT sensitive to criticism, or can you? :???:
     
  9. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    I can't talk for them (I'm not part of the site staff), but it's pretty clear to me that Jawed's criticism is not exactly the kind of constructive criticism B3D is looking for.
     
  10. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    I think it's fairly clear we had a fair number of valued members getting pissed off at him because he spews out so much inaccurate information, on a variety of subjects. And when it's not "information", it's three-page-long posts of speculation that insiders literally roll their eyes at. That's obviously not true of every single one of his posts, but that's not the point either.

    While this would be acceptable by itself, combined with the fact he has already crapped on the G80 Architecture article as having inaccuracies in, iirc, at least 2 other threads and a variety of other posts, I think this is getting absolutely ridiculous. I'd like not to have to be so harsh about this, but at this point, I doubt the message would pass otherwise.

    I don't have anything personal against Jawed, and I don't really have anything against big speculative posts either, even when in the end, it turns out they were completely wrong. I had my fair share of those back in the day, heh - and some people definitely appreciate them when they aren't the vast majority of your posts, or of a thread's posts. Anyway, when an overall situation degrades below basic quality standards over extended periods of time, something is wrong. And when that is combined with him shitting on our work and implying he could do a much better job (that might not be how he means it, but it is certainly how I'm interpreting it, and various others seem to agree), I think it's about time to make it clear that this has to change...


    Uttar
    P.S.: This represents my viewpoint, although I believe it is shared by other admins - I'm not officially speaking for them here, though, and opinions and magnitudes may vary.
     
  11. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,414
    Likes Received:
    411
    Location:
    New York
    Much appreciated, thanks.
     
  12. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    In my excitement at having solved the G80 MAD utilisation problem by sequencing instructions across the batch, I didn't notice that it actually requires scheduling across all 32 pixels in the batch to work, not the 16 I indicated :oops: The diagram's wrong, but the throughput is correct.

    Bob, the pixel rates fell because I changed the final RSQ for a DP3.

    Uttar, I'm sure PeterAce appreciates being told there's an error in the article - whether that article resides here or another site. You and I have already debated the error and you agreed that it is indeed so. And that was before the article was updated.

    Jawed
     
  13. Bob

    Bob
    Regular Subscriber

    Joined:
    Apr 22, 2004
    Messages:
    424
    Likes Received:
    47
    Why do you guys just assume I know what I'm talking about?
     
  14. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    Now Douglas Adams has some serious competition for my signature space..:lol:
     
  15. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    213
    Location:
    Uffda-land
    Well, we don't all! Some of us remember your defense of Kirk's comments re NV30's 128-bit bus! :cool:
     
  16. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    Interestingly, if I look at NVIDIA's roadmap, we might just have new arguments for that discussion soon! With R600's apparent 512-bit bus and NVIDIA's comparatively more conservative approach in the mid-end (128-bit + DDR2/GDDR3), it'll be very interesting to see how both companies' memory bandwidth efficiency will compare. Don't you just love how OT you suddenly made this thread, Geo? ;) (not that it wasn't OT enough already!)


    Uttar
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.