G80 Shader Core @ 1.35GHz, How'd They Do That?

Discussion in 'Architecture and Products' started by ^eMpTy^, Jan 15, 2007.

  1. stepz

    Newcomer

    Joined:
    Dec 11, 2003
    Messages:
    66
    Likes Received:
    3
Because the SIMD vector size is a really important performance characteristic that shouldn't be omitted. Information-wise, I think it's equivalent to say 16 eight-way SIMD ALUs or 128 ALUs in an eight-way SIMD configuration, depending on how you define an ALU. I'm used to the microprocessor architecture world's tradition of calling a SIMD unit a single ALU, and I think it'd be confusing to arbitrarily change that convention when talking about GPUs. GPUs are microprocessors too, you know.
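As an aside, a tiny arithmetic sketch (my own illustration, not anything from the thread) of why the two descriptions carry the same peak-throughput information - the grouping into units is a naming choice. The 1.35 GHz shader clock comes from the thread title; counting a MAD as two flops is an assumed convention:

```python
CLOCK_GHZ = 1.35      # G80 shader clock, from the thread title
FLOPS_PER_MAD = 2     # counting one multiply-add as two flops (common convention)

def peak_gflops(num_units: int, lanes_per_unit: int) -> float:
    """Peak GFLOPS for num_units SIMD units, each lanes_per_unit lanes wide."""
    return num_units * lanes_per_unit * FLOPS_PER_MAD * CLOCK_GHZ

# "16 eight-wide SIMD ALUs" and "128 scalar ALUs" describe the same hardware,
# so both groupings yield the same peak number.
assert peak_gflops(16, 8) == peak_gflops(128, 1)
print(f"{peak_gflops(16, 8):.1f} GFLOPS")  # 345.6 GFLOPS
```

Only the total lane count times clock matters for the peak figure; how the lanes are bundled into "ALUs" changes nothing.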
     
  2. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,537
    Likes Received:
    589
    Location:
    New York
    Heh, well it seems your "marketing bullshit" stamp shouldn't only apply to G80. You make a valid point but if we use prior generations' common definition of an ALU then Nvidia hasn't really changed anything by saying G80 has 128 of them. What you're proposing seems to go against the GPU convention that we've been used to for a while now.
     
  3. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,429
    Likes Received:
    181
    Location:
    Chania
    I personally couldn't give a rat's ass if someone calls it a 16 or 128 ALU design, as long as he bothers to define what each unit is capable of.
     
  4. stepz

    Newcomer

    Joined:
    Dec 11, 2003
    Messages:
    66
    Likes Received:
    3
Unfortunately the current convention doesn't apply to G80 in any reasonably accurate way. That's what you get if you use a convention that ignores reality.
     
  5. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    It doesn't ignore reality. It simply lists things as they are seen from the point of view of the programming model. You could argue it's not ideal from an information perspective, but I wouldn't call it dishonest or anything either.


    Uttar
     
  6. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,537
    Likes Received:
    589
    Location:
    New York
    Sure if you want to take the pedantic approach but from a marketing point of view I see nothing wrong with it. It's not like they're lying based on some layman's definition of an "ALU". I just think it's a bit late to be re-defining GPU processing units based on the SIMD width.

    I also disagree that the current convention doesn't apply to G80. We used to look at units on a per-fragment basis. If we do the same for G80 we come up with 128 so I'm not sure what you're pointing at there.

    PS: All this renaming is kinda weird - Uttar is now "Arun Demeure" ? It's as if he's a real person all of a sudden! :shock: :razz:
     
  7. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    :shock: But yes, staff members will now have their real names used as forum nicknames. Oh well, I guess I'll survive... maybe!
     
  8. stepz

    Newcomer

    Joined:
    Dec 11, 2003
    Messages:
    66
    Likes Received:
    3
It depends on what the necessary criteria are for applicability of the current convention. If you ignore the requirement of being useful for performance predictions and only require countability of some kind of hardware feature by some rough mapping of the convention, then sure, it applies. I maintain that the so-called per-fragment ALU counts of G70 and G80 are incomparable. (What would be the per-fragment ALU count of G70 anyway?)
I am pointing out that if your metric doesn't have a clear fundamental microarchitectural counterpart, then it is to be expected that you can't use it to do any relevant comparisons between generations.
I was actually originally trying to point out that the execution units (to use a less overloaded term) really aren't any smaller than in the previous generation. In hindsight, I should have been more polite, more verbose, and less controversial. I'm just annoyed at marketing for taking perfectly good established terminology and then using it completely differently. I object to the images that all this talk about a sea of units and 128 ALUs conjures up. In reality it's still pretty much the same bunch of heavily multithreaded SIMD cores, though with a significantly better organised thread allocation system.
     
  9. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,537
    Likes Received:
    589
    Location:
    New York
Well yeah, that's exactly right - in the past we've just "counted" ALUs in this manner without much regard for architectural differences. That's when the flops counting started...

    Most people/reviewers counted it as 24.

So how do you plan to do comparisons between G80 and R600 ALU counts?

But how do you reconcile that with the fact that G80 can have 128 different fragments or vertices in flight at a given stage in the pipeline? What you're proposing would be a marketing nightmare - they would be going from 24 to 16. How do you think that would stack up against AMD's 64/96, etc.? Marketing isn't exactly just some annoyance that corrupts the honest engineering truth. It is an integral part of the process.
     
  10. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Well, pipeline counts are beside the point, as far as I'm concerned. They just aren't going to be very comparable, since nVidia's going scalar and ATI's going vector. So it'd be best just to go by the benchmarks and be done with it.

    And right now, the benchmarks say that the 8800 is really a beastly processor. How will it look when the R600 comes out? Well, it should be very interesting, at the least.
     
  11. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,537
    Likes Received:
    589
    Location:
    New York
Yeah, but you can't put benchmarks on a retail box or spec sheet, which is the issue at the root of this discussion, methinks.
     
  12. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Quite right. But spec sheets have always been full of misleading crap. I don't see why that should stop now. See, for example, the tradition of using graphics card memory size on required and recommended specifications on games.
     
  13. Unreal

    Newcomer

    Joined:
    Dec 20, 2006
    Messages:
    10
    Likes Received:
    0
I believe Nvidia was able to raise the clock so much mainly because the ALUs got simpler: 32-bit scalar instead of 128-bit vec4.

CPUs can achieve higher clocks because they have fewer transistors, so for the same frequency there is less heat dissipation and therefore less power demand.

Frequency depends on the number of transistors: you can have few transistors with a high clock speed, or many transistors with a low clock speed, depending on what you want to do. If you want more pipelines, each more complex (the GPU approach), then you need more transistors, hence you cannot raise the frequency as much. If you want few pipelines, each not so complex (the CPU approach), then the transistor count is lower, hence you can raise the frequency much more.
     
  14. INKster

    Veteran

    Joined:
    Apr 30, 2006
    Messages:
    2,110
    Likes Received:
    30
    Location:
    Io, lava pit number 12
I don't think that's true, but I'll let the chip design experts detail it further.
To my knowledge, CPU clock speeds scale much better because the circuit design is much more "hand-tuned" than in a GPU.
That is, GPU designers can't afford the man-power, don't have to include full general-purpose computing capabilities and the respective "baggage" (x86, etc.), have to release products on a narrow timetable, and don't enjoy the often 50%+ profit margins provided by CPUs.
     
    Simon F likes this.
  15. Techno+

    Regular

    Joined:
    Sep 22, 2006
    Messages:
    284
    Likes Received:
    4
INKster, what you said is another reason for the difference in CPU and GPU clock speeds. Gfx ASIC designers just use a collection of building blocks for building GPUs, while CPU designers more or less start CPU design from scratch. That's why CPUs have almost three times the design cycle of GPUs.
     
  16. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,326
    Likes Received:
    107
    Location:
    San Francisco
Like G80, you mean? :razz:
     
  17. Techno+

    Regular

    Joined:
    Sep 22, 2006
    Messages:
    284
    Likes Received:
    4
I meant that it takes 18-24 months to see a new generation of GPUs appearing, unlike CPUs, which take almost 4-5 years. However, I think this will be changing with Intel.
     
  18. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,326
    Likes Received:
    107
    Location:
    San Francisco
G80 was started 4+ years ago.
     
  19. Unreal

    Newcomer

    Joined:
    Dec 20, 2006
    Messages:
    10
    Likes Received:
    0

You may have a point with the hand-tuning, but I still don't believe it is the main reason.

The increased transistor count (increased power, more difficult to cool) and the increased complexity (more complex interconnects, which lead to signal interference, signal attenuation, and signal-delay issues) of GPUs are the main reasons.

These reasons make even the best "hand-tuning" incapable of delivering a substantial increase in clock speed.
     
  20. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,295
    Likes Received:
    3,622
    Location:
    Well within 3d
    There's also the process factor. GPUs thus far have been fabbed on foundry processes that must cover a wide range of customers and customer designs. Timings and drive currents are inferior to the more extreme engineering involved in high-end CPU processes.

    Timings could be 2-3 times worse for a foundry process than they are for an Intel or AMD fab at the same geometry.
     