AMD: R8xx Speculation

Discussion in 'Architecture and Products' started by Shtal, Jul 19, 2008.


How soon will Nvidia respond with GT300 to upcoming ATI-RV870 lineup GPUs

Poll closed Oct 14, 2009.
  1. Within 1 or 2 weeks

    1 vote(s)
    0.6%
  2. Within a month

    5 vote(s)
    3.2%
  3. Within couple months

    28 vote(s)
    18.1%
  4. Very late this year

    52 vote(s)
    33.5%
  5. Not until next year

    69 vote(s)
    44.5%
  1. ECH

    ECH
    Regular

    Joined:
    May 24, 2007
    Messages:
    692
    Likes Received:
    30
    Had you properly understood my question instead of replying to his and attributing it to my post(s), you would clearly see that he is only asking what I am asking, i.e.: is there any truth to the rumors? This is why you don't answer a question with a question. :grin:
    Therefore, if you don't know, then you don't know. Keep in mind that such inquiries are common in a speculation thread such as this.
     
    #4361 ECH, Oct 16, 2009
    Last edited by a moderator: Oct 16, 2009
  2. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,112
    Location:
    New York
    Does anyone know what the L1 and LDS bandwidth is on Cypress? The slide deck says 960 dwords/cycle but I'm assuming that's combined L1+LDS.

    Also, is it still true that all operands must first be fetched from the LDS into the register file before being made available to the shader core? Just wondering how the core is fed such a huge number of operands per cycle.
     
  3. neliz

    neliz GIGABYTE Man
    Veteran

    Joined:
    Mar 30, 2005
    Messages:
    4,904
    Likes Received:
    23
    Location:
    In the know
    If you want numbers... well, it's 50-50 right now.

    And regarding "FUD": it's actually anti-Fermi FUD. This card would be very close to projected Fermi performance while costing less.
     
  4. seahawk

    Regular

    Joined:
    May 18, 2004
    Messages:
    511
    Likes Received:
    141
    As a pure Fermi counter it will only come if

    a) Fermi is really faster than the 5870, and
    b) Fermi is not so much faster than the 5870 that a 5890 becomes pointless,

    and once ATI knows the final clock speed of Fermi.

    I personally think they will bring out a 5890 using selected chips when Fermi goes retail.
     
  5. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    19,418
    Likes Received:
    10,311
    Well, in that case the short answer would be:

    Anyone that knows whether that is true or not wouldn't be able to say it's true if it is true.

    Everyone else doesn't know if it's true or not. :) But they may have or claim to have a source that knows.

    In which case, we're back to square one, whether you believe the rumor is true or not.

    Regards,
    SB
     
  6. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,535
    Likes Received:
    144
    And for the chip overall. But if they've given you that, they've also given you LDS fetch rate, because you know how many 32-bit fetches (dword) can be done from L1, so the difference is LDS. Unless that slide is hyper creative.
     
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Which would seem to suggest that LDS fetch rate is 640 dwords per clock, which is twice L1 fetch rate - which is known to be 80 x 4 = 320 dwords per clock.

    Which would also seem to suggest that LDS in R800 is two 16KB blocks accessed in parallel, since this rate is double the per-16KB-LDS rate of R700.
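
    The subtraction behind that estimate can be written out as a quick check. This is only a sketch, assuming the slide's 960 dwords/clock really is the sum of L1 and LDS fetch rates and that each of Cypress's 80 texture units fetches 4 dwords per clock; the variable names are purely illustrative.

```python
# Assumption: the slide's 960 dwords/clock is L1 fetch + LDS fetch combined.
SLIDE_TOTAL_DWORDS_PER_CLK = 960

# L1 fetch rate quoted in the thread: 80 texture units x 4 dwords each.
TEXTURE_UNITS = 80
DWORDS_PER_UNIT_PER_CLK = 4
l1_fetch = TEXTURE_UNITS * DWORDS_PER_UNIT_PER_CLK

# Whatever is left over is attributed to the LDS.
lds_fetch = SLIDE_TOTAL_DWORDS_PER_CLK - l1_fetch

# Print both rates and the LDS:L1 ratio.
print(l1_fetch, lds_fetch, lds_fetch / l1_fetch)
```

    If the slide counts something else in the total (writes, for example), the LDS figure derived this way would be an overestimate.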

    Jawed
     
  8. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Was the bandwidth for L1 and LDS additive in RV770?
     
  9. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Do writes not also add to the L1+LDS bandwidth?
     
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Ooh, good point. Forgot about that. Erm, I doubt it.

    I've always assumed they're not additive. But I suppose it's conceivable that they are.

    LDS in R700 assembly appears as a TEX clause. I've taken this to imply that texture fetches and LDS fetches (or stores) mutually-exclude each other.

    Hmm...

    Jawed
     
  11. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    Perhaps - there's the concept of data exchange between threads that can run within a single LDS clause. So an LDS read and an LDS write could be going simultaneously.

    Jawed
     
  12. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    I haven't turned up anything stating that TEX and LDS accesses can run simultaneously, though nothing stating they can't, either.

    If they can in RV770, and AMD obfuscates its bandwidth math a little by counting write traffic to the LDS in the total, Cypress would have double the bandwidth of its predecessor by virtue of its doubled SIMD count alone.
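
    That hypothesis can be sketched numerically. The SIMD counts (10 for RV770, 20 for Cypress) are known; everything else here is an assumption for illustration: that the 960 dwords/clock slide figure covers the whole chip, and that each SIMD's share splits evenly into L1 fetch, LDS read and LDS write traffic.

```python
# Known unit counts: RV770 has 10 SIMDs, Cypress has 20.
SIMDS = {"RV770": 10, "Cypress": 20}

# Assumption: the slide's 960 dwords/clock covers all 20 Cypress SIMDs.
CYPRESS_TOTAL = 960

# Per-SIMD budget implied by the slide.
per_simd = CYPRESS_TOTAL // SIMDS["Cypress"]

# Hypothetical split of that budget, if TEX and LDS traffic run concurrently
# and LDS writes are counted: 4 texture units x 4 dwords of L1 fetch,
# plus 16 dwords of LDS reads, plus 16 dwords of LDS writes.
split = {"l1_fetch": 4 * 4, "lds_read": 16, "lds_write": 16}
assert sum(split.values()) == per_simd

# With an unchanged per-SIMD budget, the chip total scales with SIMD count.
totals = {chip: count * per_simd for chip, count in SIMDS.items()}
for chip, total in totals.items():
    print(chip, total)
```

    Under these assumptions the per-SIMD figure is identical on both chips, so the doubling would come purely from the SIMD count rather than from a wider LDS.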
     
  13. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    That sounds like the simplest answer.

    Jawed
     
  14. Spyhawk

    Newcomer

    Joined:
    Oct 31, 2007
    Messages:
    76
    Likes Received:
    1
    I've been reading several reviews that mention that ATI GPUs are "VLIW superscalar". Is this correct, even just a bit, or not at all and just marketing mumbo jumbo?
     
  15. bridgman

    Newcomer Subscriber

    Joined:
    Dec 1, 2007
    Messages:
    62
    Likes Received:
    123
    Location:
    Toronto-ish
    There's a good description in the B3D Cypress article:

    http://www.beyond3d.com/content/reviews/53/5

    ... and more detail in section 4 of the R700 Instruction Set Architecture doc at amd.com:

    http://developer.amd.com/gpu_assets/R700-Family_Instruction_Set_Architecture.pdf

    The shader core organizes the ALUs in sets of 5, and each ALU shader instruction (called an instruction group) includes up to 5 different opcodes (operation + inputs/outputs), one for each ALU. So... superscalar via very long instruction words.
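
    The idea of filling instruction groups at compile time can be sketched with a toy packer. This is not AMD's shader compiler: `Op`, `pack` and the register names are hypothetical, and the real hardware has further constraints (per-ALU capabilities, register port limits) that this sketch ignores. It only assumes that an op cannot share a group with an earlier op whose result it reads or overwrites.

```python
from dataclasses import dataclass

GROUP_WIDTH = 5  # five ALUs per set, so up to five ops per instruction group


@dataclass(frozen=True)
class Op:
    dst: str      # register written by this op
    srcs: tuple   # registers read by this op


def pack(ops):
    """Greedily pack ops, in program order, into instruction groups.

    An op joins the current group only if the group has a free slot and the
    op neither reads nor overwrites a register written earlier in that group.
    """
    groups, current, written = [], [], set()
    for op in ops:
        conflict = op.dst in written or any(s in written for s in op.srcs)
        if conflict or len(current) == GROUP_WIDTH:
            # Close the current group and start a new one.
            groups.append(current)
            current, written = [], set()
        current.append(op)
        written.add(op.dst)
    if current:
        groups.append(current)
    return groups


ops = [
    Op("r0", ("a", "b")),
    Op("r1", ("c", "d")),
    Op("r2", ("r0", "r1")),  # needs r0 and r1, so it starts a new group
    Op("r3", ("e", "f")),    # independent, shares the group with r2
]
group_sizes = [len(g) for g in pack(ops)]
print(group_sizes)
```

    Four ops end up in two groups here; a fully dependent chain would need one group per op, leaving most of the five slots empty.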
     
    #4375 bridgman, Oct 16, 2009
    Last edited by a moderator: Oct 16, 2009
  16. Mat3

    Newcomer

    Joined:
    Nov 15, 2005
    Messages:
    168
    Likes Received:
    10
    I'm guessing you've been reading the Hardforums board lately?

    I say the "superscalar" part is mostly marketing.
     
    #4376 Mat3, Oct 17, 2009
    Last edited by a moderator: Oct 17, 2009
  17. bridgman

    Newcomer Subscriber

    Joined:
    Dec 1, 2007
    Messages:
    62
    Likes Received:
    123
    Location:
    Toronto-ish
    I'm not sure I understand why that thread got so hot. [EDIT - now I do; it was like that before the superscalar discussion started :D].

    I don't think there is any debate about how the ATI chips work, just a lack of agreement on the definition of "superscalar". Some definitions of superscalar include VLIW while others do not.

    I found references to "static superscalar" implementations (aka VLIW) where opportunities for superscalar execution are determined at compile time, and "dynamic superscalar" implementations where the hardware analyzes dependencies and identifies opportunities for superscalar execution at run time.
     
    #4377 bridgman, Oct 17, 2009
    Last edited by a moderator: Oct 17, 2009
  18. hoom

    Veteran

    Joined:
    Sep 23, 2003
    Messages:
    3,261
    Likes Received:
    813
    Could it be argued that the splitting of the load between SIMDs is Superscalar?
    Or is that way off base? :oops:
     
  19. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    To me, a superscalar architecture is just one type of ILP extraction. VLIW architectures (AMD GPUs, Itanium) extract this at compile time. Dynamic superscalar designs (Pentium and all modern CPUs) extract it at run time.
     
  20. Broken Hope

    Regular

    Joined:
    Jul 13, 2004
    Messages:
    483
    Likes Received:
    1
    Location:
    England
    ATI seems to have uploaded the OpenCL beta drivers again, but without the 5900 driver IDs in the INF files. I'm guessing they weren't supposed to be letting us know about the 5900 series yet.
     