Nvidia GT300 core: Speculation

Discussion in 'Architecture and Products' started by Shtal, Jul 20, 2008.

Thread Status:
Not open for further replies.
  1. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,418
    Likes Received:
    178
    Location:
    Chania
    Although I don't know anything yet, assuming the 384-bit bus for GF100 is true, who's to say we won't see something similar here too?
     
  2. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    I, on the other hand, believe that CPU-style caches don't scale. LRB's rendering pipeline is ample proof of that. We'll need scratchpad memories, just like Cell and the GPUs of today. However, the one thing I'd change relative to Cell is to allow vector scatter/gather from global memory as well, not just async DMAs.

    Cell programmers might be banging their heads against walls, stones, etc., but GPU programmers have gotten on pretty fine over the last 2.5 years with CUDA.
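A toy Python model of the two access styles contrasted above: Cell-style bulk DMA of a contiguous block into a local scratchpad versus a per-lane vector gather straight from global memory. The function names are hypothetical and plain lists merely stand in for memories; this sketches the distinction only, not any real hardware or SDK API.

```python
from typing import List

def dma_block(global_mem: List[float], start: int, length: int) -> List[float]:
    # Cell-style transfer (toy model): a contiguous block is copied into a
    # local scratchpad before any computation touches it.
    return global_mem[start:start + length]

def vector_gather(global_mem: List[float], indices: List[int]) -> List[float]:
    # Gather (toy model): each lane reads its own arbitrary address in global memory.
    return [global_mem[i] for i in indices]

global_mem = [float(i) for i in range(32)]
scratchpad = dma_block(global_mem, 8, 4)              # contiguous: elements 8..11
gathered = vector_gather(global_mem, [3, 17, 5, 30])  # scattered addresses
print(scratchpad, gathered)
```

The DMA path only works well when the data a kernel needs is contiguous and known in advance; the gather path is what lets irregular access patterns skip the explicit staging step.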
     
  3. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    If you believe that you haven't read enough CUDA based research papers :)
    edit: sooner or later NVIDIA & ATI will add proper coherent r/w caches to their architectures; it's just a matter of time.
     
  4. FUDie

    Regular

    Joined:
    Sep 25, 2002
    Messages:
    581
    Likes Received:
    34
    Yes, I did read what you wrote and I do understand it. And nothing you say contradicts the fact that Crysis scaled better with engine clock. It doesn't matter if the memory wasn't scaling as well due to errors: a 9% engine clock increase gave a 5% performance boost. If both engine and memory clocks were increased by 9%, the maximum gain we'd expect would be 9%. So a 9% memory clock increase could give at most 4% more performance.
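The bound in the post can be checked with a few lines of Python. This sketch assumes, as the post implicitly does, that the gains from engine and memory clock increases add roughly linearly for small changes; the function name is hypothetical.

```python
def max_memory_contribution(total_gain_both: float, engine_only_gain: float) -> float:
    """Upper bound on the memory clock's share of a combined clock increase.

    Assumes the performance gains from engine and memory clock increases
    add linearly, which is only a rough approximation for small changes.
    """
    return total_gain_both - engine_only_gain

# Figures from the post: +9% engine clock gave +5% performance, and raising
# both clocks by 9% can give at most +9%.
engine_gain = 0.05
ceiling = 0.09
print(f"memory clock share: at most {max_memory_contribution(ceiling, engine_gain):.0%}")
```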

    Engine clock is having a larger impact here. Note that engine speed regulates more than just ALU speed, it also controls ROP performance, vertex rates, etc.

    -FUDie
     
  5. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Maybe. But I'd like to see someone using r/w coherent caches on, say, an O(50)-core chip with high performance before I'm convinced otherwise.

    I am in the software-managed caches camp for now. r/w coherent caches hurt more than they help in the O(50)-core regime, as your compute increases as O(p) but your communication increases as O(p^2).
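The scaling argument above can be made concrete in Python. This sketch takes the post's model at face value and adds one labeled assumption: that in the worst case every core may need to exchange coherence traffic with every other core, so the number of communicating pairs is p(p-1)/2.

```python
def compute_units(p: int) -> int:
    # Model from the post: aggregate compute grows linearly with core count.
    return p

def pairwise_links(p: int) -> int:
    # Assumed worst case: every core may talk to every other core,
    # giving p*(p-1)/2 pairs, i.e. O(p^2).
    return p * (p - 1) // 2

for p in (4, 16, 50, 100):
    ratio = pairwise_links(p) / compute_units(p)
    print(f"p={p:3d}  compute={compute_units(p):3d}  "
          f"pairs={pairwise_links(p):5d}  pairs/compute={ratio:.1f}")
```

The ratio of potential communication to compute itself grows linearly with p, which is the core of the worry about O(50)-core chips.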
     
  6. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    With naive/simple hw implementations.
     
  7. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Maybe it is possible to reduce the O(p^2) to something lower, but I am still waiting for something that uses r/w coherent caches on an O(50)-core chip with high performance.
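One standard way to get below O(p^2) is directory-based coherence, which presumably falls outside the "naive/simple hw implementations" mentioned above. The following is a deliberately simplified message-count model, not a description of any real chip: it assumes broadcast snooping delivers every miss to all other caches, while a directory contacts only a home node plus the caches actually sharing the line, with the average sharer count treated as a hypothetical constant.

```python
def snoop_messages(p: int, misses_per_core: int) -> int:
    # Broadcast snooping (simplified): each miss is delivered to the other
    # p-1 caches, so total traffic grows as O(p^2).
    return p * misses_per_core * (p - 1)

def directory_messages(p: int, misses_per_core: int, avg_sharers: int) -> int:
    # Directory (simplified): one request to the home directory plus one
    # invalidation per actual sharer; with avg_sharers bounded this is O(p).
    return p * misses_per_core * (1 + avg_sharers)

for p in (8, 50, 200):
    print(p, snoop_messages(p, 1000), directory_messages(p, 1000, avg_sharers=2))
```

The assumption doing all the work is that few caches share any given line; workloads with heavy sharing erode the directory's advantage.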
     
  8. Rys

    Rys PowerVR
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,156
    Likes Received:
    1,433
    Location:
    Beyond3D HQ
    Those for HD 5870 are done, and were done before I started work on GF100 (thanks Alex!). We'll publish on it soon.
     
  9. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    14,853
    Likes Received:
    2,271
    GF100? Where did this come from? I know about G300, but GF100???

    edit: And GT212, what the bloody hell is that?
     
  10. Dr Evil

    Dr Evil Anas platyrhynchos
    Legend Veteran

    Joined:
    Jul 9, 2004
    Messages:
    5,767
    Likes Received:
    775
    Location:
    Finland
    Go back to post no. 2548 and read forward.
     
  11. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    BSN
     
  12. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,428
    Likes Received:
    426
    Location:
    New York
    :roll:
     
  13. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,766
    Likes Received:
    470
    Is he just being disingenuous here, or does he still not get that it will generally only correct transfer errors?
     
  14. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,418
    Likes Received:
    178
    Location:
    Chania
    GT212 was IMHO a 40nm/D3D10.1 project, which would have been a pretty dumb release considering that it also had a 384-bit bus and 32 SPs/cluster. It wouldn't have come close to GF100, though; more likely it would have matched a future performance iteration of it. I'd say that if they had any common sense, when they cancelled that project they moved its human resources into a GF10x performance GPU project.

    Since you're asking questions, I hope some can now understand the reason for the intentionally false information in supposed roadmaps. They just "named" the D12U something like GTX280 1.5GB.
     
  15. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Isn't there supposed to be 32 KB of shared memory per block in DX11?
     
  16. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,319
    Likes Received:
    23
    Location:
    msk.ru/spb.ru
    48>32?
    Ah, I see, it's Theo again.
    He's talking about L1 cache there. Considering there is 1 MB of memory total, 16 KB of L1 per SM and 16 SMs (512/32 = 16), how do you get to 1 MB from 16x16 KB?
     
  17. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,428
    Likes Received:
    426
    Location:
    New York
    Looks like that's exactly what they're trying to do. Strange that there's no mention of any graphics specific bits so far. Not saying there aren't any but the focus seems to have veered sharply away from graphics.

    That's true, but the same could be said for G71->G80 which was an even bigger change. Though they are trying to do more stuff now which could have put a strain on resources.

    It's probably safe to assume that if they're serious about computing, performance of atomics would have been high on their to-do list. Side question: are the existing caches on GPUs generally useful for non-texture data (not referring to the specialized caches like PTVC)?
     
  18. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,428
    Likes Received:
    426
    Location:
    New York
    Heh, where did you see 48? Theo didn't mention it :razz:

    Ah, I see what you did thar! 1024/16-16=48 :)
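Spelling out the arithmetic from the last few posts in Python. All inputs are the rumoured figures quoted in the thread (1 MB total, 512 SPs at 32 per SM, 16 KB of L1 per SM), and the 16 KB L1 figure is disputed in the next post, so treat this as a check of the arithmetic only.

```python
TOTAL_KB = 1024          # "1 MB of memory total" (rumoured figure)
SM_COUNT = 512 // 32     # 512 SPs at 32 per SM -> 16 SMs
L1_PER_SM_KB = 16        # rumoured L1 size per SM, disputed in the thread

per_sm_kb = TOTAL_KB // SM_COUNT                 # memory per SM if split evenly
l1_total_kb = SM_COUNT * L1_PER_SM_KB            # 16 x 16 KB, well short of 1 MB
remaining_per_sm_kb = per_sm_kb - L1_PER_SM_KB   # the "1024/16-16" figure

print(per_sm_kb, l1_total_kb, remaining_per_sm_kb)
```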
     
  19. Rys

    Rys PowerVR
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,156
    Likes Received:
    1,433
    Location:
    Beyond3D HQ
    There isn't 16KB of L1 per SM.
     
  20. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,319
    Likes Received:
    23
    Location:
    msk.ru/spb.ru
    There might be :-)
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.