Nvidia GT300 core: Speculation

Discussion in 'Architecture and Products' started by Shtal, Jul 20, 2008.

Thread Status:
Not open for further replies.
  1. dnavas

    Regular

    Joined:
    Apr 12, 2004
    Messages:
    375
    Likes Received:
    7
    That is interesting -- and I wonder which instructions dominate in the SFU as well. Might not be DIV at all. And to nao's point, not a lot of FMAs either. It would be instructive, I would think, to also know the breakdown of MUL vs. ADD. Clearly, of the programs they ran, the DIV:MUL ratio is less than the 1/2 I estimated (from FMA alone), but now I'm curious :)
     
  2. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    Curiously, the analysis that we did for RV790 clock settings said that Crysis (or Crysis Warhead - forget which) was one of the few titles that gained with more bandwidth on this arch.

    Generally speaking though, internal testing on Cypress has shown much the same thing as the FS overclocking results: it benefits more from engine speed than I, at least, expected.
     
  3. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    I said parts are CPU (or PCI-e) limited, and your numbers are wrong, too. He overclocks the GPU by 9.4% and mem by 12.5%. He gets 7.2-8.5% gain, depending on resolution and settings. 100% GPU limited would have given over 10%.

    Anyway, that's all beside the point. nAo is right in saying RV870 is more BW limited than RV770. Look at this graph:
    http://www.firingsquad.com/hardware/ati_radeon_4850_4870_performance/images/lpoc.gif
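
The overclocking arithmetic above can be sketched with a simple Amdahl-style model. This is a rough illustration, not the poster's own calculation: it assumes the frame time splits into a part that scales perfectly with one clock and a part that does not scale at all (CPU, PCI-e, or anything else). The function name and the two bounding cases are assumptions made for the example.

```python
def implied_unscaled_fraction(observed_gain, clock_gain):
    """Fraction of the original frame time that did not speed up.

    Model (an assumption): new_time = unscaled + (1 - unscaled) / (1 + clock_gain),
    and new_time = 1 / (1 + observed_gain). Solve for `unscaled`.
    """
    t_observed = 1.0 / (1.0 + observed_gain)
    t_ideal = 1.0 / (1.0 + clock_gain)
    return (t_observed - t_ideal) / (1.0 - t_ideal)


# Figures quoted in the post: 9.4% core overclock, 12.5% memory overclock,
# 7.2% to 8.5% measured gain.
core_oc, mem_oc = 0.094, 0.125
gain_lo, gain_hi = 0.072, 0.085

# Most favourable reading: best measured gain, everything limited by the core clock.
best_case = implied_unscaled_fraction(gain_hi, core_oc)
# Least favourable reading: worst measured gain, everything limited by memory clock.
worst_case = implied_unscaled_fraction(gain_lo, mem_oc)

print(f"unscaled share of frame time: {best_case:.1%} to {worst_case:.1%}")
```

Under this model a workload limited only by the GPU and its memory would gain somewhere between 9.4% and 12.5%, so any measured gain below 9.4% implies some share of the frame that neither overclock touched.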
     
  4. SimBy

    Regular Newcomer

    Joined:
    Jun 21, 2008
    Messages:
    502
    Likes Received:
    135
    What do you mean by AMD doesn't have DirectCompute and OpenCL?!

    Not only does HD 5000 series support both, it's actually faster by a fair amount compared to nV GTX series.

    http://www.anandtech.com/video/showdoc.aspx?i=3643&p=8

    That's nV's Ocean demo for DX Compute.
     
  5. CouldntResist

    Regular

    Joined:
    Aug 16, 2004
    Messages:
    264
    Likes Received:
    6
    This reminds me of a certain paper on compiler optimisation technology. The idea was to use a novel technique to implement "virtual" (in the C++ sense) method invocations in generated machine code.

    The typical way of doing this is to use virtual method tables and indirect branches. In the paper, they didn't generate any indirect branches. Instead, each call site got a tiny inline binary search tree traversal to find the right jump target among a set of precalculated candidates, all done with conditional branches. The goal was to make better use of the CPU's branch prediction resources, which were allegedly underutilised by the typical approach.

    This optimisation technique relied on whole-program analysis (as opposed to dumb linking of separately compiled fragments, typical in the C/C++ world) to make these search trees really tiny, or even to eliminate the need for a search altogether (in as many as 90% of virtual method invocations in the tested programs).

    If this technique were used, I think you could run fully OO code on a GPU, even now (ignoring SIMD, memory organisation etc. of course).
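
A minimal sketch of the dispatch shape described above, written in Python purely for illustration. The type IDs, shape records and function names are hypothetical and not taken from the paper. Python calls are dynamic anyway, so this only shows the control-flow structure a compiler would emit, not any performance benefit.

```python
import math

# Hypothetical type IDs, as a whole-program analysis might assign them.
CIRCLE, SQUARE, TRIANGLE, RECT = 0, 1, 2, 3


def circle_area(s):
    return math.pi * s["r"] ** 2


def square_area(s):
    return s["a"] ** 2


def triangle_area(s):
    return 0.5 * s["b"] * s["h"]


def rect_area(s):
    return s["w"] * s["h"]


# Conventional approach: a table of targets indexed by type ID.
VTABLE = [circle_area, square_area, triangle_area, rect_area]


def area_vtable(s):
    """Table lookup followed by an indirect call."""
    return VTABLE[s["type_id"]](s)


def area_branch_tree(s):
    """Binary search over type IDs using only conditional branches.

    Every call target is named directly at the call site, so generated
    machine code would need no indirect branch.
    """
    t = s["type_id"]
    if t < TRIANGLE:
        if t < SQUARE:
            return circle_area(s)
        return square_area(s)
    if t < RECT:
        return triangle_area(s)
    return rect_area(s)


shapes = [
    {"type_id": CIRCLE, "r": 1.0},
    {"type_id": SQUARE, "a": 2.0},
    {"type_id": TRIANGLE, "b": 3.0, "h": 4.0},
    {"type_id": RECT, "w": 2.0, "h": 5.0},
]
for shape in shapes:
    print(area_vtable(shape), area_branch_tree(shape))
```

With four candidates the tree is two comparisons deep. If analysis proves that only one type can reach a call site, the tree collapses to a single direct call, which is the "no search at all" case mentioned in the post.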
     
  6. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    I think most people expect BW to make more of a difference than it actually does. Take a look at my findings here:
    http://forum.beyond3d.com/showthread.php?t=48761

    In most games, the 4850 is BW limited for less than 30% of the frame time.
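
To put a rough number on what "less than 30% of the frame time" implies, here is a small sketch. It assumes, as a simplification not stated in the post, that the bandwidth-limited share scales perfectly with memory bandwidth and the rest does not change. The bandwidth increases in the loop are illustrative values only.

```python
def gain_from_bandwidth(bw_limited_fraction, bw_increase):
    """Frame-rate gain when only the bandwidth-limited share speeds up."""
    new_time = (1.0 - bw_limited_fraction) + bw_limited_fraction / (1.0 + bw_increase)
    return 1.0 / new_time - 1.0


fraction = 0.30  # upper figure quoted in the post

for bw_increase in (0.10, 0.25, 0.50):  # illustrative values
    gain = gain_from_bandwidth(fraction, bw_increase)
    print(f"+{bw_increase:.0%} bandwidth -> +{gain:.1%} frame rate")

# Ceiling with unlimited bandwidth: the bandwidth-limited share drops to zero.
ceiling = 1.0 / (1.0 - fraction) - 1.0
print(f"ceiling: +{ceiling:.1%}")
```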
     
  7. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,319
    Likes Received:
    23
    Location:
    msk.ru/spb.ru
    So RV740 availability 6 months after it was announced means that AMD didn't suffer as much, eh? And its price parity with the 4850 surely means the same thing?

    It's not "safe", it's easier, less risky and it's done so that later you won't make the same mistakes with a bigger chip.

    I don't have enough info for any conclusions right now.
    And it puzzles me when I see someone who apparently does.
    It puzzles me even more to read about G300 delays while the initially planned launch timeframe hasn't even passed yet.
    All we have right now is a delay of GT21x series for which TSMC is the one to blame. That's all. How it'll end up with GT21x power/price/performance and GF100 we'll eventually see.
     
  8. Scali

    Regular

    Joined:
    Nov 19, 2003
    Messages:
    2,127
    Likes Received:
    0
    DirectCompute works, but only on the HD5800-series.
    OpenCL isn't supported yet.
     
  9. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    By my testing, DirectCompute doesn't work at all on any NVIDIA parts yet (maybe it does on Win7?)... not even DirectCompute 4. AMD is clearly a step ahead here with DirectCompute 5 working even on Vista.
     
  10. apoppin

    Regular

    Joined:
    Feb 12, 2006
    Messages:
    255
    Likes Received:
    0
    Location:
    Hi Desert SoCal
    the Jensen Keynote will be live on Nvidia.com at 1 PM; i am here at the GTC now
    - they say about 1/3rd of it requires 3D glasses .. so you know it will be a lot of 3D

    what i am interested in is the PRESS conference after the keynote; it is at 2:45 PM
    - i expect a lot more to be revealed then

    The Fairmont Hotel, San Jose is such a cool place for a technology conference; Nvidia has an entire floor for it .. and (best of all, good) food is free for the press
    :razz:
     
  11. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Well it turns out that 512MB is too little for a lot of games - the assessment of bandwidth limitation needs to be done with a 1GB card.

    Jawed
     
  12. nexus_alpha

    Newcomer

    Joined:
    Nov 25, 2006
    Messages:
    44
    Likes Received:
    0
    Specs from Bright Side of News:

    3.0 billion transistors
    40nm TSMC
    384-bit memory interface
    512 shader cores [renamed into CUDA Cores]
    32 CUDA cores per Shader Cluster
    1MB L1 cache memory [divided into 16KB Cache - Shared Memory]
    768KB L2 unified cache memory
    Up to 6GB GDDR5 memory
    Half Speed IEEE 754 Double Precision

    By comparison, ATI RV870:

    20 SIMDs
    16 KB L1 cache per SIMD = 320 KB texture cache
    8 KB L1 cache per SIMD for computational work = 160 KB computational cache

    32 KB local data share L1 cache per SIMD = 640 KB local data share cache

    128 KB L2 cache per memory controller = 512 KB L2 cache

    L1 cache speed: 1 terabyte per second

    L2-L1 cache speed: 435 GB per second
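
The totals in the list above follow from per-unit sizes multiplied by unit counts. The sketch below just redoes that arithmetic using the figures as quoted. The memory controller count and the per-cluster L1 size are derived here from the quoted numbers and are not stated in the post, and the GT300 figures are rumoured specs.

```python
# RV870 figures as quoted in the post (sizes in KB).
rv870_simds = 20
rv870_per_simd_kb = {
    "texture L1": 16,
    "compute L1": 8,
    "local data share": 32,
}
rv870_totals_kb = {name: size * rv870_simds for name, size in rv870_per_simd_kb.items()}

# 512 KB of L2 at 128 KB per memory controller implies this many controllers.
rv870_l2_total_kb = 512
rv870_l2_per_controller_kb = 128
rv870_memory_controllers = rv870_l2_total_kb // rv870_l2_per_controller_kb

# Rumoured GT300 figures as quoted in the post.
gt300_cores = 512
gt300_cores_per_cluster = 32
gt300_clusters = gt300_cores // gt300_cores_per_cluster

# 1 MB of L1 spread evenly over the clusters (even split is an assumption).
gt300_l1_total_kb = 1024
gt300_l1_per_cluster_kb = gt300_l1_total_kb // gt300_clusters

print(rv870_totals_kb)
print("RV870 memory controllers (derived):", rv870_memory_controllers)
print("GT300 shader clusters (derived):", gt300_clusters)
print("GT300 L1 per cluster in KB (derived):", gt300_l1_per_cluster_kb)
```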
     
  13. Scali

    Regular

    Joined:
    Nov 19, 2003
    Messages:
    2,127
    Likes Received:
    0
    DirectCompute 4 works out-of-the-box on Win7 with 190-release drivers.
    It works on Vista as well, but by default it is disabled through a registry key.
    The release notes of the GPU Computing SDK tell you how to enable it (that's where Anandtech got the Ocean demo from).
    So looks like nVidia is ahead. They've had support on release drivers for a while, and I don't think we need to compare the installed base :)
     
    #2793 Scali, Sep 30, 2009
    Last edited by a moderator: Sep 30, 2009
  14. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,428
    Likes Received:
    425
    Location:
    New York
    How did they write/show the DirectCompute wave demo in that case?
     
  15. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    If the BSN specs are accurate, then the L2 size alone indicates that is one part of Larrabee's design that wasn't copied.
    A scheme similar to Larrabee's (in particular the tiling) would need the capacity such an L2 affords, and the L2 given isn't much bigger than that of Cypress.
     
  16. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Why on earth would they do that? The rest of "feature level 10" on DX11 interfaces seems to work fine... silly decision IMHO, but thanks for the pointer. I'll go look into that.

    Huh? You're arguing that ComputeShader 4 support on G80+ HW with a registry key setting somehow puts them "ahead" of ATI's full ComputeShader 5 implementation that works "out of the box" on their latest hardware? From a developer's point of view, you and I have different definitions of "ahead"...

    No point in arguing though, the key point is that I can write CS5 code right now on AMD parts, with no ETA on when I can do that on NVIDIA. This puts AMD as the obviously more useful piece of hardware at my disposal right now :)
     
  17. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,416
    Likes Received:
    178
    Location:
    Chania
  18. madyasiwi

    Newcomer

    Joined:
    Oct 7, 2008
    Messages:
    194
    Likes Received:
    32
  19. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,486
    Likes Received:
    397
    Location:
    Varna, Bulgaria
    From this Russian site, one curious line:

    Отсутствие аппаратного блока тесселяции, данный функционал будет реализован программно; -- There is no hardware tessellation unit; this functionality will be implemented in software.

    :roll:
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.