Nvidia GT300 core: Speculation

Discussion in 'Architecture and Products' started by Shtal, Jul 20, 2008.

Thread Status:
Not open for further replies.
  1. Richard

    Richard Mord's imaginary friend
    Veteran

    Joined:
    Jan 22, 2004
    Messages:
    3,508
    Likes Received:
    40
    Location:
    PT, EU
One would hope that by then we wouldn't need POST screens anymore. Who am I kidding, we're still going to be posting in x86 mode regardless... the future will bring x86 bootstrap hardware on the mobo using CMOS for the working set; mark my words. For an industry so quick to change, we are an awful crotchety bunch.
     
  2. GrapeApe

    Newcomer

    Joined:
    Apr 3, 2004
    Messages:
    57
    Likes Received:
    2
    Location:
    Calgary, Canada
With nV being so die-space limited again, focusing heavily on the Tesla family in design, and trying to pack as much compute power onto the die as possible under the current fab process, I'm guessing the return of the NVIO is a safe assumption, no?

With that, what would the limitations be towards putting more than one traditional NVIO on the PCB to allow for greater multiple-monitor configurations (more as a rarer 'we can do it too' configuration than as a general design)? With the DRAM and ROP/RBE partitions being an odd number, as inferred from the blurry diagram, I'm assuming a six-cluster would be easier to feed to two external NVIOs than 3 distinct groups of even numbers.

    It would be another way to address a PR checkbox, in an era of the return of the checkbox (3DVision, Eyefinity, PhysX etc), and if possible would be simpler than an NVIO near-term redesign.

    I'm just not sure of the restriction on the NVIO as there's not too much on the underlying design, just the base components included (TMDS, RAMDACs, etc).

I always thought the NVIO was a near-term cop-out, but it would be essential if you wanted to go to a multi-die, MCM-style future design: it avoids duplication of resources and maximizes the transistor budget for this and for the idea of multiple offspring designs (like Tesla).

I know there are 2 NVIOs on the GTX295, but that's primarily due to SLi considerations when communicating with the bridge.

Anywhoo, just curious if anyone knows for sure whether dual NVIOs per chip are possible, or if that's limited by the memory interface or by RBE/ROP restrictions by design?
     
  3. jaredpace

    Newcomer

    Joined:
    Sep 28, 2009
    Messages:
    157
    Likes Received:
    0
    What is this?

[image]
     
  4. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,319
    Likes Received:
    23
    Location:
    msk.ru/spb.ru
AFAIK even the first version of NVIO allows four simultaneous outputs.
And NVIO has nothing to do with being die-size limited.
     
  5. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    At best 10-20% better performance than HD4890 in games despite having more bandwidth and being dramatically larger. I don't see any overstatement there.

    Maybe they did but AMD blanked them :sad:

    It wasn't a reference to the performance of RV740. It was a reference to the ability to refresh on a new node and improve all performance-per metrics significantly.

    Why? They're direct competitors (until Cedar arrives). If it was higher performance and/or lower-power we'd say "that's the benefit of 40nm". Instead we're just scratching our heads.

    It'll need to be quite a turnaround. Remember NVidia was boasting about expecting to be first with 40nm chips.

    When something as "simple" as GT218 is delayed and working badly it's not particularly surprising that NVidia's not ready for W7 launch with a 40nm D3D11 GPU.

    Jawed
     
  6. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    G92(b)
     
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Ooh, very interesting, thanks. Can't find anything about those online :sad:

    Is there something similar for use in DS to help in obtaining attributes at the newly generated points?

    Jawed
     
  8. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    1,319
    Likes Received:
    23
    Location:
    msk.ru/spb.ru
I fail to see any correlation between GT218 and DX11.
And it was late because of TSMC, not NVIDIA. Which raises the question of who's to blame for its power characteristics as well.
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Another thought is merely that the L2 system can query the RBE-owned render target structures and either decode the RBEs' compression tag tables or request decompression semantics for the data it wants to fetch from memory. So the on-chip linkage might be quite simple and L2 is simply doing most of the work, rather than having RBEs fetching the data and using the render target caches.

    Sure this isn't just the regular L1 cache that's used for textures? I don't trust Anandtech.

    Jawed
     
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Even Larrabee has RCP and RSQRT intrinsics :grin: I wonder what the throughput for these is. I guess that's the cost of doing graphics, rather than just general compute. The EXP2 and LOG2 functions are useful too - though base-2 stuff is pretty easy I dare say (partly re-using FTOI/ITOF I guess?).

    Jawed
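As a rough illustration of why base-2 transcendentals are cheap (this is just a sketch of the standard IEEE-754 trick, not a claim about NVidia's or Larrabee's actual implementation): for integer n, 2^n can be built simply by writing n + 127 into the single-precision exponent field, which is ITOF-style bit shuffling rather than real math.

```python
import struct

def exp2_int(n: int) -> float:
    """Build 2**n for integer n directly from IEEE-754 single-precision
    bits: exponent field holds n + 127, sign and mantissa are zero."""
    assert -126 <= n <= 127  # stay within the normalized range
    bits = (n + 127) << 23
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(exp2_int(10))   # 1024.0
print(exp2_int(-3))   # 0.125
```

The fractional part of the exponent still needs a small polynomial or table lookup in a real EXP2 unit, but the integer part really is this cheap.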
     
  11. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    I was looking at it from the point of view of the throughput for the ALUs, that 1 operand per clock is available per MAD: 30 SIMDs * 8 ALUs * 4 bytes * 1476MHz (GTX285) = 1.417TB/s.

    Jawed
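For reference, that arithmetic checks out (using the GTX 285 figures quoted above):

```python
# Operand bandwidth needed to feed one 4-byte operand per MAD per clock
# on a GTX 285: 30 SIMDs, 8 MAD ALUs each, 1476 MHz hot clock.
simds = 30
alus_per_simd = 8
bytes_per_operand = 4
hot_clock_hz = 1476e6

bandwidth = simds * alus_per_simd * bytes_per_operand * hot_clock_hz
print(f"{bandwidth / 1e12:.3f} TB/s")  # 1.417 TB/s
```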
     
  12. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    I don't know. It seemed like an odd thing to just make up out of thin air.
     
  13. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,428
    Likes Received:
    426
    Location:
    New York
Oh, ok, though I'm not sure if the register file and/or shared memory run at the hot clock. For one thing, results from the pipeline are written 16 at a time, which implies some sort of buffering.
     
  14. Arty

    Arty KEPLER
    Veteran

    Joined:
    Jun 16, 2005
    Messages:
    1,906
    Likes Received:
    55
That's hardly an excuse; AMD didn't suffer as much, so it comes down to NV's design.

    Thanks for the webcast time, appreciate it. :)
     
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    See figure 4:

    http://www.ece.ubc.ca/~aamodt/papers/gpgpusim.ispass09.pdf

    Sure, it's not comprehensive, but SFU isn't getting much use there.

    Jawed
     
  16. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    They both need a 40nm process and it's "safe" for IHVs to build an "easy" chip on a new process before attempting a behemoth.

    You think NVidia was entirely blameless?

    Jawed
     
  17. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
That doesn't prove nAo wrong. I did an analysis of these games with RV770, and it has even less dependence on BW for Crysis. Crysis is a bad game to evaluate this with, too, as the timedemos/walkthroughs that most reviewers use definitely have some parts that are CPU limited.
     
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    That's due to banking. RF and SM are both twice as wide as the MAD SIMD.

    Jawed
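The banking point is just a width-times-clock equivalence: a bank twice as wide, running at half the ALU clock, sustains the same operands per second. A minimal sketch (the frequencies here are illustrative assumptions, not confirmed specs):

```python
# A register file twice as wide as the MAD SIMD, clocked at half the
# hot clock, delivers the same operand throughput.
alu_lanes, hot_clock_hz = 8, 1_296_000_000   # 8-wide SIMD at hot clock
rf_lanes, rf_clock_hz = 16, 648_000_000      # 16-wide bank at half clock

assert alu_lanes * hot_clock_hz == rf_lanes * rf_clock_hz
print("operands/s per SIMD:", alu_lanes * hot_clock_hz)
```

This is also why 16-wide writes don't by themselves imply the RF runs at the hot clock.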
     
  19. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,428
    Likes Received:
    426
    Location:
    New York
Yeah, I know, but I took the RF and SM clocks to be 600MHz (for the GTX280). Not sure of the right way to calculate it.
     
  20. FUDie

    Regular

    Joined:
    Sep 25, 2002
    Messages:
    581
    Likes Received:
    34
    If you gain 8% from a 9% increase in engine and memory clocks, how can you claim it's CPU limited?

    -FUDie
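FUDie's point can be made quantitative: if frame rate scales at nearly the same rate as the clocks, the workload is tracking the GPU, not the CPU. A quick sketch:

```python
# If clocks rise 9% and performance rises 8%, scaling efficiency is
# roughly 8/9 ~= 0.89: performance follows GPU clocks closely, which
# argues against a CPU limit for that scene.
clock_gain = 0.09
perf_gain = 0.08
efficiency = perf_gain / clock_gain
print(f"scaling efficiency: {efficiency:.2f}")
```

A genuinely CPU-limited scene would show efficiency near zero, since faster GPU clocks couldn't raise the frame rate.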
     
