NVIDIA Maxwell Speculation Thread

Discussion in 'Architecture and Products' started by Arun, Feb 9, 2011.

  1. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
    Thanks, I fixed that! Of course the GM107 has 16 ROPs as well, but it benefits from the bigger rasterizer and from 4-5 SMMs able to deliver 16-20 4-byte pixels per clock. And you're correct that FP32 throughput is not limited by memory bandwidth; I just double-checked that.

    I'm using my own test. It shows the best case when it comes to data compression opportunities.
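
    The pixel-rate point above can be put as a quick calculation. This is only a rough sketch: the figure of 4 pixels per SMM per clock is inferred from the "4-5 SMM, 16-20 pixels" numbers in the post, and the function name is made up for illustration.

```python
def peak_pixels_per_clock(rop_count: int, smm_count: int, pixels_per_smm: int = 4) -> int:
    """Peak 4-byte pixel rate per clock: the lower of what the ROPs can
    accept and what the SMMs can export.

    pixels_per_smm=4 is an assumption inferred from the post
    (4-5 SMMs -> 16-20 pixels per clock), not a documented figure.
    """
    return min(rop_count, smm_count * pixels_per_smm)


if __name__ == "__main__":
    # GM107 has 16 ROPs; compare a 4-SMM and a 5-SMM configuration.
    for smm_count in (4, 5):
        rate = peak_pixels_per_clock(rop_count=16, smm_count=smm_count)
        print(f"{smm_count} SMMs: {rate} pixels/clock")
```

    Under that assumption, both the 4-SMM and the 5-SMM configuration can keep all 16 ROPs fed, so shader export should not be the limit.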
     
  2. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    OK. I still heavily disagree with the FP32 blend conclusion though :).
    The bandwidth needed for 4xfp32 would be 4 times that of 4xint8. OK, compression could change that, but the result is way, way lower. I believe Kepler/Maxwell can do FP32 blend only at 1/16 rate (but being able to use all resources for just one channel, hence 1/4 rate for single-channel FP32). This matches the actual numbers coming out of the test MUCH better than assuming it's bandwidth limited...
    I'm assuming no data locality, though (to me, the 3DMark result seems to indicate it could take advantage of a large (ROP) cache).
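
    A back-of-the-envelope version of this argument, as a sketch only. The 16 ROPs come from the thread; the 1.02 GHz core clock and 86.4 GB/s memory bandwidth are assumed GTX 750 Ti reference figures, and the model ignores compression and caching (no data locality), with one read plus one write per blended pixel.

```python
ROPS = 16                # from the thread: GM107 has 16 ROPs
CORE_CLOCK_GHZ = 1.02    # assumption: GTX 750 Ti base clock
MEM_BW_GBS = 86.4        # assumption: 128-bit bus at 5.4 Gbps


def blend_bandwidth_gbs(bytes_per_pixel: int, rate_fraction: float) -> float:
    """Memory traffic in GB/s needed to sustain blending at the given
    fraction of the full ROP rate, with no compression or cache hits."""
    gpixels_per_s = ROPS * CORE_CLOCK_GHZ * rate_fraction
    # Blending reads the destination pixel and writes the result back.
    return gpixels_per_s * bytes_per_pixel * 2


if __name__ == "__main__":
    cases = {
        "4xint8 at full rate": (4, 1.0),
        "4xfp32 at 1/16 rate": (16, 1 / 16),
        "1xfp32 at 1/4 rate": (4, 1 / 4),
    }
    for label, (bytes_per_pixel, rate) in cases.items():
        needed = blend_bandwidth_gbs(bytes_per_pixel, rate)
        verdict = "exceeds" if needed > MEM_BW_GBS else "fits within"
        print(f"{label}: {needed:.1f} GB/s ({verdict} {MEM_BW_GBS} GB/s)")
```

    With these assumed figures, full-rate int8 blending would need more bandwidth than the card has, while 4xfp32 at 1/16 rate needs well under half of it. That is consistent with the FP32 result being limited by the ROP rate rather than by memory bandwidth.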
     
  3. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
    Looks like I answered this second part at the same time as you were posting, hehe.

    I agree that everything points to the FP32 blending throughput being limited to 1/16. It's actually what I wrote previously.
     
  4. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    Ok I agree then :).
    GM107 definitely seems to make good use of its available resources. Now I realize the complexity (transistor count) is quite close to Bonaire (and performance is quite close too, though with 20% less memory bandwidth to boot), but all the raw numbers (peak FP rate, TMUs, rasterization) are nearly identical to Cape Verde instead.
     
  5. DSC

    DSC
    Banned

    Joined:
    Jul 12, 2003
    Messages:
    689
    Likes Received:
    3
    http://www.anandtech.com/bench/product/1037?vs=1130

    Is the GTX 750 Ti supposed to outperform the GTX 770 in Luxmark 2.0? That's impressive for a little GPU.

    P.S. Never mind, I just looked at pcgameshardware and the GTX 770 gets about double the score with the latest drivers. Still, the GM107 is close to it.
     
    #1265 DSC, Feb 27, 2014
    Last edited by a moderator: Feb 27, 2014
  6. LiXiangyang

    Newcomer

    Joined:
    Mar 4, 2013
    Messages:
    81
    Likes Received:
    47
    Looking at the Folding@home benchmark (which employs an embarrassingly parallel method), the computing efficiency of Maxwell has improved quite significantly.

    Faster than GF100 despite roughly comparable GFLOPS and half the memory bandwidth:
    http://www.anandtech.com/bench/product/1135?vs=1130

    Roughly 1/2 of Titan's performance:
    http://www.anandtech.com/bench/product/1060?vs=1130

    3/4 the computing power of the AMD 290X here:
    http://www.anandtech.com/bench/product/1056?vs=1130

    Very efficient arch; it would save people a lot of time on optimization.
     
  7. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,907
    Likes Received:
    1,607
    http://eu.evga.com/articles/00821/
     
  8. homerdog

    homerdog donator of the year
    Legend Veteran Subscriber

    Joined:
    Jul 25, 2008
    Messages:
    6,128
    Likes Received:
    903
    Location:
    still camping with a mauler
    Could you please post some sort of description or provide some context for that?
     
  9. DSC

    DSC
    Banned

    Joined:
    Jul 12, 2003
    Messages:
    689
    Likes Received:
    3
    Blender users are testing and benchmarking GM107. Early results seem to be close to the GTX 570 at much lower power usage. Still more testing to be done.
     
  10. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,797
    Likes Received:
    2,056
    Location:
    Germany
    So it seems that results from Luxmark and SLG with higher-complexity models are valid performance indicators if done properly with the same driver revision.

    edit:
    Ah yes, right. Register contention occurs within the scheduler's domain. So it should even be a bit worse now, with the slightly higher lifetime of each warp compared to Kepler ("The additional ALUs of Kepler could only lead to a (most of the time) marginally faster execution, that's all."), yes?
     
  11. DSC

    DSC
    Banned

    Joined:
    Jul 12, 2003
    Messages:
    689
    Likes Received:
    3
  12. iMacmatician

    Regular

    Joined:
    Jul 24, 2010
    Messages:
    771
    Likes Received:
    200
    What is the GTX 745? A desktop GM108?
     
  13. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    Device-ID 138x is GM107. Maybe a DDR3 GTX 750, which allows vendor-pleasing 4/8GiB versions. :lol:
     
  14. Blazkowicz

    Legend Veteran

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
    Maybe an OEM card with a lower TDP.
    A DDR3 card would really not deserve an 'X' at all :)
     
  15. DSC

    DSC
    Banned

    Joined:
    Jul 12, 2003
    Messages:
    689
    Likes Received:
    3
    http://forums.laptopvideo2go.com/topic/30757-hp-mobile-driver-33233/

    I believe Device-ID 134x is GM108.
     
  16. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    Well, it could be both. If it's DDR3 and they do it "right" at least (that is, on a 4 SMM card, lower the core clock a bit because it won't matter one bit anyway), TDP would trivially come down to something like 30-40W.
    I fully agree it wouldn't deserve a GTX designation, but I'm sceptical that would stop them...
     
  17. LiXiangyang

    Newcomer

    Joined:
    Mar 4, 2013
    Messages:
    81
    Likes Received:
    47
    The problem with Luxmark is that, like lots of other benchmarks, it only supports OpenCL, and NVIDIA's OpenCL support is very poor. So it's not a very good performance indicator for NVIDIA products and shouldn't be used as a benchmark for cross-platform comparison.

    Folding@home, by contrast, supports both OpenCL and CUDA routines, which is why I picked it as a performance indicator for cross-platform comparisons.
     
  18. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,166
    Likes Received:
    1,836
    Location:
    Finland
    It all depends on whether you want to test "theoretical performance" or real-life performance. If NVIDIA's OpenCL support is bad, it's bad, and reviews should point that out rather than testing only with software where the card performs well.
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.