Nvidia Pascal Speculation Thread

Discussion in 'Architecture and Products' started by DSC, Mar 25, 2014.

Thread Status:
Not open for further replies.
  1. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    Has this 1:3 ratio of actual TFLOPS to „DL TOPS" been confirmed by a second source? I'd consider 1:2 much more realistic.
    I also suspect it's not GP200 in there, but rather more mainstream GP204s (maybe even demoted to GP206) with 6 TFLOPS each: 3,072 ALUs at 1 GHz.
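    A quick sanity check of the two candidate ratios, using the 8 TFLOPS / 24 DL TOPS figures quoted elsewhere in this thread (a sketch of the arithmetic only, not a claim about how the hardware actually works):

```python
# Sanity check of the FP32-TFLOPS to "DL TOPS" ratio for Drive PX 2.
fp32_tflops = 8.0   # aggregate single-precision TFLOPS (announced)
dl_tops = 24.0      # claimed deep-learning TOPS (announced)

ratio = dl_tops / fp32_tflops
print(f"DL TOPS : FP32 TFLOPS = {ratio:.0f} : 1")  # 3 : 1 as announced

# A 1:2 ratio would follow naturally from double-rate FP16
# (two half-precision ops per FP32 lane per clock):
fp16_tops_if_double_rate = 2 * fp32_tflops
print(f"Double-rate FP16 alone would give {fp16_tops_if_double_rate:.0f} TOPS")
```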
     
  2. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    939
    Likes Received:
    35
    Location:
    LA, California
    I am not sure this says much about what a big Pascal will look like; the press release states that Drive PX 2 contains two Tegra SoCs and two discrete Pascal GPUs. So I'd guess roughly 4 SP TFLOPS per discrete GPU at around 100 W? The Tegras usually also have GPUs, so maybe they are contributing FLOPS to the peak numbers.
     
  3. dbz

    dbz
    Newcomer

    Joined:
    Mar 21, 2012
    Messages:
    98
    Likes Received:
    41
    It would seem almost certain that the GPUs are second- or third-tier parts, no? The power envelope, the GDDR5 interface, and what looks like a relatively modest GPU size (unless Jen-Hsun has paws the size of Manute Bol's) all seem to point to something less than "big" Pascal. Here's a quick screencap from the presentation:

    [image: screencap from the presentation]
     
  4. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    711
    Likes Received:
    282
    To clarify: of course this PX 2 has no 'big' Pascal, since it uses two Tegra Pascals at the front and two discrete Pascal GPUs at the back of the PCB. The latter are probably half of big Pascal, i.e. something like GP204.
    Given the aggregate performance of 8 TFLOPS and 24 DL TOPS across the four PX 2 GPUs, one can infer what the first 'big' Pascal might look like.
     
  5. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    One other performance boost will come from better bandwidth; even with their compression, bandwidth seems to hobble the current Maxwell cards above 1080p.
    This could make a big difference for the cards utilising HBM2, and somewhat for those that may get GDDR5X.
    I think one of the reasons Maxwell went with a narrower, lower-bandwidth memory interface was power consumption/efficiency.

    Cheers
     
  6. Frenetic Pony

    Regular Newcomer

    Joined:
    Nov 12, 2011
    Messages:
    332
    Likes Received:
    87
  7. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,998
    Likes Received:
    4,571
    Did they show it working or was this a mockup with woodscrews?


    Assuming ~1 TFLOPS FP32 for each Tegra (expecting twice the GPU performance of Tegra X1, following the cadence between previous iterations), that's 2 TFLOPS for both Tegras combined and 6 TFLOPS for the discrete GPUs, i.e. 3 TFLOPS per GPU.

    Thinking in terms of mobile graphics solutions, since those are MXM cards, 3 TFLOPS is close to a GeForce GTX 980M, or twice a GTX 960M using a GM107.
    I'm guessing those are two Pascal GP107 cards, if the Pascal architecture turns out to be more of a Maxwell 3 with FinFET for twice the transistors and execution resources, as has been suggested.
    If GM107 doubled the performance of GK107, it makes sense that GP107 makes that transition again.

    On the desktop front, if they end up using, say, 20% higher clocks, then we are indeed looking at the compute performance of a GTX 970, though probably with significantly fewer fillrate resources (only 32 ROPs for a 128-bit bus).

    Either way, these cards are probably far from AMD's Polaris Mini that they showed up and running. Different performance segments, at least.
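    The estimate above can be written out explicitly; note the ~1 TFLOPS-per-Tegra figure and the 20% desktop clock uplift are the poster's assumptions, not announced numbers:

```python
# Splitting Drive PX 2's announced 8 FP32 TFLOPS between the two
# Tegra SoCs and the two discrete Pascal GPUs.
total_tflops = 8.0
tegra_tflops_each = 1.0   # assumed: ~2x Tegra X1's GPU
num_tegras, num_dgpus = 2, 2

dgpu_tflops_total = total_tflops - num_tegras * tegra_tflops_each
dgpu_tflops_each = dgpu_tflops_total / num_dgpus
print(f"Per discrete GPU: {dgpu_tflops_each:.1f} TFLOPS")

# With ~20% higher desktop clocks, as speculated above:
desktop_estimate = dgpu_tflops_each * 1.2
print(f"Desktop estimate: {desktop_estimate:.1f} TFLOPS")  # roughly GTX 970 territory
```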
     
  8. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    TBH I am not sure what can be taken from the Drive PX 2 design and the TFLOPS presented; this product seems specially designed for deep learning, tracking and processing a large number of objects and the real-world environment in the context of large-scale visual recognition.
    In those terms, the closest comparison between Titan X and this is the AlexNet benchmark NVIDIA presents, and even that is not ideal: Titan X manages 450 images/sec while Drive PX 2 manages 2,800 images/sec.
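    For reference, the AlexNet figures NVIDIA quoted work out to roughly:

```python
# AlexNet inference throughput figures from NVIDIA's presentation.
titan_x_img_s = 450
drive_px2_img_s = 2800

speedup = drive_px2_img_s / titan_x_img_s
print(f"Drive PX 2 is ~{speedup:.1f}x Titan X on AlexNet")
```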

    Cheers
     
  9. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    711
    Likes Received:
    282
    Given the TDP of 250 W for this PX 2 card, it points more in the direction of a GPx04. Also, looking at the die size in the photo, it's in the 3-4 cm² range. As for neural-net computation, you are mainly limited by memory bandwidth; I bet the bus is 256 bits wide.
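    As a rough illustration of why the bus width matters here (the 7 Gbps GDDR5 data rate is an assumption, typical for parts of that period):

```python
# Peak memory bandwidth of a GDDR5 interface:
# bandwidth (GB/s) = (bus width in bits / 8) * per-pin data rate (Gbps)
def gddr5_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    return bus_width_bits / 8 * data_rate_gbps

for width in (128, 256):
    bw = gddr5_bandwidth_gb_s(width, 7.0)  # assume 7 Gbps GDDR5
    print(f"{width}-bit @ 7 Gbps: {bw:.0f} GB/s")
# doubling the bus from 128 to 256 bits doubles peak bandwidth
```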
     
  10. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,277
    Likes Received:
    3,726
    What's a DTOP?

    I thought I read something that said the GPU was cut down for compute only. I guess the texture units and ROPs and all of that were ripped out.
     
  11. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,184
    Likes Received:
    1,841
    Location:
    Finland
    DL TOPS
    Deep-learning TeraOPS
     
  12. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    711
    Likes Received:
    282
    But what does a DL OP do? Can it only be used for deep learning?
     
  13. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    It helps with vehicle safety / self-driving / warning systems, etc., using multiple cameras mounted on the vehicle.
    Hence why I mentioned the AlexNet benchmark NVIDIA presented, where the Drive PX 2 is 6x more powerful than the Titan X in images/second.
    I cannot see how this product can be compared to traditional GPU usage, as it has a specialised purpose, albeit one much more powerful than what is currently available.
    Considering what it is doing, the power consumption and size are pretty good.

    Cheers
     
  14. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Just to add,
    these two links help to put what NVIDIA is doing in perspective and how it relates to the recent news being discussed:
    "1st gen" before latest news: http://blogs.nvidia.com/blog/2015/01/06/audi-tegra-x1/
    Latest news evolution of the following: http://www.nvidia.com/object/drive-px.html

    I am not sure whether Audi's Bobby (it beat a journalist around a race track in a timed lap, lol) self-driving system was built on NVIDIA technology or was Audi's own work, although Audi has signed up for the current Drive PX 2 programme, was working with NVIDIA back in early 2015, and with a partnership going back quite a lot further still.

    Cheers
     
  15. Voxilla

    Regular

    Joined:
    Jun 23, 2007
    Messages:
    711
    Likes Received:
    282
    I don't agree at all. The two GPUs on the back side are traditional discrete GPUs of some Pascal variant.
    AnandTech thinks so too: http://www.anandtech.com/show/9903/nvidia-announces-drive-px-2-pascal-power-for-selfdriving-cars

    Regarding SP FLOPS/Watt, performance is actually pretty poor and not much better than Maxwell at 250 W.
    But the question remains what a deep-learning operation (DL OP) actually is; we all know what a floating-point operation is, i.e. an add or a mul.
    I'm pretty well aware of what is needed to compute deep neural networks, and most of it is just multiply and accumulate...
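    The point that neural-net compute reduces to multiply-accumulate can be illustrated with the inner loop of a dense layer (a minimal sketch, not any specific framework's implementation):

```python
# A fully connected layer is dominated by multiply-accumulate (MAC) ops:
# out[j] = bias[j] + sum_i(inputs[i] * weights[i][j])
def dense_layer(inputs, weights, bias):
    outputs = []
    for j in range(len(bias)):
        acc = bias[j]
        for i, x in enumerate(inputs):
            acc += x * weights[i][j]   # one multiply-accumulate per weight
        outputs.append(acc)
    return outputs

# Each output element costs len(inputs) MACs, i.e. 2*len(inputs) FLOPs,
# which is why FLOPS (or FP16 "DL TOPS") is the headline metric.
print(dense_layer([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5]))  # [1.5, 2.5]
```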
     
  16. firstminion

    Newcomer

    Joined:
    Aug 7, 2013
    Messages:
    217
    Likes Received:
    46
    "What if we made up a new metric so we look good?"
     
  17. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,277
    Likes Received:
    3,726
    Now that I know what a DL OP is, I'll probably never acknowledge it again.
     
  18. AnarchX

    Veteran

    Joined:
    Apr 19, 2007
    Messages:
    1,559
    Likes Received:
    34
    Wrong baseline for GDDR5. It's actually ~400 mm², and if you check the package markings and production date visible in AnandTech's high-res pics, you will see NV just showed a GM204.
     
  19. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,420
    Likes Received:
    179
    Location:
    Chania
    A simple "like" is not enough for that one :runaway:
     
  20. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    The problem is that this architecture (Drive PX 2 with its two Pascal GPUs) combines ARM and Denver processors and is built specifically for deep learning.
    How can you compare it to a traditional Pascal discrete GPU?
    If you do compare, the Drive PX 2 is 6x more powerful than a Titan X at the same wattage in the task it was built for, specifically large-scale visual recognition and object/image processing: 450 images/sec for the Titan X versus 2,800 images/sec for the Drive PX 2.
    So they may have similar TFLOPS, but the comparison is meaningless, as their core focus and implementation are very different, with one benchmarked only in tight conjunction with its Denver and ARM implementation.
    Cheers
     