Nvidia Pascal Announcement

Discussion in 'Architecture and Products' started by huebie, Apr 5, 2016.

  1. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    They will have the same challenge as Nvidia: multiple segments that need varied FP64/FP32/FP16 ratios, all of which has to be balanced against power draw.
    These days it is unlikely (well, apart from the Nvidia P100, it seems) that a manufacturer will use a dedicated GPU die in just one segment out of the three when it comes to their top consumer GPU.
    Need to remember the previous gen was a bit of an anomaly as it had minimal DP, while they now also need good FP16 and Int8 for certain research.
    Cheers
     
  2. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,058
    Likes Received:
    3,116
    Location:
    New York
    Yeah on paper it's a little disappointing as a gaming card vs 1080. Looking forward to reviews.
     
  3. I don't see how those graphs support your claims. I also don't get how Tomshardware can build a "FPS-per-watt" curve (I would totally get it if it was a table).
    Are they changing clocks on the fly? Which clocks are being achieved for each FPS value? Are they touching the core voltage or is it all using standard voltage? Is the memory being overclocked? It seems like there's a lot of stuff that must be oversimplified/assumed to make a curve like that.

    The "performance per MHz" that you claim could be because the chip is hitting other bottlenecks, but the single-threaded advantage - as small as it could be - is still there.
     
  4. Clukos

    Clukos Bloodborne 2 when?
    Veteran

    Joined:
    Jun 25, 2014
    Messages:
    4,688
    Likes Received:
    4,353
    Judging by how well GM204 -> GM200 clocked, I'd wager that the new Titan X will be able to hit 1.9-2 GHz easily, if not more (just like GP104). The difference should be substantial, and this is most probably the first GPU able to hit 4K60 in most titles with maxed settings.
     
  5. LiXiangyang

    Newcomer

    Joined:
    Mar 4, 2013
    Messages:
    87
    Likes Received:
    48
    I am really really disappointed about the specs of this GP102

    Not only is it overpriced, but the best they can pull is 11T SP FLOPS @ 16nm? That's only 44G SP FLOPS/W.

    What's the point of wasting so much silicon on uint8, which nobody besides a very limited group of deep-learning zealots would even touch? And what if current DL/AI algorithms evolve into something more complicated/smart than brainless, stupid GEMM and grey computation? wtf?

    A month ago I had a meeting with professors from NUDT of China (National University of Defence Tech), and they are about to release a GPU-like accelerator for China's next-gen exascale supercomputer in the near future. That accelerator has a performance of 30-60G DP FLOPS/W and 60-120G DP FLOPS/W @14nm, which is roughly 2-3X faster @FP32 and probably 100X faster @FP64 than this overpriced piece of silicon. That accelerator also supports an OpenCL/CUDA-like vectorized computing language.

    It seems that the lack of proper competition is turning Nvidia into another Intel.
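The 44G SP FLOPS/W figure in this post is simply peak throughput divided by board power. A minimal sketch of that arithmetic follows; the 250 W board power is an assumption taken from the announced Titan X spec and is not stated in the thread, and the function name is hypothetical.

```python
# Sketch: peak throughput per watt.
# ASSUMPTION: 250 W board power (announced Titan X figure, not given in the post).

def gflops_per_watt(peak_tflops: float, board_power_w: float) -> float:
    """Convert peak TFLOPS and board power in watts to GFLOPS per watt."""
    return peak_tflops * 1000.0 / board_power_w

# 11 TFLOPS single precision is the number quoted in the post.
titan_x_sp = gflops_per_watt(11.0, 250.0)
print(f"Titan X (GP102): {titan_x_sp:.0f} SP GFLOPS/W")
```

Peak figures like this say nothing about sustained efficiency under a real workload, so the comparison is only as good as the peak numbers fed into it.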
     
    #1745 LiXiangyang, Jul 22, 2016
    Last edited: Jul 22, 2016
  6. xpea

    Regular

    Joined:
    Jun 4, 2013
    Messages:
    551
    Likes Received:
    783
    Location:
    EU-China
    As if GP102 won't OC...
    Moreover, at 4K, GP104 is bandwidth limited. With 50% more bandwidth to start with, I predict GP102 will scale much better than GP104 when overclocked. So even if it won't reach 2.1GHz but "only" 2GHz, the gap will remain at least the same when both are OC'd.
     
  7. The Fury X seems severely bottlenecked by fillrate and geometry output, and Hawaii seems like a much more balanced chip in comparison. The only situation where all that compute performance was put to good use without hitting the other bottlenecks so far has been Doom in Vulkan, which is obviously not good enough to warrant the GPU itself. IMO the only truly good thing that came out of Fiji was the Nano, which seems like a nice deal even today because it's hitting close to 300€.

    That said, let's hope Vega isn't just a simple do-over of Fiji's bottlenecks, with "only" 64 ROPs and 4 geometry engines. Vega doesn't need 16 TFLOPs to be competitive with this new Titan. AMD needs to spend their transistors elsewhere IMO.


    You're both suggesting this card will do a 33% core overclock easily?
    Wow...
     
  8. Benetanegia

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    394
    Likes Received:
    425
    Maybe you need some new glasses then, because it's clear. At lower clock rates, it's performing relatively faster.

    I don't see how any of that is relevant at all, other than this being an attempt on your part to shoot the messenger instead of the message...
    And as for how to build a curve out of a set of data points... come on now, is this Beyond3d forum or a kindergarten?

    Well, obviously those bottlenecks have a much larger impact than your alleged single-threaded advantage (for which I'd like to see definitive proof, btw). Which brings us to the Titan having most of those bottlenecks at much higher limits, i.e. memory B/W 50% higher, pixel fillrate around 35% higher, a 50% bigger L2 more than likely, 35% higher geometry rates more than likely, etc.
     
    DavidGraham likes this.
  9. Clukos

    Clukos Bloodborne 2 when?
    Veteran

    Joined:
    Jun 25, 2014
    Messages:
    4,688
    Likes Received:
    4,353
    Why not? The previous Titan X is "clocked" at 1000 MHz yet it can easily be overclocked to 1450-1500 MHz. At "worst" a 45% overclock.

    Edit: With the way Pascal GPUs are behaving so far it'd be logical to expect 1900 MHz to be achievable, at least.
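The overclock percentages traded in the last few posts are plain ratios of clock speeds. The sketch below uses only the clocks quoted in the thread; the helper names are hypothetical.

```python
# Sketch: overclock headroom as a percentage of the stock clock.

def overclock_pct(stock_mhz: float, oc_mhz: float) -> float:
    """Percentage gain of the overclocked frequency over the stock frequency."""
    return (oc_mhz - stock_mhz) * 100.0 / stock_mhz

def implied_stock_mhz(oc_mhz: float, pct: float) -> float:
    """Stock clock that a given overclock percentage implies."""
    return oc_mhz / (1.0 + pct / 100.0)

# Previous Titan X: rated 1000 MHz, overclocking to 1450-1500 MHz.
maxwell_low = overclock_pct(1000, 1450)
maxwell_high = overclock_pct(1000, 1500)

# A "33% core overclock" ending at 2000 MHz implies this starting clock.
implied = implied_stock_mhz(2000, 33)

print(f"Maxwell Titan X: {maxwell_low:.0f}% to {maxwell_high:.0f}%")
print(f"33% OC to 2000 MHz implies a stock clock near {implied:.0f} MHz")
```

Which clock counts as "stock" (base, rated boost, or the boost actually sustained) changes the percentage a lot, which is part of what the posters are disagreeing about.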
     
    #1749 Clukos, Jul 22, 2016
    Last edited: Jul 22, 2016
  10. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    That's a pretty bold assertion; would you care to back it up?

    PS: No, the pun was not intended, but I'll leave it anyway.
     
    homerdog, Razor1 and DavidGraham like this.
  11. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
  12. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    If it's bandwidth starved, why can't the GTX 1080 even pull ahead by the theoretical bandwidth difference of 25%, with 25% more execution units and higher clocks to boot, compared to the 1070? In fact, the performance gap barely widens by a couple of percent at 4K compared to 1080p.
     
  13. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    It's GP102, and the transistor count is far lower than GP100, so FP64 units got cut I assume
     
  14. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Yeah I agree,
    I would have expected DP to be around 1.2-1.8 TFLOPS but I am no longer convinced it even has that now (context: if it were also to be used as a Tesla; news mentioned it would be at least a Quadro part).
    Cheers
     
    ieldra likes this.
  15. ieldra

    Newcomer

    Joined:
    Feb 27, 2016
    Messages:
    149
    Likes Received:
    116
    A 6144 ALU Vega 11 would be roughly 14.7 TFLOPS at 1200 MHz.

    GP102 at 2 GHz (28 SMs, not the full complement of 30; 3584 ALUs) is 14.3 TFLOPS.

    I agree with the poster above: AMD should spend their transistor budget elsewhere and produce a balanced GPU for a change. Shader throughput is nice only when you can put it to use, otherwise it's like a third nipple on the elbow: useless.
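Both TFLOPS figures in this post come from the usual peak formula: ALUs times 2 operations per clock (one fused multiply-add) times clock speed. A minimal sketch, assuming 128 FP32 ALUs per SM on GP102 (an assumption consistent with the 28 SM / 3584 ALU pairing quoted above); the function name is hypothetical.

```python
# Sketch: peak FP32 throughput = ALUs * ops per ALU per clock * clock.

def peak_fp32_tflops(alus: int, clock_mhz: float, ops_per_clock: int = 2) -> float:
    """Peak TFLOPS, counting one FMA as 2 floating-point operations."""
    return alus * ops_per_clock * clock_mhz / 1e6

# ASSUMPTION: 128 FP32 ALUs per SM on GP102.
ALUS_PER_SM = 128
gp102_alus = 28 * ALUS_PER_SM

vega_tflops = peak_fp32_tflops(6144, 1200)          # hypothetical 6144-ALU part
gp102_tflops = peak_fp32_tflops(gp102_alus, 2000)   # overclocked to 2 GHz

print(f"6144 ALUs @ 1200 MHz: {vega_tflops:.1f} TFLOPS")
print(f"{gp102_alus} ALUs @ 2000 MHz: {gp102_tflops:.1f} TFLOPS")
```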
     
  16. Benetanegia

    Regular

    Joined:
    Sep 4, 2015
    Messages:
    394
    Likes Received:
    425
    Compared to the 1070, the 1080 has 33% more execution units and higher clocks to boot. Thus the aggregate outputs are 37% higher at boost clocks.

    Performance aligns more with bandwidth than anything else, especially if we also take the 1060 into account: 60% for both bandwidth and performance.

    The 1070 doing slightly better could be explained by it having the same amount of ROPs and L2 as the 1080.
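The 33% and 37% figures above, and the 25% bandwidth gap from the earlier post, can be reproduced with a short calculation. This sketch assumes the published reference specs (GTX 1080: 2560 CUDA cores, 1733 MHz boost, 320 GB/s; GTX 1070: 1920 cores, 1683 MHz boost, 256 GB/s); none of these numbers appear in the thread itself.

```python
# Sketch: relative advantage of the GTX 1080 over the GTX 1070.
# ASSUMPTION: reference-card specs, not taken from the thread.
GTX_1080 = {"cores": 2560, "boost_mhz": 1733, "bandwidth_gbs": 320}
GTX_1070 = {"cores": 1920, "boost_mhz": 1683, "bandwidth_gbs": 256}

def pct_more(a: float, b: float) -> float:
    """How much larger a is than b, in percent."""
    return (a - b) * 100.0 / b

units_pct = pct_more(GTX_1080["cores"], GTX_1070["cores"])
aggregate_pct = pct_more(
    GTX_1080["cores"] * GTX_1080["boost_mhz"],
    GTX_1070["cores"] * GTX_1070["boost_mhz"],
)
bandwidth_pct = pct_more(GTX_1080["bandwidth_gbs"], GTX_1070["bandwidth_gbs"])

print(f"execution units: +{units_pct:.0f}%")
print(f"units x boost clock: +{aggregate_pct:.0f}%")
print(f"memory bandwidth: +{bandwidth_pct:.0f}%")
```

If measured performance gaps sit closer to the bandwidth ratio than to the shader ratio, that is consistent with the bandwidth-limited reading, though it does not prove it.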

    [​IMG]
     
  17. ieldra

    Newcomer

    Joined:
    Feb 27, 2016
    Messages:
    149
    Likes Received:
    116
    Can anyone explain where they're getting 44 TOP/s Int8?

    It seems it's just 4x the FP32 rate, but that already counts an FMA as two ops per cycle.

    22 TOP/s makes sense to me, as it would be 4x the int32 rate with one operation per instruction, assuming a 1:1 ratio of FPU:ALU.

    Edit: Just read about dp4a and dp2a; that would make sense.
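The 44 versus 22 TOP/s question comes down to how many operations one instruction is credited with. A dp4a-style instruction does a 4-element int8 dot product plus an accumulate, which is 4 multiplies and 4 adds per lane per clock. The sketch below emulates that on unpacked Python lists (the hardware instruction works on packed 32-bit registers) and assumes 3584 lanes at a 1531 MHz boost clock, which is not stated in the thread; all names are hypothetical.

```python
# Sketch: why dp4a yields ~44 TOP/s while a naive count gives ~22.
from typing import Sequence

def dp4a(a: Sequence[int], b: Sequence[int], c: int) -> int:
    """Emulate a 4-way int8 dot product accumulated into a 32-bit value c."""
    assert len(a) == len(b) == 4
    return c + sum(x * y for x, y in zip(a, b))

def peak_tera_ops(lanes: int, ops_per_lane_per_clock: int, clock_mhz: float) -> float:
    """Peak tera-operations per second."""
    return lanes * ops_per_lane_per_clock * clock_mhz / 1e6

# ASSUMPTIONS: 3584 lanes, 1531 MHz boost clock.
LANES, BOOST_MHZ = 3584, 1531

fp32_tflops = peak_tera_ops(LANES, 2, BOOST_MHZ)  # FMA = 2 ops
naive_int8 = peak_tera_ops(LANES, 4, BOOST_MHZ)   # 4 int8 ops, no accumulate counted
dp4a_int8 = peak_tera_ops(LANES, 8, BOOST_MHZ)    # 4 multiplies + 4 adds

print(dp4a([1, 2, 3, 4], [5, 6, 7, 8], 10))
print(f"FP32 {fp32_tflops:.1f}, naive int8 {naive_int8:.1f}, dp4a int8 {dp4a_int8:.1f}")
```

Under these assumptions the dp4a count is exactly 4x the FP32 figure, because both credit a multiply and an add per element.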
     
    #1757 ieldra, Jul 22, 2016
    Last edited: Jul 22, 2016
  18. CSI PC

    Veteran

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Here is the dp4a testing done earlier in the year. I think it talks about dp2a as well, and also the limitation between sm_60 and sm_61 [edit: yep, just read it again and it does].
    https://devtalk.nvidia.com/default/...e-gtx-1080-amp-gtx-1070/post/4889750/#4889750
    Person to read is Scott Gray.
    Cheers
     
    ieldra likes this.
  19. Razor1

    Veteran

    Joined:
    Jul 24, 2004
    Messages:
    4,232
    Likes Received:
    749
    Location:
    NY, NY

    Same thing they said about Fury ;)
     
  20. DuckThor Evil

    Legend

    Joined:
    Jul 9, 2004
    Messages:
    5,995
    Likes Received:
    1,062
    Location:
    Finland
    Pascal Titan vs 1080 is very close to being the same as 980Ti vs 980. The Titan will have a very clear edge in performance and no 1080 model will be able to overcome that, unless LN is used. The Titan should boost to 1600MHz+ out of the box anyway and overclock reasonably close to the same frequency as the 1080 does, just like with the previous nVidia architectures, Kepler and Maxwell.

    Yes, the stock cooler with stock power targets and temp limits will hold it back somewhat, but those can be cranked up, and even at stock it will be out of reach of the 1080.

    It would be nice to see this chip with better coolers though. They could still release some very formidable 1080Ti cards with custom coolers, even if they cut 1 or 2 more SMs, and I do expect that to happen.
     