Nvidia Pascal Announcement

Discussion in 'Architecture and Products' started by huebie, Apr 5, 2016.

Tags:
  1. xEx

    xEx
    Regular Newcomer

    Joined:
    Feb 2, 2012
    Messages:
    938
    Likes Received:
    398
    Very interesting. Also, regarding HDR: as far as I can tell, the GeForce won't be able to play HDR video content (there is no mention of it), which is a disappointment.
     
  2. pixelio

    Newcomer

    Joined:
    Feb 17, 2014
    Messages:
    47
    Likes Received:
    75
    Location:
    Seattle, WA
    That's not correct.
     
  3. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    372
    Likes Received:
    308
    Pascal supports 12-bit HEVC decode
     
    homerdog, Malo, pharma and 3 others like this.
  4. renderstate

    Newcomer

    Joined:
    Apr 24, 2016
    Messages:
    54
    Likes Received:
    51
    12 or 10 bit?
     
  5. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    372
    Likes Received:
    308
  6. xEx

    xEx
    Regular Newcomer

    Joined:
    Feb 2, 2012
    Messages:
    938
    Likes Received:
    398
    HEVC (H.265) is a video codec, but it is not directly related to HDR per se. It is just a way to compress video and save bandwidth, and yes, it supports 10- and 12-bit color, but remember that bit depth is only a small part of HDR. If you want to be able to use HDR video content, you will need to implement either Dolby Vision (DV) or HDR10. Why? Because TVs will support either one or both. Why is this important? Both standards are very similar; the major differences are: HDR10 works with 10,000 nits while DV works with 1,000; DV uses the P3 color gamut while HDR10 uses an extension of P3 (trying to cover rec.2020 without fully supporting it); and the way the metadata is processed. DV processes the metadata in the TV: the device renders in SDR and creates the additional data needed to turn that SDR content into HDR (this saves tons of bandwidth, but content creators can only guess how the TV will make that conversion). HDR10 "asks" the TV for its capabilities (brightness, contrast, color gamut, etc.), then creates an HDR image within the parameters of the TV and sends it (which uses a lot of bandwidth).

    As you can see, you won't be able to reproduce HDR content without fully supporting one or both standards, because the TVs simply won't understand it. Nvidia's presentation is about HDR in games, which is nice, but not what I was looking for.

    Looking at the graph, it seems that Nvidia supports rec.2020 but only 1,000 nits, which puts it in the middle of the two standards: HDR10 uses rec.2020 but at 10,000 nits, and DV uses 1,000 nits but only supports P3.
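
    To make the "metadata" part concrete, here is a hedged sketch (the struct and field names are illustrative, not an actual decoder API) of the static metadata an HDR10 source carries in the HEVC bitstream via the mastering-display and content-light-level SEI messages:

    [code]
    // Hedged sketch: the static metadata an HDR10 source carries in the HEVC
    // bitstream (mastering display colour volume SEI per SMPTE ST 2086, plus
    // the content light level SEI). The struct itself is illustrative only.
    #include <cstdint>

    struct Hdr10StaticMetadata {
        // Mastering display primaries and white point, in 0.00002 increments
        // (e.g. 0.708 -> 35400), following the HEVC SEI syntax.
        uint16_t display_primaries_x[3];
        uint16_t display_primaries_y[3];
        uint16_t white_point_x;
        uint16_t white_point_y;
        // Mastering display luminance, in 0.0001 cd/m^2 units
        // (e.g. 1000 nits -> 10000000).
        uint32_t max_display_mastering_luminance;
        uint32_t min_display_mastering_luminance;
        // Content light levels, in cd/m^2 (nits).
        uint16_t max_content_light_level;        // MaxCLL
        uint16_t max_frame_average_light_level;  // MaxFALL
    };
    [/code]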
     
  7. Gipsel

    Veteran

    Joined:
    Jan 4, 2010
    Messages:
    1,620
    Likes Received:
    264
    Location:
    Hamburg, Germany
    Is there anything to back that up?
    And is it possible for Maxwell or Pascal to freely mix warps from different kernels of the same context? Are there restrictions? Because different contexts are clearly not possible; from the looks of it, not with Pascal either.
     
    #907 Gipsel, May 16, 2016
    Last edited: May 16, 2016
  8. pixelio

    Newcomer

    Joined:
    Feb 17, 2014
    Messages:
    47
    Likes Received:
    75
    Location:
    Seattle, WA
    Sure, the CUDA forums have examples of Kepler and Maxwell GPUs that support more concurrent streams than there are multiprocessors.

    Streams are independent "kernel queues". Kernels are executed in grids of cooperating blocks of threads. An SM can only execute a block when there are enough resources available -- smem, registers, warps, etc.

    For example, a tiny two SMX GK208 Kepler GPU can support 16 concurrent streams. With the proper driver, a single SMX K1 mobile GPU supports 4 streams.
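
    For anyone who wants to poke at this themselves, here is a minimal sketch (kernel name, sizes and stream count are illustrative, not from the post) that queues one tiny single-block grid into each of several streams; whether the grids actually overlap on an SM is up to the hardware and driver:

    [code]
    // Hedged sketch: one tiny single-block grid per CUDA stream.
    #include <cuda_runtime.h>

    __global__ void tinyKernel(float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = sqrtf((float)i);  // trivial work so the grid stays small
    }

    int main()
    {
        const int kStreams = 16;   // more streams than SMs on a small GPU
        const int kElems   = 256;  // one 256-thread block per grid
        cudaStream_t streams[kStreams];
        float *buffers[kStreams];

        for (int s = 0; s < kStreams; ++s) {
            cudaStreamCreate(&streams[s]);
            cudaMalloc(&buffers[s], kElems * sizeof(float));
        }

        // One independent grid per stream; the driver may or may not
        // overlap them on a multiprocessor.
        for (int s = 0; s < kStreams; ++s)
            tinyKernel<<<1, kElems, 0, streams[s]>>>(buffers[s], kElems);

        cudaDeviceSynchronize();

        for (int s = 0; s < kStreams; ++s) {
            cudaFree(buffers[s]);
            cudaStreamDestroy(streams[s]);
        }
        return 0;
    }
    [/code]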
     
    Razor1, pharma and spworley like this.
  9. spworley

    Newcomer

    Joined:
    Apr 19, 2013
    Messages:
    146
    Likes Received:
    190
    Caveat: the full 10/12-bit decoder is probably only for Windows. The existing (10-bit) HEVC decoder on the GTX 950 is limited to 8 bits in Linux, mostly due to an unfortunate design limitation of the Linux VDPAU decoder interface. This affects AMD and Intel too.
     
    xpea and pharma like this.
  10. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    337
    Likes Received:
    294
    But that does not guarantee parallel execution, does it? You may schedule work to 16 different queues so that they don't block each other, but there is no guarantee that they are processed in parallel rather than temporarily stalling each other on the available SMMs. Plus, we have observed weird disparities between compute via CUDA and via DirectX before, so even if it works with CUDA (for which we don't have proof yet), that doesn't allow for a generic statement yet.
     
    elect and no-X like this.
  11. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    337
    Likes Received:
    294
    PS:
    Why I'm tempted to believe that rumor is that it would allow Nvidia to perform the allocation of SMEM, registers etc. for a single kernel statically and in a single go, with only warp initialization happening at run time. That seems like a great way to eliminate any chance of fragmentation of the RF and SMEM, so you can avoid any form of paging inside them, and possibly even optimize for a multi-tier layout.

    Why I'm less convinced about it is the graphics pipeline, as that would also mean all the shared special-function units (mainly the Raster Engine) would be unable to run unless at least one SMM had a corresponding kernel enabled, so you couldn't put those units under permanent load. On the other hand, this probably allows reusing parts of the PolyMorph Engine rather than providing dedicated resources to each function or requiring arbitration, so who knows?
     
    nnunn likes this.
  12. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,171
    Location:
    La-la land
    Only slightly larger than good ole GK104, but with a much higher sales price. The GeForce 770 was upper midrange or lower high-end, however you want to look at it, and it certainly did not retail for 600-700 USD. :p
     
  13. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,142
    Likes Received:
    1,830
    Location:
    Finland
    An x mm^2 chip on 14nm is also vastly more expensive to make than an x mm^2 chip on 28nm.
     
    Razor1 likes this.
  14. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,171
    Location:
    La-la land
    Yes, but $700 is like twice the price the 770 retailed for, AFAIR. I.e., GP104 costs several hundred dollars more to make than GK104? Seems a tad much, maybe.
     
  15. CSI PC

    Veteran Newcomer

    Joined:
    Sep 2, 2015
    Messages:
    2,050
    Likes Received:
    844
    Not sure this fits, but didn't NVIDIA show a difference in how it handles parallel recursion between Fermi (more crude and complex) and Kepler (separate streams) with CUDA examples?

    Cheers
     
  16. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    337
    Likes Received:
    294
    You mean the Dynamic Parallelism stuff? I thought so as well, but that's actually just sequential execution with an explicit yield.

    From https://devblogs.nvidia.com/parallelforall/cuda-dynamic-parallelism-api-principles/:
    So, no, this isn't confirming it either.
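
    For reference, this is roughly what the Dynamic Parallelism pattern being discussed looks like (a hedged sketch; names are illustrative, and it needs compute capability 3.5+ with relocatable device code). A parent kernel launches a child grid from the device, and the open question here is whether parent and child ever truly execute concurrently rather than the parent simply yielding:

    [code]
    // Hedged sketch of CUDA Dynamic Parallelism (names illustrative).
    // Build with, e.g.: nvcc -arch=sm_35 -rdc=true -lcudadevrt dp.cu
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void childKernel(int parentBlock)
    {
        printf("child of parent block %d, thread %d\n", parentBlock, threadIdx.x);
    }

    __global__ void parentKernel()
    {
        // One device-side launch per parent block; the child runs as its own
        // grid and finishes before the parent grid is considered complete.
        if (threadIdx.x == 0)
            childKernel<<<1, 4>>>(blockIdx.x);
    }

    int main()
    {
        parentKernel<<<2, 32>>>();
        cudaDeviceSynchronize();
        return 0;
    }
    [/code]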
     
  17. Dr Evil

    Dr Evil Anas platyrhynchos
    Legend Veteran

    Joined:
    Jul 9, 2004
    Messages:
    5,766
    Likes Received:
    774
    Location:
    Finland
    The 770 was going for $399, but that was 14 months after the GK104 chip had been introduced and there were more high-end products on top of it. The GTX 680 was $499 four years ago. I would think that by summer 2017 you'll get a GP104 cheaper than at launch, and you can actually get one now too, just not fully enabled; but the 1070 is closer to the top than the 770 was back in 2013.
     
  18. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    499
    Likes Received:
    177
    It's actually rather easy to write a program that runs two kernels on the same SM simultaneously. Your rumor is false.

    The thing that couldn't be done (at least until Pascal, not sure if this has changed) is to run a compute kernel and a graphics kernel concurrently on the same SM. But two different compute kernels can indeed run concurrently on the same SM.
     
    homerdog, pharma and Razor1 like this.
  19. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    337
    Likes Received:
    294
    This didn't change. It's just that it can now switch between either mode without halting the GPU, and possibly without waiting for the SM to run dry, thanks to preemption.
    How? It's not like you could control the SM selection, nor does the API permit any pattern in which actual concurrent execution of multiple kernels would be required. So unless someone manages to achieve a measurable speedup by concurrently running two kernels with orthogonal per-SM resource constraints (e.g. two kernels which intentionally exhaust the RF and the SMEM limit, respectively), I still can't consider this confirmed.
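
    Something along the lines of the following hedged sketch (all kernel names, sizes and launch parameters are made up, not from the thread) is what such a test could look like: one kernel leans on shared memory, the other on a long arithmetic chain, and the pair is timed back-to-back in one stream versus split across two streams. A clearly shorter two-stream time would suggest the kernels really did share SMs concurrently.

    [code]
    // Hedged sketch of the experiment described above (all names/sizes made up).
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void smemHeavy(float *out)
    {
        __shared__ float buf[12 * 1024];  // 48 KB: most of an SM's shared memory
        int tid = threadIdx.x;
        for (int i = tid; i < 12 * 1024; i += blockDim.x)
            buf[i] = (float)i;
        __syncthreads();
        float acc = 0.f;
        for (int i = 0; i < 12 * 1024; i += 64)
            acc += buf[(i + tid) % (12 * 1024)];
        out[blockIdx.x * blockDim.x + tid] = acc;
    }

    __global__ void aluHeavy(float *out)
    {
        float a = threadIdx.x * 0.001f, b = 1.0001f;
        for (int i = 0; i < 20000; ++i)   // long dependent FMA chain
            a = a * b + 0.5f;
        out[blockIdx.x * blockDim.x + threadIdx.x] = a;
    }

    static float timePair(bool concurrent, float *bufA, float *bufB,
                          cudaStream_t s0, cudaStream_t s1)
    {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        smemHeavy<<<64, 128, 0, s0>>>(bufA);
        aluHeavy <<<64, 128, 0, concurrent ? s1 : s0>>>(bufB);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.f;
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return ms;
    }

    int main()
    {
        float *bufA, *bufB;
        cudaMalloc(&bufA, 64 * 128 * sizeof(float));
        cudaMalloc(&bufB, 64 * 128 * sizeof(float));
        cudaStream_t s0, s1;
        cudaStreamCreate(&s0);
        cudaStreamCreate(&s1);

        printf("same stream : %.3f ms\n", timePair(false, bufA, bufB, s0, s1));
        printf("two streams : %.3f ms\n", timePair(true,  bufA, bufB, s0, s1));

        cudaStreamDestroy(s0); cudaStreamDestroy(s1);
        cudaFree(bufA); cudaFree(bufB);
        return 0;
    }
    [/code]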
     
  20. pixelio

    Newcomer

    Joined:
    Feb 17, 2014
    Messages:
    47
    Likes Received:
    75
    Location:
    Seattle, WA
    Below is a presentation from GTC 2015. See pages 29, 30 and 32. The K1 has only one multiprocessor, so any execution overlap of independent kernel grids is happening within the same multiprocessor.

    http://on-demand.gputechconf.com/gtc/2015/presentation/S5457-Paulius-Micikevicius.pdf

    Note that this presentation describes a really great use case where grids become so small that, without concurrent execution, they would conga-line through an underutilized multiprocessor.
     
    pharma and silent_guy like this.