Nvidia Pascal Announcement

"dark silicon" as part of cooling solution? Cool idea!
It's done routinely. Extra unconnected metal, called "fill", is often added to free space in interconnect layers, mostly to keep polishing thickness consistent across the area of the circuit, but it's also useful for its thermal conductivity, keeping one region from accumulating too much heat. This mostly reduces very local temperatures (both absolute and differential), which cause physical stress as different materials expand at different rates with temperature, leading to failure.
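To make the failure mechanism concrete: the stress at a constrained interface between two materials scales with the mismatch in their thermal expansion coefficients. A quick back-of-the-envelope sketch (all material values are textbook approximations I'm supplying, not from the post):

```python
# Illustrative only: thermal stress at a fully constrained copper/silicon
# interface from mismatched thermal expansion, sigma ~ E * (a_cu - a_si) * dT.
# Material constants are rough textbook values, chosen for illustration.

E_CU = 110e9        # Young's modulus of copper, Pa (approximate)
CTE_CU = 16.5e-6    # thermal expansion coefficient of copper, 1/K
CTE_SI = 2.6e-6     # thermal expansion coefficient of silicon, 1/K

def cte_mismatch_stress(delta_t_kelvin, youngs_modulus=E_CU,
                        cte_a=CTE_CU, cte_b=CTE_SI):
    """Stress (Pa) for a fully constrained bimaterial joint over a dT swing."""
    return youngs_modulus * (cte_a - cte_b) * delta_t_kelvin

# Even a modest 30 K local hot spot produces tens of MPa at the interface:
stress_mpa = cte_mismatch_stress(30) / 1e6
print(f"~{stress_mpa:.0f} MPa of interfacial stress for a 30 K gradient")
```

Which is why smoothing out local temperature gradients with thermally conductive fill is worth doing even when the metal carries no signal.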
 
Why not wait a bit for actual prices before starting the round of complaints (about a product that you probably aren't going to buy anyway)?
No, that's actually the reason; I've been planning for over half a year to actually make the jump to Pascal.
 
I think he was talking about die size in regard to total package size of HBM2 vs HBM1.
HBM1 was what, ~40 mm²?
HBM2 is supposedly more than twice the size of HBM1, ~90 mm², if my memory is correct.
Is it ECC memory, or did you use parity bits?
 
I think he was talking about die size in regard to total package size of HBM2 vs HBM1.
HBM1 was what, ~40 mm²?
HBM2 is supposedly more than twice the size of HBM1, ~90 mm², if my memory is correct.
Correct, it's 40 mm² for HBM1 and 92 mm² for HBM2.
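As a sanity check on "more than twice the size", the ratio of those two figures works out like this (trivial Python, numbers taken from the posts above):

```python
# Package-area figures quoted in the thread: 40 mm² (HBM1) vs 92 mm² (HBM2).
HBM1_AREA_MM2 = 40.0
HBM2_AREA_MM2 = 92.0

ratio = HBM2_AREA_MM2 / HBM1_AREA_MM2
print(f"HBM2 is {ratio:.1f}x the package area of HBM1")  # 2.3x
```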

 
"HBM gen2 will physically be larger than HBM gen1". What is this "gen" thing you refer to?
That's the JEDEC spec, though some people have chosen to use the moniker HBM2.
[screenshot of the JEDEC HBM spec page]

(Ctrl+F -> search for HBM2 -> "no results were found")

To correct this/myself further: The spec does not really differentiate between 1st and 2nd gen. This is merely done by the manufacturers, where SK Hynix stated that their 2nd-gen HBM will use larger packages than 1st-gen products. AFAIK, JEDEC only specifies stack height for the A2 height of the W variation, "very very thin profile" (sic!). But there are 7 stack heights (w/o micro bumps) specified as well, ranging from "die thin" (0.26 mm) to "very thin" (0.96 mm).
 
"Compute Preemption" on Pascal. What are the limits to that, i.e. what exactly can be preempted?

Nvidia claims compute tasks can be preempted at instruction-level granularity. But NV also differentiates between compute and display tasks, and claims the preemption-related benefits, including proper debugging using HW breakpoints, only for compute tasks/kernels.

Just an oversight, or does this mean that kernels dispatched from a draw call cannot be preempted, and that debugging for these is therefore limited as well?


Also: The "GigaThread Engine" is a complete black box once again.
 
Not an expert or an insider, but maybe non-compute things are not preemptible at a fine grain because that might involve state spill/fill from fixed-function units (or at least their queues), and that was a pain to do for whatever reason?
 
"Compute Preemption" on Pascal. What are the limits to that, i.e. what exactly can be preempted?

Nvidia claims compute tasks can be preempted at instruction-level granularity. But NV also differentiates between compute and display tasks, and claims the preemption-related benefits, including proper debugging using HW breakpoints, only for compute tasks/kernels.

Just an oversight, or does this mean that kernels dispatched from a draw call cannot be preempted, and that debugging for these is therefore limited as well?


Also: The "GigaThread Engine" is a complete black box once again.
Maybe I misunderstood, but doesn't preemption at instruction boundaries mean it can happen before the end of a draw call?
Cheers
 
Preemption is great but the ability to prioritize and run graphics and compute kernels concurrently is a more elegant solution to the problem. Wouldn't be surprised at all if they announce that capability at the GeForce launch.
 
Preemption is great but the ability to prioritize and run graphics and compute kernels concurrently is a more elegant solution to the problem. Wouldn't be surprised at all if they announce that capability at the GeForce launch.
NVIDIA has already stated that it's still based on preemption, so I doubt it.
 
Though speculative and based on information derived from recent driver listings, this link mentions a GP102 possibly targeted at the enthusiast class sometime in early 2017. They mention 4096-4608 shader units on a 384-bit GDDR5X or 4096-bit HBM2 interface, on a ~500 mm² chip. The performance target would be roughly 2x GTX 980 Ti levels.

http://www.3dcenter.org/news/wie-nvidias-geforce-1000-serie-womoeglich-aussieht
 
I linked it in the past, along with what The Register reported:
The Register said:
Software running on the P100 can be preempted on instruction boundaries, rather than at the end of a draw call. This means a thread can immediately give way to a higher priority thread, rather than waiting to the end of a potentially lengthy draw operation. This extra latency – the waiting for a call to end – can really mess up very time-sensitive applications, such as virtual reality headsets. A 5ms delay could lead to a missed Vsync and a visible glitch in the real-time rendering, which drives some people nuts.
http://www.theregister.co.uk/2016/04/06/nvidia_gtc_2016/
Although it does not fully answer Ext3h's point, and we need to see it in action.
Cheers
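The Register's 5 ms figure can be put against a VR frame budget with a quick back-of-the-envelope sketch. All the workload numbers below are assumptions I'm making for illustration, not measurements:

```python
# Illustrative: why draw-call-boundary preemption can blow a VR deadline
# while instruction-boundary preemption doesn't. At 90 Hz the frame budget
# is ~11.1 ms. The render time (7 ms) and fine-grained stall (0.1 ms) are
# assumed values for the sake of the example.

VR_REFRESH_HZ = 90.0
FRAME_BUDGET_MS = 1000.0 / VR_REFRESH_HZ   # ~11.1 ms per frame

def time_left_after_preemption(render_work_ms, preemption_stall_ms):
    """Budget remaining once the high-priority task finally gets the GPU."""
    return FRAME_BUDGET_MS - preemption_stall_ms - render_work_ms

# Coarse preemption: worst case waits out a ~5 ms draw call before switching.
coarse_margin = time_left_after_preemption(render_work_ms=7.0,
                                           preemption_stall_ms=5.0)
# Instruction-boundary preemption: stall is near-negligible (assumed 0.1 ms).
fine_margin = time_left_after_preemption(render_work_ms=7.0,
                                         preemption_stall_ms=0.1)

print(f"coarse: {coarse_margin:+.1f} ms margin (negative = missed vsync)")
print(f"fine:   {fine_margin:+.1f} ms margin")
```

With these assumed numbers, the coarse case goes negative (a missed Vsync and a visible glitch, exactly the scenario the quote describes), while the fine-grained case keeps a few milliseconds of headroom.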
 
NVIDIA has stated already that it's still based on preemption, so I doubt it


No they haven't, and we have seen that it doesn't need preemption to do these operations; only in DX has this been an issue, and only after a certain amount of time.

And the only people that have stated it's based on preemption have been AMD (I think that was Hallock), which doesn't seem to add up with the tests we have done here when profiling different games with different APIs.
 
I have noticed some call it compute preemption in the news.
So it's not async compute in the AMD sense (I doubt anyone here was expecting that, though), but it's more efficient than what Maxwell had; how it performs remains to be seen, and I guess we also need to wait until more info is released.
Cheers
 
No they haven't, and we have seen that it doesn't need preemption to do these operations; only in DX has this been an issue, and only after a certain amount of time.

And the only people that have stated it's based on preemption have been AMD (I think that was Hallock), which doesn't seem to add up with the tests we have done here when profiling different games with different APIs.

This is from GDC'15
[GDC'15 slide on preemption]


I can dig up the newer slide where they specify Pascal as "finer-grained preemption" (as pointed out by the Register article linked above).
 