Intel Gen9 Skylake

Yes, and alpha blend of most formats is also full rate on SKL (ex. 8ppc/slice for 32bpp). Texturing is 12 bilinear/clk/slice (like Broadwell).
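For a rough sense of scale, the per-clock, per-slice figures above can be turned into peak rates. This is only a back-of-the-envelope sketch: the 1.15 GHz clock is the HD 530 figure quoted later in the thread, the slice counts (1 for GT2, 3 for the 72 EU part) are my assumption, and the function name is made up for illustration. Real parts will not necessarily hold that clock with more slices active.

```python
# Per-clock, per-slice rates quoted in the post above.
BLEND_PIXELS_PER_CLK_PER_SLICE = 8      # 32bpp alpha blend
BILINEAR_PER_CLK_PER_SLICE = 12         # bilinear texture samples

def peak_rate(per_clk_per_slice, slices, clock_hz):
    """Theoretical peak rate in units per second (hypothetical helper)."""
    return per_clk_per_slice * slices * clock_hz

# Assumptions: GT2 has 1 slice, the 72 EU part has 3 slices,
# and both run at 1.15 GHz (unlikely to hold for the larger part).
CLOCK_HZ = 1_150_000_000

gt2_blend = peak_rate(BLEND_PIXELS_PER_CLK_PER_SLICE, 1, CLOCK_HZ)
gt2_bilinear = peak_rate(BILINEAR_PER_CLK_PER_SLICE, 1, CLOCK_HZ)
gt4_blend = peak_rate(BLEND_PIXELS_PER_CLK_PER_SLICE, 3, CLOCK_HZ)

print(f"GT2 blend:    {gt2_blend / 1e9:.1f} Gpix/s")
print(f"GT2 bilinear: {gt2_bilinear / 1e9:.1f} Gsamples/s")
print(f"GT4 blend:    {gt4_blend / 1e9:.1f} Gpix/s")
```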

More slides from this morning (I have no idea why this link is so wacky and I hope it works):

https://hubb.blob.core.windows.net/e5888822-986f-45f5-b1d7-08f96e618a7b-published/54f4f27e-62d8-4b7b-8364-fa8f110b1664/GVCS004 - SF15_GVCS004_100f.pdf?sv=2014-02-14&sr=c&sig=Cv7l/gyeEHCeyeBY+26YNU+bhh2HgcazoGBTkobMU10=&se=2015-08-21T18:15:08Z&sp=rwd
Fully bindless architecture + all the FL 12_1 goodies + tier 3 conservative raster :)

Can't wait to try the 72 EU + EDRAM part.

I especially liked this:
Recommend applications switch to bindless style
- Use a single descriptor table that maps the entire heap
- Store binding indices with other material constants (cbuffers, root constants)
- Enables new capabilities: all resources are accessible without CPU intervention
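To make the recommendation concrete, here is a CPU-side toy model of the idea, not D3D12 code: one flat "heap" holds every resource, and each material carries plain integer indices into it alongside its other constants. All names here (`Material`, `descriptor_heap`, `shade`) are hypothetical and only illustrate the indexing pattern.

```python
from dataclasses import dataclass

@dataclass
class Material:
    # Binding indices live next to the other material constants,
    # the way they would sit in a cbuffer or root constants.
    albedo_index: int
    normal_index: int
    roughness: float

# Stand-in for a single descriptor table that maps the entire heap.
# Strings play the role of texture descriptors in this toy model.
descriptor_heap = [
    "albedo_brick", "normal_brick",
    "albedo_metal", "normal_metal",
]

materials = [
    Material(albedo_index=0, normal_index=1, roughness=0.8),
    Material(albedo_index=2, normal_index=3, roughness=0.2),
]

def shade(material, heap):
    """Stand-in for a shader: fetch resources purely by index."""
    return heap[material.albedo_index], heap[material.normal_index]

# Switching materials changes only the indices being read;
# nothing is rebound between "draws".
for m in materials:
    print(shade(m, descriptor_heap))
```

The point of the pattern is that changing what a draw samples from becomes a data change (different integers) rather than a binding change made by the CPU.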
 
Does anyone know what the geometry throughput of Gen9 is? If I'm not mistaken Haswell was one triangle/2clk? Is it the same for Broadwell and Skylake, or higher?
 
This is the kinda GPU hardware feature support that I wish to see as soon as possible in a dedicated GPU lineup (with HBM2, standard swizzle and cross sharing node tier 3 : D ...yes, dream hardware to do dirty things with pointers and vram ).

Could Intel Gen9 iGPUs support HDR ASTC textures too?

What's the uncertainty regions precision of conservative rasterization, 1/256 or 1/512?
 
Is there a listing for the tier levels for feature types like conservative rasterization?
Is the tier Gen 9 has based on having the ability to do things like inner and outer conservative raster, and the various other features that round out the functionality?
 
This page says that Tier 3 conservative rasterization (which Skylake has, right?) supports 1/512 uncertainty regions.
Yeah the page is out of date. It's actually something we will likely go in a slightly different direction on in terms of "tightening it up", as it turns out 1/512 (i.e. rounding) is not actually a sufficient condition to get pixel exact results between different implementations, which is really the ultimate goal. In practice any implementation that doesn't skip entire subpixels (i.e. 1/256) is equivalent until we can tighten a few other things, hence the Tier 3 requirements.
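A small sketch of where the two numbers come from, assuming vertex positions are snapped to a grid with 8 subpixel bits (1/256 pixel). Rounding to the nearest grid point bounds the snapping error by 1/512, while truncating bounds it by just under 1/256. The function names are mine and this only models the snapping step, not a full rasterizer.

```python
from fractions import Fraction
import math

SUBPIXEL_BITS = 8               # assumed fixed-point precision
GRID = 1 << SUBPIXEL_BITS       # 256 grid steps per pixel

def snap_round(x):
    """Snap to the nearest 1/256 grid point."""
    return Fraction(round(x * GRID), GRID)

def snap_truncate(x):
    """Snap down to the 1/256 grid point at or below x."""
    return Fraction(math.floor(x * GRID), GRID)

def max_snap_error(snap, samples):
    """Largest distance between a sample and its snapped position."""
    return max(abs(snap(x) - x) for x in samples)

# Exact rational sample positions across one pixel, 16 per grid step.
samples = [Fraction(i, 4096) for i in range(4096)]

round_err = max_snap_error(snap_round, samples)
trunc_err = max_snap_error(snap_truncate, samples)

print("round   :", round_err)   # reaches 1/512 at the half-step samples
print("truncate:", trunc_err)   # stays below 1/256
```

Using `Fraction` keeps the arithmetic exact, so the bounds are not blurred by floating-point error.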

Is the tier Gen 9 has based on having the ability to do things like inner and outer conservative raster, and the various other features that round out the functionality?
Yes, the main advantages being the inner coverage flag and the fact that it is truly conservative (does not cull post-snap degenerates).
 
So what happens to the GPU architectures that were planned with a 1/512 pixel uncertainty region (if there were any)? Sometimes MSDN is quite cryptic...

edit:

MaxGPUVirtualAddressBitsPerResource

Don't use this field; instead, use the D3D12_FEATURE_DATA_GPU_VIRTUAL_ADDRESS_SUPPORT query (a structure with a MaxGPUVirtualAddressBitsPerResource member), which is more accurate.

lol, looks like no one wanted to fully update the SDK :p
 
So what happens to the GPU architectures that were planned with a 1/512 pixel uncertainty region (if there were any)?
Implementations are free to still have "narrower" regions where it can determine conclusively in a conservative manner, that's just not a sufficient condition for pixel exact results. Thus further tightening of the spec (via Tiers or otherwise) will likely have slightly different constraints than just the size of the uncertainty region. None of this is really a big deal until we can get to pixel-exact results between all implementations (arguably not super-important with CR anyways, but still a good goal) :)
 
Does anyone know what the geometry throughput of Gen9 is? If I'm not mistaken Haswell was one triangle/2clk? Is it the same for Broadwell and Skylake, or higher?


On my i7-6700K (HD 530, 1.15 GHz) I get 99 fps (792 Mtri/s) in Cap Viewer HW Geometry Instancing; an i5-4670 (HD 4600, 1.2 GHz) did 47 fps (376 Mtri/s).
 

Thanks, so it looks like throughput has been doubled compared to Gen7.5, and I would assume the theoretical peak is now 1 tri/clk. I'm guessing that wouldn't scale up with additional slices? i.e. GT4e won't be pushing 3 tri/clk?
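Working the benchmark numbers through, as a quick sanity check. This assumes both GPUs actually held the clocks quoted in the measurement for the whole run, which the thread does not confirm, and the helper names are mine.

```python
def triangles_per_clock(mtri_per_s, clock_ghz):
    """Measured triangles per GPU clock from Mtri/s and GHz."""
    return (mtri_per_s * 1e6) / (clock_ghz * 1e9)

def triangles_per_frame(mtri_per_s, fps):
    """Triangles submitted per frame implied by the two figures."""
    return (mtri_per_s * 1e6) / fps

# Figures quoted in the measurement post.
skylake = triangles_per_clock(792, 1.15)   # HD 530
haswell = triangles_per_clock(376, 1.2)    # HD 4600

print(f"HD 530 : {skylake:.2f} tri/clk")
print(f"HD 4600: {haswell:.2f} tri/clk")
print(f"per-clock ratio: {skylake / haswell:.2f}x")

# Both runs imply the same scene size, so the comparison is like for like.
print(triangles_per_frame(792, 99), triangles_per_frame(376, 47))
```

That comes out to roughly 0.69 tri/clk against 0.31 tri/clk, a bit over 2x per clock, with both results sitting below the assumed peaks of 1 and 0.5 tri/clk.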
 
That's why I saw up to 1.1 GHz at default; I thought it was a BIOS bug. So it's basically 1.1 GHz for the fastest GT2 SKU.
 
And what is more interesting, if a second slice becomes active, the maximum frequency is lowered even more.
If it was just no slices active/any slices active, I would have imagined that the unslice needs less voltage to run at the maximum clock.
 