Intel Gen9 Skylake

Also it seems some Gen9 manuals are missing:
First, at https://software.intel.com/en-us/articles/intel-graphics-developers-guides we can find the Gen9 compute architecture guide, but the Gen9 graphics API developer guide is missing (note that the Gen8 graphics API dev guide is already there)..
Also, at https://01.org/linuxgraphics/documentation/hardware-specification-prms we can only find Broadwell manuals.. seeing how recently the BDW manuals got posted, it looks like a long wait there..
Finally, reading:
http://www.intel.com/content/www/us...ktop-6th-gen-core-family-datasheet-vol-1.html
OpenGL 5.0 -> Vulkan? Also, I hope this implies full OpenGL 4.5 support + the new 2015 ARB extensions, since 4.5 < 5.0 :D
But the interesting meat is a few lines later:
I can find things there (all on page 31) that aren't mentioned anywhere else (i.e. not in HD 530 reviews or IDF presentations); I put them in bold below:
It seems they are also exposing extra GPU features in DirectX (D3D11?) via extensions.
I assume it's similar to how they exposed PixelSync two years ago on D3D11 (please correct me if I'm wrong; anyway, some of these things aren't even supported in D3D12, so perhaps they are D3D12 extensions)..
Render Target Reads (in the OpenGL driver this is exposed via GL_EXT_shader_framebuffer_fetch; see: https://gfxbench.com/device.jsp?benchmark=gfx31&os=Windows&api=gl&D=Intel(R)+HD+Graphics+530&testgroup=info)
Floating Point atomics (nice, already exposed on NV GPUs via an NVAPI extension..)
MSAA sample-indexing (more info please? is it equal to the AMD D3D11 ext in AMDDXextAPI.h: SetSingleSampleRead(ID3D10Resource* pResource, BOOL singleSample) = 0; ?)
Fast Sampling (Coarse LOD) (is that equal to the undocumented GL_INTEL_multi_rate_fragment_shader extension that's present in the driver? If not, more info on what this extension may provide? Related to https://software.intel.com/en-us/articles/coarse-pixel-shading ?)
Quilted Textures (more info, please?)
GPU Enqueue Kernels (is that like OpenCL 2.0 launching kernels from kernels / CUDA dynamic parallelism, but in DirectCompute? see the sketch after this list)
GPU Signals processing unit (interested to see more info on this..)
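For comparison, here is roughly what kernel-from-kernel launch looks like in OpenCL 2.0 today. This is only a minimal sketch to illustrate the question above; whether an Intel DirectCompute extension would look anything like it is pure speculation, and the kernel names (child_work, parent) are made up:

// Sketch: OpenCL 2.0 device-side enqueue (the feature the question compares against).
// A parent kernel launches child work directly on the GPU, with no host round-trip.
#include <CL/cl.h>

static const char* kSrc = R"CLC(
kernel void child_work(global float* data) {
    data[get_global_id(0)] *= 2.0f;
}
kernel void parent(global float* data, int n) {
    if (get_global_id(0) == 0) {
        enqueue_kernel(get_default_queue(),            // on-device default queue
                       CLK_ENQUEUE_FLAGS_WAIT_KERNEL,  // child runs after parent completes
                       ndrange_1D(n),
                       ^{ child_work(data); });        // block capturing the kernel args
    }
}
)CLC";

// Host side: the on-device default queue must be created up front.
// Build kSrc with "-cl-std=CL2.0".
cl_command_queue make_device_queue(cl_context ctx, cl_device_id dev) {
    cl_queue_properties props[] = {
        CL_QUEUE_PROPERTIES,
        CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE |
            CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT,
        0};
    cl_int err = CL_SUCCESS;
    return clCreateCommandQueueWithProperties(ctx, dev, props, &err);
}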

Also, it seems OpenCL has the cl_khr_fp16 extension, and fp16 is coming to D3D via an optional cap bit, but is it coming to OpenGL in some way, as there is no GL_ARB_fp16 extension?
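For reference, a minimal sketch of what cl_khr_fp16 actually enables in OpenCL C; the kernel is purely illustrative and not tied to any particular driver:

#include <CL/cl.h>
#include <string>

// Without cl_khr_fp16, "half" is storage-only (vload_half/vstore_half); with the
// extension enabled it becomes a full arithmetic type.
static const char* kFp16Src = R"CLC(
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
kernel void scale_half(global half* data, float s) {
    size_t i = get_global_id(0);
    data[i] = data[i] * (half)s;   // native fp16 multiply
}
)CLC";

// Host-side capability check (plain OpenCL, nothing vendor-specific).
bool has_fp16(cl_device_id dev) {
    size_t n = 0;
    clGetDeviceInfo(dev, CL_DEVICE_EXTENSIONS, 0, nullptr, &n);
    std::string ext(n, '\0');
    clGetDeviceInfo(dev, CL_DEVICE_EXTENSIONS, n, &ext[0], nullptr);
    return ext.find("cl_khr_fp16") != std::string::npos;
}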
As said, a lot of things remain unanswered for me.. hope Andrew Lauritzen can answer most if not all of them :rolleyes:
 
Thanks, so it looks like throughput has been doubled compared to Gen7.5 and I would assume the theoretical peak is now at 1 tri/clk. I'm guessing that wouldn't scale up with additional slices? i.e. GT4e won't be pushing 3 tri/clk?

One of the presentations says that in some SKUs the Unslice can clock higher than the slices to provide more geometry throughput and more bandwidth.
 
MSAA sample-indexing (more info please? is it equal to the AMD D3D11 ext in AMDDXextAPI.h: SetSingleSampleRead(ID3D10Resource* pResource, BOOL singleSample) = 0; ?)
Bit different - this allows a pixel-rate shader to individually write separate sub-samples of a bound MSAA render-target. In current APIs this is only possible by running the whole shader at sample rate.

Fast Sampling (Coarse LOD) (is that equal to the undocumented GL_INTEL_multi_rate_fragment_shader extension that's present in the driver? If not, more info on what this extension may provide? Related to https://software.intel.com/en-us/articles/coarse-pixel-shading ?)
Not related to coarse pixel shading or multi-rate shading. It's basically a texture sampler feature that allows the sampler to dynamically take the "fast path" if explicit LODs or derivatives are "close enough". Particularly useful for deferred texturing/shadowing, as normally if you use explicit gradients in those passes (as you have to) you'll take a slower path through the sampler. This allows you to recover the fast path dynamically on the pixels that are coherent.
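Purely to illustrate the idea (a conceptual sketch based on the description above, not documented hardware behaviour; the names and the tolerance are assumptions): the win is being able to fall back to the cheap implicit-LOD path whenever the explicit gradients across a quad would have produced the same LOD anyway.

#include <algorithm>
#include <cmath>

struct Gradients { float dudx, dudy, dvdx, dvdy; };   // in texel units

// Standard mip LOD selection from gradients.
float LodFromGradients(const Gradients& g) {
    float rho = std::max(std::sqrt(g.dudx * g.dudx + g.dvdx * g.dvdx),
                         std::sqrt(g.dudy * g.dudy + g.dvdy * g.dvdy));
    return std::log2(std::max(rho, 1e-8f));
}

// If the explicit LODs of all four pixels in a 2x2 quad agree closely enough,
// a single cheap lookup per quad gives the same answer as the slower
// per-pixel explicit-gradient path.
bool QuadIsCoherent(const Gradients quad[4], float tolerance = 0.1f) {
    float lo = LodFromGradients(quad[0]), hi = lo;
    for (int i = 1; i < 4; ++i) {
        float lod = LodFromGradients(quad[i]);
        lo = std::min(lo, lod);
        hi = std::max(hi, lod);
    }
    return (hi - lo) <= tolerance;   // coherent -> hypothetical fast path
}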

Quilted Textures (more info, please?)
Don't have any good documentation to point you at but this is basically a method of stitching large textures together such that you can go beyond the regular 64k x 64k limit - particularly useful for sparse textures of course.
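To illustrate the stitching idea, a hypothetical sketch (not a real API, and the tile layout is an assumption; 64K is the usual per-dimension texture limit referred to above):

#include <cstdint>

// "Quilting" sketch: a huge virtual texture is stitched together from tiles
// that each stay within the normal 64K x 64K per-texture limit.
constexpr uint32_t kMaxDim = 65536;

struct QuiltCoord {
    uint32_t tileX, tileY;    // which tile of the quilt
    uint32_t localX, localY;  // texel within that tile
};

QuiltCoord AddressQuilt(uint64_t virtualX, uint64_t virtualY) {
    QuiltCoord c;
    c.tileX  = static_cast<uint32_t>(virtualX / kMaxDim);
    c.tileY  = static_cast<uint32_t>(virtualY / kMaxDim);
    c.localX = static_cast<uint32_t>(virtualX % kMaxDim);
    c.localY = static_cast<uint32_t>(virtualY % kMaxDim);
    return c;
}
// e.g. a 256K x 256K virtual texture becomes a 4 x 4 quilt of 64K tiles, which
// pairs naturally with sparse/tiled resources so only resident tiles need memory.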

One of the presentations says that in some SKUs the Unslice can clock higher than the slices to provide more geometry throughput and more bandwidth.
Yes although this isn't relevant to the GT2 SKUs (i.e. the HD 530). More likely what you're seeing is the new "autostrip" stuff - i.e. previous architectures could have gotten to similar rates if you use triangle *strips*, but not on triangle *lists*. Skylake has hardware that allows it to attain similar rates with "strip-like" triangle lists.
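To make the strip/list distinction concrete, a small illustrative sketch (independent of any Skylake specifics; winding alternation is ignored for clarity). A "strip-like" list is simply a list whose consecutive triangles keep reusing the previous two vertices:

#include <cstdint>
#include <vector>

// The same row of n triangles expressed two ways: a strip needs n + 2 indices
// and shares two vertices between neighbours by construction; a list needs
// 3 * n indices but can express the identical sharing pattern.
std::vector<uint32_t> RowAsStrip(uint32_t n) {
    std::vector<uint32_t> idx;
    for (uint32_t i = 0; i < n + 2; ++i) idx.push_back(i);  // 0,1,2,3,...
    return idx;
}

std::vector<uint32_t> RowAsStripLikeList(uint32_t n) {
    std::vector<uint32_t> idx;
    for (uint32_t t = 0; t < n; ++t) {
        // Each triangle reuses the last two vertices of the previous one.
        idx.push_back(t);
        idx.push_back(t + 1);
        idx.push_back(t + 2);
    }
    return idx;  // e.g. n = 3 -> 0,1,2, 1,2,3, 2,3,4
}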
 
Thanks Andrew..
Really interested in the DX extension for GPU Enqueue Kernels..
Also, related to the Floating Point atomics and Render Target Reads DirectX extensions: are they coming soon (publicly, not under NDA), or is it only a possibility that they get exposed?
I ask because the Haswell PixelSync and InstantAccess samples were ready before Haswell was even on sale..
To finish: seeing AMD AGS 3.0 released yesterday, they have extensions for multi-draw indirect and depth bounds test under D3D11. It seems Intel GPUs are the only desktop ones lacking the depth bounds test feature; it could be nice to see it implemented on upcoming Intel GPUs if it isn't much work..
 
Bit different - this allows a pixel-rate shader to individually write separate sub-samples of a bound MSAA render-target. In current APIs this is only possible by running the whole shader at sample rate.
Sounds awesome. I will definitely find uses for this :)
Really interested in the DX extension for GPU Enqueue Kernels..
Thumbs up! This has been one of my top requests to get into DirectX for some years now.
 
Thanks so much for your input. You have a really good understanding of this. Myself, I'm a little disappointed Intel is not showing an S chip with GT4e as a 91 W K part. It looks like the only thing I can buy is an H chip, which likely means I have to buy a notebook or an all-in-one. I like the idea of a powerful iGPU, but a gaming rig without a dGPU is pretty hard to pass off as a performance PC. I feel that AMD/NV at 14/16 nm is going to be interesting. AMD has to aim high, as does NV, as neither knows how far the other will push the 14/16 nm arch. When it comes to HBM2, I feel that maybe, just maybe, NV will use the higher-bandwidth HMC, perhaps even using Intel's logic layer. These are indeed interesting times. I would like to see Intel's 10nm use QWFETs. Intel, with its 10nm delay, may just do that. The others get to FinFET and Intel moves to QWFET, that's moving the goal posts. That's if CERN doesn't make an error splitting the God particle on Sept 23.
 
Myself, I'm a little disappointed Intel is not showing an S chip with GT4e as a 91 W K part. It looks like the only thing I can buy is an H chip, which likely means I have to buy a notebook or an all-in-one.
I am really disappointed if a 72 EU high end desktop Skylake doesn't arrive later. As a rendering programmer, I would prefer to have a high end desktop CPU with high end integrated GPU all in one. It is hard to write and optimize DX12 explicit multiadapter (discrete + integrated) code without a chip like this. Also it would be much nicer to optimize rendering code for Intel GPUs using my main workstation. This kind of high end desktop CPU+GPU would lead to products that are better optimized for Intel GPUs.
 
I am really disappointed if a 72 EU high end desktop Skylake doesn't arrive later. As a rendering programmer, I would prefer to have a high end desktop CPU with high end integrated GPU all in one. It is hard to write and optimize DX12 explicit multiadapter (discrete + integrated) code without a chip like this. Also it would be much nicer to optimize rendering code for Intel GPUs using my main workstation. This kind of high end desktop CPU+GPU would lead to products that are better optimized for Intel GPUs.

It would also be very popular with consumers. There's a market out there pining for a high end upgrade that's actually worthwhile. The combo of high end Skylake + L4, as well as a powerful iGPU for multi-adapter, could be just that.
 
as well as a powerful iGPU for multi-adapter
Some do not even need multiadapter.
Until VR arrives in force, GT4e would probably be enough for my minimalistic gaming needs – I do not need high quality, I just want recent games to run at 30-60fps@1080p on low (maybe medium) quality settings.
 
What I would like to see is the ability to output iGPU/dGPU graphics through either monitor output.

Like, my current mobo only has an HDMI 1.2 (or whatever) compatible output that maxes out at 1080p, while my monitor is 1440p; if I could run on the iGPU while doing regular Windows tasks (internet browsing, writing and whatnot), or maybe even some light gaming, while still hooked up to my regular AMD GPU, that would be quite helpful.

We've had this on laptops for a while, but the only one who has done it cross-vendor AFAIK (i.e. Intel CPU, AMD/NV graphics) is friggin' Apple... which doesn't help me! lol. So we could use some work on this front, I think: getting seamless graphics switching implemented as a real, actual thing straight into the OS and graphics drivers.

Are there any efforts being made to get this to happen at some point in the future?
 
What I would like to see is the ability to output iGPU/dGPU graphics through either monitor output.
That already works in Win10, and in a far more robust way than how it works in most laptops. It mostly already worked in Win8.1. An application can separately enumerate which adapter to use for rendering regardless of which monitor the window is being displayed on, and the compositor will do whatever work is needed. You can connect multiple displays to whatever combinations of outputs you want and applications are free to use any/all GPUs for rendering.

https://twitter.com/AndrewLauritzen/status/636708414677192704
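To make the enumeration part concrete, here is a minimal sketch (illustrative only, not Intel or Microsoft sample code) of picking a specific adapter with DXGI and creating a D3D12 device on it, independent of which display the window ends up on. The same enumeration is also the starting point for the DX12 explicit multi-adapter setup mentioned earlier in the thread:

#include <dxgi1_4.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create a D3D12 device on the first hardware adapter from a given vendor
// (0x8086 = Intel, 0x10DE = NVIDIA, 0x1002 = AMD), regardless of which
// monitor the swap chain will eventually be presented on.
ComPtr<ID3D12Device> CreateDeviceOnVendor(UINT vendorId) {
    ComPtr<IDXGIFactory4> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        DXGI_ADAPTER_DESC1 desc = {};
        adapter->GetDesc1(&desc);
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) continue;  // skip WARP/software
        if (desc.VendorId != vendorId) continue;

        ComPtr<ID3D12Device> device;
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                        IID_PPV_ARGS(&device))))
            return device;  // render here; the compositor handles cross-adapter display
    }
    return nullptr;
}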
 
That already works in Win10, and in a far more robust way than how it works in most laptops.
Really? That sounds sweet, except when I tried enabling both GPUs in my system, the Intel driver created a "virtual monitor" which could not be disabled as far as I could tell. Windows then put another, off-screen desktop on that monitor which I could not access (but could still lose my mouse pointer into, since it attached to an edge of the screen), and which randomly caused my monitor to display nothing but solid black when coming out of power save or being turned on.

I wouldn't call that very robust... ;) Maybe it's just Windows being quirky on my particular install, I dunno. I had audio pops and framerate hitches when gaming with the default AMD driver installed during the Win10 upgrade; maybe the Intel driver could use a manual update too.

You can connect multiple displays to whatever combinations of outputs you want and applications are free to use any/all GPUs for rendering.
One monitor for both GPUs is fine with me... ;)
 
That already works in Win10, and in a far more robust way than how it works in most laptops. It mostly already worked in Win8.1. An application can separately enumerate which adapter to use for rendering regardless of which monitor the window is being displayed on, and the compositor will do whatever work is needed. You can connect multiple displays to whatever combinations of outputs you want and applications are free to use any/all GPUs for rendering.

https://twitter.com/AndrewLauritzen/status/636708414677192704

So could we potentially be facing the pretty funny situation of people thinking there is something wrong with their new high end GPU because the game is running on their 520 without them even knowing? And is there a way for us to easily select manually which GPU a game/application runs on, or is that completely down to the app?
 
Some do not even need multiadapter.
Until VR arrives in force, GT4e would probably be enough for my minimalistic gaming needs – I do not need high quality, I just want recent games to run at 30-60fps@1080p on low (maybe medium) quality settings.

I doubt that. Nobody is going to buy a $400+ CPU that is going to offer console performance or less if they can buy a console for $400 or less and have the comfort of knowing it will run all games for the next 5 years.

I see anybody remotely interested in playing games on PC either going for a high end CPU and a high end GPU, or for a hardly noticeably slower i5 or i7 and spending the $100-200 they save on a GPU. The latter is undoubtedly going to give you much better performance for the same money.
 
Except for casual gamers who need a PC for other purposes, or for a family PC, where it is a reasonable compromise to make.
 
Except for casual gamers who need a PC for other purposes, or for a family PC, where it is a reasonable compromise to make.

Exactly that group has no benefit from a CPU like this. This is going to be a high end CPU with a high end price. They are going to be paying something like $200 more for no meaningful amount of extra performance on the CPU side compared to those popular i5 models if you are an average gamer or PC user.

A $200 GPU is going to get those people a lot more overall performance, with the added benefit that they can upgrade the GPU after a couple of years while the CPU is probably still good enough. Try doing that with your iGPU without also having to buy a new mainboard and maybe memory as well.

It doesn't make sense for the average desktop user. Laptops, NUCs, etc., systems where you can't easily integrate another chip, that's where you want to use your fast iGPU. Not in a desktop, unless you want the absolute best performance possible.
 
Nobody is going to buy a $400+ CPU that is going to offer console performance or less if they can buy a console for $400 or less and have the comfort of knowing it will run all games for the next 5 years.
You want a decent PC if you have hobbies like photography, painting, home video editing, etc. Photoshop, Lightroom, Premiere and transcoding software demand quite a lot. Not everyone is a hardcore gamer, but many still want to game occasionally. A good CPU and lots of RAM are important for many people. A fast integrated GPU is nice, since it doesn't cost much extra and doesn't require a big case and a big power supply. The big EDRAM L4 cache also speeds up things other than gaming.

My original point was about professional usage. A high end integrated GPU with a big L4 is important for some special fields, such as graphics programming. It doesn't have to be a highly clocked 4 core i7. I would actually prefer a low clocked 8 core Xeon with the 72 EU GPU + EDRAM. It would both compile fast and be useful for integrated GPU shader optimization.
 