Intel Gen9 Skylake

Also it seems some Gen9 manuals are missing:
First, at https://software.intel.com/en-us/articles/intel-graphics-developers-guides we can find the Gen9 compute architecture guide, but the Gen9 graphics API developer guide is missing (note that the Gen8 graphics API dev guide is already there)..
Also, at https://01.org/linuxgraphics/documentation/hardware-specification-prms we can only find Broadwell manuals.. seeing how recently the BDW manuals got posted, it looks like a long wait there..
Finally, reading:
http://www.intel.com/content/www/us...ktop-6th-gen-core-family-datasheet-vol-1.html
OpenGL 5.0 -> Vulkan? Also, I hope this implies full OpenGL 4.5 support + the new 2015 ARB extensions, since 4.5 < 5.0 :D
But the interesting meat is a few lines later:
I can find things there (all on page 31) that aren't mentioned anywhere else (i.e. not in HD 530 reviews or IDF presentations); I put them in bold below:
It seems they are also exposing extra GPU features in DirectX (D3D11?) via extensions.
I assume it's similar to how they exposed PixelSync two years ago on D3D11 (please correct me if I'm wrong; anyway, some of these things aren't even supported in D3D12, so perhaps they are D3D12 extensions)..
Render Target Reads (in the OpenGL driver this is exposed via GL_EXT_shader_framebuffer_fetch; see: https://gfxbench.com/device.jsp?benchmark=gfx31&os=Windows&api=gl&D=Intel(R)+HD+Graphics+530&testgroup=info)
Floating Point atomics (nice, already exposed on NV GPUs via an NVAPI extension..)
MSAA sample-indexing (more info please? is it equal to the AMD D3D11 ext in AMDDXextAPI.h: SetSingleSampleRead(ID3D10Resource* pResource, BOOL singleSample) = 0; ?)
Fast Sampling (Coarse LOD) (is that equal to the undocumented GL_INTEL_multi_rate_fragment_shader extension that's present in the driver? If not, more info on what this extension may provide? Related to https://software.intel.com/en-us/articles/coarse-pixel-shading ?)
Quilted Textures (more info, please?)
GPU Enqueue Kernels (is that like OpenCL 2.0 launching kernels from kernels / CUDA dynamic parallelism, but in DirectCompute? see the sketch after this list)
GPU Signals processing unit (interested to see more info on this..)
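For comparison, here is roughly what kernel-from-kernel launch looks like in OpenCL 2.0 today. This is only a minimal sketch to illustrate the question above; whether an Intel DirectCompute extension would look anything like it is pure speculation, and the kernel names (child_work, parent) are made up:

// Sketch: OpenCL 2.0 device-side enqueue (the feature the question compares against).
// A parent kernel launches child work directly on the GPU, with no host round-trip.
#include <CL/cl.h>

static const char* kSrc = R"CLC(
kernel void child_work(global float* data) {
    data[get_global_id(0)] *= 2.0f;
}
kernel void parent(global float* data, int n) {
    if (get_global_id(0) == 0) {
        enqueue_kernel(get_default_queue(),            // on-device default queue
                       CLK_ENQUEUE_FLAGS_WAIT_KERNEL,  // child runs after parent completes
                       ndrange_1D(n),
                       ^{ child_work(data); });        // block capturing the kernel args
    }
}
)CLC";

// Host side: the on-device default queue must be created up front.
// Build kSrc with "-cl-std=CL2.0".
cl_command_queue make_device_queue(cl_context ctx, cl_device_id dev) {
    cl_queue_properties props[] = {
        CL_QUEUE_PROPERTIES,
        CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE |
            CL_QUEUE_ON_DEVICE | CL_QUEUE_ON_DEVICE_DEFAULT,
        0};
    cl_int err = CL_SUCCESS;
    return clCreateCommandQueueWithProperties(ctx, dev, props, &err);
}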

Also, it seems OpenCL has the cl_khr_fp16 extension, and fp16 is coming to D3D via an optional cap bit, but is it coming to OpenGL in some way, as there is no GL_ARB_fp16 extension?
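For reference, a minimal sketch of what cl_khr_fp16 actually enables in OpenCL C; the kernel is purely illustrative and not tied to any particular driver:

#include <CL/cl.h>
#include <string>

// Without cl_khr_fp16, "half" is storage-only (vload_half/vstore_half); with the
// extension enabled it becomes a full arithmetic type.
static const char* kFp16Src = R"CLC(
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
kernel void scale_half(global half* data, float s) {
    size_t i = get_global_id(0);
    data[i] = data[i] * (half)s;   // native fp16 multiply
}
)CLC";

// Host-side capability check (plain OpenCL, nothing vendor-specific).
bool has_fp16(cl_device_id dev) {
    size_t n = 0;
    clGetDeviceInfo(dev, CL_DEVICE_EXTENSIONS, 0, nullptr, &n);
    std::string ext(n, '\0');
    clGetDeviceInfo(dev, CL_DEVICE_EXTENSIONS, n, &ext[0], nullptr);
    return ext.find("cl_khr_fp16") != std::string::npos;
}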
As said, a lot of things remain unanswered for me.. hope Andrew Lauritzen can answer most if not all of them :rolleyes:
 
Thanks, so it looks like throughput has been doubled compared to Gen7.5 and I would assume the theoretical peak is now at 1 tri/clk. I'm guessing that wouldn't scale up with additional slices? i.e. GT4e won't be pushing 3 tri/clk?

One of the presentations says that in some SKUs the Unslice can clock higher than the slices to provide more geometry throughput and more bandwidth.
 
MSAA sample-indexing (more info please? is it equal to the AMD D3D11 ext in AMDDXextAPI.h: SetSingleSampleRead(ID3D10Resource* pResource, BOOL singleSample) = 0; ?)
Bit different - this allows a pixel-rate shader to individually write separate sub-samples of a bound MSAA render-target. In current APIs this is only possible by running the whole shader at sample rate.

Fast Sampling (Coarse LOD) (is that equal to the undocumented GL_INTEL_multi_rate_fragment_shader extension that's present in the driver? If not, more info on what this extension may provide? Related to https://software.intel.com/en-us/articles/coarse-pixel-shading ?)
Not related to coarse pixel shading or multi-rate shading. It's basically a texture sampler feature that allows the sampler to dynamically take the "fast path" if explicit LODs or derivatives are "close enough". Particularly useful for deferred texturing/shadowing, as normally if you use explicit gradients in those passes (as you have to) you'll take a slower path through the sampler. This allows you to recover the fast path dynamically on the pixels that are coherent.
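Purely to illustrate the idea (a conceptual sketch based on the description above, not documented hardware behaviour; the names and the tolerance are assumptions): the win is being able to fall back to the cheap implicit-LOD path whenever the explicit gradients across a quad would have produced the same LOD anyway.

#include <algorithm>
#include <cmath>

struct Gradients { float dudx, dudy, dvdx, dvdy; };   // in texel units

// Standard mip LOD selection from gradients.
float LodFromGradients(const Gradients& g) {
    float rho = std::max(std::sqrt(g.dudx * g.dudx + g.dvdx * g.dvdx),
                         std::sqrt(g.dudy * g.dudy + g.dvdy * g.dvdy));
    return std::log2(std::max(rho, 1e-8f));
}

// If the explicit LODs of all four pixels in a 2x2 quad agree closely enough,
// a single cheap lookup per quad gives the same answer as the slower
// per-pixel explicit-gradient path.
bool QuadIsCoherent(const Gradients quad[4], float tolerance = 0.1f) {
    float lo = LodFromGradients(quad[0]), hi = lo;
    for (int i = 1; i < 4; ++i) {
        float lod = LodFromGradients(quad[i]);
        lo = std::min(lo, lod);
        hi = std::max(hi, lod);
    }
    return (hi - lo) <= tolerance;   // coherent -> hypothetical fast path
}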

Quilted Textures (more info, please?)
Don't have any good documentation to point you at but this is basically a method of stitching large textures together such that you can go beyond the regular 64k x 64k limit - particularly useful for sparse textures of course.
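To illustrate the stitching idea, a hypothetical sketch (not a real API, and the tile layout is an assumption; 64K is the usual per-dimension texture limit referred to above):

#include <cstdint>

// "Quilting" sketch: a huge virtual texture is stitched together from tiles
// that each stay within the normal 64K x 64K per-texture limit.
constexpr uint32_t kMaxDim = 65536;

struct QuiltCoord {
    uint32_t tileX, tileY;    // which tile of the quilt
    uint32_t localX, localY;  // texel within that tile
};

QuiltCoord AddressQuilt(uint64_t virtualX, uint64_t virtualY) {
    QuiltCoord c;
    c.tileX  = static_cast<uint32_t>(virtualX / kMaxDim);
    c.tileY  = static_cast<uint32_t>(virtualY / kMaxDim);
    c.localX = static_cast<uint32_t>(virtualX % kMaxDim);
    c.localY = static_cast<uint32_t>(virtualY % kMaxDim);
    return c;
}
// e.g. a 256K x 256K virtual texture becomes a 4 x 4 quilt of 64K tiles, which
// pairs naturally with sparse/tiled resources so only resident tiles need memory.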

One of the presentations says that in some SKUs the Unslice can clock higher than the slices to provide more geometry throughput and more bandwidth.
Yes although this isn't relevant to the GT2 SKUs (i.e. the HD 530). More likely what you're seeing is the new "autostrip" stuff - i.e. previous architectures could have gotten to similar rates if you use triangle *strips*, but not on triangle *lists*. Skylake has hardware that allows it to attain similar rates with "strip-like" triangle lists.
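To make the strip/list distinction concrete, a small illustrative sketch (independent of any Skylake specifics; winding alternation is ignored for clarity). A "strip-like" list is simply a list whose consecutive triangles keep reusing the previous two vertices:

#include <cstdint>
#include <vector>

// The same row of n triangles expressed two ways: a strip needs n + 2 indices
// and shares two vertices between neighbours by construction; a list needs
// 3 * n indices but can express the identical sharing pattern.
std::vector<uint32_t> RowAsStrip(uint32_t n) {
    std::vector<uint32_t> idx;
    for (uint32_t i = 0; i < n + 2; ++i) idx.push_back(i);  // 0,1,2,3,...
    return idx;
}

std::vector<uint32_t> RowAsStripLikeList(uint32_t n) {
    std::vector<uint32_t> idx;
    for (uint32_t t = 0; t < n; ++t) {
        // Each triangle reuses the last two vertices of the previous one.
        idx.push_back(t);
        idx.push_back(t + 1);
        idx.push_back(t + 2);
    }
    return idx;  // e.g. n = 3 -> 0,1,2, 1,2,3, 2,3,4
}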
 
Thanks Andrew..
Really interested in the DX extension for GPU Enqueue Kernels..
Also, related to the Floating Point atomics and Render Target Reads DirectX extensions: are they coming soon (publicly, not under NDA), or is it only a possibility that they get exposed?
I ask because the Haswell PixelSync and InstantAccess samples were ready before Haswell was even on sale..
To finish: seeing AMD AGS 3.0 released yesterday, they have extensions for multi-draw indirect and depth bounds test under D3D11. It seems Intel GPUs are the only desktop ones lacking the depth bounds test feature; it could be nice to see it implemented on upcoming Intel GPUs if it isn't much work..
 
Bit different - this allows a pixel-rate shader to individually write separate sub-samples of a bound MSAA render-target. In current APIs this is only possible by running the whole shader at sample rate.
Sounds awesome. I will definitely find uses for this :)
Really interested in the DX extension for GPU Enqueue Kernels..
Thumbs up! This has been one of my top requests to get into DirectX for some years now.
 
Thanks so much for your input. You have a really good understanding of this. Myself, I'm a little disappointed Intel is not showing an S chip with GT4e as a 91 W K part. It looks like the only thing I can buy is an H chip, which likely means I have to buy a notebook or an all-in-one. I like the idea of a powerful iGPU, but a gaming rig without a dGPU is pretty hard to pass off as a performance PC. I feel that AMD/NV at 14/16 nm is going to be interesting. AMD has to aim high, as does NV, as neither knows how far the other will push the 14/16 nm arch. When it comes to HBM2, I feel that maybe, just maybe, NV will use the higher-bandwidth HMC, perhaps even using Intel's logic layer. These are indeed interesting times. I would like to see Intel's 10nm use QWFETs. Intel, with its 10nm delay, may just do that. The others get to FinFET and Intel moves to QWFET, that's moving the goal posts. That's if CERN doesn't make an error splitting the God particle on Sept 23.
 
Myself, I'm a little disappointed Intel is not showing an S chip with GT4e as a 91 W K part. It looks like the only thing I can buy is an H chip, which likely means I have to buy a notebook or an all-in-one.
I am really disappointed if a 72 EU high end desktop Skylake doesn't arrive later. As a rendering programmer, I would prefer to have a high end desktop CPU with high end integrated GPU all in one. It is hard to write and optimize DX12 explicit multiadapter (discrete + integrated) code without a chip like this. Also it would be much nicer to optimize rendering code for Intel GPUs using my main workstation. This kind of high end desktop CPU+GPU would lead to products that are better optimized for Intel GPUs.
 
I am really disappointed if a 72 EU high end desktop Skylake doesn't arrive later. As a rendering programmer, I would prefer to have a high end desktop CPU with high end integrated GPU all in one. It is hard to write and optimize DX12 explicit multiadapter (discrete + integrated) code without a chip like this. Also it would be much nicer to optimize rendering code for Intel GPUs using my main workstation. This kind of high end desktop CPU+GPU would lead to products that are better optimized for Intel GPUs.

It would also be very popular with consumers. There's a market out there pining for a high end upgrade that's actually worthwhile. The combo of high end Skylake + L4, as well as a powerful iGPU for multi-adapter, could be just that.
 
as well as a powerful iGPU for multi-adapter
Some do not even need multiadapter.
Until VR arrives in force, GT4e would probably be enough for my minimalistic gaming needs – I do not need high quality, I just want recent games to run at 30-60fps@1080p on low (maybe medium) quality settings.
 
What I would like to see is the ability to output iGPU/dGPU graphics through either monitor output.

Like, my current mobo only has an HDMI 1.2 (or whatever) compatible output that maxes out at 1080p, while my monitor is 1440p; if I could run on the iGPU while doing regular Windows tasks (internet browsing, writing and whatnot), or maybe even some light gaming, while still hooked up to my regular AMD GPU, that would be quite helpful.

We've had this on laptops for a while, but the only one who has done it cross-vendor AFAIK (i.e. Intel CPU, AMD/NV graphics) is friggin' Apple... which doesn't help me! lol. So we could use some work on this front, I think: getting seamless graphics switching implemented as a real, actual thing straight into the OS and graphics drivers.

Are there any efforts being made to get this to happen at some point in the future?
 
What I would like to see is the ability to output iGPU/dGPU graphics through either monitor output.
That already works in Win10, and in a far more robust way than how it works in most laptops. It mostly already worked in Win8.1. An application can separately enumerate which adapter to use for rendering regardless of which monitor the window is being displayed on, and the compositor will do whatever work is needed. You can connect multiple displays to whatever combinations of outputs you want and applications are free to use any/all GPUs for rendering.

https://twitter.com/AndrewLauritzen/status/636708414677192704
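To make the enumeration part concrete, here is a minimal sketch (illustrative only, not Intel or Microsoft sample code) of picking a specific adapter with DXGI and creating a D3D12 device on it, independent of which display the window ends up on. The same enumeration is also the starting point for the DX12 explicit multi-adapter setup mentioned earlier in the thread:

#include <dxgi1_4.h>
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Create a D3D12 device on the first hardware adapter from a given vendor
// (0x8086 = Intel, 0x10DE = NVIDIA, 0x1002 = AMD), regardless of which
// monitor the swap chain will eventually be presented on.
ComPtr<ID3D12Device> CreateDeviceOnVendor(UINT vendorId) {
    ComPtr<IDXGIFactory4> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    ComPtr<IDXGIAdapter1> adapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i) {
        DXGI_ADAPTER_DESC1 desc = {};
        adapter->GetDesc1(&desc);
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE) continue;  // skip WARP/software
        if (desc.VendorId != vendorId) continue;

        ComPtr<ID3D12Device> device;
        if (SUCCEEDED(D3D12CreateDevice(adapter.Get(), D3D_FEATURE_LEVEL_11_0,
                                        IID_PPV_ARGS(&device))))
            return device;  // render here; the compositor handles cross-adapter display
    }
    return nullptr;
}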
 
That already works in Win10, and in a far more robust way than how it works in most laptops.
Really? That sounds sweet, except when I tried enabling both GPUs in my system, the Intel driver created a "virtual monitor" which could not be disabled as far as I could tell. Windows then put another, off-screen desktop on that monitor which I could not access (but could still lose my mouse pointer into, since it attached to an edge of the screen), and which randomly caused my monitor to display nothing but solid black when coming out of power save or being turned on.

I wouldn't call that very robust... ;) Maybe it's just Windows being quirky on my particular install, I dunno. I had audio pops and framerate hitches when gaming with the default AMD driver installed during the Win10 upgrade; maybe the Intel driver could use a manual update too.

You can connect multiple displays to whatever combinations of outputs you want and applications are free to use any/all GPUs for rendering.
One monitor for both GPUs is fine with me... ;)
 
That already works in Win10, and in a far more robust way than how it works in most laptops. It mostly already worked in Win8.1. An application can separately enumerate which adapter to use for rendering regardless of which monitor the window is being displayed on, and the compositor will do whatever work is needed. You can connect multiple displays to whatever combinations of outputs you want and applications are free to use any/all GPUs for rendering.

https://twitter.com/AndrewLauritzen/status/636708414677192704

So could we potentially be facing the pretty funny situation of people thinking there is something wrong with their new high end GPU because the game is running on their 520 without them even knowing? And is there a way for us to easily select manually which GPU a game/application runs on, or is that completely down to the app?
 
Some do not even need multiadapter.
Until VR arrives in force, GT4e would probably be enough for my minimalistic gaming needs – I do not need high quality, I just want recent games to run at 30-60fps@1080p on low (maybe medium) quality settings.

I doubt that. Nobody is going to buy a $400+ CPU that is going to offer console performance or less if they can buy a console for $400 or less and have the comfort of knowing it will run all games for the next 5 years.

I see anybody remotely interested in playing games on PC either going for a high end CPU and a high end GPU, or for a hardly noticeably slower i5 or i7 and spending the $100-200 they save on a GPU. The latter is undoubtedly going to give you much better performance for the same money.
 
Except for casual gamers who need a PC for other purposes, or for a family PC, where it is a reasonable compromise to make.
 
Except for casual gamers who need a PC for other purposes, or for a family PC, where it is a reasonable compromise to make.

Exactly that group has no benefit from a CPU like this. This is going to be a high end CPU with a high end price. They are going to be paying something like $200 more for no meaningful amount of extra performance on the CPU side compared to those popular i5 models if you are an average gamer or PC user.

A $200 GPU is going to get those people a lot more overall performance, with the added benefit that they can upgrade the GPU after a couple of years while the CPU is probably still good enough. Try doing that with your iGPU without also having to buy a new mainboard and maybe memory as well.

It doesn't make sense for the average desktop user. Laptops, NUCs, etc., systems where you can't easily integrate another chip, that's where you want to use your fast iGPU. Not in a desktop, unless you want the absolute best performance possible.
 
Nobody is going to buy a $400+ CPU that is going to offer console performance or less if they can buy a console for $400 or less and have the comfort of knowing it will run all games for the next 5 years.
You want a decent PC if you have hobbies like photography, painting, home video editing, etc. Photoshop, Lightroom, Premiere and transcoding software demand quite a lot. Not everyone is a hardcore gamer, but many still want to game occasionally. A good CPU and lots of RAM are important for many people. A fast integrated GPU is nice, since it doesn't cost much extra and doesn't require a big case and a big power supply. The big EDRAM L4 cache also speeds up things other than gaming.

My original point was about professional usage. A high end integrated GPU with a big L4 is important for some special fields, such as graphics programming. It doesn't have to be a highly clocked 4 core i7. I would actually prefer a low clocked 8 core Xeon with the 72 EU GPU + EDRAM. It would both compile fast and be useful for integrated GPU shader optimization.
 