> Interesting that there was no mention of dedicated transcendental units. I guess those instructions will run on the ALUs as well.
They'll be there. Just not mentioned here.
> I am wondering if the TMUs are still in the CUs. Not a single slide mentioned them.
The TMUs ("Filter") are mentioned in this slide:
I think it should be (a), else it would break the DX11 spec.
> If a spec requires a certain amount of local memory, then not exposing that amount is spec breaking.
If a kernel declares that a workgroup needs only 16 kB of it, you can run 4 groups without breaking any spec.
I think it's one instruction from four threads over four cycles. The batch/workgroup size is still 64.
If instructions are sourced from 4 different threads, they might as well be from 4 different IPs each. I think the organization is similar to Fermi, which dual-issues from 2 warps. Here it quad-issues from 4 different wavefronts.
> What about kernels written assuming 32 KB of local memory (DX11)?
You can run at least two.
> While that would work, I don't think any IHV will try such a solution.
What do you mean? Isn't that how it works now? It does not matter whether it is NVIDIA or AMD, R700/Evergreen/NI or G80/Fermi. It always works the same way.
> While that would work, I don't think any IHV will try such a solution.
Why not? We know the LDS usage at compilation time, so we can easily manage LDS resources at dispatch time either in the GPU or in the driver.
http://twitter.com/#!/DKrwt
David Kanter said: don't know about schedule, but probably 28nm so late this year maybe. VLIW4 was a small change, but a precursor to the new uarch.
If a kernel declares 32KiB of LDS usage, then you would only get one wavefront per SIMD, but if you only used 1 KiB of LDS then we could schedule up to 32 wavefronts per SIMD.
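To make the arithmetic concrete, here is a rough sketch of that dispatch-time bookkeeping: divide the LDS budget by the amount the kernel declared at compile time. The function name `resident_groups`, the `hw_cap` parameter, and both budget figures are assumptions for illustration only. The 64 KiB figure is inferred from the "16 kB, 4 groups" numbers earlier in the thread, and the 32 KiB figure from the numbers in this post. Neither is a confirmed hardware spec.

```python
KIB = 1024

def resident_groups(lds_budget_bytes, lds_per_group_bytes, hw_cap=None):
    """How many groups (or wavefronts) fit in the LDS budget at once.

    lds_per_group_bytes is the amount declared by the kernel, known at
    compile time. hw_cap is a hypothetical hardware limit on resident
    groups that applies regardless of LDS usage (None = no limit modelled).
    """
    if lds_per_group_bytes <= 0:
        # Kernel uses no LDS, so only the hardware cap limits it.
        return hw_cap
    fit = lds_budget_bytes // lds_per_group_bytes
    return fit if hw_cap is None else min(fit, hw_cap)

# Assumed 64 KiB budget (inferred from "16 kB -> 4 groups" in the thread).
print(resident_groups(64 * KIB, 16 * KIB))  # 4
print(resident_groups(64 * KIB, 32 * KIB))  # 2

# Assumed 32 KiB budget per SIMD (inferred from the post above).
print(resident_groups(32 * KIB, 32 * KIB))  # 1
print(resident_groups(32 * KIB, 1 * KIB))   # 32
```

Nothing here breaks a spec that requires 32 KB per group: a kernel that declares the full amount still gets it, it just leaves room for fewer groups.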