Exactly. If this is truly 67xx, I don't think they can or will simply yank the NI core out and replace it with the Evergreen shader core.
I can't reconcile this sentence...
The shader core per se is relatively easy, isn't it?
... with this sentence. I'm not sure what you're saying here.
Otherwise they might as well produce a Cedar-like NI on 40nm to save some R&D.
RV740 has ALU-level (burst fetch) and ROP-level (doubled ROPs per MC) advances beyond any of the prior R7xx GPUs.
One thing I'm musing over is the peculiar dual-tessellator architecture of Evergreen. There's the old-style tessellator for compatibility with R600's tessellator, which is limited to a tessellation factor of 16, and then there's the new "more programmable" (which I can't see, frankly) tessellator that handles higher tessellation factors and D3D11.
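To put that factor limit in perspective, here's a rough back-of-envelope sketch. It assumes an idealised quad patch with every edge and inside factor equal to an integer N, treated as an N x N grid of sub-quads (2·N² triangles); that's a simplification, not a claim about exactly what either Evergreen tessellator emits. The only fixed numbers are the 16 limit from above and D3D11's maximum tessellation factor of 64; `quad_patch_triangles` is a hypothetical helper.

```python
# Idealised model (an assumption, not D3D11's exact output pattern):
# a quad patch with every tessellation factor equal to an integer N is
# split into an N x N grid of sub-quads, i.e. 2 * N**2 triangles.

def quad_patch_triangles(factor: int) -> int:
    """Triangles emitted by one uniformly tessellated quad patch."""
    if factor < 1:
        raise ValueError("tessellation factor must be >= 1")
    return 2 * factor * factor

R600_STYLE_MAX = 16  # limit of the older, R600-compatible tessellator
D3D11_MAX = 64       # maximum tessellation factor in D3D11

old = quad_patch_triangles(R600_STYLE_MAX)
new = quad_patch_triangles(D3D11_MAX)
print(old, new, new // old)  # 512 8192 16
```

Because output grows with the square of the factor, a 4x higher factor limit means roughly 16x more triangles per patch to push through setup and rasterisation.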
My suspicion is that the D3D11 tessellator is a short-term kludge to get past the "missing" rasteriser/ALU architecture in Evergreen. Because stuff was cut, AMD had to retreat to an adjunct. What I'm wondering is whether tessellation is meant to be kernel-based, not fixed-function.
My problem with TS is that a lot of it is simply interpolation. A feature of Evergreen is that the dedicated vertex attribute interpolation hardware (the Shader Pipe Interpolators) was deleted and interpolation became an ALU instruction. Why isn't TS using the same interpolation instruction?
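To illustrate why the two look so similar, here's a minimal sketch of barycentric interpolation written as plain arithmetic, p0 + i·(p1 − p0) + j·(p2 − p0) per component. This is the general technique, not ATI's actual instruction encoding, and `interpolate` is a hypothetical name. The same routine serves both a pixel-style attribute lookup and placing a point on a triangle patch from barycentric domain coordinates (in D3D11 the tessellator emits such domain coordinates and the domain shader evaluates the surface).

```python
# Hypothetical illustration: per-component attribute interpolation as
# plain ALU work (two multiply-adds per component), the kind of
# operation Evergreen moved from the SPI into the shader ALUs.
from typing import List, Sequence


def interpolate(p0: Sequence[float], p1: Sequence[float],
                p2: Sequence[float], i: float, j: float) -> List[float]:
    """Return p0 + i*(p1 - p0) + j*(p2 - p0) for each component."""
    return [a + i * (b - a) + j * (c - a) for a, b, c in zip(p0, p1, p2)]


# Pixel-style use: interpolate a colour at barycentrics (i, j).
print(interpolate([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], 0.25, 0.25))
# [0.5, 0.25, 0.25]

# Tessellation-style use: place a domain point on a flat triangle patch.
print(interpolate([0.0, 0.0], [4.0, 0.0], [0.0, 4.0], 0.5, 0.25))
# [2.0, 1.0]
```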
If TS becomes a kernel (see the Export Shader kernel for an example of a "hidden" kernel in ATI's design), then this has ramifications for throughput, i.e. vastly higher than the current D3D11 TS, necessitating, in my view, an overhauled rasteriser and overhauled sequencer/SIMDs/ALUs.
I admit, though, that I haven't worked out what a TS kernel would look like; it seems pretty fiddly.
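For what it's worth, here's a very rough sketch of only the easy half of such a kernel. It assumes a triangle patch with a single integer factor n applied uniformly: no per-edge factors, no fractional partitioning, none of D3D11's ring-by-ring ordering, which is where the real fiddliness lives. Each work-item index maps to its barycentric domain point with no dependency on its neighbours, which is what would make it kernel-friendly. `point_count` and `domain_point` are hypothetical names.

```python
# Hypothetical "TS kernel" sketch: each work-item index maps
# independently to one domain point of a triangle patch subdivided
# uniformly with integer factor n (no per-edge or fractional factors).
from typing import Tuple


def point_count(n: int) -> int:
    """Vertices in a uniform n-way subdivision of a triangle."""
    return (n + 1) * (n + 2) // 2


def domain_point(n: int, idx: int) -> Tuple[float, float, float]:
    """Map a flat work-item index to barycentric (u, v, w)."""
    if n < 1:
        raise ValueError("factor must be >= 1")
    if not 0 <= idx < point_count(n):
        raise IndexError("index outside the patch")
    # Row r holds r + 1 points and starts at index r*(r+1)/2.
    row = 0
    while (row + 1) * (row + 2) // 2 <= idx:
        row += 1
    k = idx - row * (row + 1) // 2  # position within the row, 0..row
    return ((n - row) / n, (row - k) / n, k / n)


n = 2
points = [domain_point(n, i) for i in range(point_count(n))]
print(len(points))            # 6
print(points[0], points[-1])  # (1.0, 0.0, 0.0) (0.0, 0.0, 1.0)
```

Generating the connectivity (which three points form each of the n² triangles) and handling mismatched edge factors would be the next, much messier step.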
Quite unlikely; a Fermi-like cache doesn't really translate to real-world performance,
How do you conclude that?
nor is Cypress cache-bound
How do you conclude that?
and GPGPU is not SI's forte anyway.
How do you conclude that?
Fact is, GPGPU is a major focus of D3D now, so one way or another ATI needs to catch up.
Like I said above, improved GPGPU performance is highly unlikely on SI GPUs, which are either a stopgap or the next 67x0; 57x0 even dropped DP capability.
I don't think it's wishful thinking to imagine that, a year or more after Evergreen, progress would be made on all fronts.
Clearly, the disappearance of 32nm is potentially a big knock, perhaps causing the kind of feature loss seen in Cypress. And I agree, we don't know whether the next chip is just a minor refresh-style increment.
Anyway, you asked for suggestions on spending up to 20% extra die space; there it is. And 20% really isn't very much.
On the other hand, SI/NI might share some miracle-worker (MC/ROP) from Llano to drastically reduce, or at least mitigate, bandwidth requirements.
You mean like a Fermi-style cache hierarchy?
Otherwise even GDDR5+ won't do it on NI, and SI won't even have faster memory parts. An increase of at least 20% in real-life performance, which should be the minimum expectation six months from now, can't just come out of nowhere.
32nm's disappearance spoils most guesstimates here.
Jawed