> You talk about 10 years in the future. And about 1 TFLOPS being a big deal. Let's look 7 years in the past, and I see a GT200 breaking the 1 TFLOPS barrier. On a worse process. Without full custom design.

AVX-512 is rumored to be supported by Skylake, scheduled for release next year. Not in 10 or 7 years from now! So you have to compare that quad-core with today's or next year's integrated graphics, and then you'll realize that it is indeed a big deal to have that amount of processing power in an otherwise modest CPU. Over the next ten years we still have AVX-1024 and higher core counts to come, so no need to worry about the raw computing power. The trickier part is power consumption, but Haswell and Broadwell already achieve impressive FLOPS/Watt, so I'm sure AVX-512 will close the remaining gap.
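To put a number on that, here's a back-of-the-envelope sketch of what a quad-core with AVX-512 could peak at. The two 512-bit FMA ports per core and the 3.5 GHz clock are my own assumptions, not confirmed specs:

```c
#include <stdio.h>

int main(void)
{
    /* Back-of-the-envelope peak FP32 throughput for a hypothetical
       quad-core with AVX-512. Two 512-bit FMA ports per core and a
       3.5 GHz clock are assumptions, not confirmed Skylake specs. */
    double cores       = 4;
    double fma_ports   = 2;    /* assumed 512-bit FMA units per core */
    double fp32_lanes  = 16;   /* 512 bits / 32-bit floats */
    double ops_per_fma = 2;    /* one multiply plus one add */
    double clock_ghz   = 3.5;

    double gflops = cores * fma_ports * fp32_lanes * ops_per_fma * clock_ghz;
    printf("~%.0f GFLOPS peak FP32\n", gflops);   /* ~896 GFLOPS, roughly 1 TFLOPS */
    return 0;
}
```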
> With dedicated resources for texture and ROP, something your Intel processor will have to do with the general FLOPS pool.

TEX:ALU ratios have been steadily going down. Also, several games now do part of the advanced filtering in the shaders, without much impact on performance. That's because of the Memory Wall: you can afford lots of arithmetic operations for every datum you load from RAM. In the case of the GeForce Titan, you can perform over 60 operations on every floating-point value you read. Of course caches and compression can improve the effective bandwidth, but the fact remains that it's really hard to become arithmetic-limited, and the Memory Wall is only getting worse over time. So I'm not worried about CPUs lacking dedicated sampler units. The only thing that makes a big difference is parallel gather support, and that's an AVX-512 feature. The low latency of a direct gather, compared to a full texture unit, is an added advantage that helps keep the thread count low and locality high, to defeat the Memory Wall.
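Here's where that "over 60 operations" figure comes from, using rough published numbers for the GeForce GTX Titan (about 4.5 TFLOPS peak FP32 and about 288 GB/s of memory bandwidth; treat both as ballpark values):

```c
#include <stdio.h>

int main(void)
{
    /* Rough arithmetic intensity of a GeForce GTX Titan: how many FP32
       operations the ALUs can issue for every float streamed from DRAM.
       Both figures are approximate published numbers. */
    double peak_flops = 4.5e12;   /* ~4.5 TFLOPS peak FP32 */
    double bandwidth  = 288e9;    /* ~288 GB/s memory bandwidth */

    double floats_per_second = bandwidth / 4.0;   /* 4 bytes per FP32 value */
    printf("operations per float loaded: %.1f\n", peak_flops / floats_per_second);
    /* Prints ~62.5: dozens of operations per loaded value before the ALUs,
       rather than the memory bus, become the bottleneck. */
    return 0;
}
```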
Likewise, ROPs are not a big deal. Some mobile (!) chips don't even have them, and doing the blending in the shaders offers new capabilities. Fixed-function hardware is dissolving into the programmable units, bringing us another step closer to unification.
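As a concrete illustration of what "blending in the shaders" buys you, here's a minimal sketch with plain C standing in for shader code; the struct and function names are made up for the example:

```c
/* Minimal sketch of "blending in the shader": the shader reads the destination
   pixel itself and can combine it with the source however it likes, instead of
   being limited to the fixed blend modes a ROP exposes. Plain C standing in
   for shader code; the names are made up for the example. */
typedef struct { float r, g, b, a; } rgba;

rgba programmable_blend(rgba src, rgba dst)
{
    /* Classic "source over" alpha blending as one possible formula; nothing
       stops this function from using an arbitrary, data-dependent operator,
       which is exactly the new capability fixed-function ROPs don't offer. */
    rgba out;
    out.r = src.r * src.a + dst.r * (1.0f - src.a);
    out.g = src.g * src.a + dst.g * (1.0f - src.a);
    out.b = src.b * src.a + dst.b * (1.0f - src.a);
    out.a = src.a + dst.a * (1.0f - src.a);
    return out;
}
```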
> You talk about a discrete GPU not being able to drive a retina display as if that somehow strengthens the argument about unification. I see it as exactly the reason why Intel is increasing GPU area in their dies instead of the other way around.

I didn't say it strengthens the unification theory. I'm saying it's not relevant in the long run. Yes, the integrated GPU has grown as a result of retina displays, but AVX-512 doubles the throughput of the CPU cores, so the CPU side isn't being given any less attention. Also, there are dual-core and quad-core Iris chips, and there are dual-core and quad-core non-Iris chips as well, so cherry-picking doesn't prove anything. The majority of laptops don't have a retina display yet, and the transition is happening quite slowly. And even though I'm sure retina will become standard everywhere, it's only a relatively minor increase compared to the increase in parallelism from Moore's Law. From about 640x480 to the resolutions we have today, the GPU has had the opportunity to outgrow the CPU many times over, but instead we observe that they're still pretty much in the same ballpark, because the CPU has been growing its transistor count aggressively as well. So you're grasping at straws, and this is the last one.
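To put rough numbers on that comparison (all figures approximate; the 2880x1800 panel and the 20-year span are just illustrative choices of mine):

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Resolution growth versus transistor growth, rough numbers only.
       The 2880x1800 panel and the ~20-year span are illustrative choices. */
    double old_pixels    = 640.0 * 480.0;       /* ~0.3 MP */
    double retina_pixels = 2880.0 * 1800.0;     /* ~5.2 MP */
    double pixel_growth  = retina_pixels / old_pixels;        /* ~17x */

    double years             = 20.0;
    double transistor_growth = pow(2.0, years / 2.0);         /* ~1000x */

    printf("pixels: ~%.0fx   transistors: ~%.0fx\n", pixel_growth, transistor_growth);
    /* Pixel counts grew by a factor of roughly 17 while transistor budgets grew
       by about three orders of magnitude, so resolution alone can't explain
       GPUs pulling away from CPUs. */
    return 0;
}
```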
> And 1080p is supposed to remain the gold standard, but at the same time you're talking about a retina display on your laptop.

That's only anecdotal. It's a very expensive laptop, and it does not represent the average. For what it's worth, my other laptop, which is also brand new, has a resolution of 1600x900. So it's not even Full HD.
> It would be very interesting to see what you wrote 5 years ago about this upcoming unification, because I'm pretty sure that 5 years from now, we'll still be talking about GPUs the way we do right now.

Five years ago, we still didn't have CPUs and GPUs on the same die. Five years ago, we only had 128-bit SSE. Clearly we're talking about these things very differently today. At this pace of change, unification in 10 years isn't a far-fetched idea at all.
> By carving out a very narrow part of the market, tailored to your argument, you can make everything work. Hell, business desktops have been able to get by without GPUs worth the name since forever. But unification? If it took more than 7 years for a CPU to barely catch up with a GPU, then the slowdown of Moore's Law is more likely to stall the march towards unification than to further it.

The time it took for CPU vectors to become wider than 128-bit was an artifact of legacy programming models, not of any technical impossibility. 512-bit is almost here, and while unification will probably require AVX-1024, scaling to that width should not be a major issue within this 10-year span. Also, this catching up in SIMD width only needs to happen once, and doesn't demand much from Moore's Law. If GPUs can do it, CPUs can too.
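A quick sanity check on how little of Moore's Law that one-time catch-up actually consumes; the two-year doubling cadence and the AVX-1024 endpoint are assumptions:

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* How much of Moore's Law the one-time SIMD catch-up consumes.
       The two-year doubling cadence and the AVX-1024 endpoint are assumptions. */
    double current_width = 512.0;    /* AVX-512, almost here */
    double target_width  = 1024.0;   /* hypothetical AVX-1024 */
    double doublings_needed = log2(target_width / current_width);   /* 1 */

    double years               = 10.0;
    double doubling_period     = 2.0;
    double doublings_available = years / doubling_period;           /* ~5 */

    printf("width doublings needed: %.0f of ~%.0f available\n",
           doublings_needed, doublings_available);
    /* Going from 512-bit to 1024-bit costs one doubling out of roughly five the
       next decade offers: a one-time catch-up, not a sustained race. */
    return 0;
}
```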