My initial statement wasn't in response to a claim you made. If a one-sentence reply is all that's offered to contradict my claim, I don't find it unreasonable to assume you're taking the same position.
Also note that discrete graphics cards are slowly becoming a niche market.
Aside from certain advantages such as PCB-mounted high-speed memory, the discrete/integrated dichotomy is almost orthogonal to my point. With memory stacking and 2.5D/3D integration likely to arrive at some point, the GPU add-in board could go away. My argument about the divergent needs of silicon targeting one workload or the other would not change based on where that silicon sits.
People who do require such a performance level are also likely to be in the same market for a 24-core CPU several years from now. So let's be very clear about what segment and time frame we're talking about. It's obvious that the CPU will unify with the iGPU first, before core counts go up. Something like an 8-core successor to Haswell with no iGPU can have plenty of generic computing power for mainstream graphics needs and many other purposes (including new ones).
I was specific on the segment given: consumer-level real time graphics.
The time frame I described as foreseeable could have been tightened to indicate that it was based on data that can be pulled from publicly available process roadmaps from the major foundries and Intel, along with any product roadmaps, although that detail peters out earlier. That's about the 14nm node or its equivalent for the foundries, maybe one more node for Intel.
A user looking to replace their Radeon 7970 or GeForce 680 on a quad or hex core system is not going to be in the same market for a 24-core CPU.
I'm not sure what you mean by a Haswell successor. Haswell will most likely have SKUs that can max out consumer TDPs with 4 cores, and its immediate successor isn't going to make six times as many cores more palatable. Go much further and it's less likely to be a successor than a replacement.
Please don't compare a mobile GPU against a desktop CPU. Many discrete graphics cards are bigger power hogs than the CPU, even at 4 GHz. Mobile Haswell CPUs will consume as little as 10 watts (and that's CPU+GPU combined).
That's a specious argument based on a superficial sampling of top-end GPUs not designed for NTV operation. Those GPUs burn power, but it is expected that they can accomplish far more graphically than the 4 GHz CPU, and they do.
To then compare a discrete desktop product to an ultraportable platform is pointless, and it ignores that Haswell's portable variant actually devotes more of its area to the GPU specifically because having more low-clocked silicon saves power.
So exactly what operating parameters do you believe to be "far" closer to what is necessary for near-threshold operation on a GPU versus a CPU?
Their clock speeds are far lower and architecturally they tend to favor simpler pipelines and an economy in logic implementation. Their processing engines are closer to an original Pentium than a Haswell. Some mobile GPUs can operate in the hundreds of MHz, which is much closer than a multi-GHz processor to the low ceiling NTV puts on switching speeds.
NTV adds area and complexity costs, and it becomes a negative once it approaches regular speeds and voltages.
Peak clock frequency is affected by pipeline length, but otherwise it seems to me that a CPU is just as close to being able to operate at near-threshold voltage as a GPU.
The switching-speed ceiling NTV allows would require a very long pipeline, assuming an acceptable FO4 per stage is even reachable in a pipeline specified to run both at NTV and at 4 GHz.
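A rough back-of-the-envelope calculation (in C) shows why; the FO4 delay, the NTV gate-delay penalty, and the total logic depth below are illustrative assumptions, not measured figures:

[code]
#include <stdio.h>

/* Back-of-the-envelope pipeline-depth estimate.
 * All figures are illustrative assumptions, not measured data. */
int main(void)
{
    double cycle_ps    = 1e12 / 4e9;  /* 250 ps cycle at 4 GHz                     */
    double fo4_nominal = 15.0;        /* assumed FO4 gate delay (ps) at nominal V  */
    double ntv_penalty = 6.0;         /* assumed gate-delay slowdown at NTV        */
    double fo4_ntv     = fo4_nominal * ntv_penalty;

    double logic_depth = 250.0;       /* assumed total logic depth, in FO4 units   */

    printf("FO4 delays per cycle: nominal %.1f, NTV %.1f\n",
           cycle_ps / fo4_nominal, cycle_ps / fo4_ntv);
    printf("pipeline stages needed: nominal ~%.0f, NTV ~%.0f\n",
           logic_depth / (cycle_ps / fo4_nominal),
           logic_depth / (cycle_ps / fo4_ntv));
    return 0;
}
[/code]

With only a few FO4 delays fitting into a 250 ps cycle at NTV gate speeds, flop and latch overhead alone would eat most of each stage, which is what makes a 4 GHz NTV pipeline impractical.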
Actually, that Pentium (a 4-stage architecture) was able to run at up to 915 MHz at 1.2 V, and the logic side was still operational at 0.28 V. So I don't see any reason to assume that a GPU would be "far" closer to NTV operation than any CPU. The required design changes are the same for both.
Its power-efficiency curve is not as interesting at 1.2 V, and we see the frequency curve just about stall above 0.8 V. It's a small and ancient core burning power at the upper end of its range, and it can be matched by more modern designs with more performance.
Configuring the chip for NTV requires trade-offs against high-speed operation, and forcing it to those speeds actually makes it less efficient or less manufacturable.
Yes, but striving for this optimal performance/Watt completely obliterates performance/dollar.
The glut is in transistor counts and the sheer number of die the industry can produce to service a slower-growing global demand. Intel's already idling fabs at 22nm due to softness in demand. There is more flexibility in terms of transistor count and area, but very little for power going forward.
Hence, outside of ultra-low-performance niche devices that need to run on harvested energy, the only practical use is standby operation, and to be commercially viable the chip still has to be able to run at a relatively high frequency during peak usage.
Intel's not getting funding from the US government on NTV vector permute engines for the sake of harvested energy computing. The power constraints for HPC at the exascale level are immense. Haswell's low-wattage variant is pushing further towards broad areas of low-speed logic as a power/performance tradeoff.
No. This is exactly the chicken-and-egg issue I mentioned. Back when 640 kB was enough for everyone, there was no "compelling need" for a mobile phone capable of running Angry Birds.
Back then mobile phones were bricks, and a desktop tower couldn't have run Angry Birds.
There was no compelling need for the physically impossible, or at least no more than any other thing requiring unicorns.
You don't miss what you never had. Likewise, today there appears to be a low demand for more cores, but that's only because of a lack of software, which is in turn caused by the huge challenges of multi-core development. It's not due to a lack of task parallelism, nor a lack of desire for higher performance itself. People still want CPUs with higher single-threaded performance. TSX will no doubt be a game-changer for multi-core by simplifying things for developers and making it more efficient at the same time.
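For what it's worth, here is a minimal sketch (in C) of what the RTM side of TSX looks like to a developer; the shared counter and the simplistic spinlock fallback are purely illustrative, and it needs an RTM-capable CPU plus a compiler flag such as -mrtm:

[code]
#include <immintrin.h>  /* RTM intrinsics: _xbegin, _xend, _xabort */
#include <stdio.h>

static volatile int fallback_lock = 0;  /* illustrative fallback spinlock */
static long counter = 0;

static void locked_increment(void)
{
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        /* Transactional path: abort if someone holds the fallback lock. */
        if (fallback_lock)
            _xabort(0xff);
        counter++;
        _xend();
    } else {
        /* Fallback path: plain spinlock, simplified for illustration. */
        while (__sync_lock_test_and_set(&fallback_lock, 1))
            ;
        counter++;
        __sync_lock_release(&fallback_lock);
    }
}

int main(void)
{
    for (int i = 0; i < 1000; i++)
        locked_increment();
    printf("%ld\n", counter);
    return 0;
}
[/code]

The appeal is that uncontended critical sections run concurrently without ever taking the lock, while the fallback path preserves correctness whenever a transaction aborts.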
There has been a fundamental shift in the dynamics of the market from what they were at the outset of the IBM-compatible era.
Until recently, the PC had the anomalous benefit of being a business, media, and personal-use portal. It was an open and fragmented era in which creative, commercial, and individual-use flexibility and capabilities were satisfied and funded by the same pool of silicon and the same pool of dollars.
This is not the same era.
The drivers for creative computing or scientific computing are no longer the same as consumer computing, or the same as business computing or enterprise system computing.
It used to be that engineering and revenue went into and came from this one big pool where all stakeholders could benefit from the PC chip as a disruptive technology.
If any sector stagnated, there were other needs or other customers who wanted more, and their contribution pushed the whole forward. The marginal utility of the next big thing drove rapid upgrade cycles across the whole domain.
The market trends now are for a fragmentation of a mature platform, one that is no longer disruptive but mundane and plodding.
For various reasons, we see spending going away from the single clunky box or merchant chip that does everything inconveniently for the consumer.
The consumer market is at least in part regressing, because silicon integration has advanced so far that people now have portable devices that can do just enough of the job of that clunky box that does everything, just not very prettily. The new platform is an inflexible portal for consumption, locked down, and hostile to creating content or processing it. It doesn't need to last, and it is better the more disposable it becomes.
Their money is not going to bring about a need for 24-core PC chips. Their devices do not necessarily need cloud servers running on those chips either. The supercomputers want more than those chips can provide.
There is still a need for pushing the envelope here, but it is not universally beneficial, so it is not going to be the product priced for the consumer.
Haswell consumes 10x less power at low frequency and voltage.
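If dynamic power roughly follows C·V²·f, a modest drop in both voltage and frequency compounds quickly. The operating points in this small C calculation are assumptions for illustration, not Haswell specifications:

[code]
#include <stdio.h>

/* Rough dynamic-power scaling: P is proportional to C * V^2 * f.
 * Voltage/frequency points are assumed for illustration only. */
int main(void)
{
    double v_hi = 1.00, f_hi = 3.5e9;  /* assumed nominal operating point          */
    double v_lo = 0.65, f_lo = 1.2e9;  /* assumed low-voltage/low-frequency point  */

    double ratio = (v_hi * v_hi * f_hi) / (v_lo * v_lo * f_lo);
    printf("dynamic power ratio: ~%.1fx\n", ratio);  /* ~7x; leakage reduction widens the gap */
    return 0;
}
[/code]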
Will you be able to test at some point in the future what FPS is achievable in some games using SwiftShader on a 10 W Haswell chip?
You can then run the same games with the GPU on.
Log battery life.
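If it helps, here is a minimal logging sketch (in C) for the battery side, assuming a Linux laptop that exposes /sys/class/power_supply/BAT0/capacity; the path and the one-minute interval are assumptions:

[code]
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Log battery percentage once a minute while the game runs.
 * Assumes a Linux battery exposed at BAT0; purely illustrative. */
int main(void)
{
    for (;;) {
        FILE *f = fopen("/sys/class/power_supply/BAT0/capacity", "r");
        if (!f)
            return 1;
        int pct = -1;
        if (fscanf(f, "%d", &pct) != 1)
            pct = -1;
        fclose(f);
        printf("%ld,%d\n", (long)time(NULL), pct);  /* unix time, percent */
        fflush(stdout);
        sleep(60);
    }
}
[/code]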
So like I said, the operating parameters of future CPUs with very wide SIMD units could be adjusted to the workload on a core-by-core basis. You'd get the benefits of homogeneous computing with the performance of heterogeneous computing.
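The closest thing available today is per-core DVFS; on Linux, for instance, each core exposes its own cpufreq policy. A minimal C sketch follows (the governor names are real, but whether the hardware honors a per-core rather than per-package setting is platform-dependent; this is an illustration, not a claim about future chips):

[code]
#include <stdio.h>

/* Sketch: set a different cpufreq governor per core on Linux (needs root). */
static int set_governor(int cpu, const char *governor)
{
    char path[128];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    fprintf(f, "%s\n", governor);
    fclose(f);
    return 0;
}

int main(void)
{
    set_governor(0, "performance"); /* core handling latency-sensitive scalar work     */
    set_governor(1, "powersave");   /* core running wide, throughput-oriented SIMD work */
    return 0;
}
[/code]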
It's not enough for those interested in NTV, particularly since so much of Haswell's output will rely on binning to get the cream of the crop. NTV is meant for even lower power consumption with better throughput per Watt, and it is meant to do so consistently.