It's not a monopoly. Because it's in a monopoly's shareholders' best interests to sit tight and screw dollars on the penny from its customers.
I consider AVX(2) to be a child of LRBni.
This discussion has long since been derailed. If you think AMD is a threat, or poses a risk to Intel, then you should start a separate thread on how Intel might evolve AVX to compete with AMD and keep this one for KC and its children.
The price is not mentioned, so we cannot make a fair comparison, but one of the two is rated at $621, so it's for the 1% anyway. There is no mention of cache bandwidth increase and the 64 bit support is just a price gouging tactic.
I am assuming that the FMA unit will also be able to use its multiplier and adder simultaneously. So that should sustain the status quo at a minimum. Besides, the FMA might just decode to 2 uops anyway. We won't know until we see it.
No, Sandy Bridge has independent MUL + ADD. Applications would suffer badly if Haswell supported only one FMA, since it could only execute a dependent MUL/ADD pair, and only when using FMA3 instructions.
Didn't Intel say more power will be consumed by its own IGP, not the CPU?
Not according to this: Shark Bay Platforms. There's a new ultrabook segment, but the desktop and laptop products target the same TDP levels as Sandy Bridge.
If so, it's good. Most common applications can't use both 256-bit MUL and 256-bit ADD simultaneously.
I am assuming that the FMA unit will also be able to use its multiplier and adder simultaneously. So that should sustain the status quo at a minimum. Besides, the FMA might just decode to 2 uops anyway. We won't know until we see it.
And for the last time, this is the wrong thread for speculation on future Intel CPUs.
You totally missed the point. It has twice the SIMD width, twice the cache bandwidth, higher clock frequency, higher FSB speed, twice the cache size, and x86-64 support. Yet all of that only increased TDP by a fraction. It's an indication that doubling the SIMD width and cache bandwidth, the only things relevant to the discussion, likely won't increase power consumption by a lot. And given that Haswell will use a new process with exceptional energy efficiency, there should actually be plenty of headroom for ultrabook products, despite vastly increased throughput.
Also, 4W is not really the same ballpark for a notebook computer.
That's because I selected the highest clocked models in their TDP class, to get as close as possible to a fair comparison. You're most welcome to compare other relevant models.
The price is not mentioned, so we cannot make a fair comparison, but one of the two is rated at $621, so it's for the 1% anyway.
T2700 is Yonah (Core Duo). T7800 is Merom (Core 2 Duo). Core 2 features Advanced Digital Media Boost and Advanced Smart Cache, or as I like to call it, twice the SIMD width and twice the cache bandwidth.
There is no mention of cache bandwidth increase and the 64 bit support is just a price gouging tactic.
Besides, they are both apparently from Conroe family, so I am not sure why you are saying that the vector width is different. Is that a difference that is not listed here?
Please clarify what you mean by "simultaneously".
I am assuming that the FMA unit will also be able to use its multiplier and adder simultaneously.
Please explain how you'd implement a fused multiply-add operation in two uops.
Besides, the FMA might just decode to 2 uops anyway.
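A minimal sketch of why the "fused" part makes a two-uop split awkward, assuming IEEE 754 double precision: a fused multiply-add rounds once, after the exact a*b + c, while an ordinary MUL followed by an ordinary ADD rounds twice and can land on a different result. The helper names `split_mul_add` and `fused_mul_add` are made up for illustration, and the fused result is emulated with exact rationals rather than a hardware FMA.

```python
from fractions import Fraction


def split_mul_add(a: float, b: float, c: float) -> float:
    """MUL followed by ADD as two separate operations (two roundings)."""
    product = a * b        # product is rounded to double here
    return product + c     # sum is rounded a second time


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: compute a*b + c exactly, then round to double once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product 1 - 2**-60 rounds to 1.0.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

split_result = split_mul_add(a, b, c)
fused_result = fused_mul_add(a, b, c)
print(split_result, fused_result)
```

With these inputs the split version loses the low bits of the product before the add, so the two results differ. A two-uop implementation would have to pass the unrounded product from the first uop to the second to match the fused result.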
You might want to check who started this thread. I also suggested early on that AVX could use LRBni-type instructions, in particular gather. So when AVX2 was announced the discussion naturally started to focus on CPUs instead of MICs.
And for the last time, this is the wrong thread for speculation on future Intel CPUs.
Why would it be any different from 128-bit SSE?
Most common applications can't use both 256-bit MUL and 256-bit ADD simultaneously.
I just mean they don't need it.
Why would it be any different from 128-bit SSE?
Is that the same as SSE?
Please clarify what you mean by "simultaneously".
?
In what sense? Are you referring to the 256-bit, the separate ports for MUL and ADD, or just floating-point performance in general? Please motivate.
I just mean they don't need it.
All SSE implementations I know of have separate ports for MUL and ADD.
Is that the same as SSE?
I'm just starting to wonder whether programs need both MUL and ADD at exactly the same time.
In what sense? Are you referring to the 256-bit, the separate ports for MUL and ADD, or just floating-point performance in general? Please motivate.
Of course they do. As you can see in the image fellix posted, Sandy Bridge considers up to 54 uops for execution each cycle. Chances of finding both an independent MUL and ADD in floating-point intensive code are quite high. As I've mentioned earlier, Intel has had separate ports for MUL and ADD since the Pentium Pro. Also note that with Hyper-Threading the instructions come from two threads, further increasing the chances of finding independent instructions.
I'm just starting to wonder whether programs need both MUL and ADD at exactly the same time.
That's only a loss if the CPUs can't be power gated off due to inactivity.
Ivy Bridge will feature support for 16-bit floating-point values, which is very useful for software vertex processing. It would be a waste to leave the CPU cores' GFLOPS unused while the IGP is swamped.
Intel did this for the previous generations of GMA hardware before transitioning away from doing so.
Software vertex processing, if you really want to call it that, would be part of a gentle transition toward a homogeneous architecture.
Vertex processing?!? IIRC the vertex shaders were first to support 32-bit floats before the pixel shaders could do it (back in the times before unified shaders got common).
Here's one more reason why 2 x 256-bit FMA for Haswell makes most sense:
Ivy Bridge will feature support for 16-bit floating-point values, which is very useful for software vertex processing.
Games are becoming increasingly multi-threaded, as are the drivers, so you can largely forget about gating off CPU cores anyhow.
That's only a loss if the CPUs can't be power gated off due to inactivity.
The main problem was that they relied on Microsoft's software vertex processing. Unfortunately some games downright refuse to run on Direct3D implementations that don't report hardware vertex processing support, for no good reason.
Intel did this for the previous generations of GMA hardware before transitioning away from doing so.
Power efficiency is not the all-determining factor to (not) make a move like this. If it were, we'd still have separate vertex and pixel shader cores. Heck, I'm absolutely certain that the vast majority of people wouldn't care if speeding up Intel's graphics required increased power consumption. Note also that there was a clear pendulum swing for both dedicated sound and physics processing, and there's no sign of it ever coming back.
It could be due for a pendulum swing back, if taking work from specialized low-clocking hardware and firing up one or more high-speed OoO engines is supposed to be a power efficiency gain, which is not the impression I'm getting so far.
The instructions I'm talking about are for converting between 16-bit and 32-bit floating-point formats. Actual arithmetic operations still use at least 32-bit. But 16-bit floating-point formats have a major use case for vertex processing, since it's a popular compact data type for vertex buffers. Gather support is also quite useful for vertex processing since it avoids having to explicitly transpose the data.
Vertex processing?!? IIRC the vertex shaders were first to support 32-bit floats before the pixel shaders could do it (back in the times before unified shaders got common).
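A rough sketch of the vertex buffer case described above, assuming a tightly packed buffer of x, y, z half-floats per vertex. It only illustrates the data flow: values are stored as 16-bit floats, widened before any arithmetic, and one component is pulled out with a strided read (the kind of access a gather instruction would do in hardware). The function names and the buffer layout are hypothetical, and Python widens to 64-bit floats rather than 32-bit.

```python
import struct


def pack_half_vertices(vertices):
    """Store (x, y, z) tuples as little-endian 16-bit floats, interleaved."""
    flat = [component for vertex in vertices for component in vertex]
    return struct.pack(f"<{len(flat)}e", *flat)


def gather_component(buffer, component, stride=3):
    """Widen the half-floats and pick every `stride`-th value.

    This turns the interleaved x,y,z,x,y,z layout into a flat list of one
    component without an explicit transpose step.
    """
    count = len(buffer) // 2               # 2 bytes per half-float
    values = struct.unpack(f"<{count}e", buffer)
    return list(values[component::stride])


# Values chosen to be exactly representable in 16-bit floating point.
vertices = [(0.5, 1.0, -2.0), (0.25, 3.0, 4.0)]
vertex_buffer = pack_half_vertices(vertices)

xs = gather_component(vertex_buffer, 0)
zs = gather_component(vertex_buffer, 2)
print(len(vertex_buffer), xs, zs)
```

The buffer takes 2 bytes per component instead of 4, which is the appeal of the format for vertex data, while all arithmetic after the conversion happens at higher precision.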
Having two FMA units would be vastly superior though. Not only can it execute an FMA & FMA combination, it can also execute MUL & FMA, ADD & FMA, ADD & MUL, ADD & ADD, and MUL & MUL. The latter two combinations also help with legacy code. AMD already has two FMA units in Bulldozer, and although they're only 128-bit each, it would clearly be a huge mistake for Intel to equip Haswell with only one 256-bit FMA unit. In particular for legacy code it would only be capable of executing either a MUL or an ADD each cycle, instead of all the above combinations.
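To put rough numbers on that, here is a toy peak-throughput model. It assumes single-precision lanes, that every unit issues one 256-bit operation per cycle, and that an FMA unit can also execute a plain MUL or ADD. The function name and the configurations are illustrative only, and the Haswell entries are the hypothetical layouts discussed above, not confirmed specs.

```python
def peak_flops_per_cycle(units: int, lanes: int, fused: bool) -> int:
    """Peak floating-point operations per cycle for identical vector units.

    An FMA counts as two operations per lane (multiply and add); a plain
    MUL or ADD counts as one.
    """
    ops_per_lane = 2 if fused else 1
    return units * lanes * ops_per_lane


LANES = 256 // 32  # eight single-precision lanes in a 256-bit vector

# Sandy Bridge style: one MUL port plus one ADD port, no FMA.
mul_plus_add = (peak_flops_per_cycle(1, LANES, fused=False)
                + peak_flops_per_cycle(1, LANES, fused=False))

# Hypothetical single 256-bit FMA unit.
one_fma_new_code = peak_flops_per_cycle(1, LANES, fused=True)
one_fma_legacy = peak_flops_per_cycle(1, LANES, fused=False)

# Hypothetical pair of 256-bit FMA units.
two_fma_new_code = peak_flops_per_cycle(2, LANES, fused=True)
two_fma_legacy = peak_flops_per_cycle(2, LANES, fused=False)

print(mul_plus_add, one_fma_new_code, one_fma_legacy,
      two_fma_new_code, two_fma_legacy)
```

Under these assumptions a single FMA unit only matches the MUL + ADD peak when the code is recompiled to use FMA, and halves it for legacy code, whereas two FMA units match the existing peak for legacy code and double it for FMA code.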