so 5% was real?
Now we have Anand saying 50% (as clarified by AMD employees) and AMD employees writing on official AMD blogs saying it is ~5%.
Smoke and mirrors, anyone?
Either way it seems AMD is going the opposite way compared to their older designs which emphasized FP performance, in my layman's understanding of it. Very eager to see the performance of it.
"That's because they expect floating point workloads to migrate to GPUs."
This could be the cause of the difference. Though if so, why bother to make a supposedly very large 256bit FP unit at all rather than staying with 128bit?
I'd like to see some sort of utilisation stats INT|FP or x86|SSEx for typical tasks, i.e. Windows desktop stuff, Flash, browsers, videos, games etc.
It seems like AMD & Intel are taking wildly different directions on INT|FP ratio, with i7 having 3*128bit FP|core vs Bulldozer heading for 1*128bit/0.5*256bit FP|core
"For the desktop stuff, flash, browsers, it's almost fully integer based stuff."
Sure about flash (video)?
"I thought i7 had one 128 bit sse unit per core."
Depends on how you count them. But yes, this is usually referred to as 3 units: 1 mul unit, 1 add unit, 1 mov unit. Needless to say a mov unit isn't exactly powerful. And if you compare that to BD, where both units can do fmac, it gets a bit complicated, though as was mentioned it may be possible that the fmac units can get split, which would basically double the units.
"Though really this needs to be compared to Sandy Bridge I guess, which supposedly has twice as wide units, though it's unclear to me yet if these are actually physically twice as wide, and if so what happens with "old" SSE code (half the unit just idle or not)."
There are 2 128 bit units to take care of old sse code.
It's just video decode, right? Except possibly in scaling frames, why should it have any floating point business?
No I don't think fmac can be split. And I count mov+add+mul as one unit.
They also explained that running just an add or a mul in an FMAC unit is much slower than doing an FMAC, so if they don't do a bridge they either have extra adds and muls or they are taking a big "traditional" floating point hit.
"The use of a fused multiply-add unit in place of a floating-point adder and floating-point multiplier has yet another drawback. Due to their large area and power consumption, implemented fused multiply-add blocks typically replace the floating-point adder and floating-point multiplier entirely. This replacement removes the ability to have floating-point add and multiply instructions execute independently in different parallel units. For code that needs strings of floating-point adds and multiplies executed independently, the use of a fused multiply-add unit will reduce the throughput by 30% to 75%."
(Taken from the document that I linked, which is now back up.)
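For anyone who wants to play with the numbers, here is a rough toy model of the trade-off that quote describes. It is only a sketch: it assumes every unit accepts one operation per cycle, that all operations are independent, and that a plain add or mul takes up a whole FMAC slot. It ignores latency completely, the function names are made up for illustration, and none of it reflects how Bulldozer actually schedules anything.

```python
import math


def cycles_separate_units(n_add: int, n_mul: int) -> int:
    """Cycles to issue independent ops on one adder plus one multiplier.

    Assumption: each unit takes one op per cycle, so the adds and muls
    run side by side and the longer stream sets the total.
    """
    return max(n_add, n_mul)


def cycles_fused_units(n_add: int, n_mul: int, n_fmac: int = 1) -> int:
    """Cycles to issue the same ops on n_fmac fused multiply-add units.

    Assumption: a lone add or mul occupies a full FMAC slot
    (a*1 + c for an add, a*b + 0 for a mul).
    """
    return math.ceil((n_add + n_mul) / n_fmac)


def throughput_loss(n_add: int, n_mul: int, n_fmac: int = 1) -> float:
    """Fractional throughput drop of the fused design vs. separate units."""
    fused = cycles_fused_units(n_add, n_mul, n_fmac)
    if fused == 0:
        return 0.0  # nothing to execute, so nothing is lost
    return 1.0 - cycles_separate_units(n_add, n_mul) / fused


if __name__ == "__main__":
    # Balanced mix of independent adds and muls on a single FMAC unit.
    print(throughput_loss(100, 100))
    # Same mix on two FMAC units.
    print(throughput_loss(100, 100, n_fmac=2))
```

In this model a balanced add/mul mix on a single fused unit loses half its throughput, and a second FMAC unit wins it back. The 30% to 75% range in the quote presumably also reflects latency effects that this sketch leaves out.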
"It's just video decode, right? Except possibly in scaling frames, why should it have any floating point business?"
Pretty sure at least for some video formats things like idct are usually done with float arithmetic. Given the general slowness of flash video, I wouldn't be surprised if it does everything with traditional x87 code.