Old 28-Apr-2012, 22:14   #1
pjbliverpool
B3D Scallywag
 
Join Date: May 2005
Location: Guess...
Posts: 5,933
A Comparison: SSE4, AVX & VMX

As it stands, it can be argued that there are three major CPU SIMD instruction sets in use for modern high-end gaming (okay, ignoring the SPUs).

Those being:

SSE4: Used on Penryn (SSE4.1) and Nehalem (SSE4.2), so in slightly different configurations
AVX: Used on the very latest PC CPU architectures, namely Sandy Bridge, Bulldozer and Ivy Bridge
VMX: Used in Xenon (x3 cores) and in a slightly reduced form in the PPU on Cell

So given the same theoretical throughput, what are the general thoughts about which of these instruction sets is best suited for modern gaming?

Obviously AVX has twice the theoretical single-precision throughput of SSE4 and VMX per clock, so let's say we're using as near as dammit 100% vectorised code on the following hypothetical CPUs:

1x Penryn core @ 3.2 GHz
1x Sandy Bridge core @ 1.6 GHz
1x Xenon core @ 3.2 GHz

Any views on how these would fare against one another?
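
For reference, here is a rough sketch of why those three clock speeds were picked. The flops-per-cycle numbers are assumptions that simply encode the premise above (AVX does twice the single-precision work per clock of SSE4 and VMX); they are not measured figures, and the names are made up for illustration.

```python
# Peak single-precision throughput for the three hypothetical cores.
# ASSUMPTION: flops-per-cycle figures follow the premise above (AVX = 2x SSE4
# = 2x VMX per clock); they are illustrative, not measured.

def peak_gflops(clock_ghz, flops_per_cycle, cores=1):
    """Theoretical peak in GFLOPS = clock (GHz) x flops per cycle x cores."""
    return clock_ghz * flops_per_cycle * cores

# name -> (clock in GHz, assumed SP flops per cycle)
HYPOTHETICAL_CORES = {
    "Penryn (SSE4)": (3.2, 8),
    "Sandy Bridge (AVX)": (1.6, 16),
    "Xenon (VMX)": (3.2, 8),
}

peaks = {name: peak_gflops(ghz, fpc) for name, (ghz, fpc) in HYPOTHETICAL_CORES.items()}

for name, gflops in peaks.items():
    print(f"{name}: {gflops:.1f} GFLOPS peak")
```

Under these assumptions all three cores land on the same paper figure, so any real difference would have to come from the instruction sets and their implementations rather than from raw width times clock.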
__________________
PowerVR PCX1 -> Voodoo Banshee -> GeForce2 MX200 -> GeForce2 Ti -> GeForce4 Ti 4200 -> 9800Pro -> 8800GTS -> Radeon HD 4890 -> GeForce GTX 670 DCUII TOP

8086 8Mhz -> Pentium 90 -> K6-2 233Mhz -> Athlon 'Thunderbird' 1Ghz -> AthlonXP 2400+ 2Ghz -> Core2 Duo E6600 2.4 Ghz -> Core i5 2500K 3.3Ghz
Old 28-Apr-2012, 22:29   #2
Davros
Regular
 
Join Date: Jun 2004
Posts: 11,079

Is AVX used in any games? Or is it transparent to the programmer? I'm guessing SSE3 is needed for older CPUs; my CPU doesn't support SSE4.
__________________
Guardian of the Bodacious Three Terabytes of Gaming Goodness™
Old 29-Apr-2012, 01:35   #3
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,274

Quote:
so let's say we're using as near as dammit 100% vectorised code on the following hypothetical CPUs:
Then all of them suck. You should be using a GPU.
Old 29-Apr-2012, 03:19   #4
mczak
Senior Member
 
Join Date: Oct 2002
Posts: 2,727

You cannot really say which instruction set is faster, as that depends on the implementation. Latency and throughput of e.g. SSE2 instructions vary greatly between different CPUs.
Furthermore, the AVX instruction set isn't actually different from SSE(4); it's exactly the same instructions, just extended to 256 bits (well, for floats only; 256-bit ints need to wait until AVX2 in Haswell). The main difference is that the VEX encoding gives AVX instructions a non-destructive (3-operand) syntax, which makes the instructions slightly larger but saves most register-register moves, which should be good for a small performance improvement.
AVX with ints is thus only minimally faster than SSE4 on the same CPU (the only advantage comes from fewer move instructions), and with floats it's a bit more than twice as fast in theory (except for divisions on Sandy Bridge, as the divide unit is only 4-wide, though Ivy Bridge "fixed" that). This assumes your algorithm really can be adjusted to use 8-wide floats trivially, and further assumes no load/store bottlenecks (Sandy Bridge can load two 128-bit values and store one 128-bit value per clock). And obviously other limits, such as memory bandwidth or latency, stay the same.

I don't know much about VMX. I believe it has somewhat better support for horizontal operations and shuffles, but whether you can benefit from such instructions can't be said in general. As for VMX on Xenon, I have absolutely no idea what the throughput is for even the "basic" operations (float vec mul, add). Just because the instructions are 4-wide doesn't tell you much about what the CPU can do per clock, and I'm not sure that information was published anywhere for Xenon (it might be that, just like older CPUs supporting SSE2, it really only has 2-wide instead of 4-wide execution units, for instance).
Old 29-Apr-2012, 08:21   #5
tunafish
Member
 
Join Date: Aug 2011
Posts: 408

AVX1 is honestly not all that interesting. Getting 8-wide parallelism without gather is a whole lot harder than 4-wide. AVX2, to be released with Haswell, however, is very interesting. All the low-level coders I routinely talk with are pretty stoked for the gather support and FMA. Not only does gather make "lists of elements" style code a lot easier to vectorize, it should finally make reasonable gains from autovectorization a reality. Vector instructions with gather are just better than ones without.

Quote:
Originally Posted by mczak
not sure if that information was published anywhere for Xenon (it might be possible that just like older cpus supporting sse2 they really only have 2-wide instead of 4-wide execution units for instance)
I'm reasonably certain that VMX is full-width, and has always been.
Old 29-Apr-2012, 09:47   #6
pjbliverpool
B3D Scallywag
 
Join Date: May 2005
Location: Guess...
Posts: 5,933

Thanks for the input guys. So it sounds like AVX is most certainly not just 2x SSE4 but AVX2 coming with Haswell might be getting close? Sounds like there's a lot to be excited about in Haswell.

Does VMX support FMA? I assume that's quite advantageous for games and so would give it a leg up in some respects over AVX?
Old 29-Apr-2012, 10:44   #7
Davros
Regular
 
Join Date: Jun 2004
Posts: 11,079

Do you need to code specifically for SSE4/AVX?
Back in the day (cue old fart story), if I wanted to support x87 I would just use a compiler directive ($N, I think) and that was it, job done: if the PC had a math co-pro it would get used, and if not the program would just use the integer unit.
Old 29-Apr-2012, 15:24   #8
tunafish
Member
 
Join Date: Aug 2011
Posts: 408

Quote:
Originally Posted by rpg.314
Then all of them suck. You should be using a GPU.
Just to elaborate: most workloads do not vectorize that easily on all implementations. That's why comparing ideal cases is pointless -- you just don't see them that much in the real world. The ability to vectorize more cases is much more important than the optimal throughput in the optimal case.

Quote:
Originally Posted by pjbliverpool View Post
Thanks for the input guys. So it sounds like AVX is most certainly not just 2x SSE4 but AVX2 coming with Haswell might be getting close? Sounds like there's a lot to be excited about in Haswell.
I'd say that AVX2 doesn't "get close to being 2x SSE4". It's much better than that. It will allow a lot of code that is currently done with single elements to be autovectorized by the compiler.

Quote:
Does VMX support FMA? I assume that's quite advantageous for games and so would give it a leg up in some respects over AVX?
It does. Note that AVX (and SSE) can do both a multiply and an add in the same clock, so the advantage isn't that dramatic.

Quote:
Originally Posted by Davros View Post
Do you need to code specifically for SSE4/AVX?
Back in the day (cue old fart story), if I wanted to support x87 I would just use a compiler directive ($N, I think) and that was it, job done: if the PC had a math co-pro it would get used, and if not the program would just use the integer unit.
For doing math on single elements, yeah sure. But that's not all that much faster than x87. SIMD does not make the operations any faster; it allows you to do more of them at the same time. So instead of loading two individual elements, you load two vectors of 4 or 8 and multiply each element of one vector with the corresponding element of the other vector. So not only do you need to use special instructions, pre-AVX2 you also have to lay out your data so that you can load consecutive (16-byte aligned) elements from memory into a register. And since cross-lane operations are slow, you ideally want the vectors to have elements from different objects. So instead of putting the value in the object, you have to build an array that has one value from each object, for each value in said objects.
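
The data-layout change described here (one array per value instead of one record per object) can be sketched roughly as follows. This is only an illustration in plain Python: the particle fields, `to_soa` and `integrate` are hypothetical names, and list slices stand in for vector registers, so no real SIMD is involved.

```python
# Array-of-structures (AoS): each object carries its own fields.
# The particle data here is hypothetical.
particles_aos = [{"x": float(i), "vx": 2.0} for i in range(8)]

def to_soa(objects, fields):
    """Build one contiguous array per field, holding one value from each object."""
    return {name: [obj[name] for obj in objects] for name in fields}

def integrate(soa, dt, width=4):
    """x += vx * dt, processed in chunks of `width` lanes.

    Each chunk stands in for one vector load, multiply, add and store;
    plain Python lists model the registers, so nothing here is real SIMD.
    """
    xs, vxs = soa["x"], soa["vx"]
    for i in range(0, len(xs), width):
        x_lanes = xs[i:i + width]      # consecutive elements from different objects
        v_lanes = vxs[i:i + width]
        xs[i:i + width] = [x + v * dt for x, v in zip(x_lanes, v_lanes)]
    return soa

particles_soa = integrate(to_soa(particles_aos, ["x", "vx"]), dt=0.5)
print(particles_soa["x"])
```

With the values split out like this, every chunk reads consecutive elements and each lane belongs to a different object, which is the shape the paragraph above says pre-AVX2 vector code needs.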

You can probably see why this gets hairy fast. It's hard to do by hand, and nigh-impossible to do automatically by a compiler. There is some downright heroic work on the subject by the Intel and GCC teams, but even they really don't get that much speedup from autovectorized code. So today, only the things that are absolutely trivial tend to get optimized. (position and speed = 2 4-element vectors.)

AVX2 brings gather instructions, which are basically vectorized loads. They take a base address and a vector full of offsets, and fill the target register with [base + offset]. This should make vector instructions useful in a lot of places they weren't before, because a lot of loops can then be trivially vectorized by the compiler.
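
As a rough model of what a gather does, here is a sketch in plain Python. The `gather` helper, the table and the indices are all hypothetical, and the model is simplified: it uses element indices and leaves out details of the real instructions such as byte scaling and masking.

```python
def gather(memory, base, offsets):
    """Model of a vector gather: lane i receives memory[base + offsets[i]]."""
    return [memory[base + off] for off in offsets]

# Hypothetical lookup table and per-lane indices (8 lanes, as with 256-bit floats).
table = [10.0 * i for i in range(16)]
indices = [3, 0, 7, 5, 1, 12, 9, 2]

# Scalar loop: one indexed load and one multiply per element.
scalar = []
for idx in indices:
    scalar.append(table[idx] * 2.0)

# Same work as one gather followed by one element-wise multiply.
lanes = gather(table, 0, indices)
vectorised = [v * 2.0 for v in lanes]

print(vectorised == scalar)
```

The point of the sketch is that the indexed loop body maps onto a gather plus an ordinary vector operation without any change to how the data is laid out.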
Old 29-Apr-2012, 18:45   #9
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 3,033

Let's not forget the hardware transactional memory support, primed for Haswell too. It will further optimize memory pipeline performance under heavy MP loads.
__________________
Apple: China -- Brutal leadership done right.
Google: United States -- Somewhat democratic.
Microsoft: Russia -- Big and bloated.
Linux: EU -- Diverse and broke.
Old 29-Apr-2012, 21:01   #10
sebbbi
Senior Member
 
Join Date: Nov 2007
Posts: 1,388

VMX128 is actually a very good set of instructions (compared to SSE at least). It has very good shuffles/inserts/selects, multiply-add, complex bit-packing instructions (including float16 conversion), (AoS) dot product, etc. However, the instruction set is only one side of the coin; the other is the CPU architecture implementing it.

Nothing of course compares to AVX2 (in Haswell). But gather is only good if it is fast enough, and nobody really knows that yet. 256-bit-wide integer operations are of course a nice addition as well.
Old 29-Apr-2012, 21:40   #11
pjbliverpool
B3D Scallywag
 
Join Date: May 2005
Location: Guess...
Posts: 5,933

Sounds like Haswell is going to be a pretty impressive chip. I wonder how long it will be before CPUs drop specialised SIMD units altogether, though, and move vector processing to the GPUs. Are we getting close to that yet? Or would GPUs be unsuitable as complete replacements?

I know AMD has been hinting about it in a future Fusion iteration, but I'm not sure whether that would be a complete replacement for the CPU's SIMD abilities or just complementary.
Old 30-Apr-2012, 06:16   #12
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,486

There would need to be some pretty striking advances in implementation to allow for a CPU FP unit to be completely stripped out of the CPU core.
The latency in hopping from a CPU to a GPU would be unacceptable for workloads that require higher straight-line performance. Problems that do not need much more data-level parallelism than a CPU provides would also be a waste on a CU that needs four to eight times as many work items.
__________________
Dreaming of a .065 micron etch-a-sketch.
Old 30-Apr-2012, 12:40   #13
tunafish
Member
 
Join Date: Aug 2011
Posts: 408

Quote:
Originally Posted by pjbliverpool
Sounds like Haswell is going to be a pretty impressive chip. I wonder how long it will be before CPUs drop specialised SIMD units altogether, though, and move vector processing to the GPUs. Are we getting close to that yet? Or would GPUs be unsuitable as complete replacements?
It's all about latency. If you were to move the processing to the GPU, even on the same die, you are talking about several extra cycles. No matter how awesome your throughput is, that would still hurt you on a lot of workloads.

I really don't think that the cpu vector units will ever be dropped. More likely, either they will evolve into the GPU ones (expand avx to full width, put 4-8 threads into the frontend, run GPU code on the CPU), or at some point the manufacturers will stop adding to them, and just put all the new advancements in the new dedicated vector block.
Old 05-May-2012, 02:09   #14
Nick
Senior Member
 
Join Date: Jan 2003
Location: Montreal, Quebec
Posts: 1,881

Quote:
Originally Posted by pjbliverpool
Sounds like Haswell is going to be a pretty impressive chip. I wonder how long it will be before CPUs drop specialised SIMD units altogether, though, and move vector processing to the GPUs.
The reverse will happen. Note that a Haswell quad-core will be capable of 500 GFLOPS, while today's 22 nm HD 4000 can only do about 300 GFLOPS. GPUs also still have a lot of catching up to do to support complex code and not choke due to latency and bandwidth. So you can't get rid of the CPU's SIMD units any time soon, and the GPU is evolving into a CPU architecture to support more complex generic code. So the GPU and CPU are converging.

Eventually it will make sense to just move all programmable throughput computing to the CPU. AVX2 will already be perfectly suitable for graphics shaders. The only remaining deal breaker is the higher power consumption. But this can be tackled with AVX-1024. The VEX encoding already supports extending it to 1024-bit registers, and by executing such instructions on 256-bit units in four cycles, the CPU's front-end and scheduler will have four times less switching activity, hence dramatically lowering the power consumption. A 16 nm successor to Haswell could deliver 2 TFLOPS for the same die size and not break a sweat.
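
The front-end argument in this paragraph can be illustrated with a toy count. Note that AVX-1024 is the hypothetical extension proposed in the post, not a shipping instruction set, and the function names and the one-instruction-per-cycle execution model below are simplifying assumptions made only for this sketch.

```python
def instructions_issued(num_floats, register_bits):
    """Vector instructions needed to touch num_floats 32-bit elements once."""
    lanes = register_bits // 32
    return (num_floats + lanes - 1) // lanes  # round up

def execution_cycles(num_floats, register_bits, unit_bits=256):
    """Cycles on a unit_bits-wide execution unit, one pass per cycle (toy model)."""
    passes = max(1, register_bits // unit_bits)
    return instructions_issued(num_floats, register_bits) * passes

N = 4096  # hypothetical element count
issued_256 = instructions_issued(N, 256)
issued_1024 = instructions_issued(N, 1024)
cycles_256 = execution_cycles(N, 256)
cycles_1024 = execution_cycles(N, 1024)

print(issued_256, issued_1024, cycles_256, cycles_1024)
```

In this model the execution units are busy for the same number of cycles either way, while the front-end and scheduler handle a quarter of the instructions, which is the source of the claimed power saving.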

GPGPU is dying. Even though AMD is making its GPU architecture more flexible, NVIDIA went the other direction with Kepler. And on top of that you get wildly inconsistent performance between discrete and integrated parts. So GPGPU is utter rubbish for mainstream applications. Developers will instead focus on AVX2, since that will be available in every CPU from Haswell forward, and is only going to get more powerful.
Old 05-May-2012, 06:34   #15
Davros
Regular
 
Join Date: Jun 2004
Posts: 11,079

Quote:
Originally Posted by Nick View Post
Perhaps NV are trying to ensure that people who do GPGPU will buy Tesla cards, but then again there's the risk that people on a tight budget would buy AMD unless they are locked into NV's tool chain.
Old 05-May-2012, 10:26   #16
pjbliverpool
B3D Scallywag
 
Join Date: May 2005
Location: Guess...
Posts: 5,933

Quote:
Originally Posted by Nick View Post
The reverse will happen. Note that a Haswell quad-core will be capable of 500 GFLOPS, while today's 22 nm HD 4000 can only do about 300 GFLOPS. GPUs also still have a lot of catching up to do to support complex code and not choke due to latency and bandwidth. So you can't get rid of the CPU's SIMD units any time soon, and the GPU is evolving into a CPU architecture to support more complex generic code. So the GPU and CPU are converging.

Eventually it will make sense to just move all programmable throughput computing to the CPU. AVX2 will already be perfectly suitable for graphics shaders. The only remaining deal breaker is the higher power consumption. But this can be tackled with AVX-1024. The VEX encoding already supports extending it to 1024-bit registers, and by executing such instructions on 256-bit units in four cycles, the CPU's front-end and scheduler will have four times less switching activity, hence dramatically lowering the power consumption. A 16 nm successor to Haswell could deliver 2 TFLOPS for the same die size and not break a sweat.

GPGPU is dying. Even though AMD is making its GPU architecture more flexible, NVIDIA went the other direction with Kepler. And on top of that you get wildly inconsistent performance between discrete and integrated parts. So GPGPU is utter rubbish for mainstream applications. Developers will instead focus on AVX2, since that will be available in every CPU from Haswell forward, and is only going to get more powerful.
Interesting stuff, cheers. It'd certainly be great to see developers start to really take advantage of the vector processing on PC CPUs. I can't help but think that AVX is pretty underutilised at the moment. Obviously AVX2 is going to be a lot more useful, so once it starts becoming the standard, hopefully developers will start pushing it to its limits, thus driving it forwards to more GPU-like performance.

I'm not sure how you get 500 GFLOPS out of a quad Haswell though? Even running at 4 GHz (which is certainly possible) it would need to be capable of twice the single-precision FLOPs of Ivy Bridge. Is AVX2 going to double the throughput of AVX? (32 flops per cycle vs 16)
Old 05-May-2012, 11:33   #17
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 3,033

Quote:
Originally Posted by pjbliverpool
I'm not sure how you get 500 GFLOPS out of a quad Haswell though? Even running at 4 GHz (which is certainly possible) it would need to be capable of twice the single-precision FLOPs of Ivy Bridge. Is AVX2 going to double the throughput of AVX? (32 flops per cycle vs 16)
Dual FMA3 pipelines, replacing the current ADD and MUL vector units?
Old 14-May-2012, 14:44   #18
denev2004
Member
 
Join Date: Apr 2010
Location: China
Posts: 143

Quote:
Originally Posted by fellix View Post
Dual FMA3 pipelines, replacing the current ADD and MUL vector units?
That's not enough if you're talking about DP.

Even talking about SP it's just barely enough... because I don't think a 4-core Haswell can go up to 4 GHz.
__________________
Well I'm not a native English speaker so there might be misuse through my words. I just hope it won't cause too much misunderstanding.
Old 14-May-2012, 16:35   #19
fellix
Senior Member
 
Join Date: Dec 2004
Location: Varna, Bulgaria
Posts: 3,033

By the time Haswell is out, I think Intel should already have a refined 22nm process up and running. After all, Haswell will be the first native architecture built for Tri-Gate.
Old 14-May-2012, 20:18   #20
pjbliverpool
B3D Scallywag
 
Join Date: May 2005
Location: Guess...
Posts: 5,933

Quote:
Originally Posted by fellix View Post
Dual FMA3 pipelines, replacing the current ADD and MUL vector units?
Is this actually what AVX2 will have, or just a guess at this point? 1 TFLOP SP from an 8-core x86 would be pretty impressive!

EDIT: I've no doubt Haswell will be capable of hitting 4 GHz, but I doubt Intel will clock it that high given the lack of competition. I'm fairly sure Intel could have been releasing stock 4 GHz CPUs since Sandy Bridge if they'd felt the need.
Old 15-May-2012, 06:50   #21
Nick
Senior Member
 
Join Date: Jan 2003
Location: Montreal, Quebec
Posts: 1,881

Quote:
Originally Posted by pjbliverpool
Is this actually what AVX2 will have, or just a guess at this point? 1 TFLOP SP from an 8-core x86 would be pretty impressive!
Yes, this is what Haswell's implementation of AVX2 and FMA will be capable of. It hasn't been officially confirmed yet, but it's easy to deduce as the only logical answer.

We know for a fact that Haswell will support FMA, and we also know Sandy Bridge has separate ADD and MUL execution units. They can't go for a single FMA unit with Haswell, since that would dramatically cripple legacy performance. They also can't go for an ADD+FMA or MUL+FMA combination, because then the same port is needed by MUL and FMA, or by ADD and FMA respectively, and with typical instruction mix frequencies this actually results in lower performance due to port contention!

So under the safe assumption that they want the extra transistors to pay off, the only sane option is dual FMA units. This also simplifies scheduling. And note that Bulldozer already has dual FMA (even though it's 128-bit each, note that it's on 32 nm).

This also isn't all that incredible compared to what we've come to expect from GPUs. And Intel clearly is putting a lot of Larrabee's technology into AVX2.
Quote:
I've no doubt Haswell will be capable of hitting 4 GHz, but I doubt Intel will clock it that high given the lack of competition. I'm fairly sure Intel could have been releasing stock 4 GHz CPUs since Sandy Bridge if they'd felt the need.
AMD is clearly aiming to hit 4 GHz sooner rather than later. Regardless of superior IPC, the market will demand Intel to follow suit (or steal their thunder). Also for what it's worth 3.9 GHz would actually suffice for 500 GFLOPS out of a quad-core, and we're at 3.8 GHz Turbo Boost frequencies already.
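
The arithmetic behind these figures can be checked with a short sketch. It assumes the dual 256-bit FMA configuration argued for above (unconfirmed at the time of the post) and takes the thread's 16 flops per cycle for Ivy Bridge; the function and constant names are hypothetical.

```python
def peak_sp_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical single-precision peak = cores x clock (GHz) x flops per cycle."""
    return cores * clock_ghz * flops_per_cycle

LANES = 8  # 256-bit register / 32-bit float

# ASSUMPTION: separate 8-wide ADD and MUL units, one op each per cycle.
IVY_BRIDGE_FPC = LANES + LANES
# ASSUMPTION: two 8-wide FMA units, each FMA counted as 2 flops per lane.
HASWELL_FPC = 2 * LANES * 2

ivy_quad_4ghz = peak_sp_gflops(4, 4.0, IVY_BRIDGE_FPC)    # about 256 GFLOPS
haswell_quad = peak_sp_gflops(4, 3.9, HASWELL_FPC)        # about 499 GFLOPS
haswell_octo_4ghz = peak_sp_gflops(8, 4.0, HASWELL_FPC)   # about 1024 GFLOPS

# Clock a quad-core would need to reach exactly 500 GFLOPS under these assumptions.
clock_for_500 = 500.0 / (4 * HASWELL_FPC)

print(ivy_quad_4ghz, haswell_quad, haswell_octo_4ghz, clock_for_500)
```

So 3.9 GHz comes out a shade under 500 GFLOPS for a quad-core, and an 8-core part at 4 GHz would just clear 1 TFLOPS, both as paper peaks only.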

Last edited by Nick; 15-May-2012 at 06:57.
Old 15-May-2012, 08:52   #22
pjbliverpool
B3D Scallywag
 
Join Date: May 2005
Location: Guess...
Posts: 5,933

Cheers Nick, it's a pretty exciting prospect. I kinda wish we could see Haswell in a console, just so we could see what such a monster would be capable of if fully utilised for games. Good point about 4 GHz too; I guess if AMD hit it then Intel will have little choice but to match for marketing reasons.
Old 15-May-2012, 15:12   #23
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,486

Quote:
Originally Posted by Nick View Post
AMD is clearly aiming to hit 4 GHz sooner rather than later. Regardless of superior IPC, the market will demand Intel to follow suit (or steal their thunder). Also for what it's worth 3.9 GHz would actually suffice for 500 GFLOPS out of a quad-core, and we're at 3.8 GHz Turbo Boost frequencies already.
A quick note, AMD has already hit 4 GHz base clock with a Bulldozer SKU. No thunder was to be had.
Old 15-May-2012, 16:28   #24
Nick
Senior Member
 
Join Date: Jan 2003
Location: Montreal, Quebec
Posts: 1,881

Quote:
Originally Posted by 3dilettante View Post
A quick note, AMD has already hit 4 GHz base clock with a Bulldozer SKU. No thunder was to be had.
Thanks for the heads up, I totally missed that. Unfortunately the FX-4170 is only a dual-module chip, not quad-module. It sacrifices cores for clock speed, and consumes a whopping 125 Watt. That's not much of a victory over Intel. It seems to me that they're merely putting out a feeler to get a sense of the market's response before we truly enter the 4 GHz era.

A quad-core Haswell chip would offer four times the peak FP throughput, and definitely consume less. Fortunately Piledriver looks like an improvement on the power consumption front, but AMD has to put AVX2 on the roadmap sooner rather than later to keep up.
Old 15-May-2012, 17:53   #25
3dilettante
Regular
 
Join Date: Sep 2003
Location: Well within 3d
Posts: 5,486

Quote:
Originally Posted by Nick View Post
Thanks for the heads up, I totally missed that. Unfortunately the FX-4170 is only a dual-module chip, not quad-module. It sacrifices cores for clock speed, and consumes a whopping 125 Watt. That's not much of a victory over Intel. It seems to me that they're merely putting out a feeler to get a sense of the market's response before we truly enter the 4 GHz era.
It's not so much a feeler as much as it's a case of an architecture that's supposed to run in that range.
If they had their way, there would have been an 8 core running several hundred MHz above where the 8150 is at.
The architecture is not able to overcome its many tradeoffs in per-clock performance until it does.

Quote:
A quad-core Haswell chip would offer four times the peak FP throughput, and definitely consume less. Fortunately Piledriver looks like an improvement on the power consumption front, but AMD has to put AVX2 on the roadmap sooner rather than later to keep up.
It still needs to be able to handle native AVX, much less AVX2.
I'm in a relatively sour mood with regards to AMD today, so I was going to snark that the "keep up" part was out of the question several years ago.
