nutball said:
Maybe not in verification, but IBM regularly say (to me at least!) that adding SMT to POWER5 increased the transistor count by ~10%, so it's a *lot* cheaper than dual-core in transistor terms. In some codes I run it boosts performance by 30%, so it's potentially a big win.
The number I've seen for POWER5 is 24% extra area to make SMT useful. That's still far less than duplicating an entire core, and IBM claims the increase in throughput is larger than the increase in die area, which is good.
But it isn't stellar.
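To put the two quoted figures side by side (a rough sketch; the 24%, 30% and 100% numbers are the ones quoted above, not measurements of mine):

```python
# Figures quoted above: SMT on POWER5 costs ~24% extra core area
# (IBM's number; nutball quotes ~10%) and can buy ~30% more
# throughput on some codes. A second core costs ~100% extra core
# area for at most ~100% more throughput.
smt_area, smt_gain = 0.24, 0.30
dual_area, dual_gain = 1.00, 1.00

# Throughput gained per unit of extra area spent.
print("SMT:", round(smt_gain / smt_area, 2))    # better than 1.0
print("Dual-core:", round(dual_gain / dual_area, 2))
```

By this crude measure SMT pays off per transistor, but only on codes that actually see the ~30% gain.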
SMT is implemented to make better use of the execution units, but a lot of other resources have to be beefed up as well: rename registers, buffers, the D$ and the I$ all see increased pressure unless they are reworked to sustain multiple contexts.
Take a look at the P4 Northwood -> Prescott transformation. The die grew much more than the 32bit -> 64bit execution units and data lanes could account for. The thing is that Intel made a real effort to make SMT worthwhile. This meant increasing the number of rename registers from 128 to 256, doubling (or more, I forget) the number of write combine buffers, store buffers, etc. Most importantly, they doubled the D$ from 8KB to 16KB to reduce thrashing when two contexts are active.
Increasing these structures did not come without a cost: the D$ load-to-use latency went from 2 to 4 cycles, which has a very serious impact on single thread performance; the doubling of the D$ was there in part to alleviate this increase in latency.
Also, to maintain clock frequency with these larger structures, the new core had to be more deeply pipelined. This had the negative impact of a bigger branch mispredict penalty, so the branch predictor had to be beefed up to keep mispredictions fewer.
All in all this added up to a big increase in die area.
If you look at these slides from an AMD presentation you get a good feel for the breakdown of a modern CPU. An Opteron consists of:
1. FP exec units, FP registers+scheduler: 5%
2. LS (load/store): 4%
3. Integer exec units/register+scheduler: 4%
4. x86 decode: 2%
5. Branch prediction: 2%
6. Branch unit: 1%
7. D$: 6%
8. I$: 4%
9. Northbridge: 5%
10. L2: 42%
11. I/O: 25%
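As a sanity check, the breakdown above can be summed up (percentages are the ones from the slides as quoted; note the core items actually sum to 28%, which the post rounds to ~30%):

```python
# Approximate Opteron die-area breakdown, in percent, as quoted
# above (originally from an AMD presentation).
breakdown = {
    "FP exec units + regs/scheduler": 5,
    "Load/store": 4,
    "INT exec units + regs/scheduler": 4,
    "x86 decode": 2,
    "Branch prediction": 2,
    "Branch unit": 1,
    "D$": 6,
    "I$": 4,
    "Northbridge": 5,
    "L2": 42,
    "I/O": 25,
}

# Everything except Northbridge, L2 and I/O counts as "core".
uncore = {"Northbridge", "L2", "I/O"}
core = sum(v for k, v in breakdown.items() if k not in uncore)
print("Total:", sum(breakdown.values()), "%")  # 100 %
print("Core (with L1 caches):", core, "%")     # 28 %
```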
The core (with L1 caches) only takes up about 30% of the total die area (the items above sum to 28%). To implement SMT you'd need to increase the size of the following:
1. FP registers and preferably the size of the scheduler as well
2. Write combine buffers in the LS
3. INT registers and scheduler
7. D$
8. I$
Let's say 1.) grows by 50%, 2.) doesn't really grow, 3.) grows by 50%, 7.) by 100% (actually both size and associativity should be doubled to avoid a negative impact, so it could be even worse) and 8.) by 100%. That would make the core roughly 50% (42.5/28) larger than today. And what you get in return is a potential doubling of throughput and a guaranteed negative impact on single thread performance.
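Writing that back-of-the-envelope estimate out explicitly (the growth factors are the guesses from the paragraph above, not measured data; the per-component areas sum to 28%, which is the baseline used here):

```python
# (area % of total die, assumed SMT growth factor) per core component.
# Growth factors are the guesses from the text, not measurements.
core_area = {
    "FP regs/scheduler":  (5, 0.5),
    "Load/store":         (4, 0.0),
    "INT regs/scheduler": (4, 0.5),
    "x86 decode":         (2, 0.0),
    "Branch prediction":  (2, 0.0),
    "Branch unit":        (1, 0.0),
    "D$":                 (6, 1.0),
    "I$":                 (4, 1.0),
}

old = sum(a for a, _ in core_area.values())          # 28 (% of old die)
new = sum(a * (1 + g) for a, g in core_area.values())  # 42.5
print(f"core grows from {old}% to {new}% of the old die size: "
      f"+{new / old - 1:.0%}")
```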
I'm fairly certain that AMD has chosen not to implement SMT for these reasons, and has instead put multiple cores on a die. Because of the smaller core and the reduced complexity they get a time-to-market advantage.
Multiple threads/cores make a lot of sense as long as there is parallelism to exploit, but once you reach the point of diminishing returns Amdahl's law makes single thread performance matter again.
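On the Amdahl's law point, a hypothetical example: if 80% of a workload is parallelizable, speedup is capped at 1/(1-0.8) = 5x no matter how many threads you throw at it:

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Upper bound on speedup with n cores/threads when only
    parallel_fraction of the work can run in parallel."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)

# Hypothetical workload: 80% parallel, 20% serial.
for n in (1, 2, 4, 8, 1000):
    print(f"{n:5d} threads -> {amdahl_speedup(0.8, n):.2f}x")
```

Past a handful of threads the serial 20% dominates, which is exactly where single thread performance starts to matter again.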
So AMD's choice makes perfect sense, but then again so does Intel's and in particular IBM's.
Cheers