Barcelona overview article at RWT

http://arstechnica.com/articles/paedia/cpu/will-barcelona-cure-what-ails-amd.ars

Jon Stokes's view of AMD seems accurate to me, especially after reading the RealWorldTech article.
Jon's article is more or less based on David Kanter's performance summary and conclusions.

This is where the micro-op/macro-op question comes into the picture. David uses Intel's terminology throughout, so macro-ops are the x86 instructions and micro-ops are the low-level RISC-like ones. For AMD, by contrast, macro-ops are already lower-level ops, each of which holds up to two micro-ops.

So, for example, the "µops" on the diagrams should be read as macro-ops in AMD's terms, i.e. up to twice that many micro-ops. That gives 3-6 micro-ops per clock for Barcelona, versus 4 on the C2D side. Taking Intel's micro-op fusion into account, C2D's figure is really "up to 5" (two of them fused), according to Agner. All of this holds for the retirement stage, too. There are other things to consider as well: for example, the "6 instructions" below C2D's predecoder is really "up to 6 instructions" (because of the size of the buffer), and 3 of C2D's 4 decoders are simple ones that can emit only one micro-op at a time. So there are several limiting factors here and there where Barcelona is simply better.
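The counting argument above can be sketched as a few lines of arithmetic. This is a minimal sketch that only encodes the per-clock figures quoted in this post (3 AMD macro-ops with up to two micro-ops each; 4 Intel uop slots, one of which may carry a fused pair, per Agner as quoted). The constant and function names are hypothetical, and nothing here models real pipeline behavior.

```python
# Peak per-clock figures as quoted in this thread; not vendor-verified numbers.
BARCELONA_MACRO_OPS = 3       # AMD macro-ops per clock
MICRO_OPS_PER_MACRO_OP = 2    # each AMD macro-op holds up to two micro-ops
C2D_UOP_SLOTS = 4             # Core 2 uops per clock (Intel terms)
C2D_FUSED_PAIRS = 1           # "up to 5 (two of them fused)": one slot carries two


def barcelona_micro_op_range(macro_ops: int, per_macro_op: int) -> tuple[int, int]:
    """Min/max micro-ops per clock if each macro-op carries 1..per_macro_op micro-ops."""
    return macro_ops, macro_ops * per_macro_op


def c2d_micro_op_peak(slots: int, fused_pairs: int) -> int:
    """Each fused pair occupies one slot but counts as two micro-ops."""
    return slots + fused_pairs


if __name__ == "__main__":
    lo, hi = barcelona_micro_op_range(BARCELONA_MACRO_OPS, MICRO_OPS_PER_MACRO_OP)
    print(f"Barcelona: {lo}-{hi} micro-ops/clock")  # Barcelona: 3-6 micro-ops/clock
    print(f"Core 2: up to {c2d_micro_op_peak(C2D_UOP_SLOTS, C2D_FUSED_PAIRS)}")  # Core 2: up to 5
```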

All in all, I think the RWT article's remark that C2D is 33% wider because of the "4 vs. 3 µops" figure is not really accurate.
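The "33% wider" figure is just the ratio 4/3; computing the same ratio on the micro-op counts argued in the previous paragraph flips the sign. A small sketch, with a hypothetical helper name and the thread's own figures as the only inputs:

```python
def relative_width(a: float, b: float) -> float:
    """Fraction by which width a exceeds width b (0.33 means 33% wider)."""
    return a / b - 1.0


# RWT's comparison: 4 Intel uops vs. 3 AMD macro-ops per clock.
print(f"C2D vs. Barcelona, block-diagram units: {relative_width(4, 3):+.0%}")  # +33%
# The recount in micro-ops: up to 5 (C2D) vs. up to 6 (Barcelona).
print(f"C2D vs. Barcelona, micro-ops: {relative_width(5, 6):+.0%}")  # -17%
```

As noted elsewhere in the thread, neither chip gets near these peaks on real code, so the ratio says little about actual IPC.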

And so the conclusion that Barcelona will probably be ahead only in multithreaded server applications, and not in single-threaded (usually desktop) applications, is IMHO questionable. Okay, David probably counted on the 2.5 GHz figure from earlier roadmaps. But the more recent, slipped roadmaps show launch clocks 400-500 MHz higher...
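To put that roadmap change in perspective: against the 2.5 GHz baseline, 400-500 MHz is a 16-20% clock increase. A minimal sketch of the arithmetic (the helper name is hypothetical); it assumes performance scales at most linearly with clock, and memory-bound code usually scales less than that.

```python
BASE_CLOCK_GHZ = 2.5  # earlier-roadmap figure discussed above


def relative_clock_gain(base_ghz: float, bump_mhz: float) -> float:
    """Fractional clock increase when base_ghz is raised by bump_mhz."""
    return (bump_mhz / 1000.0) / base_ghz


for bump in (400, 500):
    gain = relative_clock_gain(BASE_CLOCK_GHZ, bump)
    print(f"+{bump} MHz -> {gain:.0%} higher clock")
# +400 MHz -> 16% higher clock
# +500 MHz -> 20% higher clock
```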
 
I doubt Kanter decided solely from the wider peak retirement rate that Conroe would lead in single-threaded performance. Both AMD and Intel have admitted that their chips are doing fantastically if they break an IPC of 1 for most code.

Conroe is more aggressive at pursuing single-threaded performance than Barcelona. For desktop loads, its cache is better (offset in various scenarios by AMD's IMC), its memory reordering is more aggressive, and its clock speeds are higher.

A possibly more telling source of information on the Barcelona derivatives is AMD's silence when it comes to benchmarks of single-threaded performance.

They wouldn't only be showing Specrate numbers if the single-threaded numbers were also good.
 
Does anyone else wonder if Conroe was actually intended as primarily a mobile chip?

After they figured out Tejas wasn't going to work out, they quickly changed a few things (a higher FSB and perhaps higher voltage for more clock speed), but it's still an awful lot like Yonah when you consider things. Obviously Conroe had probably been in development for years before Tejas was known to be a flop....
 

Pentium M was heavily based on the P6 (Pentium Pro to P3) architecture, and Core Duo/Yonah was just an evolution of that. Conroe takes a "more of everything" approach to it, with some nifty little changes here and there. So it is certainly derived from a mobile family. I have my doubts that Conroe has been in development for longer than Tejas. I think Conroe was a mixture of needed enhancements that were already in the works and a really beefed-up Yonah. I get a weird feeling saying this, but I think Conroe has some advancements that would have gone into what was originally going to be Nehalem, which itself was originally the successor to Tejas. That is all just a lot of crazy stuff, though; super chips and robot overlords come to mind when saying that...

Conroe represents what I believe will become the norm for CPU production in the mainstream markets: chips designed for the mobile market first, with derivatives then moved to the desktop market.
 
I doubt Kanter decided solely from the wider peak retirement rate that Conroe would lead in single-threaded performance. Both AMD and Intel have admitted that their chips are doing fantastically if they break an IPC of 1 for most code.

He didn't. Indeed, he wrote the same as you:
"While it does appear that the Core 2 is 33% wider than Barcelona, in reality, neither processor comes close to peak capabilities on real code, so the performance will be much closer than the block diagrams imply. Barcelona's 3-wide issue, execute and retire capabilities are not a performance problem."
Seems I remembered this part wrong, sorry. He also mentioned clock rate as an important factor here, i.e. how high AMD could scale it, and I think he counted on 2.5 GHz, while according to more recent information it will reach 2.8 GHz and beyond.

Thing is, Conroe's not really wider. It's up to 5 micro-ops wide (two of them fused), according to Agner, while Barcelona is up to 6 micro-ops wide (in AMD's terms). Granted, that alone rarely counts for much. But there are other differences that help Barcelona increase its average IPC, especially in taking advantage of its 128-bit FPUs, where it is certainly better than Conroe.

Conroe is more aggressive at pursuing single-threaded performance than Barcelona. For desktop loads, its cache is better (offset in various scenarios by AMD's IMC), its memory reordering is more aggressive, and its clock speeds are higher.

This is perhaps true for integer performance, but I wouldn't say the same about SIMD performance. So Barcelona could still be better in, e.g., media processing, which is indeed a desktop application.

A possibly more telling source of information on the Barcelona derivatives is AMD's silence when it comes to benchmarks of single-threaded performance.

They wouldn't only be showing Specrate numbers if the single-threaded numbers were also good.

Possibly.
 
This is perhaps true for integer performance, but I wouldn't say the same about SIMD performance. So Barcelona could still be better in, e.g., media processing, which is indeed a desktop application.

This seems possible, though Conroe and Barcelona have different advantages and disadvantages.
Conroe's peak SSE capability is twice Barcelona's, though this requires a pretty specific mix of instructions.
Barcelona has a number of measures that allow for theoretically higher sustained performance over a wider set of circumstances.

Per-clock, it may very well be that Barcelona is capable of a higher sustained level of performance, but it's not just per-clock that matters.
Vector performance has done well on highly-clocked architectures.
The P4 was very good at SSE.
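The per-clock versus clock-speed trade-off can be made concrete with a crude model: throughput is roughly IPC × frequency, so a lower-clocked core needs an IPC advantage equal to the clock ratio just to break even. A minimal sketch; the helper names and the 3.0 GHz comparison clock are hypothetical, chosen only for illustration, and real workloads won't scale this cleanly.

```python
def throughput(ipc: float, clock_ghz: float) -> float:
    """Instructions per nanosecond under a crude IPC x frequency model."""
    return ipc * clock_ghz


def breakeven_ipc_ratio(low_clock_ghz: float, high_clock_ghz: float) -> float:
    """IPC ratio the lower-clocked core needs to match the higher-clocked one."""
    return high_clock_ghz / low_clock_ghz


ratio = breakeven_ipc_ratio(2.5, 3.0)  # hypothetical 2.5 GHz vs. 3.0 GHz parts
print(f"needs {ratio - 1:.0%} more IPC")  # needs 20% more IPC
# At exactly that ratio, throughput(ratio, 2.5) equals throughput(1.0, 3.0).
```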

Intel is likely sandbagging on one or two speed grades for Core2.
 

I think we can be sure that Intel is sandbagging on more than a few speed grades on the Yonah platform. Even the original Core Duo (non-"2" models) would overclock pretty handily; the C2Ds are seeing overclocks into the stratosphere, relatively speaking. Even at stock voltage you can see C2Ds doing overclocks of more than 25%, even on the top-of-the-line Quad EE models.

As much as I'd love to have the competition to keep prices in check, I still don't think Barcelona is going to be performance-competitive with Intel's products during its release timeframe.
 