Does anyone have a link to this white paper? I'd appreciate a gander.
I think I found the whitepaper you are referring to over on Calypto. I'll give it a read later tonight...
The first link to an overview of AMD's power optimizations is on this page. There's also an EE Times article linked there.
I haven't yet read the whole whitepaper, however, as that requires filling out additional personal data.
For some reason, the images on that page aren't loading for me now, but one was a graph of active flops showing increasing clock gating efficiency over the development period. A lot of the existing blocks improved a little, but the new block corresponding to the L2 interface has a massively higher total. It improves quite a bit, yet even afterwards it's still a big chunk of the active flops.
http://calypto.com/en/blog/2013/02/04/rtl-clock-gating-analysis-cuts-power-by-20-in-amd-chip/
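Since the article's headline figure is a ~20% power cut from clock gating, here is a minimal first-order sketch of why reducing the number of actively clocked flops maps roughly to power savings. The model and all numbers are my own illustration, not from the article:

```python
# First-order CMOS dynamic power per clocked element: P ~ a * C * V^2 * f.
# All values below (flop counts, capacitance, voltage, frequency) are
# made up purely to illustrate the proportionality.

def dynamic_power(active_flops, cap_per_flop_f, v_dd, freq_hz, activity=1.0):
    """Sum of alpha * C * V^2 * f over the actively clocked flops."""
    return active_flops * activity * cap_per_flop_f * v_dd**2 * freq_hz

baseline = dynamic_power(1_000_000, 1e-15, 1.0, 2e9)  # every flop clocked
gated    = dynamic_power(  800_000, 1e-15, 1.0, 2e9)  # 20% gated off

savings = 1 - gated / baseline
print(f"flop clock-power savings: {savings:.0%}")
```

The point is only that flop clock power scales linearly with how many flops toggle each cycle, which is why the graph in the article tracks active flops as its proxy; real chip-level savings depend on what share of total power the clock tree and flops represent.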
saw this posted at anandtech
http://www.planet3dnow.de/cgi-bin/newspub/viewnews.cgi?id=1361486916
The question is what kind of deal they made with Sony ... do they have the right to leverage that effort for PC APUs? I don't think Sony would like to see a laptop APU that could run an almost 1:1 port of PS4 games ... AMD might very well be forced to leave the PC side stuck with slow DRAM standards designed for expansion sockets.
It would be especially nice if PCs or tablets could leverage the nice bandwidth figures from the embedded GDDR5 configuration of the PS4, or even the more modest ones of Durango, which still dwarf the PC APU bandwidths commonly seen today.
4C/4T Jaguar narrowly beating 2C/4T Sandy Bridge at the same clock speed in a highly parallel test isn't very shocking.
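To put a rough number on that comparison: if four Jaguar cores roughly tie two Sandy Bridge cores with Hyper-Threading at equal clocks, you can back out an implied per-core throughput ratio. The ~25% Hyper-Threading uplift below is a common rule of thumb for well-threaded workloads, not a figure from the thread:

```python
# Back-of-the-envelope (my assumptions, not measured data):
# 2C/4T Sandy Bridge ~ 2 cores plus an assumed ~25% HT uplift,
# matched by 4 Jaguar cores at the same clock.

ht_uplift = 0.25
sb_core_equivalents = 2 * (1 + ht_uplift)   # 2C/4T ~ 2.5 core-equivalents
jaguar_per_core = sb_core_equivalents / 4   # 4 Jaguar cores needed to match

print(f"implied Jaguar per-core throughput vs. a SB core: {jaguar_per_core:.2f}x")
```

In other words, a tie in this test implies each Jaguar core delivers somewhere around 60% of a Sandy Bridge core at the same clock, which is why the result is unsurprising for a small low-power core.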
I think Cinebench 11.5 is largely dominated by SSE2 code. I'm not sure, though, whether it actually uses any packed instructions or just scalar ones; in the former case a roughly 30% IPC improvement would really be on the low side of expectations, otherwise it would be very good indeed. The E-350 scores 0.63 in Cinebench 11.5, and that's a 1.6 GHz dual core. I guess a quad Bobcat at the same 1.4 GHz clocks would score ~1.0?
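The ~1.0 guess follows from naive core-count and clock scaling with a haircut for imperfect multithreaded scaling. Spelling out the arithmetic (the 10% scaling loss at the end is my assumption, chosen only to show how you land near 1.0):

```python
# E-350: 2 Bobcat cores @ 1.6 GHz scoring 0.63 in Cinebench 11.5.
# Scale linearly to a hypothetical 4-core Bobcat @ 1.4 GHz.

e350_score, e350_cores, e350_clock = 0.63, 2, 1.6
target_cores, target_clock = 4, 1.4

linear = e350_score * (target_cores / e350_cores) * (target_clock / e350_clock)
print(f"linear scaling estimate: {linear:.2f}")      # ~1.10

# Assume ~10% lost to shared-resource contention (illustrative only):
print(f"with 10% scaling loss:   {linear * 0.9:.2f}")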
Yeah, this is a huge increase in IPC...it has to be.
I think that's not really a fair comparison (for power), since that Celeron is a power-deoptimized incarnation of Sandy Bridge (compared to the other ULV chips). The CB results look good considering the TDP...
I think the lowest performing Sandy Bridge part is the Celeron 847 (1.1 GHz, 17W), and it scores something like 0.42 for a single core, while Jaguar does 0.35 using a lot less power, I think.
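For a per-clock view of those single-core numbers: the thread doesn't state what clock the Jaguar 0.35 score was achieved at, so the 1.5 GHz below is purely my placeholder assumption for illustration:

```python
# Per-clock comparison of the single-core scores above.
# NOTE: the Jaguar test clock is NOT given in the thread; 1.5 GHz is assumed.

celeron_score, celeron_clock = 0.42, 1.1  # Celeron 847, single core, 1.1 GHz
jaguar_score, jaguar_clock   = 0.35, 1.5  # assumed clock!

sb_per_ghz  = celeron_score / celeron_clock
jag_per_ghz = jaguar_score / jaguar_clock

print(f"Sandy Bridge points/GHz: {sb_per_ghz:.2f}")
print(f"Jaguar points/GHz:       {jag_per_ghz:.2f}")
print(f"SB per-clock lead:       {sb_per_ghz / jag_per_ghz:.2f}x")
```

Under that assumed clock, Sandy Bridge still leads clearly per clock, which is consistent with Jaguar winning on power rather than IPC.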
It is shocking to me to be honest because Bobcat was nowhere near it. This would be like Atom suddenly being on par with the i3 in multithreading too.
And don't forget: especially for single-threaded tasks, integer IPC is more important than FPU IPC, and you really can't expect that much improvement there (well, at least I wouldn't think AMD stated those 15% for nothing...).
It's going to be interesting to see how the 19W 1.6 GHz quad-core (two-module) Piledriver stacks up against Jaguar (15W Kabini) in general-purpose code. According to the first benchmarks, Jaguar seems to be slightly ahead in vector throughput when all four cores are used (and the clocks are normalized). Factor in the clock difference: 1.65 GHz * 1.1 = 1.815 GHz for Jaguar (AMD slides say it will have 10% higher clocks compared to Bobcat) vs 1.6 GHz for Piledriver (when all cores are taxed the turbo will be off). I would estimate that the multithreaded (general-purpose) performance will be pretty close (because Jaguar has slightly higher clocks). In single-threaded code Piledriver will likely beat Jaguar handily thanks to the 2.4 GHz turbo (Jaguar has no turbo to match that clock increase). The module architecture will also help Piledriver in 1-2 thread scenarios, since each module will only run one thread and have exclusive access to all the shared resources (such as the L1 instruction cache and decode). Piledriver will likely win many application benchmarks, while Jaguar should be better in some games and CPU-heavy software... assuming the GPU performance is identical...

FWIW, it looks like the 2-module ULV Trinity part (A8-4555M) has been released. Unlike the 1-module version (A6-4455M, released ages ago) it didn't quite make it to 17W; instead it's a 19W part. Clocks are 1.6 GHz / 2.4 GHz, so the turbo clock should be higher than Jaguar's, but I don't know how often it's actually able to clock up that much.
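The clock arithmetic from the post above, spelled out (the 10% Jaguar clock bump over Bobcat's 1.65 GHz is taken from the AMD slides cited there; the A8-4555M clocks are from the same post):

```python
# Estimated all-core and single-thread clock ratios: Jaguar (Kabini)
# vs. the 2-module ULV Piledriver (A8-4555M).

bobcat_clock = 1.65
jaguar_clock = bobcat_clock * 1.10            # AMD slides: +10% over Bobcat
piledriver_base, piledriver_turbo = 1.6, 2.4  # A8-4555M base / turbo

print(f"Jaguar clock estimate:        {jaguar_clock:.3f} GHz")
print(f"all-core ratio (Jag/PD):      {jaguar_clock / piledriver_base:.2f}x")
print(f"single-thread ratio (PD/Jag): {piledriver_turbo / jaguar_clock:.2f}x")
```

That ~13% all-core clock edge for Jaguar versus a ~32% single-thread turbo edge for Piledriver is the quantitative core of the prediction above: close in multithreaded work, Piledriver ahead in lightly threaded work.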