Well, if there is a real Power7 (not in the marketing sense), then there might be 3MB of L3 (eDRAM).
Well, they've also said it has quite a lot of eDRAM, so it might be used as L3 cache?
liolio said: Damn, any news about the Jaguar cores? AMD's conference at Hot Chips was this morning, 8:45-10:15 West Coast time. Still nothing in the SMS.
EDIT
Oops, found that just after I posted:
http://semiaccurate.com/2012/08/28/a...e-jaguar-core/
Not the most reliable source though.
Where was FMA mentioned? In the slides, there are still separate FPMUL and FPADD units.
Hot Chips information about both the Steamroller and Jaguar cores is out.
I have to say that the Jaguar cores look really good indeed. Whether they are good for a console is another matter though.
AMD promises:
10-15% improvement to IPC.
Clock speed up by 10% within the same power budget.
Big boost in FP performance, more than 2x. The SIMD units are now 4-wide (twice as wide as Bobcat's) and FMA is supported.
I'd expect that you can have more than 4 cores; it's just that a single 2MB slice of L2 will service 4 of them. To have more, you'd have to have more L2.
It looks like a really good one from AMD. The only thing I can see holding back its use in a pretty high performance console is the limitation to 4 cores.
Hot Chips information about both the Steamroller and Jaguar cores is out.
I have to say that the Jaguar cores look really good indeed. Whether they are good for a console is another matter though.
AMD promises:
10-15% improvement to IPC.
Clock speed up by 10% within the same power budget.
Big boost in FP performance, more than 2x. The SIMD units are now 4-wide (twice as wide as Bobcat's) and FMA is supported. Big boost vs Bobcat.
The cache hierarchy looks really good too, with the shared 2MB of L2.
This irks me a bit every time it comes up. Modern approaches to OOOe are not features that can be plugged onto existing designs. I doubt any of Intel, AMD or IBM would design an OOOe chip that didn't use a PRF, and when you design a chip that uses a PRF, the PRF is literally the first thing that gets added to an empty design; everything else is designed around it. Any OOOe Atom will share very little, if any, design with the existing Atom -- it would, in fact, be a completely new chip. Given how bad a rep Atom has with consumers, when Intel finally replaces it, I doubt the new one would even be called Atom.
I may indeed have misread that slide. It's weird though: they say they can do 4 multiplies and 4 adds, so it looked like it would be at the same time. It's indeed misleading.
Where was FMA mentioned? In the slides, there are still separate FPMUL and FPADD units.
That would be nice if doable; I honestly don't know enough to say one way or another.
I'd expect that you can have more than 4 cores; it's just that a single 2MB slice of L2 will service 4 of them. To have more, you'd have to have more L2.
Given that the L2 acts as a snoop filter, doing 2*4 multicores should not be too hard. Of course, this would make it a little less of a clean design to program for -- sharing data through the L2 would likely be much faster than hitting a core in the other cluster, so at the very least you want to treat it as a NUMA machine and design your programs to minimize cross-cluster sharing.
Well, I was just trying to match the rumors and early documents that hinted at 6/8 cores running at around 2GHz.
Whether anyone wants to use the die area for 8 cores instead of more GPU is a whole other business. I'd expect that a 4-core Jaguar at 2GHz would be ~3-4 times faster than Xenon. I don't know about you, but at that point I'd be very tempted to spend the silicon on more GCN CUs instead.
Well, for me it turned out better than expected; I was hoping it would be good, and not only for console considerations.
Considering this is the next gen console thread, I'd call it thoroughly disappointing.
I was holding out hope for a surprise, but sadly it's as expected.
Kinda makes you wonder just how poorly a stripped down bulldozer tested out to make this look like an attractive option.
It's PR speak. Watson uses 3.5 GHz 8-core POWER7s. That'll be the 710 going by Wikipedia, with 32 MB of eDRAM L3. Big, hot, expensive chip designed for supercomputers and servers, with lots of features useless for a console. It makes no sense to put a POWER7 in a small home console. The CPU is pretty certainly something other than the same CPU as in Watson. Ergo it's 'Power 7 architecture', which could mean almost anything.
The "same processor tech" statement is ambiguous, but the "same #power7 chips" one is not.
It is a dual issue architecture; it has two FPU/SIMD pipes, so it can do separate MUL and ADD instructions at the same time. There is nothing misleading.
I may indeed have misread that slide. It's weird though: they say they can do 4 multiplies and 4 adds, so it looked like it would be at the same time. It's indeed misleading.
They also said that they support AVX operations, and AVX supports FMA, so my brain took a shortcut...
BG's source said his source was the one saying Durango is a 1+ GFLOPS GPU, not 1.1-1.5.
They can do multiplies and adds at the same time -- as separate instructions on separate registers, issued to separate execution units. This is different from doing FMA, where you do an add immediately on the result of the multiply and a third register. Which is better? It's complicated. Separate units often have lower latencies for FADD, which helps in some situations, but longer latencies when you are doing FMA, which is what a lot of the vector math you do will be all about.
It's weird though, they say they can do 4 multiply and 4 adds, so it looks like it would have been at the same time.
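To make the "separate MUL and ADD" versus FMA distinction above a bit more concrete, here is a minimal Python sketch. It is not tied to Jaguar or any AMD document: it only illustrates the general IEEE 754 behaviour that a separate multiply followed by an add rounds twice, while a fused multiply-add rounds once. The function names are made up for illustration, and the fused version is emulated with exact rational arithmetic rather than a hardware instruction.

```python
from fractions import Fraction


def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the add."""
    product = a * b       # first rounding happens here
    return product + c    # second rounding happens here


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulated FMA (illustrative only): exact a*b + c, rounded once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)   # single rounding at the very end


# Inputs chosen so that the exact product 1 - 2**-60 rounds to 1.0 in a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(mul_then_add(a, b, c))        # 0.0, the 2**-60 term was rounded away
print(fused_multiply_add(a, b, c))  # -2**-60, kept because rounding came last
```

This only shows the numerical side of the difference; the latency trade-off described in the post is a property of the hardware pipelines and is not modelled here.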
The hard part of putting more cores in a system is cache coherency, or snooping. Every time you write to a cache line for the first time, or read in a new cache line, you effectively need to ask every cache in the system if they have that line. In older cache systems, that's literally what happened. As you add more cores, that doesn't scale.
That would be nice if doable, I honestly don't know enough to say one way or another.
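As a rough illustration of why asking every cache "doesn't scale", here is a toy count of probe messages in Python. Everything in it is an assumption for the sake of the sketch: the function names, the fixed number of misses per core, and the idea that a shared L2 acting as a snoop filter answers for its whole cluster with a single lookup. It is not a model of Jaguar's actual coherency protocol.

```python
def broadcast_probes(n_cores: int, misses_per_core: int) -> int:
    """Old-style snooping: every miss asks every other core's cache."""
    return n_cores * misses_per_core * (n_cores - 1)


def filtered_probes(n_cores: int, cores_per_cluster: int,
                    misses_per_core: int) -> int:
    """Toy snoop-filter model: one lookup per cluster's shared L2 per miss."""
    n_clusters = n_cores // cores_per_cluster
    return n_cores * misses_per_core * n_clusters


for cores in (4, 8, 16):
    print(cores,
          broadcast_probes(cores, 1000),
          filtered_probes(cores, 4, 1000))
```

In this toy model the broadcast count grows roughly with the square of the core count, while the filtered count grows with cores times clusters, which is the intuition behind a 2*4 arrangement being manageable.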
FYI, hardware.fr (so in French) states that the L2 interface can deal with 24 read/write operations simultaneously. I don't know if that would be a limitation when adding cores, or at which point it would start to become a bottleneck.
As a side note from your post I can tell that you have a positive view of this architecture
So hypothetically:
200mm² SoC:
8 Jaguars @ 2GHz, ~64 GFLOPS of AVX throughput
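For anyone wanting to check the arithmetic behind a figure like that, peak throughput is just cores x clock x FLOPs per cycle per core. The sketch below is a back-of-the-envelope helper with a made-up function name; the per-cycle values are assumptions, not confirmed numbers. 4 FLOPs per cycle reproduces the ~64 GFLOPS above, while counting 4 multiplies plus 4 adds per cycle, as discussed earlier in the thread, would give double that.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS; real code will land well below this."""
    return cores * clock_ghz * flops_per_cycle


# Assumption: 4 FLOPs per cycle per core, which matches the ~64 figure.
print(peak_gflops(8, 2.0, 4))  # 64.0

# Assumption: 4 multiplies + 4 adds issued every cycle per core.
print(peak_gflops(8, 2.0, 8))  # 128.0
```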
Acert93 seems to suggest it will be far more powerful. We don't know how old BG's info is.