AMD Shanghai die size disclosed; Fudzilla hints at 0MiB L3 chip

AMD displayed wafers of its 45nm Shanghai chip at CeBIT, and Hans de Vries took the opportunity to create a comparison picture between Shanghai and Nehalem. Surprisingly, their die sizes are just about identical, so AMD doesn't have the previously expected die size advantage. Or do they? Fudzilla claims that a 0MiB L3 version of Shanghai/Deneb, codenamed Propus, is also coming and that it should be more than 30% smaller than Nehalem.

Read the full news item
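To put rough numbers on that claim, here's a quick back-of-the-envelope check. The die size below is a placeholder (all the comparison picture tells us is that Shanghai and Nehalem are roughly the same size), so treat this purely as arithmetic, not as measured data:

```c
/* Back-of-the-envelope check of the "more than 30% smaller" claim.
 * The die size is a round placeholder, not a confirmed measurement:
 * the news only says Shanghai and Nehalem are about the same size. */
#include <stdio.h>

int main(void)
{
    double nehalem_mm2  = 250.0;              /* assumed round number */
    double shanghai_mm2 = nehalem_mm2;        /* "just about identical" per the news */
    double propus_mm2   = nehalem_mm2 * 0.70; /* ">30% smaller" upper bound */

    /* Area that would have to disappear along with the 6MiB L3. */
    double l3_mm2 = shanghai_mm2 - propus_mm2;

    printf("Propus would be at most %.0f mm^2\n", propus_mm2);
    printf("So the L3 region (plus whatever shrinks with it) would have to cover "
           "roughly %.0f mm^2, i.e. %.0f%% of the Shanghai die\n",
           l3_mm2, 100.0 * l3_mm2 / shanghai_mm2);
    return 0;
}
```

With that placeholder, "more than 30% smaller" implies the L3 region and whatever shrinks along with it account for something like 75 mm² or more of the Shanghai die. Whether that is realistic is exactly what better die shots should settle.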
 
Better late than never; I was a bit slow on reporting this... ;) Anyhow, if anyone wants to discuss the 0MiB L3 version of Shanghai (aka Propus), this is the place to do it. Nehalem vs Shanghai could still be discussed elsewhere if you prefer.
 
If Fudzilla's FUD is assumed to be correct, the 2.8GHz Shanghai may be a low-end chip. Maybe they are comparing a higher-end Nehalem against a lower-end Shanghai.
 
The die size advantage would only apply, of course, if AMD creates a mask set that physically omits the L3. A dead L3 would take up just as much space as a live one.

The extra design option is doable, with some amount of extra cost.
Hopefully AMD's designed the chip so that the L3 can be removed without leaving a big bubble in the memory pipeline where the L3 used to be (perhaps a few cycles for a hard-wired L3 tag miss?).
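Something like this is what I mean, purely as a sketch (the function names and the L3_PRESENT switch are invented here, not anything AMD has described):

```c
/* Sketch of an L2-miss path where the L3 stage can be configured out.
 * All names are invented for illustration; this is not AMD's pipeline. */
#include <stdio.h>
#include <stdint.h>

#define L3_PRESENT 0            /* 0 = Propus-style die with the L3 omitted */

static int l3_lookup(uint64_t addr)    { (void)addr; return 0; } /* stub: tag check */
static int memctl_fetch(uint64_t addr) { (void)addr; return 1; } /* stub: DRAM access */

static int service_l2_miss(uint64_t addr)
{
    if (L3_PRESENT && l3_lookup(addr))  /* normal path: check the L3 tags */
        return 0;                       /* hit in the L3 */
    /* With no L3 (or on a miss) the request falls straight through to the
     * memory controller, ideally without leaving a bubble where the L3
     * lookup used to sit. */
    return memctl_fetch(addr);
}

int main(void)
{
    printf("request went to %s\n",
           service_l2_miss(0x1000) ? "memory controller" : "L3");
    return 0;
}
```

The point is just that the fall-through should cost nothing extra, or at worst the fixed few cycles of a tag check that always misses.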

Nehalem couldn't compete all that well by dropping the L3, due to its core size.
Intel might not want to anyway, since Nehalem's smaller L2 means ditching the L3 would have a larger impact. It's also possible that Intel's L3 is designed to be higher performance than AMD's (not that Barcelona sets the bar very high, but anyway), so Nehalem might suffer more from its absence as well.

AMD used the L3 to reduce coherency traffic and enable improved multisocket scaling (edit: not the only reason, but one of them), so a Shanghai without L3 might reside on lower socket count server or single-socket systems where that functionality is not as relevant.
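As a toy example of that point (this is not the actual HyperTransport coherency protocol; the names and numbers are made up purely for illustration): a request that can be serviced by the shared on-die L3 never generates probes to the other sockets at all.

```c
/* Toy model of why a shared L3 helps multi-socket scaling.
 * Not the real HT protocol; names and numbers are invented. */
#include <stdio.h>
#include <stdint.h>

#define REMOTE_SOCKETS 3        /* e.g. a 4-socket box */

static int shared_l3_hit(uint64_t addr) { (void)addr; return 1; }  /* stub */

/* Returns the number of coherency probes sent over the HT links. */
static int read_request(uint64_t addr, int have_l3)
{
    if (have_l3 && shared_l3_hit(addr))
        return 0;               /* serviced on-die: nothing crosses the links */
    return REMOTE_SOCKETS;      /* miss: every other socket has to be probed */
}

int main(void)
{
    printf("probes with L3: %d, without L3: %d\n",
           read_request(0x2000, 1), read_request(0x2000, 0));
    return 0;
}
```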

There are cheaper Nehalem variants, at least in the desktop arena, where the multithreading and better design might still make AMD's life tough.
 
Once "Fusion" enters the fray, is CPU L3 desirable?

Jawed

Could be. I would think that any shared on-die cache accessible by all cores (including GPU) would be very beneficial to programmers. Could yield some interesting physical and graphical effects.
 
Fusion doesn't seem applicable to the concerns the L3 was supposed to address, though the L3's presence outside of the server market may have more to do with AMD's limited ability to design multiple cores.

A shared last-level cache is helpful with reducing coherency traffic, and any high capacity cache is very useful for servers, even if the L3 is slow.
Barcelona's L3 was less than impressive because it was slow and small.

The shared cache is helpful in some cases on the desktop, though the implementation is so dog slow in some cases that it was only marginally better than the shared FSB Core2 uses.
 
I'm thinking of coherency between CPU and GPU cores quite specifically as well as whether a GPU core would benefit in its own right in using L3 (bearing in mind that the GPU in a Fusion configuration is stuck with a miserly 10-25GB/s).

I'd hope by the time Fusion arrives the GPU performs dramatically better than 780G's GPU...

Jawed
 
3dilettante, Fudo has said explicitly there would be a chip without the L3. It won't just be disabled.
 
On whether a GPU core would benefit from the L3 in its own right: I dunno.
My expectations for the early iterations of Fusion are pretty low when it comes to the level of integration we can expect.

AMD's more conservative approach might mean they'll just slap a GPU core on-die or on-package with no real attempt at coherence.
Since the mobile chips that Fusion debuts on are also the first chips with on-die PCI-E, it might mean that the GPU will just sit on one side of the PCI-E bridge, which would probably rule out any coherency at all.

If the GPU can take advantage of the cache hierarchy, I hope there would be some way to control what is kept coherent. Only certain kinds of data would be expected to be shared between the CPUs and the GPU, and any other traffic would probably pollute the L3.
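Something like per-buffer coherency attributes is what I'm picturing; the flags and the allocator below are entirely hypothetical, just to show the kind of control that would keep GPU streaming traffic out of the L3:

```c
/* Hypothetical sketch of per-buffer coherency attributes.
 * The flags and allocator are invented for illustration; no actual
 * AMD/Fusion API is being described here. */
#include <stdio.h>
#include <stdlib.h>

enum buf_attr {
    BUF_COHERENT,       /* CPU-visible results: allowed into the shared L3    */
    BUF_STREAMING       /* textures, vertex streams: bypass the L3, GPU-side  */
};

struct gpu_buffer {
    void         *mem;
    size_t        size;
    enum buf_attr attr;
};

static struct gpu_buffer alloc_gpu_buffer(size_t size, enum buf_attr attr)
{
    struct gpu_buffer b = { malloc(size), size, attr };
    return b;
}

int main(void)
{
    /* Results the CPU reads back would be kept coherent... */
    struct gpu_buffer results  = alloc_gpu_buffer(1 << 20, BUF_COHERENT);
    /* ...while bulk GPU-only traffic is marked to bypass the cache so it
     * doesn't pollute the L3. */
    struct gpu_buffer textures = alloc_gpu_buffer(64 << 20, BUF_STREAMING);

    printf("results coherent=%d, textures coherent=%d\n",
           results.attr == BUF_COHERENT, textures.attr == BUF_COHERENT);
    free(results.mem);
    free(textures.mem);
    return 0;
}
```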

As for Fudo saying explicitly that there will be a chip without the L3, not just one with it disabled: I think this makes sense, but I'm waiting for more data or confirmation.

An L3-free 65nm Phenom would have made sense too, but it hasn't happened for some reason.
 
I don't know if you would want your GPU to compete with your CPU for L3 cache if you're not sharing much data. Current GPUs don't use a cache-coherent connection to the CPU anyway, right? So unless it is something more integrated than just a CPU and GPU together on one package/die, I don't see why you would want to unify the caches. You'd get more bandwidth if the GPU had its own cache anyway.

Edit: oops, didn't notice that 3dilettante pretty much said everything I wanted to say already :p
 
But I do remember reading somewhere just a couple of days ago that an AMD guy said what they are doing is much more than slapping two cores in one package. Who knows, it could be just marketing talk. I also recall a recent Fusion slide that showed three squares representing cores (2 CPU, 1 GPU) with a big rectangular chunk of cache underneath them.
 
aha! found it

“You can integrate a CPU and a GPU by having an internal PCI-E bus,” said Hester. “But we’re trying to do a much tighter integration so that we get the best possible power efficiency. Putting more and more cores that use up more power but don’t change the user experience is not a good thing.” This tighter integration apparently involves having all the accelerators on one die.

http://channel.hexus.net/content/item.php?item=12024&search=AMD%20Fusion
 
Now we just need to know if he means Swift or he's saying their eventual goal is to have higher integration.

AMD's being forced to be more conservative about platform introductions makes me wonder if they can resist the temptation to slap the GPU on PCI-E, or whether another design delay will leave them with little choice.
 
I only got the last link to load, but anything dealing with the APU stuff is long-term.
The conceptual drawings are not exact enough to indicate much about what the first Fusion product will look like.

AMD previously set out a pretty gradual route for Fusion, starting with mostly separate components and becoming more integrated over time.
 