Fusion die-shot - 2009 Analyst Day

4 cores is >440M transistors according to the presentation.

I think they want to design a ~200 sq. mm die, which points towards Redwood too, but with a 4MB L3.

4 cores = 440M transistors & 71 sq. mm.

They have already stated one core is 9.69 mm^2. That's ~40mm^2 for 4 cores.

This might, however, be the number without L2 cache; the L2 for each core is about 60% of the core size, which would make it roughly (10 + 6) * 4 = 64 sq. mm.

4MB L3 = ~200M transistors & ~36 sq. mm (based on L2 density, with the power-gating ring added).
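A quick back-of-the-envelope check of those numbers, as a Python sketch; the 9.69 mm² core, the ~60% L2-to-core ratio, and the L3-at-L2-density figure are this thread's assumptions, not official AMD data:

```python
# Sanity-check of the die-budget arithmetic above. All inputs are the
# thread's assumptions, not official AMD figures.

CORE_AREA = 9.69        # mm^2, stated size of one Llano core without L2
L2_RATIO = 0.60         # assumed: per-core L2 is ~60% of the core area
CORES = 4

cores_only = CORES * CORE_AREA                       # ~39 mm^2
cores_plus_l2 = CORES * CORE_AREA * (1 + L2_RATIO)   # ~62 mm^2 (post rounds to 64)

# 4MB of 6T SRAM: 4 * 2^20 bytes * 8 bits/byte * 6 transistors/bit
l3_transistors = 4 * 2**20 * 8 * 6                   # ~201M, matching the "200M"

print(f"4 cores, no L2:  {cores_only:.1f} mm^2")
print(f"4 cores + L2:    {cores_plus_l2:.1f} mm^2")
print(f"4MB L3:          {l3_transistors / 1e6:.0f}M transistors")
```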

There is no L3 cache in Llano.
 
1920x1080 is quite large; I believe you would need 24MB of eDRAM without AA and 40MB with AA. But devs could use multiple render targets, do compositing back and forth, and tank the hard-earned performance bought with a cache similar to that of a Nehalem-EX or POWER7.
 
Unless you can fit the entire framebuffer, there is nothing to be gained from a large cache ... just putting it in for that 1 developer out of 100 who would tile specifically for it is hardly worth it.
 
And with MRTs it gets worse. Fast.
 
Come to think of it, a 2560x1600 frame with four float4 render targets consumes 250MB, without AA. :oops:

That 2Gbit GDDR5 mem module is looking pretty good now with MCM, ain't it?
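The quoted buffer sizes can be reproduced in a few lines of Python; the per-mode surface counts (front + back + Z, 2x samples for the AA case) are my assumption about what was being counted, not the posters' stated breakdown:

```python
# Reproducing the "24MB / 40MB / 250MB" framebuffer estimates above.

def surface_mib(width, height, bytes_per_pixel, samples=1):
    """Size of one render surface in MiB."""
    return width * height * bytes_per_pixel * samples / 2**20

# 1920x1080, 32-bit color, 32-bit Z:
no_aa = 3 * surface_mib(1920, 1080, 4)        # front + back + Z: ~24 MiB
aa_2x = (surface_mib(1920, 1080, 4)           # front buffer
         + 2 * surface_mib(1920, 1080, 4, samples=2))  # 2x back + 2x Z: ~40 MiB

# 2560x1600 with four float4 (16 bytes/pixel) render targets, no AA:
mrt = 4 * surface_mib(2560, 1600, 16)         # exactly 250 MiB

print(f"1080p, no AA:      ~{no_aa:.0f} MiB")
print(f"1080p, 2x AA:      ~{aa_2x:.0f} MiB")
print(f"2560x1600, 4 MRTs: {mrt:.0f} MiB")
```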
 
In another forum, they came up with ~200 - 220 mm² for the Llano APU too. At first I again thought that this is too much, but then I searched for a picture of the Propus core. Propus has a die area of 169 mm², but each core has a size of only ~23.5 mm² (including 512KB L2 cache), so the complete die is 7 - 7.5 times bigger than a single core. It could be the same with Llano: with each core at ~16.7 mm² incl. the L2 cache, the CPU side of Llano could come to 120 mm², which would leave ~80 mm² for the GPU side. A quick sketch of that extrapolation follows below.
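```python
# The Propus-ratio extrapolation above: assume the whole-die to
# single-core area ratio carries over from Propus to Llano. The 16.7 mm^2
# per-core figure and the ~200 mm^2 die target are this post's assumptions.

propus_die = 169.0     # mm^2, Propus die area
propus_core = 23.5     # mm^2, one Propus core incl. 512KB L2
ratio = propus_die / propus_core     # ~7.2x

llano_core = 16.7      # mm^2, assumed: 9.69 mm^2 core + 1MB L2
llano_die = 200.0      # mm^2, die-size target discussed in the thread

cpu_side = llano_core * ratio        # ~120 mm^2
gpu_side = llano_die - cpu_side      # ~80 mm^2

print(f"die/core ratio: {ratio:.1f}x")
print(f"CPU side: ~{cpu_side:.0f} mm^2, GPU side: ~{gpu_side:.0f} mm^2")
```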

We will see how it plays out.
 
Nonsense!

Again, at the stated 9.69 mm² for the CPU core alone, a simple die-shot image scaling shows that the total area of Llano is anything but a sub-200 mm² part (223~225 mm²).
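In outline, the die-shot scaling works like this; only the 9.69 mm² reference comes from the thread, and the pixel measurements below are hypothetical placeholders, not real measurements off the shot:

```python
# Die-shot scaling: measure the known 9.69 mm^2 core in pixels, derive
# mm^2 per pixel^2, then scale the full die. The pixel numbers below
# are hypothetical placeholders chosen to reproduce the ~224 mm^2 figure.

core_area_mm2 = 9.69
core_w_px, core_h_px = 120, 150    # hypothetical core bounding box
die_w_px, die_h_px = 640, 650      # hypothetical full-die bounding box

mm2_per_px2 = core_area_mm2 / (core_w_px * core_h_px)
die_area = die_w_px * die_h_px * mm2_per_px2

print(f"Estimated die area: ~{die_area:.0f} mm^2")   # ~224 with these numbers
```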
 
"There is no L3 cache in Llano."

According to...?

As far as I know, they stated it would have "4MB cache", but that's from a 2008 roadmap, and we've not seen any full die shot that doesn't seem to be a fake.

Hell, even AMD's slide seems to show a fake die shot combining Istanbul and RV770.

Die size should be in the 150-250 sq. mm range, if you look at past CPUs from both AMD and Intel.

L3 is quite cheap, and without it the IGP could lack bandwidth.
 
AMD usually sums the L2 arrays and states the total size in various presentations, which technically is incorrect, as none of the CPU cores can access another's L2 cache directly (that's where the coherency traffic comes in).
So 4*1MB of L2 in Llano would turn into "4MB", for the sake of PR convenience.
"L3 is quite cheap, and without it the IGP could lack bandwidth."

Not in AMD's case -- they are still re-using the L2 SRAM cell structures to build the L3 array, and that makes it quite fat in comparison with Intel's designs.
 
"AMD usually sums the L2 arrays and states the total size [...]"

That's exactly what I was saying.

"4MB Cache" refers to a 2008 roadmap, but in 2009 roadmap there's no indication on cache size.

Some drawings show a crossbar between the IMC and the "xPU" (x = C or G), and the L3 sits inside the IMC in current CPUs.


As for the relatively high cost of L3: it's still quite low if it addresses a bottleneck. 36 sq. mm for a 4MB L3 is a better move than twice the cores (17.7 sq. mm each, including 1MB L2) or twice the GPU SIMDs (~50 sq. mm for five 16x5 SIMDs, based on the Redwood-to-Juniper die size differential, non-optimally downscaled to 32nm).
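Put side by side, that trade-off looks like this; all areas are the post's own estimates:

```python
# The area trade-off above: a 4MB L3 vs. four more cores vs. five
# more GPU SIMDs, using the post's area estimates.

options = {
    "4MB L3 (at L2 density)":         36.0,       # mm^2
    "+4 cores (17.7 mm^2 w/ 1MB L2)": 4 * 17.7,   # ~71 mm^2
    "+5 GPU SIMDs (Redwood delta)":   50.0,       # mm^2
}

for name, area in options.items():
    print(f"{name:<34} {area:5.1f} mm^2")
```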
 
I don't think there will be an L3 cache. At first sight it seems like a convincing argument (a cache shareable by GPU and CPU), but in practice GPUs do quite well with quite small caches located at their ROPs. A shared L3 cache would enable some things you probably can't do efficiently otherwise (that is, simultaneous access by CPU and GPU to the same memory space), but just for integrated graphics there's probably not much point.
Also, I take the 4x1MB L2 caches as a hint there won't be any L3. I think AMD would have opted for smaller L2s if they'd planned on having an L3.
 
AMD's L3 implementation in their current architecture line is rather slow bandwidth-wise; the difference from a moderately fast DDR3 interface is negligible for this purpose. The only benefits are low-latency cached access and use as an L2 eviction buffer (given a sufficiently large L3), but all this is of next to no use to the GPU core, where bandwidth is the main deficit, not access latency.
 
How much less could Llano use as a single package, though?

I'd assume a GPU on 32nm SOI would bring along some nice perks process-wise -- enough to render tech like Optimus pretty much redundant, no?
 
I don't see any useful amount of on-die GPU cache in the die shot.

Is that die shot real? I heard a few people say it was a photochop.

In any case, is there any reason to exclude the possibility of a variant with a larger on-die cache, like they have with the Phenom II/Athlon II lines -- say, a mobile variant where they know they have a maximum monitor resolution target?
 
"...enough to render tech like Optimus pretty much redundant, no?"

ATI already has switchable graphics on laptops. I have an HD 3200 IGP and a 4330, and I can switch between the two.


Aside from that, an add-in card will always be more powerful than what they can put in the CPU, and it has access to its own, faster RAM pool.
 
What are the chances of seeing T-RAM?

Just putting it out there for consideration & discussion, but everybody, including AMD, "knows" they have to improve their cache structure/density.
 
Considering the amounts of close-to-ALU memory needed (i.e. for a framebuffer, possibly with MRTs), DRAM die stacking/PoP/MCM seems more likely to me than eDRAM/T-RAM/Z-RAM...
 
AMD went for 1st gen Z-RAM, then 2nd gen Z-RAM.
Now it's T-RAM.
Maybe the third pie-in-the-sky memory type is the charm?

T-RAM's operating principles are based on some rather novel ideas, but it sounds very new and not field-tested.
Z-RAM initially relied on what was otherwise a sometimes problematic side-effect of PD-SOI, one that is going away eventually; the 2nd gen was based on some other idea.
It sounds like Z-RAM "worked", in that there were functioning devices. It did not work in the sense of reaching the desired performance and reliability.

On-die eDRAM at least has one high-performance CPU (POWER7) using it. Sure, it's a chip that goes into systems that can cost more than a house/neighborhood, but it's at least something real, and something that has had to face far more stringent requirements than AMD's flavor-of-the-year memory tech.

At least as far as the first instantiations of Bulldozer go, the cache capacities are not out of line with what we'd expect of SRAM.
 