Predict: The Next Generation Console Tech

Status
Not open for further replies.
Not saying it would fit, but could a large eDRAM function as a shared L3 between CPU and GPU?

I'm sure they could do this, however you can't just slap an L3 on top of two non-coherent core clusters to make them coherent. The whole thing has to be designed with this functionality. And this approach means that the L3 is inclusive, something AMD has never done before.

Personally, I think a big L3 cache on the CPU would be a bad design choice. Big eDRAM for the GPU is a given. Giving the CPU wide/low latency access to it would be a nice touch, especially if they're on the same SoC. Giving the CPU its own pool of eDRAM isn't necessarily a bad idea. But I don't think a big cache is the right way to go. Then you need to manage it with tags, and you tend to get main RAM accesses going through it, which typically increases their latency (usually you mark pages as either cacheable or not, not differently per cache). It's nice to have for server applications or big desktop applications that are written with general hardware in mind, but games are probably better off manually managing a big pool of memory beyond L2.
 
Maybe a shrunk Bobcat would achieve this. Jaguar no way.
A Jaguar core measures just 3.1 mm^2 in 28nm. That's an official number from AMD. How much area does one need for 2 MB SRAM in 28nm? Surely less than 10 mm^2. The <25mm^2 number of kalelovil is quite realistic.
And this approach means that the L3 is inclusive, something AMD has never done before.
Actually, the L2 caches of BD/PD as well as Jaguar are inclusive. But I doubt one would need an L3 to keep two Jaguar CUs coherent. It could be done quite easily over the Northbridge uplink of the L2 controller. BD/PD does it the same way. You just have to hook up two CUs/modules to a northbridge (like on Trinity) handling the snooping traffic (of course the L2 controller has to be taught to send the snoop requests in the first place, but this appears to be a minor addition).
 
A Jaguar core measures just 3.1 mm^2 in 28nm. That's an official number from AMD. How much area does one need for 2 MB SRAM in 28nm?

Surely less than 10 mm^2. The <25mm^2 number of kalelovil is quite realistic.

Hadn't seen the 3.1mm^2 number. That's smaller than I expected. Given that, I agree my estimate was too large (and my reaction wasn't justified), and I also agree with you that 2MB of cache (which is more than just SRAM arrays, mind you) should take less than 10mm^2.

Actually, the L2 caches of BD/PD as well as Jaguar are inclusive. But I doubt one would need an L3 to keep two Jaguar CUs coherent. It could be done quite easily over the Northbridge uplink of the L2 controller. BD/PD does it the same way. You just have to hook up two CUs/modules to a northbridge (like on Trinity) handling the snooping traffic (of course the L2 controller has to be taught to send the snoop requests in the first place, but this appears to be a minor addition).

I said L3 cache for a reason :p I know AMD has done inclusive L2 caches. This isn't really a minor point; in BD/PD's case, and even in Jaguar's, we're talking about a fairly big L2 cache that would need a much bigger L3 cache for inclusion to begin to make sense. It's not like backing Intel's 256KB dedicated L2s.

Yes, BD/PD has external coherency because it is designed to interface over coherent HyperTransport. This was an explicit design decision made because AMD was targeting servers, something we've never seen any indication of for Bobcat or Jaguar. So there's no reason to assume coherency is a drop-in feature of Jaguar's CUs and that it wouldn't need to be added. "Minor addition" or not (I don't know if I agree with this), it still counts as a customization. It isn't just hooking it up to a northbridge; HT and HTcc links aren't really the same. You can't just drop an FX Bulldozer in an SMP motherboard, for instance.
 
Is an 8-core Jaguar at 1.6 GHz enough power for the console?
It should be roughly equivalent to a dual-core Ivy Bridge.

It wouldn't surprise me if they clocked it higher, at 2.0 to 2.2 GHz, accepting an initial power cost with future shrinks in mind.
 
Yes, BD/PD has external coherency because it is designed to interface over coherent HyperTransport. This was an explicit design decision made because AMD was targeting servers, something we've never seen any indication of for Bobcat or Jaguar. So there's no reason to assume coherency is a drop-in feature of Jaguar's CUs and that it wouldn't need to be added. "Minor addition" or not (I don't know if I agree with this), it still counts as a customization. It isn't just hooking it up to a northbridge; HT and HTcc links aren't really the same. You can't just drop an FX Bulldozer in an SMP motherboard, for instance.
It has nothing to do with multi-CPU setups or HT or coherent HT, respectively. "External" coherency is basically a function of the northbridge (as it provides the glue), as long as the cores expose some basic SMP capabilities to the northbridge. Ask yourself how Trinity keeps the two modules coherent. There is no possibility to get external coherency with Trinity, as it lacks HT completely. Nevertheless, one module can snoop the second one through the northbridge. As Jaguar supports some coherency protocol anyway (the usual MOESI), the only extension needed is that the L2 cache controller shared within a CU sends snoop requests (and processes the answers) to the L2 controller(s) of the other CU(s). Basically just as it is done in Trinity. No traffic whatsoever goes over HT (which isn't present to start with) in that case.
And as the upcoming Kabini SoCs (with Jaguar cores) are supposed to have a much improved HSA architecture (possibly enabling some kind of coherency between GPU and CPU part), the northbridge used there is probably able to handle that, as well as the cores.
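To make the snooping idea above concrete, here is a heavily simplified toy model. Everything in it is an assumption for illustration: the class names (`L2Controller`, `Northbridge`), the method names, and the reduced MOESI transitions are hypothetical and are not AMD's actual implementation. It only shows the shape of the argument: each CU's L2 controller sends snoop requests to the northbridge, which forwards them to the other CUs' L2 controllers.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    O = "Owned"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class L2Controller:
    """Toy stand-in for the L2 controller shared within one CU (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.lines = {}          # address -> State; missing means Invalid
        self.northbridge = None  # set by Northbridge.attach()

    def state(self, addr):
        return self.lines.get(addr, State.I)

    def read(self, addr):
        if self.state(addr) is not State.I:
            return "local hit"
        # Miss: ask the northbridge to snoop the other CUs.
        if self.northbridge.snoop_read(self, addr):
            self.lines[addr] = State.S
            return "filled from other CU"
        self.lines[addr] = State.E
        return "filled from memory"

    def write(self, addr):
        # Only M and E guarantee that no other CU holds a copy.
        if self.state(addr) not in (State.M, State.E):
            self.northbridge.snoop_invalidate(self, addr)
        self.lines[addr] = State.M

    def on_snoop_read(self, addr):
        st = self.state(addr)
        if st is State.I:
            return False
        if st is State.M:
            self.lines[addr] = State.O   # keep the dirty line, now shared
        elif st is State.E:
            self.lines[addr] = State.S
        return True                      # this CU can supply the line

    def on_snoop_invalidate(self, addr):
        self.lines.pop(addr, None)


class Northbridge:
    """Toy glue that forwards snoops between attached L2 controllers."""

    def __init__(self):
        self.controllers = []

    def attach(self, ctrl):
        ctrl.northbridge = self
        self.controllers.append(ctrl)

    def snoop_read(self, requester, addr):
        # Snoop every other controller (no short-circuit, so all update state).
        hits = [c.on_snoop_read(addr) for c in self.controllers if c is not requester]
        return any(hits)

    def snoop_invalidate(self, requester, addr):
        for c in self.controllers:
            if c is not requester:
                c.on_snoop_invalidate(addr)


nb = Northbridge()
cu0, cu1 = L2Controller("CU0"), L2Controller("CU1")
nb.attach(cu0)
nb.attach(cu1)

cu0.write(0x40)                 # CU0 now holds the line as Modified
print(cu1.read(0x40))           # CU1 misses and is served via a snoop to CU0
print(cu0.state(0x40), cu1.state(0x40))
```

Note that nothing here resembles HT: the only interconnect in the model is the northbridge object itself, which is the point being argued.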
 
Maybe, if they customize the cores heavily!

AMD is saying that Jaguar should clock 10% higher than Bobcat within the same power envelope. That means that at 2 GHz Jaguar should consume a bit more power than Bobcat, but it should be able to run at 1.8 GHz with the same power consumption as Bobcat. We should also not forget that Jaguar should have a 15% higher IPC and a much broader set of supported instructions, including full support for AVX AFAIK (The SIMD unit is 128-bit wide, but it does support 256-bit instructions).
EDIT: Rereading this paragraph makes me wonder if I used the word 'should' a few too many times :p.
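As a quick sanity check on those two figures, a minimal sketch follows. It assumes the 10% clock gain and the 15% IPC gain simply multiply, and the 1.65 GHz Bobcat baseline in the usage line is an assumed example value, not a number from this post.

```python
CLOCK_GAIN = 1.10  # "10% higher clock within the same power envelope"
IPC_GAIN = 1.15    # "15% higher IPC"


def jaguar_iso_power_clock_ghz(bobcat_clock_ghz):
    """Jaguar clock at the same power as a Bobcat running at the given clock."""
    return bobcat_clock_ghz * CLOCK_GAIN


def relative_throughput():
    """Per-core throughput vs. Bobcat at equal power, assuming the gains multiply."""
    return CLOCK_GAIN * IPC_GAIN


# 1.65 GHz is an assumed Bobcat baseline used only for illustration.
print(round(jaguar_iso_power_clock_ghz(1.65), 3))
print(round(relative_throughput(), 3))
```

Under those assumptions a Jaguar core would land at roughly 1.8 GHz and about a quarter more per-core throughput at the same power.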

Another thing, 8 Jaguar cores including 4 MB of L2 cache should be under 50 mm² in size on 28 nm. SRAM cache on 28 nm is roughly 2 Mbit/mm² (very rough estimate, it's probably denser than that), at 32 Mbit that means 16 mm² in cache. Those 8 cores are 3.1 mm² each, which is 24.8 mm² in total. Adding everything up, you get 40.8 mm² for 4 MB cache and 8 Jaguar cores. Estimating it at 50 mm² for 8 cores including cache seems like a safe bet in other words.
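The same back-of-the-envelope sum as a small script, in case anyone wants to plug in other core counts or cache sizes. The function names are made up for illustration; the 2 Mbit/mm² density and 3.1 mm² per core are the rough figures quoted above, and the result ignores interconnect, I/O, and other uncore area.

```python
SRAM_DENSITY_MBIT_PER_MM2 = 2.0  # very rough 28 nm estimate from the post
JAGUAR_CORE_AREA_MM2 = 3.1       # per-core figure quoted in this thread


def cache_area_mm2(cache_mbyte, density=SRAM_DENSITY_MBIT_PER_MM2):
    """Area of the cache, converting megabytes to megabits first."""
    return cache_mbyte * 8 / density


def cpu_area_mm2(cores, cache_mbyte):
    """Cores plus cache only; no northbridge or other uncore logic."""
    return cores * JAGUAR_CORE_AREA_MM2 + cache_area_mm2(cache_mbyte)


print(cache_area_mm2(4))               # area of 4 MB of L2
print(round(cpu_area_mm2(8, 4), 1))    # 8 cores + 4 MB of L2
```

With the quoted inputs this reproduces the 16 mm² of cache and roughly 40.8 mm² total, so the 50 mm² guess leaves some margin for everything the sum leaves out.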
 
It has nothing to do with multi-CPU setups or HT or coherent HT, respectively. "External" coherency is basically a function of the northbridge (as it provides the glue), as long as the cores expose some basic SMP capabilities to the northbridge. Ask yourself how Trinity keeps the two modules coherent. There is no possibility to get external coherency with Trinity, as it lacks HT completely. Nevertheless, one module can snoop the second one through the northbridge. As Jaguar supports some coherency protocol anyway (the usual MOESI), the only extension needed is that the L2 cache controller shared within a CU sends snoop requests (and processes the answers) to the L2 controllers of the other CU(s). Basically just as it is done in Trinity. No traffic whatsoever goes over HT (which isn't present to start with) in that case.
And as the upcoming Kabini SoCs (with Jaguar cores) are supposed to have a much improved HSA architecture (possibly enabling some kind of coherency between GPU and CPU part), the northbridge used there is probably able to handle that already, as well as the cores.

Yes, Trinity is designed with inter-module coherency. Jaguar isn't necessarily designed with the L2s externally coherent. I said that if the support isn't there then a vanilla Jaguar compute unit needs to be modified. Do you disagree with this? Because it doesn't look like you are and I'm not saying anything else.

No, I'm not saying it literally needs HT, but it obviously needs SOME coherency interconnect. You don't look like you're saying anything different. And even if it does already have coherency with the GPU (unknown) that doesn't really imply the ability to broadcast coherency traffic to more than that GPU.

Anyway, I don't get why anyone feels the need to make the case for a vanilla, totally unmodified Jaguar compute unit in this. What difference does it make? That's the real point I'm getting at.
 
I've read a post on another forum by a guy who claims Orbis is a Kabini + 8850. Is this powerful enough?
Powerful enough for what?...

Durango is 8 Jaguars + 8970 GHz Edition, so no.
Unless Durango launches at $500 and Orbis sells at $200. Or whatever other differentiators there are. The business of these consoles is a complete unknown and whole other matter entirely, and shouldn't be derailing this thread.
 
If the rumor is true that Steamroller was cancelled, and that Sony initially wanted Steamroller cores, would you rather see them go with many Jaguar CU or rolling back to Piledriver?
It seems AMD went back to Piledriver for the A10-6800K. If the devkit has an A10 variant, maybe it would make more sense to go with that too.
 
If the rumor is true that Steamroller was cancelled, and that Sony initially wanted Steamroller cores, would you rather see them go with many Jaguar CU or rolling back to Piledriver?
It seems AMD went back to Piledriver for the A10-6800K. If the devkit has an A10 variant, maybe it would make more sense to go with that too.

I wonder if the PS4 chip was to get Piledriver from the beginning, with Steamroller going slow anyway.
By doing this, AMD would also have had its fallback plan for the PC, which resulted in the A10-6800K.
 