Predict: The Next Generation Console Tech

Status
Not open for further replies.
I don't necessarily have a dog in this fight, but lherre also told me via PM that "more RAM, lesser GPU" was a good summation of Durango vs Orbis (when I asked). That was about six months ago too.

People love to quote lherre saying Durango is more of a beast, and it may well be, but I got that conflicting info from him.

It could quite possibly be that he (and perhaps Bgassassin's source as well) was referring to an earlier kit, the one that IGN said had an HD6670 in it.
 
A lot of people talk to friends and family about work, NDA or no NDA.

Most of the leaks I've seen over the years aren't from developers, but from people with secondary access to the information. As the set of disclosed people gets larger, the leaks usually get more frequent.

Pretty much. You learn very quickly that even trivial details can give away much larger pieces of the picture if you aren't careful. The recruiter that nabbed me made this mistake, thinking that a two-letter codename would be obscure enough.

Although to be fair I later nearly made a similar mistake not long after :oops:

I've certainly had a few cases where people have repeatedly bugged me for details... "we all work for the same company! it'll be fine, tell me!" etc
 
Could an 8 core Jaguar at, say, 1.6GHz emulate the 6 threads of Xenon? It seems the single-threaded performance wasn't anything to write home about, so if they emulate it at a thread per core they should be able to do it, so long as they have sufficiently beefy vector units?
 
Maybe in some capacity, but it's probably far from a lock. If 3x ~1.25GHz Broadway cores (assuming that's accurate) are struggling with a lot of ports, then emulating that code on cores that aren't really much faster at scalar integer work, and aren't clocked that much higher, is going to be tough. And the SIMD part isn't really easier: it can be expensive to match some (common) VMX128 operations to SSE4. The lack of registers for both integer and especially SIMD doesn't help. There'll be a fair amount of spilling even with a very aggressive register allocator that spans multiple basic blocks, and Jaguar has neither the decode width nor the extra load/store capacity to really handle a bunch of spilled emulated register accesses.

Still, Xenon's single-threaded performance is pretty famously poor, no doubt due in large part to the awful cache miss latencies and tremendous overhead on branching (even correctly predicted taken branches have an 8 cycle latency and pretty tiny fetch buffers to hide it, if it's the same as Cell PPE anyway), so if Durango has decent latencies it would do a lot better. But at least some of the emulation overhead will turn into more main RAM accesses, particularly in facilitating a larger code footprint.
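To put a rough number on the register shortage mentioned above, here is a minimal Python sketch. The register counts are architectural (128 VMX128 vector registers per Xenon hardware thread, 16 XMM registers on x86-64). The `reserved_scratch` value and the static one-to-one mapping are assumptions made for illustration only; a real dynamic recompiler would allocate by live range and do better than this worst case.

```python
# Worst-case register pressure for a static guest-to-host register mapping.
GUEST_VMX128_REGS = 128  # vector registers per Xenon hardware thread
HOST_XMM_REGS = 16       # XMM registers available on x86-64


def memory_resident(guest_regs: int, host_regs: int, reserved_scratch: int) -> int:
    """Guest registers that cannot be pinned to a host register.

    reserved_scratch is a hypothetical number of host registers the
    emulator keeps free for temporaries.
    """
    usable = host_regs - reserved_scratch
    return max(0, guest_regs - usable)


# Assumed: two host registers held back as scratch.
spilled = memory_resident(GUEST_VMX128_REGS, HOST_XMM_REGS, reserved_scratch=2)
print(f"{spilled} of {GUEST_VMX128_REGS} guest vector registers live in memory")
```

Every access to one of those memory-resident registers becomes an extra load or store, which is where the load/store capacity concern comes from.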

Here's a question: why would it only be 1.6GHz? It doesn't look like that's anywhere close to the limits of the uarch, and this isn't a terribly power-constrained environment. Maybe it'd be 1.6GHz with all cores active but allow something higher when they aren't (as would probably be the case during emulation)?

The thing that sounds fishy with the 8 core Jaguar rumors is that it's using a shared L2 cache, and sharing that 8 ways doesn't sound trivial. It's possible it's multiple clusters, each with a shared L2, but that doesn't sound like a drop-in feature either. Maybe I'm altogether underestimating AMD's potential involvement and they're pulling off something even more customized than a Jaguar with more cores and a different cache arrangement.
 
Jaguar is the name of both the CPU core and the combination of four Jaguar CPU cores and shared L2 cache into a 'Jaguar Compute Unit'.

Graphics and north bridge functionality (such as a possible L3 cache, interconnects and the memory controller), which go along with this to make up a SoC, are however not part of the Jaguar terminology, and I am guessing they are quite modular in design.
Changing these would not make the design a customised Jaguar, but simply a separate CPU, APU or SoC using Jaguar CU(s).

05wvegas.lg5.jpg
 
Maybe Durango CPU is a 4 Jaguar CU, with 4 jaguar cores each...

Microsoft's 'Durango' development kits are already in developers' hands, and the CPU at the heart of the machine is a monster. Where 360 has a three-core CPU from 2005, Durango promises four hardware cores, each divided into four logical cores
 
Right, well the underlying point here is that you'd either be looking at 8 cores sharing the same L2 cache, which would definitely be a customization/deviation, or you'd be looking at two compute units. While the latter might seem obvious, it implies that the compute units are basically SMP capable and can stay coherent. Such capability, if it exists, is probably not going to be exposed on any commercial Jaguar SKUs. It could still be part of the design but unexposed. But if not, it's going to mean customization.

It would not surprise me at all if we are looking at customization here, and more extensive than that. For instance, XBox 360's CPU cores may have used Cell's PPE as a base but the SIMD pipelines and architectural resources were pretty majorly rebalanced and MS didn't just get a new type of encoding with more registers but plenty of custom instructions solely for their purposes. I'm sure IBM's bill for this didn't run that cheap, and I'm sure AMD would love to receive something similar (and if MS was willing to pay it for XBox 360 I don't see why they wouldn't be willing to pay the same amount or more for the successor).

Such customizations could include instructions and functional blocks to make BC more efficient.

Maybe Durango CPU is a 4 Jaguar CU, with 4 jaguar cores each...

4 physical cores with 4 "logical cores" doesn't fit this description at all, it pretty unambiguously refers to SMT. 16 physical cores is also pretty batshit insane. 8 is on the high end of feasible.

There are a bunch of conflicting rumors out there. People have to come to terms with the fact that they can't all be true.

How big is one Jaguar CU in 28nm? Charlie mentioned that the next-gen main chip is huge, 500mm2+.

I'm not aware of any numbers or die shots, but a Bobcat core with 512KB L2 cache is allegedly about 8mm^2 on TSMC 40nm. Jaguar isn't hugely different although it has 128-bit SIMD units and can have twice as much cache per core, but it's on a new process node so I'd expect it wouldn't be that much bigger. Maybe a similar figure per core w/512KB, so an 8-core with 4MB of cache may be < 70mm^2, assuming good scaling to 28nm (which AMD has demonstrated so far). Not a huge amount by any means.
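The back-of-the-envelope arithmetic above can be written out as a small Python sketch. Every input is the post's own guess (roughly 8mm^2 per core including 512KB of L2, carried over from the alleged Bobcat figure), not a measured Jaguar number, and the function name is made up for illustration.

```python
# Rough CPU cluster area estimate; inputs are guesses from the post, not measurements.
AREA_PER_CORE_MM2 = 8.0  # alleged Bobcat core + 512KB L2 on 40nm, reused as a 28nm Jaguar guess
L2_PER_CORE_KB = 512     # L2 cache assumed per core


def cpu_cluster_estimate(cores: int) -> tuple[float, float]:
    """Return (area in mm^2, total L2 in MB) for the given core count."""
    area = cores * AREA_PER_CORE_MM2
    l2_mb = cores * L2_PER_CORE_KB / 1024
    return area, l2_mb


area, l2_mb = cpu_cluster_estimate(8)
print(f"8 cores: ~{area:.0f} mm^2 with {l2_mb:.0f} MB of L2")
```

With these assumptions the 8-core, 4MB case lands at about 64mm^2, which is where the "< 70mm^2" figure comes from.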

500mm^2 sounds ridiculous though. I'm not aware of any consumer mainstream chip, even GPUs, reaching these sizes. And the ones that get up there cost a few hundred dollars.
 
Largest 28nm GPU chip AMD made was 7970 [352mm2]. With Radeon 8000 that size will go up to ~400mm2.

Manufacturing [and cooling] of APU that is even bigger than that would be very tricky, especially if they want to make it viable for high powered use [high GPU clocks].
 
Is it possible to put 4-8 Jaguar or Steamroller cores and 2 GPUs of 1024 or more GCN cores each in one single SoC, without any eDRAM or with a wide IO RAM config?
Like an MCM setup?
 
Not saying it would fit, but could a large eDRAM function as a shared L3 between CPU and GPU?
 
It is possible Charlie's 500mm2 is pure BS, or his 'source' mistook it for the interposer. I can believe a 500mm2 interposer, but not a custom chip die of that size.
Just look at prior generations and their respective silicon budgets.
 
I think the x720 APU will be 300-350mm2, with an additional 100-200mm2 of interposer space left over for two to four memory stacks [4Gbit DDR4 chips].
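As a quick sanity check on that guess, the two ranges can simply be added to see what total interposer area they imply. This is only a sketch: the ranges are the speculative figures from the post, and treating the memory-stack space as purely additive is an assumption.

```python
# Sum two (low, high) area ranges in mm^2; all figures are speculative forum estimates.
def add_ranges(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add the low ends and the high ends of two ranges."""
    return a[0] + b[0], a[1] + b[1]


APU_MM2 = (300, 350)     # guessed APU die size
MEMORY_MM2 = (100, 200)  # guessed extra interposer space for the memory stacks

interposer = add_ranges(APU_MM2, MEMORY_MM2)
rumored = 500  # the 500mm2 figure attributed to Charlie
in_range = interposer[0] <= rumored <= interposer[1]
print(f"interposer: {interposer[0]}-{interposer[1]} mm^2, 500 mm^2 in range: {in_range}")
```

Under these assumptions the interposer comes out at 400-550mm^2, so a 500mm2 figure would be plausible for the interposer even if it is not plausible for the die itself.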
 