Predict: The Next Generation Console Tech

Status
Not open for further replies.
What? Trinity has Piledriver cores and is very often a bit behind the K10(.5) cores (the fastest ones are actually the Husky cores in Llano) in terms of IPC. Factor in power consumption and die size and you may very well get a higher performance/W and performance/mm² with Jaguar.

Funny thing, if I had to build a comp. from scratch I would probably choose an Athlon II X4 641 with a mediocre FM1 mobo (they're designed to supply the full A8 with GPU) and cheapo DDR3-1866 CL11, and try to overclock it.
Not upgradable, but who cares :p It's quite cheap next to an i3 or FX 4300 and should be gaming capable.
 
Could a single Jaguar core with AVX emulate a single Xenon thread? If you've got 8 cores then you could do one thread per core and possibly emulate the Xbox 360? Or is backwards compatibility out of the question without recompiled binaries?
Depends on the workload. Xenon averaged 0.2 IPC for normal workloads, due to in-order execution and other issues. That comes to a normalised clock of 640 MHz. Jaguar gets up to 0.8 IPC on normal workloads, giving a normalised clock of 1280 MHz. So, assuming a normal workload, as long as every PPC instruction could be emulated in two or fewer x86 instructions, you could theoretically do it.

In practice it's not so easy: optimised loops, register bank deficiencies, no built-in dot-product instruction. I don't think you could emulate a Xenon thread with a Jaguar core with any performance guarantee.
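
For anyone who wants to check the arithmetic, here is a rough sketch of the normalised-clock comparison. The clock speeds are assumptions inferred from the figures in the post (640 MHz at 0.2 IPC implies 3.2 GHz, 1280 MHz at 0.8 IPC implies 1.6 GHz), and the function names are made up for illustration. It only reproduces the back-of-envelope budget and says nothing about the practical obstacles listed above.

```python
# Back-of-envelope check of the "normalised clock" figures quoted above.
# Clocks are assumptions inferred from the post: 640 MHz / 0.2 IPC implies
# a 3.2 GHz Xenon, and 1280 MHz / 0.8 IPC implies a 1.6 GHz Jaguar.
XENON_CLOCK_MHZ = 3200.0
XENON_IPC = 0.2
JAGUAR_CLOCK_MHZ = 1600.0
JAGUAR_IPC = 0.8


def normalised_clock_mhz(clock_mhz: float, ipc: float) -> float:
    """Millions of instructions retired per second (clock scaled by IPC)."""
    return clock_mhz * ipc


def host_instruction_budget(guest_mhz: float, host_mhz: float) -> float:
    """Host instructions available per guest instruction at equal speed."""
    return host_mhz / guest_mhz


xenon = normalised_clock_mhz(XENON_CLOCK_MHZ, XENON_IPC)
jaguar = normalised_clock_mhz(JAGUAR_CLOCK_MHZ, JAGUAR_IPC)
budget = host_instruction_budget(xenon, jaguar)
print(f"Xenon thread: {xenon:.0f} MHz, Jaguar core: {jaguar:.0f} MHz, "
      f"x86 instructions per PPC instruction: {budget:.1f}")
```

The budget is an average, so any PPC instruction that needs more x86 instructions than that has to be paid for by others that need fewer.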
 
The posited scenario also involved emulator overhead, so chop all that by some unacceptably large factor.
 
A Jaguar core does 8 flops/clock. While it supports AVX, it does so by splitting each operation into two 128-bit parts. And Jaguar does not support FMA. It has the classic ADD and MUL pipes. That means 8 cores at 1.6 GHz deliver 102.4 GFlop/s. I doubt AMD would heavily modify the core to add FMA or full 256-bit units. That would necessitate a full redo of all data paths from register files to L1 access (on top of the probably easier modifications to the decoders).
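
The peak figure is just cores times clock times flops per clock. A minimal sketch of that calculation follows. The function name is made up for illustration, and the 16 flops/clock case is purely a hypothetical what-if for full-width 256-bit units without FMA, not something anyone has announced.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_clock: int) -> float:
    """Theoretical peak throughput in GFlop/s; real code will land below this."""
    return cores * clock_ghz * flops_per_clock


# Figures from the post: 8 cores, 1.6 GHz, 8 flops/clock per core
# (one 128-bit ADD pipe plus one 128-bit MUL pipe, single precision, no FMA).
jaguar_peak = peak_gflops(cores=8, clock_ghz=1.6, flops_per_clock=8)

# Hypothetical what-if only: full 256-bit ADD and MUL units, still no FMA.
wide_peak = peak_gflops(cores=8, clock_ghz=1.6, flops_per_clock=16)

print(f"Jaguar as described: {jaguar_peak:.1f} GFlop/s")
print(f"Hypothetical 256-bit units: {wide_peak:.1f} GFlop/s")
```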

Are 8 Jaguar cores good enough for physics work?
 
Isn't the Xbox 360 currently CPU limited? It would be bad if it's like that again next generation...

And I would think devs would use OpenCL for the physics instead of using the CPU?
 
What capabilities are these supposed DSPs supposed to have anyway? It's a pretty broad term... I suppose I should ask what features would be needed (various audio codecs?).
 
And I would think devs would use OpenCL for the physics instead of using the CPU?

For some parts of it, yes. Some parts of physics processing fit the OpenCL model really well, some don't. I think an ideal implementation for next-gen is a hybrid one, where parts of the processing are offloaded to the GPU, and parts are handled on the CPU.

That's not really that good an idea today on PC, because moving data between CPU and GPU is often more expensive than doing the processing. Just one more thing where HSA would shine...
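
To make the transfer-cost point concrete, here is a toy break-even model. Everything in it is an assumption for illustration: the function name, the simple "latency plus bytes over bandwidth" transfer cost, and all the example numbers, none of which are measurements from any real system.

```python
def offload_is_worthwhile(cpu_ms: float, gpu_ms: float, bytes_moved: float,
                          bus_gb_per_s: float, latency_ms: float = 0.0) -> bool:
    """True if copying the data and running on the GPU beats staying on the CPU.

    Transfer cost is modelled as fixed latency plus bytes / bandwidth,
    which is a simplification of how a real bus behaves.
    """
    transfer_ms = latency_ms + bytes_moved / (bus_gb_per_s * 1e9) * 1e3
    return transfer_ms + gpu_ms < cpu_ms


# Hypothetical numbers: a job that takes 2 ms on the CPU and 0.5 ms on the GPU.
# Discrete GPU: 64 MB has to cross an 8 GB/s bus, so the copy alone costs 8 ms.
discrete = offload_is_worthwhile(cpu_ms=2.0, gpu_ms=0.5,
                                 bytes_moved=64e6, bus_gb_per_s=8.0)

# Shared address space (the HSA-style case): nothing needs to be copied.
shared = offload_is_worthwhile(cpu_ms=2.0, gpu_ms=0.5,
                               bytes_moved=0.0, bus_gb_per_s=8.0)

print(f"discrete GPU worthwhile: {discrete}, shared memory worthwhile: {shared}")
```

With these made-up inputs the copy dominates on the discrete setup, while the same job is worth offloading once the copy disappears.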
 
Depends on the workload. Xenon averaged 0.2 IPC for normal workloads, due to in-order execution and other issues. That comes to a normalised clock of 640 MHz. Jaguar gets up to 0.8

Where do the IPC numbers come from? I'd be interested to see how they compare to other architectures.
 
Depends on the workload. Xenon averaged 0.2 IPC for normal workloads, due to in-order execution and other issues. That comes to a normalised clock of 640 MHz. Jaguar gets up to 0.8 IPC on normal workloads, giving a normalised clock of 1280 MHz. So, assuming a normal workload, as long as every PPC instruction could be emulated in two or fewer x86 instructions, you could theoretically do it.

In practice it's not so easy: optimised loops, register bank deficiencies, no built-in dot-product instruction. I don't think you could emulate a Xenon thread with a Jaguar core with any performance guarantee.

Xenon has a very gimpy memory pipeline, but a lot of registers. This means that code optimized for it tries hard to use as many registers as possible, while minimizing touching any memory, even the cache. A Jaguar core is the exact opposite of this. Only 16 AVX registers per core, but a really, really good memory backend (well, compared to Xenon, anyway :p).

This difference makes any kind of emulation hard. While a single Bobcat/Jaguar core is IMHO much more powerful than a Xenon thread (by somewhere around 3-4x), emulation is totally out of the picture.

Recompilation should be pretty easy, though.

You've made very good points and it has been enlightening to find out more about the old Xenon architecture.

As this generation progressed, the PC CPU requirements of many console ports seemed to increase quite substantially. Does this mean that towards the latter part of the generation developers were getting much more than the earlier 0.2 IPC average? Even if you were to get a recompiled binary, would that be enough if, for instance, you've got code which wants a Core iX to run adequately?
 
Gamefest 2008
Slide 43

It was mentioned again in one of the 2010 presentations.

If we go by Treyarch Call of Duty titles:

World at War minimum:

P4 @ 3 GHz
512MB RAM
ATI Radeon X1600

Black Ops 2 minimum:

Core 2 Duo 2.66 GHz
2GB RAM
HD 3870

Essentially the CPU requirements have doubled in the intervening time period.
 
What capabilities are these supposed DSPs supposed to have anyway? It's a pretty broad term... I suppose I should ask what features would be needed (various audio codecs?).
If he told you, he'd have to kill you... Or I'd have to kill him... Or someone would have to kill his job... Can't remember which. :)

Where do the IPC numbers come from? I'd be interested to see how they compare to other architectures.
That Gamefest link is one place; it's been mentioned at multiple developer conferences. The number was obtained by measuring the real-world performance of actual games, and has been reconfirmed with later games internally.
[Image: 2wovqjm.png]

The AMD number you'll have to take on faith ;)
 
How can an FPGA be used in the next-gen consoles?
I'm assuming you're asking whether they could use an FPGA in a next-gen console. The answer is generally no. FPGAs, as they are currently manufactured, are expensive and slow in comparison to a fixed-function chip that would get you most of the way there, like a DSP. They could be rearchitected, but as it stands, they're only used for silicon development and so manufacturers concentrate on the wrong features (in terms of mass market adoption) when designing new ones.
 