Predict: The Next Generation Console Tech

PlayStation Orbis will have either Piledriver or Steamroller cores, because the VG247 article clearly stated that the A10 will stay as the base for Orbis! AFAIK the A10 name only applies to AMD's mainstream CPUs. The core count will be 4 or more, and the core clock will be significant, if not 4 GHz. And every rumor is pointing at only one GPU in Orbis.

A10 is in the *dev kits*, probably not in the final console. If the purpose is to train the devs to use an AMD HSA system, the A10 is the only presently available chip that makes any sense. Using it in a dev kit says nothing about whether they want to use a Jaguar or BD in the final console.
 
Didn't many rumors say Durango and PS4 have almost the same specs? Of course I don't want this to come true, because it wouldn't be fun anymore, lol.

The power budget, not the cash budget.

Last gen, the performance of the console was pretty much determined by the cost they were willing to pay for the silicon. This gen, I don't think it is anymore -- instead, they have a pretty hard limit on how much power they can dissipate in a console envelope. Every watt spent on the CPU is a watt not spent on the GPU. That's why a lot of devs, me included, want a low-power CPU this time around. After you have those 8 Jaguar cores, spending one watt of power on the GPU is just *better* than putting it in the CPU.
Platon was just talking about the price budget, but yeah, MS (and Sony) will take care of the power budget.
 
Could a single Jaguar core with AVX emulate a single Xenon thread? If you've got 8 cores, you could do one thread per core and possibly emulate the Xbox 360. Or is backwards compatibility out of the question without recompiled binaries?

Xenon has a very gimpy memory pipeline, but a lot of registers. This means that code optimized for it tries hard to use as many registers as possible, while minimizing touching any memory, even the cache. A Jaguar core is the exact opposite of this. Only 16 AVX registers per core, but a really, really good memory backend (well, compared to Xenon, anyway :p).

This difference makes any kind of emulation hard. While a single Bobcat/Jaguar core is IMHO much more powerful than a Xenon thread (by somewhere around 3-4x), emulation is totally out of the picture.

Recompilation should be pretty easy, though.
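To illustrate the register mismatch, here's a hypothetical sketch (not any real emulator's code) of what a naive interpreter has to do per guest vector instruction. Xenon's VMX128 exposes 128 vector registers, while x86-64 exposes only 16, so the guest register file has to live in memory:

```c
#include <stdio.h>
#include <immintrin.h>

/* Emulated VMX128 register file: far too big for x86's 16 AVX registers,
   so it sits in RAM (hopefully in L1 cache). */
typedef struct { __m128 vr[128]; } GuestVecRegs;

/* Emulating one guest "vaddfp vrD,vrA,vrB": two loads and a store wrap the
   add, where the native Xenon instruction touched no memory at all. */
static void emu_vaddfp(GuestVecRegs *g, int d, int a, int b) {
    g->vr[d] = _mm_add_ps(g->vr[a], g->vr[b]);
}

int main(void) {
    GuestVecRegs g = {0};
    g.vr[1] = _mm_set1_ps(1.0f);
    g.vr[2] = _mm_set1_ps(2.0f);
    emu_vaddfp(&g, 0, 1, 2);
    float out[4];
    _mm_storeu_ps(out, g.vr[0]);
    printf("vr0 = %.1f\n", out[0]); /* 3.0 */
    return 0;
}
```

Each emulated op pays memory traffic the native code avoided entirely, which is part of why emulation is off the table even with a 3-4x faster core.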
 
A10 is in the *dev kits*, probably not in the final console. If the purpose is to train the devs to use an AMD HSA system, the A10 is the only presently available chip that makes any sense. Using it in a dev kit says nothing about whether they want to use a Jaguar or BD in the final console.

That said, the A10 is not quite real HSA. If you're right, we might well see Jaguar APU based devkits shipped to devs as soon as possible, even though that looks kind of like a Wii U without the eDRAM :).
That chip, Kabini, is the first real HSA one; its most visible feature is the shared address space. Otherwise, can Sony get Steamroller?
Steamroller is delayed to 2014 on the desktop, maybe because AMD is having problems with 28nm from GloFo; I have no idea. So I speculate they are trying to get the Steamroller version for the PS4 out the door first.

The desktop and console Steamroller would differ in a number of ways: different foundries are possible, along with GDDR5 plus side memory on the console versus DDR3 on the desktop. The cores would be identical, and the GPU design too, though quite a bit bigger on the PS4. I wonder how big we can expect the chip to be?
 
Btw, MS has recently acquired patents on AR glasses. It falls in line perfectly with the leaked PDFs we got "recently", which MS tried to hide.

I am not sure whether implementing that requires significant additional performance.

I have also recently heard another rumor that MS has invested in 360-degree wall-projected imagery, while Sony has invested in an alternative technology for 360-degree AR viewing.

If any of that is true, it should also be factored in relative to the performance they can pack in at a certain price.
 
The 360-degree wall-projected imagery is coming in 2014, according to the rumor, so it's unlikely it will be bundled with the Nextbox.
 
I just don't think large cores with high clocks make sense in consoles anymore. More smaller cores at a lower clock will be a lot more power efficient in a console. Higher frequencies require larger transistors and higher voltages.

As for leakage, larger transistors have more static power dissipation. While leakage doesn't depend directly on frequency, it does get worse with increasing temperature, and a higher-clocked design will run hotter unless cooled more aggressively (additional cost).

Dynamic power dissipation does depend on frequency, area, and, the biggest factor, the square of the voltage (roughly P = αCV²f).

It honestly wouldn't surprise me if an 8 core 1.6 GHz Jaguar based CPU had a power consumption that was over 50% lower than a 4 core 3.2 GHz Steamroller CPU, but the performance difference for a typical parallel CPU gaming load was much closer. Obviously, the 3.2 GHz CPU would have better single thread performance, but I'm not sure that matters much.
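As a back-of-envelope illustration of that claim, here's a toy calculation using the dynamic-power relation above. Every capacitance and voltage number is invented for the sake of the example, not a real chip spec:

```c
/* Back-of-envelope dynamic power comparison: P_dyn ~ alpha * C * V^2 * f.
   All per-core capacitances and voltages below are illustrative guesses. */
#include <stdio.h>

int main(void) {
    double c_jaguar = 1.0;        /* small core, normalized switched capacitance */
    double c_big    = 2.5;        /* big core: more logic switching per cycle (assumed) */

    /* Assumed operating points: a lower clock permits a lower voltage. */
    double v_low = 0.9,  f_low  = 1.6e9;  /* 8 Jaguar cores @ 1.6 GHz */
    double v_high = 1.2, f_high = 3.2e9;  /* 4 big cores @ 3.2 GHz */

    double p_jaguar = 8 * c_jaguar * v_low  * v_low  * f_low;
    double p_big    = 4 * c_big    * v_high * v_high * f_high;

    printf("relative dynamic power, 8x Jaguar  : %.2f\n", p_jaguar / p_jaguar);
    printf("relative dynamic power, 4x big core: %.2f\n", p_big / p_jaguar);
    return 0;
}
```

Even with generous assumptions for the big core, the wide-and-slow configuration wins by a large margin, because voltage enters squared.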
 
The SIMD capabilities would be interesting if the Jaguar rumour is true. As I understand it, Jaguar will feature AVX, giving 8 cores at 1.6 GHz a peak throughput of 204.8 GFLOPS, identical to the Cell in the PS3.

That is unless it's an enhanced version of AVX - perhaps bringing in FMA from AVX2?

Either way having that instruction set in use on the consoles at a similar peak performance to a good quadcore from today probably means good things for future PC ports.

I'm guessing non-SIMD but heavily multi-threaded performance would be around half of a Sandy/Ivy quad at 3.2 GHz, while single-thread performance would be around 1/4.
 
The main game engine loop and the AI are typically very limited by single-thread performance. I think it matters a great deal, even if it's not as critical on a console.
The lack of API overhead helps on consoles, but I'd say that if they all switch to low-power cores, hopefully it puts a dent in PC games (some of them) requiring an i5, or in games that run well on the fastest Pentium G but run like crap on a Phenom II, Bulldozer, Core 2 Quad, etc.

It's also only acceptable because Xenon and the PPE had low single-thread performance to begin with, so we're spared a situation where the Jaguar cores are slower than the last-gen consoles on a single thread.
Then there are the optimizations. I don't believe the optimizations have a really great impact per se (compilers, etc.), but the performance predictability of a fixed target probably allows devs to make the best compromises so that the game runs smoothly (or at least at 30 fps).
 
The SIMD capabilities would be interesting if the Jaguar rumour is true. As I understand it, Jaguar will feature AVX, giving 8 cores at 1.6 GHz a peak throughput of 204.8 GFLOPS, identical to the Cell in the PS3.

That is unless it's an enhanced version of AVX - perhaps bringing in FMA from AVX2?
A Jaguar core does 8 flops/clock. While it supports AVX, it does so by splitting each 256-bit operation into two 128-bit parts. And Jaguar does not support FMA; it has the classic ADD and MUL pipes. That means 8 cores at 1.6 GHz deliver 102.4 GFlop/s. I doubt AMD would heavily modify the core to add FMA or full 256-bit units. That would necessitate a full redo of all data paths from the register files to L1 access (on top of the probably easier modifications to the decoders).

Edit:
That means for pure number crunching using the vector extensions, a Jaguar core has the same theoretical peak throughput as a Xenon thread. Xenon has the advantage of two additional scalar FP units besides the VMX unit (which can in theory be used in parallel, but that is probably often hard to realize in real-world code). Jaguar's advantage would be that the ADD + MUL setup is actually more flexible and achieves higher throughput outside of dot products/matrix multiplications (whenever adds and muls can't be combined into an FMA), even under the most favorable conditions for Xenon (especially as it also features lower latency). And Jaguar will probably crush Xenon badly on any code not optimally scheduled for Xenon's in-order architecture.
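To make the peak-rate arithmetic above explicit, a trivial sketch:

```c
/* Peak single-precision throughput math from the post above.
   Jaguar: 128-bit ADD pipe + 128-bit MUL pipe = 4 + 4 = 8 flops/clock/core. */
#include <stdio.h>

int main(void) {
    double cores = 8.0, ghz = 1.6;
    printf("Jaguar peak : %.1f GFLOP/s\n", 8.0 * cores * ghz);  /* 102.4 */
    /* The 204.8 figure assumes 256-bit FMA (16 flops/clock), which Jaguar lacks. */
    printf("256-bit FMA : %.1f GFLOP/s\n", 16.0 * cores * ghz); /* 204.8 */
    return 0;
}
```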
 
Xenon has a very gimpy memory pipeline, but a lot of registers. This means that code optimized for it tries hard to use as many registers as possible, while minimizing touching any memory, even the cache. A Jaguar core is the exact opposite of this. Only 16 AVX registers per core, but a really, really good memory backend (well, compared to Xenon, anyway :p).

This difference makes any kind of emulation hard. While a single Bobcat/Jaguar core is IMHO much more powerful than a Xenon thread (by somewhere around 3-4x), emulation is totally out of the picture.

Recompilation should be pretty easy, though.

It would be a shame if that's the way things unfold with the switch to x86. I still play Shadowrun, and it's one of the best hardcore shooters created for a console... the lack of backwards compatibility would extinguish this candle in the wind.
 
Would a system akin to "Turbo Core" make sense, not as in "single thread, high clocks", but rather clocking up the CPU when the CPU needs it, or clocking up the GPU when it needs it, keeping the overall heat envelope within the threshold allowed by the HSF? Then any dev can decide whether their game needs more CPU or more GPU power (or let the API decide which is needed more).
 
Devs could use the higher-IPC CPU cores when they need them and the lower-power Jaguar cores when they need those, or use them combined!
Jaguar and Piledriver have about the same IPC.
Would a system akin to "Turbo Core" make sense, not as in "single thread, high clocks", but rather clocking up the CPU when the CPU needs it, or clocking up the GPU when it needs it, keeping the overall heat envelope within the threshold allowed by the HSF? Then any dev can decide whether their game needs more CPU or more GPU power (or let the API decide which is needed more).
Trinity already does this, and I'm sure Kabini will do it too: if the GPU load is low, there's increased headroom for a CPU turbo; if GPU utilization is high, the CPU runs basically at base clock. It generally prioritizes the GPU (which makes sense: if the CPU limits the framerate, GPU utilization drops, allowing the CPU to clock higher until it reaches some kind of equilibrium). But I'm not sure we will see this functionality heavily used in consoles. I guess developers like to rely on a fixed level of performance (i.e. a clock speed that doesn't vary all the time).
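As a toy model of that kind of shared-TDP arbitration (every wattage and clock below is made up for illustration; this is not Trinity's actual firmware algorithm):

```c
/* Toy shared-TDP turbo arbiter: the GPU gets its requested power first,
   then the CPU turbos into whatever headroom remains. All numbers invented. */
#include <stdio.h>

#define TDP_WATTS 100.0

static double cpu_clock_ghz(double gpu_watts) {
    double cpu_base_w = 20.0, cpu_turbo_w = 45.0; /* assumed CPU power envelope */
    double base_ghz = 1.6, turbo_ghz = 2.4;       /* assumed clock range */
    double headroom = TDP_WATTS - gpu_watts - cpu_base_w;
    if (headroom <= 0.0) return base_ghz;         /* GPU ate the budget */
    double frac = headroom / (cpu_turbo_w - cpu_base_w);
    if (frac > 1.0) frac = 1.0;
    return base_ghz + frac * (turbo_ghz - base_ghz);
}

int main(void) {
    for (double gpu = 40.0; gpu <= 80.0; gpu += 10.0)
        printf("GPU %5.1f W -> CPU clock %.2f GHz\n", gpu, cpu_clock_ghz(gpu));
    return 0;
}
```

The GPU-first priority shows up directly in the output: as the GPU's share of the budget grows, the CPU turbo range shrinks back toward base clock.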
 
Devs could use the higher-IPC CPU cores when they need them and the lower-power Jaguar cores when they need those, or use them combined!

That would add complications for the developer at this stage. The two cores don't support the same instructions, so the developer or the compiler is going to need to make sure instructions from BD code paths do not accidentally get used by the Jaguar core. The worst case is a crash; the milder case is performance degradation.

There can also be more subtle numeric differences between separate ADD and MUL instructions and FMA, since the latter can round differently than two separate instructions. With a static division of core use it might not pop up, but if both core types get used, the code is going to start outputting different results, and it's worse if the same thread can migrate between them.
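The rounding difference is easy to demonstrate with C99's fma(), which performs a single rounding on the exact product-plus-addend:

```c
/* Shows fma(a,b,c) rounding differently from a*b + c.
   Compile with: cc fma_demo.c -lm */
#include <stdio.h>
#include <math.h>

int main(void) {
    double a = 1.0 / 3.0;
    double b = 1.0 / 7.0;
    double c = -(a * b);            /* negated, already-rounded product */

    double separate = a * b + c;    /* exactly 0: rounded product cancels itself */
    double fused    = fma(a, b, c); /* the rounding error of a*b, generally nonzero */

    printf("separate mul+add: %.20e\n", separate);
    printf("fused fma       : %.20e\n", fused);
    return 0;
}
```

A thread migrating between an FMA-fusing core and a non-FMA core could see exactly this kind of discrepancy in its intermediate results.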
 
Wouldn't Trinity surely be a better option? It's one of the highest-IPC architectures AMD makes.
What? Trinity has Piledriver cores, which are very often a bit behind the K10(.5) cores in terms of IPC (the fastest are actually the Husky cores in Llano). Factor in power consumption and die size and you may very well get higher performance/W and performance/mm² with Jaguar.
 