Thoughts on next gen consoles CPU: 8x1.6GHz Jaguar cores

That was several years ago, and it was IBM, not AMD. What does AVX2 add that could be of interest for games? 256-bit integer instructions? Some bit manipulation or vector shifts? Don't think so. Gather support is very unlikely, as is FMA, since the whole load/store architecture is probably too weak to get any tangible performance benefit in most cases. So one can save the effort. 256-bit SIMD units are completely out of the question (that would mean changing core parts of the design).

AMD's Jaguar is just taking its first steps. I really doubt they had much time to fiddle with changes for MS or Sony. If there are customizations beyond getting an 8-core version to run, they will be small. The customization is done at the SoC level, not in the cores themselves.

They've also been designing the CPUs for years at this point. It's very possible they could have forked the development of the CPU core. Two 128-bit FMA* units per core have also been suggested. I don't expect any such modifications, but it's possible. Both solutions double the compute performance, which is attractive for physics and many other calculations.
 
Jaguar has two 128 bit pipelines as standard. They are not symmetric, but they are there.

Sorry, referring to this:

http://forum.beyond3d.com/showpost.php?p=1700219&postcount=541

AVX2/FMA are just instruction set extensions, actual throughput depends on the implementation. A single 256-bit FMA unit gives 16 SP flops per cycle. Haswell can do 32 SP flops per cycle because it has two 256-bit FMA units.
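The arithmetic behind those figures, as a small sketch. It assumes fully pipelined units that each accept one FMA per cycle, 32-bit single-precision lanes, and counts an FMA as two flops (one multiply, one add):

```python
def sp_flops_per_cycle(vector_bits, fma_units):
    lanes = vector_bits // 32       # 32-bit single-precision lanes per vector
    return lanes * 2 * fma_units    # each FMA lane does a multiply and an add


print(sp_flops_per_cycle(256, 1))  # 16: a single 256-bit FMA unit
print(sp_flops_per_cycle(256, 2))  # 32: two 256-bit FMA units, as on Haswell
```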

As I wrote a couple of weeks ago, I think the most likely and least intrusive change would be to replace the 128-bit adder and multiplier with two 128-bit FMA units. This would effectively double the peak throughput, while decreasing latency when executing dependent multiplies and adds.
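A rough per-core tally of the FP configurations discussed in this thread. It assumes single precision, 4 lanes per 128-bit vector, fully pipelined units that all issue every cycle, and counts an FMA as 2 flops. The two "hypothetical" rows are speculation from the thread, not anything AMD has announced:

```python
LANES_128_SP = 128 // 32  # 4 single-precision lanes per 128-bit vector


def flops_per_cycle(adders=0, multipliers=0, fma_units=0, lanes=LANES_128_SP):
    """Peak SP flops per cycle for one core, assuming every unit issues each cycle.

    A plain add or multiply counts as 1 flop per lane; an FMA counts as 2.
    """
    return lanes * (adders + multipliers + 2 * fma_units)


options = {
    "stock Jaguar: 1 FADD + 1 FMUL": flops_per_cycle(adders=1, multipliers=1),
    "hypothetical: 1 FADD + 1 FMA": flops_per_cycle(adders=1, fma_units=1),
    "hypothetical: 2 FMA": flops_per_cycle(fma_units=2),
}
for name, flops in options.items():
    print(f"{name}: {flops} SP flops/cycle")  # 8, 12 and 16 respectively
```

The stock figure only materializes when adds and multiplies arrive in equal numbers every cycle; the FMA variants reach their peak on streams of fused multiply-adds.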

AVX2 adds gather and integer instructions. While gather is extremely useful if implemented efficiently, I'm not sure that 256-bit wide vector integer instructions are useful for game code, which I suppose performs mostly floating point operations.
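For reference, the semantics of a gather: each lane loads from its own index. This scalar sketch shows what an 8-lane (256-bit, single-precision) gather computes. Without hardware gather, the same work takes one scalar load per lane plus the instructions to assemble the vector, which is why the efficiency of the implementation matters:

```python
def gather(base, indices):
    # Lane i receives base[indices[i]]; indices may repeat or be out of order.
    return [base[i] for i in indices]


table = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0]
lanes = gather(table, [8, 0, 3, 3, 5, 1, 6, 2])
print(lanes)  # [18.0, 10.0, 13.0, 13.0, 15.0, 11.0, 16.0, 12.0]
```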
 
I wonder why people are expecting AVX2 in an AMD CPU; it is unclear when AMD will add it to its top-of-the-line CPUs. It's not that they don't need it, so if they were to add it, I would expect them to do so first for their own products.
Wrt Jaguar, before jumping to AVX2 in any case, I would expect them to add native AVX support (i.e. not executing 256-bit AVX instructions at half rate).
I think none of that is going to happen.

At this point I think integrating special hardware on the same die as the CPU is more likely, though I would put the odds low for something really "new". At best they could have a tiny GPU (one or two SIMDs/CUs) on the CPU die, unavailable for rendering (so not counted among the 12 CUs) and used for compute, but I think even that is unlikely.
If they need more "muscle", given a pretty tiny CPU die (I assume it is not a massive APU), they could have added 4 more cores without breaking the bank (then you have to feed the thing, but the same is true of whatever special units you would integrate on the CPU).
 
Two 128-bit FMA* units per core have also been suggested. I don't expect any such modifications, but it's possible
Two FMA units are probably near useless in all practical workloads unless you add a second load pipeline.
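A toy model of that argument: the sustained FMA rate is capped either by the number of FMA units or by how fast the load pipeline can deliver operands. The one-load-port figure and the one-load-per-FMA kernel below are illustrative assumptions, not measured Jaguar numbers; stores, latency and cache behaviour are ignored:

```python
def sustained_fma_per_cycle(fma_units, load_ports, loads_per_fma):
    """Toy throughput bound: min(compute limit, load-port limit)."""
    # The load pipeline delivers at most `load_ports` vector loads per cycle,
    # so it can feed at most load_ports / loads_per_fma FMAs per cycle.
    load_limit = load_ports / loads_per_fma
    return float(min(fma_units, load_limit))


# Hypothetical kernel needing one vector load per FMA.
print(sustained_fma_per_cycle(fma_units=1, load_ports=1, loads_per_fma=1))  # 1.0
print(sustained_fma_per_cycle(fma_units=2, load_ports=1, loads_per_fma=1))  # 1.0
print(sustained_fma_per_cycle(fma_units=2, load_ports=2, loads_per_fma=1))  # 2.0
```

Kernels that reuse operands from registers (loads_per_fma below 1) are the cases where a second FMA unit could still pay off without a second load pipeline.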
FMA though gets you more than just double the theoretical throughput:
- should have lower latency than a separate mul and add
- for some HPC algorithms the increased precision might be crucial (i.e. it saves you the extra operations needed to get similar precision without FMA); probably not really an issue here, though.
So adding one FMA unit would probably have a much better benefit/cost ratio than adding two (it also means you only need to beef up the multiplier with an adder and don't need a second multiplier). But that makes the pipelines really asymmetric (with one needing to read 3 registers), so that doesn't seem likely either (and I don't really expect customized cores; maybe some changes towards the outside, but not deep inside the cores).
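To illustrate the precision point: an FMA rounds a*b + c once, while a separate multiply and add round twice. This sketch emulates FMA with exact rational arithmetic from Python's fractions module and uses doubles rather than single precision; the inputs are chosen so that the low-order bits of the product are exactly what the separate path throws away:

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    # Separate operations: the product is rounded to a double before the add.
    return a * b + c


def fused_mul_add(a, b, c):
    # FMA emulation: exact rational product and sum, rounded to a double once.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


x = 1.0 + 2.0**-30            # exactly representable
c = -(1.0 + 2.0**-29)         # cancels the leading part of x*x
print(mul_then_add(x, x, c))  # 0.0: the 2**-60 term was rounded away
print(fused_mul_add(x, x, c) == 2.0**-60)  # True
```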
 
AMD's Jaguar is just taking its first steps. I really doubt they had much time to fiddle with changes for MS or Sony. If there are customizations beyond getting an 8-core version to run, they will be small. The customization is done at the SoC level, not in the cores themselves.

I'm just going by this post by bkilian; his response to CPU customisations being unlikely.
http://forum.beyond3d.com/showpost.php?p=1692679&postcount=18007

And Jaguar supports AVX, right? If not, perhaps they've added that.
 
That was several years ago, and it was IBM, not AMD. What does AVX2 add that could be of interest for games? 256-bit integer instructions? Some bit manipulation or vector shifts? Don't think so. Gather support is very unlikely, as is FMA, since the whole load/store architecture is probably too weak to get any tangible performance benefit in most cases. So one can save the effort. 256-bit SIMD units are completely out of the question (that would mean changing core parts of the design).

AMD's Jaguar is just taking its first steps. I really doubt they had much time to fiddle with changes for MS or Sony. If there are customizations beyond getting an 8-core version to run, they will be small. The customization is done at the SoC level, not in the cores themselves.

AMD's first unified shader architecture shipped in the 360 almost a year and a half before R600 came to the PC market. I think that if MS wanted a customized CPU with tweaks to the core, AMD would have no problem.

There is no telling how deeply AMD is willing to work with MS, especially when AMD always has requests to make on the Windows OS / AMD desktop and laptop side of their relationship.
 
If by double pumping the FP unit you mean getting one 256-bit operation per clock, you are probably right.
Nice, I misread those now-old slides about the Jaguar cores.
Interesting, so you can get the peak FLOPS figure two ways:
8 (cores) x 4 (FP elements) x 2 (operations; best case, the core schedules a mul and an add at the same time) x 1.6 (GHz) = 102.4 GFLOPS
Or
8 (cores) x 8 (FP elements) x 1 (operation) x 1.6 (GHz) = 102.4 GFLOPS
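The two calculations can be checked mechanically. This sketch just multiplies the factors from the post (clock in integer MHz so the arithmetic stays exact until the final division); it says nothing about which mode Jaguar actually sustains. Note that the second route needs eight cores to reach 102.4 GFLOPS; six cores would give 76.8:

```python
def peak_gflops(cores, fp_elements, ops_per_cycle, clock_mhz):
    # cores x SIMD lanes x operations issued per cycle x clock, in GFLOPS
    return cores * fp_elements * ops_per_cycle * clock_mhz / 1000


print(peak_gflops(8, 4, 2, 1600))  # 102.4: 128-bit add + 128-bit mul per cycle
print(peak_gflops(8, 8, 1, 1600))  # 102.4: one 256-bit operation per cycle
print(peak_gflops(6, 8, 1, 1600))  # 76.8
```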

Can somebody explain which "mode" is the most likely to give the best result?
 
AMD's first unified shader architecture shipped in the 360 almost a year and a half before R600 came to the PC market. I think that if MS wanted a customized CPU with tweaks to the core, AMD would have no problem.

There is no telling how deeply AMD is willing to work with MS, especially when AMD always has requests to make on the Windows OS / AMD desktop and laptop side of their relationship.

Yeah, sweetvar said the Durango project was a higher priority at AMD and they had more engineers working on it.
 
Yeah, sweetvar said the Durango project was a higher priority at AMD and they had more engineers working on it.

He also implied they were working on a compressed timetable due to setbacks, and that the Orbis silicon had sailed smoothly through testing at that point.
 
He also implied they were working on a compressed timetable due to setbacks, and that the Orbis silicon had sailed smoothly through testing at that point.

If anything, that just supports the idea of significant modifications for Durango vs. bog standard for Sony.
 
If by double pumping the FP unit you mean getting one 256-bit operation per clock, you are probably right.

I don't think so. Jaguar has 128-bit FADD and 128-bit FMUL units; you can't get a single 256-bit-wide operation by running them together. It's not Bulldozer. (And that's not what double pumping means; that would be getting a result twice per clock, on the rising and falling edges.)
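A simplified model of what that means for 256-bit AVX code on Jaguar. It assumes each 256-bit operation is split into two 128-bit halves that go down the same pipe, one half per cycle, and ignores latency and dependencies. Under those assumptions, 256-bit code reaches the same 8 SP flops per cycle as 128-bit code, and only when multiplies and adds are balanced:

```python
def avx256_flops_per_cycle(n_mul, n_add):
    """Peak SP flops/cycle for a stream of 256-bit vmulps/vaddps (toy model)."""
    # Each 256-bit op occupies its 128-bit pipe for two cycles (two halves).
    fmul_cycles = 2 * n_mul
    fadd_cycles = 2 * n_add
    cycles = max(fmul_cycles, fadd_cycles)  # the two pipes run in parallel
    if cycles == 0:
        return 0.0
    flops = 8 * (n_mul + n_add)             # 8 SP lanes per 256-bit op
    return flops / cycles


print(avx256_flops_per_cycle(100, 100))  # 8.0: balanced mix keeps both pipes busy
print(avx256_flops_per_cycle(100, 0))    # 4.0: the FADD pipe sits idle
```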
 
If anything, that just supports the idea of significant modifications for Durango vs. bog standard for Sony.

There's a lot of new silicon between the DMEs, display plane elements and eSRAM. Any of those could have been the issue, and you'd still have vanilla Jaguar and GCN cores.
 
I don't think so. Jaguar has 128-bit FADD and 128-bit FMUL units; you can't get a single 256-bit-wide operation by running them together. It's not Bulldozer. (And that's not what double pumping means; that would be getting a result twice per clock, on the rising and falling edges.)

http://i328.photobucket.com/albums/l327/encia/PC_hardware/36_zpsd15ac367.jpg

Unless someone added the extra commentary, that's what the slide says.

http://aphnetworks.com/news/2012/10/19/amd-already-tests-next-gen-low-power-kabini-chip
 