Xbox One (Durango) Technical hardware investigation

Status
Not open for further replies.
Yeah, that's what I was thinking. Going from the 128-bit units in Jaguar to 256-bit ones would only double the 102 GFLOPS figure we have now.
 
Yes, it would be double if they went with AVX. AVX2 (first on the market with Haswell) would give us the 409.6 GFLOPS.

Maybe that's why Bkillian said Durango would do things that today's monster PCs couldn't (nobody has AVX2 in their PC yet) ;)
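The figures being traded here follow from peak = cores × clock × flops per cycle per core. A minimal sketch, assuming 8 cores (the count mentioned later in the thread) at 1.6 GHz; the clock is an assumption, back-derived from the ~102 GFLOPS figure rather than stated in these posts:

```python
# Theoretical peak single-precision throughput: cores x clock (GHz) x flops/cycle/core.
CORES = 8          # eight Jaguar cores
CLOCK_GHZ = 1.6    # assumed clock, back-derived from the ~102 GFLOPS figure


def peak_gflops(flops_per_cycle: int, cores: int = CORES, clock_ghz: float = CLOCK_GHZ) -> float:
    """Return theoretical peak GFLOPS for a given per-core, per-cycle flop rate."""
    return cores * clock_ghz * flops_per_cycle


scenarios = {
    "Jaguar as-is (4 MUL + 4 ADD per cycle)": 8,
    "256-bit MUL and ADD units": 16,
    "Haswell-style AVX2 + FMA": 32,
}

for name, fpc in scenarios.items():
    print(f"{name}: {peak_gflops(fpc):.1f} GFLOPS")
```

Only the flops-per-cycle term changes between scenarios, which is why widening the units doubles the number and the Haswell-style configuration quadruples it.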
 
If it's 3-operand FMA, why do you need an extra read port, more register space, or more L/S bandwidth?

Whether the FMA is 3-op or 4-op is completely irrelevant to the execution hardware: both are passed on as exactly the same operation after the decoders. Both need to read 3 registers and write one. The fact that, for 3-op, the architectural name of one of the read registers is the same as the write register is irrelevant, because after renaming they don't point to the same register in the physical register file (PRF).
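The renaming argument can be illustrated with a toy rename table. This is a minimal sketch, not a model of any real AMD or Intel core: the `RenameTable` class, its unbounded register pool and the register names are hypothetical simplifications.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count


@dataclass
class RenameTable:
    """Toy renamer (hypothetical): every architectural write gets a fresh physical register."""
    mapping: dict[str, int] = field(default_factory=dict)
    _free: count = field(default_factory=count)  # unbounded pool of physical register numbers

    def _lookup(self, arch: str) -> int:
        # A register never written before is treated as a live-in and mapped on first use.
        if arch not in self.mapping:
            self.mapping[arch] = next(self._free)
        return self.mapping[arch]

    def rename(self, dest: str, srcs: list[str]) -> tuple[int, list[int]]:
        phys_srcs = [self._lookup(s) for s in srcs]  # read the current mappings first
        phys_dest = next(self._free)                 # then allocate a new destination
        self.mapping[dest] = phys_dest
        return phys_dest, phys_srcs


rt = RenameTable()
# 3-operand style: xmm0 = xmm1 * xmm2 + xmm0 (destination doubles as a source)
d3, s3 = rt.rename("xmm0", ["xmm1", "xmm2", "xmm0"])
# 4-operand style: xmm3 = xmm0 * xmm1 + xmm2 (separate destination)
d4, s4 = rt.rename("xmm3", ["xmm0", "xmm1", "xmm2"])
print(d3, s3)  # 3 [0, 1, 2]
print(d4, s4)  # 4 [3, 0, 1]
```

Both forms come out of renaming as three physical reads and one fresh physical write, and the destination never aliases a source. The second operation reads physical register 3, the result of the first, so the dependency is still tracked.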
 
Yes, it would be double if they went with AVX. AVX2 (first on the market with Haswell) would give us the 409.6 GFLOPS.

Maybe that's why Bkillian said Durango would do things that today's monster PCs couldn't (nobody has AVX2 in their PC yet) ;)

Isn't that doubling inherent to the Haswell implementation of AVX2 (i.e., it just has twice as many FMA units)? Or does AVX2 support imply doubling the FMA units? In any case, I can't imagine AMD basically quadrupling the size of the FPU on Jaguar for Durango to match a design even Intel hasn't shipped yet.
 
Yeah, unlikely.
 
I can't imagine AMD basically quadrupling the size of the FPU on Jaguar for Durango to match a design even Intel hasn't shipped yet.
Vector ALUs in Jaguar are already double-pumped (one 256-bit op per base clock), so just doubling the unit count should suffice. ;)
 
The physical SIMD units are 128-bit internally. Executing a 256-bit AVX instruction takes two cycles, the same way a 128-bit SSE2 instruction took two cycles on Bobcat with its 64-bit SIMD units.
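A simplified way to see the split: when the physical datapath is narrower than the instruction's vector width, one operation needs vector width / datapath width passes through the unit. This sketch counts passes only (a throughput simplification that ignores pipelining and latency), and the full-width row is hypothetical:

```python
import math


def passes(op_bits: int, datapath_bits: int) -> int:
    """Trips through a SIMD unit needed to cover one vector operation."""
    return math.ceil(op_bits / datapath_bits)


# (description, instruction vector width, physical datapath width)
cases = [
    ("Bobcat, 128-bit SSE op", 128, 64),
    ("Jaguar, 128-bit SSE op", 128, 128),
    ("Jaguar, 256-bit AVX op", 256, 128),
    ("hypothetical full-width unit, 256-bit op", 256, 256),
]

for name, op_bits, dp_bits in cases:
    print(f"{name}: {passes(op_bits, dp_bits)} pass(es)")
```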

Bobcat and Jaguar are designed to be low power general purpose CPUs, where you don't see a huge demand for floating point performance. One can argue it makes sense to use narrower data paths and execution units to lower idle power consumption.

Durango is an entirely different design point. Games can use lots of floating-point resources, and while power is a concern, it isn't a mobile platform. Adding FMA and full-width SIMD units seems like low-hanging fruit to me, although significant changes would need to be made to the scheduler and register file to support the extra source operand.

Cheers
 
If Durango brings AVX2, it will be the best console CPU ever. That plus ESRAM (if it's low latency) to talk to the GPU: GPGPU heaven. Then I could make sense of the comments after the Durango Summit last year about it being a supercomputer. It would also make a bit more sense to have Xeons in the devkits, since the ESRAM is an approximation of the Xeon's giant L3 cache.
 
The physical SIMD units are 128-bit internally. Executing a 256-bit AVX instruction takes two cycles, the same way a 128-bit SSE2 instruction took two cycles on Bobcat with its 64-bit SIMD units.

Bobcat and Jaguar are designed to be low power general purpose CPUs, where you don't see a huge demand for floating point performance. One can argue it makes sense to use narrower data paths and execution units to lower idle power consumption.

Durango is an entirely different design point. Games can use lots of floating-point resources, and while power is a concern, it isn't a mobile platform. Adding FMA and full-width SIMD units seems like low-hanging fruit to me, although significant changes would need to be made to the scheduler and register file to support the extra source operand.

Cheers

No, look at the Hot Chips presentation from AMD on Jaguar.

One Jaguar FP unit can do 8 SP MULs and 8 SP ADDs per cycle, or 1 DP MUL and 2 DP ADDs per cycle.
 
Mind linking us to that? I have it on pretty good authority that one Jaguar FP unit can do 4 SP MULs and 4 SP ADDs per cycle.
 
No, look at the Hot Chips presentation from AMD on Jaguar.

One Jaguar FP unit can do 8 SP MULs and 8 SP ADDs per cycle, or 1 DP MUL and 2 DP ADDs per cycle.
You should have another look!
It's 4 SP ADDs + 4 SP MULs per cycle
or 2 DP ADDs + 1 DP MUL per cycle.
That means the peak is 8 SP flops or 3 DP flops per cycle per core.
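The per-core numbers follow directly from the unit mix above: with no FMA, each add or multiply lane result counts as one flop. A minimal sketch of that arithmetic, with the 8-core count taken from this thread:

```python
from __future__ import annotations

SP_MIX = {"add": 4, "mul": 4}   # 4 SP adds + 4 SP multiplies per cycle
DP_MIX = {"add": 2, "mul": 1}   # 2 DP adds + 1 DP multiply per cycle
CORES = 8                       # core count discussed in this thread


def flops_per_cycle(mix: dict[str, int]) -> int:
    """Peak flops per cycle for one core: one flop per add or mul result (no FMA)."""
    return sum(mix.values())


sp_core = flops_per_cycle(SP_MIX)
dp_core = flops_per_cycle(DP_MIX)
print(f"per core: {sp_core} SP, {dp_core} DP flops/cycle")
print(f"{CORES} cores: {sp_core * CORES} SP, {dp_core * CORES} DP flops/cycle")
```

Plugging in the earlier 8 MUL + 8 ADD reading instead would give 16 SP flops per cycle per core, double the figure the 102 GFLOPS number is based on.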

PS:
And I'm quite sure the Jaguar cores in Durango are largely unmodified. They won't fiddle with the supported instruction set, which means FMA support and similar things are out of the question. The same goes for wider FPU pipes: too much trouble (it would need a complete overhaul of the register files and probably the L1D cache) for too little in return. It was probably much easier to double the core count to 8.
 
If Durango brings AVX2, it will be the best console CPU ever. That plus ESRAM (if it's low latency) to talk to the GPU: GPGPU heaven. Then I could make sense of the comments after the Durango Summit last year about it being a supercomputer. It would also make a bit more sense to have Xeons in the devkits, since the ESRAM is an approximation of the Xeon's giant L3 cache.

AVX2 + FMA gives 32 SP flops per cycle per core; that would change everything.
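The 32 figure comes from the implementation as much as the ISA: Haswell pairs AVX2 with two 256-bit FMA units, and an FMA counts as two flops per lane. A minimal sketch of that arithmetic, set against Jaguar's two 128-bit pipes (one for adds, one for multiplies); the helper name is illustrative only:

```python
SP_BITS = 32  # width of one single-precision float


def sp_flops_per_cycle(vector_bits: int, units: int, flops_per_lane_op: int) -> int:
    """Lanes per vector x number of units x flops each lane op contributes."""
    lanes = vector_bits // SP_BITS
    return lanes * units * flops_per_lane_op


haswell_style = sp_flops_per_cycle(256, units=2, flops_per_lane_op=2)  # two 256-bit FMA units
jaguar = sp_flops_per_cycle(128, units=2, flops_per_lane_op=1)         # 128-bit ADD pipe + 128-bit MUL pipe
print(haswell_style, jaguar)  # 32 8
```

An AVX2 implementation with a single FMA unit, or with 128-bit internal datapaths, would run the same instructions at a fraction of that rate.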
 
But isn't AVX2 an Intel (Haswell) exclusive instruction set?

EDIT: Which is better for devs, GPGPU or AVX2?
 
Exclusive in the sense that it's the only announced architecture supporting it? Yes. But AMD is free to implement any extension Intel adds to the x86 instruction set, and the same goes for Intel with AMD's additions. That's how the license works.
 
OK, thanks. So it's like AMD can use something "like" AVX2, but not AVX2 itself.

Are there any future AMD CPUs/APUs with "AVX2" support?
 
To use it, AMD would have to support the AVX2 instructions exactly as Intel specified them; otherwise it's not AVX2. How they implement it in silicon is up to them, though.
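The spec-versus-silicon split also shows up in how software detects support: it checks the feature flags, not the vendor. A minimal sketch, assuming Linux's `/proc/cpuinfo` format, where x86 feature bits such as `avx2` and `fma` appear on the `flags` line; the sample text and the `has_features` helper are hypothetical:

```python
def has_features(cpuinfo_text: str, required=("avx2", "fma")) -> bool:
    """True if the first 'flags' line of /proc/cpuinfo-style text lists every required flag."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())
            return all(f in flags for f in required)
    return False


# Hypothetical sample text; the vendor_id line plays no part in the check.
amd_like = "vendor_id\t: AuthenticAMD\nflags\t\t: fpu sse2 avx fma avx2\n"
intel_like = "vendor_id\t: GenuineIntel\nflags\t\t: fpu sse2 avx\n"
print(has_features(amd_like), has_features(intel_like))  # True False
```

On a Linux x86 machine the real input would be `open("/proc/cpuinfo").read()`; a chip from either vendor passes as long as it reports the flags.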

AMD has not announced any CPUs with AVX2 support. They only started supporting AVX in H2 2011 (Intel did in Q1).
 
Thanks again. Then, to me, AVX2 support in Durango doesn't look easy to get.
 
OK, thanks. So it's like AMD can use something "like" AVX2, but not AVX2 itself.

Are there any future AMD CPUs/APUs with "AVX2" support?

Steamroller cores (scheduled for late 2013 with Kaveri) won't support it. I guess implementing AVX2 is not that easy.

What kind of changes (beside AVX2/FMA support) would benefit Jaguar's FP capability?
 