Xbox One (Durango) Technical hardware investigation

Status
Not open for further replies.
Jaguar has two integer pipes, two FP pipes, a load pipe, and a store pipe.
That's the most straightforward interpretation of the 6 ops claim.

The four op claim might reflect that the two-wide front end can decode two reg-mem ops that decompose into an ALU and memory operation, which can mean up to four issue ports can be used at the same time.

The core's back end is wider than the sustainable instruction bandwidth to allow it to catch up after stalls.
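As a rough sanity check on the port arithmetic above, here is a minimal Python sketch. The instruction labels and the "one micro-op occupies one port" model are illustrative assumptions, not a description of the real Jaguar scheduler, and the model ignores which kind of port each micro-op needs.

```python
# Toy model: each decoded instruction decomposes into micro-ops,
# and each micro-op occupies one issue port for the cycle.
# The instruction classes below are hypothetical labels for illustration.
DECODE_WIDTH = 2  # two-wide front end, per the post above
ISSUE_PORTS = 6   # 2 integer + 2 FP + 1 load + 1 store

UOPS_PER_INSTRUCTION = {
    "reg_reg": 1,  # plain ALU op
    "reg_mem": 2,  # ALU op plus a memory op
}


def ports_used(instructions):
    """Ports occupied by one decode group (at most DECODE_WIDTH instructions)."""
    if len(instructions) > DECODE_WIDTH:
        raise ValueError("decode group exceeds front-end width")
    return sum(UOPS_PER_INSTRUCTION[name] for name in instructions)


print(ports_used(["reg_mem", "reg_mem"]))  # 4
print(ports_used(["reg_reg", "reg_reg"]))  # 2
```

Under these assumptions a single decode group never fills all six ports; the extra width only matters when queued work drains after a stall.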

So no additional CPU processing, then. With MS choosing to go with a weak GPU, offloading CPU functions to other processors, and stating that they modified every component, it appeared to me that MS might be going with a more CPU-focused console and had added additional processing to the CPU. I guess I thought I had found the "secret sauce". Thanks for bearing with me. :smile:
 
That's the most straightforward interpretation of existing data. It's not impossible that there were changes to the CPU section, however the rising costs of more invasive changes make it increasingly unlikely.
The sort of changes needed to create a 3-wide (6 uop decode) front end in a Jaguar-based design or a spectacularly improbable 6-wide front end would be so costly and time consuming that it may actually be impossible in terms of investment and time required to implement.
The downside in power consumption and limits to the upside make it increasingly unnecessary as well.
 
Thank you 3d. Your perspective has been invaluable.
 
That's a cynical view, IMHO.

In reality the question is, what is there left to talk about within the Xbox One architecture which requires an NDA well into September?

My question would be: did they explicitly say they couldn't talk because of an NDA with AMD, or just an NDA in general? Every time I saw them speak about it, the impression I got was that they were saying, "We can't talk about that because the higher-ups at MS don't want us discussing specifics of our console," while people heard, "We can't talk about that because we are using top-secret info from AMD and they're not ready to show it yet."

Meaning, the NDA that prevents them from talking about the console will probably never expire; people just misinterpreted it...
 
Do people generally do LZ decompression on the GPU? Or sound on the GPU? I don't think texture swizzling (which, as far as I know, is free on modern GPUs) is a big win either.

The only thing that Orbis seems to lack relative to Durango is the SHAPE audio block, and even then it still has an audio decompressor.

Do people generally deploy LZ-compression-specific logic on a GPU? Who knows why MS is ultimately deploying such logic. Maybe the Jags aren't fast enough, or maybe compression on the CPU cores eats up too many resources. And yes, accelerating LZ compression on a GPU has been shown. Maybe MS didn't want to devote CUs to the process, so they added custom logic to accommodate it.

Yes, I have readily seen swizzling on a gpu described as free. I am guessing because you don't have to burden the ALUs to reorder bits. But swizzling costs cycles and introduces latency into the graphics pipeline that is dictated by the complexity of the manipulation.
 
No, the DMA engines do this. No ALU resources are used. The two simple move engines (the ones without the jpg/lz decode or lz encode) are most likely just the normal DMA engines, which also do this tile/untile (or swizzle, it's the same when you talk about textures) on the fly.
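To make "tile/untile" concrete, here is a minimal Python sketch that reorders a row-major texture into Morton (Z-order) layout and back. Morton order is only a stand-in chosen for illustration; the actual tiling pattern used by these DMA engines is not given in this thread, and the function names are hypothetical.

```python
def part1by1(n):
    """Spread the low 16 bits of n so a zero bit sits between each pair."""
    n &= 0x0000FFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n


def morton_index(x, y):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    return part1by1(x) | (part1by1(y) << 1)


def swizzle(linear, size):
    """Row-major -> Morton order for a size x size texture (size a power of two)."""
    out = [None] * (size * size)
    for y in range(size):
        for x in range(size):
            out[morton_index(x, y)] = linear[y * size + x]
    return out


def unswizzle(tiled, size):
    """Morton order -> row-major, the inverse of swizzle()."""
    out = [None] * (size * size)
    for y in range(size):
        for x in range(size):
            out[y * size + x] = tiled[morton_index(x, y)]
    return out


texels = list(range(16))  # a 4x4 texture, row-major
tiled = swizzle(texels, 4)
print(tiled)  # [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
```

The reordering is pure address arithmetic, which is why it can be folded into a copy engine without touching the shader ALUs.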
 
Is it an assumption that the DMEs are the ones in the GPU, and not additional ones?
I thought the DMEs operate at about 50 GB/s (each), but the GPU can access memory at full tilt?
Or am I mixing up different operations or something?
 
The DMEs and a number of sundry units like the display controller share a connection capable of 25 GB/s in each direction.
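For a feel of what 25 GB/s in one direction means in practice, here is a back-of-the-envelope Python sketch. The 1920x1080, 4-bytes-per-pixel surface is a hypothetical example, 1 GB is taken as 10^9 bytes, and the figure is a best case that ignores contention from the other units sharing the link.

```python
LINK_BYTES_PER_SECOND = 25e9  # 25 GB/s per direction, from the post above


def copy_time_ms(num_bytes, bytes_per_second=LINK_BYTES_PER_SECOND):
    """Best-case time in milliseconds to move num_bytes one way over the link."""
    return num_bytes / bytes_per_second * 1e3


# Hypothetical example: one 1920x1080 surface at 4 bytes per pixel.
surface_bytes = 1920 * 1080 * 4
print(f"{surface_bytes} bytes -> {copy_time_ms(surface_bytes):.3f} ms")
# 8294400 bytes -> 0.332 ms
```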

I believe this is the expansion hub that is part of AMD GPUs that allows them to more freely add or subtract units from the GPU.
 
Don't you think it is far more likely that DirectX 11 was modified to take advantage of the Xbox One?

If you mean the specific version of DirectX that the Xbox One uses, then yes.
If you mean DirectX 11 as a whole, then no.
Microsoft has a lot more to consider for DirectX 11.2 than just the Xbox One.
They have to consider future GPUs from AMD, NVIDIA, and other companies.
They also have to consider future memory setups like HSA.
 
Well, you also then have to do the scale from render buffer size to screen size yourself, instead of letting the hardware do it for you when it composites the planes. That would probably mean a lower quality scale, and CU time. Not just a simple blend.

If you could read back from the DCE in the PS4, and the latency of the operation was small, couldn't you scale one buffer, read it back, composite, and then send the composited buffer to it?
 
Do people generally deploy LZ-compression-specific logic on a GPU? Who knows why MS is ultimately deploying such logic. Maybe the Jags aren't fast enough, or maybe compression on the CPU cores eats up too many resources. And yes, accelerating LZ compression on a GPU has been shown. Maybe MS didn't want to devote CUs to the process, so they added custom logic to accommodate it.

The amount of logic needed to implement LZ compression/decompression with a smallish window, like in the XB1, is tiny. The compression hardware throughput is equivalent to two to three Jaguar cores (a little more than one Core i7 @ 2.7 GHz).
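To show how little state a small-window LZ scheme needs, here is a minimal LZ77-style sketch in Python. It is a generic textbook variant, not the XB1's actual format; the 1024-byte window, the token layout, and the function names are all assumptions made for illustration.

```python
def lz_compress(data, window=1024, min_match=3, max_match=255):
    """Greedy LZ77-style tokenizer.

    Tokens are ("lit", byte) or ("match", distance, length).
    The window size is a hypothetical value, not the XB1's.
    """
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Only look back as far as the window allows.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz_decompress(tokens):
    """Rebuild the byte string from the token list."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte by byte, so overlapping matches work
    return bytes(out)


sample = b"abcabcabcabcxyz" * 20
tokens = lz_compress(sample)
assert lz_decompress(tokens) == sample
print(len(sample), "bytes ->", len(tokens), "tokens")
```

The decompressor only needs the last `window` bytes of output and a copy loop, which is consistent with the point that the logic cost is small.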

Cheers
 
[Image: Vbh3ifw.png]

There is more coming.
 
Indeed, AFAICT the MS engineers said:
"more CUs isn't better than more Hz for Durango"


I was under the impression that a 6% upclock per CU was the better choice because:
A) it gave better performance without causing bottlenecks
B) it was easier to achieve without unlocking the backup CUs
C) it kept the system the most balanced and in line with the overall design!

Just my opinions!;-p
 
It's also clear that while the statement that an upclock is better than +2 CUs might be true (we will never know, we can only take their word for it), it was probably the only real option at that point. Enabling the 2 deactivated CUs would most definitely have led to yield problems (which have already been rumored) and would therefore have cost them much more than a simple upclock, which is basically free cost-wise. But I guess I'm just stating the obvious here... NVM
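For reference, the raw ALU arithmetic behind the "upclock vs +2 CUs" comparison can be sketched as follows. This assumes peak shader throughput scales linearly with CU count times clock, which is a simplification: the clock increase also applies to the non-CU parts of the GPU, and this model does not capture that.

```python
def relative_alu_throughput(cus, clock_scale, base_cus=12):
    """Peak ALU throughput relative to base_cus CUs at the original clock.

    Assumes throughput is proportional to CU count times clock.
    """
    return (cus / base_cus) * clock_scale


upclock = relative_alu_throughput(12, 1.06)   # 12 CUs with a 6% upclock
more_cus = relative_alu_throughput(14, 1.00)  # 14 CUs at the original clock

print(f"12 CUs, +6% clock:  {upclock:.3f}x")   # 1.060x
print(f"14 CUs, base clock: {more_cus:.3f}x")  # 1.167x
```

On paper the two extra CUs win on shader throughput alone, so any claim that the upclock performed better has to rest on the rest of the pipeline, not on the CUs.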
 
B) it was easier to achieve without unlocking the backup CUs
AFAIR the comment was that they tested both possibilities and found that +Hz was more effective. [It's never good to say to customers that "we're taking the easier option" :)]

A) it gave better performance without causing bottlenecks in their design
C) it kept the system the most balanced and in line with their overall design!

Yep, but note they are not suggesting >12CUs = bad idea, instead they are stating that their design (which is 'heavy' on non-CU elements) does not benefit from just sticking more CUs in it. (which seems perfectly logical).

Just my opinions!;-p

Yep, ditto :).
 
Asking MS about more CUs is never going to get you a straight or completely honest answer. Their system was designed around 12, and they have no interest in telling anyone that more than 12 is a good thing, because then people will make the obvious comparison to 18. Their narrative is the Goldilocks story: they chose everything just right, and anything else would "unbalance" the system. 12 probably is a magic number for their system. Of course you can design systems that perform great with more or fewer, hence AMD has a whole line of cards from 8 to 30 CUs.
 