Xbox One (Durango) Technical hardware investigation

Well, that is true; one of their leaks definitely had a multiple-SoC design. Maybe the second SoC wouldn't have anything at all to do with gaming and would simply be there purely for the system, which is what was largely implied in that Yukon leak we got before.
 
What amount of bandwidth are we talking about?
For something more comprehensive than AFR, I think something like half the total system memory bandwidth in each direction, if each APU got its own memory pool. Using HyperTransport, a dual-APU system with the same DDR3 as desktop chips could manage it if it got the IO portion of an Opteron.

If one chip has the main memory bus and has to forward everything, then full bandwidth.
For Durango, I'm not sure how to account for the ESRAM's contribution.

PCI-E 3.0 delivers an 8 GT/s bit rate, and PCI-E 4.0 is 16 GT/s. Would that be enough?
Depends on width and what you believe to be the aggregate memory bandwidth of the dual-APU system. If the ESRAM is included, no, and the latencies would be horrible.
Unless PCIe is extended to allow coherence, it's not particularly useful for transparent multi-GPU of the sort the Durango rumor is positing.
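To put some rough numbers on "depends on width", here's a quick back-of-the-envelope sketch in Python. The 68 GB/s DDR3 and 102.4 GB/s ESRAM figures are just the commonly rumored Durango numbers, not anything confirmed here, and the PCIe math assumes a full x16 link.

Code:
def pcie_gb_per_s(gt_per_s, lanes, encoding=128 / 130):
    # GT/s per lane * lanes = raw Gb/s; apply 128b/130b encoding, divide by 8 for GB/s
    return gt_per_s * lanes * encoding / 8

ddr3_bw = 68.0     # GB/s, rumored Durango DDR3 bandwidth (assumption)
esram_bw = 102.4   # GB/s, rumored ESRAM bandwidth (assumption)

print(f"PCIe 3.0 x16: {pcie_gb_per_s(8, 16):.1f} GB/s per direction")    # ~15.8
print(f"PCIe 4.0 x16: {pcie_gb_per_s(16, 16):.1f} GB/s per direction")   # ~31.5
print(f"Half of DDR3 alone:   {ddr3_bw / 2:.1f} GB/s")                   # 34.0
print(f"Half of DDR3 + ESRAM: {(ddr3_bw + esram_bw) / 2:.1f} GB/s")      # 85.2

Even a PCIe 4.0 x16 link comes in a little under half the DDR3 bandwidth alone, and nowhere near the target once the ESRAM is counted.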


Perhaps this is what the data movers are for.
That saves CPU/GPU cycles on bulk moves. It doesn't make traces on a motherboard or package have less electrical resistance.
 
BTW, has nobody noticed that the Move Engines in Durango have a bandwidth of 25.6 GB/s while HT 3.1 has 25.6 GB/s unidirectional? Both run at 800 MHz. Just a coincidence?
 
That's for a link width nobody uses, so probably a coincidence.
For one, 32 bytes seems like a reasonable on-die interconnect channel width, and for another, 800 MHz, half the 1.6 GHz CPU core speed, seems like a decent ratio.
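For what it's worth, the arithmetic behind both 25.6 GB/s figures lines up like this; the 32-byte channel width is just the assumption from the sentence above, and the HT numbers are the published HT 3.1 maximums.

Code:
dme_width_bytes = 32       # assumed on-die channel width
dme_clock_hz = 800e6       # 800 MHz, half the 1.6 GHz CPU clock
print(f"DME link: {dme_width_bytes * dme_clock_hz / 1e9:.1f} GB/s")                  # 25.6

ht_rate_gt_per_s = 6.4     # HT 3.1: 3.2 GHz clock, double data rate
ht_width_bits = 32         # the full 32-bit link width nobody actually ships
print(f"HT 3.1:   {ht_rate_gt_per_s * ht_width_bits / 8:.1f} GB/s per direction")    # 25.6

In other words, the match can fall out of an 800 MHz clock and power-of-two widths rather than any HyperTransport link actually being in the chip.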
 
Is this the possible reason for the eSRAM and the DMA LZ encoding and decoding functionality?

Full Memory Compression (FMC)
http://kkant.net/papers/caecw.doc

Full memory compression keeps the entire memory compressed (with the possible exception of some specialized regions such as DMA). In order to localize the changes needed to support compressed memory, the O/S initializes the system with certain amount of uncompressed memory (e.g., twice the physical memory) and all accesses are to this real address space. The accessed addresses are eventually converted to the compressed space addresses (physical address space) by the memory controller before the actual access is initiated. The access would retrieve the compressed memory block, decompress it and provide it to the processor. Since decompression is a slow process, acceptable performance requires a chipset cache that maintains the recently used uncompressed data. FMC is best illustrated by IBM’s MXT (memory extension technology) that includes the following components [ibm-mxt, pinnacle]:

1. 32 MB of fast (SRAM) chipset cache.
2. Memory compressed in blocks of 1 KB size. Compressed blocks are stored using 1-4 segments, each of size 256 bytes.
3. A compressed block is accessed via a header entry that contains pointers to the 4 segments and other relevant information. For blocks that compress to 64:1 or better, it also allows for an “immediate data” type of representation (in which case all 4 segment pointers would be null).
4. The chipset provides a hardware compression-decompression unit (henceforth called Codec) based on a variant of the LZ77 compression algorithm [ibm-lz].
5. The chipset also provides the TLB (similar to paging TLB) for address translation between real and physical address spaces.
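A toy sketch of that indirection, purely to illustrate the header-plus-segments layout the paper describes; Python's zlib stands in for IBM's LZ77 variant, and the class and field names are mine, not anything from MXT itself.

Code:
import zlib

BLOCK = 1024     # uncompressed block size from the description above
SEGMENT = 256    # compressed storage granule

class CompressedMemory:
    def __init__(self):
        self.segments = {}    # physical segment id -> up to 256 compressed bytes
        self.headers = {}     # real block number -> list of segment ids (the "header entry")
        self.next_seg = 0

    def store(self, block_no, data):
        """Compress one 1 KB block and record its segment pointers."""
        assert len(data) == BLOCK
        packed = zlib.compress(data)   # real MXT falls back to raw storage when compression doesn't help
        ids = []
        for off in range(0, len(packed), SEGMENT):
            self.segments[self.next_seg] = packed[off:off + SEGMENT]
            ids.append(self.next_seg)
            self.next_seg += 1
        self.headers[block_no] = ids

    def load(self, block_no):
        """Walk the header, gather the segments, decompress back to 1 KB."""
        packed = b"".join(self.segments[i] for i in self.headers[block_no])
        return zlib.decompress(packed)

mem = CompressedMemory()
mem.store(0, bytes(BLOCK))                       # an all-zero, highly compressible block
assert mem.load(0) == bytes(BLOCK)
print(len(mem.headers[0]), "segment(s) used")    # 1 -- close to the "immediate data" case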

Maybe this is why the GPU reads at 170 GB/s but writes at 102 GB/s. The GPU output must be compressed using the DMAs before being written into main memory via eSRAM.

As for the HDMI input, and the JPEG decoder being on only one or a couple of the DMAs, that is probably due to Kinect.
 
The DMEs with that functionality all sit on a 25.6 GB/s bidirectional connection.
The bottleneck would be the 25.6 GB/s connection that the uncompressed data stream has to go through in order to get to a DME that cannot handle that level of sustained output.

The 102.6 GB/s probably reflects the biggest transaction the GPU can post in a given cycle, which is a Z write from its ROPs to ESRAM.
 
He's not banned; MikeR is banned, not BClifford (the BClifford thing is an obvious nod to the MikeR = Mark Rein conflation, hence a big giveaway).

Can someone confirm whether he was indeed the first to post the March 6 date? Otherwise it's just a case of throwing some glitter in among the chicken feed.
I saw it on GAF at least a week ago; it was the same schedule that was picked up a day or two later on some websites. Edit: 2/23/13 - http://www.neogaf.com/forum/showpost.php?p=48116222&postcount=781

I think a few pages later there is more detail, but the troll poster could have seen that to get to the March 6th date.
 
Is this the possible reason for the eSRAM and the DMA LZ encoding and decoding functionality?

Full Memory Compression (FMC)
http://kkant.net/papers/caecw.doc



Maybe this is why the GPU reads at 170 GB/s but writes at 102 GB/s. The GPU output must be compressed using the DMAs before being written into main memory via eSRAM.

As for the HDMI input, and the JPEG decoder being on only one or a couple of the DMAs, that is probably due to Kinect.

Seems you hit the nail on the head. I went back and read the article on the move engines, and that's indeed the same compression algorithm, and the SRAM seems a perfect match.

So what's the benefit of FMC? Is it just a very slick way to save on memory bandwidth and improve efficiency while simultaneously reducing cost, power consumption and physical die space? Are there any obvious benefits I'm missing?

I didn't get what they meant by this line here.

the O/S initializes the system with certain amount of uncompressed memory (e.g., twice the physical memory) and all accesses are to this real address space.

What do they mean by twice the physical memory?

In the end, does this make it much more likely that Microsoft could realistically charge $300 for this on launch day if they wanted to without losing money on each unit sold?

· Significant savings in memory costs for large and medium servers.
· Savings in physical space, required power, and thermal dissipation by the memory subsystem for high-density servers.
· Reliance on lower performance disk subsystems through the use of the compressed disk cache technique, significantly lowering total system cost.
· Improved efficiencies through reduced memory and I/O subsystem BW requirements and costs from the application of compression end-to-end.

Seems like it could be a pretty nice way to go about things.
 
FWIW I think the FMC thing is extremely unlikely in the form discussed, but what they mean is that to the software the compressed memory looks like a 2x physical address space, and it's translated via some sort of mapping table: the app reads address X, but that's really address Y in the compressed memory; that block is decompressed into the SRAM and read uncompressed from there, but all of that is transparent to the app.

Obviously nothing is free here, and the downside of the described FMC system is increased latency; since memory fetches are generally local, you can offset this on a CPU by using a fast pool to cache the decompressed memory.
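Roughly what that transparent path looks like if you sketch it in code: the app only ever uses "real" addresses, and the mapping table, codec and fast cache sit behind the read/write calls. zlib again stands in for whatever codec the hardware would use, the 1 KB block size is borrowed from the MXT description earlier, and none of this is a claim about how Durango actually works.

Code:
import zlib

BLOCK = 1024

class TransparentCompressedRAM:
    def __init__(self, cache_blocks=32):
        self.compressed = {}     # real block number -> compressed bytes (the "physical" side)
        self.cache = {}          # real block number -> uncompressed bytes (the fast SRAM pool)
        self.cache_blocks = cache_blocks

    def _fetch(self, block_no):
        """Bring one block into the uncompressed cache, decompressing on a miss."""
        if block_no not in self.cache:
            if len(self.cache) >= self.cache_blocks:          # crude eviction policy
                self.cache.pop(next(iter(self.cache)))
            packed = self.compressed.get(block_no)
            self.cache[block_no] = zlib.decompress(packed) if packed else bytes(BLOCK)
        return self.cache[block_no]

    def read(self, real_addr, length):
        """The app reads 'real' address X; translation and decompression are hidden."""
        block_no, off = divmod(real_addr, BLOCK)
        return self._fetch(block_no)[off:off + length]        # ignores block-crossing accesses

    def write(self, real_addr, data):
        block_no, off = divmod(real_addr, BLOCK)
        block = bytearray(self._fetch(block_no))
        block[off:off + len(data)] = data
        self.cache[block_no] = bytes(block)
        self.compressed[block_no] = zlib.compress(bytes(block))   # write-through, for brevity

ram = TransparentCompressedRAM()
ram.write(0x2000, b"hello")
print(ram.read(0x2000, 5))    # b'hello' -- the app never sees the compressed layout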

I would imagine the intent of the ZLIB hardware and the JPEG hardware in Durango is to reduce the bandwidth of textures being copied to the ESRAM; I would expect this to be an explicit operation rather than something like FMC.
 
Okay, so that means Durango has specific parts to it that would make it ideal for FMC, but what's likely closer to the reality is that Durango isn't using FMC, but is simply taking advantage of some of the requirements for FMC to achieve other objectives?

Do I have that right?
 
Okay, so that means Durango has specific parts to it that would make it ideal for FMC, but what's likely closer to the reality is that Durango isn't using FMC, but is simply taking advantage of some of the requirements for FMC to achieve other objectives?

Do I have that right?

No, I don't think even the first statement is true; as I read the description, it's designed for a CPU to be reading the compressed memory, and the SRAM has to be accessible by the CPU.

But yes, it's using a similar principle, in that you use decompression to reduce the overall memory bandwidth. The reason that's not a spectacular win is that the only reason you're copying it in the first place is that you have two memory pools.
 
No, I don't think even the first statement is true; as I read the description, it's designed for a CPU to be reading the compressed memory, and the SRAM has to be accessible by the CPU.

But yes, it's using a similar principle, in that you use decompression to reduce the overall memory bandwidth. The reason that's not a spectacular win is that the only reason you're copying it in the first place is that you have two memory pools.

Seems the primary use for something like this is a large data-server environment where you can achieve 2.33x the addressable memory by adding additional cache. 64 GB would give you over 128 GB (which, multiplied across a farm, could save a lot of $$$), if your application clients could deal with the added latency.

Unless someone can think of a killer reason why Durango would need 19 GB of memory? :)
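For reference, the arithmetic behind those numbers, taking the 2.33x ratio above at face value and assuming the rumored 8 GB pool for Durango (which isn't stated anywhere above):

Code:
for physical_gb in (8, 64):
    print(f"{physical_gb} GB physical -> {physical_gb * 2.33:.1f} GB effective")
# 8 GB  -> 18.6 GB  (roughly the "19 GB" above)
# 64 GB -> 149.1 GB (comfortably "over 128 GB")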
 
My guess (that's based on absolutely nothing whatsoever and I just feel like saying it anyway) is that this BClifford guy knew there was going to be this event and he posted what he did, not because he's outright trolling, but because he wants it to be true so badly that he's convinced himself it's just gotta be.

...not that the stuff about being an insider wouldn't be a lie.

Just getting a similar vibe to those people I encountered who pretended to have pulled off unrealistic accomplishments with emulators in hopes of motivating someone to really do it, because they knew it just had to be possible...
 
My guess (that's based on absolutely nothing whatsoever and I just feel like saying it anyway) is that this BClifford guy knew there was going to be this event and he posted what he did, not because he's outright trolling, but because he wants it to be true so badly that he's convinced himself it's just gotta be.

...not that the stuff about being an insider wouldn't be a lie.

Just getting a similar vibe to those people I encountered who pretended to have pulled off unrealistic accomplishments with emulators in hopes of motivating someone to really do it, because they knew it just had to be possible...


...Or that he simply chose Beyond 3D for an info dump. (If this is true, of course.)
 
No, I don't think even the first statement is true; as I read the description, it's designed for a CPU to be reading the compressed memory, and the SRAM has to be accessible by the CPU.

But yes, it's using a similar principle, in that you use decompression to reduce the overall memory bandwidth. The reason that's not a spectacular win is that the only reason you're copying it in the first place is that you have two memory pools.

Ahh, okay, thanks for clearing that up. I definitely worded that wrong, as meeting some requirements, but not all, certainly doesn't make something ideal.
 
...Or that he simply chose Beyond 3D for an info dump. (If this is true, of course.)

I think he chose B3D because, for some reason, some people choose to believe anything written on here whether there's truth in it or not. Also, I think it's a bit rich to call it an information dump; he didn't really say much.
 
I think he chose B3D because, for some reason, some people choose to believe anything written on here whether there's truth in it or not. Also, I think it's a bit rich to call it an information dump; he didn't really say much.

The posts on the forum probably do have a higher proportion of technical accuracy than your average forum or, god forbid, a comments section...
 