Xbox One (Durango) Technical hardware investigation

Does anyone know anything about the cryptography engines that were at the bottom of the vgleaks piece...
Or is it just a security chip?

Cryptography would imply security... It's probably the ARM chip (TrustZone) that AMD announced plans to integrate a while ago.
 
The more advanced and complicated the possibility being considered, the less likely it is to actually be the case.

There is nothing complicated in my "fantasy"; IMO it is one possibility with very minor changes to the memory controller and nothing else. The direct ESRAM-to-GPU path is already more complicated; that could also have been done via the memory controller if not for a reason.
 
The more advanced and complicated the possibility being considered, the less likely it is to actually be the case. Same with multi-ported on-chip memories and so on. Multiporting hasn't been used on such a scale in any consumer device ever; it'd add a lot of extra wiring on the chip, as well as additional arbitration logic etc.

Multi-banking seems like it could be possible. If we want the CPUs to overlap execution with data engine loads, they can switch to using banks not being filled or written back by a DME.
Arbitration would be relatively simple: a bank is either ready or it's not.

Maybe by assigning pages to specific cores and then pinning those address ranges to banks, the CPUs can avoid running into each other.
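A toy sketch of that pinning idea (the page size, bank count, and mapping scheme are all made-up illustrative values, not anything from the leak):

```python
# Toy model: pin each core's pages to one SRAM bank so cores never contend
# with each other (or with a DME filling another bank). All names and sizes
# here are hypothetical, purely for illustration.
PAGE_SIZE = 4096
NUM_BANKS = 8
page_to_bank: dict[int, int] = {}

def pin_pages(core_id: int, first_page: int, num_pages: int) -> None:
    """Pin a contiguous page range to a single bank (bank == core id here)."""
    for page in range(first_page, first_page + num_pages):
        page_to_bank[page] = core_id % NUM_BANKS

def bank_for_address(addr: int) -> int:
    """Arbitration stays trivial: look up the bank, check if it's busy."""
    return page_to_bank[addr // PAGE_SIZE]

# Cores 0 and 1 get disjoint banks, so their accesses can never collide.
pin_pages(core_id=0, first_page=0, num_pages=16)
pin_pages(core_id=1, first_page=16, num_pages=16)
assert bank_for_address(0) != bank_for_address(16 * PAGE_SIZE)
```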
 
PS3 = 256 MB main memory & 256 MB GPU memory.
360 = 512 MB shared between CPU and GPU; the ratio in games is most likely similar to the PS3's.

Hence ~256 MB of memory last generation (minus some for the OS etc.) x 8 = 2 GB this generation.

RSX can read and write just fine into the XDR pool. Your post makes no sense: you make an imaginary split of memory and then discard half of it... The games definitely use most of the available 512 MB of memory in both consoles. Even if half of that were used for non-graphics-related tasks, it doesn't mean that the rest isn't there or isn't very important for the game.
 
Multi-banking seems like it could be possible.
Maybe by assigning pages to specific cores and then pinning those address ranges to banks, the CPUs can avoid running into each other.
Yeah, sounds reasonable... Besides, that's pretty much what Intel's Core i series does also, except each CPU isn't hardwired to a set of banks, of course; it has that ring bus that gives access to the caches of the other cores as well.
 
The hash function used to assign addresses to L3 slices in the Core i series is programmable at some level, since Intel can disable an arbitrary core and slice for yield purposes. The goal there is to avoid contention for the same slice.

Durango could do the same thing at a higher level with the SRAM, or, by exposing the mappings to either the runtime or software, do the opposite of Intel and try to concentrate accesses in actively used banks.
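A rough sketch of the two policies, assuming a simple XOR-fold of address bits stands in for Intel's undisclosed hash (slice/bank counts are illustrative):

```python
# Two ways to map addresses onto banks/slices. A simple XOR-fold of address
# bits stands in for Intel's undisclosed hash; bank counts are illustrative.
NUM_SLICES = 4

def spread(addr: int) -> int:
    """Intel-style: scatter consecutive cache lines across all slices so no
    single slice becomes a hot spot."""
    return ((addr >> 6) ^ (addr >> 12) ^ (addr >> 18)) % NUM_SLICES

def concentrate(addr: int, active: list) -> int:
    """The opposite policy: steer traffic into the actively used banks,
    leaving the others idle (e.g. free for a DME to fill)."""
    return active[(addr >> 6) % len(active)]

lines = [i * 64 for i in range(8)]               # eight consecutive cache lines
print([spread(a) for a in lines])                # [0, 1, 2, 3, 0, 1, 2, 3]
print([concentrate(a, [0, 1]) for a in lines])   # [0, 1, 0, 1, ...]
```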
 
@silhouette:

This is the thread for the VGLeaks specs discussion; that's why your post was moved to the more general thread for rumour speculation/discussion, and the duplicates and triplicates were deleted.
 
The 360 has 32 GB/s unidirectional BW between GPU and daughter die for its 8 ROPS. I'm guessing Durango might have 16 ROPS.

32 GB/s * 2 * (800 MHz / 500 MHz) = 102.4 GB/s. Close enough to the 102 GB/s in the VGLeaks data.
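Spelling that arithmetic out (the 16-ROP count and the clock scaling are the guesses above, not confirmed figures):

```python
# The bandwidth guess above, spelled out. ROP count and clocks are the
# poster's assumptions, not confirmed figures.
xenos_bw      = 32.0        # GB/s, 360 GPU <-> daughter die, one direction
rop_scaling   = 16 / 8      # guessed 16 ROPS vs the 360's 8
clock_scaling = 800 / 500   # rumoured Durango GPU clock vs Xenos' 500 MHz

print(xenos_bw * rop_scaling * clock_scaling)   # 102.4 -- vs the leaked 102
```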

So ... how about 32MB of edram (maybe on a daughter die) for feeding 16 of dem magic ROPS, only with the ability to quickly swap bus direction for GPU / CPU reads of the edram (fixing the 360's limitation)?

The 102 / 102.4 GB/s numbers just fit so well. Magic ROPS - with their "free" read-modify-write ops - might be just the kind of special sauce that Durango needs to see increased real world efficiency, and it would be bang in line with what's worked so well for MS before. And there'd be no need for developers to start learning about using fast local store for SPU style processing. It would just be 360 edram v2.

Edit: you could even put a bank of magic ROPS in an edram daughter die and a bank of regular ROPS in the main GPU for use with main memory. Draw whatever you want to whichever pool of memory fits best.

Edit 2: So render your huge double buffered dynamic shadow map straight into main ram while continuing to work on successive back buffers on the magic ROPS in edram. That kind of thing. So much special sauce potential.
 
Interesting. Good math deduction.

That said if ERP is right and the 68GB/s DDR3 is supposed to be used for the ROPs... *O*U*C*H*
 
Working out that maths just got my spider sense tingling.

For "normal" reads and writes the BW would be seen as 102.4 GB/s as the leaked docs show it, but within the edram/sram any embedded ROPS that can perform read / modify / write (like on the 360) would have up to 204.8 GB/s. Following the full 360 way of doing things, 4X MSAA would take you up to something bonkers like 820 GB/s. - but even without that you're still looking at quite a good situation.

MS were very keen to avoid putting everything on the same bus with the 360, as they said contention with high-BW framebuffer use made performance unpredictable for everything else. Hopefully the leaked Durango specs are an indication that MS hasn't given up on smart ROPS and predictable performance.
 
I may be naive, but a dedicated pool of memory for the framebuffer makes sense for that reason. One developer has noted a number of times that having predictable performance is a big deal to developers: one less thing to worry about. The naive part: if you have 7-32 MB of memory (the framebuffer) sucking up 90% of the bandwidth of your 512 MB of memory, it seems sensible, if it is economical, to dedicate some silicon to the framebuffer to (a) give the framebuffer the wide bandwidth it needs to go full bore ahead and (b) give more consistent bandwidth to the remainder of the system (even if the "peak" is lower than in a completely unified system, it will often have more real bandwidth due to the lack of framebuffer contention).

So in theory the ESRAM for the framebuffer (+ other stuff) is not necessarily a horrible idea.

That said, ERP has been told MS is suggesting using the 8 GB of DDR3 (68 GB/s) for the framebuffer. That, right there, is another major indicator that MS is *not* targeting the same performance ballpark as Sony. If Sony hits 192 GB/s that is the same as the 7870 XT; but the normal 7870 (20 CU) and 7850 (16 CU) have 154 GB/s, so there is about 40 GB/s floating around for the CPU even when the GPU is drawing the same bandwidth as its PC counterparts.

IF the Durango framebuffer is in the DDR3, you have a small chunk of 7-64 MB of space eating up the majority of the bandwidth of the 5 GB of game space (not to mention any of the needs of the 3 GB of system memory!). Maybe ERP or someone else can explain why this isn't an issue, but it does seem like a HUGE issue if your small framebuffer sucks up all your DDR3 bandwidth -- what is left over for all the other stuff the CPUs and GPU need to do? (As a snide comment @ MS: it seems they were happy with the crappy filtering in Xbox 360 games, because I don't see how this situation will help address that issue.)
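For a rough feel of the scale of the problem, a back-of-envelope estimate; the G-buffer layout, MSAA level, and overdraw here are my illustrative guesses, not anything from the leak:

```python
# Back-of-envelope framebuffer traffic on the 68 GB/s DDR3 bus.
# Layout, MSAA level and overdraw below are illustrative guesses.
pixels      = 1920 * 1080
gbuffer_bpp = 4 * 4 + 4       # four 32-bit targets + 32-bit depth
msaa        = 4               # samples stored per pixel
overdraw    = 3               # average fragments written per pixel
fps         = 60

gbps = pixels * gbuffer_bpp * msaa * overdraw * fps / 1e9
print(f"{gbps:.0f} of 68 GB/s")   # ~30 GB/s: nearly half the bus before the
# lighting pass even reads it all back, or the CPUs touch memory at all
```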

It seems for a similar budget MS has aimed a lot lower than Orbis in terms of *gaming* performance, or there is some important, undisclosed information about the system.
 
The numbers are somewhat compelling for a 16 or 32 ROP part writing to the embedded memory, so I wouldn't rule out ROPS on the ESRAM.
It's also quite possible that the information I got was misunderstood; I don't have a first-hand data source.

In any case you'd almost certainly want to render Z and anything with blending enabled into the ESRAM, but most deferred renderers I know of do separate opaque and transparent passes.
If you were running a deferred renderer, your only other option for the first pass would be to somehow split the render targets from the first pass between memory pools, or to tile.

It's certainly true that rendering to the ESRAM would compete for bandwidth with the CPU, but the GPU is going to do that in either case.
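A quick sketch of the placement problem ERP describes, with a hypothetical 1080p deferred target list (the sizes are illustrative, not a known Durango layout):

```python
# Which 1080p render targets fit in 32 MB of ESRAM? Hypothetical layout.
ESRAM_MIB = 32
pixels = 1920 * 1080

targets = {                       # bytes per pixel, illustrative only
    "depth":           4,         # Z: read-modify-write heavy, wants ESRAM
    "albedo":          4,
    "normals":         4,
    "material":        4,
    "hdr_light_accum": 8,         # blending-heavy: also wants ESRAM
}

total_mib = sum(bpp * pixels for bpp in targets.values()) / 2**20
print(f"{total_mib:.1f} MiB of {ESRAM_MIB} MiB")   # ~47.5 MiB: it doesn't
# all fit, hence the options above: split targets across pools, or tile.
```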
 
Thanks ERP.

I started a thread with you in mind: http://forum.beyond3d.com/showthread.php?t=62932

Oddly enough, many months ago I had someone PM'ing me claiming to know some 1st-party developers, who claimed Durango would use embedded memory and was adamant that what he was being told was that the peak compute would be lower but it would be very much like the NV/AMD flops situation. Specifically he said the developers were talking about wavefront utilization and avoiding parts of the GPU remaining idle. He has been PM'ing me ever since the VGLeaks telling me "he told me so" and pasting things the claimed developers are telling him. Either the PR machine is in full swing or some developers like the design?
 
Specifically he said the developers were talking about wavefront utilization and avoiding parts of the GPU remaining idle.

Uh oh! One of my theories! Out-of-order GPU!

By the way, does this make any sense?

We Say: Our source from our Sister Site has never let us down yet. We're leaning towards a yes on this one.

Source Says: (09:33:46) Source:: Something they don't mention, though, is the new box will have TWO scaler chips... and what's cool about that is the UI can always be 1080p, while in-game rendering resolutions can be scaled to increase perf as needed... two dynamic resolution planes. It's nifty.
(09:34:29) Source:: It'll definitely be BC... definitely. But, will every game work? Probably not...
Source: showtimeforfree.com


Could something like that be possible? Downscale for the pixel and vertex rendering and upscale again without it looking like simple upscaling?
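For what it's worth, a minimal sketch of what the rumoured "two dynamic resolution planes" could mean in practice; the feedback controller below is entirely made up for illustration:

```python
# Sketch: the game plane drops resolution to hold frame rate while the UI
# plane stays at 1080p; a hardware scaler would upsample the game plane.
# The controller below is a made-up example, not anything from the leak.
TARGET_MS = 16.6                 # frame budget for 60 fps
scale = 1.0                      # current fraction of 1080p on the game plane

def adjust(last_frame_ms: float) -> tuple[int, int]:
    """Nudge render resolution based on the previous frame's GPU time."""
    global scale
    if last_frame_ms > TARGET_MS:
        scale = max(0.5, scale * 0.95)   # shed pixels when over budget
    else:
        scale = min(1.0, scale * 1.02)   # claw resolution back when safe
    return int(1920 * scale), int(1080 * scale)

print(adjust(20.0))   # -> (1824, 1026); the scaler brings this back to 1080p
```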
 
Could something like that be possible? Downscale for the pixel and vertex rendering and upscale again without it looking like simple upscaling?


So, they put in hardware to render at low resolutions and then upscale, but without it looking like simple upscaling?
It keeps getting worse, no thanks!!
 
Uh oh! One of my theories! Out-of-order GPU!
With GPUs being as multithreaded as they are, there's really no reason for out-of-order execution... nor is it feasible, I suspect, since keeping track of the ~100ish instructions in flight at any one time on a CPU is very expensive in both power and transistors. Now think of a GPU, with thousands of instructions in flight, tens of thousands for a large chip.

Nah. Not happening, for a while yet at least. If ever. :D
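The scale argument in numbers, using illustrative GCN-era figures (the CU count is the rumoured one; the occupancy numbers are generic, none of it Durango-confirmed):

```python
# Why OoO tracking doesn't transfer to GPUs: compare in-flight work.
# CU count is the rumoured Durango figure; occupancy numbers are generic GCN.
cpu_ooo_window = 100             # roughly a big core's reorder-buffer reach

cus            = 12              # rumoured Durango CU count
waves_per_cu   = 40              # max resident wavefronts per GCN CU
lanes_per_wave = 64

gpu_in_flight = cus * waves_per_cu * lanes_per_wave
print(gpu_in_flight // cpu_ooo_window)   # ~307x the CPU's window: GPUs hide
# latency with more resident threads instead of reordering within one.
```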
 
I was thinking... it's not possible to use ESRAM with classic deferred rendering engines, but more modern tile-based deferred engines can do the magic: they use screen tiles to group lights, with tight per-tile frusta to cull non-intersecting lights (reducing the number of lights to consider).

Am I right?

This was discussed by Lauritzen at SIGGRAPH 2010.

Is it possible to tile in a fast 32 MB?
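A quick check of that question with the usual illustrative numbers (the G-buffer layout and tile size are generic tiled-deferred choices, not known Durango figures):

```python
import math

# Does a 1080p tiled deferred setup fit in 32 MB? Illustrative layouts.
pixels = 1920 * 1080

fat_bpp  = 4 * 4 + 4          # four 32-bit targets + 32-bit depth
slim_bpp = 3 * 4 + 4          # drop one target

print(pixels * fat_bpp / 2**20)    # ~39.6 MiB: a fat G-buffer misses
print(pixels * slim_bpp / 2**20)   # ~31.6 MiB: a slimmer one just fits

# The per-tile light lists themselves are tiny next to the G-buffer:
tiles = math.ceil(1920 / 16) * math.ceil(1080 / 16)   # 16x16 px tiles -> 8160
print(tiles * 128 * 2 / 2**10, "KiB")  # 128 16-bit light ids per tile: ~2 MiB
```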
 