Xbox One (Durango) Technical hardware investigation

Basically, the BW version of double buffering, except people usually don't claim they have 512 KB of cache when they're double buffering 256 KB.

So banked access is the core of this simultaneous read/write. Then my question is: how do these memory controllers, one per 8 MB partition (4 total), receive both a read and a write address?

Even the dual ports (one read, the other write) have to receive an address each.
Banked access per se does not fully explain things, because both DDR3 and GDDR5 can have banked access too.

http://inst.eecs.berkeley.edu/~cs250/fa10/lectures/lec08.pdf
Can anyone explain the requests on page 23 in detail?
 
What I don't understand with dynamic resolution is how you know when to drop the resolution, and by the time you know, wouldn't it be too late? Is there some sort of counter indicating the time taken to render the frame, and do you need to make the decision on the resolution at the start of rendering the frame?

Variations in rendering time are normally fairly low from frame to frame. So you set a deadline for rendering your current frame to 85% of a frame (i.e. ~13.5 ms of a 16 ms frame). If you finish after the deadline, there is still a good chance you're within the full 16 ms frame time, so no harm done.

To ensure the next frame will finish before the deadline, you scale back resolution by the amount you went past your current deadline. Say your rendering took 15 ms; you then scale back resolution by the factor 13.5/15. If you only scale horizontally (like Wipeout), you end up with 1728x1080. With XB1's scaler you can scale dynamically in both dimensions and render the next frame at 1820x1024, which will be much harder to notice than dropped frames.
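
As a minimal sketch of that rule (pure illustration; the rounding and the two scaling policies are just the numbers from this example, not anything a real title is confirmed to do):

```c
#include <math.h>
#include <stdio.h>

/* Sketch of the scaling rule above: if the last frame missed the
   deadline, shrink the next frame's render target by the factor
   deadline/actual. All numbers are the ones from this example. */
int main(void) {
    const double deadline_ms   = 13.5;   /* 85% of a ~16 ms frame */
    const double last_frame_ms = 15.0;   /* measured render time  */
    const int w = 1920, h = 1080;

    if (last_frame_ms > deadline_ms) {
        double s = deadline_ms / last_frame_ms;              /* 0.9 */
        /* horizontal-only scaling, the Wipeout approach */
        printf("horizontal only: %dx%d\n", (int)(w * s), h); /* 1728x1080 */
        /* scale both axes so the total pixel count drops by s */
        printf("both axes: %dx%d\n",
               (int)(w * sqrt(s)), (int)(h * sqrt(s)));      /* ~1821x1024 */
    }
    return 0;
}
```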

Cheers
 
So Gubbi, is it something like this?

Wipeout could only scale horizontally:

frame 1 = 1024x1080
frame 2 = 1920x1080

while Xbox One can do:

frame 1 = 1820x1024 (or any resolution)
frame 2 = 1920x1080

Correct?
Did you miss that part where DF mentioned Wipeout HD, a PS3 game? This technique is not new.

I understand that, but MS is doing it with fixed-function hardware whereas Wipeout did it in software, which means more developers are likely to use it since they don't have to implement it themselves. It also frees up resources (I'm not sure how much, but maybe not a lot, since most of the games that did it are 60 fps?).
 
Are 16 ROPs going to become a huge issue?

Doubtful. The ROPs can handle 4x16-bit pixel formats at full tilt. The ESRAM matches this perfectly with 109 GB/s of bandwidth. On XB1, writes from the ROPs to the ESRAM are basically free (i.e. they will almost never stall).

The only obvious place where more ROPs could be useful is in shadow buffer creation. However, if shadow buffer rendering shifts to PRT-based techniques, already demoed by MS, the burden is likely to shift somewhat away from the ROPs.
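
As a back-of-the-envelope check (assuming the announced 853 MHz GPU clock; the rest is from the post above):

```c
#include <stdio.h>

/* Rough check that 16 ROPs writing 4x16-bit (8-byte) pixels at
   the announced 853 MHz clock line up with the 109 GB/s figure. */
int main(void) {
    const int rops = 16;
    const int bytes_per_pixel = 8;       /* 4 channels x 16 bit */
    const double clock_ghz = 0.853;
    printf("ROP write rate: %.1f GB/s\n",
           rops * bytes_per_pixel * clock_ghz);   /* ~109.2 */
    return 0;
}
```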

Cheers
 
So banked access is the core of this simultaneous read/write. Then my question is: how do these memory controllers, one per 8 MB partition (4 total), receive both a read and a write address?

Even the dual ports (one read, the other write) have to receive an address each.
Banked access per se does not fully explain things, because both DDR3 and GDDR5 can have banked access too.

It's called a banked multiport design. Essentially there's a read port and a write port at the interface, with an arbitrator in front of the banked memory. When both a read and a write are requested for the same address, the arbitrator decides how the data flow is handled. In this particular application, it could just be that writes always trump reads.

It's not a new technique, and MSFT did not mislead on the BW calculation. The final design is just 109 GB/s / 218 GB/s theoretical; their benchmark showed a typical access pattern maxing out at 150 GB/s.
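
For illustration, a toy model of that arbitration (the bank count, address mapping, and the write-wins policy are all assumptions; the real arbiter is fixed-function hardware):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_BANKS 8   /* hypothetical bank count per partition */

/* Toy banked dual-port arbiter: one read and one write request
   arrive per cycle, each with its own address. Requests to
   different banks proceed in parallel; on a same-bank conflict
   the write wins, as speculated above. */
typedef struct { bool valid; uint32_t addr; } Req;

static unsigned bank_of(uint32_t addr) {
    return (addr >> 6) & (NUM_BANKS - 1);   /* 64-byte lines, assumed */
}

void arbitrate(Req rd, Req wr, bool *rd_grant, bool *wr_grant) {
    *rd_grant = rd.valid;
    *wr_grant = wr.valid;
    if (rd.valid && wr.valid && bank_of(rd.addr) == bank_of(wr.addr))
        *rd_grant = false;   /* same-bank conflict: read stalls a cycle */
}

int main(void) {
    Req rd = { true, 0x1000 }, wr = { true, 0x2040 };  /* different banks */
    bool rg, wg;
    arbitrate(rd, wr, &rg, &wg);
    printf("read %s, write %s\n", rg ? "granted" : "stalled",
           wg ? "granted" : "stalled");
    return 0;
}
```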
 
What I don't understand with dynamic resolution is how you know when to drop the resolution, and by the time you know, wouldn't it be too late? Is there some sort of counter indicating the time taken to render the frame, and do you need to make the decision on the resolution at the start of rendering the frame?

I'm thinking that you can probably guesstimate it, either from the object count, or by baking it in offline based on scene complexity.
 
Even the dual ports (one read, the other write) have to receive an address each.
Banked access per se does not fully explain things, because both DDR3 and GDDR5 can have banked access too.
To add to what was said already: DDR3 and GDDR5 have only a single set of data and address lines, which has to handle both reads and writes. So while they are organised into banks internally (which helps to reduce/avoid additional latency when switching from one bank to another), there is only a single port from the DRAM to the interface.
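
A toy way to see the difference (grossly simplified; it ignores bank timing, bursts, and bus turnaround entirely):

```c
#include <stdio.h>

/* Contrast a single shared port (DDR3/GDDR5 style: one set of
   address/data lines serves reads AND writes) with the dual-port
   arrangement discussed for the ESRAM. Purely illustrative. */
int single_port_cycles(int reads, int writes) {
    return reads + writes;        /* every access takes its turn */
}

int dual_port_cycles(int reads, int writes) {
    return reads > writes ? reads : writes;   /* they can overlap */
}

int main(void) {
    printf("single port: %d cycles\n", single_port_cycles(100, 100)); /* 200 */
    printf("dual port:   %d cycles\n", dual_port_cycles(100, 100));   /* 100 */
    return 0;
}
```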
 
Variations in rendering time are normally fairly low from frame to frame. So you set a deadline for rendering your current frame to 85% of a frame (i.e. ~13.5 ms of a 16 ms frame). If you finish after the deadline, there is still a good chance you're within the full 16 ms frame time, so no harm done.

To ensure the next frame will finish before the deadline, you scale back resolution by the amount you went past your current deadline. Say your rendering took 15 ms; you then scale back resolution by the factor 13.5/15. If you only scale horizontally (like Wipeout), you end up with 1728x1080. With XB1's scaler you can scale dynamically in both dimensions and render the next frame at 1820x1024, which will be much harder to notice than dropped frames.

Cheers

Additionally, you can adjust the frame rate and color depth of each plane independently.
 
I'm thinking that you can probably guesstimate it, either from the object count, or by baking it in offline based on scene complexity.
You can simply measure the time between the frame being completed and the buffer flip. Usually there should be a slight margin (let's say 1 ms). If that falls under a certain threshold, you can preemptively reduce resolution to ensure you keep a margin. Or you could turn to a "softer" version of VSYNC, i.e. one that syncs if the rendering is ready, but not if the rendering completes just slightly too late (causing tearing in the very first lines of the screen, which is often not very noticeable), and use this as a trigger to reduce resolution. As the work for the GPU tends to change relatively slowly from frame to frame, this should work pretty well in most cases.
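
Something like this, as a sketch (the 1 ms threshold is the one from above; everything else is made up):

```c
#include <stdbool.h>
#include <stdio.h>

/* Margin check described above: measure the slack between frame
   completion and the buffer flip; if it shrinks below ~1 ms,
   preemptively reduce resolution instead of waiting for a miss. */
bool should_reduce_resolution(double frame_done_ms, double flip_ms) {
    const double min_margin_ms = 1.0;
    return (flip_ms - frame_done_ms) < min_margin_ms;
}

int main(void) {
    /* frame done at 15.9 ms, flip at 16.7 ms -> only 0.8 ms slack */
    printf("%s\n", should_reduce_resolution(15.9, 16.7) ? "reduce" : "keep");
    return 0;
}
```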

Edit: Oops, Gubbi explained it already.
 
The MS engineers said that the GPUs for the XB1 and PS4 are based on Sea Islands. I thought the PS4 GPU was comparable to Pitcairn?
 
To add to what was said already: DDR3 and GDDR5 have only a single set of data and address lines, which has to handle both reads and writes. So while they are organised into banks internally (which helps to reduce/avoid additional latency when switching from one bank to another), there is only a single port from the DRAM to the interface.

Thanks for the info. Now, is this single port per channel of RAM? A memory controller can arbitrate between channels, right? So for 8 GB of RAM: the X1 has four 64-bit channels, and the PS4's GDDR5 in clamshell mode has eight 32-bit "channels" (not really channels, as in this case it's point-to-point, assuming sixteen 4 Gbit chips).

Could the same technique be extended to both DDR3 and GDDR5, with limitations on granularity, if the memory controllers are programmable?
 
A "request" is simply a request to the memory controller to carry out a memory access (i.e. to read or write at a certain address). You could also call it the command (but that fits not exactly, imo). Or what was the question?

Thanks. Exactly, so the address + command, along with other customizations like ECC, write masks and such, form this "request"?
 
Thanks for the info. Now, is this single port per channel of RAM? A memory controller can arbitrate between channels, right? So for 8 GB of RAM: the X1 has four 64-bit channels, and the PS4's GDDR5 in clamshell mode has eight 32-bit "channels" (not really channels, as in this case it's point-to-point, assuming sixteen 4 Gbit chips).

Could the same technique be extended to both DDR3 and GDDR5, with limitations on granularity, if the memory controllers are programmable?
Yes, of course. One can write on one channel of the DRAM while reading on another at the same time. And GDDR5 does indeed have 32-bit channels, as the two 16-bit devices in clamshell mode share the command and address pins; just the (data) clock and data pins (16 per chip, so 32 together) are separate and point-to-point.
Thanks. Exactly, so the address + command, along with other customizations like ECC, write masks and such, form this "request"?
Yes, it does.
 
What I don't understand with dynamic resolution is how you know when to drop the resolution, and by the time you know, wouldn't it be too late? Is there some sort of counter indicating the time taken to render the frame, and do you need to make the decision on the resolution at the start of rendering the frame?

Yes, usually by the time you know, it's too late. But you can assume that subsequent frames will likely have similar execution times, so you add some hysteresis before returning to the higher resolution, and ideally you drop one frame instead of a sequence of them.
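
A hypothetical version of that hysteresis (all thresholds and step sizes invented for the example):

```c
#include <stdio.h>

/* Drop the resolution scale immediately after a slow frame, but
   only creep back up after a run of comfortably fast frames. */
typedef struct { double scale; int calm_frames; } ResCtl;

void update(ResCtl *c, double frame_ms, double budget_ms) {
    if (frame_ms > budget_ms) {
        c->scale *= 0.9;                       /* react at once   */
        if (c->scale < 0.5) c->scale = 0.5;
        c->calm_frames = 0;
    } else if (frame_ms < 0.85 * budget_ms && ++c->calm_frames >= 30) {
        c->scale *= 1.05;                      /* recover slowly  */
        if (c->scale > 1.0) c->scale = 1.0;
        c->calm_frames = 0;
    }
}

int main(void) {
    ResCtl c = { 1.0, 0 };
    update(&c, 17.5, 16.7);                    /* one slow frame  */
    printf("scale after a miss: %.2f\n", c.scale);   /* 0.90 */
    return 0;
}
```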
 
Yes, usually by the time you know, it's too late. But you can assume that subsequent frames will likely have similar execution times, so you add some hysteresis before returning to the higher resolution, and ideally you drop one frame instead of a sequence of them.

Is that for, like, a software implementation? Would a hardware implementation have the same problem? MS said their scaler chip can change it on a frame-by-frame basis, so I assume that it doesn't apply to them?
 
Is that for, like, a software implementation? Would a hardware implementation have the same problem? MS said their scaler chip can change it on a frame-by-frame basis, so I assume that it doesn't apply to them?

Yes, the issue is that by the time you know you can't render the whole frame at 1080p, you've already rendered most of it at 1080p.
But any reasonably intelligent algorithm should only see a dropped frame when there is a significant increase in rendering cost between frames.
 
It's called a banked multiport design. Essentially there's a read port and a write port at the interface, with an arbitrator in front of the banked memory. When both a read and a write are requested for the same address, the arbitrator decides how the data flow is handled. In this particular application, it could just be that writes always trump reads.

It's not a new technique, and MSFT did not mislead on the BW calculation. The final design is just 109 GB/s / 218 GB/s theoretical; their benchmark showed a typical access pattern maxing out at 150 GB/s.

The article still has the theoretical max at 204 GB/s.
The sidebar of the article points to a peak mix of full-rate reads and a bubble on the write pipeline.
That is the apparent best case.
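
That lines up arithmetically if the bubble means writes land on 7 of every 8 cycles (an assumed split, the commonly cited reading of the sidebar, not something confirmed here):

```c
#include <stdio.h>

/* Arithmetic behind the 204 GB/s figure: 109 GB/s of reads plus
   writes with a bubble every eighth cycle (assumed split). */
int main(void) {
    const double one_way = 109.0;                    /* GB/s per direction */
    printf("peak mix: %.0f GB/s\n",
           one_way * (1.0 + 7.0 / 8.0));             /* ~204 */
    return 0;
}
```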
 
Importantly for me, the design of the XB1 has centred around the same value as the Wii U: power efficiency. It's an interesting choice rather than going for significant power draw.

Are the XB1 and Wii U operating in similar power envelopes? I expected the XB1 to be more potent than the Wii U and closer to the PS4 in terms of graphical fidelity at HDTV resolutions. Does anyone happen to have the relative TDPs of these systems?
 
The Xbox One will likely draw 2-3 times the power of the Wii U when gaming (a guesstimate based on the rumored PSU size).
The Wii U draws 33-35 W at the dashboard or when playing a game, and has a max-rated PSU of 75 W.
 