Xbox One (Durango) Technical hardware investigation

Brad Grenz · Feb 12, 2013

Even doing it in software would only cost like one half of one percent of the Orbis GPU's total processing power. So unless the Orbis version inexplicably needs 100+ HUDs composited in one at a time it should be fine.

inefficient · Feb 12, 2013

Silent_Buddha said:
That last little bit also made me think that it could give a graphical advantage in some respects if some game, for example, only ran at 720p or any other sub 1080p resolution, but Durango's Game UI was at 1080p while Orbis's Game UI was at the same resolution as the game.

For some side by side comparison like Digital Foundry, you'd be able to pick it apart. For your average consumer though they may think the game with the sharp native res UI is actually better looking than the other one, even if the other one had, say, better shadows and lighting.

Of course, if that were the case, that would force Orbis developers to also have a native res UI. But if there's no hardware support it may require more GPU resources, thus leveling the playing field regardless.

Regards,
SB

Like it has been said before... lots of PS3 games already do this. People are making mountains out of mole hills.

patsu · Feb 12, 2013

Yep, I think it's more for features like in-game video chat (for example) with Rockstar's next game. Our regular HUDs are quite trivial in comparison. Those video chat live feeds should give way to the game, like how Torne DVR yields to PS3 game during recording.

astrograd · Feb 13, 2013

patsu said:
The DME is an inherent part of the system. It will be used for "everything".

I know, I thought we were referring to the other 2 display planes. :?:

Where dynamic resolution is concerned, the QoS is guaranteed by sacrificing something, as the patent put it. I didn't say it's not a component of optimization.

Hmmm...fair enough. I don't see anyone suggesting they aren't 'sacrificing' anything. That seems to be the point of having them! I view these planes are moreso giving devs an extra tool for deciding how to manage resources effectively.

I pointed out there are other software optimization techniques developers may use to avoid sacrifices in a standalone game. The developers will/should focus on those techniques first.

Why utilize a software solution that requires additional processing first and foremost when hardware does it for free? I can see doing both, but why prioritize the software approach ahead of the hardware approach instead of the other way around? Just by the sound of it wouldn't the hardware approach be objectively smarter to leverage first and then use software if need be?

The display planes are helpful in compositing (for free). But within a game, I doubt the saving is great. They can control the update rate, timing, quality, quantity, and region of different elements by software.

If it wasn't useful we wouldn't be having a conversation about which is the better way to implement it would we?

I am still unsure about your assertion here though. I agree the planes do all of that, but if (might be a big if, honestly not sure so feel free to correct me with evidence to the contrary!) the planes have no distinction between one another, what would keep devs from rendering foregrounds in the HUD plane and a background in the game plane? This sounds like it could be much more helpful in terms of resource management since backgrounds are inherently blurry in most modern games anyhow. In this sense I am suggesting they may be pretty big deals due to the options they afford devs as opposed to only viewing their benefits from the perspective of a hardware vs. software dynamic res comparison.

It seems that the display planes are more useful and their benefits more apparent when you have layers of information from different sources. e.g., Custom software HUD over a dedicated video decoder output, OS mouse cursor over an app window, miniaturized game output in OS screen, AR over Kinect video or Blu-ray. It may mean Durango can compose all the different output together while ensuring the responsiveness of the OS. These are part of the "new" experiences I expect in the Durango OS.

I'm not disagreeing with this. I'm suggesting that there are gaming benefits for visuals that may be leveraged in addition to the obvious OS applications.

As others have pointed out, without the display planes, it is not uncommon to have color conversion, scaling, compositing done as part of the display engine too (Many have simple hardware overlays to do this). The Durango display planes seem more elaborate in that you can divide them into 4 quadrants to mask obscured output.

But this right here sounds like it may be leveraged to do foreground/background sort of stuff instead of simply HUD overlays. That to me sounds like a HUGE difference in terms of how devs can leverage it in modern games.

As such, they may be there because Durango's OS relies on them for a new and consistent experience.

Certainly possible (practically guaranteed imho). That doesn't somehow limit its application to gaming though.

Averagejoe · Feb 13, 2013

Silent_Buddha said:
PS2 also had a small army of processors and did quite well compared to relatively more straightforward Dreamcast. That one turned out differently.

Regards,
SB

Well the PS2 was more powerful in all areas than the DC,from GPU to CPU to ram.

The dreamcast was just friendlier to develop for.

astrograd · Feb 13, 2013

So just to ask, does anyone see anything either in the patent or the VGLeaks article that suggests devs couldn't use one plane to display a foreground in a game at one res and the background of the game world in another, with DoF or whatnot applied to it? It seems to me that if the 2 application/game planes are the same in their operation you could do something like that, leaving the lower LOD background at dynamic res/HDR, etc and potentially save tons on the fillrate. Or no?

patsu · Feb 13, 2013

astrograd said:
Why utilize a software solution that requires additional processing first and foremost when hardware does it for free? I can see doing both, but why prioritize the software approach ahead of the hardware approach instead of the other way around? Just by the sound of it wouldn't the hardware approach be objectively smarter to leverage first and then use software if need be?

If it wasn't useful we wouldn't be having a conversation about which is the better way to implement it would we?

Because software is flexible and specific. You can usually find pockets of stuff to optimize based on additional application knowledge. Modern GPU is done by programmable shaders and compute units right ?

I'm not prioritizing it based on h/w vs s/w. The dynamic resolution technique (even if done by software) starts with compromise first. I'm saying developers will look at the whole "picture" first and see if they can do it without compromises.

As for hardware leverage, they will take advantage of h/w acceleration as long as they see benefits; and the h/w setup doesn't interfere with where they want to go. 360 has "free" AA but I think not everyone used it (unless the TRC mandates its use).

astrograd · Feb 13, 2013

patsu said:
I'm not prioritizing it based on h/w vs s/w.

You said devs should start with a software approach first. That kinda statement makes it sound like you were prioritizing sw over hw.

I'm saying developers will look at the whole "picture" first and see if they can do it without compromises.

Right...and this gives those devs one more option to tweak/manage while trying to decide how to maximize the meaningful detail displayed on screen. This is in addition to any software approach they may want to take.

patsu · Feb 13, 2013

astrograd said:
You said devs should start with a software approach first. That kinda statement makes it sound like you were prioritizing sw over hw.

I said:

I pointed out there are other software optimization techniques developers may use to avoid sacrifices in a standalone game. The developers will/should focus on those techniques first.

...because you started talking about dynamic resolution using the display planes first.

Right...and this gives those devs one more option to tweak/manage while trying to decide how to maximize the meaningful detail displayed on screen. This is in addition to any software approach they may want to take.

Sure, but I doubt they will start by talking about dynamic resolution first. If they can find a way to avoid downgrading the resolution, they may prefer that approach. Whether it uses the display planes would be a natural decision after that.

e.g. They may look at their existing engine architecture first. Some devs also have multiplatform considerations. They may also consider display plane uses in other ways.

Shifty Geezer · Feb 13, 2013

astrograd said:
So just to ask, does anyone see anything either in the patent or the VGLeaks article that suggests devs couldn't use one plane to display a foreground in a game at one res and the background of the game world in another, with DoF or whatnot applied to it? It seems to me that if the 2 application/game planes are the same in their operation you could do something like that, leaving the lower LOD background at dynamic res/HDR, etc and potentially save tons on the fillrate. Or no?

Yes, but ti's not as exciting as you think. Firstly, DOF applied to a plance is just a background blur. Planes won't help in creating high quality DOF effects. Secondly, the game engine still needs to render out two seperate passes into two separate buffers and combine them. The rendering output resolution and refresh is developer controlled. Any game can choose to separate out the background, render it a lower resolution (and not just background, but particles and other 'layers' which happens frequently), and then composite. The difference with Durango is the compositing is happening in a hardware video out device. I still expect games to use software multiresolution buffers and composite just as they do now. Particles and reflections will be rendered in a separate pass to a lower resolution, blurred, upscaled, and composited with the many geometry and lighting. If Durango isn't a hardware deferred renderer, it'll have no advantage in any of that. And with deferred rendering, you'll have lots of buffers where the ability to alpha blend two in hardware isn't going to help.

The planes system is pretty clear to have low res game + high res UI + OS. That's why it's fixed at 3 planes instead of being generic upscale and blend hardware. It's probably a tiny piece of silicon, using 1% of the silicon budget to provide 3% of the rendering power, making it an efficiency optimisation more than a performance optimisation. The on-screen results are likely just going to be very clean UIs and smooth OS integration.

Love_In_Rio · Feb 13, 2013

What are the average latency differences between:
-6T - SRAM
-1T - SRAM
-DRAM

This article about SiSoft Sandra´s caches latency measurement is very very good to understand how a low latency main pool of memory could improve efficiency in a GPU in a radical way:
http://www.sisoftware.net/?d=qa&f=gpu_mem_latency

The latency of the main memory directly influences the efficiency of the GPU, thus its performance: reducing wait time can be more important than increasing execution speed. Unfortunately, memory has huge latency (today, by a factor of 100 or more): A GPU waiting for 100 clocks for data would run at 1/100 efficiency, i.e. 1% of theoretical performance!

french toast · Feb 13, 2013

Love_In_Rio said:
What are the average latency differences between:
-6T - SRAM
-1T - SRAM
-DRAM

This article about SiSoft Sandra´s caches latency measurement is very very good to understand how a low latency main pool of memory could improve efficiency in a GPU in a radical way:
http://www.sisoftware.net/?d=qa&f=gpu_mem_latency

The latency of the main memory directly influences the efficiency of the GPU, thus its performance: reducing wait time can be more important than increasing execution speed. Unfortunately, memory has huge latency (today, by a factor of 100 or more): A GPU waiting for 100 clocks for data would run at 1/100 efficiency, i.e. 1% of theoretical performance!

Damn that latency!! If durango has a very low latency sram..then thatooks to be a VERY smart thing to do...interesting how that article says improved latency would increase gpu performance better than faster execution resources. ....mmm.

Love_In_Rio · Feb 13, 2013

french toast said:
Damn that latency!! If durango has a very low latency sram..then thatooks to be a VERY smart thing to do...interesting how that article says improved latency would increase gpu performance better than faster execution resources. ....mmm.

If latency is 20 cycles or less we could see the 800 mhz CUs behave like a 3x800 = 2400 mhz CUs, and so, getting the rumored 680GTX performance.

What if it is called Kryptos because the ESRAM is the system kryptonite?.

french toast · Feb 13, 2013

Love_In_Rio said:
If latency is 20 cycles or less we could see the 800 mhz CUs behave like a 3x800 = 2400 mhz CUs, and so, getting the rumored 680GTX performance.

What if it is called Kryptos because the ESRAM is the system kryptonite?.

Ha yes! Now were talking...does seem hard to believe we will be seeing anywhere near that kind of performance though...

Edit perhaps we could see some increased performace with the shaders...3x seems too fantastical to me.

kots · Feb 13, 2013

So that's why Microsoft put an outdated GPU inside the 720 , because of the ESRAM ... maybe it will more than make up for any limitations ... that's how you get the"680gtx performance" .
That's amazing .

Love_In_Rio · Feb 13, 2013

kots said:
So that's why Microsoft put an outdated GPU inside the 720 , because of the ESRAM ... maybe it will more than make up for any limitations ... that's how you get the"680gtx performance" .
That's amazing .

Well, the ESRAM if really 6T-ESRAM will be very big but the TDP that will provide will be very little. And yields, well, will be very low. But if performance / tdp is so good then MS could be in something.

Let´s wait for the Vgleaks memory config article.

Gubbi · Feb 13, 2013

Love_In_Rio said:
Well, the ESRAM if really 6T-ESRAM will be very big but the TDP that will provide will be very little. And yields, well, will be very low. But if performance / tdp is so good then MS could be in something.

The size of the embedded RAM array won't affect yield at all. It is trivial to build rendundancy into the array to counter any yield issues.

It is unlikely to be SRAM because of cost.

The low latency won't help rendering much, but it might very well boost GP compute.

The extra bandwidth will help both, the alternative is to cut capacity and spend more money doing so.

Cheers

Love_In_Rio · Feb 13, 2013

Gubbi said:
The size of the embedded RAM array won't affect yield at all. It is trivial to build rendundancy into the array to counter any yield issues.

The low latency won't help rendering much, but it might very well boost GP compute.

The extra bandwidth will help both, the alternative is to cut capacity and spend more money doing so.

Cheers

When you say rendering you include also avoiding ALUs stalls?. Because in the article i posted you can see how when the more random accesses to main memory the more cycles ALUs are waiting for data. So, the boost would not restrict only to GPGPU ops, but also to GPU ordinary shader ops.

Gubbi · Feb 13, 2013

Love_In_Rio said:
When you say rendering you include also avoiding ALUs stalls?. Because in the article i posted you can see how when the more random accesses to main memory the more cycles ALUs are waiting for data. So, the boost would not restrict only to GPGPU ops, but also to GPU ordinary shader ops.

The article you linked to specifically mentioned SiSoft Sandra's cryptography benchmark, a GPGPU benchmark.

Cheers

Love_In_Rio · Feb 13, 2013

Gubbi said:
The article you linked to specifically mentioned SiSoft Sandra's cryptography benchmark, a GPGPU benchmark.

Cheers

In that case the example of 3x800 mhz would be only for GPGPU ops, and in this way we already would have matched the 7970 performance for GPGPU ops.

Well, i don´t know how much efficiency GCN architecture gets with general shader ops but i supposse that wavefronts management in the CUs won´t be able to hide all the latency misses in the 100% of the cases when we are talking of visual memories with 300-500 latency cycles, so, the question is now...
What are the ALU average usage in the GCN architecture when not doint GPGPU ops with rendering shader intensive code?.

Xbox One (Durango) Technical hardware investigation

Brad Grenz

Philosopher & Poet

inefficient

patsu

astrograd

Averagejoe

astrograd

patsu

astrograd

patsu

Shifty Geezer

uber-Troll!

Love_In_Rio

french toast

Love_In_Rio

french toast

kots

Love_In_Rio

Gubbi

Love_In_Rio

Gubbi

Love_In_Rio

Similar threads