Xbox One (Durango) Technical hardware investigation

In the case of Forza they used deferred lighting, which decreases memory bandwidth requirements at the cost of poly count. Polycount for environments is generally lower in racers, so it works perfectly, as devs have much more control & can optimize that much further since you are on a predetermined path.

Obviously the Xbox One can do 1080p 60fps. If they did a game like Trine2, for example, they could even go well above 1080p or above 60fps.
 
Since we talked about Rebellion Games and the Xbox ESRAM, can somebody explain something to me?

This developer once wrote on his blog about some of the problems of using tiled rendering to allow smaller framebuffers to fit in the 10 MB EDRAM of the Xbox 360 (the same method that will be used to overcome the One's limitations).

As he explains, there are inconveniences to this method. But besides what he mentions, is there no difference between rendering the screen in two parts instead of just one?
I have no know-how in this, but doesn't this division require extra clock cycles or carry other performance penalties, even when overlapped with other parallel graphical tasks?

I tried to search for an answer to my questions, and besides the matters he mentions I found no other issues, so I'm asking you guys what other problems you have encountered with this method.

PS: This may seem like a software question, but in reality it's related to the ESRAM and the way it may perform, so it is hardware and performance related.
 
In the case of Forza they used deferred lighting, which decreases memory bandwidth requirements at the cost of poly count.
You're mistaken. Model detail is independent of rendering model - you can render the same meshes forwards or deferred. Deferred rendering is all about dynamic light sources.
 
Since we talked about Rebellion Games and the Xbox ESRAM, can somebody explain something to me?

This developer once wrote on his blog about some of the problems of using tiled rendering to allow smaller framebuffers to fit in the 10 MB EDRAM of the Xbox 360 (the same method that will be used to overcome the One's limitations).

As he explains, there are inconveniences to this method. But besides what he mentions, is there no difference between rendering the screen in two parts instead of just one?
I have no know-how in this, but doesn't this division require extra clock cycles or carry other performance penalties, even when overlapped with other parallel graphical tasks?

I tried to search for an answer to my questions, and besides the matters he mentions I found no other issues, so I'm asking you guys what other problems you have encountered with this method.

PS: This may seem like a software question, but in reality it's related to the ESRAM and the way it may perform, so it is hardware and performance related.

Tiling is problematic from a performance standpoint, as it increases latency and the number of geometry passes required. The consensus is to avoid tiling if possible. Bungie had a write-up about this issue, but I can't remember where I saw it.
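
To put a rough number on the extra geometry passes, here's a back-of-the-envelope sketch. The 32bpp colour and 32-bit depth/stencil formats are assumptions for illustration, not a claim about what any particular title used:

Code:
# Rough sketch of why tiling multiplies geometry submissions on 360-style eDRAM.
# Assumes 32bpp colour + 32-bit depth/stencil, both multisampled (an assumption,
# not a statement about any specific game's formats).

EDRAM_BYTES = 10 * 1024 * 1024  # Xbox 360 eDRAM capacity

def tiles_needed(width, height, msaa_samples, bytes_per_sample=8):
    """How many tiles (and thus geometry submissions) a frame needs."""
    frame_bytes = width * height * msaa_samples * bytes_per_sample
    return -(-frame_bytes // EDRAM_BYTES)  # ceiling division

for samples in (1, 2, 4):
    n = tiles_needed(1280, 720, samples)
    print(f"720p, {samples}xMSAA -> {n} tile(s), geometry submitted {n}x")

# 1x -> 1 tile, 2x -> 2 tiles, 4x -> 3 tiles: every extra tile is another
# geometry pass, which is exactly the latency/vertex cost being discussed.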
 
You're mistaken. Model detail is independent of rendering model - you can render the same meshes forwards or deferred. Deferred rendering is all about dynamic light sources.

Deferred lighting renders in three stages, doing the geometry pass twice instead of once, unlike typical deferred shading, which only does it once. I recall the Dead Rising 3 devs talking about how they got assistance from MS in their development and how they were envious of Forza's use of deferred lighting. Let me see if I can dig up the article.
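
For anyone unfamiliar with the distinction, here is a minimal sketch of the two pass structures being compared. Every function in it is a placeholder stub standing in for GPU work, not any engine's actual API:

Code:
# Placeholder stubs, purely to show the shape of each technique.

def render_geometry(scene, light_buffer=None):
    # Stands in for submitting every mesh in the scene to the GPU once.
    return {"scene": scene, "lit_by": light_buffer}

def accumulate_lights(thin_gbuffer, lights):
    # Stands in for drawing light volumes against normals + depth.
    return {"gbuffer": thin_gbuffer, "lights": lights}

def shade(gbuffer, lights):
    # Stands in for a full-screen lighting/material resolve.
    return {"gbuffer": gbuffer, "lights": lights}

def deferred_shading(scene, lights):
    gbuffer = render_geometry(scene)       # geometry submitted once, into a fat G-buffer
    return shade(gbuffer, lights)

def deferred_lighting(scene, lights):      # a.k.a. light pre-pass
    thin_gbuffer = render_geometry(scene)  # geometry pass #1: normals + depth only
    light_buffer = accumulate_lights(thin_gbuffer, lights)
    return render_geometry(scene, light_buffer)  # geometry pass #2: re-submit meshes and shade

frame = deferred_lighting("example_scene", ["sun", "headlights"])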
 
Forza is a forward renderer unless you've mistaken it for Project Cars, which is known to be a variant of light pre-pass.

Deferred lighting does the geometry pass twice instead of once, unlike typical deferred shading, which only does it once.

Sure, it's possible to fit a mini G-buffer (3MRT+depth) @ 1080p 32bpp, but you'll want to take advantage of the larger read/write bandwidth of the scratchpad for post-processing and shadowmaps, which also take up space.

A basic 2k x 2k shadowmap is 16MB. VSM/ESM/EVSM, SDSM, etc. use up even more memory.

At the end of the day, it's going to require more management than just rendering to a single large memory pool, especially for engines using 4+ render targets for the G-buffer alone (Frostbite 3, UE4, Panta Rhei).
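
A quick back-of-the-envelope check of those numbers, assuming 32 bits per pixel per render target and the One's 32 MiB of ESRAM:

Code:
# Illustrative sizing only; real engines mix target formats and resolutions.
MiB = 1024 * 1024
ESRAM = 32 * MiB

def target_bytes(width, height, bytes_per_pixel=4):
    return width * height * bytes_per_pixel

mini_gbuffer = 4 * target_bytes(1920, 1080)   # 3 MRTs + depth @ 1080p, 32bpp each
shadowmap = target_bytes(2048, 2048)          # basic 2k x 2k, 32-bit
fat_gbuffer = 5 * target_bytes(1920, 1080)    # 4 MRTs + depth, a fatter G-buffer layout

print(f"mini G-buffer: {mini_gbuffer / MiB:.1f} MiB of {ESRAM / MiB:.0f}")  # ~31.6 MiB: just fits
print(f"2k shadowmap:  {shadowmap / MiB:.1f} MiB")                          # 16 MiB: won't also fit
print(f"fat G-buffer:  {fat_gbuffer / MiB:.1f} MiB")                        # ~39.6 MiB: over budget

So the mini G-buffer alone nearly fills the scratchpad, which is the management headache being described.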
 
Turn 10 did solid work on F5, but racing games do represent the best-case scenario for high frame rates, as there are relatively few polys to push at 1080/60. F5 uses sprite spectators and some pretty low-res/low-poly structures off track, as none of these things matter that much when racing.

Don't forget zero anti-aliasing, which is a first for a Forza game (which is perfectly fine for me; I prefer Forza 5's sharp zero-AA image to GT6's horrible quincunx):

[Image: Forza5_Cropped_No_AA_small.png]


Of course, many future XB1 games will use memory tiling, but tiling is not free.
 
... devs have much more control & can optimize that much further since you are on a predetermined path.

... and you are driving on a 2D plane and you are only looking forward for the most part. Knowing what needs to be loaded next could be quite a help in deciding what to put in the ESRAM and when. Not to say that other games won't do it (fighting games come to mind); it's just that it is going to be a bigger bang for the buck in driving games with fixed courses. Forza Horizon (if it is coming) will be a bit more of a challenge, but they will also have a lot more experience.
 
From a technical PoV it is wasteful (this also applies, to a lesser extent, to the PS4). I'm far from convinced that a system packing 4 Jaguar cores @ 2GHz and a high-clocked 8 CU / 16 ROP GPU, linked to really fast GDDR5 through a 128-bit bus (so 4GB only), would take a back seat as far as gaming performance is concerned. Maybe not as power efficient (though both the PS4 and XB1 suck here; the power consumption while in menus is dreadful...), but it is a far stretch from melting the mobo.

This CPU-heavy design reminds me of the PS3 and the use of the Cell to offset the issues with the GPU. If we are merely talking about hitting certain thermal thresholds (which are the gate here, I think we can all agree) and what can be drawn on the screen at any one point in time, I can't argue with you. From the point of view of the gaming ecosystem, however, I would say that lots of memory and a GPGPU focus do have certain advantages going forward.

Lots of memory might not give you a huge bang for the buck in terms of what can be displayed on screen per se, but it does give the dev more flexibility to allocate resources. Lots of CPU is great for AI, for instance, but if you are running out of memory it could be a waste. Deforming worlds can gain from a heavy-duty CPU as well, but again I think memory constraints could be more problematic than a lack of CPU cycles. Creating a floor now in terms of what memory will be available for the next generation or two is a pretty good bet, IMHO. Moving from 4GB this gen to, say, 8GB next gen could limit the types of games and gameplay available. Starting with 8GB and having almost a decade to play with those resources would seem to be the better play.

GPGPU is a bit more of a stretch, but how many games are GPU-constrained vs CPU-constrained? IF GPGPU becomes a thing (since both consoles can benefit from it, as well as PCs), then leveling up the CU count will be seen as forward-thinking. If not, well, at least the pictures look pretty :)

Multi-core CPUs will have long legs as well, of course, BUT if HSA-ness gains some momentum it may allow for a bit more competition and diversity in the computing space versus pushing CPU cores and process shrinks. Might keep AMD in the fight and give ARM a bridgehead, at least for a while ;-)

Still, HSA or whatever comes of this leveling up on GPGPU IS a bigger bet, and more fraught, than pushing lots of memory.
 
... and you are driving on a 2D plane and you are only looking forward for the most part. Knowing what needs to be loaded next could be quite a help in deciding what to put in the ESRAM and when. Not to say that other games won't do it (fighting games come to mind); it's just that it is going to be a bigger bang for the buck in driving games with fixed courses. Forza Horizon (if it is coming) will be a bit more of a challenge, but they will also have a lot more experience.

You can always optimize loading for the visibility set; heck, I wonder if there's any AAA game that does NOT do it, racing game or not.
 
Forza 5 is 2xAA. It's just extremely broken (early in the rendering).
The same seemed to happen to a game Brad_Grenz mentioned, Powerstar Golf, which allegedly runs at 1080p but had very visible jaggies. Maybe it's that I got used to it, or the new patch added a different AA, but it looks better to me now, less jaggy.

I wonder if the new SDK increases the drivers' performance or if they managed to free up 8% of the GPU resources, but Boltaco doesn't talk about that.
 
You can always optimize loading for the visibility set; heck, I wonder if there's any AAA game that does NOT do it, racing game or not.

Sure, but there is a significant difference between an FPS with 360 degrees of freedom, walking/running around with all kinds of things that need to be animated based on gameplay, versus a very small world that doesn't drift very far from a known path, with a "cone of rendering" that is almost always pointing in the direction of travel along said known path. Very few things change based on gameplay. I'm not saying rendering other cars well isn't a challenge, but the changes that have to be made are fairly easily predicted for the most part.
 
The same seemed to happen to a game Brad_Grenz mentioned, Powerstar Golf, which allegedly runs at 1080p but had very visible jaggies. Maybe it's that I got used to it, or the new patch added a different AA, but it looks better to me now, less jaggy.

hm... the only shots I've seen of PG looked rather upscaled, IIRC.

(Anyhoo, a bit off topic)
 
Sure, but there is a significant difference between an FPS with 360 degrees of freedom, walking/running around with all kinds of things that need to be animated based on gameplay, versus a very small world that doesn't drift very far from a known path, with a "cone of rendering" that is almost always pointing in the direction of travel along said known path. Very few things change based on gameplay. I'm not saying rendering other cars well isn't a challenge, but the changes that have to be made are fairly easily predicted for the most part.

A mesh is a mesh, a rig is a rig, an animation is an animation; saying that the ESRAM is easier to "fill" with a racing game versus any other genre is really stretching it at best.
 
A mesh is a mesh, a rig is a rig, an animation is an animation; saying that the ESRAM is easier to "fill" with a racing game versus any other genre is really stretching it at best.

Knowing what is coming up, and in what order, is not helpful?? Sure, a polygon is a polygon, but being able to predict, in some cases with near certainty, what data access patterns you will need must be of some value. Games aren't made with one big array of data that is accessed randomly, so knowing what is going to come next can't help but give you better performance. With the ESRAM being fast to access but limited in size, that knowledge would be effective in getting the resources needed in and out in a particular order.
 
eSRAM aside, one thing that came to mind about the FPS differential is a comment made about Microsoft removing virtualization overhead for the GPU, and how they worked hard to cut down the number of interrupts, since those did have a higher cost with the virtualization setup.

No hard numbers are given, so this is again speculative, but an interrupt would be something that would be more dependent on the front end, OS, and uncore than on anything that is actually scalable on a GPU.
Potentially this is a smallish cost, but given the long latency numbers for other forms of control events on GPUs, and the tendency for such events to not be readily overlapped with graphics work, it might be a relatively small but fixed per-frame time cost whose impact scales linearly with the FPS target. That would become more noticeable as you cut things down to 16ms per frame.
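
Purely to illustrate that scaling argument (the 1.0 ms figure below is an assumption for the example, not a measured number):

Code:
# A fixed per-frame control/interrupt cost eats a share of the frame budget
# that doubles when going from 30 to 60 fps.
fixed_cost_ms = 1.0  # assumed, purely illustrative

for fps in (30, 60):
    budget_ms = 1000.0 / fps
    print(f"{fps} fps: {fixed_cost_ms / budget_ms:.1%} of the {budget_ms:.1f} ms frame budget")
# 30 fps -> 3.0% of 33.3 ms; 60 fps -> 6.0% of 16.7 ms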
 
eSRAM aside, one thing that came to mind about the FPS differential is a comment made about Microsoft removing virtualization overhead for the GPU, and how they worked hard to cut down the number of interrupts, since those did have a higher cost with the virtualization setup.

...the GPU doesn't send SO many interrupts to the CPU, usually. Of course, you'd have more interrupts to manage UMA stuff and the like, but the GPU should fill/process its command queues without sending interrupts unless something is wrong or such, I guess.

So, which GPU-flooding interrupts are we talking about? :???:

Also, most of today's GPUs should support virtualization helpers for VMware/whatever cloud jobs...

What could be a killer, IMHO, is if their video memory is allocated and reallocated very quickly and for many objects.
-----------
Rethinking it: the way they reflect interrupts back to the two VMs can be quite expensive, but I still do not get which GPU interrupt is so expensive to reflect back, unless it is somehow related to memory management.
 
...the GPU doesn't send SO many interrupts to the CPU, usually.
The Xbox One designers made a point of indicating that they did a significant amount of work to reduce the number of interrupts per frame to two.
Negligible costs per interrupt don't strike me as being the reason why they'd do so much.

Of course, you'd have more interrupts to manage UMA stuff and the like, but the GPU should fill/process its command queues without sending interrupts unless something is wrong or such, I guess.
A number of things that initially seemed trivial for GPUs seem to be more onerous in reality.

So, which GPU-flooding interrupts are we talking about? :???:
Potentially some kind of changeover between the separate virtualized game and OS graphics contexts.
I would grant AMD's CPUs enough capability to handle two VM trips every 16 or 30 ms, but I don't count on the GPU being on the same level. We actually do have GPU latencies for various front-end functions and queueing delays under load (on the PS4, but still), and I was surprised by how bad they could be (30 or so ms). Even an order-of-magnitude improvement over some of those delays would still be an uncomfortably large chunk of 16ms.

Also, most of today's GPUs should support virtualization helpers for VMware/whatever cloud jobs...
I've not seen it described as being as optimized or automatic as CPU virtualization support, nor that 60 FPS is a high priority for a cloud service.
 