Rendering Tech of Ryse

I'd avoid any kind of renderer comparisons until we see some second and third efforts.

I'd like to see the talk that went with the slides. I'm curious to know if they talked about areas where they felt they had the most room for improvement either in terms of features or performance.
 
I talked to quite a few graphics devs at GDC, and the hybrid approach is becoming pretty popular. It makes sense, since you can just dump out your tile indices from your lighting compute shader.

mm... no uber shader to deal with either. :p
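Something like this is the gist of what I mean about dumping tile indices — a rough plain-C++ mock-up of what the light-culling compute shader does (tile size, types and names are all made up):

```cpp
#include <cstdint>
#include <vector>

// Rough mock-up of a tiled light-culling pass: for each screen tile, test
// every light against the tile's bounds and emit a list of light indices.
// Both the deferred lighting pass and the forward pass for transparencies
// can then walk the same per-tile lists.
struct Light { float x, y, radius; };   // light position + radius in pixels

std::vector<std::vector<uint32_t>>
cullLightsToTiles(const std::vector<Light>& lights,
                  int width, int height, int tileSize = 16)
{
    const int tilesX = (width  + tileSize - 1) / tileSize;
    const int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<std::vector<uint32_t>> tiles(size_t(tilesX) * tilesY);

    for (int ty = 0; ty < tilesY; ++ty)
    for (int tx = 0; tx < tilesX; ++tx) {
        const float x0 = float(tx * tileSize), x1 = x0 + float(tileSize);
        const float y0 = float(ty * tileSize), y1 = y0 + float(tileSize);
        auto& list = tiles[size_t(ty) * tilesX + tx];
        for (uint32_t i = 0; i < lights.size(); ++i) {
            // Closest point on the tile rectangle to the light centre,
            // then a cheap circle-vs-rectangle overlap test.
            const float cx = lights[i].x < x0 ? x0 : (lights[i].x > x1 ? x1 : lights[i].x);
            const float cy = lights[i].y < y0 ? y0 : (lights[i].y > y1 ? y1 : lights[i].y);
            const float dx = lights[i].x - cx, dy = lights[i].y - cy;
            if (dx * dx + dy * dy <= lights[i].radius * lights[i].radius)
                list.push_back(i);
        }
    }
    return tiles;
}
```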

--

Wonder if it'd be worth it to pack their G-buffer into 2 FP16 targets even if it takes more space? Dunno if it'd work for their attributes in particular.
 
Wonder if it'd be worth it to pack their G-buffer into 2 FP16 targets even if it takes more space? Dunno if it'd work for their attributes in particular.

Wouldn't that just increase bandwidth requirements, because you would need to access a bigger framebuffer more frequently than in a 3-buffer setup?
 
Wouldn't that just increase bandwidth requirements, because you would need to access a bigger framebuffer more frequently than in a 3-buffer setup?
Was thinking more along the lines of 2xFP16 being better (à la Sebbbi) than 4xRGBA, but yeah, it's a larger G-buffer by nature in this particular case.

That's why I'm wondering "even if it takes more space". FP16 is a full-speed write for GCN (3 RTs eat up more fill than 2), but... never mind. Bandwidth was probably more important.

On the other hand, could they spill over into DDR3 and maybe take advantage of having combined read bandwidth? *shrug*
 
I'm not sure I can agree that 900p is the highest achievable res on the Xbox One, though.

A higher resolution would require trading off pixel quality in turn; Crytek says they believe 900p to be a sweet spot. Obviously 720p would then allow even better looks, but even with their AA and upscale it might be too extreme. And yeah, there might be some performance left to squeeze out, and maybe 4-5 years from now it'd be possible to do Ryse at 1080p - I'm not nearly enough of an expert on the issue to decide.

All I know is that I'd personally be content with this 900p IQ at a good stable 30fps. We'll see how it turns out :)
 
I'm curious to know if they talked about areas where they felt they had the most room for improvement either in terms of features or performance.

What I don't really see is any mention of low-level HW optimizations, to the level that DICE went to with BF3 on the PS3. But then again, at that time the system was well known and they had enough time to focus on that, with some pretty talented and experienced programmers.
So my conclusion is that this is most likely the aspect to focus on for upcoming titles.
 
I was thinking more about maximizing the utilization of all the various buses and memories ;)
 
I was thinking more about maximizing the utilization of all the various buses and memories ;)

The game was originally developed for the Xbox 360, and then the target console changed to the Xbox One. The SDK was far from being ready and the dev-kits were far from final. For most of the development cycle it was just not possible to optimize the game for the system's special architecture.
If the release date had been March 2014, it might be looking even better (maybe the res would be higher, too) and running smoother. But it was a launch title, and launch titles are poorly optimized. If a game is not running smoothly by a predefined date, it gets downgraded; that is the simplest way to get it running by the launch date.
 
I think there's very little of the X360 version left in Ryse; the engine is new and most of the content was rebuilt (or up-rezzed). The more important aspect was the short production time IMHO, starting almost from scratch.

And of course this is the reason why it isn't fully optimized and it's of course perfectly understandable. All I'm saying is that the presentation seems to back this up by not getting into the optimization stuff. I'm in no way saying that Crytek did a bad job, if anything I'm interested to see what they can do with a 2nd gen title.

Nevertheless, I'm not sure if they'll spend any extra performance on more pixels instead of even better pixels. Nicer images are easier to sell to the general public than actual 1080p.
 
Wonder if it'd be worth it to pack their G-buffer into 2 FP16 targets even if it takes more space?

If you want to use the filtering hardware and mip-maps, then the G-buffers need to maintain independence of the per-sample information. Data-packing is a no-no in that case.
 
If you want to use the filtering hardware and mip-maps, then the G-buffers need to maintain independence of the per-sample information. Data-packing is a no-no in that case.
mm... thought there'd be some funny business with the filtering...
 
I like their idea of preloading the static shadows into a big static shadow map. I hope more games without dynamic weather or a day/night cycle will do the same! Great idea to get great shadow quality!! :D

Finally, an enormous static shadow map is generated only once when each level loads or when transitioning to a different area, taking advantage of the increased memory of the Xbox One. It includes all the static objects in the level and avoids re-rendering distant objects with every frame. The shadow map is 8192×8192 at 16 bits, weighs 128 MB and covers a 1-square-kilometer area of the game's world, providing sufficient resolution. This saves between 40 and 60% of draw calls in shadow map passes.
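(The numbers check out, too: 8192 × 8192 texels × 2 bytes = 134,217,728 bytes = exactly 128 MB, and 8192 texels spread across 1 km works out to roughly one texel every 12 cm.)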
 
I'd also like to say that I'm very impressed with the journey of Crytek since Crysis 1.

Back then they had some very talented graphics programmers coming up with a LOT of inventive solutions for implementing many visual features; but they were quite lacking in the performance aspects, and even high-end PCs have struggled to realize their vision.

And now on the X1, they seem to have conquered that as well, having both one of the most fully featured engines and some of the best image quality, while running at reasonably good performance and at pretty much the highest resolution* on the system. We're now at a point where I'm more interested to see games using this engine instead of UE4 (and by the way, where are Epic's nextgen titles??).

It's a bit of a disappointment that they have yet to make a game that can score 90+ altogether, but maybe that'll require less involvement from Yerli. He seems to be doing a good job of managing the studios and the tech development and all, but game design doesn't seem to be his strong suit. The Crysis games never really clicked for me either, and I'm not a bit interested in playing Ryse - but the tech is there to make something truly outstanding. Maybe they should start to pursue all the design talent leaving the Sony studios recently?...

* Yeah, Forza is 1080p, but it does not have to bother with detailed characters and what they require, and the IQ is not the best either; it seems more like a need to hit a checklist feature instead of using the best solution.

Oh, such a shame. I think Ryse is the best game they've ever made, and it really is better at higher difficulties. You can't, for example, button-mash your way to victory at the higher difficulty settings such as Centurion. Maybe you can do this to an extent on lesser enemies earlier in the game, but you get mobbed a lot more in later levels, forcing you to flee and regroup quite often until the squad of enemies has settled enough for you to make something happen. In quite a few encounters, you're lucky to get in a single good hit or two against an enemy before you're forced to dodge entirely, or pray you can nail a perfect counter on the more difficult-to-defend heavy attacks. There's one problem with blocking the heavy, however. In many situations, that just sends you sliding backwards into yet another enemy that is already on the attack and barely giving you a chance to recover.

It can't be said enough that the reviews got it wrong on this one. We aren't exactly talking game of the year, but this game and its overall production quality, especially the way the story and characters are handled, will surprise many who actually see all of what it has to offer. And the gameplay is legitimately good. You're really forced to make use of your ability to alternate between the different execution rewards to give yourself that extra edge. All have their benefits, all have their downsides. Marius, the main character, is also one of the better game protagonists I've seen, and Commander Vitallion is as strong a lead support character as you could possibly have expected in this game. The person who voiced him did an incredible job, and Crytek did an even better job bringing the character to life from an overall visual standpoint.
 
I like their idea of preloading the static shadows into a big static shadow map. I hope more games without dynamic weather or a day/night cycle will do the same! Great idea to get great shadow quality!! :D

You can still use that in games with day/night cycles. You can render that shadow map slowly over 60-120 frames and apply the update to the new lighting conditions when it's ready (rough sketch below).
Moving distant geometry is more problematic.
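Something along these lines — purely a hypothetical sketch (nothing from the presentation): double-buffer the big shadow map and refresh one horizontal slice of the back copy per frame, then swap once all slices are done.

```cpp
#include <cstdint>

// Hypothetical sketch: refresh one horizontal slice of the *back* shadow map
// per frame; after all slices are rendered, swap, so lighting always samples
// a map consistent with one sun direction. Assumes height % kSlices == 0.
struct AmortizedShadowMap {
    static const int kSlices = 64;  // spread the refresh over 64 frames
    int frontMap = 0;               // texture index lighting samples from
    int backMap  = 1;               // texture index being re-rendered
    int nextSlice = 0;

    // renderSlice(map, firstRow, lastRow) stands in for whatever draws the
    // static geometry into those rows of the given shadow map texture.
    template <typename RenderSliceFn>
    void update(int mapHeight, RenderSliceFn renderSlice)
    {
        const int rows = mapHeight / kSlices;
        renderSlice(backMap, nextSlice * rows, (nextSlice + 1) * rows);

        if (++nextSlice == kSlices) {   // back map fully refreshed
            nextSlice = 0;
            const int tmp = frontMap;
            frontMap = backMap;         // new lighting conditions go live
            backMap  = tmp;
        }
    }
};
```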

----
And of course this is the reason why it isn't fully optimized and it's of course perfectly understandable. All I'm saying is that the presentation seems to back this up by not getting into the optimization stuff. I'm in no way saying that Crytek did a bad job, if anything I'm interested to see what they can do with a 2nd gen title.

Nevertheless, I'm not sure if they'll spend any extra performance on more pixels instead of even better pixels. Nicer images are easier to sell to the general public than actual 1080p.

I think they need to drop the highest-quality post-processing settings on current-gen consoles; they just do not have spare power for it. Ryse is the only game on current gen that uses 1/2-res bokeh DoF with a high number of taps and very high quality per-pixel motion blur.
Actually, in the DF framerate analysis, most framerate drops were in execution-mode scenes, where the DoF is applied.
 
I talked to quite a few graphics devs at GDC, and the hybrid approach is becoming pretty popular. It makes sense, since you can just dump out your tile indices from your lighting compute shader.

I agree. This is a nice approach, assuming you need forward passes for transparencies (or other hard cases).
I thought that Forward Rendering is a better approach for the memory architecture of the Xbox One.

It depends on many factors. If you are using an obscenely fat g-buffer then... either you optimize it or use forward rendering. The thinnest possible g-buffer uses exactly the same amount of memory as a single HDR render target. Bit-packing data tightly is almost free on modern GPUs, because of full-rate integer instructions (except for 32-bit IMUL, of course).
If you want to use the filtering hardware and mip-maps, then the G-buffers need to maintain independence of the per-sample information. Data-packing is a no-no in that case.

Why do you want to use filtering when loading g-buffer data? I have never used anything other than point filtering (or a raw load from a compute shader) when accessing g-buffers.

Filtering from the lighting buffer (after lighting is done) is of course important (because blur kernels, downsampling, etc. need it).
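Quick made-up example of why filtering packed data is a no-no: bilinear averaging effectively happens on the packed value, not on the fields inside it.

```cpp
#include <cstdint>
#include <cstdio>

// Made-up illustration: two neighbouring texels, each packing two 8-bit
// attributes into one 16-bit channel as (lo | hi << 8).
int main()
{
    uint16_t a = 0   | (1 << 8);  // attributes (0, 1)   -> packed 256
    uint16_t b = 254 | (0 << 8);  // attributes (254, 0) -> packed 254

    // What a bilinear fetch effectively does: average the packed values.
    uint16_t filtered = uint16_t((a + b) / 2);            // = 255

    printf("filtered lo=%u hi=%u\n", filtered & 0xFF, filtered >> 8);
    // Prints lo=255 hi=0, but averaging the lo attributes themselves should
    // give 127 -- the fractional half of the hi field bled into the lo field.
}
```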

Wonder if it'd be worth it to pack their G-buffer into 2 FP16 targets even if it takes more space? Dunno if it'd work for their attributes in particular.

Packing two 8888 buffers to one 4x16-bit integer buffer is always a good idea on GCN. I haven't seen a case yet where that would slow down performance. Packing three 8888 buffers to two 4x16-bit buffers however is stupid... but why would anyone want to do that? Just pack the first two 8888 buffers to a single 4x16-bit, and leave the third as is. Modern GPUs have no problems in rendering to MRTs of different bit depths.

8888 + 8888 + 8888 = 1/3 fill rate
4x16 + 8888 = 1/2 fill rate

Both solutions use exactly the same amount of memory and bandwidth.
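For reference, the packing itself is just shifts and ORs — plain C++ here, but the shader code looks the same in spirit (all names made up):

```cpp
#include <cstdint>

// Mock-up of the trick: the channels of two RGBA8 g-buffer layers ride in
// one RGBA16 target, low byte = layer 0, high byte = layer 1. On GCN these
// shifts/ORs are full-rate integer ops, so the packing is ~free.
struct RGBA8  { uint8_t  r, g, b, a; };
struct RGBA16 { uint16_t r, g, b, a; };

inline RGBA16 packTwoLayers(RGBA8 l0, RGBA8 l1)
{
    return { uint16_t(l0.r | (l1.r << 8)),
             uint16_t(l0.g | (l1.g << 8)),
             uint16_t(l0.b | (l1.b << 8)),
             uint16_t(l0.a | (l1.a << 8)) };
}

inline void unpackTwoLayers(RGBA16 t, RGBA8& l0, RGBA8& l1)
{
    l0 = { uint8_t(t.r & 0xFF), uint8_t(t.g & 0xFF),
           uint8_t(t.b & 0xFF), uint8_t(t.a & 0xFF) };
    l1 = { uint8_t(t.r >> 8),   uint8_t(t.g >> 8),
           uint8_t(t.b >> 8),   uint8_t(t.a >> 8) };
}
```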
Catering to the GPRs is somewhat low level. :)

... but highly necessary if you want to achieve good occupancy. Many shaders need good occupancy to hide latency. If you don't optimize your GPR usage, your performance takes a nosedive.
 
I like their idea of preloading the static shadows into a big static shadow map. I hope more games without dynamic weather or a day/night cycle will do the same! Great idea to get great shadow quality!! :D

That is a very good idea. It definitely saves a lot compared to brute-force rendering every mesh to the distant shadow cascades every frame. I would expect to see even higher gains (than 40%-60%) in games that have very long view distances.

However, I firmly believe that future shadow map techniques (which are able to automatically detect data-reuse possibilities) will solve this same problem more elegantly, with less memory wasted, fewer constraints and less manual artist/level-design work.
 
Modern GPUs have no problems in rendering to MRTs of different bit depths.

8888 + 8888 + 8888 = 1/3 fill rate
4x16 + 8888 = 1/2 fill rate

Both solutions use exactly the same amount of memory and bandwidth.

Ooooh... Good to know.
 
Ooooh... Good to know.

This is easy to calculate from fill-rate tester numbers. Both 4x16 and 8888 render targets are full rate on GCN; 4x32 is half rate. Packing two 8888 RTs into a single 4x16 allows you to render the equivalent of two 8888 render targets at the fill-rate/ROP cost of one.

If you render simultaneously to two full rate render targets (such as 8888 or 4x16) you will have 1/2 fill rate, with three MRTs you will have 1/3, with four 1/4, etc. Simple math.

Many old GPUs had half rate 4x16 rendering, so packing two 8888s to one 4x16 didn't help at all. But nowadays this is a good trick to double your deferred rendering g-buffer fill rate.

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/05/GCNPerformanceTweets.pdf <--- this is actually tip 6 in this AMD performance tweet list
 
Thanks. :) I wasn't too clear on having mixed MRT bit depths (or whether there were penalties), since MRTs typically needed to be the same bit depth on the PC side.
 