Why does the Frostbite engine perform relatively better on PS4 than XB1? *spawn

Can you be more specific? We are talking about eSRAM, and bandwidth! Xbox One bandwidth is funny, since the large memory pool is slow and the very small one is fast. But not as fast as we may think. On average it's the same as PS4, but with limits on both reads and writes!

So, how come you say that? Besides, the use of GPGPU should allow for less memory bandwidth usage, and PS4 has more GPGPU!

With the exception of the limits placed by the raw power, I don't see any differences between the consoles, and I only see the Xbox's internal memory and bandwidth fragmentation as an additional problem!

Also, as many games have shown, the Xbox usually has additional performance problems with alpha effects!
To clarify some terms:
Bandwidth is a metric that describes the amount of data that can be moved over a set interval of time.
Latency is a measurement of speed.

If the objective is to measure speed, then when accessing a single memory location of, say, 16 bits from DDR3 vs GDDR5, the DDR3 would be faster. It would take less time for the CPU to send out the signal to retrieve the data, and the request would leave, go to the memory location, retrieve it and return it to the CPU faster.

ESRAM sits in the actual APU; its round-trip latency will be significantly lower just looking at the distance the data must travel. It is _significantly_ faster than any off-die solution when it comes to retrieval speed: not as fast as cache, of course, but still faster than off-die RAM.

Bandwidth, however, is about how much data arrives per second. ESRAM is like a sports car: it seats 2, but gets from A to B really fast. It would take 50 round trips to move 100 people (let's assume the car is self-driving). DDR3 is like a double-decker bus. GDDR5 is like a passenger train: the train is much slower than the other two, but it's going to move more than 100 people in a single trip.

Hence your concepts of speed vs. bandwidth are being used interchangeably: you are looking at the amount of data moved over a set interval of time (a period that is not at all time-sensitive, as 1 second is _very_ long in computing terms) and declaring a winner.

This is just speaking technically of course.

As many have noted here before, developers will optimize their games to extract performance out of their hardware, which is why bandwidth is a more important number than latency (at least for graphics)
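
To put rough numbers on the car/bus/train picture above, and on why bandwidth usually wins for graphics workloads, here is a small Python sketch of the usual toy model, time = fixed latency + size / bandwidth. The latency and bandwidth figures are illustrative guesses on my part, not measured console numbers; the point is only that latency dominates tiny fetches while bandwidth dominates big copies:

```python
# Toy model: transfer_time = access_latency + size / bandwidth
# All latency/bandwidth figures below are illustrative assumptions, not real measurements.

def transfer_time_ns(size_bytes, bandwidth_gb_s, latency_ns):
    """Time to fetch size_bytes over a link with a given peak bandwidth and fixed latency."""
    return latency_ns + size_bytes / bandwidth_gb_s  # bytes / (GB/s) conveniently equals ns

small = 64                # a single cache-line-sized fetch
large = 32 * 1024 * 1024  # a 32 MB render-target-sized copy

for name, bw, lat in [("narrow but low-latency (DDR3-like)", 68, 50),
                      ("wide but higher-latency (GDDR5-like)", 176, 150)]:
    print(f"{name}:")
    print(f"  64 B fetch : {transfer_time_ns(small, bw, lat):10.1f} ns  (latency dominates)")
    print(f"  32 MB copy : {transfer_time_ns(large, bw, lat) / 1e6:10.3f} ms  (bandwidth dominates)")
```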
 
To clarify some terms:
Bandwidth is a metric that describes the amount of data that can be moved over a set interval of time.
Latency is a measurement of speed.

We have different perspectives here.

Not disagreeing with you, but like it or not, bandwidth is also a measurement of speed. And the best example is downloads: more bandwidth, faster downloads!
If you move 100 GB/s you will take 1/10 of a second to move 10 GB. But if you move 1000 GB/s it will take only 1/100 of a second to do the same!
It is a metric that describes the amount of data that can be moved over a set interval of time, yes, but it cannot be dissociated from speed in internal data transfers!
If you need a set amount of data, the time for it to be retrieved depends largely (although not only) on bandwidth.
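
Just to spell out that arithmetic (time = amount of data / bandwidth, ignoring latency and any overheads), a trivial sketch:

```python
# Time to move a block of data = size / bandwidth (latency and overheads ignored).
def seconds_to_move(data_gb, bandwidth_gb_per_s):
    return data_gb / bandwidth_gb_per_s

print(seconds_to_move(10, 100))   # 0.1  -> 1/10 of a second at 100 GB/s
print(seconds_to_move(10, 1000))  # 0.01 -> 1/100 of a second at 1000 GB/s
```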

Now latency... That's a pure measurement of speed. No arguments there!

If the objective is to measure speed, then when accessing a single memory location of, say, 16 bits from DDR3 vs GDDR5, the DDR3 would be faster. It would take less time for the CPU to send out the signal to retrieve the data, and the request would leave, go to the memory location, retrieve it and return it to the CPU faster.

You are talking about the CPU accessing a single piece of information in RAM, but we also have burst access from the GPU... Things are a bit different there, and that's why GDDR is used on GPUs!

ESRAM sits in the actual APU; its round-trip latency will be significantly lower just looking at the distance the data must travel. It is _significantly_ faster than any off-die solution when it comes to retrieval speed: not as fast as cache, of course, but still faster than off-die RAM.

No arguments there! But don't forget that not all data is present in the 32 MB of eSRAM, and some has to be fetched from DDR3 into eSRAM (and vice versa) using the DDR3 bandwidth and latency. You cannot accommodate everything in 32 MB! So I don't think it's that straightforward.

Bandwidth, however, is about how much data arrives per second. ESRAM is like a sports car: it seats 2, but gets from A to B really fast. It would take 50 round trips to move 100 people (let's assume the car is self-driving). DDR3 is like a double-decker bus. GDDR5 is like a passenger train: the train is much slower than the other two, but it's going to move more than 100 people in a single trip.

Once again, not saying you are incorrect, because you are not. But regardless of how fast eSRAM is, it can only move 140 to 150 people at most (people = GB/s). About the same as PS4! So, as Mark Cerny once said: different methods, for basically the same results, but with a much more complex solution on Xbox, not so easy to master. Sony even considered a similar solution and rejected it because of the cons.
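
For what it's worth, here is a deliberately crude serial model I put together (my own assumption, not anything from a dev or from MS) of what happens to effective bandwidth when part of the working set spills out of the 32 MB and has to be served from DDR3. It ignores the fact that eSRAM and DDR3 can be accessed concurrently, which would help, so treat it as a pessimistic sketch only:

```python
# Crude serial model (my assumption): bytes served from eSRAM move at ~145 GB/s,
# bytes that spill to DDR3 move at 68 GB/s, and the transfers happen back to back.
ESRAM_BW = 145.0   # GB/s, the ~140-150 "real world" figure quoted above
DDR3_BW  = 68.0    # GB/s, Xbox One main memory
GDDR5_BW = 176.0   # GB/s, PS4 unified GDDR5 peak

def effective_bw(esram_fraction):
    # time per byte = fraction/ESRAM_BW + (1 - fraction)/DDR3_BW, inverted back to GB/s
    return 1.0 / (esram_fraction / ESRAM_BW + (1.0 - esram_fraction) / DDR3_BW)

for f in (1.0, 0.75, 0.5):
    print(f"{f:.0%} of traffic served from eSRAM -> ~{effective_bw(f):.0f} GB/s "
          f"(PS4 GDDR5 peak: {GDDR5_BW:.0f} GB/s)")
```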

As many have noted here before, developers will optimize their games to extract performance out of their hardware, which is why bandwidth is a more important number than latency (at least for graphics)
Well... according to VGleaks, widely accepted in this forum as accurate, PS4 uses hUMA to hide the GDDR5 latency from the CPU, removing the downsides of GDDR5 for the CPU!
 
We have different perspectives here.

Not disagreeing with you, but like it or not, bandwidth is also a measurement of speed.
I assure you that is the wrong way to measure speed. You're measuring bottlenecking, not speed. The smaller the pipe, the less data can get through; that doesn't mean the data is travelling any slower or faster. Say you have a massive internet connection, like optical OC3 network speeds, versus my 2 MB local network: if the server is on my local network, have all the bandwidth you want, my ping will be 0 ms while yours will be in the 80 ms range. I win at Counter-Strike and you lose; that's the difference in speed. I will move data much more slowly than you will across my network, but you are certainly not receiving the data packets faster than I am if the network connection is NOT bottlenecking the load.

And the best example is downloads: more bandwidth, faster downloads!
If you move 100 GB/s you will take 1/10 of a second to move 10 GB. But if you move 1000 GB/s it will take only 1/100 of a second to do the same!
It is a metric that describes the amount of data that can be moved over a set interval of time, yes, but it cannot be dissociated from speed in internal data transfers!
Wrong. Explained above, but you've also averaged it entirely, and it doesn't work like that. If I have a pipe that is 1000 GB wide, I could in theory send 1000 GB worth of bits in one shot and wait one full second for it to arrive. You'd get no data until the second was up; you are getting all 1000 GB, but you are only receiving it once every second. There are a lot of ways to manipulate this. You don't have to receive anything the way you expect to, and that's the way bandwidth works even today, both in hardware systems and on the internet. Don't expect your memory controller to be working on your request alone; it's getting tons of requests from everywhere, and it won't be able to fulfill your requests in the evenly spaced-out way you describe. If it did, we wouldn't be talking about theoretical performance, we'd be talking about perfect performance.

Which is why graphics developers have latency hiding in their shader code: while you wait for information to come from memory, you switch threads and do the same thing, then switch threads again and do the same thing, and when the data arrives you go back and continue the original thread. Be very careful how you compare the two; just because you have huge bandwidth doesn't mean there aren't limitations on things like data cache sizes and instruction caches. I am likely speaking out of my butt, but I imagine there are limits to how many threads can reasonably be held in flight before the pipeline must simply stall and wait for information to come in.
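
A toy illustration of that latency-hiding idea in Python (the cycle counts are made-up round numbers on my part, not real GCN figures): while one thread waits on a memory fetch, the other resident threads run their ALU work, and once there are enough of them in flight the wait is fully covered:

```python
import math

# Made-up illustrative numbers, not real GPU figures.
MEM_LATENCY_CYCLES = 400     # assumed round-trip latency of a memory fetch
ALU_CYCLES_PER_FETCH = 40    # assumed ALU work each thread has between fetches

# To cover one thread's wait, the other resident threads' combined ALU work must
# add up to at least the memory latency.
threads_needed = math.ceil(MEM_LATENCY_CYCLES / ALU_CYCLES_PER_FETCH) + 1
print(f"~{threads_needed} threads in flight are needed to hide a {MEM_LATENCY_CYCLES}-cycle fetch")

# If register or cache limits cap the thread count below that, the pipeline stalls:
for threads in (4, 8, threads_needed, 16):
    utilisation = min(1.0, (threads - 1) * ALU_CYCLES_PER_FETCH / MEM_LATENCY_CYCLES)
    print(f"{threads:2d} resident threads -> ALU utilisation ~{utilisation:.0%}")
```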

The rest of it I won't respond to. These are known things.

I'm not referring to which system is better. The developers will always optimize for what is best. It's just that, technically, there will be operations that benefit from lower latency and operations that greatly benefit from large bandwidth. The hint here is that no one debates that games are optimized for the latter, but that doesn't mean latency is interchangeable with bandwidth. Most of the time it will call for bandwidth, but sometimes it calls for latency.

It's best to separate the two for the sake of not confusing the discussion. If we are talking about operations that require fast response times, we can't bring bandwidth into the equation unless a large amount of data must be moved as well.
 
OK, I re-read both your posts and got your point now! English is not my native language; that's probably why it came across as if I averaged it entirely. Sorry about that!
 
In Frostbite, the Xbox is bound by either system memory or shaders, which is why the PS4 is pulling ahead slightly. I'd assume shaders.
 
Yes, but CPU-GPU data copies are gone, and this compensates a lot by reducing memory operations.
Yes and no. It only saves the write memory operation, that is, copying from system RAM to graphics RAM. It would also save one reverse copy if you were doing GPGPU, but otherwise that's all the savings there are. So you definitely save a couple of cycles there.

I think with heavy GPGPU usage you are looking at very good savings.

High latency will occur on every transaction by the GPU and CPU, so you lose cycles there. I really don't have any idea how it weighs out in the end. I think PC games have their stuff copying over to their graphics RAM all the time, and with DX12 they can do it concurrently, so I think it should be okay.

I think GPGPU is really strong with hUMA; outside of that I think it offers little benefit.

Would be cool to find out, though.
 
In Frostbite, the Xbox is bound by either system memory or shaders, which is why the PS4 is pulling ahead slightly. I'd assume shaders.

Slightly? I think Frostbite games usually show the biggest difference between the two platforms. PS4 has higher resolution and more stable performance in most Frostbite 3 games. For example, in Battlefront the X1 would probably have to drop below 720p to reach the PS4's level of performance at 900p with the same visual settings.
 
From page 15, they show 4 G-buffer MRTs (16 bytes per pixel), but since they'll also need a depth buffer, that would put the total at 20 bytes per pixel.
At any rate, that should fit in ESRAM at 900p, but since it would leave very little space for other things (shadows, lit buffers, etc.) it would likely impact performance negatively, which I'm guessing is the main reason not to go there.
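
A quick check of those numbers against the 32 MB (plain arithmetic at 20 bytes per pixel, assuming on my part that no tiling or compression tricks are used):

```python
# G-buffer footprint at 20 bytes/pixel (4 MRTs at 16 bytes plus a 4-byte depth buffer,
# per the figures above), compared against the 32 MB of ESRAM.
ESRAM_MB = 32
BYTES_PER_PIXEL = 20

for name, w, h in [("720p", 1280, 720), ("900p", 1600, 900), ("1080p", 1920, 1080)]:
    mb = w * h * BYTES_PER_PIXEL / (1024 * 1024)
    print(f"{name}: {mb:5.1f} MB of G-buffer -> {ESRAM_MB - mb:+5.1f} MB of ESRAM left over")
```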
Consider that main memory bandwidth may be the culprit.

In tests of a variety of other AMD APU architectures, memory contention reduced DDR3 memory bandwidth by up to ~25% in scenarios where the GPU and the CPU both heavily accessed the unified memory. If the same holds true for the Xbox One, instead of 68 GB/s of bandwidth that leaves you with ~50 GB/s.
If memory serves correctly, MS's own public documents for Xbox devs emphasize that heavy bandwidth loss can occur from contention, and discuss a variety of means to reduce CPU bandwidth usage.

Now add that, in the Frostbite engine, the MRTs at 900p or greater may go over 32 MB once you need to fit shadow buffers and depth buffers as well; thus it's quite possible that they are placing part of the render targets in main system memory, outside of the eSRAM.

This will strain a memory architecture already burdened with CPU code, texture sampling, AF, etc.
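
Spelling out that contention arithmetic (the ~25% penalty is the figure from the APU tests above, carried over to the Xbox One as an assumption):

```python
# Applying the ~25% contention penalty seen on other AMD APUs (an assumption when
# carried over to Xbox One) to the 68 GB/s DDR3 peak figure.
DDR3_PEAK_GB_S = 68.0
CONTENTION_LOSS = 0.25   # up to ~25% under heavy simultaneous CPU + GPU access

effective = DDR3_PEAK_GB_S * (1.0 - CONTENTION_LOSS)
print(f"Effective DDR3 bandwidth under heavy contention: ~{effective:.0f} GB/s")  # ~51 GB/s
```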
 
Slightly? I think Frostbite games usually show the biggest difference between the two platforms. PS4 has higher resolution and more stable performance in most Frostbite 3 games. For example, in Battlefront the X1 would probably have to drop below 720p to reach the PS4's level of performance at 900p with the same visual settings.
What the...:rolleyes:
 
Consider that main memory bandwidth may be the culprit.

In tests of a variety of other AMD APU architectures, memory contention reduced DDR3 memory bandwidth by up to ~25% in scenarios where the GPU and the CPU both heavily accessed the unified memory. If the same holds true for the Xbox One, instead of 68 GB/s of bandwidth that leaves you with ~50 GB/s.
If memory serves correctly, MS's own public documents for Xbox devs emphasize that heavy bandwidth loss can occur from contention, and discuss a variety of means to reduce CPU bandwidth usage.

Now add that, in the Frostbite engine, the MRTs at 900p or greater may go over 32 MB once you need to fit shadow buffers and depth buffers as well; thus it's quite possible that they are placing part of the render targets in main system memory, outside of the eSRAM.

This will strain a memory architecture already burdened with CPU code, texture sampling, AF, etc.

Memory contention will be an issue for PS4 also, only there every single access is going to be hitting the single pool; Sony also had slides showing how the CPU could decimate the bandwidth the GPU requires.
 
What the...:rolleyes:

According to DF at least, the Xbox One version seems to run a bit worse (nothing too bad, but certainly below 60, in the 50-55 range) than the PS4 version, which is almost a locked 60. Unless you know something I don't, it seems like DICE would have to drop the resolution further to get the performance up to par while keeping the same visual settings as the PS4 version.
 
Memory contention will be an issue for PS4 also...
The theory is that even when contention is affecting the PS4's BW, there's plenty to spare. If the XB1's DDR3 is being affected to the same degree, there's far less left to work with. That is, if 10 GB/s of CPU access on PS4 causes a 20 GB/s reduction in the GDDR5 BW available to the GPU, and 10 GB/s of CPU access on XB1 has the same impact, the DDR3 BW left for the GPU is 48 GB/s. To my knowledge there's never been any talk of the contention impact on XB1, but it's pretty much the same architecture and so must have the same issues. The ESRAM is unaffected, but moving data to and from ESRAM is going to be limited to 48 GB/s in this example. That could certainly be a bottleneck.
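
The headroom comparison in that example, as a quick calc (the 20 GB/s impact of CPU traffic is the example figure from above, not a measured number):

```python
# Bandwidth left for the GPU if CPU traffic costs the same ~20 GB/s on both machines
# (the example figure used above, not a measurement). ESRAM is excluded: it is
# unaffected by this contention, but feeding it from DDR3 is not.
CPU_IMPACT_GB_S = 20.0
PS4_GDDR5_GB_S  = 176.0   # PS4 unified GDDR5 peak
XB1_DDR3_GB_S   = 68.0    # Xbox One DDR3 peak

print(f"PS4 GPU headroom: {PS4_GDDR5_GB_S - CPU_IMPACT_GB_S:.0f} GB/s")  # ~156 GB/s
print(f"XB1 GPU headroom: {XB1_DDR3_GB_S - CPU_IMPACT_GB_S:.0f} GB/s")   # ~48 GB/s, as in the example
```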

According to DF at least, the Xbox One version seems to run a bit worse (nothing too bad, but certainly below 60, in the 50-55 range) than the PS4 version, which is almost a locked 60. Unless you know something I don't, it seems like DICE would have to drop the resolution further to get the performance up to par while keeping the same visual settings as the PS4 version.
Are the visual settings greatly different? Otherwise, a 50 vs. 60 fps performance deficit wouldn't require a ~36% drop in pixel count from 900p to 720p.
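
For reference, the pixel-count arithmetic behind that figure (nothing platform-specific here):

```python
# Pixel counts at the two resolutions being discussed.
pixels_900p = 1600 * 900   # 1,440,000
pixels_720p = 1280 * 720   #   921,600

drop = 1.0 - pixels_720p / pixels_900p
print(f"720p renders {drop:.0%} fewer pixels than 900p")  # ~36%
```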
 
Isn't XB1 memory bandwidth allocated in thirds, so the CPU would get 20, 40, or 60 GB/s slices of bandwidth? Or am I remembering that wrong?
 
Are the visual settings greatly different? Otherwise, a 50 vs. 60 fps performance deficit wouldn't require a ~36% drop in pixel count from 900p to 720p.

I don't think they are different at all, and that's why the X1 runs at a lower res (among other reasons); it doesn't have anything different from the PS4 (better or worse) other than resolution. My point is that if both systems run the exact same presets and the PS4 runs better at 900p than the X1 does at 720p, then the latter needs to drop further still to achieve parity in performance.
 
Isn't XB1 memory bandwidth allocated in thirds, so the CPU would get 20, 40, or 60 GB/s slices of bandwidth? Or am I remembering that wrong?

The whole system used to be time-sliced to give 10% of the resources to the OS for Kinect skeleton tracking and the like, but that is now optional and developers can reclaim it?
 

NX Gamer is also reporting better performance on PS4.

 