I mean, those aren’t his words you quoted. Somehow you quoted Metal Spirit and put 3dilettante there.
I just realized that. I had to change up my language to be a bit more diplomatic. LOL
Completing request Y on channels not immediately needed for function X means that the overall system is seeing a performance benefit by having both X and Y make progress.
The unused channels during an access to the 6GB portion would deplete their queue.
This isn't always known as far as the memory channels and the controllers are concerned. The GPU path is aggressively reordered and does a lot of combining, which is not always obvious at the algorithm level. Compression can complicate things further by making the number of bus transactions for a set of fetches more variable.
Any GPU algorithm would statistically request equally across all channels to get an equivalent 560 GB/s. So having some requests resolved (on the channels that are still free) while the others stall from serving the 6GB portion would result in a proportional stall.
If the channels aren't being used, then it sounds like the workload is fine with less than 560 GB/s, which can frequently be true. Even when a game is supposedly bandwidth-bound, it's more that a subset of the frame time budget is bandwidth-bound and it's holding up further progress.
If the additional requests going to the 336 GB/s portion don't have an equivalent number of requests going to the remaining channels, wouldn't that still imbalance the queues? OTOH, if the data is spread unevenly to make the unused channels useful in that time frame, they are then not balanced equally when the 6GB portion is not used, and cannot do 560 GB/s in total; it would instead cause the opposite queues to starve.
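To make the queue-balance argument concrete, here's a toy sketch (my own illustration, not a model of the real memory controller, whose scheduling is far more involved): ten 32-bit channels, where a fast-region request can land on any channel but a slow-region request only lands on the six channels backed by the 2 GB chips, as the 336 GB/s figure suggests.

```python
# Toy illustration of per-channel load on a 10-channel bus where the 6 GB
# region only maps to 6 of the channels. Not how a real GDDR6 controller works.
import random

CHANNELS = 10        # 10 x 32-bit channels (320-bit bus)
SLOW_CHANNELS = 6    # the 6 GB region interleaves only across the six 2 GB chips

def channel_load(requests: int, slow_fraction: float, seed: int = 0) -> list[int]:
    """Count requests per channel for a given mix of slow-region traffic."""
    rng = random.Random(seed)
    load = [0] * CHANNELS
    for _ in range(requests):
        if rng.random() < slow_fraction:
            load[rng.randrange(SLOW_CHANNELS)] += 1   # slow-region access: channels 0-5 only
        else:
            load[rng.randrange(CHANNELS)] += 1        # fast-region access: any channel
    return load

if __name__ == "__main__":
    for frac in (0.0, 0.15, 0.5):
        print(f"slow-region share {frac:.0%}: {channel_load(100_000, frac)}")
```

With any slow-region traffic in the mix, channels 0-5 queue up more work than channels 6-9, which is exactly the imbalance being debated; whether it matters depends on whether other requests can fill the idle channels in the meantime.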
That quote seems to be focusing on a specific use case rather than texturing in general: a depth buffer that is then switched out and then read linearly later.
From what I remember from 2013, texture data isn't bandwidth-intensive. Otherwise, Xbox One textures would have been half the fidelity of PS4 textures, and that didn't seem to be the case last gen.
https://www.eurogamer.net/articles/digitalfoundry-the-complete-xbox-one-interview
"Doctor, it hurts when I do this."
"Don't do that".
Just like if games on PS5 use 90 GB/s of bandwidth on audio and 22 GB/s on decompression, they will only have 336 GB/s of memory bandwidth left for the GPU and CPU.
Developers won't do that.
Though a nice investigation, one shouldn't take one example of a launch title as indicative of 1) what games are doing now and 2) what they'll be doing next-gen. We might see a five-fold increase in audio, maybe, and a load of RT data that doesn't fit at all into the KZSF model. The moment CPU processing is no longer a bottleneck, the requirement for RAM use may increase dramatically, or maybe it'll go down with the GPU doing more compute work than before?
I've divided Killzone Shadow Fall's memory usage into two groups, one requiring fast access and one requiring slow access.
If you're following @3dilettante, who is providing very thorough answers to address Metal Spirit, the tl;dr is: there is only one possible scenario in which the asymmetric memory will cause an issue, and that's purely the developer not knowing or caring about how they optimize their memory and just dumping major critical items into the wrong area. All other scenarios are a non-issue whether the memory setup is symmetric or not.
3 pages and I still don't understand shit in a thread that should have helped me understand.
Besides, the CPU can very easily consume the full 3.5 GB available on the slow pool, but it will not generate 168 GB/s of bandwidth traffic, meaning many GB/s of bandwidth on this pool are just wasted.
I'll let the more technically knowledgeable address the rest, but your thinking here seems really strange. If you don't use every TF of compute in every cycle on the GPU, are the unused TFlops wasted? If you only have 13.5GB total and in that mix is 9GB of data that needs fast memory access and 4.5GB that doesn't, what actual difference does it make if you have more capacity of fast memory available than data that needs it?
I understand the conclusion but not how it works.
If you're following @3dilettante, who is providing very thorough answers to address Metal Spirit, the tl;dr is: there is only one possible scenario in which the asymmetric memory will cause an issue, and that's purely the developer not knowing or caring about how they optimize their memory and just dumping major critical items into the wrong area. All other scenarios are a non-issue whether the memory setup is symmetric or not.
I think Metal Spirit is hung up on slow and fast pools of memory. We've tried to explain to him before that it's a single physical pool, and that there is no such thing as fast or slow; it's just bits * clock speed.
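For reference, the bits * clock arithmetic behind the two published figures (this assumes the 14 Gbps GDDR6 data rate that 560 GB/s on a 320-bit bus implies):

```python
# Bandwidth = bus width x per-pin data rate; same chips, just a narrower slice of the bus.
GBPS_PER_PIN = 14  # GDDR6 data rate implied by the published 560 GB/s figure

def bandwidth_gb_per_s(bus_bits: int) -> float:
    return bus_bits * GBPS_PER_PIN / 8  # gigabits/s across the bus -> gigabytes/s

print(bandwidth_gb_per_s(320))  # all 10 chips (the 10 GB region): 560.0 GB/s
print(bandwidth_gb_per_s(192))  # only the six 2 GB chips (the 6 GB region): 336.0 GB/s
```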
We've also tried to explain that the amount of memory bandwidth provided by the 6 GB chips is very generous for CPU loads that would likely never get close to that amount.
We've also tried to explain that asymmetric memory chip setups have nothing to do with CPU and GPU contention over memory. And if the CPU ever monopolized that much data, irrespective of the setup being symmetric or asymmetric, you'd bog the memory down anyway. That's basically suggesting the CPU is a bigger consumer of bandwidth than the GPU.
He's then also explained that unused channels can be leveraged, letting the total system always fill out its potential instead of wasting accesses.
All in all, the Xbox should perform well with its memory setup.
I might not have been clear! I'm sorry for that!
Let me try to explain it in another way:
Imagine a single pool of 16 GB memory, with 560 GB/s. (2.5GB already used by the OS)
You access it with the CPU and use 3.5 GB of RAM, generating 50 GB/s of traffic.
The GPU will have 10 GB of RAM to use, and it can create 510 GB/s of traffic.
Now let's look at this case:
Accessing the 6 GB of slow RAM creates a 192-bit bus to this RAM. This gives you 168 GB/s of bandwidth to this memory! The remaining pool gets 392 GB/s, and both together give you the same 560 GB/s.
Now you use the full 3.5 GB, creating 50 GB/s of traffic.
How much will the GPU get?
392 GB/s... because there is no more memory on the other pool to generate extra bandwidth traffic.
Compared to the first case, unused bandwidth is wasted!
And this will not be for just a couple of cycles... We are talking about CPU usage of this RAM, so access will be intense.
Was I clear this time?
The slow pool is 336 GB/s, not 168 GB/s. As for the rest, there are really people more equipped to properly explain this than me, but the 560 GB/s isn't the fast pool + the slow pool. The 560GB/s is the bandwidth of the fast pool any time you access data from it.
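One way to picture why the two figures don't add up to 896: the bus serves one region or the other on any given set of cycles, so over a window of time the effective rate is a weighted average of 560 and 336, never their sum. A rough sketch of that idea (the traffic split is just an illustrative parameter, not a measured figure):

```python
FAST_GB_S = 560.0  # rate while the bus is serving the 10 GB region
SLOW_GB_S = 336.0  # rate while the bus is serving the 6 GB region

def effective_bandwidth(slow_time_share: float) -> float:
    """Average rate when some fraction of bus time goes to the 6 GB region."""
    return slow_time_share * SLOW_GB_S + (1.0 - slow_time_share) * FAST_GB_S

print(effective_bandwidth(0.00))  # 560.0 GB/s - all traffic hits the fast region
print(effective_bandwidth(0.15))  # ~526.4 GB/s - e.g. a 50 GB/s CPU demand served from the slow region
print(effective_bandwidth(1.00))  # 336.0 GB/s - all traffic hits the slow region
```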
Sure, if you don't use the CPU at all (I know what you meant: that RAM can be accessed by only one component at the same time). But I doubt it will happen. With all the power in those CPUs, and seeing how CPU-starved developers were during this gen, I highly doubt they wouldn't try to max the CPUs with 60 fps, physics, audio stuff, etc.
I made the same mistake when I first considered the Series X, but these are rates, not volumes. If in one second of game you access the RAM for 0.2 seconds at 560 GB/s (reading and writing 112 GB of data!), then you have 0.8 seconds left for the GPU to read data at 560 GB/s. Both get the full bandwidth. Of course the GPU transfers a lower total quantity of data using the bus for 0.8 s versus using it for the full second, but data consumed is meaningless in relation to transfer rates.
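The rates-versus-volumes point as plain arithmetic (same example numbers as above; the 0.2 s is just an illustrative split):

```python
PEAK_GB_S = 560.0  # rate whenever the fast region is being accessed
WINDOW_S = 1.0     # one second of game time

other_time = 0.2                       # suppose other clients hold the bus for 0.2 s
other_bytes = PEAK_GB_S * other_time   # 112 GB moved in that slice
gpu_time = WINDOW_S - other_time       # 0.8 s of bus time left over
gpu_bytes = PEAK_GB_S * gpu_time       # 448 GB the GPU could still move

print(other_bytes, gpu_bytes)  # each client sees the full 560 GB/s while it holds the bus
```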
Nope...
Nothing of the sort:
If you access only the 10 GB you have 560 GB/s; if you access only the 6 GB, you have 336 GB/s; if you access both at once, you have 392 GB/s on the fast pool and 168 GB/s on the slow pool; and if you average access to both pools, you will have 280 GB/s on the fast and 168 GB/s on the slow.
Yes, the SX has 2.5 GB reserved for system functions and we don't know how much the PS5 reserves for similar functionality, but it doesn't matter: the Xbox SX either has only 7.5 GB of interleaved memory operating at 560 GB/s for game utilisation before it has to start "lowering" the effective bandwidth of the memory below that of the PS5... or the SX has an averaged mixed memory bandwidth that is always below that of the baseline PS4.
All right, F it. I'll take a stab, and hopefully if I get anything wrong someone will step in and correct me. I think the confusion may come from conflating the use of GB/s as both a spec that represents a potential available resource and an actual measurement of use over a period of time.
If we say the CPU uses 50GB/s we are actually saying that the CPU has a demand for 50GB of data over a second. On the XBSX, 50GB can be delivered from the slow pool using around 15% of the cycles of the memory bus given that the max that can be transferred from that memory is 336GB over a second if it were to be used every cycle. That leaves 85% of the bus cycles available for everything else, including (and probably most frequently) GPU accesses to the fast memory. 85% of 560 is 476 which is still more than the full bandwidth of the PS5.
Now let's do the same for the PS5. 50 GB can be transferred using 11% of the cycles of a bus with a max bandwidth of 448 GB/s. That leaves 89% for the GPU and everything else. 89% of 448 GB/s is ~399 GB/s.
Get it now?
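Putting the two back-of-the-envelope numbers side by side (the 50 GB/s CPU demand is just the example figure used in the posts above):

```python
def leftover_gb_per_s(cpu_demand_gb_s: float, cpu_pool_gb_s: float, peak_gb_s: float) -> float:
    """Bandwidth left over after a given CPU demand is served from its pool."""
    cycles_used = cpu_demand_gb_s / cpu_pool_gb_s  # fraction of bus cycles the CPU demand needs
    return (1.0 - cycles_used) * peak_gb_s

# Series X: 50 GB/s served from the 336 GB/s region, remaining cycles at 560 GB/s
print(leftover_gb_per_s(50, 336, 560))  # ~476 GB/s
# PS5: 50 GB/s served from the single 448 GB/s pool
print(leftover_gb_per_s(50, 448, 448))  # ~398 GB/s
```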