Quick answer is yes, as DDR3 alone would not be able to sustain the BW needed, but what exactly do you mean by "esram<->ddr3 reads/writes", and what's "sizable"?
I think he means copy data from ESRAM to DDR3 and vice versa.
I mean transfers between DDR3 and ESRAM. The ESRAM, so to speak, has to be "fed" data as well as feed data out (e.g. a game might have one copy out to main memory per frame for each front-buffer write).
That's very true, and that means dedicated ESRAM would not suffer from contention, and that's where most of the reads/writes would happen at the pixel level.

But the DDR3 has less bandwidth to work with, as well as memory contention issues from sharing bandwidth between the two processing units (all shared-memory architectures suffer from this).
Again, it depends on what you mean by "sizable", as in an actual number in your mind.

So my question is: does the combination of all the ESRAM to/from DDR3 reads/writes take a sizable amount of the DDR3 bandwidth in most games?
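As a rough illustration of scale (all figures below are assumptions chosen for the arithmetic, not measurements): a single 1080p RGBA8 front-buffer copy out to DDR3 per frame at 60 fps costs well under 1% of a ~68 GB/s bus.

```python
# Back-of-envelope estimate of the DDR3 bandwidth consumed by one
# per-frame ESRAM->DDR3 front-buffer copy. All figures are
# illustrative assumptions, not measured XB1 numbers.

DDR3_BW = 68e9          # assumed DDR3 peak bandwidth, ~68 GB/s
FPS = 60                # assumed target frame rate

# One 1080p, 4-bytes-per-pixel (RGBA8) front buffer
frame_bytes = 1920 * 1080 * 4

# Bandwidth spent on the DDR3 side of the copy, per second
copy_bw = frame_bytes * FPS

fraction = copy_bw / DDR3_BW
print(f"{copy_bw / 1e9:.3f} GB/s, {fraction:.2%} of DDR3 bandwidth")
```

Under these assumptions the front buffer alone is under 1% of DDR3 bandwidth; whether the *total* of all per-frame swaps is sizable depends on how many intermediate buffers a game moves each frame.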
Why wouldn't GPU code scale up with the longer GPU time slice when it's BW bound?

Sidebar: the BW is not the bottleneck here; if the BW were the limiting factor, then the June SDK wouldn't have helped at all, and that's the only logical conclusion.
The code will execute proportionally more work if it has more time to run; otherwise things don't add up. There would need to be a bottleneck that changes based on the time-slice width, which I cannot see.
When you're dealing with time slices, more time effectively is more BW, in a long-term time-averaged sense. You're not going to be making memory accesses when you're not using the GPU.

Wouldn't more work require more BW, in general?
The June SDK doesn't change the "instantaneous" BW available to the GPU, but a game is still able to make more total memory accesses over time.
If you spend 90% of 1 second accessing a 100MB/s bus, you can achieve up to 90MB of transfer during that second. If you spend 100% of 1 second accessing the same bus, you can achieve up to 100MB of transfer during that second. If your task is bound by the transfer of data, you're going to be able to get more done in the second case than in the first. Unless a game on XB1 is able to buffer a cartoonishly massive amount of memory access requests and carry them out (somehow without contention issues) during the system's time slice, even a completely BW-bound game should see markedly better performance from getting rid of the time slice.

You can't make more memory accesses if you are already BW bound.
But that's the thing: BW *is* being added. So now you can continue along at your previously BW-bound rate, but as you're doing it for longer (with the time slice reduced or removed), you've moved more data.
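The duty-cycle arithmetic behind this argument can be sketched as follows; the 100 GB/s bus figure and the 10% system slice are illustrative assumptions, not XB1 specifics.

```python
# Sketch of the time-slice argument: a fixed system time slice caps the
# *time-averaged* bandwidth a game can use, even though the bus's
# instantaneous bandwidth is unchanged. Numbers are illustrative.

def effective_bw(bus_bw_gbs: float, gpu_share: float) -> float:
    """Time-averaged bandwidth when the game only owns the GPU
    (and thus the bus) for `gpu_share` of each second."""
    return bus_bw_gbs * gpu_share

before = effective_bw(100.0, 0.90)  # 10% of each second reserved by the system
after = effective_bw(100.0, 1.00)   # slice returned to the game
print(before, after)
```

Even a fully BW-bound workload moves ~11% more data per second in the second case, which is why reclaiming the slice helps without "adding" any physical bandwidth.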
I haven't really commented on that question (my entire point is that the characteristics of things before and after the update don't say much about whether something is BW bound or not), though I'd be a little surprised if it were a tremendous issue.

You and I are saying the same thing in the gist (that the X1 does not have a BW problem).
That's an oddly black-and-white view of the situation.

By definition, you simply cannot be BW bound if you are able to improve performance without adding physical bandwidth.
It's not a view; it's the very definition of "bandwidth bound." If you believe there is a different definition, then what you thought I meant is not what I meant, in which case further discussion on this is moot.

That's an oddly black-and-white view of the situation.
Eh, the recognized definitions of these things aren't very strict.

It's not a view; it's the very definition of "bandwidth bound."
I suppose.

If you believe there is a different definition, then what you thought I meant is not what I meant, in which case further discussion on this is moot.
This is the simplest approach, yes, but it is not the most efficient. You waste much bandwidth if you just fill the ESRAM with your buffers, render targets, or whatever. You must swap memory from SRAM to DRAM and back if you want to get a real advantage out of the SRAM.

Indeed, it was interesting.
It seems that ESRAM + DRAM memory operations will usually be additive in terms of the non-copy operations they allow, with exceptions being cases where the speedup from having the data waiting in SRAM is worth the BW cost of a pre-emptive copy in.
This does mean paying more attention to when and where your memory accesses occur than on PC and PS4, however, and the easiest way to use the ESRAM is still probably to just cram your buffers into the ESRAM and write out when you're done.
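A minimal sketch of the "additive" accounting above, assuming the often-quoted ~102 GB/s ESRAM and ~68 GB/s DDR3 figures; the copy-traffic amounts are hypothetical and just show that every ESRAM<->DDR3 swap is paid for on both pools.

```python
# Rough model of the "additive" claim: ESRAM and DDR3 bandwidth add up
# for non-copy work, but each GB/s of ESRAM<->DDR3 swap traffic spends
# bandwidth on both buses. Figures are illustrative assumptions.

ESRAM_BW = 102.0   # GB/s, often-quoted XB1 ESRAM figure (assumption)
DDR3_BW = 68.0     # GB/s (assumption)

def usable_bw(copy_traffic_gbs: float) -> float:
    """Combined bandwidth left for non-copy work after a given amount
    of ESRAM<->DDR3 swap traffic, which is paid on both pools."""
    return (ESRAM_BW - copy_traffic_gbs) + (DDR3_BW - copy_traffic_gbs)

print(usable_bw(0.0))  # no swapping: the two pools simply add
print(usable_bw(5.0))  # 5 GB/s of swap traffic costs 10 GB/s combined
```

Whether pre-emptive copies pay off then comes down to whether the speedup from working out of SRAM beats that doubled copy cost.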