Ailuros said: Unless you mean something entirely different I can see "SW tiling" on GPUs for ages now.
No, we're probably thinking about the same thing. I'm just taking it in a different direction.
I would think that something Xenos-like would be quite beneficial to PC developers, if they were all used to the mindset of fitting the framebuffer into cache. Granted, we're talking about several tiles to hit the uber high resolutions that PCs can do, but if your renderer is set up to scale that way, the benefits are tremendous.
And the perennial programmer favorite, access to the framebuffer within the pixel shader, would be much more feasible if the current tile was in cache.
Or maybe GPUs are taken in a more programmable stream direction. That way, data is kept on chip and external bandwidth is reserved for other things. Well, every stream that fits in cache must be small enough, so I'm back at tiling again.
It's not that tiling is the answer to everything. It's just that when I thought about the question, the answers I came up with all pretty much demanded that software be made aware of some GPU memory which cannot be exceeded. 'Twas just my initial response.
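To put some rough numbers on the "several tiles" point, here is a back-of-the-envelope sketch in C. The 10 MB on-chip buffer (roughly Xenos-like) and 8 bytes per multisample are assumptions for illustration only:

#include <stdio.h>

/* Rough estimate of how many screen tiles a Xenos-style renderer would need
 * at a PC resolution. The 10 MB on-chip buffer and 8 bytes per sample
 * (32-bit color + 32-bit Z/stencil) are illustrative assumptions. */
int main(void)
{
    const int width = 1600, height = 1200;      /* example PC resolution */
    const int samples = 4;                      /* 4x multisampling */
    const int bytes_per_sample = 8;             /* color + depth/stencil */
    const long long buffer_bytes = 10LL * 1024 * 1024;

    long long frame_bytes = (long long)width * height * samples * bytes_per_sample;
    long long tiles = (frame_bytes + buffer_bytes - 1) / buffer_bytes;  /* round up */

    printf("%dx%d with %dxAA needs %lld MB of backbuffer+Z -> %lld tiles\n",
           width, height, samples, frame_bytes / (1024 * 1024), tiles);
    return 0;
}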
Is there any reason why they can't make a higher-clocked, more serial RAM? Isn't that what the idea behind XDR/Rambus was, or am I way outta line?
That was the idea. XDR and Rambus DRAMs are not produced in high enough volume to be cheap. The cost difference there probably counters any advantage in cost that you'd get from having the simpler board design.
Ailuros said: http://www.beyond3d.com/articles/xenos/index.php?p=05#tiled
There are both directions in that page. Read carefully and re-think.
I've already read it several times. I don't know what you're trying to convince me of, but I would really appreciate a simple laying out of why my ideas are infeasible.
Inane_Dork said: No, we're probably thinking about the same thing. I'm just taking it in a different direction.
I would think that something Xenos-like would be quite beneficial to PC developers, if they were all used to the mindset of fitting the framebuffer into cache. Granted, we're talking about several tiles to hit the uber high resolutions that PCs can do, but if your renderer is set up to scale that way, the benefits are tremendous.
You can't tile efficiently in software, because tiling sits between the calculation of screen-space vertex positions (done in the vertex shader currently) and the computation of pixels. As such, the only way to tile efficiently is in hardware. But I don't think it's really necessary.
ERK said:This is something I've always been very curious about. For instance, would it be possible to get any kind of reasonable quality by moving to significantly higher resolution textures, but with lossier compression?
I often get annoyed with the smeary magnified look.
ERK said:How close to the limits of compression are we now? Seems like if there were performance to be mined here it would have been done already.
Chalnoth said:we'll need something pretty different to keep improving performance (i.e. eDRAM, on-package DRAM, or TBDR)
What are the Advantages of XDR DRAM?
Highest Frequency Memory:
- 4.0/3.2/2.4 Gbps speed with max. 8.0 GB/s sustained bandwidth
- More headroom for expandability
Highly Effective Memory Bandwidth:
- Large number of banks (8 banks)
- Efficient operation for different bank sets (even/odd)
- Zero refresh overhead
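For what it's worth, that headline bandwidth figure is just the per-pin rate times the device width. A quick sanity check, assuming a x16 device at the 4.0 Gbps rate quoted above:

#include <stdio.h>

/* Sanity check on the "max. 8.0 GB/s sustained" figure above, assuming a
 * x16 XDR device at the 4.0 Gbps per-pin data rate. */
int main(void)
{
    const double gbps_per_pin = 4.0;   /* per-pin data rate */
    const int pins = 16;               /* x16 device width */

    double gb_per_sec = gbps_per_pin * pins / 8.0;   /* bits to bytes */
    printf("%.1f Gbps x %d pins = %.1f GB/s\n", gbps_per_pin, pins, gb_per_sec);
    return 0;
}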
Chalnoth said: Additionally, as we move into the future, pixel shaders are naturally going to get longer. So framebuffer bandwidth demands are going to decrease in relation to fillrate demands. And the same goes for texture bandwidth, since the ALU to TEX operation ratio is just going to increase.
But the counter argument is that rendering to cubemaps and shadowing etc. are all techniques that will increase in pervasiveness. There will no longer be just a backbuffer chewing up ROP/memory-bandwidth.
Inane_Dork said:I've already read it several times. I don't know what you're trying to convince me of, but I would really appreciate a simple laying out of why my ideas are infeasible.
It's okay, I don't bite. Just say it.
From said page for tiling on IMRs:
The net result here is that geometry needs to be recalculated multiple times for each of the buffers.
Jawed said: But the counter argument is that rendering to cubemaps and shadowing etc. are all techniques that will increase in pervasiveness. There will no longer be just a backbuffer chewing up ROP/memory-bandwidth.
Well, rendering to cubemaps, in general, isn't going to be any different than rendering to the framebuffer, so that's not a concern. Rendering shadowmaps is, of course, but this is where z-buffer compression comes in handy. It should be possible to compress a shadowmap in the same way that the z-buffer is compressed, dramatically reducing the bandwidth requirements.
Chalnoth said: Well, rendering to cubemaps, in general, isn't going to be any different than rendering to the framebuffer, so that's not a concern.
It's a concern because it's an additional workload on the ROPs/bandwidth.
I think the compression is relatively limited where there's high geometric complexity, which makes for a particular problem when generating self-shadowing.
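To make the compression argument a bit more concrete, here is a minimal sketch of the plane-plus-residuals idea that z-buffer tile compression is generally based on. The 4x4 tile, the corner-derived plane, and the tolerance are arbitrary assumptions, not any actual hardware scheme; the point is simply that a planar tile compresses while a tile of unrelated depths (lots of geometric edges) does not:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal sketch of plane-based depth-tile compression, the general idea
 * behind z-buffer (and, by extension, shadowmap) compression: fit a plane
 * to a 4x4 tile of depth values and see whether the residuals stay within
 * a small budget. Real hardware schemes are more elaborate; this is only
 * illustrative. */
#define TILE 4

static int tile_compresses(const float z[TILE][TILE], float tolerance)
{
    /* Derive the plane from the top-left sample and its immediate
     * neighbours: z(x,y) ~= z00 + x*dzdx + y*dzdy. */
    float dzdx = z[0][1] - z[0][0];
    float dzdy = z[1][0] - z[0][0];

    for (int y = 0; y < TILE; ++y)
        for (int x = 0; x < TILE; ++x) {
            float predicted = z[0][0] + x * dzdx + y * dzdy;
            if (fabsf(z[y][x] - predicted) > tolerance)
                return 0;  /* residual too large: store the tile raw */
        }
    return 1;  /* plane plus tiny residuals: tile compresses well */
}

int main(void)
{
    float flat_wall[TILE][TILE], dense_geometry[TILE][TILE];

    for (int y = 0; y < TILE; ++y)
        for (int x = 0; x < TILE; ++x) {
            flat_wall[y][x] = 0.5f + 0.01f * x + 0.02f * y;   /* one plane */
            dense_geometry[y][x] = (float)rand() / RAND_MAX;  /* many edges */
        }

    printf("flat wall compresses:      %d\n", tile_compresses(flat_wall, 1e-4f));
    printf("dense geometry compresses: %d\n", tile_compresses(dense_geometry, 1e-4f));
    return 0;
}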
hughJ said:What would a 512bit bus do to the pincounts and board complexity? What kind of difference did we see from 128bit to 256bit?
ShootMyMonkey said:That was the idea. XDR and Rambus DRAMs are not produced in high enough volume to be cheap. The cost difference there probably counters any advantage in cost that you'd get from having the simpler board design.
As far as actually raising the clock itself, the fundamental problem with that is simply the capacitance of those wire traces on your circuit boards. It's not easy to swing voltages that fast when you've got lots of capacitance. XDR manages it by using a very small voltage swing (only 0.2V), and using a differential signaling scheme to be a little more noise-resistant as 0.2V is not a lot.
Second of all, RAM itself can't be clocked super high. XDR at 3.2 GHz signaling means that the DRAM clock is 400 MHz. That's not an easy clock to reach considering that the larger the DRAM, the slower it is, the longer the multiplexor delays, the more there is to refresh and so on.
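A quick check of that 3.2 Gbps / 400 MHz relationship, assuming XDR's octal data rate of 8 bits per pin per core clock:

#include <stdio.h>

/* The 3.2 Gbps signaling / 400 MHz core relationship described above:
 * XDR moves 8 bits per pin per DRAM core clock (octal data rate), so the
 * per-pin rate is 8x the core clock. Numbers are illustrative. */
int main(void)
{
    const double core_clock_mhz = 400.0;   /* DRAM core clock */
    const int bits_per_clock = 8;          /* octal data rate per pin */

    double gbps_per_pin = core_clock_mhz * bits_per_clock / 1000.0;
    printf("%.0f MHz core clock x %d bits/clock = %.1f Gbps per pin\n",
           core_clock_mhz, bits_per_clock, gbps_per_pin);
    return 0;
}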
Chalnoth said: You can't tile efficiently in software, because tiling sits between the calculation of screen-space vertex positions (done in the vertex shader currently) and the computation of pixels. As such, the only way to tile efficiently is in hardware. But I don't think it's really necessary.
Very efficient software tiling, no. But likely efficient enough that, in a bandwidth constrained situation, recomputation of part of the scene is a win in order to fit inside cache. It would basically boil down to frustum culling and maybe some tile selection algorithm, but if you were really pressed for bandwidth, it well could be a win on the whole.
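A sketch of what that brute-force software approach might look like: walk the screen tile by tile and re-submit only the objects whose screen-space bounds overlap the current tile. The scene and bounds types are hypothetical stand-ins, not any real API, and objects that straddle tiles get their geometry processed more than once, which is exactly the trade-off being discussed:

#include <stdio.h>

/* Sketch of the brute-force "software tiling" idea discussed above:
 * split the screen into tiles, and for each tile re-submit only the
 * objects whose screen-space bounds overlap it. The types here are
 * hypothetical stand-ins, not any real API. */
typedef struct { float min_x, min_y, max_x, max_y; } Bounds2D;  /* screen space */
typedef struct { const char *name; Bounds2D bounds; } Object;

static int overlaps(Bounds2D a, Bounds2D b)
{
    return a.min_x < b.max_x && a.max_x > b.min_x &&
           a.min_y < b.max_y && a.max_y > b.min_y;
}

int main(void)
{
    const int screen_w = 1024, screen_h = 768;
    const int tile_w = 512, tile_h = 384;           /* 4 tiles total */
    Object scene[] = {
        { "terrain",   { 0,   0,   1024, 768 } },
        { "character", { 100, 300, 200,  500 } },
        { "skybox",    { 0,   0,   1024, 768 } },
    };
    const int num_objects = sizeof scene / sizeof scene[0];

    for (int ty = 0; ty < screen_h; ty += tile_h)
        for (int tx = 0; tx < screen_w; tx += tile_w) {
            Bounds2D tile = { (float)tx, (float)ty,
                              (float)(tx + tile_w), (float)(ty + tile_h) };
            printf("tile (%d,%d):\n", tx, ty);
            for (int i = 0; i < num_objects; ++i)
                if (overlaps(scene[i].bounds, tile))
                    printf("  draw %s\n", scene[i].name);  /* geometry re-run per tile */
        }
    return 0;
}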
Absolutely. I don't see that bandwidth is going to become the big bottleneck in real-time graphics, but that's the question put forth in this thread. Consider ATI's performance hit from FSAA as a quick example: simply being very careful about what you do with available bandwidth can really improve things quite a lot.
Additionally, as we move into the future, pixel shaders are naturally going to get longer. So framebuffer bandwidth demands are going to decrease in relation to fillrate demands. And the same goes for texture bandwidth, since the ALU to TEX operation ratio is just going to increase.
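In rough, purely illustrative numbers: if the framebuffer traffic per pixel stays fixed, the bandwidth needed per shader cycle drops as shaders get longer:

#include <stdio.h>

/* The ratio argument in rough numbers (purely illustrative): per-pixel
 * framebuffer traffic stays roughly fixed, so the bandwidth needed per
 * shader cycle falls as shaders get longer. */
int main(void)
{
    const double bytes_per_pixel = 8.0;          /* e.g. color write + Z read/write */
    const int shader_lengths[] = { 4, 16, 64 };  /* ALU instructions per pixel */

    for (int i = 0; i < 3; ++i)
        printf("%2d-instruction shader: %.2f bytes of framebuffer traffic per ALU cycle\n",
               shader_lengths[i], bytes_per_pixel / shader_lengths[i]);
    return 0;
}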
Ailuros said: From said page for tiling on IMRs.
Already known. It's a trade-off. But, like I said above, in the situation presumed in this thread, it would probably be advantageous. You trade off a resource that's not getting maxed (shader processing) for a resource that is (bandwidth). And we're talking about vertex shaders here which, to date, have not been terribly large.
Inane_Dork said: Very efficient software tiling, no. But likely efficient enough that, in a bandwidth constrained situation, recomputation of part of the scene is a win in order to fit inside cache. It would basically boil down to frustum culling and maybe some tile selection algorithm, but if you were really pressed for bandwidth, it well could be a win on the whole.
But in this situation you'll have significantly higher bus and vertex bandwidth. So it may not be an overall bandwidth win after all.
Inane_Dork said:Already known. It's a trade-off. But, like I said above, in the situation presumed in this thread, it would probably be advantageous. You trade off a resource that's not getting maxed (shader processing) for a resource that is (bandwidth). And we're talking about vertex shaders here which, to date, have not been terribly large.