Chalnoth said: But in this situation you'll have significantly higher bus and vertex bandwidth. So it may not be an overall bandwidth win after all.
I was under the impression that render targets consume well over half the bandwidth on a current PC card. If so, rendering to cache is more likely to be a win than not, even when quadrupling vertex bandwidth.
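To put rough numbers on that trade-off, here is a back-of-envelope sketch. Every share in it is made up for illustration (nobody in this thread posted measured figures), and `relative_bandwidth` is just a hypothetical helper name. It assumes total traffic splits into render target, vertex, and "everything else" (textures and so on), that caching the render target on chip removes most of its traffic, and that vertex traffic gets multiplied by some factor.

```python
def relative_bandwidth(render_target_share, vertex_share,
                       vertex_multiplier=4.0, render_target_residual=0.0):
    """Return new total bandwidth relative to the old total (1.0 = break-even).

    All inputs are assumptions for illustration:
      render_target_share    - fraction of traffic going to render targets today
      vertex_share           - fraction of traffic going to vertex data today
      vertex_multiplier      - how much vertex traffic grows (4.0 = quadrupled)
      render_target_residual - fraction of render-target traffic that remains
                               (e.g. the final write-out of each chunk)
    """
    other_share = 1.0 - render_target_share - vertex_share
    if other_share < 0.0:
        raise ValueError("shares must not add up to more than 1.0")
    return (other_share
            + vertex_share * vertex_multiplier
            + render_target_share * render_target_residual)


# Hypothetical card: 60% render target, 10% vertex, 30% everything else.
print(round(relative_bandwidth(0.6, 0.1), 3))
# Hypothetical card where vertex traffic is already heavy: 50% / 20% / 30%.
print(round(relative_bandwidth(0.5, 0.2), 3))
```

Under these assumptions the scheme wins whenever vertex_share * (vertex_multiplier - 1) is smaller than render_target_share * (1 - render_target_residual), which is why the answer hinges on how dominant render-target traffic really is.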
Ailuros said: On a GPU with separate PS/VS units where it would hit in a theoretical scene onto a very long vertex shader at the same time with a quite short pixel shader I'm not so sure it would be advantageous.
This is 3-5 years down the line. I kinda doubt there will be separate hardware shaders. And I also doubt that there will be numerous cases with long vertex shaders and short pixel shaders. Shadow map generation is probably the closest match I can think of for this situation.
Chalnoth said: Rendering shadowmaps is, of course, but this is where z-buffer compression comes in handy. It should be possible to compress a shadowmap in the same way that the z-buffer is compressed, dramatically reducing the bandwidth requirements.
Difficult, because the TMU (or rather, a texel prefetcher, however you want to call it) would need access to the compression flag table that indicates which tiles are compressed and which aren't.
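A toy sketch of why the flag table matters: whoever reads a texel has to look up the tile's flag before it can interpret the stored data. Everything here is hypothetical (`TiledDepthSurface`, the 8x8 tile size, and especially the "compression", which only collapses constant-depth tiles to a single value). Real Z compression schemes are more elaborate and are not described in this thread.

```python
TILE = 8  # hypothetical tile edge length in pixels


class TiledDepthSurface:
    """Depth surface stored per tile, with a per-tile 'compressed' flag table."""

    def __init__(self, width, height):
        self.tiles_x = (width + TILE - 1) // TILE
        self.tiles_y = (height + TILE - 1) // TILE
        # The flag table: one entry per tile.
        self.compressed = [False] * (self.tiles_x * self.tiles_y)
        self.payload = {}  # tile index -> stored data

    def store_tile(self, tx, ty, depths):
        """Store TILE*TILE depth values; compress only if they are all equal."""
        idx = ty * self.tiles_x + tx
        if all(d == depths[0] for d in depths):
            self.compressed[idx] = True
            self.payload[idx] = depths[0]      # one value stands for the tile
        else:
            self.compressed[idx] = False
            self.payload[idx] = list(depths)   # raw, uncompressed tile

    def fetch(self, x, y):
        """Texel read: must consult the flag table before decoding the tile."""
        idx = (y // TILE) * self.tiles_x + (x // TILE)
        data = self.payload[idx]
        if self.compressed[idx]:
            return data
        return data[(y % TILE) * TILE + (x % TILE)]


surface = TiledDepthSurface(16, 16)
surface.store_tile(0, 0, [0.5] * (TILE * TILE))                 # flat tile
surface.store_tile(1, 0, [i / 64 for i in range(TILE * TILE)])  # varying tile
print(surface.fetch(3, 3), surface.fetch(10, 1))
```

Without access to `compressed`, `fetch` could not tell a single stored value from a raw tile, which is the access problem described above.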
Jawed said:
Joe DeFuria said:
Well, the problem with this is that for games, temporal coherence is very low. You basically have the HUD. So while it's really neat that they are getting good performance, I don't expect that sort of thing to have any impact on the games industry.
Xmas said: Difficult, because the TMU (or rather, a texel prefetcher, however you want to call it) would need access to the compression flag table that indicates which tiles are compressed and which aren't.
There is another form of compression available: ATI1N, for single-channel textures.
Inane_Dork said: This is 3-5 years down the line. I kinda doubt there will be separate hardware shaders. And I also doubt that there will be numerous cases with long vertex shaders and short pixel shaders. Shadow map generation is probably the closest match I can think of for this situation.
Sure, there will be corner cases and instances where caching the render target is slower than the current system. If the situation were really such a slam dunk, we would be there by now.
Lemme check... let's see... SM2.0... SM2.0... (it's hard to find around here, since we still have a lot of 1.x shaders)... OK, I think it's something in the range of 15-20 instructions. But, as I said before, the shader that really hammered high-end GPUs was the one with 8 texture fetches (and only 25 instructions!).
As for vertex shaders... well, we went overboard a bit: it turns out that some shaders now have >100 instructions (water surface geometry deformation, complex reflections with Fresnel, and similar high-tech mumbo-jumbo).
Ailuros said: What is it then I'm probably misinterpreting here?
Well, I dunno.
Jawed said: There is another form of compression available: ATI1N, for single-channel textures.
And how is that related to using the compressed Z-buffer as a texture? It's read-only, lossy, and low precision and therefore completely useless for shadow maps.
Jawed
Ailuros said: A TBDR can remove that kind of redundancy for one and with D3D10 and conditional rendering I believe even more so.
Conditional rendering builds on top of occlusion queries, which are much harder to implement efficiently in TBDRs than in IMRs.
Xmas said: And how is that related to using the compressed Z-buffer as a texture?
I wasn't suggesting it was!
Xmas said: It's read-only, lossy, and low precision and therefore completely useless for shadow maps.
My mistake; being single-channel, it seemed useful.
bloodbob said: Well, all memory will be on the chip once we've moved to high-k dielectrics, low clock speeds, and 3D chips.
Yeah, let's hope Intel's plan for 3D chips becomes a reality in the not too distant future.
{Sniping}Waste said: I find it strange that in 3 pages there is no mention of QDR. Micron has made QDR static RAM for some years now and there was R&D being done on QDR SDRAM. This would lower the amount of traces compared to a 512-bit mem bus.
"QDR" in Micron's technology was achieved by having one read and one write port. Each port was DDR, in total achieving 2 read + 2 write = 4 transfers per clock cycle. It had no bandwidth-per-pin advantage over DDR; its advantage was solely that it eliminated bus turnaround times.
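The arithmetic above can be spelled out in a few lines. This is only a sketch: `MemoryInterface` is a hypothetical model, the 32-pin port width is an arbitrary example, and it assumes each port has its own data pins, which is what the "no bandwidth-per-pin advantage" point implies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryInterface:
    """Toy model of a memory data interface (all numbers are illustrative)."""
    name: str
    ports: int                             # independent data ports
    pins_per_port: int                     # data pins on each port (assumed)
    transfers_per_clock_per_port: int = 2  # DDR signalling on every port

    def transfers_per_clock(self):
        return self.ports * self.transfers_per_clock_per_port

    def data_pins(self):
        # Assumption: ports do not share data pins.
        return self.ports * self.pins_per_port

    def bits_per_clock(self):
        return self.transfers_per_clock() * self.pins_per_port

    def bits_per_clock_per_pin(self):
        return self.bits_per_clock() / self.data_pins()


ddr = MemoryInterface("DDR, one shared read/write port", ports=1, pins_per_port=32)
qdr = MemoryInterface("QDR, one read + one write port", ports=2, pins_per_port=32)

for mem in (ddr, qdr):
    print(mem.name, mem.transfers_per_clock(), mem.bits_per_clock_per_pin())
```

In this model QDR doubles the transfers per clock but also doubles the data pins, so the per-pin figure matches DDR. The model does not capture bus turnaround time, which is where the real benefit was.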
arjan de lumens said: Conditional rendering builds on top of occlusion queries, which are much harder to implement efficiently in TBDRs than in IMRs.
Inane_Dork said: Well, I dunno.
I think that, overall, rendering in smaller chunks and caching those chunks on chip is a win when you're strapped for VRAM bandwidth. I grouped that under "tiling" because that made sense to me, but maybe it has too loaded a definition. The smaller chunks could be portions of render targets or a bundle of streams or whatever.
Every system has its strengths and weaknesses. I think efforts like tiling fit bandwidth-constrained scenarios better than the current system does.
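A toy comparison of the "render in chunks, cache on chip" idea, counting only framebuffer traffic to external memory. The access counts, the 16-pixel tile size, and both function names are assumptions for illustration. The model deliberately ignores the extra cost of binning geometry per tile, which is the vertex bandwidth increase argued about earlier in the thread.

```python
from collections import defaultdict


def immediate_mode_traffic(fragments):
    """Framebuffer accesses when every fragment touches external memory."""
    depth = {}
    accesses = 0
    for x, y, z, _color in fragments:
        accesses += 1                      # depth read for the Z test
        if z < depth.get((x, y), float("inf")):
            depth[(x, y)] = z
            accesses += 2                  # depth write + colour write
    return accesses


def tiled_traffic(fragments, tile=16):
    """Framebuffer accesses when each tile is resolved on chip, then written once."""
    bins = defaultdict(list)
    for frag in fragments:
        x, y = frag[0], frag[1]
        bins[(x // tile, y // tile)].append(frag)

    accesses = 0
    for tile_fragments in bins.values():
        on_chip = {}                       # stands in for the on-chip tile buffer
        for x, y, z, color in tile_fragments:
            if z < on_chip.get((x, y), (float("inf"), None))[0]:
                on_chip[(x, y)] = (z, color)
        accesses += len(on_chip)           # one colour write per covered pixel
    return accesses


# Three layers of overdraw over a 4x4 pixel patch, drawn back to front.
fragments = [(x, y, z, "c") for z in (3, 2, 1) for x in range(4) for y in range(4)]
print(immediate_mode_traffic(fragments), tiled_traffic(fragments))
```

The gap grows with overdraw, since the on-chip buffer absorbs every intermediate read and write. Whether that saving outweighs the binning cost is exactly the open question here.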
Ailuros said: 5.) TBDR (who on earth expected me to say otherwise?)