ref rasterizer questions

A driver can't just force all render targets from 16 bits to 32 bits... What if a game relies on the render target being 16 bits per pixel, as many older games did to render text? Add 100 exceptions for 100 games to the driver just to make them work?
It's similar with paletted textures... The palette is set at render time, which means a game can change the palette on every draw call while keeping the texture exactly the same! That in turn means you'd have to "recompile" (reconvert) the texture every time the palette changes... If a game has a fallback, great; if it doesn't, it won't even work on today's hardware.
I thought that increasing internal rendering precision couldn't mess things up, and that render targets were parts of the frame buffer, but am I wrong?
 
Internal precision is already fp32 for everything (yes, even Half Life 1)! The problem shows up when games (especially older ones) make multiple passes over the scene. Pixels leave the pipeline (get written to the render target), are quantized to 16 bits per pixel, and later re-enter the pipeline for further processing (blending). This is the stage where you lose precision, and the huge fp32 internal precision won't save you at all.
The problem is you can't just increase render target bit depth without running into problems.
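
To make the round trip concrete, here is a minimal sketch (not any driver's actual behaviour). It assumes a single colour channel, additive blending, and a 5-bit channel as found in a 16-bit RGB565 target; Python floats stand in for the fp32 pipeline, and the function names are made up for illustration.

```python
def quantize(value, bits):
    """Round a channel value in [0, 1] to the nearest level an n-bit channel can store."""
    levels = (1 << bits) - 1
    return round(value * levels) / levels


def multipass(passes, layer, bits=None):
    """Additively blend `layer` into a render target `passes` times.

    The blend itself runs at full (float) precision. If `bits` is given, the
    result is quantized every time it is written back to the target.
    """
    target = 0.0
    for _ in range(passes):
        target = min(target + layer, 1.0)    # blend inside the pipeline
        if bits is not None:
            target = quantize(target, bits)  # write to the render target
    return target


exact = multipass(8, 0.01)              # no storage loss between passes
five_bit = multipass(8, 0.01, bits=5)   # 5-bit channel (assumed RGB565 red/blue)
eight_bit = multipass(8, 0.01, bits=8)  # 8-bit channel (assumed 8888 target)
```

Under these assumptions the 5-bit target never leaves zero, because every 0.01 contribution is rounded away when it is written out, while the 8-bit target overshoots slightly. The blend precision is the same in all three cases; only the storage between passes differs.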
 
I thought that increasing internal rendering precision couldn't mess things up, and that render targets were parts of the frame buffer, but am I wrong?

Increasing internal calculation precision does not mess things up, unless the rendering technique specifically depends on lower quality rendering / clamping (some toon shading algorithms could do this, for example).

However, if the game instructs the API to create a 16 bit texture, and the API/driver creates a 32 bit texture instead, there will be major problems if the game locks the surface and processes it on the CPU (expecting it to be 16 bits per pixel). Most surfaces cannot be locked (back buffer, front buffer, z-buffer and all default pool textures in DirectX, unless specifically created as lockable surfaces), so this is not often a problem, and the driver can correctly detect the surfaces that can be rendered in 32 bit depth without problems. However, there are cases where the driver cannot be sure, so it's better just to obey the program and create a 16 bit texture. The potential image quality improvement is not as important as the potential incompatibility (program crashes/hangs).
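
A small sketch of what goes wrong, assuming a hypothetical game routine that fills a locked surface with RGB565 pixels. A plain bytearray stands in for the locked memory; this is not real DirectX code, and every name here is invented for the example.

```python
import struct


def pack_rgb565(r, g, b):
    """Pack 8-bit r, g, b into one 16-bit RGB565 value."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def game_fill_16bpp(buffer, width, height, pitch, color565):
    """Hypothetical game code: writes 2 bytes per pixel, as it would for a 16 bit surface."""
    for y in range(height):
        for x in range(width):
            struct.pack_into("<H", buffer, y * pitch + x * 2, color565)


width, height = 4, 2

# Surface the driver silently promoted to 32 bits per pixel (4 bytes each).
promoted = bytearray(width * height * 4)
game_fill_16bpp(promoted, width, height, pitch=width * 4,
                color565=pack_rgb565(255, 0, 0))

# What the hardware would see when it reads the memory back as 32-bit pixels.
pixels = struct.unpack_from("<%dI" % (width * height), promoted)
```

In this sketch the game only touches the first half of each row, and each 32-bit pixel it does touch contains two 16-bit values squeezed together, so the image is garbage even though nothing crashed. A game that reads the surface back, or trusts its own pitch instead of the one returned by the lock, can fare worse.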

On earlier DirectX versions (DX5 - DX7) all render targets and depth buffers could be locked by default. On newer DirectX versions render targets and depth buffers cannot be locked by default (there are specific formats for lockable render targets). This allows the hardware to store the render targets in a layout better suited for hardware rendering (cache friendly tiled buffers, stencil bits in a separate buffer from the z-bits, etc).

Locking a buffer = a method to map a GPU resource (texture, vertex buffer, index buffer, etc) to CPU address space. This allows the developer to modify and read the GPU data using the CPU. Usually most buffers are locked at the beginning of the game/level, and data is filled in. After this only dynamic buffers need to be updated (locked).
 
Thanks for all the replies.

one more question:

If DX5 and DX6 were one day emulated completely through fp32 shaders on DX10 hardware, would the emulation be close to perfect? For example, could any DX5 or DX6 game look exactly like it did on a Voodoo5 if done by DX10 fp32 shaders, or are there other factors that would make that impossible?
 
Dithering can easily be done with an additional small screen space (wrapped/repeating) texture, or by rendering to a higher precision texture, doing the dither in a post process, and outputting the result to a texture of the target bit depth. The performance hit is minimal.
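
As a rough illustration of the repeating-texture approach, the sketch below uses a 4x4 Bayer matrix as the small tiled threshold texture. The matrix choice and the function name are assumptions for the example; the actual pattern used by old hardware may well differ.

```python
# 4x4 ordered-dither (Bayer) matrix, standing in for the small repeating texture.
BAYER_4X4 = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
]


def dither_channel(value, x, y, bits):
    """Quantize a channel value in [0, 1] to `bits` bits at screen position (x, y).

    The threshold comes from the matrix tiled across the screen (wrap addressing).
    Returns the integer level that would be stored in the low precision target.
    """
    levels = (1 << bits) - 1
    threshold = (BAYER_4X4[y % 4][x % 4] + 0.5) / 16.0
    scaled = value * levels
    base = int(scaled)        # floor, since scaled is non-negative
    frac = scaled - base      # the part the target format cannot store
    return min(base + (1 if frac > threshold else 0), levels)


# One 4x4 tile of a flat 50% grey written to a 5-bit channel.
tile = [dither_channel(0.5, x, y, 5) for y in range(4) for x in range(4)]
```

Here half of the pixels in the tile land on level 15 and half on level 16, so the tile averages 15.5 out of 31, which is the 50% the 5-bit channel cannot represent directly.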

Paletted textures can be simulated with an 8 bit index texture (the image) and a 1x256 palette texture. Point sampling can be achieved with a single additional texture fetch (sample the palette texture according to the fetched 8 bit index). Bilinear filtering needs four texture fetches from the index texture, four texture fetches from the palette texture and then the interpolation calculation. With fetch4 (ATI DX9) or gather (all DX10), you can get the four samples from the index texture with one texture fetch, dropping the number of fetches to five (one gather plus four palette lookups). The performance is good enough for older games. All textures that are not lockable can be converted safely to 8888 format during loading time to improve performance (identical image quality).
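
The lookup order matters: the palette has to be applied before filtering, because interpolating the indices themselves is meaningless. A CPU-side sketch of the idea, assuming clamp addressing, texel centres at integer coordinates, and RGBA tuples for palette entries (all names are illustrative; a real version would be a pixel shader):

```python
import math


def palette_lookup(index_tex, palette, x, y):
    """Point sample: fetch the 8 bit index, then fetch the colour from the palette."""
    h, w = len(index_tex), len(index_tex[0])
    x = min(max(x, 0), w - 1)   # clamp addressing (an assumption)
    y = min(max(y, 0), h - 1)
    return palette[index_tex[y][x]]


def lerp(a, b, t):
    """Linear interpolation between two colour tuples."""
    return tuple(p + (q - p) * t for p, q in zip(a, b))


def sample_bilinear(index_tex, palette, u, v):
    """Bilinear sample: four index fetches, four palette fetches, then interpolation."""
    x0, y0 = math.floor(u), math.floor(v)
    fx, fy = u - x0, v - y0
    c00 = palette_lookup(index_tex, palette, x0, y0)
    c10 = palette_lookup(index_tex, palette, x0 + 1, y0)
    c01 = palette_lookup(index_tex, palette, x0, y0 + 1)
    c11 = palette_lookup(index_tex, palette, x0 + 1, y0 + 1)
    return lerp(lerp(c00, c10, fx), lerp(c01, c11, fx), fy)


index_tex = [[0, 1],
             [1, 0]]
palette = [(0, 0, 0, 255)] * 256
palette[0] = (255, 0, 0, 255)   # red
palette[1] = (0, 0, 255, 255)   # blue

halfway = sample_bilinear(index_tex, palette, 0.5, 0.0)

# Swapping a palette entry changes the image without touching index_tex.
palette[1] = (0, 255, 0, 255)   # green
swapped = palette_lookup(index_tex, palette, 1, 0)
```

Because the index texture is never modified, a per-draw-call palette change only replaces the small palette texture, which is why this approach avoids reconverting the image each time.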
 