Recent content by thepkrl

  1.

    What do you think of this NV30 Pipeline diagram?

    Yes and no. There's no way to specify combiner operations inside a fragment program, so technically no. However, the number of FX12-operations you can perform between each FP32 operation exactly matches what you could do with two combiners (without the extra scale/bias tricks or dual issue...
  2.

    What do you think of this NV30 Pipeline diagram?

    When doing just Z/Stencil, there's no situation when any of the fragment processing (FP or FX) can be used. That doesn't mean that the FP/FX hardware isn't used, but probably the pipeline would be designed more around the 4 pixel case with the 8Z being an addition, instead of a main feature...
  3.

    NV30 fragment processor test results

    More numbers below. Looking at them more closely, it seems there is more structure to the slowdown than just every two registers slowing things. I've grouped the numbers a bit to show this.

    32.12 rounds 8.48 cycle/fragm: 1 regs, 32 instr
    32.05 rounds 8.46 cycle/fragm: 2 regs, 32...
  4.

    NV30 fragment processor test results

    Yes, you are right. Integer units can be bypassed, or the float data just flows through unchanged. Program length doesn't seem to matter. The program is probably loaded from memory, and long programs would only be slower in bandwidth-limited cases. I also tried texture fetches and they went the same...
  5.

    NV30 fragment processor test results

    The performance numbers don't really tell how many units are involved, just how many results we get each cycle. There could be four texture units that can do one trilinear or two bilinear. Or there could be 8 texture units, two connected to each fragment processor. The actual hardware wiring is...
  6.

    NV30/31/34/35 Fragment Processor Diagram(Speculation)

    After reading these comments and doing further tests based on them, I wrote a summary with more details and posted it as a new thread (as it only concerns NV30 and contains test results and not just speculation). link: http://www.beyond3d.com/forum/viewtopic.php?t=5150
  7.

    NV30 fragment processor test results

    According to further tests, this is what the NV30 fragment shader architecture looks like:

    FLOAT/TEXTURE-UNIT (handles FP16, FP32 and texture)
    |
    INTEGER-UNIT (handles FX12, 1-2 ops in parallel)
    |
    INTEGER-UNIT (handles FX12, 1-2 ops in parallel)
    |
    (loopback or output)

    There are 4...
  8.

    NV30/31/34/35 Fragment Processor Diagram(Speculation)

    Actually GF4 can do 4 multiply/dot or 2 add/mux ops/clock (for both RGB and alpha, equivalent to one RGBA vector). It can also do scale/bias/expand etc operations on inputs/outputs which are not supported in fragment programs. Now that you mention it, this is actually very close to the FX12...
  9.

    NV30/31/34/35 Fragment Processor Diagram(Speculation)

    43.45. I also ran the tests in the first post with 42.92, which came with the card. The results were identical.
  10.

    NV30/31/34/35 Fragment Processor Diagram(Speculation)

    FP32 additions (FP16 is same speed)

    3.98 fragm/cycle 0.25 cycle/fragm: 1add-FP32
    1.90 fragm/cycle 0.53 cycle/fragm: 2add-FP32
    1.26 fragm/cycle 0.79 cycle/fragm: 3add-FP32
    0.95 fragm/cycle 1.06 cycle/fragm: 4add-FP32
    0.76 fragm/cycle 1.32 cycle/fragm: 5add-FP32
    0.63...
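The two columns in the FP32 addition figures are reciprocals of each other, and dividing cycles per fragment by the number of adds gives a per-instruction cost. A minimal Python sketch of that arithmetic follows, using only the five complete rows quoted in the post. The function names are made up for illustration, and the "cycles per add" figure is a derived quantity, not something the post itself reports.

```python
# Rows copied from the post: (FP32 adds in the program, fragm/cycle, cycle/fragm).
# The truncated sixth row ("0.63...") is deliberately left out.
MEASURED = [
    (1, 3.98, 0.25),
    (2, 1.90, 0.53),
    (3, 1.26, 0.79),
    (4, 0.95, 1.06),
    (5, 0.76, 1.32),
]


def cycles_per_fragment(fragments_per_cycle):
    """Convert throughput (fragments per cycle) into cost (cycles per fragment)."""
    return 1.0 / fragments_per_cycle


def cycles_per_add(n_adds, fragments_per_cycle):
    """Average cost of one add, assuming the cost is spread evenly over all adds."""
    return cycles_per_fragment(fragments_per_cycle) / n_adds


def summarize(rows=MEASURED):
    """Return (adds, derived cycles/fragment, derived cycles/add) for each row."""
    return [
        (n, cycles_per_fragment(fpc), cycles_per_add(n, fpc))
        for n, fpc, _listed in rows
    ]


if __name__ == "__main__":
    for n, cpf, cpa in summarize():
        print(f"{n} add(s): {cpf:.3f} cycle/fragm, {cpa:.3f} cycle/add")
```

For these rows the derived cost stays close to a quarter of a cycle per add, so the slowdown looks roughly linear in instruction count over this range.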
  11.

    NV30/31/34/35 Fragment Processor Diagram(Speculation)

    As for architectural speculation on the NV30 fragment shader: Earlier Geforces had TEXTURE SHADER followed by REGISTER COMBINERS. Texture shader is really very much like a simple fragment shader unit, as it can fetch textures and do some limited floating point operations. It would make...
  12.

    NV30/31/34/35 Fragment Processor Diagram(Speculation)

    I have some results that might help you. I've been testing NV30 (5800 Ultra) fragment program performance with driver 43.45 (results are the same as for 42.92 with which I started). Testing is done with OpenGL NV_fragment_program. I have tested performance for all instructions with FP32...