FP16 and market support

Discussion in 'Architecture and Products' started by radar1200gs, Dec 19, 2003.

  1. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Is it just me or is that an obvious modification for the NV40? If they need Branching in it, they NEED to be smarter than that anyway.
    So assuming the number of "necessary" stages/slots/threads in the NV40 will greatly diminish would be a safe bet IMO. So if they reduced the number of average slots by 50-70%, and doubled the register file again as they did in the NV35... You might get to an extremely reasonable amount of register performance hit.

    Heck, if you need 8 registers to get any sort of real performance hit, and the performance hit of 16 registers would be roughly the same as the one of 4-6 registers on the NV30... I'd even say that unless NVIDIA badly ****s up regarding their ALUs, their performance might be quite excellent indeed!

    Of course, that's a BIG if :)


    Uttar
     
  2. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    No applications have tried to test texture addressing accuracy.
     
  3. Demirug

    Veteran

    Joined:
    Dec 8, 2002
    Messages:
    1,326
    Likes Received:
    69
    Not necessarily.

    If they add an instruction pointer to each quad (10 bits for 1024-instruction programs), it will be easy to execute a different instruction for each quad. At the end of the pipeline you need an additional unit that can change the instruction pointer. This will work well for static branching. Dynamic branching is a little more complicated, because not all pixels in one quad necessarily have to execute the same instructions. But this can be solved too. I know that it works because I have written a small simulation program.

    50-70% is IMHO too much. I think nVidia has to fight for every single slot they want to remove.
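
    A minimal Python sketch of the per-quad instruction pointer idea for static branching. This is not the simulation program mentioned above; the toy instruction set, the `use_fog` constant, and the names `Quad`, `step` and `run` are all assumptions made for illustration. Each quad carries its own instruction pointer, and a "branch unit" at the end of each pass decides the next value.

```python
from dataclasses import dataclass, field

# Hypothetical toy instruction set:
#   ("alu", name)               does some arithmetic work
#   ("jmp_if", const, target)   static branch on a shader constant
#   ("end",)                    retires the quad
PROGRAM = [
    ("alu", "mul"),            # 0
    ("jmp_if", "use_fog", 3),  # 1: same outcome for every quad (static)
    ("alu", "add"),            # 2: skipped when use_fog is true
    ("alu", "fog"),            # 3
    ("end",),                  # 4
]

# A 10-bit pointer is enough to address 1024 instructions.
IP_BITS = (1024 - 1).bit_length()


@dataclass
class Quad:
    ident: int
    ip: int = 0                          # per-quad instruction pointer
    trace: list = field(default_factory=list)
    done: bool = False


def step(quad, constants):
    """Execute one instruction for one quad, then update its pointer."""
    instr = PROGRAM[quad.ip]
    quad.trace.append(quad.ip)
    if instr[0] == "end":
        quad.done = True
    elif instr[0] == "jmp_if" and constants[instr[1]]:
        quad.ip = instr[2]               # branch taken
    else:
        quad.ip += 1                     # fall through


def run(num_quads, constants):
    """One new quad enters per pass, so quads sit at different pointers."""
    quads = []
    while len(quads) < num_quads or not all(q.done for q in quads):
        if len(quads) < num_quads:
            quads.append(Quad(len(quads)))
        for q in quads:
            if not q.done:
                step(q, constants)
    return quads


if __name__ == "__main__":
    for q in run(3, {"use_fog": True}):
        print(q.ident, q.trace)
```

    Dynamic branching is deliberately left out: it would need per-pixel state inside each quad, which this sketch does not model.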
     
  4. Hyp-X

    Hyp-X Irregular
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,170
    Likes Received:
    5
    This doesn't make much sense.

    Say you have 128 slots (because of high register usage), a PS program of 1 TEX instruction and some arithmetic instructions, a texture latency of 176 cycles, and an arithmetic latency of <=128 cycles.

    cycle 0: start the 1st PS instruction on slot 0
    cycle 1: start the 1st PS instruction on slot 1
    ...
    cycle 127: start the 1st PS instruction on slot 127
    cycle 128: idle waiting for slot 0 to become runnable
    ...
    cycle 175: idle waiting for slot 0 to become runnable
    cycle 176: start the 2nd PS instruction on slot 0
    cycle 177: start the 2nd PS instruction on slot 1
    ...
    cycle 303: start the 2nd PS instruction on slot 127
    cycle 304: start the 3rd PS instruction on slot 0, because by now its 2nd instruction is already complete


    I'm not saying the FX works this way.
    I'm just saying that your argument that the bypass has to be long because the quad order cannot change makes no sense.
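
    The schedule above can be reproduced with a short Python sketch. It assumes a single issue port, strict round-robin order over the slots, a TEX latency of 176 cycles and an arithmetic latency of exactly 128 cycles (the upper bound of the "<=128" in the example). The function name `issue_schedule` is made up for illustration, and like the example itself it is not a claim about how the FX actually works.

```python
def issue_schedule(num_slots, latencies, num_instructions):
    """Return start[k][s]: the cycle at which instruction k starts on slot s.

    latencies[k] is the latency of instruction k. Slots are always visited
    in the same order, and only one instruction can start per cycle.
    """
    start = []
    next_free = 0  # first cycle at which the issue port is free
    for k in range(num_instructions):
        row = []
        for s in range(num_slots):
            # A slot is ready once its previous instruction has completed.
            ready = 0 if k == 0 else start[k - 1][s] + latencies[k - 1]
            cycle = max(next_free, ready)
            row.append(cycle)
            next_free = cycle + 1
        start.append(row)
    return start


if __name__ == "__main__":
    sched = issue_schedule(num_slots=128, latencies=[176, 128],
                           num_instructions=3)
    print("1st instr: slot 0 at", sched[0][0], "slot 127 at", sched[0][127])
    print("2nd instr: slot 0 at", sched[1][0], "slot 127 at", sched[1][127])
    print("3rd instr: slot 0 at", sched[2][0])
    print("idle cycles before 2nd instr:", sched[1][0] - (sched[0][127] + 1))
```

    Under these assumptions the only idle cycles are 128 through 175, while the pipeline waits for the first texture result; after that the slots stay in order without any further stall.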
     
  5. Nick

    Veteran

    Joined:
    Jan 7, 2003
    Messages:
    1,881
    Likes Received:
    17
    Location:
    Montreal, Quebec
    Could someone direct me to where the DirectX SDK specifies that 24-bit floating-point is enough? I found only the 'Data Types' page of HLSL and it specified 16-bit, 32-bit and 64-bit but not 24-bit (of course I realize this is storage size). There seems to be no precision information about assembly shaders, or am I looking in the wrong document?

    Anyway, I don't think 24-bit for texturing is enough. You can already see artifacts because there are fewer than 8 bits for the filtering fraction, as you can see here: ATI's filtering tricks. Of course, you can only see them under rare conditions if there's only one 'flat' texture. But as soon as the texture coordinates undergo some operations instead of just being taken from the interpolators, there's a great loss of precision that will be visible on things like detail textures. My software renderer uses the rcp SSE instruction, which gives at least 12 bits of mantissa precision, for perspective correction, but it gives clear artifacts for nearby surfaces. A 16-bit mantissa is only 16x more precise, and this isn't sufficient if some bits are lost with extra operations on these registers.

    So, ATI has to hurry to get some 32-bit floating-point hardware on the market before we have games that experience precision problems. It has been a good compromise for a while, but 24-bit isn't going to last long. I think it's pretty smart of Nvidia to allow different precisions depending on the use. Color calculations can be done perfectly with 16-bit floating-point format while 32-bit can be used for texture coordinates and such. I see no need for 24-bit actually. So I still think DirectX has a major role here in 'deciding' who has the fastest hardware that is 'considered' compliant. They should, like OpenGL, specify exactly what precision is required or recommended for every kind of operation.

    As always, I could have made some serious errors, so please correct me if I'm wrong.
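
    To put rough numbers on the mantissa argument, here is a small Python sketch. It assumes the usual layouts of 10, 16 and 23 stored mantissa bits for FP16, FP24 and FP32, ignores exponent range, and uses a hypothetical 2048-texel-wide texture; `quantize` and `subtexel_bits` are illustrative names, not part of any API.

```python
import math

# Stored mantissa bits per format; the implied leading 1 adds one more bit.
MANTISSA_BITS = {"FP16": 10, "FP24": 16, "FP32": 23}


def quantize(x, mantissa_bits):
    """Round x to the given number of stored mantissa bits.

    Exponent range is ignored, so this only models precision, not overflow.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                 # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)     # stored bits plus the implied bit
    return math.ldexp(round(m * scale) / scale, e)


def subtexel_bits(texture_size, mantissa_bits):
    """Fraction bits left below one texel for a coordinate just under 1.0."""
    return (mantissa_bits + 1) - int(math.log2(texture_size))


if __name__ == "__main__":
    for name, bits in MANTISSA_BITS.items():
        err = abs(quantize(1.0 / 3.0, bits) - 1.0 / 3.0)
        print(name, "error at 1/3:", err,
              "sub-texel bits on 2048:", subtexel_bits(2048, bits))
    # Going from a 12-bit to a 16-bit mantissa gains a factor of 2**4.
    print("16-bit vs 12-bit:", 2 ** (16 - 12), "x")
```

    This only measures how finely a single coordinate can be represented; it says nothing about how much error a particular chain of shader operations accumulates.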
     
  6. radar1200gs

    Regular

    Joined:
    Nov 30, 2002
    Messages:
    900
    Likes Received:
    0
    The simple reality is that there are more GPUs out there in consumer homes capable of doing FP16/32 than there are GPUs capable of doing FP24.

    If I were a developer I would be supporting the format my customer is most likely to have in his box, rather than some theoretically superior standard.

    Back in the day the Motorola 68000 family was considered superior to Intel x86, but x86 won because it had a larger installed base. Superiority does you no good if no one is making use of it.
     
  7. Bouncing Zabaglione Bros.

    Legend

    Joined:
    Jun 24, 2003
    Messages:
    6,363
    Likes Received:
    83
    Oh right, that must be why we are still using 8 bit colour and 286 CPUs .... :roll:
     
  8. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,517
    Likes Received:
    24,424
    I don't buy that. Show me the numbers.
     
  9. Ostsol

    Veteran

    Joined:
    Nov 19, 2002
    Messages:
    1,765
    Likes Received:
    0
    Location:
    Edmonton, Alberta, Canada
    Based on my own tests, precision in fragment shaders will only make a difference for dependent texture reads. For normal texture sampling, the texture coordinate is tossed into the texture sampler at the full FP32 provided by the vertex pipeline.
     
  10. radar1200gs

    Regular

    Joined:
    Nov 30, 2002
    Messages:
    900
    Likes Received:
    0
    The 5200 and 5600 made sure of that. ATi had the same chance but decided not to compete in the low-end with a DX9 class chip. I bet Dave Orton would love to strangle whoever was responsible for ordering several squillion RV250 chips from the foundries...
     
  11. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    Refrast does the same. Also, 32 steps is a lot; it takes some really extreme examples to make them visible.
    Exactly, and these don't occur in games.
    Where's this "great loss in precision" coming from? Not from the LOD fraction, certainly. Not from FP24 certainly.
    Prove it. The HW is here now. And why would you do perspective correction in the pixel shader? Also, there are 17 bits of mantissa when you count the implied bit.
    Whatever.
    DX9 does specify. It specifies a minimum of 24-bit precision in the pixel shader, or 16-bit precision when _pp is specified. Not very complicated.
     
  12. [maven]

    Regular

    Joined:
    Apr 3, 2003
    Messages:
    645
    Likes Received:
    16
    Location:
    DE
    I think it would be in the DDK, couldn't find anything in the SDK...

    But they (limited fractional bits for texture-interpolators and dependent reads) are different sources of error...

    This is a notion I disagree with. To quote one of my Numerical Analysis lecturers, it's the same as proclaiming the patient dead while the operation hasn't even started yet.
    There are certain types of operations (in conjunction with particular data) that can have catastrophic effects on accuracy, but you need to be aware of those (as a programmer) anyway; and they do not necessarily occur.

    Note that I haven't addressed at all whether FP16/24/32 is enough for anything or not...
     
  13. gokickrocks

    Regular

    Joined:
    Dec 19, 2002
    Messages:
    465
    Likes Received:
    1
    so by your logic, developers should be coding for Intel...

    also, the majority of cards don't use floating point in the shaders, so again by your logic, it would imply that developers should be using integers...don't know about you, but I don't see many people running DX9 cards...
     
  14. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,517
    Likes Received:
    24,424
    Once again, show me the numbers.
     
  15. SpellSinger

    Newcomer

    Joined:
    Jan 10, 2003
    Messages:
    60
    Likes Received:
    0
    The NV products can't run FP32 at an acceptable performance level, even on their high end.

    I still don't buy it.
     
  16. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Which is precisely what I'm worried about.
     
  17. radar1200gs

    Regular

    Joined:
    Nov 30, 2002
    Messages:
    900
    Likes Received:
    0
    Developers already do code for Intel, with no problems: with the exception of SSE2 on the Intel side and 3DNow! on the AMD side the code is identical anyway, and AMD CPUs often benefit more from optimisations for Intel than Intel itself does.

    As for integer support, that's where Cg would have come in.

    Far from being an evil plan to take the graphics world over, Cg lets developers code once and then runs it optimally on the target architecture (including competitor architectures if the competitor actually bothers to write a competent backend). It is HLSL for DX8 and OpenGL.
     
  18. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    This is the way aliasing works. You often don't even notice it in still screens unless extreme examples are taken.

    Examples:
    1. Edge aliasing: under most circumstances, you have to zoom very far in to see edge aliasing in a screenshot. The effect, however, is much more visible in motion.

    2. Texture aliasing: again, you don't see it under most circumstances. It's pretty easy to show, in fact, that a LOD bias of -1 can look pretty darned good in a screenshot. But there will be quite a lot of shimmer when in motion.

    This is why I have always opposed people using in-game screenshots to test texture quality and AA quality. For these things, synthetic screenshots are vastly more telling. If you want to use games to test such things, the only way to do it properly would be to use video. That never happens, however, and I don't think it ever will.
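
    As a rough illustration of why a LOD bias of -1 shimmers, here is a Python sketch of a simplified mip selection model: the level is log2 of the texel footprint per pixel plus the bias, clamped to the available levels. The function names and the 12-level chain are assumptions, and real hardware uses more elaborate footprint estimates.

```python
import math


def mip_level(texels_per_pixel, lod_bias=0.0, num_levels=12):
    """Simplified LOD: log2 of the footprint plus bias, clamped to the chain."""
    lod = math.log2(texels_per_pixel) + lod_bias
    return min(max(lod, 0.0), float(num_levels - 1))


def sampled_texels_per_pixel(texels_per_pixel, level):
    """Texels of the chosen mip level that fall under one screen pixel."""
    return texels_per_pixel / 2 ** level


if __name__ == "__main__":
    footprint = 8.0  # base-level texels covered by one pixel
    for bias in (0.0, -1.0):
        level = mip_level(footprint, bias)
        print("bias", bias, "-> level", level, "->",
              sampled_texels_per_pixel(footprint, level), "texels per pixel")
```

    With no bias the chosen level has about one texel per pixel. With a bias of -1 it has about two, so each pixel skips texels; a still frame looks sharper, but the skipped texels change from frame to frame, which shows up as shimmer in motion.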
     
  19. gokickrocks

    Regular

    Joined:
    Dec 19, 2002
    Messages:
    465
    Likes Received:
    1
    way back, MikeC posted a Q3 video in regards to AA on both the 9700 and the 5900 IIRC...so there you go
     
  20. radar1200gs

    Regular

    Joined:
    Nov 30, 2002
    Messages:
    900
    Likes Received:
    0
    What would be a good compromise is a screenshot showing the overall scene, then a 10 to 30 frame animation using animgif format or similar (can .png do anim sequences?) of a small selected portion of the screen designed to show the effect in motion.
     