FP16 and market support

Ailuros said:
Now if you look at the red dots in both pics on each axis (x, y), you'll see that the ordered grid gives a 2x2 grid, while 4x sparse gives a 4x4 grid.
Not exactly true. The R3xx hardware supports arbitrary sample positioning (I'm not sure as to the exact accuracy allowed....). The Linux drivers allow you to place the samples wherever you want.

That simplification is good for understanding how the sample pattern was chosen, not for understanding how the hardware was designed.
 
Chalnoth said:
Not exactly true. The R3xx hardware supports arbitrary sample positioning (I'm not sure as to the exact accuracy allowed....). The Linux drivers allow you to place the samples wherever you want.

That simplification is good for understanding how the sample pattern was chosen, not for understanding how the hardware was designed.
I understood his comments to be about EER, not how the hardware is designed.
So... what's your point?
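A quick way to see the EER point is to project each pattern's sample positions onto the x and y axes and count the distinct coordinates. A minimal Python sketch, using hypothetical sample positions on a 4x4 sub-pixel grid (not any vendor's actual pattern):

```python
# Edge equivalent resolution (EER): distinct x and y sample coordinates.
# Sample positions are illustrative, not taken from real hardware.
ordered_grid = [(1, 1), (3, 1), (1, 3), (3, 3)]   # 4x ordered grid
sparse_grid  = [(0, 1), (1, 3), (2, 0), (3, 2)]   # 4x sparse/rotated-style pattern

def eer(samples):
    xs = {x for x, _ in samples}
    ys = {y for _, y in samples}
    return len(xs), len(ys)

print("ordered:", eer(ordered_grid))  # (2, 2) -> 2x2 EER
print("sparse: ", eer(sparse_grid))   # (4, 4) -> 4x4 EER
```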
 
The only reason fp16 is relevant is Nvidia's bad design. As soon as they fix it, fp16 will be gone.

Only products like DeltaChrome that are built in someone's basement are going to follow in Nvidia's footsteps in order to save a few transistors.

Nvidia's multiple-precision mess has resulted in many games having Nvidia code the shaders so that they can achieve playable framerates.
 
Nope, multiprecision will always exist. First of all, it already exists with respect to input/output (look at all the texture formats and framebuffer formats), so storage is already multiprecision. Developers want the ability to deal with multi-precision data. We don't render everything at HDR resolution just because "it is easy and simple" to always use max-precision on everything.

Secondly, if you look at where 3.0 is going, and OpenGL 2.0, they are reintroducing integer variables into the pipeline, which means developers will have the choice of carrying out operations at FX16 precision (or any fixed-point format in between).

NVidia gave FP16 a bad name alright, but there is nothing wrong with multiprecision. It's a shame know-nothing fanboys have to associate a technical format with their war rants. I personally like the idea of having high-speed integer scalar units that work in parallel with FP32 vector units.
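To make the storage point concrete, here is a small numpy sketch (illustrative only, not tied to any particular GPU format): the same buffer at FP16 takes half the space of FP32, and FP16 is adequate for some data while visibly lossy for other data.

```python
import numpy as np

# The same 1024x1024 four-channel buffer stored at two precisions.
rgba16f = np.zeros((1024, 1024, 4), dtype=np.float16)
rgba32f = np.zeros((1024, 1024, 4), dtype=np.float32)
print(rgba16f.nbytes // 2**20, "MiB vs", rgba32f.nbytes // 2**20, "MiB")  # 8 vs 16

# FP16 is plenty for some data, too coarse for other data:
print(float(np.float16(0.2)) - 0.2)    # ~ -5e-05 error: fine for a colour channel
print(float(np.float16(4097.0)))       # 4096.0: too coarse for large values
```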
 
I don't see Microsoft adding fp16 in PS3.0.

I think an integer unit and an fp unit make sense; however, having both fp16 and fp32 seems like a foolish waste of transistors.

I don't get your argument about data precision formats, because CPUs have max-precision FP units and deal with smaller data precisions all the time.

I think fp16 only exists so Nvidia can stuff two pixels down a 128-bit pipe instead of one. Works great for DX8 stuff.
 
Doomtrooper said:
sonix666 said:
No, they support 4x OG FSAA and that is 4x FSAA, no matter what you will claim. ATI's 4x RG FSAA looks better at near horizontal/vertical edges, while nVidia's 4x OG FSAA looks better at diagonal edges. So, both ATI's and nVidia's FSAA implementations have their pros and cons.

Don't agree, the FSAA tester doesn't either.

http://www.nvnews.net/files/graphics/screenshots/fsaatester/aa_comparison.shtml
The FSAA tester does agree completely. ATi looks better at near horizontal and near vertical edges than nVidia. nVidia looks better at near diagonal edges. Not only is it very visible in the screenshots, it's simply a technical fact. If you can't see it, I'd suggest you buy (new) glasses. ;)

But OK, one thing where ATi has an advantage over nVidia is that they do gamma-corrected FSAA.
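For what "gamma corrected" buys you, here is a tiny sketch assuming a simple 2.2 power-law curve (not the exact sRGB transfer function): blending edge samples directly in gamma space comes out darker than blending them in linear space and converting back.

```python
# Blend a white (255) and a black (0) edge sample, with and without gamma correction.
GAMMA = 2.2  # simple power-law approximation, not the exact sRGB curve

def to_linear(c):
    return (c / 255.0) ** GAMMA

def to_gamma(lin):
    return 255.0 * lin ** (1.0 / GAMMA)

naive   = (255 + 0) / 2                                   # average in gamma space
correct = to_gamma((to_linear(255) + to_linear(0)) / 2)   # average in linear space

print(round(naive), round(correct))  # 128 vs ~186: the naive blend is too dark
```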
 
YeuEmMaiMai said:
1. FP16 is only part of the spec when the developer opts for partial precision, not when Nvidia decides to replace shaders that call for FP24 (well, FP32 for Nvidia) with hand-coded FP16 ones.... do you see the difference?
Well, in the case of the Dawn demo, the developer clearly opted for partial precision, so the demo is still a DirectX 9 demo. I am not discussing nVidia's cheating and hand-replacement of shaders in games and demos.

YeuEmMaiMai said:
If the demo was GPU limited, it is only GPU limited on NVidia's hardware, since we already determined that it runs faster on ATi's. Personally, I think it is CPU limited, as my 1.8 GHz AMD 2200+ and 9500 Pro get an average of 35 fps at 1600x1200x32.
Well, GPU limited means that getting a better CPU will only have a small impact on performance. CPU limited means that getting a better GPU will only have a small impact on performance. I think it is GPU limited no matter whether an ATi or nVidia card is used. But as soon as the download is done, I will try it.
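The distinction can be put in one line: frame time is roughly the slower of the CPU and GPU work per frame, so speeding up the side that is not the bottleneck barely moves the number. A toy model with made-up timings:

```python
# Toy model: a frame takes roughly as long as the slower of the CPU and GPU work.
def fps(cpu_ms, gpu_ms):
    return 1000.0 / max(cpu_ms, gpu_ms)

print(fps(cpu_ms=10, gpu_ms=28))  # ~35.7 fps, GPU limited
print(fps(cpu_ms=5,  gpu_ms=28))  # still ~35.7 fps: a faster CPU barely helps
print(fps(cpu_ms=10, gpu_ms=14))  # ~71.4 fps: a faster GPU is what helps
```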

YeuEmMaiMai said:
The gripe I have is that FP16 (contrary to what Nvidia claims) clearly is not sufficient, for if it was, MS would have adopted it and we would not be having this discussion.
MS has adopted it, but only when partial precision hints are used. In my opinion, when the extra precision of FP24 or FP32 doesn't bring anything extra, a developer should use FP16. Somehow, some people here argue that you shouldn't, while in normal software where performance matters, people are used to picking the precision that is sufficient and brings better performance.

Again, I am not discussing cheating/shader replacement in drivers.

YeuEmMaiMai said:
Let's take a look at Nvidia's history concerning IQ, shall we?

NV1 failure
I have never seen it, so I can't judge its quality. However, wasn't this the board that supported quads, which Microsoft wasn't bold enough to include support for in their first Direct3D version?

YeuEmMaiMai said:
Riva 128: inferior IQ to every other card out there (load up Jedi Knight on it and notice it cannot render transparent textures); inferior IQ compared to the S3 ViRGE, ATI Rage 3D, Rendition Verite, 3dfx, Matrox, etc., but it was fairly fast, so it sold.
2D was crap at anything above 1024x768.
Can't judge it because I have never seen it.

YeuEmMaiMai said:
TNT/TNT2: once again inferior IQ to every other card out there, including the Rage IIc, Savage3D/4, Rendition Verite 2x00, Banshee and Matrox; very fast, so it sold.
Image quality in 32-bit 3D was far from inferior to any other card. If you mean 2D quality (sharpness of the image), I can agree with you.

YeuEmMaiMai said:
Ditto for GF1/2/3, and finally in GF4 they get their 2D act together, but they still have trouble with aniso and FSAA quality.
What is bad about the image quality of the GF line? The only gripe one can have is that they didn't implement 32 bit interpolation for the DXT1 texture compression, resulting in banding where smooth color changes occurred in textures.
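To illustrate the interpolation issue: a DXT1 block stores two RGB565 endpoint colours and derives two more at 1/3 and 2/3 between them. A rough Python sketch of the 6-bit green channel (rounding rules simplified, endpoint values invented) shows how interpolating before expanding to 8 bits collapses steps that a higher-precision path keeps:

```python
# DXT1: two RGB565 endpoints, plus colours interpolated at 1/3 and 2/3 between them.
def expand6(g6):
    """Expand a 6-bit green value to 8 bits."""
    return (g6 << 2) | (g6 >> 4)

g0, g1 = 32, 33  # two close 6-bit endpoints (invented values)

# Interpolate while still at 6 bits, then expand (roughly the banding-prone path):
low_prec = [expand6((2 * g0 + g1) // 3), expand6((g0 + 2 * g1) // 3)]

# Expand to 8 bits first, then interpolate:
e0, e1 = expand6(g0), expand6(g1)
high_prec = [(2 * e0 + e1) // 3, (e0 + 2 * e1) // 3]

print(low_prec, high_prec)  # [130, 130] vs [131, 132]: the 6-bit path loses the steps
```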
 
rwolf said:
I don't see Microsoft adding fp16 in PS3.0.
They already support it in PS 2.0, but only using precision hints. But in normal application programming (not for GPUs), as a developer I give precision hints all the time: hey compiler, here you need a 16-bit integer, here a 32-bit integer, now use a 32-bit float, now do it with a 64-bit float. Choosing your minimum precision should be standard practice even for pixel shader programmers.

rwolf said:
I think an integer unit and an fp unit make sense; however, having both fp16 and fp32 seems like a foolish waste of transistors.
An FP32 unit can also do FP16. For example, x86 CPUs (P4, Athlon, etc.) have an 80-bit FPU, but it can do 80-bit, 64-bit and 32-bit floating point calculations. You don't need a separate FP16 unit to support FP16 calculations.
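The same idea in ordinary code, as a rough numpy analogy (nothing here is GPU-specific): you pick the narrowest type that is good enough, and a wider FP unit serves the narrow data by computing wide and rounding on store.

```python
import numpy as np

# Choosing the narrowest type that is good enough -- the "precision hint":
colours = np.random.rand(1024, 1024, 4).astype(np.float16)  # FP16 is plenty here
depth   = np.random.rand(1024, 1024).astype(np.float32)     # depth wants FP32

# A wide FP unit can still serve the narrow data: compute wide, round on store
# (analogous to an FP32 ALU producing an FP16 result).
halved = (colours.astype(np.float32) * 0.5).astype(np.float16)
print(colours.dtype, depth.dtype, halved.dtype)  # float16 float32 float16
```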
 
Two things:

Dawn was a showcase for CineFX, not DirectX9. A DX9 showcase utilizing OpenGL would be curious to say the least.

S3TC/DXTC was fixed in GF4. Easy to prove. Just look at Isako's forehead in NOLF2 on a GF3 & a GF4. Also compare SOF1 using S3TC.
 
radar1200gs said:
S3TC/DXTC was fixed in GF4. Easy to prove. Just look at Isako's forehead in NOLF2 on a GF3 & a GF4. Also compare SOF1 using S3TC.
Not really. The GF4 used a dithering method to attempt to improve image quality, while still using the same 16-bit decompression. This means that the GF4 looks a lot better than the GF3 whenever textures are minified, but looks absolutely terrible when textures are under magnification. One particularly ugly example was the boat level in the original UT (the name escapes me).
 
The dithering option is OpenGL only and can be toggled with RivaTuner. I just tested in SOF1 and could not notice any difference whatsoever between dithering enabled and disabled.

NOLF2 is a DirectX title and you can clearly see the improved DXTC. The first easily noticeable level is "The Death of Cate Archer", in the cutscene where Isako wounds Cate. Look at Isako's forehead, particularly the edges of the circle in the middle.

Also visible in NOLF1: skin tones on the models in the UNITY briefing room, for example.

EDIT:
I don't have UT installed (not a multi-player gamer at all), but, without seeing the level, if it is the water that is ugly, then the developer used the wrong (DXTC1/S3TC1) format and should have used DXTC3/S3TC3 instead. DXTC1/S3TC1 is unsuitable for alpha transparency and light maps.
 
The boat level in UT is called Galleon.


BTW, saying that you never saw it doesn't mean it isn't true. I have seen it both in person and in reviews, so it is true.


As for the NV1, the whole chip was a failure, and again it was not the fault of MS; almost everyone was using triangles (except Nvidia, who did not want to support them) and MS went with the common approach.
 
radar1200gs said:
EDIT:
I don't have UT installed (not a multi-player gamer at all), but, without seeing the level, if it is the water that is ugly, then the developer used the wrong (DXTC1/S3TC1) format and should have used DXTC3/S3TC3 instead. DXTC1/S3TC1 is unsuitable for alpha transparency and light maps.
UT uses DXT1 only. There are no textures in UT that require more than one bit of alpha, and since the hardware at the time had no problem with DXT1, that format was used.

One of the major problems with the high-res compressed textures in UT is that most of the textures in the game are replaced, not just ones with high-res counterparts. This can cause visual problems on GF3/GF4 cards for coronas and some sky textures.
 
sonix666 said:
rwolf said:
I think an integer unit and an fp unit make sense; however, having both fp16 and fp32 seems like a foolish waste of transistors.
An FP32 unit can also do FP16. For example, x86 CPUs (P4, Athlon, etc.) have an 80-bit FPU, but it can do 80-bit, 64-bit and 32-bit floating point calculations. You don't need a separate FP16 unit to support FP16 calculations.
The only benefit of supporting smaller data types is in I/O bandwidth. That's the way it goes on CPUs, and that's the way it goes in proper GPU designs (yes, R300 can handle 16bit FP external data, such as textures and render targets).

It's still a complete waste of time for on-chip temporaries. Its only relevance is in a decision between
a)you have a sufficiently sized register file
b)you don't

Pick "a" and be done with it.

I'm aware that 3DNow/SSE increase the effective size of the register file, akin to FP16 on NV3x. But even the x87 floating point register space (8 scalars) is sufficient. If you need more than 8 temporaries, you're likely limited by computation anyway, and you also have rather more ILP to work with, so the savings in IO wouldn't buy you much.

The 'legacy' x87 FPU model sucks for a lot of reasons, but it's not the size of the register file. OTOH NV3x sucks for a lot of reasons, too, but this time register file size is high on the list. That's the issue: supporting FP16 is just a lame excuse for not solving the root problem.
 
zeckensack said:
The only benefit of supporting smaller data types is in I/O bandwidth. That's the way it goes on CPUs, and that's the way it goes in proper GPU designs (yes, R300 can handle 16bit FP external data, such as textures and render targets).

I disagree. A really clever GPU design uses an FPU that runs FP32 at full speed and FP16 at a higher speed. It is possible to build such an FPU that uses the same adder/half-adder array for both formats. You only need a little bit more control logic.
 
Demirug said:
zeckensack said:
The only benefit of supporting smaller data types is in I/O bandwidth. That's the way it goes on CPUs, and that's the way it goes in proper GPU designs (yes, R300 can handle 16bit FP external data, such as textures and render targets).

I disagree. A really clever GPU design uses an FPU that runs FP32 at full speed and FP16 at a higher speed. It is possible to build such an FPU that uses the same adder/half-adder array for both formats. You only need a little bit more control logic.
True. AFAIK AMD's SSE/3DNow implementations work this way.

It's evident that this isn't happening on NV3x though. Also keep in mind that the same register file issue still applies if you do that: if you don't have enough storage for one FP32 vector, you don't have enough storage for two FP16 vectors either. As it seems, there is barely enough space for one FP16 vector (per clock, per op ... you're the expert ;)). The storage issue must be solved first, otherwise there will be no benefit.
 
zeckensack said:
True. AFAIK AMD's SSE/3DNow implementations work this way.

It's evident that this isn't happening on NV3x though. Also keep in mind that the same register file issue still applies if you do that: if you don't have enough storage for one FP32 vector, you don't have enough storage for two FP16 vectors either. As it seems, there is barely enough space for one FP16 vector (per clock, per op ... you're the expert ;)). The storage issue must be solved first, otherwise there will be no benefit.

I am talking about a clever design, not NV3X. Possibly it was clever in the beginning, before they had to cut it down.

NV35 has space for 2 FP32 vectors per pixel. If you use more than this, it will slow down.
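A toy occupancy model of the register-pressure point both posters are making: with a fixed register file, more live temporaries per pixel means fewer pixels in flight to hide latency, which is the slowdown described above. All numbers below are hypothetical, not NV35 specs:

```python
# Toy register-file occupancy model. The file size is an invented figure.
REGISTER_FILE_VECTORS = 256  # total FP32 four-vectors shared by all pixels in flight

def pixels_in_flight(fp32_temps_per_pixel):
    return REGISTER_FILE_VECTORS // fp32_temps_per_pixel

for temps in (1, 2, 4, 8):
    print(temps, "FP32 temps ->", pixels_in_flight(temps), "pixels in flight")

# FP16 temporaries halve the per-pixel cost, so they postpone the cliff,
# but a sufficiently sized register file removes it altogether.
```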
 
DemoCoder said:
Nope, multiprecision will always exist. First of all, it already exists with respect to input/output (look at all the texture formats and framebuffer formats), so storage is already multiprecision. Developers want the ability to deal with multi-precision data. We don't render everything at HDR resolution just because "it is easy and simple" to always use max-precision on everything.

What's wrong with doing everything in high precision if speed is not affected?
 