MaxPC on R420/NV40 Precision

Nvidia FUD said:
Also, if you want to use FP16 you'll have a smaller frame buffer so it has a lower footprint in memory as well.

What total utter and complete bullshit. Lord, what a bunch of fvcking liars. :oops:

I've never heard of a consumer card with more than 32-bit front buffers. Oh well.
 
akira888 said:
Nvidia FUD said:
Also, if you want to use FP16 you'll have a smaller frame buffer so it has a lower footprint in memory as well.

What total utter and complete bullshit. Lord, what a bunch of fvcking liars. :oops:

I've never heard of a consumer card with more than 32-bit front buffers. Oh well.

Maybe that person meant pixel buffers, and got confused and for some reason stuck "frame buffer" in there? Makes sense, since you can't have an FP frame buffer :D
 
Toasty said:
I believe the 64bit FP refers to the precision of the framebuffer. AFAIK, framebuffers (ones read by the RAMDAC) are still 32bit INT on R3xx/NV3x.

Well, they don't say *where*. They do use the word "confirmed" re NV40 though, which I assume was not used lightly. Maximum PC really is pretty credible and has good sources. I would be surprised if there isn't increased precision in NV40 *somewhere* that can credibly be described as "64-bit". Whether that increased precision is in the same place(s) as the increased precision they are claiming for R420 is a separate question.

One could see marketing pukes at NV being in favor of such a move, salivating over the "64-bit precision!" checkbox no matter how inconsequential it is in practice, particularly if ATI does bump up to 32-bit.
 
sonix666 said:
If only things were that easy. First of all, 16-bit and 32-bit formats don't match at all. Secondly, I don't believe for one second that you can just combine two FP16 units to create an FP32-compatible one. It simply isn't possible, in my opinion.
I think you can. Do you know how an adder is constructed? Anyway, imagine this. Each letter represents a 16-bit word:
Code:
 (32-bit addition)
  ab
 +cd
=efg
If you think about the underlying operations, the only difference is the one bit that gets carried over between the two words. This is obviously trivial with integers (all you need is a single switch), but can it be done with floating point numbers?
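That carry handoff can be sketched in software (my own illustration, not any particular hardware):

```python
# Build a 32-bit add out of two 16-bit adds; the only coupling between
# the halves is the single carry bit mentioned above.
def add32_from_16(x, y):
    MASK16 = 0xFFFF
    a, b = x >> 16, x & MASK16   # high/low words of x ("a" and "b" above)
    c, d = y >> 16, y & MASK16   # high/low words of y ("c" and "d" above)
    lo = b + d                   # low 16-bit adder
    carry = lo >> 16             # the one bit passed between the words
    hi = (a + c + carry) & MASK16  # high 16-bit adder takes the carry in
    return (hi << 16) | (lo & MASK16)
```

Gating that carry off is the "single switch" that turns the unit back into two independent 16-bit adders.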

I would tend to believe so. It would basically require having two separate exponent calculation units.

A multiplier can be similarly broken up:
Code:
(32-bit multiplication)
   ab
  *cd
=efgh
If you work this out, it seems like the results a*c and b*d are stored at some point along the line before the final result is calculated. Similarly, you would need separate exponent logic, but the rest shouldn't be that expensive.
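The same decomposition, again as a software sketch of my own: the full product is assembled from the four 16×16 partial products, with a*c and b*d appearing directly as the discussion above suggests.

```python
# Schoolbook decomposition of a 32-bit multiply into 16x16 partial products:
# x*y = (a*c << 32) + ((a*d + b*c) << 16) + b*d
def mul32_from_16(x, y):
    MASK16 = 0xFFFF
    a, b = x >> 16, x & MASK16
    c, d = y >> 16, y & MASK16
    return ((a * c) << 32) + ((a * d + b * c) << 16) + (b * d)
```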

Edit:
I think it's more than possible to make a single FP32 unit work like two FP16 units. The only question is: how many more transistors will it take? Would it be worth it?
 
geo said:
Well, they don't say *where*. They do use the word "confirmed" re NV40 though, which I assume was not used lightly. Maximum PC really is pretty credible and has good sources. I would be surprised if there isn't increased precision in NV40 *somewhere* that can credibly be described as "64-bit".
As a couple of others have said, I think the most likely explanation is support for a 64-bit framebuffer. This would make the most sense, and an FP16 framebuffer is best described as a 64-bit framebuffer (it's 64 bits per pixel, after all).

Does this mean that the NV40 will support all operations on a 64-bit framebuffer that they currently support on a 32-bit framebuffer? I think that would be the best-case scenario, and would be, in my opinion, just about the only implementation of a 64-bit framebuffer that would mean anything of importance and make the statements about "greater high dynamic range" and whatnot meaningful. FP64 wouldn't be useful for HDR.
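For the footprint side of the argument, the arithmetic is straightforward (the resolution here is just an example of mine):

```python
# Bytes occupied by a framebuffer: 4 channels at the given depth per channel.
def framebuffer_bytes(width, height, bits_per_channel, channels=4):
    return width * height * channels * bits_per_channel // 8

rgba8 = framebuffer_bytes(1600, 1200, 8)    # conventional 32 bits/pixel
fp16  = framebuffer_bytes(1600, 1200, 16)   # FP16 RGBA: 64 bits/pixel
# fp16 is exactly twice rgba8 -- hence "64-bit framebuffer"
```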
 
AndrewM said:
akira888 said:
Nvidia FUD said:
Also, if you want to use FP16 you'll have a smaller frame buffer so it has a lower footprint in memory as well.

What total utter and complete bullshit. Lord, what a bunch of fvcking liars. :oops:

I've never heard of a consumer card with more than 32-bit front buffers. Oh well.

Maybe that person meant pixel buffers, and got confused and for some reason stuck "frame buffer" in there? Makes sense, since you can't have an FP frame buffer :D

framebuffer != front buffer

We already have 64-bit and 128-bit frame buffers on the r(v)3x0 cards (nv3x cards have them too, they just aren't standards compliant), they are just fairly limited (as John said in that plan update).

I can see no real point to a 64 or 128-bit front buffer.

BTW, the DX Next slides said that the 2004 line of GPUs are going to have proper FP framebuffer blending; perhaps that's what was meant by the article?
 
Ilfirin said:
AndrewM said:
akira888 said:
Nvidia FUD said:
Also, if you want to use FP16 you'll have a smaller frame buffer so it has a lower footprint in memory as well.

What total utter and complete bullshit. Lord, what a bunch of fvcking liars. :oops:

I've never heard of a consumer card with more than 32-bit front buffers. Oh well.

Maybe that person meant pixel buffers, and got confused and for some reason stuck "frame buffer" in there? Makes sense, since you can't have an FP frame buffer :D

framebuffer != front buffer

Thanks, but I do understand the distinction, and the limitations. Some people, though, don't... :D
 
Ilfirin said:
I can see no real point to a 64 or 128-bit front buffer.
Oh, forgot to comment on this.

I can: higher-precision DACs (of course, the most you'd want for that is a 16-bit per channel integer front buffer)
 
Chalnoth,

Integer addition and subtraction at higher-than-supported precision isn't hard on CPUs, thanks to carry-bit support. Multiplication, however, is harder, and division has to be emulated completely in software.

Floating point, however, has the problem of the exponent, and I don't see how that problem can be solved by pasting two FP16 units together.

I think my point is strengthened by the fact that no existing CPU uses two 32-bit FPU units to do a 64-bit FPU operation. The floating point units are 64-bit (or 80-bit in the case of x87), and the speedup at lower precision comes only from fewer bits to calculate and less memory bandwidth pressure.
 
sonix666 said:
floating points however have the problem of the exponent and I don't see any way how this exponent problem can be solved by pasting two FP16 units together.
First of all, I'm not considering pasting two FP16 units to make one FP32 unit. I'm considering taking one FP32 unit and modifying it to support two FP16 operations.

Imagine this.

Addition:
I believe FP addition essentially calculates the difference in the exponents, and then offsets one of the mantissas, whichever has the lesser exponent. Then it's just integer addition. So, with separate logic units for handling the exponent of each FP16 number (you'd have one 8-bit exponent unit for FP32 or one FP16 number, and one 5-bit exponent unit for the second FP16), doing this with addition would be trivial.
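A toy model of that alignment step (mine, and deliberately not IEEE-complete: no rounding, sign folded into an arbitrary-precision integer mantissa, value represented as m * 2**e):

```python
def fp_add(m1, e1, m2, e2):
    # Align both operands to the lesser exponent; real hardware would
    # instead shift the lesser-exponent mantissa right into a
    # fixed-width register, discarding low bits.
    if e1 < e2:
        m2 <<= (e2 - e1)
        e = e1
    else:
        m1 <<= (e1 - e2)
        e = e2
    return m1 + m2, e   # after alignment it really is integer addition
```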

Multiplication:
Again, would need separate logic for the exponent of the second FP16, but this should be similarly simple. Once the exponent math is completed, again it's simple integer multiplication.
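Multiplication in the same toy representation of mine is even shorter: the exponents go through a small integer adder, the mantissas through the integer multiplier (renormalization omitted here):

```python
def fp_mul(m1, e1, m2, e2):
    # (m1 * 2**e1) * (m2 * 2**e2) = (m1*m2) * 2**(e1+e2)
    return m1 * m2, e1 + e2
```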

Division:
I don't see a problem with not worrying about doing two FP16 divisions.

I think my point is strengthened by the fact that no existing CPU uses two 32-bit FPU units to do a 64-bit FPU operation. The floating point units are 64-bit (or 80-bit in the case of x87), and the speedup at lower precision comes only from fewer bits to calculate and less memory bandwidth pressure.
Most CPUs also aren't made to handle highly-parallel data, so I'm not sure this says anything.
 
Chalnoth said:
Addition:
I believe FP addition essentially calculates the difference in the exponents, and then offsets one of the mantissas, whichever has the lesser exponent. Then it's just integer addition.

You are missing the re-normalization step.


Btw, I think the best way to double FP16 performance is to have a separate FP16 unit (in addition to the FP32 unit which can also do FP16).
 
Chalnoth said:
Division:
I don't see a problem with not worrying about doing two FP16 divisions
Don't bother speculating on this, as there's no division instruction in any shader spec. The HLSLs support it, but probably only by taking the reciprocal of the denominator and multiplying. I've read that floating point division on CPUs is rather expensive, so it's no surprise that it's not natively supported in GPUs.
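A sketch of that lowering (my own illustration; the truncated lookup below merely stands in for whatever approximation a real RCP unit uses, and assumes a positive denominator):

```python
import math

def approx_rcp(b):
    # Crude stand-in for a hardware RCP approximation: split off the
    # exponent, keep only ~6 bits of the mantissa's reciprocal.
    m, e = math.frexp(b)                   # b = m * 2**e, with 0.5 <= m < 1
    m_approx = int((1.0 / m) * 64) / 64.0  # truncate the mantissa reciprocal
    return math.ldexp(m_approx, -e)

def divide_via_rcp(a, b):
    r = approx_rcp(b)
    r = r * (2.0 - b * r)   # one Newton-Raphson step roughly doubles accuracy
    return a * r            # a / b becomes a multiply
```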
 
Hyp-X said:
Chalnoth said:
Addition:
I believe FP addition essentially caculates the difference in the exponent, and then offsets one of the mantissas, whichever has the lesser exponent. Then it's just integer addition.
You are missing the re-normalization step.
Yeah, I assumed that would be part of the exponent logic that was duplicated.

Btw, I think the best way to double FP16 performance is to have a separate FP16 unit (in addition to the FP32 unit which can also do FP16).
That would work, too. And, if the multiplier is the majority of the cost of the math unit, the extra FP16 unit should take roughly 1/4th the transistors of the FP32 unit.
 
Chalnoth said:
Hyp-X said:
You are missing the re-normalization step.
Yeah, I assumed that would be part of the exponent logic that was duplicated.

But that's not just exponent logic. Far from it.

Say you add 1001b and -1000b, the result will be 0001b.
Re-normalization has to find the leading 1 in the mantissa, subtract the shift amount from the exponent, and shift the mantissa.

It's probably the "hardest" part of an FP adder.
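In a toy representation (my own: value = mantissa * 2**exponent, mantissa held in a fixed-width register), that step looks like this:

```python
def renormalize(mantissa, exponent, width=10):
    # Find the leading 1, shift it up to the top bit of the mantissa,
    # and compensate in the exponent so the represented value is unchanged.
    if mantissa == 0:
        return 0, 0              # true zero: nothing to normalize
    shift = width - mantissa.bit_length()
    return mantissa << shift, exponent - shift
```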
 
Hyp-X said:
Say you add 1001b and -1000b, the result will be 0001b.
Re-normalization has to find the leading 1 in the mantissa, subtract the shift amount from the exponent, and shift the mantissa.

It's probably the "hardest" part of an FP adder.
Really? It doesn't seem all that complex to me. First you would pass the mantissa through a "mask" that is designed to only allow one bit to be on, the leftmost one. That shouldn't be tough.

This information would go to two places. One would go to another adder, but first through an encoder that would translate that one "lit" bit to an exponent form. That integer adder would then add that bit to the current exponent (or, usually, it would subtract, so the encoder could be optimized for this eventuality).

The second place is to a shifter for the mantissa. This would be the most expensive part of the process, since you'd have to shift the mantissa by an arbitrary amount. I can't think offhand of an optimal way to do this, but there's probably an off the wall way somebody's thought of that has become the standard method. This operation, however, was done once before at the beginning of the addition, so this part at the end is hardly the most expensive (I'd say it's just as expensive as the section at the beginning of calculation).
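The standard answer to shifting by an arbitrary amount is, I believe, a barrel shifter: log2(width) mux stages, each conditionally shifting by a power of two, so every shift amount costs the same fixed logic depth. A sketch of mine:

```python
def barrel_shift_left(value, amount, width=32):
    mask = (1 << width) - 1      # model a fixed-width register
    stage = 0
    while (1 << stage) < width:          # log2(width) stages in total
        if amount & (1 << stage):        # one mux layer per bit of 'amount'
            value = (value << (1 << stage)) & mask
        stage += 1
    return value
```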
 
Just thought I could shed some light on the interpretation of the interview rwolf and others linked to: http://www.beyond3d.com/previews/nvidia/nv30launch/index.php?p=2

This is not marketing speak or FUD AFAIK. It's sheer lack of technical knowledge from NVIDIA's PR department. This, IMO, is not worse than FUD. It was one of my main points in ULE actually, and one I wasn't clear enough about: NVIDIA's PR personnel don't have sufficient technical knowledge to deal with the "hardcore" enthusiast community. This results in them telling lies which they believe are true, then getting insulted for it. Of course, much of the time it's pure FUD; but people are exaggerating how much FUD NVIDIA *wants* to feed us.

Now, there's some stuff about this in ULE, but I'll try to say it in more details...

1. [stuff]
2. BB says in an obviously sincere way that he believes FP16 = twice the performance of FP32.
3. I say to him I'm going to use that quote in ULE
4. He badly misinterprets it, probably thinks I'm bashing him in particular and trying to get him fired; let me restate that it was never my goal with ULE to cause any terminations. He says, among other things: "I don't care if you think I need to know that. It's not my job to know whether FP16 is or isn't twice as fast as FP32. I've been doing this for 10 years, and I'm great at my job" (paraphrase, obviously).

From this, I guess that PR and Marketing personnel were briefed with severely lacking technical facts regarding the NV30. Whether this is what caused the "overhyping" of the NV30, I don't know. Fact is, the communication between PR and Engineering is absolutely catastrophic; and don't tell me they told them this stuff because they knew it would look better for them.
What's better: "Our hardware makes FP16 twice as fast as FP32 because it supports both formats natively"
Or: "Our hardware is capable of running FP32 operations at speeds extremely near to those of FP16 operations, while saving tremendous performance by not using more than FP16 for rendertargets. We support the best of both worlds, along with a featureset which easily benefits from FP32, compared to the competition's lower precision format(s)."

No, seriously. I don't know about NVIDIA's inter-team communication systems, which are supposedly so complex and yet so powerful. I've heard very good things about them, but very bad ones too. I just can't judge, as some people might be talking out of their asses. But I firmly believe their inter-departmental communications, well, suck.

Fixing it wouldn't be particularly hard, either. Brian Burke, Derek Perez, and so on are doing an astonishingly good job of handling the low-end and mid-range markets IMO. Problem is, for the tech-savvy people, their technical knowledge simply isn't sufficient. IMO, it might even be better to have someone with little PR experience but a lot of tech experience for that part of the market.

The problem is NVIDIA doesn't seem to recognize the differences between the market segments enough. Simply hiring someone with good GPU experience, and maybe training him a few weeks on specifics, would do the trick. And don't tell me it's hard to find such a person, either - the forums of Beyond3D, for example, have quite a ton of people who would fit the job, although they might need some training on the basics of certain things. And there are tons of other places they could fetch such a person from. Even converting one of their own engineers might work, who knows!

The last thing I want to see is someone like BB getting fired. That's ridiculous. NVIDIA as a whole would be in an even bigger mess right now if it wasn't for people like him. But perhaps the enthusiast market would be more favorable to them; which is why other people, or to be even more accurate, a different type of people, are required to deal with that market.

Uttar
 
Uttar said:
. . .Problem is, for the tech-savvy people, their technical knowledge simply isn't sufficient. IMO, it might even be better to have someone with little PR experience but a lot of tech experience for that part of the market.

Uttar

This is why I support the use of Richard Huddy that ATI has made in this area, and wish they would do so even more. . .
 