Games and Pixel Shader 2.0

Heathen said:
Microsoft's spec is holding it back,

Interesting placement of blame. Especially considering all the hype Nvidia were pumping out. "A dawn of cinematic computing." Not sure whether to laugh or cry.

It was Nvidia's choice to put such an unbalanced design together.
Irrespective of whether or not the support of the integer format for calculation was a good idea, Microsoft's refusal to support any integer shading format makes things much harder on the FX line (below NV35). Without API support, and since the NV30-34 need to calculate many things at integer precision for decent performance, programmers are going to be unable to make the NV30-34 look good and perform well in all cases.

If there were API support for integer types, programmers could use FP whenever necessary. Since nVidia is essentially forced to break with the spec and use integer calcs anyway, many of the cases where integer gets used will be detected incorrectly, causing quality loss.

I say it's one hell of a lot easier for Microsoft to add integer types to the spec than it is for nVidia to change the hardware.

Depending on the calculation, one can use integer formats with no noticeable quality loss. It all depends on the math. Remember that the display output will always be 8-bit integer anyway, and nVidia's integer calculation format is 12-bit. I will admit that it is more than possible for nVidia to have miscalculated how many operations should use integer format, but it is absolutely silly to think that all calculations need floating point. Why do you think we still have integer units on CPUs?
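To make that concrete, here is a minimal sketch in C (not shader code; the values and the single modulate step are purely illustrative): one diffuse-times-texture multiply done in full floating point and again in 12-bit fixed point lands on the same 8-bit channel value once it is quantised for the framebuffer.

Code:
#include <stdio.h>
#include <stdint.h>

/* Quantise a value in [0,1] to an 8-bit framebuffer channel. */
static uint8_t to_8bit(float x) { return (uint8_t)(x * 255.0f + 0.5f); }

int main(void)
{
    float diffuse = 0.73f, texel = 0.41f;

    /* Full floating-point multiply. */
    float fp_result = diffuse * texel;

    /* The same multiply in 12-bit fixed point (values scaled to 0..4095). */
    uint32_t d12 = (uint32_t)(diffuse * 4095.0f + 0.5f);
    uint32_t t12 = (uint32_t)(texel   * 4095.0f + 0.5f);
    float    int_result = (float)((d12 * t12) / 4095u) / 4095.0f;

    /* After quantising to the 8-bit display format, both paths agree. */
    printf("fp  -> %d\n", to_8bit(fp_result));   /* 76 */
    printf("int -> %d\n", to_8bit(int_result));  /* 76 */
    return 0;
}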
 
Chalnoth said:
Irrespective of whether or not the support of the integer format for calculation was a good idea, Microsoft's refusal to support any integer shading format makes things much harder on the FX line (below NV35).

Microsoft's refusal?

Or Nvidia's lack of planning/collaboration?

Without API support, and since the NV30-34 need to calculate many things at integer precision for decent performance, programmers are going to be unable to make the NV30-34 look good and perform well in all cases.

Why is it the programmers' job to make the NV30-34 look good and perform well in all cases?

Isn't it the hardware design that is supposed to help the programmers make programs that look good and perform well?

-----
It seems to me your logic is rather backwards...
 
Microsoft's refusal to support any integer shading format makes things much harder on the FX line (below NV35)

MS didn't design DX9 in isolation you realise?

If there was API support for integer types, programmers could use FP whenever necessary

ATi proved quite conclusively you don't need int.

Since nVidia is essentially forced to break with the spec and use integer calcs anyway

No, nVidia aren't forced to do anything. They chose to break with the spec to get any modicum of speed. Their design choice, once again.

When it comes to nvidia and the NV30 the following saying applies [sarcasm]"My heart bleeds for them"[/sarcasm]
 
No, nVidia aren't forced to do anything. They chose to break with the spec to get any modicum of speed. Their design choice, once again.
Sure, and Microsoft doesn't change its DirectX9 specs at the last minute out of nothing more than political games.
 
Heathen said:
ATi proved quite conclusively you don't need int.

Yeah, so did Nvidia... if you want to pay for it. But most users don't, and ATi doesn't have an FP product to compete at the budget level. Just because the high-end cards of today don't "need" integer doesn't mean the lower-end cards couldn't benefit from it.
 
sonix666 said:
Sure, and Microsoft doesn't change its DirectX9 specs at the last minute out of nothing more than political games.
You think that Microsoft removed all integer pixel processing from the PS2.0 spec at the last minute? :oops:

Based on recent posts it appears that everyone really was out to get nVidia this generation - deliberately altering the specs, reducing their yields, and worst of all, releasing better products... How could they be expected to compete in an unfair environment like that.
 
I think MS deferred in favor of a "simpler" spec and one that was the least common denominator between ATI and NVidia. Having to support multiple datatypes complicates things, and they were rushing to get DX9 out the door.


ATI didn't "prove" you don't need integer. They simply proved that you could have a simpler 24-bit FP design that could run fast and that the lack of integer wouldn't hurt you on old DX7/DX8 content.

If someone ever delivers a card with the FP24-32 performance of ATI, but with 2x the FP16 and integer performance, and that card blazes in OpenGL2.0 (the only API at the moment that can deal with it in the pixel shaders via HLSL), then in hindsight ATI's decision will look like a conservative design decision, and NVidia's will look like a bungled start but a bold design that panned out.

NVidia and ATI both "prove" you don't need tile-based deferred rendering. But if someone ever delivers a TBR DX9 design with much higher performance, their brute-force, conservative approach will look bad in retrospect.


Remember, history is written by the winners.
 
Luminescent said:
Theoretically NV35 will be faster than NV30 on fp shaders with more than a few instructions and may approach twice the speed on complex shaders. Remember (brings back memories, doesn't it, Uttar), NV35 contains two fp units arranged serially per pipeline (and 4 pipelines). Even though it houses a total of 8 fp units, each pipeline can only write one color result at a time.

Luminescent, have you seen any evidence that actually backs up that it has two FP shaders?
 
DemoCoder said:
On the other hand, OpenGL HLSL fully supports integers in pixel shaders, upcasting to float if you don't have integer support.

So now we have a dichotomy. OGL2.0 will support both 16-bit integer and 32-bit floating-point types in the pipeline (compiled by the driver), whereas DX9 only supports 16-bit and 32-bit floating point. While the DX9 HLSL supports declaring double (64-bit FP) and int types, there is no way to pass this information through to the driver in ps2.0 or ps3.0.

Care to explain how that really helps NVIDIA? They don't support 16-bit integers.
 
DemoCoder said:
I think MS deferred in favor of a "simpler" spec and one that was the least common denominator between ATI and NVidia. Having to support multiple datatypes complicates things, and they were rushing to get DX9 out the door.
I really don't see how making one more data type (there are two currently) complicates things. We've had multiple data types for ages in CPUs. Why shouldn't there be integer data types in GPUs?

There is a definite reason why integer data types are beneficial: they require fewer transistors. This means that if programmers are willing to bother using integer data types, integer units can be added for increased performance without adding huge numbers of transistors.

It just seems absurd to me that Microsoft closed the idea of integer data types.

And, one final thing, remember that the NV30 was in development for probably two years before the release of DirectX 9. The decision to support integer types was likely made very early on.
 
Potentially Stoopid Question :

Does anyone know if the final DX specs are at all influenced (during their formulation period) by demonstrations of actual hardware running the DX features proposed by the IHV of that particular hardware?
 
Luminescent said:
Theoretically NV35 will be faster than NV30 on fp shaders with more than a few instructions and may approach twice the speed on complex shaders. Remember (brings back memories, doesn't it, Uttar), NV35 contains two fp units arranged serially per pipeline (and 4 pipelines). Even though it houses a total of 8 fp units, each pipeline can only write one color result at a time.

Yep, it does bring back memories :)
But I've recently been trying to get this type of info in other ways - sources. There is a slight mess with the numbers, which seems to stem from the fact that some info came from the 12x1/6x2 days of the NV30 (RIP).
It seems the NV30 has 4 PS FP units, and 3 VS FP units, both FP32. I'm not sure about the register usage stuff yet.

I'm not sure about integer existing in earlier designs either. But there are so many of them it wouldn't be too easy at all to track that stuff, eh...


Uttar
 
Chalnoth said:
I really don't see how making one more data type (there are two currently) complicates things. We've had multiple data types for ages in CPUs. Why shouldn't there be integer data types in GPUs?

Why should there be? Just because it's a good/necessary idea in one case for whatever reason (legacy support perhaps) doesn't make it so in another. For example, aren't DSPs typically designed around a fixed width MAC unit? In my opinion having two precisions is already one precision too many.

Does a P4 necessarily execute this instruction:
Code:
add bl, al
faster than:
Code:
add ebx, eax
?

In fact, if you mix the widths of data types on a P4 you can end up with all sorts of nasty things happening like partial register stalls that actually slow your code down.

VPUs are not remotely similar to CPUs in terms of architecture, yet people seem obsessed with saying "It works this way on a CPU, so it should be right for a VPU".

Current VPUs are high-speed streaming SIMD machines, not general purpose processors.

My understanding is that on similar types of architecture in the past (high-speed SIMD or vector processors like CRAYs, etc.) there has always been a tendency for the high-speed vector unit to run only at its native execution width.

There is a definite reason why integer data types are beneficial: they require fewer transistors. This means that if programmers are willing to bother using integer data types, integer units can be added for increased performance without adding huge numbers of transistors.

Preaching the benefits of integer data types using fewer transistors is all very well, but if you support a full-width data type then adding integer units comes at extra cost, not less. And if, as a result, you end up with fewer full-width execution units, then you've slowed your architecture down in the most general case in order to speed it up in a specific, less generally applicable one.

It just seems absurd to me that Microsoft closed the idea of integer data types.

Not remotely absurd. I guess they evaluated the idea and decided it was a bad one, or at least they weren't convinced enough that it was a good one. Why do you believe that you necessarily know better than ATI and Microsoft, or, by extension, that the decision to support integer wasn't simply the wrong one?

And, one final thing, remember that the NV30 was in development for probably two years before the release of DirectX 9. The decision to support integer types was likely made very early on.

And I believe it was a bad decision given that timeframe, but that's just my opinion.
 
StealthHawk said:
Care to explain how that really helps NVIDIA? They don't support 16-bit integers.

Doesn't help them currently, but could help a future architecture that supports FX16. It really helps 3DLabs, which does support 16-bit integers in the pipeline.

andypski said:
In fact, if you mix the widths of data types on a P4 you can end up with all sorts of nasty things happening like partial register stalls that actually slow your code down.

Except if we're talking MMX and packed types. Here, I can operate on eight packed 8-bit values in the time it takes to operate on one 64-bit value. Certainly operations can be done far more quickly if you can operate on differently-sized packed datatypes (convolutions, DCT, FFT, etc.).
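A rough sketch of the packed-arithmetic idea in C (SSE2 intrinsics used purely for illustration, with 16-bit lanes since that is what the packed multiply instructions actually provide; the MMX __m64 version is analogous):

Code:
#include <stdio.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 packed-integer intrinsics */

int main(void)
{
    /* Eight 16-bit multiplies issued as a single instruction,
       instead of eight separate scalar multiplies. */
    int16_t a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    int16_t b[8] = {10, 10, 10, 10, 10, 10, 10, 10};
    int16_t r[8];

    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)r, _mm_mullo_epi16(va, vb));

    for (int i = 0; i < 8; ++i)
        printf("%d ", r[i]);       /* prints: 10 20 30 40 50 60 70 80 */
    printf("\n");
    return 0;
}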

chalnoth said:
I really don't see how making one more data type (there are two currently) complicates things.

Because Microsoft is trying to coordinate lots of vendors who have incompatible implementations, and, believe it or not, adding a single new datatype can cause alterations all over your spec.

For example, all of the library functions have to be defined with overloaded versions that take ints or half floats, and the specifics of casting, overflow, saturation, and underflow have to be considered.
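As a hedged illustration of the kind of detail that has to be nailed down (plain C, with a made-up 12-bit fixed-point range standing in for whatever a GPU might use): the spec has to say whether an out-of-range integer result wraps or saturates, whereas the floating-point path simply keeps scaling.

Code:
#include <stdio.h>

#define FX12_MAX 4095u   /* hypothetical 12-bit fixed-point range: 0..4095 maps to 0.0..1.0 */

/* Saturating add - one possible overflow rule the spec would have to pin down. */
static unsigned fx12_add_sat(unsigned a, unsigned b)
{
    unsigned s = a + b;
    return s > FX12_MAX ? FX12_MAX : s;
}

int main(void)
{
    unsigned bright = 3500, glow = 1200;              /* both near the top of the range */

    unsigned wrapped   = (bright + glow) & FX12_MAX;  /* wrap-around rule: 604  (visibly wrong) */
    unsigned saturated = fx12_add_sat(bright, glow);  /* saturation rule:  4095 (clamps to white) */
    float    as_float  = 3500/4095.0f + 1200/4095.0f; /* float simply exceeds 1.0 and carries on */

    printf("wrap: %u  saturate: %u  float: %f\n", wrapped, saturated, as_float);
    return 0;
}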

The 3D APIs are so complex now that there is a lot of checking that has to be done to make sure new features don't interact badly with existing ones. That's why OpenGL extensions contain lots of verbiage on how the extension interacts with other extensions and API calls.
 
There is a definite reason why integer data types are beneficial: they require fewer transistors. This means that if programmers are willing to bother using integer data types, integer units can be added for increased performance without adding huge numbers of transistors.
For me the most important reason to keep integers is that their precision doesn't float. For a CPU and normal applications this is very important, but for graphics it might not be a problem. Most graphics problems lend themselves perfectly to the scaling precision of floating point.
 
DemoCoder said:
Except if we're talking MMX and packed types. Here, I can operate on eight packed 8-bit values in the time it takes to operate on one 64-bit value. Certainly operations can be done far more quickly if you can operate on differently-sized packed datatypes (convolutions, DCT, FFT, etc.).

Yup - there are different cases, and it's something to weigh up when making design decisions. The MMX unit in this case reuses resources (i.e. it splits the 64-bit unit into 8-bit units) rather than adding additional execution units to perform narrower packed arithmetic (which would be a waste). In addition, the instruction dispatch is the same (i.e. you are still issuing a single instruction in the same time-slot, not trying to issue more instructions in a single slot).

You can certainly argue for this sort of packed operation on VPUs - dual 16-bit or single 32-bit per-cycle, with pack/unpack, for example. This doesn't actually appear to be the design route that was taken in terms of NV3x, though.

Whether it makes sense to design this sort of packed arithmetic into the overall environment of a VPU depends on whether it provides significant increases in overall throughput when the IO characteristics are taken into account, and then whether the cases that it accelerates are ones that are deemed to be of interest. When considering a forward looking approach, do you want maximum performance on high-precision data or low precision? Is the low precision basically a legacy path, or is it worth the extra design effort to maximise this performance case as well? All of these things must be considered when choosing the architecture of the system.

I don't think there's been any definitive case made here - the opportunity certainly exists for someone to produce an architecture that demonstrates the potential of a packed arithmetic approach.

We can compare the approaches that are currently out there, however, and I know which one I prefer ;)
 
Whether using integer could be a good thing doesn't matter anymore, since nVidia is not using integer in the NV35 and won't in the NV40 either. And the NV40 probably ( I did NOT say certainly ) won't even have multiple datatypes.

So let's stop this completely useless flamewar, okay? ;)


Uttar
 
Uttar said:
So let's stop this completely useless flamewar, okay? ;)
Funny - I thought this was a well-reasoned discussion about pixel shader architectures myself.

You must have spotted all those invisible curses I insert in my posts whenever I mention the N word, then? ;)
 
Uttar said:
Whether using integer could be a good thing doesn't matter anymore, since nVidia is not using integer in the NV35

Are you sure of that ??? I haven't seen any proof of that.
 