Will WGF 2.0 video cards be usable in XP/2000?

Bob said:
If by 'random', you mean 'up to the choice of the developer, not the driver', then you'd be right.

Yes, that was what I had in mind ;)

I don't work with this, I'm just an enthusiast trying to learn more about it ... just a little newbie :oops:
 
Armored_Spiderman said:
so the internal process is all 100% FP32

Erm... sorry, my Vista/D3D10 education is lacking. But does this mean nvidia will no longer be allowed to benefit from the _pp hint? That would be a potentially serious performance blow to them.

ERK
 
ERK said:
Erm... sorry, my Vista/D3D10 education is lacking. But does this mean nvidia will no longer be allowed to benefit from the _pp hint? That would be a potentially serious performance blow to them.

ERK
Correct, there are no _pp instruction modifiers in SM4.0 shader programs in D3D10.

It's only serious if they don't design the architecture to be full speed while doing FP32. The current hardware already is in some circumstances, and I can't imagine them engineering the next-gen hardware (with more temporaries, no skimping on register file size, caches to support it, etc) so that it isn't pretty much full-speed in all processing scenarios.
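For a concrete feel of what the _pp hint traded away, here's a minimal sketch (mine, not from the thread), with numpy's float16 standing in for shader half precision (s10e5) and float32 for full precision (s23e8):

```python
import numpy as np

full = np.float32(1.0) + np.float32(1e-4)
part = np.float16(1.0) + np.float16(1e-4)

print(full)   # 1.0001 -- fp32 keeps the small term
print(part)   # 1.0    -- fp16 rounds it away (~3 decimal digits of precision)
```

If the hardware runs FP32 at full speed everywhere, there's simply nothing left for the hint to buy.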
 
Well, sure. But there are ways to turn a profit without using underhanded tactics to eliminate competition. The consumer would be far better off today if, for example, the #1 OS in the nation was GNU/Linux (I also contend that if this were the case, then Linux would be vastly more user-friendly than it is now).

How is this any different from Apple trying to control the digital music market today by tying the iPod to iTunes DRM music only?

Or Sony and ATRAC, or whatever it was?

They all try to monopolize to whatever extent they can legally get away with, pretty much..

In my mind MS is a lot better than Apple about this..

Linux... who knows. I'm not sold on open source being viable. Isn't Vista going to blow Linux away in some areas? I think so.

It's just like Firefox. FF is nice, but it's no panacea just because it's open source. And the people behind FF make a lot of money anyway.
 
PS: Anyone actually MISS the WGF 2.0 moniker?

I actually kinda liked it... as a way to differentiate something "new". A clean break, etc.
 
binky said:
Linux... who knows. I'm not sold on open source being viable. Isn't Vista going to blow Linux away in some areas? I think so.

I do like Linux, but it falls short when we talk about gaming and multimedia... yes, it has the capability to do this... but no one is investing in it in this area... so if you want to play games you have to use Windows or Windows :LOL:
 
Rys said:
Correct, there are no _pp instruction modifiers in SM4.0 shader programs in D3D10.

It's only serious if they don't design the architecture to be full speed while doing FP32. The current hardware already is in some circumstances, and I can't imagine them engineering the next-gen hardware (with more temporaries, no skimping on register file size, caches to support it, etc) so that it isn't pretty much full-speed in all processing scenarios.

Gad, I hope so. Obviously there will have to be a DX9 path for new games for quite some time, and it'll be interesting to poke at G80 in that mode to confirm the above.
 
geo said:
Gad, I hope so. Obviously there will have to be a DX9 path for new games for quite some time, and it'll be interesting to poke at G80 in that mode to confirm the above.
You mean see what a D3D10 part does when you run legacy partial precision shaders through it?

Just a wild guess, but I'd imagine that they'll ignore the PP/half hint and run it at the native 32bit resolution of the chip. PP was always about performance, and if there isn't anything to gain from using it they won't...

Jack
 
JHoxley said:
You mean see what a D3D10 part does when you run legacy partial precision shaders through it?

Just a wild guess, but I'd imagine that they'll ignore the PP/half hint and run it at the native 32bit resolution of the chip. PP was always about performance, and if there isn't anything to gain from using it they won't...

Jack

"From your lips to God's ears!" :D
 
Rys said:
It's only serious if they don't design the architecture to be full speed while doing FP32. The current hardware already is in some circumstances, and I can't imagine them engineering the next-gen hardware (with more temporaries, no skimping on register file size, caches to support it, etc) so that it isn't pretty much full-speed in all processing scenarios.

I don't think that's a workable scenario. Imagine pixel threads using up the max number of registers (32 float4) and a memory latency through the texture unit of 100 cycles on average (optimistic). You'll need at least 32 * 4 * 4 * 100 bytes of memory (32 registers x 4 components x 4 bytes, for the ~100 threads in flight it takes to cover that latency) to store the registers for enough threads to fetch from memory / do texturing at full speed, for a single pixel pipe.

That's 50 KB right there. 200 KB for a quad, and a whopping 1.6 MB if you build a 32-pipe machine.

That's a lot of RAM.

Clearly, putting enough register space to run all threads at full speed is not a short term (or even medium term) workable solution.

So now you get to pick how many registers you want to run at full speed. Should it be 16? 12? 8? 5.17? What if you pick 8 and some Important Shader actually needs 9? You don't want to run at 8/9 the speed (an ~11% hit) just for using one more register. Using fp16 (or other smaller representations) for the computations that don't really need fp32 worth of precision lets you squeeze a few more registers out of your full-speed limit, which in turn lets you run more shaders at full speed.

It's also not too hard to build two fp16 MADs that combine into one fp32 MAD unit, so you might even get the benefit of computing 2x faster with fp16.
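A quick sanity check of that arithmetic in Python (my sketch; the figures and assumptions are Bob's):

```python
# Register-file sizing for latency hiding: 32 float4 temporaries
# (16 bytes each) per thread, ~100 cycles of texture latency to hide,
# one thread in flight per cycle of latency.
BYTES_PER_FLOAT4 = 16

regs_per_thread = 32
latency_cycles = 100

bytes_per_pipe = regs_per_thread * BYTES_PER_FLOAT4 * latency_cycles
print(bytes_per_pipe / 1024)           # 50.0  KB for one pixel pipe
print(4 * bytes_per_pipe / 1024)       # 200.0 KB for a quad
print(32 * bytes_per_pipe / 1024**2)   # ~1.56 MB for a 32-pipe machine

# The full-speed register budget has a cliff: a shader needing one
# register more than the budget runs at budget/needed speed.
budget, needed = 8, 9
print(1 - budget / needed)             # ~0.111 -> an ~11% hit

# fp16 packs two values into one 32-bit slot, stretching the budget.
print(budget * 2)                      # 16 fp16 temps in the same space
```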
 
Which means what, Bob? Writing as many large shaders in DX9 as possible, in order to continue taking advantage of _pp even on a DX10 part?
 
I should have written, "pretty much full-speed in almost all processing scenarios", which is what I meant. You can always write a shader that'll shit all over the hardware. The challenge is building it so that it does well in the non-pathological cases, as you say.

The balance (ATI choosing 3 temps per fragment for X1K for example, given how the hardware would texture and manage batches) is what you're after. And I got the impression, chatting with Tamasi recently, that it'll be a very nice shader of pixels with that balance 'right'.
 
Xenos supports 12 FP32 vec4 registers per fragment in flight (though part of R0 is reserved as far as I can make out :???: ).

64 threads, each of 64 fragments, means 4096 fragments in flight. At 16 bytes per vec4 register, that's 768KB of register file for the fragments in flight.

Then there's another 32 threads, each holding 64 vertices. I don't know how many registers are supported there... But assuming the same arrangement, that's another 384KB.

In other words, 1152KB, total, of register file.

Jawed
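The same arithmetic as a short sketch (assumptions are Jawed's, as stated above):

```python
# Xenos register-file estimate: 12 float4 registers (16 bytes each)
# per element, 64 elements per thread, 64 pixel threads and 32 vertex
# threads in flight (vertex register count assumed equal to pixels).
BYTES_PER_FLOAT4 = 16
regs = 12

fragments = 64 * 64   # 4096 fragments in flight
vertices = 32 * 64    # 2048 vertices in flight

frag_kb = fragments * regs * BYTES_PER_FLOAT4 / 1024
vert_kb = vertices * regs * BYTES_PER_FLOAT4 / 1024
print(frag_kb, vert_kb, frag_kb + vert_kb)   # 768.0 384.0 1152.0 KB
```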
 
Jawed said:
Aha, I thought it was 2, so it's 3?

Jawed
I think so. Got a tip-off from someone that it wasn't what ATI were claiming (they say 2), although I've yet to have a proper poke and see if that's right. I'm not sure if 3 makes sense from a register file access point of view (not sure how they port it), but I trust the source.
 
binky said:
How is this any different than Apple trying to control the digital music market today by tying Ipod to Itunes DRM music only?
It's not. And there are current attempts to force software/hardware companies to allow complete interoperability between DRM software and hardware, which would, for example, allow DRM software to be used on any company's hardware.

Just because it's common practice doesn't mean that you should accept it like sheep. It means you should do what you can to fight against it, no matter how small your contribution might be.
 
ERK said:
Is Bob saying nvidia will defy the SM4 32bit spec and run at 16bit anyway? :oops:
I don't think they can get away with that. Microsoft might start stabbing them with sharp sticks if they tried it...

Jack
 